CS 5785 COMBINED-XLIST Applied Machine Learning (2022FA)
Course Abstract
Learn and apply key concepts of modeling, analysis and validation from machine learning, data mining and signal processing to analyze and extract meaning from data. Implement algorithms and perform experiments on images, text, audio and mobile sensor measurements. Gain working knowledge of supervised and unsupervised techniques including classification, regression, clustering, feature selection, and dimensionality reduction.
Course Materials
All lecture slides and executable notebooks will be posted to our GitHub repo.
The lecture videos will be streamed on Zoom until mid-Fall, and recordings will be made available on Canvas under the Zoom tab. All videos from last year are also available online on YouTube.
Feedback Form
At any time during the course, students can submit feedback via the link below. The form can be submitted any number of times throughout the semester so that the teaching team can make prompt adjustments. Any thoughts, comments, or advice will be appreciated.
Feedback form link: https://forms.gle/VuKuXQtCiUDsY6cT8
Prerequisites
CS 2800 or equivalent, linear algebra, probability, and experience programming with Python, or permission of the instructor
First Lecture Information
The first lecture will be on Tuesday, Aug 23, 1:00pm - 2:15pm ET, in Bloomberg Center 131. You can also use the following Zoom link to connect to the lecture:
Zoom Link: https://cornell.zoom.us/j/97506503316?pwd=TEFWTUFZcTJaNnJza3YwSmhrRkNZdz09
Instruction Format
The class will be held twice a week, on Tuesdays and Thursdays. Instruction will be in person. We will try to accommodate students via Zoom: see the Zoom link below as well as the "Zoom" tab if you need additional information. However, there will be limited support for tuning in to lectures remotely this year (i.e., the experience may not be as smooth as with in-person lectures, and we cannot promise to resolve any technical issues in a timely manner). We therefore encourage everyone to attend in person and to treat Zoom as a backup option.
Zoom Link: https://cornell.zoom.us/j/6627526390?pwd=TWhiNytYbVZPbVEzRXpzTmVMbDB0Zz09
Information
Instructor: Volodymyr Kuleshov
Course Frequency: Fall Term
Times: Tues/ Thurs 1:00pm - 2:15pm Eastern Time.
Location: in-person: Bloomberg Center 131
Teaching Staff and Office Hours
Volodymyr Kuleshov (Instructor). Office hours: Tue 2:15pm-2:45pm and Thu 2:15pm-2:45pm (after class). Volodymyr will take questions in the auditorium after the lecture and then move to the tables outside Bloomberg 131. He will try to be available via Zoom as best he can, but this option is not officially supported this semester.
Guandao Yang (Head TA). Email: gy46@cornell.edu; Office hours: Mondays 10:00 - 11:00 AM (In Person, Bloomberg 360).
Noriyuki Kojima (TA). Email: nk654@cornell.edu; Office hours: Fridays 10:00 - 11:00 AM (In Person, Bloomberg 375).
Rui Qian (TA). Email: rq49@cornell.edu; Office hours: Wednesdays 12:00 - 1:00 PM (In Person, Bloomberg 367, starting Aug 31st).
Yair Schiff (TA). Email: yzs2@cornell.edu; Office hours: Thursdays 9:55 - 10:55 AM (In Person: Bloomberg 318; Zoom)
Top Piriyakulkij (TA - remote). Email: wp237@cornell.edu; Office hours: Tuesday 10-11AM (Zoom).
Student Outcomes
- Be able to analyze and extract meaning from data by applying key concepts of modeling, analysis, and validation from Machine Learning, Data Mining, and Signal Processing.
- Implement algorithms and perform experiments on images, text, audio, and other modalities.
- Demonstrate an understanding of modern machine learning algorithms like tree-based models, boosting, and deep neural networks.
- Gain working knowledge of supervised and unsupervised techniques and their relevant trade-offs in practical usage.
Preparation
Math. Students need to be comfortable with multivariable calculus, primarily integration and differentiation in multiple dimensions. The course will also require a basic understanding of probability at the level of an introductory undergraduate course. Teaching staff will hold review sessions to cover background material.
Programming. Students should have basic programming ability. The course will use Python and related data science libraries, including numpy, scipy, scikit-learn, and tensorflow or pytorch. Familiarity with these libraries is preferred, but we expect students to be able to learn parts of these libraries during the course. Teaching staff will hold review sessions to cover background material.
Prerequisites. CS 2800 or equivalent, Linear Algebra, Probability, and experience programming with Python, or permission of the instructor.
Textbooks and Other Materials
- Textbooks (Optional)
- T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition), Springer-Verlag, 2008. (available for free)
- K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
- C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
- Lecture Scribe notes available on the website
- A list of probability and linear algebra resources (link)
Class and Laboratory Schedule
Lectures: 2.5 hrs/wk
Recitations: None required. Optional sessions with graduate or undergraduate TAs
Midterm (November 1st, in class)
We will have an in-class midterm on Tuesday, November 1st, 2022.
Grading
Component | Description | Weight |
---|---|---|
Homework 1 | Combination of theory and programming questions | 10% |
Homework 2 | Combination of theory and programming questions | 10% |
Homework 3 | Combination of theory and programming questions | 10% |
Homework 4 | Combination of theory and programming questions | 10% |
Prelim | A test on the course contents. | 15% |
Project Proposal | Brief description of the planned project, around 300 words. | 5% |
Project Milestone | Mid-semester progress report on course project, 3-5 pages in length. | 15% |
Final Project | Final report on the course project, 5 pages in length. | 25% |
Total Points | | 100% |
Basis of grade determination:
Grade | Percent |
---|---|
A+ | 98-100 |
A | 93-97 |
A- | 90-92 |
B+ | 88-89 |
B | 83-87 |
B- | 80-82 |
C+ | 78-79 |
C | 73-77 |
C- | 70-72 |
D | 60-69 |
F | <60 |
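As an illustration of how the weights and grade bands above combine, here is a minimal Python sketch. It is not an official grade calculator. The function names `course_percent` and `letter_grade` are hypothetical, and the handling of fractional totals (each band is treated as starting at its lower bound, so 92.5 maps to A-) is an assumption, since the table lists whole-number ranges only.

```python
# Weights from the Grading section (percent of the final grade).
WEIGHTS = {
    "Homework 1": 10,
    "Homework 2": 10,
    "Homework 3": 10,
    "Homework 4": 10,
    "Prelim": 15,
    "Project Proposal": 5,
    "Project Milestone": 15,
    "Final Project": 25,
}

# Lower bound of each letter-grade band, highest first.
# Assumption: a fractional total belongs to the highest band whose
# lower bound it meets; the syllabus does not specify rounding.
CUTOFFS = [
    (98, "A+"), (93, "A"), (90, "A-"),
    (88, "B+"), (83, "B"), (80, "B-"),
    (78, "C+"), (73, "C"), (70, "C-"),
    (60, "D"),
]


def course_percent(scores):
    """Weighted total, given a percent score in [0, 100] per component."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


def letter_grade(percent):
    """Map a course percent to a letter grade using the bands above."""
    for lower_bound, letter in CUTOFFS:
        if percent >= lower_bound:
            return letter
    return "F"
```

For example, a student who scores 90 on every component has a weighted total of 90, which falls in the A- band under these assumptions.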
Assignments
Written Assignments: Homework should be written up clearly and succinctly; you may lose points if your answers are unclear or unnecessarily complicated. You are encouraged to use LaTeX to write up your homework, but this is not a requirement. Assignments will be submitted on Gradescope. If you have not been enrolled in this course on Gradescope, you can use entry code V5BVKP to enroll. You may work in teams of two: make sure to put both of your names on the submission and submit as a team in Gradescope.
Late Submissions: You have 6 late days for assignments and project-related submissions, which you can use at any time during the term without penalty, with a maximum of 2 late days per submission (i.e., you cannot use ≥3 late days for any assignment). No late days may be used for the final project writeup. Once you run out of late days, you will incur a 20% penalty for each extra late day you use. When submitting as a team, using late days deducts from the remaining quota of every member of the team. Each late submission should be clearly marked as “Late” on the first page. No submission will be accepted ≥3 days after the deadline.
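The late-day arithmetic can be sketched in a few lines of Python. This is only one reading of the policy, not an official tool: the function name `apply_late_policy` is hypothetical, lateness is assumed to be counted in whole days, the 20% penalty is assumed to add up per extra day, and the final project writeup (which allows no late days) is deliberately left out.

```python
TOTAL_LATE_DAYS = 6           # quota per student for the whole term
MAX_DAYS_LATE = 2             # submissions 3 or more days late are not accepted
PENALTY_PER_EXTRA_DAY = 0.20  # assumed to accumulate per day beyond the quota


def apply_late_policy(days_late, late_days_left):
    """Return (accepted, late_days_left_after, penalty_fraction).

    days_late: whole days past the deadline (assumption: partial days
        are already rounded up by the caller).
    late_days_left: the submitter's remaining late-day quota.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > MAX_DAYS_LATE:
        # Too late to be accepted; no late days are consumed.
        return False, late_days_left, 0.0
    free_days = min(days_late, late_days_left)
    extra_days = days_late - free_days
    return True, late_days_left - free_days, PENALTY_PER_EXTRA_DAY * extra_days
```

For a team submission, the same deduction would be applied to each member's quota. Under these assumptions, a submission that is 2 days late from a student with 1 late day left uses that day and takes a 20% penalty for the other.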
Course Project
The course project will give the students a chance to explore machine learning in greater detail. Course projects will be done in groups of up to 3 students and can fall into one or more of the following categories:
- Application of machine learning to a practical problem of your choice.
- Improvements to machine learning algorithms.
- Competing on a machine learning benchmark.
- Theoretical analysis of any aspect of machine learning models.
Pick a topic that's meaningful to you and that excites you. For example, if you do PhD research in biology, you can do a project related to a research problem that you're working on. If you're in Urban Tech, you can work with a city dataset that you find interesting. You are encouraged to find something on your own, but we are also going to share topic ideas in Canvas and you should feel free to talk to the teaching team during office hours.
Proposal (Due Date: October 7, 11:59 PM ET)
Your proposal should give the title of the project, the project category, the names of your team members, their NetIDs, and a 300-500 word description of what you plan to do. It should contain the following information.
- Motivation: What problem are you tackling? Is this an application or a theoretical result?
- Method: What machine learning techniques are you planning to apply or improve upon and how?
- Experiments: What experiments are you planning to perform (or what theorems do you want to prove)?
The goal of the proposal is to make sure you're on the right track. As long as you follow the above guidelines, you should do well.
Please submit the proposal via Gradescope and make sure to submit as a team.
Milestone (Due Date: November 13, 11:59 PM ET)
The milestone submission should describe what you've accomplished so far, and briefly say what else you plan to do. The format should be the same as that of the final project, with a maximum length of 3 pages (excluding references). The goal is to make sure that you are on track to finish the final project.
- Motivation: What problem are you tackling? Is this an application or a theoretical result?
- Method: What machine learning techniques are you planning to apply or improve upon and how?
- Preliminary experiments: Describe the experiments that you've run, the outcomes, and any error analysis that you've done. You should have tried at least one baseline.
- Future work: What else do you plan to do?
The goal of the milestone is to make sure you're on the right track. As long as you follow the above guidelines, you should do well.
Please submit the milestone via Gradescope and make sure to submit as a team.
Final Writeup (Due Dec 12 at 11:59pm ET -- no late days!)
The final writeup should describe all the work you did for your course project and summarize the main results. You can think of it as a technical report that presents your findings to a general machine learning audience.
The style and format of the writeup should be similar to that of a research paper. The maximum length is 5 pages, excluding references. We provide a LaTeX template adapted from the NeurIPS style files for your reference (go to Files->Project).
There are no strict requirements on the structure of the final writeup, but one way to structure it would be to include the following sections, which are fairly standard for a research paper.
- Abstract: Summarize the problem, novel contributions, and results in one paragraph.
- Introduction: Provide motivation for the problem and expand upon the overview in the abstract.
- Background: Briefly summarize the background knowledge needed to understand the work.
- Method: Describe the methods that will be used or implemented in the paper.
- Theoretical analysis: If you are doing a theory project, describe your theoretical results here.
- Experimental analysis: Describe in detail your experiments.
- Discussion and Prior Work: Discuss the key takeaways from your experiments. Put your results in the context of previous work.
- Conclusion: You may summarize the paper or talk about open problems and open directions.
Regardless of how the writeup is structured, please make sure to cover the following points.
- Motivation: What problem are you tackling? Why is it interesting? What type of project will this be (application, method, theory)?
- Method: What machine learning techniques are you planning to apply or improve upon and how? Make sure to describe them in detail and provide enough context for the reader to understand the methods at least at a high level. Provide any background that is necessary for that.
- Experiments: Describe the experiments that you've run, the outcomes, and any error analysis that you've done. Make sure that the setup is described in enough detail for someone else to reproduce your results. Also, if you have an experimental project, make sure to provide a detailed experimental analysis. Things you should consider including are: train/test performance, learning curves, model samples, error analyses, ablation analyses, etc. Most projects should also include baselines.
- Theory: If doing a theory project, state your results formally as theorems. Make sure that all the symbols are defined. The best presentations of theoretical results also explain the results in plain language and convey the intuition behind them.
- Context: Explain how you build upon previous work and how your results compare to what has been done previously.
Writeups will be evaluated on clarity of presentation, adherence to the above guidelines, the significance of the project (does it explore a toy dataset or a real problem?), and the technical quality of the work (the depth of the experimental or theoretical analyses, whether the approach makes sense technically, whether the algorithms implemented are reasonable and studied in enough detail, etc.).
Please submit the writeup via Gradescope and make sure to submit as a team.
Collaboration Policy and Honor Code
You are free to form study groups and discuss homeworks and projects. However, you must write up homeworks and code from scratch independently without referring to any notes from the joint session. In preparing your answers, you should not copy, refer to, or look at solutions from previous years’ homeworks. It is an honor code violation to intentionally refer to a previous year’s solutions, either official or written up by another student. Anybody violating the honor code will be referred to the Office of Judicial Affairs.
Academic Integrity:
Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work. The policy can be found on the university’s website here: https://theuniversityfaculty.cornell.edu/academic-integrity/.
You are encouraged to study together and to discuss information and concepts covered in lecture and the sections with other students. You can give "consulting" help to or receive "consulting" help from such students. However, this permissible cooperation should never involve one student having possession of a copy of all or part of work done by someone else, in the form of an e-mail, an e-mail attachment file, a diskette, or a hard copy.
Should copying occur, both the student who copied work from another student and the student who gave material to be copied will automatically receive a zero for the assignment. Penalty for violation of this Code can also be extended to include failure of the course and University disciplinary action.
During examinations, you must do your own work. Talking or discussion is not permitted during the examinations, nor may you compare papers, copy from others, or collaborate in any way. Any collaborative behavior during the examinations will result in failure of the exam, and may lead to failure of the course and University disciplinary action.
- Academic Misconduct. A faculty member may impose a grade penalty for any misconduct in the classroom or examination room. Examples of academic misconduct include, but are not limited to, talking during an exam, bringing unauthorized materials into the exam room, and disruptive behavior in the classroom.
Students with Disabilities
Your access in this course is important. Please give me your Student Disability Services (SDS) accommodation letter early in the semester so that we have adequate time to arrange your approved academic accommodations. If you need an immediate accommodation for equal access, please speak with me after class or send an email message to me and/or SDS at sds_cu@cornell.edu. If the need arises for additional accommodations during the semester, please contact SDS. You may also feel free to speak with Student Services at Cornell Tech, who will connect you with the university SDS office.
Religious Observances
Cornell University is committed to supporting students who wish to practice their religious beliefs. Students are advised to discuss religious absences with their instructors well in advance of the religious holiday so that arrangements for making up work can be resolved before the absence.
Cornell Tech Cares: The Cornell Tech community is a diverse and vibrant group of students, faculty, and staff. We take our responsibility to look out for one another seriously. As members of this community, your openness and proactive communication will allow us all to better care for students and respond to their needs, whether they be interpersonal or academic. Please help us continue to build and strengthen our community by reaching out if you are having an issue or are concerned about a fellow student. Contact studentwellness@tech.cornell.edu with concerns and we will make sure to care for one another. In the event of an emergency, please call 911 and Cornell Tech Safety & Security at 646-971-3611 (This number is also located on the back of your Cornell ID), when safe to do so.
Key Deadlines
Item | Deadline |
---|---|
Homework 1 | September 13th, 11:59 PM ET |
Homework 2 | September 29th, 11:59 PM ET |
Homework 3 | October 25th, 11:59 PM ET |
Homework 4 | TBD |
Prelim | November 1st, in class |
Project Proposal | October 7th, 11:59 PM ET |
Project Milestone | November 13th, 11:59 PM ET |
Final Project | December 12th, 11:59 PM ET |