CS 5785 Applied Machine Learning (Fall 2024)
Course Description
Provides a broad overview of key concepts across machine learning with a focus on applications. Introduces supervised and unsupervised algorithms including logistic regression, support vector machines, neural networks, Gaussian mixture models, as well as other methods for classification, regression, clustering, and dimensionality reduction. Covers foundational concepts such as maximum likelihood estimation, overfitting, regularization, generative models, latent variables, and non-parametric methods. Applications include data analysis on images, text, time series, and other types of data using modern software tools such as numpy, scikit-learn, and keras.
Course Materials
All the lecture materials and executable notebooks will be hosted online:
- Lecture slides and notes (in Jupyter format) are available on GitHub: https://github.com/kuleshov/cornell-cs5785-2024-applied-ml
- Lecture notes (compiled into an HTML book) are available here: https://kuleshov-group.github.io/aml-book/intro.html
- The notes will be thoroughly edited as the course progresses.
- Pre-recorded lecture videos are available on YouTube.
General Information
Instructor: Volodymyr Kuleshov (Cornell Tech) & Brandon Amos (Meta AI Research)
Times: Mon / Wed 2:55pm - 4:10pm Eastern Time.
Location: Bloomberg Center 131.
Attendance policy: Instruction is in-person. Lectures will normally not be streamed over Zoom.
Complete syllabus: https://classes.cornell.edu/browse/roster/FA24/subject/CS
Teaching Staff and Office Hours
This year, the class will be co-taught with Brandon Amos from Meta AI Research. Certain lectures will be given by Volodymyr, and others by Brandon. In October, Brandon will also deliver a block of 4-5 lectures on deep learning, including advanced topics like language modeling, leveraging his experience at Meta.
Volodymyr Kuleshov and Brandon Amos (Instructors). Office Hours: Mon/Wed 4:10-4:40pm (after class)
Volodymyr and Brandon will take questions after lecture, either in Bloomberg 131 or at the tables outside.
Shachi Deshpande (Head TA). Office Hours: Monday 9-10 am, common area outside Bloomberg 131. Email: ssd86@cornell.edu
Yen-Yu Chang (TA). Office Hours: Friday 10-11 am, Bloomberg 360. Email: yc2463@cornell.edu
Guanghan Wang (TA). Office Hours: Thursday 4:30-5:30 pm, Bloomberg 360. Email: gw354@cornell.edu
Preparation
Math. Students need to be comfortable with multivariable calculus, primarily integration and differentiation in multiple dimensions. The course also requires a basic understanding of probability at the level of an introductory undergraduate course. Teaching staff will hold review sessions to cover background material.
Programming. Students should have basic programming ability. The course will use Python and related data science libraries, including numpy, scipy, scikit-learn, and tensorflow or pytorch. Familiarity with these libraries is preferred, but we expect students to pick up the parts they need during the course. Teaching staff will hold review sessions to cover background material.
Prerequisites. Programming experience (ideally Python; Cornell CS 1110 or equivalent). Linear algebra (Cornell MATH 2210, MATH 4310, or equivalent). Statistics and probability (Cornell STSCI 2100 or equivalent).
Textbooks and Other Materials
- Textbooks (Optional)
- T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition), Springer-Verlag, 2008. (available for free)
- K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
- C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
- Lecture notes, slides, and videos available via Github
- A list of probability and linear algebra resources
Prelim (October 30, in class)
We will have an in-class prelim on October 30, 2024.
Grading
Component | Details | Weight |
---|---|---|
Homework 1 | Released: 9/4. Due: 9/18. | 10% |
Homework 2 | Released: 9/23. Due: 10/7. | 10% |
Homework 3 | Released: 10/9. Due: 10/23. | 10% |
Homework 4 | Released: 11/11. Due: 11/25. | 10% |
Prelim | In class on 10/30. | 15% |
Project Proposal | Brief description of the planned project, around 300 words. Due: 9/23. | 5% |
Project Milestone | Mid-semester progress report on the course project, 3-5 pages in length. Due: 11/11. | 15% |
Final Project | Final report on the course project, 5 pages in length. Due: 12/14. | 25% |
Total Points | | 100% |
Basis of grade determination:
Grade | Percent |
---|---|
A+ | 98-100 |
A | 93-97 |
A- | 90-92 |
B+ | 88-89 |
B | 83-87 |
B- | 80-82 |
C+ | 78-79 |
C | 73-77 |
C- | 70-72 |
D | 60-69 |
F | <60 |
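To make the weighting concrete, here is a minimal Python sketch of how the component weights and letter bands combine. It is an illustration only, not the official grading script. It assumes each component is scored as a percentage (0-100), treats the lower bound of each letter band as an inclusive cutoff (so a total such as 97.5 falls in the A band), and ignores the lecture-notes bonus points, since the syllabus does not say how they enter the total. The component keys (`hw1`, `prelim`, and so on) are hypothetical names.

```python
# Illustrative sketch only: not the official CS 5785 grading script.
# Weights are integer percentage points from the grading table (they sum to 100).
WEIGHTS = {
    "hw1": 10, "hw2": 10, "hw3": 10, "hw4": 10,  # hypothetical component keys
    "prelim": 15,
    "proposal": 5,
    "milestone": 15,
    "final_project": 25,
}

# Assumption: the lower bound of each letter band is an inclusive cutoff.
CUTOFFS = [
    (98, "A+"), (93, "A"), (90, "A-"),
    (88, "B+"), (83, "B"), (80, "B-"),
    (78, "C+"), (73, "C"), (70, "C-"),
    (60, "D"),
]


def weighted_total(scores: dict[str, float]) -> float:
    """Combine per-component percentages (0-100) into a course percentage."""
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(total: float) -> str:
    """Map a course percentage to a letter using the assumed cutoffs."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


scores = {"hw1": 100, "hw2": 100, "hw3": 100, "hw4": 100,
          "prelim": 80, "proposal": 100, "milestone": 90, "final_project": 85}
total = weighted_total(scores)
print(total, letter_grade(total))  # 91.75 A-
```

Keeping the weights as integer percentage points and dividing once at the end avoids floating-point drift in the sum, so a student with 90 on every component gets exactly 90.0.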
Assignments
Written Assignments: You are encouraged to use LaTeX to write up your homework. Assignments will be submitted on Gradescope. If you are not yet enrolled in this course on Gradescope, you can use entry code 3RD7ZB to enroll. You may work in teams of two: make sure to put both of your names on the submission and submit as a team in Gradescope.
Late Submissions: You have 6 late days for assignments and project-related submissions, which you can use at any time during the term without penalty. You may use at most 2 late days per submission (i.e., you cannot use 3 or more late days on any assignment), and late days cannot be used for the final project writeup. Once you run out of late days, you will incur a 20% penalty for each additional late day you use. When submitting as a team, late days are deducted from the remaining quota of every team member. Each late submission should be clearly marked as “Late” on the first page. No submission will be accepted 3 or more days after the deadline.
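The late-day rules amount to a small bookkeeping procedure. The Python sketch below tracks one student's free late days across submissions. It assumes the 20% penalty is taken off that submission's score for each late day beyond the free quota, treats a submission 3 or more days late as not accepted (score 0), and does not model team submissions or the final project writeup, which allows no late days. `LateDayLedger` is a hypothetical name, not part of any course tooling.

```python
from dataclasses import dataclass

FREE_LATE_DAYS = 6                 # free late days per student for the term
MAX_LATE_DAYS_PER_SUBMISSION = 2   # 3+ days late is not accepted
PENALTY_PER_EXTRA_DAY = 0.20       # assumption: 20% of the submission's score per extra day


@dataclass
class LateDayLedger:
    """Hypothetical helper tracking one student's remaining free late days."""
    remaining: int = FREE_LATE_DAYS

    def apply(self, days_late: int, raw_score: float) -> float:
        """Return the score for one submission after late-day accounting."""
        if days_late < 0:
            raise ValueError("days_late must be non-negative")
        if days_late > MAX_LATE_DAYS_PER_SUBMISSION:
            return 0.0  # not accepted: 3 or more days after the deadline
        free = min(days_late, self.remaining)  # spend free days first
        self.remaining -= free
        extra = days_late - free               # days beyond the free quota
        return raw_score * (1.0 - PENALTY_PER_EXTRA_DAY * extra)


ledger = LateDayLedger()
for _ in range(3):
    ledger.apply(2, 100.0)        # three 2-day-late submissions use all 6 free days
print(ledger.remaining)           # 0
print(ledger.apply(1, 90.0))      # 72.0: one extra day costs 20%
```

Because free days are spent before any penalty applies, a 2-day-late submission made with only one free day left costs that day plus a single 20% penalty.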
Course Project [Due 12/14]
The course project gives students a chance to explore machine learning in greater detail. Course projects will be done in groups of up to 3 students and can fall into one or more of the following categories:
- Application of machine learning to a practical problem of your choice.
- Improvements to machine learning algorithms.
- Competing on a machine learning benchmark.
- Theoretical analysis of any aspect of machine learning models.
Pick a topic that's meaningful to you and that excites you. For example, if you do PhD research in biology, you can do a project related to a research problem that you're working on. If you're in Urban Tech, you can work with a city dataset that you find interesting. You are encouraged to find a topic on your own, but we will also share topic ideas on Canvas, and you should feel free to talk to the teaching team during office hours.
Proposal
Your proposal should give the title of the project, the project category, the names and NetIDs of your team members, and a 300-500 word description of what you plan to do. It should contain the following information:
- Motivation: What problem are you tackling? Is this an application or a theoretical result?
- Method: What machine learning techniques are you planning to apply or improve upon and how?
- Experiments: What experiments are you planning to perform (or what theorems do you want to prove)?
The goal of the proposal is to make sure you're on the right track. As long as you follow the above guidelines, you should do well.
Please submit the proposal via Gradescope and make sure to submit as a team.
Milestone
The milestone submission should describe what you've accomplished so far and briefly say what else you plan to do. The format should be the same as that of the final project, with a maximum length of 3 pages (excluding references). The goal is to make sure that you are on track to finish the final project.
- Motivation: What problem are you tackling? Is this an application or a theoretical result?
- Method: What machine learning techniques are you planning to apply or improve upon and how?
- Preliminary experiments: Describe the experiments that you've run, the outcomes, and any error analysis that you've done. You should have tried at least one baseline.
- Future work: What else do you plan to do?
The goal of the milestone is to make sure you're on the right track. As long as you follow the above guidelines, you should do well.
Please submit the milestone via Gradescope and make sure to submit as a team.
Final Writeup (Due Dec 14 at 11:59pm ET -- no late days!)
The final writeup should describe all the work you did for your course project and summarize the main results. You can think of it as a technical report that presents your findings to a general machine learning audience.
The style and format of the writeup should be similar to that of a research paper. The maximum length is 5 pages, excluding references. We provide a LaTeX template adapted from the NeurIPS style files for your reference (go to Files -> Project).
There are no strict requirements on the structure of the final writeup, but one way to structure it is to include the following sections, which are fairly standard for a research paper.
- Abstract: Summarize the problem, novel contributions, and results in one paragraph.
- Introduction: Provide motivation for the problem and expand upon the overview in the abstract.
- Background: Briefly summarize the background knowledge needed to understand the work.
- Method: Describe the methods that will be used or implemented in the paper.
- Theoretical analysis: If you are doing a theory project, describe your theoretical results here.
- Experimental analysis: Describe in detail your experiments.
- Discussion and Prior Work: Discuss the key takeaways from your experiments. Put your results in the context of previous work.
- Conclusion: You may summarize the paper or talk about open problems and open directions.
Regardless of how the writeup is structured, please make sure to cover the following points.
- Motivation: What problem are you tackling? Why is it interesting? What type of project will this be (application, method, theory)?
- Method: What machine learning techniques are you planning to apply or improve upon and how? Make sure to describe them in detail and provide enough context for the reader to understand the methods at least at a high level. Provide any background that is necessary for that.
- Experiments: Describe the experiments that you've run, the outcomes, and any error analysis that you've done. Make sure that the setup is described in enough detail for someone else to reproduce your results. Also, if you have an experimental project, make sure to provide a detailed experimental analysis. Things you should consider including are: train/test performance, learning curves, model samples, error analyses, ablation analyses, etc. Most projects should also include baselines.
- Theory: If doing a theory project, state your results formally as theorems. Make sure that all the symbols are defined. The best presentations of theoretical results also explain them in plain language and convey the intuition behind them.
- Context: Explain how you build upon previous work and how your results compare to what has been done previously.
Writeups will be evaluated on their clarity of presentation, adherence to the above guidelines, the significance of the project (does it explore a toy dataset or a real problem?), and the technical quality of the work (the depth of the experimental or theoretical analyses, whether the approach makes sense technically, whether the implemented algorithms are reasonable and studied in enough detail, etc.).
Please submit the writeup via Gradescope and make sure to submit as a team.
Improving Lecture Notes
We will offer bonus points to students who submit pull requests (PRs) against our open-source lecture notes repo at https://github.com/kuleshov-group/aml-book
- There are three types of PR: (1) typo corrections; (2) clarifications; (3) significant improvements to the lecture notes. A typo PR changes a few words in the notes to fix a small error. A clarification PR edits approximately 1-5 paragraphs to explain something better. A significant improvement edits about 20% or more of a lecture and introduces major changes, potentially including new material.
- When submitting a PR, students should indicate the type of PR and their NetID in the submission link.
- TAs will be responsible for reviewing the PR, and either approving or requesting changes.
- If a PR is approved, the student will receive bonus points in the course. Correcting three typos or making one clarification is worth 0.5 points. A significant improvement is worth two points. The maximum number of bonus points per student is two.
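The bonus rules reduce to simple arithmetic. The Python sketch below tallies approved PRs into bonus points. It assumes typo fixes are credited only in complete groups of three (the syllabus does not say how one or two leftover typos count) and applies the two-point cap per student. The function name `bonus_points` is hypothetical.

```python
def bonus_points(typos: int, clarifications: int, significant: int) -> float:
    """Hypothetical tally of approved PRs into bonus points."""
    # Assumption: typo fixes earn 0.5 points per complete group of three.
    typo_credit = 0.5 * (typos // 3)
    clarification_credit = 0.5 * clarifications
    significant_credit = 2.0 * significant
    # No student can earn more than two bonus points in total.
    return min(typo_credit + clarification_credit + significant_credit, 2.0)


print(bonus_points(typos=7, clarifications=1, significant=0))  # 1.5
print(bonus_points(typos=0, clarifications=1, significant=1))  # 2.0 (capped)
```

Under this reading, a single significant improvement already reaches the cap, so further PRs after it add no points.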