CS 5785 COMBINED-XLIST Applied Machine Learning (2021FA)

Course Abstract

Learn and apply key concepts of modeling, analysis and validation from machine learning, data mining and signal processing to analyze and extract meaning from data. Implement algorithms and perform experiments on images, text, audio and mobile sensor measurements. Gain working knowledge of supervised and unsupervised techniques including classification, regression, clustering, feature selection, and dimensionality reduction.

Course Materials

All lecture slides and executable notebooks will be posted to our GitHub repo.

The lecture videos will be streamed on Zoom until mid-Fall, and recordings will be made available on Canvas under the Zoom tab. Note also that all videos from last year are available online on YouTube.

Feedback Form

At any time during the course, students can submit feedback via this link. The form can be submitted any number of times throughout the semester so that the teaching team can make prompt adjustments. Any thoughts, comments, or advice will be appreciated.

Prerequisites

CS 2800 or equivalent, Linear Algebra, Probability, and experience programming with Python, or permission of the instructor

First Lecture Information

The first lecture will be held on Thursday, Aug 26, from 1:00pm to 2:15pm ET in Bloomberg Center 131. You can also use the following Zoom link to connect to the lecture:

 https://cornell.zoom.us/j/95208049944?pwd=aUpKamM5UlphZFQ3cEZyQ3JTc1VPdz09

Instruction Format

The class will be held twice a week, on Tuesdays and Thursdays. Instruction will be held in a hybrid format, both online and in person; the Zoom link for the online option can be found above.

Information

Instructor: Volodymyr Kuleshov

Credits: 3

Course Frequency: Fall Term

Times: Tues/Thurs 1:00pm - 2:15pm Eastern Time.

Location: In person: Bloomberg Center 131. Online: Zoom at https://cornell.zoom.us/j/95208049944?pwd=aUpKamM5UlphZFQ3cEZyQ3JTc1VPdz09

Teaching Staff and Office Hours

Volodymyr Kuleshov (Instructor). Office hours: Tue 2:15pm - 3:15pm (after class) and Thu 2:15pm - 2:45pm (after class). Volodymyr will take questions in the auditorium after the lecture and then move to the tables outside Bloomberg 131. You can also join via Zoom.

Zheng Li (Head TA). Email: zl634@cornell.edu. Office hours: Fri 10:30am - 12:00pm ET, Zoom link (subject to change; please see the Zoom tab for updates).

Yiran Zhao (TA). Email: yz2647@cornell.edu. Office hours: Thurs 11:30am - 1:00pm ET, Zoom link.

Yin Li (TA). Email: yl3243@cornell.edu. Office hours: Wed 11:00am - 12:30pm ET, Zoom link (subject to change; please see the Zoom tab for updates).

Due to Cornell Tech policies, all office hours will be held remotely on Zoom.

Student Outcomes

  1. Be able to analyze and extract meaning from data by applying key concepts of modeling, analysis, and validation from Machine Learning, Data Mining, and Signal Processing. 
  2. Implement algorithms and perform experiments on images, text, audio, and other modalities. 
  3. Demonstrate an understanding of modern machine learning algorithms such as tree-based models, boosting, and deep neural networks.
  4. Gain working knowledge of supervised and unsupervised techniques and their relevant trade-offs in practical usage.

Preparation

Math. Students need to be comfortable with multivariable calculus, primarily integration and differentiation in multiple dimensions. The course will also require a basic understanding of probability at the level of an introductory undergraduate course. Teaching staff will hold review sessions to cover background material.

Programming. Students should have basic programming ability. The course will use Python and related data science libraries, including NumPy, SciPy, scikit-learn, and TensorFlow or PyTorch. Familiarity with these libraries is preferred, but we expect students to be able to learn parts of these libraries during the course. Teaching staff will hold review sessions to cover background material.

Textbooks and Other Materials

  • Textbooks (Optional) 
    • T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition), Springer-Verlag, 2008. (available for free)
    • K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
    • C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
  • Lecture scribe notes are available on the website
  • A list of probability and linear algebra resources (link)

Class and Laboratory Schedule

Lectures: 2.5 hrs/wk

Recitations: None required.  Optional sessions with graduate or undergraduate TAs

Grading

Component          Description                                                           Weight
Homework 1         Combination of theory and programming questions                       10%
Homework 2         Combination of theory and programming questions                       10%
Homework 3         Combination of theory and programming questions                       10%
Homework 4         Combination of theory and programming questions                       10%
Prelim             A test on the course contents.                                        15%
Project Proposal   Brief description of the planned project, around 300 words.           5%
Project Milestone  Mid-semester progress report on course project, 3-5 pages in length.  15%
Final Project      Final report on the course project, 5 pages in length.                25%
Total Points                                                                             100%

Basis of grade determination: 

Grade  Percent
A+     98-100
A      93-97
A-     90-92
B+     88-89
B      83-87
B-     80-82
C+     78-79
C      73-77
C-     70-72
D      60-69
F      <60
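
As an illustration only, the component weights and grade cutoffs listed above can be combined as in the Python sketch below. This is not an official grading script. The names `WEIGHTS`, `CUTOFFS`, `final_percent`, and `letter_grade` are hypothetical, each component score is assumed to be a percentage from 0 to 100, and each cutoff is treated as a lower bound because the syllabus does not say how fractional totals (for example, 92.5) are handled.

```python
# Weights in percent, copied from the grading breakdown in this syllabus.
WEIGHTS = {
    "Homework 1": 10,
    "Homework 2": 10,
    "Homework 3": 10,
    "Homework 4": 10,
    "Prelim": 15,
    "Project Proposal": 5,
    "Project Milestone": 15,
    "Final Project": 25,
}

# Lower bound of each letter-grade band, highest first.
# Assumption: a fractional total falls into the band whose lower bound it meets.
CUTOFFS = [
    (98, "A+"), (93, "A"), (90, "A-"),
    (88, "B+"), (83, "B"), (80, "B-"),
    (78, "C+"), (73, "C"), (70, "C-"),
    (60, "D"),
]


def final_percent(scores: dict) -> float:
    """Weighted average of component scores, each given as a percentage (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percent: float) -> str:
    """Map a final percentage to a letter grade using the cutoffs above."""
    for lower_bound, letter in CUTOFFS:
        if percent >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical scores for one student.
    example = {
        "Homework 1": 90, "Homework 2": 85, "Homework 3": 95, "Homework 4": 80,
        "Prelim": 88, "Project Proposal": 100, "Project Milestone": 92,
        "Final Project": 90,
    }
    total = final_percent(example)
    print(total, letter_grade(total))
```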

Assignments

Written Assignments: Homework should be written up clearly and succinctly; you may lose points if your answers are unclear or unnecessarily complicated. You are encouraged to use LaTeX to write up your homework, but this is not a requirement. Assignments will be submitted on Gradescope; if you have not been enrolled in this course on Gradescope, you can use the entry code to enroll. You may work in teams of two: make sure to put both of your names on the submission and submit as a team in Gradescope.

Late Submissions: You have 6 late days for assignments and project-related submissions, which you can use at any time during the term without penalty, with a maximum of 2 late days per submission (i.e. you cannot use ≥3 late days for any assignment). No late days may be used for the final project writeup. Once you run out of late days, you will incur a 20% penalty for each extra late day you use. When submitting as a team, late days used are deducted from the remaining quota of every member of the team. Each late submission should be clearly marked as “Late” on the first page. No submission will be accepted ≥3 days after the deadline.
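
The late-day rules above amount to a small accounting procedure. The Python sketch below is one possible reading of them, not an official policy implementation. The names `LateResult` and `apply_late_policy` are hypothetical. It assumes the 20% penalty is a fraction deducted for each late day not covered by a remaining free late day, and it leaves out the final project writeup, which has no late days.

```python
from dataclasses import dataclass

TOTAL_LATE_DAYS = 6           # free late days available over the term
MAX_DAYS_LATE = 2             # submissions 3 or more days late are not accepted
PENALTY_PER_EXTRA_DAY = 0.20  # assumption: fraction deducted per uncovered late day


@dataclass
class LateResult:
    accepted: bool
    free_days_used: int
    penalty: float  # fraction of the grade deducted (0.0 means no penalty)


def apply_late_policy(days_late: int, free_days_left: int) -> LateResult:
    """Apply the late policy to one homework or project submission.

    days_late: whole days past the deadline (0 means on time).
    free_days_left: free late days remaining before this submission.
    """
    if days_late < 0 or free_days_left < 0:
        raise ValueError("days_late and free_days_left must be non-negative")
    if days_late > MAX_DAYS_LATE:
        # Too late: the submission is not accepted and no late days are spent.
        return LateResult(accepted=False, free_days_used=0, penalty=0.0)
    free_days_used = min(days_late, free_days_left)
    extra_days = days_late - free_days_used
    return LateResult(
        accepted=True,
        free_days_used=free_days_used,
        penalty=extra_days * PENALTY_PER_EXTRA_DAY,
    )
```

For a team submission, the same number of free late days would be deducted from each member's remaining quota.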

Course Project

The course project will give the students a chance to explore machine learning in greater detail. Course projects will be done in groups of up to 3 students and can fall into one or more of the following categories:

  • Application of machine learning to a practical problem or a dataset.
  • Improvements to machine learning algorithms.
  • Theoretical analysis of any aspect of machine learning models.

Pick a topic that's meaningful to you and that excites you. For example, if you do PhD research in biology, you can do a project related to a dataset that you work with. If you're in Urban Tech, you can work with a city dataset that you find interesting. You are encouraged to find something on your own, but we are also going to share topic ideas in Canvas, and you should feel free to talk to the teaching team during office hours.

Proposal (Due Sep 30 at 11:59pm ET)

Your proposal should give the title of the project, the project category, the names of your team members, their NetIDs, and a 300-500 word description of what you plan to do. It should contain the following information:

  • Motivation: What problem are you tackling? Is this an application or a theoretical result?
  • Method: What machine learning techniques are you planning to apply or improve upon and how?
  • Experiments: What experiments are you planning to perform (or what theorems do you want to prove)?

The goal of the proposal is to make sure you're on the right track. As long as you follow the above guidelines, you should do well.

Please submit the proposal via Gradescope and make sure to submit as a team.

Milestone (Due Nov 11 at 11:59pm ET)

The milestone submission should describe what you've accomplished so far and briefly say what else you plan to do. The format should be the same as that of the final project, with a maximum length of 3 pages (excluding references). The goal is to make sure that you are on track to finish the final project. The milestone should cover the following points:

  • Motivation: What problem are you tackling? Is this an application or a theoretical result?
  • Method: What machine learning techniques are you planning to apply or improve upon and how?
  • Preliminary experiments: Describe the experiments that you've run, the outcomes, and any error analysis that you've done. You should have tried at least one baseline.
  • Future work: What else do you plan to do?

The goal of the milestone is to make sure you're on the right track. As long as you follow the above guidelines, you should do well.

Please submit the milestone via Gradescope and make sure to submit as a team.

Final Writeup (Due Dec 13 at 11:59pm ET -- no late days!)

The final writeup should describe all the work you did for your course project and summarize the main results. You can think of it as a technical report that presents your findings to a general machine learning audience.

The style and format of the writeup should be similar to that of a research paper. The maximum length is 5 pages, excluding references. We provide a LaTeX template adapted from the NeurIPS style files for your reference (go to Files->Project).

There are no strict requirements on the structure of the final writeup, but one way to structure it would be to include the following sections, which are fairly standard for a research paper.

  • Abstract: Summarize the problem, novel contributions, and results in one paragraph.
  • Introduction: Provide motivation for the problem and expand upon the overview in the abstract.
  • Background: Briefly summarize the background knowledge needed to understand the work.
  • Method: Describe the methods that will be used or implemented in the paper.
  • Theoretical analysis: If you are doing a theory project, describe your theoretical results here.
  • Experimental analysis: Describe in detail your experiments.
  • Discussion and Prior Work: Discuss the key takeaways from your experiments. Put your results in the context of previous work.
  • Conclusion: You may summarize the paper or talk about open problems and open directions.

Regardless of how the writeup is structured, please make sure to cover the following points.

  • Motivation: What problem are you tackling? Why is it interesting? What type of project will this be (application, method, theory)?
  • Method: What machine learning techniques are you planning to apply or improve upon and how? Make sure to describe them in detail and provide enough context for the reader to understand the methods at least at a high level. Provide any background that is necessary for that.
  • Experiments: Describe the experiments that you've run, the outcomes, and any error analysis that you've done. Make sure that the setup is described in enough detail for someone else to reproduce your results. Also, if you have an experimental project, make sure to provide a detailed experimental analysis. Things you should consider including are: train/test performance, learning curves, model samples, error analyses, ablation analyses, etc. Most projects should also include baselines.
  • Theory: If doing a theory project, state your results formally as theorems. Make sure that all the symbols are defined. The best presentations of theoretical results also explain the results in plain language and convey the intuition behind them.
  • Context: Explain how you build upon previous work and how your results compare to what has been done previously.

Writeups will be evaluated on clarity of presentation, adherence to the above guidelines, the significance of the project (does it explore a toy dataset or a real problem?), and the technical quality of the work (the depth of the experimental or theoretical analyses, whether the approach makes sense technically, whether the implemented algorithms are reasonable and studied in enough detail, etc.).

Please submit the writeup via Gradescope and make sure to submit as a team.

Collaboration Policy and Honor Code

You are free to form study groups and discuss homeworks and projects. However, you must write up homeworks and code from scratch independently, without referring to any notes from the joint session. In preparing your answers, you should not copy, refer to, or look at solutions from previous years’ homeworks. It is an honor code violation to intentionally refer to a previous year’s solutions, either official or written up by another student. Anybody violating the honor code will be referred to the Office of Judicial Affairs.

Contents

Date Weekday No. Topic Readings
8/26/2021 Thursday 1 Introduction: Supervised, unsupervised, reinforcement learning, TBA
8/31/2021 Tuesday 2 [SL] Introduction. Models, features, objectives, optimization.
9/2/2021 Thursday 3 [SL] Regression. Linear Regression. OLS.
9/7/2021 Tuesday 4 [SL] Classification. Logistic Regression and Max Likelihood
9/9/2021 Thursday 5 [SL] Why Does SL Work? Data distribution, over/under fitting, regularization
9/14/2021 Tuesday 6 [SL] Generative models. Gaussian Discriminant Analysis
9/16/2021 Thursday 7 [SL] Naive Bayes. Bag of words, generative vs. discriminative methods.
9/21/2021 Tuesday 8 [UL] Introduction to Unsupervised Learning. K-Means
9/23/2021 Thursday 9 [UL] Density Estimation. Histogram Method, K-Nearest Neighbors (SL)
9/28/2021 Tuesday 10 [UL] Clustering. Gaussian mixture models, expectation-maximization.
9/30/2021 Thursday 11 [UL] Dimensionality Reduction. PCA.
10/5/2021 Tuesday 12 [SL] SVMs. Margins, max-margin classifiers, hinge loss, optimization
10/7/2021 Thursday 13 [SL] Dual Formulation of SVMs. Lagrange duality, SVMs duals, SMO
10/12/2021 Tuesday Fall Break - No class
10/14/2021 Thursday 14 Prelim Review
10/19/2021 Tuesday 15 Prelim In Class
10/21/2021 Thursday 16 [SL] Kernels. Kernel Trick, Example Kernels, Mercer's theorem
10/26/2021 Tuesday 17 [SL] Decision Trees. Bagging, ensembling, CART.
10/28/2021 Thursday 18 [SL] Boosting. Adaboost, gradient boosting.
11/2/2021 Tuesday 19 [SL] Neural Networks. Perceptrons, multi-layer neural networks.
11/4/2021 Thursday 20 [SL] Deep Learning. Convolutional neural networks and applications.
11/9/2021 Tuesday 21 Bonus Lecture: Advanced Deep Learning Topics
11/11/2021 Thursday 22 Bonus Lecture: Advanced Deep Learning Topics
11/16/2021 Tuesday 23 Applying Machine Learning: Evaluation. Dataset splits; cross-validation, performance measures
11/18/2021 Thursday 24 Applying Machine Learning: Diagnosis. Model iteration process, bias/variance tradeoff, baselines, learning curves
11/23/2021 Tuesday 25 Applying Machine Learning: Diagnosis. Error analysis, data integrity, human-level performance
11/26/2021 Friday Thanksgiving Break - No class
11/30/2021 Tuesday 26 Understanding Machine Learning: Bias/variance tradeoff. Empirical risk minimization. Learning theory.
12/2/2021 Thursday 27 Final Lecture. Overview of the course. Taxonomy of ML algorithms. Research directions.
12/7/2021 Tuesday 28 Probably No Lecture to Work on Projects
Final Projects Due 12/13 (no late days!)

 

Academic Integrity:

Each student in this course is expected to abide by the Cornell University Code of Academic Integrity.  Any work submitted by a student in this course for academic credit will be the student's own work. The policy can be found on the university’s website here: https://theuniversityfaculty.cornell.edu/academic-integrity/

You are encouraged to study together and to discuss information and concepts covered in lecture and the sections with other students. You can give "consulting" help to or receive "consulting" help from such students. However, this permissible cooperation should never involve one student having possession of a copy of all or part of work done by someone else, in the form of an e-mail, an e-mail attachment file, a diskette, or a hard copy. 

Should copying occur, both the student who copied work from another student and the student who gave material to be copied will automatically receive a zero for the assignment. The penalty for violating this Code can also be extended to include failure of the course and University disciplinary action.

During examinations, you must do your own work. Talking or discussion is not permitted during the examinations, nor may you compare papers, copy from others, or collaborate in any way. Any collaborative behavior during the examinations will result in failure of the exam, and may lead to failure of the course and University disciplinary action.

  • Academic Misconduct. A faculty member may impose a grade penalty for any misconduct in the classroom or examination room. Examples of academic misconduct include, but are not limited to, talking during an exam, bringing unauthorized materials into the exam room, and disruptive behavior in the classroom.

Students with Disabilities

Your access in this course is important. Please give me (or the TA or the Course Coordinator) your Student Disability Services (SDS) accommodation letter early in the semester so that we have adequate time to arrange your approved academic accommodations. If you need an immediate accommodation for equal access, please speak with me after class or send an email message to me and/or SDS at sds_cu@cornell.edu. If the need arises for additional accommodations during the semester, please contact SDS. You may also feel free to speak with Student Services at Cornell Tech, who will connect you with the university SDS office.

Religious Observances

Cornell University is committed to supporting students who wish to practice their religious beliefs. Students are advised to discuss religious absences with their instructors well in advance of the religious holiday so that arrangements for making up work can be resolved before the absence.

Cornell Tech Cares: The Cornell Tech community is a diverse and vibrant group of students, faculty, and staff. We take our responsibility to look out for one another seriously. As members of this community, your openness and proactive communication will allow us all to better care for students and respond to their needs, whether they be interpersonal or academic. Please help us continue to build and strengthen our community by reaching out if you are having an issue or are concerned about a fellow student. Contact studentwellness@tech.cornell.edu with concerns and we will make sure to care for one another. In the event of an emergency, when safe to do so, please call 911 and Cornell Tech Safety & Security at 646-971-3611 (this number is also located on the back of your Cornell ID).
