Course Syllabus
Machine learning is increasingly driven by advances in the underlying hardware and software systems. This course will focus on the challenges inherent to engineering machine learning systems to be correct, robust, and fast. The course walks through the development of a software library for machine learning from scratch, with each assignment requiring students to build models in their own library. Topics will include: tensor languages and auto-differentiation; model debugging, testing, and visualization; fundamentals of GPUs; compression and low-power inference. Guest lectures will cover current topics from ML engineers.
Instructor: Prof. Alexander (Sasha) Rush
OH: Monday 3pm (Bloomberg 368)
TAs:
Jing Nathan Yan <jy858@cornell.edu>,
Junxiong Wang <jw2544@cornell.edu>,
Ahmed AbouElhamayed <afa55@cornell.edu>,
Tauhid Tanjim <tt485@cornell.edu>
TA OH: Wednesday After Class. Zoom before assignments.
Credits: 3
Course Frequency: Fall Term
Times: Mon - Wed 5:55-7:10pm
Location: Bloomberg Auditorium
Ed Discussions: https://canvas.cornell.edu/courses/56371/external_tools/3334?display=borderless
MiniTorch: https://minitorch.github.io/
Survey: https://canvas.cornell.edu/courses/56371/quizzes/122547
Course Structure
CS 5781 is a course designed for students interested in the engineering aspects of ML systems. Instead of surveying different tasks and algorithms in ML, the course will focus on the end-to-end process of implementing, optimizing, and deploying a specific model. By limiting ourselves to a fixed model architecture, we will be able to better examine each aspect of the pipeline leading to final deployment, and examine the trade-offs in training, debugging, testing, and deployment, both at a low level (hardware) and at a high level (user tools). CS 5781 will be less mathematically demanding than other ML courses, although it does require familiarity with matrices and derivatives. On the other hand, it will be significantly more programming intensive. Each assignment will require completing significant programming exercises in Python, leading up to a full implementation of ML systems.
This year the course targets non-linear, dense logistic regression models, roughly what is called “deep learning”. The following are the main units covered. There will be additional sub-units throughout the semester.
- Intro
- Unit 0: Fundamentals
- Unit 1: Autodifferentiation
- Unit 2: Tensors
- Unit 3: Efficiency
- Unit 4: Networks
Grading
Grading for the course is:
50% Modules HWs (for late submissions: 10% docked per day)
25% Course Midterm
10% Course Attendance (quiz completion)
15% Quiz Grades
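The weighting above can be sketched as a short calculation. This is only an illustration: the function names and the sample scores are hypothetical, and the late penalty is assumed to remove 10 percentage points of the assignment's maximum score per day (the syllabus does not say whether the penalty is additive or multiplicative).

```python
# Weights copied from the grading breakdown; they sum to 1.0.
WEIGHTS = {
    "homework": 0.50,
    "midterm": 0.25,
    "attendance": 0.10,
    "quizzes": 0.15,
}


def late_adjusted(score: float, days_late: int, penalty_per_day: float = 0.10) -> float:
    """Apply the late penalty to a score in [0, 1].

    ASSUMPTION: each late day subtracts 10 percentage points,
    and the result is floored at zero.
    """
    return max(0.0, score - penalty_per_day * days_late)


def final_grade(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each expressed in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: homework average of 0.90 submitted two days late.
hw = late_adjusted(0.90, days_late=2)
grade = final_grade(
    {"homework": hw, "midterm": 0.80, "attendance": 1.0, "quizzes": 0.90}
)
print(f"final grade: {grade:.3f}")  # about 0.785 under the assumptions above
```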
Student Outcomes
• Mastery of the key algorithms for training and executing core machine learning methods.
• Understanding of the computational requirements of running these systems.
• Practical ability to debug, optimize, and tune existing models in production environments.
• Skills to develop front-ends to easily interact with and explain predictive systems.
• Understanding how bias can be propagated and magnified by ML systems.
• Facility to compare and contrast different systems along facets such as accuracy, deployment, and robustness.
Preparation
Prerequisites: CS 2110 or equivalent programming experience
Math: Students need to be comfortable with calculus and probability, primarily differentiation and basic discrete distributions. The course does not require proofs or extensive symbolic mathematics.
CS: This course is programming intensive. Students should have strong familiarity with Python and ideally some form of numerical library (e.g. numpy, scipy, scikit-learn, torch, tensorflow). Students should have familiarity with foundational CS concepts such as memory requirements and computational complexity.
Assessment and Deliverables
The assessment structure of MLE is completely problem-set and quiz-based. Throughout the semester there will be 5 problem sets (roughly every two weeks). Students may work in teams, but must submit their own implementations.
The goal of the class is for each student to build their own ML framework from scratch. Each assignment adds one component to the framework, and by the end of the semester students will be able to train ML models efficiently with their own framework.
The problem sets:
* Assignment 0: Testing, Modules, and Visualization
* Assignment 1: Auto-Derivatives and Training
* Assignment 2: All about Tensors
* Midterm Exam
* Assignment 3: Speeding Things Up
* Assignment 4: Building Real Models
FAQ
Q: What resources do I need to complete the class?
A: The course will require you to have a Python development environment set up, ideally on your own machine or on a cloud server. The machine does not need to be powerful, and a more powerful one will not help you do better in the class. Some lectures will use GPUs, and we will use Google Colab for those lectures.
Q: What technologies do I need to know to complete the class?
A: This is a software engineering style course, and so we recommend that you have a strong background in standard tools such as Git and GitHub, Python, and command-line programming.
Q: What math do I need to know to complete the class?
A: This course will require light undergraduate-level calculus and vector manipulation. We will provide resources for reviewing these topics in the homework assignments.
Q: How will the course schedule interact with Project Studio?
A: Three Thursday lectures will be moved to Sunday because of a conflict with Project Studio Maker Days.
Methods of Assessing Student Achievement
The assignments for this class consist primarily of programming assignments that develop ML systems. There will be a few written questions asking students to explain their reasoning and to plan engineering work. These will be graded for accuracy and completeness. Most points, however, will be assigned based on the correctness of the code implementation, as assessed by automated tests, and on the predictive accuracy of the final systems.
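To give a sense of what an automated test of this kind can look like, here is a minimal sketch that compares a hand-written derivative against a central-difference estimate. It is not taken from the course's test suite; every function name, the sample points, and the tolerances are hypothetical choices made for illustration.

```python
import math
from typing import Callable, Iterable


def sigmoid(x: float) -> float:
    """Logistic function, the non-linearity used in logistic regression."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Hand-derived derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f: Callable[[float], float], x: float, eps: float = 1e-6) -> float:
    """Numerical estimate of f'(x) from two nearby function evaluations."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def check_derivative(
    f: Callable[[float], float],
    df: Callable[[float], float],
    points: Iterable[float],
    tol: float = 1e-4,
) -> bool:
    """Return True if df agrees with the numerical estimate at every point."""
    return all(abs(df(x) - central_difference(f, x)) < tol for x in points)


# The analytic derivative should pass; a deliberately wrong one should fail.
ok = check_derivative(sigmoid, sigmoid_grad, [-2.0, 0.0, 0.5, 3.0])
bad = check_derivative(sigmoid, lambda x: 1.0, [0.0])
print(ok, bad)
```

A check like this only samples a handful of points, so it can catch a wrong formula but cannot prove a correct one.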
Exams will consist of multiple choice questions based on code examples and in-class problems, as well as long-form written problems which will take the form of word problems asking students to assess and diagnose real-world systems.
Topics Covered
• Implementation of backpropagation engines
• Low-level interaction with modern GPU hardware
• High-level autodifferentiation in modern libraries
• Core models for vision, text, and recommendations
• Debugging large scale machine learning systems
• Visualization, hyperparameter tuning, and deployment
• Interactions with data processing systems
Academic Integrity
Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work. The policy can be found on the university’s website here: https://theuniversityfaculty.cornell.edu/academic-integrity/.
You are encouraged to study together and to discuss information and concepts covered in lecture and the sections with other students. You can give "consulting" help to or receive "consulting" help from such students. However, this permissible cooperation should never involve one student having possession of a copy of all or part of work done by someone else, in the form of an e-mail, an e-mail attachment file, a diskette, or a hard copy.
Should copying occur, both the student who copied work from another student and the student who gave material to be copied will automatically receive a zero for the assignment. The penalty for violating this Code can also be extended to include failure of the course and University disciplinary action.
During examinations, you must do your own work. Talking or discussion is not permitted during the examinations, nor may you compare papers, copy from others, or collaborate in any way. Any collaborative behavior during the examinations will result in failure of the exam, and may lead to failure of the course and University disciplinary action.
Generative AI
Students will be required to abide by the generative AI policy specified in each assignment release. Use of unauthorized tools or resources will be handled at the discretion of the course staff, with the possibility of a failing or low grade on the assignment.
COVID-19 Guidelines
The course will follow Cornell and Cornell Tech COVID-19 guidelines.
Remote participation is not permitted. Masks are required in class for all in attendance, regardless of vaccination status.
Students with Disabilities
Your access in this course is important. Please give me your Student Disability Services (SDS) accommodation letter early in the semester so that we have adequate time to arrange your approved academic accommodations. If you need an immediate accommodation for equal access, please speak with me after class or send an email message to me and/or SDS at sds_cu@cornell.edu. If the need arises for additional accommodations during the semester, please contact SDS. You may also feel free to speak with Student Services at Cornell Tech who will connect you with the university SDS office.
Religious Observances
Cornell University is committed to supporting students who wish to practice their religious beliefs. Students are advised to discuss religious absences with their instructors well in advance of the religious holiday so that arrangements for making up work can be resolved before the absence.
Statement about our supportive community
The Cornell Tech community is a diverse and vibrant group of students, faculty, and staff. We take our responsibility to look out for one another seriously. As members of this community, your openness and proactive communication will allow us all to better care for students and respond to their needs, whether they be interpersonal or academic. Please help us continue to build and strengthen our community by reaching out if you are having an issue or are concerned about a fellow student. Contact studentwellness@tech.cornell.edu with concerns and we will make sure to care for one another. In the event of an emergency, please call 911 and Cornell Tech Safety Security at 646-971-3611 when it is safe to do so. (This number is also located on the back of your Cornell ID.)