10-315, Spring 2023

Introduction to Machine Learning (SCS majors)

Overview

Key Information

Monday + Wednesday, 12:30 pm - 1:50 pm, Hall of Arts 160

Sections A and B, Friday 1:00 pm - 1:50 pm, Hall of Arts 160, see Recitation

Section C, Friday 11:00 am - 11:50 am, WEH 2302

Section D, Friday 9:00 am - 9:50 am, GHC 4102

Section E, Friday 10:00 am - 10:50 am, DH 1112

Section F, Friday 11:00 am - 11:50 am, DH 1112

Meher Mankikar, Shreeya Khurana, Alex Xu, Deep Patel, Ruthie Lin, Saumya Gandhi, Saloni Parekh, Medha Palavalli, Devanshi Gupta, Arya Shah, see the 315 Staff page

Grades will be collected in Canvas.
Midterm 1 20%, Midterm 2 20%, Written/Programming Homework 40%, Pre-Lecture Reading Checkpoints 5%, Online Homework 5%, Participation 5%, Mini-project 5%

There is no required textbook for this course. Any recommended readings will come from sources freely available online.

We will use Piazza for questions and any course announcements.

Students will turn in their homework electronically using Gradescope.

Machine Learning is concerned with computer programs that automatically improve their performance through experience (e.g., programs that learn to recognize human faces, recommend music and movies, and drive autonomous robots). This course covers the core concepts, theory, algorithms and applications of machine learning.

Learning Objectives

After completing the course, students should be able to:

  • Select and apply an appropriate supervised learning algorithm for classification and regression problems (e.g., linear regression, logistic regression, ridge regression, nonparametric kernel regression, neural networks, naive Bayes, support vector machines).
  • Recognize different types of unsupervised learning problems, and select and apply appropriate algorithms (e.g., k-means clustering, Gaussian mixture models, linear and nonlinear dimensionality reduction).
  • Work with probability (Bayes rule, conditioning, expectations, independence), linear algebra (vector and matrix operations, eigenvectors, SVD), and calculus (gradients, Jacobians) to derive machine learning methods such as linear regression, naive Bayes, and principal component analysis.
  • Understand machine learning principles such as model selection, overfitting, and underfitting, and techniques such as cross-validation and regularization.
  • Implement machine learning algorithms such as logistic regression via stochastic gradient descent, linear regression, or k-means clustering.
  • Run appropriate supervised and unsupervised learning algorithms on real and synthetic data sets and interpret the results.

Levels

This course is designed for SCS undergraduate majors. It covers many of the same topics as other introductory machine learning courses, such as 10-301/10-601 and 10-701. This 10-315 course and 15-281 AI Representation and Problem Solving are designed to complement each other and provide both breadth and depth across AI and ML topics. Contact the instructor if you are concerned about which machine learning course is appropriate for you.

Prerequisites

The prerequisites for this course are:

  • 15-122: Principles of Imperative Computation
  • 36-225 or 36-218 or 36-217 or 15-259 or 15-359 or 21-325 or 36-219: Probability
  • 15-151 or 21-127 or 21-128: Mathematical Foundations of Computer Science / Concepts of Mathematics
  • 21-241 or 21-240 or 21-242: Linear Algebra

Although Python is not explicitly a prerequisite, we will be programming exclusively in Python. Please see the instructor if you are unsure whether your background is suitable for the course.

Office Hours

Pat would very much like to help you all as much as possible. In addition to standing office hours, he often has "OH" (or "Open") appointment slots on his office hours appointment calendar. If there are no available OH or appointment slots that meet your needs, please contact Pat via a private post on Piazza with a list of times that work for you to meet.

Schedule

Subject to change

Textbooks:

Bishop, Christopher. Pattern Recognition and Machine Learning, available online, (optional)

Daumé III, Hal. A Course in Machine Learning, available online

(DL) Goodfellow, Ian, Yoshua Bengio, Aaron Courville. Deep Learning, available online, (optional)

(MML) Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine Learning, available online

Mitchell, Tom. Machine Learning, available online

Murphy, Kevin P. Machine Learning: A Probabilistic Perspective, available online, (optional)

(KMPA) Shawe-Taylor, John, Nello Cristianini. Kernel Methods for Pattern Analysis, available online, (optional)


Dates Topic Lecture Materials Pre-Reading Reading (optional)
1/16 Mon No class: MLK Day MML 2.1-3, 2.5, 2.6 and 3.1, 3.2.1, 3.3
1/18 Wed Introduction 10315 Notation Guide.pdf
pptx (inked) pdf (inked)
Mitchell 1.1-1.2
1/23 Mon Decision Trees pptx (inked) pdf (inked) Daumé 1
1/25 Wed Decision Trees
K-Nearest Neighbor and Model Selection
pptx (inked) pdf (inked)
pptx (inked) pdf (inked)
Daumé 2, Daumé 3
Entropy, Cross-Entropy video, A. Géron
1/30 Mon K-NN and Model Selection (cont.) See previous lecture slides
Pre-reading: ipynb or pdf
Checkpoint 1 due 1/30 Mon morning, 10 am
MML 8.2.4, 8.3.3
2/1 Wed Optimization and Linear Regression pptx (inked) pdf (inked)
regression interactive.ipynb
regression blind interactive.ipynb
MML 8.2-8.2.2
MML 5.2-5.5
2/6 Mon Optimization and Linear Regression (cont.)
Feature Engineering
See previous lecture slides

pptx (inked) pdf (inked)
2/8 Wed Logistic Regression pptx (inked) pdf (inked)
linear logistic.ipynb
quadratic logistic.ipynb
Multiclass logistic Desmos
Bishop 4.1.3, 4.3.2, 4.3.4
2/13 Mon Neural Networks pptx (inked) pdf (inked)
Convex functions Desmos
MML 5.6
DL 6
2/15 Wed Neural Networks See previous lecture slides
Universal network Desmos
2/20 Mon Regularization pptx (inked) pdf (inked) DL 7.1,7.8
Bishop 3.1.4
2/22 Wed MLE and Probabilistic Modeling pptx (inked) pdf (inked)
MLE notes (draft; last section added soon!): pdf
MML 9
Bishop 1.2.4-5, 3.1.1-2
2/27 Mon MLE and Probabilistic Modeling See previous lecture slides
3/1 Wed EXAM 1
In-class
Learning objectives: pdf
Practice problems: pdf (sol)
3/6 Mon No class: Spring Break
3/8 Wed No class: Spring Break
3/13 Mon Neural Net Applications In-progress: pptx pdf DL 9
3/15 Wed Neural Net Applications (cont.) See previous lecture slides
3/20 Mon MAP pptx (inked) pdf (inked) Pre-reading: MLE notes (draft; last section added soon!) pdf Mitchell MLE and MAP
3/22 Wed Probabilistic Generative Models and Naive Bayes pptx (inked) pdf (inked) discriminant analysis.ipynb
Mitchell Generative and Discriminative Classifiers
Murphy 3.5, 4.2, 8.6
3/27 Mon Probabilistic Generative Models and Naive Bayes (cont.) See previous lecture slides
3/29 Wed Dimensionality Reduction: PCA, Autoencoders, Feature Learning pptx (inked) pdf (inked) Bishop 12.1, Murphy 12.2
4/3 Mon Recommender Systems pptx (inked) pdf (inked) Matrix Factorization Techniques for Recommender Systems. Koren, Bell, and Volinsky (2009)
4/5 Wed Clustering, K-means pptx (inked) pdf (inked) Bishop 9.1, Murphy 25.5
4/10 Mon Gaussian Mixture Models, EM Algorithm
pptx (inked) pdf (inked) Bishop 9.2
4/12 Wed Nonparametric Regression, Kernels pptx (inked) pdf (inked) KMPA 7.3, 7.3.2
4/17 Mon SVMs, Duality
4/19 Wed Learning Theory, PAC pdf (inked)
4/24 Mon Learning Theory, VC Dimensions
4/26 Wed EXAM 2
In-class
Learning objectives: pdf
Practice problems: pdf (sol)

Recitation

Recitation starts the first week of class, Friday, Jan. 20. Recitation attendance is recommended to help solidify weekly course topics. That being said, the recitation materials published below are required content and are in-scope for midterms 1 and 2. Students frequently say that the recitations are one of the most important aspects of the course.

Recitation section assignments will be locked down after the third week. Until then, you may try attending different recitation sections to find the best fit for you. If a recitation section is overcrowded, priority goes to students who are officially registered for that section in SIO. The process to select your final recitation assignment will be announced on Piazza as we get closer to Recitation 4.

Recitations will be on Fridays in the following individual recitation sections:


Section | Time | Location | TAs | Resources
A+B | Friday 1:00 pm - 1:50 pm | Hall of Arts 160 | Meher, Deep | Drive folder
C | Friday 11:00 am - 11:50 am | WEH 2302 | Shreeya, Medha | Drive folder
D | Friday 9:00 am - 9:50 am | GHC 4102 | Saloni, Alex | Drive folder
E | Friday 10:00 am - 10:50 am | DH 1112 | Devanshi, Arya | Drive folder
F | Friday 11:00 am - 11:50 am | DH 1112 | Saumya, Ruthie | Drive folder


Dates Recitation Handout/Code
1/20 Fri Recitation 1: NumPy Reference: NumPy_Tutorial_from_11-785.ipynb
visualizing_data_1.ipynb
visualizing_data_2.ipynb
visualizing_data_3.ipynb
indexing_trick.ipynb
messing_with_mnist.ipynb
1/27 Fri Recitation 2: Decision Trees and K-NN pdf (solution) and kNN.ipynb
2/3 Fri Recitation 3: Matrix Calculus and Linear Regression pdf (solution)
2/10 Fri Recitation 4: Logistic Regression pdf (solution)
2/17 Fri Recitation 5: Neural Networks pdf (solution)
2/24 Fri Recitation 6: Regularization, Prob/Stat/MLE pdf (solution)
3/3 Fri No recitation
3/10 Fri No recitation
3/17 Fri Recitation 7: PyTorch, Convnets, MAP PyTorch Overview Slides
PyTorch Tutorial Notebook.ipynb
Worksheet: pdf (solution)
3/24 Fri Recitation 8: Generative Models pdf (solution)
discriminant analysis.ipynb
3/31 Fri Recitation 9: Generative+MAP, PCA pdf (solution)
4/7 Fri Recitation 10: Recommender Systems, Clustering pdf (solution)
4/14 Fri Recitation 10.5: Kernel Regression (no in-person recitation) pdf (solution)
4/21 Fri Recitation 11 pdf (solution)
4/28 Fri Recitation 12

Exams

The course includes two midterm exams. The midterms will be 12:30-1:50 pm on Mar. 1 and Apr. 26. Both will take place in class. Plan any travel around exams, as exams cannot be rescheduled.

Mini-project

A mini-project is due during the final exam period. This will be an opportunity to work with a team and apply machine learning concepts from class to a project that is more customized to your interests. More details about the mini-project and its deadlines will be announced later in the semester.

Assignments

There will be approximately six homework assignments that will have written and programming components and approximately five online assignments (subject to change). Written and online components will involve working through algorithms presented in the class, deriving and proving mathematical results, and critically analyzing material presented in class. Programming assignments will involve writing code in Python to implement various algorithms.

For any assignments that aren't released yet, the dates below are tentative and subject to change.

Assignment due dates (Tentative)

Assignment | Link (if released) | Due Date
HW 1 (programming) | hw1.ipynb | 1/29 Sun, 11:59 pm
HW 2 (online) | Gradescope | 2/4 Sat, 11:59 pm
HW 3 (written/programming) | hw3_blank.pdf, hw3_tex.zip, hw3.ipynb | 2/9 Thu, 11:59 pm
HW 4 (online) | Gradescope | 2/16 Thu, 11:59 pm
HW 5 (written/programming) | hw5_blank.pdf, hw5_tex.zip, hw5.ipynb | 2/23 Thu, 11:59 pm
HW 6 (online) | Gradescope | 3/24 Fri, 11:59 pm
HW 7 (written/programming) | hw7_blank.pdf, hw7_tex.zip, hw7.ipynb | 4/1 Sat, 11:59 pm
HW 8 (online) | Gradescope | 4/9 Sun, 11:59 pm
HW 9 (online) | Gradescope | 4/17 Mon, 11:59 pm
HW 10 (written/programming) | hw10_blank.pdf, hw10_tex.zip, hw10.ipynb | 4/22 Sat, 11:59 pm

Policies

Grading

Grades will ultimately be collected and reported in Canvas.

Final scores will be composed of:

  • 20% Midterm Exam 1
  • 20% Midterm Exam 2
  • 40% Written/Programming homework
  • 5% Online homework
  • 5% Participation
  • 5% Mini-project
  • 5% Pre-Lecture Reading Checkpoints
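
As a rough illustration of how the weights above combine, the following Python sketch computes a weighted total. The dictionary keys and the function name `final_score` are hypothetical, and the sketch assumes each component has already been normalized to a fraction between 0 and 1. The actual computation performed in Canvas may differ.

```python
# Weights copied from the list above. Key names are illustrative only.
WEIGHTS = {
    "midterm1": 0.20,
    "midterm2": 0.20,
    "written_programming_hw": 0.40,
    "online_hw": 0.05,
    "participation": 0.05,
    "mini_project": 0.05,
    "reading_checkpoints": 0.05,
}


def final_score(component_scores):
    """Return the weighted course score as a fraction in [0, 1].

    component_scores maps every key in WEIGHTS to a fraction in [0, 1]
    (assumption: each component is normalized before weighting).
    """
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Example: 80% on both midterms and full credit everywhere else.
example = {name: 1.0 for name in WEIGHTS}
example["midterm1"] = 0.8
example["midterm2"] = 0.8
print(f"{final_score(example):.2%}")
```

Because the two midterms together carry 40% of the score, losing 20% on both lowers the total by about 8 percentage points in this example.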

Final Grade

This class is not curved. However, we convert final course scores to letter grades based on grade boundaries that are determined at the end of the semester. What follows is a rough guide to how course grades will be established, not a precise formula; we will fine-tune cutoffs and other details as we see fit after the end of the course. This is meant to help you set expectations and take action if your trajectory in the class does not take you to the grade you are hoping for. So, here is a rough heuristic for the correspondence between total scores and final grades:

  • A: above 90%
  • B: 80-90%
  • C: 70-80%
  • D: 60-70%

This heuristic assumes that the makeup of a student's grade is not wildly anomalous: exceptionally low overall scores on exams, programming assignments, or written assignments will be treated on a case-by-case basis.

Precise grade cutoffs will not be discussed at any point during or after the semester. For students very close to grade boundaries, instructors may, at their discretion, consider participation in lecture and recitation, exam performance, and overall grade trends when assigning the final grade.

Participation

In class, we will use a series of polls as part of an active learning technique called Peer Instruction. Your participation grade will be based on the percentage of these in-class poll questions answered:
  • 5% for 80% or greater poll participation
  • 3% for 70%
  • 1% for 60%
  • Correctness of in-class polling responses will not be taken into account for participation grades.
  • If a poll is duplicated with the same question (e.g. before and after discussing with your neighbor), you should answer all of the duplicated versions as well, as they will be counted as separate polls.
  • If you have systemic/repeated technical issues, please let us know as soon as possible, so we can resolve the situation.
  • Missing polls due to absences (e.g., brief illness) from lecture or due to technical difficulties is expected occasionally, and this is why you only need to answer >= 80% of the polls to get full credit.
It is against the course academic integrity policy to answer in-class polls when you are not present in lecture. Violations of this policy will be reported as an academic integrity violation. Information about academic integrity at CMU may be found at https://www.cmu.edu/academic-integrity.
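
The participation tiers above can be sketched as a small Python function. The function name `participation_credit` is hypothetical. The sketch assumes that "3% for 70%" means at least 70% but under 80%, that "1% for 60%" means at least 60% but under 70%, and that anything below 60% earns no credit; the syllabus does not state these boundaries explicitly.

```python
def participation_credit(polls_answered, polls_total):
    """Return participation percentage points earned (out of 5).

    Uses integer arithmetic so that exact thresholds such as 80%
    are not affected by floating-point rounding.
    """
    if polls_total <= 0:
        raise ValueError("polls_total must be positive")
    if not 0 <= polls_answered <= polls_total:
        raise ValueError("polls_answered must be between 0 and polls_total")

    answered_x100 = polls_answered * 100
    if answered_x100 >= 80 * polls_total:
        return 5
    if answered_x100 >= 70 * polls_total:  # assumed tier: 70% up to 80%
        return 3
    if answered_x100 >= 60 * polls_total:  # assumed tier: 60% up to 70%
        return 1
    return 0  # assumption: no credit below 60%


print(participation_credit(40, 50))  # 40 of 50 polls is exactly 80%
```

Correctness of poll answers does not appear in the function, matching the policy that only participation is counted.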

Late Policies, Extensions, and Exceptions

Participation

  • Missing polls due to absences (e.g., brief illness) from lecture or due to technical difficulties is expected occasionally, and this is why you only need to answer >= 80% of the polls to get full credit.
  • If you must miss many lectures due to circumstances outside of your control (e.g., if you have an extended illness) please e-mail Joshmin, joshminr@andrew.cmu.edu, prior to lecture.

Pre-reading checkpoints

Pre-reading checkpoints don't have any extensions or late days. However, the lowest two checkpoints will be dropped when computing your semester score. Reasoning: We want to make sure that everyone is able to complete the pre-reading prior to lecture, so we can build on that knowledge in class. Minor illness and other minor disruptive events outside of your control happen occasionally, which is why we drop the lowest two scores. See below for information on rare exceptions.
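
To make the drop-the-lowest-two rule concrete, here is a minimal Python sketch. The function name `checkpoint_average` is hypothetical, and the sketch assumes that all checkpoints are weighted equally and that the remaining scores are simply averaged, which the syllabus does not specify.

```python
def checkpoint_average(scores, num_dropped=2):
    """Average the checkpoint scores after dropping the lowest num_dropped.

    scores is a list of fractions in [0, 1], one per checkpoint.
    Assumption: equal weighting of all remaining checkpoints.
    """
    if len(scores) <= num_dropped:
        raise ValueError("need more scores than the number dropped")
    kept = sorted(scores)[num_dropped:]  # discard the lowest scores
    return sum(kept) / len(kept)


# A missed checkpoint (0) and a weak one (0.5) are both dropped here.
print(checkpoint_average([1.0, 0.0, 1.0, 0.5, 1.0]))
```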

Written/programming homework and online homework

You have a pool of 6 slip days across all written/programming and online assignment types:

  • Use up to two per assignment
  • Written and programming assignments with the same homework number are considered the same assignment; so e.g., if you turn in both programming and written components within 24 hours after the due date, you will use one slip day, not two.
  • You may use these at your discretion, but they are intended for minor illness and other disruptive events outside of your control, and not for poor time management.
  • No need to inform us that you are using a slip day; just submit it to Gradescope during the slip day.
  • You are responsible for keeping track of your own slip days. Gradescope will not enforce the total number of slip days.
  • Homework submitted after these two slip days or submitted by a student without any slip days remaining will be given a score of 0.
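
Since Gradescope does not track the slip-day pool for you, a small Python sketch like the one below may help with bookkeeping. The function names are hypothetical. The sketch assumes that each started 24-hour period after the deadline costs one slip day, and that for a homework with both written and programming components you pass in the later of the two submission times.

```python
import math
from datetime import datetime

SLIP_DAY_POOL = 6                 # total slip days for the semester
MAX_SLIP_DAYS_PER_ASSIGNMENT = 2  # at most two per assignment
SECONDS_PER_DAY = 24 * 60 * 60


def slip_days_needed(due, submitted):
    """Whole slip days consumed by a submission (0 if on time)."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    # Assumption: every started 24-hour period counts as one slip day.
    return math.ceil(seconds_late / SECONDS_PER_DAY)


def apply_slip_days(remaining, due, submitted):
    """Return (accepted, remaining_after) for one assignment.

    A submission is rejected (score of 0 under the policy) if it needs
    more than the per-assignment limit or more days than remain.
    """
    needed = slip_days_needed(due, submitted)
    if needed > MAX_SLIP_DAYS_PER_ASSIGNMENT or needed > remaining:
        return False, remaining
    return True, remaining - needed


due = datetime(2023, 2, 9, 23, 59)
late = datetime(2023, 2, 10, 12, 0)  # about 12 hours late
print(apply_slip_days(SLIP_DAY_POOL, due, late))
```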

Exceptions and extensions

Aside from slip days, dropping the lowest checkpoints, and the 80% threshold for participation, there will be no extensions on assignments in general. If you think you really really need an extension on a particular assignment, e-mail Joshmin, joshminr@andrew.cmu.edu, as soon as possible and before the deadline. Please be aware that extensions are entirely discretionary and will be granted only in exceptional circumstances outside of your control (e.g., due to severe illness or major personal/family emergencies, but not for competitions, club-related events, or interviews). The instructors will require confirmation from University Health Services or your academic advisor, as appropriate.
We certainly understand that unfortunate things happen in life. However, not all unfortunate circumstances are valid reasons for an extension. Nearly all situations that make you run late on an assignment can be avoided with proper planning, often just by starting early. Here are some examples:

  • I have so many deadlines this week: you know your deadlines ahead of time - plan accordingly.
  • It's a minute before the deadline and the network is down: you always have multiple submissions - it's not a good idea to wait for the deadline for your first submission.
  • My computer crashed and I lost everything: Use Google Drive, Dropbox, or a similar system to do real-time backup - recover your files and finish your homework from a cluster machine or borrowed computer.

Collaboration and Academic Integrity Policies

Collaboration

  • The purpose of student collaboration is to facilitate learning, not to circumvent it. Studying the material in groups is strongly encouraged. You are also allowed to seek help from other students in understanding the material needed to solve a particular homework problem, provided any written notes (including code) are taken on an impermanent surface (e.g. whiteboard, chalkboard), and provided learning is facilitated, not circumvented. The actual solution must be done by each student alone.
  • A good method to follow when collaborating is to meet with your peers, discuss ideas at a high level, but do not copy down any notes from each other or from a white board. Any scratch work done at this time should be your own only. Before writing the assignment solutions, you should make sure that you are doing this without anyone else present, putting all notes away, closing all tabs on your computer, and writing it completely by yourself with no other resources.
  • You may NOT view, share, or communicate about any artifact that will be submitted as part of an assignment. Example artifacts include, but are not limited to: code, pseudocode, diagrams, and text.
  • You may look at another student's code output and discuss it at a conceptual level, as long as it is not output that appears directly in the homework submission.
  • You may look at another student's code error messages and discuss what the error means at a conceptual level. However, you may NOT give specific instructions to fix the error.
  • All work that you present must be your own. Auto-generated code, for example, is not acceptable.
  • Any use of external sources of code or algorithms must be approved by the instructor before you submit the work. For example, you must get instructor approval before using an algorithm you found online for implementing an optimization function in a programming assignment.
  • The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved. Specifically, each assignment solution must include answering the following questions:
    1. Did you receive any help whatsoever from anyone in solving this assignment? Yes / No.
      • If you answered ‘yes’, give full details: ____________
      • (e.g. “Jane Doe explained to me what is asked in Question 3.4”)
    2. Did you give any help whatsoever to anyone in solving this assignment? Yes / No.
      • If you answered ‘yes’, give full details: _____________
      • (e.g. “I pointed Full Name to section 2.3 since they didn’t know how to proceed with Question 2”)
    3. Did you find or come across code that implements any part of this assignment? Yes / No. (See the policy on “found code” below)
      • If you answered ‘yes’, give full details: _____________
      • (book & page, URL & location within the page, etc.).
  • If you gave help after turning in your own assignment and/or after answering the questions above, you must update your answers before the assignment’s deadline, if necessary by emailing Joshmin.
  • Collaboration without full disclosure will be handled severely, in compliance with CMU’s Policy on Academic Integrity.

Policy Regarding “Found Code”

You are encouraged to read books and other instructional materials, both online and offline, to help you understand the concepts and algorithms taught in class. These materials may contain example code or pseudo code, which may help you better understand an algorithm or an implementation detail. However, when you implement your own solution to an assignment, you must put all materials aside, and write your code completely on your own, starting “from scratch”. Specifically, you may not use any code you found or came across. If you find or come across code that implements any part of your assignment, you must disclose this fact in your collaboration statement.

Duty to Protect One’s Work

Students are responsible for pro-actively protecting their work from copying and misuse by other students. If a student’s work is copied by another student, the original author is also considered to be at fault and in gross violation of the course policies. It does not matter whether the author allowed the work to be copied or was merely negligent in preventing it from being copied. When overlapping work is submitted by different students, both students will be punished.

Do not post your solutions publicly, either during the course or afterwards.

Penalties for Violations of Course Policies

Violations of these policies will be reported as an academic integrity violation and will also result in a -100% score on the associated assignment/exam. Information about academic integrity at CMU may be found at https://www.cmu.edu/academic-integrity. Please contact the instructor if you ever have any questions regarding academic integrity or these collaboration policies.

(The above policies are adapted from 10-601 Fall 2018 and 10-301/601 Spring 2023 course policies.)

Accommodations for Students with Disabilities

If you have a disability and have an accommodations letter from the Disability Resources office, we encourage you to discuss your accommodations and needs with us as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to visit their website.

Statement of Support for Students’ Health & Well-being

Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, getting enough sleep, and taking some time to relax. This will help you achieve your goals and cope with stress.
All of us benefit from support during times of struggle. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is almost always helpful.
If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.
If you have questions about this or your coursework, please let us know. Thank you, and have a great semester.