# Introduction to Machine Learning

10-301 + 10-601, Spring 2023
School of Computer Science
Carnegie Mellon University

# Syllabus

### 1. Course Description

Machine Learning is concerned with computer programs that automatically improve their performance through experience (e.g., programs that learn to recognize human faces, recommend music and movies, and drive autonomous robots). This course covers the theory and practical algorithms for machine learning from a variety of perspectives. We cover topics such as decision tree learning, neural networks, statistical learning methods, unsupervised learning and reinforcement learning. The course covers theoretical concepts such as inductive bias, the PAC learning framework, Bayesian learning methods, and Occam’s Razor. Programming assignments include hands-on experiments with various learning algorithms. This course is designed to give a graduate-level student a thorough grounding in the methodologies, technologies, mathematics and algorithms currently needed by people who do research in machine learning.

10-301 and 10-601 are identical. Undergraduates must register for 10-301 and graduate students must register for 10-601.

Learning Outcomes: By the end of the course, students should be able to:

• Implement and analyze existing learning algorithms, including well-studied methods for classification, regression, structured prediction, clustering, and representation learning
• Integrate multiple facets of practical machine learning in a single system: data preprocessing, learning, regularization and model selection
• Describe the formal properties of models and algorithms for learning and explain the practical implications of those results
• Compare and contrast different paradigms for learning (supervised, unsupervised, etc.)
• Design experiments to evaluate and compare different machine learning techniques on real-world problems
• Employ probability, statistics, calculus, linear algebra, and optimization in order to develop new predictive models or learning methods
• Given a description of an ML technique, analyze it to identify (1) the expressive power of the formalism; (2) the inductive bias implicit in the algorithm; (3) the size and complexity of the search space; (4) the computational properties of the algorithm; (5) any guarantees (or lack thereof) regarding termination, convergence, correctness, accuracy or generalization power.

For more details about topics covered, see the Schedule page.

### 2. Prerequisites

Students entering the class are expected to have a pre-existing working knowledge of probability, linear algebra, statistics and algorithms, though the class has been designed to allow students with a strong numerate background to catch up and fully participate. In addition, recitation sessions will be held to review some basic concepts.

1. You need to have, before starting this course, significant experience programming in a general programming language. Specifically, you need to have written from scratch programs consisting of several hundred lines of code. For undergraduate students, this will be satisfied for example by having passed 15-122 (Principles of Imperative Computation) with a grade of ‘C’ or higher, or comparable courses or experience elsewhere.

Note: For each programming assignment, you will be required to use Python. You will be expected to know, or be able to quickly pick up, that programming language.

2. You need to have, before starting this course, basic familiarity with probability and statistics, as can be achieved at CMU by having passed 36-217 (Probability Theory and Random Processes) or 36-225 (Introduction to Probability and Statistics I), or 15-359, or 21-325, or comparable courses elsewhere, with a grade of ‘C’ or higher.

3. You need to have, before starting this course, college-level maturity in discrete mathematics, as can be achieved at CMU by having passed 21-127 (Concepts of Mathematics) or 15-151 (Mathematical Foundations of Computer Science), or comparable courses elsewhere, with a grade of ‘C’ or higher.

You must strictly adhere to these prerequisites! Even if CMU’s registration system does not prevent you from registering for this course, it is still your responsibility to make sure you have all of these prerequisites before you register.

(Adapted from Roni Rosenfeld’s 10-601 Spring 2016 Course Policies.)

### 3. Textbooks

The core content of this course does not exactly follow any one textbook. However, several of the readings will come from the Murphy book (available free online via the library) and Daumé book (only available online). Some of the readings will include new chapters (available as free online PDFs) for the Mitchell book.

The textbook below is a great resource for those hoping to brush up on the prerequisite mathematics background for this course.

### 4. Course Components

The requirements of this course consist of participating in lectures, midterm and final exams, homework assignments, and readings. The grading breakdown is the following:

• 50% Homework Assignments
• 15% Exam 1
• 15% Exam 2
• 15% Exam 3 (during Final Exam week)
• 5% Participation
• On Piazza, the Top Student “Endorsed Answer” Answerers can earn bonus points

• ≥ 97% A+
• ≥ 93% A
• ≥ 90% A-
• ≥ 87% B+
• ≥ 83% B
• ≥ 80% B-
• ≥ 77% C+
• ≥ 73% C
• ≥ 70% C-
• ≥ 67% D+
• ≥ 63% D
• otherwise R

Each individual component (e.g. an exam) may be curved upwards at the end. In addition, the cutoffs above are merely an upper bound; at the end they may be adjusted down. We expect that the number of students that receive A’s (including A+, A, A-) is at least half the number of students that take the midterm exam(s). The number of B’s (including B+, B, B-) will be at least two-thirds the number of A’s.
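
As a rough illustration of how the weights and cutoffs above combine, here is a minimal Python sketch. It assumes every component score is expressed as a percentage, and it ignores curving, bonus points, and any downward adjustment of the cutoffs. The function names and the example scores are hypothetical, not part of any official grade calculator.

```python
# Weights from the grading breakdown above (fractions of the final score).
WEIGHTS = {
    "homework": 0.50,
    "exam1": 0.15,
    "exam2": 0.15,
    "exam3": 0.15,
    "participation": 0.05,
}

# Published cutoffs, highest first; anything below the last cutoff is an R.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"),
]


def course_percentage(scores):
    """Weighted average of component scores, each a percentage in [0, 100]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a course percentage to a letter using the unadjusted cutoffs."""
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "R"


# Hypothetical scores, for illustration only.
example = {"homework": 95, "exam1": 80, "exam2": 85, "exam3": 90, "participation": 100}
total = course_percentage(example)
print(round(total, 2), letter_grade(total))
```

Because the cutoffs may only be adjusted down, a letter computed this way is best read as a lower bound on the final letter grade.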

#### Midterm and Final Exams

Unless otherwise noted, all exams are closed-book.

You are required to attend all the exams. The midterm exam(s) will be given in the evening – not in class. The final exam will be scheduled by the registrar sometime during the official final exams period. Please plan your travel accordingly as we will not be able to accommodate individual travel needs (e.g. by offering the exam early).

If you have an unavoidable conflict with an exam (e.g. an exam in another course), notify us by filling out the “exam conflict” form. These Exam Conflict Forms are announced on Piazza before each exam.

If your conflict is with an exam in another course, please promptly email the following people to let them know of the exam conflict:

• the instructor(s) for this course
• the education associate(s) for this course
• the instructor(s) for the other course

In the case of exam conflicts, we will discuss with the other course and suggest a resolution. One possible resolution is that we will accommodate you and will ask you to fill out the exam conflict form above. Please note that email should ONLY be used to address conflicts with an exam in another course. The definition of a final exam conflict and the standard procedures are described here: https://www.cmu.edu/hub/registrar/exams-and-grading/conflict-guidelines.html

#### Homework

The homeworks are divided into two types: programming and written. The programming assignments will ask you to implement ML algorithms from scratch; they emphasize understanding of real-world applications of ML, building end-to-end systems, and experimental design. The written assignments will focus on core concepts, “on-paper” implementations of classic learning algorithms, derivations, and understanding of theory.

More details are listed on the Coursework page.

LaTeX is a valuable tool for communicating machine learning concepts to others. In order to encourage you to use LaTeX, we will give you 1 bonus point on each homework that you write up entirely in LaTeX. We always release a LaTeX starter template.

#### Participation

##### In-Class Polls

We will be using Google Forms for in-class polls. Here’s how it will work:

1. Sometime before the lecture, we will post a Google Form containing a few questions. In order to access it, you must sign into Google using your Andrew Email – all students are automatically given access to G Suite, which allows such a sign-in. The link to each poll will appear on the Schedule page.
2. You will always be allowed to submit multiple times. So if there are multiple questions during a lecture, you should submit multiple times. You can do so by clicking the “Edit Your Response” link after each submission.
3. If you do not have a smartphone or tablet, please pick up a poll card at the front of class as you enter and hand in a paper copy at the end of class. (Do not submit a paper copy if you have a wireless device as it will create a mountain of paperwork for us.)

Here are some important notes regarding grading of these polls:

• Each question will include a toxic option which if chosen will give you negative points. If you were to answer the polls randomly without learning the toxic option in class, you would receive negative points in expectation.
• If you answer any non-toxic option for each question during class, you will receive full credit. If you answer any non-toxic option for each question after class within 24 hours of the end of the lecture, you will receive partial credit (50% credit).
• Everyone receives 8 “free poll points” – meaning that you can skip up to 8 polls (~25% of lectures) and still get 100% for the in-class polls. As a result, you should never come to us asking for points because, e.g. your dog ate your smartphone. You cannot use more than 3 free polls consecutively! Note that negative points from a toxic option consume multiple free polls: hitting one could easily wipe out 3 or more of your free poll points.

##### Exit Polls

Your submission of our various exit polls will also count towards your participation grade. There will be one exit poll for each homework assignment and one for each exam; these will be released alongside the corresponding homework assignment or the day after the corresponding exam. You will receive full credit for any exit poll filled in within one week of the homework assignment due date or exam date.

##### Practice Problems

Practice Problems are optional. These problems will be released as a PDF. We will include the solutions for them as well. Some of these problems are exam-style and some are homework-style. The latter type are longer in form than we would typically include on an exam, but are still good practice.

#### Lectures

Attendance at lectures is expected except for those with explicit approval to miss lectures due to course conflicts or timezone conflicts. At least one section will be livestreamed and recorded for later viewing.

#### Recitations

Attendance at recitations (Friday sessions) is not required, but strongly encouraged. These sessions will be interactive and focus on problem solving; we strongly encourage you to actively participate. The recitations for at least one section will be livestreamed, and the recording will be available. A problem sheet will usually be released prior to the recitation. If you are unable to attend one or you missed an important detail, feel free to stop by office hours to ask the TAs about the content that was covered. Of course, we also encourage you to exchange notes with your peers.

#### Office Hours

The schedule for Office Hours will always appear on the Google Calendar on the Office Hours page.

Office Hours Locations:

All TA Office Hours will be held in-person in a large lecture hall.

Join the office hours queue and wait for a TA to accept your question.

Instructor office hours will (usually) be held immediately after class in-person just outside the lecture hall.

We encourage you to stick around and ask any questions you have about lecture material, homework problems, exam preparation, course logistics, etc.

TA Office Hours Protocols:

• Join the office hours queue (link above) and enter a detailed description of your question, mentioning (1) a concept and (2) the homework problem number (if applicable). If you don’t put a homework problem number, we’ll assume you have no questions about the homework. If you do not follow this format, your question may be frozen by the TAs (returned to the queue) so that you can correct it.
• When it is your turn, you will be notified on the office hours queue with the line ‘TA [name of a TA] is on the way’. Once you see this message, please find the TA and pose your question.
• The TA will determine whether or not your question would be best addressed publicly (i.e. to anyone who wants to listen in) or privately.
• 10 Minute Rule: Each student’s question will be addressed by the TA for at most 10 minutes. The only exception to this will be if a TA is answering a question publicly that has broad interest to many other students.
• The Pseudo Code Rule: This is not a programming course; you are expected to know how to debug code. As such, if your question is of the form “Could you help me to debug my code?”, you must bring with you detailed pseudo code that describes your implementation design. If you do not have pseudo code, the TA will not look at your code, but instead ask you to sketch out pseudo code at the chalkboard and discuss there instead. After discussing at a high level, if your 10 minutes have not expired, the TA may have time to look at your code.
• While you’re awaiting your turn, we encourage you to listen in to the answers to any publicly answered questions. Please be courteous and allow the student who posed the question to primarily direct the discussion with the TA. We also encourage you to collaborate with others (following our collaboration policies below) while waiting.
• The TAs will usually close the office hours queue 30–60 minutes before the end of office hours in order to avoid going overtime.

#### Solution Sessions

As a rule, we never release PDF solutions for any homework. Instead, we will hold a solution session in which we go through how to get the answers for the most challenging problems.

Note that video recordings of the solution sessions will only be available for 72 hours.

#### Readings

The purpose of the readings is to provide a broader and deeper foundation than just the lectures and assessments. The readings for this course are required. We recommend you read them after the lecture. Sometimes the readings include whole topics that are not mentioned in lecture; such topics will (in general) not appear on the exams, but we still encourage you to skim those portions.

#### Background Test + Homework 1 (Written)

Read this section carefully! First some context:

• Exams are good assessment tools. They inform you of what you know and what you need to learn or brush up on.
• Homeworks are good teaching tools. You can spend time reviewing the material while figuring out the correct answer.
• We want to offer you the best of both: an assessment of your prerequisite knowledge of the course, and a homework assignment that gives you time to refresh that material.
• Our goal is to give the underprepared student clear feedback on the areas that they should review, but then allow them to put in extra work so that their grade isn’t adversely affected by their under-preparedness. Similarly, we hope to save the well-prepared student some time.

Background Test: In the first week of classes, we will give a Background Test, in-class. You are required to attend the Background Test in person. The purpose of this test is to assess your prerequisite knowledge. The test will roughly cover the following:

1. Probability and statistics
2. Other math background (calculus, linear algebra, geometry)
3. Computer science and programming skills

This is not a normal exam:

• The material is not the core content of this course, but rather of its prerequisites. You could choose to study for it, or not; either choice is perfectly fine. We expect most people will not study ahead of time, but simply use it as an assessment tool.
• It is low stakes since your score is not incorporated directly into your final grade, but rather to bolster your score on Homework 1 (Written).

Background Exercises: Right after the test, we will release the Background Exercises (i.e. the written portion of Homework 1). This assignment will cover roughly the same content.

Grading Scheme: You will receive a single combined grade computed from the scores you earn on the Background Test and Background Exercises. The points you earn on the Background Exercises can only increase your Background Test score.

Let $\alpha$ be the proportion of points you earn on the Background Test, and let $\beta$ be the proportion of points you earn on the Background Exercises. Your overall score $\gamma$ on Homework 1 (Written) is calculated as:

$$\gamma = \alpha + (1 - \alpha) \beta = \beta + (1 - \beta) \alpha$$

You can view the Background Test as bonus points for the Background Exercises, or vice versa. If you get a zero on the Background Test, you could still get full marks by correctly solving the Background Exercises. Conversely, if you get many points on the Background Test, you can probably speed through the Background Exercises without much concern.
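
To make the combined score concrete, the following Python sketch evaluates the Homework 1 (Written) formula for a few cases. The function name `hw1_written_score` and the sample proportions are illustrative assumptions; only the formula itself comes from the grading scheme above.

```python
def hw1_written_score(alpha, beta):
    """Combined Homework 1 (Written) score.

    alpha: proportion of points earned on the Background Test, in [0, 1]
    beta:  proportion of points earned on the Background Exercises, in [0, 1]
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must be proportions in [0, 1]")
    return alpha + (1.0 - alpha) * beta


# A zero on the test can be fully recovered by the exercises.
print(hw1_written_score(0.0, 1.0))

# A hypothetical 60% on the test and 50% on the exercises.
print(round(hw1_written_score(0.6, 0.5), 4))

# The formula is symmetric in alpha and beta.
print(abs(hw1_written_score(0.3, 0.9) - hw1_written_score(0.9, 0.3)) < 1e-12)
```

Since $\gamma = \alpha + \beta - \alpha\beta$, the combined score is never lower than either individual proportion.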

### 5. Technologies

We use a variety of technologies:

#### Piazza

We will use Piazza for all course discussion. Questions about homeworks, course content, logistics, etc. should all be directed to Piazza. If you have a question, chances are several others had the same question. By posting your question publicly on Piazza, the course staff can answer once and everyone benefits. If you have a private question, you should also use Piazza as it will likely receive a faster response.

#### Gradescope

We use Gradescope to collect PDF submissions of open-ended questions on the homework (e.g. mathematical derivations, plots, short answers). The course staff will manually grade your submission, and you’ll receive personalized feedback explaining your final marks.

Exams will have a scheduled time and a fixed time limit, and will be live proctored in-person.

Regrade Requests: If you believe an error was made during manual grading, you’ll be able to submit a regrade request on Gradescope. For each Gradescope artifact (e.g. homework, exam), regrade requests will be open for only 1 week after the grades have been published. This is to encourage you to check the feedback you’ve received early!

#### Zoom

Lectures and recitations for at least one section will be livestreamed via Zoom.

#### Panopto

Lecture and recitation video recordings for at least one section will be available on Panopto. The link to the Video Recordings is available in the “Links” dropdown; the recitation recordings will be available there as well.

### 6. General Policies

#### Late homework policy

Late homework submissions are only eligible for 75% of the points the first day (24-hour period) after the deadline, 50% the second, and 25% the third.

You receive 6 total grace days for use on any homework assignment except HW1. We will automatically keep a tally of these grace days for you; they will be applied greedily. No assignment will be accepted more than 3 days after the deadline. This has two important implications: (1) you may not use more than 3 grace days on any single assignment, and (2) you may not combine grace days with the late policy above to submit more than 3 days late.

HW3, HW6, and HW9 will not be accepted more than 2 days after the deadline, so that we can hold the solution session before the subsequent exams. To ensure you receive graded feedback before the exams, you must submit HW3, HW6, HW9 on time.

All homework submissions are electronic (see the Technologies section above). As such, lateness will be determined by the latest timestamp of any part of your submission. For example, suppose the homework requires two submission uploads – if you submit the first upload on time but the second upload 1 minute late, your entire homework will be penalized for the full 24-hour period.
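
The penalty schedule above can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: it ignores grace days entirely, and the function name `late_credit_fraction` and the `max_days_late` parameter are hypothetical. Passing `max_days_late=2` models the stricter limit for HW3, HW6, and HW9.

```python
import math

# Fraction of points a submission is eligible for, by number of
# 24-hour periods (started) after the deadline.
LATE_CREDIT = {1: 0.75, 2: 0.50, 3: 0.25}


def late_credit_fraction(minutes_late, max_days_late=3):
    """Eligible fraction of points for a submission, ignoring grace days.

    minutes_late is measured from the deadline to the latest timestamp of
    any part of the submission. Any started 24-hour period counts as a
    full late day, so being 1 minute late costs a whole day.
    """
    if minutes_late <= 0:
        return 1.0
    days_late = math.ceil(minutes_late / (24 * 60))
    if days_late > min(max_days_late, 3):
        return 0.0  # submission is not accepted
    return LATE_CREDIT[days_late]


print(late_credit_fraction(1))                # 1 minute late
print(late_credit_fraction(24 * 60 + 1))      # just over one day late
print(late_credit_fraction(2 * 24 * 60 + 1, max_days_late=2))
```

Rounding up with `math.ceil` mirrors the two-upload example: a second upload that lands 1 minute past the deadline is treated as a full day late.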

#### Extensions

In general, we do not grant extensions on assignments. There are several exceptions:

• Medical Emergencies: If you are sick and unable to complete an assignment or attend class, please go to University Health Services. For minor illnesses, we expect grace days or our late penalties to provide sufficient accommodation. For medical emergencies (e.g. prolonged hospitalization), students may request an extension afterwards.
• University-Approved Travel: If you are traveling out-of-town to a university approved event or an academic conference, you may request an extension for any time lost due to traveling. For university approved absences, you must provide confirmation of attendance, usually from a faculty or staff organizer of the event or via travel/conference receipts.

For any of the above situations, you may request an extension by emailing the Education Associate(s) at eas-10-601@cs.cmu.edu – do not email the instructor or TAs. Please be specific about which assessment(s) you are requesting an extension for and the number of hours requested. The email should be sent as soon as you are aware of the conflict and at least 5 days prior to the deadline. In the case of an emergency, no notice is needed.

If this is a medical emergency or mental health crisis, you must also CC your CMU College Liaison and your academic advisor. Do not submit any medical documentation to the course staff. If necessary, your College Liaison and The Division of Student Affairs (DoSA) will request such documentation and they will view the health documentation and conclude whether a retroactive extension is appropriate. (If you haven’t interacted with your college liaison before, they are experienced Student Affairs staff who work in partnership with students, housefellows, advisors, faculty, and associate deans in each college to assure support for students regarding their overall Carnegie Mellon experience.)

#### Audit Policy

Formal auditing of this course is permitted. However, we give priority to students taking the course for a letter grade.

You must follow the official procedures for a Course Audit as outlined by the HUB / registrar. Please do not email the instructor requesting permission to audit. Instead, you should first register for the appropriate section. Next fill out the Course Audit Approval form and obtain the instructor’s signature in-person (either at office hours or immediately after class).

Auditors are required to:

1. Attend or watch all of the lectures.
3. Submit at least 3 of the 9 homework assignments.

Auditors are encouraged to sit for the exams, but should only do so if they plan to put forth actual effort in solving them.

#### Pass/Fail Policy

We allow you to take the course as Pass/Fail. Instructor permission is not required. What letter grade is the cutoff for a Pass will depend on your specific program; we do not specify whether or not you Pass but rather we compute your letter grade the same as everyone else in the class (i.e. using the cutoffs listed above) and your program converts that letter grade to a Pass or Fail depending on their cutoff. Be sure to check with your program / department as to whether you can count a Pass/Fail course towards your degree requirements.

#### Accommodations for Students with Disabilities:

If you have a disability and have an accommodations letter from the Disability Resources office, please email the Education Associate(s) at eas-10-601@cs.cmu.edu requesting to set up a meeting with them to discuss your accommodations and needs as early in the semester as possible. The EAs will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them at access@andrew.cmu.edu.

### 7. Collaboration and Academic Integrity Policies

#### Collaboration among Students

• The purpose of student collaboration is to facilitate learning, not to circumvent it. Studying the material in groups is strongly encouraged. You are also allowed to seek help from other students in understanding the material needed to solve a particular homework problem, provided any written notes (including code) are taken on an impermanent surface (e.g. whiteboard, chalkboard), and provided learning is facilitated, not circumvented. The actual solution must be done by each student alone.
• A good method to follow when collaborating is to meet with your peers, discuss ideas at a high level, but do not copy down any notes from each other or from a white board. Any scratch work done at this time should be your own only. Before writing the assignment solutions, you should make sure that you are doing this without anyone else present, putting all notes away, closing all tabs on your computer, and writing it completely by yourself with no other resources.
• You are absolutely not allowed to share/compare answers or screen share your work with one another.

• The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved. Specifically, each assignment solution must include answering the following questions:
1. Did you receive any help whatsoever from anyone in solving this assignment? Yes / No.
• If you answered ‘yes’, give full details: ____________
• (e.g. “Jane Doe explained to me what is asked in Question 3.4”)
2. Did you give any help whatsoever to anyone in solving this assignment? Yes / No.
• If you answered ‘yes’, give full details: _____________
• (e.g. “I pointed Joe Smith to section 2.3 since he didn’t know how to proceed with Question 2”)
3. Did you find or come across code that implements any part of this assignment? Yes / No. (See the policy on “found code” below.)
• If you answered ‘yes’, give full details: _____________
• (book & page, URL & location within the page, etc.).
• If you gave help after turning in your own assignment and/or after answering the questions above, you must update your answers before the assignment’s deadline, if necessary by emailing the course staff.
• Collaboration without full disclosure will be handled severely, in compliance with CMU’s Policy on Academic Integrity.

#### Previously Used Assignments

Some of the homework assignments used in this class may have been used in prior versions of this class, or in classes at other institutions, or elsewhere. Solutions to them may be, or may have been, available online, or from other people or sources. It is explicitly forbidden to use any such sources, or to consult people who have solved these problems before. It is explicitly forbidden to search for these problems or their solutions on the internet. You must solve the homework assignments completely on your own. We will be actively monitoring your compliance. Collaboration with other students who are currently taking the class is allowed, but only under the conditions stated above.

#### AI Assistance

You are not permitted to seek assistance from an artificial intelligence system (e.g., a large language model such as ChatGPT).

#### Policy Regarding “Found Code”:

You are encouraged to read books and other instructional materials, both online and offline, to help you understand the concepts and algorithms taught in class. These materials may contain example code or pseudo code, which may help you better understand an algorithm or an implementation detail. However, when you implement your own solution to an assignment, you must put all materials aside, and write your code completely on your own, starting “from scratch”. Specifically, you may not use any code you found or came across. If you find or come across code that implements any part of your assignment, you must disclose this fact in your collaboration statement.

#### Duty to Protect One’s Work

Students are responsible for pro-actively protecting their work from copying and misuse by other students. If a student’s work is copied by another student, the original author is also considered to be at fault and in gross violation of the course policies. It does not matter whether the author allowed the work to be copied or was merely negligent in preventing it from being copied. When overlapping work is submitted by different students, both students will be punished.

To protect future students, do not post your solutions publicly, neither during the course nor afterwards.

#### Penalties for Violations of Course Policies

All violations (even the first one) of course policies will always be reported to the university authorities (your Department Head, Associate Dean, Dean of Student Affairs, etc.) as an official Academic Integrity Violation and will carry severe penalties.

1. The penalty for the first violation is a negative 100% on the assignment (i.e. it would have been better to submit nothing and receive a 0%).

2. The penalty for the second violation is failure in the course, and can even lead to dismissal from the university.

(The above policies are adapted from Roni Rosenfeld’s 10-601 Spring 2016 Course Policies.)

### 8. Support

Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

• CaPS: 412-268-2922
• Re:solve Crisis Network: 888-796-8226
• If the situation is life threatening, call the police:
• On campus: CMU Police: 412-268-2323
• Off campus: 911.

### 9. Diversity

We must treat every individual with respect. We are diverse in many ways, and this diversity is fundamental to building and maintaining an equitable and inclusive campus community. Diversity can refer to multiple ways that we identify ourselves, including but not limited to race, color, national origin, language, sex, disability, age, sexual orientation, gender identity, religion, creed, ancestry, belief, veteran status, or genetic information. Each of these diverse identities, along with many others not mentioned here, shapes the perspectives our students, faculty, and staff bring to our campus. We, at CMU, will work to promote diversity, equity and inclusion not only because diversity fuels excellence and innovation, but because we want to pursue justice. We acknowledge our imperfections while we also fully commit to the work, inside and outside of our classrooms, of building and sustaining a campus community that increasingly embraces these core values.

Each of us is responsible for creating a safer, more inclusive environment.

Unfortunately, incidents of bias or discrimination do occur, whether intentional or unintentional. They contribute to creating an unwelcoming environment for individuals and groups at the university. Therefore, the university encourages anyone who experiences or observes unfair or hostile treatment on the basis of identity to speak out for justice and support, within the moment of the incident or after the incident has passed. Anyone can share these experiences using the following resources:

• Center for Student Diversity and Inclusion: csdi@andrew.cmu.edu, (412) 268-2150