Introduction to Machine Learning

10-601, Spring 2017
School of Computer Science
Carnegie Mellon University


Course Info

1. Course description

Machine Learning is concerned with computer programs that automatically improve their performance through experience (e.g., programs that learn to recognize human faces, recommend music and movies, and drive autonomous robots). This course covers the theory and practical algorithms for machine learning from a variety of perspectives. We cover topics such as Bayesian networks, decision tree learning, Support Vector Machines, statistical learning methods, unsupervised learning and reinforcement learning. The course covers theoretical concepts such as inductive bias, the PAC learning framework, Bayesian learning methods, margin-based learning, and Occam’s Razor. Short programming assignments include hands-on experiments with various learning algorithms. This course is designed to give a graduate-level student a thorough grounding in the methodologies, technologies, mathematics and algorithms currently needed by people who do research in machine learning.

10-601 is open to all but is recommended for CS Seniors & Juniors, Quantitative Masters students, and non-SCS PhD students.

2. Prerequisites

Students entering the class are expected to have a pre-existing working knowledge of probability, linear algebra, statistics and algorithms, though the class has been designed to allow students with a strong numerate background to catch up and fully participate. In addition, recitation sessions will be held to review some basic concepts.

  1. You need to have, before starting this course, significant experience programming in a general programming language. Specifically, you need to have written from scratch programs consisting of several hundred lines of code. For undergraduate students, this will be satisfied for example by having passed 15-122 (Principles of Imperative Computation) with a grade of ‘C’ or higher, or comparable courses or experience elsewhere.

    Note: For each programming assignment, we will allow you to pick between Python and Octave (an open source version of Matlab).

  2. You need to have, before starting this course, basic familiarity with probability and statistics, as can be achieved at CMU by having passed 36-217 (Probability Theory and Random Processes) or 36-225 (Introduction to Probability and Statistics I), or 15-359, or 21-325, or comparable courses elsewhere, with a grade of ‘C’ or higher.

  3. You need to have, before starting this course, college-level maturity in discrete mathematics, as can be achieved at CMU by having passed 21-127 (Concepts of Mathematics) or 15-151 (Mathematical Foundations of Computer Science), or comparable courses elsewhere, with a grade of ‘C’ or higher.

You must strictly adhere to these prerequisites! Even if CMU’s registration system does not prevent you from registering for this course, it is still your responsibility to make sure you have all of these prerequisites before you register.

(Adapted from Roni Rosenfeld’s 10-601 Spring 2016 Course Policies.)

4. Grading

The requirements of this course consist of participating in lectures, a midterm, a final, and 8 problem sets. This is an MS-level class, and the most important thing for us is that by the end of this class students understand the basic methodologies in machine learning and are able to use them to solve real problems of modest complexity. The grading breakdown is the following:

  • 45% for Homeworks
  • 25% for Midterm Exam
  • 30% for Final Exam
  • On Piazza, the Top Student “Endorsed Answer” Answerers can earn bonus points
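As a rough illustration of the breakdown above, the weighted course score could be computed as in the following sketch. The function name and the sample scores are hypothetical, each component is assumed to be a proportion in [0, 1], and Piazza bonus points are left out because their size is not specified here.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"homework": 0.45, "midterm": 0.25, "final": 0.30}

def course_score(scores):
    """Weighted average of component scores, each a proportion in [0, 1].

    Bonus points are not modeled (an assumption of this sketch).
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical student: 90% on homeworks, 80% on the midterm, 70% on the final.
example = {"homework": 0.90, "midterm": 0.80, "final": 0.70}
print(round(course_score(example), 3))
```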

Background Test + Homework 1: Background Exercises

Read this section carefully! First some context:

  • Exams are good assessment tools. They inform you of your standing relative to your peers.
  • Homeworks are good teaching tools. You can spend time reviewing the material while figuring out the correct answer.
  • We want to offer you the best of both: an assessment of your prerequisite knowledge of the course, and a homework assignment that gives you time to refresh that material.
  • Our goal is to give underprepared students clear feedback on the areas that they should review, and then allow them to put in extra work so that their grade isn’t adversely affected by their under-preparedness.

Background Test: In the second week of classes, we will give a Background Test. The exact time / location will be announced on Piazza, but it will be in the evening – not during class. You are required to attend the Background Test in person. The purpose of this test is to assess your prerequisite knowledge. The test will be broken into four sections:

  1. Probability and statistics
  2. Other math background (calculus, linear algebra)
  3. Algorithms and Computer Science
  4. Programming Skills

Background Exercises: Shortly after the exam, we will release the Background Exercises (Homework 1). This assignment will be divided into the same four sections.

Grading Scheme: You will receive a single combined grade computed from the scores you earn on the Background Test and Background Exercises. For each section, the points you earn on the Background Exercises can only increase your Background Test score.

For all \(i \in \{1,2,3,4\}\), let \(\alpha_i = \) proportion of points on section \(i\) of the Background Test. Let \(\beta_i = \) proportion of points on section \(i\) of the Background Exercises. Your overall section score \(\gamma_i\) is calculated as: \[ \gamma_i = \alpha_i + (1 - \alpha_i) \beta_i \]

Thus, even if you get a zero on a section of the Background Test, you could still recover all the points by correctly solving the Background Exercises in the corresponding section.

The overall score \( \frac{1}{4} \sum_{i=1}^4 \gamma_i \) will be counted as your Homework 1 grade.
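The calculation above can be sketched in a few lines of Python. The function names and the sample proportions are hypothetical and only illustrate the formula; they are not the course's grading scripts.

```python
def section_score(alpha, beta):
    """Combined score gamma_i for one section.

    alpha: proportion of points earned on the Background Test section.
    beta:  proportion of points earned on the Background Exercises section.
    """
    return alpha + (1 - alpha) * beta

def homework1_grade(test_props, exercise_props):
    """Average of the combined section scores over all sections."""
    gammas = [section_score(a, b) for a, b in zip(test_props, exercise_props)]
    return sum(gammas) / len(gammas)

# Hypothetical proportions for the four sections.
test_props = [0.5, 1.0, 0.0, 0.8]
exercise_props = [0.5, 0.0, 1.0, 0.5]
print(round(homework1_grade(test_props, exercise_props), 4))
```

In this example the third section shows the recovery case: a zero on the test (\(\alpha_3 = 0\)) with a perfect exercise score (\(\beta_3 = 1\)) still gives \(\gamma_3 = 1\). The second section shows the other direction: a perfect test score cannot be lowered by the exercises.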

Midterm and Final Exam

You are required to attend the midterm and final exams. The midterm exam will be given in the evening – not in class. The final exam will be given during the official final exam week.

If you have an unavoidable conflict with an exam, notify us by emailing the assistant instructor(s) at 10601-assistant-instructors@cs.cmu.edu – do not email the instructor or TAs.

No electronic devices are allowed during the exam. Unless otherwise noted, all exams are closed-book.

Extensions

If you have an unavoidable conflict that would prevent you from completing a homework on time (e.g. travel to a conference, medical emergency), you may request an extension by emailing the assistant instructor(s) at 10601-assistant-instructors@cs.cmu.edu – do not email the instructor or TAs. The email should be sent as soon as you are aware of the conflict and at least 5 days prior to the deadline. In the case of an emergency (e.g. sudden illness or family emergency), no notice is needed.

Late homework policy

Late homework submissions are penalized by 25% for each day (24-hour period) past the deadline.

All homework submissions are electronic (see Technologies section below). As such, lateness will be determined by the latest timestamp of any part of your submission. For example, suppose the homework requires submissions to both Gradescope and Autolab – if you submit to Gradescope on time but to Autolab 1 minute late, your entire homework will be penalized for the full 24-hour period.
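The late policy can be sketched as follows. This is only an illustration under stated assumptions: the policy does not say whether the 25% is taken from the maximum score or from the earned score, so the sketch treats it as a reduction of the credit multiplier and floors it at zero. The function names and the deadline are hypothetical.

```python
import math
from datetime import datetime, timedelta

PENALTY_PER_DAY = 0.25        # 25% per 24-hour period, from the policy above
SECONDS_PER_DAY = 24 * 60 * 60

def late_days(deadline, timestamps):
    """Number of started 24-hour periods past the deadline.

    Lateness is judged by the latest timestamp of any part of the submission.
    """
    seconds_late = (max(timestamps) - deadline).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / SECONDS_PER_DAY)

def credit_multiplier(deadline, timestamps):
    """Fraction of credit kept after the penalty (assumed floored at zero)."""
    return max(0.0, 1.0 - PENALTY_PER_DAY * late_days(deadline, timestamps))

# Hypothetical deadline; Gradescope part on time, Autolab part 1 minute late.
deadline = datetime(2017, 2, 1, 23, 59)
gradescope_time = deadline - timedelta(hours=1)
autolab_time = deadline + timedelta(minutes=1)
print(late_days(deadline, [gradescope_time, autolab_time]))          # 1
print(credit_multiplier(deadline, [gradescope_time, autolab_time]))  # 0.75
```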

Audit Policy

Formal auditing of this course is permitted. However, we give priority to students taking the course for a letter grade. For sections A and B, we will only allow auditors in after the waitlist has been cleared and everyone wishing to register for a letter grade has been given an opportunity. Anyone is welcome to audit in section C at any time. You must follow the official procedures for a Course Audit as outlined by the HUB / registrar.

Please do not email the instructor requesting permission to audit. Instead, you should first register for the appropriate section (usually section C). Next, fill out the Course Audit Approval form and obtain the instructor’s signature in person (either at office hours or immediately after class).

Auditors are required to:

  1. Attend or watch all of the lectures.
  2. Fill out the 10-601 Lecture Feedback form for 75% of the lectures. You must include your Andrew Email to receive credit for submission. Please give honest and detailed feedback.
  3. Take the midterm and final exam. Please clearly indicate on the front page of your exam that you are auditing. The TAs may or may not grade your exam. You do not need to obtain a certain score; we simply expect you to give your best effort.

Auditors are permitted but not required to submit auto-graded homework assignments (i.e. Autolab and QnA), but we ask that they do not submit free-hand assignments (i.e. Gradescope). Note that such submissions are entirely optional.

Pass/Fail Policy

We allow you to take the course as Pass/Fail. Instructor permission is not required. The cutoff grade for a Pass depends on your program. Be sure to check with your program / department as to whether you can count a Pass/Fail course towards your degree requirements.

5. Technologies

We use a variety of technologies:

Piazza

We will use Piazza for all course discussion. Questions about homeworks, course content, logistics, etc. should all be directed to Piazza. If you have a question, chances are several others had the same question. If you post your question publicly on Piazza, the course staff can answer once and everyone benefits. If you have a private question, you should also use Piazza as it will likely receive a faster response.

Autolab

You will submit your code for programming questions on the homework to Autolab. After you upload your code, our grading scripts will autograde your assignment by running your program on a VM. This provides you with immediate feedback on the performance of your submission.

Gradescope

We use Gradescope to collect PDF submissions of open-ended questions on the homework (e.g. mathematical derivations, plots, short answers). Upon uploading your PDF, Gradescope will ask you to identify which page(s) contain your solution for each problem – this is a great way to double check that you haven’t left anything out. The course staff will manually grade your submission, and you’ll receive personalized feedback explaining your final marks.

Regrade Requests: If you believe an error was made during manual grading, you’ll be able to submit a regrade request on Gradescope. For each homework, regrade requests will be open for only 1 week after the grades have been published. This is to encourage you to check the feedback you’ve received early!

QnA

QnA is a tool developed at CMU for quiz-style problems (e.g. multiple choice, true / false, numerical answers). Grading is done automatically – usually after the deadline has passed – and you’ll receive feedback then.

Autolab’s Gradebook

All three of the above (Autolab, Gradescope, QnA) will give you marks for each part of the corresponding assignment. We will also periodically post aggregate grades to Autolab (usually around midsemester grades and final grades). This gives you a chance to double check that your overall grade is what you expected.

6. Homework Resources and Collaboration Policies

Read this carefully!

(Adapted from Roni Rosenfeld’s 10-601 Spring 2016 Course Policies.)

Collaboration among Students

  • The purpose of student collaboration is to facilitate learning, not to circumvent it. Studying the material in groups is strongly encouraged. You may also seek help from other students in understanding the material needed to solve a particular homework problem, provided no written notes (including code) are shared or taken at that time, and provided learning is facilitated, not circumvented. The actual solution must be done by each student alone.
  • The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved. Specifically, each assignment solution must include answering the following questions:
    1. Did you receive any help whatsoever from anyone in solving this assignment? Yes / No.
      • If you answered ‘yes’, give full details: ____________
      • (e.g. “Jane Doe explained to me what is asked in Question 3.4”)
    2. Did you give any help whatsoever to anyone in solving this assignment? Yes / No.
      • If you answered ‘yes’, give full details: _____________
      • (e.g. “I pointed Joe Smith to section 2.3 since he didn’t know how to proceed with Question 2”)
    3. Did you find or come across code that implements any part of this assignment? Yes / No. (See the policy on “found code” below.)
      • If you answered ‘yes’, give full details: _____________
      • (book & page, URL & location within the page, etc.).
  • If you gave help after turning in your own assignment and/or after answering the questions above, you must update your answers before the assignment’s deadline, if necessary by emailing the course staff.
  • Collaboration without full disclosure will be handled severely, in compliance with CMU’s Policy on Cheating and Plagiarism.

Previously Used Assignments

Some of the homework assignments used in this class may have been used in prior versions of this class, or in classes at other institutions, or elsewhere. Solutions to them may be, or may have been, available online, or from other people or sources. It is explicitly forbidden to use any such sources, or to consult people who have solved these problems before. It is explicitly forbidden to search for these problems or their solutions on the internet. You must solve the homework assignments completely on your own. We will be actively monitoring your compliance. Collaboration with other students who are currently taking the class is allowed, but only under the conditions stated above.

Policy Regarding “Found Code”

You are encouraged to read books and other instructional materials, both online and offline, to help you understand the concepts and algorithms taught in class. These materials may contain example code or pseudo code, which may help you better understand an algorithm or an implementation detail. However, when you implement your own solution to an assignment, you must put all materials aside, and write your code completely on your own, starting “from scratch”. Specifically, you may not use any code you found or came across. If you find or come across code that implements any part of your assignment, you must disclose this fact in your collaboration statement.

Duty to Protect One’s Work

Students are responsible for pro-actively protecting their work from copying and misuse by other students. If a student’s work is copied by another student, the original author is also considered to be at fault and in gross violation of the course policies. It does not matter whether the author allowed the work to be copied or was merely negligent in preventing it from being copied. When overlapping work is submitted by different students, both students will be punished.

To protect future students, do not post your solutions publicly, either during the course or afterwards.

Penalties for Violations of Course Policies

All violations (even the first one) of course policies will always be reported to the university authorities (your Department Head, Associate Dean, Dean of Student Affairs, etc.) as an official Academic Integrity Violation and will carry severe penalties.

  1. The penalty for the first violation is a one-and-a-half letter grade reduction. For example, if your final letter grade for the course was to be an A-, it would become a C+.

  2. The penalty for the second violation is failure in the course, and can even lead to dismissal from the university.

7. Accommodations for Students with Disabilities

If you have a disability and have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them at access@andrew.cmu.edu.

8. Support

Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

  • CaPS: 412-268-2922
  • Re:solve Crisis Network: 888-796-8226
  • If the situation is life threatening, call the police:
    • On campus: CMU Police: 412-268-2323
    • Off campus: 911.

If you have questions about this or your coursework, please let the instructors know.


9. Note to people outside CMU

Please feel free to reuse any of these course materials that you find of use in your own courses. We ask that you retain any copyright notices, and include written notice indicating the source of any materials you use.