11-785 Introduction to Deep Learning
Spring 2018

“Deep Learning” systems, typified by deep neural networks, are increasingly taking over all AI tasks, ranging from language understanding, speech and image recognition, and machine translation to planning, and even game playing and autonomous driving. As a result, expertise in deep learning is fast changing from an esoteric, desirable skill into a mandatory prerequisite in many advanced academic settings, and a large advantage in the industrial job market.

In this course we will learn about the basics of deep neural networks and their applications to various AI tasks. By the end of the course, students are expected to have significant familiarity with the subject and to be able to apply it to a variety of tasks. They will also be positioned to understand much of the current literature on the topic and to extend their knowledge through further study.

Instructor: Bhiksha Raj

TAs:

Lecture: Monday and Wednesday, 9.00am-10.20am

Recitation: Friday, 9.00am-10.20am, Newell Simon 3002

Office hours:

Prerequisites

  1. We will be using one of several toolkits (typically TensorFlow or PyTorch). These toolkits are largely programmed in Python, so you will need to be able to program in Python (see the short example after this list). Alternatively, you will be responsible for finding and learning a toolkit that requires programming in a language you are comfortable with.
  2. You will need familiarity with basic calculus (differentiation, chain rule), linear algebra and basic probability.
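
For a sense of the level of programming involved, the following is a minimal sketch of defining and training a small multi-layer perceptron in PyTorch; the layer sizes and toy data are illustrative placeholders rather than course material.

    # A minimal sketch, assuming PyTorch is installed; the sizes and data below
    # are illustrative placeholders, not course material.
    import torch
    import torch.nn as nn

    # Toy data: 64 random inputs of dimension 10 with binary labels.
    x = torch.randn(64, 10)
    y = torch.randint(0, 2, (64,))

    # A small multi-layer perceptron: one hidden layer with a ReLU nonlinearity.
    model = nn.Sequential(
        nn.Linear(10, 32),
        nn.ReLU(),
        nn.Linear(32, 2),
    )

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # A few steps of gradient descent on the toy data.
    for step in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    print(f"final loss: {loss.item():.4f}")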

Units

This course is worth 12 units.

Course Work

Grading

Grading will be based on weekly quizzes, homework assignments and a final project.

There will be five assignments in all. Note that assignments 4 and 5 are released simultaneously and are due on the same date.

Quizzes: 12 or 13, total contribution to grade: 25%
Assignments: 5, total contribution to grade: 50%
Project: total contribution to grade: 25%

Books

The course will not follow a specific book, but will draw from a number of sources. We list relevant books at the end of this page. We will also put up links to relevant reading material for each class. Students are expected to familiarize themselves with the material before the class. The readings will sometimes be arcane and difficult to understand; if so, do not worry: we will present simpler explanations in class.

Discussion board: Piazza

We will use Piazza for discussions. Here is the link. Please sign up.

Wiki page

We have created an experimental wiki explaining the types of neural networks in use today. Here is the link.

You can also find a nice catalog of models that are current in the literature here. We expect that you will be in a position to interpret, if not fully understand, many of the architectures on the wiki and in the catalog by the end of the course.

Kaggle

Kaggle is a popular data science platform where visitors compete to produce the best model for learning or analyzing a data set.

For assignments 4 and 5 you will be submitting your evaluation results to a Kaggle leaderboard.
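
A leaderboard submission is typically a CSV file of predictions. The following is a minimal sketch of writing one; the "id" and "label" column names and the placeholder predictions are assumptions for illustration, and the actual format will be specified with each assignment.

    # A minimal sketch of writing a submission CSV. The "id" and "label" column
    # names are hypothetical; follow the format given in the assignment handout.
    import csv
    import numpy as np

    predictions = np.random.randint(0, 10, size=1000)  # placeholder predictions

    with open("submission.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])
        for i, p in enumerate(predictions):
            writer.writerow([i, int(p)])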

Academic Integrity

You are expected to comply with the University Policy on Academic Integrity and Plagiarism.
  • You are allowed to talk with and work with other students on homework assignments
  • You can share ideas but not code; you must submit your own code
Your course instructor reserves the right to determine an appropriate penalty for any act of academic dishonesty that occurs. Violations of the university policy can result in severe penalties, including failing this course and possible expulsion from Carnegie Mellon University. If you have any questions about this policy and any work you are doing in the course, please feel free to contact your instructor for help.

Tentative Schedule

Lecture Start date Topics Lecture notes/Slides Additional readings, if any Quizzes/Assignments
1 January 17
  • Introduction to deep learning
  • Course logistics
  • History and cognitive basis of neural computation.
  • The perceptron / multi-layer perceptron
slides
2 January 22
  • The neural net as a universal approximator
slides
3 January 24
  • Training a neural network
  • Perceptron learning rule
  • Empirical Risk Minimization
  • Optimization by gradient descent
slides Assignment 1
4 January 29
  • Back propagation
  • Calculus of back propagation
slides
5 January 31
  • Convergence in neural networks
  • Rates of convergence
  • Loss surfaces
  • Learning rates, and optimization methods
  • RMSProp, Adagrad, Momentum
slides
6 February 5
  • Stochastic gradient descent
  • Acceleration
  • Overfitting and regularization
  • Tricks of the trade:
    • Choosing a divergence (loss) function
    • Batch normalization
    • Dropout
slides
7 February 7 Guest Lecture (Scott Fahlman)
8 February 12
  • Convolutional Neural Networks (CNNs)
  • Weights as templates
  • Translation invariance
  • Training with shared parameters
  • Arriving at the convolutional model
slides Assignment 2
9 February 14
  • Models of vision
  • Neocognitron
  • Mathematical details of CNNs
  • AlexNet, Inception, VGG
slides
10 February 19
  • Recurrent Neural Networks (RNNs)
  • Modeling series
  • Back propagation through time
  • Bidirectional RNNs
slides
11 February 21
  • Stability
  • Exploding/vanishing gradients
  • Long Short-Term Memory Units (LSTMs) and variants
  • ResNets
slides
12 February 23
  • Loss functions for recurrent networks
  • Connectionist Temporal Classification (CTC)
  • Sequence prediction
13 February 28
  • What do networks represent
  • Autoencoders and dimensionality reduction
  • Representation learning
slides Assignment 3
14 March 5
  • Variational Autoencoders (VAEs) Part 1
  • Factor Analysis
  • Expectation Maximization and Variational Inference
slides
15 March 7
  • Variational Autoencoders (VAEs) Part 2
slides
16 March 12 Spring break
17 March 14 Spring break
18 March 19 NNets in Speech Recognition, Guest Lecture (Stern)
19 March 21
  • Generative Adversarial Networks (GANs) Part 1
slides Assignments 4 and 5
20 March 26
  • Generative Adversarial Networks (GANs) Part 2
slides
21 March 28
  • Hopfield Networks
  • Energy functions
slides
22 April 2
  • Boltzmann Machines
  • Learning in Boltzmann machines
slides
23 April 4
  • Restricted Boltzmann Machines
  • Deep Boltzmann Machines
slides
24 April 9 Guest lecture (TBD)
25 April 11
  • Reinforcement Learning 1
26 April 16
  • Reinforcement Learning 2
27 April 18
  • Reinforcement Learning 3
28 April 23
  • Q Learning
  • Deep Q Learning
29 April 25 Guest Lecture (TBD)
30 April 30
  • Multi-task and multi-label learning, transfer learning with NNets
31 May 2
  • Newer models and trends
  • Review

Tentative Schedule of Recitations

Recitation Start date Topics
1 January 19 Amazon Web Services (AWS)
2 January 26 Practical Deep Learning in Python
3 February 2 Optimization methods
4 February 9 Tuning methods
5 February 16 TBD
6 February 23 TBD
7 March 2 RNNs and LSTMs
8 March 9 TBD
9 March 16 TBD
10 March 23 Practical implementation of VAEs
11 March 30 Practical implementation of GANs
12 April 6 Practice with BMs and RBMs
13 April 13 TBD
14 April 20 TBD
15 April 27 TBD

Documentation and Tools

Textbooks

Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Online book, 2017.
Neural Networks and Deep Learning, by Michael Nielsen. Online book, 2016.
Deep Learning with Python, by J. Brownlee.
Parallel Distributed Processing, by Rumelhart and McClelland. Out of print, 1986.