11-785 Introduction to Deep Learning
Fall 2017

“Deep Learning” systems, typified by deep neural networks, are increasingly taking over all AI tasks, ranging from language understanding and speech and image recognition to machine translation, planning, and even game playing and autonomous driving. As a result, expertise in deep learning is fast changing from an esoteric desirable to a mandatory prerequisite in many advanced academic settings, and a large advantage in the industrial job market.

In this course we will learn the basics of deep neural networks and their applications to various AI tasks. By the end of the course, students are expected to have significant familiarity with the subject and to be able to apply deep neural networks to a variety of tasks. They will also be positioned to understand much of the current literature on the topic and extend their knowledge through further study.

Instructor: Bhiksha Raj


Time: Mondays, Thursdays, 9.00am-10.20am

Office hours:


  1. We will be using one of several toolkits. The toolkits are largely programmed in Python or Lua. You will need to be able to program in at least one of these languages. Alternatively, you will be responsible for finding and learning a toolkit that requires programming in a language you are comfortable with.
  2. You will need familiarity with basic calculus (differentiation, chain rule), linear algebra and basic probability.


This course is worth 12 units.

Course Work


Grading will be based on weekly quizzes, homework assignments and a final project. There will be six assignments in all.

Quizzes: 12 or 13, total contribution to grade: 25%
Assignments: 6, total contribution to grade: 50%
Project: total contribution to grade: 25%
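
As a rough illustration of how these weights combine, here is a minimal sketch. It assumes each component is averaged on a 0-100 scale and that the three averages are combined linearly; the page does not say how individual quizzes or assignments are scored, so the function name and the example numbers are hypothetical.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"quizzes": 0.25, "assignments": 0.50, "project": 0.25}


def overall_score(component_averages):
    """Weighted sum of per-component averages, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * component_averages[name] for name in WEIGHTS)


# Hypothetical student: the averages below are made up for illustration.
example = {"quizzes": 80.0, "assignments": 90.0, "project": 70.0}
print(overall_score(example))  # 0.25*80 + 0.50*90 + 0.25*70 = 82.5
```

Because assignments carry half the weight, a ten-point change in the assignment average moves the overall score by five points, twice the effect of the same change in the quizzes or the project.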


Deep learning is a relatively new, fast-developing topic, and there are no standard textbooks on the subject that cover the state of the art, although there are several excellent tutorial books that one can refer to. The topics in this course are collected from a variety of sources, including recent papers. As a result, we do not specify a single standard textbook. However, we list a number of useful books at the end of this page, which we strongly encourage students to read, as they will provide much of the background for the course. We will also put up links to relevant reading material for each class. Students are expected to familiarize themselves with the material before the class. The readings will sometimes be arcane and difficult to understand; if so, do not worry, we will present simpler explanations in class.

Discussion board: Piazza

We will use Piazza for discussions. Here is the link. Please sign up.

Academic Integrity

You are expected to comply with the University Policy on Academic Integrity and Plagiarism.
  • You are allowed to talk with and work with other students on homework assignments.
  • You can share ideas but not code; you should submit your own code.
Your course instructor reserves the right to determine an appropriate penalty based on the violation of academic integrity that occurs. Violations of the university policy can result in severe penalties, including failing this course and possible expulsion from Carnegie Mellon University. If you have any questions about this policy and any work you are doing in the course, please feel free to contact your instructor for help.


Tentative Schedule, Fall 2017

Columns: Lecture; Start date; Topics; Lecture notes/Slides; Additional readings, if any; Quizzes/Assignments
1 August 28
  • Introduction to deep learning
  • Course logistics
  • The perceptron/multi-layer perceptron
  • Hebbian learning
2 August 30
  • The neural net as a universal approximator
3 September 6
  • Training a neural network
  • Perceptron learning rule
  • Empirical Risk Minimization
  • Optimization by gradient descent
4 September 11
  • Back propagation
  • Calculus of back propagation
5 September 13
  • Convergence in neural networks
  • Rates of convergence
  • Loss surfaces
  • Learning rate and data normalization
  • RMSProp, Adagrad, Momentum
slides Assignment 1
6 September 18
  • Stochastic gradient descent
  • Acceleration
  • Overfitting and regularization
  • Tricks of the trade:
    • Choosing a divergence (loss) function
    • Batch normalization
    • Dropout
7 September 20 Review of neural network training
8 September 25
  • Using Deep Learning Models to Understand Visual Cortex (Mike Tarr)
9 September 27
  • Cascade Correlation (Scott Fahlman)
slides available on Piazza. Please do not distribute.
10 October 2
  • Convolutional Neural Networks (CNNs)
  • Weights as templates
  • Translation invariance
  • Training with shared parameters
  • Arriving at the convolutional model
slides Goodfellow Chapter 9
11 October 4
  • Models of vision
  • Neocognitron
  • Mathematical details of CNNs
  • AlexNet, Inception, VGG
12 October 9 Class cancelled
13 October 11
  • Recurrent Neural Networks (RNNs)
  • Modeling series
  • Back propagation through time
  • Bidirectional RNNs
slides Goodfellow Chapter 10
14 October 16
  • Exploding/vanishing gradients
  • Long Short-Term Memory Units (LSTMs)
15 October 18
  • What do networks represent
  • Autoencoders and dimensionality reduction
  • Representation learning
16 October 23
  • Variational Autoencoders (VAEs)
  • Factor Analysis
  • Expectation Maximization and Variational Inference
17 October 25
  • Generative Adversarial Networks (GANs)
Recitation October 27
  • RNN Recitation
18 October 30
  • Hopfield Networks
  • Energy functions
19 November 1 Pulkit Agrawal
  • Curiosity-driven Exploration by Self-supervised Prediction
20 November 6
  • Deepnets for Speech recognition (Rich Stern)
21 November 8
  • Class cancelled
22 November 13 Graham Neubig
  • Sequence-to-Sequence models
  • Attention
23 November 15
  • Hopfield network storage capacity
24 November 17 (Make up class)
  • From Hopfield networks to Boltzmann Machines
25 November 20 Reinforcement Learning (part 1)
  • Markov Reward Process
  • Markov Decision Process
  • Bellman Expectation Equation
26 November 27 Reinforcement Learning (part 2)
  • Actions
  • Optimal Policies
  • Solving the value function
slides: same deck as RL part 1
27 November 29 Reinforcement Learning (part 3)
  • Value iteration
  • Policy iteration
28 December 4
  • Reinforcement Learning (part 4)
29 December 6
  • Newer models and trends. Memory, tape, and Turing machines

Documentation and Tools


Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Online book, 2017.
Neural Networks and Deep Learning, by Michael Nielsen. Online book, 2016.
Deep Learning with Python, by J. Brownlee.
Parallel Distributed Processing, by Rumelhart and McClelland. Out of print, 1986.

A nice catalog of the various types of neural network models that are current in the literature can be found here. We expect that you will be in a position to interpret, if not fully understand, many of these architectures by the end of the course.