Information Processing and Learning

10-704, Spring 2012

Aarti Singh

Teaching Assistant: Min Xu
Class Assistant: Michelle Martin


Scribe Schedule - LaTeX Template (courtesy of UC Berkeley EECS dept) - Presentation Schedule

Schedule: date, lecture topics, suggested readings, and assignments.
Jan 17 - Lecture 1

  • Intro
  • Information Content, Entropy, Relative Entropy
  • Connection to Maximum Likelihood Estimation
  Readings: Cover-Thomas 2.1-2.4; MacKay 2.4, 2.5
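A minimal numerical sketch of the first two quantities (Python; the distributions p and q are made-up examples, not from the lecture):

    import math

    def entropy(p):
        # H(p) = -sum_x p(x) log2 p(x), in bits
        return -sum(px * math.log2(px) for px in p if px > 0)

    def relative_entropy(p, q):
        # D(p||q) = sum_x p(x) log2(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0
        return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

    p = [0.5, 0.25, 0.25]
    q = [1/3, 1/3, 1/3]
    print(entropy(p))              # 1.5 bits
    print(relative_entropy(p, q))  # about 0.085 bits

The MLE connection: maximizing the average log-likelihood (1/n) sum_i log q(x_i) over models q is the same as minimizing D(p_hat||q), where p_hat is the empirical distribution of the sample.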
Jan 19 - Lecture 2

  • Gibbs/Information Inequality
  • Data Processing Inequality, Sufficient Statistics
  • Fano's Inequality
  Readings: Cover-Thomas 2.5-2.10; MacKay Ch 8
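A quick check of Gibbs' inequality, D(p||q) >= 0 with equality iff p = q (a sketch; the random distributions are arbitrary):

    import math, random

    def kl(p, q):
        return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

    def random_dist(k):
        w = [random.random() for _ in range(k)]
        s = sum(w)
        return [x / s for x in w]

    random.seed(0)
    for _ in range(5):
        p, q = random_dist(4), random_dist(4)
        print(round(kl(p, q), 4), kl(p, p))  # first value >= 0, second exactly 0.0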
Jan 24 - Lecture 3

  • Sharpness of Fano's inequality
  • Continuous random variables
  • Max entropy distributions
  Readings: Cover-Thomas 2.10, 8.1, 8.4, 8.5, 12.1
HW1 released (Solutions)
Jan 26 - Lecture 4

  • Max entropy distributions (continued)
  • Information Geometry - I-Projection and the Pythagorean theorem
  • Duality - Maximum Likelihood estimation in Exponential families
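A minimal sketch of the max-entropy recipe behind these two lectures: under a mean constraint, the maximizer has the exponential-family form p(x) proportional to exp(lam * x), and the multiplier lam can be found numerically (the support and target mean below are made-up numbers):

    import math

    xs = list(range(6))       # support {0,...,5}
    target_mean = 1.5

    def dist(lam):
        # exponential-family candidate: p(x) proportional to exp(lam * x)
        w = [math.exp(lam * x) for x in xs]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(p):
        return sum(x * px for x, px in zip(xs, p))

    lo, hi = -10.0, 10.0      # bisect: mean(dist(lam)) is increasing in lam
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean(dist(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    p = dist(lo)
    print([round(px, 3) for px in p], round(mean(p), 3))  # mean ~ 1.5

By duality, the same lam is the maximum-likelihood parameter of this exponential family when the sample mean equals the constraint value.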
Jan 31 - Lecture 5

  • Information geometry, Orthogonal families
  • Entropy rate of a stochastic process
  • Max entropy rate process (Burg's theorem)
  Readings: Cover-Thomas 4.1, 4.2, 12.5, 12.6
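A small sketch of the entropy rate of a stationary Markov chain, H = -sum_i mu_i sum_j P_ij log2 P_ij with mu the stationary distribution (the transition matrix is a made-up example):

    import math

    P = [[0.9, 0.1],
         [0.4, 0.6]]

    mu = [0.5, 0.5]
    for _ in range(1000):     # power iteration: mu converges to the fixed point mu P = mu
        mu = [sum(mu[i] * P[i][j] for i in range(2)) for j in range(2)]

    rate = -sum(mu[i] * P[i][j] * math.log2(P[i][j])
                for i in range(2) for j in range(2) if P[i][j] > 0)
    print(round(rate, 4))     # about 0.569 bits/symbol, well below log2(2) = 1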
Feb 2 - Lecture 6

  • Data Compression/Source Coding
  • Asymptotic Equipartition Property
  • Achievability of data compression limit via typicality
  Readings: Cover-Thomas Ch 3, 5.1; MacKay Ch 4
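The AEP in one simulation: for iid X_i, -(1/n) log2 p(X_1,...,X_n) concentrates around H(X) as n grows (a sketch; the source distribution is a made-up example):

    import math, random

    random.seed(1)
    p = {'a': 0.7, 'b': 0.2, 'c': 0.1}
    H = -sum(px * math.log2(px) for px in p.values())

    for n in (10, 100, 10000):
        xs = random.choices(list(p), weights=list(p.values()), k=n)
        per_symbol = -sum(math.log2(p[x]) for x in xs) / n
        print(n, round(per_symbol, 3), 'vs H =', round(H, 3))  # H ~ 1.157 bits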
Feb 7 - Lecture 7

  • Unique decodability, Prefix codes
  • Kraft and McMillan Inequalities
  • Ideal prefix codelength and Shannon code
  Readings: Cover-Thomas 5.2-5.5; MacKay 5.1-5.4
HW2 released (Solutions)
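A numeric check that Shannon codelengths l(x) = ceil(log2 1/p(x)) satisfy the Kraft inequality sum_x 2^(-l(x)) <= 1 (so a prefix code with these lengths exists) and give expected length within one bit of the entropy (the distribution is a made-up example):

    import math

    p = [0.45, 0.25, 0.15, 0.10, 0.05]
    lengths = [math.ceil(math.log2(1 / px)) for px in p]   # Shannon codelengths
    kraft_sum = sum(2 ** -l for l in lengths)              # must be <= 1
    H = -sum(px * math.log2(px) for px in p)
    avg = sum(px * l for px, l in zip(p, lengths))
    print(lengths, round(kraft_sum, 3))                          # [2, 2, 3, 4, 5] 0.719
    print(round(H, 3), '<=', round(avg, 3), '<', round(H + 1, 3))  # 1.977 <= 2.5 < 2.977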
Feb 9 - Guest Lecture (Or Sheffet)
Feb 14 - Lecture 8

  • Source coding theorem
  • Lower bound for non-singular codelengths
  • Huffman coding and optimality
  Readings: Cover-Thomas 5.6-5.8; MacKay 5.5-5.7
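A compact Huffman coder (a sketch, not the lecture's notation): repeatedly merge the two least probable nodes, prefixing '0'/'1' to the codewords on each side.

    import heapq

    def huffman(freqs):
        # heap entries: [weight, tiebreak id, {symbol: partial codeword}]
        heap = [[w, i, {s: ''}] for i, (s, w) in enumerate(freqs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            w1, _, c1 = heapq.heappop(heap)
            w2, _, c2 = heapq.heappop(heap)
            merged = {s: '0' + c for s, c in c1.items()}
            merged.update({s: '1' + c for s, c in c2.items()})
            heapq.heappush(heap, [w1 + w2, count, merged])
            count += 1
        return heap[0][2]

    print(huffman({'a': 0.45, 'b': 0.25, 'c': 0.15, 'd': 0.10, 'e': 0.05}))
    # more probable symbols get shorter codewords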
Feb 16 - Lecture 9

  • Arithmetic coding
  • Redundancy of a code
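The core interval-narrowing step of arithmetic coding (a sketch only: practical coders emit bits incrementally and renormalize; the model below is a fixed, made-up iid source):

    probs = {'a': 0.7, 'b': 0.2, 'c': 0.1}

    def encode_interval(msg):
        # the message maps to a subinterval of [0, 1) of width p(msg);
        # any number inside that interval identifies the message
        lo, hi = 0.0, 1.0
        for sym in msg:
            width = hi - lo
            cum = 0.0
            for s, p in probs.items():
                if s == sym:
                    lo, hi = lo + cum * width, lo + (cum + p) * width
                    break
                cum += p
        return lo, hi

    lo, hi = encode_interval('aaba')
    print(lo, hi, hi - lo)   # width = 0.7 * 0.7 * 0.2 * 0.7 = 0.0686
    # codelength ~ -log2(width) ~ 3.9 bits for the whole message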
Feb 21 - Lecture 10

  • Universal coding and prediction
  • Minimax expected redundancy and Minimax excess risk
  • Universal codes and predictors for iid and Markov processes
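One concrete universal predictor for binary iid sources is the Krichevsky-Trofimov (add-1/2) rule; its codelength exceeds the best fixed iid codelength by only about (1/2) log2 n bits (a sketch; the test sequence is made up):

    import math

    def kt_codelength(bits):
        # sequential KT predictor: p(next = 1) = (n1 + 1/2) / (n0 + n1 + 1)
        n0 = n1 = 0
        total = 0.0
        for b in bits:
            p1 = (n1 + 0.5) / (n0 + n1 + 1)
            total -= math.log2(p1 if b else 1 - p1)
            if b:
                n1 += 1
            else:
                n0 += 1
        return total

    seq = [1, 0, 1, 1, 0, 1, 1, 1] * 32       # 256 symbols, 3/4 ones
    print(round(kt_codelength(seq), 1), 'bits for', len(seq), 'symbols')
    # compare: 256 * H(3/4) ~ 207.7 bits for the best fixed iid code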
Feb 23 - Lecture 11

  • Worst case redundancy bounds for iid processes
  • Weak universality for stationary processes
HW3 released (Solutions)
Feb 28 - Lecture 12

  • Universality for Hierarchical Classes
Mar 1 - Lecture 13

  • Minimum Description Length (MDL) Principle
  • Model Selection
Mar 6 - Lecture 14

  • Complexity-penalized density estimation
  • Regularized Maximum Likelihood
Mar 8 - Quiz 1
Mar 13, Mar 15 - Spring Break (no classes)
Mar 20 - Lecture 15

  • Information Channel Capacity
  • Rate of a channel code
  • Operational Channel Capacity
  Readings: Cover-Thomas 7.1, 7.2, 7.5; MacKay 9.1-9.6
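The textbook example of these definitions: a binary symmetric channel with crossover probability p has capacity C = 1 - H(p) bits per channel use, achieved by a uniform input (a small sketch):

    import math

    def h2(p):
        # binary entropy function, in bits
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.01, 0.11, 0.5):
        print(p, round(1 - h2(p), 4))
    # p = 0.5 gives C = 0: the output carries no information about the input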
Mar 22 - Lecture 16

  • Joint Typicality
  • Channel Coding Theorem
  Readings: Cover-Thomas 7.4-7.9
Mar 27 - Lecture 17

  • Feedback capacity
  • Joint Source-Channel Coding
  • Continuous (Gaussian) Channels - Information Capacity
  Readings: Cover-Thomas 7.12, 7.13, 9.1
Mar 29 - Class Cancelled
Apr 3 - Lecture 18

  • Gaussian channel - Operational Capacity, Continuous-time
  • Parallel channels (independent and correlated)
  • Rate-distortion theory
  Readings: Cover-Thomas 9.1-9.5, 10.1-10.3
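For the parallel-channels topic, a sketch of water-filling: allocate power P_i = max(nu - N_i, 0) across independent Gaussian subchannels with noise levels N_i, choosing the water level nu so the allocations sum to the power budget (the noise levels and budget are made-up numbers):

    import math

    def waterfill(noise, power, iters=100):
        lo, hi = min(noise), max(noise) + power
        for _ in range(iters):          # bisect on the water level nu
            nu = (lo + hi) / 2
            used = sum(max(nu - n, 0.0) for n in noise)
            lo, hi = (nu, hi) if used < power else (lo, nu)
        return [max(nu - n, 0.0) for n in noise]

    noise = [1.0, 2.0, 4.0]
    alloc = waterfill(noise, 4.0)       # expect [2.5, 1.5, 0]: the worst channel goes unused
    cap = sum(0.5 * math.log2(1 + p / n) for p, n in zip(alloc, noise))
    print([round(p, 3) for p in alloc], round(cap, 3))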
Apr 5 - Lecture 19

  • Redundancy-Capacity Theorem
HW4 released (Solutions)
Apr 10 - Lecture 20

  • Lower bounds on redundancy
  • General Loss functions
  • Prediction of Individual Sequences and Online learning with mixture of experts
Apr 12 - Lecture 21

  • Hypothesis Testing: Neyman-Pearson, Bayesian
  • Method of Types
  • Large Deviations - Sanov's Theorem
  Readings: Cover-Thomas 11.1, 11.4, 11.5
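Sanov's theorem in one number: for fair-coin flips, P(fraction of heads >= 0.7) decays like 2^(-n D) with D = D(Bern(0.7) || Bern(0.5)), the KL divergence from the closest point of the constraint set to the true distribution; the exact exponent approaches D slowly because of polynomial factors (a sketch):

    import math

    def kl_bern(a, b):
        return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

    D = kl_bern(0.7, 0.5)               # about 0.1187 bits
    for n in (10, 100, 1000):
        # exact P(#heads >= 0.7 n) for n fair-coin flips
        p = sum(math.comb(n, k) for k in range(math.ceil(0.7 * n), n + 1)) / 2 ** n
        print(n, round(-math.log2(p) / n, 4), 'vs D =', round(D, 4))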
Apr 17 - Lecture 22

  • Error Exponents in Hypothesis Testing
  • Generalized Likelihood Ratio
  Readings: Cover-Thomas 11.7, 11.8, 11.9
Apr 19 - No Class (Spring Carnival)
Apr 24 - HW5 released
Apr 26 - Quiz 2
May 1 - Project Presentations
May 3 - Project Presentations