Quantitatively Tight Sample Complexity Bounds

John Langford
I present many new results on sample complexity bounds (bounds on the future error rate of arbitrary learning algorithms). Of theoretical interest are qualitative and quantitative improvements in these bounds, as well as techniques and criteria for judging their tightness.

On the practical side, I show quantitative results (with true error rate bounds sometimes less than 0.01) obtained by applying these sample complexity bounds to decision trees and neural networks on real-world problems. I also present a technique for combining sample complexity bounds with (more traditional) holdout bounds.

Together, the theoretical and practical results of this thesis provide a well-founded practical method for evaluating learning algorithm performance based upon both training and testing set performance.

Code for calculating these bounds is provided.
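To give a flavor of the simplest bound in this family, the sketch below computes a holdout (test set) bound by inverting the binomial tail: given k errors on m held-out examples, it returns the largest true error rate p for which observing at most k errors still has probability at least delta. This is a minimal illustration only; the function name and the use of scipy are my own assumptions here, not the code provided with the thesis.

    # Minimal sketch of a binomial-tail-inversion holdout bound (illustrative only).
    from scipy.stats import binom

    def holdout_error_bound(k, m, delta=0.05, tol=1e-9):
        """Upper bound on the true error rate, valid with probability >= 1 - delta,
        given k observed errors on m independent holdout examples."""
        lo, hi = k / m, 1.0
        while hi - lo > tol:
            p = (lo + hi) / 2
            # binom.cdf(k, m, p) = P[at most k errors in m trials at error rate p]
            if binom.cdf(k, m, p) >= delta:
                lo = p  # tail probability still >= delta: the bound can be pushed higher
            else:
                hi = p
        return hi

    # Example: 5 errors on 1000 holdout examples at 95% confidence.
    print(holdout_error_bound(5, 1000, delta=0.05))

The binary search works because the binomial tail probability is decreasing in p, so the set of error rates consistent with the observation at confidence delta is an interval whose right endpoint is the reported bound.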

  Contents
Part 1.  Introductory Learning Theory
Chapter 1.  Informal Introduction
 1.1.  The learning problem
 1.2.  The problem with the learning problem
 1.3.  A plethora of learning models
 1.4.  The oblivious passive supervised learning model
 1.5.  Questions we can answer
Chapter 2.  Formal Model and Context
 2.1.  Formal Model
 2.2.  Relationship to Prior Work
 2.3.  Overview of the document
Chapter 3.  Basic Observations
 3.1.  The Basic Building Block
 3.2.  Approximation techniques
 3.3.  Binomial Tail calculation techniques
 3.4.  Converting to a P-value approach
 3.5.  Bounding the Union
 3.6.  Arbitrary Loss functions
Chapter 4.  Simple Sample Complexity bounds
 4.1.  Simple Holdout
 4.2.  The basic training set bound
 4.3.  Lower Bounds
 4.4.  Lower Upper Bounds
 4.5.  Structural Risk Minimization
 4.6.  Incorporating a “Prior”
Part 2.  New Techniques
Chapter 5.  Microchoice Bounds (the algebra of choices)
 5.1.  A Motivating Observation
 5.2.  The Simple Microchoice Bound
 5.3.  Combining Microchoice with Freund’s Query Tree approach
 5.4.  Microchoice discussion
Chapter 6.  PAC-Bayes bounds
 6.1.  PAC-Bayes Basics
 6.2.  A Tighter PAC-Bayes Bound
 6.3.  PAC-Bayes Approximations
 6.4.  Application of the PAC-Bayes bound
Chapter 7.  Averaging Bounds (Improved margin)
 7.1.  Earlier Results
 7.2.  A generalized averaging bound
 7.3.  Proof of main theorem
 7.4.  Methods for tightening
 7.5.  Final thoughts for Averaging Bounds
Chapter 8.  Computable Shell bounds
 8.1.  The Discrete Shell Bound
 8.2.  Sampling Shell Bound
 8.3.  Lower Bounds
 8.4.  Shell Bounds for Continuous Spaces
 8.5.  Conclusion
Chapter 9.  Tight covering number bounds
 9.1.  Introduction
 9.2.  The Setting and Prior Results
 9.3.  Bracketing Covering Number Bound
 9.4.  Covering number calculations
 9.5.  Conclusion and Future Work
Chapter 10.  Holdout bounds: Progressive Validation
 10.1.  Progressive Validation Technique
 10.2.  Variance Analysis
 10.3.  Deviation Analysis
 10.4.  A Quick Experiment
 10.5.  Conclusion
Chapter 11.  Combining sample complexity and holdout bounds
 11.1.  Combination Possibilities
 11.2.  General Approaches for Combined Bounds
 11.3.  Approximations in Combinations
 11.4.  Conclusion
Part 3.  Experimental Results
Chapter 12.  Decision Trees
 12.1.  The Decision Tree Learning Algorithm
 12.2.  Bound Application Details
 12.3.  Results & Discussion
 12.4.  Discussion
Chapter 13.  Neural Networks
 13.1.  Theoretical setup
 13.2.  Experimental Results
 13.3.  Conclusion
Chapter 14.  Conclusion & Challenges
  Bibliography
Chapter 15.  Appendix: Definitions
Chapter 16.  Appendix: Manual
 16.1.  Test Error Bound Calculation
 16.2.  Training Set Bound Calculation
 16.3.  Shell Bound Calculation
 16.4.  Combined Bound Calculation