Ensemble Learning

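Combining a number of (often weak) learners into a single model that predicts better than any one of them. Bagging-style methods mainly reduce variance, and hence overfitting; boosting mainly reduces bias.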
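A standard way to see why averaging reduces variance (this is not from the linked material; it assumes B identically distributed learners f_b, each with variance σ² and pairwise correlation ρ):

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right)
  = \frac{1}{B^2}\left(B\sigma^2 + B(B-1)\,\rho\,\sigma^2\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

The second term shrinks as B grows, but the first does not, so the learners must also be decorrelated (small ρ) for averaging to remove most of the variance.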

Bagging (bootstrap aggregating): draw a sample of the training data with replacement and train a learner on it; repeat for each learner. The output is the average (regression) or the majority vote (classification) of all learners.
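A minimal bagging sketch in plain Python, assuming the data is a list of (x, y) pairs and a base learner is any function that takes a sample and returns a predictor. All names (`fit_bagged`, `fit_nearest_neighbor`, etc.) are hypothetical; the 1-nearest-neighbour base learner is chosen only for brevity, since bagging pays off most with unstable learners such as deep decision trees.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def fit_bagged(data, fit_base, n_models=25, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_base(bootstrap_sample(data, rng)) for _ in range(n_models)]


def predict_average(models, x):
    """Regression: average the base learners' predictions."""
    return mean(m(x) for m in models)


def predict_vote(models, x):
    """Classification: majority vote over the base learners' labels."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def fit_nearest_neighbor(sample):
    """Hypothetical base learner: 1-nearest-neighbour on (x, y) pairs."""
    def predict(x):
        return min(sample, key=lambda p: abs(p[0] - x))[1]
    return predict
```

For example, `models = fit_bagged(data, fit_nearest_neighbor, n_models=50)` followed by `predict_average(models, 4.0)` for regression, or `predict_vote(models, 4.0)` when the y values are class labels.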
Boosting: train models iteratively, with each new model trying to fix the mistakes of the ones before it. Boosting can overfit, e.g. when run for too many rounds on noisy data.
- Gradient descent in function space: http://maths.dur.ac.uk/~dma6kp/pdf/face_recognition/Boosting/Mason99AnyboostLong.pdf
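One classic concrete form of "each new model fixes previous mistakes" is discrete AdaBoost, sketched below with 1-D decision stumps and labels in {+1, -1}. The note does not name a specific boosting algorithm, so AdaBoost is an illustrative choice here, and all function names are hypothetical.

```python
import math


def stump_predict(x, threshold, sign):
    """Decision stump on 1-D input: +sign above the threshold, -sign otherwise."""
    return sign if x > threshold else -sign


def fit_stump(xs, ys, w):
    """Return (weighted_error, threshold, sign) of the best stump under weights w."""
    values = sorted(set(xs))
    best = None
    for threshold in [values[0] - 1.0] + values:
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, threshold, sign) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best


def adaboost(xs, ys, n_rounds=10):
    """Discrete AdaBoost with stumps; labels ys must be +1 / -1."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, sign)
    for _ in range(n_rounds):
        err, threshold, sign = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Misclassified points (yi * h(xi) = -1) are up-weighted,
        # correctly classified ones down-weighted, then renormalised.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, threshold, sign))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1
```

On xs = [1, 2, 3] with labels [1, -1, 1], no single stump classifies all three points, but three rounds of this sketch do: the second and third stumps are fitted under weights that emphasise the points the earlier stumps got wrong.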

Examples of ensemble learning methods:

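Random Forest: bagging with decision trees. Each tree is trained on a sample of the data drawn with replacement, only a random subset of the features is considered at each split, and trees are grown as large as possible, without pruning (see http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm). The data and feature sampling make the trees much less correlated with each other, so averaging them cancels out more of their variance.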
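A compact random-forest sketch in plain Python, assuming numeric feature tuples, hashable non-tuple class labels, Gini impurity for choosing splits, and m = floor(sqrt(number of features)) features per split by default. The rule "look at m random features, but keep looking if none of them gives a valid split" is a design choice of this sketch; all names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, m, rng):
    """Grow an unpruned tree; leaves are labels, internal nodes are
    (feature, threshold, left, right) tuples, so labels must not be tuples."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node: stop
    n_features = len(rows[0])
    best = None
    # Visit features in random order; stop after m unless no valid split yet.
    for k, f in enumerate(rng.sample(range(n_features), n_features)):
        if k >= m and best is not None:
            break
        for t in sorted(set(r[f] for r in rows)):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not right:
                continue  # t is the maximum value: not a real split
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        # All rows identical but labels differ: fall back to a majority leaf.
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best

    def child(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx], m, rng)

    return (f, t, child(left), child(right))


def predict_tree(node, x):
    """Walk down the tree until a leaf (a non-tuple label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(rows, labels, n_trees=25, m=None, seed=0):
    """Bootstrap the data for each tree and grow it to full depth."""
    rng = random.Random(seed)
    m = m or max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

Because each tree is grown until its leaves are pure, a single tree fits its own (distinct) training rows exactly; the randomness in the bootstrap and in the per-split feature choice is what keeps the trees from all making the same mistakes.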
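Gradient Boosting: learns an additive combination of weak learners (typically small trees) that can approximate an arbitrary function. It is iterative and uses functional gradient descent: each new learner is fitted to the negative gradient of the loss with respect to the current model's predictions, which for squared loss is just the residuals.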
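A minimal gradient-boosting sketch, assuming squared loss, 1-D inputs, least-squares regression stumps as the weak learners, and a fixed learning rate (shrinkage). These choices and the function names are illustrative, not prescribed by the note.

```python
def fit_regression_stump(xs, targets):
    """Least-squares stump on 1-D inputs (needs at least two distinct xs)."""
    best = None
    for cut in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= cut]
        right = [r for x, r in zip(xs, targets) if x > cut]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, cut, lm, rm)
    _, cut, lm, rm = best
    return lambda x: lm if x <= cut else rm


def gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared loss L(y, F) = 0.5 * (y - F)**2."""
    f0 = sum(ys) / len(ys)  # best constant model under squared loss
    stumps = []
    preds = [f0] * len(xs)
    for _ in range(n_rounds):
        # Negative functional gradient -dL/dF at each training point = residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_regression_stump(xs, residuals)
        stumps.append(h)
        # Take a small step in the direction of the fitted gradient.
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(s(x) for s in stumps)
```

With a different loss, only the `residuals` line changes: it becomes the negative derivative of that loss with respect to the current prediction.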
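Comparison (http://jessica2.msri.org/attachments/10778/10778-boost.pdf): boosting seems to win most of the time, but random forest training is easy to parallelize because its trees are built independently, whereas boosting rounds must run one after another.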
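The parallelism point can be made concrete: in bagging and random forests each learner depends only on the data and its own random seed, so the learners can be trained in any order on any worker. The sketch below uses a hypothetical, deliberately trivial base learner (the mean of a bootstrap sample) and `concurrent.futures.ThreadPoolExecutor`; in CPython, threads will not speed up pure-Python CPU-bound training because of the GIL, so `ProcessPoolExecutor` (same `map` interface) is the usual swap for real speedups.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def train_one(args):
    """Fit one hypothetical base learner (a bootstrap-sample mean).
    It depends only on (data, seed), never on the other learners."""
    data, seed = args
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]
    return mean(sample)


def train_bagged_parallel(data, n_models=8, max_workers=4):
    """Train all learners concurrently; pool.map keeps results in seed order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(train_one, [(data, s) for s in range(n_models)]))
```

Running the same seeds sequentially gives identical learners, which is exactly the independence boosting lacks: its round k needs the model from round k-1.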

Reading