The large representational capacity of deep learning models is often viewed as a positive attribute that allows them to learn interactions among many input variables. However, large model classes also present challenges for estimation. In this talk, we take special interest in learning interaction effects.
First, we precisely define interaction effects through the statistical framework of the functional ANOVA. In giving care to this definition, we encounter several surprising findings about the nature of interaction effects (e.g., all pure interaction effects look like XOR). Next, we find that traditional machine learning models (such as tree-based models) gain almost all of their predictive power from low-order interaction effects. Turning to deep models, we find that fully-connected networks tend to estimate a large number of spurious interaction effects, and that Dropout regularizes these away. As a result, we find that even complicated models such as deep neural networks tend to gain much of their predictive power from low-order interactions.
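To make the XOR observation concrete, the following is a minimal sketch (not the speakers' code) of a functional ANOVA decomposition on the Boolean square under the uniform measure: the main effects are centered conditional means, and the interaction term is the remainder. For XOR, every main effect vanishes and all of the signal lands in the interaction term, which is the sense in which a pure two-way interaction "looks like XOR".

```python
import itertools
import numpy as np

def anova_decompose(f):
    """Functional ANOVA of a function on {0,1}^2 under the uniform measure.

    Returns the grand mean, the two main effects (centered conditional
    means), and the pure interaction term (the remainder).
    """
    grid = [(a, b) for a, b in itertools.product([0, 1], repeat=2)]
    vals = np.array([f(a, b) for a, b in grid], dtype=float)
    mu = vals.mean()  # grand mean
    # Main effect of each input: average out the other input, then center.
    m1 = {a: vals[[g[0] == a for g in grid]].mean() - mu for a in (0, 1)}
    m2 = {b: vals[[g[1] == b for g in grid]].mean() - mu for b in (0, 1)}
    # Pure interaction: whatever the additive parts cannot explain.
    inter = {(a, b): f(a, b) - mu - m1[a] - m2[b] for a, b in grid}
    return mu, m1, m2, inter

# XOR has zero main effects: the interaction term carries everything.
mu, m1, m2, inter = anova_decompose(lambda a, b: a ^ b)
print(mu)     # 0.5
print(m1, m2) # all main effects are 0.0
print(inter)  # {(0,0): -0.5, (0,1): 0.5, (1,0): 0.5, (1,1): -0.5}
```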
This talk is based on joint work with Rich Caruana and Eric Xing.
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.