As machine learning is pushed to understand billions of tweets, photos, and people, our algorithms need to both handle terabytes of information and fit billions of parameters. However, most modern algorithms focus on distributed learning for either big data or huge models, but not both. Here, we focus on scaling stochastic gradient descent (SGD) for both big data and huge latent factor models, including coupled tensor decomposition, topic modeling, and many others.
We offer a scalable, distributed system to learn a wide variety of latent factor models. To do this, we first explain how SGD can be used for distributed learning on a wide variety of models, including different partitioning schemes and constraints. Second, we explain how to efficiently implement such a distributed learning framework, even on standard Hadoop. Third, we offer a novel, easy-to-use technique for improving efficiency in the face of stragglers during distributed computing.
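To illustrate the core idea behind partitioning SGD across workers, here is a minimal sketch of block-partitioned SGD for matrix factorization, in the style of distributed SGD (DSGD) schemes: rows and columns are split into P partitions, and in each sub-epoch P blocks touching disjoint row and column ranges are processed, so they could be updated by P workers in parallel with no conflicting writes. This is not the actual system described above; the function name and all parameters are illustrative, and the "parallel" blocks are run sequentially here for clarity.

```python
import numpy as np

def block_sgd_mf(R, rank=2, P=2, epochs=200, lr=0.02, reg=1e-3, seed=0):
    """Factorize R ~= U @ V.T with block-partitioned SGD (illustrative sketch).

    Rows and columns are split into P partitions. In sub-epoch s, the P
    blocks (i, (i + s) % P) cover disjoint row/column ranges, so each block
    could be assigned to a separate worker without write conflicts. Here we
    simply iterate over them sequentially.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    row_parts = np.array_split(np.arange(n), P)
    col_parts = np.array_split(np.arange(m), P)
    for _ in range(epochs):
        for s in range(P):          # P sub-epochs make one full pass over R
            for i in range(P):      # these P blocks are mutually independent
                rows = row_parts[i]
                cols = col_parts[(i + s) % P]
                for r in rows:
                    for c in cols:
                        err = R[r, c] - U[r] @ V[c]
                        # simultaneous regularized SGD step on both factors
                        U[r], V[c] = (U[r] + lr * (err * V[c] - reg * U[r]),
                                      V[c] + lr * (err * U[r] - reg * V[c]))
    return U, V
```

The key design point is the block schedule: because no two blocks in a sub-epoch share a row of U or a column of V, workers need no locking, and only factor blocks (not the data) move between sub-epochs.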
With all of these components, we demonstrate that our system is more efficient and scales to larger problems than other state-of-the-art distributed learning systems. We scale to billions of parameters and high-rank decompositions while using fewer machines than other systems.
Based on work done with Abhimanu Kumar, Vagelis Papalexakis, Partha Talukdar, Qirong Ho, Christos Faloutsos, and Eric Xing.