When training machine learning models with many variables, a single machine is often inadequate because (1) the variables may not fit in memory, and (2) training can take a long time. A natural choice is to turn to distributed computing on a cluster. However, naive parallelization of machine learning algorithms can use distributed memory inefficiently while failing to speed up convergence or to preserve the correctness of inference. Such problems are often caused by dependencies between variables.
In this talk, I will explore how to partition variables and schedule their updates in distributed machine learning algorithms, in order to improve memory efficiency and speed up convergence without compromising correctness. Specifically, I will describe how partitioning and scheduling strategies can scale up models with many variables, including Lasso, Matrix Factorization, and Latent Dirichlet Allocation. Experiments will demonstrate the speed and scalability of our approach in comparison to other parallel machine learning methods.
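
To make the idea of dependency-aware scheduling concrete, here is a minimal sketch for Lasso coordinate descent. It is an illustration under stated assumptions, not necessarily the method presented in the talk: the function names (`schedule_block`, `lasso_scheduled`), the correlation threshold `max_corr`, and the greedy selection rule are all hypothetical. Feature columns are assumed to have unit norm, and the parallel workers are simulated sequentially: every coordinate in a scheduled block computes its update from the same residual, as concurrent workers would.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_threshold(z, lam):
    """Proximal step for the L1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def schedule_block(cols, candidates, max_corr, block_size):
    """Greedily pick coordinates whose feature columns are pairwise weakly
    correlated, so that updating them together causes little interference.
    (Hypothetical scheduling rule, chosen for illustration.)"""
    block = []
    for j in candidates:
        if all(abs(dot(cols[j], cols[k])) <= max_corr for k in block):
            block.append(j)
        if len(block) == block_size:
            break
    return block

def lasso_scheduled(cols, y, lam, n_rounds=200, block_size=4,
                    max_corr=0.1, seed=0):
    """Minimize 0.5 * ||y - X beta||^2 + lam * ||beta||_1, where cols[j] is
    column j of X. Assumes every column has unit norm."""
    rng = random.Random(seed)
    d = len(cols)
    beta = [0.0] * d
    residual = list(y)  # equals y - X beta while beta is all zeros
    for _ in range(n_rounds):
        order = rng.sample(range(d), d)  # random candidate order
        block = schedule_block(cols, order, max_corr, block_size)
        # All updates in the block read the same residual, mimicking
        # workers that run concurrently within one round.
        updates = {
            j: soft_threshold(dot(cols[j], residual) + beta[j], lam)
            for j in block
        }
        # Apply the updates and bring the residual up to date.
        for j, new_value in updates.items():
            delta = new_value - beta[j]
            residual = [r - delta * x for r, x in zip(residual, cols[j])]
            beta[j] = new_value
    return beta
```

With orthogonal columns, for example `lasso_scheduled([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [3.0, -0.5, 1.5], lam=1.0, block_size=2)`, the coordinates do not interact and the result matches the closed-form soft-thresholded solution. When two columns are strongly correlated, the scheduler never places them in the same block, which is the kind of dependency that naive parallel updates ignore.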