
SCS Student Seminar Series and Computer Science Speaking Skills Talk

Speaking Skills
Computer Science Department
Carnegie Mellon University
Scheduling and Partitioning Big Models for Parallel Machine Learning
Friday, March 28, 2014 - 12:00pm
McWilliams Classroom 4303
Gates & Hillman Centers
Abstract:

When training machine learning models with many variables, a single machine is often inadequate: (1) the variables may be too numerous to fit in memory, and (2) training can take a long time. A natural remedy is distributed computing on a cluster. However, naive parallelization of machine learning algorithms can use distributed memory inefficiently while failing to achieve convergence speed-ups or to preserve inference correctness. Such inefficiencies are often caused by dependencies between variables.

In this talk, we will explore how partitioning the variables of distributed machine learning algorithms and scheduling their updates can improve memory efficiency and speed up convergence without compromising correctness. Specifically, I will describe how partitioning and scheduling strategies can scale up models with many variables, including Lasso, Matrix Factorization, and Latent Dirichlet Allocation. Experiments demonstrate the speed and scalability of our approach in comparison to other parallel machine learning methods.
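To make the scheduling idea concrete, the following is a minimal Python/NumPy sketch of one way dependency-aware update scheduling could look for parallel coordinate-descent Lasso. It illustrates the general technique only, not the speaker's actual system, and every name and threshold in it (choose_batch, CORR_THRESHOLD, soft_threshold, batch_size) is an assumption made for the example. The idea: coordinates whose feature columns are nearly uncorrelated are batched together, since their updates barely perturb one another and can safely run in parallel.

    # Minimal sketch (not the speaker's system) of dependency-aware scheduling
    # for parallel coordinate-descent Lasso. CORR_THRESHOLD, choose_batch, and
    # soft_threshold are illustrative names chosen for this example.
    import numpy as np

    CORR_THRESHOLD = 0.1   # assumed cutoff: |x_i . x_j| below this ~ "independent"
    LAMBDA = 0.1           # L1 regularization strength

    def soft_threshold(z, t):
        """Standard soft-thresholding operator used in the Lasso update."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def choose_batch(corr, batch_size):
        """Greedily pick coordinates whose pairwise correlation is small,
        so their updates do not conflict (the 'scheduling' step)."""
        order = np.random.permutation(corr.shape[0])
        batch = []
        for j in order:
            if all(abs(corr[j, k]) < CORR_THRESHOLD for k in batch):
                batch.append(j)
            if len(batch) == batch_size:
                break
        return batch

    def parallel_lasso(X, y, iters=100, batch_size=8):
        X = X / np.linalg.norm(X, axis=0)   # unit-norm columns simplify the update
        corr = X.T @ X                      # dependency structure between coordinates
        w = np.zeros(X.shape[1])
        residual = y.copy()                 # residual y - Xw (w starts at zero)
        for _ in range(iters):
            batch = choose_batch(corr, batch_size)
            # These updates are nearly independent given the batch, so on a
            # cluster each one could run on a different worker.
            for j in batch:
                rho = X[:, j] @ residual + w[j]       # partial residual correlation
                w_new = soft_threshold(rho, LAMBDA)
                residual -= X[:, j] * (w_new - w[j])  # keep residual consistent
                w[j] = w_new
        return w

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 50))
        true_w = np.zeros(50)
        true_w[:5] = 1.0
        y = X @ true_w + 0.01 * rng.normal(size=200)
        w = parallel_lasso(X, y)
        print("nonzeros recovered:", np.flatnonzero(np.abs(w) > 0.1))

The batch-selection step is what distinguishes this from naive parallelization: updating two strongly correlated coordinates at the same time can undo each other's progress or break correctness, whereas nearly independent coordinates can be dispatched to separate workers safely.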

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.

For More Information, Please Contact:

deb@cs.cmu.edu