My doctoral research focuses on enabling machine learning researchers and practitioners to efficiently train large, complex models on big data using distributed clusters.
I spent several years developing parameter server systems
that make machine learning programs run fast and efficiently, and I worked
on automating dependence-aware parallelization
of serial, imperative ML programs for distributed training.
To complete my thesis, I am now working on dynamic scheduling
(i.e., distributed device placement) for neural network training.