I am an Assistant Professor in the Computer Science Department at Carnegie Mellon University, with a courtesy appointment in the ECE department. I lead the TheSys research group at CMU, and I am also a part of the Parallel Data Lab (PDL). My research interests lie in the broad area of computer and networked systems, with a current focus on reliability, availability, scalability, and performance challenges in data storage and caching systems, in systems for machine learning, and in live video streaming.
The bulk of my past research has focused on the storage/caching layer and, in part, on the application (specifically, machine learning) layer:
- Storage/caching: My research focus here has been on fault tolerance, scalability, load balancing, and reducing latency in large-scale distributed data storage and caching systems. We designed coding-theory-based solutions that we showed to be provably optimal. We also built systems and evaluated them on Facebook's data-analytics cluster and on Amazon EC2, showing significant benefits over the state of the art. Our solutions are now a part of Apache Hadoop 3.0 and are also being considered by several companies such as NetApp and Cisco.
- Machine learning: My research focus here has been on the generalization performance of a class of learning algorithms that are widely used for ranking. We designed an algorithm that builds on Multiple Additive Regression Trees, and through empirical evaluation on real-world datasets showed significant improvement on classification, regression, and ranking tasks. The new algorithm that we proposed is now deployed in production in Microsoft's data-analysis toolbox, which powers the Azure Machine Learning product.