Joseph K. Bradley

jkbradle (yes, without the y) at cs dot cmu dot edu

I am an Apache Spark committer working at Databricks, the company founded by the creators of Spark and still the leader of Spark development. I spend most of my time working on MLlib, the machine learning library built on top of Spark.


Previously, I spent a year as a postdoc working with Kannan Ramchandran and Martin Wainwright at UC Berkeley. I received my Ph.D. in Machine Learning from Carnegie Mellon University, where I worked with Carlos Guestrin in the Select Lab. I received my B.S.E. in Computer Science from Princeton University, where I did research with Robert E. Schapire.

Research of Olde

I am interested in large-scale machine learning, especially in trade-offs between sample complexity, computational complexity, and potential for parallelization. My approach combines theory and application, focusing on methods that have strong theoretical guarantees and are competitive in practice.

Selected publications from current and past research appear below. Code and data are available on the linked project pages. For Spark, go straight to the source.


M. Armbrust, R. Xin, C. Lian, Y. Huai, D. Liu, J. Bradley, X. Meng, T. Kaftan, M. Franklin, A. Ghodsi and M. Zaharia.
Spark SQL: Relational Data Processing in Spark.
ACM SIGMOD International Conference on Management of Data (SIGMOD), 2015.
Paper (PDF)

X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, DB Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M.J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar.
MLlib: Machine Learning in Apache Spark.
Accepted to the Journal of Machine Learning Research (JMLR), MLOSS track.
Paper (PDF on arXiv)

Xiao Li, Joseph K. Bradley, Sameer Pawar, and Kannan Ramchandran.
Robustifying the Sparse Walsh-Hadamard Transform without Increasing the Sample Complexity of O(K log N).
IEEE International Symposium on Information Theory (ISIT), 2014.
Tech Report version of ISIT 2014 paper (PDF)

Nihar B. Shah, Joseph K. Bradley, Abhay Parekh, Martin Wainwright, and Kannan Ramchandran.
A Case for Ordinal Peer-evaluation in MOOCs.
NIPS Workshop on Data Driven Education, 2013.
Paper (PDF)

Joseph K. Bradley.
Learning Large-Scale Conditional Random Fields.
Ph.D. Thesis, Machine Learning Department, Carnegie Mellon University, 2013.
Thesis (PDF)
Defense Slides (PPT)

Joseph K. Bradley and Carlos Guestrin.
Sample Complexity of Composite Likelihood.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
Paper (PDF)
Poster (PPT)
Video of CMU Machine Learning Lunch talk (Vimeo)
Slides from CMU Machine Learning Lunch talk (PPT)

Joseph K. Bradley, Aapo Kyrola, Danny Bickson, and Carlos Guestrin.
Parallel Coordinate Descent for L1-Regularized Loss Minimization.
International Conference on Machine Learning (ICML), 2011.
Paper (PDF)
Talk slides (PPT)
Project page (with code, data, and supplementary material)
Video of ICML talk

Joseph K. Bradley and Carlos Guestrin.
Learning Tree Conditional Random Fields.
International Conference on Machine Learning (ICML), 2010.
Paper (PDF)
Talk slides (PPT)

Joseph K. Bradley and Robert E. Schapire.
FilterBoost: Regression and Classification on Large Datasets.
Advances in Neural Information Processing Systems (NIPS), 2008.
Paper (PDF) with Appendix
Slides (PPT) from oral at NIPS
Data Analysis Project report, with multiclass extensions


I do competitive Latin, Standard and Smooth ballroom dancing. It's awesome. You should do it too. (Check out CMU's Ballroom Dance Club!)

I like traveling.