
I have been a Solutions Architect and Software Engineer at Databricks and an academic researcher, all in various parts of the Machine Learning and advanced analytics space.


Summary of work

I currently work at Databricks, the company founded by the original creators of Apache Spark, Delta Lake, and MLflow. I spent my first 5.5 years there leading some of our ML efforts from the engineering side, both as an Apache Spark committer and PMC member working on open source and as a tech lead working on the Databricks product. I am now an ML specialist in the Solutions Architect organization, working more directly with customers.

Previously, I spent a year as a postdoc working with Kannan Ramchandran and Martin Wainwright at UC Berkeley. I received my Ph.D. in Machine Learning from Carnegie Mellon University, where I worked with Carlos Guestrin. I received my B.S.E. in Computer Science from Princeton University, where I did research with Robert E. Schapire.

Blog posts, talks, etc. while at Databricks

My public talks and blog posts can mostly be found via:

Open source work

I did most of my open source work during my earlier years at Databricks. You can find it by looking at:

Research from years past

My research was generally in large-scale machine learning, especially in trade-offs between sample complexity, computational complexity, and potential for parallelization. My approach combined theory and application, focusing on methods which have strong theoretical guarantees and are competitive in practice.

Selected topics of current and past research:

Academic publications

| Year | Title | Authors | Venue | Documents |
|------|-------|---------|-------|-----------|
| 2016 | Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale | F. Abuzaid, J. Bradley, F. Liang, A. Feng, L. Yang, M. Zaharia, A. Talwalkar | NeurIPS | PDF |
| 2016 | Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence | N. Shah, S. Balakrishnan, J. Bradley, A. Parekh, K. Ramchandran and M. Wainwright | JMLR 17(58): 1-47, 2016 | PDF; Earlier version in AISTATS 2015 |
| 2016 | MLlib: Machine Learning in Apache Spark | X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, DB Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M.J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar | JMLR 17(1): 1-7, 2016 | arxiv |
| 2015 | Spark SQL: Relational Data Processing in Spark | M. Armbrust, R. Xin, C. Lian, Y. Huai, D. Liu, J. Bradley, X. Meng, T. Kaftan, M. Franklin, A. Ghodsi and M. Zaharia | SIGMOD | PDF |
| 2015 | Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence | N. Shah, S. Balakrishnan, J. Bradley, A. Parekh, K. Ramchandran and M. Wainwright | AISTATS | PDF; supplement |
| 2014 | Robustifying the Sparse Walsh-Hadamard Transform without Increasing the Sample Complexity of O(K log N) | Xiao Li, Joseph K. Bradley, Sameer Pawar, and Kannan Ramchandran | IEEE International Symposium on Information Theory (ISIT) | PDF |
| 2013 | A Case for Ordinal Peer-evaluation in MOOCs | Nihar B. Shah, Joseph K. Bradley, Abhay Parekh, Martin Wainwright, and Kannan Ramchandran | NeurIPS Workshop on Data Driven Education | PDF |
| 2013 | Learning Large-Scale Conditional Random Fields | Joseph K. Bradley | Ph.D. Thesis, Machine Learning Department, Carnegie Mellon University | Thesis PDF; Defense PPT |
| 2012 | Sample Complexity of Composite Likelihood | Joseph K. Bradley and Carlos Guestrin | International Conference on Artificial Intelligence and Statistics (AISTATS) | PDF; poster PPT |
| 2011 | Parallel Coordinate Descent for L1-Regularized Loss Minimization | Joseph K. Bradley, Aapo Kyrola, Danny Bickson, and Carlos Guestrin | International Conference on Machine Learning (ICML) | arxiv; Corrected PDF; Theory supplement; Scalability analysis; Lasso benchmark; Logreg benchmark |
| 2010 | Learning Tree Conditional Random Fields | Joseph K. Bradley and Carlos Guestrin | International Conference on Machine Learning (ICML) | PDF |
| 2008 | FilterBoost: Regression and Classification on Large Datasets | Joseph K. Bradley and Robert E. Schapire | NeurIPS | PDF, with appendix; slides; CMU Data Analysis Project version, with multiclass extensions |