My thesis is on tractable learning methods for large-scale Conditional Random Fields (CRFs) (Lafferty et al., 2001). CRFs are probabilistic graphical models of conditional distributions P(Y|X), where Y and X are sets of random variables. The thesis has three parts: CRF parameter learning, CRF structure learning, and parallel learning for CRFs.
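For concreteness, a standard log-linear CRF (as in Lafferty et al., 2001) models the conditional distribution as

```latex
P(Y \mid X) \;=\; \frac{1}{Z(X)} \exp\!\Big( \sum_k \theta_k f_k(Y, X) \Big),
\qquad
Z(X) \;=\; \sum_{Y'} \exp\!\Big( \sum_k \theta_k f_k(Y', X) \Big),
```

where the $f_k$ are feature functions over (subsets of) $Y$ and $X$, the $\theta_k$ are parameters, and the partition function $Z(X)$ is what makes exact inference, and hence exact likelihood computation, intractable for general structures.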
I am researching tractable methods for learning the parameters of CRFs with arbitrary structure. We use decomposable learning methods (composite likelihood) that avoid intractable inference during learning, yet still carry strong theoretical guarantees for finite sample sizes.
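As a sketch of the general idea (not the specific estimator in the thesis): composite likelihood replaces the full log-likelihood with a sum of tractable conditional terms. Given components $A_1, \dots, A_m$, each a subset of the variables in $Y$, the objective is

```latex
\ell_{CL}(\theta) \;=\; \sum_{i=1}^{m} \log P_\theta\big(Y_{A_i} \mid Y_{\setminus A_i}, X\big).
```

Each term conditions on the rest of $Y$, so its normalization only sums over assignments to the small set $A_i$; pseudolikelihood is the special case where each $A_i$ is a single variable.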
I am also researching the learning of tractable (low-treewidth) structures for CRFs. Little prior work exists on CRF structure learning; our techniques permit efficient learning of tree structures, recover ground-truth models well from synthetic data, and perform well on an fMRI application.
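For intuition about tree structure learning, here is a minimal sketch of the classic Chow-Liu approach for generative models: build a maximum spanning tree over pairwise empirical mutual information. This is an analog, not the thesis's method (CRF variants typically score edges with conditional quantities instead); the names `mutual_information` and `chow_liu_tree` are my own.

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information between two discrete sample vectors."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                p_a = np.mean(x == a)
                p_b = np.mean(y == b)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(data):
    """Maximum spanning tree over pairwise MI (Kruskal with union-find).

    data: (n_samples, n_vars) array of discrete values.
    Returns a list of (i, j) edges forming a spanning tree.
    """
    n_vars = data.shape[1]
    # Score every candidate edge by empirical mutual information.
    edges = sorted(
        ((mutual_information(data[:, i], data[:, j]), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True)
    parent = list(range(n_vars))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:  # greedily add highest-MI edges that avoid cycles
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

On data sampled from a chain X0 → X1 → X2, the recovered tree should contain the edges (0,1) and (1,2), since the direct dependencies carry more mutual information than the indirect 0–2 pair.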
My research on CRF parameter and structure learning uses methods that break large problems into smaller regression problems. I am therefore researching parallel methods for sparse regression that exploit sparsity to improve parallel performance. Our ICML paper below uses statistical properties of the data to permit parallel optimization on multicore machines, and my current research examines distributed methods that communicate sparse information.
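To illustrate the kind of sparse-regression kernel involved, here is a minimal cyclic coordinate-descent solver for the Lasso. This is a sequential sketch under my own naming (`lasso_cd`, `soft_threshold`); the parallel variants discussed above update many coordinates concurrently, which is safe when features are only weakly correlated.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrinkage operator: the proximal map of the L1 penalty."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iters=100):
    """Cyclic coordinate descent for min_w 0.5*||y - Xw||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)  # per-feature squared norms
    r = y - X @ w                  # residual, kept up to date incrementally
    for _ in range(n_iters):
        for j in range(d):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * w[j]            # remove feature j's contribution
            rho = X[:, j] @ r              # correlation with the residual
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * w[j]            # add the updated contribution back
    return w
```

Each coordinate update touches only one feature column, so the work decomposes naturally; exploiting sparsity in `w` and in `X` is what makes the parallel and distributed versions communication-efficient.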