Joseph K. Bradley

jkbradle (yes, without the y) at cs dot cmu dot edu

Current status

I am an Apache Spark committer and PMC member working at Databricks, the company founded by the creators of Spark. I spend most of my time working on machine learning and graph processing on top of Spark. My current (open source) work can be found most easily by looking at:

  • GitHub for activity, especially within open source Apache Spark or Spark Packages
  • Databricks' SlideShare page or Spark Summit records for slides and talks
  • Databricks blog posts, which I occasionally write


    Previously, I spent a year as a postdoc working with Kannan Ramchandran and Martin Wainwright at UC Berkeley. I received my Ph.D. in Machine Learning from Carnegie Mellon University, where I worked with Carlos Guestrin in the Select Lab. I received my B.S.E. in Computer Science from Princeton University, where I did research with Robert E. Schapire.

    Research of Olde

    I am interested in large-scale machine learning, especially in trade-offs between sample complexity, computational complexity, and potential for parallelization. My approach combines theory and application, focusing on methods which have strong theoretical guarantees and are competitive in practice.

    Selected topics of current and past research:

    Code and Data: Available on the project pages listed above. For Apache Spark, go straight to the source.


    F. Abuzaid, J. Bradley, F. Liang, A. Feng, L. Yang, M. Zaharia, A. Talwalkar.
    Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale.
    NIPS 2016
    Paper to be posted soon

    N. Shah, S. Balakrishnan, J. Bradley, A. Parekh, K. Ramchandran and M. Wainwright.
    Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence.
    JMLR 17(58): 1-47, 2016
    Paper (PDF)
    Earlier version in AISTATS 2015: Paper and supplementary material (PDFs)

    M. Armbrust, R. Xin, C. Lian, Y. Huai, D. Liu, J. Bradley, X. Meng, T. Kaftan, M. Franklin, A. Ghodsi and M. Zaharia.
    Spark SQL: Relational Data Processing in Spark.
    SIGMOD 2015
    Paper (PDF)

    X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, DB Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M.J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar.
    MLlib: Machine Learning in Apache Spark.
    Accepted to JMLR-MLOSS
    Paper (PDF on arXiv)

    Xiao Li, Joseph K. Bradley, Sameer Pawar, and Kannan Ramchandran.
    Robustifying the Sparse Walsh-Hadamard Transform without Increasing the Sample Complexity of O(K log N).
    IEEE International Symposium on Information Theory (ISIT), 2014.
    Tech Report version of ISIT 2014 paper (PDF)

    Nihar B. Shah, Joseph K. Bradley, Abhay Parekh, Martin Wainwright, and Kannan Ramchandran.
    A Case for Ordinal Peer-evaluation in MOOCs.
    NIPS Workshop on Data Driven Education, 2013.
    Paper (PDF)

    Joseph K. Bradley.
    Learning Large-Scale Conditional Random Fields.
    Ph.D. Thesis, Machine Learning Department, Carnegie Mellon University, 2013.
    Thesis (PDF)
    Defense Slides (PPT)

    Joseph K. Bradley and Carlos Guestrin.
    Sample Complexity of Composite Likelihood.
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
    Paper (PDF)
    Poster (PPT)
    Talk at CMU Machine Learning Lunch (Vimeo)
    Slides from CMU Machine Learning Lunch talk (PPT)

    Joseph K. Bradley, Aapo Kyrola, Danny Bickson, and Carlos Guestrin.
    Parallel Coordinate Descent for L1-Regularized Loss Minimization.
    International Conference on Machine Learning (ICML), 2011.
    Paper (PDF)
    Talk slides (PPT)
    Project page (with code, data, and supplementary material)
    Video of ICML talk

    Joseph K. Bradley and Carlos Guestrin.
    Learning Tree Conditional Random Fields.
    International Conference on Machine Learning (ICML), 2010.
    Paper (PDF)
    Talk slides (PPT)

    Joseph K. Bradley and Robert E. Schapire.
    FilterBoost: Regression and Classification on Large Datasets.
    Advances in Neural Information Processing Systems (NIPS), 2008.
    Paper (PDF) with Appendix
    Slides (PPT) from oral at NIPS
    Data Analysis Project report, with multiclass extensions


    I do competitive Latin, Standard and Smooth ballroom dancing. It's awesome. You should do it too. (Check out CMU's Ballroom Dance Club!)

    I like traveling.