Benedikt Boecking

I am a PhD student at Carnegie Mellon University, where I am a member of the Auton Lab advised by Artur Dubrawski. I am interested in the technical and theoretical aspects of how we engage domain experts in building and training machine learning models. My current research focuses on developing methods for learning from domain knowledge via various forms of weak supervision. In the past, I have also worked on algorithms, tools, and data analysis to help fight sex trafficking using deep web and dark web data.

You can contact me at: boecking at -nospam-

Google Scholar Profile


Uber Presidential Fellowship, Carnegie Mellon University, 2018.

Best Paper Award at IPP 2014, Oxford Internet Institute, University of Oxford.

Working Papers

Boecking, B., Neiswanger, W., Roberts, N., Ermon, S., Sala, F., & Dubrawski, A. (2022). Generative Modeling Helps Weak Supervision (and Vice Versa). [arXiv]

Conference and Journal Publications

Boecking, B.*, Usuyama, N.*, Bannur, S., Castro, D.C., Schwaighofer, A., Hyland, S., Wetscherek, M., Naumann, T., Nori, A., Alvarez-Valle, J., Poon, H., Oktay, O. (2022). Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing.
European Conference on Computer Vision (ECCV). [arXiv] *equal contribution
        • Check out our local alignment dataset for MIMIC-CXR on PhysioNet: [MS-CXR]
        • And our language models: [CXR-BERT-general] [CXR-BERT-specialized]

Boecking, B., Jeanselme, V., & Dubrawski, A. (2022). Constrained Clustering via Metric and Kernel Learning without Pairwise Constraint Relaxation.
Advances in Data Analysis and Classification. [arXiv]

Boecking, B., Neiswanger, W., Xing, E.P., & Dubrawski, A. (2021). Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling.
International Conference on Learning Representations (ICLR). [arXiv] [OpenReview] [code]

Rühling Cachay, S., Boecking, B., & Dubrawski, A. (2021). End-to-End Weak Supervision.
Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS). [arXiv] [code]

Goswami, M., Boecking, B., & Dubrawski, A. (2021). Weak Supervision for Affordable Modeling of Electrocardiogram Data.
AMIA 2021 Annual Symposium.

Boecking, B., Miller, K., Kennedy, E., & Dubrawski, A. (2019). Quantifying the Relationship between Large Public Events and Escort Advertising Behavior.
Journal of Human Trafficking, 5(3):220–237. [Taylor&Francis]

Hundman, K., Gowda, T., Kejriwal, K., & Boecking, B. (2018). Always Lurking: Understanding and Mitigating Bias in Online Human Trafficking Detection.
In Proceedings of AAAI/ACM Conference on AI, Ethics, and Society (AIES). [acm]

Boecking, B., Hall, M., & Schneider, J. (2015). Event prediction with learning algorithms—A study of events surrounding the Egyptian revolution of 2011 on the basis of micro blog data.
Policy & Internet, 7(2), 159-184. [Wiley]

Dubrawski, A., Miller, K., Barnes, M., Boecking, B., & Kennedy, E. (2015). Leveraging publicly available data to discern patterns of human-trafficking activity.
Journal of Human Trafficking, 1(1), 65-85. [Taylor&Francis]

Boecking, B., Hall, M., & Schneider, J. (2014). Predicting Events Surrounding the Egyptian Revolution of 2011 Using Learning Algorithms on Micro Blog Data.
Internet, Politics, and Policy 2014: Crowdsourcing for Politics and Policy, University of Oxford. Best Paper Award

Boecking, B., Chalup, S. K., Seese, D., & Wong, A. S. (2014). Support vector clustering of time series data with alignment kernels.
Pattern Recognition Letters, 45, 129-135. [Elsevier]

Peer-Reviewed Workshop Publications

Rühling Cachay, S., Boecking, B., & Dubrawski, A. (2021). Dependency Structure Misspecification in Multi-Source Weak Supervision Models.
ICLR Workshop on Weakly Supervised Learning (WeaSuL).

Rühling Cachay, S., Boecking, B., & Dubrawski, A. (2020). Model Misspecification in Multiple Weak Supervision.
NeurIPS LatinX in AI Workshop.

Boecking, B., & Dubrawski, A. (2019). Pairwise Feedback for Data Programming.
NeurIPS Workshop on Learning with Rich Experience (LIRE). [arXiv]

Nagpal, C., Miller, K., Boecking, B., & Dubrawski, A. (2017). An Entity Resolution approach to isolate instances of Human Trafficking online.
3rd Workshop on Noisy User-generated Text (W-NUT) at EMNLP 2017, Copenhagen. [aclweb]

Other Papers

De-Arteaga, M.* and Boecking, B.* (2019). Killings of social leaders in the Colombian post-conflict: Data analysis for investigative journalism. arXiv:1906.08206. [arXiv] (*Indicates equal contribution)

Other Projects

Líderes en vía de extinción. A data-driven journalistic investigation into killings of social leaders in Colombia, together with Maria De-Arteaga and CONNECTAS, published in El País in Colombia. Read the article here. This article won 2nd place at Premio ¡Investiga! 2019, an award for investigative journalism in Colombia.