Feras Saad فِرَاسْ سَعَد
I am an Assistant Professor in the Computer Science Department at Carnegie Mellon, affiliated with the Principles of Programming and Artificial Intelligence groups.
My research explores the design and implementation of principled probabilistic inference systems that operate with a high degree of automation, accuracy, and scale. My interests span multiple levels of the computation stack, including:
- statistical inference and data science, e.g., Bayesian methods, nonparametrics;
- probabilistic programming languages, e.g., semantics, compilers, program synthesis;
- mathematical foundations, e.g., computability and complexity of stochastic simulation.
Background
I received the PhD in Computer Science, and MEng/SB degrees in Electrical Engineering and Computer Science, from MIT. My theses on probabilistic programming were recognized with the Sprowls PhD Thesis Award in AI+Decision Making and Johnson MEng Thesis Award in Computer Science. Prior to CMU, I was a Visiting Research Scientist at Google.
Contact
Research
My work combines ideas from probability and programming languages to build scalable systems for sound probabilistic modeling and inference. Current research themes include:
Probabilistic Programming. Researchers in diverse areas (e.g., econometrics, cognitive science, bioinformatics) routinely develop computational models of uncertain data and perform inference from observed data to learn about the underlying data-generating process. Probabilistic programs, which produce outputs by making random choices, are a unifying and uniquely expressive approach to modeling and inference over these processes. Probabilistic programming languages aim to rigorously formalize the semantics of probabilistic models, automate the process of discovering models for complex data, and scale up the hard aspects of probabilistic inference in high-dimensional settings. [PLDI-24] [PLDI-21] [POPL-19] [PLDI-19] [AISTATS-17] [NIPS-16]
Automated Probabilistic Model Discovery. A grand challenge in AI is automating the process of discovering accurate and interpretable probabilistic models for reasoning about data. I am interested in studying this problem through the lens of probabilistic program synthesis, formalized as Bayesian structure learning over rich generative model families. [ICML-23] [UAI-21] [POPL-19] [AISTATS-18]
Statistical Estimators & Tests. As probabilistic programs become a more widespread representation, new statistical techniques are needed for inferring their properties through the black-box computational interfaces they expose. While static or analytical analysis is sometimes possible, a more flexible approach is to infer program properties using simulation-based statistical analyses of sampled execution traces. [PLDI-24] [AISTATS-22] [AISTATS-19]
Random Variate Generation. Generating random variables is a fundamental operation in scientific computation. This work explores (1) the computational and information-theoretic limits of random sampling algorithms; and (2) new samplers that are theoretically optimal or near-optimal (in entropy consumption and/or statistical sampling error) while also extremely efficient in practice. [POPL-20] [AISTATS-20]
Software + Applications. A central motivation of my work is to develop performant, freely available software systems that make probabilistic inference more broadly accessible to everyday users while also helping domain experts solve meaningful problems in the sciences, engineering, and the public interest. [eLife-2019]
Probabilistic Computing Systems Lab
Dr. Wonyeol Lee
Postdoctoral Associate
PhD, Stanford University, 2023
Interests: continuous computation, program correctness
Jesse Michel
Visiting Graduate Student
Massachusetts Institute of Technology
Interests: differentiable programming, programming languages
Publications
Scalable Spatiotemporal Prediction with Bayesian Neural Fields
Saad, Burnim, Carroll, Patton, Köster, Saurous, Hoffman
To Appear: Nature Communications, 2024
arXiv | code | BibTeX
Programmable MCMC with Soundly Composed Guide Programs
Pham, Wang, Saad, Hoffmann
To Appear: OOPSLA, Proc. ACM Program. Lang. 8(OOPSLA2), 2024
paper | artifact | link | BibTeX
Robust Resource Bounds with Static Analysis and Bayesian Inference
Pham, Saad, Hoffmann
PLDI, Proc. ACM Program. Lang. 8(PLDI), 2024
paper | artifact | link | BibTeX
GenSQL: A Probabilistic Programming System for Querying Generative
Models of Database Tables
Huot, Ghavami, Lew, Schaechtle, Freer, Shelby, Rinard, Saad, Mansinghka
PLDI, Proc. ACM Program. Lang. 8(PLDI), 2024
paper | artifact | link | code | BibTeX
Learning Generative Population Models From Multiple Clinical Datasets
via Probabilistic Programming
Loula, Collins, Schaechtle, Tenenbaum, Weller, Saad, O'Donnell, Mansinghka
AccMLBio, ICML Workshop on Efficient and Accessible Foundation Models for Biological Discovery, 2024
paper | link | BibTeX
Sequential Monte Carlo Learning for Time Series Structure Discovery
Saad, Patton, Hoffman, Saurous, Mansinghka
ICML, Proc. 40th International Conf. on Machine Learning, 2023
paper | link | code | BibTeX
Scalable Structure Learning, Inference, and Analysis with Probabilistic Programs
Saad
PhD Thesis, Massachusetts Institute of Technology, 2022
MIT George M. Sprowls PhD Thesis Award
Estimators of Entropy and Information via Inference in Probabilistic Models
Saad, Cusumano-Towner, Mansinghka
AISTATS, Proc. 25th International Conf. on Artificial Intelligence and Statistics, 2022
paper | link | arXiv | supplement | BibTeX
Bayesian AutoML for Databases via the InferenceQL Probabilistic Programming System
Schaechtle, Freer, Shelby, Saad, Mansinghka
AutoML, 1st International Conf. on Automated Machine Learning (Workshop), 2022
paper | link | BibTeX
SPPL: Probabilistic Programming with Fast Exact Symbolic Inference
Saad, Rinard, Mansinghka
PLDI, Proc. 42nd International Conf. on Programming Language Design and Implementation, 2021
paper | artifact | link | arXiv | code | BibTeX
Hierarchical Infinite Relational Model
Saad, Mansinghka
UAI, Proc. 37th Conf. on Uncertainty in Artificial Intelligence, 2021
Oral Presentation
paper | link | arXiv | code | BibTeX
The Fast Loaded Dice Roller: A Near Optimal Exact Sampler for Discrete Probability Distributions
Saad, Freer, Rinard, Mansinghka
AISTATS, Proc. 23rd International Conf. on Artificial Intelligence and Statistics, 2020
paper | supplement | link | arXiv | code | BibTeX
Optimal Approximate Sampling from Discrete Probability Distributions
Saad, Freer, Rinard, Mansinghka
POPL, Proc. ACM Program. Lang. 4(POPL), 2020
paper | supplement | artifact | link | code | arXiv | BibTeX
A Family of Exact Goodness-of-Fit Tests for High-dimensional Discrete Distributions
Saad, Freer, Ackerman, Mansinghka
AISTATS, Proc. 22nd International Conf. on Artificial Intelligence and Statistics, 2019
paper | supplement | link | arXiv | BibTeX
Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling
Saad, Cusumano-Towner, Schaechtle, Rinard, Mansinghka
POPL, Proc. ACM Program. Lang. 3(POPL), 2019
paper | supplement | link | arXiv | BibTeX
Gen: A General Purpose Probabilistic Programming System with Programmable Inference
Cusumano-Towner, Saad, Lew, Mansinghka
PLDI, Proc. 40th International Conf. on Programming Language Design and Implementation, 2019
paper | link | code | BibTeX
Elements of a Stochastic 3D Prediction Engine in Larval Zebrafish Prey Capture
Bolton, Haesemeyer, Jordi, Schaechtle, Saad, Mansinghka, Tenenbaum, Engert
eLife 8:e51975, 2019
paper | BibTeX
Temporally-Reweighted Chinese Restaurant Process Mixtures
for Clustering, Imputing, and Forecasting Multivariate Time Series
Saad, Mansinghka
AISTATS, Proc. 21st International Conf. on Artificial Intelligence and Statistics, 2018
paper | supplement | link | code | BibTeX
Goodness-of-Fit Tests for High-dimensional Discrete Distributions
with Application to Convergence Diagnostics in Approximate Bayesian Inference
Saad, Freer, Ackerman, Mansinghka
AABI, 1st Symposium on Advances in Approximate Bayesian Inference, 2018
paper | link | BibTeX
Detecting Dependencies in Sparse, Multivariate Databases
Using Probabilistic Programming and Non-parametric Bayes
Saad, Mansinghka
AISTATS, Proc. 20th International Conf. on Artificial Intelligence and Statistics, 2017
paper | supplement | link | arXiv | code | BibTeX
Probabilistic Search for Structured Data via Probabilistic Programming
and Nonparametric Bayes
Saad, Casarsa, Mansinghka
arXiv, Technical Report arXiv:1704.01087, 2017
PROBPROG, 1st International Conf. for Probabilistic Programming, 2018
arXiv | code | BibTeX
Time Series Structure Discovery via Probabilistic Program Synthesis
Schaechtle*, Saad*, Radul, Mansinghka
arXiv, Technical Report arXiv:1611.07051, 2017
PROBPROG, 1st International Conf. for Probabilistic Programming, 2018
arXiv | BibTeX
A Probabilistic Programming Approach to Probabilistic Data Analysis
Saad, Mansinghka
NIPS, Proc. 30th Conf. on Neural Information Processing Systems, 2016
paper | link | code | BibTeX
Probabilistic Data Analysis with Probabilistic Programming
Saad, Mansinghka
arXiv, Technical Report arXiv:1608.05347, 2016
Extended version of NIPS 2016.
arXiv | BibTeX
Probabilistic Data Analysis with Probabilistic Programming
Saad
MEng Thesis, Massachusetts Institute of Technology, 2016
MIT Charles & Jennifer Johnson MEng Thesis Award, 1st Place
Google Scholar | dblp | ORCID | BibTeX
Software
GitHub Homepage: Probabilistic Computing Systems Lab
Bayesian neural fields for spatiotemporal data modeling [arXiv 2024]
https://github.com/google/bayesnf
Automated Bayesian model discovery for time series data [ICML 2023]
https://github.com/probsys/AutoGP.jl
Estimators of entropy and mutual information in probabilistic programs [AISTATS 2022]
https://proceedings.mlr.press/v151/saad22a/saad22a-supp.zip
Bayesian nonparametric structure learning for relational systems [UAI 2021]
https://github.com/probsys/hierarchical-irm
Probabilistic programming language with fast exact symbolic inference [PLDI 2021]
https://github.com/probsys/sppl
Fast exact sampler for discrete probability distributions [AISTATS 2020]
https://github.com/probsys/fast-loaded-dice-roller
See also: code generator by Peter Occil.
Optimal limited-precision approximate sampler for discrete distributions [POPL 2020]
https://github.com/probsys/optimal-approximate-sampling
Bayesian method for clustering, imputing, and forecasting time series [AISTATS 2018]
https://github.com/probcomp/trcrpm
Probabilistic programming system with programmable inference [PLDI 2019]
https://gen.dev
Library of composable probabilistic generative models [NIPS 2016, MEng Thesis]
https://github.com/probcomp/cgpm
Probabilistic programming for data analysis built on sqlite [NIPS 2016, AISTATS 2017]
https://github.com/probcomp/bayeslite
Sublime Text Users: Productivity plugins (9000+ users):
AddRemoveFolder;
RemoveLineBreaks;
ViewSetting.
Talks
Scalable Spatiotemporal Prediction with Bayesian Neural Fields
UAI 2024 Workshop on Tractable Probabilistic Modeling, Barcelona, Spain
Automated Gaussian Processes and Sequential Monte Carlo
Learning Bayesian Statistics Podcast
Keynote: Domain-Specific Probabilistic Programs for Time Series Modeling
Google Forecasting Summit 2023, Mountain View, CA
Programmable Systems for Probabilistic Modeling and Inference
SCS Faculty Lightning Talks 2023, Virtual
Scalable Structure Learning and Inference via Probabilistic Programming
MIT Thesis Defense, Virtual
MIT Thesis Defense, Cambridge, MA
Scalable Structure Learning and Inference for Domain-Specific Probabilistic Programs
LAFI 2022, Philadelphia, PA
SPPL: Probabilistic Programming with Fast Exact Symbolic Inference
PROBPROG 2021, Virtual
SPLASH 2021, Chicago, IL
MIT PLSE Seminars 2021, Virtual
PLDI 2021 [extended] [lightning], Virtual
Fairer and Faster AI Using Probabilistic Programming Languages
MIT Horizon 2021, Virtual
Hierarchical Infinite Relational Model
UAI 2021 (Oral Presentation), Virtual
Fast Loaded Dice Roller
AISTATS 2020, Virtual
Optimal Approximate Sampling from Discrete Probability Distributions
POPL 2020, New Orleans, LA
Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling
CICS Seminars 2019, Notre Dame, IN
POPL 2019, Cascais, Portugal
PROBPROG 2018, Boston, MA
Press
MIT researchers introduce generative AI for databases. MIT News, Jul. 2024
Estimating the informativeness of data. MIT News, Apr. 2022
Exact symbolic AI for faster, better assessment of AI fairness. MIT News, Aug. 2021
How and why computers roll loaded dice. Quanta Magazine, Jul. 2020
How Rolling Loaded Dice Will Change the Future of AI. Intel Press Release, Jul. 2020
Algorithm quickly simulates a roll of loaded dice. MIT News, Jan. 2020
MIT Debuts Gen, a Julia-Based Language for Artificial Intelligence. InfoQ, Jul. 2019
New AI programming language goes beyond deep learning. MIT News, Jun. 2019
Democratizing data science. MIT News, Jan. 2019
MIT lets AI "synthesize" computer programs to aid data scientists. ZDNet, Jan. 2019