Feras Saad فِرَاسْ سَعَد
I am an Assistant Professor
in the Computer Science Department
at Carnegie Mellon University, affiliated with the
Principles of Programming
and
Artificial Intelligence groups.
My research explores the design and implementation of scalable
probabilistic computing systems, theoretical understanding of their properties,
and practical applications to challenging problems in modeling and inference.
This work integrates ideas from programming languages, probability,
and computation—which together give us the building
blocks for engineering powerful probabilistic reasoning systems with a high
degree of automation, accuracy, and scale.
I explore diverse applications of these ideas in areas such as
automated statistical model discovery, spatiotemporal data science,
probabilistic program analysis, and random variate generation.
I also maintain open-source software libraries to
help practitioners use these methods in their own areas. See
research and
software
for an overview.
Background
I received the PhD in
Computer Science
and the MEng/SB degrees in
Electrical Engineering and Computer Science
from MIT.
My theses on probabilistic programming were recognized with the
Sprowls PhD Thesis Award in AI+Decision Making and
Johnson MEng Thesis Award in Computer Science.
Prior to CMU, I was a Visiting Research Scientist at
Google.
Contact
Research Opportunities
I am actively recruiting students and postdocs at all levels.
- Prospective Students: If you are interested in working with me as a
graduate student, please apply to the CMU CS PhD
program and mention my name in your application.
- Current CMU Students: If you are already
a graduate, master's, or undergraduate student at CMU, please send me an
email and we can find a time to chat.
- Postdocs: Please send me an email with your CV, two representative
papers, and a description of your research background and interests.
Research
My work combines ideas from probability and programming languages to
build scalable systems for sound probabilistic modeling
and inference. Current research themes include:
Probabilistic Programming.
Researchers in diverse areas (e.g., econometrics, cognitive science,
bioinformatics) work hard to develop computational models of uncertain data
and perform inference from observed data to learn about the underlying
data-generating process. Probabilistic programs, which produce outputs by
making random choices, are a unifying and uniquely expressive approach to
modeling and inference over these processes. Probabilistic programming
languages aim to rigorously formalize the
semantics of probabilistic models, automate the process of discovering
models for complex data, and scale up the hard aspects of
probabilistic inference in high-dimensional settings.
[PLDI-21]
[POPL-19]
[PLDI-19]
[AISTATS-17]
[NIPS-16]
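As a minimal illustration of the idea (a sketch written for this page, not code from the cited papers), the Python snippet below treats an ordinary function that makes random choices as a probabilistic program, and performs inference by rejection sampling: run the program many times and keep only the runs that match the observation. The names `model` and `posterior_by_rejection`, the three candidate biases, and the observation of four heads are all hypothetical choices made for the example.

```python
import random

BIASES = [0.2, 0.5, 0.8]  # hypothetical candidate coin biases

def model(rng):
    """A tiny probabilistic program: choose a coin bias, then flip the coin 5 times."""
    bias = rng.choice(BIASES)                         # latent random choice
    flips = [rng.random() < bias for _ in range(5)]   # observable random choices
    return bias, flips

def posterior_by_rejection(observed_heads, num_samples, seed=0):
    """Estimate P(bias | number of heads) by keeping only runs that match the data."""
    rng = random.Random(seed)
    counts = {b: 0 for b in BIASES}
    for _ in range(num_samples):
        bias, flips = model(rng)
        if sum(flips) == observed_heads:
            counts[bias] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no simulated run matched the observation")
    return {b: c / total for b, c in counts.items()}

posterior = posterior_by_rejection(observed_heads=4, num_samples=20000)
for bias, prob in posterior.items():
    print(f"P(bias={bias} | 4 heads) ~ {prob:.3f}")
```

Rejection sampling is used here only because it is the simplest inference method to state; it scales poorly as observations become less likely, which is one reason more sophisticated inference algorithms are an active research topic.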
Automated Probabilistic Model Discovery.
A grand challenge in AI is automating the process of
discovering accurate and interpretable probabilistic models for reasoning about data.
I am interested in studying this problem through the lens of probabilistic program synthesis,
formalized as Bayesian structure learning over rich generative model families.
[ICML-23]
[UAI-21]
[POPL-19]
[AISTATS-18]
Statistical Estimators & Tests.
As probabilistic programs become a more widespread representation,
new statistical techniques are needed for inferring their properties
through the black-box computational interfaces they expose. While static or
closed-form analysis is sometimes possible, a more flexible
approach is to infer program properties using
simulation-based analyses of sampled execution traces.
[AISTATS-22]
[AISTATS-19]
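A simple way to picture simulation-based analysis is the sketch below, which estimates the entropy of a program's output distribution using only sampled runs. It uses the standard plug-in (empirical frequency) estimator, which is not the estimator developed in the papers cited above; `run_program` is a hypothetical stand-in for a black-box program, chosen so that the exact answer is available for comparison.

```python
import math
import random

def run_program(rng):
    """Black-box program: number of heads in 3 flips of a coin with bias 0.3."""
    return sum(rng.random() < 0.3 for _ in range(3))

def estimate_entropy(num_traces, seed=0):
    """Plug-in entropy estimate (in bits) from the empirical output frequencies."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(num_traces):
        x = run_program(rng)
        counts[x] = counts.get(x, 0) + 1
    return -sum((c / num_traces) * math.log2(c / num_traces)
                for c in counts.values())

estimate = estimate_entropy(50000)

# Exact entropy of Binomial(3, 0.3), available here only because the toy
# program is simple enough to analyze in closed form.
probs = [math.comb(3, k) * 0.3**k * 0.7**(3 - k) for k in range(4)]
exact = -sum(p * math.log2(p) for p in probs)

print(f"estimated entropy: {estimate:.3f} bits, exact: {exact:.3f} bits")
```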
Random Sampling Algorithms.
This line of work explores fundamental computational limits of random
variate generation, which is a key operation that enables probabilistic
computation. Specific interests include new sampling algorithms that are
theoretically optimal or near-optimal (in entropy consumption and statistical
sampling error) while remaining extremely efficient in practice.
[POPL-20]
[AISTATS-20].
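To make the notion of entropy consumption concrete, here is a minimal Python sketch of a baseline exact sampler that draws from a discrete distribution with integer weights using only fair coin flips, and counts how many flips it uses. This is a plain rejection scheme written for illustration, not the algorithms from the cited papers; the function name `sample_exact` and the weights `[1, 1, 3]` are assumptions of the example.

```python
import math
import random

def sample_exact(weights, flip):
    """Return (index, bits_used); index i has probability weights[i] / sum(weights).

    `weights` are positive integers and `flip()` returns one fair bit (0 or 1).
    """
    total = sum(weights)
    k = max(1, (total - 1).bit_length())  # smallest k >= 1 with 2**k >= total
    bits_used = 0
    while True:
        # Build a uniform integer in [0, 2**k) from k fair bits.
        u = 0
        for _ in range(k):
            u = (u << 1) | flip()
            bits_used += 1
        if u < total:  # accept: u is now uniform on [0, total)
            break
    # Map u to an outcome through the cumulative weights.
    for i, w in enumerate(weights):
        if u < w:
            return i, bits_used
        u -= w

rng = random.Random(0)
weights = [1, 1, 3]
draws = [sample_exact(weights, lambda: rng.getrandbits(1)) for _ in range(20000)]

freq_last = sum(1 for i, _ in draws if i == 2) / len(draws)
avg_bits = sum(b for _, b in draws) / len(draws)
entropy = -sum((w / sum(weights)) * math.log2(w / sum(weights)) for w in weights)

print(f"frequency of outcome 2: {freq_last:.3f} (target 0.6)")
print(f"average bits per sample: {avg_bits:.2f}, entropy: {entropy:.2f} bits")
```

The sampler is exact, but its average bit usage sits well above the entropy of the target distribution. A classical result of Knuth and Yao shows that an entropy-optimal sampler needs fewer than two bits more than the entropy in expectation, which is the gap that more careful algorithms aim to close.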
Software + Applications.
A central motivation of my work is to develop performant, freely available
software systems that make probabilistic inference more
broadly accessible
to everyday users while also helping domain experts solve meaningful problems in the
sciences,
engineering,
and
public interest.
[eLife-2019]
Lab Members
Dr. Wonyeol Lee
Postdoctoral Associate
PhD, Stanford University, 2023
Interests: continuous computation, program correctness
Jesse Michel
Visiting Graduate Student
Massachusetts Institute of Technology
Interests: differentiable programming, programming languages
Publications
Google Scholar
| dblp
| ORCID
| BibTeX
Scalable Spatiotemporal Prediction with Bayesian Neural Fields
F. Saad, J. Burnim, C. Carroll, B. Patton, U. Köster, R. Saurous, M. Hoffman
arXiv Preprint (Under Review), 2024
paper
| link
| code
| BibTeX
Sequential Monte Carlo Learning for Time Series Structure Discovery
F. Saad, B. Patton, M. Hoffman, R. Saurous, V. Mansinghka
ICML, Proc. 40th International Conf. on Machine Learning, 2023
paper
| link
| code
| BibTeX
Scalable Structure Learning, Inference, and Analysis with Probabilistic Programs
F. Saad
PhD Thesis, Massachusetts Institute of Technology, 2022
MIT George M. Sprowls PhD Thesis Award
Estimators of Entropy and Information via Inference in Probabilistic Models
F. Saad, M. Cusumano-Towner, V. Mansinghka
AISTATS, Proc. 25th International Conf. on Artificial Intelligence and Statistics, 2022
paper
| link
| arXiv
| supplement
| BibTeX
Bayesian AutoML for Databases via the InferenceQL Probabilistic Programming System
U. Schaechtle, C. Freer, Z. Shelby, F. Saad, V. Mansinghka
AutoML, 1st International Conf. on Automated Machine Learning (Workshop), 2022
paper
| link
| BibTeX
SPPL: Probabilistic Programming with Fast Exact Symbolic Inference
F. Saad, M. Rinard, V. Mansinghka
PLDI, Proc. 42nd International Conf. on Programming Language Design and Implementation, 2021
paper
| artifact
| link
| arXiv
| code
| BibTeX
Hierarchical Infinite Relational Model
F. Saad, V. Mansinghka
UAI, Proc. 37th Conf. on Uncertainty in Artificial Intelligence, 2021
Oral Presentation
paper
| link
| arXiv
| code
| BibTeX
The Fast Loaded Dice Roller: A Near Optimal Exact Sampler for Discrete Probability Distributions
F. Saad, C. Freer, M. Rinard, V. Mansinghka
AISTATS, Proc. 23rd International Conf. on Artificial Intelligence and Statistics, 2020
paper
| supplement
| link
| arXiv
| code
| BibTeX
Optimal Approximate Sampling from Discrete Probability Distributions
F. Saad, C. Freer, M. Rinard, V. Mansinghka
POPL, Proc. ACM Program. Lang. 4(POPL), 2020
paper
| supplement
| artifact
| link
| code
| arXiv
| BibTeX
A Family of Exact Goodness-of-Fit Tests for High-dimensional Discrete Distributions
F. Saad, C. Freer, N. Ackerman, V. Mansinghka
AISTATS, Proc. 22nd International Conf. on Artificial Intelligence and Statistics, 2019
paper
| supplement
| link
| arXiv
| BibTeX
Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling
F. Saad, M. Cusumano-Towner, U. Schaechtle, M. Rinard, V. Mansinghka
POPL, Proc. ACM Program. Lang. 3(POPL), 2019
paper
| supplement
| link
| arXiv
| BibTeX
Gen: A General Purpose Probabilistic Programming System with Programmable Inference
M. Cusumano-Towner, F. Saad, A. Lew, V. Mansinghka
PLDI, Proc. 40th International Conf. on Programming Language Design and Implementation, 2019
paper
| link
| code
| BibTeX
Elements of a Stochastic 3D Prediction Engine in Larval Zebrafish Prey Capture
A. Bolton, M. Haesemeyer, J. Jordi, U. Schaechtle, F. Saad, V. Mansinghka, J. Tenenbaum, F. Engert
eLife 8:e51975, 2019
paper
| BibTeX
Temporally-Reweighted Chinese Restaurant Process Mixtures
for Clustering, Imputing, and Forecasting Multivariate Time Series
F. Saad, V. Mansinghka
AISTATS, Proc. 21st International Conf. on Artificial Intelligence and Statistics, 2018
paper
| supplement
| link
| code
| BibTeX
Goodness-of-Fit Tests for High-dimensional Discrete Distributions
with Application to Convergence Diagnostics in Approximate Bayesian Inference
F. Saad, C. Freer, N. Ackerman, V. Mansinghka
AABI, 1st Symposium on Advances in Approximate Bayesian Inference, 2018
paper
| link
| BibTeX
Detecting Dependencies in Sparse, Multivariate Databases
Using Probabilistic Programming and Non-parametric Bayes
F. Saad, V. Mansinghka
AISTATS, Proc. 20th International Conf. on Artificial Intelligence and Statistics, 2017
paper
| supplement
| link
| arXiv
| code
| BibTeX
Probabilistic Search for Structured Data via Probabilistic Programming
and Nonparametric Bayes
F. Saad, L. Casarsa, V. Mansinghka
arXiv, Technical Report arXiv:1704.01087, 2017
PROBPROG, 1st International Conf. for Probabilistic Programming, 2018
arXiv
| code
| BibTeX
Time Series Structure Discovery via Probabilistic Program Synthesis
U. Schaechtle*, F. Saad*, A. Radul, V. Mansinghka
arXiv, Technical Report arXiv:1611.07051, 2017
PROBPROG, 1st International Conf. for Probabilistic Programming, 2018
arXiv
| BibTeX
A Probabilistic Programming Approach to Probabilistic Data Analysis
F. Saad, V. Mansinghka
NIPS, Proc. 30th Conf. on Neural Information Processing Systems, 2016
paper
| link
| code
| BibTeX
Probabilistic Data Analysis with Probabilistic Programming
F. Saad, V. Mansinghka
arXiv, Technical Report arXiv:1608.05347, 2016
Extended version of NIPS 2016.
arXiv
| BibTeX
Probabilistic Data Analysis with Probabilistic Programming
F. Saad
MEng Thesis, Massachusetts Institute of Technology, 2016
MIT Charles & Jennifer Johnson MEng Thesis Award, 1st Place
Talks
Keynote: Domain-Specific Probabilistic Programs for Time Series Modeling
2023 Google Forecasting Summit, Mountain View, CA
Programmable Systems for Probabilistic Modeling and Inference
SCS Faculty Lightning Talks 2023, Virtual
Scalable Structure Learning and Inference via Probabilistic Programming
MIT Thesis Defense, Virtual
MIT Thesis Defense, Cambridge, MA
Scalable Structure Learning and Inference for Domain-Specific Probabilistic Programs
LAFI 2022, Philadelphia, PA
SPPL: Probabilistic Programming with Fast Exact Symbolic Inference
PROBPROG 2021, Virtual
SPLASH 2021, Chicago, IL
MIT PLSE Seminars 2021, Virtual
PLDI 2021
[extended]
[lightning], Virtual
Fairer and Faster AI Using Probabilistic Programming Languages
MIT Horizon 2021, Virtual
Hierarchical Infinite Relational Model
UAI 2021 (Oral Presentation), Virtual
Fast Loaded Dice Roller
AISTATS 2020, Virtual
Optimal Approximate Sampling from Discrete Probability Distributions
POPL 2020, New Orleans, LA
Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling
CICS Seminars 2019, Notre Dame, IN
POPL 2019, Cascais, Portugal
PROBPROG 2018, Boston, MA
Press
- Estimating the informativeness of data,
MIT News, Apr. 2022
- Exact symbolic AI for faster, better assessment of AI fairness,
MIT News, Aug. 2021
- How and why computers roll loaded dice,
Quanta Magazine, Jul. 2020
- How Rolling Loaded Dice Will Change the Future of AI,
Intel Press Release, Jul. 2020
- Algorithm quickly simulates a roll of loaded dice,
MIT News, Jan. 2020
- MIT Debuts Gen, a Julia-Based Language for Artificial Intelligence,
InfoQ, Jul. 2019
- New AI programming language goes beyond deep learning,
MIT News, Jun. 2019
- Democratizing data science,
MIT News, Jan. 2019
- MIT lets AI "synthesize" computer programs to aid data scientists,
ZDNet, Jan. 2019