# Feras Saad

I recently completed my PhD in EECS at MIT (MEng/SB 2016), where I worked with Dr. Vikash Mansinghka and Prof. Martin Rinard. My research integrates ideas from programming languages, artificial intelligence, and statistics to enable sound and scalable systems for probabilistic inference.

I am pleased to be joining the Computer Science Department at Carnegie Mellon University as an Assistant Professor in Fall 2023. Before joining, I am spending one year as a Visiting Research Scientist at Google.

My group is recruiting students and postdocs! If you are interested in the research areas described below, please send me an email and (prospective students) apply to the CMU CS PhD program.

## Research

I have broad interests in developing new techniques that enable large-scale probabilistic modeling, inference, and computation across many applications. Some current research themes are described below.

**Probabilistic programming**.
Programs are a uniquely expressive formalism for modeling and understanding complex empirical phenomena.
I am interested in building systems that help automate, formalize, and scale up hard aspects of modeling and inference. Projects include synthesizing probabilistic programs for automated model discovery, symbolic solvers for fast Bayesian inference, modeling and query DSLs for Bayesian databases, and general-purpose systems that integrate symbolic, probabilistic, and neural approaches to engineering intelligent systems.

**Automatically discovering models from data**.
How can we rapidly convert datasets into probabilistic models that surface interpretable patterns and make accurate predictions? Our approach is to perform Bayesian nonparametric inference over symbolic model representations that combine simple rules to form powerful models in aggregate. These methods operate within domain-specific data modeling languages and have been applied to discovering models of cross-sectional data, relational systems, and univariate and multivariate time series.

**Statistical estimators and tests**.
As computer programs become the standard representation for complex probability distributions (e.g., stochastic simulators or probabilistic programs), we need new techniques to analyze their statistical properties through the black-box interfaces they expose. Examples include goodness-of-fit tests for programs that simulate random discrete data structures, and estimators of entropy and information for probabilistic generative models.
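
As a toy illustration of this black-box setting (a generic simulation-based chi-square test, not the exact rank-based tests developed in the papers below; all names are hypothetical), one can test whether a simulator's output frequencies are consistent with a reference discrete distribution by calibrating the test statistic against samples drawn from the model itself:

```python
import random
from collections import Counter

def chi2_stat(counts, expected, support):
    # Pearson chi-square statistic between observed and expected counts.
    return sum((counts.get(x, 0) - expected[x]) ** 2 / expected[x] for x in support)

def monte_carlo_gof(simulator, model_probs, n=200, trials=500, seed=0):
    """Simulation-based goodness-of-fit test: is the black-box `simulator`
    consistent with `model_probs` (a dict mapping value -> probability)?
    Returns a Monte Carlo p-value; small values indicate a mismatch."""
    rng = random.Random(seed)
    support = list(model_probs)
    expected = {x: n * p for x, p in model_probs.items()}
    observed = Counter(simulator(rng) for _ in range(n))
    t_obs = chi2_stat(observed, expected, support)
    # Estimate the null distribution of the statistic by sampling the model.
    vals, probs = support, [model_probs[x] for x in support]
    exceed = 0
    for _ in range(trials):
        null_counts = Counter(rng.choices(vals, probs, k=n))
        if chi2_stat(null_counts, expected, support) >= t_obs:
            exceed += 1
    return (exceed + 1) / (trials + 1)

# A fair-die simulator tested against the fair-die model: a well-calibrated
# simulator typically yields a non-small p-value.
fair = {i: 1 / 6 for i in range(1, 7)}
p = monte_carlo_gof(lambda rng: rng.randint(1, 6), fair)
```

Because the simulator is accessed only through samples, the same recipe applies to any program with a discrete output, though exact tests avoid the Monte Carlo error of this sketch.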

**Fast random sampling algorithms**.
Generating random variates is a fundamental operation enabling probabilistic computation.
I am interested in exploring the fundamental computational limits of sampling, devising algorithms that are theoretically optimal or near-optimal in entropy and error, and engineering samplers with extremely efficient runtime and memory; see optimal approximate sampling and the fast loaded dice roller.
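
To make the setting concrete, here is a minimal exact sampler for a loaded die with integer weights that consumes only fair coin flips, using rejection. This is a baseline sketch of the problem, not the Fast Loaded Dice Roller itself, which organizes the computation to consume random bits far more frugally:

```python
import random

def exact_loaded_die(weights, rng=random):
    """Sample index i with probability weights[i] / sum(weights), using only
    fair random bits (getrandbits). Exact for positive integer weights:
    no floating-point approximation error."""
    m = sum(weights)
    k = (m - 1).bit_length()  # smallest k such that 2**k >= m
    while True:
        u = rng.getrandbits(k) if k else 0  # uniform draw from [0, 2**k)
        if u < m:  # accept; rejection keeps the distribution exact
            for i, w in enumerate(weights):
                if u < w:
                    return i
                u -= w

# Roll a die loaded 1:2:3 several thousand times.
rng = random.Random(0)
samples = [exact_loaded_die([1, 2, 3], rng) for _ in range(6000)]
```

Each attempt here spends k fair bits and may reject; entropy-optimal samplers instead consume bits lazily, one at a time, bringing the expected bit cost close to the entropy of the target distribution.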

**Applications**.
To be useful in modern applications, all these methods must be implemented in performant software systems, many of which are available as open-source projects. We aim to build software that makes probabilistic learning more broadly accessible and helps domain specialists solve applied problems in the sciences, engineering, and the public interest.

## Publications

Scalable Structure Learning, Inference, and Analysis with Probabilistic Programs

**F. Saad**

PhD Thesis, Massachusetts Institute of Technology, 2022

Estimators of Entropy and Information via Inference in Probabilistic Models

**F. Saad**, M. Cusumano-Towner, V. Mansinghka

AISTATS, Proc. 25th International Conference on Artificial Intelligence and Statistics, 2022

paper | link | arXiv | supplement | BibTeX

Bayesian AutoML for Databases via the InferenceQL Probabilistic Programming System

U. Schaechtle, C. Freer, Z. Shelby, **F. Saad**, V. Mansinghka

AutoML, 1st International Conference on Automated Machine Learning (Late-Breaking Workshop), 2022

paper | link | BibTeX

SPPL: Probabilistic Programming with Fast Exact Symbolic Inference

**F. Saad**, M. Rinard, V. Mansinghka

PLDI, Proc. 42nd International Conference on Programming Language Design and Implementation, 2021

paper | artifact | link | arXiv | code | BibTeX

Hierarchical Infinite Relational Model

**F. Saad**, V. Mansinghka

UAI, Proc. 37th Conference on Uncertainty in Artificial Intelligence, 2021

**Oral Presentation**

paper | link | arXiv | code | BibTeX

The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

**F. Saad**, C. Freer, M. Rinard, V. Mansinghka

AISTATS, Proc. 24th International Conference on Artificial Intelligence and Statistics, 2020

paper | supplement | link | arXiv | code | BibTeX

Optimal Approximate Sampling from Discrete Probability Distributions

**F. Saad**, C. Freer, M. Rinard, V. Mansinghka

POPL, Proc. ACM Program. Lang. 4(POPL), 2020

paper | supplement | artifact | link | code | arXiv | BibTeX

A Family of Exact Goodness-of-Fit Tests for High-dimensional Discrete Distributions

**F. Saad**, C. Freer, N. Ackerman, V. Mansinghka

AISTATS, Proc. 23rd International Conference on Artificial Intelligence and Statistics, 2019

paper | supplement | link | arXiv | BibTeX

Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

**F. Saad**, M. Cusumano-Towner, U. Schaechtle, M. Rinard, V. Mansinghka

POPL, Proc. ACM Program. Lang. 3(POPL), 2019

paper | supplement | link | arXiv | BibTeX

Gen: A General-Purpose Probabilistic Programming System with Programmable Inference

M. Cusumano-Towner, **F. Saad**, A. Lew, V. Mansinghka

PLDI, Proc. 40th International Conference on Programming Language Design and Implementation, 2019

paper | link | code | BibTeX

Elements of a Stochastic 3D Prediction Engine in Larval Zebrafish Prey Capture

A. Bolton, M. Haesemeyer, J. Jordi, U. Schaechtle, **F. Saad**, V. Mansinghka, J. Tenenbaum, F. Engert

eLife 8:e51975, 2019

paper | BibTeX

Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series

**F. Saad**, V. Mansinghka

AISTATS, Proc. 21st International Conference on Artificial Intelligence and Statistics, 2018

paper | supplement | link | code | BibTeX

Goodness-of-Fit Tests for High-dimensional Discrete Distributions with Application to Convergence Diagnostics in Approximate Bayesian Inference

**F. Saad**, C. Freer, N. Ackerman, V. Mansinghka

AABI, 1st Symposium on Advances in Approximate Bayesian Inference, 2018

paper | link | BibTeX

Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes

**F. Saad**, V. Mansinghka

AISTATS, Proc. 20th International Conference on Artificial Intelligence and Statistics, 2017

paper | supplement | link | arXiv | code | BibTeX

Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

**F. Saad**, L. Casarsa, V. Mansinghka

Technical Report *arXiv:1704.01087*, 2017

PROBPROG, 1st International Conference for Probabilistic Programming, 2018

arXiv | code | BibTeX

Time Series Structure Discovery via Probabilistic Program Synthesis

U. Schaechtle*, **F. Saad***, A. Radul, V. Mansinghka

Technical Report *arXiv:1611.07051*, 2017

PROBPROG, 1st International Conference for Probabilistic Programming, 2018

arXiv | BibTeX

A Probabilistic Programming Approach to Probabilistic Data Analysis

**F. Saad**, V. Mansinghka

NIPS, Proc. 30th Conference on Neural Information Processing Systems, 2016

paper | link | code | BibTeX

Probabilistic Data Analysis with Probabilistic Programming

**F. Saad**, V. Mansinghka

Technical Report *arXiv:1608.05347*, 2016

Extended version of NIPS 2016.

arXiv

Probabilistic Data Analysis with Probabilistic Programming

**F. Saad**

MEng Thesis, Massachusetts Institute of Technology, 2016

See also Google Scholar | dblp | BibTeX

## Software

Software and repositories from research projects (2500+ GitHub stars).

estimators of entropy and information (AISTATS 2022)

Scalable estimators of information-theoretic quantities between probabilistic program variables.

hierarchical infinite relational model (UAI 2021)

Bayesian nonparametric structure learning for complex relational systems.

sum-product probabilistic language (PLDI 2021)

Probabilistic programming language with fast exact symbolic inference.

fast loaded dice roller (AISTATS 2020)

Fast exact sampler for discrete probability distributions (see code generator by P. Occil).

optimal approximate sampling (POPL 2020)

Optimal limited-precision approximate sampler for discrete distributions.

temporal CRP mixture models (AISTATS 2018)

Bayesian method for clustering, imputing, and forecasting multivariate time series.

Gen (PLDI 2019)

General-purpose probabilistic programming system with programmable inference.

bayesdb (AISTATS 2017, NIPS 2016, MEng Thesis)

Probabilistic programming database for probabilistic data analysis built on SQLite.

cgpm (AISTATS 2017, NIPS 2016, MEng Thesis)

Library of composable probabilistic models, used as the modeling/inference backend of bayesdb.

__Sublime Text users__ check out these productivity plugins (8000+ users):

AddRemoveFolder;
RemoveLineBreaks;
ViewSetting.

## Talks

Videos of presentations at conferences, workshops, and seminars.

Scalable Structure Learning and Inference via Probabilistic Programming

*Thesis Defense* (Online Version), Virtual

*Thesis Defense* (In-Person Version), Cambridge, MA

Scalable Structure Learning and Inference for Domain-Specific Probabilistic Programs

at *LAFI 2022*, Philadelphia, PA

SPPL: Probabilistic Programming with Fast Exact Symbolic Inference

at *PROBPROG 2021*, Virtual

at *SPLASH 2021*, Chicago, IL

at *MIT PLSE Seminars 2021*, Virtual

at *PLDI 2021* [extended] [lightning], Virtual

Fairer and Faster AI Using Probabilistic Programming Languages

at *MIT Horizon 2021*, Virtual

Hierarchical Infinite Relational Model

at *UAI 2021* (Oral Presentation), Virtual

Fast Loaded Dice Roller

at *AISTATS 2020*, Virtual

Optimal Approximate Sampling

at *POPL 2020*, New Orleans, LA

Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

at *CICS Seminars 2019*, Notre Dame, IN

at *POPL 2019*, Cascais, Portugal

at *PROBPROG 2018*, Boston, MA

## Press

My research is occasionally covered in the press.

- *Estimating the informativeness of data*, MIT News, Apr. 2022
- *Exact symbolic AI for faster, better assessment of AI fairness*, MIT News, Aug. 2021
- *How and why computers roll loaded dice*, Quanta Magazine, Jul. 2020
- *How Rolling Loaded Dice Will Change the Future of AI*, Intel Press Release, Jul. 2020
- *Algorithm quickly simulates a roll of loaded dice*, MIT News, Jan. 2020
- *MIT Debuts Gen, a Julia-Based Language for Artificial Intelligence*, InfoQ, Jul. 2019
- *New AI programming language goes beyond deep learning*, MIT News, Jun. 2019
- *Democratizing data science*, MIT News, Jan. 2019
- *MIT lets AI 'synthesize' computer programs to aid data scientists*, ZDNet, Jan. 2019

## Awards

Department Head Special Recognition Award for Exceptional Academic Service

MIT Department of Electrical Engineering and Computer Science, 2018

Charles & Jennifer Johnson Computer Science Master of Engineering Thesis Award, 1st Place

MIT Department of Electrical Engineering and Computer Science, 2017