Feras Saad
I recently completed the PhD in EECS at MIT (MEng/SB 2016), where I worked with Vikash Mansinghka at the Probabilistic Computing Project and Martin Rinard at the Computer Science and Artificial Intelligence Laboratory. I work broadly in the areas of programming languages, statistics, and artificial intelligence.
I am pleased to be joining the Computer Science Department at Carnegie Mellon University as an Assistant Professor in Fall 2023. Before joining, I am spending one year as a Visiting Research Scientist at Google. My group is recruiting students and postdocs. If you are interested in working with me, please email me at fsaad@cmu.edu and, if you are a prospective student, apply to the CMU CS PhD program.
Research
I am interested in developing techniques that enable large-scale probabilistic modeling, inference, and computation across challenging application domains. Some current research themes include:
Probabilistic programming.
Programs give us a uniquely expressive formalism for modeling and understanding
complex empirical phenomena. My work develops new programmable systems that
help automate, formalize, and scale up the hardest aspects of modeling and inference.
[PLDI-21]
[POPL-19]
[PLDI-19]
[AISTATS-17]
[NIPS-16]
Automatically discovering models from data.
A grand challenge of AI is the ability to automate the process of discovering
accurate and interpretable models from data.
A mathematically elegant and practical approach to this problem is
Bayesian structure learning over rich probabilistic model families.
[ICML-23]
[UAI-21]
[POPL-19]
[AISTATS-18]
Simulation-based statistical estimators and tests.
As probabilistic programs become more widespread for representing complex probability distributions,
scalable simulation-based techniques are needed to analyze their statistical
properties using the black-box computational interfaces they expose.
[AISTATS-22]
[AISTATS-19]
Fast random sampling algorithms.
This thread explores fundamental computational limits of random sampling,
including new algorithms that are theoretically optimal/near-optimal
(in entropy and error) and extremely efficient in practice.
[POPL-20]
[AISTATS-20]
Software and Applications.
A central goal of my work is to build performant, freely available
software systems that make probabilistic modeling and inference more
broadly accessible and help domain specialists solve high-impact applied
problems in the sciences, engineering, and public interest.
[eLife-2019]
Publications
Sequential Monte Carlo Learning for Time Series Structure Discovery
F. Saad, B. Patton, M. Hoffman, R. Saurous, V. Mansinghka
ICML, Proc. 40th International Conference on Machine Learning, 2023
To Appear
Scalable Structure Learning, Inference, and Analysis with Probabilistic Programs
F. Saad
PhD Thesis, Massachusetts Institute of Technology, 2022
Estimators of Entropy and Information via Inference in Probabilistic Models
F. Saad, M. Cusumano-Towner, V. Mansinghka
AISTATS, Proc. 25th International Conference on Artificial Intelligence and Statistics, 2022
paper | link | arXiv | supplement | BibTeX
Bayesian AutoML for Databases via the InferenceQL Probabilistic Programming System
U. Schaechtle, C. Freer, Z. Shelby, F. Saad, V. Mansinghka
AutoML, 1st International Conference on Automated Machine Learning (Late-Breaking Workshop), 2022
paper | link | BibTeX
SPPL: Probabilistic Programming with Fast Exact Symbolic Inference
F. Saad, M. Rinard, V. Mansinghka
PLDI, Proc. 42nd International Conference on Programming Language Design and Implementation, 2021
paper | artifact | link | arXiv | code | BibTeX
Hierarchical Infinite Relational Model
F. Saad, V. Mansinghka
UAI, Proc. 37th Conference on Uncertainty in Artificial Intelligence, 2021
Oral Presentation
paper | link | arXiv | code | BibTeX
The Fast Loaded Dice Roller: A Near Optimal Exact Sampler for Discrete Probability Distributions
F. Saad, C. Freer, M. Rinard, V. Mansinghka
AISTATS, Proc. 23rd International Conference on Artificial Intelligence and Statistics, 2020
paper | supplement | link | arXiv | code | BibTeX
Optimal Approximate Sampling from Discrete Probability Distributions
F. Saad, C. Freer, M. Rinard, V. Mansinghka
POPL, Proc. ACM Program. Lang. 4(POPL), 2020
paper | supplement | artifact | link | code | arXiv | BibTeX
A Family of Exact Goodness-of-Fit Tests for High-dimensional Discrete Distributions
F. Saad, C. Freer, N. Ackerman, V. Mansinghka
AISTATS, Proc. 22nd International Conference on Artificial Intelligence and Statistics, 2019
paper | supplement | link | arXiv | BibTeX
Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling
F. Saad, M. Cusumano-Towner, U. Schaechtle, M. Rinard, V. Mansinghka
POPL, Proc. ACM Program. Lang. 3(POPL), 2019
paper | supplement | link | arXiv | BibTeX
Gen: A General Purpose Probabilistic Programming System with Programmable Inference
M. Cusumano-Towner, F. Saad, A. Lew, V. Mansinghka
PLDI, Proc. 40th International Conference on Programming Language Design and Implementation, 2019
paper | link | code | BibTeX
Elements of a Stochastic 3D Prediction Engine in Larval Zebrafish Prey Capture
A. Bolton, M. Haesemeyer, J. Jordi, U. Schaechtle, F. Saad, V. Mansinghka, J. Tenenbaum, F. Engert
eLife 8:e51975, 2019
paper | BibTeX
Temporally-Reweighted Chinese Restaurant Process Mixtures
for Clustering, Imputing, and Forecasting Multivariate Time Series
F. Saad, V. Mansinghka
AISTATS, Proc. 21st International Conference on Artificial Intelligence and Statistics, 2018
paper | supplement | link | code | BibTeX
Goodness-of-Fit Tests for High-dimensional Discrete Distributions
with Application to Convergence Diagnostics in Approximate Bayesian Inference
F. Saad, C. Freer, N. Ackerman, V. Mansinghka
AABI, 1st Symposium on Advances in Approximate Bayesian Inference, 2018
paper | link | BibTeX
Detecting Dependencies in Sparse, Multivariate Databases
Using Probabilistic Programming and Non-parametric Bayes
F. Saad, V. Mansinghka
AISTATS, Proc. 20th International Conference on Artificial Intelligence and Statistics, 2017
paper | supplement | link | arXiv | code | BibTeX
Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes
F. Saad, L. Casarsa, V. Mansinghka
arXiv, Technical Report arXiv:1704.01087, 2017
PROBPROG, 1st International Conference for Probabilistic Programming, 2018
arXiv | code | BibTeX
Time Series Structure Discovery via Probabilistic Program Synthesis
U. Schaechtle*, F. Saad*, A. Radul, V. Mansinghka
arXiv, Technical Report arXiv:1611.07051, 2017
PROBPROG, 1st International Conference for Probabilistic Programming, 2018
arXiv | BibTeX
A Probabilistic Programming Approach to Probabilistic Data Analysis
F. Saad, V. Mansinghka
NIPS, Proc. 30th Conference on Neural Information Processing Systems, 2016
paper | link | code | BibTeX
Probabilistic Data Analysis with Probabilistic Programming
F. Saad, V. Mansinghka
arXiv, Technical Report arXiv:1608.05347, 2016
Extended version of NIPS 2016.
arXiv | BibTeX
Probabilistic Data Analysis with Probabilistic Programming
F. Saad
MEng Thesis, Massachusetts Institute of Technology, 2016
Charles & Jennifer Johnson MEng Thesis Award, 1st Place
See also Google Scholar | dblp | BibTeX
Software
Software and repositories from research projects (2500+ GitHub stars).
estimators of entropy and information (AISTATS 2022)
Scalable estimators of information-theoretic quantities between probabilistic program variables.
hierarchical infinite relational model (UAI 2021)
Bayesian nonparametric structure learning for complex relational systems.
sum-product probabilistic language (PLDI 2021)
Probabilistic programming language with fast exact symbolic inference.
fast loaded dice roller (AISTATS 2020)
Fast exact sampler for discrete probability distributions (see code generator by P. Occil).
optimal approximate sampling (POPL 2020)
Optimal limited-precision approximate sampler for discrete distributions.
temporal CRP mixture models (AISTATS 2018)
Bayesian method for clustering, imputing, and forecasting multivariate time series.
Gen (PLDI 2019)
General-purpose probabilistic programming system with programmable inference.
bayesdb (AISTATS 2017, NIPS 2016, MEng Thesis)
Probabilistic programming database for probabilistic data analysis built on sqlite.
cgpm (AISTATS 2017, NIPS 2016, MEng Thesis)
Library of composable probabilistic models, used as the modeling/inference backend of bayesdb.
Sublime Text Users: You may be interested in these productivity plugins (8000+ users):
AddRemoveFolder;
RemoveLineBreaks;
ViewSetting.
Talks
Scalable Structure Learning and Inference via Probabilistic Programming
MIT Thesis Defense, Virtual
MIT Thesis Defense, Cambridge, MA
Scalable Structure Learning and Inference for Domain-Specific Probabilistic Programs
LAFI 2022, Philadelphia, PA
SPPL: Probabilistic Programming with Fast Exact Symbolic Inference
PROBPROG 2021, Virtual
SPLASH 2021, Chicago, IL
MIT PLSE Seminars 2021, Virtual
PLDI 2021 [extended] [lightning], Virtual
Fairer and Faster AI Using Probabilistic Programming Languages
MIT Horizon 2021, Virtual
Hierarchical Infinite Relational Model
UAI 2021 (Oral Presentation), Virtual
Fast Loaded Dice Roller
AISTATS 2020, Virtual
Optimal Approximate Sampling from Discrete Probability Distributions
POPL 2020, New Orleans, LA
Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling
CICS Seminars 2019, Notre Dame, IN
POPL 2019, Cascais, Portugal
PROBPROG 2018, Boston, MA
Press
- Estimating the informativeness of data, MIT News, Apr. 2022
- Exact symbolic AI for faster, better assessment of AI fairness, MIT News, Aug. 2021
- How and why computers roll loaded dice, Quanta Magazine, Jul. 2020
- How Rolling Loaded Dice Will Change the Future of AI, Intel Press Release, Jul. 2020
- Algorithm quickly simulates a roll of loaded dice, MIT News, Jan. 2020
- MIT Debuts Gen, a Julia-Based Language for Artificial Intelligence, InfoQ, Jul. 2019
- New AI programming language goes beyond deep learning, MIT News, Jun. 2019
- Democratizing data science, MIT News, Jan. 2019
- MIT lets AI "synthesize" computer programs to aid data scientists, ZDNet, Jan. 2019