# Feras Saad

I recently completed my PhD in EECS at MIT (MEng/SB 2016), where I worked with Vikash Mansinghka at the Probabilistic Computing Project and Martin Rinard at the Computer Science and Artificial Intelligence Laboratory. I work broadly in the areas of programming languages, statistics, and artificial intelligence.

I am pleased to be joining the Computer Science Department at Carnegie Mellon University as an Assistant Professor in Fall 2023. Before joining, I am spending one year as a Visiting Research Scientist at Google. My group is recruiting students and postdocs. If you are interested in working with me, please email me at fsaad@cmu.edu and (for prospective students) apply to the CMU CS PhD program.

## Research

I am interested in developing techniques that enable large-scale probabilistic modeling, inference, and computation across challenging application domains. Some current research themes include:

**Probabilistic programming**.
Programs give us a uniquely expressive formalism for modeling and understanding
complex empirical phenomena. My work develops new programmable systems that
help automate, formalize, and scale up the hardest aspects of modeling and inference.
[PLDI-21]
[POPL-19]
[PLDI-19]
[AISTATS-17]
[NIPS-16]

**Automatically discovering models from data**.
A grand challenge of AI is the ability to automate the process of discovering
accurate and interpretable models from data.
A mathematically elegant and practical approach to this problem is
Bayesian structure learning over rich probabilistic model families.
[ICML-23]
[UAI-21]
[POPL-19]
[AISTATS-18]

**Simulation-based statistical estimators and tests**.
As probabilistic programs become more widespread for representing complex probability distributions,
scalable simulation-based techniques are needed to analyze their statistical
properties using the black-box computational interfaces they expose.
[AISTATS-22]
[AISTATS-19]

**Fast random sampling algorithms**.
This thread explores fundamental computational limits of random sampling,
including new algorithms that are theoretically optimal/near-optimal
(in entropy and error) and extremely efficient in practice.
[POPL-20]
[AISTATS-20]

**Software and Applications**.
A central goal of my work is to build performant, freely available
software systems that make probabilistic modeling and inference more
broadly accessible and help domain specialists solve high-impact applied
problems in the sciences, engineering, and public interest.
[eLife-2019]

## Publications

Sequential Monte Carlo Learning for Time Series Structure Discovery

**F. Saad**, B. Patton, M. Hoffman, R. Saurous, V. Mansinghka

ICML, Proc. 40th International Conference on Machine Learning, 2023

*To Appear*

Scalable Structure Learning, Inference, and Analysis with Probabilistic Programs

**F. Saad**

PhD Thesis, Massachusetts Institute of Technology, 2022

Estimators of Entropy and Information via Inference in Probabilistic Models

**F. Saad**, M. Cusumano-Towner, V. Mansinghka

AISTATS, Proc. 25th International Conference on Artificial Intelligence and Statistics, 2022

paper | link | arXiv | supplement | BibTeX

Bayesian AutoML for Databases via the InferenceQL Probabilistic Programming System

U. Schaechtle, C. Freer, Z. Shelby, **F. Saad**, V. Mansinghka

AutoML, 1st International Conference on Automated Machine Learning (Late-Breaking Workshop), 2022

paper | link | BibTeX

SPPL: Probabilistic Programming with Fast Exact Symbolic Inference

**F. Saad**, M. Rinard, V. Mansinghka

PLDI, Proc. 42nd International Conference on Programming Language Design and Implementation, 2021

paper | artifact | link | arXiv | code | BibTeX

Hierarchical Infinite Relational Model

**F. Saad**, V. Mansinghka

UAI, Proc. 37th Conference on Uncertainty in Artificial Intelligence, 2021

Oral Presentation

paper | link | arXiv | code | BibTeX

The Fast Loaded Dice Roller: A Near Optimal Exact Sampler for Discrete Probability Distributions

**F. Saad**, C. Freer, M. Rinard, V. Mansinghka

AISTATS, Proc. 23rd International Conference on Artificial Intelligence and Statistics, 2020

paper | supplement | link | arXiv | code | BibTeX

Optimal Approximate Sampling from Discrete Probability Distributions

**F. Saad**, C. Freer, M. Rinard, V. Mansinghka

POPL, Proc. ACM Program. Lang. 4(POPL), 2020

paper | supplement | artifact | link | code | arXiv | BibTeX

A Family of Exact Goodness-of-Fit Tests for High-dimensional Discrete Distributions

**F. Saad**, C. Freer, N. Ackerman, V. Mansinghka

AISTATS, Proc. 22nd International Conference on Artificial Intelligence and Statistics, 2019

paper | supplement | link | arXiv | BibTeX

Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

**F. Saad**, M. Cusumano-Towner, U. Schaechtle, M. Rinard, V. Mansinghka

POPL, Proc. ACM Program. Lang. 3(POPL), 2019

paper | supplement | link | arXiv | BibTeX

Gen: A General Purpose Probabilistic Programming System with Programmable Inference

M. Cusumano-Towner, **F. Saad**, A. Lew, V. Mansinghka

PLDI, Proc. 40th International Conference on Programming Language Design and Implementation, 2019

paper | link | code | BibTeX

Elements of a Stochastic 3D Prediction Engine in Larval Zebrafish Prey Capture

A. Bolton, M. Haesemeyer, J. Jordi, U. Schaechtle, **F. Saad**, V. Mansinghka, J. Tenenbaum, F. Engert

eLife 8:e51975, 2019

paper | BibTeX

Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series

**F. Saad**, V. Mansinghka

AISTATS, Proc. 21st International Conference on Artificial Intelligence and Statistics, 2018

paper | supplement | link | code | BibTeX

Goodness-of-Fit Tests for High-dimensional Discrete Distributions with Application to Convergence Diagnostics in Approximate Bayesian Inference

**F. Saad**, C. Freer, N. Ackerman, V. Mansinghka

AABI, 1st Symposium on Advances in Approximate Bayesian Inference, 2018

paper | link | BibTeX

Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes

**F. Saad**, V. Mansinghka

AISTATS, Proc. 20th International Conference on Artificial Intelligence and Statistics, 2017

paper | supplement | link | arXiv | code | BibTeX

Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

**F. Saad**, L. Casarsa, V. Mansinghka

arXiv, Technical Report *arXiv:1704.01087*, 2017

PROBPROG, 1st International Conference for Probabilistic Programming, 2018

arXiv | code | BibTeX

Time Series Structure Discovery via Probabilistic Program Synthesis

U. Schaechtle\*, **F. Saad**\*, A. Radul, V. Mansinghka

arXiv, Technical Report *arXiv:1611.07051*, 2017

PROBPROG, 1st International Conference for Probabilistic Programming, 2018

arXiv | BibTeX

A Probabilistic Programming Approach to Probabilistic Data Analysis

**F. Saad**, V. Mansinghka

NIPS, Proc. 30th Conference on Neural Information Processing Systems, 2016

paper | link | code | BibTeX

Probabilistic Data Analysis with Probabilistic Programming

**F. Saad**, V. Mansinghka

arXiv, Technical Report *arXiv:1608.05347*, 2016

Extended version of NIPS 2016.

arXiv | BibTeX

Probabilistic Data Analysis with Probabilistic Programming

**F. Saad**

MEng Thesis, Massachusetts Institute of Technology, 2016

Charles & Jennifer Johnson MEng Thesis Award, 1st Place

See also Google Scholar | dblp | BibTeX

## Software

Software and repositories from research projects (2500+ GitHub stars).

estimators of entropy and information (AISTATS 2022)

Scalable estimators of information-theoretic quantities between probabilistic program variables.

hierarchical infinite relational model (UAI 2021)

Bayesian nonparametric structure learning for complex relational systems.

sum-product probabilistic language (PLDI 2021)

Probabilistic programming language with fast exact symbolic inference.

fast loaded dice roller (AISTATS 2020)

Fast exact sampler for discrete probability distributions (see code generator by P. Occil).

optimal approximate sampling (POPL 2020)

Optimal limited-precision approximate sampler for discrete distributions.

temporal CRP mixture models (AISTATS 2018)

Bayesian method for clustering, imputing, and forecasting multivariate time series.

Gen (PLDI 2019)

General-purpose probabilistic programming system with programmable inference.

bayesdb (AISTATS 2017, NIPS 2016, MEng Thesis)

Probabilistic programming database for probabilistic data analysis built on sqlite.

cgpm (AISTATS 2017, NIPS 2016, MEng Thesis)

Library of composable probabilistic models, used as the modeling/inference backend of bayesdb.

__Sublime Text Users__: You may be interested in these productivity plugins (8000+ users):

AddRemoveFolder;
RemoveLineBreaks;
ViewSetting.

## Talks

Scalable Structure Learning and Inference via Probabilistic Programming

*MIT Thesis Defense*, Virtual

*MIT Thesis Defense*, Cambridge, MA

Scalable Structure Learning and Inference for Domain-Specific Probabilistic Programs

*LAFI 2022*, Philadelphia, PA

SPPL: Probabilistic Programming with Fast Exact Symbolic Inference

*PROBPROG 2021*, Virtual

*SPLASH 2021*, Chicago, IL

*MIT PLSE Seminars 2021*, Virtual

*PLDI 2021* [extended] [lightning], Virtual

Fairer and Faster AI Using Probabilistic Programming Languages

*MIT Horizon 2021*, Virtual

Hierarchical Infinite Relational Model

*UAI 2021* (Oral Presentation), Virtual

Fast Loaded Dice Roller

*AISTATS 2020*, Virtual

Optimal Approximate Sampling from Discrete Probability Distributions

*POPL 2020*, New Orleans, LA

Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

*CICS Seminars 2019*, Notre Dame, IN

*POPL 2019*, Cascais, Portugal

*PROBPROG 2018*, Boston, MA

## Press

*Estimating the informativeness of data*, MIT News, Apr. 2022

*Exact symbolic AI for faster, better assessment of AI fairness*, MIT News, Aug. 2021

*How and why computers roll loaded dice*, Quanta Magazine, Jul. 2020

*How Rolling Loaded Dice Will Change the Future of AI*, Intel Press Release, Jul. 2020

*Algorithm quickly simulates a roll of loaded dice*, MIT News, Jan. 2020

*MIT Debuts Gen, a Julia-Based Language for Artificial Intelligence*, InfoQ, Jul. 2019

*New AI programming language goes beyond deep learning*, MIT News, Jun. 2019

*Democratizing data science*, MIT News, Jan. 2019

*MIT lets AI "synthesize" computer programs to aid data scientists*, ZDNet, Jan. 2019