# Feras Saad

I recently completed my PhD in EECS at MIT (MEng/SB 2016), where I worked with Dr. Vikash Mansinghka and Prof. Martin Rinard. My research spans programming languages, artificial intelligence, and computational statistics.

I am pleased to be joining the Computer Science Department at Carnegie Mellon University as an Assistant Professor in Fall 2023. Before joining, I am spending one year as a Visiting Research Scientist at Google.

My group is recruiting students and postdocs! If you are interested in the research areas described below, please send me an email; prospective students should also apply to the CMU CS PhD program.

## Research

I have broad interests in developing new techniques that enable large-scale probabilistic modeling, inference, and computation across many applications. My work integrates ideas from programming and probability and encompasses the following research areas:

**Probabilistic programming**.
Programs are a uniquely expressive formalism for modeling and understanding complex empirical phenomena.
I am interested in building systems that help automate, formalize, and scale up hard aspects of modeling and inference.
Projects include synthesizing probabilistic programs for automated model discovery,
symbolic solvers for fast Bayesian inference,
modeling and query DSLs for Bayesian databases,
and general-purpose systems for integrating symbolic, probabilistic, and neural approaches to engineering intelligent systems.

**Automatically discovering models from data**.
How can we rapidly convert datasets into probabilistic models that surface
interpretable patterns and make accurate predictions?
My approach is to perform Bayesian nonparametric inference over flexible symbolic model representations
that combine simple forms of judgement into powerful models in aggregate.
These methods operate within domain-specific data modeling languages
and have been applied to discovering models of cross-sectional data, relational systems,
and univariate and multivariate time series.
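Several of the models listed under Publications (for example, the temporally-reweighted CRP mixtures) build on the Chinese restaurant process (CRP), a prior over partitions in which the number of clusters is not fixed in advance and can grow with the data. The minimal sketch below only draws a partition from a CRP prior with concentration parameter `alpha`; the function name `sample_crp` is hypothetical, and this illustrates the building block, not any of the full inference procedures mentioned above.

```python
import random


def sample_crp(n, alpha, seed=None):
    """Sample table assignments for n customers from a CRP(alpha) prior.

    Hypothetical helper for illustration only.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[t]: number of customers seated at table t
    for _ in range(n):
        # Existing table t has weight counts[t]; a new table has weight alpha.
        # rng.choices normalizes the weights, so the next customer joins table t
        # with probability counts[t] / (i + alpha) and a new table with
        # probability alpha / (i + alpha), where i customers are already seated.
        weights = counts + [alpha]
        t = rng.choices(range(len(weights)), weights=weights)[0]
        if t == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[t] += 1
        assignments.append(t)
    return assignments, counts
```

Larger values of `alpha` tend to produce more tables; under the CRP the expected number of tables grows roughly logarithmically in `n`.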

**Statistical estimators and tests**.
Now that most interesting probability distributions are expressed as computer programs
(e.g., stochastic simulators or probabilistic programs),
we need new techniques to analyze their statistical properties through the black-box interfaces they expose.
Examples include goodness-of-fit tests for programs
that simulate random discrete data structures and estimators of entropy and information
for probabilistic generative models.
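To make the black-box setting concrete, here is a minimal sketch of the simplest estimator of this kind: a plain Monte Carlo estimate of entropy, H(X) = -E[log p(X)], for a model that exposes only two operations, `simulate` to draw a sample and `logpdf` to score one. The `Categorical` class and both helper functions are hypothetical stand-ins, and this baseline assumes the density of X is tractable; it is not the inference-based estimator from the AISTATS 2022 paper listed under Publications.

```python
import math
import random


class Categorical:
    """Hypothetical black-box model exposing simulate(rng) and logpdf(x)."""

    def __init__(self, probs):
        self.probs = list(probs)

    def simulate(self, rng):
        return rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def logpdf(self, x):
        return math.log(self.probs[x])


def entropy_mc(model, num_samples, seed=0):
    """Monte Carlo estimate of H(X) = -E[log p(X)] in nats, plus its standard error."""
    rng = random.Random(seed)
    vals = [-model.logpdf(model.simulate(rng)) for _ in range(num_samples)]
    mean = sum(vals) / num_samples
    var = sum((v - mean) ** 2 for v in vals) / (num_samples - 1)
    return mean, math.sqrt(var / num_samples)


def entropy_exact(probs):
    """Closed-form entropy in nats, used here only to check the estimate."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

The estimator touches the model only through `simulate` and `logpdf`, so the same code applies to any program exposing those two operations, provided `logpdf` can be computed exactly.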

**Fast random sampling algorithms**.
Generating random variates is a fundamental operation enabling probabilistic computation.
I am interested in exploring the computational limits of sampling,
devising algorithms that are theoretically optimal or near-optimal in entropy and error,
and engineering samplers with extremely efficient runtime and memory (see, for example, the Optimal Approximate Sampling and Fast Loaded Dice Roller projects).
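As a rough illustration of an exact sampler driven by fair coin flips, the following is a minimal Python sketch of the Fast Loaded Dice Roller idea as we understand it: positive integer weights are padded with one extra "reject" outcome so the total becomes a power of two, their binary expansions are stored level by level as a discrete distribution generating (DDG) tree, and sampling walks that tree one random bit at a time, restarting on the reject leaf. The names `fldr_preprocess` and `fldr_sample` are hypothetical; this is not the reference implementation.

```python
import random
from collections import Counter


def fldr_preprocess(weights):
    """Build DDG-tree tables for positive integer weights (hypothetical helper)."""
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive integers")
    n = len(weights)
    m = sum(weights)
    k = (m - 1).bit_length()              # smallest k with 2**k >= m
    a = list(weights) + [2**k - m]        # extra "reject" outcome pads the total to 2**k
    h = [0] * k                           # h[j]: number of leaves at depth j + 1
    H = [[-1] * k for _ in range(n + 1)]  # H[d][j]: label of the d-th leaf at depth j + 1
    for j in range(k):
        d = 0
        for i in range(n + 1):
            if (a[i] >> (k - 1 - j)) & 1:  # coefficient of 2**-(j + 1) in a[i] / 2**k
                h[j] += 1
                H[d][j] = i
                d += 1
    return n, h, H


def fldr_sample(n, h, H, flip=lambda: random.getrandbits(1)):
    """Draw one outcome in range(n) using fair coin flips (hypothetical helper)."""
    if n == 1:
        return 0
    d, c = 0, 0
    while True:
        d = 2 * d + (1 - flip())  # descend one level of the tree
        if d < h[c]:              # landed on a leaf at this level
            if H[d][c] < n:
                return H[d][c]
            d, c = 0, 0           # hit the reject leaf: restart from the root
        else:
            d -= h[c]             # skip past this level's leaves
            c += 1


if __name__ == "__main__":
    n, h, H = fldr_preprocess([1, 2])  # target distribution (1/3, 2/3)
    counts = Counter(fldr_sample(n, h, H) for _ in range(30000))
    print({i: counts[i] / 30000 for i in range(n)})
```

Because preprocessing uses only integer bit operations, the sampled distribution is exactly proportional to the integer weights rather than a floating-point approximation of them.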

**Applications**.
To be useful in modern applications, all these methods must be implemented in performant software systems,
many of which are available as open-source projects.
A long-term aim is to build software that makes probabilistic learning more broadly accessible
and helps domain specialists solve applied problems in the sciences, in engineering, and in the public interest.

## Publications

Scalable Structure Learning, Inference, and Analysis with Probabilistic Programs

**F. Saad**

PhD Thesis, Massachusetts Institute of Technology, 2022

Estimators of Entropy and Information via Inference in Probabilistic Models

**F. Saad**, M. Cusumano-Towner, V. Mansinghka

AISTATS, Proc. 25th International Conference on Artificial Intelligence and Statistics, 2022

Bayesian AutoML for Databases via the InferenceQL Probabilistic Programming System

U. Schaechtle, C. Freer, Z. Shelby, **F. Saad**, V. Mansinghka

AutoML, 1st International Conference on Automated Machine Learning (Late-Breaking Workshop), 2022

SPPL: Probabilistic Programming with Fast Exact Symbolic Inference

**F. Saad**, M. Rinard, V. Mansinghka

PLDI, Proc. 42nd International Conference on Programming Language Design and Implementation, 2021

Hierarchical Infinite Relational Model

**F. Saad**, V. Mansinghka

UAI, Proc. 37th Conference on Uncertainty in Artificial Intelligence, 2021

**Oral Presentation**

The Fast Loaded Dice Roller: A Near Optimal Exact Sampler for Discrete Probability Distributions

**F. Saad**, C. Freer, M. Rinard, V. Mansinghka

AISTATS, Proc. 23rd International Conference on Artificial Intelligence and Statistics, 2020

Optimal Approximate Sampling from Discrete Probability Distributions

**F. Saad**, C. Freer, M. Rinard, V. Mansinghka

POPL, Proc. ACM Program. Lang. 4(POPL), 2020

A Family of Exact Goodness-of-Fit Tests for High-dimensional Discrete Distributions

**F. Saad**, C. Freer, N. Ackerman, V. Mansinghka

AISTATS, Proc. 22nd International Conference on Artificial Intelligence and Statistics, 2019

Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

**F. Saad**, M. Cusumano-Towner, U. Schaechtle, M. Rinard, V. Mansinghka

POPL, Proc. ACM Program. Lang. 3(POPL), 2019

Gen: A General Purpose Probabilistic Programming System with Programmable Inference

M. Cusumano-Towner, **F. Saad**, A. Lew, V. Mansinghka

PLDI, Proc. 40th International Conference on Programming Language Design and Implementation, 2019

Elements of a Stochastic 3D Prediction Engine in Larval Zebrafish Prey Capture

A. Bolton, M. Haesemeyer, J. Jordi, U. Schaechtle, **F. Saad**, V. Mansinghka, J. Tenenbaum, F. Engert

eLife 8:e51975, 2019

Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series

**F. Saad**, V. Mansinghka

AISTATS, Proc. 21st International Conference on Artificial Intelligence and Statistics, 2018

Goodness-of-Fit Tests for High-dimensional Discrete Distributions with Application to Convergence Diagnostics in Approximate Bayesian Inference

**F. Saad**, C. Freer, N. Ackerman, V. Mansinghka

AABI, 1st Symposium on Advances in Approximate Bayesian Inference, 2018

Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes

**F. Saad**, V. Mansinghka

AISTATS, Proc. 20th International Conference on Artificial Intelligence and Statistics, 2017

Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

**F. Saad**, L. Casarsa, V. Mansinghka

Technical Report *arXiv:1704.01087*, 2017

PROBPROG, 1st International Conference on Probabilistic Programming, 2018

Time Series Structure Discovery via Probabilistic Program Synthesis

U. Schaechtle\*, **F. Saad**\*, A. Radul, V. Mansinghka

Technical Report *arXiv:1611.07051*, 2017

PROBPROG, 1st International Conference on Probabilistic Programming, 2018

A Probabilistic Programming Approach to Probabilistic Data Analysis

**F. Saad**, V. Mansinghka

NIPS, Proc. 30th Conference on Neural Information Processing Systems, 2016

Probabilistic Data Analysis with Probabilistic Programming

**F. Saad**, V. Mansinghka

Technical Report *arXiv:1608.05347*, 2016

Extended version of the NIPS 2016 paper.

Probabilistic Data Analysis with Probabilistic Programming

**F. Saad**

MEng Thesis, Massachusetts Institute of Technology, 2016

See also Google Scholar | dblp | BibTeX

## Software

Open-source software from research projects (2500+ GitHub stars).

estimators of entropy and information (AISTATS 2022)

Scalable estimators of information-theoretic quantities between probabilistic program variables.

hierarchical infinite relational model (UAI 2021)

Bayesian nonparametric structure learning for complex relational systems.

sum-product probabilistic language (PLDI 2021)

Probabilistic programming language with fast exact symbolic inference.

fast loaded dice roller (AISTATS 2020)

Fast exact sampler for discrete probability distributions (see code generator by P. Occil).

optimal approximate sampling (POPL 2020)

Optimal limited-precision approximate sampler for discrete distributions.

temporal CRP mixture models (AISTATS 2018)

Bayesian method for clustering, imputing, and forecasting multivariate time series.

Gen (PLDI 2019)

General-purpose probabilistic programming system with programmable inference.

bayesdb (AISTATS 2017, NIPS 2016, MEng Thesis)

Probabilistic programming database for probabilistic data analysis built on SQLite.

cgpm (AISTATS 2017, NIPS 2016, MEng Thesis)

Library of composable probabilistic models, used as the modeling and inference backend of bayesdb.

__Sublime Text users__, check out these productivity plugins (8000+ users):

AddRemoveFolder;
RemoveLineBreaks;
ViewSetting.

## Talks

Videos of presentations at conferences, workshops, and seminars.

Scalable Structure Learning and Inference via Probabilistic Programming

*Thesis Defense* (Online Version), Virtual

*Thesis Defense* (In-Person Version), Cambridge, MA

Scalable Structure Learning and Inference for Domain-Specific Probabilistic Programs

at *LAFI 2022*, Philadelphia, PA

SPPL: Probabilistic Programming with Fast Exact Symbolic Inference

at *PROBPROG 2021*, Virtual

at *SPLASH 2021*, Chicago, IL

at *MIT PLSE Seminars 2021*, Virtual

at *PLDI 2021* [extended] [lightning], Virtual

Fairer and Faster AI Using Probabilistic Programming Languages

at *MIT Horizon 2021*, Virtual

Hierarchical Infinite Relational Model

at *UAI 2021* (Oral Presentation), Virtual

Fast Loaded Dice Roller

at *AISTATS 2020*, Virtual

Optimal Approximate Sampling

at *POPL 2020*, New Orleans, LA

Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

at *CICS Seminars 2019*, Notre Dame, IN

at *POPL 2019*, Cascais, Portugal

at *PROBPROG 2018*, Boston, MA

## Press

My research is occasionally covered in the press.

*Estimating the informativeness of data*, MIT News, Apr. 2022

*Exact symbolic AI for faster, better assessment of AI fairness*, MIT News, Aug. 2021

*How and why computers roll loaded dice*, Quanta Magazine, Jul. 2020

*How Rolling Loaded Dice Will Change the Future of AI*, Intel Press Release, Jul. 2020

*Algorithm quickly simulates a roll of loaded dice*, MIT News, Jan. 2020

*MIT Debuts Gen, a Julia-Based Language for Artificial Intelligence*, InfoQ, Jul. 2019

*New AI programming language goes beyond deep learning*, MIT News, Jun. 2019

*Democratizing data science*, MIT News, Jan. 2019

*MIT lets AI "synthesize" computer programs to aid data scientists*, ZDNet, Jan. 2019

## Awards

Department Head Special Recognition Award for Exceptional Academic Service

MIT Department of Electrical Engineering and Computer Science, 2018

Charles & Jennifer Johnson Computer Science Master of Engineering Thesis Award, 1st Place

MIT Department of Electrical Engineering and Computer Science, 2017