// about

Postdoc in the SAILING Lab in the Machine Learning Department at CMU, working with Eric Xing.

// research interests

- Machine learning
- Graphical models
- High-dimensional statistics
- Computational biology
- Unsupervised learning

// recent work

Aragam, B., Gu, J., and Zhou, Q. (2017). Learning Large-Scale Bayesian Networks with the sparsebn Package. (preprint)

Learning graphical models from data is an important problem with wide applications, ranging from genomics to the social sciences. Nowadays, datasets typically have upwards of thousands (sometimes tens or hundreds of thousands) of variables and far fewer samples. To meet this challenge, we develop a new R package called sparsebn for learning the structure of large, sparse graphical models with a focus on Bayesian networks. While there are many existing packages for this task within the R ecosystem, this package focuses on the unique setting of learning large networks from high-dimensional data, possibly with interventions. As such, the methods provided place a premium on scalability and consistency in a high-dimensional setting. Furthermore, in the presence of interventions, the methods implemented here achieve the goal of learning a causal network from data. The sparsebn package is open-source and available on CRAN.

Keywords: R, software, graphical modeling, directed acyclic graphs, structural equations

Aragam, B., Amini, A. and Zhou, Q. (2015). Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression. (preprint)

We consider the problem of estimating a directed acyclic graph (DAG) for a multivariate normal distribution from high-dimensional data with p >> n. Our main results establish nonasymptotic deviation bounds on the estimation error, sparsity bounds, and model selection consistency for a penalized least squares estimator under concave regularization. The proofs rely on interpreting the graphical model as a recursive linear structural equation model, which reduces the estimation problem to a series of tractable neighbourhood regressions and allows us to avoid making any assumptions regarding faithfulness. In doing so, we provide some novel techniques for handling general nonidentifiable and nonconvex problems. These techniques are used to guarantee uniform control over a superexponential number of neighbourhood regression problems by exploiting various notions of monotonicity among them. Our results apply to a wide variety of practical situations that allow for arbitrary nondegenerate covariance structures as well as many popular regularizers including the MCP, SCAD, L0 and L1.

Keywords: Graphical modeling, high-dimensional statistics, concave regularization, directed acyclic graphs, structural equations, sparse regression

Aragam, B. and Zhou, Q. (2015). Concave penalized estimation of sparse Gaussian Bayesian networks. Journal of Machine Learning Research, 16:2273–2328. (link)

We develop a penalized likelihood estimation framework to learn the structure of Gaussian Bayesian networks from observational data. In contrast to recent methods which accelerate the learning problem by restricting the search space, our main contribution is a fast algorithm for score-based structure learning which does not restrict the search space in any way and works on high-dimensional data sets with thousands of variables. Our use of concave regularization, as opposed to the more popular L0 (e.g. BIC) penalty, is new. Moreover, we provide theoretical guarantees which generalize existing asymptotic results when the underlying distribution is Gaussian. Most notably, our framework does not require the existence of a so-called faithful DAG representation, and as a result, the theory must handle the inherent nonidentifiability of the estimation problem in a novel way. Finally, as a matter of independent interest, we provide a comprehensive comparison of our approach to several standard structure learning methods using open-source packages developed for the R language. Based on these experiments, we show that our algorithm obtains higher sensitivity with comparable false discovery rates for high-dimensional data and scales efficiently as the number of nodes increases. In particular, the total runtime for our method to generate a solution path of 20 estimates for DAGs with 8000 nodes is around one hour.

Keywords: Bayesian networks, concave penalization, directed acyclic graphs, coordinate descent, nonconvex optimization

// research

My main interests involve problems at the intersection of high-dimensional statistics and machine learning, with a focus on developing scalable algorithms with sound theoretical guarantees. I am particularly interested in learning in nonconvex and distributed settings. Some of my general interests include:

- High-dimensional statistics / sparse modeling
- Unsupervised learning / graphical models
- Structural equation modeling / causal inference
- Computational biology / genomics
- Decision theory / social choice theory

// dissertation and theses

Structure Learning of Linear Bayesian Networks in High-Dimensions, PhD Thesis (joint work with Q. Zhou and A. Amini)

Research into graphical models is a rapidly developing enterprise, garnering significant interest from both the statistics and machine learning communities. A parallel thread in both communities has been the study of low-dimensional structures in high-dimensional models where $p\gg n$. Recently, there has been a surge of interest in connecting these threads in order to understand the behaviour of graphical models in high dimensions. Due to their relative simplicity, undirected models such as the Gaussian graphical model and Ising models have received most of the attention, whereas directed graphical models have received comparatively little. An important yet largely unresolved class of directed graphical models is Bayesian networks, or directed acyclic graphs (DAGs). These models have a wide variety of applications in artificial intelligence, machine learning, genetics, and computer vision, but estimation of Bayesian networks in high dimensions is not well understood. The main focus of this dissertation is to address some fundamental questions about these models in high dimensions.

The primary goal is to develop both algorithms and theory for estimating continuous, linear Bayesian networks, capable of handling modern high-dimensional problems. Motivated by problems from the regression literature, we show how to adapt recent work in sparse learning and nonconvex optimization to the structure learning problem for Bayesian networks in order to estimate DAGs with several thousand nodes. We draw an explicit connection between linear Bayesian networks and so-called neighbourhood regression problems and show how this can be exploited in order to derive nonasymptotic performance bounds for penalized least squares estimators of directed graphical models.

On the algorithmic side, we develop a method for estimating Gaussian Bayesian networks based on convex reparametrization and cyclic coordinate descent. In contrast to recent methods which accelerate the learning problem by restricting the search space, we propose a method for score-based structure learning which does not restrict the search space. We do not require the existence of a so-called faithful DAG representation, and as a result, our methodology must handle the inherent nonidentifiability of the estimation problem in a novel way. On the theoretical side, we provide (a) finite-dimensional performance guarantees for local minima of the resulting nonconvex program, and (b) a general high-dimensional framework for global minima of the nonconvex program. Both the algorithms and theory apply to a general class of regularizers, including the MCP, SCAD, $\ell_1$ and $\ell_0$ penalties. Finally, as a matter of independent interest, we provide a comprehensive comparison of our approach to several standard structure learning methods using open-source packages developed for the R language.

Keywords: Bayesian networks, high-dimensional statistics, graphical models, sparse regression, concave regularization, nonconvex optimization

Volume Comparison with Integral Bounds in Lorentz Manifolds, Undergraduate Thesis (joint work with J. Corvino, A. Karl, and A. Rochford)

Ten years ago, Ehrlich and Sanchez produced a pointwise statement of the classical Bishop volume comparison theorem for so-called SCLV subsets of the causal future in a Lorentz manifold, while Petersen and Wei developed and proved an integral version for Riemannian manifolds. We apply Petersen and Wei's method to the SCLV sets, and verify that two essential differential equations from the Riemannian proof extend to the Lorentz setting. As a result, we obtain a volume comparison theorem for Lorentz manifolds with integral, rather than pointwise, bounds. We also briefly discuss the history of the problem, starting with Bishop's original theorem from 1963.

Keywords: Differential geometry, volume comparison, Lorentz manifolds

// teaching

I will not be teaching during 2016-17.

Past teaching assignments (all at UCLA):

- Statistics 10: Introduction to Statistical Reasoning (2015-2016)
- Statistics 495A: Teaching College Statistics (Winter 2015)
- Statistics 100A: Introduction to Probability (Spring 2014)
- Statistics 101B: Introduction to Design and Analysis of Experiments (Winter 2014)
- Statistics 102A: Introduction to Computational Statistics with R (Fall 2013)
- PIC 20A: Principles of Java (Spring 2010)
- PIC 10A: Introduction to C++ Programming (Winter 2010)
- PIC 10A: Introduction to C++ Programming (Fall 2009)

// software

// sparsebn package for R

sparsebn is an R package for learning large-scale Bayesian networks from high-dimensional data. Using recent techniques based on sparse regularization, coordinate descent, and nonconvex optimization, it provides methods that incorporate mixed experimental and observational data with either continuous or discrete observations. Notably, the algorithms scale to datasets with many thousands of variables. The underlying framework uses penalized maximum likelihood under L1 or concave (MCP) regularization.
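To give a flavour of the concave (MCP) regularization mentioned above, here is a minimal sketch in Python. It is not code from sparsebn (which is written for R), and the function names are made up for illustration. It assumes the standard definition of the minimax concave penalty with parameters lam >= 0 and gamma > 1, and shows the closed-form single-coordinate update that a coordinate descent scheme can apply to a standardized predictor.

```python
import math


def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty (MCP) at t, assuming lam >= 0 and gamma > 1."""
    a = abs(t)
    if a <= gamma * lam:
        # Starts like lam * |t| (the L1 penalty) and flattens as |t| grows.
        return lam * a - a * a / (2.0 * gamma)
    # Constant beyond gamma * lam, so large coefficients are not shrunk.
    return 0.5 * gamma * lam * lam


def soft_threshold(z, lam):
    """L1 (lasso) thresholding: shrink z toward zero by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def mcp_threshold(z, lam, gamma):
    """Minimizer over b of 0.5 * (z - b)**2 + mcp_penalty(b, lam, gamma).

    This is the one-dimensional update for a single coefficient when the
    corresponding predictor is standardized (an assumption of this sketch).
    """
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z
```

For small inputs the update behaves like a rescaled soft threshold and sets the coefficient exactly to zero when |z| <= lam, while inputs larger than gamma * lam pass through unchanged. That lack of shrinkage on large coefficients is the main practical difference from the L1 penalty.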

// ccdr package for R

The source code for the CCDr algorithm described in Aragam and Zhou (2015) is freely available online through GitHub.

ccdr is an R package for structure learning of linear Bayesian networks from high-dimensional, Gaussian data. The underlying algorithm estimates a Bayesian network (aka DAG or belief net) using penalized maximum likelihood based on L1 or concave (MCP) regularization and observational data.
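Whatever the estimation method, the output has to be a DAG, i.e. the estimated edge weights must not contain a directed cycle. The following Python sketch illustrates that constraint only; it is not taken from ccdr, and the function name and the convention that a nonzero entry weights[i][j] means an edge i -> j are assumptions made for the example. It uses Kahn's topological-sort algorithm.

```python
from collections import deque


def is_dag(weights, tol=0.0):
    """Return True if a square weighted adjacency matrix has no directed cycle.

    Convention assumed here: abs(weights[i][j]) > tol means an edge i -> j.
    """
    p = len(weights)
    children = [
        [j for j in range(p) if abs(weights[i][j]) > tol] for i in range(p)
    ]
    indegree = [0] * p
    for i in range(p):
        for j in children[i]:
            indegree[j] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    queue = deque(i for i in range(p) if indegree[i] == 0)
    removed = 0
    while queue:
        i = queue.popleft()
        removed += 1
        for j in children[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)

    # Every node can be removed exactly when the graph is acyclic.
    return removed == p
```

For example, the chain 0 -> 1 -> 2 passes the check, while adding an edge 2 -> 0 (or any self-loop) makes it fail.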

// consulting

I have some limited availability for part-time and short-term consulting projects in analytics, data science, modeling, and/or simulation. I am experienced in many core languages as well as web-oriented and design-oriented software.

- Data: R + Rcpp / SQL / PL/pgSQL / MapReduce
- Numerics: Mathematica / MATLAB
- Core: C++ / C / Python / Java / Visual Basic
- Web: HTML(5) / XML / CSS / JavaScript / jQuery
- Design: Photoshop / Digital photography / Video editing

I am particularly interested in the analysis of social data, including social networks, advertising, and marketing.

// availability

Please e-mail me for more details.

// contact

If you want to...

...e-mail me: naragam at cs dot cmu dot edu.

...find me on LinkedIn, click here.

...see my CV, click here.

I also got a little bored while designing this site, so I hid some easter eggs here and there.