Machine Learning Thesis Proposal
- Gates Hillman Centers
- XUN ZHENG
- Ph.D. Student
- Machine Learning Department
- Carnegie Mellon University
Learning DAGs with Continuous Optimization
Learning the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) from data is an important and classical problem in machine learning, with prominent applications in causal inference, fairness, interpretability, and biology, etc. This is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches often rely on various local heuristics for enforcing the acyclicity constraint. By contrast, structure learning for undirected graphical models (e.g. Gaussian MRF) is recognized as a tractable optimization problem nowadays, and achieved huge success in various practical domains such as bioinformatics.In this thesis, we take a first step towards bridging this gap between directed and undirected graphical models. We begin by introducing a fundamentally differ- ent strategy for Bayesian network structure learning: We formulate the problem as a purely continuous optimization program over real matrices that avoids the combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree.
We then study the generalization of the above continuous algorithm to learning nonparametric DAGs. We extend the algebraic characterization of acyclicity to nonparametric structural equation model (SEM) by leveraging nonparametric sparsity based on partial derivatives, resulting in a continuous optimization problem that can be applied to a variety of nonparametric and semiparametric models including GLMs, additive noise models, and index models as special cases.
Lastly, we propose to investigate the theoretical aspects of the continuous optimization for DAG learning. In particular, as the algebraic characterization inherits the nonconvexity of the set of DAGs, we plan to study the trajectory of penalty-type methods for convex loss functions with nonconvex constraints. We also study the implication of this approach to recent advances in learning disentangled representations.
Eric Xing (Co-Chair)
Pradeep Ravikumar (Co-Chair)
Silvia Chiappa (DeepMind)