Data Analysis Project (DAP) Presentation

  • Gates & Hillman Centers
  • McWilliams Classroom 4303
  • Ph.D. Student
  • Machine Learning Department
  • Carnegie Mellon University
Project Presentations

An Integrated Approach to Validating ChIP-Seq using A* Lasso for Sparse Bayesian Network Learning

Mapping the transcription network is integral to our understanding of transcriptional regulation in cells. Recently, ChIP-Seq experiments have become a popular way to determine the regulatory relationships between transcription factors (TFs) and their target genes by detecting physical binding. However, these experiments are noisy, and many binding events are not actually functional. Instead of conducting further laboratory experiments, which are expensive and not realistic in humans, we propose to estimate a sparse Bayesian network structure over the TFs and target genes and to detect SNPs that perturb the network. These SNP perturbations provide evidence of functional binding.

To learn the sparse Bayesian network, we present A* lasso, a single-stage method that recovers the optimal sparse Bayesian network structure by solving a single optimization problem with an A* search algorithm that uses lasso in its scoring system. Our approach substantially improves on the computational efficiency of the well-known exact methods based on dynamic programming. In addition, to make the method more practical for settings such as this one, which have a large number of variables, we suggest a heuristic scheme that dramatically reduces computation time without substantially compromising the quality of solutions. We apply our method to integrate SNP, expression, and ChIP-Seq data. We detect several influential SNPs in our network, provide evidence for TF binding, and characterize the effect on the surrounding network.
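
As a rough illustration of the idea of an A* search that uses lasso in its scoring system, the sketch below searches over variable orderings. It is a minimal sketch under stated assumptions, not the presenter's implementation: it assumes centered data, a linear model for each variable given its parents, a small coordinate-descent lasso solver, and a heuristic that scores each unplaced variable with every other variable allowed as a parent. The names lasso_score and a_star_lasso are hypothetical, and the speed-up scheme mentioned in the abstract is not shown.

```python
import heapq
from itertools import count


def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the lasso proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_score(y, cols, lam, max_sweeps=2000, tol=1e-12):
    """Approximate min over beta of 0.5*||y - X beta||^2 + lam*||beta||_1.

    X has the candidate-parent samples in `cols` as its columns; the minimum
    is found by cyclic coordinate descent (assumed solver, chosen for brevity).
    """
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)  # residual y - X beta, kept up to date
    norms = [sum(v * v for v in c) for c in cols]
    for _ in range(max_sweeps):
        biggest_change = 0.0
        for k, c in enumerate(cols):
            if norms[k] == 0.0:
                continue
            # Correlation of column k with the residual that excludes column k.
            rho = sum(c[i] * resid[i] for i in range(n)) + norms[k] * beta[k]
            new = soft_threshold(rho, lam) / norms[k]
            delta = new - beta[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * c[i]
                beta[k] = new
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < tol:
            break
    return 0.5 * sum(v * v for v in resid) + lam * sum(abs(b) for b in beta)


def a_star_lasso(data, lam):
    """Return (total_score, ordering) minimizing the summed lasso scores.

    data maps a variable name to its list of (centered) samples. A search
    state is the set of variables already placed in the ordering; placing a
    variable costs its lasso score with the placed variables as candidates.
    """
    names = sorted(data)
    all_vars = frozenset(names)
    cache = {}

    def score(j, parents):
        key = (j, parents)
        if key not in cache:
            cols = [data[p] for p in sorted(parents)]
            cache[key] = lasso_score(data[j], cols, lam)
        return cache[key]

    # Heuristic: allowing all other variables as parents can only lower the
    # lasso objective, so this never overestimates the remaining cost.
    h_var = {j: score(j, all_vars - {j}) for j in names}

    tie = count()  # tie-breaker so the heap never compares sets
    start = frozenset()
    heap = [(sum(h_var.values()), next(tie), 0.0, start, ())]
    best_g = {start: 0.0}
    while heap:
        _, _, g, placed, order = heapq.heappop(heap)
        if placed == all_vars:
            return g, list(order)
        if g > best_g.get(placed, float("inf")):
            continue  # stale entry; a cheaper path to this state was found
        for j in names:
            if j in placed:
                continue
            g2 = g + score(j, placed)
            nxt = placed | {j}
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                h2 = sum(h_var[v] for v in all_vars - nxt)
                heapq.heappush(heap, (g2 + h2, next(tie), g2, nxt, order + (j,)))
    raise ValueError("no ordering found")


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    toy = {
        "tf": [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0],
        "gene_a": [-1.5, -1.2, 0.3, -0.2, 0.9, 1.7],
        "gene_b": [1.0, -0.5, -1.0, 0.8, -0.6, 0.3],
    }
    print(a_star_lasso(toy, lam=0.5))
```

In this sketch the parents of each variable are the earlier variables in the returned ordering that receive nonzero lasso coefficients, so sparsity and acyclicity come out of the same search. The number of search states still grows exponentially with the number of variables, which is why a pruning heuristic of the kind the abstract mentions would be needed at scale.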

DAP Committee:
Seyoung Kim
Carl Kingsford
Geoff Gordon

For More Information, Please Contact: