Publications

Hetunandan Kamisetty, Bornika Ghosh, Chris Bailey-Kellogg and Chris J. Langmead, "Modeling and Inference of Sequence-Structure Specificity," in Proceedings of the Eighth Annual Conference on Computational Systems Bioinformatics (CSB 2009) [pdf].
In order to evaluate protein sequences for simultaneous satisfaction of evolutionary and physical constraints, this paper develops a graphical model approach integrating sequence information from the evolutionary record of a protein family with structural information based on a molecular mechanics force field. Nodes in the graphical model represent choices for the backbone (native vs. decoys), amino acids (conservation analysis), and side-chain conformations (rotamer library). Edges capture dependence relationships, in both the sequence (correlated mutations) and the structure (direct physical interactions). The sequence and structure components of the model are complementary, in that the structure component may support choices that were not present in the sequence record due to bias and artifacts, while the sequence component may capture other constraints on protein viability, such as permitting an efficient folding pathway. Inferential procedures enable computation of the joint probability of a sequence-structure pair, thereby assessing the quality of the sequence with respect to both the protein family and the specificity of its energetic preference for the native structure against decoy structures. In a case study of WW domains, we show that by using the joint model and evaluating specificity, we obtain better prediction of foldedness of designed proteins (AUC of 0.85) than either a sequence-only or a structure-only model, and gain insights into how, where, and why the sequence and structure components complement each other.
Hetunandan Kamisetty and Chris J. Langmead, "A Bayesian Approach to Protein Model Quality Assessment," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), pp. 481-488 [pdf].
Given multiple possible models b1, b2, ..., bn for a protein structure, a common sub-task in in-silico protein structure prediction is ranking these models according to their quality. Extant approaches use maximum-likelihood estimates (MLE) of the parameters ri to obtain point estimates of model quality. We describe a Bayesian alternative for assessing the quality of these models that builds an MRF over the parameters of each model and performs approximate inference to integrate over them. Hyperparameters w are learnt by optimizing a list-wise loss function over training data. Our results indicate that our Bayesian approach can significantly outperform maximum-likelihood estimates and that optimizing the hyperparameters can further improve results.
Hetunandan Kamisetty, Chris Bailey-Kellogg and Chris J. Langmead, "A Graphical Model Approach for Predicting Free Energies of Association for Protein-Protein Interactions under Backbone and Side-chain Flexibility," Carnegie Mellon University School of Computer Science Technical Report CMU-CS-08-162, December 2008 [pdf]. A talk based on these results won the Best Scientific Contribution Award at the Fifth ISMB Satellite Meeting on Structural Bioinformatics and Computational Biophysics, 3DSIG 2009.
Biomolecular systems are governed by changes in free energy, and the ability to predict binding free energies provides both better understanding of biomolecular interactions and the ability to optimize them. We present the first graphical-model-based approach, which we call GOBLIN (Graphical mOdel for BiomoLecular INteractions), for predicting binding free energies for all-atom models of protein complexes. Our method is physically sound in that internal energies are computed using standard molecular-mechanics force fields, and free energies are obtained by computing a rigorous approximation to the partition function of the system. Moreover, GOBLIN explicitly models both backbone and side-chain flexibility, and, when desired, employs non-linear regression to optimize force-field parameters. In tests on a benchmark set of more than 700 mutants, we show that our method is fast, running in a few minutes, and accurate, achieving a root mean square error (RMSE) between predicted and experimental binding free energies of 2.05 kcal/mol. GOBLIN’s RMSE is 0.55 kcal/mol lower than that of the well-known program ROSETTA, despite the fact that we use the ROSETTA force field for computing internal energies. That is, our increase in accuracy is due to our ability to accurately estimate entropic contributions to the free energy. Finally, using our novel algorithm for optimizing force-field parameters on specific protein complexes reduced GOBLIN’s RMSE by 0.26 kcal/mol on average.
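For readers unfamiliar with the quantities named in this abstract, the standard textbook relations can be written as follows. This is only a sketch of the general definitions, not the report's exact formulation: the symbols Z_AB, Z_A, Z_B (partition functions of the complex and of the two unbound partners), k_B T, and N (number of mutants in the benchmark) are notation assumed here for illustration.

```latex
\Delta G_{\mathrm{bind}} = G_{AB} - G_A - G_B
  = -k_B T \ln \frac{Z_{AB}}{Z_A \, Z_B},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N}
  \left( \Delta G_i^{\mathrm{pred}} - \Delta G_i^{\mathrm{exp}} \right)^2 }
```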
Hetunandan Kamisetty, Eric P. Xing and Chris J. Langmead, "Free Energy Estimates of All-atom Protein Structures using Generalized Belief Propagation," in Proceedings of the Eleventh Annual International Conference on Research in Computational Molecular Biology (RECOMB 2007), pp. 366-380 [pdf]. An earlier version appeared as CMU-CS-06-160.
We present a technique for approximating the free energy of protein structures using Generalized Belief Propagation (GBP). The accuracy and utility of these estimates are then demonstrated in two different application domains. First, we show that the entropy component of our free energy estimates can be useful in distinguishing native protein structures from decoys. Second, we show that our estimates of the changes in free energy of protein structures upon mutation have a linear correlation of up to 0.70 with laboratory measurements. GBP is also efficient, taking a few minutes to run on a typically sized protein, further suggesting that GBP may be an attractive alternative to more costly molecular dynamics simulations for some tasks.
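As a rough illustration of the quantities involved, the sketch below computes the free energy and its entropy component for a toy system by exact enumeration. It is not the paper's method: GBP is an approximation used precisely because enumeration is infeasible for real proteins. The two-residue system, the energy tables, and the function names are all hypothetical and chosen only for illustration.

```python
import math
from itertools import product

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

# Hypothetical toy system: two residues with three rotamers each.
# Energies (kcal/mol) are made up for illustration only.
SELF_ENERGY = [[0.0, 0.5, 1.2], [0.0, 0.8, 0.3]]
PAIR_ENERGY = [[0.2, -0.4, 0.0],
               [0.1, 0.0, -0.6],
               [0.9, 0.3, 0.0]]


def enumerate_energies(self_energy, pair_energy):
    """Total energy of every joint rotamer assignment of the two residues."""
    energies = []
    for r0, r1 in product(range(len(self_energy[0])), range(len(self_energy[1]))):
        energies.append(self_energy[0][r0] + self_energy[1][r1] + pair_energy[r0][r1])
    return energies


def free_energy(energies, temperature=298.15):
    """Return (free energy, mean energy, entropy) of a Boltzmann ensemble."""
    kt = K_B * temperature
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z_shifted = sum(weights)
    probs = [w / z_shifted for w in weights]
    f = e_min - kt * math.log(z_shifted)         # F = -kT ln Z
    mean_e = sum(p * e for p, e in zip(probs, energies))
    entropy = (mean_e - f) / temperature         # from F = <E> - T*S
    return f, mean_e, entropy


f, mean_e, s = free_energy(enumerate_energies(SELF_ENERGY, PAIR_ENERGY))
print(f"F = {f:.3f} kcal/mol, <E> = {mean_e:.3f} kcal/mol, T*S = {298.15 * s:.3f} kcal/mol")
```

The number of joint states grows exponentially with the number of residues, which is why an approximate inference scheme is needed at protein scale.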
Hetunandan Kamisetty, Chris Bailey-Kellogg and Gopal Pandurangan, "An efficient randomized algorithm for contact-based NMR backbone resonance assignment," Bioinformatics 2006, 22(2):172-180 [abstract, html, pdf, preprint(color)].
This paper develops, analyzes and applies a novel algorithm for the identification of polytopes representing consistent patterns of edges in a corrupted NOESY graph. We employ an NMR-specific random graph model in proving that our algorithm gives optimal performance in expected polynomial time, even when the input graph is significantly corrupted. We confirm this analysis in simulation studies with graphs corrupted by up to 500% noise. Finally, we demonstrate the practical application of the algorithm on several experimental β-sheet datasets. Our approach is able to eliminate a large majority of noise edges and to uncover large consistent sets of interactions.
Talks

RECOMB 07 talk [pdf]. A longer version of the talk, which I used for a guest lecture in the Computational Structural Biology course [pdf].


