Hetunandan Kamisetty, Bornika Ghosh, Chris Bailey-Kellogg and Chris J. Langmead, "Modeling and Inference of Sequence-Structure Specificity,"
in Proceedings of the Eighth Annual Conference on Computational Systems Bioinformatics (CSB 2009),
[pdf].
In order to evaluate protein sequences for simultaneous satisfaction of evolutionary
and physical constraints, this paper develops a graphical model approach integrating sequence
information from the evolutionary record of a protein family with structural information based
on a molecular mechanics force field. Nodes in the graphical model represent choices for the
backbone (native vs. decoys), amino acids (conservation analysis), and side-chain conformations
(rotamer library). Edges capture dependence relationships, in both the sequence (correlated
mutations) and the structure (direct physical interactions). The sequence and structure
components of the model are complementary, in that the structure component may support
choices that were not present in the sequence record due to bias and artifacts, while the sequence
component may capture other constraints on protein viability, such as permitting an
efficient folding pathway. Inferential procedures enable computation of the joint probability
of a sequence-structure pair, thereby assessing the quality of the sequence with respect to both
the protein family and the specificity of its energetic preference for the native structure against
decoy structures. In a case study of WW domains, we show that by using the joint model and
evaluating specificity, we obtain better prediction of foldedness of designed proteins (AUC of
0.85) than either a sequence-only or a structure-only model, and gain insights into how, where,
and why the sequence and structure components complement each other.
Hetunandan Kamisetty and Chris J. Langmead, "A Bayesian Approach to Protein Model Quality Assessment,"
in Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009),
pp. 481-488
[pdf].
Given multiple possible models b1, b2, ..., bn for a protein structure, a common sub-task
in in-silico protein structure prediction is
ranking these models according to their quality. Extant approaches use maximum likelihood estimates (MLE) of parameters ri to obtain point estimates of the model quality. We describe a Bayesian
alternative for assessing the quality of these models that builds a Markov random field (MRF) over the parameters of each model and performs approximate inference to integrate over them.
Hyperparameters w are learnt by optimizing a list-wise loss function over training data. Our
results indicate that our Bayesian approach can significantly outperform MLE and that optimizing the hyperparameters can further improve results.
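The contrast between a point estimate and integrating over parameters can be illustrated with a toy sketch. This is not the paper's MRF or its approximate inference: it assumes a small discrete grid of parameter settings, a flat prior, and hypothetical function names, purely to show how a plug-in score differs from a posterior-averaged one.

```python
import math

def posterior_weights(log_likelihoods):
    """Turn log-likelihoods over a parameter grid into posterior weights.

    Assumes a flat prior over the grid (an illustrative assumption).
    """
    m = max(log_likelihoods)  # subtract the max for numerical stability
    unnorm = [math.exp(l - m) for l in log_likelihoods]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def point_estimate_quality(qualities, log_likelihoods):
    """Plug-in score: quality at the single most likely parameter setting."""
    best = max(range(len(qualities)), key=lambda i: log_likelihoods[i])
    return qualities[best]

def bayesian_quality(qualities, log_likelihoods):
    """Posterior-averaged score: sum_i p(r_i | data) * quality(r_i)."""
    weights = posterior_weights(log_likelihoods)
    return sum(w * q for w, q in zip(weights, qualities))
```

When two parameter settings are nearly equally likely but imply different qualities, the plug-in score commits to one of them, while the averaged score reflects both.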
Hetunandan Kamisetty, Chris Bailey-Kellogg and Chris J. Langmead, "A Graphical Model Approach for Predicting Free Energies of Association for Protein-Protein Interactions under
Backbone and Side-chain Flexibility", Carnegie Mellon University School of Computer Science Technical Report CMU-CS-08-162,
December 2008. A talk based on these results won the Best Scientific Contribution Award at the Fifth ISMB Satellite Meeting on Structural Bioinformatics
and Computational Biophysics (3DSIG 2009)
[pdf].
Biomolecular systems are governed by changes in free energy, and the ability to predict binding
free energies provides both better understanding of biomolecular interactions and the ability to optimize
them. We present the first graphical-model based approach, which we call GOBLIN (Graphical
mOdel for BiomoLecular INteractions), for predicting binding free energies for all-atom models
of protein complexes. Our method is physically sound in that internal energies are computed
using standard molecular-mechanics force fields, and free energies are obtained by computing a
rigorous approximation to the partition function of the system. Moreover, GOBLIN explicitly models
both backbone and side-chain flexibility, and, when desired, employs non-linear regression to
optimize force-field parameters. In tests on a benchmark set of more than 700 mutants, we show
that our method is fast, running in a few minutes, and accurate, achieving a root mean square error
(RMSE) between predicted and experimental binding free energies of 2.05 kcal/mol. GOBLIN’s
RMSE is 0.55 kcal/mol lower than that of the well-known program ROSETTA, despite the fact that we
use the ROSETTA force field for computing internal energies. That is, our increase in accuracy is
due to our ability to accurately estimate entropic contributions to the free energy. Finally, using
our novel algorithm for optimizing force-field parameters on specific protein complexes reduced
GOBLIN’s RMSE by 0.26 kcal/mol on average.
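The relation between internal energies, the partition function, and free energy can be sketched on a toy system. The example below is not GOBLIN: it enumerates a handful of conformational energies exactly instead of approximating the partition function with a graphical model, and the function names and the value of kT (about 0.593 kcal/mol near room temperature) are assumptions made for illustration.

```python
import math

KT = 0.593  # kcal/mol, approximate value near 298 K (assumed temperature)

def free_energy(energies, kT=KT):
    """Return G = -kT * ln(Z), with Z = sum_i exp(-E_i / kT).

    `energies` lists the internal energy (kcal/mol) of each conformation.
    Uses the log-sum-exp trick to avoid overflow for large |E_i| / kT.
    """
    scaled = [-e / kT for e in energies]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -kT * log_z

def binding_free_energy(complex_energies, a_energies, b_energies, kT=KT):
    """Binding free energy as G(complex) - G(A) - G(B)."""
    return (free_energy(complex_energies, kT)
            - free_energy(a_energies, kT)
            - free_energy(b_energies, kT))
```

Because every accessible conformation contributes to Z, the free energy is never higher than the lowest single-conformation energy; the gap between the two is the entropic contribution that an energy-only score leaves out.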
Hetunandan Kamisetty, Eric P. Xing and Chris J. Langmead, "Free
Energy Estimates of All-atom Protein Structures using
Generalized Belief Propagation." Proceedings of the Eleventh Annual
International Conference on Research in Computational
Molecular Biology (RECOMB 2007), pp. 366-380 [pdf].
An earlier version appeared as CMU-CS-06-160.
We present a technique for approximating
the free energy of protein structures using Generalized Belief
Propagation (GBP). The accuracy and utility of these estimates are
then demonstrated in two different application domains. First, we
show that the entropy component of our free energy estimates can
be useful in distinguishing native protein structures from decoys.
Second, we show that our estimates of the changes in free energy of protein
structures upon mutation have a linear correlation of up to 0.70 with
laboratory measurements.
GBP is also efficient, taking a few minutes to run on a typically sized protein,
further suggesting that GBP may be an attractive alternative to
more costly molecular dynamics simulations for some tasks.
Hetunandan Kamisetty, Chris Bailey-Kellogg and Gopal Pandurangan, "An efficient randomized algorithm for contact-based
NMR backbone resonance assignment," Bioinformatics 2006, 22(2):172-180 [abstract, html, pdf, preprint(color)].
This paper develops, analyzes and applies a novel algorithm for the identification of polytopes representing consistent patterns of
edges in a corrupted NOESY graph. We employ an NMR-specific random graph
model in proving that our algorithm gives optimal performance in expected polynomial time,
even when the input graph is significantly corrupted. We confirm this analysis in simulation studies with graphs
corrupted by up to 500% noise. Finally, we demonstrate the practical application of the algorithm on several experimental β-sheet
datasets. Our approach is able to eliminate a large majority of noise edges and to uncover large consistent sets of interactions.