Hetunandan Kamisetty, Sergey Ovchinnikov, David Baker,
"Assessing the utility of co-evolution based residue-residue contact predictions in a sequence and structure-rich era," PNAS (accepted), 2013 [webserver].
We develop an improved method for predicting residue - residue contacts in protein structures that achieves higher accuracy than previous methods by integrating structural context and sequence coevolution information. We then determine the conditions under which these predicted contacts are likely to be useful for structure modeling and identify more than 400 protein families where these conditions are currently met.
Timothy A. Whitehead, Aaron Chevalier, Yifan Song, Cyrille Dreyfus, Sarel J. Fleishman, Cecilia De Mattos, Chris A. Myers, Hetunandan Kamisetty, Patrick Blair, Ian A. Wilson, David Baker,
"Optimization of affinity, specificity, and function of designed Influenza inhibitors using next generation sequencing," Nature Biotechnology, 2012.
We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction
specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not
detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against
H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent
of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human
H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating
that computational design followed by comprehensive energy landscape mapping can generate proteins with potential
N. S. Razavian, H. Kamisetty, C.J. Langmead, "Learning Generative Models of Molecular Dynamics,"
BMC Genomics (in press), 2012
We introduce three algorithms for learning generative models of molecular structures from molecular dynamics simulations. The first algorithm learns a Bayesian-optimal undirected probabilistic model over user-specified covariates (e.g., fluctuations, distances, angles, etc).
L1 regularization is used to ensure sparse models and thus reduce the risk of over-fitting the data. The topology of the resulting model reveals important couplings between different parts of the protein, thus aiding in the analysis of molecular motions.
The generative nature of the model makes it well-suited to making predictions about the global effects of local structural changes (e.g., the binding of an allosteric regulator). Additionally, the model can be used to sample new conformations. The second algorithm learns a time-varying graphical model where the topology and parameters change smoothly along the trajectory, revealing the conformational sub-states.
The last algorithm learns a Markov Chain over undirected graphical models which can be used to study and simulate kinetics. We demonstrate our algorithms on multiple molecular dynamics trajectories.
Hetunandan Kamisetty, Eric P. Xing and Chris J. Langmead, "Approximating Correlated Equilibria using Relaxations on the Marginal Polytope," ICML 2011, 1153-1160.
In game theory, a Correlated Equilibrium (CE) is an equilibrium concept that generalizes the more well-known Nash Equilibrium. If the game is represented as a graphical game, the computational complexity of computing an optimum CE is exponential in the tree-width of the graph. In settings where this exact computation is not feasible, it is desirable to approximate the properties of the CE such as its expected social utility and marginal probabilities. Apart from providing insights about a specific game, this problem also has natural applications in learning graphical games from data, and in designing games with desired outcomes.
We study outer relaxations of this problem that yield approximate marginal strategies for the players under a variety of utility funcions. Results on simulated games and in a real problem involving drug design indicate that our approximations can be highly accurate and can be successfully used when exact computation of CE is infeasible.
Sivaraman Balakrishnan*, Hetunandan Kamisetty*, Jaime Carbonell, Su-In Lee and Chris. J. Langmead, "Learning Generative Models for Protein Fold Families," Proteins, Vol 79, Issue 4, pp 1061-1078. [link][pdf].
We introduce a new approach to learning statistical models from multiple sequence alignments (MSA) of proteins.
Our method, called GREMLIN (Generative REgularized ModeLs of proteINs), learns an undirected probabilistic
graphical model of the amino acid composition within the MSA.
The resulting model encodes both the position-specific
conservation statistics and the correlated mutation statistics between
sequential and long-range pairs of residues.
Existing techniques for learning graphical models from multiple sequence alignments
either make strong, and often inappropriate assumptions about the conditional
independencies within the MSA (e.g., Hidden Markov Models), or else use sub-optimal
algorithms to learn the parameters of the model. In contrast,
GREMLIN makes no a priori assumptions about the conditional independencies within the MSA.
We formulate and solve a convex optimization problem, thus guaranteeing that we find a
globally optimal model at convergence. The resulting model is also generative, allowing for the design of new protein sequences that have the same statistical properties as those in the MSA. We perform a detailed analysis of covariation statistics on the extensively studied WW and PDZ domains and show that our method out-performs an existing algorithm for learning undirected probabilistic graphical models from MSA. We then apply our approach to 71 additional families from the PFAM database and demonstrate that the resulting models significantly out-perform Hidden Markov Models in terms of predictive accuracy.
Hetunandan Kamisetty, Arvind Ramanathan, Chris Bailey-Kellogg and Chris J. Langmead, "Accounting for Conformational Entropy in Predicting Binding Free Energies of
Protein-protein Interactions," Proteins, Vol 79, Issue 2, pp 444-462. [link], [pdf] .
A talk based on these results won the Best Scientific Contribution Award at 3DSIG 2009
Protein-protein interactions are governed by the change in free energy
upon binding, &Delta G=&Delta H -T &Delta S. These interactions are
often marginally stable, so one must examine the balance between the
change in enthalpy, &Delta H, and the change in entropy, &Delta S,
when investigating known complexes, characterizing the effects of
mutations, or designing optimized variants. In order to perform a large-scale study into the
contribution of conformational entropy to binding free energy, we
developed a technique called GOBLIN (Graphical mOdel for
BiomoLecular INteractions) that performs physics-based free energy
calculations for protein-protein complexes under both side-chain and
backbone flexibility. GOBLIN uses a probabilistic graphical model that exploits conditional independencies in the Boltzmann
distribution and employs variational inference techniques that approximate
the free energy of binding in only a few minutes. We examined the
role of conformational entropy on a benchmark set of more than 700
mutants in eight large, well-studied complexes. Our findings suggest
that conformational entropy is important in protein-protein
interactions - the root mean square error (RMSE) between calculated
and experimentally measured &Delta&Delta G decreases by 12%
when explicit entropic contributions were incorporated. GOBLIN models all atoms of the protein
complex and detects changes to the binding entropy along the interface as well as positions distal to the binding interface.
Our results also suggest that a variational approach to entropy
calculations may be quantitatively more accurate than the
knowledge-based approaches used by the well-known programs FoldX
and Rosetta - GOBLIN's RMSEs are 10% and 36% lower than
these programs, respectively.
Hetunandan Kamisetty, Bornika Ghosh, Chris Bailey-Kellogg and Chris J. Langmead, "Modeling and Inference of Sequence-Structure Specificity"
In order to evaluate protein sequences for simultaneous satisfaction of evolutionary
and physical constraints, this paper develops a graphical model approach integrating sequence
information from the evolutionary record of a protein family with structural information based
on a molecular mechanics force field. Nodes in the graphical model represent choices for the
backbone (native vs. decoys), amino acids (conservation analysis), and side-chain conformations
(rotamer library). Edges capture dependence relationships, in both the sequence (correlated
mutations) and the structure (direct physical interactions). The sequence and structure
components of the model are complementary, in that the structure component may support
choices that were not present in the sequence record due to bias and artifacts, while the sequence
component may capture other constraints on protein viability, such as permitting an
efficient folding pathway. Inferential procedures enable computation of the joint probability
of a sequence-structure pair, thereby assessing the quality of the sequence with respect to both
the protein family and the specificity of its energetic preference for the native structure against
decoy structures. In a case study of WW domains, we show that by using the joint model and
evaluating specificity, we obtain better prediction of foldedness of designed proteins (AUC of
0.85) than either a sequence-only or a structure-only model, and gain insights into how, where,
and why the sequence and structure components complement each other.
Hetunandan Kamisetty and Chris J. Langmead, "A Bayesian Approach to Protein Model Quality Assesment,"
Given multiple possible models b1,b2,..,bn for a protein structure, a common sub-task
in in-silico Protein Structure Prediction is
ranking these models according to their quality. Extant approaches use MLE estimates of parameters ri to obtain point estimates of the Model Quality. We describe a Bayesian
alternative to assessing the quality of these models that builds an MRF over the parameters of each model and performs approximate inference to integrate over them. Hyper-parameters w are learnt by optimizing a list-wise loss function over training data. Our
results indicate that our Bayesian approach can significantly outperform MLE estimates and that optimizing the hyper-parameters can further improve results.
Hetunandan Kamisetty, Eric P. Xing and Chris J. Langmead, "Free
Energy Estimates of All-atom Protein Structures using
Generalized Belief Propagation." RECOMB 2007. [pdf].
An earlier version appeared as CMU-CS-06-160.
We present a technique for approximating
the free energy of protein structures using Generalized Belief
Propagation (GBP). The accuracy and utility of these estimates are
then demonstrated in two different application domains. First, we
show that the entropy component of our free energy estimates can
be useful in distinguishing native protein structures from decoys.
Second, we show that our estimates of the changes in free energy of protein
structures upon mutation have a linear correlation of upto 0.70 with
GBP is also efficient, taking a few minutes to run on a typical sized protein,
further suggesting that GBP may be an attractive alternative to
more costly molecular dynamic simulations for some tasks.
Hetunandan Kamisetty, Chris Bailey-Kellogg and Gopal Pandurangan, "An efficient randomized algorithm for contact-based
NMR backbone resonance assignment," Bioinformatics 2006, 22(2):172-180 [abstract, html, pdf, preprint(color)].
This paper develops, analyzes and applies a novel algorithm for the identification of polytopes representing consistent patterns of
edges in a corrupted NOESY graph. We employ an NMR-specific random graph
model in proving that our algorithm gives optimal performance in expected polynomial time,
even when the input graph is significantly corrupted. We confirm this analysis in simulation studies with graphs
corrupted by up to 500% noise. Finally, we demonstrate the practical application of the algorithm on several experimental ß-sheet
datasets. Our approach is able to eliminate a large majority of noise edges and to uncover large consistent sets of interactions.
Probabilistic Graphical Models for Integrating Sequence and Structure Information, SIAM Conference on Life Sciences, Pittsburgh, PA. July 2010.
Incorporating Configurational Entropy into Binding Free Energy Calculations via Statistical Inference, short talk in the Keystone Symposia on Computer-Aided Drug Design, Whistler, BC. Apr 2010.
Modeling Protein Sequence and Structure Variability with Probabilistic Graphical Models, Department of Computer Science, University of Toronto. Mar 2010.
Probabilistic Graphical Models for Structural Biology, Baker Lab, Department of Biochemistry, University of Washington. Feb 2010.
N.S. Razavian, S. Moitra, H. Kamisetty, A. Ramanathan, C.J. Langmead, "Time-Varying Gaussian Graphical Models of Molecular Dynamics Data,"
Carnegie Mellon University School of Computer Science Technical Report CMU-CS-10-119, May 2010
S. Balakrishnan, H. Kamisetty, J.C. Carbonell, C.J. Langmead, "Learning Generative Models of Protein Fold Families,"
Carnegie Mellon University School of Computer Science Technical Report CMU-CS-10-113, March 2010. A previous version appeared as CMU-CS-09-177.
Hetunandan Kamisetty, Chris Bailey-Kellogg and Chris J. Langmead, "A Graphical Model Approach for Predicting Free Energies of Association for Protein-Protein Interactions under
Backbone and Side-chain Flexibility", CMU SCS Technical Report CMU-CS-08-162, December 2008. [pdf].
C.J. Langmead, H. Kamisetty, "Detecting Protein-Protein Interaction Decoys using Fast Free Energy Calculations,"
CMU SCS Technical Report CMU-CS-07-156