A Mixture of Experts Approach for Protein-Protein Interaction Prediction

 

Yanjun Qi1, Judith Klein-Seetharaman1,2, Ziv Bar-Joseph1

1School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

2Department of Pharmacology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261

 


 

Abstract

 

High-throughput methods can directly detect the set of interacting proteins in yeast, but the results are often incomplete and exhibit high false positive and false negative rates. A number of researchers have recently presented methods for integrating direct and indirect data for predicting interactions. However, due to missing data and the high redundancy among the features used, different samples may benefit from different features, depending on the set of attributes available. In addition, in many cases it is hard to determine directly which of the datasets led to a given prediction, which is an important issue for the biologists using these predictions to design new experiments.

 

To address these challenges we use a Mixture-of-Experts method. We split the data into four (roughly) homogeneous sets. The individual experts use logistic regression, and their scores are combined using another logistic regression. When the scores are combined, however, the weighting of each expert depends on the set of input attributes. Thus different experts have different influence on the prediction, depending on the available features.
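As a rough illustration of the combination step described above, the following is a minimal sketch of a mixture-of-experts forward pass, not the authors' implementation. It assumes each expert is a logistic regression restricted to its own feature group and that the gate is a softmax over a linear function of the full input (one common choice). The function name `moe_predict`, the toy feature grouping, and all weights are hypothetical; training and missing-value handling are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def moe_predict(X, feature_groups, expert_w, expert_b, gate_W, gate_b):
    """Forward pass of a mixture of logistic-regression experts.

    X              : (n_samples, n_features) input matrix
    feature_groups : list of index arrays, one per expert (hypothetical grouping)
    expert_w/b     : per-expert logistic regression weights and biases
    gate_W, gate_b : gate parameters, shapes (n_features, n_experts) and (n_experts,)
    """
    # Each expert scores a sample using only the features in its own group.
    expert_scores = np.column_stack([
        sigmoid(X[:, idx] @ w + b)
        for idx, w, b in zip(feature_groups, expert_w, expert_b)
    ])
    # The gate produces per-sample expert weights that depend on the input.
    gate = softmax(X @ gate_W + gate_b)
    # Final score: gate-weighted average of the expert scores.
    return (gate * expert_scores).sum(axis=1), gate

# Toy example with random, untrained parameters (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 6))
feature_groups = [np.array([0]), np.array([1, 2]), np.array([3]), np.array([4, 5])]
expert_w = [rng.normal(size=len(idx)) for idx in feature_groups]
expert_b = [0.0, 0.0, 0.0, 0.0]
gate_W = rng.normal(size=(6, 4))
gate_b = np.zeros(4)

p, gate = moe_predict(X, feature_groups, expert_w, expert_b, gate_W, gate_b)
print(p.round(3))     # one interaction score per sample
print(gate.round(3))  # per-sample weight given to each of the four experts
```

The returned `gate` matrix is what lets a prediction be inspected: each row shows how much each expert contributed to the score for that sample.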

 

We applied our method to predict the set of interacting proteins in yeast. Our method improved upon the best previous methods for this task. In addition, using the weighting of the experts, biologists can easily evaluate a prediction based on the features that they consider the most reliable.

 

Features Used

 

        The features used in the paper are described in detail in:

o       Y. Qi, Z. Bar-Joseph, J. Klein-Seetharaman, "Evaluation of different biological data and computational classification methods for use in protein interaction prediction", PROTEINS: Structure, Function, and Bioinformatics. Jan 2006

o       This paper’s supplementary website

 

Expert    Feature Category                                  Num. Features  Coverage (%)
--------  ------------------------------------------------  -------------  ------------
P expert  HMS_PCI Mass                                      1              8.3
P expert  TAP Mass                                          1              8.8
P expert  Yeast-2-Hybrid                                    1              3.9
F expert  GO Molecular Function                             21             80.7
F expert  GO Biological Process                             33             76.1
F expert  GO Component                                      23             81.5
F expert  Essentiality                                      1              100
F expert  MIPS Protein Class                                25             4.6
F expert  MIPS Mutant Phenotype                             11             9.4
S expert  Gene Neighborhood / Gene Fusion / Gene Co-occur   1              100
S expert  Sequence Similarity                               1              100
S expert  Homology based PPI                                4              100
S expert  Domain-Domain Interaction                         1              100
E expert  Gene Expression                                   20             88.9
E expert  Protein Expression                                1              42.8
E expert  Protein-DNA TF group binding                      16             98.0
E expert  Synthetic Lethal                                  1              7.6

 

 

 

Performance Comparison

 

        Performance was compared using randomly sampled training and test sets.

        We used two measures to evaluate performance:

o       Precision vs. Recall curves

o       R50 partial area under the Receiver Operating Characteristic (ROC) curve.

o       For a detailed description of these two criteria, please refer to

o       Y. Qi, Z. Bar-Joseph, J. Klein-Seetharaman, "Evaluation of different biological data and computational classification methods for use in protein interaction prediction", PROTEINS: Structure, Function, and Bioinformatics. (In press)

        Due to space limits, the paper reports only the R50 comparison.

        Here we present the precision vs. recall curves for the 5 methods compared. The plot shows that the feature-experts-based method still compares favorably.
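The two evaluation measures listed above can be sketched as follows. This is a minimal illustration, not the evaluation code used in the paper. It assumes that R50 follows the usual ROC-n definition, that is, the normalised area under the ROC curve up to the first 50 false positives; the function names and the toy scores are hypothetical. The cited PROTEINS paper gives the exact definitions.

```python
import numpy as np

def precision_recall_curve(scores, labels):
    """Precision and recall at every cutoff of the ranked list (ties are not grouped)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(labels)[order]       # labels sorted by decreasing score
    tp = np.cumsum(y)                   # true positives among the top-k predictions
    k = np.arange(1, len(y) + 1)        # number of predictions made at each cutoff
    return tp / k, tp / y.sum()

def roc_n(scores, labels, n=50):
    """Normalised area under the ROC curve up to the first n false positives.

    Assumed definition (ROC-n): for each of the first n negatives in the ranking,
    count the positives ranked above it, then divide by (negatives used * positives).
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(labels)[order]
    tp_above = np.cumsum(y)                    # positives at or above each rank
    fp_positions = np.flatnonzero(y == 0)[:n]  # ranks of the first n negatives
    total_pos = y.sum()
    if total_pos == 0 or len(fp_positions) == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp_above[fp_positions].sum() / (len(fp_positions) * total_pos)

# Toy ranking: 1 = interacting pair, 0 = non-interacting pair (made-up values).
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 1, 0]

precision, recall = precision_recall_curve(scores, labels)
print(precision)                  # precision at each cutoff
print(recall)                     # recall at each cutoff
print(roc_n(scores, labels, n=1)) # partial area up to the first false positive
```

When the test set contains fewer than `n` negatives, this sketch normalises by the number of negatives actually present, so the score reduces to the full area under the ROC curve.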