Feature Set Downloading:

 

       

[new] Because the related positive reference sets and feature sources have been updated rapidly over the years, sharing only the extracted feature files or partial prediction scores is no longer good enough. I am therefore sharing the code and related files used to generate our feature set here. Download (both summary and detailed!)

 

The general framework and the code should still be quite useful. To improve on the results, you could look for more recent versions of the related evidence sets.

 

 

> Feature Details in the data set

 

| Group Index | # of features | Dataset | Attribute Property | Data Position in the set |
|---|---|---|---|---|
| 1 | 20 | Gene Expression | Real value in [-1, 1] | 1 - 20 |
| 2 | 21 | GO Molecular Function | {1, 0} | 21 - 41 |
| 3 | 33 | GO Biological Process | {1, 0} | 42 - 74 |
| 4 | 23 | GO Component | {1, 0} | 75 - 97 |
| 5 | 1 | Protein Expression | Real value, non-negative | 98 |
| 6 | 1 | Essentiality | {2, 1, 0} | 99 |
| 7 | 1 | HMS_PCI Mass * | {1, 0} | 100 |
| 8 | 1 | TAP Mass * | {1, 0} | 101 |
| 9 | 1 | Y2H | {1, 0} | 102 |
| 10 | 1 | Synthetic Lethal | {1, 0} | 103 |
| 11 | 1 | Gene Neighborhood / Gene Fusion / Gene Co-occur | {1, 0} | 104 |
| 12 | 1 | Sequence Similarity | Real value, non-negative | 105 |
| 13 | 4 | Homology based PPI | Discrete, non-negative (mostly 0 or 1) | 106 - 109 |
| 14 | 1 | Domain-Domain Interaction | Real value in [0, 1] | 110 |
| 15 | 16 | Protein-DNA TF group binding | Discrete, non-negative (mostly 0) | 111 - 126 |
| 16 | 25 | MIPS Protein Class | {1, 0} | 127 - 151 |
| 17 | 11 | MIPS Mutant Phenotype | {1, 0} | 152 - 162 |

 

* The matrix model is used for co-complex and co-pathway prediction; the spoke model is used for direct PPI prediction.
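The column layout in the table can be written down as a small lookup, which makes it easy to pull one evidence group out of a feature vector. The sketch below is only an illustration: it assumes a feature vector has already been loaded as a plain Python list of 162 values in table order, and the names `FEATURE_GROUPS`, `check_layout`, and `split_by_group` are hypothetical, not part of the shared code. Group 17 is taken to start at column 152, which is what its 11 features and the 162-column total imply.

```python
# Feature groups as (name, first_column, last_column), 1-based and inclusive,
# transcribed from the feature table. Names of variables and functions here
# are illustrative only.
FEATURE_GROUPS = [
    ("Gene Expression", 1, 20),
    ("GO Molecular Function", 21, 41),
    ("GO Biological Process", 42, 74),
    ("GO Component", 75, 97),
    ("Protein Expression", 98, 98),
    ("Essentiality", 99, 99),
    ("HMS_PCI Mass", 100, 100),
    ("TAP Mass", 101, 101),
    ("Y2H", 102, 102),
    ("Synthetic Lethal", 103, 103),
    ("Gene Neighborhood / Gene Fusion / Gene Co-occur", 104, 104),
    ("Sequence Similarity", 105, 105),
    ("Homology based PPI", 106, 109),
    ("Domain-Domain Interaction", 110, 110),
    ("Protein-DNA TF group binding", 111, 126),
    ("MIPS Protein Class", 127, 151),
    ("MIPS Mutant Phenotype", 152, 162),  # assumed start: 152 (11 features)
]

NUM_FEATURES = 162


def check_layout():
    """Verify the groups are contiguous and return the last covered column."""
    expected_first = 1
    for name, first, last in FEATURE_GROUPS:
        if first != expected_first or last < first:
            raise ValueError(f"unexpected column range for {name}")
        expected_first = last + 1
    return expected_first - 1


def split_by_group(features):
    """Split one feature vector (list of 162 values) into a dict by group."""
    if len(features) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} values, got {len(features)}")
    # Convert 1-based inclusive positions to Python's 0-based slices.
    return {name: features[first - 1:last] for name, first, last in FEATURE_GROUPS}
```

For example, `split_by_group(vector)["Y2H"]` returns a one-element list holding column 102 of `vector`.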

 

 

> Shared data sets

 

 

 

 

 

 

 

 

 

> Note

 

·        A value of “-100” in the feature sets marks a missing value at that position.

·        For details about the gold standard positive sets shared above, please see the “Gold Standard datasets” section in the paper.

·        Each negative data set provided here is just a random subset of ~230,000 yeast protein-protein pairs that are not in the positive PPI set of the corresponding task.

·        In the paper, when building the train-test sets, we assume the size ratio between positive and negative examples is roughly 1:600 (estimated from experimental data).

·        This ratio is still questionable and needs further discussion.

·        If you know of a better approach than the strategy I used above, I would greatly appreciate it if you contacted me.


·        If you notice any mistakes in the data, please contact me as soon as possible. Thanks in advance!

 

·        FAQ page
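Two of the notes above translate directly into preprocessing steps: treating “-100” as missing, and drawing random negative pairs at roughly 600 per positive. The following is a minimal sketch under stated assumptions, not the procedure used in the paper: feature vectors are assumed to be plain Python lists, protein pairs are assumed to be unordered 2-tuples of identifiers, and `mask_missing`, `negatives_needed`, and `sample_negatives` are hypothetical helper names.

```python
import math
import random

MISSING = -100          # sentinel for a missing value, per the notes
NEG_PER_POS = 600       # assumed positive:negative ratio of roughly 1:600


def mask_missing(features):
    """Return a copy of the vector with the -100 sentinel replaced by NaN."""
    return [math.nan if value == MISSING else float(value) for value in features]


def negatives_needed(num_positives, ratio=NEG_PER_POS):
    """Number of negative pairs for a given number of positives at 1:ratio."""
    return num_positives * ratio


def sample_negatives(candidates, positives, n_wanted, seed=0):
    """Randomly pick up to n_wanted candidate pairs not in the positive set.

    Pairs are treated as unordered, so (A, B) and (B, A) count as the same pair.
    """
    positive_keys = {tuple(sorted(pair)) for pair in positives}
    pool = sorted({tuple(sorted(pair)) for pair in candidates} - positive_keys)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(pool, min(n_wanted, len(pool)))
```

Replacing the sentinel with NaN keeps a missing entry from being mistaken for a real measurement; how missing values are then handled (imputation, or a learner that tolerates them) is left to the downstream method.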