Feature Set Downloading:

 

> Feature Details in the data set

 

| Group Index | # of Features | Dataset | Attribute Property | Data Position in the Set |
|---|---|---|---|---|
| 1 | 20 | Gene Expression | Real value in [-1, 1] | 1-20 |
| 2 | 21 | GO Molecular Function | {1, 0} | 21-41 |
| 3 | 33 | GO Biological Process | {1, 0} | 42-74 |
| 4 | 23 | GO Component | {1, 0} | 75-97 |
| 5 | 1 | Protein Expression | Real value, non-negative | 98 |
| 6 | 1 | Essentiality | {2, 1, 0} | 99 |
| 7 | 1 | HMS_PCI Mass * | {1, 0} | 100 |
| 8 | 1 | TAP Mass * | {1, 0} | 101 |
| 9 | 1 | Y2H | {1, 0} | 102 |
| 10 | 1 | Synthetic Lethal | {1, 0} | 103 |
| 11 | 1 | Gene Neighborhood / Gene Fusion / Gene Co-occur | {1, 0} | 104 |
| 12 | 1 | Sequence Similarity | Real value, non-negative | 105 |
| 13 | 4 | Homology based PPI | Discrete, non-negative (mostly 0 or 1) | 106-109 |
| 14 | 1 | Domain-Domain Interaction | Real value in [0, 1] | 110 |
| 15 | 16 | Protein-DNA TF group binding | Discrete, non-negative (mostly 0) | 111-126 |
| 16 | 25 | MIPS Protein Class | {1, 0} | 127-151 |
| 17 | 11 | MIPS Mutant Phenotype | {1, 0} | 152-162 |

 

* For the features marked with an asterisk, the matrix model is used for co-complex and co-pathway prediction, and the spoke model is used for direct PPI prediction.
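The layout above can be turned into a small lookup that splits one feature vector into its 17 groups. This is only a sketch under stated assumptions: it assumes each protein pair has already been parsed into a list of 162 numbers (the file format is not described here), it takes group 17 to span positions 152-162 so that its 11 features do not overlap group 16, and the names `FEATURE_GROUPS`, `check_layout`, and `split_by_group` are hypothetical helpers, not part of the released data.

```python
# Feature groups as (name, first position, last position), 1-based and inclusive,
# following the table above. Assumption: group 17 starts at position 152.
FEATURE_GROUPS = [
    ("Gene Expression", 1, 20),
    ("GO Molecular Function", 21, 41),
    ("GO Biological Process", 42, 74),
    ("GO Component", 75, 97),
    ("Protein Expression", 98, 98),
    ("Essentiality", 99, 99),
    ("HMS_PCI Mass", 100, 100),
    ("TAP Mass", 101, 101),
    ("Y2H", 102, 102),
    ("Synthetic Lethal", 103, 103),
    ("Gene Neighborhood / Gene Fusion / Gene Co-occur", 104, 104),
    ("Sequence Similarity", 105, 105),
    ("Homology based PPI", 106, 109),
    ("Domain-Domain Interaction", 110, 110),
    ("Protein-DNA TF group binding", 111, 126),
    ("MIPS Protein Class", 127, 151),
    ("MIPS Mutant Phenotype", 152, 162),
]

N_FEATURES = 162


def check_layout(groups=FEATURE_GROUPS, n_features=N_FEATURES):
    """Return True if the groups cover positions 1..n_features with no gaps or overlaps."""
    expected_start = 1
    for _name, first, last in groups:
        if first != expected_start or last < first:
            return False
        expected_start = last + 1
    return expected_start == n_features + 1


def split_by_group(vector, groups=FEATURE_GROUPS):
    """Split one 162-value feature vector into a dict keyed by group name."""
    if len(vector) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} values, got {len(vector)}")
    # Table positions are 1-based and inclusive; Python slices are 0-based and half-open.
    return {name: vector[first - 1:last] for name, first, last in groups}
```

For example, `split_by_group(vector)["GO Biological Process"]` would return the 33 values at positions 42-74 of `vector`.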

 

 

> Shared data sets

 

> Note

 

·        A value of “-100” in the feature sets marks a missing value at that position.

·        For details about the gold standard positive sets shared above, please see the “Gold Standard datasets” section of the paper.

·        Each negative data set provided here is just a random subset of ~230,000 yeast protein-protein pairs that are not in the positive PPI set of the corresponding task.

·        When building the train-test sets in the paper, we assume that the ratio of positive to negative examples is roughly 1:600 (estimated from experimental data).

·        This ratio is still open to question and needs further discussion.

·        If you know of a better strategy than the one I used above, I would greatly appreciate it if you could contact me.

·        If you notice any mistakes in the data, please contact me as soon as possible. Thanks in advance!

 

·        FAQ page
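Two of the notes above translate directly into preprocessing steps: treating “-100” as missing, and drawing negatives at roughly 1:600. The following is a minimal sketch, not the procedure used in the paper. It assumes feature vectors are already loaded as lists of numbers and that negatives are held in a list; `mask_missing` and `sample_negatives` are hypothetical names introduced here for illustration.

```python
import random

MISSING = -100  # marker for a missing value, as stated in the notes above


def mask_missing(vector, missing=MISSING):
    """Replace the missing-value marker with None so it is never used as a number."""
    return [None if v == missing else v for v in vector]


def sample_negatives(n_positives, negatives, ratio=600, seed=0):
    """Draw a random subset of negatives so that positives:negatives is about 1:ratio.

    Assumption: `negatives` is a list of negative examples (for instance,
    protein pairs or their feature vectors).
    """
    wanted = n_positives * ratio
    if wanted >= len(negatives):
        # Not enough negatives to reach the requested ratio: keep them all.
        return list(negatives)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(negatives, wanted)
```

With a negative set of about 230,000 pairs, the 1:600 ratio can be met for up to roughly 383 positives (230,000 / 600); beyond that, this sketch simply returns every available negative.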