>
Feature Details in the data set
| Group Index | # of features | Dataset | Attribute Property | Data Position in the set |
| 1 | 20 | Gene Expression | Real value: [-1, 1] | 1-20 |
| 2 | 21 | GO Molecular Function | | 21-41 |
| 3 | 33 | GO Biological Process | | 42-74 |
| 4 | 23 | GO Component | | 75-97 |
| 5 | 1 | Protein Expression | Real value: non-negative | 98 |
| 6 | 1 | Essentiality | | 99 |
| 7 | 1 | HMS_PCI Mass * | | 100 |
| 8 | 1 | TAP Mass * | | 101 |
| 9 | 1 | Y2H | | 102 |
| 10 | 1 | Synthetic Lethal | | 103 |
| 11 | 1 | Gene Neighborhood / Gene Fusion / Gene Co-occur | | 104 |
| 12 | 1 | Sequence Similarity | Real value: non-negative | 105 |
| 13 | 4 | Homology based PPI | Discrete: non-negative (mostly 0 or 1) | 106-109 |
| 14 | 1 | Domain-Domain Interaction | Real value: [0, 1] | 110 |
| 15 | 16 | Protein-DNA TF group binding | Discrete: non-negative (mostly 0) | 111-126 |
| 16 | 25 | MIPS Protein Class | | 127-151 |
| 17 | 11 | MIPS Mutant Phenotype | | 152-162 |
* Matrix model for co-complex and co-pathway
prediction. Spoke model for direct PPI prediction.
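
The column layout in the table can be expressed as a small lookup for splitting one feature vector into its groups. This is only a sketch: it assumes a row has already been parsed into a list of 162 numbers in table order (the file format is not described here), and the short group names are hypothetical labels invented for the example. It also assumes group 17 occupies positions 152 to 162, which is what its 11 features imply given that group 16 ends at position 151.

```python
# 1-based inclusive column ranges taken from the feature table.
# The dictionary keys are hypothetical labels chosen for this sketch.
FEATURE_GROUPS = {
    "gene_expression": (1, 20),
    "go_molecular_function": (21, 41),
    "go_biological_process": (42, 74),
    "go_component": (75, 97),
    "protein_expression": (98, 98),
    "essentiality": (99, 99),
    "hms_pci_mass": (100, 100),
    "tap_mass": (101, 101),
    "y2h": (102, 102),
    "synthetic_lethal": (103, 103),
    "gene_neighborhood_fusion_cooccur": (104, 104),
    "sequence_similarity": (105, 105),
    "homology_based_ppi": (106, 109),
    "domain_domain_interaction": (110, 110),
    "protein_dna_tf_binding": (111, 126),
    "mips_protein_class": (127, 151),
    "mips_mutant_phenotype": (152, 162),  # assumed start; see note above
}

N_FEATURES = 162


def group_slice(name):
    """Convert a 1-based inclusive range into a 0-based Python slice."""
    start, end = FEATURE_GROUPS[name]
    return slice(start - 1, end)


def split_by_group(features):
    """Split one 162-value feature vector into a dict keyed by group name."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    return {name: features[group_slice(name)] for name in FEATURE_GROUPS}
```

For example, `split_by_group(row)["homology_based_ppi"]` would return the four values at positions 106 to 109 of `row`.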
>
Shared data sets
>
Note
· “-100” in the feature sets means a missing value in that position.
· For details about the gold standard positive sets shared above, please check the “Gold Standard datasets” section in the paper.
· Each negative data set I put here is just a random subset of ~230,000 yeast protein-protein pairs that are not in the positive PPI set of that specific task.
· In the paper, we assume the size ratio between positive and negative examples is roughly 1:600 (estimated from experimental data) when building the train-test sets.
· This ratio is still questionable and needs further discussion.
· If you happen to know a better strategy than the one I used above, it would be greatly appreciated if you could contact me.
· If you notice any mistakes in the data, please contact me as soon as possible. Thanks in advance!
· FAQ page
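
Two of the notes above translate directly into preprocessing steps: treating -100 as missing, and pairing positives with negatives at roughly 1:600. The sketch below illustrates both under stated assumptions. The function names, the use of `None` as the missing marker, and the fixed random seed are choices made for this example only. It is not the procedure used in the paper, whose train-test construction is not described on this page.

```python
import random

MISSING = -100  # sentinel for a missing value, as stated in the notes


def mask_missing(features, missing=MISSING):
    """Replace the -100 sentinel with None so it is not used as a real value."""
    return [None if value == missing else value for value in features]


def sample_negatives(negatives, n_positives, ratio=600, seed=0):
    """Draw n_positives * ratio negatives without replacement.

    If fewer negatives are available than requested, all of them are
    returned, so the effective ratio is then lower than `ratio`.
    """
    wanted = n_positives * ratio
    if wanted >= len(negatives):
        return list(negatives)
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(negatives, wanted)
```

Note that with ~230,000 negative pairs, a 1:600 ratio can only be met for up to about 383 positives (230,000 / 600); beyond that, the shared negative subset is too small and the function returns every available negative.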