======================================================================
Editor's note: This file was received on 6/29/93 and has not been
updated for subsequently received databases. - David W. Aha
======================================================================

1. Summary Table of Database Statistics

2. Donated by: Peter Turney & Michael Jankulak

   As I mentioned, I was working on a table that summarized the UCI
   data sets, to help me choose the data sets that are appropriate for
   my needs. A summer student -- Michael Jankulak, a recent graduate
   of the University of Toronto -- has prepared the table for me. It
   is appended below. I would like to donate the table to the UCI
   repository, for others who may need such a list.

   Best wishes, Peter.

3. Characteristics Presented

   1. Database Name
   2. Number of Instances (i.e. examples, data points, observations)
   3. Number of Features (i.e. dimensions, attributes)
   4. Number of Classes (assuming a discrete class variable)
   5. Percent of Features that have continuous/integer values
   6. Percent of Features that have nominal values
   7. Missing Feature Values (yes or no)
   8. Highest Reported Accuracy (taken from "Past Usage")
   9. Percent of Instances in the Majority Class (to compare with 8)

4. Table

1.                  2.    3.     4.   5.    6.   7.     8.     9.
name            #cases #feat #class %num %symb miss %accur %major
------------------------------------------------------------------------------
anneal             898    38      6   24    76  yes      -     76
audiology-stan     226    70     24    0   100  yes      -     25
imports-85         205    26      *   62    38  yes     88      *
breast-cancer-w    699    10      2  100     0  yes     94     66
bridges            108    13      *    0   100  yes      -     25
kr-vs-kp          3196    36      2    0   100   no      -     52
machine            209     9      8   78    22   no      -     58
credit-app         690    15      2   40    60  yes      -     56
echocardiogram     132    13      2   69    31  yes     90     44
flag               194    30      *   33    67   no      -      *
glass              214    10      7  100     0   no      -     36
hayes-roth         132     5      3    0   100   no      -     39
heart-disease      920    13      5   50    50  yes     77     45
hepatitis          155    19      2   32    68  yes     83     79
horse-colic        368    28      *   32    68  yes      -      *
segmentation      2310    19      7  100     0   no      -     14
ionosphere         351    34      2  100     0   no     97     64
iris               150     4      3  100     0   no      -     33
labor               57    16      0   50    50   no     98     55
lenses              24     4      3    0   100   no      -     62
letter           20000    16     26  100     0   no     80      4
liver-disorders    345     6      2  100     0   no      -     60
lung-cancer         32    56      3    0   100  yes      -     40
promoters          106    58      2    0   100   no      -     50
splice            3190    61      3    0   100   no      -     50
monks-1            432     6      2    0   100   no    100     50
monks-2            432     6      2    0   100   no    100     67
monks-3            432     6      2    0   100   no    100     53
mushroom          8124    22      2    0   100  yes     95     52
pima-diabetes      768     8      2  100     0   no     76     65
shuttle-l-c         15     6      2    0   100  yes      -     60
solar-flare       1389    13      *   23    77   no      -      *
soybean-large      683    35     19    0   100  yes     97     13
lrs                531   102    100   99     1   no      -     10
satellite         6435    36      6  100     0   no      -     24
shuttle**        14500     9      7  100     0   no      -     79
vehicle            846    18      4  100     0   no      -     26
new-thyroid        215     5      3  100     0   no    100     70
thyroid0387       9172    29     21   24    76  yes      -     74
hypothyroid       3163    25      2   28    72  yes      -     95
sick-euthyroid    3163    25      2   28    72  yes      -     91
allbp             3772    29      3   24    76  yes      -     96
allhyper          3772    29      5   24    76  yes      -     97
allhypo           3772    29      5   24    76  yes      -     92
allrep            3772    29      4   24    76  yes      -     97
dis               3772    29      2   24    76  yes      -     98
sick              3772    29      2   24    76  yes      -     94
ann-thyroid       7200    21      3   29    71  yes      -     93
tic-tac-toe        958     9      2    0   100   no     99     65
sonar              208    60      2  100     0   no     83     53
vowel              990    10     11  100     0   no     56      9
votes              435    16      2    0   100  yes     95     61
wine               178    13      3  100     0   no    100     40
zoo                101    17      7   12    88   no      -     41

*  any of the features can be used as the class feature
** the compressed training set in this directory may be corrupted.
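
Characteristic 9 is listed "to compare with 8", that is, to check how far the highest reported accuracy sits above the accuracy of always guessing the majority class. The following is a minimal Python sketch of that comparison, not part of the original donation. It assumes that "-" in the %accur column means no accuracy was reported and that "*" means the class feature is not fixed; the names parse_row and accuracy_gain are hypothetical, and only a handful of rows copied from the table are included.

```python
# Illustrative sketch only; helper names are hypothetical.
# Rows are copied from the summary table (columns 1 through 9).
ROWS = """\
breast-cancer-w    699    10      2  100     0  yes     94     66
hepatitis          155    19      2   32    68  yes     83     79
iris               150     4      3  100     0   no      -     33
imports-85         205    26      *   62    38  yes     88      *
letter           20000    16     26  100     0   no     80      4
"""


def parse_row(line):
    """Split one whitespace-separated table row into a dict.

    Assumption: '-' (nothing reported) and '*' (class not fixed)
    are both mapped to None.
    """
    name, cases, feat, cls, num, symb, miss, accur, major = line.split()

    def to_int(field):
        return None if field in ("-", "*") else int(field)

    return {
        "name": name.rstrip("*"),  # drop footnote marks such as shuttle**
        "cases": int(cases),
        "feat": int(feat),
        "class": to_int(cls),
        "num": int(num),
        "symb": int(symb),
        "miss": miss == "yes",
        "accur": to_int(accur),
        "major": to_int(major),
    }


def accuracy_gain(row):
    """Percentage points by which column 8 exceeds column 9, or None."""
    if row["accur"] is None or row["major"] is None:
        return None
    return row["accur"] - row["major"]


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
for row in rows:
    gain = accuracy_gain(row)
    if gain is not None:
        print(f"{row['name']:<16} {gain:+d}")
```

With these rows, hepatitis shows a gain of only 4 points (83 against a 79 percent majority class), while letter shows 76 points (80 against 4). Rows without a reported accuracy or a fixed class, such as iris and imports-85, are skipped.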