=======================================
Evaluation Assistant for Classification
=======================================

The Evaluation Assistant is a software tool whose aim is to facilitate
testing of statistical, machine learning and neural algorithms on given
datasets and to provide standardized performance measures. The Evaluation
Assistant is oriented towards classification tasks. This software enables
the user to set up the basic parameters for testing and then carry out
the tests.

The Evaluation Assistant requires the following files:

(1) Dataset file

The dataset supplied must conform to the format required by the
classification algorithms. The recommended format is as follows. The
dataset must follow the standard format with one example per line. All
attribute values must be separated by commas (,). The class must appear
after the last attribute value.

The dataset may be accompanied by a cost matrix. The value in position
ij specifies the cost of classifying class i as class j. The cost matrix
supplied must be stored in a file with the extension .mtx. If no cost
matrix has been supplied, a default cost matrix will be generated. This
matrix is of size NxN and contains 1's in all places except the positions
on the diagonal, which contain 0's. The loss figures found in the output
(LOG) are computed on the basis of this default matrix.

(2) Classification algorithm

This is the learning algorithm that will be tested. The learning
algorithm must generate a confusion matrix in the following standard
format:

        [1] ... [N]
    [1] C11 ..  C1N
    ... ..  ..  ..
    [N] CN1 ..  CNN

where rows represent the true class and columns the predictions of the
classification algorithm.
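If no .mtx file is supplied, the default matrix described above is easy
to generate. The following is a minimal sketch (illustrative shell/awk,
not the code EA itself uses):

```shell
#!/bin/sh
# Sketch: generate the default NxN cost matrix (1's everywhere,
# 0's on the diagonal).  The number of classes is given as $1.
# This is an illustration, not EA's own code.
N=${1:-4}
awk -v n="$N" 'BEGIN {
    for (i = 1; i <= n; i++) {
        row = ""
        for (j = 1; j <= n; j++)
            row = row (i == j ? 0 : 1) (j < n ? " " : "")
        print row
    }
}'
```

Redirecting the output to a file with the .mtx extension (e.g. a
hypothetical `sh mkcost.sh 7 > segment.mtx` for a 7-class problem)
yields a matrix in the required format.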
Running the Evaluation Assistant

The main menu presents the following choices:

    Menu Button     Function:
    ---------------  ----------
    HELP
    OPTIONS         Set up various test options
    METHOD          Selection of test method
    DATASET         Selection of dataset for testing
    ALGORITHM       Selection of algorithm for testing
    RUN             Execution of tests
    RESULTS         Presentation of test results
    QUIT

Output of the Evaluation Assistant

The Evaluation Assistant creates a LOG containing relevant information
concerning the test. Each LOG consists of three parts. The first part
describes the set-up under which the test was carried out (machine
used, user, date etc.), together with the dataset and algorithm chosen,
and the test method selected (N-cross validation). The second part
shows means and standard deviations of times etc. calculated on the
basis of the partial results of the individual cycles of N-cross
validation. The third part contains overall results, that is, success
rates, overall loss etc., calculated on the basis of all cycles of
cross-validation (or bootstrap).

Further information: See the HELP information associated with each menu
button.

Implementation: The interactive interface was implemented in Tcl by
O. Smets from RWTH-Aachen. The Evaluation Assistant itself was
implemented using Unix scripts and C routines by J. Gama, LIACC, Porto.

Details about the methodology can also be found in:

    D. Michie, D. Spiegelhalter, C. C. Taylor: Machine Learning, Neural
    and Statistical Classification, Ellis Horwood, 1994.

This book describes the methodology adopted and the results obtained
under the Esprit 2 Project Statlog (No. 5170).
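The per-cycle statistics in the second part of a LOG are plain means
and sample standard deviations. The sketch below shows the computation
in awk; the two input times are illustrative values (chosen so that the
output matches the training-phase figures of the example LOG later in
this document), not recorded data, and this is not the actual STAT
program:

```shell
#!/bin/sh
# Sketch: mean and standard deviation of per-cycle times, one value
# per line, as reported in the second part of the LOG.
printf '2.65\n3.10\n' | awk '
{ sum += $1; sumsq += $1 * $1; n++ }
END {
    mean = sum / n
    sd   = sqrt((sumsq - n * mean * mean) / (n - 1))
    printf "MEAN OF TIME: %.3f (S)\n", mean
    printf "STANDARD_DEVIATION OF TIME: %.3f (S)\n", sd
}'
# prints: MEAN OF TIME: 2.875 (S)
#         STANDARD_DEVIATION OF TIME: 0.318 (S)
```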
Contact names for further information:

    P.B.Brazdil or J.Gama           Tel.:   +351 600 1672
    LIACC, University of Porto      Fax.:   +351 600 3654
    Rua Campo Alegre 823            Email:  statlog-adm@ncc.up.pt
    4100 Porto, Portugal

=======================================
OPTIONS
=======================================

The submenu of OPTIONS presents three choices to the user:

    SESSION NAME
    LOAD OPTIONS
    SAVE OPTIONS

Suboption SESSION NAME

This suboption enables the user to assign a name to the current
session. The session name should be a relatively short sequence of
characters which will appear as part of various file names generated by
the system. For example, if the session name is "TEST1", one file
generated by the system will be called TEST1.LOG, containing the test
results.

Suboption LOAD OPTIONS

Repetition of tests is facilitated by the storage of various parameter
settings in a parameter file. These may be restored at any time by
invoking this suboption.

Suboption SAVE OPTIONS

Similarly, the current parameter settings may be stored in a parameter
file at any time by invoking this suboption.

File DEFAULT.EA is used to store various default settings. Each line
may contain one setting of the form <parameter> = <value>. One
important parameter is

    BIN_DIR = /.../.../

indicating the path to the directory containing the scripts of EA. The
information in this file has to satisfy a predefined format. The user
must check this parameter before running EA.

An example of a complete DEFAULT.EA file is:

    BIN_DIR=/home/liacc/jgama/bin
    METHOD=1
    NR_DIV=10
    MODE=1

indicating that the default test method is cross-validation (METHOD=1)
with 10 cycles (NR_DIV=10). The parameter MODE=1 indicates a non-stop
run as the default.
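Settings in this PARAMETER=VALUE form can be loaded with ordinary shell
tools. The loop below is a minimal sketch illustrating the file format,
not EA's actual option loader:

```shell
#!/bin/sh
# Sketch: read PARAMETER = VALUE settings from a DEFAULT.EA-style file
# into shell variables.  Illustrative only, not EA's own loader.
cat > DEFAULT.EA <<'EOF'
BIN_DIR=/home/liacc/jgama/bin
METHOD=1
NR_DIV=10
MODE=1
EOF

while IFS='=' read -r key val; do
    [ -n "$key" ] || continue            # skip blank lines
    key=$(echo "$key" | tr -d ' ')       # tolerate spaces around '='
    val=$(echo "$val" | sed 's/^ *//')
    eval "$key=\$val"
done < DEFAULT.EA

echo "method=$METHOD cycles=$NR_DIV mode=$MODE"
# prints: method=1 cycles=10 mode=1
```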
Another, more complex example is as follows:

    BIN_DIR = /home/liacc/jgama/bin/  Directory with runnable scripts/programs
    STEM = head1                      Session name
    DATASET = head                    Dataset file name
    COST = head.mtx                   Cost matrix file
    EXT_TRAIN = .data                 Extension for the training data file
    EXT_TEST = .test                  Extension for the test data file
    METHOD = 1                        Test method (1=N-cross validation etc.)
    MODE = 1                          Mode of testing (0=One Step, 1=Non Stop)
    NR_DIV = 10                       Nr. of cycles of N-cross validation
    SEED = 0                          Seed for bootstrap
    ALGORITHM = c4.5                  Algorithm name
    TRAIN_COMMAND = c4.5 -f $1        Invocation of the learning algorithm
                                      for training
    TEST_COMMAND = c4.5 -f $1 -u      Invocation of the learning algorithm
                                      for testing

=======================================
METHOD
=======================================

This menu permits the user to select the test method:

    - N-cross validation
    - Leave one out
    - Bootstrap
    - Train & test

The default test method is N-cross validation. This test method divides
the dataset into N divisions, where N-1 divisions are used for training
while the remaining one is used for testing. The whole process is then
repeated N times; in each run a different division of the data is used
for testing.

The leave-one-out test method can be considered a special case of
N-cross validation where the number of divisions is equal to the number
of cases in the dataset.

The bootstrap method is considered appropriate for tests on small
datasets. This method differs from N-cross validation in that certain
data elements are repeated in the input, but otherwise the method is
similar. This method requires an additional parameter called "seed".

The train & test method simply uses the train file supplied by the user
for training. Similarly, the test file supplied is used for testing.

=======================================
DATASET
=======================================

This menu button permits the user to select the dataset for training &
testing etc.
The associated pull-down submenu has the following form:

    Menu Button:        Function:
    ---------------     ----------
    TRAIN&TEST DATA     Selection of dataset for training & testing
    TRAIN DATA          Training dataset
    TEST DATA           Test dataset
    COST MATRIX         Selection of cost matrix

TRAIN&TEST DATA

This suboption enables the user to indicate the name of the file
containing the dataset to be used for N-cross validation, leave-one-out
or bootstrap tests. The dataset will be automatically split up into
separate portions for training and testing. See the HELP associated
with the submenu for details.

The dataset must follow the standard format with one example per line.
All attribute values must be separated by commas (,). The class must
appear after the last attribute value.

TRAIN DATA

This suboption enables the user to indicate the name of the file
containing the dataset to be used specifically for training. See the
HELP associated with the submenu for details. The dataset must follow
the standard format described above.

TEST DATA

This suboption enables the user to indicate the name of the file
containing the dataset to be used specifically for testing. See the
HELP associated with the submenu for details. The dataset must follow
the standard format described above.

COST MATRIX

The dataset chosen may be accompanied by a cost matrix. This suboption
enables the user to indicate the name of the file containing the cost
matrix. The cost matrix must follow the standard format: the value in
position ij specifies the cost of classifying class i as class j. The
cost matrix supplied must be stored in a file with the extension .mtx.
If no cost matrix has been supplied, a default cost matrix will be
generated.
This matrix is of size NxN and contains 1's in all places except the
positions on the diagonal, which contain 0's. The loss figures found in
the output (LOG) are computed on the basis of this default matrix.

=======================================
TRAIN and TEST
=======================================

This menu button permits the user to select the dataset, obtain concise
information about the dataset, etc. The associated pull-down submenu
has the following form:

    Menu Option     Function:
    -----------     ----------
    SELECT          Select a dataset file for train & test
    INFORMATION     Provide concise information about the dataset
    SHOW            Show the contents of the dataset
    PERMUTE         Permute the examples in the dataset
    TRAIN EXT       File extension for the portion of data for training
    TEST EXT        File extension for the portion of data for testing

SELECT

This suboption permits the user to select the dataset. The existing
files are shown in a window, and the required file can be selected by
pointing at it.

INFORMATION

This suboption initiates the validation of the data in the dataset file
and provides values of some basic parameters to the user. The
parameters include the number of examples, the number of attributes,
the number of classes and the distribution of examples per class. For
example, for the Segment dataset the system responds as follows:

    DESCRIPTION of DATASET
    Nr. examples    2310
    Nr. attributes  19
    Nr. classes     7
    Classes:
    Class .. 1, Nr. examples .. etc.

SHOW

This suboption permits the user to examine the contents of the dataset
file on the screen.

PERMUTE

This suboption generates a permuted version of the dataset chosen.

TRAIN EXT

This suboption enables the user to define an extension for the file
containing the portion of the input data to be used in training. If no
extension is specified, the system will use "TRA" as the default value.
Some learning algorithms require that the dataset file has a specific
extension. For C4.5, for example, this extension is "DATA".
TEST EXT

This suboption enables the user to define an extension for the file
containing the portion of the input data to be used in testing. If no
extension is specified, the system will use "TES" as the default value.
Some learning algorithms require that the dataset file has a specific
extension.

=======================================
ALGORITHM
=======================================

Some classification algorithms may require various parameters to be set
before they are called. This menu permits the user to define the exact
form of the invocation. The submenu has the following form:

    Menu Option         Function:
    ---------------     ----------
    Generic Name        Identification of the algorithm
    Train Command Line  Command line to be used in the training phase
    Test Command Line   Command line to be used in the testing phase

The generic name of the algorithm is required mainly for the LOG. The
command lines will be used to invoke the algorithm from within the
Evaluation Assistant. Here $1 represents the stem name of the dataset
and $2 the extension. For example, C4.5 may be invoked using

    c4.5 -f $1 -u

where $1 represents the stem name of the dataset chosen (e.g. heart).

=======================================
RUN
=======================================

When this menu button is selected, the following submenu will appear:

    Menu Option     Function:
    -----------     ----------
    RUN             Generate customized scripts and execute them
    GENERATE        Generate customized scripts (without executing them)
    ONE STEP        If off, execute all cycles of cross-validation etc.
                    If on, execute just one cycle of cross-validation etc.

The option RUN generates customized scripts and initiates training and
testing by executing them. The option GENERATE permits the user to
generate customized scripts without executing them. This option may
occasionally be useful for more experienced users: the scripts may be
edited by the user and then executed off line, without any need to
invoke the menus of the Evaluation Assistant.
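The way $1 (and $2) are filled in when the scripts run can be
illustrated with a small shell wrapper. In the sketch below,
run_command is a hypothetical helper and echo stands in for the real
learning algorithm; this is not EA's actual script generator:

```shell
#!/bin/sh
# Sketch: expand a stored command-line template in which $1 is the
# dataset stem and $2 the extension, as under the ALGORITHM menu.
TRAIN_COMMAND='echo c4.5 -f $1'      # single quotes keep $1 unexpanded

run_command() {
    template=$1
    shift                            # now $1 = stem, $2 = extension
    eval "$template"
}

run_command "$TRAIN_COMMAND" heart .data
# prints: c4.5 -f heart
```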
The Evaluation Assistant uses the following scripts:

    Script:         Calls:          Description:
    EA.SCR                          N-cross validation
                    CV.SCR          Run N cycles of cross validation
                    RESULTS.SCR     Produce results
    EAB.SCR                         Bootstrap
                    CVB.SCR         Run N cycles
                    RESULTS.SCR     Produce results
    CV.SCR                          Perform N cycles of cross validation
                    SPLITC          Split dataset
                    CV1.SCR         Perform one cycle of cross validation
    CVB.SCR                         Perform N cycles of bootstrap
                    SPLITB          Split dataset
                    CV1.SCR         Perform one cycle
    CV1.SCR                         Perform one cycle of cross validation
                    LEARN.SCR       Call algorithm in learning phase
                    TEST.SCR        Call algorithm in test phase with
                                    training data
                    EARES           Process matrices
                    TEST.SCR        Call algorithm in test phase with
                                    test data
                    EARES           Process matrices
    RESULTS.SCR                     Produce the results
                    STAT            Statistics on training phase
                    STAT            Statistics on test phase with
                                    training data
                    STAT            Statistics on test phase with
                                    test data
                    SUMATRIX        Overall results on training data
                    EARES           Process overall matrix
                    SUMATRIX        Overall results on test data
                    EARES           Process overall matrix

All scripts and programs are documented using MAN pages. For example,
EA.MAN is the MAN page for EA.SCR. EA.ROUTINES.MAN contains a list of
all scripts and programs of the Evaluation Assistant.

=======================================
RESULTS
=======================================

This option permits the user to create a LOG for the analysis of
results. The user can select condensed results or complete output. This
is controlled using the following submenu:

    Menu Option     Function:
    ---------------  ----------
    SHORT           Show results in a condensed form
    LONG            Show all test results

If the SHORT form is chosen, just summary information will be shown;
the output of the learning algorithm will not be shown. Users
interested in this information can request it by specifying the LONG
form.

All outputs are stored in a LOG containing relevant information
concerning the tests. The results relative to <session> (the
user-defined session name) are stored in a file with the name
LOG.<session>.RES. Each LOG consists of three parts.
The first part describes the set-up under which the test was carried
out (machine used, user, date etc.), together with the dataset and
algorithm chosen, and the test method selected (N-cross validation).
The second part shows means and standard deviations of times etc.
calculated on the basis of the partial results of the cycles of N-cross
validation. The third part contains overall results, that is, success
rates, overall loss etc., calculated on the basis of all cycles of
cross-validation (or bootstrap).

An example of a generated LOG is shown below:

    General information

    MACHINE:      liacc
    USER:         jgama
    DATE:         Fri Apr 4 16:33:10 WET 1992
    ALGORITHM:    c45
    DATA_SET:     lymf
    METHOD:       Cross_Validation
    No.of ROUNDS: 2

    Results

    TRAINING PHASE
    MEAN OF TIME:                  2.875 (S)
    MEAN OF MEMORY:               64.500 (KB)
    STANDARD_DEVIATION OF TIME:    0.318 (S)
    STANDARD_DEVIATION OF MEMORY:  0.707 (KB)

    TEST PHASE ON TRAINING DATA
    MEAN OF TIME:                  0.325 (S)
    MEAN OF MEMORY:               50.000 (KB)
    STANDARD_DEVIATION OF TIME:    0.035 (S)
    STANDARD_DEVIATION OF MEMORY:  0.000 (KB)

    TEST PHASE ON TEST DATA
    MEAN OF TIME:                  0.325 (S)
    MEAN OF MEMORY:               50.000 (KB)
    STANDARD_DEVIATION OF TIME:    0.035 (S)
    STANDARD_DEVIATION OF MEMORY:  0.000 (KB)

    Overall results

    OVERALL RESULTS: TEST PHASE ON TRAINING DATA
    CONFUSION MATRIX
        [1] [2] [3] [4]
    [1]   2   0   0   0
    [2]   0  76   5   0
    [3]   0  10  51   0
    [4]   0   0   3   1
    COST MATRIX: Default matrix!
    SUCCESS_RATE: 0.878 (130/148)
    LOSS: 18
    AVERAGE_LOSS: 0.121

    OVERALL RESULTS: TEST PHASE ON TEST DATA
    CONFUSION MATRIX
        [1] [2] [3] [4]
    [1]   0   1   1   0
    [2]   0  73   8   0
    [3]   0  23  38   0
    [4]   0   1   3   0
    COST MATRIX: Default matrix!
    SUCCESS_RATE: 0.750 (111/148)
    LOSS: 37
    AVERAGE_LOSS: 0.250
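As a check on the figures above, the success rate and loss can be
recomputed from the confusion matrix alone when the default 0/1 cost
matrix is in effect. The following is an illustrative awk computation
(not the actual EARES/SUMATRIX code):

```shell
#!/bin/sh
# Sketch: success rate and loss from a confusion matrix (rows = true
# class, columns = predictions) under the default 0/1 cost matrix.
# Input: the "test phase on test data" matrix from the example LOG.
printf '0 1 1 0\n0 73 8 0\n0 23 38 0\n0 1 3 0\n' | awk '
{
    for (j = 1; j <= NF; j++) {
        total += $j
        if (j == NR) correct += $j   # diagonal entries: cost 0
        else         loss    += $j   # off-diagonal entries: cost 1
    }
}
END {
    printf "SUCCESS_RATE: %.3f (%d/%d)\n", correct / total, correct, total
    printf "LOSS: %d\n", loss
    printf "AVERAGE_LOSS: %.3f\n", loss / total
}'
# prints: SUCCESS_RATE: 0.750 (111/148)
#         LOSS: 37
#         AVERAGE_LOSS: 0.250
```

With a user-supplied cost matrix, each off-diagonal count would instead
be multiplied by the corresponding cost entry before summing the loss.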