PREPPER: an intelligent preprocessing tool for machine learning algorithms.

CONTENTS

Overview
Preprocessors
Running the program
Predicting
Commands
Credits

WARNING: THIS PAGE IS PRELIMINARY. PROPER DOCUMENTATION AND DOWNLOADING INSTRUCTIONS ARE UNDER CONSTRUCTION.

PREPPER is a program written in C for human-computer collaboration on some common preprocessing operations. It is intended for anyone who is about to apply data mining, machine learning, neural networks, or ANOVA to their data and who wishes to preprocess it first.

Overview

You can use PREPPER for the following basic preprocessing tasks:

* Take a dataset and create new attributes ("data columns") that are transformations and combinations of other attributes.

* Turn symbolic attributes into numerical attributes by a number of transformations.

* Turn date attributes (e.g. 11-Jan-1978 or 1/15/97) into numbers using many different date functions.

* Remove unwanted attributes.

* Record the transformations you have made into an "xform" file so that you can make similar transformations to future data.
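
To make the date and symbolic conversions above concrete, here is a minimal Python sketch of the general idea. It is not PREPPER code: the helper names (parse_date, date_features, symbol_codes), the list of accepted date formats, and the choice of integer codes for symbols are all assumptions made for illustration. PREPPER's own date functions and symbol transformations may work differently.

```python
from datetime import datetime

# Hypothetical list of accepted formats, chosen to cover the two examples
# given in the text (11-Jan-1978 and 1/15/97). Not PREPPER's actual list.
DATE_FORMATS = ("%d-%b-%Y", "%m/%d/%y", "%m/%d/%Y")


def parse_date(text):
    """Try each assumed format in turn and return a datetime."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def date_features(text):
    """Turn one date string into several numeric attributes."""
    d = parse_date(text)
    return {
        "year": d.year,
        "month": d.month,
        "day_of_month": d.day,
        "day_of_week": d.weekday(),            # Monday = 0
        "day_of_year": d.timetuple().tm_yday,  # 1 January = 1
    }


def symbol_codes(values):
    """Map each distinct symbol to an integer, in order of first appearance.

    Returns the encoded column and the symbol-to-code table.
    """
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return [codes[v] for v in values], codes
```

For instance, date_features("11-Jan-1978") yields one number per date function, and symbol_codes(["red", "blue", "red"]) yields a numeric column plus the table needed to encode future data the same way.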

PREPPER can also automatically discover transformations of the data that are useful for your prediction task. These operations include many of the classical preprocessing algorithms, and they create the new transformations for you. If you wish, the transformations can be recorded for future use.

Preprocessors

At the time of writing, the available preprocessors are:

              boxcox          select             pca
               apply          symbol             adm
             bestlin       crossterm            date

Type help <preprocessorname> to find out what a given preprocessor does.
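
As background for one of the names in the list, the classical Box-Cox transform can be sketched in a few lines of Python. This shows only the textbook formula. How PREPPER's boxcox preprocessor chooses its parameter, and whether it differs from the textbook form, is not stated on this page, so treat help boxcox as the authority. The function name and the parameter name lam are assumptions.

```python
import math


def boxcox(x, lam):
    """Textbook Box-Cox transform of a single positive value.

    lam is the power parameter; lam == 0 is defined as the natural log,
    which is the limit of (x**lam - 1) / lam as lam approaches 0.
    """
    if x <= 0:
        raise ValueError("Box-Cox is defined only for positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam
```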

Running the program

At the command line, type prep or prep <datafilename>. The datafile may be space separated, comma separated, or a mixture of both, and it may mix numeric and symbolic data. If the first line looks like a list of attribute names to the data reader, it will read that line in as names. Otherwise it will invent boring names for the attributes.
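
The reading behaviour described above can be pictured with the following Python sketch. It is a guess at the kind of rule involved, not PREPPER's actual data reader: the function names, the "no field parses as a number" test for a header line, and the invented names att0, att1, ... are all assumptions.

```python
import re


def split_fields(line):
    """Split a line on commas and/or runs of whitespace.

    Empty fields are dropped, so missing values are not handled here.
    """
    return [f for f in re.split(r"[,\s]+", line.strip()) if f]


def is_number(token):
    """Return True if the token parses as a floating-point number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def read_dataset(lines):
    """Return (attribute_names, rows) from an iterable of text lines."""
    rows = [split_fields(ln) for ln in lines if ln.strip()]
    if not rows:
        return [], []
    first = rows[0]
    # Assumed heuristic: the first line is a header when more data follows
    # and none of its fields looks numeric.
    if len(rows) > 1 and not any(is_number(t) for t in first):
        return first, rows[1:]
    # Otherwise invent placeholder names (hypothetical naming scheme).
    names = [f"att{i}" for i in range(len(first))]
    return names, rows
```

A purely symbolic dataset with no header line would fool this rule, which is one reason to check the attribute names after loading.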

At any time while the program is running, there is a currently loaded dataset and a current xform (pronounced "transform"). The xform defines new attributes or hides old attributes for you.

You can add new attributes by typing include <newname> = <formula>.

You can see the current set of attributes and info about them by typing showatts.

You can see the current xform by typing showxf.

You can see your current data by typing list <attributenames>.
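
One way to picture the xform described above is as a small recipe that can be replayed on any row of data, including data you load later. The Python sketch below is only a conceptual model. The class, its method names, and the use of Python functions as formulas are assumptions, and it says nothing about PREPPER's real formula syntax or xform file format.

```python
class Xform:
    """Conceptual model of an xform: named formulas plus hidden attributes."""

    def __init__(self):
        self.formulas = {}   # new attribute name -> function of a row dict
        self.hidden = set()  # attribute names to leave out of the result

    def include(self, name, formula):
        """Define a new attribute computed from the other attributes."""
        self.formulas[name] = formula

    def hide(self, name):
        """Hide an attribute (hypothetical counterpart to include)."""
        self.hidden.add(name)

    def apply(self, row):
        """Return a transformed copy of one row; the input is not modified."""
        out = dict(row)
        # Formulas run in the order they were included, so a later formula
        # may use an attribute created by an earlier one.
        for name, formula in self.formulas.items():
            out[name] = formula(out)
        return {k: v for k, v in out.items() if k not in self.hidden}
```

Because the recipe is kept separate from the data, the same Xform object can be applied to today's dataset and to future data alike, which is the point of recording transformations.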

Predicting

For many of the preprocessors, and for the eval command, you must state which attribute(s) you will be trying to predict. To do that, type predict <attname>. Then typing eval at any time will show you how useful your current xform is as a preprocessor.

Commands

To see the available commands type help.

To get more information on a given command, type help <commandname>.

The current documentation is INADEQUATE and SKIMPY. It will be improved on this web page!

Credits

                  PREPPER: An intelligent data preprocessing tool.
      
        Authors: Mary Soon Lee (Dates), Andrew Moore (Infrastructure),
                 Pat Prasangsit (Non-linear transforms),
                 Belinda Thom (Principal Component Analysis)
      
        Development made possible by a research fund gift from 3M
        corporation to the Auton group at Carnegie Mellon University.