CONTENTS
Overview
Preprocessors
Running the program
Predicting
Commands
Credits
WARNING: THIS PAGE IS PRELIMINARY. PROPER DOCUMENTATION AND DOWNLOADING INSTRUCTIONS ARE UNDER CONSTRUCTION.
PREPPER is a program written in C for human-computer collaboration on some common preprocessing operations. It is intended for anyone who is about to apply data mining, machine learning, neural networks, or ANOVA to their data and who wishes to preprocess it first.
You can use PREPPER for the following basic preprocessing tasks:
* Take a dataset and create new attributes ("data columns") that are transformations and combinations of other attributes.
* Turn symbolic attributes into numerical attributes using any of several transformations.
* Turn date attributes (e.g. 11-Jan-1978 or 1/15/97) into numbers using many different date functions.
* Remove unwanted attributes.
* Record the transformations you have made into an "xform" file so that you can make similar transformations to future data.
PREPPER can also discover useful transformations automatically for your prediction task. These automatic preprocessors include many of the classical preprocessing algorithms, and each one creates new transformations for you. If you wish, these transformations can also be recorded for future use.
At the time of writing, the available preprocessors are:
boxcox, select, pca, apply, symbol, adm, bestlin, crossterm, date
Type help <preprocessorname> to find out what a particular preprocessor does.
At the command line, type prep or prep <datafilename>. The datafile may be space-separated and/or comma-separated, and may mix numeric and symbolic data. If the data reader decides that the first line looks like a list of attribute names, it reads that line in as the names; otherwise it invents boring names for the attributes.
At any time while the program is running, there is a currently loaded dataset and a current xform (pronounced "transform"). The xform defines new attributes and hides attributes you no longer want.
You can add new attributes by typing include <newname> = <formula>.
You can see the current set of attributes and info about them by typing showatts.
You can see the current xform by typing showxf.
You can see your current data by typing list <attributenames>.
Many of the preprocessors, as well as the eval command, require you to state which attribute(s) you will be trying to predict. To do so, type predict <attname>. After that, typing eval at any time shows how useful your current xform is as a preprocessor.
To see the available commands type help.
To get more information on a given command, type help <commandname>.
The current documentation is INADEQUATE and SKIMPY. It will be improved on this web page!
PREPPER: An intelligent data preprocessing tool.
Authors: Mary Soon Lee (Dates), Andrew Moore (Infrastructure),
Pat Prasangsit (Non-linear transforms),
Belinda Thom (Principal Component Analysis)
Development made possible by a research fund gift from 3M
corporation to the Auton group at Carnegie Mellon University.