% User documentation for the EYE project.
%
% Begun 26 Feb 96
% Copyright Mary Soon Lee 1996
%------------------------------------------------------------------
\documentstyle[psfig]{article}

\title{EYE Documentation: Version 0.0}
\author{}

\begin{document} 
 
\maketitle 
 
\section{Introduction}

EYE is a tool to help people apply machine learning and statistical
techniques.  It enables people to use advanced techniques without
needing to master the underlying algorithms.  Instead EYE has a simple
interface where the user presents the raw data, and then applies EYE to
perform the desired analysis.  Functions include:
\begin{itemize}
\item {\bf BlackBox}: searches for a model that accurately explains the data.
\item {\bf Optimize}: finds a set of inputs that will optimize a given
  criterion, for instance to maximize the sum of the outputs.
\item {\bf Predict}: predicts future behavior from past data.
\end{itemize}
See Section~\ref{compendium} for a compendium of EYE functions.

This document explains how to use EYE.  If you are in a hurry to begin,
you need only read section~\ref{example} and section~\ref{starting}.
Later sections describe the online help facility, additional user
interface tools, and the advanced interface to EYE.

To illustrate the use of EYE, we consider the example of a gardener
trying to grow prize-winning flowers.

\subsection{Tutorial Example: The Gardener}
\label{example}

Suppose a gardener with an interest in machine learning wants to grow
prize-winning flowers.  She's kept records of her past attempts: what
fertilizers she used, how much she watered the seedlings, the
temperature of the greenhouse, what height the flowers grew to, how
brightly colored they were.

The gardener decides to use EYE to help her.  She has several new plant
regimens in mind, and wants EYE to predict how well each one will
perform.  She's also curious to see what regimen EYE itself will
recommend if she asks it to maximize the brightness of the flowers,
subject to the constraint that the flowers must be at least thirty
centimeters in height.

\section{Getting Started: How to Get EYE Running}
\label{starting}

This section provides all the information you need to start using EYE.

EYE can run under either Windows 95 or Windows NT on a PC.  To start it,
bring up an MS-DOS Command Prompt window, move to the directory where
you saved the EYE executable (by using the cd command), and type EYE.
This will bring up the EYE window with the initial welcome screen.

The simplest way to use EYE is via the GMBL menu\footnote{For the
curious, GMBL stands for General Memory Based Learning, the machine
learning approach that underpins the EYE code.} on the main menu bar.
Select the GMBL menu with the left mouse button, and then select the
first menu item, ``Run GMBL.''  This brings up the following dialog
box:

\centerline{\psfig{file=simpledialog.ps,height=3in}}

Suppose the gardener introduced in section~\ref{example} wants to see
how the flower-height depends on the various factors in the plant
regimen (the quantity of green-grow fertilizer, the number of
mineral-drops, the amount of water, and the temperature of the
greenhouse).  To find out, first type garden.mbl into the datafile
slot of the dialog box to tell EYE to use the gardening data.  Now
select {\bf graph} from the main listbox by clicking it with the left
mouse button.  The dialog box should now look like this:

\centerline{\psfig{file=simpledialog2.ps,height=3in}}

Press the RUN button.  The cursor changes to a black eye while EYE
analyzes the data, and then EYE displays four graphs, showing how the
flower-height varies with each of the four factors in turn, while the
other factors are held constant.  Notice that the bottom-right graph,
corresponding to the effect of temperature, is very close to a flat
line.  This shows that the flower-height is hardly affected by the
temperature---at least for the regimens the gardener has tried in the
past.

To run EYE again, select ``Run GMBL'' from the GMBL menu as before.
The same dialog box will appear, with the datafile already filled in
as garden.mbl.  Perhaps this time the gardener, being an intrepid
soul, wants to see if EYE can find a model that explains the data.  To
follow in her footsteps, select {\bf BlackBox} from the listbox and
then press the RUN button.

The black eye appears, showing that EYE is at work, and results start
scrolling down the screen.  EYE is busy searching for a good model for
the data.  {\bf BlackBox} performs this search without any prompting
from the user.  It tries out function approximators such as nearest
neighbor, kernel regression, and attribute subsets---autonomously
tuning their parameters and deciding which model to test next.

After a few seconds, the black eye disappears, and the text stops
scrolling.  You can now examine EYE's report on the {\bf BlackBox}
search.  The overall evaluation should be visible at the bottom of the
scrollable window.  It should look something like this\footnote{Because
EYE uses random numbers to make decisions such as which data should be
used in the testset, the precise results will vary from one run to the
next.}:

\begin{verbatim}
4.  Evaluation.

        Now, if we simply predicted the global average, 
        the mean-abs testset error would be 5.67.

        The best thing we've found so far in the
        searches reduces that by 94%.
\end{verbatim}

This tells us that EYE has found a model for the data whose average
prediction error is only six percent of that for the global average
model.
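
As a quick check of the arithmetic, using only the figures from the
sample report above (your own run will show different numbers), a 94\%
reduction from a baseline error of 5.67 leaves
\[
  (1 - 0.94) \times 5.67 \approx 0.34 ,
\]
so the best model found has a mean absolute testset error of roughly
0.34, measured in the same units as the output variable.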

You have now learned almost all that you need to know to start
applying EYE to your own data.  To run EYE, select ``Run GMBL,'' type
in the datafile, select the function you want, and press the RUN
button.  There is only one more thing you need to know: how to get EYE
to use your own data.

\subsection{Datafiles}

EYE expects datafiles to be arranged with one datapoint per line in
the file.  Each datapoint consists of a sequence of floating point
numbers, specifying the values of each of the variables for that
datapoint.  If you wish to include comments in your datafiles, you can
do so by starting each line of comment with the ``\%'' character.
For instance, here is part of the garden.mbl datafile:
\begin{verbatim}
% GreenG MinDrop Water Temp  Height   Brightness
    2      2      2     15   11.9      2
    2      2      2     20   12.1      2
    2      2      2     25   11.5      2
    2      2      4     15   27.9      2
\end{verbatim}
By default, EYE assumes that the rightmost column of numbers
represents the output value, and that all the other variables are
inputs.  To find out how to specify other formats, see
section~\ref{format} (in brief: you need to select the advanced option from
the ``Run GMBL'' dialog box, and then edit the format slot in the
advanced dialog box).
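
If you generate or check datafiles with a script of your own, the
layout above is easy to read back.  The following Python sketch is
purely illustrative and is not part of EYE: the function name {\tt
parse\_datafile} is hypothetical, and it assumes that values are
separated by whitespace (as in the sample above), that blank lines can
be skipped, and that the default rightmost-column-is-output convention
applies.
\begin{verbatim}
def parse_datafile(text):
    """Return a list of (inputs, output) pairs, one per datapoint."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines (an assumption) and '%' comment lines.
        if not line or line.startswith("%"):
            continue
        values = [float(field) for field in line.split()]
        # Default convention: the rightmost column is the output.
        points.append((values[:-1], values[-1]))
    return points


sample = """
% GreenG MinDrop Water Temp  Height   Brightness
    2      2      2     15   11.9      2
    2      2      4     15   27.9      2
"""
for inputs, output in parse_datafile(sample):
    print(inputs, "->", output)
\end{verbatim}
Under this default the last column (Brightness) becomes the output and
Height is read as a fifth input; section~\ref{format} explains how to
tell EYE otherwise.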

You are now ready to try out EYE on your own data.  In doing so, you
may spot unfamiliar terms appearing on the screen, such as {\bf
GMString} or {\bf AutoRSM}.  The following section describes how to
get online help that will explain cryptic terms like these.  Later
sections describe such things as how to switch to the previous screen,
how to halt EYE midway through a computation, and how to use the
advanced interface to gain additional control.

\section{Getting Help}
\label{help}

The simplest way to use EYE's online help is via the Help menu on the
main menu bar.  To see the range of topics for which help is provided,
select ``Introductory Help'' from the Help menu.  This brings up a
list of the available help topics.  To get help on any of these, just
click on the corresponding word with the left mouse button.

Whenever you see underlined words, such as those on the introductory
help screen, you can click them with the left mouse button to get more
information.  Sometimes clicking a word that isn't underlined will
still produce help.  (EYE's output would look rather messy if it always
underlined every word for which help was available.)

Help is also available from the Help buttons on several of the dialog
boxes.

The next section describes additional user interface features---from
how to use the File menu, to how to change the colors used to display
EYE's output.  Section~\ref{advanced} describes more advanced features
of the interface, and section~\ref{compendium} is a compendium of all
the EYE functions.

\section{A Medley of Other GUI Features}

Section~\ref{starting} explained a simple way to run EYE, and
section~\ref{help} described how to use the online help.  This
section discusses other useful features of the interface.  

\subsection{The File Menu and How to Exit EYE}

If you select the File menu with the left mouse button, you will see
something like this:

\centerline{\psfig{file=filemenu.ps,height=2in}}

We will briefly describe each option in turn.
\begin{itemize}
\item{\bf Open} 

This lets you select a new datafile.  It brings up a window showing the
available files, and waits for you to choose one.  When you want to
switch datafiles you can either use Open, or you can bring up the
dialog box to run EYE (see section~\ref{starting}) and type the new
filename into the datafile slot.  As a shortcut, you can invoke Open by
holding down the Ctrl key while you press the letter 'O' (that's
what the enigmatic Ctrl+O means).

\item{\bf Save} 

Once you start using the advanced interface to EYE (see
section~\ref{advanced}) you may start modifying your datafile, perhaps
naming your variables or altering which ones are treated as outputs.
Select Save if you wish to save these changes.  As a shortcut, you can
invoke Save by holding down the Ctrl key while you press the
letter 'S'.

\item{\bf Save As} 

This lets you save your datafile under a new name, bringing up a window
that shows you the existing files.

\item{\bf 1 garden.mbl} 

EYE remembers the last four files that you looked at.  This lets you
select them directly instead of using Open.

\item{\bf Exit} 

Last but by no means least, Exit allows you to quit EYE.
\end{itemize}

\subsection{Cursors and Pictures}

EYE provides a few visual cues to show what it is doing.  Whenever it
is busy computing, the cursor will change to a black eye.  The cursor
returns to the standard white arrow when EYE has finished computing.

When EYE expects to be engaged in a particularly long computation, it
also draws a black box\footnote{The choice of a black box is a tip of
the hat to the {\bf BlackBox} function of EYE, which is designed to
autonomously search for a good model for the data.} that bounces in the
left-hand side of the window.  The black box disappears when EYE
finishes working.

\subsection{Halting EYE}

If you wish to halt EYE midway through a computation, simply press the
letter 'Q'.  Within a second or two a dialog box will pop up and give
you the option of halting.  

\subsection{Where Did the Last Screen Go?}

EYE only keeps the latest set of output in its scrolling window.
Anything you saw earlier---perhaps some results, perhaps help on a
particular topic---disappears.  But if you want to inspect earlier
output, it's easy to do.  Simply select the ``Previous Screen'' option
from the GMBL menu.  You can select this repeatedly to look at
increasingly old output.

\subsection{Colors and Fonts}

To change the default colors and text size of EYE's output, bring up
the GMBL menu and pick ``Set Properties.''  This brings up a dialog box
that lets you specify the number of characters per line, the foreground
color, and the height of the text.

Note that the foreground color is the color used for plain text; if you
change it, other colors, such as the background color of the screen,
may change as well.  Later versions of EYE will allow you to specify
these colors directly.

\section{An Advanced Interface to EYE}
\label{advanced}

As well as the simple dialog box to run EYE (obtained by choosing ``Run
GMBL'' from the GMBL menu), there is also an advanced dialog box.  This
provides the user with more control of EYE's operations.  This section
describes how to use the advanced dialog box.

There are two ways to invoke the advanced dialog box.  You can bring it
up by clicking the Advanced button in the simple ``Run GMBL'' dialog
box.  Or you can simply click the right mouse button in the main EYE
window.  Try either of these methods and you should see the following:

\centerline{\psfig{file=advanced.ps,height=3.6in}}

The datafile and action fields should be familiar to you from the
simple dialog box (the action simply being the task you wish EYE to
execute when it runs, such as {\bf BlackBox}).  And the Run, Help, and
Cancel buttons each have the obvious effect.  But there are quite a
few new parameters.  You can edit these parameters directly by typing
a new value into the dialog box, or indirectly by using the Edit
button.

We now explain each of the remaining parameters in turn, and then
discuss the Edit and Inspect buttons.  Section~\ref{compendium} goes
on to describe the EYE functions (such as {\bf BlackBox} and {\bf
Predict}).

\subsection{Use Classification/Use Regression}

By default EYE uses regression and searches for the best general model
for the data, just as we have seen with the example of the gardening data.
Sometimes, however, the data falls into a special category: it
represents a classification problem.  Here each datapoint falls into
one of a finite number of classes, and the goal is to be able to
predict which class a new datapoint will belong to.

For instance, suppose our friend the gardener had carried out
experiments on growing hybrids.  Perhaps the color of the flowers on
the hybrid plants varied: some had yellow flowers, some had orange
flowers, and some had red flowers.  Now the gardener would like to
predict what color flowers will result from particular hybrid
experiments.  Each experiment produces a result belonging to one of a
finite number of classes: yellow, orange, or red.  We represent this
by assigning one output variable to each class, and setting that
output variable to be 1 if the result belongs to that class, and 0
otherwise.  

The user can switch on the {\it Use classification} mode if their data
conforms to a classification problem (i.e. for each datapoint there is
a single output that has the value 1.0---corresponding to that
datapoint's class---and all the other outputs are 0.0).  EYE will then
constrain its own predictions and models so that they also conform to
the classification mode.
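
To make the encoding concrete, here is a small Python sketch of the
one-output-per-class representation described above.  It is an
illustration only, not EYE code: the names {\tt encode\_class} and
{\tt is\_classification\_row} are hypothetical, and the order of the
classes is an arbitrary choice.
\begin{verbatim}
CLASSES = ["yellow", "orange", "red"]


def encode_class(label, classes=CLASSES):
    """One output per class: 1.0 for the matching class, else 0.0."""
    if label not in classes:
        raise ValueError("unknown class: " + label)
    return [1.0 if label == c else 0.0 for c in classes]


def is_classification_row(outputs):
    """True if exactly one output is 1.0 and all others are 0.0."""
    return sorted(outputs) == [0.0] * (len(outputs) - 1) + [1.0]


print(encode_class("orange"))                    # [0.0, 1.0, 0.0]
print(is_classification_row([0.0, 1.0, 0.0]))    # True
\end{verbatim}
A check like {\tt is\_classification\_row} is one way to confirm that
every datapoint in a file fits the classification pattern before
switching the mode on.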

\subsection{Format}
\label{format}

This parameter shows the current input/output status of each of the
data columns in a datafile.  Recall the sample of the garden.mbl
datafile that we showed earlier:
\begin{verbatim}
% GreenG MinDrop Water Temp  Height   Brightness
    2      2      2     15   11.9      2
    2      2      2     20   12.1      2
    2      2      2     25   11.5      2
    2      2      4     15   27.9      2
\end{verbatim}
Here the first four data columns correspond to input variables
(factors in the plant regimen) and the last two data columns represent
output variables (the flower-height and color-brightness that resulted
from the regimen).  The {\it Format} string for the garden.mbl
datafile is thus ``iiiioo''.  In general, the nth character in a {\it
Format} string is an 'i' if and only if the nth data column is being
treated as an input, an 'o' if the column is being treated as an
output, and a '-' if it is being ignored.
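
The rule can be sketched in a few lines of Python.  This is an
illustration of how a {\it Format} string partitions one row of data,
not EYE's own code; the function name {\tt split\_by\_format} is
hypothetical.
\begin{verbatim}
def split_by_format(row, fmt):
    """Split one data row into (inputs, outputs) using a Format string."""
    if len(row) != len(fmt):
        raise ValueError("Format string and row lengths differ")
    inputs, outputs = [], []
    for value, code in zip(row, fmt):
        if code == "i":
            inputs.append(value)
        elif code == "o":
            outputs.append(value)
        elif code != "-":        # '-' means the column is ignored
            raise ValueError("unexpected Format character: " + code)
    return inputs, outputs


row = [2, 2, 4, 15, 27.9, 2]
print(split_by_format(row, "iiiioo"))   # ([2, 2, 4, 15], [27.9, 2])
print(split_by_format(row, "ii-i-o"))   # ([2, 2, 15], [2])
\end{verbatim}
The second call shows the effect of ignoring the Water and Height
columns and keeping only Brightness as an output.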

By default, EYE assumes that the rightmost column of numbers in a
datafile represents the output value, and that all the other columns
correspond to input variables.  You can edit the {\it Format} string
if this default assumption is incorrect.  

If you wish to permanently record a non-default format for the current
datafile, first edit the {\it Format} string, and then save the
datafile (using the File menu).  Whenever EYE opens that datafile in
the future, it will read in the {\it Format} that you recorded.

\subsection{Restrict} 

Not yet available: later versions of EYE will let the user control
this parameter.

\subsection{Verbosity} 

Not yet available: later versions of EYE will let the user control
this parameter.

\subsection{Blackbox test} 

The proportion of the data that {\bf BlackBox} reserves for use in a
test-set (to check against overfitting).  This parameter should be
between 0.0 and 1.0.  See section~\ref{blackbox} for more information
on {\bf BlackBox}.
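
As a rough picture of what this parameter controls, the following
Python sketch randomly sets aside a given proportion of the datapoints.
It is an illustration under stated assumptions rather than EYE's actual
procedure: the name {\tt split\_train\_test}, the {\tt seed} argument,
and the rounding of the test-set size are all hypothetical.
\begin{verbatim}
import random


def split_train_test(points, test_fraction, seed=None):
    """Randomly reserve about test_fraction of the points for testing."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0.0 and 1.0")
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    # The first n_test shuffled points form the test set.
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(10), 0.3, seed=0)
print(len(train), len(test))   # 7 3
\end{verbatim}
A larger proportion gives a more reliable check against overfitting
but leaves fewer datapoints for building the model.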

\subsection{Blackbox seconds}

The number of seconds for which {\bf BlackBox} will run before
producing a report on its progress.  To halt {\bf BlackBox} before
this time is up, press the letter 'Q'.  See section~\ref{blackbox}
for more information on {\bf BlackBox}.

\subsection{No. crossval}

The number of leave-one-out samples to use during cross-validation
(cross-validation is used by both {\bf BlackBox} and {\bf Search}).
Suppose this parameter is set to N; then instead of finding the mean
leave-one-out error of {\it all} points in the dataset,
cross-validation will find the mean leave-one-out error of the N most
recent points.
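
The following Python sketch illustrates the idea.  It is not EYE's
implementation: it uses a plain one-nearest-neighbor predictor as a
stand-in model, it assumes that the ``most recent'' points are the last
ones in the datafile, and the names {\tt nearest\_neighbor\_predict}
and {\tt loo\_error} are hypothetical.
\begin{verbatim}
def nearest_neighbor_predict(train, query):
    """Return the output of the training point closest to the query."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, output = min(train, key=lambda p: sq_dist(p[0], query))
    return output


def loo_error(points, n_crossval):
    """Mean abs. leave-one-out error over the last n_crossval points."""
    if n_crossval < 1 or len(points) < 2:
        raise ValueError("need n_crossval >= 1 and at least two points")
    start = max(0, len(points) - n_crossval)
    errors = []
    for i in range(start, len(points)):
        held_inputs, held_output = points[i]
        rest = points[:i] + points[i + 1:]   # all points except point i
        prediction = nearest_neighbor_predict(rest, held_inputs)
        errors.append(abs(prediction - held_output))
    return sum(errors) / len(errors)


# Four rows of garden.mbl: (GreenG, MinDrop, Water, Temp) -> Height
data = [([2, 2, 2, 15], 11.9), ([2, 2, 2, 20], 12.1),
        ([2, 2, 2, 25], 11.5), ([2, 2, 4, 15], 27.9)]
print(loo_error(data, 2))     # error over the last two rows only
print(loo_error(data, 4))     # ordinary leave-one-out over all rows
\end{verbatim}
Setting N below the size of the dataset trades some accuracy in the
error estimate for a shorter running time.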

\subsection{Max. no. attributes}

Not yet available: later versions of EYE will let the user control
this parameter.

\subsection{Query point}

The {\it query point} is a vector specifying a point in input space.
The nth number specifies the value of the nth input variable.  The
{\it query point} is used by several EYE functions:

\begin{itemize}
\item {\bf Analysis} holds the values of all the inputs (other than the
one currently being graphed) at their values in the {\it query point}.

\item {\bf Graph} holds the values of all the inputs (other than the
one currently being graphed) at their values in the {\it query point}.

\item {\bf Predict} makes its prediction about the current {\it query
point}.  See section~\ref{predict}.
\end{itemize}
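
To picture what holding the other inputs fixed means, the Python sketch
below builds the series of points at which a model would be evaluated
for one such graph.  The function name {\tt sweep\_input}, its
arguments, and the evenly spaced grid are assumptions made for
illustration; EYE's own choice of plotting points is not documented
here.
\begin{verbatim}
def sweep_input(query_point, index, low, high, steps):
    """Vary one input from low to high; keep the others fixed."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for k in range(steps):
        point = list(query_point)      # copy, so the original is kept
        point[index] = low + (high - low) * k / (steps - 1)
        points.append(point)
    return points


# Vary the fourth input (temperature) around a sample query point.
for p in sweep_input([2.1, 5, 6, 20], index=3, low=15, high=25, steps=3):
    print(p)
\end{verbatim}
Each generated point differs from the {\it query point} only in the
one input being graphed.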

The initial value assigned to the {\it query point} when a new
datafile is opened is the midpoint of the range of inputs, i.e. the
nth number in the {\it query point} is midway between the lowest and
highest values taken by the nth input variable.
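
For example, the initial value could be computed as in the Python
sketch below.  The function name {\tt default\_query\_point} is
hypothetical, and the sketch uses only the four sample rows of
garden.mbl shown earlier, so the result is not the value EYE would
report for the full datafile.
\begin{verbatim}
def default_query_point(input_rows):
    """Midpoint of the observed range of each input variable."""
    columns = list(zip(*input_rows))     # transpose rows into columns
    return [(min(col) + max(col)) / 2.0 for col in columns]


rows = [[2, 2, 2, 15], [2, 2, 2, 20], [2, 2, 2, 25], [2, 2, 4, 15]]
print(default_query_point(rows))   # [2.0, 2.0, 3.0, 20.0]
\end{verbatim}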

{\bf AutoRSM} and {\bf Optimize} both set the {\it query point} (see
section~\ref{autorsm} and section~\ref{optimize} for details).

\subsection{Testfile}

Not yet available: later versions of EYE will let the user control
this parameter.

\subsection{GMString}
\label{gmstring}
% What is the character for more than 9 nearest neighbors?
% !!! More details needed
% !!! What is the bit between the two semicolons?

{\it GMStrings} are inscrutable entities that encapsulate a
description of a function approximator.  Function approximators lie at
the heart of EYE.  For instance {\bf BlackBox} hunts for the function
approximator that most accurately models the data, and when it ceases
running {\it GMString} will be set to a representation of the best
function approximator it has found.  

The following is an example of a {\it GMString}:
\begin{verbatim}
    L24:93009
\end{verbatim}
We now describe how to interpret this enigmatic object.

The first character of the {\it GMString} specifies the type of local
model to use during regression.  The current version of EYE supports
five types of local model: 
\begin{itemize}
\item 'A': local averaging (kernel regression).
\item 'L': locally linear regression.
\item 'C': part way between locally linear and locally quadratic
regression: this includes a term containing the sum of the squares
of all the inputs in addition to the linear terms.
\item 'E': part way between locally linear and locally quadratic
regression: this includes terms for the squares of each input, but
does not contain any cross-terms (the product of two or more inputs).
\item 'Q': locally quadratic regression.
\end{itemize}

The second character of the {\it GMString} specifies how much smoothing
the function approximator uses.  This ranges from 1 (a very local
model with almost no smoothing, where only extremely nearby data is
considered) to 9 (a fully global model).

The third character of the {\it GMString} specifies how many nearest
neighbors are guaranteed to be included in the local regression.  If this is,
say, three, then the three nearest neighbors of a {\it query point}
will always be fully weighted when making predictions, even if they
aren't particularly close to the {\it query point}.

Thus the {\it GMString} L24:93009 corresponds to a function
approximator that uses locally linear regression, with little
smoothing, and that always weights the four nearest neighbors fully.

The digits after the colon in a {\it GMString} describe how much
each input should be weighted.  If the nth digit after the colon
is a zero, then the nth input variable is ignored.  If it is a nine,
then the nth input variable is fully weighted.  Between nine and zero,
the weighting halves each time the number is reduced by one.  Thus if
the nth digit is a seven, then the nth input variable will only have a
quarter of the full weighting.  In the {\it GMString} L24:93009, the
third and fourth inputs are ignored, the second is weakly weighted,
and inputs one and five are fully weighted.

N.B. You may also see {\it GMStrings} with curly brackets after the
colon, such as E40:\{9\}.  The curly brackets just serve as
abbreviations: \{9\} means that all the inputs should have weight 9.
%!!! More complicated than that.  See g/xambl/facode.h
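
The decoding rules above can be summarized in a short Python sketch.
It is illustrative only and covers just the simple form described in
this section (three leading characters, a colon, and one digit per
input); it deliberately rejects the curly-bracket abbreviation, whose
full rules are not given here.  The names {\tt parse\_gmstring} and
{\tt input\_weight} are hypothetical.
\begin{verbatim}
LOCAL_MODELS = {
    "A": "local averaging (kernel regression)",
    "L": "locally linear regression",
    "C": "linear plus a sum-of-squares term",
    "E": "linear plus per-input square terms",
    "Q": "locally quadratic regression",
}


def input_weight(digit):
    """9 is full weight, each step down halves it, 0 means ignored."""
    return 0.0 if digit == 0 else 2.0 ** (digit - 9)


def parse_gmstring(gmstring):
    """Decode a simple GMString such as 'L24:93009' into its parts."""
    head, _, tail = gmstring.partition(":")
    if len(head) != 3 or head[0] not in LOCAL_MODELS:
        raise ValueError("unsupported GMString: " + gmstring)
    if not (head[1:].isdigit() and tail.isdigit()):
        raise ValueError("unsupported GMString: " + gmstring)
    return {
        "local_model": LOCAL_MODELS[head[0]],
        "smoothing": int(head[1]),
        "neighbors": int(head[2]),
        "weights": [input_weight(int(d)) for d in tail],
    }


print(parse_gmstring("L24:93009"))
\end{verbatim}
For L24:93009 the weights come out as 1, 1/64, 0, 0 and 1, which
matches the description above: the second input is weakly weighted and
the third and fourth are ignored.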

Note that the EYE function {\bf Predict} makes its prediction in
accordance with the current {\it GMString}. See section~\ref{predict}.

\subsection{Edit}

The {\it Edit} button brings up a list of the objects that you are
currently allowed to edit.  If you click on one of the objects, EYE
will bring up a window to help you edit it.  

For example, if you have opened the datafile garden.mbl (which should
therefore be the name displayed in the {\it datafile} field of the
advanced dialog box), then the list of objects you can edit should
include an item called {\it Names}.  If you select this, EYE will
bring up the following dialog box:

\centerline{\psfig{file=edit.ps,height=3.5in}}

This lets you edit the names of the variables for the garden.mbl
datafile, and also lets you edit the ranges of those variables.  The
tall list on the left-hand side of the dialog contains the items that
you can edit.  A brief explanation of the currently selected item (in
this case Green-Grow-name) is displayed in the large box to the right.
The item's current value is shown in the smaller box below this; to
change the value, simply type your desired value into the current
value slot---for instance you might decide to shorten Green-Grow's
name to just Green.

By default EYE gives very dull names to the variables represented by
the data columns in the datafile, calling the variable for the first
data column {\it attribute0}, that for the second data column {\it
attribute1}, and so forth.  Often you may want to select the Edit
Names option after you open a datafile for the first time, to assign
more descriptive names to your variables.  If you then save the
datafile, EYE will always remember your names in the future.

All edit dialog boxes have the same basic form.  You select the item
you wish to change, edit its value by typing into the {\it current
value} slot of the dialog, and then select the next item you wish to
change.  When you have made all the changes you wish, simply press the
Done button.

\subsection{Inspect}

The {\it Inspect} button brings up a list of the objects that you are
currently allowed to inspect.  If you click on one of the objects, EYE
will display detailed information about that object, explaining what
the object is and showing its current value.

\section{A Compendium of EYE Functions} 
\label{compendium}

In this section we provide a compendium of the EYE functions.  Many of
these functions use additional parameters, which are italicized for
clarity throughout this section.  If you wish to adjust these
parameters, you can do so via the advanced dialog box described in
section~\ref{advanced}.  That section also briefly explains the meaning
of each of the parameters.

\subsection{BlackBox}
\label{blackbox}

{\bf BlackBox} is one of the core components of EYE.  It aggressively
hunts for the model that best explains the data\footnote{To be more
exact: it searches for the function approximator that will give the
best predictive accuracy when used on new data drawn from the same
distribution as the data in the current training and test sets.}.  

{\bf BlackBox} searches over a wide variety of models.  As well as
considering different kinds of function approximator, such as nearest
neighbor and kernel regression, it also searches for the best attribute
subsets (determining whether any input variables can be ignored, and
what relative weights should be given to the remaining inputs).  Each
of the models typically has several parameters that need to be tuned,
such as distance metric parameters and smoothing parameters.  {\bf
BlackBox} autonomously searches for the optimal values of these
parameters.  It uses multiple levels of cross-validation to police
itself against overfitting.

Because searching over all models may take a considerable time, the
user can set the {\it Blackbox seconds} parameter to impose an upper
limit on the execution time.  When the time limit is reached, {\bf
BlackBox} stops running and produces a summary of its findings.

If you have run {\bf BlackBox} earlier on---and if you are still using
the same datafile and parameter settings---then a new call to {\bf
BlackBox} will resume from where it last stopped, rather than
duplicating earlier work.

{\bf BlackBox} uses the following parameters: {\it
Classification/Regression, Blackbox seconds, Blackbox test,
No.~crossval, Max.~no.~attributes, Testfile}.

{\bf BlackBox} sets the {\it GMString} parameter to the best function
approximator that it has found for modeling the data.

\subsection{Predict}
\label{predict}

This option predicts the outputs for a given setting of the input
variables.  The value of the input variables to use is determined from
the {\it query point}.  For example, our friend the gardener might want
to see the predicted flower-height and flower-brightness for the
following regimen:
\begin{verbatim}
  Amount of Green-Grow        2.1 
  Amount of Mineral-Drops     5   
  Amount of Water             6
  Temperature                 20
\end{verbatim}

To find EYE's prediction, the gardener must first tell EYE which
regimen she wants to investigate.  This is done by setting the {\it
query point} to the corresponding values.  In this case she would set
the query point to be:
\begin{verbatim}
  2.1  5   6  20
\end{verbatim}
She can then find EYE's prediction by selecting the {\bf Predict}
action, and pressing the RUN button.

Note that {\bf Predict} uses the current {\it GMString} to determine
which model is used when predicting the output.  See
section~\ref{gmstring} for information on {\it GMStrings}.

{\bf Predict} uses the following parameters: {\it
Classification/Regression, GMString, Query point}.
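
For readers curious about what a prediction from a local-averaging
model might look like, here is a generic Python sketch of kernel
regression at a query point.  It is a textbook-style Gaussian-kernel
weighted average, not EYE's implementation: the {\tt bandwidth}
argument, the use of the input weights as distance multipliers, and the
function name {\tt kernel\_regression} are all assumptions.
\begin{verbatim}
import math


def kernel_regression(points, query, weights, bandwidth):
    """Gaussian-kernel weighted average of outputs near the query."""
    total_weight = 0.0
    weighted_sum = 0.0
    for inputs, output in points:
        # Scaled squared distance; an input with weight 0 is ignored.
        sq_dist = sum((w * (x - q)) ** 2
                      for w, x, q in zip(weights, inputs, query))
        k = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        total_weight += k
        weighted_sum += k * output
    if total_weight == 0.0:
        raise ValueError("bandwidth too small for this query point")
    return weighted_sum / total_weight


# Four rows of garden.mbl: (GreenG, MinDrop, Water, Temp) -> Height
data = [([2, 2, 2, 15], 11.9), ([2, 2, 2, 20], 12.1),
        ([2, 2, 2, 25], 11.5), ([2, 2, 4, 15], 27.9)]
# Predict height at a query point, ignoring temperature (weight 0).
print(kernel_regression(data, [2, 2, 2, 20], [1, 1, 1, 0], 1.0))
\end{verbatim}
Datapoints close to the query point dominate the average, while
distant ones contribute very little; a larger {\tt bandwidth} gives a
smoother, more global prediction.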

\subsection{Set}
% !!!

\subsection{Graph}
% !!!

\subsection{3dGraph}
% !!!

\subsection{LOOHistogram}
% !!!

\subsection{LOOPredict}
% !!!

\subsection{Search}
% !!!

\subsection{IntelliPrinc}

Not yet available: later versions of EYE will include this function.

\subsection{Transform}

Not yet available: later versions of EYE will include this function.

\subsection{Analysis}
% !!!

\subsection{AutoRSM}
\label{autorsm}
% !!!

\subsection{Optimize}
\label{optimize}
% !!!

\section{Schenley Park Research, Inc.}
% !!!

\end{document}

