\documentstyle[12pt]{cmu-art}
\def\BOX#1{\fbox{\tt #1}}
\begin{document}

\title{15-681: Machine Learning}
\author{Homework 5: Neural networks and face recognition}
\date{Due: 8 November 1994}
\maketitle

\section{Introduction}

In this assignment, you will experiment with a neural network package
on three tasks: a ``you'' recognizer, a face recognizer, and a pose
recognizer.  For training and testing, you will use the face images
of the class which we acquired earlier in the semester.

You will not need to do significant amounts of coding for this assignment,
and you should not let the size of this document scare you, but training
your networks will take time.  It is recommended that you read the
assignment in its entirety first, and start early.

\section{The face images}

The image data can be found in {\tt /afs/cs/project/theo-8/faceimages/faces}.
This directory contains 20 subdirectories, one for each person who
volunteered for the photo shoot, named by userid.  Each of these directories
contains several versions of the face images.  As mentioned in class,
you are welcome to make copies of your face images for personal use
(it's the least we can do after giving you a nice 1500-watt sunburn).

You will be interested in the images with the following naming convention:

{\tt <userid>\_<pose>\_<expression>\_<eyes>\_<scale>.pgm}

\begin{itemize}

\item {\tt <userid>} is the user id of the person in the image, and this
field has 20 values: ap4c, as60, avrim, bthom, cprose, dbrown, dw3l, gary,
ho09, jkam, js, kw00, lgarrido, mitchell, nls, octav, ojuarez, scottd,
vg25, yq22.

\item {\tt <pose>} is the head position of the person, and this field
has 4 values: straight, left, right, up.

\item {\tt <expression>} is the facial expression of the person, and this
field has 4 values: neutral, happy, sad, angry.

\item {\tt <eyes>} is the eye state of the person, and this field has
2 values: open, closed.

\item {\tt <scale>} is the scale of the image, and this field has 3
values: 1, 2, and 4.  1 indicates a full-resolution image ($128$ columns
$\times$ $120$ rows); 2 indicates a half-resolution image ($64 \times 60$);
4 indicates a quarter-resolution image ($32 \times 30$).  For this
assignment, you will be using the quarter-resolution images for experiments,
to keep training time to a manageable level.

\end{itemize}

If you've been looking closely in the image directories, you may notice
that some images have a {\tt .bad} suffix rather than the {\tt .pgm}
suffix.  As it turned out, 47 of the 640 images had glitches due to
problems with the camera setup; these are the {\tt .bad} images.
Some people had more glitches than others, but everyone who got ``faced''
should have at least 27 good face images (out of the 32 variations
possible, discounting scale).

\section{Viewing the face images}

To view the images, you can use the program {\tt xv}.  This is available
as {\tt /usr/local/bin/xv} on Andrew machines, and
{\tt /usr/misc/.X11-others/bin/xv} on CS machines.  {\tt xv} handles
a variety of image formats, including the PGM format in which our
face images are stored.  While we won't go into detail about {\tt xv}
in this document, we will quickly describe the basics you need to know
to use {\tt xv}.

To start {\tt xv}, just specify one or more images on the command line,
like this:

{\tt xv /afs/cs/project/theo-8/faceimages/faces/js/js\_straight\_happy\_open\_4.pgm}

This will bring up an X window with your TA's face.  Clicking the right
button in the image window will toggle a control panel with a variety of
buttons.  The {\tt Dbl Size} button doubles the displayed size of the
image every time you click on it.  This will be useful for viewing
the quarter-resolution images, as you might imagine.

You can also obtain pixel values by holding down the left button while
moving the pointer in the image window.  A text bar will be displayed, showing
you the image coordinates and brightness value where the pointer is located.

To quit {\tt xv}, just click on the {\tt Quit} button or type {\tt q}
in one of the {\tt xv} windows.

\section{The neural network and image access code}

We're supplying C code for a three-layer fully-connected feedforward
neural network which uses the backpropagation algorithm to tune its weights.
To make life as easy as possible, we're also supplying you with an image
package for accessing the face images, as well as the top-level
program for training and recognition, as a skeleton for you to modify.

The code is located in {\tt /afs/cs/project/theo-8/faceimages/code}.  Copy
all of the files in this area to your homework directory, and type {\tt
make}.  When the compilation is done, you should have one executable
program: {\tt facetrain}.  Briefly, {\tt facetrain} takes lists of image
files as input, and uses these as training and test sets for a neural
network.  {\tt facetrain} can be used for training and/or recognition,
and it also has the capability to save networks to files.

The code has been compiled and tested successfully on CS-side Alphas,
DECstations, and Sun SPARC-2s, and Andrew-side DECstations and Sun SPARC-5s.
If you wish to use the code on some other platform, feel free, but be
aware that the code has only been tested on these platforms.  If you
use some other platform and run into problems, let the TA (Jeff Shufelt,
{\tt js@maps.cs.cmu.edu}) know.

Details of the routines, explanations of the source files, and
related information can be found in Section \ref{docs} of this handout.

\section{The Assignment}

\begin{enumerate}

\item Issue the following command in your home directory to obtain
the training and test set data for this assignment:

{\tt cp /afs/cs/project/theo-8/faceimages/trainset/\{all,straight\}\_t*.list .}

\item Implement a ``you'' recognizer; i.e., implement a neural net which,
when given an image as input, indicates whether the face in the image is
you, or not you. This requires only trivial modifications to the code
you have been given, which is currently a ``TA'' recognizer (userid {\tt js}).
(If you weren't photographed, then use someone other than {\tt js} to
complete this portion of the assignment.)

\item Train it with the default learning parameter settings (learning
rate 0.3, momentum 0.3) for 75 epochs, with the following command:

{\tt facetrain -n you.net -t straight\_train.list -1 straight\_test1.list}
\newline {\tt -2 straight\_test2.list -e 75}

{\tt facetrain}'s arguments are described in Section \ref{RUNFACE},
but a short description is in order here.  {\tt you.net} is the name of
the network file which will be saved when training is finished.
{\tt straight\_train.list}, {\tt straight\_test1.list}, and
{\tt straight\_test2.list} are text files which specify the training
set (70 examples) and two test sets (32 and 50 examples), respectively.

This command creates your network and trains it on a randomly chosen
sample of 70 of the 152 ``straight'' images, testing it on two disjoint
sets of 32 and 50 images drawn from the rest.  One way to think of this
strategy is that roughly $\frac{1}{3}$ of the images ({\tt
straight\_test2.list}) have been held out for a final test, while the
remaining $\frac{2}{3}$ are used in a train-and-test loop:
$\frac{2}{3}$ of these for training ({\tt straight\_train.list}) and
$\frac{1}{3}$ as the test that decides when to stop training ({\tt
straight\_test1.list}).

Report your train/test performance and error as a function of epochs.  If you
had stopped training when the performance on test1 leveled off,
what would the performance have been on test2?  And vice versa?
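One way to answer the stopping question is directly from the
epoch-by-epoch log: find the epoch at which test1 accuracy peaked, and
read off the test2 accuracy at that same epoch.  A self-contained
sketch (this helper is our own, not part of the supplied code; the
arrays stand in for the per-epoch {\tt t1perf}/{\tt t2perf} columns
that {\tt facetrain} prints):

```c
/* Given per-epoch accuracies on test1 and test2, return the test2
   accuracy at the epoch where test1 accuracy first peaked -- i.e.,
   the performance you would have gotten by stopping on test1. */
double test2_at_test1_peak(const double *t1perf, const double *t2perf,
                           int nepochs)
{
    int best = 0;
    for (int e = 1; e < nepochs; e++)
        if (t1perf[e] > t1perf[best])
            best = e;
    return t2perf[best];
}
```

Swapping the two arrays answers the ``vice versa'' version of the
question.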

\item In the previous experiment, why was performance so high on all
three sets (train, test1, and test2) after only one epoch of training?
(Hint: out of the 70 training examples, how many are positive?)

\item Implement a face recognizer; i.e., implement a neural net which,
when given an image as input, indicates who is in the image.  To do this, you
will need to implement a different output encoding (since you must
now be able to distinguish among 20 people).  Describe your output
encoding.  (Hint: leave learning rate and momentum at 0.3, and use 20
hidden units).
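One common choice here is a 1-of-20 encoding: one output unit per
person, with a high target on the unit for the person in the image and
low targets everywhere else.  A sketch of the target setup (this
helper is our own, not part of the supplied code; the 0.9/0.1 targets
rather than 1.0/0.0 are a conventional choice, since sigmoid units can
never quite reach 0 or 1):

```c
#define NPEOPLE 20

/* Fill a 1-of-20 target vector: the unit for person `who` (1..20)
   gets a high target, all others a low one.  The BPNN package indexes
   units from 1 to n, so target[0] is unused. */
void set_face_target(double *target, int who)
{
    for (int k = 1; k <= NPEOPLE; k++)
        target[k] = (k == who) ? 0.9 : 0.1;
}
```

In the homework, logic like this would live in {\tt load\_target()} in
{\tt imagenet.c}, with {\tt who} determined from the userid parsed out
of the image filename.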

\item As before, train the network, this time for 100 epochs:

{\tt facetrain -n face.net -t straight\_train.list -1 straight\_test1.list}
\newline {\tt -2 straight\_test2.list -e 100}

You might be wondering why you are training only on samples from a limited
distribution (the ``straight'' images).  The sole reason is training
time.  If you have access to a very fast machine (anything slower than
an Alpha will be too slow), then you are welcome to do these experiments
on the entire set (replace {\tt straight} with {\tt all} in the command
above).  Otherwise, stick to the ``straight'' images.

As before, report your train/test performance and error as a function of
epochs.  If you had stopped training when the performance on test1 leveled
off, what would the performance have been on test2?  And vice versa?

\item Implement a pose recognizer; i.e., implement a neural net which,
when given an image as input, indicates whether the person in the image
is looking straight ahead, up, to the left, or to the right.  You
will also need to implement a different output encoding for this task.
Describe your output encoding.  (Hint: leave learning rate and momentum at
0.3, and use 6 hidden units).
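Once you have more than one output unit, the classification decision
in {\tt evaluate\_performance()} can no longer be a simple threshold
on a single output; a common rule is winner-take-all, in which the
predicted class is the output unit with the largest activation.  A
sketch (this helper is our own, not part of the supplied code; units
are indexed from 1, as in the BPNN package):

```c
/* Return the index (1..n) of the output unit with the largest value.
   With a 1-of-4 pose encoding, this is the predicted pose. */
int winner(double *output, int n)
{
    int best = 1;
    for (int k = 2; k <= n; k++)
        if (output[k] > output[best])
            best = k;
    return best;
}
```

A prediction then counts as correct when the winning unit matches the
one that is high in the target vector.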

\item Train the network for 100 epochs, this time on samples drawn
from all of the images:

{\tt facetrain -n pose.net -t all\_train.list -1 all\_test1.list}
\newline {\tt -2 all\_test2.list -e 100}

Since the pose-recognizing network should have substantially fewer
weights to update than the face-recognizing network, even those of you
with slow machines can get in on the fun of using all of the images.
In this case, 260 examples are in the training set, 140 examples are in
test1, and 193 are in test2.

As before, report your train/test performance and error as a function
of epochs.  If you had stopped training when the performance on test1
leveled off, what would the performance have been on test2?  And vice
versa?

\item Finally, it's your turn to have some fun, now that you know
your way around {\tt facetrain}.  For either the face or
pose recognition problem, make some interesting modification and
evaluate the change in performance, using the ``straight'' or ``all''
train and test sets as appropriate.  Some possibilities for
experimentation (or choose one of your own invention):

\begin{itemize}

\item Change the input or output encodings to try to improve
generalization accuracy.

\item Vary the number of hidden units, the number of training examples,
the number of epochs, the momentum and learning rate, or whatever else you
want to try, with the goal of getting the greatest possible discrepancy
between train and test set accuracy (i.e., how badly can you make the
network overfit), and the smallest possible discrepancy (i.e., what is
the best performance you can achieve).

\item Use the output of the pose recognizer as input to the face
recognizer, and see how this affects performance.  To do this, you
will need to add a mechanism for saving the output units of the pose
recognizer and a mechanism for loading this data into the face
recognizer.

\item Use the image package and/or anything else you might have
available to try to understand what the network has actually learned
(Hint: each hidden unit has $32 \times 30$ input weights, the same size
as an image....)  Using this information, what do you think the network
is learning?

\end{itemize}

\end{enumerate}

\section{Documentation}
\label{docs}

The code for this assignment is broken into several modules:

\begin{itemize}
\item {\tt pgmimage.c}, {\tt pgmimage.h}: the image package.  Supports
read/write of PGM image files and pixel access/assignment.  Provides
an {\tt IMAGE} data structure, and an {\tt IMAGELIST} data structure
(an array of pointers to images; useful when handling many images). 
{\bf You will not need to modify any code in this module to complete
the assignment.}

\item {\tt backprop.c}, {\tt backprop.h}: the neural network package.
Supports three-layer fully-connected feedforward networks, using the
backpropagation algorithm for weight tuning.  Provides high level
routines for creating, training, and using networks.  {\bf You will not
need to modify any code in this module to complete the assignment.}

\item {\tt imagenet.c}: interface routines for loading images into
the input units of a network, and setting up target vectors for training.
You will need to modify the routine {\tt load\_target}, when
implementing the face recognizer and the pose recognizer, to set
up appropriate target vectors for the output encodings you choose.

\item {\tt facetrain.c}: the top-level program which uses all of the
modules above to implement a ``TA'' recognizer.  You will need to
modify this code to change network sizes and learning parameters,
both of which are trivial changes.  The performance evaluation routines
{\tt performance\_on\_imagelist()} and {\tt evaluate\_performance()} are
also in this module; you will need to modify these for your face
and pose recognizers.
\end{itemize}

Although you'll only need to modify code in {\tt imagenet.c} and
{\tt facetrain.c}, feel free to modify anything you want in any of
the files if it makes your life easier or if it allows you to do
a nifty experiment.

\subsection{Running {\tt facetrain}}
\label{RUNFACE}

{\tt facetrain} has several options which can be specified on the
command line.  This section briefly describes how each option
works.  A very short summary of this information can be obtained
by running {\tt facetrain} with no arguments.

\begin{description}
\item {\tt -n <network file>} - this option either loads an existing
network file, or creates a new one with the given name.  At the end
of training, the neural network will be saved to this file.

\item {\tt -e <number of epochs>} - this option specifies the number
of training epochs which will be run.  If this option is not
specified, the default is 100.

\item {\tt -s <seed>} - an integer which will be used as the
seed for the random number generator.  The default seed is 102194
(guess what day it was when I wrote this document).  This allows you to
reproduce experiments if necessary, by generating the same sequence of
random numbers.  It also allows you to try a different set of random
numbers by changing the seed.

\item {\tt -S <number of epochs between saves>} - this option specifies
the number of epochs between saves.  The default is 100, which means
that if you train for 100 epochs (also the default), the network is
only saved when training is completed.

\item {\tt -t <training image list>} - this option specifies a text
file which contains a list of image pathnames, one per line, that
will be used for training.  If this option is not specified, it
is assumed that no training will take place (zero training epochs), and
the network will simply be run on the test sets.  In this case,
the statistics for the training set will all be zeros.

\item {\tt -1 <test set 1 list>} - this option specifies a text
file which contains a list of image pathnames, one per line, that
will be used as a test set.  If this option is not specified,
the statistics for test set 1 will all be zeros.

\item {\tt -2 <test set 2 list>} - same as above, but for test set 2.
The idea behind having two test sets is that one can be used as part
of the train/test paradigm, in which training is stopped when performance
on the test set begins to degrade.  The other can then be used as a
``real'' test of the resulting network.

\end{description}

\subsection{Interpreting the output of {\tt facetrain}}

At the end of each epoch, {\tt facetrain} outputs a number of
performance measures, in the following format:

{\tt <epoch> <delta> <trainperf> <trainerr> <t1perf> <t1err>
<t2perf> <t2err>}

These values have the following meanings:

\begin{description}
\item {\tt epoch} is the number of the epoch just completed; it follows
that a value of 0 means that no training has yet been performed.

\item {\tt delta} is the sum of all $\delta$ values on the hidden and
output units as computed during backprop, over all training examples
for that epoch.

\item {\tt trainperf} is the percentage of examples in the training set
which were correctly classified.

\item {\tt trainerr} is the average, over all training examples,
of the error function $\frac{1}{2} \sum (t_{i} - o_{i})^{2}$, where
$t_{i}$ is the target value for output unit $i$ and $o_{i}$ is the actual
output value for that unit.

\item {\tt t1perf} is the percentage of examples in test set 1
which were correctly classified.

\item {\tt t1err} is the average, over all examples in test set 1,
of the error function described above.

\item {\tt t2perf} is the percentage of examples in test set 2
which were correctly classified.

\item {\tt t2err} is the average, over all examples in test set 2,
of the error function described above.
\end{description}
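The error columns above can be reproduced by hand from the target and
output vectors.  A self-contained sketch of the per-example error
(this helper is our own, not part of the supplied code; unit indices
start at 1, following the BPNN conventions):

```c
/* Squared error (1/2) * sum_i (t_i - o_i)^2 for one example.
   Units are indexed 1..n, matching the BPNN package. */
double example_error(double *target, double *output, int n)
{
    double err = 0.0;
    for (int i = 1; i <= n; i++) {
        double d = target[i] - output[i];
        err += 0.5 * d * d;
    }
    return err;
}
```

{\tt trainerr}, {\tt t1err}, and {\tt t2err} are then this value
averaged over all examples in the corresponding set.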

\subsection{Tips}

Although you do not have to modify the image or network packages,
you will need to know a little bit about the routines and data structures
in them, so that you can easily implement new output encodings for
your networks.  The following sections describe each of the packages
in a little more detail.  You can look at {\tt imagenet.c},
{\tt facetrain.c}, and {\tt facerec.c} to see how the routines are
actually used.  

In fact, it is probably a good idea to look over {\tt facetrain.c}
first, to see how the training process works.  You will notice
that {\tt load\_target()} from {\tt imagenet.c} is called to set
up the target vector for training.  You will also notice the
routines which evaluate performance and compute error statistics,
{\tt performance\_on\_imagelist()} and {\tt evaluate\_performance()}.
The first routine iterates through a set of images, computing the
average error on these images, and the second routine computes
the error and accuracy on a single image.

You will almost certainly not need to use all of the information
in the following sections, so don't feel like you need to know
everything the packages do.  You should view these sections
as reference guides for the packages, should you need information
on data structures and routines.

Another fun thing to do, if you didn't already try it in the last
question of the assignment, is to use the image package to view the
connection weights in graphical form; the package provides routines
for creating and writing images, should you want to play around with
visualizing your network weights.
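Since the package stores weights as {\tt double}s while pixels are
integers in $[0, 255]$, the main work in such a visualization is a
linear rescaling.  A sketch of that mapping (this helper is our own,
not part of the supplied code):

```c
/* Linearly map a weight in [wmin, wmax] to a pixel value in [0, 255].
   wmin and wmax would typically be the minimum and maximum over the
   weights being visualized, e.g. net->input_weights[i][j] over all
   input units i for one hidden unit j. */
int weight_to_pixel(double w, double wmin, double wmax)
{
    if (wmax <= wmin)            /* degenerate range: flat image */
        return 128;
    int p = (int)(255.0 * (w - wmin) / (wmax - wmin) + 0.5);
    if (p < 0) p = 0;
    if (p > 255) p = 255;
    return p;
}
```

One would then create an image with {\tt img\_creat()} matching the
$32 \times 30$ input grid, set each pixel from the corresponding input
weight with {\tt img\_setpixel()}, and write it out with
{\tt img\_write()} for viewing in {\tt xv}.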

Finally, the point of this assignment is for you to obtain first-hand
experience in working with neural networks; it is {\bf not} intended as an
exercise in C hacking.  An effort has been made to keep the image package
and neural network package as simple as possible.  If you need
clarifications about how the routines work, don't hesitate to ask.

\subsection{The neural network package}

As mentioned earlier, this package implements three-layer fully-connected
feedforward neural networks, using a backpropagation weight tuning
method.  We begin with a brief description of the data structure,
a {\tt BPNN} ({\tt B}ack{\tt P}rop{\tt N}eural{\tt N}et).

All unit values and weight values are stored as {\tt double}s in a
{\tt BPNN}.

Given a {\tt BPNN *net}, you can get the number of input, hidden,
and output units with {\tt net->input\_n}, {\tt net->hidden\_n},
and {\tt net->output\_n}, respectively.

Units are all indexed from $1$ to $n$,
where $n$ is the number of units in the layer.  To get the value
of the {\tt k}th unit in the input, hidden, or output layer, use
{\tt net->input\_units[k]}, {\tt net->hidden\_units[k]}, or
{\tt net->output\_units[k]}, respectively.

The target vector is assumed to have the same number of units as the
output layer, and it can be accessed via {\tt net->target}.  The
{\tt k}th target unit can be accessed by {\tt net->target[k]}.

To get the value of the weight connecting the {\tt i}th input unit
to the {\tt j}th hidden unit, use {\tt net->input\_weights[i][j]}.
To get the value of the weight connecting the {\tt j}th hidden unit
to the {\tt k}th output unit, use {\tt net->hidden\_weights[j][k]}.

The routines are as follows:

\begin{description}
\item {\tt void bpnn\_initialize(seed)\newline
int seed;}

This routine initializes the neural network package.  It should be
called before any other routines in the package are used.  Currently,
its sole purpose in life is to initialize the random number generator
with the input {\tt seed}.

\item {\tt BPNN *bpnn\_create(n\_in, n\_hidden, n\_out)\newline
int n\_in, n\_hidden, n\_out;}

Creates a new network with {\tt n\_in} input units, {\tt n\_hidden} hidden
units, and {\tt n\_out} output units.  All weights in the network
are randomly initialized to values in the range $[-1.0, 1.0]$.  Returns
a pointer to the network structure.  Returns {\tt NULL} if the routine
fails.

\item {\tt void bpnn\_free(net)\newline
BPNN *net;}

Takes a pointer to a network, and frees all memory associated with
the network.

\item {\tt void bpnn\_train(net, learning\_rate, momentum, erro, errh)\newline
BPNN *net;\newline double learning\_rate, momentum;\newline
double *erro, *errh;}

Given a pointer to a network, runs one pass of the backpropagation algorithm.
Assumes that the input units and target layer have been properly set up.
{\tt learning\_rate} and {\tt momentum} are assumed to be values between
$0.0$ and $1.0$.  {\tt erro} and {\tt errh} are pointers to doubles, which
are set to the sum of the $\delta$ error values on the output units
and hidden units, respectively.

\item {\tt void bpnn\_feedforward(net)\newline
BPNN *net;}

Given a pointer to a network, runs the network on its current input
values.

\item {\tt BPNN *bpnn\_read(filename)\newline
char *filename;}

Given a filename, allocates space for a network, initializes it with the
weights stored in the network file, and returns a pointer to this new
{\tt BPNN}.  Returns {\tt NULL} on failure.

\item {\tt void bpnn\_save(net, filename)\newline
BPNN *net;\newline char *filename;}

Given a pointer to a network and a filename, saves the network to that
file.

\end{description}

\subsection{The image package}

The image package provides a set of routines for manipulating PGM images.
An image is a rectangular grid of pixels; each pixel has an integer value
ranging from 0 to 255.  Images are indexed by rows and columns; row 0
is the top row of the image, column 0 is the left column of the image.

\begin{description}

\item {\tt IMAGE *img\_open(filename)\newline char *filename;}

Opens the image given by {\tt filename}, loads it into a new {\tt IMAGE}
data structure, and returns a pointer to this new structure.
Returns {\tt NULL} on failure.

\item {\tt IMAGE *img\_creat(filename, nrows, ncols)\newline
char *filename;\newline
int nrows, ncols;}

Creates an image in memory, with the given filename, of dimensions
{\tt nrows} $\times$ {\tt ncols}, and returns a pointer to this image.
All pixels are initialized to 0.  Returns {\tt NULL} on failure.

\item {\tt int ROWS(img)\newline IMAGE *img;}

Given a pointer to an image, returns the number of rows the image has.

\item {\tt int COLS(img)\newline IMAGE *img;}

Given a pointer to an image, returns the number of columns the image has.

\item {\tt char *NAME(img)\newline IMAGE *img;}

Given a pointer to an image, returns a pointer to its base filename
(i.e., if the full
filename is {\tt /usr/joe/stuff/foo.pgm}, a pointer to the string
{\tt foo.pgm} will be returned).

\item {\tt int img\_getpixel(img, row, col)\newline IMAGE *img;\newline
int row, col;}

Given a pointer to an image and row/column coordinates, this routine returns
the value of the pixel at those coordinates in the image.

\item {\tt void img\_setpixel(img, row, col, value)\newline
IMAGE *img;\newline int row, col, value;}

Given a pointer to an image and row/column coordinates, and an integer
{\tt value}
assumed to be in the range $[0, 255]$, this routine sets the pixel
at those coordinates in the image to the given value.

\item {\tt int img\_write(img, filename)\newline
IMAGE *img;\newline char *filename;}

Given a pointer to an image and a filename, writes the image to disk with
the given filename.  Returns 1 on success, 0 on failure.

\item {\tt void img\_free(img)\newline IMAGE *img;}

Given a pointer to an image, deallocates all of its associated memory.

\item {\tt IMAGELIST *imgl\_alloc()}

Returns a pointer to a new {\tt IMAGELIST} structure, which is really just
an array of pointers to images.  Given an {\tt IMAGELIST *il},
{\tt il->n} is the number of images in the list.  {\tt il->list[k]}
is the pointer to the {\tt k}th image in the list.

\item {\tt void imgl\_add(il, img)\newline
IMAGELIST *il;\newline IMAGE *img;}

Given a pointer to an imagelist and a pointer to an image, adds the image
at the end of the imagelist.

\item {\tt void imgl\_free(il)\newline IMAGELIST *il;}

Given a pointer to an imagelist, frees it.  Note that this does not
free any images to which the list points.

\item {\tt void imgl\_load\_images\_from\_textfile(il, filename)\newline
IMAGELIST *il;\newline char *filename;}

Takes a pointer to an imagelist and a filename.  {\tt filename} is
assumed to specify a file which is a list of pathnames of images,
one to a line.  Each image file in this list is loaded into memory
and added to the imagelist {\tt il}.

\end{description}

\end{document}
