.CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
.C $Id: fem.l,v 1.2 92/11/30 11:55:27 drew Exp $
.CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
.C
.CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
.C   Copyright 1990,1991,1992,1993 by The University of Toronto,
.C                    Toronto, Ontario, Canada.
.C 
.C                       All Rights Reserved
.C 
.C Permission to use, copy, modify, distribute, and sell this software
.C and its  documentation for any  purpose  is hereby granted  without
.C fee, provided that the above copyright notice appears in all copies
.C and that both  the copyright   notice  and  this permission  notice
.C appear in supporting documentation, and that the name of University
.C of Toronto not be  used in  advertising  or publicity pertaining to
.C distribution   of the  software  without   specific,  written prior
.C permission.  University of  Toronto makes no representations  about
.C the suitability  of this  software for  any purpose. It is provided
.C "as is" without express or implied warranty.
.C
.C UNIVERSITY OF TORONTO DISCLAIMS  ALL WARRANTIES WITH REGARD TO THIS
.C SOFTWARE,  INCLUDING  ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
.C FITNESS, IN NO EVENT SHALL UNIVERSITY OF TORONTO  BE LIABLE FOR ANY
.C SPECIAL,  INDIRECT    OR  CONSEQUENTIAL DAMAGES    OR  ANY  DAMAGES
.C WHATSOEVER RESULTING FROM LOSS OF  USE, DATA OR PROFITS, WHETHER IN
.C AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
.C OUT  OF  OR IN  CONNECTION WITH  THE USE OR  PERFORMANCE   OF  THIS
.C SOFTWARE.
.CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
.C
.de BF          \" boldface a word
\fI\\$1\fP
..
.de FT          \" start a field table with title
.PP
.nf
.ta 2.5i 4.5i
.ce 1
\\$1
..
.de CF          \" center a function on a line
.sp
.ce 1
\\$1
.sp
..
.TH fem LOCAL "September 1991" "Xerion" "Xerion Manual"
.SH NAME
fem - Xerion Free Energy Manipulation Network module

.SH SYNOPSIS
fem [ commands ]
.br
run  [ run options ] fem [ commands ]

.SH DESCRIPTION 
\fIfem\fP is a version of the Free Energy Manipulation algorithm,
built using the Xerion Neural Network Simulator. As such, it
understands all of the commands  that  are  built  into  the
Xerion simulator.

The FEM algorithm (introduced by Galland and Hinton, 1990) is designed for
mean field networks that have NO output units and perform a binary
classification task by adopting low free energy for positive instances and
high free energy for negative instances.

The idea is to explicitly manipulate the free energy of a network containing
hidden units -- push it down for "positive" examples and push it up for
"negative" examples.  If there are no connections among the hidden units,
there is no need for an inner loop settling process.  Each hidden unit is only
affected by the input units, so when these are clamped, the probability that
the hidden unit is on can be computed separately for each hidden unit.  This
results in one-step network relaxations.  Once the network is at equilibrium,
the FEM algorithm makes use of the very simple way in which the free energy
changes as the weight between two units changes.

The FEM algorithm can also be applied to networks in which the hidden units
are interconnected, but then the network takes longer to settle to
equilibrium, and it may settle to a non-global free energy minimum.

For some background and references on the Mean Field Theory and Free Energy
Manipulation training and relaxation algorithms, see the LaTeX documentation
in the $XERIONDIR/doc/*.tex files.


.SH ERROR AND GRADIENT UPDATE ALGORITHMS

Below is a greatly reduced and simplified pseudocode representation of
the routines that calculate the error of the network on an example
set, and calculate the derivatives of the connection weights with
respect to this error.  Where possible, actual variable and function
names are used.  Most of the support functions (setOutput,
clampOutput, unitCostUpdate, etc.)  are defined in the fem.c file.

.nf
/***********************************************************
 ********  Update net error and associated derivs   ********
 ***********************************************************/
int             errorDerivUpdate(net, exampleSet)
  Net           net ;
  ExampleSet    exampleSet ;
{

  /* Initialize vars: */ 
  net->error = 0;  /* etc. */

  /* Go through all training cases (examples).  For each one, 
     do a relaxation phase, gather stats ("correlations", i.e.,
     products) and compute the Energy, Entropy, and Free Energy
     for the net; from F compute the "actual probability" 
     predicted by the net that the current case is a "positive"
     instance.  From this actual prob and the desired prob compute
     the error and the gradients and weight updates. */

  for numExamples = 0 to net->batchSize - 1 { 

    /* Initialization */
    getNextExample(exampleSet) ;

    /* probDesired given in example data */
    energy = entropy = freeEnergy = probActual = 0.0;

    /* Relaxation phase -- there are no "positive" and "negative" phases. */

    /* Usually the FEM algorithm is used with only 1 hidden 
       layer, and with no interconnections between hidden units.
       In this case, the relaxation becomes trivial, and no annealing
       is needed.  Thus the user should specify the net parameter
       settings anneal=0 and relaxStepSize = 1. */
       
    /* Single output unit is "dummy" placeholder for probActual */
    for all input units  { clampOutput } ;

    /* Relax net, using simulated annealing */ 
    while (!stopAnnealing) {
      for all hidden units { updateActivity };
      temp = temp * tDecay;
      if (temp <= tMin) OR (!anneal) OR (equilibrium) 
        stopAnnealing = true;
    }

    /* Compute Stats and Energies, Calc  (Si*Sj)+   */
    for all units { setIncomingProducts, POSITIVE } ;

    for all units { unitUpdateEnergies } ;

    energy = 0.5 * (0.0 - energy);
    freeEnergy = energy - (temp * entropy);

    eF = exp(freeEnergy);
    probActual = 1.0 / (1 + eF);

    /* Set the dummy output unit(s) to be equal to probActual */
    for all output units { setOutputProbActual } ;

    caseError = probActual - probDesired;
    error    += square(caseError) ;

    /* gradient update:
       dError/dWij =  - (probDesired - probActual) Si*Sj     */
    updateNetGradients(net) ;

    /* update the error for the net */
    updateNetActivities(net) ;
  }

  /* update cost after all examples done.  */
  for all units { updateCost } ;

  return 0 ;
}
/***********************************************************/

/***********************************************************
 *****               Update net error only           *******
 ***********************************************************/
int             errorUpdate(net, exampleSet)
  Net           net ;
  ExampleSet    exampleSet ;
{
  net->error = 0.0 ;
  for numExamples = 0 to net->batchSize - 1 { 
    /* Similar to relaxation phase in training.  Relax net 
       with only the input units clamped. */
    getNextExample(exampleSet) ;

    for all input units  { clampOutput } ;
    /* Relax net, using simulated annealing */
    while (!stopAnnealing) {
      for all hidden units { updateActivity };
      temp = temp * tDecay;
      if (temp <= tMin) OR (!anneal) OR (equilibrium) 
        stopAnnealing = true;
    }
    
    /* Compute Energies and Update Error */
    for all units { unitUpdateEnergies } ;

    energy = 0.5 * (0.0 - energy);
    freeEnergy = energy - (temp * entropy);

    eF = exp(freeEnergy);
    probActual = 1.0 / (1 + eF);

    /* Set the dummy output unit(s) to be equal to probActual */
    for all output units { setOutputProbActual } ;

    caseError = probActual - probDesired;
    error    += square(caseError) ;
  }
  return 0 ;
}
/***********************************************************/
.fi

.SH DETAILS ON IMPORTANT SECTIONS AND FEATURES

This section describes in detail some important points of the above
procedures.

.SS Updating unit activations (outputs) during relaxation:

Because FEM is typically used for binary classification tasks, in
which the free energy only needs to be "pushed up or down" on
particular examples, it is possible to do away with output units
(this implementation keeps a single one as a placeholder for
probActual) and to use a single layer of hidden units with no
intra-layer connections.

In that case no annealing is needed: one weighted sum of the inputs
followed by a nonlinear "squashing" is enough to pass all of the
information from the input layer to the hidden layer.  The user
should then set anneal=0, relaxStepSize = 1, and tMax = 1, in a *.in
file or in the Network Parameters window.  Alternatively, the user
may wish to experiment with more elaborate connection architectures
which require annealing.  How to do this sensibly, as well as how to
use FEM for k-ary classification tasks, is left as an exercise for
the user.

(If one does want to use annealing, then one should read the
documentation for the MFT module and look at the sections on
activation updating and synchronous vs. asynchronous operation.)

.SS Updating gradients and network costs:

During batch learning, weights are updated after going through all
training examples once.  In online learning, weights are updated after
\fIeach\fP training case is processed.  We strongly recommend using
\fIminimize\fP with batch learning.  It is, however, possible to train
using online learning (see the description of \fIbatchSize\fP below).
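
The relation between \fIbatchSize\fP and the update schedule can be
sketched as follows.  This is only an illustration of the documented
convention (0 means the whole training set, 1 means online learning);
the helper name examplesPerUpdate is hypothetical and does not appear
in the fem sources.
.nf
```c
/* Hypothetical helper: number of examples processed before each
   weight update, given the batchSize net parameter. */
int examplesPerUpdate(int batchSize, int numExamples)
{
  if (batchSize == 0)
    return numExamples;   /* full batch training            */
  return batchSize;       /* 1 = online, other = semi-batch */
}
```
.fi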

The cost update procedure features options for controlling
the size of the weights by enforcing "weight costs" or "weight
decay".  Hence:

.nf
For each Link \fIlink\fP:
    unit->net->cost  += weightCost*square(link->weight) ;
    link->deriv      += 2.0*weightCost*link->weight ;
.fi
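
For readers who prefer compilable code, the same update might be
written over flat arrays as below.  This is a sketch under the
assumption that weights and derivatives are stored in parallel
arrays; the function name addWeightCost and that storage layout are
illustrative and are not how the Xerion Link structures are organized.
.nf
```c
#include <stddef.h>

/* Hypothetical flat-array version of the per-link update above.
   Adds the weight-cost gradient to deriv[] and returns the total
   weight cost to be added to the net cost. */
double addWeightCost(const double *weight, double *deriv,
                     size_t numLinks, double weightCost)
{
  double cost = 0.0;
  size_t k;

  for (k = 0; k < numLinks; ++k) {
    cost     += weightCost * weight[k] * weight[k];
    deriv[k] += 2.0 * weightCost * weight[k];
  }
  return cost;
}
```
.fi
The factor 2.0 in the derivative is simply the derivative of the
squared weight, so the added gradient always points away from zero
and a descent step shrinks the weight.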
 
It is strongly recommended that the weights be changed using the
momentum update of the \fIminimize\fP command (minimize -momentum).
One might also try the conjugate gradient and line search options.
The dynamics of the weight change in these other minimization procedures
are more complex.  See the man (sman) page, and online help for
\fIminimize\fP for further details.


.SH NET PARAMETERS AND "GLOBAL" NETWORK VARIABLES

The net parameters that govern the training, testing, and relaxation
dynamics of the network on the examples may be set in a *.in file (as
part of setting up a net and example sets), in the Xerion "Network
Parameters" window, or in the Xerion main command window.  There are
also a few variables that *report* on relaxation, training, and
testing but which the user may not set.  The (settable) network
parameters are indicated by the /* netParam: */ comment.  (All of
these variables are defined in the fem.h file in the fem
directory.)


.IP "\fIint  batchSize ;            /* netParam: */\fP" 1i
The number of examples to process during each batch of training. If
this value is set to 1, the net will be training online. If set to 0,
all the examples in the training set will be processed before updating
the weights, and the net will be doing batch training. Any other
positive number can be used for "semi-batch" learning.

.IP "\fIReal  weightCost ;            /* netParam: */\fP" 1i
Cost associated with magnitude of weights.  It is sometimes useful to
limit the absolute magnitude of weights in this way, in order to
improve the trained net's "generalization" capabilities.  See above
section on weight updates.

.IP "\fIReal  zeroErrorRadius ;       /* netParam: */\fP" 1i
Interval of acceptance for agreement between actual output and target
output of a unit.  The degree to which a "near miss" will count as a
"hit".

.IP "\fIReal  energy ;\fP" 1i
The (Hopfield) energy of the net.  Double sum of weighted pairwise
products of unit activation values.

.IP "\fIReal  entropy ;\fP" 1i
The entropy of the net.  Sum of: probabilities (activation values,
standing in MFT and FEM for probabilities of "Spin"=1 in Boltzmann
machine) multiplied by log probabilities.

.IP "\fIReal  freeEnergy ;\fP" 1i
Net free energy = energy - (temperature * entropy).

.IP "\fIReal  probDesired ;  \fP" 1i
Desired (target) probability of classification of an example as a
"positive" instance.

.IP "\fIReal  probActual ;\fP" 1i
Actual computed probability of positive instance for an example.
probActual = 1/(1+exp(freeEnergy)) .

.IP "\fIReal  caseError ;\fP" 1i
Network error for one case (example). caseError = probActual -
probDesired .

.IP "\fIReal  tMax ;                  /* netParam: */\fP" 1i
Maximum temperature in annealing.  Usually something between 5 and 30
is considered reasonable.

.IP "\fIReal  tMin ;                  /* netParam: */\fP" 1i
Minimum temperature in annealing.  Typically set to 1.0 .

.IP "\fIReal  tDecay ;                /* netParam: */\fP" 1i
Factor by which to lower temperature at each annealing step.
Something between .80 (fast annealing) and .99 (very slow annealing)
is typical.

.IP "\fIReal  temperature ;\fP" 1i
Current network temperature in annealing process.

.IP "\fIint   anneal;                 /* netParam: */\fP" 1i
Whether to run simulated annealing.  0 = don't,  1 = do anneal .

.IP "\fIReal  relaxStepSize ;         /* netParam: */\fP" 1i
What proportion of the "desired" step in activation space (during
relaxation) to take.  Desired means what sigmoid(weighted sum of
inputs) is computed to be.  The normal setting is relaxStepSize = 1,
i.e. take the full step.  It might be reasonable, for certain desired
network dynamics, to take smaller step sizes (e.g., .85) in some
cases.

.IP "\fIReal  relaxTolerance ;        /* netParam: */\fP" 1i
Used in the calculation of whether "equilibrium" is reached at a given
point in the relaxation.  Basically, if largest change in the
activation value of any unit, in the previous relaxation sweep, is
smaller than relaxTolerance, then annealing will cease, as the network
is "at equilibrium."

.IP "\fIReal  maxDelta ;\fP" 1i
Used in the equilibrium calculation described above.

.IP "\fIint   unitFunction ;          /* netParam: */\fP" 1i
If this field is set to 0, the total input to a unit is passed through
a (0,1) sigmoid to produce its activation. If set to 1, the input is
passed through the tanh function.

.IP "\fIint   unitValueType ;         /* netParam: */\fP" 1i
This field is unused in the current implementation.

.IP "\fIint   synchronousUpdate ;     /* netParam: */\fP" 1i
Whether to update the units synchronously (set var to 1) or
asynchronously (set to 0) in the relaxation.  Asynchronous updating
is considered much better in the vast majority of cases.  Also note
that the asynchronous case traverses the units in random order,
whereas the synchronous case uses "standard" or "row major" order.

.IP "\fIint   delayCount  ;           /* netParam: */\fP" 1i
Degree of relaxation slowdown used for viewing the details of network
relaxation.  Basically, a trivial loop from 1 to delayCount*1000 is
used between network relaxation sweeps.  This may be used while
viewing the "testing" of cases (e.g. "clicking" on them in the
"Activations" window) and only while the "inRelaxation" updating
option is turned on in the Activations window. When viewing, try
setting the var to 100 or 1000; performance and ease of viewing will
vary depending on your machine's speed ("raw" CPU speed, memory access
speed, screen update speed, etc.), of course.

.IP "\fIReal  relaxSweepCountAve ;\fP" 1i
The *average* number of relaxation sweeps per example per training
loop.
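.PP
How the annealing parameters interact can be seen in the following
sketch, which mirrors the relaxation loop in the pseudocode above.
It assumes that the temperature starts at tMax and that equilibrium
is never detected early; both function names are hypothetical and
are not taken from the fem sources.
.nf
```c
/* One partial step toward the "desired" activation.  With
   relaxStepSize = 1 the unit jumps straight to the desired value. */
double relaxStep(double current, double desired, double relaxStepSize)
{
  return current + relaxStepSize * (desired - current);
}

/* Number of relaxation sweeps performed by the annealing loop when
   equilibrium is never reached.  Assumes 0 < tDecay < 1 whenever
   anneal is nonzero; otherwise the loop would not terminate. */
int annealSweeps(double tMax, double tMin, double tDecay, int anneal)
{
  double temp = tMax;   /* assumption: annealing starts at tMax */
  int sweeps = 0;
  int stopAnnealing = 0;

  while (!stopAnnealing) {
    ++sweeps;                      /* one pass over the hidden units */
    temp = temp * tDecay;
    if (temp <= tMin || !anneal)
      stopAnnealing = 1;
  }
  return sweeps;
}
```
.fi
With anneal = 0 the loop always performs exactly one sweep, which is
the one-step relaxation used for nets without hidden-to-hidden links.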

.SH FILES
.nf
$XERIONDIR/src/sim/fem/fem.[ch]         Source code for the simulator
$XERIONDIR/src/sim/fem/fem-train.[ch]   Source code for the simulator
$XERIONDIR/nets/fem/*.in                Input files for sample nets
$XERIONDIR/nets/fem/*.layout            Layout files for sample nets
$XERIONDIR/nets/*.ex                    Example sets for sample nets
$XERIONDIR/config/femrc                 Initialization file for fem
.fi

.SH SEE ALSO 
.nf
minimize(1XERION), run(LOCAL), bm(LOCAL), mft(LOCAL), bp(LOCAL)
.fi

For general information on the Xerion simulator, its user interface,
and implementation and portability issues, see the appropriate man
(sman) page, README file, or, once inside Xerion, the online help
pages.

For some background and references on the MFT, Boltzmann, and FEM 
training and relaxation algorithms, see the LaTeX documentation in
the $XERIONDIR/doc/*.tex files.

.SH AUTHOR
.nf
Evan W. Steeg (steeg@ai.toronto.edu)
Dept. of Computer Science
University of Toronto,
Toronto, ON, Canada
.fi
