
             How to use programs bp and cg and data set FL3

                              James L. Blue
              Computing and Applied Mathematics Laboratory
             National Institute of Standards and Technology

                            February 14, 1992

The backpropagation (bp) and conjugate gradient (cg) programs share many
of the same subroutines and use a common format for input and output.
Both programs train and test standard "3-layer" fully-connected
feed-forward networks. Both are written in Fortran using the Ratfor
preprocessor.

These programs are a contribution of the United States government, and
are not subject to copyright.

Some implementation-dependent parameters are in file "sizcom.h"; they may
be changed before compiling.

For users who do not have the Ratfor preprocessor, Fortran files are also
included. In this case, the implementation-dependent parameters must be
changed in each place they occur.

Note: The driver code assumes that runs are for classifying; the target
outputs for each pattern are all 0 except for a single 1, and the network
has one output for each class. The CG and BP codes do not make this
assumption.

For each run, a specification file must be prepared. This file must be
named "spec" (without the quotes). The specific name for the file is
required because, unlike C, Fortran cannot portably read its command
line. A typical spec file is shown below. (It is here indented to set it
off from the rest of the text; the real spec file lines should start in
column 1.) Lines 2 through 8 specify a training run; lines 9 through 15
specify a testing run using the network that was just trained.

	2
	train.out
	train.run
	fl3trn
	in.wts
	train.wts
	500 32  08  10  10.0  0.25  0.001  12345
	200  0.01  1.e-12  10  1.0  0.0  1
	test.out
	test.run
	fl3tst
	train.wts
	trash
	1434  32  08  10  10.0  0.25  0.0  0
	0  0.0  0.0 10  0.0  0.0  0
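The layout is thus one count line followed by seven lines per run: two
output file names, the pattern file, the input-weights file, the
output-weights file, and two parameter lines. The following Python sketch
assembles such a file from the values in the example above; it is purely
illustrative (the bp/cg distribution contains no such generator), and the
field names are ours.

```python
def make_spec(runs):
    # Assemble the text of a "spec" file: a run count, then seven lines
    # per run, in the order the programs read them. Illustrative helper.
    lines = [str(len(runs))]
    for r in runs:
        lines.append(r["detail_file"])    # detailed per-pattern output
        lines.append(r["summary_file"])   # run summary (the file to save)
        lines.append(r["pattern_file"])   # input patterns, e.g. fl3trn
        lines.append(r["in_weights"])     # ignored when the seed is positive
        lines.append(r["out_weights"])    # ignored for testing runs
        lines.append("  ".join(str(v) for v in r["net_params"]))
        lines.append("  ".join(str(v) for v in r["stop_params"]))
    return "\n".join(lines) + "\n"

train = {
    "detail_file": "train.out", "summary_file": "train.run",
    "pattern_file": "fl3trn", "in_weights": "in.wts",
    "out_weights": "train.wts",
    "net_params": [500, 32, 8, 10, 10.0, 0.25, 0.001, 12345],
    "stop_params": [200, 0.01, 1e-12, 10, 1.0, 0.0, 1],
}
spec_text = make_spec([train])
```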

Each of the lines will now be discussed in order for the training run.

	2
The number of runs to be done. Often two runs are done, one for training
and one for testing. (If the number of iterations is 1 or more, the run
is for training; if 0 or fewer, the run is for testing.)

	train.out
The name of the file on which to write a detailed summary of the neural
net's activations for each pattern. A typical line follows:
    11 =  2 R  2     0   131  1000     0     0     0     0     0     0     0
The fields on each line are:
	11   The pattern index
	 2   The pattern ID (the correct output)
	 R   The classification result: R (Right), U (Unknown), or W (Wrong)
	 2   The actual classification produced by the network
followed by the activation levels, scaled from 0 to 1000.
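Such a line splits cleanly on whitespace, with "=" separating the pattern
index from the remaining fields. A small illustrative parser (the record
field names are ours, not the programs'):

```python
def parse_detail_line(line):
    # Parse one line of the detailed per-pattern output file.
    toks = line.split()
    assert toks[1] == "="           # "=" follows the pattern index
    return {
        "index": int(toks[0]),
        "target": int(toks[2]),     # pattern ID (the correct output)
        "result": toks[3],          # R, U, or W
        "actual": int(toks[4]),
        "activations": [int(t) for t in toks[5:]],  # scaled 0 to 1000
    }

rec = parse_detail_line(
    " 11 =  2 R  2     0   131  1000     0     0     0     0     0     0     0")
```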

	train.run
The name of the file on which to write a summary of the computer run,
including the results. This information also appears on the standard
output. This is the file to look at and to save. A typical file is given
as Appendix 1.

	fl3trn
The name of the file containing the input patterns. The file must start
with 3 integers: the number of patterns, the number of inputs per
pattern, and the number of outputs per pattern. On the next line are the
"names" of each output separated by blanks. For example, fl3trn starts
with
	2000   32   10
	0 1 2 3 4 5 6 7 8 9
although it could as easily have started with
	2000   32   10
	zero   one   two   three   four   five   six   seven   eight   nine

For each pattern, there are at least two lines. The input values for each
pattern start on a new line and may continue for several lines; values
may be real numbers or integers. The desired output results for this
pattern also start on a new line; values may be real numbers or integers.
Programs bp and cg assume a classification type of neural network, with
the expected values of the outputs all 0 except for a single 1 marking
the character encoded.
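Fortran list-directed input reads such a file as a stream of
whitespace-separated tokens, so a reader need only consume tokens in
order. A sketch of such a reader in Python (illustrative only; it does
not enforce the new-line rules, which Fortran's list-directed READ does
not require either):

```python
def read_patterns(text):
    # Read a bp/cg-style pattern file from a string: header counts,
    # output names, then (inputs, targets) token runs for each pattern.
    toks = text.split()
    npats, ninp, nout = int(toks[0]), int(toks[1]), int(toks[2])
    names = toks[3:3 + nout]                 # one "name" per output
    vals = [float(t) for t in toks[3 + nout:]]
    pats = []
    for p in range(npats):
        base = p * (ninp + nout)
        pats.append((vals[base:base + ninp],                  # inputs
                     vals[base + ninp:base + ninp + nout]))   # targets
    return names, pats

sample = """2 3 2
a b
0.1 0.2 0.3
1 0
0.4 0.5 0.6
0 1
"""
names, pats = read_patterns(sample)
```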

	in.wts
The name of the file containing the initial weights of the network. The
weights for each node must start on a new line. If the input parameter
"nseed" is positive, this file name is not used; instead, a pseudo-random
number generator (PRNG) generates the initial weights, with nseed used to
initialize the generator. (This is the standard method.) The PRNG is
designed to give the same numbers on all computers, which facilitates
comparisons between computers.

	train.wts
The name of the file on which to write the weights of the network after
training. (If the run is for testing, the weights are not written to a
file and this file name is not used.) Weights are written six to a line.

	500 32  08  10  10.0  0.25  0.001  12345
These are network parameters for the run. Their mnemonic names
are given on the next line, following which the meanings are given.
	npats ninp nhid nout eta alpha wf nseed	

	npats
The number of patterns to read and use for training or testing.

	ninp
The number of input nodes in the network.

	nhid
The number of hidden-layer nodes in the network.

	nout
The number of output-layer nodes in the network.

	eta
The eta value in back-propagation. NOTE: the usual definition of eta is
npats times this eta. (This parameter is ignored by the conjugate
gradient program.) This definition is used so that, to a first
approximation, eta need not be changed when the number of input patterns
changes.

	alpha
The alpha value in back-propagation. This parameter is ignored by the
conjugate gradient program.
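The eta scaling means the update effectively applies eta to the *average*
per-pattern gradient. The following Python sketch shows a batch update
consistent with that description; it is not the bp source (in particular
it omits the wf weight term), and all names are ours:

```python
def batch_update(w, grad_sum, prev_dw, eta, alpha, npats):
    # One batch weight update: eta multiplies the AVERAGE per-pattern
    # gradient (grad_sum / npats); alpha is the usual momentum term.
    dw = [-(eta / npats) * g + alpha * d for g, d in zip(grad_sum, prev_dw)]
    return [wi + di for wi, di in zip(w, dw)], dw

w = [0.5, -0.2]
# The same per-pattern gradients, summed over 500 vs. 1000 patterns,
# give (to rounding) the same step with the same eta:
w500, dw500 = batch_update(w, [100.0, -50.0], [0.0, 0.0], 10.0, 0.25, 500)
w1000, dw1000 = batch_update(w, [200.0, -100.0], [0.0, 0.0], 10.0, 0.25, 1000)
```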

	wf
The weight factor for the sum of squares of weights in the error term,
the Greek letter "mu" in the paper.

	nseed
The initialization value for the PRNG. This should be an odd integer from
1 to 32767. If it is 0 or less, the initial weights are read from a file
instead. If an even integer is entered, the next smaller (odd) integer is
used.
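The seed handling described above amounts to the following (an
illustrative Python restatement of the stated rules, not the Fortran
code):

```python
def normalize_seed(nseed):
    # Positive odd seeds in 1..32767 drive the pseudo-random generator;
    # zero or negative means the initial weights come from a file.
    if nseed <= 0:
        return None          # read initial weights from the file instead
    if nseed % 2 == 0:
        nseed -= 1           # even entry: use the next smaller integer
    return nseed
```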

	200  0.01  1.e-12  10  1.0  0.0  1
These are parameters governing stopping criteria for the run. Their
mnemonic names are given on the next line, following which the meanings
are given.

	niter egoal gwgoal nfreq errdel oklvl nokdel

	niter
The allowed number of iterations to use. The run is for training if niter
is 1 or more, and otherwise for testing.

	egoal
The acceptable error level at which to stop. The error is a root-mean-
square error per output node; it is an "average" error on the average
output node. The program checks egoal every iteration.
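Ignoring the wf weight term, the RMS-error-per-output-node quantity can
be sketched as follows (our helper, not the distributed code):

```python
import math

def rms_error(outputs, targets):
    # Root-mean-square error per output node: squared error summed over
    # every output of every pattern, divided by the total output count.
    n = sum(len(o) for o in outputs)
    ss = sum((oi - ti) ** 2
             for o, t in zip(outputs, targets)
             for oi, ti in zip(o, t))
    return math.sqrt(ss / n)
```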

	gwgoal
The acceptable ratio of the RMS gradient to the RMS weight value. The
program checks gwgoal every iteration.

	nfreq
The frequency at which to report the progress of the training. This is
also the frequency at which to check for slow convergence. For cg, 10
is a reasonable value; for bp, 100 is reasonable.

	errdel
The program says that convergence is too slow if the RMS error is more
than errdel times the RMS error that was attained nfreq iterations
previously.

	oklvl
The acceptable activation level at which to "believe" an output as a 1.
The range of the activation levels is from 0 to 1. Patterns resulting in
outputs for which the largest activation level is below oklvl are treated
as unknowns by the scoring routine.
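The scoring rule is compact: take the output with the largest activation;
if that activation is below oklvl the pattern is Unknown, otherwise it is
Right or Wrong by comparison with the target class. A sketch (our
function, not the distributed scoring routine; the letters match the R/U/W
codes in the detailed-output file):

```python
def score(activations, target, oklvl):
    # Classify one pattern's outputs as Right, Unknown, or Wrong.
    best = max(range(len(activations)), key=lambda i: activations[i])
    if activations[best] < oklvl:
        return "U"                       # highest output not believed
    return "R" if best == target else "W"
```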

	nokdel
The program says that convergence is too slow if the number of correctly
identified patterns has not increased by at least nokdel over the number
attained nfreq iterations previously.
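Taken together, the four criteria amount to the checks below, with egoal
and gwgoal tested every iteration and the two slow-convergence tests
every nfreq iterations. This is an illustrative Python restatement with
our own names for the pieces of state; the final message matches the
"slow convergence of OK" diagnostic seen in Appendix 1.

```python
def should_stop(err, gw, ok, err_prev, ok_prev,
                egoal, gwgoal, errdel, nokdel):
    # err      RMS error now; err_prev: value nfreq iterations ago
    # gw       (RMS gradient) / (RMS weight)
    # ok       correctly identified patterns now; ok_prev: nfreq ago
    if err <= egoal:
        return "error goal reached"
    if gw <= gwgoal:
        return "gradient goal reached"
    if err > errdel * err_prev:
        return "slow convergence of error"
    if ok < ok_prev + nokdel:
        return "slow convergence of OK"
    return None   # keep iterating
```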


The numerical parameters for the testing run are similar. For testing
runs, the input weights are always read from a file, and output weights
are not written to a file.

The values of the following network parameters are ignored during
testing runs:
	eta alpha

The values of the following stopping criteria are ignored during testing
runs:
	egoal gwgoal nfreq errdel nokdel

                               Appendix 1
                  Typical ``train.run'' file for CG run

 Training on fl3trn
 Input, hidden, output nodes   32   8  10;    500 patterns
 Random initial weights, seed      12345
 Wfactor    1.000E-03
 stopping criteria:
  (RMS err)  <=    1.000E-02 OR
  (RMS g)    <=    1.000E-12 * (RMS w) OR
  (RMS err)  >     1.000E+00 * (RMS err   10 iterations ago) OR
  (OK count) <  (OK count   10 iterations ago) +    1 (OK level is 0.000)

 Conjgrad: doing    200 iterations;   354 variables
     Iter  Err  (  Ep    Ew )   OK  UNK   NG      OK   UNK    NG
        0 0.540 (0.540 0.284)   50    0  450 =   10.0   0.0  90.0%
       10 0.285 (0.285 0.508)  124    0  376 =   24.8   0.0  75.2%
       20 0.252 (0.251 0.946)  282    0  218 =   56.4   0.0  43.6%
       30 0.207 (0.203 1.337)  402    0   98 =   80.4   0.0  19.6%
       40 0.161 (0.148 1.985)  450    0   50 =   90.0   0.0  10.0%
       50 0.142 (0.121 2.300)  476    0   24 =   95.2   0.0   4.8%
       60 0.132 (0.104 2.578)  482    0   18 =   96.4   0.0   3.6%
       70 0.127 (0.093 2.715)  487    0   13 =   97.4   0.0   2.6%
       80 0.124 (0.085 2.827)  490    0   10 =   98.0   0.0   2.0%
       90 0.122 (0.080 2.900)  491    0    9 =   98.2   0.0   1.8%
      100 0.121 (0.078 2.918)  491    0    9 =   98.2   0.0   1.8%

 oklvl  0.00
 # Highest two outputs (mean)  0.885  0.101 ; mean diff  0.785
 #  key:   0   1   2   3   4   5   6   7   8   9
 #  row: correct, column: actual
 #    0:  50   0   0   0   0   0   0   0   0   0
 #    1:   0  50   0   0   0   0   0   0   0   0
 #    2:   0   0  48   0   1   0   0   0   1   0
 #    3:   0   0   0  50   0   0   0   0   0   0
 #    4:   0   1   0   0  49   0   0   0   0   0
 #    5:   0   0   0   2   0  47   0   0   1   0
 #    6:   0   0   0   0   0   0  50   0   0   0
 #    7:   0   0   0   0   0   0   0  50   0   0
 #    8:   1   0   0   0   1   0   0   0  48   0
 #    9:   0   0   0   0   0   0   0   1   0  49
 #  unknown
 #    *    0   0   0   0   0   0   0   0   0   0

 #  mean highest activation level
 #  row: correct, column: actual
 #  key:   0   1   2   3   4   5   6   7   8   9
 #    0:  92   0   0   0   0   0   0   0   0   0
 #    1:   0  90   0   0   0   0   0   0   0   0
 #    2:   0   0  92   0  34   0   0   0  37   0
 #    3:   0   0   0  91   0   0   0   0   0   0
 #    4:   0   3   0   0  90   0   0   0   0   0
 #    5:   0   0   0  31   0  88   0   0  34   0
 #    6:   0   0   0   0   0   0  92   0   0   0
 #    7:   0   0   0   0   0   0   0  89   0   0
 #    8:   9   0   0   0  40   0   0   0  87   0
 #    9:   0   0   0   0   0   0   0  11   0  87
 #  unknown
 #    *    0   0   0   0   0   0   0   0   0   0

 Histogram of errors, from 2**(-10) to 1
   1929    323    382    452    426    415    425    326    226     81     15
   38.6    6.5    7.6    9.0    8.5    8.3    8.5    6.5    4.5    1.6    0.3%

     Iter  Err  (  Ep    Ew )   OK  UNK   NG      OK   UNK    NG
  F   100 0.121 (0.078 2.918)  491    0    9 =   98.2   0.0   1.8%
 Iter   100; ierr 4 : slow convergence of OK
 Used   100 iterations;    205 function calls; Err  0.121; |g|/|w| 2.841E-06
 Rms change in weights 2.879

        thresh       right   unknown     wrong   correct  rejected
 1tr    0.000000       491         0         9     98.20      0.00
 2tr    0.326928       490         5         5     98.99      1.00
 3tr    0.458614       489        10         1     99.80      2.00
 4tr    0.515061       485        15         0    100.00      3.00
 5tr    0.583675       480        20         0    100.00      4.00
 6tr    0.611205       475        25         0    100.00      5.00
 7tr    0.665457       465        35         0    100.00      7.00
 8tr    0.712340       450        50         0    100.00     10.00
 9tr    0.769677       425        75         0    100.00     15.00
10tr    0.817483       400       100         0    100.00     20.00
11tr    0.847214       375       125         0    100.00     25.00
12tr    0.870498       350       150         0    100.00     30.00
13tr    0.893027       325       175         0    100.00     35.00
14tr    0.910763       300       200         0    100.00     40.00
15tr    0.925240       275       225         0    100.00     45.00
16tr    0.936654       250       250         0    100.00     50.00

 Elapsed time     73.4 seconds
 Weights written to file train.wts


                               Appendix 2
                  Typical ``test.run'' file for CG run

 Testing on fl3tst
 Input, hidden, output nodes   32   8  10;   1434 patterns
 Initial weights from file train.wts
 Wfactor    0.000E+00

 oklvl  0.00
 # Highest two outputs (mean)  0.866  0.130 ; mean diff  0.736
 #  key:   0   1   2   3   4   5   6   7   8   9
 #  row: correct, column: actual
 #    0: 163   0   0   0   1   3   2   1   1   0
 #    1:   0 175   0   0   0   0   0   0   0   0
 #    2:   1   2 119   3   4   0   4   6   8   2
 #    3:   5   2   6 142   1   2   0   7   0   3
 #    4:   0   0   0   0 134   2   2   0   3   1
 #    5:   1   0   0   1   1  74   2   1   4   0
 #    6:   1   6   1   0   0   3 131   0   0   0
 #    7:   0   0   3   0   0   1   0 131   2   2
 #    8:   1   4   3   5   1   6   7   1  96   3
 #    9:   0   0   0   1   2   1   0   1   3 129
 #  unknown
 #    *    0   0   0   0   0   0   0   0   0   0

 #  mean highest activation level
 #  row: correct, column: actual
 #  key:   0   1   2   3   4   5   6   7   8   9
 #    0:  92   0   0   0  65  63  82  84  69   0
 #    1:   0  94   0   0   0   0   0   0   0   0
 #    2:  61  81  89  87  66   0  66  69  63  38
 #    3:  54  51  74  85  34  48   0  55   0  55
 #    4:   0   0   0   0  89  66  71   0  51  48
 #    5:  47   0   0  61  31  86  53  94  65   0
 #    6:  33  41  62   0   0  25  92   0   0   0
 #    7:   0   0  83   0   0  55   0  92  41  37
 #    8:  43  44  60  39   4  71  62  72  85  62
 #    9:   0   0   0  68  34  35   0  60  44  88
 #  unknown
 #    *    0   0   0   0   0   0   0   0   0   0

 Histogram of errors, from 2**(-10) to 1
   5487   1008   1106   1212   1295   1278   1122    761    466    303    302
   38.3    7.0    7.7    8.5    9.0    8.9    7.8    5.3    3.2    2.1    2.1%

           Err  (  Ep    Ew )   OK  UNK   NG      OK   UNK    NG
     Test 0.131 (0.131 2.918) 1294    0  140 =   90.2   0.0   9.8%

        thresh       right   unknown     wrong   correct  rejected
 1ts    0.000000      1294         0       140     90.24      0.00
 2ts    0.200197      1292        14       128     90.99      0.98
 3ts    0.278109      1284        29       121     91.39      2.02
 4ts    0.325441      1277        43       114     91.80      3.00
 5ts    0.363239      1272        57       105     92.37      3.97
 6ts    0.432642      1264        72        98     92.80      5.02
 7ts    0.502514      1247       100        87     93.48      6.97
 8ts    0.591357      1220       143        71     94.50      9.97
 9ts    0.698086      1170       215        49     95.98     14.99
10ts    0.783234      1110       287        37     96.77     20.01
11ts    0.834632      1049       359        26     97.58     25.03
12ts    0.879052       983       430        21     97.91     29.99
13ts    0.903504       914       502        18     98.07     35.01
14ts    0.925229       844       574        16     98.14     40.03
15ts    0.940251       776       645        13     98.35     44.98
16ts    0.949892       705       717        12     98.33     50.00

 Elapsed time      0.8 seconds

