# Neural Networks

Neural networks are simple, nonlinear function approximators. A weighted
sum of the inputs is passed through a nonlinear function, typically a
sigmoid such as the hyperbolic tangent; the result is called the output
of the first neuron. This is repeated with different weights to get the
outputs of several neurons. In a single-hidden-layer neural net, the
output of the network is a linear combination of these neuron outputs.
In a multiple-hidden-layer neural network, this process is repeated
several times, each time treating the outputs of the previous layer
as the inputs to the next layer. It has been shown that a
single-hidden-layer network with enough neurons can approximate any
continuous function on a compact domain to arbitrary precision.
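The forward pass described above can be sketched in a few lines; the weights and layer sizes below are illustrative, not taken from the text:

```python
import math

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """Single-hidden-layer net: tanh hidden units, linear output."""
    # Each hidden neuron: a nonlinearity applied to a weighted sum of the inputs.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_hidden, b_hidden)]
    # The network output: a linear combination of the hidden-neuron outputs.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Tiny example: 2 inputs, 3 hidden neurons, 1 output.
W_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [1.0, -1.0, 0.5]
y = forward([1.0, 2.0], W_hidden, b_hidden, w_out, b_out=0.0)
```

A deeper network would simply feed `hidden` into another such layer in place of `x`.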
Neural networks are often trained for supervised learning with
the *error backpropagation* algorithm. This algorithm simply does
gradient descent on the squared error in the outputs of the network.
The squared error in the outputs for a single training example is
an unbiased estimate of the mean squared error (where "mean" is the
average over all training examples), so backprop is an example of
stochastic gradient descent. It has been proved that when the learning
rate decreases appropriately and the weights do not diverge, the
algorithm converges to a local minimum with probability one. Similar
results can be shown for reinforcement-learning algorithms using
residual algorithms.
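A minimal sketch of this stochastic gradient descent on the squared error, for a single-hidden-layer net with tanh hidden units and a linear output (the toy target function and hyperparameters are illustrative assumptions):

```python
import math, random

def predict(x, W, b, v, c):
    """Forward pass: tanh hidden layer, linear output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
         for row, bj in zip(W, b)]
    return sum(vj * hj for vj, hj in zip(v, h)) + c

def backprop_step(x, target, W, b, v, c, lr=0.1):
    """One stochastic-gradient step on E = 0.5 * (y - target)^2."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
         for row, bj in zip(W, b)]
    y = sum(vj * hj for vj, hj in zip(v, h)) + c
    err = y - target                            # dE/dy
    for j, hj in enumerate(h):
        delta = err * v[j] * (1.0 - hj * hj)    # chain rule through tanh
        v[j] -= lr * err * hj
        for i, xi in enumerate(x):
            W[j][i] -= lr * delta * xi
        b[j] -= lr * delta
    return c - lr * err                         # updated output bias

def mse(W, b, v, c):
    """Mean squared error against the target y = 2x on a grid."""
    pts = [i / 10 for i in range(-10, 11)]
    return sum((predict([x], W, b, v, c) - 2 * x) ** 2 for x in pts) / len(pts)

# Fit y = 2x from randomly drawn examples, one example per step.
random.seed(0)
W = [[random.uniform(-1, 1)] for _ in range(4)]
b = [0.0] * 4
v = [random.uniform(-1, 1) for _ in range(4)]
c = 0.0
err_before = mse(W, b, v, c)
for _ in range(2000):
    x = random.uniform(-1, 1)
    c = backprop_step([x], 2 * x, W, b, v, c)
err_after = mse(W, b, v, c)
```

Each step follows the gradient of the error on one example, which is why the procedure is stochastic rather than batch gradient descent.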

Backprop can also be applied to neural networks in which certain
weights are constrained to be equal to each other. Gradient
descent in that case is equivalent to taking one step of gradient
descent without the constraint, and then replacing each set of weights
that are supposed to remain equal with their average. Using this
trick, it has been shown to be possible to create recurrent nets,
where the output feeds back to the input, and to train them to
emulate Turing machines. Nets can learn small feature detectors
that are applied to all parts of an image and whose outputs then
go to another net. It is even possible to put two nets in series
and switch out just the second one when learning a new problem,
thus allowing "life-long" learning in the first one. Neural nets
can be very powerful pattern recognizers, though learning may be slow.
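The weight-tying trick above can be checked on a toy loss (the loss function here is an arbitrary illustration). Averaging the copies after an unconstrained step moves the shared weight along the summed gradient, i.e. it matches a constrained step whose learning rate is divided by the number of tied copies:

```python
# Two weights w1, w2 constrained to be equal; illustrative loss
# L(w1, w2) = (w1 - 3)^2 + (2*w2 + 1)^2.
def grad_L(w1, w2):
    return 2 * (w1 - 3), 4 * (2 * w2 + 1)   # partial derivatives

lr = 0.01
w = 1.0                       # shared weight: w1 = w2 = w
g1, g2 = grad_L(w, w)

# Unconstrained step on each copy, then average the tied weights.
w1 = w - lr * g1
w2 = w - lr * g2
w_averaged = (w1 + w2) / 2

# This equals a constrained step on the shared weight (whose gradient
# is the sum of the partials) with the learning rate halved.
w_constrained_half_lr = w - (lr / 2) * (g1 + g2)
```

The same averaging applies whether the tied weights arise from a recurrent net unrolled in time or from a feature detector replicated across an image.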

It is also possible to build analog neural networks in hardware.
A 10-TeraFLOP network in a 1-inch cube using less than a watt was
demonstrated several years ago, and there is great potential for
improvement on this. Of course, for such an analog machine,
a "FLOP" means an analog addition or multiplication of numbers with
only about a dozen bits of accuracy. But that is enough for neural
networks. In general, any hardware that allows learning with backprop
can easily be modified to allow reinforcement-learning algorithms.
