# Neural Networks

Neural networks are simple, nonlinear function approximators. A weighted sum of the inputs is passed through a nonlinear function, typically a sigmoid such as the hyperbolic tangent; the result is called the output of the first neuron. This is repeated with different weights to get the outputs of several neurons. In a single-hidden-layer neural network, the output of the network is a linear combination of these neuron outputs. In a multiple-hidden-layer network, this process is repeated, each time treating the outputs of one layer as the inputs to the next. It has been shown that any continuous function on a bounded domain can be approximated to arbitrary precision by a single-hidden-layer network with enough neurons.
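The single-hidden-layer case described above can be sketched in a few lines. This is a minimal illustration; the weight names (`W1`, `b1`, `w2`, `b2`) are assumptions, not from the text:

```python
import math

def forward(x, W1, b1, w2, b2):
    """Single-hidden-layer net: tanh hidden units, linear output."""
    # each hidden neuron: a weighted sum of the inputs passed through tanh
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    # network output: a linear combination of the hidden-neuron outputs
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```

Adding further hidden layers just feeds `hidden` through another round of weighted sums and nonlinearities.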

Neural networks are often used in supervised learning by training them with the error backpropagation algorithm, which simply performs gradient descent on the squared error in the outputs of the network. The squared error on a single training example is an unbiased estimate of the mean squared error (where the mean is taken over all training examples), so backprop is an example of stochastic gradient descent. It has been proved that when the learning rate decreases appropriately and the weights do not diverge, the algorithm converges to a local minimum with probability one. Similar results can be shown for reinforcement-learning algorithms using residual algorithms.
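As a minimal sketch of backprop as stochastic gradient descent, here is a one-hidden-neuron net trained on the squared error of individual examples. All names and hyperparameters are illustrative assumptions:

```python
import math, random

def train_sgd(data, lr=0.1, epochs=200, seed=0):
    """Fit y ~ v * tanh(w * x) by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, v = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for x, y in data:                       # one example at a time: "stochastic"
            h = math.tanh(w * x)
            err = v * h - y                     # d(0.5 * err**2) / d(output)
            dv = err * h                        # chain rule through the output weight
            dw = err * v * (1.0 - h * h) * x    # chain rule back through tanh
            v -= lr * dv
            w -= lr * dw
    return w, v

def mse(data, w, v):
    return sum((v * math.tanh(w * x) - y) ** 2 for x, y in data) / len(data)
```

Note that this sketch uses a constant learning rate; the convergence guarantee mentioned in the text requires an appropriately decreasing one.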

Backprop can also be applied to neural networks in which certain weights are constrained to be equal to each other. Gradient descent in that case is equivalent to taking one step of gradient descent without the constraint, and then replacing each set of weights that are supposed to remain equal with their average. Using this trick, it is possible to create recurrent nets, where the output feeds back to the input, and to train them to emulate Turing machines. Nets can also learn small feature detectors that are applied to all parts of an image, with their outputs feeding into another net. It is even possible to put two nets in series and swap out just the second one when learning a new problem, allowing "life-long" learning in the first. Neural nets can be very powerful pattern recognizers, though learning may be slow.
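The equal-weight trick described above can be stated as a tiny helper (a sketch; the function name is an assumption):

```python
def shared_weight_step(w, grads, lr):
    """One gradient step for a group of weights constrained to be equal.

    Per the text: take an unconstrained step on each tied copy of the
    weight, then re-tie the copies by replacing them with their average.
    """
    stepped = [w - lr * g for g in grads]   # unconstrained step per copy
    return sum(stepped) / len(stepped)      # averaging restores the constraint
```

Averaging the stepped copies is the same as stepping `w` along the mean of the copies' gradients, so the tied weight moves by the (scaled) sum of the gradients flowing through every place it is used.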

It is also possible to build analog neural networks in hardware. A 10-TeraFLOP network in a 1-inch cube using less than a watt was demonstrated several years ago, and there is great potential for improvement. Of course, for such an analog machine, a "FLOP" means an analog addition or multiplication of numbers with only about a dozen bits of accuracy, but that is enough for neural networks. In general, any hardware that allows learning with backprop can easily be modified to allow reinforcement-learning algorithms.