12:00, 6 Nov 1996, WeH 7220

Cascade Learning with Extended Kalman Filtering
Michael Nechyba

Abstract: In this talk, I will describe a neural network learning architecture that combines the idea of cascade learning (Fahlman) with the node-decoupled extended Kalman filtering (NDEKF) training algorithm. I will show that incremental cascade learning and NDEKF complement each other well by compensating for each other's weaknesses. On the one hand, training one hidden unit at a time and adding hidden units in a cascading fashion offers a good alternative to the ad hoc selection of a network architecture. Quickprop and other gradient-descent techniques, however, become less efficient at optimizing increasingly correlated weights as the number of hidden units grows. This is where NDEKF performs well, by explicitly calculating an approximate conditional error covariance matrix. On the other hand, NDEKF can easily become trapped in bad local minima if a network architecture is too redundant. Cascade learning accommodates this well by training only a small subset of all the weights at any one time. I will also show that, within this framework, allowing variable activation functions in the hidden units is not only feasible but also very desirable. The overall learning architecture converges to better local minima in far fewer epochs than neural networks trained with gradient-descent techniques.
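
To make the combination concrete, the sketch below illustrates one stage of cascade-style training with a node-decoupled EKF update: a single candidate hidden unit is added, and only its input weights and the output weights are trained, each group keeping its own covariance block. This is a minimal illustrative sketch, not the implementation described in the talk; the toy data, tanh activation, single output, and the noise/covariance constants (q, r, and the initial P) are all assumptions chosen for readability.

    # Minimal NDEKF sketch for one cascade stage (illustrative, not the talk's code).
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy regression data: 2 inputs, 1 output.
    X = rng.uniform(-1.0, 1.0, size=(200, 2))
    y = np.sin(np.pi * X[:, 0]) * np.cos(np.pi * X[:, 1])

    n_in = X.shape[1] + 1                       # inputs + bias into the new hidden unit
    w_h = rng.normal(scale=0.1, size=n_in)      # new hidden unit's input weights
    w_o = rng.normal(scale=0.1, size=2)         # output weights: [hidden, bias]

    # One covariance block per weight group (the node-decoupled approximation).
    P_h = 100.0 * np.eye(n_in)
    P_o = 100.0 * np.eye(2)
    q, r = 1e-6, 1e-2                           # process / measurement noise (assumed)

    def forward(x):
        a = np.append(x, 1.0) @ w_h             # net input of the hidden unit
        h = np.tanh(a)                          # activation (could be made variable)
        out = w_o @ np.array([h, 1.0])
        return out, h

    for epoch in range(20):
        for x, t in zip(X, y):
            out, h = forward(x)
            e = t - out                         # scalar innovation

            # Jacobians of the output w.r.t. each weight group.
            H_o = np.array([h, 1.0])                            # d out / d w_o
            H_h = w_o[0] * (1.0 - h**2) * np.append(x, 1.0)     # d out / d w_h

            # Global innovation variance, summed over the decoupled blocks.
            s = r + H_h @ P_h @ H_h + H_o @ P_o @ H_o

            # Per-group Kalman gains, weight updates, and covariance updates.
            K_h = P_h @ H_h / s
            K_o = P_o @ H_o / s
            w_h += K_h * e
            w_o += K_o * e
            P_h += -np.outer(K_h, H_h @ P_h) + q * np.eye(n_in)
            P_o += -np.outer(K_o, H_o @ P_o) + q * np.eye(2)

    rms = np.sqrt(np.mean([(t - forward(x)[0])**2 for x, t in zip(X, y)]))
    print("final RMS error:", rms)

In a full cascade run, each new hidden unit would additionally receive the outputs of all previously frozen hidden units as inputs, and the earlier units' weights would stay fixed, so the EKF always operates on a small, weakly redundant weight subset.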