12:00, 6 Nov 1996, WeH 7220

Cascade Learning with Extended Kalman Filtering
Michael Nechyba

Abstract: In this talk, I will describe a neural network learning architecture that combines the idea of cascade learning (Fahlman) with the node-decoupled extended Kalman filtering (NDEKF) training algorithm. I will show that incremental cascade learning and NDEKF complement each other well by compensating for each other's weaknesses. On the one hand, training one hidden unit at a time and adding hidden units in a cascading fashion offers a good alternative to the ad hoc selection of a network architecture. Quickprop and other gradient-descent techniques, however, become less efficient at optimizing increasingly correlated weights as the number of hidden units grows. This is where NDEKF performs well, by explicitly calculating an approximate conditional error covariance matrix. On the other hand, NDEKF can easily become trapped in bad local minima if a network architecture is too redundant. Cascade learning accommodates this well by training only a small subset of all the weights at any one time. I will also show that, within this framework, allowing variable activation functions in the hidden units is not only feasible but also very desirable. The overall learning architecture converges to better local minima in far fewer epochs than neural networks trained with gradient-descent techniques.
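
To make the combination concrete, the sketch below illustrates one stage of cascade-style training with a node-decoupled EKF update: a single candidate hidden unit is added, and only its input weights and the output weights are trained, each group keeping its own covariance block. This is a minimal illustrative sketch, not the implementation described in the talk; the toy data, tanh activation, single output, and the noise/covariance constants (q, r, and the initial P) are all assumptions chosen for readability.

    # Minimal NDEKF sketch for one cascade stage (illustrative, not the talk's code).
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy regression data: 2 inputs, 1 output.
    X = rng.uniform(-1.0, 1.0, size=(200, 2))
    y = np.sin(np.pi * X[:, 0]) * np.cos(np.pi * X[:, 1])

    n_in = X.shape[1] + 1                       # inputs + bias into the new hidden unit
    w_h = rng.normal(scale=0.1, size=n_in)      # new hidden unit's input weights
    w_o = rng.normal(scale=0.1, size=2)         # output weights: [hidden, bias]

    # One covariance block per weight group (the node-decoupled approximation).
    P_h = 100.0 * np.eye(n_in)
    P_o = 100.0 * np.eye(2)
    q, r = 1e-6, 1e-2                           # process / measurement noise (assumed)

    def forward(x):
        a = np.append(x, 1.0) @ w_h             # net input of the hidden unit
        h = np.tanh(a)                          # activation (could be made variable)
        out = w_o @ np.array([h, 1.0])
        return out, h

    for epoch in range(20):
        for x, t in zip(X, y):
            out, h = forward(x)
            e = t - out                         # scalar innovation

            # Jacobians of the output w.r.t. each weight group.
            H_o = np.array([h, 1.0])                            # d out / d w_o
            H_h = w_o[0] * (1.0 - h**2) * np.append(x, 1.0)     # d out / d w_h

            # Global innovation variance, summed over the decoupled blocks.
            s = r + H_h @ P_h @ H_h + H_o @ P_o @ H_o

            # Per-group Kalman gains, weight updates, and covariance updates.
            K_h = P_h @ H_h / s
            K_o = P_o @ H_o / s
            w_h += K_h * e
            w_o += K_o * e
            P_h += -np.outer(K_h, H_h @ P_h) + q * np.eye(n_in)
            P_o += -np.outer(K_o, H_o @ P_o) + q * np.eye(2)

    rms = np.sqrt(np.mean([(t - forward(x)[0])**2 for x, t in zip(X, y)]))
    print("final RMS error:", rms)

In a full cascade run, each new hidden unit would additionally receive the outputs of all previously frozen hidden units as inputs, and the earlier units' weights would stay fixed, so the EKF always operates on a small, weakly redundant weight subset.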