Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!uhog.mit.edu!news.kei.com!ub!acsu.buffalo.edu!jn
From: jn@cs.Buffalo.EDU (Jai Natarajan)
Subject: Re: # of hidden nodes for Radial Basis Function Networks ?
Message-ID: <Cy7B49.Avw@acsu.buffalo.edu>
Originator: jn@merak.cs.Buffalo.EDU
Sender: nntp@acsu.buffalo.edu
Nntp-Posting-Host: merak.cs.buffalo.edu
Organization: State University of New York at Buffalo/Computer Science
References:  <1994Oct23.165843.30929@cc.usu.edu>
Date: Mon, 24 Oct 1994 23:40:56 GMT
Lines: 36


In article <1994Oct23.165843.30929@cc.usu.edu>, slq15@cc.usu.edu writes:
|> 	I am working on a character classification problem
|> using backprop and radial basis function networks, using Neuralware.
|> I am somewhat familiar with backprop, but not much with RBFNs.
|> Is there any rule of thumb for finding #of nodes in prototype
|> layer in RBFNs ? Seems to me that they use much more hidden nodes
|> in RBFN than in BP. I have 35 input features and 62 output classes.
|> Typically, I would try 60-80 hidden nodes for BP. Example of RBFN given
|> with Neuralware uses 20 nodes in prototype layer for only 2 i/p
|> and 3 o/ps, so I was wondering if you need to use many more
|> hidden (i.e. prototype layer) nodes for RBFN than in BP. I have
|> tried around 200 hidden nodes for RBFN, but performance is much
|> worse than BP (90% vs 70%). Are there any other pitfalls or
|> fine-tuning that should be taken care of for RBFNs ? Any
|> comments will be appreciated.
|>  
You're right in concluding that RBFs require a lot of hidden layer nodes. I
worked with both RBFs and BP on the OCR problem and found BP superior for
exactly this reason. The centroids for the first layer are set by unsupervised
learning, and the hidden layer nodes are meant to vastly expand the
dimensionality of the space in which you are trying to separate the classes. Now
the OCR space is really quite complex. I saw a paper on phoneme classification
with RBFs where 14 input nodes required ~180 hidden layer nodes to model the
space satisfactorily. With 35 input nodes, you can imagine the size; if you want
to do this on a 486 or some other PC, it's just not practical. I would say you
might need close to 400-450 hidden layer nodes. That automatically means more
training samples as well. If, after the unsupervised stage, you have nodes that
rarely or never fire and so are unhelpful, you have to be able to junk them.
The main advantage is that training proceeds in two independent steps, and the
second layer can even be trained with a closed-form solution. I recall the paper
was by Rohwer and Renals at U. of Edinburgh - will try & get the reference.
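In case it helps, here's a rough NumPy sketch of the two-stage scheme I described: k-means for the centroids (stage 1, unsupervised), junking centroids that rarely fire, then a closed-form least-squares solve for the second-layer weights (stage 2). The toy data, the Gaussian width, and the pruning threshold are all made up for illustration - not from any of the papers mentioned.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Basic k-means: place k centroids by unsupervised learning (stage 1)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest centroid, then move centroids.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):                      # empty clusters handled by pruning
                centers[j] = pts.mean(axis=0)
    return centers, labels

def rbf_features(X, centers, width=1.0):
    """Gaussian hidden-layer activations around each centroid."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

# Toy 2-class problem: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
Y = np.zeros((100, 2)); Y[:50, 0] = 1; Y[50:, 1] = 1

centers, labels = kmeans(X, k=8)
# Junk centroids that rarely or never fire (fewer than 3 assigned samples -
# the threshold is arbitrary here).
keep = np.bincount(labels, minlength=len(centers)) >= 3
centers = centers[keep]

# Stage 2: the second layer is linear, so solve it in closed form
# by least squares instead of gradient descent.
H = rbf_features(X, centers)
W, *_ = np.linalg.lstsq(H, Y, rcond=None)
acc = ((H @ W).argmax(axis=1) == Y.argmax(axis=1)).mean()
```

On an easy toy problem like this the closed-form solve separates the classes essentially perfectly; the point of the sketch is just the independence of the two training stages.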

Jai Natarajan
Dept. of Computer Science
State U. of New York, Buffalo
