AI/RI JOINT SEMINAR

WHEN:    Friday, Sept 30, 1994; 3:30 pm - 5:00 pm
         Refreshments will be served starting at 3:15 pm

WHERE:   ADAMSON WING Auditorium in Baker Hall

SPEAKER: Stephen Omohundro
         International Computer Science Institute
         University of California at Berkeley

TITLE:   Model Merging for Learning and Recognition

ABSTRACT:

Many important areas of computer science would be dramatically impacted by
more powerful ways of representing, learning, and manipulating uncertain
information. Decision making under uncertainty arises directly in areas like
machine vision and speech understanding, and indirectly throughout software
and hardware engineering. As the size and richness of information systems
increase, so does the importance of learning underlying models from data
rather than building them by hand.

Many currently used learning algorithms work by adjusting the parameters of
a fixed complex model to better fit training data. Such models are often
cognitively implausible, incapable of one-shot learning, subject to
overfitting, unable to adaptively allocate resources where they are needed,
slow to learn, computationally expensive, and liable to get stuck in local
minima. We describe an approach to learning and recognition, which we call
"model merging," that aims to alleviate these problems. It is based on a
coherent underlying Bayesian semantics and is applicable to both learning
and recognition tasks. The approach constructs models whose structure adapts
to the training data, putting more modelling resources where they are needed
and where they can be reliably validated. It can build arbitrarily complex
models without overfitting.

The first application we describe is the induction of stochastic grammars
from example strings. Hidden Markov models are the simplest stochastic
grammars and are widely used in speech recognition and other application
areas. The maximum-likelihood model directly encodes the data strings.
The model merging approach starts with this model and repeatedly merges HMM
states. It does an excellent job of finding the correct underlying model
topology and outperforms standard approaches, particularly on small training
sets. We have applied this approach to building word models for speech
recognition, and the resulting system outperforms a hand-constructed one. A
similar approach works for inducing stochastic context-free grammars.

The second application we describe is learning smooth nonlinear constraint
surfaces. Such constraints arise in many geometric domains such as vision,
graphics, and robotics. In this case the local models are patches of the
constraint surface which are blended together by smooth influence functions.
During learning, surface patches may be merged, and the merged patch fit
from a more complex model class as the data warrants. The "bumptree" data
structure makes queries on such representations quite efficient. We have
used these techniques to build models of the "space of lips" in a
lip-reading application. The learned surface is used to improve the
performance of a snake-based lip tracker, to nonlinearly interpolate between
frames, and to provide relevant coordinates for later recognition stages.

Host: Andrew Moore (awm@cs.cmu.edu)
Appointment: Marie Elm (mke@cs.cmu.edu)
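The HMM induction described in the abstract — start from the
maximum-likelihood model that directly encodes the strings, then repeatedly
merge states — can be sketched in a few lines of Python. This is a minimal
illustration, not the talk's method: it assumes a greedy search over
same-symbol state pairs and a fixed log-likelihood tolerance as a crude
stand-in for the Bayesian posterior criterion, and all function names are
invented for the example.

```python
import math
from collections import Counter
from itertools import combinations

def build_ml_hmm(strings):
    """Maximum-likelihood HMM that directly encodes the data:
    one linear chain of states per (non-empty) training string."""
    emit, trans, starts = {}, {}, []
    sid = 0
    for s in strings:
        prev = None
        for ch in s:
            emit[sid] = ch
            trans[sid] = {}
            if prev is None:
                starts.append(sid)              # chain head is a start state
            else:
                trans[prev][sid] = trans[prev].get(sid, 0) + 1
            prev = sid
            sid += 1
        trans[prev]['END'] = 1                  # chain tail exits the model
    return emit, trans, starts

def log_likelihood(strings, emit, trans, starts):
    """Total log-probability of the strings (forward algorithm)."""
    start_counts, n = Counter(starts), len(starts)
    total = 0.0
    for s in strings:
        alpha = {q: c / n for q, c in start_counts.items() if emit[q] == s[0]}
        for ch in s[1:]:
            nxt = {}
            for q, p in alpha.items():
                out = sum(trans[q].values())
                for r, c in trans[q].items():
                    if r != 'END' and emit[r] == ch:
                        nxt[r] = nxt.get(r, 0.0) + p * c / out
            alpha = nxt
        prob = sum(p * trans[q].get('END', 0) / sum(trans[q].values())
                   for q, p in alpha.items())
        total += math.log(prob) if prob > 0 else float('-inf')
    return total

def merge_states(emit, trans, starts, a, b):
    """Merge state b into state a, pooling transition counts."""
    emit2 = {q: c for q, c in emit.items() if q != b}
    trans2 = {}
    for q, out in trans.items():
        if q == b:
            continue
        trans2[q] = {}
        for r, c in out.items():
            r2 = a if r == b else r
            trans2[q][r2] = trans2[q].get(r2, 0) + c
    for r, c in trans[b].items():               # b's outgoing counts move to a
        r2 = a if r == b else r
        trans2[a][r2] = trans2[a].get(r2, 0) + c
    starts2 = [a if q == b else q for q in starts]
    return emit2, trans2, starts2

def model_merge(strings, tol=1e-9):
    """Greedily merge same-symbol state pairs while the training
    log-likelihood drops by less than tol."""
    model = build_ml_hmm(strings)
    base = log_likelihood(strings, *model)
    improved = True
    while improved:
        improved = False
        emit = model[0]
        for a, b in combinations(sorted(emit), 2):
            if emit[a] != emit[b]:
                continue
            cand = merge_states(*model, a, b)
            ll = log_likelihood(strings, *cand)
            if ll >= base - tol:                # accept near-lossless merges
                model, base, improved = cand, ll, True
                break
    return model
```

On three copies of the string "ab", the six-state chain model collapses to a
single two-state model with no loss of likelihood, which is the sense in
which merging recovers a compact topology from the data-encoding model.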
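The second application — local surface patches blended by smooth influence
functions, with patches merged when the data warrants — can be illustrated in
one dimension with linear patches. The Gaussian influence form, the
mean-squared-error merge tolerance, and every function name below are
assumptions made for the sketch; the talk's bumptree indexing and richer
model classes are omitted.

```python
import math

def fit_patch(pts):
    """Least-squares line y = a*x + b through points (x, y);
    assumes the x values are not all identical."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b, pts

def mse(a, b, pts):
    """Mean squared residual of the line (a, b) on the points."""
    return sum((y - (a * x + b)) ** 2 for x, y in pts) / len(pts)

def merge_patches(patches, tol=1e-3):
    """Greedily merge adjacent patches whenever their pooled points are
    still well fit by a single line (a stand-in for refitting the merged
    patch from a richer model class as the data warrants)."""
    patches = list(patches)
    i = 0
    while i < len(patches) - 1:
        _, _, p1 = patches[i]
        _, _, p2 = patches[i + 1]
        a, b, pooled = fit_patch(p1 + p2)
        if mse(a, b, pooled) < tol:
            patches[i:i + 2] = [(a, b, pooled)]   # accept the merge
        else:
            i += 1
    return patches

def evaluate(patches, x, width=1.0):
    """Blend the patches' local predictions with Gaussian influence
    functions centred on each patch's data."""
    ws, ys = [], []
    for a, b, pts in patches:
        c = sum(px for px, _ in pts) / len(pts)   # patch centre
        ws.append(math.exp(-0.5 * ((x - c) / width) ** 2))
        ys.append(a * x + b)
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)
```

For data that actually lies on one line, three initial patches merge into
one, and the blended surface reproduces the line exactly; on genuinely
curved data the merge criterion stops the collapse and the influence
functions smoothly interpolate between the surviving local models.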