AI/RI JOINT SEMINAR

WHEN:    Friday, Sept 30, 1994; 3:30 pm - 5:00 pm
         Refreshments will be served starting at 3:15 pm

WHERE:   ADAMSON WING Auditorium in Baker Hall

SPEAKER: Stephen Omohundro
         International Computer Science Institute
         University of California at Berkeley

TITLE:   Model Merging for Learning and Recognition

ABSTRACT:

Many important areas of computer science would be dramatically impacted by
more powerful ways of representing, learning, and manipulating uncertain
information. Decision making under uncertainty arises directly in areas like
machine vision and speech understanding, and indirectly throughout software
and hardware engineering. As the size and richness of information systems
increase, so does the importance of learning underlying models from data
rather than building them by hand.

Many currently used learning algorithms work by adjusting the parameters of
a fixed complex model to better fit training data. Such models are often
cognitively implausible, incapable of one-shot learning, subject to
overfitting, unable to adaptively allocate resources where they are needed,
slow to learn, computationally expensive, and liable to get stuck in local
minima. We describe an approach to learning and recognition, which we call
"model merging," that aims to alleviate these problems. It is based on a
coherent underlying Bayesian semantics and is applicable to both learning
and recognition tasks. The approach constructs models whose structure adapts
to the training data, putting more modelling resources where they are needed
and where they can be reliably validated. It can build arbitrarily complex
models without overfitting.

The first application we describe is the induction of stochastic grammars
from example strings. Hidden Markov models are the simplest stochastic
grammars and are widely used in speech recognition and other application
areas. The maximum-likelihood model directly encodes the data strings.
The model merging approach starts with this model and repeatedly merges HMM
states. It does an excellent job of finding the correct underlying model
topology and outperforms standard approaches, particularly on small training
sets. We have applied this approach to building word models for speech
recognition, and the resulting system outperforms a hand-constructed one. A
similar approach works for inducing stochastic context-free grammars.

The second application we describe is learning smooth nonlinear constraint
surfaces. Such constraints arise in many geometric domains such as vision,
graphics, and robotics. In this case the local models are patches of the
constraint surface which are blended together by smooth influence functions.
During learning, surface patches may be merged, and the merged patch fit
from a more complex model class as the data warrants. The "bumptree" data
structure makes queries on such representations quite efficient. We have
used these techniques to build models of the "space of lips" in a
lip-reading application. The learned surface is used to improve the
performance of a snake-based lip tracker, to nonlinearly interpolate between
frames, and to provide relevant coordinates for later recognition stages.

Host: Andrew Moore (awm@cs.cmu.edu)
Appointment: Marie Elm (mke@cs.cmu.edu)
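The HMM induction described in the abstract — start from the
maximum-likelihood model that directly encodes the strings, then repeatedly
merge states — can be sketched in a few lines of Python. This is a minimal
illustration, not the talk's method: it assumes a greedy search over
same-symbol state pairs and a fixed log-likelihood tolerance as a crude
stand-in for the Bayesian posterior criterion, and all function names are
invented for the example.

```python
import math
from collections import Counter
from itertools import combinations

def build_ml_hmm(strings):
    """Maximum-likelihood HMM that directly encodes the data:
    one linear chain of states per (non-empty) training string."""
    emit, trans, starts = {}, {}, []
    sid = 0
    for s in strings:
        prev = None
        for ch in s:
            emit[sid] = ch
            trans[sid] = {}
            if prev is None:
                starts.append(sid)              # chain head is a start state
            else:
                trans[prev][sid] = trans[prev].get(sid, 0) + 1
            prev = sid
            sid += 1
        trans[prev]['END'] = 1                  # chain tail exits the model
    return emit, trans, starts

def log_likelihood(strings, emit, trans, starts):
    """Total log-probability of the strings (forward algorithm)."""
    start_counts, n = Counter(starts), len(starts)
    total = 0.0
    for s in strings:
        alpha = {q: c / n for q, c in start_counts.items() if emit[q] == s[0]}
        for ch in s[1:]:
            nxt = {}
            for q, p in alpha.items():
                out = sum(trans[q].values())
                for r, c in trans[q].items():
                    if r != 'END' and emit[r] == ch:
                        nxt[r] = nxt.get(r, 0.0) + p * c / out
            alpha = nxt
        prob = sum(p * trans[q].get('END', 0) / sum(trans[q].values())
                   for q, p in alpha.items())
        total += math.log(prob) if prob > 0 else float('-inf')
    return total

def merge_states(emit, trans, starts, a, b):
    """Merge state b into state a, pooling transition counts."""
    emit2 = {q: c for q, c in emit.items() if q != b}
    trans2 = {}
    for q, out in trans.items():
        if q == b:
            continue
        trans2[q] = {}
        for r, c in out.items():
            r2 = a if r == b else r
            trans2[q][r2] = trans2[q].get(r2, 0) + c
    for r, c in trans[b].items():               # b's outgoing counts move to a
        r2 = a if r == b else r
        trans2[a][r2] = trans2[a].get(r2, 0) + c
    starts2 = [a if q == b else q for q in starts]
    return emit2, trans2, starts2

def model_merge(strings, tol=1e-9):
    """Greedily merge same-symbol state pairs while the training
    log-likelihood drops by less than tol."""
    model = build_ml_hmm(strings)
    base = log_likelihood(strings, *model)
    improved = True
    while improved:
        improved = False
        emit = model[0]
        for a, b in combinations(sorted(emit), 2):
            if emit[a] != emit[b]:
                continue
            cand = merge_states(*model, a, b)
            ll = log_likelihood(strings, *cand)
            if ll >= base - tol:                # accept near-lossless merges
                model, base, improved = cand, ll, True
                break
    return model
```

On three copies of the string "ab", the six-state chain model collapses to a
single two-state model with no loss of likelihood, which is the sense in
which merging recovers a compact topology from the data-encoding model.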
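The second application — local surface patches blended by smooth influence
functions, with patches merged when the data warrants — can be illustrated in
one dimension with linear patches. The Gaussian influence form, the
mean-squared-error merge tolerance, and every function name below are
assumptions made for the sketch; the talk's bumptree indexing and richer
model classes are omitted.

```python
import math

def fit_patch(pts):
    """Least-squares line y = a*x + b through points (x, y);
    assumes the x values are not all identical."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b, pts

def mse(a, b, pts):
    """Mean squared residual of the line (a, b) on the points."""
    return sum((y - (a * x + b)) ** 2 for x, y in pts) / len(pts)

def merge_patches(patches, tol=1e-3):
    """Greedily merge adjacent patches whenever their pooled points are
    still well fit by a single line (a stand-in for refitting the merged
    patch from a richer model class as the data warrants)."""
    patches = list(patches)
    i = 0
    while i < len(patches) - 1:
        _, _, p1 = patches[i]
        _, _, p2 = patches[i + 1]
        a, b, pooled = fit_patch(p1 + p2)
        if mse(a, b, pooled) < tol:
            patches[i:i + 2] = [(a, b, pooled)]   # accept the merge
        else:
            i += 1
    return patches

def evaluate(patches, x, width=1.0):
    """Blend the patches' local predictions with Gaussian influence
    functions centred on each patch's data."""
    ws, ys = [], []
    for a, b, pts in patches:
        c = sum(px for px, _ in pts) / len(pts)   # patch centre
        ws.append(math.exp(-0.5 * ((x - c) / width) ** 2))
        ys.append(a * x + b)
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)
```

For data that actually lies on one line, three initial patches merge into
one, and the blended surface reproduces the line exactly; on genuinely
curved data the merge criterion stops the collapse and the influence
functions smoothly interpolate between the surviving local models.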