The Robot Learning Laboratory
We have informal meetings almost every Thursday at 5pm in Wean Hall room 7220.
The purpose of the meetings is to make people of the robot learning community of CMU and visitors aware of interesting papers and results that have come to the attention of the speakers, and also for the informal exchange of ideas.
Food is provided :)
If you have any questions concerning our meetings, please contact Geoff Gordon at "ggordon AT cs DOT cmu DOT edu". If you'd like to suggest a change to this web page, please contact Joelle Pineau or Max Likhachev at "maxim+ AT cs DOT cmu DOT edu".
Fall 2003
Wed 10/15/03 12.00pm NSH 1505 
Geoff Gordon 
Solving extensive-form games and playing one-card poker

Spring 2003
Here is a tentative schedule of upcoming
presentations.
Thu 07/31/03 5.00pm Wean 7220 
Drew Bagnell 
Covariant Policy Search (IJCAI'03)
Most standard policy search reinforcement learning algorithms exhibit non-covariant behavior during the learning process. That is, the "steepest descent" rule depends on the exact parameterization of the policy class we are considering, and not simply on the policy class itself. One may analyze these algorithms in information-geometric terms and discover that there is indeed a version of "steepest descent" that is natural and covariant. Although the approach we describe here is motivated by purely theoretical considerations, the resulting algorithm obeys Vapnik's maxim that "Nothing is more practical than a good theory": in practice, the resulting covariant algorithm has led to the best known performance on some benchmark RL applications and often outperforms canonical gradient methods by orders of magnitude. 
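The covariant update the abstract alludes to can be pictured as preconditioning the vanilla gradient by the inverse Fisher information matrix. A minimal sketch, not the paper's algorithm; the learning rate, gradient, and diagonal Fisher matrix below are made-up numbers for illustration:

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1):
    """One covariant update: precondition the vanilla gradient by the
    inverse Fisher information, so the step direction depends on the
    policy class rather than on its parameterization."""
    return theta + lr * np.linalg.solve(fisher, grad)

# Toy illustration: a diagonal Fisher matrix rescales each coordinate
# of the vanilla gradient (here shrinking the second coordinate).
theta = np.zeros(2)
grad = np.array([1.0, 1.0])
fisher = np.diag([1.0, 4.0])
theta_new = natural_gradient_step(theta, grad, fisher)
```

With the identity Fisher matrix this reduces to ordinary gradient ascent, which is the non-covariant special case the abstract criticizes.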
Thu 07/24/03 5.00pm Wean 7220 
Brendan McMahan 
Planning in the Presence of Cost Functions Controlled by an Adversary (ICML'03)
We investigate methods for planning in a Markov Decision Process where the cost function is chosen by an adversary after we fix our policy. As a running example, we consider a robot path planning problem where costs are influenced by sensors that an adversary places in the environment. We formulate the problem as a zero-sum matrix game where rows correspond to deterministic policies for the planning player and columns correspond to cost vectors the adversary can select. For a fixed cost vector, fast algorithms (such as value iteration) are available for solving MDPs. We develop efficient algorithms for matrix games where such best response oracles exist. We show that for our path planning problem these algorithms are at least an order of magnitude faster than direct solution of the linear programming formulation. 
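The "best response oracle" for the planning player mentioned above is just an MDP solver run for a fixed cost vector. A minimal value-iteration sketch on a hypothetical four-node deterministic graph (the edge costs are invented for illustration):

```python
INF = float("inf")

def value_iteration(edges, costs, goal, n, iters=100):
    """edges[s] -> list of successor states; costs[(s, t)] -> edge cost.
    Returns the cost-to-go V, with V[goal] = 0."""
    V = [INF] * n
    V[goal] = 0.0
    for _ in range(iters):
        for s in range(n):
            if s == goal:
                continue
            # Bellman backup: best action cost plus successor cost-to-go.
            V[s] = min(costs[(s, t)] + V[t] for t in edges[s])
    return V

# Hypothetical graph: 0 -> {1, 2}, both feeding into goal state 3.
edges = {0: [1, 2], 1: [3], 2: [3], 3: [3]}
costs = {(0, 1): 1.0, (0, 2): 4.0, (1, 3): 2.0, (2, 3): 1.0, (3, 3): 0.0}
V = value_iteration(edges, costs, goal=3, n=4)
```

In the paper's setting the adversary then picks the cost vector that is worst for the resulting plan, and the two oracles are iterated on the matrix game.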
Thu 07/10/03 5.00pm Wean 7220 
Geoff Gordon 
Tutorial on Support Vector Machines

Thu 07/03/03 5.00pm Wean 4623 
Vandi Verma 
Variable Resolution Particle Filter
Particle filters are used extensively for tracking the state of nonlinear dynamic systems. This paper presents a new particle filter that maintains samples in the state space at dynamically varying resolution for computational efficiency. Resolution within the state space varies by region, depending on the belief that the true state lies within each region. Where belief is strong, resolution is fine. Where belief is low, resolution is coarse, abstracting multiple similar states together. The resolution of the state space is dynamically updated as the belief changes. The proposed algorithm makes an explicit bias-variance tradeoff to select between maintaining samples in a biased generalization of a region of state space versus in a high-variance specialization at fine resolution. Samples are maintained at a coarser resolution when the bias introduced by the generalization to a coarse resolution is outweighed by the gain in terms of reduction in variance, and at a finer resolution when it is not. Maintaining samples in abstraction prevents potential hypotheses from being eliminated prematurely for lack of a sufficient number of particles. Empirical results show that our variable resolution particle filter requires significantly lower computation for performance comparable to a classical particle filter. 
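For contrast, here is a minimal sketch of the classical fixed-resolution bootstrap particle filter that the paper uses as its baseline; the 1-D model, noise levels, and observation sequence are all assumptions for illustration, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, observation, motion_std=0.5, obs_std=1.0):
    """One bootstrap-filter step: propagate every particle, weight by the
    observation likelihood, then resample. Every region of state space
    gets the same resolution, which is what the variable-resolution
    filter relaxes."""
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    weights = np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

particles = rng.normal(0.0, 1.0, size=500)
for z in [0.2, 0.4, 0.6]:  # made-up observation sequence
    particles = particle_filter_step(particles, z)
estimate = particles.mean()
```

The variable-resolution filter replaces the flat particle set with abstract states covering low-belief regions, spending the particle budget where belief is concentrated.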
Thu 06/19/03 5.00pm Wean 4623 
Mike Montemerlo  Thesis practice talk

Thu 05/08/03 5.00pm Wean 7220 
Vandi Verma 
Efficient Monitoring for Planetary Rovers
Planetary rovers operate in environments where human intervention is expensive, slow, unreliable, or impossible. It is therefore essential to monitor the behavior of these robots so that contingencies may be addressed before they result in catastrophic failures. This monitoring needs to be efficient since there is limited computational power available on rovers. We propose an efficient particle filter for monitoring faults that combines the Unscented Kalman Filter (UKF) [7] and the Variable Resolution Particle Filter (VRPF) [16]. We begin by using the UKF to obtain an improved proposal distribution for a particle filter which tracks discrete fault variables as part of its state space. This requires computing an unscented transform for every particle and every possible discrete transition to a fault or nominal state at each instant in time. Since there are potentially a large number of faults that may occur at any instant, this approach does not scale well. We use the VRPF to address this concern. The VRPF tracks abstract states that may represent single states or sets of states. There are many fewer transitions between states when they are represented in abstraction. We show that the VRPF in conjunction with a UKF proposal improves performance and may potentially be used in large state spaces. Experimental results show a significant improvement in efficiency. 
Thu 05/01/03 5.30pm Wean 7220 
Dieter Fox 
People Tracking and Multi-Robot Exploration
This talk consists of two parts. In the first part I will present some recent work we did on particle filter-based people tracking. In the context of tracking multiple people using a combination of anonymous and id sensors installed throughout the environment, particles represent assignments between objects and observations (similar to multi-hypothesis tracking). To estimate the location of people using GPS (outdoors) or very noisy, sparse indoor data, we project particles onto graph structures. Such a representation has two key advantages. First, fewer samples are needed to estimate a location on the graph. Second, the graph provides a discretization of continuous trajectories, thereby making the learning of long-term behaviour patterns much easier. In the second part I will describe a hierarchical Bayesian approach used for multi-robot exploration. The technique allows multiple robots to merge their maps even under complete uncertainty about their relative start locations. 
Thu 03/20/03 5.00pm Wean 7220 
Trey Smith 
Brenda Ng, Leonid Peshkin, Avi Pfeffer, "Factored Particles for Scalable Monitoring," UAI-02
This paper presents a new family of approximate monitoring algorithms that combines the best qualities of the particle filtering and Boyen-Koller methods. Our algorithms maintain an approximate representation of the belief state in the form of sets of factored particles that correspond to samples of clusters of state variables. Empirical results show that the factored particle method outperforms ordinary particle filtering. 
Thu 03/13/03 5.00pm Wean 7220 
Brendan McMahan 
I'll discuss an algorithm for online convex programming, and try to motivate it
with some interesting problems from a robotics perspective, including an online shortest path
problem. This is work being done here at CMU by Martin Zinkevich:
Martin Zinkevich, "Online Convex Programming and Generalized Infinitesimal Gradient Ascent," CMU Technical Report CMU-CS-03-110.
Convex programming involves a convex set F and a convex function c: F -> R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent. 
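Zinkevich's update itself is simple: step against the current cost gradient with a decaying step size, then project back into F. A toy sketch with a stationary quadratic cost (the cost function and interval below are made up; the real setting allows an adversarially chosen cost each round):

```python
def project(x, lo, hi):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))

def online_gradient_descent(grad, x0, lo, hi, steps):
    """Play a point, see the cost gradient, step with a 1/sqrt(t)
    learning rate, and project back into the feasible set."""
    x = x0
    for t in range(1, steps + 1):
        x = project(x - (1.0 / t ** 0.5) * grad(x), lo, hi)
    return x

# Toy stationary case: c_t(x) = (x - 3)^2 on F = [0, 5] every round.
x = online_gradient_descent(lambda x: 2.0 * (x - 3.0),
                            x0=0.0, lo=0.0, hi=5.0, steps=1000)
```

On a fixed cost this converges to the minimizer; the regret guarantee in the paper is what survives when the cost changes every round.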
Thu 03/06/03 5.00pm Wean 7220 
Joelle Pineau and Geoff Gordon 
Belief compression in POMDPs; Poisson PCA.

Thu 2/6/03 5.00pm Wean 7220 
Drew Bagnell 
H-infinity control

Thu 1/30/03 5.00pm Wean 7220 
Meeting cancelled.


Thu 1/23/03 5.00pm Wean 7220 
Allison Bruce 
Atkeson and Morimoto, "Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach", NIPS 2002. (Cont'd.)
Second half of last week's talk. 
Thu 1/16/03 5.00pm Wean 7220 
Allison Bruce 
Atkeson and Morimoto, "Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach", NIPS 2002
This paper describes a method for reinforcement learning in which policies and value functions are represented along selected trajectories. Local linear models are used to estimate the dynamics of the system. Given a trajectory, the first and second derivatives of the value function and the first derivatives of the policy can be updated in a backwards sweep that is derived from the discrete time Riccati equations. In addition to explaining this approach and describing its application to two simple tasks (pendulum swinging and hopping), I'll attempt to relate it to some of the control theory that we learned last semester. Additional reading: Atkeson, "Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming", NIPS 94. Stengel, R. F., "Optimal Control and Estimation", pp. 270-284. 
Fall 2002
Thu 12/05/02 5.00pm Wean 7220 
Geoff Gordon 
Designing a simple controller using the tricks of feedback linearization and proportional feedback.

Thu 11/28/02 5.00pm Wean 7220 
No meeting  Thanksgiving.


Thu 11/21/02 5.00pm Wean 7220 
Meeting cancelled.


Thu 11/14/02 5.00pm Wean 7220 
Rosemary EmeryMontemerlo 
Optimality Conditions for control of linear problems.
Stengel, R. F., "Optimal Control and Estimation", Section 3.4. 
Thu 11/7/02 5.00pm Wean 7220 
Andrew Stein and Dave Ferguson 
How to derive linear controllers and Matlab demo of adaptive robot arm.
[Slides&pictures(.tgz)]

Thu 10/31/02 5.00pm Wean 7220 
Mike Montemerlo 
More fun with control: Bode plots, root locus, frequency response and block diagrams.
Stengel, R. F., "Optimal Control and Estimation", Chapter 2. 
Thu 10/24/02 5.00pm Wean 7220 
Matt Rosencrantz 
How to find a deterministic optimal trajectory.
Stengel, R. F., "Optimal Control and Estimation", Sections 3.1-3.4. 
Thu 10/17/02 5.00pm Wean 7220 
Brendan McMahan 
Behavior of linear ODEs and how to solve them using Laplace transform methods.
[SLIDES (.pdf)] Stengel, R. F., "Optimal Control and Estimation", Sections 2.3, 2.5, 2.6. Example 2.6.1. 
Thu 10/10/02 5.00pm Wean 7220 
No meeting.


Thu 10/3/02 5.00pm Wean 7220 
Drago Anguelov 
Learning Object Maps from Laser Range Data
In this talk I will motivate the need for learning symbolic maps that represent the world in terms of objects. I will describe an algorithm that learns a map in terms of high-level objects (walls and doors) rather than in terms of occupancy grids. The algorithm uses a probabilistic generative model that describes the shape and motion properties of the objects in the environment. It also includes a realistic sensor model, which addresses the phenomenon of geometric occlusion and specifies how the objects in the environment generate the robot's laser range readings. I will show how the algorithm uses EM to construct a map of an office environment containing both static walls and dynamic doors. 
Thu 9/26/02 5.00pm Wean 7220 
Rahul Biswas 
A Hierarchical Non-Stationary Object Mapping Algorithm for Mobile Robots
I will present some of our recent work in mapping environments where objects change location over time. Using a set of static maps collected by a mobile robot, we learn high-fidelity shape models of both objects and classes of objects. I will discuss our techniques for leveraging multiple observations to improve models, solving the data association problem between observations and objects, and clustering objects into classes. I will also show a technique for determining how many objects and classes exist even when only a subset of objects is present in any map. 
Spring 2002
Thu 05/16/02 5.00pm Wean 7220 
Rosemary Emery and Mike Montemerlo 
ICRA redux

Thu 05/10/02 5.00pm Wean 7220 
Rosemary Emery 
TBA

Thu 05/03/02 5.00pm Wean 7220 
No meeting.


Thu 04/25/02 5.00pm Wean 7220 
Mike Montemerlo 
Michael Littman. "PROVERB: the Probabilistic Cruciverbalist"

Thu 04/09/02 5.00pm Wean 7220 
Allison Bruce 
Tim Oates, Laura Firoiu, and Paul R. Cohen. Clustering time series with hidden Markov models and dynamic time warping. In working notes of the IJCAI-99 workshop on Neural, Symbolic and Reinforcement Learning Methods for Sequence Learning, pages 17-21, 1999.
This paper presents a method for automatically determining K, the number of generating HMMs, and for learning the parameters of those HMMs. An initial estimate of K is obtained by unsupervised clustering of the time series using dynamic time warping (DTW) to assess similarity. In addition to producing an estimate of K, this process yields an initial partitioning of the data. As later sections will explain, DTW is related to HMM training algorithms but is weaker in several respects. Therefore, the clusters based on DTW are likely to contain mistakes. These initial clusters serve as input to a process that trains one HMM on each cluster and iteratively moves time series between clusters based on their likelihoods given the various HMMs. Ultimately the process converges to a final clustering of the data and a generative model (the HMMs) for each of the clusters. 
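The DTW similarity measure used for the initial clustering can be sketched as a standard dynamic program (a generic textbook version, not the authors' code):

```python
def dtw(a, b):
    """Dynamic time warping distance between two sequences, using an
    absolute-difference local cost and the usual three-way recurrence."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` is 0.0, since the warping path can stretch the repeated value at no cost; pairwise DTW distances like these feed the initial clustering, which the HMM training then refines.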
Thu 02/28/02 5.00pm Wean 7220 
Sungwoo Park 
In this talk, I'll present several pieces of work on extending the traditional framework of programming languages to incorporate probabilistic expressions, and introduce my own work in this direction, whose main idea is to use sampling functions to represent probabilistic quantities.
The traditional framework of programming languages assumes that sufficient information is available at the time of actual computation. However, it fails to model systems in which uncertainty is inherent. In order to model such systems, various probabilistic modeling languages have been developed. The problem with all these approaches is that they are primarily built upon a discrete computational framework. As a consequence, it is hard to encode continuous probabilistic quantities, which frequently arise in robotics applications, under this discrete computational framework. In this talk, I will introduce a different approach to expressing probabilistic quantities. The main idea is simple: instead of probability measures, we exploit sampling functions, that is, measurable mappings from Borel sets to sample domains. This approach brings both discrete and continuous probability distributions (and even weird distributions which belong to neither class) under a unified framework of computation. I will also discuss an application of this approach to a real robotics problem, and its drawbacks (and hopefully strengths). 
Thu 02/21/02 5.00pm Wean 7220 
Tom Minka 
Most informative projections
Visualization allows quick determination of an appropriate data model, and projection is an intuitive way to visualize highdimensional data. I will present new projection techniques for classification and regression data. In contrast to unsupervised methods like PCA, these projections deliberately try to preserve the dependence or `mutual information' between the input and output variables, thereby revealing what sort of classifier or regression function you should be using. For classification, they are typically much more informative than Fisher projection, which assumes the classes are Gaussian with equal covariance. Furthermore, with the right choice of objective function, the optimal projections can be computed quickly. 
Thu 01/31/02 5.00pm Wean 7220 
Nick Roy 
"Algorithms for Inverse Reinforcement Learning"
Andrew Ng and Stuart Russell
This paper addresses the problem of inverse reinforcement learning (IRL) in Markov decision processes, that is, the problem of extracting a reward function given observed, optimal behavior. IRL may be useful for apprenticeship learning to acquire skilled behavior, and for ascertaining the reward function being optimized by a natural system. We first characterize the set of all reward functions for which a given policy is optimal. We then derive three algorithms for IRL. The first two deal with the case where the entire policy is known; we handle tabulated reward functions on a finite state space and linear functional approximation of the reward function over a potentially infinite state space. The third algorithm deals with the more realistic case in which the policy is known only through a finite set of observed trajectories. In all cases, a key issue is degeneracy: the existence of a large set of reward functions for which the observed policy is optimal. To remove degeneracy, we suggest some natural heuristics that attempt to pick a reward function that maximally differentiates the observed policy from other, suboptimal policies. This results in an efficiently solvable linear programming formulation of the IRL problem. We demonstrate our algorithms on simple discrete/finite and continuous/infinite state problems. 
Thu 01/24/02 5.00pm Wean 7220 
David Hershberger 
Randomized Kinodynamic Planning (1999)
Steven M. LaValle, James J. Kuffner, Jr.
Abstract: This paper presents a state-space perspective on the kinodynamic planning problem, and introduces a randomized path planning technique that computes collision-free kinodynamic trajectories for high degree-of-freedom problems. By using a state-space formulation, the kinodynamic planning problem is treated as a 2n-dimensional nonholonomic planning problem, derived from an n-dimensional configuration space. The state space serves the same role as the configuration space for basic path planning; however, standard randomized path planning techniques do not directly apply to planning trajectories in the state space. We have developed a randomized planning approach that is particularly tailored to kinodynamic problems in state spaces, although it also applies to standard nonholonomic and holonomic planning problems. The basis for this approach is the construction of a tree that attempts to rapidly and uniformly explore the state space, offering benefits that are similar to those obtained by successful randomized planning methods, but applies to a much broader class of problems. Some preliminary results are discussed for an implementation that determines kinodynamic trajectories for hovercrafts and satellites in cluttered environments, resulting in state spaces of up to 12 dimensions. 
Thu 01/17/02 5.00pm Wean 7220 
Chuck Rosenberg 
Partially Labeled Classification with Markov Random Walks (2001)
Martin Szummer and Tommi Jaakkola
To classify a large number of unlabeled examples we combine a limited number of labeled examples with a Markov random walk representation over the unlabeled examples. The random walk representation exploits any low dimensional structure in the data in a robust, probabilistic manner. We develop and compare several estimation criteria/algorithms suited to this representation. This includes in particular multiway classification with an average margin criterion which permits a closed form solution. The time scale of the random walk regularizes the representation and can be set through a marginbased criterion favoring unambiguous classification. We also extend this basic regularization by adapting time scales for individual examples. We demonstrate the approach on synthetic examples and on text classification problems. 
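A simplified reading of the representation above: build a row-normalized transition matrix from kernel weights over all examples, take t steps, and score each point by where the walk tends to land among the labeled examples. The data, kernel width, and scoring rule below are illustrative assumptions, not the authors' estimation criteria:

```python
import numpy as np

def random_walk_scores(X, labeled, t=3, sigma=0.5):
    """Row-normalize Gaussian-kernel weights into a transition matrix P,
    take t steps, and score each point by the probability of reaching
    each class's labeled examples."""
    d2 = (X[:, None] - X[None, :]) ** 2
    W = np.exp(-d2 / sigma ** 2)
    P = W / W.sum(axis=1, keepdims=True)
    Pt = np.linalg.matrix_power(P, t)
    return {c: Pt[:, idx].sum(axis=1) for c, idx in labeled.items()}

# Two well-separated 1-D clusters (made-up data), one label per cluster.
X = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
scores = random_walk_scores(X, labeled={0: [0], 1: [3]})
pred = [0 if scores[0][i] > scores[1][i] else 1 for i in range(len(X))]
```

Because the walk mixes within a cluster long before it can jump across the gap, every unlabeled point inherits the label of its own cluster; the time scale t plays the regularizing role described in the abstract.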
Fall 2001
We have a draft schedule of upcoming
presentations.
Thu 11/29/01 5.00pm Wean 7220 
Vandi Verma  Risk-sensitive particle filters  We propose a particle filter that incorporates a model of cost when generating particles. The approach is motivated by the observation that the cost of accidentally not tracking hypotheses might be significant in some areas of state space, and irrelevant in others. By incorporating a cost model into particle filtering, states that are more critical to the system performance are more likely to be tracked. Automatic calculation of the cost model is implemented using an MDP value function calculation that estimates the value of tracking a particular state. Experiments in two mobile robot domains illustrate the appropriateness of the approach. 
Thu 11/15/01 5.00pm Wean 7220 
No meeting  
Thu 11/8/01 5.00pm Wean 7220 
No meeting  
Thu 11/1/01 5.00pm Wean 7220 
John Langford  True Error Bounds for Neural Networks  I will describe a technique for bounding the true error of a stochastic neural network using only training examples. This technique has been successfully tested, resulting in quantitatively meaningful true error bounds with just 100 learning examples. This is joint work with Rich Caruana. 
Thu 10/25/01 5.00pm Wean 7220 
Mike Montemerlo  Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks  
Thu 10/18/01 5.00pm Wean 7220 
Allison Bruce  How to make robots more likeable  I'll be presenting the results of an experiment that I did this fall that relates a robot's use of expressiveness and motion to its success at performing a social task. I'll also talk a little bit about some work I'm doing now to try to use HMMs to recognize simple behaviors (such as approach and avoidance) so a robot can predict how a person will respond to it and behave appropriately. 
Thu 10/11/01 5.00pm Wean 7220 
Dimitris Margaritis  A Bayesian Multiresolution Independence Test for Continuous Variables  In this paper I present a method of computing the posterior probability of conditional independence of two or more continuous variables from data, examined at several resolutions. The motivation for this kind of test is in learning the structure of Bayesian networks by algorithms that use independence tests to infer the network structure, in domains that contain any mix of continuous, ordinal and categorical variables. 
Thu 10/4/01 5.00pm Wean 7220 
No meeting  
Thu 9/27/01 5.00pm Wean 7220 
Joelle Pineau  Policy invariance under reward transformations: Theory and application to reward shaping  Paper by: Andrew Y. Ng, Daishi Harada, Stuart Russell. 
Thu 9/20/01 5.00pm Wean 7220 
Geoff Gordon  Brainstorming about Naive Bayes  
Thu 9/4/01 5.00pm Wean 7220 
Doug Aberdeen  Gradient Ascent on POMDP Policy Graphs  Two new gradient descent algorithms for learning to use finite-state controllers as memory in POMDPs. Comparisons to older methods. Heaven, Hell, and robot navigation. 
Thu 8/30/01 5.00pm Wean 7220 
Geoff Gordon  Notes on conditioning  I'll talk about a paper from last NIPS which compares gradient descent to Kalman filtering as a model of different types of conditioning in animals. Conclusion: Kalman filtering predicts actual behavior better in backward blocking experiments. 
Spring 2001
Thu 4/26/01 5.00pm Wean 7220 
Jaimie Schulte  Multiagent planning under uncertainty  
Thu 4/19/01 5.00pm Wean 7220 
Eric Klavins  Decentralized Algorithms for Cyclic Robotic Systems  
Thu 4/12/01 5.00pm Wean 7220 
Mike  MCMC  
Thu 4/5/01 5.00pm Wean 7220 
John Langford  Decision Theoretic Particle Filters  
Thu 3/29/01 5.00pm Wean 7220 
Dimitris Margaritis  UAI  
Thu 3/22/01 5.00pm Wean 7220 
Sungwoo Park  Frob  
Thu 3/15/01 5.00pm Wean 7220 
Greg Armstrong  How robots work  
Thu 3/8/01 5.00pm Wean 7220 
No Meeting  
Thu 3/1/01 5.00pm Wean 7220 
No Meeting  
Thu 2/22/01 5.00pm Wean 7220 
No meeting  
Thu 2/15/01 5.00pm Wean 7220 
Vandi Verma  Bayesian Fault Detection and Diagnosis in Dynamic Systems  
Thu 2/8/01 5.00pm Wean 7220 
Nick Roy  Active Learning  
Thu 2/1/01 5.00pm Wean 7220 
Frank Dellaert  RANSAC  I will describe RANSAC, a stochastic search algorithm that is widely used in computer vision to find a good set of correspondences in the context of motion estimation. 
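The basic RANSAC loop is easy to sketch for line fitting (a generic illustration with made-up data, not tied to the correspondence problem of the talk):

```python
import random

def ransac_line(points, iters=100, tol=1e-6):
    """Repeatedly fit a line to a random minimal sample (two points)
    and keep the hypothesis that explains the most inliers."""
    best, best_inliers = None, -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = random.sample(points, 2)
        if x1 == x2:  # vertical sample; skip this hypothesis
            continue
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        inliers = sum(abs(y - (m * x + b)) < tol for x, y in points)
        if inliers > best_inliers:
            best, best_inliers = (m, b), inliers
    return best

random.seed(0)
# Made-up data: eight points on y = 2x + 1 plus three gross outliers.
pts = [(x, 2 * x + 1) for x in range(8)] + [(1, 9), (4, -3), (6, 40)]
m, b = ransac_line(pts)
```

Any minimal sample drawn entirely from the inliers recovers the true line, which then wins the inlier vote despite the outliers; for motion estimation the same loop is run with a minimal set of correspondences instead of two points.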
Thu 1/25/01 5.00pm Wean 7220 
Tom Minka  ``Mixture Kalman filters''  and Rao-Blackwellization from ``Sequential Monte Carlo methods for Dynamic Systems'' and ``On Sequential Simulation-Based Methods for Bayesian Filtering'' 
Fall 2000
Thu 12/7/00 5.00pm Wean 7220 
Dimitris Margaritis  ``Approximate Bayes Net Inference''
I'll present a paper by Daphne Koller, Lerner and Angelov from UAI-99: "A General Algorithm for Approximate Inference and its Applications to Hybrid Bayes Nets". 
Thu 11/30/00  NIPS*2000 (no meeting)  
Thu 11/23/00  Thanksgiving break (no meeting)  
Thu 11/16/00 4.30pm Wean 7220 
Joelle Pineau  ``Hierarchical Policy Constraint for POMDP Planning and Execution''
POMDPs provide a useful framework for decision-making in the presence of uncertainty; however, finding an optimal policy is computationally infeasible for large problems. I will present a hierarchical approach to POMDPs which takes advantage of structure in the domain to find policies for complex tasks. The idea is to divide the action set along domain-specific task decompositions. A POMDP problem is thereby broken down into a hierarchically related set of smaller problems (i.e. subtasks). A value function is learned locally (i.e. related to the local set of actions) for each subtask, thereby yielding local policies. This extends the work on hierarchical reinforcement learning (Dietterich, NIPS'99) to the POMDP framework; the task decomposition and state abstraction are similar, however I propose significantly different planning and execution algorithms to ensure consistent belief states between subtasks, and to accommodate partial observability of subtask completion. This work is applied to the problem of dialogue management, for which I will present results showing that planning time was significantly reduced compared to a conventional POMDP, while the policy performed much better than when using a greedy heuristic approximation. 
Thu 11/9/00 5.00pm Wean 7220 
Alexander Gray  ``A Tutorial on Gaussian Processes''
I'll explain what Gaussian processes are, and why they are interesting. I may also talk about Gaussian process networks (which won the ICML2000 best paper award), and how to overcome the computational problem of learning Gaussian processes using exciting and fabulous new CMU algorithms! 
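The central GP regression formula, the posterior mean k(x*, X)(K + sigma^2 I)^{-1} y, fits in a few lines. A minimal sketch with an RBF kernel and made-up 1-D training data:

```python
import numpy as np

def gp_posterior_mean(X, y, Xstar, length=1.0, noise=1e-8):
    """Posterior mean of a zero-mean GP with an RBF kernel:
    m(x*) = k(x*, X) (K + noise * I)^{-1} y."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    return k(Xstar, X) @ np.linalg.solve(K, y)

# Made-up training data; with near-zero noise the posterior mean
# interpolates the targets at the training inputs.
X = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 2.0, 1.0])
mu = gp_posterior_mean(X, y, np.array([1.0]))
```

The O(n^3) linear solve here is exactly the computational bottleneck the talk promises to attack.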
Thu 11/2/00 5.00pm Wean 7220 
John Langford  ``A Tutorial on Wavelets in Statistics'' 
Thu 10/26/00 5.00pm Wean 7220 
Chuck Rosenberg  ``Semi-supervised Pattern Recognition Approaches in Vision''
An overview of two papers:
Recognition of Planar Object Classes, M.C. Burl and P. Perona, CVPR 96, San Francisco, CA, June 1996. ftp://vision.caltech.edu/pub/techreportsvision/CVPR96recog.ps.gz
Unsupervised Learning of Models for Recognition, M. Weber, M. Welling and P. Perona, ECCV 2000, Dublin, Ireland, June 2000. ftp://vision.caltech.edu/pub/techreports/ECCV00recog.ps.gz 
Thu 10/19/00 5.00pm Wean 7220 
Daniel Nikovski  ``PEGASUS Reinforcement Learning''
An overview of the ICML2000 paper by Ng and Jordan. 
Spring 2000
Thu 5/11/00 5.00pm Wean 7220 
Liam Pedersen  ``Bayes Networks on Ice: The Robotic Search for Antarctic Meteorites''
In January 2000 the Nomad robot was deployed to the ice sheets of Antarctica and made the first ever autonomous field identification of a meteorite by a robot. Nomad's rock/meteorite classifier is based on a Bayes network which addresses several issues unique to classifying rocks from a mobile robotic vehicle: the ambiguity of rock classes, the nontrivial costs of deploying sensors, noisy data, limited training examples and the need to incorporate human expert opinions. I'll also describe the Robotic Antarctic Meteorite Search project and the latest results from the recent expedition 2000 to the Elephant Moraine in Antarctica. 
Thu 5/4/00 5.00pm Wean 7220 
John Langford  ``Making Sample Complexity Bounds Quantitatively Useful'' 
Thu 4/27/00 5.00pm Wean 7220 
Nick Roy  ``Finding Approximate POMDP Solutions through Belief Compression''
Human beings take for granted their ability to move around and interact with each other in a wide range of environments. These same environments are tremendous sources of uncertainty for mobile robots. Not only does navigation introduce positional uncertainty, but systems that try to interact with human beings are faced with the tremendously noisy and ambiguous behaviours that humans exhibit. The Partially Observable Markov Decision Process (POMDP) constitutes a particular planning methodology that attempts to model uncertainty in the way that a system acts and senses the world. POMDPs are unfortunately useless for most real world problems due to the overwhelming computational complexity involved with planning in belief spaces. I propose a method for finding an approximation of the optimal POMDP policy by compressing the full belief state into statistics that represent the belief space compactly, and demonstrate the utility of this technique in the domains of mobile robot navigation and speech dialogue management. 
Thu 4/20/00 5.00pm Wean 7220 
Eric Bauer  ``Boosted Mixture Models'' 
Thu 4/13/00 5.00pm Wean 7220 
Daniel Nikovski  ``Pattern discovery via entropy minimization, parameter extinction, and deterministic annealing''
I will present three papers by Matthew Brand on several novel techniques for learning probabilistic models and pattern discovery. Entropic priors are derived from the basic premise that the world is predictable, and actually make sense, unlike most other priors people have been using. Parameter extinction is a related technique that produces sparse probability tables and might even make the learned models interpretable by humans. Finally, deterministic annealing overcomes shallow local maxima in likelihood and produces quasi-global maxima, which improves the quality of the models. The combination of these techniques results in a method for learning the structure of probabilistic models, which is much better and faster than the usual test-and-discard approach. The three papers are available online, if you want to check them out in advance: http://www.merl.com/projects/structure/index.html 
Thu 4/6/00 5.00pm Wean 7220 
Dieter Fox  ``Multi-Hypothesis Tracking''
I will present some of the work done in Stuart Russell's group on traffic surveillance. One of the key problems in this context is to detect that cars observed at different cameras are actually the same car (object identification). They apply techniques similar to those used in the data association community to solve this problem.
Suggested Paper: Hanna Pasula, Stuart Russell, Michael Ostland, and Ya'acov Ritov, Tracking many objects with many sensors. In Proc. IJCAI-99, Stockholm, 1999. 
Thu 3/23/00 5.00pm Wean 7220 
John Langford  ``Improvements in MultiRobot Localization'' 
Thu 3/16/00 5.00pm Wean 4625 
Mike Gross  ``PERSES: A Vision-Based Interactive Mobile Shopping Assistant''
Mobile robot with omnicam vision and gesture-based interaction. 
Thu 3/9/00 5.00pm Wean 7220 
Frank Dellaert  ``Structure from Motion with EM'' 
Thu 3/2/00 5.00pm Wean 7220 
Sebastian Thrun  ``SVM Tutorial'' 
Thu 2/10/00 5.00pm Wean 7220 
Joelle Pineau  ``Introduction to Hierarchical Reinforcement Learning''
Suggested Papers: Dietterich, T. G., 1998, The MAXQ method for hierarchical reinforcement learning, ICML 1998. Dayan, P. and Hinton, G. E., 1993, Feudal Reinforcement Learning, NIPS 5. Parr, R. and Russell, S., 1997, Reinforcement Learning with Hierarchies of Machines, NIPS 1997. In this talk I will present an introduction to hierarchical reinforcement learning. Hierarchical decomposition of reinforcement learning (RL) problems has been used to accelerate learning in large-scale MDP problems by exploiting structure in the problem domain. The components of the hierarchy typically consist of related subtasks with varying degrees of specialization, which can independently learn optimal policies. The papers listed above each present a different approach to hierarchical RL. I will explain the general motivation, characteristics, advantages and limitations of hierarchical RL, and will present a few examples drawn from the Dietterich (1998) and Dayan & Hinton (1993) papers. 
Thu 2/3/00 5.00pm Wean 7220 
Daniel Nikovski  ``An Introduction to Variational Methods for Graphical Models''
Paper by M. I. Jordan, Z. Ghahramani, T.S. Jaakkola and L.K. Saul. In M.I. Jordan (editor), Learning in Graphical Models, p. 105, Dordrecht: Kluwer Academic Publishers (1998). This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models. We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, showing how upper and lower bounds can be found for local probabilities, and discussing methods for extending these bounds to bounds on global probabilities of interest. Finally we return to the examples and demonstrate how variational algorithms can be formulated in each case. 
Talks from previous semesters are linked below:
ALL
1999: Spring, Fall
1998: Spring, Fall
1997: Fall