The Robot Learning Laboratory
These are the robot learning talks of previous semesters.
|Thu 12/09/99 at 5.00pm in 7220||Andreas Zell||``Paper to be announced''
|Thu 11/18/99 at 5.00pm in 7220||Vandi Verma||``Improving the Mean Field Approximation via the Use of Mixture Distributions''
Paper by T. S. Jaakkola and M. I. Jordan
In M.I. Jordan (editor) Learning in Graphical Models, p. 163, Dordrecht: Kluwer Academic Publishers (1998).
Mean field methods provide computationally efficient approximations to posterior probability distributions for graphical models. Simple mean field methods make a completely factorized approximation to the posterior, which is unlikely to be accurate when the posterior is multimodal. Indeed, if the posterior is multi-modal, only one of the modes can be captured. To improve the mean field approximation in such cases, we employ mixture models as posterior approximations, where each mixture component is a factorized distribution. We describe efficient methods for optimizing the parameters in these models.
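In symbols (my notation, not necessarily the chapter's), the simple mean field approximation and its mixture refinement are:

```latex
% Simple mean field: a single fully factorized posterior approximation
q(z) = \prod_i q_i(z_i)

% Mixture mean field: a convex combination of factorized components,
% so each component can lock onto a different posterior mode
q_{\mathrm{mix}}(z) = \sum_{m=1}^{M} \alpha_m \prod_i q_{m,i}(z_i),
\qquad \alpha_m \ge 0, \quad \sum_{m=1}^{M} \alpha_m = 1
```

Optimizing the mixing weights and the component factors tightens the variational lower bound relative to a single factorized q.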
|Thu 11/11/99 at 5.00pm in 7220||Johannes Reuter||``Mobile Robot Pose Tracking and Self-Localization Based on Low-Level Features''
J. Reuter, "Reducing Localization Errors by Scan-Based Multiple Hypothesis Tracking", CIRA'99, 6p.
J. Reuter, "Mobile Robot Self-Localization Using PDAB", ICRA'00 submission, 6p.
|Thu 11/4/99 at 5.00pm in 7220||Nick Roy||``A Hierarchical Community of Experts''
Paper by G.E. Hinton, B. Sallans and Z. Ghahramani
In M.I. Jordan (editor) Learning in Graphical Models, p. 479, Dordrecht: Kluwer Academic Publishers (1998).
We describe a directed acyclic graphical model that contains a hierarchy of linear units and a mechanism for dynamically selecting an appropriate subset of these units to model each observation. The non-linear selection mechanism is a hierarchy of binary units each of which gates the output of one of the linear units. There are no connections from linear units to binary units, so the generative model can be viewed as a logistic belief net (Neal 1992) which selects a skeleton linear model from among the available linear units. We show that Gibbs sampling can be used to learn the parameters of the linear and binary units even when the sampling is so brief that the Markov chain is far from equilibrium.
|Thu 10/28/99 at 5.00pm in 7220||José María Cañas Plaza||``Introduction to Monte Carlo Methods''
Paper by D.J.C. MacKay
In M.I. Jordan (editor) Learning in Graphical Models, p. 175, Dordrecht: Kluwer Academic Publishers (1998).
This chapter describes a sequence of Monte Carlo methods: importance sampling, rejection sampling, the Metropolis method, and Gibbs sampling. For each method, we discuss whether the method is expected to be useful for high-dimensional problems such as arise in inference with graphical models. After the methods have been described, the terminology of Markov chain Monte Carlo methods is presented. The chapter concludes with a discussion of advanced methods, including methods for reducing random walk behaviour.
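As a concrete taste of the first of these methods, here is a minimal self-normalized importance-sampling sketch; the toy target (an unnormalized standard normal) and the uniform proposal are my own choices, not the chapter's:

```python
import math
import random

def importance_estimate(f, log_p_tilde, sample_q, q_pdf, n=100_000, seed=0):
    """Self-normalized importance sampling: E_p[f] ~= sum(w_i f(x_i)) / sum(w_i),
    where w_i = p~(x_i) / q(x_i) and p~ is the unnormalized target density."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = sample_q(rng)
        w = math.exp(log_p_tilde(x)) / q_pdf(x)
        num += w * f(x)
        den += w
    return num / den

# Toy target: unnormalized standard normal; proposal: Uniform(-5, 5).
log_p = lambda x: -0.5 * x * x
est = importance_estimate(lambda x: x * x, log_p,
                          lambda rng: rng.uniform(-5.0, 5.0),
                          lambda x: 1.0 / 10.0)
# E[x^2] under a standard normal is 1, so est should be close to 1.
```

The same weight-then-normalize pattern reappears in the sequential Monte Carlo methods discussed in later talks.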
|Thu 10/21/99 at 5.00pm in 7220||Jamie Schulte||``A Mean Field Learning Algorithm for Unsupervised Neural Networks''
Paper by L. Saul and M.I. Jordan
In M.I. Jordan (editor) Learning in Graphical Models, Dordrecht: Kluwer Academic Publishers (1998).
We introduce a learning algorithm for unsupervised neural networks based on ideas from statistical mechanics. The algorithm is derived from a mean field approximation for large, layered sigmoid belief networks. We show how to (approximately) infer the statistics of these networks without resort to sampling. This is done by solving the mean field equations, which relate the statistics of each unit to those of its Markov blanket. Using these statistics as target values, the weights in the network are adapted by a local delta rule. We evaluate the strengths and weaknesses of these networks for problems in statistical pattern recognition.
|Thu 10/14/99 at 5.00pm in 7220||David Cohn||``Latent Variable Models''
Paper by C.M. Bishop
In M.I. Jordan (editor) Learning in Graphical Models, p. 371, Dordrecht: Kluwer Academic Publishers (1998).
|Thu 10/7/99 at 5.00pm in 7220||Chuck Rosenberg||``Chain Graphs and Symmetric Associations''
Paper by T. S. Richardson
In M.I. Jordan (editor) Learning in Graphical Models, p. 231, Dordrecht: Kluwer Academic Publishers (1998).
|Thu 9/30/99 at 5.00pm in 7220||Eric Bauer||``Learning Hybrid Bayesian Networks from Data''
Paper by S. Monti and G. F. Cooper
In M.I. Jordan (editor) Learning in Graphical Models, p. 521, Dordrecht: Kluwer Academic Publishers (1998).
|Thu 9/23/99 at 5.00pm in 7220||Dimitris Margaritis||``Learning Bayesian Networks with Local Structure''
Paper by N. Friedman and M. Goldszmidt
In M.I. Jordan (editor) Learning in Graphical Models, p. 421, Dordrecht: Kluwer Academic Publishers (1998).
|Thu 9/16/99 at 5.00pm in 7220||John Langford||``An Information-Theoretic Analysis of Hard and Soft Assignment Methods for Clustering''
Paper by M.J. Kearns, Y. Mansour and A.Y. Ng
In M.I. Jordan (editor) Learning in Graphical Models, p. 495, Dordrecht: Kluwer Academic Publishers (1998).
|Thu 9/9/99 at 5.00pm in 7220||Frank Dellaert||``A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants''
Paper by R.M. Neal and G. E. Hinton
The EM algorithm performs maximum likelihood estimation for data in which some variables are unobserved. We present a function that resembles negative free energy and show that the M step maximizes this function with respect to the model parameters and the E step maximizes it with respect to the distribution over the unobserved variables. From this perspective, it is easy to justify an incremental variant of the EM algorithm in which the distribution for only one of the unobserved variables is recalculated in each E step. This variant is shown empirically to give faster convergence in a mixture estimation problem. A variant of the algorithm that exploits sparse conditional distributions is also described, and a wide range of other variant algorithms are also seen to be possible.
In M.I. Jordan (editor) Learning in Graphical Models, pp. 355-368, Dordrecht: Kluwer Academic Publishers (1998).
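To make the E- and M-step alternation concrete, here is a toy EM loop for a two-component, unit-variance 1-D Gaussian mixture (my own example, not from the paper). In Neal and Hinton's view both steps increase the same free-energy-like objective, so the log-likelihood recorded below never decreases:

```python
import math
import random

def em_mixture(data, iters=50):
    """EM for a 1-D mixture of two unit-variance Gaussians.
    E step: responsibilities (the distribution over the hidden component);
    M step: re-estimate the means and mixing weight from those responsibilities."""
    mu = [min(data), max(data)]   # crude initialization
    pi = 0.5
    loglik = []
    for _ in range(iters):
        # E step: responsibility of component 0 for each point
        r = []
        ll = 0.0
        for x in data:
            a = pi * math.exp(-0.5 * (x - mu[0]) ** 2)
            b = (1 - pi) * math.exp(-0.5 * (x - mu[1]) ** 2)
            ll += math.log((a + b) / math.sqrt(2 * math.pi))
            r.append(a / (a + b))
        loglik.append(ll)
        # M step: weighted means and mixing proportion
        n0 = sum(r)
        mu[0] = sum(ri * x for ri, x in zip(r, data)) / n0
        mu[1] = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - n0)
        pi = n0 / len(data)
    return mu, pi, loglik

rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(5, 1) for _ in range(300)]
mu, pi, ll = em_mixture(data)
```

The incremental variant the paper justifies would recompute the responsibility of a single point per E step and adjust the accumulated sufficient statistics accordingly, rather than sweeping the whole data set.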
|Thu 9/2/99 at 5.00pm in 7220||Hartmut Surmann||``Learning in Robotics Using Fuzzy Control and Genetic Algorithms''
|Thu 4/22/99 at 5.00pm in 7220||Dieter Fox||``Multi-robot localization with particle filters''
This paper presents a statistical algorithm for collaborative mobile robot localization. Our approach uses a sample-based version of Markov localization, capable of localizing mobile robots in an any-time fashion. When teams of robots localize themselves in the same environment, probabilistic methods are employed to synchronize each robot's belief whenever one robot detects another. As a result, the robots localize themselves faster, maintain higher accuracy, and high-cost sensors are amortized across multiple robot platforms. The paper also describes experimental results obtained using two mobile robots, using computer vision and laser range finding for detecting each other and estimating each other's relative location. The results, obtained in an indoor office environment, illustrate drastic improvements in localization speed and accuracy when compared to conventional single-robot localization.
|Thu 4/15/99 at 5.00pm (Nick: when God intended it to be) in 7220||Hannes Kruppa||``Relative Multi-Robot Localization''
In this talk I will present a statistical algorithm for relative multi-robot localization. Relative localization requires the ability to detect other robots as well as to compute their position in terms of relative angle and distance. Mutual detection is enabled by using low-cost color cameras and algorithms from computer vision. The current detection algorithm allows image processing at frame rate and achieves high specificity. Whenever a robot is recognized, the image is further examined to compute its relative angle. The relative distance is then calculated based on the robot's laser range finder and sensor fusion. Since sensor data is inevitably noisy, a probabilistic perception model has been trained from data using maximum likelihood estimation. Upon each detection event, this enables the algorithm to capture the inherent uncertainty associated with angle and distance measurements. The algorithm has been successfully used in a collaborative variant of MCL, which Dieter is going to talk about next week.
|Thu 4/8/99 at 5.30pm in 7220||Chuck Rosenberg||``The PBIL and COMIT Optimization Algorithms''
I will discuss work done by Shumeet Baluja, Rich Caruana, and Scott Davies over the last couple of years on probabilistic algorithms for combinatorial optimization. The algorithms (PBIL and COMIT) are offered as more principled approaches than genetic algorithms for finding good solutions to these types of problems. It is interesting to re-examine this work because it has a lot in common with the recent sample-based navigation approaches taken by our group.
The papers can be found at: http://www.cs.cmu.edu/afs/cs/user/baluja/www/techreps.html#PBIL/GA
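To give a flavor of PBIL itself, here is a minimal sketch on the toy OneMax problem (maximize the number of 1 bits); the learning rate and population size are my own guesses, not values from the papers:

```python
import random

def pbil_onemax(n_bits=20, pop=30, lr=0.1, gens=150, seed=0):
    """PBIL: maintain a probability vector over bits, sample a population from
    it, and shift the vector toward the best sample each generation."""
    rng = random.Random(seed)
    p = [0.5] * n_bits
    best_fit, best = -1, None
    for _ in range(gens):
        samples = [[1 if rng.random() < pi else 0 for pi in p]
                   for _ in range(pop)]
        gen_best = max(samples, key=sum)
        if sum(gen_best) > best_fit:
            best_fit, best = sum(gen_best), gen_best
        # Move the probability vector toward the generation's best sample.
        p = [(1 - lr) * pi + lr * bi for pi, bi in zip(p, gen_best)]
    return best_fit, best

fit, sol = pbil_onemax()
```

Unlike a genetic algorithm, no population survives between generations; the probability vector is the only state that evolves.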
|Thu 4/1/99 at 5.30pm in 7220||Dimitris Margaritis||``Practical Bayesian Nets for Everybody''
It is my experience that inducing the structure of Bayesian nets does not have to be as difficult or exotic as it sounds. In this talk I will present a couple of practical algorithms for constructing Bayesian nets of restricted (but still useful) classes. The algorithms are so simple that it took me a day or so to implement them. I am hoping that by presenting them, some of you will be encouraged to use them in your own research as a means of modeling dependencies among variables when you are not sure of their structure, but have some idea of the class they may belong to.
Lastly, I would like to present a structure discovery algorithm for general BNs that I have been working on recently, and invite any comments and criticism that you may have.
|Thu 3/25/99 at 5.30pm in 7220||Nicholas Roy||``"Robo-Soar: An integration of external interaction, planning and learning using Soar" by Laird et al.''
I plan to give sort of a general overview of Soar, and then talk specifically (but still informally!) about the results of this paper. Hopefully this will start a discussion about why general symbolic AI architectures and Soar in particular appear to be not so great when it comes time to interact with the real world, and whether in fact this appearance is justified or whether we all should keep on feeling pleased with our current approach to interacting with the real world.
|Thu 3/18/99 at 5.30pm in 7220||Frank Dellaert||``Subdivision Curves for Large Scale
'Subdivision curves' are one of these immensely cool things one hopes to use someday, just for the sheer fun of it. In today's talk, I will explain what they are and how to use them, and I will make the connection with wavelets and multi-resolution curves. The main point of the talk is to show that they might actually be useful to solve large scale mapping problems, as encountered by me (mosaics), and Joelle (evidence grids), using the mathematical framework of multigrid relaxation methods (which I will *not* go into at great length).
|Thu 3/11/99 at 5.30pm in 7220||Joelle Pineau||``Introduction to Fuzzy Logic and a Fuzzy
I will be giving an introduction to mysteries of fuzzy logic, its origins, purpose, and a quick tutorial on how to apply it to a basic control problem. This will be followed by a discussion of a section of the paper: "Real-time map building and navigation for autonomous robots in unknown environments" (G. Oriolo, G. Ulivi and M. Vendittelli, 1998 - see abstract below). More specifically, I'll be presenting their mapping algorithm, which uses fuzzy sets to generate grid-based maps of the environment. A fuzzy logic vs probability theory discussion may follow depending on time availability and group interest.
|Thu 2/18/99 at 5.30pm in 7220||Jamieson E Schulte||``Spontaneous, Short-Term Interaction with Mobile Robots''
Human-robot interaction has been identified as one of the major open research directions in mobile robotics. This talk considers a specific type of interaction: short-term and spontaneous interaction between a robot and crowds of people. Such patterns of interaction are found when service robots operate in public places, including information kiosk, receptionist, and tour-guide applications. I will describe our approach to this type of interaction: a robot designed to be a believable social agent. For this and future robotic excursions, I would also like to ask, "what role can learning play in human-robot interaction?"
|Thu 2/11/99 at 5.30pm in 7220||Frank Dellaert||``Bayesian Statistics without Tears''
I'll discuss the paper 'Bayesian Statistics without Tears: A Sampling-Resampling perspective' by Smith and Gelfand. This is an excellent and short paper that, amongst other topics, describes SIR (sampling-importance resampling), the basis for the Condensation algorithm and Monte Carlo Localization.
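The sampling-resampling idea fits in a few lines. Here is a hedged sketch on a conjugate toy problem where the exact posterior is known (my example, not Smith and Gelfand's): a N(0, 2^2) prior on theta and one observation y = 1 with unit-variance Gaussian noise, so the true posterior mean is 0.8.

```python
import math
import random

def sir_posterior(prior_draw, likelihood, n=50_000, seed=0):
    """Sampling-importance resampling: draw from the prior, weight each draw
    by its likelihood, then resample in proportion to the weights; the
    resampled set is approximately distributed according to the posterior."""
    rng = random.Random(seed)
    thetas = [prior_draw(rng) for _ in range(n)]
    weights = [likelihood(t) for t in thetas]
    return rng.choices(thetas, weights=weights, k=n)

y = 1.0
post = sir_posterior(lambda rng: rng.gauss(0.0, 2.0),
                     lambda t: math.exp(-0.5 * (y - t) ** 2))
post_mean = sum(post) / len(post)
# Conjugate-normal answer: posterior mean = y * 4 / (4 + 1) = 0.8
```

This weight-and-resample step, applied repeatedly over time, is exactly the engine inside Condensation and Monte Carlo Localization.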
|Thu 2/4/99 at 5.30pm in 7220||Frank Dellaert||``Vision-Based Monte Carlo Localization using Large Scale Ceiling Mosaics''
Monte Carlo Localization (MCL) is a revolutionary new method to solve the mobile robot localization problem, based on a particle filter (known in the CV community as the Condensation algorithm). It is easy to implement, yet is an order of magnitude faster than comparable robot localization methods, while being more accurate. In my talk I will discuss how to use MCL in conjunction with large scale ceiling mosaics and a simple vision based sensor. The first half of the talk will focus on building these mosaics from a large set of noisy and distorted images, gathered by the robot. In the second half I discuss how this mosaic is used in the MCL framework, and I present experimental results from the Minerva robotic tour-guide project.
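For readers meeting MCL for the first time, a stripped-down 1-D particle filter conveys the idea; the corridor world, range-sensor model, and noise levels below are invented for illustration (the actual system uses ceiling mosaics and a vision sensor):

```python
import math
import random

def mcl_step(particles, control, z, rng, motion_noise=0.2, sensor_noise=0.3):
    """One MCL update on a 1-D corridor: move each particle by the commanded
    control plus noise (prediction), weight it by the likelihood of the range
    measurement z (a noisy reading of position), then resample."""
    moved = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    weights = [math.exp(-0.5 * ((z - p) / sensor_noise) ** 2) for p in moved]
    return rng.choices(moved, weights=weights, k=len(moved))

rng = random.Random(2)
true_pos = 3.0
particles = [rng.uniform(0.0, 10.0) for _ in range(2000)]  # global uncertainty
for _ in range(6):
    true_pos += 1.0                       # robot drives one unit forward
    z = true_pos + rng.gauss(0.0, 0.3)    # noisy position reading
    particles = mcl_step(particles, 1.0, z, rng)
estimate = sum(particles) / len(particles)
```

Starting from a uniform particle cloud (complete uncertainty), a handful of motion-and-measurement updates concentrates the particles around the true position.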
|Thu 1/28/99 at 5.30pm in 7220||Dieter Fox||``Monte Carlo Localization''
Abstract not available.
|Thu 1/21/99 at 5.30pm in 7220||Nick Roy||``Topological Mapping''
This is going to be a fairly informal session where I will present the (relatively classic) paper by Kuipers & Byun on the merits of topological mapping. I will argue for the virtues of topological maps vs. geometric maps, hopefully riling Sebastian, Dieter & Wolfram enough to argue back. A copy of the paper is at: file://ftp.cs.utexas.edu/pub/qsim/papers/Kuipers+Byun-jras-91.ps.Z
|Thu 12/3/98 at 5pm in 7220||John Langford||``Probabilistic Inequalities''
Fun ways to play with them.
|Thu 11/19/98 at 5pm in 7220||Daniel Nikovski||``Learning Bayes Nets with hidden state''
How to learn a Bayes net with hidden state.
|Thu 11/12/98 at 5pm in 7220||Daniel Nikovski||``Bayes Nets with hidden state''
Today's talk will be the second part of the tutorial on reasoning and learning with Bayesian nets. Last week Dimitris talked about prior distributions and learning parameters and structure when all variables are completely observable. I will start out with what was left of last week's agenda - the message-passing and join-tree algorithms, and then go on in the following directions:
|Thu 11/5/98 at 5pm in 7220||Dimitris Margaritis||``Bayes Nets''
I will cover learning Bayes nets with fully observable variables, as well as present an algorithm for exact Bayesian inference on a BN.
|Thu 10/29/98 at 5pm in 7220||Nicholas Roy||``Coastal Navigation''
I'll be talking about a motion planner developed for Minerva, the all-singing, all-dancing tour-guide robot that operated in the Smithsonian this summer. Many of the challenges that arose from the Minerva project were a result of the environment; many large, open spaces and way too many little kids resulted in an uncomfortably high probability that Minerva might periodically get lost. Thus, coastal navigation was born, to reduce the likelihood that Minerva would lose her positional knowledge and emit an irritating whine, "I'm confused". For those distressed by the non-technical nature of this abstract, I say "Partially Observable Markov Decision Processes". http://gs135.sp.cs.cmu.edu/papers/icra99.ps.gz
|Thu 10/22/98 at 5pm in 7220||John Langford||``Microchoice Learning Bounds''
I will present bounds which take advantage of extra degrees of freedom to provide bounds which are better for some hypotheses than for others. These bounds can be applied to the structure of many existing learning algorithms in order to (sometimes) justify tighter bounds than the standard PAC bounds.
|Thu 10/15/98 at 5pm in 7220||Daniel Nikovski||``System Identification for Reinforcement
I will try to give a short tutorial on system identification (SI) algorithms, with an emphasis on their possible use in reinforcement learning. SI is a mature discipline that in fact subsumes connectionist modeling, without being limited to it only. I will review the classical SI techniques such as ARIMA and ARMAX, their recursive on-line modifications, the instrumental variable method, and then discuss in more detail subspace-based methods. The example application of subspace-based system identification will be on learning local visual models for robotic soccer.
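As a taste of the classical end of this toolbox, here is a least-squares fit of a first-order ARX model, y[t] = a*y[t-1] + b*u[t-1] + e[t], on synthetic data (the parameter values are my own toy choices, not from the talk):

```python
import random

def fit_arx1(y, u):
    """Least-squares fit of y[t] = a*y[t-1] + b*u[t-1] + e[t].
    Solves the 2x2 normal equations directly."""
    # Accumulate X^T X and X^T Y for the regressors (y[t-1], u[t-1]).
    s_yy = s_yu = s_uu = s_yY = s_uY = 0.0
    for t in range(1, len(y)):
        s_yy += y[t-1] * y[t-1]
        s_yu += y[t-1] * u[t-1]
        s_uu += u[t-1] * u[t-1]
        s_yY += y[t-1] * y[t]
        s_uY += u[t-1] * y[t]
    det = s_yy * s_uu - s_yu * s_yu
    a = (s_uu * s_yY - s_yu * s_uY) / det
    b = (s_yy * s_uY - s_yu * s_yY) / det
    return a, b

rng = random.Random(0)
a_true, b_true = 0.8, 0.5
u = [rng.gauss(0, 1) for _ in range(2000)]     # known input signal
y = [0.0]
for t in range(1, 2000):
    y.append(a_true * y[t-1] + b_true * u[t-1] + rng.gauss(0, 0.05))
a, b = fit_arx1(y, u)
```

The recursive on-line variants mentioned above update these same accumulated sums one sample at a time instead of in batch.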
|Thu 10/8/98 at noon in 4603||Steffen Gutmann||``The CS Freiburg Robotic Soccer Team''
Robotic soccer is a challenging research domain because problems in robotics, artificial intelligence, multi-agent systems and real-time reasoning have to be solved in order to create a successful team of robotic soccer players. This talk presents the architecture and key components of the CS Freiburg team. The focus is on the self-localization and object recognition methods based on laser range finders, and the integration of all this information into a common team world model. Using the explicit model of the environment built by these components, path planning, simple ball handling skills and basic multi-agent cooperation have been implemented. The talk concludes by pointing out the key components of the CS Freiburg team that led to the success at RoboCup'98, where CS Freiburg won the competition in the middle size league. Further information at: http://www.informatik.uni-freiburg.de/~gutmann/ and http://www.informatik.uni-freiburg.de/~ki/forschung/robocup.html
|Thu 5/28/98||Tom Mitchell||``Co-training: Learning from Labelled and Unlabelled Data''
Brain teaser: Suppose you have the task of learning to classify web pages (e.g., as academic course pages, student home pages, etc.). I give you a small set of labeled training example pages (positive and negative). I also give you a large number of unlabeled pages (e.g., 300,000,000 of them). Question: what learning algorithm can you think of that will use the unlabeled pages to boost the accuracy of your learned classifier?
That's what I'll talk about. If you'd like to check it out in advance, then see the paper on Co-Training at http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/colt98_final.ps
|Thu 5/14/98||Peter Stone||``Team-Partitioned, Opaque-Transition Reinforcement Learning (TPOT-RL)''
In this talk, I present a novel multi-agent learning paradigm called team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL introduces the concept of using action-dependent features to generalize the state space. In our work, we use a learned action-dependent feature space. TPOT-RL is an effective technique to allow a team of agents to learn to cooperate towards the achievement of a specific goal. It is an adaptation of traditional RL methods that is applicable in complex, non-Markovian, multi-agent domains with large state spaces and limited training opportunities. Multi-agent scenarios are opaque-transition, as team members are not always in full communication with one another and adversaries may affect the environment. Hence, each learner cannot rely on having knowledge of future state transitions after acting in the world. TPOT-RL enables teams of agents to learn effective policies with very few training examples even in the face of a large state space with large amounts of hidden state. The main responsible features are: dividing the learning task among team members, using a very coarse, action-dependent feature space, and allowing agents to gather reinforcement directly from observation of the environment. TPOT-RL is fully implemented and has been tested in the robotic soccer domain, a complex, multi-agent framework. This talk presents the algorithmic details of TPOT-RL as well as empirical results demonstrating the effectiveness of the developed multi-agent learning approach with learned features.
There is a link to a paper of the same title at http://www.cs.cmu.edu/~pstone/pstone-papers.html.
|Thu 5/7/98||Reid Simmons||``Task-Level Robot Control: Languages and Tools''
Years of experience with mobile robots has taught the research community a few things about what architectures, data structures and algorithms are useful for robot systems. Unfortunately, in many cases these insights remain beyond the reach of most roboticists, mainly due to the difficulty of understanding and re-implementing the solutions. To help promulgate such results, we advocate the development of general robot-programming languages and support tools that embed concepts that robotics researchers have found useful. In particular, we are working on a "Task Definition Language" that contains explicit support for robot executive functions (concurrency, task decomposition, resource management, monitoring and exception handling). This talk will introduce the language and the design and analysis tools we are building to support the language.
|Thu 4/30/98||Chuck Rosenberg||``Behavior Based Robot
I will discuss two behavior-controlled (à la Brooks) robot systems I worked on at IS Robotics, Inc. The first is named Kaa. Kaa is a small autonomous snake-like robot. It has twelve redundant degrees of freedom, all in a single plane. A behavior control system implemented basic mobility and grasping algorithms for this system. The second robot I will discuss is named IT. The IT robot consists of a vaguely human-shaped head with an expressive face and is meant to interact with people via simulated emotional responses. When specific changes to the environment are sensed, the behavior control system changes the system's internal "emotional state", which in turn changes IT's behavior. For example, when it got dark, IT would become "sad" and eventually "cry".
|Thu 4/23/98||Daniel Nikovski||``Visual Servo Control for
I will present Greg Hager's work with Chris Rasmussen on robot navigation using visual servoing on image sequences. Their approach to robot localization is based on the idea that a location is defined by the visual features seen by the robot. Scenes with the same features are grouped in the same equivalence class and this division into classes is used to build a map of the environment in the form of a graph. Navigating between two locations is performed by planning a sequence of node traversals in the graph and executing the traversals by means of visual servoing and handoffs between segments of the trajectory. I will discuss the features used for servo control, and time permitting, other types and applications of visual servoing.
The corresponding paper is: Rasmussen, C. and Hager, G. D. (1996). Robot navigation using image sequences. Proceedings of the AAAI Conference on Artificial Intelligence, pp. 938-943.
A general introduction to visual servoing is available online too: Hutchinson, S., Hager, G. D., and Corke, P. (1996). A tutorial on visual servo control. IEEE Transactions on Robotics and Automation, 12(5), pp. 651-670.
|Thu 4/16/98||Jamieson Schulte||``Teleo-Reactive Agent Control''
I will describe the Teleo-Reactive (T-R) agent control formalism developed by Nils Nilsson at the Stanford University Robotics Laboratory. A T-R tree is a mapping from states to actions that resembles a control algorithm implemented in an electronic circuit in that the state is continually evaluated based on sensor inputs. With a careful ordering of states within the tree, useful sequences of actions are possible despite the low-level state-action mapping and subsumptive architecture. To ensure that there is some robot learning content, I will summarize some progress that has been made in supervised learning of T-R sequences.
After presenting T-R trees and some experimental results, I will encourage group discussion about the applicability of this approach to different types of robot tasks.
Much of my talk will be based on the paper: Nilsson, N. (1994). "Teleo-Reactive Programs for Agent Control", Journal of Artificial Intelligence Research, Volume 1, pages 139-158.
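In essence, a T-R program is an ordered list of condition-action rules that is continually re-scanned, firing the first rule whose condition holds. A minimal sketch (the toy docking task and step sizes are mine, not Nilsson's):

```python
def tr_step(rules, state):
    """Teleo-reactive step: fire the action of the first rule (in priority
    order) whose condition holds in the current state."""
    for condition, action in rules:
        if condition(state):
            return action(state)
    raise RuntimeError("no applicable rule")

# Toy docking task on a line: reach the dock at x = 0 within 0.1.
rules = [
    (lambda s: abs(s["x"]) < 0.1, lambda s: s),                              # goal reached: idle
    (lambda s: s["x"] > 0,        lambda s: {"x": max(0.0, s["x"] - 0.5)}),  # step left
    (lambda s: True,              lambda s: {"x": min(0.0, s["x"] + 0.5)}),  # step right
]

state = {"x": 2.3}
for _ in range(20):   # the continual re-evaluation loop
    state = tr_step(rules, state)
```

Because the rules are re-evaluated on every cycle rather than executed as a fixed plan, the agent recovers automatically if an external disturbance moves it back up the list.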
|Thu 4/9/98||Joseph O'Sullivan||``Specifying Transferable Knowledge in Lifelong Learning Agents''
I'll begin with a short summary of results from my research into transfer mechanisms for lifelong learning agents. This should provide a basis for discussing my current research direction: improving lifelong learning agents by explicitly specifying the knowledge to be transferred. Consider that when you are asked to learn a task, you bring to bear specific knowledge related to the type of task being learned. As a simple example, if I asked you to learn doors in Wean Hall, you would already know to discount changes in light. The questions I'd like to raise are: what are such types of knowledge, and how can they be used by a lifelong learning agent?
|Thu 4/2/98||Wolfram Burgard||``Integrating Global Position Estimation and Position Tracking for Mobile Robots: The Dynamic Markov Localization Approach''
Localization is one of the fundamental problems of mobile robots. In order to efficiently perform useful tasks such as office delivery, mobile robots must know their position in their environment. Existing approaches can be distinguished according to the type of localization problem they are designed to solve. Tracking techniques aim at monitoring the robot's position; they assume that the position is initially known and that the robot does not lose track of it. Global localization techniques, on the other hand, are able to estimate the robot's position starting with complete uncertainty. In this paper we present the dynamic Markov localization technique as a uniform approach to position estimation, which is able to (1) globally estimate the position of the robot, (2) efficiently track its position whenever the robot's certainty is high, and (3) detect and recover from localization failures. The approach has been implemented and intensively tested in real-world environments. We present several experiments illustrating the strength of our method.
|Thu 3/19/98||John Langford||``Probabilistic Graphplan''
I'll talk about the (graph) representation which graphplan uses. Then I'll talk about some probabilistic extensions Avrim and I have worked out for graphplan which allow planning in MDP domains. I'll come up with some examples to illustrate the usage of graphplan.
|Thu 3/5/98||Frank Dellaert||``The Kalman Filter, Part II: Kalman Returns''
This week I will continue, as requested, the introduction to the Kalman filter. I hope to have time to discuss at least one real-world example.
|Thu 2/26/98||Frank Dellaert||``That Magical Tool: The Kalman Filter''
Taking the hint from last time, I'll give an overview of the magical tool named the Kalman filter. I will illustrate its usefulness with three scenarios where I have applied it.
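For those who missed the talk, the scalar case fits in a dozen lines; this sketch estimates a constant from noisy readings (the numbers are illustrative only, not from the talk):

```python
import random

def kalman_1d(measurements, r=1.0, q=0.0, x0=0.0, p0=100.0):
    """Scalar Kalman filter: predict (add process noise q to the variance),
    then update (blend prediction and measurement via the Kalman gain)."""
    x, p = x0, p0
    for z in measurements:
        p = p + q                 # predict: state assumed (nearly) constant
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # update: move toward the measurement
        p = (1 - k) * p           # updated uncertainty
    return x, p

rng = random.Random(3)
true_x = 3.0
zs = [true_x + rng.gauss(0.0, 1.0) for _ in range(100)]
est, var = kalman_1d(zs)
```

With a nonzero process noise q, the identical loop tracks a slowly drifting state instead of a constant; the vector case replaces the scalars with matrices but keeps the same predict/update rhythm.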
|Thu 2/19/98||Nick Roy||``Online Self-Calibration For Mobile Robots''
I will be talking about work that Sebastian and I have been doing on developing a probabilistic model of the kinematics of a mobile robot. The goal is to parameterise the motion model in order to correct position estimates given by odometry, or dead-reckoning. In contrast to previous approaches, which require explicit measurements of actual motion when calibrating a robot, the algorithm proposed here uses the robot's sensors to automatically calibrate the robot as it operates.
I'll discuss the formalism, show some experimental results, and be fairly laid back about the whole thing.
|Thu 11/20/97||Chuck Rosenberg||``Special Purpose GP's for Evolving Artificial
I will focus on describing the work of Frederic Gruau. His research has centered on using genetic search to generate neural network solutions to specific problems instead of using more traditional techniques like backpropagation. This, in and of itself, is not new. What is unique about his work is that the genetic search evolves a special-purpose genetic program which, when run, outputs a neural network. The evolved genetic program encodes a developmental process, very loosely analogous to a biological developmental process, which can construct a neural network including its weights and general architecture. This can be contrasted with a genetic algorithm approach which might encode the connection weights in a network as a matrix of values.
His claim is that by the choice of genetic program operators which allow for the construction of neural networks with highly regular architectures, solutions to problems can be found faster than other systems which evolve neural networks. Indeed, in the case of the solutions for the problems Gruau presents in his papers, the neural networks are quite regular in their architecture.
He has successfully used this system to evolve solutions which can solve n-bit parity and n-bit symmetry in the general case, as well as controllers for a simulated six-legged walking robot and a real eight-legged walking robot. He has also experimented with the Baldwin effect and Lamarckian learning.
The three papers I will discuss are:
F. Gruau. Automatic Definition of Modular Neural Networks. Adaptive Behavior V3N2, pages 151-183, 1995. MIT press. Available online at: http://www.cwi.nl/~gruau/gruau/AB.ps.Z
F. Gruau and Quatramaran. Cellular Encoding for Interactive Evolutionary Robotics. ECAL97, pages ???, MIT Press, 1997. Available online at: http://www.cwi.nl/~gruau/gruau/e.ps.gz
F. Gruau and D. Whitley. The cellular development of neural networks: the interaction of learning and evolution. Research report RR93-04, Laboratoire de l'Informatique du Parallelisme, Ecole Normale Superieure de Lyon, 1993. Available online at: ftp://ftp.ens-lyon.fr/pub/LIP/Rapports/RR/RR93/RR93-04.ps.Z
If you are only going to read one paper, I would suggest reading the first paper, which gives a good general overview of his technique.
|Thu 11/5/97||Daniel Nikovski||``Learning Discrete Dynamical Systems''
Learning a model of the world is a fundamental task for intelligent autonomous agents. In the field of automatic control, system identification methods have been used with much success. These methods, though, seem to have had little impact on AI systems, which traditionally model the world with discrete, propositional variables without uncertainty. The relatively recent realization that planning under uncertainty and optimal control are essentially the same thing has generated a lot of interest in decision-theoretic planning, POMDPs, and system identification of symbolic dynamics. I will discuss some representative work on learning discrete dynamical systems, coming out of Brown University. The paper is:
Basye, K., Dean, T., and Kaelbling, L. (1995). "Learning Dynamics: System Identification for Perceptually Challenged Agents", Artificial Intelligence, Volume 72, Pages 139-171, available from http://www.cs.brown.edu/people/tld/postscript/BasyeDeanandKaelblingAIJ-95.ps.
The authors' approach can be summarized as follows. Discrete dynamical systems are modeled with deterministic finite-state automata. Uncertainty is not inherent in the system, but comes from the inability to observe states and state transitions correctly. The number of states, actions, and percepts is known to the learner. Under these assumptions, the authors present algorithms for the cases of fully deterministic dynamics, deterministic actions and uncertain perception, stochastic actions and fully observable states, and finally, for the case when both action and perception are stochastic. The algorithms are applied to the task of learning large maps of the environment of mobile robots. I will discuss the positive and negative features of those algorithms as regards their applicability to decision-theoretic planning and control.
|Thu 10/30/97||Peter Stone||I will present some robot learning work out of Minoru Asada's lab at
Osaka University. I will cover a few innovations in autonomous
control using reinforcement learning, including learning from easy
missions, state/action space construction, and combining independently
learned behaviors. Time permitting, I will also briefly touch on
their work in adaptive visual servoing.
The Asada lab homepage is at http://mike.ccm.eng.osaka-u.ac.jp/Welcome-eg.html
|Thu 10/23/97||Karen Zita Haigh||``Cost-sensitive Robot Learning''
Robot learning aims to improve the connection of a robot's perception to action. However, the majority of work in robot learning has not dealt realistically with perception and action. A common assumption is that sensors provide a complete picture of the external world at no cost and actions can be executed instantly. In reality, robots only have limited resources. There are always costs (e.g., sensing or moving speed) associated with perception and action. This dissertation is a study of cost-sensitive robot learning that considers such costs explicitly. The main thesis is that a learning robot should focus on both learning accuracy and learning efficiency and that it is possible to gradually improve the former without sacrificing the latter.
Specifically, this dissertation describes a cost-sensitive learning system called CSL that learns to perform an approach-recognize task for a real robot. Given a set of unknown objects and the speeds of sensors and actions, CSL learns where to sense, which sensor to use, and which action to apply. As a result, it learns to approach, recognize, and grasp objects efficiently. CSL integrates robot learning with robot control to handle sensing noise and grasping failures and to balance efficiency and effectiveness. Compared with traditional learning approaches, in experiments involving 15 training objects and 11 test objects, CSL is an order of magnitude faster during training and at least 30% faster during testing. Its overall success rate on novel objects is over 80% and many failures can be eliminated through incremental learning.
This dissertation also presents a unified framework for cost-sensitive learning of classification knowledge from examples and constructing examples from scratch. Within this framework, various cost-sensitive learning methods can be instantiated. This dissertation develops two such cost-sensitive learning methods: a decision-tree-based method and an instance-based method, and it evaluates the tradeoff between their learning efficiency and learning accuracy.
This dissertation demonstrates the generality of cost-sensitive learning in three ways. First, it uses cost-sensitive learning to accomplish the approach-recognize task. Second, it applies cost-sensitive learning to construct a task-dependent internal representation for use by a reinforcement learning method as it learns a decision policy. Experimental results confirm the efficiency of the overall approach.
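One standard way to make a decision-tree learner cost-sensitive, in the spirit of the dissertation's decision-tree-based method, is to divide a feature's information gain by its sensing cost, so cheap sensors are preferred when they are nearly as informative. This is a generic sketch of that idea, not CSL's actual criterion; the feature names and costs below are made up.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def cost_sensitive_gain(examples, labels, feature, cost):
    """Information gain of splitting on `feature`, divided by the cost of
    measuring it.  A high ratio means a feature is informative per unit
    of sensing expense."""
    base = entropy(labels)
    n = len(labels)
    remainder = 0.0
    for v in set(e[feature] for e in examples):
        subset = [l for e, l in zip(examples, labels) if e[feature] == v]
        remainder += len(subset) / n * entropy(subset)
    return (base - remainder) / cost

# Hypothetical objects: 'shape' (slow sensor) predicts the label perfectly,
# 'color' (fast sensor) is uninformative.
examples = [{'shape': 0, 'color': 0}, {'shape': 1, 'color': 0},
            {'shape': 0, 'color': 1}, {'shape': 1, 'color': 1}]
labels = [0, 1, 0, 1]
g_shape = cost_sensitive_gain(examples, labels, 'shape', cost=2.0)
g_color = cost_sensitive_gain(examples, labels, 'color', cost=1.0)
```

Here `shape` still wins (gain 1.0 bit at cost 2.0 beats gain 0 at cost 1.0), but a cheap feature that captured most of the same information would overtake it, which is exactly the accuracy/efficiency tradeoff the dissertation studies.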
Talk summary (postscript gzipped)
|Thu 10/16/97||Joseph O'Sullivan||``Learning from Demonstration''
Paper (postscript gzipped)
|Thu 10/9/97||David Andre||``What has GP done for us lately?''
Genetic programming is a machine learning technique for automatically generating computer programs to solve a given task by means of simulated evolution. Although GP has only been around in force since the early 1990s, it has been successfully applied to a wide variety of problems. Starting with simple programmatic ingredients and a means for testing the fitness of programs, GP breeds a population of computer programs using artificial analogs of the natural processes of crossover and mutation. GP has been used to solve 'benchmark' problems from the fields of AI, machine learning, neural networks, control theory, game theory, molecular biology, classification, and system identification. In recent years (1995-1997), GP has been used to create solutions to several problems that rival solutions created by humans. These success stories include evolving electrical circuits, solving hard molecular biology problems, theorem proving, automatic parallel programming, evolving neural networks, and control.
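The core GP loop fits in a few lines. The following is a toy sketch, not any published system: it evolves arithmetic expression trees toward the target function x² + x using mutation and elitist selection (crossover is omitted for brevity). Every parameter here (population size, depths, rates) is an illustrative choice.

```python
import operator
import random

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def random_tree(depth=3):
    """Grow a random expression tree over {x, small ints, +, -, *}."""
    if depth <= 0 or random.random() < 0.3:
        return random.choice(['x', random.randint(-2, 2)])
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree):
    # Sum of squared errors against the target x**2 + x (lower is better).
    return sum((evaluate(tree, x) - (x * x + x)) ** 2 for x in range(-5, 6))

def mutate(tree, depth=2):
    # With small probability replace this subtree with a fresh random one.
    if random.random() < 0.2:
        return random_tree(depth)
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    return (op, mutate(left, depth - 1), mutate(right, depth - 1))

random.seed(0)
pop = [random_tree() for _ in range(60)]
for gen in range(40):
    pop.sort(key=fitness)
    survivors = pop[:20]                     # elitism: keep the best 20
    pop = survivors + [mutate(random.choice(survivors)) for _ in range(40)]
best = min(pop, key=fitness)
```

Real GP systems additionally use subtree crossover between parents, richer primitive sets, and parsimony pressure against bloat, but the breed-evaluate-select cycle above is the whole idea.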
The goal of this talk is to introduce the concepts and ideas of GP and evolutionary computation, place GP within the field of machine learning, and show its relevance to both AI and robot learning. The ``fitness'' of GP as a tool will be discussed, as will the areas where we should all still be skeptical about GP.