
Summary of Research prior to 1993

Individual systems have been developed that explore various facets of creating a learning robot agent.

Learning, planning, even simply reacting to events, is difficult without reliable knowledge of the outcomes of actions. The usual approach of hand-programming ``what actions do'' is both inefficient and often ineffectual. Christiansen analyzed the conditions under which robotic systems can learn action models that allow automated planning and successful execution of strategies. He developed systems that generate action models consisting of sets of funnels, where each funnel maps a region of task action space to a reduced region of the state space. He demonstrated that such funnels can be acquired for continuous tasks using negligible prior knowledge, and that a simple planner is then sufficient to generate plans, provided the learning mechanism is robust to noise and non-determinism and the planner can reason about the reliability associated with each action model.
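
To make the funnel representation concrete, the sketch below (in Python) models each action as a funnel that maps a precondition interval of a one-dimensional state space into a smaller result interval, and a toy planner chains funnels while tracking the product of their reliabilities. The interval state space, the Funnel and plan names, and the brute-force search are illustrative assumptions, not Christiansen's actual formulation.

    from dataclasses import dataclass
    from itertools import permutations

    @dataclass
    class Funnel:
        name: str
        pre: tuple          # region of state space where the action applies: (lo, hi)
        post: tuple         # reduced region the action funnels the state into: (lo, hi)
        reliability: float  # empirically estimated success rate of this action model

    def contains(outer, inner):
        """True if the interval `inner` lies entirely within `outer`."""
        return outer[0] <= inner[0] and inner[1] <= outer[1]

    def plan(funnels, start, goal):
        """Brute-force search over funnel chains from the start region to the
        goal region, preferring the chain with the highest product of
        reliabilities."""
        best = None
        for k in range(1, len(funnels) + 1):
            for seq in permutations(funnels, k):
                region, ok, rel = start, True, 1.0
                for f in seq:
                    if not contains(f.pre, region):
                        ok = False
                        break
                    region, rel = f.post, rel * f.reliability
                if ok and contains(goal, region) and (best is None or rel > best[1]):
                    best = (seq, rel)
        return best

    # Example: funnel a widely uncertain object position into a narrow goal region.
    push = Funnel("push-to-wall", pre=(0.0, 10.0), post=(8.0, 10.0), reliability=0.9)
    nudge = Funnel("nudge-to-corner", pre=(7.0, 10.0), post=(9.0, 9.5), reliability=0.8)

    chain = plan([push, nudge], start=(0.0, 10.0), goal=(8.5, 10.0))
    if chain:
        seq, rel = chain
        print([f.name for f in seq], "estimated reliability:", round(rel, 2))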

Once action models have been discovered, sensing to decide which action to take can have varying costs. The time it takes a physical sensor to obtain information varies widely from sensor to sensor. Hero's camera, a passive device, is an order of magnitude faster than active sensing using a wrist-mounted sonar. Yet sonar information is more appropriate than vision when ambiguity exists about the distance to an object. This led Tan to investigate learning cost-effective strategies for using sensors to approach and classify objects. He developed a cost-sensitive learning system for Hero, called CSL, which, given a set of unknown objects and models of both sensors and actions, learns where to sense, which sensor to use, and which action to apply.
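
As a rough illustration of the trade-off CSL reasons about, the sketch below picks the sensing action with the best expected benefit per unit time cost. The sensor names, costs, and expected-gain figures are hypothetical; in CSL such quantities are learned from experience rather than supplied by hand.

    def pick_sensing_action(candidates):
        """Choose the sensing action with the best expected gain per unit cost."""
        return max(candidates, key=lambda c: c["expected_gain"] / c["cost"])

    candidates = [
        # Passive camera: fast, but ambiguous about the distance to an object.
        {"sensor": "camera", "cost": 1.0, "expected_gain": 0.4},
        # Wrist-mounted sonar: roughly an order of magnitude slower, but resolves range.
        {"sensor": "wrist_sonar", "cost": 10.0, "expected_gain": 0.9},
    ]

    print("sense with:", pick_sensing_action(candidates)["sensor"])

    # If the camera cannot resolve the ambiguity, its expected gain collapses
    # and the costlier sonar becomes the better choice.
    candidates[0]["expected_gain"] = 0.05
    print("sense with:", pick_sensing_action(candidates)["sensor"])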

Learning to model sensors involves capturing knowledge that is independent of any particular environment the robot might face, while also learning the characteristics of the typical environments in which the robot is known to operate. Thrun investigated learning such models by combining artificial neural networks with local, instance-based learning techniques. He demonstrated that learning these models provides an efficient means of transferring knowledge from previously explored environments to new ones.
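
One way to picture this combination is a global model shared across environments plus a local, instance-based correction built from a few readings taken in the current environment, as in the sketch below. The linear stand-in for the neural network and the nearest-neighbor error correction are assumptions for illustration, not Thrun's actual models.

    def global_model(x):
        # Stands in for a neural network trained over many environments,
        # e.g. mapping true distance to an expected sonar reading.
        return 0.9 * x + 0.1

    class SensorModel:
        def __init__(self):
            self.instances = []  # (input, observed error) pairs from the current environment

        def observe(self, x, y):
            # Store how far the global model was off for this local reading.
            self.instances.append((x, y - global_model(x)))

        def predict(self, x):
            pred = global_model(x)
            if not self.instances:
                return pred
            # Local, instance-based correction: reuse the error of the nearest instance.
            _, err = min(self.instances, key=lambda inst: abs(inst[0] - x))
            return pred + err

    m = SensorModel()
    m.observe(2.0, 2.3)    # one calibration reading taken in the new environment
    print(m.predict(2.1))  # global prediction shifted by the local correction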

A robot acting in a real-world situation must respond quickly to changes in its environment. Two disparate approaches have been investigated. Blythe & Mitchell developed an autonomous robot agent that initially constructs explicit plans to solve problems in its domain, using prior knowledge of action preconditions and postconditions. This ``Theo-agent'' converges to a reactive control strategy by compiling previous plans into stimulus-response rules using explanation-based learning. The agent can then respond directly to features in the environment with an appropriate action by querying this rule set.
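
The resulting control loop can be pictured as a fast lookup into compiled stimulus-response rules with a slow planning fallback, as sketched below. The rule representation, the toy planner, and the class name TheoAgentSketch are illustrative assumptions; in particular, real explanation-based learning produces generalized rules rather than a cache of literal feature sets.

    class TheoAgentSketch:
        def __init__(self, planner):
            self.rules = {}         # perceived features -> action (the compiled rule set)
            self.planner = planner  # slow, deliberative fallback

        def act(self, features):
            key = frozenset(features)
            if key in self.rules:          # fast, reactive path
                return self.rules[key]
            plan = self.planner(features)  # slow path: construct an explicit plan
            self.rules[key] = plan[0]      # "compile" the plan into a stimulus-response rule
            return plan[0]

    def toy_planner(features):
        return ["back-up", "turn-left"] if "obstacle-ahead" in features else ["go-forward"]

    agent = TheoAgentSketch(toy_planner)
    print(agent.act({"obstacle-ahead"}))  # first call plans, then caches the rule
    print(agent.act({"obstacle-ahead"}))  # second call responds reactively from the rule set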

Conversely, Lin applied reinforcement learning techniques based on artificial neural networks to create reactive control strategies without any prior knowledge of the effects of robot actions. The agent receives from its environment a scalar performance feedback, constructed so that maximum reward occurs when the task is completed successfully and some form of punishment is typically given when the agent fails. The agent must then maximize the cumulative reinforcement, which corresponds to developing successful strategies for the task. By using artificial neural networks, Lin demonstrated that the agent was able to generalize to unforeseen events and to survive in moderately complex dynamic environments. However, although reinforcement learning was more successful than action compilation at self-improvement in a real-world domain, learning took longer to converge and the plans produced in the early stages of learning were dangerous to the robot.
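
The learning setup can be sketched with the simplest algorithm that maximizes cumulative scalar reward: a tabular Q-learning loop on a toy corridor, shown below. Lin's work replaced the table with neural networks to obtain generalization, so this sketch illustrates only the reward-driven learning loop, not his architecture.

    import random
    from collections import defaultdict

    def q_learning(env_step, states, actions, episodes=300, alpha=0.5, gamma=0.9, eps=0.1):
        """Learn state-action values from scalar reward alone."""
        Q = defaultdict(float)
        for _ in range(episodes):
            s = random.choice(states)
            for _ in range(100):                    # cap the episode length
                if random.random() < eps:           # occasional exploratory action
                    a = random.choice(actions)
                else:                               # otherwise act greedily (random tie-break)
                    a = max(actions, key=lambda b: (Q[(s, b)], random.random()))
                s2, r, done = env_step(s, a)        # scalar feedback from the environment
                target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2
                if done:
                    break
        return Q

    # Toy corridor: positions 0..4, goal at 4; reward 1 on reaching the goal, 0 otherwise.
    def corridor_step(s, a):
        s2 = min(4, s + 1) if a == "right" else max(0, s - 1)
        return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

    Q = q_learning(corridor_step, states=[0, 1, 2, 3], actions=["left", "right"])
    print(max(["left", "right"], key=lambda a: Q[(1, a)]))  # learned policy: typically "right"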

Another issue in robot learning is when an agent should exploit its current plans and when it should explore further in the hope of discovering hidden shortcuts. Thrun evaluated the impact of exploration knowledge in tabula rasa environments, where no a priori knowledge such as action models is provided, and demonstrated the superiority of one particular directed exploration rule, counter-based exploration.
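
A counter-based rule can be illustrated by keeping a visit count per state and steering the agent toward the least-visited predicted successor, as in the sketch below. The grid-world successor model and the class name CounterExplorer are assumptions for illustration; the precise rule and its evaluation are in Thrun's report.

    from collections import defaultdict

    class CounterExplorer:
        def __init__(self, successor):
            self.visits = defaultdict(int)  # state -> number of times visited
            self.successor = successor      # predicted next state for (state, action)

        def choose(self, state, actions):
            # Directed exploration: go where the agent has looked the least.
            return min(actions, key=lambda a: self.visits[self.successor(state, a)])

        def record(self, state):
            self.visits[state] += 1

    # Grid-world successor model: actions move between integer cells.
    moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    succ = lambda s, a: (s[0] + moves[a][0], s[1] + moves[a][1])

    explorer = CounterExplorer(succ)
    state = (0, 0)
    for _ in range(5):
        explorer.record(state)
        action = explorer.choose(state, list(moves))
        state = succ(state, action)
        print(action, "->", state)  # prefers cells that have not been visited yet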

Bibliography:
Jim Blythe and Tom M. Mitchell. ``On Becoming Reactive''. In Proceedings of the Sixth International Machine Learning Workshop, pages 255-259, Morgan Kaufmann, June 1989.

Alan D. Christiansen. ``Automatic Acquisition of Task Theories for Robotic Manipulation''. PhD Thesis, Carnegie Mellon University, March 1992. (Also appears as CMU Technical Report CMU-CS-92-111).

Long-Ji Lin. ``Reinforcement Learning for Robots Using Neural Networks''. PhD Thesis, Carnegie Mellon University, January 1993. (Also appears as CMU Technical Report CMU-CS-93-103).

Tom M. Mitchell. ``Becoming Increasingly Reactive''. In Proceedings of the Eighth National Conference on Artificial Intelligence, pages 1051-1059, AAAI Press/MIT Press, 1990.

Ming Tan. ``Cost-Sensitive Robot Learning''. PhD Thesis, Carnegie Mellon University, 1992. (Also appears as CMU Technical Report CMU-CS-91-134).

Sebastian B. Thrun. ``Efficient Exploration in Reinforcement Learning''. Technical Report CMU-CS-92-102, School of Computer Science, Carnegie Mellon University, 1992.

Sebastian B. Thrun. ``Exploration and Model Building in Mobile Robot Domains''. In Proceedings of the IEEE International Conference on Neural Networks. IEEE Press, March 1993.

Last Updated: 17Jan96 15:00 josullvn+@cs.cmu.edu