17 November 1993, 3:30pm, WeH 4601
Talk to joint meeting of the Reinforcement Learning group and Manipulation group

A Vision-Based System for the Learning of Pushing Manipulation

Marcos Salganicoff*
The General Robotics and Active Sensory Perception (GRASP) Lab
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104
sal@grip.cis.upenn.edu

*Joint work with G. Metta, A. Oddera and G. Sandini, LIRA Lab, University of Genova, Italy.

Abstract

We describe an approach for combining image-based task constraints with memory-based learning for the control of robotic manipulation, and discuss related issues in using memory-based learners for time-varying mappings. Image-based constraints express task constraints in terms of equivalent perceptual constraints. We demonstrate their effectiveness and simplicity by describing two reference real-time robotic tasks and their corresponding implementations: the insertion of a pen into a ``cap'' (the capping experiment) and the rotational non-sliding point-contact pushing of an object of unknown shape, mass and friction to a specified goal point in the image-space. An unsupervised memory-based learning system is described that allows a robot to rapidly learn to point-contact push an unknown object towards an image-space goal without knowledge of the object's frictional and mass distributions. By having the robot observe the results of its actions on the object's orientation directly in image-space, the system learns a forward model. This acquired model is inverted on-line for manipulation planning and control. Rather than explicitly inverting the forward model to achieve trajectory control, a stochastic action selection technique [Moore, 1990] is used to select the most informative and promising actions, thus allowing the integration of model exploitation and exploration. We conclude with a discussion of three explicit forgetting algorithms for memory-based learners.
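The learn-a-forward-model-then-select-actions loop described above can be sketched in a few lines. This is a hypothetical, heavily simplified illustration (scalar actions and outcomes, a k-nearest-neighbor forward model, uniform candidate sampling), not the talk's actual implementation; all names and parameters here are invented for the sketch.

```python
import random

class MemoryForwardModel:
    """Memory-based forward model: stores (action, observed outcome)
    exemplars and predicts a candidate action's outcome by averaging
    its k nearest stored actions. A stand-in for the image-space
    forward model described in the abstract."""

    def __init__(self, k=3):
        self.k = k
        self.exemplars = []  # list of (action, outcome) pairs

    def observe(self, action, outcome):
        """Record the observed result of an executed action."""
        self.exemplars.append((action, outcome))

    def predict(self, action):
        """Average the outcomes of the k nearest stored actions."""
        if not self.exemplars:
            return 0.0
        nearest = sorted(self.exemplars,
                         key=lambda e: abs(e[0] - action))[:self.k]
        return sum(outcome for _, outcome in nearest) / len(nearest)

def select_action(model, goal, n_candidates=50, rng=None):
    """Stochastic action selection in the spirit of [Moore, 1990]:
    sample candidate actions at random and pick the one whose
    predicted outcome lies closest to the goal, rather than
    explicitly inverting the forward model."""
    rng = rng or random.Random(0)
    candidates = [rng.uniform(-1.0, 1.0) for _ in range(n_candidates)]
    return min(candidates, key=lambda a: abs(model.predict(a) - goal))
```

In use, the robot would alternate `select_action` and `observe`: each executed push adds an exemplar, so the forward model improves exactly in the regions the task visits.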
A forgetting algorithm allows memory-based learners to track time-varying concepts by deleting obsolete exemplars from learning sets. Time-weighted forgetting (TWF) is a well-known algorithm which deletes exemplars based on their time of arrival. Two alternatives to TWF are introduced: locally-weighted forgetting (LWF) uses the proximity of subsequent observations to a previous observation to control the previous observation's decay rate, while performance-error weighted forgetting (PEWF) decays an observation based on its recent predictive accuracy. We compare these algorithms and argue that they successfully overcome some of the previous limitations of memory-based learners in time-varying environments.
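The three forgetting schemes differ only in what triggers an exemplar's decay. The sketch below is a minimal illustration under assumed simplifications (scalar inputs, a per-exemplar strength that is decayed multiplicatively and pruned below a floor, and a 1-nearest-neighbor predictor for PEWF); the function names, thresholds, and decay constants are invented for the example, not taken from the talk.

```python
class Exemplar:
    """A stored observation with a decaying retention strength."""
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.strength = 1.0

def twf_step(memory, decay=0.95, floor=0.1):
    """Time-weighted forgetting: every exemplar decays at every
    time step, based purely on its time of arrival."""
    for e in memory:
        e.strength *= decay
    return [e for e in memory if e.strength >= floor]

def lwf_step(memory, new, radius=0.2, decay=0.5, floor=0.1):
    """Locally-weighted forgetting: only exemplars near the new
    observation decay, so revisited regions are refreshed while
    untouched regions of the input space are preserved."""
    for e in memory:
        if abs(e.x - new.x) < radius:
            e.strength *= decay
    memory.append(new)
    return [e for e in memory if e.strength >= floor]

def pewf_step(memory, new, tol=0.1, decay=0.5, floor=0.1):
    """Performance-error weighted forgetting: the exemplar that
    would have predicted the new observation (here, its nearest
    neighbor) decays when its prediction error exceeds tol."""
    if memory:
        nearest = min(memory, key=lambda e: abs(e.x - new.x))
        if abs(nearest.y - new.y) > tol:
            nearest.strength *= decay
    memory.append(new)
    return [e for e in memory if e.strength >= floor]
```

The contrast is visible immediately: TWF eventually discards everything, LWF discards only where new data arrive, and PEWF discards only exemplars that have actually started predicting badly.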