12:00, 3 Apr 1996, WeH 7220

``Addressing Selective Attention and Hidden State using Utile
Distinctions in Feature-Space and History-Space''

Andrew McCallum
Univ. of Rochester (soon to be CMU!)

Agents that interact with their environment through sensors and effectors often suffer from two opposite kinds of perceptual limitation. First, an agent can have ``too much sensory data'': the sensors provide more raw data than the agent can possibly process at once. In this case, the agent can use Selective Attention to focus its limited computational resources on processing only certain features. Second, the same agent may simultaneously have ``too little sensory data'': perceptual limitations, such as a restricted field of view, limited acuity, and occlusions, can hide crucial features of the environment. This problem is called Hidden State, and it can often be solved by using short-term memory to remember features that were available in previous views.

In this talk I will present a reinforcement learning algorithm that uses both selective attention and short-term memory to solve tasks with both ``too much'' and ``too little'' sensory data. The algorithm, called U-Tree, uses a Utile Distinction test to build an agent-internal state space that distinguishes between states based only on those features and memories that help predict reward. The algorithm is instance-based, and it holds its state distinctions in a tree structure. It is related to other tree-based algorithms, including Parti-game [Moore 94], Prediction Suffix Trees [Ron et al 94], Variable Resolution Dynamic Programming [Moore 91], and the G-algorithm [Chapman & Kaelbling 91].

I will demonstrate the algorithm solving a highway driving task in which the agent weaves around slower and faster traffic. The agent's limited, driver's-view perception is based on simulated eye movements with Visual Routines [Ullman 84; Agre & Chapman 87; Whitehead 92]. The task involves time pressure, stochasticity, hidden state, more than 21,000 world states, and more than 2,500 distinct perceptions. From this, the agent learns a task-dependent state space that contains only 143 states.
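
To make the Utile Distinction idea concrete, the sketch below shows, in Python, one plausible rendering of the test at the heart of U-Tree: a candidate distinction (a perceptual feature, possibly looked up some number of steps back in history) is kept only if the instances it separates have significantly different future discounted reward. The names here (Instance, record_episode, split_is_utile), the toy driving-flavored features, the discount factor, and the choice of a two-sample Kolmogorov-Smirnov test at the 0.05 level are illustrative assumptions, not McCallum's actual implementation.

    # Illustrative sketch only -- not McCallum's code.  A distinction is
    # "utile" if splitting on it yields groups of instances whose future
    # discounted rewards differ significantly (tested here with a two-sample
    # Kolmogorov-Smirnov test, one common statistical choice).
    from dataclasses import dataclass
    from typing import Optional
    import random
    from scipy.stats import ks_2samp

    GAMMA = 0.9   # discount factor (assumed value)
    ALPHA = 0.05  # significance level for keeping a split (assumed value)

    @dataclass
    class Instance:
        obs: dict                   # feature -> value at this time step
        reward: float               # immediate reward received
        prev: Optional["Instance"]  # previous instance (short-term memory link)
        utility: float = 0.0        # discounted return, back-filled later

    def record_episode(observations, rewards):
        """Chain raw (obs, reward) pairs into instances; back-fill returns."""
        instances, prev = [], None
        for obs, r in zip(observations, rewards):
            prev = Instance(obs, r, prev)
            instances.append(prev)
        ret = 0.0
        for inst in reversed(instances):  # discounted return from each step
            ret = inst.reward + GAMMA * ret
            inst.utility = ret
        return instances

    def feature_value(inst, lag, feature):
        """Look `lag` steps back in history (lag 0 = current observation)."""
        for _ in range(lag):
            if inst.prev is None:
                return None
            inst = inst.prev
        return inst.obs.get(feature)

    def split_is_utile(instances, lag, feature):
        """Tentatively partition the instances on (lag, feature); keep the
        distinction only if the groups' return distributions differ, i.e.
        only if this feature/memory actually helps predict reward."""
        groups = {}
        for inst in instances:
            key = feature_value(inst, lag, feature)
            groups.setdefault(key, []).append(inst.utility)
        parts = [g for g in groups.values() if len(g) >= 2]
        if len(parts) < 2:
            return False  # not enough evidence to test
        _, p_value = ks_2samp(parts[0], parts[1])
        return p_value < ALPHA

    # Toy usage: reward depends on the *previous* step's "car_ahead" bit, so
    # the one-step-back car_ahead distinction should test as utile, while the
    # irrelevant "radio_on" bit should (almost always) be ignored -- this is
    # selective attention and short-term memory driven by reward prediction.
    random.seed(0)
    obs = [{"car_ahead": random.randint(0, 1), "radio_on": random.randint(0, 1)}
           for _ in range(400)]
    rew = [0.0] + [-1.0 if o["car_ahead"] else 1.0 for o in obs[:-1]]
    data = record_episode(obs, rew)
    print(split_is_utile(data, 1, "car_ahead"))  # True: worth a distinction
    print(split_is_utile(data, 1, "radio_on"))   # typically False: ignored

In the full algorithm these kept distinctions become internal nodes of the tree, with the leaves serving as the agent's states; the sketch omits that tree-growing loop and the value iteration over leaves.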