All of the previous discussion has tacitly assumed that it is possible
to enumerate the state and action spaces and store tables of values
over them. Except in very small environments, this means impractical
memory requirements. It also makes inefficient use of experience. In
a large, smooth state space we generally expect similar states to have
similar values and similar optimal actions. Surely, therefore, there
should be some more compact representation than a table. Most
problems will have continuous or large discrete state spaces; some
will have large or continuous action spaces. The problem of learning
in large spaces is addressed through *generalization techniques*,
which allow compact storage of learned information and transfer of
knowledge between ``similar'' states and actions.
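
As a concrete, purely illustrative sketch of this idea, the fragment
below replaces a tabular $Q$ function with a linear approximator over
state features, so that an update to one state generalizes to all
states with similar features. The feature map, learning rate, and
dimensions are hypothetical choices, not drawn from any particular
method discussed here.

```python
import numpy as np

# Minimal sketch: Q-learning with a linear function approximator.
# Instead of a table Q[s, a], each action a keeps a weight vector
# w[a], and Q(s, a) = w[a] . phi(s) for an assumed feature map phi.
# Updating one state's value then shifts the values of all states
# with similar features -- the generalization discussed above.

n_features, n_actions = 8, 4    # hypothetical sizes
alpha, gamma = 0.1, 0.95        # hypothetical learning rate, discount

w = np.zeros((n_actions, n_features))

def q_values(phi_s):
    """Approximate Q(s, .) for a state with feature vector phi_s."""
    return w @ phi_s

def td_update(phi_s, a, r, phi_s_next):
    """One Q-learning step on the weights (gradient of the TD error)."""
    target = r + gamma * np.max(w @ phi_s_next)
    td_error = target - w[a] @ phi_s
    w[a] += alpha * td_error * phi_s
```

Storage drops from one entry per state-action pair to a fixed number
of weights, at the cost of approximation error.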

There is a large literature of generalization techniques from inductive concept learning that can, in principle, be applied to reinforcement learning. However, these techniques often need to be tailored to the specific details of the problem. In the following sections, we explore the application of standard function-approximation techniques, adaptive resolution models, and hierarchical methods to the problem of reinforcement learning.

The reinforcement-learning architectures and algorithms discussed
above have included the storage of a variety of mappings, including
$S \rightarrow A$ (policies), $S \rightarrow \Re$ (value functions),
$S \times A \rightarrow \Re$ ($Q$ functions and rewards),
$S \times A \rightarrow S$ (deterministic transitions), and
$S \times A \times S \rightarrow [0,1]$ (transition
probabilities). Some of these
mappings, such as transitions and immediate rewards, can be learned
using straightforward supervised learning, and can be handled using
any of the wide variety of function-approximation techniques for
supervised learning that support noisy training examples. Popular
techniques include various neural-network methods [94],
fuzzy logic [11, 58],
CMAC [3], and local memory-based
methods [84], such as generalizations of
nearest neighbor methods. Other mappings, especially the policy
mapping, typically need specialized algorithms because training sets
of input-output pairs are not available.
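
For the mappings that can be trained from noisy input-output pairs,
such as immediate rewards, any standard supervised approximator
applies. The following is one possible rendering of a CMAC-style tile
coder of the kind cited above; the tiling sizes and offsets are our
own illustrative assumptions, not details from the cited work.

```python
import numpy as np

# Hedged sketch of a CMAC-style tile coder for a 1-D input in [0, 1].
# Several offset tilings each activate one tile; the prediction is the
# sum of the active tiles' weights, so nearby inputs share tiles and
# therefore generalize, while each update stays local.

n_tilings, n_tiles = 8, 16                     # hypothetical resolution
lo, hi = 0.0, 1.0                              # assumed input range
tile_width = (hi - lo) / n_tiles
weights = np.zeros((n_tilings, n_tiles + 1))   # +1 tile absorbs offsets

def active_tiles(x):
    """Index of the active tile in each of the offset tilings."""
    offsets = np.arange(n_tilings) * tile_width / n_tilings
    return ((x - lo + offsets) / tile_width).astype(int)

def predict(x):
    return weights[np.arange(n_tilings), active_tiles(x)].sum()

def train(x, target, lr=0.1):
    """Supervised update toward a (possibly noisy) target value."""
    error = target - predict(x)
    weights[np.arange(n_tilings), active_tiles(x)] += lr * error / n_tilings
```

Trained on noisy samples of, say, an immediate-reward function, the
overlapping tilings average out the noise while confining each update
to a small neighborhood of the input.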
