Another strategy for dealing with large state spaces is to treat them as a hierarchy of learning problems. In many cases, hierarchical solutions introduce a slight sub-optimality in performance, but in exchange can yield substantial gains in execution time, learning time, and space.
Hierarchical learners are commonly structured as gated behaviors, as shown in Figure 8. There is a collection of behaviors that map environment states into low-level actions, and a gating function that decides, based on the state of the environment, which behavior's actions should be switched through and actually executed. Maes and Brooks used a version of this architecture in which the individual behaviors were fixed a priori and the gating function was learned from reinforcement. Mahadevan and Connell used the dual approach: they fixed the gating function and supplied reinforcement functions for the individual behaviors, which were learned. Lin and Dorigo and Colombetti [38, 37] both used this approach, first training the behaviors and then training the gating function. Many of the other hierarchical learning methods can be cast in this framework.
Figure 8: A structure of gated behaviors.
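The gated-behavior architecture can be illustrated with a small sketch in the style of the first variant above: the individual behaviors are fixed a priori, and the gating function is learned from reinforcement. Everything below is a hypothetical toy construction, not code from any of the cited systems — the corridor world, the two hand-coded behaviors, and the tabular Q-learning gate are all assumptions made for illustration.

```python
import random

random.seed(0)

# Hypothetical toy corridor: states are positions 0..N, goal at N.
N = 10

# Two fixed, hand-coded behaviors mapping states to low-level actions.
def move_left(state):
    return -1

def move_right(state):
    return +1

behaviors = [move_left, move_right]

# The gating function is a tabular Q-function over (state, behavior):
# it learns which behavior's action to switch through in each state.
Q = [[0.0, 0.0] for _ in range(N + 1)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def step(state, action):
    """Apply a low-level action; reward 1 on reaching the goal, else 0."""
    next_state = max(0, min(N, state + action))
    reward = 1.0 if next_state == N else 0.0
    return next_state, reward

for episode in range(200):
    state = random.randint(0, N - 1)
    while state != N:
        # Epsilon-greedy gating: choose a behavior, not a raw action.
        if random.random() < epsilon:
            b = random.randrange(len(behaviors))
        else:
            b = max(range(len(behaviors)), key=lambda i: Q[state][i])
        action = behaviors[b](state)  # selected behavior emits the action
        next_state, reward = step(state, action)
        # Q-learning update at the gating level only; behaviors stay fixed.
        target = reward + gamma * max(Q[next_state])
        Q[state][b] += alpha * (target - Q[state][b])
        state = next_state

# After learning, the gate should prefer move_right in every non-goal state.
greedy = [max(range(2), key=lambda i: Q[s][i]) for s in range(N)]
```

Note that the learner's choice space is the small set of behaviors rather than the full low-level action space, which is the source of the efficiency gains mentioned above; the sub-optimality arises when no fixed behavior produces the best primitive action in some state.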