Compositional Q-learning

Singh's compositional Q-learning [110, 109] (C-QL) uses a hierarchy based on the temporal sequencing of subgoals. The elemental tasks are behaviors that each achieve some recognizable condition, and the high-level goal of the system is to achieve a set of such conditions in a specified sequential order. Achieving a condition provides the reinforcement for the corresponding elemental task, so the elemental tasks are trained first, each to achieve its own subgoal. The gating function then learns to switch among the elemental tasks so that the conditions are achieved in the order the high-level goal requires. Tham and Prager [121] used this method to learn to control a simulated multi-link robot arm.
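The two training phases described above can be illustrated with a small tabular sketch. It is not Singh's architecture, which the text does not specify in detail. It is a toy stand-in under several assumptions: a hypothetical 7-cell line world, two hypothetical elemental tasks (`reach_left`, `reach_right`), a lookup table in place of the gating function, and uniformly random exploration. All names and parameter values below are illustrative.

```python
import random

N_STATES = 7                      # hypothetical toy world: cells 0..6 on a line
ACTIONS = (-1, +1)                # step left or step right
SUBGOALS = {"reach_left": 0, "reach_right": 6}  # condition achieved by each elemental task


def step(state, action):
    """Deterministic move, clipped at both ends of the line."""
    return min(max(state + action, 0), N_STATES - 1)


def train_elemental(goal, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Phase 1: Q-learning for one elemental task; reward 1 when its condition holds."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(50):
            if s == goal:
                break
            a = rng.randrange(2)  # random exploration; Q-learning is off-policy
            s2 = step(s, ACTIONS[a])
            target = 1.0 if s2 == goal else gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def run_elemental(q, goal, s, max_steps=4 * N_STATES):
    """Follow the greedy policy of one elemental task until its condition holds."""
    for _ in range(max_steps):
        if s == goal:
            break
        a = max((0, 1), key=lambda i: q[s][i])
        s = step(s, ACTIONS[a])
    return s


def train_gate(sequence, skills, episodes=300, alpha=0.5, gamma=0.9, seed=1):
    """Phase 2: learn which elemental task to switch in at each stage of the sequence."""
    rng = random.Random(seed)
    names = list(skills)
    gate = [[0.0] * len(names) for _ in sequence]  # gate[stage][task index]
    for _ in range(episodes):
        s, stage = rng.randrange(N_STATES), 0
        for _ in range(10):
            k = rng.randrange(len(names))
            s = run_elemental(skills[names[k]], SUBGOALS[names[k]], s)
            # The stage advances only if the next required condition was achieved.
            ok = s == SUBGOALS[names[k]] and names[k] == sequence[stage]
            nxt = stage + 1 if ok else stage
            done = nxt == len(sequence)
            target = 1.0 if done else gamma * max(gate[nxt])
            gate[stage][k] += alpha * (target - gate[stage][k])
            stage = nxt
            if done:
                break
    return gate, names


def run_composite(sequence, skills, gate, names, start):
    """Execute the composite task greedily with the learned gate."""
    s, stage, trace = start, 0, []
    for _ in range(10):
        if stage == len(sequence):
            break
        k = max(range(len(names)), key=lambda i: gate[stage][i])
        s = run_elemental(skills[names[k]], SUBGOALS[names[k]], s)
        trace.append(names[k])
        if s == SUBGOALS[names[k]] and names[k] == sequence[stage]:
            stage += 1
    return trace, s


skills = {name: train_elemental(goal) for name, goal in SUBGOALS.items()}
sequence = ["reach_left", "reach_right"]  # high-level goal: conditions in this order
gate, names = train_gate(sequence, skills)
trace, final_state = run_composite(sequence, skills, gate, names, start=3)
print(trace, final_state)
```

In this sketch the gate is rewarded only when the whole sequence has been completed, so discounting is what makes the correct switching order preferable to orders that waste a step on the wrong elemental task. That reward scheme is a choice made for the sketch, not something the text above prescribes.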

Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996