Reusing Learned Policies in Similar Problems
Michael Bowling
We are interested in leveraging policies learned for similar problems
when learning policies for complex new problems. This capability is
particularly important in robot learning, where gathering data is
expensive and time-consuming, prohibiting the direct application of
reinforcement learning. In this setting, we would like to transfer
knowledge from a simulator, which may have only an inaccurate or
crude model of the robot and its environment. We observed that when
applying a policy learned in a simulator, some parts of the policy
apply effectively on the real robot while other parts do not. We
then explored learning a complex problem by reusing only parts of the
solutions of similar problems. Empirical experiments in which part
of the policy is held fixed show that the complete task is learned
faster, but the resulting policy is sub-optimal. One of the main
contributions of this paper is a theorem, with proof, stating that
the degree of sub-optimality of a policy that is fixed over a
subproblem can be determined without optimally solving the complete
problem. We formally define a subproblem and build upon
the value equivalence of the boundary states of the subproblem to
prove the bound on sub-optimality.