Reusing Learned Policies in Similar Problems

Michael Bowling

We are interested in leveraging policies learned for similar problems when learning a policy for a new, complex problem. This capability is particularly important in robot learning, where gathering data is expensive and time-consuming, which prohibits applying reinforcement learning directly. In this case, we would like to transfer knowledge from a simulator, which may have an inaccurate or crude model of the robot and its environment. We observed that when a policy learned in a simulator is applied to the real robot, some parts of the policy transfer effectively while other parts do not. We therefore explored learning a complex problem by reusing only parts of the solutions to similar problems. Empirical experiments on learning with part of the policy fixed show that the complete task is learned faster, although the resulting policy is sub-optimal. One of the main contributions of this paper is a theorem, with proof, stating that the degree of sub-optimality of a policy that is fixed over a subproblem can be bounded without optimally solving the complete problem. We formally define a subproblem and build on the value equivalence of the subproblem's boundary states to prove the bound on sub-optimality.
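To convey the flavor of such a guarantee, the following is a minimal sketch of a sub-optimality bound of this general form, not the paper's exact theorem: it assumes the values of the fixed sub-policy agree with the optimal values to within some ε at every boundary state of the subproblem, and a discount factor γ; the symbols ε, γ, V*, and V^π are illustrative assumptions here.

```latex
% Illustrative sub-optimality bound (a sketch under assumed conditions,
% not the paper's exact statement).
%
% Assume the policy \pi fixes the subproblem's solution, and that its value
% differs from the optimal value by at most \epsilon at every boundary
% state of the subproblem, with discount factor \gamma \in [0, 1).
\[
  \max_{s} \bigl| V^{*}(s) - V^{\pi}(s) \bigr|
  \;\le\; \frac{\epsilon}{1 - \gamma},
\]
% where V^{*} is the optimal value function of the complete problem and
% V^{\pi} is the value function of \pi. Note that the bound depends only
% on the value gap \epsilon at the boundary states, so it can be checked
% without solving the complete problem optimally; exact value equivalence
% at the boundary (\epsilon = 0) would recover optimality.
```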