Thus far, we have not considered what happens once a policy achieves its goal. Since agents rarely set out to achieve a goal and die, we now want to consider how to account for extended activity involving many goals.
One important class of extended activity arises when an agent transforms a whole class of identical objects. We will call this metabolizing the class. Metabolism can be useful or it can make extra work: cooking 100 eggs is useful, at least if you are feeding a lot of people; dirtying 100 forks, however, probably means you have to wash them all.
Whether a policy metabolizes an object class depends in large part on the binding map it uses. The policy metabolizes its materials because the material being worked on ceases to be the leftmost pregoal material as soon as it arrives in a goal state. When this happens, the binding map changes bindings and the agent starts to work on a different object. Policy p never actually sees a material in a goal state. Of course, the property of being ``leftmost'' is an artifact of our formalism. What matters to the property of metabolism is simply that the binding map implement some ordering on the instances of the material and always choose the minimum, under that ordering, of the objects that are in pregoal states. Such an ordering might be implemented by the agent visually scanning its work surface for an uncooked egg, always scanning left-to-right and top-to-bottom. We will return to these issues in section 8.
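The role of the ordering can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration rather than part of the formalism: materials are modeled as integers counting cooking steps, `GOAL` is a hypothetical goal state, and `leftmost_pregoal` stands in for the binding map.

```python
from typing import List, Optional

GOAL = 2  # hypothetical: an egg reaches its goal state after two steps


def leftmost_pregoal(states: List[int]) -> Optional[int]:
    """Binding map: index of the leftmost material still in a pregoal state."""
    for i, s in enumerate(states):
        if s < GOAL:
            return i
    return None  # the binding map is undefined when nothing is pregoal


def run(states: List[int]) -> List[int]:
    """Apply a one-object policy until the binding map is undefined.

    Returns the sequence of bindings the policy worked under.
    """
    bindings = []
    while (i := leftmost_pregoal(states)) is not None:
        bindings.append(i)
        states[i] += 1  # the policy advances the bound material one step
    return bindings


eggs = [0, 0, 0]
trace = run(eggs)
```

In this sketch the policy is only ever handed a pregoal material: as soon as one egg reaches `GOAL`, the binding moves to the next egg in the ordering, so the whole class is eventually transformed.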
Other binding maps lead to other kinds of behavior, some of which are pathological. If the binding map always chooses the same binding, then metabolism ceases. If the binding map always chooses uncooked eggs but doesn't impose any ordering on them, it might start cooking an infinite number of eggs without ever actually finishing any one of them.
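Both pathologies can be sketched under simplifying assumptions. The names `fixed_binding`, `fresh_egg_binding`, and `simulate` are hypothetical, the unbounded supply of eggs is modeled with a `defaultdict`, and a policy step on an egg already in its goal state is treated as a no-op.

```python
from collections import defaultdict

GOAL = 2  # hypothetical: an egg is finished after two cooking steps


def fixed_binding(states, step):
    """Always binds the same egg, whatever its state."""
    return 0


def fresh_egg_binding(states, step):
    """Always binds an uncooked egg, but one never touched before."""
    return step


def simulate(binding_map, steps):
    """Run the policy for `steps` steps; return (finished, touched) counts."""
    states = defaultdict(int)  # unbounded supply; every egg starts at 0
    for step in range(steps):
        i = binding_map(states, step)
        if states[i] < GOAL:
            states[i] += 1  # one cooking step on the bound egg
    finished = sum(1 for s in states.values() if s >= GOAL)
    return finished, len(states)
```

Under these assumptions, `simulate(fixed_binding, 10)` finishes one egg and then stalls, while `simulate(fresh_egg_binding, 10)` touches ten eggs and finishes none of them.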
Metabolism is also an issue for tool use. To metabolize its materials, the policy above must repeatedly reset its tools. An alternate policy is to metabolize the tools too. Consider the binding map that uses not only the leftmost pregoal material but also the leftmost reset tools. Then clearly, policy p under this binding map is a solution from any state for which the binding map is defined.
This policy treats tools as disposable. So long as there is an infinite supply of fresh tools, p will see a succession of states in which tools are in their reset states. It will never need to execute a resetting action and so the environment is effectively a single-state-tool environment. Thus the reduction of section 7.1.3 is unnecessary.
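The contrast between resetting a tool and treating tools as disposable can be sketched as follows. This is an illustrative model only: the function names are hypothetical, each material is assumed to need a single use of a tool, and the infinite supply of fresh tools is approximated by a finite list that is at least as long as the number of materials.

```python
def work_with_reset(n_materials):
    """One tool, reset whenever it is found dirty before the next use."""
    tool_clean = True
    actions = []
    for _ in range(n_materials):
        if not tool_clean:
            actions.append("reset")
            tool_clean = True
        actions.append("use")
        tool_clean = False  # using the tool takes it out of its reset state
    return actions


def work_disposable(n_materials, tools):
    """Bind the leftmost tool still in its reset state; never reset.

    `tools` is a list of booleans, True meaning "in its reset state".
    The binding is undefined (StopIteration) if no reset tool remains.
    """
    actions = []
    for _ in range(n_materials):
        t = next(i for i, clean in enumerate(tools) if clean)
        actions.append("use")
        tools[t] = False  # the used tool is simply left behind
    return actions
```

In this sketch `work_with_reset(3)` interleaves two resetting actions with its three uses, whereas `work_disposable(3, [True] * 5)` performs only the three uses, which is the sense in which the policy only ever sees tools in their reset states.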