
Choosing an Action

The action-selection method is designed to use memory to select the action most likely to succeed, and to fill memory when no useful memories are available. For example, when the defender is at a given position, the agent begins by retrieving the memory values for both of its actions at that position, as described in Section 2.3.2. It then acts according to the following function:

[Table: action-selection function]

An action is selected based on the memory values only if these values indicate that one action is likely to succeed and that it is better than the other. If, on the other hand, neither memory value indicates a positive likelihood of success, then an action is chosen randomly. The only exception to this last rule is when one of the values is zero, suggesting that there have not yet been any training examples for that action at that memory location. In this case, there is a bias towards exploring the untried action in order to fill out memory.
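The selection rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The function name `choose_action`, the action indices 0 and 1, and the convention that a memory value greater than zero signals likely success, a negative value likely failure, and exactly zero an untried memory location are all hypothetical, read off the prose. The exploration bias is modeled as always choosing the untried action, and a tie between two equal positive values falls through to a random choice. The original function may weight these cases differently.

```python
import random
from typing import Optional


def choose_action(m1: float, m2: float, rng: Optional[random.Random] = None) -> int:
    """Pick action 0 or 1 from the two memory values at the defender's position.

    Assumed (hypothetical) convention: value > 0 means the action is likely
    to succeed, value < 0 means likely failure, value == 0 means untried.
    """
    rng = rng or random.Random()

    # Memory-driven choice: one action is likely to succeed AND beats the other.
    if m1 > 0 and m1 > m2:
        return 0
    if m2 > 0 and m2 > m1:
        return 1

    # Neither value indicates success: prefer an untried action (exactly zero)
    # when only one of the two is untried, to help fill out memory.
    if m1 <= 0 and m2 <= 0:
        if m1 == 0 and m2 != 0:
            return 0
        if m2 == 0 and m1 != 0:
            return 1

    # Otherwise (both untried, both negative, or a positive tie): choose randomly.
    return rng.randrange(2)
```

For example, `choose_action(0.6, -0.2)` returns 0 because only the first action looks promising, while `choose_action(-0.3, 0.0)` returns 1 because the second action has not yet been tried at this position. Passing a seeded `random.Random` makes the random fallback reproducible in experiments.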



Peter Stone
Mon Dec 11 15:42:40 EST 1995