In addition to the existing learning approaches described above, there are several previously unexplored learning opportunities that apply to homogeneous non-communicating systems (see Table 5).
One unexplored learning opportunity that could apply in domains with homogeneous non-communicating agents is learning to enable others' actions. Inspired by the concept of stigmergy, an agent may try to learn to take actions that do not directly help it in its current situation, but that may allow other, similar agents to be more effective in the future. Typical RL settings with delayed reward encourage agents to achieve their goals directly by propagating local reinforcement back to past states and actions. However, if an action leads to a reward received by a different agent, the acting agent may have no way of reinforcing that action. Techniques for dealing with this inter-agent credit-assignment problem would be useful for building multiagent systems.
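The credit-assignment gap described above can be illustrated with a minimal sketch. The scenario, state space, and action names here are all hypothetical: agent A can take an enabling action ("open_door") that only benefits a second agent B, and A itself is never rewarded. Under a standard single-agent Q-learning update, which sees only A's own reward, the enabling action is never reinforced.

```python
import random

# Hypothetical two-agent scenario illustrating the credit-assignment gap:
# agent A chooses "open_door" or "idle"; only if the door is opened can
# agent B later collect a reward. A itself receives no reward either way,
# so a standard single-agent Q-learning update never reinforces the
# enabling action.

ACTIONS = ["open_door", "idle"]
ALPHA, EPISODES = 0.5, 1000

q_a = {a: 0.0 for a in ACTIONS}  # agent A's Q-values (single state)

random.seed(0)
for _ in range(EPISODES):
    action = random.choice(ACTIONS)                    # A explores uniformly
    reward_a = 0.0                                     # A is never rewarded itself
    reward_b = 1.0 if action == "open_door" else 0.0   # B's reward, invisible to A
    # Standard single-agent update uses only A's own reward:
    q_a[action] += ALPHA * (reward_a - q_a[action])

# Both of A's Q-values remain 0.0: the benefit to B leaves no trace in
# A's value estimates, so A has no reason to prefer the enabling action.
print(q_a)
```

Any technique addressing this problem would need some additional signal, for instance a shared or global reward term, so that the enabling action's value estimate can rise above that of "idle".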
In terms of modeling other agents, there is much room for improvement when a given agent does not know the internal state or sensory inputs of another agent. When such information is known, RMM can be used to predict the future actions of other agents. When it is not directly available, it would be useful for an agent to learn it: the function from agent X's sensor data (which might include a restricted view of agent Y) to agent Y's sensor data is a useful function to learn. If it is learned effectively, agent X can then use (limited) RMM to predict agent Y's future actions.
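Learning the mapping from X's observations to Y's sensor inputs is, in the simplest case, a supervised-learning problem. The following sketch is entirely hypothetical: Y's sensor reading is assumed to be its distance to a fixed landmark, X observes only Y's position, and a one-weight linear model fit by stochastic gradient descent stands in for any supervised learner; the training pairs are assumed to come from some phase in which Y's readings were observable.

```python
import random

LANDMARK = 10.0  # hypothetical fixed landmark position on a line

def y_sensor(y_pos):
    """Ground truth: what agent Y actually senses (unknown to agent X)."""
    return abs(LANDMARK - y_pos)

random.seed(1)
# Training data: X's observation of Y's position, paired with Y's sensor
# reading (assumed available during a training phase).
positions = [random.uniform(0, 10) for _ in range(200)]
data = [(p, y_sensor(p)) for p in positions]

# One-feature linear model y ≈ w * pos + b, fit by per-sample gradient descent.
w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    for pos, target in data:
        err = (w * pos + b) - target
        w -= lr * err * pos
        b -= lr * err

# X can now estimate Y's sensor input from its own observation and feed
# that estimate into a model of Y (e.g. limited RMM) to predict Y's
# next action.
estimate = w * 4.0 + b  # predicted sensor reading when Y is at position 4
print(round(estimate, 2))
```

On this noiseless, linear training region the model recovers the true sensor function closely; in realistic settings the mapping would be nonlinear and partially observable, and a richer function approximator would take the place of the linear fit.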