Following is a list of the papers I have submitted for publication. My primary research interest is in the theoretical aspects of multiagent learning. Multiagent learning diverges most from single-agent learning when the agents involved have different goals and desires; hence, the field can benefit greatly from studying and extending game theory. However, because learning inherently implies some degree of suboptimality, theoreticians in multiagent learning should not restrict themselves to the study of Nash equilibria, but should attempt to guarantee results for interaction with both rational and irrational agents.
This paper examines the notion of symmetry in Markov decision processes (MDPs). We define symmetry for an MDP and show how it can be exploited for more effective learning in single-agent, multiagent, and multirobot systems. We prove that if an MDP possesses a symmetry, then the optimal value function and Q-function are similarly symmetric and there exists a symmetric optimal policy. If an MDP is known to possess a symmetry, this knowledge can be used to decrease the number of training examples needed by algorithms such as Q-learning and value iteration, and to directly restrict the hypothesis space.
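To illustrate how a known symmetry can reduce the training examples Q-learning needs, the sketch below runs tabular Q-learning on a made-up five-state corridor MDP whose mirror reflection is a symmetry: each observed transition also backs up the reflected state-action pair. The corridor, the mapping `sigma`, and all constants are illustrative choices, not from the paper.

```python
import random
from collections import defaultdict

N = 5                       # corridor states 0..4; reward 1 at either end
GAMMA, ALPHA = 0.9, 0.1
ACTIONS = (-1, 1)

def sigma(s, a):
    """The MDP's symmetry: reflect the state about the center, flip the action."""
    return N - 1 - s, -a

Q = defaultdict(float)

def update(s, a, r, s2):
    """One Q-learning backup, applied to (s, a) and to its symmetric image."""
    pairs = (((s, a), s2), (sigma(s, a), N - 1 - s2))
    targets = [r + GAMMA * max(Q[(nxt, b)] for b in ACTIONS) for _, nxt in pairs]
    for ((ss, aa), _), tgt in zip(pairs, targets):
        Q[(ss, aa)] += ALPHA * (tgt - Q[(ss, aa)])

def step(s, a):
    s2 = max(0, min(N - 1, s + a))
    return s2, (1.0 if s2 in (0, N - 1) else 0.0)

random.seed(0)
for _ in range(2000):
    s = random.randrange(N)
    for _ in range(10):
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        update(s, a, r, s2)
        s = s2
```

Each sample now does the work of two, and the learned Q-function is symmetric by construction, which is one way of directly restricting the hypothesis space.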
In Zinkevich and Balch (2001), we proved that a symmetric MDP has a symmetric optimal policy. In reality, however, systems may possess slight asymmetries. In this abstract, we present, without proof, how a slightly asymmetric MDP has a slightly suboptimal symmetric policy. To specify the bound on the suboptimality of such a symmetric policy, we use the notation of that paper.
In this paper we study the problem of online market clearing, in which a single commodity is bought and sold by multiple buyers and sellers who submit buy and sell bids that arrive and expire at different times. The auctioneer faces the online clearing problem of deciding which buy and sell bids to match without knowing which bids will arrive in the future.
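One natural online clearing rule is to greedily match the highest live buy bid with the lowest live sell bid whenever the former meets the latter's price. The sketch below implements that rule for made-up bid tuples (arrival time, side, price, expiry); it is an illustrative policy of the kind the paper analyzes, not the paper's specific mechanism.

```python
import heapq

def clear_online(bids):
    """bids: list of (time, side, price, expiry), in arrival order.
    Returns matched (buy_price, sell_price) pairs."""
    buys, sells = [], []               # heaps: buys by -price, sells by price
    matches = []
    for t, side, price, expiry in bids:
        # lazily drop expired bids from the top of each heap
        while buys and buys[0][1] < t:
            heapq.heappop(buys)
        while sells and sells[0][1] < t:
            heapq.heappop(sells)
        if side == 'buy':
            heapq.heappush(buys, (-price, expiry, price))
        else:
            heapq.heappush(sells, (price, expiry, price))
        # greedy clearing: best buy meets best sell
        if buys and sells and buys[0][2] >= sells[0][2]:
            b = heapq.heappop(buys)
            s = heapq.heappop(sells)
            matches.append((b[2], s[2]))
    return matches
```

The online difficulty is visible here: a bid matched greedily now may foreclose a more profitable match against a bid that has not yet arrived.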
Convex programming involves a convex set F and a convex function c : F -> R; the goal is to find a point in F that minimizes c. In this paper, we introduce online convex programming. Here the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This setting can model factory production, farm production, and many other industrial optimization problems in which one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is in fact a generalization of infinitesimal gradient ascent; our results imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
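The algorithm is greedy projection: after each cost function is revealed, take a gradient step and project back onto F. The sketch below shows the idea with an illustrative one-dimensional feasible set F = [0, 1], a made-up sequence of quadratic costs, and step size eta_t = 1/sqrt(t).

```python
import math

def project(x):
    """Euclidean projection onto the feasible set F = [0, 1]."""
    return max(0.0, min(1.0, x))

def greedy_projection(gradients):
    """gradients[t](x): gradient of the (t+1)-th cost function at x.
    Plays x_t, sees the cost, then steps downhill and projects onto F."""
    x, played = 0.5, []
    for t, grad in enumerate(gradients, start=1):
        played.append(x)                       # commit to x_t before seeing c_t
        x = project(x - grad(x) / math.sqrt(t))  # eta_t = 1/sqrt(t)
    return played

# illustrative run: every cost is c(x) = (x - 0.8)^2
xs = greedy_projection([lambda x: 2 * (x - 0.8)] * 50)
```

Against this fixed cost sequence the iterates settle at the minimizer 0.8; the point of the analysis is that regret stays low even when the cost functions change adversarially from step to step.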
Kearns, Mansour, and Singh (2000) presented an algorithm, called Infinitesimal Gradient Ascent (IGA), for two-player, two-action general-sum games, and proved that its average reward converges to that of a Nash equilibrium. In this paper, we extend this algorithm to n-action games and establish that IGA is a marginal best response against a large class of algorithms. We also compare and contrast marginal best response, Nash equilibria, and correlated equilibria in order to compare IGA with other algorithms.
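As a toy picture of these gradient dynamics, the sketch below uses a small fixed step size as a discrete stand-in for IGA on matching pennies (a made-up choice of two-player, two-action game; the step size and starting strategies are also illustrative). Each player ascends the gradient of its expected payoff in its own mixed strategy, clipped to [0, 1]; the strategies themselves cycle, but their averages approach the Nash equilibrium (1/2, 1/2).

```python
A = [[1, -1], [-1, 1]]    # row player's payoffs in matching pennies
B = [[-1, 1], [1, -1]]    # column player's payoffs, from her own perspective

def payoff_grad(M, opp):
    """Derivative of expected payoff w.r.t. one's own probability of
    action 0, when the opponent plays action 0 with probability opp."""
    e0 = M[0][0] * opp + M[0][1] * (1 - opp)   # payoff of pure action 0
    e1 = M[1][0] * opp + M[1][1] * (1 - opp)   # payoff of pure action 1
    return e0 - e1

clip = lambda x: max(0.0, min(1.0, x))

p, q, eta, T = 0.9, 0.2, 0.01, 20000
sp = sq = 0.0
for _ in range(T):
    # simultaneous gradient steps, each against the opponent's old strategy
    p, q = (clip(p + eta * payoff_grad(A, q)),
            clip(q + eta * payoff_grad(B, p)))
    sp += p
    sq += q

avg_p, avg_q = sp / T, sq / T    # empirical average strategies
```

This is the sense in which convergence holds for IGA: the joint strategies orbit the equilibrium rather than settling on it, while the time averages converge.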
This paper discusses symmetries in stochastic games. Prior work has considered symmetric bimatrix games; we extend this concept so that players can have internal symmetries among their own states, as well as states that are symmetric with the states of other players in the game. We prove that if a stochastic game possesses a symmetry, then the best-response function is symmetric and there exists a symmetric Nash equilibrium.