Publications

Following is a list of the papers I have submitted for publication. My primary research interest is in the theoretical aspects of multiagent learning. Multiagent learning diverges most from single agent learning when the agents involved have different goals and desires. Hence, the theoretical aspects of multiagent learning this field can greatly benefit from studying and extending the field of game theory. However, because learning inherently implies some aspect of suboptimality, theoreticians in multiagent learning should not allow themselves to be restricted to the study of Nash equilibria, but should attempt to guarantee results with regards to interacting both with rational and irrational agents.

Symmetry in Markov Decision Processes and its Implications for Single Agent and Multiagent Learning dvi ps pdf
with Tucker Balch. Appeared in the Eighteenth International Conference on Machine Learning(ICML-2001).

This paper examines the notion of symmetry in Markov decision processes (MDPs). We define symmetry for an MDP and show how it can be exploited for more effective learning in single agent systems as well as multiagent systems and multirobot systems. We prove that if an MDP possesses a symmetry, then the optimal value function and Q function are similarly symmetric and there exists a symmetric optimal policy. If an MDP is known to possess a symmetry, this knowledge can be applied to decrease the number of training examples needed for algorithms like Q learning and value iteration. It can also be used to directly restrict the hypothesis space.
Using Symmetry in Asymmetrical Markov Decision Processes (Extended Abstract) dvi ps pdf
with Tucker Balch. Appeared in the Fifth International Conference on Autonomous Agents(AGENTS-2001) Workshop on Learning Agents(workshop website).

Here is a draft of the proof(do not distribute)dvi ps pdf

In Zinkevich and Balch(2001), we proved that a symmetric MDP has a symmetric optimal policy. However, in reality, systems may possess slight asymmetries. In this abstract, we present without proof how in slightly asymmetric MDPs there exists a slightly suboptimal symmetric policy. In order to specify the bound on the suboptimality of some symmetric policy, we use notation used in the other paper.
Online Algorithms for Market Clearing dvi ps pdf
with Avrim Blum and Tuomas Sandholm. To appear in the Thirteenth Annual Symposium on Discrete Algorithms(SODA-2002).

In this paper we study the problem of online market clearing where there is one commodity in the market, being bought and sold by multiple buyers and sellers who submit buy and sell bids that arrive and expire at different times. The auctioneer is faced with an online clearing problem of deciding which buy and sell bids to match without knowing what bids will arrive in the future.
Competitive Analysis of the Explore/Exploit Tradeoff pdf
with John Langford and Sham Kakade. Nineteenth International Conference on Machine Learning (ICML-2002).
On Polynomial-Time Preference Elicitation with Value Queries pdf
with Avrim Blum and Tuomas Sandholm. ACM Conference on Electronic Commerce (ACM EC 2003).
Preference Elicitation and Query Learning pdf
with Avrim Blum, Jeff Jackson, and Tuomas Sandholm. The Sixteenth Annual Conference on Learning Theory (COLT 2003).
Online Convex Programming and Generalized Infinitesimal Gradient Ascent pdf
The Twentieth International Conference on Machine Learning (ICML 2003).

The following are works in progress:

Online Convex Programming and Generalized Infinitesimal Gradient Ascent ps dvi pdf
CMU Technical Report CMU-CS-03-110.
Convex programming involves a convex set F and a convex function c:F->R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
Marginal Best Response, Nash Equilibria, and Iterated Gradient Ascent dvi ps pdf
with Patrick Riley,Michael Bowling, and Avrim Blum

Kearns, Mansour, and Singh (2000) presented an algorithm, called Iterated Gradient Ascent(IGA), for two-player, two-action general-sum games, for which they proved that the average reward converges to that of a Nash equilibria. In this paper, we extend this algorithm to n-action games, and establish that IGA is a marginal best response against a large class of algorithms. We also compare and contrast marginal best response, Nash equilibria, and correlated equilibria, in order to compare IGA with other algorithms.
Generalized Symmetry in Stochastic Games dvi ps pdf

This paper discusses symmetries in stochastic games. In the past, people have discussed symmetric bimatrix games. However, we extend this concept so that players can have internal symmetries between their own states, as well as have states which are symmetric with the states of other players in the game. We prove that if a stochastic game possesses a symmetry, then the best response function is symmetric and there exists a symmetric Nash equilibrium.

Martin Zinkevich