UBC Theses and Dissertations



Algorithmic learning in games

Possnig, Clemens

Abstract

This dissertation studies algorithmic agents that interact repeatedly in strategic settings. Chapter 2 provides asymptotic results for a family of reinforcement learning algorithms known as ‘actor-critic learners’. Each such algorithmic agent simultaneously estimates what is called a ‘critic’, such as a value function, and updates its policy, referred to as the ‘actor’. The critic is used to indicate directions of improvement for the actor. I establish sufficient conditions for the consistency of each agent’s parametric critic estimator, which enables the agents to adapt and find optimal responses despite the non-stationarity inherent to multi-agent settings. The conditions depend on the environment, the number of observations used in the critic estimation, and the policy step size. Chapter 3 presents an analytical characterization of the long-run policies learned by algorithmic agents in the multi-agent setting. The algorithms studied here form a superset of the family considered in Chapter 2. These algorithms update policies, which are maps from observed states to actions. I show that the long-run policies correspond to equilibria that are stable points of a tractable differential equation. In Chapter 4, I consider algorithmic agents playing a repeated Cournot game of quantity competition. In this setting, learning the stage-game Nash equilibrium serves as a noncollusive benchmark. I give necessary and sufficient conditions under which this Nash equilibrium is not learned. These conditions are requirements on the state variables of the algorithms and on the stage game. When algorithms determine actions based only on the past period’s price, the Nash equilibrium can be learned. However, agents may condition their actions on richer state variables beyond the past period’s price. In that case, I give sufficient conditions under which the policies converge to a collusive equilibrium with positive probability, while never converging to the Nash equilibrium.
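The actor-critic structure described for Chapter 2 can be sketched in a few lines. The Python snippet below is an illustrative toy only, not the dissertation's algorithm: the bandit-style environment, the single-state baseline critic, the softmax actor, and all step-size values are assumptions made for exposition. The abstract's point that the critic's consistency hinges on step-size choices is mirrored here by updating the critic on a faster scale than the actor.

import numpy as np

rng = np.random.default_rng(0)

# Toy environment (assumed for illustration): two actions with noisy rewards.
def reward(action):
    means = [1.0, 2.0]          # action 1 is better on average
    return means[action] + rng.normal(scale=0.5)

theta = np.zeros(2)             # actor: softmax policy parameters
v = 0.0                         # critic: a single baseline value estimate
alpha_actor, alpha_critic = 0.01, 0.05   # critic updated on a faster timescale

for t in range(20_000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(2, p=probs)
    r = reward(a)

    # Critic update: move the value estimate toward the observed reward.
    delta = r - v
    v += alpha_critic * delta

    # Actor update: the critic's error indicates the direction of improvement.
    grad_log = -probs
    grad_log[a] += 1.0
    theta += alpha_actor * delta * grad_log

print("learned action probabilities:", np.round(probs, 3))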
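Chapter 3 characterizes long-run policies as stable points of a differential equation. The dissertation's actual equation is not reproduced here; the snippet below only illustrates, for an assumed two-dimensional dynamic, how stability of a rest point can be checked through the eigenvalues of the Jacobian.

import numpy as np

# Assumed illustrative dynamic theta_dot = g(theta). This is NOT the
# dissertation's equation, just a stand-in to show the stability check.
def g(theta):
    x, y = theta
    return np.array([-x + 0.5 * y, 0.25 * x - y])

def jacobian(f, theta, eps=1e-6):
    # Numerical Jacobian of f at theta via central differences.
    n = theta.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return J

rest_point = np.zeros(2)                 # g(0) = 0, so this is a rest point
eigs = np.linalg.eigvals(jacobian(g, rest_point))
print("eigenvalues:", np.round(eigs, 3))
print("stable rest point:", bool(np.all(eigs.real < 0)))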
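Chapter 4's noncollusive benchmark is the stage-game Nash equilibrium of a Cournot quantity game. As a hedged illustration only, the snippet below computes that benchmark for a symmetric duopoly with linear inverse demand P(Q) = a - b*Q and constant marginal cost c, alongside the collusive (joint-monopoly) output; the parameter values are assumptions, not taken from the dissertation.

# Stage-game Cournot Nash benchmark for a symmetric duopoly (illustrative only).
# Inverse demand P(Q) = a - b * Q, constant marginal cost c; parameters assumed.
a, b, c = 10.0, 1.0, 2.0

# Firm i's best response to rival quantity q_j maximizes (a - b*(q_i + q_j) - c) * q_i,
# giving q_i = (a - c - b*q_j) / (2*b). The symmetric fixed point is:
q_nash = (a - c) / (3 * b)
p_nash = a - b * 2 * q_nash

# Collusive (joint-monopoly) output per firm, for comparison with the Nash benchmark.
q_collusive = (a - c) / (4 * b)
p_collusive = a - b * 2 * q_collusive

print(f"Cournot Nash quantity per firm: {q_nash:.3f}, price: {p_nash:.3f}")
print(f"Collusive quantity per firm:    {q_collusive:.3f}, price: {p_collusive:.3f}")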


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International