Open Collections
UBC Theses and Dissertations
Algorithmic learning in games
Possnig, Clemens
Abstract
This dissertation studies algorithmic agents that interact repeatedly in strategic settings. Chapter 2 provides asymptotic results for a family of reinforcement learning algorithms known as ‘actor-critic learners’. Each such agent simultaneously estimates a ‘critic’, such as a value function, and updates its policy, which is referred to as the ‘actor’. The critic indicates directions of improvement for the actor. I establish sufficient conditions for the consistency of each agent’s parametric critic estimator, which enables the agents to adapt and find optimal responses despite the non-stationarity inherent to multi-agent settings. The conditions depend on the environment, the number of observations used in the critic estimation, and the policy stepsize.

Chapter 3 presents an analytical characterization of the long-run policies learned by algorithmic agents in the multi-agent setting. The algorithms studied here form a superset of the family considered in Chapter 2. These algorithms update policies, which are maps from observed states to actions. I show that the long-run policies correspond to equilibria that are stable points of a tractable differential equation.

In Chapter 4, I consider algorithmic agents playing a repeated Cournot game of quantity competition. In this setting, learning the stage game Nash equilibrium serves as a noncollusive benchmark. I give necessary and sufficient conditions for this Nash equilibrium not to be learned. These conditions are requirements on the state variables of the algorithms and on the stage game. When algorithms determine actions based only on the past period’s price, the Nash equilibrium can be learned. However, agents may condition their actions on richer state variables beyond the past period’s price. In that case, I give sufficient conditions under which the policies converge to a collusive equilibrium with positive probability, while never converging to the Nash equilibrium.
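To make the noncollusive benchmark from Chapter 4 concrete, the following is a minimal sketch of a Cournot stage game and its Nash equilibrium. It is an illustration only and not the dissertation's model or learning algorithm: the two-firm setup, the linear inverse demand `P = a - b * (q1 + q2)`, the constant marginal cost `c`, the parameter values, and all function names are assumptions introduced here. The sketch finds the Nash quantities by iterating best responses and compares the resulting profit with the profit at the symmetric joint-profit-maximizing (collusive) quantity.

```python
# Illustrative two-firm Cournot stage game (assumed: linear demand, constant marginal cost).

def best_response(q_other, a, b, c):
    """Profit-maximizing quantity against a rival producing q_other."""
    return max(0.0, (a - c - b * q_other) / (2.0 * b))


def profit(q_own, q_other, a, b, c):
    """Stage-game profit under inverse demand P = a - b * (q_own + q_other)."""
    price = max(0.0, a - b * (q_own + q_other))
    return (price - c) * q_own


def cournot_nash(a, b, c, iterations=200):
    """Approximate the Nash quantities by simultaneous best-response iteration."""
    q1 = q2 = 0.0
    for _ in range(iterations):
        q1, q2 = best_response(q2, a, b, c), best_response(q1, a, b, c)
    return q1, q2


def collusive_quantity(a, b, c):
    """Per-firm quantity when the two firms split the monopoly output equally."""
    return (a - c) / (4.0 * b)


# Hypothetical parameter values chosen only for illustration.
a, b, c = 10.0, 1.0, 1.0
q_nash, _ = cournot_nash(a, b, c)
q_coll = collusive_quantity(a, b, c)
nash_profit = profit(q_nash, q_nash, a, b, c)
coll_profit = profit(q_coll, q_coll, a, b, c)
```

Under these assumptions each firm earns more at the collusive quantity than at the Nash quantity, which is why convergence to the stage game Nash equilibrium is a natural noncollusive reference point.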
Item Metadata
Title | Algorithmic learning in games
Creator | Possnig, Clemens
Supervisor | |
Publisher | University of British Columbia
Date Issued | 2023
Genre | |
Type | |
Language | eng
Date Available | 2023-09-07
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0435830
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor | University of British Columbia
Graduation Date | 2023-11
Campus | |
Scholarly Level | Graduate
Rights URI | |
Aggregated Source Repository | DSpace