Reinforcement learning in non-stationary games

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Reinforcement learning in non-stationary games Namvar Gharehshiran, Omid

Abstract

The unifying theme of this thesis is the design and analysis of adaptive procedures that are aimed at learning the optimal decision in the presence of uncertainty. The first part is devoted to strategic decision making involving multiple individuals with conflicting interests. This is the subject of non-cooperative game theory. The proliferation of social networks has led to new ways of sharing information. Individuals subscribe to social groups, in which their experiences are shared. This new information patterns facilitate the resolution of uncertainties. We present an adaptive learning algorithm that exploits these new patterns. Despite its deceptive simplicity, if followed by all individuals, the emergent global behavior resembles that obtained from fully rational considerations, namely, correlated equilibrium. Further, it responds to the random unpredictable changes in the environment by properly tracking the evolving correlated equilibria set. Numerical evaluations verify these new information patterns can lead to improved adaptability of individuals and, hence, faster convergence to correlated equilibrium. Motivated by the self-configuration feature of the game-theoretic design and the prevalence of wireless-enabled electronics, the proposed adaptive learning procedure is then employed to devise an energy-aware activation mechanism for wireless-enabled sensors which are assigned a parameter estimation task. The proposed game-theoretic model trades-off sensors' contribution to the estimation task and the associated energy costs. The second part considers the problem of a single decision maker who seeks the optimal choice in the presence of uncertainty. This problem is mathematically formulated as a discrete stochastic optimization. In many real-life systems, due to the unexplained randomness and complexity involved, there typically exists no explicit relation between the performance measure of interest and the decision variables. In such cases, computer simulations are used as models of real systems to evaluate output responses. We present two simulation-based adaptive search schemes and show that, by following these schemes, the global optimum can be properly tracked as it undergoes random unpredictable jumps over time. Further, most of the simulation effort is exhausted on the global optimizer. Numerical evaluations verify faster convergence and improved efficiency as compared with existing random search, simulated annealing, and upper confidence bound methods.

Item Metadata

Title	Reinforcement learning in non-stationary games
Creator	Namvar Gharehshiran, Omid
Publisher	University of British Columbia
Date Issued	2015
Description	The unifying theme of this thesis is the design and analysis of adaptive procedures that are aimed at learning the optimal decision in the presence of uncertainty. The first part is devoted to strategic decision making involving multiple individuals with conflicting interests. This is the subject of non-cooperative game theory. The proliferation of social networks has led to new ways of sharing information. Individuals subscribe to social groups, in which their experiences are shared. This new information patterns facilitate the resolution of uncertainties. We present an adaptive learning algorithm that exploits these new patterns. Despite its deceptive simplicity, if followed by all individuals, the emergent global behavior resembles that obtained from fully rational considerations, namely, correlated equilibrium. Further, it responds to the random unpredictable changes in the environment by properly tracking the evolving correlated equilibria set. Numerical evaluations verify these new information patterns can lead to improved adaptability of individuals and, hence, faster convergence to correlated equilibrium. Motivated by the self-configuration feature of the game-theoretic design and the prevalence of wireless-enabled electronics, the proposed adaptive learning procedure is then employed to devise an energy-aware activation mechanism for wireless-enabled sensors which are assigned a parameter estimation task. The proposed game-theoretic model trades-off sensors' contribution to the estimation task and the associated energy costs. The second part considers the problem of a single decision maker who seeks the optimal choice in the presence of uncertainty. This problem is mathematically formulated as a discrete stochastic optimization. In many real-life systems, due to the unexplained randomness and complexity involved, there typically exists no explicit relation between the performance measure of interest and the decision variables. In such cases, computer simulations are used as models of real systems to evaluate output responses. We present two simulation-based adaptive search schemes and show that, by following these schemes, the global optimum can be properly tracked as it undergoes random unpredictable jumps over time. Further, most of the simulation effort is exhausted on the global optimizer. Numerical evaluations verify faster convergence and improved efficiency as compared with existing random search, simulated annealing, and upper confidence bound methods.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2015-01-29
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivs 2.5 Canada
DOI	10.14288/1.0135671
URI	http://hdl.handle.net/2429/51993
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2015-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/2.5/ca/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Reinforcement learning in non-stationary games Namvar Gharehshiran, Omid

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights