UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Reinforcement learning in non-stationary games Namvar Gharehshiran, Omid

Abstract

The unifying theme of this thesis is the design and analysis of adaptive procedures that are aimed at learning the optimal decision in the presence of uncertainty. The first part is devoted to strategic decision making involving multiple individuals with conflicting interests. This is the subject of non-cooperative game theory. The proliferation of social networks has led to new ways of sharing information. Individuals subscribe to social groups, in which their experiences are shared. This new information patterns facilitate the resolution of uncertainties. We present an adaptive learning algorithm that exploits these new patterns. Despite its deceptive simplicity, if followed by all individuals, the emergent global behavior resembles that obtained from fully rational considerations, namely, correlated equilibrium. Further, it responds to the random unpredictable changes in the environment by properly tracking the evolving correlated equilibria set. Numerical evaluations verify these new information patterns can lead to improved adaptability of individuals and, hence, faster convergence to correlated equilibrium. Motivated by the self-configuration feature of the game-theoretic design and the prevalence of wireless-enabled electronics, the proposed adaptive learning procedure is then employed to devise an energy-aware activation mechanism for wireless-enabled sensors which are assigned a parameter estimation task. The proposed game-theoretic model trades-off sensors' contribution to the estimation task and the associated energy costs. The second part considers the problem of a single decision maker who seeks the optimal choice in the presence of uncertainty. This problem is mathematically formulated as a discrete stochastic optimization. In many real-life systems, due to the unexplained randomness and complexity involved, there typically exists no explicit relation between the performance measure of interest and the decision variables. In such cases, computer simulations are used as models of real systems to evaluate output responses. We present two simulation-based adaptive search schemes and show that, by following these schemes, the global optimum can be properly tracked as it undergoes random unpredictable jumps over time. Further, most of the simulation effort is exhausted on the global optimizer. Numerical evaluations verify faster convergence and improved efficiency as compared with existing random search, simulated annealing, and upper confidence bound methods.

Item Citations and Data

Rights

Attribution-NonCommercial-NoDerivs 2.5 Canada