Non-linear contextual bandits

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Non-linear contextual bandits Chia, John

Abstract

The multi-armed bandit framework can be motivated by any problem where there is an abundance of choice and the utility of trying something new must be balanced with that of going with the status quo. This is a trade-off that is present in the everyday problem of where and what to eat: should I try a new restaurant or go to that Chinese place on the corner? In this work, a multi-armed bandit algorithm is presented which uses a non-parametric non-linear data model (a Gaussian process) to solve problems of this sort. The advantages of this method over existing work is highlighted through experiments. The method is also capable of modelling correlations between separate instances of problems, e.g., between similar dishes at similar restaurants. To demonstrate this, a few experiments are performed. The first, a synthetic example where the reward function is actually sampled from a Gaussian process, begs the question but helps pin down the properties of the algorithm in a controlled environment. The second, a problem where the objective is to aim a cannon at a distant target, shows how a well-defined objective, i.e., hit the target, can be used to speed up convergence. Finally, the third, an experiment with photographic post-processing, shows how the algorithm can learn from experience. The experiments demonstrate both the flexibility and the computational complexity of the model. This complexity means that problems such as the aforementioned restaurant problem, among others, are still future work.

Item Metadata

Title	Non-linear contextual bandits
Creator	Chia, John
Publisher	University of British Columbia
Date Issued	2012
Description	The multi-armed bandit framework can be motivated by any problem where there is an abundance of choice and the utility of trying something new must be balanced with that of going with the status quo. This is a trade-off that is present in the everyday problem of where and what to eat: should I try a new restaurant or go to that Chinese place on the corner? In this work, a multi-armed bandit algorithm is presented which uses a non-parametric non-linear data model (a Gaussian process) to solve problems of this sort. The advantages of this method over existing work is highlighted through experiments. The method is also capable of modelling correlations between separate instances of problems, e.g., between similar dishes at similar restaurants. To demonstrate this, a few experiments are performed. The first, a synthetic example where the reward function is actually sampled from a Gaussian process, begs the question but helps pin down the properties of the algorithm in a controlled environment. The second, a problem where the objective is to aim a cannon at a distant target, shows how a well-defined objective, i.e., hit the target, can be used to speed up convergence. Finally, the third, an experiment with photographic post-processing, shows how the algorithm can learn from experience. The experiments demonstrate both the flexibility and the computational complexity of the model. This complexity means that problems such as the aforementioned restaurant problem, among others, are still future work.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2012-04-23
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-ShareAlike 3.0 Unported
DOI	10.14288/1.0052153
URI	http://hdl.handle.net/2429/42191
Degree	Master of Science - MSc
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2012-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-sa/3.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Non-linear contextual bandits Chia, John

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights