UBC Theses and Dissertations
Non-linear contextual bandits
Chia, John
The multi-armed bandit framework applies to any problem where there is an abundance of choice and the utility of trying something new must be balanced against that of sticking with the status quo. This trade-off arises in the everyday problem of where and what to eat: should I try a new restaurant or go to that Chinese place on the corner? In this work, a multi-armed bandit algorithm is presented which uses a non-parametric, non-linear data model (a Gaussian process) to solve problems of this sort. The advantages of this method over existing work are highlighted through experiments. The method can also model correlations between separate instances of a problem, e.g., between similar dishes at similar restaurants. Three experiments demonstrate this. The first, a synthetic example where the reward function is itself sampled from a Gaussian process, is circular by construction but helps pin down the properties of the algorithm in a controlled environment. The second, a problem where the objective is to aim a cannon at a distant target, shows how a well-defined objective, i.e., hitting the target, can be used to speed up convergence. The third, an experiment with photographic post-processing, shows how the algorithm can learn from experience. Together, the experiments demonstrate both the flexibility and the computational cost of the model; that cost means that problems at the scale of the restaurant problem above remain future work.
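The kind of algorithm the abstract describes can be sketched as follows. This is a minimal illustration, not the thesis's actual method: it assumes a GP-UCB-style acquisition rule with a squared-exponential kernel over a synthetic one-dimensional arm set, and all hyperparameters (length scale, noise level, exploration weight `beta`) are placeholders chosen for the example.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=0.25):
    # Squared-exponential covariance between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X_obs, y_obs, X_star, noise=1e-2):
    # Standard GP regression: posterior mean and variance at X_star
    # given noisy observations (X_obs, y_obs).
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_star, X_obs)
    mu = K_s @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, K_s.T)
    var = np.diag(rbf_kernel(X_star, X_star)) - np.sum(K_s * v.T, axis=1)
    return mu, np.maximum(var, 0.0)

def gp_ucb(arms, reward_fn, rounds=30, beta=2.0, seed=0):
    # GP-UCB bandit loop: each round, pull the arm maximising
    # posterior mean + beta * posterior std, then observe a noisy reward.
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(rounds):
        if not X:
            idx = int(rng.integers(len(arms)))  # first pull is random
        else:
            mu, var = gp_posterior(np.array(X), np.array(y), arms)
            idx = int(np.argmax(mu + beta * np.sqrt(var)))
        X.append(arms[idx])
        y.append(reward_fn(arms[idx]) + 0.01 * rng.standard_normal())
    return np.array(X), np.array(y)
```

For instance, with 25 arms on [0, 1] and a (hypothetical) reward peaking at 0.6, `gp_ucb(np.linspace(0, 1, 25)[:, None], lambda x: -(x[0] - 0.6) ** 2)` explores the arm set early on, then concentrates pulls near the optimum as the posterior variance shrinks. The flexibility and the cost noted in the abstract are both visible here: the GP models correlations between arms for free, but each round solves a linear system that grows with the number of observations.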
Item Citations and Data
Attribution-NonCommercial-ShareAlike 3.0 Unported