UBC Theses and Dissertations


One weird trick for advertising outcomes: an exploration of the multi-armed bandit for performance-driven marketing

Burtini, Giuseppe Antonio


In this work, we explore an online reinforcement learning problem called the multi-armed bandit and apply it to improving outcomes in a web marketing context. Specifically, we aim to produce tools for efficiently designing experiments on variations of a website, with the goal of increasing some desired behavior such as purchases. We provide a detailed reference, through a statistical lens, to the existing research on variants of the problem and their associated policies, then produce a set of theoretical and empirical analyses of questions specific to the application area. Concretely, we make five contributions. First, we present a new standardized simulation platform that integrates knowledge and techniques from the existing literature for the evaluation of bandit algorithms in a set of pre-defined worlds. To the best of our knowledge, this is the first comprehensive simulation platform for multi-armed bandits capable of arbitrary arms, parameterization, algorithms and repeatable experimentation. Second, we integrate Thompson sampling into linear model techniques and explore a number of implementation questions, finding that both replication of Thompson sampling and adjusting for estimative uncertainty are plausible mechanisms for improving the results. Third, we explore novel techniques for dealing with certain types of structural non-stationarity such as drift, and find that weighted least squares is a strong tool for handling both known and unknown drift. Empirically, when the form of the drift is unspecified, an exponentially decaying weight provides good performance in a large variety of cases; when it is specified, an experimenter can select a weighting strategy that reflects their known drift, achieving state-of-the-art results. Fourth, we present the first known oracle-free measure of regret, called statistical regret, which uses intuitions from the confidence interval to produce a type of interval metric: late-experiment knowledge is replayed over prior actions to determine how performant an experimenter can believe their results to be. Fifth, we present preliminary results on a specification-robust and computationally efficient sampling technique called the Simple Optimistic Sampler, which shows promising outcomes and requires no modelling assumptions to implement.
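As a rough illustration of the third contribution, the sketch below shows how an exponentially decaying weight changes a least-squares estimate when an arm's reward drifts. It is not the thesis's implementation: it uses the simplest possible case (an intercept-only model, where weighted least squares reduces to a weighted mean), and the function names, the decay rate of 0.9 and the reward sequence are all assumptions chosen for the example.

```python
def decayed_weights(n, decay=0.9):
    """Weights for n observations, oldest first; the newest gets weight 1.

    The decay rate is a hypothetical choice, not a value from the thesis.
    """
    return [decay ** (n - 1 - i) for i in range(n)]


def weighted_mean(rewards, weights):
    """Intercept-only weighted least squares.

    Minimizing sum_i w_i * (y_i - b)**2 over b gives the weighted mean.
    """
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, rewards)) / total


# Hypothetical noise-free reward levels for one arm: the underlying
# rate drifts from 0.2 to 0.8 halfway through the experiment.
rewards = [0.2] * 50 + [0.8] * 50

# Equal weights average over the whole history and land between the two levels.
unweighted = weighted_mean(rewards, [1.0] * len(rewards))

# Decayed weights emphasize recent observations and track the current level.
decayed = weighted_mean(rewards, decayed_weights(len(rewards), decay=0.9))

print(f"unweighted estimate: {unweighted:.3f}")
print(f"decayed estimate:    {decayed:.3f}")
```

With equal weights the estimate sits midway between the old and new reward levels, while the decayed estimate stays close to the current level. When the form of the drift is known, a different weight function could be substituted for `decayed_weights`.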



Attribution-NonCommercial-NoDerivs 2.5 Canada