One weird trick for advertising outcomes : an exploration of the multi-armed bandit for performance-driven marketing

UBC Theses and Dissertations


One weird trick for advertising outcomes : an exploration of the multi-armed bandit for performance-driven marketing Burtini, Giuseppe Antonio

Abstract

In this work, we explore an online reinforcement learning problem called the multi-armed bandit for application to improving outcomes in a web marketing context. Specifically, we aim to produce tools for the efficient experiment design of variations of a website with the goal of increasing some desired behavior such as purchases. We provide a detailed reference, with a statistical lens, of the existing research on variants of the problem and their associated policies, then produce a set of theoretical and empirical analyses of questions specific to the application area. Concretely, we provide a number of contributions. First, we present a new standardized simulation platform integrating knowledge and techniques from the existing literature for the evaluation of bandit algorithms in a set of pre-defined worlds. To the best of our knowledge, this is the first comprehensive simulation platform for multi-armed bandits capable of arbitrary arms, parameterization, algorithms and repeatable experimentation. Second, we integrate Thompson sampling into linear model techniques and explore a number of implementation questions, finding that both replication of Thompson sampling and adjusting for estimative uncertainty are plausible mechanisms for improving the results. Third, we explore novel techniques for dealing with certain types of structural non-stationarity such as drift and find that the technique of weighted least squares is a strong tool for handling both known and unknown drift. Empirically, in the unspecified case, an exponentially decaying weight provides good performance in a large variety of cases; in the specified case, an experimenter can select a weighting strategy to reflect their known drift, achieving state-of-the-art results. Fourth, we present the first known oracle-free measure of regret, called statistical regret, which utilizes intuitions from the confidence interval to produce a type of interval metric by replaying late-experiment knowledge over prior actions to determine how performant an experimenter can believe their results to be. Fifth, we present preliminary results on a specification-robust and computationally efficient sampling technique called the Simple Optimistic Sampler, which shows promising outcomes and requires no modelling assumptions to implement.
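
For readers unfamiliar with the setting, the following is a minimal sketch of textbook Beta-Bernoulli Thompson sampling applied to simulated website variations. It is not taken from the thesis and is not the linear-model variant described above. The function name `thompson_sampling`, the conversion rates, the horizon and the seed are all hypothetical choices made for illustration. The regret it reports is the conventional oracle-based kind, which can only be computed in simulation where the true rates are known; it is not the statistical regret introduced in this work.

```python
import random


def thompson_sampling(true_rates, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated arms (illustrative only).

    true_rates: hypothetical conversion probability of each website variation.
    Returns (pull counts per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    best = max(true_rates)
    regret = 0.0
    for _ in range(horizon):
        # Draw one plausible conversion rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then show the arm with the largest draw.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate whether this visitor converts.
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        # Oracle regret: expected loss relative to always showing the best arm.
        regret += best - true_rates[arm]
    return pulls, regret


# Two hypothetical page variations with 4% and 6% conversion rates.
pulls, regret = thompson_sampling([0.04, 0.06], horizon=5000)
print(pulls, round(regret, 2))
```

Passing a fixed `seed` makes a run repeatable, which is the property a simulation platform needs in order to compare policies on the same simulated world.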

Item Metadata

Title	One weird trick for advertising outcomes : an exploration of the multi-armed bandit for performance-driven marketing
Creator	Burtini, Giuseppe Antonio
Publisher	University of British Columbia
Date Issued	2015
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2015-10-24
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivs 2.5 Canada
DOI	10.14288/1.0166786
URI	http://hdl.handle.net/2429/55064
Degree (Theses)	Master of Science - MSc
Program (Theses)	Interdisciplinary Studies
Affiliation	Graduate Studies, College of (Okanagan)
Degree Grantor	University of British Columbia
Graduation Date	2015-11
Campus	UBCO
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/2.5/ca/
Aggregated Source Repository	DSpace
