UBC Theses and Dissertations
Online learning and decision making : algorithms and Bayesian analysis Lin, Meichun
Decision making under uncertainty and limited information has been a critical challenge in operations. This thesis studies three applications and sheds light on how to make sequential decisions as data for learning are collected over time. In the first chapter, we consider the data-driven newsvendor problem where a manager makes inventory decisions sequentially and learns the unknown demand distribution based on observed samples of continuous demand (no truncation). We show that the widely-used sample average approximation approach is near-optimal. Moreover, we characterize how the best achievable performance depends on not only the time horizon but also the local flatness of the demand distribution. The second chapter considers a dynamic pricing and learning problem where a seller prices multiple products and learns from sales data about unknown demand. To avoid the classical problem of incomplete learning, we propose dithering policies under which prices are probabilistically selected in a neighborhood surrounding the myopic optimal price. We show that dithering policies achieve asymptotically optimal performance in three typical settings and their extensions with demand correlation, which demonstrates dithering as a unified approach to balance exploration and exploitation. The third chapter considers a sequential search over a group of similar alternatives to select the best one. The individual value of an alternative contains two components: an observable utility and an idiosyncratic value. Once an alternative is searched, the utility can be fully revealed, but the idiosyncratic value is unobservable and needs to be learned gradually by sampling. The utilities share an unknown population distribution, which captures the similarity across the alternatives and allows for knowledge transfer within the group. A novel feature of this problem is the combination of the individual and population levels of learning. We formulate the problem as a Bayesian dynamic program and show that the optimal policy can be found by comparing the mean estimates of the current alternative and the population. We also derive other structural properties to provide managerial insights and shed light on the two levels of learning.
Item Citations and Data
Attribution-NonCommercial-NoDerivatives 4.0 International