Online learning and decision making : algorithms and Bayesian analysis

UBC Theses and Dissertations

Online learning and decision making : algorithms and Bayesian analysis Lin, Meichun

Abstract

Decision making under uncertainty and with limited information is a critical challenge in operations. This thesis studies three applications and sheds light on how to make sequential decisions as data for learning are collected over time.

In the first chapter, we consider the data-driven newsvendor problem, in which a manager makes inventory decisions sequentially and learns the unknown demand distribution from observed samples of continuous demand (no truncation). We show that the widely used sample average approximation approach is near-optimal. Moreover, we characterize how the best achievable performance depends not only on the time horizon but also on the local flatness of the demand distribution.

The second chapter considers a dynamic pricing and learning problem in which a seller prices multiple products and learns about unknown demand from sales data. To avoid the classical problem of incomplete learning, we propose dithering policies, under which prices are selected probabilistically in a neighborhood of the myopic optimal price. We show that dithering policies achieve asymptotically optimal performance in three typical settings and in their extensions with demand correlation, demonstrating that dithering is a unified approach to balancing exploration and exploitation.

The third chapter considers a sequential search over a group of similar alternatives to select the best one. The individual value of an alternative contains two components: an observable utility and an idiosyncratic value. Once an alternative is searched, the utility can be fully revealed, but the idiosyncratic value is unobservable and needs to be learned gradually by sampling. The utilities share an unknown population distribution, which captures the similarity across the alternatives and allows for knowledge transfer within the group. A novel feature of this problem is the combination of the individual and population levels of learning. We formulate the problem as a Bayesian dynamic program and show that the optimal policy can be found by comparing the mean estimates of the current alternative and the population. We also derive other structural properties to provide managerial insights and shed light on the two levels of learning.
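
For readers unfamiliar with the sample average approximation (SAA) approach mentioned for the first chapter, the following is a minimal illustrative sketch, not the thesis's own formulation. It assumes linear per-unit underage and overage costs, under which the SAA order quantity is the empirical demand quantile at the critical ratio. All function and parameter names (`saa_order_quantity`, `run_saa`, `first_order`, and so on) are hypothetical and chosen only for this example.

```python
import math


def saa_order_quantity(demands, underage_cost, overage_cost):
    """SAA order: empirical demand quantile at the critical ratio.

    Assumes linear costs; the critical ratio is
    underage_cost / (underage_cost + overage_cost).
    """
    if not demands:
        raise ValueError("need at least one demand sample")
    ratio = underage_cost / (underage_cost + overage_cost)
    ordered = sorted(demands)
    # Smallest sample whose empirical CDF value reaches the critical ratio.
    index = max(math.ceil(ratio * len(ordered)) - 1, 0)
    return ordered[index]


def newsvendor_cost(order, demand, underage_cost, overage_cost):
    """Single-period cost: shortage penalty plus leftover penalty."""
    return (underage_cost * max(demand - order, 0.0)
            + overage_cost * max(order - demand, 0.0))


def run_saa(horizon, underage_cost, overage_cost, draw_demand, first_order):
    """Order sequentially; each period uses only demand observed so far.

    draw_demand is a zero-argument callable returning one demand sample;
    first_order is an arbitrary (assumed) order for the first period,
    when no data exist yet.
    """
    history, total_cost = [], 0.0
    for _ in range(horizon):
        if history:
            order = saa_order_quantity(history, underage_cost, overage_cost)
        else:
            order = first_order
        demand = draw_demand()
        total_cost += newsvendor_cost(order, demand,
                                      underage_cost, overage_cost)
        history.append(demand)  # demand is fully observed (no truncation)
    return total_cost, history
```

Passing a sampler such as `lambda: random.uniform(0, 100)` as `draw_demand` simulates the sequential setting; comparing the accumulated cost with that of the true optimal quantile would give an empirical regret curve, which is the kind of performance measure the abstract refers to.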

Item Metadata

Title	Online learning and decision making : algorithms and Bayesian analysis
Creator	Lin, Meichun
Supervisor	Huh, Woonghee Tim; Kim, Michael Jong
Publisher	University of British Columbia
Date Issued	2023
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2023-06-22
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0433734
URI	http://hdl.handle.net/2429/85134
Degree	Doctor of Philosophy - PhD
Program	Business Administration - Management Science
Affiliation	Business, Sauder School of; Operations and Logistics (OPLOG), Division of
Degree Grantor	University of British Columbia
Graduation Date	2023-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
