Practicable Robust Markov Decision Processes - UBC Library Open Collections

Practicable Robust Markov Decision Processes Xu, Huan

Description

Markov decision processes (MDPs) are a standard modeling tool for sequential decision making in a dynamic and stochastic environment. When the model parameters are subject to uncertainty, the "optimal strategy" obtained from an MDP can significantly underperform the model's prediction. To address this, robust MDPs, which are based on worst-case analysis, have been developed. However, several restrictions of the robust MDP model prevent it from practical success, which I will address in this talk. The first restriction of the standard robust MDP is that its modeling of uncertainty is not flexible and can lead to conservative solutions. In particular, it requires that the uncertainty set be "rectangular", i.e., a Cartesian product of uncertainty sets of each state. To lift this assumption, we propose an uncertainty model that we call "k-rectangular", which generalizes the concept of rectangularity, and we show that it can be solved efficiently via state augmentation. The second restriction is that it does not take into account the learning issue, i.e., how to adapt the model efficiently to reduce the uncertainty. To address this, we devise an algorithm inspired by reinforcement learning that, without knowing the true uncertainty model, is able to adapt its level of protection to uncertainty, and in the long run performs as well as the minimax policy computed as if the true uncertainty model were known. Indeed, the algorithm achieves regret bounds similar to those of standard MDPs where no parameter is adversarial, which shows that with virtually no extra cost we can adapt robust learning to handle uncertainty in MDPs.
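To make the worst-case idea in the description concrete, here is a minimal sketch of robust value iteration. It is not taken from the talk. It assumes a discounted, finite MDP whose uncertainty set is (s, a)-rectangular, a stricter special case of the per-state rectangularity mentioned above, and is given as a finite list of candidate transition distributions for each state-action pair. The function name, the data layout, and the two-state example are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Dist = List[float]  # probability of each next state, indexed by state


def robust_value_iteration(
    rewards: Dict[Tuple[int, int], float],
    uncertainty: Dict[Tuple[int, int], List[Dist]],
    n_states: int,
    n_actions: int,
    gamma: float = 0.9,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> Tuple[List[float], List[int]]:
    """Max-min value iteration over an (s, a)-rectangular uncertainty set."""
    values = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(max_iter):
        new_values = []
        for s in range(n_states):
            best_q, best_a = float("-inf"), 0
            for a in range(n_actions):
                # The adversary picks the worst candidate distribution for (s, a).
                worst = min(
                    sum(p * v for p, v in zip(dist, values))
                    for dist in uncertainty[(s, a)]
                )
                q = rewards[(s, a)] + gamma * worst
                if q > best_q:
                    best_q, best_a = q, a
            new_values.append(best_q)
            policy[s] = best_a
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values, policy


# Hypothetical example: state 1 is absorbing with zero reward.
# In state 0, action 0 is "safe" (reward 1, stays put) and
# action 1 is "risky" (reward 2, may fall into state 1).
rewards = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.0, (1, 1): 0.0}
nominal_sets = {
    (0, 0): [[1.0, 0.0]],
    (0, 1): [[0.9, 0.1]],
    (1, 0): [[0.0, 1.0]],
    (1, 1): [[0.0, 1.0]],
}
# Same model, but the risky transition is only known to be one of two candidates.
robust_sets = dict(nominal_sets)
robust_sets[(0, 1)] = [[0.9, 0.1], [0.5, 0.5]]

nominal_values, nominal_policy = robust_value_iteration(rewards, nominal_sets, 2, 2)
robust_values, robust_policy = robust_value_iteration(rewards, robust_sets, 2, 2)
```

In this toy instance the nominal model prefers the risky action in state 0, while the robust solution switches to the safe action once the adversary may choose the less favorable transition. Because the adversary chooses independently for every state-action pair, the inner minimization decouples; that decoupling is the rectangularity assumption, and it is also what can make the resulting policy conservative.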

Item Metadata

Title	Practicable Robust Markov Decision Processes
Creator	Xu, Huan
Publisher	Banff International Research Station for Mathematical Innovation and Discovery
Date Issued	2018-03-07T09:44
Extent	35 minutes
Subject	Mathematics; Operations research, mathematical programming; Probability theory and stochastic processes; Operation research
Type	Moving Image
File Format	video/mp4
Language	eng
Notes	Author affiliation: Georgia Institute of Technology
Series	BIRS Workshop Lecture Videos (Banff, Alta)
Date Available	2018-09-03
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0371895
URI	http://hdl.handle.net/2429/67079
Affiliation	Non UBC
Peer Review Status	Unreviewed
Scholarly Level	Faculty
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Item Media

201803070944-Xu_lrv.mp4 -- 106.03MB
