Open Collections
UBC Theses and Dissertations
Taking advantage of common assumptions in policy optimization and reinforcement learning
Lavington, Jonathan Wilder
Abstract
This work considers training conditional probability distributions, called policies, in simulated environments using gradient-based optimization methods. It begins by investigating the effects that complex model classes have in settings where a policy is learned by imitating expert data gathered through repeated environmental interaction. Next, it discusses how to build a gradient-based optimizer tailored specifically to policy optimization problems where querying gradient information is expensive. We then consider policy optimization settings that contain imperfect expert demonstrations, and design an algorithm that uses additional information available during training to improve both policy performance at test time and the efficiency of learning. Lastly, we consider how to generate behavioral data that satisfies hard constraints, using a combination of learned inference artifacts and a specialized variant of sequential Monte Carlo.
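For readers unfamiliar with the setting, the following is a minimal REINFORCE-style policy-gradient sketch that illustrates the basic setup the abstract describes: a policy (a conditional distribution over actions given states) trained from simulated rollouts by gradient ascent on expected return. This is a generic textbook illustration, not the thesis's method; the toy chain environment, tabular softmax policy, and all hyperparameters are illustrative assumptions.

```python
# Minimal policy-gradient (REINFORCE) sketch in NumPy.
# Everything here (environment, policy class, step sizes) is a
# hypothetical illustration of the general setting, not the
# algorithms developed in the thesis.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, horizon = 4, 2, 10
theta = np.zeros((n_states, n_actions))  # tabular softmax policy parameters

def step(s, a):
    # Hypothetical chain environment: action 1 moves right, action 0 resets.
    s_next = min(s + 1, n_states - 1) if a == 1 else 0
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

def policy(s):
    # Softmax over the parameters for state s (max-subtracted for stability).
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

for _ in range(500):
    s, grads, rewards = 0, [], []
    for _ in range(horizon):
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        # grad of log pi(a|s) for a tabular softmax: one-hot(a) - p, in row s.
        g = np.zeros_like(theta)
        g[s] = -p
        g[s, a] += 1.0
        grads.append(g)
        s, r = step(s, a)
        rewards.append(r)
    # Weight each log-prob gradient by the return-to-go from that step.
    returns = np.cumsum(rewards[::-1])[::-1]
    theta += 0.1 * sum(g * G for g, G in zip(grads, returns))

print("P(move right | s):", [round(policy(s)[1], 2) for s in range(n_states)])
```

Each gradient query here requires fresh simulated rollouts, which is the expense that motivates the tailored optimizers and imitation-based approaches the abstract refers to.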
Item Metadata
Title | Taking advantage of common assumptions in policy optimization and reinforcement learning
Creator | Lavington, Jonathan Wilder
Supervisor |
Publisher | University of British Columbia
Date Issued | 2024
Description | This work considers training conditional probability distributions, called policies, in simulated environments using gradient-based optimization methods. It begins by investigating the effects that complex model classes have in settings where a policy is learned by imitating expert data gathered through repeated environmental interaction. Next, it discusses how to build a gradient-based optimizer tailored specifically to policy optimization problems where querying gradient information is expensive. We then consider policy optimization settings that contain imperfect expert demonstrations, and design an algorithm that uses additional information available during training to improve both policy performance at test time and the efficiency of learning. Lastly, we consider how to generate behavioral data that satisfies hard constraints, using a combination of learned inference artifacts and a specialized variant of sequential Monte Carlo.
Genre |
Type |
Language | eng
Date Available | 2024-10-31
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NoDerivatives 4.0 International
DOI | 10.14288/1.0447187
URI |
Degree |
Program |
Affiliation |
Degree Grantor | University of British Columbia
Graduation Date | 2025-05
Campus |
Scholarly Level | Graduate
Rights URI |
Aggregated Source Repository | DSpace