- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Equilibrium policy gradients for spatiotemporal planning
Open Collections
UBC Theses and Dissertations
UBC Theses and Dissertations
Equilibrium policy gradients for spatiotemporal planning Crowley, Mark
Abstract
In spatiotemporal planning, agents choose actions at multiple locations in space over some planning horizon to maximize their utility and satisfy various constraints. In forestry planning, for example, the problem is to choose actions for thousands of locations in the forest each year. The actions at each location could include harvesting trees, treating trees against disease and pests, or doing nothing. A utility model could place value on sale of forest products, ecosystem sustainability or employment levels, and could incorporate legal and logistical constraints such as avoiding large contiguous areas of clearcutting and managing road access. Planning requires a model of the dynamics. Existing simulators developed by forestry researchers can provide detailed models of the dynamics of a forest over time, but these simulators are often not designed for use in automated planning. This thesis presents spatiotemoral planning in terms of factored Markov decision processes. A policy gradient planning algorithm optimizes a stochastic spatial policy using existing simulators for dynamics. When a planning problem includes spatial interaction between locations, deciding on an action to carry out at one location requires considering the actions performed at other locations. This spatial interdependence is common in forestry and other environmental planning problems and makes policy representation and planning challenging. We define a spatial policy in terms of local policies defined as distributions over actions at one location conditioned upon actions at other locations. A policy gradient planning algorithm using this spatial policy is presented which uses Markov Chain Monte Carlo simulation to sample the landscape policy, estimate its gradient and use this gradient to guide policy improvement. Evaluation is carried out on a forestry planning problem with 1880 locations using a variety of value models and constraints. The distribution over joint actions at all locations can be seen as the equilibrium of a cyclic causal model. This equilibrium semantics is compared to Structural Equation Models. We also define an algorithm for approximating the equilibrium distribution for cyclic causal networks which exploits graphical structure and analyse when the algorithm is exact.
Item Metadata
Title |
Equilibrium policy gradients for spatiotemporal planning
|
Creator | |
Publisher |
University of British Columbia
|
Date Issued |
2011
|
Description |
In spatiotemporal planning, agents choose actions at multiple locations in space
over some planning horizon to maximize their utility and satisfy various constraints. In forestry planning, for example, the problem is to choose actions
for thousands of locations in the forest each year. The actions at each location
could include harvesting trees, treating trees against disease and pests, or
doing nothing. A utility model could place value on sale of forest products,
ecosystem sustainability or employment levels, and could incorporate legal and
logistical constraints such as avoiding large contiguous areas of clearcutting
and managing road access. Planning requires a model of the dynamics. Existing
simulators developed by forestry researchers can provide detailed models of the
dynamics of a forest over time, but these simulators are often not designed for
use in automated planning.
This thesis presents spatiotemoral planning in terms of factored Markov decision processes. A policy gradient planning algorithm optimizes a stochastic spatial policy using existing simulators for dynamics.
When a planning problem includes spatial interaction between locations, deciding on an action to carry out at one location requires considering the actions performed at other locations. This spatial interdependence is common in forestry and other environmental planning problems and makes policy representation and planning challenging. We define a spatial policy in terms of local policies defined as distributions over actions at one location conditioned upon actions at other locations.
A policy gradient planning algorithm using this spatial policy is presented which
uses Markov Chain Monte Carlo simulation to sample the landscape policy, estimate its gradient and use this gradient to guide policy improvement. Evaluation is carried out on a forestry planning problem with 1880 locations using a variety of value models and constraints.
The distribution over joint actions at all locations can be seen as the equilibrium of a cyclic causal model. This equilibrium semantics is compared to Structural Equation Models. We also define an algorithm for approximating the equilibrium distribution for cyclic causal networks which exploits graphical structure and analyse when the algorithm is exact.
|
Genre | |
Type | |
Language |
eng
|
Date Available |
2011-11-15
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-ShareAlike 3.0 Unported
|
DOI |
10.14288/1.0052093
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2012-05
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|
Item Media
Item Citations and Data
Rights
Attribution-NonCommercial-ShareAlike 3.0 Unported