UBC Theses and Dissertations


Modeling pedestrian behavior in pedestrian-vehicle near misses using inverse reinforcement learning algorithms
Nasernejad, Payam


Using simulation models to conduct safety assessments has several advantages, as it enables the evaluation of various design and traffic management options before changes are actually made. However, few studies have developed microsimulation models for the safety evaluation of active road users such as pedestrians. This can be attributed to the limited ability of existing simulation models to capture the heterogeneity in pedestrian behavior and pedestrians' complex collision avoidance mechanisms. Therefore, the objective of this thesis is to develop an agent-based framework to realistically model pedestrian behavior in near misses and to improve the understanding of pedestrian evasive action mechanisms in interactions with vehicles.

Pedestrian-vehicle conflicts are modeled using single-agent and multi-agent approaches under the Markov Decision Process (MDP) and Markov Game (MG) frameworks, respectively. A continuous Gaussian Process Inverse Reinforcement Learning (GPIRL) approach is implemented to recover pedestrians' single-agent reward functions and infer their collision avoidance mechanisms in conflict situations. In the multi-agent framework, pedestrian-vehicle conflicts are modeled using Multi-Agent Adversarial Inverse Reinforcement Learning (MA-AIRL).

Video data from a congested intersection in Shanghai, China, is used as a case study. Trajectories of pedestrians and vehicles involved in traffic conflicts are extracted with computer vision algorithms. A Deep Reinforcement Learning (DRL) model is used to estimate optimal single-agent pedestrian policies in traffic conflicts. In the multi-agent case, the adversarial IRL approach simulates road users' optimal evasive actions with an implementation of the multi-agent actor-critic using Kronecker-factored trust region (MACK); this algorithm estimates multi-agent policies using the rewards recovered from the discriminator of the adversarial neural network.
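To make the single-agent formulation concrete, the sketch below frames a pedestrian-vehicle conflict as an MDP: a state capturing relative kinematics, a discrete set of evasive actions (swerving and speed changes), and a reward function of the kind an IRL method would recover from observed trajectories. All state variables, action values, and the linear reward weights here are illustrative assumptions, not quantities from the thesis (which recovers a nonlinear reward via GPIRL):

```python
from dataclasses import dataclass
import math

@dataclass
class ConflictState:
    # Hypothetical, simplified conflict state; names are illustrative only.
    rel_x: float       # vehicle position relative to pedestrian, x (m)
    rel_y: float       # vehicle position relative to pedestrian, y (m)
    ped_speed: float   # pedestrian walking speed (m/s)
    veh_speed: float   # vehicle speed (m/s)

# Discrete evasive actions as (heading change in rad, speed change in m/s),
# mirroring the two mechanisms discussed in the abstract: swerving and
# speed changing.
ACTIONS = {
    "keep":         (0.0, 0.0),
    "swerve_left":  (0.3, 0.0),
    "swerve_right": (-0.3, 0.0),
    "slow_down":    (0.0, -0.5),
    "speed_up":     (0.0, 0.5),
}

def reward(state: ConflictState, weights=(-1.0, 0.5)) -> float:
    """Linear stand-in for the reward an IRL step would recover:
    penalize proximity to the vehicle, reward walking speed."""
    dist = math.hypot(state.rel_x, state.rel_y)
    w_prox, w_speed = weights
    return w_prox / max(dist, 0.1) + w_speed * state.ped_speed

s = ConflictState(rel_x=3.0, rel_y=1.0, ped_speed=1.2, veh_speed=6.0)
print(round(reward(s), 3))  # → 0.284
```

In an actual IRL pipeline, the hand-picked `weights` would be replaced by a reward learned from the extracted trajectories, and a policy over `ACTIONS` would then be optimized against that learned reward.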
The results show that the developed models predict pedestrian trajectories and evasive action mechanisms (i.e., swerving and speed changing) in conflict situations with high accuracy. Moreover, the highly nonlinear structure of the reward function in the multi-agent framework captures more complex road-user behavior in near misses and the associated collision avoidance mechanisms. This study is a crucial step toward developing a safety-oriented microsimulation tool for pedestrians in mixed traffic conditions.



License: Attribution-NonCommercial-NoDerivatives 4.0 International