UBC Theses and Dissertations
Modelling motorcyclist-pedestrian interactions using inverse reinforcement learning
Andrade Lanzaro, Gabriel
Traffic simulation models have recently been used for road safety evaluation, with traffic conflict indicators computed from simulated road user trajectories. However, this approach has two main shortcomings: 1) microsimulation traffic models are built on rules that tend to avoid collisions, and 2) they do not realistically model road users’ behaviour and their collision avoidance mechanisms. This research models the interactions between motorcyclists and pedestrians using two inverse reinforcement learning (IRL) frameworks: single-agent IRL and multi-agent IRL. Road users are modelled in a Markov Game setting as intelligent decision-makers that attempt to maximize their utilities over time. The utility is expressed by the reward function, which provides insights into road users’ behaviour in conflict interactions and can be recovered from real road user trajectories. This study uses video data from a busy and congested intersection in Shanghai, China. Trajectories of motorcyclists and pedestrians involved in conflict interactions were extracted using computer vision algorithms. For the single-agent model, Gaussian Process IRL is used to obtain the motorcyclists’ reward function, which is then used to infer motorcyclists’ preferences in conflict situations. In addition, the Deep Reinforcement Learning Actor-Critic framework is used to estimate motorcyclists’ optimal policies (sequences of decisions) and simulate their trajectories. For the multi-agent model, Adversarial IRL is used to recover the reward function from the trajectories. The multi-agent model accounts for the equilibrium concept between road users by modelling their intentions in a Markov Game framework. Furthermore, the algorithm applies the Multi-agent Actor-Critic model with Kronecker factors to obtain the road users’ optimal policies. Finally, simulation tools were developed to predict motorcyclist and pedestrian trajectories using the optimal policies.
In the single-agent model, only the motorcyclist was modelled as an intelligent agent, and the pedestrian's policy was assumed to be known over time, whereas both road users were modelled as intelligent agents in the multi-agent model. The multi-agent model outperformed the single-agent model in predicting the road users’ trajectories and their evasive action mechanisms. Furthermore, both models provided reasonably accurate predictions for the Post-Encroachment Time (PET) conflict indicator, which correlates well with corresponding field-measured conflicts.
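As an illustration of the PET indicator mentioned above, the following is a minimal sketch rather than the procedure used in the thesis. It assumes the usual definition of PET (the time between the first road user leaving a conflict area and the second road user entering it) and, purely for illustration, a circular conflict area and trajectories stored as `(t, x, y)` samples. The function names, the sample trajectories, and the radius are all hypothetical.

```python
from math import hypot


def occupancy_interval(trajectory, centre, radius):
    """Return (entry_time, exit_time) of the samples that lie inside the
    circular conflict area, or None if the trajectory never enters it.
    Assumes one continuous pass through the area."""
    times = [
        t for t, x, y in trajectory
        if hypot(x - centre[0], y - centre[1]) <= radius
    ]
    return (min(times), max(times)) if times else None


def post_encroachment_time(traj_a, traj_b, centre, radius):
    """PET = entry time of the second road user minus exit time of the first.
    Returns None if either road user never enters the conflict area."""
    a = occupancy_interval(traj_a, centre, radius)
    b = occupancy_interval(traj_b, centre, radius)
    if a is None or b is None:
        return None
    # The road user that enters the area earlier is treated as the first.
    first, second = (a, b) if a[0] <= b[0] else (b, a)
    return second[0] - first[1]


# Hypothetical trajectories sampled once per second (t in s, x and y in m).
# The motorcyclist travels along the x-axis at 2 m/s,
# the pedestrian along the y-axis at 1 m/s.
motorcyclist = [(t, -4.0 + 2.0 * t, 0.0) for t in range(5)]
pedestrian = [(t, 0.0, -6.0 + 1.0 * t) for t in range(9)]

pet = post_encroachment_time(motorcyclist, pedestrian, (0.0, 0.0), 1.0)
print(pet)  # 3
```

In this sketch a smaller PET indicates a more severe conflict, and a negative value would mean both road users occupied the conflict area at the same time. The same function can be applied to field-measured and simulated trajectories so that the two sets of PET values can be compared.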
Attribution-NonCommercial-NoDerivatives 4.0 International