Open Collections
UBC Theses and Dissertations
2D pose regression using implicit Gaussian mixtures
Evans, Andrew
Abstract
The task of inferring human poses in 2D image data is known as 2D human pose estimation. Modern approaches to this problem use neural networks to generate keypoint predictions: estimates of the locations of keypoints, such as elbows and knees, associated with the human body. Top-down heatmap-based methods are one such approach, in which predictions are generated for each person in an image independently. This is often done by generating a set of heatmaps from which keypoint locations are then predicted. Although these methods perform well in practice, they have several disadvantages: the heatmaps they generate are in general not probabilistic, so they cannot be used to attribute confidence to keypoint predictions or to capture correlations between uncertain joint positions. These methods also cannot generally be used for end-to-end regression from input to predictions, because the argmax function used to determine keypoint locations from heatmaps is non-differentiable.
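To make the differentiability point concrete, the following minimal sketch decodes one keypoint from a single heatmap in two ways. The first is the hard argmax used by standard heatmap methods. The second is shown only as an illustrative contrast and is not necessarily the approach taken in this thesis: the well-known soft-argmax (a softmax-weighted expectation), extended to second moments so that it returns a 2D Gaussian mean and covariance. The function names `argmax_decode` and `gaussian_from_heatmap` and the example heatmap are hypothetical.

```python
import math

def argmax_decode(heatmap):
    """Return the (x, y) cell with the largest heatmap value.

    The result is an integer index: small changes to the heatmap either
    leave it unchanged or make it jump to another cell, so its gradient
    with respect to the heatmap is zero almost everywhere.
    """
    best_val, best_xy = -math.inf, (0, 0)
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if v > best_val:
                best_val, best_xy = v, (x, y)
    return best_xy

def gaussian_from_heatmap(heatmap):
    """Fit a 2D Gaussian by treating the softmax-normalized heatmap as a distribution.

    Every output is a smooth function of the heatmap values, so it could be
    differentiated end to end (hypothetical illustration, not the thesis method).
    Returns ((mean_x, mean_y), ((s_xx, s_xy), (s_xy, s_yy))).
    """
    # Softmax normalization; subtracting the max keeps exp() numerically stable.
    m = max(max(row) for row in heatmap)
    weights = [[math.exp(v - m) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    cells = [(x, y, w / total) for y, row in enumerate(weights) for x, w in enumerate(row)]
    # First moments: the expected keypoint location (soft-argmax).
    mx = sum(p * x for x, _, p in cells)
    my = sum(p * y for _, y, p in cells)
    # Second central moments: the covariance, i.e. the spatial uncertainty.
    sxx = sum(p * (x - mx) ** 2 for x, _, p in cells)
    syy = sum(p * (y - my) ** 2 for _, y, p in cells)
    sxy = sum(p * (x - mx) * (y - my) for x, y, p in cells)
    return (mx, my), ((sxx, sxy), (sxy, syy))

heatmap = [[0.0, 1.0, 0.0],
           [1.0, 4.0, 1.0],
           [0.0, 1.0, 0.0]]
print(argmax_decode(heatmap))          # (1, 1)
print(gaussian_from_heatmap(heatmap))  # mean close to (1.0, 1.0) by symmetry
```

Both decoders agree on the location here, but only the second also carries an uncertainty estimate and varies smoothly as the heatmap changes.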
In this thesis, we investigate ways to overcome these drawbacks, with the aim of providing a top-down 2D human pose estimation method whose predictions are multi-modal, probabilistic, and differentiable. We accomplish this by creating a model that generates multiple Gaussian predictions via sampling, leveraging the effectiveness of heatmap-based approaches by deriving these Gaussian predictions from heatmaps. Using a sampling procedure ensures that predictions are not limited to a small number of preset modes, as is the case with Gaussian mixture models. Implicit Maximum Likelihood Estimation (IMLE) is also used during training, providing a less costly training procedure than performing maximum likelihood estimation on Gaussian mixture distributions. We show that this model generates pose predictions with accuracy comparable to existing methods, while providing rich statistics on the certainty of its predictions over full poses, unlike traditional methods, which can only produce predictions for each keypoint independently. It can also provide multi-modal predictions, capturing information about ambiguous image data that standard heatmap approaches cannot.
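The training idea can be sketched, under stated assumptions, as a conditional IMLE-style step: for one input, the model draws several Gaussian pose hypotheses from random latent samples, and only the hypothesis that best explains the ground truth receives the loss. In this sketch only that one hypothesis's likelihood is evaluated, rather than a full mixture likelihood. Standard IMLE selects the nearest sample under a distance such as Euclidean distance; using the Gaussian negative log-likelihood (NLL) as the selection criterion, the diagonal covariances, and the names `gaussian_nll` and `imle_loss` are all hypothetical choices for illustration, and the thesis may select and score samples differently.

```python
import math

def gaussian_nll(target, mean, var):
    """Negative log-likelihood of a pose under a diagonal Gaussian.

    target, mean, var: flat lists of coordinates [x1, y1, x2, y2, ...];
    var holds per-coordinate variances (assumed > 0).
    """
    return 0.5 * sum(
        math.log(2 * math.pi * v) + (t - m) ** 2 / v
        for t, m, v in zip(target, mean, var)
    )

def imle_loss(target, hypotheses):
    """IMLE-style objective for a single training example (hypothetical sketch).

    hypotheses: list of (mean, var) pairs, one per random latent sample.
    Only the hypothesis that best explains the ground truth is penalized,
    leaving the other samples free to cover other plausible poses.
    Returns (loss, index of the selected hypothesis).
    """
    scores = [gaussian_nll(target, m, v) for m, v in hypotheses]
    best = min(range(len(scores)), key=scores.__getitem__)
    return scores[best], best

# Ground-truth pose with two keypoints, and two sampled Gaussian hypotheses.
target = [10.0, 20.0, 30.0, 40.0]
near = ([10.5, 19.5, 30.0, 40.0], [1.0, 1.0, 1.0, 1.0])
far = ([50.0, 60.0, 70.0, 80.0], [1.0, 1.0, 1.0, 1.0])
loss, idx = imle_loss(target, [near, far])
print(idx)  # 0
```

In an actual training loop, the selection would typically be made without tracking gradients, and the loss would then be backpropagated only through the selected hypothesis.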
Item Metadata
| Field | Value |
|---|---|
| Title | 2D pose regression using implicit Gaussian mixtures |
| Creator | Evans, Andrew |
| Publisher | University of British Columbia |
| Date Issued | 2026 |
| Language | eng |
| Date Available | 2026-04-17 |
| Provider | Vancouver : University of British Columbia Library |
| Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
| DOI | 10.14288/1.0452015 |
| Degree Grantor | University of British Columbia |
| Graduation Date | 2026-05 |
| Scholarly Level | Graduate |
| Aggregated Source Repository | DSpace |