2D pose regression using implicit Gaussian mixtures

UBC Theses and Dissertations

2D pose regression using implicit Gaussian mixtures Evans, Andrew

Abstract

The task of inferring human poses in 2D image data is known as 2D human pose estimation. Modern approaches to this problem use neural networks to generate keypoint predictions: estimates for the locations of keypoints, such as elbows and knees, associated with the human body. Top-down heatmap-based methods are one such approach, in which predictions are generated for each person in an image independently. This is often done by generating a set of heatmaps, which can then be used to predict keypoint locations. Although these methods perform well in practice, they have several disadvantages: the heatmaps they generate are in general not probabilistic, so they cannot be used to attribute confidence to keypoint predictions or to capture correlations between uncertain joint positions. These methods also cannot generally be used in end-to-end regression from input to predictions, as the argmax function used to determine keypoint locations from heatmaps is non-differentiable. In this thesis, we investigate ways to overcome these drawbacks, with the aim of providing a top-down 2D human pose estimation method whose predictions are multi-modal, probabilistic, and differentiable. We accomplish this by creating a model which can generate multiple Gaussian predictions via sampling, leveraging the effectiveness of heatmap-based approaches by generating these Gaussian predictions from heatmaps. Using a sampling procedure ensures predictions are not limited to a small number of preset modes, as is the case with Gaussian mixture models. Implicit Maximum Likelihood Estimation is also used during training, providing a less costly training procedure than performing maximum likelihood estimation on Gaussian mixture distributions.
We show that this model is able to generate pose predictions with comparable accuracy to existing methods, while providing rich statistics on the certainty of its predictions over full poses, unlike traditional methods, which can only produce predictions for each keypoint independently. It is also able to provide multi-modal predictions, capturing information about ambiguous image data that standard heatmap approaches cannot.
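
To illustrate the differentiability issue raised in the abstract, the following is a minimal sketch of one common way to read a Gaussian (mean and covariance) off a heatmap without using argmax: normalise the heatmap with a softmax and take its first and second moments. This is an assumption for illustration only and is not taken from the thesis; the function name `heatmap_to_gaussian` and the choice of a plain softmax are hypothetical, and the thesis may derive its Gaussian predictions differently.

```python
import numpy as np


def heatmap_to_gaussian(heatmap):
    """Return (mean, cov) of a softmax-normalised heatmap in (x, y) pixel coordinates.

    Illustrative sketch only: a moment-based readout, not the thesis's method.
    """
    h, w = heatmap.shape

    # Softmax over all pixels turns raw scores into a probability map.
    # Subtracting the maximum keeps the exponentials numerically stable.
    probs = np.exp(heatmap - heatmap.max())
    probs /= probs.sum()

    # Pixel coordinate grid, flattened to shape (h * w, 2) as (x, y) pairs.
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    p = probs.ravel()

    # First moment: the expected keypoint location (a "soft" argmax).
    mean = p @ coords

    # Second central moment: a 2x2 covariance describing the spread.
    centered = coords - mean
    cov = (centered * p[:, None]).T @ centered
    return mean, cov


# A sharply peaked heatmap: the mean lands on the peak, the covariance is tiny.
peaked = np.zeros((8, 10))
peaked[3, 5] = 50.0
peak_mean, peak_cov = heatmap_to_gaussian(peaked)

# A flat heatmap: the mean is the image centre, the covariance is large.
flat_mean, flat_cov = heatmap_to_gaussian(np.zeros((8, 10)))
```

Every step here is a smooth function of the heatmap values, so gradients can flow from the mean and covariance back to the network that produced the heatmap, which a hard argmax does not allow. A single Gaussian of this kind is uni-modal; the multi-modal predictions described in the abstract come from sampling several such Gaussians, which this sketch does not attempt.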

Item Metadata

Title	2D pose regression using implicit Gaussian mixtures
Creator	Evans, Andrew
Supervisor	Sigal, Leonid; Rhodin, Helge
Publisher	University of British Columbia
Date Issued	2026
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2026-04-17
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0452015
URI	http://hdl.handle.net/2429/94153
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2026-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
