Open Collections
UBC Theses and Dissertations
3D human pose estimation with self-supervision and learnable data generation
Gholami, Mohsen
Abstract
The estimation of 3D human body poses from 2D images is needed in a wide range of applications across robotics, computer graphics, and patient monitoring. While deep learning offers promising solutions to this challenging problem, progress has been impeded by the scarcity of annotated 3D datasets. This thesis proposes innovative strategies for training deep learning models without the need for explicit 3D annotated data.
Previous approaches have proposed generating pseudo-ground-truth 3D annotated data from multi-view 2D images. While 2D images are cost-effective and easy to capture, these methods require calibrated multi-view images. We propose a weak supervision approach that combines classical triangulation methods with a multi-view network to generate pseudo-ground-truth 3D annotated data, demonstrating superior results without requiring camera calibration.
We then investigate the generation of synthetic 3D pose data to enhance the generalizability of 3D pose models. Previously proposed methods for generating synthetic 2D-3D pose data rely on random generation. In contrast, our method is a targeted synthetic 2D-3D pose generation that is tailored so that the resulting 3D pose models adapt to specific settings. Leveraging unlabeled images from the target domain, our method demonstrates significant performance improvements.
We further extend our data generation method to generate image-3D pose pairs using Neural Radiance Fields (NeRF). NeRFs have recently been proposed for generating novel views of humans based on a partial set of 2D images. Unlike classical rendering methods, our method does not require 3D scans of humans. It generates out-of-distribution images for 3D pose models and significantly improves the generalizability of models on unseen datasets.
We then explore a real-life application of 3D body pose estimation: monitoring patients with Parkinson's disease. Here, we observe that the poor generalizability of 3D pose models hinders their application to patient gait monitoring, where accurate gait measurement is important. To avoid the cumbersome and time-consuming collection of clinical data, we introduce a weak supervision method that utilizes the expertise of clinicians to label the gait videos of Parkinson's patients, enhancing the applicability of 3D pose estimation in clinical settings.
Item Metadata
Title |
3D human pose estimation with self-supervision and learnable data generation
|
Creator |
Gholami, Mohsen
|
Supervisor | |
Publisher |
University of British Columbia
|
Date Issued |
2024
|
Genre | |
Type | |
Language |
eng
|
Date Available |
2024-09-05
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0445319
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2025-11
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|