Open Collections
UBC Theses and Dissertations
Explicit and implicit warping for accurate human pose estimation and low-latency neural rendering
Yu, Frank
Abstract
Deep neural networks have become an integral part of modern advances in computer vision. However, these solutions become impractical when scaling depends on ever-larger networks and increasingly diverse datasets. Prior work demonstrates that embedding domain- and application-specific knowledge in both the architecture design and the training procedure is one way to improve scalability. In this thesis, we propose two methods that leverage domain knowledge, defined through explicit and implicit warping, to create more data- and runtime-efficient networks in two applications. First, we compute an explicit warping to disentangle the learning of camera intrinsic parameters from the human pose estimation pipeline. Our explicit warping takes into account the region of interest and the camera's focal length to define a perspective-correct crop. By including this crop as a preprocessing step or as an end-to-end component of the network, we significantly increase performance, especially when the subject is near the boundary of the image. Second, we leverage the knowledge that sequential frames in talking-head video conferencing have significant visual overlap. We therefore design a simple and effective implicit warping strategy between timesteps that greatly decreases the latency and increases the framerate of talking-head neural rendering. Our proposed methods demonstrate significant improvements in accuracy and latency in their respective applications.
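The abstract does not spell out how the perspective-correct crop is computed. As a rough illustration only, the sketch below uses a standard virtual-camera construction: rotate the camera so that the ray through the center of the region of interest becomes the optical axis, then re-project with new crop intrinsics. This is an assumption about one plausible formulation, not the thesis's actual method; the function name `perspective_crop_homography` and all of its parameters are hypothetical.

```python
import numpy as np

def perspective_crop_homography(f, cx, cy, roi_center, crop_size, crop_f):
    """Return a 3x3 homography mapping original pixels to crop pixels.

    Hypothetical helper: assumes a pinhole camera with square pixels,
    focal length f and principal point (cx, cy), all in pixels.
    """
    # Original camera intrinsics.
    K = np.array([[f, 0.0, cx],
                  [0.0, f, cy],
                  [0.0, 0.0, 1.0]])

    # Back-project the ROI center to a viewing ray and normalize it.
    u, v = roi_center
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    z = ray / np.linalg.norm(ray)

    # Build an orthonormal basis whose third axis is the ROI ray.
    x = np.cross(np.array([0.0, 1.0, 0.0]), z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)

    # Rows are the new axes expressed in the old camera frame,
    # so R maps old camera coordinates to the virtual camera frame.
    R = np.stack([x, y, z])

    # Intrinsics of the virtual (crop) camera, centered on the crop.
    K_crop = np.array([[crop_f, 0.0, crop_size / 2.0],
                       [0.0, crop_f, crop_size / 2.0],
                       [0.0, 0.0, 1.0]])

    return K_crop @ R @ np.linalg.inv(K)
```

Under these assumptions the ROI center always lands at the center of the crop, and when the ROI center coincides with the principal point the rotation is the identity, so the warp reduces to an ordinary scale-and-translate crop. The resulting matrix could be passed to any image-warping routine that accepts a 3x3 homography.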
Item Metadata
Title |
Explicit and implicit warping for accurate human pose estimation and low-latency neural rendering
|
Creator |
Yu, Frank
|
Publisher |
University of British Columbia
|
Date Issued |
2023
|
Language |
eng
|
Date Available |
2023-04-20
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0431314
|
Degree Grantor |
University of British Columbia
|
Graduation Date |
2023-05
|
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|