Explicit and implicit warping for accurate human pose estimation and low-latency neural rendering

UBC Theses and Dissertations


Yu, Frank

Abstract

Deep neural networks have become an integral part of modern advances in computer vision. However, these solutions become impractical when they rely on increasingly large networks and increasingly diverse datasets to scale. Prior work demonstrates that embedding domain- or application-specific knowledge in both the architecture design and the training procedure is one way to improve scalability. In this thesis, we propose two methods that leverage domain knowledge, defined through explicit and implicit warping, to create more data- and runtime-efficient networks in two applications. First, we compute an explicit warping to disentangle the learning of camera intrinsic parameters from the human pose estimation pipeline. Our explicit warping takes into account the region of interest and the camera's focal length to define a perspective-correct crop. By including this as a preprocessing step or as an end-to-end component of the network, we significantly increase performance, especially when the subject is near the boundary of the image. Second, we leverage the knowledge that sequential frames in talking-head video conferencing have significant visual overlap. We therefore design a simple and effective implicit warping strategy between timesteps that greatly decreases the latency and increases the frame rate of talking-head neural rendering. Our proposed methods demonstrate significant improvements in accuracy and latency in their respective applications.
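
The perspective-correct crop mentioned in the abstract can be illustrated with a small sketch. This is not the thesis implementation. It shows one common construction under stated assumptions: a pinhole camera with a single focal length and no lens distortion, and a virtual camera rotated so that its optical axis passes through the centre of the region of interest. The function name, its parameters, and the rule used to pick the virtual focal length are all hypothetical.

```python
import numpy as np

def perspective_crop_homography(f, cx, cy, roi_center, roi_size, out_size):
    """Homography mapping original pixels to a virtual camera aimed at the ROI.

    Assumptions (not taken from the thesis): pinhole model, square pixels,
    no distortion. All names here are illustrative.
    """
    # Intrinsics of the original camera.
    K = np.array([[f, 0.0, cx],
                  [0.0, f, cy],
                  [0.0, 0.0, 1.0]])
    K_inv = np.linalg.inv(K)

    # Back-project the ROI centre to a viewing ray in camera coordinates.
    u, v = roi_center
    ray = K_inv @ np.array([u, v, 1.0])

    # Build a rotation whose third row is the ray direction, so the ray
    # is mapped onto the virtual camera's optical axis (0, 0, 1).
    z = ray / np.linalg.norm(ray)
    x = np.cross([0.0, 1.0, 0.0], z)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])

    # Virtual intrinsics: principal point at the crop centre. The focal
    # length is scaled so the ROI roughly fills the crop (an assumed rule).
    f_virt = f * out_size / roi_size
    K_virt = np.array([[f_virt, 0.0, out_size / 2.0],
                       [0.0, f_virt, out_size / 2.0],
                       [0.0, 0.0, 1.0]])

    # A pure camera rotation induces the homography K_virt R K^-1.
    return K_virt @ R @ K_inv

# Example: an off-centre subject in a 1280x720 image, cropped to 256x256.
H = perspective_crop_homography(1000.0, 640.0, 360.0, (1100.0, 200.0), 300.0, 256)
center = H @ np.array([1100.0, 200.0, 1.0])
center = center[:2] / center[2]  # the ROI centre lands at the crop centre
```

The resulting 3x3 matrix could be passed to any image-warping routine that accepts a homography. When the region of interest is already at the principal point, the rotation is the identity and the warp reduces to an ordinary scaled crop, which is why the difference matters most for subjects near the image boundary.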

Item Metadata

Title	Explicit and implicit warping for accurate human pose estimation and low-latency neural rendering
Creator	Yu, Frank
Supervisor	Rhodin, Helge
Publisher	University of British Columbia
Date Issued	2023
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2023-04-20
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0431314
URI	http://hdl.handle.net/2429/84370
Degree	Master of Science - MSc
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2023-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
