Generative adversarial networks for pose-guided human video generation

UBC Theses and Dissertations

Zablotskaia, Polina

Abstract

Generation of realistic high-resolution videos of human subjects is a challenging and important task in computer vision. In this thesis, we focus on human motion transfer: generating a video that depicts a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video. Our GAN-based architecture, DwNet, leverages a dense intermediate pose-guided representation and a refinement process to warp the required subject appearance, in the form of the texture, from a source image into a desired pose. Temporal consistency is maintained by further conditioning the decoding process within a GAN on the previously generated frame. In this way, a video is generated in an iterative and recurrent fashion. We illustrate the efficacy of our approach by showing state-of-the-art quantitative and qualitative performance on two benchmark datasets: TaiChi and Fashion Modeling. We collected the latter and have made it publicly available to the community. We also show how our proposed method can be further improved by using a recent segmentation-mask-based architecture, such as SPADE, and how temporal inconsistency in video synthesis can be reduced using a temporal discriminator.
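
The iterative, recurrent generation described in the abstract can be illustrated with a minimal Python sketch of the data flow only. It is not the DwNet implementation: the names generate_video, toy_warp and toy_decode are hypothetical, frames are stood in for by short lists of floats, and conditioning the first step on the source image is an assumption that the abstract does not state.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # stand-in for an image; a real frame would be an H x W x 3 array


def generate_video(
    source: Frame,
    driving_poses: Sequence[Frame],
    warp: Callable[[Frame, Frame], Frame],
    decode: Callable[[Frame, Frame, Frame], Frame],
) -> List[Frame]:
    """Produce one output frame per driving pose, feeding each output back in."""
    frames: List[Frame] = []
    previous = source  # assumption: the first step is conditioned on the source image
    for pose in driving_poses:
        # Move the source appearance toward the target pose.
        warped = warp(source, pose)
        # Condition decoding on the previously generated frame.
        current = decode(warped, pose, previous)
        frames.append(current)
        previous = current
    return frames


# Toy stand-ins, NOT the thesis networks: they only make the data flow observable.
def toy_warp(source: Frame, pose: Frame) -> Frame:
    return [s + p for s, p in zip(source, pose)]


def toy_decode(warped: Frame, pose: Frame, previous: Frame) -> Frame:
    # The pose argument is unused here; a real decoder would also consume it.
    return [0.5 * w + 0.5 * q for w, q in zip(warped, previous)]


video = generate_video([1.0, 1.0], [[0.0, 0.0], [2.0, 2.0]], toy_warp, toy_decode)
same_pose_twice = generate_video([1.0], [[2.0], [2.0]], toy_warp, toy_decode)
```

In this toy setup, repeating the same driving pose yields two different frames, because the second frame also depends on the first generated frame. That dependence is the recurrence the abstract refers to; in the thesis the warp and decode steps are learned networks trained adversarially.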

Item Metadata

Title	Generative adversarial networks for pose-guided human video generation
Creator	Zablotskaia, Polina
Publisher	University of British Columbia
Date Issued	2020
Genre	Thesis/Dissertation
Type	Text; Moving Image
Language	eng
Date Available	2020-03-31
Provider	Vancouver : University of British Columbia Library
DOI	10.14288/1.0389697
URI	http://hdl.handle.net/2429/73869
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2020-05
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_2020_may_zablotskaia_polina.pdf -- 24.41MB

ubc_2020_may_zablotskaia_polina_supp.mp4 -- 4.26MB
