Open Collections
UBC Theses and Dissertations
Generative adversarial networks for pose-guided human video generation
Zablotskaia, Polina
Abstract
Generation of realistic high-resolution videos of human subjects is a challenging and important task in computer vision. In this thesis, we focus on human motion transfer: generation of a video depicting a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video. Our GAN-based architecture, DwNet, leverages a dense intermediate pose-guided representation and a refinement process to warp the required subject appearance, in the form of texture, from a source image into a desired pose. Temporal consistency is maintained by further conditioning the decoding process within the GAN on the previously generated frame, so that the video is generated in an iterative and recurrent fashion. We illustrate the efficacy of our approach by showing state-of-the-art quantitative and qualitative performance on two benchmark datasets: TaiChi and Fashion Modeling. The latter was collected by us and is made publicly available to the community. We also show how our proposed method can be further improved by using a recent segmentation-mask-based architecture such as SPADE, and how temporal inconsistency in video synthesis can be mitigated with a temporal discriminator.
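The abstract describes a recurrent, pose-guided generation loop: source-image appearance features are warped toward the driving pose and the decoder is additionally conditioned on the previously generated frame. Below is a minimal, illustrative sketch of that loop, not the authors' DwNet implementation; all module names, channel sizes, and the placeholder identity warp grid are assumptions standing in for the dense pose-guided warping described in the thesis.

```python
# Minimal sketch (assumptions, not DwNet) of recurrent, pose-guided frame
# generation: warp source appearance features toward the driving pose and
# condition the decoder on the previously generated frame.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentPoseGenerator(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Encodes the source appearance into a feature map to be warped.
        self.appearance_enc = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Decodes warped appearance features, conditioned on the driving pose
        # rendering and the previously generated frame (channel concatenation).
        self.decoder = nn.Sequential(
            nn.Conv2d(channels + 3 + 3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, source_img, warp_grid, driving_pose, prev_frame):
        # source_img, driving_pose, prev_frame: (B, 3, H, W)
        # warp_grid: (B, H, W, 2) sampling grid in [-1, 1]; in the thesis this
        # would come from a dense pose correspondence, here it is an input.
        feats = self.appearance_enc(source_img)
        warped = F.grid_sample(feats, warp_grid, align_corners=False)
        x = torch.cat([warped, driving_pose, prev_frame], dim=1)
        return self.decoder(x)


if __name__ == "__main__":
    # Iterative, frame-by-frame synthesis: each output is fed back as the
    # "previous frame" condition, which is how temporal consistency is encouraged.
    B, H, W, T = 1, 64, 64, 4
    gen = RecurrentPoseGenerator()
    source = torch.rand(B, 3, H, W)
    prev = source.clone()
    # Identity warp grid as a placeholder for a real pose-driven flow.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    identity_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
    for t in range(T):
        driving_pose = torch.rand(B, 3, H, W)  # stand-in for a pose rendering
        prev = gen(source, identity_grid, driving_pose, prev)
        print("frame", t, tuple(prev.shape))
```

In the actual system a GAN discriminator (and, per the abstract, a temporal discriminator) would supervise this generator; the sketch only shows the data flow of the recurrent conditioning.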
Item Metadata
Title: Generative adversarial networks for pose-guided human video generation
Creator: Zablotskaia, Polina
Publisher: University of British Columbia
Date Issued: 2020
Genre:
Type:
Language: eng
Date Available: 2020-03-31
Provider: Vancouver : University of British Columbia Library
DOI: 10.14288/1.0389697
URI:
Degree:
Program:
Affiliation:
Degree Grantor: University of British Columbia
Graduation Date: 2020-05
Campus:
Scholarly Level: Graduate
Aggregated Source Repository: DSpace