Single Image Video Prediction with Auto-Regressive GANs

UBC Faculty Research and Publications

Huang, Jiahui; Chia, Yew Ken; Yu, Samson; Yee, Kevin; Küster, Dennis; Krumhuber, Eva G.; Herremans, Dorien; Roig, Gemma

Abstract

In this paper, we introduce an approach for future-frame prediction from a single input image. Our method generates an entire video sequence from the information contained in the input frame. We adopt an autoregressive approach in our generation process, i.e., the output from each time step is fed as the input to the next step. Unlike other video prediction methods that use “one shot” generation, our method preserves much more detail from the input image while also capturing the critical pixel-level changes between frames. We overcome the problem of generation quality degradation by introducing a “complementary mask” module in our architecture, and we show that this allows the model to focus only on generating the pixels that need to change and to reuse those that should remain static from the previous frame. We empirically validate our method against various video prediction models on the UT Dallas Dataset and show that our approach generates high-quality, realistic video sequences from one static input image. In addition, we validate the robustness of our method by testing a pre-trained model on the unseen ADFES facial expression dataset. We also provide qualitative results of our model tested on a human action dataset, the Weizmann Action database.
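
The autoregressive rollout and the mask-based reuse of static pixels described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the blending formula or the network design, so the convex blend `mask * generated + (1 - mask) * previous` is an assumed reading of the “complementary mask” idea, and `toy_generator`, `blend_with_mask`, and `predict_video` are hypothetical names introduced only for illustration.

```python
import numpy as np


def blend_with_mask(prev_frame, generated, mask):
    """Assumed blend: mask=1 takes the generated pixel, mask=0 reuses the previous one."""
    return mask * generated + (1.0 - mask) * prev_frame


def toy_generator(prev_frame, step):
    """Hypothetical stand-in for a trained generator (not the paper's model).

    Returns a candidate frame and a mask in [0, 1]. Only the top-left
    quadrant is marked as changing; `step` is unused in this toy version.
    """
    generated = np.clip(prev_frame + 0.1, 0.0, 1.0)
    mask = np.zeros_like(prev_frame)
    h, w = prev_frame.shape[:2]
    mask[: h // 2, : w // 2] = 1.0
    return generated, mask


def predict_video(first_frame, num_frames, generator=toy_generator):
    """Autoregressive rollout: each output frame is fed back as the next input."""
    frames = [first_frame]
    for step in range(1, num_frames):
        generated, mask = generator(frames[-1], step)
        frames.append(blend_with_mask(frames[-1], generated, mask))
    return np.stack(frames)


# A flat gray 4x4 "image" rolled out into a 3-frame sequence.
image = np.full((4, 4), 0.5)
video = predict_video(image, num_frames=3)
```

In this toy setup, pixels outside the masked quadrant are copied unchanged from frame to frame, which is the intuition behind reusing static content instead of regenerating it at every step.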

Item Metadata

Title	Single Image Video Prediction with Auto-Regressive GANs
Creator	Huang, Jiahui; Chia, Yew Ken; Yu, Samson; Yee, Kevin; Küster, Dennis; Krumhuber, Eva G.; Herremans, Dorien; Roig, Gemma
Publisher	Multidisciplinary Digital Publishing Institute
Date Issued	2022-05-06
Subject	video prediction; autoregressive GANs; emotion generation
Genre	Article
Type	Text
Language	eng
Date Available	2022-05-25
Provider	Vancouver : University of British Columbia Library
Rights	CC BY 4.0
DOI	10.14288/1.0413688
URI	http://hdl.handle.net/2429/81629
Affiliation	Applied Science, Faculty of; Non UBC; Electrical and Computer Engineering, Department of
Citation	Sensors 22 (9): 3533 (2022)
Publisher DOI	10.3390/s22093533
Peer Review Status	Reviewed
Scholarly Level	Faculty; Researcher
Rights URI	https://creativecommons.org/licenses/by/4.0/
Aggregated Source Repository	DSpace
