HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN

UBC Faculty Research and Publications

Parmar, Paritosh; Morris, Brendan

Abstract

Spatiotemporal representations learned using 3D convolutional neural networks (CNNs) are currently used in state-of-the-art approaches for action-related tasks. However, 3D-CNNs are notoriously memory- and compute-intensive compared with simpler 2D-CNN architectures. We propose to hallucinate spatiotemporal representations from a 3D-CNN teacher with a 2D-CNN student. Requiring the 2D-CNN to predict the future and intuit upcoming activity encourages it to gain a deeper understanding of actions and how they evolve. The hallucination task is treated as an auxiliary task, which can be used with any other action-related task in a multitask learning setting. Through thorough experimental evaluation, it is shown that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition tasks. From a practical standpoint, being able to hallucinate spatiotemporal representations without an actual 3D-CNN can enable deployment in resource-constrained scenarios, such as those with limited computing power and/or lower bandwidth. We also observed that our hallucination task has utility not only during the training phase, but also during the pre-training phase.
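The multitask setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss functions, so the mean squared error between student and teacher features, the softmax cross-entropy main task, the `weight` parameter, and all function names below are assumptions made for illustration only.

```python
import math


def hallucination_loss(student_feat, teacher_feat):
    """Auxiliary task: how far the 2D-CNN student's predicted features are
    from the 3D-CNN teacher's features.

    Mean squared error is an assumed choice; the abstract does not specify it.
    """
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def cross_entropy(logits, label):
    """Main task (here, hypothetically, action recognition): softmax
    cross-entropy for one example, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(logits, label, student_feat, teacher_feat, weight=1.0):
    """Main-task loss plus a weighted hallucination loss.

    `weight` is a hypothetical balancing hyperparameter.
    """
    main = cross_entropy(logits, label)
    aux = hallucination_loss(student_feat, teacher_feat)
    return main + weight * aux
```

In this sketch the teacher features would be computed once by a frozen 3D-CNN during training; at inference time only the 2D-CNN student runs, which is what makes deployment in resource-constrained settings plausible.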

Item Metadata

Title	HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN
Creator	Parmar, Paritosh; Morris, Brendan
Publisher	Multidisciplinary Digital Publishing Institute
Date Issued	2021-09-08
Subject	action recognition; scene recognition; action quality assessment; activity recognition; deep learning; computer vision; convolutional neural networks; multitask learning; transfer learning
Genre	Article
Type	Text
Language	eng
Date Available	2021-10-12
Provider	Vancouver : University of British Columbia Library
Rights	CC BY 4.0
DOI	10.14288/1.0402498
URI	http://hdl.handle.net/2429/79936
Affiliation	Science, Faculty of; Non UBC; Computer Science, Department of
Citation	Signals 2 (3): 604-618 (2021)
Publisher DOI	10.3390/signals2030037
Peer Review Status	Reviewed
Scholarly Level	Faculty
Rights URI	https://creativecommons.org/licenses/by/4.0/
Aggregated Source Repository	DSpace
