Flexible conditioning in generative models of images and video
Harvey, William
Abstract
Recent advances in the field of deep generative modelling are leading to increasingly faithful models of real-world data, including images and videos. Of particular practical interest are conditional generative models, which parameterise conditional probability distributions given data features. Flexibly-conditional generative models go beyond conventional conditional models in that they allow any subset of data features to be conditioned on. This makes them applicable to tasks like image inpainting, where we want the same model that can inpaint, say, the top half of an image to also be capable of inpainting the bottom half. Flexible conditioning has previously been demonstrated for data types including fixed-size images and short videos, but our thesis is that it can be enabled in a much broader variety of settings. The first setting we consider is long-video generation, which is normally problematic because the data is high-dimensional and compute constraints can prevent our model from conditioning on all possible frames. The second is where the data dimensionality (e.g. the number of frames in a video) is stochastic and can depend on what we condition on. We present techniques to enable flexible conditioning in both of these settings. We further show that the resulting models can sometimes improve on baselines in terms of sample quality, even for conventional generation tasks. Another barrier to flexibly-conditional modelling has been the computational cost of training any high-quality generative model on moderate- or high-resolution visual data. We therefore end by presenting the first technique to mitigate this cost for the training of flexibly-conditional variational auto-encoders, by incorporating pretrained unconditional model weights.
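To make the notion of flexible conditioning concrete, the sketch below (Python, not drawn from the thesis) illustrates the kind of interface such a model exposes: a single sampler that accepts an arbitrary observation mask, so the same weights can inpaint the top half of an image, the bottom half, or any other pattern of observed pixels. The `model.sample(...)` call and its signature are hypothetical placeholders, not the thesis's actual API.

```python
# Minimal sketch of flexible conditioning (illustrative only, not from the thesis).
# A flexibly-conditional model exposes one sampler that takes an arbitrary
# observation mask, so the same weights handle any conditioning pattern.
import numpy as np

def inpaint(model, x_obs, obs_mask):
    """Sample the unobserved entries of x_obs given the observed ones.

    `model.sample` is a hypothetical interface: it is assumed to return a full
    array that agrees with `x_obs` wherever `obs_mask` is True and fills in the
    remaining entries from the learned conditional distribution.
    """
    x = model.sample(x_obs, obs_mask)
    # Observed entries stay fixed; only the unobserved entries are generated.
    assert np.allclose(x[obs_mask], x_obs[obs_mask])
    return x

# The *same* model can condition on the top half of an image to generate the
# bottom half, or vice versa, just by changing the mask:
# image = ...                                   # array of shape (H, W, C)
# top = np.zeros(image.shape, dtype=bool)
# top[: image.shape[0] // 2] = True
# bottom_half = inpaint(model, image, top)      # condition on top, fill bottom
# top_half = inpaint(model, image, ~top)        # condition on bottom, fill top
```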
Item Metadata
Title | Flexible conditioning in generative models of images and video
Creator | Harvey, William
Publisher | University of British Columbia
Date Issued | 2024
Description | (same as the Abstract above)
Language | eng
Date Available | 2024-08-22
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0445139
Degree Grantor | University of British Columbia
Graduation Date | 2024-11
Scholarly Level | Graduate
Aggregated Source Repository | DSpace