UBC Theses and Dissertations


Repurposing large pretrained diffusion models for unsupervised visual understanding and efficient adaptation

Hedlin, Eric

Abstract

Large pretrained text-conditioned image generation models learn a compositional and structured latent representation of visual concepts, showcasing their rich understanding of the world through their ability to generate diverse, coherent images. These models link text descriptions to visual concepts, unifying concepts across a wide range of conditions, for example by capturing the relationships between the text input and the objects in a scene. This thesis explores how this link between text and visual concepts enables identifying consistent semantic regularities across images, where similar regions are mapped through the same text embedding. We show that this can be leveraged for tasks like semantic correspondence and consistent keypoint estimation, simply by optimizing a text embedding so that a given token attends strongly to a specific region of the image. We also exploit the model's capacity for one-shot personalization from a single image by training hypernetworks to quickly estimate network weights for subject-personalized generation; their convergence is only possible because of the smooth underlying representation of concepts learned by these models.

Concretely, this thesis leverages large pretrained diffusion models to address three key areas: semantic correspondence, unsupervised keypoint detection, and efficient hypernetwork-based adaptation for personalized model fine-tuning. For semantic correspondence, we optimize text tokens to focus attention on specific regions of an image, leveraging the latent knowledge of large pretrained models to identify correspondences from a single image without additional supervision. For unsupervised keypoint detection, we localize text tokens across a collection of images to identify common keypoints, using the collection to focus the model on a specific concept and relying on the pretrained model's knowledge to generalize without ground-truth keypoints. We also investigate hypernetwork-based methods that generate weights for large-model personalization conditioned on a single image, providing an efficient alternative to compute-intensive optimization without requiring ground-truth weights.

This work highlights the versatility of diffusion models, extending their utility beyond image generation while proposing scalable, efficient solutions for the downstream tasks of semantic correspondence, unsupervised keypoint estimation, and hypernetwork-based personalized fine-tuning.
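To make the core mechanism concrete, the following is a minimal PyTorch sketch (not the thesis code) of optimizing a single text-token embedding so that its cross-attention concentrates on a chosen region. Everything here is a stand-in assumption for illustration: in the actual method the attention comes from a pretrained Stable Diffusion UNet, where the softmax is normalized over text tokens rather than over spatial locations; here a frozen random projection plays the role of the per-pixel image features.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    H = W = 16   # spatial resolution of the attention map
    d = 64       # embedding dimension

    # Frozen stand-in for per-pixel image features
    # (the UNet cross-attention keys in the real setting).
    keys = torch.randn(H * W, d)

    # Binary mask marking the region the token should attend to
    # (e.g. a single annotated point or patch in the source image).
    mask = torch.zeros(H, W)
    mask[4:8, 4:8] = 1.0
    mask = mask.flatten()

    # The only learnable quantity: one text-token embedding.
    token = torch.randn(d, requires_grad=True)
    opt = torch.optim.Adam([token], lr=1e-2)

    for step in range(500):
        # Attention of the token over all spatial locations
        # (simplified: softmax over space rather than over tokens).
        attn = F.softmax(keys @ token / d ** 0.5, dim=0)
        # Maximize the attention mass inside the target region.
        loss = -(attn * mask).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # At test time the optimized token is applied to a *new* image;
    # wherever its attention peaks is the predicted corresponding
    # region (semantic correspondence) or keypoint.

The design choice this sketch illustrates is that the pretrained model stays entirely frozen: only the token embedding is optimized, so all generalization to new images comes from the knowledge already stored in the diffusion model.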


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International