Data-efficient learning on structured output data
Goyal, Raghav
Abstract
Deep learning relies on huge amounts of labeled data, which are time-consuming and expensive to gather and annotate. The issue becomes even more pressing when dealing with structured data, where the cost of annotation increases with the complexity of the annotations, from scribbles and bounding boxes to segmentation masks and scene graphs. Efficient learning approaches attempt to alleviate this issue by using annotations of lower quantity (e.g., few-shot learning) or lower quality (e.g., weakly-supervised learning) for a target task. In this thesis, we explore and develop efficient learning approaches to tackle structured output tasks across images and videos.

Specifically, we first propose a unifying approach for any-shot (zero-shot and few-shot) object detection and segmentation using a semi-supervised transfer learning methodology that learns to semantically transform weak detectors/segmentors into strong ones. We then move to more granular annotations, scene graphs, and propose a simple weakly-supervised approach for human-centric scene-graph detection that, despite assuming weaker supervision for objects and relations, performs competitively with state-of-the-art approaches.

We then turn our focus to videos, which, compared to images, are more label-intensive due to the additional temporal dimension. We explore model design choices for videos in the context of efficient learning paradigms. In particular, we first look at dynamic spatio-temporal annotations in videos, where we propose a single, unified model for multi-modal, query-based video understanding in long-form videos, and show that multi-task training leads to improved performance and the ability to generalize to unseen tasks. Second, in the context of Long-Video Object Segmentation, we propose a transformation-aware loss that places greater emphasis on parts of a video where the tracked object is undergoing deformation, along with a time-coded memory that goes beyond vanilla additive positional encoding and helps propagate context across long videos, and show improved performance over prior work.
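The abstract describes the transformation-aware loss only at a high level. Purely as an illustration of the general weighting idea, and not the formulation used in the thesis, the minimal sketch below re-weights a per-frame segmentation loss by how much the ground-truth mask changes between consecutive frames; the function name, the IoU-based change proxy, and the tensor shapes are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def transformation_aware_loss(pred_masks: torch.Tensor,
                              gt_masks: torch.Tensor,
                              eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical sketch: weight per-frame BCE by a mask-change proxy.

    pred_masks, gt_masks: (T, H, W) tensors with values in [0, 1].
    """
    pred_masks = pred_masks.clamp(eps, 1.0 - eps)
    # Per-frame binary cross-entropy, averaged over pixels.
    per_frame = F.binary_cross_entropy(pred_masks, gt_masks,
                                       reduction="none").mean(dim=(1, 2))

    # Deformation proxy: 1 - IoU between consecutive ground-truth masks,
    # so frames where the object changes shape a lot get larger weights.
    prev, curr = gt_masks[:-1], gt_masks[1:]
    intersection = (prev * curr).sum(dim=(1, 2))
    union = ((prev + curr) > 0).float().sum(dim=(1, 2))
    change = 1.0 - intersection / (union + eps)

    # First frame has no predecessor; give it a neutral weight of 1.
    weights = torch.cat([change.new_ones(1), 1.0 + change])
    return (weights * per_frame).sum() / weights.sum()
```

In practice, such a term would likely derive the deformation signal from tracked or predicted masks rather than ground truth and be combined with the standard segmentation objective; the thesis itself should be consulted for the actual formulation.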
Item Metadata

| Field | Value |
| --- | --- |
| Title | Data-efficient learning on structured output data |
| Creator | Goyal, Raghav |
| Supervisor | |
| Publisher | University of British Columbia |
| Date Issued | 2024 |
| Genre | |
| Type | |
| Language | eng |
| Date Available | 2024-12-02 |
| Provider | Vancouver : University of British Columbia Library |
| Rights | Attribution-ShareAlike 4.0 International |
| DOI | 10.14288/1.0447394 |
| URI | |
| Degree | |
| Program | |
| Affiliation | |
| Degree Grantor | University of British Columbia |
| Graduation Date | 2025-05 |
| Campus | |
| Scholarly Level | Graduate |
| Rights URI | |
| Aggregated Source Repository | DSpace |