Data-efficient learning on structured output data
Goyal, Raghav
Abstract
Deep learning relies on huge amounts of labeled data, which are time-consuming and expensive to gather and annotate. The issue becomes even more pressing when dealing with structured data, where the cost of annotation increases with the complexity of the annotations, from scribbles and bounding boxes to segmentation masks and scene graphs. Efficient learning approaches attempt to alleviate this issue by using annotations of lower quantity (e.g., few-shot learning) or lower quality (e.g., weakly-supervised learning) for a target task. In this thesis, we explore and develop efficient learning approaches to tackle structured output tasks across images and videos.

Specifically, we first propose a unifying approach for any-shot (zero-shot and few-shot) object detection and segmentation using a semi-supervised transfer learning methodology that learns to semantically transform weak detectors/segmentors into strong ones. We then move to more granular annotations, scene graphs, and propose a simple weakly-supervised approach for human-centric scene-graph detection that, despite assuming weaker supervision for objects and relations, performs competitively with state-of-the-art approaches.

We then turn our focus to videos, which, compared to images, are more label-intensive due to the additional temporal dimension. We explore model design choices for videos in the context of efficient learning paradigms. In particular, we first look at dynamic spatio-temporal annotations in videos, where we propose a single, unified model for multi-modal, query-based video understanding in long-form videos, and show that multi-task training leads to improved performance and the ability to generalize to unseen tasks. Second, in the context of Long-Video Object Segmentation, we propose a transformation-aware loss that places greater emphasis on parts of a video where the tracked object is undergoing deformation, along with a time-coded memory that goes beyond vanilla additive positional encoding and helps propagate context across long videos, and show improved performance over prior work.
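The abstract describes the transformation-aware loss only at a high level. Purely as an illustration of the general weighting idea, and not the formulation used in the thesis, the minimal sketch below re-weights a per-frame segmentation loss by how much the ground-truth mask changes between consecutive frames; the function name, the IoU-based change proxy, and the tensor shapes are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def transformation_aware_loss(pred_masks: torch.Tensor,
                              gt_masks: torch.Tensor,
                              eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical sketch: weight per-frame BCE by a mask-change proxy.

    pred_masks, gt_masks: (T, H, W) tensors with values in [0, 1].
    """
    pred_masks = pred_masks.clamp(eps, 1.0 - eps)
    # Per-frame binary cross-entropy, averaged over pixels.
    per_frame = F.binary_cross_entropy(pred_masks, gt_masks,
                                       reduction="none").mean(dim=(1, 2))

    # Deformation proxy: 1 - IoU between consecutive ground-truth masks,
    # so frames where the object changes shape a lot get larger weights.
    prev, curr = gt_masks[:-1], gt_masks[1:]
    intersection = (prev * curr).sum(dim=(1, 2))
    union = ((prev + curr) > 0).float().sum(dim=(1, 2))
    change = 1.0 - intersection / (union + eps)

    # First frame has no predecessor; give it a neutral weight of 1.
    weights = torch.cat([change.new_ones(1), 1.0 + change])
    return (weights * per_frame).sum() / weights.sum()
```

In practice, such a term would likely derive the deformation signal from tracked or predicted masks rather than ground truth and be combined with the standard segmentation objective; the thesis itself should be consulted for the actual formulation.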
Item Metadata

| Field | Value |
| --- | --- |
| Title | Data-efficient learning on structured output data |
| Creator | Goyal, Raghav |
| Supervisor | |
| Publisher | University of British Columbia |
| Date Issued | 2024 |
| Genre | |
| Type | |
| Language | eng |
| Date Available | 2024-12-02 |
| Provider | Vancouver : University of British Columbia Library |
| Rights | Attribution-ShareAlike 4.0 International |
| DOI | 10.14288/1.0447394 |
| URI | |
| Degree | |
| Program | |
| Affiliation | |
| Degree Grantor | University of British Columbia |
| Graduation Date | 2025-05 |
| Campus | |
| Scholarly Level | Graduate |
| Rights URI | |
| Aggregated Source Repository | DSpace |