Towards large-scale nonparametric scene parsing of images and video

UBC Theses and Dissertations

Tung, Frederick

Abstract

In computer vision, scene parsing is the problem of labelling every pixel in an image or video with its semantic category. Its goal is a complete and consistent semantic interpretation of the structure of the real-world scene. Scene parsing forms a core component in many emerging technologies such as self-driving vehicles and prosthetic vision, and also informs complementary computer vision tasks such as depth estimation. This thesis presents a novel nonparametric scene parsing framework for images and video. In contrast to conventional practice, our scene parsing framework is built on nonparametric search-based label transfer instead of discriminative classification. We formulate exemplar-based scene parsing for both 2D (from images) and 3D (from video), and demonstrate accurate labelling on standard benchmarks. Since our framework is nonparametric, it is easily extensible to new categories and examples as the database grows. Nonparametric scene parsing is computationally demanding at test time, and requires methods for searching large collections of data that are time- and memory-efficient. This thesis also presents two novel binary encoding algorithms for large-scale approximate nearest neighbor search: the bank of random rotations is data independent and does not require training, while the supervised sparse projections algorithm targets efficient search of high-dimensional labelled data. We evaluate these algorithms on standard retrieval benchmarks, and then demonstrate their integration into our nonparametric scene parsing framework. Using 256-bit codes, binary encoding reduces search times by an order of magnitude and memory requirements by three orders of magnitude, while maintaining a mean per-class accuracy within 1% on the 3D scene parsing task.
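
To make the idea of search-based label transfer over binary codes concrete, the following is a minimal sketch and not the thesis's method. It swaps in plain sign random projection (a standard data-independent hashing baseline) for the bank of random rotations, and transfers a label by majority vote over the nearest codes in Hamming distance. All function names, the toy 4-dimensional features, and the 8192-dimensional float32 descriptor used in the memory calculation are hypothetical choices made for illustration only.

```python
import random
from collections import Counter


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions (data independent, no training)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vec, projections):
    """Pack the signs of the projections into one Python int, one bit per direction."""
    code = 0
    for row in projections:
        dot = sum(w * x for w, x in zip(row, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def build_database(features, labels, projections):
    """Store only (binary code, label) pairs; the raw features are discarded."""
    return [(encode(f, projections), lab) for f, lab in zip(features, labels)]


def transfer_label(query, database, projections, k=3):
    """Label a query by majority vote over its k nearest codes in Hamming distance."""
    q = encode(query, projections)
    ranked = sorted(database, key=lambda item: hamming(q, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def compression_ratio(dim, n_bits, bytes_per_value=4):
    """Size of a raw descriptor divided by the size of its binary code."""
    return dim * bytes_per_value * 8 / n_bits


if __name__ == "__main__":
    n_bits = 256
    proj = make_projections(dim=4, n_bits=n_bits)
    road = [1.0, 0.5, -0.2, 0.3]          # hypothetical exemplar feature
    sky = [-x for x in road]              # its exact opposite direction
    db = build_database([road, sky], ["road", "sky"], proj)
    print(transfer_label(road, db, proj, k=1))
    # Hypothetical 8192-dimensional float32 descriptor vs. a 256-bit code.
    print(compression_ratio(8192, n_bits))
```

Because the database keeps only the codes and labels, new categories or exemplars can be added by appending entries, with no retraining step. The memory calculation only illustrates how a ratio on the order of a thousand can arise; the descriptor size is an assumption and is not taken from the thesis.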

Item Metadata

Title	Towards large-scale nonparametric scene parsing of images and video
Creator	Tung, Frederick
Publisher	University of British Columbia
Date Issued	2017
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2017-03-03
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0343064
URI	http://hdl.handle.net/2429/60790
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2017-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
