UBC Theses and Dissertations
Towards large-scale nonparametric scene parsing of images and video Tung, Frederick
In computer vision, scene parsing is the problem of labelling every pixel in an image or video with its semantic category. Its goal is a complete and consistent semantic interpretation of the structure of the real world scene. Scene parsing forms a core component in many emerging technologies such as self-driving vehicles and prosthetic vision, and also informs complementary computer vision tasks such as depth estimation. This thesis presents a novel nonparametric scene parsing framework for images and video. In contrast to conventional practice, our scene parsing framework is built on nonparametric search-based label transfer instead of discriminative classification. We formulate exemplar-based scene parsing for both 2D (from images) and 3D (from video), and demonstrate accurate labelling on standard benchmarks. Since our framework is nonparametric, it is easily extensible to new categories and examples as the database grows. Nonparametric scene parsing is computationally demanding at test time, and requires methods for searching large collections of data that are time and memory efficient. This thesis also presents two novel binary encoding algorithms for large-scale approximate nearest neighbor search: the bank of random rotations is data independent and does not require training, while the supervised sparse projections algorithm targets efficient search of high-dimensional labelled data. We evaluate these algorithms on standard retrieval benchmarks, and then demonstrate their integration into our nonparametric scene parsing framework. Using 256-bit codes, binary encoding reduces search times by an order of magnitude and memory requirements by three orders of magnitude, while maintaining a mean per-class accuracy within 1% on the 3D scene parsing task.
Item Citations and Data
Attribution-NonCommercial-NoDerivatives 4.0 International