Understanding semantics and geometry of scenes

UBC Theses and Dissertations

Understanding semantics and geometry of scenes Suhail, Mohammed

Abstract

In this dissertation, we present new approaches for structured scene understanding from images and videos. Structured scene understanding has numerous applications, including robotics, autonomous vehicles, 3D content creation, and video editing. This research focuses on three tasks: scene graph generation, novel view synthesis, and layered scene representation. Scene graph generation involves creating a graph structure that represents the objects in a scene and the relationships between them. Generating a scene graph from an image requires a thorough understanding of the constituent objects and their associations. We explore integrating the often overlooked structure of the output space into the reasoning framework. Additionally, we extend beyond bounding-box granularity by leveraging pixel-level masks to ground objects, even when such annotations are absent from scene graph datasets. Novel view synthesis involves generating new views of a scene from input images. This requires a deep understanding of the scene's underlying geometry so that the rendered pixels are consistent with the scene's structure. In this dissertation, we focus on methods capable of accurately rendering scenes, particularly those with non-Lambertian surfaces. Moreover, we address the challenge of developing view-synthesis techniques that generate new views of a scene without requiring per-scene training. Layered scene representation involves decomposing a scene into semantically meaningful layers. For this task, we confront the limitations that homography-based modeling imposes on existing methods when handling videos with parallax effects. To address these limitations, we focus on learning a three-dimensional (3D) layered representation that enables a more comprehensive scene decomposition. The main contributions of this thesis are thus advances in these three tasks.

Item Metadata

Title	Understanding semantics and geometry of scenes
Creator	Suhail, Mohammed
Supervisor	Sigal, Leonid
Publisher	University of British Columbia
Date Issued	2024
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2024-04-09
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0441006
URI	http://hdl.handle.net/2429/87712
Degree	Doctor of Philosophy - PhD
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2024-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
