UBC Theses and Dissertations

Exploring neural network interpretability in visual understanding
Wang, Dan

Abstract

Neural networks (NNs) have achieved remarkable performance in computer vision. However, their numerous parameters and complex structures make NNs opaque to humans, and the failure to comprehend them can raise serious issues in real-world applications. My research explores NN interpretability in diverse visual tasks from two perspectives: post-hoc explanation and intrinsic interpretability.

Convolutional neural networks (CNNs) have outperformed humans in image classification, yet the logic behind their decisions remains a puzzle. We therefore propose concept-harmonized hierarchical inference, a post-hoc explanation framework for the decision-making process of CNNs. First, we interpret the layered feature representations of NNs with hierarchical visual semantics. We then explain NN feature learning as a bottom-up decision logic running from low to high semantic levels, in which a deep-layer decision is decomposed into a sequence of shallow-layer sub-decisions.

With the evolution of virtual reality, researchers are focusing increasingly on inverse rendering: reconstructing a 3D scene from multi-view 2D images. In this field, NNs have achieved superior performance in novel view synthesis and 3D reconstruction. For both tasks, learning a 3D representation from the input views is the key step, for which prior methods separately designed CNN-based single-view feature extraction and pooling-based multi-view fusion. This incoherent design damages both their intrinsic interpretability and their performance. We therefore aim to design coherent, interpretable NNs that can adequately exploit relational knowledge from the data.

For novel view synthesis, we propose a unified Transformer-based neural radiance field (TransNeRF), conditioned on source views, that learns a generic 3D-scene representation. TransNeRF explores deep relationships between the target rendering view and the source views, and improves intrinsic interpretability by enhancing the shape and appearance consistency of a 3D scene. In experiments, TransNeRF outperforms prior neural rendering methods, and its interpretation results are consistent with human perception.

For 3D reconstruction, we reformulate the task as sequence-to-sequence prediction and propose an end-to-end Transformer-based framework (EVolT). Within its encoder-decoder structure, EVolT jointly explores multi-level associations between the input views and the output volume-based 3D representation. EVolT achieves state-of-the-art accuracy in multi-view reconstruction with 70% fewer parameters than prior methods, and experimental results also suggest that it scales well.
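
To make the bottom-up decision logic concrete, the following is a minimal, hedged PyTorch sketch of layer-wise decision decomposition. It is not the thesis's concept-harmonized hierarchical inference: the toy network (TinyCNN), the random concept bases, and the pooled-projection scoring are all illustrative assumptions, showing only how a deep decision could be read as a chain of shallow-layer sub-decisions.

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Stand-in classifier; any CNN with named stages would do."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.stage2(self.stage1(x))
        return self.head(x.mean(dim=(2, 3)))  # global average pool

def layerwise_concept_scores(model, x, concept_vectors):
    """Project each stage's pooled activation onto a (hypothetical) concept
    basis, yielding one sub-decision per semantic level, shallow to deep."""
    scores, h = [], x
    for stage, concepts in zip([model.stage1, model.stage2], concept_vectors):
        h = stage(h)
        pooled = h.mean(dim=(2, 3))           # (batch, channels)
        scores.append(pooled @ concepts.t())  # similarity to each concept
    return scores

model = TinyCNN()
x = torch.randn(1, 3, 32, 32)
# Hypothetical concept bases: rows are "concept" directions per stage.
concepts = [torch.randn(4, 16), torch.randn(4, 32)]
for level, s in enumerate(layerwise_concept_scores(model, x, concepts), 1):
    print(f"level {level} sub-decision scores: {s.squeeze().tolist()}")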
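
Similarly, conditioning a radiance field on source views can be sketched as a cross-attention block in which target-ray queries attend over per-view features. This is a hedged illustration of the general idea, not TransNeRF itself; the 6-dimensional ray encoding, the feature width, and the small output head are assumptions.

import torch
import torch.nn as nn

class RayCrossAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.query_embed = nn.Linear(6, dim)  # 3D point + view direction
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 4))  # (r, g, b, sigma)

    def forward(self, ray_samples, source_feats):
        # ray_samples: (batch, n_samples, 6)
        # source_feats: (batch, n_views, dim), one token per source view
        q = self.query_embed(ray_samples)
        fused, _ = self.attn(q, source_feats, source_feats)  # cross-attention
        return self.head(fused)  # per-sample colour and density

block = RayCrossAttention()
rgb_sigma = block(torch.randn(2, 128, 6), torch.randn(2, 8, 64))
print(rgb_sigma.shape)  # torch.Size([2, 128, 4])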
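
Finally, the sequence-to-sequence reformulation of 3D reconstruction can be sketched as an encoder-decoder Transformer mapping multi-view patch tokens to learned voxel-grid queries. Again, this is an assumption-laden sketch rather than EVolT; the token sizes, the learned volume queries, and the occupancy head are illustrative.

import torch
import torch.nn as nn

class Seq2SeqRecon(nn.Module):
    def __init__(self, dim: int = 64, grid: int = 8):
        super().__init__()
        self.grid = grid
        # One learned query per output voxel (hypothetical design choice).
        self.volume_queries = nn.Parameter(torch.randn(grid**3, dim))
        self.transformer = nn.Transformer(d_model=dim, nhead=4,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.occupancy = nn.Linear(dim, 1)

    def forward(self, view_tokens):
        # view_tokens: (batch, n_views * n_patches, dim), all views concatenated
        b = view_tokens.shape[0]
        tgt = self.volume_queries.unsqueeze(0).expand(b, -1, -1)
        decoded = self.transformer(view_tokens, tgt)
        occ = self.occupancy(decoded).sigmoid()  # per-voxel occupancy
        return occ.view(b, self.grid, self.grid, self.grid)

model = Seq2SeqRecon()
volume = model(torch.randn(2, 5 * 16, 64))  # 5 views, 16 patch tokens each
print(volume.shape)  # torch.Size([2, 8, 8, 8])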


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International