The effectiveness of GNNs for node classification : the significance of side information

UBC Theses and Dissertations

The effectiveness of GNNs for node classification : the significance of side information Liu, Xiaoou

Abstract

This thesis studies the effectiveness of Graph Neural Networks (GNNs) for node classification. We conduct systematic experiments on several representative deep-learning models for graph data, using training data generated from the Stochastic Block Model (SBM) and the theoretical results on the fundamental limits of this model as guidance in the design of our experiments. While GNNs are widely believed to be powerful learning models for graph data, our empirical findings suggest that they do not necessarily outperform other machine-learning-based methods and traditional algorithms for node classification. In particular, we observe that GNN-based methods fail to exploit the information from labeled nodes in semi-supervised learning settings. We propose an effective data augmentation method to enhance GNN-based methods by making better use of labeled information in the training data. Our experiments using synthetic data from SBMs and real-world datasets demonstrate that our method can significantly enhance the capabilities of GNN models and notably improve their performance for node classification. Additionally, in the context of unsupervised learning, we discuss the possibility of incorporating into GNNs other types of side information that may exist in multiplex network data.
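
As a rough illustration of the kind of synthetic data the abstract refers to, the following is a minimal sketch of sampling a graph from a symmetric SBM. The function name `sample_sbm`, the parameter names, and the values (200 nodes, 2 communities, edge probabilities 0.10 and 0.02) are assumptions chosen for illustration; they are not taken from the thesis, and the thesis may generate its data differently.

```python
import numpy as np

def sample_sbm(n, num_blocks, p_in, p_out, seed=0):
    """Sample an undirected graph from a symmetric SBM (illustrative sketch).

    Each node is assigned to one of `num_blocks` communities uniformly at
    random. An edge is drawn with probability `p_in` when both endpoints
    share a community and `p_out` otherwise.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, num_blocks, size=n)       # community of each node
    same = labels[:, None] == labels[None, :]          # True for same-community pairs
    probs = np.where(same, p_in, p_out)                # per-pair edge probability
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # sample each unordered pair once
    adj = (upper | upper.T).astype(np.int8)            # symmetrize; no self-loops
    return adj, labels

# Hypothetical parameter values, chosen only so the two communities are visible.
adj, labels = sample_sbm(n=200, num_blocks=2, p_in=0.10, p_out=0.02)
```

The returned `labels` array plays the role of the ground-truth node classes; in a semi-supervised setting, only a subset of these labels would be revealed to the model.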

Item Metadata

Title	The effectiveness of GNNs for node classification : the significance of side information
Creator	Liu, Xiaoou
Supervisor	Gao, Yong
Publisher	University of British Columbia
Date Issued	2024
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2024-06-17
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0443981
URI	http://hdl.handle.net/2429/88475
Degree	Master of Science - MSc
Program	Computer Science
Affiliation	Science, Irving K. Barber Faculty of (Okanagan); Computer Science, Mathematics, Physics and Statistics, Department of (Okanagan)
Degree Grantor	University of British Columbia
Graduation Date	2024-09
Campus	UBCO
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
