Pattern matching in massive metadata graphs at scale

UBC Theses and Dissertations

Pattern matching in massive metadata graphs at scale Reza, Tahsin Arafat

Abstract

Pattern matching in graphs, that is, finding subgraphs within a large background graph that match a smaller template graph, is fundamental to graph analysis and serves a rich set of applications. Unfortunately, existing solutions have limited scalability, are difficult to parallelize, support only a limited set of search patterns, and/or focus on only a subset of real-world problems. This dissertation explores avenues toward designing a scalable solution for subgraph pattern matching. In particular, this work targets practical pattern matching scenarios in large-scale metadata graphs (also known as property graphs) and designs solutions for distributed memory machines that address the two categories of matching problems, namely exact and approximate matching. This work presents a novel algorithmic pipeline that bases pattern matching on constraint checking. The key intuition is that each vertex or edge participating in a match has to meet a set of constraints specified by the search template. The pipeline iterates over these constraints to eliminate all vertices and edges that do not participate in any match, reducing the background graph to exactly the set of matching vertices and edges. Additional analysis, such as full match enumeration, can then be performed on this reduced graph. Furthermore, this constraint checking algorithm has a vertex-centric formulation, which makes it possible to harness existing high-performance, vertex-centric graph processing frameworks. The key contributions of this dissertation are the design of solutions that follow this constraint checking approach, for exact matching and for a class of edit-distance-based approximate matching, and an experimental evaluation that demonstrates the effectiveness of the respective solutions. To this end, this work presents the design and implementation of distributed, vertex-centric, asynchronous algorithms that guarantee a solution with 100% precision and 100% recall for arbitrary search templates.
Through comprehensive evaluation, this work provides evidence that the scalability and performance advantages of the proposed approach are significant. The highlights are scaling experiments on massive real-world graphs (up to 257 billion edges) and synthetic graphs (up to 4.4 trillion edges), run at scales (1,024 compute nodes) orders of magnitude larger than those used in the past for similar problems.
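The constraint checking idea described in the abstract can be illustrated with a small, single-machine sketch. It is not the dissertation's distributed, asynchronous implementation: the function name, the dictionary-based graph representation, and the choice to check only vertex labels and immediate neighbours are all assumptions made for illustration. The sketch repeatedly removes background vertices that cannot satisfy the neighbourhood requirements of any template vertex, until nothing changes. Checks this simple can, in general, leave vertices that belong to no complete match, so the sketch does not by itself give the precision guarantee stated above.

```python
def prune_by_local_constraints(graph, labels, template, template_labels):
    """Iteratively drop background vertices that fail local template constraints.

    graph, template: dict mapping each vertex to the set of its neighbours
    (undirected). labels, template_labels: dict mapping vertex to label.
    Returns a dict from each surviving background vertex to the set of
    template vertices it may still match. All names here are hypothetical.
    """
    # Initially a background vertex is a candidate for every template vertex
    # that carries the same label.
    candidates = {}
    for v in graph:
        roles = {t for t in template if template_labels[t] == labels[v]}
        if roles:
            candidates[v] = roles

    changed = True
    while changed:
        changed = False
        for v in list(candidates):
            keep = set()
            for t in candidates[v]:
                # v can keep role t only if each template neighbour of t is
                # still a possible role of some surviving neighbour of v.
                if all(
                    any(u in candidates and tn in candidates[u] for u in graph[v])
                    for tn in template[t]
                ):
                    keep.add(t)
            if keep != candidates[v]:
                changed = True
                if keep:
                    candidates[v] = keep
                else:
                    del candidates[v]  # v participates in no match
    return candidates


# Template: a path A - B - C.
template = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
template_labels = {"a": "A", "b": "B", "c": "C"}

# Background graph: 1-2-3 is a full A-B-C path, 4-5 is an A-B pair with no C,
# and 6 is an isolated C.
graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
labels = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B", 6: "C"}

surviving = prune_by_local_constraints(graph, labels, template, template_labels)
print(surviving)
```

In this example, vertex 5 is removed first because it has no neighbour labelled C, and vertex 4 is removed on the next pass because its only B neighbour is gone. This shows why the elimination has to iterate: removing one vertex can invalidate others.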

Item Metadata

Title	Pattern matching in massive metadata graphs at scale
Creator	Reza, Tahsin Arafat
Publisher	University of British Columbia
Date Issued	2019
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2020-01-03
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0387453
URI	http://hdl.handle.net/2429/73093
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2020-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
