Open Collections
UBC Theses and Dissertations
Augmenting metadata tags in open data tables using schema matching in a pay-as-you-go fashion
Yu, Haoran
Abstract
Metadata helps users understand the contents of a table. Metadata tags describe those contents and let a user easily browse, search, and filter data. However, metadata is less useful when there is heterogeneity and incompleteness in a table. It is difficult to find all tables related to a given table by examining the tags alone, because the user is typically looking for tags that overlap between two or more tables, and heterogeneous metadata has no such overlaps. We use Open Data tables in a case study and develop strategies to augment the tags in table metadata, increasing the number of tag overlaps among the metadata of different tables. As an initialization step, we perform semantic enrichment of the words in the attributes of the table schema and in the tags, and perform schema matching between the attributes and tags of a table to create a semantic labeling, in which each attribute is labeled with zero or more tags. We provide one base table and search for tables using this semantic labeling to quickly find related tables. We integrate the table searching step and a schema matching step into an iterative framework, which incrementally adds tags to the metadata of every table related to the base table. The added tags are discovered through semantic overlap during the schema matching step of the iterative framework, based on a composite score that combines evidence from multiple pairwise value comparison criteria. We evaluate two approaches using a gold standard we created, and compare the accuracy of the augmented tags and the runtime with the two baseline approaches. We show that the augmented tags have relatively high accuracy and that the runtime of our iterative approach is reasonable. We argue that an approach that creates approximate matching in a pay-as-you-go fashion has good precision and recall, and is the more realistic option in a real-world scenario.
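To make the iterative framework in the abstract more concrete, here is a minimal Python sketch. It is not the thesis implementation: the table layout, the function names (`composite_score`, `label_attributes`, `augment_tags`), the two similarity criteria, the weights, and the threshold are all assumptions chosen for illustration. It also compares attribute names rather than cell values, and it omits the semantic enrichment step.

```python
from difflib import SequenceMatcher


def tokens(name):
    """Split an attribute or tag name into lowercase word tokens."""
    return set(name.lower().replace("_", " ").replace("-", " ").split())


def composite_score(a, b, weights=(0.5, 0.5)):
    """Hypothetical composite score: weighted token Jaccard + character similarity."""
    ta, tb = tokens(a), tokens(b)
    jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    char_sim = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return weights[0] * jaccard + weights[1] * char_sim


def label_attributes(table, threshold):
    """Semantic labeling: map each attribute to zero or more of the table's tags."""
    return {
        attr: {t for t in table["tags"] if composite_score(attr, t) >= threshold}
        for attr in table["attributes"]
    }


def augment_tags(tables, base, threshold=0.6, max_rounds=10):
    """Alternate a search step and a matching step until nothing changes."""
    related = {base}
    for _ in range(max_rounds):
        changed = False
        # Search step (assumed criterion): a table becomes related when one of
        # its attributes matches an attribute of an already related table.
        for name, table in tables.items():
            if name in related:
                continue
            if any(
                composite_score(a, b) >= threshold
                for r in related
                for a in tables[r]["attributes"]
                for b in table["attributes"]
            ):
                related.add(name)
                changed = True
        # Matching step: copy the tags that label an attribute onto every
        # related table that has a matching attribute.
        for src in related:
            labels = label_attributes(tables[src], threshold)
            for dst in related - {src}:
                for a, tags_a in labels.items():
                    for b in tables[dst]["attributes"]:
                        if composite_score(a, b) >= threshold:
                            new = tags_a - tables[dst]["tags"]
                            if new:
                                tables[dst]["tags"] |= new
                                changed = True
        if not changed:
            break
    return related


# Made-up example data, not taken from the thesis.
tables = {
    "parks": {"attributes": ["park_name", "neighbourhood"],
              "tags": {"park", "neighbourhood"}},
    "trees": {"attributes": ["tree_id", "neighbourhood_name"],
              "tags": {"tree"}},
    "budget": {"attributes": ["fiscal_year", "amount"],
               "tags": {"finance"}},
}
related = augment_tags(tables, "parks")
print(sorted(related))                  # ['parks', 'trees']
print(sorted(tables["trees"]["tags"]))  # ['neighbourhood', 'tree']
```

In this toy run the `trees` table starts with no tag in common with `parks`, gains the `neighbourhood` tag through an attribute match, and so ends up with an overlapping tag, while the unrelated `budget` table is left untouched. Each round does only as much matching as the currently related tables require, which is the pay-as-you-go idea in miniature.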
Item Metadata
Title | Augmenting metadata tags in open data tables using schema matching in a pay-as-you-go fashion
Creator | Yu, Haoran
Publisher | University of British Columbia
Date Issued | 2023
Language | eng
Date Available | 2023-03-23
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0428551
Degree Grantor | University of British Columbia
Graduation Date | 2023-05
Scholarly Level | Graduate
Aggregated Source Repository | DSpace