TECTAS : bridging the gap between collaborative tagging systems and structured data

UBC Theses and Dissertations

Moosavi, Seyyed Ali

Abstract

Ontologies are a core building block of the emerging semantic web, and taxonomies, which contain class-subclass relationships between concepts, are a key component of ontologies. A taxonomy that relates the tags in a collaborative tagging system makes the system's underlying structure easier to understand. Automatic construction of taxonomies from data sources such as text and collaborative tagging systems has been a topic of interest in the field of data mining. This thesis introduces a new algorithm for building a taxonomy of keywords from the tags in collaborative tagging systems. The algorithm can also detect has-a relationships between tags. The proposed method, the TECTAS algorithm, uses association rule mining to detect is-a relationships between tags and can be used in an automatic or semi-automatic framework. It is based on the hypothesis that users tend to assign both "child" and "parent" tags to a resource. The method combines association rule mining, bi-gram pruning using search engines, discovery of relationships between pairs of tags that share a common child, and lexico-syntactic patterns for detecting meronyms. In addition to proposing the TECTAS algorithm, the thesis reports several experiments on four real data sets: Del.icio.us, LibraryThing, CiteULike, and IMDb. Based on these experiments, the thesis addresses the following topics: (1) verifying the necessity of building domain-specific taxonomies; (2) analyzing the tagging behavior of users in collaborative tagging systems; (3) verifying the effectiveness of the algorithm compared to previous approaches; and (4) using additional quality and richness metrics to evaluate automatically extracted taxonomies.
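The co-occurrence hypothesis in the abstract can be illustrated with a minimal sketch. This is not the TECTAS algorithm itself; the abstract does not give its details. The function name `mine_is_a_candidates`, the thresholds `min_support` and `min_confidence`, the "parent is used more often than child" heuristic, and the sample data are all assumptions chosen for illustration. The sketch mines pairwise association rules of the form child -> parent and omits the bi-gram pruning, common-child, and meronym steps.

```python
from collections import Counter
from itertools import combinations


def mine_is_a_candidates(resources, min_support=2, min_confidence=0.8):
    """Return (child, parent, confidence) triples mined from tag co-occurrence.

    resources: iterable of tag collections, one per tagged resource.
    min_support and min_confidence are illustrative thresholds, not values
    taken from the thesis.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in resources:
        unique = sorted(set(tags))
        tag_count.update(unique)
        # Count each unordered pair of tags once per resource.
        pair_count.update(combinations(unique, 2))

    candidates = []
    for (a, b), both in pair_count.items():
        if both < min_support:
            continue
        for child, parent in ((a, b), (b, a)):
            # Confidence of the rule child -> parent.
            confidence = both / tag_count[child]
            # Assumption: a parent tag is more general, so it is used more often.
            if confidence >= min_confidence and tag_count[parent] > tag_count[child]:
                candidates.append((child, parent, confidence))
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


# Hypothetical tagged resources, invented for this example.
resources = [
    {"python", "programming"},
    {"python", "programming", "tutorial"},
    {"java", "programming"},
    {"java", "programming", "tutorial"},
    {"programming"},
    {"tutorial", "cooking"},
]
candidates = mine_is_a_candidates(resources)
for child, parent, confidence in candidates:
    print(f"{child} is-a {parent} (confidence {confidence:.2f})")
```

On this toy data, "python" and "java" are each proposed as children of "programming", while "tutorial" falls below the confidence threshold because it also appears without "programming". In a semi-automatic setting, candidates like these would be reviewed by a person before being added to a taxonomy.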

Item Metadata

Title	TECTAS : bridging the gap between collaborative tagging systems and structured data
Creator	Moosavi, Seyyed Ali
Publisher	University of British Columbia
Date Issued	2010
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2010-10-26
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0051986
URI	http://hdl.handle.net/2429/29554
Degree	Master of Science - MSc
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2010-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
