Ontology alignment in the presence of a domain ontology : finding protein homology

UBC Theses and Dissertations

Carbonetto, Andrew August

Abstract

Cheap electronic storage and Internet bandwidth have increased the amount of data available online, and large quantities of metadata are created to manage this wealth of information. The need to organize and structure metadata has led to the development of ontologies: data organized to describe the relations between elements. The creation of large ontologies has in turn created a need for ontology management strategies, in which alignment and merging are standard operations. Accurate ontology alignment methods are typically semi-automatic, meaning they require periodic user input. This becomes infeasible for large ontologies, and accuracy and efficiency drop significantly when these algorithms are forced to align without human interaction. Bioinformatics, for example, has seen an influx of large ontologies, such as signal pathway sets with thousands of elements or protein-protein interaction (PPI) databases with hundreds of thousands of elements. This drives the need for a reliable method of large-scale ontology alignment. Many bioinformatics ontologies contain references to domain ontologies: manually curated ontologies that describe additional, general information about the terms in the ontologies. For example, more than two-thirds of proteins in PPI data sets carry at least one annotation to the Gene Ontology, a domain ontology. We use the domain ontology references as features to compute similarity between elements. However, there are few efficient ways to compute similarity from structured features. We present a novel, automatic method for aligning ontologies based on such domain ontology features. Specifically, we use simulated annealing to reduce the complexity of the domain ontology's structure by finding approximate relevant clusters of elements. An intermediate step performs hierarchical clustering based on the similarity between elements of the ontology. A mapping is then built between clusters across the ontologies being aligned, and the final step builds an alignment between matched clusters. To evaluate our methods, we align human (Homo sapiens) and yeast (Saccharomyces cerevisiae) signal pathways provided by the Reactome database. The results are compared against reliable homology studies of proteins. The final mapping produces alignments that are significantly more accurate than those of traditional ontology alignment methods, without any human involvement.
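
As a rough illustration of the kind of pipeline the abstract outlines (annotation-based similarity, hierarchical clustering, then alignment), the following is a minimal sketch and not the thesis's actual method. It assumes Jaccard overlap of annotation sets as the similarity measure, average-linkage clustering, and a greedy one-to-one matching, and it omits the simulated annealing step entirely. All element names (h1, y1) and term labels (T1, T2) are hypothetical toy values, not real protein or Gene Ontology identifiers.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of annotation labels (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(annotations, threshold):
    """Average-linkage agglomerative clustering over annotated elements.

    Repeatedly merges the two most similar clusters and stops once the best
    remaining pair has average similarity below `threshold`.
    """
    clusters = [[name] for name in sorted(annotations)]

    def linkage(c1, c2):
        sims = [jaccard(annotations[x], annotations[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def align(source, target):
    """Greedy one-to-one matching of elements by descending similarity."""
    pairs = sorted(
        ((jaccard(sa, ta), s, t) for s, sa in source.items() for t, ta in target.items()),
        reverse=True,
    )
    used_s, used_t, result = set(), set(), []
    for sim, s, t in pairs:
        if sim > 0 and s not in used_s and t not in used_t:
            result.append((s, t, sim))
            used_s.add(s)
            used_t.add(t)
    return result


# Hypothetical toy data: element -> set of domain-ontology term labels.
human = {"h1": {"T1", "T2"}, "h2": {"T1", "T2", "T3"}, "h3": {"T7"}}
yeast = {"y1": {"T1", "T2"}, "y2": {"T7", "T8"}}

print(cluster(human, 0.5))
print(align(human, yeast))
```

Set overlap ignores the hierarchical structure of the domain ontology, which is exactly the structure the abstract says is expensive to exploit, so this sketch should be read only as a baseline for what "similarity from features" means.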

Item Metadata

Title	Ontology alignment in the presence of a domain ontology : finding protein homology
Creator	Carbonetto, Andrew August
Publisher	University of British Columbia
Date Issued	2008
Extent	1902628 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2008-05-09
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0051437
URI	http://hdl.handle.net/2429/821
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2008-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
