Open Collections
UBC Theses and Dissertations
The SHARE system : a semantic web based approach for evaluating queries across distributed bioinformatics databases and software
Vandervalk, Ben
Abstract
Many bioinformatics studies require combined use of data sets and software developed by different research labs. At the current time, accomplishing such studies requires the development of custom scripts that act as “glue” for the independent resources, performing transformations on the data sets that will allow them to be loaded into a single database and/or shuttled through different pieces of software. Due to the tedium and inefficiency of manual data/software integration, many institutions and research groups have sought to find a more reliable and automatic approach. The most significant integration project in recent years has been the Semantic Web activity of the World Wide Web Consortium (W3C), which aims to automate data integration not only in bioinformatics, but on the WWW as a whole. The goal of the Semantic Web is to interlink data on the web in a manner that is similar to the way that HTML pages are linked, while at the same time making the data available in a universal form that can be easily processed by software. In this thesis, the author describes a distributed query system called SHARE (Semantic Health and Research Environment) which demonstrates how the available standards and tools of the Semantic Web can be assembled into a framework for automating data and software integration in bioinformatics. We find that while SHARE has a similar architecture to existing query systems, the use of Semantic Web technologies has important advantages for the implementation, maintenance, and addition of new data sources to the system. After reviewing the mechanics of SHARE, we examine the crucial problem of optimizing queries in an environment where statistics about the data sources are typically not available. A query evaluation procedure called GREEDY is presented that addresses this challenge by: i) interleaving the planning and execution phases of a query, and ii) learning statistics from the execution of previous queries. 
We conclude by highlighting the unique strengths of SHARE and GREEDY in relation to other integration systems, and review important areas for future work.
Item Metadata
Title | The SHARE system : a semantic web based approach for evaluating queries across distributed bioinformatics databases and software
Creator | Vandervalk, Ben
Publisher | University of British Columbia
Date Issued | 2011
Language | eng
Date Available | 2011-04-28
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0071783
Degree Grantor | University of British Columbia
Graduation Date | 2011-05
Scholarly Level | Graduate
Aggregated Source Repository | DSpace