Open Collections
UBC Theses and Dissertations
Efficient extraction of ontologies from domain specific text corpora
Li, Tianyu
Abstract
There is a huge body of domain-specific knowledge embedded in free-text repositories such as engineering documents, instruction manuals, medical references and legal files. Extracting ontological relationships (e.g., ISA and HASA) from this kind of corpus can improve users’ queries and navigation through the corpus, and can also benefit applications built for these domains. Current methods to extract ontological relationships from text data usually fail to capture many meaningful relationships because they concentrate on single-word terms or very short phrases. This is particularly problematic in a smaller corpus, where it is harder to find statistically meaningful relationships. We propose a novel pattern-based algorithm that finds ontological relationships between complex concepts by exploiting parsing information to extract concepts consisting of multi-word and nested phrases. Our procedure is iterative: we tailor the constrained sequential pattern mining framework to discover new patterns. We compare our algorithm with previous representative ontology extraction algorithms on four real data sets and achieve consistently and significantly better results.
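The iterative idea in the abstract (known relationships reveal patterns, and patterns reveal new relationships between concepts that may span several words) can be illustrated with a small bootstrapping sketch. This is not the thesis's algorithm. It assumes a parser has already bracketed each concept as a flat span such as "[centrifugal pumps]" (nested phrases are not handled), and it uses the contiguous words between two adjacent concepts as a simple stand-in for the constrained sequential patterns the thesis mines. The names `tokenize`, `concept_pairs`, `bootstrap_isa`, `max_gap` and `min_support` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical input convention: each concept is pre-bracketed by a parser,
# e.g. "[centrifugal pumps]". Flat brackets only; nesting is not handled here.
TOKEN = re.compile(r"\[([^\[\]]+)\]|(\S+)")


def tokenize(sentence):
    """Split a sentence into ("C", concept) and ("W", word) tokens."""
    return [("C", c.lower()) if c else ("W", w.lower())
            for c, w in TOKEN.findall(sentence)]


def concept_pairs(tokens, max_gap=4):
    """Yield (left, middle_words, right) for adjacent concepts <= max_gap words apart."""
    for i, (kind, left) in enumerate(tokens):
        if kind != "C":
            continue
        middle = []
        for next_kind, value in tokens[i + 1:]:
            if next_kind == "C":
                yield left, tuple(middle), value
                break  # only pair each concept with the next one
            middle.append(value)
            if len(middle) > max_gap:
                break


def bootstrap_isa(sentences, seeds, min_support=2, max_iter=5):
    """Alternate pattern induction and pattern application until nothing new is found."""
    occurrences = [o for s in sentences for o in concept_pairs(tokenize(s))]
    known = set(seeds)  # (sub, sup) pairs, e.g. ("ball valves", "valves")
    patterns = set()
    for _ in range(max_iter):
        # 1) Induce patterns: middle contexts that link already-known pairs.
        support = Counter()
        for left, middle, right in occurrences:
            if (left, right) in known:
                support[(middle, True)] += 1   # True: sub concept comes first
            elif (right, left) in known:
                support[(middle, False)] += 1  # False: sup concept comes first
        patterns = {p for p, n in support.items() if n >= min_support}
        # 2) Apply patterns to every concept pair to propose ISA pairs.
        found = set()
        for left, middle, right in occurrences:
            if (middle, True) in patterns:
                found.add((left, right))
            if (middle, False) in patterns:
                found.add((right, left))
        if found <= known:
            break  # fixed point: no new relationships
        known |= found
    return known, patterns


sentences = [
    "[pumps] such as [centrifugal pumps] need [regular maintenance]",
    "[valves] such as [ball valves] are common",
    "[sensors] such as [pressure sensors] are installed",
    "[ball valves] are a kind of [valves]",
]
seeds = {("centrifugal pumps", "pumps"), ("ball valves", "valves")}
known, patterns = bootstrap_isa(sentences, seeds)
print(sorted(known))
# [('ball valves', 'valves'), ('centrifugal pumps', 'pumps'), ('pressure sensors', 'sensors')]
```

Because each bracketed span is treated as one unit, the sketch relates "pressure sensors" to "sensors" rather than only single words, which is the gap the abstract points to; a real system would also need the support thresholds and pattern constraints to control noise in small corpora.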
Item Metadata
Title | Efficient extraction of ontologies from domain specific text corpora
Creator | Li, Tianyu
Publisher | University of British Columbia
Date Issued | 2011
Language | eng
Date Available | 2011-12-14
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0052152
Degree Grantor | University of British Columbia
Graduation Date | 2012-05
Scholarly Level | Graduate
Aggregated Source Repository | DSpace