Open Collections
UBC Theses and Dissertations
Artificial semantics in text retrieval
Arazy, Ofer
Abstract
The increase in the amount of available information, coupled with the rising importance of information for planning and decision-making purposes, stresses the need for effective information retrieval (IR) techniques. Specifically, we are interested in the retrieval of textual information from general - i.e. large and heterogeneous - collections. One of the most critical problems impeding the performance of retrieval systems is the gap between the way in which people think about information (through semantic representations) and the natural language form of textual documents. Bridging this gap requires that documents be translated to semantic representations. For general document collections, the extraction of semantic representations has to be automated, as manual effort and the use of domain-specific resources are inappropriate. We have identified four types of artificial (i.e. automatically extracted) semantic units that are the building blocks of IR representation: 'Tokens', 'Composite Concepts', 'Synonym Concepts', and 'Topics'. These artificial semantic units have been employed in a variety of retrieval systems; however, the isolated effect of semantic units on retrieval performance has not been studied previously. This dissertation investigates the effect of semantic units on retrieval performance. Our findings suggest that (a) there are significant differences in performance between semantic units, and (b) our proposed combinations of semantic units into a coherent retrieval model result in performance gains. In addition to the academic contribution of this dissertation, our findings are of importance to practitioners interested in the design of retrieval systems.
Item Metadata
Title |
Artificial semantics in text retrieval
|
Creator |
Arazy, Ofer
|
Publisher |
University of British Columbia
|
Date Issued |
2004
|
Extent |
16752010 bytes
|
Genre | |
Type | |
File Format |
application/pdf
|
Language |
eng
|
Date Available |
2009-12-02
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
|
DOI |
10.14288/1.0091921
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2004-11
|
Campus | |
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|