Artificial semantics in text retrieval

UBC Theses and Dissertations

Artificial semantics in text retrieval Arazy, Ofer

Abstract

The increase in the amount of available information, coupled with the rising importance of information for planning and decision-making purposes, stresses the need for effective information retrieval (IR) techniques. Specifically, we are interested in the retrieval of textual information from general - i.e. large and heterogeneous - collections. One of the most critical problems impeding the performance of retrieval systems is the gap between the way in which people think about information (through semantic representations) and the natural language form of textual documents. Bridging this gap requires that documents be translated to semantic representations. For general document collections, the extraction of semantic representations has to be automated, as manual effort and the use of domain-specific resources are inappropriate. We have identified four types of artificial (i.e. automatically extracted) semantic units that are the building blocks of IR representation: 'Tokens', 'Composite Concepts', 'Synonym Concepts', and 'Topics'. These artificial semantic units have been employed in a variety of retrieval systems; however, the isolated effect of semantic units on retrieval performance has not been studied previously. This dissertation investigates the effect of semantic units on retrieval performance. Our findings suggest that (a) there are significant differences in performance between semantic units, and (b) our proposed combinations of semantic units into a coherent retrieval model result in performance gains. In addition to the academic contribution of this dissertation, our findings are of importance to practitioners interested in the design of retrieval systems.

Item Metadata

Title	Artificial semantics in text retrieval
Creator	Arazy, Ofer
Publisher	University of British Columbia
Date Issued	2004
Extent	16752010 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-12-02
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0091921
URI	http://hdl.handle.net/2429/16068
Degree	Doctor of Philosophy - PhD
Program	Business Administration
Affiliation	Business, Sauder School of
Degree Grantor	University of British Columbia
Graduation Date	2004-11
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_2004-930955.pdf -- 15.98MB
