Using term proximity measures for identifying compound concepts: an exploratory study (Yin, Nawei, 2004)

USING TERM PROXIMITY MEASURES FOR IDENTIFYING COMPOUND CONCEPTS: AN EXPLORATORY STUDY

by

NAWEI YIN

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN BUSINESS ADMINISTRATION
in
THE FACULTY OF GRADUATE STUDIES
DIVISION OF MANAGEMENT INFORMATION SYSTEMS
SAUDER SCHOOL OF BUSINESS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 2004
© Nawei Yin, 2004

Library Authorization

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Name of Author: Nawei Yin
Title of Thesis: Using Term Proximity Measures for Identifying Compound Concepts: An Exploratory Study
Degree: Master of Science in Business Administration
Year: 2004
Department: Division of Management Information Systems, Sauder School of Business, The Faculty of Graduate Studies
The University of British Columbia, Vancouver, BC, Canada

Abstract

With the rapid development of information technology, individuals using the technology are liable to be overwhelmed by the excessive amounts of information available when conducting online (local or remote) document searches. It is important therefore that users specify the correct search terms. However, a user does not always know which terms to use and often the same idea can be described by different terms. Constructing lists of possible search terms for different domains would require a very substantial effort by experts in each domain. To alleviate these problems, automated techniques can be valuable to extract concepts and meaningful phrases for specific domains.

This work is an exploratory study of automated extraction of compound concepts from a collection of documents in a specific domain. The concept-extraction methods used in this study employed clustering techniques based on distance measures that reflect term affinity statistics rather than techniques based on similarity measures adopted in most previous works. The study compared the effects of different methods of calculating affinities, depending on the sizes of textual units where terms co-occur and on directionality and asymmetry between terms. The accounting context was used as a case study to provide the data. An accounting expert evaluated the resulting clusters produced by the clustering program.

As demonstrated by our results, the method identified meaningful accounting compound concepts and phrases.
The research also indicated which affinity types generated better results. For example, affinities based on occurrence of terms within a document produced the poorest results. There was a significant manual effort involved in "preprocessing" the data prior to compound concept identification. However, we believe the techniques explored might be useful for users to search relevant information within individual domains and can be extended to support the construction of domain-specific thesauri.

Table of Contents

Abstract
Table of Contents
List of Tables
Acknowledgments
Section 1  Introduction
  1.1 Motivation
  1.2 Thesis Framework
Section 2  Literature Review
  2.1 Introduction to Thesaurus Construction
    2.1.1 Thesauri
    2.1.2 Manual Thesauri
    2.1.3 Automatic Thesauri
  2.2 Automatic Techniques Guiding This Research
    2.2.1 Document Collection
    2.2.2 Object Filtering
    2.2.3 Automatic Indexing
    2.2.4 Co-occurrence Analysis
    2.2.5 Evaluation
Section 3  Our Approach and Research Questions
Section 4  Our Affinity Measures
  4.1 Term Affinity Statistics
  4.2 Five Textual Units
    4.2.1 Sentence0td Textual Unit
    4.2.2 SentenceUpTo5td Textual Unit
    4.2.3 SentenceNoRestriction Textual Unit
    4.2.4 ParagraphNoRestriction Textual Unit
    4.2.5 DocumentNoRestriction Textual Unit
  4.3 Estimating Fifteen Schemes' Affinity Values
Section 5  Experiment
  5.1 Domain Dependent Preprocessing - Tokenizing
    5.1.1 File Extraction
    5.1.2 Reformatting the Document Structure
    5.1.3 Changing the Short-Forms of the Words in the Abbreviation List into Term-Phrases
    5.1.4 Converting Meaningful Words to Term-Phrases
    5.1.5 Removing Stop-Words
    5.1.6 Producing a Full Token List
    5.1.7 Removing Unwanted Tokens
    5.1.8 Consolidating Wanted Tokens
    5.1.9 Generating the Final Reduced Token List
  5.2 Computing Term Affinities
    5.2.1 Removing Unwanted Punctuations
    5.2.2 Converting 1,344 Tokens to Token IDs
    5.2.3 600 Tokens' Affinity Values
      5.2.3.1 Generating the 600 Token List for Clustering
      5.2.3.2 Summary of the Steps to Obtain the 600 Tokens
      5.2.3.3 The Token IDs of Any Two Terms
      5.2.3.4 An Example of a Dummy File to Illustrate Affinity Calculations
      5.2.3.5 The Program's Affinity Calculation of Fifteen Schemes
  5.3 Clustering 600 Tokens
    5.3.1 Hierarchical Clustering Using Matlab
    5.3.2 Affinity Values Fit into Matlab
    5.3.3 Clustering Outputs
  5.4 Evaluation
    5.4.1 Clustering Terms to Be Evaluated
    5.4.2 Expert's Evaluation
      5.4.2.1 Instructions for Expert's Evaluation
      5.4.2.2 Sample Evaluation Results
      5.4.2.3 Statistical Analysis of Expert's Evaluation Results for All Fifteen Schemes
    5.4.3 Comparing the Expert's Evaluation Results with Price Waterhouse Accounting Thesaurus
      5.4.3.1 Sample Comparison
      5.4.3.2 Statistical Comparisons of All Fifteen Schemes
  5.5 Discussion
    5.5.1 Can Relevant Concepts Be Extracted Automatically from a Set of Documents in a Given Domain? (Q1.1) What Type of Semantic Relations Can Be Identified in the Extracted Concepts? (Q1.2)
    5.5.2 What Parameters Can Affect the Quality of the Results? (Q2)
      5.5.2.1 Proximity - Sentence, Paragraph and Document Levels (Q2.1)
      5.5.2.2 Distance Within the Sentence Level (Q2.2)
      5.5.2.3 Directionality and Asymmetry (Q2.3)
Section 6  Automatically Identifying Phrases (Q3)
  6.1 Single-Word Tokenizing
    6.1.1 Broken Phrases List
    6.1.2 Wanted Single Words
    6.1.3 Converting Plural Wanted Single Words into Singulars
    6.1.4 905 Final Single Tokens
  6.2 905 Single Tokens' Affinity Calculations
    6.2.1 One Level - Sentence0td
    6.2.2 Four Cases - Estimating Affinities
  6.3 Clustering 905 Single Tokens
  6.4 Evaluation - 905 Single Token Clustering
    6.4.1 A Sample Evaluation
    6.4.2 Statistical Evaluation Results
  6.5 Automatic Phrase Discussions - 905 Single Tokens Clustering
    6.5.1 Our Techniques Can Automatically Identify Some Accounting Phrases
    6.5.2 Directionality and Asymmetry
Section 7  Conclusions and Future Research
  7.1 Conclusions
  7.2 Contributions
  7.3 Limitations and Future Research
Bibliography
Appendices
  Appendix A  Domain Dependent Preprocessing - Tokenizing
  Appendix B  Computing Term Affinities
  Appendix C  Clustering 600 Tokens Data Outputs
  Appendix D  Statistical Evaluation Results of All Fifteen Schemes - Clustering 600 Tokens
  Appendix E  Automatically Identifying Phrases
    Appendix E1: Single Word Tokenizing
    Appendix E2: Clustering 905 Single Tokens Data Outputs
    Appendix E3: Single Tokens Evaluation Results (Provided in CD-ROM)

List of Tables

Table 3.1 Summary of Our Approach
Table 4.1 Fifteen Schemes' Affinity Estimations
Table 5.1 Summary of the Steps to Obtain the 600 Tokens
Table 5.2 Example of Comparing Two Token IDs
Table 5.3 Dummy File - Two Tokens' Co-occurrences in ParagraphNoRestriction
Table 5.4 Dummy File - Two Tokens' Statistical Affinity Values in ParagraphNoRestriction
Table 5.5 Top Ten Clustering Data of Sentence0tdAB_A
Table 5.6 Sample Evaluation Terms - Top Three Clusters of SentenceUpTo5tdAB_A
Table 5.7 Sample Evaluation Relation Type Alternatives
Table 5.8 Sample Expert's Evaluation Results - Top Three Clusters of SentenceUpTo5tdAB_A
Table 5.9 Statistical Analysis of the Expert's Evaluation Results - All Fifteen Schemes
Table 5.10 Sample Comparison of Expert's Evaluation Results with the Price Waterhouse Thesaurus - Top Three Clusters of SentenceUpTo5tdAB_A
Table 5.11 Comparing Fifteen Schemes of the Expert's Evaluation Results with the Price Waterhouse Thesaurus
Table 5.12 Statistical Results of "No direction relation" Type for All Fifteen Schemes
Table 5.13 Statistical Results of Average Relevance Scores for All Fifteen Schemes
Table 5.14 Statistical Results of the Top Two Most Frequently Chosen Relation Types in All Fifteen Schemes
Table 5.15 Identical Clusters Percentages by Comparing the Same Scheme in Sentence0td and SentenceUpTo5td with SentenceNoRestriction Respectively
Table 5.16 Comparing the Statistical Analysis of the Expert's Evaluation Results of the Same Scheme in EvaluateSentence0td with EvaluateSentenceUpTo5td, and with EvaluateSentenceNoRestriction
Table 5.17 Comparing the Statistical Analysis of the Expert's Evaluation Results of the Same Scheme in EvaluateSentenceUpTo5td with EvaluateSentenceNoRestriction
Table 5.18 Directionality - Comparing the Statistical Analysis of the Expert's Evaluation Results of the Scheme AB_A at Five Levels and the Scheme BA_A at Five Levels
Table 5.19 Asymmetry - Comparing the Statistical Analysis of the Expert's Evaluation Results of the Schemes AB_A and AB_B at Five Levels and the Schemes BA_A and BA_B at Five Levels
Table 6.1 Top Ten Automatic Phrase Clustering Data in Sentence0tdAB_A
Table 6.2 Sample Evaluation of 905 Single Token Clusters - Top Five Clusters in Sentence0tdAB_A
Table 6.3 Statistical Evaluation of Automatic Phrases for Four Cases at Sentence0td Level
Table 6.4 Asymmetry - Comparing Evaluation of Automatic Phrases in Sentence0tdAB_A with Sentence0tdAB_B, and in Sentence0tdBA_A with Sentence0tdBA_B

Acknowledgments

I would like to thank Professor Yair Wand for instructing and supervising me throughout this study, and Professor Carson Woo and Professor Jacob Steif for their comments on the development of this thesis. I want to express my sincere appreciation to Ph.D. candidate Ofer Arazy for his advice and extensive support during the entire process of developing this research. I also would like to thank Ms. Yongwei Yin for her expert judgement of the experimental results, and Mr. Steve Doak for editing my thesis. Furthermore, I sincerely appreciate the continuous encouragement and assistance I have received from my husband, Mr. Jianwen Zhang, and my mother, Ms. Xiuge Sun.

1. INTRODUCTION

1.1 Motivation

With the recent rapid development of advanced technology, people nowadays can easily access and locate information they need by searching online, or through local database systems. Users, however, might be overwhelmed by the excessive amounts of information available from these various sources, and they may feel confused about how to effectively retrieve the needed information. This problem was labelled information overload by Chen et al. (1995). Another common predicament searchers encounter is a vocabulary problem: people often use different terms to describe the same concept. Furnas et al. (1987) noted that when two people spontaneously made a word choice for objects from various domains, the probability that they chose the same term was lower than 20%. Subsequently, Chen et al. (1995, p.177) argued, "Due to the unique backgrounds, training and experiences of different users, the chance of two people using the same term to describe a concept is quite low and even the same person may use different terms to describe the same concept at different times (due to the learning process and the evolution of concepts)."

These problems exist in many industries, including financial accounting. KPMG Consulting LLC (2000) claimed that over two-thirds of firms in various fields in a survey of 423 organizations were overwhelmed by the information in their systems, and 50% of the organizations complained of difficulty when attempting to locate information (Garnsey, 2002). Because financial standards change over time and also because these standards apply to various types of organization, financial information has become more complex. This is why searchers such as accountants, financial workers, business professionals and other general users with diversified backgrounds and various searching goals are usually unfamiliar with the varied terms that can represent the same concept in accounting (Garnsey, 2002).
To handle these problems, researchers have developed various automatic thesaurus-generation methods to identify related concepts in different applications, such as Information Retrieval (IR), Latent Semantic Indexing (LSI), text mining and knowledge discovery, among others. The traditional definition of "thesaurus" in the Merriam-Webster Online Dictionary is "a: book of words or information about a particular field or set of concepts; especially: a book of words and their synonyms; b: a list of subject headings or descriptors usually with a cross-reference system for use in the organization of a collection of documents for reference and retrieval" (http://www.m-w.com). As Milstead et al. (1993) have noted, a thesaurus is characterised both as a tool for writers "to help select the best word to convey a specific nuance of meaning," and as an indexing system that can serve as "an information storage and retrieval tool: a listing of words and phrases authorized for use in an indexing system, together with relationships, variants and synonyms, and aids to navigation through the thesaurus." WordNet, an on-line lexical database, has become one of the most popular machine-readable thesauri. For more information about WordNet, see the introduction by Miller et al. (1993).

A thesaurus can lead searchers to concepts associated with an initial term. Regarding the various definitions of concept in the lexicon of different languages, see the review by Sowa (2000). Grefenstette (1994, p.24) explained a generalized condition for developing a thesaurus: "one of the aspects of language variability is that many different words can be used to describe the same concept, and here we have indications that an automatic means of discovering the words associated with a concept is possible." Leroy and Chen (2001, p.263) pointed out that "terms and concepts are different entities. A concept is the underlying meaning of a set of terms. As such, each concept can be expressed by many different terms. For example, the concept cancer has 20 terms associated with it, two of which are malignant tumor and malignant tumoral disease." Within the field of accounting, Garnsey (2002) suggested, "if clusters of related accounting terms/phrases can be successfully constructed for accounting, it should be possible to give users of accounting information domain-specific knowledge. Eventually, this knowledge may be integrated into a retrieval system to improve the efficiency of searches for information about specific accounting topics."

In conducting our literature survey, we have noticed that studies of automatic identification of accounting concepts are few in number and are only preliminary forays into the field. Therefore, we focus our research on constructing a feasible approach to automatically grouping accounting terms and phrases into different compound concepts. A compound concept (hereinafter referred to as a concept, for simplicity) in this thesis refers to a set of related terms in a specific domain. Hence, when a user wants to describe a concept, besides those words the user is able to think of, he or she can select terms belonging to that same concept suggested by the automatic mechanism. The approach proposed in our research, if practical and useful, could lead to an automatic tool that enhances users' searching performance.

1.2 Thesis Framework

In the following section, we review previous research performed in related fields and position our research therein.
Following this, our research approach is introduced, and the research questions that delineate our study's scope are presented in Section Three. In Section Four, we introduce our own statistical affinity measures to address the research questions. An experiment on automatically extracting accounting concepts is described in Section Five, and data analysis is conducted to derive our experimental results. In Section Six, we describe another study on automatic accounting phrase generation and discuss the findings. Finally, we draw conclusions, summarize the contributions we have made, and remark on directions for future research.

2. LITERATURE REVIEW

In this section, we summarize our survey of general thesaurus construction and of the existing techniques to automatically construct thesauri, which have influenced our research position and our approach to automatically extracting accounting concepts.

2.1 Introduction to Thesaurus Construction

2.1.1 Thesauri

Grefenstette (1993) believed that a domain-specific thesaurus could identify important concepts in the domain hierarchically and could suggest alternative words as well as phrases to describe the same concept in the domain. Schütze and Pedersen (1997, p.308) explained that "a thesaurus is a data structure that defines semantic relatedness between words." Furthermore, according to Gietz (2001), "a thesaurus is a collection of relevant terms ordered in a hierarchy of super ordinate and subordinate concepts and homonyms."

2.1.2 Manual Thesauri

The most traditional way to construct a thesaurus involves manually constructing it using a semantic mapping table. However, this is expensive and time-consuming, inasmuch as it requires extensive involvement of domain experts. As well, manual construction is only possible in a specific domain when repeated use of the thesaurus exceeds the construction cost (Schütze and Pedersen, 1997). Since constructing a manual thesaurus incurs significant human and time costs, researchers have studied various approaches to building system-generated automatic thesauri.

2.1.3 Automatic Thesauri

We need to first investigate the methodologies of automatically constructing a thesaurus. Previous studies have already developed linguistic and statistical measures related to automatic thesaurus construction. For instance, in the area of linguistics, Curran and Moens (2002) noted that some systems extract related terms that appear together in particular contexts by recognising linguistic patterns (e.g. X, Y and other Zs) which link synonyms and hyponyms. Regarding statistical measures, Grefenstette (1994, p.23) claimed that most of the semantic extraction work "was based on the statistics of co-occurrence of words within the same window of text, where a window can be a certain number of words, sentences, paragraphs or an entire document". Grefenstette (1994, p.26) then demonstrated that Church and Hanks (1990) "use textual windows to calculate the mutual information between a pair of words"; they employ an information-theoretic definition of mutual information over a corpus, under which word pairs with high mutual information are usually semantically related. Jang et al. (1999) also described the techniques of using mutual information statistics to identify the lexical relations between pairs of words. However, these works did not address compound concept identification.
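To make the mutual-information statistic referenced above concrete, the following is a minimal sketch of pointwise mutual information for word pairs that co-occur within a fixed textual window. It is an illustration only, not the procedure used by Church and Hanks (1990) or by this thesis; the window size, tokenization and function name are assumptions.

    import math
    from collections import Counter

    def pmi_scores(sentences, window=5):
        # sentences: list of token lists; window: assumed co-occurrence span
        word_counts, pair_counts = Counter(), Counter()
        total_words = total_pairs = 0
        for tokens in sentences:
            word_counts.update(tokens)
            total_words += len(tokens)
            for i, w in enumerate(tokens):
                for v in tokens[i + 1:i + 1 + window]:
                    pair_counts[(w, v)] += 1
                    total_pairs += 1
        scores = {}
        for (w, v), n_wv in pair_counts.items():
            p_pair = n_wv / total_pairs
            p_w = word_counts[w] / total_words
            p_v = word_counts[v] / total_words
            # high PMI suggests the pair co-occurs more often than chance
            scores[(w, v)] = math.log2(p_pair / (p_w * p_v))
        return scores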
One approach to constructing an automatic thesaurus is to reuse existing online lexicographic databases, such as WordNet. Other attempts to incorporate existing thesauri were explained in detail by Chen et al. (1997). However, generic thesauri like WordNet are not specialized enough for domain-specific databases (Caraballo, 1999). Chen et al. (1997, pp.20-21) reviewed in detail several algorithmic approaches to automatic thesaurus generation developed in numerous investigations, and concluded that most methodologies compute coefficients of "relatedness" between terms using statistical co-occurrence measures such as cosine, Jaccard and Dice similarity functions (Chen and Lynch, 1992; Crouch, 1990; Rasmussen, 1992; Salton, 1989). The field that most extensively utilizes thesaurus construction is Information Retrieval (IR). The goal of IR is to develop systems that can retrieve all documents relevant to a user's query, while retrieving only documents containing relevant information. Because our research focus is domain-specific, we reviewed past approaches primarily in the context of domain-specific automatic thesaurus generation. We also referred to other generic automatic thesaurus techniques that we thought useful and similar to our approach; these relevant techniques will be discussed in the next section.

The University of Arizona Artificial Intelligence Lab, headed by Dr. Hsinchun Chen, conducted the most prominent series of studies to automatically develop a domain-specific thesaurus. As Chen et al. (1998) summarized, in their previous research they generated domain-specific thesauri in different domains such as Russian computing (Chen and Lynch, 1992; Chen et al., 1993), business (Chen et al., 1994), and molecular biology (Chen et al., 1995). More recently, Hauck et al. (2001) conducted an experiment in geoscience, and Leroy and Chen (2001) studied medical terminology. Because their studies and techniques have been cited in many papers on constructing automatic thesauri and have also directed our research, we will introduce their essential techniques in detail in the following section.

As for our research interest, the accounting domain, we have identified only three relevant papers classifying accounting concepts. Gangolly and Wu (2000) conducted preliminary research using statistical analysis of term-document frequencies to automatically classify accounting concepts. They used Indexicon, an indexing utility, to preprocess the text and produce terms. An agglomerative nesting algorithm was then adopted to derive clusters of concepts. This was only a preliminary investigation, and therefore their objective was only a rudimentary exploratory data analysis of financial accounting standards. Hence, we could not find detailed descriptions to guide our research. Although they found that some rudimentary clusters were differentiable and could extend to further research, the authors did not interpret the resulting clusters in detail. Nevertheless, their finding that accounting terms can be classified by hierarchical clustering encouraged us, and Gangolly and Wu's work motivated us to seek other related techniques that could be used to develop our own method to automatically classify accounting concepts.

Garnsey (2001, 2002) investigated the feasibility of statistical methods using the frequencies of particular words within documents. Latent Semantic Indexing (LSI) and agglomerative clustering were used to derive clusters of related accounting concepts. This research found related terms included in the resulting clusters. We thought some techniques adopted in this study, such as accounting terms/phrases identification and evaluation, were very valuable for our research. We will discuss these techniques in detail in the following section.
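For reference, the co-occurrence-based similarity coefficients cited earlier in this section (cosine, Jaccard and Dice) can be computed from the sets of documents in which two terms occur. This is a generic, simplified sketch using binary occurrence sets; it is not the exact formulation used in any of the studies reviewed above, and the function name and inputs are assumptions.

    import math

    def similarity_coefficients(docs_a, docs_b):
        # docs_a, docs_b: iterables of document ids containing term A / term B
        a, b = set(docs_a), set(docs_b)
        common = len(a & b)
        cosine = common / math.sqrt(len(a) * len(b)) if a and b else 0.0
        jaccard = common / len(a | b) if (a | b) else 0.0
        dice = 2 * common / (len(a) + len(b)) if (a or b) else 0.0
        return {"cosine": cosine, "jaccard": jaccard, "dice": dice}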
2.2 Automatic Techniques Guiding This Research

Chen et al. (1995, 1997) developed techniques for domain-specific automatic thesaurus construction, which we have treated as a guideline in our research to automatically identify accounting concepts. To obtain a useful domain-specific automatic thesaurus, Chen et al. suggested three criteria: a complete document collection capturing full knowledge in that domain, an appropriate co-occurrence function, and user-directed interaction. In this section, the techniques related to our research will be introduced in the following order: document collection, object filtering, automatic indexing, co-occurrence and cluster analysis, and evaluation. Meanwhile, the derived techniques that can be used in our approach will be proposed and justified when discussing each step.

2.2.1 Document Collection

Chen et al. (1995, 1997) emphasized that acquiring a complete and up-to-date collection of documents from representative and authoritative domain sources was the key to creating a domain-specific thesaurus. In the accounting domain, Gangolly and Wu (2000) and Garnsey (2001, 2002) chose the FARS (Financial Accounting Research System) database as their document collection. Garnsey (2002) pointed out that FARS, "with its key word search feature, is an improvement over the print version of GAAP. However, it does not address the fact that users may be unfamiliar with the vocabulary and, therefore, do not input the required terms to achieve adequate retrieval." Garnsey's (2002) research objective was to provide sets of related accounting concepts that were derived from the terms actually used in the authoritative literature in the FARS database. Our research also selected the accounting literature from the FARS database as our collection to work with.

2.2.2 Object Filtering

Chen et al. (1997) claimed that domain-specific controlled lists of keywords in databases (for example, the subject indexes at the back of a textbook) can help identify key search vocabularies to improve information retrieval. Chen et al. used four different sources from the biological sciences to compile a vocabulary list, including researcher names, gene names, experimental methods and subject descriptors. They identified terms that matched with terms in the known vocabulary, labelling this process object filtering. Garnsey (2002) adopted this method and filtered terms using a combination of GAAP guide indexes from two accounting texts, and intermediate and advanced textbooks. To make the list more complete, in several cases additional phrases were added to the indexes (for example, "public traded company").

In our study we used the following sources to guide the vocabulary composition: Accounting Dictionary - Accounting Glossary - Accounting Terms (http://www.ventureline.com/glossary.asp) and Financial Accounting: An Introduction to Concepts, Methods, and Uses, 8th Edition (Stickney and Weil, 1997).

2.2.3 Automatic Indexing

Referring to Salton's (1989) blueprint method for automatic indexing, Chen et al.
(1995) implemented automatic indexing techniques in this order: "word identification" to identify words in each document without considering punctuation and case; and "stop-wording" to create a domain-specific stop-word list by removing non-semantic words (such as "on", "in", "at" and "there") as well as verbs that were too general and irrelevant to represent the meaning of the document. Though standard "stemming" was adopted in the beginning, the researchers later realized drawbacks to the method and removed the stemming process to "avoid creating noise and ungrammatical phrases. For example, CLONING will not be stemmed as CLONE (one is a process, the other is the output)." Chen et al. added that "term-phrase formation" was then performed by the system to form phrases containing up to three adjacent words. For example, "DAUER", "LARVA", "FORMATION", "DAUER LARVA", "LARVA FORMATION", and "DAUER LARVA FORMATION" were generated from the three adjacent words "DAUER LARVA FORMATION." Chen et al. (1995) referred to these phrases simply as "terms."

Garnsey (2002) performed word identification and stop-wording in automatic indexing by following the procedures used by Chen et al. (Chen and Lynch, 1992; Chen et al., 1998; Chen et al., 1995). Stemming was not used due to the fact that, in accounting, different forms of a word may have very different meanings; for example, "warranty" means "product guarantee", whereas "warrant" means "certificate representing stock rights." Instead, a limited number of closely related terms were combined (for example, singular and plural, past and present tense). This coincided with the work of Chen et al. (1995), which found that stemming produces noise and creates ungrammatical phrases. Subsequently, stop words, non-semantic-bearing words, adjectives, adverbs, pure verbs (such as "belong" and "solve") and non-accounting terms (such as "army" and "standard") were removed. This process was similar to that used by Chen et al. (1997), where a domain-specific stop-word list for biology containing about 600 very general molecular biology terms (for example, "gene", "process", and "mutation") was created to remove general terms which were considered irrelevant in the thesaurus. High-frequency words, which were too general to discriminate content, were also eliminated. Consistent with the work of Chen et al. (1998), terms which did not occur in at least three documents were eliminated. As well, low-frequency words that do not contribute to content were also removed.

The automatic indexing step in our research was processed similarly to both Garnsey (2002) and Chen et al. (1995), including term-phrase formation. We know that automatic indexing has timesaving and cost-saving advantages over manual indexing, but is less accurate. Callan (1995) declared that several experiments have demonstrated that "a combination of manual and automatic indexing is superior to either alone." Thereafter, there was some amount of manual work involved when selecting and identifying terms in our study, as is detailed later in the Experiment section of this paper.
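As a small illustration of the term-phrase formation step described above (phrases built from up to three adjacent words, as in the DAUER LARVA FORMATION example), the sketch below enumerates the candidate phrases for a run of adjacent words. The function name and interface are assumptions; it is not the implementation used by Chen et al. or in this thesis.

    def adjacent_phrases(words, max_len=3):
        # enumerate every run of 1..max_len adjacent words as a candidate phrase
        phrases = []
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrases.append(" ".join(words[i:i + n]))
        return phrases

    # adjacent_phrases(["dauer", "larva", "formation"]) returns:
    # ["dauer", "larva", "formation", "dauer larva", "larva formation",
    #  "dauer larva formation"]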
2.2.4 Co-occurrence Analysis

Chen and Lynch (1992, p.887) summarized that "virtually all techniques for automatic thesaurus generation are based on the statistical co-occurrence of word types in text." Chen et al. (1995), on the other hand, claimed the most popular technique for constructing automatic thesauri is to compute probabilities of terms co-occurring in all documents constituting a data collection, a process known as co-occurrence analysis. The first stage in many cluster analyses is to convert the raw data into a matrix, usually with similarity, dissimilarity or distance measures. The output of a cluster analysis is a number of groups, or clusters, of individuals. Lassi (2002) reviewed the co-occurrence analysis developed by Chen et al. (1995, 1997), which first computes each term's document frequency (the number of documents that a word appears in) and term frequency (the number of times a word occurs in a document). The inverse document frequency was then computed, assigning higher weights to multiple-word terms than to single-word terms, because multiple-word terms usually carry more precise semantic meaning than single words do. This co-occurrence measure was based, however, on document-level term co-occurrence analysis and was criticized by Schütze and Pedersen (1997).

Schütze and Pedersen (1997, p.309) developed a lexical co-occurrence method where "two terms lexically co-occur if they appear in the text within some distance of each other (typically a window of k words). Qualitatively, the fact that two words often occur close to each other is more likely to be significant than the fact that they occur in the same documents, especially if documents are long." Further, they noted that "if the goal is to capture information about specific words, we believe that lexical co-occurrence is the preferred basis for statistical thesaurus construction." In Latent Semantic Indexing (LSI), document-by-term matrices are used in attempts to discover the relationships between terms and documents. Although lexical co-occurrence thesauri are closely related to LSI, Schütze and Pedersen (1997, p.310) worked with a "term-by-term matrix", which is more efficient. They computed a symmetric term-by-term matrix C where "the element Cij records the number of times that words i and j co-occur in a window of size k." The lexical co-occurrence method focuses on term representations independently with respect to local contexts rather than documents, whereas LSI only computes document representations. In addition, the region over which LSI co-occurrence is defined is the document, but Schütze and Pedersen (1997) proposed to assess a region in a window of k words because they believed that local contexts of co-occurrence counts are better than document-based counts. Since in our research we wanted to investigate the relationships between terms rather than documents, we also deployed term-by-term matrices to compute term co-occurrences.
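The term-by-term matrix described by Schütze and Pedersen can be sketched as follows: C[i, j] counts how often terms i and j co-occur within a window of k tokens. This is an illustrative reconstruction, not their implementation or the one used in this thesis; the vocabulary, window size and interface are assumptions.

    import numpy as np

    def term_by_term_matrix(sentences, vocabulary, k=5):
        # symmetric matrix C: C[i, j] = number of times terms i and j co-occur
        # within a window of k tokens (k is an illustrative choice)
        index = {term: i for i, term in enumerate(vocabulary)}
        C = np.zeros((len(vocabulary), len(vocabulary)), dtype=int)
        for tokens in sentences:
            positions = [(index[t], pos) for pos, t in enumerate(tokens) if t in index]
            for a, (i, p1) in enumerate(positions):
                for j, p2 in positions[a + 1:]:
                    if p2 - p1 <= k:
                        C[i, j] += 1
                        C[j, i] += 1
        return C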
" A s n o t e d a b o v e , t h e y d e f i n e d a t h e s a u r u s as " a d a t a structure that defines s e m a n t i c relatedness b e t w e e n w o r d s . " W e b e l i e v e , t h o u g h , that the r e l a t i o n s b e t w e e n w o r d s are not o n l y s y n o n y m s , but a l s o c a n be  narrow term, broad term  and other potential composite relationships. Furthermore, w e do not think similarity functions c a n describe the p h e n o m e n a w h e n related terms appear i n p r o x i m i t y but not i n s i m i l a r c o n t e x t s . I n o t h e r w o r d s , i f t w o t e r m s , r e g a r d l e s s o f w h e t h e r t h e y are s i m i l a r o r semantically related terms, co-occur w i t h i n a text's scope, even though they do not have s i m i l a r s u r r o u n d i n g c o n t e x t s , t h e n c o n c e p t s c a n s t i l l be d e r i v e d f r o m the s e m a n t i c relatedness ( k n o w n as  affinity) b e t w e e n  these terms. C o n s e q u e n t l y , w e n e e d to d e v e l o p a n  affinity m e a s u r e to study their distance f r o m e a c h other. T h i s w a s one o f o u r k e y tasks w h e n c o n d u c t i n g the present research.  2.2.5  Evaluation  A f t e r c o - o c c u r r e n c e a n a l y s i s , the contents i n the r e s u l t i n g clusters w e r e t h e n e v a l u a t e d . F o l l o w i n g s i m i l a r e v a l u a t i o n m e t h o d s as t h o s e d e v e l o p e d b y C h e n et a l . Garnsey's  (2002) r e s e a r c h  (1995, 1998), i n  thirty terms w e r e r a n d o m l y c h o s e n f r o m the c l u s t e r c o n c e p t s  t h e y b e l o n g to a n d the i n d i v i d u a l s w e r e i n v i t e d to evaluate i f e a c h t e r m w a s relevant to other t e r m s o c c u r r i n g i n the s a m e cluster. T h i s e v a l u a t i o n w a s u s e d to d e t e r m i n e w h e t h e r c l u s t e r i n g c o u l d classify terms that w e r e related to e a c h other. T h e i n d i v i d u a l assessors evaluated strength o f relevance b e t w e e n terms u s i n g the three assessments  "Somewhat Relevant"  and  "Very Relevant")  (Irrelevant, "  p r o v i d e d b y the researcher. H o w e v e r , the  s p e c i f i c r e l a t i o n t y p e s o u r r e s e a r c h i s i n t e r e s t e d i n , s u c h as  - 14-  narrow term  or  broad term,  were not identified in the evaluation process. With this goal, in our study we consulted an expert in accounting to identify and assess relation types generated by each cluster concept.  3.  OUR APPROACH AND RESEARCH QUESTIONS  In detailing the literature and our proposed positions, our objective in this admittedly preliminary research is to develop our own approach to automatically extracting valid compound concepts and to examine some possible factors that might affect clustering performances. We are interested in the following research questions:  Q l . l : Can relevant concepts be extracted automatically from a set of documents in a given domain?  Q1.2: What type of semantic relations can be identified in the extracted concepts? A thesaurus normally denotes the semantic relationships between one term and another such as a Narrow Term (NT), a Broad Term (BT), or a Preferred Term (USE) (Lassi, 2002).  Furthermore, a Related Term (RT) relationship is used to indicate related terms that  cannot be represented by either broader or narrower semantic relationship. Broader and narrower terms form the hierarchical relationships mentioned by the literature. 
Therefore, we investigated each concept, to determine what kind of relations in particular could be identified among the terms related to this concept.  - 15 -  Q2: What parameters can affect the quality of the results?  We were also curious whether the following parameters, which almost no prior studies had examined, would be able to influence the resulting concepts: Q2.1  Proximity - sentence, paragraph and document levels  Salton and Buckely (1991, p.23) declared "Each available text is broken down into individual text units - for example, text sections, text paragraphs, and individual sentences." Rungsawang (1997) mentioned that word co-occurrences could be measured within "local" contexts (within sentences and paragraphs) and "global" contexts (the entire document). Like Schutze and Pedersen (1997), who suggested computing the number of times a word co-occurs with other words in a document, in a chapter or in a window of a number of words, we too were interested in computing the co-occurrences within the different local contexts. Zhang and Rudnicky (2002) investigated how multiple-levels of documents, paragraphs and sentences affect the derived semantic information. However, in the literature from previous studies, we could not find any comparisons across different levels of text-blocking. Therefore in this study, we wondered whether different contexts for proximities between terms, particularly at the levels of documents, paragraphs and sentences, would extract very diversified concepts? If so, the next objective would be to determine which level operated most accurately and effectively. Q2.2  Distance within the sentence level  Dagan et al. (1995) employed the relation of co-occurrence of two words within a limited distance in a sentence. According to Martin et al. (1983), restricting a window to at most five words accounts for 98% of word relations within a single sentence. Experiments conducted by Losee (1994) also showed that identifying term dependence in text units of  - 16-  no more than five words increased the degree of information retrieval performance, but "more dependence information results in relatively little increase in performance." Hence, we wonder whether even within the same sentence, the distance between two terms (how closely they are located in relation to each other) will affect the relevance in the resulting concepts. Q2.3  Directionality and asymmetry  Many past attempts we have examined have treated the context as "a bag of words" in that they ignored the word order, resulting in information loss (see Zhang and Rudnicky, 2002). Hence, it is of interest to study pairs of terms that co-occur, to assess the following issues: (1) whether order-dependent co-occurrence statistics might affect the results - (this is referred to as directionality); and (2) whether basing the co-occurrence measure on the more frequently-occurring word might affect the results (this is referred to as asymmetry).  Q3: Is our approach useful to automatic formation of term-phrases?  Although term-phrases have been extensively used in automatic indexing to form index terms, few works have clearly demonstrated how to use single-word terms to automatically form multiple-word phrases. Thus, we wanted to apply our approach to explore whether term-phrases could be automatically formed.  In this research, we followed the techniques we discussed in the literature review above for automatic indexing with some changes in the domain-specific prepossessing stage. 
We then developed our own term co-occurrence statistical affinity measures to explore the effects of different parameters on the outputs. Our proposed approach is summarized in Table 3.1, with comparisons to approaches used in other studies.

An accounting expert scrutinized the extracted concepts after we had generated our results. The research outcomes were based on both the expert's evaluation and our analysis of the outputs.

Document Collection
  Literature: FARS database (see Section 2.2.1)
  Compared to previous studies: Same
  Our techniques: FARS database (see also Section 5.1.1)

Object Filtering
  Literature: Various external sources used to make a vocabulary list (see Section 2.2.2)
  Compared to previous studies: Similar
  Our techniques: Our external sources (see also Section 2.2.2)

Automatic Indexing
  Literature: Word identification, stop-wording, no stemming, term-phrase formation (see Section 2.2.3)
  Compared to previous studies: Similar and extended
  Our techniques: Similar aspects (see Sections 5.1.4-5.1.7); no stemming applied; consolidation of wanted tokens used instead (see Section 5.1.8)

Co-occurrence Analysis
  Literature: Popular similarity measures grouping similar words/synonyms that co-occur with similar neighbors (see Section 2.2.4)
  Compared to previous studies: New: (1) we studied proximity, not similar neighbors; (2) we grouped words to form compound concepts, not only synonyms
  Our techniques: Our own statistical affinity measure (affinity based on co-occurrence count, normalized by term occurrence; see Section 4.1); affinity measure converted to a distance measure (see Section 5.3.2)

Clustering
  Literature: Hierarchical clustering (see Section 2.1.3)
  Compared to previous studies: Similar
  Our techniques: Hierarchical clustering in Matlab (see Section 5.3.1)

Evaluation - relevance
  Literature: Categorization of identified concepts as "Irrelevant", "Somewhat Relevant", "Very Relevant" (see Section 2.2.5)
  Compared to previous studies: Similar and extended
  Our techniques: Relevance score 1-5, from "mostly unrelated" to "mostly related" (see Section 5.4.2.1)

Evaluation - semantic relations
  Literature: Narrow Term (NT), Broad Term (BT), Related Term (RT) (see Section 3, Q1.2)
  Compared to previous studies: Extended
  Our techniques: "Broader Term" used to represent NT and BT, plus additional relation type alternatives to represent Related Terms (RT) (see Section 5.4.2.1)

Sentence, Paragraph and Document Level
  Literature: Within each individual sentence, paragraph or document (see Sections 2.1.3, 2.2.4 and 3, Q2.1)
  Compared to previous studies: Extended and new
  Our techniques: We studied not only each individual sentence, paragraph and document, but also compared them (this is new; see Section 4.2)

Within a Sentence
  Literature: Five-word distance within a sentence (see Section 3, Q2.2)
  Compared to previous studies: Extended
  Our techniques: Different distances between two words within a sentence (see Section 4.2)

Directionality
  Literature: Many previous studies ignored the word order (see Section 3, Q2.3)
  Compared to previous studies: Somewhat new
  Our techniques: Compared the effects of different word orders (see Section 4.1)

Asymmetry
  Literature: No previous studies found (see Section 3, Q2.3)
  Compared to previous studies: New
  Our techniques: Compared the effect of normalization (see Section 4.1)

Table 3.1: Summary of Our Approach
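As Table 3.1 notes, the affinity values are converted to a distance measure and then clustered hierarchically; in this thesis that step is carried out in Matlab (Sections 5.3.1-5.3.2). Purely as an illustration of the idea, an analogous step in Python with SciPy might look like the sketch below. The conversion distance = 1 - affinity, the "average" linkage and the number of clusters are assumptions for illustration, not the thesis's actual settings.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def cluster_terms(affinity, term_names, n_clusters=10):
        # affinity: symmetric matrix with values in [0, 1]; higher = closer terms
        distance = 1.0 - np.asarray(affinity, dtype=float)
        np.fill_diagonal(distance, 0.0)
        condensed = squareform(distance, checks=False)  # condensed distance vector
        tree = linkage(condensed, method="average")     # agglomerative clustering
        labels = fcluster(tree, t=n_clusters, criterion="maxclust")
        clusters = {}
        for name, label in zip(term_names, labels):
            clusters.setdefault(label, []).append(name)
        return clusters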
4. OUR AFFINITY MEASURE

4.1 Term Affinity Statistics

As we explained in the literature review section above, our research proposed an affinity measure to estimate co-occurrence probabilities between any two terms (for example, term A and term B), based on their relative distance within fixed textual scopes: within a single document, within a paragraph, and within a sentence. Similar to approaches adopted by Besancon et al. (1999) and Dagan et al. (1995), in our investigation we set the affinity between two terms (terms are identified as meaning-bearing single words or phrases) as the relative directional occurrence of the two words within a textual unit, given the frequency of one of them. The statistics of the co-occurrence of a pair of terms were used to estimate the general probability of their co-occurrence. For the sake of brevity, we refer to these measures as statistical or probabilistic co-occurrence measures.

To address the research questions of directionality (term order) and asymmetry, four estimations, explained below, were developed using the sample terms "reaction" (term A) and "nuclear" (term B). The numbers shown in the examples below were invented in order to illustrate how to estimate the four different affinity values: they do not represent real cases.

• Pr(A->B/A) ≈ #<A, B>/#A: In the entire collection, given that the current term A appears within the sentences, paragraphs or documents, what is the probability that the other term B will co-occur AFTER term A in a specific predefined textual unit? For example, consider term A "reaction" and term B "nuclear": if in the entire collection, given 150 sentences, paragraphs or documents carrying the current term A "reaction", there are 30 sentences, paragraphs or documents having term B "nuclear" co-occurring after the given term A "reaction" in a specific textual unit, then Pr(A->B/A) is estimated by #<reaction, nuclear>/#reaction = 30/150. This indicates that in the entire collection, given that the current term A "reaction" appears within the sentences, paragraphs or documents, the probability that term B "nuclear" will co-occur after term A "reaction" in a specific predefined textual unit is 20%.

• Pr(B->A/A) ≈ #<B, A>/#A: In the entire collection, given that the current term A appears within the sentences, paragraphs or documents, what is the probability that the other term B will co-occur BEFORE term A in a specific predefined textual unit? For example, consider term A "reaction" and term B "nuclear": if in the entire collection, given 150 sentences, paragraphs or documents carrying the current term A "reaction", there are 70 sentences, paragraphs or documents having term B "nuclear" co-occurring before the given term A "reaction" in a specific textual unit, then Pr(B->A/A) is estimated by #<nuclear, reaction>/#reaction = 70/150. This indicates that in the entire collection, given that the current term A "reaction" appears within the sentences, paragraphs or documents, the probability that term B "nuclear" will co-occur before term A "reaction" in a specific predefined textual unit is 47%.

• Pr(A->B/B) ≈ #<A, B>/#B: In the entire collection, given that the current term B appears within the sentences, paragraphs or documents, what is the probability that the other term A will co-occur BEFORE term B in a specific predefined textual unit? For example, consider term A "reaction" and term B "nuclear": if in the entire collection, given 100 sentences, paragraphs or documents carrying the current term B "nuclear", there are 30 sentences, paragraphs or documents having term A "reaction" co-occurring before the given term B "nuclear" in a specific textual unit, then Pr(A->B/B) is estimated by #<reaction, nuclear>/#nuclear = 30/100.
This indicates that in the entire collection, given that the current term B "nuclear" appears within the sentences, paragraphs or documents, the probability that term A "reaction" will co-occur before term B "nuclear" in a specific predefined textual unit is 30%.

• Pr(B->A/B) ≈ #<B, A>/#B: In the entire collection, given that the current term B appears within the sentences, paragraphs or documents, what is the probability that the other term A will co-occur AFTER term B in a specific predefined textual unit?

In this study, we wanted to explore combinations of different distances, order and asymmetry between two terms. In order to make the research feasible and simple, we reduced the number of combinations to the first three estimates: #<A, B>/#A, #<B, A>/#A, and #<A, B>/#B. We believe these three cases are sufficient to address both issues of order and asymmetry. By definition, the higher the estimated probabilities are, the more likely it is that the two terms will appear together, which also means their affinity is more proximate. Moreover, the closer together they are, the more probable it is that they could belong to the same concept.

4.2 Five Textual Units

As discussed above, the predefined context scopes for our research were within a sentence, within a paragraph, and within a document, with regard to each of which different relations for indexing can be investigated. We also noted that Martin et al. (1983) indicated that restricting a window to at most five words accounts for 98% of word relations within a single sentence; therefore, in our research the cut-off level used within the sentence level was less than or equal to five terms intervening in the same sentence. In other words, we examined the neighborhood of each word w within a span of five words (-5 words and +5 words around w). There were three representative cases within the sentence level that we were interested in investigating: two terms next to each other (no term existing between them - in this case the two terms should have the highest affinity); two terms with up to five terms between them; and two terms co-occurring in a sentence regardless of how many other terms exist between them. That is, we further divided the same sentence level into three textual units: zero token difference (Sentence0td), up to five token difference (SentenceUpTo5td), and entire sentences disregarding token differences (SentenceNoRestriction).

We probed the affinities of a total of five textual units, with different proximity between the terms in our experiment: Sentence0td, SentenceUpTo5td, SentenceNoRestriction, ParagraphNoRestriction and DocumentNoRestriction. Using the affinity measure we developed, the following explains how to calculate two terms' affinity values in each of the five textual units in the direction of A->B (A is before B) or B->A (A is after B) in our experiment:

• Within the Same Sentence

4.2.1 Sentence0td Textual Unit: as long as any two tokens in the order of A->B / B->A were next to each other with zero token distance (0td) between them in the same sentence, regardless of how many of these patterns were in that same sentence, we counted only 1 per sentence.

4.2.2 SentenceUpTo5td Textual Unit: as long as any two tokens in the order of A->B / B->A were located in the same sentence with five tokens or fewer between them (UpTo5td), regardless of how many of these patterns were in the same sentence, we counted only 1 per sentence.
4.2.3 SentenceNoRestriction Textual Unit: as long as any two tokens in the order of A->B / B->A were located in the same sentence, with any number of tokens between them (NoRestriction - no token restriction), regardless of how many of these patterns were in the same sentence, we counted only 1 per sentence.

• Within the Same Paragraph

4.2.4 ParagraphNoRestriction Textual Unit: as long as any two tokens in the order of A->B / B->A were located in a single paragraph but across different sentences, with any number of sentences between them (NoRestriction - no sentence restriction), regardless of how many of these patterns were in that same paragraph, we counted only 1 per paragraph.

• Within the Same Document

4.2.5 DocumentNoRestriction Textual Unit: as long as any two tokens in the order of A->B / B->A were located in a single document, across different paragraphs with any number of paragraphs between them (NoRestriction - no paragraph restriction), regardless of how many of these patterns were in that same document, we counted only 1 per document.

4.3 Estimating Fifteen Schemes' Affinity Values

For each of the five textual units there were three affinities we wanted to estimate: #<A, B>/#A, #<B, A>/#A, and #<A, B>/#B, with values ranging from 0 to 1. All fifteen schemes' affinities can be estimated by the formulas listed in Table 4.1.

Sentence0td
  #<A, B>/#A: number of sentences having B AFTER A with 0td / number of sentences carrying A
  #<B, A>/#A: number of sentences having B BEFORE A with 0td / number of sentences carrying A
  #<A, B>/#B: number of sentences having A BEFORE B with 0td / number of sentences carrying B

SentenceUpTo5td
  #<A, B>/#A: number of sentences having B AFTER A with UpTo5td / number of sentences carrying A
  #<B, A>/#A: number of sentences having B BEFORE A with UpTo5td / number of sentences carrying A
  #<A, B>/#B: number of sentences having A BEFORE B with UpTo5td / number of sentences carrying B

SentenceNoRestriction
  #<A, B>/#A: number of sentences having B AFTER A with no token restriction / number of sentences carrying A
  #<B, A>/#A: number of sentences having B BEFORE A with no token restriction / number of sentences carrying A
  #<A, B>/#B: number of sentences having A BEFORE B with no token restriction / number of sentences carrying B

ParagraphNoRestriction
  #<A, B>/#A: number of paragraphs having B AFTER A but across different sentences, with no sentence restriction / number of paragraphs carrying A
  #<B, A>/#A: number of paragraphs having B BEFORE A but across different sentences, with no sentence restriction / number of paragraphs carrying A
  #<A, B>/#B: number of paragraphs having A BEFORE B but across different sentences, with no sentence restriction / number of paragraphs carrying B

DocumentNoRestriction
  #<A, B>/#A: number of documents having B AFTER A but across different paragraphs, with no paragraph restriction / number of documents carrying A
  #<B, A>/#A: number of documents having B BEFORE A but across different paragraphs, with no paragraph restriction / number of documents carrying A
  #<A, B>/#B: number of documents having A BEFORE B but across different paragraphs, with no paragraph restriction / number of documents carrying B

Table 4.1: Fifteen Schemes' Affinity Estimations
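As an illustration of how the ratios in Table 4.1 turn raw counts into the three affinity estimates for one textual unit, consider the sketch below. The labels AB_A, BA_A and AB_B mirror the scheme names used later in the thesis; the data structures and function name are assumptions, not the thesis's own program.

    def affinity_estimates(pair_counts, unit_counts):
        # pair_counts[(a, b)]: number of textual units (sentences, paragraphs or
        # documents, under one of the five co-occurrence rules) in which b occurs
        # AFTER a; unit_counts[a]: number of units carrying a.
        affinities = {}
        for (a, b), n_ab in pair_counts.items():
            n_ba = pair_counts.get((b, a), 0)
            affinities[(a, b)] = {
                "AB_A": n_ab / unit_counts[a],  # Pr(A->B / A)
                "BA_A": n_ba / unit_counts[a],  # Pr(B->A / A)
                "AB_B": n_ab / unit_counts[b],  # Pr(A->B / B)
            }
        return affinities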
5. EXPERIMENT

To answer our research questions, we conducted an experiment using a computerized program to automatically extract accounting concepts. The following sections describe our experiment process in detail.

5.1 Domain Dependent Preprocessing - Tokenizing

Domain-dependent tokenizing is a process of breaking text in a specific domain into individual domain-specific meaning-carrying units. It includes extraction of tokens as well as elimination of non-domain-related tokens. Our tokenizing techniques employed an automatic indexing process similar to the one used by Chen et al. (1992, 1994, 1997, 1998). We performed the steps listed below in sequence. In the end, a specific token list and tokenized texts were generated, ready for use in the later stages of the experiment.

5.1.1 File Extraction

Like Garnsey (2002), in our research we selected the accounting literature from the FARS 4.2 (Financial Accounting Research System) database, which is current through June 15, 2002. Our entire research collection comprised 835 documents, which were automatically extracted from the following categories in the FARS database:

- Committee on Accounting Procedure Accounting Research Bulletins (ARB)
- Accounting Principles Board Opinions (APB)
- AICPA Accounting Interpretations (AIN)
- FASB Statements of Financial Accounting Standards (FAS)
- FASB Interpretations (FIN)
- FASB Technical Bulletins (FTB)
- FASB Statements of Financial Accounting Concepts (CON)
- FASB Emerging Issues Task Force Abstracts (EITF) - the sections "Introduction" and "Full Text of EITF Abstracts (including Appendix D)"

5.1.2 Reformatting the Document Structure

The program first processed the original document collection in three steps: (1) the topics of each file in the original database (for example, ARB 49: Earnings per Share) were connected with their associated filenames (for example, fars-0008.txt) (see Appendix A - Section 5.1.2); (2) the periods (dots) that did not designate punctuation ending a sentence, such as in "Ch.3A", were removed (see Appendix A - Section 5.1.2); and (3) all words were converted into lowercase (see Appendix A - Section 5.1.2).

5.1.3 Changing the Short-Forms of the Words in the Abbreviation List into Term-Phrases

In this step, we transformed all abbreviations into the term-phrase format by creating an abbreviation-controlled list (see Appendix A - Section 5.1.3).

5.1.4 Converting Meaningful Words to Term-Phrases

For a specific domain thesaurus, Salton (1989) as well as Chen et al. (1992, 1994, 1995, 1997, 1998) recommended and implemented the technique of forming term-phrases by combining adjacent words. Garnsey (2002) also identified additional accounting phrases and added them into the accounting indexes. Without these phrases combined from multiple words, individual words and tokens cannot convey accurate meanings in accounting applications. Furthermore, the majority (82.5%) of the entries listed in the "Topical Index" (titles of each document) section of FARS consist of term-phrases, while only 17.5% of them are single terms; we believe the term-phrases carry more accurate meanings than the single terms. We also used two other external sources, Accounting Dictionary - Accounting Glossary - Accounting Terms
For instance, if the program detected single words appearing consecutively, such as "stock option and stock purchase plan," it then would link them automatically into one meaningful token, "stock_option_and_stock_purchase_plan." Similarly, the program would link the plural form, "stock option and stock purchase plans," from the text into one token, "stock_option_and_stock_purchase_plans." In our prior step we had already connected abbreviations into a single term-phrase format (for example, the abbreviation " A C R S " had been converted into "accelerated_cost_recovery_system"), however sometimes the text carried individual separate words of the abbreviations instead of their abbreviated forms. In the case of this particular situation and in order to cover all possibilities, our phrase-controlled list also included the probable singular and plural forms of these words, for example including both "accelerated cost recovery system" and "accelerated cost recovery systems." Then the program checked the text and converted the terms into "accelerated_cost_recovery_system" and "accelerated_cost_recovery_systems" respectively i f they found matching terms. (In the remainder of this paper, we use "terms" or "tokens" to mean either "single word terms" or "term-phrases.") Therefore, in this step,  -28-  the program connected adjacent words as single phrases and the tokenized files were then comprised of phrases and other single terms. 5.1.5  Removing Stop-Words  As noted above, Chen et al. (1997) generated a biology domain-specific stop-word list by removing non-significant terms in that domain. Similarly, to arrive at practical index terms in our research, we defined an accounting domain-specific list of stop-words by adding 184 additional stop-words that are irrelevant to the accounting domain into the existing S M A R T stop-words list, which already contained 5 4 4 stop-words. For instance, some adverbs that have little effect on accounting (such as "approximately" and "eventually") as well as some nouns that are too general to contribute to the accounting domain (such as "background", "conclusion", and "football") were added. Then we obtained a new comprehensive list of 728 stop-words (see Appendix A - Section 5.1.5, List 3), which was more tailored to the accounting domain. Consequently, the program simply used our domain specific stop-words list to scan the tokenized files exported from the last step carrying phrases, as well as many other single terms, to remove those significantly general and irrelevant words. 5.1.6  Producing a Full Token List  After the stop-words had been eliminated from the text, the program extracted the remaining unique tokens from the collection while incrementing the token's frequency each time it encountered the given token. Finally, the program produced 14,437 tokens with information on the number of frequencies as well as on the number of documents appearing in the text collection.  -29-  5.1.7  Removing Unwanted Tokens  The 14,437 tokens were then processed as follows. We sorted them first by the tokens' frequencies in the entire collection, second by the number of documents in which a given token appeared, and finally by alphabetical order. Following similar procedures as those presented by Garnsey (2002), where non-semantic words, adjectives, adverbs, verbs and non-accounting terms were eliminated, we manually checked the first 10,000 tokens to further eliminate the non-accounting related tokens. 
As well, very low frequency words were directly eliminated in both Garnsey (2002) and Chen et al. (1998). Consequently, the 10,001st to 14,437th tokens in our list, owing to their extremely low frequencies in the entire collection (each appeared only once), were treated as unwanted tokens and discarded immediately. The resulting vocabulary then included 2,052 wanted tokens (see Appendix A - Section 5.1.7, List 4). The program scanned the documents and kept only these wanted tokens in the text.

5.1.8 Consolidating Wanted Tokens

As discussed above in the literature review section, both Chen et al. (1995) and Garnsey (2002) omitted the stemming process to avoid creating noise and ungrammatical phrases, because in specific domains stemming can cause loss of information. Similarly, standard stemming was not performed in our research, because different stems sometimes refer to different concepts in the accounting domain. For example, "taxable" is different from "tax," inasmuch as "taxable income" means "the amount of income subject to income taxes" whereas "tax" means "fee charged (levied) by a government." Hence, in this step, we combined some terms into their common forms only when necessary. We manually checked the 2,052 wanted tokens and created a list consolidating 994 tokens (see Appendix A - Section 5.1.8, List 5) by converting every plural token to its singular form (for example, "yields" was converted to "yield") and by combining past and present tenses of a number of tokens into their most representative roots (such as "taxed" and "taxing," both converted to "tax"). On the other hand, we did not merge tokens when we believed that the different forms conveyed different meanings, such as "account," "accountant," and "accounting." Thus, the program processed the files exported from the prior step, which contained only wanted tokens, to further consolidate some tokens according to the list.

5.1.9 Generating the Final Reduced Token List

The program rescanned the documents to obtain new token frequencies and document counts, since the content of the files had changed after consolidation. We eventually obtained a final list of 1,344 unique tokens from the system (see Appendix A - Section 5.1.9, List 6), ordered by the tokens' descending frequencies within the entire collection. The resulting text contained 835 tokenized files.

We then used this text, comprising 1,344 different domain-specific meaning-bearing units, to represent the entire collection, with the tokens' original relative distances maintained. This domain-dependent text was then processed further, as will be explained in the following sections, to extract accounting concepts at different textual levels.

In summary, besides the generic process for automatic indexing, in the preprocessing stages of generating domain-dependent indexing we introduced significant domain knowledge, such as by creating the abbreviation list, the term-phrase controlled list and additional domain-specific stop-words, by manually picking out the wanted tokens, and by consolidating closely related terms instead of applying regular stemming. In the end, we obtained exported tokenized files with 1,344 distinct tokens located in their original positions.

5.2 Computing Term Affinities

In this section, we describe how to compute the fifteen schemes' probabilistic co-occurrences as their affinity values, based on our discussion in the Approach section above.
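Before turning to those steps, a minimal sketch may help make the counting concrete. It illustrates, for the SentenceNoRestriction case of Table 4.1, how the three estimates #<A, B>/#A, #<B, A>/#A and #<A, B>/#B can be derived from per-sentence counts; the data layout and names are our own assumptions rather than the actual program used in this study:

    # Simplified sketch of the sentence-level counting behind Table 4.1
    # (SentenceNoRestriction).  Each sentence is a list of token IDs;
    # a given pattern is counted at most once per sentence.

    from collections import defaultdict

    def sentence_counts(sentences, subject_tokens):
        """sentences: iterable of lists of token IDs; subject_tokens: set of IDs."""
        n_with = defaultdict(int)      # number of sentences carrying each token
        n_after = defaultdict(int)     # number of sentences with B somewhere AFTER A
        for sent in sentences:
            present = set(sent) & subject_tokens
            for tok in present:
                n_with[tok] += 1
            first = {t: min(i for i, x in enumerate(sent) if x == t) for t in present}
            last = {t: max(i for i, x in enumerate(sent) if x == t) for t in present}
            for a in present:
                for b in present:
                    if a != b and last[b] > first[a]:
                        n_after[(a, b)] += 1   # B appears after A at least once
        return n_with, n_after

    def affinities(n_with, n_after, a, b):
        """Return #<A,B>/#A, #<B,A>/#A and #<A,B>/#B for tokens a and b."""
        ab = n_after[(a, b)]
        ba = n_after[(b, a)]
        return ab / n_with[a], ba / n_with[a], ab / n_with[b]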
5.2.1 Removing Unwanted Punctuation

Before scanning the collection to compute affinity values, we needed to further process the 835 tokenized files produced in the last step, which still contained all of their initial punctuation. We wanted, however, to keep only the three punctuation marks that represent the end of a sentence or paragraph: ".", "!" and "?". These three punctuation marks are important indicators for calculating the number of sentences that include each token at the sentence level. Therefore, in this step, the program removed all punctuation except ".", "!" and "?".

5.2.2 Converting 1,344 Tokens to Token IDs

To facilitate further processing, the program assigned each of the final 1,344 tokens a unique token ID, in the same order as in the tokenizing stage (see Appendix B - Section 5.2.2, List 7). We then replaced all tokens in the tokenized files with their matching token IDs. The text now contained only tokens represented by these 1,344 different IDs and was ready for calculating token affinities.

5.2.3 600 Tokens' Affinity Values

5.2.3.1 Generating the 600 Token List for Clustering

We believed that the tokenized files from the last stage, containing 1,344 distinct tokens, were sufficient to represent the content of the original collection. It was therefore sufficient to study any two tokens' affinity within a structural context comprising only these 1,344 tokens. Using a similar approach, Gangolly and Wu (2000) restricted the index terms to 93 out of their previously acquired 983 terms because of the limitations of data visualization with a large index. To ensure visualization of the data, Garnsey (2001) also generated a final matrix of only 118 terms from an initial 676 terms. In order to concentrate our preliminary study on the relations of only those terms of interest, and also for purposes of visualization, from the 1,344 tokens we manually chose only 600 - the most common and representative terms in accounting according to our knowledge - to be the research subject terms investigated in the resulting clusters. When doing so, we were careful to retain words that might not have an accounting meaning in themselves (for example, "call") but which might appear in accounting terms (for example, in "call options"). These 600 token subjects were then used in the rest of this research to compute affinities and to perform clustering analysis.

Although the remaining 744 tokens' affinities were not studied to form clusters, the 600 selected tokens still kept their original locations in the 1,344-token structure - the context including all 1,344 tokens did not change. In other words, in order to trace any two tokens' relative distances in the context of 1,344 tokens, the program still kept the 1,344 token IDs for marking the tokens' positions in the tokenized files. The program then assigned another set of IDs to these 600 final token subjects (see Appendix B - Section 5.2.3.1, List 8), different from the 1,344-token structure IDs. The new set of 600 token IDs was ordered first by the tokens' descending frequencies in the whole collection, second by the descending number of documents the tokens appear in, and last by ascending alphabetical order.
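A minimal sketch of this ID assignment, with an assumed (term, frequency, document count) record layout, is:

    # Tokens are sorted by descending collection frequency, then by descending
    # document count, then alphabetically, and numbered from 1 upward.

    def assign_token_ids(tokens):
        """tokens: list of (term, frequency, n_documents) tuples."""
        ordered = sorted(tokens, key=lambda t: (-t[1], -t[2], t[0]))
        return {term: i + 1 for i, (term, _, _) in enumerate(ordered)}

    # Example using the counts shown in Table 5.2 below:
    sample = [("asset", 15878, 510), ("accounting", 13950, 797),
              ("cost", 10548, 434), ("loss", 6993, 389), ("tax", 6739, 261)]
    print(assign_token_ids(sample))
    # {'asset': 1, 'accounting': 2, 'cost': 3, 'loss': 4, 'tax': 5}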
We can see that since these 600 tokens were chosen from the 1,344 tokens, the tokens' frequencies and the numbers of documents in the 600 token list were the same as those in 1,344 token list, with only the token IDs being different. 5.2.3.2 S u m m a r y o f t h e Steps to O b t a i n the 6 0 0 T o k e n s  Table 5.1 summarizes the progress thus far in our experiment to obtain the resulting 600 tokens. Step  Description  Method  # Tokens in Results  1  Initial Preprocsssing  Computer  N/A  2  Forming Term-phrases: see Section 5.1.4; Removing Stop-words: see Section 5.1.5  Manual (/, 774 controlled-phrases: "accounts payable " -> "accounts_payable "; 728 domainspecific stop-words: "approximately", football") + Computer  14,437 tokens  3  Manual (manually checked the first 10,000 tokens to 2,052 wanted Removing Unwanted tokens Tokens: see Section 5.1.7 further eliminate non-accounting related tokens, discarded thel0,001th to 14,437th tokens that appeared only once) + Computer  4  Notes: No Stemming because different stems refer to 1,344 final Consolidating Wanted tokens Tokens: see Section 5.1.8 different concepts Manual (consolidated 994 tokens by: converting plurals to singulars: "yields" -> "yield"; combining past and present tenses to most representative roots: "taxed" and "taxing" -> "tax. This step kept other different forms with varied meanings: "account", "accountant", and "accounting") + Computer  5  Generating 600 Tokens: see Section 5.2.3.1  600 subject Manual (chose only 600 - the most representing terms tokens in accounting according to our knowledge to be the subject terms to form clusters. This step retained words which had no accounting meaning in themselves: "call", but might appear in accounting terms:" call options ") + Computer  Table 5.1: Summary of the Steps to Obtain the 600 Tokens  -34-  5.2.3.3 The Token IDs of Any Two Terms  To study affinities between any two terms A and B, in this study we always treated the token with the smaller number ID as token A , and the other with a larger ID number as token B. For example, consider two ID numbers "4" and "5." We treated "4" as token A , and "5" as token B. So #<A, B>/#A should be #<4, 5>/#4, #<B, A>/#A should be #<5, 4>/#4, and #<A, B>/#B should be #<4, 5>/#5. Since the program assigned IDs to tokens first by their descending frequencies in the whole collection, second by descending number of documents that the tokens appeared in, and last by ascending alphabetical order, the "#Frequencies" or "#Documents" (if the first column, "#Frequencies," was used constantly) of the smaller ID, token A , would thus be higher than those of the larger ID, token B (see an example in Table 5.2). ID  TOKEN  #FREQUENCIES  #DOCUMENTS  1  asset  15878  510  2  accounting  13950  797  3  cost  10548  434  4  loss  6993  389 261  5  6739 tax Table 5.2: Example o f Comparing T w o Token IDs  5.2.3.4 A n Example of a Dummy File to Illustrate Affinity Calculations  Before we implemented the program to calculate the real affinity values, we created three dummy files using five tokens, ranging from number 1 to number 5, as examples for the three levels: within a sentence, within a paragraph and within a document. Note that the numbers 1 to 5 in the dummy files only represented the token's name and that we randomly located them in the texts; therefore, these numbers (1 through 5) were not assigned in any order. Thus, for example, token l ' s frequency may or may not be greater than the other tokens' frequencies in the dummy files. 
In each dummy file, we manually  -35 -  computed statistical affinity values and then compared each figure with outputs from the program so that we could make sure we got the correct results from the program when it processed the larger volume of real documents. Below is the dummy file for the ParagraphNoRestriction level to illustrate how we calculated the three probabilistic affinity patterns for any two tokens that were our research subjects: <Document 1>: 1 24 5 3 1. 3 1 42! 4 1.2 4. . 5 2 1 1. 3 2 5. ! 234 3. .7 13124 3. 2 5 4 3 4 3 2 4! 3 1 5 <Document 2>: 2 4 1.3 2 3 4 5! 1 32. ? 43 5 1? ! 2 3. 2. <Document 3>: 42 14 1.2 5 3. ?4. 1 2 5! 3? 3 1 ! 4 2! . ? 3 1 .? <Document 4>: 3. ! . 1 4. ?23 1.  .44  We only studied 600 tokens' affinities our of the 1,344 various token context; in this preliminary test, we only selected only three tokens (tokens "1," "3" and "5") as our subjects out of the total five token context. In other words, these three tokens were still in a fixed context comprised of a total of five tokens, so their co-occurrences (see Table 5.3) and statistical affinity values (see Table 5.4) would not change: Two Tokens Whole paragraph Two Tokens Whole paragraph Two Tokens Whole paragraph  l->2  l->3  l->4  l->5  3->4  3->5 2  4  7  3->l  3->2  4  5->l 2  5->2  5->3  5->4  4 Table 5.3: Dummy File - Two Tokens' Co-occurrences in ParagraphNoRestriction  -36-  Token 1 and Token 3:  Probabilistic Affinity Pattern Probabilistic Affinity Value  #<A, B>/#A  #<B, A>/#A  #<A, B>/#B  l->3/#l  3->l/#l  l->3/#3  0.7 (7/10)  0.4 (4/10)  0.7 (7/10)  l->5/#l  5->l/#l  l->5#5  0.4 (4/10)  0.2 (2/10)  0.5714286 (4/7)  3->5/#3  5->3/#3  3->5/#5  Token 1 and Token 5:  Probabilistic Affinity Pattern Probabilistic Affinity Value Token 3 and Token 5:  Probabilistic Affinity Pattern Probabilistic Affinity Value  0.4 0.2857143 0.2 (2/7) (4/10) (2/10) Table 5.4: Dummy File - Two Tokens' Statistical Affinity Values in ParagraphNoRestriction  ParagraphNoRestriction #<A, B>/#A: # of paragraphs having B A F T E R A but across different sentences with no sentence restriction / # of paragraphs carrying A #<B. A>/#A: # of paragraphs having B B E F O R E A but across different sentences with no sentence restriction / # of paragraphs carrying A #<A,  B>/#B:  # of paragraphs having A B E F O R E B but across different sentences with no sentence  restriction / # of paragraphs carrying B  5.2.3.5 The Program's Affinity Calculation of Fifteen Schemes  As illustrated in Table 4.1, because the sentence level analysis required information about the number of sentences including each token, while the ParagraphNoRestriction condition also required the number of paragraphs that each token appeared in, the program counted them and added these numbers with two more columns added to the 600 tokens list. Then the program processed all of the tokenized files, and, for each of the fifteen schemes, the program computed each pair of terms' statistical affinity values for all 600 token subjects.  -37-  5.3  Clustering 600 Tokens  Grefenstette (1993) declared that a domain specific thesaurus presents "a hierarchical view of the important concepts in the domain, as well as suggesting alternative words and phrases that may be used to describe the same concept in the domain." The clustering technique we adopted in this study was the hierarchical clustering produced by Matlab, which links together pairs of terms that are in close proximity. 
The "Tutorial - Cluster Analysis" section of Statistics Toolbox for Use with Matlab User's Guide (Version 3, 1-54) (http://www, busim. ee. boun. edu. tr/-resources/statsJb.pdf) defines and explains the Matlab linkage function, which uses "the distance information to determine the proximity of objects to each other. As objects are paired into binary clusters, the newly formed clusters are grouped into larger clusters until a hierarchical tree is formed." We then used the "linkage function" in Matlab to form our clusters. 5.3.1  Hierarchical Clustering Using Matlab  As noted above, the linkage function of Matlab identifies the distance values, and links pairs of objects that are close together into binary clusters. It then creates larger clusters by linking those newly formed clusters to other clusters until all the objects are linked together in a hierarchical tree (for example, see Appendix B - Section 5.3.1). 5.3.2  Affinity Values Fit into Matlab  Matlab first groups the objects that have the closest proximity to each other. In other words, the first two terms that are linked together by Matlab always have the smallest relative distance from each other in the whole data set. As discussed above, we assume that the higher the statistical affinity value is, the closer the terms' affinities are and the  -38-  smaller the distance between them should be. The most closely related terms (with the highest affinity values) should be grouped first. Hence, in order to fit the affinity values into the Matlab linkage function which links minimum distance first, we computed the distance value of two terms using the formula (1 - Statistical Affinity Value). 5.3.3  Clustering Outputs  The program processed the entire collection and used the Matlab linkage function to cluster the 600 tokens for each of the fifteen schemes. Table 5.5 shows the top ten clustering data outputs, which had been replaced by token names in the scheme of SentenceOtdAB_A (see Appendix C for the top 50 clustering outputs for each of the fifteen schemes): Cluster Index  Object in First Group  Object in Second Group  Distance  601  261 ;stock_dividend  346;stock_split  0.62791  602  392;accounts_payable  429;accrued_expense  0.74286  603  102;short  139;duration  0.8034  604  114;taxable  122;deductible  0.80942  605  41;derivative  46;financial_instrument  0.82476  606  601; stock_dividend;  269;split  0.84884  607  10;future  27;cash_flow  0.85654  608  5 5 3 ;unco 1 lectib le_account  595;unearned_discount  0.85714  609  170;direct_cost  209;direct_financing_lease  0.8599  610  38;receivable  70;payable  0.87065  stock_split  Table 5.5: Top Ten Clustering Data of SentenceOtdABA  As noted above, Matlab grouped together the first set of related terms with the least distance between them to form the first cluster. Because there were a total of 600 terms in the data set, Matlab assigned a cluster index number starting with 601 (600+1) to this new cluster. Note the example shown in Table 5.4: Sentence0tdAB_A first grouped two terms with token IDs of 261 and 346 and used the number 601 to represent this newly formed cluster. We can also see that token 261 and token 346 in cluster 601 had the closet  -39-  proximity (distance value = 0.62791) in the entire 600 token data set of SentenceOtdAB_A., Matlab continuously formed other related terms into new clusters in ascending order of their distance value. Let us examine cluster 606, which grouped clusters 601 and 269. 
As Matlab guide illustrated, here token 269 formed a larger cluster with cluster 601, which had already linked tokens 261 and 346. That is, in this step, these three terms of tokens 261, 346 and 269 were grouped together into cluster 606.  5.4  Evaluation  To justify our methodology and to assess the clustering performances of all fifteen schemes, we invited an accounting expert to evaluate the clustering outputs. The expert used her domain knowledge of the accounting industry and an external online website (http://www, investorwords. com) which listed thousands of definitions of current authoritative accounting terms that the expert could consult to evaluate automatic concepts from all fifteen schemes. In addition, in order to borrow another independent accounting thesaurus to further check the accounting concepts produced by us, we also compared the expert's evaluation results with the Price Waterhouse Thesaurus (1974). 5.4.1  C l u s t e r i n g T e r m s to B e E v a l u a t e d  We know that the distance between objects in the cluster becomes greater as the newly formed cluster number gets larger. That is, in the clustering data tables, only the top clusters in each scheme can group closely related accounting concepts with small distances between them to each other. Regarding the special scheme DocumentNoRestrictionAB_B, only the top 25 clusters needed to be evaluated. Because in this scheme each larger cluster was formed by continuously adding one more term to  -40-  the previous newly-formed clusters, the process was different from the clustering outputs used in the other fourteen schemes. Furthermore, the hierarchical tree grew unmanageably high after only a few steps due to noise, namely distances affected by random occurrences; therefore, we believe the top 25 clusters would be good enough to represent this scheme. For the remaining 14 schemes, only the first top 50 clusters in each scheme needed to be evaluated. In total, there were 725 clusters of fifteen schemes to be evaluated. Let us examine the example of the first top three clusters to be evaluated for the scheme SentenecUpTo5tdAB_A (Table 5.6), where the object number in each cluster had been automatically replaced with term names by the program.  Cluster No.  Terms in the Cluster Separated by ";" Total # of terms Terms in the first group  Terms in the second group  1 (601}  stockdividend  stocksplit (346)  2  2 mi}  accountspayable (392)  accruedexpense (429)  2  3  mm  (26])  stockdividend; stocksplit (601)  split (269)  3  Table 5.6: Sample Evaluation Terms - Top Three Clusters of SentenecUpTo5tdAB_A Note: The number, for example, (601). means the original cluster index; the number beside the term, such as (261) means the original token ID.  5.4.2  Expert's Evaluation  Since this was only a preliminary study, in this stage we invited only one accounting expert to evaluate the clustering outputs. Furthermore, the final results of this research were based not only on the expert's evaluation, but also on the analysis of the clustering data itself and the Price Waterhouse Thesaurus (1974). Therefore, a single expert was adquete to evaluate the outputs of the clusters produced by each of the fifteen schemes in  -41 -  order to see what portion of every 25 or 50 clusters each scheme automatically groups together relevant terms so that meaningful accounting concepts can be derived. If identical clusters appeared in the results of two different schemes, the expert could cut and paste the evaluations. 
B y comparing the statistical evaluation results using Excel worksheets, we can summarize the similarities and differences among the fifteen schemes, and be able to recognize the optimal schemes. 5.4.2.1 I n s t r u c t i o n s f o r E x p e r t ' s E v a l u a t i o n  The expert was provided with detailed instructions (see Appendix D - Section 5.4.2.1) to select the relation types among the terms in each cluster, to provide explanations about her choices, and to score the relevance ranging from 1 to 5 by how closely related the terms in each cluster were. As we introduced earlier in the Literature Review, that thesauri normally contain semantic relationship as Narrow Term (NT), Broad Term (BT), and Related Term (RT). By further classifying more potential relationship types in accounting, we created the following relation type alternatives that the expert could choose from. We also gave her space to explain her choices (Table 5.7):  -42-  11. Explanation  I. Relation Type Alternatives  No Distinct, a new direct Broader Partial j Other term Subgroup concept relation ! relation relation Syn Ant d) e) c) b) 0 g) !  Table 5.7: Sample Evaluation Relation Type Alternatives  With relation types a) "Synonyms" (same meaning); b) "Antonyms" (opposite meaning); c) "Broader term" (any term's meaning broader than all the others); d) "Subgroup" (if the terms are all related, are they both/all subgroups of another broader concept?); e) "Distinct but forming a new concept" (if they are all distinct, can they together form a new concept?); f) "Partial relation" (if there are more than two terms in the cluster, are only some of them related?); g) "Other relation" (if relation type is not listed previously); h) "No direct relation" (none of the terms is directly related).  5.4.2.2 Sample Evaluation Results  The expert then assessed each cluster of every scheme in order of cluster number (from cluster 1 to cluster 25 or to cluster 50) to choose relation type alternatives, provide explanations, and give the relevance score on the terms' relations grouped by that cluster. Table 5.8 shows an example of the expert's evaluation results regarding the top three clusters from the scheme SentenecUpTo5tdAB_A.  -43 -  Terms in the Cluster Separated tt.it  by  Relation  Total #of terms  Cluster No. Terms in the first Terms in the second group group 1  stockdividend  2  stocksplit  Type  II. Explanation  III. Relevanc e Score (1-5)  a)~h) e)  The new concept is "stock". "stock_dividend" is distribution of retained earnings while  5  "stock_split" is increasing the number of shares outstanding  2  accountspayable  3  stockdividend; stocksplit  accruedexpense  split  2  3  d)  f)  Broader concept is "liability" because "accounts_payabl e" belongs to "liability". "accrued_expense " is also "accruedjiability " which is also a type of "liability"  5  "split" is her "stock_split" in the accounting sense and, which means the increasing shares of outstanding, "split" is not related to "stock_dividend" because "stockdividend" is the distribution of retained earnings to shareholders  3  Table 5.8: Sample Expert's Evaluation Results - Top Three Clusters of SentenceUpTo5tdAB_A With relation types a) "Synonyms"; b) "Antonyms"; c) "Broader term"; d) "Subgroup"; e) "Distinct but forming a new concept"; f) "Partial relation"; g) "Other relation"; h) "No direct relation".  
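The next subsection aggregates these per-cluster judgments into per-scheme statistics; a minimal sketch of that aggregation, assuming each evaluated cluster is stored as a set of chosen relation types plus a relevance score, is:

    # Share of evaluated clusters carrying each relation type a)-h),
    # and the mean relevance score for one scheme.  The record layout is assumed.

    from collections import Counter

    def summarize_scheme(evaluations):
        """evaluations: list of (relation_types, score), one entry per cluster,
        e.g. [({'e'}, 5), ({'d'}, 5), ({'f'}, 3)] for the three clusters in Table 5.8."""
        n = len(evaluations)
        counts = Counter()
        for types, _ in evaluations:
            counts.update(types)
        percentages = {t: counts[t] / n for t in "abcdefgh"}
        mean_score = sum(score for _, score in evaluations) / n
        return percentages, mean_score

    pct, mean = summarize_scheme([({'e'}, 5), ({'d'}, 5), ({'f'}, 3)])
    print(pct['d'], round(mean, 2))   # one cluster of three has type d); mean score 4.33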
-44-  5.4.2.3 Statistical Analysis of Expert's Evaluation Results for A l l Fifteen Schemes  After the expert finished her evaluation, for each relation type column in every scheme we totalled the number of " l s " in all 25 or 50 clusters (how many clusters the expert believed had this relation type). The statistical percentage of each relation type was acquired by using the totals divided by the number of clusters (25 or 50) that had been evaluated. For example, in a specific scheme, the expert chose two clusters having relation type "a" in all 50 clusters; thus, the statistical percentage figure was 4% for type "a". Meanwhile, we totalled all of the relevance scores divided by the number of clusters (25 or 50) that had been evaluated to obtain the mean relevance score for each scheme. The expert then evaluated every scheme individually for all fifteen schemes and filled out the answers in the evaluation table. Then we calculated statistics for each of the fifteen schemes (see Appendix D - Section 5.4.2.3). The statistical analysis of the expert's evaluation data for all fifteen schemes is presented in Table 5.9.  -45-  I. Statistical Analysis of the Expert's Evaluations of Relation Types a)~h)  (%)  AH 15 Schemes  a)  b)  O  EvaluatcSentenccOtclAB A  0%  4%  46%  EvaluateSentenceOtdBA  A  0%  :  KvaluateSentenceOtdAB_B  0%  >  EvaluhicSetttencelJpTo5hlBA  4%  4  1  g)  42% '  6%  2% .  42%  2\,  0%  44%  28%  i  -36%  42%  2%  2 % ' ' 34°o  54%'  2%  - EvaluateSentenceUpTo5ldAB A ; „ -.  e)  d)  0%  III. Mean Score (1-5)  h)  iiflm  4.58  2%  4.40 •  0%  :  4.06  8% ,'  0%  2%  2%  2%,  2%  ()•'<•  0%  20%  ' 0% .  -4.46 4.52  I'.valuutcScntenccUp l Y o t d A B _ B  o%  ()%•  4<>%  12%  4%  36%  EvaluateSentenceNoRestriction A B A  0%  10%  36%  44%  2%  6%  EvaluateSentenceNoRestriction  4%  2%  3ir>,  5VX.  •2%  •2%  '2%  EvaiualeSentenccNoRcstrictionAB_B  ()'\,  ()"..  18°..  8".,  6%  -14" o  ()".,  4%  3.94  EvaluateParagraphNoRestrictionAB_A  4%  4%  50%  34%  4%  2%  0%  2%  4.22  EvaluatePdragraphNoRestriction  4%  4%  42%  42%  2%  2%  0%  4%  4.30  EvaluateParagraphNoRestrictionAB_B  0%  0%  40%  14%  0%  46%  0%  0%  3.58  EvaluateDocumentNoRostrictioiiAB A  0%  4%  38%  22%  0%  26" o  0%  10%  3 60  EvaluateDocumentNoKeslriciii'iiB \  \  2%  0°-o  58%  18%  0%  12%  0%  10°n  3.82  l.vdluateDocumentNoRestiictionAB  B  0%  0%  20%  0%  4%  76"-..  0%  0%  3 08  BA A  BA_A  0% '  •I.0J 2%  4.52  Table 5.9: Statistical A n a l y s i s o f the Expert's Evaluation Results - A l l Fifteen Schemes W i t h relation types a) "Synonyms"; b) " A n t o n y m s " ; c) "Broader term"; d) "Subgroup"; e) "Distinct but forming a new concept"; f) "Partial relation"; g) "Other relation"; h) " N o direct relation".  5.4.3  Comparing the Expert's Evaluation Results With Price Waterhouse  Accounting Thesaurus The only existing accounting thesaurus we have seen so far is the one created by Price Waterhouse & Co. in 1974, which Hill, the National Director of Accounting and Auditing Services, called "the first comprehensive Thesaurus of accounting and auditing terminology." This thesaurus includes 3,741 main term entries and uses the unabbreviated  -46-  4.46  form of a term rather than its acronym or abbreviation. 
It disposes of synonymous or closely related terms using the following abbreviations: U = Use  (as cross reference)  UF = Used For  (as cross reference)  BT = Broader Term NT = Narrower Term RT = Related Term  Here is an example of terms classified into the same concept shown in their hierarchical positions: Accounts payable UF Advances payable BT Trade accounts payable RT Assumed liabilities Confirmation Creditors Cutoff tests Freight bill payment services Search for unrecorded liabilities Voucher system  5.4.3.1 Sample Comparison After the expert finished her evaluations, we checked all 725 clusters against the Price Waterhouse Thesaurus to count those similar relations assessed by these two methods. For every scheme, we compared the relations of each cluster or concept evaluated by the expert with the corresponding concept listed in the Price Waterhouse Thesaurus. If there were some matching relations in the Thesaurus, we then marked number of "Is" for those matching concepts in order to calculate the matches for that scheme. Meanwhile, for  -47-  purposes of clarity we also explained how the Price Waterhouse Thesaurus interpreted similar relations of the concept. Table 5.10 shows the comparison example of the top three clusters from the scheme SentenecUpTo5tdAB_A: Comparison with the W a t e r h o u s e Thesaurus  T e r m s i n the C l u s t e r S e p a r a t e d by " ; "  Cluster No.  Total #of terms  Price  Relation Type a)~h)  Put"1" here i f y o u can find a match, otherwise do n o t h i n g  If m a t c h i n g , then identify the relationship d e s c r i b e d in the Price Waterhouse Thesaurus they are Related Terms  T e r m s i n the first group  T e r m s i n the second g r o u p  1  stock_dividend  stocksplit  2  e)  1  2  accountspayable  accruedexpense  2  d)  I--.:  "accrued expenses" Uses "accrued liability'' and "liability" is the Broader Term of " accounts_payable" and "accrued_liability"  3  stockdividend; stocksplit  3  split  0  Table 5.10: Sample Comparison of Expert's Evaluation Results with the Price Waterhouse Thesaurus - Top Three Clusters of SentenecUpTo5tdAB_A With relation types a) "Synonyms"; b) "Antonyms"; c) "Broader term"; d) "Subgroup"; e) "Distinct but forming a new concept"; f) "Partial relation"; g) "Other relation"; h) "No direct relation".  5.4.3.2 Statistical Comparisons of All Fifteen Schemes  For each scheme we compared all relations evaluated by the expert with the Price Waterhouse Thesaurus: again, we obtained the statistical matching percentage values (total number of matching clusters, compared against either 25 or 50 clusters in that scheme). See Table 5.11 for statistical comparison results for all fifteen schemes.  -48-  % Clusters Containing Relations Matching with Relations in the Price Waterhouse Thesaurus  AB B  AB A  BA A  EvaluateSentenceOtd  36%  36%  ':  20%  EvalualeSentenceUpToStd  36%  38%  *  14%  EvaluateSentenceNoRestriction  42%  38%  ;CI6%.».  EvaluateParagraphNoRestriction  20%  40%  14".,  *'!  1 (>•'.. 32% . • 28% EvaluateDocumentNoRestriction Table 5.11: Comparing Fifteen Schemes of the Expert's Evaluation Results with the Price Waterhouse Thesaurus  We are aware that accounting terminology has been changed and updated in the 30 years since the Price Waterhouse Thesaurus was published. 
Although the Thesaurus and our study adopted very different sources of documents to construct the vocabulary collection, from the above data we still can find that the matching between these two methods is visible (between 14% and 42% accuracy). Therefore, we can see that our technique actually identified some valid accounting concepts that can be verified by the Price Waterhouse Thesaurus, which as noted, remains an authoritative accounting thesaurus.  5.5  Discussion  By analyzing the clustering data itself, incorporating the statistical analysis of the expert's evaluation results and comparing them with the Price Waterhouse Thesaurus for all fifteen schemes, some interesting findings as well as the answers to our previously proposed research questions emerge:  -49-  5.5.1  Can Relevant Concepts be Extracted Automatically from a Set of Documents in a Given Domain? ( Q l . l ) What Type of Semantic Relations Can Be Identified in the Extracted Concepts? (Q1.2)  From the statistical analysis of the expert's evaluation results (Table 5.9), we noticed the maximum percentage of clusters conveying no direct related concepts was only 10%, which was at the EvaluateDocumentNoRestriction level. The maximal percentage of unrelated terms in a cluster in any other scheme was only 4% (see Table 5.12). Statistical % of "No Direct Relation" Relation Type "h)", all 15 Schemes  AB A  BA A  AB B  EvaluateSentenceOtd  h) 0%  h) 2% ..  11)2",.  EvalualeSeiitencellpToStd  h) 2%  h)0%  hi 2'!-,,  EvaluateSentenceNoRestriction  h) 2%  h) 0%  h)4%  Ii) 2%  h) 4%  h) ()"..,  E v a 1 u a tePa r aar a p h N o Res t r i ct i o n  .  hi ()".. h) 10% h) 10% EvaluateDocumentNoRestriction Table 5.12: Statistical Results of "No direct relation" Type for A l l Fifteen Schemes Using relation type h) "No direct relation".  Moreover, ten out of the total fifteen schemes had mean relevance scores greater than 4, indicating that two-thirds of the schemes grouped at least "somewhat related" concepts. The remaining five schemes' mean relevance scores were greater than 3 but less than 4 (see Table 5.13). This indicated neither of the remaining third of the schemes grouped "unrelated" clusters (for details regarding how the ranking was defined, see Appendix D Section 5.4.2.1).  -50-  Average Relevance Score in Each Scheme  AB A  BA A  \ B It  EvaluateSentenceOtd  4.58  4.40  *(«)  EvaluateSentencelipToStd  4.46  4.52  •'.AY--  EvaluateSentenceNoRestriction  4.46  4.52 4.30  EvaluateParaeraphNoRestriction  3 58  3.82 3 60 3.08 EvaluateDocumentNoRestriction Table 5.13: Statistical Results of Average Relevance Scores for All Fifteen Schemes  The above analysis proved that all fifteen schemes extracted compound concepts containing related accounting terms, which had already been confirmed by comparing our method with the Price Waterhouse Thesaurus. We hence can declare: our methodology is useful for identifying accounting compound concepts with meaningful relations. As well, we answered Q 1.1.  We further analyzed what were the top two most frequently chosen relation types included in the clusters and extracted these figures in Table 5.14. We can see the largest portion of relation type in every scheme was "Broader term" (one term is broader term than the other), coinciding with the "Narrow Term (NT)" and "Broad Term (BT)" relationships defined by prevailing thesauri discussed in the literature. 
Another of the most frequently chosen relation types was "Subgroup" (these are different from "Broader term," but they are subgroups of other broader concepts), which also coincided with the "Related Terms (RT)" relationship identified in most existing thesauri. Because our hierarchical clustering continually added new terms into the previously formed clusters, only some of the terms were related (relation type "Partial relation") when many terms were included in the cluster. This answered Q1.2.  -51 -  The Top Two Most Frequently Chosen Relation Types in Each Scheme  A B A -../-..  EvaluateSentenceOtd  c) 46%: d) 42%  EvaluateSentenceUpToStd  c) 36%; d) 42%  EvaluateSentenceNoRestriction  c) 36%: d) 44%  EvaluateParagraphNoRestrictinn  BAA  AB B  c)-52%; d) 42%  c) 1 l%:dP.8%  c)34%;d)54%  c)4d%: i") 3(>%  c)'30%; d)58%  O ~8%: l'i44%  c) 50%; d) 34% , , c) 42%; d) 42%  ci 10'... 1") 46' »  ; =  ±  c)38%;f)26% c)58%; d) 18% c) 20%Yt)'76% EvaluateDocumentNoRestriction Table 5.14: Statistical Results of the Top Two Most Frequently Chosen Relation Types in All Fifteen Schemes Relation types c) "Broader term"; d) "Subgroup"; and f) "Partial relation" 5.5.2  W h a t P a r a m e t e r s C a n Affect the Q u a l i t y o f the R e s u l t s ? ( Q 2 )  5.5.2.1 P r o x i m i t y - S e n t e n c e , P a r a g r a p h A n d D o c u m e n t L e v e l s ( Q 2 . 1 )  By examining differences in the statistical expert's evaluation results at three levels within a sentence, within a paragraph, and within a document - we detected that all three schemes belonging within the sentence level extracted the most related concepts, followed by the paragraph level, with the document level being the least effective.  Our results revealed the statistical average relevance scores as assessed by the expert for three schemes EvaluateSentenceOtdAB_A (4.46 ~ 4.68) > EvaluateParagraphNoRestrictionAB_A (4.22) > EvaluateDocumentNoRestrictionAB_A (3.60) (see Table 5.13). Likewise, the same findings were detected from the corresponding schemes of B A _ A and A B _ B . Furthermore, the relevance scores of all three schemes in EvaluateDocumentNoRestriction were all less than 4.00, which also indicated that the document level was the least effective textual level for producing valid concepts.  From the average relevance scores in Table 5.13 and the relation types in Table 5.14, we also found that though different scopes of sentence, paragraph and document level  -52-  extracted varied accounting concepts, the sentence level was a little closer to the paragraph level than to the document level. 5.5.2.2 Distance Within the Sentence Level (Q2.2)  We compared clustering data and counted the percentage of identical clusters in the same scheme in EvaluateSentenceOtd and in EvaluateSentenceUpTo5td with in the scheme EvaluateSentenceNoRestriction respectively (Table 5.15). This reveals that EvaluateSentenceUpTo5td and EvaluateSentenceNoRestriction generated similar results in clustering accounting terms (88% and 92% identical clusters respectively). The low level of identical clusters percentage (50%) was caused by the asymmetry factor that will be discussed later on. On the other hand, the percentages of identical clusters in EvaluateSentenceUpTo5td were all much higher than those in EvaluateSentenceOtd (88% > 42%, 92% > 42%, and 50% > 24%). This indicated that compared with EvaluateSentenceOtd, EvaluateSentenceUpTo5td level was much closer to EvaluateSentenceNoRestriction.  
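A minimal sketch of how such an identical-cluster percentage can be computed - treating each cluster as the set of terms it groups, which is our own reading of the comparison - is:

    def identical_cluster_pct(scheme_x, scheme_y):
        """scheme_x, scheme_y: lists of clusters, each cluster a set of terms."""
        y_clusters = {frozenset(c) for c in scheme_y}
        matches = sum(1 for c in scheme_x if frozenset(c) in y_clusters)
        return matches / len(scheme_x)

    # Toy example: two of the three clusters coincide, giving 67%.
    otd = [{"stock_dividend", "stock_split"},
           {"accounts_payable", "accrued_expense"},
           {"short", "duration"}]
    no_restriction = [{"stock_dividend", "stock_split"},
                      {"accounts_payable", "accrued_expense"},
                      {"receivable", "payable"}]
    print(round(identical_cluster_pct(otd, no_restriction), 2))   # 0.67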
% of Identical Clusters Compared with Clusters in EvaluateSentenceNoRestriction EvaluateSentenceOtd A B A  SentenceNoRestriciton SentenceNoRestriciton SentenceNoRestriciton AB A BA A AB B .  42%  N/A  EvaluateSentenceOtd'BA A  N/A  42%  EvaluateSeiileneeOtdAB B  N/A  N/A  EvaluateSentencel)pToStdAB A  88%  N/A  E v a 1 u a leS en ie n ce U pToStd B A A  N/A  92%  N/A  EvaiuateSentencelipToStdAB B  N/A  N/A  50%  24"-,,  Table 5.15: Identical Clusters Percentages by Comparing the Same Scheme in SentenceOtd and SentenceUpTo5td with SentenceNoRestriction Respectively  However, the differences between clustering data in EvaluateSentenceOtd and those in EvaluateSentenceNoRestriction were not significant either, because the identical  -53 -  percentages were not low. This can be further supported by examining the statistical analysis of the expert's evaluation results between EvaluateSentenceOtd with EvaluateSentenceUpTo5td and with EvaluateSentenceNoRestriction. In Table 5.16 we can see that the selected relation types and mean relevance scores marked by the expert conveyed some variances, but these were not significant. Therefore, it is not necessary to further modify the relative location of the terms in relation to each other within the same sentence level to achieve better concepts. I. Statistical Analysis of the Expert's Evaluations of Relation Types a)~h)  III.  Score (1-5)  (%)  All 15 Schemes EvaluateSentenceOtd A B A  a)  b)  0%  4%  EvaluateSentenceNoRestriction A B A  0%  Ev(iluateSi'.ntenceOtdBA_A EvaluatcSenknceUpToStdBA  d)  «)  0  g)  h)  .42%  6%  2%  0%  0%  4.58  - 36%  42%  2%  • 8%  0%  ?%  4.46  -10%  ' 36% •  44%  .2%  6%,  . .0%  2%  4.46.:  2%  52%  8%  FvaliiatcScnlcnceUpToStdAB A  A  EvaluateSentenceNoRestriction BAA  •  1  EvaluateSenienceNoRestrictionAB _M  46% •  42% •  0% -  3-1%  1 V.ll.i.'lcS. IICIILCOIU \\\ U EvaluateSentenceUpro5tdAB_B  0  - •  •  4%.  2%  30%~  5S%  (  (  44%  28% .  (  0%  46%  12%)  0%  0%  38%  8%  2%  0%  '  .:  2%  0%  2%  0%  4.52  2%  4.06  18  0%  •  36%  0%  6"..  42%  (1%  4%  Table 5.16: Comparing the Statistical Analysis of the Expert's Evaluation Results of the Same Scheme in EaluateSentenceOtd with EaluateSentenceUpTo5td, and with EvaluateSentenceNoRestriction With relation types a) "Synonyms"; b) "Antonyms"; c) "Broader term"; d) "Subgroup"; e) "Distinct but forming a new concept"; f) "Partial relation"; g) "Other relation"; h) "No direct relation".  Similarly, when we contrasted the statistical analysis of the expert's evaluation results in EvaluateSentenceUpTo5td with those in EvaluateSentenceNoRestriction, we found very  -54-  t:40  similar outcomes for both the relation types and the relevance scores marked by the expert (see Table 5.17). This agreed with the findings by prior studies that performances of restricting windows to at most five words lead to almost identical results as examinations of entire sentences. I. Statistical Analysis of the Expert's Evaluations of  HI. Score (1-5)  Relation Types a)~h) (%)  AH 15 Schemes  b)  c)  d)  e)  0  R)  h)  EvaluatoSentenceUpTo5ldAB A  2%  . 8%  36%  42% .  2% •  • 8%  0%  2%  EvaluateSentenceNoRestrictionABA  ()"..  It)".,  .36%  44% •  2%  6%  0%  2%  2%  - 34%  54%  .  '  .2",,  30%  58%  :%  • 0%  46%  12%  4%  <)%  38%  8%  6%  LvafuateSenrenceL'pTo5(JBA EvuluuU'Si'nlcn<.i'\iiFit'\lrKiiun  a)  A•• B 1 1  4%  EvaluateScntenccUpToMdAB B  (  EvaluateSentenceNoRestrictionAB_B  (»"o  1  '  4.46 '  '  4.52  2"..  
0%  36%* •  0% -  2%  42%  0%  4%  Table 5.17: Comparing the Statistical Analysis of the Expert's Evaluation Results of the Same Scheme in EaluateSentenceUpTo5td with EvaluateSentenceNoRestriction With relation types a) "Synonyms"; b) "Antonyms"; c) "Broader term"; d) "Subgroup"; e) "Distinct but forming a new concept"; f) "Partial relation"; g) "Other relation"; h) "No direct relation".  5.5.2.3 Directionality A n d Asymmetry (Q2.3) Directionality - When we only compared the statistical expert's evaluation results for the same order ( A B _ A or B A _ A ) in each scheme, differences in the relation types and the relevance scores evaluated by the expert were small (see Table 5.18 below, as well as Table 5.13 above). We thus learned that given the same term, the directionality of two terms' co-occurrences has no significant impact on the performance of grouping-related accounting concepts.  -55-  4.46  if 4.02  I. Statistical Analysis of the Expert's Evaluations of Relation Types a)~h)  (%)  Compare Directionality  a)  c)  d)  e)  0  EvaluateSentenceOtd A B_ A  0%  4%  •w>%  42%  6%  2%  EvaluateSeiuencdtlldBAA  0%  2%  J2%  42%  2%  0"y„  III. Score (1-5)  h) 6%  0%  4.58 ~  2%  4.40  t  L.valuaicScntcnccl!pTo5ld.-\B A  2%  8%  36%  42%  2%  8%  EvaluateSentenceUpToStdBA A  4%  2%  34%  5-/%  2%  2%  M%  o%  I0"o  36%  44%  '2%", :  6%  ; 2%  4.46  EvaluateSentenceNoRestriction BA_A  4%  2%  30%  .55%  2% .  2%>  2%  0%,  4.52  EvaluateParagraphNoRestrictionAB A  4%  4%  48%  34%  4%  2%  0%  2%  4.22  EvaluateParagraphNoRestriction BA_A  4%  42%  42%  2%  2%  0%  4%>  4.30  '•38%  22°,.  0%  26%,  0%  10%,  3 60  58%  18%  0% '  12%.  ()" n  10%  3 82  EvaluateSentenceNoRestrictionAB  A  rvdluiilcDuuimcnlNoKcsti ictionAB  v  \  LwludteDocumentNoRosti icuonBA A  ....  llBllllfBill o% .4% 2'!'o  >i - • > i  0% |  0%,  4.46  , ' of the Table 5.18: Directionality - Comparing the Statistical Analysis of ••»' the Expert's Evaluation m Results 1  Scheme A B _ A at Five Levels And the Scheme B A _ A at Five Levels With relation types a) "Synonyms"; b) "Antonyms"; c) "Broader term"; d) "Subgroup"; e) "Distinct but forming a new concept"; f) "Partial relation"; g) "Other relation"; h) "No direct relation".  Asymmetry - Table 5.19 demonstrated, however, that different schemes of using either of any two given terms to predict their affinity actually extracted very dissimilar concepts (see relation types chosen by the expert) accordingly. Besides, the statistical relevance scores of relations grouped by the schemes of given term A (AB_A) was always much higher than given term B ( A B B ) (see Table 5.13 above). As noted above, the tokens were arranged (by ID) according to decreasing frequency in the collection. Hence, the concepts from the scheme using a term with a higher frequency (term A) as the given term always  -56-  involved better performance than the scheme using a term with a lower frequency (term B) throughout the entire collection. It can be further interpreted that the more frequently a term appears in a text, the more possible that diversified relevant terms will co-occur in various proximate contexts. In other words, the more common term seems to be a better "generator" of composite terms. Therefore, using the more frequent term can extract more meaningful and accurate concepts. 1. Statistical Analysis of the Expert's Evaluations of Relation Types  III. 
Score  (1-5)  a)~h)  (%) Compare Asymmetry  a)  b)  \  0°.,  4%  EvaluateSentenceOtdABB  0%  KvaluateSentenceUpTo5tdAB A  2%  EvaluateSentenceUpTo5tdAB_B  0%  1 v.ilualcSi-nlciKvOui \ B  c)  d)  e)  0  g)  h)  16"..  42",,  V„  2%  0%  0%  4.58  44" o  28%  6%  18%  0%  2%  4.06  8%  36% •  42%  2%  8%~  0%  2%  4.46  0°..  46%  12",,  4%  '36%  0%  2% '  4 02  •  -  EvaluateSentenceNoRestriction A B _ A  0%  10%  . 36%  44%  2%  6%  0%  2%  4.46  EvaluateSentenceNoRestrictionAB B  ()"o  0",.  . 38%  8",,  6%  42%  0%  4%  3.94  Eval uate ParagraphN oRestrict i on A B _ A  4%  4%  48%  34%  4%  2%  0%,  2%  4.22  EvaluateParagraphNoRestrictionAB B  0%  0%  40%  14%  0%  46%  0%  0%  3.58  ,,0%  10",.  5 fm  0%  3.08  • lA<ilii.<.:^l)tKuiiicni\()KcsiiicliiniAB  ()"'(.  \  EvalualcDocumenlNoRestriclionAB B ,=  -  :  4%  58"-,  0% | 0% I  I  • 20% I  L  22",,  0%  0%  •4%  ' j  ,:26% _J  76%  I  0%  1  Table 5.19: Asymmetry - Comparing the Statistical Analysis of the Expert's Evaluation Results of the Scheme A B A at Five Levels And the Scheme A B B at Five Levels With relation types a) "Synonyms"; b) "Antonyms"; c) "Broader term"; d) "Subgroup"; e) "Distinct but forming a new concept"; t) "Partial relation"; g) "Other relation"; h) "No direct relation".  This further substantiated that A B _ B was less reliable than A B _ A , explaining why in Section 5.5.2.2 we detected only 52% (26 out of 50 clusters) in SentenceUpTo5tdAB_B  -57-  that were identical with those existing in the top 50 clusters of SentenceNoRestrictionAB_B, although we found there should have been no obvious differences in the extracted concepts within a restricting distance of up to five words (SentenceUpTo5td) and the entire sentence (SentenceNoRestriction). Furthermore, for the scheme of DocumentNoRestrictionAB_B, we noticed that the clustering outputs made no sense at all because in each step the clustering mechanism in this scheme just kept adding one more term to the previously newly formed clusters. The expert also gave the lowest relevance score for this scheme compared with the other 14 schemes. These findings also proved that it is not a good practice to use the low frequency term (AB_B) to group concepts. In addition, because the document level was the least effective level when compared with the sentence and the paragraph level, therefore DocumentNoRestrictionAB_B was the most poorly performing scheme of the all fifteen schemes.  In summary, the schemes based on using high frequency words (AB/A or B A / A ) with up to five words between them, within a single sentence, produced the best results. However, using the low frequency words (AB/B) and instances of words co-occurring within a document resulted in very poor results.  6  AUTOMATICALLY IDENTIFYING PHRASES (Q3)  As noted above, in our main research, our vocabulary consisted of many term-phrases connecting single words into phrases that we believed could convey more accurate  -58-  meanings in the accounting domain. Despite techniques for including many term-phrases suggested and adopted by many researchers, we were curious about whether or not our own techniques could automatically form phrases. We thus conducted an additional smallscale project at only the SentenceOtd level to investigate possible ways of linking adjacent single words into meaningful accounting phrases in the resulting concepts. 
In order to obtain output clusters that grouped single words in the end, we still used the same program at every stage, although the content of the input files to the program may have been different due to modifications in some steps as below.  6.1 Single-Word Tokenizing  Initially, we used the same 835 documents from the FARS database in the domain dependent prepossessing stage from our previous main research activities. These imported files had already been processed through the following steps in our main research: file extraction, filename mapping, removing unusual characters, reformatting document structure, removing dots, changing all the words in the text into lowercase, and separating punctuation. We then modified the rest of the tokenizing steps used in the main research and tailored them to the task of extracting individual tokens with no phrases included. The changes are explained in the following sections. 6.1.1  Broken Phrases List  For the purpose of conducting a rudimentary study to test if an automatic way to link single words into phrases exists, we focused our experimental work on the single words that were composed of the 1,774 phrases we had manually created in our main research (see Experiment - Section 5.14 above). We manually separated every term included in  -59-  each of these 1,774 phrases into an equivalent number of single words. For instance, the phrase " A C C E L E R A T E D COST R E C O V E R Y S Y S T E M S " was broken into four single words: " A C C E L E R A T E D , " "COST," " R E C O V E R Y , " and " S Y S T E M S . " Similarly, the phrase "OFF B A L A N C E SHEET RIGHTS A N D OBLIGATIONS" was broken into six single words: "OFF", " B A L A N C E " , "SHEET", "RIGHTS", " A N D " , and "OBLIGATIONS." Consequently, there were a total of 1,073 resulting single words, which were retained at this step and sorted alphabetically (see Appendix E l - List 1). 6.1.2  Wanted Single Words  To approximate the clustering outputs consisting of the final 600 tokens for further analysis, we added the words that were among those 600 tokens but were not among the 1,073 broken single words. We thus obtained a wanted single words list consisting of 1,188 single words (see Appendix E l - List 2). Similarly to how we conducted the main research, all the words in this list were then converted into lowercase by the program. The program then scanned the collection and kept only these wanted tokens in the text. 6.1.3  Converting Plural Wanted Single Words into Singulars  The 1,188 single wanted words comprised words in plural or singular formats, and we created a list that converted 283 plural single words into their singulars (see Appendix E l - List 3). Again, the program processed the text and transformed these plural tokens into singulars accordingly. 6.1.4  905 Final Single Tokens  After the above processing steps, the program eventually extracted 905 distinct single tokens from the text while increasing its count of each token's frequency each time the same token was encountered. We then acquired a final list of 905 single tokens (see  -60-  Appendix E l - List 4 ) , which was sorted by descending order token frequency in the text collection. In the end, the tokenized files were represented by these 905 different tokens with their original structures maintained.  
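A minimal sketch of this single-word preparation (splitting the controlled phrases into unique words, as in Section 6.1.1, and applying a plural-to-singular mapping, as in Section 6.1.3) is given below; the mapping shown is a toy stand-in for the manual 283-entry list:

    def break_phrases(controlled_phrases):
        """Split every controlled phrase into its constituent single words."""
        words = {w.lower() for p in controlled_phrases for w in p.split()}
        return sorted(words)

    # Toy subset of the plural-to-singular list (the real list was built manually).
    singular_map = {"systems": "system", "obligations": "obligation", "rights": "right"}

    def singularize(words, mapping):
        return sorted({mapping.get(w, w) for w in words})

    broken = break_phrases(["ACCELERATED COST RECOVERY SYSTEMS",
                            "OFF BALANCE SHEET RIGHTS AND OBLIGATIONS"])
    print(broken)
    # ['accelerated', 'and', 'balance', 'cost', 'obligations', 'off',
    #  'recovery', 'rights', 'sheet', 'systems']
    print(singularize(broken, singular_map))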
6.2 905 Single Tokens' Affinity Calculations

Since our study was only a preliminary exploration, we investigated the simplest cases, where tokens were located next to each other, giving the highest chance of linking them into phrases. In our research, that was the SentenceOtd level, in all four cases.

6.2.1 One Level - SentenceOtd

SentenceOtd: as long as any two tokens in the order of A->B / B->A were next to each other, with a token distance of zero (Otd), in the same sentence, regardless of how many of these situations occurred in that same sentence, we counted only one per sentence.

6.2.2 Four Cases - Estimating Affinities

In our main research, the three affinity probability estimations we explored, #<A, B>/#A, #<B, A>/#A, and #<A, B>/#B, were sufficient to address our research questions regarding differences in distance, order and asymmetry between two terms. In automatic phrase generation, our focus was to explore the feasibility of using all four cases to automatically form phrases at the SentenceOtd level. Therefore we added the fourth case, #<B, A>/#B. In all four cases we defined the statistical affinities the same way as in the main research. The program then processed the text to calculate the affinities as follows:

•  #<A, B>/#A: # of sentences having B AFTER A with Otd / # of sentences having A
•  #<B, A>/#A: # of sentences having B BEFORE A with Otd / # of sentences having A
•  #<A, B>/#B: # of sentences having A BEFORE B with Otd / # of sentences having B
•  #<B, A>/#B: # of sentences having A AFTER B with Otd / # of sentences having B

The program also counted the number of sentences that each token appeared in and then processed all of the tokenized files to compute any two tokens' probabilistic affinity values for all 905 single token subjects.

6.3 Clustering 905 Single Tokens

In this small project, we used the same Matlab hierarchical clustering techniques as in the main research. The program transformed all the affinity values using the formula (1 - Statistical Affinity Value) to obtain the distance values between any two tokens. These distance values were then entered into the Matlab linkage function to form hierarchical trees for each of the four cases. Table 6.1 shows the top ten clustering data outputs for the case SentenceOtdAB_A (see Appendix E2 for the top 50 clustering outputs for the four cases in the SentenceOtd Single Token Cluster Data):

Cluster Index   Object in First Group     Object in Second Group    Distance
906             635; representational     638; faithfulness         0.056818
907             827; safe                 834; harbor               0.16667
908             356; health               362; care                 0.19312
909             428; joint                440; venture              0.27984
910             123; balance              181; sheet                0.40346
911             377; pro                  453; forma                0.4244
912             414; conceptual           431; framework            0.44304
913             105; foreign              107; currency             0.46311
914             660; growing              686; timber               0.47887
915             33; cash                  73; flow                  0.49396

Table 6.1: Top Ten Automatic Phrase Clustering Data in SentenceOtdAB_A

6.4 Evaluation - 905 Single Token Clustering

To maintain consistency with our main research, in this small project we again evaluated only the top 50 clusters of each case. Because the clustered terms themselves revealed valuable information, we further evaluated the resulting clusters.

6.4.1 A Sample Evaluation

We created evaluation tables by replacing the token numbers with term names.
6.4 Evaluation - 905 Single Token Clustering

To maintain consistency with our main research, in our small project we again evaluated only the top 50 clusters of each case. Because the clustered terms themselves revealed valuable information, we further evaluated the resulting clusters.

6.4.1 A Sample Evaluation

We created evaluation tables by replacing the token numbers with term names. Then we assessed whether the terms in each cluster actually formed valid phrases. Table 6.2 shows, as an example, the top five clusters in SentenceOtdAB_A:

Cluster No. | Terms in the first group | Terms in the second group | Total # of terms | Percentage (%) of Matching Original Manual Phrases | Could Not Match Original, But Could Form a New Accounting Phrase ("1") | Ordering Indication ("1" for same order) | Ordering Indication ("1" for opposite order) | Not an Accounting Phrase ("1") | Explanation
1 | representational | faithfulness | 2 | 100 | | 1 | | |
2 | safe | harbor | 2 | 67 | | 1 | | | safe_harbor_leases
3 | health | care | 2 | 67 | | 1 | | | health_care_providers
4 | joint | venture | 2 | 100 | | 1 | | |
5 | balance | sheet | 2 | 100 | | 1 | | |

Table 6.2: Sample Evaluation of 905 Single Token Clusters - Top Five Clusters in SentenceOtdAB_A

As in the main research evaluations, the first column listed the cluster numbers, which ordered the clusters in ascending order of the distance between their terms. The terms were again separated into two groups, and the total number of terms was also listed. In the "1 - Automatic Phrase Evaluation" category, we included the following possible alternatives:

• "Percentage (%) of Matching Original Manual Phrases": we combined the terms of the two groups in the order of the first group and then the second group. For example, in cluster #3 we got "health care" (two words). We then compared these terms, in that order, with the original 1,774 phrase control list in our main research (see Experiment - Section 5.14 above) to see whether we could find matching phrases there. In the 1,774 phrase list, we found a similar phrase, "health care providers" (three words). Only two words matched the original three words of the similar phrase, so the matching percentage was 67% (2/3) and we entered the number "67" in this percentage column. We also needed to indicate the complete original phrase in the column "2 - Explanation," in this case "health_care_providers" ("_" was used by the program to link the phrases together as one phrase). Note that because the program had already transformed all plural forms into singulars before producing the cluster terms, when we compared the terms with the original phrase list we could ignore whether each word was plural or singular. Another example is cluster #5: we combined the words into "balance sheet" and found the identical phrase in the 1,774 phrase list, "balance sheet." Therefore the matching percentage was 100% (2/2), so we put the number "100" in this column, and there was no need to indicate the original phrase in the Explanation column. Here we counted the terms combined in each cluster in any order, and we calculated what percentage of the original phrase's terms they could match. For example, in SentenceOtdBA_A cluster #17, the first group had the term "entry" and the second group had the term "journal." When we combined them in that order we obtained the new phrase "entry journal," which exactly (100%) matched our original phrase "journal entry" (by switching the order). (An illustrative sketch of this matching computation follows the list of evaluation categories below.)
• "Could Not Match Original, But Can Form a New Accounting Phrase (put "1" here)": if we could not find a match in our original 1,774 phrase list (the matching percentage thus being 0%), we then checked whether combining the terms in the newly formed cluster could generate another meaningful accounting phrase that we could think of but that was not included in our 1,774 phrases. This analysis was based on our common knowledge, without reference to other resources. If such a phrase was generated, we put "1" in this column, to be added together in the end. Meanwhile, if the order was different, we indicated the phrase with the right order. For example, in SentenceOtdAB_A, cluster #42 included the two terms "minority" and "shareholder." Though we could not find a similar phrase in the list of 1,774 phrases, the two terms together could compose another valid accounting phrase, "minority shareholder," and we therefore entered "1" in this column. If the newly formed accounting phrase was not in the right order but could still make sense when we switched the order of the terms, we also entered "1" in this column, but we needed to write down the correct phrase in the Explanation column. As an example, in SentenceOtdBA_A cluster #34 included the two terms "date" and "effective," which we believed could compose a new accounting phrase, but not in the right order. However, when we switched their order we obtained a new meaningful accounting phrase, "effective date," and so we entered "1" in this column and recorded the right phrase, "effective_date," in the Explanation column.

• "Ordering Indication (put "1" here for same order)": if connecting the terms of the first group and then the second group of a cluster produced the same order as in the original 1,774 phrases, or the same order as another accounting phrase we could think of, we put "1" here. For example, in SentenceOtdAB_A cluster #3 we got "health care," which was in the same order as the original phrase "health care providers" ("health" was in the first group, "care" in the second group), though not as complete as the original phrase. We therefore entered "1" in this column. As a further example, in SentenceOtdAB_A cluster #42 included the two terms "minority" and "shareholder," which we believed could compose a new accounting phrase in the same order.

• "Ordering Indication (put "1" here for opposite order)": if connecting the terms of the first group and then the second group of a cluster produced the opposite order to the original phrase, or the opposite order to another accounting phrase we could think of, we put "1" here. In SentenceOtdBA_A cluster #17, the first group included the term "entry" and the second group included the term "journal." When we connected them in that order, we obtained "entry journal" ("entry" was in the first group, "journal" in the second group), which was the opposite order to the original phrase "journal entry," though the matching percentage was 100% for this cluster. We therefore entered "1" in this column. To note yet another example, in SentenceOtdBA_A cluster #34 included the two terms "date" and "effective," which we believed could compose a new accounting phrase but in the opposite order, because the new meaningful accounting phrase should be "effective date."

• "Not an Accounting Phrase (put "1" here)": if the term matching percentage was 0% and no combination of the terms in the cluster could make up another meaningful accounting phrase either, we entered "1" in this column. This also meant that the terms in this cluster failed to form a phrase.
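The matching-percentage column described above can also be expressed as a small computation. The following is a rough Python sketch of one way to compute it, assuming a candidate phrase formed by joining a cluster's two groups and a control list of underscore-linked phrases; it reflects our reading of the procedure (word order is scored separately in the ordering columns) rather than the exact program used, and the control list shown is hypothetical.

def matching_percentage(candidate_words, phrase_list):
    # Best percentage of a control-list phrase's words covered by the candidate terms,
    # ignoring word order (order is recorded in the separate ordering columns).
    candidate = set(w.lower() for w in candidate_words)
    best = 0.0
    for phrase in phrase_list:
        words = phrase.lower().split("_")
        best = max(best, len(candidate & set(words)) / len(words))
    return round(100 * best)

# Hypothetical control list mirroring the examples discussed above.
control = ["health_care_providers", "balance_sheet", "journal_entry"]
print(matching_percentage(["health", "care"], control))     # 67
print(matching_percentage(["balance", "sheet"], control))   # 100
print(matching_percentage(["entry", "journal"], control))   # 100 (opposite order noted separately)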
6.4.2 Statistical Evaluation Results

Based on the sample evaluation input detailed above, we obtained statistical evaluation results for each of the four cases (see Appendix E3 - SentenceOtd Evaluate Automatic Phrases). Table 6.3 summarizes the statistical results for all four cases.

Case | % Matching Original Manual Phrases | % of 100% Matching Phrases | Not Matching Original, But Can Form a New Accounting Phrase (%) | Ordering Indication (% same order) | Ordering Indication (% opposite order) | Not an Accounting Phrase (%)
SentenceOtdAB_A | 61.8 | 32 | 6 | 92 | 0 | 8
SentenceOtdBA_A | 67.4 | 46 | 6 | 0 | 88 | 12
SentenceOtdAB_B | 32.8 | 12 | 2 | 56 | 0 | 44
SentenceOtdBA_B | 59 | 38 | 4 | 0 | 86 | 14

Table 6.3: Statistical Evaluation of Automatic Phrases for Four Cases at the SentenceOtd Level

The additional column "Percentage of 100% Matching Phrases" was constructed by adding the percentage of clusters that completely matched the original 1,774 phrases (100% matching regardless of the order) to the percentage of generated terms that completely matched other accounting phrases (where the terms had no similar phrase in the 1,774 original phrase list). We then calculated this percentage of matches for each case.

6.5 Automatic Phrase Discussion - 905 Single Token Clustering

When we studied Table 6.3, we derived the following interesting findings:

6.5.1 Our Techniques Can Automatically Identify Some Accounting Phrases

The statistics showed that the terms grouped by our automatically generated clusters can make up many meaningful phrases, because the percentage of matching original phrases plus the percentage of newly formed accounting phrases was not low in these four cases (three out of four cases were above 50%). In addition, only a small number of clusters failed to group accounting phrases, in that the percentage of "Not an Accounting Phrase" was low (less than 50% in all cases, and even less than 20% in three out of four). We could still tell, however, that among the top 50 clusters all the numbers in the "Percentage of 100% Matching Phrases" category were less than 50%, which indicated that our techniques automatically grouped many terms into incomplete phrases, though these incomplete phrases could still make some sense. This can be easily understood: in this rudimentary study the text carried mostly single tokens that had been broken down from the original phrases, and we only investigated cases where two tokens were immediately next to each other (in other words, there was no noise between them), so it was very likely and reasonable that the resulting clusters could identify potentially useful accounting phrases.

Although our proposed techniques were valid both for automatically identifying accounting compound concepts and for automatically grouping accounting phrases, these two effects arise in different ways. As many prior studies on deriving accounting concepts automatically have suggested and verified, we also highly recommend the inclusion of term-phrases in the research vocabulary when extracting automatic accounting concepts.
Each extracted accounting concept consists not only of individual related terms but also of more common term-phrases that belong to the same topic. In contrast, the automatic phrase techniques have to combine the terms in each cluster to form term-phrases, and each term-phrase only represents a specific term, which is not a concept.

6.5.2 Directionality and Asymmetry

• Directionality - When we compared "Automatic Phrases SentenceOtdAB_A" with "Automatic Phrases SentenceOtdBA_A" (see Table 6.3), there were no clear indications that the order in which a term appears affects the quality of cluster generation, in that most evaluation statistics were not very different between these two cases. However, the order of the phrases formed by the resulting clustered terms varied significantly according to the order of the two tokens A and B that was used to derive the clusters: SentenceOtdAB_A grouped all of its accounting phrases in the same order, whereas SentenceOtdBA_A grouped all of its accounting phrases in the opposite order. This can be easily explained (see also the illustrative sketch at the end of this section):

• Direction AB - we computed the two-term affinities in the order of token A appearing before token B (order AB) in the original text, which also meant that for this order the first token A's ID was always smaller than the second token B's ID. For example, for token A, "financial," with token ID #14, and token B, "reporting," with token ID #54, we computed the statistical affinity values for these two tokens as they appeared in the order "financial reporting" in the original text. Later, we derived clusters for this case. When Matlab linked two objects into a cluster, it always placed the smaller object number (in our project, the smaller ID - token A) into the first group and the larger object number (in our project, the larger ID - token B) into the second group. Therefore the potential automatic phrase for SentenceOtdAB_A could be obtained in the same, correct order AB simply by directly linking the higher frequency term (smaller ID - token A), located in the first group, followed by the lower frequency term (larger ID - token B), located in the second group of each resulting cluster. So in "Automatic Phrases SentenceOtdAB_A" cluster #48, the automatic phrase "financial reporting" could be obtained in the same, right order by directly linking the first group term "financial" (ID #14, frequency 14,727) to the second group term "reporting" (ID #54, frequency 5,265). This is why "Automatic Phrases SentenceOtdAB_A" grouped all same order accounting phrases in Table 6.3.

• Direction BA - we computed the two-term affinities in the order of token B appearing before token A (order BA) in the original text, which also meant that for this order the first token B's ID was always larger than the second token A's ID. As an example, for token B, "financial," with token ID #14, and token A, "statement," with token ID #7, we computed the statistical affinity values for these two tokens as they appeared in the order "financial statement" in the original text. Later, we derived clusters for this case.
When Matlab linked two objects into a cluster, it always placed the smaller object number (in our project, the smaller ID - token A) into the first group and the larger object number (in our project, the larger ID - token B) into the second group. The potential automatic phrase for SentenceOtdBA_A could thus be obtained in the same, right order BA by directly linking the lower frequency term (larger ID - token B), located in the second group, followed by the higher frequency term (smaller ID - token A), located in the first group of each resulting cluster. Hence, in "Automatic Phrases SentenceOtdBA_A" cluster #25, the automatic phrase "financial statement" could be obtained in the same, right order by directly linking the second group term "financial" (ID #14, frequency 14,727) to the first group term "statement" (ID #7, frequency 30,033). This explains why "Automatic Phrases SentenceOtdBA_A" grouped all opposite order accounting phrases in Table 6.3. Exactly the same directionality patterns occurred for "Automatic Phrase in SentenceOtdAB_B" and "Automatic Phrase in SentenceOtdBA_B".

• Asymmetry - We found that "Automatic Phrase in SentenceOtdAB_A" grouped more meaningful accounting phrases than "Automatic Phrase in SentenceOtdAB_B," because the numbers in "Percentage of Matching Original Manual Phrases," "Percentage of 100% Matching Phrases" and "Not Matching Original, But Can Form a New Accounting Phrase" were all greater in SentenceOtdAB_A. Furthermore, the number in "Not an Accounting Phrase" was smaller in SentenceOtdAB_A than in SentenceOtdAB_B (see Table 6.4). Exactly the same patterns occurred for "Automatic Phrase in SentenceOtdBA_A" and "Automatic Phrase in SentenceOtdBA_B."

Case | % Matching Original Manual Phrases | % of 100% Matching Phrases | Not Matching Original, But Can Form a New Accounting Phrase (%) | Ordering Indication (% same order) | Ordering Indication (% opposite order) | Not an Accounting Phrase (%)
SentenceOtdAB_A | 61.8 | 32 | 6 | 92 | 0 | 8
SentenceOtdAB_B | 32.8 | 12 | 2 | 56 | 0 | 44
SentenceOtdBA_A | 67.4 | 46 | 6 | 0 | 88 | 12
SentenceOtdBA_B | 59 | 38 | 4 | 0 | 86 | 14

Table 6.4 (rearranged Table 6.3): Asymmetry - Comparing the Evaluation of Automatic Phrases in SentenceOtdAB_A with SentenceOtdAB_B, and in SentenceOtdBA_A with SentenceOtdBA_B

The above contrasts revealed that using term A, which had the higher frequency, as the given term always yielded better performance in automatically grouping phrases than using term B, which had the lower frequency in the entire collection. This also agreed with our main research findings.
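The directionality rules just described can be captured in a few lines of code. The following is a small, hypothetical Python sketch (not part of the thesis's programs) of how a phrase is read off a cluster that joins two original tokens, given that smaller object numbers correspond to higher-frequency tokens and are placed in the first group.

def phrase_from_cluster(first_obj, second_obj, id_to_token, direction):
    # first_obj carries the smaller ID (higher-frequency token A); for clusters joining two
    # original tokens, direction "AB" reads first-then-second and "BA" reads second-then-first.
    a, b = id_to_token[first_obj], id_to_token[second_obj]
    return a + " " + b if direction == "AB" else b + " " + a

id_to_token = {7: "statement", 14: "financial", 54: "reporting"}   # IDs taken from the examples above
print(phrase_from_cluster(14, 54, id_to_token, "AB"))   # "financial reporting" (SentenceOtdAB_A case)
print(phrase_from_cluster(7, 14, id_to_token, "BA"))    # "financial statement" (SentenceOtdBA_A case)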
7 CONCLUSIONS AND FUTURE RESEARCH

7.1 Conclusions

Our research was an exploratory study comparing the use of different textual units in automatically identifying compound concepts and automatically forming phrases in a given context. We used the accounting context as a case study. Building on previous research, we developed our own domain-specific preprocessing, automatic indexing and statistical affinity-computing methods. The extracted hierarchical clustering outputs from fifteen different schemes were analyzed and evaluated by an accounting expert and also by reference to the Price Waterhouse Thesaurus. In our main experiment, we compared results across textual units of differing sizes using affinity measures.

The outcomes were encouraging and answered our research questions. Our proposed approach was capable of automatically identifying potential accounting compound concepts (at least 90% of the clusters in most schemes grouped meaningful concepts) and of automatically forming accounting phrases. The expert's evaluation revealed that the most frequent relationships suggested by the concepts were "Broader Term" (one term is broader than the other) and "Subgroup" (the terms are not in a broader/narrower relation to each other, but they are subgroups of another, broader concept).

We also studied several issues that could affect the quality of the results. Analysis of relationships between terms within sentences, within paragraphs, and within documents generated results of varied quality: the sentence level produced the best results, while the document level produced the least usable results. Regarding relationships of terms within the same sentence, our research also verified the findings of previous studies: restricting the window to at most five words groups concepts that are substantially similar to those formed using the entire sentence. In other words, restricting the process to terms separated by fewer than five other words within a single sentence does not seem to significantly improve clustering performance.

The order in which any particular pair of terms occurred did not exhibit any explicit impact on the automatic concepts thereby generated. In cases where the higher frequency term (A) appeared before the lower frequency term (B) in the original text (A->B), the potential phrase could be automatically formed simply by directly linking the first group term and then the second group term in each resulting cluster. On the other hand, when the lower frequency term (B) appeared before the higher frequency term (A) in the original text (B->A), linking the second group term and then the first group term yielded the automatic phrase.

Moreover, for any two terms, we found that normalization based on the higher frequency term (AB/A or BA/A) yielded much better results than normalization based on the lower frequency term (AB/B or BA/B). This was true both for the automatic identification of compound concepts and for automatic phrase formation.

7.2 Contributions

Our research studied a set of techniques for automatic compound-concept identification and automatic phrase generation. While some work in these areas has been done before, the techniques we have employed are novel. Our distance measures were based on affinities, rather than on the popular similarity measures studied by most researchers, and therefore our classified concepts identified composite relationships among terms (instead of only synonyms).

The thesis explored term proximities in different textual units and the effects of directionality and asymmetry between two terms. Again, little research had previously been done in this field. Our preliminary study indicated that analysis at the document level, and using low frequency words to generate composite terms, led to poor results in generating meaningful concepts. This finding could direct future research.

Our research has demonstrated the realistic possibility of grouping terms into compound concepts, a process that can provide users with assistance in judging how concepts are constructed, and that can thereby assist them in searching for useful information about specific concepts.
7.3 Limitations and Future Research

To obtain a useful index of terms and generate better results, a significant manual effort was involved in our "preprocessing - tokenizing" stage. In particular, this included the identification of domain-specific phrases and domain-dependent stemming (consolidating terms). In the automatic phrase study we likewise used domain-dependent stemming; in the future, however, more automatic term-identification techniques should be studied to further reduce the human effort required while maintaining the accuracy of the terms selected by the automatic program. Although there is little theoretical literature to support our approach, it would be possible to extend this study to construct a domain-specific thesaurus based on the parameters identified in our research as influences on the extracted concepts.

BIBLIOGRAPHY

Accounting Dictionary - Accounting Glossary - Accounting Terms. [Online] Available: http://www.ventureline.com/glossary.asp

Besancon, R., Rajman, M., & Chappelier, J. (1999). Textual similarities based on a distributional approach. 10th International Workshop on Database & Expert Systems Applications, September 01-03, Florence, Italy.

Callan, J. (1995). Controlled vocabularies and ontologies. Carnegie Mellon University, 95-778 Digital Libraries. [Online] Available: http://hartford.lti.cs.cmu.edu/classes/95-778/Lectures/03-CtrlVocab.pdf

Caraballo, S.A. (1999). Automatic construction of a hypernym-labeled noun hierarchy from text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, Maryland, USA, 120-126.

Chen, H., & Lynch, K.J. (1992). Automatic construction of networks of concepts characterizing document databases. IEEE Transactions on Systems, Man, and Cybernetics, 22(5), 885-902.

Chen, H., Lynch, K.J., Basu, K., & Ng, D.T. (1993). Generating, integrating, and activating thesauri for concept-based document retrieval. IEEE Expert, Special Series on Artificial Intelligence in Text-Based Information Systems, 5(2), 25-34.

Chen, H., Martinez, J., Kirchhoff, A., Ng, T.D., & Schatz, B.R. (1998). Alleviating search uncertainty through concept associations: automatic indexing, co-occurrence analysis, and parallel computing. Journal of the American Society for Information Science, 49(3), 206-216.

Chen, H., Ng, T.D., Martinez, J., & Schatz, B.R. (1997). A concept space approach to addressing the vocabulary problem in scientific information retrieval: an experiment on the worm community system. Journal of the American Society for Information Science, 48(1), 17-31.

Chen, H., Hsu, P., Orwig, R., Hoopes, L., & Nunamaker, J.F. (1994). Automatic concept classification of text from electronic meetings. Communications of the ACM, 37(10), 56-73.

Chen, H., Schatz, B.R., Yim, T., & Fye, D. (1995). Automatic thesaurus generation for an electronic community system. Journal of the American Society for Information Science, 46(3), 175-193.

Church, K.W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16, 22-29.

Crouch, C.J. (1990). An approach to the automatic construction of global thesauri. Information Processing and Management, 26(5), 629-640.

Curran, J.R., & Moens, M. (2002). Improvements in automatic thesaurus extraction. In Proceedings of the Workshop on Unsupervised Lexical Acquisition, Philadelphia, PA, USA, 59-67.

Dagan, I., Marcus, S., & Markovitch, S. (1995).
Contextual word similarity and estimation from sparse data. Meeting of the Association for Computational Linguistics. [Online] Available: http://acl. Idc. upenn. edu/P/P93/P93-l022.vdf. 164-171. Furnas, G. W., Landauer, T. K . , Gomez, L. M . , & Dumais, S. T. (1987). The vocabulary problem in human-system communication. Communications of the ACM, 50(11), 964-971. Gangolly, J., & Wu, Y.F. (2000). On the automatic classification of accounting concepts: preliminary results of the statistical analysis of term-document frequencies. The New Review of Applied Expert Systems and Emerging Technologies, (6), 81-88. Garnsey, M . (2001). The use of latent semantic indexing and agglomerative clustering to automatically classify accounting concepts: a report of the preliminary findings. The New Review of Applied Expert Systems, (7), 129-140. Garnsey, M . (2002). Automatic classification of financial accounting concepts. In A.E. Baldwin and C E . Brown (Eds.) Collected Papers of the Eleventh Annual Research Workshop on: Artificial Intelligence and Emerging Technologies in Accounting, Auditing and Tax, 15-24. Gietz, P. (2001). Report on automatic classification systems for the T E R E N A activity portal coordination. [Online] Available: http://www, daasi. de/reports/Reportautomatic-classification.html Grefenstette, G. (1993). Automatic thesaurus generation from raw text using knowledgepoor techniques. 9th Annual Conference of the University of Waterloo, Centre for the New Oxford English Dictionary and Text Research, Oxford. Grefenstette, G. (1994). Exploration in automatic thesaurus discovery. Kluwer Academic Publishers Boston/Dordrecht / London.  -11 -  Jang, M . , Myaeng, S.H., & Park, S.Y. (1999). Using mutual information to resolve query translation ambiguities and query term weighting. Annual Meeting of the ACL, Proceeding of the 37 Conference on Association for Computational Linguistics, College Park, Maryland, USA, 223-229. th  Hauck, R., Sewell, R., Ng, D. T., & Chen, H . (2001). Concept-based searching and browsing a geoscience experiment. Journal of Information Science, 27(4), 199-210. K P M G Consulting L L C . (2000). Companies suffer from information overload, according to K P M G consulting knowledge management report. [Online] Available: http://web.lexis-nexis.com/universe/(4127/Q0). Lassi, M . (2002). Automatic thesaurus construction. http://www.adm.hb.se/personal/mol/gslt/thesauri.pdf  [Online]  Available:  Leory, G., & Chen, H . (2001). Meeting medical terminology needs - the ontologyenhanced medical concept mapper. IEEE Transactions on Information Technology in Biomedicine, 5(4), 261-270. Losee, Jr. R . M . (1994). Term Dependence: Truncating the Bahadur Lazarsfeld expansion. Information Processing & Management, 30(2), 293-303. Martin, W. J.R., A l B.P.F., & van Sterkenburg P.J.G. (1983). On the processing of a text corpus: from textual data to lexicographical information. Lexicography: Principles and Practice (Applied Language Studies Series), Hartman R.R.K, Ed. London: Academic. Merriam-Webster online dictionary and thesaurus [Online] Available: http://www.mw.com Miller, G. A., Beckwith, R., Fellbaum, C , Gross, D., & Miller K . (1993). Introduction to WordNet: an on-line lexical database. [Online] Available: http://wwwl.cs. Columbia. edu/~radev/cs4999/notes/5papers. pdf Milstead, J. L. (2000). About thesauri. indexing. com/Milstead/about, htm  [Online] Available:  http://www.bayside-  Price Waterhouse & Co. (1974). Thesaurus of accounting and auditing terminology. 
New York: Price Waterhouse & Co. Rasmussen, E. (1992). Clustering algorithms. In W. B. Frakes & R. Baeza - Yates (Eds.) Information Retrieval: Data structures and algorithms. Engelwood Cliffs, N J : Prentice Hall. -78-  Rungsawang, A . (1998). A distributional semantics based information retrieval system. The National Computer Science and Engineering Conference (NCSEC'98). Salton, G. (1989). Automatic text processing. Reading, MA: Addison-Wesley Publishing Company Inc. Salton, G., & Buckley C. (1991). Automatic text structuring and retrieval-experiments in automatic encyclopedia searching. In G. Salton A. Bookstein, Y. Chiaramella and V.V Raghavan, editors, Proceedings of the Fourteenth Annual International ACMSIGIR Conference on Research and Development in Information Retrieval, 21-30. SchUtze, H., & Pedersen, J.O. (1997). A cooccurrence-based thesaurus and two applications to information retrieval. Information Processing and Management, 33(3), 307-318. Statistics Toolbox for Use with Matlab User's Guide (Version 3) Tutorial - Cluster Analysis. (1-53 to 1-67). [Online] Available: http://www, busim. ee. boun. edu. tr/~resources/stats_tb. pdf Stickney, C P . , & Weil, R.L. (1997). Financial accounting an introduction to concepts, methods, and uses (Eighth Edition). The Dryden Press, Harcourt Brace College Publishers. Sowa, J . F. (2000) Concepts in the lexicon: introduction [Online] http://www. jfsowa. com/ontology/lexicon. htm  Available:  Zhang, R., & Rudnicky, A.I. (2002). Improve latent semantic analysis based language model by integrating multiple level knowledge. Proceedings of ICSLP 2002 (Denver, Colorado), 893-896.  -79-  Appendix A : Domain Dependent Preprocessing - Tokenizing  5.1.2  Reformatting the Document Structure  Though the database had a filename for every document in the collection, the filename could not reflect its own content effectively. Therefore, a mapping file was created that automatically extracted the topic of each file (for example, A R B 49: Earnings per Share) in the database and linked the file with its associated filename (for example, fars-0008.txt) in the mapping file.  Since the paragraph level calculation was one of the core tasks of this study, we had to make sure the program could recognize paragraph sections. The original files had manual line-break-characters in each line within each paragraph that confused the division between paragraphs. In addition, the program could not recognize the phrase tokens if there was an end of line character inside the term phrases. We noticed in the original documents that if the ends of line characters appeared consecutively it meant the end of a paragraph. Thus, the program got rid of the extra ends of line characters so that there would be only one end of line character for each paragraph.  Since the sentence level calculation was also at the core of this research, we needed to make sure the program could recognize sentences as well. Punctuation marks like  "!"  and "?" can indicate the end of sentence. However, the period in a word like "E.g." when it appeared in the middle of a sentence did not mean the end of sentence. Therefore, there were two situations where the period that did not indicate the end of sentence was  -80-  automatically removed. First, if the first character of a word was a capital letter and the last character was a period, then the token would be removed, such as for "No.", "Mr." And "Messers." Thus, we could deal with all cases where the periods were located in the middle of sentences. 
Among these cases, though some periods were also located at the ends of sentences or paragraphs, because those cases were not statistically significant and because our method in this study was oriented toward automatic procedures, we simply ignored them. The second situation was that if a period was located inside a word but not as its last character, the word would be removed, as in the cases of "1.4", "Ch.3A", and "e.g." There were statistically very few cases, though, where such words were also the last words in their sentences or paragraphs. Similarly guided by this trade-off strategy and by automatic practice, we had to remove these words in order to automatically recognize the ends of sentences as well as of paragraphs.

We then converted all of the words in the text collection, with periods already removed, into lowercase so as to make the next few procedures (such as removing stop-words) easier to carry out. This was to avoid cases where stop-words in the documents could not be removed because of upper- and lower-case confusion. Similarly, the program also converted the uppercase words from the other two documents we manually produced, the abbreviation list (see List 1 below) and the term-phrases controlled list (see List 2 below), into lowercase. Moreover, punctuation was also separated from words so that the program could recognize the punctuation individually.

5.1.3 Changing the Short-forms of the Words in the Abbreviation List into Term-phrases

Since there were many abbreviations in the accounting texts, we produced an abbreviation-controlled list by consulting two external sources, Accounting Dictionary - Accounting Glossary - Accounting Terms (http://www.ventureline.com/glossary.asp) and Financial Accounting: An Introduction to Concepts, Methods, and Uses (8th Edition).

AppAList 1: Top 10 out of 39 Manually Produced Abbreviations
ACRS (ACCELERATED COST RECOVERY SYSTEM)
ABC (ACTIVITY BASED COSTING)
AICPA (AMERICAN INSTITUTE OF CERTIFIED PUBLIC ACCOUNTANTS)
AICPASAS (AICPA STATEMENT ON AUDITING STANDARDS)
AICPASOPS (AICPA STATEMENTS OF POSITION)
AMT (ALTERNATIVE MINIMUM TAX)
AROs (ASSET RETIREMENT OBLIGATIONS)
BOM (BILL OF MATERIALS)
CMOs (COLLATERALIZED MORTGAGE OBLIGATIONS)
CPI (COST OF LIVING INDEX)

The list included 39 abbreviations in total and standardized the abbreviations into the term-phrase format. Note that all the letters in this list had already been converted into lowercase in the previous step, so once the program found these abbreviations in the texts, which were also in lowercase, it would automatically transform them into the matching term-phrases connected by the symbol "_". For example, ACRS would be converted to "accelerated_cost_recovery_system".
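The abbreviation replacement step described in 5.1.3 above amounts to a simple dictionary lookup over the token stream. Here is a minimal Python sketch, assuming the text has already been lowercased and punctuation separated; the three mappings shown are taken from AppAList 1 above, and the example sentence is hypothetical.

abbreviations = {
    "acrs": "accelerated_cost_recovery_system",
    "abc": "activity_based_costing",
    "aicpa": "american_institute_of_certified_public_accountants",
}

def expand_abbreviations(tokens, abbrev_map):
    # Replace each abbreviation token with its underscore-linked term-phrase.
    return [abbrev_map.get(tok, tok) for tok in tokens]

print(expand_abbreviations("depreciation under acrs rules".split(), abbreviations))
# ['depreciation', 'under', 'accelerated_cost_recovery_system', 'rules']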
5.1.4  Converting Meaningful Words to Term-phrases  AppAList 2: First 10 Phrases of the entire 1,774 Phrase-Controlled  -82-  A B A N D O N E D PROPERTY A B N O R M A L COST A B N O R M A L COSTS A C C E L E R A T E D COST RECOVERY SYSTEM A C C E L E R A T E D COST R E C O V E R Y SYSTEMS A C C E L E R A T E D DEPRECIATION A C C E L E R A T E D DEPRECIATIONS ACCOUNTING ADJUSTMENT ACCOUNTING ADJUSTMENTS ACCOUNTING CHANGE  5.1.5  Removing Stop-words  AppAList 3: First 10 Stop-words of the entire 728 Stop-Word a able about above according accordingly accounted acquire across actually  5.1.7  Removing Unwanted Tokens  AppAList 4: First 10 Tokens of the whole 2, 052Wanted Tokens TOKEN  #FREQUENCIES  #DOCUMENTS  absence  291  130  absences  61  11  absent  94  57  accelerated_cost_recovery_system  21  4  accelerateddepreciation  29  12  accomplished  75  48  account  1242  318  account_receivable  2  2  accountant  50  30  accountants  158  56  -83-  5.1.8  Consolidating Wanted Tokens  AppAList 5: First 10 Tokens of the entire 994 Consolidating Wanted Tokens WANTED TOKENS TO BE CONVERTED  F O R M S OF T O K E N S A F T E R C O N V E R S I O N  absences  absence  account_receivable  accounts_receivable  accountants  accountant  accounting_adj ustments  accounting_adjustment  accounting_changes  accounting_change  accounting_concepts  accounting_concept  accounting_periods  accounting_period  accounting_policies  accounting_policy  accounting_principles  accounting_principle  accounting_principles_and_methods  accounting_principles_and_method  5.1.9  Generating the Final Reduced Token List  AppAList 6: First 10 Tokens of the Entire Final 1,344 Tokens TOKEN  #FREQUENCIES  #DOCUMENTS  asset  15878  510  accounting  13950  797  cost  10548  434  amount  10081  561  liability  7141  406  loss  6993  389  tax  6739  261  interest  6630  395  financial_accounting_standards_board  6527  763  entity  6252  369  -84-  Appendix B: Computing Term Affinities  5.2.2  Converting 1,344 Tokens to Token IDs  AppBList 7: First 10 Tokens of the Entire Final 1,344 Token IDs TOKEN  ID  #FREQUENCIES  #DOCUMENTS  asset  1  15878  510  accounting  2  13950  797  cost  3  10548  434  amount  4  10081  561  liability  5  7141  406  loss  6  6993  389  tax  7  6739  261  interest  8  6630  395  financial_accounting_standards_board  9  6527  763  entity  10  6252  369  5.2.3.1 Generating the 600 Token List for Clustering AppBList 8: First 10 Tokens of the Entire Final 600 Token IDs #FREQUENCIES  #DOCUMENTS  asset  15878  510  accounting  13950  797  3  cost  10548  434  4  loss  6993  389  5  tax  6739  261  6  interest  6630  395  7  financial accounting standards board  6527  763  8  fair value  6121  331  9  financial statement  5495  460  10  future  5456  422  ID  TOKEN  1 2  -85-  5.3.1  Hierarchical Clustering Using Matlab  The "Tutorial - Cluster Analysis" section of Statistics Toolbox for Use with Matlab User's Guide (Version 3, 1-57) (http://www.busim.ee.boun.edu.tr/~resources/stats Jb.pdfl showed an example on how to interpret the Matlab linkage function. "For example, given the distance vector Y from the sample data set of x and y coordinates, the linkage function generates a hierarchical cluster tree, returning the linkage information in a matrix, Z. Z - linkage(Y) Z= 1.0000 3.0000 1.0000 4.0000 5.0000 1.0000 6.0000 7.0000 2.0616 8.0000 2.0000 2.5000 In this output, each row identifies a link. The first two columns identify the objects that have been linked, that is, object 1, object 2, and so on. 
The third column contains the distance between these objects. For the sample data set of x and y coordinates, the linkage function begins by grouping together objects 1 and 3, which have the closest proximity (distance value = 1.0000). The linkage function continues by grouping objects 4 and 5, which also have a distance value of 1.0000. The third row indicates that the linkage function grouped together objects 6 and 7. If our original sample data set contained only five objects, what are objects 6 and 7? Object 6 is the newly formed binary cluster created by the grouping of objects 1 and 3. When the linkage function groups two objects together into a new cluster, it must assign the cluster a unique index value, starting with the value  -86-  m+\, where m is the number of objects in the original data set. (Values 1 through m are already used by the original data set.) Object 7 is the index for the cluster formed by objects 4 and 5. As the final cluster, the linkage function grouped object 8, the newly formed cluster made up of objects 6 and 7, with object 2 from the original data set."  -87-  Appendix C : Clustering 600 Tokens Data Outputs  AppC SentenceOtd Cluster Data: Top 50 Clustering Data of the Entire 600-Token Clustering Data in SentenceOtd  Cluster Index  SentenceOtd AB A  601 602 603 604 605 606 607 608 609 610  261 392 102 114  611  612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641  41; 601 10; 553 170 38; 546 106 16; 30; 147; 105; 18; 39; 613; 605; 34; 448 5; 51; 451 617 394 20; 63; 619 189 178 618 223 13; 494 337 174 635 626 395  346; 429; 139; 122; 46; 269; 27; 595; 209; 70; 576; 215; 33; 32; 208; 280; 614; 57; 28; 52; 603; 533; 25; 96; 498; 90; 502; 69; 80; 19; 246; 431; 81; 369; 26; 529; 372; 234; 35; 58; 455;  0.62791 0.74286 0.8034 0.80942 0.82476 0.84884 0.85654 0.85714 0.8599 0.87065 0.875 0.87905 0.87923 0.88738 0.88848 0.88973 0.89109 0.89262 0.8929 0.8944 0.90447 0.90476 0.90662 0.90762 0.90909 0.91117 0.91429 0.91476 0.91796 0.92165 0.92308 0.92547 0.92647 0.93277 0.93278 0.93333 0.93617 0.93659 0.939 0.93936 0.93939  SentenceOtd AB B  SentenceOtd BA A  149; 212;  4;  14;  0.63247  149;  194;  0.67611  196;  229:  0.7375  126;  602;  0.75068  5;  23;  0.80818  7;  0.81854  24;  66:  0.82776  209;  214;  0.85333  5; 2; 98 19 46 11 604;  596 578 488 312 517 562 435 123  0 0 0.0625 0.076923 0.18182 0.2 0.21739 0.24922  320  0.25  18;  27;  0.85561  606;  566  0.25  25;  29:  0.86167  72;  569  0.25  22;  65;  0.87587  131; 26; 128; 105; 24; 609; 613; 80; 261; 617; 54; 30; 174; 62 1;  575 551 553 280 347 271 589 588 346 267 545 351 525 119 139 223 497 501 254 291 546 581 591 623 599 582 579 568 550 25;  0.25 0.28571 0.28571 0.31765 0.32 0.33333 0.33333 0.33333 0.33333 0.34783 0.375 0.39216 0.4  32;  41;  0.87697  103;  165;  0.88048  86;  116;  0.88343  34; 27; 608; 40; 610; 7; 625; 632; 4; 634; 6; 628; 12 16 17  630;  -88-  369;  408;  0.88372  278;  298;  0.89535  132;  375;  0.89744  113;  258;  0.89899  611;  30;  0.90194  314;  334:  0.90196  609;  119;  0.90512  447;  516;  0.90909  249:  622;  0.91011  400;  512:  0.91177  0.4005  392;  494;  0.91429  0.41132 0.43698 0.45454 0.46154 0.48235 0.49275 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5  324:  350;  0.91525  603;  255;  0.9187  49;  121;  0.92078  54;  79;  0.92266  33;  71; 274;  0.92363  237; 89;  168;  0.92818  111; 406;  238;  0.93103  429;  0.93103  48;  64;  0.93322  625;  511;  0.93333  404;  434;  0.93333  621;  612:  0.93441  352;  
460;  0.93478  63;  80;  104;  108;  0.92373  0.9357 0.93686  642 643 644 645 646 647 648 649 650  284; 639; 12; 107; 627; 124; 11; 299; 222;  637 85; 37; 141 441 152 123 373 264  0.93976 0.94053 0.94102 0.94104 0.94286 0.9435 0.94351 0.94444 0.94488  641 614 321 638 178  574 595 561 415 431  71; 633 616 635  494  68; 503; 427;  0.5 0.5 0.5 0.51852 0.52 0.53333 0.53719 0.53846 0.55556  366;  368:  0.9375  76;  608;  0.94032  604;  264;  0.94038  1;  34;  0.94094  485;  585;  0.941 18  35;  56;  0.94258  638;  6.19;  0.94284  16;  640;  0.94341  649;  630;  0.9441 7  AppC SentenceUpTo5td Cluster Data: Top 50 Clustering Data of the Entire 600-Token Clustering Data in SentenceUpTo5td Cluster Index  SentenceUpTo5td_AB_A  601  261 392 601 114 147 63; 395; 102; 41; 602; 16; 39; 170; 38; 604; 510;  602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632  10; 333; 23; 553; 19; 611 106 105 405 18; 622 487 544 546 626 196  346; 429; 269; 122; 208; 80; 455; 139; 46; 406; 33; 57; 209; 70; 126; 521; 27; 610; 25; 595; 28; 606 215: 280 545: 32; 621 537, 592 576 30; 224;  0.62791 0.74286 0.76744 0.76906 0.77695 0.77827 0.78788 0.79773 0.80984 0.82857 0.82909 0.83296 0.83575 0.83658 0.84173 0.84615 0.85191 0.85454 0.8571 0.85714 0.86687 0.86745 0.86825 0.86882 0.87097 0.87252 0.87277 0.875 0.875 0.875 0.87718 0.88125  SentenceUpTo5td AB B  11; 601 602 603 604 605 606 13  25 72 149 611 388 2; 5; 615; 616; 46; 98; 617; 105; 6; 622; 620; 3; 614 624 623 628 12; 17; 131;  -89-  578; 24; 212; 600; 22; 129 168 598 566 569 596 194 589 312 488: 607: 21; 435: 517: 123: 280 562 19; 424. 536: 320: 582 609 50; 579 610 575  0 0 0 0 0 0 0 0 0 0 0 0 0 0.046154 0.0625 0.0625 0.0625 0.17391 0.18182 0.18692 0.18824 0.2 0.2 0.21429 0.22222 0.25 0.25 0.25 0.25 0.25 0.25 0.25  SentenceUpTo5td_BA A 4;  14;  0.61043  149  194;  0.66802  209  214;  0.69333  196  229:  0.69375  126  602;  0.72629  170  267;  0.74879  406  429;  0.75862  23;  0.77909  7;  0.78074  456;  477;  491;  496;  0.8  24;  66;  0.81105  25;  29;  0.81777  32;  41;  0.82036  269;  346;  0.83333  561;  598;  0.83333  369;  408;  0.83721  383;  403;  0.83871  18;  27;  0.83911  277:  443;  0.8  0.84  63;  80;  0.84146  603;  282;  0.84507  86;  1 16;  0.84691  16;  621;  0.85226  392;  494;  0.85714  619;  614;  0.86056  39;  57;  0.86428  22:  65;  0.86434  628;  35;  0.87081  261;  615;  0.87209  67;  98;  0.87289  103;  165:  0.8745  633  609  634  13; 273  635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650  5; 299 530 326 617 640 417 634 456 377 383 67; 641; 89; 34;  52; 26; 372 619 373 535 337 631 58; 461; 20; 477 519 403 71; 90; 257; 608;  0.88154 0.88718 0.8875 0.8887 0.88889 0.88889 0.89286 0.89439 0.89521 0.89655 0.89709 0.9 0.90244 0.90323 0.90326 0.9034 0.90424 0.90447  134 627 626 26; 128 635 634 101 71; 625 642 643 639 636 646 647 648 44;  463 497 7; 551 553 271 347 510 447 593 629 590 633 613 107 108 371 592  0.26667 0.27273 0.28333 0.28571 0.28571 0.28736 0.3 0.30769 0.31818 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333  544; 545; 606; 625; 223; 31.4; 396; 471; 48; 54; 624; 629; 249; 604; ! 
13; 374; 89; 626;  587; 595; 622; 607; 227; 334; 444: 523; 64; 55; 33; 30; 447; 224; 258; 636; 168; 119;  0.875 0.875 0.87923 0.88 0.88235 0.88235 0.88235 0.88235 0.88335 0.88353 0.88378 0.88592 0.88764 0.89375 0.89394 0.89474 0.89503 0.89521  AppC SentenceNoRestriction Cluster Data: Top 50 Clustering Data of the Entire 600-Token Clustering Data in SentenceNoRestriction Cluster Index 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623  SentNoRestrict A B A 0.61628 261 346 392 429 0.74286 80; 0.75055 63; 0.75336 114 122 0.76744 601 269 208 0.77695 147 0.78788 395 455 102 139 0.79773 0.80761 41; 46; 0.82378 33; 16; 0.82774 39; 57; 602; 406; 0.82857 0.82962 38; 70; 0.83575 170 209 612 0.83636 333 604 0.84173 126 521 0.84615 510 0.85065 27; 10; 0.85083 23; 25; 553; 595; 0.85714 610; 603; 0.85986 0.86056 28; 19; 0.86551 32; 18;  SentNoRestrict A B B 600 0 l; 0 601 16; 602 22; 0 24; 0 603 604 0 34; 605 129 0 0 606 168 384 0 607 578 0 608 609 0 ii; 0 610 212 0 611 582 598 0 13; 566 0 25; 0 72; 569 149 596 0 194 0 616 0 388 589 584 0 392 312 0.030769 2; 612 488 0.0625 621 0.0625 5; 622 21; 0.0625  -90-  SentNoRestrict_ B A A 0.60814 14: 0.66802 194; 0.675 229: 214; 0.69333 602; 0.72629 0.74879 267; 0.75862 429;  4; 149; 196; 209; 126; 170; 406; 5; 2; 456; 491; 32; 24; 369; 25; 269; 277; 392; 63; 561; 18; 383; 618;  23; 7; 477; 496; 41; 66; 408; 29; 346; 443; 494; 80; 598: 403; 607;  0.77656 0.77719 0.8 0.8 0.80729 0.80891 0.81395 0.81533 0.81944 0.82667 0.82857 0.83038 0.83333 0.83622 0.83871 0.84  624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650  621 106 105 299 405 196 326 452 487 544 546 623 5; 609; 475; 13; 615; 273; 627; 530; 618; 644; 639; 151; 417; 4; 67;  622 215 280 373 545 224 337 476 537 592 576 30; 619; 52; 494; 26; 459 372 383 535 635 58; 20; 159; 461; 14; 71;  0.86783 0.86825 0.86882 0.87037 0.87097 0.875 0.875 0.875 0.875 0.875 0.875 0.87524 0.87922 0.87971 0.88235 0.88244 0.88571 0.8875 0.88889 0.88889 0.89026 0.89026 0.8912 0.89573 0.89655 0.8967 0.89989  46; 623; 98; 105; 6; 628, 625 3; 630 632 620 629 635 12; 17; 131 134 633 634; 26; 128 642 641 101 261 71; 646;  435 123 517 280 562 19; 424 536 523 568 320 614 50; 579 615 575 463 497 7; 551 553 271 347 510 346 447 640  0.17391 0.17445 0.18182 0.18824 0.2 0.2 0.21429 0.22222 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.26667 0.27273 0.28333 0.28571 0.28571 0.28736 0.3 0.30769 0.3125 0.31818 0.33333  86; 16; 604; 621; 35; 39; 22; 54; 374; 67; 261; 223; 103; 544; 545; 625; 606; 603; 635; 314; 396; 471; 48; 630: 357; 249; 89;  116; 619; 282; 612; 65; 57; 628; 55; 623; 98; 616; 227; 165; 587; 595; 33; 626; 224; 614; 334; 444; 523: 64: 30; 397; 447; 168;  0.8427 0.84276 0.84507 0.85231 0.85851 0.85906 0.86294 0.86442 0.86842 0.86952 0.87209 0.87395 0.8745 0.875 0.875 0.87581 0.87923 0.88125 0.88235 0.88235 0.88235 0.88235 0.8825 0.88447 0.88636 0.88764 0.8895  AppC ParagraphNoRestriction Cluster Data: Top 50 Clustering Data of the Entire 600-Token Clustering Data in ParagraphNoRestriction Cluster Index 601 602 603 604 605 606 607 608 609 610 611 612 613 614  ParaNoRestrict_AB_A 0.82616 63; 80; 0.83333 369; 408 0.84 359; 492 0.84 383; 403 0.84263 57; 39; 0.85906 151; 158 0.86507 14; 4; 0.86735 223; 602; 0.86776 75; 73; 0.86777 196; 224 0.875 546; 576 0.87629 601 16; 0.87633 26; 20; 0.87743 612; 33;  ParaNoRestrict_ B A A  ParaNoRestrict_ A B B 563;  0  510;  521;  0.625  601  3;  0  590;  593;  0.66667  602  6; 30; 54; 81; 86; 88; 174; 396; 
590; 47; 50; 593;  0  369;  408;  0.76667  0  63;  80;  0.81788  0 0 0 0 0 0 0 0 0  39;  57;  0.85006  127;  192;  0.85333  16;  604;  0.86426  0  75;  85;  2;  603 604 605 606 607 608 609 610 611 612 613  -91 -  20;  26;  0.86567  4;  14;  0.86776  243;  275;  0.87143  607;  JJ,  0.87228  18;  41;  0.87457  520;  545;  0.875 0.87657  627  13;  628  616  629  35;  41; 23; 502; 510; 32; 481; 27; 241; 98; 122; 214; 184; 613; 43; 609;  630  l;  3;  631  614  632 633  610 505  96; 229; 516;  634  621  635  644  630 628 629 637 631 324 337 634 261 54;  645  295  646  645  647  326  313; 640; 641;  648  24;  66;  649  237;  274;  650  639;  132;  615  18;  616  5;  617 618  475; 416;  619  615;  620  461; 619; 192; 67; 114, 209, 169  621 622 623 624 625 626  636 637 638 639 640 641 642 643  31; 2; 10; 85; 65; 28; 350; 352; 46; 269; 55;  0.87803 0.88213 0.88235 0.88462 0.8865 0.88889 0.88981 0.89362 0.89482 0.89891 0.89923 0.90071 0.90178 0.9022 0.90736 0.90841 0.90894 0.90909 0.90909 0.90922 0.90946 0.90984 0.91004 0.91008 0.91065 0.91111 0.91177 0.9119 0.91304 0.91452 0.91489 0.91489 0.91489 0.91522 0.91579 0.91667  614; 9; 616; 13; 618; 619; 1; 615 622 623 624 19; 32; 621 628 629 630 631 632 633 634 635 636 637 638 639 640 641 620 643 644 80; 5;  642, 648, 626 ,  40; 571; 62; 594, 35; 85; 544 572 29; 573 574 485 480 625 160 519 587 12; 25; 131 135 579 589 22; 23; 36; 56; 136; 49; 65; 113 583 471 510 562 27;  5; 359; 67; 612; 261; 618; 196; 621; 416: 48; 73; 54; 396; 13; 605; 35; 326; 619; 209; 515; 516; 635; 1; 366; 151; 86; 24; 223; 304; 620; 114; 615; 155; 156; 295; 646;  0  0 0 0 0 0 0.16667 0.25 0.25 0.25 0.25 0.3 0.3 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.35294 0.375 0.4 0.4  23; 492; 98; 27; 346; 32; 224; 229; 601; 81; 614; 55: 444; 608; 40; 65; 337; 269; 214; 580; 553; 595: 368: 158; 116: 66: 603; 347; 46; 122; 43; 237; 271; 313; 10;  0.8799 0.88 0.88147 0.88288 0.88406 0.8842 0.8843 0.8843 0.88462 0.88819 0.88856 0.88876 0.88889 0.88908 0.89095 0.89192 0.89362 0.89474 0.89923 0.9 0.9 0.9 0.90332 0.90476 0.90604 0.90775 0.90816 0.90816 0.90909 0.90961 0.90984 0.90984 0.91257 0.91333 0.91489 0.91494  AppC DocumentNoRestriction Cluster Data: Top 50 Clustering Data of the Entire 600-Token Clustering Data in DocumentNoRestriction  Cluster Index  DocumentNoRestrict_AB A  601  336; 480; 602; 603;  602 603 604  471; 481; 495; 509;  0 0 0 0  DocumenfNoRestrictAB B i; 601; 602; 603;  -92-  120; 2; 7; 41;  0 0 0 0  DocumenfNoRestrict BA A 336; 480; 602; 603;  471; 481; 495; 509;  0 0 0 0  605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650  604; 605; 606; 607;  515; 539; 540; 565;  2; 1; 281; 299, 601 613 614 615 67; 617 187 452 463 618 622 121 610 58; 625 627 117 63; 236 238 632 616 634  7; 609, 487 373 337 406 523 538  635 636 382 , 422 , 479 640 , 641 ; 642 ; 643 ; 628 ; 623 ; 645 ; 647 ; 18; 39;  71; 74; 190 608 488 78; 98; 161; 3; 59; 5; 23; 118; 80; 312 267 290 514 544 578 596 442 440 510 485 517; 521 ; 585; 9; 101; 4; 6; 41; 40;  0 0 0 0 0.16437 0.16667 0.2 0.2 0.2 0.2 0.2 0.2 0.24138 0.24138 0.25 0.25 0.25 0.27586 0.27586 0.28571 0.29263 0.3 0.30268 0.31034 0.32143 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.341 0.34483 0.34704 0.34937 0.35484 0.3625  604; 605; 606; 
607; 608; 609; 610, 611, 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649  43; 67; 72; 84; 91; 103; 106, 110, 111 112 114 123 16; 127 130 131 133 134 139 143 154 ii; 155 157 162 164 168 169 175 183 184 192 195 4; 9; 58; 197 200 202 ; 204 ; 205 ; 212 ; 213 ; 3; 214; 215;  -93 -  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 o o o o o o o 0 0 0  604; 605; 606; 519; 563; 578; 1;  515; 539; 540; 541; 571; 596;  4; 611; 613; 67; 614; 616; 43: 242; 41; 281; 621; 622; 623; 601; 625; 626; 91; 133; 525; 629; 39: 628; 58; 63; 615; 617; 382; 638; 639; 640; 641; 485; 486; 498; 514; 561; 125; 636; 123;  7; 612;  ->•  3; 74: 6; 5; 68; 248; 1 19; 379; 395; 455; 487; 406; 423; 461; 312; 161; 536; 157; 40; 236; 59; 80; 98; 23; 415; 433; 442; 446; 466; 585; 500; 511; 523; 586; 619; 71; 282;  0 00 0 0 0 0.047059 0.087404 0.098039 0.10138 0.10345 0.10886 0.12644 0.16 0.18182 0.18557 0.2 0.2 0.2' 0.2 0.2 0.2 0.2 0.21212 0.23913 0.25 0.27273 0.2875 0.29167 0.3 0.30864 0.31034 0.31801 0.33333 0.33333 0.33333 * 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.3421 0.34314 0.34375  Appendix D: Statistical Evaluation Results of All Fifteen Schemes - Clustering 600 Tokens  5.4.2.1 Instructions for Expert's Evaluation  Below are the instructions we showed the expert to do the evaluations: Please evaluate the relations of terms in each cluster based on your knowledge of and experience in the accounting industry. For additional reference, you can also check online the most comprehensive financial glossary at http://www, investorwords. com, where you can find quite accurate definitions for 6,000 current financial terms. The terms in each cluster were edited as follows: •  AH terms in the attached table were accounting terms.  •  The symbol "_" was used to connect the term-phrase, which was here treated as a single term. Different terms were separated by "; ". For example, in cluster No. 1, "'stock_dividend" was one term, yet ''stock_dividend; stock_split" were two different terms separated by ";".  •  Each cluster consisted of two groups that were separated in two columns. The expert should evaluate the relations among all the terms which reside only in different groups. If a single group within a single cluster contains more than one term, since these terms have been linked together by the previously formed clusters and have already been evaluated by the expert, the expert should therefore not evaluate the terms within the same groups. As an example, see Table 5.8 cluster #3: the first group contains two terms ' 'stock^dividend" and "stockjsplit, " and since they were linked by cluster #1 and had already been evaluated by the expert when evaluating clusterM,  -94-  so in cluster#3 the expert should only evaluate all terms residing in the first and second group respectively, that is, evaluate relations between ''stock_dividend" (in the first group) and "split" (in the second group), and also evaluate relations between "stockjsplit" (in the first group) and "split" (in the second group). •  The column "# of terms " counts the number of terms in two groups of each cluster.  •  In addition, all terms in the attached table were singulars and if originally they were in plural forms they have already been consolidated into their singular forms, for example, "options -> option. 
" So, in our attached table the only term you will see is "option " which actually meant either "options " or "option " in the original text. Similarly, the term "financialjstatement" actually originally included ' financial_statement" and "financial statements. ".  The evaluation categories were illustrated as follows: I.  Relation Type Alternatives (column I in the spreadsheet): For each relation among the terms, please choose the type of the relation from among the alternatives by entering the number "1" in the cell in its appropriate relation type column. a) Synonyms? : Are they all synonyms (same meaning)? b) Antonyms? : Are they all antonyms (opposite meaning)? c)  Broader term? : Is any term's meanins broader than all of the other (s)? Example: "accountingJerminology; terminology. "  d) Subgroups? : If none of them is the broader term of others but these terms are all related, are they both or all subgroups of another broader concept? Example:  -95-  first "notes_payable " and "accounts_payable " are related terms and then they are both subgroups of the broader concept "payables. " e)  Though distinct, forms a new concept? : If these terms are both or all distinct (not related), can they tosether still form a new concept? Example: "sun" and "lotion ": though individually "sun " and "lotion " are two distinct things, together they can form a new concept, "sun lotion. "  f)  Partial relation: If there are more than two terms in the cluster, are only some of them related? For the clustering containing more than 2 terms, if not all but several terms' relations belong to the relation type a) ~ e), you can choose "Partial relation " here as their relation.  g)  Other relation: If the relation type is not listed previously.  h) No direct relation: If none of the terms are directly related. II. Explanation (column II in the spreadsheet): Please describe the relationships among the following terms for each relationship type: a)  Describe the meanins of each synonym  b) Describe the meanins of each antonym c)  Describe which of the terms is the broader or broadest. Example: "... is the broader term "  d) Describe the broader or broadest concept to which all terms belong. Example: "the broader concept is .... " e)  Describe the new concept that is formed. Example: "The new concept is.... "  -96-  f)  Describe -which terms are related. If only some but not all terms meet the relation type a) ~ e), here you should list which terms belong to which relation type. Example: "... is the broader term of..."  g) Describe the new relationship type that is different from a) ~f). h) Describe the reason for no direct relation if it is not very obvious. III. Relative Relationship Score (1-5) (column III in the spreadsheet): Please give your score from "1" to "5" regarding how closely related are the terms are in each cluster, based on the percentage of "how closely related" for the clusters containing only two terms, and based on "the percentage of terms related" for the clusters containing 3 or more terms, where "1" - Mostly to completely unrelated: only 0% (inclusive) ~ 20% (exclusive) related. "2 " - Somewhat unrelated: only 20% (inclusive) ~ 45% (exclusive) related. "3 " - Hard to decide: 45% (inclusive) ~ 55% (exclusive) related. "4 " - Somewhat related: 55% (inclusive) ~ 80% (exclusive) related. "5" - Mostly to completely related: 80% (inclusive) ~ 100% (inclusive) related.  
5.4.2.3 Statistical Analysis of the Expert's Evaluation Results for All 15 Schemes (Provided on CD-ROM)

•  AppD Sentence0td Evaluation Results: for the Top 50 Statistical Evaluation Results in Sentence0td

•  AppD SentenceUpTo5td Evaluation Results: for the Top 50 Statistical Evaluation Results in SentenceUpTo5td

•  AppD SentenceNoRestriction Evaluation Results: for the Top 50 Statistical Evaluation Results in SentenceNoRestriction

•  AppD ParagraphNoRestriction Evaluation Results: for the Top 50 Statistical Evaluation Results in ParagraphNoRestriction

•  AppD DocumentNoRestriction Evaluation Results: for the Top 50 Statistical Evaluation Results in DocumentNoRestriction

Appendix E: Automatically Identifying Phrases

Appendix E1: Single Word Tokenizing

AppE1List 1: First 10 single words of the entire 1,073 Broken Phrase List
ABANDONED
ABNORMAL
ABSENCE
ABSENCES
ACCELERATED
ACCELERATION
ACCEPTED
ACCOMPLISHMENTS
ACCOUNT
ACCOUNTANT

AppE1List 2: First 10 words of the entire 1,188 Wanted Single Word List
ABANDONED
ABNORMAL
ABSENCE
ABSENCES
ACCELERATED
ACCELERATION
ACCEPTED
ACCOMPLISHMENTS
ACCOUNT
ACCOUNTANT

AppE1List 3: First 10 plural words of the entire 283 Plural Wanted Single Word To Singular List
ABSENCES -> ABSENCE
ACCOMPLISHMENTS -> ACCOMPLISHMENT
ACCOUNTANTS -> ACCOUNTANT
ACCOUNTINGS -> ACCOUNTING
ACCOUNTS -> ACCOUNT
ACQUISITIONS -> ACQUISITION
ACTIVITIES -> ACTIVITY
ADJUSTMENTS -> ADJUSTMENT
AFFILIATES -> AFFILIATE
AGREEMENTS -> AGREEMENT

AppE1List 4: First 10 tokens of the entire 905 Final Single Token ID List

TOKEN        ID    # FREQUENCIES    # DOCUMENTS
of            1    137347           835
to            2    75746            834
in            3    71086            835
and           4    66441            833
for           5    48864            834
or            6    34588            801
statement     7    30033            802
an            8    22802            789
not           9    21576            803
asset        10    20467            535
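AppE1List 4 above pairs each token with an ID, its total frequency, and the number of documents containing it. The following is a minimal sketch of how such a list could be assembled; it assumes `documents` is a list of already-tokenized documents and that IDs are assigned in decreasing order of total frequency, an assumption consistent with the ordering visible above rather than a description of the study's actual code.

    from collections import Counter

    def build_token_id_list(documents):
        """Build (token, id, total frequency, document frequency) records.

        `documents` is a list of token lists; IDs are assigned here in order
        of decreasing total frequency (an assumption, matching AppE1List 4).
        """
        term_freq = Counter()
        doc_freq = Counter()
        for tokens in documents:
            term_freq.update(tokens)
            doc_freq.update(set(tokens))  # count each token once per document

        records = []
        for rank, (token, tf) in enumerate(term_freq.most_common(), start=1):
            records.append((token, rank, tf, doc_freq[token]))
        return records

    # Tiny illustrative corpus (not the thesis collection).
    docs = [["of", "asset", "of"], ["of", "statement"], ["asset", "of"]]
    for token, token_id, tf, df in build_token_id_list(docs):
        print(token, token_id, tf, df)
    # of 1 4 3
    # asset 2 2 2
    # statement 3 1 1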
Appendix E2: Clustering 905 Single Tokens Data Outputs

AppE2Sentence0td 905 Single Tokens Cluster Data: Top 50 clusters of the entire 905 Single Token Clustering Data in Sentence0td

[Cluster data table omitted: the extracted text interleaves the Sent0tdAB_A, Sent0tdAB_B, Sent0tdBA_A, and Sent0tdBA_B listings (cluster identifiers and merge distances), and the original column alignment is not recoverable here.]

Appendix E3: Single Tokens Evaluation Results (Provided on CD-ROM)

AppE3Sentence0td Evaluate Automatic Phrases: for the Top 50 Sentence0td Evaluate Automatic Phrases of 905 Single Tokens
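Each row of the Appendix E2 listings appears to record one agglomerative merge: a new cluster ID (starting at 906, after the 905 single-token IDs), the two clusters being merged, and the distance at which they merge; the token groups produced this way are the candidate phrases evaluated in Appendix E3. The sketch below shows one way such merge records can be expanded back into token groups. The record format is inferred from the listings, the first two records reuse IDs and distances visible in the original table, and the token names are purely hypothetical.

    def expand_clusters(merge_records, n_tokens, token_names):
        """Expand agglomerative merge records into the token groups they define.

        merge_records: (new_id, left_id, right_id, distance) tuples, where IDs
        1..n_tokens denote single tokens and higher IDs denote earlier merges
        (the format inferred from the Appendix E2 listings).
        """
        members = {i: [token_names.get(i, f"token_{i}")]
                   for i in range(1, n_tokens + 1)}
        for new_id, left_id, right_id, _distance in merge_records:
            members[new_id] = members[left_id] + members[right_id]
        return members

    # Hypothetical token names; the IDs and distances mirror the first two rows
    # of the Sent0tdAB listing, but the words attached to them are invented.
    names = {635: "stock", 638: "dividend", 827: "note", 834: "payable"}
    records = [
        (906, 635, 638, 0.056818),  # first new cluster after the 905 single tokens
        (907, 827, 834, 0.16667),
    ]
    groups = expand_clusters(records, n_tokens=905, token_names=names)
    print(groups[906])  # ['stock', 'dividend'] -> candidate phrase "stock_dividend"
    print(groups[907])  # ['note', 'payable']   -> candidate phrase "note_payable"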
