PSS : a phonetic search system for short text documents

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

PSS : a phonetic search system for short text documents Zhang, Jerry Jiaer

Abstract

Finding the right information from the increasing amount of data on the Internet is not easy. This is why most people use search engines because they make searching less difficult with a a variety of techniques. In this thesis, we address one of them called phonetic matching. The idea is to look for documents in a document set based on not only the spellings but their pronunciations as well. It is useful when a query contains spelling mistakes or a correctly spelled one does not return enough results. In these cases, phonetic matching can fix or tune up the original query by replacing some or all query words with the new ones that are phonetically similar, and hopefully achieve more hits. We propose the design of such a search system for short text documents. It allows for single- and multiple-word queries to be matched to sound-like words or phrases contained in a document set and sort the results in terms of their relevance to the original queries. Our design differs from many existing systems in that, instead of relying heavily on a set of extensive prior user query logs, our system makes search decisions mostly based on a relatively small dictionary consisting of organized metadata. Our goal is to make it suitable for start-up document sets to have the comparable phonetic search ability as those of bigger databases, without having to wait till enough historical user queries are accumulated.

Item Metadata

Title	PSS : a phonetic search system for short text documents
Creator	Zhang, Jerry Jiaer
Publisher	University of British Columbia
Date Issued	2008
Description	Finding the right information from the increasing amount of data on the Internet is not easy. This is why most people use search engines because they make searching less difficult with a a variety of techniques. In this thesis, we address one of them called phonetic matching. The idea is to look for documents in a document set based on not only the spellings but their pronunciations as well. It is useful when a query contains spelling mistakes or a correctly spelled one does not return enough results. In these cases, phonetic matching can fix or tune up the original query by replacing some or all query words with the new ones that are phonetically similar, and hopefully achieve more hits. We propose the design of such a search system for short text documents. It allows for single- and multiple-word queries to be matched to sound-like words or phrases contained in a document set and sort the results in terms of their relevance to the original queries. Our design differs from many existing systems in that, instead of relying heavily on a set of extensive prior user query logs, our system makes search decisions mostly based on a relatively small dictionary consisting of organized metadata. Our goal is to make it suitable for start-up document sets to have the comparable phonetic search ability as those of bigger databases, without having to wait till enough historical user queries are accumulated.
Extent	1031338 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-03-05
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0051236
URI	http://hdl.handle.net/2429/5537
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2008-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

PSS : a phonetic search system for short text documents Zhang, Jerry Jiaer

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights