Using lexical knowledge and parafoveal information for the recognition of common words and suffixes. Rhone, Brock William. 1987.

USING LEXICAL KNOWLEDGE AND PARAFOVEAL INFORMATION FOR THE RECOGNITION OF COMMON WORDS AND SUFFIXES

by

BROCK WILLIAM RHONE
B.Sc., The University of British Columbia

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
October 1987
© Brock William Rhone, 1987

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3
Date: 15 October 1987

ABSTRACT

Research over the past decade into the psychophysics of reading has demonstrated that information extracted from text falling on the parafoveal and peripheral regions of the retina is used by the human visual system to significantly increase reading speed. Recent results provide evidence that knowledge of word frequency is brought to bear in processing parafoveal data. There is other psychological evidence indicating the type of large-scale features used by the visual system to recognize isolated characters in parafoveal vision.

This thesis describes the design and implementation of a system able to recognize the most commonly occurring English words and suffixes from parafoveally available information by employing knowledge of their letter sequences and of large-scale features of lower-case characters. The Marr-Hildreth theory of edge detection provides a description of the information computed by the earliest stages of visual processing from parafoveal words. Large-scale features extracted from this description, while relatively invariant with respect to noise and font changes, are insufficient to uniquely identify most characters but are used to place each into one of several classes of similar characters. The sequence of these 'confusion classes' is found to place a strong constraint on word identity—of the 1000 most common words comprising the system's vocabulary, representing 70% of the volume of the Brown Corpus of printed English, 92% have mutually unique confusion class sequences. Word recognition is achieved by using the confusion class sequence as a key into the vocabulary, retrieving the word or words having the same sequence. Suffixes are recognized in a similar way.

Results are presented demonstrating the system's ability to identify words and suffixes in text images over a range of simulated parafoveal eccentricities and in two different fonts, one with serifs and one without. Smoothing by the Marr-Hildreth operator, the simplicity and scale of the features, the size of the character classes, and the context provided by the character sequence give the system a degree of robustness.

CONTENTS

Abstract
Contents
List of Tables
List of Figures
Acknowledgment
1. Introduction
1.1 Summary of the Psychophysical Evidence
1.2 A Mechanism for Using Parafoveal Information in Word Recognition
2. Approaches to Text Recognition
2.1 The Pattern Recognition Approach
2.1.1 Recognition of machine-printed text
2.2 An AI Approach
2.3 The Basis of a New Approach
2.4 Applying the New Approach in Understanding Reading
2.5 Overview of the Psychophysical Evidence
2.5.1 The Work of Rayner and Colleagues
2.5.2 Character Features Used as Perceptual Cues in Parafoveal Vision
3. Concepts, Design, and Implementation
3.1 A Mechanism for Parafoveal Word Recognition
3.1.1 Character Features and Confusion Classes
3.1.1.1 Features
3.1.1.2 Confusion Classes
3.1.2 The Most Common English Words
3.1.3 Effectiveness of Indexing by Confusion Class Sequence
3.1.4 Modeling Parafoveal Information Computed by the Human Visual System
3.2 Implementation
3.2.1 System Overview
3.2.2 Platform
3.2.3 Computing Parafoveal Information
3.2.4 Data Structures
3.2.5 Vocabulary Indexing
3.2.6 Segmentation
3.2.6.1 Recursive linear segmentation algorithm
3.2.6.2 Textlines, Words, and Characters
3.2.6.3 Segmentation of Words and Characters
3.2.6.4 Determining baselines and midlines
3.2.7 Feature Implementation
3.2.8 Character Classification and Word Identification
4. Results and Discussion
4.1 Results
4.1.1 Initial Trials
4.1.2 An Explanation of Some Common Word Recognition Errors
4.1.2.1 The Proofreading Problem
4.1.2.2 Confusing Common Words for Less Common Ones
4.1.3 Words with Non-Unique Confusion Class Sequences
4.1.4 Classification of All Letters
4.1.5 Recognizing Characters in a Different Font
4.2 Discussion and Evaluation
4.2.1 Adequacy of Features and Classes
4.2.2 Robustness
4.2.3 Merging
5. Conclusions
5.1 Contributions
5.1.1 Toward a Representation for Parafoveal Information
5.1.2 A Mechanism by which Word Frequency Knowledge can Facilitate Parafoveal Word Recognition
5.1.3 The Confusion Class Sequence Constraint
5.1.4 A Flexible Means of Integrating Multiple Sources of Knowledge
5.2 Areas for Further Research
5.2.1 Toward a Computational Theory of Reading
5.2.1.1 Beyond the Single Character Focus
5.2.1.2 Exploiting Other Knowledge Sources
Bibliography
Appendix A. System Vocabulary
A.1 Vocabulary
A.2 Suffixes
Appendix B. Non-Unique Confusion Classes
LIST OF TABLES

Table 2.1 Features proposed by Bouma, ordered by estimated Upper Cue values
Table 3.1 A few entries from the vocabulary confusion class sequence index
Table 3.3 Definition of the confusion classes in terms of the set of features
Table 3.4 The two sets of most commonly occurring English words used
Table 3.5 Results of indexing the 2005 most common English words by their confusion class sequences
Table 3.6 Results of indexing the parafoveal word recognition system's vocabulary words
Table 3.7 The widths in the fovea of the central part of the receptive fields of the largest four retinal spatial frequency channels
Table 3.8 Values of σ, the space constant for the ∇²G operator, calculated according to formula 3.1 with P = 30 pixels and A = 20'
Table 3.9 Word and suffix CC indexes
Table 3.10 Implementation of Character Features
Table B.1 Vocabulary words which do not have unique confusion class sequences

LIST OF FIGURES

Figure 3.1 (b) illustrates the zero-crossings (points of intensity change) computed from image (a) by the ∇²G operator with σ = 4.7 pixels
Figure 3.2 Schematic of the Parafoveal Word Recognition System
Figure 3.3 Composition hierarchy of object types used to represent the objects found in the input image
Figure 3.4 Internal representation of some objects
Figure 3.5 Coordinate System
Figure 3.6 Operation of the recursive linear segmentation algorithm
Figure 3.7 Results of applying the recursive linear segmentation algorithm
Figure 3.8 The mid- and base-lines
Figure 3.9 Results of the segmentation phase
Figure 3.10 Prolog code defining some confusion classes
Figure 4.1 Trial with σ = 3.9 (Channel N at centre of parafoveal region) and font Geneva
Figure 4.2 Trial with σ = 4.7, font Geneva
Figure 4.3 The Proofreading Problem (σ = 4.7, font Geneva)
Figure 4.4 Less common words and abbreviations confused with common ones (σ = 3.1, font Geneva)
Figure 4.5 Words with non-unique confusion class sequences (σ = 4.7, font Geneva)
Figure 4.6 Classification of all 26 lower-case letters (σ = 3.9, font Geneva)
Figure 4.7 Trial with characters of font Palatino (σ = 3.9)
Figure 4.8 Classification of all lower-case letters in font Palatino (σ = 3.9)

ACKNOWLEDGMENT

First of all, I would like to thank my supervisor, Richard Rosenberg, for the guidance and support he has given me over the course of the work reported here. I would also like to thank David Lowe for his interest and his comments on an earlier draft of this thesis. Thanks also go to Marc Majka, who was always willing to help in making the software and hardware do what it was supposed to. Finally, I wish to acknowledge the great debt I owe to my family, whose encouragement and support were essential to the successful completion of this undertaking.

CHAPTER 1. INTRODUCTION

The human skill of reading has long been a subject of study within several disciplines. To psychologists, it has been seen as an accessible domain for the study of visual perception; to the pattern recognition community, reading is an ability whose emulation by machines would be of enormous practical value.
Reading involves the use of many different types of knowledge, from that of the internal structure of individual characters to the syntax and semantics of natural language, and would therefore seem a natural domain for research in the fields of Artificial Intelligence and computational vision, which are centrally concerned with the representation and use of knowledge. Although this has not generally been the case, there is now a growing body of psychological evidence as to how the human visual system handles the reading process which can provide insight, guidance and a solid basis for such endeavors.

This thesis is concerned with a single aspect of the complex process of reading. Recent psychophysical evidence suggests that lower resolution information gathered from characters and words falling to the right of the point where the eye is directly fixated, or 'looking at', is used to significantly increase reading speed and in particular may be used to identify familiar words and letter sequences. A natural question to ask is "how can knowledge of familiar words be used in their identification when the low resolution of the visual data may not allow the identification of individual letters?" Results from research in the field of computational vision provide a model of the information computed by the earliest stages of the visual pathway from a text image falling at any point on the retina, and there is additional psychological evidence as to the structural features of characters which are used as perceptual cues in parafoveal vision. This thesis describes an application of these results in the design and implementation of a system which exploits a strong constraint provided by the letter sequences of common English words to recognize these words and common suffixes appearing in a parafoveal image.

Much of the earlier work on word recognition, in both the fields of psychology and pattern recognition, has shared a common characteristic in paying little attention to the role of early processing as a source of rich and useful descriptions of visual information. Brady (1981) has noted the parallel with earlier trends in Artificial Intelligence research, where the difficulty in developing adequate descriptions based on low-level information alone resulted in a trend toward often less than successful "heterarchical" systems which relied on a complex integration of many levels of knowledge in order to resolve ambiguity.

As a model for a new approach, Brady points to the field of computational vision, which has seen considerable success in explaining the operation of early visual processing. The field is characterized in part by an emphasis on determining the purpose and kind of the knowledge required before considering the mechanisms by which it is used, and close attention to evidence suggesting how the processes operate in natural systems. Brady suggests following this approach to develop a "computational theory of reading", and proposes as the next step in this endeavor the search for the means by which low resolution and high resolution visual information is integrated between fixations during reading. This thesis describes an effort following the same approach and proposing an answer to the related question of how low resolution parafoveal information might be employed in the early identification of common words and suffixes.
1.1 SUMMARY OF THE PSYCHOPHYSICAL EVIDENCE

The process of human reading does not proceed by the exhaustive recognition of each individual character in a line of text. This has been known since the late 19th century, when it was observed that a four letter word could be recognized in about the same amount of time needed to recognize a single character (Hull 1986a, p.156). Studies of eye movements during reading have shown that many characters never fall on the fovea—the small area of maximum visual acuity in the retina—and in fact some whole words, usually function words like 'the' and 'and', are frequently skipped altogether. Recent results strongly suggest that lower resolution information obtained when a word falls on the parafovea (the area surrounding the fovea) is integrated with high resolution information obtained on the subsequent fixation to facilitate word recognition. In fact, a certain proportion of words—those that are skipped, for instance—may be recognized on the basis of parafoveal information alone.

There is also experimental evidence as to the nature of the information that can be extracted from the parafoveal image; it apparently includes such features as word length, and for individual characters, large scale features such as the presence or absence of ascenders and descenders. The scale of these features is consistent with the degree of detail which the computational theory of early vision predicts should be available within the parafovea.

Finally, there is evidence as to the kind of knowledge which is applied in processing this low-resolution parafoveal information. One of the most recent eye movement experiments shows that the parafoveal information from the most common or most familiar English words is processed far more quickly than that for low-frequency words—strong evidence that word-level knowledge is being brought to bear.

This set of observations—that the human visual system apparently uses low resolution parafoveal information in word recognition and applies word-level knowledge in its interpretation—suggests a new avenue to be explored in the application of high-level knowledge in text recognition. The design of a system along these lines involves a definition of the information that is available parafoveally, and a determination of the constraints that lexical knowledge makes available for its interpretation.

1.2 A MECHANISM FOR USING PARAFOVEAL INFORMATION IN WORD RECOGNITION

The parafoveal word recognition system described in this thesis operates in the following manner. An implementation of the Marr-Hildreth theory of edge detection is used to produce a description of the information computed by the earliest stage of visual processing from an image of text falling on an arbitrary point within the parafovea. From each 'blob' representing a separate character is extracted a set of large scale features. This information is in most cases not enough to uniquely identify the character, but is enough to place each into one of several 'confusion classes', each representing all letters bearing the same large scale feature or features. One such confusion class, for example, consists of all characters having descenders.

The word is now represented by a sequence of confusion classes, one for each letter. A simple idea now permits identification of the word, if it is within the system's vocabulary of the most common English words. The confusion class sequence is used as a key into this vocabulary, and all those words with matching sequences are retrieved. Common suffixes give rise to characteristic confusion class sequences which are recognized in the same way, by index lookup into a set of common suffixes.

It is of course essential to the feasibility of this scheme that most lookups into the vocabulary retrieve a single word—in other words, that the confusion class sequence provide a strong constraint on the identity of the word. One of the results reported here is that this is indeed a strong constraint.
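The lookup just described can be made concrete with a small Prolog sketch (Prolog being the language used for part of the implementation; see Figure 3.10). The sketch is purely illustrative: the char_class/2 facts, the four-word vocabulary and the predicate names are stand-ins rather than the thesis code, and the class names follow those used later in Table 3.3.

% Hypothetical letter-to-class facts (only the letters needed for the
% example are listed).
char_class(t, tilf).  char_class(h, hkb).  char_class(e, eoc).
char_class(a, aszx).  char_class(n, nm).   char_class(d, d).
char_class(l, tilf).  char_class(o, eoc).  char_class(k, hkb).

% Illustrative vocabulary entries.
vocabulary(the).  vocabulary(and).  vocabulary(took).  vocabulary(look).

% A word's confusion class (CC) sequence is the sequence of the
% classes of its letters.
word_cc_sequence(Word, Seq) :-
    atom_chars(Word, Letters),
    maplist(char_class, Letters, Seq).

% Recognition: the CC sequence extracted from the parafoveal image is
% used as a key, and every vocabulary word with that sequence is retrieved.
recognize(Seq, Words) :-
    setof(W, (vocabulary(W), word_cc_sequence(W, Seq)), Words).

With these definitions, the query recognize([tilf,hkb,eoc], Words) returns the single word 'the', while recognize([tilf,eoc,eoc,hkb], Words) returns both 'look' and 'took', mirroring the kind of entries shown later in Table 3.1.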
CHAPTER 2. APPROACHES TO TEXT RECOGNITION

The first section of this chapter briefly outlines work on text and character recognition that has been done in the field of pattern recognition, and describes an example of an AI approach to the problem. The results from the pattern recognition efforts make it quite clear that the task is non-trivial and that robust performance will require that disparate sources of knowledge be applied. In attempting to do just that, the AI program described exhibits some of the fundamental problems common to other AI systems of its era. This leads into the description of a new approach, outlined in the following section, to the understanding and emulation of perceptual tasks. The final section describes the psychophysical research into the reading process.

2.1 THE PATTERN RECOGNITION APPROACH

The great bulk of the work on text recognition has been directed primarily toward the near-term engineering of practical algorithms for reading machines and has followed what may be termed the 'pattern recognition' approach. Such systems typically rely solely or primarily on character level knowledge, and operate by classifying each individual character on the basis of some set of (usually ad hoc) features extracted from its image. The sub-field is therefore termed 'Character Recognition.' The results of this approach have been mixed and have commonly prompted the conclusion that in order to reliably recognize printing—typeset or handwritten—which is not closely constrained to some expected norm, multiple sources of knowledge must be brought to bear. This section gives a brief overview of the work in this area.

Character recognition systems typically proceed through four stages (sketched in the fragment following this list):

• Segmentation, which involves demarcating, in the digitized image of the input text, each 'blob' which is presumed to be a separate character.

• Preprocessing, which involves size normalization, smoothing and thinning of the character image.

• Feature extraction—the "central issue" in handprint recognition, according to a recent review of the field (Suen et al., 1980, p.473). The features most commonly used are geometrical and topological ones such as strokes and curves of various orientations and the relations between them, the locations of end points, intersections and holes, and so on. The choice of features is ad hoc in the majority of cases, although one group (Blesser et al., 1974) has used the results of a series of psychological experiments to define a set of features which human subjects apparently use in distinguishing ambiguous characters.

• Classification. The character's identity is finally determined by comparing, via some kind of statistical classifier, its feature profile with those obtained from a training set of characters.
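The sketch below restates those four stages as a single Prolog clause. It is illustrative only: every predicate named (segment/2, preprocess/2, extract_features/2, training_cluster/2, distance/3) is a hypothetical placeholder for the stage it represents, not part of any system surveyed here.

% Illustrative four-stage character-recognition pipeline.
recognize_character(Image, Identity) :-
    segment(Image, Blob),                    % demarcate one character 'blob'
    preprocess(Blob, Normalized),            % size normalization, smoothing, thinning
    extract_features(Normalized, Features),  % strokes, curves, end points, holes, ...
    classify(Features, Identity).            % statistical comparison with training data

% One common form of the classification stage: choose the training
% cluster whose prototype is nearest to the observed feature profile.
classify(Features, Identity) :-
    findall(Dist-Id,
            ( training_cluster(Id, Prototype),
              distance(Features, Prototype, Dist) ),
            Scored),
    keysort(Scored, [_NearestDist-Identity|_]).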
Suen et al.'s review came to the following conclusions regarding the performance achieved so far by systems for recognizing handprinted characters: "remarkably high" recognition rates can be obtained "[w]hen trained subjects are used, and the shapes of models are specified with provision to disambiguate easily confused characters" (this means training subjects to write V, for example, in a form that is not confused with U). But "generally speaking, the present [handprint recognition] machines are not very satisfactory", their performance being "somewhat inconsistent and irregular" (Suen et al., 1980, p.482). In considering ways to improve this situation, the review emphasizes the large body of acquired knowledge human readers can bring to bear on the problem, and recommends (among other things) research on the human perception of text, and the application of knowledge at other than just the character level, including "grammatical, linguistical [sic], and contextual information".

In what is a clear indication of the very applications-oriented flavour of the character recognition field, the survey also recommends development of a standard set of character models "which could be comfortably followed by human beings to produce samples easily recognized by machines." (p.482)

2.1.1 Recognition of machine-printed text

Not surprisingly, efforts at recognition of typeset text have been more successful and high performance machines are on the market. Nevertheless, in order to achieve high performance recognition of multiple fonts at different sizes, it is evident that character level knowledge alone is not sufficient.

Kahan et al.'s (1987) account of the development at Bell Labs of such a system provides a very recent illustration. The part of this system which relies on character level knowledge extracts from the text image features such as strokes approximating the character skeleton, stroke crossings, and concavities, and matches these against the feature clusters produced by a training set of characters using a Bayesian statistical classifier. To find and correct errors made by the character classification stage, the system then applies "linguistic context" knowledge, whose main component is a spelling checker but which also includes heuristics such as "No numerals or punctuation marks can occur inside strings of text, and no upper case characters can occur between lower case ones."

Even after bringing to bear these multiple knowledge sources, the system, whose input is high quality printed text scanned at reasonably high resolution, does not yet meet the minimal performance goal of 99.9% correct recognition, or about 3 errors per page—a level comparable with manual entry.
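The quoted heuristics translate naturally into simple post-classification checks. The fragment below is a sketch of that idea only (it is not Kahan et al.'s code) and uses SWI-Prolog's char_type/2 to classify characters.

% A classified character string is flagged as suspect if a digit or a
% punctuation mark appears between two letters, or an upper-case letter
% appears between two lower-case ones.
suspect(Chars) :-
    append(_, [A, B, C|_], Chars),
    char_type(A, alpha), char_type(C, alpha),
    ( char_type(B, digit(_)) ; char_type(B, punct) ).
suspect(Chars) :-
    append(_, [A, B, C|_], Chars),
    char_type(A, lower(_)), char_type(C, lower(_)),
    char_type(B, upper(_)).

For example, suspect([t,h,'3',r,e]) and suspect([w,o,'R',d]) both succeed, marking those classifications for re-examination.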
2.2 AN AI APPROACH

Within the AI community, perceptual tasks, visual or otherwise, have generally been regarded as involving far more than simple classification of an input. Perception has been described as a process of "constructing the most plausible interpretation of an [input] not only on the basis of cues which can be extracted from [it] but also on the basis of prior expectations of what it . . . depicts" (Brady and Wielinga, 1978). Those expectations comprise knowledge of many different kinds, which require representations that are both descriptively adequate, and thus able to capture knowledge in all its richness and subtlety and in a manner that makes explicit the aspects required for the task in question; and procedurally adequate—of reasonable computational efficiency.

This section describes Brady and Wielinga's (1978) program for 'reading' handprinted Fortran coding sheets—one of the few examples of an AI approach to the text recognition problem. Outlined in the next section are Brady's later criticisms of this program, which reflect a trend in AI (and computational vision especially) away from over-emphasis on downward flow of "high level" knowledge in interpretation and toward full exploitation of the information available from early processing.

Brady and Wielinga state that their overall emphasis in developing the program was on defining the kind of knowledge necessary, its representation, and the procedures to employ it effectively in interpreting the input image. Knowledge in the microworld of Fortran coding sheets falls naturally into two categories: that concerning Fortran programming and syntax, and general knowledge about (uppercase) characters and their segmentation. This observation led to division of the program into two distinct processes which interact cooperatively to reach an interpretation. The program was written using an implementation of Minsky's frames (Minsky, 1975).

The Fortran knowledge module operates something like a top-down parser, working through possible Fortran statement types in order to assign roles to the "blobs", which are initially classified as either "alphanumeric-or-bracket" sequences or as "operator-or-punctuation". In what the paper calls a reflection of "the enormous redundancy" present in an image of the text, this simple process performs quite well, often identifying the statement type correctly, and always including the correct one if several possibilities are offered.

The character recognition module is far more complex. Character models are explicit structural descriptions containing the strokes and curves of the character's skeleton and the junctions and relations between them. Strokes are extracted from the character image by a simple edge finder, and curves are recognized as a series of appropriately intersecting short strokes.

The paper describes their approach to a number of fundamental AI problems, such as allowing for model instances to differ from the models, the parallel problem of uncertainty and its representation and manipulation, and the resolution of conflicting evidence. Following the "dictates of the bandwagon of heterarchy", their method is to rely on the incremental and synergistic interaction of the various sources of knowledge to resolve uncertainty and conflict and come up with a consistent interpretation. The observation that both modules of the program almost always work in a state of partial knowledge led to a design in which partial percepts or intermediate hypotheses are explicitly represented and are incrementally refined by finding at each step the information which can be most cheaply computed. The authors point out that this approach is in marked contrast with that of many AI programs which tend to create high-level hypotheses based on very little evidence and are then often forced to backtrack.
In practice, however, they found that it forced them to represent explicitly a "surprisingly" large number of intermediate levels of interpretation.

One major problem was the difficulty in computing junctions and the resulting large degree of uncertainty associated with them. This was apparently such a serious problem that they were prompted late in the game to completely abandon the stroke and junction based character models and to begin developing an entirely new and more robust representation based on the shape of the space a character occupies. (This new representation was only partially described in their paper.) It later became apparent, however, that the problems were of a more fundamental nature.

2.3 THE BASIS OF A NEW APPROACH

In a later paper, Brady (1981) is sharply critical of the Fortran reading program, placing it within the genre of complex and less than successful "heterarchical" AI systems characteristic of the 1970's, such as Hearsay-II (Lesser and Erman, 1977) and Margie (Shank et al., 1973), all of which shared the underlying premises that low-level information is inherently ambiguous, that ambiguity can be resolved only by application of high-level domain specific knowledge, and that this interaction is situation dependent, cannot be fully defined in advance, and is therefore best implemented in the form of process interactions. Brady points out several problems that have arisen with this approach. First, useful theories governing process interactions have failed to emerge, and such interactions remain difficult to control, analyze and understand. Secondly, to quote Brady, "the presumed power of heterarchy never materialized. It repeatedly became evident that a small increase in the early processing capabilities of a program could have a far greater impact on the performance of a program as a whole than a vastly greater amount of 'higher level reasoning'." (p.188) A final, and perhaps the key, criticism is that this approach has emphasized the mechanism by which knowledge and information from different sources interacts, without first addressing the crucial issues of precisely what knowledge is necessary to perform the task, how it should be represented in a way that makes explicit the required aspects, and finally, what algorithms can make use of it in a way that achieves the desired performance.

As indicative of a new approach, Brady points to achievements in the development of a computational theory of early vision, a field heavily influenced by the work of David Marr (1982). This field has seen considerable success in producing theories and implementations of such (human) visual processes as edge detection, stereo, shape from shading, and early motion detection. The research methodology producing these results suggests isolating perceptual abilities which psychological evidence suggests occur in humans on the basis of early processing, and places emphasis on defining exactly what information must be extracted from the input in order for the observed perceptual ability to be possible, and on then describing the general, domain independent knowledge of the world which must be assumed in order that the system can extract this required information.

Flowing from this are two important observations having implications for research into the process of reading (from the perspectives of both AI and psychology) and into perceptual tasks in general.
First is the demonstration of the power of early processing—that (for vision at least) rich descriptions can be extracted, and significant ambiguity resolved, on the basis of quite general, domain independent knowledge of the world and without application of downward flowing high level knowledge. This implies of course that when high level knowledge is brought to bear, it deals with more developed and less ambiguous descriptions than was typical in programs in the heterarchy genre, the presumed result being less thrashing and much better performance.

The second observation is the research methodology itself. It is characterized in part by reliance on studies of the human visual system to isolate perceptual abilities and outline the information which they require. It then follows three steps: The first, as outlined above, is to determine the information the system must extract from the input, and the general assumptions about the world the system must make in order to do so. Second, a representation must be designed which makes the required information explicit (rather than leaving it implicit); and only then are algorithms devised to perform the task. The final step is to test the system to see how well it explains the ability observed in the natural system.

2.4 APPLYING THE NEW APPROACH IN UNDERSTANDING READING

Having illustrated the shortcomings of past efforts in developing AI recognition programs, including his own, Brady sets out to use this new approach to begin developing a computational model of the human skill of reading. He considers first the problem of isolating words from text. Psychophysical evidence suggests that inter-word spaces are discovered on the basis of low resolution parafoveally¹ obtained information. Brady uses the Marr-Hildreth theory of edge detection as a means of describing precisely what that information is—ie. how the text appears at the resolution available within the parafovea. Based on this, he postulates that the word isolation algorithm is quite crude and depends on recognizing easily found gross features of the inter-word character or location (such as size or overall shape) which make them distinct from the surrounding text in the parafoveal representation. Unlike earlier theories on word isolation put forward in the psychology literature, this scheme does not depend on high level syntactic or semantic knowledge, and as Brady demonstrates, it is able to account for the empirical evidence and its predictions are confirmed by new experiments.

It is also of interest that Brady's scheme works on the basis of the lower resolution information available in the parafovea, rather than assuming access to fully detailed character information.

Brady considers the next step in the development of a computational theory of reading to be the problem of integrating information from one fixation to the next—in particular, of integrating low resolution information obtained in the parafovea with high resolution information obtained when the same text is fixated foveally. He cites the work of psychologist Keith Rayner as providing the experimental basis for such a study, and points out cases where the analysis of the nature of the parafoveally available information suggests explanations of results obtained by Rayner.
¹ The fovea is the small area of maximum visual acuity subtending 2° at the centre of the retina; the parafoveal region is the surrounding area of lesser acuity subtending 10°; the remainder is the periphery (Rayner and Bertera, 1979, p.468).

Brady goes on to postulate that it is the large scale features of characters which form the basis of the human word representation scheme: "It is not inconceivable that we have learned that such shape information at the extremities of words and from isolated ascenders and descenders within a word are preserved over a typical 2° saccade, and have based our word representation scheme, which develops over several such saccades, and the corresponding processes for eliciting substructure, upon it." (p.205-6) It was this suggestion that prompted the development of the system described in this thesis. More recent experiments by Rayner and others give further clues as to how we use parafoveal information during reading.

2.5 OVERVIEW OF THE PSYCHOPHYSICAL EVIDENCE

2.5.1 The Work of Rayner and Colleagues

This section will briefly describe the recent work of experimental psychologist Rayner and his colleagues which provides a basis for the system described in this thesis. As mentioned earlier, these experiments provide evidence that parafoveal information is used in identifying words, that large scale letter features are extracted from it, and that lexical knowledge, specifically knowledge of word frequency or familiarity, is used in its processing and interpretation.

As all of these experiments are based on the study of eye movements during reading, some background in this area is in order. During normal reading the eye moves across text in a series of sudden and rapid—saccadic—movements which occur at a rate of 4 or 5 per second and typically extend over 7 or 9 characters. The pauses between saccades, when the eye is relatively still and during which information is gathered, are termed fixations. As explained earlier, the fovea is the area in the retina of maximum visual acuity and extends 2° across the fixation point; the parafovea is the surrounding area of lesser acuity with an extent of 10°; and the remainder is termed the periphery. In the experiments described here the fovea covered about 6 characters. Underlying all of this research is the fundamental assumption that the pattern and duration of eye movements and fixations during reading reflects the underlying processing of that information.

Rayner pioneered a technique which allows text to be dynamically altered as it is read, depending on where in the subject's visual field it falls during a fixation. This involves accurately tracking the subject's eye position as text is read from a CRT and making the alterations while the eye is moving between fixations. Thus a word (or part of one) can be displayed differently when it falls on the parafovea than when it falls on the fovea during the subsequent fixation. The technique allows such experiments as studying the effect on reading speed of denying all parafoveal information about words (except their length) by displaying them as strings of x's when they appear in the parafovea but normally when they fall on the fovea.

Through a series of such experiments, evidence has progressively been uncovered as to what information is extracted from the parafovea during reading and how it is used.
The earlier results, such as those cited by Brady (above), showed for instance that letter information useful in word identification, including overall word shape and that of the first and last letters, could be picked up from the parafovea, as could word length information useful in determining the next fixation location (see Rayner, 1975).

Later experiments (for example, Rayner et al., 1982) showed, first of all, that this parafoveally extracted letter information is important—when it was removed, reading speed was cut by 40%. They also provided evidence as to the nature of the information: the letters making up parafoveal words² were replaced by letters which were either visually similar or dissimilar in gross overall shape according to letter groups defined by Bouma (1973) on the basis of large scale features such as ascenders and descenders. For example the letters b, k and h all have left ascenders and so fall into the same group. It was found that when the first 3 letters of a parafoveal word were correct and the remainder were replaced by similar ones, the reading rate was almost as high as when the entire word was correct. Rayner et al. saw this as suggesting "that the visual information extracted from the end of the word to the right of fixation either may not be very precise or may not be used very frequently." (p.547)

² The parafoveal word is the word immediately to the right of the currently fixated word.

The further observation that the replacement by similar letters, as opposed to dissimilar ones, has a significant effect on reading rate supports the former interpretation, and leads to the hypothesis that the 'imprecise' information extracted is in fact that which differentiates similar and dissimilar letters—ie. the large scale character parameters. Recall Brady's observation that just such information should be available at the level of visual acuity present in the parafovea. Rayner et al. (1982) argued that this parafoveally extracted information is used to speed lexical processing—it may be enough in itself to allow word identification, and often it is integrated with information obtained when the word is fixated foveally.

So, there is evidence as to the nature of the information extracted from the parafoveal word, and that lexical knowledge is used in processing it. A natural question at this point is 'What sort of lexical knowledge?'

A recent paper by Inhoff and Rayner (1986) provides strong evidence that word frequency or word familiarity has a marked effect on the processing of parafoveal information. Sentence pairs were concocted such that the sentences in each pair were identical except that a target word was a high frequency word in one, and a low frequency word in the other. The pair of target words was matched so that effects of factors other than frequency (such as word length) would be minimized. As the sentences were presented to the subjects, the target word was made available parafoveally in some cases, and in other cases was not. The results of measurements of the duration of the first fixation on the target word were striking—when parafoveal information was denied, there was virtually no difference in duration between high and low frequency words. But when parafoveal information was available, the first fixation duration was significantly less for high frequency words.
Gaze duration (the total duration of all fixations on the word) was significantly less for high-frequency words whether parafoveal information was available or not. Inhoff and Rayner interpreted their results as "support[ing] the position that parafoveal word processing is sensitive to the lexical characteristics of the parafoveal word and that high-frequency parafoveal words are processed more effectively than low-frequency parafoveal words." (p.436)

2.5.2 Character Features Used as Perceptual Cues in Parafoveal Vision

The work of Bouma (1971) provides evidence as to what properties of characters the human visual system uses as perceptual cues in recognizing text appearing outside the fovea. Bouma's experiment investigates the perceptual cues or features mediating the recognition of isolated lowercase letters in marginal reading conditions. Subjects were required to identify letters presented either (1) at a long distance, or (2) in eccentric vision (ie. falling on the parafovea or periphery), in order to induce a good fraction of incorrect responses or "confusions", as it is from the nature of these confusions that it can be deduced which letter features have been used in recognition and which have not.

The data sets chosen for analysis were those for which distance and retinal eccentricity, respectively, were such that the rate of correct responses was 50%. (Eccentricity in this case was 7°, which corresponds to 2° beyond the far edge of the parafovea, within the periphery.) The 26 lower-case letters were in the typeface 'Courier 10' as printed by an IBM Selectric typewriter. From the relatively high overall level of confusions, Bouma concluded that observers "readily use available cues for arriving at letter responses."

Results for distance reading and eccentric reading were compiled into two confusion matrices which record the number of times each letter was given as a response to each stimulus letter. Based on the results, Bouma placed letters into the following groups, among which confusions were more or less common: aszx eoc nmu rvw dhkb tilf gpjyq.

There is no direct evidence in these results indicating which character properties or features have been used by the subjects or, of course, "what cue combinations constitute the observer's implicit knowledge of a certain letter form." Bouma therefore proposes certain features which are shared within groups of confusable letters, and for each feature determines from the data an estimate of the upper bound of its cue value—an estimate of the degree to which perception of that feature determines the response (see Table 2.1).

Property considered | Cue | Property letters considered | Cue letters considered | Upper cue value
H > 2.5 mm or H/W > 1.6 | High letters | all letters with ascenders or descenders | all letters with ascenders or descenders | 0.92
Oblique outer parts | Obliques | vw | vw | 0.92
H/W > 1.22 | Slender letter | tilfj | tilfj | 0.90
Left upper extension | Left upper ext. | hkb | hkb | 0.89
Right upper extension | Right upper ext. | d | d | 0.88
H < 2.0 mm or H/W < 1.16 | Short letter | all small letters | all small letters | 0.84
Lower gap (0.4 mm) | Lower gap | n | rm | 0.80
Two vert. outer parts (short letters) | Double vertical | nmu | nmuh | 0.77
Left part envelope circular | Left part round | eoc | eocdqag | 0.74
Upper dot | Dot | i | ij | 0.70
Right outer gap, no inner part | Right gap | c | ecszxrtf | 0.66
Upper gap (0.5 mm) | Upper gap | u | uvw | 0.65
Rectangular envelope (short) | Rect. letters | aszxnmu | aszxnmu | 0.65
Circular envelope | Round | eoc | eoc | 0.63
Presence inner part (short) | Something inside | aszxemw | aszxemw | 0.60
Right outer gap + inner part | Right gap | e | ecszxrtf | 0.38

Table 2.1. Features proposed by Bouma, ordered by estimated Upper Cue values. H is letter height, W is width. (From Bouma, 1971, p.466.)

The upper cue value is the fraction of responses to a stimulus letter possessing a feature to which perception of that feature may have contributed. It is calculated as the fraction of correct responses plus the fraction of incorrect responses where the incorrect response is nevertheless another letter possessing the same feature.
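Written out (this is a paraphrase of the calculation just described, not a formula quoted from Bouma), the estimate for a feature f is

\[
\mathrm{UCV}(f) = \frac{C_f + E_f}{N_f},
\]

where, over all presentations of stimulus letters possessing f, N_f is the total number of responses, C_f the number of correct responses, and E_f the number of incorrect responses that are nevertheless letters also possessing f.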
Bouma concludes that of these features which may have served as perceptual cues in conditions of distance and eccentric vision, those with the strongest upper cue values are: height-to-width quotient, vertically ascending and descending parts, outer vertical and outer oblique parts, and outer gaps. Inner parts of characters are apparently weak cues at best.

He also concludes that these results would be reasonably representative of other fonts—the observation that details closer together than about 0.5mm (the height of short letters was 1.95mm) could not be perceived in the experiments suggests that while "the results will be somewhat different, in particular for typefaces without serifs", one should "not expect dramatic differences."

It is of some concern that, as Bouma himself points out, there is no direct evidence that the character properties he proposes are actually used by the human visual system. Haber and Haber (1981, p.165) make this point about Bouma's features more forcefully: "the procedures described here are ad hoc, in that there is no guarantee that all potentially useful features have been isolated, nor any evidence that any of those features are actually used by readers. Presentation of a confusion matrix, or a list of similarity scale scores, tells us which letters are confusable or similar, but not in what ways. Therefore, even after a century of research on the visual characteristics of letters, we yet know little about what those characteristics actually are."

It is also of concern, as Bouma again points out, that while "[i]n normal reading, the role of single letters is perhaps overshadowed by that of larger units ... [w]e work here on the assumption that research on single letters will eventually help to clarify the special role of letter combinations." (p.459)

While these results appear to provide one of the best estimates currently available, clearly more work in this area is required.

CHAPTER 3. CONCEPTS, DESIGN, AND IMPLEMENTATION

This chapter describes in detail the conceptualization and implementation of the parafoveal word recognition system. The first section develops the ideas which form the basis of the system; the second describes the implementation.

3.1 A MECHANISM FOR PARAFOVEAL WORD RECOGNITION

The basic idea behind the system arose out of a consideration of the information the human visual system apparently extracts from word images falling on the parafovea, and the knowledge which it brings to bear in processing that information.
Michael Brady considered the "next step" in developing a computational theory of early visual processing in reading to be the problem of integrating information obtained parafoveally with that obtained foveally, and suggested that the word representation scheme needed to support this might be based on such large scale shape information as isolated ascenders and descenders. Rayner et al.'s (1982) finding of the facilitative effect on reading of substituting "visually similar" letters in parafoveal words supports this idea, and provides further evidence as to what this large scale shape information might be—not just ascenders and descenders, but also the other large scale features which Bouma's confusable letter groups share. Finally, Inhoff and Rayner's (1986) work provides evidence that knowledge of whole words is used in processing parafoveal information, and that the parafoveal information from high frequency words is processed more quickly than that for infrequently encountered words.

At this point, it was necessary to ask only how knowledge of whole words might interact with the large scale character information available in the parafovea for the confusion class indexing scheme described here to suggest itself. If information extracted from a word's image by parafoveal processing is not good enough to identify a character uniquely, but only to place it within a group of similar characters—a confusion class (CC for short)—then representing each word by the sequence of the confusion classes into which its characters fall may provide a simple and fast method for recognizing words and letter sequences.

The process would operate in the following manner. A predefined set of words—the system's vocabulary—would be indexed by their confusion class sequences (see Table 3.1). In line with the evidence from Inhoff and Rayner, the vocabulary would contain only high frequency words. During the recognition phase, each character of a parafoveal word would be classified into one (or more) confusion classes on the basis of its large scale features extracted from parafoveally available information, and the sequence of these confusion classes would then be used as a key into the index for retrieval of the indexed word (or words). If the word's CC sequence is not in the index—and a required property of the indexing scheme is that most words not in the vocabulary do not share the same CC sequences with words that are—its recognition would be the job of more complex processes relying on higher resolution foveal information, although constraints provided by the CC sequence may be of use at this next level.

CC sequence index key | corresponding words
[tilf,bhk,eoc] | (the)
[aszx,mn,d] | (and)
[tilf,eoc,eoc,bhk] | (took look)

Table 3.1. A few entries from the vocabulary confusion class sequence index.

Commonly occurring suffixes, prefixes, and other letter sequences could be indexed and recognized in the same way.

This scheme for quick recognition of common words and letter sequences may be expected to possess certain desirable properties:

1) The features on which the character classification is based are simple, large scale shape properties of characters rather than their finer details, and there is empirical evidence of their use by the human visual system.
It is therefore probable that such a set of features and the confusion classes based upon them captures some intrinsic and invariant properties of character forms, and that a recognition scheme based upon them should be relatively insensitive to changes in font and/or style.

2) The scheme should be fairly robust with respect to errors or difficulties in the classification of any individual character. First, there is a small number of confusion classes and the features are presumed to be based on relatively invariant character properties, so a fair degree of variation from the norm should be tolerated before a letter would be misclassified. Secondly, the index lookup is done using the whole sequence of confusion classes, one for each character. With this context available, a well designed lookup procedure should have a good chance of retrieving the correct result even if one or two of the characters have multiple, wrong, or missing classifications.

The following sections describe some of the key issues behind this mechanism for parafoveal word recognition, including defining the features and confusion classes, determining the degree to which words index uniquely by these classes—a factor critical to the feasibility of the system—and modeling the information available in the parafovea which forms the input to the system.

3.1.1 Character Features and Confusion Classes

3.1.1.1 Features

The character features used by this system are essentially a subset of the features described by Bouma (section 2.5.2), consisting of those having the highest upper cue values which are sufficient to discriminate among the chosen confusion classes. (Recall that the upper cue value is a measure of the degree to which the presence of a feature determines the correct classification of the character.) There are two areas where the features described here differ from those of Bouma.

1) Bouma's experiments involved the recognition of single, isolated characters. Without the context provided by adjacent characters arranged on a line, it is not possible to establish reference lines which allow one to easily pick out the ascenders and descenders. Bouma therefore defined only the features tall_letter (comprising characters having an ascender or descender) and short_letter, both of which have high cue values. The parafoveal word recognition system will encounter one or more words per line and can therefore assume the presence of enough characters to establish a baseline and midline (see section 3.2.6.4) and thus easily distinguish between characters with ascenders and those with descenders.

2) Bouma's feature two_vertical_outer_parts has been split into two: left_vertical_outer_part and right_vertical_outer_part.

3.1.1.2 Confusion Classes

The confusion classes selected for the parafoveal word recognition system are shown in Table 3.3. They are similar to those described by Bouma but not identical. As described earlier, Bouma formed his confusion classes by grouping letters among which confusions were more or less common. He grouped them first into short letters, letters with ascenders, and letters with descenders, and then further subdivided the short letters into 4 subgroups: aszx eoc nmu rvw, and the ascenders into 2 subgroups: dhkb and tilf. His full set of classes was therefore: aszx eoc nmu rvw dhkb tilf gpjyq.
The differences between Bouma's confusion classes and those defined here are as follows:

1) r in a separate class. Bouma's grouping of r with v and w seems odd. These letters do not share confusions to any significant degree, especially within his Eccentric Vision confusion matrix, and unlike r, v and w are laterally symmetric and (in most fonts) have oblique_outer_parts, a feature with a high cue value. Since r does not seem to fit any better into any other small letter class, it was decided to place it in a separate class by itself.

2) d separated from hkb. The features distinguishing d from h, k, and b—left_upper_extension and right_upper_extension—are strong ones, so Bouma's single class was split into two.

3) u grouped with vw rather than with nm. This change was made for two reasons: First, although u, m, and n share the property two_vertical_outer_parts, the feature upper_gap (and/or lower_gap), which discriminates u, v, and w from n and m, is quite strong. Secondly, in recognition of handprinted characters, u and v are the most frequently confused pair of letters (Suen & Shillman, 1977).

Confusion classes (columns): aszx (subdivided into asz and x), eoc, r, nm, uvw (subdivided into vw and u), d, hkb, tilf, gpjyq
small: T T T T T T T
ascender: T T T
descender: ¬ ¬ T
left_upper_extension: T
right_upper_extension: T
slender_character: ¬ ¬ T
left_vertical_outer: ¬ T T
right_vertical_outer: T T
upper_gap: ¬ T T
lower_gap: ¬ T T T ¬
oblique_outers: ¬ ¬ T
left_round: ¬ ¬ T ¬ ¬

Table 3.3. Definition of the confusion classes in terms of the set of features. A 'T' in the column under a confusion class indicates a character must possess the feature to be included in the class; a '¬' indicates it must not possess the feature. The absence of any entry indicates the feature is not relevant for the particular CC. These features are similar to those defined by Bouma (1971) (see Table 2.1). The features small, ascender, descender, left_vertical_outer, and right_vertical_outer differ as described in the text.

3.1.2 The Most Common English Words

The set of the most commonly occurring English words used here was taken from the Standard Corpus of Present-Day Edited American English, a corpus of language texts assembled at Brown University in the early 1960s which has become something of a standard in statistical studies of English (Kucera & Francis, 1967). The 'Brown Corpus', as it is commonly known, is made up of samples from edited American sources such as newspapers and magazines, scientific writings, fiction, and biography, all printed in 1961. It is 1,014,232 words in total length and contains 50,406 distinct words.

The rank listing—the listing of words in order of their frequency of occurrence in the corpus—provides the source for the most common words. Table 3.4 describes the two sets of words used here. The most common 1030 words are used as the vocabulary of the system, and the most common 2005 were used to test the effectiveness of CC sequence indexing. Some corpus words were not included in these sets, including numerals (e.g. '1960') and those words containing apostrophes.

set name | # words in set | corpus frequency | corpus rank | fraction of distinct words | fraction of running words
Vocabulary | 1030 | >100 | 1070 | 2.123% | 69.537%
first 2005 | 2005 | >54 | 2081 | 4.128% | 76.66%

Table 3.4. The two sets of most commonly occurring English words used.
The set Vocabulary (the words used in the implementation) consists of all 'usable' words (those which are not numerals or symbols and do not contain apostrophes) occurring > 100 times in the Corpus. These words make up 2.123% of the 50,406 distinct words in the Corpus, and fully 69.537% of the 1,014,232 running words.

3.1.3 Effectiveness of Indexing by Confusion Class Sequence

The effectiveness of the proposed system depends in large part on the degree to which common english words have unique confusion class sequences. If most common-word CC sequences are not unique, then of course many lookups of parafoveal words will return multiple answers, and the scheme for quick word recognition will be of questionable usefulness.

It is also important that a large fraction of 'medium frequency' words—the more frequently occurring of the words not in the system's vocabulary—have confusion class sequences which differ from those of the vocabulary words. This will ensure that most words not in the vocabulary will simply fail to match rather than produce an erroneous match.

These conditions were tested in the following manner: The most common 2005 english words, representing 76.66% of the running words in the Brown Corpus, were indexed by their confusion class sequences, and the number of words indexed under each distinct sequence was recorded. If there are few non-unique CC sequences among this group as a whole, then there will be few cases where a word in the first 1000 words (those in the vocabulary) shares a CC sequence with a word in the second 1000 (a sample of 'medium frequency' words). The results are summarized in Table 3.5. It shows that 1842—almost 92%—of the CC sequences are unique (ie. refer to only one word each of the 2005).

  words per CC sequence   frequency   % of 2005 words
  1                       1842        91.9
  2                       67          6.7
  3                       7           1.0
  4                       2           0.4

Table 3.5. Results of indexing the 2005 most common english words by their confusion class sequences. Almost 92% of these words have unique CC sequences.

These results are encouraging. Among the 2005 most commonly occurring words, fully 91.9% have unique CC sequences. And these words can be expected to make up 77% of the words encountered by a word recognition system in an average sample of text (assuming, of course, that the Brown Corpus is representative of an "average" sample).

The same test was run on the first 1030 most common words—the set of words used as the implementation's vocabulary. The results, shown in Table 3.6, are very similar. The same high fraction of words—91.9%—have unique confusion class sequences.

  words per CC sequence   frequency   % of 1030 words
  1                       947         91.9
  2                       34          6.6
  3                       5           1.5

Table 3.6. Results of indexing the parafoveal word recognition system's vocabulary words—the most common 1030 english words—by their confusion class sequences. Again, almost 92% of these words have unique CC sequences.

It seems clear then that among the most commonly occurring english words, a word's confusion class sequence is a strong constraint on its identity.

3.1.4 Modeling Parafoveal Information Computed by the Human Visual System

This section describes the use of a recent theory of human early visual processing to model the information available in the parafovea during reading. This information will form the input to the parafoveal word recognition system.
Experimental evidence suggests the existence at the retinal level of the human visual pathway of four (and probably five) spatial frequency (size-tuned) channels, each of which is sensitive to intensity changes in the image occurring at a particular spatial scale and orientation. Thus at each point on the retina, the incident image is being analyzed for intensity changes by at least four separate channels. Table 3.7 shows the scale of the stimulus to which each channel is most sensitive.

  Channel   w1-D (')   Properties
  N         3.1        sustained response, most sensitive
  S         6.2
  T         11.7       transient response, least sensitive
  U         21

Table 3.7. The widths in the fovea of the central part of the receptive fields of the largest four retinal spatial frequency channels. The notation w1-D refers to the width in minutes of visual angle of the linear projection of a circularly symmetric receptive field. The diameter of these fields w2-D is therefore √2·w1-D. (Marr 1982, p.62.)

There is somewhat more recent evidence suggesting that the fifth, finest resolution channel "plays the most important role in determining the information ... compute[d] foveally." (Brady, 1981, p.193)

The receptive field sizes apparently scale linearly with eccentricity (angular distance from the centre of the fovea) and are about doubled at an eccentricity of 4°. The parafovea extends from an eccentricity of 1° to 5°.

Marr (1982) puts forward the case that the ∇²G operator provides the mathematical basis for the computation performed by these channels.¹ ∇²G was proposed by Marr and Hildreth (1980) as an operator for finding intensity changes occurring at various spatial scales in an image (and is generally known as the Marr-Hildreth edge detection operator). They claimed it was a desirable operator for two reasons: first, the Gaussian part of it, which blurs the image thus tuning it to intensity changes of a particular spatial scale according to the setting of its space constant σ, is the optimal function for this purpose because it is optimally localized in both the spatial and frequency domains and so minimizes the introduction of artifacts. Secondly, the ∇² part offers economy of computation over other possible choices of a spatial differential operator.

¹ ∇² is the Laplacian operator (∂²/∂x² + ∂²/∂y²) and G is the 2-dimensional Gaussian distribution G(x,y) = exp(−(x²+y²)/(2πσ²)).

By applying the ∇²G operator to a digital image, one can determine the information which, according to this theory, is computed by a given retinal spatial frequency channel at any given eccentricity, and in particular, that which is computed within the parafovea. All that is required is to determine the space constant σ of the ∇²G in order to reflect the choice of retinal channel, eccentricity, and image digitization and viewing geometry parameters.

As the channel widths scale linearly with eccentricity e, and are roughly doubled at e = 4°, the width in minutes w1-D at eccentricity e of a channel which subtends m minutes of arc in the fovea is:

  w1-D = ((e + 2)/3)·m    (e ≥ 1°).

This assumes that the channels are of constant size throughout the fovea, which has a diameter of about 2° of visual angle; ie. it extends from e = −1° to e = 1° (Rayner and Bertera, 1979, p.468). The widths subtended by the four channels within the fovea were listed in Table 3.7.
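Written out explicitly, the ∇²G operator obtained from the footnote's definition of G (which follows Marr in placing 2πσ² rather than the more usual 2σ² in the exponent) is

  \nabla^2 G(x,y) \;=\; \frac{1}{\pi\sigma^2}\left(\frac{x^2+y^2}{\pi\sigma^2} - 2\right)\exp\!\left(-\frac{x^2+y^2}{2\pi\sigma^2}\right),

so the mask is negative at its centre, crosses zero on the circle x² + y² = 2πσ², and the zero-crossings of an image convolved with it mark the intensity changes occurring at the scale selected by σ.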
Let s1-D be the width of the central part of the receptive field of the ∇²G operator, expressed in pixels. Then σ = s1-D/2. (Recall that σ is the space constant of ∇²G and the standard deviation of the Gaussian G.)

In order to have the operator model the desired information, we set its receptive field size to that of the chosen channel at the chosen eccentricity:

  s1-D = w1-D·K,

where K is a constant (at small angles A) with units of pixels per minute of visual angle and is determined by dividing the number of pixels in the image subtended by an object (say, by the width of an average character) by the visual angle subtended by that object in the viewing situation being modeled. So,

  K = P/A,

where P = width of an average character (pixels), and A = visual angle subtended by the width of the average character (minutes). In a normal reading situation there are about 3 characters per degree of visual angle, so A = 20'. So finally we have

  σ = w1-D·P/(2A)    (3.1)

where

  w1-D = ((e + 2)/3)·m    (e ≥ 1°),
  e = retinal eccentricity at which the character image strikes the retina (°),
  m = width of the retinal spatial frequency channel in the fovea ('),
  P = width of an average character in the digital image (pixels), and
  A = visual angle subtended by the width of the average character (').

Table 3.8 shows values of σ corresponding to various eccentricities within the parafovea for the two highest resolution spatial frequency channels.

  Eccentricity   Channel width     w1-D at e   σ
  e (°)          in fovea m (')    (')         (pixels)
  1.0            3.1               3.1         2.3
  1.5            3.1               3.6         2.7
  2.0            3.1               4.1         3.1
  2.5            3.1               4.7         3.5
  3.0            3.1               5.2         3.9
  3.5            3.1               5.7         4.3
  4.0            3.1               6.2         4.7
  4.5            3.1               6.7         5.0
  5.0            3.1               7.2         5.4
  1.0            6.2               6.2         4.7
  1.5            6.2               7.2         5.4
  2.0            6.2               8.3         6.2
  2.5            6.2               9.3         7.0
  3.0            6.2               10.3        7.8

Table 3.8. Values of σ, the space constant for the ∇²G operator, calculated according to formula 3.1 with P = 30 pixels and A = 20'.

An example of the results of applying the ∇²G operator to an image of text is illustrated in Figure 3.1.

Figure 3.1. (b) illustrates the zero-crossings (points of intensity change) computed from image (a), the text 'parafoveal vision', by the operator with σ = 4.7 pixels. From Table 3.8, this models what is computed by the finest retinal channel (3.1') at an eccentricity of 4.0° (ie. within the outer part of the parafovea), or by the second finest channel (6.2') at eccentricity 1.0°. Note that the dot and stem of the letter 'i' are merged, making the letter difficult to distinguish from 'l'.

3.2 IMPLEMENTATION

This section describes the implementation of the parafoveal word recognition system.

3.2.1 System Overview

Figure 3.2 presents a schematic of the overall system. The main components are:
• parafoveal text image preparation,
• segmentation of the parafoveal image into separate lines, words, and characters,
• feature extraction and character classification, and
• word and suffix identification.

3.2.2 Platform

The system was implemented on SUN 3/50 workstations² and a SUN 3/260, all running 4.2BSD UNIX³. The main body of the system, from the segmentation component on, is written in GProlog⁴, a modification of CProlog 1.5 with the addition of primitives allowing access to the SunCore⁵ graphics package available on Sun workstations.
The image manipulation programs used to compute the parafoveal image are written in C and are part of the standard library of the UBC Laboratory for Computational Vision. The original input characters are produced from standard bitmap fonts available on an Apple Macintosh⁶.

² SUN Workstation is a trademark of SUN Microsystems Inc.
³ UNIX is a trademark of AT&T Bell Laboratories.
⁴ GProlog was implemented by Barry Brachman.
⁵ SunCore is Sun Microsystems' implementation of the CORE graphics standard.
⁶ Macintosh is a trademark of Apple Computer, Inc.

Figure 3.2. Schematic of the parafoveal word recognition system. [The figure shows the processing pipeline: Marr-Hildreth edge detection applied to the text image (parameters: visual channel and eccentricity); segmentation of the zero-crossings, with determination of text baselines and midlines and segmentation into letters; feature extraction and classification of each segmented letter into a confusion class according to the class definitions; the resulting confusion class sequence (eg. [uvw,eoc,r,d,aszx]) used as a key into the vocabulary of the 1030 most common english words indexed by confusion class sequence (eg. [tilf,hkb,eoc] retrieves [the]; [tilf,eoc,eoc,hkb] retrieves [took, look]), 92% of the 2000 most common english words indexing uniquely under the chosen set of CCs; and the trailing CCs used as a key into the index of common suffixes (eg. [aszx] retrieves [s]; [tilf,mn,gpjyq] retrieves [ing]; [tilf,tilf,eoc,mn] retrieves [tion]).]

3.2.3 Computing Parafoveal Information

The use of the Marr-Hildreth edge detection operator (∇²G) to model the information computed in the parafovea was described in detail in section 3.1.4. Figure 3.2 illustrates schematically the steps involved in this system in producing parafoveal information from the text image.

3.2.4 Data Structures

The heart of the parafoveal word recognition system is the Prolog program. It accepts as input the zero-crossings found in the parafoveal image, represented as chains of points, and attempts to identify the word or words contained in the original image and their possible suffixes according to its pre-defined vocabulary.

The data structures used are quite simple. The objects found in the image and their properties are represented by the object types shown in Figure 3.3, each of which can have certain attributes. They form a simple composition hierarchy: thus an image consists of one or more textlines (a textline is, not surprisingly, a line of text), a textline consists of one or more words, and so on. An schain is a segmented chain (described below) and consists of only one chain.

  Image
   |
  Textline
   |
  Word
   |
  Character
   |
  schain
   |
  chain

Figure 3.3. Composition hierarchy of object types used to represent the objects found in the input image.

Figure 3.4 shows some typical instances of various object types with their attributes.
  readingG.39:
    type: image            font: geneva72        sigma: 3.9
    textlines: [textline1, textline2, textline3, textline4]
    schains: [schain1, schain2, schain3, ...]
    max_extent: 443.0,543.0     min_extent: 21.0,17.0
    edge_search_rect_width: 7

  textline1:
    type: textline         part_of: readingG.39  index: 1
    words: [word1, word2]
    schains: [schain5, schain10, ...]
    enclosing_schains: [schain5, schain10, ...]
    max_extent: 100.0,543.0     min_extent: 21.0,21.0
    midline_X: 33.0             baseline_X: 82.0

  word1:
    type: word             part_of: textline1    index: 1
    chars: [char1, char2, char3, char4, char5, char6, char7]
    enclosing_schains: [schain5, schain10, ...]
    max_extent: 100.0,291.0     min_extent: 21.0,21.0

  char1:
    type: char             part_of: word1        index: 1
    schains: [schain5]
    max_extent: 83.0,54.0       min_extent: 33.0,21.0
    height: 50.0                width: 33.0
    descender: fail             ascender: fail            small: true
    left_vertical_outer: 94.1913     right_vertical_outer: 0.0
    left_round: fail            oblique_outers: fail      lower_gap: fail

  schain5:
    type: schain           part_of: word1
    num_segments: 9
    max_extent: 83.0,54.0       min_extent: 33.0,21.0
    in_textline: textline1

Figure 3.4. Internal representation of some objects. The image from which they are taken is shown with its objects labeled in Figure 3.9 (a).

Figure 3.5 illustrates the coordinate system in which image points are represented.

Figure 3.5. Coordinate system.

3.2.5 Vocabulary Indexing

The representation of the confusion class sequence index for vocabulary words and suffixes is extremely simple in Prolog. The indexes are stored in CProlog's alternate database (to reduce storage and access time) in clauses of the following form: wccindex(<CC sequence>, <list of words with this CC sequence>). (See Table 3.9.) Prolog's built-in matching capability makes searching for and retrieving matching words and suffixes trivial.

  wccindex([tilf,hkb,eoc], [the]).
  wccindex([tilf,eoc,eoc,hkb], [took, look]).
  sccindex([tilf,mn,gpjyq], [ing]).

Table 3.9. Word and suffix CC indexes. A few examples showing the form in which the vocabulary confusion class and suffix indexes are represented.

3.2.6 Segmentation

The segmentation phase involves grouping the contents of the parafoveal image into textlines, words, and individual characters.

3.2.6.1 Recursive linear segmentation algorithm

Before trying to work with the zero-crossings, it is desirable to reduce the volume of data needed to represent them, and to make explicit such properties of their structure as regions of linearity. This is accomplished by applying a recursive linear segmentation algorithm which produces a sequence of line segments approximating, to within a specified threshold, the chain comprising a zero-crossing. The algorithm is simple and may be paraphrased as follows:

Given a chain of contiguous points, draw a straight line L between its end points Ept1 and Ept2. Let Dmax be the distance between L and the point P on the chain furthest from L.
If Dmax < chain_segmentation_threshold, then return L as the segmentation; else P becomes a new end point, and repeat the process on the two sub-chains Ept1-P and P-Ept2, and return the concatenation of their segmentations.

In the current implementation chain_segmentation_threshold is set to 3.5. Figure 3.6 illustrates the process. Figure 3.7 shows the results of applying the algorithm to the zero-crossing chains.

Figure 3.6. Operation of the recursive linear segmentation algorithm. (a) Original chain. (b) P is the point at maximum distance from line L; Dmax is > chain_segmentation_threshold so the chain is split at P. (c) The process is repeated on the two new sub-chains. (e) Final segmentation.

Figure 3.7. Results of applying the recursive linear segmentation algorithm. (a) The zero-crossing chains. (b) The resulting segmented chains (schains). Chain_segmentation_threshold = 3.5.

3.2.6.2 Textlines, Words, and Characters

At this point the zero-crossings are represented by a set of schains. An intermediate step carried out now is the removal of 'noise'—the few small extraneous schains produced by the zero-crossing determination process. This is done by deleting all schains having only one segment less than 5 pixel widths long. A system required to handle real scanned images of text rather than synthesized ones would without question have to pay much more attention to the problem of noise reduction and elimination.

The next step is to determine the lines of text—the textlines—of which there may be more than one. This is done by projecting the schains horizontally onto the x-axis (see Figure 3.5) and assigning each group of mutually overlapping schains to a different textline. Two assumptions are involved here: (1) that the zero-crossings of characters on adjacent lines do not horizontally overlap; and (2) that all the zero-crossings comprising characters (and parts of characters) falling on the same line do overlap horizontally.

3.2.6.3 Segmentation of Words and Characters

Next, each textline is segmented into words and characters. Schains are first ordered on the textline by sorting them by their minimum y-coordinate. Then the internal schains—which represent the inner contours of letters having an enclosed loop—are removed, as only the character's outer contours are considered in the feature extraction process. At this point it is assumed that each remaining schain represents a single character and each character consists of a single schain. The sequence of characters (schains) on the textline is then searched for gaps greater than a certain size. Words consist of the characters falling between gaps and/or the ends of the line.

3.2.6.4 Determining baselines and midlines

The final step in the segmentation phase is the determination, for each textline, of two reference lines—the mid- and baselines as illustrated in Figure 3.8. These mark the positions of the tops and bottoms of small characters and are stored as attributes of the textline.

Figure 3.8. The mid- and baselines.

Finding them involves projecting the textline horizontally onto the x-axis. Their positions are used during the feature extraction phase in discriminating among small letters, letters with ascenders, and those with descenders.
This is done by projecting onto the x-axis the bounding boxes of all the characters on the line and locating the two peaks between the top and bottom of the line which are due to the tops and bottoms of the small letters.

There are a few wrinkles in this process required to handle lines consisting solely of small letters or those containing no characters with ascenders or no characters with descenders. The present implementation relies for the sake of simplicity on the use of several preset font-specific parameters including: min_expected_font_height, the minimum distance in pixels from the bottom of a descender to the top of an ascender, and max_expected_small_char_height, which is self-explanatory.

At the end of the segmentation phase then, the various components of the text image are represented by a set of objects embedded in a hierarchical structure, and the mid- and baselines have been determined (Figure 3.9).

Figure 3.9. Results of the segmentation phase. (a) shows the labelings of the various objects, (b) shows the character, word, and textline bounding boxes and the mid- and baselines. The internal representation of some of the objects from this example, including their attributes, is shown in Figure 3.4 (with the exception that the feature values shown there are not computed until later).

3.2.7 Feature Implementation

Table 3.10. Implementation of Character Features. Illustrated on the right below the example characters are the regions (referred to as search rectangles and outlined with dotted lines) of the character's bounding box which are searched for line segments, and the line segments considered in evaluating the feature. (The overhang on the dotted-line rectangles is due to a glitch in the SunCore graphics package.)

small
Satisfied if a character's height is < the distance between the midline and baseline of the textline containing the character, plus 10%.

ascender
Satisfied if the distance between the top of the character and the midline is > min_significant_ascender_descender_height.
Parameters: min_significant_ascender_descender_height.

descender
Satisfied if the distance between the baseline and the bottom of the character is > min_significant_ascender_descender_height.
Parameters: min_significant_ascender_descender_height.

left_vertical_outer & right_vertical_outer
These features are measured as scalars meant to represent the length of a vertical stroke on the left or right side (respectively) of the character as a percentage of its total height. This value is determined as the sum of the lengths, expressed as a percentage of the character's height, of all line segments within 30° of vertical entirely contained within the edge search rectangle placed along the character's left or right side. The character is considered to have a vertical outer part if this value is > 60.
Parameters: edge_search_rect_width, vertical_outer_segment_angle_tolerance, vertical_outer_min.
For the character 'a' at right: the value computed for left_vertical_outer is 51.6368 so the feature fails; for right_vertical_outer it is 78.1025 so the feature succeeds.

left_round
Determined by a simple template matching process.
This feature is satisfied for a character if all connected sequences of segments completely contained within the left 3/4 of the character which are connected to a segment completely contained within the edge search rectangle along the left side fall within the left_round template. The template is scaled to each character. Its boundaries are drawn as dotted lines in the example at right.
Parameters: edge_search_rect_width, template definition.
For the examples, left_round succeeds for 'c' and fails for 's' and 'v'.

upper_gap & lower_gap
To compute these features, two sets of segments are compared. The first is the set of all line segments completely contained in the edge search rectangle placed against the top (or bottom) of the character; the second is the connected sequence of segments starting at one that is in the rectangle, and all completely contained within it. If the second set is smaller than the first, then an upper (or lower) gap exists.
Parameters: edge_search_rect_width.
For the letter 'u' at right, upper_gap succeeds and lower_gap fails.

oblique_outers
Satisfied if the longest segment having only one end point in the left search rectangle has an angle within 15° of 25° and is > 20 pixels long, and the right such segment has an angle within 15° of 155°.
Parameters: edge_search_rect_width, oblique_segment_angle_tolerance, oblique_segment_min_length.
For the examples at right, oblique_outers succeeds for 'v' and fails for the other example character.

left_upper_extension & right_upper_extension
Succeed if there are any segment end points contained within small search rectangles at the upper left and upper right, respectively, of the character's bounding box.
For 'k', left_upper_extension succeeds and right_upper_extension fails.

3.2.8 Character Classification and Word Identification

The Prolog definitions of a few of the confusion classes are shown in Figure 3.10—they reflect the specifications of the confusion classes as they were originally defined (see Table 3.3).

  classify(Char, gpjyq) :-
      cwriteln(classify, ['classify: trying ', Char, ' as gpjyq...']),
      feature(Char, descender, true).

  classify(Char, mn) :-
      cwriteln(classify, ['classify: trying ', Char, ' as mn...']),
      feature(Char, small, true),
      feature(Char, left_vertical_outer, LV),
      feature(Char, right_vertical_outer, RV),
      constant(vertical_outer_min, VOM),
      LV >= VOM,
      RV >= VOM,
      feature(Char, lower_gap, true).

  classify(Char, eoc) :-
      cwriteln(classify, ['classify: trying ', Char, ' as eoc...']),
      feature(Char, small, true),
      feature(Char, left_round, true).

Figure 3.10. Prolog code defining some confusion classes.

The great majority of characters encountered should match only one class, but it is possible that some will match more than one, and that some will match none. Once all of a word's characters have been classified, a list of all its possible CC sequences is produced. (In the usual case, each character will fall into only one class, so there will be only one such sequence.)
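As a minimal sketch of how such a list could be assembled from classify/2 clauses like those of Figure 3.10 (the predicate names char_classes/2 and chars_cc_sequence/2 are hypothetical; in the actual system a word's characters are available through the chars attribute of the word object):

  % Collect every confusion class a single character matches.
  char_classes(Char, Classes) :-
      findall(CC, classify(Char, CC), Classes).

  % Relate a list of characters to one of its possible CC sequences;
  % alternative sequences are produced on backtracking.  A character
  % matching no class contributes an unbound variable to the sequence.
  chars_cc_sequence([], []).
  chars_cc_sequence([Char|Chars], [CC|CCs]) :-
      char_classes(Char, Classes),
      (   Classes = []
      ->  true                      % CC left unbound
      ;   member(CC, Classes)
      ),
      chars_cc_sequence(Chars, CCs).

Collecting all solutions of chars_cc_sequence/2 for a word's character list then yields the list of possible CC sequences referred to above.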
The CC of an unclassifiable character is represented by an uninstantiated variable, which of course will match with any class during index lookup.

The vocabulary lookup procedure makes use of Prolog's built-in search capability—to retrieve the vocabulary words matching a given CC sequence key, all that is required is a call of the form recorded(wccindex, (<CC sequence>, <returned list containing matched vocabulary words>)).

The procedure for retrieving matching suffixes is much the same. The longest suffix currently stored in the suffix vocabulary is 4 characters long, so the retrieval is tried with the last four CCs of the word's sequence, then the last 3, the last 2, and finally the last 1.

CHAPTER 4. RESULTS AND DISCUSSION

Results are presented here demonstrating the performance of the parafoveal word recognition system in recognizing the words and suffixes in text images at various simulated parafoveal eccentricities and in two different fonts. Some interesting properties of the word recognition scheme are illustrated, such as its insensitivity to misspellings involving confusable characters (the 'proofreading problem'). Following the presentation of these results, the system's performance is discussed and evaluated.

4.1 RESULTS

Each of the trial runs is illustrated below by a series of three pictures showing the original text image, the zero-crossings representing parafoveally available information, and the system's final recognition of words and suffixes.

In order to ensure that characters are reasonably large with respect to pixel size, 72 point Macintosh screen fonts were used in generating the text images¹. The average 72 point character is 30 pixels in width—this value is used in determining σ (see Table 3.8).

¹ The jagged edges present in the large characters result from the fact that this large size font is a simple enlargement of the corresponding 24 point one (the largest commonly available). They are effectively smoothed out by the convolution with ∇²G and do not appear to have any effect on the final results.

The values of the space constant σ of the ∇²G mask convolved with the text image in order to produce the zero-crossings are taken from Table 3.8. A number of the trials shown here were performed with σ = 3.9, which corresponds to retinal channel N (the channel with the highest resolution and greatest sensitivity of the four described in Table 3.7) at an eccentricity of 3°—ie. at the centre of the parafoveal region. Other trials were run with values of σ corresponding to points in the parafovea closer to the fovea and farther away.

The system's word and suffix identification results are displayed graphically. Below each character is shown the confusion class (or classes) into which it has been classified; if the character does not fit any class, a '?' is displayed. The vocabulary word or words which match the confusion class sequence are displayed in bold face, and below this are listed any matching suffixes.

4.1.1 Initial Trials

Figure 4.1 (σ = 3.9) shows the results of a trial using the by now familiar text image that has been used for a number of the examples earlier in the thesis. The text for this and a number of other trials is in the simple, sans-serif font 'Geneva'. The first three words are in the system's vocabulary and are correctly recognized. The correct suffixes are among those identified—in addition, 'ally' is offered as a possible suffix of 'university', as its CC sequence also matches the end of that word.
The last two words—'parafoveal vision'—are not in the system's vocabulary, but again, their suffixes are identified.

[Trial readingG.39: the text 'reading words university parafoveal vision', its zero-crossings, and the system's classification output.]

Figure 4.1. Trial with σ = 3.9 (channel N at the centre of the parafoveal region) and font Geneva. At the upper left is the original image; at the upper right, the zero-crossings extracted from it. At the bottom is the system's classification of the characters, words and suffixes. The first three words are contained in the vocabulary, the last two are not. 'readingG.39' is the name given to this trial; G stands for font Geneva, 39 for the value of σ.

Figure 4.2 shows another trial run, with σ = 4.7, corresponding to the view through retinal channel N at a point further out in the parafovea, or to the view through channel S on the border of the fovea.

[Trial lightG.47: the text 'this light information times Vancouver'.]

Figure 4.2. Trial with σ = 4.7, font Geneva. The word 'times' is not in the system's vocabulary but has the same confusion class sequence as 'lines', which is.

4.1.2 An Explanation of Some Common Word Recognition Errors

4.1.2.1 The Proofreading Problem

In proofreading printed copy it is a common experience that certain typographical or spelling errors can be much more difficult to spot than others. The parafoveal word recognition scheme provides a natural explanation for one aspect of this phenomenon, in cases where the misspelling results from the substitution of confusable letters (Figure 4.3, σ = 4.7).

[Trial spellG.47: the misspelled text 'responsibility governmont imporfant', its zero-crossings, and the system's output.]

Figure 4.3. The Proofreading Problem (σ = 4.7, font Geneva). When a word is easily identifiable in spite of being misspelled, the error is more difficult to spot. The parafoveal word recognition scheme provides a possible explanation for this phenomenon.

4.1.2.2 Confusing Common Words for Less Common Ones

A less common type of word recognition error occurs when a relatively high-frequency word is substituted for another word (or abbreviation) which occurs less frequently. Figure 4.4 illustrates the case where the parafoveal recognition system confuses a word or abbreviation not in its vocabulary with a more common one having the same confusion class sequence.

[Trial uncommonG.31: the text 'tho wore properly'.]

Figure 4.4. Less common words and abbreviations confused with common ones (σ = 3.1, font Geneva).
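The mechanism behind both kinds of error is that substituting one letter for another from the same confusion class leaves the CC sequence, and therefore the index key, unchanged. A minimal sketch (the letter_class/2 facts are transcribed from the class definitions of section 3.1.1.2 for just the letters needed; the predicate names are hypothetical):

  % Confusion class membership for the letters used in this example.
  letter_class(t, tilf).
  letter_class(h, hkb).
  letter_class(e, eoc).
  letter_class(o, eoc).

  % Map a list of letters to its confusion class sequence.
  cc_sequence([], []).
  cc_sequence([L|Ls], [C|Cs]) :-
      letter_class(L, C),
      cc_sequence(Ls, Cs).

  % Both 'the' and 'tho' yield the same sequence, [tilf, hkb, eoc]:
  % ?- cc_sequence([t,h,e], S1), cc_sequence([t,h,o], S2), S1 == S2.

Since 'the' is in the vocabulary and 'tho' is not, the same key retrieves 'the' in both cases, which is why such substitutions are effectively invisible to parafoveal recognition.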
4.1.3 Words with Non-Unique Confusion Class Sequences

Although the great majority of the first 2000 most common english words have mutually unique confusion class sequences, of course not all do. Figure 4.5 shows a few such words and how the system responds to them.

[Trial multiG.47: the text 'look at the simple ball'; 'look' also retrieves 'took', 'simple' also retrieves 'single', and 'ball' also retrieves 'half' and 'hall'.]

Figure 4.5. Words with non-unique confusion class sequences (σ = 4.7, font Geneva). Feature search regions are displayed.

4.1.4 Classification of All Letters

Figure 4.6 shows the system's classification of all 26 lower-case letters in font Geneva at σ = 3.9. All are classified correctly except 'x'. The feature upper_gap (see Table 3.10) fails on 'x' at this σ and font because instead of looking for line segments with a single end point in the search rectangle, as it probably should, it looks for segments with both inside, and there are none of these in this case. In any case, it is certainly not a problem that seriously degrades the system's performance, as the worst that can happen if a single character in a word is unclassified is that more words (and suffixes) may match and be returned in addition to the correct one.

Figure 4.6. Classification of all 26 lower-case letters (σ = 3.9, font Geneva).

4.1.5 Recognizing Characters in a Different Font

One of the key claims made of the expected capabilities of the parafoveal word recognition scheme is that its basis on presumably fundamental and invariant features of character forms should make it relatively insensitive to superficial stylistic changes in characters, such as changes in font. To test this, some trials were run using the font Palatino, which has serifs (Figure 4.7, σ = 3.9). Figure 4.8 shows the system's performance on all letters of this font.

Although there are significant small-scale differences between this font and the sans-serif Geneva, it was necessary to change only 2 of the system's parameters to attain the illustrated performance. These were min_expected_small_char_height, which is used in determining the mid- and baselines (see section 3.2.6.4), and edge_search_rect_width, which is used in the feature extraction process (see Table 3.10) and was made larger to allow for the serifs.

As can be seen, the letter 's' is not classified. This is due to the inadequate way in which the features left_vertical_outer and right_vertical_outer are currently implemented—they do not recognize gaps but simply sum the lengths of line segments along the character's edge, succeeding if this value is greater than 60% of the character's height. The values computed this way for 's' are greater than the threshold so the features succeed. The character does not possess the features upper_gap or lower_gap required to place it in either of the classes uvw or mn, so it falls into no class.

The threshold value 60 could have been raised so that 's' would have been correctly classified. This was not done in order to demonstrate the point that the system's use of more than one source of knowledge in identifying words gives it sufficient robustness to handle failures in the classification of isolated characters. Thus the words 'words' and 'university' are correctly recognized in spite of the fact that the character 's' is not.
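The mechanics of this robustness are simple: an unclassified character contributes an unbound variable to the CC sequence (section 3.2.8), and an unbound position in the key still unifies with the stored entry. A minimal illustration, assuming the index is held as ordinary wccindex/2 clauses of the form shown in Table 3.9 (the actual system keeps them in CProlog's recorded database):

  % A small fragment of the vocabulary index, in the form of Table 3.9.
  wccindex([tilf, hkb, eoc], [the]).
  wccindex([uvw, eoc, r, d, aszx], [words]).

  % The key for 'words' with its final character unclassified still
  % unifies with the stored sequence and retrieves the word:
  % ?- wccindex([uvw, eoc, r, d, _Unclassified], Words).
  % Words = [words].

With a larger index an unbound position may of course unify with several entries, in which case more words (and suffixes) are returned along with the correct one, as noted in section 4.1.4.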
A more serious problem illustrated in Figure 4.7 is that of character merging, which is more prevalent in a font with serifs. This is discussed in detail later.

[Trial readingP.39: the text 'reading words university parafoveal vision' in font Palatino.]

Figure 4.7. Trial with characters of font Palatino (σ = 3.9). The first three words are in the vocabulary and are correctly identified in spite of the fact that the system fails to classify the character 's'. Note that the characters 'vi' in 'vision' are merged. See text for commentary.

Figure 4.8 shows the system's classification of all characters in this font.

Figure 4.8. Classification of all lower-case letters in font Palatino (σ = 3.9). The last line illustrates the problem of merging. Correct classification of the letters v, w and y is illustrated in Figure 4.7.

4.2 DISCUSSION AND EVALUATION

The system's performance will be considered here and evaluated in light of the expectations about its performance cited earlier in Chapter 3.

A key claim was that the proposed confusion classes and feature set capture some invariant properties of character forms. The fact that most characters were classified correctly in the trial runs despite broad settings of the feature parameters, that extensive fine tuning of these parameters was not required to attain the demonstrated performance, and that very few minor changes were required to allow recognition of characters in a different font, are evidence that this claim is justified.

The balance between relying on character-level and word-level knowledge for word and suffix identification gives the system a demonstrated degree of robustness to isolated errors in character classification.

A problem which the system is not currently capable of dealing with is that of merging between adjacent characters in the parafoveal image, which occurs in fonts with serifs at large retinal eccentricities.

Each of these items is discussed in more detail below.

4.2.1 Adequacy of Features and Classes

One of the claims made earlier in the thesis was that the chosen set of features and the confusion classes based upon them capture some fundamental properties of character forms which are invariant over different fonts and styles and large in scale—properties whose presence should therefore be easily detectable in character images of parafoveal resolution and in a range of different fonts. The results obtained support this claim, although more extensive testing with a greater range of fonts and styles is necessary to establish it with greater certainty.

There are three levels in the definition of the classification scheme used here. First, there is the choice of the classes in the domain and the features which domain objects may possess; in this case the classes and features were defined on the basis of psychological evidence. The second level is the definition of each class in terms of a subset of these features, shown in Table 3.3. The final level is the implementation of the features—the method by which each is actually computed.
It is most important to get the first two steps right if the classification scheme is to be successful. If the classes are unnatural ones with respect to the domain, or if the features are ill-chosen and cannot be used in combination to effectively discrimi-nate among the classes, then one would expect a large number of incorrect and non-unique classifications, with no amount of fine-tuning of feature parameters able to compensate. The results obtained here show no non-unique classifications (evidence of over-lapping classes, which would be manifested as characters classified into two or more confusion classes) and very few instances of characters failing to classify. This is in spite of the fact that the various feature parameters, such as the width and shape of the left_round template and the various angle tolerances, are set quite generously. Extensive fine tuning of parameters was not required, and in fact only two parame-ters of lesser importance had to be adjusted to allow correct recognition of the serif font Palatino. The problems that did arise resulted not from the choice of features or the defini-tion of the classes, but from the inadequate implementation of some of the features. The failure to classify the letter's' in font Palatino is an example—as explained ear-lier, it results from the fact that the implementation of the vertical_outer features CHAPTER 4. RESULTS AND DISCUSSION 63 completely ignores the presence of gaps. The letter V is not classified (see Figure 4.6) because of a minor deficiency in the implementation of the feature upper_gap, as explained earlier. All of this may be taken as evidence supporting the claims made of the adequacy of the classes and the features used to discriminate among them. Taken together with the finding of the strength of the confusion class sequence constraint, this argues well for the plausibility of this parafoveal word recognition scheme. 4.2.2 Robustness Another claim made of the expected performance of the parafoveal word recogni-tion system was that it should exhibit a degree of robustness by being capable of functioning correctly in spite of isolated failures in character classification. This is demonstrated to some extent by the system's correct recognition of words in Figure 4.7 in spite of the failure to classify the letter's'. The present system would also operate correctly if a character were classified into more than one CC, as long as the correct CC was among those offered. And it would be relatively easy to modify the index lookup procedure to generate the correct response even if a character were placed in the wrong CC altogether—although this would result in fewer unique word identifications. This demonstrated and potential robustness results from a balance and synergy be-tween the two levels of knowledge—at the character and word levels—employed by the system. If the feature extraction processes—embodying knowledge of charac-ter classes—is unable to classify one or a few characters, the remaining correctly classified characters will likely provide enough context so that the lookup into the word and suffix indexes—which embody word-level knowledge—can come up with the correct response. CHAPTER 4. 
RESULTS AND DISCUSSION 64 Without more extensive testing on different fonts, it cannot presently be claimed that the feature extraction process is demonstrably robust, although this does not seem unlikely for two reasons: 1) because the features being detected are simple large-scale ones; and 2) because of the smoothing performed by the V 2G operator, which will remove or reduce the effects of much small-scale noise—the bane of many earlier character recognition schemes. 4.2.3 Merging A problem which the current system is unable to deal with is that of the merging of the zero-crossings of adjacent characters. This is more prevalent in a font with ser-ifs, particularly at larger values of a (for example, note the letters 'vi' in Figure 4.7 and 'vwxy7 in Figure 4.8). It also occurs with the san-serif font Geneva at values of a corresponding to the extreme edge of the parafovea. It would be quite easy to add a capability to remove limited merging such as that il-lustrated in the cases referred to which occurs between the serifs of certain letter pairs. However, at larger values of o when whole sequences of characters would be merged it would become difficult to separate the individual characters without first identifying them. This is a case of the classic segmentation/interpretation prob-lem—we can't do the interpretation without having the segmented objects, but can't know how to do the segmentation without knowing the interpretation. The merging problem highlights what is, from the computational vision perspec-tive, probably the fundamental shortcoming of this scheme for parafoveal word recognition—its requirement of prior segmentation of individual characters. It is quite implausible that the human visual system would employ a scheme which would fail catastrophically as soon as letters were close enough together that their parafoveal images merged. This does not, however, invalidate the basic ideas un-CHAPTER 4. RESULTS AND DISCUSSION 65 deriving the current system, for two reasons. First of all, large scale word shape is determined primarily by large scale character shape, and it is certainly plausible that a number of the features implemented in the present system could usefully be ex-tracted from an un-segemented word image. Secondly, the indexing scheme itself is inherently flexible and could be modified from indexing by sequences of confusion classes to sequences of word-level features. This issue is discussed in somewhat more detail in the Conclusion. 66 CHAPTER 5. CONCLUSIONS 5.1 CONTRIBUTIONS 5.1.1 Toward a Representation for Parafoveal Information The parafoveal word recognition system described here may be considered a small step toward what Brady described as the next stage in developing a computational theory of early visual processing in reading—determining how information is in-tegrated from one fixation to the next and in particular, how low resolution infor-mation obtained in the parafovea is integrated with the high resolution informa-tion obtained when the same text is fixated foveally. The work here refines a set of large scale character features having some psychological basis and demonstrates—to the extent permitted by the limited number of trials—that they can be reliably com-puted from parafoveally available information at a range of eccentricities and from text of two different fonts, providing evidence that these features may be among those used in reading. It also demonstrates how the extraction of these features can facilitate word recognition. CHAPTER 5. 
CONCLUSIONS 67 5.1.2 A Mechanism by which Word Frequency Knowledge can Facilitate Parafoveal Word Recognition The findings of Inhoff and Rayner (1986) raised the research question "How does knowledge of the frequency of occurrence of a word affect the speed with which it is processed parafoveally?" This thesis proposes and successfully demonstrates a pos-sible mechanism for quick recognition of high frequency words and suffixes. 5.1.3 The Confusion Class Sequence Constraint One of the key objectives of the computational approach to visual perception is the discovery of natural constraints within the domain which can be exploited to facili-tate interpretation. The constraint, uncovered here, that the confusion class se-quence places on the identity of a word is a strong one. If a word is one of the 2005 most frequently occurring in english—which comprise 77% of the words one could expect to encounter in a standard1 passage of text then there is a 92% probability that the word's confusion class sequence constrains its identity to a single word of the 2005. As a possible contribution toward a computational theory of reading, this result is interesting for two main reasons. Firstly, it is important not so much because the particular sets of classes and features employed here have special significance, but because they demonstrate, perhaps surprisingly, that quite strong constraints can lie in the combination of a small number of classes based on quite simple large-scale features extracted from parafoveal imagery, and thus that this might be a fruitful area to look for further such constraints. Secondly, this result illustrates that useful constraints may be available when considering a small subset of the word recogni-tion problem which may not be apparent when following the more common route and considering the problem as a whole. The fraction of unique confusion classes 1 Assuming the Brown Corpus reflects the 'standard'. CHAPTER 5. CONCLUSIONS 68 will almost certainly decline as more words are added to the vocabulary (although the extent to which this will happen is unclear and should be looked into)—the constraint may turn out to be of little value in recognizing any other than the more common english words. However, a special capability for the quick recognition of the most common words and letter sequences is an ability one would expect to find in the human visual system, so constraints unique to this subset of the domain are surely useful. From the perspective of those interested in building practical reading machines, the confusion class sequence constraint may be directly useful in itself. As it stands, it allows unique identification of a large fraction of the words a reading machine could expect to encounter. Unless the degree of indexing uniqueness drops off sharply as more words are added to the vocabulary, it could well provide a gener-ally useful means of significantly constraining a word recognition system's search space. 5.1.4 A Flexible Means of Integrating Multiple Sources of Knowledge The idea of using sequences of features extracted from a word image as an index into a vocabulary for word identification is not a new one (Hull, 1986a), although use of the technique has not been widespread. However, its use in conjunction with features present in parafoveal information and for identification of common words and suffixes is, to this author's knowledge, a novel one. 
The indexing scheme is easily modifiable in the particular features and classes used—thus one can alter the degree and kind of character level information which must be extracted before word level knowledge is brought to bear, effectively alter-ing the point of interaction of these two levels of knowledge. For example, one can imagine extracting only the coarsest large-scale features in conjunction with a small CHAPTER 5. CONCLUSIONS 69 vocabulary indexed by a very few large confusion classes to allow recognition of the few most frequent words (such as 'the' and 'and') in information obtained from the far edge of the parafovea. In the same way, it would be easy to exploit knowledge of multi-character structures resulting, for example, from the merging of certain pairs of characters which might give rise to easily extractable and invariant features. In general, this sort of indexing scheme provides a means by which frequently used higher-level knowledge could be 'compiled' for quick access. The scheme also in-volves true integration of word-level knowledge into the recognition process, un-like many efforts in the character recognition literature, where a vocabulary is used only to choose among alternatives arrived at after exhaustive recognition of indi-vidual character (as in Kahan et al., 1987). This real integration of two level of knowledge allows the system to bypass the implicit assumption that individual characters must be recognized as an essential first step before higher level knowl-edge can be applied. This is more in line with what has long been known of the reading process which is that it does not operate by exhaustively and sequentially identifying each individual character. 5.2 AREAS FOR FURTHER RESEARCH 5.2.1 Toward a Computational Theory of Reading The further development of a computational theory of reading could certainly ben-efit from further evidence from psychological experiments describing in greater de-tail the working of the human visual system in reading. CHAPTER 5. CONCLUSIONS 70 5.2.1.1 Beyond the Single Character Focus A primary weakness of the system presented here is its basis on the segmentation and classification of individual characters. It does not seem reasonable that individual characters can form the basis of the word shape representation of the human visual system, which must be able to represent shape information across a range of resolutions, including that from the far edge of the parafovea or the pe-riphery where individual characters will be merged together and only the largest scale word shapes will be available. In handwritten text, of course all characters are connected at all resolutions, and it unreasonable to imagine that there is an entirely separate shape representation system for this case. The continued search for such a word and character shape representation—Brady's next step toward a computa-tional theory of reading—able to represent and integrate the information at differ-ent resolutions gathered during successive fixations remains a key area for further research. The work reported here provides a basis for this search. As Bouma recog-nized, "[i]n normal reading, the role of single letters is perhaps overshadowed by that of larger units," but it is probably safe to assume that "research on single letters will eventually help to clarify the special role of letter combinations." The shape of a word is certainly determined by that of the individual characters it contains, par-ticularly the large scale features of those characters. 
These are precisely the features used in the present system, so a number of these could no doubt be usefully applied to an unsegmented word image. Other large scale features unique to particular multi-letter combinations might arise and be useful in word identification. The same sort of indexing scheme used in the present system could also be applied, but instead of the sequence of confusion classes, the sequence of features would be used as the key into the vocabulary which would be indexed in the same way. The degree to which common words will index uniquely under such a scheme will have to be determined, but if many of the word-level features are indeed the same CHAPTER 5. CONCLUSIONS 71 as the character-level ones used here then the degree of uniqueness observed under confusion class indexing provides reason for optimism. 5.2.1.2 Exploiting Other Knowledge Sources The current system exploits knowledge of the letter sequences of common words and suffixes at the word level, and of lower-case letter structure at the character level. There are other knowledge sources available, such as english spelling rules— some letter sequences are impossible and could be recognized and removed from consideration—and letter transition frequencies—some letter pairs are far more common than others. These could be exploited in a system of this kind to refine the currently rather crude estimate of a word's letter sequence provided by the confu-sion class sequence. This would be especially useful in the case of words not in the vocabulary. Such a refined estimate could be passed on to the process identifying the word from foveal information to reduce its search space and/or to point out at what location in the word higher resolution information could most quickly re-duce the remaining ambiguity and lead to identification. 72 BIBLIOGRAPHY Blesser, B., R. Shillman, T. Kuklinski, C. Cox, M. Eden, and J. Ventura, "A Theoretical Approach for Character Recognition Based on Phenomenological Attributes," Int. J. Man-Machine Studies, vol. 6, pp. 701-714, 1974. Bouma, H., "Visual Recognition of Isolated Lower Case Letters," Vision Research, no. 11, pp. 459-474,1971. Brady, J. M. and B. J. Wielinga, "Reading the Writing on the Wall," in Computer Vision Systems, pp. 283-297, Academic Press, 1978. Brady, M., 'Toward a computational theory of early visual processing in reading," AI-Memo-593, MIT-AI, Cambridge, MA, 1980. (also in Visible Language, vol. 15, no. 2, pp. 183-214,1981.) Cox, C.H., P. Coueignoux, B. Blesser, and M. Eden, "Skeletons: A Link Between Theoretical and Physical Letter Descriptions," Pattern Recognition, vol. 15, no. 1, pp. 11-22,1982. Haber, R.N. and L.R. Haber, "Visual Components of the Reading Process," Visible Language, vol. 15, no. 2, pp. 147-182,1981. Hanson, A.R., E.M. Rieseman, and E. Fisher, "Context in Word Recognition," Pat-tern Recognition, vol. 8, pp. 34-45, 1976. Hull, Jonathan J., "Word Shape Analysis in a Knowledge-Based System for Reading Text," Proc. of the Second IEEE Conference on Artificial Intelligence Applica-tions, pp. 114-119, Miami Beach, Florida, December, 1985. Hull, Jonathan J. and Sargur N. Srihari, "A Computational Approach to Visual Word Recognition: Hypothesis Generation and Testing," Proc. IEEE Com-73 puter Society Conference on Computer Vision and Pattern Recognition, pp. 156-161, Miami Beach, June 1986a. Inhoff, Albrech W. and Keith Rayner, "Parafoveal Word-Processing During Eye Fixations in Reading—Effects of Word Frequency," Perception & Psy-chophysics, vol. 
BIBLIOGRAPHY

Blesser, B., R. Shillman, T. Kuklinski, C. Cox, M. Eden, and J. Ventura, "A Theoretical Approach for Character Recognition Based on Phenomenological Attributes," Int. J. Man-Machine Studies, vol. 6, pp. 701-714, 1974.

Bouma, H., "Visual Recognition of Isolated Lower Case Letters," Vision Research, vol. 11, pp. 459-474, 1971.

Brady, J. M. and B. J. Wielinga, "Reading the Writing on the Wall," in Computer Vision Systems, pp. 283-297, Academic Press, 1978.

Brady, M., "Toward a Computational Theory of Early Visual Processing in Reading," AI-Memo-593, MIT-AI, Cambridge, MA, 1980. (Also in Visible Language, vol. 15, no. 2, pp. 183-214, 1981.)

Cox, C. H., P. Coueignoux, B. Blesser, and M. Eden, "Skeletons: A Link Between Theoretical and Physical Letter Descriptions," Pattern Recognition, vol. 15, no. 1, pp. 11-22, 1982.

Haber, R. N. and L. R. Haber, "Visual Components of the Reading Process," Visible Language, vol. 15, no. 2, pp. 147-182, 1981.

Hanson, A. R., E. M. Riseman, and E. Fisher, "Context in Word Recognition," Pattern Recognition, vol. 8, pp. 34-45, 1976.

Hull, Jonathan J., "Word Shape Analysis in a Knowledge-Based System for Reading Text," Proc. of the Second IEEE Conference on Artificial Intelligence Applications, pp. 114-119, Miami Beach, Florida, December 1985.

Hull, Jonathan J. and Sargur N. Srihari, "A Computational Approach to Visual Word Recognition: Hypothesis Generation and Testing," Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 156-161, Miami Beach, June 1986.

Inhoff, Albrecht W. and Keith Rayner, "Parafoveal Word Processing During Eye Fixations in Reading: Effects of Word Frequency," Perception & Psychophysics, vol. 40, no. 6, pp. 431-439, 1986.

Kahan, Simon and Theo Pavlidis, "On the Recognition of Printed Characters of Any Font and Size," IEEE Trans. PAMI, vol. PAMI-9, no. 2, pp. 274-288, March 1987.

Kucera, H. and W. Francis, Computational Analysis of Present-Day American English, Brown University Press, 1967.

Lashas, A., R. Shurna, A. Verikas, and A. Dosinas, "Optical Character Recognition Based on Analog Preprocessing and Automatic Feature Extraction," Computer Vision, Graphics, and Image Processing, vol. 32, pp. 191-207, 1985.

Lesser, V. R. and L. D. Erman, "A Retrospective View of the Hearsay-II Architecture," IJCAI-77, pp. 790-800, 1977.

Marr, David, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, Freeman, San Francisco, 1982.

Minsky, Marvin, "A Framework for Representing Knowledge," in The Psychology of Computer Vision, ed. P. Winston, McGraw-Hill, N.Y., 1975.

Mori, Shunji, Kazuhiko Yamamoto, and Michio Yasuda, "Research on Machine Recognition of Handprinted Characters," IEEE Trans. PAMI, vol. PAMI-6, no. 4, pp. 386-404, July 1984.

Rayner, Keith and Alexander Pollatsek, "Is Visual Information Integrated Across Saccades?," Perception & Psychophysics, vol. 34, no. 1, pp. 39-48, 1983.

Rayner, Keith and J. H. Bertera, "Reading Without a Fovea," Science, vol. 206, pp. 468-469, 1979.

Rayner, Keith, "Eye Movements in Reading and Information Processing," Psychological Bulletin, vol. 85, pp. 618-660, 1978.

Rayner, Keith, "Lexical Complexity and Fixation Times in Reading: Effects of Word Frequency, Verb Complexity, and Lexical Ambiguity," Memory and Cognition, vol. 14, pp. 191-201, 1986.

Rayner, Keith, "The Perceptual Span and Peripheral Cues in Reading," Cognitive Psychology, vol. 7, pp. 65-81, 1975.

Rayner, Keith, Arnold D. Well, Alexander Pollatsek, and J. H. Bertera, "The Availability of Useful Information to the Right of Fixation in Reading," Perception & Psychophysics, vol. 31, no. 6, pp. 537-550, 1982.

Rayner, Keith, et al., "Integrating Information Across Eye Movements," Cognitive Psychology, vol. 12, pp. 206-226, 1980.

Schank, R., N. Goldman, C. Rieger, and C. Riesbeck, "MARGIE: Memory, Analysis, Response Generation and Inference on English," IJCAI-73, pp. 255-261, 1973.

Schurmann, J., "Reading Machines," Proc. 6th International Conference on Pattern Recognition, pp. 1031-1044, October 1982.

Shillman, R. J., T. Kuklinski, and B. Blesser, "Psychophysical Techniques for Investigating the Distinctive Features of Letters," Int. J. Man-Machine Studies, vol. 8, pp. 195-205, 1976.

Srihari, Sargur N. and Jonathan J. Hull, "Knowledge Integration in Text Recognition," Proc. AAAI-82, pp. 148-151, 1982.

Suen, C. Y. and R. J. Shillman, "Low Error Rate Optical Character Recognition of Unconstrained Handprinted Letters Based on a Model of Human Perception," IEEE Trans. SMC, vol. SMC-7, pp. 491-495, June 1977.

Suen, C. Y., M. Berthod, and S. Mori, "Automatic Recognition of Handprinted Characters: The State of the Art," Proceedings of the IEEE, vol. 68, p. 469, New York, 1980.

Ullman, J., "Advances in Character Recognition," in Applications of Pattern Recognition, ed. K. S. Fu, CRC Press, 1982.

APPENDIX A. SYSTEM VOCABULARY

A.1 VOCABULARY

The vocabulary of the parafoveal word recognition system contains 1030 words and consists of a subset of the 1070 words shown below by frequency rank which occur > 100 times in the Brown Corpus (see section 3.1.2). The Corpus words that were not used in the vocabulary include those preceded by '%' below and those containing apostrophes.
the of and to a in that is was he for it with as his on be at by I this had not are but from or have an they which one you were her all she there would their we him been has when who will more no if out so said what its about into than them can only other new some could time these two may then do first any my now such like our over man me even most made after also did many before must % " F through back years where much your way well down should because each just those people mr how too little state good very world still own see men work long here between both % " H life being under never day same another know while last might VB great old year off come since against 8> came right used take three states himself few house use during without again place american around however home small found mrs thought went say part once general upon school every don't does S * . , united left number until always away something %2 fact though water less public put think almost hand enough far took head yet government system better set told nothing night end why called didn't eyes find ES? asked later knew point next program city business give group toward young days let room president side social given present several order national possible rather second face per among form important often things looked early white case john large big need four within felt along children saw best church ever least power development light thing seemed family interest want members mind country area others done turned although open certain kind problem began different door thus help sense whole matter perhaps itself it's york times human law line above name example action company hands local show five history whether gave either today act feet across %3 past quite taken anything having seen death body half really week car field word words already themselves information i'm tell college shall together money period held keep sure free real seems behind cannot miss political air question making office brought whose special heard major problems ago became federal moment study available known result street economic boy position reason change south board individual j * society areas west close turn love community true court force full cost am wife age future voice wanted department center woman common control necessary policy following front sometimes girl six clear further land able feel mother music party provide education university child effect level students military mn short stood town morning total outside figure rate art century class north usually washington leave plan therefore evidence million sound top black hard strong various believe play says surface type value mean soon lines modern near peace table red road tax minutes personal situation %4 alone english go* idea increase nor schools women america living started book longer cut dr finally nature private secretary third months section call greater expected fire needed ground kept that's values view dark everything pressure basis space east father required spirit union complete except I'll moved wrote conditions return support attention late particular recent hope live brown costs else beyond couldn't forces hours nations person taking coming dead inside low material report stage data heart instead looking lost miles read added amount feeling followed makes pay single basic cold hundred including industry move research developed simply tried %1960 can't hold reached committee defense S a n d 1 actually shown sen religious river ten beginning central getting
sort doing received rest %st terms trying care friends indeed medical picture administration difficult fine simple subject building especially higher meeting walked bring cent floor foreign paper similar final natural property training count growth international market police england start talk wasn't written hear story suddenly answer congress hall issue needs considered countries likely working you're earth sat entire happened labor purpose results cases difference hair meet production stand william fall food involved stock earlier increased particularly whom below club effort knowledge letter paid sent thinking %U.S. using christian hour yes bill blue boys certainly ideas industrial points ready square trade %10 addition bad deal due girls method methods moral color decided directly nearly neither showed statement throughout weeks anyone kennedy questions reading according french lay nation programs services %+ physical remember size comes member record southern understand western normal population strength appeared concerned district merely %s temperature volume direction maybe ran summer trial trouble %1961 US continued evening friend list literature sales army association generally influence led met provided chance changes former husband opened science step student aid average %c cause hot month series works direct effective george lead myself piece planning soviet stopped systems theory wouldn't wrong ask clearly forms freedom movement ways worked beautiful bed consider efforts fear lot meaning note organization press somewhat spring treatment hotel placed truth apparently carried degree easy farm game immediately larger lower recently running charge couple daily %de eye performance arms blood opportunity persons understanding additional described march radio served stop technical based chief decision determined image main oh religion reported steps test window appear british character europe 8 » middle responsibility account •Kb horse learned writing activity fiscal green length ones serious types activities audience corner forward hit letters lived nuclear obtained returned slowly justice latter moving obviously plane quality straight born choice figures function include operation parts pattern plans poor saying seven staff stay %6 cars gives shot sun whatever faith pool ball completely extent heavy hospital lack standard waiting wish ahead corps deep democratic effects firm income language principle there's visit %15 analysis distance established expect growing importance indicated none price products attitude cities continue determine division elements existence leaders pretty serve afternoon agreement applied closed easily factors hardly limited reach scene write %30 attack drive health interested married professional remained rhode season station suggested won't covered current despite eight negro played role built commission council date exactly machine mouth original race reasons studies teeth unit becomes demand news prepared rates related relations rise supply bit director dropped %e events james officer playing raised sides standing sunday trees unless actual clay doctor energy meant places talking thomas walk

A.2 SUFFIXES

The system can recognize the following common suffixes: tion ment ally ence ance ing ies ess ion ons ght ed es er al ly ty le en ck e d s.

APPENDIX B. NON-UNIQUE CONFUSION CLASSES

91.9% of the vocabulary words have unique (with respect to the rest of the vocabulary) confusion class sequences (see section 3.1.3).
The remaining words, which do not, are shown here.

freq. of CC seq.   matching words          confusion class sequence
3                  [time,line,fine]        [tilf,tilf,mn,eoc]
3                  [seen,seem,soon]        [asxz,eoc,eoc,mn]
3                  [left,felt,tell]        [tilf,eoc,tilf,tilf]
3                  [half,hall,ball]        [bhk,asxz,tilf,tilf]
3                  [get,got,yet]           [gpjyq,eoc,tilf]
2                  [while,white]           [uvw,bhk,tilf,tilf,eoc]
2                  [when,whom]             [uvw,bhk,eoc,mn]
2                  [took,look]             [tilf,eoc,eoc,bhk]
2                  [these,those]           [tilf,bhk,eoc,asxz,eoc]
2                  [them,then]             [tilf,bhk,eoc,mn]
2                  [terms,forms]           [tilf,eoc,r,mn,asxz]
2                  [step,stop]             [asxz,tilf,eoc,gpjyq]
2                  [single,simple]         [asxz,tilf,mn,gpjyq,tilf,eoc]
2                  [set,act]               [asxz,eoc,tilf]
2                  [road,read]             [r,eoc,asxz,d]
2                  [real,rest]             [r,eoc,asxz,tilf]
2                  [over,ever]             [eoc,uvw,eoc,r]
2                  [out,cut]               [eoc,uvw,tilf]
2                  [not,met]               [mn,eoc,tilf]
2                  [no,me]                 [mn,eoc]
2                  [new,now]               [mn,eoc,uvw]
2                  [most,next]             [mn,eoc,asxz,tilf]
2                  [might,night]           [mn,tilf,gpjyq,bhk,tilf]
2                  [lost,test]             [tilf,eoc,asxz,tilf]
2                  [let,lot]               [tilf,eoc,tilf]
2                  [know,knew]             [bhk,mn,eoc,uvw]
2                  [it,if]                 [tilf,tilf]
2                  [hit,bit]               [bhk,tilf,tilf]
2                  [held,hold]             [bhk,eoc,tilf,d]
2                  [heard,board]           [bhk,eoc,asxz,r,d]
2                  [he,be]                 [bhk,eoc]
2                  [had,bad]               [bhk,asxz,d]
2                  [food,feed]             [tilf,eoc,eoc,d]
2                  [five,live]             [tilf,tilf,uvw,eoc]
2                  [few,low]               [tilf,eoc,uvw]
2                  [feet,feel]             [tilf,eoc,eoc,tilf]
2                  [an,am]                 [asxz,mn]
2                  [also,size]             [asxz,tilf,asxz,eoc]
2                  [ago,age]               [asxz,gpjyq,eoc]

Table B.1. Vocabulary words which do not have unique confusion class sequences. The confusion classes are: [eoc, uvw, mn, r, asxz, d, bhk, tilf, gpjyq].
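A table like this one can be reproduced mechanically from the class assignment given in the caption. The sketch below (Python; the six-word vocabulary is only a stand-in, since the full 1030-word list of Appendix A is not repeated here) groups words by confusion-class sequence and reports the colliding groups together with the fraction of uniquely indexed words. Run over the full vocabulary, the colliding groups are the rows of Table B.1 and the fraction is the 91.9% figure quoted above.

    from collections import defaultdict

    # Letter-to-class assignment from the caption of Table B.1.
    CLASSES = ["eoc", "uvw", "mn", "r", "asxz", "d", "bhk", "tilf", "gpjyq"]
    CLASS_OF = {letter: name for name in CLASSES for letter in name}

    def class_sequence(word):
        return tuple(CLASS_OF[letter] for letter in word)

    def collision_report(vocabulary):
        """Group a vocabulary by confusion-class sequence; return the colliding
        groups and the fraction of words whose sequence is unique."""
        groups = defaultdict(list)
        for word in vocabulary:
            groups[class_sequence(word)].append(word)
        collisions = [words for words in groups.values() if len(words) > 1]
        unique = sum(len(words) == 1 for words in groups.values())
        return collisions, unique / len(vocabulary)

    # Stand-in vocabulary for illustration only.
    collisions, fraction = collision_report(["time", "line", "fine", "the", "and", "word"])
    print(collisions)   # [['time', 'line', 'fine']]
    print(fraction)     # 0.5  (3 of the 6 sample words index uniquely)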
