Using lexical knowledge and parafoveal information for the recognition of common words and suffixes. Rhone, Brock William. 1987.

USING LEXICAL KNOWLEDGE AND PARAFOVEAL INFORMATION FOR THE RECOGNITION OF COMMON WORDS AND SUFFIXES

by

BROCK WILLIAM RHONE
B.Sc., The University of British Columbia

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
October 1987
© Brock William Rhone, 1987

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3
Date: 15 October 1987

ABSTRACT

Research over the past decade into the psychophysics of reading has demonstrated that information extracted from text falling on the parafoveal and peripheral regions of the retina is used by the human visual system to significantly increase reading speed. Recent results provide evidence that knowledge of word frequency is brought to bear in processing parafoveal data. There is other psychological evidence indicating the type of large-scale features used by the visual system to recognize isolated characters in parafoveal vision.

This thesis describes the design and implementation of a system able to recognize the most commonly occurring English words and suffixes from parafoveally available information by employing knowledge of their letter sequences and of large-scale features of lower-case characters. The Marr-Hildreth theory of edge detection provides a description of the information computed by the earliest stages of visual processing from parafoveal words. Large-scale features extracted from this description, while relatively invariant with respect to noise and font changes, are insufficient to uniquely identify most characters but are used to place each into one of several classes of similar characters. The sequence of these 'confusion classes' is found to place a strong constraint on word identity—of the 1000 most common words comprising the system's vocabulary, representing 70% of the volume of the Brown Corpus of printed English, 92% have mutually unique confusion class sequences. Word recognition is achieved by using the confusion class sequence as a key into the vocabulary, retrieving the word or words having the same sequence. Suffixes are recognized in a similar way.

Results are presented demonstrating the system's ability to identify words and suffixes in text images over a range of simulated parafoveal eccentricities and in two different fonts, one with serifs and one without. Smoothing by the Marr-Hildreth operator, the simplicity and scale of the features, the size of the character classes, and the context provided by the character sequence give the system a degree of robustness.

CONTENTS

Abstract
Contents
List of Tables
List of Figures
Acknowledgment
1. Introduction
1.1 Summary of the Psychophysical Evidence
1.2 A Mechanism for Using Parafoveal Information in Word Recognition
2. Approaches to Text Recognition
2.1 The Pattern Recognition Approach
2.1.1 Recognition of machine-printed text
2.2 An AI Approach
2.3 The Basis of a New Approach
2.4 Applying the New Approach in Understanding Reading
2.5 Overview of the Psychophysical Evidence
2.5.1 The Work of Rayner and Colleagues
2.5.2 Character Features Used as Perceptual Cues in Parafoveal Vision
3. Concepts, Design, and Implementation
3.1 A Mechanism for Parafoveal Word Recognition
3.1.1 Character Features and Confusion Classes
3.1.1.1 Features
3.1.1.2 Confusion Classes
3.1.2 The Most Common English Words
3.1.3 Effectiveness of Indexing by Confusion Class Sequence
3.1.4 Modeling Parafoveal Information Computed by the Human Visual System
3.2 Implementation
3.2.1 System Overview
3.2.2 Platform
3.2.3 Computing Parafoveal Information
3.2.4 Data Structures
3.2.5 Vocabulary Indexing
3.2.6 Segmentation
3.2.6.1 Recursive linear segmentation algorithm
3.2.6.2 Textlines, Words, and Characters
3.2.6.3 Segmentation of Words and Characters
3.2.6.4 Determining baselines and midlines
3.2.7 Feature Implementation
3.2.8 Character Classification and Word Identification
4. Results and Discussion
4.1 Results
4.1.1 Initial Trials
4.1.2 An Explanation of Some Common Word Recognition Errors
4.1.2.1 The Proofreading Problem
4.1.2.2 Confusing Common Words for Less Common Ones
4.1.3 Words with Non-Unique Confusion Class Sequences
4.1.4 Classification of All Letters
4.1.5 Recognizing Characters in a Different Font
4.2 Discussion and Evaluation
4.2.1 Adequacy of Features and Classes
4.2.2 Robustness
4.2.3 Merging
5. Conclusions
5.1 Contributions
5.1.1 Toward a Representation for Parafoveal Information
5.1.2 A Mechanism by which Word Frequency Knowledge can Facilitate Parafoveal Word Recognition
5.1.3 The Confusion Class Sequence Constraint
5.1.4 A Flexible Means of Integrating Multiple Sources of Knowledge
5.2 Areas for Further Research
5.2.1 Toward a Computational Theory of Reading
5.2.1.1 Beyond the Single Character Focus
5.2.1.2 Exploiting Other Knowledge Sources
Bibliography
Appendix A. System Vocabulary
A.1 Vocabulary
A.2 Suffixes
Appendix B. Non-Unique Confusion Classes
LIST OF TABLES

Table 2.1 Features proposed by Bouma, ordered by estimated Upper Cue values
Table 3.1 A few entries from the vocabulary confusion class sequence index
Table 3.3 Definition of the confusion classes in terms of the set of features
Table 3.4 The two sets of most commonly occurring English words used
Table 3.5 Results of indexing the 2005 most common English words by their confusion class sequences
Table 3.6 Results of indexing the parafoveal word recognition system's vocabulary words
Table 3.7 The widths in the fovea of the central part of the receptive fields of the largest four retinal spatial frequency channels
Table 3.8 Values of σ, the space constant for the ∇²G operator, calculated according to formula 3.1 with P = 30 pixels and A = 20'
Table 3.9 Word and suffix CC indexes
Table 3.10 Implementation of Character Features
Table B.1 Vocabulary words which do not have unique confusion class sequences

LIST OF FIGURES

Figure 3.1 (b) illustrates the zero-crossings (points of intensity change) computed from image (a) by the ∇²G operator with σ = 4.7 pixels
Figure 3.2 Schematic of the Parafoveal Word Recognition System
Figure 3.3 Composition hierarchy of object types used to represent the objects found in the input image
Figure 3.4 Internal representation of some objects
Figure 3.5 Coordinate System
Figure 3.6 Operation of the recursive linear segmentation algorithm
Figure 3.7 Results of applying the recursive linear segmentation algorithm
Figure 3.8 The mid- and base-lines
Figure 3.9 Results of the segmentation phase
Figure 3.10 Prolog code defining some confusion classes
Figure 4.1 Trial with σ = 3.9 (Channel N at centre of parafoveal region) and font Geneva
Figure 4.2 Trial with σ = 4.7, font Geneva
Figure 4.3 The Proofreading Problem (σ = 4.7, font Geneva)
Figure 4.4 Less common words and abbreviations confused with common ones (σ = 3.1, font Geneva)
Figure 4.5 Words with non-unique confusion class sequences (σ = 4.7, font Geneva)
Figure 4.6 Classification of all 26 lower-case letters (σ = 3.9, font Geneva)
Figure 4.7 Trial with characters of font Palatino (σ = 3.9)
Figure 4.8 Classification of all lower-case letters in font Palatino (σ = 3.9)

ACKNOWLEDGMENT

First of all, I would like to thank my supervisor, Richard Rosenberg, for the guidance and support he has given me over the course of the work reported here. I would also like to thank David Lowe for his interest and his comments on an earlier draft of this thesis. Thanks also go to Marc Majka, who was always willing to help in making the software and hardware do what it was supposed to. Finally, I wish to acknowledge the great debt I owe to my family, whose encouragement and support were essential to the successful completion of this undertaking.

CHAPTER 1. INTRODUCTION

The human skill of reading has long been a subject of study within several disciplines. To psychologists, it has been seen as an accessible domain for the study of visual perception; to the pattern recognition community, reading is an ability whose emulation by machines would be of enormous practical value.
Reading involves the use of many different types of knowledge, from that of the internal structure of individual characters to the syntax and semantics of natural language, and would therefore seem a natural domain for research in the fields of Artificial Intelligence and computational vision, which are centrally concerned with the representation and use of knowledge. Although this has not generally been the case, there is now a growing body of psychological evidence as to how the human visual system handles the reading process which can provide insight, guidance and a solid basis for such endeavors.

This thesis is concerned with a single aspect of the complex process of reading. Recent psychophysical evidence suggests that lower resolution information gathered from characters and words falling to the right of the point where the eye is directly fixated, or 'looking at', is used to significantly increase reading speed and in particular may be used to identify familiar words and letter sequences. A natural question to ask is "how can knowledge of familiar words be used in their identification when the low resolution of the visual data may not allow the identification of individual letters?" Results from research in the field of computational vision provide a model of the information computed by the earliest stages of the visual pathway from a text image falling at any point on the retina, and there is additional psychological evidence as to the structural features of characters which are used as perceptual cues in parafoveal vision. This thesis describes an application of these results in the design and implementation of a system which exploits a strong constraint provided by the letter sequences of common English words to recognize these words and common suffixes appearing in a parafoveal image.

Much of the earlier work on word recognition, in both the fields of psychology and pattern recognition, has shared a common characteristic in paying little attention to the role of early processing as a source of rich and useful descriptions of visual information. Brady (1981) has noted the parallel with earlier trends in Artificial Intelligence research, where the difficulty in developing adequate descriptions based on low-level information alone resulted in a trend toward often less than successful "heterarchical" systems which relied on a complex integration of many levels of knowledge in order to resolve ambiguity.

As a model for a new approach, Brady points to the field of computational vision, which has seen considerable success in explaining the operation of early visual processing. The field is characterized in part by an emphasis on determining the purpose and kind of the knowledge required before considering the mechanisms by which it is used, and close attention to evidence suggesting how the processes operate in natural systems. Brady suggests following this approach to develop a "computational theory of reading", and proposes as the next step in this endeavor the search for the means by which low resolution and high resolution visual information is integrated between fixations during reading. This thesis describes an effort following the same approach and proposing an answer to the related question of how low resolution parafoveal information might be employed in the early identification of common words and suffixes.
1.1 SUMMARY OF THE PSYCHOPHYSICAL EVIDENCE

The process of human reading does not proceed by the exhaustive recognition of each individual character in a line of text. This has been known since the late 19th century, when it was observed that a four letter word could be recognized in about the same amount of time needed to recognize a single character (Hull 1986a, p.156). Studies of eye movements during reading have shown that many characters never fall on the fovea—the small area of maximum visual acuity in the retina—and in fact some whole words, usually function words like 'the' and 'and', are frequently skipped altogether. Recent results strongly suggest that lower resolution information obtained when a word falls on the parafovea (the area surrounding the fovea) is integrated with high resolution information obtained on the subsequent fixation to facilitate word recognition. In fact, a certain proportion of words—those that are skipped, for instance—may be recognized on the basis of parafoveal information alone.

There is also experimental evidence as to the nature of the information that can be extracted from the parafoveal image; it apparently includes such features as word length, and for individual characters, large scale features such as the presence or absence of ascenders and descenders. The scale of these features is consistent with the degree of detail which the computational theory of early vision predicts should be available within the parafovea.

Finally, there is evidence as to the kind of knowledge which is applied in processing this low-resolution parafoveal information. One of the most recent eye movement experiments shows that the parafoveal information from the most common or most familiar English words is processed far more quickly than that for low-frequency words—strong evidence that word-level knowledge is being brought to bear.

This set of observations—that the human visual system apparently uses low resolution parafoveal information in word recognition and applies word-level knowledge in its interpretation—suggests a new avenue to be explored in the application of high-level knowledge in text recognition. The design of a system along these lines involves a definition of the information that is available parafoveally, and a determination of the constraints that lexical knowledge makes available for its interpretation.

1.2 A MECHANISM FOR USING PARAFOVEAL INFORMATION IN WORD RECOGNITION

The parafoveal word recognition system described in this thesis operates in the following manner. An implementation of the Marr-Hildreth theory of edge detection is used to produce a description of the information computed by the earliest stage of visual processing from an image of text falling on an arbitrary point within the parafovea. From each 'blob' representing a separate character is extracted a set of large scale features. This information is in most cases not enough to uniquely identify the character, but is enough to place each into one of several 'confusion classes', each representing all letters bearing the same large scale feature or features. One such confusion class, for example, consists of all characters having descenders.

The word is now represented by a sequence of confusion classes, one for each letter. A simple idea now permits identification of the word, if it is within the system's vocabulary of the most common English words. The confusion class sequence is used as a key into this vocabulary, and all those words with matching sequences are retrieved. Common suffixes give rise to characteristic confusion class sequences which are recognized in the same way, by index lookup into a set of common suffixes.

It is of course essential to the feasibility of this scheme that most lookups into the vocabulary retrieve a single word—in other words, that the confusion class sequence provide a strong constraint on the identity of the word. One of the results reported here is that this is indeed a strong constraint.
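The lookup just described can be made concrete with a small Prolog sketch (Prolog being the language used for part of the implementation; see Figure 3.10). The sketch is purely illustrative: the char_class/2 facts, the four-word vocabulary and the predicate names are stand-ins rather than the thesis code, and the class names follow those used later in Table 3.3.

% Hypothetical letter-to-class facts (only the letters needed for the
% example are listed).
char_class(t, tilf).  char_class(h, hkb).  char_class(e, eoc).
char_class(a, aszx).  char_class(n, nm).   char_class(d, d).
char_class(l, tilf).  char_class(o, eoc).  char_class(k, hkb).

% Illustrative vocabulary entries.
vocabulary(the).  vocabulary(and).  vocabulary(took).  vocabulary(look).

% A word's confusion class (CC) sequence is the sequence of the
% classes of its letters.
word_cc_sequence(Word, Seq) :-
    atom_chars(Word, Letters),
    maplist(char_class, Letters, Seq).

% Recognition: the CC sequence extracted from the parafoveal image is
% used as a key, and every vocabulary word with that sequence is retrieved.
recognize(Seq, Words) :-
    setof(W, (vocabulary(W), word_cc_sequence(W, Seq)), Words).

With these definitions, the query recognize([tilf,hkb,eoc], Words) returns the single word 'the', while recognize([tilf,eoc,eoc,hkb], Words) returns both 'look' and 'took', mirroring the kind of entries shown later in Table 3.1.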
CHAPTER 2. APPROACHES TO TEXT RECOGNITION

The first section of this chapter briefly outlines work on text and character recognition that has been done in the field of pattern recognition, and describes an example of an AI approach to the problem. The results from the pattern recognition efforts make it quite clear that the task is non-trivial and that robust performance will require that disparate sources of knowledge be applied. In attempting to do just that, the AI program described exhibits some of the fundamental problems common to other AI systems of its era. This leads into the description of a new approach, outlined in the following section, to the understanding and emulation of perceptual tasks. The final section describes the psychophysical research into the reading process.

2.1 THE PATTERN RECOGNITION APPROACH

The great bulk of the work on text recognition has been directed primarily toward the near-term engineering of practical algorithms for reading machines and has followed what may be termed the 'pattern recognition' approach. Such systems typically rely solely or primarily on character level knowledge, and operate by classifying each individual character on the basis of some set of (usually ad hoc) features extracted from its image. The sub-field is therefore termed 'Character Recognition.' The results of this approach have been mixed and have commonly prompted the conclusion that in order to reliably recognize printing—typeset or handwritten—which is not closely constrained to some expected norm, multiple sources of knowledge must be brought to bear. This section gives a brief overview of the work in this area.

Character recognition systems typically proceed through four stages (sketched in the fragment following this list):

• Segmentation, which involves demarcating, in the digitized image of the input text, each 'blob' which is presumed to be a separate character.

• Preprocessing, which involves size normalization, smoothing and thinning of the character image.

• Feature extraction—the "central issue" in handprint recognition, according to a recent review of the field (Suen et al., 1980, p.473). The features most commonly used are geometrical and topological ones such as strokes and curves of various orientations and the relations between them, the locations of end points, intersections and holes, and so on. The choice of features is ad hoc in the majority of cases, although one group (Blesser et al., 1974) has used the results of a series of psychological experiments to define a set of features which human subjects apparently use in distinguishing ambiguous characters.

• Classification. The character's identity is finally determined by comparing, via some kind of statistical classifier, its feature profile with those obtained from a training set of characters.
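The sketch below restates those four stages as a single Prolog clause. It is illustrative only: every predicate named (segment/2, preprocess/2, extract_features/2, training_cluster/2, distance/3) is a hypothetical placeholder for the stage it represents, not part of any system surveyed here.

% Illustrative four-stage character-recognition pipeline.
recognize_character(Image, Identity) :-
    segment(Image, Blob),                    % demarcate one character 'blob'
    preprocess(Blob, Normalized),            % size normalization, smoothing, thinning
    extract_features(Normalized, Features),  % strokes, curves, end points, holes, ...
    classify(Features, Identity).            % statistical comparison with training data

% One common form of the classification stage: choose the training
% cluster whose prototype is nearest to the observed feature profile.
classify(Features, Identity) :-
    findall(Dist-Id,
            ( training_cluster(Id, Prototype),
              distance(Features, Prototype, Dist) ),
            Scored),
    keysort(Scored, [_NearestDist-Identity|_]).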
Suen et al.'s review came to the following conclusions regarding the performance achieved so far by systems for recognizing handprinted characters: "remarkably high" recognition rates can be obtained "[w]hen trained subjects are used, and the shapes of models are specified with provision to disambiguate easily confused characters" (this means training subjects to write V, for example, in a form that is not confused with U). But "generally speaking, the present [handprint recognition] machines are not very satisfactory", their performance being "somewhat inconsistent and irregular" (Suen et al., 1980, p.482). In considering ways to improve this situation, the review emphasizes the large body of acquired knowledge human readers can bring to bear on the problem, and recommends (among other things) research on the human perception of text, and the application of knowledge at other than just the character level, including "grammatical, linguistical [sic], and contextual information".

In what is a clear indication of the very applications-oriented flavour of the character recognition field, the survey also recommends development of a standard set of character models "which could be comfortably followed by human beings to produce samples easily recognized by machines." (p.482)

2.1.1 Recognition of machine-printed text

Not surprisingly, efforts at recognition of typeset text have been more successful and high performance machines are on the market. Nevertheless, in order to achieve high performance recognition of multiple fonts at different sizes, it is evident that character level knowledge alone is not sufficient.

Kahan et al.'s (1987) account of the development at Bell Labs of such a system provides a very recent illustration. The part of this system which relies on character level knowledge extracts from the text image features such as strokes approximating the character skeleton, stroke crossings, and concavities, and matches these against the feature clusters produced by a training set of characters using a Bayesian statistical classifier. To find and correct errors made by the character classification stage, the system then applies "linguistic context" knowledge, whose main component is a spelling checker but which also includes heuristics such as "No numerals or punctuation marks can occur inside strings of text, and no upper case characters can occur between lower case ones."

Even after bringing to bear these multiple knowledge sources, the system, whose input is high quality printed text scanned at reasonably high resolution, does not yet meet the minimal performance goal of 99.9% correct recognition, or about 3 errors per page—a level comparable with manual entry.
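The quoted heuristics translate naturally into simple post-classification checks. The fragment below is a sketch of that idea only (it is not Kahan et al.'s code) and uses SWI-Prolog's char_type/2 to classify characters.

% A classified character string is flagged as suspect if a digit or a
% punctuation mark appears between two letters, or an upper-case letter
% appears between two lower-case ones.
suspect(Chars) :-
    append(_, [A, B, C|_], Chars),
    char_type(A, alpha), char_type(C, alpha),
    ( char_type(B, digit(_)) ; char_type(B, punct) ).
suspect(Chars) :-
    append(_, [A, B, C|_], Chars),
    char_type(A, lower(_)), char_type(C, lower(_)),
    char_type(B, upper(_)).

For example, suspect([t,h,'3',r,e]) and suspect([w,o,'R',d]) both succeed, marking those classifications for re-examination.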
2.2 AN AI APPROACH

Within the AI community, perceptual tasks, visual or otherwise, have generally been regarded as involving far more than simple classification of an input. Perception has been described as a process of "constructing the most plausible interpretation of an [input] not only on the basis of cues which can be extracted from [it] but also on the basis of prior expectations of what it . . . depicts" (Brady and Wielinga, 1978). Those expectations comprise knowledge of many different kinds, which require representations that are both descriptively adequate, and thus able to capture knowledge in all its richness and subtlety and in a manner that makes explicit the aspects required for the task in question; and procedurally adequate—of reasonable computational efficiency.

This section describes Brady and Wielinga's (1978) program for 'reading' handprinted Fortran coding sheets—one of the few examples of an AI approach to the text recognition problem. Outlined in the next section are Brady's later criticisms of this program, which reflect a trend in AI (and computational vision especially) away from over-emphasis on downward flow of "high level" knowledge in interpretation and toward full exploitation of the information available from early processing.

Brady and Wielinga state that their overall emphasis in developing the program was on defining the kind of knowledge necessary, its representation, and the procedures to employ it effectively in interpreting the input image. Knowledge in the microworld of Fortran coding sheets falls naturally into two categories: that concerning Fortran programming and syntax, and general knowledge about (uppercase) characters and their segmentation. This observation led to division of the program into two distinct processes which interact cooperatively to reach an interpretation. The program was written using an implementation of Minsky's frames (Minsky, 1975).

The Fortran knowledge module operates something like a top-down parser, working through possible Fortran statement types in order to assign roles to the "blobs", which are initially classified as either "alphanumeric-or-bracket" sequences or as "operator-or-punctuation". In what the paper calls a reflection of "the enormous redundancy" present in an image of the text, this simple process performs quite well, often identifying the statement type correctly, and always including the correct one if several possibilities are offered.

The character recognition module is far more complex. Character models are explicit structural descriptions containing the strokes and curves of the character's skeleton and the junctions and relations between them. Strokes are extracted from the character image by a simple edge finder, and curves are recognized as a series of appropriately intersecting short strokes.

The paper describes their approach to a number of fundamental AI problems, such as allowing for model instances to differ from the models, the parallel problem of uncertainty and its representation and manipulation, and the resolution of conflicting evidence. Following the "dictates of the bandwagon of heterarchy", their method is to rely on the incremental and synergistic interaction of the various sources of knowledge to resolve uncertainty and conflict and come up with a consistent interpretation. The observation that both modules of the program almost always work in a state of partial knowledge led to a design in which partial percepts or intermediate hypotheses are explicitly represented and are incrementally refined by finding at each step the information which can be most cheaply computed. The authors point out that this approach is in marked contrast with that of many AI programs which tend to create high-level hypotheses based on very little evidence and are then often forced to backtrack.
In practice, however, they found that it forced them to represent explicitly a "surprisingly" large number of intermediate levels of interpretation.

One major problem was the difficulty in computing junctions and the resulting large degree of uncertainty associated with them. This was apparently such a serious problem that they were prompted late in the game to completely abandon the stroke and junction based character models and to begin developing an entirely new and more robust representation based on the shape of the space a character occupies. (This new representation was only partially described in their paper.) It later became apparent, however, that the problems were of a more fundamental nature.

2.3 THE BASIS OF A NEW APPROACH

In a later paper, Brady (1981) is sharply critical of the Fortran reading program, placing it within the genre of complex and less than successful "heterarchical" AI systems characteristic of the 1970's, such as Hearsay-II (Lesser and Erman, 1977) and Margie (Shank et al., 1973), all of which shared the underlying premises that low-level information is inherently ambiguous, that ambiguity can be resolved only by application of high-level domain specific knowledge, and that this interaction is situation dependent, cannot be fully defined in advance, and is therefore best implemented in the form of process interactions. Brady points out several problems that have arisen with this approach. First, useful theories governing process interactions have failed to emerge, and such interactions remain difficult to control, analyze and understand. Secondly, to quote Brady, "the presumed power of heterarchy never materialized. It repeatedly became evident that a small increase in the early processing capabilities of a program could have a far greater impact on the performance of a program as a whole than a vastly greater amount of 'higher level reasoning'." (p.188) A final, and perhaps the key, criticism is that this approach has emphasized the mechanism by which knowledge and information from different sources interacts, without first addressing the crucial issues of precisely what knowledge is necessary to perform the task, how it should be represented in a way that makes explicit the required aspects, and finally, what algorithms can make use of it in a way that achieves the desired performance.

As indicative of a new approach, Brady points to achievements in the development of a computational theory of early vision, a field heavily influenced by the work of David Marr (1982). This field has seen considerable success in producing theories and implementations of such (human) visual processes as edge detection, stereo, shape from shading, and early motion detection. The research methodology producing these results suggests isolating perceptual abilities which psychological evidence suggests occur in humans on the basis of early processing, and places emphasis on defining exactly what information must be extracted from the input in order for the observed perceptual ability to be possible, and on then describing the general, domain independent knowledge of the world which must be assumed in order that the system can extract this required information.

Flowing from this are two important observations having implications for research into the process of reading (from the perspectives of both AI and psychology) and into perceptual tasks in general.
First is the demonstration of the power of early processing—that (for vision at least) rich descriptions can be extracted, and significant ambiguity resolved, on the basis of quite general, domain independent knowledge of the world and without application of downward flowing high level knowledge. This implies of course that when high level knowledge is brought to bear, it deals with more developed and less ambiguous descriptions than was typical in programs in the heterarchy genre, the presumed result being less thrashing and much better performance.

The second observation is the research methodology itself. It is characterized in part by reliance on studies of the human visual system to isolate perceptual abilities and outline the information which they require. It then follows three steps: The first, as outlined above, is to determine the information the system must extract from the input, and the general assumptions about the world the system must make in order to do so. Second, a representation must be designed which makes the required information explicit (rather than leaving it implicit); and only then are algorithms devised to perform the task. The final step is to test the system to see how well it explains the ability observed in the natural system.

2.4 APPLYING THE NEW APPROACH IN UNDERSTANDING READING

Having illustrated the shortcomings of past efforts in developing AI recognition programs, including his own, Brady sets out to use this new approach to begin developing a computational model of the human skill of reading. He considers first the problem of isolating words from text. Psychophysical evidence suggests that inter-word spaces are discovered on the basis of low resolution parafoveally¹ obtained information. Brady uses the Marr-Hildreth theory of edge detection as a means of describing precisely what that information is—ie. how the text appears at the resolution available within the parafovea. Based on this, he postulates that the word isolation algorithm is quite crude and depends on recognizing easily found gross features of the inter-word character or location (such as size or overall shape) which make them distinct from the surrounding text in the parafoveal representation. Unlike earlier theories on word isolation put forward in the psychology literature, this scheme does not depend on high level syntactic or semantic knowledge, and as Brady demonstrates, it is able to account for the empirical evidence and its predictions are confirmed by new experiments.

It is also of interest that Brady's scheme works on the basis of the lower resolution information available in the parafovea, rather than assuming access to fully detailed character information.

Brady considers the next step in the development of a computational theory of reading to be the problem of integrating information from one fixation to the next—in particular, of integrating low resolution information obtained in the parafovea with high resolution information obtained when the same text is fixated foveally. He cites the work of psychologist Keith Rayner as providing the experimental basis for such a study, and points out cases where the analysis of the nature of the parafoveally available information suggests explanations of results obtained by Rayner.
¹ The fovea is the small area of maximum visual acuity subtending 2° at the centre of the retina; the parafoveal region is the surrounding area of lesser acuity subtending 10°; the remainder is the periphery (Rayner and Bertera, 1979, p.468).

Brady goes on to postulate that it is the large scale features of characters which form the basis of the human word representation scheme: "It is not inconceivable that we have learned that such shape information at the extremities of words and from isolated ascenders and descenders within a word are preserved over a typical 2° saccade, and have based our word representation scheme, which develops over several such saccades, and the corresponding processes for eliciting substructure, upon it." (p.205-6) It was this suggestion that prompted the development of the system described in this thesis. More recent experiments by Rayner and others give further clues as to how we use parafoveal information during reading.

2.5 OVERVIEW OF THE PSYCHOPHYSICAL EVIDENCE

2.5.1 The Work of Rayner and Colleagues

This section will briefly describe the recent work of experimental psychologist Rayner and his colleagues which provides a basis for the system described in this thesis. As mentioned earlier, these experiments provide evidence that parafoveal information is used in identifying words, that large scale letter features are extracted from it, and that lexical knowledge, specifically knowledge of word frequency or familiarity, is used in its processing and interpretation.

As all of these experiments are based on the study of eye movements during reading, some background in this area is in order. During normal reading the eye moves across text in a series of sudden and rapid—saccadic—movements which occur at a rate of 4 or 5 per second and typically extend over 7 or 9 characters. The pauses between saccades, when the eye is relatively still and during which information is gathered, are termed fixations. As explained earlier, the fovea is the area in the retina of maximum visual acuity and extends 2° across the fixation point; the parafovea is the surrounding area of lesser acuity with an extent of 10°; and the remainder is termed the periphery. In the experiments described here the fovea covered about 6 characters. Underlying all of this research is the fundamental assumption that the pattern and duration of eye movements and fixations during reading reflects the underlying processing of that information.

Rayner pioneered a technique which allows text to be dynamically altered as it is read, depending on where in the subject's visual field it falls during a fixation. This involves accurately tracking the subject's eye position as text is read from a CRT and making the alterations while the eye is moving between fixations. Thus a word (or part of one) can be displayed differently when it falls on the parafovea than when it falls on the fovea during the subsequent fixation. The technique allows such experiments as studying the effect on reading speed of denying all parafoveal information about words (except their length) by displaying them as strings of x's when they appear in the parafovea but normally when they fall on the fovea.

Through a series of such experiments, evidence has progressively been uncovered as to what information is extracted from the parafovea during reading and how it is used.
The earlier results, such as those cited by Brady (above), showed for instance that letter information useful in word identification, including overall word shape and that of the first and last letters, could be picked up from the parafovea, as could word length information useful in determining the next fixation location (see Rayner, 1975).

Later experiments (for example, Rayner et al., 1982) showed, first of all, that this parafoveally extracted letter information is important—when it was removed, reading speed was cut by 40%. They also provided evidence as to the nature of the information: the letters making up parafoveal words² were replaced by letters which were either visually similar or dissimilar in gross overall shape according to letter groups defined by Bouma (1973) on the basis of large scale features such as ascenders and descenders. For example the letters b, k and h all have left ascenders and so fall into the same group. It was found that when the first 3 letters of a parafoveal word were correct and the remainder were replaced by similar ones, the reading rate was almost as high as when the entire word was correct. Rayner et al. saw this as suggesting "that the visual information extracted from the end of the word to the right of fixation either may not be very precise or may not be used very frequently." (p.547)

² The parafoveal word is the word immediately to the right of the currently fixated word.

The further observation that the replacement by similar letters, as opposed to dissimilar ones, has a significant effect on reading rate supports the former interpretation, and leads to the hypothesis that the 'imprecise' information extracted is in fact that which differentiates similar and dissimilar letters—ie. the large scale character parameters. Recall Brady's observation that just such information should be available at the level of visual acuity present in the parafovea. Rayner et al. (1982) argued that this parafoveally extracted information is used to speed lexical processing—it may be enough in itself to allow word identification, and often it is integrated with information obtained when the word is fixated foveally.

So, there is evidence as to the nature of the information extracted from the parafoveal word, and that lexical knowledge is used in processing it. A natural question at this point is 'What sort of lexical knowledge?'

A recent paper by Inhoff and Rayner (1986) provides strong evidence that word frequency or word familiarity has a marked effect on the processing of parafoveal information. Sentence pairs were concocted such that the sentences in each pair were identical except that a target word was a high frequency word in one, and a low frequency word in the other. The pair of target words was matched so that effects of factors other than frequency (such as word length) would be minimized. As the sentences were presented to the subjects, the target word was made available parafoveally in some cases, and in other cases was not. The results of measurements of the duration of the first fixation on the target word were striking—when parafoveal information was denied, there was virtually no difference in duration between high and low frequency words. But when parafoveal information was available, the first fixation duration was significantly less for high frequency words.
Gaze duration (the total duration of all fixations on the word) was significantly less for high-frequency words whether parafoveal information was available or not. Inhoff and Rayner interpreted their results as "support[ing] the position that parafoveal word processing is sensitive to the lexical characteristics of the parafoveal word and that high-frequency parafoveal words are processed more effectively than low-frequency parafoveal words." (p.436)

2.5.2 Character Features Used as Perceptual Cues in Parafoveal Vision

The work of Bouma (1971) provides evidence as to what properties of characters the human visual system uses as perceptual cues in recognizing text appearing outside the fovea. Bouma's experiment investigates the perceptual cues or features mediating the recognition of isolated lowercase letters in marginal reading conditions. Subjects were required to identify letters presented either (1) at a long distance, or (2) in eccentric vision (ie. falling on the parafovea or periphery), in order to induce a good fraction of incorrect responses or "confusions", as it is from the nature of these confusions that it can be deduced which letter features have been used in recognition and which have not.

The data sets chosen for analysis were those for which distance and retinal eccentricity, respectively, were such that the rate of correct responses was 50%. (Eccentricity in this case was 7°, which corresponds to 2° beyond the far edge of the parafovea, within the periphery.) The 26 lower-case letters were in the typeface 'Courier 10' as printed by an IBM Selectric typewriter. From the relatively high overall level of confusions, Bouma concluded that observers "readily use available cues for arriving at letter responses."

Results for distance reading and eccentric reading were compiled into two confusion matrices which record the number of times each letter was given as a response to each stimulus letter. Based on the results, Bouma placed letters into the following groups, among which confusions were more or less common: aszx eoc nmu rvw dhkb tilf gpjyq.

There is no direct evidence in these results indicating which character properties or features have been used by the subjects or, of course, "what cue combinations constitute the observer's implicit knowledge of a certain letter form." Bouma therefore proposes certain features which are shared within groups of confusable letters, and for each feature determines from the data an estimate of the upper bound of its cue value—an estimate of the degree to which perception of that feature determines the response (see Table 2.1).

Property considered | Cue | Property letters considered | Cue letters considered | Upper cue value
H > 2.5 mm or H/W > 1.6 | High letters | all letters with ascenders or descenders | all letters with ascenders or descenders | 0.92
Oblique outer parts | Obliques | vw | vw | 0.92
H/W > 1.22 | Slender letter | tilfj | tilfj | 0.90
Left upper extension | Left upper ext. | hkb | hkb | 0.89
Right upper extension | Right upper ext. | d | d | 0.88
H < 2.0 mm or H/W < 1.16 | Short letter | all small letters | all small letters | 0.84
Lower gap (0.4 mm) | Lower gap | n | rm | 0.80
Two vert. outer parts (short letters) | Double vertical | nmu | nmuh | 0.77
Left part envelope circular | Left part round | eoc | eocdqag | 0.74
Upper dot | Dot | i | ij | 0.70
Right outer gap, no inner part | Right gap | c | ecszxrtf | 0.66
Upper gap (0.5 mm) | Upper gap | u | uvw | 0.65
Rectangular envelope (short) | Rect. letters | aszxnmu | aszxnmu | 0.65
Circular envelope | Round | eoc | eoc | 0.63
Presence inner part (short) | Something inside | aszxemw | aszxemw | 0.60
Right outer gap + inner part | Right gap | e | ecszxrtf | 0.38

Table 2.1. Features proposed by Bouma, ordered by estimated Upper Cue values. H is letter height, W is width. (From Bouma, 1971, p.466.)

The upper cue value is the fraction of responses to a stimulus letter possessing a feature to which perception of that feature may have contributed. It is calculated as the fraction of correct responses plus the fraction of incorrect responses where the incorrect response is nevertheless another letter possessing the same feature.
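Written out (this is a paraphrase of the calculation just described, not a formula quoted from Bouma), the estimate for a feature f is

\[
\mathrm{UCV}(f) = \frac{C_f + E_f}{N_f},
\]

where, over all presentations of stimulus letters possessing f, N_f is the total number of responses, C_f the number of correct responses, and E_f the number of incorrect responses that are nevertheless letters also possessing f.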
Bouma concludes that of these features which may have served as perceptual cues in conditions of distance and eccentric vision, those with the strongest upper cue values are: height-to-width quotient, vertically ascending and descending parts, outer vertical and outer oblique parts, and outer gaps. Inner parts of characters are apparently weak cues at best.

He also concludes that these results would be reasonably representative of other fonts—the observation that details closer together than about 0.5mm (the height of short letters was 1.95mm) could not be perceived in the experiments suggests that while "the results will be somewhat different, in particular for typefaces without serifs", one should "not expect dramatic differences."

It is of some concern that, as Bouma himself points out, there is no direct evidence that the character properties he proposes are actually used by the human visual system. Haber and Haber (1981, p.165) make this point about Bouma's features more forcefully: "the procedures described here are ad hoc, in that there is no guarantee that all potentially useful features have been isolated, nor any evidence that any of those features are actually used by readers. Presentation of a confusion matrix, or a list of similarity scale scores, tells us which letters are confusable or similar, but not in what ways. Therefore, even after a century of research on the visual characteristics of letters, we yet know little about what those characteristics actually are."

It is also of concern, as Bouma again points out, that while "[i]n normal reading, the role of single letters is perhaps overshadowed by that of larger units ... [w]e work here on the assumption that research on single letters will eventually help to clarify the special role of letter combinations." (p.459)

While these results appear to provide one of the best estimates currently available, clearly more work in this area is required.

CHAPTER 3. CONCEPTS, DESIGN, AND IMPLEMENTATION

This chapter describes in detail the conceptualization and implementation of the parafoveal word recognition system. The first section develops the ideas which form the basis of the system; the second describes the implementation.

3.1 A MECHANISM FOR PARAFOVEAL WORD RECOGNITION

The basic idea behind the system arose out of a consideration of the information the human visual system apparently extracts from word images falling on the parafovea, and the knowledge which it brings to bear in processing that information.
Michael Brady considered the "next step" in developing a computational theory of early visual processing in reading to be the problem of integrating information obtained parafoveally with that obtained foveally, and suggested that the word representation scheme needed to support this might be based on such large scale shape information as isolated ascenders and descenders. Rayner et al.'s (1982) finding of the facilitative effect on reading of substituting "visually similar" letters in parafoveal words supports this idea, and provides further evidence as to what this large scale shape information might be—not just ascenders and descenders, but also the other large scale features which Bouma's confusable letter groups share. Finally, Inhoff and Rayner's (1986) work provides evidence that knowledge of whole words is used in processing parafoveal information, and that the parafoveal information from high frequency words is processed more quickly than that for infrequently encountered words.

At this point, it was necessary to ask only how knowledge of whole words might interact with the large scale character information available in the parafovea for the confusion class indexing scheme described here to suggest itself. If information extracted from a word's image by parafoveal processing is not good enough to identify a character uniquely, but only to place it within a group of similar characters—a confusion class (CC for short)—then representing each word by the sequence of the confusion classes into which its characters fall may provide a simple and fast method for recognizing words and letter sequences.

The process would operate in the following manner. A predefined set of words—the system's vocabulary—would be indexed by their confusion class sequences (see Table 3.1). In line with the evidence from Inhoff and Rayner, the vocabulary would contain only high frequency words. During the recognition phase, each character of a parafoveal word would be classified into one (or more) confusion classes on the basis of its large scale features extracted from parafoveally available information, and the sequence of these confusion classes would then be used as a key into the index for retrieval of the indexed word (or words). If the word's CC sequence is not in the index—and a required property of the indexing scheme is that most words not in the vocabulary do not share the same CC sequences with words that are—its recognition would be the job of more complex processes relying on higher resolution foveal information, although constraints provided by the CC sequence may be of use at this next level.

CC sequence index key | corresponding words
[tilf,bhk,eoc] | (the)
[aszx,mn,d] | (and)
[tilf,eoc,eoc,bhk] | (took look)

Table 3.1. A few entries from the vocabulary confusion class sequence index.

Commonly occurring suffixes, prefixes, and other letter sequences could be indexed and recognized in the same way.

This scheme for quick recognition of common words and letter sequences may be expected to possess certain desirable properties:

1) The features on which the character classification is based are simple, large scale shape properties of characters rather than their finer details, and there is empirical evidence of their use by the human visual system.
It is therefore probable that such a set of features and the confusion classes based upon them captures some intrinsic and invariant properties of character forms, and that a recognition scheme based upon them should be relatively insensitive to changes in font and/or style.

2) The scheme should be fairly robust with respect to errors or difficulties in the classification of any individual character. First, there is a small number of confusion classes and the features are presumed to be based on relatively invariant character properties, so a fair degree of variation from the norm should be tolerated before a letter would be misclassified. Secondly, the index lookup is done using the whole sequence of confusion classes, one for each character. With this context available, a well designed lookup procedure should have a good chance of retrieving the correct result even if one or two of the characters have multiple, wrong, or missing classifications.

The following sections describe some of the key issues behind this mechanism for parafoveal word recognition, including defining the features and confusion classes, determining the degree to which words index uniquely by these classes—a factor critical to the feasibility of the system—and modeling the information available in the parafovea which forms the input to the system.

3.1.1 Character Features and Confusion Classes

3.1.1.1 Features

The character features used by this system are essentially a subset of the features described by Bouma (section 2.5.2), consisting of those having the highest upper cue values which are sufficient to discriminate among the chosen confusion classes. (Recall that the upper cue value is a measure of the degree to which the presence of a feature determines the correct classification of the character.) There are two areas where the features described here differ from those of Bouma.

1) Bouma's experiments involved the recognition of single, isolated characters. Without the context provided by adjacent characters arranged on a line, it is not possible to establish reference lines which allow one to easily pick out the ascenders and descenders. Bouma therefore defined only the features tall_letter (comprising characters having an ascender or descender) and short_letter, both of which have high cue values. The parafoveal word recognition system will encounter one or more words per line and can therefore assume the presence of enough characters to establish a baseline and midline (see section 3.2.6.4) and thus easily distinguish between characters with ascenders and those with descenders.

2) Bouma's feature two_vertical_outer_parts has been split into two: left_vertical_outer_part and right_vertical_outer_part.

3.1.1.2 Confusion Classes

The confusion classes selected for the parafoveal word recognition system are shown in Table 3.3. They are similar to those described by Bouma but not identical. As described earlier, Bouma formed his confusion classes by grouping letters among which confusions were more or less common. He grouped them first into short letters, letters with ascenders, and letters with descenders, and then further subdivided the short letters into 4 subgroups: aszx eoc nmu rvw, and the ascenders into 2 subgroups: dhkb and tilf. His full set of classes was therefore: aszx eoc nmu rvw dhkb tilf gpjyq.
The differences between Bouma's confusion classes and those defined here are as follows:

1) r in a separate class. Bouma's grouping of r with v and w seems odd. These letters do not share confusions to any significant degree, especially within his Eccentric Vision confusion matrix, and unlike r, v and w are laterally symmetric and (in most fonts) have oblique_outer_parts, a feature with a high cue value. Since r does not seem to fit any better into any other small letter class, it was decided to place it in a separate class by itself.

2) d separated from hkb. The features distinguishing d from h, k, and b—left_upper_extension and right_upper_extension—are strong ones, so Bouma's single class was split into two.

3) u grouped with vw rather than with nm. This change was made for two reasons: First, although u, m, and n share the property two_vertical_outer_parts, the feature upper_gap (and/or lower_gap), which discriminates u, v, and w from n and m, is quite strong. Secondly, in recognition of handprinted characters, u and v are the most frequently confused pair of letters (Suen & Shillman, 1977).

Confusion classes (columns): aszx (subdivided into asz and x), eoc, r, nm, uvw (subdivided into vw and u), d, hkb, tilf, gpjyq
small: T T T T T T T
ascender: T T T
descender: ¬ ¬ T
left_upper_extension: T
right_upper_extension: T
slender_character: ¬ ¬ T
left_vertical_outer: ¬ T T
right_vertical_outer: T T
upper_gap: ¬ T T
lower_gap: ¬ T T T ¬
oblique_outers: ¬ ¬ T
left_round: ¬ ¬ T ¬ ¬

Table 3.3. Definition of the confusion classes in terms of the set of features. A 'T' in the column under a confusion class indicates a character must possess the feature to be included in the class; a '¬' indicates it must not possess the feature. The absence of any entry indicates the feature is not relevant for the particular CC. These features are similar to those defined by Bouma (1971) (see Table 2.1). The features small, ascender, descender, left_vertical_outer, and right_vertical_outer differ as described in the text.

3.1.2 The Most Common English Words

The set of the most commonly occurring English words used here was taken from the Standard Corpus of Present-Day Edited American English, a corpus of language texts assembled at Brown University in the early 1960s which has become something of a standard in statistical studies of English (Kucera & Francis, 1967). The 'Brown Corpus', as it is commonly known, is made up of samples from edited American sources such as newspapers and magazines, scientific writings, fiction, and biography, all printed in 1961. It is 1,014,232 words in total length and contains 50,406 distinct words.

The rank listing—the listing of words in order of their frequency of occurrence in the corpus—provides the source for the most common words. Table 3.4 describes the two sets of words used here. The most common 1030 words are used as the vocabulary of the system, and the most common 2005 were used to test the effectiveness of CC sequence indexing. Some corpus words were not included in these sets, including numerals (e.g. '1960') and those words containing apostrophes.

set name | # words in set | corpus frequency | corpus rank | fraction of distinct words | fraction of running words
Vocabulary | 1030 | >100 | 1070 | 2.123% | 69.537%
first 2005 | 2005 | >54 | 2081 | 4.128% | 76.66%

Table 3.4. The two sets of most commonly occurring English words used.
The set Vocabulary (the words used in the implementation) consists of all 'usable' words (those which are not numerals or symbols and do not contain apostrophes) occurring > 100 times in the Corpus. These words make up 2.123% of the 50,406 distinct words in the Corpus, and fully 69.537% of the 1,014,232 running words.

3.1.3 Effectiveness of Indexing by Confusion Class Sequence

The effectiveness of the proposed system depends in large part on the degree to which common english words have unique confusion class sequences. If most common-word CC sequences are not unique, then of course many lookups of parafoveal words will return multiple answers, and the scheme for quick word recognition will be of questionable usefulness.

It is also important that a large fraction of 'medium frequency' words—the more frequently occurring of the words not in the system's vocabulary—have confusion class sequences which differ from those of the vocabulary words. This will ensure that most words not in the vocabulary will simply fail to match rather than produce an erroneous match.

These conditions were tested in the following manner: The most common 2005 english words, representing 76.66% of the running words in the Brown Corpus, were indexed by their confusion class sequences, and the number of words indexed under each distinct sequence was recorded. If there are few non-unique CC sequences among this group as a whole, then there will be few cases where a word in the first 1000 words (those in the vocabulary) shares a CC sequence with a word in the second 1000 (a sample of 'medium frequency' words). The results are summarized in Table 3.5. It shows that 1842—almost 92%—of the CC sequences are unique (ie. refer to only one word each of the 2005).

  words per CC sequence   frequency   % of 2005 words
  1                       1842        91.9
  2                       67          6.7
  3                       7           1.0
  4                       2           0.4

Table 3.5. Results of indexing the 2005 most common english words by their confusion class sequences. Almost 92% of these words have unique CC sequences.

These results are encouraging. Among the 2005 most commonly occurring words, fully 91.9% have unique CC sequences. And these words can be expected to make up 77% of the words encountered by a word recognition system in an average sample of text (assuming, of course, that the Brown Corpus is representative of an "average" sample).

The same test was run on the first 1030 most common words—the set of words used as the implementation's vocabulary. The results, shown in Table 3.6, are very similar. The same high fraction of words—91.9%—have unique confusion class sequences.

  words per CC sequence   frequency   % of 1030 words
  1                       947         91.9
  2                       34          6.6
  3                       5           1.5

Table 3.6. Results of indexing the parafoveal word recognition system's vocabulary words—the most common 1030 english words—by their confusion class sequences. Again, almost 92% of these words have unique CC sequences.

It seems clear then that among the most commonly occurring english words, a word's confusion class sequence is a strong constraint on its identity.

3.1.4 Modeling Parafoveal Information Computed by the Human Visual System

This section describes the use of a recent theory of human early visual processing to model the information available in the parafovea during reading. This information will form the input to the parafoveal word recognition system.
Experimental evidence suggests the existence at the retinal level of the human visual pathway of four (and probably five) spatial frequency (size-tuned) channels, each of which is sensitive to intensity changes in the image occurring at a particular spatial scale and orientation. Thus at each point on the retina, the incident image is being analyzed for intensity changes by at least four separate channels. Table 3.7 shows the scale of the stimulus to which each channel is most sensitive.

  Channel   w1-D (')   Properties
  N         3.1        sustained response, most sensitive
  S         6.2
  T         11.7       transient response, least sensitive
  U         21

Table 3.7. The widths in the fovea of the central part of the receptive fields of the largest four retinal spatial frequency channels. The notation w1-D refers to the width in minutes of visual angle of the linear projection of a circularly symmetric receptive field. The diameter of these fields w2-D is therefore √2·w1-D. (Marr 1982, p.62.)

There is somewhat more recent evidence suggesting that the fifth, finest resolution channel "plays the most important role in determining the information ... compute[d] foveally." (Brady, 1981, p.193)

The receptive field sizes apparently scale linearly with eccentricity (angular distance from the centre of the fovea) and are about doubled at an eccentricity of 4°. The parafovea extends from an eccentricity of 1° to 5°.

Marr (1982) puts forward the case that the ∇²G operator provides the mathematical basis for the computation performed by these channels.¹ ∇²G was proposed by Marr and Hildreth (1980) as an operator for finding intensity changes occurring at various spatial scales in an image (and is generally known as the Marr-Hildreth edge detection operator). They claimed it was a desirable operator for two reasons: first, the Gaussian part of it, which blurs the image thus tuning it to intensity changes of a particular spatial scale according to the setting of its space constant σ, is the optimal function for this purpose because it is optimally localized in both the spatial and frequency domains and so minimizes the introduction of artifacts. Secondly, the ∇² part offers economy of computation over other possible choices of a spatial differential operator.

¹ ∇² is the Laplacian operator (∂²/∂x² + ∂²/∂y²) and G is the 2-dimensional Gaussian distribution G(x,y) = exp(−(x²+y²)/(2πσ²)).

By applying the ∇²G operator to a digital image, one can determine the information which, according to this theory, is computed by a given retinal spatial frequency channel at any given eccentricity, and in particular, that which is computed within the parafovea. All that is required is to determine the space constant σ of the ∇²G in order to reflect the choice of retinal channel, eccentricity, and image digitization and viewing geometry parameters.

As the channel widths scale linearly with eccentricity e, and are roughly doubled at e = 4°, the width in minutes w1-D at eccentricity e of a channel which subtends m minutes of arc in the fovea is:

  w1-D = ((e + 2)/3)·m    (e ≥ 1°).

This assumes that the channels are of constant size throughout the fovea, which has a diameter of about 2° of visual angle; ie. it extends from e = −1° to e = 1° (Rayner and Bertera, 1979, p.468). The widths subtended by the four channels within the fovea were listed in Table 3.7.
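Written out explicitly, the ∇²G operator obtained from the footnote's definition of G (which follows Marr in placing 2πσ² rather than the more usual 2σ² in the exponent) is

  \nabla^2 G(x,y) \;=\; \frac{1}{\pi\sigma^2}\left(\frac{x^2+y^2}{\pi\sigma^2} - 2\right)\exp\!\left(-\frac{x^2+y^2}{2\pi\sigma^2}\right),

so the mask is negative at its centre, crosses zero on the circle x² + y² = 2πσ², and the zero-crossings of an image convolved with it mark the intensity changes occurring at the scale selected by σ.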
Let s1-D be the width of the central part of the receptive field of the ∇²G operator, expressed in pixels. Then σ = s1-D/2. (Recall that σ is the space constant of ∇²G and the standard deviation of the Gaussian G.)

In order to have the operator model the desired information, we set its receptive field size to that of the chosen channel at the chosen eccentricity:

  s1-D = w1-D·K,

where K is a constant (at small angles A) with units of pixels per minute of visual angle and is determined by dividing the number of pixels in the image subtended by an object (say, by the width of an average character) by the visual angle subtended by that object in the viewing situation being modeled. So,

  K = P/A,

where P = width of an average character (pixels), and A = visual angle subtended by the width of the average character (minutes). In a normal reading situation there are about 3 characters per degree of visual angle, so A = 20'. So finally we have

  σ = w1-D·P/(2A)    (3.1)

where

  w1-D = ((e + 2)/3)·m    (e ≥ 1°),
  e = retinal eccentricity at which the character image strikes the retina (°),
  m = width of the retinal spatial frequency channel in the fovea ('),
  P = width of an average character in the digital image (pixels), and
  A = visual angle subtended by the width of the average character (').

Table 3.8 shows values of σ corresponding to various eccentricities within the parafovea for the two highest resolution spatial frequency channels.

  Eccentricity   Channel width     w1-D at e   σ
  e (°)          in fovea m (')    (')         (pixels)
  1.0            3.1               3.1         2.3
  1.5            3.1               3.6         2.7
  2.0            3.1               4.1         3.1
  2.5            3.1               4.7         3.5
  3.0            3.1               5.2         3.9
  3.5            3.1               5.7         4.3
  4.0            3.1               6.2         4.7
  4.5            3.1               6.7         5.0
  5.0            3.1               7.2         5.4
  1.0            6.2               6.2         4.7
  1.5            6.2               7.2         5.4
  2.0            6.2               8.3         6.2
  2.5            6.2               9.3         7.0
  3.0            6.2               10.3        7.8

Table 3.8. Values of σ, the space constant for the ∇²G operator, calculated according to formula 3.1 with P = 30 pixels and A = 20'.

An example of the results of applying the ∇²G operator to an image of text is illustrated in Figure 3.1.

Figure 3.1. (b) illustrates the zero-crossings (points of intensity change) computed from image (a), the text 'parafoveal vision', by the operator with σ = 4.7 pixels. From Table 3.8, this models what is computed by the finest retinal channel (3.1') at an eccentricity of 4.0° (ie. within the outer part of the parafovea), or by the second finest channel (6.2') at eccentricity 1.0°. Note that the dot and stem of the letter 'i' are merged, making the letter difficult to distinguish from 'l'.

3.2 IMPLEMENTATION

This section describes the implementation of the parafoveal word recognition system.

3.2.1 System Overview

Figure 3.2 presents a schematic of the overall system. The main components are:
• parafoveal text image preparation,
• segmentation of the parafoveal image into separate lines, words, and characters,
• feature extraction and character classification, and
• word and suffix identification.

3.2.2 Platform

The system was implemented on SUN 3/50 workstations² and a SUN 3/260, all running 4.2BSD UNIX³. The main body of the system, from the segmentation component on, is written in GProlog⁴, a modification of CProlog 1.5 with the addition of primitives allowing access to the SunCore⁵ graphics package available on Sun workstations.
The image manipulation programs used to compute the parafoveal image are written in C and are part of the standard library of the UBC Laboratory for Computational Vision. The original input characters are produced from standard bitmap fonts available on an Apple Macintosh⁶.

² SUN Workstation is a trademark of SUN Microsystems Inc.
³ UNIX is a trademark of AT&T Bell Laboratories.
⁴ GProlog was implemented by Barry Brachman.
⁵ SunCore is Sun Microsystems' implementation of the CORE graphics standard.
⁶ Macintosh is a trademark of Apple Computer, Inc.

Figure 3.2. Schematic of the parafoveal word recognition system. [The figure shows the processing pipeline: Marr-Hildreth edge detection applied to the text image (parameters: visual channel and eccentricity); segmentation of the zero-crossings, with determination of text baselines and midlines and segmentation into letters; feature extraction and classification of each segmented letter into a confusion class according to the class definitions; the resulting confusion class sequence (eg. [uvw,eoc,r,d,aszx]) used as a key into the vocabulary of the 1030 most common english words indexed by confusion class sequence (eg. [tilf,hkb,eoc] retrieves [the]; [tilf,eoc,eoc,hkb] retrieves [took, look]), 92% of the 2000 most common english words indexing uniquely under the chosen set of CCs; and the trailing CCs used as a key into the index of common suffixes (eg. [aszx] retrieves [s]; [tilf,mn,gpjyq] retrieves [ing]; [tilf,tilf,eoc,mn] retrieves [tion]).]

3.2.3 Computing Parafoveal Information

The use of the Marr-Hildreth edge detection operator (∇²G) to model the information computed in the parafovea was described in detail in section 3.1.4. Figure 3.2 illustrates schematically the steps involved in this system in producing parafoveal information from the text image.

3.2.4 Data Structures

The heart of the parafoveal word recognition system is the Prolog program. It accepts as input the zero-crossings found in the parafoveal image, represented as chains of points, and attempts to identify the word or words contained in the original image and their possible suffixes according to its pre-defined vocabulary.

The data structures used are quite simple. The objects found in the image and their properties are represented by the object types shown in Figure 3.3, each of which can have certain attributes. They form a simple composition hierarchy: thus an image consists of one or more textlines (a textline is, not surprisingly, a line of text), a textline consists of one or more words, and so on. An schain is a segmented chain (described below) and consists of only one chain.

  Image
   |
  Textline
   |
  Word
   |
  Character
   |
  schain
   |
  chain

Figure 3.3. Composition hierarchy of object types used to represent the objects found in the input image.

Figure 3.4 shows some typical instances of various object types with their attributes.
  readingG.39:
    type: image            font: geneva72        sigma: 3.9
    textlines: [textline1, textline2, textline3, textline4]
    schains: [schain1, schain2, schain3, ...]
    max_extent: 443.0,543.0     min_extent: 21.0,17.0
    edge_search_rect_width: 7

  textline1:
    type: textline         part_of: readingG.39  index: 1
    words: [word1, word2]
    schains: [schain5, schain10, ...]
    enclosing_schains: [schain5, schain10, ...]
    max_extent: 100.0,543.0     min_extent: 21.0,21.0
    midline_X: 33.0             baseline_X: 82.0

  word1:
    type: word             part_of: textline1    index: 1
    chars: [char1, char2, char3, char4, char5, char6, char7]
    enclosing_schains: [schain5, schain10, ...]
    max_extent: 100.0,291.0     min_extent: 21.0,21.0

  char1:
    type: char             part_of: word1        index: 1
    schains: [schain5]
    max_extent: 83.0,54.0       min_extent: 33.0,21.0
    height: 50.0                width: 33.0
    descender: fail             ascender: fail            small: true
    left_vertical_outer: 94.1913     right_vertical_outer: 0.0
    left_round: fail            oblique_outers: fail      lower_gap: fail

  schain5:
    type: schain           part_of: word1
    num_segments: 9
    max_extent: 83.0,54.0       min_extent: 33.0,21.0
    in_textline: textline1

Figure 3.4. Internal representation of some objects. The image from which they are taken is shown with its objects labeled in Figure 3.9 (a).

Figure 3.5 illustrates the coordinate system in which image points are represented.

Figure 3.5. Coordinate system.

3.2.5 Vocabulary Indexing

The representation of the confusion class sequence index for vocabulary words and suffixes is extremely simple in Prolog. The indexes are stored in CProlog's alternate database (to reduce storage and access time) in clauses of the following form: wccindex(<CC sequence>, <list of words with this CC sequence>). (See Table 3.9.) Prolog's built-in matching capability makes searching for and retrieving matching words and suffixes trivial.

  wccindex([tilf,hkb,eoc], [the]).
  wccindex([tilf,eoc,eoc,hkb], [took, look]).
  sccindex([tilf,mn,gpjyq], [ing]).

Table 3.9. Word and suffix CC indexes. A few examples showing the form in which the vocabulary confusion class and suffix indexes are represented.

3.2.6 Segmentation

The segmentation phase involves grouping the contents of the parafoveal image into textlines, words, and individual characters.

3.2.6.1 Recursive linear segmentation algorithm

Before trying to work with the zero-crossings, it is desirable to reduce the volume of data needed to represent them, and to make explicit such properties of their structure as regions of linearity. This is accomplished by applying a recursive linear segmentation algorithm which produces a sequence of line segments approximating, to within a specified threshold, the chain comprising a zero-crossing. The algorithm is simple and may be paraphrased as follows:

Given a chain of contiguous points, draw a straight line L between its end points Ept1 and Ept2. Let Dmax be the distance between L and the point P on the chain furthest from L.
If Dmax < chain_segmentation_threshold, then return L as the segmentation; else P becomes a new end point, and repeat the process on the two sub-chains Ept1-P and P-Ept2, and return the concatenation of their segmentations.

In the current implementation chain_segmentation_threshold is set to 3.5. Figure 3.6 illustrates the process. Figure 3.7 shows the results of applying the algorithm to the zero-crossing chains.

Figure 3.6. Operation of the recursive linear segmentation algorithm. (a) Original chain. (b) P is the point at maximum distance from line L; Dmax is > chain_segmentation_threshold so the chain is split at P. (c) The process is repeated on the two new sub-chains. (e) Final segmentation.

Figure 3.7. Results of applying the recursive linear segmentation algorithm. (a) The zero-crossing chains. (b) The resulting segmented chains (schains). Chain_segmentation_threshold = 3.5.

3.2.6.2 Textlines, Words, and Characters

At this point the zero-crossings are represented by a set of schains. An intermediate step carried out now is the removal of 'noise'—the few small extraneous schains produced by the zero-crossing determination process. This is done by deleting all schains having only one segment less than 5 pixel widths long. A system required to handle real scanned images of text rather than synthesized ones would without question have to pay much more attention to the problem of noise reduction and elimination.

The next step is to determine the lines of text—the textlines—of which there may be more than one. This is done by projecting the schains horizontally onto the x-axis (see Figure 3.5) and assigning each group of mutually overlapping schains to a different textline. Two assumptions are involved here: (1) that the zero-crossings of characters on adjacent lines do not horizontally overlap; and (2) that all the zero-crossings comprising characters (and parts of characters) falling on the same line do overlap horizontally.

3.2.6.3 Segmentation of Words and Characters

Next, each textline is segmented into words and characters. Schains are first ordered on the textline by sorting them by their minimum y-coordinate. Then the internal schains—which represent the inner contours of letters having an enclosed loop—are removed, as only the character's outer contours are considered in the feature extraction process. At this point it is assumed that each remaining schain represents a single character and each character consists of a single schain. The sequence of characters (schains) on the textline is then searched for gaps greater than a certain size. Words consist of the characters falling between gaps and/or the ends of the line.

3.2.6.4 Determining baselines and midlines

The final step in the segmentation phase is the determination, for each textline, of two reference lines—the mid- and baselines as illustrated in Figure 3.8. These mark the positions of the tops and bottoms of small characters and are stored as attributes of the textline.

Figure 3.8. The mid- and baselines.

Finding them involves projecting the textline horizontally onto the x-axis. Their positions are used during the feature extraction phase in discriminating among small letters, letters with ascenders, and those with descenders.
This is done by projecting onto the x-axis the bounding boxes of all the characters on the line and locating the two peaks between the top and bottom of the line which are due to the tops and bottoms of the small letters.

There are a few wrinkles in this process required to handle lines consisting solely of small letters or those containing no characters with ascenders or no characters with descenders. The present implementation relies for the sake of simplicity on the use of several preset font-specific parameters including: min_expected_font_height, the minimum distance in pixels from the bottom of a descender to the top of an ascender, and max_expected_small_char_height, which is self-explanatory.

At the end of the segmentation phase then, the various components of the text image are represented by a set of objects embedded in a hierarchical structure, and the mid- and baselines have been determined (Figure 3.9).

Figure 3.9. Results of the segmentation phase. (a) shows the labelings of the various objects, (b) shows the character, word, and textline bounding boxes and the mid- and baselines. The internal representation of some of the objects from this example, including their attributes, is shown in Figure 3.4 (with the exception that the feature values shown there are not computed until later).

3.2.7 Feature Implementation

Table 3.10. Implementation of Character Features. Illustrated on the right below the example characters are the regions (referred to as search rectangles and outlined with dotted lines) of the character's bounding box which are searched for line segments, and the line segments considered in evaluating the feature. (The overhang on the dotted-line rectangles is due to a glitch in the SunCore graphics package.)

small
Satisfied if a character's height is < the distance between the midline and baseline of the textline containing the character, plus 10%.

ascender
Satisfied if the distance between the top of the character and the midline is > min_significant_ascender_descender_height.
Parameters: min_significant_ascender_descender_height.

descender
Satisfied if the distance between the baseline and the bottom of the character is > min_significant_ascender_descender_height.
Parameters: min_significant_ascender_descender_height.

left_vertical_outer & right_vertical_outer
These features are measured as scalars meant to represent the length of a vertical stroke on the left or right side (respectively) of the character as a percentage of its total height. This value is determined as the sum of the lengths, expressed as a percentage of the character's height, of all line segments within 30° of vertical entirely contained within the edge search rectangle placed along the character's left or right side. The character is considered to have a vertical outer part if this value is > 60.
Parameters: edge_search_rect_width, vertical_outer_segment_angle_tolerance, vertical_outer_min.
For the character 'a' at right: the value computed for left_vertical_outer is 51.6368 so the feature fails; for right_vertical_outer it is 78.1025 so the feature succeeds.

left_round
Determined by a simple template matching process.
This feature is satisfied for a character if all connected sequences of segments completely contained within the left 3/4 of the character which are connected to a segment completely contained within the edge search rectangle along the left side fall within the left_round template. The template is scaled to each character. Its boundaries are drawn as dotted lines in the example at right.
Parameters: edge_search_rect_width, template definition.
For the examples, left_round succeeds for 'c' and fails for 's' and 'v'.

upper_gap & lower_gap
To compute these features, two sets of segments are compared. The first is the set of all line segments completely contained in the edge search rectangle placed against the top (or bottom) of the character; the second is the connected sequence of segments starting at one that is in the rectangle, and all completely contained within it. If the second set is smaller than the first, then an upper (or lower) gap exists.
Parameters: edge_search_rect_width.
For the letter 'u' at right, upper_gap succeeds and lower_gap fails.

oblique_outers
Satisfied if the longest segment having only one end point in the left search rectangle has an angle within 15° of 25° and is > 20 pixels long, and the right such segment has an angle within 15° of 155°.
Parameters: edge_search_rect_width, oblique_segment_angle_tolerance, oblique_segment_min_length.
For the examples at right, oblique_outers succeeds for 'v' and fails for the other example character.

left_upper_extension & right_upper_extension
Succeed if there are any segment end points contained within small search rectangles at the upper left and upper right, respectively, of the character's bounding box.
For 'k', left_upper_extension succeeds and right_upper_extension fails.

3.2.8 Character Classification and Word Identification

The Prolog definitions of a few of the confusion classes are shown in Figure 3.10—they reflect the specifications of the confusion classes as they were originally defined (see Table 3.3).

  classify(Char, gpjyq) :-
      cwriteln(classify, ['classify: trying ', Char, ' as gpjyq...']),
      feature(Char, descender, true).

  classify(Char, mn) :-
      cwriteln(classify, ['classify: trying ', Char, ' as mn...']),
      feature(Char, small, true),
      feature(Char, left_vertical_outer, LV),
      feature(Char, right_vertical_outer, RV),
      constant(vertical_outer_min, VOM),
      LV >= VOM,
      RV >= VOM,
      feature(Char, lower_gap, true).

  classify(Char, eoc) :-
      cwriteln(classify, ['classify: trying ', Char, ' as eoc...']),
      feature(Char, small, true),
      feature(Char, left_round, true).

Figure 3.10. Prolog code defining some confusion classes.

The great majority of characters encountered should match only one class, but it is possible that some will match more than one, and that some will match none. Once all of a word's characters have been classified, a list of all its possible CC sequences is produced. (In the usual case, each character will fall into only one class, so there will be only one such sequence.)
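As a minimal sketch of how such a list could be assembled from classify/2 clauses like those of Figure 3.10 (the predicate names char_classes/2 and chars_cc_sequence/2 are hypothetical; in the actual system a word's characters are available through the chars attribute of the word object):

  % Collect every confusion class a single character matches.
  char_classes(Char, Classes) :-
      findall(CC, classify(Char, CC), Classes).

  % Relate a list of characters to one of its possible CC sequences;
  % alternative sequences are produced on backtracking.  A character
  % matching no class contributes an unbound variable to the sequence.
  chars_cc_sequence([], []).
  chars_cc_sequence([Char|Chars], [CC|CCs]) :-
      char_classes(Char, Classes),
      (   Classes = []
      ->  true                      % CC left unbound
      ;   member(CC, Classes)
      ),
      chars_cc_sequence(Chars, CCs).

Collecting all solutions of chars_cc_sequence/2 for a word's character list then yields the list of possible CC sequences referred to above.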
The CC of an unclassifiable character is represented by an uninstantiated variable, which of course will match with any class during index lookup.

The vocabulary lookup procedure makes use of Prolog's built-in search capability—to retrieve the vocabulary words matching a given CC sequence key, all that is required is a call of the form recorded(wccindex, (<CC sequence>, <returned list containing matched vocabulary words>)).

The procedure for retrieving matching suffixes is much the same. The longest suffix currently stored in the suffix vocabulary is 4 characters long, so the retrieval is tried with the last four CCs of the word's sequence, then the last 3, the last 2, and finally the last 1.

CHAPTER 4. RESULTS AND DISCUSSION

Results are presented here demonstrating the performance of the parafoveal word recognition system in recognizing the words and suffixes in text images at various simulated parafoveal eccentricities and in two different fonts. Some interesting properties of the word recognition scheme are illustrated, such as its insensitivity to misspellings involving confusable characters (the 'proofreading problem'). Following the presentation of these results, the system's performance is discussed and evaluated.

4.1 RESULTS

Each of the trial runs is illustrated below by a series of three pictures showing the original text image, the zero-crossings representing parafoveally available information, and the system's final recognition of words and suffixes.

In order to ensure that characters are reasonably large with respect to pixel size, 72 point Macintosh screen fonts were used in generating the text images¹. The average 72 point character is 30 pixels in width—this value is used in determining σ (see Table 3.8).

¹ The jagged edges present in the large characters result from the fact that this large size font is a simple enlargement of the corresponding 24 point one (the largest commonly available). They are effectively smoothed out by the convolution with ∇²G and do not appear to have any effect on the final results.

The values of the space constant σ of the ∇²G mask convolved with the text image in order to produce the zero-crossings are taken from Table 3.8. A number of the trials shown here were performed with σ = 3.9, which corresponds to retinal channel N (the channel with the highest resolution and greatest sensitivity of the four described in Table 3.7) at an eccentricity of 3°—ie. at the centre of the parafoveal region. Other trials were run with values of σ corresponding to points in the parafovea closer to the fovea and farther away.

The system's word and suffix identification results are displayed graphically. Below each character is shown the confusion class (or classes) into which it has been classified; if the character does not fit any class, a '?' is displayed. The vocabulary word or words which match the confusion class sequence are displayed in bold face, and below this are listed any matching suffixes.

4.1.1 Initial Trials

Figure 4.1 (σ = 3.9) shows the results of a trial using the by now familiar text image that has been used for a number of the examples earlier in the thesis. The text for this and a number of other trials is in the simple, sans-serif font 'Geneva'. The first three words are in the system's vocabulary and are correctly recognized. The correct suffixes are among those identified—in addition, 'ally' is offered as a possible suffix of 'university', as its CC sequence also matches the end of that word.
The last two words—'parafoveal vision'—are not in the system's vocabulary, but again, their suffixes are identified.

[Trial readingG.39: the text 'reading words university parafoveal vision', its zero-crossings, and the system's classification output.]

Figure 4.1. Trial with σ = 3.9 (channel N at the centre of the parafoveal region) and font Geneva. At the upper left is the original image; at the upper right, the zero-crossings extracted from it. At the bottom is the system's classification of the characters, words and suffixes. The first three words are contained in the vocabulary, the last two are not. 'readingG.39' is the name given to this trial; G stands for font Geneva, 39 for the value of σ.

Figure 4.2 shows another trial run, with σ = 4.7, corresponding to the view through retinal channel N at a point further out in the parafovea, or to the view through channel S on the border of the fovea.

[Trial lightG.47: the text 'this light information times Vancouver'.]

Figure 4.2. Trial with σ = 4.7, font Geneva. The word 'times' is not in the system's vocabulary but has the same confusion class sequence as 'lines', which is.

4.1.2 An Explanation of Some Common Word Recognition Errors

4.1.2.1 The Proofreading Problem

In proofreading printed copy it is a common experience that certain typographical or spelling errors can be much more difficult to spot than others. The parafoveal word recognition scheme provides a natural explanation for one aspect of this phenomenon, in cases where the misspelling results from the substitution of confusable letters (Figure 4.3, σ = 4.7).

[Trial spellG.47: the misspelled text 'responsibility governmont imporfant', its zero-crossings, and the system's output.]

Figure 4.3. The Proofreading Problem (σ = 4.7, font Geneva). When a word is easily identifiable in spite of being misspelled, the error is more difficult to spot. The parafoveal word recognition scheme provides a possible explanation for this phenomenon.

4.1.2.2 Confusing Common Words for Less Common Ones

A less common type of word recognition error occurs when a relatively high-frequency word is substituted for another word (or abbreviation) which occurs less frequently. Figure 4.4 illustrates the case where the parafoveal recognition system confuses a word or abbreviation not in its vocabulary with a more common one having the same confusion class sequence.

[Trial uncommonG.31: the text 'tho wore properly'.]

Figure 4.4. Less common words and abbreviations confused with common ones (σ = 3.1, font Geneva).
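The mechanism behind both kinds of error is that substituting one letter for another from the same confusion class leaves the CC sequence, and therefore the index key, unchanged. A minimal sketch (the letter_class/2 facts are transcribed from the class definitions of section 3.1.1.2 for just the letters needed; the predicate names are hypothetical):

  % Confusion class membership for the letters used in this example.
  letter_class(t, tilf).
  letter_class(h, hkb).
  letter_class(e, eoc).
  letter_class(o, eoc).

  % Map a list of letters to its confusion class sequence.
  cc_sequence([], []).
  cc_sequence([L|Ls], [C|Cs]) :-
      letter_class(L, C),
      cc_sequence(Ls, Cs).

  % Both 'the' and 'tho' yield the same sequence, [tilf, hkb, eoc]:
  % ?- cc_sequence([t,h,e], S1), cc_sequence([t,h,o], S2), S1 == S2.

Since 'the' is in the vocabulary and 'tho' is not, the same key retrieves 'the' in both cases, which is why such substitutions are effectively invisible to parafoveal recognition.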
4.1.3 Words with Non-Unique Confusion Class Sequences

Although the great majority of the first 2000 most common english words have mutually unique confusion class sequences, of course not all do. Figure 4.5 shows a few such words and how the system responds to them.

[Trial multiG.47: the text 'look at the simple ball'; 'look' also retrieves 'took', 'simple' also retrieves 'single', and 'ball' also retrieves 'half' and 'hall'.]

Figure 4.5. Words with non-unique confusion class sequences (σ = 4.7, font Geneva). Feature search regions are displayed.

4.1.4 Classification of All Letters

Figure 4.6 shows the system's classification of all 26 lower-case letters in font Geneva at σ = 3.9. All are classified correctly except 'x'. The feature upper_gap (see Table 3.10) fails on 'x' at this σ and font because instead of looking for line segments with a single end point in the search rectangle, as it probably should, it looks for segments with both inside, and there are none of these in this case. In any case, it is certainly not a problem that seriously degrades the system's performance, as the worst that can happen if a single character in a word is unclassified is that more words (and suffixes) may match and be returned in addition to the correct one.

Figure 4.6. Classification of all 26 lower-case letters (σ = 3.9, font Geneva).

4.1.5 Recognizing Characters in a Different Font

One of the key claims made of the expected capabilities of the parafoveal word recognition scheme is that its basis on presumably fundamental and invariant features of character forms should make it relatively insensitive to superficial stylistic changes in characters, such as changes in font. To test this, some trials were run using the font Palatino, which has serifs (Figure 4.7, σ = 3.9). Figure 4.8 shows the system's performance on all letters of this font.

Although there are significant small-scale differences between this font and the sans-serif Geneva, it was necessary to change only 2 of the system's parameters to attain the illustrated performance. These were min_expected_small_char_height, which is used in determining the mid- and baselines (see section 3.2.6.4), and edge_search_rect_width, which is used in the feature extraction process (see Table 3.10) and was made larger to allow for the serifs.

As can be seen, the letter 's' is not classified. This is due to the inadequate way in which the features left_vertical_outer and right_vertical_outer are currently implemented—they do not recognize gaps but simply sum the lengths of line segments along the character's edge, succeeding if this value is greater than 60% of the character's height. The values computed this way for 's' are greater than the threshold so the features succeed. The character does not possess the features upper_gap or lower_gap required to place it in either of the classes uvw or mn, so it falls into no class.

The threshold value 60 could have been raised so that 's' would have been correctly classified. This was not done in order to demonstrate the point that the system's use of more than one source of knowledge in identifying words gives it sufficient robustness to handle failures in the classification of isolated characters. Thus the words 'words' and 'university' are correctly recognized in spite of the fact that the character 's' is not.
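The mechanics of this robustness are simple: an unclassified character contributes an unbound variable to the CC sequence (section 3.2.8), and an unbound position in the key still unifies with the stored entry. A minimal illustration, assuming the index is held as ordinary wccindex/2 clauses of the form shown in Table 3.9 (the actual system keeps them in CProlog's recorded database):

  % A small fragment of the vocabulary index, in the form of Table 3.9.
  wccindex([tilf, hkb, eoc], [the]).
  wccindex([uvw, eoc, r, d, aszx], [words]).

  % The key for 'words' with its final character unclassified still
  % unifies with the stored sequence and retrieves the word:
  % ?- wccindex([uvw, eoc, r, d, _Unclassified], Words).
  % Words = [words].

With a larger index an unbound position may of course unify with several entries, in which case more words (and suffixes) are returned along with the correct one, as noted in section 4.1.4.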
A more serious problem illustrated in Figure 4.7 is that of character merging, which is more prevalent in a font with serifs. This is discussed in detail later.

[Trial readingP.39: the text 'reading words university parafoveal vision' in font Palatino.]

Figure 4.7. Trial with characters of font Palatino (σ = 3.9). The first three words are in the vocabulary and are correctly identified in spite of the fact that the system fails to classify the character 's'. Note that the characters 'vi' in 'vision' are merged. See text for commentary.

Figure 4.8 shows the system's classification of all characters in this font.

Figure 4.8. Classification of all lower-case letters in font Palatino (σ = 3.9). The last line illustrates the problem of merging. Correct classification of the letters v, w and y is illustrated in Figure 4.7.

4.2 DISCUSSION AND EVALUATION

The system's performance will be considered here and evaluated in light of the expectations about its performance cited earlier in Chapter 3.

A key claim was that the proposed confusion classes and feature set capture some invariant properties of character forms. The fact that most characters were classified correctly in the trial runs despite broad settings of the feature parameters, that extensive fine tuning of these parameters was not required to attain the demonstrated performance, and that very few minor changes were required to allow recognition of characters in a different font, are evidence that this claim is justified.

The balance between relying on character-level and word-level knowledge for word and suffix identification gives the system a demonstrated degree of robustness to isolated errors in character classification.

A problem which the system is not currently capable of dealing with is that of merging between adjacent characters in the parafoveal image, which occurs in fonts with serifs at large retinal eccentricities.

Each of these items is discussed in more detail below.

4.2.1 Adequacy of Features and Classes

One of the claims made earlier in the thesis was that the chosen set of features and the confusion classes based upon them capture some fundamental properties of character forms which are invariant over different fonts and styles and large in scale—properties whose presence should therefore be easily detectable in character images of parafoveal resolution and in a range of different fonts. The results obtained support this claim, although more extensive testing with a greater range of fonts and styles is necessary to establish it with greater certainty.

There are three levels in the definition of the classification scheme used here. First, there is the choice of the classes in the domain and the features which domain objects may possess; in this case the classes and features were defined on the basis of psychological evidence. The second level is the definition of each class in terms of a subset of these features, shown in Table 3.3. The final level is the implementation of the features—the method by which each is actually computed.
It is most important to get the first two steps right if the classification scheme is to be successful. If the classes are unnatural ones with respect to the domain, or if the features are ill-chosen and cannot be used in combination to effectively discrimi-nate among the classes, then one would expect a large number of incorrect and non-unique classifications, with no amount of fine-tuning of feature parameters able to compensate. The results obtained here show no non-unique classifications (evidence of over-lapping classes, which would be manifested as characters classified into two or more confusion classes) and very few instances of characters failing to classify. This is in spite of the fact that the various feature parameters, such as the width and shape of the left_round template and the various angle tolerances, are set quite generously. Extensive fine tuning of parameters was not required, and in fact only two parame-ters of lesser importance had to be adjusted to allow correct recognition of the serif font Palatino. The problems that did arise resulted not from the choice of features or the defini-tion of the classes, but from the inadequate implementation of some of the features. The failure to classify the letter's' in font Palatino is an example—as explained ear-lier, it results from the fact that the implementation of the vertical_outer features CHAPTER 4. RESULTS AND DISCUSSION 63 completely ignores the presence of gaps. The letter V is not classified (see Figure 4.6) because of a minor deficiency in the implementation of the feature upper_gap, as explained earlier. All of this may be taken as evidence supporting the claims made of the adequacy of the classes and the features used to discriminate among them. Taken together with the finding of the strength of the confusion class sequence constraint, this argues well for the plausibility of this parafoveal word recognition scheme. 4.2.2 Robustness Another claim made of the expected performance of the parafoveal word recogni-tion system was that it should exhibit a degree of robustness by being capable of functioning correctly in spite of isolated failures in character classification. This is demonstrated to some extent by the system's correct recognition of words in Figure 4.7 in spite of the failure to classify the letter's'. The present system would also operate correctly if a character were classified into more than one CC, as long as the correct CC was among those offered. And it would be relatively easy to modify the index lookup procedure to generate the correct response even if a character were placed in the wrong CC altogether—although this would result in fewer unique word identifications. This demonstrated and potential robustness results from a balance and synergy be-tween the two levels of knowledge—at the character and word levels—employed by the system. If the feature extraction processes—embodying knowledge of charac-ter classes—is unable to classify one or a few characters, the remaining correctly classified characters will likely provide enough context so that the lookup into the word and suffix indexes—which embody word-level knowledge—can come up with the correct response. CHAPTER 4. 
RESULTS AND DISCUSSION 64 Without more extensive testing on different fonts, it cannot presently be claimed that the feature extraction process is demonstrably robust, although this does not seem unlikely for two reasons: 1) because the features being detected are simple large-scale ones; and 2) because of the smoothing performed by the V 2G operator, which will remove or reduce the effects of much small-scale noise—the bane of many earlier character recognition schemes. 4.2.3 Merging A problem which the current system is unable to deal with is that of the merging of the zero-crossings of adjacent characters. This is more prevalent in a font with ser-ifs, particularly at larger values of a (for example, note the letters 'vi' in Figure 4.7 and 'vwxy7 in Figure 4.8). It also occurs with the san-serif font Geneva at values of a corresponding to the extreme edge of the parafovea. It would be quite easy to add a capability to remove limited merging such as that il-lustrated in the cases referred to which occurs between the serifs of certain letter pairs. However, at larger values of o when whole sequences of characters would be merged it would become difficult to separate the individual characters without first identifying them. This is a case of the classic segmentation/interpretation prob-lem—we can't do the interpretation without having the segmented objects, but can't know how to do the segmentation without knowing the interpretation. The merging problem highlights what is, from the computational vision perspec-tive, probably the fundamental shortcoming of this scheme for parafoveal word recognition—its requirement of prior segmentation of individual characters. It is quite implausible that the human visual system would employ a scheme which would fail catastrophically as soon as letters were close enough together that their parafoveal images merged. This does not, however, invalidate the basic ideas un-CHAPTER 4. RESULTS AND DISCUSSION 65 deriving the current system, for two reasons. First of all, large scale word shape is determined primarily by large scale character shape, and it is certainly plausible that a number of the features implemented in the present system could usefully be ex-tracted from an un-segemented word image. Secondly, the indexing scheme itself is inherently flexible and could be modified from indexing by sequences of confusion classes to sequences of word-level features. This issue is discussed in somewhat more detail in the Conclusion. 66 CHAPTER 5. CONCLUSIONS 5.1 CONTRIBUTIONS 5.1.1 Toward a Representation for Parafoveal Information The parafoveal word recognition system described here may be considered a small step toward what Brady described as the next stage in developing a computational theory of early visual processing in reading—determining how information is in-tegrated from one fixation to the next and in particular, how low resolution infor-mation obtained in the parafovea is integrated with the high resolution informa-tion obtained when the same text is fixated foveally. The work here refines a set of large scale character features having some psychological basis and demonstrates—to the extent permitted by the limited number of trials—that they can be reliably com-puted from parafoveally available information at a range of eccentricities and from text of two different fonts, providing evidence that these features may be among those used in reading. It also demonstrates how the extraction of these features can facilitate word recognition. CHAPTER 5. 
CONCLUSIONS 67 5.1.2 A Mechanism by which Word Frequency Knowledge can Facilitate Parafoveal Word Recognition The findings of Inhoff and Rayner (1986) raised the research question "How does knowledge of the frequency of occurrence of a word affect the speed with which it is processed parafoveally?" This thesis proposes and successfully demonstrates a pos-sible mechanism for quick recognition of high frequency words and suffixes. 5.1.3 The Confusion Class Sequence Constraint One of the key objectives of the computational approach to visual perception is the discovery of natural constraints within the domain which can be exploited to facili-tate interpretation. The constraint, uncovered here, that the confusion class se-quence places on the identity of a word is a strong one. If a word is one of the 2005 most frequently occurring in english—which comprise 77% of the words one could expect to encounter in a standard1 passage of text then there is a 92% probability that the word's confusion class sequence constrains its identity to a single word of the 2005. As a possible contribution toward a computational theory of reading, this result is interesting for two main reasons. Firstly, it is important not so much because the particular sets of classes and features employed here have special significance, but because they demonstrate, perhaps surprisingly, that quite strong constraints can lie in the combination of a small number of classes based on quite simple large-scale features extracted from parafoveal imagery, and thus that this might be a fruitful area to look for further such constraints. Secondly, this result illustrates that useful constraints may be available when considering a small subset of the word recogni-tion problem which may not be apparent when following the more common route and considering the problem as a whole. The fraction of unique confusion classes 1 Assuming the Brown Corpus reflects the 'standard'. CHAPTER 5. CONCLUSIONS 68 will almost certainly decline as more words are added to the vocabulary (although the extent to which this will happen is unclear and should be looked into)—the constraint may turn out to be of little value in recognizing any other than the more common english words. However, a special capability for the quick recognition of the most common words and letter sequences is an ability one would expect to find in the human visual system, so constraints unique to this subset of the domain are surely useful. From the perspective of those interested in building practical reading machines, the confusion class sequence constraint may be directly useful in itself. As it stands, it allows unique identification of a large fraction of the words a reading machine could expect to encounter. Unless the degree of indexing uniqueness drops off sharply as more words are added to the vocabulary, it could well provide a gener-ally useful means of significantly constraining a word recognition system's search space. 5.1.4 A Flexible Means of Integrating Multiple Sources of Knowledge The idea of using sequences of features extracted from a word image as an index into a vocabulary for word identification is not a new one (Hull, 1986a), although use of the technique has not been widespread. However, its use in conjunction with features present in parafoveal information and for identification of common words and suffixes is, to this author's knowledge, a novel one. 
The indexing scheme is easily modifiable in the particular features and classes used—thus one can alter the degree and kind of character level information which must be extracted before word level knowledge is brought to bear, effectively alter-ing the point of interaction of these two levels of knowledge. For example, one can imagine extracting only the coarsest large-scale features in conjunction with a small CHAPTER 5. CONCLUSIONS 69 vocabulary indexed by a very few large confusion classes to allow recognition of the few most frequent words (such as 'the' and 'and') in information obtained from the far edge of the parafovea. In the same way, it would be easy to exploit knowledge of multi-character structures resulting, for example, from the merging of certain pairs of characters which might give rise to easily extractable and invariant features. In general, this sort of indexing scheme provides a means by which frequently used higher-level knowledge could be 'compiled' for quick access. The scheme also in-volves true integration of word-level knowledge into the recognition process, un-like many efforts in the character recognition literature, where a vocabulary is used only to choose among alternatives arrived at after exhaustive recognition of indi-vidual character (as in Kahan et al., 1987). This real integration of two level of knowledge allows the system to bypass the implicit assumption that individual characters must be recognized as an essential first step before higher level knowl-edge can be applied. This is more in line with what has long been known of the reading process which is that it does not operate by exhaustively and sequentially identifying each individual character. 5.2 AREAS FOR FURTHER RESEARCH 5.2.1 Toward a Computational Theory of Reading The further development of a computational theory of reading could certainly ben-efit from further evidence from psychological experiments describing in greater de-tail the working of the human visual system in reading. CHAPTER 5. CONCLUSIONS 70 5.2.1.1 Beyond the Single Character Focus A primary weakness of the system presented here is its basis on the segmentation and classification of individual characters. It does not seem reasonable that individual characters can form the basis of the word shape representation of the human visual system, which must be able to represent shape information across a range of resolutions, including that from the far edge of the parafovea or the pe-riphery where individual characters will be merged together and only the largest scale word shapes will be available. In handwritten text, of course all characters are connected at all resolutions, and it unreasonable to imagine that there is an entirely separate shape representation system for this case. The continued search for such a word and character shape representation—Brady's next step toward a computa-tional theory of reading—able to represent and integrate the information at differ-ent resolutions gathered during successive fixations remains a key area for further research. The work reported here provides a basis for this search. As Bouma recog-nized, "[i]n normal reading, the role of single letters is perhaps overshadowed by that of larger units," but it is probably safe to assume that "research on single letters will eventually help to clarify the special role of letter combinations." The shape of a word is certainly determined by that of the individual characters it contains, par-ticularly the large scale features of those characters. 
These are precisely the features used in the present system, so a number of these could no doubt be usefully applied to an unsegmented word image. Other large scale features unique to particular multi-letter combinations might arise and be useful in word identification. The same sort of indexing scheme used in the present system could also be applied, but instead of the sequence of confusion classes, the sequence of features would be used as the key into the vocabulary which would be indexed in the same way. The degree to which common words will index uniquely under such a scheme will have to be determined, but if many of the word-level features are indeed the same CHAPTER 5. CONCLUSIONS 71 as the character-level ones used here then the degree of uniqueness observed under confusion class indexing provides reason for optimism. 5.2.1.2 Exploiting Other Knowledge Sources The current system exploits knowledge of the letter sequences of common words and suffixes at the word level, and of lower-case letter structure at the character level. There are other knowledge sources available, such as english spelling rules— some letter sequences are impossible and could be recognized and removed from consideration—and letter transition frequencies—some letter pairs are far more common than others. These could be exploited in a system of this kind to refine the currently rather crude estimate of a word's letter sequence provided by the confu-sion class sequence. This would be especially useful in the case of words not in the vocabulary. Such a refined estimate could be passed on to the process identifying the word from foveal information to reduce its search space and/or to point out at what location in the word higher resolution information could most quickly re-duce the remaining ambiguity and lead to identification. 72 BIBLIOGRAPHY Blesser, B., R. Shillman, T. Kuklinski, C. Cox, M. Eden, and J. Ventura, "A Theoretical Approach for Character Recognition Based on Phenomenological Attributes," Int. J. Man-Machine Studies, vol. 6, pp. 701-714, 1974. Bouma, H., "Visual Recognition of Isolated Lower Case Letters," Vision Research, no. 11, pp. 459-474,1971. Brady, J. M. and B. J. Wielinga, "Reading the Writing on the Wall," in Computer Vision Systems, pp. 283-297, Academic Press, 1978. Brady, M., 'Toward a computational theory of early visual processing in reading," AI-Memo-593, MIT-AI, Cambridge, MA, 1980. (also in Visible Language, vol. 15, no. 2, pp. 183-214,1981.) Cox, C.H., P. Coueignoux, B. Blesser, and M. Eden, "Skeletons: A Link Between Theoretical and Physical Letter Descriptions," Pattern Recognition, vol. 15, no. 1, pp. 11-22,1982. Haber, R.N. and L.R. Haber, "Visual Components of the Reading Process," Visible Language, vol. 15, no. 2, pp. 147-182,1981. Hanson, A.R., E.M. Rieseman, and E. Fisher, "Context in Word Recognition," Pat-tern Recognition, vol. 8, pp. 34-45, 1976. Hull, Jonathan J., "Word Shape Analysis in a Knowledge-Based System for Reading Text," Proc. of the Second IEEE Conference on Artificial Intelligence Applica-tions, pp. 114-119, Miami Beach, Florida, December, 1985. Hull, Jonathan J. and Sargur N. Srihari, "A Computational Approach to Visual Word Recognition: Hypothesis Generation and Testing," Proc. IEEE Com-73 puter Society Conference on Computer Vision and Pattern Recognition, pp. 156-161, Miami Beach, June 1986a. Inhoff, Albrech W. and Keith Rayner, "Parafoveal Word-Processing During Eye Fixations in Reading—Effects of Word Frequency," Perception & Psy-chophysics, vol. 
BIBLIOGRAPHY

Blesser, B., R. Shillman, T. Kuklinski, C. Cox, M. Eden, and J. Ventura, "A Theoretical Approach for Character Recognition Based on Phenomenological Attributes," Int. J. Man-Machine Studies, vol. 6, pp. 701-714, 1974.

Bouma, H., "Visual Recognition of Isolated Lower Case Letters," Vision Research, vol. 11, pp. 459-474, 1971.

Brady, J. M. and B. J. Wielinga, "Reading the Writing on the Wall," in Computer Vision Systems, pp. 283-297, Academic Press, 1978.

Brady, M., "Toward a Computational Theory of Early Visual Processing in Reading," AI-Memo-593, MIT-AI, Cambridge, MA, 1980. (Also in Visible Language, vol. 15, no. 2, pp. 183-214, 1981.)

Cox, C. H., P. Coueignoux, B. Blesser, and M. Eden, "Skeletons: A Link Between Theoretical and Physical Letter Descriptions," Pattern Recognition, vol. 15, no. 1, pp. 11-22, 1982.

Haber, R. N. and L. R. Haber, "Visual Components of the Reading Process," Visible Language, vol. 15, no. 2, pp. 147-182, 1981.

Hanson, A. R., E. M. Riseman, and E. Fisher, "Context in Word Recognition," Pattern Recognition, vol. 8, pp. 34-45, 1976.

Hull, Jonathan J., "Word Shape Analysis in a Knowledge-Based System for Reading Text," Proc. of the Second IEEE Conference on Artificial Intelligence Applications, pp. 114-119, Miami Beach, Florida, December 1985.

Hull, Jonathan J. and Sargur N. Srihari, "A Computational Approach to Visual Word Recognition: Hypothesis Generation and Testing," Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 156-161, Miami Beach, June 1986.

Inhoff, Albrecht W. and Keith Rayner, "Parafoveal Word Processing During Eye Fixations in Reading: Effects of Word Frequency," Perception & Psychophysics, vol. 40, no. 6, pp. 431-439, 1986.

Kahan, Simon and Theo Pavlidis, "On the Recognition of Printed Characters of Any Font and Size," IEEE Trans. PAMI, vol. PAMI-9, no. 2, pp. 274-288, March 1987.

Kucera, H. and W. Francis, Computational Analysis of Present-Day American English, Brown University Press, 1967.

Lashas, A., R. Shurna, A. Verikas, and A. Dosinas, "Optical Character Recognition Based on Analog Preprocessing and Automatic Feature Extraction," Computer Vision, Graphics, and Image Processing, vol. 32, pp. 191-207, 1985.

Lesser, V. R. and L. D. Erman, "A Retrospective View of the Hearsay-II Architecture," IJCAI-77, pp. 790-800, 1977.

Marr, David, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, Freeman, San Francisco, 1982.

Minsky, Marvin, "A Framework for Representing Knowledge," in The Psychology of Computer Vision, ed. P. Winston, McGraw-Hill, N.Y., 1975.

Mori, Shunji, Kazuhiko Yamamoto, and Michio Yasuda, "Research on Machine Recognition of Handprinted Characters," IEEE Trans. PAMI, vol. PAMI-6, no. 4, pp. 386-404, July 1984.

Rayner, Keith and Alexander Pollatsek, "Is Visual Information Integrated Across Saccades?," Perception & Psychophysics, vol. 34, no. 1, pp. 39-48, 1983.

Rayner, Keith and J. H. Bertera, "Reading Without a Fovea," Science, vol. 206, pp. 468-469, 1979.

Rayner, Keith, "Eye Movements in Reading and Information Processing," Psychological Bulletin, vol. 85, pp. 618-660, 1978.

Rayner, Keith, "Lexical Complexity and Fixation Times in Reading: Effects of Word Frequency, Verb Complexity, and Lexical Ambiguity," Memory and Cognition, vol. 14, pp. 191-201, 1986.

Rayner, Keith, "The Perceptual Span and Peripheral Cues in Reading," Cognitive Psychology, vol. 7, pp. 65-81, 1975.

Rayner, Keith, Arnold D. Well, Alexander Pollatsek, and J. H. Bertera, "The Availability of Useful Information to the Right of Fixation in Reading," Perception & Psychophysics, vol. 31, no. 6, pp. 537-550, 1982.

Rayner, Keith, et al., "Integrating Information Across Eye Movements," Cognitive Psychology, vol. 12, pp. 206-226, 1980.

Schank, R., N. Goldman, C. Rieger, and C. Riesbeck, "MARGIE: Memory, Analysis, Response Generation and Inference on English," IJCAI-73, pp. 255-261, 1973.

Schurmann, J., "Reading Machines," Proc. 6th International Conference on Pattern Recognition, pp. 1031-1044, October 1982.

Shillman, R. J., T. Kuklinski, and B. Blesser, "Psychophysical Techniques for Investigating the Distinctive Features of Letters," Int. J. Man-Machine Studies, vol. 8, pp. 195-205, 1976.

Srihari, Sargur N. and Jonathan J. Hull, "Knowledge Integration in Text Recognition," Proc. AAAI-82, pp. 148-151, 1982.

Suen, C. Y. and R. J. Shillman, "Low Error Rate Optical Character Recognition of Unconstrained Handprinted Letters Based on a Model of Human Perception," IEEE Trans. SMC, vol. SMC-7, pp. 491-495, June 1977.

Suen, C. Y., M. Berthod, and S. Mori, "Automatic Recognition of Handprinted Characters: The State of the Art," Proceedings of the IEEE, vol. 68, p. 469, New York, 1980.

Ullman, J., "Advances in Character Recognition," in Applications of Pattern Recognition, ed. K. S. Fu, CRC Press, 1982.

APPENDIX A. SYSTEM VOCABULARY

A.1 VOCABULARY

The vocabulary of the parafoveal word recognition system contains 1030 words and consists of a subset of the 1070 words shown below by frequency rank which occur > 100 times in the Brown Corpus (see section 3.1.2). The Corpus words that were not used in the vocabulary include those preceded by '%' below and those containing apostrophes.
the of and to a in that is was he for it with as his on be at by I this had not are but from or have an they which one you were her all she there would their we him been has when who will more no if out so said what its about into than them can only other new some could time these two may then do first any my now such like our over man me even most made after also did many before must % " F through back years where much your way well down should because each just those people mr how too little state good very world still own see men work long here between both % " H life being under never day same another know while last might VB great old year off come since against 8> came right used take three states himself few house use during without again place american around however home small found mrs thought went say part once general upon school every don't does S * . , united left number until always away something %2 fact though water less public put think almost hand enough far took head yet government system better set told nothing night end why called didn't eyes find ES? asked later knew point next program city business give group toward young days let room president side social given present several order national possible rather second face per among form important often things looked early white case john large big need four within felt along children saw best church ever least power development light thing seemed family interest want members mind country area others done turned although open certain kind problem began different door thus help sense whole matter perhaps itself it's york times human law line above name example action company hands local show five history whether gave either today act feet across %3 past quite taken anything having seen death body half really week car field word words already themselves information i'm tell college shall together money period held keep sure free real seems behind cannot miss political air question making office brought whose special heard major problems ago became federal moment study available known result street economic boy position reason change south board individual j * society areas west close turn love community true court force full cost am wife age future voice wanted department center woman common control necessary policy following front sometimes girl six clear further land able feel mother music party provide education university child effect level students military mn short stood town morning total outside figure rate art century class north usually washington leave plan therefore evidence million sound top black hard strong various believe play says surface type value mean soon lines modern near peace table red road tax minutes personal situation %4 alone english go* idea increase nor schools women america living started book longer cut dr finally nature private secretary third months section call greater expected fire needed ground kept that's values view dark everything pressure basis space east father required spirit union complete except I'll moved wrote conditions return support attention late particular recent hope live brown costs else beyond couldn't forces hours nations person taking coming dead inside low material report stage data heart instead looking lost miles read added amount feeling followed makes pay single basic cold hundred including industry move research developed simply tried %1960 can't hold reached committee defense S a n d 1 actually shown sen religious river ten beginning central getting
sort doing received rest %st terms trying care friends indeed medical picture administration difficult fine simple subject building especially higher meeting walked bring cent floor foreign paper similar final natural property training count growth international market police england start talk wasn't written hear story suddenly answer congress hall issue needs considered countries likely working you're earth sat entire happened labor purpose results cases difference hair meet production stand william fall food involved stock earlier increased particularly whom below club effort knowledge letter paid sent thinking %U.S. using christian hour yes bill blue boys certainly ideas industrial points ready square trade %10 addition bad deal due girls method methods moral color decided directly nearly neither showed statement throughout weeks anyone kennedy questions reading according french lay nation programs services %+ physical remember size comes member record southern understand western normal population strength appeared concerned district merely %s temperature volume direction maybe ran summer trial trouble %1961 US continued evening friend list literature sales army association generally influence led met provided chance changes former husband opened science step student aid average %c cause hot month series works direct effective george lead myself piece planning soviet stopped systems theory wouldn't wrong ask clearly forms freedom movement ways worked beautiful bed consider efforts fear lot meaning note organization press somewhat spring treatment hotel placed truth apparently carried degree easy farm game immediately larger lower recently running charge couple daily %de eye performance arms blood opportunity persons understanding additional described march radio served stop technical based chief decision determined image main oh religion reported steps test window appear british character europe 8 » middle responsibility account •Kb horse learned writing activity fiscal green length ones serious types activities audience corner forward hit letters lived nuclear obtained returned slowly justice latter moving obviously plane quality straight born choice figures function include operation parts pattern plans poor saying seven staff stay %6 cars gives shot sun whatever faith pool ball completely extent heavy hospital lack standard waiting wish ahead corps deep democratic effects firm income language principle there's visit %15 analysis distance established expect growing importance indicated none price products attitude cities continue determine division elements existence leaders pretty serve afternoon agreement applied closed easily factors hardly limited reach scene write %30 attack drive health interested married professional remained rhode season station suggested won't covered current despite eight negro played role built commission council date exactly machine mouth original race reasons studies teeth unit becomes demand news prepared rates related relations rise supply bit director dropped %e events james officer playing raised sides standing sunday trees unless actual clay doctor energy meant places talking thomas walk

A.2 SUFFIXES

The system can recognize the following common suffixes: tion ment ally ence ance ing ies ess ion ons ght ed es er al ly ty le en ck e d s.

APPENDIX B. NON-UNIQUE CONFUSION CLASSES

91.9% of the vocabulary words have unique (with respect to the rest of the vocabulary) confusion class sequences (see section 3.1.3).
The remaining words, which do not, are shown here.

freq. of CC seq.   matching words          confusion class sequence
3                  [time,line,fine]        [tilf,tilf,mn,eoc]
3                  [seen,seem,soon]        [asxz,eoc,eoc,mn]
3                  [left,felt,tell]        [tilf,eoc,tilf,tilf]
3                  [half,hall,ball]        [bhk,asxz,tilf,tilf]
3                  [get,got,yet]           [gpjyq,eoc,tilf]
2                  [while,white]           [uvw,bhk,tilf,tilf,eoc]
2                  [when,whom]             [uvw,bhk,eoc,mn]
2                  [took,look]             [tilf,eoc,eoc,bhk]
2                  [these,those]           [tilf,bhk,eoc,asxz,eoc]
2                  [them,then]             [tilf,bhk,eoc,mn]
2                  [terms,forms]           [tilf,eoc,r,mn,asxz]
2                  [step,stop]             [asxz,tilf,eoc,gpjyq]
2                  [single,simple]         [asxz,tilf,mn,gpjyq,tilf,eoc]
2                  [set,act]               [asxz,eoc,tilf]
2                  [road,read]             [r,eoc,asxz,d]
2                  [real,rest]             [r,eoc,asxz,tilf]
2                  [over,ever]             [eoc,uvw,eoc,r]
2                  [out,cut]               [eoc,uvw,tilf]
2                  [not,met]               [mn,eoc,tilf]
2                  [no,me]                 [mn,eoc]
2                  [new,now]               [mn,eoc,uvw]
2                  [most,next]             [mn,eoc,asxz,tilf]
2                  [might,night]           [mn,tilf,gpjyq,bhk,tilf]
2                  [lost,test]             [tilf,eoc,asxz,tilf]
2                  [let,lot]               [tilf,eoc,tilf]
2                  [know,knew]             [bhk,mn,eoc,uvw]
2                  [it,if]                 [tilf,tilf]
2                  [hit,bit]               [bhk,tilf,tilf]
2                  [held,hold]             [bhk,eoc,tilf,d]
2                  [heard,board]           [bhk,eoc,asxz,r,d]
2                  [he,be]                 [bhk,eoc]
2                  [had,bad]               [bhk,asxz,d]
2                  [food,feed]             [tilf,eoc,eoc,d]
2                  [five,live]             [tilf,tilf,uvw,eoc]
2                  [few,low]               [tilf,eoc,uvw]
2                  [feet,feel]             [tilf,eoc,eoc,tilf]
2                  [an,am]                 [asxz,mn]
2                  [also,size]             [asxz,tilf,asxz,eoc]
2                  [ago,age]               [asxz,gpjyq,eoc]

Table B.1. Vocabulary words which do not have unique confusion class sequences. The confusion classes are: [eoc, uvw, mn, r, asxz, d, bhk, tilf, gpjyq].
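A table like this one can be reproduced mechanically from the class assignment given in the caption. The sketch below (Python; the six-word vocabulary is only a stand-in, since the full 1030-word list of Appendix A is not repeated here) groups words by confusion-class sequence and reports the colliding groups together with the fraction of uniquely indexed words. Run over the full vocabulary, the colliding groups are the rows of Table B.1 and the fraction is the 91.9% figure quoted above.

    from collections import defaultdict

    # Letter-to-class assignment from the caption of Table B.1.
    CLASSES = ["eoc", "uvw", "mn", "r", "asxz", "d", "bhk", "tilf", "gpjyq"]
    CLASS_OF = {letter: name for name in CLASSES for letter in name}

    def class_sequence(word):
        return tuple(CLASS_OF[letter] for letter in word)

    def collision_report(vocabulary):
        """Group a vocabulary by confusion-class sequence; return the colliding
        groups and the fraction of words whose sequence is unique."""
        groups = defaultdict(list)
        for word in vocabulary:
            groups[class_sequence(word)].append(word)
        collisions = [words for words in groups.values() if len(words) > 1]
        unique = sum(len(words) == 1 for words in groups.values())
        return collisions, unique / len(vocabulary)

    # Stand-in vocabulary for illustration only.
    collisions, fraction = collision_report(["time", "line", "fine", "the", "and", "word"])
    print(collisions)   # [['time', 'line', 'fine']]
    print(fraction)     # 0.5  (3 of the 6 sample words index uniquely)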
