A COMPUTER SIMULATION STUDY OF THE APPLICATION OF CONTEXTUAL CONSTRAINTS TO CHARACTER RECOGNITION by CORNELIUS A. DYKEMA B.Sc, Queen's Univ e r s i t y , 1966 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE . i n the Department of E l e c t r i c a l Engineering We accept this thesis as conforming to the required standard Research Supervisor . Members of Committee Acting Head of Department Members of the Department of E l e c t r i c a l Engineering THE UNIVERSITY OF BRITISH COLUMBIA September, 1970 In p r e s e n t i n g t h i s t h e s i s i n p a r t i a l f u l f i l m e n t o f t h e r e q u i r e m e n t s f o r an a d v a n c e d d e g r e e a t t h e U n i v e r s i t y o f B r i t i s h C o l u m b i a , I a g r e e t h a t t h e L i b r a r y s h a l l make i t f r e e l y a v a i l a b l e f o r r e f e r e n c e a n d S t u d y . I f u r t h e r a g r e e t h a t p e r m i s s i o n f o r e x t e n s i v e c o p y i n g o f t h i s t h e s i s f o r s c h o l a r l y p u r p o s e s may be g r a n t e d b y t h e Head o f my D e p a r t m e n t o r by h i s r e p r e s e n t a t i v e s . I t i s u n d e r s t o o d t h a t c o p y i n g o r p u b l i c a t i o n o f t h i s t h e s i s f o r f i n a n c i a l g a i n s h a l l n o t be a l l o w e d w i t h o u t my w r i t t e n p e r m i s s i o n . D e p a r t m e n t o f £;/-&-£7~<fc'Ctfd, /5A/g//S/^.&/e/A/<g' The U n i v e r s i t y o f B r i t i s h C o l u m b i a V a n c o u v e r 8, Canada D a t e Sfz.S ="7-Ez*'//i3/JStfT /a, / 4 ^ g > ABSTRACT Improvement of s p e c i f i c character recognition systems through the use of contextual constraints has been demonstrated by several researchers i n the f i e l d . In t h i s t h e s i s , the effectiveness of using contextual constraints to improve the o v e r a l l performance of a generalized character recognition system i s studied experimentally. A computer simulation system designed to f a c i l i t a t e t h i s i n v e s t i g a t i o n for both t h e o r e t i c a l and p h y s i c a l character recognition system models i s described. A s u i t a b l e character recognition system model i s developed. A number of sequential decision algorithms based on Bayes' optimum de c i s i o n rule and using d i f f e r e n t order contextual constraints are implemented as the d e c i s i o n element of the character r e c o g n i t i o n process. The e f f e c t on the performance of the contextual d e c i s i o n algorithms of varying the source data, as w e l l as varying the q u a l i t y and c h a r a c t e r i z a t i o n of the feature e x t r a c t i o n system with c l a s s i f i e r have been demonstrated. The experimental r e s u l t s f o r the p r a c t i c a l character recognition system model confirmed the findings of others that s i g n i f i c a n t improvement i n o v e r a l l system performance can be achieved through the a p p l i c a t i o n of con-textual constraints. I t was shown that the improvement was d i r e c t l y prop-o r t i o n a l to the order of contextual information used for low order s t a t i s t i c s and that the improvement was most pronounced when these s t a t i s t i c s were rep-resentative of the source material, the feature e x t r a c t i o n system with c l a s s i f i e r had a high recognition accuracy i n i t i a l l y and i t s performance was characterized by mistaking the most frequently occurring characters more often than the i n -frequently occurring characters. I t was demonstrated that (1) computer simulation i s a u s e f u l t o o l i n t e s t i n g hypotheses concerning the a p p l i c a t i o n of contextual constraints to ( i i ) character recognition, (2) q u a l i t a t i v e p r e d i c t i o n s based on t h e o r e t i c a l character recognition system models are applicable to p r a c t i c a l character recognition systems, and (3) the improvement which can be obtained with the a p p l i c a t i o n of contextual constraints to character recognition i s s u f f i c i e n t to make machine e d i t i n g of text competitive with c l e r i c a l e d i t i n g . ( i i i ) TABLE OF CONTENTS Page ABSTRACT 1 1 TABLE OF CONTENTS i v LIST OF ILLUSTRATIONS v i ACKNOWLEDGEMENT v i i I. INTRODUCTION 1 1.1 Motivation .' 1.2 Background 1 1.3 Scope of the Thesis 3 I I . DECISION ALGORITHM INPUTS 6 2.1 Source Description 6 2.1.1 S t a t i s t i c a l Description 6 2.1.2 Generating N-graras 6 2.1.3 Estimating P r o b a b i l i t i e s 9 2.1.4 Data Sources 9 2.2 Tentative Decisions 10 2.2.1 The Character Recognizer 10 • 2.2.2 The Character Recognizer Model 12 2.2.3 Channel Simulation 13 2.3 Channel Representations 15 2.3.1 Symmetric Channels 15' 2.3.2 Regularly Structured Channels 16 2.3.3 I r r e g u l a r l y Structured Channels 17 I I I . THE DECISION ALGORITHMS ' 18 3.1 Introduction 18 3.2 Bayes' Optimum Decision Rule 18 3.3 Bayes' Rule, I t e r a t i v e Form 19 3.3.1 Development ' 19 3.3.2 Implementations.... 21 3.4 Bayes' Rule, Look-Ahead Mode of Decision 22 IV. EXPERIMENTS AND THEIR RESULTS 25 4.1 Introduction 25 4.2 Experimental Procedures 25 4.3 Experimental Results 28 (iv) Page V. CONCLUSIONS ; 39 5.1 Concluding Remarks 39 5.2 A p p l i c a t i o n of Results 40 5.3 Suggestions for Further Research 42 BIBLIOGRAPHY 44 (v) LIST OF ILLUSTRATIONS Figure P a g e 1 A general character recognition system 4 2 S i m p l i f i e d flowchart of program for compiling source s t a t i s t i c s 7 3 Block diagram of character recognizer, using measure-ments only 11 4 The character recognizer model 13 5 Flowchart of channel simulation . 14 6 The e f f e c t of source data on the performance of con-textual d e c i s i o n algorithms 29 7 Percent error reduction vs percent channel output e r r o r s ; Bayes' de c i s i o n without look-ahead mode of decision 31 8 Percent er r o r reduction vs percent channel output err o r s ; Bayes' decision with look-ahead mode of decision 33 9 Comparative performance of Bayes' decision algorithms; percent er r o r reduction vs percent channel errors 34 10 Percent er r o r reduction vs percent channel errors for Bayes' de c i s i o n r u l e ; for both M-type channels and IM-type channels 35 11 The e f f e c t of channel q u a l i t y on the performance of contextual d e c i s i o n algorithms f o r an i r r e g u l a r l y structured channel 37 (vi) ACKNOWLEDGEMENT I g r a t e f u l l y acknowledge the f i n a n c i a l support received from both The Defence Research Board of Canada and the National Research Council of Canada. I would l i k e to thank my research supervisor, Dr. R.W. Donaldson, f o r h i s timely encouragements and generous advice throughout the course of th i s t h e s i s . I wish to thank Dr. M.P. Beddoes for reading the manuscript. I also wish to acknowledge Miss Veronica Komczynski and Miss Heather Dubois f o r q u i c k l y and neatly typing the manuscript and Messrs. S. Hussain, G.T. Toussaint and D. A l l a n f o r proofreading the manuscript. F i n a l l y , I wish to thank Messrs. J . Cavers, M. Ito, and D. Chan fo r innumerable s t i m u l a t i n g discussions, t h e i r h e l p f u l suggestions and constructive c r i t i c i s m s . ( v i i ) 1 I. INTRODUCTION 1.1 Motivation Recently, researchers i n the f i e l d of character recognition, such as Donaldson and Toussaint [1] , Raviv [2], A l t e r [3] and others, have improved the performance of t h e i r i n d i v i d u a l character recognition systems by making use of the character source dependencies i n the d e c i s i o n process. However, to the author's knowledge, no attempt has been made to study exper-imentally the effectiveness of applying contextual constraints to improve the performance of a generalized character recognition system, or CRS. In t h i s t h e s i s , a computer simulation system designed to f a c i l i t a t e t h i s i n v e s t -i g a t i o n f o r both t h e o r e t i c a l and p h y s i c a l character recognition systems i s described. A s u i t a b l e character recognition system model i s developed. A number of d e c i s i o n algorithms using d i f f e r e n t contextual constraints are implemented as the d e c i s i o n element of the character recognition process. The e f f e c t on the performance of the contextual d e c i s i o n algorithms of varying the model parameters i s demonstrated. 1.2 Background I n i t i a l e f f o r t s to recognize characters by machine were based on the p r i n c i p l e that i n d i v i d u a l characters could be recognized s o l e l y by t h e i r unique p h y s i c a l properties [4]. L i t t l e a t t e n t i o n was paid to the context i n which these characters occurred. I t soon became evident that purely p h y s i c a l techniques, such as contour t r a c i n g [5], produced un s a t i s f a c t o r y r e s u l t s [6]. Other researchers, such as Edwards and Chambers [7], obtained improved r e s u l t s by g i v i n g some consideration to the character's context. Based on the character's p h y s i c a l properties several p o s s i b l e a l t e r n a t i v e characters were selected . Bigram s t a t i s t i c s were then used to make a f i n a l , d e c i s i o n on the character. 2 These encouraging r e s u l t s caused some researchers to concentrate t h e i r e f f o r t s on using contextual constraints to improve the accuracy of character recognition machines [8]. They were joined i n these e f f o r t s by l i n g u i s t s i n t e r e s t e d i n the degarbling of unorthographic English text [9, 10]. The l i n g u i s t s ' main concern was the co r r e c t i o n of errors rather than the detection and co r r e c t i o n of e r r o r s . The i n i t i a l attempts to incorporate context r e l i e d heavily on di c t i o n a r y look-up schemes, and did not make e x p l i c i t use of the s t a t i s t i c a l constraints of the source material [11]. Cornew [12] used bigrams to make a f i n a l d e c i s i o n on a character a f t e r the word had f a i l e d to be i d e n t i f i e d by a dict i o n a r y look-up. A l t e r [3] and others [11, 13] employed complex algorithms based on e i t h e r syntax or knowledge of frequently misspelled words to correct e r r o r s . In 1948, Shannon [14, 15] showed that the English language i s 75% redundant, and in d i c a t e d how these language constraints might be used to pr e d i c t missing l e t t e r s i n character s t r i n g s . Newman and Gerstman [16], and Carson [17] c a r r i e d out further studies on the analysis and use of l e t t e r constraints i n English text. Shannon [15] also demonstrated that a natural language can be approximated by a kth-order ergodic Markov source, and that the resemblance to natural text becomes markedly stronger with each successive increase i n the order k. This improved t h e o r e t i c a l model of the source, or input to the character recognizer, aided others, such as Drake [18], i n devel-oping a mathematical model f o r an o v e r a l l character r e c o g n i t i o n system. The recognition problem could be regarded as observing a Markov source through a noisy channel. A procedure f o r decoding the channel output f o r source alphabets having more than two classes was devised by Raviv [2] and Abend [19], who showed independently how c l a s s i c a l d e c i s i o n theory could be extended and 3 applied to t h i s problem. Both researchers presented the sequential compound Bayes' procedure for dependent states of nature where the dependence occurs i n the form of Markov dependence. Abend presented the Bayes' procedure i n a h i s t o r i c a l and t h e o r e t i c a l perspective. Raviv, on the other hand, stressed i t s p r a c t i c a l a p p l i c a t i o n and c a r r i e d out extensive experiments to demonstrate i t s e r r o r correcting a b i l i t y as a function of error r e j e c t rate. Recently, the a p p l i c a t i o n of a recursive Bayes' optimal procedure to sequential multi-category pattern recognition has been extended to include source dependencies other than Markov dependence [20], and also to include sequential measurement c o r r e l a t i o n [21]. For an up-to-date perspective of the f i e l d of pattern recognition see Nagy [22], and Kanal and Chandrasekaran [23]. 1.3 Scope of the thesis The general character recognition problem may be modelled as shown i n F i g . 1. The purpose of t h i s thesis was to simulate this model on a general purpose d i g i t a l computer, i n our case an IBM System 360/67, so that the e f f e c t i v e n e s s of various d e c i s i o n algorithms using context could be evaluated and the i n t e r a c t i o n of the various model components could be studied. Spec-i f i c a l l y , several d e c i s i o n algorithms were implemented and t h e i r performance evaluated i n terms of percentage error reduction. The e f f e c t on the per-formance of the d e c i s i o n algorithm of using input data other than design data has been shown. F i n a l l y , the extent to which context i s u s e f u l i n improving the o v e r a l l performance of a character recognition system has been demonstrated for several t h e o r e t i c a l and p h y s i c a l channel representations. In Chapter 2, the design stage of the experiments, the required inputs to the decision algorithms are considered. This includes the s t a t -i s t i c a l d e s c r i p t i o n of the source, the simulation of a character recognizer to S t a t i s t i c a l D e s c r i p t i o n of Source Contextual Cons t r a i n t s Character Source Input Characters Character Recognizer or Noisy Channel S t a t i s t i c a l Descrip t i o n of Channel Tentative Decisions Figures of Confidence De c i s i o n Element F i n a l Decisions Source Estimate F i g . 1. A general character r e c o g n i t i o n system obtain tentative decisions based on measurements alone, and a s t a t i s t i c a l d e s c r i p t i o n of the character recognizer's performance. In the design stage, the trigram, bigram and l e t t e r frequencies were generated from the language data sources. Several sample t h e o r e t i c a l and p r a c t i c a l channel represent-ations are discussed. Chapter 3, contains a b r i e f i n t r o d u c t i o n to the a p p l i c a t i o n of deci s i o n theory to character r e c o g n i t i o n . The i t e r a t i v e form of Bayes' optimum decision procedure i s derived and i s implemented f o r d i f f e r e n t con-tex t u a l constraints i n a s e r i e s of algorithms. The look-ahead mode of dec i s i o n i s discussed. Chapter 4 presents the experimental procedures used and the r e s u l t s that were obtained. Some conclusions that were drawn from the experimental r e s u l t s are discussed. Chapter 5 presents some f i n a l conclusions, a discussion of the a p p l i c a t i o n of the r e s u l t s of the thesis and some suggestions f o r furth e r research. 6 I I . DECISION ALGORITHM INPUTS 2.1 Source Description 2.1.1 S t a t i s t i c a l D escription The s t r i n g of characters presented as input to a character recognizer i s c a l l e d the source. Shannon [13] and others have shown that i f the source i s a n a t u r a l language, such as English text, the dependencies among the characters are i n the form of Markov dependence. Markov dependence of order k i s defined as: p ( x ^ / x r , i ' • ' * > xi) = P ( x /x ...... ,x ) (1) n n-1 1 n n-1 n-k where p(.) i s a c o n d i t i o n a l density function and {x } i s a set of n characters. n This implies that for a zero-order Markov source the characters are i n f a c t chosen independently, but each l e t t e r occurs with the same prob-a b i l i t y that i t has i n the n a t u r a l language. For a f i r s t - o r d e r Markov source the choice of a character depends on the previous character. The e n t i r e spectrum of p o s s i b l e choices for a next l e t t e r may be represented by a table of bigrams. Each bigram represents the p r o b a b i l i t y of occurrence of a l e t t e r p a i r combination. S i m i l a r l y , trigrams are the p r o b a b i l i t i e s of occurrence of l e t t e r t r i p l e t s , and so on t i l l n-grams. Thus we may choose to describe the source i n terms of n-gram s t a t i s t i c s . T h e o r e t i c a l l y , the maximum value of n i s not bounded, however, i n a p r a c t i c a l implementation n i s bounded by the amount of storage a v a i l a b l e . This r e s t r i c t s n to a small number, say n<3, since the s i z e of an n-gram table i s r R where r i s the s i z e of the alphabet. 2.1.2 Generating N-grams No s i n g l e set of n-gram tables which adequately represents the English language i s a v a i l a b l e i n the l i t e r a t u r e , although several researchers have published l i m i t e d bigram and trigram s t a t i s t i c s [24, 25]. In most cases Read i n options, and storage f i l e names Are only new s t a t i s t i c s required? Count trigrams, and store i n array TC_^ ^ No Find and load o l d array, T C . j k Yes 1 Read source data, e d i t to remove non-acceptable characters 1 Estimate trigram prob TC. ., + l / r 2 _ i.l k i i k ? ? f TC. +r l j k i j k Compute bigram p r o b a b i l i t i e s n. . = £ n i j k xjk Compute l e t t e r p r o b a b i l i t i e s n. = ? n. . Store s t a t i s t i c s ; arrays : TC. .. , n . ., , i i k i j k n.., n. J J Are natural logarithms desired? Yes Compute L. ., • = In (n. ) Compute L. . = In ( n. .) Compute L = In ( n ) Store arrays i j k 13 i ( Stop ) No, C s t ° p ) F i g . 2. S i m p l i f i e d flowchart of program f o r compiling source s t a t i s t i c s 8 a very r e s t r i c t e d set of t r a i n i n g data was used. The trigrams i n "Secret and Urgent", f o r example, were based on only 20,000 words. Other shortcomings of these s t a t i s t i c s are that they are not a v a i l -able i n a r e a d i l y usable form. They are usually given as frequencies of occurrence rather than r e l a t i v e frequencies of occurrence, and they generally do not include s t a t i s t i c s on the space. To overcome some of these drawbacks we compiled our own language s t a t i s t i c s . A s i m p l i f i e d flowchart of the program used to compile the s t a t i s t i c s i s shown i n F i g . 2. Some of i t s advantages and options are described here. The program generates trigrams, bigrams and l e t t e r frequencies from raw input data, supplying both frequencies of occurrence and r e l a t i v e frequencies. The input data, samples of English text, can be transcribed d i r e c t l y to key-punched cards without having to remove punctuation marks. The card data must be error free, since t h i s i s design data, and i s stored on magnetic tape, which i s the only type of input accepted by the n-gram compile program. The input data need not be i n one contiguous block, nor does the data have to be processed a l l at once. This means that the s t a t i s t i c s can be updated each time more key-punched data becomes a v a i l a b l e . This also permits the s t a t i s t i c s to be biased i n favour of some s p e c i a l i z e d form of English, such as s c i e n t i f i c E n glish, by compiling the s t a t i s t i c s from predominantly s c i e n t i f i c data. The program considers a 27 character alphabet where the space i s considered as the 27th character. A l l punctuation marks are considered as spaces, since these are generally intended to provide a break i n the context, and the replacement of a punctuation mark with a space serves t h i s same purpose. An a d d i t i o n a l option which may prove u s e f u l f o r some algorithms i s the generation of n-gram tables containing the n a t u r a l logarithms of the prob-a b i l i t i e s . 9 2.1.3 Estimating P r o b a b i l i t i e s Since the n-gram p r o b a b i l i t i e s are i n f a c t not known, we may instead use the maximum l i k e l i h o o d estimates of these p r o b a b i l i t i e s , provided these estimates were based on a s u f f i c i e n t l y large number of t r i a l s . I f each t r a i l consists of s e l e c t i n g a l e t t e r from the data set, then the maximum l i k e l i h o o d estimate of p., the p r o b a b i l i t y of the l e t t e r i , i s n./N where n. i s the number 1 l l of occurrences of the l e t t e r i , and N i s the t o t a l number of t r i a l s . I f we attempt to estimate the p r o b a b i l i t y of an event when n. i s small compared to N/r, where r i s the alphabet s i z e , the maximum l i k e l i h o o d estimate w i l l usually be too small, e s p e c i a l l y i f = 0, To avoid t h i s d i f f i c u l t y when estimating p r o b a b i l i t i e s from a small number of t r i a l s some ad hoc modi f i c a t i o n to the observed frequencies i s usually made. This tends to move a l l the n.'s closer to the mean N/r. Some s t a t i s t i c i a n s f e e l that a modification of the form (n^+k)/(N+kr), where k i s ref e r r e d to as a " f l a t t e n i n g constant" which i s added to each of the sample frequencies, i s a reasonable one. The " f l a t t e n i n g constant" k=l i s used here. This implies that i f the sample frequencies, n_^ , were obtained from the estimates of P ^ j ^ ' the p r o b a b i l i t y of the t r i p l e t i j k , by summing over a l l k and a l l j , then the appropriate form of these estimates 3 2 i s (n... +k' )/(N+k'r ), where k' = 1/r . A more d e t a i l e d i u s t i f i c a t i o n i s i j k J ' given by Raviv [2] and others. 2.1.4 Data Sources The input data from which the bigram and trigram frequencies were compiled came from two sources: (1) approximately 20,000 characters from, "Bigness i s a Numbers Game" by Sanford Rose [26], Fortune magazine; (2) approximately 340,000 characters from Bertrand Russell's book, "Education and the S o c i a l Order" [27]. 10 I t was o r i g i n a l l y intended to get a b e t t e r cross s e c t i o n of English language by s e l e c t i n g large data samples from a wide d i v e r s i t y of sources, ranging from newspaper e d i t o r i a l s to popular s c i e n t i f i c a r t i c l e s . However, the process of key-punching the data was both too c o s t l y and too time consuming f o r the resources a v a i l a b l e at the time. Russell's book was used because i t had already been transcribed to computer cards and thus provided a ready supply of data. The bias to l i t e r a r y English as a r e s u l t of t h i s large sample may be o f f s e t i n the future by incorporating data samples from other sources. To test the performance of the d e c i s i o n algorithms discussed i n Chapter III on the design population, a block of 10,000 characters was chosen a r b i t r a r i l y from t h i s design data as a test sample. This sample was stored i n a separate f i l e on magnetic tape. A second test sample of 10,000 characters was taken from a popular s c i e n t i f i c j o u r n a l , S c i e n t i f i c American. The a r t i c l e used was, "The Rangelands of the Western U.S.," by R. M. Love [28] and was also stored on magnetic tape. The purpose of t h i s test sample was to allow a check on the s u f f i c i e n c y of the source s t a t i s t i c s . 2.2 Tentative Decisions 2.2.1 The Character Recognizer In t h i s t h e s i s , the term character recognizer ref e r s to that part of the character recognition problem which i s depicted i n the block diagram of F i g . 3. I t performs three functions. The transducer describes the state of nature, or character input class i n terms of a set of observ-ables and assigns a value to them. F i n a l l y , the c l a s s i f i e r evaluates the measurements and makes a te n t a t i v e d e c i s i o n or c l a s s i f i c a t i o n . In t h i s r Real World Character Categories Feature Space Measurement Space Transducer x l X l • X r Feature X r E xtractor • X n X n C l a s s i f i e r Character Recognizer D e c i s i o n Space Tentative Decision J F i g . 3. Block diagram of character recognizer, using measurements only 12 the s i s , we are not concerned with the i n d i v i d u a l functions of these com-ponents but rather with the performance of the character recognizer as a si n g l e e n t i t y . 2.2.2 The Character Recognizer Model Since we are not concerned with the i n t e r n a l workings of the character recognizer, i t may be modelled as shown i n F i g . 4. The term channel i s used to r e f e r to a d e s c r i p t i o n of how the sequence of input characters i s mapped into a sequence of ten t a t i v e decisions. The input to the channel i s orthographic English text. The output, i n general, i s garbled English or unorthographic text. The errors which have been introduced are caused by the imperfections of the character recognizer and may be c a l l e d channel noise. Since we do not have an e x p l i c i t d e s c r i p t i o n of the noise c h a r a c t e r i s t i c s nor of the character recognizer without the noise, the channel descriptions must contain the noise i m p l i c i t l y . The channel i s described by a t r a n s i t i o n matrix, C. Each element, c^_., of the matrix represents the p r o b a b i l i t y with which the character recognizer makes the tentative d e c i s i o n that the input character i s a j , when the true input character i s an i . The t r a n s i t i o n p r o b a b i l i t y matrix i s used by the d e c i s i o n algorithms as a weighting f a c t o r for the tentative decisions when considering how much emphasis should be placed on the measurements and how much on context. Each element of the p r o b a b i l i t y t r a n s i t i o n matrix expresses the confidence with which a l e t t e r has been c l a s s i f i e d . Since the elements on the p r i n c i p a l diagonal of the t r a n s i t i o n matrix r e f l e c t the frequency of correct recognition these give a p a r t i a l i n d i c a t i o n of how w e l l the CRS w i l l perform. The other f a c t o r to be considered i n estimating system performance i s the a p r i o r i p r o b a b i l i t y d i s t r i b u t i o n of the source characters. A fig u r e of merit, F, 13 Noise Character Categories Real World Channel Decision Space Tentative Decisions F i g . 4. The Character Recognizer Model used to describe the performance of CRS which does not employ contextual constraints may be defined as 27 (2) F. = E p.c. . i=l 1 13 3 = i where i s the a p r i o r i p r o b a b i l i t y of the i t h character and c i s an element on the p r i n c i p a l diagonal of the t r a n s i t i o n matrix. 2.2.3 Channel Simulation The input to the d e c i s i o n algorithm i s unorthographic English text. To obtain t h i s data from an e r r o r free sample of text we must sim-ulate the channel i f we do not wish to b u i l d an actual character recognizer. This may be done with a general purpose d i g i t a l computer. The method used to simulate the channel i s shown i n the block diagram of F i g . 5. The t r a n s i t i o n matrix of the channel we wish to simulate i s e s s e n t i a l l y an array of d i s c r e t e p r o b a b i l i t y density functions, one for each of the characters of the input alphabet. The corresponding p r o b a b i l i t y d i s t r i b u t i o n s are derived from these density functions. A s t r i n g of characters i s input to the simulator and i s e d i t t e d to remove punctuation marks. A pseudo-random number generator generates p r o b a b i l i t i e s and assigns these successively to each character i n the sequence. Each input character £ S t a r t } Input channel d e s c r i p t i o n , p r o b a b i l i t y density functions. 1 Convert density functions to p r o b a b i l i t y d i s t r i b u t i o n s . Input a s t r i n g of characters 1 1 Remove punctuation marks, and numerals Assign a random p r o b a b i l i t y to each character Match p r o b a b i l i t y to p r o b a b i l i t y d i s t r i b u t i o n function to f i n d output character i Replace input character with output character 1 Output scrambled character s t r i n g 1 / \ / End of input data? ) No ( Stop ) F i g . 5. Flowchart of Channel Simulation 15 i s then used to s e l e c t the appropriate p r o b a b i l i t y d i s t r i b u t i o n function and i t s associated p r o b a b i l i t y determines the corresponding output character. The assigned p r o b a b i l i t y i s located i n the p r o b a b i l i t y d i s t r i b u t i o n by using a 5-level binary search tree. In this manner an output character i s associated with each input character. This procedure i s repeated u n t i l the supply of input characters i s depleted. 2.3 Channel Representations 2.3.1 Symmetric Channels As we have seen, the channel i s completely described by a trans-i t i o n p r o b a b i l i t y matrix. The symmetric channels are described by symmetric matrices. Several d i f f e r e n t symmetric channels have been inve s t i g a t e d i n the experiments of Chapter IV. The p r i n c i p a l d i f f e r e n c e between each of these channels i s the di f f e r e n c e i n the elements of the p r i n c i p a l diag-onals. For convenience of reference each d i f f e r e n t channel i s given a type name and i s described below. Type A 1-e 1-s 1-e 0 0 0 0 1 where e= 25oc The p r o b a b i l i t y of a correct transmission through the channel for each character i s 1-e, except f o r the blank which i s transmitted cor-r e c t l y each time. If an erro r i s made, each i n c o r r e c t character occurs with equal p r o b a b i l i t y , namely e/25. 16 2.3.2 Regularly Structured Channels The r e g u l a r l y structured channels considered here f a l l i nto two categories, the M-type channels and the IM-type channels. The structure of these channels bears some s i m i l a r i t y to the structure of the symmetric A-type channels described above. For both types of channels the erro r p r o b a b i l i t i e s are uniformly d i s t r i b u t e d over the remaining 25 characters. A blank i s never mistaken f o r a character and a character i s never mistaken for a blank. The d i s t i n g u i s h i n g c h a r a c t e r i s t i c of the M-type channel i s that i t s elements on the p r i n c i p a l diagonal have a p o s i t i v e c o r r e l a t i o n with the a p r i o r i l e t t e r p r o b a b i l i t i e s of the source. For the experiments of Chapter IV a l i n e a r f u n c t i o n a l r e l a t i o n s h i p i s used. A t y p i c a l element, m.j, on the p r i n c i p a l diagonal i s given as, m „ = aP(G\) + b, where j = i , a >_0, CKb<L, and P (0^) i s the a p r i o r i p r o b a b i l i t y of the i t h character of the alphabet. The of f - d i a g o n a l elements are defined as a., = m../25, f o r ik xj k = 1,. . .,26, k ± j , j = i . Type M m l l CC 12 CC 13 0 cc 21 m 2 2 OC 23 0 cc 31 CC 32 m „ 33 0 0 0 0 1 The IM-type t r a n s i t i o n matrix has the same structure as the M-type matrix. The di f f e r e n c e i s that the elements on the p r i n c i p a l diagonal of the IM-type channel have a negative c o r r e l a t i o n with the a p r i o r i l e t t e r p r o b a b i l i t i e s of the source. Thus i f a l i n e a r r e l a t i o n s h i p l i k e that for 17 the M-type channel i s used, the range of the parameter 'a' would be a<0. In general, we may say that the M-type channel i s matched to the f i r s t order source s t a t i s t i c s , whereas, the IM-type channel i s mismatched to these s t a t i s t i c s . 2 . 3 . 3 I r r e g u l a r l y Structured Channels The structure of the t r a n s i t i o n p r o b a b i l i t y matrices of the i r -r e g u l a r l y structured channels depends p r i m a r i l y on the manner i n which the feature used to characterize the input characters were se l e c t e d . For example, a feature may be chosen such that the character recognizer has d i f -f i c u l t y i n d i s t i n g u i s h i n g the l e t t e r 'A' from the l e t t e r 'R', but has no d i f f i c u l t y i n d i s t i n g u i s h i n g 'A1 from ' I ' . In the former case, the t r a n s i t i o n p r o b a b i l i t y p would be large compared to the t r a n s i t i o n p r o b a b i l i t y p .. 3.1*. 3 1 Each element of the t r a n s i t i o n p r o b a b i l i t y matrix would probably be obtained from the analysis of the performance of an actual p h y s i c a l character rec-ognizer. Consequently, the simpler symmetric channel models and the reg-u l a r l y s tructured channel models are more r e a d i l y obtainable and are thus us e f u l i n studying the e f f e c t s of the channel on the performance of the d e c i s i o n algorithms discussed i n the next chapter. 18 I I I . THE DECISION ALGORITHMS 3.1 Introduction A large number of d e c i s i o n algorithms which take context into account may be employed as the d e c i s i o n element i n the automatic character recognition system. Since we l i k e to have the best system p o s s i b l e , the algorithms which are optimum for a given set of assumptions and some per-formance c r i t e r i a are of most i n t e r e s t . The ones which have been implemented i n t h i s thesis are based on the optimum Bayes' decision r u l e . The Bayes' d e c i s i o n procedure minimizes the average p r o b a b i l i t y of recognition e r r o r . The a p p l i c a t i o n of the sequential compound Bayes' procedure f o r dependent states of nature to character recognition was shown independently by Abend [19] and Raviv [2]. Abend also showed the evolution of the sequential compound procedure from c l a s s i c a l d e c i s i o n theory. 3.2 Bayes' Optimum Decision Rule The optimal sequential compound Bayes' procedure has been derived under the assumption that the n a t u r a l language to be recognized i s a reg-u l a r Markov chain [2, 19]. To describe the source S, l e t o n represent the state of the Markov chain at time n, the observation time, and l e t \ n rep-resent the character i d e n t i t y at time n. If we assume a second-order ap-proximation to the natural language, the source i s a regular simple Markov chain and the state a n and character i d e n t i t y \ n are interchangeable since each state contains only one character. For higher order approximations each state i s composed of a s t r i n g of characters and \ n represents the l a s t character of the s t r i n g for state 6*n. For the simple Markov chain, the state space considered i s of dimension 27. Each s t a t e represents one of the characters of the alphabet or the blank space. For convenience, the 1 9 character i d e n t i t i e s are represented by the subscripts 1,2,...,27, whereas superscripts are used to denote time of observation. The t e n t a t i v e decisions on the character i d e n t i t i e s , which were obtained by considering feature n measurements only, w i l l be denoted by x , and are synonymous with observations. For character recognition applications the optimum Bayes' d e c i s i o n r u l e i s derived under the assumption that the loss function L(a^,j) i s of the form L ( a . , i ) = 1-6.., where i "is» the true character i d e n t i t y , a c t i o n a. x 1J 1 i s deciding that the true i d e n t i t y i s i , and 6 i s the Kronecker delta function. Thus, a unit i s l o s t whenever an e r r o r i n c l a s s i f i c a t i o n i s made, but nothing i s l o s t f o r correct c l a s s i f i c a t i o n . For the case when past observations only are considered the optimum Bayes' de c i s i o n rule can be stated as f o l l o w s : Choose i|> to be the true character i d e n t i t y at time n, i f i s any integer l<i><_27 such that P C o ^ / x 1 , . ...x11) = max[P(a n=l/x 1,...,x n),...,P(a n=27/x 1,...,x n)] (3) where P ( a n = i / x \ . . .,x11) denotes the c o n d i t i o n a l p r o b a b i l i t y that the state of the Markov chain at time n i s i , given a l l past tentative decisions. 3.3 Bayes' Rule, I t e r a t i v e Form 3.3.1 Development I f we assume that the source i s a regular simple Markov chain, the vector of a p o s t e r i o r i p r o b a b i l i t i e s used to make Bayes' de c i s i o n i n (3) can be calculated from known qua n t i t i e s i n an i t e r a t i v e manner. This i t e r a t i v e structure i s i d e a l f or permitting the procedure to be implemented on a d i g i t a l computer. I t should be noted that i f a higher order approxim-ation to the source i s used no changes need be made i n the development of the i t e r a t i v e form since a higher order Markov process can be reduced to 20 a regular simple Markov chain [30]. In addition to the previous assumptions, we assume that each measurement based tentative decision depends only on the i d e n t i t y of the corresponding character and i s independent of past or future characters. Thus, x\...,x are c o n d i t i o n a l l y independent, which implies that P ( x \ . . . >^i/a1=±,a2=3 , . .. ,an=k) = P(x 1/a 1=i) -P(x 2/a 2=j) .. .P(x n/a n=k) (4) or P(x n/a n=k) = P(x n/a n=k,x 1,...,x n _ 1) (5) Each element of the a p o s t e r i o r i p r o b a b i l i t y vector may be expressed as follows: „ . n , , 1 n. P(,a =k,x ,...,x ) p ( a =k/x ,...,x ) = P ( x i ; . . : , x n ) ^ n , n, 1 n-1. _ . 1 n-1. P (a =k,x /x ,...,x ) . P ( x ,...,x ) P ( x n / X l , . . . , x n - l ) . P ( x l j . . . > x n - l ) P(a n=k,x n/x 1,...,x n 1 ) P C x ^ x 1 , . . . ^ " 1 ) n. n 1 n-1. . n , , 1 n-1 P(x /a =k,x ,...,x )• P(a =k/x ,...,x ) r „ / n . n \ 1 n-1. „ . n . , l n _ i . ~ E P(x /a =j,x ,...,x ) • P(a =j/x x,...,x n L) j=l Using (5) we have P ^ k / x 1 , •••/•Vp(xV-k) ( 7 ) P(a -k/x ,...,x ) j p ( a n = j / x l > - i # > x n - l ) . P ( x n / a n = j ) J=l but F(an=k/x1,...,xn 1 ) = E P(a n=k/a n 1=i) . p ( C T n ^ i / x 1 , . . . ,xn (8) 3=1 Thus the i t e r a t i v e procedure can be stated as follows: „ 1 n . E P ^ k / a ^ i ) ^ ( a ^ t / x 1 , . . i .x11"1) . P ( x V - k ) P(a =k/x ,...,x ) = i i i — (9) j i i J i P C ^ j / a " - 1 - ! ) • P ( 0 n - 1 = i / x 1 , . . . , x n - 1 ) - P ( x n / a n = j ) where P(a =i/x ,...,x ) f o r i = l , . . . , r i s a vector representing the" state of the decision process at time n-1, P(a n=k/o n ^=i) for k = l , . . . , r and i = l , . . . , r represent the st a t e t r a n s i t i o n s of the Markov source and P(xn/o~n=k) for k = l , . . . , r are the channel t r a n s i t i o n p r o b a b i l i t i e s . The vector of a p o s t e r i o r i p r o b a b i l i t i e s at time n can thus be derived from a knowlege of the state at time n-1. Hence this process i s completely s p e c i f i e d i f we know the i n i t i a l state d i s t r i b u t i o n vector. This vector i s obtained from the source data when compiling the desired s t a t i s t i c s . The vector P ( a n = i / x \ . . . ,x n can be regarded as a vector of a p r i o r i p r o b a b i l i t i e s to which Bayes' theorem i s applied to obtain the a p o s t e r i o r i p r o b a b i l i t i e s describing the state of the dec i s i o n process at time n. Bayes' rule (3) i s then used to make the f i n a l d e c i s i o n on the state a 1 1. The a p r i o r i p r o b a b i l i t y vector summarizes a l l the inform-ation that i s relevant to making a dec i s i o n on the state cj n at time n with-out a c t u a l l y having observed the state an through the noisy channel. This l a t t e r information i s contained i n the channel t r a n s i t i o n p r o b a b i l i t y . 3.3.2 Implementations Each d e c i s i o n procedure, or algorithm used i n t h i s thesis was implemented using a Fortran subroutine. The Bayes' i t e r a t i v e procedure (9) can be implemented as i t stands above i f we assume the source to be a f i r s t -order Markov source. The BAYES2 subroutine implements t h i s case. The Markov state t r a n s i t i o n p r o b a b i l i t i e s are computed using a bigram table and the sta t e s ' i n i t i a l p r o b a b i l i t y d i s t r i b u t i o n vector. These s t a t i s t i c s along with the other inputs to the algorithm, the channel t r a n s i t i o n prob-a b i l i t i e s and the tentative decisions, were obtained as indicated i n Chapter I I . The BAYES2 algorithm computes the state vector, or a p o s t e r i o r i prob-a b i l i t y vector, at each time n and decides on o" using (3). The algorithm 22 i s repeated for each character of the input message. The input message i n t h i s case i s the output from the automatic character recognition machine which had been simulated previously. The BAYES2 algorithm i s r e f e r r e d to as Bayes 1 d e c i s i o n using bigram s t a t i s t i c s . The BAYES1 algorithm implements Bayes' de c i s i o n procedure with the m o d i f i c a t i o n that a source with a f i r s t - o r d e r approximation to the n a t u r a l language i s used instead of the f i r s t - o r d e r Markov source. This means that the input characters are independent but occur with t h e i r a p r i o r i p r o b a b i l i t i e s . Hence, the BAYES1 algorithm i s referred to as Bayes' d e c i s i o n using a p r i o r i p r o b a b i l i t i e s . The BAYES3 algorithm also employs Bayes' i t e r a t i v e procedure (9) to compute the state vector a n and decides on the p a r t i c u l a r state r e a l i z - • a t i o n using (3). This algorithm, however, assumes the source to be a second-order Markov source. These second-order s t a t i s t i c s can be employed d i r e c t l y i n (9) provided each state i s represented by a two character s t r i n g and the state t r a n s i t i o n s of one two-character s t r i n g to any other two-character s t r i n g are known. This method requires a large amount of computer s s storage since the Markov matrix i s 27 x 27 , where s i s the length of the character s t r i n g f or each s t a t e . An a l t e r n a t i v e approach, u t i l i z i n g l e s s storage, considers a table of trigrams to be a c o n d i t i o n a l bigram table [2]. The trigram table i s p a r t i t i o n e d into 27 bigram tables. Each bigram table i s conditioned on the (n-2)nd state of the Markov chain. This algorithm i s r e f e r r e d to as Bayes' dec i s i o n using trigram s t a t i s t i c s . A l l three of the above algorithms use only past information to decide on the character i d e n t i t y at time n. ' -3.4 Bayes' Rule, Look-ahead Mode of Decision I n t u i t i o n would t e l l us that i f we had more relevant information a v a i l a b l e before making a d e c i s i o n we would be able to make a be t t e r d e c i s i o n . 23 In the character recognition a p p l i c a t i o n t h i s a d d i t i o n a l information may-be obtained by observing subsequent characters before deciding on the i d e n t i t y of the present character. I f a s i n g l e subsequent tentative decision i s considered, Bayes' decision rule i s given as follows: Choose to be the true state r e a l i z a t i o n at time n i f i|) i s any integer 1<^<27 such that / n ,, 1 n+l. . . n ,. 1 n+l. n 1 n+l. ,,„ s P(o =«>/x ,...,x ) = max [P(a =l/x , . .. ,x ),...,P(a =27/x ,...,x ) (10) The vector P(a n=iji/x^, . . . ,xn+^") , 1<^<27, can be calculated i n the same manner as shown above i n se c t i o n 3.3.1 with one a d d i t i o n a l step a f t e r obtaining P(a^i^/x^, . .. ,x°) . Thus, since n 1 n+l P ( x n + 1 / a n = i ) . P(a n=i/x 1,...,x n) P(° = 1 / X '••"X ) = I P ( x n + 1 / a n = k ) . P ( on= k/x x») k=l and using the previous assumption p / n+l . n+l . n .. , n+l , n+l .. .. „. P U lo =j, a =i) = P(x lo =j) (12) to obtain n+l. n .. I n+l. n+l . n .. „, n+l . . n .. ., „. P(x lo =i) = E P(x lo =j,a =i) . P(a =j/a =i) (13) i = l i t follows that Bayes' i t e r a t i v e d ecision procedure f o r the look-ahead mode of d e c i s i o n can be stated as r ^ / n+l / ^ n + l .x „ / n + l . , n .. „, n . , 1 n. n ]_ n + 1 j = i p ( x / C T =J)-P(a =j/a =i).P(a =i/x ,...,x ) ( 1 4 > P(a =i/x ,...,x ) = -r. —• I I P ( x n + 1 / a n + 1 = t).P(a n + 1=t/a n=k)-P(a n=k/x 1,...,x n) k=l t=i The BAYES4 subroutine c a l c u l a t e s the a p o s t e r i o r i p r o b a b i l i t y vector P ( a n = i / x \ . . . , x1"*'"'") using (14) and makes Bayes' d e c i s i o n using (10). BAYES4 assumes the source to be a f i r s t - o r d e r Markov source and uses a bigram look-up table to obtain the state t r a n s i t i o n p r o b a b i l i t i e s . The BAYES5 subroutine also implements equation (14) and makes the optimal decision using (10). I t assumes the source to be a second-order. Markov source. These higher order s t a t i s t i c s are implemented i n the same manner as described above for the BAYES3 subroutine with the a d d i t i o n of the c o n d i t i o n a l bigram matrix, whose t y p i c a l element i s P(o n +^=j /<jn=i) , being conditioned on the ( n - l ) s t state, a 1 1 ^. To i l l u s t r a t e the e s s e n t i a l d i f f e r e n c e betweend the BAYES4 and BAYES5 algorithms consider the following example. Let the character s t r i n g 'female' be the output from the character source and l e t observation time n correspond to the p o s i t i o n of character 'a'. To compute the a p o s t e r i o r i p r o b a b i l i t y vector at time n the BAYES4 algorithm uses the bigram t r a n s i t i o n p r o b a b i l i t y P ( a n = j / a n ^~=i) and the look-ahead t r a n s i t i o n p r o b a b i l i t y P(a n +^= k/a n=j). The BAYES5 algorithm on the other hand uses trigram t r a n s i t i o n p r o b a b i l i t i e s or a l t e r n a t i v e l y , c o n d i t i o n a l bigram p r o b a b i l i t i e s . Thus i n computing the a p o s t e r i o r i p r o b a b i l i t y vector at time n the BAYES5 algorithm uses the c o n d i t i o n a l bigram t r a n s i t i o n p r o b a b i l i t i e s P ( o n = j / a n ^=i,X n ^=e) , ~n— 2 where A represents the decision at time (n-2) that the character i d e n t i t y I I , TW n+l , , n . ~ n - l . , ~ n - l , . . . was e , and P(.a =k/a =3,A =m) where A represents the d e c i s i o n at time (n-1) that the character i d e n t i t y was 'm'. 25 IV. EXPERIMENTS AND THEIR RESULTS 4.1 I n t r o d u c t i o n In the previous chapters a character r e c o g n i t i o n system has been described which makes use of context to reach a f i n a l d e c i s i o n on the i d e n t i t y of a c h a r a c t e r . In t h i s chapter s e v e r a l experiments are described which are designed to i n v e s t i g a t e to what extent context i s u s e f u l i n improving the o v e r a l l r e c o g n i t i o n performance of c e r t a i n c h a r a c t e r recog-n i t i o n systems whose parameters are s p e c i f i e d . This i s the primary aim of the experiments. A second o b j e c t i v e of the experiments i s to demonstrate the f l e x i b i l i t y and u t i l i t y of the programming system i n studying the e f f e c t of changing various system parameters on the use of context i n c h a r a c t e r r e c o g n i t i o n . The s p e c i f i c experiments performed to achieve the above o b j e c t i v e s are described i n s e c t i o n 4.2 and t h e i r r e s u l t s are presented i n s e c t i o n 4.3 along w i t h a d i s c u s s i o n of these r e s u l t s . 4.2 Experimental Procedures The f i r s t experiment was c o n t r i v e d to demonstrate the usefulness of context i n improving the o v e r a l l performance of an already e x i s t i n g automatic character r e c o g n i t i o n a l g o r i t h m [31]. The CRS was modelled using an i r r e g u l a r l y s t r u c t u r e d channel t r a n s i t i o n m a t r i x which was based on the a c t u a l performance of the hand-p r i n t e d c h a r a c t e r r e c o g n i t i o n a l g o r i t h m of Toussaint [31]. This system had a c o r r e c t r e c o g n i t i o n r a t e of 80% on equiprobable, s t a t i s t i c a l l y indep-endent c h a r a c t e r s . Thus the output from the channel contained approximately 20% e r r o r s . The input to the channel was a data sample c o n s i s t i n g of a b l o c k of 10,000 characters.taken from the t r a i n i n g data. The garbled channel 26 output was used as the input to each of the f i v e d e c i s i o n algorithms described i n Chapter I I I . The r e s u l t s are shown i n F i g . 6 which plo t s the percentage reduction i n errors as a function of the order of contextual information used. The second experiment was performed to demonstrate the e f f e c t of using input data other than t r a i n i n g data on the a b i l i t y of context to improve automatic character recognition. The data sample i n th i s case was a block of 10,000 characters from a popular s c i e n t i f i c l i t e r a t u r e source as described i n s e c t i o n 2.1.4. This data i s r e f e r r e d to as test data. The i d e n t i c a l channel model to that of the f i r s t experiment was used and the same test procedure was followed. The r e s u l t s of th i s experiment are also portrayed i n F i g . 6 to f a c i l i t a t e comparison. In order to study the e f f e c t of the q u a l i t y of the channel on the performance of the dec i s i o n algorithms, the t h i r d experiment was con-ducted. The 10,000 character t r a i n i n g data sample was used as the input data sample. A t h e o r e t i c a l channel model of type A (see se c t i o n 2.3.1) was used i n the channel simulation to obtain the input to the d e c i s i o n algorithms. The parameter e, which controls the q u a l i t y of the channel, was assigned a d i f f e r e n t value on each of seven t e s t s . By varying e the recognit i o n accuracy of the CRS may be a l t e r e d . For each channel configur-ation a l l f i v e d ecision algorithms processed the input text sample. The f i n a l decisions were then compared to both the channel input and i t s out-put to determine the percentage of channel output errors that were corrected. The r e s u l t s f o r the BAYES1, BAYES2 and BAYES3-decision algorithms are presented i n F i g . 7, which shows the percentage of channel errors vs the percentage of errors corrected. The r e s u l t s of the BAYES4 and BAYES5 de c i s i o n algorithms are shown i n F i g . 8. These r e s u l t s have been summarized 27 i n F i g . 9 to i n d i c a t e the r e l a t i v e performance of using Bayes' decision rule with the look-ahead mode of dec i s i o n and without i t . The fourth experiment was c a r r i e d out to study the e f f e c t on the use of context i n character recognition of using e i t h e r an M-type channel or an IM-type channel (see sec t i o n 2.3.2). Two tests with d i f f e r e n t values fo r the parameters a and b were made using the M-type channel and s i m i l a r l y two tests were c a r r i e d out using the IM-type channel. The data sample used was the 10,000 character t r a i n i n g sample. Each of the f i v e d e c i s i o n algorithms attempted to correct the errors i n the channel output. The r e s u l t s of these tests are p l o t t e d i n F i g . 10. The f i f t h experiment was performed to learn to what extent the trends observed from t e s t i n g t h e o r e t i c a l channel models were applicable to p r a c t i c a l channel representations. The 10,000 character t r a i n i n g data sample was again used as input data. The channel t r a n s i t i o n matrix used in the f i r s t experiment was modified to r e t a i n i t s same s t r u c t u r a l char-a c t e r i s t i c s but to improve i t s correct character recognition performance gr e a t l y . The modified t r a n s i t i o n matrix had approximately 4% p r o b a b i l i t y of error as compared to the o r i g i n a l 20% p r o b a b i l i t y of e r r o r . The f i v e decision algorithms were again employed to improve the channel output. The r e s u l t s are plo t t e d i n F i g . 11 as a function of the order of contextual information used. The r e s u l t s of the f i r s t experiment have been repeated i n F i g . 11 f o r the sake of comparison. The r e l a t i v e performance of the deci s i o n algorithms f o r the two d i f f e r e n t q u a l i t y , i r r e g u l a r l y structured channels i s compared to the trends observed f o r the t h e o r e t i c a l A-type channel. 28 4 . 3 Experimental Results Each of the dec i s i o n algorithms described i n Chapter I I I , BAYES1 through BAYES5, makes use of d i f f e r e n t contextual constraints. The algorithms have been l a b e l l e d to i n d i c a t e the order of contextual information used i n each case. Thus, BAYES1 uses a f i r s t order approximation to the English language, BAYES2 a second order approximation, and so on. Each successively higher ordered approximation makes use of an increased amount of contextual information. Hence, by employing the decision algorithms to degarble the same input data sample and by using a s u i t a b l e performance c r i t e r i o n we should get an i n d i c a t i o n to what extent context i s useful i n improving the performance of character recognition systems The performance c r i t e r i o n used i n these experiments was the resultant percentage of channel errors corrected. This c r i t e r i o n i s i n d i c a t i v e of the o v e r a l l improvement of a CRS that can be achieved by the use of contextual c o n s t r a i n t s . The term o v e r a l l improvement takes into account the f a c t that some of the errors i n the f i n a l d e c i s i o n output are introduced by the decision algorithms. These errors are subtracted from the t o t a l number of corrections made before the percentage improvement i s c a l c u l a t e d . The r e s u l t s of experiment / / l are presented i n F i g . 6. The graph in d i c a t e s that the higher the order of contextual information used by a decision algorithm the greater the improvement w i l l be i n o v e r a l l system performance. Bayes' de c i s i o n using a p r i o r i p r o b a b i l i t i e s achieves only a moderate improvement whereas Bayes' d e c i s i o n i n the look-ahead mode and using trigram s t a t i s t i c s achieves b e t t e r than 66% improvement. Thus a character recognition system which has 80% correct recognition without using context can improve i t s performance to about 9 3 % correct recognition by taking context into account. Fig. 6. The e f f e c t of source data on the performance of contextual d e c i s i o n algorithms. 30 By c o n s i d e r i n g the source dependencies to be Markov dependencies, as opposed to c o n s i d e r i n g an independent data source, the number of e r r o r s c o r r e c t e d i s more than doubled. The graph suggests that f u r t h e r improve-ment of performance may be achieved by c o n s i d e r i n g d e c i s i o n algorithms which are based on even higher order s t a t i s t i c s . However, s i n c e the l i n e a r s e c t i o n s of the graph have d i f f e r e n t s l o p e s , a c e r t a i n amount'of cautio n must be e x e r c i s e d i n e x t r a p o l a t i n g these r e s u l t s . The r e s u l t s of the second experiment are represented by the curve l a b e l l e d t e s t data on F i g . 6. The curve c l o s e l y p a r a l l e l s the r e s u l t s of experiment #1 but f o r each order of c o n t e x t u a l i n f o r m a t i o n used the performance i s s l i g h t l y degraded from the case where t r a i n i n g data was used f o r the input data sample. I t i s i n t e r e s t i n g to note that f o r higher order language s t a t i s t i c s the degradation becomes worse than f o r the lower order s t a t i s t i c s . This i s reasonable s i n c e we would expect the d i f f e r e n c e s between t e s t data and t r a i n i n g data to become more evident as longer s t r i n g s of characters are considered, p r o v i d i n g , of course, that the s t a t i s t i c s were based on a l i m i t e d sample of t r a i n i n g data. I f both the q u a n t i t y and the scope of the t r a i n i n g data are expanded u n t i l the s t a t i s t i c s are rep-r e s e n t a t i v e of the e n t i r e p o p u l a t i o n we would expect the two curves to c o i n c i d e f o r low order s t a t i s t i c s and only begin to diverge f o r h i g h e r order s t a t i s t i c s . The r e s u l t s thus g i v e a good i n d i c a t i o n of the s u f f i c i e n c y of the source s t a t i s t i c s . F i g . 7 shows the percentage of channel e r r o r s c o r r e c t e d as a f u n c t i o n of the percentage of e r r o r s present i n the channel output. Each curve represents the performance of a d i f f e r e n t d e c i s i o n a l g o r i t h m . The r e s u l t s of using Bayes' d e c i s i o n r u l e w i t h a p r i o r i s t a t i s t i c s are r epres-ented by curve A. Curve B i s based on Bayes' d e c i s i o n using bigram s t a t -i s t i c s and curve C p o r t r a y s the r e s u l t s of Bayes' d e c i s i o n using t r i g r a m 31 20 A - Bayes with a p r i o r i p r o b a b i l i t i e s with bigram s t a t i s t i c s B - Bayes 15 C - Bayes with trigram s t a t i s t i c s S3 o M H F i g . 7. Percent e r r o r reduction vs percent channel output e r r o r s ; Bayes' d e c i s i o n without look-ahead mode of d e c i s i o n . s t a t i s t i c s . Each experimental point i s pl o t t e d along with a bar of length two standard deviations. The standard deviation was cal c u l a t e d under the assumption that the corrections e x h i b i t a binomial d i s t r i b u t i o n [29]. That i s , the standard deviation i s given by o=/p(l-p)/n , where p i s the prob-a b i l i t y of co r r e c t i n g an error and- n i s the t o t a l number of channel e r r o r s . When using a f i r s t order approximation to the Engli s h language no improvement i n recognition i s achieved for CRS's having a high correct recognition rate and for the type of channel model considered. A s l i g h t improvement can be obtained f o r systems having an error rate greater than 15%. CRS's of the above type with er r o r rates l a r g e r than 15% can improve t h e i r performance i n the range of 4-5% with the use of decision algorithms employing bigram or trigram s t a t i s t i c s . The bigram s t a t i s t i c s appear to be somewhat superior i n this range. The usefulness of context slowly 50 % CHANNEL ERRORS 32 deteriorates as the percentage of channel errors increases. In the range of 0-15% channel errors the behaviour of the BAYES2 and BAYES3 algorithms i s markedly d i f f e r e n t . The usefulness of 2nd-order contextual information i n improving the o v e r a l l system drops o f f as the qu a l i t y of the channel improves, whereas the decision algorithm employing 3rd-order s t a t i s t i c s more than doubles i t s improvement i n performance f o r the same type of channel behaviour. The maximum improvement expected for thi s type of CRS with an i n i t i a l e r r o r rate of 1% i s about 12.5%. F i g . 8 presents the same type of information as F i g . 7. The curves l a b e l l e d D and E portray the r e s u l t s of using Bayes' decision i n the look-ahead mode of decis i o n . Curve D r e s u l t i n g from the BAYES4 algorithm i s based on bigram s t a t i s t i c s whereas curve E i s obtained from using BAYES5 with trigram s t a t i s t i c s . The improvement obtained with BAYES4 l i e s i n the range of 8-13% for the range of channel errors considered. The improvement obtained by using the BAYES5 algorithm i s considerably l a r g e r than that obtained from the BAYES4 algorithm. For low channel e r r o r rates the improvement i s more than double the improvement attained by using the BAYES4 algorithm. However, whereas the performance of the BAYES4 algorithm i s r e l a t i v e l y constant the performance of the BAYES5 algorithm drops o f f quite r a p i d l y as the channel error rate increases to about 40% of the data sample. In F i g . 9 the curves B, C, D and E have been repeated without t h e i r standard deviations i n order to permit a comparison of the performance of the decision algorithms with and without the look-ahead mode of d e c i s i o n . I t i s seen i n each case that the algorithms u t i l i z i n g the look-ahead mode of d e c i s i o n are s u b s t a n t i a l l y b e t t e r than the corresponding algorithms using the same kind of s t a t i s t i c s but without the look-ahead mode of de c i s i o n . < 33 w 35 -30 25 -2; o M S 20 g o 15 10 5 -F i g . 8. "T 10 D - Bayes with bigram s t a t i s t i c s * E - Bayes with trigram s t a t i s t i c s * * - Look-ahead mode of d e c i s i o n T 1 ~T 20 3D 40. % CHANNEL ERRORS r 5.0 T" 6XX Percent error reduction vs percent channel output e r r o r s ; Bayes' d e c i s i o n with look-ahead mode of d e c i s i o n . I t should be noted that using bigram s t a t i s t i c s with look-ahead mode produces more improvement than using trigram s t a t i s t i c s without look-ahead mode. For channel error rates of less than 5% however, t h e i r per-formances are almost equal and the curve D appears to coincide with curve C For error rates greater than 5% the performance of the BAYES4 algorithm i s 34 35 -30 -o M H cj P P Q 25 -20 -15 -10 -5 -Bigram s t a t i s t i c s Trigram s t a t i s t i c s Bigram s t a t i s t i c s with LAM Trigram s t a t i s t i c s with LAM Look-ahead mode of d e c i s i o n F i g . 9. % CHANNEL ERRORS Comparative performance of Bayes' d e c i s i o n algorithms; percent er r o r reduction vs percent channel e r r o r s . s i m i l a r to the performance of the BAYES2 algorithm although BAYES4 corrects about twice as many errors as BAYES2. The r e s u l t s of experiment //4 are presented i n F i g . 10. The results, f or the output of a p a r t i c u l a r channel type have been joined with s t r a i g h t l i n e segments to i n d i c a t e possible performance trends. To v e r i f y these trends and extend t h e i r a p p l i c a b i l i t y to a broader range of channel @ „ IM-type channel Q „' M-type channel .0 5 10 15 20 25 30 % CHANNEL ERRORS 45-F i g . 10. Percent e r r o r reduction vs percent channel errors f o r Bayes' d e c i s i o n r u l e ; f or both M-type channels and IM-type channels. 36 performance further experimental r e s u l t s should be c o l l e c t e d . I t may be noted however, that the performance of the d e c i s i o n algorithms for the IM-type channel i s considerably superior to that for the M-type channel i n every case. This would appear to v e r i f y the theory that i f the channel i s matched to the source s t a t i s t i c s and i f the tentative decisions made by the character recognizer are optimum Bayes' decisions based on measurements then the contextual decision algorithms w i l l provide no improvement i n performance. Curve A i n F i g . 10 for the M-type channel supports t h i s theory since for t h i s curve the channel was matched to the f i r s t order language s t a t i s t i c s and the decision algorithms used only f i r s t order s t a t i s t i c s to attempt to improve the channel output. Since the channel was only matched to the f i r s t order language s t a t i s t i c s , the d e c i s i o n algorithms using higher order s t a t i s t i c s were able to obtain some improve-ment by using the a d d i t i o n a l contextual information. On the other hand i f the channel i s severely mismatched to the source as i s the case for the IM-type channel, we would expect the c l a s s i f i e r of the CRS to perform badly and consequently we would expect s u b s t a n t i a l improvements to be r e a l i z e d by using contextual d e c i s i o n algorithms. The curves of F i g . 10, fo r the IM-type channel, support t h i s hypothesis. A further v e r i f i c a t i o n of t h i s hypothesis i s provided by comparing the performance of the M-type channel over the domain for which the trends are ind i c a t e d with the r e s u l t s f or the A-type channel portrayed i n F i g . 9. We observe that the general behaviour i s very s i m i l a r i n each case, with the performance of the decision algorithms f o r the A-type channel being be t t e r than the performance for the M-type channel. The d i f f e r e n c e i n performance for the BAYES3 algorithm at 7% channel errors i s quite large, with the A-type channel giving more improvement. Thus the'A-type channel provides an example of a channel intermediate between the M-type channel 37 F i g . 11. The e f f e c t of channel q u a l i t y on the performance of contextual d e c i s i o n algorithms f o r an i r r e g u l a r l y s t r u c t u r e d channel. • . 38 matched to the f i r s t order source s t a t i s t i c s and the severely mismatched IM-type channel. In experiment it 5 the r e s u l t s shown i n F i g . 11 are compared with the trends observed i n F i g s . 7 and 8. Since the A-type channels are not r e l a t e d i n a d i r e c t way to the i r r e g u l a r l y structured channels we would expect, at best, only a general carry over of performance trends. Curve A i n F i g . 7 corresponds to the use of l s t - o r d e r contextual information i n F i g . 11. The trend of curve A suggests a s l i g h t decrease i n performance for higher q u a l i t y channels. This agrees with the cor-responding r e s u l t s observed i n F i g . 11. S i m i l a r l y by noting the correspondence between the l e t t e r e d curves of F i g . 7 and F i g . 8 and the corresponding orders of contextual information used i n F i g . 11, we observe i n each case that the trends for the A-type channel agree with the observations for the i r r e g u l a r l y structured channels i n a q u a l i t a t i v e manner. I t should be noted that q u a n t i t a t i v e predictions based on the trends observed for the A-type channel would not be v a l i d f o r the i r r e g u l a r l y structured channels. These r e s u l t s suggest that using r e a d i l y a v a i l a b l e t h e o r e t i c a l channel models to get an i n s i g h t into c e r t a i n modes of behaviour of p r a c t i c a l character recognition systems i s v a l i d . 39 V, CONCLUSIONS 5.1 Concluding Remarks In t h i s thesis i t has been demonstrated that a character recog-n i t i o n system, which has been modelled i n terms of simple system components, may be simulated on a general purpose computer f o r the purpose of studying the effectiveness of using contextual constraints i n improving i t s over-a l l performance. Five i t e r a t i v e , s e q u e n t i a l l y compound Bayes' de c i s i o n procedures using d i f f e r e n t contextual constraints were implemented to observe the improvement i n performance that could be attained. Several experiments x^ere performed which, aside from demonstrating the f l e x i b i l i t y of a software simulation system, showed the e f f e c t of source s t a t i s t i c s , channel q u a l i t y and channel c h a r a c t e r i z a t i o n on the usefulness of using context to improve a character recognition system. The r e s u l t s of the f i r s t experiment i n d i c a t e d that, for p r a c t i c a l character recognition systems, s u b s t a n t i a l improvements i n the correct recognition rate could be r e a l i z e d and the improvement was d i r e c t l y prop-o r t i o n a l to the order of contextual information used. These r e s u l t s agree with the r e s u l t s of other researchers, such as Raviv [2], A l t e r [3], and others [1,8]. The second experiment demonstrated that unless s u f f i c i e n t est-imates of the source s t a t i s t i c s are used to describe the source m a t e r i a l , the a b i l i t y of context to improve recognition i s adversely e f f e c t e d . Thus, the q u a l i t y of the s t a t i s t i c s used provides one l i m i t a t i o n on the performance of the contextual d e c i s i o n algorithms. In experiment #3 the e f f e c t of channel q u a l i t y on the performance of contextual d e c i s i o n algorithms was i n v e s t i g a t e d . The r e s u l t s shown i n F i g . 9 suggest that the usefulness of context to improve recognition becomes 40 les s s i g n i f i c a n t as the channel q u a l i t y d e t e r i o r a t e s . In considering the tradeoff between a l l o c a t i n g resources to improve the channel or to improve the contextual decision element, the best choice, i n t h i s case, would appear to be improving the feature e x t r a c t i o n system and c l a s s i f i e r . On the other hand, character recognition systems having a high correct recog-n i t i o n rate would probably benefit more from the use of e f f i c i e n t contextual decision algorithms using higher order contextual cons t r a i n t s . A comparison of the r e s u l t s of experiment #3 and experiment #5 i n d i c a t e d that t h e o r e t i c a l channel models could be used to p r e d i c t the behaviour of p r a c t i c a l channels i n a q u a l i t a t i v e way. Thus, i n order to gain some i n s i g h t into the behaviour of some r e a l character recognition system i t i s not necessary to model the c h a r a c t e r i s t i c s of i t s feature e x t r a c t i o n system and c l a s s i f i e r exactly. Furthermore, hypotheses con-cerning CRS's may be tested using simple channel models. From the r e s u l t s of experiment #4 we may conclude that the channel c h a r a c t e r i z a t i o n has a profound e f f e c t on the performance of the contextual d e c i s i o n algorithms. I f the channel i s characterized by elements on the p r i n c i p a l diagonal of the t r a n s i t i o n matrix which are mismatched to the a p r i o r i source s t a t i s t i c s , the improvement achieved, f o r the same qu a l i t y channel, with contextual decision algorithms i s greater than i f the elements were matched to the source s t a t i s t i c s . 5.2 A p p l i c a t i o n of Results Both the experimental software system and the r e s u l t s obtained from the experiments performed have useful a p p l i c a t i o n s . The software system has proven i t s f l e x i b i l i t y i n t e s t i n g hypotheses concerning the i n t e r a c t i o n of the various character recognition system parameters and thus i s a useful experimental t o o l . From the standpoint of the d e c i s i o n t h e o r e t i c 41 experimenter the software system provides a convenient way of t e s t i n g . new decision procedures. On the other hand, hardware oriented character recognition schemes may be simulated through t h i s system and tested i n conjunction with f i x e d contextual decision algorithms. Those elements of the software system pertaining to the design state of the experiments w i l l continue to be useful i n the c o l l e c t i o n of source s t a t i s t i c s i n the form of trigrams, bigrams and l e t t e r frequencies. The r e s u l t s obtained from the experiments are i n s t r u c t i v e i n def i n i n g further areas of research as w e l l as providing some i n t e r e s t i n g pre-liminary i n s i g h t f o r the questions considered. F i n a l l y , a few words may be said i n defense of the relevance of this research. G a l l i and Yamada [11] concluded as a r e s u l t of a document conversion study that, "Automatic or semi-automatic proofreading i s de s i r e -able because of the inaccuracies and high cost of c l e r i c a l e d i t i n g . " To support t h i s conclusion they reported that, " f o r keypunch conversion the cost of manual proofreading and c o r r e c t i o n was 43% of the t o t a l ; and that even though 77% of the t r a n s c r i p t i o n errors were detected, only 59% were properly corrected. For 18% of the errors, corrections were improperly made, and 23% of the o r i g i n a l errors were not detected i n proofreading. C l e r i c a l e d i t i n g thus increased conversion accuracy from 96.7% to 98.7% based on word count, while i t increased the cost by approximately 74%." Although no cost estimates have been made for the o v e r a l l auto-matic character recognition system simulated i n this t h e s i s , thus precluding a cost comparison, i t i s noteworthy to compare the o v e r a l l error correcting c a p a b i l i t y of this system with that of c l e r i c a l e d i t i n g . For the p r a c t i c a l channel considered i n experiment #5 with an i n i t i a l correct recognition rate of 96.33% and a sample s i z e of 10,000 characters, the'Bayes' decision algorithm using trigram s t a t i s t i c s i n the look-ahead mode of dec i s i o n 42 corrected 69.21% of the e r r o r s , thereby increasing the correct recognition accuracy to 98.87%. Thus the. automatic character recognition system i s somewhat superior to c l e r i c a l e d i t i n g i n t h i s case. I f the processing time i s considered to be the primary recurring cost f a c t o r the automatic system i s , of course, f a r superior to c l e r i c a l e d i t i n g . In the example above, the processing time required f o r a 10,000 character sample was 442 seconds. 5.3 Suggestions for Further Research Some areas of further research were prompted by the experimental r e s u l t s obtained. For example, the r e s u l t s of the second experiment sug-gest that the s t a t i s t i c s compiled were not s u f f i c i e n t l y representative of the English language. More design data covering a broader scope of material should be used to compile s u f f i c i e n t s t a t i s t i c s . The extension of the work sta r t e d i n the fourth experiment should be c a r r i e d out to v e r i f y the trends observed and to permit a more adequate comparison of the e f f e c t s of channel c h a r a c t e r i z a t i o n . Before incorporating the decision algorithms considered i n this t hesis into a p r a c t i c a l character recognition system, the r e s u l t s of the f i r s t experiment suggest that algorithms using higher order s t a t i s t i c s should be i n v e s t i g a t e d . Unfortunately higher ordered s t a t i s t i c s require s p e c i a l storage and search techniques as a consequence of the exponential growth of the state t r a n s i t i o n matrix. The increased look-up time for the s t a t i s t i c s may o f f s e t the advantages of using higher order contextual con-s t r a i n t s . In a s i m i l a r p r a c t i c a l vein i t may be of i n t e r e s t to study the tradeoffs i n terms of the cost of improving the contextual decision algorithms or improving the performance of the feature extraction system and c l a s s i f i e r . 43 Suboptimum d e c i s i o n p r o c e d u r e s w h i c h approached t h e p e r f o r m a n c e o f t h e Bayes' optimum d e c i s i o n p r o c e d u r e s b u t w h i c h c o u l d be e x e c u t e d more q u i c k l y w o u l d p r o v i d e an a t t r a c t i v e a l t e r n a t i v e . 44 BIBLIOGRAPHY 1. R.W. Donaldson and G.T. Toussaint, "Use of contextual constraints i n recognition of contour-traced handprinted character", IEEE Trans, on Computers, In press. 2. J . Raviv, "Decision making i n Markov chains applied to the problem of pattern r e c o g n i t i o n " , IEEE Trans, on Information Theory, Vol. IT-13, pp. 536-551, October 1967. 3. R. A l t e r , " U t i l i z a t i o n of contextual constraints i n automatic speech recognition", IEEE Trans, on Audio and E l e c t r o a c o u s t i c s , pp. 6-11, March 1968. 4. L. Uhr and C. Vossler, "A Pattern-recognition program that generates, evaluates and adjusts i t s own operators", Computers and Thought, Feigen-baum and Feldman, Eds., pp. 251-268, McGraw-Hill, New York, 1963. 5. J.K. Clemens, " O p t i c a l character recognition f o r reading machine a p p l i c a t i o n s " , M.I.T. E l e c t r o n i c s Research Lab., Cambridge Mass., Quart. Progress Rept. 79, pp. 219-227, October 1965. 6. J.H. Munson, "The re c o g n i t i o n of handprinted text", Pattern Recognition, L. Kanal, Ed., Thompson Book Co., Washington, D.C, 1968. 7. A.W. Edwards and R.L. Chambers, "Can a p r i o r i p r o b a b i l i t i e s help i n character recognition?"- J . Assoc. Computing Machinery, V o l . 11, pp. 465-470, October 1964. 8. R.B. Thomas and M. Kassler, "Character recognition i n context", Information and Control, v o l . 10, pp. 43-64, 1967. 9. F.J. Damerau, "A technique for computer c o r r e c t i o n of s p e l l i n g e r r o r s " , Commun. ACM, v o l . 7, pp. 171-176, 1964. 10. CM. Vossler and N.M. Branston, "The use of context f o r c o r r e c t i n g garbled English text", Proc. ACM Nat'1. Conf., Section D 2.4, 1964. 45 11. E.J. G a l l i , and H.M. Yamada, "Experimental studies i n computer-assisted c o r r e c t i o n of unorthographic .text", IEEE Trans. Engineering Writing and Speech, v o l . EWS-, pp. 75-84, August 1968. 12. R.W. Cornew, "A s t a t i s t i c a l method of error c o r r e c t i o n " , Information and Control, No. 2, pp. 79-93, February 1968. 13. G. Carlson, "Techniques f o r r e p l a c i n g characters that are garbled on input", AFIPS Conf. P r o c , V o l . 28, pp. 189-192, 1966 Spring J o i n t Computer Conference. 14. C.E. Shannon, "A mathematical theory of communication", B e l l Syst. Tech. J . , Vol. 27, pp . 379-423, pp. 623-656, 1948. 15. C.E. Shannon, " P r e d i c t i o n and entropy of pri n t e d English", 3ell Syst. Tech. J . , Vol. 30, pp. 50-64, 1951. 16. E.B. Newman and L.J. Gerstman, "A new method for analyzing printed English", J . E x p t l . Psychol. Vol. 44, pp. 114-125, 1952. 17. P.H. Carson, " L e t t e r constraints w i t h i n words i n printed English", Cybernetics, Vol. 1, pp. 46-54, 1961. 18. A.W. Drake, "Observation of a Markov source through a noisy channel", Ph.D. thesis, Dept. of Elec. Eng., M.I.T., Cambridge, Mass., June 1962. 19. K. Abend, "Compound decision procedures for-unknown d i s t r i b u t i o n s and for dependent states of nature", Pattern Recognition, L. Kanal, Ed., Thompson Book Co., Washington, D.C, 1968. 20. J . Spragins, "A note on the I t e r a t i v e a p p l i c a t i o n of Bayes' r u l e " , IEEE Trans, on Information Theory, Vol. IT-11, No. 4, pp. 544-549, October 1965. 21. C.G. Hilborn, J r . , and D.G. L a i n i o t i s , "Unsupervised learning minimum r i s k pattern c l a s s i f i c a t i o n f o r dependent hypotheses and dependent measurements", IEEE Trans, on Systems Science and Cybernetics, Vol. SCC-5, No. 2, pp. 109-115, A p r i l 1969. 22. G. Nagy, "State of the ar t i n pattern recognition", P r o c . IEEE, Vol. 56, pp. 836-862, May 1968. 23. L. Kanal and B. Chandrasekaran, "Recognition, Machine "Recognition", and S t a t i s t i c a l approaches", Methodologies of Pattern Recognition, S. Watanabe, Ed. 24. F. P r a t t , "Secret and Urgent", Blue Ribbon Books, Garden C i t y , N.Y., 1939. 25. R.J. Underwood and W. Schultz, "Meaningfulness and Verbal Learning'.', P h i l a d e l p h i a , Pa: L i p p i n c o t t , 1960. 26. S. Rose, "Bigness i s a numbers game", Fortune, v o l . LXXX, No. 6, pp. 112-115, Nov. 1969. 27. B. R u s s e l l , "Education and the S o c i a l Order", London, G. A l l e n & Unwin Ha., 1933. 28. R.M. Love, "The rangelands of the Western U.S."j' S c i e n t i f i c American, V o l . 222, No. 2., pp. 89-93, Feb. 1970. 29. W.H. Highleyman, "The design and analysis of pattern recognition experiments", The B e l l System Technical Journal, March 1962, pp. 723-744. 30. John B. Thomas, "An Introduction to S t a t i s t i c a l Communication Theory," John Wiley & Sons, New York, 1969, pp. 503-512. 31. G.T. Toussaint and R.W. Donaldson, "Algorithms f o r Recognizing Contour-traced Handprinted Characters," IEEE Trans, on Computers, Vol. C-19, No. 6, pp. 541-546, June 1970.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- A computer simulation study of the application of contextual...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
A computer simulation study of the application of contextual constraints to character recognition Dykema, Cornelius A. 1970
pdf
Page Metadata
Item Metadata
Title | A computer simulation study of the application of contextual constraints to character recognition |
Creator |
Dykema, Cornelius A. |
Publisher | University of British Columbia |
Date Issued | 1970 |
Description | Improvement of specific character recognition systems through the use of contextual constraints has been demonstrated by several researchers in the field. In this thesis, the effectiveness of using contextual constraints to improve the overall performance of a generalized character recognition system is studied experimentally. A computer simulation system designed to facilitate this investigation for both theoretical and physical character recognition system models is described. A suitable character recognition system model is developed. A number of sequential decision algorithms based on Bayes' optimum decision rule and using different order contextual constraints are implemented as the decision element of the character recognition process. The effect on the performance of the contextual decision algorithms of varying the source data, as well as varying the quality and characterization of the feature extraction system with classifier have been demonstrated. The experimental results for the practical character recognition system model confirmed the findings of others that significant improvement in overall system performance can be achieved through the application of contextual constraints. It was shown that the improvement was directly proportional to the order of contextual information used for low order statistics and that the improvement was most pronounced when these statistics were representative of the source material, the feature extraction system with classifier had a high recognition accuracy initially and its performance was characterized by mistaking the most frequently occurring characters more often than the infrequently occurring characters. It was demonstrated that (1) computer simulation is a useful tool in testing hypotheses concerning the application of contextual constraints to character recognition, (2) qualitative predictions based on theoretical character recognition system models are applicable to practical character recognition systems, and (3) the improvement which can be obtained with the application of contextual constraints to character recognition is sufficient to make machine editing of text competitive with clerical editing. |
Subject |
Perceptrons |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2011-05-25 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0102175 |
URI | http://hdl.handle.net/2429/34858 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1970_A7 D85_2.pdf [ 2.71MB ]
- Metadata
- JSON: 831-1.0102175.json
- JSON-LD: 831-1.0102175-ld.json
- RDF/XML (Pretty): 831-1.0102175-rdf.xml
- RDF/JSON: 831-1.0102175-rdf.json
- Turtle: 831-1.0102175-turtle.txt
- N-Triples: 831-1.0102175-rdf-ntriples.txt
- Original Record: 831-1.0102175-source.json
- Full Text
- 831-1.0102175-fulltext.txt
- Citation
- 831-1.0102175.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0102175/manifest