"Applied Science, Faculty of"@en .
"Electrical and Computer Engineering, Department of"@en .
"DSpace"@en .
"UBCV"@en .
"Dykema, Cornelius A."@en .
"2011-05-25T22:45:59Z"@en .
"1970"@en .
"Master of Applied Science - MASc"@en .
"University of British Columbia"@en .
"Improvement of specific character recognition systems through the use of contextual constraints has been demonstrated by several researchers in the field. In this thesis, the effectiveness of using contextual constraints to improve the overall performance of a generalized character recognition system is studied experimentally. A computer simulation system designed to facilitate this investigation for both theoretical and physical character recognition system models is described. A suitable character recognition system model is developed. A number of sequential decision algorithms based on Bayes' optimum decision rule and using different order contextual constraints are implemented as the decision element of the character recognition process. The effect on the performance of the contextual decision algorithms of varying the source data, as well as varying the quality and characterization of the feature extraction system with classifier, has been demonstrated.\r\nThe experimental results for the practical character recognition system model confirmed the findings of others that significant improvement in overall system performance can be achieved through the application of contextual constraints. It was shown that the improvement was directly proportional to the order of contextual information used for low order statistics, and that the improvement was most pronounced when these statistics were representative of the source material, the feature extraction system with classifier had a high recognition accuracy initially, and its performance was characterized by mistaking the most frequently occurring characters more often than the infrequently occurring characters.\r\nIt was demonstrated that (1) computer simulation is a useful tool in testing hypotheses concerning the application of contextual constraints to character recognition, (2) qualitative predictions based on theoretical character recognition system models are applicable to practical character recognition systems, and (3) the improvement which can be obtained with the application of contextual constraints to character recognition is sufficient to make machine editing of text competitive with clerical editing."@en .
"https://circle.library.ubc.ca/rest/handle/2429/34858?expand=metadata"@en .
"A COMPUTER SIMULATION STUDY OF THE APPLICATION OF CONTEXTUAL CONSTRAINTS TO CHARACTER RECOGNITION

by

CORNELIUS A. DYKEMA
B.Sc., Queen's University, 1966

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in the Department of Electrical Engineering

We accept this thesis as conforming to the required standard.

Research Supervisor
Members of Committee
Acting Head of Department
Members of the Department of Electrical Engineering

THE UNIVERSITY OF BRITISH COLUMBIA
September, 1970

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering

ABSTRACT

Improvement of specific character recognition systems through the use of contextual constraints has been demonstrated by several researchers in the field. In this thesis, the effectiveness of using contextual constraints to improve the overall performance of a generalized character recognition system is studied experimentally.
A computer simulation system designed to facilitate this investigation for both theoretical and physical character recognition system models is described. A suitable character recognition system model is developed. A number of sequential decision algorithms based on Bayes' optimum decision rule and using different order contextual constraints are implemented as the decision element of the character recognition process. The effect on the performance of the contextual decision algorithms of varying the source data, as well as varying the quality and characterization of the feature extraction system with classifier, has been demonstrated.

The experimental results for the practical character recognition system model confirmed the findings of others that significant improvement in overall system performance can be achieved through the application of contextual constraints. It was shown that the improvement was directly proportional to the order of contextual information used for low order statistics, and that the improvement was most pronounced when these statistics were representative of the source material, the feature extraction system with classifier had a high recognition accuracy initially, and its performance was characterized by mistaking the most frequently occurring characters more often than the infrequently occurring characters.
It was demonstrated that (1) computer simulation is a useful tool in testing hypotheses concerning the application of contextual constraints to character recognition, (2) qualitative predictions based on theoretical character recognition system models are applicable to practical character recognition systems, and (3) the improvement which can be obtained with the application of contextual constraints to character recognition is sufficient to make machine editing of text competitive with clerical editing.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF ILLUSTRATIONS
ACKNOWLEDGEMENT
I. INTRODUCTION
1.1 Motivation
1.2 Background
1.3 Scope of the Thesis
II. DECISION ALGORITHM INPUTS
2.1 Source Description
2.1.1 Statistical Description
2.1.2 Generating N-grams
2.1.3 Estimating Probabilities
2.1.4 Data Sources
2.2 Tentative Decisions
2.2.1 The Character Recognizer
2.2.2 The Character Recognizer Model
2.2.3 Channel Simulation
2.3 Channel Representations
2.3.1 Symmetric Channels
2.3.2 Regularly Structured Channels
2.3.3 Irregularly Structured Channels
III. THE DECISION ALGORITHMS
3.1 Introduction
3.2 Bayes' Optimum Decision Rule
3.3 Bayes' Rule, Iterative Form
3.3.1 Development
3.3.2 Implementations
3.4 Bayes' Rule, Look-Ahead Mode of Decision
IV. EXPERIMENTS AND THEIR RESULTS
4.1 Introduction
4.2 Experimental Procedures
4.3 Experimental Results
V. CONCLUSIONS
5.1 Concluding Remarks
5.2 Application of Results
5.3 Suggestions for Further Research
BIBLIOGRAPHY

LIST OF ILLUSTRATIONS

1. A general character recognition system
2. Simplified flowchart of program for compiling source statistics
3. Block diagram of character recognizer, using measurements only
4. The character recognizer model
5. Flowchart of channel simulation
6. The effect of source data on the performance of contextual decision algorithms
7. Percent error reduction vs percent channel output errors; Bayes' decision without look-ahead mode of decision
8. Percent error reduction vs percent channel output errors; Bayes' decision with look-ahead mode of decision
9. Comparative performance of Bayes' decision algorithms; percent error reduction vs percent channel errors
10. Percent error reduction vs percent channel errors for Bayes' decision rule; for both M-type channels and IM-type channels
11. The effect of channel quality on the performance of contextual decision algorithms for an irregularly structured channel

ACKNOWLEDGEMENT

I gratefully acknowledge the financial support received from both The Defence Research Board of Canada and the National Research Council of Canada. I would like to thank my research supervisor, Dr. R.W. Donaldson, for his timely encouragements and generous advice throughout the course of this thesis. I wish to thank Dr. M.P. Beddoes for reading the manuscript. I also wish to acknowledge Miss Veronica Komczynski and Miss Heather Dubois for quickly and neatly typing the manuscript, and Messrs. S. Hussain, G.T. Toussaint and D. Allan for proofreading the manuscript. Finally, I wish to thank Messrs. J. Cavers, M. Ito, and D. Chan for innumerable stimulating discussions, their helpful suggestions and constructive criticisms.

I. INTRODUCTION

1.1 Motivation

Recently, researchers in the field of character recognition, such as Donaldson and Toussaint [1], Raviv [2], Alter [3] and others, have improved the performance of their individual character recognition systems by making use of the character source dependencies in the decision process. However, to the author's knowledge, no attempt has been made to study experimentally the effectiveness of applying contextual constraints to improve the performance of a generalized character recognition system, or CRS. In this thesis, a computer simulation system designed to facilitate this investigation for both theoretical and physical character recognition systems is described. A suitable character recognition system model is developed. A number of decision algorithms using different contextual constraints are implemented as the decision element of the character recognition process. The effect on the performance of the contextual decision algorithms of varying the model parameters is demonstrated.

1.2 Background

Initial efforts to recognize characters by machine were based on the principle that individual characters could be recognized solely by their unique physical properties [4]. Little attention was paid to the context in which these characters occurred. It soon became evident that purely physical techniques, such as contour tracing [5], produced unsatisfactory results [6]. Other researchers, such as Edwards and Chambers [7], obtained improved results by giving some consideration to the character's context.
Based on the character's physical properties, several possible alternative characters were selected. Bigram statistics were then used to make a final decision on the character.

These encouraging results caused some researchers to concentrate their efforts on using contextual constraints to improve the accuracy of character recognition machines [8]. They were joined in these efforts by linguists interested in the degarbling of unorthographic English text [9, 10]. The linguists' main concern was the correction of errors rather than the detection and correction of errors. The initial attempts to incorporate context relied heavily on dictionary look-up schemes, and did not make explicit use of the statistical constraints of the source material [11]. Cornew [12] used bigrams to make a final decision on a character after the word had failed to be identified by a dictionary look-up. Alter [3] and others [11, 13] employed complex algorithms based on either syntax or knowledge of frequently misspelled words to correct errors.

In 1948, Shannon [14, 15] showed that the English language is 75% redundant, and indicated how these language constraints might be used to predict missing letters in character strings. Newman and Gerstman [16], and Carson [17] carried out further studies on the analysis and use of letter constraints in English text. Shannon [15] also demonstrated that a natural language can be approximated by a kth-order ergodic Markov source, and that the resemblance to natural text becomes markedly stronger with each successive increase in the order k.
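As a toy illustration (not code from the thesis), a kth-order approximation of this kind can be trained and sampled in a few lines: count which letters follow each k-gram in a sample text, then emit letters according to those conditional frequencies.

```python
import random
from collections import defaultdict

def train_kth_order(text, k):
    """Count k-gram -> next-letter frequencies from a sample text."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - k):
        counts[text[i:i + k]][text[i + k]] += 1
    return counts

def generate(counts, k, length, seed):
    """Emit text whose kth-order statistics match the training sample."""
    out = list(seed[:k])
    for _ in range(length):
        successors = counts.get("".join(out[-k:]))
        if not successors:
            # unseen context: restart from a random known k-gram
            context = random.choice(list(counts))
            out.extend(context)
            successors = counts[context]
        letters, freqs = zip(*successors.items())
        out.append(random.choices(letters, weights=freqs)[0])
    return "".join(out)
```

Increasing k makes the output resemble the training text more closely, at the cost of an exponentially larger table of counts.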
This improved theoretical model of the source, or input to the character recognizer, aided others, such as Drake [18], in developing a mathematical model for an overall character recognition system. The recognition problem could be regarded as observing a Markov source through a noisy channel. A procedure for decoding the channel output for source alphabets having more than two classes was devised by Raviv [2] and Abend [19], who showed independently how classical decision theory could be extended and applied to this problem. Both researchers presented the sequential compound Bayes' procedure for dependent states of nature where the dependence occurs in the form of Markov dependence. Abend presented the Bayes' procedure in a historical and theoretical perspective. Raviv, on the other hand, stressed its practical application and carried out extensive experiments to demonstrate its error correcting ability as a function of error reject rate. Recently, the application of a recursive Bayes' optimal procedure to sequential multi-category pattern recognition has been extended to include source dependencies other than Markov dependence [20], and also to include sequential measurement correlation [21]. For an up-to-date perspective of the field of pattern recognition see Nagy [22], and Kanal and Chandrasekaran [23].

1.3 Scope of the Thesis

The general character recognition problem may be modelled as shown in Fig. 1. The purpose of this thesis was to simulate this model on a general purpose digital computer, in our case an IBM System 360/67, so that the effectiveness of various decision algorithms using context could be evaluated and the interaction of the various model components could be studied.
Specifically, several decision algorithms were implemented and their performance evaluated in terms of percentage error reduction. The effect on the performance of the decision algorithm of using input data other than design data has been shown. Finally, the extent to which context is useful in improving the overall performance of a character recognition system has been demonstrated for several theoretical and physical channel representations.

In Chapter 2, the design stage of the experiments, the required inputs to the decision algorithms are considered. This includes the statistical description of the source, the simulation of a character recognizer to obtain tentative decisions based on measurements alone, and a statistical description of the character recognizer's performance.

Fig. 1. A general character recognition system (a character source, constrained by a statistical description of the source, feeds input characters to a character recognizer, or noisy channel, with its own statistical description; the recognizer's tentative decisions and figures of confidence pass to a decision element, whose final decisions form the source estimate).

In the design stage, the trigram, bigram and letter frequencies were generated from the language data sources. Several sample theoretical and practical channel representations are discussed.

Chapter 3 contains a brief introduction to the application of decision theory to character recognition. The iterative form of Bayes' optimum decision procedure is derived and is implemented for different contextual constraints in a series of algorithms. The look-ahead mode of decision is discussed.

Chapter 4 presents the experimental procedures used and the results that were obtained.
Some conclusions that were drawn from the experimental results are discussed. Chapter 5 presents some final conclusions, a discussion of the application of the results of the thesis, and some suggestions for further research.

II. DECISION ALGORITHM INPUTS

2.1 Source Description

2.1.1 Statistical Description

The string of characters presented as input to a character recognizer is called the source. Shannon [13] and others have shown that if the source is a natural language, such as English text, the dependencies among the characters are in the form of Markov dependence. Markov dependence of order k is defined as:

p(x_n / x_{n-1}, ..., x_1) = p(x_n / x_{n-1}, ..., x_{n-k})    (1)

where p(.) is a conditional density function and {x_n} is a set of characters.

This implies that for a zero-order Markov source the characters are in fact chosen independently, but each letter occurs with the same probability that it has in the natural language. For a first-order Markov source the choice of a character depends on the previous character. The entire spectrum of possible choices for a next letter may be represented by a table of bigrams. Each bigram represents the probability of occurrence of a letter pair combination. Similarly, trigrams are the probabilities of occurrence of letter triplets, and so on up to n-grams. Thus we may choose to describe the source in terms of n-gram statistics. Theoretically, the maximum value of n is not bounded; in a practical implementation, however, n is bounded by the amount of storage available. This restricts n to a small number, say n ≤ 3, since the size of an n-gram table is r^n, where r is the size of the alphabet.
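Definition (1) can be checked empirically on a toy first-order source: conditioning on a longer history changes nothing beyond sampling error. The three-letter alphabet and transition table below are invented for illustration and are not from the thesis.

```python
import random

# Toy three-letter, first-order Markov source (each row of P sums to 1).
STATES = "abc"
P = {"a": [0.1, 0.6, 0.3], "b": [0.5, 0.2, 0.3], "c": [0.3, 0.3, 0.4]}

def simulate(n, start="a"):
    """Draw a length-n realization of the chain."""
    chain = [start]
    for _ in range(n - 1):
        chain.append(random.choices(STATES, weights=P[chain[-1]])[0])
    return "".join(chain)

def conditional(text, context):
    """Empirical distribution of the next letter given the preceding context."""
    nxt = [text[i + len(context)]
           for i in range(len(text) - len(context))
           if text[i:i + len(context)] == context]
    return {s: nxt.count(s) / len(nxt) for s in STATES}

random.seed(1)
text = simulate(200_000)
# Markov dependence of order 1: conditioning on the longer history "ba"
# gives, up to sampling error, the same law as conditioning on "a" alone.
p_long = conditional(text, "ba")
p_short = conditional(text, "a")
```

Both conditionals estimate the same law, the "a" row (0.1, 0.6, 0.3), exactly as (1) requires for k = 1.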
2.1.2 Generating N-grams

No single set of n-gram tables which adequately represents the English language is available in the literature, although several researchers have published limited bigram and trigram statistics [24, 25]. In most cases a very restricted set of training data was used. The trigrams in "Secret and Urgent", for example, were based on only 20,000 words. Other shortcomings of these statistics are that they are not available in a readily usable form. They are usually given as frequencies of occurrence rather than relative frequencies of occurrence, and they generally do not include statistics on the space.

To overcome some of these drawbacks we compiled our own language statistics. A simplified flowchart of the program used to compile the statistics is shown in Fig. 2 (read options and storage file names; count trigrams into an array TC_ijk, or load an existing array; edit the source data to remove non-acceptable characters; estimate the trigram probabilities; sum to obtain bigram and letter probabilities; store the arrays, optionally with their natural logarithms). Some of its advantages and options are described here. The program generates trigrams, bigrams and letter frequencies from raw input data, supplying both frequencies of occurrence and relative frequencies.
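A minimal in-memory sketch of such a compilation program (a modern rendering, not the original card-and-tape implementation) follows the conventions the program description gives: a 27-character alphabet with the space as the 27th character, and punctuation mapped to spaces.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # 27 characters; space is the 27th

def edit(raw):
    """Lower-case the text and map anything outside the alphabet
    (punctuation, numerals) to a space, which serves the same
    break-in-context purpose."""
    return "".join(c if c in ALPHABET else " " for c in raw.lower())

def compile_statistics(raw):
    """Return letter/bigram/trigram frequencies of occurrence and the
    relative letter frequencies."""
    text = edit(raw)
    letters = Counter(text)
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    trigrams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    relative = {c: n / len(text) for c, n in letters.items()}
    return letters, bigrams, trigrams, relative
```

Because Counter objects add, statistics compiled from separate batches of text can be merged later, mirroring the thesis program's ability to update the tables as more key-punched data becomes available.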
The input data, samples of English text, can be transcribed directly to key-punched cards without having to remove punctuation marks. The card data must be error free, since this is design data, and is stored on magnetic tape, which is the only type of input accepted by the n-gram compile program. The input data need not be in one contiguous block, nor does the data have to be processed all at once. This means that the statistics can be updated each time more key-punched data becomes available. This also permits the statistics to be biased in favour of some specialized form of English, such as scientific English, by compiling the statistics from predominantly scientific data. The program considers a 27 character alphabet where the space is considered as the 27th character. All punctuation marks are considered as spaces, since these are generally intended to provide a break in the context, and the replacement of a punctuation mark with a space serves this same purpose. An additional option which may prove useful for some algorithms is the generation of n-gram tables containing the natural logarithms of the probabilities.

2.1.3 Estimating Probabilities

Since the n-gram probabilities are in fact not known, we may instead use the maximum likelihood estimates of these probabilities, provided these estimates were based on a sufficiently large number of trials. If each trial consists of selecting a letter from the data set, then the maximum likelihood estimate of p_i, the probability of the letter i, is n_i/N, where n_i is the number of occurrences of the letter i, and N is the total number of trials.

If we attempt to estimate the probability of an event when n_i is small compared to N/r, where r is the alphabet size, the maximum likelihood estimate will usually be too small, especially if n_i = 0. To avoid this difficulty when estimating probabilities from a small number of trials, some ad hoc modification to the observed frequencies is usually made. This tends to move all the n_i's closer to the mean N/r. Some statisticians feel that a modification of the form (n_i + k)/(N + kr), where k is referred to as a "flattening constant" which is added to each of the sample frequencies, is a reasonable one. The "flattening constant" k = 1 is used here. This implies that if the sample frequencies, n_i, were obtained from the estimates of p_ijk, the probability of the triplet ijk, by summing over all k and all j, then the appropriate form of these estimates is (n_ijk + k')/(N + k'r^3), where k' = 1/r^2. A more detailed justification is given by Raviv [2] and others.

2.1.4 Data Sources

The input data from which the bigram and trigram frequencies were compiled came from two sources: (1) approximately 20,000 characters from "Bigness is a Numbers Game" by Sanford Rose [26], Fortune magazine; (2) approximately 340,000 characters from Bertrand Russell's book, "Education and the Social Order" [27].

It was originally intended to get a better cross section of the English language by selecting large data samples from a wide diversity of sources, ranging from newspaper editorials to popular scientific articles. However, the process of key-punching the data was both too costly and too time consuming for the resources available at the time. Russell's book was used because it had already been transcribed to computer cards and thus provided a ready supply of data.
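The flattening-constant estimate of Section 2.1.3 is one line of code. The sketch below also checks the consistency property stated there: per-trigram flattening with k' = 1/r^2 reproduces the letter-level estimate with k = 1 once the trigram estimates are summed over j and k. The counts used are made-up numbers for illustration only.

```python
def flattened_estimate(count, total, cells, k=1.0):
    """(n + k) / (N + k*r): the observed frequencies are pulled toward the
    mean N/r, so events never seen in the sample keep a nonzero probability."""
    return (count + k) / (total + k * cells)

r = 27                 # alphabet size
k_prime = 1 / r**2     # per-trigram flattening constant

# Letter probability obtained directly with k = 1 ...
n_i, N = 480, 10_000
direct = flattened_estimate(n_i, N, r, k=1.0)

# ... equals the sum over j, k of trigram estimates (n_ijk + k')/(N + k'*r^3):
# the r*r trigram cells covering letter i contribute n_i + r*r*k' = n_i + 1
# to the numerator, and the common denominator is N + k'*r^3 = N + r.
summed = (n_i + r * r * k_prime) / (N + k_prime * r**3)
```

The two routes agree, which is why a single letter-level constant k = 1 is the natural companion of the trigram-level constant k' = 1/r^2.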
The bias to literary English as a result of this large sample may be offset in the future by incorporating data samples from other sources.

To test the performance of the decision algorithms discussed in Chapter III on the design population, a block of 10,000 characters was chosen arbitrarily from this design data as a test sample. This sample was stored in a separate file on magnetic tape. A second test sample of 10,000 characters was taken from a popular scientific journal, Scientific American. The article used was "The Rangelands of the Western U.S." by R. M. Love [28], and was also stored on magnetic tape. The purpose of this test sample was to allow a check on the sufficiency of the source statistics.

2.2 Tentative Decisions

2.2.1 The Character Recognizer

In this thesis, the term character recognizer refers to that part of the character recognition problem which is depicted in the block diagram of Fig. 3. It performs three functions. The transducer describes the state of nature, or character input class, in terms of a set of observables and assigns a value to them. The feature extractor derives a reduced set of features from these measurements. Finally, the classifier evaluates the measurements and makes a tentative decision or classification.

Fig. 3. Block diagram of character recognizer, using measurements only (real-world character categories → transducer → measurement space → feature extractor → feature space → classifier → decision space, a tentative decision).

In this thesis, we are not concerned with the individual functions of these components but rather with the performance of the character recognizer as a single entity.
2.2.2 The Character Recognizer Model

Since we are not concerned with the internal workings of the character recognizer, it may be modelled as shown in Fig. 4. The term channel is used to refer to a description of how the sequence of input characters is mapped into a sequence of tentative decisions. The input to the channel is orthographic English text. The output, in general, is garbled English or unorthographic text. The errors which have been introduced are caused by the imperfections of the character recognizer and may be called channel noise. Since we do not have an explicit description of the noise characteristics nor of the character recognizer without the noise, the channel descriptions must contain the noise implicitly.

The channel is described by a transition matrix, C. Each element, c_ij, of the matrix represents the probability with which the character recognizer makes the tentative decision that the input character is a j, when the true input character is an i. The transition probability matrix is used by the decision algorithms as a weighting factor for the tentative decisions when considering how much emphasis should be placed on the measurements and how much on context. Each element of the probability transition matrix expresses the confidence with which a letter has been classified. Since the elements on the principal diagonal of the transition matrix reflect the frequency of correct recognition, these give a partial indication of how well the CRS will perform. The other factor to be considered in estimating system performance is the a priori probability distribution of the source characters.

Fig. 4. The character recognizer model (real-world character categories pass through the channel, noise is added, and tentative decisions emerge in the decision space).

A figure of merit, F, used to describe the performance of a CRS which does not employ contextual constraints may be defined as

F = Σ_{i=1}^{27} p_i c_ii    (2)

where p_i is the a priori probability of the ith character and c_ii is an element on the principal diagonal of the transition matrix.

2.2.3 Channel Simulation

The input to the decision algorithm is unorthographic English text. To obtain this data from an error-free sample of text we must simulate the channel, if we do not wish to build an actual character recognizer. This may be done with a general purpose digital computer. The method used to simulate the channel is shown in the block diagram of Fig. 5.

The transition matrix of the channel we wish to simulate is essentially an array of discrete probability density functions, one for each of the characters of the input alphabet. The corresponding probability distributions are derived from these density functions. A string of characters is input to the simulator and is edited to remove punctuation marks. A pseudo-random number generator generates probabilities and assigns these successively to each character in the sequence. Each input character is then used to select the appropriate probability distribution function, and its associated probability determines the corresponding output character. The assigned probability is located in the probability distribution by using a 5-level binary search tree. In this manner an output character is associated with each input character. This procedure is repeated until the supply of input characters is depleted.

Fig. 5. Flowchart of channel simulation (input the channel description as probability density functions; convert them to probability distributions; input a string of characters; remove punctuation marks and numerals; assign a random probability to each character; match each probability to the character's distribution function to find the output character; replace the input character with the output character; output the scrambled character string; repeat until the input data is exhausted).

2.3 Channel Representations

2.3.1 Symmetric Channels

As we have seen, the channel is completely described by a transition probability matrix. The symmetric channels are described by symmetric matrices. Several different symmetric channels have been investigated in the experiments of Chapter IV. The principal difference between each of these channels is the difference in the elements of the principal diagonals. For convenience of reference each different channel is given a type name and is described below.

Type A: the diagonal elements are 1 - e for each of the 26 letters and 1 for the blank; the off-diagonal elements are e/25 between letters and 0 in the blank's row and column, where e = 25α and α is the probability of any single confusion. The probability of a correct transmission through the channel for each character is 1 - e, except for the blank, which is transmitted correctly each time. If an error is made, each incorrect character occurs with equal probability, namely e/25.

2.3.2 Regularly Structured Channels

The regularly structured channels considered here fall into two categories, the M-type channels and the IM-type channels. The structure of these channels bears some similarity to the structure of the symmetric A-type channels described above. For both types of channels the error probabilities are uniformly distributed over the remaining 25 characters. A blank is never mistaken for a character and a character is never mistaken for a blank.
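A sketch of the Type A channel of Section 2.3.1 together with the channel simulation of Section 2.2.3 (a simplification: Python's random.choices stands in for the probability-distribution table and 5-level binary search of the thesis program):

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # blank is character 27 (index 26)
R = 27

def type_a_channel(e):
    """Transition matrix C: a letter is received correctly with probability
    1 - e and becomes each of the other 25 letters with probability e/25;
    the blank is always transmitted correctly and never confused."""
    C = [[0.0] * R for _ in range(R)]
    for i in range(26):
        for j in range(26):
            C[i][j] = 1 - e if i == j else e / 25
    C[26][26] = 1.0
    return C

def transmit(text, C, rng=random):
    """Simulate the channel: draw each output character from the discrete
    distribution in the input character's row of C."""
    return "".join(rng.choices(ALPHABET, weights=C[ALPHABET.index(ch)])[0]
                   for ch in text)
```

Each row of C sums to one, and because the blank's row puts all its mass on the blank, word boundaries survive transmission intact, just as the Type A definition requires.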
The distinguishing characteristic of the M-type channel is that its elements on the principal diagonal have a positive correlation with the a priori letter probabilities of the source. For the experiments of Chapter IV a linear functional relationship is used. A typical element, m_jj, on the principal diagonal is given as

    m_jj = aP(j) + b, where a ≥ 0 and b > 0,

and P(j) is the a priori probability of the jth character.

Choose ψ to be the true character identity at time n, if ψ is any integer 1 ≤ ψ ≤ 27 such that

    P(σ^n=ψ | x^1,…,x^n) = max[P(σ^n=1 | x^1,…,x^n), …, P(σ^n=27 | x^1,…,x^n)]    (3)

where P(σ^n=i | x^1,…,x^n) denotes the conditional probability that the state of the Markov chain at time n is i, given all past tentative decisions.

3.3 Bayes' Rule, Iterative Form

3.3.1 Development

If we assume that the source is a regular simple Markov chain, the vector of a posteriori probabilities used to make Bayes' decision in (3) can be calculated from known quantities in an iterative manner. This iterative structure is ideal for permitting the procedure to be implemented on a digital computer. It should be noted that if a higher order approximation to the source is used, no changes need be made in the development of the iterative form, since a higher order Markov process can be reduced to a regular simple Markov chain [30].

In addition to the previous assumptions, we assume that each measurement-based tentative decision depends only on the identity of the corresponding character and is independent of past or future characters. Thus, x^1,…,x^n are conditionally independent, which implies that

    P(x^1,…,x^n | σ^1=i, σ^2=j, …, σ^n=k) = P(x^1 | σ^1=i) · P(x^2 | σ^2=j) ⋯ P(x^n | σ^n=k)    (4)

or

    P(x^n | σ^n=k) = P(x^n | σ^n=k, x^1,…,x^{n-1})    (5)

Each element of the a posteriori probability vector may be expressed as follows:

    P(σ^n=k | x^1,…,x^n) = P(σ^n=k, x^1,…,x^n) / P(x^1,…,x^n)

      = [P(σ^n=k, x^n | x^1,…,x^{n-1}) · P(x^1,…,x^{n-1})] / [P(x^n | x^1,…,x^{n-1}) · P(x^1,…,x^{n-1})]

      = P(σ^n=k, x^n | x^1,…,x^{n-1}) / P(x^n | x^1,…,x^{n-1})

      = [P(x^n | σ^n=k, x^1,…,x^{n-1}) · P(σ^n=k | x^1,…,x^{n-1})] / [Σ_{j=1}^{r} P(x^n | σ^n=j, x^1,…,x^{n-1}) · P(σ^n=j | x^1,…,x^{n-1})]    (6)

Using (5) we have

    P(σ^n=k | x^1,…,x^n) = [P(σ^n=k | x^1,…,x^{n-1}) · P(x^n | σ^n=k)] / [Σ_{j=1}^{r} P(σ^n=j | x^1,…,x^{n-1}) · P(x^n | σ^n=j)]    (7)

but

    P(σ^n=k | x^1,…,x^{n-1}) = Σ_{i=1}^{r} P(σ^n=k | σ^{n-1}=i) · P(σ^{n-1}=i | x^1,…,x^{n-1})    (8)

Thus the iterative procedure can be stated as follows:

    P(σ^n=k | x^1,…,x^n) = [Σ_{i=1}^{r} P(σ^n=k | σ^{n-1}=i) · P(σ^{n-1}=i | x^1,…,x^{n-1})] · P(x^n | σ^n=k) / Σ_{j=1}^{r} [Σ_{i=1}^{r} P(σ^n=j | σ^{n-1}=i) · P(σ^{n-1}=i | x^1,…,x^{n-1})] · P(x^n | σ^n=j)    (9)

where P(σ^{n-1}=i | x^1,…,x^{n-1}) for i = 1,…,r is a vector representing the state of the decision process at time n-1, P(σ^n=k | σ^{n-1}=i) for k = 1,…,r and i = 1,…,r represent the state transitions of the Markov source, and P(x^n | σ^n=k) for k = 1,…,r are the channel transition probabilities. The vector of a posteriori probabilities at time n can thus be derived from a knowledge of the state at time n-1. Hence this process is completely specified if we know the initial state distribution vector. This vector is obtained from the source data when compiling the desired statistics.
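One pass of the recursion (9) is a predict-then-weight update: propagate the previous state vector through the Markov transitions as in (8), multiply elementwise by the channel likelihood of the new tentative decision, and renormalize. A minimal Python sketch, with illustrative names not taken from the thesis:

```python
def bayes_step(posterior, transitions, channel, x):
    """One iteration of eq. (9).

    posterior   -- P(sigma^{n-1}=i | x^1..x^{n-1}) for i = 0..r-1
    transitions -- transitions[i][k] = P(sigma^n=k | sigma^{n-1}=i)
    channel     -- channel[k][x]    = P(x^n=x | sigma^n=k)
    x           -- index of the tentative decision at time n
    """
    r = len(posterior)
    # Prediction step, eq. (8): the a priori vector for time n.
    prior = [sum(transitions[i][k] * posterior[i] for i in range(r))
             for k in range(r)]
    # Weight by the channel likelihood of the observed tentative decision.
    unnorm = [prior[k] * channel[k][x] for k in range(r)]
    total = sum(unnorm)
    # Renormalize so the a posteriori vector sums to one, as in eq. (9).
    return [u / total for u in unnorm]
```

The denominator of (9) is exactly the normalizing `total` above, so the returned vector is a proper probability distribution over the r states.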
The vector P(σ^n=i | x^1,…,x^{n-1}) can be regarded as a vector of a priori probabilities to which Bayes' theorem is applied to obtain the a posteriori probabilities describing the state of the decision process at time n. Bayes' rule (3) is then used to make the final decision on the state σ^n. The a priori probability vector summarizes all the information that is relevant to making a decision on the state σ^n at time n without actually having observed the state σ^n through the noisy channel. This latter information is contained in the channel transition probability.

3.3.2 Implementations

Each decision procedure, or algorithm, used in this thesis was implemented as a Fortran subroutine. The Bayes' iterative procedure (9) can be implemented as it stands above if we assume the source to be a first-order Markov source. The BAYES2 subroutine implements this case. The Markov state transition probabilities are computed using a bigram table and the states' initial probability distribution vector. These statistics, along with the other inputs to the algorithm, the channel transition probabilities and the tentative decisions, were obtained as indicated in Chapter II. The BAYES2 algorithm computes the state vector, or a posteriori probability vector, at each time n and decides on σ^n using (3). The algorithm is repeated for each character of the input message. The input message in this case is the output from the automatic character recognition machine which had been simulated previously. The BAYES2 algorithm is referred to as Bayes' decision using bigram statistics.
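Compiling the statistics a BAYES2-style decoder needs — the bigram-based state transitions and the initial state distribution vector — from a sample of source text amounts to counting and row-normalizing. A sketch; the alphabet layout, names, and uniform fallback for unseen predecessors are illustrative assumptions:

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 26 letters + blank

def bigram_statistics(text):
    """Compile a bigram table and the initial state distribution vector
    from a sample of source text."""
    idx = {c: i for i, c in enumerate(ALPHABET)}
    r = len(ALPHABET)
    counts = [[0] * r for _ in range(r)]
    for a, b in zip(text, text[1:]):
        counts[idx[a]][idx[b]] += 1
    transitions = []
    for row in counts:
        total = sum(row)
        # Row-normalize counts into P(sigma^n=k | sigma^{n-1}=i);
        # fall back to a uniform row for unseen predecessors.
        transitions.append([c / total if total else 1.0 / r for c in row])
    # Relative character frequencies give the initial state distribution.
    freq = Counter(text)
    init_vec = [freq[c] / len(text) for c in ALPHABET]
    return transitions, init_vec
```

Both outputs plug directly into the recursion (9): `transitions` supplies the Markov state transitions and `init_vec` the initial state distribution vector.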
The BAYES1 algorithm implements Bayes' decision procedure with the modification that a source with a first-order approximation to the natural language is used instead of the first-order Markov source. This means that the input characters are independent but occur with their a priori probabilities. Hence, the BAYES1 algorithm is referred to as Bayes' decision using a priori probabilities.

The BAYES3 algorithm also employs Bayes' iterative procedure (9) to compute the state vector at time n and decides on the particular state realization using (3). This algorithm, however, assumes the source to be a second-order Markov source. These second-order statistics can be employed directly in (9) provided each state is represented by a two-character string and the state transitions of any one two-character string to any other two-character string are known. This method requires a large amount of computer storage, since the Markov matrix is 27^s × 27^s, where s is the length of the character string for each state. An alternative approach, utilizing less storage, considers a table of trigrams to be a conditional bigram table [2]. The trigram table is partitioned into 27 bigram tables. Each bigram table is conditioned on the (n-2)nd state of the Markov chain. This algorithm is referred to as Bayes' decision using trigram statistics.

All three of the above algorithms use only past information to decide on the character identity at time n.

3.4 Bayes' Rule, Look-ahead Mode of Decision

Intuition would tell us that if we had more relevant information available before making a decision we would be able to make a better decision.
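The saving of the conditional-bigram view is that the 27^3 = 19,683 trigram counts are kept as 27 tables of 27 × 27, instead of a 729 × 729 two-character-state transition matrix. Selecting and normalizing the bigram table conditioned on the (n-2)nd state might look like the following sketch, with a hypothetical table layout indexed [σ^{n-2}][σ^{n-1}][σ^n]:

```python
def conditional_transitions(trigram, prev2):
    """Treat a trigram count table as 27 conditional bigram tables:
    pick the table conditioned on the (n-2)nd state, then row-normalize
    it into P(sigma^n=k | sigma^{n-1}=i) for use in eq. (9)."""
    table = trigram[prev2]
    matrix = []
    for row in table:
        total = sum(row)
        # Uniform fallback for contexts never seen in the source sample.
        matrix.append([c / total if total else 1.0 / len(row) for c in row])
    return matrix
```

At each step n the decoder simply swaps in the table selected by its decision at time n-2, so the recursion (9) itself is unchanged.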
In the character recognition application this additional information may be obtained by observing subsequent characters before deciding on the identity of the present character. If a single subsequent tentative decision is considered, Bayes' decision rule is given as follows: Choose ψ to be the true state realization at time n if ψ is any integer 1 ≤ ψ ≤ 27 such that

    P(σ^n=ψ | x^1,…,x^{n+1}) = max[P(σ^n=1 | x^1,…,x^{n+1}), …, P(σ^n=27 | x^1,…,x^{n+1})]    (10)

The vector P(σ^n=ψ | x^1,…,x^{n+1}), 1 ≤ ψ ≤ 27, can be calculated in the same manner as shown above in section 3.3.1 with one additional step after obtaining P(σ^n=ψ | x^1,…,x^n). Thus, since

    P(σ^n=i | x^1,…,x^{n+1}) = [P(x^{n+1} | σ^n=i) · P(σ^n=i | x^1,…,x^n)] / [Σ_{k=1}^{r} P(x^{n+1} | σ^n=k) · P(σ^n=k | x^1,…,x^n)]    (11)

and using the previous assumption

    P(x^{n+1} | σ^{n+1}=j, σ^n=i) = P(x^{n+1} | σ^{n+1}=j)    (12)

to obtain

    P(x^{n+1} | σ^n=i) = Σ_{j=1}^{r} P(x^{n+1} | σ^{n+1}=j) · P(σ^{n+1}=j | σ^n=i)    (13)

it follows that Bayes' iterative decision procedure for the look-ahead mode of decision can be stated as

    P(σ^n=i | x^1,…,x^{n+1}) = [Σ_{j=1}^{r} P(x^{n+1} | σ^{n+1}=j) · P(σ^{n+1}=j | σ^n=i)] · P(σ^n=i | x^1,…,x^n) / [Σ_{k=1}^{r} Σ_{t=1}^{r} P(x^{n+1} | σ^{n+1}=t) · P(σ^{n+1}=t | σ^n=k) · P(σ^n=k | x^1,…,x^n)]    (14)

The BAYES4 subroutine calculates the a posteriori probability vector P(σ^n=i | x^1,…,x^{n+1}) using (14) and makes Bayes' decision using (10).
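The look-ahead correction reweights the current state vector by the expected likelihood of the next tentative decision, combining (13) and (14). A sketch, assuming P(σ^n=i | x^1,…,x^n) has already been produced by the forward recursion; the names are illustrative:

```python
def lookahead_step(posterior_n, transitions, channel, x_next):
    """Eqs. (13)-(14): refine P(sigma^n=i | x^1..x^n) using the next
    tentative decision x^{n+1} before deciding on sigma^n.

    posterior_n -- P(sigma^n=i | x^1..x^n) for i = 0..r-1
    transitions -- transitions[i][j] = P(sigma^{n+1}=j | sigma^n=i)
    channel     -- channel[j][x]    = P(x^{n+1}=x | sigma^{n+1}=j)
    """
    r = len(posterior_n)
    # Eq. (13): likelihood of the next observation given sigma^n = i.
    lik_next = [sum(channel[j][x_next] * transitions[i][j] for j in range(r))
                for i in range(r)]
    # Eq. (14): reweight the current posterior and renormalize.
    unnorm = [lik_next[i] * posterior_n[i] for i in range(r)]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

Bayes' decision (10) is then simply the index of the largest element of the returned vector.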
BAYES4 assumes the source to be a first-order Markov source and uses a bigram look-up table to obtain the state transition probabilities. The BAYES5 subroutine also implements equation (14) and makes the optimal decision using (10). It assumes the source to be a second-order Markov source. These higher order statistics are implemented in the same manner as described above for the BAYES3 subroutine, with the addition of the conditional bigram matrix, whose typical element is P(σ^{n+1}=j | σ^n=i, σ^{n-1}=k).
"Thesis/Dissertation"@en .
"10.14288/1.0102175"@en .
"eng"@en .
"Electrical and Computer Engineering"@en .
"Vancouver : University of British Columbia Library"@en .
"University of British Columbia"@en .
"For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en .
"Graduate"@en .
"Perceptrons"@en .
"A computer simulation study of the application of contextual constraints to character recognition"@en .
"Text"@en .
"http://hdl.handle.net/2429/34858"@en .