UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Machine recognition of typewritten characters based on shape descriptors Kanciar, Eugene J.A. 1974

You don't seem to have a PDF reader installed, try download the pdf

Item Metadata

Download

Media
UBC_1974_A7 K35.pdf [ 5.65MB ]
Metadata
JSON: 1.0065610.json
JSON-LD: 1.0065610+ld.json
RDF/XML (Pretty): 1.0065610.xml
RDF/JSON: 1.0065610+rdf.json
Turtle: 1.0065610+rdf-turtle.txt
N-Triples: 1.0065610+rdf-ntriples.txt
Original Record: 1.0065610 +original-record.json
Full Text
1.0065610.txt
Citation
1.0065610.ris

Full Text

MACHINE RECOGNITION OF TYPEWRITTEN CHARACTERS BASED ON SHAPE DESCRIPTORS by Eugene J . A . Kanc iar B . A . S c , U n i v e r s i t y of Western O n t a r i o , 1972 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE i n the Department o f E l e c t r i c a l Engineer ing We accept t h i s thes i s as conforming to the required^standard THE UNIVERSITY OF BRITISH COLUMBIA October, 1974 I n p r e s e n t i n g t h i s t h e s i s i n p a r t i a l f u l f i l m e n t o f t h e r e q u i r e m e n t s f o r a n a d v a n c e d d e g r e e a t t h e U n i v e r s i t y o f B r i t i s h C o l u m b i a , I a g r e e t h a t t h e L i b r a r y s h a l l m a k e i t f r e e l y a v a i l a b l e f o r r e f e r e n c e a n d s t u d y . I f u r t h e r a g r e e t h a t p e r m i s s i o n f o r e x t e n s i v e c o p y i n g o f t h i s t h e s i s f o r s c h o l a r l y p u r p o s e s may be g r a n t e d by t h e H e a d o f my D e p a r t m e n t o r by h i s r e p r e s e n t a t i v e s . I t i s u n d e r s t o o d t h a t c o p y i n g o r p u b l i c a t i o n o f t h i s t h e s i s f o r f i n a n c i a l g a i n s h a l l n o t be a l l o w e d w i t h o u t my w r i t t e n p e r m i s s i o n . ( D e p a r t m e n t o f T h e U n i v e r s i t y o f B r i t i s h C o l u m b i a V a n c o u v e r 8 , C a n a d a D a t e i i ABSTRACT An o p t i c a l character r e c o g n i t i o n technique f o r t y p e w r i t t e n l e t t e r s was developed w i t h a p p l i c a t i o n to a personal reading machine f o r the b l i n d . The fe a t u r e e x t r a c t i o n process defined a chara c t e r i n terms of l i n e s and shapes which are c l o s e l y r e l a t e d to a person's d e s c r i p t i o n of form. The system was developed to i d e n t i f y a l l upper and lower-case t y p e w r i t t e n characters i n the alphabet. A l e t t e r was described by any combination of seven b a s i c f e a t u r e s , u s u a l l y i n a 3 x 3 f e a t u r e m a t r i x . The e x t r a c t i o n of t o p o l o g i c a l (or s t r u c t u r a l ) p r o p e r t i e s had s e v e r a l advantages; a very s m a l l f e a t u r e d i c t i o n a r y w i t h about 100 code-word e n t r i e s ; quick and simple t r a i n i n g procedure f o r a new font ; and, a strong c a p a b i l i t y to handle character d e f o r m i t i e s . A separate technique, based on edge examination, was developed to i d e n t i f y characters w i t h prominent di a g o n a l f e a t u r e s . S e q u e n t i a l c l a s s i f i c a t i o n was employed throughout the e n t i r e system so th a t r e c o g n i t i o n was made once a s u f f i c i e n t l y unique measure was s a t i s f i e d . Tests on both repeated c h a r a c t e r s and t y p e w r i t t e n passages p^cf'j^^d p̂"-̂ c"i™."*"cly Q 7 < y c"cu"ccy vhcn the s y t — *.— c ~pp 1 i c J *~z three fonts which v a r i e d from a s t y l i z e d to a s e r i f l e s s p r i n t . For a scanning r a t e of 60 wpm, a r e c o g n i t i o n speed of two char a c t e r s per second was achieved. The system was developed on a PDP-12 computer and i s f u l l y compatible f o r r e a l i z a t i o n on a PDP-8 computer w i t h 8K of memory. i i i TABLE OF CONTENTS Page A b s t r a c t i i Table of Contents • i i i L i s t of I l l u s t r a t i o n s v Acknowledgement . .... v i I . INTRODUCTION 1 1.1 Purpose of Research • • 1 1.2 P e r s p e c t i v e 1 1.3 E x i s t i n g Schemes • • • 2 1.3.1 Template Matching 2 1.3.2 Contour T r a c i n g 3 1.4 A New Approach 4 I I . THE PATTERN RECOGNITION SCHEME 6 2.1 I n t r o d u c t i o n 6 ' ? ? Ir̂ v.T.t? - „ . ; . . . . . . . . . . . » • • >i' . . . . . i ' 2.3 Preprocessing 6 2.4 Feature E x t r a c t i o n , P a r t 1 7 2.4.1 I n t r o d u c t i o n • 7 2.4.2 B a s i c Concepts 7 2.4.3 S t r i n g I n d e n t i f i c a t i o n 8 2.4.4 Four L e v e l R e s o l u t i o n 9 2.4.5 B a s i c Feature Assignment 10 2.4.6 V e r t i c a l Compression 12 2.4.7 S e r i f D e l e t i o n 13 2.4.8 H o r i z o n t a l Feature I n t e g r a t i o n 14 2.4.9 H o r i z o n t a l Compression 16 2.4.10 An Example w i t h Curvature 17 2.5 Feature E x t r a c t i o n , P a r t 2 19 2.5.1 Characters w i t h Diagonals 19 2.5.2 S p e c i a l Cases 20 i v 2.6 C l a s s i f i c a t i o n 20 2.6.1 I n t r o d u c t i o n 20 2.6.2 Code-Word Generation 21 2.6.3 Subgroup Assignment 21 2.6.4 I d e n t i f i c a t i o n 22 2.7 O v e r a l l System C o n f i g u r a t i o n 23 I I I TESTS AND RESULTS 24 3.1 D e s c r i p t i o n of Test M a t e r i a l 24 3.2 Tests and R e s u l t s w i t h Repeated L e t t e r s 25 3.3 Tests and R e s u l t s w i t h Typewritten Passages 25 3.4 Sources of E r r o r 26 3.5 T r a i n i n g Procedure 28 3.6 Speed 28 3.7 System Storage Requirements 29 3.8 System F l e x i b i l i t y 29 IV CONCLUSION 30 4.1 Summary 30 4.2 Recommendations f o r F u r t h e r Research 31 References 33 APPENDIX A Confusion M a t r i c e s 35 APPENDIX B Examples of the Character R e c o g n i t i o n Process .... 40 APPENDIX C Examples of E r r o r s 43 APPENDIX D Examples of System F l e x i b i l i t y 45 APPENDIX E B a s i c Flowchart f o r the Diagonal Checking Routine. 47 V LIST OF ILLUSTRATIONS Page 1. The template matching scheme 2 2. The contour t r a c i n g scheme 4 3. The c l o s u r e c r i t e r i a 8 4. S t r i n g i d e n t i f i c a t i o n s i n the l e t t e r 'R' 9 5. Four l e v e l r e s o l u t i o n of s t r i n g i n f o r m a t i o n f o r the l e t t e r 'R' 10 6. B a s i c f e a t u r e assignments f o r the l e t t e r 'R' 12 7. V e r t i c a l compression t o three l e v e l s f o r the l e t t e r 'R' ... 12 8. S e r i f d e l e t i o n i n the l e t t e r 'R' 13 9. H o r i z o n t a l f e a t u r e i n t e g r a t i o n f o r the l e t t e r 'R' 16 10. H o r i z o n t a l compression f o r the l e t t e r ' R' 17 11. Feature i n t e g r a t i o n a p p l i e d t o a c h a r a c t e r w i t h c u r v a t u r e , the l e t t e r ' s' as an example 18 12. R e c o g n i t i o n of diagonal c h a r a c t e r s by s e l e c t i v e edge examination 19 13. The three code-words f o r the l e t t e r 'R' 21 14. Subgrouping the alphabet 22 15. Character i d e n t i f i c a t i o n , the l e t t e r 'c' as an example ... 23 16. Block diagram of the system 23 17. The three fonts used i n the t e s t s , reproduced i n f u l l s i z e 24 18. Suggested c l o s u r e c r i t e r i a t o r e p l a c e the c r i t e r i a i n Figure 3 32 ACKNOWLEDGEMENT The author would l i k e to thank Dr. M.P. Beddoes f o r h i s constant i n t e r e s t and i n p u t of ideas i n t o t h i s p r o j e c t . The a v a i l a b i l i t y of h i s time to my i n q u i r i e s i s g r a t e f u l l y a p p r e c i a t e d . Thanks are a l s o due to Rodney G. George f o r a l l h i s a s s i s t a n c e w i t h software implementation as w e l l as h i s p r a c t i c a l i n s i g h t i n t o problem areas. F i n a l l y I would l i k e t o extend my a p p r e c i a t i o n to a l l the graduate students, f a c u l t y and s t a f f w i t h whom I shared my experiences. F i n a n c i a l a s s i s t a n c e was provided by the N a t i o n a l Research C o u n c i l of Canada, the M e d i c a l Research C o u n c i l of Canada, and the Vancouver Foundation. I . INTRODUCTION 1 1.1 Purpose of Research The purpose of t h i s research was to : (1) devis e an o p t i c a l character r e c o g n i t i o n scheme th a t would be a p p l i c a b l e to a pe r s o n a l reading machine f o r the b l i n d ; (2) develop a f e a t u r e e x t r a c t i o n technique based on shape a n a l y s i s and comparable to human d e s c r i p t i o n of form; (3) o b t a i n p r e l i m i n a r y r e c o g n i t i o n r e s u l t s and determine the sources of e r r o r s w i t h t y p e w r i t t e n c h a r a c t e r s . 1.2 P e r s p e c t i v e The b a s i c approach to read i n g a i d s f o r the b l i n d has been to develop d i r e c t t r a n s l a t i n g devices that produce f a c s i m i l e r e p r o d u c t i o n s of c h a r a c t e r s w i t h e i t h e r t a c t i l e [4] or a u d i t o r y [ 3 ] , [20] output. The advantages achieved i n t h i s approach are compactness, t e c h n i c a l s i m p l i c i t y , and economy. The disadvantages of t h i s approach are t h a t the reader must be t r a i n e d to decode the output and the reading r a t e s are low. The next order of complexity i n a reading machine has been to employ OCR ( o p t i c a l character r e c o g n i t i o n ) w i t h s p e l l e d o r spoken speech as output. Character r e c o g n i t i o n schemes have been i n commercial use f o r a number of years. Present users of OCR i n c l u d e the post o f f i c e , government, and i n d u s t r i a l concerns. I n these cases OCR i s used w i t h a p p l i c a t i o n s r e q u i r i n g l a r g e volume, h i g h speed, and accuracy. U s u a l l y these schemes u t i l i z e a w e l l d e f i n e d s et of c h a r a c t e r s . By c o n t r a s t , r e l a x a t i o n of many performance c r i t e r i a may be a p p l i e d to a p e r s o n a l reading machine; but, at the same time t h i s machine must cope w i t h a wider range o f f o n t s and p r i n t i n g q u a l i t y than those which a r e encountered i n commercial a p p l i c a t i o n s . Two schemes have been developed over the l a s t t e n years f o r t h i s p a r t i c u l a r purpose and these are discussed i n the next s e c t i o n . 2 1.3 E x i s t i n g Schemes 1.3.1 Template Matching One s o l u t i o n to the OCR problem has been Smith-Mauch 1s template matching scheme [20]. The l o c a t i o n s of b l a c k areas, encountered as the template scans across the c h a r a c t e r , are processed to i d e n t i f y i t . This system, i l l u s t r a t e d i n F i g u r e 1, c o n s i s t s of a hand h e l d probe t h a t focuses the p r i n t e d c h a r a c t e r onto a two-dimensional p h o t o c e l l array of 12 sensors. The d i s c r i m i n a t i n g r o u t i n e i s a p p l i e d i n a m u l t i p l e snapshot sequence. As a cha r a c t e r i s scanned across the o p t i c a l i n p u t , i t s t i m u l a t e s a t r i g g e r c e l l (TC). Every time a w h i t e to b l a c k or b l a c k to white t r a n s i t i o n occurs i n the t r i g g e r c e l l , a snapshot of a l l the c e l l s i n template i s taken. Output from these c e l l s i s converted i n t o a f i v e - b i t code that s p e c i f i e s the c h a r a c t e r . The o b j e c t i v e of t h i s scheme i s an accuracy of 90% to 95% at 80 wpm to 90 wpm but t h i s has y e t to be achieved. The present reading speed i s about 20 wpm to 25 wpm. The r e c o g n i t i o n l o g i c of t h i s machine i s designed w i t h i n t e g r a t e d c i r c u i t s . For t h i s scheme t o work w e l l , s e v e r a l c r i t i c a l parameters must be s a t i s f i e d . The exact p o s i t i o n i n g of the template over the l e t t e r i s important. A l s o , the th r e s h o l d f o r i n f o r m a t i o n when a c e l l on the template i s on l y p a r t i a l l y i n a b l a c k f i e l d i s a r e l e v a n t c o n s i d e r a t i o n . This scheme, i n e x t r a c t i n g s p a t i a l i n f o r m a t i o n i n a snapshot manner, does not make good use of geometric or s t r u c t u r a l p r o p e r t i e s which r e q u i r e more continuous measurements. Thus the system's performance i s l i m i t e d because the e x t r a c t e d i n f o r m a t i o n i s not complete enough to handle problems such as charac t e r d i s t o r t i o n s o r a m b i g u i t i e s . FIRST "SNAPSHOT" SECOND "SNAPSHOT" THIRD "SNAPSHOT" FOURTH "SNAPSHOT" F i g . 1 The template matching scheme. 3 1.3.2 Contour Trac i n g Mason [16] and Clemens [6] proposed an OCR scheme which u t i l i z e d contour t r a c i n g . This technique, i l l u s t r a t e d i n F i g u r e 2, re q u i r e s three pieces of in f o r m a t i o n : a code word t h a t s p e c i f i e s the charac t e r ' s geometric extrema i n the x and y d i r e c t i o n s ; a coord word which s t o r e s the l o c a t i o n of each extremum; and, the h e i g h t - t o - w i d t h r a t i o (H/W) that c l a s s i f i e s the c h a r a c t e r i n t o one of fou r subgroups. From these measurements a 3 0 - b i t code word i s d e r i v e d . The lookup t a b l e contains three to f i v e p o s s i b i l i t i e s f o r every c h a r a c t e r . The o r i g i n a l scheme, as developed by Clemens [6] achieved 1% to 3% e r r o r a t 100 wpm and was t e s t e d on ten f o n t s . F u r t h e r improvements upon the system have reduced the e r r o r to 0.1% but t h i s has been achieved on work w i t h only one f o n t [13], [16]. T h i s system uses a PDP-9 type o f computer and r e q u i r e s a f l y i n g spot scanner f o r the data i n p u t . The code word i s a good t o p o l o g i c a l d e s c r i p t i o n of a ch a r a c t e r ' s shape. However both the coord word and the h e i g h t - t o - w i d t h r a t i o are r i g i d geometric measurements which do not make t h i s scheme r e a d i l y auap Uctuxc Lu u L u e i J.UHL&. i i i e COGi.ll wGiiU j_o d £>cuSi.kiyc i U i i ^ i - j - C i x o f p o s i t i o n and as such, i t w i l l a l t e r as the extrema are l o c a t e d i n new region s . The he i g h t - t o - w i d t h r a t i o s a l s o vary among the d i f f e r e n t f o n t s . The p r i n t m a t e r i a l used by Mason and Clemens t o t e s t t h e i r r e c o g n i z e r was f r e e from breaks. As the r e c o g n i z e r deduces from the p r i n t not o n l y the p r i n t i n f o r m a t i o n but a l s o the p o s i t i o n t o ex p l o r e next, a break produces a doubly c a t a s t r o p h i c e r r o r . These break e r r o r s are minimized by us i n g h i g h r e s o l u t i o n (60 x 60 p o i n t s were used by Lee [13]). One l a s t b a s i c l i m i t a t i o n w i t h the Mason-Clemens r e c o g n i z e r i s t h a t only the o u t s i d e contour of a cha r a c t e r i s used. With ambiguous p a i r s of cha r a c t e r s such as 'D' and 'B', 'c' and 'e', a s p e c i a l technique, such as t a k i n g a v e r t i c a l s l i c e through the c h a r a c t e r ' s c e n t r e and no t i n g the number of i n t e r s e c t i o n s (two or three) was used, because the ou t s i d e contour was an i n s u f f i c i e n t measure. start-* ( 00 10 i . code word: 2. coord word: 00 00 00 10 01 3. H/W ratio 1 0 0 1 1 Fig. 2 The contour tracing scheme. 1.4 A New Approach The pattern recognition techniques that have been developed on large computers are not realizable on small, economical processors. S t a t i s t i c a l and probabalistic approaches [2], [11] are not applicable for this purpose because of their complexity. The two schemes discussed in Section 1.3 represent one*approach to the problem; that i s , discrete spatial information was extracted and no attempt was made to integrate the measurements into more general and higher order information. In this thesis such a new approach was undertaken. well known [1], [14]. However, results show that a complex machine i s needed to recognize one font [16]. This work set out, as a main aim, to invent a set of features to identify a l e t t e r simply. A further consideration has been to relate these features to descriptors of shape which would appear natural to the human reader. For example, with the l e t t e r 'b' we could use the description: "a straight v e r t i c a l l i n e at the l e f t which carries a l i t t l e flag (serif) at the top; this line intersects two horizontal lines, one at the bottom and one half way up, which are traced to the right and, moving together, form a junction at the right." This compact description of form i s strongly topological and gives the gist of how this OCR scheme interprets a character. Our technique requires seven features to characterize a l e t t e r . Usually the l e t t e r was reconstructed within a 3 x 3 feature matrix. A special technique was used for characters with prominent diagonal features, such The process of identifying a le t t e r by means of features i s 5 as 'W', whereby the c h a r a c t e r ' s edge was used f o r i t s i d e n t i f i c a t i o n . T h i s scheme should be l e s s s e n s i t i v e to d i s t o r t i o n and more f l e x i b l e i n i t s r e c o g n i t i o n c a p a b i l i t y than the e x i s t i n g schemes. This t h e s i s i s based on o r i g i n a l work and i s c o n c e p t u a l l y most s i m i l a r to work done by Genchi [12] and Marko [15]. Genchi has developed an OCR scheme which i s used i n the Japanese p o s t a l system f o r r e c o g n i t i o n of handprinted numerals. The numerals were recognized by e x t r a c t i n g a sequence of f e a t u r e s i n h o r i z o n t a l zones. A 3 x 3 scanning window i d e n t i f i e d each 9 element a r r a y as one of 7 p o s s i b l e s t r o k e segments such as blank, v e r t i c a l , s l a n t e d , e t c . I n the second step, each h o r i z o n t a l s t r i n g of s t r o k e segments was c l a s s i f i e d i n t o one of 16 h o r i z o n t a l f e a t u r e s . The t h i r d step was to connect v e r t i c a l l y a l l the i n d i v i d u a l h o r i z o n t a l f e a t u r e s by u s i n g a t r a n s i t i o n t a b l e . The chara c t e r was recognized by t h i s t a b l e . With Marko's system, s p a t i a l f i l t e r i n g was performed on the c h a r a c t e r i n four d i r e c t i o n s ( h o r i z o n t a l , v e r t i c a l , and two d i a g o n a l d i r e c t i o n s ) . From t h i s p r o c e s s i n g , higher order features such as angles, c u r v a t u r e s , and endpoints were e x t r a c t e d . Information was then i n t e g r a t e d i n t o a very compact f e a t u r e m a t r i x . F i n a l l y , a weighting scheme was a p p l i e d to these f e a t u r e s which were then processed by the c l a s s i f i e r . 6 I I . THE PATTERN RECOGNITION SCHEME 2.1 I n t r o d u c t i o n I n order to de s c r i b e the techniques employed i n the c h a r a c t e r r e c o g n i t i o n system i t i s u s e f u l to employ a general model f o r a p a t t e r n r e c o g n i t i o n system w i t h each step i n the p r o c e s s i n g being i d e n t i f i e d : input -Mpreprocessor\-^\feature extractorh^classifien-* output The scanner i s the transducer, a p h o t o - e l e c t r i c d e v i c e , which converts the p r i n t e d c h a r a c t e r i n t o a two-state ( b i n a r y ) i n p u t . The preprocessor, i f present, employs techniques such as data n o r m a l i z a t i o n and s t r o k e t h i n n i n g . The purpose of the f e a t u r e e x t r a c t o r I s to o b t a i n . a s e t of c h a r a c t e r i s t i c measures and statements upon which a d e c i s i o n as to the most probable i d e n t i t y of the p a t t e r n sample can be made. The c l a s s i f i e r , on the b a s i s of the i n f o r m a t i o n provided by the f e a t u r e e x t r a c t o r , a p p l i e s a d e c i s i o n c r i t e r i o n to the p a t t e r n measurements to make a d e c i s i o n as t o which, i f any, of the a l l o w a b l e c l a s s e s the p a t t e r n belongs. The output r e l a y s the i d e n t i f i c a t i o n i n an a p p r o p r i a t e form. 2.2 Input A 64 v e r t i c a l element R e t i c o n RL-64P s o l i d - s t a t e l i n e scanner provided the i n p u t f o r a PDP-12 computer. The speed of the scan corresponded to approximately 60 words per minute. The t y p i c a l p i c t u r e frame f o r a s i n g l e upper case c h a r a c t e r contained 15 x 30 elements and 12 x 20 elements f o r a s m a l l lower case l e t t e r . 2.3 P r e p r o c e s s i n g Preprocessing r e f e r s to data m a n i p u l a t i o n i n the primary stages of a process so th a t redundancy and d i s t o r t i o n can be ap p r e c i a b l y reduced. I n the OCR scheme th a t was developed, there was no formal preprocessing. I t was b e l i e v e d that the s u c c e s s i v e stages of i n f o r m a t i o n h a n d l i n g contained enough s o p h i s t i c a t i o n to be able to cope w i t h the inherent n o i s e i n the bi n a r y data. The system was designed w i t h t h i s premise i n mind. A preprocessor [24] can have the f o l l o w i n g o b j e c t i v e s : a c h a r a c t e r should i d e a l l y be of u n i t 7 thi c k n e s s ; t h i s s k e l e t o n must resemble the l i n e p a t t e r n a human would draw, that i s , no i n f o r m a t i o n other than the l i n e w i d t h i s l o s t ; breaks i n the charac t e r must be searched f o r and, where m i s s i n g , the c o n t i n u i t y should be r e s t o r e d . As u s e f u l as these o b j e c t i v e s may be, f o r t h i s s p e c i f i c a p p l i c a t i o n , where the cost of machine implemen- t a t i o n must be kept low and an e r r o r r a t e of a few percent can be t o l e r a t e d , t h i s preprocessing i s not j u s t i f i e d . 2.4 Feature E x t r a c t i o n , P a r t 1 2.4.1 I n t r o d u c t i o n Feature e x t r a c t i o n i n OCR has r e c e i v e d c o n s i d e r a b l e a t t e n t i o n because the e f f e c t i v e n e s s of t h i s stage p r e d i c a t e s the r e c o g n i t i o n a b i l i t y of the e n t i r e system. I t c o n s i s t s of one or a combination of three p o s s i b l e techniques [21] : g e o m e t r i c a l , t o p o l o g i c a l and mathematical (or s t a t i s t i c a l ) f e a t u r e e x t r a c t i o n . T o p o l o g i c a l f e a t u r e e x t r a c t i o n has been considered i n the r e c o g n i t i o n of handwritten c h a r a c t e r s . Eden and H a l l e [9] found t h a t only 18 d i f f e r e n t s t r o k e s were needed to c o n s t r u c t any E n g l i s h c h a r a c t e r . By p a r t i t i o n i n g the perimeter around a character i n t o e i g h t s e c t i o n s , Tou [22] was abl e to e x t r a c t t o p o l o g i c a l f e a t u r e s such as bays, i n f l e c t i o n s and curvatures from each octant. I n t e r e s t i n g work on computational topology has been reported [23] , however, t h i s approach i s y e t to. be a p p l i e d . 2.4.2 B a s i c Concepts Important concepts and terms are in t r o d u c e d . A s t r i n g i s a continuous segment of b i n a r y l ' s w i t h i n a column and has the property l e n g t h , L. Each s t r i n g i s c h a r a c t e r i z e d by one of three d e s c r i p t i o n s : a v e r t i c a l (V), a c l o s u r e (C), or a centre c e l l (CC). For a character of height H, H/4=T+R (where the remainder, R, i s 0 ^ 3 ) de f i n e s the 1/4 height t h r e s h o l d T. Two types of v e r t i c a l s are d e f i n e d : a f u l l v e r t i c a l and a minor v e r t i c a l . A s t r i n g w i t h T<L<3T was assigned a minor v e r t i c a l . A s t r i n g w i t h L>3T was assigned a f u l l v e r t i c a l . 8 Whenever two branches of a ch a r a c t e r converged to o r diverged from a s t r i n g common to both, that s t r i n g was c l a s s i f i e d as a c l o s u r e . I n the l e t t e r 'c' i n Fi g u r e 3, column thren i s i d e n t i f i e d as a c l o s u r e because i t i n i t i a t e s the divergence of the two branches i n column f o u r . For c l o s u r e to e x i s t a l l the c o n d i t i o n s i n F i g u r e 3, must be s a t i s f i e d . I f columns three and f o u r are interchanged the t e s t would i n d i c a t e convergence. Any s t r i n g which was c l a s s i f i e d as a minor v e r t i c a l was te s t e d f o r c l o s u r e . I f the s t r i n g was i d e n t i f i e d as a f u l l v e r t i c a l , the c l o s u r e c r i t e r i o n was not a p p l i e d . A s t r i n g w i t h L<T was assigned i t s c e n t r e c e l l . However, a s t r i n g w i t h T/2<L<T was a l s o checked f o r a c l o s u r e . O c c a s i o n a l l y a cl o s u r e d i d occure w i t h L<T/2 and i n these cases the c l o s u r e was missed but these e r r o r s were i n f r e q u e n t . I 4 ? I ooo ^ oooooo ooo ooo 16 — C oo oo . lo oo ooo A—oo « oo ooo oo—D oo ooo o I I - oo ooo o oo ooo o ooo o n u o A>C + "55 g + 88 8 D < E oo oo o ° ° „ go _ oo—E ooo oo B—oo oo ooo 0—F ooooooooo oooooo F i g . 3 The c l o s u r e c r i t e r i a . 2.4.3 S t r i n g I d e n t i f i c a t i o n The l e t t e r 'R* was chosen t o demonstrate the OCR technique t h a t was developed. A t y p i c a l b i n a r y p i c t u r e f o r the l e t t e r 'R' i s shown i n F i g u r e 4A. The f i r s t i n f o r m a t i o n t o be noted about the ch a r a c t e r was i t s height and width (28 x 17) and, T (28/4=7). The s t r i n g s i n columns one through three were l e s s than T. Thus each s t r i n g was completely d e s c r i b e d by i t s CC. For the s t r i n g i n column fo u r L>3T, so a f u l l v e r t i c a l was noted. Columns f i v e through eleven contained only CC in f o r m a t i o n . In column twelve the f i r s t s t r i n g was assigned i t s CC. For the second s t r i n g T<L<3T; t h i s was the c o n d i t i o n f o r f u r t h e r processing to determine i f the s t r i n g was best d e s c r i b e d as a minor v e r t i c a l or a c l o s u r e . I n t h i s case divergence was i n d i c a t e d . In column t h i r t e e n the f i r s t two s t r i n g s were c l a s s i f i e d as CC's but the t h i r d s t r i n g , T<L<3T and again a check f o r c l o s u r e was i n i t i a t e d . "This s t r i n g conveyed length i n f o r m a t i o n o n l y and was i d e n t i f i e d as V. The f i r s t s t r i n g i n column fourteen was a l s o t e s t e d f o r c l o s u r e and was found t o be the convergence f o r the two branches i n column t h i r t e e n . The remaining s t r i n g s i n the charac t e r were a l l CC's. These i d e n t i f i c a t i o n s are ta b u l a t e d i n F i g u r e 4B. 12345678901234567 ooooooooooo ooo oooo oo oo o oo o oo o oo o o o o o o o o oo oo oo oo ooo oo ooooooooo oo oooo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1 1 1 V 1 1 1 1 1 1 1 2 4 C 27 28 28 27 27 27 13 13 14 14 14 15 15 C 12 27 ?n 97 ?7 ?8 V o o oo oo oo o o oo oo oo ooooooo oooooooo oo o o o oo oo oo oo oo oo ooo ooooo 26 Figure 4B 12345678901234567 F i g u r e 4A F i g . 4 S t r i n g i d e n t i f i c a t i o n s i n the l e t t e r 'R'. 2.4.4 Four L e v e l R e s o l u t i o n The choice of a fou r l e v e l r e s o l u t i o n system was based on the f a c t that the maximum number of f e a t u r e l e v e l s i n any E n g l i s h alphanumeric i s f o u r . The number of f e a t u r e l e v e l s i s equal to the maximum number of i n t e r s e c t i o n s t h a t a v e r t i c a l l i n e can make through a c h a r a c t e r . For example, the c h a r a c t e r s 'g', 'e', and 'o' r e s p e c t i v e l y have f o u r , t h r e e , and two f e a t u r e l e v e l s . The r e s o l u t i o n from t h i s arrangement proved s a t i s f a c t o r y . Thus the s t r i n g i d e n t i f i c a t i o n s f o r each column were quantized i n t o one of fou r p o s s i b l e v e r t i c a l quadrants. For the example t r e a t e d i n S e c t i o n 2.4.3, the i n f o r m a t i o n so processed appears i n Fi g u r e 5. The blank quadrants are zero e n t r i e s . 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1 1 1 V 1 1 1 1 1 2 A C 27 28 28 27 27 27 13 13 1A 1A 1A 15 15 C 12 27 20 27 27 28 V 26 1 1 1 V 1 1 1 1 1 1 1 2 A C V 13 13 1A 1A 1A 12 C V 20 15 15 C V 27 27 27 V 26 27 27 28 C V 27 27 28 28 F i g . 5 Four l e v e l r e s o l u t i o n of s t r i n g i n f o r m a t i o n f o r the l e t t e r 'R'. 2.4.5. B a s i c Feature Assignment Each column entry of data i s compared t o the i n f o r m a t i o n s t o r e d i n the next column. This comparison r e s u l t s i n one ot s i x b a s i c f e a t u r e s being assigned : (/) U - Up (\) D - Down (—) H - H o r i z o n t a l ( | ) V - V e r t i c a l ( X ) C - Closure ( • ) Z - Zero For two adjacent columns, a p a i r of adjacent elements (X,Y) can be obtained. F i g u r e 6A summarizes the f e a t u r e s e l e c t i o n process f o r two adjacent columns. The assignment of b a s i c f e a t u r e s i s i l l u s t r a t e d i n F i g u r e 6B. When the f i r s t column was compared to the second, the f i r s t (X,Y) p a i r was (1,1) which was represented by an H. Li k e w i s e f o r the second p a i r (27,27). The second and t h i r d columns produced i d e n t i c a l r e s u l t s . When the t h i r d and f o u r t h columns were s t u d i e d , the p a i r s (1,V) and (27,V) r e s u l t e d i n V e n t r i e s i n the f i r s t and f o u r t h v e r t i c a l quadrants. These e n t r i e s , although they may seem s u p e r f l u o u s , are necessary f o r the c o n t i n u i t y of the f e a t u r e s e l e c t i o n process. The reasoning behind such e n t r i e s w i l l become evident when columns eleven and twelve are d i s c u s s e d . A f o u r l e v e l v e r t i c a l was assigned t o column f o u r . In columns f i v e and s i x , the f i r s t two p a i r s (1,1) and (13,13) produced two H f e a t u r e s . The t h i r d p a i r was (20,0). In such a case a check was performed on the p a i r e d elements immediately above (13,13) and below (26,27). Thus (13,13) was com- pared to a p o s s i b l e combination of (20,13); s i m i l a r l y , (26,27) was compared t o (20,27). In both cases the (X,Y) combination w i t h X = 20 was the poorest choice. The comparison t h e r e f o r e remains w i t h (20,0) fo l l o w e d by the l a s t column p a i r (26,27). These p a i r s r e s u l t i n Z and D f e a t u r e s being entered r e s p e c t i v e l y . The second p a i r i n c o l - umns nine and ten (14,0) r e q u i r e d a s i m i l a r search; however, i n t h i s case a more appropriate p a i r (14,15) was found during the search and thus a D f e a t u r e was s t o r e d i n the second l e v e l . The s e l e c t e d b a s i c f e a t u r e was entered on the same v e r t i c a l l e v e l as the X i n die (X,Y) p a i r , i h e f i r s t p a i r (1,2) i n columns eleven and twelve produced a D f e a t u r e . For the second p a i r (15,C) C was assigned. The reason f o r s t o r i n g C was to m a i n t a i n the c o n t i n u i t y of the columns without i n t r o d u c i n g f a l s e i n f o r m a t i o n . Since the f i r s t p a i r (1,2) spanned both columns, a Z e n t r y f o r a no comparison f o r the p a i r (15,C) would have broken column c o n t i n u i t y i n the second row. Whereas, p o s s i b l y e n t e r i n g an H f e a t u r e to i n d i c a t e t h i s c o n t i n u i t y would have intr o d u c e d new i n f o r m a t i o n i n t o the l e t t e r . Now, assume C i s entered f o r (15,C). The next time t h i s same C i s encountered between columns twelve and t h i r t e e n , i t w i l l appear as (C,V) and, from Figure 6A, another C w i l l be entered. Thus two C's w i l l have been entered f o r the same datum. As i t w i l l be shown l a t e r , continuous h o r i z o n t a l e n t r i e s of C or V can be reduced to only one C or V without any l o s s of i n f o r m a t i o n . Next Column I A. cc C V z Co l cc UDH C V u e c C C C C 01 to 41 V v V V V M Hi z Z z z z search for best y F i g u r e 6A 1 .2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1 1 1 V 1 1 1 1 1 1 1 2 4 C V 13 13 14 14 14 12 C V 20 15 15 C V 27 27 27 V 26 27 27 28 C V 27 27 28 28 — — 1 1 — — — — — — \ \ X X 1 — \ — — \ X X 1 — X X 1 — — 1 1 \ — \ X 1 — \ — F i g u r e 6B F i g . 6 B a s i c f e a t u r e assignments f o r the l e t t e r 'R* 2.4.6 V e r t i c a l Compression A f t e r the assignment of b a s i c f e a t u r e s , the ch a r a c t e r was compressed v e r t i c a l l y to three h o r i z o n t a l rows w i t h the r e s u l t i n g advantages of compactness and b e t t e r l e v e l c o n t i n u i t y . Rows two and three were compressed together. The r e s u l t of compression between two b a s i c f e a t u r e s i s summarized i n F i g u r e 7A. The f o l l o w i n g order or s u p e r p o s i t i o n i n g dominance was imposed : (U,D,H)>C>V>Z. Rows two and three are compressed i n F i g u r e 7B. ' 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 3rd Row & o •o (3 CM ' .•»' • UDH C V Z UDH = UDH UDH UDH C UDH C C C V UDH C V V z UDH c V z no compression — — I 1 — — — — — — \ \ X X .... 1 — \ — — \ X X 1 — X X 1 — — 1 1 \ — \ X 1 — \ — — — 1 1 — — — — — — \ \ X X 1 — \ — — \ — X X X X — — 1 1 \ — \ X 1 — \ — F i g u r e 7A F i g u r e 7B F i g . 7 V e r t i c a l compression to three l e v e l s f o r the l e t t e r 'R'. 13 2.4.7 S e r i f D e l e t i o n S e r i f s * which ar e the c r o s s - l i n e s f i n i s h i n g o f f l e t t e r s , tend to i n c r e a s e a l e t t e r ' s l e g i b i l i t y . However i n the machine r e c o g n i t i o n process, the s e r i f s were c o n f u s i n g . Due to s e r i f s ' s h o r t l e n g t h s , the p r i n t i n g q u a l i t y , and the r e s o l u t i o n o f the scanner, a s e r i f could be d e s c r i b e d by zero, one, or more b a s i c f e a t u r e s . T h i s i n t r o d u c e d an unnecessary amount of s e r i f v a r i a b i l i t y . To i n c r e a s e system r e p r o d u c i b i l i t y , h o r i z o n t a l s e r i f d e l e t i o n was implemented. S e r i f s are found i n the top and bottom quadrants of a l e t t e r and are a s s o c i a t e d w i t h v e r t i c a l s . By a s s i g n i n g an a p p r o p r i a t e s e r i f l e n g t h f o r the f o n t being read, searching the top and bottom quadrants o n l y , and n o t i n g the f a c t t h a t a s e r i f begins from an empty space and ends i n a v e r t i c a l (and v i c e v e r s a ) , s e l e c t i v e d e l e t i o n of s e r i f s was achieved. The s e r i f l e s s c h a r a c t e r i s presented i n F i g u r e 8. I n a d d i t i o n t o the s e r i f s being d e l e t e d , the two v e r t i c a l s i n column three were a l s o e l i m i n a t e d . A minor r o u t i n e d e l e t e d any column which contained o n l y extraneous v e r t i c a l s w i t h no U, D o r H i n f o r m a t i o n present. For t h i s example, column fo u r c o u l d s u i t a b l y represent a l l the i n f o r m a t i o n contained i n both columns three and f o u r . 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 — — 1 — — — — — — \ \ X X 1 — \ — — \ — X X X X — — 1 1 \ — \ X 1 — \ — 1 \ \ X X 1 — \ — — \ — X X X X 1 X 1 F i g . 8 S e r i f d e l e t i o n i n the l e t t e r »R'. 14 2.4.8 H o r i z o n t a l Feature I n t e g r a t i o n A f t e r a l e t t e r was compressed v e r t i c a l l y i n t o three rows, h o r i z o n t a l f e a t u r e i n t e g r a t i o n (or compression) was a p p l i e d along each row. The s t a t e t r a n s i t i o n t a b l e i n F i g u r e 9A was used to s e l e c t the most appropriate i n t e g r a t e d shape that would d e s c r i b e a continuous row of i n d i v i d u a l U, D and H b a s i c f e a t u r e s . The programme took the p r e s - ent s t a t e of the f e a t u r e i n t e g r a t i o n process and compared i t to the next b a s i c f e a t u r e to generate the new s t a t e . The negative s i g n i n the t a b l e represented a t e r m i n a l c o n d i t i o n whereby the system s t o r e d the present s t a t e of the i n t e g r a t i o n process. As w e l l , the s t a t e t r a n s i t i o n t a b l e was r e i n i t i a l i z e d w i t h the next b a s i c f e a t u r e upon encountering the negative s i g n . • E s s e n t i a l l y , the t a b l e encoded a continuous row of U, D and H i n d i v i d u a l elements i n t o one g e n e r a l i z e d f e a t u r e . At l e a s t two (or any combination of) U, D or H consecutive f e a t u r e s had to occur before any i n t e g r a t e d f e a t u r e c o u l d be assigned. S i m i l a r l y , a minimum of two e n t r i e s were necessary f o r a g e n e r a l i z e d Z. A s i g n i f i c a n t d i f f e r e n c e occurred w i t h the V and C f e a t u r e s . For t h i s s i t u a t i o n , one V or C e n t r y was enough to consider them as the g e n e r a l i z e d f e a t u r e s . This was the case because these two f e a t u r e s had g r e a t e r s i g n i f i c a n c e than any s i n g l e U, D, H or Z element. Con- tinuous h o r i z o n t a l row e n t r i e s of e i t h e r V or C were g e n e r a l i z e d i n t o only one e n t r y of the appropriate f e a t u r e . Since C was a more s e l e c - t i v e occurence than V, i n a continuous s t r i n g w i t h both V and C e n t r i e s , C was chosen over V to be r e p r e s e n t a t i v e of t h a t row of f e a t u r e s . The reason f o r t h i s masking of V by C i s i l l u s t r a t e d by the l e t t e r 'c' i n F i g u r e 3. Column two was c l a s s i f i e d as a minor v e r t i c a l ; column three was the c l o s u r e . Column two i s expanding towards and developing the c l o s u r e i n column three and can be represented by the more s i g n i f i c a n t f e a t u r e , the c l o s u r e . The above d i s c u s s i o n i s summarized as f o l l o w s : f o r s i n g l e e n t r i e s : Z = U = D = H = 0 V = V, C = C f o r repeated h o r i z o n t a l e n t r i e s : X n = ( < ) n - 1 X where X = ( Z, U, D, H, V, C ) f o r a row of V and C e n t r i e s : vm- cn- v ° = (<)m + ° + n - h The symbol (<) i n d i c a t e d the c o n t i n u i t y across columns of the f e a t u r e t o i t s r i g h t . I n t e g r a t e d f e a t u r e s other than the one l i s t e d above are a l s o p o s s i b l e . These occurred w i t h combinations or U, D, and H f e a t u r e s ( S e c t i o n 2.4.10). F i g u r e 9B a p p l i e s the s t a t e t r a n s i t i o n t a b l e to the example being developed. In both rows one and two a g e n e r a l i z e d H f e a t u r e was s t o r e d r a t h e r than the D. This was the r e s u l t of an a d d i t i o n a l check which was performed on the t e r m i n a l f e a t u r e s t a t e of the i n t e g r a t i o n process to ensure that the most a p p r o p r i a t e s e l e c t i o n had been made. In t h i s i n s t a n c e , a simple check on the f i r s t row such as n o t i n g t h a t there were s i x H and two D elements d i s q u a l i f i e d a D e n t r y . Instead an H e n t r y was deemed most s u i t a b l e . The same c r i t e r i o n was a p p l i e d t o the second row. To devise a l a r g e r and more complex s t a t e t r a n s i t i o n t a b l e to handle every p o s s i b i l i t y was found t o be e x p e r i - mentally i m p r a c t i c a l . Instead, s a t i s f a c t o r y f e a t u r e i n t e g r a t i o n per- formance was achieved w i t h a two-step procedure. This was accomplished by u s i n g a t r a n s i t i o n t a b l e of moderate s i z e i n c o n j u n c t i o n w i t h a secondary v e r i f i c a t i o n r o u t i n e . 16 Next B a s i c Feature D H U Z V C IZ ID IH IU Z V c i n i t i a l zero 1 2 3 4 5 6 7 8 9 10 n IU H HH U IZ V c i n i t i a l up 1 \ \ X X ID D LH H IZ V c i n i t i a l down 1 _ \ — — \ — X X X X IH LH H HH IZ V c i n i t i a l h o r i z o n t a l 1 X 1 io n V - V - V - V - V V c v e r t i c a l ra t C - C - C - C -C c c c losure Row 1 1 — — — — — — \ \ X X te g LH D LH PU -H - H - H low h o r i z o n t a l S ta te V IH 11 H H H H LH D c c e M HH PD HH U -H - H - H high h o r i z o n t a l Store V H c o D D D CU -D -D -D down at e w CA U U - U - U - U up Row 2 1 — \ — — \ — X X X X 4-t CO CA CA CA U -CA -CA -CA cap State V IH LH LH LH D D c c c c * J c 11) CU D CU CU -CU -CU -CU cup Store V H c re si  PD CA PD H - H - H - H peaked and downward PU H PU CU - H - H - H peaked and upward Row 3 1 X 1 H LH H HH - H - H - H h o r i z o n t a l State V IZ Z Z Z Z Z Z c c IZ Z - Z - Z - Z Z - Z - Z zero Store V Z c F i g u r e 9A Fi g u r e 9B F i g . 9 H o r i z o n t a l f e a t u r e i n t e g r a t i o n f o r the l e t t e r 'R'. 2.4.9 H o r i z o n t a l Compression Once the i n t e g r a t i o n process was completed, h o r i z o n t a l compression was a p p l i e d to the cha r a c t e r . The i n t e g r a t e d f e a t u r e s allowed f o r a compact r e p r e s e n t a t i o n of the l e t t e r . A f e a t u r e m a t r i x r e p r e s e n t a t i o n was obtained i n the f o l l o w i n g manner. The columns were examined i n p a i r s , s t a r t i n g on the r i g h t s i d e and ending a t column one. Adjacent h o r i z o n t a l quadrants i n each p a i r of columns were checked f o r c o n t i n u i t i e s . I f there e x i s t e d a t l e a s t one c o n t i n u i t y i n each p a i r of h o r i z o n t a l quadrants then the two columns were compressed i n t o one. This process continued u n t i l two adjacent i n t e g r a t e d f e a t u r e s were encountered. At t h i s p o i n t the compressed column, which contained a l l 17 the f e a t u r e s to the r i g h t of i t , was saved. Column compression was then r e i n i t i a l i z e d w i t h the next column. R e f e r r i n g to F i g u r e 10, columns eleven and ten and n i n e through twc were reduced to only one column each. 1 2 3 4 5 6 7 8 9 10 11 A B C 1 < < < < < < < — < X 1 < < < < < < < < X 1 < < < < < < < X < 1 — X 1 — X 1 .c . X F i g . 10. H o r i z o n t a l compression f o r the l e t t e r 'R' 2.4.10 An example w i t h Curvature To demonstrate the f u l l c a p a b i l i t i e s of the i n t e g r a t i o n process, e s p e c i a l l y i n the i d e n t i f i c a t i o n of c u r v a t u r e s , the l e t t e r *s' was s e l e c t e d . In Figure 11, the l e t t e r has been reduced to i t s three l e v e l form. Each row was i n t e g r a t e d w i t h the s t a t e t r a n s i t i o n t a b l e and then h o r i z o n t a l l y compressed. The cap (r\) and cup (W) d e s c r i p t i o n s were w e l l s u i t e d to represent the curvatures along the h o r i z o n t a l l e v e l s . This example a l s o i l l u s t r a t e s the g e n e r a l i t y of the c l o s u r e symbol (X)« A c l o s u r e can represent e i t h e r divergence or convergence. Experimentally i t was found t h a t the d i s t i n c t i o n between a convergence and a divergence was not c r i t i c a l to the l e t t e r ' s i d e n t i f i c a t i o n . The assignment of an a l l i n c l u s i v e c l o s u r e (X) was a convenient s i m p l i f i c a t i o n . 18 Next B a s i c Feature D H U Z V C IZ ID IH IU Z V C i n i t i a l zero 1 2 3 A 5 6 7 8 9 10 IU H HH U IZ V c i n i t i a l up 1 X / / — — — \ \ 1 ID D LH H IZ V c i n i t i a l down X — \ — \ \ \ X X IH LH H HH IZ V c i n i t i a l h o r i z o n t a l 1 1 \ \ — / / / X X a o •ri V - V - V - V - V V c v e r t i c a l ra t C -c - C - C - C C c c losure Row 1 1 X / / — — — \ \ 1 te g LH D LH PU - H - H - H low h o r i z o n t a l S ta te V C IU U u u U PD CA V e M HH PD HH U - H - H - H h igh h o r i z o n t a l Store C CA V m o D D D CU -D - D -D down at e U CA U U - U - U - U e up Row 2 X — \ — \ \ \ X X 4-1 CO CA CA CA U - C A -CA -CA cap State IZ c I l l LH LH D D D C c 4 J a QJ CU D CU CU -CU -CU -CU cup Store c D c re s<  PD CA PD H - H - H - H peaked and downward PU H PU CU - H - H - H peaked and upward Row 3 1 1 \ l \ -1/1/ / X X H LH H HH - H - H - H h o r i z o n t a l S ta te v V ID » D PU CU CU c c Z - Z - Z - Z Z - Z - Z zero Store^ V CU c < X < < < < < < 1 X < < < < < \ < X < 1 < < < < < <J < X A B C X r \ 1 X \ X 1 <j X F i g . 11 Feature i n t e g r a t i o n a p p l i e d to a ch a r a c t e r w i t h curvature w i t h the l e t t e r *s' as an example. 19 2.5 Feature E x t r a c t i o n , P a r t 2 2.5.1 Characters w i t h Diagonals The l e t t e r s 'A, k, K, v, V, w, W, x, X, y, Y,' are comprised b a s i c a l l y of diagonals and r e q u i r e a separate approach f o r t h e i r i d e n t i f i c a t i o n . An a l t e r n a t e method of f e a t u r e e x t r a c t i o n i s used. I n F i g u r e 12, the m a j o r i t y of the s t r i n g s which form the diagonals are long enough t o s a t i s f y L, where T<L<3T and are i d e n t i f i e d as minor v e r t i c a l s ; thus, the d i r e c t i o n a l q u a l i t y of the diagonals i s l o s t . The most d e s c r i p t i v e f e a t u r e s i n these characters are t h e i r continuous edge t r a n s i t i o n s . By l o o k i n g a t the s i d e of each c h a r a c t e r from the d i r e c t i o n s i n d i c a t e d by the arrows i n F i g u r e 12, the unobstructed ( s e r i f l e s s ) and unique d i a g o n a l f e a t u r e s a r e c l e a r l y i d e n t i f i e d . Using the programme i n Appendix E, each of the d i a g o n a l characters was viewed along i t s unique s i d e ( s ) : the top s i d e f o r 'A'; the bottom s i d e f o r 'v, V, w, W'; and, both the l e f t and r i g h t s i d e s f o r 'k, K, x, X, y, Y'. The s p e c i f i c a t i o n s o f a di a g o n a l l e n g t h (number of increments) and the up-down sequence f o r a p a r t i c u l a r s i d e were used f o r t h i s subgroup's i d e n t i f i c a t i o n . The l o c a t i o n of t h i s p a r t i c u l a r r o u t i n e i n the o v e r a l l system i s shown i n S e c t i o n 2.7. _ >f _ o o o o o O O O Q O O C _ £ 5 9 o o o o o o o o o O o o o o o o o o o o o X x 2 o o o o o o o o o o o o o o o o o _ g o o o o o o o o o o o o O O O o O O O O Q O o o o ° g o o o S o O O O O X n o o o o o o o o o o o o o o o 2 ° 2 o o o o o o o o o o o o o o o o o o o _ o o o o 5 o o o o g o g o o o o o o o o O o o o o o o o =1 11 8 8° A g § § n gg gg 8 8 • V g8§ V o o o o o fie o o 7 o o o o o o o o o o o o o o S o o o o o o 0 0 0 0 0 0 0 0 0 0 p o w o o o o o o o S o o o o o o o o o o o o o 9 o o o o o o o g o o o o o o o o o S . o o o o o o o o o o o o o o o o o o o o o o o o ' o o _ 2 2 -a o o o o o o o o o o o o o o o o o o o o o o o S o o o o o o o o o o o o o o o o o o o o o o o o F i g . 12 Re c o g n i t i o n of di a g o n a l c h a r a c t e r s by s e l e c t i v e edge examination. 20 2.5.2 S p e c i a l Cases The l e t t e r s 'g, i , j , m' d i d not r e q u i r e the e n t i r e f e a t u r e e x t r a c t i o n process f o r t h e i r i d e n t i f i c a t i o n . I n these s p e c i a l cases the i d e n t i f i c a t i o n was made once a unique measurement was s a t i s f i e d . The d i s t i n c t i v e c h a r a c t e r i s t i c of the l e t t e r s ' i ' and 'j.' i s the f a c t that they are dotted. To detect the dot a l l the columns were i n c l u s i v e l y OR'ed"-together. An ' i ' or ' j ' was assumed i f a break e x i s t e d i n the upper h a l f of the OR'ed ch a r a c t e r . The l e n g t h of the s e c t i o n below the dot was used to d i s t i n g u i s h between the two; ' i ' and ' j ' w ithout t h e i r dots have the same height as a s m a l l and t a l l c h a r a c t e r , r e s p e c t i v e l y . The l e t t e r ' g 1 i s the only f o u r l e v e l c h a r a c t e r i n the alphabet. The contents of l e v e l s two and three were always examined before v e r t i c a l compression ( S e c t i o n 2.4.7) to three l e v e l s was a p p l i e d . I f both the second and t h i r d rows were f i l l e d w i t h U, D, and H i n f o r m a t i o n then the unknown ch a r a c t e r was i d e n t i f i e d as a 'g'. The l e t t e r 'm' i s unique because i t contains three f u l l h e i g h t v e r t i c a l s . When t h i s c o n d i t i o n was s a t i s f i e d i n the s t r i n g i d e n t i f i c a t i o n r o u t i n e -(Section 2.4.3) an :m : was assigned. These measurements were simple and e f f e c t i v e . The savings r e a l i z e d i n computational time and i n the r e d u c t i o n of e n t r i e s i n the look-up t a b l e made t h i s an a t t r a c t i v e approach. 2.6 C l a s s i f i c a t i o n 2.6.1 I n t r o d u c t i o n I f the cost of t a k i n g f e a t u r e measurements i s h i g h , then s e q u e n t i a l d e c i s i o n procedures ( c l a s s i f i c a t i o n ) should be a p p l i e d . This i s e s p e c i a l l y p e r t i n e n t i n the design of the r e c o g n i t i o n l o g i c f o r a pe r s o n a l r e a d i n g machine. Fu [8] emphasizes t h a t a t r a d e - o f f between the e r r o r and the number of f e a t u r e s to be measured can be obtained by t a k i n g measurements s e q u e n t i a l l y and t e r m i n a t i n g the process when a s u f f i c i e n t l y unique measurement i s s a t i s f i e d . The measurements must be ordered i n such a way that t h e e x t r a c t e d f e a t u r e s w i l l cause the t e r m i n a l d e c i s i o n as e a r l y as p o s s i b l e . These p r i n c i p l e s were i n c o r p o r a t e d i n t h i s c l a s s i f i e r . 21 2.6.2 Code-Word Generation A code-word was assigned to every horizontal level i n the compressed character. This code-word does not contain continuities (<). Furthermore, only a Z bounded on both sides by features other than a continuity or another Z was stored i n the code-word. In Figure 13, the three code-words for the l e t t e r *R* are generated. Since there are seven possible values for a feature, a minimum of three b i t s i s required to specify any one of these features. Thus i n a twelve-bit code-word, up to four features can be packed. Feature Matrix Code-words Feature Code Fig. 13 The three code-words for the le t t e r 'R'. 2.6.3 Subgroup Assignment Rather than compare the code-words to a l l the alphabetic p o s s i b i l i t i e s , a cl a s s i f i c a t i o n technique was implemented which confined the recognizer to search within a limited subgroup of characters. The separation of characters into classes was based on two physical properties of a letter. F i r s t , for a random selection of characters, 'a, B, c, E, G, H, N, 0', the height i s used to separate small letters from t a l l ones: 'a, c' and 'b, E, H, N, 0'. Secondly, the number of f u l l character height verticals, whether zero, one, or two, further subgroups the characters: *a, c' (small letters) and 'G, 0' (no verticals) and 'b, E' (one vertical) and 'H, N' (two v e r t i c a l s ) . In this manner four distinct classes are identified. The alphabet i s subgrouped in Figure 14. The special cases and the two forms 'g* and 'g' are also included. • / \ — - 1 X 0 1 2 3 4 5 6 7 1 — X 1 — X | • X 4 3 5 4 3 5 4 0 5 22 Small L e t t e r s a c e n o r s u z No V e r t i c a l s C G 0 Q S Z One V e r t i c a l -lower case b d f g h 1 P q t -upper case B D E F I J L p R T Two V e r t i c a l s H M N U Diagonal Characters -lower case k V w X y -upper case A K V W X Y S p e c i a l Cases g i j m F i g . 14 Subgrouping the alphabet. 2.6.4 I d e n t i f i c a t i o n Three code-words were assigned to an unknown c h a r a c t e r , one fo r each of the three f e a t u r e l e v e l s . The f i r s t code-word was compared to a l l the s t o r e d code-words f o r t h a t l e v e l , i n i t s a p p r o p r i a t e subgroup. Each code-word was a s s o c i a t e d w i t h a second t w e l v e - b i t word which s t o r e d a l l the p o s s i b l e characters w i t h t h a t code-word i n common. Since there were l e s s than twelve characters i n any upper or lower case subgroup, each b i t of the second t w e l v e - b i t word was a s s o c i a t e d w i t h a s p e c i f i c c h a r a c t e r . The entry of a '1' or '0' f o r a p a r t i c u l a r b i t i n d i c a t e d whether the code-word was common to th a t character o r not. I f the code- word was unique to only one ch a r a c t e r then the i d e n t i f i c a t i o n was made a t t h a t p o i n t . I f the code-word was shared by s e v e r a l c h a r a c t e r s , then these p o s s i b i l i t i e s were recoded and the process was repeated f o r the next f e a t u r e l e v e l code-word. I f no unique code-word matches were found a f t e r searching the three l e v e l s , then the i d e n t i f i c a t i o n was based on the character w i t h the maximum number of code-word matches. Let us assume that *-he unknown c h a r a c t e r i n F i g u r e 15 i s the l e t t e r 'c'. The code-word f o r the f i r s t l e v e l i s common to bot h ' c' and * s'. However, the second l e v e l i n 'c' i s uniaue to i t and so the i d e n t i f i c a t i o n as 'c' i s made. For the example of the l e t t e r 'R', 23 there are no unique code-words and thus the i d e n t i f i c a t i o n i s based on the maximum number of code-word matches. This c h a r a c t e r r e c o g n i t i o n process i s i l l u s t r a t e d i n Appendix B. u 1 1 X 1 X h x X / 1 X X — X X 1 X < X 1 p 1 x r > l x 1 / 1 X n 1 1 1 1 — X 1 1 XI X 1 X \ X 1 1 X X 1 1 X M I X 1 1 <j X 1 1 X — i Subgroup a c e n o r s u z 1st CW Matches 0 1 0 0 0 0 1 0 0 2nd CW Matches. 0 1 0 0 0 0 0 0 0 Unique 3rd CW Matches 0 1 1 0 0 0 0 0 0 F i g . 15 Character i d e n t i f i c a t i o n , the l e t t e r ' c' as an example. 1. . / U V C J L d X X U The e n t i r e system i s represented i n b l o c k diagram form i n Fi g u r e 16. The p a t t e r n r e c o g n i t i o n process has been broken down i n t o i t s v a r i o u s stages. Each stage i s f u r t h e r i d e n t i f i e d as to the r o u t i n e s which are performed w i t h i n i t . The numbered r o u t i n e s i n d i c a t e processes that occur during one major programme. The l e t t e r i n g r e f e r s to d i s c r e t e r o u t i n e s that are c l a s s i f i e d according to t h e i r common f u n c t i o n . Initialization Identification input from photocell ^ scanner output via display and TTY A. l . height and width 2. set a l l parameters 3. store edgea along the four sides B. check for ' i V j 1 1. string identification (CC,V,C) 2. check for 'a' Diagonal Check Feature Extraction 1. 'A'-toD edee 2. *v,V,w,W'-bottom edge 3. 'k.K.x.X.v.Y'-left and right edges A. four level resolution B. basic feature assignment (Z,U,D,H,V,C) Classification A. subgroup assignment B. search through look-up tabic and identification Feature Co isolidation Feature Enhancement A. horizontal feature integration B. horizontal compression into feature matrix C. assignment of three code-words A. l . check for ̂ Q' 2. vertical compression to three levels B. serif deletion F i g . 16 Block diagram of the system. I I I . TESTS AND RESULTS 3.1 D e s c r i p t i o n of the Test M a t e r i a l Three d i f f e r e n t t y p e w r i t e r f o n t s were used i n the t e s t i n g of the system: Hermes Ambassador (HA), IBM Delegate (D), and IBM Gothic (G). These fonts were s e l e c t e d because of t h e i r l a r g e p r i n t which ensured good machine r e s o l u t i o n . About one-half of a l l type- w r i t e r f o n t s are t h i s s i z e . This t h e s i s was typed w i t h the IBM P r e s t i g e - E l i t e f o n t ; i t s s i z e i s t y p i c a l of the s m a l l e r s i z e d f o n t s . The three f o n t s shown i n F i g u r e 17, were s e l e c t e d bacause they cover the range of p r i n t s t y l e s , e x c l u d i n g i t a l i c and s c r i p t , from the most s t y l i z e d HA, through a moderately s t y l i z e d D, to a b o l d and s e r i f l e s s G. The height of the t a l l and s m a l l l e t t e r s i s approximately the same f o r the three f o n t s although, the widths v a r y . The t e s t m a t e r i a l i n c l u d e d a l l upper- and lower-case c h a r a c t e r s . Carbon ri b b o n was used f o r a l l t y p i n g . Hermes Ambassador, pitch 10 A B O - D E - P G H I J K L M N O P Q R S T . U V W X T Z a b c d e f g h i j k l m n o p q r s t u v w x y z I B M D e l e g a t e , p i t c h 1 0 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z IBM Gothic, pi tch 12 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z F i g . 17 The three f o n t s used;in the t e s t s , reproduced i n f u l l s i z e . 25 3.2 Tests and Results w i t h Repeated L e t t e r s The t y p e w r i t t e n m a t e r i a l was presented i n the f o l l o w i n g manner. Each upper and lower case character was typed 10 times i n a row. A f t e r scanning one row, the r e c o g n i t i o n r e s u l t s were p r i n t e d out on the TTY. T e s t i n g was performed f i r s t w i t h the HA look-up t a b l e on a l l three f o n t s . Then the system was t r a i n e d on each of the D and G f o n t s , i n d i v i d u a l look-up t a b l e s were developed, and the t e s t s were repeated on the newly t r a i n e d f o n t s . The l i g h t i n g i n t e n s i t y was adjusted only once at the beginning of each f o n t ' s t e s t f o r maximum r e s o l u t i o n . The f o l l o w i n g r e s u l t s were obtained from these t e s t s : font HA D G u n t r a i n e d — 66.8 68.9 t r a i n e d 88.2 85.2 89.0 Over 65% of the HA look-up t a b l e was general enough to be a p p l i c a b l e to the other f o n t s without t r a i n i n g . A l l three f o n t s achieved s i m i l a r a c c u r a c i e s . Since the t r a i n i n g procedure was simple and accomplished q u i c k l y , the system demonstrated good m u l t i f o n t a d a p t a b i l i t y . Confusion m a t r i c e s have been t a b u l a t e d f o r the t e s t r e s u l t s and are i n c l u d e d i n Appendix A. 3.3. Tests and R e s u l t s w i t h Typewritten Passages This t e s t was performed to determine the r e c o g n i t i o n c a p a b i l i t y of the system on m u l t i c h a r a c t e r t e x t under more r e a l i s t i c c o n d i t i o n s than i n S e c t i o n 3.2. A passage from a magazine a r t i c l e was typed, e x c l u s i v e of punctuation, w i t h the three f o n t s . The t e s t s were performed w i t h the t r a i n e d look-up t a b l e s f o r each f o n t . The system was not t r a i n e d beforehand on t h i s passage. The r e s o l u t i o n ( i l l u m i n a t i o n ) of the scanner was adjusted only once at the beginning of each f o n t ' s t e x t . Character i d e n t i f i c a t i o n s were p r i n t e d on the TTY at the end of each l i n e . The r e c o g n i t i o n r e s u l t s f o r a t y p e w r i t t e n passage which contained 293 c h a r a c t e r s , are g i v e n below : fo n t HA D G r e s u l t s 89.9% 89.1% 87.7% The passages as recognized by the system are i n c l u d e d i n Appendix A. In t h i s t e s t m a t e r i a l c h a r a c t e r s r a r e l y touched and no segmentation scheme [5] was implemented. 3.4 Sources of E r r o r In t h i s s e c t i o n the major c o n t r i b u t o r s to the e r r o r a r e i d e n t i f i e d . T y p i c a l e r r o r s are I l l u s t r a t e d i n Appendix D. From the confusion m a t r i c i e s which were t a b u l a t e d f o r HA (Appendix A ) , the e r r o r s have been c a t e g o r i z e d as f o l l o w s : m i s i n t e r p r e t a t i o n of a c l o s u r e 25.4% m i s c l a s s i f i e d f e a t u r e s ( b l o t t e d o r broken cha r a c t e r s ) 23.8% m i s s i n g a d i a g o n a l character 15.9% i n d i s c r i m i n a t e s e r i f d e l e t i o n 12.7% confusion among s i m i l a r c h a r a c t e r s 11.1% improper subgroup assignment 11.1% A s i g n i f i c a n t source of e r r o r was due to m i s i n t e r p r e t a t i o n of a c l o s u r e as a minor v e r t i c a l and v i c e v e r s a . A c l o s u r e r e q u i r e d c e r t a i n c o n s t r a i n t s ( S e c t i o n 2.4.2) t o be s a t i s f i e d , but these measurements proved to be incomplete and a r e f o r m u l a t i o n o f the c l o s u r e c r i t e r i a i s suggested ( S e c t i o n 4.2). 27 A broken; s t r i n g d e s c r i b i n g a minor v e r t i c a l or a c l o s u r e r e s u l t e d i n an e r r o r . A break i n a f u l l v e r t i c a l c o u l d be checked f o r and co r r e c t e d by counting the number of b l a c k c e l l s i n a column and, i f the t a l l y exceeded the height requirement f o r a f u l l v e r t i c a l , then one was assigned. However, w i t h minor v e r t i c a l s and c l o s u r e s no such c r i t e r i o n e x i s t e d to dete c t a break and thus an e r r o r r e s u l t s . B l o t t e d f e a t u r e s i n a charac t e r a l s o r e s u l t e d i n e r r o r . For example, a b l o t t e d area can change a short s t r i n g i n t o a longer one, thus wrongly changing i t s i d e n t i t y from a CC to a V. The performance of the r o u t i n e t h a t checked f o r di a g o n a l c h a r a c t e r s was troublesome. The problem was th a t the programming was i n e f f i c i e n t to d e a l w i t h a l l the v a r i a t i o n s encountered i n f o l l o w i n g the incremental changes along an edge. C e r t a i n groups o f ch a r a c t e r s were d i f f i c u l t to d i s t i n g u i s h . For example, the l e t t e r s 'V' and 'Y' were d i f f i c u l t to separate. These m i s c l a s s i f i c a t i o n e r r o r s were confined to one or two s i m i l a r characters and u s u a l l y occurred w i t h i n the same subgroup. Thus m i s c l a s s i f i c a t i o n was confined to p o s s i b l y one of three c h a r a c t e r s r a t h e r than one i n f i f t y - t w o . The s e r i f d e l e t i o n r o u t i n e , besides reducing the v a r i a b i l i t y of the i n f o r m a t i o n , was sometimes i n d i s c r i m i n a n t l y a p p l i e d . For example, when s e r i f s are e l i m i n a t e d from the l e t t e r s '1' and ' I ' i d e n t i c a l r e s u l t s are obtained. Another type of e r r o r was improper subgroup assignment. I n the case of such l e t t e r s as ' J ' and 'U', the h e i g h t c r i t e r i o n f o r a f u l l v e r t i c a l was o c c a s i o n a l l y missed because o f the rounding on the v e r t i c a l s ' bottoms, thus c o n f i n i n g the code-word search to the wrong subgroup. 28 3.5 T r a i n i n g Procedure A simple t r a i n i n g procedure was used. A row of ten i d e n t i c a l c haracters was scanned. The operator then observed the computer d i s p l a y s f o r each c h a r a c t e r , which were s i m i l a r to those i n Appendix B, and noted the most common f e a t u r e m a t r i x r e p r e s e n t a t i o n s . The new look-up t a b l e f o r e i t h e r D o r G was developed from the HA t a b l e by simple a l t e r a t i o n of the e x i s t i n g code-words. I n most cases not a l l three code-words f o r a c h a r a c t e r r e q u i r e d m o d i f i c a t i o n . Rather, one or two code-words were i n need of m o d i f i c a t i o n and, u s u a l l y only p a r t of the code-word. For example, w i t h the c h a r a c t e r s 'C (HA) and 'C (G), only the code-word f o r the top l e v e l needed to be changed from c l o s u r e - c a p - v e r t i c a l ( f e a t u r e code 564) to closure-cap ( f e a t u r e code 56) The code-words f o r the two remaining f e a t u r e l e v e l s i n t h i s l e t t e r were i d e n t i c a l . The system r e q u i r e d approximately 4 hours to be t r a i n e d to a new f o n t . 3.6 Speed Ihe speed of c h a r a c t e r r e c o g n i t i o n was i n the range of 376 ms. to 668 ms. This r e s e a r c h was e x p l o r a t o r y and, as such, no attempt was made to o p t i m i z e the programming. The programme was w r i t t e n i n such a way t h a t each step of p r o c e s s i n g could be r e a d i l y f o l l o w e d ; as w e l l , the ease of m o d i f i c a t i o n was important. Thus only one o p e r a t i o n was performed on the data at a time and, as a r e s u l t , much d u p l i c a t i o n of e f f o r t e x i s t s . The speed of the present system can be i n c r e a s e d by approximately 150 ms. using more e f f i c i e n t programming techniques. For a p r a c t i c a l system, the separate r o u t i n e s can be i n c o r p o r a t e d together. For i n s t a n c e , s t r i n g i d e n t i f i c a t i o n need occur only one column ahead of b a s i c f e a t u r e assignment, which i n t u r n should be only one column ahead of the f e a t u r e i n t e g r a t i o n process. This type of system c o n f i g u r a t i o n should be a b l e to achieve a reading r a t e of 10 char/sec. 29 3.7 System Storage Requirements This OCR system was a l l o c a t e d 8 K of memory on the PDP-12 computer. The r e c o g n i t i o n programme i t s e l f r e q u i r e d only 4K of memory. The second 4K was occupied by the I/O and d i s p l a y r o u t i n e s as w e l l as the storage of a l l the processed i n f o r m a t i o n . The programming was w r i t t e n i n PDP-8 language and t h e r e f o r e the system can be implemented on a PDP-8 minicomputer. Approximately 100 code-word e n t r i e s were necessary f o r the look-up t a b l e of a p a r t i c u l a r f o n t . About 40% of these code-words were unique e n t r i e s ; or, 60% of the code-words were shared by -two or more c h a r a c t e r s . 3.8 System F l e x i b i l i t y The a b i l i t y of t h i s system to accomodate poor image q u a l i t y , d i s t o r t i o n , and complex shapes was h i g h l y s a t i s f a c t o r y . The system's f l e x i b i l i t y i s i l l u s t r a t e d i n Appendix D. A break i n a c h a r a c t e r o r a b l o t t e d p r i n t i n g could be accomodated. I f damage was l o c a l i z e d , then onlv one code-word was a f f e c t e d and, the other two cou l d s t i l l be used f o r i d e n t i f i c a t i o n . E l o n g a t i o n and compression of a c h a r a c t e r was t o l e r a t e d by the system. This type of rubber-sheet d i s t o r t i o n of a plane i s known as a t o p o l o g i c a l mapping [7] and e s t a b l i s h e s the s t r o n g l y t o p o l o g i c a l ; nature of t h i s scheme. The system's a b i l i t y to handle c a r e f u l l y handprinted characters was noted. The scheme was a l s o a b l e to r e s o l v e complex shapes i n t o more s i m p l i f i e d s t r u c t u r e s . 30 IV. CONCLUSION 4.1 Summary This t h e s i s has developed an OCR system w i t h a p o t e n t i a l a p p l i c a t i o n to a p e r s o n a l reading machine f o r the b l i n d . A t o p o l o g i c a l f e a t u r e e x t r a c t i o n technique based on l i n e and shape d e s c r i p t o r s was implemented. A v e r t i c a l p h o t o c e l l a r r a y , scanning at 60 wpm, was used to o b t a i n a b i n a r y image. A continuous s t r i n g o f b l a c k c e l l s was c h a r a c t e r i z e d by one of three p o s s i b i l i t i e s : a s i n g l e p o i n t to represent the centre c e l l of a short s t r i n g ; a v e r t i c a l o f a p p r o p r i a t e l e n g t h to c h a r a c t e r i z e a long s t r i n g ; and, an area of c l o s u r e to d e f i n e the convergence or divergence of two branches w i t h i n a l e t t e r . A f o u r l e v e l r e s o l u t i o n of column data was adopted and each column was compared to i t s next neighbour to generate continuous f e a t u r e s across the columns. Feature enhancement, such as s e r i f d e l e t i o n and v e r t i c a l compression to three l e v e l s , was implemented. A t r a n s i t i o n t a b l e i n t e g r a t e d h o r i z o n t a l f e a t u r e s along the three f e a t u r e l e v e l s . A f t e r h o r i z o n t a l compression, a c h a r a c t e r ' s shape was d e s c r i b e d by any combination of seven b a s i c f e a t u r e s , u s u a l l y w i t h i n a 3 x 3 f e a t u r e m a t r i x . Each of the three h o r i z o n t a l l e v e l s was assigned a code-word. The c l a s s i f i e r confined the unknown c h a r a c t e r i n t o a subgroup of the alphabet; t h i s was based on the c h a r a c t e r ' s h e i g h t and the number of f u l l l e n g t h v e r t i c a l s which i t contained. I d e n t i f i c a t i o n was made when a unique code-word i n the l e t t e r was encountered; o r , i f the code- words were a l l common w i t h i n the subgroup, then the r e c o g n i t i o n was on the b a s i s of the c h a r a c t e r w i t h the maximum number of code-word matches. A s p e c i a l technique was developed to i d e n t i f y c h a r a c t e r s w i t h prominent d i a g o n a l f e a t u r e s ; the t r a n s i t i o n of an edge along a s p e c i f i c s i d e of a character was used f o r t h i s subgroup's r e c o g n i t i o n . S e q u e n t i a l c l a s s i f i c a t i o n was employed throughout the system so t h a t i d e n t i f i c a t i o n was made once a s u f f i c i e n t l y unique measure was s a t i s f i e d . 31 The system was t e s t e d on upper- and lower-case cha r a c t e r s of three d i f f e r e n t t y p e w r i t e r f o n t s which v a r i e d from a s t y l i z e d to a s e r i f l e s s p r i n t . R e s u l t s i n d i c a t e d t h a t the l e v e l of r e c o g n i t i o n achieved was approximately 87% accuracy f o r a r e c o g n i t i o n speed of two characters per second. The t r a i n i n g of the system to a new f o n t was a simple procedure t h a t r e q u i r e d about four hours of e f f o r t . Approximately 100 code-word e n t r i e s were s t o r e d i n the look-up t a b l e . An OCR method has been proposed which uses shape d e s c r i p t o r s which i n a sense p a r a l l e l d e s c r i p t i o n s t h a t a man would use. Transformations which w i l l leave the l e t t e r ' s i d e n t i t y u n a l t e r e d to man should l i k e w i s e be used by the machine. The p i l o t s t u d i e s i n d i c a t e s e v e r a l advantages of t h i s approach. Rubber-sheet changes of s c a l e produce i n v a r i a b l e r e s u l t s . Breaks i n cha r a c t e r s produced l o c a l i z e d u n c e r t a i n t y which i n most Instances allowed the character to be c o r r e c t l y i d e n t i f i e d . The presence or absence of s t y l i z a t i o n , such as s e r i f s , i n the f o n t seem to have l i t t l e e f f e c t on i d e n t i f i c a t i o n . A few c a r e f u l l y drawn, hand- p r i n t e d characters were t r i e d w i t h the OCR and were c o r r e c t l y i d e n t i f i e d . 4.2 Recommendations f o r Further Research The d e t e c t i o n of a c l o s u r e as s t a t e d i n S e c t i o n 2.4.2 i s weak. R e f e r r i n g to Figure 18, a more r i g o r o u s r e f o r m u l a t i o n of the c l o s u r e c r i t e r i a i s now presented: 1. the column (B) a t which the c l o s u r e occurs must have two s t r i n g s i n the adjacent column (C), t h a t are l o c a t e d at i t s e x t r e m i t i e s ; 2. to be c e r t a i n of the i d e n t i t y of the two s t r i n g s as a c t u a l l y two branches t h a t converge to form the c l o s u r e , the next column (D) over from these two s t r i n g s must be checked f o r t h e i r c o n t i n u a t i o n ; 3. the column (A) preceeding the c l o s u r e should be s m a l l e r than the c l o s u r e s t r i n g to i n d i c a t e expansion towards the c l o s u r e . Such a new c r i t e r i a can c o r r e c t the c l o s u r e e r r o r s i n the l e t t e r s 'e, C, U' i n Appendix C. The c l a s s i f i e r i s underdeveloped. For example i n Appendix C the f e a t u r e matrices f o r both 'P' and '0' are good c h a r a c t e r d e s c r i p t i o n s d e s p i t e the m i s c l a s s i f i c a t i o n s . The e r r o r s were confined to only one column; however, i n the h o r i z o n t a l d i r e c t i o n s e v e r a l rows were a f f e c t e d . Thus more than one code-word was a l t e r e d by the same e r r o r . We suggest that three code-words f o r the v e r t i c a l d i r e c t i o n i n a d d i t i o n to the three code-words f o r the h o r i z o n t a l d i r e c t i o n w i l l produce a more powerful c l a s s i f i e r . This w i l l double the s i z e of the present look-up t a b l e to about 200 e n t r i e s , but t h i s i s s t i l l a modest s i z e . An improvement to the system i s a n t i c i p a t e d because the s i x code-words f o r each c h a r a c t e r w i l l r e s u l t i n a b e t t e r d i s t i n c t i o n among c h a r a c t e r s , double the p o s s i b i l i t i e s f o r unique code-words, and i n t r o d u c e a h i g h degree of e r r o r t o l e r a n c e . Such a c l a s s i f i e r w i l l be a b l e to recognize the l e t t e r s 'P' and '0' i n Appendix C from t h e i r v e r t i c a l code-words. From the d i s c u s s i o n on e r r o r s ( S e c t i o n 3.7), i t was noted that there e x i s t s m a l l s e t s of c h a r a c t e r s which the b a s i c system f i n d s some d i f f i c u l t y i n s e p a r a t i n g . An advanced system w i l l have to i n c l u d e a s m a l l number of s o r t i n g r o u t i n e s whenever these l a t e n t a m b i g u i t i e s are encountered. The r o u t i n e t h a t checks f o r d i a g o n a l c h a r a c t e r s has to be improved. S p e c i f i c a l l y , t h e . a b i l i t y t o accomodate l o c a l i z e d d i s t o r t i o n s when f o l l o w i n g an edge must be i n c o r p o r a t e d i n t o the programming. D A B C D B , 0 0 loo 'oo oo oo oo oo ooo ooo ooo oo oo oo oo oo ooo oo ooo oooooo ooo ooo oo ooo ooo ooo ooo oo oo oo ooo ooooooooo oooooo m i o o o I t A-B a > b e < f B-C b > c+1 b < i - 1 f > k+1 f < g-1 C-D c > d+1 g < h-1 i 7* m F i g . 18 Suggested c l o s u r e c r i t e r i a to r e p l a c e the c r i t e r i a i n Figure 3. 33 REFERENCES [I] S.K. A b d a l i , "Feature E x t r a c t i o n Algorithms 1*, P a t t e r n R e c o g n i t i o n , 3, pp. 3-23, A p r i l , 1971. [2] H.C. Andrews, Mathematical Techniques i n P a t t e r n R e c o g n i t i o n , New York: Wiley, 1972. [3] M.P. Beddoes, "An Inexpensive Reading Instrument f o r the B l i n d " , IEEE Trans. Bio-Med. Eng., V o l . BME-15, pp. 70-79, A p r i l , 1968. [4] J.C. B l i s s , "A R e l a t i v e l y H i gh-Resolution Reading A i d f o r the B l i n d " , IEEE Trans. Man-Mach. Syst., V o l . MMS-10, pp. 1-9, March, 1969. [5] Clayden e t a l . , " L e t t e r R e c o g n i t i o n and the Segmentation of Running Text", Information and C o n t r o l , 9, pp. 246-264, 1966. [6] J.K. Clemens, " O p t i c a l Character R e c o g n i t i o n f o r Reading Machine Ap- p l i c a t i o n s " , Ph.D. t h e s i s , M.I.T., September, 1965. [7] R.O. Duda and P.E. H a r t , P a t t e r n R e c o g n i t i o n and Scene A n a l y s i s , pp. 327-378. New York: Wiley, 1973. [8] M. Eden, "The A p p l i c a t i o n of Character R e c o g n i t i o n Techniques to the Development of reading machines f o r the B l i n d " , Image P r o c e s s i n g i n B i o l o g i c a l Science, ed. D.M. Ramsey, U.C.L.A. P r e s s , pp. 35-56, 1968. [9] M. Eden and M. H a l l e , " C h a r a c t e r i z a t i o n of C u r s i v e Handwriting", Proc. 4th London Symp. Inform. Theory , C. Chery ed., London: Butterworths, I96T7 ~ [10] K.S. Fu, S e q u e n t i a l Methods i n P a t t e r n "Recognition and Machine Learn- i n g . New York: Academic P r e s s , 1968. [ I I ] K. Fukunaga, I n t r o d u c t i o n t o S t a t i s t i c a l P a t t e r n R e c o g n i t i o n . New York: Academic P r e s s , 1972. [12] H. Genchi et a l . , "Recognition of Handwritten Numerical Characters f o r Automatic L e t t e r S o r t i n g " , Proc. IEEE, V o l . 56, no. 8, pp. 1292- 1301, August, 1968. [13] F. Lee, "A Reading Machine: From Text to Speech", IEEE Trans. Audio and E l e c t r o a c o u s t i c s , v o l . 17, no. 4, pp. 275-282, December, 1969. [14] M.D. Levine, "Feature E x t r a c t i o n : A Survey", Proc. IEEE, v o l . 57, no. 8, pp. 1391-1407, August, 1969. [15] H. Marko, "A B i o l o g i c a l Approach to P a t t e r n R e c o g n i t i o n " , IEEE Trans. Systems, Man, and C y b e r n e t i c s , v o l . SMC-4, no. 1, pp. 34-39, January, 1974. 34 [16] S.J. Mason and J.Ki Clemens, "Character r e c o g n i t i o n i n an E x p e r i - mental Reading Machine f o r the B l i n d " , Recognizing P a t t e r n s , eds. P.A. K o l e r s and M. Eden, M.I.T. P r e s s . Cambridge, Mass., pp. 156- 167, May,1968. [17] G. Nagy, "State of the A r t i n P a t t e r n R e c o g n i t i o n " , Proc. IEEE, v o l . 56, pp. 836-862, May, 1968. [18] P.W. Nye and J.C. B l i s s , "Sensory A i d s f o r the B l i n d : A C h a l l e n g i n g Problem w i t h Lessons f o r the Future", Proc. IEEE, v o l . 58, no. 12, pp. 1878-1898, December, 1970. [19] A. Rosenfeld, "Figure E x t r a c t i o n " , Automatic I n t e r p r e t a t i o n and C l a s s i f i c a t i o n of Images,' ed. A. G r o s s e l l i , Academic P r e s s , New York, pp. 137-153, 1969. [20] G.C. Smith and H.A. Mauch, "Summary Report on the Development of a Reading Machine f o r the B l i n d " , B u l l . P r o s t h e t i c s Res., v o l . BPR 10-12, pp. 243-271, F a l l , 1969. [21] J.T. Tou, "Figure E x t r a c t i o n i n P a t t e r n R e c o g n i t i o n " , P a t t e r n Rec- o g n i t i o n , v o l . 1, pp. 3-11, J u l y , 1968. [22] J.T. Tou and R.C. Gonzalez, " R e c o g n i t i o n of Handwritten Characters by T o p o l o g i c a l Feature E x t r a c t i o n and M u l t i l e v e l C a t e g o r i z a t i o n " . IEEE Trans. Computers, v o l . C-21, pp. 776-785, J u l y ; 1972. [23] G. T o u r l a k i s and J . Mylopoulos, "Some R e s u l t s i n Computational Topology", J . ACM 30, 3, pp. 439-455, J u l y , 1973. [24] E.E. T r i e n d l , " S k e l e t o n i z a t i o n of Noisy Handdrawn Symbols Using P a r a l l e l Operations", P a t t e r n R e c o g n i t i o n , v o l . 2, pp. 215-226, 1970. 35 APPENDIX A Confusion Matrices The f i r s t three pages c o n t a i n the confusion matrices t h a t were tabul a t e d from t e s t s w i t h repeated c h a r a c t e r s under the c o n d i t i o n s of constant i l l u m i n a t i o n and t r a i n e d look-up t a b l e s f o r each f o n t . The top f i g u r e i s the lower case r e s u l t s and the bottom i s f o r the upper case. The question mark ( ? ) , which i s one of the p o s s i b i l i t i e s of i d e n t i f i c a - t i o n , i n d i c a t e s that no code-word matches were found f o r t h i s c h a r a c t e r . An 'X' represents complete r e c o g n i t i o n (10/10). A s l a s h (/) through a number i n d i c a t e s t h a t , i f t h i s i s the lower-case m a t r i x , then the confu- s i o n i s w i t h the upper-case c h a r a c t e r , and v i c e v e r s a . The r e c o g n i t i o n r e s u l t s f o r the t y p e w r i t t e n passages are i n - cluded on the f o u r t h page of t h i s appendix. An a s t e r i s k (*) i s l o c a t e d above each e r r o r . I d e n t i f i c a t i o n a b c d e f g h i j k 1 m n o. P q r s t u V y Z ? a X b 9 I c X 1 d X e X c X X g X h 2 8 i 1 XI J X k 2 7 1 a 2 8 El 9 1 n 1 7' ? o X P lx <J i 8 1 r X 6 1 la 1 t X U X V X w 1 I R X 3 7 y 1 9 2 1 8 1 • I d e n t i f i c a t i o n A B C D T? U F G H I J K L M N 0 P Q R S T u V w X T z *\ >j B X C 8 2 D 9 1 E X F 2 7 1 G 8 2 H 8 2 I 2 7 1 J 1 9 K 1 9 L 2 8 M 7 1 2 N K 0 X P 8 Q 1 7 1 1 R 1 1 8 S 1 X T 3 7 U 7 1 ?, V X W X X 8 ? Tf 1 9 Z 1 7 ? Confusion matrices f o r Hermes Ambassador f o n t , on which the system was developed. I d e n t i f i c a t i o n a b c d e f B h i j k 1 T.1 n o P q r s t u V w X y z ? a 7 2 1 b 9 1 c 9 1 d 1 9 e 1 8 1 f X g 8 h 1 9 i X j k 1 9 1 1 3 7 tn 3 1 1 n 9 1 o X P 1 9 q X r X s 7 2 1 t 1 1 8 u X V X w X 1 9 y 6 1 z L 1 B • I d e n t i f i c a t i o n A B c D E F G K I J K L M N 0 P Q R S T U V W X Y Z ? • 1 i i i • i i " I • i B 8 ? C X D X E 7 2 1 F 8 2 G J, 9 H I 9 I 4 J t 8 1 K 9 1 L t 6 I 1 M 9 1 N X 0 2 P 1 9 Q ? 1 R 3 7 S j 9 T 7 TJ 9 1 V X W X X 9 1 Y 6 4 Z 7 Confusion m a t r i c e s f o r IBM Delegate f o n t ( t r a i n e d ) . I d e n t i f i c a t i o n a b e d e f g h i j k l m n o p q r s t u v w x y z ? a 8 1 1 1 b X 1 c 9 1 1 d 8 1 2 e xi f X — — g X h X i ix J X k 2 8 1 be ra 8 2 n 9 1 o 2 8 P 9 t q 1 9 r 2 7 1 s 5 1 t X u X V X w X X X y X z X I d e n t i f i c a t i o n A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ? 9 1 8 1 1 1 8 1 1 9 be 1 9 2 8 8 ? X X 7 2 1 9 1 1 1 8 X 91 1 4 6 1 9 % 8 8 2 X X 9 1 7 3 X 9 8- to V H Confusion m a t r i c e s f o r IBM Gothic f o n t ( t r a i n e d ) . A f t e r our documentary The F i f t h E s t a t e was a i r e d by the CIO Kevin O N e i l l who was revealed as d i r e c t o r of Canadas l a r g e s t i n t e l l i g e n c e agency the Communications Branch of the N a t i c i i a l Research C o u n c i l CBNRC was asked by a newspaper r e p o r t e r vho i t was that he r e p o r t s t o O N e i l l t o l d him that he had speri; most of the morning t r y i n g to determine j u s t t h a t A f t e r our documentary The F i f t h E s t a t e was a i r e d by the CIC Kevin O N e i l l who was rev e a l e d as d i r e c t o r of Canadas l a r g e s t i n t e l l i g e n c e agency the Communications Branch o f the N a t i c n a l Research C o u n c i l CBNRC was asked by a newspaper r e p o r t e r vho i t was that he r e p o r t s to O N e i l l t o l d him t h a t he had spert most of the morning t r y i n g to determine j u s t t h a t A f t e r o u r d o c u m e n t a r y The F i f t h E s t a t e was a i r e d by t h e CBC K e v i n O N e i l l who was r e v e a l e d as d i r e c t o r o f Ca n a d a s l a r g e s t i n t e l l i g e n c e a g e n c y t h e C o m m u n i c a t i o n s B r a n c h o f t h e N a t l o r a l R e s e a r c h C o u n c i l CBNRC was a s k e d by a n e w s p a p e r r e p o r t e r wfo i t was t h a t he r e p o r t s t o O N e i l l t o l d him t h a t he had s p e n t most o f t h e m o r n i n g t r y i n g t o d e t e r m i n e j u s t t h a t R e a d i n g r e s u l t s f r o m a c o n t i n u o u s p a s s a g e . Top and IBM G o t h i c f o n t s . * * * « Af te r our documeatary The F i f t b Estute was a iced by the CBC * ft* * * * * Keviu O N e i l l who wnu revealed au J i r e c t o r of Oanadas l acges t * ft* ft * i u t e l l i g e n c e uKency the Comraunicationc Bracch of t he .Na t iona l ft ft ft ft ft ft Feuearch Counci l OBMC was usbed by a newspaper repor ter t-jho ft* ft ft i t wuu thut he reportu to O N e i l l t o ld him that he had spent ft ft * mout of the morning t r y i u g tu determine j u s t that ft ft ft ft ft ft Af te r our documentar? Lhe L i f t h Estate wax a i r e d bV tbe CBC * * * ft * * ft Kavin O N e i l l who w?s reve? leJ es d i r ec toc of Canadax l a rges t ft ft ft ft ft ft i n t e l l i ? e n c e a?enc? the Communications Br?nch of the N a t l o ? a l * * * * Research Counc i l CBNKC was axked b? a newspaper -reporter who ft ft ft i t was thc t he reports to O N e i l l t o ld aim that he had xpent ft* ft * * ft m o s t of tAe morninq t r ? i n g to betermine j u s t that ft ft ft ft * Af te r our decunentary The F i f t h Estate was aiwed ?y the CBF ft ft ft * Kevlw O N e i l l who was revealed av d i r ec to r o f Cawahas l a r g e s t ft ft * * ** * i a t e l l i g e a c e egency the Comnunicatioax Braach of the N a t i o n a l * ft ft ft ft ftft ft ft * ft Pexearch Couns i l CBNPC ras axbed by a newspepar repor te r w?o * * * f t * i t wes thab he reporfs bo O N e i l l bold him that he had spent ft ft * ft movt of the norning t ry jng to determire j u s t that b o t t o m : Hermes A m b a s s a d o r , IBM D e l e g a t e , 40 APPENDIX B Examples of the Character 'Recognition Process The numbered statements r e f e r to areas w i t h i n each p i c t u r e , as i l l u s t r a t e d i n the f i g u r e at the bottom of the page. 1. i n p u t to the system from the p h o t o c e l l scanner. 2. the p o i n t s represent the centre c e l l s of the short s t r i n g s . 3. the p o i n t s represent v e r t i c a l s and c l o s u r e s and are l o c a t e d d i r e c t l y above the column to which they belong. 4. the c h a r a c t e r i s quantized i n t o f o u r l e v e l s and each s t r i n g i s assigned i t s a p p r o p r i a t e f e a t u r e . 5. v e r t i c a l compression to three l e v e l s ; s e r i f d e l e t i o n . 6. h o r i z o n t a l f e a t u r e i n t e g r a t i o n along each l e v e l . 7. f i n a l f e a t u r e m a t r i x ; a code-word i s assigned f o r each l e v e l . 8. the three l e v e l s correspond to the three code-words; markers i d e n t i f y code-word matches. 9. subgroup to which the c l a s s i f i c a t i o n i s c o n fined; top and bottom areas represent the lower and.upper case c h a r a c t e r s , r e s p e c t i v e l y . 10. a number above a character i n d i c a t e s the i d e n t i t y of the unknown char- a c t e r ; the numbers one through three s p e c i f y a t what f e a t u r e l e v e l the i d e n t i f i c a t i o n i s made; a four i n d i c a t e s t h a t the i d e n t i f i c a t i o n i s based on the c h a r a c t e r w i t h the most code-word matches. 3 2 u \ X 1 ..... .... - — - 7 6 5 4 41 i n x — N — l • • - x — l • • • x x — x • xxsx—-x—xxj —x I I I x — s — I • • — x - x — I • • X X X S - - x - x x | - x <<<!<<<<<-!<• <x< << < <<<x|< • <x<<<<<<<UI < • R C C N D R S U : • • | • x x x x — M • • j .»"!h • • I x 1 • • 2 m :: B D P H L P m ». . B. . . B. . . • • • • J ii • • | XXXXX-- s | • • • • • • :::::ti i i t H n i < • 1<<<<<<<<<<• <•i<<<<<<<ni<- B D E F X 7 L P R T < • i <<<<<<<•i<• • i<< - • i n i • • i • i • X X X — — s s s l • • • •: . i l l •I.II | • V N - S X - X X X l • • • 3 XI N BDPHLPftT • • • • • X X X N S S | • • • • • • • • x s s - s x - x x x l • • • x<<<<<<<<ni<< x<<<<<<<<UI<<• B D C P I V L P R l <<<<<<<<<•1<<• x n i • • • *UI • <• 1 • 1 I X X X — I • • • • I I • I 1 1 X X X — I • • • • j I <<<-<!<<<<x| <<<•<!<<<<<• <<<-!<<<-<<• RCENDR5UI x x x - I - s B D P H L P U 7 x s x - l - s - — . . . . j <<<•I<<<<<• <<<-l<<<<<- <<<•I<<<<<U B D C P I 7 L P R 7 — x | • • I I • • s s - x x x x - I s - R C C N D R S U : I<<<<<<<•I<• !<<<<<<<•K• I<<<<<<U<I<• 42 :: ii , | / / / X K BDFHLPH7 m | l ' • , | - \ / / / / X X < < • | < < < < < < < < X < X 2 << • 1<<<<<<<<•<x B 0 C F I 7 L P R T << • 1<<<<<<<</<x • • • • 1 s x • • I X • 1 / x X / - X | I X I II 1 s / — X / - X | | X j I • < < < < < - < I < < < < - <<<<<•<!<<<<• <<<<<-!<<<<<- BDFHLPH7 B D C P I 3 L P R T . . . . • • • . . • . | ~ i — • X / * — / — x - x x x x x • • X W S S / - / / / X X ' • X / / - / X - X X X X X - | x x | • X X X X S / - / / / X X - <: x< <<<<<< <<<fl<x< 2 <x<<<<<<<<<<< • <x CGDUSZ <x<<<<<<<<<<U<x< • - aa • • • xf)x • • B X - X x u x - V | / X • - S | / X - . . . J x x x x • • X X I X X X I • • • • • • • K| X X - — S S X X - • — X X X X X I • • • • • K | X X - B O F H L P « 7 <<•!<<<<<<<<-<*<<• <<•!<<<<<<-<<<<*<• <<•!<<<<<<<•<M<<<- 4 EFX7LPRT x / - / — s - s s s s l I x x s x — — s — s - • • • s s x x I N W - S - - / - / / X X x / - / - - x - x x x x | I X X X X - - X - - X X X X X I X X X - X / - / / X X x<<<<<<<<<<n<i x< <<<<<<<<<x<x !<<<<<<<<<<U<x 43 APPENDIX C Examples of 'Errors Top L e f t ; The c losure on the r i g h t s ide was missed f o r t h i s character because the s t r i n g at which the two branches converged was too s h o r t . Top Right ; A broken c losure on the r i g h t s ide r e s u l t e d i n m i s c l a s s i f i - ca t ion of t h i s s t r i n g as a minor v e r t i c a l . Centre L e f t ; The c losure on the r i g h t s ide was miss ing because the con- vergence c r i t e r i o n was not s a t i s f i e d . S p e c i f i c a l l y , the l a s t s t r i n g i n the lower branch ended before the s t r i n g performing the a c t u a l c l o s u r e . A more comprehensive t e s t for a c losure must rep lace the r i g i d , and i n th i s case i n d i s c r i m i n e n t , c r i t e r i a p r e s e n t l y used. Centre Right : The upper r i g h t corner was assumed to be the po in t of d i - vergence for two branches. A check for branch c o n t i n u i t y would have detected t h i s e r r o r . Bottom L e f t : The long s t r i n g on the c h a r a c t e r ' s l e f t s ide f a i l e d to s a t - i s f y the c r i t e r i o n for a f u l l v e r t i c a l . Any s t r i n g other than a f u l l v e r - t i c a l or a very short s t r i n g was tes ted f o r c l o s u r e . In t h i s case , when the f u l l v e r t i c a l was missed, the s t r i n g s a t i s f i e d the c losure c r i t e r i o n . Improper subgroup assignment occurred as a consequence of the l o s t v e r t - i c a l . Bottom Right : The feature matrix was ambiguous f o r t h i s l e t t e r because the d e l e t i o n of the s e r i f s r e s u l t e d i n a l o s s of f ea tures . By r e f e r r i n g back to the c h a r a c t e r ' s s t r u c t u r e before s e r i f d e l e t i o n , the ambiguity can be r e s o l v e d . For t h i s c h a r a c t e r , the system f i r s t searched through a l l the code-words. When no unique e n t r i e s were found, a search was i n - i t i a t e d for the character with the most feature l e v e l matches. When two or more e q u a l l y l i k e l y p o s s i b i l i t i e s e x i s t e d , the programme chose the f i r s t l e t t e r i n the s e r i e s . For t h i s example, w i th m u l t i l e cho ices , the character ' b ' was s e l e c t e d . 44 'I •!« • y • MM' I I • -K| • I ISS-S /-//KX' • • /////-S-WNN- K* I ' MM | • I I W-S S-SSMM- < • <<<<<<<<<<<n<K< I < < < < < < < < < < < < • < < X <<!<<<<<<<<<<u<*< i — x - i 11 • i 1111 i I S S \ " s - l I I • I 111 I I S S — - - « ' # ' * » - • l<<<<<<n<<i< I<<<<<<-<<<! I<<<<<<<<< u< H C E N 0 R S U 2 ::::: • s S S S - S S | K I x X i x • X S S S S - - — • • I I "'hhiii'" Ix x • x s s s s s*\ I <x<<<<<<<<<n<x •: <x<<<<<<<<<<-x <x<<<<<<<<<U<I • ! x ! • • • x I • • • • X S - S / ' - I • BDFHLPftT • • • ! x I • • • • • X S - S .'-1 • <<•<x<<<<<<<<• I <: <<•<x<<<<<<<<•|< <<<•x<<<<<<<<-!< 4 B D F H L P H 7 <• l<<- <•!<<• <• l<<- 45 APPENDIX D Examples of System F l e x i b i l i t y Top L e f t ; High l i g h t i n g i n t e n s i t y produced a break at the bottom of the characte r . Because the damage was l o c a l i z e d to only one l e v e l , i d e n t i - f i c a t i o n was s t i l l p o s s i b l e w i t h the code words from the two remaining f e a t u r e l e v e l s . Top R i g h t : Low l i g h t i n g i n t e n s i t y produced a b l o t t e d image. The second l e v e l i n the f e a t u r e m a t r i x , although s t i l l p l a u s i b l e , was not charac- t e r i s t i c of t h i s l e t t e r under normal c o n d i t i o n s . Centre L e f t : C o rrect c l a s s i f i c a t i o n r e s u l t e d when t h i s l e t t e r was com- pressed by 1/3. Centre R i g h t : E l o n g a t i o n by 1/3 of the same c h a r a c t e r produced s i m i l a r r e s u l t s as above. Bottom L e f t : This c a r e f u l l y handprinted c h a r a c t e r was c o r r e c t l y i d e n t i - f i e d . Bottom Right : Good fe a t u r e s i m p l i f i c a t i o n was obtained on t h i s hand- w r i t t e n c h a r a c t e r from the U k r a i n i a n alphabet. 46 47 APPENDIX E Ba s i c Flowchart f o r the Diagonal Checking Routine (eg.) enter N 1 t a l l l e t t e r ? i s 1st b i t of in f o r m a t i o n i n top 1/4 of 1st column ? N N check bottom edge DU ? »v,V DUDU ?—»w,W $Jcheck l e f t and r i g h t edges k. V-VUD ?• V-UD ? - DU-U ?- DV-UV ?• DU-UD ? ->k -5>Y -»x,X T exit (eg.) i s 1st b i t of i n f o r m a t i o n i n bottom 3/4 of 1st column ? check top edge UD ? -s>A N o t a t i o n : v e r t i c a l J J ^ d o w n OR V-UD

Cite

Citation Scheme:

    

Usage Statistics

Country Views Downloads
United States 14 2
Russia 6 0
China 3 1
Canada 1 0
India 1 0
City Views Downloads
Unknown 13 2
Ashburn 4 0
Beijing 3 0
Sunnyvale 2 0
Montreal 1 0
Mountain View 1 0
New Delhi 1 0

{[{ mDataHeader[type] }]} {[{ month[type] }]} {[{ tData[type] }]}
Download Stats

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0065610/manifest

Comment

Related Items