Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Experiments in character recognition using linear and quadratic filters DeMarco, John Francis 1980

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-UBC_1980_A7 D45.pdf [ 7.84MB ]
Metadata
JSON: 831-1.0065442.json
JSON-LD: 831-1.0065442-ld.json
RDF/XML (Pretty): 831-1.0065442-rdf.xml
RDF/JSON: 831-1.0065442-rdf.json
Turtle: 831-1.0065442-turtle.txt
N-Triples: 831-1.0065442-rdf-ntriples.txt
Original Record: 831-1.0065442-source.json
Full Text
831-1.0065442-fulltext.txt
Citation
831-1.0065442.ris

Full Text

E X P E R I M E N T S I N C H A R A C T E R R E C O G N I T I O N U S I N G L I N E A R AND Q U A D R A T I C F I L T E R S I • by J o h n F r a n c i s D e M a r c o B . A . S c . , U n i v e r s i t y o f W i n d s o r , 1 9 7 6 A T H E S I S S U B M I T T E D I N P A R T I A L F U L F I L M E N T T H E R E Q U I R E M E N T S F O R T H E D E G R E E OF M A S T E R OF A P P L I E D S C I E N C E i n T H E F A C U L T Y OF G R A D U A T E S T U D I E S i n t h e D e p a r t m e n t o f E l e c t r i c a l E n g i n e e r i n g We a c c e p t t h i s t h e s i s a s c o n f o r m i n g t t h e r e q u i r e d s t a n d a r d T H E U N I V E R S I T Y OF B R I T I S H C O L U M B I A J u n e , 1980 ( ? ) J o h n F r a n c i s D e M a r c o , 1980 In presenting th i s thesis in pa r t i a l fu l f i lment of the requirements for an advanced degree at the Univers i ty of B r i t i s h Columbia, I agree that the L ibrary shal l make it f ree ly ava i l ab le for reference and study. I further agree that permission for extensive copying of th is thesis for scholar ly purposes may be granted by the Head of my Department or by his representat ives. It is understood that copying or pub l i ca t ion of th i s thes i s fo r f inanc ia l gain sha l l not be allowed without my written permission. Department of E l e c t r i c a l Engineering The Univers i ty of B r i t i s h Columbia 2075 Wesbrook P l a c e Vancouver, Canada V6T 1W5 Date June 19, 1980 A B S T R A C T T h i s t h e s i s d e s c r i b e s t h e s i m u l a t i o n o f a c h a r a c t e r r e c o g n i t i o n s y s t e m u s i n g r i v e f i l t e r d e s i g n s b a s e d o n p r o b a b i l i s t i c m o d e l s o f c h a r a c t e r p a t t e r n s . F o u r o f t h e d e s i g n s y i e l d l i n e a r f i l t e r s . O f t h e s e , t h r e e a r e b a s e d o n v a r i a t i o n s o f a G a u s s i a n m o d e l . T h e f o u r t h i s b a s e d o n t h e a s s u m p t i o n o f i n d e p e n d e n t b i n a r y - v a l u e d f e a t u r e s . T h e l a t t e r d e s i g n i s s h o w n t o p r o d u c e h i g h e r r e c o g n i t i o n r a t e s t h a n a n y o f t h e o t h e r s w h e n t e s t e d o n M u n s o n ' s m u l t i - a u t h o r h a n d - p r i n t e d c h a r a c t e r s . T h i s f i l t e r d e s i g n i s a l s o t e s t e d o n t w o s u b s e t s o f t h e C o r n e l l m a c h i n e - p r i n t e d d a t a b a s e . T h e f i f t h f i l t e r d e s i g n i s a s p e c i a l c a s e o f a q u a d r a t i c f i l t e r , b a s e d o n a G a u s s i a n m o d e l i n w h i c h s p a t i a l l y s t a t i o n a r y c o v a r i a n c e s t a t i s -t i c s a r e a s s u m e d . T h i s a s s u m p t i o n r e s u l t s i n a f i l t e r s t r u c t u r e c o n s i s t i n g o f a l i n e a r o p e r a t i o n o n t h e p a t t e r n v e c t o r p l u s a l i n e a r o p e r a t i o n o n t h e a u t o c o r r e l a t i o n v e c t o r o f t h e p a t t e r n . T h i s f i l t e r d e s i g n i s f o u n d t o a c h i e v e l o w e r p e r f o r m a n c e t h a n t h e b e s t l i n e a r f i l t e r d e s i g n w h e n t e s t e d o n M u n s o n ' s c h a r a c t e r s , a n d n e a r l y e q u a l p e r f o r m a n c e o n t h e C o r n e l l c h a r a c -t e r s . H o w e v e r , t h e r e a r e i n d i c a t i o n s t h a t a f i l t e r o f t h i s s t r u c t u r e c o u l d a c h i e v e h i g h e r p e r f o r m a n c e f o r s o m e c h o i c e o f f i l t e r c o e f f i c i e n t s . i i i TABLE OF CONTENTS . Page ABSTRACT i i TABLE OF CONTENTS i i i LIST OF TABLES v LIST OF ILLUSTRATIONS v i ACKNOWLEDGEMENTS v i i 1. INTRODUCTION 1.1 Character Recognition and Gaussian Models 1 1.2 Review of Previous Work 3 1.3 O u t l i n e of the Thesis 5 2. FILTER DESIGNS 2.1 The General Gaussian Model 7 2.2 Simple Matched F i l t e r (Type I) 8 2.3 Normalized C r o s s - c o r r e l a t i o n F i l t e r (Type I I ) 10 2.4 Matched F i l t e r w i t h Non-stationary Noise Variance (Type I I I ) 11 2.5 Non-Gaussian Linear F i l t e r (Type IV) 12 2.6 S t a t i o n a r y Covariance F i l t e r (Type V) 13 3. FILTER SIMULATIONS 3.1 D e s c r i p t i o n of Data Sets 20 3.2 Preprocessing of Character P a t t e r n s 22 3.2.1 Munson's Data Base 22 3.2.2 C o r n e l l Data Base 25 3.2.3 A u t o c o r r e l a t i o n 26 3.3 S i m u l a t i o n of Separate T r a i n i n g and T e s t i n g Samples 26 3.3.1 Type I F i l t e r 27 3.3.2 Type I I F i l t e r 28 3.3.3 Other L i n e a r F i l t e r s 28 3.4 Implementation 30 4. SIMULATION RESULTS WITH LINEAR FILTERS 4.1 Summary of R e s u l t s 33 4.2 D i s c u s s i o n of R e s u l t s w i t h Munson's Characters 35 4.3 D i s c u s s i o n of R e s u l t s w i t h C o r n e l l Characters 41 iv P a g e 5. S I M U L A T I O N R E S U L T S WITH S T A T I O N A R Y C O V A R I A N C E F I L T E R S 5.1 Summary o f R e s u l t s 45 5.2 D i s c u s s i o n o f R e s u l t s w i t h M u n s o n ' s C h a r a c t e r s 45 5.3 D i s c u s s i o n o f R e s u l t s w i t h C o r n e l l C h a r a c t e r s 52 6. C O N C L U S I O N 6.1 S u m m a r y o f C o n c l u s i o n s 55 6.2 S u g g e s t i o n s f o r F u r t h e r W o r k 57 A P P E N D I X 59 R E F E R E N C E S 95 LIST OF TABLES Number of Samples i n 3MIBEX and CORAL2 Subsets of C o r n e l l Data Base Recognition Rates f o r L i n e a r F i l t e r s w i t h Munson's Data Recognition Rates f o r Type IV F i l t e r s w i t h C o r n e l l Data Comparison of R e s u l t s w i t h Munson's Data Recognition Rates f o r Type V F i l t e r s w i t h Munson's Data, Compared to Rates f o r Type I and Type IV F i l t e r s R e c ognition Rates f o r Type V F i l t e r s w i t h C o r n e l l Data, Compared to Rates f o r Type IV F i l t e r s v i LIST OF ILLUSTRATIONS Fig u r e 1.1 S t r u c t u r e of a Bayes C l a s s i f i e r 2.1 P a t t e r n Vector P and Augmented P a t t e r n Vector X 3.1 Samples from Munson's Data Base 3.2 Samples from C o r n e l l Data Base (a) 3MIBEX Subset (b) C0RAL2 Subset 3.3 S t r u c t u r e of Simulated R e c o g n i t i o n System ACKNOWLEDGEMENTS I would l i k e to thank Dr. R.W. Donaldson f o r h i s encouragement and patience throughout the course of t h i s p r o j e c t . Valuable advice and suggestions were a l s o provided by other members of the department and by Dr. R. Westwick of the Department of Mathematics. F i n a n c i a l support during t h i s work was provided by the N a t u r a l Sciences and Engineering Research C o u n c i l . 1. I N T R O D U C T I O N 1.1 C h a r a c t e r R e c o g n i t i o n a n d G a u s s i a n M o d e l s T h e r a p i d i n c r e a s e i n t h e p o w e r o f d a t a p r o c e s s i n g m a c h i n e s o v e r t h e p a s t t w o d e c a d e s h a s i n v i t e d t h e d e v e l o p m e n t o f f a s t m e t h o d s o f i n p u t -t i n g i n f o r m a t i o n t o s u c h m a c h i n e s . One m e t h o d o f i n p u t w h o s e i m p o r t a n c e i s s t e a d i l y i n c r e a s i n g i s t h e o p t i c a l r e a d i n g a n d r e c o g n i t i o n o f m a c h i n e -p r i n t e d a n d h a n d - p r i n t e d c h a r a c t e r s . T h i s f o r m o f i n p u t i s a t t r a c t i v e b e -c a u s e m o s t h u m a n s a n d a v a r i e t y o f m a c h i n e s g e n e r a t e c h a r a c t e r i n f o r m a t i o n . W h i l e c h a r a c t e r r e a d e r s o f v a r i o u s t y p e s h a v e b e e n i n u s e f o r some y e a r s , i n m o s t c a s e s t h e t y p e o f i n p u t m u s t b e c o n s i d e r a b l y r e s t r i c t e d i n o r d e r t o a c h i e v e p r a c t i c a l c o s t - p e r f o r m a n c e . T h i s i s e s p e c i a l l y t r u e o f h a n d - p r i n t i n g , f o r w h i c h r e s t r i c t i o n s m i g h t i n c l u d e a l i m i t e d a l p h a b e t , p h y s i c a l c o n s t r a i n t s o n t h e p r i n t i n g , o r t h e u s e o f t r a i n e d p e r s o n n e l f o r p r i n t i n g [ H 5 , K 2 , S I ] . F o r m a c h i n e p r i n t , r e a d e r s w h i c h h a n d l e a w i d e v a r i e t y o f f o n t s , s u c h a s p o s t a l a d d r e s s r e a d e r s , c o n t i n u e t o b e v e r y e x -p e n s i v e . T h e r e i s s t i l l a s t r o n g i n c e n t i v e t o i n c r e a s e t h e v e r s a t i l i t y a n d s p e e d o f r e c o g n i t i o n m a c h i n e s , a n d / o r r e d u c e t h e c o s t . F o r t u n a t e l y , t h e a b o v e - m e n t i o n e d i n c r e a s e i n c o m p u t i n g p o w e r i s a l l o w i n g t h e r a n g e o f p r a c t i -c a l r e c o g n i t i o n a l g o r i t h m s t o e x p a n d . A l a r g e c l a s s o f r e c o g n i t i o n s c h e m e s w h i c h h a s b e e n i m p l e m e n t e d o r p r o p o s e d may b e d e s c r i b e d a s f o l l o w s : a s e t o f c h a r a c t e r p a t t e r n s o f k n o w n i d e n t i t y ( c a l l e d t h e t r a i n i n g s e t ) a r e u s e d t o a p p r o x i m a t e t h e p r o b a b i l i t y d i s t r i b u t i o n s o f t h e p a t t e r n s f o r e a c h c h a r a c t e r c l a s s . T h e s e d i s t r i b u t i o n s a r e t h e n u s e d t o d e s i g n a s e t o f f i l t e r s o r d i s c r i m i n a n t f u n c t i o n s , w h i c h o p e r a t e o n a n u n k n o w n p a t t e r n t o y i e l d a m e a s u r e o f t h e p r o b a b i l i t y o f t h e given pattern occurring, conditioned on each c l a s s . The f i l t e r outputs may then be used, p o s s i b l y i n conjunction with contextual or other information, to assign the unknown pattern to a c l a s s . I f t h i s assignment, o r de c i s i o n , i s made so as to minimize the p r o b a b i l i t y of e r r o r , or of some function of the p r o b a b i l i t y of err o r known as r i s k , then the system i s known as a Bayes c l a s s i f i e r . Figure 1.1 i l l u s t r a t e s the st r u c t u r e of a Bayes c l a s s i f i e r . This c l a s s of recognition schemes o f f e r s v i r t u a l l y u nlimited c a p a b i l i t y , depending on the choice of p r o b a b i l i s t i c model. I t lends i t s e l f very w e l l to implementation by machines which can perform a l a r g e number of 1 A PRIORI v PROBABILITIES, j CONTEXTUAL INFORMATION, • "COST"-OF INCORRECT 1 DECISIONS F1 I DECISION DEVICE: © SELECTS CLASS WHICH MINIMIZES RISK DECISION © e (ONE FILTER PER CLASS) y n IS A MEASURE Figure 1.1 Structure of a Bayes C l a s s i f i e r (n denotes the pattern c l a s s ) . s i m i l a r computations r a p i d l y , and may f r e e the designer.from e x p l i c i t l y determining a set of features or d e s c r i p t o r s which are necessary to i d e n t i f y a c h a r a c t e r . w i t h i n t h i s broad group of methods, one approach i s to assume th a t the p a t t e r n s of a given c l a s s can be described as Gaussian n o i s e added to a d e t e r m i n i s t i c p a t t e r n , or e q u i v a l e n t l y , t h a t the p a t t e r n s are sample func-t i o n s from a Gaussian random process. Although t h i s assumption i s u s u a l l y very i n a c c u r a t e , the p r o p e r t i e s of the Gaussian d i s t r i b u t i o n s i m p l i f y the design and implementation of r e c o g n i t i o n a l g o r i t h m s . While t h i s approach i s not new, the r e l a t i v e m e r its of i t s many p o s s i b l e v a r i a t i o n s have not been f u l l y i n v e s t i g a t e d or compared to other a v a i l a b l e approaches. The main purpose of t h i s t h e s i s i s to evaluate a group of f i l t e r designs f o r character r e c o g n i t i o n . Designs based on the above approach are described along w i t h another c l o s e l y r e l a t e d design, and t h e i r performance i s compared experimentally f o r hand-printed and t y p e w r i t t e n c h a r a c t e r s . 1.2 Review of Previous Work The o p t i m a l Bayes s t r a t e g y f o r c h a r a c t e r r e c o g n i t i o n was f i r s t proposed by Chow [ C l ] , who also presented the form of the op t i m a l d i s c r i m i -nant f u n c t i o n s f o r p a t t e r n s of s t a t i s t i c a l l y independent v a r i a b l e s w i t h Gaussian d i s t r i b u t i o n s . Minsky [M2] and others [ B l , C2] developed the case of independent bi n a r y p a t t e r n v a r i a b l e s . The theory of the Bayes r e c o g n i -t i o n s t r a t e g y and s e v e r a l s p e c i a l cases have been reviewed more r e c e n t l y by Duda and Hart [D2]. The o p t i m a l f i l t e r f o r the case of independent v a r i a b l e s w i t h con-sta n t Gaussian noise has long been recognized as e q u i v a l e n t to the matched f i l t e r of s i g n a l d e t e c t i o n theory. M i l s o n and Rao [Ml] have demonstrated t h i s r e l a t i o n and t r e a t e d the e f f e c t of sampling on a b i n a r y - v a l u e d two-dimensional p a t t e r n . The assumption of Gaussian s t a t i s t i c s w i t h s p a t i a l l y s t a t i o n a r y covariance has been invoked f o r image r e s t o r a t i o n [H8], which has some common ground w i t h p a t t e r n r e c o g n i t i o n . Published experimental r e s u l t s u s i n g the techniques d i s c u s s e d i n t h i s t h e s i s have been l i m i t e d and i n c o n c l u s i v e . Experiments us i n g matched f i l t e r s based on noise having constant v a r i a n c e over the p a t t e r n s ( a l s o known as the c r o s s - c o r r e l a t i o n or minimum-distance method), both w i t h and without n o r m a l i z a t i o n of the p a t t e r n v e c t o r s , have been reporte d f o r hand and machine p r i n t i n g [H3, H4, N2, P3]. Matched f i l t e r s have a l s o been a p p l i e d i n the frequency domain [HI, M l , 01]. At l e a s t four s t u d i e s have been pu b l i s h e d using f i l t e r s based on the assumption of independent b i n a r y features [ B l , C2, B3, U l ] . The l a s t two of these made comparisons w i t h nor-malized c r o s s - c o r r e l a t i o n , but reached d i f f e r e n t c o n c l u s i o n s . Because of v a r i a t i o n s i n methodology and data bases used, and u s u a l l y s m a l l sample s i z e s , no general c o n c l u s i o n as to which method i s best can be drawn from the above r e p o r t s . However, some of them provide u s e f u l comparisons w i t h other methods not considered i n t h i s t h e s i s [B3, C2, P3, U l ] , and w i t h human performance [N2]. I t should be noted that many v a r i a t i o n s of Bayes c l a s s i f i c a t i o n have been a p p l i e d t o character r e c o g n i t i o n i n c o n j u n c t i o n w i t h some f e a t u r e e x t r a c t i o n process which transforms the o r i g i n a l p a t t e r n i n t o a f e a t u r e vec-t o r of lower d i m e n s i o n a l i t y ( f o r example, [H9, Q I ] ) . The methods considered i n t h i s t h e s i s are d i s t i n g u i s h e d by the f a c t that they operate d i r e c t l y on 5. the e n t i r e character p a t t e r n , r a t h e r than on a feature v e c t o r d e r i v e d from the p a t t e r n . Many p r a c t i c a l character readers have been based on methods r e l a t e d to those t r e a t e d i n t h i s t h e s i s , i n t h a t they implement a set of l i n e a r operations on input p a t t e r n s using one "mask" or "template" per c h a r a c t e r c l a s s [H2, H7, N l , R l ] , However, the masks have u s u a l l y been designed e i t h e r by using i d e a l character models, or i n t u i t i v e methods combined w i t h t r i a l and e r r o r , r a t h e r than p r o b a b i l i s t i c models of ch a r a c t e r p a t t e r n s . A l s o , the weighting c o e f f i c i e n t s have t y p i c a l l y been r e s t r i c t e d to a very few d i s c r e t e v a l u e s . Perhaps because of the design methods used, l i n e a r masks have been considered s u i t a b l e only f o r reading machine p r i n t of a f i x e d number of fonts [H7, R l ] . 1.3 O u t l i n e of the Thesis In Chapter 2, f i v e f i l t e r designs f o r a p p l i c a t i o n to c h a r a c t e r r e c o g n i t i o n are presented, based on d i f f e r e n t but r e l a t e d models of the p r o b a b i l i t y d i s t r i b u t i o n s of character p a t t e r n s . Four of these are l i n e a r , w h i l e the f i f t h i s a s p e c i a l case of a q u a d r a t i c f i l t e r . . The l a s t design i s developed i n greater d e t a i l than the o t h e r s , s i n c e i t has not p r e v i o u s l y appeared i n the l i t e r a t u r e . Chapter 3 describes the data used to t e s t these f i l t e r designs, and the implementation of the simulated r e c o g n i t i o n system. In Chapter 4, the experimental r e s u l t s obtained w i t h the four l i n e a r f i l t e r designs are presented and d i s c u s s e d . The same i s done f o r the r e s u l t s of the f i f t h f i l t e r design i n Chapter 5. 6. In Chapter 6, the conclusions of the. preceding chapters are sum-marized, and some suggestions f o r f u r t h e r work are presented. 2. F I L T E R D E S I G N S 2.1 T h e G e n e r a l G a u s s i a n M o d e l A r a t h e r g e n e r a l s t a r t i n g p o i n t i n d e s i g n i n g a r e c o g n i t i o n a l g o -r i t h m i s t o a s s u m e t h a t t h e c h a r a c t e r s i n a g i v e n p a t t e r n a r e s a m p l e s f r o m a G a u s s i a n r a n d o m p r o c e s s . We l e t X = { x i , X 2 , x ^ } r e p r e s e n t a d i s c r e t e k - d i m e n s i o n a l p a t t e r n v e c t o r . T h e n , f o r e a c h p a t t e r n c l a s s M = 1, 2, N , t h e r e i s a p r o b a b i l i t y d e n s i t y f u n c t i o n o f t h e w e l l - k n o w n G a u s s i a n f o r m ( E [ ] d e n o t e s e x p e c t e d v a l u e ) : f ( X ) = ( — — ) — e x p { - % ( X - M ) A - 1 ( X - M ) t } (2.1) n \ rz~ J n n n \ /2TT ' A \ k w h e r e M = {m , . . . , m } n n ' n , 1 k = E [ X n ] a n d n n . . X = E [ ( x . - m ) ( x . - m ) n ] n . . i n . i n . i j i J T h e p a t t e r n s f r o m a p a r t i c u l a r c l a s s c a n b e c o n s i d e r e d a s d e t e r -m i n i s t i c p a t t e r n s M ^ s u p e r i m p o s e d w i t h G a u s s i a n n o i s e h a v i n g t h e c o v a r i a n c e m a t r i x A n > T h e n o i s e c o m p o n e n t m a y , i n g e n e r a l , b e n o n - w h i t e a n d d e p e n d e n t o n p o s i t i o n w i t h i n t h e p a t t e r n . I f t h e c h a r a c t e r c l a s s e s a r e a s s u m e d t o b e e q u i p r o b a b l e , t h e r e -c o g n i t i o n s t r a t e g y w h i c h m i n i m i z e s t h e p r o b a b i l i t y o f e r r o r c o n s i s t s o f c h o o s i n g t h e c l a s s n w h i c h m a x i m i z e s t h e p r o b a b i l i t y o f a g i v e n p a t t e r n X o c c u r r i n g . T h u s , we c h o o s e t h e c l a s s w h i c h m a x i m i z e s f ^ ( X ) » o r w h i c h minimizes the f o l l o w i n g monotonically decreasing f u n c t i o n of f ^ ( X ) : g (X) = l o g |A I + (X-M ) A - 1 (X-M ) f c (2 .2) n 1 n' n n n For p a t t e r n s of l a r g e d i m e n s i o n a l i t y , the computation r e q u i r e d to evaluate the above equation f o r each c l a s s i s p r o h i b i t i v e . One approach to designing a p r a c t i c a l character r e c o g n i t i o n system i s to f i r s t transform the pa t t e r n space to a space of lower d i m e n s i o n a l i t y , w h i l e r e t a i n i n g enough i n -formation to recognize the ch a r a c t e r s . This leads to a very wide range of p o s s i b i l i t i e s , which are not considered i n t h i s t h e s i s . A second approach, the one used here, i s to make f u r t h e r assumptions or r e s t r i c t i o n s which s i m p l i f y the p r o b a b i l i t y d e n s i t y gunction given i n equation ( 2 . 1 ) . The f o l l o w i n g s e c t i o n s describe the f i l t e r designs i n v e s t i g a t e d ; each design i s optimal f o r the set of assumptions on which i t i s based. The f i l t e r s are described i n approximate order of i n c r e a s i n g g e n e r a l i t y of the unde r l y i n g model. (For convenience, the v a r i o u s f i l t e r designs are l a t e r r e f e r r e d to as Type I through Type IV. These type numbers appear i n the t i t l e s of the f o l l o w i n g s e c t i o n s . ) Henceforward, i t i s assumed that the character p a t t e r n s are ve c t o r s of binary-valued f e a t u r e s . A d e s c r i p t i o n of the a c t u a l p a t t e r n s used i n the r e c o g n i t i o n experiments appear In Chapter 3. 2.2 Simple Matched F i l t e r (Type I) E v a l u a t i o n of equation (2 .2) may be g r e a t l y s i m p l i f i e d i f the no i s e covariance m a t r i x , A, i s a constant times the i d e n t i t y m a t r i x . This c o r r e -sponds to the assumption that the characters i n a given c l a s s c o n s t i t u t e a d e t e r m i n i s t i c mean p a t t e r n superimposed w i t h white s t a t i o n a r y n o i s e . W h i t e n e s s i m p l i e s t h a t t h e n o i s e i s u n c o r r e l a t e d w i t h i t s e l f , a n d t h u s t h a t t h e f e a t u r e s w i t h i n a p a t t e r n a re . s t a t i s t i c a l l y i n d e p e n d e n t . S t a t i o n a r i t y i m p l i e s t h a t t h e n o i s e v a r i a n c e i s i n d e p e n d e n t o f p o s i t i o n w i t h i n t h e p a t -t e r n . T h e r e s u l t i n g r e c o g n i t i o n a l g o r i t h m s e l e c t s t h e c l a s s n w h i c h m i n i -m i z e s t h e f u n c t i o n : M . M f c y = ~ ~ - M - X * ( 2 . 3 ) ' n 2 n T h i s o p e r a t i o n , w h i c h i s l i n e a r i n X, i s r e f e r r e d t o a s . m a t c h e d f i l t e r i n g o r c r o s s - c o r r e l a t i o n . M i n i m i z i n g y ^ i s e q u i v a l e n t t o s e l e c t i n g t h e m e a n p a t t e r n w h i c h i s c l o s e s t i n E u c l i d e a n d i s t a n c e t o t h e u n k n o w n p a t t e r n , s i n c e d 2 = X-X1 - 2 M - X 1 + M ' M 1 n n n n = 2 y + X-Xt ( 2 . 4 ) • 'n I n t h e e x p e r i m e n t a l s i m u l a t i o n s o f t h i s f i l t e r d e s i g n , t h e m e a n p a t t e r n s , M ^ , w e r e r e p l a c e d b y t h e i r r e s p e c t i v e e s t i m a t e s o b t a i n e d f r o m a s e t o f T k n o w n t r a i n i n g p a t t e r n v e c t o r s , ^X, f r o m e a c h p a t t e r n c l a s s a s f o l l o w s : 1 T V = — * t X i l t = l o r "n - - r t X ( 2 - 5 ) T h e s y m b o l d e n o t e s t h e e s t i m a t e o f a p a t t e r n s t a t i s t i c . 10. 2 .3 Normalized C r o s s - C o r r e l a t i o n F i l t e r (Type I I ) One problem w i t h the preceding design i s that the magnitude of X may vary w i d e l y w i t h i n a given c l a s s , f o r example, due to d i f f e r i n g widths of pen strokes or type. This problem may be addressed by n o r m a l i z i n g the magnitude of both the unknown and the d e t e r m i n i s t i c characters before f i l -t e r i n g . In t h i s case, the r e c o g n i t i o n procedure i s to maximize y. M •Xt M -if n n n n M X 2 M M i n i i i 1 n 1 n 1 M •Xt . n 4- (2-6) or e q u i v a l e n t l y , y ' M X i n i • M ' X t n n l M n l This procedure i s o f t e n r e f e r r e d t o as normalized c r o s s - c o r r e l a t i o n . E f f e c t i v e l y , the c l a s s s e l e c t e d i s the one which minimizes the angle between the unknown and the s t o r e d d e t e r m i n i s t i c p a t t e r n v e c t o r s . This v a r i a t i o n was i n c l u d e d mainly because the f i l t e r outputs could be e a s i l y obtained from the r e s u l t s of the Type I f i l t e r s without a c t u a l l y implementing another set of f i l t e r s . I f we denote the corresponding outputs of the simple matched f i l t e r and the c r o s s - c o r r e l a t i o n f i l t e r as y T and y__ • In I l n r e s p e c t i v e l y , then M •Mt y„ = — M 'X y I n 2 n and M •Xt n I I n M n M / n I t s h o u l d b e n o t e d t h a t t h i s r e s u l t i s n o t e x a c t l y e q u i v a l e n t t o t h a t w h i c h w o u l d b e o b t a i n e d i f t h e t r a i n i n g s a m p l e s w e r e n o r m a l i z e d b e f o r e a v e r a g i n g t o e s t i m a t e M . I n t h e p r e s e n t c a s e , t h e t r a i n i n g s a m p l e s w i t h l a r g e r v e c t o r m a g n i t u d e s c a r r y a l a r g e r w e i g h t i n d e t e r m i n i n g t h e f i l t e r c o e f f i c i e n t s . 2 . 4 M a t c h e d F i l t e r w i t h N o n - s t a t i o n a r y N o i s e V a r i a n c e ( T y p e I I I ) A p o s s i b l e a s s u m p t i o n , w h i c h i s s l i g h t l y m o r e g e n e r a l t h a n t h a t o f t h e s i m p l e m a t c h e d f i l t e r , i s t h a t t h e n o i s e i s w h i t e b u t may h a v e d i f f e r e n t v a r i a n c e f o r e a c h e l e m e n t o f e a c h p a t t e r n c l a s s . T h e n o i s e c o v a r i a n c e m a -t r i x r e m a i n s d i a g o n a l , a n d ( 2 . 2 ) y i e l d s a r e c o g n i t i o n p r o c e d u r e w h i c h m i n i -m i z e s K K ( x i - m n . ) 2 y n = * 1 0 8 A n i J L + * A n . . 1 1=1 1=1 " n w h e r e K i s t h e n u m b e r o f p a t t e r n f e a t u r e s . S i n c e x ? i s e q u a l t o x_^ f o r b i n a r y - v a l u e d f e a t u r e s , t h i s r e d u c e s t o a f o r m w h i c h i s l i n e a r i n X : K K ( 1 _ 2 n i n . ) K m n . y = I l o g A n + Z x — r ± - + Z j-^- ( 2 . 8 ) n i = l 1 1 i = l 1 X n i i i = l ^ i i T h e n o i s e v a r i a n c e s w e r e e s t i m a t e d f r o m a s e t o f T k n o w n t r a i n i n g s a m p l e s f o r e a c h c l a s s a s f o l l o w s : = m - m2 (2.9) n. n. 1 1 Although t h i s estimate i s biased by a f a c t o r ( T - l ) / T , i t does not seem l i k e l y that the performance of the a l g o r i t h m would be n o t i c e a b l y a f f e c t e d . I t i s not obvious whether an unbiased estimate would, i n f a c t , be optimal from the p o i n t of view of minimizing i n c o r r e c t c l a s s i f i c a t i o n . Since a f i n i t e number of t r a i n i n g samples were used to estimate m , these estimates o f t e n had values of e x a c t l y 0 or 1. These two v a l u e s , n. l i f s u b s t i t u t e d i n t o equations (2.9) and (2.8), would produce an undefined r e s u l t . To avoid t h i s problem, the values of m were a r b i t r a r i l y l i m i t e d n. l to between 0.01 and 0.99. 2.5 Non-Gaussian L i n e a r F i l t e r (Type IV) Another design was t e s t e d which i s not based on the assumption of Gaussian s t a t i s t i c s , but which i s s i m i l a r t o the three preceding designs i n i t s implementation. I f we maintain only the assumption of independent f e a -t u r e s , the p r o b a b i l i t y d i s t r i b u t i o n of a b i n a r y - v a l u e d f e a t u r e i s completely determined by the p r o b a b i l i t y , p , that the f e a t u r e i s a '1'. As a r e s u l t , n. l a l i n e a r f i l t e r can be designed which d i r e c t l y y i e l d s a measure of the p r o b a b i l i t y of a p a t t e r n o c c u r r i n g c o n d i t i o n e d on a p a r t i c u l a r c l a s s , as f o l l o w s : y n = l o g (P(x|n)) K = I l o g (P(x |n)) i = l K I [ l o g ( l - p ) + x . ( l o g ( p ) - l o g ( l - p ))] (2.10) . i n , 1 n . n . i = l l l l The recognition procedure i s to choose the clas s n which maximizes V For a binary feature, p i s the same as the mean value of the i feature, m n. I As for the type Type I I I f i l t e r , a value of 0 or 1 for the estimate of any m would produce an undefined r e s u l t . In t h i s case, a r b i t r a r y l i m i t s n. l of 0.001 and 0.999 were set on these estimates. These l i m i t s are d i f f e r e n t than for the Type III f i l t e r because the Type IV f i l t e r c o e f f i c i e n t s are not as s e n s i t i v e to extreme values of m . n 2.6 Stationary Covariance F i l t e r (Type V) A l l of the preceding designs were based on the assumption that the features of a pattern are independent, i n order to avoid computation of the quadratic term (X-M ) A - 1 (X-M ) t n n n It i s expected that the performance of those designs could be im-proved by making some use of the c o r r e l a t i o n among the pattern elements, without going as f a r as computing exactly the above expression. The approach taken here was to develop a f i l t e r based on the assumption that the noise co-variance i s s p a t i a l l y s t a tionary. The noise covariance between two features fo r a given class then depends only on the r e l a t i v e p o s i t i o n of the two fea-tures i n the pattern. The following shows how t h i s assumption r e s u l t s i n a form of f i l t e r which i s f e a s i b l e to implement. Consider f i r s t a s p a t i a l l y one-dimensional pattern, represented by a vector X of dimensionality K. For a given c l a s s n, the p r o b a b i l i t y d i s -t r i b u t i o n of the random component of the patterns i s described by a sym-metri c a l covariance matrix: X X n n 11 12 12 n IK IK n. KK (2.11) We assume that the covariance between any two points i n the pattern depends only on t h e i r r e l a t i v e displacement, and that the f i r s t element of the pattern vector i s adjacent to .the l a s t ( a l t e r n a t i v e l y , consider the pat-tern vector to represent one period of a p e r i o d i c p a t t e r n ) . This implies that X = A " i j " l , (j-i+l)Mod K (2.12) This means that each row of A contains the same elements as the n f i r s t row, but each row i s rotated one p o s i t i o n to the r i g h t from the pre-vious row. Noting also that A^ i s symmetrical, we obtain the form: 15. n K - l X X X X " l l n i 2 > n i Q X X ••• X X n i 2 n i l n i Q-l 1Q n 12 X X n!3 n ! 2 n 13 n 11 (2.13) where Q = — — f o r K odd. ( I t f o l l o w s from subsequent d i s c u s s i o n s that K i s always odd.) This form i s known as a symmetric c i r c u l a n t m a t r i x . Because of these p r o p e r t i e s , the eigenvectors of are e a s i l y obtained, a l l o w i n g the determinant and the i n v e r s e to be computed by d i a g o n a l i z a t i o n [B2] . The r e -s u l t i n g i n v e r s e i s a l s o a symmetric c i r c u l a n t m a t r i x , the elements of which we denote by X n. . We now d e f i n e a m a t r i x R: 0 1 0 -0 0 1 0 R = 0 1 0 0 0 0 1 0 (2.14) P o s t - m u l t i p l y i n g any matrix by R has the e f f e c t of r o t a t i n g each row of that matrix one p o s i t i o n to the r i g h t . Thus, the powers of R are a l l s h i f t e d v e r s i o n s of R. We can now w r i t e A i n the f o l l o w i n g form, where I i s the n i d e n t i t y matrix: A - 1 = A I + X R + X R 2 + • • • n n n n 1 2 + X R Q _ 1 + X R Q + • • • + X R K - 1 (2.15) n l Q n i Q . n l 2 From (2.2) we need to compute the following function f o r each un-known pattern and each hypothesis: g n(X) = log | A j + (X-Mn) A" 1 (X-M n) t = l o g |A I + XA _ 1 Xt + M A _ 1 M t - 2X A _ 1 1 n 1 n n n n n n concentrating on the d i f f i c u l t quadratic term, we obtain: XA _ 1 Xt = X[A I + X R + • • • + A R K _ 1 ] X t  n i l n!2 n ! 2 = [A , A , A 1'IXIX1", XRXfc, X R ^ V l n l l n 1 2 n l 2 (2.16) The f i r s t vector on the right-hand side of the above equation i s simply the f i r s t row of A \ which we denote by V^. The second vector, which we denote by A(X) = ( a ^ X ) , a 2(X) , a^X)}, may be inter p r e t e d as the autocorrelation function of X, since a (X) = XR1""3*1" L X. X K 3 (J+i - D M o d K (2-17> The recognition procedure can now be expressed as s e l e c t i o n of that c l a s s n which minimizes: y = log | A | + V •A t(X) - 2X A M*" + M A M*1 (2.18) Jn 1 n' n n n n n Thus the assumption of stationary covariance s t a t i s t i c s has reduced the quadratic term to a manageable l i n e a r operation i n A(X), and at the same time allowed the required determinant and inverse of to be ob-tained. The number of operations to compute XA^X*" has been reduced from order K to order K. On the other hand, each operation may be more time-consuming, since the components of A(X) are not binary-valued, whereas X has binary-valued components. (These r e s u l t s could a l t e r n a t i v e l y be derived using Fourier transform descriptions of X and A .) n Recalling that X represents a s p a t i a l l y one-dimensional pattern, we now require a method of mapping a two-dimensional pattern P into X with-out l o s i n g any information. In the case of independent features, t h i s could be done simply by arranging the elements of P as a vector i n any chosen order. This i s not s a t i s f a c t o r y when we are using covariance information, because a r e l a t i v e displacement between two points i n the vector X would not represent a unique r e l a t i v e displacement i n the o r i g i n a l p a t t e r n . For example, i f the rows of P were concatenated to form X, the l a s t element of each row would be considered adjacent to the f i r s t element of the next row. To obtain a one-to-one mapping between displacements i n the pat-tern P and the vector X, the dimensionality of X must be increased. In a square pattern of SxS points, there are S 2 + ( S - l ) 2 d i f f e r e n t p o s s i b l e r e l a -t i v e positions between two points, noting that the order of each p a i r of points i s unimportant. Also, because of t h i s symmetry, the a u t o c o r r e l a t i o n vector A(X) contains only (K+l ) / 2 d i s t i n c t values (for K odd). For each d i s t i n c t value i n A(X) to correspond to a d i f f e r e n t r e l a t i v e p o s i t i o n i n P, we require: ^ - S 2 + ( S - l ) 2 • • K = 4S 2 - 4S + 1 = (2S-1) 2 (2.19) To achieve a p a t t e r n v e c t o r of d i m e n s i o n a l i t y K, the o r i g i n a l p a t t e r n may be augmented w i t h rows and columns of zeroes to form a (2S-1) x (2S-1) p a t t e r n , as shown i n Figure 2.1. The rows of the augmented p a t t e r n may then be concatenated to form the r e q u i r e d p a t t e r n v e c t o r X. I t i s e a s i l y v e r i f i e d that a displacement i n P of i u n i t s i n the v e r t i c a l d i -r e c t i o n and j u n i t s i n the h o r i z o n t a l d i r e c t i o n maps i n t o a displacement of ( i - l ) ( 2 S - l ) + j u n i t s i n the v e c t o r X. The only remaining requirement of the a l g o r i t h m i s the e s t i m a t i o n of the s t a t i s t i c s of the n o i s e from the t r a i n i n g samples. Applying the assumption of s t a t i o n a r i t y to the formula f o r sample v a r i a n c e y i e l d s the f o l l o w i n g e s t i m a t i o n formula: 1 T K ' E [ - £ - E ( x-m_ ) C x , , _ 4 J , w , v - H I ) ] »li T t = l 1 K 3 = 1 V t J V t ( J " i + 1 ) M ° d K n(3+i+DMod K T K 1 E a . ( X) - E (m -m ) or T t = i 1 t K j = i n - j n ( j - i + l ) M o d K 1 T T t n Thus, the estimate of may be obtained from the a u t o c o r r e l a t i o n f u n c t i o n s of the t r a i n i n g samples of c l a s s n. S-J F o e " — PS1 'SS ORIGINAL PATTERN Pl1 P22 P 1 2 "T~ s as 19. S-1-J X(2S-1)2 THIS AREA CONTAINS ZEROES THIS AREA CONTAINS ORIGINAL PATTERN S-1 X2S-1 Figure 2.1 Pattern P and Augmented Pattern Vector X 3. FILTER SIMULATIONS 3.1 Description of Data Sets Two sources of character data were used to test the f i l t e r designs Munson's multi-author handprinted characters [M3], and a set of typewritten characters prepared at Cornell Aeronautical Laboratory [C3]. Both data sets were available on computer-compatible magnetic tapes. Munson's data set consists of 147 samples of each Fortran charac-ter (A to Z, 0 to 9 and ten special characters) handprinted by 49 different authors. Only 146 of the alphabets were used in these tests, due to a defect in 144th alphabet of the copy which was available. The samples show wide variation in size, style and quality. Some typical examples are shown in Figure 3.1. Each sample i s a 24 x 24 array of bits, with '1' indicating dark-ness, or the presence of the character, and '0' the background. The Cornell data base consists of 100,021 characters from the following sources: 37,666 characters from IBM Executive and Remington E l i t e typewriter fonts; 22,282 characters obtained from actual typewritten addresses and from samples of 58 different typewriter fonts; and 40,073 modified characters obtained from some of the above characters by scaling down one or both dimensions, rotating, or adding and deleting black points. These modifications were intended to simulate problems encountered in reading mail addresses. 2 1 . o • C O • o * to » • o * XM i i g l r 1 pr r r E x x |x x r. x x x x x x x ' I x X X X X X X X X o * C O » » < « * » * « o a C O » « X X X X X X X X X « » « • . • X X X X X X X X x x x x X X X X X X X X X X X X X X X x x x X X X i l l 11 | l l l X X X X * • • • • • | x l x * Ix X X lx X X & X X X \r. x x x I x & X X X 2. X X, X X X X X X X X X X X X X X x I i X X X X X X X X X X X X x x : x x: x x ! I X X X X X X X X X X s X X X X X X X X X X X X X X x X X X X X X X . X X X ........... x x x X X X x x x X X X X X — X X X X X X X X X X X X X X X X X X X x , X X X X X X X X X X X X X X X X X X X X X X X X x X X X X X X X X X X X £ X X X X X X X X X X X X X x 5 X X ' I IX IX X X _ X X X X X X X X X X X X X X X X X X X X X X X x — X X ' X X X X X X X X X X X X X x T. r x x x X X X X X _ X X X X X X X X X X X X X X X X x x x J; x x X X X x I X X X X X X X X X X X X X X X X I I I I I S I I X X X X X hr x X X x x x X X X X X X x,_ X X x|x X X X X X X X X X X X X X X x x x X X X X X X X i x X X X X X X X X X X X X X X I I X X X X X X X X I * I X X X X X x x x x x x x x x x x x x x x X X X X X X X X X X X X X : X X X X : x x : x x X X I I X X X X x x r x X X I I X X X X X X X X X X X X X X I I X X X X X X X X X X X X X X X • • • I I X X * » » » : x : X : x x : X : x : x : x x I I I I I • . • • o • I U • * O * «M • • » o • •" . » « » » » a * <M * r-l » * o » r. X X X X X X x »: x x X I I X X X X X i x X X X X X « o * IM » » t « x £ • . . •x X X X X X X i i X I I x x x x x x XX X X X X X X X X £ x x x x x x x x X X X X X X X X X X x X X X X X X X X X x X X X x J -L : •  i- ••":. • ,... • i . X X X X X : x x X X X x x x x • x x x X X X X X I x x r x x : x x x x : x x X X X x X X X X x x x X X X X fexx x x X X X X X x i I pr. x x x x x x x x : x x x X X X X X X X X X X X X X X X x. X X X X X X |X X X X X X X X X X X X X X X X T X X X X X X X X x _ x X X X X X X X X X X X X X X X X X X I x X • • •• . X X X X x i X X X X I •... . * X X X X X X X X : X X ! X X : X X i X X ' X . • ... # • ffi X X X X X X X x x x : x x x : CD a) Xi at o CO e a <u CO ro OJ H 3 |r x x x x : . r x x x : * x x _ x x : X X : : x : x : x X X X X X X X X X x x : x x : X X : This data base i s d i v i d e d i n t o n e a r l y equal t r a i n i n g and t e s t i n g s e t s . I t i n c l u d e s the 26 l e t t e r s i n upper and lower case, the ten numerals, and the characters ".,-$". As i n Munson's data base, each character i s represented by a 24 x 24 b i n a r y p a t t e r n . In the v e r s i o n of t h i s data base which was a v a i l a b l e to us, a l l the p a t t e r n s were apparently normalized to a s i z e equal to or near 16 x 16 b i t s , using an unknown alg o r i t h m . Since t h i s n o r m a l i z i n g occurred sub-sequent to the above-mentioned m o d i f i c a t i o n s to the 40,073 c h a r a c t e r s , the r e s u l t i n g s i z e of the a c t u a l characters i s h i g h l y dependent on the presence of noise b i t s o u t s i d e the l i m i t s of the o r i g i n a l c h a r a c t e r , e s p e c i a l l y f o r characters of s m a l l o r i g i n a l dimensions. Two subsets of t h i s data base were used i n t h i s work: the IBM Executive samples (which i n c l u d e no lower-case l e t t e r s ) and the set of modified c h a r a c t e r s , i n order to represent the extremes of r e c o g n i t i o n d i f f i c u l t y a v a i l a b l e i n t h i s data base. These two subsets are r e f e r r e d to i n the data base documentation as "3MIBEX" and "C0RAL2" r e s p e c t i v e l y . Some samples of each are shown i n F i g u r e 3.2. Table 3.1 l i s t s the number of each c l a s s of character i n each subset. 3.2 Preprocessing of Character P a t t e r n s 3.2.1 Munson's Data Base Munson's characters were preprocessed by s i z e - n o r m a l i z i n g i n most of the t e s t s , to reduce the v a r i a t i o n s i n s i z e and p o s i t i o n . F i r s t , the l i m i t s of the un-normalized character were determined such that each border row and column contained at l e a s t two b l a c k p o i n t s , and a l l rows and columns .. Number of Samples . Character Class 3MIBEX subset CORAL2 subset T r a i n i n g Testing T r a i n i n g Testing set. set set set A 441 241 666 ' 738 ' B 240 241 450 473 c 478 482 576 610 D 239 241 414 416 E 239 241 655 719 F 239 241 531 560 G 239 241 307 349 H 239 241 350 375 I 478 481 567 612 J 239 240 274 290 K 241 242 323 336 Upper L 240 242 606 589 M 480 482 618 672 Case H 240 239 702 714 0 240 239 789 825 P 240 239 308 305 Q 240 239 220 229 R 240 239 622 653 S . 240 239 510 551 T 240 239 416 431 V 241 243 429 441 V 241 243 287 319 w 241 243 291 300 X 241 243 228 223 Y 241 243 494 519 Z 241 243 235 238 A 0 0 155 170 B 0 0 . 43 42 c 0 0 63 69 D 0 0 49 87 E 0 0 144 177 F 0 0 116 112 G 0 0 38 64 H 0 0 88 82 I 0 0 95 107 J 0 0 28 18 K 0 0 52 78 Lower L 0 0 493 552 M 0 0 28 38 Case K 0 0 120 98 0 0 0 177 224 P 0 0 41 32 Q 0 0 22 34 R 0 0 180 173 S 0 0 77 74 T 0 0 110 116 U 0 0 106 104 V 0 0 42 42 W 0 0 48 41 X 0 0 23 18 y 0 0 43 47 z 0 0 23- 27 1 482 487 461 481 2 . 240 243 670 688 3 475 482 511 556 4 . 470 480 706 704 Numerals 5 6 236 236 239 238 326 276 380 294 7 236 238 337 361 8 472 •476 470 454 9 236 238 285 296 0 1 241 243 196 206 S p e c i a l • 241 241 243 243 352 253 362 269 Characters - 241 243 197 198 $ i 241 : — 243 194 205 Table 3.1 Number of Samples i n 3MIBEX and C0RAL2 Subsets of C o r n e l l Data Base W B O D T I 1 rrm u m t i i t n u u t u u t • " i i i i u i u m n i t i m U l l l l l ~m T r n t j i n — t X11 11111 1U11 i t l l 11111 U l t i n " X1111 11111 " r n r n — n T i n l i m n m m n m i n i n "U 11111111u ti m i l u i H i u r n l t u L H i l l IU—ITTTT n u t i n i u m H i l l n 111 u u " " " i ' i n m i i l l l l l l u i m i m i i l l l l l l l t l M i l i u m l l l l l t l H T -i n n i n n m i n i u h u u m i l m i l m i l ' " l i i u u i l u u i m u t i u i U l l U l t l l m i l i u m T u n — u u i u m u ~'t u u i i " m m m u m i n u u r u u m m 1H111_ H i i u u T TTT U l U l i i u i t i n m u m m i u i u i i r i m u u m u u I m m n u t ' u m i u u m m i n i ' m u u u m i " i f f l l l l l l l l l l U l l l l l l l l l l l l l l l ! U U 1111 H U U 1" n u n n u n u u i u TTI T U T l l l l l l l t l l l l l l l l t l T l l l l l l U u u i i i i l l l l l l l l l i i i l l m u m i i i i i i u t i " l l l i i l l U 111 111 i n u u u u i T T U i U l u u " i n t i n t i i i i i i t i i l i t i -i i n i m i i i i t u i t u i """ U U ~~7~ C O B T L I R T " C 0 R A L 2 U M C O R A L S "CoRALz—RT CdBAlJ—Co" J T L it it m i t i m i n i u u n t i l i n i n n i t t n m t m u u tnt _ t i n u u i u m m u u i UUI t i l l u u 11U 1 n u n . l u u u m m m m u u " u i i i n it n i n n n u i n u n n n n n m u u m m m i t i u i u 111 l l l l l l l l u m m u m u u u i u u u u i l t i i t i i i . u u i n u m Figure 3.2 n u m t l l U t t l l i tt u i it i i t m i m u i i i 11 I u u u t n n u i u u u m m l i i i i i i i i i i u m i l i u m i it I u i l n i i U U l l l U l t t i m u u _ • i u u " t u t t HI 1 1 t u _tn 1 U Hit u m i n i l l t l U t i i t t _ n i m u u t i U l 111 u i l l l l l l l l t 11 Samples from Cornell data base t l i n n t i i i i i i t u u i n t n n i n i n t n i l l l l U l l i i i i t u i i i 25. outside these l i m i t s contained at most one b l a c k p o i n t . The r e c t a n g l e c o n t a i n i n g the character was then d i v i d e d i n t o a 24 x 24 g r i d . Each c e l l of t h i s new g r i d whose centre l a y on a b l a c k c e l l of the o r i g i n a l c h a r a c t e r was considered b l a c k . This process has the e f f e c t of r e p e a t i n g rows and columns of the o r i g i n a l character so that i t f i l l s the 24 x 24 g r i d . For s i m p l i c i t y , no t e s t s of h e i g h t - t o - w i d t h r a t i o or other s p e c i a l t e s t s were made, even though s i z e - n o r m a l i z i n g can s e v e r e l y d i s t o r t c h a r a c t e r s w i t h small dimensions such as upper case I without s e r i f s , ","» "_"> " ( " and " ) " . Although more a t t e n t i o n to preprocessing would probably improve the r e c o g n i t i o n r a t e , the main o b j e c t i v e here was to prepare a character set which could reasonably be used to compare d i f f e r e n t f i l t e r designs. 3.2.2 C o r n e l l Data Base As mentioned p r e v i o u s l y , the C o r n e l l data base was a v a i l a b l e i n a v e r s i o n which had already been subjected to s i z e - n o r m a l i z i n g . This process was apparently s e n s i t i v e to noise b i t s whose presence could a f f e c t d e f i n i t i o n of the character boundary. No r m a l i z i n g a l s o had the e f f e c t of e l i m i n a t i n g v a l u a b l e d i s t i n c t i o n s i n the dimensions of d i f f e r e n t character c l a s s e s . Since the degree of s i z e v a r i a t i o n i n the o r i g i n a l samples i s not known, i t cannot be determined whether the n o r m a l i z i n g was d e s i r a b l e . No attempt was made to improve the n o r m a l i z i n g process. The p a t t e r n s were f u r t h e r processed before use by c e n t e r i n g such that the centre of g r a v i t y of each p a t t e r n was as near as p o s s i b l e to the centre of the 24 x 24 g r i d . Any b i t s which were s h i f t e d o u t s i d e the boundary of the g r i d by t h i s process were ignored. The p o s s i b i l i t y of r e g i s t e r i n g the c h a r a c t e r s to two edges of the g r i d was r e j e c t e d because of i t s greater s e n s i t i v i t y to s t r a y b l a c k p o i n t s and s t y l e v a r i a t i o n s such as s e r i f s . 3.2.3 A u t o c o r r e l a t i o n For the Type V f i l t e r s i m u l a t i o n s , the a u t o c o r r e l a t i o n f u n c t i o n s of the character p a t t e r n s were r e q u i r e d both i n the t r a i n i n g phase (to estimate the covariance matrix) and i n the t e s t i n g phase. From the p o i n t of view of implementation, computing the a u t o c o r r e l a t i o n s may be con-sidered as a preprocessing step. The a u t o c o r r e l a t i o n f u n c t i o n of a p a t t e r n v e c t o r X was d e f i n e d i n Section 2.6, as f o l l o w s : A(X) = { a ^ X ) , a 2 ( X ) , .. ., a R ( X ) } 1 K a i ( X ) = K j = l X j X ( j + i - D M o d K Since the components of X are b i n a r y - v a l u e d , i t would not be most e f f i c i e n t to apply the above formula d i r e c t l y . Instead, the summations were evaluated by computing the displacement between every p a i r of ' 1 ' b i t s i n a p a t t e r n , and incrementing the summation corresponding to that d i s -placement. 3.3 Simulation of Separate T r a i n i n g and T e s t i n g Samples Because of the small s i z e of Munson's data base, i t was d e s i r a b l e to use as many chara c t e r s as p o s s i b l e f o r t r a i n i n g and t e s t i n g , i n order to o b t a i n a r e l i a b l e estimate of performance IK1]. For the four algorithms which are based on independent f e a t u r e s , i t was p o s s i b l e to e x a c t l y simulate the e f f e c t of t r a i n i n g and t e s t i n g on separate samples without a c t u a l l y doing so. A l l the chara c t e r s were used f o r t r a i n i n g , but the r e s u l t s obtained are e x a c t l y those which would be obtained by the s o - c a l l e d U method of Lachenbruch and Mickey [ L I ] , i n which a l l c h aracters except the one being t e s t e d are used f o r t r a i n i n g . The computation r e q u i r e d was l e s s than f o r a r o t a t i o n method such as that of Toussaint and Donaldson IT1], w h i l e the number of t r a i n i n g samples was maxi-mized. The f o l l o w i n g s e c t i o n s d e s c r i b e how t h i s s i m u l a t i o n was achieved. 3.3.1 Type I F i l t e r In t h i s case, each f i l t e r output was f i r s t converted to a Euclidean 2 di s t a n c e measure, d R , using equation (2.4). Then the d i s t a n c e measure corresponding to the c l a s s c o n t a i n i n g the t e s t sample was c o r r e c t e d by a constant m u l t i p l i e r . The j u s t i f i c a t i o n f o r t h i s procedure i s as f o l l o w s . Assume that a p a t t e r n X i s one of T p a t t e r n s of the same c l a s s which were averaged to o b t a i n a sample mean p a t t e r n f o r that c l a s s , M^. The sample mean of the remaining T - l p a t t e r n s , denoted by , i s : M' = ™ n " X (3.1) n T - l Thus the r e q u i r e d d i s t a n c e measure which would be obtained w i t h X excluded from the t r a i n i n g sample i s : 28. ,.2 X - M' X - ™ n - X T - 1 TX T - l T - l . T .2 ,2 = ( T ^ 1 ) °n (3.2) 3.3.2 Type I I F i l t e r Again i n t h i s case, an exact expression was de r i v e d f o r the r e q u i r e d output of a f i l t e r t r a i n e d on T - l samples, e x c l u d i n g the t e s t sample. The r e s u l t s , which f o l l o w from equations (2.4), (2.7) and (3.1), are presented here: and M ' «X n 'iH* I 1 n 1 (3.3) where M ' n T M n 1 X ,2 M ' -X n T - l T |x|2 + | M ; | 2 - d ,2 3.3.3 Other L i n e a r F i l t e r s For the Type I I I and Type IV l i n e a r f i l t e r s , i t was not p o s s i b l e to c o r r e c t the f i l t e r outputs a f t e r the f a c t . However, i t was p o s s i b l e to design modified f i l t e r s which produced the d e s i r e d outputs. Since each f i l t e r c o e f f i c i e n t f o r these two f i l t e r types depends only on the mean value of the corresponding f e a t u r e i n the t r a i n i n g samples, the f i l t e r i n g operations can be expressed i n the f o l l o w i n g form where fu n c t i o n s f., and f„ depend only on the estimates m 1 2 . J n. K y = I t f, (m ) + f 9 (A ) x. ] * (3.4) -'n . - I n . 2 n. 1 1=1 l i Considering j u s t one term of the sum, l e t z. = f, (m ) + f (m ) x. (3.5) l I n . I n. l i i Using equation (3.1), expressions can be w r i t t e n f o r the sample means of the t r a i n i n g samples w i t h the t e s t sample excluded: T mn. m' = f o r x. = (5 n i T - l 1 T mn, - 1 "n m l f o r x. = 1 (3.6) n i T - l S u b s t i t u t i n g i n t o (3.5) gives expressions f o r the r e q u i r e d terms i n the f i l t e r output: T mn. T - l z'. = f- ( ~ ) f o r x. = 0 1 J- m -» -^T n ^ . - l T m - 1 z . = f ( i _ ) + f ( = - ) f o r x = 1 (3.7) 1 - " - T - l * T - 1 Because x i s bi n a r y - v a l u e d , the above two expressions can be com-i bined i n t o one expression l i n e a r i n x i Tmn H - 1 T^n.--1 T mn,-z\ = f , ( — ~ ) + x,[ f ( 1— ) + f ( i _ ) - f ( i - ) ] T - l T - l T - l T - l (3.8) This equation s p e c i f i e s the c o e f f i c i e n t s of a f i l t e r which, when operating on a v e c t o r X belonging to i t s own t r a i n i n g s e t , produces the same output as a f i l t e r which was t r a i n e d on a l l samples except the cur r e n t one. I n c i d e n t a l l y , t h i s method i s a l s o a p p l i c a b l e to the Type I f i l t e r , but not to the Type I I f i l t e r , s i n c e i n the l a t t e r case each f i l t e r co-e f f i c i e n t depends on |M | as w e l l as m^ . i 3.4 Implementation The component operations of the o v e r a l l r e c o g n i t i o n procedure used to t e s t the f i l t e r designs, and the flow of data between these o p e r a t i o n s , are i l l u s t r a t e d i n Figure 3.3. The par t of the f i g u r e enclosed by a broken l i n e corresponds to the c l a s s i f i e r s t r u c t u r e of F i g u r e 1.1. The preprocessing procedures have been described i n S e c t i o n 3.2. The f i l t e r t r a i n i n g and f i l t e r i n g stages were b a s i c a l l y s t r a i g h t f o r w a r d implementations of the formulas presented i n Chapter 2 and S e c t i o n 3.3. Since a l l of the samples used were 24 x 24 a r r a y s , each l i n e a r f i l t e r r e q u i r e d 577 c o e f f i c i e n t s ( i n c l u d i n g a constant term). Each Type V f i l t e r r e quired a f u r t h e r 1105 c o e f f i c i e n t s f o r the term i n v o l v i n g the auto-c o r r e l a t i o n of the p a t t e r n . Because of the preprocessing done, i t was not considered necessary to do any s h i f t i n g of the samples at the f i l t e r i n g stage, although t h i s might have produced some improvement i n r e c o g n i t i o n r a t e s . The d e c i s i o n stage c o n s i s t e d simply of s e l e c t i n g the maximum or minimum f i l t e r output (depending on the f i l t e r type) and r e c o r d i n g the de-c i s i o n i n a confusion matrix. No a p r i o r i p r o b a b i l i t y i n f o r m a t i o n was used ( i . e . a l l character c l a s s e s were considered equiprobable). A l l the components of the simulated r e c o g n i t i o n system were T R A I N I N G S A M P L E S P R E P R O C E S S I N G F I L T E R T R A I N I N G r F I L T E R C O E F F I C I E N T S T E S T I N G b S A M P L E S P R E P R O C E S S I N G F I L T E R I N G O P E R A T I O N C O N F U S I O N " M A T R I C E S _ J F igure 3.3 Structure of Simulated Recognit ion System implemented as P L / l programs run on a l a r g e general purpose computer. I n t e r -mediate r e s u l t s were stored e i t h e r i n d i s c f i l e s or on magnetic tapes, de-pending on the volume of data. A l l of the confusion matrices produced are contained i n the Appendix. These matrices summarize how the samples of each c l a s s were c l a s s i f i e d . Each entry i n a matr i x represents the percentage of samples of the c l a s s corresponding to the row which were assigned to the c l a s s corresponding to the column. The percentages have been rounded o f f to the nearest i n t e g e r . 4. SIMULATION RESULTS WITH LINEAR FILTERS 4.1 Summary of R e s u l t s Table 4.1 summarizes the r e c o g n i t i o n r a t e s achieved w i t h Munson's ch a r a c t e r s , using the four types of l i n e a r f i l t e r s . In a d d i t i o n to the complete 46-character alphabet, r e s u l t s are given f o r subsets c o n s i s t i n g of the 26 upper-case l e t t e r s and the 10 numerals. The f u l l set of 146 al p h a -bets was used f o r both t r a i n i n g and t e s t i n g i n a l l cases. However, i n most cases separate t r a i n i n g and t e s t i n g sets were simulated. The e r r o r r a t e f o r each t e s t i s simply 100% minus the r e c o g n i t i o n r a t e , s i n c e no p r o v i s i o n was made f o r r e j e c t i o n of ambiguous c h a r a c t e r s . The Type I f i l t e r was t e s t e d on both normalized and un-normalized c h a r a c t e r s , i n order to confirm the usefulness of s i z e - n o r m a l i z a t i o n f o r t h i s data s e t . The Type I I I f i l t e r was not t e s t e d on the 10- or 46-character s e t s , since i t s performance on the 26-character set was i n f e r i o r to that of the other l i n e a r f i l t e r s . The Type IV f i l t e r was test e d w i t h and without s i m u l a t i o n of sepa-r a t e t r a i n i n g and t e s t i n g s e t s , f o r comparison w i t h l a t e r r e s u l t s w i t h the Type V f i l t e r . On the b a s i s of the experiments w i t h Munson's data, only the Type IV was s e l e c t e d among the l i n e a r f i l t e r s f o r t e s t i n g on the C o r n e l l t y p e w r i t t e n c h a r a c t e r s . Table 4.2 shows the r e c o g n i t i o n r a t e s f o r the 3MIBEX (IBM Executive f o n t ) and C0RAL2 (mixed f o n t s w i t h m o d i f i c a t i o n s ) subsets. F i l t e r Type Samples S i z e -Normalized? Simulated Separation of T r a i n i n g and T e s t i n g Samples? Recognition Rates I Confusion J M a t r i c e s 46 FORTRAN Characters 26 L e t t e r s 10 Numerals I No 1 Yes | 64.70 66.83 78.70 A . l , A.2, A.3 I Yes | Yes 72.96 79.08 83.15 A.4, A.5, A.6 I I Yes Yes 69.07 79.00 82.05 A.7, A.8, A.9 I I I Yes Yes 78.37 A.10 IV Yes Yes 74.70 80.24 84.79 A.11, A.12, A.13 IV Yes No 76.50 81.88 86.58 A.15, A.16, A.17 Table 4.1 R e c o g n i t i o n Rates f o r L i n e a r F i l t e r s w i t h Munson's Data Recognition Rates Confusion M a t r i c e s C o r n e l l Data Base Subset 66 Characters 40 Characters 26 L e t t e r s 10 Numerals 3MIBEX (IBM E x e c u t i v e Font) 97.45 99.81 99.96 A.18, A.19, A.20 CORAL2 (Mixed Fonts w i t h M o d i f i c a t i o n s ) 81.55 87.70 93.82 96.92 A.21, A.22, A.23, A.24 Table 4.2 R e c o g n i t i o n Rates f o r Type IV F i l t e r s w i t h C o r n e l l Data 35. R e s u l t s are given f o r the e n t i r e 66-character set and three d i f f e r e n t subsets. The 40-character subset i n c l u d e s the upper case l e t t e r s , the ten numerals and the s p e c i a l characters ",.-$". In computing the o v e r a l l r e c o g n i t i o n r a t e f o r each t e s t , a l l character c l a s s e s were weighted e q u a l l y , although there are unequal numbers of samples per c l a s s i n the C o r n e l l data base. As noted e a r l i e r , a c t u a l confusion m a t r i c e s appear i n the Appendix. 4.2 D i s c u s s i o n of R e s u l t s w i t h Munson's Characters The r e s u l t s of the Type I f i l t e r s on Munson's characters show that a s u b s t a n t i a l improvement i n the o v e r a l l r e c o g n i t i o n r a t e was obtained"by s i z e - n o r m a l i z i n g the samples. However, because of the crude method of s i z e - n o r m a l i z i n g , there was an in c r e a s e i n c e r t a i n types of con f u s i o n s , and the r e c o g n i t i o n r a t e s of the f o l l o w i n g character c l a s s e s decreased: " l - . ( $ = " . For these c h a r a c t e r s , important d i s t i n g u i s h i n g c h a r a c t e r i s t i c s were obscured by the n o r m a l i z i n g process. In the subset c o n s i s t i n g of a l p h a b e t i c c h a r a c t e r s , very few types of confusions increased a f t e r n o r m a l i z i n g , and there was no character c l a s s whose r e c o g n i t i o n r a t e decreased. In the numeric subset, the r e c o g n i t i o n r a t e s of "1" and "8" de-creased, l a r g e l y due to confusions w i t h each other. On balance, the r e s u l t s i n d i c a t e that s i z e - n o r m a l i z i n g i s an appro-p r i a t e way to preprocess Munson's characters before f i l t e r i n g , e s p e c i a l l y i f the method were improved to reduce the disadvantages j u s t mentioned. Since a s i m i l a r process would l i k e l y be used i n a p r a c t i c a l r e c o g n i t i o n system, only the normalized characters were used i n a l l subsequent e x p e r i -ments. The performance of the Type I I f i l t e r s , i n which the magnitude of the p a t t e r n v e c t o r s are normalized, was s l i g h t l y i n f e r i o r to that of the Type I f i l t e r s . The decrease i n performance was l a r g e l y due to confusion w i t h the two c l a s s e s Since these two chara c t e r s f i l l most of the p a t t e r n area a f t e r s i z e - n o r m a l i z i n g , the f i l t e r s f o r these c l a s s e s tend to act as c a t c h - a l l s f o r samples from many other c l a s s e s . This k i n d of con-f u s i o n was not as common w i t h the Type I f i l t e r s because the magnitude of the p a t t e r n v e c t o r s i n the c l a s s e s are much l a r g e r than i n most other c l a s s e s , and the Type I f i l t e r s make use of both the angle and the magni-tude of the p a t t e r n v e c t o r s . In the a l p h a b e t i c and numeric subsets, the o v e r a l l r e c o g n i t i o n r a t e s decreased only s l i g h t l y as compared to the Type I f i l t e r s , w i t h no obvious trends. These r e s u l t s i n d i c a t e t h a t , f o r t h i s set of samples, any advantage gained by n o r m a l i z i n g the magnitude of the p a t t e r n v e c t o r s was more than o f f s e t by the r e s u l t i n g l o s s of i n f o r m a t i o n . The Type I I I f i l t e r s , which a l l o w f o r n o n - s t a t i o n a r y n o i s e v a r i a n c e , performed worse than the simple matched f i l t e r s (Type I ) , even though they are apparently based on broader assumptions. The reasons f o r t h i s r e s u l t were made c l e a r by subsequent o b s e r v a t i o n s , to be discussed l a t e r . The Type IV f i l t e r s , which are not based on a Gaussian model, y i e l d e d the highest o v e r a l l r e c o g n i t i o n r a t e s of a l l the l i n e a r f i l t e r s . R e l a t i v e to the Type I f i l t e r s , the r e c o g n i t i o n r a t e improved by 1.74% f o r the e n t i r e 46-character s e t , rep r e s e n t i n g a 6.9% decrease i n the number of e r r o r s . The r e c o g n i t i o n r a t e s increased f o r most c l a s s e s and decreased f o r a few, wi t h no obvious p a t t e r n . There was only one c l a s s ("H") f o r which the r e c o g n i t i o n r a t e decreased by more than 2%. Since the Type IV f i l t e r s are t h e o r e t i c a l l y optimal i f independent bin a r y features are assumed, i t seems reasonable i n r e t r o s p e c t that nothing would be gained by a f u r t h e r assumption of Gaussian s t a t i s t i c s . F urther i n v e s t i g a t i o n showed t h a t , of the two Gaussian models used i n the design of the Type I and Type I I I f i l t e r s , the model w i t h n o n - s t a t i o n a r y n o i s e v a r i a n c e i s a c t u a l l y l e s s accurate i n i t s r e p r e s e n t a t i o n of the p r o b a b i l i t y d i s t r i b u t i o n of a binary-valued f e a t u r e . Although the allowance of a d i f f e r e n t n o i se v a r i a n c e f o r each f i l t e r provides an a d d i t i o n a l degree of freedom i n designing the f i l t e r s , t h i s freedom i s misused as a r e s u l t of the Gaussian assumption. One might devise a way of choosing the mean and varia n c e f o r a Gaussian model so that i t would y i e l d the d e s i r e d values of p r o b a b i l i t y at 0 and 1, but the r e s u l t would then be i d e n t i c a l to the Type IV f i l t e r , o r i g i n a l l y developed using d i f f e r e n t arguments (see Chapter 3). On the b a s i s of the fo r e g o i n g , i t was concluded that the assumption of independent f e a t u r e s w i t h Gaussian s t a t i s t i c s was i n a p p r o p r i a t e f o r the design of l i n e a r f i l t e r s o p e rating on b i n a r y character p a t t e r n s . Another s i g n i f i c a n t aspect of the r e s u l t s i n Table 4.1 i s the over-a l l s i m i l a r i t y of the r e c o g n i t i o n r a t e s , i n s p i t e of the v a r i e t y of f i l t e r des igns . T h i s suggests tha t o ther c o n s t r a i n t s cou ld be p laced on the f i l t e r des ign w i thout much s a c r i f i c e i n per formance. For example, the f i l t e r elements cou ld be r e s t r i c t e d to a l i m i t e d number of d i s c r e t e v a l u e s , i n order to reduce the cos t o f implementa t ion or i n c r e a s e the speed o f the f i l t e r i n g o p e r a t i o n . The s i m i l a r i t y of the r e c o g n i t i o n r a t e s a l s o g i v e s some suppor t to a con jec tu re tha t the r e s u l t s of the Type IV f i l t e r s a re c l o s e to the m a x i -mum r e c o g n i t i o n r a t e s o b t a i n a b l e w i t h any l i n e a r f i l t e r s on these p a r t i -c u l a r samples. The f a c t tha t the o v e r a l l r e c o g n i t i o n r a t e s were l e s s than 2% h igher i n the t e s t i n which the t r a i n i n g and t e s t i n g samples were not separated i s c o n s i s t e n t w i t h t h i s h y p o t h e s i s . In Tab le 4 . 3 , the r e s u l t s ob ta ined w i t h the Type IV f i l t e r s a re compared w i t h o ther exper iments on the same d a t a . Wh i le exact comparisons are i m p o s s i b l e because of d i f f e r e n c e s i n exper imen ta l methods, the t a b l e shows that s e v e r a l of the o ther exper iments produced h igher r e c o g n i t i o n r a t e s than the Type IV f i l t e r s . However, a l l of those y i e l d i n g h i ghe r performance appear to i n v o l v e c o n s i d e r a b l y g rea te r computa t iona l comp lex i t y than the l i n e a r f i l t e r s combined w i t h s i z e - n o r m a l i z i n g . C o n s i d e r i n g the s i m p l i c i t y of des ign and imp lementa t ion , the performance of the Type IV f i l t e r s i s encourag ing . Examinat ion of the con fus i on m a t r i c e s from the Type IV f i l t e r t e s t s shows tha t c e r t a i n con fus ions occur red much more f r e q u e n t l y than o t h e r s . For example, i n the a l p h a b e t i c subse t , con fus i on between the p a i r s " G Q " , " I J " and " K R " , and the t r i o "HMN" account f o r about 20% of a l l e r r o r s . On the other hand, a l a r g e p r o p o r t i o n of the e r r o r s are s c a t t e r e d t h i n l y Source Character Set Recognition Rate (%) Substitution Rate (%) Rejection Rate (%) Number of Training Samples (Per Test) Number of Training Samples (Total) Recognition Method Present work 46 FORTRAN characters 74.70 25.30 6715 6716 Type IV linear f i l t e r s 26 l e t t e r s 80.24 19.76 3795 3976 10 numerals 84.79 15.21 1359 1460 U, V 89.38 10.62 291 292 A l l et a l [Al] 10 numerals 92.98 3.57 3.45 480 840 Polygonal approxi-mations, parsing Hussain et a l [H9] 26 l e t t e r s 72.5 27.50 3276 3822 Black points counted in 25 regions, Bayes c l a s s i f i e r Knoll [K3] 25 l e t t e r s ( 0 excluded) 77.8 1 9.2 13.0 3822Z 500 Characteristic l o c i table look-up Mvnson [M3] 46 FORTRAN characters 78 22 4416 . 2346 Feature templates, adaptive piecewise linear c l a s s i f i e r 46 FORTRAN characters 77 23 4416 2346 Topological Descrip-tors, adaptive piece-wise linear c l a s s i f i e r Pavlidls et a l [PI] 10 numerals 90.6 9.4 3 100 1470 Polygonal approxima-tions, decision tree Persoon et a l -[P2] 10 numerals ' 89.4 10.6 470 500 Fourier descriptors Suen et a l 1 [S2] 1 U, V 94.52 4.11 1.37 148 146 Special test based on model of human perception . 1. Includes 7.8% ambiguous decisions containing the correct decision 2. Manual t r a i n i n g ; t e s t i n g samples included i n t r a i n i n g set 3. Tr a i n i n g samples included i n t e s t i n g set Table 4.3 Comparison of Results with Munson's Data 40. throughout the m a t r i x . Of the 26 x 25 p o s s i b l e confusions w i t h i n the a l p h a b e t i c subset, about 33% a c t u a l l y occurred at l e a s t once. These ob-s e r v a t i o n s suggest that the use of a few s p e c i a l t e s t s to r e s o l v e common confusions could enhance the o v e r a l l performance c o n s i d e r a b l y , but not to the extent of a c h i e v i n g a r e c o g n i t i o n r a t e much above 90%. Munson's c h a r a c t e r s , because of t h e i r high v a r i a b i l i t y , should be considered as a very d i f f i c u l t t e s t f o r a r e c o g n i t i o n system. None of the reported experiments i n Table 4.3 achieved a r e c o g n i t i o n r a t e that would be considered s a t i s f a c t o r y f o r most p r a c t i c a l a p p l i c a t i o n s . Munson [M13] has reported that f i v e human sub j e c t s made an average of 4.5% e r r o r s when asked to recognize samples from h i s data base. This f i g u r e could be l o o s e l y considered as a l i m i t on the performance of a machine r e c o g n i t i o n system on the set of 46 FORTRAN ch a r a c t e r s . Therefore, i t i s reasonable to expect that the Type IV f i l t e r s would perform much b e t t e r on more c o n t r o l l e d or constrained handprinted c h a r a c t e r s . Because of i t s s i m p l i c i t y , t h i s f i l t e r design could be u s e f u l i n a system which combines two or more r e c o g n i t i o n techniques. For example, a set of l i n e a r f i l t e r s could form the f i r s t l e v e l of a h i e r a r c h i c a l system. The f i l t e r s could a c c u r a t e l y c l a s s i f y most of the samples, and e l i m i n a t e some choices f o r the r e s t of the samples. The higher l e v e l s of the h i e r -archy would u t i l i z e more s o p h i s t i c a t e d and time-consuming techniques, which would be invoked only when necessary. Another reason f o r combining l i n e a r f i l t e r s w i t h other r e c o g n i t i o n methods i s that d i f f e r e n t methods are s u i t a b l e f o r handling d i f f e r e n t types of v a r i a t i o n i n the samples. L i n e a r f i l t e r s have d i f f i c u l t y w i t h gross d i s t o r t i o n s of the c h a r a c t e r s , but are not ve ry s e n s i t i v e to s a l t - a n d - p e p p e r no i se or s m a l l gaps i n c h a r a c t e r s , which might confound a t o p o l o g i c a l a l g o -r i t h m . 4 .3 D i s c u s s i o n of R e s u l t s w i t h C o r n e l l Cha rac te rs Tab le 4.2 shows tha t the Type IV l i n e a r f i l t e r s y i e l d e d ve ry h igh r e c o g n i t i o n r a t e s w i t h the s i n g l e - f o n t 3MIBEX c h a r a c t e r s . In the numeric subse t , the on ly e r r o r was a " 0 " c l a s s i f i e d as a " 6 " . In the a l p h a b e t i c s u b s e t , the o n l y con fus i on which occu r red more than once was between " G " and " C " , accoun t ing f o r approx imate ly o n e - t h i r d of the e r r o r s . When the e n t i r e 40 -cha rac te r se t i s c o n s i d e r e d , the e r r o r r a t e i n -creased by an order of magnitude over the a l p h a b e t i c subse t . The e r r o r s were ma in l y due to con fus ions between the l e t te r -number p a i r s " 0 0 " , " I I " and " Z 2 " , and con fus ions i n v o l v i n g the s p e c i a l c h a r a c t e r s " . , - " . The l a t t e r type of e r r o r can be a t t r i b u t e d l a r g e l y to the s i z e -n o r m a l i z i n g p r o c e s s , which removed the c h a r a c t e r i s t i c s of s i z e and p ropo r -t i o n d i s t i n g u i s h i n g those s p e c i a l c h a r a c t e r s . A more s o p h i s t i c a t e d no rma l -i z i n g a l g o r i t h m should reduce such e r r o r s . Many of the le t te r -number con fus ions cou ld p robab ly be avo ided by d e v i s i n g s p e c i a l t e s t s f o r d i s t i n g u i s h i n g between the p a i r s ment ioned. A s p e c i a l t e s t would be a p p l i e d when the con f idence l e v e l of a samp le ' s c l a s s i -f i c a t i o n , as measured by the d i f f e r e n c e between f i l t e r o u t p u t s , was below a s p e c i f i e d t h r e s h o l d . The r e s u l t s w i t h the 3MIBEX c h a r a c t e r s i n d i c a t e , as expec ted , tha t the Type IV f i l t e r s are capable of a c h i e v i n g high r e c o g n i t i o n r a t e s on machine-printed characters of a s i n g l e f o n t . However, t h i s o b s e r v a t i o n does not provide much in f o r m a t i o n on the m e r i t s of the technique used, s i n c e other techniques would achieve high r e c o g n i t i o n on c h a r a c t e r s of s i m i l a r q u a l i t y . Probably the more important aspect of the r e s u l t s i s that a s i g -n i f i c a n t number of confusions between s i m i l a r c h a r a c t e r s d i d occur. Table 4.3 a l s o summarizes the r e s u l t s of the Type IV f i l t e r s w i t h the CORAL2 c h a r a c t e r s . These samples are intended to simulate the v a r i e t y and q u a l i t y of characters encountered i n a r e l a t i v e l y u n c o n t r o l l e d machine-p r i n t r e c o g n i t i o n a p p l i c a t i o n , such as p o s t a l address reading. The o v e r a l l r e c o g n i t i o n r a t e s obtained are intermediate between those f o r the 3MIBEX characters and those f o r Munson's c h a r a c t e r s . The confusion m a t r i x f o r the e n t i r e 66-character set shows that con-f u s i o n s o f t e n occurred between the upper- and lower-case v e r s i o n s of many l e t t e r s . This type of confusion accounts f o r approximately 27% of the e r r o r s i n the 66-character s e t . I f these e r r o r s are not counted, the e r r o r r a t e f o r the 66-character set approaches that f o r the 40-character s e t , which excludes lower-case l e t t e r s . The confusion matrices f o r the C0RAL2 ch a r a c t e r s q u a l i t a t i v e l y resemble those f o r Munson's c h a r a c t e r s , i n that a l a r g e p r o p o r t i o n of the e r r o r s are s c a t t e r e d t h i n l y through the m a t r i c e s . This i n d i c a t e s that the use of some s p e c i a l t e s t s to r e s o l v e common confusions would not improve the r e c o g n i t i o n r a t e to above 98%. The s c a t t e r e d e r r o r s can be a t t r i b u t e d to four major causes: the s e n s i t i v i t y of the s i z e - n o r m a l i z i n g process to s t r a y b l a c k p o i n t s ; the v a r i a t i o n s i n s t y l e among the v a r i o u s f o n t s ; character degradation due to poor p r i n t q u a l i t y and other n o i s e ; and character r o t a t i o n . The f i r s t of these f a c t o r s could probably be much reduced by a more s o p h i s t i c a t e d normal-i z i n g a l g o r i t h m . Some of the d i f f e r e n c e s among fo n t s could be accommodated by using more than one f i l t e r per character c l a s s , . c o r r e s p o n d i n g to major s t y l e v a r i a t i o n s . E x t r a f i l t e r s could a l s o be added to detect r o t a t e d v e r s i o n s of those characters whose r e c o g n i t i o n i s s e n s i t i v e to r o t a t i o n . Since the r e l a t i v e importance of the d i f f e r e n t causes of e r r o r s has not been i n v e s t i g a t e d , the p o t e n t i a l gain due to these suggested enhance-ments i s not known. The extent to which the r e c o g n i t i o n r a t e could be improved by such measures would determine whether the Type IV f i l t e r s could form the b a s i s of a p r a c t i c a l m u l t i - f o n t r e c o g n i t i o n system. I f not, i t could s t i l l be u s e f u l i n c o n j u n c t i o n w i t h other techniques, as discussed i n the preceding s e c t i o n . M i l s o n and Rao [Ml] have reported some experiments using the three character c l a s s e s "BE8" from the C o r n e l l data base. Their r e c o g n i t i o n system was s i m i l a r to the Type I matched f i l t e r design, w i t h two important d i f f e r e n c e s . F i r s t l y , the f i l t e r i n g was c a r r i e d out i n the s p a t i a l f r e -quency domain, i n order to achieve i n v a r i a n c e to the p o s i t i o n of the sample. Secondly, the f i l t e r s were based on i d e a l prototypes of the IBM Executive font (the f o n t of the 3MIBEX samples) r a t h e r than mean p a t t e r n s estimated from many t r a i n i n g samples. The reported s u b s t i t u t i o n r a t e s were 3.4% f o r the 3MIBEX samples, and 18% f o r the C0RAL2 samples. In the experiment reported here w i t h the Type IV f i l t e r s and the 3MIBEX samples, there were no s u b s t i t u t i o n s at a l l among the group "BE8". For the C0RAL2 samples, the confusion m a t r i x shows an average s u b s t i t u t i o n r a t e of about 6% w i t h i n the group ( t h i s f i g u r e could be a l i t t l e higher i f the d e c i s i o n s were r e s t r i c t e d to those three c l a s s e s ) . Since the p r o v i s i o n f o r i n v a r i a n c e to sample p o s i t i o n could be expected to improve r e c o g n i t i o n , the higher e r r o r r a t e s obtained by M i l s o n and Rao are probably due to t h e i r use of i d e a l prototype ch a r a c t e r s . I f so, t h i s i n d i c a t e s that a l i n e a r f i l t e r based on a proba-b i l i s t i c model of character p a t t e r n s can perform s u b s t a n t i a l l y b e t t e r than a f i l t e r or mask based on a prototype, even f o r samples of a s i n g l e f o n t . 5. SIMULATION RESULTS WITH STATIONARY COVARIANCE FILTERS 5.1 Summary of R e s u l t s Table 5.1 summarizes the r e c o g n i t i o n r a t e s obtained using the s t a -t i o n a r y covariance f i l t e r s (Type V) on Munson's c h a r a c t e r s . The r a t e s obtained w i t h the Type I and Type IV l i n e a r f i l t e r s are a l s o i n c l u d e d f o r comparison. In one t e s t , a l l 146 alphabets were used f o r both t r a i n i n g and t e s t i n g . In the second set of t e s t s , the data base was d i v i d e d i n t o sub-sets of 49, 49 and 48 alphabets, each subset c o n s i s t i n g of one alphabet from each of the 49 authors (except f o r the one missing a l p h a b e t ) . F o l l o w -ing the r o t a t i o n method of Toussaint and Donaldson [ T l ] , each of the three subsets was i n turn used as the t e s t s e t , w i t h the remaining two as the t r a i n i n g s e t . The r e s u l t s of each r o t a t i o n and the average of the three are given i n the t a b l e . Table 5.2 summarizes the r e c o g n i t i o n r a t e s obtained using the Type V f i l t e r s on the 3MIBEX and C0RAL2 subsets of the C o r n e l l data base. The t r a i n i n g and t e s t i n g sets were completely separate i n these t e s t s . The corresponding r e s u l t s of the Type IV f i l t e r s are again i n c l u d e d f o r comparison. 5.2 D i s c u s s i o n of R e s u l t s w i t h Munson's c h a r a c t e r s Table 5.1 shows that the Type V f i l t e r s y i e l d e d s i g n i f i c a n t l y lower o v e r a l l r e c o g n i t i o n r a t e s than e i t h e r the Type I or Type IV f i l t e r s on Munson's characters w i t h separate t r a i n i n g and t e s t i n g s e t s . On the F i l t e r Type D e s c r i p t i o n of and T e s t i n g T r a i n i n g Recognition Rates Confusion M a t r i c e s Sets 46 FORTRAN Characters 26 L e t t e r s 10 Numerals V A l l samples used f o r t r a i n i n g and t e s t i n g 86.28 A. 25 Separate t r a i n i n g and t e s t i n g samples 1st r o t a t i o n 70.98 76.61 76.94 2nd r o t a t i o n 70.63 75.20 82.65 3rd r o t a t i o n 71.78 79.01 77.92 average 71.13 76.92 79.18 A.26, A.27, A.28 I Simulated separate t r a i n i n g and t e s t i n g sets 72.96 79.08 83.15 A.4, A.5, A.6 IV Simulated separate t r a i n i n g and t e s t i n g sets 74.70 80.24 84.79 A.11, A.12, A.13 A l l samples used f o r t r a i n i n g and t e s t i n g 76.50 81.88 86.58 A.15, A.16, A.17 Table 5.1 R e c o g n i t i o n Rates f o r Type V F i l t e r s w i t h Munson's Data, Compared to Rates f o r Type I and Type IV F i l t e r s . F i l t e r Type C o r n e l l Data Base Subset R e c o g n i t i o n Rates Confusion M a t r i c e s 66 Characters 66 Characters 26 L e t t e r s 10 Numerals V 3MIBEX 98.35 99.72 99.88 A.29, A.30, A.31 C0RAL2 81.58 88.04 93.48 96.91 A.32, A.33, A.34, A.35 IV 3MIBEX 97.45 99.81 99.96 A.18, A.19, A.20 C0RAL2 81.55 87.70 93.82 96.92 A.21, A.22, A.23, A.24 Table 5.2 R e c o g n i t i o n Rates f o r Type V F i l t e r s w i t h C o r n e l l Data, Compared to Rates f o r Type IV F i l t e r s . 48. other hand, when a l l the characters were used f o r both t r a i n i n g and t e s t i n g , the Type V f i l t e r s y i e l d e d much higher r a t e s . Comparing the confusion matrices of the Type IV and Type V f i l t e r s w i t h separate t r a i n i n g and t e s t i n g s e t s , the q u a l i t a t i v e appearance of the matrices i s q u i t e s i m i l a r . In the complete 46-character s e t , most of the confusions which occurred w i t h the Type IV f i l t e r s a l s o occurred w i t h the Type V f i l t e r s , to v a r y i n g e x t e n t s . Most of the d i f f e r e n c e s between the two matrices are small and s c a t t e r e d . However, there were l a r g e i n c r e a s e s i n a few types of confusions, such as the p a i r s "CG" and "NW", w i t h the Type V f i l t e r s . I t seems reasonable that the cha r a c t e r s i n these p a i r s would have s i m i l a r covariance m a t r i c e s , thereby reducing t h e i r d i s t i n g u i s h -a b i l i t y by the Type V f i l t e r s . There were 14 character c l a s s e s whose r e c o g n i t i o n r a t e was higher w i t h the Type V than the Type IV f i l t e r s , thus going a g a i n s t the o v e r a l l trend. Some of these, such as "T1+", have simple shapes which could be expected to produce d i s t i n c t i v e covariance m a t r i c e s . For o t h e r s , such as "D4$", there does not appear to be such a simple e x p l a n a t i o n f o r the improved r e c o g n i t i o n . For seven of these 14 character c l a s s e s , there was a l s o a decrease i n the t o t a l number of confusions of other c l a s s e s w i t h these c l a s s e s . This i n d i c a t e s that f o r at l e a s t some c l a s s e s , s p e c i f i c a l l y "DTYl4+$", there was a r e a l i n c r e a s e i n performance w i t h the Type V f i l t e r s , and not j u s t a b i a s i n favor of c e r t a i n c l a s s e s . Examination of the confusion matrices f o r the 26- and 10-character subsets leads to observations s i m i l a r to the above. The decrease i n r e c o g n i t i o n r a t e s r e l a t i v e to the Type IV f i l t e r s w i t h separate t r a i n i n g and t e s t i n g sets may be p a r t l y a t t r i b u t e d to the 2 f a c t that the t r a i n i n g set f o r the Type V f i l t e r s was only /^ as l a r g e as f o r the Type IV f i l t e r s i n each t e s t , due to the r o t a t i o n method. However, t h i s f a c t o r alone cannot account f o r a performance that was a c t u a l l y i n f e r i o r to any of the l i n e a r f i l t e r s . The main reason f o r the d i f f e r e n c e must be i n the design of the f i l t e r s , and more s p e c i f i c a l l y , i n the method of computing the f i l t e r c o e f f i c i e n t s . Since each Type V f i l t e r c o n s i s t s of a l i n e a r o p e r a t i o n on the p a t t e r n v e c t o r , combined w i t h a l i n e a r o p e ration on the a u t o c o r r e l a t i o n v e c t o r of the p a t t e r n , i t f o l l o w s that f o r some choice of c o e f f i c i e n t s , such a f i l t e r must be able to perform at l e a s t as w e l l as any given f i l t e r c o n s i s t i n g only of a l i n e a r o p e r a t i o n on the p a t t e r n v e c t o r . The f a c t that a lower p e r f o r -mance was a c t u a l l y achieved i n d i c a t e s that the set of assumptions invoked i n designing the Type V f i l t e r s was i n a p p r o p r i a t e . The comparison of the v a r i o u s l i n e a r f i l t e r s has al r e a d y shown that a model based on the assumption of independent Gaussian f e a t u r e s was not the best one. The assumption of s p a t i a l l y s t a t i o n a r y noise covariance has apparently, on the whole, compounded the e r r o r of the e a r l i e r assump-t i o n . This s i t u a t i o n i s analogous to that of the Type I I I f i l t e r s , which performed worse than the Type I f i l t e r s , even though the former design was apparently more general than the l a t t e r . In t h a t case, i t was shown that a f i l t e r having the same s t r u c t u r e as the Type I I I but a d i f f e r e n t s et of c o e f f i c i e n t s , namely the Type IV f i l t e r , could i n f a c t achieve b e t t e r performance than e i t h e r the Type I or the Type I I I . As s t a t e d above, some f i l t e r s must e x i s t w i t h the s t r u c t u r e of the Type V f i l t e r s which perform at l e a s t as w e l l as, and most probably b e t t e r than, any of the l i n e a r f i l t e r s . The f a c t that b e t t e r performance was a c t u a l l y achieved by the Type V f i l t e r s f o r some character c l a s s e s may be one i n d i c a t i o n of t h i s p o t e n t i a l . Further evidence i s given by the high performance of the Type V f i l t e r s when the same ch a r a c t e r s were used f o r t r a i n i n g and t e s t i n g . This r e s u l t suggests that f i l t e r s of t h i s s t r u c t u r e could perform w e l l i n r e c o g n i z i n g samples which are of high v a r i a b i l i t y , provided the v a r i a t i o n s are w e l l represented i n the t r a i n i n g s e t . However, i t i s r i s k y to draw any c o n c l u s i o n s o l e l y from t h i s r e s u l t , s i n c e the d i m e n s i o n a l i t y of the f i l t e r s i s q u i t e l a r g e compared w i t h the sample s i z e . In summary, the experiments w i t h the Type V f i l t e r s on Munson's characters have shown that the f i l t e r s t r u c t u r e shows some promise, but the model on which i t s design was based was i n a p p r o p r i a t e . However, there i s l i t t l e b a s i s from which to i n f e r the extent of the p o s s i b l e improvement over a l i n e a r f i l t e r s t r u c t u r e , or how t h i s improvement might be r e a l i z e d . In the course of these experiments, some p o t e n t i a l l y u s e f u l obser-v a t i o n s were made p e r t a i n i n g to the c o e f f i c i e n t v e c t o r s (V ), which were extr a c t e d from the f i r s t row of each i n v e r s e covariance m a t r i x (A ^ ) . n F i r s t l y , i t was noted that the l a r g e s t components of V n were those c o r r e s -ponding to small p h y s i c a l displacements i n the p a t t e r n , and the magnitude of the components decreased r a p i d l y w i t h i n c r e a s i n g displacement. In most cases, a l l the components corresponding 'to displacements gr e a t e r than about 5 c e l l s were smaller than .01 times the l a r g e s t component. This observa-t i o n i s c o n s i s t e n t w i t h the i n t u i t i v e idea t h a t , f o r handprinted c h a r a c t e r s , the c o r r e l a t i o n between d i s t a n t p o i n t s should be s m a l l compared to nearby p o i n t s . Although i t has been shown here that the values of V n were not optimal w i t h respect to r e c o g n i t i o n r a t e s , i t i s probable that the optimal f i l t e r s of t h i s s t r u c t u r e would e x h i b i t a s i m i l a r p a t t e r n . The s i g n i f i c a n c e of t h i s observation i s that most of the components of each could probably be approproximated as zeroes, thus g r e a t l y reducing the number of computa-t i o n s r e q u i r e d , w i t h l i t t l e e f f e c t on the r e s u l t s . I f so, then the Type V f i l t e r s t r u c t u r e might be f e a s i b l e even i f the g a i n i n performance r e l a t i v e to a l i n e a r f i l t e r were not l a r g e . Another notable observation was that the V f o r the v a r i o u s n character c l a s s e s were a l l q u i t e s i m i l a r . Although the d i f f e r e n c e s among them would presumably be important i n d i s t i n g u i s h i n g the c l a s s e s , i t i s p o s s i b l e that the use of a s i n g l e covariance m a t r i x f o r a l l c l a s s e s would s t i l l be b e t t e r than using no covariance i n f o r m a t i o n at a l l . I f only one covariance matrix i s used, the term XA ^X t i s the same f o r a l l c l a s s e s f o r n a given sample, and t h e r e f o r e need not be computed. In t h i s case, i t i s s u f f i c i e n t to choose the c l a s s which minimizes the l i n e a r f u n c t i o n : y (X) = M A" 1 M t - 2XA _ 1M t (5.1) •'n n n n Since the quadratic term has been e l i m i n a t e d , there i s no need f o r r e s t r i c t i n g A to being a c i r c u l a n t m a t r i x , and t h e r e f o r e the noise s t a t i s -t i c s need not be r e s t r i c t e d to being s t a t i o n a r y . The Type I f i l t e r i s a s p e c i a l case of equation (5.1) i n which A = 1. This f i l t e r d i f f e r s from the Type I and the other l i n e a r f i l t e r s d escribed i n Chapter 2 i n that i t s design i n c o r p o r a t e s some covariance i n f o r m a t i o n . This f i l t e r might a l s o be viewed as a v a r i a t i o n of the Type I f i l t e r i n which the mean patterns M n are modified by a l i n e a r t r a n s -formation A Assuming that the e n t r i e s i n A ^ would be mostly p o s i t i v e , as was the case f o r the c i r c u l a n t matrices A \ t h i s t r a n s f o r m a t i o n would n e s s e n t i a l l y be a smoothing operation. This suggests that the f i l t e r des-c r i b e d by equation (5.1) might be most u s e f u l i n a s i t u a t i o n i n which only a s m a l l t r a i n i n g sample i s a v a i l a b l e , because the smoothing e f f e c t would compensate f o r some of the randomness of the t r a i n i n g sample. (Stated another way, i f the estimates of M n are worse, there i s a greater r e l a t i v e advantage i n using A i n the f i l t e r design. Since A i s assumed to be the same f o r a l l c l a s s e s , i t s estimate i s based on a much l a r g e r sample than any of the Mn-) For these reasons, i t may be worthwhile to make an experimental comparison of the above f i l t e r w i t h the other l i n e a r f i l t e r designs. 5.3 D i s c u s s i o n of R e s u l t s w i t h C o r n e l l Characters Table 5.2 shows that the performance of the Type V f i l t e r s on the C o r n e l l characters followed very c l o s e l y the corresponding r e s u l t s f o r the Type IV f i l t e r s . The d i f f e r e n c e s i n o v e r a l l r e c o g n i t i o n r a t e s were a l l l e s s than 1%, and most were much l e s s . O v e r a l l , the Type V f i l t e r s per-formed s l i g h t l y b e t t e r on the 40- and 66-character s e t s , but s l i g h t l y worse on the 10- and 26-character s e t s . The confusion matrices f o r the 3MIBEX ch a r a c t e r s show t h a t , i n the 10- and 26-character s e t s , the Type V f i l t e r s produced most of the confu-sions a l s o produced by the Type IV f i l t e r s , as w e l l as some new confusions, r e s u l t i n g i n a s l i g h t l y lower o v e r a l l r e c o g n i t i o n r a t e . The only type of confusion which occurred i n s i g n i f i c a n t numbers was between the l e t t e r s "G" and " C " . I f t h i s p a i r were exc l uded , the r e c o g n i t i o n r a t e s f o r the d i f f e r e n t f i l t e r s would be n e a r l y i d e n t i c a l . In the complete 40 -cha rac te r s e t , the Type V f i l t e r s produced a s l i g h t l y b e t t e r r e c o g n i t i o n r a t e , main ly due to fewer con fus ions between the p a i r s " I I " , " 0 0 " and " Z 2 " , and w i t h i n the group S t i l l , these types of con fus ions accounted f o r most of the e r r o r s . These r e s u l t s i n d i c a t e that a f i l t e r which uses cova r iance i n f o r -mation i s capable of d i s t i n g u i s h i n g some s i n g l e - f o n t c h a r a c t e r s of s i m i l a r shape b e t t e r than the bes t l i n e a r f i l t e r which was t r i e d . However, the degree of improvement was not s u f f i c i e n t to j u s t i f y the use of t h i s p a r t i -c u l a r f i l t e r des ign i n a p r a c t i c a l s i t u a t i o n . With the C0RAL2 c h a r a c t e r s , the Type V f i l t e r s aga in y i e l d e d mar-g i n a l l y lower o v e r a l l r e c o g n i t i o n r a t e s than the Type IV f i l t e r s on the 10-and 26 -cha rac te r s u b s e t s . However, the d i f f e r e n c e s between the c o r r e s -ponding con fus ion ma t r i ces f o r the two f i l t e r types are g rea te r than might be expected from the c l oseness of the o v e r a l l r a t e s . The r e c o g n i t i o n r a t e s f o r i n d i v i d u a l cha rac te r c l a s s e s d i f f e r more o f t e n than not between the two f i l t e r t y p e s , by as much as 7%. The m a t r i c e s from the Type V f i l t e r t e s t s have about 13% fewer o f f - d i a g o n a l e n t r i e s , an i n d i c a t i o n tha t the e r r o r s tended to be concent ra ted i n t o fewer t y p e s . A s i d e from t h i s , no s i g n i f i c a n t t rends are apparent i n the d i f f e r e n c e s between the m a t r i c e s . In the 40 - and 66 -cha rac te r s e t s , the h igher r e c o g n i t i o n r a t e s ach ieved by the Type V f i l t e r s as compared to the Type IV f i l t e r s a re more than accounted f o r by h ighe r r e c o g n i t i o n of the two c h a r a c t e r s " . , " • I f these two were not i nc luded i n the t o t a l s , the Type IV f i l t e r s would have again produced s l i g h t l y higher o v e r a l l r e c o g n i t i o n . Once again, the d i f f e r e n c e s between the confusion matrices are many, but w i t h no obvious p a t t e r n . The number of letter-number confusions was about 10% lower w i t h the Type V f i l t e r s , w h i l e the number of confusions between upper and lower case v e r s i o n s of the same l e t t e r was about 3% higher. These observations do not lead to any new conclusions about the merits of the Type V f i l t e r s t r u c t u r e . While there was an improvement over the Type IV f i l t e r i n some r e s p e c t s , the o v e r a l l r e s u l t s do not demon-s t r a t e that the Type V f i l t e r i s b e t t e r i n a p r a c t i c a l sense. The f a c t that there were wide d i f f e r e n c e s between the two f i l t e r designs i n terms of the r e c o g n i t i o n r a t e s f o r p a r t i c u l a r character c l a s s e s leads to s p e c u l a t i o n that there are improvements to be made over both designs, p o s s i b l y by combining aspects of each. On the other hand, these d i f f e r e n c e s may be e s s e n t i a l l y n o i s e , owing to many characters of poor q u a l i t y whose c l a s s i f i c a t i o n i s s e n s i t i v e to s m a l l d i f f e r e n c e s i n f i l t e r design. I t i s l i k e l y that such characters would not be recognized w i t h confidence by any scheme as u n s o p h i s t i c a t e d as those used here. More p o s i t i v e c onclusions regarding t h i s matter might be reached by studying the confidence of the c l a s s i f i c a t i o n s made, as measured by the d i f f e r e n c e between the highest and next highest f i l t e r outputs. 6. CONCLUSION 6.1 Summary of Conclusions The t e s t s of the four l i n e a r f i l t e r designs w i t h Munson's hand-p r i n t e d characters showed that the Type IV f i l t e r s y i e l d e d c o n s i s t e n t l y higher r e c o g n i t i o n r a t e s than any of the o t h e r s , although not by a wide margin. The Type IV f i l t e r s were based only on the assumption of indepen-dent binary-valued f e a t u r e s . I t was concluded that the use of a Gaussian model, l e a d i n g to the matched f i l t e r design and i t s v a r i a t i o n s , o f f e r e d no advantage i n the design of l i n e a r f i l t e r s f o r b i n a r y character p a t t e r n s , as long as independent f e a t u r e s were assumed. Although b e t t e r r e s u l t s have been obtained by others (see Table 4.3) using more s o p h i s t i c a t e d r e c o g n i t i o n schemes w i t h Munson's character samples, the s i m p l i c i t y of the l i n e a r f i l t e r s suggests that the Type IV design may be u s e f u l i n a handprint r e c o g n i t i o n system which combines more than one technique, p o s s i b l y i n a h i e r a r c h i c a l s t r u c t u r e . The t e s t s of the Type IV f i l t e r s w i t h the 3MIBEX subset of the C o r n e l l data base i n d i c a t e d that t h i s design i s capable of y i e l d i n g recog-n i t i o n r a t e s of approximately 99% w i t h s i n g l e - f o n t machine-printed c h a r a c t e r s . For many a p p l i c a t i o n s , these r a t e s would be acceptable. How-ever, the r e s u l t s w i t h the m u l t i - f o n t C0RAL2 subset i n d i c a t e d that a sub-s t a n t i a l improvement would be necessary to o b t a i n e r r o r r a t e s above 95% f o r characters of such q u a l i t y . This improvement might be achieved i n p a r t by the use of e x t r a f i l t e r s to accommodate v a r i a t i o n s In the samples, and s p e c i a l t e s t s to d i s t i n g u i s h between commonly confused c l a s s e s . In an e f f o r t to design a f i l t e r which could make some use of the c o r r e l a t i o n among the fea t u r e s i n a p a t t e r n , without the cost of a f u l l q uadratic f i l t e r , the Type V f i l t e r was developed. I t was shown t h a t , i f the f e a t u r e s of a p a t t e r n are assumed to have Gaussian d i s t r i b u t i o n s w i t h s p a t i a l l y s t a t i o n a r y covariance, the optimal f i l t e r r e q u i r e s only a l i n e a r operation on the p a t t e r n v e c t o r and a l i n e a r o p e r a t i o n on the a u t o c o r r e l a -t i o n v e c t o r of the p a t t e r n . E x p e r i m e n t a l l y , the Type V f i l t e r s produced lower o v e r a l l r e c o g n i -t i o n r a t e s than any of the other f i l t e r designs w i t h Munson's c h a r a c t e r s . I t was concluded that the assumption of Gaussian s t a t i s t i c s was not an appropriate b a s i s f o r designing a f i l t e r of t h i s s t r u c t u r e to operate on bina r y p a t t e r n s . However, there were some i n d i c a t i o n s that a f i l t e r of the s t r u c t u r e of the Type V f i l t e r could achieve b e t t e r performance than the simple l i n e a r f i l t e r s , provided that a s u i t a b l e method of choosing the f i l t e r c o e f f i c i e n t s could be found. Further t e s t s of the Type V f i l t e r s w i t h the C o r n e l l data base supported these c o n c l u s i o n s . In the course of designing the experimental s i m u l a t i o n s of the l i n e a r f i l t e r s , two d i s t i n c t methods were developed f o r e x a c t l y s i m u l a t i n g separate t r a i n i n g and t e s t i n g samples, w h i l e using a l l of the a v a i l a b l e samples f o r t r a i n i n g . One of these methods i s s u f f i c i e n t l y general that i t could be a p p l i e d to any l i n e a r f i l t e r design i n which each f i l t e r co-e f f i c i e n t depends only on the sample mean of the corresponding b i n a r y f e a t u r e . 57. 6.2 Suggestions f o r Further Work The main purpose of t h i s work has been to evaluate the me r i t s of a group of f i l t e r designs f o r character r e c o g n i t i o n . While some conc l u s i o n s have been reached, there are s e v e r a l areas i n which f u r t h e r i n v e s t i g a t i o n may be worthwhile. The Type IV f i l t e r d esign, which was apparently the best of the l i n e a r designs, should be compared more e x t e n s i v e l y w i t h other a v a i l a b l e . r e c o g n i t i o n techniques, p a r t i c u l a r l y those which are of s i m i l a r complexity. For example, there are a v a r i e t y of non-parametric and i t e r a t i v e a l g o -rithms by which l i n e a r f i l t e r s could be designed [DI, N3, Y l ] . Another p o s s i b l e l i n e a r f i l t e r design was described by (5.1). A favourable com-pa r i s o n of the Type IV design w i t h these other techniques would j u s t i f y i t s f u r t h e r development, i n c l u d i n g the enhancements suggested i n Chapter 4. A l s o , the e f f e c t of some d e t a i l s such as the a r b i t r a r y l i m i t s on the f i l t e r c o e f f i c i e n t s could be i n v e s t i g a t e d . I t was suggested i n Chapter 5 that the s t r u c t u r e of the Type V f i l t e r had the p o t e n t i a l to perform better- f o r some choice of f i l t e r co-e f f i c i e n t s . While there are v a r i a t i o n s on the model described i n Se c t i o n 2.6 which might be explored, some other approach would probably be more produc-t i v e . One approach would be to develop a p r o b a b i l i s t i c model which provides f o r s p a t i a l l y s t a t i o n a r y dependence of the fe a t u r e s i n a p a t t e r n , without the assumption of Gaussian s t a t i s t i c s . Chow's r e c o g n i t i o n method usi n g neighbour dependence [C2] might be a u s e f u l s t a r t i n g p o i n t . 58. Another p o s s i b i l i t y would be to design a l i n e a r f i l t e r to operate on the a u t o c o r r e l a t i o n v e c t o r of each p a t t e r n , whose output would be added to the output of a l i n e a r f i l t e r (such as the Type IV) operating on the p a t t e r n v e c t o r i t s e l f . This approach might be aided by some e a r l i e r work i n which only the a u t o c o r r e l a t i o n of a p a t t e r n , or i t s F o u r i e r transform, was used i n r e c o g n i t i o n experiments ( f o r example, [ H 6 ] ) . In designing a f i l t e r to operate on a u t o c o r r e l a t i o n v e c t o r s , a Gaussian model may be more a p p r o p r i a t e s i n c e the components of the a u t o c o r r e l a t i o n v e c t o r are not b i n a r y - v a l u e d . A t h i r d approach would be to t r e a t the o r i g i n a l p a t t e r n v e c t o r , augmented by the a u t o c o r r e l a t i o n v e c t o r , as one f e a t u r e v e c t o r which i s operated on by a set of l i n e a r f i l t e r s . In t h i s case, as s t a t e d above, there are a v a r i e t y of methods by which the f i l t e r s could be designed. A P P E N D I X CONFUSION MATRICES CECISICNS U l CECISICHS (o A S e 0_ e C _ K _ T J X _ l K N 0 C . <« S T 'U V V X Y Z 1 2 3 4 5' 6 7 «• 9 0 ( _ • / i 1 A 1 1 6 2 1 1 1 2 1 3 . 1 1 2 1 1 1 1 1 2 1 1 p 4 •) i •\ 1 • * 4 1 1 1 3 c i 66 1 11 6 1 1 1 6 3 C 0 2 61 i 1 3 1 10 1 2 '1 3 3 1 1 2 1 5 . . 1 0 E I 3 3 3 1 _1_ 1 4 1 1 1 '1 1 1 . 3 9 1 E F 1 ' 1 77 " " l Y i 10 1 1 1 2 f " 1 1 F c 1 I 53 1 ; 1 3 1 12 1 2 1 1 1 1 12 • 1 1 1 2 1 1 C M 3 \ 1 1 17 i 1 ? 1 I 4 1 1 1 4. H I 1 57 9 4 5 s 2 1 5 1 1 3 1 1 I J 1 9 ' 0 .2 o 4 1 2 3 3 1 1 8 2 J K 3 2 _4. _3._ 1 64 2 I 3 5 « ] 1 6 1 1 1 1 K L 46 4 1 1 3 ~i 11 3 3 t. .y i 8 74 & 1 1 1 Z 1 1 1 1 M V 1 1 1 0 i ! t- 55 1 • i •* i 1 1 1 1 0 1 5 5 J I 1 1 59 3 2 2 1 1 1 4 1 1 5 1 0 p 3 1 1 12 1 1 2 47 1 1 1 3 5 1 1 J 1 p o 3 1 a 1 _1_ 1 j 68 1 ' 1 i 2 1 _4 —V .v 1. .;_ 9_ R " 7 1 _._ ~ 2 • " 2 T s 1 2 1 2 . 5 46 3 3 1 3 1 . 1 n S 1 1 1 2 I 3 1 55 1 1 1 2 •4 3 1. 5 1 9 1 3 s T 8 •» 1 \ 1 M 4 • 4 2 4, 1 1 J 7 U 2 1 3 2 1 1 5 2 1 62 9 4 1 3 1 3 V I 1 1 3 4 1 .1 8 59 t 5 1 5 1 1 1 4 1 V . i 12 3 ? 76 2 W ', X 3 ~ 1 — — — - 3 1 1 2 70 8 1 1 1 2 1 1 1 1 X Y X •» 3 i > 4 2 62 1 2 3 1 9 1 2 1 Y 1 1 1 9 2 1 f 1 1 . 7 . — ^ - 77 1 1 11 2 4 1 1 1 1 2 2 • \ 3 I 7 1 1 8 5 47 1 5 3 1 1 1 I 1 6 1 1 2 l 2 2 4 _.1_ . _ 3 .5.3. 1 1_ 11 l . JS 3_ ~4 •i " i ~ i 1 1 2 9 1 5 1 37 " T 1 . 10 2 1 1 T 10 2 Z 4 5 1 1 1 3 . 1, ' 1 3 3 5 1 1 1 1 ,1 1 10 1 1 • 1 5 1 1 59 2 5 3 1 2 1 1 1 5 6 10 1 1 1 1 1 1 3 1 1 37 4 S _ 1 6 3 1 I 40 1 1 1 .J, 8 8 _i.„2_-5.-0 l • 1 _l •• 1 1 • 1. 1 1 . 1 2 • • . 1 1 I 3 1 15 ...5 .. 1 3 3 V. \ . 1 1. 68 81 9 1 1 1 1 1 84 1 1 J J * 1' 1 1 ! 9 1 1 1 1 1 « 1 2 97 3 66 ! i ) 3 3 1 1 3 2 3 1 2 2 1 . 1 4 3 6 2 1 1 3 3 5 1 4 63 i 79 1 A 3 " " C "0" £""F"""0 ~ H " — : . J K I f - N C P • 0 R 5 T U V W X Y Z ' I" 2 '3' '5~'4 ~"7~ 8 " " « 0 1 * / 1 K l SI".?'. ?S CF EiCM CH13ACTR USED CvF.Httt H J C O J M T C N R A T E 1 . 6 4 . 6 9 6 2 F i g u r e A . l Type I f i l t e r s Munson's data (not size-normalized) 46-character set o : — • : C E C I S I C N S m ' A B C D E F G H I J K L M N C P Q R' S T U V W X Y _ Z A 69 1 1 1 7 1 1 2 2 l 1 1 1 2 1 3 4 1 A B 5 60 1 3 3 1 1 1 4 •a 2 1 1 3 5 I 6 B C 71 1 \ 1 1 14 8 1 1 C D 1 3 1 1 1 8 3 1 10 1 2 2 1 1 D E 1 6 3 66 5 1 2 2 S l 1 1 1 1 1 E . " f "~"T~ 1 79 1 2 l 10 3 1 1 1 F G 3 1 3 1 56 1 1 3 1 l - 5 1 12 4 2 2 1 G H 4 3 58 1 • 1 C 17 1 1 2 5 1 1 H I 1 1 1 65 14 3 5 1 8 I J 1 2 12 60 2 2 10 8 2 2 J K 3 2 1 3 1 1 1 66 2 1 3 6 " 1 1 7 1 K "l 1 1 2 5 82 • 4 2 1 1 tl M 3 1 1 8 75 7 1 1 1 1 2 M N 4 1 20 1 6 55 I 3 2 3 1 N 0 3 1 • 6 5 3 2 2 1 1 • 63 4 . 1 3 2 1 0 P 3 1 1 12 1 1 2 73 1 1 1 1 3 P 0 4 1 1_ 1 9 1 1 1 1 2 72 1 I I 1' 2 ' I 0 R 7 •> 2 2 1 18 1 2 1 2 5 48 1 4 3 R S 1 6 3 1 2 2 5 5 1 i 69 2 S T 1 8 5 9 1 i 1 67 7 T ' u 2 1 4 3 1 1 5 2 1 63 13 5 U V 1 1 1 « 5 1 4 1 1 \ 8 66 1 8 V w 1 1 1 1 12 3 3 77 1 1 w X 3 1 5 4 1 1 2 74 8 X Y 2 3 1 3 3 1 1 4 6 6 .69 70 Y 7 8 4 l ] 1 ? ? ? V , 3_ 6 7 A B C 0 E F G H T J K L K N 0 P 0 R S T U V W X Y Z 146 SAMPLES O F E A C H C H A R A C T E R U S E D ' OVERALL RECCGNITCN RATE : 66.8335 % F i g u r e A . 2 Type I f i l t e r s Munson's data (not s i z e - n o r m a l i z e d ) 26 -cha rac te f se t D E C I S I O N S m 1 2 3 4 5 6 7 ' 8 9 0 1 9 4 1 3 I 1 1 2 7 7 5 4 5 7 ' i 1 ? 3 5 5 8 1 1 1 3 2 2 3 4 3 7 5 2 2 1 1 0 6 4 5 2 4 1 F 4 4 1 1 1 • 2 5 . 6 3 1 2 8 6 1 6 7 4 1 5 1 7 6 4 1 0 7 8 5 5 5 3 6 6 5 3 8 9 3 5 7 1 0 3 7 2 9 0 1 1 2 8 3 1 4 2 7 7 0 1 4 6 S A M P L E S O r S A C H C H A R A C T E R U S F D C . V E R A J L k . . . M . C P G . N I I F i g u r e A . 3 Type I f i l t e r s Munson's data (not s i z e - n o r m a l i z e d ) 10 -character se t A B C 0 E e C H 1 J K L_ C N C ? 0 R S T IT V W X Y I 2 3 4 5 6 7 e 9 0 1 * / • - t 1 " A 75 ' 1 1 1 1 3 3 i 3 1 1 1 3 ? 1 1 1 2 ; I 5 1 8 1 1 1 3 1 1 1 2 A * c c € 84 . 2 75 75 i 1 1. 1 1 4 2 1 . 5 10 1 ' 1 1 3 1 1 . i 1 3 . 1 1 ..1 ' 1 1 8 1 4 C 0 E F s H I 5 1 1 2 .90 60 58 j -1 1 1 e 16 1 2 13 1 3 2 1 1 1 ? i 1 3 3 10 1 1 " 1 1 J 1 1 1 1 1 F C H • X J K .... 2. 1 1 1 ...1. ... l -75 8 8 53 77 1 1 1 1 1 8 3 . 8 6 I 1 J 3 i l I 1 3 3 3 1 1 1 1 2 1 1 1 1 1 1 1 2 13 I J K L H N 5 ? " i " 1 "i' ~V 5 10 1 33 79 7 3 73 1 1 J 1 ? 1 1 7 1 l 1 1 1 s 1 1 1 I n N C ? 0 2 1 i 3 1 2 . .».. 1 1 1 88 82 1 73 1 •2 ; I 1 ...s. 1 .. 1 2 ..1. 3 . . . . . 2 1 .1 1 • . 3 1 1 1 0 ? 0 R s T 5 1 l 1 "i 8 2 " 2 3 1 e" 5 3 I 1 ~T I i 6S 71 78 2 1 1 1 l ""3" 4 8 8 3 3 1 1 1 1 R S T u V . y l 1 I 1 .- 1. 1 1 2 1. 1 8 5 _1 8 3 2 1 1 2 1 69 • 2 3 7 73 <r 1' Bl 1 l 3 1 2 1 1 1 » 7 1 1 1 U V W X Y 7 1 1 6 3 •> 1 1 1 I I 3 1 70 1 10 70 1? I ; 1 4 1 3 .. . 7 1 1 1 5 14 1 ' 1 1 3 1 X Y J I 2 3 . .1. I 1 1 1 4 1 1 2 2 1 5 1 4 1 1 8 45 1 65 .. 2. ' 1 73 1 3 2 .. 1. 1 27 1 3 16 3 3 3 1 1 1 8 j 2 3 ; 4 5 6 * 5 6 }-3 2 > 1 I 1 • I 3 1? 2 2 1 3 1 2 2 1 1 3 12 I ? 4 i 5 3 49 1 74 ? 47 " 1 1 1 1 1 11 2 1 7 8 9 IS 2 . 2 1 2 1 1 2 1 -1. 1 1 3 2 2 1 2 3 1 1 1 2 1 1 3. 1 1 8 1 2 ..1 1 3 4 73 5. 39 5 7 76 3 1 1 1 1. 3 1 10 7 8 9 0 ( "' 2 ' 1' 6 3 I " V 1 . . . . . 4 4 1 4 1 3 5 i 1 1 1 1 1 1 1 1 3 1 " 1 " 1 69 79 84 '"3" 3 — 'V 1 ~T 6 0 t / 1 2 1 1 1 2 9 6 1 1 1 72 1 91 7 3A. 1 I 4 3 • • 2 1 1 "3 3 5 3 1 3 3 1 1 1 80 1.0 12 86 2 75 1 4 • ' 1 1 I * 6 1 I 1 * 1 2 13 1 1 1 2 3 1 1 1 1 1 1 1 5 1 3 1 1 58 88 I ) — A" S' C "F "c " " j x" ~C" <S ? 0 R i t u V U X Y I 2 " 3' ""V " S" -'6 ' " 7" — $ - 9r 0 ( * / • - • i ) 146 S, OF CHARACTER USED CVERALI. RECOCNZTCN RATE l_ 72.9601JS F i g u r e A .4 Type I f i l t e r s Munson's data 4 6 - c h a r a c t e r se t — : DECISIONS {*) ; ~ , A B C 0 E F G H I J K L V H C P Q R S T U " V ' W X Y Z A 79 1 3 1 3 2 1 1 3 1 1 2 1 A B 1 81 3 2 1 2 2 1 1 1 3 1 1 1 B C 86 1 1 1 . 5 5 1 C 0 73 3 2 2 1 10 1 3 ' • ' 1 D • E 1_ 2 80 6 1 5 _1 ; 1 3 E F 92 2 2 3 I F G 1 3 2 72 1 1 1 2 1 13 1 - 3 G H 6 2 60 1 9 16 1 1 1 1 2 H 1 1 1 81 10 1 1 3 I I J 1 1 10 63 1 1 9 12 1 J K 2 _ _ __ 7 8 1 _ 1 8 1 1 5 1 K L 1 1 1 1 3 1 87 2 3 L M 5 .. 5 r.79 7 1 1 1 M N 3 1 10 1 3 73 3 2 1 2 N 0 1 1 3 - 3 1 90 1 0 P 3 2 2 1 1 86 1 3 I I P 0 1 4 _ 1 _ _ _ • _ 1 _ 1 0 „ 1 _ 8 0 1 1 1 1 0 _ R 6 1 1 2 8 1 1 3 1 6 8 1 1 2 1 3 R S 1 1 1 1 3 4 7 82 S _ _ I 8 1 7 1 82 1 T U 1 1 1 1 1 1 1 3 3 2 3 71 8 U V 1 1 5 5 1 1 3 78 1 1 3 V W . II 1 1 !__.8 _1 3__4_8.1 . W_ X I 1 7 1 1 1 77 12 • X Y 1 1 4 1 1 1 4 8 79 Y r. i 1 L _ l I i 1 . _1 ; 5J l A B C D E F G H I J K L V. ' M C P 0 R S T - U V W X Y Z 146" SAMPLES"OFEACH "CHARACTERUSED OVERALL RECOGNITON RATE : 79.0832 % F i g u r e A . 5 Type I f i l t e r s Munson's data 26 -cha rac te r se t 4> D E C I S I O N S m 1 2 3 4 5 6 7 8 9 0 1 8 1 2 1 4 3 8 1 1 2 I 8 6 1 1 1 4 5 1 2 3 1 2 9 1 1 1 . 1 2 1 3 4 1 7 7 2 1 1 1 0 8 4 5 i 5 1 9 0 3 1 5 6 4 1 2 9 2 1 6 7 2 1 8 6 1 1 0 7 8 1 0 3 5 1 5 ' 1 6 4 9 1 8 9 2 1 5 8 3 8 1 1 9 0 4 1 3 1 1 2 2 8 6 0 1 2 " 3 " 4 " 5 6 7 " " 8 9 " 0 ~ " 1 4 6 S A M P L E S OF E A C H C H A R A C T E R U S E D O V E R A L L R E C O G N I T O N R A T E : 32^119.1^.3-F i g u r e A.6 Type I f i l t e r s Munson's data 10-character set DECISIONS OeClSIC NS i : A e C 0 E F S M 1 J K U _ g _ _0_ _S_ T U V w JL 1 2 3 5 6 7 4 9 0 ( • • / - # t 1 A 74 ' • K !• 6? 1 2 5 1 1 1 1 i I 1 1 * 5 /, •> 3 3 1 1 1 4 4 16 1 2 A : •5 ! 1 . i 0 J 2 AS. 68 3 1 1 1 5 1 2 1 5 12 1 l 1 1 l 1 2 3, 1 2 1 1 1 1 i 6 l 1 2 1 7 3 5 C 0 6 : i F : r. l H 5 1 2 69 50 52 2 1 8 15 3 2 8 1 3 2 1 1 1 1 1 1 3 1 3 I 9 2 1 1 3 5 18 2 F C ! H I j i « 3 1 1 1 1 73 5 9 1 57 73 1 2 1 1 7 3 10 5 1 1 3 2 1 1 1 3 4 2 1 1 i 1 1 1 1 1 1 2 1 1 1 u 1 J K ; L - 3 1 1 ! 1 4 <> 2 1 84 1 62 4 3 67 3 1 1 1 1 1 1 1 1: 5 1 2 21 8 4 1 L M : P 1 0 I 3 1 2 1 1 1 1 1 I a7 •5 79 64 1 1 1 3 5 1 1 1 2 1 3 1 1 3 2 5 7 10 • p 0 * 4 s T 1 1 10 2 2 2 8 1 6 3 1 1 1 1 62 67 73 2 2 3 2 3 1 . 9. t 7 2 1 •. 2 • 1 1 1 2 5 1 1 1 R s U V M 1 1 1 I 1 ) 2 1 7 4 6 5 1 * 2 66 I 4 5 67 5 I 7_3_ 2 1 ' 1 1 1 1 1 1 1 7 5 8 6 3 2 2 1 u V w I 1 1 3 3 1 1 1 1 1 1 ~66~ 1 2 73 73 3 1 3 I 3 1 2 1 1 1 1 5 15 1 1 1 1 1 I 3 1 Y Z 3 2 1 I 1 1 1 4 1 2 1 3 A 3 1 1 10 1 31 I 61 _1_ ~iT 1 4 1 1 3 21. 1 1 24 2 2 1 3 2 2 1 • 1 __§. 1 2 _3 ' • 4 1 5 6 3 ? 3 I 1 2 1 1 3 1 1 1 I 1 11 1 I 4 I 3 1 1 1 3 77 1 67 I 10 1. 1 2 5 18 2 16 3 1 1 4 5 6 7 8 12 «> 2 1 1 1 I 1 2 1 1 1 1 1 3 1 1 1 2 1 1 2 1 9 5 1 1 1 1 3 77 4_ 4 30 7 66 3 1 . 3 4 IS 12 1 1 1 8 7 3 ' 9 o i T ( 6 i I 1 3 3 3 3 1 1 1 3 I 1 * 1 1 1 4~ T 1 1 62 1 ao 1 82 ~s" ' 2" 5 3 1 2 0 I • i . I . i 1 1 1 1 \ 3 . 3 5 1 6 5 1 1 68 3 95 8 82 3 1 3 3 • 2 1 _ 1 1 1 2 3 5 3 1 1 12 1 2 1 3 1 1 1 1 71 9 12 84 3 75 1 2 • i : i 1 1 6 1 • 1 1 2 1 14 1 1 2 3 i I 1 1 1 i 1 4 1 3 • I 4 1 55 87 I i s " c D £ F . G H I J K I M N 0 P a a S I U V W X . Y 1 .1 2 3 i 4 7 ( • » / * • i i ) 14 6 Sl"» -SS Or f ;Oi CH«>AC TF.R USED • -C / E B t l l "rCCGM'CN «1TE : 69.0739 < F i g u r e A .7 Type I I f i l t e r s Munson's data 46 -cha rac te r set O N D E C I S I O N S IZ) C D E I J K U V X Y Z C D A 8 2 B 1 8 4 _H I_ 3 1 _N O__P JL i l l l 1 8 6 1 7 5 2 2 7 9 2 1 1 5 ' 1 5 6 1 1 2 9 2 4 3 6 9 5 8 1 . 9 1 7 2 1 1 3 8 0 1 0 6 6 6 7 5 4 1 1 0 1 2 L 1 M 5 N 2 1 8 3 5 11 8 0 7 4 7 1 1 1 1 4 2 0 1 P 3 0 1 R f S T 1 9 2 • 8 6 1 1 1 2 7 8 1 0 4 1 8 6 1 6 8 1 3 0 1 8 2 5 1 1 8 6 1 7 5 1 1 7 0 8 1 7 7 2 4 5 7 9 A _5 B_ C 3 D 3 E_ p G H 2 I 1 J K 3 T M N 9 4 1 K L M N . 0 P S T U W X T 4 6 S A M P T T S OF " E A C H " C H A R A C T E R U S E D o p Q R S T U V W 1 7 5 1 2 X 3 2 8 7 Y 1 1 1 9 2 Z O V E R A L L R E C O G N I T O R R A T E : 7 9 . 0 0 4 2 Z F i g u r e A . 8 Type IT f i l t e r s Munson's data 2 6 - c h a r a c t e r se t DECISIONS m 1 2 3 4 5 6 7 3 9 0 . 1 82 1 1 3 2 10 1 1 2 3 83 1 . 1 2 6 3 2 3 2 1 90 1 2 1 3 3 4 7 63 2 . 1 1 14 7 4 5 1 6 1 89 . 2 1 5 6 5 1 93 1 6 7 2 1 89 8 7 a 13 2 3 1 4 1 65 • 9 1 8 9 3 1 5 5 2 79 9 0 3 1 8 1 1 2 1 82 0 I 2 3 T ~ 5 * " 6 7" ~'"8~" ~9 0 146 SAMPLES OF EACH CHARACTER USED OVERALL RECOGNITON RATE : 82.0548 % F i g u r e A.9 Type I I f i l t e r s Munson's data 10-character set ; — : D E C I S I O N S ui : A 3 C D E F G " H I J K L "K N C P ' Q R ' S" ' T' "IT V" W "X " Y "Z A B 84 1 86 3 1 2 i 1 1 2 2 6 1 1 1 1 3 2 1 1 1 A , B C 0 e 1 I 3 6 81 l 82 1 JL?_ 1 4 1 4 1 8 3 1 4 7 2 "1 1 1 2 1 1 C 0 E * F G H i_ 1 12 5 2 . 1 87 1 74 44 1 1 • 1 11 23 1 9 1 14 1 1/ •1 1 1 1 " 1 2 1 F G H I J K 2 1 1 1 1 . 2 1 86 11 3 62 77 1 1 4 • 1 1 1 1 8 v 3 9 11 1 ' 2 6 1 1 I J K L M N 1 11 7 T 1 i 5 10 3 '"'1 1 88 75 ' 3 5 69 1 2 1 3 1 1 •3 3 3 2 1 L M N 0 P 0 1 1 . 1 1 1 1 5 2 1 2 8 i 89 94 86 1 " 3 1 1 0 . P 0 S T 10 1 4 1 • 1 9 4 .1 4 6 3 14 1 2 1 2 2 1 63 79 82 3 1 R . S T U V w 1 1 1 3 1 1 1 1 3 ' 1 1 8 . 6 4 2 2 2 " 1 8 1 72 1 4 3 76 7 3 79' 1 • ^  • 5 U ',v :.*v/ X Y Z 2 4 l 1 1 1 1 ? 8 3 1 ] 1 1 ' 1 1 1 3 ' 1 1 5 I 79 5 ? 9 79 35 •:x • Y 7 A 3 C 'D E F G [4 I J K L H ' N . ' C P Q R S T U V W X Y Z 146 S AM P L E S op~c'ACH T H A R A C T ER "jJsE'D" O V E R A L L R E C C G N I T C N R A T E :. 7 8 . 3 7 2 0 % F i g u r e A . 1 0 Type I I I f i l t e r s Munson's data 26 -cha rac te r se t O N V O DECISIONS t u A » e 0 8 F 6 H I J K t _ l ' N C l» 0 R S T 'U V' 'W X' Y ' I 1 2 3 * 5 4 7 8 1 0 ( ._•_-.*... J/ *._T • ''" ''">' * '78 8 1' 77 j 1 3 \ 2 c 1 1 1 2 1 1 •3 } 1 A 1 10 1 1 1 2 1 A 8 C 0 2 6. ...1 _.2. 0 I 1 M 7 1 81 1 76 81 1 1 1 1 5 1 6 9 1 1 1 1 3 1_ 1 3 1 2 2 1 1. 1 1 2 2 A C 0 p 1 1 92 62 1 54 2 1 i 1 9 ;7 1 3 13 I 1 2 1 1 1 1 1 1 3 2 10 1 1 1 1 1 1 1 1 F C H I J x 2 L >( 5 N 2 1 1 1 80 8 5 53 76 1 1 _1 -77 3 1 1 8 2 9 9 1 1 5 1 1 1 1 3 3 3 1 1 2 1 \ 2 1 2 7 I J K ] ..... 'I 5 9 3 1 '33~ 8 73 1 1 •> 1 1 ? 1 1 i 1 1 ..... 1 3 I 1 1 L i 1 • N 0 1 P 3 C 1 R ~ 8 • S 1 T 2 4 3 1 1 1 _5 1 68 A 87 76 1 1 1 l 1 5 1 1 3 1 1 3 3 1 I 0 p 0 10 2 "~2" 2 1 5 3 10 1 1 1 2 67 73 77 2 2 l 5 1 8 1 3 3 8 1 1 ; R s T U V J 1 2 1 1 1 1 1 1 1 A 1 1 8 3 7 A 1 2 1 1 5 71 3 3 3 77 A 2. A? 2 l 2 1 1 1 1 1 1 i u V v' X Y 1 1 7 5 i 1 1 . 1 1 3 77 3 6 75 1? i i •>, i " ? 1 1 3 2 . 1 ,? 2 1 5 1 3 X Y 1 2 3 . .3 A 4 5 A 7 3 1 2 I 1 3 _ 1 2 1 3 .5 1 1 1 1 A 1 1 1 ,2 1 '8 1 A3 67 2 74_ 2 1 I 1 1 I 27 2 1* 3 J 2 1 1 A 6 1 2 3 7 2 ""l 3 1 0 " 3 l" 1 1 2 1 2 12 1 5 1 5 1 3 1 50 1 75 ? 8 1 3 1 1 10 2 1 I A 5 A 7 8 16 2 1 1 1. 1 1 1 j 2 •a A 3 1 2 3 1 1 1 3 1. 1 1 1 1 7 1 1 3 1 76 6 AS 7 3 1 77 I 1 1 1 1 3 1 7 7 8 9 0 3 1 ( 3 ~T i ""i ; " "l A A 7 5 1 3 5 >i 1 1 1 1 ~Y 1 1 3 \ " T 1 1 72 83 2 1 2 i 5 1 0 ( » 1 / l .1 . 2 1 1 1 1 2 5 5 _7 . 1 1 2 75 95 5 1 1 3 3 * _ . i 3 3 5 1 I 1 3 A ? ~1 1 1 81 9 11 87 1 f • • i 2. 1 I 1 1 1 A 1 1 1 12 1 1 1 2 3 2 1 1 1 3 3 1 5 1 1 56 89 i ) A 6 " c " " o ' — g — p - — s H _ 1 J K I N C f> 0 R 5 Y u V U X Y I " 1 ""2" •' 3/ _ 5 '"4 " 7 8 9 U ( * / • t '.',4 SAMPLES EJCH CHAR AC TER USED OVERALL 1EC0GNITC.N RATC : 74.7022 F i g u r e A .11 Type IV f i l t e r s Munson's data 46 -cha rac te r se t o A B C 0 E . F G H I J K L M N • 0 P ' Q R" S T ' U V ' w X Y Z A 82 1 3 1 2 5 1 1 2 1 1 1 1 A B 1 88 3 1 1 1 1 1 l 1 1 I B C 0 c 1 2 3 85 1_ 81 82 3 1 1 3 1 c 4 1 6 9 1 2 1 1' 3 1 2 C 0 .e . F G H 1 8 3 3 1 r 92 74 1 56 2 1 1 1 10 17 1 3 1 13 1 1 3 1 1 1 1 1 2 1 F G H T 1 J K 2 1 1 1 i 1 36 10 8 63 77 1 1 3 1 1 1 1 8 1 2 9 13 1 '1 5 1 1 I J K L M N 1 5 2 1 1 1 i 1 6 9 3 1 1 88 78 3 8 73 1 1 3 1 1 2 1 3 2 1 L M N G p Q R S T 1 3 1 2 5 3 1 1 1 5 1 1 1 90 4 92 83 2 " i " 1 I 1 2 0 P 0. 8 1 1 1 10 3 2 5 1 5 . 7 .10 1 1 1 2 70 84 8? 2 3 R S T U V \ ^ 1 1 2 1 i 1 1 1 1 •i 1 1 5 1 i 8 3 7 4 1 3 1 1 6 -71 3 3 4 78 4 2 82 1 2 U V w ' X Y 1 3 i 1 1 1 1 \ 3 5 1 1 1 1 1 1 1 1 5 V 82 8 7 77 90 X Y 7 A B C 0 E F G H I J K I M N 0 P Q R S .T U V W X Y . Z • s AVp r=s ""C ?" EA C H " C HAR ACT ERUSED OVERALL RECOGNITGN RATE : " 80 .2424 % F i g u r e A.12 Type IV f i l t e r s Munson's data 26-character set D E C I S I O N S ( « ) 1 2 3 4 5 6 7 8 9 0 tt-— " 1 8 0 3 3 3 8 1 1 1 2 1 8 8 1 1 2 3 3 1 2 '' 3 \ 2 9 1 1 1 1 3 1 3 4 1 7 8 .. 1 1 1 1 0 8. 4 5 5 1 9 0 2 2 5 6 4 2 9 2 2 6 7 3 2 8 6 1 3 7. 8 9 1 1 1 3 1 7 2 1 0 1 8 9 3 4 6 3 8 4 1 9 0 3 . 3 1 4 1 8 8 0 1 4 6 S A M P L E S O F E A C H C H A R A C T E R U S E D O V E R A L L R E C O . G N I T O N _ R l_84,..7945. ? F i g u r e A.13 Type IV f i l t e r s Munson's data 10-character set U V U 90 10. U V 11 89 V U V T4"'6'" SAMPLESQF''"''E ACfiCHARACTER0SE0 OVERALL RECOGNITON RATE : 89.3336 ? Figure A.14 Type IV f i l t e r s Munson's data 2-character set : i e c i i i i N S ( 5 1 O S C I S I C N S H I A B C O S F G H I J K J. K N C r 0 P S ' T U V W • X' Y 2 I .2 3 A 5 4 7 8 9 0 _t . » / * - . ' , « L l A (» 83 ?? 1 1 3 2 1 A 1 1 1 1 3 1 A 1 8 1 1 1 1 A ft c r; 1 2 1 82 1 77 8A 1 1 1 1 5 3 5 9 1 1 1 1 3 • 1 1 3 1 2 2 1 1 1 1 1 ..V. 3 C 0 E F C 7 1 1 1 93 42 60 2 1 I 1 8 I A 1 3 12 1 1 2 1 1 1 1 1 1 1 2 2 10 1 1 1' 1 1 1 J F c H I J . 2 s 7 1 1 1 81 5 5 57 79 1 1 _ l i_ 1 8 2 8 9 1 1 3 1 1 1 1 3 2 3 A " I 1 2 1 1 2 1 1 7 I J K I H .s 3~ 5 9 ... .^ £6" 78 8 3 7A 1 1 3 1 2 ! I ? 1 ' 1 " V 1 1 1 3 1 1 , I S H 0 p c 2 1 1 1 3 2 1 1 5 1 9'1. A 85 77 1 1 1 1 1 5 1 3 1 1 J ? . 3 1 1 0 P 0 R ' S T 8 ' I 2 2 2 8 1 8 5 3 1 1 2 70 75 7* 2 2 i" 5 i 4 i" 2 3 8 1 I R s T U V V . X Y 1 2 -.1 1 1 1 I I 1 1 A 1 8 3 1 7 A 1 2 1 5 72 2 ' 3 3 77 2" 4_a2_ 2 l 7 1 1 1 1 1 1 U V w \ 1 l 5 A i 1 1 i i 3 79 3 6 76 I 7 "i •) 1 1* 3 ! _.l 2 2 ? 5 1 1 x Y 7 1 2 3 . A 5 3 1 2 1 I 3 1 1 3 2 3 2 * 0 1 1 3 5 i I 1 A 1 3 2 5 1 AA 72 7A. 2 1 . . ...1 1 1 1 27 2 IA 3 1 2 1 1 6 1 2 3 7 7 2 1 2 12 7 5 1 5 1 3 1 52 I 77 f 73 1 ""» "f 1 1 10 J A 5 A 7 8 9 0 ( 16 2 1 .... i 1 1 l i 1 ..2 1 3 A 7 2 3 1 1 1 3 .1. 1 1 1 _ 1 . 1 6 1 1 3 1 79 A7 3 1 5 7 .78 1 * 1 1 1 6 7 8 9 2 "l 3 3 1 A 7 5 ~ T 3 5 1 1 I 1 • l 1 1 3 1 " l " 1 75 8A ~ i ~ 1 2 1 5 0 t . • 1 _1 ._l 2 J 1 1 1 2 5 A _7 _A 1 1 77 95 A 86 1 1 • j • ¥ 2 1 3 3 5 2 1 1 1 3 3 2 1 ) 1 82 7 11 89 1 .75 A t i 2 1 1 1 A 1 1 1 11 1 1 1 1 3 1 1 1 3 1 3 1 5 1 1 60 90 i ) . " " A " B ~ ' C - o " E F C H I J K L p :< C P 0 S T U V W — x" — Y " T - I ~ 2 ' ~i~ "'4 " 5 ~4 ~7~"8" — 9 0 ( * • / » t » I 1A6 SA M P ' . F S O F F A C H C H A R A C T E R U S F D OVERALL RcCCCNITQN RATE j_76.503? _S. F i g u r e A . 1 5 T y p e I V f i l t e r s M u n s o n ' s d a t a 4 6 - c h a r a c t e r s e t No s i m u l a t e d s e p a r a t i o n o f t r a i n i n g a n d t e s t i n g s e t s . A B C 0 E F G H I J K L f N C P Q R S T U V W X Y Z A B 84 91 2 1 1 3 1 1 1 2 4 1 1 1 1 1 1 1 1 1 1 1 A 8 C D c ..._.L. 2 _._1 86 82 84 3 1 1 3 1 4 1 5 9 1 1 1 '1 3 1 2 C : o '»: E F G H 1 8 3 2 1 93 75 62 2 1 1 1 ' 8 14 1 •a 1 12 1 . 1 3 1 1 1 1 2 1 F G H I J K 2 1 1 1 1 1 86 8 7 66 81 1 1 1 1 1 1 1 8 ' 1 2 8 12 1 i 4 1 1 I J K L M N 1 5 2 1 1 5 9 3 1 1 90 79 3 8 74 1 1 3 1 2 1 • 3 2 1 L M N 0 P 0 1 2 1 1 1 4 2 i 1 5 1 1-1" 92 93 84 2 1 1 1 1 2 0 P 0 R S T 8 1 1 1 8 3 2 4 1 5 7 8 1 1 2 72 85 83 2 3 R S T U V w 1 1 2 1 1 _ 1_. 1 1 1 1 5 1 1 8 3 7 4 ' 1 3 1 6 72 2 3 4 79 4 2 82 1 2 U V w ' X Y z •> 1 1 1 1 1 I 6 4 1 1 1 1 1 1 1 1 5 1 84 8 6 79 91 X Y z A B C 0 E F G H I J K L M N 0 P Q R S T U V W X Y Z 146" SAX>LES~OFEACHCHARACTERUSED CVERALL RECOGNITCN RATE : 81.8757 % F i g u r e A .16 Type IV f i l t e r s Munson's da ta 26 -cha rac te r set No s imu la ted sepa ra t i on of t r a i n i n g and t e s t i n g s e t s . 1 2 3 4 5 6 7 3 9 0 . 1 81 3 3 3 7 1 1 1 . 2 1 90 1 1 3 3 1 2 3 i 2 91 1 1 1 3 I 3 4 1 81 1 1 1 8 7 4 5 4 1 92 2 1 5 6 4 2 93 1 6 7 2 1 88 1 8 7 • 8 8 1 1 1 3 1 75 9 1 8 9 3 4 • 4 2 36 I 9 0 3 3 . 1 4 39 0 r ~ 2 " 3 4 5 6 ~ 8 ~ Q 146 S A M P L E S OF E A C H CHARACTER USED O V E R A L L R E C O G N I T O N RATE : 8 6 . 5 7 5 3 % F i g u r e A .17 Type IV f i l t e r s Munson's da ta 10 -cha rac te r set No s imu la ted sepa ra t i on of t r a i n i n g and t e s t i n g s e t s . ON — — " ~ ~ : ~ DECISIONS (*) ... ... . •,-» t in i ID i;C I'T !•(• 11 Vi LI'A LJX LJ Y VIZ. 1 ? _ J A 6 I 8 9 t NL . _ > £ _ UA. U3. UC .UO- UE..UF- J C . . . . U K - J J . L - 1 U - U K _ J J — U l _ i l _ - _ _ _ J f l _ . U i _ J j . 5 - - L I - _ - _ . V _ U _ _ U A _ _ X UA ico 1 UA l l h 241 ? 4 _ > u _ _ — — UC . 100 * UO 100 *• u o 2 4 ; WE i _ -'J P 10 0 . — — • —" ~ cf l o o . . UG 2 97 * . ... 1 f\'\ - — 1 ur UG DH 24 1 ?<rl OH i • UI 97 • j j 1KC 3 • U 1 UJ 4E 1 24i; _-*2_ 242 482 U< . . . — — : ] ; — UL " i j wi 1 CC ' ICO _ _ — UL UH UN U J . •—--^ •— : U'J • 9 1 UO 100 8 •• U G UP l / i . 2 i > 2 J 9 _ i . _ ^ . _ 2 39 239 "24 3 243 _ . 2 : . l . ^ _ 1 UJ .. . • ,-r;—1  ,o " " • i c o y.i m o 1 UK u s UT UT — i - i u u 1 0 0 • 1 0 0 uU UV J I . v - •• 100 u x • • 1 0 0 ur <,,] 2 ux UY U7 < ** J* 243 24 } A 13 7 'J 7 — •—• — 1 1 7 . 15 2 a s 85 98 * 2 A 2 3 3 ... • • — — — — — — — — — — — — — — 4. 5 100 96 ]no + + 2 5 r> 22S 2 3 o 6 : — — " 7 a 100 100 100 i e 9 476 212.. 93 86 3 95 11 1 0 24 3 245 ?43 - / -i 1 4 • • 84 00 i 243 s UT Voire' uo u n UF'JS UM UI UJ UK UL IM UM UO UP UQ UR US.UT UU uv uu ux u t „ 1 2 3 4 5 6 7 8 9 0 t > i v n i cft T r s L i - /<•*>• t i — L _ _ _ J — L _ _ _ J — _ _ _ - _ — — - — — OVERALL S»ECC;r.!TON R»T£ - EOWAL W E j . 9 „ „ ' - i ^ t ^ - i — — — — • _ SA.-IPLE y.siv»nrss v / . i i h i i F i g u r e A.18 Type IV f i l t e r s C o r n e l l 3MIBEX data 40-char.acter set DEC I SIGNS U) ,U.A-.IJ.B....UC....^ ^^ —nutta.na. \J ' J uc uo UE 100 + 100 ICO UC UD UE 482 241 241 UF UG UM 100 2 98 100 + UF UG UH 241 • 241 ?41 UI UJ UK + 100 100 IOC U I UJ UK 431 240 242 UL UM u .*•; + 100 • 100 100-UL UM UN 242 402 UG U? UQ •f. • ' 100 1 + 0 0 IOC UG UP UQ 239 239 239 UR US U f +• 100 100 . 100 UR US UT 239 239 239 UU uv uw 100 -100 100 UU UV UW 243 243 243 ux UY uz + 100 100 100 UX UY U7. 243 243 ?4? UA UB UC U D U E Uf UG UH UI UJ UK UL OM UN UO UP UQ UR US UT UU UV UW UX UY UZ 6987 « + INDICATES NCN-ZERO QUANTITY LESS THAN C. 5 OVERALL RECGGNITON RATE - EQUAL WE IGHTS : 99.3083 X - SAMPLE WE IGHTS : .99.8139 % F i g u r e A.19 Type IV f i l t e r s C o r n e l l 3MIBEX data 26-character set DECISIONS [%) .2 .3. h.. 5 ..fe. 1 S 9 .0. •lUi:l.^ ..£B 1 1 0 0 1 4 8 7 3 1 0 0 3 4 8 2 4 1 0 0 4 4 3 0 5 1 0 0 5 2 3 9 6 1 0 0 6 2 3 8 7 1 0 0 7 2 3 8 3 1 0 0 8 4 7 * 9 I C O 9 2 3 f i 0 . + . 1 0 0 0 2 4 3 1 2 3 4 5 6 7 8 9 0 3364 + INDICATES MCM-ZERO QUANTITY LESS T H A N Cl. 5  OVERALL R ECOGNI TON RATE - EQUAL WEIGHTS : 99.9588 % ~ SAMPLE WEIGHTS: 99.97C3 % Figure A.20 Type IV f i l t e r s Cornell 3MIBEX data 10-character set U4 US DC _ I'D uC t.r yc u» t i l t;J U* >L v9 u s ux_ V.5J.C LO L * I.I ^ - V L k k y? y°. L» LV_ i i . y*j-y.- L I . _JL_ _ 2 _ 4 5 • 4 • 4 s U* 4. ?S 2 1 2 • 2 1 » t 1 2 • 1 • • • 1 4 * w k» U3 UC 1 *2 *• t • 1 2 ; V • * * I » J 1 2 J L • 1 I * 2 • • • * • UL Li/ US 1 1 1- 2 2 2 • i * 1 1 * t 1 3 u i U J U4 7» • : *fc 4S • t • 2 2 * * 1 • 2 1 \ • * i * 1 * UJ u * wu u - • • 5 t 4» 1 * t J 41 1 t 1 1 • ~ 1 . . ... a • • i * * • U" U J ~i"S I « 2_ 47 2 43 • • » * 2 * * • 2 • • * « • ik u 3 ~ u * uo 1.1 • i 2 2 • * 40 2 2 » • 2 * • - -- 1 t l 2 • 1 * • U l u l Urf u * ~ u * V * •JZ * * * 1 * * 42 *' .11 1 __l • 4 s -L. • 7 u< 2 2 I *4 I 3 4 1 7 U» UT uZ c* tc * 1 14 1 1 7 44 I »2 i 2 I 2 1 J _ t i 1 i 3 1 1 1 1 t * L 4 UC 1.3 IE LF* i 1 1 i i t i M «2 5 1 1 • 3 3 1 2 4 1 ~ 1 u a vc u ' L i L : L< 3 2 t a • 2 4 % i 2 2 2 76 2 42 t i 1 4 > 1 i u-1 1 1 i 4 48 «2 t>0 1 1» * i •» J w* LS. L-L3 1 1 1 * 1 i 2 2 • 2 1 42 4* 1 3 1 1 2 t • • 1." L i LO 1 J ? 44 40 1 4 u i i. J L i L ^  I i 3 1 IV 4 I 1 2 1 7 1 1 1 i •2 « 2 1 3 i t I k s LT UU 6 2 « 2* 32 b 34 s 4 •I 33 47 4 7 2 2 L-t « t i l T I* • 12*" 2*3 }»!> 2 i» i i i •« ti i CT U " 79 <4 22« LZ _ 1 i »4 • I 1 42 2 W H I *1 • U« J6 'i-C US »C v'f U^ UH \tJ i"< t>L u*i ys i " u s ua u s UT uu uv U1 ux uv u; n ta LC UP L£ i r LC LH L! LJ IK IL I» LQ i a i« u u iu iv t« L< II I 2 3 « 5 * 7 « 4 a ' 7 4$ 12 * 2*« i f l 1 47 • !•»* • 4* 4 2:S * :v6; : *T*I * 5 h - 7 £ B 3 CU*MTITy t .£35 TMAN a.s - &*-?L»- -tN^Tat bi.1**2 * F i g u r e A.21 Type IV f i l t e r s C o r n e l l CORAL2 data 66-character set oo o O.CISIONS (->" OA U9_UC UO UE UP UC UH U l UJ UK Ul. IJH UN UO UP_ua,_U« US UT UU UV UW UX UV UZ 1 2 _ J « _ _ 5 *._l_..»_..?--_.0.-• 2 I • • - S UA 9(, US ~uc uo UE " Uf UC IJH • ""iii UJ UK • UL UM + • JJ _________ ub • I V ua " •"" l " us -_UT u u UV ux " ' UY uz 1 1 1 10 UA ue 738 173 S o * 2 1 • 8S 1 • t • • 89 J • 1 <>6 2 • 89 1 9fl 5 7 • 2 1 * 2 LJC 610 UO <il6 UE 719 UF 560 UG 3U9 UH 375 78 + 1 99 95 • 1 99 1 • 86 3 2 1 91 1 1 1 f3 2 95 • 3_ 9« 1 ""+ " V V ~ 9 i " -So" " ¥ ! t T 2 UJ 290 UK 336 UL So9 UH 6 72 _UN UU~ _71S 82S~ U? 305 1 93 95 "95" ~ 97 I t 89 98 • 94 17 95 5 Tl 11 + 82 •UQ " i U H • us UT uu" uv 7 U» UX ur uz 229 653 SSI « 3 1 _ "(.ii 1 319 300 223 519 258 1 G&l • 95 1 96 f 1 91 1 1 1 9fl 2 3 - j j 1 5 • 6 o!>6 556 704 3B0 29S_ 7 3 e l 8 ebit 9 296 "0 ' 206 , 3e2 _l 2S?_ - 198 205 96 1 + 1 • t 11 . 1 91 ">_ " • 61 S i 7 21 7 6S 12 1 1 1 1 1 13 1 10 3 67 2 • ""UA""U8"UC"-0"UE"U? UG'IJH"UI UJ UK UL U'M UN uo UP u'Q'"''u>r~us~UT—uu uv"u~ux"uV - uz"~~i~Y~Y 7 8 97 S i 179<U ' - INDICATES NQN-ZEHO OUAMTITY LESS THAN Q . $ OVERALL RECOGNITOR RATE - E^UAL W.IGHTs 87,697o X " * " • S A M P L E -EI GriTS t 87.603s X F i g u r e A .22 Type IV f i l t e r s C o r n e l l CORAL2 data 40-char 'acter se t co i—• UA UB UC UD UE UG UH UI UJ UK UL UM UN UQ UP UR US UT UU UV uw UX UY UZ NUMBER . UA UB 97 + + 92 + 3 + • + + + 2 1 1 1 UA UH 738 473 UC UO UE + 2 87 + + + 86 + 2 1 90 + 3 4 + + + + 1 6 10 + + + 1 1 + + ' 1 + 2 UC UO UE 610 416 719 UF UG . UH + 2 + 1 i 2 97 93 9a + + + •2 + 1 2 3 2 +. + UF UG UH 560 349 375 UI UJ UK + 1 • + 92 i 1 99 95 + + 1 t ti + • + 3 UI UJ UK 612 290 336 UL UM UN + 1 + + 1 + •f 3 1 + 100 • 1 + 2 88 2 92 1 1 + 1 1 1 UL UM UN 589 672 714 UO UP UQ + + 2 6 + 1 + 4 1 + ,8b 3 95 2 95 • + 1 2 + + UO UP . UQ 825 305 229 UR US UT + 1 1 I + 1 +• 2 + 2 2 • + + + 92 1 + 97 95 + + * 2 UR US UT 653 551 431 UU uv UW 1 • . + _ 1. + + + 2 1 + • 1 + 96 1 97 + + 94 • 2 UU UV Uw 44 1 319 300 UX UY UZ + • + 2 1 + I + 2 2 98 9a 98 UX UY UZ 223 519 233 UA U6 UC UD UE UF UG UH U l UJ UK UL UM UN uO UP UQ UR US UT UU UV U* UX UY UZ 12487 T~TND IC AT ES~N3 N-Z E R Q"'"QU A N TITy" LF. S S" t HA N 0 , 5 OVERALL KbCQGNHON RATE - EQUAL WEIGHTS : 93,82^5 % . SAMPLE WEIGHTS: 93,1208 % F i g u r e A . 2 3 Type IV f i l t e r s C o r n e l l CORAL2 data 26 -cha rac te r set . D E C I S I O N S (%) I 2 3 4 5 6 7 8 9 0 N UH B E R 1 <?3 + 1 1 1 081 2 + 98 + f + + • 2 688 3 2 + 96 1 1 + 3 556 a 98 l 1 • a 704 5 1 + 9b l 1 5 38 0 6 r 1 "96 2 6 29a 7 + 100 7 361 8 2 I + t 97 8 9 2 + 97 + 9 296-0 + + « 1 98 0 206 + I N D I C A T E S N O N - Z E R O Q U A N T I T Y L E S S T H A N 0,5  0 V E R A L L _ R E C 0 G N I T 0 N 1 R A T E - E Q U A L W E I G H T S • 96„9i5i % " • " " " " " ' • ' " " " " " - S A M P L E W E I G H T S ' : 9 6 , 9 2 3 1 % F i g u r e A .24 Type IV f i l t e r s C o r n e l l CORAL2 data 10 -cha rac te r se t OF.CISICNS U) A 8 C O . E F G H I J K L M N O P Q R S T U V W X Y Z A B 9C 1 92 1 i 1 1 i 2 1 2 i 3 3 1 1 1 I A B C C _ 1 1 3 2 84 86 1 83 5 5 1 2 1 3 3 4 3 1 1 1 1 C 0 E G H 1 1 1 8. _ 1 .0 75 78 . 1 1 • 1 2 8 .1 1 8 1 1 1 1 2 3 2 7 F G M I J K 3 1 1 1 1 3 88 4 1 82 87 6 3 1 1 1 1 . 1 4 2 2 1 1 1 1 1 1 1 1 I J K L M N 2 1 i 1 2 10 13 1 1. 1 87 85 2 3 76 1 2 1 1 1 5 3 L M N P Q i 3 _ 2 5 2 3 3 1 1 91 2 86 1 2 1 90 2 3 1 • P Q " " R S T 2 1 1 1 1 1 ' 3 3 1 3 2 3 1 1 1 1 2 •2 88 1 87 92 1 R S T U V V. 3 3 1 1 2 2 1 1 3 3 1 1 . 1 73 3 4 3 £3 3 13 7 86 1 1 U • V w X Y Z 1 1 1 3 1 _ 1 2 1 i 1 • 1 1 1 1 1 1 90 3 3 92 96 X Y Z A - 8 C 0 E F G H I J K L M N 0 P Q R S T U V W X Y Z '146 'S'A ' 'Kpr_S"OFEACHCHARA^ OVERALL RtCQGNITON RATE: 86.2750 F i g u r e A . 2 5 Type V f i l t e r s Munson's da ta 26 -cha rac te r se t A l l samples used f o r t r a i n i n g and t e s t i n g . INCISIONS *s> DECISIONS 13) 8 C 0 e p.. c H I J K _JL_ __N 0_ 0' <t s T U V W X Y i 2 i 4 5 6 7 8 9 0 ( * 7 * -• » s 1 1 -t 73 «, ^ 3 3 2 1 1 1 J 1 \ 3 1 1 1 9 1 1 12 3 2 l 1 ? A n c 0 e f a 1 5 .2 3 66 . 2 8 80 59 7 1 8 6 1 75 53 1 1 1 1 1 5 5 3 1 1 J 1 1 3 12 6 2 1 . 3 6 1 1 1 1 1 C D c ~~X ' i 1 ; 5 8 1 2 11 1 1 1 3 7 1 1 1 \ 5 10 i I *. 1 5 2 1 1 1 1 e G H i j K S 1 1 1 1 1 1 61 5 I 59 1 i ] 3 2 \ J 1 <) 2 <, 3 1 1 1 2 7 1 i i . 5 1 . 1 -10 10 . 1 _2_ 3 1 <r 1 1 1 1 1 1 1 1 J < " 5 1 1 1 i r 11 "IT 7? 76 5 40 1 'i 7 4 H 1 i 8 1 3 1 3 1 l 1 1 1. M N 0 * 0 X s I 1 3 I 1 2 3 & . 1. 3 _.2. 3 3 _1_ 2 7 7? 68 5 1 7<> 5 3 1 2 1 1 I 2 I 3 2 1 1 3 1 2 l 1 0 P 2 3 1 " l " 1 1 3 1 1 J 5 71 1 73 1 4 2 6 1 3 1 1 1 2 1 ~i" 2 1 1 1 R S T u V 1 1 3 1 1 1 . 1 _3_ 2 3 1 1 1 1 c6 •3 5 3 71 ? 12 8 7? 1 6 i 2 1 1 l i 1 U V w' M " X Y 1 _ . ~T l 1 1 1 3 1 1 5 76 * 6 79 82 1 3 1 1 1 1 1 1 1 " T l 1 1 1 2 1 X Y 7 I 1 2 1 1 1 61 1 1 1 a l 5 12 5 1 1 2 i ._3 8 i 5 3 4> 1 5 6 60 ...3 3 1 1 . .1 2 3 . 2 l -l 1 ? 1 1 2 3 3 5 3 5 ! ' 2 1 I •» ~Y 1 I 1 1 1 8 i 1 1 "2 •> 2 5<. 1 1 1 1 1 7' 1 1 6 10 3 1 1 1 10 1 3 1 4 5 * 7 a ? 0 c 11 .. 1 I _JL 1 1 _1 2 _ 1 2 1 1 1 3 1 1 3 1 1 3 4 _ 1 1 1 1 2 1 3 1 . .1 1 66 1 _ 3 1 5* 3 5 5 70 1 6 2 1 I 1 1 } I 1 7 7 8 9 ' ' i 1 10 2 1 1 " "l 6 1 1 5 1 1 1 I 5 3 " 3 1 1 1 72 73 1 91 3 I l 1 1 1 2 3 0 ( / i 1 1 3 3 1 1 10 1 ..7 2 1 ...1 7 1 65 _3 a7 8 85 Vs" 13 1 2 I / • -7 \ . 1 3 3 2 10 y i ' 1 3 1 <t 2 3 1 1 7 82 3 1 • > ; 1 1 1 1 5 3 • 1 3 1. 1 1 1 3 1 1 5 1 2 3 <r 2 1 67 79 * 1 A ~"C -~0" ~"ri I J K t f N C P 0 5 T " J V w X Y — r * 1 2 3 4 — S" " 6" — 7 8~ ~~9 0 ( * / • t i ) _C V £3 A L L. * E C? CN IT C N. R AT E !_71 .12 e.6. « . _ F i g u r e A .26 Type V f i l t e r s Munson's data 46 -cha rac te r se t Average of ' three r o t a t i o n s oo OSCISIONS (%) A B C 0 E F G H I' J K L N 0 P Q ft S t U' V W X Y Z A B 79 1 83 i 3 1 i 1 3. 1 3 1 2 1 l 1 A 3 1 3 2 1 1 3 A B C 0 £ 1 5 3 75 2 81 1 72 3 10 3 e 1 1 1 X 5 5 c 3 2 1 1 1 1 5 " C D F F G H 1 3 3 1 13 10 1-32 64 72 1 1 1 1 1 2 • 5 8 • 1 2 11 1 1 1 1 1 2 •3 7 1 1 F G H I J K- 5 1 1 i 1 1 3 1 1 1 1 5 73 3 3 70 1 1 69 8 2 1. 1 1 1 9 2 5 1 2 5 3 1 1 1 1 2 2 4 1 I J K L M N 2 1 1 2 1 1 1 12 14 1 1 1 1 86 76 5 60 1 4 2 5 14 4 L M N 0 p Q 1 5 1 1 1 2 3 7 1 3 5 '3 1 1 8 0-2 1 76 .1 5 1 86 6 5 l " 1 0 P 0 9, S T 1 2 3 1 1 6 3 2 4 2 2 7 4 1 1 1 2 5 79 1 83 82 1 p s T U V VJ 1 1 1 1 3 1 4 1 1 1 4 2 3 2 6 3 r 1 ; i 1 . 1 67 3 ' 5 3 73 2 12 8 79 1 6 u V w X Y z I ? 1 1 4 1 1 6 1 1 1 1 1 1 i 3 1 1 1 6 79 7 4 82 94 X Y z A " 8 C 0 F G H T 1 • J K L f N G P Q R S T U V W X Y Z CVERALL RECOGNITCN RATS : 76.9231 % F i g u r e A .27 Type V f i l t e r s Munson's data 26 -cha rac te r se t Average of th ree r o t a t i o n s DECISIONS i%) 1 2 3 4 5 6 7 8 9 0 1 85 2 1 1 10 i 1 2 1 30 3 1 2 1 5 3 4 •> 3 1 4 79 5 3 6 2 1 3 4 2 76 1 1 1 6 12. 4 5 1 8 1 82 2 1 1 . 4 5 6 4- 2 1 3 1 84 1 4 6 7 3 3 3 2 79 2 6 3 7 8 1 4 3 1 1 2 i 69 3 9 8 9 1 4 1 4 12 75 3 9 0 1 10 4 1 1 84 0 146 S A M P L E S OF EACH CHARACTER USED OVERALL RECOGNITION RAT E : 79 «1761 % F i g u r e A .28 Type V f i l t e r s Munson's data 10 -cha rac te r se t Average of th ree r o t a t i o n s OS-CISIONS CZ) 'UA"i'b_UC W u t OF L'&~n™Tl—ijj'Tirf" 'U'CUH" U* t)0""uP uTi UR Vs'.uT'UU OV U« UX CFY u i ' ~ i 2 3 « S 6 7" 8 9 0 , / - S ""NUM»t» 100 UA 2*11 us UC uo 10 0 100 ICO • • US UC ui) 2-11 <Jb2 2«1 u£" UF UG 100 1 00 s 05 + 1 UE of u i 2*1 2*1 2 " I U n U! UJ 1 0.0 97 :oC 3 • un Ul UJ 2« 1 88 1 2<-0 " UK UU I"! + 100 100 100 Di-. UL UH 2-2 2«2 ubZ UM U'J u» 1 100 68 100 il UN UU UP £39 239 2 39 UU uw U3 100 100 99 • 1 UO UK us 239 239 239 ur U'J uv m o 100 ' 100 ' uT U'J UV iVi 2*3 2 « * " ir» U i uv • + 100 99 100 u * UK UY 2 " 5 ~ 2 ^ i 2<tJ Ui 1 2 u -.100 89 2 t • 98 • U£ 1 2 2">3 «B7 2-iJ ™"4S2 if — _ • 100 * u s 100 100 5 t o O 239 o 1 8 100 100 -too to 7 6 ~~~— n" 23b 23d a ?t> ~~ 2 i s 213 2-3 "9 c 100 100 86 7 7 9 0 « 9b • 1 2« i - 1 9 ' I 89 - Z-i UA U« UC UO uE hf UG UH UJ <U UK UL n« UN uU UP UQ UR US UT UU UV UW UX UY UZ 1 2 3 +" I ^ o t C i f t S ^UN-ZERO Oi'ANTirY U S S T "AN n,5 " c V £ ? A L U ' s e e o C T i r a.-riTiYr'.";; ; " U AC ' "•wTi'G«TS'7"'"9s;3«6ir"S" F i g u r e A .29 Type V f i l t . e r s C o r n e l l 3MIBEX data 46 -cha rac te r se t co oo U A 10 0 DECISIONS (X) TjAnLTfTTTir-Uunjr^^ NURBE'FT UA 241 UU 100 UC 100 110 100 UE 1 0 0 UF 100 UG «• 96 TJB 2 T T UC 432 UD 241 "UE 24 1~ UF 241 UG 201 UH 100 UI • + 100 UJ "UK" + UL J ° 0 UM Hi. • ; — iJR 24 1 UI 481 « F L C UJ ,240 ^-FOO : 0 K - " 2 4 2 -UL - 242 UM 482 ; : ON 2~3T" 9 V . UO 239 UP 239 UN UO UP 100 U'J UH 100 roo 100 100 US 239 TOO UQ 239' 100 UR 239 ur uu uv • 1O0 100 100 U 1 uu uv 239 2«3 243 uw UX + . + ur • 100 99 100 Uw UX UY 243 24 3' 2^3 UZ UA U3 UC UD UE UF UG UH Ul V.J UK UL ijM UN UO UP UQ UR US UT UU UV UW 100 UX UY UZ UZ 243 • «• ^ 6987 f INDICATES NON-ZERO QUANTITY LESS THAN 0.5 OVERALL RECOGNITON RATE - EQUAL WEIGHTS - SAMPLE WEIGHT : 99,7206 % S: 99,7281 X F i g u r e A .30 Type V f i l t e r s Co rne l l ' 3M IBEX da ta 26 -cha rac te r set D E C I S I O N S m 1 "2— 1 4 5 6~ T~ 8 9 0 NX W E 'S" • 1 10Q i[ : 1 4 6 7 2 iOO . + ? ? 4 3 3 100 + 5 • 4 8 ? 4 H>0_ . 4 480 6 1 00 6 7 1_00 7 238 5 100 ' 5 476 9 100 9 233 0 + 1 0 0 0 2 4 3 + j 2 3 « 5 6 7 8 9 0 3364 I N D I C A T E S N O N - Z E R O Q U A N T I T Y L E S S THAN 0 , 5 O V E K A L U K £ C O 6 * 1 H O N R A T ? - -~"ECTI/AC WETGHT'S : 9 9 , 8 7 6 4 % S A M P L E W E I G H T S : 9 9 , 6 8 1 1 ' % F i g u r e A.31 Type V f i l t e r s C o r n e l l 3MIBEX data 10-character set . . . . . . K I C W I O U <«> — —ui ui uc uo ut-ur uc >j» in'iu u«~vc u"U» uu u» no u» us » r » i n r - j « T i < - v r u n « u. ». I v u-te m i v t e - t « " t : - w n w i " v.o-L»-to-i.»-i.s-i.T-ti»T.v-t«-cic-vir-i.i—j--Xt—-j-: •-> > »-• : -«3 r—* 2 > , ! - , 2 2 7 1 J I . ul 412 " »• • • 1 1 L ,....1.1 . , -—• - - u« JJ. • --.$ - .--J • • « . , . . 1 . , J I • * 14 5 « 2 » . 1 1 "> T 1 " " T ^ : 2 : . . . . . . . . . . . • " , -1 40—2 »7 UH L'« «)1 us ... - j - - u. »« 2 . I l l u us 1 1 1 - I 1 1 1 l» «2 . J LC " — t - l " * " 1 * , 1 10 .1 '» " . 4 , 1 » 1 1 1 1 I 1 " "' 1 I " « 2 I " J 1 * 4 UJ !• «' , 1 L i ' ' . '* ,. : , < 27—r—i—-— r~> Li—si2-« i r m j L- J. » , .5 2 1 . - -"1 « — . 4J t— 1 • * L , i t J M J * -J 1« PT L> 1« 1 , -• 7T- 1—1 ' l u -Id 1 1 ' 7 T \ i 1 5 2 L> «2 2 L- «: U it 7 22 4 , j . «2 . — J - . 7 1.1 * • . • 1 1 « J S , M L . LL U, L . L, ^ ^ r ^ - ^ ^ ^ ^ ^ ^ ^ r ^ ^ - ^ "> * " " ^ " I" » ' " ^ L ' L U L " L I U L " ^ ' ^ R C T ^ T " . J S 0 X C * T . * ".Ot-JCfiO C u i « . t I T T L f S S 0 . -j : i 7* i * -2 1 * 11 »» • 1*9 —u—j—s—;—s—9—s—T~7—"—* J^ y>7 C < E « - U "tCCC». ITCs »AT£ - E3i.ll. W t l C T j , 8..S7SJ X n*vn-pi- t- r C^TS"!-5T75 vwx~ F i g u r e A .32 Type V f i l t e r s C o r n e l l CORAL2 data 66 -cha rac te r set DECISIONS CX) UA - U 3 - U C - U 0 - ue~ur ue UH~UI-VJ~OK-UCTJN-ON-UO-up-oa-UR-usuT-uu-ov-ow-u-jrur-oz— i 2 — 3 — « -UA 97 ' 3 +_ -03 £TS I t * 2 * I UC 8 0 * 4 6 • " ' 5 1 • • UO 1 + 87 • • 4 + + + 2 " U E 2 — + 8 9 - 3 - -• — • 1—• ~ J"' 2 " • + >" \jr 1 OS 3 1 UG a 1 84 3 1 1 2 ' • • g - - , - 0 •--»-• m l • NUMBER" • 3 1 2 • • + i UA ~VST UC UD UE UF UG 738 - S 7 3 — * 610 416 719 " 5oO 349 Un UI U J UK " >-UL UH 7S + + 94 9 6 " " • I " 9 4 1 86 5 + 2 1 • 1 ~UH 375 11 uo UP uo un us -*'—97 i ; 56 1 • 3 93 2 ......z 9 2 - - 3 1 96 + + 93 t 2 + • + 29 J ~ ' • 2 "TTT UU UV UK '•• *" UX uv ...... J.. 1 — 93-+ 97 1 - 1 9 2 ™ l 97 95 — 1 -1 • 1 UI UJ U>v UL UM - J N ~ UU UH UG UR US UU UV Uk UX UY 612 290 336 5o9 6 7 2 "714-625 305 229 653 551 -53 r 4«.i 319 300 223 519 U i - 9 T T 16 67 v " 2 " + 3 • 1 3« t 10 UA UB UC UO UE UE UG UH UI UJ UK UL U" UN uO UP UQ UH US UT UU UV U* UX UY UZ 1 —p«~DTC"A I E 5 wtTN-J? EW^CnjTT^TTT¥n^S"S_TKA-R'-0 .5 2 3 « S 6 7 8 179«i OVERALL" RECOGNITOirRATE*"-" "EUilAL WETGHTs•«b.'C««9" X - SAMPLE " E I G H T S : 6S.O0S1 X F i g u r e A . 3 3 Type V f i l t e r s C o r n e l l CORAL2 data 40 -cha rac te r set ' D E C I S I O N S (*) vs-nB-TJC-UD~Utr-t.FF"_utj~uK"TfT"IUT; x ~ u r o M U - N - U O ~ ~ U P H J Q ~ U R " ~ U S - U ^ NUMBER* IJ A 9t + • 3 UA 738 -7JB 93 1 — 1 ' f : 3 2 I UB ZT73— UC 81 + 4 8 1 6 + 1 UC 610 UO + 1 + 8 8 + + 6 . + + 1 2 UD 416 — — — — — -..... — - „ , . — • — -• , i c — -f 4 rt -UE 2—+ 90"~ "3" 1-'3 +~ U E 7 1 9 UF 1 95 3 1 . UF 560 86 * 1 1 2 J UG 349 UG .4 •* 1  T J H ? : 9"S UI 3 + 89 + UJ ~Z f ~l U'r\ 175-1 1 3 + 1 • + UI 612 + 98 + + + 1 + UJ 290 U K + * + 96 +"""• 1 1 : ~~+ 1 U K - - 336" UL + + 1 + 97 + + + + UL 589 UM • + 4 1 87 5 +_ + 2 UM 672 -TjT3 + i r~9~s ' : i * un r n r UO + 2 S + 2 '84 ' 1 + ft + UO 825 UP 1 4 94 2 UP 305 -Ua —: 1 + -3 92—3" "1 : UG"~ 229 UR + 1 + 1 1 . 1 96 + UR 653 US + + ' + 99 + US 551 97j—;F r 3 c n — n r r 98 uu um + 97 1 1 UV 319 UT 1 1 + UU UV + _ - U W — - + : - +— ~ UX 3 UY 2 + "1 95" ' UW300 97 UX 223 1 96 UY 519 : : 100 UZ 2TB" TTT UA U8 UC UD UE UF UG UH Ul UJ UK UL U« UN UU UP UQ UR US UT UU UV UW UX UY UZ 124S7 + INDICATES NON-ZERO QUANTITY LF.SS THAN 0,5 OVERALL RECOGNITON RATE - EQUAL WEIGHTS : 93.4777 X - SAMPLE WEIGHTS: 9 2 . 8 8 « 6 % F i g u r e A .34 Type V f i l t e r s C o r n e l l CORAL2 data 26 -cha rac te r set D E C I S I O N S (%j _„T—2__.-3 y S — - 6 j—g--.--g— NUHRER"" 90 2 3 3 + 1 I " 6 1 9-9 37 + ; 2 68 8 3 I + 97 1 + 1 3 S 5 6 4 + + 99 + + 4 704 -5 + 2 9 3 " " 5 3 8 0 " 6 + 1 3 94 1 6 294-7 + 1 99 7 361 -g —+ 1 1 1 9~7 8 ZP54 9 1 1 1 + 9 7 9 296 0 + . 1 1 98 0 206 } 2 3 4 5 6 7 8 9 0 4 4 2 0 + 1NUILA1E6 NOiWb.RO IJU ANP ' I Y LESS '^N O . b . ~0 V E« A F XT-R E C 0 G'NTTO N R A T E ' " " - E & O A ' C W E I G H T S : 9 5 . 9 1 0 0 T . SAMPLE W E I G H T S : 9 7 . 1 2 ^ 7 % F i g u r e A . 3 5 Type V f i l t e r s C o r n e l l CORAL2 da ta 10 -cha rac te r se t 95. REFERENCES A l F. A l i and T. P a v l i d i s , " S y n t a c t i c r e c o g n i t i o n of handwritten numerals", IEEE Trans. Systems, Man and C y b e r n e t i c s , Vol.7, pp. 537-341, J u l y 1977. B l P. Baran and G. E s t r i n , "An adaptive character reader", i n IRE Wescon Convention Record, p a r t 4, 1960, pp. 29-41. B2 R. Bellman, I n t r o d u c t i o n to M a t r i x A n a l y s i s 2nd E d i t i o n . New York, N.Y.: McGraw-Hill, 1970, pp.242-243. B3 W. Bledsoe and C. B i s s o n , "improved memory matrices f o r the h-tuple p a t t e r n r e c o g n i t i o n method", IRE Trans. E l e c . Comput., Vol.11, pp. 414-415, June 1962. C l C K . Chow, "An optimum character r e c o g n i t i o n system using d e c i s i o n f u n c t i o n s " , IRE Trans. E l e c . Comput., Vo l . 6 , pp. 247-254, Dec.1957. C2 C K . Chow, "A r e c o g n i t i o n method using neighbour dependence", IRE Trans. E l e c . Comput., Vol.11, pp.683-690, Oct. 1962. C3 C o r n e l l A e r o n a u t i c a l Laboratory, An Alphanumeric Character P a t t e r n Data Base. B u f f a l o , 1968. DI R.O. Duda and H. Fossum, " P a t t e r n c l a s s i f i c a t i o n by i t e r a t i v e l y determined l i n e a r and piecewise l i n e a r d i s c r i m i n a n t f u n c t i o n s " , IEEE Trans. E l e c . Comput., Vol.15, pp.220-232, Apr. 1966. D2 R.O. Duda and P.E. Hart, P a t t e r n C l a s s i f i c a t i o n and Scene A n a l y s i s . John Wiley and Sons, 1973. HI S. Hard and T. Feuk, "Character r e c o g n i t i o n by complex f i l t e r i n g i n reading machines", P a t t e r n R e c o g n i t i o n , V o l . 5 , pp. 75-82, 1973. H2 L.D. Harmon, "Automatic r e c o g n i t i o n of p r i n t and s c r i p t " , Proc. IEEE, Vol.60, pp. 1165-1176, Oct. 1972. H3 W.H. Highleyman, "An analog method f o r character r e c o g n i t i o n " , IRE Trans. E l e c . Comput., Vol.10, pp. 502-512, Sept. 1961. H4 W.H. Highleyman and L.A. Kamentsky, "Comments on a c h a r a c t e r r e c o g n i t i o n method of Bledsoe and Browning", IRE Trans. E l e c . Comput., V o l . 9, p. 263, June 1960. H5 D. Himmel, "Some r e a l - w o r l d experience w i t h handprinted o p t i c a l character r e c o g n i t i o n " , IEEE Trans. Systems, Man and C y b e r n e t i c s , V o l . 8, pp. 288-292, A p r i l 1978. H6 L.P. Horwitz and G.L. Shelton, " P a t t e r n r e c o g n i t i o n using auto-c o r r e l a t i o n " , Proc. IRE, V o l . 49, pp. 175-185, Jan. 1961. A. W. H o l t , "The impact of new hardware on OCR designs", P a t t e r n R e c o g n i t i o n , V o l . 8, pp. 99-105, 1976. B. Hunt, "Bayesian methods i n n o n l i n e a r d i g i t a l image r e s t o r a t i o n " , IEEE Trans. Comput., V o l . 26, pp. 219-229, Mar. 1977. A.B.S. Hussain, G.T. Toussaint and R.W. Donaldson, " R e s u l t s obtained using a simple character r e c o g n i t i o n procedure on Munson's handprinted data", IEEE Trans. Comput., V o l . 21, pp. 201-205, Feb. 1972. L. Kanal, "Patterns i n p a t t e r n r e c o g n i t i o n " , IEEE Trans. Information Theory, V o l . 20, pp. 697-722, Nov. 1974. A. Kegel, J . G i l e s and A. Ruder, "Observations on s e l e c t e d a p p l i c a t i o n s of o p t i c a l character readers f o r c o n s t r a i n e d numeric handprint", IEEE Trans. Systems, Man and C y b e r n e t i c s , V o l . 8, pp. 282-288, A p r i l 1978. A.L. K n o l l , "Experiments w i t h ' c h a r a c t e r i s t i c l o c i ' f o r r e c o g n i t i o n of handprinted c h a r a c t e r s " , IEEE Trans. Comput., V o l . 18, pp. 366-372, A p r i l 1969. P.A. Lachenbruch and M.R. Mickey, " E s t i m a t i o n of e r r o r r a t e s i n d i s c r i m i n a n t a n a l y s i s " , Technometrics, V o l . 10, pp. 1-11, Feb. 1968. T. M i l s o n and F. Rao, "A s t a t i s t i c a l model f o r machine p r i n t r e c o g n i t i o n " , IEEE Trans. Systems, Man and C y b e r n e t i c s , V o l . 6, pp. 671-678, Oct. 1976. M. Minsky, "Steps toward a r t i f i c i a l i n t e l l i g e n c e " , Proc. IRE, V o l . 49, pp. 8-30, Jan. 1961. J . Munson, "Experiments i n the r e c o g n i t i o n of handprinted t e x t : p a r t I - character r e c o g n i t i o n " , i n 1968 F a l l J o i n t Comput. Conf., AFIPS Conf. P r o c , V o l . 33. Washington, D.C: Thompson, 1968, pp. 1125-1138. G. Nagy, "State of the a r t i n p a t t e r n r e c o g n i t i o n " , Proc. IEEE, V o l . 56, pp. 836-862, May 1968. H. Niemann, " C l a s s i f i c a t i o n of c h a r a c t e r s by man and machine", P a t t e r n R e c o g n i t i o n , V o l . 9, pp. 173-179, 1977. N. N i l s s o n , Learning Machines. New York, N.Y.: McGraw-Hill, 1965. K. Ozawa and K. Tanaka, "Character r e c o g n i t i o n using c o r r e l a t i o n technique on s p a t i a l complex-frequency plane", i n Proc. 2nd I n t . J o i n t Conf. on Pat. Recog. Copenhagen, Denmark, 1974, pp. 409-410. T. P a v l i d i s and F. A l i , "Computer r e c o g n i t i o n of handwritten numerals by polygonal approximations", IEEE Trans. Systems, Man and C y b e r n e t i c s , V o l . 5, pp. 610-614, Nov. 1975. E . P e r s o o n a n d K . S . F u , " S h a p e d i s c r i m i n a t i o n u s i n g f o u r i e r d e s c r i p t o r s " , I E E E T r a n s . S y s t e m s , M a n a n d C y b e r n e t i c s , V o l . 7 , p p . 1 7 0 - 1 7 9 , M a r . 1 9 7 7 . W . A . P o r t e r , " A c o m p a r i s o n o f s e l e c t e d p a t t e r n r e c o g n i t i o n f u n c t i o n s " , P a t t e r n R e c o g n i t i o n , V o l . 9 , p p . 7 7 - 8 7 , 1 9 7 7 . P . J . Q u a r m b y , " E x p e r i m e n t s o n h a n d - w r i t t e n n u m e r a l c l a s s i f i c a t i o n " , I E E E T r a n s . S y s t e m s , M a n a n d C y b e r n e t i c s , V o l . 1 , p p . 3 3 1 - 3 3 8 , O c t . 1 9 7 1 . J . R a b i n o w , " T h e p r e s e n t s t a t e o f t h e a r t o f r e a d i n g m a c h i n e s " , i n P a t t e r n R e c o g n i t i o n , L . K a n a l , E d . , W a s h i n g t o n , D . C : T h o m p s o n , 1 9 6 8 , p p . 3 - 2 9 . C Y . S u e n , M . B e r t h o d a n d S . M o r i , " A d v a n c e s i n r e c o g n i t i o n o f h a n d p r i n t e d c h a r a c t e r s " , i n P r o c . 4 t h I n t . J o i n t C o n f . o n P a t . R e c , 1 9 7 8 , p p . 3 0 - 4 4 . C Y . S u e n a n d R . J . S h i l l m a n , " L o w e r r o r r a t e o p t i c a l c h a r a c t e r r e c o g n i t i o n o f u n c o n s t r a i n e d h a n d p r i n t e d l e t t e r s b a s e d o n a m o d e l o f h u m a n p e r c e p t i o n " , I E E E T r a n s . S y s t e m s , M a n a n d C y b e r n e t i c s , V o l . 7 , p p . 4 9 1 - 4 9 5 , J u n e 1 9 7 7 . G . T . T o u s s a i n t a n d R . W . D o n a l d s o n , " A l g o r i t h m s f o r r e c o g n i z i n g c o n t o u r - t r a c e d h a n d p r i n t e d c h a r a c t e r s " , I E E E T r a n s . C o m p u t . ( S h o r t N o t e s ) , V o l . 1 9 , p p . 5 4 1 - 5 4 6 , J u n e 1 9 7 0 . J . R . U l l m a n n a n d P . G . K i d d , " R e c o g n i t i o n e x p e r i m e n t s w i t h t y p e d n u m e r a l s f r o m e n v e l o p e s i n t h e m a i l " , P a t t e r n R e c o g n i t i o n , V o l . 1 , p p . 2 7 3 - 2 8 9 , 1 9 6 9 . T . Y . Y o u n g a n d T . W . C a l v e r t , C l a s s i f i c a t i o n , E s t i m a t i o n a n d P a t t e r n  R e c o g n i t i o n . New Y o r k , N . Y . : A m e r i c a n E l s e v i e r P u b l i s h i n g C o . I n c . 1 9 7 4 , p p . 5 2 - 2 1 9 . 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0065442/manifest

Comment

Related Items