UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

An experimental system for recognizing typewritten characters Fletcher, Thomas Ralph 1970

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-UBC_1970_A7 F54.pdf [ 5.27MB ]
Metadata
JSON: 831-1.0102200.json
JSON-LD: 831-1.0102200-ld.json
RDF/XML (Pretty): 831-1.0102200-rdf.xml
RDF/JSON: 831-1.0102200-rdf.json
Turtle: 831-1.0102200-turtle.txt
N-Triples: 831-1.0102200-rdf-ntriples.txt
Original Record: 831-1.0102200-source.json
Full Text
831-1.0102200-fulltext.txt
Citation
831-1.0102200.ris

Full Text

AN EXPERIMENTAL SYSTEM FOR RECOGNIZING TYPEWRITTEN CHARACTERS by THOMAS RALPH FLETCHER B.A.Sc, Uni v e r s i t y of B r i t i s h Columbia, 1968 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE i n the Department of E l e c t r i c a l Engineering We accept this thesis as conforming to the required standard Research Supervisor Members of the Committee Acting Head of Department Members of the Department of E l e c t r i c a l Engineering THE UNIVERSITY OF BRITISH COLUMBIA September, 1970 In p r e s e n t i n g t h i s t h e s i s i n p a r t i a l f u l f i l m e n t o f t h e r e q u i r e m e n t s f o r an a d v a n c e d d e g r e e a t t h e U n i v e r s i t y o f B r i t i s h C o l u m b i a , I a g r e e t h a t t h e L i b r a r y s h a l l make i t f r e e l y a v a i l a b l e f o r r e f e r e n c e a n d s t u d y . I f u r t h e r a g r e e t h a p e r m i s s i o n f o r e x t e n s i v e c o p y i n g o f t h i s t h e s i s f o r s c h o l a r l y p u r p o s e s may be g r a n t e d by t h e Head o f my D e p a r t m e n t o r by h i s r e p r e s e n t a t i v e s . I t i s u n d e r s t o o d t h a t c o p y i n g o r p u b l i c a t i o n o f t h i s t h e s i s f o r f i n a n c i a l g a i n s h a l l n o t be a l l o w e d w i t h o u t my w r i t t e n p e r m i s s i o n . D e p a r t m e n t o f £C&C 7-, The U n i v e r s i t y o f B r i t i s h C o l u m b i a V a n c o u v e r 8, C a n a d a Date $&PT. ^i/fyO ABSTRACT An o p t i c a l character r e c o g n i t i o n system was developed using the Lexiphone, a d i r e c t - t r a n s l a t i o n reading machine f o r the b l i n d , i n t e r f a c e d to a d i g i t a l computer. The system was designed to read alphanumeric typewritten characters. In the development stages one s e r i f e d type-style was used e x c l u s i v e l y . Three a d d i t i o n a l d i f f e r e n t , but s e r i f e d , type-styles were also used. C l a s s i f i c a t i o n was performed by a method re f e r r e d to as "component de l e t i o n " . This method extends a nearest neighbours c l a s s i f i c a t i o n scheme enabling feature vectors d i f f e r i n g i n length by one two-bit segment to be compared by making t h e i r dimensionality the same. The method includes the c l a s s i f i c a t i o n of equal length vectors. The component d e l e t i o n technique provided s i m i l a r r e s u l t s to exact matching yet required s i g n i f i c a n t l y l e s s t r a i n i n g . Using carbon ribbon p r i n t , recognition was b e t t e r than 92% f o r the four type-styles combined. The use of c l o t h ribbon for the type-style used i n the development lowered recogn i t i o n by a few percent. i i TABLE OF CONTENTS Page ABSTRACT . . . . . i i TABLE OF CONTENTS . . . . . . . . i i i LIST OF ILLUSTRATIONS v . LIST OF TABLES • - . v i i ACKNOWLEDGEMENT v i i i 1. INTRODUCTION 1 1.1 The Purpose of the Research . 1 1.2 Perspective 1 1.3 The System 2 2. PREPROCESSING INFORMATION.FROM THE LEXIPHONE 4 2.1 Introduction 4 2.2 Feature .Selection . 4 2.3 Feature Vector Representations 8 2.3.1 Vector Representation of the V e r t i c a l Information . . . 9 2.3.2 Vector Representation of the Horizontal Information . . 14 2.4 Further Discussion of the Vector Representations 16 3. FEATURE VECTOR CLASSIFICATION 20 3.1 Introduction 20 3.2 The Nonparametric Approach 20 3.3 Component Deletion 22 3.3.1 The Metric 22 3.3.2 The Comparison of Vectors of Equal Length 23 3.3.3 The Comparison of Vectors of D i f f e r e n t Lengths 23 3.3.4 Examples of Various Special Cases of CD 27 i i i Page 3.3.5 The C l a s s i f i c a t i o n of the H2 Vectors 27 3.4 The Sequential Decoder 27 3.5 The Ov e r a l l System 29 3.6 A Parametric Approach 31 4. TESTING AND RESULTS 33 4.1 Introduction 33 4.2 Desc r i p t i o n of the Training and Test Data 33 4.3 The E f f e c t s of Various Combinations of the Stages on Recognition 35 4.4 Recognition of HA . . . . 36 4.5 Recognition of HA, SP, SD, SC 37 4.6 The E f f e c t s of CD 38 5. SUMMARY AND DISCUSSION 43 5.1 Summary of the System . 43 5.2 Summary of Experiments and Concepts 44 5.3 Further Research 45 APPENDIX I 48 APPENDIX II 49 BIBLIOGRAPHY 51 i v LIST OF ILLUSTRATIONS Figure Page 1- 1 Block diagram of the OCR system 3 2- 1 S l i t scanning i n the Lexiphone . 5 2-2 Photograph of the r e s u l t i n g f(y,x) a f t e r scanning the l e t t e r "a" 5 2-3 Photograph from the computer display of the r e s u l t i n g VI a f t e r scanning the l e t t e r "a" 7 2-4 Photograph from the computer display of the r e s u l t i n g HI, (a), and H2, (b), a f t e r scanning the l e t t e r "a" 7 2-5 I l l u s t r a t i o n of the formation of v e r t i c a l and h o r i z o n t a l feature vectors f or the l e t t e r "0" 9 2-6 The representation of a t y p i c a l V2 for "a" 11 2-7 Photograph from the computer display of V3 for "a" 11 2-8 Flow chart of the algorithm f or the reduction of V2 to the f i n a l vector, V3 . 13 2-9 Flow chart describing the reduction of HI to H2 15 2-10 Photograph from the computer display of the r e s u l t i n g HI, (a), and H2, (b), a f t e r scanning the l e t t e r " j " 17 2- 11 An i n d i c a t i o n of trends while learning the four type s t y l e s : HA, SP, SD, SC 17 3- 1 The a p p l i c a t i o n of CD i n matching X with vectors from the t r a i n i n g set . . 26 3-2 Flow chart of the sequential decoder (TREE) 28 3- 3 Flow chart of the o v e r a l l c l a s s i f i c a t i o n scheme 30 4- 1 Examples of the four type-styles used: HA, SC, SP, SD . . . 34 4-2 Range i n q u a l i t y of the HA p r i n t used i n t r a i n i n g and te s t i n g , showing "a" and "W" 39 v Figure Page 4-3 An i n d i c a t i o n of the learning of a d d i t i o n a l aesz groups . . . 39 4- 4 A s i m p l i f i e d graphical i l l u s t r a t i o n of the overlapping p r o b a b i l i t y density functions of "a", "e", and "s" 42 5- 1 Recognition r e s u l t s from D/A a f t e r scanning "a", "b", "c", "d" 44 v i LIST OF TABLES Table Page 4-1 C l a s s i f i c a t i o n r e s u l t s f o r various combinations of stages of the system 35 4-2 Recognition rates f o r the group of l e t t e r s aesz f or EM and CD with t r a i n i n g on 12 and 192 groups 41 4-3 Confusion tables f o r the group of l e t t e r s aesz 41 (a) EM r e s u l t s a f t e r t r a i n i n g on 12 groups (b) CD r e s u l t s a f t e r t r a i n i n g on 12 groups (c) EM r e s u l t s a f t e r t r a i n i n g on 192 groups (d) CD r e s u l t s a f t e r t r a i n i n g on 192 groups v i i ACKNOWLEDGEMENT The author i s g r a t e f u l to h i s supervisor Dr. M.P. Beddoes for h i s advice and encouragement. Gratitude i s issued to Dr. D.L. Pulfrey f o r reading the manuscript and o f f e r i n g h e l p f u l suggestions. The author also acknowledges R. George f o r h i s help with the e l e c t r o n i c s implementation; C.Y. Suen and S. Hussain for proofreading the manuscript; G.T. Toussaint f o r proofreading and h e l p f u l discussions; Herb Black for photography; Heather DuBois and Veronica Komczynski for typing the manuscript; and the National Research Council of Canada for f i n a n c i a l assistance. v i i i CHAPTER 1. INTRODUCTION 1.1 The Purpose of the Research The project was undertaken with the dual aim of producing a workable o p t i c a l character recognition (OCR) system while c o l l e c t i n g s u f f i c i e n t experi-mental data to obtain a f a i r l y complete p i c t u r e of the problems to be overcome i n the manufacture of a highly s u c c e s s f u l machine. The working system should be capable of c l a s s i f y i n g various s t y l e s of s e r i f e d typewritten text, produced from reasonably acceptable carbon or c l o t h ribbon. The computer programming was organized i n a step-by-step manner to enable the development and study of the intermediate stages of the system. No e f f o r t was made to optimize programming i n order to minimize computation time. 1.2 Perspective Various types of machines have been developed which read s t y l i z e d and standard typewritten text. The i n t e r e s t e d reader i s dir e c t e d to [1], [2] f o r an o v e r a l l d e s c r i p t i o n of pattern recog n i t i o n applied to reading machines, to [3] f o r a d e s c r i p t i o n of types of character recog n i t i o n machines, to [4], [5] f o r a d e s c r i p t i o n of s o p h i s t i c a t e d OCR machines, to [6], [1], [19], [18] for a d e s c r i p t i o n of the popular contour tracing scheme, and to [7] for a d e s c r i p t i o n of the Cognodictor, a simple spelled-speech output reading machine f o r the b l i n d . I t i s f e l t that the contour tracing scheme u t i l i z i n g an acceptably inexpensive f l y i n g - s p o t scanner would be more expensive, and no more immune to l e t t e r noise, than the type of front-end on the Lexiphone (Chapter 2.2). A f l y i n g spot scanner has been s u c c e s s f u l l y used i n such commercial machines as the IBM 1287 and 1288 [5] under the proviso that s p e c i a l OCR type-styles produced from the p r e c i s e S e l e c t r i c typewriter, or c a r e f u l l y formed handprinted numerics, be used. 2 The Cognodictor [7] i s the type of machine which i s po s s i b l y the cl o s e s t to the i d e a l personal reading machine f o r the b l i n d . I t i s t r u l y portable, has a hand-held reading head, has moderate accuracy (90-95%) and speed (80-90 words per minute), recognizes most popular type fonts, and has-a s p e l l e d speech output. However, information concerning the q u a l i t y of the p r i n t , what exactly i s meant by "most popular type fonts", and tracking and alignment are not d i s c l o s e d i n [7]. A low r e s o l u t i o n matrix of photocells i s the means by which the characters are scanned,and problems of r e p r o d u c i b i l i t y are l i k e l y to a r i s e with such a scanner [8]. Alignment d i f f i c u l t i e s must also a r i s e becuase of a lack of leeway within the low r e s o l u t i o n scanning f i e l d . The Lexiphone [9], [8] i s a d i r e c t - t r a n s l a t i o n reading machine with sound output f o r the b l i n d . The front-end i s a high r e s o l u t i o n scanner comprised of 54 photodiodes capable of producing a f a i r l y consistent output [10] for d i f f e r e n t scannings of a given character. The c o n s i s t e n t l y good performance of the scanner suggested that t h i s machine could be used as a front-end f o r a character reco g n i t i o n machine. The code sounds from the Lexiphone could be read by people. This ind i c a t e d that the Lexiphone might provide s u f f i c i e n t information to allow m a c h i n e - c l a s s i f i c a t i o n of typewritten alphanumeric characters. 1.3 The System A s i m p l i f i e d block diagram of the complete system i s shown i n Figure 1-1. The Lexiphone i s used as a front-end and p a r t i a l preprocessor. Hardware c i r c u i t r y i s added to the Lexiphone to obtain more information than that provided by the Lexiphone alone. The remaining preprocessing, and processing ( c l a s s i f i c a t i o n ) i s performed by a PDP-9 d i g i t a l computer. The system uses a multistage c l a s s i f i e r composed of a table look-up c l a s s i f i e r i n s e r i e s with a simple sequential decoder, and a second table look-up c l a s s i f i e r to c l a s s i f y 3 the information from the a d d i t i o n a l hardware. Results are outputted on-line v i a the teletype, or as voltage l e v e l s from a digital-to-analogue converter (D/A). P£epgoc~6£<,o&. CiAssiF/'etz. ivpuT P/?./M-cr£> LEX/P//O/VE fiooeo /AJTSRFACE. PDA-3 i ... J....L i Ourpirf Figure 1-1 Block diagram of the OCR system. 4 CHAPTER 2. PREPROCESSING INFORMATION- FROM THE LEXIPHONE 2.1 Introduction The operation of the Lexiphone and the manner i n which the Lexiphone was modified to obtain the desired feature s e l e c t i o n i s discussed. The develop-ment of feature vectors and a discu s s i o n of these vectors i s presented. 2.2 Feature S e l e c t i o n The Lexiphone output i s a frequency and amplitude modulated square wave. The frequency modulation i s provided by an analogue s i g n a l given by M£x)-1 f (y,x) = y .(x) + E Ay. (x) = f. m . , I i = l This function i s bet t e r understood a f t e r a b r i e f d i s c u s s i o n of the Lexiphone. The l i n e of typewritten p r i n t i s moved h o r i z o n t a l l y below a s l i t scanner. The s l i t consists of a l i n e of 54 photodiodes f i x e d i n a plane on which the ink patterns are focused. At any one insta n t there are M^ white to black tran-s i t i o n s imaged on the photodiode array. Figure 2-1 i l l u s t r a t e s the case with the p o s i t i o n of the scan at x^x^- TTi e complete f for' the l e t t e r "a" i s shown i n Figure 2-2. The audio output i s a square wave with a frequency proportional to f . Amplitude modulation i s part of the code. I f a contiguous column i n a character exceeds a c e r t a i n length the amplitude of the audio output i s increased 10 dB. For example, the sound becomes louder three separate times during scanning of the l e t t e r "m". Because f and the column detection (CN) proved u s e f u l i n producing a humanly recognizable code, i t was decided to use these techniques i n the system. The analogue waveforms representing the various l e t t e r s were c o n s i -dered too complex to be e a s i l y processed. A more c l e a r l y defined function Top OF SCAM BLECTROWIC VERTICAL SLIT SCANA//H Cr OF TAL MOTIOfJ OF SLIT RELATIVE To PAPER >~ X F i g u r e 2 - 1 S l i t s c a n n i n g i n the Lexip h o n e . f ( y , x k ) = y m ( x k ) + A y 1 ( x k ) + A y 2 ( x k ) F i g u r e 2-2 Photograph of the r e s u l t i n g f ( y , x ) a f t e r s c a n n i n g the l e t t e r " a " . 6 r e t a i n i n g s i g n i f i c a n t f eatures of f was sought. The Lexiphone processes scan by scan approximately every 0.1 m i l l i -seconds, w i t h two c o n t r o l pulses P2, P3 o c c u r r i n g i n sequence before the s t a r t of the e l e c t r o n i c scanning of the 54 d e t e c t o r s . At P2, the number of edges plu s one M +1, encountered during the previous scan i s loaded i n t o a three-b i t r e g i s t e r (the " c o u n t - b y " . r e g i s t e r i n [ 8 ] ) . By sampling t h i s r e g i s t e r , decrementing, and s t o r i n g the r e s u l t i n computer memory on P3 a d i s t r i b u t i o n of the edge count i s b u i l t up. The new f u n c t i o n VI i s more c l e a r l y d e f i n e d than f and i s given by V(M^, x) = M^(x) = V I . The complete f u n c t i o n f o r the l e t t e r "a" i s shown i n F i g u r e 2-3. Note that although f and VI are s i m i l a r , VI r e q u i r e s only four l e v e l s to represent the most complex ch a r a c t e r . Note a l s o i n VI that a l l i n f o r m a t i o n concerning the r e l a t i v e h eights of the edges i s d i s c a r d e d . V e r t i c a l alignment w i t h i n the scanning f i e l d i s thus i n c o n s e q u e n t i a l f o r VI. The r e s u l t i s . a n i n c r e a s e i n s i m p l i c i t y accompanied by a corresponding decrease i n i n f o r m a t i o n content. To evaluate the usefulness of VI and CN, a c l a s s i f i -c a t i o n scheme was developed. Using i n f o r m a t i o n on the p o s i t i o n i n g i n the x - d i r e c t i o n and heights of columns, and a s i m p l i f i c a t i o n of VI (Chapter 2.3.1), fea t u r e v e c t o r s were formed to represent scanned c h a r a c t e r s . A look-up t a b l e (Chapter 3.2) was formed w i t h the average number of d i f f e r e n t v e c t o r s (many v e c t o r s occurred s e v e r a l times per character) per c h a r a c t e r being 17.3. I t was f e l t t hat the t a b l e , or t r a i n i n g s e t , was l a r g e enough to provide a s u f f i c i e n t d e s c r i p t i o n of the 52 c h a r a c t e r t y p e w r i t t e n alphabet. Recognition was performed on s i x 52 c h a r a c t e r alphabets from the t r a i n i n g s e t . The r e s u l t s of 65% using the exact match (EM) technique (Chapter 3.3.2) and 82% using a dimension e q u a l i z a -t i o n (DE) technique (Chapter 3.3.3) were obtained. These r e s u l t s were somewhat o p t i m i s t i c because the t r a i n i n g set was used f o r t e s t i n g [ 1 1 ] ? [ 1 2 ] . I n e f f e c t , F i g u r e 2-3 Photograph from the computer d i s p l a y of the r e s u l t i n g VI a f t e r s c a n n i n g the l e t t e r " a " . W (6) F i g u r e 2-4 Photograph from the computer d i s p l a y of the r e s u l t i n g HI, ( a ) , and H 2 , ( b ) , a f t e r scanning the l e t t e r " a " . 8 too much t r a i n i n g was required and even then inadequate r e s u l t s were obtained. More information had to be extracted from the characters. The means to obtain more information was to scan h o r i z o n t a l l y p a r a l l e l to VI. HI was produced by hardware added to the e x i s t i n g Lexiphone. B a s i c a l l y , each of the 54 v e r t i c a l l y arranged detectors counted the number of edges (M ) encountered as the character moved h o r i z o n t a l l y beneath the s l i t . The actual system i s b r i e f l y documented i n Appendix 1. A photograph of HI f o r the scanning of "a" i s shown i n Figure 2-4(a). Note that information concerning the o v e r a l l height and the r e l a t i v e heights of the features w i t h i n a character i s a v a i l a b l e i n HI. The computer display revealed- that often columns would not be detected because of breaks i n l e t t e r s . Also, the p o s i t i o n i n g of the columns was not d e f i n i t e because of f l u c t u a t i o n s i n platform speed. Tall-column (as i n "I") detection and the p o s i t i o n i n g of a column wit h i n two equal regions along the x - d i r e c t i o n were used i n the sequential decoding of a few s p e c i a l cases (Chapter 3.4). The r e l i a n c e on CN was i n t h i s way kept to a minimum. 2.3 Feature Vector Representations This s e c t i o n describes the development of feature vectors from the v e r t i c a l and h o r i z o n t a l information. Figure 2-5 i l l u s t r a t e s the processes involved f o r the l e t t e r "0". VI i s composed of samples of the edge count (M ) c o l l e c t e d approxi-mately every 0.1 milliseconds while the character i s passing beneath the scanner. Dimensionality and noise are reduced i n producing V2 from VI. Further dimensionality reduction produces V3, the v e r t i c a l feature vector. the h o r i z o n t a l Experiment showed that too much emphasis was being placed on CN. HI i s composed of the outputs of 54 counters and y i e l d s a h o r i z o n t a l edge count (M^) d i s t r i b u t i o n a f t e r a l e t t e r has passed beneath the scanner. H2 i s a highly s i m p l i f i e d form of HI and i s the h o r i z o n t a l feature vector. HZ V 3 F i g u r e 2 - 5 I l l u s t r a t i o n o f t h e f o r m a t i o n o f v e r t i c a l a n d h o r i z o n t a l f e a t u r e v e c t o r s f o r t h e l e t t e r " 0 " . 2.3.1 Vector Representation of the V e r t i c a l Information Aft e r scanning a character a b u f f e r i n computer memory contains a serie s of three-bit words composing VI. The actual number of thre e - b i t words, or samples, depends upon the width of the l e t t e r and the platform speed. For the usual platform speed used an average character i s resolved into 800 samples of the edge count, M^. Such a high degree of r e s o l u t i o n i s not necessary, but was used at the s t a r t of the development of the system r e a l i z i n g that r e s o l u t i o n could always be reduced. The high r e s o l u t i o n was retained throughout the development. Each VI i s an extremely long vector averaging 800 samples at three 10 b i t s each, or 2,400 bits.... E f f i c i e n t c l a s s i f i c a t i o n of vectors of this length would be extremely d i f f i c u l t . VI contains S samples. A noise threshold was chosen to be large enough to r e j e c t most noise, but small enough to r e t a i n most pertinent i n -formation. This threshold, T, was chosen experimentally by observing VI on the computer display and noting T's e f f e c t on the r e s u l t i n g feature vectors. The chosen threshold was T = — . Note that the noise threshold i s pro p o r t i o n a l to the l e t t e r width, S. Two advantages r e s u l t . F i r s t , the platform speed may be v a r i e d with the threshold r e l a t i v e to the r e s u l t i n g S (a slow platform speed y i e l d s a large S and a f a s t platform.a small S) remaining constant at 1/32. This means that a wide range of reading speeds may be used. Also, due to a "crude" d r i v i n g mechanism the platform may cause S to be much d i f f e r e n t for d i f f e r e n t scannings of the same character. This case i s equivalent to the previous case of d i f f e r e n t reading speeds. The second advantage i s that for narrow l e t t e r s pertinent information, i t s e l f dense i n narrow l e t t e r s , i s . not r e j e c t e d . As an example, i f the threshold f o r "W", a wide l e t t e r , were to be used f o r the narrow l e t t e r " j " much pertinent information would be l o s t f u • II rom j • The s i m p l i f i c a t i o n of VI using T i s accomplished with the following algorithm. The number, n , of consecutive samples with i d e n t i c a l M i s stored c v i n a sample counter. I f n^ reaches T then the corresponding M v i s recorded i n a secondary table and the sample counter i s r e i n i t i a t e d . Should Mychange before T i s reached the value of M preceding t h i s change i s discarded and the sample counter i s r e i n i t i a t e d . The process of checking through the VI d i s t r i b u t i o n i s continued u n t i l an e n t i r e secondary d i s t r i b u t i o n , V2, (as shown i n Figure 2-6) i s completed. Secondary samples could e a s i l y be l o s t when M changes before T i s 11 v 3 (3 F i g u r e 2-6 The r e p r e s e n t a t i o n of a t y p i c a l V2 f o r " a " . Note t h a t the s p i k e s p r e s e n t i n F i g u r e 2-3 have been e l i m i n a t e d . F i g u r e 2-7 Photograph from the computer d i s p l a y of V3 f o r " a " . The v e c t o r r e p r e s e n t a t i o n i n edges i s l / 2 / 3 / 3 / 3 / l / l • S u b t r a c t i n g one edge from each segment and p l a c i n g a "1" i n f r o n t t o denote the b e g i n n i n g of the v e c t o r the r e s u l t i n o c t a l i s Y3=43240g. 12 reached, or at sub-threshold spikes (two of which can be observed i n Figure 2-3) which i n t e r r u p t the sample counter and cause i t to be r e i n i t i a t e d . The loss of a few V2 samples out of around 32 was not considered c r i t i c a l . The number of samples i n VI i s approximately reduced by a f a c t o r of 32 i n V2. Reduction for the l e t t e r "a" i s i l l u s t r a t e d i n Figure 2-6. V2 i s a much simpler representation than VI i n Figure 2-3 which contains 800 or 900 samples and noise spikes . To s i m p l i f y V2, each change i n i s retained as w e l l as the r e l a t i v e widths of each separate l e v e l of constant M . The various widths of the l e v e l s v of constant M must be retained to allow d i s c r i m i n a t i o n between characters of v somewhat s i m i l a r VI d i s t r i b u t i o n s . Also the dimensionality of the feature vector should be low. A flow chart i l l u s t r a t i n g the chosen algorithm to produce the f i n a l vector, V3, i s shown i n Figure 2-8. The algorithm works b a s i c a l l y i n the following manner. The f i r s t entry i n V2 i s recorded. I f the following N consecutive entries are i d e n t i c a l , then the entry i s recorded again and i s recorded each time N i s reached. I f an edge change occurs i n V2 before N i s reached, the new entry i s recorded and a check f o r i d e n t i c a l e ntries i s s t a r t e d again. The algorithm w i l l be applied to Figure 2-6 using N=6. The change from Mv=0 to M^=l i s recorded as M =1. Following the i n i t i a l entry i s a s t r i n g c o n s i s t i n g of a s i n g l e entry of M =1. The s t r i n g does not exceed N=6 so no fur t h e r values are recorded for M =1. With the change to M -2, the value v v M^=2 i s recorded. Trie three following entries are disregarded because N=6 i s not reached. The change from M =2 to M =3 i s recorded as M =3. Twelve v v v consecutive entries follow, exactly enough to reach N=6 twice. Two more values of M =3 are thus recorded. The change from M =3 to M =1 i s recorded as v v v M v=l with the following ten consecutive entries supplying one a d d i t i o n a l M =1 13 (sTART^) RECORD NEWLY ENCOUNTERED ENTRY RECORD INITIAL ENTRY PROM V2 INCREMENT THE V2 SAMPLE COUNTER HAS THE NEW SAMPLE' -NO-< THE SAME M v AS THE PREVIOUS SAMPLE/ 9 NO YES ~'S FINISHED 'CHECKING ALL V2 ENTRIES 9 RECORD ANOTHER ENTRY OF THAT VALUE OF M, 7 STOP F i g u r e 2-8 Flow c h a r t o f t h e a l g o r i t h m f o r t h e r e d u c t i o n o f V2 t o t h e f i n a l v e c t o r , V'3. 14 value. The r e s u l t i n g coarsened d i s t r i b u t i o n , as shown i n Figure 2-7, i s M =1/2/3/3/3/1/1. Note that while the. dimensionality of t h i s function i s v v a s t l y reduced from VI (Figure 2-3) the pertinent information remains. Because two b i t s can represent the maximum of four edges found i n typewritten characters, the l e v e l s 0 to 3 were chosen to correspond r e s p e c t i v e l y to the edges 1 to 4. Values outside this range were rejected as r e s u l t i n g from l e t t e r noise. Thus i n the f i n a l feature vector representation each value of M i s decremented, v A leading "1" i s used to mark the beginning of a V3. The l a s t b i t of a V3 corresponds to the l e a s t s i g n i f i c a n t b i t i n the computer word. The feature vector for "a" evolves thus: /1/2/3/3/3/1/1 •> 1/0/1/2/2/2/0/0 = 1/00/01/10/10/10/00/00 = [43240] g= V3 g The basic idea of reduction for VI i s also employed f o r the h o r i z o n t a l information. 2.3.2 Vector Representation of the Horizontal Information A f t e r scanning a l e t t e r , HI i s contained i n a b u f f e r of 54 t h r e e - b i t words i n computer memory. The e n t i r e b u f f e r f o r an "a" i s displayed i n Figure 2-4(a). HI i s s i m p l i f i e d into H2 according to the flow chart i n Figure 2-9. B a s i c a l l y , i f two or more consecutive entries have the same then that i s recorded. I f following those two entries are K or more consecutive i d e n t i c a l values of then an a d d i t i o n a l entry i s recorded. A maximum of two entries can be recorded i n H2 f o r a given value of i n HI. Entries of M^=0 i n HI above and below the character are disregarded. E n t r i e s of M^=0 within the character are a r b i t r a r i l y changed to the value " l " . Figures 2-4 and 2-10 r e s p e c t i v e l y i l l u s t r a t e the s i m p l i f i c a t i o n for the l e t t e r s "a" and " j " . The algorithm w i l l be explained using Figure 2-10 with K=13. The f i r s t two consecutive samples are M =1 so an M = l i s recorded. 15 START AT TOP OP HI DISTRIBUTION FIND FIRST NON-ZERO ENTRY YES INCREMENT THE HI SAMPLE COUNTER RECORD, THAT VALUE OF M-•H YES IGNORE ENTRY INCREMENT THE HI SAMPLE COUNTER YES ' NEW- ENTRY ISOLATED ? —^ EJ— .NO-YES RECORD A SECOND ENTRY FOR THAT M H -=82-F i g u r e 2-9 Flow c h a r t d e s c r i b i n g the r e d u c t i o n of HI t o 112. 16 The threshold K i s not exceeded so there are no more e n t r i e s . Next are two values of M^=0 so M^=l i s recorded. Then a se r i e s of M^=l entries are encountered with the number of edges following the o r i g i n a l two reaching K=13. Consequently two entries of MJJ=1 a r e recorded. S i m i l a r l y , the l a s t two entries recorded are 14^ =2 and ^ =1. As seen i n Figure 2-10(b) the r e s u l t i n g h o r i z o n t a l feature vector f o r th i s " j " i s : /1/0/1/1/2/1 -> 1/1/1/1/1/2/1 + 1/0/0/0/0/1/0 = [1004]g = H2 g A f u r t h e r discussion of the V3 and H2 vectors w i l l next be considered. 2.4 Further Discussion of the Vector Representations Now that the construction of V3 and H2 has been discussed the philosophy behind t h e i r construction w i l l be considered. The v e r t i c a l scanning, p a r t i c u l a r l y with s e r i f s present, produces more information than the h o r i z o n t a l scanning. Consequently, V3 i s allowed to be more complex than H2. V3 and H2 d i f f e r i n that V3 i s not limited' i n the number of two-bit segments following a change i n M , while H2 i s l i m i t e d to a maximum of two ent r i e s following a change i n M ^ . The s i z e of H2 w i l l thus not increase to the extent that V3 w i l l . Horizontal information was meant to be used p r i m a r i l y as a reinforcement f o r v e r t i c a l information. A short H2 i s thus a p p l i c a b l e , and with t h i s shorter vector CD i s not necessary. Three thresholds used i n vector formation were previously mentioned: "2" f o r the formation of H2, K for H2, and N f o r V3. A l l were chosen experimentally. The threshold "2" was chosen simply by observing HI, as i n Figures 2-4(a) and 2-10(a), on the computer d i s p l a y . Figure 2-10(a) i s s u f f i c i e n t to describe the reasoning behind the s e l e c t i o n . The gap between the dot and the column of the " j " i s considered pertinent. The lone M ^ = 2 entry among the MJJ=1 entries i s meaningless. Below the gap i n the " j " are three pertinent edges, F i g u r e 2-10 P h o t o g r a p h f r o m t h e c o m p u t e r d i s p l a y o f t h e r e s u l t i n g H I , ( a ) , a n d H 2 , ( b ) , a f t e r s c a n n i n g t h e l e t t e r " j " . F i g u r e 2 - 1 1 l.::A..//mS£A...OF...:.. CO j C f f A M O T Z R . JjLP/tABB.TS. A n i n d i c a t i o n o f t r e n d s w h i l e l e a r n i n g t h e f o u r t y p e s t y l e s : HA, S P , 3D, SC ( C h a p t e r 4.2) 18 1, 2, and 1, a l l with more than 2 e n t r i e s . The lower l i m i t of the l a s t s i n g l e edge, the bottom of the hook, (also the top and bottom extremities i n "c") would be 2 e n t r i e s . A reasonable value f o r t h i s threshold i s thus 2. To provide u s e f u l information concerning the r e l a t i v e heights of features, K was chosen as K=13. Reasoning f o r t h i s choice of K i s as follows. The l e t t e r s " t " and "b" are s i m i l a r i n both V3 and HI. Each could have l e v e l s of M^=l/2/l as the d i s t r i b u t i o n . But " t " has approximately 16 entries i n i t s M ^ = l column, while "b" has approximately 8 e n t r i e s . By s e t t i n g K=13, H2 f o r " t " i s 1/1/2/1 while f o r "b" H2 i s 1/2/1. The t a l l l e t t e r "V" could have as few as 13 e n t r i e s before the bottom s i n g l e edge p o r t i o n (the meeting of the diagonal l i n e s ) . To y i e l d an H2 of 2/2/1 f o r "V" and thus emphasize the height of the M^=2 p o r t i o n , the threshold could be no greater than 13. K was chosen small enough to prevent great overlap among the various categories yet n e c e s s a r i l y large to prevent an excess of d i f f e r e n t H2 vectors f o r a given category. To keep the dimensionality of V3 low, N should be large. In the l i m i t as N becomes very large V3 would consist only of the edge changes found i n V2. Any given category would thus have r e l a t i v e l y few vector representations and hence few r e j e c t i o n s , but a great number of s u b s t i t u t i o n s would occur. Conversely, i f N=l then the r e s u l t i n g V3 would be i d e n t i c a l to V2. Values of N=6, 5, and 4 were inve s t i g a t e d . Recognition r e s u l t s obtained with a large t r a i n i n g set and EM were previously given as 65% c o r r e c t . In that case, N=4 was used. The average length of V3 f o r N=4 was found to be 20 b i t s (excluding the leading "1"). For N=5 the average length was 16 b i t s , and f o r N=6, 14 b i t s . Fewer r e j e c t s than obtained with N=4 were e s s e n t i a l . Because of the added information of H2 (an average length of 8 b i t s ) i t was f e l t that the r i s k of 19 s u b s t i t u t i o n i n V3 could be taken to obtain a short V3. Thus N=6 was chosen. With a r e s u l t i n g r e l a t i v e l y short V3 a small amount of t r a i n i n g was expected to produce a good representation of the most probable vectors. As a guide to how we l l the system was being trained on V3's the number of new vectors found a f t e r t r a i n i n g on a se r i e s of typings of a 60 character alphabet was inve s t i g a t e d and recorded i n Figure 2-11. The f i r s t 12 typings (alphabets) comprise the standard t r a i n i n g set produced by a Hermes Ambassador typewriter with both c l o t h and carbon ribbons. The new entries eventually decrease to about 20 per complete alphabet trained on. I t was expected that should there be on the average around 20 vectors rejected by EM many of these vectors could be "found" by CD (Chapter 3.3). As the three a d d i t i o n a l type-styles were learned.the new entries increased at f i r s t then s e t t l e d down each time. The s a t u r a t i o n point of around 20 new vectors per alphabet, reached with v e r t i c a l information, can be compared with approximately 10 new vectors per alphabet f o r the h o r i z o n t a l information. The lower sa t u r a t i o n point f o r H2 i s necessary because no CD i s used. The processing of these vectors w i l l next be discussed a f t e r a b r i e f account 6f c l a s s i f i c a t i o n theory applying to the c l a s s i f i c a t i o n method chosen. 20 CHAPTER 3. FEATURE VECTOR CLASSIFICATION 3.1 Introduction This chapter deals with the c l a s s i f i c a t i o n of V3 and H2. The c l a s s i f i c a t i o n of H2 was accomplished using an exact match (EM) technique:. V3 i s c l a s s i f i e d using a "nearest neighbours" (NN's) technique i n conjunction with a "dimension e q u a l i z a t i o n " (DE) scheme which allows comparison of vectors o r g i n a l l y having s l i g h t l y d i f f e r e n t lengths. The o v e r a l l V3 c l a s s i f i c a t i o n method i s termed "compenent d e l e t i o n " (CD). A sequential decoder (TREE) i s used to reduce s u b s t i t u t i o n errors i n c l a s s i f y i n g V3. The complete system combines V3 with CD and TREE, and H2 with EM. 3.2 The Nonparametric Approach An a r b i t r a r y feature vector X i s optimally c l a s s i f i e d as belonging to pattern class C. when the discriminant x g .00 = P(X,C.) = P(X|C.)P(C.) (3-1) i s a maximum. The optimum c l a s s i f i c a t i o n d e c i s i o n i s based on the r e l a t i v e degree of s i m i l a r i t y between X and the vectors i n the R subsets of a l l the classes C^, C^, C R, i n the t r a i n i n g set [13]. The table look-up c l a s s i f i c a t i o n method i s an approximation to th i s procedure. Instead of considering a l l p o s s i b l e vectors, only the most probable vectors (those that the X's are l i k e l y to be near) are considered. A look-up table contains R t r a i n i n g subsets, one f o r each pattern class C^. A t r a i n i n g subset, ij; , for C^ i s considered complete a f t e r t r a i n i n g on ^ independent ink patterns i n C^. The t r a i n i n g set thus consists of N_^ R vectors f o r N^=N^=. . « = N R - I f a member, X^ ^, of ij^ occurs h^ of the N_^  times during t r a i n i n g i t s frequency i s h^/N^, an estimate of P(X^ ^|C.). Tims the look-up table contains estimates of the p r o b a b i l i t i e s of the most l i k e l y vectors. 21 So P(XJC^) i s i m p l i c i t l y present i n the look-up table. The c l a s s i f i c a t i o n method applied to equal length vectors w i l l be discussed f i r s t . The k feature vectors of the R t r a i n i n g subsets iji , ...» which l i e within a distance, d, or l e s s , of X are gathered. Of these k is. v vectors l e t n 1 belong to ik , n 0 to i{/0, n to if-r,* so that n n+n +. . . .+n =k . i l / Z R R 1 Z Kv The category of X i s chosen as i f the i f c ^ discriminant, n^, i s the l a r g e s t of the discriminants n n , n~, n„. This method w i l l be c a l l e d the k 1 2 R v nearest neighbour (k -NN) method. v The k^-NN method i s s i m i l a r to the F i x and Hodges, [14], p. 120, method, or k-NN [15], [16], [17] method, i n which k, a large p o s i t i v e integer i s small compared to and i s f i x e d . With the k-NN method k should be large so that the n^, n^, n^ values w i l l not be overly s e n s i t i v e to small changes i n X. On the other hand k must be much smaller than the t o t a l number, N_^ , of vectors in the t r a i n i n g subsets so that s e n s i t i v i t y to the v a r i a t i o n s of the p r o b a b i l i t y d i s t r i b u t i o n s with X w i l l not be l o s t . The k-NN d e c i s i o n rule has been shown [23] to lead to the same decisions as would be made i f the p r o b a b i l i t y d i s t r i -butions were known and used i n (3-1) as long as k and N. are allowed to increase x without l i m i t while k/jy^ approaches zero. With d f i x e d , as i n the k -NN method, k i s v a r i a b l e . Consequently v v k^ may be the i d e a l value i n some cases while i n other cases i t may be too large or too small. With a fi x e d k, the k-NN method automatically relaxes the distance requirement (makes d larger) i f X e x i s t s i n a sparse population of t r a i n i n g vectors and thus searches a greater p o r t i o n of the vector space. Under the same circumstances with a f i x e d d (the k^-NN method) possibly only a few t r a i n i n g vectors would be observed and the discriminants may vary r a p i d l y with X. In order f o r the k -NN method to be succ e s s f u l some provisions must v r be made to ensure a s u f f i c i e n t l y , but not overly, large k . These provisions 22 can be obtained by the appropriate choices of the metric, d, and the vector length, L, as w e l l as by the use of DE. In the above discussion the various pattern classes were considered as equiprobable. To account f o r the a p r i o r i p r o b a b i l i t y of C^, P(C^) i n (3-1), the discriminant values could be m u l t i p l i e d by the respective a p r i o r i p r o b a b i l i t i e s . Thus, n P(C. ),n P(C ), n P(C ) would be checked f o r a 1 1 Z Z R R maximum. A l t e r n a t i v e l y the s i z e , N , of each of the R subsets could be adjusted while t r a i n i n g such that the more probable classes have more vectors (a.larger N ) i n t h e i r respective subsets. In other words more t r a i n i n g would be allowed f o r the more probable classes than the le s s probable c l a s s e s . 3.3 Component Deletion CD i s a means of, from one viewpoint, enlarging the t r a i n i n g set by producing NN's to the stored t r a i n i n g vectors. Instead of comparing X to a few X^'s, X i s compared to the X_^ 's and several nearest neighbours of the X_^'s. Another viewpoint i s that CD allows a small degree of d i s s i m i l a r i t y between X and i t s matches found i n the t r a i n i n g set. CD extends beyond simple distance d i s s i m i l a r i t i e s to enable the comparison of vectors d i f f e r i n g i n dimensionality. CD includes the comparison of equal-length vectors (the k -NN) method v and vectors d i f f e r i n g i n length by a s i n g l e two-bit segment (DE). Thus X. w i l l be gathered among the k^ vectors i f i t d i f f e r s from X by no more than one two-b i t segment. 3.3.1 The Metric To measure s i m i l a r i t y between vectors of equal length any convenient metric may be chosen [13], p. 15. The metric chosen to y i e l d a two-bit tolerance i s L / 2 d = I |y, - y..I (3-2) j = l 3 2 23 where y = ( X2 J _ T » X 2 j ^ ' a Va^-T °^ binary components, j - 1,2,3, L/2 L = number of b i t s i n X (and X^). i s 1,2,3, R The binary components are ^, x^ _. while y^ i s a two-bit segment. 3.3.2 The Comparison of Vectors of Equal Length The case with the lengths of V3's, X and X , equal w i l l be considered. The k -NN method i s used. The feature vector X. i s included among the k vectors v x e v i f a maximum of one p a i r of corresponding segments d i f f e r and L/2 d = s |y, - y „ | id = 3 . (3-3) 3=1 This means that X and must be e i t h e r EM's (d=0) or d i f f e r at most by a s i n g l e two-bit segment (d=l,2,3) f o r to be included among the k nearest vectors. In the case that a s i n g l e p a i r of corresponding segments i n X and X d i f f e r , these two segments are deleted and the r e s u l t i n g shorter vectors X' and X^' are compared using EM. 3.3.3 The Comparison of Vectors of D i f f e r e n t Lengths The k^-NN method can be applied only when X has the same number of components as X_^ . Because, by choice, N_^  i s small and because the number of X_^ 's i n 4>. which has the same dimensionality as X i s even smaller, k i s n e c e s s a r i l y x v small. To allow comparison of X with vectors d i f f e r i n g i n length from X, DE i s used. With t h i s method i s enlarged s u f f i c i e n t l y to y i e l d acceptable r e s u l t s using small (N^=12) t r a i n i n g subsets. Any t r a i n i n g subset, has vectors of length L^ plus or minus up to a few two-bit segments. DE allows a tolerance of one two-bit segment between X and every member of every t r a i n i n g subset. X and X^ a f t e r DE w i l l be r e f e r r e d to as X' and X.1 to denote that e i t h e r X or X. was shortened by one two-bit segment. I f a f t e r the tolerance i s allowed, X' and X ' are the same length, they are compared using EM. Thus X\ i s included i n the k feature vectors a f t e r DE i f L/2 £ y.' - y.. 1 < d.=0, where y.' i s a two-bit j = 1 3 J i ~ 2 J2 segment of X'. Note that the distance, d^, i s zero, not 3. This i s i n keeping with the o r i g i n a l plan of allowing a tolerance of only one segment. If i n addit i o n to the s i n g l e segment d e l e t i o n a p a i r of corresponding segments were allowed to d i f f e r (d^-^) the X_^ 's would become unnecessarily d i s t o r t e d with an abundance of random s u b s t i t u t i o n errors r e s u l t i n g . A more objective approach to DE w i l l now be discussed. Let X^ be the shorter vector of X or X_^ , and X^ the l a r g e r , by two b i t s , of the two. Then *a = ( y l , a ' y 2 , a ' \ / 2 > a ) *b ( y l , b ' y 2 , b ' •* ,' yL /2,b' yL 12 +1, b) v ' V L denotes a v a r i a b l e vector length. L v When subjected to DE X^ i s transformed into a maximum of —^ + 1 vectors to be compared with X . \ ( y l , b ' y 2 , b ' y L /2-2,b' YL /2-l,b' yL /2,b } V V V - 2 \ ~ ( y l , b ' y 2 , b ' y L /2-2,b' yL /2-l,b' yL /2+l,b } V V V - 3 h = ( y l , b ' y 2 , b ' Y L /2-2,b' yL /2,b' YL /2+l,b ) V V V - L 12 + 1 _ % V ~ C y2,b' y3,b' •••» y L /2-l,b' yL /2,b' yL /2+l,b ) V V V In e f f e c t , the s i n g l e vector X^ produced a d i s t r i b u t i o n of vectors s i m i l a r to X^, yet with the dimensionality of X . The s i g n i f i c a n t point i s that the small 2 5 number o f v e c t o r s i n each t r a i n i n g s u b s e t i s c a p a b l e o f p r o d u c i n g a l a r g e a r t i f i c i a l t r a i n i n g s e t d u r i n g p r o c e s s i n g . I n s t e a d o f s t o r i n g an e x c e s s i v e number o f n e a r e s t n e i g h b o u r s i n the t r a i n i n g s e t , the n e a r e s t n e i g h b o u r s c a n i n a s e n s e be p r o d u c e d as t h e y a r e r e q u i r e d . When an EM w i t h X.' i s f o u n d , t h e DE p r o c e s s i s t e r m i n a t e d even 1 though f u r t h e r matches w i t h X ' may be o b t a i n e d . F u r t h e r matches a r e e q u i v a l e n t to t h e f i r s t match and a r e thus n o t i n c l u d e d among the v e c t o r s . F o r the c a s e t h a t X. i s l a r g e r than X, t h e n X. p r o d u c e s the a d d i t i o n a l v e c t o r s . I f , however, X i s l a r g e r t h a n X^ t h e n X, t h e u n c l a s s i f i e d v e c t o r , i s used t o p r o d u c e the a d d i t i o n a l v e c t o r s . A l t h o u g h i t may seem i l l o g i c a l t o form n e a r e s t n e i g h b o u r s from the i n c o m i n g v e c t o r X and t h e n compare X\, t h e v e c t o r from t h e t r a i n i n g s e t , w i t h t h e s e n e a r e s t n e i g h b o u r s , s u c h i s n o t t h e c a s e . The r e a s o n i n g i s as f o l l o w s . I n s t e a d o f removing one segment a t a t i m e from X when X i s l o n g e r t h a n X^, one segment a t a t i m e c o u l d be added to X t o a t t e m p t to make X. s i m i l a r t o X. I n t h i s c a s e the segment added to X. w o u l d be i d e n t i c a l to t h e x x c o r r e s p o n d i n g segment i n X i n o r d e r t h a t X and X^ be s i m i l a r . But i n EM t h e s e two c o r r e s p o n d i n g segments w o u l d always match up so they m i g h t j u s t as w e l l be d e l e t e d . But i n d e l e t i n g t h e added segment i n t h e m o d i f i e d X^ and t h e c o r r e s p o n d i n g segment i n X we a r e s i m p l y d e l e t i n g t h e segment from X. C o n s i d e r t h e g e n e r a l c a s e w i t h * i = K = ( y l , a ' y 2 , a ' • • • ' y j - l , a ' y j , a ' V l , a ' • • • ' y L v / 2 , a ) ^ * = * b = ( y l , b ' y 2 , b ' • • • ' y j - l , b ' y j , b > y j + l , b > • ' • > y L v / 2 + l , b ) To make X as l o n g as X s l i p segment y. to t h e l e f t o f y. i n X , and t h e n 1 3 >b J , a x' compare X. 1 and X* . x V = V = ( y l , a ' y 2 , a > • • - y j - l , a ' | y j , b ^ j , a ' y j + l , a ' • • " y L / 2 , a ) X = \ = ( y i , b ' y 2 , b ' • • • ' y j - l , b ' l y j , b ' y j + l , b ' y j + 2 , b ' • • • ' y L / 2 + l , b ) L L 26 y(j) 5 2 I X —I 1 1 1 1 ( a ) A r b i t r a r y v e c t o r X= ( j ^ ^ , ^ ^ ) , w i t h L=4 3 X, -I \ — I 1—t-' , P * . P *>? w 6\p) 3 X, H 1 ) 1 1-( b ) A v e c t o r X p = ( y 1 , p , y 2 , p , y 3 , p , y ^ , p , f r o m t h e t r a i n i n g s e t , w i t h _L=. T h i s i s a n e x a c t m a t c h w i t h X . ( c ) A t r a i n i n g v e c t o r X q , w i t h L=4• S e g m e n t y ^ , ^ d i f f e r s f r o m y ^ . 3 .1 / o X 1 1 1 1 h ),r 2,r s,r i , r ( d ) A t r a i n i n g v e c t o r X ^ w i t h L=5 • T h i s i s a n e x a c t m a t c h w i t h s e g m e n t y 2 > r d e l e t e d a n d y ^ , r , v 4 ' r ' ^ 5 ' r : s h i f t e d l e f t x, — » — i — i — i — i ( e ) A t r a i n i n g v e c t o r X s w i t h L=3• T h i s i s a n e x a c t m a t c h w h e n y ^ i s d e l e t e d f r o m . X . F i g u r e The a p p l i c a t i o n o f CD i n m a t c h i n g X w i t h v e c t o r s f r o m t h e t r a i n i n g s e t . 27 No te that comparing X^' and X^ i s equivalent to d e l e t i n g y ^ from X^ and comparing with X . In other words, the addition of components to X^ i s equivalent to d e l e t i o n from X, i . e . , allowing X to form the NN's i s equivalent to allowing X_^  to form them. 3.3.4 Examples of Various Special Cases of CD Four cases are shown i n Figure 3.1 i n which X matches a t r a i n i n g vector. The f i r s t match, X with X(d=0), i s simply an EM. The second match, X with X (d=l), i s obtained by deleting y„ and y„ and s h i f t i n g y. and y. q b J 3 3,q b : k 4,q one segment to the l e f t . X and X^_ d i f f e r i n dimensionality by one two-bit segment and are eventually matched by deleting y . Note that the d e l e t i o n z, r of Y 1 also y i e l d s an EM but, as previously stated (Chapter 3.3.3), this I > r match i s i d e n t i c a l to that obtained by d e l e t i n g y . The f i r s t match only z, r i s thus recorded. X. can be matched by deleting y from X. Or, equivalently S 3 y„ could be i n s e r t e d into X„ to the l e f t of y„ 3 S 3,S-3.3.5 The C l a s s i f i c a t i o n of the H2 Vectors The H2 vectors are c l a s s i f i e d by EM (d=0). The H2 vectors are s u f f i -c i e n t l y short that l i t t l e v a r i a t i o n occurs within a pattern class thus making EM a p p l i c a b l e . 3.4 The Sequential Decoder The k^-NN's to X, gathered from the V3's, are subjected to a sequential decoder (TREE). This decoder takes one of the k -NN's at a time and compares v c e r t a i n of i t s features with those features i n the unknown character, C, re-presented by X i n V3 space. TREE works as follows. The features of C., the character i n the i t r a i n i n g set presently being compared with C, are known a p r i o r i and TREE s t a r t s by asking i f C^ i s a member of the group of short l e t t e r s > YES 28 FO OTHERS CONSIDER SPECIAL CASES OF C i e ,n,m,v. REJECT X i 1 ta»-NO YES OTHERS CONSIDER SPECIAL CASES OF G±: Q , 0 , D , R , B , M , G , J , L f b , d , j , 6 , 9 , 8 . , REJECT X OTH] 3RS SPECIAL CASES: T,P,p. YES ¥ : C a THE CHMticre-A Figure 3-2 Flow chart of the s e q u e n t i a l decoder (TREE). the V3 r e p r e s e n t a t i o n of C±, i s one of the k TREE, decides i f X should be kept. -NN', (a,c,e,m,n,o,r,s,u,v,w,x,z) . If C i s not a member of this group, then the information on C i s examined to check i f C as w e l l i s not a short character. I f both and C are large then further sequential decoding takes place; i f not then X i s rejected from the acceptable NN's. The complete flow diagram of TREE i s shown i n Figure 3-2. The s p e c i a l cases are shown separately i n Appendix 2. TREE removes from the k -NN's those X.'s which, though s i m i l a r to v 1 X, have b a s i c feature differences not found i n the v e r t i c a l information alone. An example of a feature i s the height of a character, which i s extracted from the h o r i z o n t a l information. 3.5 The O v e r a l l System The o v e r a l l c l a s s i f i c a t i o n method i s shown i n Figure 3-3. The following d e s c r i p t i o n r e f e r s to t h i s f i g u r e . The k nearest X.'s to X are found and v i passed through TREE y i e l d i n g a smaller number of nearer X.'s at I. The X. 's, i l , H from the H2 t r a i n i n g set are exact matched with X^, the u n c l a s s i f i e d H2 vector. The r e s u l t s appear at I I . Each time an X i s found among the nearest neighbours a t a l l y f o r C. i s incremented. Another t a l l y i s incremented f o r each X. i i , H matched to }L . The r e s u l t i n g t a l l i e s , n. and n. , are the r e s u l t i n g d i s c r i -i i i i , H minants for the v e r t i c a l and h o r i z o n t a l information r e s p e c t i v e l y . The outputs at I and II are combined y i e l d i n g the i n t e r s e c t i o n at I I I . Here the V3 and H2 t a l l i e s f o r a given pattern class are combined. In this way, the t o t a l number of NN's belonging to includes contributions from both V3 and H2. I f an i n t e r s e c t i o n occurs the with the l a r g e s t discriminant i s chosen. I f there i s no i n t e r s e c t i o n of v e r t i c a l and h o r i z o n t a l information three p o s s i b l e outcomes a r i s e . F i r s t , i f there i s no output at I or II the a r b i t r a r y vector, X, i s rejected. I f an output occurs at I but not at II then X XH F I N D THE k v - N N . ' s X i , H F I N D E M ' s T R E E I I I ( I N T E R S E G T I O O p V3 AND H2 R E S U L T S ) COMBINE V3 AND H2 ' R E S U L T S I I g u r e 3-3 F l o w c h a r t of t h e o v e r a l l c l a s s i f i c a t i o n s c h e m e . 31 the d e c i s i o n i s based s o l e l y on the v e r t i c a l information so the output at I i s chosen. Should there be no output at I then the d e c i s i o n i s based on the l e s s r e l i a b l e (H2 contains l e s s information than V3) h o r i z o n t a l information so the output at II i s chosen. The l a t t e r two procedures increase the o v e r a l l recognition rate yet c e r t a i n l y cause more s u b s t i t u t i o n s than i f they were not present. 3.6 A Parametric Approach Using the assumption that the components of the feature vectors are L independent, (3-1) can be put i n the following f orm P (x| C. )P(C. )= II P(X. | C. )P(C.) 1 1 j = i 3 1 1 (3-5).- Now, instead of l e a r n i n g and s t o r i n g a large number, N., of p r o b a b i l i t i e s P(X|C i) for each pattern c l a s s , only the L p r o b a b i l i t i e s P(X 1(h), j=l,2, ...,L, need be stored. In the system developed i n t h i s thesis N_^ =12 was a c t u a l l y less than the average V3 length L=14. However, to improve recognition N^ should be increased.' In so doing more storage would be necessary and more time would be necessary to interrogate the l a r g e r look-up table. The use of (3-5) would thus seem appropriate but f o r two problems. F i r s t , the components of X may not be s u f f i c i e n t l y independent. Secondly, L i s v a r i a b l e . Nothing can be done about the f i r s t problem except make the independence assumption, but the second problem can be handled using a .different form of (3-5). This c l a s s i f i e r discussed i n [19], [18] conditions the p r o b a b i l i t y of a component, x., on C and the length L of the vectors X and X.. The pattern class C J i v I i i s chosen i f the discriminant L 1± = P(X,C., L v ) = P ( X j |Ci,L v)-P(L v|C.)-P(C.) (3-6) i s a maximum. Note that the p r o b a b i l i t y of a component given the j o i n t occurrence of C and L replaces P ( x | C ) , and the p r o b a b i l i t y of a c e r t a i n L I V 1 X V given C. i s an added term. Now the subset , i s broken down further into smaller i I subsets of vectors of various lengths. With a large amount of data the p r o b a b i l i t i e s i n (3-6) could be w e l l estimated. P(C ) can be found i n [20], [21]. The following chapter deals with.results obtained during and a f t e r completion of the development of the system. 33 CHAPTER 4. TESTING AND RESULTS 4.1 Introduction This chapter deals with a d e s c r i p t i o n of t r a i n i n g and test data, the tests performed on the data, and the r e s u l t s of the t e s t s . Various combinations of parts of the c l a s s i f i e r are studied to f i n d how each part a f f e c t s c l a s s i f i c a t i o n . C l a s s i f i c a t i o n r e s u l t s f o r one type-style, i n a d d i t i o n to that type-style combined with three others, are i n v e s t i g a t e d . CD i s studied extensively by performing recog n i t i o n tests on a group of four p a r t i c u l a r l y s i m i l a r l e t t e r s . 4.2 Description of the Training and Test Data Four s e r i f e d e l e c t r i c typewriter type-styles are included i n the data: Hermes Ambassador (HA), IBM S e l e c t r i c ' s P i c a 72 (SP) , Delegate (SD), and Courier 72 (SC). These type-styles are shown i n Figure 4-1. The s i z e s (Picha) of the d i f f e r e n t type-styles are approximately equal. HA contains both c l o t h and carbon ribbon p r i n t while the S e l e c t r i c p r i n t i s obtained from carbon ribbon. Figure 4-2 i l l u s t r a t e s the range i n q u a l i t y of the HA p r i n t . The HA p r i n t was used e x c l u s i v e l y i n the development of the system. In this way emphasis was placed on generalizing the system to accept a wide range i n p r i n t q u a l i t y for one type-style. To account for d i f f e r e n t type-styles further t r a i n i n g was performed with SP, SD, and SC. No s a n s - s e r i f p r i n t was used. I t was found that the carbon ribbon yielded the c l e a r e s t characters, yet some characters were found to have gaps due to poor tra n f e r from ribbon to paper. The new c l o t h ribbon produced very thick, dark characters requiring excessive i l l u m i n a t i o n . S l i g h t l y less dense characters among these very dark ones were over-illuminated. The very pale characters required very low i l l u m i n a t i o n . The o v e r a l l lack of contrast caused o v e r s e n s i t i v i t y of these pale characters to i l l u m i n a t i o n . Hermes Ambassador This type was used in the development of the system. HA ABCDEFGHI JKLMOPQRSTUVWXYZ 1234567890 abcdefghijklmnopqrstuvwxyz Courier 72 Code 015 10 Pitch IBM COURIER 72 Type is a square-serif design in the Pica family of type styles. The open spaced characters make i t highly legible. ABCDEFGHIJKLMNOPQRSTUVWXYZ [ ] ( ) 1234567890 abcdefghijklmnopqrstuvwxyz -_=+! 0"'/?:;,. SC Pica 72 Code 008 10 Pitch I B M P I C A 72 T y p e i s s i m i l a r t o t h e P i c a t y p e s t y l e s o f f e r e d w i t h t h e I B M M o d e l D T y p e w r i t e r . ^ P I t i s w e l l s u i t e d f o r a v a r i e t y o f - t y p i n g j o b s . A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] @ # $ U & * ( ) 1 2 3 ^ 5 6 7 8 9 0 a b c d e f g h i j k l m n o p q r s t u v w x y z - _ = + ! ° " ' / ? : ; , • Delegate Code 070 10 Pitch IBM D E L E G A T E T y p e i s a w e i g h t e d t y p e c o n v e y i n g ^ _ t h e f e e l i n g o f p r i n t e d m a t e r i a l . I t i s r e c o m - ^ £ ) m e n d e d f o r t e x t c o p y a n d s i m i l a r t y p i n g j o b s . A B C D E F G H I JKLMNOPQRSTUVWXYZ [ ] @ # $ U $ * 0 1 2 3 4 5 6 7 8 9 0 a b c d e f gh i j k l m n o p q r s t u v w x y z - =+! ° 1 1':;/?, . Figure 4-1 Examples of the four type-styles used:HA, SC, SP, SD. Correct 4.3 The E f f e c t s of Various Combinations of the Stages of the System on Recognition This s e c t i o n r e f e r s to Table 4-1. The t r a i n i n g set consists of twelve 60 character HA alphabets. Testing was performed on 12 d i f f e r e n t HA alphabets. The r e s u l t s of the V3 c l a s s i f i c a t i o n alone are shown i n Columns 1 and 2. Note that though CD reduces the number of r e j e c t s , the recogn i t i o n rate remains i n t o l e r a b l y low. The reason i s that V3 y i e l d s representations from only one dimension or viewpoint. Consequently, many characters (45.5% with CD) appear s i m i l a i For example, the information of whether or not the columns i n "d" and "q" ascend or descend i s not present i n V3. Consequently the V3 vectors for these two l e t t e r s are i d e n t i c a l . The number of s u b s t i t u t i o n s decreases as more information on the incoming character i s obtained. S p e c i f i c a l l y , with the addition of reinforcement by H2 matching (Column 3) the number of s u b s t i t u t i o n s drops s i g n i f i c a n t l y . By sub-Column 1 Column 2 Column 3 Column 4 Column 5 EM(V3) EM,CD(V3) EM,CD(V3)&EM(H2) EM,CD,•TREE(V3) EM,CD, TREE(V3)&EM(H2' 50.0% 52.3% 77.2% 80.0% 93.3% Substitutions ; 3 0 . 0 % 4 5 . 5 % 2 1 . 4 % 1 6 . 0 % 6 . 4 % Rej ects 20.0% 2.2% 1.4% 4.0% 0.3% Table 4-1 C l a s s i f i c a t i o n r e s u l t s f o r various combinations of stages of the system j e c t i n g V3 r e s u l t s to TREE (Column 4), with no h o r i z o n t a l scanning, an even l a r g e r reduction i s obtained. The combination of TREE to r e j e c t unwanted V3's, and H2 matching to supplement the v e r t i c a l scanning i s shown i n Column 5 to reduce sub-s t i t u t i o n s to 6.4%. The su b s t i t u t i o n s are reduced 39.1% from Column 2 to Column 5. This reduction i s less than the sum of the reductions from Column 2 to 3 and Column 2 to 4 implying an overlap of information contained i n TREE and H2 matching. In fac t much of TREE i s formed from the h o r i z o n t a l scanning information. I d e a l l y , the 3b information overlap should approach zero so that maximum advantage i s made of the two stages. The only increase i n s u b s t i t u t i o n s r e s u l t s with the use of CD. CD i s used to reduce r e j e c t s , unfortunately at the expense of increasing s u b s t i t u t i o n s . A f t e r CD has reduced r e j e c t s , TREE i s used to remove vectors which could possibly cause s u b s t i t u t i o n e r r o r s . With the decline of s u b s t i t u t i o n s the r e j e c t rate increases s l i g h t l y . The replacement of s u b s t i t u t i o n errors by r e j e c t errors (assuming small error) i s an a d d i t i o n a l advantage of TREE. In Column 5 the r e j e c t rate drops p r i m a r i l y because of reinforcement by H2 matching which supplies optional r e s u l t s , some of them erroneous, to the V3 r e s u l t s . The main objective i n designing the various stages was to increase the correct c l a s s i f i c a t i o n s . Although CD alone does l i t t l e to improve recognition, . ; correct c l a s s i f i c a t i o n increases greatly when CD i s used with H2 matching, or with TREE. With both TREE and H2 matching the re c o g n i t i o n rate was a favourable 93.3%. An obvious problem with the system as organized i n Figure 3-3 i s illustrate< i n the f i n a l s u b s t i t u t i o n and r e j e c t rates (Column 5). The r e j e c t s should be more frequent than the s u b s t i t u t i o n s [1]. A subject following the output (probably s p e l l e d speech) would then know when an error occurred whereas s u b s t i t u t i o n errors may lead the subject astray. 4.4 Recognition of HA The HA data set consists of a t r a i n i n g set of 12 alphabets and a test set of 12 alphabets. Of each group of 12 alphabets, s i x were produced from carbon ribbon, three from dark c l o t h ribbon, and three from pale c l o t h ribbon. Under s t r i c t supervision, that i s scanning one l e t t e r at a time and adjusting the i l l u m i n a t i o n to allow for contrast v a r i a t i o n s , the c l a s s i f i c a t i o n r e s u l t s of Column 5 i n Table 4-1 were achieved. The recognition rate of 93.3% approaches the acceptable rate of 95% [1]. To obtain more r e a l i s t i c r e s u l t s , two a d d i t i o n a l testing sessions were car r i e d out on the twelve 60 character alphanumeric t e s t alphabets. Alignment and i l l u m i n a t i o n were adjusted before scanning each alphabet. Then one complete l i n e of f i v e characters was scanned at a time with the c l a s s i f i c a t i o n r e s u l t s being typed out on the teletype as the characters were c l a s s i f i e d . The r e s u l t s of scanning the 12 alphabets twice are: Correct - 88.63% Substitutions - 10.26% Rejects - 1.11% Combining the r e s u l t s of Column 5 i n Table 4-1, the r e s u l t s improve: Correct - 90.22% Substitutions - 8.95% Rejects - 0.83% The r e s u l t s of scanning the 12 alphabets twice can be analyzed further by considering the three types of ribbons: Carbon - 96.1% Correct Heavy Cloth - 91.1% Correct Pale Cloth - 71.4% Correct The r e s u l t s are acceptable i f good q u a l i t y p r i n t i s used. For poor q u a l i t y p r i n t , some means of automatic i l l u m i n a t i o n control would seem necessary to obtain acceptable r e s u l t s . Even though acceptable r e s u l t s are obtainable, too high a proportion of the errors are s u b s t i t u t i o n s . Once an i n i t i a l adjustment i s made to ensure a l l characters on the top l i n e of text pass below and wit h i n the l i m i t s of the scanning s l i t , alignment i s no problem while reading the rest of a document. 4.5 Recognition of HA, SP, SD, SC The complete t r a i n i n g set consists of 12 HA alphabets, as discussed i n Chapter 4.4, with four a d d i t i o n a l carbon ribbon alphabets f o r each of SP, SD and SC. The adaptive learning i s in d i c a t e d i n Figure 2-11. The test set consists of only carbon ribbon characters so that the emphasis i s placed on the differences i n type-styles rather than on the. differences i n the q u a l i t y of the p r i n t . Six 5 2 - l e t t e r alphabets (no numerals).of each of the four type-styles were scanned three times. The o v e r a l l r e s u l t s are: Correct - 92.80% Substitutions - 7.18% Rejects - 0.02% Considering each type-style separately, the r e s u l t s are as follows: HA - 94.97% Correct SP - 93.91% Correct SC - 92.10% Correct SD - 90.18% Correct As expected, HA, the s t y l e that the system was based on and which has 12 alphabets of t r a i n i n g to the other s t y l e s ' four alphabets, has the highest recog n i t i o n rate. The most s t y l i z e d p r i n t , SD, (See Figure 4-1) has the lowest recognition rate. SC on the other hand i s the l e a s t s t y l i z e d and thus again d i f f e r s from HA. As the r e s u l t s i n d i c a t e , and Figure 4-1 i l l u s t r a t e s , SP i s f a i r l y s i m i l a r to HA. The adaptiveness of the system i s evident from the recognition r e s u l t s . Only a small amount of t r a i n i n g i n excess of the o r i g i n a l HA t r a i n i n g was necessary to achieve favourable r e s u l t s for the a d d i t i o n a l three type s t y l e s . The i n c l u s i o n of the a d d i t i o n a l s t y l e s added l i t t l e to the s u b s t i t u t i o n rate, a p o s s i b i l i t y which was at f i r s t feared. The few extra s u b s t i t u t i o n s were removed with TREE. 4.6 The E f f e c t s of CD CD was o r i g i n a l l y incorporated to reduce the amount of t r a i n i n g and the number of r e j e c t s . Both these objectives were achieved but a better understanding of CD was sought. To obtain t h i s understanding, experiments were performed on only the group of s t r u c t u r a l l y s i m i l a r (with respect to the M d i s t r i b u t i o n s ) l e t t e r s "a", "e", "s", and "z" (the aesz group). This group i s a good represent-at i o n of characters which are d i f f i c u l t to c l a s s i f y . The V3 d i s t r i b u t i o n s f o r members of the group are rather c l o s e l y clustered as ' v e i l . CD w i l l thus be applied to worst-case data. Figure 4-2 Range i n q u a l i t y of the HA p r i n t used i n t r a i n i n g and t e s t i n g , showing "a" and "W". From l e f t to r i g h t : carbon r ibbon,heavy c l o t h r i b b o n , pale c l o t h . F igure 4-3 An i n d i c a t i o n of the l e a r n i n g of a d d i t i o n a l aesz groups. The c l a s s i f i e r was assumed " e x t e n s i v e l y " t r a i n e d a f t e r t r a i n i n g on 192 groups (768 l e t t e r s ) . 40 The two sets of data used are 12 aesz groups for t r a i n i n g (from the 12 HA alphabets of Chapter 4.4) as w e l l as 192 groups, i n c l u d i n g the former 12. The a d d i t i o n a l 180 groups were produced equally from HA carbon and c l o t h ribbon. The 192 groups were produced to represent an extensively trained data set. Figure 4-3 shows that at 192 groups most v a r i a t i o n s of a, e, s and z V3's have been encountered. The 12 HA alphabets of Chapter 4.4 served as the test set and the aesz group from each alphabet was scanned f i v e times. Recognition r e s u l t s were obtained from the data using EM and CD. The o v e r a l l r e s u l t s (Table 4-2) show as expected that the t r a i n i n g on 192 groups provided better r e s u l t s than the 12 group data. In both the 12 and 192 group cases CD increased the recognition rate. Two a d d i t i o n a l advantages of CD are c l e a r . F i r s t , CD provided a much greater increase (23.1%) i n recognition f o r the 12 group case than f o r the 192 group case (8.4%). CD thus i s of great use i n expanding a small t r a i n i n g set. The improvement with the large space t r a i n i n g set, though less pronounced than with groups, i s s t i l l outstanding. Secondly, though the recognition rate increases greatly with CD, the s u b s t i t u t i o n rate increases only s l i g h t l y . A disadvantage i s that the increase i n recognition i s at the cost of r e j e c t i o n s , leaving the su b s t i t u t i o n s almost untouched. Besides the improvement over EM, the main observation to be made i n Table .4-2 i s the s i m i l a r i t y between CD for 12 groups and EM f o r 192 groups. Despite the a d d i t i o n a l 180 groups of t r a i n i n g the recognition rate f o r EM i s but.1% better than CD with 12 groups of t r a i n i n g . The r e j e c t / s u b s t i t u t i o n r a t i o i s , however, f a r be t t e r f or EM. Table 4-3 shows the confusion tables for the EM and CD t e s t i n g on the 12 and 192 t r a i n i n g groups. The t e s t i n g as before consisted of f i v e sessions of scanning the 12 HA aesz groups (60 of each character). 41 Output e Rejects Output a 42 2 1 0 15 a 57^-— 2 1 2 1 0 e 0 44 0 0 16 e 0 55_ 3 0 s 4 20 13 0 23 s 11 17 27 0 z 0 0 0 41 19 z 1 1 2 4 55 z Rejects 1 2 5 2 (a) EM r e s u l t s a f t e r t r a i n i n g on 12 groups (b) CD r e s u l t s a f t e r t r a i n i n g on 12 groups Output InplTt~~~- a e s z Rej ects __0utput InputT^-- a e s z Rej ects a 54 2 0 0 4 a 57 3 0 0 0 e 0 48 5 0 7 e 0 56 4 0 0 s 5 4 41 3 7 s 4 4 47_ 1 0 z 0 0 2 54 4 z l ' l l i 57 0 2 2 (c) EM r e s u l t s a f t e r t r a i n i n g on 192 groups (d) CD r e s u l t s a f t e r t r a i n i n g on 192 groups Table 4-3. Confusion tables f o r the group of l e t t e r s aesz Trained on 12 groups Trained on 192 groups EM CD EM CD % Correct 57.9 81.0 82.0 90.4 % Rejected 30.4 4.2 9.2 0.0 % Substituted 11.7 14.8 8.8 9.6 Table 4-2. Recognition rates f or the group of l e t t e r s aesz for EM and CD with t r a i n i n g on 12 and 192 groups Table 4-3(a) and (d) are included f o r completeness. Table 4-3(b) and (c) i l l u s t r a t e the strength of CD f o r the l e t t e r s "a", "e", and "z". Only the l e t t e r "s" y i e l d s r e s u l t s i n (b) i n f e r i o r to those i n (c). Figure 4-4 serves to i l l u s t r a t e the p r o b a b i l i t y d i s t r i b u t i o n s of "a", "e", and "s". The d i s t r i b u t i o n for "a" and "e" overlap s l i g h t l y . The l e t t e r "s" i s sub-s t i t u t e d more for "e" than f or "a". CD has the e f f e c t of spreading the d i s t r i b u t i o n f o r "s" w e l l into that f o r "a" and "e". I t i s safe to say that i f the o r i g i n a l vector d i s t r i b u t i o n s are reasonably non-overlapping, then CD has the c a p a b i l i t y of "producing" a use f u l large t r a i n i n g set from a small one. F i g u r e 4-4 A s i m p l i f i e d g r a p h i c a l i l l u s t r a t i o n o f . t h e o v e r l a p p i n g . p r o b a b i l i t y d e n s i t y f u n c t i o n s o f " a " , " e " , a n d " s " . CHAPTER 5. SUMMARY AND DISCUSSION 5.1 Summary o f the System U s i n g the L e x i p h o n e as a t r a n s d u c e r and p a r t i a l p r e p r o c e s s o r an a l p h a n u m e r i c c h a r a c t e r r e c o g n i z e r was b u i l t . I n f o r m a t i o n , i n a d d i t i o n t o t h a t o b t a i n a b l e when t h e L e x i p h o n e i s used as a d i r e c t - t r a n s l a t i o n r e a d i n g machine, was o b t a i n e d by b u i l d i n g a d d i t i o n a l c i r c u i t r y . The p r e p r o c e s s i n g and p r o c e s s i n g o f d a t a from the L e x i p h o n e was p e r f o r m e d o n - l i n e by a PDP-9 d i g i t a l computer. The programming was a c c o m p l i s h e d i n s t a g e s as t h e computer s i m -u l a t e d p r e p r o c e s s o r and p r o c e s s o r were d e v e l o p e d . The r e s u l t i s a l m o s t 8000 words o f programming and memory spac e to h a n d l e : o n - l i n e c o m m u n i c a t i o n w i t h t h e L e x i p h o n e ; f e a t u r e e x t r a c t i o n ; computer d i s p l a y s m o n i t o r i n g v a r i o u s s t a g e s o f development; f e a t u r e v e c t o r d e v e l opment; m a g n e t i c t a p e u n i t s w h i c h s t o r e the v a r i o u s t r a i n i n g s e t s ; m u l t i s t a g e c l a s s i f i c a t i o n o f v e c t o r s ; o u t p u t of i n t e r m e d i a t e r e s u l t s from the t e l e t y p e ; o u t p u t o f t h e f i n a l c l a s s i f i c a t i o n r e s u l t s from e i t h e r the t e l e t y p e , o r t h e D/A i n t h e form o f an a n a l o g u e v o l t a g e . The a v a i l a b l e memory i n t h e l o w e r 8000 word bank o f the computer was r e s e r v e d ( b u t by no means f i l l e d ) f o r s t o r i n g the t r a i n i n g s e t b e i n g u s e d . The 1591-18 b i t words f o r the HA t r a i n i n g s e t and 2565 words f o r the combined HA, SP, SC, .SD t r a i n i n g s e t were s t o r e d on DEC t a p e . The f i n a l program y i e l d e d a c l a s s i f i c a t i o n r e s u l t v i a t h e D/A ( F i g u r e 5-1) i n a p p r o x i m a t e l y 150 m i l l i s e c o n d s (msec) u s i n g the 12 a l p h a b e t t r a i n i n g s e t ( C h a p t e r 4-4) and a p p r o x i m a t e l y 220 msec ..using the l a r g e r t r a i n i n g s e t ( C h a p t e r 4-5). The a n a l o g u e v o l t a g e s were used as i n p u t to a PDP-12 computer programmed to o u t p u t s p e l l e d s p e e c h . A s p e l l e d - s p e e c h r e a d i n g machine, the i n p u t to w h i c h was a c t u a l t y p e w r i t t e n p r i n t , r e s u l t e d . F i g u r e 5 - 1 R e c o g n i t i o n r e s u l t s f r o m D / A a f t e r s c a n n i n g " a " , " b " , " c " , " d " . A f t e r a c h a r a c t e r i s s c a n n e d t h e o u t p u t g o e s t o z e r o v o l t s u n t i l c l a s s i f i c a t i o n i s c o m p l e t e ( - . 1 5 s e c o n d s ) . A f t e r c l a s s i f i c a t i o n t h e v o l t a g e r e m a i n s c o n s t a n t u n t i l a n o t h e r c h a r a c t e r i s s c a n n e d . 5.2 Summary of Experiments and Concepts C l a s s i f i c a t i o n r e s u l t s f o r combinations of intermediate stages of the system were obtained to in v e s t i g a t e the effectiveness of the stages. Results were obtained also for one type-style of various p r i n t q u a l i t i e s , and for various type-styles of approximately one p r i n t q u a l i t y . Experiments were performed on the aesz group to in v e s t i g a t e how CD affected the d i s t r i b u t i o n s of the V3 representations. CD was found to be a valuable aid i n reducing t r a i n i n g i n a table look-up c l a s s i f i e r . The adaptiveness of the table look-up method was proven to be u s e f u l i n re a d i l y obtaining an evaluation of the preprocessor, and i n expanding the scope of the system to accept various type-styles. A multistage system composed of r e l a t i v e l y simple operations was chosen instead of a s i n g l e complicated approach. Two advantages r e s u l t : 1) alt e r n a t e routes to obtain the r e s u l t are a v a i l a b l e ; 2) complexity, and usually cost, i s kept low. Tracking i s no problem a f t e r simple alignment of one l i n e on a page of text i s accomplished. Skewed characters o f f e r no problem, p a r t i c u l a r l y when the reading head i s aligned [22]. 5.3 Further Research The assembly language computer programming consists of l o g i c a l operations and the use of memory banks which can be manufactured from a v a i l a b l e hardware. Some form of rapid-access memory would be necessary should the table look-up method be used. Possibly a more e f f i c i e n t method of implementing the system would be to use the ce n t r a l processing unit (CPU) of a small computer with perhaps a few thousand words of memory for control and intermediate b u f f e r s . D/A and data storage f a c i l i t i e s would also be needed. The time saved, the a d a p t a b i l i t y , and now low cost, of a programmable machine may make the CPU approach more app l i c a b l e . The experimental system i s too slow. As developed, the c l a s s i f -i c a t i o n time, due pr i m a r i l y to the excessive sequential computations of CD, was so long that consecutive characters were n e c e s s a r i l y single-spaced. The method of Chapter 3.6 may decrease computation time s u f f i c i e n t l y . Using a la r g e r rapid-access memory and no CD would speed up computation time at the expense of added memory. Time can be sto l e n ( v i a a program i n t e r r u p t ) from the gaps between e l e c t r o n i c scanning of the following character. In this way one character would be processed during the times the computer was not c o l l e c t i n g VI information from the following character. By table look-up s u b d i v i s i o n an a r b i t r a r y vector can be compared with but a f r a c t i o n of the t r a i n i n g set instead of with every member. S p e c i f i c a l l y , i f the input i s a short character, then the present computation time could be approximately halved by examining only those members of the t r a i n i n g set corresponding to short character. The problem of running text [22] must be considered. In r e a l i t y one character at a time i s not encountered: adjacent characters could be touching. Presently, no normalization procedure i s provided to handle a v a r i e t y of type s i z e s . Rather than e l e c t r o n i c normalization, as i s done wit! the v e r t i c a l information, the object length and focus of the reading head could be a l t e r e d mechanically to provide correct imaging of the ink patterns within the f i e l d of the scanning s l i t . I t i s f e l t that the ph o t o c e l l array might be of higher r e s o l u t i o n i n order to always detect small gaps (such as found i n "g") i n characters, while allowing the characters to be s u f f i c i e n t l y small that greater v e r t i c a l tolerance i n alignment i s achieved. Problems encountered with low-contrast p r i n t may be p a r t i a l l y eliminated by feedback of the average r e f l e c t e d i l l u m i n a t i o n to c o n t r o l the lamp i n t e n s i t y . A system i s discussed i n [4] pp 354-363. A l t e r n a t i v e l y a deci s i o n as to whether a point should be c a l l e d black or white could be influenced by the black density of i t s neighbours [3]. The r e l a t i v e widths of l e v e l s of constant M are d i s t o r t e d by v f l u c t u a t i o n s due to the crude platform-driving mechanism. Solution of the problem, whether i t be f i n e r gears or a b e l t d r i v e , would increase V3 r e p r o d u c i b i l i t y . The p r o b a b i l i t y of obtaining d i f f e r e n t vector represent-ations f o r d i f f e r e n t (but under i d e n t i c a l conditions) scannings of a given character would be lessened. Although most p r i n t i s presently of the s e r i f e d s t y l e , i ncreasing use i s being made of the s a n s - s e r i f s t y l e . The problem of recognizing sans-s e r i f characters was not studied, but should be included i n further research Much information i s contained i n h o r i z o n t a l s e r i f s and shows up i n V3. The removal of s e r i f s would a l t e r V3 considerably and a f f e c t the amount of information required by the recognizer. 4 0 or '5 OA/ LEFT Sice ? •reee 10 reee UJ/TH p •** iv fie fluty ,9 vteToid K; ) s*o T A c HAKIICTCK , is ReTflifeo £ej£cTED f}pp€ht//x JE Spea'a/ cases /'n T^£S. The n)jouts /» the. vor/'o^J (PQCrE IOF 2.) trees vre -the characters ("h/eh rhe. Yc's represent. BIBLIOGRAPHY 1. Clemens, J.K., " O p t i c a l Character Recognition f or Reading Machine Ap p l i c a t i o n s " , Ph.D. Thesis, M.I.T.September, 1965. 2. Rabinow, J . , "The Present State of the Art of Reading Machines" i n L.N. .Kanal (ed.), Pattern Recognition, pp. 3-29, Thompson Book Company, Washington, D.C, 1968. 3. Holt, A.W., "Comparative R e l i g i o n i n Character Recognition Machines", Computer Group News, Vol. 2, No. 6, November, 1968. 4. IBM Journal of Research and Development, V o l . 12, No. 5, pp. 346-371, September, 1968. 5. Anderson, W.H., "The IBM 1288 O p t i c a l Page Reader - New Dimensions i n O p t i c a l Character Recognition", Computer Group News, Vol. 2, No. 6, November, 1968. 6. Greanias, E.C., et a l . , "The Recognition of Handwritten Numerals by Contour Analy s i s " , IBM Journal, V ol. 7, No. 1, pp. 14-21, January, 1963. 7. Smith, G.C., and Mauch, H.A., "Summary Report on the Development of a Reading Machi ne for the B l i n d " , B u l l e t i n of Prosthetics Research, BPR 10-12, F a l l , 1969. 8. Ramsay, W.D., "Design of a Simple Reading Machine f o r the B l i n d " , M.A.Sc. Thesis, U.B.C., October, 1968. 9. Beddoes, M.P., "An Inexpensive Reading Instrument with Sound Output for the B l i n d " , IEEE Transactions on Bio-Medical Engineering, V o l . BME-15, No. 2, A p r i l , 1968. 10. Martin, W.P., "A computer-Based Environment for Compression Experiments with Code Sounds from the Lexiphone", M.A.Sc. Thesis, U.B.C., Ju l y , 1969. 11. Kanal, L. and Chandrasekaran, B., "On Dimensionality and Sample Size i n S t a t i s t i c a l Pattern C l a s s i f i c a t i o n " , National E l e c t r o n i c s Conference, Chicago, pp. 2-7, 1968. 12. Highleyman, W.H., "rne Design and Analysis of Pattern Recognition Experiments", B e l l System Technical Journal, pp. 723-744, March, 1962. 13. Sebestyen, G.S., Decision-making Processes i n Pattern Recognition, The MacMillan Company, New York, 1962. 14. Nilsson, N.J., Learning Machines: Foundations of Trainable Pattern-• C l a s s i f y i n g Systems, McGraw-Hill, 1965. 15. Cover, T.M. and Hart, P.E., "Nearest Neighbour Pattern C l a s s i f i c a t i o n " , IEEE Transactions on Information Theory, V o l . IT-13, pp. 21-27, January, 1967. 16. Cover, T.M., "Estimation by the Nearest-Neighbour Rule", IEEE Transactions  on Information Theory, Vol. IT-14, No. 3, pp. 50-55, January, 1968. 17. Hart, P.E., "The Condensed Nearest Neighbour Rule", IEEE Transactions on  Information Theory, Vol. IT-14, No. 3, pp. 515-516, May, 1968. 18. Toussaint, G.T. and Donaldson, R.W., "Algorithms for Recognizing Contour-Traced Handprinted Characters", IEEE Transactions on Computers, Vol. C-19, No. 6, June, 1970. 19. Toussaint, G.T., "Machine Recognition of Independent and Contextually Constrained Contour-Traced Handprinted Characters", M.A.Sc. Thesis, U.B.C., December, 1969. 20. Pierce, J.R., Symbols, Signals and Noise: The nature and process of  Communication, Harper and Row, New York, p. 283, 1961. 21. P r a t t , F., Secret and Urgent, Blue Ribbon Books, Garden C i t y , N.Y. 22. Clayden, Clowes, and Parks, "Letter Recognition and the Segmentation of Running Text". Information and Control, 9, pp. 246-264, 1966. 23. Fix, E., and Hodges, J.L., J r . , "Discriminatory Analysis, Nonparametric Discrimination", USAF School of A v i a t i o n Medicine, Randolpy F i e l d , Tex., Project 21-49-004, Rept. 4, Contract AF41(128)-31, February, 1951. 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0102200/manifest

Comment

Related Items