UBC Theses and Dissertations

Schema labelling applied to hand-printed Chinese character recognition. Bult, Timothy Paul, 1987.

Full Text
SCHEMA LABELLING APPLIED TO HAND-PRINTED CHINESE CHARACTER RECOGNITION

by

TIMOTHY PAUL BULT

Maîtrise, Université de Grenoble, 1981

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF COMPUTER SCIENCE

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April, 1987
© Timothy Bult, 1987

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Abstract

Hand-printed Chinese character recognition presents an interesting problem for Artificial Intelligence research. Input data in the form of arrays of pixel values cannot be directly mapped to unique character identifications because of the complexity of the characters. Thus, intermediate data structures are necessary, which in turn lead to a need to represent knowledge of the characters' composition. Building the intermediate constructs for these hand-printed characters necessarily involves choices among ambiguities, the set of which is so large that an efficient search algorithm becomes central to the recognition process.

Schema labelling is a theory of how knowledge should be organized for recognition tasks in which composition structure is inherent in the domain, the composition entails ambiguity, and the ambiguity generates large search spaces. This thesis describes an implementation of an enhanced version of schema labelling for Chinese characters. The specific problems addressed by the enhancements, with some success, are (i) the segmentation of real images into objects usable by the schema system, (ii) the definition of schemas which adequately describe the generic composition of hand-printed Chinese characters, as well as common variations or vagaries, and (iii) the inclusion of sufficient "control knowledge" to prevent combinatorial explosion of the backtracking recognition process.

Test characters for recognition systems can be classified along several dimensions. On the spectrum from type-set, through hand-printed, to hand-written forms, our system was tested on restricted hand-print, at a level somewhat more difficult than is normally attempted. On the spectrum of input types, from grey-scale pixel input through on-line stroke representations, our system was fully tested only at the high end, with complete synthetic strokes. We obtained a success rate of 57%, 12 out of the 21 characters tested. The principal successes of the work are that characters of the complexity tested could be recognized at all, and the impact schema labelling techniques had on that recognition.
Table of Contents

Abstract
Table of Contents
List of Figures
Acknowledgements
Chapter 1 — INTRODUCTION: Knowledge Representation and Chinese Characters
  1.1 Why Apply Schema Labelling To Chinese Characters?
Chapter 2 — BACKGROUND: Knowledge Representation for Chinese Characters
  2.1 Character Recognition
    2.1.1 Binary Image Template Matching (Casey and Nagy, 1966)
    2.1.2 Parametric Templates (Yamamoto et al., 1981)
    2.1.3 Syntactic Construction (Stallings, 1965)
    2.1.4 A Brief Summary of Character Recognition
  2.2 A Scenic Detour into the Constraints Problem
    2.2.1 One Binary Constraint (Huffman, 1971; Clowes, 1971)
    2.2.2 Development of Network Consistency Algorithms
    2.2.3 Full Constraint Lattice (Freuder, 1978)
    2.2.4 Hierarchy and Hypotheticality (Brooks, 1981)
  2.3 Knowledge Representation Applied to Character Recognition
    2.3.1 Hierarchical Relaxation (Hayes, 1980)
    2.3.2 Frames for Fortran (Brady and Wielinga, 1978)
  2.4 Schema Labelling (Havens, 1985)
Chapter 3 — SCHEMAS FOR CHINESE CHARACTERS
  3.1 Goal
  3.2 Overview of the System
  3.3 Knowledge of Chinese Character Composition
  3.4 Sample Session: The Character 'Yin'
    3.4.1 Input — the image processing
    3.4.2 Making new instances and variables
  3.5 From a Character on Paper to Schema Instances
    3.5.1 Image Processing
    3.5.2 Schema Instance Extensions
    3.5.3 Schema Class Extensions
    3.5.4 Control of the Backtracking Search
Chapter 4 — RESULTS: Success of Implementation and Application
  4.1 Ease of Coding of Chinese Character Specifications
  4.2 Success of Character Recognition
Chapter 5 — CONCLUSION
  5.1 Summary
  5.2 Suggested Future Work
Bibliography
Appendix A: Knowledge Representation Code
Appendix B: Printouts of Sample Instances

List of Figures

Figure 1. Examples of Hand-Printed Chinese Characters
Figure 2. Character Histograms (from Yamamoto et al., 1981)
Figure 3. Recognition by Syntactic Construction
Figure 4. Stallings' Segmentation of Strokes
Figure 5. Node Segmentation (from Stallings, 1966)
Figure 6. Overview of System Operation
Figure 7. Composition Hierarchy
Figure 8. Valid 'Groups' of Strokes (from Rankin, 1965)
Figure 9. Different Strokes
Figure 10. A 3-Bar Stroke with Bar-ends
Figure 11. The Segmentation
Figure 12. A 'Variable' Link Between 2 Instances
Figure 13. Top of Final Network for 'Yin'
Figure 14. A Digitized Character Image
Figure 15. Contours Found by Edge Detector
Figure 16. Smoothed Contours of a Character
Figure 17. Segmented Contours of a Character
Figure 18. The Labelset and Variables of a Bar Instance
Figure 19. Some Variables of the Bar Schema Class
Figure 20. A Composition Rule for the Stroke Schema Class
Figure 21. A Constraint on the Group Schema Class
Figure 22. Linearity of Time and Space versus Input Variables
Figure 23. Erratic Script Style of Chinese Writing (from Wang, 1970)
Figure 24. Printout of the Stroke Schema Class
Figure 25. Printout of a Sample Instance
Acknowledgements

My thanks go to the other graduate students of UBC Computer Science, for hours of ideas and arguments; to Theresa Fong, Lindsey Wey, and Carol Whitehead for years of help in the office; to the Computer Science Department, the University of British Columbia, and the Natural Sciences and Engineering Research Council for money; to the UBC Laboratory for Computational Vision and Bell-Northern Research for hardware, software, time, and space; to my brothers of the Sigma Chi Fraternity for where, when, how, and why to study; to Tai Chi master Steve Malliaris for the other sides of life; to my parents for motivation; to Gloria for the final impetus; to Robin McGillveray for running around; to Mark Turchan, Daniel Zlatin, Sameh Rabie, and Dick Peacocke for critical comments; and especially to Bill Havens, for teaching me how to do research, for ideas, for meta-ideas, and for patience.

Timothy Bult

Chapter 1 — INTRODUCTION: Knowledge Representation and Chinese Characters

In the pursuit of learning, every day something is acquired.
In the pursuit of the Tao, every day something is dropped.
Less and less is done
Until non-action is achieved.
When nothing is done, nothing is left undone.
The world is ruled by letting things take their course.
It cannot be ruled by interfering.

Lao Zi, from the Dào Dé Jīng

The goal of this thesis was to investigate the utility of schema labelling theory as a knowledge representation tool for recognition of hand-printed Chinese characters. Schema labelling is appropriate to the problem because the complexity and variability of Chinese hand-print suggest a "knowledge-based" approach. Relevant concerns therefore include both the understanding of the natural composition structure of the input data, and intelligent control of the search for a correct interpretation of it. These are the keystones of schema labelling theory.

This thesis addresses "the recognition problem": the ability to identify an element of a group of things as being of that group, including an interpretation of what distinguishes it from the other members. For example, knowledge of the letter "t" should enable identification of the first letter of this word as a "t", be the word type-set, typed, printed, written, or scrawled. Identification of a group's elements characterizes many problems in computational vision, natural language processing, and possibly even planning, where one must recognize what the world state is, and then recognize what actions are appropriate. The recognition problem is therefore an important application for knowledge representation theories.

In the schema-based paradigm that this thesis adopts (see Havens, 1985), the objects that the system recognizes are represented by instances of schema classes, where a schema class represents a group of similar objects. Objects in any world are related geometrically, and by other characteristics, to other objects; similarly, schema instances are linked together to represent these relations. This leads to a representation of knowledge as a set of classes and instances, where the classes are specifications for all the possible instances and their relations. Schema classes must provide a model for generating new instances based on the cues given by the existing instances. For example, instances for coat hangers, various clothing, and a confined space, appropriately related, should suggest a "closet" instance.
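To make the cue idea concrete, here is a minimal sketch in Python (ours, not part of the thesis; the class names and the cue table are hypothetical): a table maps sets of observed component classes to a parent class worth hypothesizing.

# Illustrative sketch only: a cue table maps a set of observed component
# classes to a parent schema class worth hypothesizing. All names here
# are hypothetical.
CUE_TABLE = {
    frozenset({"coat-hanger", "clothing", "confined-space"}): "closet",
    frozenset({"stroke", "connection"}): "group",
}

def hypothesize(instances):
    # Collect the classes of the instances seen so far, then return every
    # parent class whose cue set is covered by them.
    observed = frozenset(inst["class"] for inst in instances)
    return [parent for cue, parent in CUE_TABLE.items() if cue <= observed]

scene = [{"class": "coat-hanger"}, {"class": "clothing"}, {"class": "confined-space"}]
print(hypothesize(scene))  # -> ['closet']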
Classes must also detail how to determine the relations possible between instances (Dahl and Nygaard, 1976; Minsky, 1975; Goldberg and Robson, 1983).

The first subgoal of the recognition process, in this paradigm, is to create instances representing the direct but relatively low-level perceptions or "cues" given by the sensory mechanism. In this thesis, we have applied the Marr-Hildreth (1979) theory of image-discontinuity detection, and contour-representation work by Mackworth and Mokhtarian (1984), to build low-level schema instances for the rest of the recognition process to use. Given this set of instances as input, the second subgoal of the recognition process is to create a set of instances and relations which adequately describe the input, in the terms provided by the schema classes. This description is a network of instances, ranging from the input data, through various intermediate objects, up to high-level structures such as the Chinese characters of our application.

We have developed a complete framework of schema classes for Chinese characters, based on an analysis of linguistic studies, and the stroke-level studies of pattern recognition researchers. This specification required extensions to Havens' schema labelling theory, to handle real number variables, to handle complex constraints not feasibly represented by finite sets of label tuples, and to represent knowledge of control of the search process.

Chapter Two of this thesis describes some previous work on the recognition problem, especially as related to Chinese characters and schema-based knowledge representation for computer vision. Chapter Three presents our system, including a description of the extensions to standard schema classes and instances which were developed. (For those interested in the implementation details, Appendices A and B describe the system's data structures and programs more deeply.) Chapter Four presents the results of our work: the entire system was implemented, and several test cases run, though not with raw data. The success rate on synthetic data was 57%, the failures all being rejected for exceeding memory or processing time limits. There were no misrecognized characters, and the complexity of characters attempted was higher than that of the type-set characters usually seen in character recognition research. Chapter Five presents a summary of the research, and some ideas for further, related work.

1.1 Why Apply Schema Labelling To Chinese Characters?

Chinese characters are the most complex set of written symbols in use today, and they record the language of a quarter of the world's population. There exist about 50,000 characters in the language, though ordinary use requires fewer than 5,000. Like the letters of the English alphabet, Chinese characters are usually made by putting strokes of ink on a piece of paper, crossing and touching and aligning them in patterns of size, direction, and position. The characters are geometric arrangements of one or two "components", often called "radicals". Each component is either a "group" or else a pair of sub-components in one of four simple configurations. Each "group" is a pattern of connected strokes of about the same complexity as English letters (see Rankin, 1965, and Chapter Three of this thesis for further explanation). Figure 1 gives some typical examples of hand-printed Chinese characters.

Figure 1. Examples of Hand-Printed Chinese Characters
Irregularity in the size, shape, angles, connections, and positions of strokes can make the characters quite difficult to read — so difficult that nobody has yet written a computer program capable of reading typical hand-written Chinese at all, though some researchers have achieved rates of 90% recognition on heavily restricted hand-printed Chinese (e.g., Yamamoto et al., 1981).

The lack of success with less restricted, hand-printed Chinese characters using conventional pattern recognition techniques is one reason for attempting them, as a challenge to schema labelling theory. Also, as a problem domain, they offer interesting natural obstacles of ambiguity, complexity, size of search space, and a hierarchy of composition. Though not a "toy world", they do avoid certain problems of complete vision, such as perception of depth, colour, shading, and light sources, and three-dimensional modeling. Thus hand-printed Chinese character recognition provides a good research domain for Artificial Intelligence, isolating some real problems from the baffling immensity of all the problems. Also, an automatic Chinese reader might well be commercially valuable: governments need to process Chinese documents (IBM, 1962), computer manufacturers and users need input mechanisms that do not require hundreds of push-keys (Trotter, 1981; Yuen, 1983), and a technology capable of reading Chinese would probably apply very well to simpler character sets like our own, with subsequent benefits for postal automation, data entry, and office systems.

Chapter 2 — BACKGROUND: Knowledge Representation for Chinese Characters

Why is the sea king of a hundred streams?
Because it lies below them.
Therefore it is the king of a hundred streams.
If the sage would guide the people, he must serve with humility.
If he would lead them, he must follow behind.
In this way when the sage rules, the people will not feel oppressed;
When he stands before them, they will not be harmed.
The whole world will support him and will not tire of him.
Because he does not compete,
He does not meet competition.

Lao Zi, from the Dào Dé Jīng

This chapter presents a history of the increasing sophistication and generalization in knowledge representation strategies applicable to Chinese character recognition. Because we are concerned with this concrete problem, we do not attempt to cover the entire field of work in knowledge representation, much of which would not directly apply. Because we are concerned with knowledge representation, we ignore most of the commercial character recognition implementations, which recognize certain character types quickly and consistently but add little to theoretical understanding. The content and ordering of the review is roughly chronological, moving from primitive character-recognizing machines that hold very little "knowledge", through the development of constraint-based, object-centered knowledge representations, and finally to a few powerful knowledge-based systems applied to complex character recognition.

2.1 Character Recognition

2.1.1 Binary Image Template Matching (Casey and Nagy, 1966)

Casey and Nagy wrote the first computer program for reading Chinese characters, published in 1966. By then, a science of statistical pattern recognition was becoming well-established, with sophisticated algorithms for identifying a mildly distorted image with one of a dictionary of templates.
Casey and Nagy applied some of these algorithms, writing a program that compared the character it was trying to read, in a 25x25 binary array format, with masks of 40 selected points, each representing a different character in a lexicon of 1000 characters, until a mask matched the input to above some threshold. A first stage classified the character as one of 64 groups, using the same masking technique. While any rotation or stretching of the input characters would disable the method, it allowed some translation error, by fitting the masks at 9 different offsets (each one "bit" away from "perfect registration"), taking the best of these matches as the score for the mask. The 40 points for each mask were selected by a program which guaranteed, by exhaustive search through a "training" set, that the matches to given standard training characters were good enough for the corresponding masks and bad for all the other masks.

The algorithm's time complexity was linear in the size of the lexicon, hence fast hardware could guarantee a minimum reading speed independent of the input characters' complexity. Unfortunately, it made some irrecoverable errors, and rejected some characters as unreadable; these problems totalled about 8% of the reads attempted during experimentation. The cause of the failures was the minor deviation of the input characters from the standards: variations in stroke width, clipped corners, and sometimes incorrect masks. For type-set characters from a specified font, one could reduce these error and reject rates with better quantization of input and more careful mask selection.

The knowledge representation strategy employed by Casey and Nagy was quite oblique and minimal, making the process applicable only to single, completely specified Chinese character type fonts, ignorant of their complex structure, and dependent on high quality input. While providing a commercial arrow to follow towards efficient automatic readers of typed or typeset Chinese (see next section), the method is not applicable to handwritten characters.

2.1.2 Parametric Templates (Yamamoto et al., 1981)

Statistical pattern-matching character recognition techniques developed into a powerful tool for all except hand-written characters. Recently, Yamamoto's system achieved over 90% recognition of very neatly hand-printed Chinese characters. The program parameterizes its input as "feature vectors" and compares these against a dictionary formed by averaging the feature vectors of training sets. The features used are all histograms, along various axes of the character, of various simple measurements made perpendicular to the histogram axis: the number of pixels on strokes in the direction of the axis, as a measure of stroke directions; the amount of white space pinched between the strokes crossing the axis; and the number of crossing strokes.

Figure 2 (taken from Yamamoto et al., 1981) shows the histograms produced for these measures for a sample character. On the left is the outline of an input character, assumed to be represented as an array of black and white pixels, about 40 rows by 40 columns. In the center are three pictures showing the determination of the three types of histogram: counts of black pixels in four directions, measures of "pinched" white pixels in three directions, and counts of crossing strokes in four directions. On the right is a sample match of such a histogram to a model from the program's dictionary.

Figure 2. Character Histograms (from Yamamoto et al., 1981)
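As a concrete illustration of two of these three feature types, consider the following Python sketch (ours, not Yamamoto's code; the representation of the input as rows of 0/1 pixels is an assumption):

def row_histograms(image):
    # For each row of a binary image (1 = ink), compute two of Yamamoto's
    # feature types: the count of black pixels, and the count of crossing
    # strokes (runs of black pixels). The "pinched white space" measure
    # is omitted from this sketch.
    black_counts, crossing_counts = [], []
    for row in image:
        black_counts.append(sum(row))
        # A crossing stroke shows up as a white-to-black transition.
        shifted = [0] + row[:-1]
        crossing_counts.append(sum(1 for prev, cur in zip(shifted, row)
                                   if prev == 0 and cur == 1))
    return black_counts, crossing_counts

image = [[0, 1, 1, 0, 0, 1, 0],
         [0, 1, 0, 0, 0, 1, 0],
         [1, 1, 1, 1, 1, 1, 1]]
print(row_histograms(image))  # -> ([3, 2, 7], [2, 2, 1])

The same counts taken down the columns, and along diagonals, give the histograms in the other directions.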
The method allows some stretching of characters, irregularity of stroke width and location, and even some holes or glitches. However, it still views characters as variations on image templates, and founders on varied styles or large variation of simple visual parameters like size or rotation. We view the problem as a lack of appropriate knowledge representation; the program still works in the image domain rather than in the scene domain (Mackworth and Havens, 1983), with image features rather than natural composition structures. Human beings often express their knowledge of characters in terms of a hierarchy of the component parts of characters and the appearance of their component strokes; they describe strokes as geometrical objects or productions of mechanical actions; only the bottom-level visual objects in this hierarchy, such as the edges between black ink and background white, are related directly to input images. Hierarchical, scene domain knowledge should yield a more natural, and hence more powerful, automatic character recognition.

2.1.3 Syntactic Construction (Stallings, 1965)

William Stallings wrote a program for reading machine-printed Chinese through the construction of a data structure to represent the input character. Stallings' program views character components as graphs, with the nodes being the connections between strokes, and the arcs being the segments of strokes between connections. The geometric relations between components of a character form a binary tree, with the component-graphs as terminal nodes, as in Figure 3.

Figure 3. Recognition by Syntactic Construction: a sample input (our writing, but conforming to the requirements of Stallings' program) and the graph built by the program (stroke segments quantized in 8 directions, nodes labelled according to number of connections)

Construction of this graph is a major part of Stallings' program. It is done by tracking opposite sides of strokes, in search of the stroke connections which form the nodes of the graphs representing components. As shown in Figure 4, a stroke will usually widen at its connection with another stroke; the node is noticed by his program when this widening surpasses a specified threshold. Complete description of a node is made by further tracking to link the super-threshold crossings together, as shown in Figure 5.

Figure 4. Stallings' Segmentation of Strokes

Figure 5. Node Segmentation (from Stallings, 1966)

The program achieved over 90% recognition of machine-printed characters of various fonts; the great majority of errors arose from incorrect nodes: confusions where two nodes were practically adjacent, or missing nodes where strokes "should have" but did not quite meet. Some ambiguities arose in that different characters can have the same graph, because lengths and angles of strokes impart distinctions transparent to the graph structuring.

Stallings did represent some natural knowledge of the characters in his program, by creating data structures directly representing the connections between strokes and the components they form. Missing from his system were structural descriptions of the strokes themselves, and the ability to recover from errors of segmentation, wrong choices made by the low-level tracking routines; this made the process unresponsive to the ambiguities of handwriting and poor digitization.
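The widening test can be sketched as follows (our illustration, not Stallings' code; we assume the width of the stroke has already been measured at sample points while tracking its two sides):

def junction_nodes(widths, nominal, ratio=1.5):
    # Flag positions where the tracked stroke width exceeds the nominal
    # width by a given ratio; in Stallings' scheme such super-threshold
    # widenings mark candidate connections with other strokes. The ratio
    # is an assumed tuning parameter.
    threshold = ratio * nominal
    nodes, inside = [], False
    for i, w in enumerate(widths):
        if w > threshold and not inside:
            nodes.append(i)  # record only the start of each widening
            inside = True
        elif w <= threshold:
            inside = False
    return nodes

profile = [4, 4, 5, 9, 10, 9, 4, 4, 11, 4]
print(junction_nodes(profile, nominal=4))  # -> [3, 8]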
Hierarchical description and recognition of characters may be made more powerful by iteratively hypothesizing instances at higher levels, maintaining consistency between the hypotheses, and retracting them if they turn out to be wrong. Attempts to represent knowledge in this flexible way awaited the Artificial Intelligence (AI) research of the 1970's.

2.1.4 A Brief Summary of Character Recognition

Character recognition was until recently a problem confined largely to classical pattern recognition research. For a detailed history, see one of the excellent surveys and analyses (Stallings, 1976, or Mori et al., 1984, for Chinese characters; Suen et al., 1980, or Ullman, 1982, for general character recognition). It began with direct matching of templates to images, restricting the application to single, rather clean fonts. Researchers later developed many parameterizations of input images that could be matched against more sophisticated templates, as in Yamamoto's work (see Section 2.1.2). The emphasis of this kind of research has usually been to find a feature or small set of features which a low-level algorithm, usually embodied in a piece of digital hardware, can extract very quickly from low-resolution input images, typically 25x25 pixels per character, and which distinguishes all the characters of the language from each other. Statistical analysis of feature matches has yielded up to 99.99% recognition of neatly hand-printed Chinese characters (Mori et al., 1984), therefore such features seem indeed to exist for type-set, typed, and very neatly blocked and printed characters. They do, however, elude researchers of handwriting or general hand-print recognition.

A "descriptive" approach began in parallel with that of statistical pattern recognition (Mermelstein and Eden, 1964), attempting to formalize recognition of strokes, letters, and words on the basis of their structure, rather than on statistical properties. Stallings' work (1965) seems to have been the only attempt to apply these "syntactic methods" to Chinese characters before 1981, but the approach has now become common in Pattern Recognition proceedings. Unfortunately, the algorithms generally have no means of detecting their own errors, hence cannot backtrack to recover from mistaken descriptions. Building a syntactic description of an input character image should involve knowledge-directed search through the many possible descriptions that might be built. Thresholds and irrevocable heuristic guesses may either miss valid though improbable search paths, or leave too large a search space for any computer to scan by blind backtracking:

Statistical, Boolean, syntactic, and other methods have achieved worthwhile, but limited success in character recognition. Even for a specialized area such as hand-print recognition, there is no generally accepted opinion as to which method is best. Continued interest in using a mixture of different methods suggests that it is possible to obtain better results than any single known method gives. The fact that commercial use of character recognition is still very limited, despite colossal and indeed exploding requirements for data capture, suggests that available technology has cost-effectiveness problems. A few years ago, these problems might have been blamed at least partly on the cost of circuitry, but microelectronic progress has tended to reduce this cost greatly.
A more plausible view is that in the area of character recognition some vital computational principles have not yet been discovered, or at least, have not yet been fully mastered. If this view is correct, then research into basic principles is still needed in this area. (Ullman, 1982)

The control and data structures developed by Artificial Intelligence research may lead to the missing abilities.

2.2 A Scenic Detour into the Constraints Problem

Many AI problems essentially consist of finding values for a set of variables from given domains, subject to various constraints between the variables (see Nudel, 1983, for a review of this paradigm). Generally, the variables and their domains form the inputs to a consistency algorithm, and the constraints represent the knowledge being applied to solve the problem. Often the domains are finite, so the constraints can be represented by simple sets of legal tuples of domain elements. In computational vision, the domain elements are sometimes called "labels" and constraint satisfaction is seen as a "labelling" problem. Various special cases of the constraints problem characterize much of the AI research conducted during the 1970's. As this thesis uses a consistency maintenance theory adapted from such a constraint system (see Section 2.4), we present some notes on the history of constraint satisfaction algorithms.

2.2.1 One Binary Constraint (Huffman, 1971; Clowes, 1971)

Huffman and Clowes wrote similar programs which analyzed line drawings of groups of blocks to determine some of the three-dimensional arrangements implied by the lines arranged on the two-dimensional paper. Their "variables" were the junctions, or line intersections. Their domains, compiled for each of a set of easily distinguishable junction types, were labels for the different three-dimensional corners that can project onto the observed junction in the image plane. The different junction labels were derived from labels for their connected lines: + marks a convex edge, - a concave edge, and > an edge occluding an area around to the right of the "arrow" represented by the '>' symbol. For example, a forked junction could be labelled + + +, meaning that all three lines at the junction come from convex edges. This is the junction you see if you look at a corner of a box which is pointing at you, with all three adjacent faces of the box visible. Several other label combinations are possible for a fork junction, mixing concave and occluding edge labels. Each junction label implies the identification of the adjoined lines as concave, convex, or occluding edges. Final assignment of values to all the variables, from these domains, follows a single binary constraint: that the two junctions terminating each line both give the line the same interpretation. Huffman's algorithm for finding labellings consistent with this constraint was exhaustive search of the cross product of the domains, which worked satisfactorily for the few variables and small domains considered.

Huffman and Clowes' work was a first step toward constraint-based vision systems. In such a system, segmentation routines elicit ambiguous features, or variables, from raw sensory input; constraints representing knowledge of the world and the mapping from images to that world are applied to the variables by an efficient algorithm, eventually yielding an unambiguous interpretation of the part of the world that had been seen.
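A minimal Python rendering of this formulation (ours, not Huffman's or Clowes' program; the two junction domains are illustrative fragments, not the full catalogues): each variable is a junction, each domain value assigns a label to every line at that junction, and the single binary constraint checks that the two ends of each shared line agree.

from itertools import product

# Each junction's domain: candidate labellings, mapping line name to an
# edge label ('+' convex, '-' concave, '>' occluding). Fragments only.
domains = {
    "J1": [{"a": "+", "b": "+"}, {"a": "-", "b": "+"}],
    "J2": [{"b": "+", "c": "-"}, {"b": "-", "c": "-"}],
}
lines = {"b": ("J1", "J2")}  # lines terminated by two junctions

def consistent(assignment):
    # The single binary constraint: both junctions terminating a line
    # must give the line the same interpretation.
    return all(assignment[j1][line] == assignment[j2][line]
               for line, (j1, j2) in lines.items())

# Exhaustive search of the cross product of the domains, as Huffman did.
names = list(domains)
solutions = [dict(zip(names, choice))
             for choice in product(*domains.values())
             if consistent(dict(zip(names, choice)))]
print(len(solutions))  # -> 2: only labellings agreeing on line 'b' survive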
Some of what remained to be discovered, after Huffman and Clowes' work, was efficient algorithms for constraining the variable values, the use of more complex constraints, and a general algorithm for creating the variables to begin with — the segmentation of the input and the composition of hypotheses.

2.2.2 Development of Network Consistency Algorithms

Waltz (1972) devised a "filtering algorithm" which efficiently reduces the ambiguities of variables subject to binary constraints, later shown by Mackworth and Freuder (1985) to be linear in the number of variables if the graph formed using the variables as nodes and the constraints as arcs is planar. The refined algorithm is also linear in the number of arcs for non-planar graphs. Montanari (1974) analyzed the application of binary constraints to picture processing, presenting an algorithm for transforming certain n-ary constraint problems into binary-constraint networks.

Mackworth developed a new consistency algorithm for n-ary constraints, as part of a model-based recognition system called MAPSEE (1977). The system transforms a conservative segmentation of hand-drawn sketch maps into variables representing the clearly distinguishable lines and regions depicted in the map. Each of the variables initially has a set of possible values, called "labels", which are its possible interpretations as part of the map. For example, a region of the map might be "lake", "land", or "sea". The consistency algorithm reduces the ambiguity of the variables' possible values according to constraints representing knowledge of the conventions involved in such maps. For example, the regions on opposite sides of a line cannot both represent "sea", though one might be "sea" if the other is "land". Ambiguities remaining after the consistency process cue further segmentation of the input, forming a "cycle of perception" (Mackworth, 1978).

2.2.3 Full Constraint Lattice (Freuder, 1978)

Freuder published an algorithm, calling it "k-consistency", which, given a set of variables with finite domains, and a set of relations of any arity over these variables, finds all the tuples of labels over the full set of variables which satisfy all the relations. That is, all the constraints are met for every tuple in the resultant giant relation over all the variables. The algorithm works in time O(nⁿ), where n is the number of variables (Seidel, 1984), assuming a fixed bound on the size of the domains. Freuder notes several refinements that would improve the algorithm's efficiency, including not adding tautological nodes, and different strategies for linking nodes. The algorithm guarantees a correct solution for a very general class of constraint problem, which makes it an important achievement. However, it does not meet all the needs of the recognition process: a structure for the set of variables, to express knowledge of the composition of simple objects into complex ones; knowledge-based methods for creating and possibly destroying variables, to allow hypotheses and reprises; and constraints on very large or infinite variable domains, to include real number measurements.

2.2.4 Hierarchy and Hypotheticality (Brooks, 1981)

Several AI systems (Reddy, 1973; Davis and Henderson, 1979; Mackworth and Havens, 1983) have enriched the representation of the constraints problem by the use of hierarchies of the variables, building the higher levels of these data structures by revocable hypothesis.
Rosenfeld and Zucker (1976) have applied relaxation methods to hierarchical recognition to represent knowledge of likelihood and for more efficient search. Tsotsos (1984) has applied systems involving several orthogonal hierarchies to analysis of time-varying medical information. Systems using hierarchical representations, where the nodes contain variable slots and procedures for filling them, are usually called "frame-based" (Minsky, 1975). Frames are separated into "classes", which hold the constraints and the specifications of how the variables may be composed, and "instances", which are the data structures representing the actual objects being viewed.

Brooks' ACRONYM is one of the most sophisticated and fully implemented frame- and constraint-based vision systems to date. It enables definition of classes of three-dimensional objects based on generalized cylinders, with constraints on subparts being algebraic equalities or inequalities of mathematical expressions including functions like min, max, sin, and cos, as well as standard arithmetic. For each object, there is a hierarchy of constraints specifying the object in degrees of detail; for example, a set of constraints in an airplane object could specialize an instance to be a particular style of passenger craft as opposed to a military bomber, and other constraints could determine the make and model. ACRONYM simplifies the constraint expressions and finds, for each object instance hypothesized, the most specialized set of constraints satisfied. For each of the object classes, ACRONYM predicts possible image features using knowledge of geometrical projection. These predictions are a formalism for producing the ground hypotheses of the process, as matches of the predicted features to the actual data. The matches push the system into the "cycle of perception" (Mackworth, 1978), hypothesizing more object instances, applying the constraints stored in the object classes, and predicting more image features to seek out and incorporate.

There are difficulties in applying the ACRONYM knowledge representation to character recognition. Its algorithm for predicting possible image features is based on a transformation from three-dimensional solids, represented as generalized cylinders, to perspective images, not on two-dimensional objects like characters, and not on something general enough to allow both. ACRONYM's constraints are all over numerical variables, though processed symbolically, whereas Chinese characters and their components have labels and symbolic constraints upon them. The system does contribute the useful separation of classes from instances, and of low-level perceptions from hypothesized objects. It also contains a strong attempt to formalize the concepts of object and constraint.

2.3 Knowledge Representation Applied to Character Recognition

2.3.1 Hierarchical Relaxation (Hayes, 1980)

English handwriting has much of the variation, and hence ambiguity, of Chinese characters, requiring flexible control and a knowledge of composition in order to read it. By "composition" we mean our understanding of objects as having component parts. Complex objects are made of subparts which themselves are made of subparts, and so on down to form a "composition hierarchy". As letters are parts of words, smaller objects like segments of strokes and connections between them are parts of letters. Hayes wrote a program for reading neatly handwritten words.
It constructs a graph whose nodes represent hypotheses of recognition of critical points on strokes, strokes themselves, and finally letters, hence a composition hierarchy of three levels. Each node has a list of labels, which one may interpret as values for the variables represented by the nodes. Each label has a numerical value attached to it which indicates the "certainty" of that label being the correct one. The program builds the whole graph without eliminating any competing hypotheses, with heuristics to initialize the certainty factors. Since no search is done, there is no pruning of the hypotheses to be tested. However, two processes lead to efficient winnowing of the label sets once the whole set is built up: one modifies the certainty factors according to global formulae, proposed by Hayes, that heuristically move the values closer to 1 for correct labels and 0 for wrong ones; the other process removes labels whose certainty drops below some threshold.

Hayes' work contributes to the idea of using a composition hierarchy and general constraint satisfaction algorithms, but lacks the ability to prune the search space of possible object hypotheses during their construction. In many recognition tasks, including our own, the set of conceivable composition hierarchies is too large to build explicitly before applying constraints.

2.3.2 Frames for Fortran (Brady and Wielinga, 1978)

Brady and Wielinga wrote a program to read hand-printed Fortran coding sheets, using an implementation of frames. They defined frames for the stroke-primitives forming the input to the recognition process, a frame for the Fortran program which constitutes the recognition goal, and more frames for many intermediate-level objects like stroke junctions, words, and operators. The frames contain procedures and predicates which help to instantiate the frames, for example a set of actions to take when a particular slot of a frame is filled. The authors described the problem of general control of the instantiation process, suggesting a "research director" process to administer the various subprocesses incited by new instantiations.

Their work identifies a number of important general problems and principles for computational vision which this thesis addresses: that the vision problem involves an enormous quantity of input data and surprisingly complex computation, for example a million-pixel input image producing thousands of features for input to the recognition cycle; and that perception requires a hierarchy of intermediate levels between some direct sensation, like pixels, and complex objects like English characters:

We have genuinely been surprised at the sheer number of intermediate levels that we have been forced to represent explicitly in our program. If this is true for hand-printed characters, then the number needed to perceive natural scenes must be quite staggering. (Brady and Wielinga, 1978, p 291)

For reasons of convenience, the Brady/Wielinga system considers strokes as primitive percepts, while naturally written strokes are actually very complicated objects. The image processing done to extract strokes was motivated by the application alone; a complete theory of visual recognition will include a robust and general algorithm for segmentation. The flow of control presented by Brady and Wielinga seems very complex and possibly chaotic. The intelligent control of this flow must be explicit, clear, and well-defined.
2.4 Schema Labelling (Havens, 1985)

Havens has developed a schema-based theory of knowledge representation for recognition. This thesis is an implementation of his theory, with some extensions. While the implementation, and indeed Havens' theory, provide more power and formalism than much previous work, schemas have a long history; in 1920 Henry Head published a book including a theory of how the human nervous system remembers and recognizes the patterns of activity it incites or suffers:

But in both cases the image, whether it be visual or motor, is not the fundamental standard against which all postural changes are measured. Every recognizable change enters into consciousness already charged with its relation to something that has gone before, just as on a taxi meter the distance is presented to us already transformed into shillings and pence. So the final product of the tests for the appreciation of posture or passive movement rises into consciousness as a measured postural change. For this combined standard, against which all subsequent changes of posture are measured before they enter consciousness, we propose the word "schema". By means of perpetual alterations in position we are always building up a postural model of ourselves which constantly changes. Every new posture of movement is recorded on this plastic schema, and the activity of the cortex brings every fresh group of sensations evoked by altered posture into relation with it. Immediate postural recognition follows as soon as the relation is complete. (Head, 1920, pp 605-606)

Schemas have percolated through psychology (Bartlett, 1932) into Artificial Intelligence (Minsky, 1975; Schank and Abelson, 1977; Havens, 1983; Mackworth and Havens, 1983). In computer vision, they reflect a concern with the recognition of complex objects. As discussed earlier, complex objects are conveniently represented by composition hierarchies, where the top nodes of the hierarchies are the things one wishes to recognize, the intermediate nodes represent components, and the bottom corresponds, by understood processes, to a set of image features. Schemas are network structures with nodes representing objects, and links representing their relationships, including, but not confined to, physical composition. The recognition problem can be seen as the process of building correct schema instances, based on schema classes (or prototype schemas), from ambiguous, noisy, and incomplete input.
It has been advocated strongly that knowledge of composition be more formally developed in frame systems (Brachman, 1985), and Havens' theory makes some progress in doing so, by defining composition rules as a formal component of prototype schemas. The components form a generic composition hierarchy of schemas (see Figure 7, page 24), which the recognition process uses to build a composition hierarchy of instances (see Figure 13, page 33). The specialization process ensures that the network of instances created by the composition process obeys a set of constraints on their interpretation. Each constraint is a relation on a set of constituent classes, giving the "legal" tuples of labels, or "legal interpretations" of instances of the class. Each schema class is used to represent a "type" of object, and hence each class has a set of labels to identify the different "interpretations", or elements of the "type". For example, an "automobile" schema class might include labels "Mustang", "Eagle", and "Bobcat". Whereas the composition rules govern how schema instances may be created and linked, the specialization process operates on the knowledge of which labels are allowed for various compositions. A prototype schema therefore consists of a set of labels, a set of constraints, a set of components, and a set of composition rules. A schema instance consists of a reference to its prototype schema, a set of currently applicable constraints, and a set of labels consistent with those constraints. The instances are built by the composition process, and their coherence as descriptions of the world is maintained by the specialization process. The following chapter describes how these structures and processes have been extended and implemented for Chinese character recognition. 19 Chapter 3 S C H E M A S FOR CHINESE C H A R A C T E R S The Too is forever undefined. Small though it is in the unformed state, it cannot be grasped. If kings and lords could harness it. The ten thousand things would naturally obey. Heaven and earth would come together A nd gentle rain fall. Men would need no more instruction and all things would take their course. Once the whole is divided, the parts need names. There are already enough names. One must know when to stop. Knowing when to stop averts trouble. Too in the world is like a river flowing home to the sea. Lao Zi, from the Dao D£ Jihg This chapter presents the extensions made to schema labelling to perform Chinese character recognition, as well as the specific knowledge used for the recognition. We begin with a statement of our goal for the program and follow this with a brief overview of the system. Next we describe the various sources of knowledge about the structure of Chinese characters used in defining our schema classes. To aid in understanding the definitions of the data structures and control knowledge, an example of the system's operation then follows. Finally, we describe in detail the knowledge representation method used: the production of low-level schema instances, the extensions to standard schema data structures, and our control of the recognition process. 20 3.1 Goal We require an extension of schema labelling that is flexible enough to represent the generic structure of hand-printed Chinese characters, with sufficient control knowledge to enable their recognition. 
This required extended types for the schema instance "variables", or components, more complex types of constraint, and extended facilities for representing control knowledge, to prevent explosion of the search process when faced with the vast search space provided by the ambiguities and complexities of hand-print. Isolated analysis of a single pixel in an image of a character cannot determine the character of which it is a part, or even what type of stroke it comes from; this ambiguity generates a search space: what strokes one might hypothesize to exist on the basis of the pixels seen, or more generally, what objects one might hypothesize to exist on the basis of the subparts already found. The program must represent this ambiguity and search efficiently for correct hypotheses; the alternative to such controlled search is either incorrect guesses, or floundering through a colossal search space. The major result of our work on this goal has been the structural representation of Chinese characters, using the extensions to schema labelling just mentioned, in a form which includes at least some knowledge for control of the search process, achieving some recognition success thereby. 3.2 Overview of the System The system is a computer program whose input is an array of pixel grey-scale values for a digitized image of a Chinese character, and whose output is a network of schema instances, including one character instance, hopefully with a unique label correctly identifying the character of the image. By the way, the bold italic font, as in the word character, is used throughout this thesis to identify schema classes we have defined, as in the composition hierarchy of Figure 7, page 24. There are several natural stages to the recognition process, shown in Figure 6. The rectangles in the figure represent the data structures created by the various stages, starting with the pixel grey-scale array in the upper left-hand corner, and ending with the character instance in the lower-right. The entire system, designed as shown, was implemented and unit-tested. Note however, that systematic, recorded testing was performed only with synthetic data at the stroke instance level. 21 Figure 6 Overview of System Operation Scanned Character Contour-Segment end Locus Instance Network I One-Labelled Character Instance 22 First, an edge detection program (see the uppermost oval in Figure 6) converts the grey-scale image into a set of "contours'* — sequences of edge segments making up the outlines of the character's strokes. Secondly, a segmentation program divides the contours at natural points, creating a network of instances of two schema classes: locus (meaning "particular location") instances representing the points of contours where they are segmented, and contour segment instances representing the bits of contour between loci. Thirdly, a schema labelling interpreter is invoked which builds up the network of instances, adding higher and higher level instances representing greater and greater conglomerates of parts, until a character instance appears with a single label. The interpreter uses a knowledge base composed of a set of schema class definitions (Havens, 1985). These classes contain the knowledge of composition of Chinese characters, and constraints on the labels or names of various subparts, and, very importantly, knowledge of control for the process of searching for correct instance creations and linkages. 
The input set of low-level instances to the interpreter yields a huge search space of possible higher-level instances to create; searching this space effectively is a crucial part of the recognition problem. In Section 3.5.4 we describe this search problem more fully, along with our attempt at a solution. 3.3 Knowledge of Chinese Character Composition The definition of schema classes for recognition of Chinese characters requires more than a dictionary of the symbols or even simple familiarity with writing them. Several sources of knowledge were tapped for clear and useful descriptions of the characters' structure. While most people who read or write Chinese are consciously aware of a division into "radicals" and other components, and a further division into strokes, there are other important layers in the composition of the characters which must be explicitly detailed for complete recognition. The following sources were combined to produce the complete composition hierarchy, shown in Figure 7. 23 Figure 7. Composition Hierarchy C O M P O S I T I O N H I E R A R C H Y 24 In 1965, Rankin completed a linguistics thesis at the University of Pennsylvania (Rankin, 1965) presenting a generative grammar for Chinese characters, a very regular and succinct formalism describing all the tens of thousands of characters in the language, except for a few hundred oddly constructed ones. His description comprises two levels, a finite state automaton for building "groups" of "strokes" (his terminology), and a short context-free grammar for combining "groups" into "components" and "components" into full "characters", based on a few simple geometric relations. Figure 8 is a part of Rankin's automaton network. Each node represents a particular, valid group. Each group is generated from another or from a single stroke by the addition of a particular stroke in a particular position; the arrows in the diagram represent this etymology. In all, Rankin lists about 400 valid groups, with a maximum of seven strokes in each. Figure 8. Valid'Groups'of Strokes (from Rankin, 1965) We have defined a schema class called group, which specifies the schemas to which group instances may and should link, including constituent strokes, superpart components, stroke connections, and numbers for position and size. For example, a given group instance might link to 25 two stroke instances, a connection instance which represents the geometric relation between two strokes, a pair of cartesian coordinates for the center of the group, and a number representing the area taken up by the group. "Components'' in Rankin's grammar are sets of up to five "groups", placed vertically to each other, or horizontally, or with one group enclosing the others. Components, combined in the same three ways as groups are combined, form characters. Rankin added some restrictions which cause the grammar to generate almost the exact set of all Chinese characters. The excess generations are "well-formed" but "meaningless" characters. We based our schema specification of Chinese characters on Rankin's grammar, such that our composition rules can produce all the structures Rankin considers well-formed. In addition, our labelling constraints simply leave out the meaningless combinations of subpart labels, which prevents them from being produced as interpretations. We also defined a schema class called component representing the "components" of Chinese characters, not to be confused with the use of the word "component" for the links between schema instances! 
Component instances have links to the groups which compose them, to the character instance which they compose, and to numbers for their position and size. The composition rules implement Rankin's geometric constraints, while the labels and corresponding labelling constraints of the component class implement the semantic notions of meaningfulness. Finally, the top-level character class links to components, position, and area, to summarize the whole structure of a recognized character. We have in Rankin's work the foundation for an explicit structural representation for practically all of the Chinese characters, down to the level of detail where "strokes" are the primitives. The model includes knowledge of the hierarchy of structure from strokes through intermediate groups of strokes and up to full characters. It specifies the geometric relations between the constituents and allows the notion of meaningfulness to be conveniently defined in terms of the structures built up by the geometric compositions. Realistic character recognition cannot blithely begin at the stroke level; recognition of strokes is a difficult problem of itself, because of stroke overlaps and intersections, crude or complex 26 penmanship, and wildly varying angles. Freehand Chinese strokes on paper do obey certain structural rules, and a restricted set of these, describing fairly neatly hand-printed characters, has here been found amenable to implementation in our schema system. Figure 9 shows examples of the strokes we have specified. The selection of primitives were chosen from a linguistic reference (Leon, 1981, p 382), though the labelling scheme is our own. Figure 9. Different Strokes What are the component parts of a stroke on paper? Every Chinese stroke can be described naturally as a sequence of one to four bars. Our bar schema class represents these elongated blobs of ink. A bar is a two-dimensional area, but approximates a short circular arc — that is, it looks like a bit of curved line of constant curvature. The various bar labels arise from different lengths, angles of orientation, and sign of curvature of the various possible bars. While Chinese tradition gives almost poetic names for these, we have used a simple, efficient coding scheme where V stands for a long 27 vertical bar, V stands for a long horizontal bar, and so on for about 15 labels. Each stroke is uniquely identified by a sequence of these bar labels. Even low-level structures like our bars are broken down into subparts. The links from a bar instance are to two bar-sides and two bar-ends, which all together make up the contour around the outside of the bar. Bar-ends are labelled either "free" or "bent", and bar-sides have the same labels as bars. See Figure 10 for a sample decomposition of a stroke into three bars. The bars, BAR-1, BAR-2, and BAR-3, are indicated, as is one bar-end called END-OF-BAR1, and the labels for all four bar-ends in the stroke. This structure of three appropriately constructed and labelled bars uniquely identifies and describes this Chinese stroke. Figure 10. A 3-Bar Stroke with Bar-ends Connections between strokes usually interrupt the bar sides at "joints", a joint being the gap in a bar-side where another stroke has crossed it, as in Figure 11. We have defined a schema class called joint to represent these gaps. 
Even low-level structures like our bars are broken down into subparts. The links from a bar instance are to two bar-sides and two bar-ends, which all together make up the contour around the outside of the bar. Bar-ends are labelled either "free" or "bent", and bar-sides have the same labels as bars. See Figure 10 for a sample decomposition of a stroke into three bars. The bars, BAR-1, BAR-2, and BAR-3, are indicated, as is one bar-end called END-OF-BAR1, and the labels for all four bar-ends in the stroke. This structure of three appropriately constructed and labelled bars uniquely identifies and describes this Chinese stroke.

Figure 10. A 3-Bar Stroke with Bar-ends

Connections between strokes usually interrupt the bar-sides at "joints", a joint being the gap in a bar-side where another stroke has crossed it, as in Figure 11. We have defined a schema class called joint to represent these gaps. Joint instances link to the following: a number for the size of the gap, the two points on the stroke contours which bracket the gap, and four instances representing segments of that contour which end at the two joint points. Bar-end instances are constructed similarly to joints, though they often have only one locus instead of two.

Figure 11. The Segmentation

Identifying the bar-ends and joints is a difficult problem, as the contours of the strokes in hand-printed Chinese tend to be rounded rather than abruptly bent, and certainly do not have right angles at the pixel level for every bend at the stroke level. Mackworth and Mokhtarian (1984) have suggested that one represent curves, like the outlines of strokes, using a parameterization not by cartesian coordinates, but by curvature and arclength. This leads easily to an object-centered representation and a segmentation of curves into contour segments of monotonically changing curvature, separated by the critical contour points we call loci, which are the extrema of curvature on the original contour. One expects endpoints and the outsides of bends to appear as positive maxima of curvature, the insides of bends to appear as negative minima, and joints to appear as nearby pairs of negative minima.

The locus and contour segment classes form the bottom of our composition hierarchy. The locus instances contain only their position and curvature, in addition to "upward" or "sideways" links to contour segments, joints, and bar-ends; that is, loci are not structurally decomposed in this schema system. Rather, they are the primitives from which joints and bar-ends are hypothesized. The contour segments, primitives for the hypothesizing of bar-sides, have only their length and endpoint loci in addition to "parent" bar-side links.

In summary, we have defined schema classes called character, component, group, stroke, connection, bar, bar-side, bar-end, joint, locus, and contour segment (see Figure 7, page 24). These represent, with their labels, links, constraints and composition rules, the structure of hand-printed Chinese characters from the level of quite simple geometric primitives through to identification of the characters.

3.4 Sample Session: The Character 'Yin'

Here is an example of a complete run of our system, applied to the character Yin. The example is meant to make the schema classes more comprehensible, and to impart a sense of how the system works. After this section we shall describe the image processing and the schema system in detail.

Yin is an ancient word, with an original concrete meaning of "the leeward side of a hill", and another meaning of "the feminine principle", as in the Yin-Yang dichotomy popular in Chinese philosophy (Lao Zi, 600 BC). The modern character, shown above, has two principal components, a mound-radical positioned left of a moon-radical. The moon-radical decomposes to a shell around a smaller "sub-component" called the two-radical: two small horizontal bars, the character for the number 'two'. Incidentally, the whole character actually functions inside some other characters as a subcomponent.

3.4.1 Input — the image processing

Ideally, the raw data for the system is a digital image of a character. For the tests which were run to completion, the input instances were hand-computed, unambiguous segmentations (the capacity for dealing with ambiguities is shown for higher levels of the composition lattice).
Although the thrust and testing of this thesis are the processing of the schema instances after segmentation, the image processing was nevertheless designed and implemented as well, in an effort to find a model for recognition all the way from the image up to a structured description of the character.

Finer descriptive detail will be given in the next section, but the general flow of the image-processing sub-system is to locate lightness changes in the image, describe them as connected dots, smooth these connected paths into circular arcs, and segment using this description to produce contour segment and locus instances.

To speed the recognition, a character instance and a preset maximum number of group instances are added to the input instances. This addition may be viewed as the knowledge, or expectation, of the fact that a character should be seen. The "guess" focuses the search, as the schema instance interpreter can be prevented from trying to create other character or group hypotheses, using control predicates which we describe later, and thus finish its work more quickly. Essentially, the search for correct hypotheses is done both bottom-up, from the instances produced by segmentation, and top-down, from the initial, unlabelled character instance. At the end of recognition, the character instance's labelset has been refined down to the unique label identifying the input character.

3.4.2 Making new instances and variables

The principal module of the recognition system is an interpreter which tries to expand a given network of instances. Its basic repeated action is to make a link from one instance to another, possibly creating the second instance to do it. This is done in six stages:

(1) Choose an instance to work on.
(2) Choose one of the instance's potential links to try to fill.
(3) Find an instance which can fill that link.
(4) Check the two instances and their link against the composition rules.
(5) Apply the labelling constraints to remove inconsistent labels.
(6) If any labelset reduces to nil, remove a link and try another instead.

The "links" of this system are called "variables" because, like the variables of programming languages, they contain values — in our case the values are schema instances.
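As a concrete illustration of this six-stage action, here is a self-contained toy in LISP. It is only a sketch under simplifying assumptions (instances are reduced to a labelset and an alist of bound variables, and the composition rule and labelling constraint are passed in as functions), not the interpreter's actual code.

    (defstruct toy-instance name labelset variables)

    (defun try-link (from slot candidates rule constraint)
      "Stages 3 to 6 for one link: find a filler among CANDIDATES for SLOT
    of FROM, check RULE, bind, then filter FROM's labelset with CONSTRAINT.
    Returns the filler on success; on failure the link is undone."
      (dolist (filler candidates nil)
        (when (funcall rule from filler)                        ; stage 4
          (push (cons slot filler) (toy-instance-variables from))
          (let ((survivors (remove-if-not                       ; stage 5
                            (lambda (label) (funcall constraint label filler))
                            (toy-instance-labelset from))))
            (cond (survivors
                   (setf (toy-instance-labelset from) survivors)
                   (return filler))
                  (t                                            ; stage 6
                   (pop (toy-instance-variables from))))))))

    ;; A bar-side accepting a vertical contour segment, dropping label 'h':
    ;; (let ((bs (make-toy-instance :name 'barside-1 :labelset '(h v iv))))
    ;;   (try-link bs 'cs1 '(cs-3)
    ;;             (lambda (from filler) (declare (ignore from filler)) t)
    ;;             (lambda (label filler) (declare (ignore filler))
    ;;               (not (eq label 'h))))
    ;;   (toy-instance-labelset bs))   =>  (V IV)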
Let us describe the process of "adding a variable" for a particular instance-expansion during the recognition of 'Yin'. As mentioned above, the inputs are the contour segment and locus instances produced by image processing, plus a few hypothetical higher-level objects. A first candidate instance is chosen arbitrarily; we shall consider for the example a contour segment called CS-3. The interpreter consults the contour segment schema class and chooses to attempt a "bar-side" link, as these are the first links specified for the contour segment schema class, in its class definition.

To attach a bar-side variable to CS-3, it must find a bar-side instance. Finding none in the current network (which, you remember, consisted on input of nothing but contour segments and loci), it creates a new one, calling it BARSIDE-1. The composition rules of the contour segment class are checked to see if CS-3 can have a bar-side variable holding BARSIDE-1; the check succeeds and the link is made, as in Figure 12.

Figure 12. A 'Variable' Link Between 2 Instances

The system might have failed if the chosen bar-side already had links to loci or contour segments far away from CS-3, because of a rule which states that the contour segments composing a bar-side must be contiguous. But in this case the bar-side is easily accepted, as it is the first such attachment and has no competing links with which it could fail a composition rule.

Finally, the specialization process is activated. There is a constraint which restricts the labels a bar-side may have, based on the labels of its first contour segment: for example, a horizontal bar-side cannot begin with a segment labelled "vertical", so assuming that CS-3 does have a "vertical" label, BARSIDE-1's "horizontal" label is eliminated. Several labels do remain, so the addition of the new variable succeeds. The interpreter can go on to another link.

Recognition of 'Yin' continues by the addition of a "bar-side" variable to each of the contour segments, and creation of bar-end instances for attachment to those, then creation of bars to combine pairs of bar-sides, strokes to link abutted bars, connections to group together overlapping or nearly overlapping strokes, and so on up to links between the character instance and its components. Figure 13 is a trimmed version of the final instance network for 'Yin', showing the top few levels of the composition hierarchy.

Figure 13. Top of Final Network for 'Yin'

This chapter so far should have given the reader familiarity with the terms involved in Chinese character recognition by the schema-based system, and with the general principle of expansion of the schema instance network until recognition is achieved. The following sections will present the processes and data structures of the implementation in more detail.

3.5 From a Character on Paper to Schema Instances

3.5.1 Image Processing

We print test characters for the system on stiff paper, with a thick felt pen (see Figure 14). An Optronics System C-4500 Digitizing Scanner converts a character into a standard image file (Havens et al., 1982) of grey-level pixels, using a VAX 11/780 computer running the Berkeley 4.2 UNIX operating system at the UBC Laboratory for Computational Vision. The stiff paper was necessary for convenient mounting in the digitizer, and the thickness of the felt pen was necessary to compensate for the digitizer's insufficient resolution of 10 pixels per millimetre.

Figure 14. A Digitized Character Image

The system's first subprogram finds the contours of strokes by tracking zero-crossing contours of the Laplacian of the Gaussian of the image at a fixed spatial frequency. This technique is derived from the theory of edge detection developed by David Marr and colleagues at MIT (Marr and Hildreth, 1979; Hildreth, 1980). We chose a frequency giving maximum sensitivity to an edge 2 pixels across, about the sharpest that the digital masks could approximate, as the reflectance changes for characters inked onto white paper are of course quite abrupt. First an image file is made, using a program written by Alan Carter at UBC, holding first-order approximations to the gradients of the zero crossings, a measure which appeals to our intuitive notion of the "strength" of an intensity change. Our program then extracts chains of pixels from the zero-crossing file, choosing 4-connected pixels above a particular gradient threshold. These chains "represent" the "contours" that the system "sees" in the image. Figure 15 is a picture of the contours found for the sample character.

Figure 15. Contours Found By Edge Detector

A second program, written by Farzin Mokhtarian (Mackworth and Mokhtarian, 1984), calculates the curvature at all points of the contours, using a Gaussian convolution of fixed spatial frequency.
This provides a description of the contours in terms which make it easy to segment at the points where the contours bend sharply, which we call loci. The loci mark the endpoints of strokes and the joints where strokes curve or intersect. See Figure 16, for example, where the significant features of the character's outline remain, but the irregularities of the original and digital images are largely removed.

Figure 16. Smoothed Contours of a Character

As the points of the contours of strokes which most interest a character recognizer are the sharp curves — the ends and bends and intersections of strokes — the extrema of curvature are useful points at which to segment. A short LISP subprogram converts the curvature descriptions of the stroke contours into locus and contour segment schema instances by scanning the list of curvature values for positive maxima and negative minima, building contour segment instances as it scans and linking them to locus instances created at the extrema. Figure 17 shows the loci and contour segments found for the sample character.

Figure 17. Segmented Contours of a Character
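A minimal sketch of such a scanner follows, assuming the curvature description is simply a list of curvature samples along one contour; the real subprogram builds full schema instances rather than returning indices.

    (defun curvature-extremum-p (a b c)
      "True when the middle sample B is a positive maximum or a negative
    minimum relative to its neighbours A and C."
      (or (and (> b 0) (>= b a) (>= b c))
          (and (< b 0) (<= b a) (<= b c))))

    (defun segment-contour (curvatures)
      "Return the indices of the loci along a list of curvature samples;
    the stretches between successive loci become contour segments."
      (loop for (a b c) on curvatures
            for i from 1
            while c
            when (curvature-extremum-p a b c) collect i))

    ;; (segment-contour '(0.0 0.1 0.9 0.2 -0.1 -0.8 -0.2 0.0))  =>  (2 5)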
3.5.2 Schema Instance Extensions

When the system has finished, its description of the recognized character is a network of schema instances, ranging from the input loci and contour segments up to a single character instance. These instances are the principal unit of representation for the characters being recognized, so we shall describe their structure here in detail. Havens has defined schema instances as structures derived from prototype schemas, created by a composition process and modified by a labelling process (Havens, 1985). For each instance, he includes a reference to its prototype schema, a set of applicable labelling constraints, and a labelset. In this thesis we have extended some of these aspects of instance structure. A description of these extensions follows.

(1) variables (a list of the instance's currently filled variables)

The variables are a schema instance's links to other instances, corresponding to Havens' "components". Whereas components are featureless links, our system's links are named, typed, and structured, to give the linking mechanism more power. We view such a link as a variable, which may be created or destroyed at any time during the recognition process. A schema class specifies the variables which its instances may hold, including information about what the variables may bind to. In general, each variable of an instance will be bound to another instance.

(2) constraints (the set of constraints applicable to the instance)

Our system provides three types of constraint: explicit label tuples as in Havens' theory, predicates, and generative functions, all applied to the labels of the variables held by an instance. These labels are the possible interpretations of the instances bound to the variables. Definitions for the different constraints actually reside at the level of schema classes (what Havens calls 'prototype schemas'), and will be described later. Within a given instance, all constraints produce the same structure: a set of tuples of labels, representing the label combinations which satisfy a constraining relation over some particular subset of the instance's variables. Some of the constraints are as small as a single value for a single variable; some allow hundreds of tuples over the labels of a dozen variables.

A typical constraint is ternary, allowing pairs of labels for two components, and giving to each pair a label for a special variable which holds the parent instance itself. For example, we present some of the tuples from a constraint in a character instance, over the variables holding the character itself, a "subcomp" component, and a "leftcomp" component, these last guaranteed by the composition rules to be appropriately positioned to form a character. The first tuple of the constraint indicates that the character may be labelled 'yin', as long as its subcomp component may be a 'moon-radical' and its leftcomp component a 'mound-radical'.

A CONSTRAINT OVER THE VARIABLES (character subcomp leftcomp)
((yin moon-radical mound-radical)
 (good child-radical woman-radical)
 (bestow precious-radical ci-phonetic)
 ... )

(3) labelset (a list of legal tuples over the external variables)

A simple finite set of labels is often insufficient to uniquely disambiguate an instance. For example, numbers representing size, weight, or age might be useful addenda to the label representing a person's identity. For this reason we extend the labelset of an instance to be a set of tuples: the cartesian product of the label sets of a specified set of "external variables", reduced by the labelling constraints of the schema class. The labels of a single class may come from a finite set, as in traditional constraint systems like Waltz's, or else they may come from an infinite set such as the real numbers, or even from a set of structures such as LISP lists. From the above example of a schema for a person, a possible labelset for a particular instance might be {(Timothy medium average 23) (Ian medium average 23)}. Here the system has not yet decided which person the instance represents, Ian or Timothy, but at least it does know the person's size, weight, and age!

While the larger labelset does not necessarily further disambiguate the instance, it does make access to important information about the instance simple — everything the recognition process or the system user may want to know about an instance, for building or disambiguating other instances, or for verification of the program, can be put in the label tuples.

For Chinese characters as described herein, most of the external variables represent metrics like the lengths of bars and the positions of connections. Note the labelset of the following bar instance, refined already to a single tuple where the principal label is 'v' for 'vertical', both ends are 'free', the length is 0.79, the 'pos1' or 'position of first endpoint' is at coordinates (0.5, 0.9), and so on. The contents of all the bar's variables are listed below the labelset.

**********************************
INSTANCE bar-2
LABELSET: (v free free 0.79 (0.5 . 0.9) (0.5 . 0.1) (0.9 0.5 0.1 0.5))
OVER EXTERNAL VARIABLES: (bar, end1, end2, length, pos1, pos2, space)
INTERNAL VARIABLES:
'side1' = bar-side-3, labelled (v)
'iside' = bar-side-4, labelled (iv)
'curvature' = ((0.0))
'stroke' = stroke-2, labelled (v)

Figure 18. The Labelset and Variables of a Bar Instance
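The following toy LISP sketch shows the flavour of these extended labelsets and of reduction by an explicit tuple constraint. The structure and the label lists are illustrative only (a labelset is abbreviated here to the list of principal labels), not the thesis system's actual representation.

    (defstruct (inst (:constructor make-inst (class labelset)))
      class
      labelset)   ; abbreviated: a list of principal labels

    (defparameter *char-constraint*
      '((yin    moon-radical     mound-radical)
        (good   child-radical    woman-radical)
        (bestow precious-radical ci-phonetic))
      "Tuples over (character subcomp leftcomp), as in the text above.")

    (defun prune-by-constraint (char-inst subcomp-labels leftcomp-labels)
      "Keep only the character labels supported by some constraint tuple
    whose component labels survive in the components' labelsets."
      (setf (inst-labelset char-inst)
            (remove-if-not
             (lambda (char-label)
               (some (lambda (tuple)
                       (and (eq (first tuple) char-label)
                            (member (second tuple) subcomp-labels)
                            (member (third tuple) leftcomp-labels)))
                     *char-constraint*))
             (inst-labelset char-inst))))

    ;; (prune-by-constraint (make-inst 'character '(yin good bestow))
    ;;                      '(moon-radical) '(mound-radical woman-radical))
    ;; =>  (YIN)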
3.5.3 Schema Class Extensions

The schema classes are data structures holding the knowledge which is necessary to create, and keep consistent, the schema instances. They are essentially prototypes for generating the instance network. Havens has defined a prototype schema as a structure including a labelset (identifying the various possible subclasses which the instances of the class might represent), a set of components, a set of composition rules governing these components, and a set of constraints over the labelsets of the components (Havens, 1985). In our system, several extensions have been made to these definitions. They provide mechanisms for greater control of the composition process, some new types of labelling constraints, and support for the schema instance extensions already described.

(1) variable prototypes (a list of specifications for the variables which an instance of the schema class may hold)

Variables are the links between schema instances, representing the relationships between recognized objects. Whereas Havens lists the schema classes which may be linked to, this system treats the links themselves as data structures of some complexity. The prototype of a variable is not a "type" in the programming language sense, but a specification of how variables may be created for instances of a given schema class, and of which instances those variables may then bind to.

The variable prototypes of this system allow the distinction between different constituents of the same schema class. For example, the bars which make up the strokes of Chinese characters are drawn in particular directions, hence each bar has a "beginning" and an "end" (see Figure 10, page 28). As shown in Figure 19, the bar schema class has variable prototypes "end1" and "end2", both allowing the end-of-bar schema class. The bar class also has a variable prototype called "stroke", which allows its variables to hold instances of the stroke schema class, and a variable prototype called "length", whose variables can hold numbers.

In this system there are various ways to bind variables; when defining a set of schema classes, one chooses a "fill method" for each variable prototype, as sketched below. The default method is to scan the already existing instances for one that satisfies the class's composition rules. One may specify a creation fill method, causing a featureless new instance to be created and used for the variable if no already existing one fits. The computation method gives an instance for the variable by calling a function with other variables as arguments. This is used, for example, to fill the "length" variable of an instance made of components with "sublength" variables (length = (sum sublength1 sublength2 ...)). Some variables are computed by the segmentation routines which create the initial instances, before the recognition machine begins its work — these variables have a bottom-up fill method. Finally, some variables are merely the inverse links of other variables, bound automatically when their inverses are bound. Thus when a man has a son, the boy automatically obtains a father. This method is called inversion. Before linking any variable, the recognition process consults its fill method, and binds accordingly.
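A toy dispatch over the fill methods might look as follows; here a prototype is abbreviated to a list (name fill-method &optional function), the bottom-up and inversion methods are omitted, and all names are illustrative.

    (defun fill-variable (proto existing create-fn compute-args)
      "Return a filler for the variable prototype PROTO according to its
    fill method. EXISTING holds candidate instances already known to
    satisfy the composition rules."
      (destructuring-bind (name method &optional fn) proto
        (declare (ignore name))
        (ecase method
          (scan        (first existing))            ; reuse an existing instance
          (creation    (or (first existing)         ; else create a fresh one
                           (funcall create-fn)))
          (computation (apply fn compute-args)))))  ; e.g. a length from sublengths

    ;; (fill-variable '(length computation +) nil nil '(0.3 0.49))
    ;; =>  0.79 (up to floating-point rounding)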
(2) external variables (a list of variable prototypes)

These are simply a subset of all the variable prototypes, the ones which are used to build the labelsets of instances of the schema class, as previously described on page 39. Having "external" variables explicit in the labelset gives two benefits: the labelset gives a fuller description of its instance, for presentation to the system user or for use by other instances, and external variables allow a useful form for the composition rules, described next.

(3) composition rules (a list of valid "composition prototypes" for instances of this class)

Each schema instance must satisfy at least one of its class's composition rules. In this system, each composition rule is (a) a list of variable prototypes whose instantiations might comprise a valid composition, and (b) a list of predicates on the instances to be held by these variables, predicates which must be satisfied to make the composition valid. For a rule to be satisfied, every one of its predicates for which all argument variables have been bound must be satisfied.

To represent a restriction on composition, one writes a LISP predicate for a particular list of variables, where the predicate is applied to the labels of the instances held by the variables. For example, one composition rule in the Chinese character specifications allows only values of the variable "curvature" greater than a particular threshold, by giving the rule a predicate as follows:

(sharp-enough curvature)

where the function definition is:

(defun sharp-enough (k) (greaterp k min-k))

Some natural restrictions apply to subparts of subparts. For example, the composition rule for strokes demands that the first end of its first bar and the last end of its last bar be labelled "free" (see Figure 10, page 28). This is represented by allowing the composition predicates to apply to any external variable of the instance held by a variable, hence the predicates:

(free? (end1 bar1))
(free? (end2 bar4))

where the function definition is:

(defun free? (end-label) (eq end-label 'free))

Strokes potentially have variables "bar1", "bar2", "bar3", and "bar4". The composition predicate (free? (end1 bar1)) guarantees that the external variable "end1" of the instance held by the "bar1" variable of the stroke has a "free" label.

Unfortunately, some compositions depend on the identity rather than on the labels of their constituent instances, that is, on the instances held by their variables. For a stroke composed of two bars, the end of the first bar must be the same instance as the start of the second bar, not merely have the same label (e.g. "bent"). Also, the efficiency of the recognition process can be heavily marred if certain variables are bound before others, again independently of their labels. Both of these problems are solved in this system by the use of three special composition predicates which operate on the identities of the instances held by the variables: eq, neq, and prec, meaning equality, inequality, and precedence of binding. For a concrete example of these predicate types, see Figure 20 for a composition rule for "strokes" of Chinese characters. It guarantees that "end1" of "bar1" will be free, that "end2" of "bar1" will be the same instance as "end1" of "bar2", that the "bar1" variable will be bound before, or preceding, "bar2", and so on.

((bar1 bar2 bar3 bar4 length space group)
 ((free (end1 bar1))
  (eq (end2 bar1) (end1 bar2))
  (eq (end2 bar2) (end1 bar3))
  (eq (end2 bar3) (end1 bar4))
  (free (end2 bar4))
  (prec bar1 bar2) (prec bar2 bar3) (prec bar3 bar4)
  (neq bar1 bar2) (neq bar2 bar3) (neq bar3 bar4)
  (neq bar1 bar3) (neq bar1 bar4) (neq bar2 bar4)))

Figure 20. A Composition Rule for the Stroke Schema Class

The composition rules express possible bindings of variables, but do nothing to reduce the labelset of an instance from the cartesian product of its external variables' labelsets. Schema labelling (Havens, 1985) uses constraints on label sets to achieve this reduction.
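The rule-checking discipline, in which only those predicates whose argument variables are all bound are tested, can be sketched in a few lines of LISP. This toy version takes bindings as an alist, flattens the nested variable specifications to simple names, and ignores the prec predicate, which needs binding-order history; all names are illustrative.

    (defun neq (a b) (not (eq a b)))

    (defun rule-satisfied-p (predicates bindings)
      "PREDICATES are forms (fn var ...); BINDINGS is an alist from
    variable names to instances. A predicate with an unbound argument
    is skipped, as the text above requires."
      (every (lambda (pred)
               (destructuring-bind (fn . vars) pred
                 (let ((vals (mapcar (lambda (v) (cdr (assoc v bindings)))
                                     vars)))
                   (or (some #'null vals)   ; an argument is unbound: skip
                       (apply fn vals)))))
             predicates))

    ;; (rule-satisfied-p '((eq end2-bar1 end1-bar2) (neq bar1 bar2))
    ;;                   '((end2-bar1 . e1) (end1-bar2 . e1)
    ;;                     (bar1 . b1) (bar2 . b2)))
    ;; =>  T, and a predicate mentioning a still-unbound bar3 is skipped.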
(4) constraints (a list of constraints on the variables of the class)

In addition to the finite sets of tuples which form the constraints of standard schema theory (see the last section of Chapter Two), we have implemented constraints defined as predicates and functions written in LISP. To control application of the constraints more flexibly than by the presence of the affected variables, we have added a means of ignoring a constraint under special circumstances: a constraint may be excluded by the presence of particular variables; thus each constraint includes a list of "excluded variables". For example, in the schema class for a group of strokes, the constraint applied to determine the group's label depends on how many strokes there are in the group. The set of three-stroke groups is represented by constraints which exclude the variable "stroke4":

(exclusive-tuples (s4)
  (group s1 s2 s3 c12 c23)
  ((hlr h l r 1/2-1/3 1/2-0)
   (hrOu h rO u 2/3-1/3 2/3-1/2)
   (hvh-c h v h 1/2-1/2 1-1/2)
   (hLjh-a hL j h 1-0 1/3-1/2)
   (lvh l h v 1/2-0 1/3-1/2)
   (1H1H5 1H 1H 5 1-1/3 1-1/2)))

Figure 21. A Constraint on the Group Schema Class

Figure 21 shows a constraint of type "exclusive-tuples", where s4 (a "fourth stroke") is the only excluded variable. If the s4 variable is present, hence bound to some particular stroke, this constraint is not applied. The constraint is a set of valid tuples over the variables shown on the second line: group, s1, s2, s3, c12 (a "connection" between strokes 1 and 2), and c23 (a connection between strokes 2 and 3). The labels in the "group" column of the tuples represent the various groups of Chinese strokes made of exactly three strokes and the two indicated connections between them.

The constraints are the knowledge that identifies some labels as correct while forcing others to disappear. In this system, a constraint can be a list of valid tuples over the label sets of the given variables. It can also be a predicate over them, which succeeds if and only if its arguments meet the constraint. For example,

(lowerleftof (position Ucomp) (position subcomp))

guarantees the right configuration of subcomponents by checking their positions with a simple geometrical formula called "lowerleftof" in LISP. Finally, a constraint can be a function, giving the label of one variable on the basis of the labels of the others; for example,

(length = (sum (length bar1) (length bar2)))

computes the length of a stroke instance using its subpart lengths and the LISP function "sum". The two latter constraint types are major extensions to schema labelling, to deal with real-valued labels, which could lead to tuple constraints with an infinite number of tuples. For example, some Chinese characters are distinguished only by the relative lengths of their component strokes, and some by the relative positions of their component groups; one can express these constraints on the character's label by predicates on the real numbers representing its lengths and component positions, while a list of valid (label length) or (label position) tuples would be infinite.
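Minimal LISP versions of the two non-tuple constraint types might look as follows, with positions as (x . y) dotted pairs as in the instance printouts. The geometry of lowerleftof is assumed here (the thesis does not give its definition), so this is only a plausible reconstruction.

    (defun lowerleftof (pos-a pos-b)
      "A predicate constraint: true when POS-A lies below and to the left
    of POS-B (assuming y increases upward)."
      (and (< (car pos-a) (car pos-b))
           (< (cdr pos-a) (cdr pos-b))))

    (defun stroke-length (&rest bar-lengths)
      "A function constraint: a stroke's length as the sum of its bars'."
      (apply #'+ bar-lengths))

    ;; (lowerleftof '(0.2 . 0.3) '(0.6 . 0.7))  =>  T
    ;; (stroke-length 0.3 0.49)                 =>  0.79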
(5) search control predicates (three functions for control of the hypothesis process)

The algorithm using schemas to recognize objects must decide which variable of which instance to try to bind next. This is the question of the order in which hypotheses should be made. We define three predicates to help make this decision: expand-new-component? tells whether to move control to the instance linked to, or to retain control at the instance linked from; relinquish-control? tells whether or not to relinquish control back to the instance which gave control to the current instance; and satisfied? decides whether or not to remove the instance from further attempts to add variables to it.

The control predicates are used to reduce the combinatorics of arbitrary hypothesis-making for Chinese character recognition. For example, if the program is given a list of bar instances already linked into strokes, it methodically creates, links, and completes all possible connection instances using simple control predicates — without any backtracking required. The bar class's expand-new-component? is always true for "connection" variables, so that the creation of a connection instance leads immediately to an attempt to complete it. A connection is neither satisfied? nor does it relinquish-control? until it holds two bars and the position of their intersection. As connections expand none of their constituents, control always returns to the bar which created the connection instance; thus all connections are created and made consistent (assigned their appropriate labels) before any attempt is made to complete higher-level instances which depend on connections.

Visualize an initial set of instances, linked together into a small network, and the network growing in controlled bursts at points allowed by the control predicates. A set of bars grows a shell of connection instances first, then moves on to groups, components, and finally a unique character instance at the top.
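The connection discipline just described can be sketched as follows, with an instance abbreviated to an alist of its bound variables; the accessor and class names are illustrative.

    (defun bound-p (inst var)
      "The binding of VAR in the toy instance INST, or NIL."
      (cdr (assoc var inst)))

    (defun connection-satisfied-p (conn)
      "A connection is finished only once it holds two bars and the
    position of their intersection."
      (and (bound-p conn 'bar1)
           (bound-p conn 'bar2)
           (bound-p conn 'position)))

    (defun connection-relinquish-control-p (conn)
      (connection-satisfied-p conn))        ; give control back only when done

    (defun bar-expand-new-component-p (var)
      (eq var 'connection))                 ; bars always descend into connections

    ;; (connection-satisfied-p '((bar1 . b1) (bar2 . b2)))  =>  NIL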
3.5.4 Control of the Backtracking Search

The composition process makes "hypotheses" — that is, it creates schema instances to represent objects it thinks it sees, and also variables to link them together. A "possible hypothesis" for the system in a given state is a specification that a certain instance may be linked to a certain other instance by creation of a certain variable. The latter instance may have to be newly created. The proposed variable and its binding must conform to a variable prototype of the first instance's schema class, and to all composition rules governing that variable prototype.

For typical AI problems, the set of possible sequences of hypotheses ("S"), given a set of input instances, is an enormous search space requiring vigorous intelligent pruning (Nudel, 1983). This system's algorithm is based on simple backtracking search through all of "S", and is thus guaranteed to find a consistent "solution" if one exists, at the risk of terribly long computation. The theoretical worst case remains a traversal of the entire tree of possible hypothesis sequences — at least exponential in the size of the input. Three refinements to standard backtracking made the algorithm efficient enough to recognize 57% of the tested synthetic characters within reasonable memory and time limits. The first two refinements are due to standard schema theory; the last is the major contribution of this work.

First, the use of labelsets for the schema instances coalesces groups of many hypotheses into single ones (Mackworth and Havens, 1983); for example, upon recognizing four wheels and a connecting body, one may hypothesize an "automobile" instance, and because of the labelling of the "automobile" schema class, one need not separately hypothesize a Ford Mustang, a Mercury Capri, a Toyota Corolla, and so on. For Chinese character recognition, this method was applied at all levels, so that 35 stroke types, 240 possible "groups", and over 100 components and characters were all represented by only four schema classes. Wherever a set of hypotheses might be made, the search tree has a branching factor equal to the cardinality of that set. Labelling may reduce the set to a single hypothesis, where the ambiguity of that hypothesis is reduced by a later process — that of consistency maintenance with the class's constraints. Labelling may thus reduce the branching factor of the search of "S" by a constant factor, hence exponentially reducing the full size of the search, assuming that the consistency process does not increase the search complexity.

Second, the algorithm groups as one hypothesis the variables allowed by separate composition rules of the same schema class, potentially saving another constant branching factor. This is done by assigning backtracking choicepoints to the stack of possible variable bindings rather than to the stack of composition rules, as is usually done. For example, there are several ways to group components to make a Chinese character. Here are three of them:

component1 left-of component2
component1 above component2
component1 surrounding component2

Suppose the character in question is of the "surround" type, obeying the third composition rule stated above. Backtracking through all possible hypotheses could perform the following sequence of actions:

make a variable for 'component1'
make a variable for 'component2'
test if component1 is left-of component2
FAIL (remove the variables)
make a variable for 'component1'
make a variable for 'component2'
test if component1 is above component2
FAIL (remove the variables)
make a variable for 'component1'
make a variable for 'component2'
test if component1 surrounds component2
SUCCEED

For each hypothesis, our system checks all the rules, a small finite set for each schema class, until it finds one that succeeds:

make a variable for 'component1'
make a variable for 'component2'
test if component1 is left-of component2
NO, SO TRY ANOTHER
test if component1 is above component2
NO, SO TRY ANOTHER
test if component1 surrounds component2
SUCCEED

Thus, our system does not fail to the point of removing variables even once for this example, while the simpler backtracking scheme fails twice. The difference is significant if many instances are considered for each variable binding, as the failures of each rule will be multiplied by the number of binding combinations made. The attempting of all rules for each set of bindings reduces this pathology at an acceptable cost: traversal of the fixed-size set of composition rules for a schema class.
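In LISP, trying every composition rule on one set of bindings before failing is as simple as the following toy; the geometric predicates are illustrative stand-ins for the three rules above.

    (defun first-satisfied-rule (rules comp1 comp2)
      "Return the first rule satisfied by the bound components, or NIL,
    without undoing the bindings between attempts."
      (find-if (lambda (rule) (funcall rule comp1 comp2)) rules))

    ;; Toy stand-ins, with positions as (x . y) pairs:
    (defun left-of-p (a b) (< (car a) (- (car b) 0.3)))
    (defun above-p (a b) (> (cdr a) (+ (cdr b) 0.3)))
    (defun surrounds-p (a b) (declare (ignore a b)) t)  ; always-true stand-in

    ;; (first-satisfied-rule (list #'left-of-p #'above-p #'surrounds-p)
    ;;                       '(0.5 . 0.5) '(0.5 . 0.5))
    ;; succeeds on the third rule with no variables removed.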
Third, the use of "control predicates" allows heuristic switching between top-down and bottom-up search, or between depth-first and breadth-first search. This flexibility is vital to the use of backtracking for the composition process. If expand-new-component? always returns true, the search through the space of possible hypotheses will be depth-first, always building on the latest hypothesis made; conversely, when it returns false, control remains at the same level, adding variables to the same instance or to others created at the same time, until no more can be done with them — breadth-first search. Relinquish-control? can be used to guarantee that links are all formed either top-down or bottom-up. If it returns true whenever an instance's composition state indicates that the next variable binding is "down" to a lower-level schema class, search will be consistently bottom-up. If relinquish-control? returns false whenever lower-level variable links are next on the composition list, then the search will proceed top-down, trying to complete these links. These predicates enable schema classes to determine the flow of control on the basis of the state of the instance currently enjoying it; hence the more "intelligent" one makes these predicates, the sooner the process may find the solution.

The actual behavior of the composition process depends on how well the labelsets and control predicates are used, but the experience with Chinese characters gives some evidence that we can obtain computation with a reasonable branching factor, giving time and space usage proportionate to the size of the input plus the knowledge base, after careful and extensive use and debugging of the control knowledge representation techniques described. See the following chapter for a summary of this evidence.

Chapter 4
RESULTS: Success of Implementation and Application

Under heaven all can see beauty as beauty only because there is ugliness.
All can know good as good only because there is evil.
Therefore having and not having arise together.
Difficult and easy complement each other.
Long and short contrast each other;
High and low rest upon each other;
Front and back follow one another.
Therefore the sage goes about doing nothing, teaching no-talking.
The ten thousand things rise and fall without cease,
Creating, yet not possessing.
Working, yet not taking credit.
Work is done, then forgotten.
Therefore it lasts forever.

Lao Zi, from the Dao De Jing

4.1 Ease of Coding of Chinese Character Specifications

Once the extended schema theory had been implemented, and appropriate sources of knowledge about Chinese character composition found (Rankin, 1965; Leon, 1981), basic description of the Chinese characters was straightforward. All the group labels were coded, sufficient according to Rankin to build all but a few hundred of the 50,000 characters in the lexicon. About 100 important components and characters were explicitly coded. Variables were used to link representations of physical components, physical supercomponents, metric properties, and related objects in a uniform manner — a confounding which led to convenient constraint descriptions, but sometimes to too large a number of variables for efficient processing.

One difficulty encountered using this methodology was in choosing between composition rules and labelling constraints for representing knowledge. While in many cases the processes are naturally distinct, sometimes labels affect composition rules, and composition affects the legal labellings. For example, the value attached to a "length" variable for a stroke depends on the stroke's composition, the length being the sum of the lengths of the stroke's constituent bars. Also, the name of a stroke, which is a label for the stroke schema, depends on the number of bars making up the stroke, the angles they meet at, and on their labels; thus composition rules affect the labelling process. As an example of how labels influence the application of the composition rules, note the rule that the ends of the bars making up a stroke must be labelled "free" at the stroke's ends and "bent" within the stroke; that is, the composition rule refers to the labels, which are decided by the constraints.
To prevent the backtracking search for correct hypotheses from thrashing, the schema classes had to be designed with the recognition process in mind, just as Prolog programmers normally cannot ignore the procedural aspects of their logical specifications. For example, the "fill method" indicators allowed the knowledge base to control how schema instances were generated, preventing gross inefficiency; their intelligent application became essential to use of the system. Methods could be changed, but this would require reprogramming of the character specifications, because the procedural aspects of the specifications (e.g. the ordering of rules and variables) were necessarily tuned to the general flow of process implied by the fill method indicators. For completely top-down recognition, one would have to change most of the fill method indicators and composition rules. There were choices to be made at all levels; for example, in the final system, stroke instances create group instances, while the bars which compose strokes construct connections. An alternate method is to have the connections created top-down, by the group instances which might require them. The design of a good "fill strategy" was not trivial; control knowledge was vital to the program's ability to identify characters within the bounds of the computer's memory and reasonable processing time. This made the schema class definition a programming task — not the solely descriptive task one might wish for.

Because the principal act of the recognition algorithm is the linking of one instance to another, much of the knowledge about objects had to consist of constraints on these links. Thus, unfortunately, most of the description of the locus schema class is rules for construction of joints and bar-ends from locus instances — descriptions of the links between locus and other instances. One might wish that this type of knowledge were separate from the description of the loci themselves, but that was impossible given the structure of this system.

4.2 Success of Character Recognition

Despite the described difficulties in coding the schema system, it was successful in limited tests of Chinese character recognition. The system was tested on 21 characters; it correctly identified 12 of them. The following paragraphs will discuss the correct identifications, and the reasons we have perceived for the failures.

In a first phase, schemas were defined for the complete composition hierarchy presented in Chapter Three (see Figure 7, page 24). Classes, variables, simple composition rules and constraints were written and syntactically debugged, and their parameters were tuned to produce ground-level instances which appeared reasonable. Unfortunately, after this phase, the system as a whole did not work. The program produced thousands of misguided hypotheses; the backtracking thrashed too badly for reasonable recognition to occur. Although the schema classes were correct descriptions of their domain, they were procedurally inept.

In a second phase, refinement of the schemas was begun at the bottom of the hierarchy, the level of the locus and contour segments. After a significant amount of work, reasonable stroke instances could sometimes be produced from small and simple input images. Full characters still provided too large a search space for the system to handle, apparently because of its lack of control knowledge.
In a third phase, a representative set of simple, synthetic stroke descriptions was used as input, and refinement of the schema classes was done from the stroke level up. This work was continued until several characters could be recognized within reasonable limits on processing time and memory use. None of the test characters caused a failure of the system other than computation beyond the arbitrary space and time limits; the failures seem to have been the result of a lack of control over the search space — not a lack of programming means to achieve such control. Unfortunately, these are observations of what the system actually did, not theoretical or even exhaustively tested proof of the system's procedural adequacy. What we do have here is some evidence that the control knowledge representation techniques, and our application of them, have improved the recognition process over blind backtracking, and that the backtracking allows recognition of more complex characters than non-searching systems can manage.

Now let us examine the successful results which were obtained. The following pages contain pictorial representations of the synthetic characters used as test inputs — those which the program succeeded in recognizing — with printouts of the character instances returned and a few measures of the amount of computation done. The pictures were produced by a simple graphics program which takes stroke instances as input, and draws approximations to them using circular arcs for the contours, center lines, and endpoints. The labelsets of characters include the values of the variables holding the positions and areas of the characters. Note that the constraints have reduced each instance to a unique label, the identification of the character. The "input instances" include some top-down knowledge in the form of empty group and character instances to focus the recognition process on its goal. The "number of variable additions" is a count of the number of links formed between instances, including those variables which were later removed. The "number of backtracking fails" refers to the points where the process realizes it has made a mistake and undoes a former decision to try another, possibly removing a variable. Backtracking is the principal source of inefficiency in the system, and the careful coding of constraints and composition rules is the principal influence on the number of backtracking failures.

**********************************
INSTANCE character-1
LABELSET: (character position area)
((sheng-born (0.54 . 0.55) 0.47))
VARIABLES:
'position' = (((0.54 . 0.55)))
'area' = ((0.4736))
'comp' = component-1, labelled (sheng-born)
NUMBER OF INPUT INSTANCES: 16
NUMBER OF VARIABLE ADDITIONS: 99
NUMBER OF BACKTRACKING FAILS: 7
TOTAL PROCESSING TIME: 490 seconds
MAXIMUM MEMORY USE: 3032 pages

**********************************
INSTANCE character-2
LABELSET: (character position area)
((zi-child (0.56 . 0.54) 0.5))
VARIABLES:
'position' = (((0.565 . 0.54025)))
'area' = ((0.501795))
'comp' = component-1, labelled (zi-child)
NUMBER OF INPUT INSTANCES: 12
NUMBER OF VARIABLE ADDITIONS: 53
NUMBER OF BACKTRACKING FAILS: 0
TOTAL PROCESSING TIME: 348 seconds
MAXIMUM MEMORY USE: 2381 pages
**********************************
INSTANCE character-3
LABELSET: (character position area)
((tu-earth (0.5 . 0.5) 0.63))
VARIABLES:
'position' = (((0.5 . 0.5)))
'area' = ((0.64))
'comp' = component-1, labelled (tu-earth)
NUMBER OF INPUT INSTANCES: 10
NUMBER OF VARIABLE ADDITIONS: 53
NUMBER OF BACKTRACKING FAILS: 0
TOTAL PROCESSING TIME: 249 seconds
MAXIMUM MEMORY USE: 1707 pages

**********************************
INSTANCE character-4
LABELSET: (character position area)
((hao-good (0.4 . 0.5) 0.74))
VARIABLES:
'position' = (((0.43 . 0.51)))
'area' = ((0.74))
'comp' = component-1, labelled (hao-good)
NUMBER OF INPUT INSTANCES: 19
NUMBER OF VARIABLE ADDITIONS: 131
NUMBER OF BACKTRACKING FAILS: 67
TOTAL PROCESSING TIME: 1654 seconds
MAXIMUM MEMORY USE: 3827 pages

**********************************
INSTANCE character-5
LABELSET: (character position area)
((li-kilometer (0.5 . 0.52) 0.53))
VARIABLES:
'position' = (((0.51 . 0.52)))
'area' = ((0.5304))
'comp' = component-1, labelled (li-kilometre)
NUMBER OF INPUT INSTANCES: 23
NUMBER OF VARIABLE ADDITIONS: 209
NUMBER OF BACKTRACKING FAILS: 39
TOTAL PROCESSING TIME: 1390 seconds
MAXIMUM MEMORY USE: 2947 pages

**********************************
INSTANCE character-6
LABELSET: (character position area)
((wang-king (0.52 . 0.5) 0.48))
VARIABLES:
'position' = (((0.525 . 0.5)))
'area' = ((0.4818))
'comp' = component-1, labelled (wang-king)
NUMBER OF INPUT INSTANCES: 13
NUMBER OF VARIABLE ADDITIONS: 76
NUMBER OF BACKTRACKING FAILS: 0
TOTAL PROCESSING TIME: 355 seconds
MAXIMUM MEMORY USE: 2281 pages

**********************************
INSTANCE character-7
LABELSET: (character position area)
((zhong-middle (0.54 . 0.51) 0.46))
VARIABLES:
'position' = (((0.54025 . 0.515)))
'area' = ((0.463245))
'comp' = component-1, labelled (zhong-middle)
NUMBER OF INPUT INSTANCES: 14
NUMBER OF VARIABLE ADDITIONS: 118
NUMBER OF BACKTRACKING FAILS: 0
TOTAL PROCESSING TIME: 497 seconds
MAXIMUM MEMORY USE: 3016 pages

**********************************
INSTANCE character-8
LABELSET: (character position area)
((ren-people (0.53 . 0.54) 0.41))
VARIABLES:
'position' = (((0.53 . 0.545)))
'area' = ((0.4154))
'comp' = component-1, labelled (ren-people)
NUMBER OF INPUT INSTANCES: 7
NUMBER OF VARIABLE ADDITIONS: 30
NUMBER OF BACKTRACKING FAILS: 0
TOTAL PROCESSING TIME: 193 seconds
MAXIMUM MEMORY USE: 1443 pages

**********************************
INSTANCE character-9
LABELSET: (character position area)
((shi-ten (0.5 . 0.5) 0.63))
VARIABLES:
'position' = (((0.5 . 0.5)))
'area' = ((0.64))
'comp' = component-1, labelled (shi-ten)
NUMBER OF INPUT INSTANCES: 7
NUMBER OF VARIABLE ADDITIONS: 30
NUMBER OF BACKTRACKING FAILS: 0
TOTAL PROCESSING TIME: 168 seconds
MAXIMUM MEMORY USE: 1218 pages

**********************************
INSTANCE character-10
LABELSET: (character position area)
((qian-thousand (0.5 . 0.51) 0.48))
VARIABLES:
'position' = (((0.51 . 0.515)))
'area' = ((0.486))
'comp' = component-1, labelled (qian-thousand)
NUMBER OF INPUT INSTANCES: 10
NUMBER OF VARIABLE ADDITIONS: 53
NUMBER OF BACKTRACKING FAILS: 0
TOTAL PROCESSING TIME: 254 seconds
MAXIMUM MEMORY USE: 1756 pages

**********************************
INSTANCE character-11
LABELSET: (character position area)
((tai-too (0.62 . 0.55) 0.43))
VARIABLES:
'position' = (((0.625 . 0.55)))
'area' = ((0.438))
'comp' = component-4, labelled (tai-too)
NUMBER OF INPUT INSTANCES: 13
NUMBER OF VARIABLE ADDITIONS: 33
NUMBER OF BACKTRACKING FAILS: 11
TOTAL PROCESSING TIME: 720 seconds
MAXIMUM MEMORY USE: 6000 pages
**********************************
INSTANCE character-12
LABELSET: (character position area)
((yin-yin (0.45 . 0.5) 0.46))
VARIABLES:
'position' = (((0.45025 . 0.50025)))
'area' = ((0.46268025))
'comp' = component-8, labelled (yin-yin)
NUMBER OF INPUT INSTANCES: 23
NUMBER OF VARIABLE ADDITIONS: 95
NUMBER OF BACKTRACKING FAILS: 50
TOTAL PROCESSING TIME: 1895 seconds
MAXIMUM MEMORY USE: 3359 pages

As mentioned above, the classes and their rules were refined until some success was achieved within reasonable limits on time and space. The change in the recognition time as the knowledge base matured was dramatic, but much work remains before the system might efficiently handle the whole lexicon, especially from raw pixel input data. For example, a careful reordering of variable specifications in a schema class often eliminated the creation of thousands of erroneous instances and variable links during one recognition attempt, cutting recognition time by hours. However, the next character attempted might cycle for hours without success, revealing yet another inefficiency. Just as the search space is huge, the variety of applicable control knowledge seems to be huge as well. The correctly identified characters used computer time and space in roughly linear proportion to the size of their input, that is, the number of input instances (see Figure 22). Generally, the maintenance of the complex network of instances, with all its labels and their constraints, is expensive in time and space, but seems from the admittedly small test space to be at least of good computational order when enough intelligence is coded to control the search for good variable bindings.

Input      Character   CPU Time Used   Maximum Memory Usage
Instances  Number      (seconds)       (512 Kbyte pages)
 7          9            168            1218
 7          8            193            1443
10          3            249            1707
10         10            254            1756
12          2            348            2381
13          6            355            2281
13         11            720            6000
14          7            497            3016
16          1            490            3032
19          4           1654            3827
23          5           1390            2947
23         12           1895            3359

Correlation for CPU Time vs. Input Variables: 0.74
Correlation for Memory vs. Input Variables: 0.87 (worst case, Character #11, excluded)

Figure 22. Linearity of Time and Space versus Input Variables

The complexity of Chinese characters, when described as completely as the schema theory demands, makes the coding of the specifications a large task, and more specialized hardware for the algorithms a necessity if the system is to compete with commercial ones. The system's power does emerge in the completeness of the descriptions it achieves for the examples chosen: the character instances, aside from their final unique label of identification, embody the character's complete and natural decomposition into components, groups, strokes, and so on down to image features. Should the average stroke length or maximum curvature be required, they are obtainable from the character's schema instance and its components. The recognition is an understanding of the structural nature of the characters rather than a simple identification of an element of a set of visual patterns.

Chapter 5
CONCLUSION

Heaven and earth are ruthless;
They see the ten thousand things as dummies.
The wise are ruthless;
They see the people as dummies.
The space between heaven and earth is like a bellows.
The shape changes but not the form;
The more it moves, the more it yields.
More words count less.
Hold fast to the center.

Lao Zi, from the Dao De Jing

5.1 Summary

A recognition machine for some hand-printed Chinese characters was written using schema labelling techniques. It correctly identified 12 out of the 21 characters attempted, using synthetic stroke-level instances as input.
In the other 9 cases, execution was aborted because it exceeded reasonable memory or processing-time limits. No characters were actually misrecognized. A subsystem for producing lower-level schema instances directly from grey-scale pixel data was designed, implemented, and tested, though sufficient control knowledge to enable searching through the full set of ambiguities was coded only for the higher levels of the composition hierarchy.

The definition of schemas was extended to include more complex labelsets, and different types of labelling constraints to deal with potentially infinite variable domains. These extensions to schema theory make it more powerful for describing and recognizing complex objects, specifically hand-printed Chinese characters. The mechanism of variable links allows the intuitive flavour of representation by semantic networks, such as the composition hierarchy, while Havens' schema labelling techniques maintain the necessary efficiency of the consistency process.

Control of the search through the large hypothesis space, for the composition process, was extended by the application of "control predicates", the "fill methods" of variables, and the "composition predicates". Without such guidance, the backtracking recognition process exploded in time and space. With them, it succeeded for some synthetic stroke-level input, in time and space manageable by a minicomputer.

Recognition machines for Chinese characters with natural knowledge of their composition, allowing for natural variations in their appearance, may recognize more complex characters than conventional pattern recognition techniques can. Efficient systems which plough very well through thousands of typed glyphs usually fail when applied to handwritten text. Humans identify Chinese characters by recognition of their composition: a character is its components, which are their subcomponents, which are their strokes, which are their endpoints and sides and connections. Character recognizers which represent these levels explicitly are more able to search through the possible compositions, as suggested by noisy data, to find consistent interpretations.

5.2 Suggested Future Work

Thirty spokes share the wheel's hub;
It is the center hole that makes it useful.
Shape clay into a vessel;
It is the space within that makes it useful.
Cut doors and windows for a room;
It is the holes which make it useful.
Therefore profit comes from what is there;
Usefulness from what is not there.

Lao Zi, from the Dao De Jing

Schema theory, as discussed here, lacks a method of learning new schema classes: all class definitions must be provided directly. An algorithm generating schema classes from example instances would be a valuable extension. Problems would arise in "learning" the predicates and functions which fill the composition rules. Use of these arbitrary LISP functions should be changed to something more easily learned; for example, they might be composed from a library of precompiled functions.

Many subprograms of the system were not refined for efficiency: one could add statistical matching to rank hypotheses, tighten some of the LISP code to use less memory and do less garbage collection, do the low-level image processing and segmentation in specialized hardware or an array processor, and port the recognition code to a fast LISP machine.
Given these improvements and further coding of characters, the program might compete commercially with those that do not deal with ambiguity, by actually searching the space of possible hypotheses, and hence recognizing more complex, ambiguous characters.

While some nonsensical input can be tolerated by the algorithm, by building satisfactory recognition on the good instances, a lack of necessary input instances causes the process to fail. More knowledge of when appropriate instances might be created, upon suggestion by context, would be a valuable extension to the system.

Finally, there are complexities of handwriting which are not even attempted by this system, such as continuous script, extreme rotation or stretching of character components, and so on (see Figure 23).

Figure 23. Erratic Script Style of Chinese Writing (from Wang, 1970)

Better stop short than fill to the brim.
Oversharpen the blade, and the edge will soon blunt.
Amass a store of gold and jade, and no one can protect it.
Claim wealth and titles, and disaster will follow.
Retire when the work is done.
This is the way of heaven.

Lao Zi, from the Dao De Jing

Bibliography

Bartlett, F.C., 1932, Remembering, (Cambridge University Press, Cambridge, 1932)

Beijing Foreign Language Institute, 1978, The Chinese-English Dictionary (Han-Ying Cidian), (Commercial Press, Hong Kong, 1979)

Beijing Language Institute, 1975, "Table of Chinese Character Frequency", (BeiJing JinHua YinShangChang, Beijing, 1975)

Brachman, Ronald J., AT&T Bell Labs, 1985, "'I Lied about the Trees' Or, Defaults and Definitions in Knowledge Representation", The AI Magazine, volume 6 number 3, (AAAI, Menlo Park CA, Fall 1985) pp 80-93

Brady, J.M. and Wielinga, B.J., University of Essex, 1978, "Reading the Writing on the Wall", Computer Vision Systems, (Academic Press, 1978) pp 283-297

Brooks, Rodney A. et al., Stanford, 1979, "The ACRONYM Model-Based Vision System", IJCAI-79

Brooks, Rodney A., Stanford AI Laboratory, 1981, "Symbolic Reasoning Among 3-D Models and 2-D Images", in Artificial Intelligence 17, special volume on computer vision, (North-Holland Publishing Company, Amsterdam, 1981) pp 285-348

Brown, M.K. and Ganapathy, S., Bell Labs, 1983, "Preprocessing techniques for cursive script word recognition", Pattern Recognition, volume 16 number 5, (Pergamon Press, Oxford, 1983) pp 447-458

Casey, R. and Nagy, G., IBM Watson Research Center, 1966, "Recognition of Printed Chinese Characters", Transactions on Electronic Computers, volume EC-15 number 1, (IEEE, 1966) pp 91-101

Chen Heqin, Society for the Advancement of Chinese Education, 1939, Modern Chinese Vocabulary, (Commercial Press, 1939)

Chui, T., University of British Columbia, 1976, "Real-Time Computer Recognition of Chinese Characters", Electrical Engineering Master's thesis, (UBC, 1976)

Cohen, Paul R. and Feigenbaum, Edward A., editors, Stanford, 1982, The Handbook of Artificial Intelligence, volume 3, (Heuristic Press, Stanford, 1982)

Davis, Larry and Henderson, Thomas, 1979, "Hierarchical Constraint Processes for Shape Analysis", University of Texas Technical Report 115 (1979)

Davis, Larry and Rosenfeld, Azriel, 1977, "Hierarchical Relaxation for Waveform Parsing", University of Maryland Technical Report 568 (1977)
University of Southern California, 1981, "The Design and an Example Use of Hearsay-III", IJCAI-81

Freuder, Eugene C., 1976, "A Computer System for Visual Recognition using Active Knowledge", MIT AI Lab Technical Report 345 (1976)

Freuder, Eugene C., University of New Hampshire, 1978, "Synthesizing Constraint Expressions", Communications of the ACM, volume 21 number 11, (ACM, November 1978) pp 958-966

Glicksman, Jay, University of British Columbia, 1982, "A Cooperative Scheme for Image Understanding using Multiple Sources of Information", Computer Science PhD thesis (UBC, 1982)

Goldberg, A. and Robson, D., 1983, Smalltalk-80: The Language and its Implementation, (Addison-Wesley, Reading, Massachusetts, 1983)

Gregg, James R. and Heath, Gordon, 1964, The Eye and Sight, (D.C. Heath, Boston, 1964)

Hanson, A.R., Hampshire College, 1978, "Visions: A Computer System for Interpreting Scenes", Computer Vision Systems, (Academic Press, 1978) pp 303-334

Havens, William S., UBC, 1978, "A Procedural Model of Recognition for Machine Perception", Computer Science PhD thesis (UBC, 1978)

Havens, William S., Carter, Alan, and Kingdon, Stewart, UBC, 1982, "Standard Image Files: Generic Operations for Image Datatypes", UBC TR-82-10 (UBC, Vancouver, 1982)

Havens, William S., University of British Columbia, 1983, "Recognition Mechanisms for Schema-Based Knowledge Representations", Computers and Mathematics with Applications, volume 9 number 1, (Pergamon Press, Great Britain, 1983) pp 185-199

Havens, William S., University of British Columbia, 1985, "A Theory of Schema Labelling", Computational Intelligence, volume 1 number 3, (National Research Council, Canada, 1985) pp 127-139

Havens, William S., University of British Columbia, 1986, "PIPS: A Portable Image Processing System", IEEE Computer Society Workshop on Visual Languages, Dallas, Texas (June, 1986)

Hayes, Kenneth Jr., University of Maryland, 1980, "Reading Handwritten Words Using Hierarchical Relaxation", Computer Graphics and Image Processing 14, (Academic Press, New York, 1980) pp 344-364

Head, Henry, Oxford University, 1920, Studies in Neurology, (Oxford, 1920)

Hildreth, Ellen C., MIT, 1980, "Implementation of a Theory of Edge Detection", MIT-AI-TR-579 (MIT, 1980)

IBM Survey Report, 1962, "Survey of the need for language translation", Planning Research Corporation, RC-634, March 12 (IBM, 1962), quoted in (Casey and Nagy, 1966)

Jaynes, Julian, 1976, The Origin of Consciousness in the Breakdown of the Bicameral Mind, (Houghton Mifflin Company, Boston, 1976) see pp 63-66

King, G. and Chang, H.W., 1963, "Machine Translation of Chinese", Scientific American, June 1963, volume 208 number 6, (New York, 1963)

Lao Zi, about 600 BC, Dao De Jing ("The Way-of-Morality Classic"), Feng Gia-fu and Jane English, translators, (Random House, New York, 1972)

Leon, N.H., 1981, Character Indexes of Modern Chinese, (Curzon, London, 1981)

Mackworth, Alan K., UBC, 1975, "Consistency in Networks of Relations", TR75-3 (UBC, Vancouver, July 1975)

Mackworth, Alan K., UBC, 1977, "On Reading Sketch Maps", IJCAI-77, (CMU, Pittsburgh, 1977) pp 598-606

Mackworth, Alan K., UBC, 1978, "Vision Research Strategy: Black Magic, Metaphors, Mechanisms, Miniworlds and Maps", Computer Vision Systems, edited by Hanson and Riseman, (Academic Press, New York, 1978) pp 53-59

Mackworth, Alan K. and Havens, William S., UBC, 1983, "Representing Knowledge of the Visual World", Computer, volume 16 number 10, (IEEE, October 1983) pp 90-96

Mackworth, Alan K.
and Mokhtarian, Farzin, UBC, 1984, "Scale-Based Descriptions of Planar Curves", UBC CPSC TR-84-1 (UBC, Vancouver, 1984)

Mackworth, Alan K. and Freuder, Eugene C., 1985, "The Complexity of Some Polynomial Network Consistency Algorithms for Constraint Satisfaction Problems", Artificial Intelligence, volume 25 number 1, (North-Holland Publishers, Amsterdam, January 1985) pp 65-74

Marr, David, and Hildreth, Ellen, MIT, 1979, "Theory of Edge Detection", MIT AI Lab Memo 518 (April, 1979)

Marr, David, 1982, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, (W.H. Freeman, San Francisco, 1982)

Mermelstein and Eden, 1964, "Experiments on Computer Recognition of Connected Handwritten Words"

Miller, J., 1981, "Global Precedence in Attention and Decision", Journal of Experimental Psychology: Human Perception and Performance, volume 7, (1981) pp 1161-1174

Minsky, Marvin, MIT, 1975, "A Framework for Representing Knowledge", The Psychology of Computer Vision, edited by Patrick Winston, (McGraw-Hill, New York, 1975)

Montanari, U., 1974, "Networks of constraints: Fundamental properties and applications to picture processing", Information Sciences 7, number 2, (April, 1974) pp 95-132

Mori, Shunji, Yamamoto, Kazuhiko, and Yasuda, Michio, 1984, "Research on Machine Recognition of Handprinted Chinese", Pattern Analysis and Machine Intelligence, volume 6 number 4, (IEEE, New York, July 1984) pp 386-405

Mulder, Jan, UBC, 1985, "Representing Ambiguity and Hypotheticality in Visual Knowledge", PhD thesis (UBC, Vancouver, 1985)

Mylopoulos, John et al., University of Toronto, 1983, "Building Knowledge-Based Systems: The PSN Experience", Computer, volume 16 number 10, (IEEE, October 1983) pp 83-89

Nakato, Y. et al., Hitachi Labs, 1973, "Improvement of Chinese Character Recognition using Projection Profiles", Pattern Recognition I, proceedings, (IEEE, New York, 1973) pp 172-178

Nudel, Bernard, Rutgers University, 1983, "Consistent-Labelling Problems and their Algorithms: Expected Complexities and Theory-Based Heuristics", Artificial Intelligence, volume 21 (North-Holland, Amsterdam, 1983) pp 135-178

Ogawa, H. and Taniguchi, K., Fukui University, 1982, "Thinning and Stroke Segmentation for Handwritten Chinese Character Recognition", Pattern Recognition, volume 15 number 4, (Pergamon Press, Great Britain, 1982) pp 299-308

Rankin, Bunyan Kirk III, 1965, "A Linguistic Study of the Formation of Chinese Characters", PhD linguistics thesis, (University of Pennsylvania, 1965)

Reddy, D. Raj et al., 1973, "The Hearsay Speech Understanding System: an example of the recognition process", IJCAI-3, (IJCAI, 1973) pp 185-193

Riggs, L., 1965, "Visual Acuity", Vision and Visual Perception, C. Graham, editor, (Wiley, New York, 1965)

Roberts, L., MIT, 1963, "Machine Perception of Three-Dimensional Solids", Lincoln Laboratory TR 315 (MIT, May 1963)

Rosenfeld, Azriel, Hummel, R.A., and Zucker, S.W., 1976, "Scene Labelling by Relaxation Operators", IEEE Transactions on Systems, Man, and Cybernetics, SMC-6 (IEEE, New York, 1976) pp 420-433

Rubin, Steven M., Bell Labs, 1981, "A Text Recognition System with Generalized Context", Canadian Information Processing, volume 57 (1981)

Sakai, Kunio et al., Toshiba R + D Center, "An Optical Chinese Character Reader"

Schank, Roger C.
and Abelson, R.P., 1977, Scripts, Plans, Goals, and Understanding, (Lawrence Erlbaum Associates, Hillsdale NJ, 1977)

Seidel, R., Cornell University, 1984, "On the Complexity to Achieve K-Consistency", Cornell Computer Science Department, (unpublished technical report)

Stallings, William, 1972, "Recognition of Printed Chinese Characters by Automatic Pattern Analysis", Computer Graphics and Image Processing, (Academic Press, 1972) pp 47-65

Stallings, William, Arlington Center for Naval Analyses, 1976, "Approaches to Chinese Character Recognition", Pattern Recognition, volume 8, (Pergamon Press, Great Britain, 1976) pp 87-98

Suen, C.Y., Berthod, M., and Mori, S., 1980, "Automatic Recognition of Handprinted Characters - the state of the art", IEEE Proceedings, volume 68 number 4, (IEEE, New York, 1980) p 469

Trotter, Robert J., 1981, "Computers with Character", Science News, volume 120, July 1981, pp 26-28

Tsotsos, John K., University of Toronto, 1984, "Representational Axes and Temporal Cooperative Processes", RCBV-TR-84-2, Computer Science Department, (U of T, Toronto, April 1984)

Ullman, J., 1982, "Advances in Character Recognition", Applications of Pattern Recognition, K.S. Fu, editor, (CRC Press, 1982)

Waltz, David L., MIT, 1972, "Generating Semantic Descriptions of Scenes with Shadows", MIT-AI-TR-271 (MIT, Cambridge, 1972)

Wang Fangyu, Yale University, 1970, Introduction to Chinese Cursive Script, (Yale, 1970)

Winograd, Terry, 1975, "Frame Representations and the Procedural-Declarative Controversy", in Representation and Understanding, edited by D.G. Bobrow and A. Collins, (Academic Press, New York, 1975) pp 185-210

Wohn Kwangyoen, University of Wisconsin, 1980, "On Recognizing Korean Characters", (Computer Science Department, University of Wisconsin, unpublished draft)

Yamamoto, E. et al., Fujitsu, 1981, "Handwritten Kanji Character Recognition using the Features Extracted from Multiple Standpoints", Pattern Recognition and Image Processing, (IEEE, 1981) pp 131-136

Yuen, C.K., University of Hong Kong, 1983, "Dual-stroke method for Chinese character input", Centre of Computer Studies and Applications TR-A6-83, (Hong Kong, 1983)

Zucker, Steven, McGill University, 1978, "Vertical and Horizontal Processes in Low Level Vision", in Computer Vision Systems, Hanson and Riseman, editors, (Academic Press, 1978) pp 187-195

APPENDIX A
Knowledge Representation Code

This system was implemented as a set of LISP functions which allow definition and creation of schemas, and which execute the recognition algorithm presented in Chapter Three. This appendix describes the data structures which implement the schema classes and instances.

A schema class is a LISP atom with ten attributes saved on its property list: name, labels, instances, variable types, external variables, composition rules, constraint prototypes, and the three control predicates expand-component?, relinquish-control?, and satisfied?. The name is a simple LISP string; the labels are a list of LISP strings; the instances are a list of instances as described later.
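As a minimal sketch only (this is not the thesis's MKCLASS, and the property-list key names are illustrative assumptions), such a class might be laid out on a property list as follows:

(defun make-schema-class-sketch (name labels variable-types external-variables
                                 composition-rules constraint-prototypes
                                 expand-component? relinquish-control? satisfied?)
  ;; Store the ten attributes of a schema class on the property
  ;; list of a LISP atom named after the class.
  (let ((class (intern (string-upcase name))))
    (setf (get class 'name) name)                               ; a LISP string
    (setf (get class 'labels) labels)                           ; a list of strings
    (setf (get class 'instances) nil)                           ; grows during recognition
    (setf (get class 'variable-types) variable-types)
    (setf (get class 'external-variables) external-variables)
    (setf (get class 'composition-rules) composition-rules)
    (setf (get class 'constraint-prototypes) constraint-prototypes)
    (setf (get class 'expand-component?) expand-component?)     ; the three
    (setf (get class 'relinquish-control?) relinquish-control?) ; control
    (setf (get class 'satisfied?) satisfied?)                   ; predicates
    class))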
Each element of the list of variable prototypes is an atom with five things on its property list: a name for the variable class; the schema class it may contain; a "fill method" indicator specifying how the variable may be instantiated (as the result of a composition function, by creation of a schema instance to fill it, by scanning existing schema instances for one that works, which is the default, by the low-level input-creation routines, or as the inverse of another variable link, which may be ignored); a list of the constraint prototypes covering the variable; and another list of just those constraint prototypes of type "tuple", defined below.

The external variables of a schema class are a list of those variables of instances of the schema class which must be accessible to other instances, for satisfaction of composition and constraint predicates and for calculation of computed variables.

Each element of the list of composition rules comprises a list of variable prototypes forming a possibly valid composition, together with a list of predicates defining the validity of the composition. Each predicate is represented by a list of the variable prototypes providing the schema instances whose labelsets are the arguments for the predicate, and the name of a LISP function embodying the predicate itself. As many composition constraints depend on the identity of instances and on their order of linking, three special predicates ('eq' for equality, 'neq' for inequality, and 'prec' for precedence of linking) operate, for efficiency, on the instances themselves rather than on their labelsets.

Each element of the list of constraint prototypes of a schema class is of the type "predicate" or "function", as "tuple" constraints are compiled directly into constraint nodes linked to the variable prototypes they cover. Each non-tuple constraint prototype is an atom with four things on its property list: the type of the constraint (predicate or function), a list of the variables it covers, a list of the "excluded" variables which would make the constraint inapplicable, and the name of a LISP function embodying the constraint.
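As a minimal sketch (again not the thesis code), the three special composition predicates might be tested directly on instances as shown below; the linking-order argument, a list of instances in the order they were linked, is an assumption for illustration:

(defun check-special-predicate (op inst1 inst2 linking-order)
  ;; 'eq, 'neq, and 'prec compare schema instances themselves,
  ;; bypassing their labelsets.
  (case op
    (eq   (eq inst1 inst2))               ; the same instance fills both roles
    (neq  (not (eq inst1 inst2)))         ; two distinct instances
    (prec (and (member inst2              ; inst1 was linked before inst2
                       (cdr (member inst1 linking-order)))
               t))
    (t (error "unknown special predicate: ~a" op))))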
Figure 24 is an example of the definition of a schema class, drawn from the Chinese character recognition application.

Figure 24. Printout of the Stroke Schema Class

(mkclass 'stroke
  '(h v l d r o s u 6 7 3 4 5 h3 hL hV j l1 lH lR r0 v1 v7 vH
    hD1 hL1 hLC hR0 hV1 hV7 hVH vH0 hVH0 lHL1 vHL1)
  '((bar1 bar) (bar2 bar) (bar3 bar) (bar4 bar)
    (length number compute) (space number-list compute)
    (conns connection ignore) (group group create))
  '(length space)
  '(((bar1 bar2 bar3 bar4 length space group)
     ((free (end1 bar1))
      (eq (end2 bar1) (end1 bar2))
      (eq (end2 bar2) (end1 bar3))
      (eq (end2 bar3) (end1 bar4))
      (free (end2 bar4))
      (prec bar1 bar2)
      (prec bar2 bar3)
      (prec bar3 bar4)
      (neq bar1 bar2)
      (neq bar2 bar3)
      (neq bar3 bar4)
      (neq bar1 bar3)
      (neq bar1 bar4)
      (neq bar2 bar4))))
  '((exclusive-predicate (stroke bar1) (bar2) eq)
    (exclusive-tuples (bar3) (stroke bar1 bar2)
      ((vH v h) (lR l r) (lH l h) (hL h l) (hV h v) (h3 h 3)
       (r0 r 0) (v7 v 7) (j v 1) (v1 v 1) (l1 l 1)))
    (exclusive-tuples (bar4) (stroke bar1 bar2 bar3)
      ((hD1 h d 1) (hL1 h l 1) (hLC h l d) (hR0 h r 0)
       (hV1 h v 1) (hV7 h v 7) (hVH h v h) (vH0 v h 0)))
    (tuples (stroke bar1 bar2 bar3 bar4)
      ((hVH0 h v h 0) (lHL1 l h l 1) (vHL1 v h l 1)))
    (function length sum ((length bar1) (length bar2) (length bar3) (length bar4)))
    (exclusive-function length sum ((length bar1) (length bar2) (length bar3)) (bar4))
    (exclusive-function length sum ((length bar1) (length bar2)) (bar3))
    (exclusive-function length quote ((length bar1)) (bar2))
    (function space merge-space ((space bar1) (space bar2) (space bar3) (space bar4)))
    (exclusive-function space merge-space ((space bar1) (space bar2) (space bar3)) (bar4))
    (exclusive-function space merge-space ((space bar1) (space bar2)) (bar3))
    (exclusive-function space quote ((space bar1)) (bar2)))
  'never2
  'always1
  'always1)

A schema instance is a LISP atom with four things on its property list: its class, variables, composition state, and labelset. The class is a structure as defined above. Each of the variables is an atom with four things on its own property list: the variable's prototype; the instance contained by the variable; that instance's tuples; and the constraints governing the variable. Some of the constraints are compiled ones, kept in the schema's class; some, in the form of predicates and functions, are created after the variable is added to the instance.

The composition state of a schema instance is a LISP cell with two pointers: one to a list of variable classes, indicating the next starting point for a search to find a new variable to add to the instance, and one to a list of instances of the schema class required by the first variable class in the former list, indicating the next starting point for a search for an instance to fill that variable. If the instance list is empty, the variable is to be filled by computation or by instance creation.

The labelset of an instance is a list of tuples, each tuple being a list of the first elements of tuples from the external variables, consistent with all the currently applicable constraints.
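As a minimal sketch (not the thesis code), the two-pointer composition state described above might be represented and stepped through as follows; the property-list keys and helper names are illustrative assumptions consistent with the earlier sketch:

(defun instances-of (class)
  ;; Existing instances of a schema class, assumed stored under an
  ;; 'instances property as in the earlier sketch.
  (get class 'instances))

(defun contained-class (variable-class)
  ;; The schema class a variable of this class may contain, assumed
  ;; stored on the variable prototype's property list.
  (get variable-class 'contained-class))

(defun make-composition-state (variable-classes)
  ;; CAR: variable classes still to be tried;
  ;; CDR: candidate instances for the first of those classes.
  (cons variable-classes nil))

(defun advance-variable (state)
  ;; Move on to the next variable class and point at the existing
  ;; instances of the schema class it requires. An empty candidate
  ;; list means the variable is to be filled by computation or by
  ;; creating a fresh instance.
  (rplaca state (cdr (car state)))
  (rplacd state (and (car state)
                     (instances-of (contained-class (car (car state))))))
  state)

(defun next-candidate (state)
  ;; Pop the next candidate instance for the current variable.
  (let ((candidate (car (cdr state))))
    (rplacd state (cdr (cdr state)))
    candidate))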
Figure 25 is a sample printout of an instance produced during recognition of a Chinese character.

Figure 25. Printout of a Sample Instance

********** INSTANCE contour-segment0246
LABELSET: (contour-segment length curvature orientation)
((5 0.06743652641427311 3.2242462 266.8132280669122)
 (4 0.06743652641427311 3.2242462 266.8132280669122)
 (3 0.06743652641427311 3.2242462 266.8132280669122))
NEXT VARIABLE TO TRY FILLING: variable 'cendpoint' holding locus0228
VARIABLES:
'length' = ((0.06743652641427311))
'curvature' = ((3.2242462))
'cendpoint' = locus0232, labelled (max)
'ccendpoint' = locus0228, labelled (max)
'orientation' = ((266.8132280669122))
'barside' = bar-side0258, labelled (3 4 5)
CONSTRAINT LATTICE:
predicate0249
  variables: length contour-segment
  tuples: ((0.06743652641427311 5) (0.06743652641427311 4)
           (0.06743652641427311 3) (0.06743652641427311 2)
           (0.06743652641427311 1) (0.06743652641427311 0)
           (0.06743652641427311 7) (0.06743652641427311 6))
predicate0251
  variables: curvature contour-segment
  tuples: ((3.2242462 5) (3.2242462 4) (3.2242462 3) (3.2242462 2)
           (3.2242462 1) (3.2242462 0) (3.2242462 7) (3.2242462 6)
           (3.2242462 r) (3.2242462 v) (3.2242462 l) (3.2242462 ih)
           (3.2242462 io) (3.2242462 iv) (3.2242462 u) (3.2242462 h))
predicate0257
  variables: orientation contour-segment
  tuples: ((266.8132280669122 5) (266.8132280669122 4)
           (266.8132280669122 3) (266.8132280669122 o)
           (266.8132280669122 r) (266.8132280669122 v)
           (266.8132280669122 l))

Notice that some of the variables are filled with normal schema instances, and some with numerical values computed from other variables. The 'cendpoint' and 'ccendpoint' variables hold the two ends of the contour segment; the 'barside' variable holds the bar-side which starts with this contour segment. Notice also that only three labels, of the 18 possible for such segments, have survived the three constraints applied. The first constraint is a predicate disallowing overly short strokes and overly long "dots". The second distinguishes strokes bent to the left from those bent to the right, a distinction significant in Chinese characters, much as the distinction between a hand-printed 'q' and a hand-printed 'g' is in English. The third constraint filters labels according to the segment's orientation on the page, distinguishing, for example, horizontal from vertical.

The recognition machine is one LISP function and its many subfunctions. RECOGNIZE takes as input a list of instances, and returns an expanded list of instances when it can no longer create any variables consistently. The user of the machine must define the schema classes representing the system's knowledge; the input instances provide the links into this network of knowledge. For this purpose, the functions MKCLASS and MKINSTANCE take simple descriptions of classes and instances as input and produce the representations above, including the compilation of tuple constraints and the various cross-links between data structures. These latter functions required some 1000 lines of commented LISP code; the composition and constraining algorithms required about another 1000 lines.

APPENDIX B
Printouts of Sample Instances

Here is a printout of the top-level instances produced by the recognition of a character called TAI. Each instance appears in this printout as a title line (e.g. ********** <name-of-instance>) followed by either two or three sections presenting the internal structure of the instance at the time of the printout, which in this case is after recognition is complete.
The first section is called LABELSET: it first lists the variables over which the labelset is constructed, and then enumerates the full set of tuples of values for those variables. As described in Chapter Three, the labelset is over a special variable holding the instance itself, and a list of "external variables" specified in the schema class. For example, in the CHARACTER-1 instance below, the external variables are "position", holding only one value, the cartesian coordinates (0.62 . 0.55), and "area", holding the number 0.43. Where there is only one value for each external variable, as in COMPONENT-1 below, these values are printed for the first label-tuple only.

The second section of each instance printout is the full set of variables currently held by the instance, including the external variables. Each variable's name is listed to the left of a representation of its value: a list of the labels for the instance held by the variable. For example, in the COMPONENT-1 instance below, one can see that the "group" variable holds the instance GROUP-1, which currently has the symbols '4' and '5' as labels.

Finally, where there are labelling constraints applicable to the instance, as for COMPONENT-2 below, the constraints are listed. Each constraint is presented as a list of the variables it covers, and a list of the tuples of values for these variables which were allowed by the constraint, according to its prototype in the instance's schema class.

********** INSTANCE character-1
LABELSET: (character position area)
((tai-too (0.62 . 0.55) 0.43))
VARIABLES:
'position' = (((0.625 . 0.55)))
'area' = ((0.438))
'comp' = component-4, labelled (tai-too)

********** INSTANCE component-1
LABELSET: (component position area)
((4 (0.5 . 0.3) 0.0) (5))
VARIABLES:
'group' = group-1, labelled (4 5)
'position' = (((0.5 . 0.305)))
'area' = ((0.002))
'supercomp' = component-2, labelled (tai-too)

********** INSTANCE component-2
LABELSET: (component position area)
((tai-too (0.62 . 0.55) 0.43))
VARIABLES:
'subcomp' = component-1, labelled (4 5)
'ocomp' = component-3, labelled (big-rad)
'position' = (((0.625 . 0.55)))
'area' = ((0.438))
'supercomp' = component-4, labelled (tai-too)
CONSTRAINTS:
variables: ocomp subcomp component
tuples: ((enclosure-rad jade-rad guo-country)
         (cheng-become h-box han-phon)
         (hVld two-rad moon-rad)
         (big-rad 5 tai-too))

********** INSTANCE component-3
LABELSET: (component position area)
((big-rad (0.62 . 0.55) 0.43))
VARIABLES:
'group' = group-2, labelled (hlr)
'position' = (((0.625 . 0.55)))
'area' = ((0.438))
'supercomp' = component-2, labelled (tai-too)

********** INSTANCE component-4
LABELSET: (component position area)
((tai-too (0.62 . 0.55) 0.43))
VARIABLES:
'subcomp' = component-2, labelled (tai-too)
'position' = (((0.625 . 0.55)))
'area' = ((0.438))
'char' = character-1, labelled (tai-too)

********** INSTANCE group-1
LABELSET: (group position area)
((4 (0.5 . 0.3) 0.0) (5))
VARIABLES:
'position' = (((0.5 . 0.305)))
'area' = ((0.002))
'space' = (((0.33 0.48 0.28 0.52)))
's1' = stroke-1, labelled (6 4 5)
'comp' = component-1, labelled (4 5)

********** INSTANCE group-2
LABELSET: (group position area)
((hlr (0.62 . 0.55) 0.43))
VARIABLES:
'position' = (((0.625 . 0.55)))
'area' = ((0.438))
'space' = (((0.85 0.26 0.25 0.99)))
's1' = stroke-2, labelled (h u)
's2' = stroke-3, labelled (v l d)
's3' = stroke-4, labelled (r s)
'c12' = connection-1, labelled (|1/3-1/3| |1/3-1/2| |1/2-1/3| |1/2-1/2|)
'c23' = connection-3, labelled (|1/3-0| |1/2-0| |2/3-0|)
'comp' = component-3, labelled (big-rad)
CONSTRAINTS:
variables: group s1 s2 s3 c12 c23
tuples: ((hlr h l r |1/2-1/3| |1/2-0|)
         (hr0u h r0 u |2/3-1/3| |2/3-1/2|)
         (hvh-ch h v h |1/2-1/2| |1-1/2|)
         (hLjh-a hL j h |1-0| |1/3-1/2|)
         (lvh l v h |1/2-0| |1/3-1/2|)
         (lHlH5 lH lH 5 |1-1/3| |1-1/2|))

********** INSTANCE connection-1
LABELSET: (connection)
((|1/3-1/3|) (|1/3-1/2|) (|1/2-1/3|) (|1/2-1/2|))
VARIABLES:
'bar1' = bar-2, labelled (h u)
'frac1' = ((0.3802262666010821))
'frac2' = ((0.3851451057550418))
'intersection' = ((((0.53756517461879 . 0.6266158386620757) 0.3802262666010821 0.3851451057550418)))
'bar2' = bar-3, labelled (v l d)
'stroke1' = stroke-2, labelled (h u)
'stroke2' = stroke-3, labelled (v l d)
'position' = (((0.53756517461879 . 0.6266158386620757)))
'group' = group-2, labelled (hlr)

********** INSTANCE connection-2
LABELSET: (connection)
((|1/3-1/3|) (|1/3-1/2|) (|1/2-1/3|) (|1/2-1/2|))
VARIABLES:
'bar2' = bar-2, labelled (h u)
'frac1' = ((0.3851451057550418))
'frac2' = ((0.3802262666010821))
'intersection' = ((((0.53756517461879 . 0.6266158386620757) 0.3851451057550418 0.3802262666010821)))
'bar1' = bar-3, labelled (v l d)
'stroke1' = stroke-3, labelled (v l d)
'stroke2' = stroke-2, labelled (h u)
'position' = (((0.53756517461879 . 0.6266158386620757)))

********** INSTANCE connection-3
LABELSET: (connection)
((|1/3-0|) (|1/2-0|) (|2/3-0|))
VARIABLES:
'bar1' = bar-3, labelled (v l d)
'frac1' = ((0.4238143289606458))
'frac2' = ((-0.3118062563067608))
'intersection' = ((((0.528284561049445 . 0.6041876892028254) 0.4238143289606458 -0.3118062563067608)))
'bar2' = bar-4, labelled (r s)
'stroke1' = stroke-3, labelled (v l d)
'stroke2' = stroke-4, labelled (r s)
'position' = (((0.528284561049445 . 0.6041876892028254)))
'group' = group-2, labelled (hlr)

********** INSTANCE connection-4
LABELSET: (connection)
((|0-1/3|) (|0-1/2|) (|0-2/3|))
VARIABLES:
'bar2' = bar-3, labelled (v l d)
'frac1' = ((-0.3118062563067608))
'frac2' = ((0.4238143289606458))
'intersection' = ((((0.528284561049445 . 0.6041876892028254) -0.3118062563067608 0.4238143289606458)))
'bar1' = bar-4, labelled (r s)
'stroke1' = stroke-4, labelled (r s)
'stroke2' = stroke-3, labelled (v l d)
'position' = (((0.528284561049445 . 0.6041876892028254)))
