UBC Theses and Dissertations

An expert system for the recognition of general symbols

Ahmed, Maher M.

Abstract

This thesis addresses the problem of automatically recognizing any handwritten symbol. The number of different handwriting styles illustrates the difficulties that an automatic recognizer must cope with. For example, some handwritten styles of the capital letter A are: [picture]

The two main technical approaches to pattern recognition are statistical pattern recognition and structural pattern recognition. Statistical pattern recognition systems use pixel-level features for recognition, such as moments, histograms, Fourier transforms, and the percentage of ink pixels in different zones. Although statistical techniques, including Artificial Neural Networks (ANNs), achieve high recognition rates, they have some disadvantages: they require a very large amount of training data, and they cannot justify their answers. In addition, the output is only a classification, not a description of the actual pattern.

In contrast, structural pattern recognition techniques extract commonly used descriptions of the patterns (structural features), such as loops, end points, and arcs. After extracting these features, the system finds the common relationships among them.

In this research, the structural approach was used to develop an expert system that extracts structural features (descriptions) from the symbol at each stage of recognition. The developed system automatically recognizes handwritten symbols, assuming that the symbols are in their isolated forms. The system is unique in that it is not limited to a specific application; it can be used to recognize any general symbol of any language. To obtain a representation of a symbol, the system performs four basic steps.
First, the system adjusts the symbol by rotating it around its central point until its principal axis either aligns with the vertical axis or makes an angle with it that is a multiple of 20°. Second, the system scales the symbol to a predefined size. Third, the system thins the symbol using a novel rule-based thinning system developed in this research; the resulting thinned image consists of the central lines of the symbol. Finally, the system extracts and describes the thinned symbol in terms of strokes. These strokes are then approximated by a set of line segments.

The resulting representation of the symbol is compared with the stored models of the different symbols in the system's knowledge base. Many models are stored for each symbol. The results depend on a threshold: using a low threshold decreases the space for each symbol, which increases both the rejection rate and the recognition rate.

The system was tested with 5726 handwritten English characters, all drawn from the test data (binary data) of the Center of Excellence for Document Analysis and Recognition (CEDAR) database. When the system had learned an average of 97 models per symbol and used a low threshold, the recognition rate was 95% and the rejection rate was 16.1%. With a threshold of 100, the recognition rate was 87.6% and the rejection rate was 0%. The recognition rate can be increased by storing more models for each symbol or by increasing the rejection rate. The system can learn new symbols simply by adding models for them to its knowledge base. The system is implemented in C++ and runs on a 120 MHz Pentium PC.
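
The interplay between stored models, the threshold, and the two reported rates can be illustrated with a minimal sketch. It is not the thesis's implementation, which is written in C++ and compares stroke descriptions. Purely for illustration, it assumes that each symbol is reduced to a short numeric feature vector, that models are compared by Euclidean distance, and that a symbol is rejected when its closest stored model is farther away than the threshold. The names knowledge_base, classify, learn, and rates, the threshold scale, and all of the data are hypothetical. The recognition rate is computed over accepted symbols only, an assumption that appears consistent with the reported figures, since 95% and 16.1% could not both be fractions of the same 5726 test characters.

```python
import math

# Hypothetical knowledge base: each symbol label maps to several stored models.
# A "model" here is just a 2-D feature vector (an assumption for illustration);
# the thesis describes symbols as strokes approximated by line segments.
knowledge_base = {
    "A": [(0.0, 1.0), (0.1, 0.9)],
    "B": [(1.0, 0.0), (0.9, 0.2)],
}


def classify(features, kb, threshold):
    """Return the label of the closest stored model, or None (reject)
    when even the closest model is farther away than `threshold`."""
    best_label, best_dist = None, math.inf
    for label, models in kb.items():
        for model in models:
            dist = math.dist(features, model)  # Euclidean distance (assumed)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None


def learn(kb, label, model):
    """Teach the system a symbol by adding one more model for it."""
    kb.setdefault(label, []).append(model)


def rates(samples, kb, threshold):
    """Return (recognition_rate, rejection_rate).

    The recognition rate is measured over accepted (non-rejected) samples;
    the rejection rate is measured over all samples.
    """
    rejected = correct = 0
    for features, truth in samples:
        label = classify(features, kb, threshold)
        if label is None:
            rejected += 1
        elif label == truth:
            correct += 1
    accepted = len(samples) - rejected
    recognition = correct / accepted if accepted else 0.0
    return recognition, rejected / len(samples)


# Made-up test samples: (feature vector, true label).
samples = [
    ((0.05, 0.95), "A"),  # close to an "A" model
    ((0.95, 0.10), "B"),  # close to a "B" model
    ((0.60, 0.45), "A"),  # ambiguous: its nearest stored model is a "B"
    ((0.20, 0.80), "A"),  # close to an "A" model
]

print(rates(samples, knowledge_base, threshold=0.2))   # (1.0, 0.25)
print(rates(samples, knowledge_base, threshold=10.0))  # (0.75, 0.0)
```

With the low threshold, the ambiguous sample is rejected, so the rejection rate rises and every accepted answer is correct. With a threshold large enough to accept everything, nothing is rejected but the misclassified sample lowers the recognition rate. Calling learn(knowledge_base, "A", (0.6, 0.45)) would add a model that covers the ambiguous sample, which mirrors the abstract's remark that storing more models per symbol can raise the recognition rate.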

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
