UBC Theses and Dissertations
Information and distance measures with application to feature evaluation and to heuristic sequential classification
Vilmansen, Toomas Rein
Two aspects of the problem of selecting measurements for statistical pattern recognition are investigated. First, the evaluation of features for multiclass recognition problems by means of measures of probabilistic dependence is examined. Second, the evaluation and selection of features for a general tree-type classifier is investigated.

Measures of probabilistic dependence are derived from pairwise distance measures such as the Bhattacharyya distance, divergence, Matusita's distance, and discrimination information. The properties of the dependence measures are developed in the context of feature-class dependency. Inequalities relating the measures are derived, as are upper and lower bounds on error probability for the different measures, and the bounds are compared. Feature-ordering experiments are performed to compare the measures with error probability and with each other.

A fairly general tree-type sequential classifier is examined. An algorithm for constructing the decision tree is derived; it uses distance measures to cluster probability distributions, and dependence and distance measures to order features. The concept of confidence in a decision, in conjunction with backtracking, is introduced so that decisions at any node of the tree are tentative and reversible. The idea of reintroducing classes at any stage is also discussed. Experiments are performed to determine the storage and processing requirements of the classifier, the effects of various parameters on performance, and the usefulness of the procedures for backtracking and reintroducing classes.
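As a rough illustration, not taken from the thesis, the four pairwise distance measures named in the abstract can be sketched for two discrete class-conditional distributions using their standard textbook definitions. The function names, the example distributions `p` and `q`, and the equal priors are assumptions made for this sketch; the multiclass dependence measures and the bounds derived in the thesis are not reproduced here.

```python
import math


def bhattacharyya_coefficient(p, q):
    """rho = sum_i sqrt(p_i * q_i); equals 1 when p == q, 0 when supports are disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(p, q):
    """B = -ln(rho). Assumes the supports overlap, so rho > 0."""
    return -math.log(bhattacharyya_coefficient(p, q))


def matusita_distance(p, q):
    """M = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2), which equals sqrt(2 - 2*rho)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def discrimination_information(p, q):
    """I(p, q) = sum_i p_i * ln(p_i / q_i) (Kullback-Leibler form).

    Assumes q_i > 0 wherever p_i > 0; terms with p_i == 0 contribute nothing.
    """
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def divergence(p, q):
    """J = I(p, q) + I(q, p), the symmetric divergence."""
    return discrimination_information(p, q) + discrimination_information(q, p)


def bayes_error(p, q, prior_p=0.5):
    """Two-class Bayes error for discrete distributions with the given prior on p."""
    prior_q = 1.0 - prior_p
    return sum(min(prior_p * a, prior_q * b) for a, b in zip(p, q))


# Hypothetical class-conditional distributions of one three-valued feature.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

print("Bhattacharyya distance:", bhattacharyya_distance(p, q))
print("Matusita distance:", matusita_distance(p, q))
print("Discrimination information I(p, q):", discrimination_information(p, q))
print("Divergence:", divergence(p, q))
print("Bayes error (equal priors):", bayes_error(p, q))
```

For two classes, the well-known Bhattacharyya bound states that the Bayes error is at most the square root of the product of the priors times the Bhattacharyya coefficient, so a feature with a larger distance has a smaller upper bound on error. Ranking candidate features by such a measure is one simple way a feature-ordering experiment could be set up, under the assumptions above.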