Perceptual Organization and Symmetry in Visual Object Recognition by Susan E. Wilson B.Sc, Queen's University, 1989 A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in The Faculty of Graduate Studies Department of Computer Science We accept this thesis as conforming to the required standard University of British Columbia March 1991 ©Susan E . Wilson 1991 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Com p ic-h? C So. rcr & The University of British Columbia Vancouver, Canada D ^ e HnAr.tr, / , / 9 9 f DE-6 (2/88) Abstract A system has been implemented which is able to detect symmetrical groupings in edge images. The initial stages of the algorithm consist of edge detection, curve smoothing, and the extension of the perceptual grouping phase of the SCERPO [Low87] vision system to enable detection of instances of endpoint proximity and curvilinearity among curved segments. The sym-metry detection stage begins by first locating points along object boundaries which are significant in terms of curvature. These key points are then tested against each other in order to detect locally symmetric pairs. An iterative grouping procedure is then applied which matches these pairs together using a more global definition of symmetry. The end result of this process is a set of pairs of key points along the boundary of an object which are bilaterally symmetric, along with the axis of symmetry for the object or sub-object. This paper describes the implementation of this system and presents several examples of the results obtained using real images. The output of the system is intended for use as indexing features in a model-based object recognition system, such as SCERPO, which requires as input a set of spatial correspon-dences between image features and model features. ii Contents Abstract ii Contents iii List of Figures v 1 Introduction 1 2 Previous Work 7 2.1 Model Based Vision 7 2.2 Perceptual Organization 10 2.2.1 Computational Vision 12 2.2.2 Human Vision 16 iii 2.3 Symmetry 20 2.3.1 Psychological Aspects of Symmetry . 20 2.3.2 Computational Vision Systems 27 3 The Detection Algorithm 33 3.1 Overview 33 3.2 Curve Smoothing 36 3.3 Endpoint Proximities and Colinearities 37 3.4 Location of Key Points 42 3.5 Detection of Local Symmetries 47 3.6 The Grouping Process 50 4 Results 55 5 Conclusions and Future Work 64 5.1 Goals of the Thesis 64 5.2 Implementation Problems 65 5.3 Possible Extensions 67 iv List of Figures 1.1 A Symmetrical Dish Soap Bottle 5 2.1 Types of End Connections 17 2.2 Symmetry Detection Hierarchy 23 2.3 Average Number of Correct Responses 26 2.4 Symmetric Axis Transform 28 2.5 The SAT of a Rectangle 29 2.6 A Local Symmetry 30 3.1 A Symmetrical Object 34 3.2 Endpoint Proximity 38 3.3 Curve Significance 40 3.4 Colinear Segments 41 v Chapter 1 Introduction Humans are able to recognize effortlessly and instantaneously the objects which they see around them. Many researchers believe that they perform this task by somehow matching the images formed on the retina to internal memories of the corresponding three dimensional objects. Within the field of computational vision, this is referred to as the model-based vision paradigm, in which visual object recognition consists of "finding a correspondence be-tween elements of an image and a prior representation of objects in the world" [Low84]. Digital images presented to the computer replace the retinal im-ages of humans, while internal, three dimensional models of the objects to be recognized replace the human memories. So how are correspondences gener-ated? Bottom up processing of the image yields intrinsic information such as 1 depth, orientation and edge locations, but is unable to generate a complete three dimensional reconstruction as information is lost during perspective projection from three dimensions to two. On the other hand, top down pro-cessing of the object models can only provide a two dimensional description when an object's orientation and position in space are known. The problem is to find some intermediate level of processing which is able to bridge this gap-Many previous efforts concentrated on extracting three dimensional de-scriptions from the two dimensional images. Stereo images, either taken from slightly different positions (as with the human eyes), or using different light-ing conditions (known as photometric stereo), and shape from shading are some examples of the techniques employed. These, along with other visual depth cues, clearly have an important place in visual processing, however the importance of their role in model-based recognition is unclear. Humans are able to recognize familiar objects from simple line drawings which lack not only stereo information but also colour, texture and shading data. In addi-tion, they can identify objects which are partially occluded and which have fragmented boundaries. This suggests that edge images provide a significant 2 amount of the information used during recognition, and that a successful computer system should employ techniques which require only this limited amount of input. An alternative to the bottom up methods suggested above involves com-bining information from the image and from the object models. Mathe-matical methods exist which make it possible, given a sufficient number of appropriate spatial correspondences between image features and model fea-tures, to determine the projection parameters used in the generation of the image. Once these parameters are known, it is possible to verify the match between image and model by projecting the object back onto the image and testing the correlation between the real and projected edges. The problem to be solved then, is how are features in an image matched to features in an ob-ject model in order to provide these spatial correspondences? Humans seem to group edges together effortlessly in a way which makes this task possible - this process is known as perceptual organization. Some examples of the type of features used are connectivity, parallelism, texture and symmetry. The principles of perceptual organization have been successfully applied to computer vision by Lowe [Low84, Low87], upon whose research this thesis 3 is based. In this previous work, however, processing has been confined to straight line segments or approximations, and has made use of only a few of the available properties, specifically proximity, parallelism and colinearity. It seems logical that extension of this type of work to include curved segments and to make use of other types of groupings would provide a more robust matching and indexing component. Symmetry is an attractive candidate because of the abundance of sym-metrical objects which we find around us and the correspondingly important role which it appears to play in human vision. People are able to rapidly determine if an object, or even a pattern of dots or line segments, is symmet-rical, and also where the axis of symmetry lies. For example, Figure 1.1 can easily be identified as the outline of a symmetrical dish soap bottle, with its axis of symmetry tilted slightly with respect to the vertical. Detection of this axis is valuable since it not only provides a natural basis for describing the shape of the object, and is therefore useful as an indexing feature, but also because it constrains the possible three dimensional orientation of the origi-nal object, leading to more rapid identification of the projection parameters and verification of object recognition. Symmetry is also a property which 4 remains relatively stable over changes in viewpoint, an important character-istic as less stable measures such as distances and angles make poor indexing features. Finally symmetry can be detected in simple, possibly fragmented The goal of this thesis was therefore to develop an algorithm to detect certain types of symmetries in digital images. This algorithm was imple-mented and included the detection of proximity and curvilinearity groupings among curved segments. The symmetrical groupings which are the result of the program are intended for use as indexing keys in a model-based visual object recognition system. The algorithm takes advantage of the underlying edge images. Figure 1.1: A Symmetrical Dish Soap Bottle 5 structure of the edges in an image to identify symmetries, rather than Using a brute force, point to point approach. The program worked effectively when applied to real images. Chapter 2 describes previous work which contributed to this thesis: Sec-tion 2.1 describes some existing model-based vision systems in order to fur-ther define the role of the perceptual organization component, Section 2.2 describes existing research in perceptual organization in both the computer science and psychology communities, and Section 2.3 examines the specific problem of symmetry detection. This final subsection is divided into two parts: Section 2.3.1 describes some of the psychological aspects of symme-try in human vision, while Section 2.3.2 describes existing algorithms for detecting symmetry in digital images. Chapter 3 details the implementa-tion of a five stage algorithm which detects symmetrical groupings in digital images. Chapter 4 presents the results of this work, including some exam-ples of the program running on real images. The final chapter describes the advantages and disadvantages of the algorithm, along with a discussion of possible future work. 6 Chapter 2 Previous Work 2.1 Model Based Vision The model-based vision paradigm of performing object recognition can be traced to early work by Roberts [Rob65]. His vision system attempted to identify simple three dimensional objects (formed from blocks, wedges, and pyramids) based on the idea that "shape perception is and must be invariant under perspective transformation." Working from an edge image, he at-tempted to generate sets of matches between image points and model points, which could then be used to determine not only the orientation of the object, but also the values of internal model parameters. These matches were gener-ated by first locating polygons in the image. Points in the image which were surrounded by such polygons were then located and described by the order-7 ing of the polygons and the number of sides on each. Matches could then be hypothesized between these points and model corners whose adjacent sides could have produced the correct polygon list. Once seven model-point to image-point correspondences were identified, a computation was performed which produced the required transformation parameters and also a mean-square error measure. Matches having an acceptable error value could then be verified by generating the perspective transformation of the object model using the computed parameters and comparing it to the original line drawing. This framework for recognition was very general and also very robust, but unfortunately was not expanded on immediately as attention was focused in other less quantitative directions. A second influential model-based vision system was ACRONYM [Bro81]. It used a model of the world in which all objects were composed of a hierarchy of generalized cones or cylinders. These components were "...represented by a curve through space, called the spine, along which a two dimensional shape, called the cross section, is swept. The cross section is kept at a constant an-gle to the tangent of the spine, and is deformed according to a deformation function called the sweeping rule." In this way hierarchies of objects could 8 be generated in which each level provided stronger constraints on the val-ues of the CROSS-SECTION, SPINE and SWEEPING RULE parameters. The image features which ACRONYM tried to locate were ribbons, which constrained the values of the SPINE and SWEEPING RULE parameters and ellipses, which constrained the values of the CROSS SECTION parame-ters. The matching component consisted of a constraint manipulation system (CMS) which checked the consistency of potential matches between image ribbons and ellipses and object cylinders or cones. While this framework was impressive due to its generality, the CMS was not entirely successful due to the non-linearities of the problem and the approximations that were required. A more recent system, SCERPO, described in [Low87], performs object recognition by using methods similar to those of Brooks' and Roberts', but in a more quantitative way. SCERPO begins by finding spatial correspondences between features in a two dimensional image and a three dimensional model. This task is facilitated by the use of the following powerful constraint: The viewpoint consistency constraint: The locations of all projected model features in an image must be consistent with projection from a single viewpoint. 9 As with Robert's system, edge images are used to generate a set of initial matches between image points and model points. Newton's method is then used to solve for the unknown viewpoint parameters, and verification can then be performed, again as by Roberts, by projecting the model back onto the image. The results achieved from using Newton's method were quite impressive and convergence after a small number of iterations was achieved in any case where the initial orientation parameters were within 60 degrees of the correct values. This method can also be extended to solve for inter-nal model parameters such as variable dimensions, articulations, or surface deformations [Low89a]. Given this robust matching algorithm which suc-ceeds even in the prescence of occlusion and fragmented edge data, we are left with a search problem whose goal is the identification of image to model correspondences. This problem is addressed in SCERPO by the perceptual organization component, which is described in the next section. 2.2 Perceptual Organization Perceptual organization or grouping is the term given to the human ability to detect useful groups or structures in an image, and can be traced back to 10 the Gestalt psychologists. In his paper, Laws of Organization in Perceptual Forms [Wer38], Wertheimer notes that humans, when presented with a num-ber of discrete elements, have a tendency to group them together, and that these "arrangements and divisions" seem to follow definite principles. Factors are demonstrated which tend to cause elements to be grouped together, such as proximity, similarity and uniformity of density. As well as these factors taken individually he describes the reinforcing and competing effects which can occur when more that one factor is present at one time, as well as the effects of gradual alteration of a pattern over a series of images. Moving from individual elements to lines and curves we find that the situation has become more complicated. Proximity of individual points is no longer as important as "direction" or continuity, and shape closure also becomes a contributing, though not necessarily overriding factor. Wetheimer also believes that past experience can also affect the grouping process. Though these ideas are based on examples involving simple patterns of lines and dots, many of them share a common property: they remain relatively stable over changes in viewpoint. It is this factor which makes them viable candidates for use in computational vision. 11 2.2.1 Computational Vision Although the principles of perceptual organization have been recognized for many years, and although both of the model-based vision systems described in the previous section require the detection of correspondences between the image and the internal model (Robert's junctions and Brook's cylinders), Lowe's work is the first to explicitly make use of these principles in this stage of object recognition. He identifies two criteria which grouping operations must satisfy in order to be useful [Low87]: The viewpoint in variance condition: Perceptual features must remain stable over a wide range of viewpoints of some corresponding three dimensional structure. The detection condition: Perceptual features must be sufficiently con-strained so that accidental instances are unlikely to arise. In addition, he notes that usefulness in indexing (into a database of ob-ject models) is also an important characteristic of image groupings. The SCERPO system groups together straight line segments based on colinear-ity, parallelism and connectivity. For each grouping, the probability that this 12 relation could have occurred accidentally is calculated and used to rank its significance. It is important to note that a grouping operation such as this one does not have to provide only correct groupings, nor do groups have to be non-overlapping, as the verification stage will correct any mistakes which are made by the grouping process. Though only a subset of the suggested operations were implemented in S C E R P O , and only for straight line segments, they were very successful at reducing the search space. Clearly colinear segments are likely to have come from the same line, and therefore provide a larger group for the matching process. More importantly, segments whose endpoints are close together in the image are likely to come from object edges whose endpoints are close together, so that their point of intersection forms a good index, and similarly parallel lines provide good indexing features. Extension of this work to in-clude curved edges and other groupings such as symmetry would be valuable in a full system working with multiple object models. Following Lowe's work, Jacobs [Jac88] developed a recognition system which makes use of some of the principles of perceptual organization. G R O -P E R used a grouping algorithm to try and determine which lines in an image 13 seem likely to have come from the same object. He uses three criteria for ranking predicted groups: 1. Likelihood of coming from the same object. 2. Extent to which they will narrow down the search if correct. 3. Usefulness in indexing. In addition, he adds the condition that the grouping process be fast enough to reduce the time to identify objects. The G R O P E R system uses edge segments to form an ordered collection of groups based on the first criteria listed above. Initially each line segment forms a group, and at each iteration, combinations of groups are ranked by the Bayesian probability that they come from the same object given the distance between the groups and their relative orientations. Once this is com-pleted, it is the job of the indexing component to determine which objects, if any, could have produced a given group of image edges. This is done by first calculating a set of five parameter values for each pair of edges in the group. These parameters are the angle between the two edges and the maximum and minimum distances from each edge to their point of intersection. These 14 values are calculated for each pair of model edges in a preprocessing stage and stored in a look-up table. Indexing is then performed by matching image edge pairs to model edge pairs whose parameters values are consistent, with variations to insure a globally consistent match for groups of more than two edges. The system performs well in its intended domain of two dimensional polygonal objects, but less well in more realistic 3D scenes. The author suggests that this may be due in part to the poor quality of edges generated by such images, however it seems more likely that the indexing scheme used is at fault. The heavy reliance placed on angles and distances between edges is reasonable in a two dimensional world, where perspective effects can be ignored. In a three dimensional world however the value of these parame-ters may be drastically altered by projection onto an image. Though the grouping operations are effective in gathering up edges which seem to come from the same object, which is valuable for reducing the number of possible models, the criteria used (distance and orientation) are not values which are invariant over a wide range of viewpoints, making them poor features for use in indexing. Alternate properties such as the number and relative position-15 ing of edge intersections (i.e. corners) would perhaps be more useful as they remain more stable over changes in viewpoint. 2.2.2 Human Vision Study of the human visual system can provide important clues for percep-tual organization. Walters [Wal87] describes the results of psychophysical experiments she performed, using perceived contrast as a measure of per-ceptual importance. The results of these experiments were then used in the development of simple algorithms for enhancement of perceptually significant lines in an image and segmentation of an image into perceptually significant regions. The experimental results showed a correlation between two factors and perceived contrast: the length of lines and the relationships between their endpoints. An hierarchy of end connections as shown in Figure 2.1 was then proposed, with type A being the most perceptually significant, and type D the least. Each type of connection can be classified in terms of the number of line end points and the number of interjacent points it contains. Connections are then identified by a locally connected network which identifies these proper-ties. A generic enhancement algorithm is described which consists of simply 16 A B C D Figure 2.1: Types of End Connections enhancing lines based on the following rules: 1. Lines terminating at the type A connections are enhanced by amount a 2. Lines terminating at the type B connections are enhanced by amount b 3. Lines terminating at the type C connections are enhanced by amount c 4. a > 2b 5. b > 2c Similarly, an algorithm which segments a line drawing consists of iterating the following steps over each type of connection in turn: 17 1. Assign a unique label to each connection and its participating lines. 2. Analyze the lines to divide the connections into sets of connected com-ponents. 3. Assign the lowest member's label to each member of each set. It is interesting to note that the enhancement algorithm only adjusts lines which terminate in a significant connection, which is similar to the proxim-ity and colinearity calculations of [Low87]. The segmentation algorithm is based on the observation that the different types of connections tend to arise from specific features in the image. For example, A connections are usually found on the boundary of objects, so that this algorithm will join up and emphasize lines which form this boundary. The basis of these algorithms on psychophysical data, the fact that they can be easily realized by locally con-nected networks, and the results obtained when they were applied to sample drawings supports the theory that end connections can provide a great deal of information for perceptual grouping. The algorithms themselves, however, seem unlikely to generalize beyond the domain of simple line drawings. A theory of human recognition based in part on the principles of percep-18 tual organization has been developed by Biederman [Bie85]. Recognition by components, or RBC, attempts to account for the rapid initial categoriza-tion of objects by the human visual system. The basic premise is that people recognize objects by subdividing them into primitive volumes. This segmen-tation does not depend on the familiarity of the objects, but rather on the location of points of deep concavity. The volumetric components used are differentiated on the basis of viewpoint invariant properties, and the theory provides for a small number of components, corresponding to the small num-ber of phonemes required to distinguish between spoken words. Recognition then takes place in the following stages: 1. Edge Extraction 2. (a) Detection of Non-accidental Properties (b) Parsing at Regions of Concavity 3. Determination of Components 4. Matching of Components to Object Representations 5. Object Identification 19 The paper describes some experiments which were performed in order to test this theory. They consisted of showing subjects line drawings of sim-ple objects and measuring recognition times. The drawings were altered in several ways including removing some of the primitive components and edge degradation both in midsegment and along component boundaries. Though the experimental results obtained provide strong support for this theory, the fact that the stimuli used were hand drawn and represented ideal "children's drawings" of very common objects tends to downgrade their credibility. Sub-jectively however the idea that some intermediate representation between lines and objects (such as primitive shapes and volumes) is attractive. 2.3 Symmetry 2.3.1 Psychological Aspects of Symmetry Examination of the role of symmetry in perception can also be traced back to the Gestalt psychologists, where it was part of a property termed "goodness of form". Since then psychologists have studied both the detection of symmetry and also the effects of symmetry on the performance of other visual tasks. Royer [Roy81] studied the detection problem in depth. He performed a series 20 of experiments in which the detection time for different types of symmetries was measured. The goal of his work was to develop a processing model to account for the data generated both by other researchers and in his own experiments. Several proposed models are described by the author. Palmer and Hemen-way [PH78] suggested that parallel analysis of different axis of symmetry is initially carried out, and the results are then combined. This would account for the fact that multiple symmetry was detected faster than single symme-try. Chipman [Chi77] believed that "pattern elements and their relations are specified in pattern encoding", and that symmetry would therefore sim-plify this encoding by reducing the number of pattern elements. Garner [Gar62] proposed that symmetry effects could be explained by equivalence set size. The first experiment performed by Royer was to test this hypothesis. Set size was defined under rotational and reflectional equivalence, so asym-metrical patterns (class "A") would have a set size of eight, patterns which were vertically (V), horizontally (H), 180 degree centrically(C), or diagonally (D-f ,D-) symmetric would have a set size of four, patterns which contained two types of symmetry (for example HV) would have a set size of two, and 21 finally fully symmetric patterns (M) would have a set size of one. In this experiment subjects were shown a number of test patterns and asked to determine whether the pattern was symmetrical in any way or not. The results indicated that the response times varied linearly with the equiv-alence set size, supporting the redundancy theory of Chipman, but that the detection times varied significantly within equivalence sets, refuting the the-ory of Garner. The second experiment was performed in order to study the effects of learning and of display types. It was similar to the first experiment, except that the subjects tested were familiar with the display patterns, and the dis-play elements were varied between dots, blocks, diagonal lines and vertical or horizontal lines. The results showed that familiarity had no effect on detec-tion times, but that the the type of display elements used did. Blocks were preferred, followed in order by rectilinear elements, diagonolinear elements, and finally dots. The results of these two experiments generated the following ordering on detection times for the types of symmetry tested, with type M (fully symmetric) having the shorted detection time: M , H V , V , D D , H , C C , D - , D + . 22 Another significant result was obtained from the error measures. In the experiments, asymmetrical patterns were derived by removing fifteen percent of the elements from the corresponding symmetrical pattern. The error rate for asymmetrical patterns varied inversely with the amount of symmetry present in its conjugate pattern. To account for these results the author proposes a hierarchical representation of symmetry within which an ordered search for each type occurs (See Figure 2.2). M H A C D+ Figure 2.2: Symmetry Detection Hierarchy A third experiment was performed to verify this hypothesis. Again the same type of experiment was performed, but the subjects were asked to detect a certain type of symmetry. The results corresponded to predictions 23 made by the theory. The reduction in detection times was least for vertical symmetry and greatest for centric. There were differences in the time taken to detect a given symmetry in a pattern containing all types of symmetry; these were small for vertical and horizontal but larger for diagonal and centric. In addition, the interference effects of multiple symmetries were again apparent in the error measures. These results seem to provide strong support for some kind of serial search during the detection of symmetry. It would be interesting to discover if this hierarchy corresponds to the rate of occurrence of symmetry in the world, where certainly vertical symmetry is most common, and if it is a learned or instinctive structure. Whether or not this theory is entirely accurate, it appears that humans have a strong and complex capacity for detection of symmetry that should be incorporated into computer vision. There would be no reason for humans to have such complicated mech-anisms for detecting symmetry if they had no use for it. Correspondingly, the effects of symmetry on memory and learning have been popular topics with researchers for many years. In early experiments by Attneave [Att54] he discovered that symmetrical patterns are better remembered than asym-24 metrical patterns, but suggested that this might be due to the information content rather than the symmetry. In 1971 Deregowski [Der71] performed an experiment to examine the effect of symmetry on memory, and to test the information theory of Attneave. The subjects of the experiment were children divided into three age groups (7-8,8-9,9-10), and five types of patterns were used: vertically symmetric, ver-tically repetitive, horizontally symmetric, horizontally repetitive, and asym-metric. The colour of the two halves of the stimuli was varied so that some were uniform and some were different; in addition some patterns included a distinct dividing line between their two halves. The subjects were shown each pattern for a period of two seconds and then asked to reproduce them. The two factors which significantly affected the accuracy of the responses were the type of pattern and the uniformity of its colour (the mean frequencies of correct responses obtained are shown in Figure 2.3). The age of the subjects also correlated with the number of correct responses (the younger children had a lower number of correct responses), but only weakly. These results indicate that vertical symmetry is in itself a contributing factor in pattern memory, separate from the amount of information contained 25 Vertical Stimuli Horizontal Stimuli Random Colour Symmetrical Repeated Symmetrical Repeated stimuli One Colour 0.53 0.19 0.31 0.28 0.03 Two Colours 0.39 0.11 0.25 0.17 0.03 Figure 2.3: Average Number of Correct Responses in the stimuli. This corresponds to the preference for vertical symmetry dis-covered by other researchers [Roy81, FF87]. Although the effects of horizon-tal symmetry were not as pronounced (perhaps because of the young age of the subjects), it did show a definite advantage over the repetitive pattern. These results may be less distinctive due to the young age of the subjects. Another interesting facet of the symmetry component of human vision was discovered during an experiment into shape constancy [KMTB76]. This is the term given to the phenomenon that "the true shape of an object tends to be perceived independently of the orientation of the object to the viewer's line of sight." While performing experiments to determine the cor-relation between angle of rotation and recognition time, the authors detected a bias towards responding with the symmetrical response (the stimuli used 26 were square/rectangle and circle/ellipse pairs). A second experiment was performed in which the number of symmetrical and asymmetrical stimuli presented was no longer equal (1/3 - 2/3), and the subjects were informed what the correct ratio was. There was no effect of this probability manip-ulation on the results, indicating that this is a perceptual bias. This bias suggests that it is possible that symmetry is used as a depth or orientation cue, similar to the convergence of parallel lines in the distance. The experiments described in this section indicate that symmetry plays a very important role in human vision. We are able to detect and differentiate between multiple symmetries with a strong bias towards vertical symmetry, we can use the symmetricality of a pattern to aid in memory storage, and we can use it as a clue to the orientation of objects in our world. The ability to detect and utilize symmetry would clearly be useful in computational vision. 2.3.2 Computational Vision Systems Though symmetry has not been used previously as an indexing feature, it has been used in work on shape description, and reviewing some of this research provides valuable background information for the detection algorithm. In 27 their 1978 paper [BN78], Blum and Nagel describe a method of shape seg-mentation and description based on the symmetric axis transform, or SAT. The SAT is generated by determining a set of maximal disks which fit inside an object. The SAT then consists of two parts: the symmetric axis, formed from the centers of the disks, and the radius at each point. The axis, and the object with it, is segmented at branch and end points, as shown in Figure 2.4. Each segment is then classified based on four object properties: width, axis, left boundary, and right boundary. The primitive shapes generated include worms, wedges, cups, flares, pinches, and branches. branch Figure 2.4: Symmetric Axis Transform There are threes main drawbacks to this scheme: the first, and most important, is that it requires the original shape to be a "bounded, connected, closed set of points" - something that is rarely available in computer vision 28 where occlusion and shading effects are present. The second is the way in which some objects are unnaturally segmented. For example, the SAT of a rectangle divides it into five smaller rectangles (see Figure 2.5. Finally, the computation is strongly affected by noise in the outline, which can cause significant variations in the computed description. A second method of shape description, known as smoothed local symmetries, or SLS [BA84] is based on the SAT. It was developed by Brady and Asada to overcome these drawbacks. They argue that the SAT is a purely region based descriptor, and that a better method is to combine the contour and region based approaches. Figure 2.5: The SAT of a Rectangle A local symmetry is defined as the midpoint between two points A and B on the bounding contour of a shape (see Figure 2.6). UA and ns are the 29 normals to the contour, and the angles a and b between the normals and the line connecting A and B must be equal. The axes of the SLS representation consist of "loci of local symmetries that are maximal with respect to forming a smooth curve." For any given shape, the global well as other minor axes are represented, thus solving the unnatural segmentation problems of As with the SAT representation, symbolic descriptors are then associated with the smoothed local symmetries. The descriptors are based on the values of four parameters: the sign of the curvature at the two boundary points, A and B, the rate of change of the distance between A and B, and the the SAT. nA Figure 2.6: A Local Symmetry 30 rate of change of the angle between the normals and the line connecting A and B. These parameters are simple to compute from the local symmetries and provide a simpler and more robust method of discriminating among primitives. A simple but inefficient algorithm for generating SLS would be to test each point on the contour against each other point for local symmetry. How-ever, a more efficient method is used involving approximating the contour of the object. A set of knots representing significant changes in curvature of the contour are first located, using the method described in [AB86]. These knots are then used to construct piecewise smooth approximations to the con-tour consisting of straight lines and circles. Calculation of local symmetries can then take place between segments of the contour rather than individual points, thus significantly speeding up the search. Though the smoothed local symmetries representation improves on the SAT by providing a more natural segmentation and providing local support to allow for gaps in the contour, it still requires that the input be a connected segment representing a significant percentage of the object's boundary. The usefulness of such a representation in images containing multiple objects, 31 where perspective effects have skewed their outline and fragmentation, oc-clusion, and noise have broken up their edges is bound to be limited. The ideas behind the algorithm for calculating the SLS are good however, and will be used extensively in the detection algorithm presented in the next section. 32 Chapter 3 The Detection Algorithm 3.1 Overview This section describes the implementation of an algorithm to locate certain types of perceptual groupings in edge images. As well as extending the proximity and colinearity grouping process of the SCERPO system [Low87] to work with curved line segments, an algorithm has been developed to detect symmetrical groupings. This algorithm is based to a large extent on the smoothed local symmetries paradigm, as described in [BA84]. Consider the object shown in Figure 3.1. The natural axis of symmetry is clearly vertical, and an intuitive set of cross-sectional lines can be filled in as shown. These cross sectional lines are located at points where there is something unusual about the curvature; in this case at points where the curvature has reached 33 a local maxima, where it changes sign, or the midpoint of sections where it is low (the straight segments). As with Asada and Brady's algorithm, the symmetry detection algorithm developed here takes advantage of this fact and begins by locating key points along the curves in the image. These keys points are then organized into symmetrical pairs, using the same criteria for a local symmetry (see Figure 2.6) developed for the smoothed local symmetries algorithm. At this point however, the two approaches differ. Within the SLS paradigm, it is assumed that the edge segment or segments being examined come from the outline of a single object, and in fact repre-sent the majority of the object's boundary. Given this, the goal of the SLS program is to find the best possible axis for the object, and then to use this Figure 3.1: A Symmetrical Object 34 axis as a basis for describing the object. The symmetry detection algorithm implemented here differs both in its initial assumptions and in its goals. The input to the algorithm is an entire edge image, containing fragmented edge segments which may or may not belong to the outlines of any number of ob-jects. The goal of the algorithm is to form bilaterally symmetric groupings from these edges. These groups can then serve as input to a model-based object recognition system in two ways: first the edges in a group are likely to have come from the same object, thus providing more information for the latter stages of recognition, and second the axis of symmetry can be used as an indexing key as well as a basis for object description. Note that since the results of this algorithm are to be used as indexing and matching features it does not have to produce definitive results - groups can be overlapping and are ranked by their degree of symmetry. The symmetry detection algorithm presented here therefore consists of five stages: preprocessing to extract and smooth the edges, grouping on the basis of endpoint proximity and curvilinearity, location of key points, determination of local symmetries among pairs of key points, and finally the grouping operation. The remainder of this section describes each of these 35 steps in more detail. 3.2 Curve Smoothing The initial input to the program is a set of linked edge points, generated by a Canny ([Can83]) edge detector. Each edge is represented by a list of points expressed in (x,y) co-ordinate form. Clearly this amount of information is insufficient to perform the operations described above: curvature values are required for the location of key points, while tangent and normal values are necessary to test for local symmetry. In addition, some smoothing will be necessary in order to accurately locate the key points. A method for performing all of these tasks was developed by Lowe [Low89b]. The Lowe curve smoothing and segmentation operation begins by per-forming gaussian smoothing of the input edge points using a range of a values. Each smoothed curve is then divided into intervals where the rate of change of curvature remains low. The larger the value of a used, the ' smoother the intervals must be. Intervals are then selected which maximize two factors: the length of the interval and the value of a used. Each in-terval is initially ranked using these criteria, and the algorithm proceeds by 36 repetitively choosing the highest ranked interval and removing all remaining intervals which overlap it, until no intervals are left. The chosen intervals are then processed to correct the shrinkage caused by the gaussian smooth-ing and the endpoint truncation caused by convolution. The result of this process is a set of curve segments, including tangent and curvature values, smoothed to an appropriate level. 3.3 Endpoint Proximities and Colinearities Along with edge detection and curve smoothing, one final preprocessing step is necessary. The outlines of objects in the image can be fragmented by both the edge detector and the curve smoothing process. We would like to join these segments back together wherever possible so that the symmetry detection algorithm will have better data to work with. The idea of joining segments together based on endpoint proximity and colinearity is the basis of the perceptual grouping stage of the SCERPO system [Low87]. Lowe uses the probability that these relations occurred by accident to rank occurrences of endpoint proximity, colinearity, and parallelism, but only among straight line segments. Since we are dealing with curved segments, this had to be 37 extended. First, consider endpoint proximity. The approach used in the SCERPO system was to calculate the probability that an instance of endpoint prox-imity arose by accident. Assuming uniform distribution of line segments in an image, the expected number of endpoints, N, within a radius r, of a given endpoint is equal to the average density of endpoints per unit area, d, multiplied by the area of a circle with radius r (see Figure 3.2): N = dirr2 Figure 3.2: Endpoint Proximity The density of endpoints per unit area is dependent on the length of the line segments under consideration, so the density term, d, is replaced by with the term 2D/12, where D is a constant representing the density of endpoints, and / is the length of the shorter line segment. This makes the calculation scale independent; we would not like changing the size of the image to affect 38 results in any way. The probabil i ty of two endpoints occurring close together by accident can then be approximated by the expected number of endpoints N = 2Dwr2/l2 when this value is much less that 1. This basically states that longer lines can support larger gaps, which seems intuit ively accurate. For curved lines, we can begin wi th the same basic formula, N = dirr2. However, when we attempt to replace d w i th a scale independent term, it becomes more com-plicated. Intuitively, the scale of a curve does not depend only on its length - a long meandering curve is not as significant as a smooth curve of equal length (see Figure 3.3). Th i s corresponds to the notion of saliency based on curvature and length as discussed in [SU88]. The curve smoothing al-gori thm operates by choosing a scale of smoothing, cr, which w i l l maximize the length of a curve while maintaining a low rate of change of curvature. We can therefore use cr as a measure of scale, and d can be replaced wi th 2D/cr2. The probabil i ty of the endpoints of two curved segments occurring close together by accident can therefore be approximated by the expected number of endpoints N = 2DTtr2/a2 39 where r is the distance between the endpoints as before, and a is the minimum of the two curves' cr values. Figure 3.3: Curve Significance Curvilinearity of segments is more complicated. The formula used in the SCERPO system to measure error in the linear case was E = W6s{g + h)l*l\ where 0 is the difference in orientation, s is the separation, g is the distance between endpoints, and l\ is the length of the shorter line (see Figure 3.4). Intuitively, we are measuring the distance between the two segments if one was extended over to the midpoint of the other. The rectangular region of length g + h and width 2s represents the area in which other lines would have the same degree of proximity. Given a uniform distribution of orientations, 20/n lines will be within 9 of a given line. The 2Dj\\ factor is once again present to maintain scale independence. 40 Figure 3.4: Colinear Segments Figure 3.5 shows an example using curved segments. The separation s and the gapsize g can be measured as before, and once again a can be used as a measure of scale, replacing the length measure /. The term 2gas therefore represents the area in which other lines would have the same degree of proximity. The 26/n term remains the same, and the 2D/I2 factor is replaced with 2D/cr2. The difference of curvature, k, is also added to the equation, producing an error measure of E = WOsgk/ira An algorithm which uses the results of the Lowe curve smoothing pro-gram as input and ranks instances of endpoint proximity and curvilinearity among curved segments was implemented. The first image in Figure 3.6 shows the original edges after the application of the Lowe curve smoothing process. The second image shows a sample of the results of the curvilinearity 41 Figure 3.5: Curvilinear Segments and proximity detection process. Thick line segments join endpoints where significant relations have been detected, with a "c" indicating curvilinearity and a "p" indicating proximity. As you can see, some endpoints are joined by both relations. 3.4 Location of Key Points The first step in the symmetry detection algorithm is the location of the following three types of key points: Type m key points: A point on an edge where the absolute value of cur-vature has reached a local maxima. 42 Figure 3.6: Proximity and Colinearity Relations 43 Type z key points: A point on an edge where the sign of the curvature changes. Type s key points: The midpoint of a straight section of an edge. A straight section can be defined as follows: A straight section: An area of an edge where the absolute value of the curvature remains sufficiently small. Type z key points or zero crossings are simply points where the curvature changes from positive to negative (or vice versa) from one point to the next. Longer regions where the curvature is low will be counted as straight sections. These regions are determined by locating all points where \k\ falls below a preset threshold. At first this thresholding seems arbitrary, however closer examination reveals that this is not the case. The key point will be located at the midpoint of the straight section, and during the local symmetry loca-tion process will be allowed to wander along the straight section if required. Therefore the exact localization of the ends of the straight section is not required for sufficiently accurate localization of the key point. In addition, the threshold varies inversely with the a value used during smoothing, since 44 smoother segments will have lower curvature. In the examples shown in this thesis, a value in the range of 0.05/a to 0.09/cr was used. Location of type m key points is a difficult problem in practice. Despite the smoothing performed, wiggles in the curve tend to produce extraneous maxima and disguise real ones. In order to counteract this, an algorithm was developed to try and remove these wiggles. Initially, the locations of all local maxima and minima are marked on the curve, and a significance value is calculated for each of the maxima. Let mo be the location of a local maxima, with curvature value ko, and let n\ and n-i be the locations of mo's surrounding minima, with curvature values kn\ and kn2 respectively. The significance value assigned to the local maxima mo is calculated as v = ko — mai ( fc n i , A;n2) The maxima with the lowest significance value is then removed from consider-ation, along with its larger neighbouring minima, and the significance values are recalculated. This process is repeated until all of the remaining maxima have rankings above a certain threshold, again inversely proportional to the scale of smoothing. This threshold can be set fairly high as the inclusion of extraneous maxima only slows down the process slightly, and will cause 45 no damage, whereas removing correct points will cause it to fail completely. Values in the range of 0.06/tr to 0.10/<r were used to produce the results shown in Chapter 4. Figure 3.7 shows two graphs of curvature versus arc length for a sample curve. Initially, the curvature is zero, then it rises to a local maxima at 5, followed by a local minima at 4.5, a local maxima at 10, and finally a minima at 2. The significance value assigned to the first maxima is 0.5, the difference in curvature between the maxima (5.0) and the larger of its neighbouring minima (4.5). Similarly the value assigned to the second maxima is 5.5. The second graph shows the new significance values after the removal of the smaller maxima and its larger neighbouring minima. The new value assigned to the maxima is 8 as the surrounding minima are now those with curvature values of 0 and 2. Figure 3.8 shows the results of this algorithm on a real edge image. The key points are indicated with crosses on the edges and a letter nearby indicating their type. Note that there are no type z key points, in practice they do not appear often as most zerocrossings occur over several points and are therefore classed as type s. As you can see, some extra points have been found, however all of the relevant keys are also marked. 46 Curvature 10 maxima value = 5.5 value =0.5 Curvature 10 maxima new value = Arc Length Arc Length Figure 3.7: Removing Wiggles 3.5 Detection of Local Symmetries Once the key points have been located, they are joined into symmetrical pairs. Recall from Section 2.3.2, Figure 2.6, that two points form a local symmetry when the angles between the two inward facing normals and the line connecting the two points are equal. In addition, we would like the curvature to be the same at both points. Let pi and pi be key points, with types t\ and ti, and suppose that the curvature along the edge at points pi and p2 is &i and ki respectively, and also that the vectors n\ and rf2 represent the inward facing normals. Let c be a vector along the line connecting p\ 47 48 and P21 and let a\ and a-i be the angles between ni and c and n-i and c respectively. Then pi and P2 form a key pair if the following conditions are true: 1. h = t2 2. |Jfei - fc2| = 0 3. |ai — a 2 | = 0 In practice, generation of key pairs is carried out using some variations. To limit complexity, each key point is only tested against the n closest key points with the same type. Setting n to 10 seems to be more than adequate. The value of v = — * \a\ — is used to test the degree of symmetry for each pair, and this value is actually calculated for all points along intervals surrounding each key point. Each interval consists of the region surrounding the point where the curvature remains within 25% of the curvature at the key point, to a maximum of five points in either direction. This allows the location of type m key points freedom to wander around within the low curvature region, while forcing type z and type m key points to remain fairly still. The locations in each interval which minimize v are chosen to make up 49 the key pair. Figure 3.9 shows an example where the original key points (shown by the large dots) have been moved slightly in order to produce an optimum key pair, as shown by the connecting line. The thick straight lines overlaying the object's edge represent the straight sections found by the program. Figure 3.9: Choosing the Best Location 3.6 The Grouping Process An iterative grouping process is used to join the key pairs together. Let kpi,..., kpn be a set of key pairs whose component key points lie on edges ei , . . . ,e m . The axis of symmetry, s, is a unit vector in the direction of the Jine of best fit through the midpoints of the connecting lines c"i,..., cn associated with the key pairs kpi,... ,kpn. The key pairs kpi,..., kpn form 50 a group if the following conditions are true (see Figure 3.10): 1. The edges e i , . . . , em are connected 2. The angles a\,..., an between the connecting lines c[,..., cn and the axis of symmetry s are all zero 3. The perpendicular distances di,...,d„ between the midpoints of the connecting lines c[,..., cn and the axis s are all zero Connectivity among key pairs is denned as for a graph. The nodes are the key points making up each key pair in the two groups. There is an arc connecting two key points if they lie on the same edge segment, or if the edge segments on which they lie are joined by a proximity or curvilinearity relation.. This connectivity criteria insures that all of the key points forming a group are likely to lie on the outline of a single object. In the implementation, each key pair initially forms a group of size one. Grouping proceeds by merging groups together which are connected and whose mean square angle error (Yl?=i ° 2 ) a n d mean square distance error (5Z"=i a r e sufficiently small. In practice, the angle error was forced to be below 5 degrees, while the distance error was very small; in the range of 0.01 51 to 0.05. One reason the distance error was so low was that it is divided by the length of the axis squared in order to preserve scale independence. In order to simplify the calculations, the axis of symmetry is approximated by the line joining the midpoints of the two outer key pairs, which appears to work adequately. Although the object of the program is to produce a list of possible groups and their rankings, rather than to make a final decision as to which groups are correct or incorrect, in most cases tested, one correct grouping stands out from the others. Figure 3.11 shows the results of this process when applied to some simple symmetrical objects. The outline of the objects are the smoothed edges resulting from the preprocessing stage. The overlaying, thicker straight lines are the straight sections, with crosses marking their midpoints as well as the other key points. Key pairs participating in the symmetry grouping are shown by horizontal lines, and the axis of symmetry joins the midpoints of the outermost key pairs. 53 Figure 3.11: Results of Grouping Algorithm 54 Chapter 4 Results The program described in the previous section was implemented in C using NAAG [Noi90]. This section shows some of the results obtained by running the program on some real digitized images. Execution time on a SUN 4 architecture depended greatly on the number of objects and their complexity (in terms of curvature). For the single objects depicted in Figure 3.11, it was quite low (less than a minute), whereas for larger images containing multiple objects it was slower, often requiring up to 3 or.4 minutes. Figures 4.1 to 4.4 demonstrate the use of the program on an image of several objects placed on a light table. Figure 4.1 shows the original image, and Figure 4.2 shows the results of the Canny edge detector. The curve smoothing process was then applied, and the resulting smoothed edges are shown in Figure 4.3. 55 Line thickness corresponds to the value of a used during smoothing. Figure 4.1: Example 1: The Original Image Figure 4.4 shows the final results of the grouping program. The key points are indicated with crosses, and the straight sections are marked with a thick straight line. Each line joining key points indicates a key pair, with the axis of symmetry joining the midpoints of the outermost key pairs. The proximity and curvilinearity groupings have not been marked, although the endpoints they join have been connected using a thick line. As you can see from the images, the most significant grouping is the symmetrical bottle, as 56 57 Figure 4.3: Example 1: The Smoothed Curves 58 its outline contains more curvature variation. The multiple straight sections along the body of the bottle are caused by the slight skewing of the image during projection. The other objects in the image do not contain sufficient information to generate highly significant groupings, although the minor ones found by the program are also shown. Note the large number of key points located along the side of the pen just under the bottle. These extraneous points are caused by insufficient smoothing of the edge; this problem is dis-cussed in the next section. Figure 4.5 shows an image taken of some common household objects and a toy car against a black background. Figure 4.6 shows the results of the grouping algorithm on this image. The car, being non-symmetrical is ignored, while the soap bottle, having multiple curvature variations, generates a strong grouping. Note that the axis of symmetry is not vertical as it has been in most of the previous examples; nothing in the algorithm biases it towards vertical symmetry. The vase in the center of the image suffers slightly from insufficient smoothing, but still produces a group containing four key pairs. The bottle on the right hand side is partially occluded but the correct axis of symmetry is still located. The edges of the bottle lying on its side are too 59 60 fragmented to respond well. F i g u r e 4.5: E x a m p l e 2: T h e O r i g i n a l I m a g e One final example is shown in Figures 4.7 and 4.8. This shows the dish soap bottle in another orientation, and also gives a better example of grouping on the vase. As you can see, in different images of the same object, different amounts of smoothing are selected by the curve segmentation algorithm. This in turn leads to different positioning of the key points and corresponding key pairs. This does not seem to be a problem however as the final axis of symmetry is always correctly identified. 61 62 Figure 4.7: Example 3: The Original Image Chapter 5 Conclusions and Future Work 5.1 Goals of the Thesis The main goal of this work was the detection of perceptual groupings for use as indexing keys in a model-based object recognition system. The system implemented is able to successfully locate instances of endpoint proximity and curvilinearity among curved edges, an extension of the perceptual grouping component of the SCERPO system [Low87]. In addition, the system is able to form symmetrical groupings consisting of pairs of key points along the edges of objects. Some further preprocessing would be required to make these symmetrical groupings useful as indexing features. By themselves, they can only separate a library of models into symmetrical and non-symmetrical objects. Once the axis of symmetry has been discoved however, further 64 processing can yield more information for use in indexing. For example, the object could be described in terms of the distances of the edges from the axis. Although this measure in itself is not invariant under changes in viewpoint, the relative distances from the axis of symmetry to the key points, along with the relative distances between the projections of the key points onto the axis are. These variations in distance, along with the key points used in the generation of the symmetrical grouping, can be used to locate spatial correspondences between image points and model points. In addition, the axis itself constrains the initial values of the viewpoint parameters. As you can see from the examples presented in the previous section, the system adequately meets the goals of the thesis. 5.2 Implementation Problems The detection of endpoint proximities and curvilinearities works quite well. A pair of threshold values for the significance of these groupings was discovered through experimentation and seem to work well over a wide range of images. The location of key points, however, was more problematic. In many cases the scale of smoothing chosen by the curve smoothing algorithm was too low, 65 resulting in bumpy edges and therefore extraneous maxima and zerocrossings. This also tended to break up straight sections, making them difficult to match up. Forcing the program to choose smoother intervals resulted in the object outlines being fragmented into many small, smooth segments, with points of maximum curvature being lost at the segment endpoints. Several improvements are possible to this process, for example location of key points across several scales of smoothing is a possibility, but requires tracking of points over multiple scales, which can be very difficult. Adjust-ment of the methods used to choose the scale of smoothing are also possible, perhaps using the appearance and disappearance of curvature variations to aid in this process. Finally using the instances of endpoint proximity them-selves as key points is a possibility, but it would be difficult to establish criteria for a local symmetry between a broken corner and a local maxima. The generation of local symmetries also was quite robust, although if many extraneous key points were detected a large number of groups of size one were generated, thus slowing down the grouping algorithm. In most cases as the threshold for a local symmetry was increased the number of key pairs increased up to a point, but eventually seemed to level out. This is probably 66 due to a combination of the individual thresholding on the curvature and angular differences, and the limit on the possible number of other key points a point could be compared against. There did not seem to be a problem as long as enough appropriate key points were generated by the previous stage. The grouping process was also very successful; thresholds for the angle and difference measures, along With group significance were easy to discover and worked well on almost every image presented. The only problem encoun-tered was a dramatic slowdown when too many key pairs were generated by the previous stages. This problem would be solved in part by fixing the problems in the previous stages, but improvement is also possible here. For example sorting the existing groups by the position and orientation of their i axis of symmetry and only attempting to join groups to other groups with similar values for these parameters could help to limit complexity. i . \ 5.3 Possible Extensions As well as just describing the groupings found as a function of the distance from the object's edges to the axis of symmetry, further work on shape de-scription could be done. Multiple symmetries will be detected by the algo-67 rithm as it now stands and only a small amount of processing would be re-quired to notice them. Partial symmetries such as those occurring in the cor-ners of a rectangle would also be detected is enough information was present, and could also provide information for shape description. The smoothed lo-cal symmetries program handles this using subsumption: when the section of an object's outline associated with an axis of symmetry is wholly contained within the section associated with a second axis, the second axis is said to subsume the first. These subsumed axis are therefore considered less signif-icant during shape description. A similar methodology could be employed here. Basic shapes such as ellipses and squares could also be detected, and variations in the algorithms could allow it to detect skew symmetry and parallelism. Skew symmetry would require a relaxation of the distance con-straint, while the case of parallelism is more complicated. Key points would be detected as before, and generation of local symmetries would be basically the same except the angles between the normals should sum to 180 degrees instead of being the same. The criteria for grouping would have to be differ-ent however. One possibility is to trace out three paths: one down each side of the object, going from key point to key point, and one down the center of the object connecting the midpoints of the key pairs (see Figure 5.1). These 68 paths would be made of straight line segments, and should all be parallel. In this way the problem of finding parallel curves is reduced to one of finding parallel lines. Figure 5.1: Parallel Curves Another future possibility is the parallelization of the algorithm. The algorithm lends itself to this in several ways. First the curve smoothing could be parallelized by processing each segment at each scale of smoothing in parallel. Location of key points could be performed with a processor for each smoothed segments. The local symmetry test could be carried out with a processor for each key point, and each iteration of the grouping algorithm could be parallelized by assigning a processor to each existing group. In conclusion, with some optimization and possible parallelization, there is no 69 reason why this algorithm could not be incorporated into a real time system for object recognition. 70 Bibliography [AB86] Haruo Asada and Michael Brady. The curvature primal sketch. IEEE Transactions on Pattern Analysis and Machine Intelli-gence, 8(1):2-14, 1986. [Att54] F. Attneave. Some informational aspects of visual perception. Psychological Review, 61:183-193, 1954. [BA84] Michael Brady and Haruo Asada. Smoothed local symmetries and their implementation. A.I. Memo 757, MIT Al Lab, Cam-bridge, MA, 1984. [Bie85] Irving Biederman. Human image understanding: Recent research and a theory. Computer Vision, Graphics, and Image Processing, 32:29-73, 1985. [BN78] Harry Blum and Roger N. Nagel. Shape description using weighted symmetric axis features. Pattern Recognition, 10:167-180, 1978. [Bro81] Rodney .A. Brooks. Symbolic reasoning among 3-D models and 2-D images. Artificial Intelligence, 17:285-348, 1981. [Can83] J. F. Canny. Finding lines and edges in images. AI-TR-757, MIT Al Lab, Cambridge, MA, 1983. [Chi77] S. F. Chipman. Complexity and structure in visual pat-terns. Journal of Experimental Psychology: General, 106:269-301, 1977. 71 [Der71] J. B. Deregowski. Symmetry, gestalt and information theory. Quarterly Journal of Experimental Psychology, 23:381-385, 1971. [FF87] Celia B. Fisher and Maria P. Fracasso. The Goldmeier effect in adults and children: environmental, retinal, and phenomenal influences on judgments of visual symmetry. Perception, 16:29-39, 1987. [Gar62] W.R. Garner. Uncertainty and structure as psychological con-cepts. New York:Wiley, 1962. [Jac88] D. W. Jacobs. The use of grouping in visual object recognition. AI-TR-1023, MIT AI Lab, Cambridge, MA, 1988. [KMTB76] Michael King, Glenn E. Meyer, John Tangney, and Irving Bieder-man. Shape constancy and a perceptual bias towards symmetry. Perception and Psychophysics, 19:129-136, 1976. [Low84] D. G. Lowe. Perceptual organization and visual recognition. Stan-cs-84-1020, Stanford University, Dept. of Computer Sci-ence, Stanford, CA, 1984. [Low87] David G. Lowe. Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence, 31:355-395, 1987. [Low89a] David G. Lowe. Fitting parameterized 3-D models to images. Technical Report 89-26, University of British Columbia, Decem-ber 1989. Department of Computer Science. [Low89b] David G. Lowe. Organization of smooth image curves at multiple scales. International Journal of Computer Vision, 3:119-130, 1989. [Noi90] Emanuel G. Noik. NAAG - not another application generator. Master's thesis, University of British Columbia, July 1990. [PH78] S.E. Palmer and K. Hemenway. Orientation and symmetry: Ef-fects of multiple, rotational and near symmetries. Journal of Experimental Psychology: Human Perception and Performance, 4:691-702, 1978. 72 [Rob65] L. G. Roberts. Machine perception of three dimensional solids. In J. T. Tippett, D. Berkowitz, L. Clapp, C. Koester, and A. Van-derburgh, editors, Optical and Electro-optical Information Pro-cessing, pages 159-197. MIT Press, Cambridge, MA, 1965. [Roy81] Fred L. Royer. Detection of symmetry. Journal of Experimental Psychology: Human Perception and Performance, 7:1186-1210, 1981. [SU88] Amnon Sha'ashua and Shimon Ullman. Structural saliency: The detection of globally salient structures using a locally connected network. Second International Conference on Computer Vision, pages 321-327, 1988. [Wal87] Deborah Walters. Selection of image primitives for general-purpose visual processing. Computer Vision, Graphics, and Im-age Processing, 37:261-298, 1987. [Wer38] Max Wertheimer. Laws of organization in perceptual forms. In W. Ellis, editor, A Source Book of Gestalt Psychology, pages 71-88. New York: Harcourt, Brace and Company, 1938. 73
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Perceptual organization and symmetry in visual object...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Perceptual organization and symmetry in visual object recognition Wilson, Susan E. 1991
pdf
Page Metadata
Item Metadata
Title | Perceptual organization and symmetry in visual object recognition |
Creator |
Wilson, Susan E. |
Publisher | University of British Columbia |
Date Issued | 1991 |
Description | A system has been implemented which is able to detect symmetrical groupings in edge images. The initial stages of the algorithm consist of edge detection, curve smoothing, and the extension of the perceptual grouping phase of the SCERPO [Low87] vision system to enable detection of instances of endpoint proximity and curvilinearity among curved segments. The symmetry detection stage begins by first locating points along object boundaries which are significant in terms of curvature. These key points are then tested against each other in order to detect locally symmetric pairs. An iterative grouping procedure is then applied which matches these pairs together using a more global definition of symmetry. The end result of this process is a set of pairs of key points along the boundary of an object which are bilaterally symmetric, along with the axis of symmetry for the object or sub-object. This paper describes the implementation of this system and presents several examples of the results obtained using real images. The output of the system is intended for use as indexing features in a model-based object recognition system, such as SCERPO, which requires as input a set of spatial correspondences between image features and model features. |
Subject |
Visual perception -- Mathematical models Symmetry groups Image processing -- Digital techniques |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-11-04 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051968 |
URI | http://hdl.handle.net/2429/29802 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1991_A6_7 W54.pdf [ 3.52MB ]
- Metadata
- JSON: 831-1.0051968.json
- JSON-LD: 831-1.0051968-ld.json
- RDF/XML (Pretty): 831-1.0051968-rdf.xml
- RDF/JSON: 831-1.0051968-rdf.json
- Turtle: 831-1.0051968-turtle.txt
- N-Triples: 831-1.0051968-rdf-ntriples.txt
- Original Record: 831-1.0051968-source.json
- Full Text
- 831-1.0051968-fulltext.txt
- Citation
- 831-1.0051968.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051968/manifest