USING RANGE DATA FOR OBJECT RECOGNITION IN A ROBOTICS ENVIRONMENT

by

COLIN ARCHIBALD
B.C.S., Acadia University, 1982

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF COMPUTER SCIENCE

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
February, 1984
© Colin Archibald, 1984

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Date: Feb. 22, 1984

Abstract

A new approach to object recognition for a robotics environment is presented. 2-D models enriched with 3-D information are constructed automatically from a range image. These 'view models' are used to recognize objects by matching them to models subsequently constructed from similar images. A segmentation method for extraction of planar surfaces from range images has been developed. View models are constructed as augmented surface adjacency graphs. A heuristic model matching method is presented. Results indicate that object recognition in a constrained domain and environment can be achieved at a very high success rate using single-view range images.

Table of Contents

1 Introduction
1.1 Goals of Artificial Intelligence
1.2 Steps Toward Object Recognition
2 Range Data Interpretation
2.1 Introduction
2.2 Pixels, Voxels, and Surfels
2.3 Related Research
2.3.1 Edge and Region Detection in Range Images
2.3.1.1 Edge Detection
2.3.1.2 Region Detection
2.3.2 Discussion of Edge vs. Region Methods
2.3.3 Modeling and Object Recognition
3 A New Approach to Constrained Domain/Environment Object Recognition
3.1 Introduction
3.2 The Range Finder
3.3 Object Recognition
4 Planar Surface Extraction from Range Images
4.1 Introduction
4.2 Gradient Space
4.3 Gaussian Smoothing
4.4 The Gradient Operator
4.5 Two Stage Surface Extraction
4.5.1 Clustering
4.5.2 Merging the Regions
4.6 Verification of Surfaces
5 Modeling the Objects
5.1 Introduction
5.2 View Modeling
5.2.1 Surface Properties
5.2.2 Surface Adjacency Properties
5.2.3 Generic View Model Properties
5.3 Adequacy Issues
5.3.1 Descriptive Adequacy
5.3.2 Procedural Adequacy
6 Model Matching
6.1 Introduction
6.2 Filter the Knowledge Base
6.3 Heuristic Augmented Graph Matching
6.3.1 Reliability Constraint
6.3.1.1 Scoring a Node for Reliability
6.3.2 Structural Constraint
6.3.3 Relational Constraint
6.3.4 A Scoring Mechanism for the Match
6.3.4.1 Compare Nodes
6.3.4.2 Compare Links
6.3.5 The Algorithm
7 Results and Evaluation
7.1 Criteria for Evaluation
7.2 Strengths and Weaknesses
7.2.1 Specular Reflectance
7.2.2 Shadow Effect
7.2.3 Calibration and Panoramic Distortion
7.3 Performance
7.3.1 CPU Time Requirements
7.3.2 Image Features Detected
7.3.3 Object Recognition Statistics
8 Extensions and Conclusions
8.1 Extensions to the Implementation
8.2 Conclusions
References
Appendix

List of Figures

Figure 3.1 Range Finding Technique
Figure 3.2 Structure of the Knowledge Base
Figure 4.1 (a) Two Dimensional Gaussian Function
Figure 4.1 (b) First Partial Derivative of Gaussian
Figure 4.2 (a) Clustering Neighbours
Figure 4.2 (b) Merging Neighbours
Figure 5.1 (a) Sketch View of Truncated Pyramid
Figure 5.1 (b) Surface Adjacency Graph
Figure 5.2 Two Objects With the Same View Model
Figure 7.1 Specular Reflectance Demonstrated
Figure 7.2 Surfaces Extracted From Figure 7.1
Figure 7.3 Shadow Effect
Figure 7.4 Surfaces Extracted From Figure 7.3

Acknowledgements

The author would like to thank the National Research Council of Canada for allowing this research to be done with the help of their facilities and expertise. Many thanks go to NRC researchers Nestor Burtnyk, Tony Kasvand, Shadia Elgazzar, and Cathy Merritt, whose patience and advice were readily offered. The range finder used in this research was developed by Marc Rioux, who generously permitted its use, as well as permission to reproduce Figure 2.1. Alan Mackworth, who supervised this research from UBC, has unraveled many mysteries in computational vision, and his contributions to this research are gratefully acknowledged. Bob Woodham has offered some valuable advice toward the final draft of this thesis and his assistance is sincerely appreciated. Financial support for this research was provided by the Natural Sciences and Engineering Research Council of Canada.

Chapter 1
Introduction

1.1. Goals of Artificial Intelligence

The field of Artificial Intelligence is emerging as one of the most interesting areas of modern scientific research. The goals of this research are almost as varied as the scientists and engineers specializing in this discipline. A basic goal of AI is to make computers more useful by enabling them to make intelligent decisions. Some researchers are also interested in revealing principles of intelligence through research in AI.
Although the simulation of the entirety of human intelligence is often considered a futuristic goal, it has been discovered that building a machine with a minute fraction of our intelligence can be very rewarding. To have a machine that can reason properly in a very confined domain is a much more realistic task, and these 'toy worlds' have been used extensively in AI research. 'Toy worlds' were intended originally to be domains for the development of methods which could eventually be generalized to richer domains. Some 'toy worlds' are themselves useful, and are now referred to as 'confined' or 'constrained' domains. For example, an expert system which deals only with predicting geological formations beneath the earth's surface would not require any knowledge or reasoning in the field of medicine.

In the area of computational vision, many useful confined domains can be identified. In manufacturing industries, for instance, there is often a limited number of objects which a vision system would be required to recognize. These objects would almost always have a regular form with smooth surfaces, as opposed to naturally occurring objects, such as trees or mountains. Recognition of these objects in an industrial environment is the first step toward placing productive, intelligent robots in the manufacturing sectors of industry. The research described herein is in pursuit of this objective.

1.2. Steps Toward Object Recognition

There are many vision systems designed for use in a robotics environment. A bibliography describing recent ones has been presented by Abdelmalek (1983). Most of these systems are based on 2-D monochromatic intensity images obtained from various types of television cameras. A new family of cameras which can provide direct three dimensional information has recently been developed. Jarvis (1983) has written a comprehensive survey on these range finding devices. The camera used in this research is a range finder capable of capturing depth information in almost real time. The principles of this device are described in Section 3.2.

In our simulated robotics environment, the field of view for the range finder is the work envelope of the robot arm. The range finder is directed straight down. The objects to be recognized are all polyhedra. The problem is easily defined: How can the single view range image be used to recognize the objects in the robot's work envelope?

In this research we propose a method for collecting features from a particular view of an object. These features are placed in a data structure called a view model. A view model consists of topological and geometric properties of the surfaces that are visible. An amalgamation of these properties is also a part of the view model, and is referred to as generic knowledge. There can be several view models for each object; one for every stable resting position. View models which are orientation independent with respect to that particular view are taught to the system automatically. These view models form the knowledge base for the domain, and can be maintained interactively. The same methods that are used for building the knowledge base are used subsequently to build view models for recognition. In the recognition stage, view models are created for the objects to be recognized in the scene.
These models are compared to the models in the knowledge base, first generically to obtain a list of possible matches, and then a more elaborate matching algorithm is used to determine which of the possible models best matches the object to be recognized.

In much computer vision research, a 2-D image has been used with the intention of extracting a 3-D model (Mackworth, 1976). In these systems, depth cues, such as shadows and perspective, are interpreted. In our research the model used is basically a 2-D description of what is in the 3-D scene. We believe that to achieve a success rate of 100% for object recognition, no guessing of hidden surfaces or other invisible properties of the scene should be allowed. Using range images, extraction of information which is explicitly available in the image is sufficient for object recognition in a constrained domain.

In this research we are not trying to model or mimic human object recognition. It would be very difficult for a human to take a list of topologic and geometric properties representing an object, and visualize what is being described. This is how model based object recognition is implemented. The reader may speculate that humans actually do form property models subconsciously, and perform object recognition at this level. This type of speculation is not a part of our research.

The following chapter describes the very new field of range data interpretation. Chapter 3 describes the range finding device used, as well as a discussion of the issues to be addressed in the implementation. In the implementation described in Chapters 4 to 7, the following steps are taken: In Chapter 4 the range image is simultaneously smoothed and transformed to gradient space, creating two images, one for each of x slope and y slope. Planar surfaces are extracted from the gradient space images and verified. Next, a model is constructed based on the topology of the object's surfaces, and their geometric properties. This is described in Chapter 5. In Chapter 6, a matching algorithm is described which compares models previously taught to the system with models built from objects to be recognized. Chapter 7 describes the performance of the entire implementation in terms of successful recognition, scene features detected, and CPU time required for each step. Chapter 8 proposes some extensions which would improve performance and remove constraints, and also reveals some conclusions drawn from this research.

Chapter 2
Range Data Interpretation

2.1. Introduction

Range data interpretation is a very new area of computational vision. Most methods used in the literature have grown from tried and tested methods of intensity image processing. Since range finders are only useful for a limited range of depths, almost all of the literature in range data interpretation has been directed at the area of robotics, where the environment and the domain can be legitimately constrained, without removing all hope that the system developed may have some practical purpose.

2.2. Pixels, Voxels, and Surfels

Most computational vision systems use some variation of two dimensional images as input (Abdelmalek, 1983). The image is represented as a two dimensional array of values called pixels (picture elements). These pixels may consist of hues, grey levels, or binary values. With the advent of computer aided tomography, used mainly for medical imagery, a new type of image processing was introduced (Morgenthaler and Rosenfeld, 1981).
This type of image is a three dimensional array of intensities, representing density. The elements of this array are called voxels (volume elements) and must be treated differently during analysis. The camera described briefly in Chapter 3 is one of the recently developed sensors referred to as 'range finders'. The images produced by these devices are two dimensional arrays of depths, i.e., the distance from the camera to the surface being scanned. These elements will be referred to as 'surfels' (surface elements), and must also be processed using methods developed specifically for this type of image.

The benefits of using range images are obvious to those who have attempted to extract depth cues from 2-D images to obtain feasible 3-D interpretations. Range images provide the 3-D information explicitly, which eliminates the costly and unsure task of implying depth in a scene. A range image is easier to analyze than an intensity image because the information in this type of image is a function of only one scene property, namely depth. (An exception to this occurs when specular reflectance interferes with the depth finding process; this is discussed in Chapter 7.) When processing an intensity image, factors such as lighting, surface reflectivity, and surface orientation must be considered, complicating what is already an inadequate source of 3-D information. However, a range image can only be acquired for surfaces within a limited range of distances from the range finder. This restriction is not a serious problem in the indoor factory environment where the objects occur within the constant and short range of the range finder.
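To make the distinction concrete, the three element types can be summarized in a few lines of C. The following sketch is illustrative only, not from the thesis; the value types and array dimensions are assumptions:

```c
/* Illustrative only: the three image element types of Section 2.2.
 * Value types and dimensions are hypothetical, not from the thesis. */
typedef unsigned char  Pixel;    /* hue, grey level, or binary value    */
typedef unsigned char  Voxel;    /* density sample from tomography      */
typedef unsigned short Surfel;   /* depth from camera to scene surface  */

Pixel  intensity_image[128][155];    /* 2-D array of intensities */
Voxel  density_volume[64][64][64];   /* 3-D array of densities   */
Surfel range_image[128][155];        /* 2-D array of depths      */
```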
2.3. Related Research

This thesis is concerned with both low level and high level processing of a range image. The purpose of low level image processing, or image analysis, is to compute a description of the image to be used for subsequent processing. In high level processing, or scene analysis, the image description is used to determine what is in the scene.

With the potential benefits of successfully exploiting range data for robotics, a number of interesting papers have appeared. The methods used to interpret range data, and indeed the devices themselves, vary greatly with the task. Some of the recent research useful for a robotics environment is discussed in this chapter.

2.3.1. Edge and Region Detection in Range Images

Generally, object recognition from images is accomplished by extracting features of the object from the image and matching these to previously defined models, or collections of features pertaining to an object. An image is represented as a very large array of elements from which image features must be identified. The first stage in image processing is to manipulate the image into a manageable form which will make these features more apparent. As in intensity image processing, low level processing of range images is either 'edge based', 'region based', or some combination of the two.

2.3.1.1. Edge Detection

A different approach is used for edge detection in range images from that used in intensity images, because different information is available from surfels than from pixels. Edge information derived from range images often consists of both the location and the type of the edge. There are three types of edges commonly identified. If two surfaces form a convex intersection, the edge where the surfaces meet is referred to as a convex edge. Concave edges are defined similarly. The third type occurs when a surface is occluding some part of the scene. The edge formed here is characterized by an abrupt change in range values, and is referred to as a jump edge, or jump boundary.

Nitzan et al. (1977) extract jump boundaries from range images to find outlines of objects in the scene. This information is used to compute the Cartesian coordinates of the outline of the object, and obtain normalized views of the interesting surfaces. The method is demonstrated by rotating a surface in a registered intensity image to a normal position, where the writing on the surface can be interpreted.

Another approach to edge detection was used by Mitiche and Aggarwal (1982). They extract edges from a noisy image using a model driven algorithm. The probability that an edge exists at each surfel is used to help derive an edge map for the image.

Inokuchi et al. (1982) use a combination of edge and region detection. They have developed an operator which is used to label every surfel as being on a jump edge, convex edge, concave edge, or planar region. The benefits of using this operator for higher level processing are not fully known.

Perhaps the most interesting work describing edge detection in a range image can be found in Gil et al. (1983). They detect edges in registered intensity and range images, and combine these edge maps. This is quite successful, and the results seem to include all significant edges. High level processing using this edge map is not discussed.

2.3.1.2. Region Detection

Extraction of planar and quadratic surfaces is often done by histogramming techniques, as described in Duda et al. (1979). Different histograms are used to identify horizontal, vertical, and slanted surfaces. Planar surfaces are approximated from the histogram and refined by fitting a plane to this section of the image. Reliable surfaces are identified first, and then questionable ones are attempted.

In Hebert and Ponce (1982), the image is mapped into a two dimensional space, namely surface normals represented on the Gauss sphere. In their method, the planar, cylindrical, and conic surfaces are detected globally using a Hough transform, and distinguished by location in the image plane. The surfaces are then related to each other by their proximity, and the small ones are eliminated. Their approach is not to classify every surfel, but to extract features and reference points from the scene for recognition of industrial parts.

The 'Three-Point Seed' method of planar surface extraction is described in Henderson and Bhanu (1982). Three neighbouring surfels are selected, and if they approximate a plane, the algorithm will grow this surface to its edges. The surfaces are approximated as polygons, and used to create a model of the object.

2.3.2. Discussion of Edge vs. Region Methods

In this research we favor region based segmentation methods. When extracting information from a range image or any other digital signal, it is best to use as much of the signal as possible. Algorithms which extract regions from an image use more of the surfels to build a region map than would be used in building an edge map. The region map is very likely to be more reliable because more correct information is used.

Region detection can only be used if there is a constant measure over the region to be found. In range images of a polyhedral world, surface slope can be extracted and used as such a measure.
There is another significant reason to favor region detection over edge detection in this domain. A slightly incomplete region will be less harmful to subsequent processing than an incomplete edge map. Using incomplete edges to build an object model is a very difficult task, and will inevitably result in models which are not as reliable as the images are capable of providing using a region based method.

It is difficult, if not impossible, to prove region methods better than edge methods for extracting information from an image. The information contained in an image is not well defined. Performance of low level routines found in the literature cannot be compared with respect to each other in an unconfined domain, as some methods will be working with noisier images, and with different features to be identified. Final results in object recognition cannot be used as a judge of the low level routine, because the different methods of low level processing affect the research at every subsequent stage. It appears from the literature that a 'best' low level processing method has not been established for range image processing. It may be that no 'best' domain independent method exists, especially since there are a variety of ultimate goals in the research.

2.3.3. Modeling and Object Recognition

Model based vision systems have been using intensity images since 1965. (For a complete summary of this work see Glicksman, 1982, Ch. 3.) Several methods of object modeling have been proposed. One way is to create a dictionary of primitives, and represent the objects in terms of these simple objects. This approach was used by Roberts (1965), who used simple polyhedra as primitives. An assembly of generalized cylinders, and ways in which they can be connected, is used in Nevatia and Binford (1977). Generalized cylinders have also been used in Popplestone et al. (1975). This representation is ambiguous in that an infinite number of cylinders can represent a single object.

Modeling of polyhedra has been a popular place to begin research in object recognition. Falk (1972) uses topological and geometric properties of objects in a representation called face adjacency graphs. The nodes in these graphs represent planar faces and the links represent connecting edges. The graphs are used to recognize objects which are found in incomplete and imperfect line diagrams of the polyhedra. Falk's objective is quite ambitious. From a single view line drawing, he attempts to identify objects which may be in any position and orientation, and also may be partially occluded by other objects in the scene. Hidden surfaces are predicted, and some ambiguity is removed from the diagram by restricting the domain to include only trihedral corners. The models which are created for the line drawings are matched to previously stored prototype models. This is accomplished by eliminating unsuitable prototypes until only one interpretation remains. The elimination procedure used is very interesting. The most reliable features in the image are used first. The base edge, for example, is used to eliminate all prototypes without such an edge, since the base edge was found consistently in the drawings. Although Falk's methods are sound, a limited amount of information can be extracted from a single 2-D image, and he points out that:

"When the time comes to analyze very complex scenes, it will undoubtedly be the case that more than a single view will be required in order to produce a complete scene description."
Oshima and Shirai (1983) have used the concept of face adjacency graphs for object recognition using range images. Their goals are also ambitious, as they attempt to recognize polyhedra that may occlude one another in varying degrees from a single view range image.

Much research in range data interpretation is based on the requirement that a robot vision system must recognize an object in any position or orientation. This has made complete 3-D models necessary. Henderson and Bhanu (1982) attempt to build a complete 3-D model by rotating the object on a platform, and taking several range images. Others have indicated that a CAD/CAM type model will eventually be used in their systems. (Shirai (1979) suggests this as a solution.)

It is a contentious issue as to what restrictions need to be placed on the object domain, and the working environment, to achieve a performance rate for object recognition that is suitable for robotics applications. A modeling method has not yet been developed that can be successfully moved from one domain to another.

Chapter 3
A New Approach to Constrained Domain/Environment Object Recognition

3.1. Introduction

In object recognition for robotics, the work environment can be constrained, as well as the domain for the objects. This constrained environment allows for new techniques and devices to be introduced. The device described briefly in this chapter is only capable of providing depth information for a limited range of depths. The environment for this research has further been constrained so that the range finder is directly above the objects which are to be recognized. This arrangement is suitable for many manufacturing tasks.

3.2. The Range Finder

The range finder used in this research was developed in the Division of Electrical Engineering at the National Research Council of Canada. It is a laser based scanner, consisting of a light source, a scanning mechanism, and a position sensor. The NRC device differs from other range finders in two significant ways. A new geometry for the triangulation has been devised which greatly reduces the shadow effect found in all triangulation based images. In conjunction with this, the scanning laser beam is synchronized with the detector so that accurate X and Y image coordinates are obtained for each surfel at a rate of 200 kHz.

The depth of a surface at each point is determined in the following way. Consider Figure 3.1. Whenever an object is encountered by the laser beam, P0 differs from P1 by an amount proportional to Δx. P0 is known because of the synchronization, and P1 is the detected point. The Z value can then be approximated from this relationship, which can only be used for small values of θ, as panoramic distortion is not considered. Details regarding this range finding device can be found in Rioux (1982).

Figure 3.1 Range finding technique using synchronized geometry.
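The exact synchronized-geometry relationship is developed in Rioux (1982) and is not reproduced here. As a rough illustration of the general idea behind triangulation based depth recovery, the following hypothetical C sketch assumes a conventional small-angle pinhole model with baseline b and effective focal length f; these parameters, and the formula itself, are assumptions and are not the NRC device's actual geometry:

```c
#include <stdio.h>

/* Hypothetical small-angle triangulation sketch (NOT the NRC geometry):
 * depth is inversely proportional to the displacement dx = p0 - p1 between
 * the expected position p0 (known from synchronization) and the detected
 * position p1, with assumed proportionality constant f*b. */
double depth(double p0, double p1, double f, double b)
{
    double dx = p0 - p1;      /* displacement on the position sensor */
    if (dx <= 0.0)
        return -1.0;          /* no (or invalid) displacement detected */
    return (f * b) / dx;
}

int main(void)
{
    /* illustrative numbers only */
    printf("z = %.1f units\n", depth(12.5, 10.0, 35.0, 100.0));
    return 0;
}
```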
3.3. Object Recognition

The essence of this thesis is to show that object recognition can be achieved in a constrained domain/environment from a single range image. Constraints on the domain of objects to be recognized are extensive for this research, but this does not significantly change the crux of the problem. There are two major constraints on the object domain. The objects must be polyhedral, and no object may touch or occlude another from the light source and detection viewpoints of the range finder.

The success rate for a viable robotics vision system must be near 100%. Many vision systems are a complex arrangement of reasoning and deduction regarding what appears in the scene. The more complex a vision system is, the more likely that an image will occur that it cannot interpret. This complexity has been caused by inadequate sources of information. The 2-D intensity image has been expected to reveal 3-D information, and elaborate systems have been developed with this goal in mind. With view models created from range images the explicit 3-D information can eliminate much complexity in the model building and matching stages.

Since we are working in a robotics environment with the camera directly overhead, there is a very restricted number of views possible for each object. Each view is referred to as a 'resting position' for that particular object. Thus, there may be several view models constructed for each object and placed in the knowledge base. The number of models required is dependent on the shape of the object. A cube, for example, has six resting positions, but only one view model is required because the object will appear the same in each position. In general, more models will be required for objects that are complex in shape. Each of these view models points to a 'real world' model which contains information pertaining to the object which is true regardless of the resting position. These properties may include the object's name, weight, purpose in an assembly task, etc. Figure 3.2 illustrates the entire structure of the knowledge base for view models.

The view model will be represented as a surface adjacency graph. It must be built automatically and placed in a knowledge base for later use in recognition. Automatic teaching of view models is an ideal method for object recognition. The features detected by the range finder and low level processing for a particular object view are likely to be very similar to subsequent 'sightings' of that object. This makes the model matching process very reliable. The objective of the implementation described in Chapters 4 to 7 is to demonstrate the ability of the range finder combined with view models to recognize simple objects with a success rate close to 100%.

Figure 3.2 Structure of the Knowledge Base. Each view model, represented as an augmented surface adjacency graph, has generic knowledge associated with it and points to a real world model containing non-visual information.

Chapter 4
Planar Surface Extraction from Range Images

4.1. Introduction

In Chapter 2 it was stated that for range images the region based method of information extraction is preferable to edge detection. This chapter describes the steps taken in the implementation to accomplish planar surface extraction.

All planar surfaces have constant x and y slopes, but only surfaces parallel to the image plane have constant range values. To identify all planar surfaces in the image using the same algorithm, we have chosen to compute the gradient space coordinates for each surfel from the Cartesian coordinates which are explicit in the range images. In this representation all planar surfaces can be detected by local consistency of both x and y slopes.

The range images contain a small amount of noise which must be removed, or at least made insignificant, before surfels can be considered individually. An operator is introduced which, when convolved with the image, results in a directional gradient value while simultaneously smoothing out high frequency noise.
4.2. Gradient Space

The gradient space (G), introduced by Mackworth (1973), is a coordinate system which facilitates the mapping of a three dimensional scene into two dimensions. The axes represent the x slope (usually labeled 'p') and the y slope (usually labeled 'q'). When a planar surface is mapped into G, it is represented as a point. More concisely stated: if a plane is defined by Ax + By + Cz + D = 0, it is mapped into G as (p, q) = (-A/C, -B/C). Gradient space has been used extensively in image analysis and has many interesting properties. A detailed description of gradient space and its applications can be found in Shafer et al. (1982).

4.3. Gaussian Smoothing

The range images used in this research were found to have very little noise. However, experimentation has shown that some smoothing would improve results. Marr and Hildreth (1980) have shown that local averaging based on the normal distribution can be used to create a low pass filter which is optimal in the tradeoff between spatial localization and maintaining a small variance in the frequency domain. In our case this low pass filter can be used to smooth high frequency noise from the range images. The one dimensional normal distribution function, often called the Gaussian function, is:

G(x) = (1 / √(2πσ²)) e^(-x² / 2σ²)

Hereinafter, the scale factor 1/√(2πσ²) will be ignored. In two dimensions, without the scale factor, the function is represented as:

G(x, y) = e^(-(x² + y²) / 2σ²)

This function is represented graphically in Figure 4.1(a). Our objective is to smooth the image and approximate the first partial derivative at each surfel in the image. An operator designed to perform these two tasks simultaneously is described in the next section.

Figure 4.1 (a) Two dimensional Gaussian function.

4.4. The Gradient Operator

Gaussian smoothing in two dimensions is implemented by creating a square matrix called a mask, which is a discrete representation of the normal distribution. The mask is used to convolve the image with the Gaussian function. The five by five Gaussian mask with a σ of 1 is as follows:

.0183  .0821  .1353  .0821  .0183
.0821  .3679  .6065  .3679  .0821
.1353  .6065  1      .6065  .1353
.0821  .3679  .6065  .3679  .0821
.0183  .0821  .1353  .0821  .0183

Notice that the mask is rotationally symmetric, and therefore non-directional. Taking the first partial derivative of the Gaussian function with respect to x results in:

∂G/∂x = -(x / σ²) e^(-(x² + y²) / 2σ²)

This function is represented graphically in Figure 4.1(b). There is one parameter, σ, which should be determined carefully when using real world images (Glicksman, 1982). In our controlled environment, approximately the same amount of high frequency noise is apparent in every image, thus σ was selected simply by visual examination. The 5 x 5 mask representing the first partial derivative of the Gaussian function with respect to x is as follows:

.0183  .0821  0  -.0821  -.0183
.0821  .3679  0  -.3679  -.0821
.1353  .6065  0  -.6065  -.1353
.0821  .3679  0  -.3679  -.0821
.0183  .0821  0  -.0821  -.0183

Figure 4.1 (b) First partial derivative of the Gaussian function.

When applied to a range image at a point P, the resulting value is an approximation of the slope in the x direction at P. A value to approximate the y slope at P can be determined by rotating the mask 90 degrees. (A mask slightly different from this was used, in error, in the implementation.)

This operator is similar to difference operators applied to intensity images by Sobel, Kirsch, and others (Ballard and Brown, 1982), with the added advantage of noise removal. It also returns values of large magnitude at points in the image where a jump edge is present. This is not surprising, since the operator is a discrete approximation to first order partial differentiation. A planar surface extraction algorithm based on local consistency will not ignore one of these jump edges.
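To illustrate how such an operator can be applied, the following C sketch is illustrative only, not the thesis's implementation: the image dimensions are those reported in Section 7.3, and the function and array names are assumptions. It applies the derivative-of-Gaussian mask (as a direct correlation) to produce the x slope (p) image; the y slope (q) image is obtained the same way with the mask rotated 90 degrees:

```c
#define ROWS 128
#define COLS 155

/* 5x5 first-derivative-of-Gaussian mask from Section 4.4 (sigma = 1) */
static const double dgx[5][5] = {
    { .0183, .0821, 0, -.0821, -.0183 },
    { .0821, .3679, 0, -.3679, -.0821 },
    { .1353, .6065, 0, -.6065, -.1353 },
    { .0821, .3679, 0, -.3679, -.0821 },
    { .0183, .0821, 0, -.0821, -.0183 },
};

/* Smooth and differentiate in one pass: p[r][c] approximates the x slope
 * at each interior surfel of the range image z. A large magnitude in the
 * output signals a jump edge rather than a valid slope. */
void gradient_x(const double z[ROWS][COLS], double p[ROWS][COLS])
{
    for (int r = 2; r < ROWS - 2; r++) {
        for (int c = 2; c < COLS - 2; c++) {
            double s = 0.0;
            for (int i = -2; i <= 2; i++)
                for (int j = -2; j <= 2; j++)
                    s += dgx[i + 2][j + 2] * z[r + i][c + j];
            p[r][c] = s;
        }
    }
}
```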
4.5. Two Stage Surface Extraction

4.5.1. Clustering

The first stage of surface extraction is a single top to bottom scan of the x and y slope images simultaneously. During this scan, neighbouring surfels which are similar in both x and y slope are clustered into a region and given a label. 'Similar' means to be within a threshold. (This threshold is currently set at approximately 15 degrees, which is too high for all but the simplest of domains; the reason for this high threshold is discussed in Chapter 7.)

For the clustering algorithm, the first row and column of the region map are defined to be region label 1. The x and y slope of each surfel S (excluding the first row and column) are compared to the x and y slope of its four preceding neighbours in the scan. These neighbours are denoted N1 to N4 in Fig. 4.2(a). If S is not similar to any of these neighbours, it is assigned a new region label. If it is similar to one or more neighbours, it is assigned the label of the neighbour with the minimum total magnitude difference in slopes. In an 8-connected manner, but considering only 4 neighbours, the scan proceeds to create a region map, and a data structure containing the number of surfels in each region, and the average x and y slopes for that region.

This scan cannot result in an adequate representation of surfaces in an image. Consider a 'U' shaped surface. This top down scan would at best result in two adjacent regions representing that surface. The purpose of this scan is to reduce the image to a region map and a data structure. Although there may be 500 or more of these regions, varying with image content, merging the regions will certainly be less computationally costly than merging at the surfel level.

Figure 4.2 (a) Clustering Neighbours.
Figure 4.2 (b) Merging Neighbours.

4.5.2. Merging the Regions

In this stage the region map, derived above, is scanned top to bottom repeatedly until all adjacent, similar regions are merged. At each point in the region map, the region label of the succeeding 4 neighbouring surfels is checked to see if there is an adjacent region present (N5 to N8 in Fig. 4.2(b)). The slopes of the adjacent region are checked for similarity, and similar regions are merged.

In the merging stage the scan is looking forward and down for adjacent regions, resulting in a complete 8-connected detection method when considered in combination with the clustering stage. An algorithm similar to this is given in Ballard and Brown (1982).
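A minimal C sketch of the clustering stage follows. It is illustrative only, not the thesis's implementation: the similarity threshold, array sizes, and names are assumptions, and the caller is assumed to have pre-labeled the first row and column as region 1 (with sufficient capacity in the region array):

```c
#include <math.h>

#define ROWS 128
#define COLS 155
#define SIM  0.27            /* hypothetical slope tolerance (~15 degrees) */

typedef struct {
    long   count;            /* number of surfels in the region */
    double sum_p, sum_q;     /* running sums, for average slopes */
} Region;

/* One top-to-bottom clustering pass (Section 4.5.1) over the p and q
 * slope images. label[][] must have its first row and column pre-set to
 * region 1, and *nreg must start at 1. */
void cluster(const double p[ROWS][COLS], const double q[ROWS][COLS],
             int label[ROWS][COLS], Region reg[], int *nreg)
{
    for (int r = 1; r < ROWS; r++) {
        for (int c = 1; c < COLS; c++) {
            /* the four preceding neighbours N1..N4 of the 8-neighbourhood */
            const int nr[4] = { r, r - 1, r - 1, r - 1 };
            const int nc[4] = { c - 1, c - 1, c, c + 1 };
            int best = -1;
            double bestd = 1e30;
            for (int k = 0; k < 4; k++) {
                if (nc[k] >= COLS) continue;       /* off the right edge */
                double dp = fabs(p[r][c] - p[nr[k]][nc[k]]);
                double dq = fabs(q[r][c] - q[nr[k]][nc[k]]);
                if (dp < SIM && dq < SIM && dp + dq < bestd) {
                    bestd = dp + dq;   /* keep minimum total difference */
                    best = label[nr[k]][nc[k]];
                }
            }
            if (best < 0)
                best = ++(*nreg);      /* not similar: start a new region */
            label[r][c] = best;
            reg[best].count++;
            reg[best].sum_p += p[r][c];
            reg[best].sum_q += q[r][c];
        }
    }
}
```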
4.6. Verification of Surfaces

The previous section has referred to clusters and regions, as opposed to surfaces, because some of the regions detected are not valid surfaces. There are two types of regions detected that do not correspond to surfaces in the scene. The first type is small regions caused by noise or the shadow effect, and these are eliminated by setting a threshold for the smallest region that will be accepted as a surface. The threshold used is 40 surfels. (It was found that this 40 surfel threshold performs well at removing unwanted regions. This is a domain and device dependent threshold that can only be determined by experimentation.) The second type of erroneous region is caused by the operator used to compute the gradients. These regions are characterized by having very large slopes. As the gradient operator passes over an edge or peak in the image, the resulting approximation to the first partial derivative is meaningless. The solution is to remove all regions with a very high slope. This was found to eliminate the erroneous regions without affecting valid long and narrow surfaces.

Chapter 5
Modeling the Objects

5.1. Introduction

A modeling scheme for rigid solids that is both descriptively and procedurally adequate (Glicksman, 1982) has been sought since the early 1970's for use in computer graphics, CAD/CAM, and scene analysis. (For a survey of these methods see Requicha, 1980.) The proper technique to use is dependent on the application, and the information available for building the model. In this chapter, more details of view modeling are disclosed.

A knowledge base containing the view models is created interactively by the user in 'Teach Mode'. The actual topological and geometric properties are collected automatically from the image, and the user is asked to interactively enter the non-visual object properties, if there is not already a 'real world' model for that object in the knowledge base.

5.2. View Modeling

The framework for view modeling is surface adjacency topology. A model of a view is represented by a surface adjacency graph. Falk (1972) uses a similar representation which he refers to as face adjacency graphs. To demonstrate this framework, Figure 5.1(a) is a sketch of an overhead view of a truncated pyramid. Fig. 5.1(b) is the surface adjacency graph for that object, which will serve as the skeleton for the model of this view. The nodes in the graph represent the surfaces, and the links represent surface adjacencies. Putting the meat on this skeleton involves extracting the geometric properties of the surfaces and adjacencies, which are augmented to the nodes and links, respectively. All properties to be used for model matching must be independent of the object's location and orientation with respect to its particular resting position in the image.

Figure 5.1 (b) Surface adjacency graph for the truncated pyramid sketched in Figure 5.1 (a).

The view model for an object is used to represent only one resting position for that particular object. The model will vary slightly as the object is placed in different locations in the image due to panoramic distortion. This variation is insignificant if the image plane is small compared to the distance from the image plane to the range finder. In our implementation the images are treated as orthographic projections, and this degradation toward the extremities of the image plane is ignored.

The entire adjacency network is summarized in another structure. This object knowledge is referred to as generic knowledge, as it contains information about the entire adjacency structure. The generic knowledge of the model is used for a preliminary scan of the knowledge base, determining which models should be considered as candidates for the model matching algorithms. This will be discussed in Chapter 6. The C programming language data structures used for a view model are given in Appendix A.
5.2.1. Surface Properties

The properties collected for each surface are the following:

1. Number of adjacent surfaces - determined during the trace of the object.
2. Projected area - measured in surfels.
3. Projected perimeter - measured in surfels.
4. Compactness = 4π (Projected Area) / (Projected Perimeter)²
5. Maximum height - measured in range image levels.
6. x slope = p as determined by the gradient operator.
7. y slope = q as determined by the gradient operator.
8. Magnitude of gradient = √(p² + q²)
9. Direction of gradient = atan(q/p)

If an intensity image were used rather than a range image, several of these properties, such as maximum height and slope values, would not be readily available. The model is very rich, even when considering that it is only for a single view. Note that properties such as area do not represent the true shape of that surface, but the projected area seen in the image of this view as extracted by the surface detection methods of Chapter 4.

5.2.2. Surface Adjacency Properties

The most important property regarding the relationship between two surfaces is whether a connecting edge exists. For adjacent surfaces, two more useful properties can be extracted:

1. Difference in magnitude of slope between the two surfaces.
2. Convexity of the surface adjacency, i.e., do the two surfaces form a convex edge, a concave edge, or does one surface partially occlude the other, resulting in a jump edge.

5.2.3. Generic View Model Properties

Properties which typify the entire view model are collected as the adjacency structure is being constructed. They are:

1. Number of surfaces visible in this view.
2. Total projected surface area for this view.
3. The maximum height of any surfel in the object.
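The Appendix A declarations are not reproduced in this excerpt. The following hypothetical C sketch shows what structures carrying the properties of Sections 5.2.1 to 5.2.3 could look like; all names and sizes are assumptions, not the thesis's actual code:

```c
/* Hypothetical view model structures following Sections 5.2.1-5.2.3.
 * The thesis's real declarations are in Appendix A (not reproduced). */
#define MAX_ADJ 16

struct surface;                   /* forward declarations */
struct real_world;

typedef enum { CONVEX, CONCAVE, JUMP } EdgeType;

typedef struct {
    struct surface *neighbour;    /* each adjacency stored once per node */
    double slope_diff;            /* difference in magnitude of slope    */
    EdgeType convexity;           /* convex, concave, or jump edge       */
} Adjacency;

typedef struct surface {
    int    n_adjacent;            /* number of adjacent surfaces         */
    long   area, perimeter;       /* projected, measured in surfels      */
    double compactness;           /* 4*pi*area / perimeter^2             */
    int    max_height;            /* in range image levels               */
    double p, q;                  /* x and y slope                       */
    double grad_mag, grad_dir;    /* sqrt(p*p + q*q), atan(q/p)          */
    Adjacency adj[MAX_ADJ];       /* links of the adjacency graph        */
} Surface;

typedef struct {
    int  n_surfaces;              /* generic knowledge (Section 5.2.3)   */
    long total_area;
    int  max_height;
    Surface *nodes;               /* the augmented adjacency graph       */
    struct real_world *object;    /* non-visual 'real world' model       */
} ViewModel;
```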
5.3. Adequacy Issues

5.3.1. Descriptive Adequacy

Descriptive adequacy for view modeling in a restricted domain is a different issue from typical solid modeling adequacy criteria. The objective of this modeling scheme is not to represent an object, but to represent a particular view of an object. The difference is significant. No hidden surfaces in a particular view are included in the view model, making it a 2-D description of what is in the scene. The representation is made more adequate by enriching this 2-D description with accurate 3-D geometric and topologic information. Since there is no attempt to actually create an object model, the question 'What objects can be accurately represented?' becomes 'What views, and of what objects, can be distinguished from each other without ambiguity?'. Ambiguity in view models can occur in three ways.

The first case is when two objects that are slightly different are not distinguished from one another because the range finder and low level processing have failed to detect this small difference. The degree of this ambiguity is dependent on the capabilities of the range finder and low level processing. The features recognized in our implementation are discussed in Chapter 7. Some standard for defining features of objects that can be distinguished in range images would be useful. No such standard has appeared in the literature.

The second case of ambiguity is when a particular view of an object is the same as a view of a different object. Although this very problem would make view modeling impractical in most vision systems, the fact that this scheme is designed for a robotics domain makes the deficiency less serious. When this ambiguity occurs the robot can easily remove the object from the work envelope, or alter its position, so that it can be recognized uniquely.

The third case is when two view models are the same, but the actual view is different. This very rarely occurs, and is a direct result of a deficiency in the implementation. Consider the two objects in Figure 5.2. Each object has corresponding surfaces of the same size and height. The height of the surfaces labeled S1 is greater than those labeled S2, and all surfaces are parallel to the image plane. Presently our view models for both of these objects will be the same. One possible solution to this inadequacy is to identify the edges in the scene, and specify connecting edges in terms of exactly which edge is shared by the adjacent surfaces. (This deficiency in the modeling method was not noticed until the final collection of results for this research, when exactly the case in Figure 5.2 occurred. This is yet another example of the program debugging the programmer.)

Figure 5.2 Two objects with the same view model.

Since there is no attempt to distinguish all solid objects, the main criterion for descriptive adequacy is application dependent. Is the representation descriptive enough to distinguish among all different objects in a domain? It is only useful, and necessary, to recognize objects which may be encountered.

5.3.2. Procedural Adequacy

The criteria for procedural adequacy of a modeling technique are more easily stated. First, the model must be designed so that all of its information is easily accessible. Second, it must lend itself to being processed efficiently. Even a model which contains all the necessary information is useless until it can be used in the application for which it was designed.

The view model, due mostly to its simplicity, meets the first criterion very well. All of the information on all surfaces, and the object, is readily available. Pointers are used to represent the undirected adjacency graph, and each adjacency is stored twice - once with each node, for immediate graph traversal.

There is one potentially serious procedural problem with the representation in satisfying the second criterion. Since the models are represented as graphs, model matching must be some form of graph matching. Determination of graph isomorphism is easy to implement, but theoretically it provides some serious difficulties. It is still an open issue as to whether the graph isomorphism problem is NP-complete (Garey and Johnson, 1979), so the algorithm which does this matching must be carefully guided heuristically, and should check for situations that are computationally explosive. For example, when two surface adjacency graphs have a large number of similar nodes, the matching algorithm described in the next chapter will be very inefficient. Our problem is complicated even more by the fact that we are not searching for an exact graph isomorphism, since the data may be incomplete.

Chapter 6
Model Matching

6.1. Introduction

The model matching procedure described in this chapter is performed in two stages. In the first stage, the generic knowledge is used to eliminate models in the knowledge base that are not close enough to the model which is to be recognized. This stage will be referred to as 'filter'. The second stage is to match the surface adjacency graph of the model to be recognized (Rmodel) to each of the models in the knowledge base that was not eliminated in the first stage. A scoring mechanism associated with this match is used to determine which of the models in the knowledge base best matches the Rmodel. This graph matching algorithm is computationally complex, and several constraints are described to guide the graph matching more efficiently. Barrow and Popplestone (1971) have used a similar method of model matching. They combined graph traversal with the Branch-and-Bound algorithm. Branch-and-Bound is not used in the graph matching algorithm described in this chapter, although it could be added to the constraints already on the algorithm to make it even more efficient.
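As an overview, the two stages can be pictured as a loop over the knowledge base, as in the following hypothetical C sketch. This is not the thesis's code: the filter thresholds are those given in Section 6.2, but the helper names, the relative-difference interpretation of "30% difference in area", and the ViewModel accessors are assumptions:

```c
#include <math.h>
#include <stdlib.h>

typedef struct view_model ViewModel;   /* sketched in Chapter 5 */

extern long   total_area(const ViewModel *m);    /* generic knowledge */
extern int    max_height(const ViewModel *m);    /* in sensor levels  */
extern double match_graphs(const ViewModel *r, const ViewModel *kb);

/* Stage 1 'filter' (Section 6.2): reject on generic knowledge alone. */
static int filter_ok(const ViewModel *r, const ViewModel *kb)
{
    double a1 = (double)total_area(r), a2 = (double)total_area(kb);
    if (fabs(a1 - a2) > 0.30 * fmax(a1, a2))      /* 30% area difference */
        return 0;
    if (abs(max_height(r) - max_height(kb)) > 10) /* 10 sensor levels    */
        return 0;
    return 1;
}

/* Stage 2: score every surviving candidate and keep the best match. */
const ViewModel *recognize(const ViewModel *rmodel,
                           const ViewModel *kb[], int nkb)
{
    const ViewModel *bestm = NULL;
    double best = -1.0;
    for (int i = 0; i < nkb; i++) {
        if (!filter_ok(rmodel, kb[i]))
            continue;                 /* rejected by generic knowledge */
        double s = match_graphs(rmodel, kb[i]);
        if (s > best) { best = s; bestm = kb[i]; }
    }
    return bestm;    /* assumed non-NULL in testing; see Section 7.3.3 */
}
```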
6.2. Filter the Knowledge Base

The knowledge base is scanned to eliminate models which are not to be considered in the second stage of the matching procedure. A significant difference in either the maximum height of the object, or total surface area, is cause to reject the possibility of a match. A 30% difference in area, or 10 sensor levels of height (approximately 3.5 mm), are the maximum differences allowed for a model to be considered in the next stage of matching.

6.3. Heuristic Augmented Graph Matching

Matching two augmented graphs is an interesting search problem. The goal is to determine if the two graphs are isomorphic, and the optimal isomorphism, if such a relationship exists. The problem is further complicated by the possibility of inaccurate or incomplete data, making the concept of 'almost isomorphic' meaningful. It is not practical to test all of the mappings from one graph into another. For graphs with n nodes there are n! such mappings. The search must be constrained in some manner. It is suggested in Korfhage (1974) that invariants in the graphs, such as the number of edges and vertices, be used as initial constraints. This is not a practical solution to guide a search where the data may be incomplete.

The search method used is a constrained, simultaneous depth first search of the two graphs being compared. It determines an optimal graph match, and is guided by three independent constraints: reliability, structural compatibility, and relational compatibility.

6.3.1. Reliability Constraint

At any stage during the depth first search there may be several adjacent nodes which could be selected as the next node to be considered in the search. The reliability constraint guides the search to 'prefer' the node which represents the surface that is most likely to be recognized in a scene. This surface is characterized by a large surface area, low magnitude of slope, and high compactness.

The purpose of this 'preference' for reliable nodes is to guide an imperfect match. All nodes will eventually be considered in a graph matching algorithm where the data used is complete, and in that case the order of the match is inconsequential. In our images a small or highly oblique surface may occasionally be missed by the low level processing. This node evaluation is used to minimize the effect of this error on the graph matching algorithm, and improve the search performance by avoiding mistakes in the models due to missing or erroneous information.

6.3.1.1. Scoring a Node for Reliability

A node can receive a maximum of 100 'reliability' points. If the surface area is greater than 500 surfels, then 50 points are awarded. If there are fewer than 500 surfels, then the number of surfels divided by 10 is awarded. 25 points are added in a similar manner based on the magnitude of the gradient for that surface. The compactness (which ranges from 0 to 1) times 15 is added, giving a maximum total, so far, of 90 points. The remaining 10 points are determined by the number of adjacent nodes. If a node has 10 or more adjacent nodes, 10 points are added. If there are fewer than 10, the number of adjacent nodes is added. Although the reliability of the surface does not depend on this property, it is included to guide the search toward nodes which have many adjacencies. These nodes are preferable because the algorithm will have more adjacencies to choose from in the subsequent step.
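Translated into code, the point scheme might look as follows. The area, compactness, and adjacency rules below are exactly as stated above; the text says only that the 25 gradient points are awarded 'in a similar manner', so the linear ramp and its cutoff GMAX are assumptions:

```c
/* Sketch of the node reliability score of Section 6.3.1.1 (max 100). */
double reliability(long area, double grad_mag, double compactness, int n_adj)
{
    double score = 0.0;

    /* up to 50 points for projected area */
    score += (area > 500) ? 50.0 : area / 10.0;

    /* up to 25 points for a LOW magnitude of slope; the linear ramp and
     * the cutoff GMAX are assumptions, not stated in the thesis */
    const double GMAX = 2.5;
    double g = (grad_mag > GMAX) ? GMAX : grad_mag;
    score += 25.0 * (1.0 - g / GMAX);

    /* compactness ranges from 0 to 1; times 15 */
    score += 15.0 * compactness;

    /* up to 10 points for the number of adjacent nodes */
    score += (n_adj >= 10) ? 10.0 : (double)n_adj;

    return score;
}
```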
If a node has 10 or more adjacent nodes, 10 points are added. If there are less than 10, the number of adjacent nodes is added. Although the reliability of the surface does not depend on this property, it is included to guide the search toward nodes which have many adjacencies. These nodes are preferable because the algo-rithm will have more adjacencies to choose from in the subsequent step. - 38 -6.3.2. Structural Constraint During the simultaneous depth first search, nodes are matched one at a time. After matching the initial pair of nodes, only nodes which are structur-ally compatible in the graphs will be considered for matching. For example, the second pair of nodes to be matched, must both be adjacent to their respective member of the initial pair. This structural constraint is inherent in the depth first search algorithm, making it a convenient method for reducing the number of nodes which must be compared. 6.3.3. Relational Constraint Once a pair of nodes has been deemed structurally compatible, they are compared to determine relational compatibility. This comparison is computed by examining the differences in the geometric properties of the surfaces which the nodes represent. If the comparison results in a score which is above a threshold, the nodes are relationally compatible. The comparison of nodes is detailed in section 6.4.3.1. The threshold is set at 50 points out of a max-imum 100, to ensure that no possible match is missed. 6.3.4. A Scoring Mechanism for the Match At each stage in the match it is necessary to know a score for the good-ness of the match so far. The score is determined by evaluating the difference between the nodes and links as they are traversed. A point system where a high score represents a good match has been devised for this purpose. Proper-ties which the range finder and low level processing found accurately are given higher weights in this point system than properties detected inconsistently. - 39 -The score used for comparison of nodes is the same formula used to determine relational compatibility of the two nodes. 6.3.4.1. Compare Nodes The comparison of nodes in the model scores the match between two sur-faces. The more reliable properties are given more weight. The score is based on per cent difference, with a maximum score of 100 points. The weights as-signed to each property used in the comparison are as follows: Surface Property Weight 1. Area 20 2. Number of Adjacent Surfaces 10 3. Perimeter 10 4. Compactness 15 5. Maximum Height 25 6. Magnitude of Gradient 20 Total 100 6.3.4.2. Compare Links The adjacencies between the surfaces, represented in the graph by links, are already considered in the matching algorithm in a structural sense during the graph traversal. The relational properties augmented to these links should also be considered in the final score relating the two view models being - 40 -matched. 50 points are awarded here for an exact match between adjacency properties. The total score is equally divided between the convexity factor, and the difference in magnitude of slope. 6.3.5. The Algorithm In this section, the algorithm for graph matching is summarized in pseudo-English. 'Best' refers to the node with the highest score using the node evaluation function. 'comp_nodes' refers to the score of the comparison of nodes in Section 6.2.2, and 'comp_links' refers to the score of the comparis-on of two links described in Section 6.2.3. 
6.3.5. The Algorithm

In this section, the algorithm for graph matching is summarized in pseudo-English. 'Best' refers to the node with the highest score using the node evaluation function. 'comp_nodes' refers to the score of the comparison of nodes in Section 6.3.4.1, and 'comp_links' refers to the score of the comparison of two links described in Section 6.3.4.2. Recall that a 'possible match' between two nodes indicates that they are in structurally compatible positions in the graphs, and relationally they have scored above a threshold using the 'comp_nodes' function.

Driver:
    Determine the 'best' node in Rmodel, call it BEST.
    Determine 'possible' matching nodes in KBmodel for BEST.
    For each 'possible' matching node:
        Call Depth First Search(BEST, possible KBnode, Rvisit, KBvisit, score)

Matcher:
    [1] Depth First Search(Rnode, KBnode, Rvisit, KBvisit, score)
        (Rvisit and KBvisit are arrays of nodes that have already been considered.)
    [2] Visited(Rnode, Rvisit); Visited(KBnode, KBvisit)
        (Record that these nodes have been considered.)
    [3] score = score + comp_nodes(Rnode, KBnode)
    [4] For all of Rnode's unvisited adjacent nodes:
    [5]     TEMP = 'best' of Rnode's unvisited adjacent nodes
    [6]     For all 'possible' matches for TEMP from among KBnode's unvisited adjacent nodes:
    [7]         Store a copy of this environment.
    [8]         Depth First Search(TEMP, a KBnode 'possible' match, Rvisit, KBvisit, score)
                (The KBnode which returns the largest score in this loop is the one to continue with - call it MATCH.)
    [9]         Replace the old environment stored above.
    [10]    score = score + comp_nodes(TEMP, MATCH)
    [11]    score = score + comp_links(Rnode, TEMP, KBnode, MATCH)
    [12]    Visited(TEMP, Rvisit); Visited(MATCH, KBvisit)
    [13] The maximum score that appears here at the end of the recursive routine is the value of the match between these two models. This score must be saved globally.
Chapter 7

Results and Evaluation

7.1. Criteria for Evaluation

Ultimately an object recognition system for robotics is judged on successful operation in an actual working environment. For the purposes of research, other criteria must be established. One way is to choose a domain and test recognition performance. The success rate, however, will never be 100% for all environments: there will always be objects which are too similar to be distinguished, even if the difference is very minute. It is also useful to judge the system at intermediate stages of processing, to identify where the method has failed.

The success of a vision system for object recognition depends on the scene features that can be discerned. For a region-based system, the smallest surface that can be successfully identified should be recorded. This surface must be smaller than any distinguishing feature in the domain.

The time required to recognize objects is very important in robotics applications. The robot must work quickly enough to make its presence financially viable.

Comparison with research done in other laboratories is often useful. Several papers on range data processing have been published using the same test image (Gil et al. (1983), Nitzan et al. (1977), Duda et al. (1979)). Unfortunately, the information and features in that scene cannot be characterized in terms of features present, data-to-noise ratio, or any other quantitative measure that would allow the performance of the methods to be compared.

7.2. Strengths and Weaknesses

7.2.1. Specular Reflectance

Some problems inherent in range images have not yet been discussed herein. Occasionally, if a surface is very smooth and positioned normal to the laser beam, the reflection from the surface will be too intense for the sensors to detect properly. This problem, due to specular reflectance, can be corrected by reducing the intensity of the light source whenever the reflectance is detected to be over a threshold. Research into this feedback system is just beginning for use with the NRC range finder. Specular reflectance is demonstrated in Figure 7.1, where the intensity image is displayed and the specular reflectance is clearly visible. The effect of this error in the range image is shown by the planar surfaces that can be detected, in Figure 7.2. An image with very little specular reflectance is shown as the right half of both Figures 7.1 and 7.2.

Figure 7.2: The surfaces extracted from the range image corresponding to Figure 7.1. The surface most affected by specular reflectance is noticeably deficient.

7.2.2. Shadow Effect

The second problem caused by the range finding method is called the shadow effect. This occurs when the laser beam strikes the scene in a location which is not visible to the sensor, or when a point in the scene can be viewed by the sensor but not reached by the laser. With the NRC range finder, with its compact head and new geometry, the shadow effect is reduced considerably.

Figure 7.3 shows one of the shadows visible in the test images, and Figure 7.4 indicates the effect of this error after regions are detected. Although it may not be clearly visible in the photograph in Figure 7.4, a small erroneous region is detected and is assumed to be part of the object, when it is in fact a region caused by the shadow effect.

Figure 7.3: The shadow effect has been greatly reduced by the NRC range finder's new scanning mechanism. This photograph is of a range image displayed as intensity; the lighter areas are nearer the range finder. Some shadow effect is present in the image, although it may not be visible in the photograph.

Figure 7.4: Occasionally a shadow is included as part of an object in a view model. This photograph depicts the surfaces detected in Figure 7.3. There are two small erroneous surfaces.

It was discovered that these shadows are so minimal in our images that there is no significant impairment of the ability to do object recognition. As described in Chapter 6, the model matching algorithm is guided by an evaluation function. The regions caused by the shadow effect are consistently scored very low by this function, and therefore the matching process is only slightly affected.

7.2.3. Calibration and Panoramic Distortion

The NRC range finder is too complex to describe here; in Rioux (1982) the design of the camera head is explained. The mirrors used to direct the laser beam are arranged in a manner that allows the device to be very fast, but the resulting image is slightly distorted. This distortion is, so far, mathematically intractable: it includes a combination of non-linear transforms in the X-Y, X-Z, and Y-Z planes, all of which vary in different parts of the image. The distortion did not affect our basic research goals, but it did make the problem of adjusting detected surface properties for panoramic distortion impractical at this time. Because of this, the accuracy of each step of the processing has suffered, and the thresholds were required to be more lenient to error. Even these unsolved problems did not prevent the methods described in the previous chapters from achieving a satisfactory rate of success.

When the images are properly calibrated and perspective error is removed, the methods described in Chapter 4 will be capable of distinguishing much smaller surfaces, and the accuracy of the geometric properties extracted from the scene will be significantly improved.
7.3. Performance

Performance of the system is measured in terms of the three criteria mentioned previously in this chapter: the CPU time required by the algorithms, the image features detected, and the success rate of object recognition. The methods described in Chapters 4-6 were implemented on a VAX 11/780 in the C programming language. The CPU times recorded were provided by the 'TIME' command built into the UNIX operating system. The images used are 128 x 155 surfels, with a resolution of approximately 1.1 mm.

7.3.1. CPU Time Requirements

The processing is divided into three stages for a more informative analysis of the CPU time required:

1. Simultaneous smoothing and mapping of the image to gradient space.
2. Planar surface extraction.
3. Model building and matching.

Stage 1 is computationally very costly: for each image the CPU time required is 40.0 seconds.

The time required for stage 2 varies with the content of the image. The CPU time was recorded for images with 1 to 4 objects. The average time per image is 35.4 seconds, which amounts to about 11.8 seconds per object found in the scene.

For stage 3, timing also depends largely on image content. The average time for model construction and matching is 13.8 seconds per image, or 6.53 seconds for each object.

The total average processing time per image is approximately 1.5 minutes. This is a considerable distance from real-time object recognition. The concept of 'flexible hardware' (Shirai, 1979) for real-time low-level image analysis should all but eliminate the first two, most costly, stages. The model construction and matching stage could easily lend itself to a multiprocessor implementation, with each model to be matched executing on a separate processor.

7.3.2. Image Features Detected

In the simple polyhedral domain, image features such as surfaces are relatively easy to measure. Since all features in our test images were detected, the only figures that can be given as a measure of performance are the smallest features found in the domain. More rigorous testing should be conducted when the calibration of the range finder is complete.

The danger of region merging methods such as the one described in Chapter 4 is that they tend to merge or grow further than intended. The measure of a good region detection method in range images is the smallest angle between planar surfaces which the method can detect. The smallest such dihedral angle in the test images used is approximately 20 degrees. The smallest surface is approximately 40 mm² in area.

7.3.3. Object Recognition Statistics

Currently, the assumption is made that every object seen will have a corresponding view model somewhere in the knowledge base. This could easily be changed to a required minimum threshold for a match between two models; our results would have improved slightly with this addition.

For the purpose of testing the system, 10 models were constructed, each from a different image, and placed in the knowledge base. Nine images were 'shown' to the system, containing a total of 19 randomly placed (but not touching) objects. Sixteen of the 19 were successfully recognized, a success rate of about 84%. Of the three that failed, one was due to specular reflectance, which caused the KBmodel to be poorly constructed. The second was due to the descriptive inadequacy of the modeling technique; the specific fault was described in Section 5.3.1, where a straightforward solution is proposed.
The third case would not have been an error if we had set thresholds for successful matching, instead of assuming that the highest scoring match is correct. The object to be recognized had no corresponding KBmodel, but it was matched to a similar object with a fairly low score.

Chapter 8

Extensions and Conclusions

8.1. Extensions to the Implementation

There are many possible extensions to the implementation described in this thesis. We are firmly committed to the principle of a 100% success rate before extensions are made and constraints on the domain/environment are removed.

The constraint that objects may not touch or occlude one another should be removed. A method for this is discussed in Oshima and Shirai (1983). They allow for partial occlusion of objects, and analyze what is seen as well as what is expected in the occluded area. This method will eventually fail, depending on the amount of occlusion, and some alternate method will have to be used; the problem is exacerbated if it is impossible to determine when the method has failed. The extension that we propose will remove the constraints on objects touching or occluding each other: when an object or group of objects is not recognized, the robot should be programmed to simply push those objects apart, or pick one of them up and discard it from the work envelope. This is not a glamorous or scientific solution, but it may be the most practical one. When searching for a 100% success rate, all of the tools available must be used to their best advantage; everything that can be found in the proverbial 'bag of tricks' should be used.

The bin-of-parts problem can also be solved using such a practical solution. If the robot has the capability, it should simply pick up the bin of parts and dump it onto a flat surface. If that is not feasible, the robot can pick an object from the bin and place it on a flat surface, where it can be recognized by the vision system. Boissonnant and Germain (1981) have proposed a method of picking objects from a bin without actually being able to recognize the objects. This is done by identifying shapes of partially occluded objects which are the correct size for the gripper and appear to be stable, i.e., from which it appears that the gripper will not slip.

Since the world is not constructed of polyhedra, it will eventually be necessary to remove this constraint as well. Several methods of doing so have been proposed. One solution is to approximate all surfaces with planar patches. This idea cannot be considered in conjunction with the methods described in Chapters 4-6: if a curved surface is approximated as a large number of planar surfaces, the graph matching algorithm will become unsuitable for matching models of such complexity. A more appropriate solution is to detect and model the curved surfaces, as is done in Agin and Binford (1973). The industrial environment will almost always provide regularly curved surfaces, and with range images the ambiguity found when modeling curved surfaces from intensity images is removed.

8.2. Conclusions

The intent of this research was not to produce an end product for industrial robotics. Our objective has been to design an approach to computational vision, specifically for range images, that eludes some of the factors that have prevented the widespread use of intelligent robots. The system implemented provides a new modeling method, low-level algorithms to create the models, and high-level model matching techniques.
The success rate for object recognition in the constrained domain/environment that was simulated is quite encouraging. Although only 84% of the objects were successfully recognized, the reasons for the 16% that did not succeed have been identified, and straightforward methods for correcting these errors are presented.

References

[1] Abdelmalek, Nabih N. (1983), Robotics Bibliography: 1981-1982, Part B, Robotic Research and Development, ERB-960, National Research Council of Canada, Ottawa, Canada.

[2] Agin, G.J. and Binford, T.O. (1973), Computer Description of Curved Objects, Proc. Third International Joint Conference on Artificial Intelligence, 629-640.

[3] Ballard, D.H. and Brown, C.M. (1982), Computer Vision, Prentice Hall, Englewood Cliffs, New Jersey.

[4] Barrow, H.G. and Popplestone, R.J. (1971), Relational Descriptions in Picture Processing, Machine Intelligence 6, 377-396.

[5] Boissonnant, J.D. and Germain, F. (1981), A New Approach to the Problem of Acquiring Randomly Oriented Workpieces out of a Bin, Proceedings of 7th IJCAI, 796-802.

[6] Duda, R.O., Nitzan, D. and Barrett, P. (1979), Use of Range and Reflectance Data to Find Planar Surface Regions, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-1(3), 259-271.

[7] Falk, G. (1972), Interpretation of Line Data as a Three Dimensional Scene, Artificial Intelligence 3(2), 101-144.

[8] Gil, B., Mitiche, A. and Aggarwal, J.K. (1983), Experiments Combining Intensity and Range Edge Maps, Computer Vision, Graphics, and Image Processing 21, 395-411.

[9] Garey, M.R. and Johnson, D.S. (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman and Co., San Francisco.

[10] Glicksman, J. (1982), A Cooperative Scheme for Image Understanding Using Multiple Sources of Information, University of British Columbia Technical Report TN 82-13.

[11] Hebert, M. and Ponce, J. (1982), A New Method for Segmenting 3-D Scenes into Primitives, Proceedings of the 6th International Conference on Pattern Recognition, Munich, Germany, 836-838.

[12] Henderson, T.C. and Bhanu, B. (1982), Three Point Seed Method for Extraction of Planar Faces from Range Data, Conference Record of the 1982 Workshop on Industrial Applications of Machine Vision, 181-186.

[13] Inokuchi, S., Nita, T., Matsuda, F. and Sakurai, Y. (1982), A Three Dimensional Edge-Region Operator for Range Pictures, Proceedings of the 6th International Conference on Pattern Recognition, Munich, Germany, 918-920.

[14] Jarvis, R.A. (1983), A Perspective on Range Finding Techniques for Computer Vision, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-5, No. 2.

[15] Korfhage, R.R. (1974), Discrete Computational Structures, Academic Press.

[16] Mackworth, A.K. (1973), Interpreting Pictures of Polyhedral Scenes, Proceedings of Third IJCAI, 556-563.

[17] Mackworth, A.K. (1974), Use of Constraints for Interpreting Three-Dimensional Scenes, Ph.D. thesis, University of Sussex.

[18] Mackworth, A.K. (1976), Model Driven Interpretation in Intelligent Vision Systems, Perception, Vol. 5, 349-370.

[19] Marr, D. and Hildreth, E. (1979), Theory of Edge Detection, Massachusetts Institute of Technology AI Memo No. 518.

[20] Mitiche, A. and Aggarwal, J.K. (1982), Detection of Edges Using Range Information, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-5, No. 2.
[21] Morgenthaler, D.G. and Rosenfeld, A. (1981), Surfaces in Three-Dimensional Digital Images, Information and Control 51, 227-247.

[22] Nitzan, D., Brain, A. and Duda, R. (1977), The Measurement and Use of Registered Reflectance and Range Data in Scene Analysis, Proceedings of the IEEE, Vol. 65, No. 2.

[23] Oshima, M. and Shirai, Y. (1983), Object Recognition Using Three-Dimensional Information, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-5, No. 4.

[24] Popplestone, R.J., Brown, C.M., Ambler, A.P. and Crawford, G.F. (1975), Forming Models of Plane-and-Cylinder Faceted Bodies from Light Stripes, Proceedings of 4th IJCAI, 664-668.

[25] Requicha, A.A.G. (1980), Representations for Rigid Solids: Theory, Methods and Systems, Computing Surveys, Vol. 12, No. 4, 437-464.

[26] Rioux, M. (1982), Compact, High Speed 3-D Vision Sensor for Robotic Applications, 2nd Canadian CAD/CAM and Robotics Conference Proceedings, Toronto, 4.11-4.16.

[27] Shafer, S.A., Kanade, T. and Kender, J. (1982), Gradient Space Under Orthography and Perspective, IEEE 1982 Workshop on Computer Vision: Representation and Control, 26-34.

[28] Shirai, Y. (1979), On Application of Three Dimensional Computer Vision, Bull. Electrotech. Lab. (Japan), Vol. 43(6), 358-377.

[29] You, K.C. and Fu, K.-S. (1979), A Syntactic Approach to Shape Recognition Using Attributed Grammars, IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-9, No. 6.

Appendix A

/* The data structures representing the knowledge base. */

/* The link between two adjacent surfaces */
struct adj_surf {
    float x_diff, y_diff;         /* difference in x_slope and y_slope
                                     of the two surfaces                */
    float mag_diff;               /* magnitude of slope difference      */
    short int convex;             /* is this a convex intersection      */
    struct surface *adj;          /* pointer to the adjacent surface    */
    short int ai;                 /* an index for storage               */
};

/* A surface and its attributes */
struct surface {
    short int surf_label,         /* which surface is this              */
              no_adjsurfs,        /* count of adjacent surfaces         */
              area,               /* area in surfels                    */
              perimeter,          /* perimeter in surfels               */
              max_height;         /* max height of surface              */
    float compact,                /* compactness of surface             */
          x_slope, y_slope;       /* approx. to first partial derivs    */
    double magnitude,             /* magnitude of gradient              */
           gradient;              /* direction of gradient              */
    struct adj_surf next_surf[8]; /* links to adjacent surfaces         */
};
/* The object structure contains the generic knowledge for the
   particular view model. */
struct object {
    char viewname[10];            /* name the view for the report       */
    short int no_surfs,           /* surfaces in this object            */
              tot_area,           /* total area of all surfs            */
              max_height;         /* max height of object               */
    struct surface surf[15];      /* representation of surfaces         */
    struct world_mod *model;      /* point to correct world model       */
    short int mi;                 /* model index                        */
};

/* The world model, which each orientation of the same object refers to. */
struct world_mod {
    char ob_name[20];             /* identify the object by name        */
    float grip[4];                /* robot grip locations and other
                                     non-visual information can be
                                     stored here.                       */
};

/* Create the world, and make it global.  Max of 30 machined models for now. */
struct object piece[30];          /* piece[i] is a machined model       */
struct world_mod act_mod[15];     /* the actual models                  */
short int no_objects, world_obs;  /* the current no. of machined and
                                     actual models                      */
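As a usage illustration, the following sketch shows how a small view model could be entered into these structures after segmentation. It is not part of the thesis implementation: the view name, all property values, and the choice of a two-surface object are invented for the example; only the structure layout comes from the declarations above.

/* Illustrative only: populate the knowledge base with a hypothetical
   two-surface view model.  All numeric values here are invented.      */

#include <math.h>
#include <string.h>

void example_build(void)
{
    struct object  *ob = &piece[no_objects++];
    struct surface *s0 = &ob->surf[0], *s1 = &ob->surf[1];

    strcpy(ob->viewname, "blockA");            /* hypothetical name     */
    ob->no_surfs = 2;

    s0->surf_label = 0;  s0->area = 900;  s0->perimeter = 120;
    s0->max_height = 55; s0->compact = 0.63f;
    s0->x_slope = 0.00f; s0->y_slope = 0.00f;  /* horizontal top face   */

    s1->surf_label = 1;  s1->area = 400;  s1->perimeter = 84;
    s1->max_height = 55; s1->compact = 0.71f;
    s1->x_slope = 1.20f; s1->y_slope = 0.05f;  /* slanted side face     */

    /* record the adjacency between the two surfaces, together with the
       relational properties that augment the link                      */
    s0->no_adjsurfs = s1->no_adjsurfs = 1;
    s0->next_surf[0].adj      = s1;
    s0->next_surf[0].convex   = 1;             /* a convex intersection */
    s0->next_surf[0].x_diff   = s1->x_slope - s0->x_slope;
    s0->next_surf[0].y_diff   = s1->y_slope - s0->y_slope;
    s0->next_surf[0].mag_diff = (float)hypot(s0->next_surf[0].x_diff,
                                             s0->next_surf[0].y_diff);

    ob->tot_area   = s0->area + s1->area;
    ob->max_height = 55;
    ob->model      = &act_mod[0];              /* its world model       */
}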
