ROBUST 3D OBJECT DETECTION AND FEATURE EXTRACTION FOR COOPERATIVE MULTI-ROBOT TASKS

by

Arunasiri Kananka Liyanage

B.Tech. (Eng.), The Open University of Sri Lanka, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF APPLIED SCIENCE
in
The Faculty of Graduate Studies
(Mechanical Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
June 2010

© Arunasiri Kananka Liyanage, 2010


Abstract

Computer vision involves image processing, image understanding, and feature extraction, all of which are vital in robotic tasks. This research is an integral part of a larger project on human rescue robotics, in which a group of heterogeneous robots must quickly locate objects in an emergency scenario and assemble them into a useful device. Hence, the vision system should be fast and capable of working in an unstructured, dynamic, and unknown environment. Since there may be considerable variation in the objects and the environment, robustness is crucial.

A novel vision system architecture is proposed and developed in this research to fulfill the vision requirements of a multi-robot system. Appropriate approaches, techniques, and structures are proposed and implemented, together with suitable existing methods and their enhancements. An object modeling approach is proposed and used to generate object models. These models are used with a proposed object detection method to identify objects and determine useful features and parameters. Another object detection method is proposed to detect objects with regular geometric shapes. The proposed methods are able to detect multiple objects under varying object properties and environmental factors.

Different types of object detection methods are employed in the proposed system, according to the requirements of a robot, by utilizing a real-time method selection technique developed in the thesis. Achieving the expected level of performance involves a tradeoff between speed and accuracy, which is managed by controlling the execution of the processing steps in the developed method. Properties of the expected objects need to be defined as facts and constraints based on the requirements of the robots. The performance of the vision system can be enhanced by providing more facts and constraints.

The developed methodologies are implemented in an experimental system in the Industrial Automation Laboratory of the University of British Columbia. Rigorous experiments are conducted in a typical unstructured environment. Invariance to scale, rotation, illumination, and occlusion is tested with different types of objects, for the various methods. Generally good results have been obtained, thereby validating the developed vision system for use in the multi-robot application.


Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Abbreviations
Acknowledgments
Dedication
Chapter 1 - Introduction
  1.1 Application Domain
  1.2 Computer Vision Requirements for a Multi-robot System
  1.3 Motivation
  1.4 Objectives
  1.5 Challenges
  1.6 Review of Previous Work
    1.6.1 Classification of Previous Work
      1.6.1.1 Local Region Detectors
      1.6.1.2 Comparison of Local Region Detectors
      1.6.1.3 Local Region Descriptors
      1.6.1.4 Comparison of Local Region Descriptors
    1.6.2 Incorporated Approaches and Techniques
  1.7 Organization of the Thesis
Chapter 2 - Vision System Architecture
  2.1 Overview
  2.2 Adaptive Behavior of the Vision System
    2.2.1 Image Feature Enhancement
    2.2.2 Forming or Training Object Model
    2.2.3 Facts and Constraints for the Objects
    2.2.4 Real-Time Object Detection
    2.2.5 Object Property Extraction
  2.3 Significant Features of the System
  2.4 Summary
Chapter 3 - Core Technologies
  3.1 Overview
  3.2 Image Feature Enhancement Techniques
  3.3 Object Modeling
    3.3.1 Features of an Object
      3.3.1.1 Haar-like Feature
      3.3.1.2 SIFT Feature
      3.3.1.3 Corner-like Feature
      3.3.1.4 Other Features
    3.3.2 3D Invariant Feature Space
    3.3.3 AdaBoost-based Object Model
    3.3.4 Other Methods
  3.4 Object Detection Methods
    3.4.1 Generative and Discriminative Approach
      3.4.1.1 Object Detection using AdaBoost-based Object Model
    3.4.2 Feature-based Approach
      3.4.2.1 Object Detection using 3D Invariant Feature Space Model
    3.4.3 Geometric Shape-based Approach
      3.4.3.1 Detection and Property Extraction of Regular Objects
  3.5 Summary
Chapter 4 - Experimental Investigation
  4.1 Experimental System
    4.1.1 Run-Time Software Platform
    4.1.2 Computer and Robot Hardware
    4.1.3 Camera Hardware
    4.1.4 System Implementation
    4.1.5 Properties of the Environment
    4.1.6 Objects Used in Experiments
  4.2 Experimentation
    4.2.1 AdaBoost-based Object Modeling
      4.2.1.1 Experiments with Object Detection using AdaBoost-based Object Model
        4.2.1.1.1 Location-Invariant Behavior
        4.2.1.1.2 Orientation-Invariant Behavior
        4.2.1.1.3 Invariance to Location, Orientation and Lighting Combination
    4.2.2 Experiments with Object Detection using 3D Invariant Feature Space Model
      4.2.2.1 Experiments with Different Variation Factors
        4.2.2.1.1 Effect of Misidentified Features
        4.2.2.1.2 Varying the Lighting Condition, Object Location and Orientation
        4.2.2.1.3 Different Orientations and Close Camera Position
        4.2.2.1.4 Varying the Camera Angle
        4.2.2.1.5 Occlusion of Object
        4.2.2.1.6 Detection of Different Types of Multiple Objects
    4.2.3 Experiments with Detection and Property Extraction of Regular Objects
      4.2.3.1 Experiments with Different Variation Factors
        4.2.3.1.1 Objects with Different Orientations
        4.2.3.1.2 Variation of Camera Angle
        4.2.3.1.3 Object at a Distant Location
        4.2.3.1.4 Errors in Parameter Extraction
        4.2.3.1.5 Multiple Objects of Different Radius
        4.2.3.1.6 Detection of Rectangular Objects
        4.2.3.1.7 Requirement of Facts and Constraints
        4.2.3.1.8 Detection of Different Type of Multiple Objects: Circular and Rectangular
  4.3 Summary
Chapter 5 - Conclusions
  5.1 Synopsis and Contributions
  5.2 Possible Future Work
Bibliography


List of Tables

Table 1.1: Summary of the Detector Category, Invariance Properties and Individual Ratings for Runtime, Repeatability and the Number of Obtained Regions
Table 1.2: Summary of the Descriptor Category, Rotational Invariance Property, Dimensionality of the Descriptors and an Individual Performance Rating
Table 4.1: Comparison of Object Detection Methods with Results


List of Figures

Figure 1.1: Classification of Useful Methods
Figure 2.1: Vision System Architecture of the Multi-robot System
Figure 3.1: (a) Haar Wavelets, (b) Haar-like Features
Figure 3.2: Illustration of the SIFT Descriptor Calculation
Figure 3.3: Boosting Algorithm to Form a Strong Classifier
Figure 3.4: Predefined Feature Sets
Figure 3.5: AdaBoost Algorithm
Figure 3.6: Flow Diagram of Object Detection using AdaBoost-based Object Model
Figure 3.7: Algorithm of Object Detection using AdaBoost-based Object Model
Figure 3.8: Flow Diagram of Object Detection using 3D Invariant Feature Space Model
Figure 3.9: Object Detection using 3D Invariant Feature Space Model
Figure 3.10: For a Given List of Features, the Algorithm will Construct a Balanced KD_Tree Containing these Features
Figure 3.11: Nearest_Neighbor KD_Feature_Tree Search Algorithm
Figure 3.12: Simplified Process of Polygon and Circular Type of Object Detection
Figure 3.13: Polygon and Circular Type of Object Finding Algorithm
Figure 4.1: (a) The Pioneer™ 3-DX Mobile Robot, (b) The Pioneer™ 3-AT Mobile Robot
Figure 4.2: System Implementation Diagram
Figure 4.3: The Laboratory Experimental Environment
Figure 4.4: Sample Object Images in the Training Sets: (a) Positive Image Set of Wheel Object; (b) Negative Training Image Set
Figure 4.5: Correctly Oriented Wheels are Identified when the Location is Varied (the Object is Present in Different Scales)
Figure 4.6: Similar Type of Wheels are Identified. A Single Object is Identified Even Though There are Many Such Objects (this Method Can Detect Only One Object at a Time)
Figure 4.7: Detection of Wheel Under Different Lighting Conditions
Figure 4.8: Results Before using Feature Distribution in the Object Detection Method
Figure 4.9: The Wheel (Object) Placed at Different Locations, Under Different Lighting and Orientations
Figure 4.10: Camera is Placed Close to the Wheel and the Object Assumes Different Orientations
Figure 4.11: Camera Angle is Varied
Figure 4.12: The Wheel is Partly Occluded
Figure 4.13: Mostly One Feature is Detected for the Wheel of the Waste Container
Figure 4.14: Effect of Orientation in the Detection
Figure 4.15: Camera Angle is Continuously Varied
Figure 4.16: Scale-Invariant Behavior of the Method
Figure 4.17: Radius of the Detected Object is Shown by the Corresponding Circle Drawn at the Found Location
Figure 4.18: Detection of Multiple Objects
Figure 4.19: Different Types of Rectangular Objects are Detected
Figure 4.20: Lack of Facts and Constraints Leads to Reduced Accuracy
Figure 4.21: Detection of Different Types of Multiple Objects Under Different Variations of Objects and Background


Abbreviations

3D      Three Dimensional
ANN     Artificial Neural Network
CCD     Charge Coupled Device
DoG     Difference of Gaussian
EBR     Edge Based Region
EBSR    Entropy Based Salient Region
GLOH    Gradient Location-Orientation Histogram
IAL     Industrial Automation Laboratory
ICA     Independent Component Analysis
IBR     Intensity Based Region
KLT     Karhunen–Loève Transform
LDA     Linear Discriminant Analysis
MSER    Maximally Stable Extremal Region
NMF     Non-negative Matrix Factorization
PCA     Principal Component Analysis
SVM     Support Vector Machine
SIFT    Scale-Invariant Feature Transform
SURF    Speeded Up Robust Feature


Acknowledgements

First, I wish to express my sincere appreciation to my supervisor, Professor Clarence W. de Silva, Director of the Industrial Automation Laboratory at The University of British Columbia, for his rigorous supervision, advice, constructive input, support, and help provided during my Master's degree program and research. These include, to name a few, supervising my research, suggesting and sharpening the research ideas, helping to solve research problems, carefully revising and improving my writing, providing resources, and honing my communication skills. In a word, I am tremendously grateful to Professor de Silva for his skillful guidance and unwavering support of my academic and career goals. His academic accomplishments and humanitarian efforts are a true source of inspiration to me, and I consider myself very lucky to study under his guidance.
My sincere gratitude goes to Dr. Lalith Gamage, a former Ph.D. student of the Industrial Automation Laboratory and currently a Visiting Professor, and to Professor Mu Chiao, who have kindly served as members of my research committee and provided me with constructive comments to improve the standard of my research and the thesis.

I wish to thank the Visiting Professors and my colleagues in the Industrial Automation Laboratory; notably, Dr. Tahir Khan, Dr. Ying Wang, Dr. Qingshan Zeng, Dr. Farbod Khoshnoud, Dr. Saeed Behbahani, Srinivas Raman, Guan-Lu Zhang, Behnam Razavi, Gamini Siriwardana, Ramon Campos, Tufail Khan, Edward Yanjun Wang, Roland Haoxiang Lang, Mohammed Alrasheed, and Madalina Wierzbicki. My gratitude also extends to the visiting scholars Steven Chu, Miller Lu, and Joe Huang in the Industrial Automation Laboratory for their continuous encouragement, valuable suggestions, and comments. In particular, Dr. Tahir Khan has been extremely helpful in providing constructive ideas during critical situations. Further, I wish to thank the academic staff (especially Dr. C.W. de Silva, Dr. Ryozo Nagamune, Dr. Jim Little, and Dr. Dan Gelbert) for nurturing me with new technologies and concepts, and also Ms. Yuki Mastumura for her patience and kind advice given to me when I needed administrative help.

Financial assistance for my research, including personnel, equipment, and other resources, was provided by research grants held by Professor de Silva, particularly through the Tier 1 Canada Research Chair (CRC), the Canada Foundation for Innovation (CFI), the British Columbia Knowledge Development Fund (BCKDF), and the Natural Sciences and Engineering Research Council (NSERC) of Canada Special Research Opportunity (SRO) and Discovery Grants. In addition, I am greatly indebted for the Graduate Entrance Scholarship and the International Partial Tuition Scholarship awards given to me by the University of British Columbia. Had Professor de Silva not appointed me as a research assistant and provided financial support for my research, it would not have been possible to carry out the present research.

I wish to gratefully acknowledge the researchers in the computer vision area and the contributors to OpenCV™ at Intel Corporation for providing an open-source computer vision library.

Further, I take this opportunity to thank all my colleagues, both in Canada and Sri Lanka; especially Eranda Harinath and his wife Ruvini Sudharma, Nalantha Wanasundara, Lalinda Weerasekara, Nuwan Dewapriya, Anjana Punchihewa, and Primal Wijesekera, who have made my life at UBC memorable and pleasant with their constant support. I also want to thank my previous academic teachers at The Open University of Sri Lanka and the University of Moratuwa, Sri Lanka (Prof. Emeritus H. Sriyananda, Prof. Dileeka Dias, Mr. Sarath Chandra, Dr. L.S.K. Udugama, and Mr. D.K. Withanage) and colleagues for helping me to continue my studies.

Last but not least, I would like to extend my appreciation to my parents and my wife Kasuni for their continuous encouragement and constant support given throughout. Words cannot describe how indebted I am to my parents. I can only hope to be as dedicated, loving, and caring a parent as they are. I dedicate this thesis to them.


Dedication

To my parents
who gave me wings to fly,
held me when I fell,
&
their love is the greatest gift of all.


Chapter 1
Introduction
1.1 Application Domain

The research presented in this thesis is an integral part of a project on robotic human rescue in emergency and hazardous situations, as carried out in the Industrial Automation Laboratory (IAL) of the University of British Columbia (UBC). Specifically, the thesis addresses vision-based object recognition and feature extraction as needed for the associated robotic tasks.

In situations such as a bomb blast, tsunami, earthquake, fire, or flood, it may be rather hazardous or even impossible to utilize humans for urgent rescue, salvaging, and cleaning operations, in view of the adverse temperature conditions, chemical reactions, physical obstructions, and geological factors that may be present. In such situations, a cooperative multi-robot system will be able to play a vital role. The robots will be able to perform a number of useful tasks, ranging from rescuing and evacuating people, providing medicine and first aid, and removing valuable equipment, to clearing obstructions and harmful material and coordinating with other robots and/or humans in task execution. Fast and accurate identification of humans and useful objects will be key to the effective execution of such tasks. Image-based techniques are particularly useful in this regard.

With the rapid advancement of robotic technology and the accompanying cost effectiveness, it is expected that populated neighborhoods of the future will rely on relatively simple and low-cost robots to carry out routine tasks such as litter removal, traffic control, assisting the physically disadvantaged, surveillance, and so on, on a daily basis. These robots will typically be mobile, relatively low-cost, dedicated, and heterogeneous. The overall goal of the Rescue Robot Research project of IAL at UBC is to utilize a group of such robots in an emergency situation to carry out a crucial task, with information received from a sensor network in the area. Even though these robots are heterogeneous and are not specifically designed to perform the emergency tasks, the innovation of the IAL project is in utilizing their capabilities to perform the required tasks.

Identifying, locating, and collecting appropriate objects from the scene and assembling them into a carriage to carry out a rescue operation is an important application in the present context. The carriage will be assembled from the items available in the emergency environment. For assembling the carriage, the robots should first identify objects that are useful and utilizable in the task. The work carried out in the present thesis is particularly relevant in this regard.

1.2 Computer Vision Requirements for a Multi-robot System

The practical objective of the research presented in this thesis is to provide the ability of vision to the cooperative multi-robot system as developed in our laboratory (IAL). In a typical task of the project, the cooperating robots need to detect, identify, locate, grasp, transport, and manipulate objects. These objects may be either useful parts for constructing the needed device for the emergency rescue operation or obstacles to be moved out of a navigation path. Also, other robots, target locations, and obstacles will have to be detected, identified, and located even though they will not be transported or manipulated. Hence, detection/identification of objects and extraction of parameters (or properties) of an object are major requirements of the overall robotic task.
In the present thesis, an architecture is proposed and developed for a suitable vision system for the multi-robot project, by considering the specific requirements of the project. This architecture of the developed vision system consists of feature enhancement techniques, object modeling methods, object detection methods, a technique to select a method for object detection (to achieve the required levels of speed and accuracy while providing adaptive behavior in the application), object parameter extraction techniques, and facts and constraints for defining useful objects.

1.3 Motivation

On the 26th of December 2004, Sri Lanka and several other countries in Asia were badly affected by a severe tsunami. This was considered the worst tragedy that Sri Lanka experienced in recent times, resulting in the loss of over one hundred thousand human lives and property worth billions of dollars. During the rescue process following the tsunami, humans were not able to reach most of the affected areas, as the structures were unstable and the collapsed structures obstructed the paths. This further increased the number of deaths and worsened the health of the affected people, owing to the lack of timely medicine and first aid. The terrorist attack on the World Trade Center in New York City on September 11th, 2001 resulted in similar human suffering and loss of life and property. In such a situation a basic mobile robot may be able to reach areas that are not accessible to people. One or more robots may be able to provide medical assistance and perform rescue operations in an autonomous manner, without waiting for human intervention.

The human suffering in such emergency situations could have been mitigated to some extent through vision-based robotic intervention. These were the major factors that motivated the specific research reported in the present thesis. Furthermore, the ever-increasing demand in industry for fast, efficient, accurate, and automated robotic vision systems provides further impetus for the present research direction.

1.4 Objectives

One main objective of the present research is to develop a vision application that has the ability to quickly identify and distinguish objects in a complex environment by their inherent properties such as shape, colour, dimensions, and other (geometric) features. Furthermore, the developed methodologies should be capable of extracting specific useful parameters of the detected objects. There are several methods that can be applied to object detection; however, in the present work the focus will be directed at methods that are fast and robust with respect to variations of object properties and the environment. Existing methods may have to be modified, enhanced, or integrated to achieve the specific requirements of the application considered in the present work.

Another major objective of the present work is to quickly and accurately identify predefined types of objects. In the present application scenario, the speed of object detection is more important than the accuracy, since the robots are expected to carry out their tasks quickly using the available and identified resources in the scene. The constraint on the speed of the vision schemes is crucial as well, in view of the limitation on the processing capabilities that are available in an emergency environment.

1.5 Challenges

Several challenges have been encountered during the initialization and implementation stages of the present project, which are briefly indicated now.
The designed vision application has to perform in an unstructured, unknown, and dynamic environment where the object information and the environmental conditions may vary. The variations in the object information may include orientation, scale, affine distortions, and so on. Occlusion and variation in the illumination level may be considered as environmental variations. It follows that achieving robustness in object detection and parameter extraction is a challenge.

It is clear that, in view of the requirements of an emergency scenario, a robot should achieve its objective of object identification within a specified time frame (i.e., the processing time per image has to be limited) while maintaining a required level of accuracy in the results. The on-going research on visual servoing in our laboratory has indicated that the speed of generation of the results is more important than the achieved accuracy level. Therefore, achieving the required speed of the vision techniques while maintaining an adequate level of accuracy has been a challenge. The best compromise between speed and accuracy has to be reached in the present work.

Object detection is a difficult problem because the same object may appear in many different ways in real-world images. An object might be occluded, and a snapshot of the object could be taken from different views or under changing illumination conditions. Also, the object may appear at an arbitrary size anywhere in the image. These challenges need to be addressed as well.

1.6 Review of Previous Work

An extensive survey has been made of the available literature on image processing, object detection, intelligent methodologies for object recognition, and machine vision. In the literature review, the main aim has been to find methodologies that concern vision systems dealing with emergency situations. The methods should be robust to variations in object features and environmental factors. They should also be fast. The surveyed literature is outlined in the following sections.

1.6.1 Classification of Previous Work

Methods of object detection are classified here in two ways; namely, as local versus global approaches, and as generative versus discriminative models. A local approach looks for the more significant regions of an image, as described by image structures. This approach may be separated into two parts: region detectors and region descriptors. In a global approach, the entire image is taken into consideration, and subspace methods are used. In the second classification, generative models (ICA or independent component analysis, PCA or principal component analysis, NMF or non-negative matrix factorization, etc.) are used to reduce the dimension of the dataset, while discriminative models (LDA or linear discriminant analysis, SVM or support vector machines, boosting, etc.) are used for categorization. It is then possible to find optimal decision boundaries for the available training dataset and the corresponding labels. Thus, to classify an unknown sample using a discriminative model, a label is assigned directly based on the estimated decision boundary. In contrast, in a generative model, the likelihood of the sample is first estimated, and based on this the sample is assigned the most likely class.
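To make this distinction concrete, the following toy sketch (not taken from the thesis) classifies a two-dimensional sample in both ways: the generative route fits a Gaussian to each class and picks the class with the higher likelihood, while the discriminative route learns a logistic-regression boundary and assigns the label directly from it. All data and parameter values are illustrative assumptions.

# Toy illustration (not from the thesis): the generative route models each class
# and picks the class with the higher likelihood, while the discriminative route
# learns a decision boundary and assigns the label directly from it.
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 0.7, size=(200, 2))   # class 0 samples
X1 = rng.normal([2.0, 2.0], 0.7, size=(200, 2))   # class 1 samples
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# Generative: fit a Gaussian per class, classify by the larger log-likelihood.
def gaussian_loglik(x, mean, cov):
    d = x - mean
    return -0.5 * (d @ np.linalg.inv(cov) @ d + np.log(np.linalg.det(cov)) + 2 * np.log(2 * np.pi))

params = [(X0.mean(axis=0), np.cov(X0.T)), (X1.mean(axis=0), np.cov(X1.T))]
def classify_generative(x):
    return int(gaussian_loglik(x, *params[1]) > gaussian_loglik(x, *params[0]))

# Discriminative: learn a linear boundary with logistic regression (gradient descent).
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # predicted probability of class 1
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)
def classify_discriminative(x):
    return int(x @ w + b > 0)                      # label comes straight from the boundary

query = np.array([1.6, 1.4])
print("generative:", classify_generative(query),
      "discriminative:", classify_discriminative(query))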
Object recognition systems that are found in the literature on cognitive psychology, such as Schreiber et al. (1991), employ a neural network approach and a multilayer network solution. These methods are classified as shown in Figure 1.1 and are described below.

Figure 1.1: Classification of Useful Methods.

1.6.1.1 Local Region Detectors

Typically, local appearance-based object recognition techniques repeatably find distinctive regions of an image. In general, local region detectors return the scale, shape, and orientation of a region. Interest point detectors return an exact position within the image, a point being a special case of a region. The local region detectors may be divided into the following categories: corner-based detectors, region-based detectors, and other approaches.

Corner-based detectors locate points and regions that contain a high level of image structure (e.g., edges). However, they are not suited for uniform regions and regions with smooth transitions. Region-based detectors regard local blobs of uniform brightness as the most salient aspects of an image. Other approaches, for example, take into account the entropy of a region (i.e., entropy-based salient regions). The most popular methods are listed below; their performance is reported in (Schmid et al., 2000; Mikolajczyk et al., 2003; Mikolajczyk et al., 2004; Mikolajczyk et al., 2005).

•  Harris or Hessian point-based detectors (Harris, Harris-Laplace, Hessian, Hessian-Laplace) (Harris and Stephens, 1988; Mikolajczyk and Schmid, 2001)
•  Difference of Gaussian Points (DoG) detector (Lowe, 2004)
•  Harris or Hessian affine invariant region detectors (Harris-Affine) (Mikolajczyk and Schmid, 2002)
•  Maximally Stable Extremal Regions (MSER) (Matas et al., 2002)
•  Entropy Based Salient Region detector (EBSR) (Kadir and Brady, 2001; Kadir et al., 2004)
•  Intensity Based Regions and Edge Based Regions (IBR, EBR) (Tuytelaars and Gool, 2000; Tuytelaars and Gool, 2004)

1.6.1.2 Comparison of Local Region Detectors

Table 1.1 presents a summary of the invariance properties, runtime, repeatability, and number of detected regions of some important methods described in this chapter. The ratings are based on previously reported data in this area (MSER, DoG, EBSR) and on the vast collection of implementations provided by the Robotics Research Group at the University of Oxford. The results from the extensive evaluation studies given in (Mikolajczyk et al., 2003; Mikolajczyk et al., 2004; Mikolajczyk et al., 2005) are also used.

Table 1.1: Summary of the Detector Category, Invariance Properties and Individual Ratings for Runtime, Repeatability and the Number of Obtained Regions.

Detector         Assigned Category   Invariance   Runtime      Repeatability   Number of Detections
Harris           Corner              None         Very short   High            High
Hessian          Region              None         Very short   High            High
Harris-Lap.      Corner              Scale        Medium       High            Medium
Hessian-Lap.     Region              Scale        Medium       High            Medium
DoG              Region              Scale        Short        High            Medium
Harris-Affine    Corner              Affine       Medium       High            Medium
Hessian-Affine   Region              Affine       Medium       High            Medium
MSER             Region              Projective   Short        High            Low
EBSR             Other               Scale        Very long    Low             Low
EBR              Corner              Affine       Very long    Medium          Medium
IBR              Region              Projective   Long         Medium          Low
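As a minimal sketch of the kind of runtime and detection-count comparison summarized in Table 1.1, the snippet below runs three of the listed detectors with OpenCV's Python bindings. It is illustrative only: the image path is a placeholder, the threshold values are arbitrary, and SIFT_create requires OpenCV 4.4 or later (or the contrib package) for the DoG detector.

# Illustrative detector comparison (not from the thesis). "scene.png" is a placeholder.
import time
import cv2
import numpy as np

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

def timed(name, fn):
    t0 = time.perf_counter()
    count = fn()
    print(f"{name}: {count} detections in {1000 * (time.perf_counter() - t0):.1f} ms")

# Harris corner detector: corner response map, thresholded to count corners.
def harris():
    response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    return int(np.sum(response > 0.01 * response.max()))

# DoG (SIFT) keypoint detector: scale-invariant blob-like regions.
def dog():
    return len(cv2.SIFT_create().detect(gray, None))

# MSER: maximally stable extremal regions.
def mser():
    regions, _ = cv2.MSER_create().detectRegions(gray)
    return len(regions)

for name, fn in [("Harris", harris), ("DoG/SIFT", dog), ("MSER", mser)]:
    timed(name, fn)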
1.6.1.3 Local Region Descriptors

This section gives an overview of some important state-of-the-art region descriptors. Feature descriptors describe a region, or its local neighbourhood, that has already been identified by the detector with certain invariance properties.

It should be clear that the performance of a descriptor strongly depends on the power of the region detector. Incorrect detection of the region location or shape will significantly change the appearance of the descriptor. Robustness against rather small errors in the detection of location or shape is also an important property of an efficient region descriptor.

As suggested by Mikolajczyk and Schmid (2005), region descriptors may be roughly divided into three main categories: distribution-based descriptors, filter-based descriptors, and other methods. The existing techniques are:

•  SIFT (Lowe, 1999; Brown and Lowe, 2002; Lowe, 2004)
•  PCA-SIFT (gradient PCA) (Ke and Sukthankar, 2004)
•  Gradient location-orientation histograms (GLOH), also called extended SIFT (Mikolajczyk and Schmid, 2005)
•  Spin Images (Lazebnik et al., 2003; Lazebnik et al., 2005)
•  Shape context (Belongie et al., 2002)
•  Local Binary Patterns (Ojala et al., 1996)
•  Differential invariants (Schmid and Mohr, 1997)
•  Complex and steerable filters (Baumberg, 2000; Freeman and Adelson, 1991; Schaffalitzky and Zisserman, 2002)
•  Moment invariants (Gool, 1996; Tuytelaars and Gool, 2000)

1.6.1.4 Comparison of Local Region Descriptors

Table 1.2 presents a summary of some common properties of the descriptors mentioned above, based on the investigations in (Mikolajczyk and Schmid, 2003; Mikolajczyk and Schmid, 2005). Common types of invariance against geometrical distortion are rotation, scale change, and affine distortion. Here only rotational invariance is considered, because invariance to geometrical distortion is a requirement of the detector: it should provide a rotation-, scale-, or affine-normalized patch. The most common techniques of scale adaptation and affine normalization provide a normalized patch defined only up to an arbitrary rotation, so descriptor invariance against rotation is still necessary.

Table 1.2: Summary of the Descriptor Category, Rotational Invariance Property, Dimensionality of the Descriptors and an Individual Performance Rating; (a) Implemented by Schaffalitzky and Zisserman (2002), (b) N = Number of Samples in the Patch, (c) No Comparable Results, (d) Implemented by Lowe (2004).

Descriptor          Assigned Category    Rotational Invariance   Dimensionality       Performance
SIFT                Distribution-based   No                      High (128)           Good
PCA-SIFT            Distribution-based   No                      Low (20)             Good
GLOH                Distribution-based   No                      High (128)           Good
Spin images         Distribution-based   Yes                     Medium (50)          Medium
Shape context       Distribution-based   No                      Medium (60)          Good
LBP                 Distribution-based   No                      Very high (256)      (c)
Differential Inv.   Filter               Yes                     Low (9)              Bad
Steerable Filters   Filter               Yes                     Low                  Medium
Complex Filters     Filter               Yes                     Low (15) (a)         Bad
Cross correlation   Other                No                      Very high (N) (b)    Medium (d)
Color moments       Other                Yes                     Low (18)             (c)
Intensity moments   Other                Yes                     Low                  (c)
Gradient moments    Other                Yes                     Low (20)             Medium

The dimensionality of a descriptor is very important because it heavily affects the runtime and the memory requirements for storing the descriptors. The performance ratings are based on the evaluations of Mikolajczyk and Schmid (2003, 2005), which were carried out on re-implementations of the original descriptors, with occasionally differing dimensionality, over different scene types and various detector-descriptor combinations.
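The sketch below (not the thesis implementation) shows where descriptor dimensionality and matching enter the pipeline: 128-dimensional SIFT vectors are computed for a model image and a scene image and matched with Lowe's ratio test. The file names and the 0.75 ratio threshold are assumed, illustrative values.

# Illustrative SIFT descriptor matching (not from the thesis). File names are placeholders.
import cv2

model = cv2.imread("model_object.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_m, des_m = sift.detectAndCompute(model, None)   # des_m: N x 128 descriptor matrix
kp_s, des_s = sift.detectAndCompute(scene, None)

# Brute-force matcher with k=2 nearest neighbours, then Lowe's ratio test to keep
# only matches whose best distance is clearly better than the second best.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_m, des_s, k=2) if m.distance < 0.75 * n.distance]
print(f"{len(good)} good matches out of {len(kp_m)} model keypoints")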
1.6.2 Incorporated Approaches and Techniques

The commonly available methods are robust to variation in only a few factors. It may therefore be necessary to integrate several methods and further enhance them for the present development and application of a robust vision system.

To find local features, it is desirable to employ the Harris Corner Detector (rotation invariant; very short runtime; high number of detections), Difference of Gaussian Points (DoG; a region detector; scale invariant; short runtime; medium number of detections), Hessian-Affine (a region detector; affine invariant; medium runtime; medium number of detections), the Scale-Invariant Feature Transform (SIFT; not rotation invariant), the Gradient Location and Orientation Histogram (GLOH; an extension of SIFT), and Gradient moments (rotation invariant).

A discriminative model such as the AdaBoost approach may be used to model objects, since this is a type of classification approach. Extensive experiments will be carried out on these selected methods.

Image pre-processing techniques will also be used to enhance the features in an image, which will be useful in improving the accuracy of the results obtained from these methods.

1.7 Organization of the Thesis

The organization of this thesis is as follows. The present chapter gives a comprehensive introduction to the overall research project. This includes the background and the factors that motivated the specific research activity, and the objectives of the research. Developing a robust vision system that is able to resist variation in the properties of the objects and their environment is a challenging task. Also, it is crucial to achieve a required level of speed and accuracy in order for the robots to properly carry out their tasks in the present application domain. The chapter also presents a review of the pertinent literature, giving an overview of the key existing methods related to the subject area.

Chapter 2 presents the architecture of the vision system as developed in this thesis. The proposed architecture gives a platform that satisfies the vision requirements of the multi-robot system in the present experimental domain. It includes the following main modules: an image pre-processing module to enhance the features of an image, offline object modeling approaches, and real-time object detection and parameter extraction methods. The vision system functions adaptively by selecting an appropriate object detection method in real time. Furthermore, the chapter presents significant features of the system, concluded by a summary of the key material.

Chapter 3 describes the core technologies used in the present research. It covers image feature enhancement techniques, object modeling, and object detection methods. An object model is a very useful component for detecting a 3D object from any viewpoint. Objects are introduced to the system by modeling them with training images, facts, and constraints. Some object detection techniques use object models, while other methods use facts and constraints to detect objects.

The experimental environment, the run-time software platform, and the hardware devices used in the present work are described in Chapter 4. Also, the object modeling process and representative experimental results are presented and discussed. Extensive experimentation has been done in this regard. Further enhancements will be carried out based on the experimental results and the objectives of the application.

Chapter 5 concludes the thesis by providing a synopsis of the present research and outlining the major contributions made in the thesis. Also, suggestions are made for further research in this field.
Chapter 2
Vision System Architecture

2.1 Overview

As with the natural vision of humans, "artificial" vision is essential for a robot in sensing and understanding its environment. The vision system that is developed for the present application is organized into several modules by considering the requirements of the cooperative multi-robot system.

In addition to the techniques of image preprocessing, feature enhancement, and object identification, the developed vision system makes use of intelligence. Sophisticated techniques of image preprocessing and feature enhancement are required for improved accuracy in object detection; however, this typically increases the average processing time per image.

Initially, common facts and constraints that are related to the environment and its objects are gathered and inserted into the "FactsAndConstraints" database. A robot may also provide relevant information, including facts and constraints of the objects, which will be useful in properly executing a requested task.

Object models are generated or formed using sets of training images and related facts and constraints. There are a variety of object detection approaches. Some approaches use object models to detect objects of interest. They are employed in the "Real-Time Object Detection" module of the proposed vision system architecture. Run-time modes are defined according to the requirements of the robot. Also, several factors are defined for the object detection methods and run-time modes. The defined factors are evaluated in order to select the appropriate methods for achieving improvements in the object detection rate while reducing the average processing time per image.

In general, object detection methods provide the location of an object. To realize this, sets of images are taken from the location where an object is found. These images, along with other available data, are processed by the object detection methods to generate the necessary information about the object. This feature is activated when a robot requests more information on the object.

Similar to how humans use intelligence in vision, the computer vision system of the multi-robot system utilizes intelligence for accommodating adaptive behavior of the system. In essence, the vision system provides the requested information to the robot for carrying out such tasks as object grasping, transportation, or manipulation.

2.2 Adaptive Behavior of the Vision System

The architecture of the vision system as developed in the present thesis is presented in Figure 2.1. It uses computer-based vision techniques as well as techniques that are similar to those used by humans for recognizing objects. As humans combine cognitive processes with object detection techniques (Schreiber et al., 1991), the proposed vision system uses image-processing and feature enhancement techniques, and object detection approaches with intelligence, to identify and gather important information about various useful objects in a scene. Intelligence is also used for accommodating the adaptive behavior of the vision system.
Figure 2.1: Vision System Architecture of the Multi-robot System.

The proposed vision system may be adapted to unknown and dynamic environments where environmental factors and the properties of the objects are subject to variation. In order to achieve this, a "Preprocessing and Feature Enhancement" module is introduced to produce improved images with enhanced features. Those enhanced image datasets are useful in the initialization process of the "Form/Train Object Models" module and in the "Real-Time Object Detection" module at run-time.

In the initialization process, information such as areas of different levels of illumination and the locations of the detected objects (detected using object detection methods) is gathered and stored as facts and constraints in the "FactsAndConstraints" database. Some initialization steps are repeated from time to time or when the environmental conditions change.

In the "Form/Train Object Models" module, models of the objects are formed or trained using the initial image data sets and the facts and constraints. Those models are used in the real-time feature matching process in the object detection module. Methods based on probability distributions and geometric topology do not require such models; instead, the module uses the available facts and constraints.

When a request is made by a robot, the identified regions of an object image are further processed to extract important properties. Those properties are required for the robot to carry out its tasks. The "Object Property Extraction" module will estimate the properties of the required objects and will provide the information to the robots when requested. These modules are described in detail in the following sections. In addition to the described basic functions, intelligence and adaptive behavior are incorporated into the vision system in order to achieve a tradeoff between the speed and the accuracy of object detection. There, various facts and constraints related to the robot, the environment, and the objects are considered as well.

Computer-based intelligence plays a major role in the adaptive behavior of a vision system. Human perception is a very complex process. It starts with biological specialization and ends with the thought processes that categorize inputs on the basis of environmental cues and historical knowledge (Schreiber et al., 1991). The cognitive processes provide the ability to recognize objects under various conditions such as changing color and changing lines of view. Occlusion makes the problem even more complex by changing the appearance of the object. Without intelligence, humans would be less successful in extracting useful information from an observed scene in an environment.

Recognition is an easy task for humans, as they have brains that have evolved to function in a three-dimensional environment, and cognitive abilities that enable them to make sense of visual inputs. However, for a computer this is an extremely difficult task, because it does not have any built-in cognitive capabilities.
In incorporating adaptive behavior into the vision system, threshold levels are applied using an adaptive threshold technique. Variations of object properties, possible object areas, and values for fine-tuning the parameters of the algorithms are estimated and predicted. Moreover, the relationships between the objects are identified, appropriate methods to detect objects in the run-time modes are selected, the "FactsAndConstraints" database is updated in real time, the utilization of computational power is optimized, and the object detection rate is increased.

In the prediction of variations of the object properties, the environment is sensed to obtain such properties as colored areas, texture, and reflectance. This stage may also be carried out in general, and the resulting facts are stored in the "FactsAndConstraints" database. In the prediction of possible object areas, the following factors are considered together with the facts and constraints provided in the database: previously detected locations, locations of other objects, the possibility of similar types of objects being located together, and so on. Object detection methods match the identified properties with the existing ones. This is a situation where fuzzy logic techniques are appropriate, as the match is more likely to return varying degrees of membership in certain sets instead of an exact identification.

The adaptive behavior process examines not only the object that is being identified, but also the environment in which it exists. For example, when trying to determine the identity of a wheel, the thought processes might involve questions such as "This is a developed environment, so it makes sense to have a wheel here" or "This is a natural environment, so the object is probably not a wheel since it does not make sense in this context." The application uses techniques of statistics and fuzzy logic to perform the categorizations. For example, fuzzy logic is necessary since it allows the application to determine a categorization on the basis of degrees of membership in various sets instead of absolute and discrete values.

The developed vision system uses various methods of object detection from time to time. The method-switching process is related to the requirements of the robot. The requirements include the factors of the run-time modes and the properties of the objects of interest to be found. Further details on this aspect are found in section 2.2.4.

For further improvement of accuracy, a few predefined images are captured and processed while the robot is stationary, and another set of images is captured by varying the camera direction. Even under the same settings, the camera noise and illumination patterns are subject to change from time to time. Also, results produced by two robots are used for further verification.
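As an illustration of the fuzzy-membership idea described above, the following sketch (not taken from the thesis) scores a candidate detection against the facts and constraints of an object class by degrees of membership rather than by a hard yes/no match. The property names, fact values, and triangular membership shapes are hypothetical.

# Illustrative fuzzy scoring of a candidate object (not the thesis code).
def triangular_membership(value, low, peak, high):
    """Degree (0..1) to which `value` belongs to a triangular fuzzy set."""
    if value <= low or value >= high:
        return 0.0
    if value <= peak:
        return (value - low) / (peak - low)
    return (high - value) / (high - peak)

def score_candidate(measured, facts):
    """Combine per-property memberships (here with min, a common fuzzy AND)."""
    degrees = [triangular_membership(measured[name], *bounds)
               for name, bounds in facts.items() if name in measured]
    return min(degrees) if degrees else 0.0

# Hypothetical facts and constraints for a "wheel": expected radius (cm) and hue (degrees).
wheel_facts = {"radius": (4.0, 6.0, 8.0), "hue": (10.0, 25.0, 40.0)}
candidate = {"radius": 6.8, "hue": 22.0}        # properties measured from the image
print("membership of candidate in 'wheel':", round(score_candidate(candidate, wheel_facts), 2))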
2.2.1 Image Feature Enhancement

In the present work, a vision system is developed for assisting a cooperative multi-robot system operating in an unstructured, unknown, and dynamic environment. There are different types of variation and corruption in the system, such as direction-dependent illumination, shadows, occlusions, time-dependent changes of object properties, and geometrical distortions. The imaging devices and computational techniques may also introduce noise patterns and distortions. Hence, image feature extraction will involve the removal of false information and the enhancement of the useful features, which may be corrupted. In order to enhance the image features, image preprocessing techniques are applied according to the requirements of the object detection method. These techniques include smoothing (Gaussian and other specific filters, up- and down-sampling), thresholding (adaptive thresholding), histogram equalization, morphological operations, flood fill (a technique which marks or isolates portions of an image for further processing), and the affine transform.

Features of the objects and environmental factors affect the results of the object detection methods. Image processing techniques such as filtering are used to improve the features in an image. It is also possible to enhance the required features (e.g., special symbols, corners, and lines) by using primitive operators; specifically, the Canny edge detector and the Harris corner detector.

These techniques are used to obtain enhanced feature sets in order to improve the accuracy of feature extraction and object identification. An image with good features will improve the object identification accuracy, but the overall processing time will increase due to the increased number of pixel-wise operations in the techniques. Details of these techniques are presented in section 3.2.
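A minimal sketch of such a preprocessing chain, assuming OpenCV's Python bindings and a placeholder image file, is given below. The specific operations and parameter values that the system applies depend on the selected detection method; the ones shown here are illustrative only.

# Illustrative preprocessing chain (not the thesis implementation): smoothing,
# histogram equalization, adaptive thresholding, a morphological opening, and
# Canny edges. The file name and all parameter values are placeholder choices.
import cv2

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)       # placeholder input image

smoothed = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.5)       # suppress sensor noise
equalized = cv2.equalizeHist(smoothed)                       # spread out the intensity range

# Adaptive thresholding copes with direction-dependent illumination better than a
# single global threshold.
binary = cv2.adaptiveThreshold(equalized, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, blockSize=31, C=5)

# Morphological opening removes small speckles from the binary image.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

edges = cv2.Canny(equalized, 50, 150)                        # edge map for shape-based methods

for name, img in [("smoothed", smoothed), ("binary", cleaned), ("edges", edges)]:
    cv2.imwrite(f"out_{name}.png", img)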
Constraints related to these facts should be included as well. During run-time as well, it is possible to store the properties of the detected objects. Possible variations of these properties may be predicted by using such updates.  Facts and constraints are used in the object detection algorithms (described in section 2.2.4, and in section 3.4 of Chapter 3) to identify useful objects. It is possible to increase the accuracy, while reducing the processing overhead by incorporating the facts and constraints. In this manner, the object detection rate may be increased significantly.  2.2.4. Real-Time Object Detection Available common approaches usually provide satisfactory results of object detection when the number of varying factors (such as lighting, scale, rotation, shearing) is no more than two, with constraints applied on other variable factors. No method is found that can accommodate a large number of variations of properties of the objects and the environment. Consequently, it is required to combine and enhance the best existing approaches in order to achieve the required levels of performance in the present application domain, which involves an unknown and dynamic environment. In fact most of the successful applications do not rely on just one method to detect or track objects. As a result, it has been decided to use more than one object detection method to ensure satisfactory performance in the present application.  A predictable relationship cannot be found between the processing time per image and the object detection accuracy. It depends on the object detection method, available facts on the required objects, the type of objects (especially, features of objects), their background, and required task involving the robot. Object detection methods take varying processing times while achieving different levels object detection accuracy. Moreover, the object detection rate may be increased by disabling some functions in the system (some image preprocessing steps, reasoning process, etc.) which will reduce the computational burden and the resources. Even if the object detection accuracy is adequate, it is also required to consider the processing time per image, since the speed of performance of the robotic system depends on the image processing speed. 19  In the initial stage of system operation, it is possible to find at least a few properties of the objects as facts which are needed for a specific robotic task. The properties of the required objects, processing time per image, object detection rates, types of objects, and the type of object background are used to define various run-time-modes. There are a number of factors that have to be defined for an object detection method such as, properties of required objects, processing time per image, and object detection rates. In addition, types of objects (it can be predicted with the given properties of required objects), type of background, fine-tuning parameters of the algorithms, and settings for some system features (to control preprocessing steps, resultant image showing option, resultant image saving option, and saving of detected object properties in the database option) are also defined. These factors are initially defined and are updated at run-time and when an equipment change occurs.  In the vision system it is possible to switch to appropriate methods and use them in real-time object detection. 
This is done by matching the factors in the run-time modes to the object detection methods with the use of techniques related to the methods and fuzzy technique. Factors such as facts and constraints are incorporated with intelligence for intelligent decision making when required.  Methods that use models of objects are suited for objects that have good features. These methods are briefly explained now. One such method is the “Object Detection using AdaBoost-based Object Model.” Haar-like features in training data sets are used with the AdaBoost approach in machine learning to form models of the required objects. This works well for consistently textured and mostly rigid objects. Models have to be formed very carefully by providing images with required features in every image in the set, while varying the other features within the same image set. It takes more time to generate models, but provides better results.  The method of “Object Detection using 3D Invariant Feature Space Model” uses Scale Invariant Feature Transformation (SIFT) features and corner-like features in a real-time image to match with features in the object models. Usually, this method provides better performance with planar and 3D objects. It has scale- and rotational-invariant features and reduces the effects of linear and non-linear illumination changes. SIFT is not affine invariant, but it is possible to calculate  20  SIFT on other type of detectors. However, this method has a very low misidentification rate and a moderate object detection rate when compared with other approaches.  There are other types of methods that use only given facts and constraints of the required objects. One such method is “Object Tracking using Mean-Shift” which is based on a probabilistic approach. In this method, initially the probability distributions for all objects of interest are calculated. Then, the application searches for the nearest probability distributions in the real-time images. If matching objects are found, it will be encircled by different colored rectangular frames. This is a simple and fast object tracking method and it uses fewer computations when compared with other approaches. It is possible to achieve the feature of scale invariance. Also it is robust when tracking and detecting moving objects which have compact features like a human face. The major drawback of this technique is that it is not robust to variations of illumination and the background.  In the method of “Detection and Properties Extraction of Regular Objects” regular type of objects are identified and their properties are extracted. Mathematical and pattern analysis techniques are utilized to find closely matching objects, which are defined by their properties (for rectangular objects width, height, and angles; for circular objects radius; for other type of objects contours; for all objects possible colors). Topological techniques are also used to find matching objects based on qualitative measures, while a statistical analysis technique is used to determine the degree of set membership that must be met in order for an object to be identified. These techniques are combined with some elements of computational intelligence in order to improve their performance in real applications. This method is the second best with respect to speed (processing time per image), and also provides the second highest object detection rate when compared with other approaches. 
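A rough sketch of how this kind of shape detection and property extraction can be realized with OpenCV is given below. It is an illustrative reconstruction only; the function choices and thresholds are assumptions and not the implementation evaluated later in the thesis.

```python
import cv2

def find_rectangles(image_bgr, min_area=500.0):
    """Detect roughly rectangular contours and report width, height, and angle."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    # OpenCV 4.x return signature; some older releases also return the input image.
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    found = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:
            continue
        approx = cv2.approxPolyDP(contour, 0.02 * cv2.arcLength(contour, True), True)
        if len(approx) == 4:                      # four corners: rectangle candidate
            (cx, cy), (w, h), angle = cv2.minAreaRect(contour)
            found.append({"center": (cx, cy), "width": w, "height": h, "angle": angle})
    return found

def find_circles(image_bgr):
    """Detect circular objects and report centre coordinates and radius."""
    gray = cv2.medianBlur(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY), 5)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=40,
                               param1=120, param2=40, minRadius=10, maxRadius=200)
    return [] if circles is None else circles[0].tolist()
```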
Sharper outer boundaries are required for objects to get better results and these boundaries should be highlighted in the image with correct orientations. Due to that, the method of “Detection and Properties Extraction of Regular Objects” produces good results for black (or green or red) colored object boundaries, since the contours are found through RGB color planes. Variation of scale and occlusion do not affect the results, but the performance will degrade due to changing illumination and orientation.  As an additional feature for the developed system, template matching technique is used for further verification while motion analysis technique is used to identify moving objects. However, 21  these techniques suffer from lower object detection rates and higher processing time per image. The hierarchical convolutional neural networks are considered as a promising model for robust object detection, where it is required to train neural networks to detect different types of objects. This method has not been implemented so far, since it takes more time at the initial stage to train the neural networks. These approaches and techniques are discussed further in Chapter 3.  2.2.5. Object Property Extraction The method “Detection and Property Extraction of Regular Objects” which is based on properties of objects, provides properties of the objects without further processing the detected objects, which is one of the advantages of the method. Most of the other approaches find the desired objects and provide the locations of the objects. With those methods in order to obtain more details about the object, the detected object area of the image needs to be further processed.  To improve the accuracy of the results, it may be desirable to obtain more images at the detected area from different directions under different conditions. These images are further processed using techniques of image processing and primitive pixel-wise specific property extraction to extract dimensions of the object, different colored areas of poses of the object and images of special symbols. Poses of the object are identified using these extracted properties. These additional processing steps are executed according to the settings of run-time modes. Such properties are saved in the “FactsAndConstraints” database, and the robots are able to access the database when performing their tasks.  2.3. Significant Features of the System The vision system developed in the present thesis has a number of features. The most important features are highlighted now. Additional features can be included to the application, since the modules of the architecture are designed and implemented using object-oriented methodology. There are a number of variation factors that are related to the properties of the objects and the environment. The application is invariant to variations in illumination, scale, rotation, orientation, and shearing up to certain levels. Some approaches are good even when part of the object is not apparent. Multiple objects may be detected and their properties may be extracted. Their poses may be estimated using the results. Several techniques are utilized for further verification of the results. 22  Intelligence is used to incorporate adaptive behavior in the system. All functions in the system are executed by considering the requirements of the robots in their task execution. Required performance factors such as the object detection rate and the average processing time per image may be defined. 
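For illustration, such a definition can be pictured as a small configuration record; the field names below are hypothetical and do not reproduce the system's actual run-time-mode format.

```python
# Hypothetical run-time-mode record; the field names are illustrative only.
runtime_mode = {
    "target_detection_rate": 0.90,      # required fraction of frames in which the object is found
    "max_time_per_image_s": 0.25,       # processing-time budget per frame
    "preferred_methods": ["regular_shape", "3d_invariant_feature_space"],
    "preprocessing": {"smooth": True, "equalize_histogram": True},
    "show_result_image": False,
    "save_result_image": False,
    "save_detected_properties": True,   # write results back to the FactsAndConstraints store
}
```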
Features of the system are utilized to satisfy the required performance factors. The object detection rate, average processing time per image, options to save and show the resultant image, and the option to save the results in the database are provided in addition to the locations of the objects and their properties.  2.4. Summary The proposed vision system architecture, as developed in the present thesis, represents a platform that fulfills the vision requirements of a multi-robot system. It is modularized, which is advantages in its design and development. Developing a vision system that is robust to variations of properties of the objects and the environment is a challenging task. Sufficient attention has been given and work has been done to obtain the enhanced feature sets by means of integration and enhancement of existing techniques along with new additions while considering the processing overhead.  A new object modeling approach has been introduced and the available Haar-like feature based object modeling approach has been used to model the desired objects. These models are used for object detection. Most useful object detection methods were discussed and used while incorporating new techniques as well. These methods produce good results for the considered types of objects with some constraints. A number of enhancements have been carried out and several methods are simultaneously employed in the run-time modes.  Considerable success has been achieved due to the adaptive behavior of the system. Even though the intelligent techniques used here do not guarantee optimal solution, they reduce the computation complexity and increase the object detection rate. Facts and constraints are useful to reduce the processing overhead and to detect objects of interest. In the developed vision system several techniques have been introduced and enhanced in addition to image preprocessing, feature enhancement and object detection. Utilization of the techniques is optimized in the run-  23  time modes by considering the specific requirements of the robots in task execution, which is very challenging task.  Multiple threads (multiprocessing) will be used to concurrently run different object detection methods. The present architecture will be enhanced for use with multiple robots and will include stationary cameras. The laser range finder of a robot will be integrated to find the distance to an object.  24  Chapter 3 Core Technologies 3.1. Overview Image feature enhancement is essential for obtaining accurate results from a vision system when the application is running in a real environment. Image pre-processing techniques are needed for this purpose. Enhanced feature sets are used in object modeling and detection. In this thesis, a new object modeling approach is proposed and implemented, where the model contains the useful features in 3D feature space. Another useful approach called Adaptive Boosting, or simply AdaBoost, is also used to model object.  Two object detection methods: Feature based method and Geometric Shape based method are proposed and implemented in the present work. In addition, existing principles are used to implement another method for object detection. These methods make use of the object models except for the Geometric Shape based method. Furthermore, a technique for feature point distribution representation is proposed in the thesis. Following sections present these core approaches and techniques in detail.  3.2. 
Image Feature Enhancement Techniques In the present context, features are the results obtained when a general neighborhood operation (i.e., a feature extractor or a feature detector) is applied to the image. Features may refer to specific structures in the image itself, ranging from simple structures such as points or edges to more complex structures such as objects. A further discussion on features is given in section 3.3.1.  Enhanced feature sets are required for obtaining improved results in the subsequent processing steps. Techniques of image processing and feature enhancement are used to remove undesirable effects on the image and highlight the features.  25  It is possible to use morphological operations (dilation, erosion) to remove noise, isolate individual elements, and join disparate elements in an image.  Smoothing is a simple and frequently used image processing operation. It is usually performed to reduce noise or camera artifacts. Smoothing is also important in reducing the sharp changes of an image.  Another useful technique is histogram equalization. It can be incorporated to expand the dynamic range of values of a real-time image. If the image has poor contrast, the histogram shows that the actual intensity values are all clustered near the middle of the available range. This technique is used to stretch this range out.  Also, Flood fill is an extremely useful technique and is used to mark or isolate portions of an image for further processing or analysis.  HSV (hue, saturation, value) colour model is  sometimes used to define gradients. Hough Transform may be used to highlight lines, circles, or other simple forms in an image. Affine Transform will correct the affine distortions in an image. There are functions in the OpenCVTM software library for carrying out these operations. Most these techniques affect the entire image, which may not be appropriate particularly when the effect is not linearly distributed.  3.3. Object Modeling Object model is useful in detecting 3D objects from any direction of view. It is superior to most existing approaches, which use individual training images. Features of the object are incorporated in object modeling.  There are many reasons and motivations for using features rather than the pixels directly. The most common reason is that features can act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data. For the vision system developed for the present application there is also a second critical motivation for using features: a feature-based system is much faster than a pixel-based system.  26  There exist carefully designed methods to detect and represent features. Some methods have been used since the early days of stereo matching by Moravec (1983) and Hannah (1988) and have recently gained popularity for image stitching applications (McLauchlan and Jaenicke, 2002; Brown et al., 2005; Brown and Lowe, 2007) as well as for fully automated 3D modeling (Beardsley et al., 1996; Schaffalitzky and Zisserman, 2002; Brown and Lowe, 2003; Snavely et al., 2006).  In the present thesis, a new approach, 3D Invariant Feature space, is proposed and implemented where different types of features can be embedded with the object model. Useful feature descriptions are organized in a proper format (data structure) using images from different views. Image variations are considered with a range of imaging conditions and transformations. 
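One way such imaging variations can be produced synthetically during off-line model building is sketched below; this is an illustrative OpenCV/NumPy fragment, and the ranges of rotation, scale, and illumination shown are assumptions rather than the values used for the actual models.

```python
import cv2
import numpy as np

def synthetic_variations(patch, angles=(-15, 0, 15), scales=(0.8, 1.0, 1.2),
                         gains=(0.7, 1.0, 1.3)):
    """Yield rotated, scaled, and re-illuminated copies of a training patch."""
    h, w = patch.shape[:2]
    for angle in angles:
        for scale in scales:
            # Affine transform combining an in-plane rotation with isotropic scaling.
            M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, scale)
            warped = cv2.warpAffine(patch, M, (w, h), borderMode=cv2.BORDER_REPLICATE)
            for gain in gains:
                # Simple multiplicative illumination change.
                yield np.clip(warped.astype(np.float32) * gain, 0, 255).astype(np.uint8)
```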
The feature set of a new training image is matched to the existing features in the model in order to place the new feature set. A feature set includes different type of features.  In another category, once the features are selected, different appearances of an object may be learned by using a supervised learning approach. These learning approaches include, but are not limited to, neural networks (Rowley et al., 1998), adaptive boosting (Viola et al., 2003), decision trees (Grewe and Kak, 1995), and support vector machines (Papageorgiou et al., 1998). These learning methods compute a hypersurface that separates one object class from another in a high dimensional space.  In the present thesis a statistical (supervised learning) principle named AdaBoost with a new object-related Haar feature selection method is used to model objects. The resulting object model is termed “AdaBoost based Object Model”. These approaches are discussed in detail in sections 3.3.2 and 3.3.3.  27  3.3.1. Features of an Object Localized features or interest points are described by the appearance of patches of pixels surrounding the point location. In general, the most desirable property of a visual feature is its uniqueness so that the objects can be easily distinguished in the feature space. There are different types of features which can be extracted and represented; namely, Haar-like feature, SIFT feature, Corner-like feature, Edges, Ridges, and Blobs. Out of these, commonly used features in the literature and applications are described next.  3.3.1.1. Haar-like Feature Haar-like features are similar to the basic functions in Haar wavelets, where it is a specific sequence of functions as shown in Figure 3.1 (a). Few samples of Haar-like feature are shown in Figure 3.1 (b).  Instead of the usual image intensities, this feature considers rectangular regions of the image and sums up the pixels in the region. The sum is used to categorize the images. The sum of pixels in a region may be quite high or low compared to another region. The value will depend on the structure of the object in the region and its environment. All images are categorized where Haarlike feature in the rectangular region in a certain range of values as one category and those falling out of this range as another category. This will roughly divide the set of images into ones corresponding to one type of object and others corresponding to other type of object. This procedure may be iteratively carried out to further divide the image clusters.  (a)  Edge features  Line features  Center-surround features  (b) Figure 3.1: (a) Haar Wavelets, (b) Haar-like Features. 28  3.3.1.2. SIFT Feature Scale Invariant Feature Transform (SIFT) (Lowe, 1999) features are local and based on the appearance of the object at particular interest points, and are invariant to image scale and rotation. These are also robust to changes in illumination, noise, and minor changes in the direction of view. In addition to these properties, these features are highly distinctive and relatively easy to extract. The scale invariant properties of the descriptor are based on the scale invariant detection behaviour of the difference of Gaussian (DoG) point detector. Rotational invariance is achieved by the main orientation assignment of the region of interest.  SIFT consists of a scale invariant region detector called DoG detector. The DoG point detector determines highly repetitive interest points at an estimated scale. 
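As a minimal illustration of this first stage, one octave of the DoG stack can be sketched as follows (assuming OpenCV and NumPy; the number of scales and the base sigma are illustrative choices).

```python
import cv2
import numpy as np

def dog_octave(gray, num_scales=5, sigma0=1.6, k=2 ** 0.5):
    """Build one octave of a difference-of-Gaussian (DoG) stack."""
    gray = gray.astype(np.float32)
    blurred = [cv2.GaussianBlur(gray, (0, 0), sigma0 * (k ** i)) for i in range(num_scales)]
    # Each DoG level is the difference of adjacent Gaussian-blurred images; candidate
    # interest points are the local extrema of these levels in x, y, and scale.
    return [blurred[i + 1] - blurred[i] for i in range(num_scales - 1)]
```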
To get a rotation invariant descriptor, the main orientation of the region is obtained by a 36 bin orientation histogram. This histogram is related to gradient orientations and it is within a Gaussian weighted circular window. All the weighted gradients are normalized to the main orientation of the circular region. The circular region around the feature-point is divided into 4 x 4 non-overlapping patches and the histogram gradient orientations within these patches are calculated. Histogram smoothing is carried out in order to avoid sudden changes of orientation, and the bin size is reduced to 8 bins in order to limit the descriptor size. This results in a 4 x 4 x 8 = 128 dimensional feature vector for each key-point (feature). Figure 3.2 illustrates this procedure for a 2 x 2 window. This is taken in part from the paper presented by Lowe (1999). Note that only a 32 dimensional histogram obtained from a 2 x 2 grid is depicted for a better facility of illustration.  Figure 3.2: Illustration of the SIFT Descriptor Calculation. Finally, the feature vector is normalized to unit length and thresholded in order to reduce the effects of linear and non-linear illumination changes. The descriptor is not affine invariant. But, it is possible to calculate SIFT on other type of detectors, so that it will inherit affine invariance from them (Harris-Laplace, MSER or Harris-Affine detector). 29  3.3.1.3. Corner-like Feature In the literature, a “corner” is defined as a point for which there are two dominant and different edge directions in a local neighbourhood of the point. Such points often arise as a result of geometric discontinuities such as corners of a real world object. They may also arise from small patches of texture.  Most algorithms are capable of detecting both kinds of points of interest, though the algorithms are often designed to detect one type or the other. There are different types of detectors to compute corner response; for example, 1) Edge based corner detectors, 2) Greylevel derivative based detectors, and 3) Direct grey-level detectors.  A simple approach to corner detection in images uses correlation, but this can be computationally expensive and suboptimal. An alternative approach that is often used is based on a method proposed by Harris and Stephens (1988), which has been improved by improved by Moravec (1989).  Usually, most corner detection methods detect interest points in general rather than the corners in particular. If only corners are to be detected, it is necessary to do local analysis of the detected interest points. Corner detectors are not usually very robust and often require expert supervision.  3.3.1.4. Other Features Other features include, KLT feature (Shi and Tomasi, 1994), Blob (Patches of image), Edge (Canny), Ridge, Curvature of the shape, and Color distribution of the object. Comprehensive descriptions of alternative techniques can be found in a series of survey and evaluation papers by Schmid, Mikolajczyk et al. covering both feature detection (Schmid et al., 2000; Mikolajczyk et al., 2005; Tuytelaars and Mikolajczyk, 2007) and feature descriptors (Mikolajczyk and Schmid, 2005). Shi and Tomasi (1994) and Triggs (2004) have provided good reviews of feature detection techniques.  It is clear that different features have their own pros and cons. Features are detected and represented by feature descriptors. They are then used to form or train object models and to 30  detect objects. 
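Extraction of the two feature types used most heavily in this work can be sketched as follows with OpenCV's Python bindings. SIFT availability depends on the OpenCV build (in some versions it is provided through the xfeatures2d contrib module), and the Harris threshold shown is illustrative.

```python
import cv2
import numpy as np

def extract_features(gray):
    """Extract SIFT keypoints/descriptors and Harris corner locations from a gray image."""
    sift = cv2.SIFT_create()          # cv2.xfeatures2d.SIFT_create() in older contrib builds
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    # Harris corner response: large values indicate corner-like structure.
    response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    corners = np.argwhere(response > 0.01 * response.max())   # (row, col) locations
    return keypoints, descriptors, corners
```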
The decision to use specific features in a particular situation is dependent on the type of object to be detected and the requirements of the vision system.  3.3.2. 3D Invariant Feature Space The view clustering approach presented by David Lowe (2001) is an important approach. In this approach, the first training image is used to initiate the model with extracted SIFT features. Then the process of matching models to subsequent images uses a Hough transform approach followed by least-squares geometric verification. In principle, the feature combinations are performed by measuring the closeness of the geometric fit to the previous views. However, for the purpose of view clustering, it is important to use a similarity transform instead. It can integrate any number of training images.  A new image-based modeling approach has been proposed by Snavely et al. (2006). This approach is based on computing, from the images themselves, the photographer’s locations and orientations, along with a sparse 3D geometric representation of the scene, using a state-of-theart image-based modeling system. This approach first computes the feature correspondences between images, using SIFT descriptors that are robust with respect to variations in pose, scale, and lighting, and then runs an optimization to recover the camera parameters and 3D positions of those features. The resulting correspondences and 3D data enable all of the features of its system. The system handles large collections of unorganized photographs taken by different cameras in widely different conditions.  Almost all approaches are feature dependent, and hence they are not entirely invariant for variations in image. Thus, there is no single approach that provides a very robust solution. Also in some methods, object detection rates and processing time per image are unsatisfactory. By considering all drawbacks, a new approach is proposed in the present thesis by combining a number of feature types and accounting for a number of invariant factors to reduce the run-time computational requirement while improving the detection rate and the accuracy. In the proposed vision system architecture, it is possible to define facts and constraints for the objects to be identified and also to define the expected object detection rate and the processing time per image. These are useful to optimize the use of features and techniques within the model and to  31  determine when to use the model for object detection. The approach is given by the following steps: •  First, place the object in a unique white colour background, and define the region of interest (ROI) of the image to omit unwanted features and reduce the computation cost.  •  Detect locations of SIFT and corner features.  •  Significant blob areas in the image are manually specified during the off-line training process. This option will reduce the picking of useless feature points and reduce the computational burden.  •  According to the requirement of object detection, add and perform linear and nonlinear distortions (affine, projection, noise, illumination, rotating, scaling) to the local patches of features.  •  Use specific feature extracting methods to find and store feature descriptors, such as SIFT feature vectors and corner descriptors in data-structures. In addition, their locations on the image, colour at the feature locations, and positions and directions of the camera are also temporarily stored.  •  Reduce the feature set while keeping most distinguished and stable feature points.  
•  If different types of features are found at the same location, keep only a few low dimensional features.  •  Remove the similar feature descriptors that are found from previous images using a matching method. Also if similar features are found at different locations on the object, keep the feature descriptor and the locations.  •  Store data (feature descriptor, location, colour) of the features in an indexed database.  •  Two feature sets of two images from known directions are used to find the distance to the feature point, which is useful in pointing the feature in a three dimensional space.  •  Features pointed to in a three dimensional space using location and distance of the feature.  •  Store the expected object detection rate, processing time per image and the provided facts and constraints for the object.  The following factors need to be considered in the object detection methods. •  Initially, if possible define the required object detection rate and processing time per image. Also provide the available facts and constraints of the object to be found. This performance criterion may prevent execution of some processing steps.  •  In real-time identification, the 3D feature space may be geometrically rotated to find a number of best matching features. Then the pose of the object can be estimated in a simple manner.  32  Many techniques are utilized in the work presented in this thesis. For feature detection the SIFT feature detector, Harris’ corner detector and Hough-line are used. Feature matching is performed using the KD-Tree technique. To remove unstable feature points, techniques proposed by Turcot and Lowe (2009) are used. Images are merged using Image Stitching and Panorama techniques to remove repeated features.  The object model is created by following the approach as described above. It is used in the section on “Object Detection using 3D Invariant Feature Space Model” and the results are presented with conclusions in section 3.4.2.2. Due to difficulty in implementation, all the features described above are not included in the created model.  Advantages of the proposed approach are as follows: 1) It reduces the run-time computation. The model generating process is complex and time consuming only at the beginning; 2) It may be used with a number of object detection methods; 3) It satisfies feasible performance criteria; and 4) It is able to work with different type of features if the object has some significant features.  The model that is proposed and developed in the present thesis may be upgrade by combining with intelligent techniques. Furthermore, it is desirable to automate the process, and also to design and implement methods of model verification and fine-tuning.  3.3.3. AdaBoost-based Object Model “Boosting” is a general and efficient machine learning algorithm for improving the performance of any learning algorithm. Boosting works by repeatedly running a given weak learning algorithm on various distributions using the training data, and then combining the classifiers produced by the weak learner into a single composite classifier (Yoav and Robert, 1997). The object model is formed by using the classifier, and the facts and constraints for the object. This process is illustrated in the Figure 3.3.  33  Training Samples  weak learner  Object Model  Features Cascading  T  T 1  2 F  T 3  F  T N  F  F  Figure 3.3: Boosting Algorithm to Form a Strong Classifier. As an effective boosting algorithm, AdaBoost has been presented by Schapire (1990) and Freund (1995). 
AdaBoost is an abbreviation for Adaptive Boosting, which has been described as "the best off-the-shelf classifier in the world" by Hastie et al. (2001). In AdaBoost learning, Haar-like features are used as training data to make classifiers. Usually, each feature is described by basic geometric shapes such as two or three joined "black" and "white" rectangles, either upright or rotated by 45°, as in Figure 3.1 (b) in section 3.3.1.1. Each feature type can indicate the existence (or absence) of certain characteristics in the image, such as edges or changes in texture. For example, a two-rectangle feature indicates where the border between a dark region and a light region lies. The presence of a Haar feature is determined by subtracting the average dark-region pixel value from the average light-region pixel value. If the difference is above a threshold (set during learning), that feature is said to be present. Selection of features plays an important role in the performance of the classification, so it is important to use a set of features that discriminates one class from another. A combination of rectangles is better suited for regular types of objects. Accordingly, two different feature sets are used in the present work depending on the type of object: one for regular objects and one for irregular objects. When the type of object is defined, an appropriate predefined feature set is selected. Such predefined sets are shown in Figure 3.4 (a) and (b). Note that the Haar-like features used here differ in block size and type from the features discussed in Section 3.3.1.1. The feature set is also computed under image variations such as scale, rotation, and illumination in order to increase the accuracy of the classifier.

Figure 3.4: Predefined Feature Sets. (a) Feature set for irregular objects (Saxena et al., 2008); (b) feature set for regular objects (edge, line, corner, diagonal line, and center-surround features).

The computed feature value is then used as the input to a simple decision tree classifier that usually has just two terminal nodes, that is:

\[ f_i = \begin{cases} +1, & x_i \ge t_i \\ -1, & x_i < t_i \end{cases} \]

where the response +1 means an object of interest, and −1 means an object that is not of interest.

A "weak" classifier on its own will not be able to detect objects of interest; rather, it reacts to a simple feature in the image that may be related to an object of interest. Here, "weak" means that the classifier is only slightly better than random guessing, which by itself is not acceptable. When there are many such weak classifiers, each one pushing the final answer a little in the right direction, the combined effect in arriving at the correct solution can be significant. The AdaBoost algorithm selects a set of weak classifiers and combines them by assigning a weight to each classifier. This weighted combination results in a strong classifier.

The algorithm trains weak classifiers using an input training set \( (x_1, y_1), \ldots, (x_m, y_m) \), where the \( x_i \) are object images belonging to the object domain \( X \) and each label \( y_i \) is the classified output in some label set \( Y \). The instances \( x_i \) correspond to features of the object images and the labels \( y_i \) give the outcome for each object image. Assume \( Y = \{-1, +1\} \); this differs for the multi-object case. The weight of the distribution on training object image \( i \) in round \( t \) is denoted \( D_t(i) \). At the beginning of the learning phase, the weights are initialized as \( D_1(i) = 1/m \), where \( m \) is the number of object images. AdaBoost calls the weak learning algorithm repeatedly in a series of rounds \( t = 1, \ldots, T \).

One of the main ideas of the algorithm is to maintain a distribution, or set of weights, over the training set. Initially, all weights are equal. In each round, the weights of the incorrectly classified object images are increased so that the weak learner is forced to focus on the hard object images in the training set. The algorithm is presented in Figure 3.5.

The weak learner's task is to find a weak hypothesis \( h_t : X \rightarrow \{-1, +1\} \) that is appropriate for the distribution \( D_t \). The goodness of a weak hypothesis is measured by its error

\[ \varepsilon_t = \Pr_{i \sim D_t}\big[ h_t(x_i) \neq y_i \big] = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i). \]

The error is measured with respect to the distribution \( D_t \) on which the weak learner was trained.

Given: \( (x_1, y_1), \ldots, (x_m, y_m) \), where \( x_i \in X \) and \( y_i \in Y = \{-1, +1\} \).
Initialize \( D_1(i) = 1/m \).
For \( t = 1, \ldots, T \):
1. Train the weak learner using distribution \( D_t \).
2. Get the weak hypothesis \( h_t : X \rightarrow \{-1, +1\} \) with error \( \varepsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i] \).
3. Choose \( \alpha_t = \frac{1}{2} \ln\!\left( \frac{1 - \varepsilon_t}{\varepsilon_t} \right) \).
4. Update
\[ D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t}, & h_t(x_i) = y_i \\ e^{\alpha_t}, & h_t(x_i) \neq y_i \end{cases} = \frac{D_t(i) \exp\!\big( -\alpha_t\, y_i\, h_t(x_i) \big)}{Z_t}, \]
where \( Z_t \) is a normalization factor (chosen so that \( D_{t+1} \) is a distribution).
Output the final hypothesis:
\[ H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right). \]

Figure 3.5: AdaBoost Algorithm.

Once the weak hypothesis \( h_t \) has been received, AdaBoost chooses the parameter \( \alpha_t \) as in Figure 3.5. Intuitively, \( \alpha_t \) measures the importance assigned to \( h_t \). Note that \( \alpha_t \ge 0 \) if \( \varepsilon_t \le 1/2 \) (which may be assumed without loss of generality), and that \( \alpha_t \) grows as \( \varepsilon_t \) becomes smaller.

The distribution \( D_t \) is updated using the rule shown in Figure 3.5. The effect of this rule is to increase the weights of the object images misclassified by \( h_t \) and to decrease the weights of the correctly classified object images. The weight therefore tends to concentrate on the "hard" object images.

The final hypothesis \( H \) is a weighted majority vote of the \( T \) weak hypotheses, where \( \alpha_t \) is the weight assigned to \( h_t \).

The total number of Haar-like features within any image sub-window is very large. For fast classification, the learning process has to focus on only a small set of significant features. Therefore, the enhanced AdaBoost method proposed by Tieu and Viola (2000) is used in the present work, in which the weak learner is constrained so that each returned weak classifier depends on only a single feature (Bowyer and Dyer, 1990).

The object-related facts and constraints are embedded into the final classifier to form the object model.

This method gives a fast object detection rate at run time, and there are no parameters to tune except the number of rounds \( T \). An advantage of the AdaBoost algorithm is its simplicity of implementation. Better performance is achieved with the proposed feature set selection, which takes the object type into account. Robustness is also enhanced by varying the scale, orientation, and illumination over several levels. The algorithm is less susceptible to the over-fitting problem when a large number of training image data sets is used.

This approach will be extended for multi-object modeling.
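For exposition, the boosting loop of Figure 3.5 can be written compactly with single-feature decision stumps as the weak learners. The following is a minimal NumPy sketch, not the cascaded, Haar-feature-based classifier trained in this work; the variable names and the exhaustive stump search are illustrative only.

```python
import numpy as np

def train_adaboost(X, y, rounds=20):
    """Discrete AdaBoost (cf. Figure 3.5) with single-feature threshold stumps.

    X: (m, n) array of feature values; y: (m,) labels in {-1, +1}.
    Returns a list of stumps (feature_index, threshold, polarity, alpha).
    """
    m, n = X.shape
    D = np.full(m, 1.0 / m)                       # D_1(i) = 1/m
    stumps = []
    for _ in range(rounds):
        best = None
        # Weak learner: exhaustively pick the stump with the lowest weighted error.
        for j in range(n):
            for thr in np.unique(X[:, j]):
                for polarity in (1, -1):
                    pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                    err = D[pred != y].sum()      # epsilon_t
                    if best is None or err < best[0]:
                        best = (err, j, thr, polarity, pred)
        err, j, thr, polarity, pred = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # numerical guard
        alpha = 0.5 * np.log((1.0 - err) / err)   # alpha_t
        D *= np.exp(-alpha * y * pred)            # up-weight the misclassified samples
        D /= D.sum()                              # normalize by Z_t
        stumps.append((j, thr, polarity, alpha))
    return stumps

def classify(stumps, x):
    """H(x) = sign(sum_t alpha_t * h_t(x))."""
    score = sum(alpha * (1 if polarity * (x[j] - thr) >= 0 else -1)
                for j, thr, polarity, alpha in stumps)
    return 1 if score >= 0 else -1
```

In the actual system the weak learners operate on Haar-like feature responses and are arranged into a rejection cascade, as described in the following sections.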
In addition to Haar-like features, other features such as object area, object orientation and object appearance will be used in the form of a density function such as histogram.  3.3.4. Other Methods In addition to the object modeling methods described in the above sections, there exist several other object modeling techniques as summarized below.  —Probability Densities of Object Appearance. The probability density estimates of the object appearance can be parametric, such as Gaussian (Zhu and Yuille, 1996); a mixture of Gaussians (Paragios and Deriche, 2002); nonparametric, such as Parzen windows (Elgammal et al., 2002); or histograms (Comaniciu et al., 2003). The probability densities of the features of object appearance (color, texture) can be computed from the image regions specified by the shape models (interior region of an ellipse or a contour).  —Active Appearance Models. Active appearance models are generated by simultaneously modeling the object shape and appearance (Edwards et al., 1998). In general, the object shape is defined by a set of landmarks. Similar to the contour-based representation, landmarks can reside on the object boundary or, alternatively, they can reside inside the object region. For each landmark, an appearance vector is stored, which is in the form of color, texture, or gradient magnitude. Active appearance models require a training phase where both the shape and its associated appearance are learned from a set of samples using, for instance, the principal component analysis.  —Multiview Appearance Models. These models encode different views of an object. One approach for representing different object views is to generate a subspace from the given views. Subspace approaches, for example, Principal Component Analysis (PCA) and Independent Component Analysis (ICA), have been used for both shape and appearance representation (Mughadam and Pentland, 1997; Black and Jepson, 1998). Another approach to learn different views of an object is by training a set of classifiers; for example, the support vector machines (Avidan, 2001) or Bayesian networks (Park and Aggarwal,  38  2004). One limitation of multiview appearance models is that the appearances in all views are required ahead of time.  —Support Vector Machines. As a classifier, Support Vector Machines (SVM) are used to cluster data into two classes by finding the maximum marginal hyperplane that separates one class from the other (Boser et al., 1992). The margin of the hyperplane, which is maximized, is defined by the distance between the hyperplane and the closest data points. The data points that lie on the boundary of the margin of the hyperplane are called support vectors. In the context of object detection, these classes correspond to the object class (positive samples) and the nonobject class (negative samples).  3.4. Object Detection Methods The vision system as developed in the present thesis has to work in an unknown, unstructured and dynamic environment. Therefore, object detection methods that are employed should be robust to variations in environmental factors such as lighting (illumination and the direction of illumination), change of image direction, and variation of object features (such as scale, shape, affine distortions, orientation, and rotation). Thus, the present work focuses on most suitable approaches that can deal with these varying factors and conditions.  Different types of object detection approaches have been identified. 
Two pertinent approaches are, Local and Global approaches and Generative and Discriminative models. In Local and Global approaches, Local approaches look for more significant regions describe by image structures. This approach consist of two aspects: Region Detectors and Region Descriptors. The entire image is taken into consideration by Global approaches. Subspace methods also fall into global approaches. In the second approach of Generative models (Principal Component Analysis or PCA), Non-negative Matrix Factorization or NMF, etc..) are employed to reduce the dimension of the dataset. In Discriminative models (Linear Discriminant Analysis or LDA), Support Vector Machines or SVM, Boosting, and so on are designed for categorization.  Selection of a suitable object detection method for the proposed vision system is dependent on the type of object, operating environment, and the requirements of the vision system. A series of experiments are carried out using different object detection methods. Some of these methods are 39  enhanced to improve the object detection rate while reducing the computational burden. In the present work, two object detection methods are proposed and implemented.  An enhanced version of an existing method, termed “Object Detection using AdaBoost-based Object Model” within the Generative and Discriminative approach is presented in section 3.4.1.1. The proposed methods: “Object Detection using 3D Invariant Feature Space Model” within the Local approach given in section 3.4.2.1 and “Detection and Property Extraction of Regular Object” within the Geometric Shape based approach given in 3.4.3.1. are also presented.  3.4.1. Generative and Discriminative Approach A generative probabilistic distribution is a principled way to model many problems of machine learning and machine perception. There, one provides domain specific knowledge in terms of structure and parameters over the joint space of variables.  Generative models provide the user with the ability to start a learning algorithm with knowledge about the problem at hand. This is given in terms of structured models, independence graphs, Markov assumptions, prior distributions, latent variables, and probabilistic reasoning (Box and Tiao, 1992; Pearl, 1997). The focus of generative models is to describe a phenomenon and attempt to re-synthesize or generate configurations from it. In the context of building classifiers, predictors, regressors and other task-driven systems, density estimation over all variables or a full generative description of the system often can be an inefficient intermediate goal. As one extreme, the generative models attempt to estimate a distribution of all the variables (inputs and outputs) in a system.  Meanwhile, discriminative techniques such as support vector machines have little to offer in terms of structure and modeling power, yet achieved superb performance on many test cases. This is due to their inherent and direct optimization of a task-related criterion.  On the other hand, discriminative algorithms adjust a possibly non-distributional model for data optimization in a specific task such as classification or prediction. This typically leads to superior performance while reaching a compromise in flexibility of generative modeling.  40  Statistical model-based training takes multiple instances of the object class of interest, or “positive” samples, and multiple “negative” samples, i.e., images that do not contain objects of interest. 
Positive and negative samples together make a training set. During training, various features are extracted from the training samples, and distinctive features that can be used to classify the object are selected. This information is "compressed" into the statistical model parameters. If the trained classifier misses an object or mistakenly detects an absent object (i.e., gives a false alarm), it is easy to make an adjustment by adding the corresponding positive or negative samples to the training set. This approach is utilized in the object detection method given next.

3.4.1.1. Object Detection using AdaBoost-based Object Model

In this method, a real-time image is converted into a gray-scale image, and the histogram equalization operation is used to spread the brightness values. Rectangular regions that may contain objects are then found in the image. The regions of the image are scanned several times at different scales. Each pass considers overlapping regions in the image and applies the object model to them. After the candidate rectangles (regions that passed the classifier cascade) have been collected, they are grouped and a sequence of average rectangles is returned.

This method uses AdaBoost-based Object Models. The model contains a classifier and the facts and constraints of the object. The classifier is trained using training image sets. Facts and constraints are incorporated into the model to verify the detected object. The object modeling is presented in section 3.3.3.

The object detection process is represented as a "cascade." A positive result from the first classifier triggers the evaluation of the second classifier, which has also been adjusted to achieve very high detection rates. A positive result from the second classifier triggers a third classifier, and so on. A negative outcome at any point leads to immediate rejection.

Individual locations in the image are checked to find whether or not they contain the object. This is performed through different variations of the features until the object is found, and is optimized by considering the time taken. Feature variation is achieved by scaling, rotating, and thresholding the features of the classifier itself, rather than by scaling the image. This makes sense because the features can be evaluated at any scale with the same cost.

Subsequent locations are obtained by shifting the window through N pixels. This shifting process is affected by the scale of the detector; if the current scale is s, the window is shifted by sN. The choice of N affects both the speed and the accuracy of the detector. Since the training images have some translational variability, the learned classifier will achieve a good object detection rate in spite of small shifts in the image. As a result, the detector sub-window can be shifted by more than one pixel at a time. However, a step size of more than one pixel will slightly decrease the detection rate while decreasing the number of false positives. A simplified process is presented by the flowchart in Figure 3.6 and the algorithm in Figure 3.7.

Figure 3.6: Flow Diagram of Object Detection using AdaBoost-based Object Model. (The flowchart covers image capture and preprocessing, feature evaluation through the cascaded classifiers over multiple sub-windows, scales, rotations, and thresholds, optional further verification by template matching, property extraction and matching against the facts and constraints, and drawing of the bounding boxes.)

Capture an image;
Convert to gray-scale image;
// Preprocess
Use histogram equalization; Smooth image;
// Detect features through several sub-windows
Loop for sub-windows
    // Try a number of scales, rotations, and thresholds
    Loop for multiple scales, rotations, and thresholds
        Calculate feature, using sub-window;
        // Classify the feature with the classifier of the object model (Detect_feature is given below)
        "final_value" = Detect_feature(feature, cascaded_classifier);
        If (sign of "final_value" is positive)
            // The object is detected
            Exit from loops;
        End if
    End loop
End loop
If (further verification = true)
    Match templates of blobs;
End if
If (object is detected = true)
    // Object property extraction
    Extract properties of the object;
    Match with the existing properties in Facts & Constraints in the object model;
End if
If (properties of the object are matched)
    // The object is detected and further verified
    Draw a bounding box at the location where the object was found;
End if

Detect_feature("feature", "cascaded_classifier")
{
    Loop for the number of classifiers in "cascaded_classifier"
        Pass "feature" through the particular classifier (weak hypothesis, ht);
        Multiply the weight factor (αt) by ht("feature");  // αt is calculated when the classifier is trained
        Add the products at each round;                     // the accumulated sum is "final_value"
    End loop
    return "final_value";
}

Figure 3.7: Algorithm of Object Detection using AdaBoost-based Object Model.

3.4.2. Feature-based Approach

There are four general steps in a feature-based system. First, in feature detection, each image is searched for locations that are likely to match well in other images. In the second stage, feature description, each region around the detected feature locations is converted into a more compact and invariant descriptor that can be matched against other descriptors. The third stage, feature matching, efficiently searches for likely matching candidates in other images. The fourth stage, feature tracking, is an alternative to the third stage that searches only a small neighborhood around each detected feature; it is therefore more suitable for video processing.

Feature-based correspondence approaches have been used in stereo matching by Hannah (1988) and have more recently become popular for image stitching applications (Badra et al., 1998; Brown et al., 2005; Brown and Lowe, 2007) as well as for fully automated 3D modeling (Beardsley et al., 1996; Brown and Lowe, 2003; Snavely et al., 2006). These approaches use various types of features such as the Haar-like feature, SIFT feature, KLT feature, and corner-like feature. Feature matching is a general requirement in the matching strategy. The features in two images that have similarities are accurately tracked using a local search technique such as least squares, KD-Tree, or Best Bin First.

Due to the presence of high-dimensional features, such as the 128-dimensional SIFT feature vectors, it is necessary to locate the nearest neighbour efficiently.
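The direct, exhaustive way of doing this, combined with Lowe's closest-to-second-closest ratio test, looks roughly as follows (a sketch using OpenCV's brute-force matcher; the ratio value is an illustrative choice).

```python
import cv2

def match_descriptors(model_desc, image_desc, ratio=0.8):
    """Exhaustive SIFT descriptor matching with the nearest/second-nearest ratio test."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)                 # brute force: compares every pair
    candidates = matcher.knnMatch(model_desc, image_desc, k=2)
    good = []
    for pair in candidates:
        if len(pair) < 2:
            continue
        best, second = pair
        # Keep a match only if it is clearly better than the runner-up; this
        # suppresses ambiguous matches caused by background clutter.
        if best.distance < ratio * second.distance:
            good.append(best)
    return good
```

The cost of this scheme grows with the product of the two descriptor counts, which motivates the tree-based search strategies discussed next.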
A better alternative to exhaustive search is the Best-Bin-First (BBF) search strategy in a KD-Tree.

A problem in the feature matching process is the presence of features arising from background clutter. There are approaches to overcome this problem, for example by comparing the distance to the closest match with the distance to the second-closest match.

Incorrect identification of features can also be avoided with the technique of feature distribution matching, as proposed in the present thesis.

Most feature-based matching systems are not restricted to a single type of feature. Instead, a collection of feature types is incorporated. For example, the system proposed by Weng (1988) combines intensity, edges, and corners to form multiple attributes for matching. Lim and Binford (1987), on the other hand, used a hierarchy of features varying from edges and curves to surfaces and bodies (2D regions) for high-level attribute matching.

3.4.2.1. Object Detection using 3D Invariant Feature Space Model

This method is proposed and implemented in the present thesis to extract SIFT and corner-like features and to detect multiple objects by matching the features to the 3D Invariant Feature Space model.

The input image is pre-processed to highlight the features of the image. These features might be blobs, edges, corners, or ridges. The technique of histogram equalization is employed to obtain a proper illumination distribution. Down-scaling and up-scaling of the image may be done to filter the noise. An available executable program for SIFT feature extraction is used (embedded in the present implementation) to extract and compute SIFT feature descriptors. Corner-like features are extracted using the Harris corner detector.

In the feature matching process, the correspondence between two sets of features is found by minimizing some distance between the two sets of descriptors. If the values of two feature vectors are sufficiently close, the features are considered a match. When a number of found features have met the criteria for an object, the object is considered to be detected. For SIFT features, one criterion is the number of matched features; according to the literature, a suitable number is three. Another criterion is the feature distribution, where feature points are scattered in a pattern over the object. This feature point distribution can be formulated as a chain-code with variations. It is possible to increase the accuracy of object detection by checking the distribution of the feature locations. This simplified process is shown in Figure 3.8.

Figure 3.8: Flow Diagram of Object Detection using 3D Invariant Feature Space Model. (The flowchart covers SIFT and corner feature extraction, removal of repeated features, KD-Tree construction and search against each object model, recording of matched feature locations, and matching of the feature distributions before an object is declared found.)

Generally, the best candidate match for each feature point is found by identifying its nearest neighbour in the object model. The nearest neighbour is defined as the feature point with the minimum Euclidean distance between the invariant descriptor vectors. For the SIFT features, 128-dimensional feature descriptors need to be compared in order to complete the matching process. These 128-vectors represent a histogram, and as such a suitable distance between two descriptors \( a \) and \( b \) is

\[ d(a, b) = 2 \sum_{i=1}^{128} \frac{(a_i - b_i)^2}{a_i + b_i}, \]

where all entries of \( a \) and \( b \) are positive. Cross correlation is also suitable as a descriptor similarity measure, and it is affine invariant. These two techniques, and many others, can become infeasible due to the high dimensionality (128 dimensions for SIFT descriptors) and the large number of features.

Several widely used similarity measuring techniques were studied in order to reduce the computational burden while providing correct results. Among these methods, the KD-Tree, Locality Sensitive Hashing (LSH), and the Sphere-Rectangle-Tree (SR-Tree) are particularly useful. The KD-Tree with the Best Bin First (BBF) search strategy is chosen here, which has been successfully used before with SIFT (Beis and Lowe, 1997). The KD-Tree is a classic data structure for multidimensional indexing. Basically, it is a binary search tree in which each node splits the search space along a given dimension of the space (Friedman, 1977).

To build a KD-Tree from a base set in the space, one starts by choosing the dimension to be split. The criterion of choice may vary: the dimension may be chosen according to the level in the tree, chosen randomly, or picked as the dimension in which the data are most spread. The point of splitting is then chosen by selecting an element as the pivot. The pivot may be the median element along the splitting dimension, the element nearest to the middle of the range of that dimension, or something else. A node is created with the pivot and the splitting dimension; the left sub-tree is created recursively with all elements that are less than or equal to the pivot, and the right sub-tree is created with all elements that are greater than the pivot. If the number of elements in a sub-tree is smaller than a threshold, a leaf is created and put in a bucket which simply contains all remaining elements. To perform the nearest-neighbour search in a KD-Tree, one starts by finding the leaf nearest to the query. To do this, the tree is traversed from top to bottom, always choosing the sub-tree that is on the same side as the query in the splitting dimension. Once a leaf is reached, the nearest elements to the query are found and the distances to those elements computed. If this distance is smaller than the distance between the query and any other region of the space, the nearest neighbours have been found. Otherwise, the sub-trees corresponding to the other regions of the space are examined in order to check whether there are nearer elements. The search stops when no region can possibly contain an element closer than the elements already found. The associated algorithm is presented in Figure 3.11.

The Best Bin First is one possible strategy for traversing a KD-Tree. First, the most probable nodes are found, and then the nearest matches are determined.
This strategy starts by exploring the sub-trees that have the corresponding regions nearer to the query, and the search is stopped after a certain number of regions have been explored. The search is approximate and it is possible to reach a trade-off between efficiency and precision by setting the number of regions to visit (Beis and Lowe, 1997). The following pseudocodes present the logic of the implemented method. Loop unconditionally Capture an image (later there will be two images at a time from two cameras); Skip first few images (for initialize); Get SIFT and Corner feature descriptors of the real-time image; Remove repeated SIFT and Corner features at the same location; Loop for number of object models Loop for number of required objects to be found Make “SIFTfeatureList”; kdtree(“SIFTfeatureList”, “depth”); // Make a KD-Tree using the algorithm in figure 3.10 kdsearch(“kdtree”,“target”,“hr”,“max-dist-sqd”); // Use kd-search algorithm in figure 3.11 Record locations of matched features; If (count of matched SIFT features > 2) Find and match feature distributions; If ((count of matched SIFT features > 2) && (matched feature distributions)) The object is found by SIFT features!; End if If (reduce processing time = false) or (count of matched SIFT features < 3) Make “CornerFeatureList”; kdtree(“CornerFeatureList”, “depth”); // Make a KD-Tree using the algorithm in figure 3.10 kdsearch(“kdtree”,“target”,“hr”,“max-dist-sqd”); // Use kdsearch algorithm in figure 3.11 Record locations of matched features; Find feature distribution; If (feature distributions are matched) The object is found by corner features!; End If End If End If End Loop // Loop to find the next Object End Loop End main Loop  Figure 3.9: Object Detection using 3D Invariant Feature Space Model.  48  kdtree(“featureList”, “depth”) { // Select axis based on “depth” so that axis cycles through all valid values k = len(“featureList[0]”); // assumes all features have the same dimension axis = “depth” mod k; // Sort “featureList” and choose median as pivot element “featureList”.sort; // by axis from “featureList” median = len(“featureList”)/2; // choose median // Create node and construct subtrees var tree_node node; node.location = “featureList[median]”; node.leftChild = kdtree(features in “featureList” before median, “depth”+1); node.rightChild = kdtree(features in “featureList” after median, “depth”+1); return node; }  Figure 3.10: For a Given List of Features, the Algorithm will Construct a Balanced KD_ Tree Containing these Features. 
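To complement the pseudocode in Figures 3.10 and 3.11, the following hedged C++/OpenCV sketch shows (i) the histogram-style distance given above for two 128-dimensional SIFT descriptors and (ii) an approximate KD-tree nearest-neighbour search. OpenCV's FLANN index is used here only as a stand-in for the KD-Tree with Best-Bin-First search described in the text (it uses Euclidean distance by default, as in Beis and Lowe, 1997); the parameter values and the distance-ratio threshold are illustrative assumptions.

#include <opencv2/opencv.hpp>
#include <vector>
#include <utility>

// Hedged sketch: the histogram-style distance from the text for two 128-D
// SIFT descriptors (all entries assumed non-negative), stored as 1 x 128 CV_32F rows.
double descriptorDistance(const cv::Mat& a, const cv::Mat& b)
{
    CV_Assert(a.cols == 128 && b.cols == 128);
    double d = 0.0;
    for (int i = 0; i < 128; ++i)
    {
        float ai = a.at<float>(0, i), bi = b.at<float>(0, i);
        if (ai + bi > 0.0f)
            d += (ai - bi) * (ai - bi) / (ai + bi);
    }
    return 2.0 * d;
}

// Hedged sketch of approximate nearest-neighbour matching against the model
// descriptors with a KD-tree index.
void matchDescriptors(const cv::Mat& modelDescriptors,               // one 128-D row per model feature (CV_32F)
                      const cv::Mat& imageDescriptors,               // one 128-D row per image feature (CV_32F)
                      std::vector<std::pair<int, int> >& matches)    // (image row, model row) pairs
{
    // Build a randomized KD-tree index over the model descriptors.
    cv::flann::Index kdTree(modelDescriptors, cv::flann::KDTreeIndexParams(4));

    cv::Mat indices(imageDescriptors.rows, 2, CV_32S);
    cv::Mat dists(imageDescriptors.rows, 2, CV_32F);
    // Limiting the number of leaves examined (32 checks here) gives the
    // efficiency/precision trade-off described for Best-Bin-First search.
    kdTree.knnSearch(imageDescriptors, indices, dists, 2, cv::flann::SearchParams(32));

    for (int r = 0; r < imageDescriptors.rows; ++r)
    {
        // Reject ambiguous matches: the nearest distance must be clearly smaller
        // than the second-nearest (the closest/second-closest test); 0.6 is illustrative.
        if (dists.at<float>(r, 0) < 0.6f * dists.at<float>(r, 1))
            matches.push_back(std::make_pair(r, indices.at<int>(r, 0)));
    }
}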
kdsearch(“kdtree”, “target”, “hr”, “max-dist-sqd”) { // Declarations “kdtree” – input KD-Tree; “dom-elt” – a feature from multi-dimensional space; “range-elt” – a feature from other multi-dimensional space; “target” – domain vector; “nearest” – a nearest feature to “target”; “split” – splitting dimension; “hr” – hyperrectangle; “dist-sqd” – square of the distance of this nearest feature, default is infinity; // Cuts the current hyperrectangle into the two “S” = “split” field of “kdtree”; “pivot” = “dom-elt” field of “kdtree”; Cut “hr” into two sub hyperrectangles “left-hr” and “right-hr”; The cut plane is through “pivot” and perpendicular to the “S” dimension; // Determine which child contains the target “target-in-left” = “targets” ≤ “pivots”; If (“target-in-left”) “nearer-kd” = left field of “kdtree” and “nearer-hr” = “lefthr”; “further-kd” = right field of “kdtree” and “further-hr” = “right-hr”; If (not “target-in-left”) “nearer-kd” = right field of “kdtree” and “nearer-hr” = “right- hr”; “further-kd” = left field of “kdtree” and “further-hr” = “left-hr”;  49  // This initial child is searched Recursively call kdsearch with parameters; (“nearer-kd”, “target”, “nearer-hr”, “max-dist-sqd”), storing the results in “nearest” and “dist-sqd”; // Check for any closer feature // restricts the maximum radius in which any possible closer feature could lie “max-dist-sqd” = minimum of “max-dist-sqd” and “dist-sqd”; // checks whether there is any space in the hyperrectangle of the further child which lies within this radius A near feature could only lie in “further-kd” if there were some part of “further-hr” within distance “max - dist - sqd” of “target”; If feature is lies within this radius // checks if the feature associated with the current node of the tree is closer than the closest yet If ((“pivot” – “target”)2 < “dist-sqd”) “nearest” = “pivot.range-elt” field of “kdtree”; “dist-sqd” = (“pivot” – “target”)2; “max-dist-sqd” = “dist-sqd”; // The further child is recursively searched Recursively call kdsearch with parameters; (“further-kd”, “target”, “further-hr”, “max-dist-sqd”), storing the results in “temp-nearest” and “temp-dist-sqd”; // If (“temp-dist-sqd” < “dist-sqd”) “nearest” = “temp-nearest” and “dist-sqd” = “temp-dist-sqd”; End if return “nearest”, “dist-sqd”; }  Figure 3.11: Nearest_Neighbor KD_Feature_Tree Search Algorithm.  The method provides good results under some constraints. Typical results are presented in the section 4.2.2. The determined features are shown by drawing a color rectangle at the feature point, and an object is shown by a bounding box.  3.4.3. Geometric Shape-based Approach Research has been carried out on applying the field of topology in object identification. Topology studies qualitatively question about geometrical structures such as properties that are invariant of shape. As an example, a doughnut and a rectangle are topologically equivalent, because a rectangle can be “stretched” to form a doughnut, and vice versa. This can help to 50  determine the invariant relationships between objects, or between various features of the same object. The study of invariant shape is extremely important for object identification since the same object may appear in many different views, depending on the direction of view, lighting conditions, other environmental factors, and occlusion.  Topology provides the ability to examine an object by qualitative means. 
The topological approach typically succeeds over other approaches because it studies relationships between objects instead of relying on exact measurements. For example, identifying the intersection of two roads geometrically involves an infinite number of angle combinations, whereas topologically the only necessary condition is that the two lines intersect. In identifying structures, rather than looking for all possible shapes with a variety of sizes, topology would identify the characteristics that all buildings have and use them in the initial detection.

3.4.3.1. Detection and Property Extraction of Regular Object
In general, regular types of objects are used to build a useful device such as a cart, even in an unconstrained environment. Several functions are implemented in the present thesis to detect and track irregular types of objects, and a regular-object detection method is proposed here to identify polygons and circular objects in real-time images. In this method, the identified object regions in the image are further processed with the given facts and constraints of the objects (such as the facts and constraints of the useful objects required for the cart assembly process) to obtain the properties of the objects.

The concept behind detecting polygon-type objects is to find contours with a specified number of vertices, angles between edges that are close to specified values, and a contour area limited to a certain range. For circular objects, the center position and the corresponding pixel distribution, which lies at a unique distance or at a proportional distance from the center, are found. In such cases, the object factors (number of vertices, values of angles, minimum and maximum radius) are varied according to the given specification of facts and constraints of the objects and by considering the shearing of the object in the image. An overview of the algorithm is presented as a flow diagram in Figure 3.12.

Figure 3.12: Simplified Process of Polygon and Circular Type of Object Detection. [Flow diagram: capture an image; use the facts and constraints; Gaussian-smooth the image; for each of the three colour planes and for each threshold level (Canny edge detection followed by dilation at threshold level 0, a fixed threshold otherwise), find contours; if rectangular objects are to be found, approximate each contour and check the number of vertices, the area, the convexity, and the angles between edges before storing the polygon; if circular objects are to be found, compute the gradient over all rows and columns, sort the gradient values, store the point locations, select candidate centers, and for each center sort the points by distance to find the radii; the robot decides when to terminate.]

Initially, a real-time image is converted into a gray image with the same dimensions and converted between color spaces. Then the image is pre-processed by defining the region of interest and smoothing the image to prevent false contours and circles from being detected.

Contours and circles are searched through the different color planes by applying different thresholds to overcome lighting issues. In this process, the Canny edge detection method is used at the zeroth threshold level, and a different threshold is applied to the image at the other threshold levels.

Then the contour finding method is applied with the given contour parameters.
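A hedged OpenCV sketch of this polygon and circle search is given below; it condenses the steps described here and elaborated in the following paragraphs. The thresholds (vertex count, minimum area, maximum cosine of the corner angles, radius range) stand in for the facts and constraints supplied to the method and are illustrative only.

#include <opencv2/opencv.hpp>
#include <vector>
#include <cmath>
#include <algorithm>

// Cosine of the angle at pt0 between the edges pt0->pt1 and pt0->pt2.
static double angleCosine(cv::Point pt1, cv::Point pt2, cv::Point pt0)
{
    double dx1 = pt1.x - pt0.x, dy1 = pt1.y - pt0.y;
    double dx2 = pt2.x - pt0.x, dy2 = pt2.y - pt0.y;
    return (dx1 * dx2 + dy1 * dy2) /
           std::sqrt((dx1 * dx1 + dy1 * dy1) * (dx2 * dx2 + dy2 * dy2) + 1e-10);
}

void findRegularObjects(const cv::Mat& bgr,
                        std::vector<std::vector<cv::Point> >& quads,
                        std::vector<cv::Vec3f>& circles)
{
    cv::Mat gray, edges;
    cv::cvtColor(bgr, gray, CV_BGR2GRAY);
    cv::GaussianBlur(gray, gray, cv::Size(5, 5), 1.5);    // suppress false contours and circles

    // Polygon branch: Canny edges -> contours -> polygonal approximation.
    cv::Canny(gray, edges, 0, 50);
    cv::dilate(edges, edges, cv::Mat());                   // close small gaps between edge segments
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(edges, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);

    for (size_t i = 0; i < contours.size(); ++i)
    {
        std::vector<cv::Point> approx;
        cv::approxPolyDP(contours[i], approx,
                         0.02 * cv::arcLength(contours[i], true), true);
        // Keep convex quadrilaterals of reasonable size whose corner angles are
        // close to the specified value (|cos| small for right angles).
        if (approx.size() == 4 &&
            std::fabs(cv::contourArea(approx)) > 1000 &&
            cv::isContourConvex(approx))
        {
            double maxCos = 0.0;
            for (int j = 2; j < 5; ++j)
                maxCos = std::max(maxCos,
                    std::fabs(angleCosine(approx[j % 4], approx[j - 2], approx[j - 1])));
            if (maxCos < 0.3)
                quads.push_back(approx);
        }
    }

    // Circle branch: the Hough gradient method plays the role of the
    // gradient/accumulator procedure described in the text.
    cv::HoughCircles(gray, circles, CV_HOUGH_GRADIENT, 1,
                     gray.rows / 8,    // minimum distance between detected centers
                     100, 40,          // Canny high threshold, accumulator threshold
                     10, 120);         // min and max radius (from the object constraints)
    // Each entry of "circles" is (center x, center y, radius) in pixels.
}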
While looping for each contour, one contour is picked and the number of vertexes, area of the contour and convexity are cheked for. Subsequently, angles between edges are found and if the angles are around the specified values then vertices are written to the resultant contour. Finally, these sequence of contours are stored in to a specified memory storage and return.  To find circular contours, for every nonzero point in the edge image, local gradient is considered (the gradient is determined by computing the first order Sobel x- and y-derivatives). Using this gradient, every point along the line indicated by this slope, from a specified minimum to a specified maximum distance, is incremented in the accumulator. Simultaneously, the location of each one of these nonzero pixels in the edge image is noted. The candidate centers are then selected from the points in this (two-dimensional) accumulator which satisfy the following condition:, The selected candidate centre should be 1. above a given threshold and 2. larger than all of their immediate neighbours.  These candidate centers are sorted in the descending order of their accumulator values, so that the centers with the most supporting pixels appear first. Next, for each center, all the nonzero pixels (recall that this list was built earlier) are considered. These pixels are sorted according to their distance from the center. Working out from the smallest distances to the maximum radius, a single radius is selected that is best supported by the nonzero pixels. A center is kept if it has sufficient support from the nonzero pixels in the edge image and if it is at a sufficient distance from any previously selected center. The pseudocode of the algorithm is presented in Figure 3.13.  53  PolygonAndCircularTypeObjectFinding(“image”, “polygon_storage”, “circle_storage”){ // Declare and initialize variables “min_radius”, “max_radius”, “num_vertices”, “max_angle”, “dimension_of_object”; // these from facts & constraints of object Declare image objects; “thresh” = 50; “thresh_levels” = 11; // Preprocess Define the region of interest in the “image” to reduce processing overhead and to avoid unnecessary results; Smooth “image” to prevent a lot of false contours and circles from being detected; // Find polygons and circles in every color plane of the “image” for(int c = 0; c < 3; c++) { Extract the c th color plane; // Try to find square and circular contours at several threshold levels for(int l = 0; l < “thresh_levels”; l++) { if(l == 0) { Use Canny edge detector with the upper threshold “thresh” and the lower threshold 0 (which force edges merging); // It is usefull to find polygons with gradient shading Use dilate operator to dilate Canny output to remove potential holes between edge segments; } else { Apply a threshold value to the “image”; // The threshold value is gradually increased at each iteration according to the “thresh_levels” } // To find polygons Find contours in “image” and store them as a list in “contours” and “contour_storage”; // Loop for each contour while(“contours”) { Approximate contour with contour parameters and store in “result” sequence; //Check number of vertices, area (to filter noisy contours) and convexity of contour if(num_vertices_in_“result” == “num_vertices” && ContourArea > 1000 && CheckContourConvexity(“result”)) { // Find minimum angle between adjecent edges “angle”=360; for(int i = 0; i < “num_vertices”; i++ ) { “temp_angle” = angle_between_two_edges; if (“temp_angle” < “angle”) “angle” = “temp_angle”; } // 
Check the angle requirement if(“angle” <= “max_angle”) for(i = 0; i < “num_vertices”; i++)  54  write i th vertice to the resultant contour of “polygons” contour sequence; } Point to the next contour; Store the polygon in “polygon_storage”; } // To find circles Gradient is computed by firstorder Sobel x- and y-derivatives; // Loop for every row in the “image” for(int row = 0; row < numRows; row++) { // Loop for every column in the “image” for(int col = 0; col < numcols; col++) { Using the gradient, every point along the line indicated by this slope from a specified minimum to a specified maximum distance is incremented; The location of every one of these nonzero pixels in the edge image is stored; } } Candidate centers are selected from those points that are both above some given threshold and larger than all of their immediate neighbors; Candidate centers are sorted in descending order, so that the centers with the most supporting pixels appear first; // Loop for each center, all of the nonzero pixels are considered for(int i = 0; i < centers->total; i++) { Pixels are sorted according to their distance from the center; Working out from the smallest distances to the maximum radius, a single radius is selected that is best supported by the nonzero pixels; A center is kept if it has sufficient support from the nonzero pixels in the edge image and if it is a sufficient distance from any previously selected center; Store success circle to sequence “circle_storage”; } } } Release all declared image objects; Return resultant “polygon_storage” and “circle_storage” sequences; }  Figure 3.13: Polygon and Circular Type of Object Finding Algorithm.  Facts and constraints of the objects are utilized to reject the found polygons and circular contours that are not matched with the given facts and not met with the given constraints of the objects that are examined. Once an object of interest is found, it is shown by drawing a similar shape at the found location on the image. Also properties of the detected objects are simply calculated using the found data. 55  3.5. Summary Different types of objects may be present in an environment with varying features and it is difficult to identify the corresponding objects with a reliable level of accuracy for a machinebased vision system. Hence feature enhancement techniques have to be employed to obtain accurate features of the object. Different types of feature enhancement techniques may be used depending on the type of feature. Performance of a vision system will degrade if primitive image processing operators are used for feature enhancement.  The feature based object modeling approach as proposed and developed in the present thesis is better suited than those that employ individual training images. The significant features of the developed approach are as follows: different types of features can be used; image variations are considered; run-time computational load is reduced; feature distribution is included; performance criteria (object detection rate and processing time per image) can be defined to optimize the processing time; and facts and constraints for the object can be defined to speedup and verify the results.  An existing principle is used to implement an object modeling method, termed “AdaBoost-based Object Model.” According to this principle, features affect the performance, and hence two different feature sets are introduced for regular and irregular types of objects.  
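For illustration, the following hedged sketch shows how such a boosted cascade model is typically applied at run time, using OpenCV's CascadeClassifier as a stand-in for the AdaBoost-based object model. The model file name and the parameter values are hypothetical; they correspond to the scale factor, minimum-neighbours, and flags settings discussed with the experiments in Chapter 4.

#include <opencv2/opencv.hpp>
#include <vector>

// Hedged sketch: applying a trained boosted cascade at run time.
std::vector<cv::Rect> detectWithCascade(const cv::Mat& bgrImage)
{
    cv::CascadeClassifier model;
    if (!model.load("wheel_cascade.xml"))      // hypothetical trained model file
        return std::vector<cv::Rect>();

    cv::Mat gray;
    cv::cvtColor(bgrImage, gray, CV_BGR2GRAY);
    cv::equalizeHist(gray, gray);              // normalize lighting, as during training

    std::vector<cv::Rect> detections;
    model.detectMultiScale(gray, detections,
                           1.1,                 // scale factor between image pyramid levels
                           3,                   // minimum neighbouring detections to accept
                           0,                   // flags
                           cv::Size(24, 24));   // smallest search window, matching the training size
    return detections;                          // each Rect bounds one detected object
}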
Existing object detection methods were reviewed and some of them were tested to check their adaptability for the developed system. Methods are selected to be compatible with the types of objects considered, the environment, and the system requirements. An improved method is implemented using the existing principle of “Object Detection using AdaBoost-based Object Model” which uses “AdaBoost-based Object Model.” This method gives a fast object detection rate while presenting the major drawback of time consuming training process.  A new method termed “Object Detection using 3D Invariant Feature Space Model” is proposed and implemented by incorporating and enhancing several existing techniques. SIFT and Cornerlike features are used there. The main purpose of this method is feature matching. Hence the best available feature matching strategies are incorporated into the method. In order to increase the object detection rate, feature distribution is introduced and matched. 56  Another object detection method termed “Detection and Property Extraction of Regular Object” is proposed and implemented, which is based on geometric properties of the object. This method provides better results for regular type of objects. Facts and constraints are used to verify the detected objects. This can provide properties of objects with less effort when compared with other methods.  This research will be further extended by automating the object modeling process and introducing a model verification method for the proposed “3D Invariant Feature Space” approach. In addition, the object modeling approach "AdaBoost-based Object Model” and the object detection method “Object Detection using AdaBoost based Object Model” will be extended for multi-object detection.  57  Chapter 4 Experimental Investigation First experiments are carried out on object identification and feature extraction using existing methods such as Mean-Shift object detection and tracking. It has been discussed in the thesis that some of these methods (such as the use of the existing Haar-like features in AdaBoost) are inadequate to obtain reliable results at required speeds as needed in the multi-robot project. Therefore existing methods are modified and integrated with new methods to meet the requirements of the specific application. However, some methods such as SIFT feature extraction technique and Harris Corner detector technique are used without modification in the present project. The applicable and developed methods have been discussed in detail in Chapter 3.  The vision application as developed in the present thesis is implemented in Microsoft Visual Studio 2005 ™ development environment with the use of existing software libraries. Different types of computers and camera hardware are used in the developed system as described in the present chapter. Experiments are carried out in the laboratory environment by varying the type of objects, properties of objects, and the environment. In response to a specific request to find an object (the provided information may include the type of object that needs to be detected, facts and constraints needed to define the object, required level of accuracy and speed) in coordination with the robot which will carry out the task, the vision system will select the most suitable object detection method among the available methods. The following sections give some representative experimental results from those methods. 
It is found that the required levels of accuracy and speed are achieved in the object detection tasks. Experimental results are discussed and further enhancements are proposed to overcome existing drawbacks.  4.1. Experimental System The present vision application is developed using Microsoft Visual Studio 2005 development environment and functions in OpenCV™ computer vision and ARIA Pioneer™ 3 robot software libraries. It is setup in Sony VAIO™ computer and the built-in computer of the Pioneer™ 3 robots in our laboratory. The application is implemented on a Microsoft Windows ™ platform. 58  The robots have Windows 2000™ and the Sony VAIO™ computer has Windows Vista™ operating system. Images are captured using three types of cameras. Training images are also captured in a similar environment. Object models are formed using the training datasets. Experiments are carried out in the laboratory environment, which is a typical unstructured environment. Different environmental factors and variations of objects are considered during the experimentations.  The following sections outline the hardware and software in the experimental system, the implementation of the system, and the test environment used for the present experiments.  4.1.1. Run-Time Software Platform This is an object-oriented development: the functions are developed and integrated as a class library to be integrated in robotic applications. Experiments are carried out in Microsoft Windows Vista™ platform and use Microsoft Visual Studio C++ 2005™ (MVSC++) development environment and OpenCV™ and ARIA™ class libraries.  The P3-DX/AT robots with the ARIA™ software have the ability to: (1) Wander randomly; (2) Drive under control of software, keys or joystick; (3) Plan paths with gradient navigation; (4) Display a map of its sonar and/or laser readings; (5) Localize using sonar and laser distance finder; (6) Communicate sensor and control information related to sonar, motor encoder, motor controls, user I/O, and battery charge data; (7) Provide C/C++/JAVA development platform; and (8) Simulate behaviors off-line with the simulator that accompanies each development environment.  4.1.2. Computer and Robot Hardware The mobile robots are manufactured by MobileRobots Inc. (formerly ActiveMedia Robotics Company) which is a main player in the mobile robot market. In the present project, one fourwheel driven PioneerTM 3-AT robot and two two-wheel driven PioneerTM 3-DX robots are used. Built on a core client-server model, the P3-DX/AT robots contain an embedded Pentium III computer, opening the way for on-board vision processing, Ethernet-based communication, laser sensing, and other autonomous functions. The P3 robots store up to 252 watt-hours of hot59  swappable batteries. They come with a ring of 8 forward sonars and a ring of 8 rear sonars. Their powerful motors and 19 cm wheels can reach speeds of 1.6 m/s and carry a payload of up to 23 kg. In order to maintain accurate dead reckoning data at the speeds, the Pioneer robots use 500pulse encoders. The robots provide laser-based navigation, bumpers, gripper, vision, compass and a rapidly growing suite of other options. The appearance of the P3-DX robot and P3-AT robot is shown in Figure 4.1.  Figure 4.1: (a) The PioneerTM 3-DX Mobile Robot; (b) The PioneerTM 3-AT Mobile Robot.  
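As context for how the vision functions are invoked on these platforms, a hedged sketch of a minimal ARIA client that connects to a Pioneer robot is given below. The exact connector class and options depend on the ARIA version, so the calls shown are assumptions rather than the thesis implementation.

#include "Aria.h"

// Hedged sketch of a minimal ARIA client (assumed API; details vary by ARIA version).
int main(int argc, char** argv)
{
    Aria::init();
    ArArgumentParser parser(&argc, argv);
    ArSimpleConnector connector(&parser);
    ArRobot robot;

    // Parse connection options (serial port, or TCP for the simulator) and connect.
    if (!connector.parseArgs() || !connector.connectRobot(&robot))
    {
        Aria::shutdown();
        return 1;
    }

    robot.runAsync(true);     // run the robot processing cycle in a background thread
    robot.lock();
    robot.enableMotors();
    robot.unlock();

    // ... image capture and object detection functions would be called here ...

    robot.waitForRunExit();
    Aria::shutdown();
    return 0;
}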
In summary, the Pioneer-3 DX or AT robot is an all-purpose mobile base, which is useful for research and applications involving mapping, teleoperation, localization, monitoring, reconnaissance, vision, manipulation, cooperation, and other behaviors.

A Sony VAIO™ VGN-FW140D computer with an Intel Core2 Duo™ processor (2.26 GHz, FSB speed 1066 MHz, L2 cache 3 MB) and 3 GB of memory is used in the experimentation.

4.1.3. Camera Hardware
A Canon VC-C50i Communication Camera™ is integrated with the Pioneer™ 3-DX and Pioneer™ 3-AT robots. The Sony VAIO™ VGN-FW140D computer has a built-in 1.3 megapixel Sony Visual Communication Camera™. A Logitech QuickCam™ camera is also used in the present experimentation.

4.1.4. System Implementation
Basic functions are programmed to perform specific tasks. These functions are utilized in the object modeling and object detection methods. There are other basic functions in the run-time modes, to select object detection methods according to the provided performance specification and the facts and constraints of the objects. These functions are named under the main module of the implementation diagram, as shown in Figure 4.2.

Figure 4.2: System Implementation Diagram.

These basic functions are integrated to carry out higher-level tasks in each module of the implementation diagram. Higher-level functions are called within the main function to operate autonomously and generate the required results at the expected performance level, as described in Chapter 2. The architecture is designed and implemented using object-oriented methodology. The entire project is programmed and built in the Microsoft Visual Studio C++ 2005™ development environment with the use of the OpenCV™ and ARIA™ class libraries. Finally, the executable program is installed in the Pioneer™ 3-AT and Pioneer™ 3-DX robots and the Sony VAIO™ VGN-FW140D computer for the experiments.

4.1.5. Properties of the Environment
The experimental environment is typically unstructured. Different types of variations are considered, such as changing illumination. The location and orientation of the objects are varied from time to time. In addition, the camera angles are also changed. Different types of objects are used. The images in Figure 4.3 show the experimental environment in the Industrial Automation Laboratory (IAL) at The University of British Columbia where the present experiments are carried out. Different types of objects with different sizes and colors are used in the experiments. The orientations and scales of the objects are subjected to variation under different lighting conditions (from location to location). Also, shadows may appear and the illumination levels may change.

Figure 4.3: The Laboratory Experimental Environment.

4.1.6. Objects Used in Experiments
Three types of objects are considered in the experiments: featured (texture-based), featureless, and regular geometric-shape objects. The chain sprocket of a bicycle is selected as a featured object; it has teeth and holes that serve as features, together with a shiny side which reflects light. It may be considered a planar object of diameter 11.43 cm with negligible thickness (in the mm range). The wheel of a waste container and a toy vehicle are used as circular featureless objects. These have fewer features than the chain sprocket, and both are black in colour with radii of 10.16 cm and 7.62 cm, respectively. Boxes are chosen as regular-type objects, with the following identifiable features.
Box 1: (36x23x15) cm, white colour markings on glossy black background. Box 2: (64x50x28) cm box has brown and black (matt) areas on the longer sides. Box 3: (46x30x22) cm box is red in colour. In addition to these three 62  types of objects, the robots and the obstacles are also considered as objects. These vary in size and colour. Also the location may vary with time.  4.2. Experimentation Three types of objects as described in section 4.1.5, namely featured, featureless and geometric shape based objects, are used for the experiments. Object models are formed or trained in an offline process or in real-time in the initial stage. In real-time, applicable object detection method is identified (among the three object detection methods, as described in Chapter 3) using the real time method selecting technique. Experiments related to these methods are presented in sections 4.2.1, 4.2.2 and 4.2.3. Variations such as scale of the object (by varying the position), different lighting conditions, changing the orientation, and occlusion, are incorporated in the experimentations.  4.2.1. AdaBoost-based Object Modeling Training datasets are captured to generate object models. This method of object model generation is a watchful and time consuming task. These models are used in the “Object Detection using AdaBoost-based Object Model” method and it has produced good results when compared with other methods. Further improvements are carried out as well. Future enhancements are proposed on the basis of the obtained experimental results.  Image sets are captured in the laboratory environment for different type of objects. These objects include wheels of a waste container, toy vehicle, chain sprocket of a bicycle, and boxes. More details of these objects are presented in section 4.1.5. Image sets were subjected to different variations while keeping the key features of the object. A developed software tool is utilized to automatically extract and save positive object image sets.  In the beginning of the object modeling task as carried out in this thesis, due to in-adequate number of training images in the positive and negative image sets, (600 images and 1000 images), it was not possible to detect the target object using the generated object model. Better results are obtained by increasing number of training images to the range of one thousand and 63  reducing the size of the images to 16x16 pixels (in the previous modeling, it was higher; for example, 40x40 pixels). It is better to include the outer boundary of the object rather than tightly cropping the object in the image. This object modeling process is more time consuming. For instance, it has taken one week to train a single model. A successful object model has been formed for the wheel object. The object training set consists of 4916 hand-labeled images of the wheel which are scaled and aligned to a base resolution of 24 pixels x 24 pixels. The objects are extracted from images as captured in the laboratory. These images of the wheel contain much of the object, and also include extra visual information such as the outer boundary which helps to improve the accuracy. For the negative training data set, the non-wheel object sub-windows (also of size 24 pixels x 24 pixels) are collected by selecting random sub-windows from a set of 9500 images which do not contain wheel objects. Different sets of non-wheel object sub-windows are used for the training different classifiers. Some typical object images are shown in Figure 4.4.  
The object model is formed by incorporating Facts and Constraints with the final classifier. Variations of facts are also included. This requires less computer processing power as it is not a time consuming task.  (a)  (b)  Figure 4.4: Sample Object Images in the Training Sets: (a) Positive Image Set of Wheel Object; (b) Negative Training Image Set.  When capturing the training image sets, environmental factors such as background, location, orientation, lighting, and so on need to be varied in order to improve the robustness of the model. Also, it is important to keep key invariant features of the object while varying the other unnecessary features in the image. All other features which do not belong to the object in the negative image set are distributed. In other words, it is only required to keep required features in every image, while distributing the other features throughout the image set. Also in a positive 64  image set, much of the background should be avoided. This will result in a smaller standard deviation implying that it is easier for the classifier to learn. For a negative image set, it is needed to have background images of high resolution (A high percentage of background images are rejected in the first few stages. These are the images that cannot be used in later stages of the training process). All image sub-windows used for training are variance normalized to minimize the effect of different lighting conditions. Normalization is therefore necessary during detection as well.  Training duration for the object model was found to be one week on a single computer with 1 GHz Intel PentiumTM processor. During this laborious training process several improvements to the learning algorithm were discovered. Attention has to be paid to reducing the training time, by utilizing enhanced techniques (see section 3.3.3 in Chapter 3 for further details.)  4.2.1.1. Experiments with Object Detection using AdaBoost-based Object Model Scale-, rotation-, and illumination-invariant capabilities of the method are tested by varying the location and orientation of the objects and lighting of the background. The background consists of different types of objects with different configurations. Once an object is identified, it is encircled by a red rectangle.  4.2.1.1.1. Location-Invariant Behaviour  Similar wheels are identified from different locations as shown in Figure 4.5. It is not possible to detect an object if the object is in a distant location. In the present experiment with the wheel, the maximum possible distance is 8 meters.  (a)  (b)  (c)  Figure 4.5: Correctly Oriented Wheels are Identified when the Location is Varied (the Object is Present in Different Scales).  65  4.2.1.1.2. Orientation-Invariant Behaviour  The object has different orientations with different camera angles as shown in Figure 4.6. The objects are identified if the camera angle is between 450 and 700 and the object angle is at most 600 from the front. Also note that the objects are scattered in the environment.  (a)  (b)  (c)  Figure 4.6: Similar Type of Wheels are Identified. Single Object is Identified Even Though There are Many Such Objects (this Method Can Detect Only One Object at a Time).  4.2.1.1.3. Invariance to Location, Orientation and Lighting Combination  During these experimentations, in addition to varying the background lighting, the location and the orientation are also varied. 
A similar object arrangement is present in images (b) and (c) of Figure 4.7, but the lighting level is reduced in image (c). The detected object is shown with a bounding box in the image.

Figure 4.7: Detection of Wheel Under Different Lighting Conditions.

In conclusion, with the 3D Invariant Feature Space-based method it is difficult to detect the wheel object, since the object does not contain significant features for detection using the SIFT feature detector. Also, when an object is not properly oriented in the scene, it is difficult to detect the object using the geometric shape-based object detection method. The AdaBoost-based method, however, produced better results, since the object model is generated by repeatedly finding small invariant basic geometric shapes throughout the large number of training images. These images contain different variations of the object from different environments.

When different cameras are used in object detection, the noise of the cameras will also degrade the accuracy of the results. This can be overcome by using images taken from different cameras in the training process.

According to the experimental results, it is necessary to increase the number of images in the positive and negative (background) image sets to generate a better model. The minimum number of positive images required is 1000, and it is recommended to have about 5000 or more positive images. The rule of thumb is that the number of negative images should be five times the number of positive images. The time taken to detect the object in real time is not significantly increased even when the number of images is increased during the training process.

It is proposed to use a faster machine with enhanced memory capacity (a few gigabytes) and a large memory buffer size in the training process. The default parameters (scale_factor, min_neighbors, flags) of the method have to be tuned for accurate object detection.

The method provides a low processing time per image, on a Sony VAIO™ computer with a 2.26 GHz Intel Pentium IV™ processor, by processing a 640x480 pixel image in about 2 seconds. This is because a large majority of sub-windows are rejected by the first or second layer of the cascade classifier in the model. The feature values of the sub-windows in the real-time image are compared with the parameters of the model, rather than matching individual features as in feature-based methods. The average object detection rate is recorded as 95% over 100 images.

The major drawbacks of this method are: 1) a large number of images is required, with different variations, including images obtained using different cameras; positive dataset generation also requires close attention and involves manual cropping of objects from images; 2) due to the presence of a higher number of features in a higher number of images, the object modeling process can take a few days or a week.

4.2.2. Experiments with Object Detection using 3D Invariant Feature Space Model
Two different types of objects are used in the experiments: the chain sprocket of a bicycle, which contains a large number of features, and the wheel of a waste container, which contains only a few features. There are variations in the environment and object arrangements. In addition, the camera orientation is also changed. The results obtained from the experiments in the laboratory environment are outlined next.

4.2.2.1.
Experiments with Different Variation Factors Different environments are created by arranging the objects and varying the background lighting and the direction of the lights. The view angle and the location of the camera are also subjected to change. Objects to be identified are placed in different locations with different orientations, and the objects may be partly occluded. The detected feature points are shown by “X” symbols and indicated by red colour arrows. The detected object is indicated by a surrounded rectangle. Since this method can detect multiple objects, different coloured rectangles are used to indicate the detected objects.  4.2.2.1.1. Effect of Misidentified Features  One advantage of the developed approach is that there is a low presence of misidentified features. The images in Figure 4.8 have a misidentified feature. Due to that, the bounded box covers not only the desired object. To overcome this drawback, the feature distribution of the detected features is encoded and matched with the feature distribution of the object model. After introducing the feature distribution technique, correct results are obtained, as shown in figures 4.9 through 4.12.  (a) (b) (c) Figure 4.8: Results Before using Feature Distribution in the Object Detection Method. 68  4.2.2.1.2. Varying the Lighting Condition, Object Location and Orientation  Results of these experiments are given in Figure 4.9. They show that it is possible to detect the wheel under different lighting conditions, orientations and locations. This is true since the identified SIFT features are robust to variations in scale, orientation and illumination.  (a) (b) (c) Figure 4.9: The Wheel (Object) Placed at Different Locations, Under Different Lighting and Orientations. 4.2.2.1.3. Different Orientations and Close Camera Position  During experimentation it is noticed that a greater number of features are detected (as shown in Figure 4.10) when the object is close to camera. Even for different orientations of the object they appear in the image that is used to extract features.  (a) (b) (c) Figure 4.10: Camera is Placed Close to the Wheel and the Object Assumes Different Orientations.  69  4.2.2.1.4. Varying the Camera Angle  Camera angle is changed from 450 to 800 with respect to the ground. The number of identified features is very low (as shown in image (c) of Figure 4.11) when the chain sprocket (object) appears as a planer object from a distant location, since the features of the object are not in the image. As seen in the image (a) and (b) in Figure 4.11, the object is shiny. The effect of this is also to reduce the number of detected features.  (a)  (b)  (c)  Figure 4.11: Camera Angle is Varied. 4.2.2.1.5. Occlusion of Object  The experimental results in Figure 4.12 show that the wheel is detected even when it is partly occluded. At least three identified features are adequate to detect the object, since the SIFT features are robust and stable under a number of variations.  (a)  (b)  (c)  Figure 4.12: The Wheel is Partly Occluded.  70  4.2.2.1.6. Detection of Different Types of Multiple Objects  Images in Figure 4.13 show the identified features of a wheel of a waste container. According to the results, most of the time only one feature is found in the real-time image. 
A low object detection rate is recorded for this object due to the relatively few number of features it contains, and since the available features may not appear in the image, in comparison to the chain sprocket which contains a higher number of features and detected key features.  (a)  (b)  (c)  Figure 4.13: Mostly One Feature is Detected for Wheel of Waste Container.  Note that, the experiment in 4.2.2.1.6 is carried out before introducing the feature distribution technique to discard the misidentified features. Most of the time only one feature can be detected for this type of featureless object. This is a drawback of this method.  This approach provides better results when the object contains a large number of significant features (for a chain sprocket, features such as holes, tooth and rings are significant). Most of the features are identified even though the location and the pose of the object are varied and the object is partly occluded. Also it is robust to the lighting on the object surface even when it has shiny metallic surface which reflects lights. Since this is a SIFT feature-based approach, it reduces the damaging effects of image transformations to a certain extent. Features extracted by SIFT are invariant to image scaling and rotation, and partially invariant to lighting changes. Although, SIFT is not robust to affine variations, it is possible to use SIFT on other detectors to get the affine-invariant feature.  Unlike correlation-based methods, SIFT descriptors are extracted to represent these features instead of image windows. When measuring the similarity of features, KD-Tree Nearest71  Neighbour finding technique is used to reduce the searching in feature space. Further experiments have to be carried out by using SIFT and corner features in the method.  The SIFT feature vector generating algorithm will be updated by incorporating the HessianAffine method (to reduce the affine and shearing effects). According to the literature, there are number of enhancements that can be carried out to generate proper feature descriptors. The use of SIFT and GLOH (Gradient Location and Orientation Histogram), which is generally accepted as the most effective descriptors, PCA-SIFT (or Gradient PCA) will be another extension. Also SURF (Speeded Up Robust Features) may be used in place of SIFT to increase the robustness while reducing the runtime computational complexity.  The feature of object property extraction has to be included in the method. Also imaging methods with a high dynamic range may be used.  4.2.3. Experiments with Detection and Property Extraction of Regular Objects This method is employed for detection and parameter extraction for regular type of objects. Images are continuously captured and fed to the algorithm to get a sequence of identified regions and their parameters. Initially two types of experiments for wheel and box type of objects are carried out. Better experimental results have been achieved though there are a few drawbacks (some of these are common to other methods). On the basis of the results, solutions are proposed to overcome the existing drawbacks of the algorithm.  In the first type of experiment, different types of wheels under different variations are considered. There are five cases. The radius of each of these circular type objects is extracted and shown by drawing a circle with the same radius in the resulted image. In the second type of experiment, for different rectangular objects, dimensions and center locations are shown in the images. 
The shown values are with reference to the camera frame.  Object models are not required for this method. Objects are found by utilizing facts and constraints which are provided to define the objects to be found.  72  4.2.3.1. Experiments with Different Variation Factors 4.2.3.1.1. Objects with Different Orientations  In image (a) of Figure 4.14, the chain sprocket adjacent to the waste container is not detected due to it’s orientation. Other images show the detected chain sprocket and the extracted radius by drawing a color circle. There are drawbacks in this method. As shown in image (c) of Figure 4.14, the wheel of the waste container is not detected due to it’s orientation. However, when the robot is wandering it is possible to detect the undetected objects, if the objects are properly (but not exactly) oriented in the images.  (a)  (b)  (c)  Figure 4.14: Effect of Orientation in the Detection. 4.2.3.1.2. Variation of Camera Angle  This experiment is similar to the experiment in section 4.2.3.1.1, but the number of objects is increased and the camera angle is changed. The sequence of images show that it is possible to consistently detect and extract the properties of objects, even when some background properties are changed. It is clear form the results that the variation of the scale and slight changes in the orientation of the object do not affect the test results.  (a)  (b)  (c)  Figure 4.15: Camera Angle is Continuously Varied.  73  4.2.3.1.3. Object at a Distant Location  This experiment is performed to detect the wheels of a robot; so, the defined radius is 20 units in the provided facts. The sequence of images shows that it is possible to detect the object even when the object is at a distant location but with correct orientation. Some incorrect results are found in the image (c) of Figure 4.16, since the defined maximum radius is small.  (a)  (b)  (c)  Figure 4.16: Scale Invariant Behaviour of the Method.  According to the results, variation of radius for circular objects, slight changes of the orientation, collision with another object (while a considerable number of features are present in the image), and scale variation (by changing the object location) will not affect the results of detection and parameter extraction of the objects. In the experimental environment, different locations have different lighting conditions, which affect the robustness, and also introduce geometric distortions and affine distortions. 4.2.3.1.4. Errors in Parameter Extraction  In this experiment, the objects are correctly detected except for image (c) of Figure 4.17. The error is in the parameter extraction of the chain sprocket in image (c) of Figure 4.17.  (a)  (b)  (c)  Figure 4.17: Radius of the Detected Object is Shown by the Corresponding Circle Drawn in the Found Location.  74  4.2.3.1.5. Multiple Objects of Different Radius  In this experimentation, to detect objects with different radii, the minimum and the maximum values of radius for circular objects are provided with constraints. The images in Figure 4.18 show good results. Some errors are still present as in the experiment of section 4.2.3.1.1.  (a)  (b)  (c)  Figure 4.18: Detection of Multiple Objects.  4.2.3.1.6. Detection of Rectangular Objects  In this experiment, the detected objects are bounded by a green color convex contour. The shown numbers are related to their properties, such as: location of the center, width, and height.  Images in Figure 4.19 show correctly identified objects. 
Variation of location and orientation (slight) do not affect the results. Proper outer boundaries of the objects are required for achieving good results. These boundaries should be highlighted in the image with correct orientations. This method works on Canny edge detection and contour finding algorithms.  (a)  (b)  (c)  Figure 4.19: Different Types of Rectangular Objects are Detected.  75  4.2.3.1.7. Requirement of Facts and Constraints  Images in Figure 4.20 show some drawbacks of the method. Orientation of the object and the illumination at the location affect the results of extracting the geometric parameters of the objects. Some objects are not detected or are partially detected as a result. It is possible to reduce the misidentifications by providing additional facts and constraints for the desired objects.  (a)  (b)  (c)  Figure 4.20: Lack of Facts and Constraints Leads to Reduced Accuracy.  4.2.3.1.8. Detection of Different Type of Multiple Objects: Circular and Rectangular  This experimentation is carried out using rectangular and circular objects. Different types of properties and arrangements in the background are considered for these two types of objects. Detected and extracted properties are shown in Figure 4.21. Drawbacks and suggestions are similar to those given in sections 4.2.3.1.1, 4.2.3.1.3, 4.2.3.1.6 and 4.2.3.1.7.  (a)  (b)  (c)  Figure 4.21: Detection of Different Types of Multiple Objects Under Different Variations of Objects and Background.  76  One advantage of this method is that it is not required to train a model. Accuracy of the results and the speed of detection can be increased by providing extra facts and constraints of the objects to be found. Average processing time for an image is very low (about 3 seconds). This is a faster method than the other methods used in the present work. But object detection rate (accuracy) is average (about 65% of 100 images). Since this method is based on the properties of the objects, it is easy to extract the properties of objects using this method.  According to the results, parts of the object are detected since the object contains black (or green or red) color areas. Also, this method produces better results for black (or green or red) colour outer boundaries, since contours are being found through the RGB color planes. Therefore, it is required to update the algorithm to work with other combinations of RGB colors. It is also possible to detect the undetected objects by increasing the number of threshold levels, but it will result in increased processing time of the image.  These drawbacks can be reduced further by improving the algorithms (according to the suggestions given with the experimental results) and by involving facts and constraints that are related to the environment and the objects. It is required to enhance the functionalities of the image pre-processing, such as introducing adaptive histogram equalizing technique (according to the change in illumination). Also the method may be enhanced to automatically change the dominant parameters (such as parameters of the smoothing operation, and change in resolution) to increase the detection rate and accuracy of parameter extraction. The method may be improved to extract properties of elliptical objects (circular objects subjected to variations). The camera is positioned two feet from the ground and at an angle of 450 to 700 to the horizontal. With this arrangement, affine distortions also affect the correct representations of poses of objects. 
As an example, a circle may appear as an oval-shaped object (as shown in the images above). It is necessary to position the cameras to overcome this affine distortion, in addition to improving the algorithms. One camera should be located perpendicular to the ground surface (at a height of about 1.5 feet from the ground) so as to nearly face the objects, since most of the object surfaces are perpendicular to the ground (for example, a box-type object). There are also object surfaces that are parallel to the ground (e.g., wheel-type objects). Therefore, it is useful to have another camera at 5 feet above the ground and parallel to the ground surface. The application may use movable cameras (pointing in different directions) to capture images in which the object appears correctly.

When the robot is wandering, the possibility of getting accurate results improves, as this will generate sets of images from different directions (the camera may be correctly oriented with respect to the object pose in some of these images). However, this will decrease the object detection rate.

4.3. Summary
Experiments for three different object detection methods are presented. In each experiment, the properties of the objects and the background are varied. Mainly three different types of objects are used. These objects are selected to test the capabilities of each method. The results are discussed under each experiment. Further possible enhancements are also presented. A comparison of the results from the three object detection methods is presented in Table 4.1. The object detection rates are achieved for the detectable object types presented in Table 4.1 and under different degrees of variation of the objects and the environment. The average processing time for a 640 x 480 pixel image depends on the computer performance.

Generally, the performance of the application depends on the computer performance. The operating system has a major impact even with high-speed hardware. Lower performance was noticed with the Microsoft Windows Vista™ operating system and better performance with the Microsoft Windows 2000™ operating system. Furthermore, better performance should be possible when the system runs on a Linux™ or Unix™ platform. Several aspects were considered during the development of the vision system, such as reducing and releasing memory usage, removal of non-essential processes in the operating system, and bypassing some processing steps in the methods, in order to achieve the expected performance as needed by the robot for task execution.

Table 4.1: Comparison of Object Detection Methods with Results.

| Object detection method | Detection rate (100 images) | Average time per 640 x 480 image | Detectable object types | Proximity | Orientation | Lighting | Occlusion | Affine |
| AdaBoost-based | 95% | 2 seconds | All types; better results for featureless objects than 3DIFS | High | Medium | High | Medium | Medium |
| Geometric Shape-based | 65% | 3 seconds | Regular type | Medium | Better results for slight variation | Medium | Low | Very Low |
| 3D Invariant Feature Space (3DIFS)-based | 90% | 6 seconds | Featured objects | High | High | High | High | High |

(The last five columns give the accuracy under variation of the properties of the objects and the environment.)

Chapter 5
Conclusions

5.1. Synopsis and Contributions
An architecture for a vision system was proposed and developed in the present thesis in order to provide the vision requirements of a mobile multi-robot application.
The main objective of the developed system is to identify useful objects and extract their parameters (properties) for use by the robots for locating, grasping, transporting, and manipulating these objects to assemble a useful device.  The developed system includes modules of image pre-processing, object modeling process, realtime object detection, parameter extraction, and knowledge/facts/constraints. Facts and constraints are incorporated to reduce the processing overhead and to increase the object detection rate (detection accuracy) while considering the required level of accuracy for the robotic tasks. Also, utilization of the techniques is optimized in the run-time modes, which is a challenging task. There, object detection methods are selected to be compatible with the type of object, environment, and the system requirements.  The feature of object property extraction is embedded in the object detection method. Also the detected object area of the image is further processed to estimate important properties of object.  Enhanced feature sets are used in object modeling and detection. An available principle is used to implement an object modeling method called “AdaBoost-based Object Model” by using different Haar-like feature sets. The feature set is selected according to the object type. An advantage of the AdaBoost algorithm is its simplicity in implementation. One major drawback is that a large number of carefully selected training images are required for the training process over a long time duration. However, better performance is achieved with the proposed method of feature set selection by considering the object type. Also robustness is enhanced by varying the scale, orientation and illumination up to several levels. The algorithm is less susceptible to the over-fitting problem when a large number of training image data sets are used.  80  The feature based object modeling approach called “3D Invariant Feature Space” is introduced. Model represents features in a 3D feature space, which can be used with geometric transformations for robust feature identification. The significant features of this approach are as follows: different types of features can be used, image variations are considered, run-time computation is reduced, the feature distribution is included, performance criteria (object detection rate and processing time per image) can be defined to optimize the processing time, and facts and constraints for the object can be defined to speedup and verify the results.  Existing object detection methods were reviewed and some of them were tested to check the adaptability for the considered system. An existing principle was enhanced to implement the AdaBoost-based object detection method called “Object Detection using AdaBoost based Object Model” from a discriminative approach. This uses “AdaBoost-based Object Model.” This method provides a fast object detection rate at run-time and there are no parameters to tune in training.  In addition, two object detection methods based on features and geometric shape were introduced and implemented; namely, “Object Detection using 3D Invariant Feature Space Model” and “Detection and Property Extraction of Regular Object.” The former method uses SIFT and Corner-like features. The main aspect of this method is feature matching. The second method is based on geometric properties of the object. This method provides better results for regular type of object. Facts and constraints are used to verify the detected objects. 
Utility software tools were developed using available software class libraries. One, "Extract Object Defined by User," is used to extract training images from an image source (camera, video, or image files). Another tool creates an image in the robot by reading the robot's camera buffer.

A series of experiments was carried out using the different object detection methods. The methods produced good results for the particular types of objects considered, subject to some constraints. Image preprocessing and feature enhancement techniques are needed to improve the object detection rate; however, they increase the average processing time per image. Feature-based methods have two major advantages: 1) features can encode ad hoc domain knowledge that is difficult to learn from a finite quantity of training data; 2) a feature-based system operates much faster than a pixel-based system. However, the experiments showed that when the object does not carry distinctive features, the methods "Object Detection using 3D Invariant Feature Space Model" and "Object Detection using AdaBoost-based Object Model" are not suitable; in that situation, the "Detection and Property Extraction of Regular Object" method may be used.

5.2. Possible Future Work

In the future, multiple threads (or multiprocessing) may be used to run different object detection methods concurrently. The developed system architecture may be extended to serve multiple robots and to include stationary cameras. The laser range finder of a mobile robot may be integrated to find the distance to the objects.

This research may be further extended by automating the object modeling process and by introducing a model verification method for the proposed "3D Invariant Feature Space" approach. For higher speed, SURF (Speeded Up Robust Features) may be used instead of SIFT features. Intelligent techniques may be used to accurately identify an object in a practical and complex environment. The AdaBoost-based object modeling approach may be extended to the simultaneous detection of multiple objects. In the modeling, features other than Haar-like features may also be used, such as object area, object orientation, and object appearance in the form of a density function (e.g., a histogram). With these suggested improvements, extensive experiments should be carried out in the real environment of the practical application. Multi-robot collaborative object recognition is an extensive research area in which multiple cameras may be integrated, and combinations of technologies may be employed for robust object identification; for example, voice recognition, external data and knowledge, and learned history of results. It may also be possible to design robust recognition technologies by considering factors that are not directly related to vision.
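The feature matching at the core of "Object Detection using 3D Invariant Feature Space Model," as well as the suggested substitution of SURF for SIFT, can be illustrated with the modern OpenCV interface. This is a minimal sketch under several assumptions: the image file names are placeholders, the 0.75 distance-ratio threshold is the value commonly used in Lowe's work rather than a thesis parameter, and the SURF alternative requires the opencv-contrib build.

    import cv2

    # Illustrative sketch only: match local features between an object model
    # image and a scene image using the distance-ratio test. Because SIFT,
    # SURF, and ORB share OpenCV's Feature2D interface, changing the detector
    # is a one-line substitution.
    model = cv2.imread("object_model.png", cv2.IMREAD_GRAYSCALE)
    scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

    detector = cv2.SIFT_create()
    # detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # faster, needs opencv-contrib
    # detector = cv2.ORB_create(nfeatures=1000)                     # binary-descriptor alternative

    kp_model, des_model = detector.detectAndCompute(model, None)
    kp_scene, des_scene = detector.detectAndCompute(scene, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)        # use cv2.NORM_HAMMING with ORB
    pairs = matcher.knnMatch(des_model, des_scene, k=2)

    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    print(f"{len(good)} matches passed the ratio test out of {len(pairs)} candidates")

In a full detection pipeline the surviving matches would typically be passed to cv2.findHomography with RANSAC to confirm a geometrically consistent object pose, a verification step comparable in role to the facts and constraints used in the thesis.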
Bibliography

Avidan, S., "Support vector tracking", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 283-290, 2001.

Badra, F., Qumsieh, A., and Dudek, G., "Rotation and zooming in image mosaicing", IEEE Workshop on Applications of Computer Vision, New Jersey, US, pp. 50-55, 1998.

Baumberg, A., "Reliable feature matching across widely separated views", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, South Carolina, US, pp. 774-781, June 2000.

Beardsley, P., Torr, P., and Zisserman, A., "3D model acquisition from extended image sequences", Fourth European Conference on Computer Vision, Cambridge, England, pp. 683-695, 1996.

Beis, J.S. and Lowe, D.G., "Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces", Proceedings of the Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, p. 1000, June 1997.

Belongie, S., Malik, J., and Puzicha, J., "Shape matching and object recognition using shape contexts", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, 2002.

Black, M.J. and Jepson, A.D., "EigenTracking: robust matching and tracking of articulated objects using a view-based representation", International Journal of Computer Vision, Vol. 26, No. 1, 1998.

Boser, B., Guyon, I.M., and Vapnik, V., "A training algorithm for optimal margin classifiers", ACM Workshop on Computational Learning Theory, Pennsylvania, US, pp. 142-152, 1992.

Bowyer, K. and Dyer, R., "Aspect graphs: An introduction and survey of recent results", International Journal of Imaging Systems and Technology, Vol. 2, 1990.

Box, G. and Tiao, G., Bayesian Inference in Statistical Analysis, John Wiley & Sons, 1992.

Bradski, G. and Kaehler, A., Learning OpenCV, First Edition, O'Reilly Media Inc., September 2008.

Brown, M. and Lowe, D.G., "Invariant features from interest point groups", Proc. British Machine Vision Conf., Wales, United Kingdom, pp. 656-665, 2002.

Brown, M. and Lowe, D.G., "Recognizing panoramas", Ninth International Conference on Computer Vision, Nice, France, pp. 1218-1225, 2003.

Brown, M., Szeliski, R., and Winder, S., "Multi-image matching using multi-scale oriented patches", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, US, pp. 510-517, 2005.

Brown, M. and Lowe, D.G., "Automatic panoramic image stitching using invariant features", International Journal of Computer Vision, Vol. 74, No. 1, 2007.

Comaniciu, D., Ramesh, V., and Meer, P., "Kernel-based object tracking", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 25, 2003.

Edwards, G., Taylor, C.J., and Cootes, T.F., "Interpreting Face Images Using Active Appearance Models", Proc. Third International Conf. Automatic Face and Gesture Recognition, pp. 300-305, 1998.

Elgammal, A., Duraiswami, R., Harwood, D., and Davis, L.S., "Background and Foreground Modeling using Non-parametric Kernel Density Estimation for Visual Surveillance", Proceedings of the IEEE, July 2002.

Freeman, W.T. and Adelson, E.H., "The design and use of steerable filters", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 13, No. 9, 1991.

Freund, Y., "Boosting a Weak Learning Algorithm by Majority", Information and Computation, Vol. 121, No. 2, 1995.

Friedman, J.H., Bentley, J.L., and Finkel, R.A., "An Algorithm for Finding Best Matches in Logarithmic Expected Time", ACM Transactions on Mathematical Software, Vol. 3, No. 3, 1977.

Gool, L.V., Moons, T., and Ungureanu, D., "Affine/photometric invariants for planar intensity patterns", Proc. European Conf. on Computer Vision, Cambridge, UK, pp. 642-651, April 1996.

Grewe, L. and Kak, A., "Interactive learning of a multi-attribute hash table classifier for fast object recognition", Computer Vision and Image Understanding, Vol. 61, No. 3, 1995.

Harris, C. and Stephens, M., "A combined corner and edge detector", Proc. of Alvey Vision Conf., pp. 147-151, 1988.
Hannah, M.J., "Test results from SRI's stereo system", Image Understanding Workshop, Massachusetts, US, pp. 740-744, 1988.

Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning, Springer-Verlag, New York, 2001.

Kadir, T. and Brady, M., "Saliency, scale and image description", Intern. Journal of Computer Vision, Vol. 45, No. 2, 2001.

Kadir, T., Zisserman, A., and Brady, M., "An affine invariant salient region detector", Proc. European Conf. on Computer Vision, Prague, Czech Republic, pp. 228-241, May 2004.

Ke, Y. and Sukthankar, R., "PCA-SIFT: A More Distinctive Representation for Local Image Descriptors", Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, US, pp. 506-513, June 2004.

Lazebnik, S., Schmid, C., and Ponce, J., "A sparse texture representation using affine-invariant regions", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Wisconsin, US, pp. 319-324, June 2003.

Lazebnik, S., Schmid, C., and Ponce, J., "A Sparse Texture Representation Using Local Affine Regions", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 27, No. 8, 2005.

Lowe, D.G., "Object Recognition from Local Scale-Invariant Features", Proceedings of the IEEE International Conference on Computer Vision, Corfu, Greece, pp. 1150-1157, September 1999.

Lowe, D.G., "Local feature view clustering for 3D object recognition", IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 682-688, 2001.

Lowe, D.G., "Distinctive image features from scale-invariant keypoints", Intern. Journal of Computer Vision, Vol. 60, 2004.

Matas, J., Chum, O., Martin, U., and Pajdla, T., "Robust wide baseline stereo from maximally stable extremal regions", Proc. British Machine Vision Conf., Wales, United Kingdom, pp. 384-393, September 2002.

McLauchlan, P.F. and Jaenicke, A., "Image mosaicing using sequential bundle adjustment", Image and Vision Computing, Vol. 20, No. 9, 2002.

Mikolajczyk, K. and Schmid, C., "Indexing based on scale invariant interest points", Proc. IEEE Intern. Conf. on Computer Vision, British Columbia, Canada, pp. 525-531, 2001.

Mikolajczyk, K. and Schmid, C., "An affine invariant interest point detector", Proc. European Conf. on Computer Vision, Copenhagen, Denmark, pp. 128-142, May 2002.

Mikolajczyk, K. and Schmid, C., "A performance evaluation of local descriptors", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Wisconsin, US, pp. 257-263, June 2003.

Mikolajczyk, K. and Schmid, C., "Comparison of affine-invariant local detectors and descriptors", Proc. European Signal Processing Conf., Vienna, Austria, pp. 1729-1732, September 2004.

Mikolajczyk, K. and Schmid, C., "A performance evaluation of local descriptors", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, 2005.

Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., and Gool, L.V., "A comparison of affine region detectors", International Journal of Computer Vision, Vol. 65, No. 1-2, 2005.

Moravec, H., "The Stanford Cart and the CMU Rover", Proceedings of the IEEE, Vol. 71, No. 7, 1983.

Moghaddam, B. and Pentland, A., "Probabilistic visual learning for object representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, 1997.

Ojala, T., Pietikainen, M., and Harwood, D., "A comparative study of texture measures with classification based on featured distributions", Pattern Recognition, Vol. 29, No. 1, 1996.
Papageorgiou, C., Oren, M., and Poggio, T., "A general framework for object detection", IEEE International Conference on Computer Vision, Bombay, India, pp. 555-562, January 1998.

Paragios, N. and Deriche, R., "Geodesic Active Regions and Level Set Methods for Supervised Texture Segmentation", International Journal of Computer Vision, Vol. 46, No. 3, 2002.

Park, S. and Aggarwal, J.K., "A hierarchical Bayesian network for event recognition of human actions and interactions", Multimedia Systems, Vol. 10, No. 2, 2004.

Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1997.

Rowley, H.A., Baluja, S., and Kanade, T., "Neural network-based face detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, 1998.

Saxena, A., Driemeyer, J., and Ng, A.Y., "Robotic Grasping of Novel Objects using Vision", The International Journal of Robotics Research, Vol. 27, 2008.

Schmid, C. and Mohr, R., "Local grayvalue invariants for image retrieval", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, 1997.

Schmid, C., Mohr, R., and Bauckhage, C., "Evaluation of interest point detectors", International Journal of Computer Vision, Vol. 37, No. 2, 2000.

Schreiber, A.C., Rousset, S., and Tiberghien, G., "Facenet: A Connectionist Model of Face Identification in Context", European Journal of Cognitive Psychology, Vol. 3, No. 1, 1991.

Schapire, R.E., "The strength of weak learnability", Machine Learning, Vol. 5, No. 2, 1990.

Schaffalitzky, F. and Zisserman, A., "Multi-view matching for unordered image sets, or how do I organize my holiday snaps?", Proc. European Conf. on Computer Vision, Copenhagen, Denmark, pp. 414-431, June 2002.

Shi, J. and Tomasi, C., "Good features to track", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, US, pp. 593-600, 1994.

Snavely, N., Seitz, S.M., and Szeliski, R., "Photo tourism: Exploring photo collections in 3D", ACM Transactions on Graphics, Vol. 25, No. 3, 2006.

Tieu, K. and Viola, P., "Boosting Image Retrieval", IEEE Conference on Computer Vision and Pattern Recognition, South Carolina, US, pp. 1228-1235, June 2000.

Triggs, B., "Detecting keypoints with stable position, orientation, and scale under illumination changes", Eighth European Conference on Computer Vision, Prague, Czech Republic, pp. 100-113, 2004.

Turcot, P. and Lowe, D.G., "Better matching with fewer features: The selection of useful features in large database recognition problems", ICCV Workshop on Emergent Issues in Large Amounts of Visual Data (WS-LAVD), Kyoto, Japan, October 2009.

Tuytelaars, T. and Gool, L.V., "Wide baseline stereo matching based on local, affinely invariant regions", Proc. British Machine Vision Conf., Bristol, UK, pp. 412-422, September 2000.

Tuytelaars, T. and Gool, L.V., "Matching widely separated views based on affine invariant regions", Intern. Journal of Computer Vision, Vol. 59, No. 1, 2004.

Tuytelaars, T. and Mikolajczyk, K., "Local invariant feature detectors", Foundations and Trends in Computer Graphics and Computer Vision, Vol. 3, No. 1, 2007.

Viola, P., Jones, M.J., and Snow, D., "Detecting pedestrians using patterns of motion and appearance", Ninth International Conference on Computer Vision, Nice, France, pp. 734-741, 2003.

Viola, P. and Jones, M., "Robust Real-time Object Detection", International Journal of Computer Vision (IJCV), 2004.
Young, A.W. and Bruce, V., "Perceptual Categories and the Computation of 'Grandmother'", European Journal of Cognitive Psychology, Vol. 3, No. 1, 1991.

Freund, Y. and Schapire, R.E., "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, Vol. 55, 1997.

Zhu, S.C. and Yuille, A.L., "Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 9, 1996.
