UBC Theses and Dissertations

Image-based visual servoing with hybrid camera configuration for robust robotic grasping Zhang, Guan-Lu 2009

Image-based Visual Servoing with Hybrid Camera Configuration for Robust Robotic Grasping

by

Guan-Lu Zhang
B.A.Sc., University of British Columbia, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate Studies (MECHANICAL ENGINEERING)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
September 2009

© Guan-Lu Zhang, 2009

ABSTRACT

A challenging anthropomorphic characteristic which a robotic manipulator should acquire is vision. When first introduced to manufacturing and material-handling industries, robots were taught to move “blindly” in a structured environment where the position and the orientation of the manipulated object were assumed to be known. Since then, a spectrum of visual servoing techniques has been developed to increase the versatility and accuracy of robotic manipulators for carrying out tasks in less structured environments. In particular, the concept of continuous motion control of robotic systems using a visual feedback loop has been applied recently to unstructured environments. In the work presented in this thesis, a functioning mobile manipulator platform, named Autonomous Intelligent Mobile Manipulator (AIMM), with a hybrid camera configuration has been developed. This platform is a member of the cooperative heterogeneous mobile robotic team at the Industrial Automation Laboratory (IAL) of the University of British Columbia (UBC). Given the increasing threat of terrorism and natural disasters around the world, the encompassing research explores new robotic solutions for search and rescue applications. Specifically, the IAL robotics subgroup develops robotic capabilities that can assist or even replace human rescuers in situations that involve life-threatening risks. The main contribution of the work in this thesis can be summarized by its two aspects: hardware and control scheme.
The developed mobile manipulator consists of a light-weight yet capable manipulator, a mobile robotic base, and a hybrid camera configuration using both a monocular camera and a stereo camera. The classical image-based visual servoing scheme has been improved by techniques of depth estimation and neural network-based grasping. In the present work, both the platform and the control scheme have been tested in real-life scenarios where multiple robots complete a complex task of object handling in an unstructured environment. Future directions in platform upgrade and visual servoing research are proposed as well.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
NOMENCLATURE
ABBREVIATIONS
ACKNOWLEDGEMENTS
Chapter 1 Introduction
  1.1 Motivation
  1.2 Problem Definition
  1.3 Review of Previous Work
    1.3.1 Depth Estimation Techniques
    1.3.2 Grasping Techniques
    1.3.3 Mobile Manipulator Platforms
  1.4 Justification of the Hybrid Camera Configuration
  1.5 Research Objectives
  1.6 Thesis Organization
Chapter 2 Mobile Manipulator Test-Bed
  2.1 Hardware Overview
  2.2 Software Overview
  2.3 Mobile Robot: Pioneer™ 3
  2.4 Manipulator: Harmonic Arm™
    2.4.1 The Manipulator
    2.4.2 The Universal Gripper
    2.4.3 Manipulator Kinematic Model
  2.5 Auxiliary Sensors
    2.5.1 Monocular Camera
      Intrinsic Camera Parameters
      Extrinsic Camera Parameters
    2.5.2 Stereo Camera
      Intrinsic Camera Parameters
      Extrinsic Camera Parameters
    2.5.3 Laser Range Finder
  2.6 On-board Computer
Chapter 3 Visual Servoing
  3.1 Fundamentals of Image-Based Visual Servoing
  3.2 Coordinate Transformations
    3.2.1 Web Camera Frame to Base Frame
    3.2.2 Stereo Camera Frame to Web Camera Frame
  3.3 Computer Vision
    3.3.1 Feature Tracking
    3.3.2 Depth Estimation via Color Tracking
  3.4 Rule Base
  3.5 Experimental Results
Chapter 4 Grasping
  4.1 Neural Network Architecture
  4.2 Simulation Data
  4.3 Training
  4.4 Results
Chapter 5 Conclusions
  5.1 Summary and Contributions
  5.2 Limitations and Future Directions
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Parameters Definitions for the DH Convention
Table 2.2: Harmonic Arm Physical Attributes
Table 2.3: Intrinsic Parameters of the Monocular Camera
Table 2.4: Intrinsic Parameters of the Stereo Camera
Table 3.1: Rule Base
Table 3.2: Feature Point Case Study
Table 4.1: Input and Output Values for Different Grasping Scenarios

LIST OF FIGURES

Figure 1.1: IAL Heterogeneous Multi-Robot Team
Figure 1.2: Nomadic Technologies XR4000
Figure 1.3: The Stanford Artificial Intelligence Robot
Figure 1.4: El-E from Healthcare Robotics
Figure 1.5: The Proposed Hybrid Camera Configuration
Figure 2.1: Subsystems of AIMM
Figure 2.2: System Architecture of AIMM
Figure 2.3: Software Architecture of AIMM
Figure 2.4: Major Components of Pioneer™ 3
Figure 2.5: Major Components of the Harmonic Arm™
Figure 2.6: Embedded Sensors of the Universal Gripper
Figure 2.7: Response Curve of the Infrared Proximity Sensor
Figure 2.8: Harmonic Arm Attribute Identification
Figure 2.9: Base Frame
Figure 2.10: Frames 0 and 1
Figure 2.11: Frame 2
Figure 2.12: Frame 3
Figure 2.13: Frame 4
Figure 2.14: Frame 5 and the Tool Frame
Figure 2.15: Eye-in-Hand Monocular Camera
Figure 2.16: Pinhole Camera Model
Figure 2.17: Image of Calibration Grid
Figure 2.18: Coordinate Frames Used in Camera Calibration
Figure 2.19: Eye-to-Object Stereo-Vision Camera
Figure 2.20: Coordinate Frames of Stereo Camera
Figure 2.21: Hokuyo Laser Range Finder
Figure 2.22: Outputs of the Hokuyo Laser Range Finder
Figure 2.23: Firewire and Serial Adapters
Figure 3.1: Control Diagram for Multi-Camera Image-Based Visual Servoing
Figure 3.2: Feature Points Identified by Eye-in-Hand Camera
Figure 3.3: Major Coordinate Frames on AIMM
Figure 3.4: Target Object with Unique Visual Features Marked on It
Figure 3.5: Image Feature Data from MIL
Figure 3.6: Rectified Epipolar Geometry
Figure 3.7: Disparity Map
Figure 3.8: ACTS Color Tracking
Figure 3.9: Hierarchy of Influence
Figure 3.10: Camera Frame Relative to Image Plane
Figure 3.11: Training Image for Case 1
Figure 3.12: Training Image for Case 2
Figure 3.13: Depth Estimation Accuracy of BumbleBee2 Stereo Camera
Figure 3.14: Pixel Error Results
Figure 3.15: Translational Velocity Results
Figure 3.16: Rotational Velocity Results
Figure 3.17: Condition Number Results
Figure 3.18: Motor Limit Results
Figure 4.1: Feedforward Multilayer Perceptron Architecture
Figure 4.2: Backpropagation Learning Algorithm
Figure 4.3: Gradient Descent Training (N = 4 and L = 0.3)
Figure 4.4: Gradient Descent Training (N = 5 and L = 0.3)
Figure 4.5: Gradient Descent Training (N = 6 and L = 0.3)
Figure 4.6: Gradient Descent with Momentum
Figure 4.7: Gradient Descent with Momentum
Figure 4.8: Gradient Descent with Momentum and Variable Learning Rate
Figure 4.9: Gradient Descent with Momentum and Variable Learning Rate
Figure 5.1: End-Effector Roll, Pitch and Yaw Axes

NOMENCLATURE

pC  Camera frame defined by the classic pin-hole camera model
TM p C  Camera frame defined by the MATLAB Camera Calibration Toolbox
Right camera frame of the stereo camera defined by the MATLAB Camera Calibration Toolbox
pLM  Left camera frame of the stereo camera defined by the MATLAB Camera Calibration Toolbox
pG  Grid frame
pW  World frame
pB  Harmonic Arm base frame
pS  Stereo Camera frame
R  Rotation matrix describing frame 0 with respect to frame 1
Translation vector describing frame 0 with respect to frame 1
θi  DH Joint Angle
L.  DH Link Offset
αi  DH Link Twist
/1,  DH Link Length
F  Focal Length
(R, c)  Pixel Coordinates
(OR, Oc)  Principal Point Coordinates
(u, v)  Image Plane Coordinates
(, , F)  Focal Length to Pixel Size Ratio
S  Current visual feature vector
S’’  Desired (i.e., trained) visual feature vector
e(t)  Error between the current and the desired visual feature vector
K  Gain matrix
C  Transformed depth parameter in the eye-in-hand camera frame
Eye-in-hand camera velocity vector
L  Interaction matrix

ABBREVIATIONS

AIMM - Autonomous Intelligent Mobile Manipulator
IAL - Industrial Automation Lab
CRASAR - Center for Robot Assisted Search and Rescue
PBVS - Position-Based Visual Servoing
IBVS - Image-Based Visual Servoing
DOF - Degrees of Freedom
DLL - Dynamic Link Libraries
LAN - Local Area Network
IR - Infrared
DH - Denavit-Hartenberg
CCD - Charge Coupled Device
W.R.T. - With Respect To
MIL - Matrox Imaging Library
ANN - Artificial Neural Network
NN - Neural Network

ACKNOWLEDGEMENTS

As an undergraduate student, my impression of Prof. Clarence de Silva was a mix of curiosity and reverence. As one of his graduate students during the past two years, I have come to know him better through his warm support as a mentor, which has further strengthened my initial impression. I have been very fortunate to receive from Prof. de Silva rigorous guidance and critical feedback as a research supervisor throughout my study.
Also, I am grateful for the Research Assistantship which he provided to me through his Special Research Opportunities (SRO) grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada. I am also profoundly thankful for the guidance Dr. Ying Wang has given me during my studies. Ying has taught me many practical skills and he has motivated me through challenges which I doubt I could have overcome alone. In the beginning of my research, when I was new to the area of robotics and controls, I received helpful comments from Dr. Farbod Khoshnoud (of SOFTEK and a Research Associate of our lab), Dr. Elizabeth Croft, Dr. Ryozo Nagamune, and Dr. Lalith Gamage (a Visiting Professor in our lab). My sincere gratitude goes to them as well.

I wish to express my appreciation to Markus Fengler, Erik Wilson, Sean Buxton and Glenn Jolly. Erik and Markus were both very patient and helpful with my questions related to machining and hardware issues. Sean and Glenn were both enjoyable to work with and I certainly could not have understood electrical issues so quickly without their insights. In addition, I wish to thank Yuki Mastumura and Barb Murray for their patience and advice when I needed administrative help.

During the last term of my graduate studies, I was fortunate to be accepted by the MITACS Accelerate Canada internship program, which gave me the chance to experience research projects at an industrial pace. Besides Prof. de Silva’s continuous support during this 4-month period of internship, I wish to thank Dr. Réda Fayek of Dynacon Inc. for being my internship supervisor and Dr. Claudia Krywiak of MITACS for coordinating this internship.

Srinivas, Ramon, Roland, Howard, Tahir, Behnam, Gamini and Arunasiri are my colleagues at the Industrial Automation Lab and their friendship allowed me to get through the sometimes challenging periods of research.
I would especially like to thank Howard for his help in programming because I could not have implemented my ideas without his expertise.

Ultimately, whatever I have achieved thus far, I owe it to my parents and grandparents, who nurtured me with everything they could afford and offer. Through them I see the best example of love for a child and the selfless sacrifice that brought them half way around the world. My life is a gift from them and I wish to dedicate this work to my family.

Chapter 1 Introduction

Vision is one of the most challenging anthropomorphic characteristics for robot manipulators to acquire. When first introduced to industries such as manufacturing and material-handling, manipulator motions were performed “blindly” and the performance of these robots was extremely sensitive to changes in the environment (i.e., position and orientation of the target object). To increase the versatility and accuracy of manipulator systems, the technique of continuous motion control using a visual feedback loop, called “Visual Servoing,” was developed in 1973 [1]. Since then, many variations of visual servoing have been proposed and developed in order for robotic systems to be more capable and useful. Three decades after the introduction of robotic vision, the focus of research in robotic manipulation is shifting from structured environments such as assembly lines to unstructured environments such as disaster scenes and planetary surfaces.
In particular, visually-controlled mobile manipulators have received avid interest from the robotics community because they possess three advantages over the traditional fixed-base (stationary) manipulators. First, the mobile base expands the robot’s workspace and allows it to traverse virtually any terrain. Second, the visual capability introduces human-like perception to the robot, which enhances its ability to execute complex manipulation tasks. Third, with the use of more than one mobile manipulator, cooperative manipulation will be beneficial in unstructured environments where human involvement might be inefficient or hazardous. Example situations include urban search and rescue, planetary exploration, chemical or nuclear material handling, and explosive disarmament.

Despite the promising capabilities, several issues remain as challenges to the functional development of mobile manipulators. Perception of an unstructured environment can be especially difficult when the position and orientation of a target object are unknown. Furthermore, during manipulation, the redundant degrees of freedom provided by the mobile base and the manipulator joints can add to the complexity of the servoing motion.

1.1 Motivation

At the Industrial Automation Laboratory (IAL) of the University of British Columbia (UBC), a team of heterogeneous mobile robots is being developed for cooperative applications in unstructured environments (see Figure 1.1). In particular, we are in the process of investigating and developing robotic capabilities that will be useful in emergency response situations. The research presented in this thesis is a part of this overall research effort within IAL. Our motivation comes from the increasing threat of terrorism and natural disasters around the world. In the aftermath of the September 11th attacks on the World Trade Centre, robots were used for the first time to help search for victims and identify forensic evidence in the wreckage.
During a two-week period after the attacks, remote-controlled robots deployed by the Center for Robot Assisted Search and Rescue (CRASAR) searched for victims under unstable and collapsed structures and through voids that were too dangerous and small for access by human or canine rescuers [2]. In the aftermath of Hurricane Katrina in 2005, robotic search and rescue was implemented again to search for survivors in damaged houses [3].

Given the current state of search and rescue robotics, the robotics subgroup of IAL envisions a future where robots can assist or even replace human rescuers in situations that involve life-threatening risks. To illustrate, imagine a future urban setting where robots have replaced humans in performing many menial tasks such as trash picking and street cleaning. In the event of a disaster such as an earthquake or an explosion, these robots should be able to provide some assistance even though they are not specifically designed for search and rescue purposes. This is similar to the notion that human bystanders are expected to respond to an automobile accident, for example, to the best of their abilities even though they are not trained as emergency responders. With immediate response, the chances of survival would be increased. These “good Samaritan” robots would especially be necessary when human or canine rescuers are unable to access the victims because of damaged and unstable structures or because the victims are trapped in a hazardous environment (e.g., likelihood of explosion due to natural gas leak, presence of hazardous material such as radioactive debris, or possibility of secondary shocks after an earthquake).

Figure 1.1: IAL Heterogeneous Multi-Robot Team in an Unstructured Environment.
1.2 Problem Definition

The overall goal of the robotic team includes searching for humans in distress, cooperatively grasping and manipulating objects to help the human, and constructing simple devices by multiple robots in order to transport the affected humans to safety [4]. The focus of the present thesis, however, is on the task of object grasping by a mobile manipulator in the multi-robot team. Realistically, the scope of the task is to have the mobile manipulator grasp a marked object in an unstructured environment where the object’s position and orientation are not known a priori.

1.3 Review of Previous Work

Techniques of visual servoing are classified into two categories: position-based visual servoing (PBVS) and image-based visual servoing (IBVS). In PBVS, the control inputs are computed in the three-dimensional (3D) Cartesian space, where the positions of features on the target object are derived from a previously known model of the object. The performance of this method is sensitive to camera calibration and to the accuracy of the object model [5]. The state-of-the-art research on visual servoing has almost abandoned this method because perfectly calibrated camera systems are not practical and having detailed knowledge about the objects in an unstructured environment is not realistic.

In contrast, IBVS has been the focus of some recent robotics research for its robustness against calibration errors and coarse object models. Control inputs for IBVS are computed in the two-dimensional (2D) image space, and an interaction matrix defines the relationship between the feature point velocities in the image and the camera velocity in 3D space [5]. However, IBVS is not free of shortcomings. A key parameter of the translational interaction matrix, which is needed in the approach, is the unknown depth variable for each feature point during each iteration of the control loop.
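The role of the depth variable can be illustrated with a minimal sketch. It assumes the standard textbook interaction matrix for a single point feature expressed in normalized image coordinates (x, y) at depth Z, which may differ in notation from the formulation developed later in this thesis; the function names are hypothetical and chosen only for illustration.

```python
def point_interaction_matrix(x, y, Z):
    """Standard 2x6 interaction matrix of a normalized image point (x, y)
    at depth Z (assumed textbook form, not taken from this thesis).
    Columns correspond to camera velocity (vx, vy, vz, wx, wy, wz)."""
    return [
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ]


def feature_velocity(x, y, Z, camera_velocity):
    """Image-plane velocity (x_dot, y_dot) of the point induced by a
    6-element camera velocity, i.e. s_dot = L * v."""
    L = point_interaction_matrix(x, y, Z)
    return tuple(sum(l * v for l, v in zip(row, camera_velocity)) for row in L)


# Camera translating along its x-axis at 0.1 m/s, point on the optical axis:
# the induced image motion halves when the depth doubles.
near = feature_velocity(0.0, 0.0, 1.0, [0.1, 0, 0, 0, 0, 0])
far = feature_velocity(0.0, 0.0, 2.0, [0.1, 0, 0, 0, 0, 0])

# Pure rotation about the optical axis: the image motion does not depend on Z.
spin_near = feature_velocity(0.2, 0.1, 1.0, [0, 0, 0, 0, 0, 1.0])
spin_far = feature_velocity(0.2, 0.1, 5.0, [0, 0, 0, 0, 0, 1.0])
```

In this form only the three translational columns contain Z, so an error in the depth estimate distorts the predicted image motion for camera translation while leaving the rotational part unaffected.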
This depth variable has a significant influence on the stability of the scheme, the realizable motion of the camera, and the global convergence properties of image errors. For example, instability can cause the manipulator to move to a pose where the image features leave the field of view of the camera, leading to servo failure.

Other techniques have been proposed to achieve a compromise between IBVS and PBVS. Notably, researchers have proposed a novel 2½D visual servo scheme where translational and rotational degrees of freedom are decoupled in order to allow the translational interaction matrix to be always invertible [6]. Similarly, the partitioned visual servo scheme extends the decoupling effect by defining one image feature for each of the six degrees of freedom (DOF) of an object [7]. Since the visual features in the 2½D scheme are still defined partly by 3D information and the partitioned scheme does not apply to all objects, neither of the two schemes will be the focus of the present thesis.

1.3.1 Depth Estimation Techniques

The classical IBVS scheme is considered the most relevant approach in the context of a mobile manipulator in an unstructured environment. Thus, improving the depth estimation in the context of the classical image-based scheme will be the focus of the present work. De Luca et al. [8] introduced the method of using a nonlinear observer that asymptotically recovers the actual depth of each image feature of the target object. The work was performed on a mobile robot with a monocular on-board camera, but the intent was to allow a robotic manipulator with a calibrated camera mounted on the end-effector to estimate the depth of the feature point relative to the image plane during camera motions. Similarly, Cervera et al. [9] introduced a new feature vector for the control law input, thereby removing the nonlinearity in the interaction matrix as defined by the classical perspective projection camera model. Kragic et al.
[10], alternatively, use several stereo-vision systems to directly estimate the depth of a target object from the disparity map and use only a single camera to perform the task of visual servoing. Perrin et al. [11] propose an active monocular depth recovery method that calculates depth estimates for points on the object contour by using control points from two deformable models. These models are constructed from two images that are taken with a known camera displacement along the z-axis of an eye-in-hand system.

1.3.2 Grasping Techniques

Recent research on object grasping with mobile manipulators has been done in indoor settings that are relatively structured compared to a disaster scene. Saxena et al. [12] achieved a novel mobile manipulation system that could unload items from a dishwasher through supervised learning of grasping points on common household items (e.g., cups, wine glasses, plates). Kemp et al. [13] introduced a new approach that allowed a mobile manipulator to grasp arbitrary objects after a human specified the target object with an off-the-shelf laser pointer. Rezzoug and Gorce [14] implemented a multistage neural network architecture that matched a grasping posture with the contact configuration of an object. Vilaplana and Coronado [15] developed a coordination strategy for hand gestures using a neural network model. A library of hand gestures provided the basis of appropriate action primitives from which the neural model could be selected. All these research activities have focused on grasping of objects with a high level of accuracy that is typically not necessary for emergency response applications.

1.3.3 Mobile Manipulator Platforms

Besides the theoretical work surveyed here, several mobile manipulator platforms have been developed which represent the state of the art in prototype robotic systems.
Although practically all the existing commercial mobile manipulators have been designed to serve as domestic assistants, they are still worthy as references for robots designed to serve as emergency assistants. All these platforms resemble the typical configuration of a mobile robot augmented with a dexterous manipulator; some even use the same robotic manipulator as the AIMM in our laboratory.

The Nomadic Technologies XR4000, developed by the Centre for Autonomous Systems at the Royal Institute of Technology in Stockholm, Sweden, is designed to perform fetch-and-carry tasks. Major components are illustrated in Figure 1.2. Notably, the 4 DOF “YORICK” stereo head has two sets of stereo cameras: peripheral and foveal. In order to overcome the challenge of fixed focal length, two sets of stereo cameras are used to achieve a compromise between the field of view and the resolution. Specifically, the peripheral set, with a focal length of 6 mm, provides a wide-angle field of view that is necessary for navigation and coarse object identification. Conversely, the foveal set, with a focal length of 28 mm, provides a high-resolution magnified view of the target object, which is necessary for manipulation [10].

Figure 1.2: Nomadic Technologies XR4000 (Centre for Autonomous Systems, RIT).

STAIR (STanford Artificial Intelligence Robot), developed by the Artificial Intelligence Laboratory at the Computer Science Department of Stanford University, is designed to serve in domestic scenarios such as unloading a dishwasher and clearing a table. STAIR has a Harmonic Arm as the manipulator and a Segway® Personal Transporter as the mobile base (see Figure 1.3). The STAIR vision system consists of an eye-in-hand webcam and several elevated stereo cameras.
In addition, STAIR has a SICK laser scanner for navigation and an on-board power supply [12].

Figure 1.3: The Stanford Artificial Intelligence Robot.

El-E (“Ellie”) is an assistive robot developed by the Health Care Robotics Laboratory at Georgia Institute of Technology. Similar to STAIR, its purpose is to assist in domestic care of patients and/or senior citizens. Unlike STAIR, El-E behaves through a “clickable interface,” where the user defines the target object and destination by using a green laser pointer. Figure 1.4 illustrates the major components of El-E. Notably, El-E has a mobile base driven by two active wheels and a passive caster. There are two visual systems: an omnidirectional camera system, which is composed of a hyperbolic mirror with a monochrome camera, and a stereo vision camera mounted on a pan-tilt unit, which provides 3D colored images of its surroundings. In addition, the Harmonic Arm and a laser range finder are mounted on a linear actuator (“Zenither”) that stands at a height comparable to that of an average human [16].

Figure 1.4: El-E from Healthcare Robotics (Georgia Institute of Technology).

1.4 Justification of the Hybrid Camera Configuration

From the foregoing survey of the mobile manipulator platforms it is evident that a single camera in the eye-in-hand configuration is not sufficient for performing useful tasks in unstructured environments. Also evident from the survey is the fact that the current mobile manipulators imitate both human behavior (i.e., vision, articulated arm, and so on) and human form (i.e., elevated field of view). For a field robot designed for emergency response, where the terrain is not always flat, maintaining a lower profile is advantageous.
Figure 1.5 illustrates the hybrid camera configuration as developed in the present work, where the depth information used by the eye-in-hand camera is transformed from the depth information obtained by the stereo camera through the known kinematics of the mobile manipulator components. The advantages of this configuration are twofold: the enlarged field of view is appropriate for the emergency response application, and stereo depth estimation can be performed without occlusion by the manipulator.

Figure 1.5: The Proposed Hybrid Camera Configuration.

1.5 Research Objectives

This thesis intends to address the general issue of using vision as the primary sensor to control a mobile manipulator in an unstructured environment. A common research topic in the area of mobile manipulation is autonomous grasping of objects in an environment where the positions and orientations of the objects are unknown. The objectives of the research described in this thesis are listed below:

1. Develop a physical mobile manipulator system with vision and other sensory devices that are suitable for tasks in unknown and unstructured environments.
2. Implement a hybrid camera configuration that includes both the classical eye-in-hand configuration and the less common eye-to-object configuration to enhance the visual capabilities of the mobile manipulator.
3. Develop a new control technique based on the classical image-based visual servoing for a position-controlled robot manipulator with multiple cameras.
4. Demonstrate and validate the capabilities of the developed mobile manipulator among the heterogeneous robotic team in a physical simulation of a cooperative task in an unstructured environment.

1.6 Thesis Organization

So far, this chapter has introduced the background and motivation of developing a visually controlled mobile manipulator as part of a team of cooperating heterogeneous mobile robots for emergency response purposes.
The specific deliverables and the objectives of this thesis have been outlined along with a survey of relevant previous work. The organization of the remainder of the thesis is outlined below.

Chapter 2 will describe AIMM with details of the specific subsystems and components, since its physical attributes are intricately related to the conceptual contribution of this thesis. In particular, an overview of the hardware and software aspects of the platform will be discussed. Details of the robotic components of AIMM will be given. There will be a special emphasis on the kinematic model of the robotic manipulator since its development is the foundation of the control scheme developed in the thesis. Calibration parameters for both the monocular and the stereo cameras will be presented as well.

Chapter 3 will focus on the aspect of visual servoing of AIMM. The chapter will start with a road map of the control scheme and then address each component in the servoing process. The background of the classical IBVS will be briefly presented. There will be significant focus on the computer vision and coordinate transformation aspects of the process. A discussion on improving the classical image-based visual servoing with multiple cameras and on using dynamic gains to control a position-controlled manipulator will follow. Finally, experimental results will be presented and discussed.

Chapter 4 will present a neural network approach for robotic grasping of an object. Specifically, three training functions that use the backpropagation learning algorithm, namely gradient descent, gradient descent with momentum, and gradient descent with momentum and variable learning rate, will be explored in order to improve the speed and the accuracy of the network. Conditions of the simulation experiments will be presented along with the corresponding results.

Chapter 5 will provide a summary of the work carried out in the present research and will outline its notable contributions.
Limitations of the current research and possible directions for future work will be indicated.

Chapter 2 Mobile Manipulator Test-Bed

2.1 Hardware Overview

The mobile manipulator platform, named “AIMM” (Autonomous Intelligent Mobile Manipulator), developed in the Industrial Automation Laboratory (IAL) is composed of four subsystems: mobile robot, manipulator arm, auxiliary sensors, and on-board computer (see Figure 2.1). Each subsystem is a commercial off-the-shelf product, which makes this mobile manipulator system reproducible. For practical reasons, customized components were manufactured and modifications were made in order to integrate the needed hardware. Figure 2.2 illustrates the centralized architecture, which organizes all the subsystems into a single functional entity. Details of each subsystem are described in the subsequent sections.

Figure 2.1: Subsystems of AIMM (Harmonic Arm by Neuronics AG; web camera; stereo camera by Point Grey Research; computer by Lenovo; laser range finder by Hokuyo).

Figure 2.2: System Architecture of AIMM.

2.2 Software Overview

Structurally, the software architecture of AIMM is similar to its system architecture (see Figure 2.3). A C++ main program is the hub that combines all secondary programs for each subsystem. For illustrative purposes, the dashed arrows indicate a link between software blocks within the program while the solid arrows indicate a physical link between software and hardware. The “Wi-Fi Server-Client Exchange” establishes a wireless communication network between AIMM and the other members of the robotic team. It is directly connected to instances of the manipulator and mobile robot, where built-in functions such as inverse kinematics and obstacle avoidance can be used in the form of a black box. Calculations in the visual servo control scheme and coordinate transformation are coded initially in the MATLAB® environment.
Through a compatible MATLAB® C++ compiler and runtime conversion, these equations are transformed from m-files to DLL files, which the main program can then use. For the stereo camera, the raw images and the disparity map are first obtained through the Triclops® Library provided by the manufacturer. Then ACTS® Color Tracking is used to filter the raw image and identify the visual cue on the target object. For the eye-in-hand web camera, the raw image is captured by the open-source computer vision package OpenCV, and the feature coordinates are identified by the Matrox® Imaging Library. Ultimately, the main program sends the appropriate command to either the mobile robot or the manipulator.

Figure 2.3: Software Architecture of AIMM.

2.3 Mobile Robot: Pioneer™ 3

The Pioneer™ 3 developed by MobileRobots Inc. is the mobile robot that serves as the base of our mobile manipulator system. The Pioneer™ 3 is a battery-powered robot driven by two reversible DC motors, each equipped with a high-resolution optical quadrature shaft encoder for position and velocity sensing [17]. An embedded computer built on a core client-server model with a Pentium III processor allows wireless network communication with other robots and provides low-level motor control and navigation capabilities. The Pioneer™ 3 can function as a stand-alone robot; however, it is connected to the on-board laptop computer via an RS-232 serial connection. The rings of ultrasonic sensors at the front and rear (see Figure 2.4) help the Pioneer™ 3 avoid obstacles and prevent collisions with other robots [18].

Figure 2.4: Major Components of Pioneer™ 3 (labeled components: front sonar ring, battery access door, caster, drive wheel). Reproduced with permission from MobileRobots, Inc.
2.4 Manipulator: Harmonic Arm™

2.4.1 The Manipulator

The Harmonic Arm™ 6M180 developed by Neuronics AG is an anthropomorphic robotic manipulator, which serves as the robotic arm for the mobile manipulator. The Harmonic Arm™ is a 5-DOF position-controlled robotic manipulator actuated by brushed DC motors, and it is powered by a tethered connection through an AC adapter. The onboard microcontroller (see Figure 2.5) provides low-level motor control, while high-level control such as direct and inverse kinematics is handled through the TCP/IP protocol via a LAN connection to the central laptop computer.

Figure 2.5: Major Components of the Harmonic Arm™ (labeled components: Universal Gripper, Motor 5, Motor 4, Motor 2, Motor 1, on-board microcontroller with LAN connection). Modified with permission from Neuronics AG.

2.4.2 The Universal Gripper

The Harmonic Arm™ is also equipped with a two-finger Universal Gripper™ actuated by a single brushed DC motor. The gripper fingers are embedded with a total of 9 infrared sensors and 4 force sensors, allowing closed-loop grasping control (see Figure 2.6).

Figure 2.6: Embedded Sensors of the Universal Gripper (labeled components: gripper main body, force sensors, IR sensors). Modified with permission from Neuronics AG.

Each proximity sensor has an operating range of 1-5 mm. Figure 2.7 shows the response curve of the proximity sensor, expressed in terms of the collector current of the infrared phototransistor, non-dimensionalized by the maximum collector current. Although different objects with various surface textures will have dissimilar infrared reflectance, this calibration curve, obtained with a neutral white test card, serves as an adequate reference for simulation purposes [19].

Figure 2.7: Response Curve of the Infrared Proximity Sensor. Reproduced with permission from OSRAM.

2.4.3 Manipulator Kinematic Model

Establishing a kinematic model for the manipulator is useful for coordinate frame transformation.
In addition, it can be used for analyzing singularities of the manipulator. Using the Denavit-Hartenberg (DH) convention [20], any displacement between two frames ($i-1$ and $i$) linked by a revolute or a prismatic joint can be represented by a current-frame homogeneous transformation:

$$A_i^{i-1} = \begin{bmatrix} \cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & \beta_i\cos\theta_i \\ \sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & \beta_i\sin\theta_i \\ 0 & \sin\alpha_i & \cos\alpha_i & L_i \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Table 2.1: Parameter Definitions for the DH Convention.

Parameter    Name           Description
θ_i          Joint Angle    Variable for revolute joints
L_i          Link Offset    Variable for prismatic joints
α_i          Link Twist     Constant angle between Z_(i-1) and Z_i
β_i          Link Length    Constant perpendicular distance between Z_(i-1) and Z_i

The forward kinematics for the manipulator can be derived by multiplying together a series of these homogeneous transformation matrices from the base frame to the tool frame:

$$A_T^B = A_0^B \, A_1^0 \, A_2^1 \, A_3^2 \, A_4^3 \, A_5^4 \, A_T^5 \qquad (2.1)$$

Note that the base frame (see Figure 2.9), not frame zero, is assigned as the reference frame because the Harmonic Arm™ is programmed to move with respect to the fixed base frame only. Each homogeneous transformation matrix (Eqs. 2.2 to 2.8) uses the attributes described in Table 2.2.

Table 2.2: Harmonic Arm Physical Attributes.

Link Lengths [mm]                        L_1 = 200, L_2 = 190, L_3 = 139, L_4 = 185, L_5 = 130
Home Position Angle Offsets [Degrees]    θ_0 = 0, θ_1 = 124.25, θ_2 = 52.7, θ_3 = 63.5, θ_4 = 8.5
Operating Range [Degrees]                θ_0 = 345.7, θ_1 = 140, θ_2 = 241.5, θ_3 = 232, θ_4 = 332.2, Gripper Fingers = 140

Figure 2.8: Harmonic Arm Attribute Identification.

[Eqs. 2.2 and 2.3 ($A_0^B$ and $A_1^0$): the matrices are not legible in the source text; the surviving entries involve $\cos(-\pi+\theta_0)$, $\sin(-\pi+\theta_0)$ and $L_1$.]

Figure 2.9: Base Frame.

$$A_2^1 = \begin{bmatrix} \cos(-\theta_1) & -\sin(-\theta_1) & 0 & L_2\cos(-\theta_1) \\ \sin(-\theta_1) & \cos(-\theta_1) & 0 & L_2\sin(-\theta_1) \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (2.4)$$

Figure 2.10: Frames 0 and 1.

$$A_3^2 = \begin{bmatrix} \cos(\pi-\theta_2) & -\sin(\pi-\theta_2) & 0 & L_3\cos(\pi-\theta_2) \\ \sin(\pi-\theta_2) & \cos(\pi-\theta_2) & 0 & L_3\sin(\pi-\theta_2) \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (2.5)$$

Figure 2.11: Frame 2.

Figure 2.12: Frame 3.

[Eq. 2.6 ($A_4^3$): the matrix is not legible in the source text; the surviving entries are sine and cosine terms in $\theta_3$.]

Figure 2.13: Frame 4.

$$A_5^4 = \begin{bmatrix} \cos\theta_4 & -\sin\theta_4 & 0 & 0 \\ \sin\theta_4 & \cos\theta_4 & 0 & 0 \\ 0 & 0 & 1 & L_4 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (2.7)$$

$$A_T^5 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & L_5 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (2.8)$$

Figure 2.14: Frame 5 and the Tool Frame.

2.5 Auxiliary Sensors

AIMM is equipped with three auxiliary sensors: one monocular camera, one stereo camera, and one laser range finder. The combination of the monocular and the stereo cameras embodies the hybrid camera configuration, where the monocular camera is in the “eye-in-hand” configuration and the stereo camera is in the “eye-to-object” configuration. It is worth noting that the stereo camera cannot be used as the “eye-in-hand” camera because its weight exceeds the payload limit (500 g) of the manipulator end-effector.

2.5.1 Monocular Camera

A Logitech QuickCam® Communicate STX™ CCD webcam is used as the monocular eye-in-hand camera. The ball-socket base of the webcam is removed and replaced by a custom-made fixture. The entire assembly is then rigidly fixed on the manipulator’s end-effector with a band clamp (see Figure 2.15). The classical pinhole model (see Figure 2.16) is used for camera calibration, and the majority of the calibration calculations have been simplified by the MATLAB® Camera Calibration Toolbox developed by Dr. Jean-Yves Bouguet of California Institute of Technology. Each calibration routine must include at least 20 images of a calibration grid that is oriented in random poses with respect to the camera (see Figure 2.17).

Figure 2.15: Eye-in-Hand Monocular Camera.

Figure 2.16: Pinhole Camera Model (labels: camera frame $P^C = (X, Y, Z)^C$, optical axis).
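As a rough illustration of the pinhole model in Figure 2.16, the following Python sketch projects a point expressed in the camera frame onto pixel coordinates. The function name, the argument names, and the axis assignment (x to columns, y to rows, no sign flips, no lens distortion) are assumptions made for illustration; the sign and axis convention actually used for AIMM may differ.

```python
def project_pinhole(point_cam, f_row, f_col, o_row, o_col):
    """Project a 3-D point in the camera frame onto the image plane.

    point_cam    : (x, y, z) in the camera frame, z along the optical axis.
    f_row, f_col : focal length divided by pixel size (unitless).
    o_row, o_col : principal point in pixels.
    Assumed convention: x maps to columns, y maps to rows.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    row = f_row * y / z + o_row
    col = f_col * x / z + o_col
    return row, col


# Illustrative numbers only (not the calibrated values of the webcam).
print(project_pinhole((100.0, 50.0, 1000.0), 1000.0, 1000.0, 240.0, 320.0))
```

A point on the optical axis lands on the principal point, which is a quick sanity check for any calibration result.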
Intrinsic Camera Parameters

The intrinsic parameters determine the characteristics of the CCD sensor with respect to the image plane; therefore, these parameters remain constant in the presence of camera motion. Table 2.3 summarizes the parameter values determined by the calibration toolbox, with the exception of the focal length, which was obtained directly from the manufacturer.

Table 2.3: Intrinsic Parameters of the Monocular Camera.

Focal Length                        F [mm]            4.5 mm (obtained from Logitech)
Focal Length to Pixel Size Ratio    F_x [no units]    973.83939
                                    F_y [no units]    982.40240
Principal Point                     O_r [pixel]       303
                                    O_c [pixel]       220
Resolution                          R [pixel]         480
                                    C [pixel]         640

Extrinsic Camera Parameters

The extrinsic parameters characterize the geometric relationship between the camera and the world so that the physical coordinates of a point in Cartesian space $[X, Y, Z]$ can be translated into the pixel coordinates of the image space $[R, C]$. Specifically, this relationship is captured in a homogeneous transformation matrix whose reference frame is the camera frame. Since the camera is calibrated with the grid and the relationship between the grid frame and the world frame is known (see Figure 2.18), a mathematical manipulation must be done to link the camera to the world frame.

Figure 2.18: Coordinate Frames Used in Camera Calibration.

Note that the MATLAB® camera frame $P^{MC}$, which is given by the calibration toolbox, is different from the actual camera frame $P^C$ defined by the classic pinhole model. This discrepancy will be corrected in a subsequent relation. Also, note that the rotation matrices in this thesis follow the standard convention:

$$R_1^0 = \begin{bmatrix} x_1 \cdot x_0 & y_1 \cdot x_0 & z_1 \cdot x_0 \\ x_1 \cdot y_0 & y_1 \cdot y_0 & z_1 \cdot y_0 \\ x_1 \cdot z_0 & y_1 \cdot z_0 & z_1 \cdot z_0 \end{bmatrix}$$
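The direction-cosine convention above can be sketched in a few lines of Python. This is only an illustration under the assumption that each frame is given by three unit axis vectors expressed in some common coordinates; the helper names `dot` and `rotation_from_axes` are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def rotation_from_axes(axes0, axes1):
    """Build the rotation matrix of frame 1 relative to frame 0.

    axes0, axes1 : (x, y, z) unit axis vectors of each frame, all expressed
                   in the same coordinates.
    Entry [i][j] is (j-th axis of frame 1) . (i-th axis of frame 0).
    """
    return [[dot(a1, a0) for a1 in axes1] for a0 in axes0]


# Frame 1 is frame 0 rotated by 90 degrees about the common z axis.
frame0 = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
frame1 = ((0, 1, 0), (-1, 0, 0), (0, 0, 1))
for row in rotation_from_axes(frame0, frame1):
    print(row)
```

Each column of the result is one axis of frame 1 written in the coordinates of frame 0, which is what makes the matrix usable for mapping points from frame 1 into frame 0.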
The outputs of the calibration routine are $R_G^{MC}$ and $T_G^{MC}$, which represent a point $[X, Y, Z]$ of the grid frame $P^G$ in the MATLAB® camera frame $P^{MC}$ through the following relation:

$$P^{MC} = R_G^{MC} \cdot P^G + T_G^{MC} \qquad (2.9)$$

Here:

$$R_G^{MC} = \begin{bmatrix} -0.044968 & 0.998849 & 0.01668 \\ 0.986551 & 0.041775 & 0.158024 \\ 0.157145 & 0.023564 & -0.987294 \end{bmatrix} \quad \text{and} \quad T_G^{MC} = \begin{bmatrix} -77.162634 & 35.822967 & 1941.186394 \end{bmatrix}^T$$

By inspection, $R_W^G$ and $T_W^G$ can be physically measured to establish the following relation:

$$P^G = R_W^G \cdot P^W + T_W^G \qquad (2.10)$$

Substituting Eq. 2.10 into Eq. 2.9 we get:

$$P^{MC} = R_G^{MC} R_W^G \cdot P^W + R_G^{MC} T_W^G + T_G^{MC}$$

The result becomes:

$$P^{MC} = R_W^{MC} \cdot P^W + T_W^{MC} \qquad (2.11)$$

Here:

$$R_W^{MC} = \begin{bmatrix} -0.9988 & 0.0450 & -0.0167 \\ -0.0418 & -0.9866 & -0.1580 \\ -0.0236 & -0.1571 & 0.9873 \end{bmatrix} \quad \text{and} \quad T_W^{MC} = \begin{bmatrix} -55.4109 \\ 818.2255 \\ -368.1532 \end{bmatrix}$$

As mentioned before, the following relation links $P^{MC}$ with $P^C$:

$$P^C = R_{MC}^C \cdot P^{MC} + T_{MC}^C \qquad (2.12)$$

Here:

$$R_{MC}^C = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad T_{MC}^C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Finally, the camera frame with respect to the world frame can be established as:

$$P^C = R_W^C \cdot P^W + T_W^C \qquad (2.13)$$

As a validation step, the following two relations are calculated and the calibration errors are defined with reference to the actual values in pixel coordinates:

$$R = -F_x \frac{X^C}{Z^C} + O_r \qquad (2.14)$$

$$C = -F_y \frac{Y^C}{Z^C} + O_c \qquad (2.15)$$

AIMM's eye-in-hand camera has a calibration error of 2 pixels and 8 pixels in the row (R) and column (C) directions, respectively.

2.5.2 Stereo Camera

A BumbleBee®2 by Point Grey Research is used as the stereo-vision eye-to-object camera. The stereo camera is directly mounted on a 3-DOF tripod ball-socket head and a custom-made fixture (see Figure 2.19). The entire assembly is rigidly fixed on the top cover of the mobile robot. Each camera is calibrated using the same method as the web camera, and the two cameras are ultimately linked to each other by a common coordinate frame $P^S$ (see Figure 2.20).

Figure 2.19: Eye-to-Object Stereo-Vision Camera.

Figure 2.20: Coordinate Frames of Stereo Camera.

Intrinsic Camera Parameters

The intrinsic parameters of the stereo camera are given in Table 2.4.
Table 2.4: Intrinsic Parameters of the Stereo Camera.

                                                      Left Camera    Right Camera
Focal Length                        F [mm]            2.5 mm (obtained from Point Grey Research)
Focal Length to Pixel Size Ratio    F_x [no units]    1321.63149     1314.97694
                                    F_y [no units]    1355.18201     1344.62115
Principal Point                     O_r [pixel]       466            470
                                    O_c [pixel]       407            392
Resolution                          R [pixel]         768
                                    C [pixel]         1024

Extrinsic Camera Parameters

The extrinsic camera parameters for each single camera are derived in the same way as those of the web camera. However, a single coordinate frame needs to be defined for the stereo camera unit; thus, the relationship between the two cameras relative to each other is:

$$P^L = R_R^L \cdot P^R + T_R^L \qquad (2.16)$$

Here:

$$R_R^L = \begin{bmatrix} 0.9978 & 0.0171 & -0.0644 \\ -0.0194 & 0.9992 & -0.0359 \\ 0.0637 & 0.0370 & 0.9973 \end{bmatrix} \quad \text{and} \quad T_R^L = \begin{bmatrix} 122.5259 \\ 1.6699 \\ 3.1301 \end{bmatrix}$$

Note that $R_R^L$ is approximately an identity matrix because the two cameras of the stereo vision system are almost aligned. The relation between the stereo camera frame and the world frame is defined as:

[Eq. 2.17 is not legible in the source text; it relates $P^S$ to $P^W$ and involves the stereo baseline.]

Here, $\beta$ (length of the stereo baseline in Figure 2.20) = 120.183 mm.

The stereo frame $P^S$ has the same orientation as both of the individual camera frames, but its origin is translated by half the baseline. Finally, Eq. 2.17 becomes:

$$P^S = R_W^S \cdot P^W + T_W^S \qquad (2.18)$$

[The entries of $R_W^S$ and $T_W^S$ are only partially legible in the source text; the surviving values are 0.1110, -0.9658, -0.8228, -0.2557, -0.5574, -0.2343, -0.5668 and 0.7899.]

2.5.3 Laser Range Finder

The second sensor that is mounted on the end-effector of the manipulator is the Hokuyo URG-04LX Laser Range Finder (see Figure 2.21). This reflective infrared scanner is robust against various external light sources and surface finishes because it is a phase-difference system using time-of-flight as its measuring principle.

Figure 2.21: Hokuyo Laser Range Finder.

The scanner has an angular scanning range of 240 degrees and a depth scanning range of 4 meters (see Figure 2.22), both of which are very useful for localization in unstructured environments.
Although this sensor is not directly related to the technical contribution of the present thesis, it does provide AIMM with vital navigational capabilities that will certainly be needed for future work.

Figure 2.22: Outputs of the Hokuyo Laser Range Finder.

2.6 On-board Computer

The on-board computer of AIMM is a Lenovo ThinkPad T400. There is no specific reason to pick this particular model other than ThinkPad’s good reputation for physical robustness and functionality. One special feature of this computer, which is essential for integrating both the mobile robot via the RS-232 serial connection and the stereo camera via the IEEE 1394 FireWire connection, is the double-decked expansion slot (see Figure 2.23).

Figure 2.23: Firewire and Serial Adapters.

Chapter 3 Visual Servoing

Visual servoing is fundamentally a closed-loop control scheme where the onboard camera provides the sensory (visual) feedback signal. As illustrated in Figure 3.1, the eye-in-hand camera (i.e., the web camera) provides the feature vector $s(t)$ of the object to be manipulated, which is compared with the desired feature vector $s_d$ to create an error signal $e(t)$. The IBVS controller takes this error signal, along with the gain matrix $K_{6\times6}$ and the transformed depth information $Z^C$ obtained by the stereo camera, to generate a velocity vector with respect to (w.r.t.) the eye-in-hand camera frame $C$. Specifically for the position-controlled Harmonic Arm, this velocity vector is transformed into the base frame of the manipulator and then integrated with a constant sampling time to create the future position values for each degree of freedom. These future position values represent deviations in scale, rotation and translation between the trained image and the current image of the visual feature pattern. Ultimately, these deviations are fed into a rule base, which in turn updates the gain matrix to improve the performance of the IBVS controller for the next iteration.
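The integration step described above for a position-controlled arm might be sketched as follows in Python. This is a minimal sketch assuming a simple forward-Euler step and a pose stored as six numbers (three translations and three small rotations); the function name, the pose layout and the numerical values are hypothetical and are not taken from the AIMM software.

```python
def integrate_velocity(pose, velocity, dt):
    """Forward-Euler step: next position values from a velocity command.

    pose     : current values of the six degrees of freedom.
    velocity : commanded velocity for each degree of freedom (base frame).
    dt       : constant sampling time in seconds.
    """
    if len(pose) != len(velocity):
        raise ValueError("pose and velocity must have the same length")
    return [p + v * dt for p, v in zip(pose, velocity)]


# One control cycle with an assumed sampling time of 0.5 s.
current_pose = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
base_velocity = [1.0, 2.0, 0.0, 0.0, 0.0, 0.5]
print(integrate_velocity(current_pose, base_velocity, 0.5))
```

Because the arm accepts only position commands, the velocity output of the controller is useful only after a step of this kind; the sampling time therefore acts as an additional scale factor on the effective gain.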
Figure 3.1: Control Diagram for Multi-Camera Image-Based Visual Servoing.

3.1 Fundamentals of Image-Based Visual Servoing

The classical task of image-based visual servoing is essentially a finite horizon regulator problem, where the error between the actual and the desired image feature coordinates is regulated asymptotically to zero. The classical IBVS control law is given by:

$$v^C = -K \cdot L^{-1} e(t) \qquad (3.1)$$

$$v^C = \begin{bmatrix} v_x & v_y & v_z & \omega_x & \omega_y & \omega_z \end{bmatrix}^T = -K \cdot L^{-1} \left( s(t) - s_d \right) \qquad (3.2)$$

Here:
$v^C$ = Velocity vector w.r.t. camera frame
$K$ = Gain
$L^{-1}$ = Inverse of interaction matrix
$e(t)$ = Pixel error

Pixel error is defined as the difference between the current pixel coordinates and the desired pixel coordinates:

$$e(t) = s(t) - s_d \qquad (3.3)$$

Here:

$$s_i(t) = \begin{bmatrix} u_i(t) \\ v_i(t) \end{bmatrix} = \begin{bmatrix} -f_x (r - o_r) \\ -f_y (c - o_c) \end{bmatrix}$$

for the $i$th feature point.

The desired feature points are manually selected from a training picture and the current feature points are identified in real time by the eye-in-hand camera (see Figure 3.2).

Figure 3.2: Feature Points Identified by Eye-in-Hand Camera. (a) Training Picture. (b) Current Picture.

The interaction matrix relates the velocity of the pixel coordinates in the image plane to the velocity of the camera in the camera frame (i.e., eye-in-hand camera). For this reason, the interaction matrix is also known in literature as the feature Jacobian. For the $i$th feature point we have:

$$\dot{s}_i(t) = L_i(u_i, v_i, z_i) \, v^C \qquad (3.4)$$

$$L_i(u_i, v_i, z_i) = \begin{bmatrix} -\dfrac{F}{z_i(t)} & 0 & \dfrac{u_i}{z_i(t)} & \dfrac{u_i v_i}{F} & -\dfrac{F^2 + u_i^2}{F} & v_i \\ 0 & -\dfrac{F}{z_i(t)} & \dfrac{v_i}{z_i(t)} & \dfrac{F^2 + v_i^2}{F} & -\dfrac{u_i v_i}{F} & -u_i \end{bmatrix} \qquad (3.5)$$

Here:
$F$ = focal length (obtained from manufacturer specifications)
$z_i(t)$ = depth w.r.t. camera frame

The size of the interaction matrix should be constructed so that its inverse can be easily computed. For $m$ DOF of camera movement, with $k = 2n$ feature coordinates matching to $n$ feature points, the interaction matrix can be stacked by rows to create a composite interaction matrix $L(u, v, z)$ as:

$$L(u, v, z) = \begin{bmatrix} L_1(u_1, v_1, z_1) \\ \vdots \\ L_n(u_n, v_n, z_n) \end{bmatrix} = \begin{bmatrix} -\dfrac{F}{z_1(t)} & 0 & \dfrac{u_1}{z_1(t)} & \dfrac{u_1 v_1}{F} & -\dfrac{F^2 + u_1^2}{F} & v_1 \\ 0 & -\dfrac{F}{z_1(t)} & \dfrac{v_1}{z_1(t)} & \dfrac{F^2 + v_1^2}{F} & -\dfrac{u_1 v_1}{F} & -u_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -\dfrac{F}{z_n(t)} & 0 & \dfrac{u_n}{z_n(t)} & \dfrac{u_n v_n}{F} & -\dfrac{F^2 + u_n^2}{F} & v_n \\ 0 & -\dfrac{F}{z_n(t)} & \dfrac{v_n}{z_n(t)} & \dfrac{F^2 + v_n^2}{F} & -\dfrac{u_n v_n}{F} & -u_n \end{bmatrix} \qquad (3.6)$$

with $L(u, v, z) \in \mathbb{R}^{2n \times 6}$.

3.2 Coordinate Transformations

3.2.1 Web Camera Frame to Base Frame

Since the reference frame for the Harmonic Arm is its base frame, as illustrated in Figure 3.3, the velocity vector, which is in the eye-in-hand camera frame $C$, must be transformed into the base frame $B$.

Figure 3.3: Major Coordinate Frames on AIMM.

First, the camera frame $C$ needs to be converted into the tool (end-effector) frame $T$ using the following relation:

$$v^T = \begin{bmatrix} R_C^T & S(T_C^T) R_C^T \\ 0_{3\times3} & R_C^T \end{bmatrix} v^C \qquad (3.7)$$

The rotation matrix $R_C^T$ must be obtained indirectly because the origin of the camera frame cannot be physically measured. Thus, the following method is employed. First form the relations:

$$P^{MC} = R_C^{MC} \cdot P^C + T_C^{MC} \qquad (3.8)$$

$$P^W = R_{MC}^W \cdot P^{MC} + T_{MC}^W \qquad (3.9)$$

$$P^B = R_W^B \cdot P^W + T_W^B \qquad (3.10)$$

$$P^T = R_B^T \cdot P^B + T_B^T \qquad (3.11)$$

Substitute Eq. 3.8 through Eq. 3.11 sequentially into each other to obtain:

$$P^T = \left( R_B^T R_W^B R_{MC}^W R_C^{MC} \right) P^C + \left( R_B^T R_W^B R_{MC}^W \right) T_C^{MC} + \left( R_B^T R_W^B \right) T_{MC}^W + R_B^T T_W^B + T_B^T \qquad (3.12)$$

Therefore, the following becomes $R_C^T$:

$$R_C^T = R_B^T \, R_W^B \, R_{MC}^W \, R_C^{MC} \qquad (3.13)$$

$R_B^T$ can be manually measured by a known kinematic pose (see Figure 2.3):

[Eq. 3.14 ($R_B^T$): the matrix is not legible in the source text; its surviving entries are cosine terms of measured angles.]

$R_W^B$ can be obtained by inspection as:

$$R_W^B = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \qquad (3.15)$$

$R_{MC}^W$ can be obtained from the previous calibration of the web camera (Eq. 2.11):

$$R_{MC}^W = \left( R_W^{MC} \right)^T \qquad (3.16)$$

$R_C^{MC}$ can also be obtained from the previous calibration of the web camera (Eq. 2.12):

$$R_C^{MC} = \left( R_{MC}^C \right)^T \qquad (3.17)$$

Finally, the tool frame needs to be transformed into the base frame using:

$$v^B = \begin{bmatrix} R_T^B & 0_{3\times3} \\ 0_{3\times3} & R_T^B \end{bmatrix} v^T \qquad (3.18)$$

Note that $R_T^B$ can be obtained from the Harmonic Arm kinematic model given by Eq. 2.1.

3.2.2 Stereo Camera Frame to Web Camera Frame

From Eq. 3.6, it is evident that the depth parameter $z(t)$ is represented in the eye-in-hand camera frame.
In order to use the depth information obtained by the stereo camera, a vector in the stereo camera frame $P^S$ must be transformed into the eye-in-hand camera frame $P^C$. This relationship can be determined by taking advantage of the known kinematic relationships of the mobile manipulator:

$$P^W = R_S^W \cdot P^S + T_S^W \qquad (3.19)$$

Here: $R_S^W = \left( R_W^S \right)^T$ and $T_S^W = -\left( R_W^S \right)^T T_W^S$.

$$P^B = R_W^B \cdot P^W + T_W^B \qquad (3.20)$$

$$P^T = R_B^T \cdot P^B + T_B^T \qquad (3.21)$$

$$P^C = R_T^C \cdot P^T + T_T^C \qquad (3.22)$$

Substitute Eq. 3.19 through Eq. 3.22 sequentially into each other to obtain:

$$X^C = R_S^C \cdot X^S + T_S^C \qquad (3.23)$$

Every element of this relation except $R_B^T$ has been solved explicitly in the previous sections. However, from the kinematic model of the manipulator, $A_T^B$ can be transformed into $A_B^T$ by inverting each of the homogeneous transformation matrices that make up $A_T^B$ as:

$$\left( A_i^{i-1} \right)^{-1} = \begin{bmatrix} R_i^{i-1} & T_i^{i-1} \\ 0_{1\times3} & 1 \end{bmatrix}^{-1} = \begin{bmatrix} \left( R_i^{i-1} \right)^T & -\left( R_i^{i-1} \right)^T T_i^{i-1} \\ 0_{1\times3} & 1 \end{bmatrix} \qquad (3.24)$$

[Eqs. 3.25 and 3.26 are not legible in the source text.]

This transformation is done at every joint motion of the robot, where the depth observed by the stereo camera dynamically changes w.r.t. the eye-in-hand camera frame.

3.3 Computer Vision

It is evident that the performance of a visual servoing scheme directly depends on the performance of the computer vision software. The goal of computer vision is to quickly provide accurate information so that the control scheme can effectively regulate the error signal asymptotically to zero. This topic is a research project on its own, and the present thesis focuses on using available computer vision software in order to achieve a functional mobile manipulator. To simplify the task for computer vision, the target object of the mobile manipulator is marked with unique image features as illustrated in Figure 3.4. The black dots are used to create the feature vector and the color red is used for depth estimation by the stereo camera. The target object is constructed with lightweight foam material in order to accommodate the Harmonic Arm’s payload limit.
A pair of customized handles accommodates the two-finger Universal Gripper.

Figure 3.4: Target Object with Unique Visual Features Marked on It.

3.3.1 Feature Tracking

During the task of servoing, the eye-in-hand camera must obtain visual feedback continuously in order to prevent failure of the control scheme. A commercially available software package called Matrox® Imaging Library (MIL), which employs the template matching technique, is used to extract image feature data from the target object.

Figure 3.5: Image Feature Data from MIL.

As illustrated in Figure 3.5, MIL is able to match the current visual features inside the solid-border box with the trained visual features inside the dashed-border box, and to provide the coordinate deviations of individual image features and the angular discrepancy between the two templates. The coordinate information is used to create the feature vector as described in Eq. 3.3, during each cycle of the control loop.

3.3.2 Depth Estimation via Color Tracking

Similar to how human eyes perceive the world, the stereo camera uses the differences between two images of the same scene to generate depth information. Specifically, each pixel of the two images is compared for correspondence, and the distance that separates the same pixels between the images, called disparity, is used to calculate depth.

Figure 3.6: Rectified Epipolar Geometry.

To illustrate, consider a simplified stereo rig with a known baseline dimension (see Figure 3.6). After rectification of the two image planes, the relative distance between the physical points $M_1$ and $M_2$ into the page can be estimated by the relative pixel disparity between the image points $m_1$ and $m_2$ with respect to the origins of each image plane (i.e., $O_L$ and $O_R$). A disparity map (see Figure 3.7) can be generated after the corresponding pixel in each image has been processed.

Figure 3.7: Disparity Map.
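For a rectified stereo pair, the usual textbook relation between disparity and depth is Z = f·B/d, with f the focal length in pixels, B the baseline and d the disparity in pixels. The Python sketch below illustrates that relation only; it is an assumption that this form applies here, since the internal computation of the stereo library is not described in this thesis, and the function name and numbers are illustrative.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Depth (mm) of a point seen by a rectified stereo pair.

    disparity_px : horizontal pixel offset of the point between both images.
    focal_px     : focal length divided by pixel size (unitless, in pixels).
    baseline_mm  : distance between the two optical centers.
    Returns None when no valid disparity is available.
    """
    if disparity_px is None or disparity_px <= 0:
        return None  # no correspondence, so depth is undefined
    return focal_px * baseline_mm / disparity_px


# Illustrative values: 1000-pixel focal ratio, 120 mm baseline.
for d in (200.0, 100.0, 50.0, 0.0):
    print(d, depth_from_disparity(d, 1000.0, 120.0))
```

Depth is inversely proportional to disparity, so a one-pixel matching error costs more accuracy at long range than at short range.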
When the pixel correspondence between images cannot be established, values in the disparity map become null (black regions). The disparity map, which contains an overwhelming amount of information, must be further processed to capture useful portions of the image. An alternative feature tracking technique to template matching, which was described in Section 3.3.1, is color filtering. Through training, color tracking software is able to identify the trained color (i.e., color blob) in any scene and generate metrics such as blob area and its center of gravity in image coordinates.

Figure 3.8: ACTS Color Tracking.

Using the ACTS™ software from MobileRobots Inc., the red marker in the original image (see Figure 3.8a) can be identified in Figure 3.8b, and the corresponding region can be isolated in the disparity map by the center of gravity coordinates and the color blob area. Finally, the average depth value in this region of the disparity map will be converted into a vector in the stereo camera frame, and transformed into the eye-in-hand camera frame as described in Section 3.2.2.

3.4 Rule Base

As stated earlier in this chapter, the gain matrix is applied to the IBVS controller in order to individually influence the DOFs of the eye-in-hand camera:

$$K_{6\times6} = \begin{bmatrix} K_{v_x} & & 0 \\ & \ddots & \\ 0 & & K_{\omega_z} \end{bmatrix} \qquad (3.27)$$

This gain matrix replaces the gain constant in Eq. 3.1. In addition to tracking the pixel coordinates of the individual features on the visual pattern, the MIL software is also able to compare the trained image with the current one and provide the deviations in scale, rotation and translation. These three deviations, or errors, can be decoupled and ordered such that each error has more influence in achieving the desired image than the next one. In Figure 3.9, these three types of errors are ranked in a “Hierarchy of Influence” pyramid where the top of the pyramid (i.e., error in scale) has the most influence.
In other words, the error in scale should be the first error to be resolved because otherwise the coordinates of the visual features in the current image will never match those of the trained image when the servoing task is completed. A simple rule-based decision mechanism is used, as given in Table 3.1, to determine which type of error affects which element in the gain matrix.

Figure 3.9: Hierarchy of Influence (pyramid levels from top to bottom: error in scale, error in rotation, error in translation).

Table 3.1: Rule Base.

[Table 3.1 is not legible in the source text; it lists If-Then rules that compare each type of error with its minimum and maximum tolerance.]

In Figure 3.10, the eye-in-hand camera frame is positioned relative to the image plane where the visual features are captured. According to this setup, a finite set of if-then statements can be established for each type of error between its minimum and maximum tolerances.

Figure 3.10: Camera Frame Relative to Image Plane.

3.5 Experimental Results

From the definition of a composite interaction matrix (Eq. 3.6), it is evident that the number of feature points influences the method by which the inverse of the interaction matrix is calculated. The present section will investigate the experimental results from the two cases described in Table 3.2.

Table 3.2: Feature Point Case Study.

Case 1: 3 Feature Points
• Dimensions of the interaction matrix (i.e., 6 feature coordinates) match the 6 DOFs of the eye-in-hand camera (i.e., $k = m$).
• The inverse of the interaction matrix can be calculated directly as $L^{-1}$.

Case 2: 6 Feature Points
• Dimensions of the interaction matrix (i.e., 12 feature coordinates) are greater than the 6 DOFs of the eye-in-hand camera (i.e., $k > m$).
• The inverse of the interaction matrix can be calculated from the least squares solution (pseudo-inverse): $L^{+} = (L^T L)^{-1} L^T$, where the control law now becomes: $v^C = -K L^{+} e(t)$.

In order to obtain experimental results that are suitable for comparing the two cases, one training image is used, but the desired feature coordinates are different in the two cases (see Figures 3.11 and 3.12).

Figure 3.11: Training Image for Case 1.

Figure 3.12: Training Image for Case 2.

The start pose of the manipulator was kept the same for both cases and the mobile base was always kept stationary. Since the mobile base remained stationary during the entire task of servoing, it was only necessary to perform the depth estimation once using the stereo camera. Raw estimates w.r.t. the stereo camera frame and the transformed estimates w.r.t. the eye-in-hand camera frame were both confirmed by manual measurements. The maximum distance between the target object and the stereo camera was between 85 cm and 90 cm; thus, the accuracy of the depth estimation behaves according to the trend illustrated in Figure 3.13.

Figure 3.13: Depth Estimation Accuracy of BumbleBee2 Stereo Camera (short-range Z accuracy plotted against range in m). Reproduced with Permission from Point Grey Research.

The plots in Figure 3.14 compare the pixel error against the number of loops the controller had to iterate before the current pixel coordinates matched the desired pixel coordinates within a 10-pixel tolerance. Note that the x and y coordinates in the plot correspond to the x and y axes illustrated in Figure 3.10. Upon inspection of the two plots in Figure 3.14, it is evident that case 1 required more iterations than case 2.
Notably, the average errors with which the two cases terminated are 7 and 4 pixels for cases 1 and 2, respectively.

Figure 3.14: Pixel Error Results (pixel error against number of loops; legend entries m0x, m0y, m1x, m1y, m2x, m2y, m3x, m3y, m4x, m4y, m5x, m5y). (a) Case 1: 3 Feature Points. (b) Case 2: 6 Feature Points.

In Figures 3.15 and 3.16, the output velocity vector w.r.t. the base frame for each case is plotted against the number of iteration loops. Qualitatively, the translational and the rotational velocity profiles for case 2 are more gradual than those for case 1. The same characteristic is demonstrated in the pixel error plot for case 2 as well (see Figure 3.14b). From Figure 3.16, it is evident that the dramatic fluctuation of the rotational velocity in the y direction caused the same fluctuation in pixel error in Figure 3.14a, which ultimately prolonged the servoing task.

Figure 3.15: Translational Velocity Results (translational velocity w.r.t. base coordinate against number of loops). (a) Case 1: 3 Feature Points. (b) Case 2: 6 Feature Points.

During each iteration of the control loop, the condition number of the interaction matrix is calculated and plotted as given in Figure 3.17. As a characteristic number of a matrix, the condition number determines the suitability of a matrix for inversion. Notably, a low condition number indicates a well-conditioned matrix while a high condition number indicates an ill-conditioned matrix. In the context of Eq.
3.4 (definition of the interaction matrix), one can interpret the condition number as the sensitivity of the camera velocity to the rate of change of the visual features. As illustrated by the scale of the vertical axes in Figure 3.17, the average condition number for case 1 is significantly higher than that of case 2. The condition number at the 6th iteration loop in case 1 rose even higher, perhaps due to lighting disturbances on the computer vision software, and it influenced the camera velocities with a similar effect in Figures 3.15a and 3.16a.

[Figure 3.16: Rotational Velocity Results. Rotational velocity with respect to the base coordinate frame versus number of loops. (a) Case 1: 3 Feature Points. (b) Case 2: 6 Feature Points.]

[Figure 3.17: Condition Number Results. Condition number of the interaction matrix versus number of loops. (a) Case 1: 3 Feature Points. (b) Case 2: 6 Feature Points.]

Satisfying the motor limit constraints of the manipulator during servoing is also a topic of great interest in this field. While we have not actively prevented a joint from approaching its limit, the results from both cases in Figure 3.18 demonstrate that this issue is hardly a concern in the current experimental investigation.

[Figure 3.18: Motor joint results versus number of loops. (a) Case 1: 3 Feature Points. (b) Case 2: 6 Feature Points.]

Chapter 4 Grasping

An artificial neural network (ANN or NN) with feedforward architecture using the backpropagation learning algorithm is employed to fuse multiple infrared proximity sensors and determine the positioning status (i.e., correct or incorrect) of the Universal Gripper with respect to an unknown object, as described in Section 2.4.2. Three training functions using the backpropagation learning algorithm are explored in order to improve the speed and accuracy of the network: gradient descent, gradient descent with momentum, and gradient descent with momentum and variable learning rate.

4.1 Neural Network Architecture

A multilayer perceptron under the class of feedforward topology is used in the present work. In particular, the architecture consists of an input layer with nine elements, an output layer with one element, and a hidden layer with a variable number of nodes (see Figure 4.1).
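The architecture described above can be illustrated with a minimal forward-pass sketch. It is not the implementation used in this work: the choice of five hidden nodes, the sigmoid activation, the scaling of the sensor percentages to the range 0 to 1, the random weight range, and the sample readings are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic activation (assumed here for every node)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """One weight row per neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs):
    """Weighted sum plus bias, passed through the sigmoid, for each neuron."""
    outputs = []
    for row in weights:
        s = row[-1] + sum(w * x for w, x in zip(row[:-1], inputs))
        outputs.append(sigmoid(s))
    return outputs


def forward(hidden_w, output_w, ir_readings):
    """Map nine IR readings (percent) to a single status value in (0, 1)."""
    x = [r / 100.0 for r in ir_readings]  # assumed input scaling
    hidden = layer_forward(hidden_w, x)
    return layer_forward(output_w, hidden)[0]


rng = random.Random(0)
hidden_w = init_layer(9, 5, rng)   # 9 inputs -> 5 hidden nodes (assumed)
output_w = init_layer(5, 1, rng)   # 5 hidden nodes -> 1 output
# Hypothetical sensor readings, one value per IR sensor.
status = forward(hidden_w, output_w, [21, 56, 8, 83, 77, 3, 6, 73, 5])
print(status)
```

With untrained random weights the output carries no meaning; after training, a value near 1 would be read as a correct grasping posture and a value near 0 as an incorrect one.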
[Figure 4.1: Feedforward Multilayer Perceptron Architecture. Inputs IR1, IR2, ..., IR9 enter the input layer, which is followed by the hidden layer and an output layer with a single output.]

With an optimal number of hidden nodes and with sigmoid activation functions for each node, the network can approximate any continuous mapping. The network learning algorithm is the weight-updating mechanism, which allows for learning during the training process. Since input values and desired output values can be known a priori, the supervised learning mechanism is particularly appropriate. Specifically, the backpropagation algorithm using gradient descent, gradient descent with momentum [21] and gradient descent with variable learning rate [22] are implemented to map a given input vector to a corresponding output value. Of the three variations, the unmodified gradient descent method is the simplest way of updating the network weights in the direction in which the performance function decreases. This optimization is carried out off-line (i.e., in batch mode), where the weights are updated after the entire training set has been applied to the network. The vector of weight changes for layer $l$ is expressed as

$$\Delta w^{(l)} = -\eta \frac{\partial E_c}{\partial w^{(l)}} \qquad (4.1)$$

Here $E_c$ is the cumulative error over the training set and $\eta$ is the learning rate, which scales the step taken along the negative gradient of the performance function. The optimal learning rate achieves a balance between the speed and the stability of learning: if the learning rate is small, the convergence rate becomes slow; if the learning rate is large, oscillations may occur in the weight space and the algorithm may never reach the minimum error. For this reason, a momentum term can be added in order to prevent stagnation at a local minimum. The modified vector of weight changes is expressed as

$$\Delta w^{(l)}(t+1) = -\eta \frac{\partial E_c}{\partial w^{(l)}} + \gamma\, \Delta w^{(l)}(t) \qquad (4.2)$$

Here, the momentum parameter $\gamma$ is introduced in order to allow trends of weight changes in iteration $t$ to affect the weight changes in iteration $t+1$.
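The two update rules in Eqs. 4.1 and 4.2 can be sketched in a few lines. This is an illustrative sketch only: the quadratic performance function, its gradient, and the starting weights are hypothetical stand-ins for the network error, and the values of the learning rate and momentum are simply taken from the range explored later in this chapter.

```python
def momentum_step(weights, grads, prev_delta, eta=0.3, gamma=0.2):
    """One batch update: delta(t+1) = -eta * grad + gamma * delta(t).

    Setting gamma to 0 recovers the unmodified gradient descent update.
    """
    delta = [-eta * g + gamma * d for g, d in zip(grads, prev_delta)]
    new_weights = [w + d for w, d in zip(weights, delta)]
    return new_weights, delta


def grad(w):
    """Gradient of the toy function E(w) = 0.5 * (w0**2 + 4 * w1**2)."""
    return [w[0], 4.0 * w[1]]


w = [1.0, 1.0]          # hypothetical initial weights
delta = [0.0, 0.0]      # no previous weight change at the first iteration
for _ in range(50):
    w, delta = momentum_step(w, grad(w), delta, eta=0.3, gamma=0.2)
print(w)
```

The previous weight change is carried from one iteration to the next, which is what lets a consistent trend in the gradient accumulate across iterations.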
As an additional improvement, the learning rate can be varied according to the behavior of the error with respect to time. The vector of weight changes with variable learning rate and momentum is expressed as

$$\Delta w^{(l)}(t+1) = -\lambda \eta \frac{\partial E_c}{\partial w^{(l)}} + \gamma\, \Delta w^{(l)}(t) \qquad (4.3)$$

$$\lambda = \begin{cases} \beta, & R < \dfrac{E(t+1)}{E(t)} \\[2mm] \delta, & R > \dfrac{E(t+1)}{E(t)} \end{cases} \qquad (4.4)$$

Similar to the momentum term, the variable learning rate parameter $\lambda$ adjusts the magnitude of the learning rate based on the ratio of the error at iteration $t+1$ to the error at iteration $t$. If the ratio is greater than a predefined threshold $R$, the learning rate is modified by the value $\beta$. Otherwise, the learning rate is modified by the value $\delta$. Ultimately, the backpropagation algorithm updates the network weights among all the neurons such that the cumulative error $E_c$ is minimized:

$$\min E_c = \min \frac{1}{2} \sum_{k=1}^{n} \sum_{i=1}^{q} \left[ t_i(k) - o_i(k) \right]^2 \qquad (4.5)$$

The squared error (Euclidean norm) between the target output $t_i$ and the actual output $o_i$ is evaluated for each training pattern $k$. Index $i$ represents the $i$-th neuron of the output layer, and $n$ and $q$ represent the number of training patterns and output neurons, respectively. Figure 4.2 summarizes the backpropagation learning algorithm. Bias is an additional element in the input layer for the purpose of solving multi-dimensional pattern recognition problems.

1. Initialize weights to random values and set bias.
2. Repeat for each training pattern in the data set:
   a. Calculate the network output value(s).
   b. Calculate the error with respect to the target output values.
   c. Calculate all the weight changes from the hidden layer to the output layer.
   d. Calculate all the weight changes from the input layer to the hidden layer.
   e. Update all weights.
3. Repeat step 2 until the cumulative error satisfies the stopping criterion.

Figure 4.2: Backpropagation Learning Algorithm.

4.2 Simulation Data

The backpropagation learning algorithm with the three training functions mentioned in Section 4.1 is implemented using the MATLAB Neural Network Toolbox.
Distance measurements of the infrared proximity sensor are expressed as a percentage of collector current at the infrared phototransistor (see Figure 2.7). The input vector consists of 9 elements corresponding to the 9 infrared proximity sensors available on the gripper fingers. Data used for simulation is created by a random number function, but actual data is used for validation. Table 4.1 provides a sample of the training data set. Values in the range 10-100 (shaded cells) indicate a detected object, and those between 0-10 indicate otherwise. This forms the input vectors. The status indicates whether a given set of infrared sensor data corresponds to a correct (1) or an incorrect (0) grasping posture. This is the target output value.

Table 4.1: Input and Output Values for Different Grasping Scenarios.

Grasping scenario    1    2    3    4    5    6    7    8    9   10
IR1                 21   83    8    6    9   10    6    2    5    8
IR2                 56   60    7    9   55   71    3    3    4    2
IR4                 83   10   65   74   65    3    2    5    2    9
[Rows IR3, IR5 to IR9 and Status are not legible in the source.]

4.3 Training

The training set consists of 100 training patterns (i.e., column vectors in Table 4.1) and the training cycle is set to 100 epochs. The data set is randomly segmented such that 60% of the samples are assigned to the training set, 20% to the validation set and 20% to the test set. Training continues on the training vectors as long as the cumulative error on the validation vectors decreases. This technique prevents the network from memorizing the training data and allows it to generalize to new input data. After the training is done and the network has been validated, the test set (i.e., the last 20%) is used as an independent test of the ability of the network to generalize.

4.4 Results

Simulation results are presented here as plots showing the decay of mean squared error, which is the performance parameter.
Each plot has the error on its vertical axis in logarithmic scale and the epoch on its horizontal axis in integer scale. The stopping criterion is set as 0.01 (i.e., $10^{-2}$). Network performance is indicated by the rate of convergence of the error towards the goal value. The ability of the NN to generalize is evident from the behavior of the test curve. The closer the test curve approaches the goal, the better the ability of the network to generalize. As mentioned before, the number of neurons in the hidden layer plays an important role in the network performance. Even with the aforementioned segmentation technique, a non-optimal number of neurons in the hidden layer can adversely affect the network. From the results of unmodified gradient descent training (see Figures 4.3, 4.4 and 4.5), it is clear that with the learning rate held constant (L = 0.3), the optimal number of hidden neurons is 5 (N = 5).

[Figure 4.3: Gradient Descent Training (N = 4 and L = 0.3). Plot title: Batch Gradient Descent Method. Curves: Train, Validation, Test, Best, Goal.]

[Figure 4.4: Gradient Descent Training (N = 5 and L = 0.3).]

[Figure 4.5: Gradient Descent Training (N = 6 and L = 0.3).]

In demonstrating the effect of momentum on the network's learning ability, the comparison of Figures 4.6 and 4.7 indicates that an overly high momentum value can be detrimental to training and generalization. Finally, the effect of gradient descent with momentum and variable learning rate on the network performance is illustrated by Figures 4.8 and 4.9. Increment and decrement modifiers for the learning rate are investigated for a constant threshold ratio of errors. It is clear from Figure 4.8 that the network performance is sensitive to large changes in learning rate. The rise in error suggests that for significantly high increment and decrement modifiers, the network begins to overfit the data instead of generalizing.
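The mechanism behind the increment and decrement modifiers can be sketched as follows. This is a hedged illustration, not the toolbox routine used in this work: it follows the common convention in which the learning rate is reduced and the step rejected when the error grows by more than the threshold ratio, and increased when the error falls. The function name and the default modifier values are hypothetical, and the exact roles of the two modifiers in Eq. 4.4 are not reproduced here.

```python
def adapt_learning_rate(eta, err_new, err_old, ratio_threshold=1.2,
                        decrease=0.7, increase=1.05):
    """Return (new_eta, accept_step) for one training iteration.

    eta: current learning rate.
    err_new, err_old: error at iterations t+1 and t.
    ratio_threshold: threshold on err_new / err_old (R in the text).
    decrease, increase: assumed multiplicative modifiers.
    """
    if err_old > 0 and err_new / err_old > ratio_threshold:
        # Error grew too much: shrink the learning rate, reject the step.
        return eta * decrease, False
    if err_new < err_old:
        # Error fell: grow the learning rate, keep the step.
        return eta * increase, True
    # Error rose slightly but stayed under the threshold: keep everything.
    return eta, True


eta = 0.3
eta, accepted = adapt_learning_rate(eta, err_new=1.5, err_old=1.0)
print(eta, accepted)
```

Large modifiers make the learning rate swing widely from one iteration to the next, which is one way to read the sensitivity noted above.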
[Figure 4.6: Gradient Descent with Momentum (N = 4, L = 0.3, M = 0.2).]

[Figure 4.7: Gradient Descent with Momentum (N = 4, L = 0.3, M = 0.4). Plot title: Batch Gradient Descent with Momentum. Curves: Train, Validation.]

[Figure 4.8: Gradient Descent with Momentum and Variable Learning Rate (N = 4, L = 0.3, M = 0.4, R = 1.2, β = 1.7, δ = 0.3). Plot title: Batch Gradient Descent with Momentum and Variable Learning Rate.]

[Figure 4.9: Gradient Descent with Momentum and Variable Learning Rate (N = 4, L = 0.3, M = 0.4, R = 1.2, β = 1.07, δ = 0.1).]

Chapter 5 Conclusions

5.1 Summary and Contributions

In this thesis, a novel control scheme based on classical image-based visual servoing was developed along with a functional mobile manipulator platform. With monocular and stereo vision cameras, a hybrid camera configuration was integrated into the system. This improved the classical eye-in-hand configuration by providing depth information on-line in an unstructured environment. As the monocular camera tracks the visual features of a target object in 2D, the stereo camera accurately estimates the distance of the target object relative to the mobile manipulator platform and then transforms this distance into the eye-in-hand camera frame using the known kinematics of the platform. Deviations relative to the trained image of the target object are fed into a rule base which adjusts a gain matrix in order to enhance the performance of the controller. Since the controller output is a velocity vector for the eye-in-hand camera, a coordinate frame transformation and an integration step with constant sampling time have been incorporated to allow the position-controlled manipulator to move relative to its base frame.
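The frame transformation and integration step mentioned above can be illustrated for the translational part of the velocity. This is a minimal sketch under stated assumptions, not the controller implemented on the platform: the rotation matrix from the camera frame to the base frame is taken as known from the kinematics, only the linear velocity is treated, and the function names and numerical values are hypothetical.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def integrate_position(p_base, R_base_cam, v_cam, dt):
    """Rotate the camera-frame velocity into the base frame, then apply
    one forward-Euler step with constant sampling time dt."""
    v_base = mat_vec(R_base_cam, v_cam)
    return [p + v * dt for p, v in zip(p_base, v_base)]


# Hypothetical values: camera frame rotated 90 degrees about the base z axis.
R_base_cam = [[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]]
p = [0.30, 0.00, 0.20]      # current end-effector position in the base frame (m)
v_cam = [0.10, 0.00, 0.00]  # commanded linear velocity in the camera frame (m/s)
p_next = integrate_position(p, R_base_cam, v_cam, dt=0.5)
print(p_next)
```

The resulting position would then be sent as the next set point to the position-controlled manipulator; the rotational part of the velocity vector would need a corresponding orientation update, which is omitted here.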
An experimental investigation has been carried out using the developed system. In particular, two cases concerning the number of visual features were studied: 3 visual features (case 1) versus 6 visual features (case 2). The case with 3 visual features showed a significant disadvantage in a servoing task, specifically in the speed of servoing and the average pixel error upon completion of servoing. The interaction matrix in case 1 was also shown to have a higher condition number at each iteration than in case 2. This directly caused erratic fluctuations in the velocity profiles of case 1, proving that a minimally constrained interaction matrix (i.e., with 3 visual features) is not an appropriate pre-condition for image-based visual servoing.

In addition, a neural network approach to multi-sensor grasping of unknown objects in an unstructured environment was explored using simulation experiments. Specifically, neural networks with feedforward architecture using the backpropagation learning algorithm were employed to combine multiple infrared proximity sensors and determine the positioning of the manipulator gripper relative to the target object. Three variations of the backpropagation algorithm were investigated: gradient descent, gradient descent with momentum, and gradient descent with momentum and variable learning rate. Both the unmodified gradient descent and the gradient descent with momentum methods demonstrated slow convergence. Gradient descent with momentum and variable learning rate showed relatively better convergence speed.

The main contributions of this work are as follows:
1. A new control technique based on the classical scheme of image-based visual servoing, along with a novel camera configuration, was developed.
2. A functioning mobile manipulator platform was developed with specific focus on emergency response and rescue applications.
3. The ability of the developed mobile manipulator system to perform object manipulation tasks in unstructured environments while cooperating with other heterogeneous robots was studied and validated using physical experiments and simulation.

5.2 Limitations and Future Directions

For operational reasons (i.e., weight, cost and size), a position-controlled robot manipulator (Harmonic Arm) was selected for mounting on the mobile manipulator platform. Position control conflicts with the requirements of the visual servo controller, whose output is a velocity. Although integration was performed on the output velocity vector to resolve this issue, only a velocity-controlled manipulator can accurately capture the true essence of a visual servo controller. As the manipulation task and the environment which contains the target object become more complex, kinematic singularities of the manipulator will present a major challenge for a fully capable mobile manipulator. For this reason, future work focusing on model predictive control of the entire mobile manipulator unit should greatly enhance the functionality of its workspace and prevent servo failures while in operation.

Another crippling issue of image-based visual servoing is the possibility of the visual features leaving the camera field of view. Although the current hybrid camera configuration has increased the field of view beyond that of the monocular eye-in-hand camera, AIMM's field of view remains limited because the stereo camera is mounted statically on a ball socket tripod head. An actuated pan-tilt head should greatly expand the field of view of the stereo camera and allow AIMM to visually navigate in unstructured environments. As mentioned before, the Harmonic Arm is a manipulator with 5 DOFs, and its end-effector does not have a spherical joint. This imposes a limitation in the yaw direction (see Figure 5.1).
This cannot be compensated for by the mobile robot and it conflicts with the 6 DOF requirements of a visual servo controller.

[Figure 5.1: End-Effector Roll, Pitch and Yaw Axes.]

Currently, the visual feedback is performed by MIL template matching and the ACTS color tracking software. This will be problematic as the mobile manipulator explores more unstructured and unknown environments. The open source computer vision software package (OpenCV) has been proven to be powerful and reliable. Another useful direction for future work would be to implement techniques from OpenCV on the AIMM vision systems. As future work in grasping, generalization may be improved for all variations of the backpropagation algorithm. This may be done by employing different methods of data segmentation (e.g., block or interleaved segmentation). The speed of convergence may be increased by using more advanced training methods such as conjugate gradient and quasi-Newton algorithms.

BIBLIOGRAPHY

[1] Y. Shirai and H. Inoue, “Guiding a Robot by Visual Feedback in Assembling Tasks,” Pattern Recognition, vol. 5, pp. 99-108, 1973.
[2] M. J. Micire, Analysis of the Robotic-Assisted Search and Rescue Response to the World Trade Center Disaster, M.S. Thesis, University of South Florida, 2002.
[3] M. J. Micire, “Evolution and Field Performance of a Rescue Robot,” J. Field Robotics, vol. 25, no. 1-2, pp. 17-30, January/February 2008.
[4] G. Zhang, Y. Wang and C.W. de Silva, “Multi-Sensor Gripper Positioning in Unstructured Urban Environments Using Neural Networks,” IEEE International Conference on Automation and Logistics ICAL, 2008, pp. 1474-1479, Sept. 2008.
[5] F. Chaumette and S. Hutchinson, “Visual Servo Control. Part I: Basic Approaches,” IEEE Robotics and Automation Magazine, vol. 13, no. 4, pp. 82-90, Dec. 2006.
[6] E. Malis and F. Chaumette, “2-1/2D Visual Servoing with respect to Unknown Objects Through a New Estimation Scheme of Camera Displacement,” International Journal of Computer Vision, vol. 37, no. 1, pp. 79-97, June 2000.
[7] F. Chaumette and S. Hutchinson, “Visual Servo Control. Part II: Advanced Approaches,” IEEE Robotics and Automation Magazine, vol. 14, no. 1, pp. 109-118, March 2007.
[8] A. De Luca, G. Oriolo and P.R. Giordano, “Feature Depth Observation for Image Based Visual Servoing: Theory and Experiments,” International Journal of Robotics Research, vol. 27, no. 10, pp. 1093-1116, Oct. 2008.
[9] E. Cervera, A. P. del Pobil, F. Berry and P. Martinet, “Improving Image-Based Visual Servoing with Three-Dimensional Features,” International Journal of Robotics Research, vol. 22, no. 10-11, pp. 821-839, Oct./Nov. 2003.
[10] D. Kragic, M. Björkman, H.I. Christensen and J.O. Eklundh, “Vision for Robotic Object Manipulation in Domestic Settings,” Robotics and Autonomous Systems, vol. 52, no. 1, pp. 85-100, July 2005.
[11] D.P. Perrin, B. Kadioglu, S.A. Stoeter and N. Papanikolopoulos, “Grasping and Tracking Using Constant Curvature Dynamic Contours,” International Journal of Robotics Research, vol. 22, no. 10-11, pp. 855-871, Oct./Nov. 2003.
[12] A. Saxena, J. Driemeyer and A. Y. Ng, “Robotic Grasping of Novel Objects using Vision,” International Journal of Robotics Research, vol. 27, no. 2, pp. 157-173, February 2008.
[13] C. C. Kemp, C. Anderson, H. Nguyen and A. Trevor, “A Point-and-Click Interface for the Real World: Laser Designation of Objects for Mobile Manipulation,” Proceedings of the ACM/IEEE International Conference on Human Robot Interaction, 2008.
[14] N. Rezzoug and P. Gorce, “A Multistage Neural Network Architecture to Learn Hand Grasping Posture,” IEEE International Conference on Intelligent Robots and Systems, vol. 2, pp. 1705-1710, 2002.
[15] J. M. Vilaplana and J. L. Coronado, “A Neural Network Model for Coordination of Hand Gesture During Reach to Grasp,” Neural Networks, vol. 19, no. 1, pp. 12-30, January 2006.
[16] Y.S. Choi, C.D. Anderson, J.D. Glass and C.C. Kemp, “Laser Pointers and a Touch Screen: Intuitive Interfaces for Autonomous Mobile Manipulation for the Motor Impaired,” ASSETS'08: The 10th International ACM SIGACCESS Conference on Computers and Accessibility, pp. 225-232, Oct. 2008.
[17] Pioneer 3 User's Manual, Mobile Robots Inc., ver. 5, July 2007.
[18] Y. Wang, Cooperative and Intelligent Control of Multi-Robot System Using Machine Learning, Ph.D. dissertation, The University of British Columbia, 2007.
[19] Harmonic Arm User's Manual, Neuronics AG, ver. 2.0.1, May 2007.
[20] M.W. Spong, S. Hutchinson and M. Vidyasagar, Robot Modeling and Control, John Wiley & Sons, 2006.
[21] F.O. Karray and C.W. de Silva, Soft Computing and Intelligent Systems Design: Theory, Tools and Applications, Addison Wesley, 2004.
[22] M. T. Hagan, H. B. Demuth and M. H. Beale, Neural Network Design, PWS Publishing, 1995.

