MACHINE LEARNING-BASED MULTI-ROBOT COOPERATIVE TRANSPORTATION OF OBJECTS

by

Pallege Gamini Dilupa Siriwardana
B.Sc., University of Peradeniya, Sri Lanka, 2003
B.Tech., The Open University of Sri Lanka, 2004

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF APPLIED SCIENCE
in
The Faculty of Graduate Studies
(Mechanical Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
May 2009

© Pallege Gamini Dilupa Siriwardana, 2009

Abstract

Multi-robot cooperative transportation is an important research area in the multi-robot domain. In the process of object transportation, several autonomous robots navigate cooperatively in either a static or a dynamic environment to transport an object to a goal location and orientation. The environment may consist of both fixed and movable obstacles and may be subject to uncertainty and unforeseen changes. Furthermore, more than one robot may be required, operating in a cooperative mode, for handling heavy and large objects. These are some of the challenges addressed in the present thesis.

This thesis develops a machine learning approach and investigates relevant research issues for object transportation using cooperative and autonomous multiple mobile robots. It makes significant original contributions in distributed multi-robot coordination and self-deterministic learning for robot decision making, and arrives at an optimal solution to the action selection conflicts of the robots in the cooperative system. This helps to improve the real-time performance and robustness of the system. The thesis also develops a new method for object and obstacle identification in complex environments using a laser range finder, which is more realistic than the available methods. A new algorithm for object pose estimation is developed, enabling a robot to identify objects and obstacles in a multi-robot environment by utilizing the laser range finder and color blob tracking.

The thesis develops a fully distributed hierarchical multi-robot architecture for enhanced coordination among robots in a dynamic and unknown environment. It strives to improve the real-time performance and robustness. The system consists of three layers. By combining two popular artificial intelligence (AI) techniques, learning and behavior-based decision making, the developed architecture is expected to facilitate effective autonomous operation of cooperative multi-robot systems in a dynamically changing, unstructured, and unknown environment.

Machine learning is integrated into the developed multi-robot system for decision making during transportation under uncertainty and unforeseen changes in the environment. For this purpose, the conventional Q-learning algorithm, known as the single-agent Q-learning algorithm, is improved to form the "modified Q-learning algorithm," which provides efficient state identification and optimally resolves action selection conflicts in multi-robot learning.

A C++ based simulation system and a physical multi-robot experimental system are developed in the laboratory to transport an object of interest to a goal location in a dynamic and unknown environment with a complex obstacle distribution. The approaches developed in this thesis are implemented in the prototype system and validated through computer simulation and experimentation. The results validate the developed techniques.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Nomenclature
Abbreviations
Acknowledgements
Chapter 1 Introduction
1.1 Project Description
1.2 Objective of the Research
1.3 Problem Definition
1.4 Related Work
1.4.1 Multi-Robot Coordination
1.4.1.1 Behavior-Based Approach for Coordination
1.4.1.2 Planning-Based Approach for Coordination
1.4.1.3 Distributed Control Approach for Coordination
1.4.1.4 Market-Based Approach for Coordination
1.4.1.5 Learning-Based Approach for Coordination
1.4.2 Machine Learning in Multi-Robot Systems
1.4.3 Pose Estimation Technologies for Multi-Robot Systems
1.5 Contributions and Organization of the Thesis
Chapter 2 Multi-Robot Coordination for Object Transportation
2.1 Overview
2.2 General Hierarchical Architecture for Multi-Robot Coordination
2.2.1 Machine Learning Unit
2.2.2 Coordination Unit
2.2.3 Behavior-Based Distributed Control
2.2.3.1 Behavior Representation
2.2.3.2 Behavior Composition and Cooperation
2.2.3.3 Communication Behaviors for Coordination
2.2.3.4 Group Behaviors for Coordination
2.2.3.5 Behavior Coordination for the Object Transportation Sub-Task
2.3 Mechanism for Multi-Robot Coordination
2.3.1 Dynamic Role Assignment Mechanism for the Coordination Unit
2.3.2 Dynamic Role Assignment Model for the Coordination Unit
2.4 Distributed Control Issues for Multi-Robot Coordination
2.5 Summary
Chapter 3 Machine Learning for Multi-Robot Decision Making
3.1 Overview
3.2 Markov Decision Process (MDP)
3.3 Reinforcement Learning
3.4 Modified Q-Learning Algorithm
3.5 Multi-Robot Transportation Using Modified Q-Learning
3.5.1 Distributed Multi-Robot Q-Learning
3.5.2 Object Transportation Task
3.5.3 Multi-Robot Object Pushing with Modified Q-Learning
3.5.3.1 The Environment States for Modified Q-Learning
3.5.3.2 Robot Actions
3.5.3.3 Object Dynamics
3.5.3.4 Reward Function
3.5.4 Simulation Results
3.5.4.1 Simulation Results for Object Pushing
3.5.4.2 Effectiveness of Modified Q-Learning in a Multi-Robot Cooperative Task
3.6 Summary
Chapter 4 Pose Estimation for Multi-Robot Object Transportation
4.1 Overview
4.2 Multi-Robot Transportation System
4.3 Pose Estimation Technology
4.3.1 Global Localization of the Robot
4.3.2 Color Blob Tracking
4.3.3 Object Pose Estimation
4.3.3.1 Relative Pose Estimation
4.3.3.2 Object Global Pose Estimation
4.4 Experimental Results
4.4.1 The Test Bed
4.4.2 Experiments with the Stationary Robot
4.4.3 Experiments with a Moving Robot
4.5 Summary
Chapter 5 Experimentation in Distributed Multi-Robot Cooperative Transportation
5.1 Overview
5.2 Test Bed
5.3 A Physical Multi-Robot Transportation System
5.4 Summary
Chapter 6 Conclusions
6.1 Primary Contributions
6.2 Limitations and Suggested Future Research
6.2.1 Improvement of the Model and Algorithm of MQLA
6.2.2 Integrating Learning and Coordination
6.3 Summary
Bibliography
Appendix 1: Program Functions
Appendix 2: Program Codes

List of Tables

Table 4.1: The actual and estimated object pose results for the first set of experiments with a stationary robot
Table 4.2: The actual and estimated object pose results for the second set of experiments with a stationary robot
Table 4.3: The actual and estimated object pose results for the first set of experiments with a moving robot
Table 4.4: The actual and estimated object pose results for the second set of experiments with a moving robot

List of Figures

Figure 1.1: The overall project representation
Figure 1.2: Schematic representation of the developed system
Figure 2.1: Hierarchical architecture for object transportation
Figure 2.2: Composite representation of the proposed architecture
Figure 2.3: Principle of machine learning
Figure 2.4: The environment state representation for coordination
Figure 2.5: The hierarchical behavior-based control layer
Figure 2.6: SR diagram of a simple behavior
Figure 2.7: The SR diagram for a composite behavior with a competitive operator
Figure 2.8: The SR diagram of a composite behavior with a cooperative operator
Figure 2.9: Behavioral hierarchy for the coordinated transport sub-task
Figure 2.10: Transitions of the group behaviors of coordinated object transportation
Figure 2.11: Types of role assignment
Figure 2.12: Schematic representation of a role composed of three roles
Figure 3.1: The value iteration algorithm
Figure 3.2: The single-agent Q-learning algorithm
Figure 3.3: Modified Q-learning algorithm
Figure 3.4: The developed multi-robot system
Figure 3.5: Description of a multi-robot object pushing task
Figure 3.6: Environment state representation using binary code
Figure 3.7: Coding protocol of the environment states
Figure 3.8: Pushing action points available to each robot
Figure 3.9: Dynamics of the object with rectangular cross section
Figure 3.10: Two object-pushing snapshots under different conditions
Figure 3.11: The environmental state with decimal value "512" and six pushing actions available to the robots
Figure 3.12: The Q-values under the state of "512" in 100,000 rounds of object pushing
Figure 3.13: Probability density estimate of Q-values under the state of "512" in 100,000 rounds of object pushing
Figure 3.14: Probability density estimate of the action selection probability under the state of "512" in 100,000 rounds of object pushing
Figure 4.1: The developed physical multi-robot system
Figure 4.2: General scheme of pose estimation for cooperative object transportation
Figure 4.3: Global pose estimation of the wheeled robot
Figure 4.4: Flow diagram of color blob tracking
Figure 4.5: Division of the camera frame into four quadrants
Figure 4.6: Schematic drawing of the laser range sensor
Figure 4.7: Relative object pose estimation
Figure 4.8: MobileSim simulator visualized results from the laser range finder
Figure 4.9: Arbitrary layout of objects
Figure 4.10: The experimental system in the IAL at UBC
Figure 4.11: The x-axis error from Table 4.1
Figure 4.12: The y-axis error from Table 4.1
Figure 4.13: The orientation error from Table 4.1
Figure 4.14: The x-axis error from Table 4.2
Figure 4.15: The y-axis error from Table 4.2
Figure 4.16: The orientation error from Table 4.2
Figure 4.17: The x-axis error from Table 4.3
Figure 4.18: The y-axis error from Table 4.3
Figure 4.19: The orientation error from Table 4.3
Figure 4.20: The x-axis error from Table 4.4
Figure 4.21: The y-axis error from Table 4.4
Figure 4.22: The orientation error from Table 4.4
Figure 5.1: The multi-robot object transportation system
Figure 5.2: The color blobs used to identify the orientation of the object
Figure 5.3: Pioneer 3-DX mobile robot
Figure 5.4: Pioneer 3-AT mobile robot
Figure 5.5: Multi-robot cooperative transportation

Nomenclature

S = {s_1, s_2, ..., s_n}: set of states in the environment
A = {a_1, a_2, ..., a_m}: set of actions available to a robot
T(s, a, s'): environment model, which gives the next environmental state s' when the robot selects action a under current state s
Pi(s): probability distribution over the states
T: S x A -> Pi(S): transition function or transition model
t: time
R: S x A -> R: reward function, which gives the immediate reward when the robot takes action a under current state s
pi: policy with which a robot selects its actions
R(s): received reward
gamma: discount factor, which varies from 0 to 1
pi*: optimal policy, which yields the highest expected sum of the discounted rewards
U(s): utility of a state
U(s'): utility of the subsequent state
epsilon: maximum error allowed in the utility of any state
delta: maximum change in the utility of any state in an iteration
U, U': vectors of the utilities for the states in S
Q(s, a): an entry in the Q-table
eta: learning rate
r: direct reward
tau: "temperature" parameter (see Chapter 3)
Empty set
Set including all actions selected by the robots
theta: orientation angle
F: net force applied by the robots
Gamma: net torque applied by the robots
c_f, c_t: coefficients
Delta t: short period of time
D_old: distance between the goal location and the object center before taking the pushing action
D_new: distance between the goal location and the object center after taking the pushing action
T: sampling time
A_i: action set of the i-th robot
I: set including the currently available actions

Abbreviations

MDP: Markov Decision Process
RL: Reinforcement Learning
MQLA: Modified Q-Learning Algorithm
AI: Artificial Intelligence
LAN: Local Area Network
TCP/IP: Transmission Control Protocol / Internet Protocol
MRS: Multi-Robot System
CCD: Charge-Coupled Device
GPS: Global Positioning System
FSA: Finite State Automaton
VLSI: Very Large Scale Integration
ACTS: Advanced Color Tracking System

Acknowledgements

A number of people contributed to this effort intellectually, personally and financially. First and foremost, I wish to thank Prof. Clarence W. de Silva, Director of the Industrial Automation Laboratory, for a number of reasons. He was my research supervisor, and his dedication and commitment towards the student community is appreciated by all of us who were his students.
Had it not been for Professor de Silva's considerable persuasive skills, boundless patience, great-heartedness, keen guidance and unbelievable human qualities, this particular research perhaps would never have come to fruition.

I take this opportunity to thank Dr. Ying Wang, the post-doctoral research scientist in the Industrial Automation Laboratory, who is a scholar of multi-robotic systems. Whenever I had a problem in multi-robotic systems, he guided me along the correct path, and I am privileged and honored to have worked with him. I must thank Mr. Howard Hu, a Research Assistant in the Industrial Automation Laboratory, for his constant support and constructive remarks, especially in debugging the computer programs.

I wish to express my sincere gratitude to Prof. Mu Chiao and Dr. Farbod Khoshnoud of SOFTEK, who served as members of my research committee. Furthermore, I wish to thank The Open University of Sri Lanka and the Asian Development Bank-funded Distance Education Modernization Project for their financial assistance in part. Major financial assistance, for my Research Assistantship and for equipment and other resources for my research, was provided by research grants held by Prof. de Silva, notably: the Natural Sciences and Engineering Research Council (NSERC) of Canada Special Research Opportunity (SRO) and Discovery grants; the Tier 1 Canada Research Chair (CRC); the Canada Foundation for Innovation (CFI); and the British Columbia Knowledge Development Fund (BCKDF).

I also acknowledge the support extended by my Sri Lankan friends, Mr. Anjana Punchihewa, Mr. Eranda Harinath and Mr. Lalinda Weerasekara, for their encouragement and support in different ways. I take this opportunity to thank the members of the Industrial Automation Laboratory, especially Prof. Lalith Gamage, Mr. Tahir Khan, Mr. Mohammed Airasheed, Ms. Madalina Wierzbicki, Mr. Roland Haoxiang Lang, Mr. Bhenam Razavi, Mr. Srinivas Raman, Mr. Guan-Lu Zhang, Mr. Arunasiri Liyanage, Mr. Ramon Campos and Prof. Qingshan Zeng, for their assistance, guidance and encouragement throughout the research project.

Last but not least, I wish to thank my family for their constant support, especially my wife Lelka Samanmalee, my daughter Viduni Siriwardana, my mother, my father and my sister. Without their unwavering support, tolerance of my idiosyncrasies and abiding confidence, this research never would have been possible.

Chapter 1
Introduction

The field of multi-robot systems emerged as a major area of robotics research in the late 1980s. Since then, the robotics community has investigated a number of important issues concerning cooperative multiple mobile robots. Prior to that, most research was concentrated on either single robots or distributed problem-solving systems that did not involve multi-robot cooperation.

Multi-robot systems (MRSs) are becoming increasingly significant in industrial, commercial and scientific applications, and the number of robots used in these applications is also increasing. An MRS can be defined as a set of autonomous robots interacting in the same work environment in fairly complex ways to carry out a common task. It should be able to operate in dynamic environments, where uncertainty and unforeseen changes can arise due to the presence of moving robots, other agents, and both stationary and moving obstacles. Such a system may consist of complex mobile robotic platforms equipped with sophisticated sensors and actuators, which may be required to execute complex tasks.
The advantages of a multi-robot system (MRS) over a single-robot system are indicated now. Multiple robots have the potential to accomplish a given task more efficiently because their capabilities are more diverse, flexible, and distributed. Multiple robots can localize themselves faster and more accurately by exchanging information among themselves about their positions and orientations (poses). An MRS possesses greater fault tolerance than a powerful and expensive single robot, particularly because if one robot in the MRS fails or malfunctions, there will be other robots to assume its tasks. In summary, an MRS will have a greater range of task capabilities, greater efficiency, increased system performance, fault tolerance and robustness, lower economic cost, ease of development, distributed sensing and action, and inherent parallelism, in comparison to a single robot of comparable capability.

In the MRS domain, the robotics community began to investigate major challenges such as robot synchronization, determination of a robot coordination strategy, and identification of complex work environments. It has also investigated the potential for using MRS research to provide insight into social and life sciences involving groups of biological entities.

Multi-robot object transportation is an important research subject in the multi-robot domain. In an object transportation process, several autonomous robots navigate cooperatively in either a static or a dynamic environment to transport an object to a goal location and orientation. The environment may consist of fixed and movable obstacles and may be subject to uncertainty and unforeseen changes within the work environment. A single robot may not be able to handle heavy and large objects alone; such challenges can be overcome by multiple mobile robots. A primary concern associated with multi-robot applications is the method of enabling individual robots to autonomously adapt their behaviors to meet the dynamic changes in their task environment. For this, each robot needs to communicate with its peers, exchange sensor information, formulate the coordination strategy and obstacle avoidance scheme, plan the robot trajectory, and assign and receive subtasks and roles for carrying out the required task quickly and successfully. Because of the main challenges inherent in these various research fields, multi-robot object transportation is a good benchmark application for assessing the effectiveness of multi-robot control and coordination strategies. Therefore, control and coordination methods for multi-robot systems have been investigated by various researchers, and the multi-robot object transportation problem has become a theme of research and development.

1.1 Project Description

A main objective of the overall project of "Multi-Robot Autonomous Assembly System" in our laboratory (Industrial Automation Laboratory) is to develop a physical multi-robot system with object manipulation, transportation and assembly capabilities. First, a group of intelligent autonomous robots cooperatively search for and identify any useful parts within the work environment. Then the identified parts are individually or cooperatively transported to a designated place and assembled into a useful device. Finally, the built device is transported individually or cooperatively to a known location in an unknown and dynamic environment. During this process, obstacles may be static or may even appear randomly during the operation. The overall project representation is shown in Figure 1.1.
Figure 1.1: The overall project representation (multiple robots, movable and fixed obstacles, and the goal location).

The overall project consists of three main stages:
1. Multi-Robot Foraging: Multiple mobile robots autonomously search the environment for useful objects/parts. The identified useful parts are transported by individual robots or by cooperating multiple robots to a designated place.
2. Multi-Robot Assembly: The robots autonomously determine the assembly sequence for constructing a useful device, properly handle and manipulate the parts, and assemble the device.
3. Multi-Robot Transportation: Two or more robots cooperatively transport the built device to the goal location.

A major challenge of the overall project is the transportation of an object of arbitrary size and shape, a key issue in the multi-robot domain. In the stage of transporting components for assembly at a designated location, the robots may have to work in a complex dynamic environment. The stages of multi-robot foraging and transportation need to use cooperative object transportation technologies to accomplish the overall task. This thesis mainly concerns object transportation technologies and the development of a physical multi-robot system to transport an object of arbitrary size and shape.

1.2 Objective of the Research

The physical multi-robot object transportation system, as developed through the research reported in the present thesis, has the ability to navigate an object to a predetermined location through a dynamic and unknown environment with a complicated obstacle distribution. The primary research objectives of the thesis are listed below.

o Develop a fully distributed physical multi-robot system for operation in a dynamic and unknown environment with a complicated obstacle distribution.
o Improve an available single-agent machine learning algorithm so as to enable a multi-robot system to deal with uncertainty and unforeseen changes in the environment, including changes of the robots.
o Develop a technique to identify obstacles within a dynamic and unknown environment.
o Develop a technique for pose estimation in a task of object transportation, to more realistically estimate the pose of the object of interest.
o Evaluate performance attributes such as coordination, self-learning and adaptation, speed, and accuracy of the developed multi-robot system and its underlying methodologies.

1.3 Problem Definition

Development of a physical multi-robot system is an important objective of the present work. This system consists of a group of intelligent autonomous robots, which have the ability to work cooperatively to transport an object to a goal location and orientation. The work environment is considered unknown and dynamic, and obstacles may be present or may even appear randomly during the transportation process. Multi-robot coordination and machine learning are integrated into the developed physical system to cope with the challenges of the problem. A schematic representation of the developed system, in which the robots and the sensory system communicate with a server through a wired/wireless router using TCP/IP, is shown in Figure 1.2.

Figure 1.2: Schematic representation of the developed system.

Three or more autonomous mobile robots are present in the system, which enables the cooperative transportation of an object to a goal location.
During the course of the transportation, the robots have to interact with each other to establish the strategy of coordination and cooperation. Feasible and suitable locations and points of action have to be established for each robot assisting in the cooperative transportation. The object has to be transported quickly and effectively while avoiding obstacles that may be present during the transportation process. Furthermore, the system has to avoid damage to the transported object. In the developed system, charge-coupled device (CCD) cameras, laser range finders and sonars are used to monitor and measure the current location and orientation of an object. The dynamic and unpredictable environment contains both fixed objects and movable obstacles, which may appear randomly. The robots and the sensory system are separately linked to their host computers, which are connected through a local network to implement machine intelligence and system control.

It is important to consider three major challenges in the development of the multi-robot system. The first one is the dynamic and unpredictable environment. In particular, dynamic obstacles may be scattered arbitrarily within the working environment, and they may appear and disappear randomly during the transportation process. In addition, there will be multiple robots working in parallel. While one robot makes decisions, other robots may have changed the environment according to their behaviors. This also makes the environment dynamic from the view of a single robot. As a result, the process of decision making becomes rather complex for the robots within the environment. The challenge is to make decisions in real time and to learn to identify the conditions of the surrounding environment from the local viewpoint of the robots.

The second challenge results from the localized sensing capabilities of the robots. In the present project it is assumed that each robot is capable of detecting an object or obstacle only within a limited detection radius. The robots possess local sensing capabilities only, and consequently the global environment is generally unknown to them. A robot will not have good knowledge about the location or the shape of an object until it moves sufficiently close to the object. The unknown environment is another major challenge in the multi-robot domain, because the robots do not have a complete understanding of the environment in advance, partly due to unfamiliarity and partly due to the dynamic nature of the environment. Thus the system will not be able to utilize traditional planning methodologies for the decision-making process.

The third challenge is the lack of reliable communication among robots, where the disturbances are represented by white noise. Due to unreliable communication, the robots may end up selecting a wrong coordination strategy. Improvement of the robustness of multi-robot decision making in an environment with unreliable communication is also a challenging issue in the multi-robot domain. Normally, the environment of a multi-robot system is dynamic and unknown, and in such a situation the communication among robots is generally inconsistent. In this thesis, several approaches are introduced to meet these challenges in multi-robot systems, enabling them to work effectively in an unstructured environment.
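To make the networked setup of Figure 1.2 concrete, the following is a minimal illustrative sketch, not the thesis code, of how one robot might report its own pose and a locally sensed object pose to the central server over TCP/IP. All names (RobotState, sendState, the server address and port) are hypothetical assumptions introduced here for illustration; only standard POSIX socket calls are used.

// Sketch under stated assumptions: one TCP state update from a robot to the server.
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <cstdio>

struct RobotState {                 // pose = position (x, y) and orientation theta
    int    robotId;
    double x, y, theta;             // robot pose in the global frame
    double objX, objY, objTheta;    // locally estimated object pose, if detected
    int    objectSeen;              // 1 if the object lies within the detection radius
};

// Send one state update to the server; returns true on success.
bool sendState(const char* serverIp, int serverPort, const RobotState& s) {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) return false;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(serverPort);
    inet_pton(AF_INET, serverIp, &addr.sin_addr);

    bool ok = false;
    if (connect(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
        // A real system would serialize portably; raw struct bytes suffice for a sketch.
        ok = send(sock, &s, sizeof(s), 0) == static_cast<ssize_t>(sizeof(s));
    }
    close(sock);
    return ok;
}

int main() {
    RobotState s = {1, 0.50, 1.20, 0.35, 2.10, 1.80, 1.57, 1};
    if (!sendState("192.168.1.10", 5000, s))   // hypothetical server address and port
        std::printf("state update failed\n");
    return 0;
}

In the actual system the shared information would feed the state identification and coordination mechanisms described in Chapters 2 and 3; the sketch only illustrates the kind of message exchange implied by the problem definition.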
1.4 Related Work

A significant amount of work has been performed by the robotics community in the past ten years on multi-robot coordination and learning for decision making, particularly to support effective and robust operation of multi-robot systems in both simple and dynamic environments. This section presents some relevant work in this context.

Cao et al. (2006) presented a multi-robot hunting task in an unknown environment. They proposed a distributed control approach involving local interactions with respect to local coordinate systems. The proposed approach can cope with the cumulative errors of the wheels of the robots and also with unreliable communication networks. Computer simulation validated their approach. Kumar and Garg (2004) studied multi-robot cooperative manipulation with two industrial robotic arms. They introduced a fuzzy logic-based approach to coordinate the motions of the two robotic manipulators so that the internal forces among them became minimum. In addition, they used Kalman filtering to estimate the external force on the end-effector, based on the information from a force/torque sensor mounted on the robot wrist.

Stone et al. (2006) examined the issue of uncertainty in multi-robot systems. They identified several sources of uncertainty and proposed a method for reducing the uncertainty and making decisions in the face of uncertainty. Their method enables effective planning under uncertainty for robots engaged in goal-oriented behavior within a dynamic and complex environment. The scaling issues of multi-robot systems are presented by Gustafson et al. (2006). They used several abstract models to show that it can be counterproductive to increase the number of robots or the sensor strength in a multi-robot system.

Mataric et al. (2006) presented a mathematical model for dynamic task allocation in multi-robot systems, based on stochastic processes. They assumed a large-scale multi-robot system with local sensing, behavior-based control, and distributed task allocation capabilities. By storing the history of the environment examined in the internal state of a robot, they tried to cope with the issues of local sensing and indirect communication in multi-robot systems. Their model was demonstrated for a multi-robot foraging task.

In another paper, the same group (Gerkey and Mataric, 2004) developed a taxonomy for MRTA (multi-robot task allocation). They proposed three axes for use in describing MRTA problems: (1) single-task robots (ST) versus multi-task robots (MT); (2) single-robot tasks (SR) versus multi-robot tasks (MR); and (3) instantaneous assignment (IA) versus time-extended assignment (TA). Based on the introduced method, they compared six typical multi-robot architectures with their MRTA approaches.

Many valuable surveys have been conducted on multi-robot systems. Farinelli et al. (2004) completed a survey on multi-robot systems and proposed a new taxonomy for multi-robot coordination. Multi-robot tasks were classified as unaware systems; aware but not coordinated systems; weakly coordinated systems; strongly coordinated and strongly centralized systems; strongly coordinated and weakly centralized systems; and strongly coordinated and distributed systems. In a survey carried out on existing multi-robot architectures, Parker (2000) pointed out several challenges in typical multi-robot tasks.
Specifically, she categorized the available architectures into three types: general architectures, specialized architectures and hybrid architectures. In addition, she classified three typical multi-robot tasks and their main challenges. For multi-robot object transportation, a major challenge is uneven terrain; for multi-robot learning, a major challenge is how to assign credit among robots; and for multi-robot motion coordination, the key challenge is computational complexity.

There are a number of surveys that review multi-robot learning (MRL) or multi-agent learning (MAL). For example, the survey presented by Yang and Gu (2004) identified that scaling an individual reinforcement learning algorithm to multi-robot systems would violate its Markov decision process assumption, which is the weakness of many available multi-robot reinforcement learning approaches. They also classified four frameworks of multi-agent reinforcement learning: the stochastic games (SGs)-based framework, the fictitious play framework, the Bayesian framework, and the policy iteration framework. Furthermore, they categorized multi-agent reinforcement learning algorithms for possible application to multi-robot systems. They argued that there were four major challenges in applying reinforcement learning to multi-robot systems: the Markov decision assumption, continuous spaces, uncertainties, and incomplete information. Three future directions were pointed out as well: the behavior-based approach with reinforcement learning, fuzzy approximation functions, and continuous reinforcement learning.

Panait and Luke (2005) presented a survey on multi-agent learning for cooperation, and they discussed the two main challenges in cooperative multi-agent learning; namely, the large learning space and the dynamic environment. Furthermore, they classified two types of learning: concurrent learning and team learning. The major issues in the area of concurrent learning were identified as credit assignment, learning dynamics, teammate modeling, and the relationship between learning and communication.

1.4.1 Multi-Robot Coordination

Improving the mechanism for coordinating multiple robots in the execution of cooperative tasks has been a challenging objective in multi-robot systems. Multi-robot coordination is carried out according to a developed multi-robot architecture, and multi-robot architectures seek to improve the coordination mechanism for the object transportation process. Five main types of multi-robot coordination approaches are briefly discussed now; namely, the behavior-based approach, the planning-based approach, the market-based approach, the distributed control-based approach, and the learning-based approach.

1.4.1.1 Behavior-Based Approach for Coordination

Parker (2000) introduced a distributed and behavior-based multi-robot architecture called L-ALLIANCE. It employs the concepts of "impatience" and "acquiescence" to dynamically stimulate behaviors for coordination among robots. Moreover, by evaluating the performance of the teammates and the dynamic changes in the environment, L-ALLIANCE autonomously updates its parameters to adjust to those changes. The introduced architecture was validated in a box pushing task with heterogeneous mobile robots.

The behavior-based multi-robot coordination architecture CAMPOUT has been introduced in the Robot Construction Crew (RCC) project developed at the Jet Propulsion Laboratory (JPL) of NASA.
CAMPOUT is a fully distributed, behavior-based multi-robot architecture and it uses a leader-follower distributed coordination strategy. CAMPOUT was validated using a multi-robot transportation task in an autonomous deployment project on a planetary surface. In their latest work, the authors introduced a multi-robot autonomous construction task based on the CAMPOUT architecture.

Mataric et al. (2002) developed a behavior-based approach for a distributed multi-robot system. They developed controllers that were robust to noise and robot failures. Three versions of the task were evaluated in a spatio-temporal context using interference, time to completion, and distance traveled as the main investigative parameters. Konidaris and Hayes (2005) proposed integrating the behavior-based approach with reinforcement learning for multi-robot coordination. Furthermore, they suggested using topological maps to reduce the learning space in reinforcement learning. Camacho et al. (2006) introduced behavior-based coordination for robot soccer agents. In their work, they allocated to each robot a duty for execution of the particular defined task. A rule-based RECCA control algorithm was also proposed.

1.4.1.2 Planning-Based Approach for Coordination

Miyata et al. (2002) presented coordination of multiple mobile robots for cooperative transportation in unknown static environments. They introduced a centralized approach for task planning and assignment. Priority-based assignment algorithms were used for multi-robot coordination and motion planning.

1.4.1.3 Distributed Control Approach for Coordination

Wang and de Silva (2005), Zaerpoora et al. (2005), Pereira et al. (2004), and Chaimowicz et al. (2004) presented the caging formation control problem in multi-robot coordination for cooperative transportation. They proposed an "Object Closure" strategy to move the object in a coordinated manner with multiple mobile robots while maintaining the formation. The major challenge was to establish a bounded movable area for the object during its transportation. In this case, the contact between the object and the robots did not need to be maintained by the controller of each robot. They termed this "Object Closure," and it was used as a type of distributed formation control strategy for multi-robot cooperative carrying tasks. Basically, the "Object Closure" approach is a sub-category of behavior-based multi-robot coordination; however, it places its emphasis on the distributed control features. Marshall et al. (2006) introduced a distributed control strategy called "cyclic pursuit" for multi-robot coordination. The particular approach was found to be robust in the presence of unmodeled dynamics and delays due to sensing and information processing.

1.4.1.4 Market-Based Approach for Coordination

Multi-robot task allocation is a vital and complex area of multi-robot coordination, and the conventional approaches have seen little success in the presence of a large number of robots in the system. However, market-based approaches seem better suited for solving the complex multi-robot task allocation problem, and they are gaining popularity. For example, Mataric et al. (2002) presented an auction-based approach for multi-robot coordination. They utilized the publish/subscribe first-price auction method and validated it in a multi-robot box pushing task. During the transportation process a "watcher" robot monitors the transportation task and "publishes" the required auctions for use by the "pusher" robots in the same environment.
The "pusher" robots determine the available tasks at a given instance through "subscribing" to the information from the "watcher" robot. By matching its own abilities to the required tasks, each "pusher" robot bids for a task. When the "watcher" robot receives all bids from the "pusher" robots, it selects the most suitable "pusher" for each particular task, and then the "watcher" robot broadcasts the decisions to the "pusher" robots in the system. This particular auction-based approach was validated for a multi-robot box pushing task in a static environment without obstacles.

1.4.1.5 Learning-Based Approach for Coordination

Typically, multi-robot systems work in a dynamic and unknown environment where a traditional behavior-based approach can easily fail, because the designer cannot foresee all possible world states the robot would encounter and design the corresponding behaviors. A behavior-based multi-robot system will work well in a known and structured environment, but learning capabilities are important for multi-robot systems to work properly in dynamic and unknown environments. Most existing work in multi-robot learning utilizes reinforcement learning due to its simplicity and good real-time performance. For example, Ito and Gofuku (2004) proposed a two-layer multi-robot architecture for multi-robot coordination in cooperative object transportation. They introduced a centralized machine learning method in the top level of this architecture for decision making. A distributed approach of rule-based control was developed for the lower level of motion control of the robot, to reach a specified position and take a specific action according to the commands from the upper level. Their approach reduced the learning space by integrating reinforcement learning with genetic algorithms.

1.4.2 Machine Learning in Multi-Robot Systems

Learning capabilities are desirable for multi-robot systems when the working environment is unknown, unstructured and dynamic. Technologies of machine learning have been employed commonly for this purpose. Among the machine learning approaches, reinforcement learning, and particularly Q-learning, has been quite popular due to its simplicity and good real-time performance.

In a well-known work in this area, Parker et al. (2002) studied two significant aspects of multi-robot learning: learning new behaviors autonomously and learning parameter adjustments. They proposed two approaches for autonomously learning new behaviors in multi-robot systems. The first approach, called distributed pessimistic lazy Q-learning, combines lazy learning with Q-learning and uses a pessimistic algorithm for the credit assignment problem. The second approach, called Q-learning with VQQL (vector quantization with Q-learning) and the generalized Lloyd algorithm, addresses the generalization issue in reinforcement learning. Both approaches have been validated in a CMOMMT (Cooperative Multi-robot Observation of Multiple Moving Targets) project. As for learning for parameter adjustment, their earlier paper (mentioned under the behavior-based approach) introduced the behavior-based architecture called L-ALLIANCE, which enables robots to autonomously update their control parameters so that they can adapt their behavior over time in response to changes in the capabilities of the robots, the team composition, and the environment.

Kapetanakis and Kudenko (2002) established two multi-agent learning techniques, called multiple single-agent learning and social multi-agent learning.
In the multiple single-agent learning technique, each learning agent observes other agents as a part of the environment, and an individual agent does not have any explicit knowledge about the other agents. In social multi-agent learning, however, agents can have a high level of knowledge of other agents and integrate this knowledge in the learning process, potentially using local modeling techniques, communication and coordination to support the learning task.

Martinson and Arkin (2003) presented a multi-robot foraging task with Q-learning. They used Q-learning to select the roles of the robots dynamically, and it was integrated with the behavior-based approach. The learning space was reduced by the behavior representation. They carried out simulations to verify their approach. Dahl et al. (2004) presented a Q-learning-based multi-robot cooperative transportation task in a simple environment. They proposed a two-layer multi-robot architecture: the Q-learning-based behavior-selection subsystem was located in the upper level, and the behavior-based reactive subsystem was located in the lower level. Further, they introduced two communication strategies to speed up the convergence of Q-learning. Kovac et al. (2004) addressed a "pusher-watcher" multi-robot box pushing problem with reinforcement learning. In their approach, one robot acted as a "watcher," which observed the current world state and broadcast it to the other robots. The other robots acted as "pushers," which selected their respective pushing actions using Q-learning. A major weakness of this approach is the need for a robust communication network.

Taylor and Stone (2005) studied a methodology to transfer learned knowledge from one task to another with a different state-action space. This is an important issue in reinforcement learning, because such transfer is beneficial for reducing the training time. The particular methodology used specific domain knowledge to construct a knowledge mapping function between the two considered tasks so that the knowledge transfer happened effectively. Tangamchit et al. (2003) addressed several crucial factors affecting decentralized multi-robot learning in an object manipulation task. They recognized four factors that directly affect distributed multi-robot learning: the reward type, the learning algorithm, diversity, and the number of robots. They concluded that the reward type and the learning algorithm considerably affect the final results, and that the scalability of the system affects the learning speed but has only a slight effect on the final outcomes.

It is clear that reinforcement learning is rather popular in multi-robot coordination. The environment in a multi-robot task is typically dynamic because the robots themselves also change their shared environment. Therefore, the single-agent reinforcement learning algorithm, when extended to a multi-robot system, violates its assumption of a stationary environment. A number of multi-agent reinforcement learning algorithms have been introduced in the multi-agent area, namely, the minimax Q-learning algorithm, the Nash Q-learning algorithm, the friend-or-foe Q-learning algorithm, the team Q-learning algorithm, and so on. The team Q-learning algorithm is a simplified version of the Nash Q-learning algorithm. All these algorithms assume a dynamic environment and allow each robot to observe the actions of its peers before it makes its own decisions.
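For reference, the standard single-agent Q-learning update that the above approaches build upon can be written with the symbols defined in the Nomenclature (this is the textbook form; the modified Q-learning algorithm of Chapter 3 changes how states are identified and how conflicting action selections are resolved, not this basic update):

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta \left[ r + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

where \eta is the learning rate, \gamma the discount factor, and r the direct reward. As a rough illustration of why joint-action learners face a scaling problem, if each of n robots has m candidate pushing actions, a learner that models the joint action faces m^n joint actions per state; for example, three robots with six pushing actions each (as in the object pushing task of Chapter 3) give 6^3 = 216 joint actions, whereas the single-agent update above keeps a fixed action set of six.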
The large learning space is the main hurdle in multi-agent reinforcement learning algorithms. Each robot needs to monitor its own actions and its peers' actions as well. Therefore, when the scale of the system increases, the resulting learning space becomes excessive. In contrast, when the number of robots increases, the single-agent Q-learning algorithm can still provide reasonable performance due to its fixed learning space. The multi-robot community has tried to employ numerous approaches, such as neuro-fuzzy methods and genetic algorithms (GAs), to reduce the large learning space in multi-robot coordination. However, research in this area is still in its infancy, and there are many open questions yet to be explored.

1.4.3 Pose Estimation Technologies for Multi-Robot Systems

Estimation of pose (i.e., position and orientation) is essential for object transportation in a multi-robot system. Technologies of pose estimation are utilized to estimate the real-time pose of objects, including obstacles and the other robots, in the work environment. This helps a robot identify the world state and make appropriate decisions accordingly. Researchers have utilized various sensory methods, such as digital CCD (charge-coupled device) cameras, global positioning systems (GPS), laser range finders and sonar devices, to identify the pose of an object in a robotic work environment. Also, sensing algorithms have been developed to recognize and identify objects and to estimate features or poses of the objects in the environment.

Lang et al. (2008) presented a multiple sensor-based method for robot localization and box pose estimation. A CCD camera mounted on the robot was used to find the box in its environment. A laser range finder mounted on the robot was activated to measure the distance between the laser source and the object. Finally, a homogeneous matrix transformation was applied to represent the global pose of the robot and the box. The approach was validated using computer simulation. Park et al. (2006) introduced a method for global localization in an indoor environment, which employed object recognition using a stereo camera. Their method of global localization first made a coarse estimation of the pose and then refined it. The coarse pose was estimated by means of object recognition and least squares fitting through singular value decomposition, while the refined pose was estimated by using a particle filtering algorithm with omni-directional metric information. Even though vision systems have been used for pose estimation, the approach has a number of drawbacks. In particular, the presented schemes are computationally expensive, are time consuming, have time delays, and are susceptible to changes in the ambient lighting conditions. Ekvall et al. (2005) studied a computer vision approach for object recognition and pose estimation based on a color co-occurrence histogram and a geometric model. Even though the approach was somewhat robust, it failed under large changes in lighting conditions. Kato et al. (2003) presented a method for identifying a robot and obtaining its relative position in a multi-robot system using an omni-directional vision sensor. Their approach was validated using simulation and experimentation. Similar work has been done by Hajjdiab et al. (2004), where a vision-based approach for multi-robot simultaneous localization and mapping (SLAM) was presented.
They presented a method to calculate the locations of the robots using a collection of sparse views of the planar surfaces on which the robots were moving. The camera movements were estimated using inter-image homographies computed by matching the overhead transformed views. Tracking of multiple moving objects using vision sensors is also an important issue in road traffic control systems. Chen et al. (2005) introduced a framework for spatiotemporal vehicle tracking based on unsupervised learning segmentation and object tracking. An adaptive background learning and subtraction method was applied to two real-life traffic video sequences in order to obtain accurate spatiotemporal information on vehicle objects.

Laser range finders and CCD cameras are commonly used sensors in mobile robots. Most laser range finders have the ability to scan the surrounding environment and accurately detect objects within a radius of 50 meters, returning a distance measurement for each degree in a 180 degree range. On the other hand, CCD cameras can be used to detect the shape, color and surface features of the objects of interest and return the corresponding vision information. Researchers have combined these two sensors for object recognition and localization. Murarka et al. (2006) proposed a hybrid method incorporating laser range finders and vision for building local 2D maps to help an intelligent robotic wheelchair distinguish between safe and hazardous regions in its immediate environment. Here, laser range finders were used for localization and mapping of the obstacles in the 2D laser plane, while vision was used for detection of the hazards and other obstacles in the 3D space. The information on the obstacles was used to construct a local 2D safety map. Tomono (2005) used a mobile robot equipped with a laser range finder and a monocular camera to build a 3D environment model. In this framework, a 2D map was first built by scanning objects in the environment with the laser sensor, and the detected candidates were then verified by using vision-based object recognition. Takahashi and Ghosh (2005) proposed a method to identify the parameters of a moving object by integrating a laser range finder and a vision sensor; the approach reduced the dimension of the parameter ambiguity. Gopalakrishnan et al. (2005) developed a mobile robot navigation project that integrated a laser range finder and a vision sensor. The laser range finder was used for localization, and the vision sensor was used for object detection. Also, a route-based navigation technique was used for mobile robot navigation.

1.5 Contributions and Organization of the Thesis

This thesis performs research and develops advanced techniques that will help multi-robot systems operate in a robust and effective manner in a dynamic and unknown environment. The main contributions of the thesis are as follows:

o A multi-robot coordination mechanism and a fully distributed three-layer multi-robot hierarchical architecture are developed to carry out cooperative tasks, integrating the machine learning and behavior-based approaches to robot coordination. This coordination technique enables efficient state identification and coordination among robots. The developed system architecture has the ability to dynamically allocate a task in an unknown environment.
o An improved reinforcement learning algorithm, termed the "modified Q-learning algorithm," is developed, which provides efficient state identification and optimally resolves action selection conflicts in multi-robot learning. It also helps deal with some key issues in multi-robot learning, for example, task assignment, reduction of the learning space, and circumventing the Markov assumption.

o A method is developed to estimate the pose of an object, which can be used to simultaneously track potential obstacles and the other robots in the work environment during object transportation. This approach is more realistic than other available approaches. It is validated using two types of physical experimentation.

o A physical multi-robot transportation system, which integrates the developed techniques, is developed in our laboratory (Industrial Automation Laboratory). This multi-robot system works in a dynamic and unknown real-world environment and shows effectiveness, flexibility and overall good performance.

The organization of the thesis is as follows. Chapter 1 has discussed multi-robot systems and the research objectives of the thesis. It defined the object transportation problem as studied in the thesis, and discussed related past work. Chapter 2 addresses the multi-robot coordination mechanism and the three-layer multi-robot architecture. Two techniques of multi-robot coordination included in the multi-robot architecture, namely the behavior-based approach and the machine learning approach, are discussed and their relationship is presented. Several pertinent issues of distributed control for multi-robot coordination are also discussed. Chapter 3 proposes a modified Q-learning algorithm in the multi-robot domain to facilitate robotic decision making in a dynamic and unknown environment. This algorithm is assessed using a multi-robot transportation task in simulation. The results are analyzed, and the advantages and disadvantages of the algorithm are indicated. In Chapter 4, an innovative method for determining the pose of an object is developed, which will help the robots identify useful objects in a work environment. The technology is validated using a task of multi-robot object transportation, in simulation. Chapter 5 presents a physical multi-robot cooperative transportation system operating in a dynamic and unknown environment. A distributed computing model, developed using Microsoft Visual Studio 2005, is introduced to allow the robots to communicate and cooperate effectively. Experimental results are presented to study the performance of the developed system, particularly concerning adaptability and robustness. Chapter 6 summarizes the primary contributions of the thesis and indicates some relevant issues for future research.

Chapter 2
Multi-Robot Coordination for Object Transportation

2.1 Overview

Multi-robot coordination can be defined as the ability to manage the interdependencies of activities between mobile robots. Coordination among a group of robots should, in principle, improve the overall performance of the team, as the robots may share their world views and may negotiate to identify the present world state and cooperatively select actions. In practice, however, effectively handling multi-robot information and coordination in real time is a challenging task. Many open questions remain in the area of multi-robot coordination. How should a group of robots divide tasks among its members?
Once roles have been assigned to the robots, how should they position themselves to fulfill their roles without interfering with their teammates? What happens if a robot fails, or if the environment changes so that a different robot is more suitable for the task?

The coordination of multi-robot teams in dynamic environments is one of the fundamental problems in multi-robot transportation. The underlying question is: given a group of robots and a task to be performed in a dynamic environment, how should the robots be coordinated in order to successfully complete the task? Coordination normally implies synchronization of robot actions and exchange of information among the robots. The level of synchronization and communication depends heavily on the task requirements, the characteristics of the robots, and the environment. Thus, different levels of coordination are required in different situations.

Coordination of a group of mobile robots is performed according to a multi-robot architecture developed in the present work. The architecture represents the organizational structure of the key components and their functional descriptions in a multi-robot system. Promoting cooperation or coordination among robots and enabling a member robot to make rational decisions are the main objectives of the architecture.

In this chapter, a framework for a group of robots to coordinate their activities and a hierarchical multi-robot architecture are proposed. This hierarchical multi-robot architecture integrates machine learning and behavior-based approaches. In Section 2.2, a fully distributed hierarchical multi-robot architecture for robotic decision making is described in detail. Section 2.3 presents a coordination framework for a group of robots. This framework for multi-robot coordination facilitates global sharing of information during the stage of object environment state identification. It also describes how obstacle identification in the object neighborhood may be combined with multi-robot coordination. Finally, in Section 2.4, relevant distributed control issues in multi-robot systems are discussed. The multi-robot architecture and coordination framework developed in this chapter will be validated using computer simulations and physical experiments. Details of the validation process will be presented in the subsequent chapters.

2.2 General Hierarchical Architecture for Multi-Robot Coordination

There are five main approaches for multi-robot coordination, as mentioned in Section 1.4.1: the behavior-based approach, the planning-based approach, the distributed control-based approach, the market-based approach and the learning-based approach. Among these, the behavior-based approach is the most popular one because of its simplicity and robustness. However, this approach is particularly suited to a known and structured environment. Its major drawback is that the designer must be able to predict all the available environment states, because he needs to design all possible behavior rules. Most realistic systems, however, need to make decisions in a dynamic and unknown environment, and this presents numerous challenges, as indicated below.

First, a human must design all the behavior rules, and individual robots must establish their preferences in advance. It is virtually impossible for a human designer to predict all possible environmental states that the robots will encounter and to design the relevant behavior rules for them.
On the other hand, within a dynamic and unknown environment, behavior-based robotic decision making can easily fail when the number of environmental states is large. Second, typically there will be a large number of environmental states in a dynamic and unstructured environment. The designer must design a general rule base to deal with all these environment states. However, it will be virtually impossible for the designer to arrange the preference for each behavior rule in an extensive rule base, due to the problem of "combinatorial explosion." Therefore a purely behavior-based approach for multi-robot coordination is not feasible in an unknown and dynamic environment.

The planning-based approach utilizes planning techniques of traditional artificial intelligence (AI) to establish optimal decisions for the robots (Miyata et al., 2002) in a multi-robot system. The centralized approach and the distributed approach are the two types of planning-based approaches for multi-robot coordination. In the centralized planning approach, the actions are planned by a single robot in the system and this robot assigns subtasks to all other robots in the team. Task allocation among multiple robots is an NP (non-deterministic polynomial) time problem in the theory of computational complexity. In fact, computational complexity is the key disadvantage of this approach. For a practical multi-robot system consisting of a large number of robots, finding an analytical solution through this approach is very difficult or even impossible. A compromise would be to approximate it by employing an AI-based optimization technique. Furthermore, a lack of reliable communication will make the centralized approach weak. On the other hand, the distributed planning approach plans the robot actions independently and achieves a globally optimal decision by integrating the individual decisions of all the robots. In view of its difficulty, few successful examples have been reported in the literature. In addition, both centralized and distributed planning approaches can easily fail in a dynamic and unknown environment, since these approaches are unable to alter their policies (action sequences) for the robots quickly enough to adapt to a rapidly changing environment.

The distributed control approach (Pereira et al., 2004; Wang and de Silva, 2005) utilizes a distributed control strategy such as "Form Closure," "Object Closure," or "Leader-Follower Control" to assign sub-tasks to the robots for controlling the team formation in the system. Distributed control is not a general approach. In the class of multi-robot cooperative transportation, it is only suitable for special task situations, and this is a major disadvantage of the approach. Furthermore, the approach of distributed control also suffers from the problem of computational complexity.

The market-based approach (Mataric et al., 2002) is a rather promising approach for multi-robot coordination in a dynamic environment. However, there are many open questions related to this approach. Its main challenges are computational complexity and real-time performance.

The machine learning-based approach attempts to circumvent the computational complexity problem, which affects the other multi-robot coordination approaches, by approximating the optimal policies of the robots.
In particular, in view of its good real-time performance and simplicity, reinforcement learning, particularly the Q learning algorithm, has been employed commonly (Parker et al., 2002; Martinson and Arkin, 2003; Dahl et al., 2004; Taylor and Stone, 2005; Wang and de Silva, 2007). In a survey paper, Arai et al. (2002) have recommended learning-based multi-robot coordination as an area that requires further research. The learning capability is essential for multi-robot coordination and task execution in a dynamic and unknown environment. As suggested in Chapter 1, the learning capability will enhance the adaptability to the dynamic environment and assist the robots in identifying new environmental states. The robots can continuously adjust their action policies and improve their performance based on the experiences of past success and failure in a dynamic environment, thus completing the required tasks in a faster and more robust manner. Furthermore, Wang and de Silva (2006) have shown that the single-agent learning scheme, when extended to a multi-robot system, will violate the original assumption of a stationary environment. Consequently, the learning algorithm may not converge to its optimal policy.

These observations clearly show that the problem of decision making for multi-robot coordination is a challenging undertaking. A single approach may not be able to solve all the problems encountered in a cooperative multi-robot transportation process. Therefore, a combination of several existing techniques will be employed to develop a multi-robot architecture that can effectively support multi-robot coordination and cooperation in a dynamic and unknown environment. This fully distributed multi-robot hierarchical architecture integrates two popular artificial intelligence (AI) techniques, namely, machine learning and behavior-based approaches, for decision making in a dynamic, unknown and unstructured environment. Figure 2.1 shows the proposed three-level multi-robot architecture, which offers improved robustness and real-time performance for multi-robot object transportation.

[Figure 2.1 shows the three layers of the architecture: a high-level task decomposition and assignment layer, a behavior-based distributed control layer, and a hardware level containing the robots.]

Figure 2.1: Hierarchical Architecture for Object Transportation.

The hierarchical architecture for object transportation integrates different techniques in a hierarchical manner to provide flexible and robust capabilities to the robots for decision making while familiarizing with the external environment dynamically. Basically, it consists of three layers: the Task Decomposition and Assignment layer, the Behavior-based Control layer, and the Hardware level layer, subject to the constraints of the available robots. Task Decomposition and Assignment represents the high-level layer. It has a learning unit and a coordination unit which are used to implement inter-robot coordination. The distribution of available information among the robots plays a vital role in multi-robot cooperative transportation. In particular, each robot in the system will collect information from its own sensors and transmit the information to the system. All robots in the system will gain an understanding of the sensor information of the peer robots, in addition to their own, through the communication network. Through such communication the robots determine the current world state cooperatively, and decide the robotic actions sequentially using their learning or coordination units so as to reach the goal.
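To make the layering of Figure 2.1 concrete, the following C++ skeleton sketches how the three layers and the two decision-making units of the top layer might be organized. This is a minimal illustrative sketch; all class and member names used here are assumptions and not the classes of the actual implementation.

    #include <string>
    #include <vector>

    // Hypothetical action type passed between layers.
    struct Action { std::string name; };

    // Decision-making units of the top layer (stubs only).
    class LearningUnit {
    public:
        Action decide(const std::vector<double>& worldState) {
            (void)worldState;                 // learned Q values would be consulted here
            return Action{"learned_action"};
        }
    };

    class CoordinationUnit {
    public:
        Action decide(const std::vector<double>& worldState) {
            (void)worldState;                 // leader-directed state identification would run here
            return Action{"coordinated_action"};
        }
    };

    // Top layer: task decomposition and assignment.
    class TaskDecompositionLayer {
    public:
        // An arbitrator switches between the two units: during environment state
        // identification the coordination unit decides, otherwise the learning unit does.
        Action decide(const std::vector<double>& worldState, bool identifyingState) {
            return identifyingState ? coordination_.decide(worldState)
                                    : learning_.decide(worldState);
        }
    private:
        LearningUnit learning_;
        CoordinationUnit coordination_;
    };

    // Middle layer: behavior-based distributed control.
    class BehaviorControlLayer {
    public:
        void execute(const Action& abstractAction) {
            (void)abstractAction;             // decompose into composite and primitive behaviors
        }
    };

    // Bottom layer: sensors, actuators and communication hardware.
    class HardwareLayer { /* drivers for CCD camera, laser range finder, wheels, network */ };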
The high-level layer performs inter-robot task decomposition and assignment. During the transportation process, each robot may use a learning unit or a coordination unit for making decisions. In particular, in the stage of environmental state identification, an arbitrator is assigned dynamically to monitor the environment state identification, and a coordination unit is used to coordinate the activities of that particular stage. The arbitrator decides which unit is used to make decisions for the stage of environment state identification. For example, when the robot navigates in a dynamic environment, usually the learning-based unit is activated; the arbitrator typically switches on the learning unit for decision making of the robot. However, once the arbitrator realizes that a robot within the system has identified the object and estimated its pose, it will turn off the learning unit temporarily and switch on the coordination unit. This coordinates the state identification stage for the robots in the system within a bounded local region called the detection area (details are found in Chapter 3). When the robots complete the state identification stage, the arbitrator will return the decision making capability to the learning unit. Through this type of switching mechanism, the multi-robot architecture benefits from multi-robot coordination and flexibly adapts to various environments.

A popular behavior-based approach is employed in the middle level of the architecture, to implement robust control of robotic actions. As stated above, the behavior-based approach is known to be efficient in multi-robot cooperative control. The hierarchical architecture utilizes the behavior-based approach to execute the sub-tasks determined by the top level. When the task decomposition and assignment level of the architecture establishes the current sub-tasks for the robots, they are sent down to the behavior-based control level so as to be decomposed in further detail into robot behaviors. The behavior-based control level consists of four types of behaviors. The communication behaviors are responsible for communication among the robots. The group behaviors execute coordination among the robots, for example, "Assume Formation," and they are typically fired by the learning unit or the coordination unit in the top level of the architecture. Composite behaviors are responsible for the control of composite actions of individual robots such as "Find Object," "Move to Location" and "Avoid Obstacle." These composite behaviors can be decomposed further into primitive behaviors such as "Turn Left," "Go Straight" and "Move Forward." Through the design of hierarchical behaviors and by carefully organizing their preferences, the robots can complete the required tasks as defined by the Task Decomposition and Assignment level, in an effective and robust manner.

The low level of the architecture is the hardware level. It consists of the sensors, actuators, and communication hardware of the robots. These compose the hardware platform of the hierarchical multi-robot architecture for object transportation. What is proposed is a general architecture for distributed multi-robot coordination and decision making. This architecture is common to all robots in the system. A composite representation of the proposed architecture is given in Figure 2.2.
[Figure 2.2 shows three robots (Robot 01, Robot 02, Robot 03), each with a High Level Control layer, a Behaviour Control layer, and a Hardware Level layer.]

Figure 2.2: Composite representation of the proposed architecture.

The top level of the architecture implements "soft" control of a robot while the middle level implements "hard" control of behavior. In particular, the top level decomposes and assigns tasks among the robots, and the behavior-based control layer decomposes the actions of a single robot further into more detailed primitive behaviors and controls the motions of the robot. By combining established mainstream approaches in multi-robot coordination, such as learning and behavior-based methods, this multi-robot architecture provides more flexibility and cooperating power in a dynamic, unknown and unstructured environment than those presented in the literature. The proposed multi-robot architecture will be validated through computer simulation and experimentation, as presented in the following chapters of the thesis.

2.2.1 Machine Learning Unit

The machine learning unit is located in the top level of the proposed multi-robot architecture. The main purpose of this unit is to learn optimal decision making in a given dynamic environment through continuously monitoring the actions of the robots and the corresponding environmental rewards. The principle of the machine learning unit is presented in Figure 2.3.

Figure 2.3: Principle of Machine Learning.

All robots in the system gain knowledge by continuously monitoring the current environmental state, selecting an action sequentially and executing it, and receiving rewards from the environment. At the sequential action selection stage, each robot needs to observe concurrently the actions of its peers in the system and review their effects on the environment. Each robot employs a machine learning technique to learn from past experience with the work environment. The machine learning technique developed in the thesis continuously improves its performance and approximates its optimal policy for decision making. The machine learning unit dynamically adjusts the mapping relation between the environment state and its actions. Consequently, the machine learning unit will lead to better performance of the robots in the system and will complement the behavior layer in the middle level. In this thesis, Q learning is utilized and enhanced to establish the mapping relationship between the environment state and the robot actions. The major challenge arises in the modification of the single-agent Q learning to accommodate a multi-robot environment. The next chapter will discuss this issue in detail.

2.2.2 Coordination Unit

In the prototype system there are three autonomous heterogeneous robots which attempt to push a rectangular object cooperatively from a starting position to a goal location. Many challenges exist in this transportation task. For example, there are obstacles distributed randomly in the environment and each robot possesses local sensing capabilities only. This means a robot will not know about an obstacle before it moves sufficiently close to the obstacle. Another challenge is that the object is rather long and heavy, so a single robot will not be able to handle it alone. The robots have to find optimal cooperative pushing strategies to move the object to the goal location while avoiding collisions with any obstacles in the environment.
In order to reach this goal, an arbitrator is designed in the top layer of the proposed architecture, which will select either the learning unit or the coordination unit for multi-robot decision making. When a multi-robot system begins its task, the arbitrator selects the machine learning unit as its decision making unit. The robots start exploring the environment for the color-coded object to be transported. Once an object is detected by a robot in the system, that robot rearranges its pose to center the color blob in the camera frame. Then the appropriate robot estimates the pose (position and orientation) of the object, and the information will be transmitted to the other robots. After the object is detected and the pose is estimated, the particular robot will temporarily close the machine learning unit and transfer its decision rights to the coordination unit. At the same time, the robot that detected the object is assigned as the leader of the team, and all other robots in the system will stop their learning units and transfer their decision rights to the coordination unit of the leader robot. After all the robots have transferred their decision making rights to the leader robot, the coordination unit is activated temporarily. The leader robot then assigns tasks to its peers in the system, which will assist in identifying the obstacle distribution around the object. The leader robot transmits all the processed sensor information to the system, and the robots will be able to establish the current local environment state individually after switching on their learning units. After the state identification stage, the coordination unit of the leader robot will stop working and will transfer the decision rights back to the learning units of all the robots.

Figure 2.4 shows a representation of the local environment state in the multi-robot object pushing task which employs the modified Q learning algorithm. This will be described in detail in the next chapter. In the present subsection it is demonstrated why the coordination unit is necessary for multi-robot tasks.

[Figure 2.4 shows the object within a circular detection area of given detection radius, the direction to the goal location, and the obstacles around the object.]

Figure 2.4: The environment state representation for coordination.

Both learning and coordination units are important for a multi-robot system to successfully complete cooperative tasks in a dynamic and unknown environment in the presence of a complex obstacle distribution.

2.2.3 Behavior-Based Distributed Control

The proposed hierarchical multi-robot architecture is represented in Figure 2.1. The behavior-based distributed control layer is designed to control the local actions of the robots in the system. It is located between the high-level task decomposition and assignment layer and the low-level hardware layer. Normally, the high-level task decomposition and assignment layer selects quite abstract actions, which cannot be directly executed using actuators in the hardware layer. Hence, these high-level abstract actions need to be decomposed into more detailed actions in the layer of behavior-based distributed control. The hierarchical behavior control architecture CAMPOUT has been developed at the NASA Jet Propulsion Laboratory (JPL) (Huntsberger et al., 2003) to control robot behavior in a robust manner. This hierarchical control architecture consists of two types of behaviors, called group behaviors and primitive behaviors.
The group behavior layer further decomposes the group behaviors into composite and communication behaviors. The mid-level behavior control layer based on CAMPOUT clearly indicates the hierarchy of communication behaviors, group behaviors and composite behaviors. Communication behaviors are high-level behaviors relative to the group behaviors, and group behaviors are accomplished through the communication behaviors. Composite behaviors are low-level behaviors relative to the group behaviors. The general idea of the behavior control layer is presented in Figure 2.5.

Figure 2.5: The hierarchical behavior-based control layer.

Behaviors serve as basic building blocks for robotic actions in "behavior-based systems." Simply put, a behavior is a reaction to a stimulus. Thus, in some literature, behavior-based systems are also called reactive systems. In reactive robotic systems, the actions of the robots are tightly coupled with their perceptions, without a knowledge reasoning process or time history. The foundation for behavior-based systems has originated from animal models. Biology has provided an existence proof that many of the tasks we want behavior-based robotic systems to undertake are indeed doable. Additionally, the biological sciences, such as neuroscience, ecology, and psychology, have elucidated various mechanisms and models that may be useful in implementing behavior-based robotic systems.

Behavior-based systems have several advantages over planning-based systems. First, a behavior-based system is simple and easy to design. For a fast-changing environment, a planning-based system is difficult or even impossible to implement because it cannot adapt to such an environment. A behavior-based system is much easier to design than a traditional system with planning and reasoning, by tightly coupling perceptions with actions and avoiding complicated, time-consuming reasoning and planning. Also, a behavior-based system can be successful even in a dynamic and unknown environment, and it has shown better performance and robustness in physical robotic systems, provided that the designer can predict all possible environmental states and arrange the corresponding behavior rules to deal with them.

2.2.3.1 Behavior Representation

Several methods are available for expressing robotic behaviors. These methods are the stimulus-response (SR) diagram, functional notation, and the finite state acceptor (FSA). SR diagrams are used for graphical representation of specific behavior configurations. Here, behaviors are expressed using SR diagrams. Figure 2.6 shows an SR diagram.

Figure 2.6: SR diagram of a simple behavior.

According to Figure 2.6, a behavior can be expressed using the stimulus, the response, and the behavior itself. Therefore, the triple (S, R, β) represents the behavior, where S denotes the set of all possible stimuli, R represents the range of the response, and β denotes the mapping function between S and R.

A binary-coded duple (p, λ) represents an individual stimulus s (s ∈ S), where p indicates the type of the stimulus and λ denotes the strength of the stimulus. A threshold value τ is assigned for each stimulus, and a response will be generated only if the strength of the stimulus is greater than τ.

In addition, a response r can be expressed as a six-dimensional vector consisting of six parameters for mobile robots.
For each of the six degrees of freedom of motion, a parameter encodes the magnitude of the translational or rotational response:

r = [x, y, z, θ, φ, ψ], where r ∈ R    (2.1)

A gain value g is introduced for a particular behavior β in order to further modify the magnitude of its response. A behavior is then expressed as

r' = g * r = g * β(s)    (2.2)

2.2.3.2 Behavior Composition and Cooperation

Behavior coordination has two predominant classes: the competitive (composition) class and the cooperative class. Each class has several strategies for realization. First consider the composition of behaviors, which is described in Figure 2.5; it clearly shows how composite behaviors are decomposed further into primitive behaviors. Therefore, simple primitive behaviors can be joined to construct more abstract composite behaviors. Composite behaviors are the behavior packages on which a behavior-based robot system is built. A behavior coordination operator and a number of behavioral components are included with each composite behavior. The behavioral components may be either primitive behaviors or composite behaviors. High-level behaviors can be built on some simple behaviors, and these high-level behaviors can be referenced without needing to know their behavioral components. A composite behavior is given as follows:

ρ = C(G * R) = C(G * B(S))    (2.3)

where R = [r1, r2, ..., rn]^T, S = [s1, s2, ..., sn]^T, G = diag(g1, g2, ..., gn), and B = [β1, β2, ..., βn]^T.

Here, B stands for the vector of the n current composite behaviors (β1, β2, ..., βn); S represents a vector of all detectable stimuli relevant to each behavior βi at time t; R represents a vector of the responses of the n behaviors (β1, β2, ..., βn) at time t; G represents a gain matrix which modifies the response matrix R; and C represents the behavior coordination operator for the composite behavior. (There are two types of behavior coordination operators, called the competitive operator and the cooperative operator, corresponding to the two classes of behavior coordination.) Figure 2.7 shows the SR diagram of a composite behavior with a competitive operator.

Figure 2.7: The SR diagram for a composite behavior with a competitive operator.

According to this figure, conflict can result when two or more behaviors are active simultaneously, each with its own independent response. The competitive method provides a means of coordinating behavior responses for conflict resolution. The competitive operator is designed to select the winner behavior from among the simultaneously activated behaviors. Only the winner's response is directed to the robot for execution. There are numerous competitive operators; for example, suppression-based arbitration, arbitration via action selection, voting-based arbitration, and so on. The other type of behavior coordination operator is the cooperative operator, which is shown in Figure 2.8.

Figure 2.8: The SR diagram of a composite behavior with a cooperative operator.

The cooperative method provides an alternative to a competitive method such as arbitration. It uses behavior fusion and provides the ability to concurrently use the output of more than one behavior at a time. The outputs of multiple behaviors are fused and directed to the robot for execution. It has been pointed out that multiple behaviors are executed by the robot one at a time if they do not conflict with each other. The simplest cooperative operator is realized through vector addition or superpositioning.
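To illustrate the two classes of coordination operators just described, the following C++ sketch fuses behavior responses either competitively (winner-take-all arbitration) or cooperatively (gain-weighted vector summation, in the spirit of potential-field methods). The planar response structure is a simplified stand-in for equation (2.1) and is an assumption made only for illustration.

    #include <cstddef>
    #include <vector>

    // Simplified planar response: translation (x, y) and rotation (theta).
    struct Response {
        double x = 0.0, y = 0.0, theta = 0.0;
        double activation = 0.0;   // strength of the stimulus that produced the response
    };

    // Competitive operator: winner-take-all arbitration; only the most strongly
    // activated behavior drives the robot.
    Response competitiveOperator(const std::vector<Response>& responses) {
        Response winner;
        for (const Response& r : responses) {
            if (r.activation > winner.activation) winner = r;
        }
        return winner;
    }

    // Cooperative operator: gain-weighted vector summation of all responses.
    Response cooperativeOperator(const std::vector<Response>& responses,
                                 const std::vector<double>& gains) {
        Response fused;
        for (std::size_t i = 0; i < responses.size(); ++i) {
            fused.x     += gains[i] * responses[i].x;
            fused.y     += gains[i] * responses[i].y;
            fused.theta += gains[i] * responses[i].theta;
        }
        return fused;
    }

A competitive operator of this kind would typically arbitrate between conflicting behaviors such as "Avoid Obstacle" and "Move to Location," whereas a cooperative operator is appropriate when the responses can be meaningfully superimposed.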
The key to a cooperative operator is to design the behaviors so that their responses can be fused. The potential field-based method is the most popular approach, which simply adds the outputs of multiple behaviors.

2.2.3.3 Communication Behaviors for Coordination

For coordinating the activities of the robots in a multi-robot system, communication behaviors are designed to receive information from the high-level task decomposition layer and to exchange sensing information, objectives, and internal states of the robots. The coordination of communication behaviors is implemented through the high-level task decomposition layer, which is located above the behavior-based control layer. The communication behavior layer sends its received information to the group behavior layer for low-level control. Explicit communication and implicit communication are the two types of communication. Explicit communication concerns sending or receiving information via computer network data links. However, robots also have the ability to exchange information indirectly with their teammates by interacting with the environment or through sensing. For example, a robot can estimate its relative distance from another robot or from the closest obstacle by using its CCD camera or laser range finder.

2.2.3.4 Group Behaviors for Coordination

The group behavior layer is responsible for decomposing the abstract behaviors which are received from the high-level task decomposition layer through the communication behavior layer. The group behavior layer will carefully coordinate the activities of the robots with the purpose of managing and resolving conflicts among the robots and making them contribute to the common tasks. The learning or coordination unit continuously decomposes subtasks among multiple robots and assigns them to each robot. When a robot receives its sub-task, it will decompose the sub-task further into more detailed composite behaviors in its behavior-based distributed control level and finally convert them into primitive behaviors for execution.

2.2.3.5 Behavior Coordination for the Object Transportation Subtask

A fully distributed multi-robot coordinated object transportation system is developed in this thesis. The behavior-based distributed control layer facilitates the implementation of the subtasks which are assigned by the learning/coordination units in the top level of the proposed architecture. Figure 2.9 shows the behavior hierarchy in the behavior-based distributed control layer for the coordinated object transportation subtask. In the object transportation subtask, first the robots wander in the environment, looking for an object of interest. The group behavior "Assume Formation" makes a robot move to an intended site around the transported object so as to establish a formation with its peer robots. The learning/coordination unit in the top level of the proposed architecture establishes the transportation formation of the robots; it decomposes the formation into the specific positions and orientations of the robots, and sends the determined subtasks to the behavior control layer for execution. When the "Assume Formation" parent behavior is activated, it will first turn on its child behavior "Find Object," which attempts to search for the object to be transported. If the object is found in the environment, the robot estimates the object pose (position and orientation) in the environment. A CCD camera and a laser range finder are used to accomplish this estimation.
Simultaneously, the composite behavior "Find Object" will call the "Search Color Blob" composite behavior to make the robot rearrange its pose continuously, locate the predefined color blob attached to the object, and bring it to the center of the camera frame, by using its CCD camera. The predefined color blob can be used to distinguish the object of interest from other objects or obstacles in the environment. Then the "Laser Measurement" primitive behavior is triggered to estimate the pose (position and orientation) of the object using the laser range finder. After that, the composite behavior "Find Obstacle" will call the "Laser Measurement" primitive behavior to measure the distances from the object of interest to the obstacles, as well as their angles and sizes. Then, the behavior "Move to a Location" will attempt to move the robot to the designated action position as decided by its higher-level learning/coordination unit. The composite behaviors "Move" and "Obstacle Avoidance" are activated simultaneously, until the robot moves to its designated action position.

Figure 2.9: Behavioral hierarchy for the coordinated transport sub-task.

The group behavior "Carry" is activated after all robots reach their designated action positions and create a formation around the object of interest. When the "Carry" parent behavior is activated, it will first turn on its child behavior "Signal," which attempts to start a synchronization process to transport the object. During the transportation process, the robots need to monitor whether the object collides with any obstacles in the environment. If the object collides with an obstacle, the robots will abandon the "Carry" behavior and report the incident to their learning/coordination units in the top-level task decomposition and assignment layer. The learning/coordination units then plan the formation again.

Figure 2.10: FSA describing the transitions of the group behaviors in the task of coordinated object transportation (Wang, 2007).

A finite state acceptor (FSA) diagram is a method for expressing robotic behaviors. Here, FSA diagrams are used to describe the aggregations and sequences of the behaviors, so that the process of behavior activation and transition becomes more explicit. An FSA can be specified by the quadruple (Q, δ, q0, F), with Q representing all behavioral states; q0 representing the initial state; F representing all final (termination) states; and δ representing the mapping function which maps the current behavioral state q and input a into a new behavioral state q', i.e., q' = δ(q, a). The FSA describing the transitions of the group behaviors in the task of coordinated object transportation is shown in Figure 2.10.

2.3 Mechanism for Multi-Robot Coordination

A multi-robot coordination mechanism must cope with different types of requirements, such as robot positioning and synchronization. It should provide flexibility and adaptability, allowing multi-robot teams to execute tasks efficiently and robustly. To accomplish this, a dynamic role assignment mechanism is proposed in which robots can dynamically change their behavior, enabling them to successfully execute different types of cooperative tasks. This approach is validated using environmental state identification.
A role is defined as a function, such as environment state identification, which one or more robots perform during the execution of a cooperative transportation task. The particular state identification stage consists of three roles (modes): s1 - Wandering, s2 - Object Identification, and s3 - Obstacle Identification.

2.3.1 Dynamic Role Assignment Mechanism for the Coordination Unit

In general, to execute a cooperative role, a team of robots must be coordinated; specifically, they have to synchronize their activities and exchange information. In this approach, each robot performs a role that determines its activity during the cooperative task. According to its internal state and the information about the other robots and their roles received through communication, a robot can dynamically change its role. The mechanism for coordination is centralized, but it is activated only for a short time. As mentioned in Section 2.2, the leader robot has the power to make decisions based on its local information and the information received from its peers. Each team member has a specification of the possible activities that should be performed during each phase of cooperation in order to complete the state identification task. These activities must be specified and synchronized considering several aspects, such as robot properties, task requirements, and characteristics of the environment. The dynamic role assignment is responsible for allocating the correct activities to each robot and synchronizing the cooperative execution through the leader robot in the system.

Before describing the details of the role assignment mechanism, it is necessary to define what a role is in a cooperative transportation task. A role is defined as a function that one or more robots perform during the execution of the state identification stage, which is a cooperative robotic task according to the modified Q learning algorithm. Each robot will perform a role while certain internal and external conditions are satisfied, and will assume another role otherwise. The role defines the present activity of the robot at that instance. Here, three distinct conditions are considered for role transition. A CCD camera and a laser range finder are used as sensors for this particular task, generating high-level stimuli to the system.

The role assignment mechanism allows the robots to change their roles dynamically during the execution of a task, adapting their activities according to the information they have about the system and the task. Basically, in the state identification stage there are two ways of changing roles during the execution of a cooperative task. The simplest way is Allocation, in which a robot assumes a new role after finishing the execution of another role. In the Reallocation process, a robot terminates the performance of one role and starts or continues the performance of another role. Figure 2.11 shows a diagram with the two types of role assignment. The vertical bars inside the robots indicate the percentage of the role execution that has been completed.

Figure 2.11: Types of role assignment.

The control software for a robot includes programs for each role it can assume. The local information consists of the robot's internal state and its perception of the environment. Global information contains data about the other robots and their view of the system, and is normally received through explicit communication (message passing).
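As an illustration of the two ways of changing roles described above, the following C++ sketch contrasts Allocation and Reallocation. The role names follow the state identification stage, and the utility-gain test used for Reallocation is an assumed placeholder that anticipates the threshold rule discussed next; it is not the implementation used in the thesis.

    // Roles of the state identification stage.
    enum class Role { Wandering, ObjectIdentification, ObstacleIdentification };

    struct RoleState {
        Role current = Role::Wandering;
        bool finished = false;            // set when the current role has been completed
    };

    // Allocation: a robot assumes a new role only after finishing its current role.
    void allocate(RoleState& state, Role next) {
        if (state.finished) {
            state.current = next;
            state.finished = false;
        }
    }

    // Reallocation: a robot abandons its current role when another role is clearly
    // better; utilityGain is an assumed measure of (utility of new role - utility of
    // current role), and threshold guards against frequent role switching.
    void reallocate(RoleState& state, Role next, double utilityGain, double threshold) {
        if (utilityGain > threshold) {
            state.current = next;
            state.finished = false;
        }
    }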
A key issue is to determine the amount of global and local information necessary for the role assignment. This depends on the type of activity that is being performed. The present approach allows for two types of explicit communication: synchronous and asynchronous. In synchronous communication, the messages are sent and received continuously at a constant rate, while in asynchronous communication an interruption is generated when a message is received. Synchronous messages are important in situations where the robots must receive constant updates about the activities of the others. On the other hand, asynchronous communication is used for coordination when, for example, the leader robot needs to inform the others about the object pose, or the leader robot needs to interrupt the wandering role of the other robots, and so on.

An important requirement is to define when a robot should change its role. In the role allocation process, the robot detects that it has finished its role and assumes another available role. For example, consider the robots which wander in the work environment to find a color-coded object. If a robot detects the color-coded object, then it needs to identify the object (particularly, estimate the object pose). In the reallocation process, the robots should know when to abandon the current role and assume another role. A possible way to do that is to use a function that measures the utility of performing a given role. A robot which performs role r has a utility given by ρ_r. When a new role r' is available, the robot computes the utility ρ_r' of executing the new role. If the difference between the utilities is greater than a threshold τ, i.e., (ρ_r' − ρ_r) > τ, the robot will change its role. The function ρ can be computed based on local and global information and may be different for different robots, tasks and roles. Also, the value τ must be chosen such that the possible overhead of changing roles will be compensated for by a substantial gain in utility, consequently producing a better overall performance.

2.3.2 Dynamic Role Assignment Model for the Coordination Unit

The dynamic role assignment can be described and modeled in a more formal framework. In general, a multi-robot system can be described by the robotic state (X), which is a concatenation of the states of the individual robots:

X = [x_1, x_2, ..., x_n]    (2.4)

Considering a simple control system, the state of each robot varies as a function of its continuous robotic state (x_i) and the input vector (u_i). Also, each robot may receive information about the other robots in the system (x̂_i). This information consists of estimates of the robotic states of the other robots that are received mainly through communication. We use the hat (^) notation to emphasize that this information is an estimate, because the communication can suffer delays, failures, and so on. Using the role assignment mechanism, at each moment a robot will be controlled using a different continuous equation according to its current role in the state identification stage. Therefore, we use the subscript q to indicate the current role of a robot.
By following this description, the state equation of robot i during the execution of the task is given by:

ẋ_i = f_iq(x_i, u_i, x̂_i)    (2.5)

Since each robot is associated with a control policy, we have

u_i = k_iq(x_i, x̂_i)    (2.6)

and since x̂_i is a function of the state X, we can rewrite the state equation as:

ẋ_i = f_iq(X)    (2.7)

or, for the whole team,

Ẋ = F_q(X), where F_q = [f_1q, ..., f_nq]^T    (2.8)

The continuous behavior of each robot (which represents the robotic state at any instant) is modeled by these equations. Based on these equations, the continuous actions of the whole team of robots are implemented during the execution of a cooperative state identification stage. These equations, together with the roles, role assignments, continuous and discrete variables, communication, and synchronization, can be better understood and formally modeled using a hybrid automaton A, as described next. This automaton contains a finite number of real-valued variables that change continuously, as specified by differential equations and inequalities, or discretely, according to specific assignments.

The state identification stage consists of both discrete states (roles) and continuous states (robotic states). Therefore, the nature of the particular multi-robot system in the state identification stage is hybrid, given by

A = {S, V, T, f, cond1, cond2, init, R}    (2.9)

Here S = {1, 2, ..., k} is the set of roles, and this is a discrete set of states of the robot. According to Figure 2.12, a role can be composed of other roles, which are called hierarchical/sequential compositions. The variables of the system V (V = V_d ∪ V_c) are composed of discrete information (V_d) and continuous information (V_c); V_c represents the robotic states and V_d represents the sensory information within each role. The dynamics of the continuous information are determined by the flows f. Generally, the continuous information is updated according to the differential equations inside each role (f_q). Discrete transitions between pairs of roles (p, q) are specified by the transition T, which represents the role assignment. cond1 and cond2 are predicates related to the set of roles (S) and the transition T, respectively, and both cond1 and cond2 define when each robot will assume a new role. The system can stay in a certain control mode while its cond1 is satisfied, and can take a transition when its cond2 (jump condition) is satisfied. The initial states of the system are given by init, and each transition T can also have an associated reset statement R, to change the value of some variable during a discrete transition using discrete information. Furthermore, variables are also updated according to the reset statement of each discrete transition. Finally, we can model the execution of the cooperative state identification stage using a parallel composition.

Figure 2.12 shows a role composed of the three roles: s1 - Wandering, s2 - Object Identification, and s3 - Obstacle Identification. Each role is characterized by differential equations and algebraic constraints. The conditions under which transitions between roles occur and the conditions under which robots stay in a particular role are specified.

Figure 2.12: Schematic representation of a role composed of three roles.

Communication among robots can also be modeled in this framework. We use a message passing mechanism to exchange information among the agents.
To model message passing in a hybrid automaton, we consider that there are communication channels between the agents, and use the basic operations send and receive to manipulate the messages. In the hybrid automaton, messages are sent and received in discrete transitions. These actions are modeled in the same way as assignments of values to variables (reset statements). It is rather common to use a self transition, i.e., a transition that does not change the discrete state, to receive and send messages.

2.4 Distributed Control Issues for Multi-Robot Coordination

Normally, a multi-robot system with a distributed control architecture does not employ a leader robot to make decisions for all other robots in the system. The proposed hierarchical multi-robot architecture is also a distributed control architecture. In a distributed control system, each robot needs to decide its own actions independently by observing the environment or by communicating and negotiating with its teammates. The only exception is the coordination unit in the proposed architecture, which requires centralized decision making when the robots need to identify the environmental state. However, the centralized coordination unit works only during the state identification phase, which exists for a very short period of time compared to the transportation task. Once the multi-robot system leaves the state identification stage, the learning unit will take over the control of the robot again.

A distributed architecture has many advantages over a centralized control architecture, especially for multi-robot systems. First, a distributed multi-robot system has the ability to increase the number of robots in the robot team without significantly increasing its computational cost. This represents scalability. For example, the architecture proposed in this chapter can be expanded from 3 to 20 robots without serious problems. If a member robot quits or a new robot enrolls into the system during the transportation process, the system is capable of adapting to changes of the team size dynamically. For a centralized multi-robot system, scalability is a difficult issue because the computational complexity of a centralized decision making algorithm increases rapidly with the number of robots.

Second, distributed systems have a low communication volume. A decentralized distributed system does not assign a leading robot; therefore, communication bottlenecks are less likely to appear. In contrast, a centralized system has a central leader, and when the number of robots in the system increases, the communication bottleneck can become so serious that the algorithm becomes worthless. Furthermore, distributed systems are capable of accomplishing some of the cooperative tasks even without communication. In particular, most realistic mobile robot systems are expected to work with a limited communication capability, because most multi-robot systems need to run in environments where communication among robots is usually unreliable; for example, the bottom of a deep ocean or a planetary surface. If the system cannot establish reliable communication among the robots, centralized decision making becomes weak because the member robots cannot receive commands from their leader in a timely and accurate manner. However, a distributed multi-robot system may work well even when communication among the robots is not reliable.

Third, distributed systems are robust. Each robot in the system can make its decisions individually to accomplish a common task.
When one robot in the system malfunctions or quits, the other robots still can make decisions by themselves to continue their tasks. However, in a centralized system one leader robot makes decisions for all other robots in the team. Hence, once the leader robot fails, the whole system will fail entirely. Therefore, such systems are not robust compared to distributed control systems.

Low efficiency is a major disadvantage of a distributed system, where multiple robots reach their agreements through some cooperation technique such as voting, auction, and so on. These cooperation approaches are usually time consuming and are not efficient. Normally, distributed learning algorithms are much more difficult and complex than their centralized equivalents. Therefore, a distributed system is usually not optimal, because the final policy reached by the members is not optimal. It is still an open question as to how to implement efficient and optimal distributed decision making.

2.5 Summary

In this chapter, a three-layer distributed multi-robot architecture, which accommodates distributed machine learning, behavior-based distributed control and low-level "hard" control, was developed. The objective of the architecture is to support multi-robot systems in properly coordinating the robotic tasks in a dynamic and unknown environment. The proposed architecture is a general platform that merges several existing techniques of multi-robot decision making, and accommodates homogeneous or heterogeneous robots in the same framework. Before proposing the architecture, several popular multi-robot coordination approaches given in the literature were discussed, and their advantages and disadvantages were compared. The discussed multi-robot coordination approaches are the behavior-based approach, the planning-based approach, the distributed control-based approach, the market-based approach, and the learning-based approach.

Next, the machine learning unit and the coordination unit were introduced. In the proposed architecture, an arbitrator dynamically selects either the learning unit or the coordination unit to make decisions for the robots. Then the behavior-based control layer and its sub-layers were presented. In this thesis, this layer is combined with the learning and coordination units in the top level of the architecture to control the actions of a robot. After that, the mechanism of multi-robot coordination was presented. This framework, which facilitates a group of robots in coordinating their activities, allows global sharing of information during the stage of object environment state identification. Finally, some distributed control issues of multi-robot systems were discussed. The advantages and disadvantages of both centralized and distributed architectures were pointed out and some open questions in this field were identified.

Chapter 3 Machine Learning for Multi-Robot Decision Making

3.1 Overview

Multi-robot decision making is a challenging task in many real-world problems, which involve real-time operation of robotic systems in uncertain and dynamic environments. In these problems, the actions and interactions of multiple mobile robots have to be considered concurrently so as to reach a globally optimal policy. Among the available approaches, the planning-based approach is a contender for dealing with this problem. However, a planned optimal policy is usually feasible only in static environments, and it has been shown that the planning-based approach is inapplicable to fast-changing environments.
Current approaches for decision making in uncertain and dynamic multi-robot environments, such as planetary surfaces and urban environments, have two main difficulties. First, many approaches do not take into account the uncertainty inherent in these problem domains. For example, in the multi-robot environment, terrain features can change continuously. Second, for systems with many robotic agents, state-of-the-art methods can grow to be too cumbersome and suboptimal. For example, even though the physical environment is stationary, from the perspective of a single robot the particular static environment is still a dynamic environment, because the other robots of a multi-robot system take actions, continuously making changes to the environment.

In view of these considerations, the behavior-based approach and the machine learning approach are the primary approaches that are capable of overcoming the challenges of multi-robot systems operating in dynamic, unstructured, and unknown environments. As mentioned in the previous chapter, the behavior-based approach is more robust for a dynamic environment than the planning approach. This is because it can quickly adapt to a fast-changing environment, by identifying the current environment state and coupling it with a specific action. However, the behavior-based approach is known to become unsuccessful in a complex environment with a large number of environment states, even though it is highly effective in simple environments. In particular, the behavior-based approach needs a human designer to design all behavior rules, and the designer needs to foresee all possible world states that the robots will encounter. Therefore it is almost impossible for this approach to be effective in a dynamic and unknown environment with a large number of states.

Considering how humans accomplish physical tasks in a dynamic environment, it is not difficult to come to the conclusion that a human employs not only planning or reactive (behavior-based) techniques but also learning techniques. Humans identify new environment states through their learning capabilities, recall corresponding optimal actions from past experience, and improve their planning and reactive techniques. In other words, humans explore unexpected environment states with less uncertainty and make their decisions more robust and effective in a dynamic environment. Therefore, machine learning has become an important subject in the multi-robot field. Machine learning is considered a vital development due to the highly dynamic nature of typical multi-robot environments. Among the existing machine learning approaches, reinforcement learning (RL), especially Q learning, is the most commonly used one in multi-robot systems due to its simplicity and good real-time performance.

The single-agent Q learning algorithm is the most common Q learning algorithm. If we directly extend the single-agent Q learning algorithm to a multi-robot system, the assumption of a static environment is violated, and as a result the values of the Q-table may not converge. Although the robots still can find some good policies for cooperation, the performance of the whole team of robots can be degraded when the approach is extended in this manner (Wang and de Silva, 2007). Therefore, an improved version of the single-agent Q learning algorithm is proposed here for robot decision making, which can come up with an optimal solution to the action selection conflicts of the robots in a multi-robot system.
Hence the proposed Q learning algorithm is suitable for real-time operation of cooperating multi-robot systems.

In this chapter, machine learning is used to make decisions for a multi-robot object transportation task. Markov decision processes (MDP) and the single-agent Q learning algorithm are introduced in Section 3.2 and Section 3.3, respectively. The single-agent Q learning algorithm will converge to an optimal policy in an MDP problem without a transition model. The modified Q learning algorithm is presented in Section 3.4 and it is validated through computer simulation. Section 3.5 presents the distributed decision making unit of Q learning-based multi-robot object transportation, which is validated through computer simulation.

3.2 Markov Decision Process (MDP)

The Markov decision process framework for decision making, planning, and control is surprisingly rich in capturing the essence of purposeful activities in various practical situations. MDP models and associated problems arise in many areas, including medical decision making, maintenance planning, and mobile robot navigation. Therefore MDP is a popular topic in artificial intelligence, and it facilitates an intelligent agent in making decisions in real-world problems. The basic concept of MDP is introduced now. An MDP can be defined by the tuple <S, A, T, R, β>, where

• S = {s1, s2, ..., sn} is a set of states of the environment

• A = {a1, a2, ..., am} is a set of actions available to the robot

• T: S × A → Π(S) is a transition function, which decides the next environmental state s' when the robot selects an action a_j under the current state s. Π(S) is the probability distribution over the states S.

• R: S × A → ℜ is a reward function. When the robot takes an action a_j under the current state s, it determines the immediate reward.

An MDP presents a sequence of decisions made by an agent or a robot in an environment with many states. At time t, the agent or robot is required to observe the environment and determine its current state s ∈ S, and then select an action a ∈ A based on its policy π, which denotes the action to be taken by the robot or agent to reach the desired state. Further, there is a reward function which denotes the immediate reward when the agent or robot takes an action a under the current state s.

Two major attributes of an MDP make it interesting and challenging. First, the subsequent state s' is not deterministic when the agent or robot takes an action a under the current environmental state s. Instead, it has a probability distribution Π(S) over all the states. This probability distribution is defined by a transition function or transition model T(s, a, s'). Second, the MDP's transitions among states are Markovian; that is, the probability of reaching s' from s depends only on s and not on the history of previous states.

The performance of an agent or robot policy π is measured by the rewards received when it makes a sequence of decisions according to the policy and by the sequence of states it visits. This measurement is usually represented by a sum of discounted rewards, as given by

E[ Σ_{t=0}^{∞} β^t R(s_t) ]    (3.1)

where 0 < β ≤ 1 is the discount factor, and R(s_t) is the reward received when the agent visits the state s_t at time t. Because the transitions among states are not deterministic, in view of the probabilistic nature of the transition function T(s, a, s'), given a policy, the sequence of states visited by the agent or robot each time is not fixed and it has a probability distribution.
Therefore, an optimal policy π* is a policy that yields the highest expected sum of the discounted rewards, and it is given by

π* = argmax_π E[ Σ_{t=0}^{∞} β^t R(s_t) | π ]    (3.2)

Given an MDP problem, the way to find all optimal policies has to be carefully considered, if more than one optimal policy exists. The value iteration algorithm has been developed to find the solution to this problem (Russel and Norvig, 2003). In this algorithm, first the concept of the "utility of a state" is defined as

U(s) = E[ Σ_{t=0}^{∞} β^t R(s_t) | s_0 = s ]    (3.3)

From equation (3.3), the utility of a state s is given by the expected sum of discounted rewards when the agent executes the optimal policy. If we have the utilities of all states, action selection (the agent's or robot's decision) becomes easy; specifically, select the action that maximizes the expected utility of the subsequent state:

π*(s) = argmax_a Σ_{s'} T(s, a, s') U(s')    (3.4)

However, determination of the utilities of all states is not an easy task. Bellman found that the utility of a state can be divided into two parts: the immediate reward for that state and the expected discounted utility of the next state, as given by

U(s) = R(s) + β max_a Σ_{s'} T(s, a, s') U(s')    (3.5)

Equation (3.5) is the Bellman equation. There is a corresponding Bellman equation for each state. For example, if there are n states in total, then there are n Bellman equations for those states. The utilities of the n states are found by solving the n Bellman equations. However, the Bellman equation is a nonlinear equation because it includes a "max" operator. Analytical solutions cannot be found using techniques of linear algebra. The only way to find the solutions of the n Bellman equations is to employ iterative techniques. The value iteration algorithm is an approach to find the utilities of all states, and is given below.

Function Value-Iteration(mdp, ε) returns a utility function
  inputs: mdp, an MDP with states S, transition model T, reward function R, discount β, and ε, the maximum error allowed in the utility of any state
  local variables: U, U', vectors of the utilities for the states in S, initially zero; δ, the maximum change in the utility of any state in an iteration
  repeat
    U ← U'; δ ← 0
    for each state s in S do
      U'(s) ← R(s) + β max_a Σ_{s'} T(s, a, s') U(s')
      if |U'(s) − U(s)| > δ then δ ← |U'(s) − U(s)|
  until δ < ε(1 − β)/β

Figure 3.1: The Value Iteration Algorithm (Russel and Norvig, 2003).

3.3 Reinforcement Learning

The value iteration algorithm was introduced in Section 3.2 to solve MDP problems. However, the environment model T(s, a, s') and the reward function R(s) are usually unknown in a real robot decision making situation. Therefore, the value iteration algorithm is not practically viable for real robot decision making. Usually, robots are unable to obtain perfect information of the environment model for employing the value iteration algorithm to solve the Markov decision process (MDP) problem. Reinforcement learning is a viable solution to overcome this challenge.
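For contrast with the model-free learning introduced next, the following C++ sketch shows how the value iteration procedure of Figure 3.1 could be coded when the transition model T and the reward function R are known. It is a minimal illustrative implementation, not the code used in the thesis, and the dense-array data layout is an assumption made for simplicity.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Value iteration for a small MDP with a known model.
    // T[s][a][s2] = probability of reaching state s2 when action a is taken in state s.
    // R[s]        = immediate reward of state s.
    std::vector<double> valueIteration(
            const std::vector<std::vector<std::vector<double>>>& T,
            const std::vector<double>& R,
            double beta,      // discount factor, 0 < beta < 1
            double epsilon)   // maximum error allowed in the utility of any state
    {
        const std::size_t nStates  = R.size();
        const std::size_t nActions = T[0].size();
        std::vector<double> U(nStates, 0.0), Unew(nStates, 0.0);

        double delta;
        do {
            U = Unew;
            delta = 0.0;
            for (std::size_t s = 0; s < nStates; ++s) {
                double best = -1.0e300;
                for (std::size_t a = 0; a < nActions; ++a) {
                    double expected = 0.0;
                    for (std::size_t s2 = 0; s2 < nStates; ++s2) {
                        expected += T[s][a][s2] * U[s2];
                    }
                    best = std::max(best, expected);
                }
                Unew[s] = R[s] + beta * best;                       // Bellman update, equation (3.5)
                delta = std::max(delta, std::fabs(Unew[s] - U[s]));
            }
        } while (delta >= epsilon * (1.0 - beta) / beta);           // termination test of Figure 3.1
        return Unew;
    }

The sketch makes the practical limitation explicit: every update requires the full transition model T and reward function R, which a real robot normally does not have.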
There are numerous variants of reinforcement learning among which Q-learning is the most common algorithm. A typical Q-learning algorithm is introduced below. For each state  1 s s, e (s , 2  entry  to zero. Initialize v to 0.9. Initialize the discount factor 0< fi  ,  ,a) 1 Q(s  and the learning rate  ...,  0<i  •  Observe the current state .c  •  Do repeatedly the following:  and action a. e (a ,a 1 ,..., am), initialize the table 2  n)  1  1  • Probabilistically select an action ak with probability eQsT  P(ak) =  ,  m  and execute it  eQar  • Receive the immediate reward • Observe the new state  s’  • Update the table entry for • Q(s,ak)  =  r  Q(s, ak)  (1—i1)Q(s,ak)+77(r+,6rnaxQ[s’,a’])  •  Figure 3.2: The single-agent Q-learning algorithm. An empty Q-table is set up and all its entries are initialized to zero at the initiation of the Q-learning algorithm. The Q-table is a 2D table. Its rows represent the environment states  52  and its columns represent the actions available to the agent or the robot. Q(s ,a) 1 represents the value of a Q-table entry and describes how desirable it is to take the action under the state  Si,  while the utility U(s ) in section 2.2 represents how appropriate it 1  is for the agent or the robot to be in the state s,,. Parameters such as the learning rate r, the discount factor ,8 and the “temperature” parameter z, have to be initialized as well. During the operation, the agent or the robot observes the environment, determines the current state s, selects an action  ak  probabilistically with probability given in equation  (3.6) and executes it. es )Ir P(ak)=  (3.6)  m  Once the agent or the robot executes the action ak, it will receive a reward r from the environment and will observe the new environment states’. Based on the information of the immediate reward (r) and the new environment states’, the agent will update its  Q  table by utilizing the following equation: Q(s, ak) = (1— 77)Q(s, ak) +  (r +  ,Bmax Q[s’,a’j)  (3.7)  After updating the Q-table, the current environment state is transited from s to s’. Based on the new state, the above operations are repeated until the values of the Q-table entries converge. In the Q-Learning algorithm described in Figure 3.2, a s—greedy search policy presented in equation (3.6) is used instead of a greedy policy which always selects the action with the maximum  Q value. In a s greedy search policy, with probability s, the agent or —  robot has an ability to select a unique action uniformly and randomly among all possible actions, and with probability 1-6, the agent or the robot selects the action with a high value under the current state. In addition, the probability benefit of the s  -  S  Q  is decreased gradually. The  greedy search policy is its balance in exploring unknown states against  53  exploiting known states when the agent attempts to learn its decision-making skills and improves its Q table. In equation (3.6), the probability s is controlled though decreasing the “temperature” parameter r.  3.4 Modified  Q Learning Algorithm  The mentioned single agent Q learning algorithm is the most common Q learning algorithm. It is employed to find optimal action policies for the members of a multi-robot system and it can help robots find good cooperation policies. However, this algorithm has some serious disadvantages in a multi-robot environment. 
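Before examining these disadvantages, the computational core of Figure 3.2, namely the Boltzmann action selection of Equation (3.6) and the table update of Equation (3.7), can be summarized by the following illustrative fragment. It is a minimal sketch only; the nested vector used as the Q-table and the function names are assumptions made for illustration, not the implementation of the experimental system.

#include <vector>
#include <cmath>
#include <cstdlib>
#include <algorithm>

// Q[s][a]: Q-table entries; tau: "temperature" parameter; eta: learning rate;
// beta: discount factor.

// Probabilistic action selection of Equation (3.6).
int selectActionBoltzmann(const std::vector<std::vector<double> >& Q, int s, double tau)
{
    const int nA = (int)Q[s].size();
    std::vector<double> p(nA);
    double sum = 0.0;
    for (int a = 0; a < nA; ++a) { p[a] = std::exp(Q[s][a] / tau); sum += p[a]; }
    double r = sum * (std::rand() / (RAND_MAX + 1.0));
    double acc = 0.0;
    for (int a = 0; a < nA; ++a) { acc += p[a]; if (r < acc) return a; }
    return nA - 1;
}

// Table update of Equation (3.7) after receiving reward r and observing s'.
void updateQ(std::vector<std::vector<double> >& Q, int s, int a, double reward,
             int sNext, double eta, double beta)
{
    double maxNext = Q[sNext][0];
    for (int a2 = 1; a2 < (int)Q[sNext].size(); ++a2)
        maxNext = std::max(maxNext, Q[sNext][a2]);
    Q[s][a] = (1.0 - eta) * Q[s][a] + eta * (reward + beta * maxNext);
}

In practice the "temperature" tau is decreased gradually, for example by multiplying it by a factor slightly smaller than one after every step, so that the selection shifts from exploration of unknown states toward exploitation of the learned Q-table.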
In particular, directly extending the single agent Q learning algorithm to a multi-robot system violates its assumption of static environment, and as a result the values of the Q-table may not converge. Although the robots still can find some good policies for cooperation, the performance of the whole team of robot can be degraded when that approach is used. (Wang and de Silva, 2007) Accordingly, a novel Q learning algorithm is developed, which is suitable for real-time operation of cooperative multi-robot systems. It determines the optimal solution to the action selection conflicts and provides a more realistic state identification methodology for a multi-robot system. The developed algorithm is suitable for real-time operation of cooperating multi-robot systems. The single agent Q learning algorithm is improved and is named the “Modified Q learning algorithm.” The basic idea of the modified Q-learning algorithm comes from a strategy that is typically employed by humans when they cooperatively push a large and heavy object to a goal location. The group members typically do not observe the environment state individually and decide the environment state independently. Instead, one of them may handle one side, and another member may handle a complementary side. Then they share their reactive information and come to a conclusion about the present environment state. Likewise other group members typically do not  select their  pushing locations and forces concurrently. One person among them will select his pushing location and apply a force first. Then, by observing the first person’s action, the second person will select his pushing action (i.e., a cooperative strategy) accordingly. 54  Next, the third person will determine his action by observing the actions of the first two people. In this manner, when all the people in the group would determine their actions, and they will execute the overall pushing actions through simple synchronization. This cooperative strategy is known to work well in manual tasks. By taking inspiration from human cooperation, the same strategy is used here to develop the modified Q-learning algorithm for multi-robot transportation tasks. A modified algorithm is presented in Figure 3.3.  •  Assume that there are n robots, R ,R 1 ,..., R which are arranged in a 2 special sequence. The subscripts represent their positions in this sequence.  •  S, is the corresponding role set available for the robots. In particular, the robot R, has S role available for the particular instant.  •  Robots R ,R 1 ,..., R exploring the environment for color coded object 2  •  If the robot R. detects the color coded object  ,S 1 S 2  o  ,  R, estimates the pose (position and orientation) of the object.  o Pose information will be transmitted to R ,R 1 ,..., R 2 o  R, assigned as the leader of the team.  o  1 ,R 1 R ,..., R,_ 2 andR,÷ , R 1  ,...,  R transfer the decision rights to the  coordination unit of the R.  o  ,R 1 2 R, should assign tasks to R 11 R,..., R, R andR 1 in the system who will assist in identifying the obstacle distribution around the object o R 1 R ,..., R are transmitting their processed sensor information to 2 the system. o R ,R 1 2 R are will be able to establish the local environment state individually else If Robots R 2 R exploring the environment for color coded ,R 1 object ,...,  ,  ,  ,...,  •  ,...,  Figure 3.3(a): Environment state identification algorithm.  55  • •  Assume that there are n robots, R ,R 1 2 R which are arranged in a special sequence. 
The subscripts represent their positions in this sequence. ,A 1 A 2 ,Aare the corresponding actions sets available for the robots. In particular, the robot R, has m, actions available for execution as given ,...,  1 : A. =(a,a byR  • • • •  ,a,,)  Q ) 1 ( , 2 ) ,Q s a Q(s,,a ) 1 are the corresponding entries in the Q tables are initialized to zero. Initialize r to 0.9.  Q tables for the robots. All  is a set including all actions selected by the robots thus far, and çS represents the empty set. Do repeatedly the following: o Initialize ( = o o  Observe the current environment state s For(i=lton) 1. Generate the current available action set 1, 2. The robot R 1 selects the action P(a)=  e”°  =  1 (A  —  1 m C’)) (A  e 1, with probability )/r  (3.8)  m (s,a )/r  Where a c Y,(r =1, 2,..., k) and k is the size of the setY, 3. Add action a’, to the set a. End For b. For each robot R.  =  (1 =1,2,..., n), execute the corresponding selected action a’,  c. Receive an immediate reward r d. Observe the new state s’ e. For each robot R 1 follows:  =  (i = 1,2,..., n), update its table entry for Q, (s, a, a  , ...,  a) as  where 0 <i <1 is the learning rate and 0 <fi < 1 is the discount rate. o  s+—s’,r+—r*0.999  Figure 3.3(b): Modified  Q learning algorithm 56  Eia1eQbJe  Estimate ObjcctPosc and Calculate Goal Orientation  Assign  Leader  Get Decision tIcadI  ESLIITatC Ohsaclc Pose and Size according to th Decision  [  Determine Current State c  Figure 3.3(c): The flow diagram of the modified  Q learning algorithm.  During operation, robots in the system start wander in the environment for color coded object to be transported. Once an object is detected by a robot in the system, it rearranges its pose to center the color blob in the camera frame. Then the appropriate robot estimates the pose (position and orientation) of the object and the information will be transmitted to the other robots. The robot that identified the object will temporarily close the machine .  learning unit and transfer its decision rights to the coordination unit. Particular robot is assigned as the leader of the team for the state identification process and all other robots in the system will stop their learning units and transfer their decision rights to the coordination unit of the leader robot. Thereafter, the leader robot will assign tasks to peers in the system, enabling to identify the obstacle distribution around the object. All members in the system will transmit their information while processing the information continuously. All this information would determine the current local environment state s of the object individually by all robots. According to the present state, robots will individually select their respective actions.  57  In the action selection stage, a robot probabilistically and sequentially selects an action a,with probability; and executes it. The proposed modified Q learning algorithm uses a sequential action selection method to overcome the action selection conflicts among robots. When two robots select the same action to transport the object to the same location, it violates the cooperation strategy. According to the proposed sequential action selection method, if one robot selects an action, it should transmit the information to the system. This would help to overcome the conflict. After the agent takes the action ak according to the sequential action selection method, it will receive a reward r from the environment and observe the new environments’. 
Based on the information of r and s’, the agent will update its Q-table according to (3.10) In this manner the current environment state is transited from s to s’. Based on the new state, the above operations are repeated until the values of the Q-table entries converge. Learning rate i determines to what extent the newly acquired information will override the old information. Discount factor “temperature”  parameter  is r.  The  determines the importance of future rewards. The developed  approach  is  validated  through  experimentation.  3.5 Multi-Robot Transportation Using Modified  Q Learning  Multi-robot object transportation is an important subject in multi-agent robotics. In an autonomous  transportation  system,  several  autonomous  mobile  robots  move  cooperatively to transport an object to a goal location and orientation in a static or dynamic environment while avoiding fixed or removable obstacles. It is a rather challenging task. For example, in the transportation process, each robot needs to sense the positions of the obstacles and the other robots in the neighborhood, and any change in the environment. Then it needs to communicate with the peers, exchange the sensing information, discuss the coordination strategy, suggest the obstacle avoidance strategy, assign or receive the subtasks and roles, and coordinate the actions so as to move the object quickly and successfully to the goal location. Such operations have many practical 58  _____  applications in such fields as space exploration, intelligent manufacturing, deep sea salvage, dealing with accidents in nuclear power plants, and robotic warfare. As explained in section 3.1, a learning capability is desirable for a cooperative multirobot system. It will help the robots to cope with a dynamic or unknown environment and make the entire system increasingly flexible and autonomous. Although most of the existing commercial multi-robot systems are controlled remotely by a human, autonomous performance will be desirable for the next generation of robotic systems. Without a learning capability, it will be quite difficult for a robotic system to become fully autonomous. This provides the motivation for the introduction of machine learning technologies into a multi-robot system. A schematic representation of the developed system is shown in Figure 1.2 and it is repeated in Figure 3.4.  I  Wired/wireless Router TCP/IP  Figure 3.4: The developed Multi Robot System.  59  In this section, a multi-robot transportation system outfitted with distributed Q-learning is studied, which works in a dynamic and unknown environment with a complex obstacle distribution. Modified Q-learning algorithm is used to accomplish this task. 3.5.1 Distributed Multi-robot Q-leaning There are two ways to extending single-agent reinforcement learning into the multi-robot field. One way is to treat multi-robot learning as a combination of multiple single-agent learning processes across the team members. The other way employs SG-based (stochastic game-based) reinforcement learning As mentioned above, Modified learning is also an improved version of typical single agent multiple single agent nature of the typical single agent  Q  Q learning. Therefore the  Q learning algorithm plays a  significant role in multi-robot systems. In this section we consider the combination of multiple single-agent learning type only. 
Each robot is equipped with the reinforcement learning technique of the multiple single agent learning type and learns their skills as if ,  the other robots do not exist in the environment. This type of reinforcement learning is called “distributed learning”.  The main advantage of “distributed learning” is its  simplicity. In this method, every robot just needs to monitor its own actions and the corresponding rewards, and it does not need to care about the behaviors of the peer robots. Consequence of this is a small learning space and a low computational cost. However, directly extending the single-agent reinforcement learning algorithm to the multi-robot domain violates the MDP (Markov Decision Processes) assumption the theoretical basis -  of reinforcement learning which assumes that the agent’s environment is stationary and -  does not contain any other active agents. It follows that, “distributed learning” may find some utility in the multi-robot learning domain because of its simplicity. Distributed learning may work in a purely cooperative multi-robot task (Yang and Gu, 2004). Theoretical proof of its convergence under some conditions is available (Littman, 2001).  On the other hand, “distributed Q-learning” is  applied to a multi-robot object transportation task in a dynamic and unknown environment, which is a challenging and important research topic in multi-robot  60  coordination systems.  Next, some important issues of “distributed learning” are  discussed and effectiveness, robustness and adaptivity are studied through computer simulation of a learning-based multi-robot system. 3.5.2 Object Transportation Task  Even though considerable work has been done in multi-robot object transportation, it is still difficult problem. In a task of object transportation, multiple autonomous robots have to cooperatively push an object to a goal location and orientation in a dynamic and unknown environment. Pushing is the transportation method used to accomplish the particular task. The particular object has a rectangular cross section. It is large and heavy, and a single robot is unable to accomplish the task alone. Furthermore, each robot only possesses local sensing capability and the obstacles may appear and disappear randomly in the transportation path. Therefore robots cannot plan the trajectory in advance. Figure 3.5 presents a dynamic and unknown environment in a continuous space. Both dynamic  and static planning technologies will fail if the environment changes quickly and the computational speed of the robots is not sufficiently fast. The learning based approach may be a better solution to this challenging problem than a static/dynamic planning approach. The modified  Q learning approach described in Figure 3.3 is utilized to execute  this multi-robot object pushing problem.  .  F  Randon, ObsLacles Goal Location  I  .  ‘‘:ObjeCt N  • Figure 3.5: Description of a multi-robot object pushing task.  61  3.5.3 Multi-robot Object Pushing with Modified Q-learning  Modified Q-learning is used as the learning technique for the multi-robot object pushing problem. The operational method is as follows. Each robot is equipped with modified  Q  learning. 
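The per-robot decision cycle, including the sequential action selection that resolves selection conflicts, can be outlined as in the following sketch; the prose below then walks through the same steps in detail. The structure and the names used here are assumptions made for illustration only; in particular, the set of actions already claimed by the peers would be synchronized over the network in the physical system.

#include <vector>
#include <set>
#include <cmath>
#include <cstdlib>

// Illustrative outline of one decision cycle of the modified Q-learning
// algorithm (Figure 3.3) from the viewpoint of a single robot.
struct Robot {
    std::vector<std::vector<double> > Q;   // this robot's own Q-table
};

// actionsTaken holds the actions already selected by robots earlier in the
// sequence; returns the action claimed by this robot, or -1 if none remain.
int decisionCycle(Robot& robot, std::set<int>& actionsTaken, int state, double tau)
{
    // 1. Available actions: this robot's action set minus those already claimed.
    std::vector<int> avail;
    for (int a = 0; a < (int)robot.Q[state].size(); ++a)
        if (actionsTaken.count(a) == 0) avail.push_back(a);
    if (avail.empty()) return -1;

    // 2. Boltzmann selection, as in Equation (3.8), restricted to the available set.
    std::vector<double> p(avail.size());
    double sum = 0.0;
    for (size_t k = 0; k < avail.size(); ++k) {
        p[k] = std::exp(robot.Q[state][avail[k]] / tau);
        sum += p[k];
    }
    double r = sum * (std::rand() / (RAND_MAX + 1.0));
    double acc = 0.0;
    int chosen = avail.back();
    for (size_t k = 0; k < avail.size(); ++k) {
        acc += p[k];
        if (r < acc) { chosen = avail[k]; break; }
    }

    // 3. Claim the action so that no peer pushes at the same point, then wait
    //    for synchronization before the pushing actions are executed together.
    actionsTaken.insert(chosen);
    return chosen;
}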
When the robots begin to transport the object, each robot identifies the current environmental state cooperatively utilizing the state identification technique mentioned earlier, which consists of the object pose estimation technology and obstacle identification technology, and then individually develops the current environment state as a binary word (Details are given in subsection 3.5.3.1). Next, the robots independently employ the Modified Q-learning algorithm given in Figure 3.3 to select their actions sequentially to push the object. After the actions are taken, each robot cooperatively utilizes the state identification technique and observes the new environmental state individually and receives its immediate reward r. The robots then update their own  Q  tables with the reward and the new environment state information, and continue with the next step of object-pushing until the object reaches to goal location. Each robot individually maintains their Q-tables and makes their decisions. Therefore, it is a fully distributed learning-based decision making system. In other words, the multi-robot object pushing problem is solved with a purely learning-based approach. However, the proposed “Distributed Reinforcement Learning” system also violates the assumption of stationary environment in the MDP problem. While a robot is pushing the object of interest, its peer robots are also pushing it. Therefore, the environment is not stationary anymore from the viewpoint of this robot. Furthermore, the random obstacles make the environment more complicated. However, the simulation results in the following section will show that the object pushing task can be completed successfully albeit with less efficiency. 3.5.3.1 The Environmental States for Modified  Q Learning  The Modified Q-learning algorithm is employed in the task of multi-robot object pushing. As mentioned before, environmental state identification plays a vital role in accomplishing the overall task. Normally, a multi-robot working environment is continuous and complex. Therefore, a methodology is needed to express the continuous  62  environment where the infinite number of environmental states have to be represented by a finite number of states. The environmental state is coded into 16-bit binary code as shown in Figure 3.6.  Goal Orientation  Obstacle Distribution A  [15  87  Figure 3.6: Environmental State Representation using Binary Code.  The current obstacle distribution around the object within the detection radius is represented by the higher 8 bits (from bit 8 to bit 15) in the binary code. The orientation of the goal location relative to the current object position and orientation (pose) is represented by the lower 8 bits (from bit 0 bit 7) in the binary code. The coding protocol is presented in Figure 3.7. The coding protocol assumes that the robots only possess local sensing abilities. Consequently, the system needs to identifi the local environment and detect the obstacles that appear within the detection radius. In order to describe the obstacle distribution in Figure 3.7, the detection circle is divided into 8 equal sectors. Each sector corresponds to one of the higher 8 bits of the environmental state. 
If there is at least one obstacle detected in this sector, the corresponding bit in the state will become “1.” Otherwise, it will become “0.” For example, in Figure 3.7, the higher 8-bits of the environmental state is “10100100” which indicates the current obstacle distribution around the object within the detection circle.  63  Detection Radius  fy  0 Goal Location  0 /  /  9  1•  I  0  /  x  0  /  I -f—__  —  0  Figure 3.7: Coding protocol of the environment states. The orientation of the goal relative to the current object pose is represented by the lower 8 bits of the environmental state code. The orientation angle & of the goal in Figure 3.7 is used to calculate the lower 8 bits of the state using the following equation: ) 7 Statebll(O_  =  (3.11)  int(8 /(360.0 / 256))  intO function is used to convert a float value into an integer number. 3.5.3.2 Robot actions  Normally, a robot has the ability to push an object of rectangular cross section at any  contact point on the surfaces of the object. Therefore, the object provides an infinite number of action points for the robots to push. However, a finite number of action points needs to be defined to implement the modified  Q learning algorithm. Accordingly, only  six action points (a1a6) will be available to each robot, as shown in Figure 3.8 (Wang, 2007). 64  2 a  3 a  1 a  4 a  6 a  5 a  Figure 3.8: Pushing action points available to each robot. 3.5.3.3 Object dynamics  This section discusses the application of the modified  Q learning algorithm to the multi-  robot object pushing problem. The object dynamics is simplified in the simulation system. It is assumed that the object displacement is proportional to the net force exerted by the robots, and the rotational angle of the object is also proportional to the net torque exerted by the robots. In Figure 3.9, the dynamics of the object is illustrated, where the object is being pushed by three mobile robots (Wang, 2007). A A  EZE Object  65  Yenv  / / /  Y2  Yi  xl  x,  X  env  Figure 3.9: Dynamics of the object with rectangular cross section. According to Figure 3.9, three robots push the object with rectangular cross section and 1 to F are their pushing forces. The object dynamics is described by the following  equations: (Wang, 2007) 2 =x x 1 ÷cFcos(y+a ) 1  (3.12)  +cFcos(y+a =y ) 2 1 .v  (3.13)  a =c 2 a + 1 F  (3.14)  where,  1 and c, are coefficients, and F is the net force, F is the net torque, c  y = arctg2(Fj /F)  66  3.5.3.4 Reward function  A common immediate reward policy is presented in this section. Specifically, each robot receives the same reward after it pushes the object for a short period of time At. The reward consists of the following three parts: (Wang, 2007) (1) The reward for pushing the object toward the goal location RdIS ce = (DOld  —  (3. 1 5)  Dnew )cd  where DOld and Dnew are the distances between the goal location and the object center before and after taking the pushing actions, and c is a coefficient. (2) The reward for the rotational behavior of the object 2 —a 1 Rrotation = cos(I a where, (a 2  —  I)  with 0 I a 2 —a 1  Ig  (3.16)  ) is the rotational angle of the object, when it is pushed by the 1 a  robots for a period of time At. While a small rotation is encouraged, a large rotation will be punished because it will make handling of the object difficult for the robots. 
(3) The reward for obstacle avoidance  1i.o;f Robsiacle =  where,  i’  no obstacle —  —  1 .5(sin(ip’ / 2))  (3.17)  is the angle between the direction of the net force and the direction of  the obstacle closest to the net force. Finally, the total reward is calculated as: R = WlRdistce  + W2Rrotation +  Robstacle 3 W  (3.18)  2 +w 3 =1.0. Where w 1 to w are the weights andw 3 1 +w  67  3.5.4 Simulation Results The simulation system is developed to validate the proposed modified  Q learning  technique. Microsoft Visual C++ language is used to program a learning-based multirobot object pushing system, which has been described in the previous sections. The dimensions of the simulated environment are 1000 x 700 units and the dimensions of the object with rectangular cross section are 120 x 80 units. Gray color circles with a radius of 2 units are used to represent the three robots, which are used for the simulation system in the subsequent figures to push the object to the goal location. The object with a rectangular cross section is large and heavy, and an individual robot cannot handle it without the help of its peers. During the simulation, based on the available environmental state, the robots sequentially select their actions using the modified Q-learning algorithm. After selecting their actions sequentially, they begin to push the object after synchronization. The object is pushed for a short period of time At, and the same reward is given all the robots. Next, observing the new environmental state, based on the reward and the new environmental state, the robots independently update their Q-tables. Next step of pushing starts, and so on, until the goal location is reached or the total number of steps exceeds the maximum limit. The object pushing is completed ones the object reaches the goal location or the total number of steps for one round exceeds the limit. When a round of object pushing is completed, a new round of object pushing is started. In addition, the starting location of the object, the goal location, and the obstacles are selected randomly, every time the next round of object pushing is started. However, the Q-table of each robot is inherited form the last round. The simulation system is stopped after the total number of rounds reaches the desired number. The parameter values of the simulation system are: /3=0.9, r = 0.9, c =0.275, c, =0.0175,  Cd  i =  0.8,  =0.5, w 1 =0.6, w 2 =0.04, w 3 =0.36, and 1=10. Eleven  obstacles appear randomly in the environment during the simulation, and the maximum number of steps per round is 6000.  68  3.5.4.1 Simulation Results for Object Pushing Two different object pushing snapshots under different obstacle distributions and goal location are presented in Figure 3.10. Three robots represented by very small circles are located on the sides of the object. The eleven large blue color circles represent the obstacles and the red color circle represents the goal location in the environment.  4  Goal Location  Obstacle  I  ObjedTrajecty  .  .  . Obacle  4  StartingPoint (a).  Snapshot of a successfiil object pushing task  69  fib,  . .  .  .  (b). Another successful task with different obstacle distribution and goal location Figure 3.10: Two object-pushing snapshots under different conditions. The three robots first cooperatively push the object upwards to the goal location in the top right corner, as presented in Figure 3.10 (a). During the object pushing process, the object reaches an obstacle in the transportation path. 
Successfully avoiding the obstacle after several failed trials, the robots select to slightly push the object downwards to avoid the obstacle. The object is pushed by the robots upward again to reach the direction of the goal location. However, before the object reaches the goal location, it reaches another couple of obstacles.  The robots again adjust their actions according to the new  environment states and avoid the obstacles during transportation. Finally, the object is successfully pushed to the goal location. Figure 3.10(b) presents another object pushing task with a different obstacle distribution. Both these results demonstrate that the modified  Q learning algorithm which is also a  single agent reinforcement learning algorithm, can be extended to multi-robot object  70  pushing in an effective and successful manner. In the process of pushing and obstacle avoidance, the robots demonstrate a flexible and adaptive sequential action selection strategy in a complex and unknown environment. In addition, the system employs a fully distributed  multi-robot  decision-making  system  with  local  sensing  capability.  Consequently, it is robust and scalable, and may be easily employed in various complex environments and applications. It is also observed that the learning algorithm is not perfect in every respect. It can suffer from the lack of local sensing.  However, the particular problem can be solved by  increasing the detection radius. That will lead to rapid expansion of the learning space, which can make the algorithm ineffective. 3.5.4.2 Effectiveness of Modified Q-learning in a Multi-Robot Cooperative Task  As mentioned before, immediate reward is a same common reward and it is distributed among the robot. Because a robot’s reward is affected by the action of its peers, the assumption of a stationary environment is violated in the Modified Q-learning algorithm as well.  This means the particular algorithm has less sensitivity in a multi-robot  environment. However, it was possible for single agent reinforcement learning to learn the coordination among robots in a purely cooperative multi-robot task (Yang and Gu, 2004).  This observation is verified with the multi-robot object pushing simulations  presented in this section. Additionally, some probability analysis results are used to evaluate the Modified Q-leaming in a multi-robot cooperative task as presented next. For the present simulations, the environment state is selected randomly for analysis, whose state value represents the binary number “1000000000” or decimal number “512.” Its corresponding Q-values for the six action candidates are also examined. According to the coding protocol of the environmental states in section 3.5.3.1, the binary value of  “1000000000” represents an environmental state, which is shown in Figure 3.11. The robots are allowed to push the object continuously for 100,000 rounds to gain experience and update their respective Q-tables.  71  +  Detection Radius  \  Action NoOl  —.—.—.-1-.  0  Goat Location F  I 0  0  ‘4  Action I Action No 05 No 06 0  b  0  Figure 3.11: The environmental state with decimal value “512” and six pushing actions available to the robots. The Q-values under the state of”512”in 100,000 rounds of object pushing are recorded and shown in Figure3.12, and the probability density estimate of these Q-values is given in Figure3.13.  
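Before examining these figures, it is useful to relate the decimal state value "512" back to the coding protocol of Section 3.5.3.1. The following illustrative fragment shows one way the 16-bit state word may be packed; the function and variable names are assumptions, not the code of the simulation system.

#include <cstdint>

// Illustrative packing of the 16-bit environment state of Section 3.5.3.1.
// Bits 8 to 15 hold the obstacle occupancy of the eight detection sectors,
// and bits 0 to 7 hold the quantized goal orientation of Equation (3.11).
// goalAngleDeg is assumed to lie in the range [0, 360).
uint16_t packState(const bool sectorOccupied[8], double goalAngleDeg)
{
    uint16_t state = 0;
    for (int i = 0; i < 8; ++i)
        if (sectorOccupied[i]) state |= (uint16_t)(1u << (8 + i));   // obstacle bits
    int low = (int)(goalAngleDeg / (360.0 / 256.0));                 // Equation (3.11)
    state |= (uint16_t)(low & 0xFF);                                 // goal-orientation bits
    return state;
}

// The state discussed here, decimal 512 = binary 0000001000000000, has only
// bit 9 set: a single occupied detection sector, with a goal-orientation byte
// of zero, i.e. the goal lying along the reference direction, as sketched in
// Figure 3.11.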
Figure 3.12: The Q-values under the state of “512” in 100,000 rounds of object pushing (Q-value plotted against the round number).

Figure 3.13: Probability density estimate of the Q-values under the state of “512” in 100,000 rounds of object pushing.

When the environment is stationary, the Q-values in a single-robot system will usually converge to some values with probability one (Yang and Gu, 2004). Figure 3.12 shows that the Q-values increase rapidly from their initial values of zero to some positive values at a very early stage. The Q-values then oscillate within a bounded region, with an upper bound of about 16 and a lower bound of about 3. From Figure 3.12 and Figure 3.13 it is rather difficult to determine which action is selected with higher probability under the state of “512.” The Q-values in Figure 3.12 appear to oscillate randomly and the curves overlap with each other. Therefore, it would appear that the robots do not learn any cooperative strategy in the training process and only select their actions randomly. This conclusion, however, conflicts with the successful transportation results in Figure 3.10.
The environmental dynamics makes multi-robot decision-making quite complex, where the traditional task planning becomes ineffective because a planned optimal policy can become inappropriate a few minutes later due to changes in the environment. Therefore, multi-robot systems face some special challenges; for example, dynamic and unknown environments, which the traditional Al technologies cannot effectively deal with. Reinforcement learning is commonly used to make decisions for a single robot due to its good real time performance and guaranteed convergence in a static environment. In this chapter, first, the single agent reinforcement learning was introduced for multi-robot decision making. The basic idea of the single agent Q learning and its framework of Markov Decision Processes were introduced in section3 .2 and section 3.3. In section 3.4, a more challenging multi-robot environment was assumed, which is dynamic and unknown and has a more complex obstacle distribution. By observing human capabilities  76  of dealing with a dynamic environment, it is easy to conclude that a human employs not only planning or reactive (behavior-based) techniques but also learning techniques to successfully complete a task in such an environment. Through learning, a human learns new world states, finds optimal actions under these states, and improves his planning and reactive techniques continuously. Learning enables him to deal with unexpected world states, decrease uncertainties in the environment, and make his decision-making capability more robust in a dynamic environment. For such an environment, a distributed multi-robot object pushing system based on the modified  Q learning algorithm was  developed. The simulation results showed that the robots still could complete the task well at considerable speed. A modified Q-learning algorithm, which is suitable for multi-robot decision making was developed. The simulation results showed that it has disadvantages when used to make decisions in a multi-robot system, even though it can enable the robots to make good decisions and properly complete tasks in most of cases. Because it violates the assumption of a static environment, directly applying the learning algorithm to the multi robot environment will result in low convergence speed, and there is no guarantee that the algorithm will converge to the optimal policies. However, the developed learning algorithm showed good performance with regard to more realistic of state identification and action selection in a simulated multi-robot object-pushing exercise.  77  Chapter 4 Pose Estimation for Multi-Robot Object Transportation 4.1 Overview A multi-robot system consists of several autonomous robots for carrying out a common task cooperatively. This represents an important research area in robotics. Such systems should be able to monitor and learn their cooperative strategy. Because of their flexible configuration, high efficiency, robustness, low cost and similarity to human groups, there are numerous research issues and applications associated with them. Several autonomous robots share the environment in a multi-robot system. Therefore, the team members need to know their exact position and orientation in the shared environment. Not only that an individual robot needs to know its own latest pose (position/orientation), but also the pose of other robots and potential obstacles in the shared environment in order to make rational decisions. Therefore, pose estimation plays a vital role in a multi-robot system. 
There are many means to measure the pose of a robot or an obstacle; for example, using digital cameras, sonar, or laser distance finders. However, most multi-robot systems employ digital cameras for this purpose, which offer three key advantages. First, a digital image provides a rich source of information on multiple moving objects simultaneously in the operating environment. Second, there is a possibility to build fast and accurate vision subsystems at low cost. Third, robot cameras observe and understand their operating environment in a “natural” manner as how humans use eyes to observe the world (Wang and de Silva, 2007). However, numerous obstacles exist in employing a computer vision system. First, since the object detection has to be done through feature extraction from the image, if the lighting conditions of the environment change, the result can become inaccurate. Second, in the image capturing process, hidden features of the image will not be in the image. Therefore, it will be difficult to detect an object occluded by another object. Third, different light sources may have different light intensity levels  78  from different directions in the background. In a practical environment, with different light sources and lighting directions, identifying an object can become difficult. Fourth, object detection using digital cameras becomes challenging when the actual orientation or the extracted features are different from those employed in the training process. Fifth, during image processing with large amounts of training data, computational processing power may not be adequate, and this will affect the overall performance of the system. Use of computer vision as a stand alone sensory approach in a dynamic environment to detect objects has been attempted by others. In particular, Kay and Lee (1991) used images from a stereo camera mounted on the end-effecter of a robot manipulator to estimate the pose of an object with respect to the base frame of the robot. The orientation of the object was estimated by using the least-squares method. A Kalman filter was used to smooth the estimation error. Stone and Veloso (1998) studied a multi-robot soccer system. In their work, they used a global camera to monitor the positions of the robots and objects in the game. Simon et a!. (1994) developed a system which was able to perform full 3-D pose estimation of a single arbitrarily shaped rigid object. They used a high speed VLSI range sensor to acquire data on the object, and then performed pose estimation by fitting the data to a triangular mesh model using an enhanced implementation of the iterative closest point algorithm. Veeraraghavan et al. (2003) developed a computer vision algorithm to track the vehicle motion at a traffic intersection. A multilevel approach using a Kalman filter was presented by them for tracking the vehicles and pedestrians at an intersection. The approach combined low-level image-based blob tracking with high-level Kalman filtering for position and shape estimation. Maurin et al. (2005) presented a vision-based system for monitoring crowded urban scenes. Their approach combined an effective detection scheme based on optical flow and background removal that could monitor many moving objects simultaneously. Kalman filtering integrated with statistical methods was used in their approach. Chen et al. (2005) presented a framework for spatiotemporal vehicle tracking using unsupervised learning-based segmentation and object tracking. 
In their work, an adaptive background learning and subtraction method was applied to two real-life traffic video sequences in order to obtain accurate spatiotemporal information on vehicle objects. Ekvall et a!. (2005) presented a computer vision approach for object recognition and pose estimation 79  based on color co-occurrence histogram and a geometric model. Even though the approach was somewhat robust, it failed under large changes in lighting conditions. Park et al. (2006) presented a method for global localization of an indoor environment, which employed object recognition using a stereo camera. Their method of global localization first estimated a coarse pose and then a refined pose. The coarse pose was estimated by means of object recognition and least squares fitting through singular value decomposition, whereas the refined pose was estimated by using a particle filtering algorithm with omni-directional metric information. Even though a vision system has been used for pose estimation, their work has a number of drawbacks. In particular, the presented schemes are computationally expensive, time consuming, have time delays, and are susceptible to changes in lighting conditions. Lang et al. (2008) presented a multiple sensor based method for robot localization and box pose estimation. A CCD camera mounted on the robot was used to find the box in its environment. A laser range finder mounted on the robot was activated to measure the distance between the laser source and the object. Finally, homogenous matrix transformation was applied to represent the global pose of the robot and the box. The approach was validated using Microsoft Robotics Studio simulation environment. In a multi-robot object transportation task, multiple mobile robots move quickly in an unpredicted manner and the vision system needs to capture their positions and orientations within a very short time. Therefore, the conventional time-consuming vision algorithms are not feasible here. These algorithms are too complex and computationally demanding for meeting real-time constraints in a multi-robot object transportation system. On the other hand, in an object transportation task, the multiple mobile robots move in a large area that has different levels of illumination. Multi-robot systems usually work in large, unstructured, and unknown environments with uneven lighting conditions. The robots usually move into and move out of sub-areas having different brightness levels. In order to track the robots effectively in such an environment, the vision system must be robust with respect to different illumination conditions. However, most of the existing algorithms do not consider this problem.  80  Object and obstacle recognition and pose estimation are essential for an object transportation system with mobile robots. Therefore, computer vision alone is not an adequate solution for the detection of objects and potential obstacles. Sensor fusion can be proposed as an alternative to overcome the problems mentioned above. This chapter investigates an approach for pose (position and orientation) estimation of an object or potential obstacle using a laser range finder and a CCD camera mounted on a mobile robot. In the next section, the multi-robot transportation system in the laboratory is introduced. In Section 4.3, the proposed pose estimation approach is presented. Section 4.3.1 presents the robot global localization. Section 4.3.2 outlines color blob tracking for object recognition. 
Section 4.3.3 presents object pose estimation. An experimental evaluation of object pose estimation is given in Section 4.4. The conclusions of the chapter are given in Section 4.5.  4.2 Multi-robot Transportation System The primary objective of the multi-robot object transportation project in the Industrial Automation Laboratory (IAL) is to develop a physical multi-robot system where a group of intelligent autonomous robots has the ability to work cooperatively to transport an object to a goal location and orientation in an unknown and dynamic environment. Obstacles may be present or even appear randomly in the work environment during the transportation process. Three or more autonomous mobile robots are present in the system, which enable cooperative transportation of an object to a goal location. During the course of the transportation, they have to interact with each other to develop the coordination and cooperative strategy and decide on the feasible transportation locations and suitable action positions selected by each robot so that the object is transported quickly and effectively while avoiding obstacles that may be present during the transportation process. Furthermore, the system may have to consider avoiding damage to the transported object. Charge coupled device (CCD) camera systems, laser range finders and sonar are used to monitor and measure the current location and orientation of the object. A schematic representation of the initial version of the developed system is shown in Figure 1.2 and is repeated in Figure 4.1.  81  Server  TCP/IP  Figure 4.1: The developed physical multi-robot system. The dynamic and unpredictable work environment contains fixed obstacles and movable obstacles, which may appear randomly. The robots and the sensory system are separately linked to their host computers, which are connected though a local network to implement complicated controls and machine intelligence. The robots, the camera, and the sensors are separately linked to their host computers, which are connected through a local communication network, to implement complex controls and machine intelligence. It is important to consider three major challenges in the development of the system, which are also major challenges in multi-robot systems. The first one is the dynamic and unpredictable nature of the environment. Especially, dynamic random obstacle are scattered within the working environment of robots. The obstacles may appear and disappear randomly during the transportation process. In addition, there are multiple  82  robots working in parallel. While one robot makes decisions, other robots may have changed the environment according to their behaviors. This also makes the environment dynamic from the viewpoint of the first robot. As a result, the decision making process becomes very complex for the robots within the dynamic environment. A major challenge is to make decisions in real-time while identifying hidden or invisible features of objects and the surrounding environment conditions from the local viewpoint of the robots. The second challenge results from the local sensing capabilities of the robots. In the present project, it is assumed that each robot is capable of detecting an object or an obstacle within a limited detection radius. It means that the robots possess local sensing capabilities only and therefore the global environment is unknown to them. 
Robots do not have an understanding about the location or the shape of an obstacle until a robot moves close to the obstacle. The unknown environment is a major challenge in the multi-robot domain because the robots do not have a complete understanding about the environment in advance, partly due to the unfamiliarity and partly due to the dynamic nature. Thus the system cannot utilize the traditional planning technologies for decision making. The third challenge is the lack of a reliable communication capability among robots, where the disturbances are represented by white noise. Due to unreliable communication, the robots may end up selecting a wrong coordination strategy. Improvement of the robustness of multi-robot decision-making in an environment with unreliable communication is a challenging issue in the multi-robot domain.  4.3 Pose Estimation Technique Pose estimation in real time is a fundamental requirement in multi-robot cooperation for object transportation. Though there has been substantial growth of research activity in pose estimation of a robot, the pose estimation of objects and potential obstacles has not received the much needed attention. Estimation of the Cartesian coordinate position (x, y) and orientation (8) of an object can be used in diverse applications like object grasping and robotic manipulation, motion prediction, and object transportation. This chapter presents an approach to acquire meaningful data on distance and orientation of an object 83  using the laser range finder of a mobile robot for calculating the pose (center and orientation) of the object with respect to the robot in the global coordinate system. The developed methodology as presented here is an integral part of the multi-robot cooperative object transportation system. A group of mobile robots explore the environment to detect a color coded object. Once the object is detected, that particular robot estimates the object pose. This includes the center and the orientation of the object. However, the object of interest is large and heavy, and cannot be transported by a single robot. Therefore, a team of robots in the multi-robot system is called upon to cooperatively transport the object. Figure 4.2 presents the general scheme of pose estimation of an object for cooperative object transportation. According to the proposed method, successful transportation constitutes six steps, as outlined next. Initialization of all robots: Robot global localization. Local localization of all robots are transformed into global localization Search for an object: Robots start searching for an object of interest to be transported to the goal location and orientation. Rearrange of robot pose: Once the robot in the multiple robot system finds a color coded object, it rearranges its pose to center the color blob in the camera frame. Object pose estimation: The particular robot, which found the color blob, estimates the pose of the object and takes over the leadership in the rest of the state identification process. The particular robot assigns tasks to the member robots in the system during the state identification stage. Obstacle pose estimation: The other robots in the system identify the local obstacle distribution around the object and build the state binary code of themselves. Then the system determines suitable points of contact (action points) with the object for transporting it.  84  ____//  Transportation of the object: Multiple robots transport the object by pushing the object of interest. 
The process is repeated until the object reaches the goal location and orientation.  C  Initialize Robots  C :4  /  Robot Global Localization  7  Encoder  /  ---  //  Search Object  I  /  -  Camera  /  No  —  Is Object Found  Wander  Rearrange Robot Pose  :-:-  -  Object Pose Estimation  -  4  Laser Range Finder  ZL Transport the Object  Figure 4.2: General scheme of pose estimation for cooperative object transportation.  4.3.1 Global Localization of Robot Global pose estimation is required since there is more than one robot in the environment of cooperative object transportation. An important means of estimating pose of a robot is odometry. This method uses data from the shaft encoder mounted on a wheel of the robot. The encoder measures the speed and the displacement of the drive wheel after a specific time step and is added to the pose of the robot in the previous time step. With reference to Figure 4.3, the kinemati model of the mobile robot is expressed by the  85  relations presented next. The state vector representing the pose of the mobile robot in the global frame is given by: q(k)  [x(k) y(k) 8(k)] T  =  ,  where x(k) and y(k) are the coordinates of position P in mm,  and 8(k) is the orientation in degrees. Also, D(k) is the distance traveled between the time steps k and k+]; v (k) is the robot translational speed in mm/s; T is the sampling 1 time in seconds; 8(k) is the angle between the robot and the global x-axis; A 8(k) is the rotation angle between time steps k and k+1;  0L(k)  and  R 0  (k) are the angular velocities  of the left and right wheels, respectively; r is the radius of the two drive wheels; and d is the distance between the two wheels. The kinematic model of the mobile robot is given by: x(k + 1)  =  x(k) + D(k) .cosO(k + 1)  (4.1)  y(k +1)  =  y(k) + D(k). sin 8(k +1)  (4.2)  8(k +1)  =  8(k) + A8(k)  (4.3)  D(k)  v (k).T  A8(k)  v (k)  o.(k)  =  =  (4.4)  o(k).T  (4.5)  COL(k).r +  coR(k).r  (4.6)  2 = WR  (k).r  —  ‘L  (k).r  (47)  d  It follows that the updated pose state vector is q(k + 1)  =  [x(k +1) y(k + 1) 8(k + 1)]T  By using equations 4.1 through 4.7, the robot global pose q(k +1) for a wheeled robot is computed as:  86  x(k) q(k + 1)  =  y(k)  WL  +  aiL  (k) ± R (k) (k) ±  (k)  cos0(k) + E (aiR (k) sin0(k) + L (aiR (k)  -  -  O)L  (k)))  O)L  (k))  (4.8)  aiR(1’)— O)L(k)  0(k)  2  XR  xl k)  Figure 4.3: Global Pose Estimation of Wheeled Robot.  4.3.2 Color Blob Tracking The phenomenon of color blob template matching is utilized for color blob tracking. The approach presented in this section entails detection of the object to be transported with the help of two different color blobs attached to its vertical surfaces. For this purpose the Advanced Color Tracking System (ACTS) capability is employed. ACTS is a software tool which in combination with a color camera, allows the application to track up to 320 colored objects at a speed of 30 frames per second. The robots explore the surrounding environment within the sensory range and a robot receives a stimulus if it locates an object with the color blob. As it locates the object, the robot rearranges its pose so as to move the color blob attached to the object to the center of the camera frame. As illustrated in Figure 4.5, the camera frame is divided into four quadrants. Here, the size of the camera frame is (640 x 480) pixels, while the center of the camera frame is represented by a square of (40x40) pixels. If the color blob falls  87  within the square, it is believed to be in the center. 
Otherwise the robot rotates its base so that the detected color blob is approximately located at the centerline of the camera frame. By centering the color blob in the camera frame, a suitable position is provided for the Laser range finder to scan the object for pose estimation. A flow diagram of the above mentioned process is shown in Figure 4.4.  \ 1ove Robot to centre the blob  i:s  No ‘4  Figure 4.4: flow diagram of color blob tracking  88  (0,0)  (640,0)  o  o f  Second Quadrant  Third Quadratit  o  40  0  First Quadrant (0,480)  4  Fourth Qu acirant (640,480)  Figure 4.5: Division of Camera Frame into Four Quadrants.  4.3.3 Object Pose Estimation As noted in section 4.3.1, once the robot rearranges its pose to center the color blob within camera frame, the laser range finder that is mounted on the robot is activated to estimate the pose of the object. Figure 4.6 indicates how the laser range finder is used to measure the range. Specifically, the distance is determined by calculating the phase difference between the incident and reflected signals. If the transmission time from A to B is T, phase difference is q, and the modulation frequency is I’m’ then the distance between A and B is given by: L  =  c  .  The pose of the object is estimated relative to  the robot, which is then transformed into global pose estimation (Gao and Xiong, 2006). This is needed because the robot may have to communicate the object pose to other robots in order to transport the object cooperatively.  89  / / /  Reflection Ray I LJiidniRay____ -  Figure 4.6 (a): Schematic drawing of laser range sensor.  Figure 4.6 (b): A 1 80-degree laser range sensor. 4.3.3.1 Relative Pose Estimation With reference to Figure 4.7, data on distance and angle are acquired through the laser range finder to calculate the center and the orientation of the object relative to the robot coordinate system. C ‘V  B  Figure 4.7: Relative Object Pose Estimation. 90  Note that a and /3 are the angles corresponding to the distances d 1 and d , respectively. 2 Also, d 1  =  2 AOR, d  =  BOR;  OR  and XR are robot coordinates; and d 1 and d 2 are the  distances from laser range finder to two distinct edges of the object. We have: 3 dlcosa=ORX  (4.9)  2 cos,6 d  (4.10)  =  ORXI  sina=AX 1 d 3  (4.11)  sin/3=BX 2 d 1  (4.12)  By using equations 4.9 through 4.12, the angle which represents the orientation of the object is given by:  ‘=tan’  1 s 2 d s infl—d ina 1 cosa d d 2 cos/3  (4.13)  —  Object orientation can be found by using equation 4.13. The center point of the object  °B  is given by:  cosa + d 2 cos/3 + 2 (d sinfl d 1 sina)] —  1 sina + d ![d 2 sinfl + 1 (d cosa d 2 2 cosfi)] —  ( 4 cosfl+s 2 infl—d d sina)] cosa +d 21  (4.14)  2 sinfl+(dj cosa—d 42 2 cosfl)] “c =-i[dl sina+d  (4.15)  xc  =  Equations 4.13 through 4.15 represent the object pose relative to robot pose, and can be given by the vector:  91  o=[x  (4.16)  Y  Equation 4.16 describes the object with respect to the mobile robot, and is called the relative pose. Figure 4.8 shows the results of the laser range finder using MobleSim simulator. There are six objects within the laser scanner according to the results given in Figure 4.9. Because the object with a color blob label should be in the center of the laser range scan after applying the color blob tracking in Section 4.3.2, it is very easy to establish that the object must be in the range between  -  14° and 32°.  Fig. 4.8 (a) MobileSim simulator visualized results from laser range finder.  ii Fig. 
4.3.3 Object Pose Estimation

As noted in Section 4.3.2, once the robot rearranges its pose to center the color blob within the camera frame, the laser range finder that is mounted on the robot is activated to estimate the pose of the object. Figure 4.6 indicates how the laser range finder is used to measure the range. Specifically, the distance is determined by calculating the phase difference between the incident and reflected signals. If the transmission time from A to B is T, the phase difference is φ, and the modulation frequency is f_m, then the distance between A and B is given by:

L = c\,T = \frac{c\,\varphi}{4\pi f_m}

The pose of the object is estimated relative to the robot, which is then transformed into a global pose estimate (Gao and Xiong, 2006). This is needed because the robot may have to communicate the object pose to other robots in order to transport the object cooperatively.

Figure 4.6 (a): Schematic drawing of the laser range sensor (incident ray and reflected ray between the sensor and the target).

Figure 4.6 (b): A 180-degree laser range sensor.

4.3.3.1 Relative Pose Estimation

With reference to Figure 4.7, data on distance and angle are acquired through the laser range finder to calculate the center and the orientation of the object relative to the robot coordinate system.

Figure 4.7: Relative Object Pose Estimation.

Note that α and β are the angles corresponding to the distances d_1 and d_2, respectively. Also, d_1 = AO_R and d_2 = BO_R; O_R and X_R define the robot coordinate frame; and d_1 and d_2 are the distances from the laser range finder to two distinct edges of the object. We have:

d_1\cos\alpha = O_R X_1    (4.9)
d_2\cos\beta = O_R X_2    (4.10)
d_1\sin\alpha = A X_1    (4.11)
d_2\sin\beta = B X_2    (4.12)

By using equations 4.9 through 4.12, the angle θ′ which represents the orientation of the object is given by:

\theta' = \tan^{-1}\!\left(\frac{d_2\sin\beta - d_1\sin\alpha}{d_1\cos\alpha - d_2\cos\beta}\right)    (4.13)

The object orientation can be found by using equation 4.13. The center point O_B of the object is given by:

x_c = \frac{1}{2}\Big[d_1\cos\alpha + d_2\cos\beta + \big(d_2\sin\beta - d_1\sin\alpha\big)\Big]    (4.14)
y_c = \frac{1}{2}\Big[d_1\sin\alpha + d_2\sin\beta + \big(d_1\cos\alpha - d_2\cos\beta\big)\Big]    (4.15)

Equations 4.13 through 4.15 represent the object pose relative to the robot pose, which can be written as the vector:

o' = [x_c \;\; y_c \;\; \theta']^T    (4.16)

Equation 4.16 describes the object with respect to the mobile robot, and is called the relative pose. Figure 4.8 shows the results of the laser range finder using the MobileSim simulator. There are six objects within the laser scanner according to the results given in Figure 4.9. Because the object with a color blob label should be in the center of the laser range scan after applying the color blob tracking in Section 4.3.2, it is very easy to establish that the object must be in the range between −14° and 32°.

Figure 4.8 (a): MobileSim simulator visualized results from the laser range finder.

Figure 4.8 (b): Laser range finder results using the MobileSim simulator.
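A compact sketch of equations 4.13 through 4.16 is given below, assuming d1 and d2 are the two edge ranges returned by the laser range finder and alpha and beta are their bearing angles in radians. atan2 is used in place of tan⁻¹ so that the quadrant of the orientation is resolved automatically, and the type and function names are illustrative rather than part of the thesis software.

#include <cmath>

struct ObjectPose { double x; double y; double theta; };   // pose relative to the robot frame

// Relative object pose from two laser returns (d1, alpha) and (d2, beta) taken on
// two distinct edges of the object, following equations (4.13)-(4.16).
ObjectPose relativeObjectPose(double d1, double alpha, double d2, double beta) {
    ObjectPose o;
    o.theta = std::atan2(d2 * std::sin(beta) - d1 * std::sin(alpha),
                         d1 * std::cos(alpha) - d2 * std::cos(beta));     // equation (4.13)
    o.x = 0.5 * (d1 * std::cos(alpha) + d2 * std::cos(beta)
                 + (d2 * std::sin(beta) - d1 * std::sin(alpha)));         // equation (4.14)
    o.y = 0.5 * (d1 * std::sin(alpha) + d2 * std::sin(beta)
                 + (d1 * std::cos(alpha) - d2 * std::cos(beta)));         // equation (4.15)
    return o;                                                             // equation (4.16)
}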
4.3.3.2 Object Global Estimation

A homogeneous transformation matrix (Spong et al., 2006), which represents the translational and rotational motion of one frame with respect to another frame, is applied to represent the relation between the three coordinate systems: the robot coordinate system, the object coordinate system, and the global coordinate system. The homogeneous transformation matrix between the object coordinate system and the robot coordinate system can be expressed by using the result obtained in Section 4.3.3.1:

T' = \begin{bmatrix} \cos\theta' & -\sin\theta' & x_c \\ \sin\theta' & \cos\theta' & y_c \\ 0 & 0 & 1 \end{bmatrix}    (4.17)

The robot global pose (x, y, θ) is known from equation 4.8 in Section 4.3.1. The homogeneous transformation matrix between the robot coordinate system and the global coordinate system can be established as:

T'' = \begin{bmatrix} \cos\theta & -\sin\theta & x \\ \sin\theta & \cos\theta & y \\ 0 & 0 & 1 \end{bmatrix}    (4.18)

By using equations 4.17 and 4.18, the homogeneous transformation matrix T between the object coordinate system and the global coordinate system can be computed as:

T = T''\,T' = \begin{bmatrix} \cos\theta & -\sin\theta & x \\ \sin\theta & \cos\theta & y \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta' & -\sin\theta' & x_c \\ \sin\theta' & \cos\theta' & y_c \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \cos(\theta+\theta') & -\sin(\theta+\theta') & x_c\cos\theta - y_c\sin\theta + x \\ \sin(\theta+\theta') & \cos(\theta+\theta') & x_c\sin\theta + y_c\cos\theta + y \\ 0 & 0 & 1 \end{bmatrix}    (4.19)

From equation 4.19, the pose of the object in the global coordinate system is determined as:

o'' = \begin{bmatrix} x_c\cos\theta - y_c\sin\theta + x \\ x_c\sin\theta + y_c\cos\theta + y \\ \arctan\!\big(\sin(\theta+\theta')\,/\,\cos(\theta+\theta')\big) \end{bmatrix}
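A sketch of the frame composition of equations 4.17 through 4.19 follows, combining the relative object pose of Section 4.3.3.1 with the robot global pose of equation 4.8. The structs are illustrative, and angles are assumed to be in radians.

#include <cmath>

struct RobotPose  { double x, y, theta; };   // robot pose in the global frame (equation (4.8))
struct ObjectPose { double x, y, theta; };   // object pose (relative or global)

// Transform the object pose from the robot frame to the global frame,
// i.e., apply T = T'' . T' of equation (4.19).
ObjectPose objectGlobalPose(const ObjectPose& rel, const RobotPose& robot) {
    ObjectPose g;
    g.x = rel.x * std::cos(robot.theta) - rel.y * std::sin(robot.theta) + robot.x;
    g.y = rel.x * std::sin(robot.theta) + rel.y * std::cos(robot.theta) + robot.y;
    // Wrap theta + theta' into the principal range, as in the last row of o''.
    g.theta = std::atan2(std::sin(robot.theta + rel.theta),
                         std::cos(robot.theta + rel.theta));
    return g;
}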
4.4 Experimental Results

In this section the experimental results are presented using the test bed described in Section 4.4.1. Experiments are conducted under two different situations. In the first set of experiments, the robot is kept stationary and the objects are placed at different positions and orientations. In the second set of experiments, the robots move, and once an object is detected by a robot it calculates the pose of the detected object. In both sets of experiments the objects are placed in different poses, as indicated in Figure 4.9.

Figure 4.9: Arbitrary layout of objects (two object positions, Position 01 and Position 02, relative to the robot).

4.4.1 The Test-bed

Figure 4.10: The experimental system in IAL at UBC.

A test bed has been developed in the Industrial Automation Laboratory (IAL) for multi-robot transportation, as shown in Figure 4.10. It includes one four-wheel-driven ActivMedia Pioneer™ 3-AT robot and two three-wheel-driven ActivMedia Pioneer™ 3-DX robots. All three robots include wireless Ethernet, eight sonar sensors and a gyro. Two robots include a CCD camera and a laser range finder. Each robot houses a Pentium-based computer running Windows 2000. The laser range finder in the present system is a SICK LMS 200 2D scanner, which has a horizontal range of 180° with a maximum resolution of 0.5°, and is used to monitor the positions and orientations of the object, potential obstacles and mobile robots. The device produces range estimation using the time required for the light to reach the target and return (Nourbaksh et al., 2004).

4.4.2 Experiments with Stationary Robot

Two sets of experiments are conducted in the category of the stationary robot. In the first set of experiments, the robot is placed at the global position [0, 100, 0], which gives the X-coordinate, Y-coordinate, and orientation, respectively. The seven test points in Table 4.1 represent an object having dimensions 525 mm x 257 mm x 446 mm, placed in the workspace at seven different positions and orientations. First the actual center of the test point is measured, which is compared with the center estimated through the laser range finder that is mounted on the robot. Figures 4.11, 4.12 and 4.13 correspond to the information in Table 4.1. Specifically, Figure 4.11 gives the difference between the actual and the estimated values of the x-coordinate of the center of the object. Figure 4.12 represents the difference between the actual and the estimated values of the y-coordinate of the center of the object. Figure 4.13 indicates the difference between the actual orientation and the estimated orientation of the object.

Table 4.1: The actual and estimated object pose results in the first set of experiments with a stationary robot.

Test Point | X actual | X estimated | X error | X %error | Y actual | Y estimated | Y error | Y %error | θ actual | θ estimated | θ error | θ %error
1 | 248 | 286.36 | 38.36 | 15.47 | 2095 | 1985.69 | 109.31 | 5.22 | 16 | 14.17 | 1.83 | 11.44
2 | -219 | -241.24 | 22.24 | 10.16 | 1184 | 1150.34 | 33.66 | 2.84 | -12 | -13.08 | 1.08 | 9
3 | 943 | 929.25 | 13.75 | 1.46 | 1460 | 1324.37 | 135.63 | 9.29 | 38 | 36.96 | 1.04 | 2.74
4 | 1471 | 1387.21 | 83.79 | 5.7 | 463 | 350.17 | 112.83 | 24.37 | 70 | 68.66 | 1.34 | 1.91
5 | -1364 | -1370.01 | 6.01 | 0.44 | 538 | 536.67 | 1.33 | 0.25 | -66 | -66.84 | 0.84 | 1.27
6 | -1368 | -1388.36 | 20.36 | 1.49 | 1354 | 1317.24 | 36.76 | 2.71 | -32 | -33.97 | 1.97 | 6.16
7 | -710 | -727.02 | 17.02 | 2.4 | 2298 | 2274.81 | 23.19 | 1.01 | -7 | -9.31 | 2.31 | 33

Figure 4.11: The x-axis error from Table 4.1.

Figure 4.12: The y-axis error from Table 4.1.

Figure 4.13: The orientation error from Table 4.1.

In the second set of experiments the robot global pose is changed to [40, 105, 60]. The dimensions of the object in this experiment are 525 mm x 257 mm x 446 mm. Table 4.2 presents the results for the second experiment, and Figures 4.14, 4.15, and 4.16 show the error results corresponding to the information given in this table. The same parameters as in the previous experiment are used. In both sets of experiments, with a few exceptions, the estimation error is well within the acceptable range, which is 100 mm in the considered application.

Table 4.2: The actual and estimated object pose results for the second set of experiments with a stationary robot.

Test Point | X actual | X estimated | X error | X %error | Y actual | Y estimated | Y error | Y %error | θ actual | θ estimated | θ error | θ %error
1 | 248 | 266.56 | 18.56 | 7.48 | 2095 | 1964.69 | 130.31 | 6.22 | 16 | 13.06 | 2.94 | 18.38
2 | -219 | -169.26 | 49.74 | 22.71 | 1184 | 1145.34 | 38.66 | 3.27 | -12 | -13.54 | 1.54 | 12.83
3 | 943 | 931.65 | 11.35 | 1.2 | 1460 | 1303.52 | 156.48 | 10.72 | 38 | 36.56 | 1.44 | 3.79
4 | 1471 | 1364.56 | 106.4 | 7.24 | 463 | 327.65 | 135.35 | 29.23 | 70 | 67.66 | 2.34 | 3.34
5 | -1364 | -1293.14 | 70.86 | 5.2 | 538 | 534.66 | 3.34 | 0.62 | -66 | -67.67 | 1.67 | 2.53
6 | -1368 | -1339.67 | 28.33 | 2.07 | 1354 | 1299.93 | 54.07 | 3.99 | -32 | -34.86 | 2.86 | 8.94
7 | -710 | -692.93 | 17.07 | 2.4 | 2298 | 2258.81 | 39.19 | 1.71 | -7 | -10.21 | 3.21 | 45.86

Figure 4.14: The x-error from Table 4.2.

Figure 4.15: The y-error from Table 4.2.

Figure 4.16: The orientation error from Table 4.2.

4.4.3 Experiments with a Moving Robot

Two sets of experiments are conducted in this category as well. Here, robots explore the environment in search of an object to be transported. Once a robot finds the color-coded object, it rearranges its pose to center the color blob within the camera frame. Then the laser range finder is activated to estimate the orientation and the center of the object. In the first set of experiments, an object having dimensions 430 mm x 537 mm x 594 mm is used. Seven different experiments are carried out. Table 4.3 presents the results of the experiments. Figures 4.17, 4.18, and 4.19 correspond to this table, which show the difference between the actual and the estimated x-coordinate, y-coordinate, and orientation, respectively, of the center of the object.

Table 4.3: The actual and estimated object pose results for the first set of experiments with a moving robot.

Test Point | X actual | X estimated | X error | X %error | Y actual | Y estimated | Y error | Y %error | θ actual | θ estimated | θ error | θ %error
1 | 1318 | 1384.52 | 66.52 | 5.05 | 3455 | 3514.78 | 59.78 | 1.73 | 30 | 28.57 | 1.43 | 4.77
2 | 2467 | 2543.12 | 76.12 | 3.09 | 4047 | 4149.39 | 102.39 | 2.53 | 15 | 12.95 | 2.05 | 13.67
3 | 1205 | 1237.39 | 32.39 | 2.69 | 6147 | 6219.28 | 72.28 | 1.18 | 45 | 44.21 | 0.79 | 1.76
4 | 2745 | 2790.32 | 45.32 | 1.65 | 6123 | 6183.43 | 60.43 | 0.99 | -15 | -16.43 | 1.43 | 9.53
5 | 1465 | 1487.45 | 22.45 | 1.53 | 7442 | 7543.27 | 101.27 | 1.36 | -60 | -61.25 | 1.25 | 2.08
6 | 2063 | 2092.14 | 29.14 | 1.41 | 8522 | 8632.02 | 110.02 | 1.29 | -30 | -31.37 | 1.37 | 4.57
7 | 2842 | 2914.42 | 72.42 | 2.55 | 9712 | 9828.21 | 116.21 | 1.2 | -75 | -76.65 | 1.65 | 2.2

Figure 4.17: The x-error from Table 4.3.

Figure 4.18: The y-error from Table 4.3.

Figure 4.19: The orientation error from Table 4.3.

In the second set of experiments, an object having dimensions 590 mm x 726 mm x 1715 mm is used. Table 4.4 presents the results from these experiments. Figures 4.20, 4.21, and 4.22 correspond to this table, which show the difference between the actual and the estimated x-coordinate, y-coordinate, and orientation, respectively, of the center of the object. Here too the error readings are well within the acceptable range.

Table 4.4: The actual and estimated object pose results for the second set of experiments with a moving robot.
Test Point | X actual | X estimated | X error | X %error | Y actual | Y estimated | Y error | Y %error | θ actual | θ estimated | θ error | θ %error
1 | 1321 | 1356.43 | 35.43 | 2.68 | 3486 | 3509.52 | 23.52 | 0.67 | 30 | 28.27 | 1.73 | 5.77
2 | 2488 | 2502.32 | 14.32 | 0.58 | 4023 | 4112.23 | 89.23 | 2.22 | 15 | 13.95 | 1.05 | 7
3 | 1223 | 1235.43 | 12.43 | 1.02 | 6123 | 6179.72 | 56.72 | 0.93 | 45 | 43.77 | 1.23 | 2.73
4 | 2724 | 2787.21 | 63.21 | 2.32 | 6147 | 6184.17 | 37.17 | 0.6 | -15 | -16.43 | 1.43 | 9.53
5 | 1437 | 1462.76 | 25.76 | 1.79 | 7412 | 7513.23 | 101.23 | 1.37 | -60 | -62.24 | 2.24 | 3.73
6 | 2040 | 2072.34 | 32.34 | 1.59 | 8541 | 8612.98 | 71.98 | 0.84 | -30 | -32.06 | 2.06 | 6.87
7 | 2825 | 2903.59 | 78.59 | 2.78 | 9703 | 9812.73 | 109.73 | 1.13 | -75 | -77.06 | 2.06 | 2.75

Figure 4.20: The x-axis error from Table 4.4.

Figure 4.21: The y-axis error from Table 4.4.

Figure 4.22: The orientation error from Table 4.4.

By comparing the % error of the results obtained when the robots were stationary with those when the robots were moving, it is seen that the latter results are better. When a robot autonomously explores the surroundings and finds an object, it stops at a suitable distance from the object, rearranges its pose to center the color blob within the camera frame, and then estimates the pose of the object. On the other hand, in the case of a stationary robot, the object is placed in front of the robot, and the associated distance may not be suitable. Consequently, human error may have contributed to the enhanced % error in pose estimation with a stationary robot.
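For reference, the error and % error columns in Tables 4.1 through 4.4 are consistent with the following simple computation, assuming the error is the absolute difference between the actual and estimated values and the % error is taken relative to the magnitude of the actual value (which is nonzero for all test points); the names are illustrative.

#include <cmath>

struct PoseError { double error; double percentError; };

// Absolute error and percentage error between an actual and an estimated value,
// as used for the X, Y and theta columns of Tables 4.1-4.4.
// Example: test point 1 of Table 4.1 gives an X error of 38.36 mm and a % error of 15.47.
PoseError poseError(double actual, double estimated) {
    PoseError e;
    e.error = std::fabs(actual - estimated);
    e.percentError = 100.0 * e.error / std::fabs(actual);
    return e;
}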
4.5 Summary

A method for pose estimation of an object for application in cooperative object transportation by mobile robots was developed in this chapter. A CCD camera, optical encoders and laser range finders were the sensors utilized by the robots. In order to transport the object by robots, first the robot global localization was established. Next, the CCD camera was used to find the object in the work environment. Once the object was identified by using a color blob tracking approach, the robot rotated its base to move the color blob into the center of the camera frame. Finally, a laser range finder was used to scan the object and to determine the distance and the orientation angle. The developed approach was thoroughly tested using a series of physical experiments.

Chapter 5
Experimentation in Distributed Multi-Robot Cooperative Object Transportation

5.1 Overview

Multi-robot object transportation is an active research area in the robotics community. The particular multi-robot system as developed in our laboratory (Industrial Automation Laboratory, IAL, at the University of British Columbia, UBC) consists of three heterogeneous mobile robots. It is intended to transport an object of interest to a goal location cooperatively in both static and dynamic environmental conditions. There are practical and theoretical issues which need to be addressed for realizing the cooperative transportation task. First, techniques such as the modified Q learning algorithm and the multi-robot coordination framework in a multi-agent environment are used in the developed cooperative transportation system. These techniques deal with important issues that arise in a multi-agent environment. The related issues include task assignment, dynamic environment, and distributed control. Second, though multi-robot transportation itself applies in industrial projects and scientific explorations, it faces some challenges, which are addressed in the present thesis. As a rationale for a possible practical application, it is desirable to have multiple autonomous robots transport an object of interest in a dangerous or hazardous environment, rather than employing human operators for it.

In this chapter, the techniques developed in the previous chapters, such as the hierarchical multi-robot architecture, the multi-robot coordination mechanism, the modified Q learning algorithm, and the pose estimation technology, are integrated to implement a physical multi-robot transportation system operating in a dynamic and unknown environment. These techniques are tested and validated with the available mobile robots in IAL at UBC.

5.2 Test Bed

An experimental system has been developed in the IAL at UBC to implement a physical multi-robot object transportation system. An overview of this system is presented in Figure 5.1, which is a repetition of Figure 1.2.

Figure 5.1: The multi-robot object transportation system (robots, a server, and a wired/wireless router communicating over TCP/IP).

Three autonomous mobile robots are used in the developed system to transport a heavy and long rectangular-shaped object to a goal location in the testing area of our laboratory. According to the environment in the testing area, a simple global coordinate system is employed to validate the techniques developed in the previous chapters. In the beginning of the operation, each robot is informed of its initial position and orientation with respect to the global coordinate system. Each robot estimates its latest pose (position and orientation) by recording and analyzing the data of the wheel encoders and the data of its compass sensor while it moves in the environment. Each robot in the system continuously transmits its current pose via the wireless network. In addition, the robots are assumed to know the global coordinates of the goal location in advance.

The object and the obstacles are randomly placed within the environment. When the multi-robot system begins to operate, the robots have to search the environment in order to detect the object. Once a robot in the system detects the object, it will estimate the pose of the object. Then the particular robot will transmit the object pose to the system, and this robot will act as a leader of the group for that particular instance. The leader assigns tasks to the other robots for detecting obstacles in the environment. The task of detection and feature estimation of objects and obstacles is accomplished by fusing the sensory data from the sonar, laser range finder and CCD camera of a robot. The sensors are assumed to only detect obstacles within a detection radius of 2 meters, and a robot is able to exchange its sensory information with its peers via wireless communication to establish a binary word for state identification. The robots possess local sensing capabilities only and therefore they are able to know only a local area bounded by the detection radius of the environment.
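The binary state word mentioned above is defined in Chapter 3 and is not reproduced in this chapter. The sketch below is therefore only an illustrative encoding, assuming one bit per 45° sector around the object within the 2 m detection radius; the names and the word length are hypothetical, not the exact layout used in the thesis.

#include <vector>
#include <cmath>

struct Point { double x, y; };   // global coordinates, mm

// Encode the detected obstacle distribution around the object into an 8-bit word:
// bit i is set when at least one obstacle lies in the 45-degree sector i within
// the detection radius. The resulting word is exchanged with the peer robots so
// that all robots agree on the current environment state.
unsigned int buildStateWord(const Point& object,
                            const std::vector<Point>& obstacles,
                            double detectionRadius = 2000.0 /* mm */) {
    const double kPi = 3.14159265358979;
    unsigned int state = 0;
    for (const Point& obs : obstacles) {
        double dx = obs.x - object.x;
        double dy = obs.y - object.y;
        if (std::sqrt(dx * dx + dy * dy) > detectionRadius) continue;  // outside local sensing range
        double angle = std::atan2(dy, dx);                             // -pi .. pi
        int sector = static_cast<int>(std::floor((angle + kPi) / (kPi / 4.0))) % 8;
        state |= (1u << sector);                                       // mark the sector as occupied
    }
    return state;
}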
Two color blobs are present on the object of interest: one color blob is placed on one lateral side and repeated on its opposite side, and the other color blob is placed on the adjoining side and its opposite side. After a robot detects the color blobs using its CCD camera, the robot rearranges its pose to estimate the pose of the object by fusing information from the sonar and the laser range finder of the robot. If an object without any color blobs is detected within the environment, it will be regarded as an obstacle. Large and heavy objects with their color blobs are shown in Figure 5.2.

Figure 5.2: The color blobs used to identify the orientation of an object.

Each robot makes decisions independently to build the state binary code utilizing the pose information of the objects and obstacles. Robots need to exchange this information with their peers in order to form a cooperation strategy and for making decisions. This subject has been addressed in Chapter 3. The developed system consists of a computer server, which is used to synchronize the actions of the robots. However, it does not serve as a centralized decision maker, because the system is fully distributed and each robot makes decisions independently.

The mobile robots are manufactured by ActivMedia Robotics, which is an established company in the mobile robot industry. One four-wheel-driven Pioneer™ 3-AT robot and two three-wheel-driven Pioneer™ 3-DX robots are used in the cooperative object transportation system developed at our laboratory (IAL). They represent an agile, versatile, and intelligent mobile robotic platform with high-performance current management to provide power when it is needed. Built on a core client-server model, the P3-DX/AT robots contain an embedded Pentium III computer, opening the way for onboard vision processing, Ethernet-based communication, laser sensing, and other autonomous functions. The P3 robots store up to 252 watt-hours of hot-swappable batteries. They have a ring of eight forward sonars and a ring of eight rear sonars. Their powerful motors and 19 cm wheels reach speeds of 1.6 m/s and carry a payload of up to 23 kg. In order to maintain accurate dead reckoning data at these speeds, the Pioneer robots use 500-tick encoders. Their sensing and accessories include laser-based navigation, bumpers, a gripper, vision, a compass and a suite of other options. A P3-DX robot and a P3-AT robot are shown in Figure 5.3 and Figure 5.4, respectively.

Figure 5.3: Pioneer™ 3-AT mobile robot.

Figure 5.4: Pioneer™ 3-DX mobile robot.

The P3-DX/AT robots with the ARIA software have the ability to:
• Wander randomly
• Plan paths with gradient navigation
• Provide a C/C++ development platform
• Localize using sonar and the laser distance finder
• Display a map of its sonar and/or laser readings
• Drive under control of software, keys or joystick
• Communicate sensor and control information related to sonar, motor encoders, motor controls, user I/O, and battery charge data
• Simulate behaviors offline with the MobileSim™ simulator, which accompanies each development environment.

The robot can build maps and constantly update its position, to within a few centimeters while traveling within mapped areas, using its Laser Mapping and Navigation System and MobileEyes. The user is able to observe the view of the robot remotely, speak and hear audio with appropriate accessories, and send the robot on patrol (wandering). In summary, the Pioneer 3-DX and 3-AT robots are all-purpose robots, used for research and applications involving mapping, tele-operation, localization, monitoring, reconnaissance, vision, manipulation, cooperation and other behaviors.
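As noted earlier in this section, each robot continuously broadcasts its pose as a short text message over the TCP/IP links of Figure 5.1; Appendix 2 lists MyDecodeEncode::StringPackage and StringDecode for this purpose. Since the exact message layout is not reproduced in the text, the comma-separated format used below is only an assumption, and the function names are illustrative.

#include <string>
#include <sstream>

// Assumed wire format: "x,y,theta" in global coordinates (mm, mm, degrees).
std::string packPose(double x, double y, double theta) {
    std::ostringstream out;
    out << x << "," << y << "," << theta;
    return out.str();
}

// Parse a received pose message; returns false if it does not match the assumed layout.
bool parsePose(const std::string& msg, double& x, double& y, double& theta) {
    std::istringstream in(msg);
    char c1 = 0, c2 = 0;
    return static_cast<bool>(in >> x >> c1 >> y >> c2 >> theta) && c1 == ',' && c2 == ',';
}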
5.3 Physical Multi-Robot Transportation System

A physical multi-robot transportation system has been developed to integrate the various approaches developed in Chapters 2 through 4. The system has local sensing capabilities only. An experiment is carried out to validate these approaches in a real environment in the presence of sensor noise. In this experiment, three mobile robots equipped with the modified Q learning algorithm are employed to test the cooperative transportation strategy for a large and heavy object. The object is to be transported from one end of the environment to a predetermined goal location in the presence of an obstacle.

The simulation system described in Chapter 3 is employed to train the robots before the experiment is carried out. In this manner, the Q-tables of the robots are improved according to the environment model. After 10,000 rounds of simulated object-pushing, the Q-tables of the three simulated robots are exported to the three physical robots to complete a physical cooperative object-pushing task.
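The value update applied during these simulated training rounds is the standard single-agent Q-learning rule on which the MQLA builds (Chapter 3). In the sketch below, the learning rate of 0.8, the discount factor of 0.9 and the six actions follow the defaults in the qLearningRobot constructor of Appendix 2; the size of the state space and the greedy selection helper are illustrative assumptions.

#include <algorithm>

const int NUM_STATES  = 256;   // illustrative size of the state space
const int NUM_ACTIONS = 6;     // six pushing actions, as in the appendix constructor
double qTable[NUM_STATES][NUM_ACTIONS] = {};

// One Q-learning update after a pushing step:
// Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
void updateQ(int state, int action, int newState, double reward) {
    const double alpha = 0.8;   // learning rate (appendix default)
    const double gamma = 0.9;   // discount factor (appendix default)
    double best = *std::max_element(qTable[newState], qTable[newState] + NUM_ACTIONS);
    qTable[state][action] += alpha * (reward + gamma * best - qTable[state][action]);
}

// Greedy action choice from the learned table, used once the training is complete.
int selectGreedyAction(int state) {
    return static_cast<int>(std::max_element(qTable[state], qTable[state] + NUM_ACTIONS)
                            - qTable[state]);
}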
In the simulation system described in Chapter 3, after each step of object-pushing, the robots identify the new environment state and select the corresponding actions according to the MQLA. The same policy is employed in the experimental system. However, a larger distance is selected as the detection radius for the physical multi-robot object-pushing system in comparison to the simulation system. The experimental results of cooperatively transporting an object in a real environment are presented in Figure 5.5.

Figure 5.5: Multi-robot cooperative transportation (photographs (a) through (j) of the experiment).

The three mobile robots, the large object and an obstacle have been employed in our experimental setup. The three mobile robots are informed about their initial pose (position and orientation) with respect to the global coordinate system before they begin their work. As shown in Figure 5.5 (a), first a large object is placed in the testing area of our laboratory. When the system operation starts, each robot uses its CCD camera to wander within the testing area and to detect the color blob on the object surface. Figure 5.5 (b) shows one of the robots in the system detecting the object. If a robot among the three robots detects the color blob on the object surface, then that particular robot rearranges its current pose so as to center the color blob in its camera frame. Figure 5.5 (c) shows that the robot which detected the object has rearranged its current pose to center the color blob in its camera frame. After that, the pose of the object is estimated relative to the current pose of the particular robot using the sensory data from its laser range finder. Figure 5.5 (d) shows the robot which detected the object proceeding to estimate the pose of the object. By fusing the relative pose of the object with its own global pose, the robot can estimate the pose of the object in the global coordinate system. This information is then transmitted to the other robots in the group. At the same time, the robot that detected the object will be assigned the leadership of the group, and all other robots in the system will transfer their decision rights to the coordination unit of the leader robot through their arbitrators. After all the robots have transferred their decision-making rights to the leader robot, it will assign tasks to the peer robots in the system in order to assist them in identifying the obstacle distribution around the object. Figure 5.5 (e) shows how all the robots in the system try to identify the obstacle distribution around the object. The leader robot transmits all processed sensor information to the system. Then the robots will be able to establish the state of the current local environment individually by fusing the information of the object pose with the information of the local obstacle distribution. The local environmental state is established by the robots, and the optimal action under this state is determined by each robot using its Q-table. Figure 5.5 (f) shows how the robots push the object with the selected actions.

If none of the robots in the system can find the object, they will wander in the environment until one of them detects the object. While each robot tries to identify the existing obstacles within its detection area, it simultaneously scans its local environment utilizing the laser range finder. If an object is detected within this area, the robot needs to verify whether it is an obstacle or a peer robot. Figure 5.5 (g) and (h) show the robots detecting an obstacle (the blue garbage bin) in the path and avoiding that obstacle. This obstacle had not been detected before by the sensors of the robots due to the limited detection radius. Figure 5.5 (f) and (g) show that the robots have changed their formation to adapt to the new environment state. Here they attempt to change the orientation of the object in order to avoid the obstacle. Figure 5.5 (g) and (h) show that the robots have avoided the obstacle successfully. Figure 5.5 (i) and (j) show that the robots have effectively completed a cooperative transportation task in an environment with one unknown obstacle. However, there is an error due to wheel slip, which corresponds to the difference between the actual location of the robot and its desired location.

From Figure 5.5 (a)-(j), it is observed that the learning-based multi-robot system is quite effective in carrying out a cooperative transportation task in an environment with unknown obstacles. The Q-tables learned in the training stage help the robots to select good cooperation strategies in a robust and effective manner.

5.4 Summary

A fully distributed multi-robot transportation system was physically implemented in the Industrial Automation Laboratory (IAL) at the University of British Columbia (UBC), integrating the approaches developed in Chapters 2 through 4. In the system, three mobile robots were employed to push a heavy object to a goal location in the laboratory environment. There were three challenges in achieving the goal in the developed system. First, each robot only possesses local sensing capability, which means the robots have to make decisions without knowledge of the global environment. Second, the experimental system presents sensor noise and uncertainty of actions, which did not exist in the simulation system. These constraints degrade the performance of the robots. Finally, the real environment is dynamic due to the presence of random obstacles. Distributed computing using C++ was introduced into the developed multi-robot system, which enabled the robots to cooperate in an easy and efficient manner. An experiment was carried out to evaluate the performance of the developed distributed multi-robot system. In this experiment, cooperative robotic transportation of a long and heavy object in an environment with unknown obstacles was carried out.
The experimental results showed that the developed multi-robot system has the ability to work well in a dynamic environment, with sufficient accuracy. It can be concluded that the approaches developed in Chapters 2 through 4 are useful and effective in enabling a multi-robot system to operate in a dynamic and unknown environment.

Chapter 6
Conclusions

6.1 Primary Contributions

Multi-robot systems have to undergo many improvements before they can be used in real-life applications. For example, the challenges that have to be faced are enormous in developing a system of multiple autonomous robots to cooperatively transport an object of interest in a dynamic and unknown environment like the surface of Mars. In this thesis, several techniques have been developed and integrated to support the operation of a multi-robot object transportation system in a complex and unstructured dynamic environment with an unknown terrain. In particular, the thesis has made contributions with respect to self-deterministic learning, robustness, action optimization, and real-time operation of a cooperating team of robots in such environments. Basically, the contributions of the thesis can be classified into four main areas, as indicated below.

First, a general, distributed multi-robot architecture and a multi-robot coordination method were developed in Chapter 2. This hierarchical architecture integrated several techniques from artificial intelligence (AI) so that multi-robot systems could operate in a robust and effective manner in a dynamic and unknown environment. In its top level, a machine learning unit and a coordination unit were combined to establish good cooperative strategies for the robots while they attempt to cooperatively transport an object. A machine learning unit with effective training was used to select proper actions for the robots so that they could complete a common task quickly and effectively. In the state identification stage, the coordination unit would be temporarily activated by an arbitrator to search the environment, according to the coordination mechanism, for the object so that the robots could estimate its pose and build the state binary word. In the middle level of the architecture, a behavior-based approach, which has been proved to be very effective for single-robot systems, was employed to decompose the abstract behavior sent by the upper level into more detailed primitive behaviors so that the detailed control requirements could be generated and executed. This architecture also included a communication module so that any robot could easily exchange information with other robots in the team using standard network protocols. By combining the learning, coordination and behavior-based approaches, and carefully designing the robot coordination mechanism, the developed multi-robot architecture was found to be more flexible and powerful than a traditional two-layer architecture based on a single AI technique.

Second, machine learning was employed to find optimal cooperation strategies for multi-robot systems so that they could operate in a dynamic and unknown environment. The prototype system has to cope with scalability, computational complexity and communication bottlenecks when the number of robots is increased. To overcome the associated problems, a distributed Q-learning approach for multi-robot object transportation was developed.
By directly extending single-agent Q-learning to the multi-robot domain and carefully designing the reward function, the developed system was able to demonstrate good adaptivity and effectiveness in a complicated environment with multiple obstacles. However, due to the lack of knowledge of the actions of the peer robots, the single-agent Q-learning algorithm was found to converge very slowly in a multi-robot environment. In fact, single-agent Q-learning showed better performance (as measured by the average number of steps required in a round of object-pushing). In order to overcome various disadvantages of the traditional single-agent Q-learning algorithm when extended to a multi-agent scenario, a modified Q-learning algorithm termed MQLA was developed in this thesis. By enabling the robots to learn in a predefined sequence, the new MQLA algorithm was able to overcome several major shortcomings of single-agent Q-learning. Consequently, it is more suitable for multi-robot tasks. The simulation results showed that the MQLA algorithm needed less time to complete a multi-robot transportation task and received a better reward than the traditional Q-learning algorithm.

Third, a pose estimation technique was developed to detect the object pose using two color blobs simultaneously so that the orientations and positions of the robots could be determined in a multi-robot object transportation task. This method was found to meet two challenges in carrying out the specific task: the changing illumination levels in the work environment and the real-time operating requirement. With the available domain knowledge, the algorithm was able to effectively and quickly track multiple moving color blobs simultaneously in an environment with uneven illumination.

Finally, a physical multi-robot transportation project was developed in the laboratory to implement and validate the approaches developed in the thesis. The physical system faced more challenges than the computer simulation system; in particular, sensor noise, wheel slip, real-time operation, and motion constraints. In addition, in order to implement effective communication among multiple robots and facilitate proper execution of the new MQLA algorithm, the Microsoft Visual Studio (C++) technology was incorporated into the system. The experimental results showed that the developed physical system was able to operate effectively and robustly in a dynamic physical environment with obstacles.

6.2 Limitations and Suggested Future Research

Although the developed multi-robot transportation system has demonstrated good performance both in computer simulation and physical experimentation, there are some areas that need improvement. Some directions for further research are indicated next.

6.2.1 Improvement of the Model and Algorithm of MQLA

The credit assignment is rather difficult in multi-robot learning. For example, when multiple robots cooperatively push an object to a goal location, how to split the observed global reward among the robots is an important issue. The MQLA developed in this thesis assumes that each robot makes a positive contribution to the collective task, which may not be true in a real multi-robot system. In practice, each robot makes a different contribution to the collective task. If the global reward is directly assigned to each robot, it means that each robot receives the same reward regardless of whether it makes a positive contribution to the collective task.
Due to this global reward policy, MQLA simulation results in Chapter 3 have shown that the learning agents can converge slowly or even  121  become confused in a multi-robot cooperative task, thereby making the MQLA algorithm ineffective. It follows that updating the Q-values with a global reward is not proper in a dynamic and unknown environment. Therefore, more work may be needed to estimate the real reward of each robot from the global reward signal for the MQLA algorithm.  6.2.2 Integration of Learning and Coordination Learning and coordination are both important for multi-robot systems when operating in an unknown and dynamic environment. In this thesis, a type of switching strategy between learning and coordination was developed, where a coordination unit was temporarily activated when the stage of state identification of the machine learning approach was encountered. However, more advanced integration scheme would be useful so as to make use of coordination in a multi-robot task.  6.3 Summary New developments in multi-robot systems will endow the next generation of robotic systems with more powerful capabilities. It has been an important objective to allow multiple robots to autonomously carry out such tasks as cooperative object transportation or robotic soccer, in dynamic, unknown, and unstructured environments. In this thesis, several key approaches were developed toward achieving this goal. It is believed that no single AT technique can meet this objective. Instead, learning, reactive (behavior-based) paradigm and coordination approaches should be integrated, and new approaches should be developed on that basis so that an autonomous multi-robot system will be able to operate in a dynamic and unknown environment in an effective and robust manner. Some important work has been carried out in this thesis towards this general objective, specifically with regard to stage of state identification, improved robustness, object pose estimation, obstacle pose estimation, and real-time operation.  122  Bibliography Alpaydin, E., Introduction to Machine Learning, The MIT Press, Cambridge, MA, 2004. Arai, T., Pagello, E. and Parker, L.E., “Advances in multirobot systems,” IEEE Trans. on Robotics andAutomation, Vol.18, No. 5, pp. 655-661, 2002. Arkin, R.C., Behavior-Based Robotics, The MIT Press, Cambridge, MA, 1998. Camacho, D., Fernandez, F. and Rodelgo, M.A., “Roboskeleton: an architecture for coordinating robot soccer agents,” Engineering Application ofArtificial Intelligence, Vol. 19, No. 2, pp. 179-188, 2006. Cao, Z., Tan, M., Li, L. Gu, N. and Wang, S., “Cooperative hunting by distributed mobile robots based on local interaction,” IEEE Transactions on Robotics, Vol. 22, No. 2, pp. 403-407, 2006. Chaimowicz, L., Kumar, V. and Campos, M.F.M., “A paradigm for dynamic coordination of multiple robots,” Autonomous Robots, Vol. 17, No. 1, pp. 7-21, 2004. Craig, J.J., Introduction to Robotics: Mechanics and Control, Third Edition, Pearson Prentice Hall, Upper Saddle River, NJ, 2005. Dahl, T.S., Mataric, M.J. and Sukhatme, G.S., “Adaptive spatio-temporal organization in groups of robots,” Proc. of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Lausanne, Switzerland, pp. 1044-1049, 2004. De Silva, C.W, MECHATRONICS—An Integrated Approach, Taylor & Francis/CRC Press, Boca Raton, FL, 2005. Farinelli, A., Iocchi, L. 
and Nardi, D., “Multirobot systems: a classification focused on coordination,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 34, No. 5, pp. 2015-2028, 2004. Gerkey, B.P., and Mataric, M.J., “Sold!: auction methods for multirobot coordination,” IEEE Trans. on Robotics and Automation, Vol. 18, No. 5, pp. 758-768, 2002. Gerkey, B.P., and Mataric, M.J., “A formal analysis and taxonomy of task allocation in multi-robot systems,” International Journal of Robotics Research, Vol. 23, No. 9, pp. 939-954, 2004. Gustafson, S. and Gustafson, D.A., “Issues in the scaling of multi-robot systems for general problem solving,” Autonomous Robots, Vol. 20, No. 2, pp. 125-13 6, 2006.  123  Gopalakrishnan, A., Greene, S. and Sekmen A., “Vision-based mobile robot learning and navigation,” Proceedings of IEEE International Workshop on Robot and Human Interactive Communication, Nashville, TN, pp. 48-53, 2005. Hajjdiab, H. and Laganiere, R., “Vision-based multi-robot simultaneous localization and mapping,” Proceedings of 1st Canadian Conference on Computer and Robot Vision, pp. 155-162, May, London, ON, Canada, 2004. Huntsberger, T., Pirjanian, P., Trebi-Ollennu, A., Nayar, H.D., Aghazarian, H, Ganino, A.J., Garrett, M., Joshi, S.S. and Schenker, P.S., “Campout: a control architecture for tightly coupled coordination of multi-robot systems for planetary surface exploration,” IEEE Trans. on Systems, Man, and Cybernetics, Vol. 33, No. 5, pp. 550-559, 2003. Ito, K. and Gofuku, A., “Hybrid autonomous control for multi mobile robots,” Advanced Robotics, Vol. 18, No. 1, pp. 83-99, 2004. Kapetanakis, S. and Kudenko, D., “Reinforcement learning of coordination in cooperative multi-agent systems,” Proceedings of Eighteenth national conference on Artificial intelligence, Edmonton, Alberta, Canada, pp. 326-33 1, 2002. Kato, K., Ishiguro, H. and Barth, M., “Identification and localization of multiple robots using omnidirectional vision sensors,” Electronics and Communications in Japan, Part II: Electronic, Vol. 86, No. 6, pp. 1270-1278, 2003. Konidaris, G.D. and Hayes, G.M., “An architecture for behavior-based reinforcement learning,” Adaptive Behavior, Vol. 13, No. 1, pp. 5-32, 2005. Kovac, K., Zivkovic, I. and Basic, B.D., “Simulation of multi-robot reinforcement learning for box-pushing problem,” Proceedings of the Mediterranean Electrotechnical Conference MELECON, Dubrovnik, Croatia, pp. 603-606, 2004. —  Kumar, M. and Garg, D.P., “Sensor-based estimation and control of forces and moments in multiple cooperative robots,” Journal ofDynamic Systems, Measurement, and Control, Transactions oftheASME, Vol. 126, No. 2, pp. 276-283, 2004. Littman, M.L., “Friend-or-Foe Q-learning in general-sum games,” Proc. International Conf on Machine Learning, San Francisco, CA, pp. 322-328, 2001.  18th  Liu, J. and Wu, J., Multi-agent Robotic Systems, CRC Press, Boca Raton, FL, 2001. Marshall, J.A., Fung, T., Broucke, M.E., D’eleuterio, G.M.T., Francis, B.A., “Experiments in multirobot coordination,” Robotics and Autonomous Systems, Vol. 54, No. 3, pp. 265-275, 2006.  124  Martison, E. and Arkin, R.C., “Learning to role-switch in multi-robot systems,” Proc. of IEEE International Conference on Robotics & Automation, Taibei, Taiwan, pp. 27272734, 2003. Maurin, B., Masoud, 0. and Papanikolopoulos, N. P., “Tracking all traffic: computer vision algorithms for monitoring vehicles, individuals, and crowds,” IEEE Robotics and Automation Magazine, Vol. 12, No. 1, pp. 29-36, 2005. Miyata, N., Ota, J., Arai T. 
and Asama, H. “Cooperative transport by multiple mobile robots in unknown static environments associated with real-time task assignment,” IEEE Trans. on Robotics andAutomation, Vol. 18, No. 5, pp. 769-780, 2002. Murarka, A., Modayil, J. and Kuipers, B., “Buildin local safety maps for a wheelchair robot using vision and lasers,” Proceedings of the 3’ Canadian Conference on Computer and Robot Vision, Quebec City, QC, Canada, pp. 25-32, 2006. Panait, L. and Luke, S., “Cooperative multi-agent learning: the state of the art,” Autonomous Agents and Multi-Agent Systems, Vol. 11, No. 3, pp. 387-434, 2005. Parker, L.E., “Current state of the art in distributed autonomous mobile robotics,” Distributed Autonomous Robotic Systems 4, Springer-Verlag, Tokyo, pp. 3-12, 2000. Parker, L.E., Touzet, C. and Fernandez, F., “Techniques for learning in multirobot teams,” (Chapter 7), Editor: Baich, T. and Parker, L.E., Robot Teams: From Diversity to Polymorphism, AK Peters, Ltd., Natick, Massachusetts, 2002. Pereira, G., Campos, M. and Kumar, V., “Decentralized algorithms for multi-robot manipulation via caging,” The International Journal of Robotics Research, Vol. 23, No. 7-8, pp. 783-795, 2004. Russell S. and Norvig P., Artflcial Intelligence: A Modern Approach, Second Edition, Pearson Education, Inc., Upper Saddle River, NJ, 2003. Spong, M.W., Hutchinson, S., and Vidyasagar, M., Robot Modeling and Control, John Wiley & Sons, Inc., Hoboken, NJ, 2006. Stone, P., Sridharan, M., Stronger, D., et al., “From pixels to multi-robot decisionmaking: A study in uncertainty,” Robotics and Autonomous Systems, Vol. 54, No. 11, pp. 933-943, 2006. Takahashi, S. and Ghosh, B.K., “Parameter identification of a moving object based on sensor fusion,” Proceedings of IEEE International Conference on Mechatronics and Automation, Niagara Falls, ON, Canada, pp. 17 1-176, 2005.  125  Tangamchit, P., Dolan, J.M. and Khosla, P.K., “Crucial factors affecting decentralized multirobot learning in an object manipulation task,” Proceedings of IEEE International Conference on Intelligent Robots and System, Las Vegas, NV, pp. 2023-2028, 2003. Taylor, M.E. and Stone, P., “Behavior transfer for value-function-based reinforcement learning,” Proceedings of the 4th International Conference on Autonomous Agents and Multi agent Systems, Utrecht, Netherlands, pp. 201-207, 2005. Tomono, M., “Environment modeling by a mobile robot with a laser range finder and a monocular,” Proceedings of 2005 IEEE Workshop on Advanced Robotics and its Social Impacts, Nagoya, Japan, pp. 133-138, 2005. Veeraraghavan, H., Masoud, 0. and Papanikolopoulos, N.P., “Computer vision algorithms for intersection monitoring,” IEEE Transactions on Intelligent Transportation Systems, Vol. 4, No. 2, pp. 78-89, 2003. Wang, Y. and de Silva, C.W., “An object transportation system with multiple robots and machine learning,” Proc. of American Control Conference (A CC 2005), Portland, OR, pp.1371-1376, 2005. Wang, Y. and de Silva, C.W., “Multi-robot box-pushing: single-agent Q-Learning vs. team Q-Learning,” Proceedings of 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Beijing, China, pp. 3694-3699, 2006. Wang, Y. and de Silva, C.W, “A machine learning approach to multi-robot coordination,” Engineering Applications ofArtficial Intelligence (Elsevier), 2007. Wang, Y., “Cooperative and Intelligent Control of Multi-robot Systems Using Machine Learning,” Thesis for The Degree of Doctor of Philosophy (University of British Columbia), 2007. 
Yang, E., and Gu, D., “Multiagent reinforcement learning for multi-robot systems: a survey,” Technical Report, http://robotics. usc. edu/—maja/teaching/cs584/papers/yango4multiagent.pdf 2004. Zaerpoora, A., Ahmadabadia, M.N., Barunia, M.R. and Wang, Z., “Distributed object transportation on a desired path based on constrain and move strategy,” Robotics and Autonomous Systems, Vol. 50, No. 2-3, pp. 115-128, 2005.  126  Appendix 1— Program Functions Summary This appendix outlines some important classes and functions as used in the present work.  Classes Development of the application in this thesis has used following important classes: MyClient, MyACTS, MyPose MyDecodeEncode MyQLearningRobot  Functions MyClient.h class MyClient public: MyClient (ArRobot* robot); void connectServer(int myNo); bool sendingString(char* string,bool broadcast,const char* WinnerlP);  MyACT.h class MyGeneralACTS: public ArRobot public: MyGeneralACTS (ArRobot* robot) { }; mt ACTSCo10rB1obQ; void CentreTheBlob(ArRobot* robot,int xl,int x2,int yl,int y2)  MyPose.h class MyPose:  public ArRobot  public: MyPose() {}; void FindObstacleGroup(ArRobot* robot, ArSick* sick, double* ObstacleArrayX, double* ObstacleArrayY); void poseOfTheBox( double xl, double yl, double x2, double y2, double thetal); ArPose poseEstimate(ArRobot* robot, ArSick* sick);  127  MyQLearning.h class MyDecodeEncode: public ArRobot{ public: MyDecodeEncode(ArRobot* robot) {}; void GetMailHeading(char* MailHeading); char* StringPackage(double x, double y, double theta); ArPose StringDecode (char* strToDecode);  class gLearningRobot: public ArRobot{ public: qLearningRobot(ArRobot*robot) :tao(0.9) ,action(0) ,actiomNum( 6) ,successProcessCount(0) ,learningRate(0.8) ,discountRate(0. 9) ,wl (0. 7) ,w2 (0. 25) ,w3 (0. 05) { }; double detectionRadius(int width,int height); void selectAction(int StateCode); mt probaSelect(double p[], mt n); void calculateRobotActionPositions (mt action, ArPose boxPose, ArPose p1); void calForceParaQ; void boxResponse (ArPose position); void calNextPosition (ArPose boxPose); bool qLearningRobot: :obstacieDetect(ArPose boxPose,double* obstacleArrayNewX, double* obstacleArrayNewY); void calculateReward(ArPose oldPose); void updateQ(int state, mt newState, double reward); void frameConvert(ArPose p1, ArPose p2, ArPose origin); void frame2Tol(ArPose p1, ArPose p2, ArPose origin);  General. cpp  mt BuildStateCode () void qTablelnitialization () mt main(int argc, char **argv)  128  Appendix 2— Program Codes Summary This appendix includes important Microsoft C++ files and their implementation codes.  C++ Files MyClient.h, MyACT h, MyPose.h, MyQLeaming.h, General.cpp .  MyClient.h / This program will uses to to communicate between robots during the transportation process. It acts as a client to all robots individually in the group to receive the handshaking signal from other robots.  
#include “string.h” #include <iostream> #include <sstream>  class MyClient public: MyClient (ArRobot* robot); -MyClient () {}; void connectServer(int myNo); bool sendingString(char* string,bool broadcast,const char* Winnerlp); const char* ipAddress; mt myOwnNo; void close; private: ArRobot* myRobot; ArPose currentPose; double currentX; double currentY; double currentTh; char *strToSend; char strToBroadcast[128]; //const char* ipAddress; char *tok; char strToGet[lO]; char *otherlP[71;  129  char *strToEnd; char buffer[331; mt roboNo; mt maxRoboNo; ArMutex myCycleCailback; ArSocket globalOtherSoc[5]; sizet strSize; timet timeAuctionStart;  MyClient: :MyClient(ArRobot* robot): strToEnd (“BYE! ! !“), myRobot (robot) myExitPlanCB. setName (“myExitPlan”); Aria: :addExitCallback(&myExitPlanCB,  21);  I/Other robots’ IP address otherlP[O] =“192.168.1.3; otherlP[1] =“192.168.1.4”; otherlP[2] =“192.168.1.5”;  void MyClient: :connectServer(int myNo) //connect to other servers Sleep(10000) ; mt counter; myOwnNo = myNo; for (roboNo = 0; roboNo <maxRoboNo+1;  (roboNo == myNo) roboNo++; if (roboNo > maxRoboNo) break; for (counter = 0; counter < 1;  roboNo++)  if  7777+roboNo,  counter++)  if (globalOtherSoc[roboNo] .connect(otherlP[roboNo], ArSocket: :TCP))  break;  if  (counter  ==  1)  bool MyClient::sendingString(char* string, strToSend = string; if (broadcast  ==  bool broadcast)  true)  130  for  if if  (roboNo  (roboNo  =  0;  roboNo <maxRoboNo+1;  roboNo++)  ==  myOwnNo && roboNo < maxRoboNo) roboNo++; NULL) (WinnerlP if (strcmp(WinnerlP,otherlP[roboNoj) == 0  continue; if  (roboNo > maxRoboNo) break;  globalotherSoc [roboNo] .write (strToSend, strien (strToSend)); return true; else if  (WinnerlP NULL) ipAddress = WinnerlP; else  for  (roboNo  =  0;  roboNo < maxRoboNo+l;roboNo++)  (strcmp(ipAddress,otherlP[roboNo])  if  ==  0)  if(globalOtherSoc[roboNo] .write(strToSend,strlen(strToSend)) strlen(strToSend))  !=  return false; return true;  return false;  void MyClient: :close() myCycleCailback. tryLock 0; for (roboNo =0; roboNo < maxRoboNo+l;  roboNo++)  if(globalOtherSoc[roboNo] .write(strToEnd, strien (strToEnd)) ); 1 printf(’See You again!!!’ myCycleCailback unlock .  strlen(strToEnd))  ==  0;  131  MyACT.h #include “Aria.h” tinclude “ACTSClient h” .  #include<iostream> #include<conio h> #include<stdlib h> #include <signal. h> .  using namespace std; class MyGeneralACTS: public ArRobot public: MyGeneralACTS (ArRobot* robot) { }; —MyGeneralACTS () { }; mt ACTSC010rB10bQ; void CentreTheBlob(ArRobot* robot,int xl,int x2,int yl,int y ); 2 mt getXcgQ{return xcg;} mt getYcgQ{return ycg;} private: mt xcg,ycg; mt channel; blob generaiblob; ACTSClient actsClient;  mt  MyGeneralACTS: :ACTSC01orB10b()  if ( ! actsClient openPort (“5001”)) .  cout<<”Sorry can’t open acts port\n”<<endl; return 0;  actsClient. requestPacket Q; if( actsClient.receiveBloblnfoQ)  if (actsClient.getBlob(0,0, generalblob.area > 5000)  &generalblob)&&  cout<<”Blob Area :“ <<generalblob.area<<endl; cout<<”Received the Blob\n”<<endl; xcg = generalblob.xcg; ycg = generalblob.ycg; actsClient closePort ; return 1; .  else  132  cout<<”Blob Area :“ <<generalblob.area<<endl; cout<<”Blob is not the correct one\n”); actsClient.closePort() return 2;  else cout<<”The blob is not received\n”<<endl; actsClient. 
closePort ; return 2;  void MyGeneralACTS::CentreTheBlob(ArRobot* robot,int xl,int x2,int yl,int y2)  // 1st quadrant if(((xl—x2)> 40)  &&  ((yl—y2)< —40))  // (1)move backward  (2)move left  cout<<”The Blob is in the 1st Quadrant”<<endl; double robotAngle = robot->getPose() .getTh(); cout<<”robotAngle:  “<<robotAngle<<endl;  /7 the angle needs to be determined by either getTh or boxPoseEstimate robot->setHeading (robotAngle+90); ArUtil: :sleep(8000) robot->move (250); ArUtil: :sleep(8000) robot->setHeading (robotAngle+180); ArUtil: :sleep(8000) robot->move (700); ArUtil: :sieep (8000); robot->setHeading (robotAngle); ArUtil: :sleep(8000)  /7 2nd quadrant if(Hxl—x2)> 40)  &&  ((yl—y2)> 40))  7/ (l)move forward  (2)move left  cout<<”The Blob is in the 2nd Quadrant”<<endl; double robotAngle = robot—>getPoseQ.getThQ; cout<<”robotAngle:  <<robotAngle<<endl;  // the angle needs to be determined by either getTh or boxPoseEstimate robot—>setHeading (robotAngle+90);  133  ArUtil: :sleep (8000); robot—>move (250); ArUtil: :sleep(8000) robot->setHeading (robotAngle); ArUtil: :sleep(8000) robot—>move (100); ArUtil: :sleep(8000) ;  II 3rd quadrant if(((xl—x2)< —40)  &&  ((yl—y2)> 40))  /1 (l)move forward  (2)move right  cout<<”The Blob is in the 3rd Quadrant”<<endl; double robotAngle = robot—>getPoseQ.getThQ; /1 double robotAngle = ((22.0 * (90 + (robot.getPose() .getTh())))/ (180.0 * 7.0)); cout<<”robotAngle: “<<robotAngle<<endl;  //  the angle needs to be determined by either getTh or boxPoseEstimate robot—>setHeading (robotAngle—90); ArUtil: :sleep (8000); robot—>move (250); ArUtil: :sleep (8000); robot—>setHeading (robotAngle); ArUtil: :sleep (8000); robot—>move (100); ArUtil: :sleep (8000);  II 4th quadrant if(((xl—x2)< —40)  &&  ((yl—y2)< —40))  (2)move right // (1)move backward cout<<”The Blob is in the 4th Quadrant”<<endl; double robotAngle = robot—>getPoseO.getThQ; cout<<”robotAngle: “<<robotAngle<<endl;  II the angle needs to be determined by either getTh or boxPoseEstimate robot->setHeading (robotAngle—90); ArUtil: :sleep(8000)  robot—>move (250); ArUtil: :sleep (8000); robot->setHeading (robotAngle-l80);  134  ArUtil: :sleep(8000); robot->move (700); ArUtil: :sleep (8000); robot—>setHeading (robotAngle); ArUtil: :sleep(8000)  MyPose.h / This program uses to detect the pose of the box (position and orientation),which uses data from a laser rangefinder to detect a continues line of points. 
*****************************************************************/  #include <cmath> using namespace std;  class MyPose:  public ArRobot(  public: double center_a, MyPoseQ{}; -MyPose() {};  center_b;  void FindObstacleGroup(ArRobot* robot, ArSick* sick); void poseOfTheBox( double xl, double yl, double thetal); ArPose poseEstimate(ArRobot* robot, ArSick* sick); private: ArPose ArPose double double double  currentPose; boxPose; boxCentreX; boxCentreY; boxCentreTh;  double currentX; double currentY; double currentTh;  135  /*****************  This method use to find obstacles  ***************/  ArSick*  void MyPose::FindObstacleGroup(ArRobot* robot,  double angle,dist,distl,anglel, CenterDist, std: :list<ArPoseWithTime *> *readings; std: :list<ArposeWithTime *>: :iterator it;  sick){  CenterAngle;  sick—>lockDevice () distl = sick—>currentReadingPolar (—60, 60, &angle); readings  mt mt al  sick->getCurrentBufferQ;  al=0, a2=0, a4=0; ObstacleMinLineDiff =  =  300;  readings—>sizeO;  double* double* double* double* for  =  (it  distArray = new double[al]; angleArray = new double[al]; ObstacleDistArray = new double[al ObstacleAngleArray = new double[al =  readings—>beginQ;  it  readings—>endO;  it++){  robot—>getPose() .findDistanceTo(ArPose((*it)_> distArray[a2j getXQ, (*it)_>getyQ)); angleArray[a2] = robot—>getPose() .findAngleTo(Arpose((*it)_> getXQ, (*it)_>getyQ)); a2++;  for if  Sleep(l00);  (mt  a3=0; a3<a2—l; a3++) distArray[a3+lJ)> ObstacleMinLineDiff) (abs(distArray[a3] if (distArray[a3] > distArray[a3+l]){ ObstacleDistArray[a4] = distArray[a3]; ObstacleAngleArray[a4] = angleArray[a3]; —  else ObstacleDistArray[a4] ObstacleAngleArray[a4]  = =  distArray[a3+lJ; angleArray[a3+lj;  a4++;  mt indexLocal = (int) ((a4/2)—l); double* LocalObstacleArrayX = new doubleindexLocal1; double* LocalObstacleArrayY = new double[indexLocal]; for  (mt  a3=l;  a3 <  ((a4/2)+l); a3++){  CenterDist =(ObstacleAngleArray[a31  +  ObstacleAngleArray[a3+lj)/2;  136  CenterAngle = (abs(ObstacleAngleArray[a3] ObstacleAngleArray[a3+1J)/2) + ObstacleAngleArray[a3+1]; LocalobstacleArrayX[a3—i) = CenterDist * sin(CenterAngle); LocalobstacleArrayy[a3—lj = CenterDist * cos(CenterAngle); —  //  above corrdinate related to robot,  ObstacleArrayX ObstacleArrayy  = =  convert into Global coordinate  LocalObstacleArrayX; LocalObstacleArrayY;  void Mypose::poseOfTheBox( double xl, double xface, yface; mt height = 1700; mt width = 720; mt half_height = 850; mt half_width = 360;  double yl,  double thetal){  x_face = (xl + x2)/2; yface = (yl + y2)/2; center_a = xface + (half_width * cos (thetal)); center_b = yface + (half_width * sin (thetal)); “<<centerb cout<<” Box Center X and Y: \n”<< center_a <<“ <<endl;  ArPose MyPose::poseEstimate(ArRobot* robot,  ArSick* sick)  double angle,dist,distl,anglel,startangle,startdist,endangle,enddist; double temp angle,temp = 0.0; std: :list<ArPoseWithTime > *readings; std: :list<ArPoseWithTime *>: :iterator it; ArUtil: :sleep(l000); sick—>lockDevice Q; distl = sick->currentReadingPolar (—60, 60, &angle); readings  =  sick->getCurrentBufferQ;  mt mt  i=0, j=0, a=0, b=1, c=1; al=0, a2=0, a3, minAngle=0, minPosi; double distance=300.0, s=0; al = readings—>sizeO; double* distArray = new double[al]; double* angleArray = new double[alj; double distil, dist2, anglell, angle2, double maxmumLineDiff=300;  frontDistance;  137  for  (it = readings—>beginQ;  it  != readings—>end; it++){  double distOOl = robot—>getPose() .findDistanceTo(ArPose((*it)_> (kit) —>getY())); double angieOOl = 
robot—>getPose() .findAngleTo(Arpose((*it)_> getXO, (*it)_>getyo)); getX  ,  distArray[a2] = robot->getPose() .findDistanceTo(ArPose((*it)_> getXQ, (*it)_>getyQ)); angleArray[a2] = robot—>getPose() findAngleTo(ArPose( (*it)_> getXQ, (*it)_>getyQ)); a2++;  for  (a3=O; if  for if  a3<=al;  a3++)  ((angleArray[a3])>= minAngie){ frontDistance = distArray[a3]; minPosi = a3; break;  (a3=minPosi—l; a3>=O;  a3——)  (abs(distArray[a3+l]— distArray[a3])>maxmumLineDiff) distil = distArray[a3+lJ; angleli = angieArray[a3+l1; break;  for if  (a3=minPosi+i;  a3<=al;  a3++)  (abs(distArray[a3—lj- distArray[a3])>maxmumLineDiff) dist2 = distArray[a3—l]; angle2 = angieArray[a3—l]; break;  Startdist = distil; startangle= anglell; enddist = dist2; endangle= angie2;  /1 Start and End angle calculation based on the XY plan double alpha=O, beta=O, dl, d2; alpha = ((22.0 * (90 + startangleH/ (180.0 * 7.0)); beta = ((22.0 * (90 + endangleH/ (180.0 * 7.0));  138  dl= startdist; d2= enddist; double xl new, ylnew, x2_new, y2new, double newcenter_b ,newcenter_a; xl new ylnew x2new y2new  = = = =  dl dl d2 d2  *  * *  -  dnum;  ylnew; x2_new; * (7 * atan(num / dnumH)/22;  —  poseOfTheBox(xl new,  /  num,  cos(alpha); sin(alpha); cos(beta); sin(beta);  *  num = y2 new dnum = xl_new thetal = (180  thetal,  ylnew,  x2new,  y2new,  thetal);  Error Handler—Start *******************/ newcenter_a = center_a; newcenter_b = value + center_b; boxCentreX =newcenter_a; boxCentrey =newcenter_b; boxCentreTh =thetal; boxPose. setPose (boxCentreX, boxCentreY, boxCentreTh); * * * * * * **** ** * * * * * * * *** *  / * * ** * * * * * * * * * * * * * * * Error Handler—End  /  Start ********************/ Globle Box Pose double x globle new, ygloblenew, Tb globle new; double anglenew, gama globle new;  /***************  —  Tb globle new = ((22 * (robot->getPose().getThQ))/ angle_new = (robot->getPose().getTh()) thetal;  (180  *  7));  —  xgloble new = (robot->getPose().getX()) + center_a cos (Thgloble new) + center_b * sin (Tb globle new); ygloble new = (robot->getPoseQ.getYQ) + center_a center_b * sin (Tb globle new); (Thgloble new) gama globle new = (180 * 7 * angle_new) /22;  *  *  cos  -  cout<<  * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *  \n” <<endl; x globle new << “ , “ ) “<<endl; “* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * \n “ <<endl;  “  cout<< “Globle Box Position : << angle_new << ygloble new << “ , cout<<  /********************  Globle Box Pose  (  —  “  <<  End  *********************/  sick—>unlockDevice 0; robot—>waitForRunExit 0; return boxPose;  139  MyQLeaming.h class MyDecodeEncode:  public ArRobot{  public: MyDecodeEncode(ArRobot *robot, ArSick* sick, ArAnalogGyro* }; -MyDecodeEncode () { }; void GetMailHeading(char* MailHeading); char* StringPackageol(bool boxOrRobot,double x, double y, double theta); char* StringPackageo2 (mt state); ArPose StringDecode (char* strToDecode); mt StringDecodeState (char* strToDecode); mt capabiiityCheck(bool arm,bool gripper,bool camera); char* dec2bin(int statecode,char *binaryequivalent); mt totaiStateCodeOl (mt statecodeOl, mt statecodeo2); mt totaiStateCode02 (mt stateCodelntOl, mt stateCodelnt02); gyro)  {  private: double posex, posey, posetheta; mt state; char* strToSend char strToGet [128]; char strToBroadcast[128]; char* strToDecode; char* actionToSend ; char actionToGet[128]; char actionToBroadcast [128]; char* actoinToDecode; char buffer[33]; char *tok; char* localMailHeading; bool isFinished; 
double forceValue[6]; size_t strSize; ArPose boxOrRobotpose; ArPose position;  char* MyDecodeEncode: :StringPackage0l (bool boxOrRobot,double x, y, double theta) if  (boxOrRobot  ==  double  true)  strcpy(strToBroadcast,”boxPose! ! !“); strcat(strToBroadcast,itoa((int)x,buffer, 10)); strcat (strToBroadcast,”! strcat (strToBroadcast, itoa( (int)y, buffer, 10)); strcat (strToBroadcast,”! strcat (strToBroadcast, itoa( (int) theta,buffer, 10)); strToSend = strcat(strToBroadcast,”!! !“);  140  else strcpy (strToBroadcast, “robotpose ! strcat (strToBroadcast, itoa ( (int) x,buffer, 10)); strcat (strToBroadcast,”! !“); strcat (strToBroadcast, itoa ( (int) y,buffer, 10)); strcat (strToBroadcast,”! ! strcat(strToBroadcast,itoa((int)theta,buffer,10)); strToSend = strcat(strToBroaclcast,”!! !“);  return strToSenci;  char* MyDecodeEncode: :StringPackage02  (mt  state)  strcpy(actionToBroadcast, “An environment state! ! !“); strcat(actionToBroadcast, itoa( (int)state,buffer, 10)); strToSend = strcat (actionToBroadcast, “1 ! !“); return actionToSend;  ArPose MyDecodeEncode: : StringDecode (char* strToDecode) tok = strtok(strToDecode,”! ! localMailHeading = tok; if  ( strcmp(tok,”boxPose”) ==0) tok = strtok (NULL,”! ! !“); posex =atoi(tok); tok = strtok(NULL,”!!!”); posey = atoi(tok); tok = strtok(NULL,”!!!”); posetheta = atoi(tok); boxOrRobotpose. setPose (posex, posey, posetheta);  if  ( strcmp(tok,”robotPose”) ==0) tok = strtok(NULL,”! ! !“); posex =atoi (tok); tok = strtok(NULL,”!!!”); posey = atoi(tok); tok = strtok(NULL,”!!!”); posetheta = atoi(tok); boxOrRobotpose setPose (posex, posey, posetheta); return boxOrRobotpose;  141  mt  MyDecodeEncode: :StringDecodeState(char* strToDecode) tok = strtok(strToDecode,”! ! !“); localMailHeading = tok; if  (  strcmp(tok,”An environment state”)  ==0)  tok = strtok(NULL,”!!!”); state =atoi(tok);  return state;  mt  MyDecodeEncode: :capabilityCheck(bool arm,bool gripper,bool camera)  mt  capabilityCheck =0;  /1 armonly=l; gripperonly=2; cameraonly=3; arm&&gripper=4; // arm&&camera=5; gripper&&camera=7; arm&&gripper&&camera=8; /1 nocomponent=0; if  (arm == true) capabilityCheck  if  ++;  (gripper == true) capabilityCheck =capabilityCheck+2; capabilityCheck ++;  if  (camera == true)  capabilityCheck =capabilityCheck+3; capabilityCheck ++;  return capabilityCheck;  char* MyDecodeEncode: :dec2bin(int statecode,char *binaryequivalent)  mt mt  k =0,n = 0; remain, oldStatecode; char temp[16]; do oldStatecode = statecode; remain = statecode % 2; = statecode / 2; statecode tempk++J = remain + ‘0’; while (statecode > 0); while  0) (k > bmnaryequivalent[n++] binaryequivalent[n-l1  = =  temp[--kJ; 0;  142  return binaryequivalent;  mt  MyDecodeEncode: :totalStateCodeOl  (mt  statecodeOl,  mt  statecodeO2)  mt mt  b, k, m, n; sum = 0; char binaryequivalentOl [161; char binaryequivalento2 [161;  for  (mt  =  0;i < 16;i++)  binaryequivalent0l[i]= ‘\O’; binaryequivalentO2[iJ= ‘\O’;  bitset<16> binaryOl ( string (dec2bin(statecode0l, binaryequivalent0l)H; bitset<16> binaryo2 ( string(dec2bin(statecodeo2, binaryequivalento2H); bitset<16> binaryequivalent = binaryOl I binary02; for(k  =  0;  n  =  k  <=  15;  k++)  (int) (binaryequivalent[k) if ( (n > 1) II (n < 0)  for(b  =  return  (0);  1, m  15; m  b sum  =  =  sum  =  >  -  ‘0’);  k; m--)  2; +  n  *  b;  return(sum);  mt  MyDecodeEncode: :totalStateCode02 stateCodelnt02)  mt totaiStateCode; totalStateCode=(stateCodelnt0l return totaiStateCode;  (mt  I  stateCodelntOl,  mt  stateCodelnt02);  class MyqlearningRobot: public 
ArRobot{ public: MyqLearningRobot(ArRobot *robot, ArSick* sick, ArAnalogGyro* gyro) :tao(0.9),action(0), learningRate(0.8),discountRate(0.9) ,lArm(20),wl(0.7),w2(0.25),w3(0.05){}; -MyqLearningRobot () { };  143  double detectionRadius(int width,int height); mt selectAction(int StateCode); mt probaSelect(double p, mt n); ArPose calculateRobotActionPositions (mt action, ArPose boxPose); void calForceParaQ; void boxResponse(ArPose position,ArPose newPose); void calNextPosition (ArPose boxPose); bool obstacleDetect (ArPose boxPose); void calculateReward(ArPose oldPose); void updateQ(int state, mt newState, double reward); void frameConvert(ArPose p1, ArPose p2, ArPose origin); ArPose frame2Tol(ArPose p2, ArPose origin); bool arriveGoal(ArPose theGoal,ArPose boxPose);  private: double posex,posey,posetheta; sizet strSize; ArPose boxpose; mt forcevalue; double forceValue[61; mt actionNum; mt state; mt reward; mt action; double angle; double netForceAngle; double netForceValue; double netTorque; double* obstacleArrayNewX; double* obstacleArrayNewY; double q[6]; double sum; double minQ; double learningRate; double discountRate; double tao; mt lArm; double Mf,Mt; double wl,w2,w3;  double MyqLearningRobot: :detectionRadius (mt width, mt height) double detectRadius; detectRadius = (sqrt((double) (((width/2)*(width/2H+ ((height/2) * (height/2) ) ) ) ) +400; return detectRadius;  mt  MyqLearningRobot: :selectAction(int state) sum=0; minQ=l00000000; double ranSelProb=0. 1; double v=randQ*100;  144  if (v<=ranSelprob*lO0) action=(int) (randQ*6); return action;  for(int i=0;  i<=5;  i++)  q[il=qTable[statel [ii; if(q[i]<minQ) minQ=q[i];  for(int i=0;  i<=actionNum-l; i++)  q[i]=q[i]—minQ;  for(int 1=0;  i<=actionNum—1;  i++)  sum=0; sum=sum  +  exp(q[i]/tao);  double p[ 1; 6 for(int i=0; p[i}  =  i<=actionNum-1;  i++)  exp(q[ij/tao)/sum;  tao=tao*0 .999; action=probaSelect (P, actionNum); return action;  mt  MyqLearningRobot::probaSelect(double p[),  mt  n)  double accumSum[7]; accumSum[0j = 0; for(int i=l; i<=n;i++) accumSum[iJ=accumSum[i-l1+p[i—l] *100;  double v=rand() *100; for(int i=0; i<=n; i++) if (v>=accumSum[i]) continue; else return i—l;  145  return 0;  ArPose MyqLearningRobot: :calculateRobotActionPositions ArPose boxPose)  double w=1700; ArPose p2; if  (mt  action,  double h=720;  (action==0) p2.setX(w/2) ; p2.setY(0)  if  (action==1) p2. setX( (w/2) —50) ; p2. setY(h/2)  if  (action==2) p2.setX(—((w/2)—50)); p2.setY(h/2);  if  (action==3) p2.setX(—w/2); p2.setY(0);  if  (action==4) p2.setX(—((w/2)—50H; p2.setY(—h/2);  if  (action==5) p2.setyUw/2)—50); p2.setY(—h/2);  ArPose p1 = frame2Tol(p2,boxPose); return p1;  void MyqLearningRobot: :calForcePara() double P1 = 3.1415926535897932384626433832795; for(int i=0; i<6;i++) if  (forceValue[i)>30) forceValue [ii =0;  double fx=forceValue [3] —forceValue [01; double fy=forceValue[4]+forceValue[5]—forceValue[2]—forceValue[l];  146  netForceAngle=(atan2 (fy, fx) ) /PI*180; netForceValue=sqrt ( (fxfx) + (fy*fy)); netTorque= (forceValue [5] +forceValue [2] -forceValue [1] —forceValue [4] ) lArm;  *  bool MyqLearningRobot: :obstacleDetect(ArPose boxPose) double rr=0; ArPose obsta; ArPose obstacleLocalPos (0,0); for(int i=O;i<sizeof  (obstacleArrayNewX) ;i++)  obsta. setX (obstacleArrayNewX[i]); obsta. 
setY (obstacleArrayNewY [ii); frameConvert(obsta,obstacleLocalPos, double double double double  boxpose);  x=obstacleLocalPos.getXO; y=obstacleLocalPos.getYO; width=1700; height=720;  if((abs(x)<=width/2) && (abs(y)<=(225+height/2H) return true; if((abs(y)<=height/2) && (abs(x)<=(225+width/2H) return true; bool tmp=false; tmp= ( (x—width/2) * (x—width/2) + (y—height/2) * (y—height/2) ) <=225*225; tmp=tmp I (((x—width/2) * (x—width/2) + (y+height/2) * (y+height/2))<=225*225) ; tmp=tmp II (((x+width/2) * (x+width/2) + (y—height/2) * (y-height/2)) <=225*225); tmp=tmp II (Ux+width/2) * (x+width/2) + (y+height/2) * (y+height/2) ) <=225*225) tmp=tmp && (abs(x)>(width/2)) && (abs(y)>=height/2); if (tmp) return true; return false;  void MyqLearningRobot: :boxResponse (ArPose position,ArPose newPose) bool isFinished; isFinished = false; ArPose oldPose; oldPose.setX(position.getXQ); oldPose. setY (position. getY Q); oldPose. setTh (position. getTh ());  147  mt  tMax=1; for(int i=O;  i<tMax;i++)  calNextPosition (newPose); if  (obstacleDetect (newPose)) break;  else position.setX(newPose.getXQ); position.sety(newPose.getYQ); position. setTh (newPose getTh 0); .  calculateReward (oldPose); isFinished=true;  void MyqLearningRobot: :calNextPosition(ArPose boxPose) ArPose position; double P1 = 3.14159; boxPose.setX(position.getXQ+Mf*netForceValue*cos ( (netForceAngle+ position.getTh0)/180*PI)) boxPose.setY(position.gety0+Mf*netForceVaiue*sin( (netForceAngle+ position.getTh0)/180*PIH; boxPose. setTh (position. getTh () +netTorque*Mt);  void MyqLearningRobot: :calculateReward(ArPose oldPose)  // calalculate reward Distance double dOld=pow((oldPose.getX0—theGoal.getX0) ,2)+ pow((oldPose.getY0-theGoal.gety0) ,2); dOld=sqrt (dOld); double dNew=pow((position.getX0—theGoal.getX0),2)+ pow((position.getY0—theGoal.getY0),2); dNew=sqrt (dNew); double ratioE=O.3; double rDist= (dOld-dNew) *ratioE; ArPose tmpObst; ArPose obstLocalPosition; double obstAngle;  148  double min=l000;  // calalculate reward Obstacle for(int i=0;i<sizeof (obstacleArrayNewX) ;i++)  tmpObst. setX (obstacleArrayNewX [i]); tmpObst. setY (obstacleArrayNewY [ii); double tmpDist=pow((oldPose.getXQ—tmpObst.getXQ),2)+ powUoldPose.getYQ—tmpObst.getYW,2); tmpDist=sqrt (tmpDist); if (tmpDist> (detectionRadius (1700,720))) continue; frameConvert (tmpObst,  obstLocalPosition,  oldPose);  obstAngle=atan2(obLocPosition.getYQ,obLocposition.getXQ)/PI*180; double tmpDiff=abs (netForceAngle-obstAngle); if (tmpDiff>180) tmpDiff=360—tmpDiff; if  (tmpDiff<min) min=tmpDiff;  double rObstacle=0; if (min==1000) rObstacle=1; else rObstacle= (sin (min/180*PI/2 .0) —0.3) *1.5;  /1  calalculate reward Roation  double rRotation=abs(position.getThO—oldPose.getThO); rRotation=cos (rRotation/180*PI) -0.9;  // total reward reward=wl *rDist+w2 *rObstacle+w3*rRotation;  void MyqLearningRobot::updateQ(int state,  mt  newState,  double reward)  mt actionNum = 6; double learningRate = 0.8; double discountRate = 0.9; double q[6]; double vMax = -10000;  149  for(int i=0;  i<=actionNum—l;  i++)  q[iJ=qTable[newState] [i]; if(q[i]>=vMax) vMax=q[i];  qTable[state] [actionj=(l_learningRate)*qTable[statej [action] +learningRate* (reward+discountRate*vMax);  void MyqLearningRobot::frameConvert(ArPose pl,  angle double double double double  =  ArPose p2,  ArPose origin)  (angle/(l80*(PIH); xl yl x2 y2  = = = =  pl.getXQ; pl.getYQ; origin.getXQ; origin.getY();  p2. setX( (xl*cos (angle) +(yl*sin (angle))); p2. 
    setY((y1 * cos(angle)) - (x1 * sin(angle)));
}

ArPose MyqLearningRobot::frame2Tol(ArPose p2, ArPose origin)
{
    ArPose p1;
    double x1 = p2.getX();
    double y1 = p2.getY();
    double x2 = origin.getX();
    double y2 = origin.getY();
    double theta = origin.getTh();

    p1.setX((x1 * cos(theta)) - (y1 * sin(theta)) + x2);
    p1.setY((x1 * sin(theta)) + (y1 * cos(theta)) + y2);
    return p1;
}

bool MyqLearningRobot::arriveGoal(ArPose theGoal, ArPose boxPose)
{
    double minSuccessDist = 1000;
    double d = 0;

    double x = boxPose.getX() - theGoal.getX();
    double y = boxPose.getY() - theGoal.getY();
    d = sqrt((x * x) + (y * y));
    if (d <= minSuccessDist)
        return true;
    else
        return false;
}

General.cpp
/*****************************************************************
 This program is used to push the box to a given goal location in the
 controlled environment at the IAL. During the transportation task,
 speed-limiting actions for collision avoidance also have to be
 considered. Press escape to shut down Aria and exit.
*****************************************************************/
#include <bitset>
#include <iostream>
#include <conio.h>
#include <stdlib.h>
#include "Aria.h"
#include "Server.h"
#include "Client.h"
#include "Pose.h"
#include "Qlearning.h"
#include "GeneralACTS.h"

using namespace std;

double objectPoseX, objectPoseY, objectPoseTh;
double* obstacleArrayX;
double* obstacleArrayY;
double* obstacleArrayNewX;
double* obstacleArrayNewY;
double detectionRobotArrayX[5];
double detectionRobotArrayY[5];
double detectionRobotArrayTh[5];
double dR;

int BuildStateCode(ArPose objectCenter)
{
    double objectCenterX = objectCenter.getX();
    double objectCenterY = objectCenter.getY();
    double objectCenterTh = objectCenter.getTh();
    ArPose theGoal;
    theGoal.setPose(500, 7000);
    double PI = 3.1415926535897932384626433832795;
    int stateCodeOfThisRobot;

    double goalAngle = (atan2((theGoal.getY() - objectCenterY), (theGoal.getX() - objectCenterX))) / PI * 180;
    if (goalAngle < 0)
        goalAngle = goalAngle + 360;
    stateCodeOfThisRobot = (int)(goalAngle / (360.0 / 256.0));

    for (int i = 0; i < sizeof(obstacleArrayNewX); i++)
    {
        ArPose localPos;
        objectCenterTh = objectCenterTh / 180 * (PI);
        double x1 = obstacleArrayNewX[i];   // ObstacleArrayX[i];
        double y1 = obstacleArrayNewY[i];   // ObstacleArrayY[i];
        double x2 = objectCenterX;
        double y2 = objectCenterY;

        localPos.setX((x1 * cos(objectCenterTh)) + (y1 * sin(objectCenterTh))
            - (x2 * cos(objectCenterTh)) - (y2 * sin(objectCenterTh)));
        localPos.setY((y1 * cos(objectCenterTh)) - (x1 * sin(objectCenterTh))
            + (x2 * sin(objectCenterTh)) - (y2 * cos(objectCenterTh)));

        double angle = (atan2(localPos.getY(), localPos.getX())) / PI * 180;
        if (angle < 0)
            angle = angle + 360;
        int obstaStat = (int)(angle / 45.0);
        obstaStat = (int)pow(2.0, obstaStat);
        stateCodeOfThisRobot = stateCodeOfThisRobot | (obstaStat << 8);
    }
    return stateCodeOfThisRobot;
}

void qTableInitialization()
{
    double qTable[8192][6];
    for (int i = 0; i < 8192; i++)
        for (int j = 0; j < 6; j++)
            qTable[i][j] = 0;
}

int main(int argc, char **argv)
{
    Aria::init();
    int robotNo = 2;

    ArPose initialPose(xvalue, yvalue);
    ArPose currentPose;
    ArPose theGoal;
    theGoal.setPose(500, 7000);
    ArRobot robot;
    ArSonarDevice sonar;
    ArSick sick;
    ArKeyHandler keyHandler;
    ArArgumentParser parser(&argc, argv);
    ArSimpleConnector simpleConnector(&argc, argv);

    robot.addRangeDevice(&sonar);
    robot.addRangeDevice(&sick);

    // Create objects
    MyServer myServer;
    MyClient myClient(&robot);
    MyReading myReading(&robot);
    MyGeneralACTS myACTS(&robot);
    MyPose myPose;
    MyDecodeEncode myDecodeEncode(&robot, &sick, &gyro);
    MyqLearningRobot myqLearningRobot(&robot, &sick, &gyro);

    Aria::setKeyHandler(&keyHandler);
    robot.attachKeyHandler(&keyHandler);

    // the actions we'll use for wander
    ArActionStallRecover
recover; ArActionBumpers bumpers; ArActionAvoidFront avoidFrontNear(”Avoid Front Near”, 225, 0); ArActionAvoidFront avoidFrontFar; ArActionConstantVelocity constantVelocity(”Constant Velocity”, 150); ArActionGoto gotoPoseAction(”goto”,ArPose(0,0,0),l00,100,80,2);  /1 Parse all command line arguments (!simpleConnector.parseArgs() II argc  if  >  1)  simpleConnector.logOptions 0; keyHandler. restore Q; return 1;  1/ Connect to the robot if  (!simpleConnector.connectRobot(&robot)) printf (“Could not connect to robot... Aria: :exit(l)  exiting\n”);  /1 Connect to the Laser Range Finder if  (!simpleConnector.connectLaser(&sick)) printf (“Could not connect to Laser Range Finder... Aria: :exit(l)  if  (!Aria: :parseArgs()  II  exiting\n”);  !parser.checkHelpAndwarnUnparsedO)  Aria: : logOptions Q; exit (1)  cout<<”Robot Th:  “<<  robot.getPose() .getThQ<<endl;  // add the actions robot.addAction(&recover, 100); robot.addAction(&bumpers, 75); robot.addAction(&avoidFrontNear, 51); robot.addAction(&avoidFrontFar, 35); robot.addAction(&constantVelocity, 25); robot.addAction(&gotoPoseAction, 50); / /****************************** myServer.open (7777); Sleep (1000) myServer. runAsync ; myReading. runAsync Q; myClient.connectServer (7777); robot. setAbsoluteMaxTransAccel (40); robot. runAsync (true); robot.comlnt (ArCommands: :ENABLE, 1); const char* myIP= “IP Adress”; char* newMail = NULL;  153  mt mt mt mt  noOfRobot=0; counti = 0; count2 = 0; totalNoOfRobot = 3; double dR, dRX, dRY, thetaDr; mt width = 1700; mt height = 720; ArPose aa,bb; char *55; char *messagestr; char *messagestate; double* x; double* y; double* th; char* mailheading = NULL; mt stateWord = 0; mt action = 0; mt statecodelnteger = 0; char *otherlP[7]; otherlP[0] =“192.168.1.3”; otherlP[11 =“192.168.1.4”; otherlP[2] =“192.168.l.5”; dR = myqLearningRobot.detectionRadius (1700,720);  while  (Aria: :getRunning())  robot. clearDirectMotion Q; newMail = myReading.checkMailO; if (newMail==”STOP”) robot.stop(); cout<<”wait! “<<endl;  if  (newMail==”Estimated Object Pose Transmitted”) aa = myDecodeEncode.StringDecode(ss); myDecodeEncode GetMailHeading (mailheading); .  if(mailheading  ==  “boxPose”)  objectPoseX = aa.getXQ; objectPoseY = aa.getY(); objectPoseTh = aa.getTh(); thetaDr = atan((objectPoseX-robot.getPose() .getXW/ (objectPoseY—robot.getPose () .getY OH; dRX = aa.getXO+dR*cos(thetaDr); dRY = aa.getYO+dR*sin(thetaDr); gotoPoseAction.setGoal(ArPose(dRX, // move to detection radious  dRX,  0));  robot.addAction(&bumpers, 75); robot.addAction(&avoidFrontNear, 52); robot.addAction(&avoidFrontFar, 20); while (!gotoPoseAction haveAchievedGoal 0) Sleep(l0000); myClient.sendingString(”Leader I’m in detection .  radius”,true,NULL);  154  if  (newMail==”Leader I’m in detection radius”) if  (((robot.getPoseQ.getXQ- dRX) <= 50) ((robot.getPoseQ.getYQ- dRY) for (mt IP=0; IP<3; IP++) if  && <= 50)){  ((!strcmp(myIP,otherlP[IP])==0) && (!strcmp( (char*) 1010, otherlP[IP] )==0))  messageStr = myDecodeEncode. StringPackageol (false,robot.getPoseO.getXQ,robot.getPoseO. getY () , robot. getPose () getTh H; myClient.sendingString(messageStr,true, otherlP[IPJ); myclient.sendingString(”Close to detection radius”,true, otherlP[IPI); robot. stop Q; .  if  (newMail==”I am in close to detection radius”) bb = myDecodeEncode.StringDecode(messageStr); myDecodeEncode. GetMailHeading (mailheading); if(mailheading == “robotPose”) if  (! 
(robotNo == 1010)) detectionRobotArrayX[noOfRobot]= bb.getXQ; detectionRobotArrayY[noOfRobot]= bb.getY(); detectionRobotArrayTh[noOfRobot]= bb.getTh(); noOfRobot ++; if (noOfRobot == 2) double twoX = detectionRobotArrayX[0]detectionRobotArrayX [1]; double twoY = detectionRobotArrayY[0]— detectionRobotArrayY [1]; double V=(double) sqrt( (twoX*twoX)+(twoY*twoYH; if  (V >=  ((2*dR)_500))  myPose.FindObstacleGroup(&robot, &sick); if ((sizeof(obstacleArrayX)== 0) && (sizeof (obstacleArrayY) == 0)) mt index = 0; double obstacleArrayNewX [sizeof (obstacleArrayX)]; double obstacleArrayNewY [sizeof (obstacleArrayX) 1; for (mt i=0; i <(sizeof (obstacleArrayX)—l);  i++)  double bbx = aa.getXQ;  155  double double double double if  bby = aa.getYQ; oAx = obstacleArrayX[i] bbx; oAy = obstacleArrayY[i] bby; oAxy = sqrt((oAx*oAx)+(oAy*oAyH; -  —  (oAxy <= dR)  obstacleArrayNewX[index] obstacleArrayNewY[index] index++;  = obstacleArrayX[i]; = obstacleArrayY[i];  stateCodelnteger = BuildStateCode (aa, obstacleArrayNewX, obstacleArrayNewY);  (mt  for if  infoTransmit=0;  infoTransmit<3;  (!strcmp (myIP, otherlP [infoTransmitj  infoTransmit++)  )  ==0)  myDecodeEncode StringPackageo2 (stateCodelnteger); myClient. sendingString (messageState, true,otherlP[infoTransmit]); myClient. sendingString(”generate the satate code”, true, otherlP[infoTransmit)); messageState =  (newMail==”generate the satate code”)  if  stateword = myDecodeEncode.StringDecodeState(messageState); myDecodeEncode GetMailHeading (mailheading); .  if  (mailheading == “An environment state”) if  (robotNo == 1010)  action = myqLearningRobot.selectAction(stateWord); myClient.sendingString(”I received an action i” ,true,NULL);  if  (newMail ==  “  I received an action i “)// i = 1,2,3,4,5,6  countl ++; if (countl == totalNoOfRobot) ArPose actionPosition = myqLearningRobot. calculateRobotAction Positions (action, aa);  156  gotoPoseAction. setGoal (actionPosition); robot.addAction(&bumpers, 75); robot.addAction(&avoidFrontNear, 52); robot.addAction(&avoidFrontFar, 20); while ( gotoPoseAction haveAchievedGoal 0) Sleep(100); robot.stop(); myClient.sendingString(”I am in the action position!! !“,true,NULL); robot remAct ion (&bumpers); robot. remAction (&avoidFrontNear); robot. remAction (&avoidFrontF’ar); .  .  if  (newMail  ==  “I am in the action position”)  count2 ++; if (myqLearningRobot.arriveGoal (theGoal,aa)  ==  true)  cout<<”Now in the goal”<<endl; break; I/end program bb  =  myPose.poseEstimate(&robot,  &sick,  &gyro);  while (robot.checkRangeDevicesCurrentPolar (—10, l0)>1200) currentPose = robot.getPose0; currentX = currentPose.getX0; currentY = currentPose.getl0; currentTh = currentPose.getTh(); Sleep ( 500)  if(robot.checkRangeDevicesCurrentPolar(-30, 30) <1200) robot. stop 0;  /1 /1 /1  2 ----  1  mt mt  I  3  I  4  I  blobxl, blobx2, blobyl, bloby2; bloblnfo = myACTS.ACTSC010rB10bO; cout<<’bloblnfo: “<<bloblnfo<<endl; ArUtil: :sleep(8000);  switch (bloblnfo) case 1:  1/ received the blob  myClient. 
sendingString("STOP", true, NULL);

            myACTS.getXcg();
            myACTS.getYcg();
            cout << "Blob readings -> XCG: " << myACTS.getXcg() << " YCG: " << myACTS.getYcg() << endl;

            blobx1 = 320;
            bloby1 = 240;
            blobx2 = myACTS.getXcg();
            bloby2 = myACTS.getYcg();
            myACTS.CentreTheBlob(&robot, blobx1, blobx2, bloby1, bloby2);
            ArUtil::sleep(8000);

            double x, y, theta;
            myClient.sendingString("Object Pose Estimating", true, NULL);

            aa = myPose.poseEstimate(&robot, &sick, &gyro);
            x = aa.getX();
            y = aa.getY();
            theta = aa.getTh();

            // this robot detected the color blob; change its number
            myClient.myOwnNo = 1010;
            myClient.sendingString("Object Pose Estimated", true, NULL);

            ss = myDecodeEncode.StringPackage01(true, x, y, theta);

            for (int robotId = 0; robotId < 3; robotId++)
            {
                if (!strcmp(myIP, otherIP[robotId]) == 0)
                    myClient.sendingString(ss, true, otherIP[robotId]);
            }

            ArUtil::sleep(8000);
            myClient.sendingString("Estimated Object Pose Transmitted", true, NULL);
            break;

        case 2:   // the blob was not received
            cout << "Sorry, no luck this time" << endl;
            cout << "I'll try my best to find the blob here if possible" << endl;
            int randomAngle = rand() % 90 - 45;
            cout << "Random Angle: " << randomAngle << endl;
            ArUtil::sleep(8000);
            robot.setHeading(randomAngle);
            ArUtil::sleep(8000);
            break;
        }   // end switch
    }   // end while (Aria::getRunning())

    // Robot disconnected, shut down
    Aria::shutdown();
    return 0;
}
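For reference, the table update implemented in MyqLearningRobot::updateQ and the Boltzmann (softmax) action selection implemented in MyqLearningRobot::selectAction in the listings above correspond to the standard single-agent Q-learning relations

    Q(s,a) \leftarrow (1-\alpha)\, Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') \right]

    P(a_i \mid s) = \frac{e^{Q(s,a_i)/\tau}}{\sum_{j=1}^{6} e^{Q(s,a_j)/\tau}}

with learning rate \alpha = 0.8, discount rate \gamma = 0.9, and temperature \tau initialized to 0.9 and decayed by a factor of 0.999 after each selection, as set in the code. In selectAction the Q values of the current state are first shifted by their minimum before the softmax is formed, and with a small probability a purely random action is chosen instead.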
