"Applied Science, Faculty of"@en . "Mechanical Engineering, Department of"@en . "DSpace"@en . "UBCV"@en . "Siriwardana, Pallege Gamini Dilupa"@en . "2009-11-09T16:53:16Z"@en . "2009"@en . "Master of Applied Science - MASc"@en . "University of British Columbia"@en . "Multi robot cooperative transportation is an important research area in the multi robot\ndomain. In the process of object transportation, several autonomous robots navigate cooperatively in either a static or a dynamic environment to transport an object to a goal location and orientation. The environment may consist of both fixed and removable obstacles and it will be subject to uncertainty and unforeseen changes within the environment. Furthermore, more than one robot may be required in a cooperative mode\nfor handling heavy and large objects. These are some of the challenges addressed in the\npresent thesis.\nThis thesis develops a machine learning approach and investigates relevant research\nissues for object transportation utilizing cooperative and autonomous multiple mobile\nrobots. It makes significant original contributions in distributed multi robot coordination and self deterministic learning for robot decision making, and comes up with an optimal solution to the action selection conflicts of the robots in the cooperative system. This will help to improve the real time performance and robustness of the system. Also, the thesis develops a new method for object and obstacle identification in complex environments\nusing a laser range finder, which is more realistic than the available methods. A new\nalgorithm for object pose estimation algorithm is developed, enabling a robot to identify the objects and obstacles in a multi-robot environment by utilizing the laser range finder and color blob tracking.\nThe thesis develops a fully distributed hierarchical multi-robot architecture for enhanced coordination among robots in a dynamic and unknown environment. It strives to improve the real time performance and robustness. The system consists with three layers. By combining two popular artificial intelligence (Al) techniques such as learning and behavior based decision making, the developed architecture is expected to facilitate effective autonomous operation of cooperative multi-robot systems in a dynamically changing, unstructured, and unknown environment."@en . "https://circle.library.ubc.ca/rest/handle/2429/14700?expand=metadata"@en . "3574394 bytes"@en . "application/pdf"@en . "MACHINE LEARNING-BASED MULTI-ROBOT COOPERATIVE TRANSPORTATION OF OBJECTS by Pallege Gamini Dilupa Siriwardana B.Sc., University of Peradeniya, Sri Lanka, 2003 B.Tech., The Open University of Sri Lanka, 2004 A THESIS SUBMITTED TN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate Studies (Mechanical Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) May 2009 \u00C2\u00A9 Pallege Gamini Dilupa Siriwardana, 2009 Abstract Multi robot cooperative transportation is an important research area in the multi robot domain. In the process of object transportation, several autonomous robots navigate cooperatively in either a static or a dynamic environment to transport an object to a goal location and orientation. The environment may consist of both fixed and removable obstacles and it will be subject to uncertainty and unforeseen changes within the environment. Furthermore, more than one robot may be required in a cooperative mode for handling heavy and large objects. 
These are some of the challenges addressed in the present thesis.

This thesis develops a machine learning approach and investigates relevant research issues for object transportation utilizing cooperative and autonomous multiple mobile robots. It makes significant original contributions in distributed multi-robot coordination and self-deterministic learning for robot decision making, and arrives at an optimal solution to the action selection conflicts of the robots in the cooperative system. This helps improve the real-time performance and robustness of the system. Also, the thesis develops a new method for object and obstacle identification in complex environments using a laser range finder, which is more realistic than the available methods. A new algorithm for object pose estimation is developed, enabling a robot to identify objects and obstacles in a multi-robot environment by utilizing the laser range finder and color blob tracking.

The thesis develops a fully distributed hierarchical multi-robot architecture for enhanced coordination among robots in a dynamic and unknown environment, which strives to improve real-time performance and robustness. The system consists of three layers. By combining two popular artificial intelligence (AI) techniques, namely learning and behavior-based decision making, the developed architecture is expected to facilitate effective autonomous operation of cooperative multi-robot systems in a dynamically changing, unstructured, and unknown environment.

Machine learning is integrated into the developed multi-robot system for decision making during transportation under uncertainty and unforeseen changes in the environment. For this purpose, the conventional algorithm of Q-learning, known as the single-agent Q-learning algorithm, is improved to form the "modified Q-learning algorithm," which provides efficient state identification and optimally resolves action selection conflicts in multi-robot learning.

A C++-based simulation system and a physical multi-robot experimental system are developed in the laboratory to transport an object of interest to a goal location in a dynamic and unknown environment with a complex obstacle distribution. The approaches developed in this thesis are implemented in the prototype system and validated through computer simulation and experimentation. The results validate the developed techniques.
TABLE OF CONTENTS

Abstract ii
Table of Contents iv
List of Tables vii
List of Figures viii
Nomenclature x
Abbreviations xii
Acknowledgements xiii
Chapter 1 Introduction 1
1.1 Project Description 2
1.2 Objective of the Research 4
1.3 Problem Definition 4
1.4 Related Work 7
1.4.1 Multi-Robot Coordination 9
1.4.1.1 Behavior-Based Approach for Coordination 9
1.4.1.2 Planning-Based Approach for Coordination 10
1.4.1.3 Distributed Control Approach for Coordination 10
1.4.1.4 Market-Based Approach for Coordination 11
1.4.1.5 Learning-Based Approach for Coordination 11
1.4.2 Machine Learning in Multi-Robot Systems 12
1.4.3 Pose Estimation Technologies for Multi-Robot Systems 14
1.5 Contributions and Organization of the Thesis 17
Chapter 2 Multi-Robot Coordination for Object Transportation 19
2.1 Overview 19
2.2 General Hierarchical Architecture for Multi-Robot Coordination 20
2.2.1 Machine Learning Unit 26
2.2.2 Coordination Unit 27
2.2.3 Behavior-Based Distributed Control 28
2.2.3.1 Behavior Representation 30
2.2.3.2 Behavior Composition and Cooperation 31
2.2.3.3 Communication Behaviors for Coordination 33
2.2.3.4 Group Behaviors for Coordination 34
2.2.3.5 Behavior Coordination for Object Transportation Sub-Task 34
2.3 Mechanism for Multi-Robot Coordination 37
2.3.1 Dynamic Role Assignment Mechanism for Coordination Unit 37
2.3.2 Dynamic Role Assignment Model for Coordination Unit 40
2.4 Distributed Control Issues for Multi-Robot Coordination 43
2.5 Summary 44
Chapter 3 Machine Learning for Multi-Robot Decision-Making 46
3.1 Overview 46
3.2 Markov Decision Process (MDP) 48
3.3 Reinforcement Learning 51
3.4 Modified Q-Learning Algorithm 54
3.5 Multi-Robot Transportation Using Modified Q-Learning 58
3.5.1 Distributed Multi-Robot Q-Learning 60
3.5.2 Object Transportation Task 61
3.5.3 Multi-Robot Object Pushing with Modified Q-Learning 62
3.5.3.1 The Environment States for Modified Q-Learning 62
3.5.3.2 Robot Actions 64
3.5.3.3 Object Dynamics 65
3.5.3.4 Reward Function 67
3.5.4 Simulation Results 68
3.5.4.1 Simulation Results for Object Pushing 69
3.5.4.2 Effectiveness of Modified Q-Learning in a Multi-Robot Cooperative Task 71
3.6 Summary 76
Chapter 4 Pose Estimation for Multi-Robot Object Transportation 78
4.1 Overview 78
4.2 Multi-Robot Transportation System 81
4.3 Pose Estimation Technology 83
4.3.1 Global Localization of Robot 85
4.3.2 Color Blob Tracking 87
4.3.3 Object Pose Estimation 89
4.3.3.1 Relative Pose Estimation 90
4.3.3.2 Object Global Pose Estimation 93
4.4 Experimental Results 94
4.4.1 The Test-Bed 95
4.4.2 Experiments with the Stationary Robot 96
4.4.3 Experiments with a Moving Robot 101
4.5 Summary 105
Chapter 5 Experimentation in Distributed Multi-Robot Cooperative Transportation 107
5.1 Overview 107
5.2 Test Bed 108
5.3 A Physical Multi-Robot Transportation System 112
5.4 Summary 118
Chapter 6 Conclusions 119
6.1 Primary Contributions 119
6.2 Limitations and Suggested Future Research 121
6.2.1 Improvement of the Model and Algorithm of MQLA 121
6.2.2 Integrating Learning and Coordination 122
6.3 Summary 122
Bibliography 123
Appendix 1: Program Functions 127
Appendix 2: Program Codes 129

List of Tables

Table 4.1: The actual and estimated object pose results for the first set of experiments with a stationary robot 96
Table 4.2: The actual and estimated object pose results for the second set of experiments with a stationary robot 99
Table 4.3: The actual and estimated object pose results for the first set of experiments with a moving robot 101
Table 4.4: The actual and estimated object pose results for the second set of experiments with a moving robot 103

List of Figures

Figure 1.1 The overall project representation 3
Figure 1.2 Schematic representation of developed system 5
Figure 2.1 Hierarchical architecture for object transportation 23
Figure 2.2 Composite representation of the proposed architecture 25
Figure 2.3 Principle of machine learning 26
Figure 2.4 The environment state representation for coordination 28
Figure 2.5 The hierarchical behavior-based control layer 29
Figure 2.6 SR diagram of a simple behavior 30
Figure 2.7 The SR diagram for a composite behavior with a competitive operator 32
Figure 2.8 The SR diagram of a composite behavior with a cooperative operator 33
Figure 2.9 Behavioral hierarchy for the coordinated transport sub-task 35
Figure 2.10 Transitions of the group behaviors of coordinated object transportation 36
Figure 2.11 Types of role assignment 38
Figure 2.12 Schematic representation of a role composed of three roles 42
Figure 3.1 The value iteration algorithm 51
Figure 3.2 The single-agent Q-learning algorithm 52
Figure 3.3 Modified Q-learning algorithm 57
Figure 3.4 The developed multi-robot system 59
Figure 3.5 Description of a multi-robot object pushing task 61
Figure 3.6 Environment state representation using binary code 63
Figure 3.7 Coding protocol of the environment states 64
Figure 3.8 Pushing action points available to each robot 65
Figure 3.9 Dynamics of the object with rectangular cross-section 66
Figure 3.10 Two object-pushing snapshots under different conditions 70
Figure 3.11 The environmental state with decimal value "512" and six pushing actions available to the robots 72
Figure 3.12 The Q-values under the state of "512" in 100,000 rounds of object pushing 73
Figure 3.13 Probability density estimate of Q-values under the state of "512" in 100,000 rounds of object pushing 74
Figure 3.14 Probability density estimate of an action selection probability under the state of "512" in 100,000 rounds of object pushing 75
Figure 4.1 The developed physical multi-robot system 82
Figure 4.2 General scheme of pose estimation for cooperative object transportation 85
Figure 4.3 Global pose estimation of wheeled robot 87
Figure 4.4 Flow diagram of color blob tracking 88
Figure 4.5 Division of camera frame into four quadrants 89
Figure 4.6 Schematic drawing of laser range sensor 90
Figure 4.7 Relative object pose estimation 90
Figure 4.8 MobileSim simulator visualized results from the laser range finder 92
Figure 4.9 Arbitrary layout of objects 94
Figure 4.10 The experimental system in the IAL at UBC 95
Figure 4.11 The x-axis error from Table 4.1 97
Figure 4.12 The y-axis error from Table 4.1 97
Figure 4.13 The orientation error from Table 4.1 98
Figure 4.14 The x-axis error from Table 4.2 99
Figure 4.15 The y-axis error from Table 4.2 100
Figure 4.16 The orientation error from Table 4.2 100
Figure 4.17 The x-axis error from Table 4.3 102
Figure 4.18 The y-axis error from Table 4.3 102
Figure 4.19 The orientation error from Table 4.3 103
Figure 4.20 The x-axis error from Table 4.4 104
Figure 4.21 The y-axis error from Table 4.4 104
Figure 4.22 The orientation error from Table 4.4 105
Figure 5.1 The multi-robot object transportation system 108
Figure 5.2 The color blobs used to identify the orientation of the object 109
Figure 5.3 Pioneer™ 3-DX mobile robot 110
Figure 5.4 Pioneer™ 3-AT mobile robot 111
Figure 5.5 Multi-robot cooperative transportation 116

Nomenclature

S = {s_1, s_2, ..., s_n}   Set of states in the environment
A = {a_1, a_2, ..., a_n}   Set of actions available to a robot
T(s, a, s')   Environment model, which decides the next environmental state s' when the robot selects action a under current state s
Π(s)   Probability distribution over the states
T: S × A → Π(S)   Transition function or transition model
t   Time
R: S × A → ℝ   Reward function, which gives the immediate reward when the robot takes action a under current state s
π   Policy with which a robot selects its actions
R(s)   Received reward
γ   Discount factor, which varies from 0 to 1
π*   Optimal policy, which yields the highest expected sum of the discounted rewards
U(s)   Utility of a state
U(s')   Utility of the subsequent state
ε   Maximum error allowed in the utility of any state
δ   Maximum change in the utility of any state in an iteration
U, U'   Vectors of the utilities for the states in S
Q(s_t, a_t)   An entry in the Q-table
η   Learning rate
r   Direct reward
τ   "Temperature" parameter (see Chapter 3)
∅   Empty set
A   A set including all actions selected by the robots
θ   Orientation angle
F   Net force applied by the robots
Γ   Net torque applied by the robots
c_f, c_1, c   Coefficients
Δt   Short period of time
D_old   Distance between the goal location and the object center before taking the pushing action
D_new   Distance between the goal location and the object center after taking the pushing action
T   Sampling time
A_i   Action set of the i-th robot
I   Set including the currently available actions

Abbreviations

MDP Markov Decision Process
RL Reinforcement Learning
MQLA Modified Q-Learning Algorithm
AI Artificial Intelligence
LAN Local Area Network
TCP/IP Transmission Control Protocol/Internet Protocol
MRS Multi-Robot System
CCD Charge-Coupled Device
GPS Global Positioning System
FSA Finite State Automaton
VLSI Very Large Scale Integration
ACTS Advanced Color Tracking System

Acknowledgements

A number of people contributed to this effort intellectually, personally and financially. First and foremost, I wish to thank Prof. Clarence W. de Silva, Director of the Industrial Automation Laboratory, for a number of reasons.
He was my research supervisor, and his dedication and commitment towards the student community is appreciated by all of us who were his students. Had it not been for Professor de Silva's considerable persuasive skills, boundless patience, great-heartedness, keen guidance and unbelievable human qualities, this particular research perhaps would never have come to fruition. I take this opportunity to thank Dr. Ying Wang, a post-doctoral research scientist in the Industrial Automation Laboratory, who is a scholar of multi-robot systems. Whenever I had a problem in multi-robot systems, he guided me along the correct path, and I am privileged and honored to have worked with him. I must thank Mr. Howard Hu, a Research Assistant in the Industrial Automation Laboratory, for his constant support and constructive remarks, especially in debugging the computer programs. I wish to express my sincere gratitude to Prof. Mu Chiao and Dr. Farbod Khoshnoud of SOFTEK, who served as members of my research committee. Furthermore, I wish to thank The Open University of Sri Lanka and the Asian Development Bank-funded Distance Education Modernization Project for their financial assistance in part. Major financial assistance for my Research Assistantship, and for equipment and other resources for my research, was provided by research grants held by Prof. de Silva, notably: the Natural Sciences and Engineering Research Council (NSERC) of Canada Special Research Opportunity (SRO) and Discovery grants; the Tier 1 Canada Research Chair (CRC); the Canada Foundation for Innovation (CFI); and the British Columbia Knowledge Development Fund (BCKDF). I also acknowledge the support extended by my Sri Lankan friends, Mr. Anjana Punchihewa, Mr. Eranda Harinath and Mr. Lalinda Weerasekara, for their encouragement and support in different ways. I take this opportunity to thank the members of the Industrial Automation Laboratory, especially Prof. Lalith Gamage, Mr. Tahir Khan, Mr. Mohammed Airasheed, Ms. Madalina Wierzbicki, Mr. Roland Haoxiang Lang, Mr. Bhenam Razavi, Mr. Srinivas Raman, Mr. Guan-Lu Zhang, Mr. Arunasiri Liyanage, Mr. Ramon Campos and Prof. Qingshan Zeng for their assistance, guidance and encouragement throughout the research project. Last but not least, I wish to thank my family for their constant support, especially my wife Lelka Samanmalee, my daughter Viduni Siriwardana, my mother, my father and my sister. Without their unwavering support, tolerance of my idiosyncrasies and abiding confidence, this research never would have been possible.

Chapter 1 Introduction

The field of multi-robot systems emerged as a major area of robotics research in the late 1980s. Since then, the robotics community has investigated a number of important issues concerning cooperative multiple mobile robots. Prior to that, most research was concentrated on either single robots or distributed problem-solving systems that did not involve multi-robot cooperation. Multi-robot systems (MRSs) are becoming increasingly significant in industrial, commercial and scientific applications, and the number of robots used in these applications is also increasing. An MRS can be defined as a set of autonomous robots interacting in the same work environment in fairly complex ways to carry out a common task. It should be able to operate in dynamic environments, where uncertainty and unforeseen changes can arise due to the presence of moving robots, other agents, and both stationary and moving obstacles.
Such a system may consist of complex mobile robotic platforms equipped with sophisticated sensors and actuators, which may be required to execute complex tasks. The advantages of a multi-robot system (MRS) over a single-robot system are indicated now. Multiple robots have the potential to accomplish a given task more efficiently, because their capabilities are more diverse, flexible, and distributed. Multiple robots can localize themselves faster and more accurately by exchanging information among themselves about their positions and orientations (poses). An MRS possesses greater fault tolerance than a powerful and expensive single robot, particularly because if one robot in the MRS fails or malfunctions, there will be other robots to assume its tasks. In summary, an MRS will have a greater range of task capabilities, greater efficiency, increased system performance, fault tolerance and robustness, lower economic cost, ease of development, distributed sensing and action, and inherent parallelism, in comparison to a single robot of comparable capability. In the MRS domain, the robotics community began to investigate major challenges such as robot synchronization, determination of a robot coordination strategy, and identification of complex work environments. They have also investigated the potential for using MRS research to provide insight into social and life sciences involving groups of biological entities.

Multi-robot object transportation is an important research subject in the multi-robot domain. In an object transportation process, several autonomous robots navigate cooperatively in either a static or a dynamic environment to transport an object to a goal location and orientation. The environment may consist of fixed and movable obstacles and may be subject to uncertainty and unforeseen changes within the work environment. A single robot may not be able to handle heavy and large objects alone; such challenges can be overcome by multiple mobile robots. A primary concern associated with multi-robot applications is the method of enabling individual robots to autonomously adapt their behaviors to meet the dynamic changes in their task environment. For this, each robot needs to communicate with its peers, exchange sensor information, formulate the coordination strategy and obstacle avoidance scheme, plan the robot trajectory, and assign and receive subtasks and roles for carrying out the required task quickly and successfully. Due to the main challenges inherent in these various research fields, multi-robot object transportation is a good benchmark application to assess the effectiveness of multi-robot control and coordination strategies. Therefore, control and coordination methods for multi-robot systems have been investigated by various researchers, and the multi-robot object transportation problem has become a theme of research and development.

1.1 Project Description

A main objective of the overall project of "Multi-Robot Autonomous Assembly System" in our laboratory (Industrial Automation Laboratory) is to develop a physical multi-robot system with object manipulation, transportation and assembly capabilities. First, a group of intelligent autonomous robots cooperatively search and identify any useful parts within the work environment. Then the identified parts are individually or cooperatively transported to a designated place and assembled into a useful device.
Finally, the built device is transported individually or cooperatively to a known location in an unknown and dynamic environment. During this process, the obstacles may be static or may even appear randomly during the operation. The overall project representation is shown in Figure 1.1.

[Figure 1.1: The overall project representation, showing the robots, movable and fixed obstacles, and the goal location in the work environment.]

The overall project consists of three main stages:
1. Multi-Robot Foraging: Multiple mobile robots autonomously search the environment for useful objects/parts. The identified useful parts are transported by individual robots or cooperating multiple robots to a designated place.
2. Multi-Robot Assembly: Robots autonomously determine the assembly sequence for constructing a useful device, properly handle and manipulate the parts, and assemble the device.
3. Multi-Robot Transportation: Two or more robots cooperatively transport the built device to the goal location.

A major challenge of the overall project is the transportation of an object of arbitrary size and shape, a key issue in the multi-robot domain. In the stage of transporting components for assembly at a designated location, the robots may have to work in a complex dynamic environment. The stages of multi-robot foraging and transportation need to use cooperative object transportation technologies to accomplish the overall task. This thesis mainly concerns object transportation technologies and the development of a physical multi-robot system to transport an object of arbitrary size and shape.

1.2 Objective of the Research

The physical multi-robot object transportation system, as developed through the research reported in the present thesis, has the ability to navigate an object to a predetermined location through a dynamic and unknown environment with a complicated obstacle distribution. The primary research objectives of the thesis are listed below.
o Develop a fully distributed physical multi-robot system for operation in a dynamic and unknown environment with a complicated obstacle distribution.
o Improve an available single-agent machine learning algorithm so as to enable a multi-robot system to deal with the uncertainty and unforeseen changes in the environment, including changes of the robots.
o Develop a technique to identify obstacles within a dynamic and unknown environment.
o Develop a technique for pose estimation in a task of object transportation, to more realistically estimate the pose of the object of interest.
o Evaluate performance attributes such as coordination, self-learning and adaptation, speed, and accuracy of the developed multi-robot system and its underlying methodologies.

1.3 Problem Definition

Development of a physical multi-robot system is an important objective of the present work. This system consists of a group of intelligent autonomous robots, which have the ability to work cooperatively to transport an object to a goal location and orientation. The work environment is considered unknown and dynamic, and obstacles may be present or may even appear randomly during the transportation process. Multi-robot coordination and machine learning are integrated into the developed physical system to cope with the challenges of the problem.
A schematic representation of the developed system is shown in Figure 1.2 below.

[Figure 1.2: Schematic representation of the developed system. The robots and the sensory system are linked through a wired/wireless router (TCP/IP) to a server.]

Three or more autonomous mobile robots are present in the system, which enables the cooperative transportation of an object to a goal location. During the course of the transportation, they have to interact with each other to establish the strategy of coordination and cooperation. The feasible and suitable locations and points of action have to be established for each robot assisting in the cooperative transportation. The object has to be transported quickly and effectively while avoiding obstacles that may be present during the transportation process. Furthermore, the system will have to avoid damage to the transported object. In the developed system, charge-coupled device (CCD) cameras, laser range finders and sonars are used to monitor and measure the current location and orientation of an object. The dynamic and unpredictable environment contains both fixed obstacles and movable obstacles, which may appear randomly. The robots and the sensory system are separately linked to their host computers, which are connected through a local network to implement machine intelligence and system control.

It is important to consider three major challenges in the development of the multi-robot system. The first one is the dynamic and unpredictable environment. In particular, dynamic obstacles may be scattered arbitrarily within the working environment, and they may appear and disappear randomly during the transportation process. In addition, there will be multiple robots working in parallel. While one robot makes decisions, other robots may have changed the environment according to their behaviors. This also makes the environment dynamic from the view of a single robot. As a result, the process of decision making becomes rather complex for the robots within the environment. The challenge is to make decisions in real time and learn to identify the conditions of the surrounding environment from the local viewpoint of the robots.

The second challenge results from the localized sensing capabilities of the robots. In the present project, it is assumed that each robot is capable of detecting an object or obstacle only within a limited detection radius. The robots possess local sensing capabilities only, and consequently the global environment is generally unknown to them. A robot will not have good knowledge about the location or the shape of an object until it moves sufficiently close to the object. The unknown environment is another major challenge in the multi-robot domain, because the robots do not have a complete understanding of the environment in advance, partly due to unfamiliarity and partly due to the dynamic nature of the environment. Thus the system will not be able to utilize traditional planning methodologies for the decision-making process.

The third challenge is the lack of reliable communication among robots, where the disturbances may be represented by white noise. Due to unreliable communication, the robots may end up selecting a wrong coordination strategy. Improving the robustness of multi-robot decision making in an environment with unreliable communication is also a challenging issue in the multi-robot domain. Normally, the environment of a multi-robot system is dynamic and unknown, and in such a situation the communication among robots is generally inconsistent.
In this thesis, several approaches are introduced to meet these challenges in multi-robot systems, enabling them to work effectively in an unstructured environment.

1.4 Related Work

A significant amount of work has been performed by the robotics community in the past 10 years in multi-robot coordination and learning for decision making, particularly to support effective and robust operation of multi-robot systems in both simple and dynamic environments. This section presents some relevant work in this context.

Cao et al. (2006) presented a multi-robot hunting task in an unknown environment. They proposed a distributed control approach involving local interactions with respect to local coordinate systems. The proposed approach can cope with the cumulative errors of the wheels of the robots and also with unreliable communication networks. Computer simulation validated their approach. Kumar and Garg (2004) studied multi-robot cooperative manipulation with two industrial robotic arms. They introduced a fuzzy logic-based approach to coordinate the motions of the two robotic manipulators so that the internal forces among them became minimum. In addition, they used Kalman filtering to estimate the external force on the end effector, based on the information from a force/torque sensor mounted on the robot wrist. Stone et al. (2006) examined the issue of uncertainty in multi-robot systems. They identified several sources of uncertainty and proposed a method for reducing the uncertainty and making decisions in the face of uncertainty. Their method enables effective planning under uncertainty for robots engaged in goal-oriented behavior within a dynamic and complex environment. The scaling issues of multi-robot systems are presented by Gustafson et al. (2006). They used several abstract models to show that it can be counterproductive to increase the number of robots or the sensor strength in a multi-robot system. Mataric et al. (2006) presented a mathematical model for dynamic task allocation in multi-robot systems, based on stochastic processes. They assumed a large-scale multi-robot system with local sensing, behavior-based control, and distributed task allocation capabilities. By storing the history of the environment in the internal state of a robot, they tried to cope with the issues of local sensing and indirect communication in multi-robot systems. Their model was demonstrated for a multi-robot foraging task.

In another paper, the same authors (Gerkey and Mataric, 2004) developed a taxonomy for MRTA (multi-robot task allocation). They proposed three axes for use in describing MRTA problems: (1) single-task robots (ST) versus multi-task robots (MT); (2) single-robot tasks (SR) versus multi-robot tasks (MR); (3) instantaneous assignment (IA) versus time-extended assignment (TA). Based on the introduced method, they compared six typical multi-robot architectures with their MRTA approaches. Many valuable surveys have been conducted on multi-robot systems. Farinelli et al. (2004) completed a survey on multi-robot systems and proposed a new taxonomy for multi-robot coordination. Multi-robot tasks were classified as unaware systems, aware but not coordinated systems, weakly coordinated systems, strongly coordinated and strongly centralized systems, strongly coordinated and weakly centralized systems, and strongly coordinated and distributed systems.
In a survey carried out on existing multi-robot architectures, Parker (2000) pointed out several challenges in typical multi-robot tasks. Specifically, she categorized the available architectures into three types: general architectures, specialized architectures and hybrid architectures. In addition, she classified three typical multi-robot tasks and their main challenges. For multi-robot object transportation, a major challenge is uneven terrain; for multi-robot learning, a major challenge is how to assign credit among robots; and for multi-robot motion coordination, the key challenge is computational complexity. There are a number of surveys that review Multi-Robot Learning (MRL) or Multi-Agent Learning (MAL). For example, the survey presented by Yang and Gu (2004) identified that scaling an individual reinforcement learning algorithm to multi-robot systems would violate its Markov decision process assumption, which is the weakness of many available multi-robot reinforcement learning approaches. They also classified four frameworks of multi-agent reinforcement learning: the stochastic games (SGs)-based framework, the fictitious play framework, the Bayesian framework, and the policy iteration framework. Furthermore, they categorized multi-agent reinforcement learning algorithms for possible application to multi-robot systems. They argued that there were four major challenges in applying reinforcement learning to multi-robot systems: the Markov decision assumption, continuous spaces, uncertainties, and incomplete information. Three future directions were pointed out as well: the behavior-based approach with reinforcement learning, fuzzy approximation functions, and continuous reinforcement learning. Panait and Luke (2005) presented a survey on multi-agent learning for cooperation, and they discussed the two main challenges in cooperative multi-agent learning; namely, the large learning space and the dynamic environment. Furthermore, they classified two types of learning: concurrent learning and team learning. The major issues in the area of concurrent learning were identified as credit assignment, learning dynamics, teammate modeling, and the relationship between learning and communication.

1.4.1 Multi-Robot Coordination

Developing a mechanism for coordinating multiple robots in the execution of cooperative tasks has been a challenging objective in multi-robot systems. Multi-robot coordination is carried out according to a developed multi-robot architecture, and multi-robot architectures seek to improve the coordination mechanism for the object transportation process. Five main types of multi-robot coordination approaches are briefly discussed now; namely, the behavior-based approach, the planning-based approach, the market-based approach, the distributed control-based approach, and the learning-based approach.

1.4.1.1 Behavior-based approach for coordination

Parker (2000) introduced a distributed, behavior-based multi-robot architecture called L-ALLIANCE. It employs the concepts "impatience" and "acquiescence" to dynamically stimulate behaviors for coordination among robots. Moreover, by evaluating the performance of the teammates and dynamic changes in the environment, L-ALLIANCE autonomously updates its parameters to adjust to those changes. The introduced architecture was validated in a box-pushing task with heterogeneous mobile robots.
The behavior-based multi-robot coordination architecture CAMPOUT was introduced in the Robot Construction Crew (RCC) project developed at the Jet Propulsion Laboratory (JPL) of NASA. CAMPOUT is a fully distributed, behavior-based multi-robot architecture, and it uses a leader-follower distributed coordination strategy. CAMPOUT was validated using a multi-robot transportation task in an autonomous deployment project on a planetary surface. In their latest work, the researchers introduced a multi-robot autonomous construction task based on the CAMPOUT architecture. Mataric et al. (2002) developed a behavior-based approach for a distributed multi-robot system. They developed controllers which were robust to noise and robot failures. Three versions of the task were evaluated in a spatio-temporal context using interference, time to completion, and distance traveled as the main investigative parameters. Konidaris and Hayes (2005) proposed to integrate the behavior-based approach with reinforcement learning for multi-robot coordination. Furthermore, they suggested using topological maps to reduce the learning space in reinforcement learning. Camacho et al. (2006) introduced behavior-based coordination for robot soccer agents. In their work, they allocated each robot a duty for execution of the particular defined task. A rule-based RECCA control algorithm was also proposed.

1.4.1.2 Planning-based approach for coordination

Miyata et al. (2002) presented coordination of multiple mobile robots for cooperative transportation in unknown static environments. They introduced a centralized approach for task planning and assignment. Priority-based assignment algorithms were used for multi-robot coordination and motion planning.

1.4.1.3 Distributed control approach for coordination

Wang and de Silva (2005), Zaerpoora et al. (2005), Pereira et al. (2004), and Chaimowicz et al. (2004) addressed the caging formation control problem in multi-robot coordination for cooperative transportation. They proposed an "Object Closure" strategy to coordinately move the object with multiple mobile robots while maintaining the formation. The major challenge was to establish a bounded movable area for the object during its transportation. In this case, the contact between the object and the robots did not need to be maintained by the controller of each robot. They termed this "Object Closure," and it was used as a type of distributed formation control strategy for multi-robot cooperative carrying tasks. Basically, the "Object Closure" approach is a sub-category of behavior-based multi-robot coordination; however, it places its emphasis on the distributed control features. Marshall et al. (2006) introduced a distributed control strategy called "cyclic pursuit" for multi-robot coordination. The particular approach was found to be robust in the presence of un-modeled dynamics and delays due to sensing and information processing.

1.4.1.4 Market-based approach for coordination

Multi-robot task allocation is a vital and complex area of multi-robot coordination, and the conventional approaches have seen little success in the presence of a large number of robots in the system. However, market-based approaches seem better suited for solving the complex multi-robot task allocation problem, and they are gaining popularity. For example, Mataric et al. (2002) presented an auction-based approach for multi-robot coordination.
They utilized the publish/subscribe first-price auction method and validated it in a multi-robot box-pushing task. During the transportation process, a "watcher" robot monitors the transportation task and "publishes" the required auctions for use by the "pusher" robots in the same environment. The "pusher" robots determine the available tasks at a given instance through "subscribing" to the information from the "watcher" robot. By matching its own abilities to the required tasks, each "pusher" robot bids for a task. When the "watcher" robot receives all bids from the "pusher" robots, it selects the most suitable "pusher" for each particular task, and then broadcasts the decisions to the "pusher" robots in the system. This particular auction-based approach was validated for a multi-robot box-pushing task in a static environment without obstacles.

1.4.1.5 Learning-based approach for coordination

Typically, multi-robot systems work in a dynamic and unknown environment, where a traditional behavior-based approach can easily fail because the designer cannot foresee all possible world states the robots would encounter and design the corresponding behaviors. A behavior-based multi-robot system will work well in a known and structured environment; learning capabilities are important for multi-robot systems to work properly in dynamic and unknown environments. Most existing work in multi-robot learning utilizes reinforcement learning due to its simplicity and good real-time performance. For example, Ito and Gofuku (2004) proposed a two-layer multi-robot architecture for multi-robot coordination in cooperative object transportation. They introduced a centralized machine learning method in the top level of this architecture for decision making. A distributed approach of rule-based control was developed for the lower level of motion control of the robot, to reach a specified position and take a specific action according to the commands from the upper level. Their approach reduced the learning space by integrating reinforcement learning with genetic algorithms.

1.4.2 Machine Learning in Multi-Robot Systems

Learning capabilities are desirable for multi-robot systems when the working environment is unknown, unstructured and dynamic. Technologies of machine learning have been employed commonly for this purpose. Among the machine learning approaches, reinforcement learning, and particularly Q-learning, has been quite popular due to its simplicity and good real-time performance. In a well-known work in this area, Parker et al. (2002) studied two significant aspects of multi-robot learning: learning new behaviors autonomously and learning parameter adjustments. They proposed two approaches for autonomously learning new behaviors in multi-robot systems. The first approach is called distributed pessimistic lazy Q-learning, and it combines lazy learning with Q-learning. It also uses a pessimistic algorithm for the credit assignment problem.
The second approach, called Q-learning with VQQL (vector quantization with Q-learning) and the generalized Lloyd algorithm, addresses the generalization issue in reinforcement learning. Both approaches have been validated in the CMOMMT (Cooperative Multi-robot Observation of Multiple Moving Targets) project. Now consider learning for parameter adjustment. Their previous paper (as mentioned under the behavior-based approach) introduced the behavior-based architecture called L-ALLIANCE, which enables robots to autonomously update their control parameters so that they can adapt their behavior over time in response to changes in the capabilities of the robots, the team composition, and the environment. Kapetanakis and Kudenko (2002) established two multi-agent learning techniques, called multiple single-agent learning and social multi-agent learning. In the multiple single-agent learning technique, each learning agent observes other agents as a part of the environment, and an individual agent does not have any explicit knowledge about the other agents. However, in social multi-agent learning, agents can have a high level of knowledge of other agents and integrate this knowledge in the learning process, potentially using local modeling techniques, communication and coordination to support the learning task. Martinson and Arkin (2003) presented a multi-robot foraging task with Q-learning. They used Q-learning to select the roles of the robots dynamically, and it was integrated with the behavior-based approach. The learning space was reduced by the behavior representation. They performed simulations to verify their approach. Dahl et al. (2004) presented a Q-learning-based multi-robot cooperative transportation task in a simple environment. They proposed a two-layer multi-robot architecture: a Q-learning-based behavior-selection subsystem was located in the upper level, and a behavior-based reactive subsystem was located in the lower level. Further, they introduced two communication strategies to speed up the convergence of Q-learning. Kovac et al. (2004) addressed a "pusher-watcher" multi-robot box-pushing problem with reinforcement learning. In their approach, one robot acted as a "watcher," which observed the current world state and broadcast it to the other robots. The other robots acted as "pushers," which selected their respective pushing actions using Q-learning. A major weakness of this approach is the need for a robust communication network. Taylor and Stone (2005) studied a methodology to transfer learned knowledge from one task to another with a different state-action space. This is an important issue in reinforcement learning, as such transfer is beneficial for reducing the training time of reinforcement learning. The particular methodology used specific domain knowledge to construct a knowledge-mapping function between the two considered tasks so that the knowledge transfer happened effectively. Tangamchit et al. (2003) addressed several crucial factors affecting decentralized multi-robot learning in an object manipulation task. They recognized four factors that directly affect distributed multi-robot learning: the reward type, the learning algorithm, diversity, and the number of robots.
They concluded that the reward type and the learning algorithm considerably affect the final results, and that the scalability of the system affects the learning speed but has only a slight effect on the final outcomes.

It is clear that reinforcement learning is rather popular in multi-robot coordination. The environment in a multi-robot task is typically dynamic, because the robots themselves also change in their shared environment. Therefore, the single-agent reinforcement learning algorithm, when extended to a multi-robot system, violates its assumption of a stationary environment. A number of multi-agent reinforcement learning algorithms have been introduced in the multi-agent area; namely, the mini-max Q-learning algorithm, the Nash Q-learning algorithm, the friend-or-foe Q-learning algorithm, the team Q-learning algorithm, and so on. The team Q-learning algorithm is a simplified version of the Nash Q-learning algorithm. All these algorithms assume a dynamic environment and allow each robot to observe the actions of its peers before it makes its own decisions. The large learning space is the main hurdle in multi-agent reinforcement learning algorithms: each robot needs to monitor its own actions and its peers' actions as well. Therefore, when the scalability of the system increases, the resulting learning space will be excessive. However, when the number of robots increases, the single-agent Q-learning algorithm can still provide reasonable performance due to its fixed learning space. The multi-robot community has tried to employ numerous approaches, such as neuro-fuzzy techniques, genetic algorithms (GAs), and so on, to reduce the large learning space in multi-robot coordination. However, research in this area is still in its infancy, and there are many open questions yet to be explored.

1.4.3 Pose Estimation Technologies for Multi-robot Systems

Estimation of pose (i.e., position and orientation) is essential for object transportation in a multi-robot system. Technologies of pose estimation are utilized to estimate the real-time pose of objects, including obstacles and the other robots, in the work environment. This helps identify the world state and make appropriate decisions accordingly. Researchers have utilized various sensory methods, such as digital CCD (charge-coupled device) cameras, global positioning systems (GPS), laser range finders and sonar devices, to identify the pose of an object in a robotic work environment. Also, sensing algorithms have been developed to recognize and identify objects and to estimate features or poses of the objects in the environment.

Lang et al. (2008) presented a multiple-sensor-based method for robot localization and box pose estimation. A CCD camera mounted on the robot was used to find the box in its environment. A laser range finder mounted on the robot was activated to measure the distance between the laser source and the object. Finally, homogeneous matrix transformation was applied to represent the global pose of the robot and the box. The approach was validated using computer simulation. Park et al. (2006) introduced a method for global localization in an indoor environment, which employed object recognition using a stereo camera. Their method of global localization first made a coarse estimation of the pose and then refined it. The coarse pose was estimated by means of object recognition and least-squares fitting through singular value decomposition, while the refined pose was estimated by using a particle filtering algorithm with omni-directional metric information.
Even though vision systems have been used for pose estimation, the approach has a number of drawbacks. In particular, the presented schemes are computationally expensive, time consuming, subject to time delays, and susceptible to changes in the ambient lighting conditions. Ekvall et al. (2005) studied a computer vision approach for object recognition and pose estimation based on the color co-occurrence histogram and a geometric model. Even though the approach was somewhat robust, it failed under large changes in lighting conditions. Kato et al. (2003) presented a method for identifying a robot and obtaining its relative position in a multi-robot system using an omni-directional vision sensor. Their approach was validated using simulation and experimentation. Similar work has been done by Hajjdiab et al. (2004), who presented a vision-based approach for multi-robot simultaneous localization and mapping (SLAM). They presented a method to calculate the locations of the robots using a collection of sparse views of the planar surfaces on which the robots were moving. The camera movements were estimated using inter-image homographies computed by matching the overhead transformed views.

Tracking of multiple moving objects using vision sensors is an important issue in road traffic control systems. Chen et al. (2005) introduced a framework for spatio-temporal vehicle tracking based on unsupervised learning segmentation and object tracking. An adaptive background learning and subtraction method was applied to two real-life traffic video sequences in order to obtain accurate spatio-temporal information on the vehicle objects.

Laser range finders and CCD cameras are commonly used sensors in mobile robots. Most laser range finders have the ability to scan the surrounding environment and accurately detect objects within a radius of 50 meters, returning a distance measurement for each degree in a 180-degree range. On the other hand, CCD cameras can be used to detect the shape, color and surface features of the objects of interest and return the corresponding vision information. Researchers have combined these two sensors for object recognition and localization. Murarka et al. (2006) proposed a hybrid method incorporating laser range finders and vision for building local 2D maps to help an intelligent robotic wheelchair distinguish between safe and hazardous regions in its immediate environment. Here, laser range finders were used for localization and mapping of the obstacles in the 2D laser plane, while vision was used for detection of the hazards and other obstacles in 3D space. The information on the obstacles was used to construct a local 2D safety map. Tomono (2005) used a mobile robot equipped with a laser range finder and a monocular camera to build a 3D environment model. In this framework, a 2D map was first built by scanning objects in the environment with the laser sensor, and the detected candidates were then verified by using vision-based object recognition. Takahashi and Ghosh (2005) proposed a method to identify the parameters of a moving object by integrating a laser range finder and a vision sensor. The approach reduced the dimension of the parameter ambiguity. Gopalakrishnan et al. (2005) developed a mobile robot navigation project which integrated a laser range finder and a vision sensor. The laser range finder was used for localization, and the vision sensor was used for object detection. Also, a route-based navigation technique was used for mobile robot navigation.
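At the core of the laser-plus-vision schemes surveyed above, and of the pose estimation method developed later in this thesis, is a coordinate transformation: a range/bearing measurement taken in the robot frame must be mapped through the robot's global pose to obtain the global position of the object. The following C++ fragment is a minimal sketch of that computation for a planar (x, y, θ) world; the structure names and the simple two-point orientation estimate are illustrative assumptions, not the implementation used in the thesis.

```cpp
#include <cmath>

// Planar pose: position (x, y) and heading theta, in the global frame.
struct Pose2D { double x, y, theta; };

// A single laser return: bearing (rad, in the robot frame) and range (m).
struct LaserHit { double bearing, range; };

// Transform one laser hit from the robot frame into global coordinates.
// Assumes the laser is mounted at the robot's origin (no lever-arm offset).
static void laserToGlobal(const Pose2D& robot, const LaserHit& hit,
                          double& gx, double& gy) {
    const double lx = hit.range * std::cos(hit.bearing);  // robot-frame x
    const double ly = hit.range * std::sin(hit.bearing);  // robot-frame y
    // Rotate by the robot heading, then translate by the robot position.
    gx = robot.x + lx * std::cos(robot.theta) - ly * std::sin(robot.theta);
    gy = robot.y + lx * std::sin(robot.theta) + ly * std::cos(robot.theta);
}

// Estimate the global pose of an object edge seen between two laser hits:
// its position is taken as the midpoint of the edge, and its orientation
// as the direction of the edge in the global frame.
Pose2D estimateObjectPose(const Pose2D& robot,
                          const LaserHit& first, const LaserHit& last) {
    double x1, y1, x2, y2;
    laserToGlobal(robot, first, x1, y1);
    laserToGlobal(robot, last,  x2, y2);
    Pose2D object;
    object.x = 0.5 * (x1 + x2);
    object.y = 0.5 * (y1 + y2);
    object.theta = std::atan2(y2 - y1, x2 - x1);
    return object;
}
```

A full implementation would first segment the scan into clusters and reject returns belonging to obstacles or other robots before fitting the object edge; only the frame transformation itself is shown here.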
1.5 Contributions and Organization of the Thesis

This thesis performs research and develops advanced techniques which will help multi-robot systems operate in a robust and effective manner in a dynamic and unknown environment. The main contributions of the thesis are as follows:
o A multi-robot coordination mechanism and a fully distributed three-layer hierarchical multi-robot architecture are developed to carry out cooperative tasks, integrating machine learning and the behavior-based approach of robot coordination. This coordination technique enables efficient state identification and coordination among robots. The developed system architecture has the ability to dynamically allocate a task in an unknown environment.
o An improved reinforcement learning algorithm, termed the "modified Q-learning algorithm," is developed, which is able to provide efficient state identification and optimally resolve action selection conflicts in multi-robot learning. It also helps deal with some key issues in multi-robot learning; for example, task assignment, reduction of the learning space, and circumventing the Markov assumption.
o A method is developed to estimate the pose of an object, which can be used to simultaneously track potential obstacles and the other robots in the work environment during object transportation. This approach is more realistic than other available approaches. It is validated using two types of physical experiments.
o A physical multi-robot transportation system, which integrates the developed techniques, is developed in our laboratory (Industrial Automation Laboratory). This multi-robot system works in a dynamic and unknown real-world environment and shows effectiveness, flexibility and good overall performance.

The organization of the thesis is as follows. Chapter 1 has discussed multi-robot systems and the research objectives of the thesis. It defined the object transportation problem as studied in the thesis, and discussed related past work. Chapter 2 addresses the multi-robot coordination mechanism and the three-layer multi-robot architecture. Two techniques of multi-robot coordination included in the multi-robot architecture, the behavior-based approach and the machine learning approach, are discussed and their relationship is presented. Several pertinent issues of distributed control for multi-robot coordination are discussed. Chapter 3 proposes a modified Q-learning algorithm in the multi-robot domain to facilitate robotic decision making in a dynamic and unknown environment. This algorithm is assessed using a multi-robot transportation task in simulation; for reference, a sketch of the conventional single-agent baseline that it extends is given at the end of this chapter. The results are analyzed, and the advantages and disadvantages of the algorithm are indicated. In Chapter 4, an innovative method for determining the pose of an object is developed, which will facilitate the robots in identifying useful objects in a work environment. The technology is validated using a task of multi-robot object transportation, in simulation. Chapter 5 presents a physical multi-robot cooperative transportation system operating in a dynamic and unknown environment. The Microsoft Visual Studio 2005 distributed computing model is introduced to allow the robots to communicate and cooperate effectively. Experimental results are presented to study the performance of the developed system, particularly concerning adaptivity and robustness. Chapter 6 summarizes the primary contributions of the thesis and indicates some relevant issues for future research.
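The modified Q-learning algorithm of Chapter 3 builds on the conventional single-agent update rule Q(s,a) ← Q(s,a) + η(r + γ max_a' Q(s',a') − Q(s,a)). The C++ sketch below shows that baseline in its standard tabular form with ε-greedy exploration; it is a generic textbook implementation given for reference, not the modified algorithm itself, and the class name and parameter values are placeholders.

```cpp
#include <cstdlib>
#include <vector>
#include <algorithm>

// Generic tabular single-agent Q-learning baseline.
class QLearner {
public:
    QLearner(int numStates, int numActions,
             double eta = 0.1, double gamma = 0.9, double epsilon = 0.1)
        : q_(numStates, std::vector<double>(numActions, 0.0)),
          eta_(eta), gamma_(gamma), epsilon_(epsilon) {}

    // Epsilon-greedy action selection for the current state s.
    int selectAction(int s) const {
        if (std::rand() / (double)RAND_MAX < epsilon_)
            return std::rand() % (int)q_[s].size();          // explore
        return (int)(std::max_element(q_[s].begin(), q_[s].end())
                     - q_[s].begin());                        // exploit
    }

    // Standard update: Q(s,a) += eta * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    void update(int s, int a, double r, int sNext) {
        const double best =
            *std::max_element(q_[sNext].begin(), q_[sNext].end());
        q_[s][a] += eta_ * (r + gamma_ * best - q_[s][a]);
    }

private:
    std::vector<std::vector<double> > q_;  // Q-table, indexed [state][action]
    double eta_, gamma_, epsilon_;
};
```

The fixed learning space of this baseline is what keeps it tractable as the number of robots grows; its stationary-environment assumption, violated in a multi-robot setting, is what the modified algorithm of Chapter 3 addresses.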
Chapter 2
Multi-Robot Coordination for Object Transportation
2.1 Overview
Multi-robot coordination can be defined as the ability to manage the interdependencies of activities between mobile robots. Coordination among a group of robots should in principle improve the overall performance of the team, as robots may share their world views and may negotiate to identify the present world state and cooperatively select their actions. However, in practice, effectively handling multi-robot information and coordination in real time is a challenging task, and many open questions remain in the area of multi-robot coordination. How should a group of robots divide a task among its members? Once roles have been assigned to the robots, how should they position themselves to fulfill their roles without interfering with their team-mates? What happens if a robot fails, or if the environment changes so that a different robot is more suitable for the task? The coordination of multi-robot teams in dynamic environments is one of the fundamental problems in multi-robot transportation. The underlying question is: given a group of robots and a task to be performed in a dynamic environment, how should the robots be coordinated in order to successfully complete the task? Coordination normally implies synchronization of robot actions and exchange of information among the robots. The level of synchronization and communication depends heavily on the task requirements, the characteristics of the robots, and the environment. Thus, different levels of coordination are required in different situations. Coordination of a group of mobile robots is performed according to a multi-robot architecture developed in the present work. The architecture represents the organizational structure of the key components and their functional descriptions in a multi-robot system. Promoting cooperation or coordination among robots and enabling a member robot to make rational decisions are the main objectives of the architecture. In this chapter, a framework for a group of robots to coordinate their activities and a hierarchical multi-robot architecture are proposed. This hierarchical multi-robot architecture integrates machine learning and behavior-based approaches. In section 2.2, a fully distributed hierarchical multi-robot architecture for robotic decision making is described in detail. Section 2.3 presents a coordination framework for a group of robots. This framework for multi-robot coordination facilitates global sharing of information during the stage of object environment state identification. It also describes how obstacle identification in the object neighborhood may be combined with multi-robot coordination. Finally, in section 2.4, relevant distributed control issues in multi-robot systems are discussed. The multi-robot architecture and coordination framework developed in this chapter will be validated using computer simulations and physical experiments. Details of the validation process will be presented in the subsequent chapters.
2.2 General Hierarchical Architecture for Multi-Robot Coordination
There are five main approaches for multi-robot coordination, as mentioned in section 1.4.1: the behavior-based approach, the planning-based approach, the distributed control-based approach, the market-based approach, and the learning-based approach. Among these, the behavior-based approach is the most popular one because of its simplicity and robustness.
However, this approach is particularly suited to a known and structured environment. The major drawback of the approach is that the designer must be able to predict all the possible environment states, because all the behavior rules must be designed in advance. Most realistic systems, however, need to make decisions in a dynamic and unknown environment, and this faces numerous challenges, as indicated below. First, a human must design all the behavior rules, and the individual robots must establish their preferences in advance. It is virtually impossible for a human designer to predict all possible environmental states that the robots will encounter and to design the relevant behavior rules for them. Consequently, within a dynamic and unknown environment, behavior-based robotic decision making can easily fail when the number of environmental states is large. Second, there will typically be a large number of environmental states in a dynamic and unstructured environment. The designer must design a general rule base to deal with all these environment states. However, it is virtually impossible for the designer to arrange the preference for each behavior rule in such an extensive rule base, due to the problem of "combinatorial explosion." Therefore, a purely behavior-based approach for multi-robot coordination is not feasible in an unknown and dynamic environment. The planning-based approach utilizes planning techniques of traditional artificial intelligence (AI) to establish optimal decisions for the robots in a multi-robot system (Miyata et al., 2002). The centralized approach and the distributed approach are the two types of planning-based approaches for multi-robot coordination. In the centralized planning approach, the actions are planned by a single robot in the system, and this robot assigns subtasks to all the other robots in the team. Task allocation among multiple robots is an NP (non-deterministic polynomial) time problem in the theory of computational complexity; in fact, computational complexity is the key disadvantage of this approach. For a practical multi-robot system consisting of a large number of robots, finding an analytical solution through this approach is very difficult or even impossible. A compromise would be to approximate the solution by employing an AI-based optimization technique. Furthermore, a lack of communication will weaken the centralized approach. On the other hand, the distributed planning approach plans robot actions independently and achieves a globally optimal decision by integrating the individual decisions of all the robots. In view of its difficulty, few successful examples have been reported in the literature. In addition, both centralized and distributed planning approaches can easily fail in a dynamic and unknown environment, since these approaches are unable to alter their policies (action sequences) for the robots quickly enough to adapt to a rapidly changing environment. The distributed control approach (Pereira et al., 2004; Wang and de Silva, 2005) utilizes a distributed control strategy such as "Form Closure," "Object Closure," or "Leader-Follower Control" to assign sub-tasks to the robots for controlling the team formation in the system. Distributed control is not a general approach.
In the class of multi-robot cooperative transportation, it is only suitable for special task situations, and this is a major disadvantage of the approach. Furthermore, the approach of distributed control also suffers from the problem of computational complexity. The market-based approach (Mataric et al., 2002) is a rather promising approach for multi-robot coordination in a dynamic environment. However, there are many open questions related to this approach; its main challenges are computational complexity and real-time performance. The machine learning-based approach attempts to circumvent the computational complexity problem, which affects the other multi-robot coordination approaches, by approximating the optimal policies of the robots. In particular, in view of its good real-time performance and simplicity, reinforcement learning, particularly the Q learning algorithm, has been employed commonly (Parker et al., 2002; Martinson and Arkin, 2003; Dahl et al., 2004; Taylor and Stone, 2005; Wang and de Silva, 2007). In a survey paper, Arai et al. (2002) have recommended learning-based multi-robot coordination as an area deserving further research. The learning capability is essential for multi-robot coordination and task execution in a dynamic and unknown environment. As suggested in Chapter 1, the learning capability will enhance the adaptability to the dynamic environment and assist the robots in identifying new environmental states. The robots can continuously adjust their action policies and improve their performance based on the experience of past success and failure in a dynamic environment, thus completing the required tasks in a faster and more robust manner. Furthermore, Wang and de Silva (2006) have shown that the single-agent learning scheme, when extended to a multi-robot system, will violate the original assumption of a stationary environment. Consequently, the learning algorithm may not converge to its optimal policy. These observations clearly show that the problem of decision making for multi-robot coordination is a challenging undertaking. A single approach may not be able to solve all the problems encountered in a cooperative multi-robot transportation process. Therefore, a combination of several existing techniques is employed here to develop a multi-robot architecture that can effectively support multi-robot coordination and cooperation in a dynamic and unknown environment. This fully distributed multi-robot hierarchical architecture integrates two popular artificial intelligence (AI) techniques, namely, machine learning and behavior-based approaches, for decision making in a dynamic, unknown, and unstructured environment. Figure 2.1 shows the proposed three-level multi-robot architecture, which offers improved robustness and real-time performance for multi-robot object transportation.
Figure 2.1: Hierarchical architecture for object transportation (layers: High-Level Task Decomposition and Assignment; Behavior-Based Distributed Control; Hardware Level).
The hierarchical architecture for object transportation integrates different techniques in a hierarchical manner to provide flexible and robust capabilities to the robots for decision making while adapting to the external environment dynamically. Basically, it consists of three layers: the Task Decomposition and Assignment layer, the Behavior-Based Control layer, and the Hardware Level layer, subject to the constraints of the available robots. Task Decomposition and Assignment is the high-level layer.
It has a learning unit and a coordination unit, which are used to implement inter-robot coordination. The distribution of available information among robots plays a vital role in multi-robot cooperative transportation. In particular, each robot in the system collects information from its own sensors and transmits that information to the system. Through the communication network, all robots in the system gain an understanding of the sensor information of their peer robots, in addition to their own. Such communication allows the robots to determine the current world state cooperatively, and to decide the robotic actions sequentially, using the learning or coordination units, so as to reach the goal. The high-level layer performs inter-robot task decomposition and assignment. During the transportation process, each robot may use a learning unit or a coordination unit for making decisions. In particular, in the stage of environmental state identification, an arbitrator is assigned dynamically to monitor the environment state identification; a coordination unit is used to coordinate the activities of that particular stage. The arbitrator decides which unit is used to make decisions for the stage of environment state identification. For example, when the robot navigates in a dynamic environment, usually the learning-based unit is activated; the arbitrator typically switches on the learning unit for decision making of the robot. However, once the arbitrator realizes that a robot within the system has identified the object and estimated its pose, it will turn off the learning unit temporarily and switch on the coordination unit. This unit will coordinate the state identification stage for the robots in the system within a bounded local region called the detection area (details are found in Chapter 3). When the robots complete the state identification stage, the arbitrator will return the decision making capability to the learning unit. Through this type of switching mechanism, the multi-robot architecture benefits from multi-robot coordination and flexibly adapts to various environments. A popular behavior-based approach is employed in the middle level of the architecture to implement robust control of robotic actions. As stated above, the behavior-based approach is known to be efficient in multi-robot cooperative control. The hierarchical architecture utilizes the behavior-based approach to execute the sub-tasks determined by the top level. When the task decomposition and assignment level of the architecture establishes the current sub-tasks for the robots, these are sent down to the behavior-based control level to be decomposed further into detailed robot behaviors. The behavior-based control level consists of four types of behavior. The communication behaviors are responsible for communication among robots. The group behaviors execute coordination among robots, for example, "Assume Formation," and they are typically fired by the learning unit or the coordination unit in the top level of the architecture.
Composite behaviors are responsible for the control of composite actions of individual robots, such as "Find Object," "Move to Location," and "Avoid Obstacle." These composite behaviors can be decomposed further into primitive behaviors such as "Turn Left," "Go Straight," and "Move Forward." Through the design of hierarchical behaviors and by carefully organizing their preferences, the robots can complete the required tasks, as defined by the Task Decomposition and Assignment level, in an effective and robust manner. The low level of the architecture is the hardware level. It consists of the sensors, actuators, and communication hardware of the robots. These compose the hardware platform of the hierarchical multi-robot architecture for object transportation. What is proposed is a general architecture for distributed multi-robot coordination and decision making. This architecture is common to all robots in the system. A composite representation of the proposed architecture is given in Figure 2.2.
Figure 2.2: Composite representation of the proposed architecture (each of Robot 01, Robot 02, and Robot 03 contains a High Level Control layer, a Behaviour Control layer, and a Hardware Level layer, connected through the communication network).
The top level of the architecture implements "soft" control of a robot, while the middle level implements "hard" control behavior. In particular, the top level decomposes and assigns tasks among robots, and the behavior-based control layer decomposes the actions of a single robot further into more detailed primitive behaviors and controls the motions of the robot. By combining established mainstream approaches in multi-robot coordination, such as learning and behavior-based methods, this multi-robot architecture provides more flexibility and cooperating power in a dynamic, unknown, and unstructured environment than those presented in the literature. The proposed multi-robot architecture will be validated through computer simulation and experimentation, as presented in the following chapters of the thesis.
2.2.1 Machine Learning Unit
The machine learning unit is located in the top level of the proposed multi-robot architecture. The main purpose of this unit is to learn optimal decision making in a given dynamic environment through continuously monitoring the actions of the robots and the corresponding environmental rewards. The principle of the machine learning unit is presented in Figure 2.3.
Figure 2.3: Principle of machine learning.
All robots in the system gain knowledge by continuously monitoring the current environmental state, selecting an action sequentially and executing it, and receiving rewards from the environment. At the sequential action selection stage, each robot needs to concurrently observe the actions of its peers in the system and review their effects on the environment. Each robot employs a machine learning technique to learn from past experience with the work environment. The machine learning technique developed in the thesis continuously improves its performance and approximates its optimal policy for decision making.
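To make the monitor-act-reward loop of Figure 2.3 concrete, the following is a minimal sketch of single-agent tabular Q learning of the kind extended in Chapter 3. The class shape, integer state/action encodings, and parameter values are illustrative assumptions; the thesis's modified multi-robot algorithm is not reproduced here.

#include <cstdlib>
#include <map>
#include <utility>

// Minimal tabular Q-learning sketch (illustration only).
class QLearner {
public:
    QLearner(double alpha, double gamma, double epsilon, int numActions)
        : alpha_(alpha), gamma_(gamma), epsilon_(epsilon),
          numActions_(numActions) {}

    // Epsilon-greedy action selection over the current Q estimates.
    int selectAction(int state) {
        if (static_cast<double>(std::rand()) / RAND_MAX < epsilon_)
            return std::rand() % numActions_;   // explore
        return greedyAction(state);             // exploit
    }

    // Standard single-agent update:
    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    void update(int state, int action, double reward, int nextState) {
        const double best = q_[{nextState, greedyAction(nextState)}];
        double& q = q_[{state, action}];
        q += alpha_ * (reward + gamma_ * best - q);
    }

private:
    int greedyAction(int state) {
        int best = 0;
        for (int a = 1; a < numActions_; ++a)
            if (q_[{state, a}] > q_[{state, best}]) best = a;
        return best;
    }

    double alpha_, gamma_, epsilon_;
    int numActions_;
    std::map<std::pair<int, int>, double> q_;  // (state, action) -> value
};

In each decision cycle, a robot would observe the state, call selectAction, execute the action, and then call update with the received reward, mirroring the loop described above.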
The machine learning unit dynamically adjusts the mapping relation between the environment state and the robot actions. Consequently, the machine learning unit will lead to better performance of the robots in the system and will complement the behavior layer in the middle level. In this thesis, Q learning is utilized and enhanced to establish the mapping relationship between the environment state and the robot actions. The major challenge arises in the modification of single-agent Q learning to accommodate a multi-robot environment; the next chapter discusses this issue in detail.
2.2.2 Coordination Unit
In the prototype system there are three autonomous heterogeneous robots, which attempt to push a rectangular object cooperatively from a starting position to a goal location. Many challenges exist in this transportation task. For example, there are obstacles distributed randomly in the environment, and each robot possesses only local sensing capabilities. This means that a robot will not know about an obstacle before it moves sufficiently close to it. Another challenge is that the object is rather long and heavy, so a single robot will not be able to handle it alone. The robots have to find optimal cooperative pushing strategies to move the object to the goal location while avoiding collisions with any obstacles in the environment. In order to reach this goal, an arbitrator is designed in the top layer of the proposed architecture, which will select either the learning unit or the coordination unit for multi-robot decision making. When the multi-robot system begins its task, the arbitrator selects the machine learning unit as the decision making unit. The robots start exploring the environment for the color-coded object to be transported. Once an object is detected by a robot in the system, that robot rearranges its pose to center the color blob in the camera frame. The robot then estimates the pose (position and orientation) of the object, and this information is transmitted to the other robots. After the object is detected and its pose is estimated, the particular robot will temporarily close its machine learning unit and transfer its decision rights to the coordination unit. At the same time, the robot that detected the object is assigned as the leader of the team, and all the other robots in the system stop their learning units and transfer their decision rights to the coordination unit of the leader robot. After all the robots have transferred their decision making rights to the leader robot, the coordination unit is activated temporarily. The leader robot then assigns tasks to its peers in the system, which will assist in identifying the obstacle distribution around the object. The leader robot transmits all the processed sensor information to the system, and the robots are able to establish the current local environment state individually after switching on their learning units. After the state identification stage, the coordination unit of the leader robot will stop working and will transfer the decision rights back to the learning units of all the robots. Figure 2.4 shows a representation of the local environment state in the multi-robot object pushing task, which employs the modified Q learning algorithm. This will be described in detail in the next chapter. The present subsection has demonstrated why the coordination unit is necessary for multi-robot tasks.
Figure 2.4: The environment state representation for coordination (the goal location, the object, and the obstacles within the detection radius around the object).
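The arbitrator's switching between the two units can be illustrated with a minimal sketch. The unit interfaces and event names below are hypothetical, not the prototype code; the sketch only captures the handover of decision rights described above.

#include <iostream>

// Illustrative arbitrator that routes decision making to either the
// learning unit or the coordination unit, per the text above.
enum class Phase { Exploring, StateIdentification };

class Arbitrator {
public:
    // Fired when some robot detects the object and estimates its pose.
    void onObjectPoseEstimated() {
        phase_ = Phase::StateIdentification;  // coordination unit takes over
    }

    // Fired when the local environment state has been identified.
    void onStateIdentified() {
        phase_ = Phase::Exploring;            // learning unit resumes control
    }

    // Route each decision step to the currently active unit.
    void decide() const {
        if (phase_ == Phase::Exploring)
            std::cout << "learning unit selects next action\n";
        else
            std::cout << "coordination unit coordinates state identification\n";
    }

private:
    Phase phase_ = Phase::Exploring;
};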
Both learning and coordination units are important for a multi-robot system to successfully complete cooperative tasks in a dynamic and unknown environment in the presence of a complex obstacle distribution.
2.2.3 Behavior-Based Distributed Control
The proposed hierarchical multi-robot architecture is represented in Figure 2.1. The behavior-based distributed control layer is designed to control the local actions of the robots in the system. It is located between the high-level task decomposition and assignment layer and the low-level hardware layer. Normally, the high-level task decomposition and assignment layer selects quite abstract actions, which cannot be directly executed by the actuators in the hardware layer. Hence, these high-level abstract actions need to be decomposed into more detailed actions in the behavior-based distributed control layer. The hierarchical behavior control architecture CAMPOUT has been developed at the NASA Jet Propulsion Laboratory (JPL) (Huntsberger et al., 2003) to control robot behavior in a robust manner. This hierarchical control architecture consists of two types of behaviors, called group behaviors and primitive behaviors. The group behavior layer is further decomposed into composite and communication behaviors. The mid-level behavior control layer based on CAMPOUT clearly indicates the hierarchy of communication behaviors, group behaviors, and composite behaviors. Communication behaviors are high-level behaviors relative to the group behaviors, and group behaviors are accomplished through the communication behaviors. Composite behaviors are low-level behaviors relative to the group behaviors. The general idea of the behavior control layer is presented in Figure 2.5.
Figure 2.5: The hierarchical behavior-based control layer.
Behaviors serve as the basic building blocks for robotic actions in "behavior-based systems." Simply put, a behavior is a reaction to a stimulus. Thus, in some literature, behavior systems are also called reactive systems. In reactive robotic systems, the actions of robots are tightly coupled with their perceptions, without a knowledge reasoning process or time history. The foundation for behavior-based systems originates from animal models. Biology has provided an existence proof that many of the tasks we want behavior-based robotic systems to undertake are indeed doable. Additionally, the biological sciences, such as neuroscience, ecology, and psychology, have elucidated various mechanisms and models that may be useful in implementing behavior-based robotic systems. Behavior-based systems have several advantages over planning-based systems. First, a behavior-based system is simple and easy to design. For a fast changing environment, a planning-based system is difficult or even impossible to implement because it cannot adapt to such an environment.
A behavior-based system is much easier to design than a traditional system with planning and reasoning, because it tightly couples perceptions with actions and avoids complicated, time consuming reasoning and planning. Also, a behavior-based system can be successful even in a dynamic and unknown environment, and it has shown better performance and robustness in physical robotic systems, provided that the designer can predict all possible environmental states and arrange the corresponding behavior rules to deal with them.
2.2.3.1 Behavior Representation
Several methods are available for expressing robotic behavior; these include the stimulus-response (SR) diagram, functional notation, and the finite state acceptor (FSA). SR diagrams are used for the graphical representation of specific behavior configurations, and behaviors are expressed here using SR diagrams. Figure 2.6 shows an SR diagram.
Figure 2.6: SR diagram of a simple behavior (stimulus, behavior, response).
According to Figure 2.6, a behavior can be expressed in terms of a stimulus, a response, and a mapping between them. Therefore, the triple $(S, R, \beta)$ represents a behavior, where $S$ denotes the set of all possible stimuli, $R$ represents the range of the response, and $\beta$ denotes the mapping function between $S$ and $R$. A binary-coded duple $(p, \lambda)$ represents an individual stimulus $s$ ($s \in S$), where $p$ indicates the type of the stimulus and $\lambda$ denotes the strength of the stimulus. A threshold value $\tau$ is assigned for each stimulus, and a response will be generated only if the strength of the stimulus is greater than $\tau$. In addition, a response $r$ can be expressed as a six-dimensional vector for a mobile robot, where each parameter encodes the magnitude of the translational or rotational response for one of the six degrees of freedom of motion:

$r = [x, y, z, \theta, \phi, \psi]^T$, where $r \in R$   (2.1)

A gain value $g$ is introduced for a particular behavior $\beta$ in order to further modify the magnitude of its response. A behavior is then expressed as

$r' = g \cdot r = g \cdot \beta(s)$   (2.2)

2.2.3.2 Behavior Composition and Cooperation
Behavior coordination has two predominant classes: the competitive (composition) class and the cooperative class. Each class has several strategies for realization. First, consider the composition of behaviors, described in Figure 2.5, which clearly shows how composite behaviors are decomposed further into primitive behaviors; conversely, simple primitive behaviors can be joined to construct more abstract composite behaviors. Composite behaviors are the behavior packages on which a behavior-based robot system is built. A behavior coordination operator and a number of behavioral components are included with each composite behavior. The behavioral components may be either primitive behaviors or composite behaviors. High-level behaviors can thus be built on simpler behaviors, and these high-level behaviors can be referred to without needing to know their behavioral components. A composite behavior is given as follows:

$\rho = C(G \ast R) = C(G \ast B(S))$   (2.3)

where $R = [r_1, r_2, \ldots, r_n]^T$, $S = [s_1, s_2, \ldots, s_n]^T$, $G = \mathrm{diag}(g_1, g_2, \ldots, g_n)$, and $B = [\beta_1, \beta_2, \ldots, \beta_n]^T$. Here, $B$ stands for the vector of the $n$ current component behaviors $(\beta_1, \beta_2, \ldots, \beta_n)$; $S$ represents the vector of all detectable stimuli relevant to each behavior $\beta_i$ at time $t$; $R$ represents the vector of the responses of the $n$ behaviors $(\beta_1, \beta_2, \ldots, \beta_n)$
at time $t$; $G$ represents the gain matrix, which modifies the response vector $R$; and $C$ represents the behavior coordination operator for the composite behavior. There are two types of behavior coordination operators, the competitive operator and the cooperative operator, corresponding to the two classes of behavior coordination. Figure 2.7 shows the SR diagram of a composite behavior with a competitive operator.
Figure 2.7: The SR diagram for a composite behavior with a competitive operator.
According to this figure, conflict can result when two or more behaviors are active simultaneously, each with its own independent response. The competitive method provides a means of coordinating behavior responses for conflict resolution. The competitive operator is designed to select the winning behavior from among the simultaneously activated behaviors, and only the winner's response is directed to the robot for execution. There are numerous competitive operators; for example, suppression-based arbitration, arbitration via action selection, and voting-based arbitration. The other type of behavior coordination operator is the cooperative operator, which is shown in Figure 2.8.
Figure 2.8: The SR diagram of a composite behavior with a cooperative operator.
The cooperative method provides an alternative to a competitive method such as arbitration. It uses behavior fusion and provides the ability to use the outputs of more than one behavior concurrently. The outputs of multiple behaviors are fused and directed to the robot for execution; multiple behaviors can be executed by the robot at the same time if they do not conflict with each other. The simplest cooperative operator is implemented through vector addition or superposition. The key to a cooperative operator is to design the behaviors so that their responses can be fused. The potential field-based method is the most popular approach; it simply adds the outputs of multiple behaviors.
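As an illustration of the two operator classes, the following minimal sketch contrasts winner-take-all arbitration with vector-addition fusion. The names and the planar (rather than six-dimensional) response are simplifying assumptions, not the thesis design.

#include <cstddef>
#include <vector>

// Response of one behavior: a planar motion command plus an activation
// level in [0, 1] (a simplification of the response vector of Eq. 2.1).
struct Response { double vx; double vy; double active; };

// Competitive operator: suppression-style arbitration that forwards only
// the response of the most strongly activated behavior.
Response competitive(const std::vector<Response>& responses) {
    Response winner{0.0, 0.0, 0.0};
    for (const auto& r : responses)
        if (r.active > winner.active) winner = r;
    return winner;
}

// Cooperative operator: potential-field style fusion that superposes the
// gain-weighted outputs of all behaviors (gains plays the role of G).
Response cooperative(const std::vector<Response>& responses,
                     const std::vector<double>& gains) {
    Response fused{0.0, 0.0, 1.0};
    for (std::size_t i = 0; i < responses.size(); ++i) {
        fused.vx += gains[i] * responses[i].vx;
        fused.vy += gains[i] * responses[i].vy;
    }
    return fused;
}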
2.2.3.3 Communication Behaviors for Coordination
For coordinating the activities of the robots in a multi-robot system, communication behaviors are designed to receive information from the high-level task decomposition layer and to exchange sensing information, objectives, and internal states among the robots. Communication behavior coordination is implemented through the high-level task decomposition layer, which sits above the behavior-based control layer. The communication behavior layer sends its received information to the group behavior layer for low-level control. Explicit communication and implicit communication are the two types of communication. Explicit communication concerns sending or receiving information via computer network data links. However, robots also have the ability to exchange information indirectly with their teammates by interacting with the environment or by using sensing. For example, a robot can estimate its relative distance from another robot, or from the closest obstacle, by using its CCD camera or laser range finder.
2.2.3.4 Group Behaviors for Coordination
The group behavior layer is responsible for decomposing the abstract behaviors received from the high-level task decomposition layer through the communication behavior layer. The group behavior layer carefully coordinates the activities of the robots, with the purpose of managing and solving conflicts among robots and making them contribute to the common tasks. The learning or coordination unit continuously decomposes subtasks among the multiple robots and assigns them to each robot. When a robot receives its sub-task, it decomposes the sub-task further into more detailed composite behaviors in its behavior-based distributed control level, and finally converts these into primitive behaviors for execution.
2.2.3.5 Behavior Coordination for the Object Transportation Subtask
A fully distributed multi-robot coordinated object transportation system is developed in this thesis. The behavior-based distributed control layer facilitates the implementation of the sub-tasks assigned by the learning/coordination units in the top level of the proposed architecture. Figure 2.9 shows the behavior hierarchy in the behavior-based distributed control layer for the coordinated object transportation subtask. In the object transportation subtask, the robots first wander in the environment, looking for an object of interest. The group behavior "Assume Formation" makes a robot move to an intended site around the transported object so as to establish a formation with its peer robots. The learning/coordination unit in the top level of the proposed architecture establishes the transportation formation of the robots; it decomposes the formation into specific positions and orientations of the robots, and sends the determined subtasks to the behavior control layer for execution. When the "Assume Formation" parent behavior is activated, it first turns on its child behavior "Find Object," which attempts to search for the object to be transported. If the object is found in the environment, the robot estimates the object pose (location and orientation); a CCD camera and a laser range finder are used to accomplish this estimation. Simultaneously, the composite behavior "Find Object" calls the "Search Color Blob" composite behavior, which makes the robot rearrange its pose continuously, trying to locate the predefined color blob attached to the object and bring it to the center of the camera frame using the CCD camera. The predefined color blob can be used to distinguish the object of interest from other objects or obstacles in the environment. Then the "Laser Measurement" primitive behavior is triggered to estimate the pose (position and orientation) of the object using the laser range finder. After that, the composite behavior "Find Obstacle" calls the "Laser Measurement" primitive behavior to measure the distances, angles, and sizes of the obstacles relative to the object of interest. Then, the behavior "Move to Location" attempts to move the robot to the designated action position decided by its higher-level learning/coordination unit. The composite behaviors "Move" and "Obstacle Avoidance" are activated simultaneously, until the robot reaches its designated action position.
Figure 2.9: Behavioral hierarchy for the coordinated transport sub-task.
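The parent/child activation pattern just described can be sketched as a small behavior tree. The class shape is an illustrative assumption; the behavior names follow the text.

#include <memory>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch of parent behaviors activating child behaviors,
// mirroring "Assume Formation" turning on "Find Object," and so on.
class Behavior {
public:
    explicit Behavior(std::string name) : name_(std::move(name)) {}
    virtual ~Behavior() = default;

    void addChild(std::shared_ptr<Behavior> child) {
        children_.push_back(std::move(child));
    }

    // Activating a parent activates its children in order, mirroring the
    // decomposition of composite behaviors into primitive behaviors.
    virtual void activate() {
        for (auto& child : children_) child->activate();
    }

    const std::string& name() const { return name_; }

private:
    std::string name_;
    std::vector<std::shared_ptr<Behavior>> children_;
};

// Example wiring for part of the hierarchy of Figure 2.9.
std::shared_ptr<Behavior> buildAssumeFormation() {
    auto assume = std::make_shared<Behavior>("Assume Formation");
    auto find = std::make_shared<Behavior>("Find Object");
    find->addChild(std::make_shared<Behavior>("Search Color Blob"));
    find->addChild(std::make_shared<Behavior>("Laser Measurement"));
    assume->addChild(find);
    assume->addChild(std::make_shared<Behavior>("Move to Location"));
    return assume;
}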
The group behavior "Carry" is activated after all the robots reach their designated action positions and create a formation around the object of interest. When the "Carry" parent behavior is activated, it first turns on its child behavior "Signal," which starts a synchronization process for transporting the object. During the transportation process, the robots need to monitor whether the object collides with any obstacle in the environment. If the object collides with an obstacle, the robots will abandon the "Carry" behavior and report the incident to their learning/coordination units in the top-level task decomposition and assignment layer. The learning/coordination units then plan the formation again.
Figure 2.10: FSA describing the transitions of the group behaviors in the task of coordinated object transportation (Wang, 2007). The transition labels include conditions such as "Not at the goal position," "Not at the designated position," and "The formation is established."
The finite state acceptor (FSA) is a method for expressing robotic behaviors. Here, FSAs are used to describe the aggregations and sequences of the behaviors, so that the process of behavior activation and transition becomes more explicit. A finite state acceptor can be specified by the quadruple $(Q, \delta, q_0, F)$, with $Q$ representing all behavioral states; $q_0$ representing the initial state; $F$ representing all final (termination) states; and $\delta$ representing the mapping function which maps the current behavioral state $q$ and input $a$ into a new behavioral state $q'$, i.e., $q' = \delta(q, a)$. The FSA describing the transitions of the group behaviors in the task of coordinated object transportation is shown in Figure 2.10.
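A minimal sketch of such an FSA for the group behaviors is given below. The state and event names are guesses based on the transition labels of Figure 2.10, not the exact design used in the thesis.

#include <map>
#include <utility>

// Finite state acceptor (Q, delta, q0, F) sketch for group behavior
// transitions; names are illustrative.
enum class State { Wander, AssumeFormation, Carry, Done };
enum class Event { ObjectFound, FormationEstablished, GoalReached, Collision };

class BehaviorFSA {
public:
    BehaviorFSA() {
        delta_[{State::Wander, Event::ObjectFound}] = State::AssumeFormation;
        delta_[{State::AssumeFormation, Event::FormationEstablished}] = State::Carry;
        delta_[{State::Carry, Event::Collision}] = State::AssumeFormation;
        delta_[{State::Carry, Event::GoalReached}] = State::Done;
    }

    // q' = delta(q, a); undefined pairs keep the current state (self loop).
    void fire(Event a) {
        auto it = delta_.find({q_, a});
        if (it != delta_.end()) q_ = it->second;
    }

    State current() const { return q_; }
    bool accepted() const { return q_ == State::Done; }  // F = {Done}

private:
    State q_ = State::Wander;  // q0
    std::map<std::pair<State, Event>, State> delta_;
};

Firing Event::ObjectFound from Wander moves the acceptor to AssumeFormation, and a Collision during Carry falls back to AssumeFormation, matching the replanning step described above.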
2.3 Mechanism for Multi-Robot Coordination
A multi-robot coordination mechanism must cope with different types of requirements, such as robot positioning and synchronization. It should provide flexibility and adaptability, allowing multi-robot teams to execute tasks efficiently and robustly. To accomplish this, a dynamic role assignment mechanism is proposed, in which robots can dynamically change their behavior, enabling them to successfully execute different types of cooperative tasks. This approach is validated using environmental state identification. A role is defined as a function, such as environment state identification, which one or more robots perform during the execution of a cooperative transportation task. The particular state identification stage consists of three roles (modes): s1, Wandering; s2, Object Identification; and s3, Obstacle Identification.
2.3.1 Dynamic Role Assignment Mechanism for the Coordination Unit
In general, to execute a cooperative role, a team of robots must be coordinated; specifically, they have to synchronize their activities and exchange information. In this approach, each robot performs a role that determines its activity during the cooperative task. According to its internal state and the information about the other robots and their roles received through communication, a robot can dynamically change its role. The mechanism for coordination is centralized, but it is activated only for a short time. As mentioned in section 2.2, the leader robot has the power to make decisions based on its local information and the information received from its peers. Each team member has a specification of the possible activities that should be performed during each phase of cooperation in order to complete the state identification task. These activities must be specified and synchronized considering several aspects, such as robot properties, task requirements, and characteristics of the environment. The dynamic role assignment is responsible for allocating the correct activities to each robot and synchronizing the cooperative execution through the leader robot in the system. Before describing the details of the role assignment mechanism, it is necessary to define what a role is in a cooperative transportation task. A role is a function that one or more robots perform during the execution of the state identification stage, which is a cooperative robotic task according to the modified Q learning algorithm. Each robot will perform a role while certain internal and external conditions are satisfied, and will assume another role otherwise. The role defines the present activity of the robot at that instant. Here, three distinct conditions are considered for role transition. A CCD camera and a laser range finder are used as the sensors for this particular task, generating high-level stimuli for the system. The role assignment mechanism allows the robots to change their roles dynamically during the execution of a task, adapting their activities according to the information they have about the system and the task. In the state identification stage, there are basically two ways of changing roles during the execution of a cooperative task. The simplest way is Allocation, in which a robot assumes a new role after finishing the execution of another role. In the Reallocation process, a robot terminates the performance of one role and starts or continues the performance of another role. Figure 2.11 shows a diagram with the two types of role assignment; the vertical bars inside the robots indicate the percentage of the role execution that has been completed.
Figure 2.11: Types of role assignment (Allocation and Reallocation between Role 01 and Role 02).
The control software of a robot includes programs for each role it can assume. The local information consists of the robot's internal state and its perception of the environment. Global information contains data about the other robots and their view of the system, and is normally received through explicit communication (message passing). A key issue is to determine the amount of global and local information necessary for the role assignment; this depends on the type of the activity that is being performed. The present approach allows for two types of explicit communication: synchronous and asynchronous. In synchronous communication, messages are sent and received continuously at a constant rate, while in asynchronous communication an interruption is generated when a message is received. Synchronous messages are important in situations where the robots must receive constant updates about the activities of the others. On the other hand, asynchronous communication is used for coordination when, for example, the leader robot needs to inform the others about the object pose, or needs to interrupt the wandering role of the other robots. An important requirement is to define when a robot should change its role. In the role allocation process, the robot detects that it has finished its role and assumes another available role. For example, consider the robots wandering in the work environment to find a color-coded object.
If a robot detects the color-coded object, then it needs to identify the object (in particular, to estimate the object pose). In the reallocation process, the robots should know when to abandon the current role and assume another role. A possible way to do this is to use a function that measures the utility of performing a given role. A robot which performs role $r$ has a utility given by $\rho_r$. When a new role $r'$ becomes available, the robot computes the utility $\rho_{r'}$ of executing the new role. If the difference between the utilities is greater than a threshold, i.e., $(\rho_{r'} - \rho_r) > \tau$, the robot will change its role. The function $\rho$ can be computed based on local and global information, and may be different for different robots, tasks, and roles. Also, the value of $\tau$ must be chosen such that the possible overhead of changing roles is compensated for by a substantial gain in utility, consequently producing a better overall performance.
2.3.2 Dynamic Role Assignment Model for the Coordination Unit
The dynamic role assignment can be described and modeled in a more formal framework. In general, a multi-robot system can be described by the robotic state $X$, which is a concatenation of the states of the individual robots:

$X = [x_1, x_2, \ldots, x_n]^T$   (2.4)

Considering a simple control system, the state of each robot varies as a function of its continuous robotic state ($x_i$) and the input vector ($u_i$). Also, each robot may receive information about the other robots in the system ($\hat{z}_i$). This information consists of estimates of the robotic states of the other robots, received mainly through communication. We use the hat notation to emphasize that this information is an estimate, because the communication can suffer delays, failures, and so on. Using the role assignment mechanism, at each moment a robot will be controlled using a different continuous equation according to its current role in the state identification stage. Therefore, we use the subscript $q$ to indicate the current role of a robot. Following this description, the state equation of robot $i$ during the execution of the task is given by

$\dot{x}_i = f_q(x_i, u_i, \hat{z}_i)$   (2.5)

Since each robot is associated with a control policy, we have

$u_i = k_{iq}(x_i, \hat{z}_i)$   (2.6)

and since $\hat{z}_i$ is a function of the state $X$, we can rewrite the state equation as

$\dot{x}_i = f_q(X)$   (2.7)

or, for the whole team,

$\dot{X} = F_q(X)$, where $F_q = [f_{1q}, f_{2q}, \ldots, f_{nq}]^T$   (2.8)

The continuous behavior of each robot (representing the robotic state at any instant) is modeled by these equations. Based on these equations, the continuous actions of the whole team of robots are implemented during the execution of the cooperative state identification stage. These equations, together with the roles, role assignments, continuous and discrete variables, communication, and synchronization, can be better understood and formally modeled using an automaton $A$, as described next. It consists of a finite number of real-valued variables that change continuously, as specified by differential equations and inequalities, or discretely, according to specific assignments. The state identification stage consists of both discrete states (roles) and continuous states (robotic states); therefore, the particular multi-robot system in the state identification stage is hybrid in nature, given by

$A = \{S, V, T, f, cond1, cond2, init, R\}$   (2.9)

Here, $S = \{1, 2, \ldots, k\}$ is the set of roles, which is a discrete set of states of the robot.
According to Figure 2.12, a role may be composed of other roles, in what are called hierarchical/sequential compositions. The variables of the system, $V$ ($V = V_d \cup V_c$), are composed of discrete information ($V_d$) and continuous information ($V_c$); the continuous variables represent the robotic states, and the discrete variables represent the sensory information within each role. The dynamics of the continuous information are determined by the flows $f$; generally, the continuous information is updated according to the differential equations inside each role ($f_q$). Discrete transitions between pairs of roles $(p, q)$ are specified by the transition relation $T$, which represents the role assignment. $cond1$ and $cond2$ are predicates related to the set of roles ($S$) and the transition relation $T$, respectively; together, they define when each robot will assume a new role. The system can stay in a certain control mode while its $cond1$ is satisfied, and can take a transition when its $cond2$ (jump condition) is satisfied. The initial states of the system are given by $init$, and each transition in $T$ can also have an associated reset statement $R$, to change the value of some variable during a discrete transition using discrete information. Furthermore, variables are also updated according to the reset statement of each discrete transition. Finally, the execution of the cooperative state identification stage can be modeled using a parallel composition. Figure 2.12 shows a role composed of the three roles s1, Wandering; s2, Object Identification; and s3, Obstacle Identification. Each role is characterized by differential equations and algebraic constraints. The conditions under which transitions between roles occur, and the conditions under which robots stay in a particular role, are specified. Communication among robots can also be modeled in this framework. We use a message passing mechanism to exchange information among the agents. To model message passing in a hybrid automaton, we consider that there are communication channels between the agents, and use the basic operations send and receive to manipulate the messages. In the hybrid automaton, messages are sent and received in discrete transitions. These actions are modeled in the same way as assignments of values to variables (reset statements). It is rather common to use a self transition, i.e., a transition that does not change the discrete state, to receive and send messages.
Figure 2.12: Schematic representation of a role composed of three roles.
2.4 Distributed Control Issues for Multi-Robot Coordination
Normally, a multi-robot system with a distributed control architecture does not employ a leader robot to make decisions for all the other robots in the system. The proposed hierarchical multi-robot architecture is also a distributed control architecture. In a distributed control system, each robot needs to decide its own actions independently, by observing the environment or by communicating and negotiating with its teammates. The only exception is the coordination unit in the proposed architecture, which requires centralized decision making when the robots need to identify the environmental state. However, the centralized coordination unit works only for the state identification phase, which exists for a very short period of time compared to the transportation task. Once the multi-robot system leaves the state identification stage, the learning unit will take over the control of the robot again.
A distributed architecture has many advantages over a centralized control architecture, especially for multi-robot systems. First, a distributed multi-robot system has the ability to increase the number of robots in the team without significantly increasing its computational cost; this represents scalability. For example, the architecture proposed in this chapter can be expanded from 3 to 20 robots without serious problems. If a member robot quits, or a new robot enrolls into the system during the transportation process, the system is capable of adapting to the change of team size dynamically. For a centralized multi-robot system, scalability is a difficult issue, because the computational complexity of a centralized decision making algorithm increases rapidly with the number of robots. Second, distributed systems have a low communication volume. A decentralized distributed system does not assign a leading robot, so communication bottlenecks are unlikely to appear. A centralized system, in contrast, has a central leader, and when the number of robots in the system increases, the communication bottleneck can become so serious that the algorithm becomes worthless. Furthermore, distributed systems are capable of accomplishing some cooperative tasks even without communication. In particular, most realistic mobile robot systems are expected to work with a limited communication capability, because most multi-robot systems need to run in environments where communication among robots is usually unreliable; for example, at the bottom of a deep ocean or on a planetary surface. If the system cannot establish reliable communication among the robots, centralized decision making becomes weak, because the member robots cannot receive commands from their leader in a timely and accurate manner. A distributed multi-robot system, however, may work well even when communication among robots is not reliable. Third, distributed systems are robust. Each robot in the system can make decisions individually to accomplish a common task. When one robot in the system malfunctions or quits, the other robots can still make decisions by themselves to continue their tasks. In a centralized system, however, one leader robot makes decisions for all the other robots in the team; once the leader robot fails, the whole system fails entirely. Therefore, such systems are not robust compared to distributed control systems. Low efficiency is a major disadvantage of a distributed system, where multiple robots reach their agreements through some cooperation technique such as voting, auction, and so on. These cooperation approaches are usually time consuming and not efficient. Normally, distributed learning algorithms are much more difficult and complex than their centralized equivalents. Therefore, a distributed system is usually not optimal, because the final policy reached by the members is not optimal. It is still an open question as to how to implement efficient and optimal distributed decision making.
2.5 Summary
In this chapter, a three-layer distributed multi-robot architecture, which accommodates distributed machine learning, behavior-based distributed control, and low-level "hard" control, was developed. The objective of the architecture is to support multi-robot systems in properly coordinating robotic tasks in a dynamic and unknown environment.
The proposed architecture is a general platform that merges several existing techniques of multi-robot decision making, and accommodates homogeneous or heterogeneous robots in the same framework. Before proposing the architecture, several popular multi-robot coordination approaches from the literature were discussed, and their advantages and disadvantages were compared. The discussed multi-robot coordination approaches are the behavior-based approach, the planning-based approach, the distributed control-based approach, the market-based approach, and the learning-based approach. Next, the machine learning unit and the coordination unit were introduced. In the proposed architecture, an arbitrator dynamically selects either the learning unit or the coordination unit to make decisions for the robots. Then the behavior-based control layer and its sub-layers were presented. In this thesis, this layer is combined with the learning and coordination units in the top level of the architecture to control the actions of a robot. After that, the mechanism of multi-robot coordination was presented. This framework, which facilitates a group of robots in coordinating their activities, allows global sharing of information during the stage of object environment state identification. Finally, some distributed control issues of multi-robot systems were discussed. The advantages and disadvantages of both centralized and distributed architectures were pointed out, and some open questions in this field were identified.
Chapter 3
Machine Learning for Multi-Robot Decision Making
3.1 Overview
Multi-robot decision making is a challenging task in many real-world problems, which involve real-time operation of robotic systems in uncertain and dynamic environments. In these problems, the actions and interactions of multiple mobile robots have to be considered concurrently in order to reach a globally optimal policy. Among the available approaches, the planning-based approach is a contender for dealing with this problem. However, a planned optimal policy is usually feasible only in static environments, and it has been shown that the planning-based approach is inapplicable in fast changing environments. Current approaches for decision making in uncertain and dynamic multi-robot environments, such as planet surfaces and urban environments, have two main difficulties. First, many approaches do not take into account the uncertainty inherent in these problem domains; for example, in the multi-robot environment, terrain features can change continuously. Second, for systems with many robotic agents, state-of-the-art methods can grow to be too cumbersome and suboptimal. For example, even though the physical environment is stationary, from the perspective of a single robot the particular static environment is still a dynamic environment, because the other robots of the multi-robot system take actions, continuously making changes to the environment. In view of these considerations, the behavior-based approach and the machine learning approach are the primary approaches that are capable of overcoming the challenges of multi-robot systems operating in dynamic, unstructured, and unknown environments. As mentioned in the previous chapter, the behavior-based approach is more robust in a dynamic environment than the planning approach, because it can quickly adapt to a fast changing environment by identifying the current environment state and coupling it with a specific action.
However, the behavior-based approach is known to become unsuccessful in a complex environment with a large number of environment states, even though it is highly effective in simple environments. In particular, the behavior-based approach needs a human designer to design all the behavior rules, and the designer needs to foresee all possible world states that the robots will encounter. Therefore, it is almost impossible for this approach to be effective in a dynamic and unknown environment with a large number of states. Considering how humans accomplish physical tasks in a dynamic environment, it is not difficult to come to the conclusion that a human employs not only planning or reactive (behavior-based) techniques but also learning techniques. Humans identify new environment states through their learning capabilities, recall the corresponding optimal actions from past experience, and thereby improve their planning and reactive techniques. In other words, humans explore unexpected environment states with less uncertainty and make their decisions more robust and effective in a dynamic environment. Therefore, machine learning has become an important subject in the multi-robot field. Machine learning is considered a vital development due to the highly dynamic nature of typical multi-robot environments. Among the existing machine learning approaches, reinforcement learning (RL), especially Q learning, is the most commonly used one in multi-robot systems, due to its simplicity and good real-time performance. The single-agent Q learning algorithm is the most common Q learning algorithm. If the single-agent Q learning algorithm is directly extended to a multi-robot system, the assumption of a static environment is violated, and as a result the values of the Q-table may not converge. Although the robots can still find some good policies for cooperation, the performance of the whole team of robots can be degraded when the approach is extended in this manner (Wang and de Silva, 2007). Therefore, an improved version of the single-agent Q learning algorithm is proposed here for robot decision making, which can arrive at an optimal solution to the action selection conflicts of the robots in a multi-robot system. Hence, the proposed Q learning algorithm is suitable for real-time operation of cooperating multi-robot systems. In this chapter, machine learning is used to make decisions in a multi-robot object transportation task. Markov decision processes (MDP) and the single-agent Q learning algorithm are introduced in sections 3.2 and 3.3, respectively. The single-agent Q learning algorithm will converge to an optimal policy in an MDP problem without a transition model. The modified Q learning algorithm is presented in section 3.4, and it is validated through computer simulation. Section 3.5 presents the distributed decision making unit of Q learning-based multi-robot object transportation, which is validated through computer simulation.
3.2 Markov Decision Process (MDP)
The Markov decision process framework for decision making, planning, and control is surprisingly rich in capturing the essence of purposeful activities in various practical situations. MDP models and the associated problems arise in many areas, including medical decision making, maintenance planning, and mobile robot navigation. Therefore, MDP is a popular topic in artificial intelligence, and it enables an intelligent agent to make decisions in real-world problems. The basic concept of MDP is introduced now.
An MDP can be defined by the tuple $\langle S, A, T, R, \beta \rangle$, where
• $S = \{s_1, s_2, \ldots, s_n\}$ is a set of states of the environment;
• $A = \{a_1, a_2, \ldots, a_m\}$ is a set of actions of the robot;
• $T: S \times A \to \Pi(S)$ is a transition function, which decides the next environmental state $s'$ when the robot selects an action $a_i$ under the current state $s$; here $\Pi(S)$ is the probability distribution over the states $S$;
• $R: S \times A \to \mathbb{R}$ is a reward function, which determines the immediate reward when the robot takes an action $a_i$ under the current state $s$.
An MDP presents a sequence of decisions made by an agent or a robot in an environment with many states. At time $t$, the agent or robot must observe the environment and determine its current state $s \in S$, and then select an action $a \in A$ based on its policy $\pi$, which denotes the action to be taken by the robot or agent to reach the desired state. Further, the reward function determines the immediate reward when the agent or robot takes an action $a$ under the current state $s$. Two major attributes of an MDP make it interesting and challenging. First, the subsequent state $s'$ is not deterministic when the agent or robot takes an action $a$ under the current environmental state $s$; instead, there is a probability distribution $\Pi(S)$ over all the states. This probability distribution is defined by a transition function or transition model $T(s, a, s')$. Second, the transitions among the states of an MDP are Markovian; that is, the probability of reaching $s'$ from $s$ depends only on $s$ and not on the history of previous states. The performance of an agent or robot policy $\pi$ is measured by the rewards received when a sequence of decisions is made according to the policy, over the sequence of states visited. This measurement is usually represented by a sum of discounted rewards, as given by

$E\left[\sum_{t=0}^{\infty} \beta^t R(s_t)\right]$   (3.1)

where $0 \le \beta < 1$ is the discount factor and $R(s_t)$ is the reward received at time step $t$. The utility values of the states can then be computed iteratively, with the update $U(s) \leftarrow U'(s)$ repeated until the maximum change $\delta$ falls below a convergence threshold.
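A minimal sketch of the resulting utility iteration is given below, assuming a known transition model over a small discrete state space. This is a simplification for illustration; the thesis itself works without a transition model, via Q learning.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative value iteration for a small MDP with known model,
// using the discounted-reward criterion of Eq. (3.1). The dense
// T[s][a][s'] representation is an assumption for this sketch.
std::vector<double> valueIteration(
    const std::vector<std::vector<std::vector<double>>>& T,  // T[s][a][s']
    const std::vector<double>& R,                            // R[s]
    double beta, double tolerance) {
    const std::size_t nS = R.size();
    const std::size_t nA = T[0].size();
    std::vector<double> U(nS, 0.0), Unext(nS, 0.0);
    double delta;
    do {
        delta = 0.0;
        for (std::size_t s = 0; s < nS; ++s) {
            double best = -1e300;
            for (std::size_t a = 0; a < nA; ++a) {
                double expected = 0.0;
                for (std::size_t s2 = 0; s2 < nS; ++s2)
                    expected += T[s][a][s2] * U[s2];
                best = std::max(best, expected);
            }
            Unext[s] = R[s] + beta * best;                    // Bellman backup
            delta = std::max(delta, std::fabs(Unext[s] - U[s]));
        }
        U = Unext;                                            // U(s) <- U'(s)
    } while (delta > tolerance);
    return U;
}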