Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Machine learning-based multi-robot cooperative transportation of objects Siriwardana, Pallege Gamini Dilupa 2009

You don't seem to have a PDF reader installed, try download the pdf

Item Metadata


ubc_2009_fall_siriwardana_pallege.pdf [ 3.41MB ]
JSON: 1.0068136.json
JSON-LD: 1.0068136+ld.json
RDF/XML (Pretty): 1.0068136.xml
RDF/JSON: 1.0068136+rdf.json
Turtle: 1.0068136+rdf-turtle.txt
N-Triples: 1.0068136+rdf-ntriples.txt
Original Record: 1.0068136 +original-record.json
Full Text

Full Text

MACHINE LEARNING-BASED MULTI-ROBOTCOOPERATIVE TRANSPORTATION OF OBJECTSbyPallege Gamini Dilupa SiriwardanaB.Sc., University of Peradeniya, Sri Lanka, 2003B.Tech., The Open University of Sri Lanka, 2004A THESIS SUBMITTED TN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF APPLIED SCIENCEinThe Faculty of Graduate Studies(Mechanical Engineering)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)May 2009© Pallege Gamini Dilupa Siriwardana, 2009AbstractMulti robot cooperative transportation is an important research area in the multi robotdomain. In the process of object transportation, several autonomous robots navigatecooperatively in either a static or a dynamic environment to transport an object to a goallocation and orientation. The environment may consist of both fixed and removableobstacles and it will be subject to uncertainty and unforeseen changes within theenvironment. Furthermore, more than one robot may be required in a cooperative modefor handling heavy and large objects. These are some of the challenges addressed in thepresent thesis.This thesis develops a machine learning approach and investigates relevant researchissues for object transportation utilizing cooperative and autonomous multiple mobilerobots. It makes significant original contributions in distributed multi robot coordinationand self deterministic learning for robot decision making, and comes up with an optimalsolution to the action selection conflicts of the robots in the cooperative system. This willhelp to improve the real time performance and robustness of the system. Also, the thesisdevelops a new method for object and obstacle identification in complex environmentsusing a laser range finder, which is more realistic than the available methods. A newalgorithm for object pose estimation algorithm is developed, enabling a robot to identifythe objects and obstacles in a multi-robot environment by utilizing the laser range finderand color blob tracking.The thesis develops a fully distributed hierarchical multi-robot architecture for enhancedcoordination among robots in a dynamic and unknown environment. It strives to improvethe real time performance and robustness. The system consists with three layers. Bycombining two popular artificial intelligence (Al) techniques such as learning andbehavior based decision making, the developed architecture is expected to facilitateeffective autonomous operation of cooperative multi-robot systems in a dynamicallychanging, unstructured, and unknown environment.11Machine learning, is integrated into the developed multi-robot system for decisionmaking during the transportation under uncertainty and unforeseen changes within theenvironment. For this purpose the conventional algorithm of Q learning known as single-agentQlearning algorithm improved to form the “modifiedQlearning algorithm”, whichprovides efficient state identification and optimally resolves action selection conflicts inmulti robot learning.A C++ based simulation system and a physical multi-robot experimental system aredeveloped in the laboratory to transport an object of interest to a goal location in adynamic and unknown environment with a complex obstacle distribution. Theapproaches developed in this thesis are implemented in the prototype system andvalidated through computer simulation and experimentation. The results validate thedeveloped techniques.111TABLE OF CONTENTSAbstract.iiTable of Contents ivList of Tables viiList of Figures viiiNomenclature xAbbreviations xiiAcknowledgements xiiiChapter 1 Introduction 11.1 Project Description 21.2 Objective of the Research 41.3 Problem Definition 41.4 Related Work 71.4.1 Multi-Robot Coordination Behavior- Based Approach for Coordination Planning-Based Approach for Coordination Distributed Control Approach for Coordination Market-Based Approach for Coordination Learning-Based Approach for Coordination 111.4.2 Machine Learning in Multi-Robot Systems 121.4.3 Pose Estimation Technologies for Multi-Robot Systems141.5 Contributions and Organization of the Thesis 17Chapter 2 Multi-Robot Coordination for Object Transportation 192.1 Overview 192.2 General Hierarchical Architecture for Multi Robot Coordination 202.2.1 Machine Learning Unit262.2.2 Coordination Unit 272.2.3 Behavior-Based Distributed Control Behavior Representation 302.2.3.2 Behavior Composition and Cooperation 312.2.3.3 Communication Behaviors for Coordination 332.2.3.4 Group Behaviors for Coordination 34iv2.2.3.5 Behavior Coordination for Object Transportation Sub Task 342.3 Mechanism for Multi Robot Coordination 372.3.1 Dynamic Role Assignment Mechanism for Coordination Unit 372.3.2 Dynamic Role Assignment Model for Coordination Unit 402.4 Distributed Control Issues for Multi Robot Coordination 432.5 Summary 44Chapter 3 Machine Learning for Multi-robot Decision-making 463.1 Overview 463.2 Markov Decision Process (MDP) 483.3 Reinforcement Learning 513.4 ModifiedQLearning Algorithm 543.5 Multi Robot Transportation Using Modified Q Learning 583.5.1 Distributed Multi-Robot Q-leaning 603.5.2 Object Transportation Task 613.5.3 Multi-Robot Object Pushing with Modified Q-learning 623.5.3.1 The Environment States for Modified Q Learning 623.5.3.2 Robot Actions 643.5.3.3 Object Dynamics 653.5.3.4 Reward Function 673.5.4 Simulation Results 683.5.4.1 Stimulation Results for Object Pushing 693.5.4.2 Effectiveness of ModifiedQLearning in a Multi Robot Cooperative Task 713.6 Summary 76Chapter 4 Pose Estimation for Multi Robot Object Transportation 784.1 Overview 784.2 Multi-Robot Transportation System 814.3 Pose Estimation Technology 834.3.1 Global Localization of Robot 854.3.2 Color Blob Tracking 874.3.3 Object Pose Estimation 894.3.3.1 Relative Pose Estimation 904.3.3.2 Object Global Estimation 934.4. Experimental Result 944.4.1 The Test-Bed 95v4.4.2 Experiments with the Stationary Robot 964.4.3 Experiments with a Moving Robot 1014.5 Summary 105Chapter 5 Experimentation in Distributed Multi-Robot Cooperative Transportation 1075.1 Overview 1075.2 Test Bed 1085.3 A Physical Multi-Robot Transportation System 1125.4 Summary 118Chapter 6 Conclusions 1196.1 Primary Contributions 1196.2 Limitations and Suggested Future Research 1216.2.1 Improvement of the Model and Algorithm of MQLA 1216.2.2 Integrating Learning and Coordination 1226.3 Summary 122Bibliography 123Appendix 1: Program Functions 127Appendix 2: Program Codes 129viList of TablesTable 4.1: The actual and estimated object pose results in the first set of experiments with astationary robot 96Table 4.2: The actual and estimated object pose results for the second set of experiments witha stationary robot 99Table 4.3: The actual and estimated object pose results for the first set of experiments with amoving robot 101Table 4.4: The actual and estimated object pose results for the second set of experiments witha moving robot 103viiList of FiguresFigure 1.1 The overall project representation 3Figure 1.2 Schematic representation of developed system 5Figure 2.1 Hierarchical architecture for object transportation 23Figure 2.2 Composite representation of the proposed architecture 25Figure 2.3 Principle of machine learning 26Figure 2.4 The environment state representation for coordination 28Figure 2.5 The hierarchical behavior based control layer 29Figure 2.6 SR diagram of a simple behavior 30Figure 2.7 The SR diagram for a composite behavior with a competitive operator 32Figure 2.8 The SR diagram of a composite behavior with a cooperative operator 33Figure 2.9 Behavioral hierarchy for the coordinated transport sub-task 35Figure 2.10 Transitions of the group behaviors of coordinated object transportation 36Figure 2.11 Types of role assignment 38Figure 2.12 Schematic representation of a role composed by three roles 42Figure 3.1 The value iteration algorithm 51Figure 3.2 The single-agent Q-learning algorithm 52Figure 3.3 Modified Q learning algorithm 57Figure 3.4 The developed multi-robot system 59Figure 3.5 Description of a multi robot object pushing task 61Figure 3.6 Environment state representation using binary code 63Figure 3.7 Coding protocol of the environment states 64Figure 3.8 Pushing action points available to each robot 65Figure 3.9 Dynamics of the object with rectangular cross section 66Figure 3.10 Two object-pushing snapshots under different conditions 70Figure 3.11 The environmental state with decimal value “512” and six pushing actionsavailable to the robots 72Figure 3.12 The Q-values under state of “512” in 100,000 rounds of object pushing 73Figure 3.13 Probability density estimate ofQvalues under state of “512” in 100,000rounds of object pushing 74viiiFigure 3.14 Probability density estimate of a action selection probability underthe state of “512” in 100,000 rounds of object pushing 75Figure 4.1 The developed physical multi-robot system 82Figure 4.2 General scheme of pose estimation for cooperative object transportation 85Figure 4.3 Global pose estimation of wheeled robot 87Figure 4.4 Flow diagram of color blob tracking 88Figure 4.5 Division of camera frame into four quadrants 89Figure 4.6 Schematic drawing of laser range sensor 90Figure 4.7 Relative object pose estimation 90Figure 4.8 MobileSim simulator visualized results from laser range finder 92Figure 4.9 Arbitrary layout of objects 94Figure 4.10 The experimental system in IAL at UBC 95Figure 4.11 The x-axis error from Table 4.1 97Figure 4.12 The y-axis error from Table 4.1 97Figure 4.13 The orientation error from Table 4.198Figure 4.14 The x-axis error from Table 4.2 99Figure 4.15 The y-axis error from Table 4.2 100Figure 4.16 The orientation error from Table 4.2 100Figure 4.17 The x-axis error from Table 4.3102Figure 4.18 The y-axis error from Table 4.3102Figure 4.19 The orientation error from Table 4.3 103Figure 4.20 The x-axis error from Table 4.4104Figure 4.21 The y-axis error from Table 4.4104Figure 4.22 The orientation error from Table 4.4 105Figure 5.1 The multi-robot object transportation system108Figure 5.2 The color blobs used to identify the orientation of the object109Figure 5.3PioneerTM3-DX mobile robot 110Figure 5.4PioneerTM3-AT mobile robot 111Figure 5.5 Multi-robot cooperative transportation116ixNomenclatureNotationsS = {s, s2, ..., s} Setof states in the environmentA = {a1,a2,...,a} Setof actions available to a robotT(s, a, s’)Environment model, which decides the next environmental states’when robot selects action a, under current states.fl(s) Probability distribution over the statesT: S x A —> ,z(s) Transition function or transition modelt timeR : S x A —÷ fl Reward function. This gives the immediate reward when robottakes action a. under current state sPolicy with which a robot selects its actionsR(S)Received RewardDiscount factor, which varies from “0” to “1”Optimal policy, which yields the highest expected sum of thediscounted rewardsU(s) Utility of a state2TUtility of the subsequent stateMaximum error allowed in the utility of any state6 Maximum change in the utility of any state in an iterationU, U’ Vectors of the utilities for the states in sQ(s,,a,) An entry in the Q-table77Learning raterDirect rewardV“Temperature” parameter (see Chapter 3)Empty setA set including all actions selected by the robotsx8 Orientation angleFNet force applied by the robotsFNet torque applied by the robotsCf,C1,cCoefficientsAt Short period of timeDistance between the goal location and the object center beforetaking the pushing actionDnewDistance between the goal location and the object center aftertaking the pushing actionTSampling timeA1Action set of the i1’ robotI Set including the currently available actionsxiAbbreviationsMDP Markov Decision ProcessRL Reinforcement LearningMQLA ModifiedQLearning AlgorithmAl Artificial IntelligenceLAN Local Area NetworkTCP/IP Transmission Control Protocol/ Internet ProtocolMRS Multi-Robot SystemCCD Charge Couple DeviceGPS Global Positioning SystemFSA Finite State DiagramVLSI Very Large Scale IntegrationACTS Advanced Color Tracking SystemxliAcknowledgementsA number of people contributed to this effort intellectually, personally and financially.First and foremost, I wish to thank Prof. Clarence W. de Silva, Director of the IndustrialAutomation Laboratory, in view of number of reasons. He was my research supervisor,and his dedication and commitment towards the student community is appreciated by allof us who were his students. Had it not been for Professor de Silva’s considerablepersuasive skills, boundless patience, great-heartedness, keen guidance and unbelievablehuman qualities; this particular research perhaps would never have come to fruition.I take this opportunity to thank Dr. Ying Wang, The post-doctoral research scientist in theIndustrial Automation Laboratory, who is a scholar of multi-robotic systems. Whenever Ihad a problem in multi robotic systems, he assisted me along the correct path and I amprivileged and honored to have worked with him.I must thank Mr. Howard Hu, a Research Assistant in the Industrial Automationlaboratory, for his constant support and constructive remarks, especially in debugging thecomputer programs.I wish to express my sincere gratitude to Prof. Mu Chiao and Dr. Farbod Khoshnoud ofSOFTEK, who served as members of my research committee.Furthermore, I wish to thank The Open University of Sri Lanka and the Asiandevelopment Bank funded Distance Education Modernization Project for the financialassistance in part. Major financial assistance, for my Research Assistantship, and forequipment and other resources for my research, was provided by research grants held byProf. de Silva, notably: the National Sciences and Engineering Research Council(NSERC) of Canada Special Research Opportunity (SRO) and Discovery grants; Tier 1Canada Research Chair (CRC); Canada Foundation for Innovation (CFI); and BritishColumbia Knowledge Development Fund (BCKDF). I also acknowledge the supportextended by my Sri Lankan friends, Mr. Anjana Punchihewa, Mr. Eranda Harinath andMr. Lalinda Weerasekara for their encouragements and support in different ways.xliiI take this opportunity to thank the members of the Industrial Automation Laboratory;especially Prof. Lalith Gamage, Mr. Tahir Khan, Mr. Mohammed Airasheed, Ms.Madalina Wierzbicki, Mr. Roland Haoxiang Lang, Mr. Bhenam Razavi, Mr. SrinivasRaman, Mr. Guan-Lu Zhang, Mr. Arunasiri Liyanage, Mr. Ramon Campos and Prof.Qingshan Zeng for their assistance, guidance and encouragements throughout theresearch project. Last but not least, I wish to thank my family for their constant support,especially my wife Lelka Samanmalee, my daughter Viduni Siriwardana, my mother, myfather and my sister. Without their unwavering support, tolerance of my idiosyncrasiesand abiding confidence, this research never would have been possible.xivChapter 1IntroductionThe field of Multi-Robot Systems emerged as a major area of robotics research in the late1980’s. Since then, the robotics community has investigated a number of important issues ofcooperative multiple mobile robots. Prior to that most research was concentrated on either singlerobots or distributed problem-solving systems that did not involve multi-robot cooperation.Multi-robot systems (MRSs) are becoming increasingly significant in industrial, commercial andscientific applications. The number of robots used in these applications is also increasing.An MRS can be defined as a set of autonomous robots interacting in the same work environmentin fairly complex ways to carry out a common task. It should be able to operate in dynamicenvironments, where uncertainty and unforeseen changes can arise due to the presence ofmoving robots, other agents and both stationary and moving obstacles. Such a system mayconsist of complex mobile robotic platforms equipped with sophisticated sensors andactuators,which may be required to execute complex tasks.The advantages of a multi-robot system (MRS) over a single-robot system are indicated now.Multiple robots have the potential to accomplish a given task more efficiently because thecapabilities more diverse, flexible, and distributed. Multiple robots can localize themselves fasterand more accurately by exchanging information among themselves about their positions andorientations (pose). An MRS possesses a greater fault tolerance than a powerful and expensivesingle robot, particular because if one robot in the MRS fails or malfunctions there will be otherrobots to assume its tasks. In summary, an MRS will have a greater range of task capabilities,greater efficiency, increased system performance, fault tolerance and robustness, lower economiccost, ease of development, distributed sensing and action, inherent parallelism in comparison to asingle robot of comparable capability.In the MRS domain, the robotics community began to investigate major challenges such as robotsynchronization, determination of a robot coordination strategy and identification of complexwork environments. Also, they have investigated the potential for using MRS research to provideinsight into social and life sciences involving groups of biological entities.1Multi-robot object transportation is an important research subject in the multi-robot domain. Inan object transportation process, several autonomous robots navigate cooperatively in either astatic or a dynamic environment to transport an object to a goal location and orientation. Theenvironment may consist of fixed and movable obstacles and may be subject to uncertainty andunforeseen changes within the work environment. A single robot may not be able to handleheavy and large objects alone. Underlying challenges can be overcome by multiple mobilerobots. A primary concern associated with multi-robot application is the method of enablingindividual robots to autonomously adapt their behaviors to meet the dynamic changes in theirtask environment. For this, each robot needs to communication with its peers, exchange sensorinformation, formulate the coordination strategy and obstacle avoidance scheme, plan the robottrajectory, and assign and receive subtasks and roles for carrying out the required task quicklyand successfully. Due to main challenges inherent in various research fields, multi-robot objecttransportation is a good benchmark application to assess the effectiveness of multi-robot controland coordination strategies. Therefore control and coordination methods for multi-robot systemshave been investigated by various researchers and multi robot object transportation problem hasbecome a theme of research and development.1.1 Project DescriptionA main objective of the overall project of “Multi-Robot Autonomous Assembly System” in ourlaboratory (Industrial Automation Laboratory) is to develop a physical multi-robot systemwithobject manipulation, transportation and assembly capabilities. First, agroup of intelligentautonomous robots cooperatively search and identify any useful parts within the workenvironment. Then the identified parts are individually or cooperatively transported to adesignated place and assembled into a useful device. Finally, the built device is transportedindividually or cooperatively to a known location in an unknown and dynamic environment.During this process, the obstacles may be static or even appear randomly during theoperation.The overall project representation is shown in Figure 1.1.2/“7cle/IMovable/- obstacle__________F0N/777k N Movable//Movableo1obstacle®/obstacleMovable —//.A..‘7-Fixedobstacle Goal Location//obstacle‘—‘iFigure 1.1: The overall project representation.The overall project consists of three main stages:1. Multi-Robot Foraging: Multiple mobile robots autonomouslysearch the environment foruseful objects/parts. The identified useful parts are transported by individualrobots orcooperating multiple robots to a designated place.2. Multi-Robot Assembly: Robots autonomously determine the assembly sequenceforconstructing a useful device, properly handle and manipulate the parts, andassemble thedevice.3. Multi-Robot Transportation: Tow or more robotscooperatively transport the built deviceto the goal location.A major challenge of the overall project is the transportation ofan object of arbitrary size andshape - a key issue in the multi-robot domain. In the stageof transporting components forassembly at a designated location the robots may have to workin a complex dynamicenvironment. The stages of multi-robot forging and transportation need touse cooperative objecttransportation technologies to accomplish the overall task. This thesis mainly concernsobjecttransportation technologies and the development of a physical multi-robotsystem to transport anobject of arbitrary size and shape.31.2 Objective of the ResearchThe physical multi-robot object transportation system, as developed through the researchreported in the present thesis, has the ability to navigate an object to a predetermined locationthough a dynamic and unknown environment with complicated obstacle distribution. Theprimary research objectives of the thesis are listed below.o Develop a fully distributed physical multi-robot system for operation in a dynamic andunknown environment with complicated obstacle distribution.o Improve an available single-agent machine learning algorithm so as to enable a multi-robot system to deal with the uncertainty and unforeseen changes in the environmentincluding changes of the robots.o Develop a technique to identify obstacles within a dynamic and unknown environment.o Develop a technique for pose estimation in a task of object transportation, to morerealistically estimate the pose of the object of interest.o Evaluate the performance attributes such as coordination, self-learning andadaptation,speed, and accuracy of the developed multi-robot system and its underlyingmethodologies.1.3 Problem DefinitionDevelopment of a physical multi-robot system is an important objective of the present work.This system consists of a group of intelligent autonomous robots, which have the ability to workcooperatively for transporting an object to a goal location and orientation. Thework environmentis considered unknown and dynamic, and obstacles may be present or even appear randomlyduring the transportation process. Multi-robot coordination and machine learning are integratedinto the developed physical system to cope with the challenges of the problem. A schematicrepresentation of the developed system is shown in below Figure 1.2.4Figure 1.2: Schematic representation of the developed system.Three or more autonomous mobile robots are present in the system, which enable thecooperativetransportation of an object to a goal location. During the course of the transportation, theyhaveto interact with each other to establish the strategy of coordination and cooperation.The feasibleand suitable locations and points of action have to be established for each robot assistingin thecooperative transportation. The object has to be transported quickly and effectivelywhileavoiding obstacles that may be present during the transportation process.Furthermore, thesystem will have to avoid damage to the transported object. In thedeveloped system, chargecoupled device (CCD) cameras, laser range finders and sonars are used to monitor andmeasurethe current location and orientation of an object. The dynamic and unpredictableenvironmentcontains both fixed objects and also movable obstacles, which may appear randomly. The robotsand sensory system are separately linked to their host computers, which are connectedthough alocal network to implement machine intelligence and system control.Wired/wirelessRouterTCP/IP1IServer5It is important to consider three major challenges in the development of the multi-robot system.The first one is the dynamic and unpredictable environment. In particular, dynamic obstaclesmay be scattered arbitrarily within the working environment. The obstacles may appear anddisappear randomly during the transportation process. In addition, there will be multiple robotsworking parallel. While one robot makes decisions, some other robots may have changed theenvironment according to their behaviors. This also makes the environment dynamic from theview of a single robot. As a result, the process of decision making becomes rather complex forthe robots within the environment. Challenge is to make decisions in real time and learn toidentify the conditions of the surrounding environment from the local viewpoint of the robots.The second challenge results from the localized sensing capabilities of the robots. In the presentproject it is assumed that each robot is capable of detecting an object or obstacle within a limiteddetection radius. The robots possess local sensing capabilities only, and consequently the globalenvironment is generally unknown to them. A robot will not have a good knowledge about thelocation or the shape of an object until it moves sufficiently close to the object. The unknownenvironment is another major challenge in the multi-robot domain, because the robots do nothave a complete understanding about the environment in advance, partly due to the unfamiliarityand partly due to the dynamic nature of the environment. Thus the system will not be able toutilize the traditional planning methodologies for the decision making process.The third challenge is the lack of reliable communication among robots, where the disturbancesare represented by white noise. Due to unreliable communication, the robots may end upselecting a wrong coordination strategy. Improvement of the robustness of multi-robot decision-making in an environment with unreliable communication is also a challenging issue in themulti-robot domain.Normally, the environment of a multi-robot system is dynamic and unknown. In such a situation,the communication of robots is generally inconsistent. In this thesis, several approaches areintroduced to meet these challenges in multi-robot systems in enabling them to work effectivelyin an unstructured environment.61.4 Related WorkA significant amount of work has been performed by the robotics community in the past 10 yearsin multi-robot coordination and learning for decision making, particularly to support effectiveand robust operation of multi-robot systems in both simple and dynamic environments. Thissection presents some relevant work in this context.Cao et at. (2006) presented a multi-robot hunting task in an unknown environment. Theyproposed a distributed control approach involving local interactions with respect to localcoordinate systems. Proposed approach can cope with the cumulative errors of wheels of therobots and also unreliable communication networks. Computer simulation validated theirapproach.Kumar and Garg (2004) studied a multi-robot cooperative manipulation with two industrialrobotic arms. They introduced a fuzzy logic-based approach to coordinate the motions of the tworobotic manipulators so that the internal forces among them became minimum. In addition, theyused Kalman filtering to estimate the external force on the end effecter, based on the informationfrom a force/torque sensor mounted on the robot wrist.Stone et al. (2006) examined the issue of uncertainty in multi-robot systems. They identifiedseveral sources of uncertainty and proposed a method for reducing the uncertainty and makingdecisions in the face of uncertainty. Their method enables effective planning under uncertaintyfor robots engaged in goal orientated behavior within a dynamic and complex environment. Thescaling issues of multi-robot systems are presented by Gustafson et al. (2006). They usedseveralabstract models to elucidate that it was counterproductive to increase the number of robots or thesensor strength in a multi-robot system.Mataric et al. (2006) presented a mathematical model for dynamic task allocation in multi-robotsystems, based on stochastic processes. They made an assumption for a large-scale multi-robotsystem with local sensing, behavior based controller, and distributed task allocation capabilities.Through storing the history of the environment examined in the internal state of a robot, theytried to cope with the issues of local sensing and indirect communication in multi-robot systems.Their model was demonstrated for a multi-robot foraging task.7In another paper, the same authors (Gerkey and Mataric, 2004) developed a taxonomy forMRTA (multi robot task allocation). They proposed three axes for use in describing MRTAproblems: (1) Single task robots (ST) versus multi task robots (MT). (2) Single robot tasks (SR)versus multi robot tasks (MR). (3) Instantaneous assignment (IA) versus time extendedassignment (TA). Based on introduced method, they compared six typical multi-robotarchitectures with their MRTA approaches.Many valuable surveys have been conducted on multi-robot systems. Farinelli et al. (2004)completed a survey on multi-robot systems and proposed a new taxonomy for multi-robotcoordination. Multi-robot tasks were classified as unaware systems, aware but not coordinatedsystems, weakly coordinated systems, strongly coordinated and strongly centralized systems,strongly coordinated and weakly centralized systems, and strongly coordinated and distributedsystems.In a survey carried out on existing multi-robot architectures, Parker (2000) pointed out severalchallenges in typical multi-robot tasks. Specially, she categorized the available architectures intothree multi robot architectures as: general architecture, specialized architecture and hybridarchitecture. In addition, she classified three typical multi-robot tasks and their main challenges.For multi-robot object transportation, a major challenge is uneven terrain; for multi-robotlearning, a major challenge is how to assign credits among robots; and for multi-robot motioncoordination, the key challenge is computational complexity.There are a number of surveys that review Multi-Robot Learning (MRL) or Multi-AgentLearning (MAL). For example, the survey presented by Yang and Gu (2004) identified thatscaling an individual reinforcement learning algorithm to multi-robot systems would violate itsMarkov decision process assumption, which is the weakness of many available multi-robotreinforcement learning approaches. They also classified four frameworks of multi-agentreinforcement learning: the stochastic games (SGs)-based framework, the fictitious playframework, the Bayesian framework, and the policy iteration framework. Furthermore, theycategorized multi-agent reinforcement algorithms for possible application to multi-robot systems.They argued that there were four major challenges in applying reinforcement learning to multirobot systems: the Markov decision assumption, continuous spaces, uncertainties, and8incomplete information. Three future directions were pointed out as well: the behavior basedapproach with reinforcement learning, fuzzy approximation functions, and continuousreinforcement learning.Panait and Luke (2005) presented a survey on multi-agent learning for cooperation and theydiscussed the two main challenges in cooperative multi-agent learning; namely, the largelearning space and the dynamic environment. Furthermore, they classified two types of learning:concurrent learning and team learning. The major issues in the area of concurrent learning werediscussed as: credit assignment, learning dynamics, teammate modeling and relationship betweenlearning and communication.1.4.1 Multi robot CoordinationTo improve a mechanism for coordinating multiple robots in the execution of cooperative taskshas been a challenging objective in multi-robot systems. Multi-robot coordination hasbeencarried out according to a developed multi-robot architecture. Multi-robot architectures seek toimprove the coordination mechanism for the object transportation process. Primarily five typesof multi robot coordination approaches are briefly discussed now; namely, the behavior-basedapproach, the planning-based approach, the market-based approach, the distributed control-basedapproach, and the learning-based approach. Behavior-based approach for coordinationParker (2000) introduced a distributed and behavior-based multi-robot architecture called LALLIANCE. It employs the concepts “Impatient” and “Acquiesce” to dynamically stimulatebehaviors for coordination among robots. Moreover, by evaluating the performance of theteammates and dynamic changes in the environment, L-ALLTANCE autonomously updatesitsparameters to adjust to those changes. Introduced architecture was validated in abox pushingtask with heterogeneous mobile robots.The behavior based multi-robot coordination, CAMPOUT, has been introduced in the RobotConstruction Crew (RCC) project developed in the Jet Propulsion Laboratory (JPL) of NASA.CAMPOUT is a fully distributed, behavior based multi-robot architecture and it uses leaderfollower distributed coordination strategy. CAMOUT was validated using a multi-robot9transportation task in an autonomous deployment project on a planetary surface. In their latestwork they introduced a multi-robot autonomous construction task based on the CAMPOUTarchitecture.Mataric et a!. (2002) developed a behavior based approach for a distributed multi-robot system.They developed controllers which were robust to noise and robot failures. Three versions of thetask were evaluated in a spatio-temporal context using interference, time to completion, anddistance traveled as the main investigative parameters.Konidaris and Hayes (2005) proposed to integrate the behavior-based approach withreinforcement learning for multi-robot coordination. Furthermore, they suggested to usetopological maps to reduce the learning space in reinforcement learning.Camacho et a!. (2006) introduced behavior based coordination for robot soccer agents. In theirwork, they allocated each robot a duty for execution of the particular defined task. A rule-basedRECCA control algorithm was also proposed. Planning-based approach for coordinationMiyata et a!. (2002) presented coordination of multiple mobile robots for cooperativetransportation in unknown static environments. They introduced a centralized approach for taskplanning and assignment. Priority-based assignment algorithms were used for multi-robotcoordination and motion planning. Distributed control approach for coordinationWang and de Silva (2005), Zaerpoora et al. (2005), Pereira et al. (2004), and Chaimowicz et al.(2004) presented the caging formation control problem in multi-robot coordination forcooperative transportation. They proposed an “Object Closure” strategy to coordinately move theobject with multiple mobile robots while maintaining the formation. The major challenge was toestablish a bounded movable area for the object during its transportation. In this case, the contactbetween the object and the robots did not need to be maintained by the controller of each robot.They termed it “Object Closure”, which was used as a type of distributed formation controlstrategy for multi-robot cooperative carrying tasks. Basically, the “Object Closure” approach is a10sub-category of behavior based multi-robot coordination. However, it places its emphasis on thedistributed control features.Marshall et al. (2006) introduced a distributed control strategy called “cyclic pursuit” for multirobot coordination. The particular approach was found to be robust in the presence of Un-modeled dynamics and delays due to sensing and information processing. Market-based approach for coordinationMulti-robot task allocation is a vital and complex area of multi-robot coordination and theconventional approaches have seen low success in the presence of a large number of robots in thesystem. However, market-based approaches seem better for solving the complex multi-robot taskallocation problem, and they are gaining popularity. For example, Mataric et a!. (2002) presentedan auction-based approach for multi-robot coordination. They utilized the publisblsubscribe firstprice auction method and validated it in a multi-robot box pushing task. During thetransportation process a “watcher” robot monitors the transportation task, and “publishes” therequired auctions for use by the “pusher” robots in the same environment. The “pusher” robotsdetermine the available task at a given instance though “subscribing” to the information from the“watcher” robot. By matching its own abilities to the required tasks, each “pusher” robot bids fora task. When the “watcher” robot receives all bids from the “pusher” robots, it selects the mostsuitable “pusher” for the particular tasks and then the “watcher” robot broadcasts the decisions tothe “pusher” robots in the system. Particular auction-based approach was validated for a multi-robot box pushing task in a static environment without obstacles. Learning-based approach for coordinationTypically multi-robot systems work in a dynamic and unknown environment where a traditionalbehavior-based approach can easily fail because the designer cannot foresee all possible worldstates the robot would encounter and design the corresponding behavior. A behavior-basedmulti-robot system will work well in a known and structured environment. Learning capabilitiesare important for multi-robot systems to work properly in this type of environments. Mostexisting work in multi-robot learning utilizes reinforcement learning due to its simplicity andgood real-time performance. For example, Ito and Gofuku (2004) proposed a two-layer multirobot architecture for multi-robot coordination in cooperative object transportation. They11introduced a centralized machine learning method in the top level of this architecture for decisionmaking. A distributed approach of rule-based control was developed for the lower level ofmotion control of the robot, to reach a specified position and take a specific action according tothe commands from the upper level. Their approach reduced the learning space by integratingreinforcement learning with genetic algorithms.1.4.2 Machine Learning in Multi-Robot SystemsLearning capabilities are desirable for multi-robot systems when the working environment isunknown, unstructured and dynamic. Technologies of machine learning have been employedcommonly for this purpose. Among the machine learning approaches, reinforcement learningand particularly Q-Learning has been quite popular due to its simplicity and good real-timeperformance.A well-known work in this area, Parker et al. (2002) studied two significant aspects in multi-robot learning called learning new behaviors autonomously and learning parameter adjustments.They proposed two approaches for autonomously learning new behaviors in multi-robot systems.The first approach is called distributed pessimistic lazy Q-learning and it combines lazy learningwith Q-learning. It also uses a pessimistic algorithm for the credit assignment problem. Thesecond approach, called Q-Learning with VQQL (vector quantization with Q-learning) and thegeneralized Lloyd algorithm, addresses the generalization issue in reinforcement learning. Bothapproaches have been validated in a COMMMT (Cooperative Multi robot Observation ofMultiple Moving Targets) project. Let us consider learning for parameter adjustment. Theirprevious paper (as mentioned under behavior-based approach) introduced the behavior-basedarchitecture called L-ALLIANCE, which enables robots to autonomously update their controlparameters so that they can adapt their behavior over time in response to changes in thecapabilities of the robots, team composition, and the environment.Kapetanakis and Kudenko (2002) established two of multi-agent learning techniques calledmultiple single-agent learning and social multi-agent learning. In the multiple single-agentlearning technique, each learning agent observes other agents as a part of the environment and anindividual agent does not have any explicit knowledge of about other agents. However, in socialmulti-agent learning, agents can have a high level of knowledge of other agents and integrate this12knowledge in the learning process, potentially using local modeling techniques, communicationand coordination to support the learning task.Martinson and Arkin (2003) presented a multi-robot foraging task withQlearning. They usedQlearning to select roles of the robots dynamically and it was integrated with the behavior-basedapproach. The learning space was reduced by the behavior representation. They did simulation toverify their approach.Dahi et al. (2004) presented aQlearning-based multi-robot cooperative transportation task in asimple environment. They proposed a two-layer multi-robot architecture. The Q learning-basedbehavior-selection subsystem was located in the upper level and the behavior-based reactivesubsystem was located in the lower level. Further, they introduced two communication strategiesto speed up the convergence of Q learning.Kovac et al., (2004) addressed a “pusher-watcher” multi-robot box pushing problem withreinforcement learning. According to their approach one robot acted as a “watcher” whichobserved the current world state and broadcasted it to other robots. The other robots in theproject acted as “pushers” who selected their respective pushing actions usingQlearning. Amajor weakness of this approach is the need for a robust communication network.Taylor and Stone (2005) studied a methodology to transfer learned knowledge from one task toanother with a different state-action space. This is an important issue in reinforcement learning.This transfer is beneficial for the reduction of the training time in reinforcement learning.Particular methodology used the specific domain knowledge to construct a knowledge mappingfunction between two considered tasks so that the knowledge transfer happened effectively.Tangamchit et al., (2003) addressed several crucial factors affecting decentralized multi-robotlearning in an object manipulation task. They recognized four factors that directly affect thedistributed multi robot learning: the reward type, the learning algorithm, diversity, and thenumber of robots. They concluded that the reward type and the learning algorithm considerablyaffected the final results, and the scalability of the system would also affect the learning speedbut had only a slight effect on the final outcomes.13It is clear that reinforcement learning is rather popular in multi-robot coordination. Theenvironment in a multi-robot task is typically dynamic because the robots themselves alsochange in their shared environment. Therefore, the single agent reinforcement learning algorithmwhen extended to a multi-robot system violates its assumption of a stationary environment.A number of multi-agent reinforcement algorithms have been introduced in the multi-agent area,namely, mini-maxQlearning algorithm, Nash Q learning algorithm, friend or foe Q learningalgorithms, team Q learning algorithm, and so on. The teamQlearning algorithm is a simplifiedversion of the NashQlearning algorithm. All these algorithms assume a dynamic environmentand allow each robot to observe the actions of its peers before it makes its own decisions.The large learning space is the main hurdle in multi-agent reinforcement learning algorithms.Each robot needs to monitor its own actions and its peer actions as well. Therefore, when thescalability of the system increases, the resulting learning space will be excessive. However, whenthe number of robots increases, the single agentQlearning algorithm still can providesreasonable performance due to its fixed learning space.Multi-robot community has tried to employ numerous approaches such as neuro-fuzzy, geneticalgorithms (GAs), and so on to reduce the large learning space in multi-robot coordination.However, research in this area is still in its infancy, and there are many open questions yet to beexplored.1.4.3 Pose Estimation Technologies for Multi-robot SystemsEstimation of pose (i.e., position and orientation) is essential for object transportation in a multi-robot system. Technologies of pose estimation are utilized to estimate the real-time pose ofobjects, including obstacles and the other robots, in the work environment. It will help identifythe world state and make appropriate decisions accordingly. Researchers have utilized varioussensory methods such as digital CCD (charge coupled device) cameras, global positioningsystems (GPS), laser range finders and sonar devices to identify the pose of an object in a roboticwork environment. Also, sensing algorithms have been developed to recognize, identify objectsand estimate features or poses of the objects in the environment.14Lang et al. (2008) presented a multiple sensor-based method for robot localization and box poseestimation. A CCD camera mounted on the robot was used to find the box in its environment. Alaser range finder mounted on the robot was activated to measure the distance between the lasersource and the object. Finally homogenous matrix transformation was applied to represent theglobal pose of the robot and the box. The approach was validated using computer simulation.Park et al. (2006) introduced a method for global localization of an indoor environment, whichemployed object recognition using a stereo camera. Their method of global localization firstmade a coarse estimation of the pose and then refined it. The coarse pose was estimated bymeans of object recognition and least squares fitting through singular value decomposition, whilethe refined pose was estimated by using a particle filtering algorithm with onmi-directionalmetric information. Even though a vision system has been used for pose estimation, theapproach has a number of drawbacks. In particular, the presented schemes are computationallyexpensive, time consuming, have time delays, and are susceptible to changes in the ambientlighting conditions.Ekvall et a!. (2005) studied a computer vision approach for object recognition and poseestimation based on color co-occurrence histogram and a geometric model. Even though theapproach was somewhat robust, it failed under large changes in lighting conditions.Kato et al. (2003) presented a method for identifying a robot and obtaining its relative position ina multi-robot system using an omni-directional vision sensor. Their approach was validatedusing the simulation and experimentation. Similar work has been done by Hajjdiab et a!. (2004).A vision based approach for multi-robot simultaneous localization and mapping (SLAM) waspresented. They presented a method to calculate the locations of the robots using a collection ofsparse views of the planar surfaces on which the robots were moving. The camera movementswere estimated using inter-image homographs computed by matching the overhead transformedviews.Tracking of multiple moving objects using vision sensors is an important issue in road trafficcontrol systems. Chen et al. (2005) introduced a framework for spatiotemporal vehicle trackingbased on unsupervised learning segmentation and object tracking. Adaptive background learning15and subtraction method was applied to two real life traffic video sequences in order to obtainaccurate spatiotemporal information on vehicle objects.Laser range finders and CCD cameras are commonly-used sensors in mobile robots. Most laserrange finders have the ability to scan the surrounding environment and accurately detect objectswithin a radius of 50 meters, and return a distance measurement for each degree in a 180 degreerange. On the other hand, CCD cameras can be used to detect the shape, color and surfacefeatures of the objects of interest and return the corresponding vision information. Researchershave combined these two sensors for object recognition and localization.Murarka et al. (2006) proposed a hybrid method incorporating laser range finders and vision forbuilding local 2D maps to help an intelligent robotic wheelchair distinguish between safe andhazardous regions in its immediate environment. Here, laser range finders were used forlocalization and mapping of the obstacles in the 2D laser plane while vision was used fordetection of the hazards and other obstacles in the 3D space. The information on the obstacleswas used to construct a local 2D safety map.Tomono (2005) used a mobile robot equipped with a laser range finder and a monocular camerato build a 3-D environment model. In this framework, they first built a 2-D map by scanningobjects in the environment with a laser sensor, and then verified the detected candidates by usingvision-based object recognition.Takahashgi and Ghosh (2005) proposed a method to identify the parameters of a moving objectby integrating a laser range finder and a vision sensor. The approach has been reduced thedimension of the parameter ambiguity.Gopalaicrishnan et al. (2005) developed a mobile robot navigation project which integrated alaser range finder and a vision sensor. The laser range finder was used for localization and thevision sensor was used object detection. Also a route based navigation technique was used formobile robot navigation.161.5 Contributions and Organization of the ThesisThis thesis performs research and develops advance techniques which will help multi-robotsystems operate in a robust and effective manner in a dynamic and unknown environment. Themain contributions in the thesis are as follows:o A multi-robot coordination mechanism and a fully distributed three-layer multi-robothierarchical architecture are developed to carry out cooperative tasks, whichintegrates machine learning and behavior-based approach of robot coordination. Thiscoordination technique enables efficient state identification and coordination amongrobots. The developed system architecture has the ability to dynamically allocate atask in an unknown environment.o An improved reinforcement learning algorithm termed the “modified Q learningalgorithm” is developed, which is able to provide efficient state identification andoptimize the action selection conflicts in multi-robot learning. It also helps deal withsome key issues in multi-robot learning; for example, task assignment, reduction ofthe learning space, and circumventing the Markov assumption.o A method is developed to estimate the pose of an object, which can be used to trackpotential obstacles and the other robots in the work environment simultaneouslyduring object transportation. This approach is more realistic than other availableapproaches. It is validated using two types of physical experimentation.o A physical multi-robot transportation system, which integrates the developedtechniques, is developed in our laboratory (Industrial Automation Laboratory). Thismulti-robot system works in a dynamic and unknown real-world environment andshows effectiveness, flexibility and overall good performance.The organization of the thesis is as follows. Chapter 1 has discussed multi-robot systems and theresearch objectives of the thesis. It defined the object transportation problem as studied in thethesis, and discussed related past work. Chapter 2 addresses the multi-robot coordinationmechanism and the three-layer multi-robot architecture. Two techniques of multi-robotcoordination included in multi robot architecture - the behavior-based approach and the machine17learning approach - are discussed and their relationship is presented. Several pertinent issues ofdistributed controls for multi-robot coordination are discussed. Chapter 3 proposes a modifiedQlearning algorithm in the multi-robot domain to facilitate robotic decision making in a dynamicand unknown environment. This algorithm is assessed using a multi-robot transportation task insimulation. The results are analyzed, and the advantages and disadvantages of the algorithm areindicated. In Chapter 4, an innovative method for detennining the pose of an object is developed,which will facilitate the robots to identify useful objects in a work environment. The technologyis validated using a task of multi-robot object transportation, in simulation. Chapter 5 presents aphysical multi-robot cooperative transportation system operating in a dynamic and unknownenvironment. Microsoft Visual Studio 2005 — distributed computing model is introduced toallow robots to effectively communicate and cooperate. Experimental results are presented tostudy the performance of the developed system, particularly concerningadaptively androbustness. Chapter 6 summarizes the primary contributions of the thesis and indicates somerelevant issues for future research.18Chapter 2Multi-Robot Coordination for Object Transportation2.1 OverviewMulti-robot coordination can be defined as the ability to manage the interdependencies ofactivities between mobile robots. Coordination among a group of robots should in principleimprove the overall performance of the team of robots, as robots may share their worldviewsand may negotiate to identif’ the present world state and cooperatively select the actions.However, in practice, effectively handling in real-time, multi-robot information and coordinationis a challenging task.Many open questions remain in the area of multi-robot coordination. How should a group ofrobots divide tasks among its members? Once roles have been assigned to the robots,how shouldthey position themselves to fulfill their roles without interferingwith their team-mates? Whathappens if a robot fails or if the environment changes so that a different robot is more suitable forthe task?The coordination of multi-robot teams in dynamic environments is one of the fundamentalproblems in multi-robot transportation. The underlying question is, given a group of robots and atask to be performed in a dynamic environment, how the robots should be coordinated in order tosuccessfully complete the task? Coordination normally implies synchronization of robotactionsand exchanging of information among the robots. The level of synchronizationandcommunication depends heavily on the task requirements, characteristics of the robots, andtheenvironment. Thus, different levels of coordination are required in different situations.Coordination of a group of mobile robots is performed according to a multi robot architecturedeveloped in the present work. The architecture represents the organization structure of thekeycomponents and their functional descriptions in a multi-robot system. Promoting cooperation orcoordination among robots and enabling a member robot to make rational decisions arethe mainobjectives of the architecture.19In this chapter, a framework for a group of robots to coordinate their activities and a hierarchicalmulti-robot architecture are proposed. This hierarchical multi-robot architecture integratesmachine learning and behavior-based approaches. In section 2.2 a fully distributed hierarchicalmulti-robot architecture for robotic decision making and detailed implementation is described.Section 2.3 presents a coordination framework for a group of robots. This framework for multi-robot coordination facilitates global sharing of information during the stage of objectenvironment state identification. It also describes how obstacle identification in the objectneighborhood may be combined with multi-robot coordination. Finally, in section 2.4, relevantdistributed control issues in multi-robot systems are discussed. The multi-robot architecture andcoordination framework developed in this chapter will be validated using computer simulationsand physical experiments. Details of the validation process will be presented in the subsequentchapters.2.2 General Hierarchical Architecture for Multi-Robot CoordinationThere are five main approaches for multi-robot coordination as mentioned in section 1.4.1: thebehavior-based approach, the planning-based approach, the distributed control-based approach,the market-based approach and the learning-based approach. Among these approaches, thebehavior-based approach is the most popular one because of its simplicity and robustness.However, this approach is particularly suitable for a known and structured environment. Themajor drawback of this approach is that the designer must have an ability to predict all theavailable environment states, because he needs to design all possible behavior rules. However,most realistic systems need to make decisions in a dynamic and unknown environment, and thisfaces numerous challenges as indicated below.First, a human must design all the behavior rules and individual robots must establish theirpreferences in advance. It is virtually impossible for a human designer to predict all possibleenvironmental states that the robots will encounter and design the relevant behavior rules forthem. On the other hand, within a dynamic and unknown environment, behavior-based roboticdecision making can easily fail when the number of environmental states is large. Second,typically there will be a large number of environmental states in a dynamic and unstructuredenvironment. The designer must design a general rule base to deal with all these environmentstates. However, it will be virtually impossible for the designer to arrange the preference for each20behavior rule in an extensive rule base due to the problem of “combination explosion.” Thereforea purely behavior-based approach for multi-robot coordination is not feasible in a unknown anddynamic environment.Planning-based approach utilizes planning techniques of traditional artificial intelligence (Al) toestablish optimal decisions for the robots (Miyata et al., 2002) in a multi-robot system. Thecentralized approach and the distributed approach are the two types of planning-basedapproaches for multi-robot coordination.In the centralized planning approach, the actions are planned by a single robot in the system andthis robot assigns subtasks to all other robots in the team. Task allocation among multiple robotsis a NP (non-deterministic polynomial) time problem in the theory of computational complexity.In fact computational complexity is the key disadvantage of this approach. For a practical multi-robot systems consisting of large number of robots, finding an analytical solution is very difficultand even impossible, through this approach. A compromise would be to approximate it byemploying an Al based optimization technique. Furthermore, the lack of communication willmake the centralized approach weak. On the other hand, the distributed planning approach plansrobot actions independently and achieves a globally optimal decision by integration theindividual decisions of all the robots. In view of its difficulty, few successful examples havebeen reported in the literature. In addition, both centralized and distributed planning approachescan easily fail in a dynamic and unknown environment since these approaches are unable to altertheir policies (action sequences) for the robots quickly enough to adapt to a rapidly changingenvironment.The distributed control approach (Pereira et al., 2004; Wang and de Silva, 2005) utilizes adistributed control strategy such as “Form Closure,” “Object Closure,” or “Leader-FollowerControl” to assign sub-tasks to the robots for controlling the team formation in the system.Distributed control is not a general approach. In the class of multi-robot cooperativetransportation, it is only suitable for special task situations, and this is a major disadvantage ofthe approach. Furthermore, the approach of distributed control also suffers from the problem ofcomputational complexity.21The market based approach (Mataric et al., 2002) is a rather promising approach for multi-robotcoordination in a dynamic environment. However, there are many open questions related to thisapproach. Its main challenges are computational complexity and real time performance.The machine learning-based approach attempts to circumvent the computational complexityproblem which affects other multi-robot coordination approaches by approximating the optimalpolices of the robots. In particular, in view of its good real-time performance and simplicity,reinforcement learning, particularly theQlearning algorithm, has been employed commonly(Parker et al., 2002; Martinson and Arkin, 2003; Martinson and Arkin, 2003; Dahi et al., 2004;Taylor and Stone, 2005; Wang and de Silva, 2007). In a survey paper, Arai et al. (2002) haverecommended that the area of learning based multi-robot coordination.The learning capability is essential for multi-robot coordination and task execution in a dynamicand unknown environment. As suggested in Chapter 1, the learning capability will enhance theadaptability to the dynamic environment and assist the robots to identify new environmentalstates. The robots can continuously adjust their action policies and improve the performancebased on the experiences of past success and failure in a dynamic environment, thus completingthe required tasks in a faster and more robust manner. Furthermore, Wang and de Silva (2006)have shown that the single agent learning scheme when extended to a multi-robot system, willviolate the original assumption of a stationery environment. Consequently, the learningalgorithm may not converge to its optimal policy.These observations clearly show that the problem of decision making for multi-robotcoordination is a challenging undertaking. A single approach may not be able to solve all theproblems encountered in a cooperative multi-robot transportation process. Therefore, thecombination of several existing techniques will be employed to develop a multi-robotarchitecture that can effectively support multi-robot coordination and cooperation in a dynamicand unknown environment.This fully distributed multi-robot hierarchical architecture integrates two popular artificialintelligence (Al) techniques; namely, machine learning and behavior-based approaches fordecision making in a dynamic, unknown and unstructured environment. Figure 2.1 shows the22proposed three level multi-robot architecture that offers improved robustness and real-timeperformance for multi-robot object transportation.High LevelTack DecompositionArdASn[TiltBehaviour -- basedDistnbuedCorirolHardwareLevelRcbojFigure 2.1: Hierarchical Architecture for Object Transportation.The hierarchical architecture for object transportation integrates different techniques in ahierarchical manner to provide flexible and robust capabilities to the robots for decision makingwhile familiarizing with the external environment dynamically. Basically, it consists of threelayers: Task Decomposition and Assignment layer, Behavior-based Control layer, and Hardwarelevel layer, subject to the constraints of the available robots.Task Decomposition and Assignment represents a high level layer. It has a learning unitand acoordination unit which are used to implement inter-robot coordination.The distribution ofavailable information among robots plays a vital role in multi-robot cooperative transportation.Particularly, one robot in the system will collect information from its own sensors, and willtransmit the information to the system. All robots in the system will gain an understandingaboutthe sensor information of the peer robots in addition its own,though the communication network.23Such communication will determine the current world state cooperatively, and decide the roboticactions sequentially using its learning or coordination units to reach the goal. The high levellayer will perform inter-robot task decompositions and assignment.During the transportation process each robot may use a learning unit or a coordination unit formaking decisions. Particularly, in the stage of environmental state identification an arbitrator willbe assigned dynamically who will monitor the environment state identification. A coordinationunit is used to coordinate the activities of the particular stage. The arbitrator will decide whichunit is used to make decisions for the stage of environment state identification. For example,when the robot navigates in a dynamic environment, usually the learning-based unit is activated.The arbitrator typically switches on the learning unit for decision making of the robot. However,once the arbitrator realizes that a robot within the system has identified the object and estimatedits pose, it will turn off the learning unit temporarily and switch on the coordination unit. Thiswill coordinate the state identification stage for the robots in the system within the bounded localregion called the detection area (Details are fund in Chapter 3). When the robots complete thestate identification stage, the arbitrator will return the decision making capability to the learningunit. Through this type of switch mechanism, the multi-robot architecture benefits from multi-robot coordination and the flexibly adapt to various environments.A popular behavior-based approach is employed in the middle level of the architecture, toimplement robust control of robotic actions. As stated above, the behavior based approach isknown to be efficient in multi-robot cooperative control. The hierarchical architecture utilizes thebehavior-based approach to execute sub-tasks as determined by the top level. When the taskdecomposition and assignment level of the architecture establishes the current sub-tasks for therobots, they will be sent down to the behavior-based control level so as to be decomposed infurther detail of robot behavior. The behavior-based control level consists of four types ofbehavior. The communication behaviors are responsible for communication among robots. Thegroup behaviors execute coordination among robots; for example, “Assume Formation” and theyare typically fired by the learning unit or the coordination unit in the top level of the architecture.Composite behaviors are responsible for the control of composite actions of individual robotssuch as “Find Object,” “Move to Location” and “Avoid obstacle.” These composite behaviorscan be decomposed further into primitive behaviors such as “Turn left,” “Go straight” and24“Move Forward.” Through the design of hierarchical behaviors and by carefully organizing theirpreferences, the robots can complete the required tasks as defined by the Task Decompositionand Assignment level, in a effective and robust manner.The low level of the architecture is the hardware level. It consists of sensors, actuators, andcommunication hardware of the robots. They compose the hardware platform of the hierarchicalmulti-robot architecture for object transportation.What is proposed is a general architecture for distributed multi-robot coordination and decisionmaking. This architecture is common for all robots in the system. A composite representation ofthe proposed architecture is given in Figure 2.2.Robot 01 Robot 02Robot 03High Level Control High Level ControlHigh Level ControlBehaviour Control Behaviour ControlBehaviour ControlHardw’arc Level 1—lard ware Level Hardware Level_________A_____________Figure 2.2: Composite representation of the proposed architectureThe top level of the architecture implements “soft” control of a robot while the middle levelimplements “hard” control behavior. In particular, the top level decomposes andassigns tasksamong robots; and the behavior based control layer decomposes the actions of a single robotfurther into more detailed primitive behaviors and controls motions of the robots. By combiningestablished mainstream approaches in multi-robot coordination, such as learning and behaviorbased methods, this multi-robot architecture provides more flexibility and cooperating powerin adynamic, unknown and unstructured environment than those presented in the literature.The proposed multi-robot architecture will be validated through computersimulation andexperimentation, as presented in the following chapters of the thesis.252.2.1 Machine Learning UnitThe machine learning unit is located in the top level of the proposed multi-robot architecture.The main purpose of this unit is to learn optimal decision making in a given dynamicenvironment though continuously monitoring actions of robots and the correspondingenvironmental rewards. The principle of the machine learning unit is presented in Figure 2.3.Figure 2.3: Principle of Machine Learning.All robots in the system gain knowledge by continuously monitoring the currentenvironmentalstate, selecting an action sequentially and executing it, and receiving rewardsfrom theenvironment. At the sequential action selection stage, each robot needs to observeconcurrentlythe actions of the peers in the system and review their effects on the environment.Each robotemploys a machine learning technique to learn from the past experience with theworkenvironment. The machine learning technique developed in the thesis continuously improvesitsperformance and approximates its optimal policy for decision making.The machine learning unit dynamically adjusts the mapping relationamong the environmentstate and its actions. Consequently, the machine learning unit will lead to better performanceinthe system robots and will complement the behavior layer in the middlelevel.In this thesis,Qlearning is utilized and enhanced to establish the mapping relationship betweenthe environment state and the robot actions. The maj or challenge arises in themodification of the26single agent Q learning to accommodate a multi-robot environment. Next chapter will discussthis issue in detail.2.2.2 Coordination UnitIn the prototype system there are three autonomous heterogeneous robots which attempt to pusha rectangular object cooperatively from a starting position to a goal location. Many challengesexist in this transportation task. For example, there are obstacles distributed randomly in theenvironment and each robot possesses local sensing capabilities only. This means a robot willnot know about an obstacle before it moves sufficiently close to the obstacle. Another challengeis that the object is rather long and heavy, so a single robot will not be able to handle it alone.The robots have to find optimal cooperative pushing strategies to move the object to the goallocation while avoiding collisions with any obstacles in the environment.In order to reach this goal, an arbitrator is designed in the top layer of the proposed architecture,which will select either the learning unit or the coordination unit for multi-robot decisionmaking. When a multi-robot system begins its task, the arbitrator selects the machine learningunit as its decision making unit. Robots start exploring the environment for color coded object tobe transported. Once an object is detected by a robot in the system, it rearranges its pose to centerthe color blob in the camera frame. Then the appropriate robot estimates the pose (position andorientation) of the object and the information will be transmitted to the other robots. After theobject is detected and the pose is estimated, the particular robot will temporarily close themachine learning unit and transfer its decision rights to the coordination unit. At the same timethe robot that detected the object is assigned as the leader of the team and all other robots in thesystem will stop their learning units and transfer their decision rights to the coordination unit ofthe leader robot. After all the robots have transferred their decision making rights to the leaderrobot, the coordination unit is activated temporarily. Hence, the leader robot should assign tasksto peers in the system who will assist in identif’ing the obstacle distribution around the object.The leader robot transmits all the processed sensor information to the system, and the robots willbe able to establish the current local environment state individually after switching on theirlearning unit. After the state identification stage, the coordination unit of the leader robot willstop working and will transfer the decision rights to all the learning units of the robots.27Figure 2.4 shows a representation of the local environment state in the multi-robot objectpushing task which employs the modified Q learning algorithm. This will be described indetailed in the next chapter. In the present subsection it is demonstrated why the coordinationunit is necessary for multi-robot tasks.DetectionRadius_. —— IGoal Location0I/ ——/ •1o,“/—‘ \ 19,,/.//•••\I\.ObjectX/I/OI ,../‘••_%. 4 ——ObstacleJFigure 2.4: The environment state representation for coordination.Both learning and coordination units are important for a multi-robot system to successfullycomplete cooperative tasks in a dynamic and unknown environment in the presence of complexobstacle distribution.2.2.3 Behavior-Based Distributed ControlThe proposed hierarchical multi-robot architecture is represented in Figure 2.1. The behaviorbased distributed control layer is designed to control the local actions of therobots in the system.It is located between the high-level task decomposition and assignment layer and the low-levelhardware layer. Normally, the high-level task decomposition and assignment layer selects quiteabstract actions, which cannot be directly executed using actuators in the hardware layer. Hence,28these high-level abstract actions need to be decomposed into more detailed actions in the layer ofbehavior based distributed control.The hierarchical behavior control architecture CAMPOUT has been developed at NASA JetPropulsion Laboratory - JPL (Huntsberger et at., 2003) to control the robot behavior in a robustmanner. Their hierarchical control architecture consists of two types of behaviors called groupbehaviors and primitive behaviors. The group behavior layer further decomposes them intocomposite and communication behaviors.The mid-level behavior control layer based on CAMPOUT clearly indicates the hierarchy ofcommunication behaviors, group behaviors and composite behaviors. Communication behaviorsare the high level behaviors relative to the group behaviors and group behaviors areaccomplished though the communication behaviors. Composite behaviors are the low levelbehaviors relative to group behaviors. The general idea of the behavior control layer is presentedin Figure 2.5.Behaviors serve as basic building blocks for robotic actions in the “Behavior-Based Systems.”Simply put, behavior is a reaction to a stimulus. Thus, in some literature, behavior systems arealso called as reaction systems. In reactive robotic systems, actions of robots are tightly coupledwith their perceptions without a knowledge reasoning process or time history. The foundation forbehavior-based systems has originated from animal models. Normally, biology has provided anexistence proof that many of the tasks we want the behavior-based robotic systems to undertakeare indeed doable. Additionally, the biological sciences, such as neuroscience, ecology, andFigure 2.5: The hierarchical behavior-based control layer.29psychology have elucidated various mechanism and models that may be useful in implementingbehavior-based robotic systems.Behavior-based systems have several advantages over planning-based systems. First, a behavior-based system is simple and easy to design. For a fast changing environment, a planning-basedsystem is difficult or even impossible to implement because it cannot adapt to a fast changingenvironment. A behavior-based system is much easier to design than a traditional system withplanning and reasoning by tightly coupling perceptions with actions and avoiding complicatedtime consuming reasoning and planning. Also, a behavior-based system can be successful evenin a dynamic and unknown environment and it has shown better performance and robustness inphysical robotic systems, if the designer can predict all possible environmental states and arrangethe corresponding behavior rules to deal with them. Behavior RepresentationSeveral methods are available for expressing robotic behavior. These methods are stimulus-response diagram (SR), functional notation, and finite state acceptor (FSA). SR diagrams areused for graphical representation of specific behavior configurations. Here, behaviorsareexpressed using Stimulus Response (SR) diagrams. Figure 2.6 shows an SR diagram.Stimulus 4&\iieIIaviou,,rResponseFigure 2.6: SR diagram of a simple behavior.According to Figure 2.6 a behavior can be expressed using the stimulus, response, and behavior.Therefore, the triple (S, R, ,6) represents the behavior, where S denotes the set of all possiblestimuli, R represents the range of the response, andfidenotes the mapping function betweenSandR.Binary coded duple (p, )) represents an individual stimulus s(s e 5), where p indicates the typeof stimulus and ..% denotes the strength of the stimulus. A threshold value r is assigned for eachstimulus, and the response will be generated only if the value of stimulus is bigger than v.30In addition, a response r can be expressed with a six dimensional vector consisting of sixparameters for mobile robots. For each of the six degrees of freedom of motion, each parameterencodes the magnitude of the translation and orientation responses:r=[x,y,z,8,cb,çi],where reR (2.1)The gain value g is introduced for a particular behavior /3 in order to further modify themagnitude of its response. A behavior is expressed asr=g*r=g*/3(s) (2.2) Behavior Composition and CooperationBehavior coordination has two predominant classes of coordination. These classes arecompetitive (composition) class and cooperative class. Each class has several strategies forrealization. First consider the composition of behaviors which is described in Figure 2.5, whichclearly shows how composite behaviors are decomposed further into primitive behaviors.Therefore, simple primitive behaviors can be joined to construct more abstract compositebehaviors. Composite behaviors are behavior packages on which a behavior-based robot systemis built. A behavior coordination operator and a number of behavioral components are includedwith each composite behavior. The behavioral components may be either primitive behaviors orcomposite behaviors.High level behaviors can be built on some simple behaviors. These high levelbehaviors can bereferred without needing to know their behavioral components. Composite behavior is given asfollows:p = C(G*R) = C(G*B(S)) (2.3)iiS1 g1 o...or S 0 g/32where, R , S = , G =20,and Bs, 0...0g31Here, B stands for the vector with n number of current composite behaviors(fit ‘182’ fin).S represents a vector of all detectable stimuli relevant for each behavior ,8 at time t; Rrepresents a vector of response of n number of behaviors(181,182,., /3,,) at time t; G represents again matrix which modifies the response matrix R; and C represents the behavior coordinationoperator for composite behavior (There are two types of behavior coordination operators, calledcompetitive operator and cooperative operator, for two classes of behavior coordination). Figure2.7 shows the SR diagram of a composite behavior with a competitive operator.Stiniu Ii•According to this figure, conflict can result when two or more behaviors are activesimultaneously, each with its own independent response. Competitive method provides a meansof coordinating behavior response for conflict resolution. This competitive operator is designedto select the winner behavior from among the simultaneously activated behaviors. Only thewinner’s response is directed to the robot for execution. There are numerous competitiveoperators; for example, suppression-based arbitration, arbitration via action selection, votingbased arbitration, and so on.The other type of behavior coordination operator is the cooperative operator, which is shown inFigure 2.8.pResponseFigure 2.7: The SR diagram for a composite behavior with a competitive operator.32Sii-iStimuliThe cooperative method provides an alternative to a competitive method such as arbitration. Ituses behavior fusion and provides the ability to concurrently use the output of more than onebehavior at a time. The outputs of multiple behaviors are fused and directed to the robot forexecution. It has been pointed out that multiple behaviors are executed by the robot one at a timeif they do not conflict with each other. The simplest cooperative operator is performed throughvector addition or super-positioning. The key to a cooperative operator is to design the behaviorsso that their responses can be fused. The potential field-based method is the most popularapproach, which simply adds the outputs of multiple behaviors. Communication Behaviors for CoordinationFor coordinating the activities of the robots in a multi-robot system, communication behaviorsare designed to receive information from the high level task decomposition layer, exchangesensing information, objectives, and internal states of the robots. The communication behaviorcoordination is implemented through the high level task decomposition layer which is located inthe top level of the behavior-based control layer. The behavior communication layer sends itsreceived information to the group behavior layer for low level control. Explicit communicationand implicit communication are the two types of communication. Explicit communicationconcerns sending or receiving information via computer network data links. However, robotsalso have the ability to exchange information indirectly with its teammates by interacting withthe environment or using sensing. For example, a robot can estimate its relative distance fromanother robot or from closest obstacle by using its CCD camera or laser distance finder.pResponseFigure 2.8: The SR diagram of a composite behavior with a cooperative operator.332.2.3.4 Group Behaviors for CoordinationThe group behavior layer is responsible for decomposing the abstract behaviors, which arereceived from the high level task decomposition layer though the communication behavior layer.The group behavior layer will carefully coordinate the activities of the robots with the purpose ofmanaging and solving conflicts among robots and making them contribute to the common tasks.The learning or coordination unit continuously decomposes subtasks among multiple robots andassigns them to each robot. When a robot receives its sub-task, it will decompose the sub-taskfurther into more detailed composite behaviors in its behavior-based distributed control level andfinally convert them into primitive behaviors for execution. Behavior Coordination for Object Transportation SubtaskA fully distributed multi-robot coordinated object transportation system is developed in thisthesis. The behavior-based distributed control layer facilitates the implementation of the sub-tasks which are assigning by the learning/coordination units in the top level of the proposedarchitecture.Figure 2.9 shows the behavior hierarchy in the behavior-based distributed control layer for thecoordinated object transportation subtask. In the object transportation subtask, first the robotswander in the environment, looking to find an object of interest. The group behavior of the“Assume Formation” makes the robot move to an intended site around the transported object soas to establish a formation with its peer robots. The learning/coordination unit in the top level ofthe proposed architecture establishes the transportation formation of the robots; it decomposesinto the specific position and orientation of the robots, and sends the determined subtasks to thebehavior control layer for execution. When the “Assume Formation,” parent behavior isactivated, it will first turn on its child behavior of “Find Object.” It attempts to search for theobject to be transported. If this object found from the environment, it estimates the object pose(location and orientation) in the environment. A CCD camera and a laser range finder are used toaccomplish this estimation. Simultaneously, the composite behavior “Find Object” will call the“Search Color Blob” composite behavior to make the robot to rearrange its pose continuouslyand try to locate the predefined color blob attached to the object and move it to the center of thecamera frame, by using its CCD camera. The predefined color blob can be used to distinguish the34object of interest from other objects or obstacles in the environment. Then the “Lasermeasurement” primitive behavior is triggered to estimate the pose (position and orientation) ofthe object using the laser range finder. After that the composite behavior “Find Obstacle” willcall the “Laser measurement” primitive behavior to measure the distances to the obstacles fromthe object of interest, angles and sizes. Then, the behavior of “Move to a Location” will attemptto move the robot to the designated action position as decided by its higher levellearning/coordination unit. The composite behaviors “Move” and “Obstacle Avoidance” areactivated simultaneously, until the robot moves to its designated action position.Figure 2.9: Behavioral hierarchy for the coordinated transport sub-task.rLend:() V)[BeharJ35The group behavior “Carry” is activated after all robots reach their designated action positions,and creates a formation around the object of interest. When the “Carry” parent behavior isactivated, it will first turn on its child behavior of “Signal” which attempts to start asynchronization process to transport the object. During the transportation process, the robotsneed to monitor whether the object collides with any obstacles in the environment. If the objectcollides with an obstacle, the robots will abandon the “Carry” behavior and inform the incidentto their learning/coordination units in the top level task decomposition and assignment layer.Then learning/coordination units plan their formation again.Not at thgoal postionFigure 2.10: FSA describing the transitions of the group behaviors in the task of coordinatedobject transportation (Wang, 2007).Finite State Diagrams (FSA) is a method for expressing robotic behaviors. Here,Finite StateDiagrams (FSA) are used to describe the aggregations and sequences of the behaviors so that theprocess of behavior activation and transition becomes more explicit. A finite statediagram canbe specified by the quadruple (Q,6,q0,F) with Q representing allbehavioral states;q0 representing the initial state; Frepresenting all final (termination) states; and ö representingthe mapping function which maps the current behavior state q and input a into a newbehavioralOtherNo at isiqiaedpt;onTha forniulatoris eablishedAll36state q’, i.e., q’ = 5(q,a). The FSA describing the transitions of the group behaviors in theproject of coordinated object transportation is shown Figure Mechanism for Multi-Robot CoordinationA multi-robot coordination mechanism must cope with different types of requirements, such asrobot positioning and synchronization. It should provide flexibility and adaptability, allowingmulti-robot teams to execute tasks efficiently and robustly. To accomplish this, a dynamic roleassignment mechanism is proposed in which robots can dynamically change theirbehavior,enabling them to successfully execute different types of cooperative tasks. This approachisvalidated using environmental state identification. A role is defined as a functionlikeenvironment state identification, which one or more robots perform during the execution ofacooperative transportation task. The particular state identification stage consists of threeroles(modes): s1 - Wandering; s2 - Object Identification, and s3 - ObstacleIdentification.2.3.1 Dynamic Role Assignment Mechanism for CoordinationUnitIn general, to execute a cooperative role a team of robots must be coordinated;specifically, theyhave to synchronize their activities and exchange information. Inthis approach, each robotperforms a role that determines its activity during the cooperative task. According to itsinternalstate and information about the other robots and the rolereceived through communication, arobot can dynamically change its role. The mechanism for coordination is centralized,but it isactivated only for a short time. As mentioned in subsection 2.2 theleader robot has the power tomake decisions based on its local information and the information received frompeers. Eachteam member has a specification of the possible activities that should beperformed during eachphase of cooperation in order to complete the state identification task. These activitiesmust bespecified and synchronized considering several aspects, such as robotproperties, taskrequirements, and characteristics of the environment. The dynamic roleassignment will beresponsible for allocating the correct activities to each robot andsynchronizing the cooperativeexecution though the leader robot in the system.Before describing the details of the role assignment mechanism, it is necessary todefine what arole is in a cooperative transportation task. A role is defined as afunction that one or more robots37perform during the execution of the state identification stage, which is a cooperative robotic taskaccording to the modifiedQlearning algorithm. Each robot will perform a role while certaininternal and external conditions are satisfied, and will assume another role otherwise. The rolewill define the present activity of the robot in that instance. Here three distinct conditions areconsidered for role transition. CCD camera and laser range finder is used as sensors for thisparticular task, generating high level stimulus to the system.The role assignment mechanism allows the robots to change their roles dynamically during theexecution of a task, adapting their activities according to the information they have about thesystem and the task. Basically, according to the state identification stage; there are two ways ofchanging roles during the execution of a cooperative task. The simplest way is the Allocation, inwhich a robot assumes a new role after finishing the execution of another role. In theReallocation process, a robot terminates the performance of one role and starts or continues theperformance of another role. Figure 2.11 shows a diagram with the two types of role assignment.The vertical bars inside the robots indicate the percentage of the role execution that has beencompleted.Rok 01 Ro1, 02I Aflocattion‘— \__.RaI1ocaiiic,nFigure 2.11: Types of role assignment.38The control software for a robot includes programs for each role it can assume. The localinformation consists of the robot’s internal state and its perception about the environment. Globalinformation contains data about the other robots and their view of the system and is normallyreceived through explicit communication (message passing). A key issue is to determine theamount of global and local information necessary for the role assignment. This depends on thetype of the activity that is being performed.The present approach allows for two types of explicit communication: synchronous andasynchronous. In synchronous communication, the messages are sent and received continuouslyin a constant rate, while in asynchronous communication an interruption is generated when amessage is received. Synchronous messages are important in situations where the robots mustreceive constant updates about the activities of the others. On the other hand, asynchronouscommunication is used for coordination when, for example, the leader robot needs to inform theothers about the object pose or the leader robot need to interrupt the wandering role of the otherrobots, and so on.An important requirement is to define when a robot should change its role. In the role allocationprocess, the robot detects that it has finished its role and assumes another available role. Forexample, consider the robots which wander in the work environment to find a color coded object.If a robot detects the color coded object then it needs to identify the object (particularly, estimatethe object pose). In the reallocation process, the robots should know when to abandon the currentrole and assume another role. A possible way to do that is to use a function that measures theutility of performing a given role. A robot which performs role r has a utility given byPr•Whena new role r’ is available, the robot computes the utility of executing the new rolePr’•If thedifference between the utilities is greater than a threshold:T((Pr— Pr) >r) the robot will changeits role. The function p can be computed based on local and global information and may bedifferent for different robots, tasks and roles. Also, the value r must be chosen such that thepossible overhead of changing roles will be compensated for by a substantial gain on the utilityand consequently producing a better overall performance.392.3.2 Dynamic Role Assignment Model for Coordination UnitThe dynamic role assignment can be described and modeled in a more formal framework. Ingeneral, a multi-robot system can be described by the robotic state (X), which is a concatenationof the states of the individual robots:X=[x1,x2 ,xj (2.4)Considering a simple control system, the state of each robot varies as a function of its continuousrobotic state (xi) and the input vector (u,). Also, each robot may receive information about theother robots in the system (2). This information consists of estimates of the robotic states of theother robots that are received mainly through communication. We use the hat(A)notation toemphasize that this information is an estimate because the communication can suffer delays,failures, and so on. Using the role assignment mechanism, in each moment a robot will becontrolled using a different continuous equation according to its current role in the stateidentification stage. Therefore, we use the subscript q to indicate the current role of a robot. Byfollowing this description, the state equation of robot i during the execution of the task is givenby:i =fq(XjUj2j) (2.5)Since each robot is associated with a control policy, we haveu1 =kjq(Xj,2j)(2.6)and since 2 is a function of the state X, we can rewrite the state equation:= fq(X) (2.7)or, for the whole team,X = F (X), whereFq= {J ,f]T(2.8)The continuous behavior of each robot (it represent the robotic state at any instance) is modeledby these equations. Based on these equations, the continuous actions of the whole team of robots40are implemented during the execution of a cooperative state identification stage. Theseequations, together with the roles, role assignments, continuous and discrete variables,communication, and synchronization can be better understood and formally modeled using afunction A, as described next. This function consist of a finite number of real-valued variablesthat change continuously, as specified by differential equations and inequalities, or discretely,according to specific assignments. The state identification stage consists of both discrete states(roles) and continuous states (robotic states). Therefore, the nature of the particular multi-robotsystem in state identification stage is a hybrid, given byA = {S,V,T,f,condl,cond2,init,R} (2.9)Here S = {l, 2 , k} is the set of roles and this is a discrete set of states of the robot. Accordingto Figure 2.12, a role is composed by other roles, which are called hierarchical/sequentialcompositions. Therefore, variables of the system V (V = V u J) can be composed by discreteinformation(Vd)and continuous information(Va). J’ represents the robotic states andj”1represents the sensory information within each role. The dynamics of the continuous informationare determined by the flowsf.Generally continuous information is updated according to thedifferential equations inside each role(fq)Discrete transitions between pairs of roles (p, q) arespecified by transitionT, which represents the role assignment. condl and cond2 are predicatesrelated to the set of roles (5) and transition T, respectively and both condl and cond2 aredefmed when each robot will assume a new role. The system can stay in a certain control modewhile its condl is satisfied, and can take transition when its cond2 (jump condition) is satisfied.The initial states of the system are given by mit, and each transition T can also have anassociated reset statement R, to change the value of some variable during a discretetransitionusing discrete information. Furthermore, variables are also updated according to the resetstatement of each discrete transition. Finally, we can model the cooperative state identificationstage execution using a parallel composition.Figure 2.12 shows a role composed by the three roles: s1 - Wandering; s2 - Object Identification;s3 - Obstacle Distribution. Each role is characterized by differential equations and algebraic41constraints. The condition under which transitions between roles occur and the conditions underwhich robots stay in a particular role, are specified.Communication among robots can also be modeled in this framework. We use a message passingmechanism to exchange information among the agents. To model message passing in a hybridautomaton, we consider that there are communication channels between agents, and use the basicoperations send and receive to manipulate the messages. In the hybrid automaton, messages aresent and received in discrete transitions. These actions are modeled in the same way asassignments of values to variables (reset statements). It is rather common to use a selftransition,i.e., a transition that does not change the discrete state, to receive and send messages.42’2(x)<0(1CTRole br Robot iFigure 2.12: Schematic representation of a role composed by three roles.422.4 Distributed Control Issues for Multi-robot CoordinationNormally, a multi-robot system with a distributed control architecture does not employ a leaderrobot to make decisions for all other robots in the system. The proposed hierarchical multi-robotarchitecture is also distributed control architecture. In a distributed control system, each robotneeds to decide its own actions independently by observing the environment or communicatingand negotiating with its teammates. The only exception is the coordination unit in the proposedarchitecture which requires centralized decision making when the robots need to identifj theenvironmental state. However, the centralized coordination unit works only for the stateidentification phase which exists for a very short period of time compared to the transportationtask. Once the multi-robot system leaves the state identification stage, the learning unit willtakeover the control of the robot again. A distributed architecture has many advantages over acentralized control architecture, especially for multi-robot systems.First, a distributed multi-robot system has an ability to increase the number of robots in the robotteam without significantly increasing its computational cost. This represents the scalability. Forexample, the architecture proposed in this chapter can be expanded from to 3 to 20 robotswithout serious problems. If a member robot quits or a new robot enrolls into the system duringthe transportation process, the system is capable of adapting to changes of the team sizedynamically. For a centralized multi-robot system, scalability is a difficult issue because thecomputational complexity of a centralized decision making algorithm increases rapidly with thenumber of robots.Second, distributed systems have a low communication volume. A decentralized distributedsystem does not assign a leading robot. Therefore, bottlenecks in communication canappear.However, a centralized system has a central leader and when the number of robots in the systemincreases the communication bottleneck can be so serious that the algorithm becomes worthless.Furthermore, distributed systems are capable of accomplishing some of the cooperativetaskseven without communication. Particularly, most realistic cases of mobile robotsystems areexpected to work with a limited communication capability, because most multi-robot systemsneed to run in an environment where communication among robots is usually unreliable; forexample, the bottom of a deep ocean or on a planetary surface. If the system does not establishreliable communication among robots, centralized decision making becomes weak because the43robot members cannot receive commands from their leader in a timely and accurate manner.However, distributed multi-robot system may work well even when communication amongrobots is not reliable.Third, the distributed systems are robust. Each robot in the system can make their decisionsindividually to accomplish a common task. When one robot in the system malfunctions or quits,other robots still can make decisions by themselves to continue their tasks. However, in acentralized system one leader robot makes decisions for all other robots in the team. Hence, oncethe leader robot fails, the whole system will fail entirely. Therefore, such systems are not robustcompared to distributed control systems.Low efficiency is a major disadvantage of a distributed system, where multiple robots reach theiragreements through some cooperation technique such as voting, action, and so on. Thesecooperation approaches are usually time consuming and are not efficient. Normally, distributedlearning algorithms are much more difficult and complex than centralized equivalents.Therefore, a distributed system is usually not optimal, because the final policy reached by themembers is not optimal. It is still an open question as to how to implement efficient and optimaldistributed decision making.2.5 SummaryIn this chapter, a three layer distributed multi robot architecture, which accommodatesdistributed machine learning, behavior-based distributed control and low level “hard” control,was developed. The objective of the architecture is to support multi-robot systems to properlycoordinate the robotic tasks in a dynamic and unknown environment. The proposed architectureis a general platform that merges several existing techniques of multi-robot decision making, andaccommodates homogenous or heterogeneous robots in the same framework.Before proposing the architecture, several popular multi-robot coordination approaches given inthe literature were discussed, and their advantages and disadvantages were compared. Thediscussed multi-robot coordination approaches are the behavior-based approach, the planningbased approach, the distributed control-based approach, the market-based approach, and thelearning-based approach.44Next, the machine learning unit and the coordination unit were introduced. In the proposedarchitecture, an arbitrator dynamically selects either the learning unit or the coordination unit tomake decisions for the robots. Then the behavior-based control layer and their sub-layer werepresented. In this thesis, it is combined with the learning and coordination units in the top levelof the architecture to control the actions of a robot. After that, the mechanism of multi-robotcoordination was presented. This framework, which facilitates a group of robots to coordinationtheir activities, allows global sharing of information during the stage of object environment stateidentification. Finally, some distributed control issues of multi-robot systems were discussed.The advantages and disadvantages of both centralized and distributed architectures were pointedout and some open questions in this field were identified.45Chapter 3Machine Learning for Multi-Robot Decision Making3.1 OverviewMulti-robot decision making is a challenging task in many real world problems, whichinvolve real-time operation of robotic systems in uncertain and dynamic environments. Inthese problems, actions and interactions of multiple mobile robot have to be consideredconcurrently so as to reach a global optimal policy. Among the available approaches,planning-based approach is a contender for dealing with this problem. However, plannedoptimal policy usually is feasible in static environments, and it is shown that theplanning-based approach is inapplicable to fast changing environments.Current approaches for decision making in uncertain and dynamic multi-robotenvironments such as planet surfaces and urban environments have two main difficulties.First, many approaches do not take into account the uncertainty inherent in these problemdomains. For example, in the multi-robot environment, terrain features can changecontinuously. Second, for systems with many robotic agents, state of the art methods cangrow to be too cumbersome to be and suboptimal. For example, even though the physicalenvironment is stationary, from the perspective of a single robot, the particular staticenvironment is still a dynamic environment because the other robots of a multi-robotsystem take actions, continuously making changes to the environment.In view of these considerations, the behavior-based approach and the machine learningapproach are the primary approaches that are capable of overcoming the challenges ofmulti-robot systems operating in dynamic, unstructured, and unknown environments. Asmentioned in the previous chapter, the behavior-based approach is more robust for adynamic environment than the planning approach. It is so because it can quickly adapt toa fast changing environment, by identifying the current environment state and couple itwith a specific action. In addition, the behavior-based approach is known to becomeunsuccessful in a complex environment with a large number of environment states.46However, it is highly effective in simple environments. Particularly, the behavior-basedapproach needs a human designer to design all behavior rules and the designer needs toforesee all possible world states that the robots will encounter. Therefore it is almostimpossible for this approach to be effective in a dynamic and unknown environment witha large number of states.Considering how humans accomplish physical tasks in a dynamic environment, it is notdifficult to come to the conclusion that a human employs not only planning or reactive(behavior-based) techniques but also learning techniques. Humans identify the newenvironment states through their learning capabilities, come across corresponding optimalactions from the past experience, and improve their planning and reactive techniques. Inother words, humans explore the unexpected environment states with less uncertainty andmake their decisions more robust and effective in a dynamic environment.Therefore, machine learning has become an important subject in the in the multi-robotfield. Machine learning is considered a vital development due to the highly dynamicnature of typical multi-robot environments. Among the existing machine learningapproaches, reinforcement learning (RL), especially Q learning, is the most commonlyused one in multi-robot systems due to its simplicity and good real time performance.The single agent Q learning algorithm is the most commonQlearning algorithm. If wedirectly extend the single agent Q learning algorithm to a multi-robot system, theassumption of static environment is violated, and as a result the values of the Q-table maynot converge. Although the robots still can find some good policies for cooperation, theperformance of the whole team of robots can be degraded when that approach is extendedin this manner. (Wang and de Silva, 2007)Therefore, an improved version of the single agentQlearning algorithm is proposed herefor robot decision making, which can come up with an optimal solution to the actionselection conflicts of the robots in a multi-robot system. Hence the proposedQlearningalgorithm is suitable for real-time operation of cooperating multi-robot systems.47In this chapter machine learning is used to make decisions for a multi-robot objecttransportation task. Markov decisions processes (MDP) and the single agentQLearningalgorithm are introduced in section 3.2 and section 3.3, respectively. The single agent Qlearning algorithm will converge to an optimal policy in an MDP problem without atransition model. ModifiedQlearning algorithm is presented in section 3.4 and it isvalidated though computer simulation. Section 3.5 presents the distributed decisionmaking unit ofQlearning-based multi-robot object transportation, which is validatedthough computer simulation.3.2 Markov Decision Process (MDP)The Markov decision process framework for decision making, planning, and control issurprisingly rich in capturing the essence of purposeful activities in various practicalsituations. MDP models and associated problem arise in many areas, including medicaldecision making, maintenance planning, and mobile robot navigation. Therefore MDP isa popular topic in artificial intelligence and it facilitates an intelligent agent to makedecisions in real-world problems. The basic concept of MDP is introduced now.MDP can be defined by the tuple< S,A,T,R,/3>, where• S = {s1,s2 ,s}, is a set of states of the environment• A ={a1,a2 ,a},is a set of actions by the robot• T : SxA —> fl(s), is a transition function, which decides the next environmentalstates’ when the robot selects an action a, under the current states. fl(s) is theprobability distribution over the states s.• R: S x A — R, is a reward function. When the robot takes an action a. under thecurrent states, it determines the immediate reward.MDP presents a sequence of decisions made by an agent or a robot in an environmentwith many states. At time t, the agent or a robot requires to observe the environment and48determine its current state s e S, and then select an action a c A based on its policy itwhich denotes the action needed to be taken by the robot or agent to reach the desiredstate. Further, there is a reward function which denotes the immediate reward when theagent or robot takes an action a under the current state s.Two major attributes of MDP make it interesting and challenging. First, that particularsubsequent states’ is not deterministic when the agent or robot obtains an action a underthe current environmental state s. Instead, it has a probability distribution fl(s) over allthe states. This probability distribution is defined by a transition function or transitionmodel T(s, a, s9.Second, MDP’s transitions among states are Markovian; that is, theprobability of reaching s’ from s depends only on s and not on the history of previousstates.The performance of an agent or robot policy it is measured by the rewards it receivedwhen it made a sequence of decisions according to the policy and the sequence of states itvisited. This measurement is usually represented by a sum of discounted rewards as givenby[fiuR(s)I(3.1)where, 0 </3 1 is the discount factor, and R(s,) is the reward received when the agentvisits the state s at time t. Because the transitions among states are not deterministic inview of the probabilistic nature of the transition function T(s, a, s’), given a policy, thesequence of states visited by the agent or robot each time is not fixed and it has aprobability distribution. Therefore, an optimal policy ir is a policy that yields the highestexpected sum of the discounted rewards, which is given by* Iyr =armaxE[/3R(s1)I7r (3.2)Given an MDP problem, the way to find all optimal policies has to be carefullyconsidered, if more than one optimal policy exists. The value iteration algorithm has been49developed to find the solution to this problem (Russel and Norvig, 2003). In thisalgorithm, first the concept of “utility of a state” is defined asU(s)=E[fitR(s)Is0=s] (3.3)From equation (3.3), the utility of a state s is given by the expected sum of discountedrewards when the agent executes the optimal policy.If we have the utilities of all states, action selection (agent’s or robot’s decision) willbecome easy; specifically, to select the action that maximizes the expected utility of thesubsequent state:(s) = arg max T(s, a, s’)U(s’) (3.4)However, determination of the utilities of all states is not an easy task. Bellman foundthat the utility of state could be divided into two parts: the immediate reward for that stateand the expected discounted utility of the next state, as given byU(s) = R(s) + ,8 max T(s, a, s’)U(s’) (3.5)Equation (3.5) is the Bellman equation. There is a corresponding Bellman equation foreach state. For example, if there are n states totally, then there are n number of Bellmanequations for the states. The utilities of the n states are found by solving the n Bellmanequations. However, the Bellman equation is a nonlinear equation because it includes a“max” operator. The analytical solutions cannot be found using techniques of linearalgebra. The only way to find the solutions of n Bellman equations is to employ iterativetechniques. The value iteration algorithm is an approach to find the utilities of all states,and is given below.50Function Value-Iteration (mdp,6) returns a utility functioninputs: mdp, an MDP with state S, transition model T, reward function R,discount13,and the maximum error allowed in the utility of any state.local variables: U, U’ are the vectors of the utilities for the states in 5,initially zero. 5 is the maximum change in the utility of any state in aniteration.repeatU-U’;5-Ofor each state s in S doU’(s) — R(s) + /3 max T(s, a, s’)U(s’)IfIU’(s)— U(s) > S then S — U’(s) — U(s)until 5<s(l—/3)//3Figure 3.1: The Value Iteration Algorithm (Russel and Norvig, 2003).3.3 Reinforcement LearningThe value iteration algorithm is introduced in section 3.2 to solve MDP problems.However, the environment model T(s, a, s’) and the reward function R(s) are usuallyunknown in a real robot decision making situation. Therefore, the value iterationalgorithm is not practically viable for real robot decision making. Usually, robots areunable to obtain perfect information of the environment model for employing the valueiteration algorithm to solve the Markov Decision Process (MDP) problem. ReinforcementLearning is a viable solution to overcome this challenge.51In reinforcement learning, an agent or a robot observes a sequence of state transitions andreceives rewards, through trials of taking actions under different states in theenvironment. This can be used to estimate the environment model and approximate theutilities of the states. This implies that reinforcement learning is a model-free learningapproach, which is an important feature. There are numerous variants of reinforcementlearning among which Q-learning is the most common algorithm. A typical Q-learningalgorithm is introduced below.For each state s, e (s1 , s2,..., n)and action a. e (a1,a2,...,am),initialize the tableentry Q(s1,a) to zero. Initialize v to 0.9. Initialize the discount factor 0<fi1and the learning rate 0 < i 1• Observe the current state .c• Do repeatedly the following:• Probabilistically select an actionakwith probabilityeQsTP(ak)= m, and execute iteQar• Receive the immediate reward r• Observe the new state s’• Update the table entry for Q(s,ak)• Q(s,ak) = (1—i1)Q(s,ak)+77(r+,6rnaxQ[s’,a’])•Figure 3.2: The single-agent Q-learning algorithm.An empty Q-table is set up and all its entries are initialized to zero at the initiation of theQ-learning algorithm. The Q-table is a 2D table. Its rows represent the environment states52and its columns represent the actions available to the agent or the robot. Q(s1,a)represents the value of a Q-table entry and describes how desirable it is to take the actionunder the state Si, while the utility U(s1) in section 2.2 represents how appropriate itis for the agent or the robot to be in the state s,,. Parameters such as the learning rate r,the discount factor ,8 and the “temperature” parameter z, have to be initialized as well.During the operation, the agent or the robot observes the environment, determines thecurrent state s, selects an actionakprobabilistically with probability given in equation(3.6) and executes the agent or the robot executes the actionak,it will receive a reward r from theenvironment and will observe the new environment states’. Based on the information ofthe immediate reward (r) and the new environment states’, the agent will update itsQtable by utilizing the following equation:Q(s,ak)= (1— 77)Q(s,ak)+ (r + ,Bmax Q[s’,a’j) (3.7)After updating the Q-table, the current environment state is transited from s to s’. Basedon the new state, the above operations are repeated until the values of the Q-table entriesconverge.In the Q-Learning algorithm described in Figure 3.2, a s—greedy search policy presentedin equation (3.6) is used instead of a greedy policy which always selects the action withthe maximumQvalue. In a s — greedy search policy, with probability s, the agent orrobot has an ability to select a unique action uniformly and randomly among all possibleactions, and with probability 1-6, the agent or the robot selects the action with a highQvalue under the current state. In addition, the probability S is decreased gradually. Thebenefit of the s - greedy search policy is its balance in exploring unknown states against53exploiting known states when the agent attempts to learn its decision-making skills andimproves itsQtable. In equation (3.6), the probability s is controlled though decreasingthe “temperature” parameter r.3.4 ModifiedQLearning AlgorithmThe mentioned single agentQlearning algorithm is the most commonQlearningalgorithm. It is employed to find optimal action policies for the members of a multi-robotsystem and it can help robots find good cooperation policies. However, this algorithm hassome serious disadvantages in a multi-robot environment. In particular, directlyextending the single agentQlearning algorithm to a multi-robot system violates itsassumption of static environment, and as a result the values of the Q-table may notconverge. Although the robots still can find some good policies for cooperation, theperformance of the whole team of robot can be degraded when that approach is used.(Wang and de Silva, 2007)Accordingly, a novelQlearning algorithm is developed, which is suitable for real-timeoperation of cooperative multi-robot systems. It determines the optimal solution to theaction selection conflicts and provides a more realistic state identification methodologyfor a multi-robot system. The developed algorithm is suitable for real-time operation ofcooperating multi-robot systems.The single agent Q learning algorithm is improved and is named the “ModifiedQlearning algorithm.” The basic idea of the modified Q-learning algorithm comes from astrategy that is typically employed by humans when they cooperatively push a large andheavy object to a goal location. The group members typically do not observe theenvironment state individually and decide the environment state independently. Instead,one of them may handle one side, and another member may handle a complementaryside. Then they share their reactive information and come to a conclusion about thepresent environment state. Likewise other group members typically do not select theirpushing locations and forces concurrently. One person among them will select hispushing location and apply a force first. Then, by observing the first person’s action, thesecond person will select his pushing action (i.e., a cooperative strategy) accordingly.54Next, the third person will determine his action by observingthe actions of the first twopeople. In this manner, when all the people in the group woulddetermine their actions,and they will execute the overall pushing actions through simple synchronization. Thiscooperative strategy is known to work well in manual tasks. By taking inspiration fromhuman cooperation, the same strategy is used here to develop the modified Q-learningalgorithm for multi-robot transportation tasks. A modified algorithm is presented inFigure 3.3.• Assume that there are n robots, R1,R2,..., R which are arranged in aspecial sequence. The subscripts represent their positions in thissequence.• S1,S2 , S, is the corresponding role set available for the robots. Inparticular, the robot R, has S role available for the particular instant.• Robots R1,R2,..., R exploring the environment for color coded object• If the robot R. detects the color coded objecto R, estimates the pose (position and orientation) of the object.o Pose information will be transmitted to R1,R2,..., Ro R, assigned as the leader of the team.o R1,R2,..., R,_1andR,÷ , R1,..., R transfer the decision rights to thecoordination unit of the R.o R, should assign tasks to R1,R2 ,...,R11andR , R,..., R,1 in thesystem who will assist in identifying the obstacle distribution aroundthe objecto R1 ,R2,..., R are transmitting their processed sensor information tothe system.o R1,R2 ,..., R are will be able to establish the local environment stateindividually• else If Robots R1,R2 ,..., R exploring the environment for color codedobjectFigure 3.3(a): Environment state identification algorithm.55• Assume that there are n robots, R1,R2 ,..., R whichare arranged in a specialsequence. The subscripts represent their positions in this sequence.• A1,A2 ,Aare the corresponding actions sets availablefor the robots. Inparticular, the robot R, has m, actions availablefor execution as givenbyR1 : A. =(a,a ,a,,)• Q(s,a),Q2) Q(s,,a1)are the correspondingQtables for the robots. Allentries in theQtables are initialized to zero.• Initialize r to 0.9.•is a set including all actions selected by the robots thus far, and çS representstheempty set.• Do repeatedly the following:o Initialize ( =o Observe the current environment state so For(i=lton)1. Generate the current available action set 1, = (A1 — (A1 m C’))2. The robot R1 selects the action e 1, with probabilitye”°)/rP(a)=m(3.8)(s,a )/rWhere a c Y,(r =1, 2,..., k) and k is the size of the setY,3. Add action a’, to the seta. End Forb. For each robot R. = (1 =1,2,..., n), execute the corresponding selected action a’,c. Receive an immediate reward rd. Observe the new state s’e. For each robot R1 = (i = 1,2,..., n), update its table entry forQ,(s, a,a , ..., a) asfollows:where 0 <i <1 is the learning rate and 0<fi< 1 is the discount rate.os+—s’,r+—r*0.999Figure 3.3(b): ModifiedQlearning algorithm56Eia1eQbJeAssign LeaderGet DecisiontIcadIESLIITatC Ohsaclc Pose andSize according to th Decision[Determine Current State cFigure 3.3(c): The flow diagram of the modifiedQlearning algorithm.During operation, robots in the system start wander in the environment forcolor codedobject to be transported. Once an object is detected by a robot inthe system, it rearrangesits pose to center the color blob in the camera frame. Then the appropriate robot estimatesthe pose (position and orientation) of the object and the information will be transmittedtothe other robots. . The robot that identified the object will temporarily close the machinelearning unit and transfer its decision rights to the coordination unit. Particular robotisassigned as the leader of the team for the state identification process and all other robotsin the system will stop their learning units and transfer their decision rightsto thecoordination unit of the leader robot. Thereafter, the leader robot will assign tasks topeers in the system, enabling to identify the obstacle distribution around the object. Allmembers in the system will transmit their information while processing the informationcontinuously. All this information would determine the current local environment state sof the object individually by all robots. According to the present state, robots willindividually select their respective actions.Estimate ObjcctPosc andCalculate Goal Orientation57In the action selection stage, a robot probabilisticallyand sequentially selects an actiona,with probability; and executes it. The proposedmodifiedQlearning algorithm uses asequential action selection method to overcome theaction selection conflicts amongrobots. When two robots select the same action totransport the object to the samelocation, it violates the cooperation strategy. Accordingto the proposed sequential actionselection method, if one robot selects an action, it shouldtransmit the information to thesystem. This would help to overcome the conflict.After the agent takes the actionakaccording to the sequential action selection method, itwill receive a reward r from the environmentand observe the new environments’.Based on the information of r and s’, the agent willupdate its Q-table according to(3.10)In this manner the current environment state is transited froms to s’. Based on the newstate, the above operations are repeated until the values of the Q-table entries converge.Learning rate i determines to what extent the newly acquiredinformation will overridethe old information. Discount factor determines the importance of future rewards. The“temperature” parameter is r. The developed approach is validated throughexperimentation.3.5 Multi-Robot Transportation Using ModifiedQLearningMulti-robot object transportation is an important subject in multi-agent robotics. In anautonomous transportation system, several autonomous mobile robots movecooperatively to transport an object to a goal location and orientation in a static ordynamic environment while avoiding fixed or removable obstacles. It is a ratherchallenging task. For example, in the transportation process, each robot needs to sense thepositions of the obstacles and the other robots in the neighborhood, and any change in theenvironment. Then it needs to communicate with the peers, exchange the sensinginformation, discuss the coordination strategy, suggest the obstacle avoidance strategy,assign or receive the subtasks and roles, and coordinate the actions so as to move theobject quickly and successfully to the goal location. Such operations have many practical58applications in such fields as space exploration, intelligent manufacturing, deep seasalvage, dealing with accidents in nuclear power plants, and robotic warfare.As explained in section 3.1, a learning capability is desirable for a cooperative multi-robot system. It will help the robots to cope with a dynamic or unknown environment andmake the entire system increasingly flexible and autonomous. Although most of theexisting commercial multi-robot systems are controlled remotely by a human,autonomous performance will be desirable for the next generation of robotic systems.Without a learning capability, it will be quite difficult for a robotic system to becomefully autonomous. This provides the motivation for the introduction of machine learningtechnologies into a multi-robot system.A schematic representation of the developed system is shown in Figure 1.2 and it isrepeated in Figure 3.4.Figure 3.4: The developed Multi Robot System.I Wired/wirelessRouter_____TCP/IP59In this section, a multi-robot transportation system outfitted with distributed Q-learning isstudied, which works in a dynamic and unknown environment with a complex obstacledistribution. Modified Q-learning algorithm is used to accomplish this task.3.5.1 Distributed Multi-robot Q-leaningThere are two ways to extending single-agent reinforcement learning into the multi-robotfield. One way is to treat multi-robot learning as a combination of multiple single-agentlearning processes across the team members. The other way employs SG-based(stochastic game-based) reinforcement learning As mentioned above, ModifiedQlearning is also an improved version of typical single agent Q learning. Therefore themultiple single agent nature of the typical single agentQlearning algorithm plays asignificant role in multi-robot systems. In this section we consider the combination ofmultiple single-agent learning type only. Each robot is equipped with the reinforcementlearning technique of the multiple single agent learning type , and learns their skills as ifthe other robots do not exist in the environment. This type of reinforcement learning iscalled “distributed learning”. The main advantage of “distributed learning” is itssimplicity.In this method, every robot just needs to monitor its own actions and the correspondingrewards, and it does not need to care about the behaviors of the peer robots.Consequence of this is a small learning space and a low computational cost. However,directly extending the single-agent reinforcement learning algorithm to the multi-robotdomain violates the MDP (Markov Decision Processes) assumption - the theoretical basisof reinforcement learning - which assumes that the agent’s environment is stationary anddoes not contain any other active agents.It follows that, “distributed learning” may find some utility in the multi-robot learningdomain because of its simplicity. Distributed learning may work in a purely cooperativemulti-robot task (Yang and Gu, 2004). Theoretical proof of its convergence under someconditions is available (Littman, 2001). On the other hand, “distributed Q-learning” isapplied to a multi-robot object transportation task in a dynamic and unknownenvironment, which is a challenging and important research topic in multi-robot60coordination systems. Next, some important issues of “distributed learning” arediscussed and effectiveness, robustness and adaptivity are studied through computersimulation of a learning-based multi-robot system.3.5.2 Object Transportation TaskEven though considerable work has been done in multi-robot object transportation, it isstill difficult problem. In a task of object transportation, multiple autonomous robots haveto cooperatively push an object to a goal location and orientation in a dynamic andunknown environment. Pushing is the transportation method used to accomplish theparticular task. The particular object has a rectangular cross section. It is large and heavy,and a single robot is unable to accomplish the task alone. Furthermore, each robot onlypossesses local sensing capability and the obstacles may appear and disappear randomlyin the transportation path. Therefore robots cannot plan the trajectory in advance. Figure3.5 presents a dynamic and unknown environment in a continuous space. Both dynamicand static planning technologies will fail if the environment changes quickly and thecomputational speed of the robots is not sufficiently fast. The learning based approachmay be a better solution to this challenging problem than a static/dynamic planningapproach. The modified Q learning approach described in Figure 3.3 is utilized to executethis multi-robot object pushing problem.. FRandon, ObsLaclesGoal LocationI .‘‘:ObjeCtN•Figure 3.5: Description of a multi-robot object pushing task.613.5.3 Multi-robot Object Pushing with Modified Q-learningModified Q-learning is used as the learning technique for the multi-robot object pushingproblem. The operational method is as follows. Each robot is equipped with modifiedQlearning. When the robots begin to transport the object, each robot identifies the currentenvironmental state cooperatively utilizing the state identification technique mentionedearlier, which consists of the object pose estimation technology and obstacleidentification technology, and then individually develops the current environment state asa binary word (Details are given in subsection Next, the robots independentlyemploy the Modified Q-learning algorithm given in Figure 3.3 to select their actionssequentially to push the object. After the actions are taken, each robot cooperativelyutilizes the state identification technique and observes the new environmental stateindividually and receives its immediate reward r. The robots then update their ownQtables with the reward and the new environment state information, and continue with thenext step of object-pushing until the object reaches to goal location. Each robotindividually maintains their Q-tables and makes their decisions. Therefore, it is a fullydistributed learning-based decision making system. In other words, the multi-robot objectpushing problem is solved with a purely learning-based approach.However, the proposed “Distributed Reinforcement Learning” system also violates theassumption of stationary environment in the MDP problem. While a robot is pushing theobject of interest, its peer robots are also pushing it. Therefore, the environment is notstationary anymore from the viewpoint of this robot. Furthermore, the random obstaclesmake the environment more complicated. However, the simulation results in thefollowing section will show that the object pushing task can be completed successfullyalbeit with less efficiency. The Environmental States for ModifiedQLearningThe Modified Q-learning algorithm is employed in the task of multi-robot object pushing.As mentioned before, environmental state identification plays a vital role inaccomplishing the overall task. Normally, a multi-robot working environment iscontinuous and complex. Therefore, a methodology is needed to express the continuous62environment where the infinite number of environmental states have to be represented bya finite number of states. The environmental state is coded into 16-bit binary code asshown in Figure 3.6.Obstacle DistributionGoal OrientationA[15 87Figure 3.6: Environmental State Representation using Binary Code.The current obstacle distribution around the object within the detection radius isrepresented by the higher 8 bits (from bit 8 to bit 15) in the binary code. The orientationof the goal location relative to the current object position and orientation (pose) isrepresented by the lower 8 bits (from bit 0 bit 7) in the binary code. The coding protocolis presented in Figure 3.7.The coding protocol assumes that the robots only possess local sensing abilities.Consequently, the system needs to identifi the local environment and detect the obstaclesthat appear within the detection radius. In order to describe the obstacle distribution inFigure 3.7, the detection circle is divided into 8 equal sectors. Each sector corresponds toone of the higher 8 bits of the environmental state. If there is at least one obstacledetected in this sector, the corresponding bit in the state will become “1.” Otherwise, itwill become “0.” For example, in Figure 3.7, the higher 8-bits of the environmental stateis “10100100” which indicates the current obstacle distribution around the object withinthe detection circle.63fyDetectionRadius0/Figure 3.7: Coding protocol of the environment states.The orientation of the goal relative to the current object pose is represented by the lower8 bits of the environmental state code. The orientation angle & of the goal in Figure 3.7 isused to calculate the lower 8 bits ofthe state using the following equation:Statebll(O_7)= int(8 /(360.0 / 256))intOfunction is used to convert a float value into an integer number. Robot actions(3.11)Normally, a robot has the ability to push an object of rectangular cross section atanycontact point on the surfaces of the object. Therefore, the object provides an infinitenumber of action points for the robots to push. However, a finite number of action pointsneeds to be defined to implement the modifiedQlearning algorithm. Accordingly, onlysix action points (a1a6) will be available to each robot, as shown in Figure 3.8 (Wang,2007)./Goal Location0/1•09I x0/I —-f—__064a3 a2a4 a1Figure 3.8: Pushing action points available to each robot. Object dynamicsThis section discusses the application of the modified Q learning algorithm to themulti-robot object pushing problem. The object dynamics is simplified in thesimulationsystem. It is assumed that the object displacement is proportional to the net force exertedby the robots, and the rotational angle of the object is also proportional to the nettorqueexerted by the robots. In Figure 3.9, the dynamics of the object is illustrated,where theobject is being pushed by three mobile robots (Wang, 2007).AAEZEa5 a6Object65Yenv/Y2YiXenvFigure 3.9: Dynamics of the object with rectangular cross section.According to Figure 3.9, three robots push the object with rectangular crosssection and1 to F are their pushing forces. The object dynamics is described by the followingequations: (Wang, 2007)x2 =x1 ÷cFcos(y+a) (3.12).v2=y+cFcos(y+a (3.13)a2=a1+cF(3.14)where, F is the net force, F is the net torque, c1 and c, are coefficients,andy = arctg2(Fj/F)//xl x,663.5.3.4 Reward functionA common immediate reward policy is presented in this section. Specifically, each robotreceives the same reward after it pushes the object for a short period of time At. Thereward consists of the following three parts: (Wang, 2007)(1) The reward for pushing the object toward the goal locationRdISce= (DOld — Dnew )cd(3. 1 5)whereDOldandDneware the distances between the goal location and the objectcenter before and after taking the pushing actions, and c is a coefficient.(2) The reward for the rotational behavior of the objectRrotation = cos(Ia2 —a1I)with 0Ia2 —a1Ig(3.16)where, (a2 — a1)is the rotational angle of the object, when it is pushed by therobots for a period of time At. While a small rotation is encouraged, a largerotation will be punished because it will make handling of the object difficult forthe robots.(3) The reward for obstacle avoidance1i.o;f no obstacleRobsiacle = —— (3.17)1 .5(sin(ip’ / 2))where, i’ is the angle between the direction of the net force and the direction ofthe obstacle closest to the net force.Finally, the total reward is calculated as:R= WlRdistce+W2Rrotation+ W3Robstacle(3.18)Where w1 to w3are the weights andw1+w2 +w3 =1.0.673.5.4 Simulation ResultsThe simulation system is developed to validate the proposed modifiedQlearningtechnique. Microsoft Visual C++ language is used to program a learning-based multi-robot object pushing system, which has been described in the previous sections. Thedimensions of the simulated environment are 1000 x 700 units and the dimensions of theobject with rectangular cross section are 120 x 80 units. Gray color circles with a radiusof 2 units are used to represent the three robots, which are used for the simulation systemin the subsequent figures to push the object to the goal location. The object with arectangular cross section is large and heavy, and an individual robot cannot handle itwithout the help of its peers.During the simulation, based on the available environmental state, the robots sequentiallyselect their actions using the modified Q-learning algorithm. After selecting their actionssequentially, they begin to push the object after synchronization. The object is pushed fora short period of time At, and the same reward is given all the robots. Next, observingthe new environmental state, based on the reward and the new environmental state, therobots independently update their Q-tables. Next step of pushing starts, and so on, untilthe goal location is reached or the total number of steps exceeds the maximum limit.The object pushing is completed ones the object reaches the goal location or the totalnumber of steps for one round exceeds the limit. When a round of object pushing iscompleted, a new round of object pushing is started. In addition, the starting location ofthe object, the goal location, and the obstacles are selected randomly, every time the nextround of object pushing is started. However, the Q-table of each robot is inherited formthe last round. The simulation system is stopped after the total number of rounds reachesthe desired number.The parameter values of the simulation system are: /3=0.9, r = 0.9, i = 0.8,c =0.275, c, =0.0175,Cd=0.5, w1 =0.6, w2 =0.04, w3 =0.36, and 1=10. Elevenobstacles appear randomly in the environment during the simulation, and the maximumnumber of steps per round is 6000.683.5.4.1 Simulation Results for Object PushingTwo different object pushing snapshots under different obstacle distributions and goallocation are presented in Figure 3.10. Three robots represented by very small circles arelocated on the sides of the object. The eleven large blue color circles represent theobstacles and the red color circle represents the goal location in the environment.4ObjedTrajectyGoalLocation.Obacle.(a). Snapshot of a successfiil object pushing taskObstacleI.4StartingPoint69fib,(b). Another successful task with different obstacle distribution and goal locationFigure 3.10: Two object-pushing snapshots under different conditions.The three robots first cooperatively push the object upwards to the goal location in thetop right corner, as presented in Figure 3.10 (a). During the object pushing process, theobject reaches an obstacle in the transportation path. Successfully avoiding the obstacleafter several failed trials, the robots select to slightly push the object downwards to avoidthe obstacle. The object is pushed by the robots upward again to reach the direction of thegoal location. However, before the object reaches the goal location, it reaches anothercouple of obstacles. The robots again adjust their actions according to the newenvironment states and avoid the obstacles during transportation. Finally, the object issuccessfully pushed to the goal location.Figure 3.10(b) presents another object pushing task with a different obstacle distribution.Both these results demonstrate that the modifiedQlearning algorithm which is also asingle agent reinforcement learning algorithm, can be extended to multi-robot object.. ..70pushing in an effective and successful manner. In the process of pushing and obstacleavoidance, the robots demonstrate a flexible and adaptive sequential action selectionstrategy in a complex and unknown environment. In addition, the system employs a fullydistributed multi-robot decision-making system with local sensing capability.Consequently, it is robust and scalable, and may be easily employed in various complexenvironments and applications.It is also observed that the learning algorithm is not perfect in every respect. It can sufferfrom the lack of local sensing. However, the particular problem can be solved byincreasing the detection radius. That will lead to rapid expansion of the learning space,which can make the algorithm ineffective. Effectiveness of Modified Q-learning in a Multi-Robot Cooperative TaskAs mentioned before, immediate reward is a same common reward and it is distributedamong the robot. Because a robot’s reward is affected by the action of its peers, theassumption of a stationary environment is violated in the Modified Q-learning algorithmas well. This means the particular algorithm has less sensitivity in a multi-robotenvironment. However, it was possible for single agent reinforcement learning to learnthe coordination among robots in a purely cooperative multi-robot task (Yang and Gu,2004). This observation is verified with the multi-robot object pushing simulationspresented in this section. Additionally, some probability analysis results are used toevaluate the Modified Q-leaming in a multi-robot cooperative task as presented next.For the present simulations, the environment state is selected randomly for analysis,whose state value represents the binary number “1000000000” or decimal number “512.”Its corresponding Q-values for the six action candidates are also examined. According tothe coding protocol of the environmental states in section, the binary value of“1000000000” represents an environmental state, which is shown in Figure 3.11. Therobots are allowed to push the object continuously for 100,000 rounds to gain experienceand update their respective Q-tables.710\ActionGoat LocationNoOlFI0‘4DetectionRadius+—.—.—.-1-.0Action I ActionNo 05 No 060b0Figure 3.11: The environmental state with decimal value “512” and six pushingactions available to the robots.The Q-values under the state of”512”in 100,000 rounds of object pushing are recordedand shown in Figure3.12, and the probability density estimate of these Q-values is givenin Figure3.13.72Round number4x 1018161412100864200Figure 3.12: The Q-values under state of “512” in 100,000 rounds of object pushing.2 4 6 8 1073O .350.3ci)0.15ciiD20.100.050-5Q ValueFigure 3.13: Probability density estimate ofQvalues under the state of “512” in 100,000rounds of object pushing.When the environment is stationary, the Q-values in a single-robot system will usuallyconverge to some values with probability one (Yang and Gu, 2004). Figure 3.14 showsthat the Q-values increase rapidly from the initial values of zero to some positive valuesin a very early stage. Then the Q-values start to oscillate in a bounded area with an upperbound of 16 and a lower bound of 3. From Figure 3.14 and Figure 3.15 it is ratherdifficult to find which action is selected at higher probability under the state of “512.”The Q-values in Figure 3.14 appear to oscillate randomly and the curves overlap witheach other. Therefore, it appears that the robots do not learn any cooperative strategy inthe training process and only randomly select their actions. This phenomenon conflictswith the results in Figure 3.12.0 5 10 15 20740-0.2According to the sequential action selection strategy learned by the robots, the action)Irselection probability in each round is calculated using the equation P(a) =(s,a )/rFigure 3.16 presents the probability density estimate of the action selection probabilitywith the sample of 100,000 rounds of object pushing.1412ElO02Action 1Action 2Action 3Action 4Action 5Action 60 0.2 0.4 0.6 0.8Action Selection Probabilty1.2Figure 3.14: Probability density estimate of an action selection probability under the stateof “512” in 100,000 rounds of object pushing.Figure 3.14 gives the probability density estimate of sequential action selectionprobability under a special state in 100,000 rounds of object pushing. It is shown thatthere is a high probability that the robots select the action number Olin a low probability,and there is a lower probability that the robots select action number 04 in a lowprobability. In other words, action number 01 is selected by the robots in low probability75under the environmental state of “5 12”and action number 04 is selected by the robots inhigher probability under that particular state. According to Figure 3.11, it is observed thataction number 01 pushes the object away from the goal location, and action number 04pushes the object towards the goal without colliding with the obstacle. By comparingFigure 3.14 with Figure 3.11, it is easy to conclude that the robots learn the correctsequential action selection strategy when they cooperatively push the object to the goallocation. This explains why the robots in Figure 3.10 can cooperate to push the object tothe goal successfully in a complex and dynamic environment with random obstacledistributions.3.6 SummaryThis chapter addressed the application of machine learning to multi-robot decisionmaking. Most multi-robot operating environments, such as planet surfaces, are dynamicand unstructured where the terrain features and the obstacle distribution change withtime. Furthermore, even if the external physical environment is stationary, the overallsystem structure is still dynamic from the viewpoint of a single robot because otherrobots may take actions thereby changing the environment of that particular robotcontinuously. The environmental dynamics makes multi-robot decision-making quitecomplex, where the traditional task planning becomes ineffective because a plannedoptimal policy can become inappropriate a few minutes later due to changes in theenvironment. Therefore, multi-robot systems face some special challenges; for example,dynamic and unknown environments, which the traditional Al technologies cannoteffectively deal with.Reinforcement learning is commonly used to make decisions for a single robot due to itsgood real time performance and guaranteed convergence in a static environment. In thischapter, first, the single agent reinforcement learning was introduced for multi-robotdecision making. The basic idea of the single agentQlearning and its framework ofMarkov Decision Processes were introduced in section3 .2 and section 3.3. In section 3.4,a more challenging multi-robot environment was assumed, which is dynamic andunknown and has a more complex obstacle distribution. By observing human capabilities76of dealing with a dynamic environment, it is easy to conclude that a human employs notonly planning or reactive (behavior-based) techniques but also learning techniques tosuccessfully complete a task in such an environment. Through learning, a human learnsnew world states, finds optimal actions under these states, and improves his planning andreactive techniques continuously. Learning enables him to deal with unexpected worldstates, decrease uncertainties in the environment, and make his decision-makingcapability more robust in a dynamic environment. For such an environment, a distributedmulti-robot object pushing system based on the modified Q learning algorithm wasdeveloped. The simulation results showed that the robots still could complete the taskwell at considerable speed.A modified Q-learning algorithm, which is suitable for multi-robot decision making wasdeveloped. The simulation results showed that it has disadvantages when used to makedecisions in a multi-robot system, even though it can enable the robots to make gooddecisions and properly complete tasks in most of cases. Because it violates theassumption of a static environment, directly applying the learning algorithm to the multirobot environment will result in low convergence speed, and there is no guarantee thatthe algorithm will converge to the optimal policies. However, the developed learningalgorithm showed good performance with regard to more realistic of state identificationand action selection in a simulated multi-robot object-pushing exercise.77Chapter 4Pose Estimation for Multi-Robot Object Transportation4.1 OverviewA multi-robot system consists of several autonomous robots for carrying out a commontask cooperatively. This represents an important research area in robotics. Such systemsshould be able to monitor and learn their cooperative strategy. Because of their flexibleconfiguration, high efficiency, robustness, low cost and similarity to human groups, thereare numerous research issues and applications associated with them.Several autonomous robots share the environment in a multi-robot system. Therefore, theteam members need to know their exact position and orientation in the sharedenvironment. Not only that an individual robot needs to know its own latest pose(position/orientation), but also the pose of other robots and potential obstacles in theshared environment in order to make rational decisions. Therefore, pose estimation playsa vital role in a multi-robot system.There are many means to measure the pose of a robot or an obstacle; for example, usingdigital cameras, sonar, or laser distance finders. However, most multi-robot systemsemploy digital cameras for this purpose, which offer three key advantages. First, a digitalimage provides a rich source of information on multiple moving objects simultaneouslyin the operating environment. Second, there is a possibility to build fast and accuratevision subsystems at low cost. Third, robot cameras observe and understand theiroperating environment in a “natural” manner as how humans use eyes to observe theworld (Wang and de Silva, 2007). However, numerous obstacles exist in employing acomputer vision system. First, since the object detection has to be done through featureextraction from the image, if the lighting conditions of the environment change, the resultcan become inaccurate. Second, in the image capturing process, hidden features of theimage will not be in the image. Therefore, it will be difficult to detect an object occludedby another object. Third, different light sources may have different light intensity levels78from different directions in the background. In a practical environment, with differentlight sources and lighting directions, identifying an object can become difficult. Fourth,object detection using digital cameras becomes challenging when the actual orientation orthe extracted features are different from those employed in the training process. Fifth,during image processing with large amounts of training data, computational processingpower may not be adequate, and this will affect the overall performance of the system.Use of computer vision as a stand alone sensory approach in a dynamic environment todetect objects has been attempted by others. In particular, Kay and Lee (1991) usedimages from a stereo camera mounted on the end-effecter of a robot manipulator toestimate the pose of an object with respect to the base frame of the robot. The orientationof the object was estimated by using the least-squares method. A Kalman filter was usedto smooth the estimation error. Stone and Veloso (1998) studied a multi-robot soccersystem. In their work, they used a global camera to monitor the positions of the robotsand objects in the game. Simon et a!. (1994) developed a system which was able toperform full 3-D pose estimation of a single arbitrarily shaped rigid object. They used ahigh speed VLSI range sensor to acquire data on the object, and then performed poseestimation by fitting the data to a triangular mesh model using an enhancedimplementation of the iterative closest point algorithm. Veeraraghavan et al. (2003)developed a computer vision algorithm to track the vehicle motion at a trafficintersection. A multilevel approach using a Kalman filter was presented by them fortracking the vehicles and pedestrians at an intersection. The approach combined low-levelimage-based blob tracking with high-level Kalman filtering for position and shapeestimation. Maurin et al. (2005) presented a vision-based system for monitoring crowdedurban scenes. Their approach combined an effective detection scheme based on opticalflow and background removal that could monitor many moving objects simultaneously.Kalman filtering integrated with statistical methods was used in their approach. Chen etal. (2005) presented a framework for spatiotemporal vehicle tracking using unsupervisedlearning-based segmentation and object tracking. In their work, an adaptive backgroundlearning and subtraction method was applied to two real-life traffic video sequences inorder to obtain accurate spatiotemporal information on vehicle objects. Ekvall et a!.(2005) presented a computer vision approach for object recognition and pose estimation79based on color co-occurrence histogram and a geometric model. Even though theapproach was somewhat robust, it failed under large changes in lighting conditions. Parket al. (2006) presented a method for global localization of an indoor environment, whichemployed object recognition using a stereo camera. Their method of global localizationfirst estimated a coarse pose and then a refined pose. The coarse pose was estimated bymeans of object recognition and least squares fitting through singular valuedecomposition, whereas the refined pose was estimated by using a particle filteringalgorithm with omni-directional metric information. Even though a vision system hasbeen used for pose estimation, their work has a number of drawbacks. In particular, thepresented schemes are computationally expensive, time consuming, have time delays,and are susceptible to changes in lighting conditions. Lang et al. (2008) presented amultiple sensor based method for robot localization and box pose estimation. A CCDcamera mounted on the robot was used to find the box in its environment. A laser rangefinder mounted on the robot was activated to measure the distance between the lasersource and the object. Finally, homogenous matrix transformation was applied torepresent the global pose of the robot and the box. The approach was validated usingMicrosoft Robotics Studio simulation environment.In a multi-robot object transportation task, multiple mobile robots move quickly in anunpredicted manner and the vision system needs to capture their positions andorientations within a very short time. Therefore, the conventional time-consuming visionalgorithms are not feasible here. These algorithms are too complex and computationallydemanding for meeting real-time constraints in a multi-robot object transportationsystem.On the other hand, in an object transportation task, the multiple mobile robots move in alarge area that has different levels of illumination. Multi-robot systems usually work inlarge, unstructured, and unknown environments with uneven lighting conditions. Therobots usually move into and move out of sub-areas having different brightness levels. Inorder to track the robots effectively in such an environment, the vision system must berobust with respect to different illumination conditions. However, most of the existingalgorithms do not consider this problem.80Object and obstacle recognition and pose estimation are essential for an objecttransportation system with mobile robots. Therefore, computer vision alone is not anadequate solution for the detection of objects and potential obstacles. Sensor fusion canbe proposed as an alternative to overcome the problems mentioned above. This chapterinvestigates an approach for pose (position and orientation) estimation of an object orpotential obstacle using a laser range finder and a CCD camera mounted on a mobilerobot. In the next section, the multi-robot transportation system in the laboratory isintroduced. In Section 4.3, the proposed pose estimation approach is presented. Section4.3.1 presents the robot global localization. Section 4.3.2 outlines color blob tracking forobject recognition. Section 4.3.3 presents object pose estimation. An experimentalevaluation of object pose estimation is given in Section 4.4. The conclusions of thechapter are given in Section Multi-robot Transportation SystemThe primary objective of the multi-robot object transportation project in the IndustrialAutomation Laboratory (IAL) is to develop a physical multi-robot system where a groupof intelligent autonomous robots has the ability to work cooperatively to transport anobject to a goal location and orientation in an unknown and dynamic environment.Obstacles may be present or even appear randomly in the work environment during thetransportation process. Three or more autonomous mobile robots are present in thesystem, which enable cooperative transportation of an object to a goal location. Duringthe course of the transportation, they have to interact with each other to develop thecoordination and cooperative strategy and decide on the feasible transportation locationsand suitable action positions selected by each robot so that the object is transportedquickly and effectively while avoiding obstacles that may be present during thetransportation process. Furthermore, the system may have to consider avoiding damage tothe transported object. Charge coupled device (CCD) camera systems, laser range findersand sonar are used to monitor and measure the current location and orientation of theobject. A schematic representation of the initial version of the developed system is shownin Figure 1.2 and is repeated in Figure 4.1.81Figure 4.1: The developed physical multi-robot system.The dynamic and unpredictable work environment contains fixed obstacles and movableobstacles, which may appear randomly. The robots and the sensory system are separatelylinked to their host computers, which are connected though a local network to implementcomplicated controls and machine intelligence. The robots, the camera, and the sensorsare separately linked to their host computers, which are connected through a localcommunication network, to implement complex controls and machine intelligence.It is important to consider three major challenges in the development of the system,which are also major challenges in multi-robot systems. The first one is the dynamic andunpredictable nature of the environment. Especially, dynamic random obstacle arescattered within the working environment of robots. The obstacles may appear anddisappear randomly during the transportation process. In addition, there are multipleTCP/IPServer82robots working in parallel. While one robot makes decisions, other robots may havechanged the environment according to their behaviors. This also makes the environmentdynamic from the viewpoint of the first robot. As a result, the decision making processbecomes very complex for the robots within the dynamic environment. A majorchallenge is to make decisions in real-time while identifying hidden or invisible featuresof objects and the surrounding environment conditions from the local viewpoint of therobots.The second challenge results from the local sensing capabilities of the robots. In thepresent project, it is assumed that each robot is capable of detecting an object or anobstacle within a limited detection radius. It means that the robots possess local sensingcapabilities only and therefore the global environment is unknown to them. Robots do nothave an understanding about the location or the shape of an obstacle until a robot movesclose to the obstacle. The unknown environment is a major challenge in the multi-robotdomain because the robots do not have a complete understanding about the environmentin advance, partly due to the unfamiliarity and partly due to the dynamic nature. Thus thesystem cannot utilize the traditional planning technologies for decision making.The third challenge is the lack of a reliable communication capability among robots,where the disturbances are represented by white noise. Due to unreliable communication,the robots may end up selecting a wrong coordination strategy. Improvement of therobustness of multi-robot decision-making in an environment with unreliablecommunication is a challenging issue in the multi-robot domain.4.3 Pose Estimation TechniquePose estimation in real time is a fundamental requirement in multi-robot cooperation forobject transportation. Though there has been substantial growth of research activity inpose estimation of a robot, the pose estimation of objects and potential obstacles has notreceived the much needed attention. Estimation of the Cartesian coordinate position (x, y)and orientation (8) of an object can be used in diverse applications like object graspingand robotic manipulation, motion prediction, and object transportation. This chapterpresents an approach to acquire meaningful data on distance and orientation of an object83using the laser range finder of a mobile robot for calculating the pose (center andorientation) of the object with respect to the robot in the global coordinate system. Thedeveloped methodology as presented here is an integral part of the multi-robotcooperative object transportation system. A group of mobile robots explore theenvironment to detect a color coded object. Once the object is detected, that particularrobot estimates the object pose. This includes the center and the orientation of the object.However, the object of interest is large and heavy, and cannot be transported by a singlerobot. Therefore, a team of robots in the multi-robot system is called upon tocooperatively transport the object. Figure 4.2 presents the general scheme of poseestimation of an object for cooperative object transportation.According to the proposed method, successful transportation constitutes six steps, asoutlined next.Initialization of all robots: Robot global localization. Local localization of all robots aretransformed into global localizationSearch for an object: Robots start searching for an object of interest to be transported tothe goal location and orientation.Rearrange of robot pose: Once the robot in the multiple robot system finds a color codedobject, it rearranges its pose to center the color blob in the camera frame.Object pose estimation: The particular robot, which found the color blob, estimates thepose of the object and takes over the leadership in the rest of the state identificationprocess. The particular robot assigns tasks to the member robots in the system during thestate identification stage.Obstacle pose estimation: The other robots in the system identify the local obstacledistribution around the object and build the state binary code of themselves. Then thesystem determines suitable points of contact (action points) with the object fortransporting it.84Transportation of the object: Multiple robots transport the object by pushing the object ofinterest. The process is repeated until the object reaches the goal location and orientation.CInitialize RobotsC____//:4 7 EncoderRobot Global Localization //---I Search Object-— NoIs Object FoundRearrange Robot Pose- :-:-Object Pose Estimation 4- ZLTransport the ObjectFigure 4.2: General scheme of pose estimation for cooperative object transportation.4.3.1 Global Localization of RobotGlobal pose estimation is required since there is more than one robot in the environmentof cooperative object transportation. An important means of estimating pose of a robot isodometry. This method uses data from the shaft encoder mounted on a wheel of therobot. The encoder measures the speed and the displacement of the drive wheel after aspecific time step and is added to the pose of the robot in the previous time step. Withreference to Figure 4.3, the kinemati model of the mobile robot is expressed by the85///Camera/WanderLaser RangeFinderrelations presented next. The state vector representing the pose of the mobile robot in theglobal frame is given by:q(k) = [x(k) y(k) 8(k)]T , where x(k) and y(k) are the coordinates of position P in mm,and 8(k) is the orientation in degrees. Also, D(k) is the distance traveled between thetime steps k and k+]; v1(k) is the robot translational speed in mm/s; T is the samplingtime in seconds; 8(k) is the angle between the robot and the global x-axis; A 8(k) is therotation angle between time steps k and k+1;0L(k)and0R(k) are the angular velocitiesof the left and right wheels, respectively; r is the radius of the two drive wheels; and d isthe distance between the two wheels. The kinematic model of the mobile robot is givenby:x(k + 1) = x(k) + D(k).cosO(k + 1) (4.1)y(k +1) = y(k) + D(k). sin 8(k +1) (4.2)8(k +1) = 8(k) + A8(k) (4.3)D(k) v (k).T (4.4)A8(k) = o(k).T (4.5)v (k)= COL(k).r + coR(k).r(4.6)2o.(k)=WR(k).r— ‘L(k).r(47)dIt follows that the updated pose state vector is q(k + 1) = [x(k +1) y(k + 1) 8(k+ 1)]TBy using equations 4.1 through 4.7, the robot global pose q(k +1) for a wheeled robot iscomputed as:86x(k) WL (k)± R(k)cos0(k) +E(aiR(k)- O)L(k)))q(k + 1) = y(k)+aiL(k)±(k)sin0(k)+ L(aiR(k)- O)L(k))(4.8)aiR(1’)— O)L(k)0(k)2XRxl k)Figure 4.3: Global Pose Estimation of Wheeled Robot.4.3.2 Color Blob TrackingThe phenomenon of color blob template matching is utilized for color blobtracking. Theapproach presented in this section entails detection of the object to be transported withthe help of two different color blobs attached to its vertical surfaces. For thispurpose theAdvanced Color Tracking System (ACTS) capability is employed. ACTS is a softwaretool which in combination with a color camera, allows the application to track up to320colored objects at a speed of 30 frames per second.The robots explore the surrounding environment within the sensory range and a robotreceives a stimulus if it locates an object with the color blob. As it locates the object,therobot rearranges its pose so as to move the color blob attached to the object to the centerof the camera frame. As illustrated in Figure 4.5, the camera frameis divided into fourquadrants. Here, the size of the camera frame is (640x480) pixels, while the center ofthe camera frame is represented by a square of(40x40) pixels. If the color blob falls87within the square, it is believed to be in the center. Otherwise the robot rotates its base sothat the detected color blob is approximately located at the centerline of the cameraframe. By centering the color blob in the camera frame, a suitable position is providedfor the Laser range finder to scan the object for pose estimation. A flow diagram of theabove mentioned process is shown in Figure 4.4.Figure 4.4: flow diagram of color blob tracking\i:s1ove Robotto centre theblobNo‘488(0,0) (640,0)o oSecond QuadrantfThird Quadratit40o 0First Quadrant Fourth Qu acirant(0,480)4(640,480)Figure 4.5: Division of Camera Frame into Four Quadrants.4.3.3 Object Pose EstimationAs noted in section 4.3.1, once the robot rearranges its pose to center the color blobwithin camera frame, the laser range finder that is mounted on the robot is activated toestimate the pose of the object. Figure 4.6 indicates how the laser range finder is used tomeasure the range. Specifically, the distance is determined by calculating the phasedifference between the incident and reflected signals. If the transmission time from A toB is T, phase difference isq,and the modulation frequency isI’m’then the distancebetween A and B is given by: L = c . The pose of the object is estimated relative tothe robot, which is then transformed into global pose estimation (Gao and Xiong, 2006).This is needed because the robot may have to communicate the object pose to otherrobots in order to transport the object cooperatively.89///I- Reflection RayLJiidniRay____Figure 4.6 (a): Schematic drawing of laser range sensor.Figure 4.6 (b): A 1 80-degree laser range sensor. Relative Pose EstimationWith reference to Figure 4.7, data on distance and angle are acquired through the laserrange finder to calculate the center and the orientation of the object relative to the robotcoordinate system.Figure 4.7: Relative Object Pose Estimation.‘VCB90Note that a and /3 are the angles corresponding to the distances d1 and d2, respectively.Also, d1= AOR,d2= BOR;ORandXRare robot coordinates; and d1 and d2 are thedistances from laser range finder to two distinct edges of the object. We have:dlcosa=ORX3(4.9)d2 cos,6= ORXI(4.10)d1sina=AX3 (4.11)d2sin/3=BX1 (4.12)By using equations 4.9 through 4.12, the angle which represents the orientation of theobject is given by:‘=tan’d2sinfl—d1sina(4.13)d1 cosa — d2 cos/3Object orientation can be found by using equation 4.13.The center point of the object°Bis given by:cosa + d2 cos/3 +2(dsinfl — d1 sina)]![d1sina + d2 sinfl +2(d1cosa — d2 cosfi)]xc= cosa +d2cosfl+-4(dsinfl—d1sina)] (4.14)“c=-i[dlsina+d2sinfl+-42(djcosa—d2cosfl)] (4.15)Equations 4.13 through 4.15 represent the object pose relative to robot pose, and can begiven by the vector:91o=[xY(4.16)Equation 4.16 describes the object with respect to the mobile robot, and is called therelative pose.Figure 4.8 shows the results of the laser range finder using MobleSim simulator. Thereare six objects within the laser scanner according to the results given in Figure 4.9.Because the object with a color blob label should be in the center of the laser range scanafter applying the color blob tracking in Section 4.3.2, it is very easy to establish that theobject must be in the range between - 14° and 32°.Fig. 4.8 (a) MobileSim simulator visualized results from laser range finder.iiFig. 4.8 (b) Laser range finder results using MobileSim simulator.924.3.3.2 Object Global EstimationA homogenous transformation matrix (Spong et a?, 2006), which represents thetranslational and rotational motion of one frame with respect to another frame, is appliedto represent the relation between the three coordinate systems: the robot coordinatesystem, the object coordinate system, and the global coordinate system.The homogeneous transformation matrix between the object coordinate system and therobot coordinate system can be expressed by using the result obtained in section 0 1The robot global pose is known from equation 4.8 in section 4. The homogeneoustransformation matrix between the robot coordinate system and the global coordinatesystem can be established as:cos 8T”=tsin80—sin8 Xcos8 Y0 1By using equations 4.17 and 4.18, the homogeneous transformation matrix T between theobject coordinate system and the global coordinate system can be computed as:cos8T T”.T’ = sin80—sin8cos80X cos8’ —sin9 XY . sin8’ cos9 }1 0 0 1—sin+O’)Xcos8—}sin8+xsin+9) cos+9)XsinO+}cos&+yT’= cos8’Y(4.17)(4.18)T:0 0 1(4.19)93From equation 4.19, the pose of the object in the global coordinate system is determinedas:xXcos8—Y,sin8+xo”=y = xsin8+Ycos8+y8 A tan(sin(8 + 8’)! cos(8 + 8’))4.4 Experimental ResultsIn this section the experimental results are presented using the test bed described insection 4.4.1. Experiments are conducted under two different situations. In the first set ofexperiments, the robot is kept stationary and the objects are placed at different positionsand orientations. In the second set of experiments, the robots move, and once an object isdetected by a robot it calculates the pose of the detected object. In both sets ofexperiments the objects are placed in different poses, as indicated in Figure 4.9.—5\ —----• Position 01I Position 02/RobotFigure 4.9: Arbitrary layout of objects.944.4.1 The Test-bedA test bed has been developed in the Industrial Automation Laboratory (IAL) for multi-robot transportation, as shown in Figure 4.10. It includes one four-wheel-drivenActiveMedia PioneerTM 3-AT robot and two three-wheel-driven ActiveMediaPioneerTM 3-DX robots. All three robots include wireless Ethernet, eight sonar sensorsand a gyro. Two robots include a CCD camera and a laser range finder. Each robothouses a Pentium based computer running windows 2000. The laser range finder in thepresent system uses a SICK LMS 200 2D scanner, which has a horizontal range of 180°with a maximum resolution of0.5°, which is used to monitor the positions andorientations of the object, potential obstacles and mobile robots. The device producesrange estimation using the time required for the light to reach the target and return(Nourbaksh et at, 2004).Figure 4.10: The experimental system in IAL at UBC.954.4.2 Experiments with Stationary RobotTwo sets of experiments are conducted in the category of stationary robot. In the first setof experiments, the robot is placed at the global position [0, 100, 0], which gives the Xcoordinate, Y-coordinate, and orientation, respectively. The seven test points in Table 1represent an object having dimensions 525mmx257mmx446mm, placed in theworkspace at seven different positions and orientations. First the actual center of the testpoint is measured, which is compared with the center estimated through the laser rangefinder that is mounted on the robot. Figures 4.11, 4.12 and 4.13 correspond to theinformation in Table 4.1. Specifically, Figure 4.11 gives the difference between the actualand the estimated values of the x-coordinate of the center of the object. Figure 4.12represents the difference between the actual and the estimated values of the y-coordinateof the center of the object. Figure 4.13 indicates the difference between the actualorientation and the estimated orientation of the object.Table 4.1: The actual and estimated object pose results in the first set of experiments witha stationary robot.Test X- 0-Points actual estimated error %error actual estimated error %erroractual estimated error %error1 248 286.36 38.36 15.47 2095 1985.69 109.31 5.22 16 14.17 1.8311.442 -219 -241.24 22.24 10.16 1184 1150.34 33.66 2.84 -12 -13.08 1.0893 943 929.25 13.75 1.46 1460 1324.37 135.63 9.29 3836.96 1.04 2.744 1471 1387.21 83.79 5.7 463 350.17 112.83 24.37 70 68.661.34 1.915 -1364 -1370.01 6.01 0.44 538 536.67 1.33 0.25 -66-66.84 0.84 1.276 -1368 -1388.36 20.36 1.49 1354 1317.24 36.76 2.71 -32-33.97 1.97 6.167 -710 -727.02 17.02 2.4 2298 2274.81 23.19 1.01 -7-9.31 2.31 3396LFigure 4.11: The x-axis error from Table 4.1.Figure 4.12: The y-axis error from Table 4.1.200015001000I [[5OOO500-1000-1500D X- actual• X - estimatedDX- error-2000Test Point25002000Iii1500E10005001IIIDY-actual• Y- estimatedDY-errorki[01 2 3 4 5 6 7Test Point9780Figure 4.13: The orientation error from Table 4.1.In the second set of experiments the robot global pose is changed to [40, 105, 60]. Thedimensions of the object in this experiment are 525mmx257mmx446 mm. Table 4.2presents the result for the second experiment, and Figures 4.14, 4.15, and 4.16 show theerror results corresponding to the information given in this table. The same parameters asin the previous experiment are used. In both sets of experiments, with a few exceptions,the % error is well within an acceptable range, which is 100 mm in the consideredapplication.EE00C006040200-20-40-60-80C theta- actual• theta - estimatedC theta - errorTest Peint98Table 4.2: The actual and estimated object pose results for the second set of experimentswith a stationary robot.Test X- X- Y- Y- 8- 6-Points actual estimated error %error actual estimated error %error actual estimated error %error1 248 266.56 18.56 7.48 2095 1964.69 130.31 6.22 16 13.06 2.94 18.382 -219 -169.26 49.74 22.71 1184 1145.34 38.66 3.27 -12 -13.54 1.54 12.833 943 931.65 11.35 1.2 1460 1303.52 156.48 10.72 38 36.56 1.44 3.794 1471 1364.56 106.4 7.24 463 327.65 135.35 29.23 70 67.66 2.34 3.345 -1364 -1293.14 70.86 5.2 538 534.66 3.34 0.62 -66 -67.67 1.67 2.536 -1368 -1339.67 28.33 2.07 1354 1299.93 54.07 3.99 -32 -34.86 2.86 8.947 -710 -692.93 17.07 2.4 2298 2258.81 39.19 1.71 -7 -10.21 3.21 45.86HJuiFigure 4.14: The x-error from Table 4.2.EEa)C.,U,2000150010005000-500-1000-1500-2000c X- actual• X- estimatedDX- errorTest Point99Figure 4.15: The y-error from Table 4.2.LI34’J4’DthetaerrOrFigure 4.16: The orientation error from Table 4.2.2500 —20001500100050:II:DY-actual• Y- estimatedDY-errorlull1 2 3 4 5 6 7Test Point8060402001-20-40-60-80Test Point1004.4.3 Experiments with a Moving RobotTwo sets of experiments are conducted in this category as well. Here, robots explore theenvironment in searching for an object to be transported. Once a robot finds the colorcoded object, it rearranges its pose to center the color blob within the camera frame. Thenthe laser range finder is activated to estimate the orientation and the center of the object.In the first set of experiments, an object having dimensions 430x537x594 is used.Seven different experiments are carried out. Table 4.3 presents the results of theexperiments. Figures 4.17, 4.18, and 4.19 correspond to this table, which show thedifference between the actual and the estimated x-coordinate, y-coordinate, andorientation, respectively, of the center of the object.Table 4.3: The actual and estimated object pose results for the first set of experimentswith a moving robot.Test X- X- Y- Y- 8- 8-Points actual estimated error %error actual estimated error %error actual estimated error %error1 1318 1384.52 66.52 5.05 3455 3514.78 59.78 1.73 30 28.57 1.43 4.772 2467 2543.12 76.12 3.09 4047 4149.39 102.39 2.53 15 12.95 2.05 13.673 1205 1237.39 32.39 2.69 6147 6219.28 72.28 1.18 45 44.21 0.79 1.764 2745 2790.32 45.32 1.65 6123 6183.43 60.43 0.99 -15 -16.43 1.43 9.535 1465 1487.45 22.45 1.53 7442 7543.27 101.27 1.36 -60 -61.25 1.25 2.086 2063 2092.14 29.14 1.41 8522 8632.02 110.02 1.29 -30 -31.37 1.37 4.577 2842 2914.42 72.42 2.55 9712 9828.21 116.21 1.2 -75 -76.65 1.65 2.21013500Figure 4.17: The x - error from Table 4.3.Figure 4.18: The y - error from Table 4.3.300025002000C)C.,, 1500U,100050[i[ IL IIFII—‘UII:_X- actual•X-estimated— X-error1 2 3 4 5 6 7Test Point12000 -______________________________10000E8000_________DY-actualI______ ____ ________________:rTaHTest Point10260I 2 3 •.Figure 4.19: The orientation error from Table 4.3.In the second set of experiments, an object having dimensions 590mmx726mmx171 5mm is used. Table 4.4 presents the results from these experiments. Figures 4.20,4.21, and 4.22 correspond to this table, which show the difference between the actual andthe estimated x-coordinate, y-coordinate, and orientation, respectively, of the center of theobject. Here too the error readings are well within acceptable range.Table 4.4: The actual and estimated object pose results for the second set of experimentswith a moving robot.Test X- X- Y- Y- 8- 6-Points actual estimated error %error actual estimated error %error actual estimated error %error1 1321 1356.43 35.43 2.68 3486 3509.52 23.52 0.67 30 28.27 1.73 5.772 2488 2502.32 14.32 0.58 4023 4112.23 89.23 2.22 15 13.95 1.05 73 1223 1235.43 12.43 1.02 6123 6179.72 56.72 0.93 45 43.77 1.23 2.734 2724 2787.21 63.21 2.32 6147 6184.17 37.17 0.6 -15 -16.43 1.43 9.535 1437 1462.76 25.76 1.79 7412 7513.23 101.23 1.37 -60 -62.24 2.24 3.736 2040 2072.34 32.34 1.59 8541 8612.98 71.98 0.84 -30 -32.06 2.06 6.877 2825 2903.59 78.59 2.78 9703 9812.73 109.73 1.13 -75 -77.06 2.06 2.75I—ri,lU40200-20-40-60EE0C).5U,D theta- actual• theta - estimatedO theta - error-100Test Point103—Figure 4.20: The x-axis error from Table 4.4.Figure 4.21: The y-axis error from Table 4.4.35003000IID X actual- •X-estimatedi i,...,rjr[:j,oxeorTest Point1200010000E8000E60004000200:111HHL1[1 2 3 4 5 6 7Test Point104By comparing the % error of the results obtained when the robots were stationary withthose when the robots were moving, it is seen that the latter results are better. When arobot autonomously explores the surroundings and finds an object, it stops at a suitabledistance from the object, rearranges its pose to centre the color blob within the cameraframe and then estimates the pose of the object. On the other hand, in the case of astationary robot, the object is placed in front of the robot, and the associated distance maynot be a suitable. Consequently, human error may have contributed to the enhanced %error in pose estimation with a stationary robot.4.5 SummaryA method for pose estimation of an object for application in cooperative objecttransportation by mobile robots was developed in this chapter. A CCD camera, opticalencoders and laser range finders were the sensors utilized by the robots. In order totransport the object by robots, first robot global localization was established. Next, theCCD camera was used to find the object in the work environment. Once the object wasidentified by using a color blob tracking approach, the robot rotated its base to move thecolor blob into the center of the camera frame. Finally a laser range finder was used to6040200a)-20-40-60-80C theta- actual• theta - estimatedO theta - error-100Test PointFigure 4.22: The orientation error from Table 4.4.105scan the object and to determine the distance and the orientation angle. The developedapproach was thoroughly tested using a series of physical experiments.106Chapter 5Experimentation in Distributed Multi-Robot CooperativeObject Transportation5.1 OverviewMulti-robot object transportation is an active research area in the robotics community.The particular multi-robot system as developed in our laboratory (Industrial AutomationLaboratory - JAL at the University of British Columbia - UBC) consists of threeheterogonous multiple robots. It is intended to transport an object of interest to a goallocation cooperatively in both static and dynamic environmental conditions. There arepractical and theoretical issues, which need to be addressed for realizing the cooperativetransportation task. First, techniques such as the modifiedQlearning algorithm and themulti-robot coordination framework in a multi-agent environment are used in thedeveloped cooperative transportation system. These techniques deal with important issuesthat arise in a multi-agent environment. The related issues include task assignment,dynamic environment, and distributed control. Second, though multi-robot transportationitself applies in industrial projects and scientific explorations, it faces some challenges,which are addressed in the present thesis. As a rationale for a possible practicalapplication, it is desirable to have multiple autonomous robots transport an object ofinterest in a dangerous or hazardous environment, rather than employing human operatorsfor it.In this chapter, the techniques developed in the previous chapters, such as the hierarchicalmulti-robot architecture, multi-robot coordination mechanism, the modified Q learningalgorithm, and the pose estimation technology, are integrated to implement a physicalmulti-robot transportation system operating in a dynamic and unknown environment.These techniques are tested and validated with the available mobile robots in IAL at UBC.1075.2 Test BedAn experimental system has been developed in the JAL at UBC to implement aphysicalmulti-robot object transportation system. An overview of this system is presentedinFigure 5.1, which is a repetition of Figure 1.2.-Figure 5.1: The multi-robot object transportation system.Three autonomous mobile robots are used in the developedsystem to transport a heavyand long rectangular shaped object to agoal location in the testing area of our laboratory.According to the environment in the testing area, a simple globalcoordinate system isemployed to validate the techniques developed in the previous chapters.In the beginningof the operation, each robot is informed of theirinitial position and orientation withrespect to the global coordinate system.It estimates the latest pose (position andorientation) of a robot by recording and analyzing the data of thewheel encoders and thedata of its compass sensor while it moves in the environment.Each robot in the systemWired/wirelessServerRouterTCP/IP108transmits continuously the current pose via the wireless network. In addition, the robotsare assumed to know the global coordinates of the goal location in advance.The object and the obstacles are randomly placed within the environment. When themulti-robot system begins to operate, robots have to search the environment in order todetect the object. Once a robot in the system detects the object, it will estimate the poseof the object. Then the particular robot will transmit the object pose to the system and thisrobot will act as a leader of the group for that particular instance. The leader assign taskto the other robots for detecting obstacles in the environment. The task of detection andfeature estimation of objects and obstacles is accomplish by fusing the sensory data fromthe sonar, laser range finder and CCD camera of a robot. The sensors are assumed to onlydetect obstacles within the detection radius of 2 meters, and a robot is able to exchangeits sensory information with its peers via wireless communication to establish a binaryword for state identification. The robots possess local sensing capabilities only andtherefore, they are able to know only a local area bounded by the detection radius of theenvironment.Two color blobs are present on the object of interest. One color blob is placed on onelateral side and on its opposite side. The other color blob is on the adjoining side and itsopposite side. After a robot detects the color blobs using its CCD camera, the robotrearranges its pose to estimate the pose of the object by fusing information from the sonarand the laser range finder of the robot. If an object without any color blobs is detectedwithin the environment, it will be regarded as an obstacle. Large and heavy objects withtheir color blobs are shown in Figure 5.2.,Color Blob____(a) (b)Figure 5.2: The color blobs used to identifi the orientation of an object.109Each robot makes decisions independently to build the state binary code utilizing thepose information of the objects and obstacles. Robots need to exchange this informationwith their peers in order to form a cooperation strategy and for making decisions. Thissubject has been addressed in Chapter 3.The developed system consists of a computer server, which is used to synchronize theactions of the robots. However, it does not serve as a centralized decision maker becausethe system is fully distributed where each robot makes decisions independently.The mobile robots are manufactured by ActivMedia Robotics, which is an establishedcompany in the mobile robot industry. One four-wheel-drivenPioneerTM3-AT robot andtwo three-wheel-drivenPioneerTM3-DX robots are used in the cooperative objecttransportation system developed at our laboratory (IAL). They represent an agile,versatile, and intelligent mobile robotic platform with high performance currentmanagement to provide power when it is needed. Built on a core client-server model, theP3-DXIAT robots contain an embedded Pentium III computer, opening the way foronboard vision processing, Ethernet-based communication, laser sensing, and otherautonomous functions. The P3 robots store up to 252 watt-hours of hot-swappablebatteries. They have a ring of eight forward sonars and ring of eight rear sonars. Theirpowerful motors and 19 cm wheels reach speeds of 1.6 mIs and carry a payload of up to23 kg. In order to maintain accurate dead reckoning data at these speeds, the Pioneerrobots use 500-tick encoders. Its sensing and accessories include laser based navigation,bumpers, gripper, vision, compass and a suite of other options. A P3-DX robot and a P3-AT robot are shown in Figure 5.3 and Figure 5.4, respectively.Figure 5.3:PioneerTM3-AT mobile robot.110Figure 5.4:PioneerTM3-DX mobile robot.The P3-DX/AT robots with the ARIA software have an ability to:• Wander randomly• Plan paths with gradient navigation• Provide C/C++ development platform• Localize using sonar and laser distance finder• Display a map of its sonar and/or laser readings• Drive under control of software, keys or joystick• Communicate sensor and control information related to sonar, motor encoder,motor controls, user I/O, and battery charge data• Simulate behaviors offline with theMobileSimTMsimulator, which accompanieseach development environment.The robot can building maps and constantly updates its position using its Laser Mappingand Navigation System and MobileEyes, within a few centimeters while traveling within111mapped areas. The user is able to observe the view of the robot remotely, speak, and hearaudio, with appropriate accessories and send the robot on patrol (wandering).In summary, the Pioneer-3 DX or AT robots are all-purpose robots, used for research andapplications involving mapping, tele-operation, localization, monitoring, reconnaissance,vision, manipulation, cooperation and other behaviors.5.3 Physical Multi-Robot Transportation SystemA physical multi robot transportation system has been developed to integrate variousapproaches developed in chapters 2 though 4. The system has local sensing capabilitiesonly. An experiment is carried out to validate these approaches in a real environment inthe presence of sensor noise.In this experiment, three mobile robots equipped with the modifiedQlearning algorithmare employed to test the cooperative transportation strategy of a large and heavy object.The object is to be transported from one end of the environment to a predetermined goallocation in the presence of an obstacle.The simulation system described in Chapter 3 is employed to train the robots, before theexperiment is carried out. In this manner, the Q-tables of the robots are improvedaccording to the environment model. After 10,000 rounds of simulated object-pushing,the Q-tables of the three simulated robots are exported to the three physical robots tocomplete a physical cooperative object-pushing task.In the simulation system described in Chapter 3, after each step of object-pushing, therobots will identifi the new environment state and select the corresponding actionsaccording to the MQLA. The same policy is employed in the experimental system.However, a larger distance is selected as the detection radius for the physical multi-robotobject-pushing system in comparison to the simulation system.The experimental results of cooperatively transporting an object in a real environment arepresented in Figure 5.5.112(a)(b)(c)113t71[(J)()(p)(g)I(h)(i)115a)Figure 5.5: Multi-robot cooperative transportation.The three mobile robots, the large object and an obstacle have been employed in ourexperimental setup. The three mobile robots are informed about their initial pose(positions and orientations) with respect to the global coordinate system before theybegin their work. As shown in Figure 5.5 (a), first a large object is placed in the testingarea of our laboratory. When the system operation starts, each robot uses its CCD camerato wander within the testing area and to detect the color blob on the object surface.Figure 5. 5 (b) shows one of the robot in the system detecting the object. If a robot amongthe three robots detects the color blob on the object surface, then that particular robotrearranges its current pose so as to center the color blob in its camera frame. Figure 5. 5(c) shows the robot, which detects the object, has rearranged its current pose to center thecolor blob in its camera frame. After that, the pose of the object is estimated relative tothe current pose of the particular robot using the sensory data from its laser range finder.Figure 5.5 (d) shows the robot, which detects the object, goes to estimate the pose of theobject. By fusing relative pose of the object, the robot can estimate the pose of the objectin the global coordinate system. This information is then transmitted to the other robots inthe group. At the same time, the robot that detected the object will be assigned theleadership of the group and all other robots in the system will transfer their decisionrights to the coordination unit of the leader robot through their arbitrators. After all therobots have transferred their decision making rights to the leader robot, it will assign116tasks to peer robots in the system in order to assist them in identifying the obstacledistribution around the object. Figure 5.5 (e) shows how all the robots in the system try toidentify the obstacle distribution around the object. The leader robot transmits allprocessed sensor information to the system. Then the robots will be able to establish thestate of the current local environment individually by fusing the information of the objectpose with the information of local obstacle distribution. Local environmental state isestablished by the robots, and the optimal action under this state is determined using itsQ-table. Figure 5.5(0shows how the robots push the object with the selected actions.If none of the robot in the system can find the object, those robots will wander in theenvironment until one of them detects the object.While each robot tries to identify the existing obstacles within its detection area,simultaneously it scans its local environment utilizing the laser range finders. If an objectis detected within this area, the robot needs to verify whether it is an obstacle or it is apeer robot.Figure 5.5 (g) and (h) show the robots detecting an obstacle (the blue garbage bin) in thepath and avoiding that obstacle. This obstacle had not been detected before by the sensorsof the robots due to the limited detection radius.Figure 5.5 (f) and (g) show that the robots have changed their formation to adapt to thenew environment state. Here they attempt to change the orientation of the object in orderto avoid the obstacle.Figure 5.5 (g) and (h) show that the robots have avoided the obstacle successfully. Figure5.5 (i) and (j) show that the robots effectively have completed a cooperativetransportation task in an environment with one unknown obstacle. However, there is anerror due to wheel slip, which corresponds to the difference between the actual locationof the robot and its desired location.From Figure 5.5 (a)-(j), it is observed that the learning-based multi-robot system is quiteeffective in carrying out a cooperative transportation task in an environment with117unknown obstacles. The learned Q-tables in the training stage help the robots to selectgood cooperation strategies in a robust and effective manner.5.4 SummaryA fully distributed multi-robot transportation system is physically implemented in theIndustrial Automation Laboratory (IAL) at the University of British Columbia (UBC),which integrated the approaches developed in chapters 2 through 4. In the system, threemobile robots were employed to push a heavy object to a goal location in the laboratoryenvironment. There were three challenges to achieve the goal in the developed system.First, each robot only possesses local sensing capability, which means they have to makedecisions while they do not possess knowledge of the global environment. Second, theexperimentation system presents sensor noise and uncertainty of actions, which did notexist in the simulation system. These constraints degrade the performance of the robots.Finally, the real environment is dynamic due to the presence of random obstacles. Thedistributed computing using C++ was introduced into the developed multi-robot system,which enabled the robots to cooperate in an easy and efficient manner.An experiment was carried out to evaluate the performance of the developed distributedmulti-robot system. In this experiment, cooperative robotic transportation of a long andheavy object in an environment with unknown obstacles was carried out. Theexperimental results showed that the developed multi-robot system has the ability to workwell in a dynamic environment, with sufficient accuracy. It can be concluded that theapproaches developed in chapters 2 through 4 are useful and effective in enabling amulti-robot system to operate in a dynamic and unknown environment.118Chapter 6Conclusions6.1 Primary ContributionsMulti-robot systems have to undergo many improvements before they can be used in reallife applications. For example, the challenges that have to be faced are enormous indeveloping a system of multiple autonomous robots to cooperatively transport an objectof interest in a dynamic and unlcnown environment like the surface of Mars. In this thesis,several techniques have been developed and integrated to support the operation of amulti-robot object transportation system in a complex and unstructured dynamicenvironment with an unknown terrain. In particular, the thesis has made contributionswith respect to self-deterministic learning, robustness, action optimization, and real timeoperation of a cooperating team of robots in such environments. Basically, thecontributions of the thesis can be classified into four main areas as indicated below.First, a general, distributed multi-robot architecture and a multi-robot coordinationmethod were developed in Chapter 2. This hierarchical architecture integrated severaltechniques from artificial intelligence (Al) so that multi-robot systems could operate in arobust and effective manner in a dynamic and unknown environment. In its top level, amachine learning unit and a coordination unit were combined to establish goodcooperative strategies for the robots while they attempt to cooperatively transport anobject. A machine learning unit with effective training was used to select proper actionsfor the robots so that they could complete a common task quickly and effectively. In thestate identification stage, the coordination unit would be temporarily activated by anarbitrator to search the environment according to the coordination mechanism for theobject in order for the robots to estimate the pose and build the state binary word. In themiddle level of the architecture, a behavior-based approach, which has been proved to bevery effective for single robot systems, was employed to decompose the abstract behaviorsent by its upper level into more detailed primitive behaviors so that the detailed control119requirements could be generated and executed. This architecture also included acommunication module so that any robot could easily exchange information with otherrobots in the team using standard network protocols. By combining the learning,coordination and behavior-based approaches, and carefully designing the robotcoordination mechanism, the developed multi-robot architecture was found to be moreflexible and powerful than a traditional two layer architecture based on a single Altechnique.Second, machine learning was employed to find optimal cooperation strategies for multi-robot systems so that they could operate in a dynamic and unknown environment. For theprototype system has the to cope with scalability, computational complexity andcommunication bottleneck when the number of robots is increased. To overcome theassociated problems, a distributed Q-learning approach for multi-robot objecttransportation was developed. By directly extending single-agent Q-learning to the multi-robot domain and carefully designing the reward function, the developed system was ableto demonstrate good adaptively and effectiveness in a complicated environment withmultiple obstacles. However, due to lack of knowledge of the actions of the peer robots,the single-agent Q-learning algorithm was found to converge very slowly in a multi-robotenvironment. In fact, single-agent Q-Learning showed better performance (as measuredby the average number of steps required in a round of object-pushing).In order to overcome various disadvantages of the traditional single-agent Q-learningalgorithm when extended to a multi-agent scenario, a modified Q-Learning algorithmtermed MQLA was developed in this thesis. By enabling the robots to learn in apredefined sequence, the new MQLA algorithm was able to overcome several majorshortcomings in the single-agent Q-Leaming. Consequently it is more suitable for multirobot tasks. The simulation results showed that the MQLA algorithm needed less time tocomplete a multi-robot transportation task and received better reward than the traditionalQ-Learning algorithm.Third, a pose estimation technique was developed to detect the object pose using twocolor-blobs simultaneously so that the orientations and positions of the robots could be120determined in a multi-robot object transportation task. This method was found to meettwo challenges in carrying out the specific task: the changing illumination levels in thework environment and the real-time operating requirement. With the available domainknowledge, the algorithm was able to effectively and quickly track multiple moving colorblobs simultaneously in an environment with uneven illumination.Finally, a physical multi-robot transportation project was developed in the laboratory toimplement and validate the approaches developed in the thesis. The physical systemfaced more challenges than the computer simulation system; in particular, sensor noise,wheel slip, real-time operation, and motion constrains. In addition, in order to implementeffective communication among multiple robots and facilitate proper execution of thenew MQLA algorithm, the Microsoft visual studio (C++) technology was incorporatedinto the system. The experimental results showed that the developed physical system wasable to operate effectively and robustly in a dynamic physical environment withobstacles.6.2 Limitations and Suggested Future ResearchAlthough the developed multi-robot transportation system has demonstrated goodperformance both in computer simulation and physical experimentation, there are someareas that need improvement. Some directions for further research are indicated next.6.2.1 Improvement of the Model and Algorithm of MQLAThe credit assignment is rather difficult in multi-robot learning. For example, whenmultiple robots cooperatively push an object to a goal location, how to split the observedglobal reward among the robots is an important issue. In MQLA developed in this thesisassumes each robot makes a positive contribution to the collective task, which may not betrue in a real multi-robot system. In practice, each robot makes a different contribution tothe collective task. If the global reward is directly assigned to each robot, it means thateach robot receives the same reward regardless of whether it makes a positivecontribution to the collective task. Due to this global reward policy, MQLA simulationresults in Chapter 3 have shown that the learning agents can converge slowly or even121become confused in a multi-robot cooperative task, thereby making the MQLA algorithmineffective. It follows that updating the Q-values with a global reward is not proper in adynamic and unknown environment. Therefore, more work may be needed to estimatethe real reward of each robot from the global reward signal for the MQLA algorithm.6.2.2 Integration of Learning and CoordinationLearning and coordination are both important for multi-robot systems when operating inan unknown and dynamic environment. In this thesis, a type of switching strategybetween learning and coordination was developed, where a coordination unit wastemporarily activated when the stage of state identification of the machine learningapproach was encountered. However, more advanced integration scheme would be usefulso as to make use of coordination in a multi-robot task.6.3 SummaryNew developments in multi-robot systems will endow the next generation of roboticsystems with more powerful capabilities. It has been an important objective to allowmultiple robots to autonomously carry out such tasks as cooperative object transportationor robotic soccer, in dynamic, unknown, and unstructured environments. In this thesis,several key approaches were developed toward achieving this goal. It is believed that nosingle AT technique can meet this objective. Instead, learning, reactive (behavior-based)paradigm and coordination approaches should be integrated, and new approaches shouldbe developed on that basis so that an autonomous multi-robot system will be able tooperate in a dynamic and unknown environment in an effective and robust manner. Someimportant work has been carried out in this thesis towards this general objective,specifically with regard to stage of state identification, improved robustness, object poseestimation, obstacle pose estimation, and real-time operation.122BibliographyAlpaydin, E., Introduction to Machine Learning, The MIT Press, Cambridge, MA, 2004.Arai, T., Pagello, E. and Parker, L.E., “Advances in multirobot systems,” IEEE Trans. onRobotics andAutomation, Vol.18, No. 5, pp. 655-661, 2002.Arkin, R.C., Behavior-Based Robotics, The MIT Press, Cambridge, MA, 1998.Camacho, D., Fernandez, F. and Rodelgo, M.A., “Roboskeleton: an architecture forcoordinating robot soccer agents,” Engineering Application ofArtificial Intelligence, Vol.19, No. 2,pp.179-188, 2006.Cao, Z., Tan, M., Li, L. Gu, N. and Wang, S., “Cooperative hunting by distributed mobilerobots based on local interaction,” IEEE Transactions on Robotics, Vol. 22, No. 2,pp.403-407, 2006.Chaimowicz, L., Kumar, V. and Campos, M.F.M., “A paradigm for dynamiccoordination of multiple robots,” Autonomous Robots, Vol. 17, No. 1,pp.7-21, 2004.Craig, J.J., Introduction to Robotics: Mechanics and Control, Third Edition, PearsonPrentice Hall, Upper Saddle River, NJ, 2005.Dahl, T.S., Mataric, M.J. and Sukhatme, G.S., “Adaptive spatio-temporal organization ingroups of robots,” Proc. of the 2002 IEEE/RSJ International Conference on IntelligentRobots and Systems (IROS), Lausanne, Switzerland,pp.1044-1049, 2004.De Silva, C.W, MECHATRONICS—An Integrated Approach, Taylor & Francis/CRCPress, Boca Raton, FL, 2005.Farinelli, A., Iocchi, L. and Nardi, D., “Multirobot systems: a classification focused oncoordination,” IEEE Transactions on Systems, Man, and Cybernetics, Part B:Cybernetics, Vol. 34, No. 5, pp. 2015-2028, 2004.Gerkey, B.P., and Mataric, M.J., “Sold!: auction methods for multirobot coordination,”IEEE Trans. on Robotics and Automation, Vol. 18, No. 5,pp.758-768, 2002.Gerkey, B.P., and Mataric, M.J., “A formal analysis and taxonomy of task allocation inmulti-robot systems,” International Journal of Robotics Research, Vol. 23, No. 9,pp.939-954, 2004.Gustafson, S. and Gustafson, D.A., “Issues in the scaling of multi-robot systems forgeneral problem solving,” Autonomous Robots, Vol. 20, No. 2,pp.125-13 6, 2006.123Gopalakrishnan, A., Greene, S. and Sekmen A., “Vision-based mobile robot learning andnavigation,” Proceedings of IEEE International Workshop on Robot and HumanInteractive Communication, Nashville, TN,pp.48-53, 2005.Hajjdiab, H. and Laganiere, R., “Vision-based multi-robot simultaneous localization andmapping,” Proceedings of 1st Canadian Conference on Computer and Robot Vision, pp.155-162, May, London, ON, Canada, 2004.Huntsberger, T., Pirjanian, P., Trebi-Ollennu, A., Nayar, H.D., Aghazarian, H, Ganino,A.J., Garrett, M., Joshi, S.S. and Schenker, P.S., “Campout: a control architecture fortightly coupled coordination of multi-robot systems for planetary surface exploration,”IEEE Trans. on Systems, Man, and Cybernetics, Vol. 33, No. 5,pp.550-559, 2003.Ito, K. and Gofuku, A., “Hybrid autonomous control for multi mobile robots,” AdvancedRobotics, Vol. 18, No. 1,pp.83-99, 2004.Kapetanakis, S. and Kudenko, D., “Reinforcement learning of coordination incooperative multi-agent systems,” Proceedings of Eighteenth national conference onArtificial intelligence, Edmonton, Alberta, Canada,pp.326-33 1, 2002.Kato, K., Ishiguro, H. and Barth, M., “Identification and localization of multiple robotsusing omnidirectional vision sensors,” Electronics and Communications in Japan, PartII: Electronic, Vol. 86, No. 6,pp.1270-1278, 2003.Konidaris, G.D. and Hayes, G.M., “An architecture for behavior-based reinforcementlearning,” Adaptive Behavior, Vol. 13, No. 1,pp.5-32, 2005.Kovac, K., Zivkovic, I. and Basic, B.D., “Simulation of multi-robot reinforcementlearning for box-pushing problem,” Proceedings of the Mediterranean ElectrotechnicalConference — MELECON, Dubrovnik, Croatia,pp.603-606, 2004.Kumar, M. and Garg, D.P., “Sensor-based estimation and control of forces and momentsin multiple cooperative robots,” Journal ofDynamic Systems, Measurement, and Control,Transactions oftheASME, Vol. 126, No. 2,pp.276-283, 2004.Littman, M.L., “Friend-or-Foe Q-learning in general-sum games,” Proc. 18thInternational Conf on Machine Learning, San Francisco, CA, pp. 322-328, 2001.Liu, J. and Wu, J., Multi-agent Robotic Systems, CRC Press, Boca Raton, FL, 2001.Marshall, J.A., Fung, T., Broucke, M.E., D’eleuterio, G.M.T., Francis, B.A.,“Experiments in multirobot coordination,” Robotics and Autonomous Systems, Vol. 54,No. 3, pp. 265-275, 2006.124Martison, E. and Arkin, R.C., “Learning to role-switch in multi-robot systems,” Proc. ofIEEE International Conference on Robotics & Automation, Taibei, Taiwan,pp.2727-2734, 2003.Maurin, B., Masoud, 0. and Papanikolopoulos, N. P., “Tracking all traffic: computervision algorithms for monitoring vehicles, individuals, and crowds,” IEEE Robotics andAutomation Magazine, Vol. 12, No. 1,pp.29-36, 2005.Miyata, N., Ota, J., Arai T. and Asama, H. “Cooperative transport by multiple mobilerobots in unknown static environments associated with real-time task assignment,” IEEETrans. on Robotics andAutomation, Vol. 18, No. 5,pp.769-780, 2002.Murarka, A., Modayil, J. and Kuipers, B., “Buildin local safety maps for a wheelchairrobot using vision and lasers,” Proceedings ofthe 3’ Canadian Conference on Computerand Robot Vision, Quebec City, QC, Canada,pp.25-32, 2006.Panait, L. and Luke, S., “Cooperative multi-agent learning: the state of the art,”Autonomous Agents and Multi-Agent Systems, Vol. 11, No. 3,pp.387-434, 2005.Parker, L.E., “Current state of the art in distributed autonomous mobile robotics,”Distributed Autonomous Robotic Systems 4, Springer-Verlag, Tokyo, pp. 3-12, 2000.Parker, L.E., Touzet, C. and Fernandez, F., “Techniques for learning in multirobotteams,” (Chapter 7), Editor: Baich, T. and Parker, L.E., Robot Teams: From Diversity toPolymorphism, AK Peters, Ltd., Natick, Massachusetts, 2002.Pereira, G., Campos, M. and Kumar, V., “Decentralized algorithms for multi-robotmanipulation via caging,” The International Journal ofRobotics Research, Vol. 23, No.7-8,pp.783-795, 2004.Russell S. and Norvig P., Artflcial Intelligence: A Modern Approach, Second Edition,Pearson Education, Inc., Upper Saddle River, NJ, 2003.Spong, M.W., Hutchinson, S., and Vidyasagar, M., Robot Modeling and Control, JohnWiley & Sons, Inc., Hoboken, NJ, 2006.Stone, P., Sridharan, M., Stronger, D., et al., “From pixels to multi-robot decision-making: A study in uncertainty,” Robotics and Autonomous Systems, Vol. 54, No. 11,pp.933-943, 2006.Takahashi, S. and Ghosh, B.K., “Parameter identification of a moving object based onsensor fusion,” Proceedings of IEEE International Conference on Mechatronics andAutomation, Niagara Falls, ON, Canada,pp.17 1-176, 2005.125Tangamchit, P., Dolan, J.M. and Khosla, P.K., “Crucial factors affecting decentralizedmultirobot learning in an object manipulation task,” Proceedings of IEEE InternationalConference on Intelligent Robots and System, Las Vegas, NV,pp.2023-2028, 2003.Taylor, M.E. and Stone, P., “Behavior transfer for value-function-based reinforcementlearning,” Proceedings of the 4th International Conference on Autonomous Agents andMulti agent Systems, Utrecht, Netherlands,pp.201-207, 2005.Tomono, M., “Environment modeling by a mobile robot with a laser range finder and amonocular,” Proceedings of 2005 IEEE Workshop on Advanced Robotics and its SocialImpacts, Nagoya, Japan, pp. 133-138, 2005.Veeraraghavan, H., Masoud, 0. and Papanikolopoulos, N.P., “Computer visionalgorithms for intersection monitoring,” IEEE Transactions on Intelligent TransportationSystems, Vol. 4, No. 2,pp.78-89, 2003.Wang, Y. and de Silva, C.W., “An object transportation system with multiple robots andmachine learning,” Proc. of American Control Conference (A CC 2005), Portland, OR,pp.1371-1376, 2005.Wang, Y. and de Silva, C.W., “Multi-robot box-pushing: single-agent Q-Learning Q-Learning,” Proceedings of 2006 IEEE/RSJ International Conference onIntelligent Robots and Systems (IROS), Beijing, China,pp.3694-3699, 2006.Wang, Y. and de Silva, C.W, “A machine learning approach to multi-robot coordination,”Engineering Applications ofArtficial Intelligence (Elsevier), 2007.Wang, Y., “Cooperative and Intelligent Control of Multi-robot Systems Using MachineLearning,” Thesis for The Degree of Doctor of Philosophy (University of BritishColumbia), 2007.Yang, E., and Gu, D., “Multiagent reinforcement learning for multi-robot systems: asurvey,” Technical Report,http://robotics.usc. edu/—maja/teaching/cs584/papers/yango4multiagent.pdf 2004.Zaerpoora, A., Ahmadabadia, M.N., Barunia, M.R. and Wang, Z., “Distributed objecttransportation on a desired path based on constrain and move strategy,” RoboticsandAutonomous Systems, Vol. 50, No. 2-3,pp.115-128, 2005.126Appendix 1— Program FunctionsSummaryThis appendix outlines some important classes and functions as used in thepresent work.ClassesDevelopment of the application in this thesis has used following importantclasses:MyClient, MyACTS, MyPose MyDecodeEncode MyQLearningRobotFunctionsMyClient.hclass MyClientpublic:MyClient(ArRobot*robot);void connectServer(int myNo);boolsendingString(char*string,bool broadcast,constchar*WinnerlP);MyACT.hclass MyGeneralACTS: public ArRobotpublic:MyGeneralACTS(ArRobot*robot){ };mt ACTSCo10rB1obQ;voidCentreTheBlob(ArRobot* robot,intxl,int x2,int yl,inty2)MyPose.hclass MyPose: public ArRobotpublic:MyPose() {};voidFindObstacleGroup(ArRobot*robot,ArSick*sick,double*ObstacleArrayX,double*ObstacleArrayY);void poseOfTheBox( double xl, double yl, double x2, doubley2, double thetal);ArPoseposeEstimate(ArRobot*robot,ArSick*sick);127MyQLearning.hclass MyDecodeEncode: public ArRobot{public:MyDecodeEncode(ArRobot*robot) {};voidGetMailHeading(char*MailHeading);char*StringPackage(double x, double y, double theta);ArPose StringDecode(char*strToDecode);class gLearningRobot: public ArRobot{public:qLearningRobot(ArRobot*robot):tao(0.9) ,action(0) ,actiomNum(6) ,successProcessCount(0),learningRate(0.8) ,discountRate(0.9) ,wl (0. 7) ,w2 (0. 25) ,w3 (0. 05) { };double detectionRadius(int width,int height);void selectAction(int StateCode);mt probaSelect(double p[], mt n);void calculateRobotActionPositions (mt action, ArPoseboxPose, ArPose p1);void calForceParaQ;void boxResponse (ArPose position);void calNextPosition (ArPose boxPose);bool qLearningRobot: :obstacieDetect(ArPoseboxPose,double*obstacleArrayNewX,double*obstacleArrayNewY);void calculateReward(ArPose oldPose);void updateQ(int state, mt newState, double reward);void frameConvert(ArPose p1, ArPose p2, ArPose origin);void frame2Tol(ArPose p1, ArPose p2, ArPose origin);General.cppmt BuildStateCode ()void qTablelnitialization ()mt main(int argc, char**argv)128Appendix 2— Program CodesSummaryThis appendix includes important Microsoft C++ files and their implementationcodes.C++ FilesMyClient.h, MyACT . h, MyPose.h, MyQLeaming.h, General.cppMyClient.h/This program will uses to to communicate between robots during thetransportation process. It acts as a client to all robots individuallyin the group to receive the handshaking signal from other robots.#include “string.h”#include <iostream>#include <sstream>class MyClientpublic:MyClient(ArRobot*robot);-MyClient() {};void connectServer(int myNo);boolsendingString(char*string,bool broadcast,constchar*Winnerlp);constchar*ipAddress;mt myOwnNo;void close;private:ArRobot*myRobot;ArPose currentPose;double currentX;double currentY;double currentTh;char*strToSend;char strToBroadcast[128];//constchar*ipAddress;char*tok;char strToGet[lO];char*otherlP[71;129char*strToEnd;char buffer[331;mt roboNo;mt maxRoboNo;ArMutex myCycleCailback;ArSocket globalOtherSoc[5];sizet strSize;timet timeAuctionStart;MyClient::MyClient(ArRobot*robot):strToEnd (“BYE! ! !“),myRobot (robot)myExitPlanCB. setName (“myExitPlan”);Aria: :addExitCallback(&myExitPlanCB, 21);I/Other robots’ IP addressotherlP[O] =“;otherlP[1] =“”;otherlP[2] =“”;void MyClient: :connectServer(int myNo)//connect to other serversSleep(10000) ;mt counter;myOwnNo = myNo;for (roboNo = 0; roboNo <maxRoboNo+1; roboNo++)if (roboNo == myNo)roboNo++;if (roboNo > maxRoboNo)break;for (counter = 0; counter < 1; counter++)if (globalOtherSoc[roboNo] .connect(otherlP[roboNo],7777+roboNo, ArSocket: :TCP))break;if (counter == 1)boolMyClient::sendingString(char*string, bool broadcast)strToSend = string;if (broadcast == true)130for (roboNo = 0; roboNo <maxRoboNo+1; roboNo++)if (roboNo == myOwnNo && roboNo < maxRoboNo)roboNo++;if (WinnerlP NULL)if (strcmp(WinnerlP,otherlP[roboNoj) == 0continue;if (roboNo > maxRoboNo)break;globalotherSoc [roboNo] .write (strToSend, strien (strToSend));return true;elseif (WinnerlP NULL)ipAddress = WinnerlP;elsefor (roboNo = 0; roboNo < maxRoboNo+l;roboNo++)if (strcmp(ipAddress,otherlP[roboNo]) == 0)if(globalOtherSoc[roboNo] .write(strToSend,strlen(strToSend)) !=strlen(strToSend))return false;return true;return false;void MyClient: :close()myCycleCailback. tryLock0;for (roboNo =0; roboNo < maxRoboNo+l; roboNo++)if(globalOtherSoc[roboNo] .write(strToEnd, strlen(strToEnd)) ==strien (strToEnd))printf(’See You again!!!’1);myCycleCailback . unlock 0;131MyACT.h#include “Aria.h”tinclude “ACTSClient . h”#include<iostream>#include<conio . h>#include<stdlib h>#include <signal. h>using namespace std;class MyGeneralACTS: public ArRobotpublic:MyGeneralACTS(ArRobot*robot){ };—MyGeneralACTS() { };mt ACTSC010rB10bQ;voidCentreTheBlob(ArRobot*robot,int xl,int x2,int yl,int y2);mt getXcgQ{return xcg;}mt getYcgQ{return ycg;}private:mt xcg,ycg;mt channel;blob generaiblob;ACTSClient actsClient;mt MyGeneralACTS: :ACTSC01orB10b()if ( ! actsClient . openPort (“5001”))cout<<”Sorry can’t open acts port\n”<<endl;return 0;actsClient. requestPacketQ;if( actsClient.receiveBloblnfoQ)if (actsClient.getBlob(0,0, &generalblob)&&generalblob.area > 5000)cout<<”Blob Area :“ <<generalblob.area<<endl;cout<<”Received the Blob\n”<<endl;xcg = generalblob.xcg;ycg = generalblob.ycg;actsClient . closePort;return 1;else132cout<<”Blob Area :“ <<generalblob.area<<endl;cout<<”Blob is not the correct one\n”);actsClient.closePort()return 2;elsecout<<”The blob is not received\n”<<endl;actsClient. closePort ;return 2;voidMyGeneralACTS::CentreTheBlob(ArRobot* robot,int xl,int x2,intyl,int y2)// 1st quadrantif(((xl—x2)> 40) && ((yl—y2)< —40))// (1)movebackward (2)move leftcout<<”The Blob is in the 1st Quadrant”<<endl;double robotAngle = robot->getPose() .getTh();cout<<”robotAngle: “<<robotAngle<<endl;/7 the angle needs to bedetermined by either getTh or boxPoseEstimaterobot->setHeading (robotAngle+90);ArUtil: :sleep(8000)robot->move (250);ArUtil: :sleep(8000)robot->setHeading (robotAngle+180);ArUtil: :sleep(8000)robot->move (700);ArUtil: :sieep (8000);robot->setHeading (robotAngle);ArUtil: :sleep(8000)/7 2nd quadrantif(Hxl—x2)> 40) &&((yl—y2)> 40))7/(l)move forward (2)move leftcout<<”The Blob is in the 2nd Quadrant”<<endl;double robotAngle = robot—>getPoseQ.getThQ;cout<<”robotAngle: <<robotAngle<<endl;// the angle needs to bedetermined by either getTh or boxPoseEstimaterobot—>setHeading (robotAngle+90);133ArUtil: :sleep (8000);robot—>move (250);ArUtil: :sleep(8000)robot->setHeading (robotAngle);ArUtil: :sleep(8000)robot—>move (100);ArUtil: :sleep(8000) ;II 3rd quadrantif(((xl—x2)< —40) && ((yl—y2)> 40))/1 (l)move forward (2)move rightcout<<”The Blob is in the 3rd Quadrant”<<endl;double robotAngle = robot—>getPoseQ.getThQ;/1 doublerobotAngle = ((22.0*(90 +(robot.getPose() .getTh())))/ (180.0*7.0));cout<<”robotAngle: “<<robotAngle<<endl;// the angle needs to bedetermined by either getTh or boxPoseEstimaterobot—>setHeading (robotAngle—90);ArUtil: :sleep (8000);robot—>move (250);ArUtil: :sleep (8000);robot—>setHeading (robotAngle);ArUtil: :sleep (8000);robot—>move (100);ArUtil: :sleep (8000);II 4th quadrantif(((xl—x2)< —40) && ((yl—y2)< —40))// (1)movebackward (2)move rightcout<<”The Blob is in the 4th Quadrant”<<endl;double robotAngle = robot—>getPoseO.getThQ;cout<<”robotAngle: “<<robotAngle<<endl;II the angle needs to bedetermined by either getTh or boxPoseEstimaterobot->setHeading (robotAngle—90);ArUtil: :sleep(8000)robot—>move (250);ArUtil: :sleep (8000);robot->setHeading (robotAngle-l80);134ArUtil: :sleep(8000);robot->move (700);ArUtil: :sleep (8000);robot—>setHeading (robotAngle);ArUtil: :sleep(8000)MyPose.h/This program uses to detect the pose of the box (position andorientation),which uses data from a laser rangefinder to detect acontinues line of points.*****************************************************************/#include <cmath>using namespace std;class MyPose: public ArRobot(public:double center_a, center_b;MyPoseQ{};-MyPose() {};voidFindObstacleGroup(ArRobot*robot,ArSick*sick);void poseOfTheBox( double xl, double yl, double thetal);ArPoseposeEstimate(ArRobot*robot,ArSick*sick);private:ArPose currentPose;ArPose boxPose;double boxCentreX;double boxCentreY;double boxCentreTh;double currentX;double currentY;double currentTh;135/*****************This method use to find obstacles***************/voidMyPose::FindObstacleGroup(ArRobot* robot, ArSick* sick){double angle,dist,distl,anglel, CenterDist, CenterAngle;std: :list<ArPoseWithTime*> *readings;std: :list<ArposeWithTime*>::iterator it;sick—>lockDevice()distl = sick—>currentReadingPolar (—60, 60, &angle);readings = sick->getCurrentBufferQ;mt al=0, a2=0, a4=0;mt ObstacleMinLineDiff = 300;al = readings—>sizeO;double*distArray = new double[al];double*angleArray = new double[al];double*ObstacleDistArray = new double[aldouble*ObstacleAngleArray = new double[alfor (it = readings—>beginQ; it readings—>endO; it++){distArray[a2j robot—>getPose().findDistanceTo(ArPose((*it)_>getXQ,(*it)_>getyQ));angleArray[a2] = robot—>getPose().findAngleTo(Arpose((*it)_>getXQ,(*it)_>getyQ));a2++; Sleep(l00);for (mt a3=0; a3<a2—l; a3++)if (abs(distArray[a3] — distArray[a3+lJ)> ObstacleMinLineDiff)if (distArray[a3] > distArray[a3+l]){ObstacleDistArray[a4] = distArray[a3];ObstacleAngleArray[a4] = angleArray[a3];elseObstacleDistArray[a4] = distArray[a3+lJ;ObstacleAngleArray[a4] = angleArray[a3+lj;a4++;mt indexLocal = (int) ((a4/2)—l);double*LocalObstacleArrayX = new doubleindexLocal1;double*LocalObstacleArrayY = new double[indexLocal];for (mt a3=l; a3 < ((a4/2)+l); a3++){CenterDist =(ObstacleAngleArray[a31 + ObstacleAngleArray[a3+lj)/2;136CenterAngle = (abs(ObstacleAngleArray[a3] —ObstacleAngleArray[a3+1J)/2) + ObstacleAngleArray[a3+1];LocalobstacleArrayX[a3—i) = CenterDist*sin(CenterAngle);LocalobstacleArrayy[a3—lj = CenterDist*cos(CenterAngle);// above corrdinate related to robot, convert into Global coordinateObstacleArrayX = LocalObstacleArrayX;ObstacleArrayy = LocalObstacleArrayY;void Mypose::poseOfTheBox( double xl, double yl, double thetal){double xface, yface;mt height = 1700;mt width = 720;mt half_height = 850;mt half_width = 360;x_face = (xl + x2)/2;yface=(yl + y2)/2;center_a = xface + (half_width*cos (thetal));center_b = yface + (half_width*sin (thetal));cout<<” Box Center X and Y: \n”<< center_a <<“ “<<centerb<<endl;ArPoseMyPose::poseEstimate(ArRobot*robot,ArSick*sick)double angle,dist,distl,anglel,startangle,startdist,endangle,enddist;double temp angle,temp = 0.0;std: :list<ArPoseWithTime >*readings;std: :list<ArPoseWithTime*>::iterator it;ArUtil: :sleep(l000);sick—>lockDevice Q;distl = sick->currentReadingPolar (—60, 60, &angle);readings = sick->getCurrentBufferQ;mt i=0, j=0, a=0, b=1, c=1;mt al=0, a2=0, a3, minAngle=0, minPosi;double distance=300.0, s=0;al = readings—>sizeO;double*distArray = new double[al];double*angleArray = new double[alj;double distil, dist2, anglell, angle2, frontDistance;double maxmumLineDiff=300;137for (it = readings—>beginQ; it != readings—>end; it++){double distOOl = robot—>getPose().findDistanceTo(ArPose((*it)_>getX,(kit)—>getY()));double angieOOl = robot—>getPose().findAngleTo(Arpose((*it)_>getXO,(*it)_>getyo));distArray[a2] = robot->getPose().findDistanceTo(ArPose((*it)_>getXQ,(*it)_>getyQ));angleArray[a2] = robot—>getPose() findAngleTo(ArPose((*it)_>getXQ,(*it)_>getyQ));a2++;for (a3=O; a3<=al; a3++)if ((angleArray[a3])>= minAngie){frontDistance = distArray[a3];minPosi = a3;break;for (a3=minPosi—l; a3>=O; a3——)if (abs(distArray[a3+l]— distArray[a3])>maxmumLineDiff)distil = distArray[a3+lJ;angleli = angieArray[a3+l1;break;for (a3=minPosi+i; a3<=al; a3++)if (abs(distArray[a3—lj- distArray[a3])>maxmumLineDiff)dist2 = distArray[a3—l];angle2 = angieArray[a3—l];break;Startdist = distil;startangle= anglell;enddist = dist2;endangle= angie2;/1 Start and End anglecalculation based on the XY plandouble alpha=O, beta=O, dl, d2;alpha = ((22.0*(90 + startangleH/ (180.0*7.0));beta = ((22.0*(90 + endangleH/ (180.0*7.0));138dl= startdist;d2= enddist;double xl new, ylnew, x2_new, y2new, thetal, num, dnum;double newcenter_b ,newcenter_a;xl new = dl*cos(alpha);ylnew = dl*sin(alpha);x2new = d2*cos(beta);y2new = d2*sin(beta);num=y2 new - ylnew;dnum = xl_new — x2_new;thetal = (180*(7*atan(num / dnumH)/22;poseOfTheBox(xl new, ylnew, x2new, y2new, thetal);/ Error Handler—Start*******************/newcenter_a = center_a;newcenter_b = value + center_b;boxCentreX =newcenter_a;boxCentrey =newcenter_b;boxCentreTh =thetal;boxPose. setPose (boxCentreX, boxCentreY, boxCentreTh);/* * ** * * * * * * * * * * * * * * *Error Handler—End* * * * * * **** ** * * * * * * * *** *//***************Globle Box Pose — Start********************/double x globle new, ygloblenew, Tb globle new;double anglenew, gama globle new;Tb globle new = ((22*(robot->getPose().getThQ))/ (180*7));angle_new = (robot->getPose().getTh()) — thetal;xgloble new = (robot->getPose().getX()) + center_a*cos (Thgloble new) + center_b*sin (Tb globle new);ygloble new = (robot->getPoseQ.getYQ)+ center_a*cos(Thgloble new) - center_b*sin (Tb globle new);gama globle new = (180*7*angle_new) /22;cout<<“ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n” <<endl;cout<< “Globle Box Position :(“ << x globle new << “ , “ygloble new << “ , << angle_new << ) “<<endl;cout<<“* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n “ <<endl;/********************Globle Box Pose — End*********************/sick—>unlockDevice 0;robot—>waitForRunExit 0;return boxPose;139MyQLeaming.hclass MyDecodeEncode: public ArRobot{public:MyDecodeEncode(ArRobot*robot, ArSick*sick,ArAnalogGyro*gyro){ };-MyDecodeEncode (){ };voidGetMailHeading(char*MailHeading);char*StringPackageol(bool boxOrRobot,double x, double y, doubletheta);char*StringPackageo2 (mt state);ArPose StringDecode(char*strToDecode);mt StringDecodeState(char*strToDecode);mt capabiiityCheck(bool arm,bool gripper,bool camera);char*dec2bin(int statecode,char*binaryequivalent);mt totaiStateCodeOl (mt statecodeOl, mt statecodeo2);mt totaiStateCode02 (mt stateCodelntOl, mt stateCodelnt02);private:double posex, posey, posetheta;mt state;char*strToSendchar strToGet [128];char strToBroadcast[128];char*strToDecode;char* actionToSend;char actionToGet[128];char actionToBroadcast [128];char*actoinToDecode;char buffer[33];char*tok;char*localMailHeading;bool isFinished;double forceValue[6];size_t strSize;ArPose boxOrRobotpose;ArPose position;char*MyDecodeEncode: :StringPackage0l (bool boxOrRobot,double x, doubley, double theta)if (boxOrRobot == true)strcpy(strToBroadcast,”boxPose! ! !“);strcat(strToBroadcast,itoa((int)x,buffer, 10));strcat (strToBroadcast,”!strcat (strToBroadcast, itoa( (int)y, buffer, 10));strcat (strToBroadcast,”!strcat (strToBroadcast, itoa( (int) theta,buffer, 10));strToSend = strcat(strToBroadcast,”!! !“);140elsestrcpy (strToBroadcast, “robotpose !strcat (strToBroadcast, itoa ( (int) x,buffer, 10));strcat (strToBroadcast,”! !“);strcat (strToBroadcast, itoa ( (int) y,buffer, 10));strcat (strToBroadcast,”! !strcat(strToBroadcast,itoa((int)theta,buffer,10));strToSend = strcat(strToBroaclcast,”!! !“);return strToSenci;char*MyDecodeEncode: :StringPackage02 (mt state)strcpy(actionToBroadcast, “An environment state! ! !“);strcat(actionToBroadcast, itoa( (int)state,buffer, 10));strToSend = strcat (actionToBroadcast, “1 ! !“);return actionToSend;ArPose MyDecodeEncode: : StringDecode(char*strToDecode)tok = strtok(strToDecode,”! !localMailHeading = tok;if(strcmp(tok,”boxPose”) ==0)tok = strtok (NULL,”! ! !“);posex =atoi(tok);tok = strtok(NULL,”!!!”);posey = atoi(tok);tok = strtok(NULL,”!!!”);posetheta = atoi(tok);boxOrRobotpose. setPose (posex, posey, posetheta);if(strcmp(tok,”robotPose”) ==0)tok = strtok(NULL,”! ! !“);posex =atoi (tok);tok = strtok(NULL,”!!!”);posey = atoi(tok);tok = strtok(NULL,”!!!”);posetheta = atoi(tok);boxOrRobotpose setPose (posex, posey, posetheta);return boxOrRobotpose;141mt MyDecodeEncode::StringDecodeState(char*strToDecode)tok = strtok(strToDecode,”! ! !“);localMailHeading = tok;if(strcmp(tok,”An environment state”) ==0)tok = strtok(NULL,”!!!”);state =atoi(tok);return state;mt MyDecodeEncode: :capabilityCheck(bool arm,bool gripper,bool camera)mt capabilityCheck =0;/1 armonly=l; gripperonly=2; cameraonly=3;arm&&gripper=4;// arm&&camera=5; gripper&&camera=7;arm&&gripper&&camera=8;/1 nocomponent=0;if (arm == true)capabilityCheck ++;if (gripper == true)capabilityCheck =capabilityCheck+2;capabilityCheck ++;if (camera == true)capabilityCheck =capabilityCheck+3;capabilityCheck ++;return capabilityCheck;char*MyDecodeEncode: :dec2bin(int statecode,char*binaryequivalent)mt k =0,n = 0;mt remain, oldStatecode;char temp[16];dooldStatecode = statecode;remain = statecode % 2;statecode = statecode / 2;tempk++J = remain + ‘0’;while (statecode > 0);while (k > 0)bmnaryequivalent[n++] = temp[--kJ;binaryequivalent[n-l1 = 0;142return binaryequivalent;mt MyDecodeEncode: :totalStateCodeOl (mt statecodeOl, mt statecodeO2)mt b, k, m, n;mt sum = 0;char binaryequivalentOl [161;char binaryequivalento2 [161;for (mt = 0;i < 16;i++)binaryequivalent0l[i]= ‘\O’;binaryequivalentO2[iJ= ‘\O’;bitset<16> binaryOl(string (dec2bin(statecode0l,binaryequivalent0l)H;bitset<16> binaryo2(string(dec2bin(statecodeo2,binaryequivalento2H);bitset<16> binaryequivalent = binaryOl I binary02;for(k = 0; k <= 15; k++)n = (int) (binaryequivalent[k) - ‘0’);if((n > 1)II (n < 0)return (0);for(b = 1, m = 15; m > k; m--)b=2;sum = sum + n*b;return(sum);mt MyDecodeEncode: :totalStateCode02 (mt stateCodelntOl, mtstateCodelnt02)mt totaiStateCode;totalStateCode=(stateCodelnt0l I stateCodelnt02);return totaiStateCode;class MyqlearningRobot: public ArRobot{public:MyqLearningRobot(ArRobot*robot, ArSick*sick,ArAnalogGyro*gyro):tao(0.9),action(0), learningRate(0.8),discountRate(0.9),lArm(20),wl(0.7),w2(0.25),w3(0.05){};-MyqLearningRobot() { };143double detectionRadius(int width,int height);mt selectAction(int StateCode);mt probaSelect(doublep,mt n);ArPose calculateRobotActionPositions (mt action, ArPose boxPose);void calForceParaQ;void boxResponse(ArPose position,ArPose newPose);void calNextPosition (ArPose boxPose);bool obstacleDetect (ArPose boxPose);void calculateReward(ArPose oldPose);void updateQ(int state, mt newState, double reward);void frameConvert(ArPose p1, ArPose p2, ArPose origin);ArPose frame2Tol(ArPose p2, ArPose origin);bool arriveGoal(ArPose theGoal,ArPose boxPose);private:double posex,posey,posetheta;sizet strSize;ArPose boxpose;mt forcevalue;double forceValue[61;mt actionNum;mt state;mt reward;mt action;double angle;double netForceAngle;double netForceValue;double netTorque;double*obstacleArrayNewX;double*obstacleArrayNewY;double q[6];double sum;double minQ;double learningRate;double discountRate;double tao;mt lArm;double Mf,Mt;double wl,w2,w3;double MyqLearningRobot: :detectionRadius (mt width, mt height)double detectRadius;detectRadius = (sqrt((double)(((width/2)*(width/2H+((height/2)*(height/2)) ) ) )+400;return detectRadius;mt MyqLearningRobot: :selectAction(int state)sum=0;minQ=l00000000;double ranSelProb=0. 1;doublev=randQ*100;144if(v<=ranSelprob*lO0)action=(int)(randQ*6);return action;for(int i=0; i<=5; i++)q[il=qTable[statel [ii;if(q[i]<minQ)minQ=q[i];for(int i=0; i<=actionNum-l; i++)q[i]=q[i]—minQ;for(int 1=0; i<=actionNum—1; i++)sum=0;sum=sum + exp(q[i]/tao);double p[61;for(int i=0; i<=actionNum-1; i++)p[i} = exp(q[ij/tao)/sum;tao=tao*0.999;action=probaSelect(P,actionNum);return action;mt MyqLearningRobot::probaSelect(double p[), mt n)double accumSum[7];accumSum[0j = 0;for(int i=l; i<=n;i++)accumSum[iJ=accumSum[i-l1+p[i—l]*100;double v=rand()*100;for(int i=0; i<=n; i++)if (v>=accumSum[i])continue;elsereturn i—l;145return 0;ArPose MyqLearningRobot: :calculateRobotActionPositions (mt action,ArPose boxPose)double w=1700; double h=720;ArPose p2;if (action==0)p2.setX(w/2) ; p2.setY(0)if (action==1)p2. setX( (w/2) —50) ; p2. setY(h/2)if (action==2)p2.setX(—((w/2)—50)); p2.setY(h/2);if (action==3)p2.setX(—w/2); p2.setY(0);if (action==4)p2.setX(—((w/2)—50H; p2.setY(—h/2);if (action==5)p2.setyUw/2)—50); p2.setY(—h/2);ArPose p1 = frame2Tol(p2,boxPose);return p1;void MyqLearningRobot: :calForcePara()double P1 = 3.1415926535897932384626433832795;for(int i=0; i<6;i++)if (forceValue[i)>30)forceValue [ii =0;double fx=forceValue [3] —forceValue [01;double fy=forceValue[4]+forceValue[5]—forceValue[2]—forceValue[l];146netForceAngle=(atan2 (fy, fx))/PI*180;netForceValue=sqrt( (fxfx)+ (fy*fy));netTorque= (forceValue [5] +forceValue [2] -forceValue [1] —forceValue [4] )*lArm;bool MyqLearningRobot: :obstacleDetect(ArPose boxPose)double rr=0;ArPose obsta;ArPose obstacleLocalPos (0,0);for(int i=O;i<sizeof (obstacleArrayNewX) ;i++)obsta. setX (obstacleArrayNewX[i]);obsta. setY (obstacleArrayNewY [ii);frameConvert(obsta,obstacleLocalPos, boxpose);double x=obstacleLocalPos.getXO;double y=obstacleLocalPos.getYO;double width=1700;double height=720;if((abs(x)<=width/2) && (abs(y)<=(225+height/2H)return true;if((abs(y)<=height/2) && (abs(x)<=(225+width/2H)return true;bool tmp=false;tmp=((x—width/2)*(x—width/2) + (y—height/2)*(y—height/2) )<=225*225;tmp=tmpI(((x—width/2)*(x—width/2) + (y+height/2)*(y+height/2))<=225*225);tmp=tmpII(((x+width/2)*(x+width/2) + (y—height/2)*(y-height/2))<=225*225);tmp=tmpII (Ux+width/2)*(x+width/2) + (y+height/2)*(y+height/2))<=225*225)tmp=tmp && (abs(x)>(width/2)) && (abs(y)>=height/2);if (tmp)return true;return false;void MyqLearningRobot: :boxResponse (ArPose position,ArPose newPose)bool isFinished;isFinished = false;ArPose oldPose;oldPose.setX(position.getXQ);oldPose. setY (position. getYQ);oldPose. setTh (position. getTh ());147mt tMax=1;for(int i=O; i<tMax;i++)calNextPosition (newPose);if (obstacleDetect (newPose))break;elseposition.setX(newPose.getXQ);position.sety(newPose.getYQ);position. setTh (newPose . getTh 0);calculateReward (oldPose);isFinished=true;void MyqLearningRobot: :calNextPosition(ArPose boxPose)ArPose position;double P1 = 3.14159;boxPose.setX(position.getXQ+Mf*netForceValue*cos((netForceAngle+position.getTh0)/180*PI))boxPose.setY(position.gety0+Mf*netForceVaiue*sin( (netForceAngle+position.getTh0)/180*PIH;boxPose. setTh (position. getTh ()+netTorque*Mt);void MyqLearningRobot: :calculateReward(ArPose oldPose)// calalculatereward Distancedouble dOld=pow((oldPose.getX0—theGoal.getX0) ,2)+pow((oldPose.getY0-theGoal.gety0) ,2);dOld=sqrt (dOld);double dNew=pow((position.getX0—theGoal.getX0),2)+pow((position.getY0—theGoal.getY0),2);dNew=sqrt (dNew);double ratioE=O.3;double rDist= (dOld-dNew)*ratioE;ArPose tmpObst;ArPose obstLocalPosition;double obstAngle;148double min=l000;// calalculate reward Obstaclefor(int i=0;i<sizeof (obstacleArrayNewX) ;i++)tmpObst. setX (obstacleArrayNewX [i]);tmpObst. setY (obstacleArrayNewY [ii);double tmpDist=pow((oldPose.getXQ—tmpObst.getXQ),2)+powUoldPose.getYQ—tmpObst.getYW,2);tmpDist=sqrt (tmpDist);if (tmpDist> (detectionRadius (1700,720)))continue;frameConvert (tmpObst, obstLocalPosition, oldPose);obstAngle=atan2(obLocPosition.getYQ,obLocposition.getXQ)/PI*180;double tmpDiff=abs (netForceAngle-obstAngle);if (tmpDiff>180)tmpDiff=360—tmpDiff;if (tmpDiff<min)min=tmpDiff;double rObstacle=0;if (min==1000)rObstacle=1;elserObstacle= (sin(min/180*PI/2.0) —0.3)*1.5;/1 calalculate reward Roationdouble rRotation=abs(position.getThO—oldPose.getThO);rRotation=cos(rRotation/180*PI)-0.9;// total rewardreward=wl*rDist+w2 *rObstacle+w3*rRotation;void MyqLearningRobot::updateQ(int state, mt newState, double reward)mt actionNum = 6;double learningRate = 0.8;double discountRate = 0.9;double q[6];double vMax = -10000;149for(int i=0; i<=actionNum—l; i++)q[iJ=qTable[newState] [i];if(q[i]>=vMax)vMax=q[i];qTable[state][actionj=(l_learningRate)*qTable[statej[action]+learningRate* (reward+discountRate*vMax);void MyqLearningRobot::frameConvert(ArPosepl,ArPose p2, ArPose origin)angle= (angle/(l80*(PIH);double xl = pl.getXQ;double yl = pl.getYQ;double x2 = origin.getXQ;double y2 = origin.getY();p2. setX((xl*cos(angle)+(yl*sin(angle)));p2. setY ((yl*cos(angle)— (xl*sin(angle)));ArPose MyqLearningRobot::frame2Tol(ArPose p2, ArPose origin)ArPose pl;double xl = p2.getXQ;double yl = p2.getYQ;double x2 = origin.getXQ;double y2 = origin.getYQ;double theta = origin.getThQ;pl.setX((xl*cos(theta)) — (yl*sin(thetaH+x2);pl.setY((xl*sin(theta)) + (yl*cos(theta))+y2);return p1;bool MyqLearningRobot: :arriveGoal (ArPose theGoal,ArPose boxPose)double minSuccessDist = 1000;double d=0;double x = boxPose.getXQ— theGoal.getXO;double y = boxPose.getY()— theGoal.getYO;d=sqrt ((x*x)+ (y*y));if (d<=minSuccessDist)return true;elsereturn false;150General.cpp/This program will uses to push the Box to the given location in thecontrol environment at the IAL. During the transportation task needto consider about the speed limiting actions to the collision avoidanceas well. Press escape to shut down Aria and exit.#include <bitset>#include<iostream>#include<conio. h>#include<stdlib . h>#include “Aria.h”#include “Server.h”#include “Client.h”#include “Pose.h”#include ‘Qlearning.h”#include “Gene ra1ACTS .using namespace std;double objectPoseX, objectPoseY, objectPoseTh;double*obstacleArrayX;double*obstacleArrayY;double*obstacleArrayNewX;double*obstacleArrayNewY;double detectionRobotArrayX[51 ;double detections.obotArrayY [51;double detectionRobotArrayTh[51; double dR;mt BuildStateCode (ArPose objectCenter)double objectCenterX = objectCenter.getXQ;double objectCenterY = objectCenter.getYQ;double objectCenterTh = objectCenter.getTh();ArPose theGoal;theGoal.setPose(500,7000);double P1 = 3.1415926535897932384626433832795;mt stateCodeOfThisRobot;double goalAngle=(atan2 ( (theGoal.getYQ—objectCenterY),(theGoal.getXQ_objectCenterX)H/PI*180;if (goalAngle<0)goalAngle=goalAngle+3 60;stateCodeOfThisRobot = (int) (goalAngle/(360.0/256for(int i=0; i< sizeof(obstacleArrayNewX);i++)ArPose localPos;objectCenterTh=objectCenterTh/180*(P1);double xl = obstacleArrayNewX[ij; //ObstacleArrayX[il;double yl = obstacleArrayNewY[il; //ObstacleArrayY[il;double x2 = objectCenterX;double y2 = objectCenterX;151localPos.setXUxl*cos(objectCenterTh))+(yl*sin(objectCenterTh)) —(x2*cos(objectCenterTh) )—(y2sin (objectCenterTh)H;localPos.setY((yl*cos(objectCenterThH_(xl*sin(objectCenterTh)) +(x2*sin(objectCenterTh))—(y2cos (objectCenterTh)H;double angle=(atan2(localPos.getYQ, localpos.getXQH/PI*180;if (angle<0)angle=angle+360;mt obstaStat=(int) (angle/45.O);obstaStat= (int)pow(2.0,obstaStat);stateCodeOfThisRobot=stateCodeOfThisRobotI(obstaStat<<8);return stateCodeOfThisRobot;void qTablelnitiaiization()double qTable[8192] [6];for (mt ± = 0;i < 8192; i++)for (mt = 0;j < 6; j++)qTable[i] [jJ=O;mt main (mt argc, char**argv)Aria: : mit()mt robotNo = 2;ArPose initialPose (xvalue, yvalue);ArPose currentPose;ArPose theGoal;theGoal.setPose(500,7 000);ArRobot robot;ArSonarDevice sonar;ArSick sick;ArKeyHandler keyHandler;ArArgumentParser parser(&argc, argv);ArSimpleConnector simpleConnector(&argc, argv);robot.addRangeDevice (&sonar);robot.addRangeDevice (&sick);/7 Create objectsMyServer myServer;MyClient myClient (&robot);MyReading myReading(&robot);MyGeneralACTS myACTS (&robot);MyPose myPose;MyDecodeEncode myDecodeEncode (&robot, &sick, &gyro);MyqLearningRobot myqLearningRobot(&robot, &sick, &gyro);Aria: : setKeyHandler (&keyHandler);robot.attachKeyHandler(&keyHandler);152// the actions we’ll use for wanderArActionStaliRecover recover;ArActionBumpers bumpers;ArActionAvoidFront avoidFrontNear(”Avoid Front Near”, 225, 0);ArActionAvoidFront avoidFrontFar;ArActionConstantVelocity constantVelocity(”Constant Velocity”, 150);ArActionGoto gotoPoseAction(”goto”,ArPose(0,0,0),l00,100,80,2);/1 Parse all command line argumentsif (!simpleConnector.parseArgs() II argc > 1)simpleConnector.logOptions0;keyHandler. restoreQ;return 1;1/ Connect to the robotif (!simpleConnector.connectRobot(&robot))printf (“Could not connect to robot... exiting\n”);Aria: :exit(l)/1 Connect to the Laser Range Finderif (!simpleConnector.connectLaser(&sick))printf (“Could not connect to Laser Range Finder... exiting\n”);Aria: :exit(l)if (!Aria: :parseArgs() II !parser.checkHelpAndwarnUnparsedO)Aria: : logOptions Q;exit (1)cout<<”Robot Th: “<< robot.getPose() .getThQ<<endl;// add the actionsrobot.addAction(&recover, 100);robot.addAction(&bumpers, 75);robot.addAction(&avoidFrontNear, 51);robot.addAction(&avoidFrontFar, 35);robot.addAction(&constantVelocity, 25);robot.addAction(&gotoPoseAction, 50);//****************************** (7777);Sleep (1000)myServer. runAsync;myReading. runAsync Q;myClient.connectServer (7777);robot. setAbsoluteMaxTransAccel (40);robot. runAsync (true);robot.comlnt (ArCommands: :ENABLE, 1);constchar*myIP= “IP Adress”;char*newMail = NULL;153mt noOfRobot=0;mt counti = 0;mt count2 = 0;mt totalNoOfRobot = 3;double dR, dRX, dRY, thetaDr;mt width = 1700;mt height = 720;ArPose aa,bb;char*55;char*messagestr;char*messagestate;double*x;double*y;double*th;char*mailheading = NULL;mt stateWord = 0; mt action = 0;mt statecodelnteger = 0;char*otherlP[7];otherlP[0] =“”;otherlP[11 =“”;otherlP[2] =“192.168.l.5”;dR = myqLearningRobot.detectionRadius (1700,720);while (Aria: :getRunning())robot. clearDirectMotion Q;newMail = myReading.checkMailO;if (newMail==”STOP”)robot.stop();cout<<”wait! “<<endl;if (newMail==”Estimated Object Pose Transmitted”)aa = myDecodeEncode.StringDecode(ss);myDecodeEncode . GetMailHeading (mailheading);if(mailheading == “boxPose”)objectPoseX = aa.getXQ; objectPoseY = aa.getY();objectPoseTh = aa.getTh();thetaDr = atan((objectPoseX-robot.getPose() .getXW/(objectPoseY—robot.getPose () .getY OH;dRX= aa.getXO+dR*cos(thetaDr);dRY= aa.getYO+dR*sin(thetaDr);gotoPoseAction.setGoal(ArPose(dRX, dRX, 0));// move to detection radiousrobot.addAction(&bumpers, 75);robot.addAction(&avoidFrontNear, 52);robot.addAction(&avoidFrontFar, 20);while (!gotoPoseAction . haveAchievedGoal 0)Sleep(l0000);myClient.sendingString(”Leader I’m in detection radius”,true,NULL);154if (newMail==”Leader I’m in detection radius”)if (((robot.getPoseQ.getXQ- dRX) <= 50) &&((robot.getPoseQ.getYQ- dRY) <= 50)){for (mt IP=0; IP<3; IP++)if ((!strcmp(myIP,otherlP[IP])==0) &&(!strcmp((char*)1010, otherlP[IP] )==0))messageStr = myDecodeEncode. StringPackageol(false,robot.getPoseO.getXQ,robot.getPoseO.getY () , robot. getPose () . getTh H;myClient.sendingString(messageStr,true, otherlP[IPJ);myclient.sendingString(”Close to detection radius”,true, otherlP[IPI);robot. stop Q;if (newMail==”I am in close to detection radius”)bb = myDecodeEncode.StringDecode(messageStr);myDecodeEncode. GetMailHeading (mailheading);if(mailheading == “robotPose”)if (! (robotNo == 1010))detectionRobotArrayX[noOfRobot]= bb.getXQ;detectionRobotArrayY[noOfRobot]= bb.getY();detectionRobotArrayTh[noOfRobot]= bb.getTh();noOfRobot ++;if (noOfRobot == 2)double twoX = detectionRobotArrayX[0]-detectionRobotArrayX [1];double twoY = detectionRobotArrayY[0]—detectionRobotArrayY [1];double V=(double) sqrt((twoX*twoX)+(twoY*twoYH;if (V>= ((2*dR)_500))myPose.FindObstacleGroup(&robot, &sick);if ((sizeof(obstacleArrayX)== 0) &&(sizeof (obstacleArrayY) == 0))mt index = 0;double obstacleArrayNewX [sizeof(obstacleArrayX)];double obstacleArrayNewY [sizeof(obstacleArrayX)1;for (mt i=0; i <(sizeof (obstacleArrayX)—l); i++)double bbx = aa.getXQ;155double bby = aa.getYQ;double oAx = obstacleArrayX[i] - bbx;double oAy = obstacleArrayY[i] — bby;double oAxy= sqrt((oAx*oAx)+(oAy*oAyH;if (oAxy <= dR)obstacleArrayNewX[index] = obstacleArrayX[i];obstacleArrayNewY[index] = obstacleArrayY[i];index++;stateCodelnteger = BuildStateCode (aa, obstacleArrayNewX,obstacleArrayNewY);for (mt infoTransmit=0; infoTransmit<3; infoTransmit++)if (!strcmp (myIP, otherlP [infoTransmitj) ==0)messageState = myDecodeEncode StringPackageo2(stateCodelnteger);myClient. sendingString (messageState, true,otherlP[infoTransmit]);myClient. sendingString(”generate the satate code”, true,otherlP[infoTransmit));if (newMail==”generate the satate code”)stateword = myDecodeEncode.StringDecodeState(messageState);myDecodeEncode . GetMailHeading (mailheading);if (mailheading == “An environment state”)if (robotNo == 1010)action = myqLearningRobot.selectAction(stateWord);myClient.sendingString(”I received an action i”,true,NULL);if (newMail == “ I received an action i “)// i = 1,2,3,4,5,6countl ++;if (countl == totalNoOfRobot)ArPose actionPosition = myqLearningRobot. calculateRobotActionPositions (action, aa);156gotoPoseAction. setGoal (actionPosition);robot.addAction(&bumpers, 75);robot.addAction(&avoidFrontNear, 52);robot.addAction(&avoidFrontFar, 20);while(gotoPoseAction . haveAchievedGoal0)Sleep(100);robot.stop();myClient.sendingString(”I am in the actionposition!! !“,true,NULL);robot . remAct ion (&bumpers);robot. remAction (&avoidFrontNear);robot. remAction (&avoidFrontF’ar);if (newMail == “I am in the action position”)count2 ++;if (myqLearningRobot.arriveGoal (theGoal,aa) == true)cout<<”Now in the goal”<<endl;break; I/end programbb = myPose.poseEstimate(&robot, &sick, &gyro);while (robot.checkRangeDevicesCurrentPolar (—10, l0)>1200)currentPose = robot.getPose0;currentX = currentPose.getX0;currentY = currentPose.getl0;currentTh = currentPose.getTh();Sleep( 500)if(robot.checkRangeDevicesCurrentPolar(-30, 30) <1200)robot. stop 0;/1 2 I 3/1 ---- I/1 1 I4mt blobxl, blobx2, blobyl, bloby2;mt bloblnfo = myACTS.ACTSC010rB10bO;cout<<’bloblnfo: “<<bloblnfo<<endl;ArUtil: :sleep(8000);switch (bloblnfo)case 1: 1/ received the blobmyClient. sendingString(”STOP”, true,NULL);157myACTS.getXcgO;myACTS.getYcgQ;cout<<”Blob readings —> XCG: “<<myACTS.getXcgQ<<”YCG:“<<myACTS.getYcg() <<endi;blobxl = 320; blobyl = 240;blobx2 = myACTS.getXcgQ;bloby2 = myACTS.getYcgQ;myACTS.CentreTheBlob(&robot,blobxl,blobx2,blobyl,bloby2);ArUtil: :sleep(8000)double x, y, theta;myClient.sendingString(”Object Pose Estimating”,true,NULL);aa = myPose.poseEstimate(&robot, &sick, &gyro);x = aa.getXQ;y = aa.getYQ;theta = aa.getThO;/1 robot detects thecolor blob; change its numbermyClient.myOwnNo = 1010;myClient.sendingString(”Object Pose Estimated”,true,NULL);ss = myDecodeEncode.StringPackageol (true,x, y, theta);for (mt robotld=0; robotld<3; robotld++)if (!strcmp (myIP, otherlP[robotld]) ==0)myClient.sendingString(ss,true,otherlP[robotld]);ArUtil: :sleep (8000);myClient. sendingString (“Estimated Obj ect PoseTransmitted”, true,NULL);break;case 2: // not received the blobcout<<”Sorry you haven’t luck this time”<<endl;cout<<”I’ll try my best find bolb here if possible”<<endl;mt randomAngle = randQ%90 — 45;cout<<”Random Angle: “<<randomAngle<<endl;ArUtil: :sleep(8000)robot. setHeading (randomAngle);ArUtil: :sleep(8000)break;// Robot disconnected, shutdownAria: : shutdown Q;return 0;158


Citation Scheme:


Usage Statistics

Country Views Downloads
China 9 1
United States 8 0
India 3 0
France 3 0
Ukraine 2 0
Israel 1 0
Brazil 1 0
City Views Downloads
Unknown 8 0
Wuhan 5 0
Rochester 2 0
Ashburn 2 0
Beijing 2 1
Chandigarh 2 0
Guangzhou 2 0
Princeton 1 0
University Park 1 0
Seattle 1 0
Bangalore 1 0

{[{ mDataHeader[type] }]} {[{ month[type] }]} {[{ tData[type] }]}
Download Stats



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items