Open Collections

UBC Theses and Dissertations

Real-time intelligent behaviour in dynamic environments : soccer-playing robots Sahota, Michael K. 1993

Full Text

We accept this thesis as conforming to the required standard

Real-Time Intelligent Behaviour in Dynamic Environments: Soccer-playing Robots

by

Michael K. Sahota

B.A.Sc. in Engineering Science, University of Toronto, 1991

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF COMPUTER SCIENCE, THE UNIVERSITY OF BRITISH COLUMBIA

August 1993

© Michael K. Sahota, 1993

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
Vancouver, Canada

Abstract

An autonomous robot operating in a dynamic environment is confronted by the question of, "What to do now?" The problem of designing a controller for a car-like robot competing with another robot in a game of soccer is considered. This is a dynamic environment; the locations of the ball and the robots are constantly changing. Rapid and appropriate responses to changes in the world are central to intelligent behaviour.

There are no established robot architectures that seem adequate for the challenges posed in dynamic domains. Traditional work in Artificial Intelligence has focused on the construction of plans for future execution and has resulted in architectures that are not extensible to dynamic environments. Even recently developed "reactive" architectures such as the subsumption architecture, situated automata, and RAP do not seem satisfactory.

Reactive deliberation has been designed to present a set of structural elements needed in dynamic domains. Reactive deliberation makes three contributions to robot architecture. First, the decisions of what task to achieve and how to achieve it are best resolved in unison. Second, the transient goals of a robot must be evaluated at a rate commensurate with changes in the environment. Third, goal-oriented modules called behaviours are a useful abstraction that allow effective goal-arbitration and sharing of scarce computational resources.

The effectiveness of reactive deliberation has been demonstrated through a tournament of one-on-one soccer games between robots. Current functionality includes motion planning, ball shooting, and playing goal, with accurate motion control at speeds of 1 m/s.
The results of the soccer tournament suggest that the architectural elements in reactive deliberation are sufficient for real-time intelligent control in dynamic environments.

Table of Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgments

1 Introduction
  1.1 Motivation
  1.2 Research Goals and Objectives
  1.3 Thesis Outline

2 On Robot Architecture
  2.1 Setting The Stage
    2.1.1 The World of Autonomous Mobile Robots
    2.1.2 Robots have goals
    2.1.3 Criteria for Robot Architectures
  2.2 Complete Reactive Architectures
    2.2.1 The Subsumption Architecture
    2.2.2 Concrete-Situated
    2.2.3 Situated Automata
    2.2.4 Summary
  2.3 Partitioned Architectures
    2.3.1 Shakey
    2.3.2 Reactive Action Packages
    2.3.3 ATLANTIS
    2.3.4 The Dynamics of Action Selection
    2.3.5 Summary
  2.4 Additional Related Literature
  2.5 Discussion

3 Reactive Deliberation
  3.1 The Architecture
  3.2 The Executor
  3.3 The Deliberator
    3.3.1 More Than Just A Theory of Action Selection
    3.3.2 Further Issues
      Subtopic 1 Starvation and Stability
      Subtopic 2 Use of Plans
      Subtopic 3 Multiple Resources
      Subtopic 4 Modularity and Concurrency
      Subtopic 5 A Possible Unitary Framework
      Subtopic 6 Learning
      Subtopic 7 Inter-robot Cooperation
  3.4 Discussion

4 Playing Soccer: A Real-World Experiment
  4.1 The Dynamite Testbed
    4.1.1 The Soccer Field
    4.1.2 System Overview
    4.1.3 User Interface
    4.1.4 System Latency
    4.1.5 Soccer with Monster Trucks
    4.1.6 Non-holonomic Constraints
    4.1.7 Design Decisions
    4.1.8 Importance of the testbed to this thesis
  4.2 The Soccer Controller Implemented in Reactive Deliberation
    4.2.1 Overview
    4.2.2 Executor
    4.2.3 Deliberator
  4.3 The Experiment

5 Results and Evaluation
  5.1 The LCI Cup
  5.2 Discussion

6 Conclusions
  6.1 Summary and Contributions
  6.2 Future Work

Appendix A: The Vision Subsystem
Appendix B: The Simulator
Appendix C: Path Planning
  C.1 Introduction
  C.2 A Metric for Motion Planners
  C.3 Motion Planning in an Infinite Plane
  C.4 Motion Planning in a Convex Arena
  C.5 A Fast Motion Planner

Bibliography

List of Tables

Table 1  Distinguishing Features of Complete Reactive Architectures
Table 2  Partitioned Architectures in Perspective
Table 3  Action Schema for Soccer-Playing
Table 4  Soccer-Playing Behaviours in Reactive Deliberation
Table 5  Final Scores in the LCI Cup

List of Figures

Figure 1  An abstraction of the robot domain
Figure 2  Where are the goals in a robot system?
Figure 3  Task-achieving Behaviours in the Subsumption Architecture
Figure 4  An abstraction of the Concrete-Situated Architecture
Figure 5  The Architecture of Shakey the Robot
Figure 6  The RAP Architecture
Figure 7  The Architecture of Reactive Deliberation
Figure 8  Robot Players Inhabiting the Soccer Field
Figure 9  Hardware Setup
Figure 10  Two Dynamo Vehicles: Zeno (front) and Heraclitus (rear)
Figure 11  The Reactive Deliberation Controller
Figure 12  Path Plot for the Car-like Dynamite
Figure 13  The Vision Engine
Figure 14  Output from the real camera
Figure 15  Processed output from the DataCube
Figure 16  The Dynamite Simulator with Two Cars and a Ball
Figure 17  Jump paths with unidirectional motion

Acknowledgments

Without the help and support of the Clod-Buster/Dynamo group this thesis would never have happened. Stewart Kingdon, an extraordinary vision programmer and DataCube guru, is the creator of the vision system that I have relied on for months. Rod Barman handled the electronic fiddlybits and was a key participant in getting this project on the road. Credit also goes to Heath Wilkinson whose simulator I used extensively.

After a long debate last summer, I was finally able to convince Alan Mackworth to be my supervisor. This turned out to be a good thing. Over the past year, we have had a number of meetings that were so good that they went on and on and . . . Thanks to Alan, I have kept on the straight and narrow, hence avoiding the seductive trap of Brooksian robotics.

I have received invaluable comments on draft versions of this thesis; I would like to recognize the following individuals for taking the time to review it: Jeff Brewster, Stewart, Alan, Keiji Kanazawa, Ying Zhang, Dinesh Pai, and Michael Horsch.

Thanks also goes to my wife Beverly who has put up with my long hours, sudden plan changes, and late arrivals. In time, perhaps, will I be weaned off my desire to show the latest copy of the Dynamo video to everyone we know.

Chapter 1
Introduction

1.1 Motivation

Ever since the nineteen-fifties when Isaac Asimov identified robots as machines that could be helpful to society, people have dreamed of having robots that can operate side by side with human beings [Asi90]. One obvious characteristic of our environment is that it is dynamic — it changes over time. As robot designers, we need robot architectures to guide the design of robots that can function in dynamic environments.

Robot architecture is both the art and science of building robots as well as the structure of robots. The computational aspects of robot architecture are of interest to computer scientists. Although this includes the type and organization of computing elements, this thesis will be restricted to considering only the software. The term robot architecture will be used to describe approaches taken in the design of software for sensing, thinking, and acting.

There are no established architectures that seem adequate for the challenges posed in dynamic domains. Traditional work in Artificial Intelligence has focused on the construction of plans for future execution and has resulted in architectures that are not extensible to dynamic environments.
Even recently developed "reactive" architectures such as the1Chapter 1^ Introductionsubsumption architecture [Bro86], situated automata [KR90], and RAP [Fir89] do not seemsatisfactory.1.2 Research Goals and ObjectivesThe main goal of this thesis is to describe an architecture that can generate real-timeintelligent behaviour in dynamic environments. A robot operating within the real-timeconstraints placed by the external environment must answer the question: "What to donow?" It is not sufficient for a robot to react and interact with its environment; it must act ingoal-oriented ways to produce intelligent behaviour and not just any behaviour. Intelligentbehaviour simply means that both the achievement of goals and the manner in which theyare accomplished are important. The importance of real-time control is identified by thefollowing quote: "An oncoming truck waits for no theorem prover." [Gat92] The moral isthat robots operating in dynamic domains must keep pace with changes in the environment.This thesis investigates robotic intelligence as opposed to general intelligence. Roboticintelligence is about generating intelligent behaviour in robots, while general intelligence isthe pursuit of human level intelligence through connectionist and symbolic approaches.Soccer has been chosen as a domain for experiments with mobile robots since it hascharacteristics prevalent in the real-world that are absent from typical robot problem domains.Soccer-playing is a dynamic environment because the ball and the cars are all moving. Onefeature is that the score of a game provides an objective measure of a robot's ability tofunction. The inadequacy of robot architectures suitable for soccer motivates the need forbetter architectures.The research strategy pursued is to: (1) determine a research problem, (2) developa solution, and (3) test the solution with an experiment. The research problem is the2Chapter 1^ Introductiondevelopment of an architecture suitable for dynamic domains such as soccer. The solutionpresented in this thesis is a robot architectures called reactive deliberation. The experiment isa tournament of one-on-one games of soccer that are conducted using the Dynamite testbed.The testbed consists of a collection of radio-controlled robots under visual control. Theuse of real-world robots to demonstrate the effectiveness of the proposed architecture is anessential component of this thesis. Hence, the construction of a system of soccer playingrobots is the secondary goal of this thesis.The architecture presented in this thesis, reactive deliberation, is focussed on the problemsthat arise in dynamic domains. Further, it is an incomplete robot architecture that does notaddress important issues such as sensor processing and fusion, the development of maps,and world modeling.1.3 Thesis OutlineThe purpose of Chapter 2 is to consider the suitability of established architectures forrobots operating in dynamic environments. The role of robot architecture and the challengesin the area of mobile robots provide a context for the work performed in this thesis. Landmarkrobotic architectures are analyzed in the context of their suitability to dynamic environments.In Chapter 3, the principal results of this thesis are presented through a robot architecturecalled reactive deliberation. 
It can be viewed as a coherent summary of the essential elementsof an architecture for dynamic domains.The purpose of Chapter 4 is to demonstrate the link from theory to practice throughthe Dynamite testbed that has been designed for experiments with multiple mobile robots.A robot controller based on reactive deliberation has been designed so that a robot may3Chapter 1^ Introductioncompete with another robot in a game of soccer. A series of soccer experiments called theLCI Cup are described.The experimental results and conclusions are contained in Chapter 5. In addition, thetheoretical limitations and advantages of reactive deliberation in relation to other architecturesare discussed.Chapter 6 summarizes the contributions of this thesis.The appendices provide greater detail on subjects not directly connected with theobjective of this thesis: the vision subsystem, the simulator, and path planning algorithms.It is possible to avoid most of the theory on robot architecture by omitting Chapters 2and 3. This leaves the reader with the more application-oriented details in Chapters 4 and 5.On the other hand, Chapter 3 provides a detailed look at reactive deliberation and containsthe bulk of the contributions of this thesis.4Chapter 2On Robot ArchitectureThe purpose of this chapter is to consider the suitability of existing architectures forrobots operating in dynamic environments. Before considering specific architectures, somebackground is introduced, such as the role of robot architecture, the challenges in the areaof mobile robots, and a classification scheme for robot architectures. The body of thechapter describes landmark robot architectures belonging to each class, and a discussion ofthe differences is contained in the final section. Section 2.4, additional related literature,describes previous work relevant to this thesis that has not been presented as part of acoherent architecture.2.1 Setting The StageThere is more to robotics than just computer science; the real world is not an abstractdata type. Effecting change in the environment is not as simple as writing to some outputregisters. Few intelligent robotic systems function without carefully monitoring changes inthe environment. Usable information about the environment has to be synthesized fromcomplex sensor data.5Chapter 2^ On Robot ArchitectureRobotReamingMachin.ery Effectors(Ccetroller)EnvironmentFigure 1: An abstraction of the robot domain2.1.1 The World of Autonomous Mobile RobotsThe purpose of this section is to describe the complexities faced by mobile robots.Figure 1 provides an abstract view of the domain of robots. Robots interact with the worldthrough sensors and effectors/actuators. Sensors measure physical properties in the worldsuch as the light impinging on a camera, or the closure of a switch (contact sensor). Effectorsallow a robot to effect changes in the world. For example, the motors and steering servos in amobile robot allow it to move about and push objects. The part of the robot labelled reasoningmachinery is typically a monoprocessor computer, although it could be a combinational logiccircuit. A robot architecture seeks to describe the content and organization of the controller.A typical view of the design process for robots is as follows. Given a specification ofthe desired behaviour of the robot and a description of the robot and environment, producea design for the robot controller [ZM92a]. The triple of sensors, actuators, and environmentis considered fixed. 
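To make this abstraction concrete, here is a minimal sketch of a controller viewed as a mapping from sensor readings to control signals, designed against a fixed sensor/actuator/environment triple. It is written in Python with invented names and values purely for illustration; it is not an interface from this thesis or from any of the systems discussed below.

    import time

    def read_sensors():
        # Stand-in for real sensing (e.g. a vision system reporting positions).
        return {"ball": (1.0, 0.5), "robot": (0.0, 0.0, 0.0)}

    def drive_effectors(command):
        # Stand-in for real actuation (e.g. motor and steering servo commands).
        print("speed=%.2f steering=%.2f" % (command["speed"], command["steering"]))

    def controller(reading):
        # The "reasoning machinery": maps the current reading to a control signal.
        return {"speed": 0.2, "steering": 0.0}

    def run(cycles=3, hz=50):
        # The control rate (here 50 Hz) is fixed by the robot hardware.
        for _ in range(cycles):
            drive_effectors(controller(read_sensors()))   # sense, think, act
            time.sleep(1.0 / hz)

    run()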
For example, it is difficult to modify the robot hardware and nearlyimpossible to modify the environment. The role of robot architecture is to aid the design ofa controller that will produce the desired behaviour given a fixed triple.There are a number of characteristics of the environments of mobile robots that compli-6Chapter 2^ On Robot Architecturecate the design of controllers. Typically the environments are unstructured, unpredictable,complex, dynamic and perceived through noisy sensors.Mobile robots are often intended to operate in unstructured or unconstrained environ-ments. The real world is usually not organized so that it is easy for a robot to navigate andidentify hazards. An example of a highly structured environment is a room without furniturewith smooth and level floors, fixed lighting, and orthogonal adjacent walls. However, anormal office usually has irregular structure. A robot would have to contend with differentkinds of obstacles such as chairs, tables, and filing cabinets. The floor might have extensioncords, video cables, or even a thick Persian carpet on it. Most buildings have windows sothe optical sensors of a robot must work under varying illumination conditions. Even thedomain of an office, however, is highly structured in comparison with outdoor environmentssuch as a forest.Most environments are unpredictable. For instance, a robot may be unable to predictthe outcome of an action. Consider a robot rolling a die; unless it knows how to cheat, theoutcome of the roll is unpredictable. A more practical example is that of a robot openinga door. The robot can make a partial prediction of what will be on the other side based onprior knowledge. Things may have changed since the last time the robot opened the door,so this can be characterized as partially unpredictable.Agre and Chapman [AC87] characterize the world as complex. The essential idea is thatmost real world situations cannot be fully represented inside a computer. There is so muchto know that only a subset of all the information available can be absorbed and reasonedabout in a finite amount of time. The issue of complexity places limits on the suitability andcompleteness of world models internal to a robot.7Chapter 2^ On Robot ArchitectureAs illustrated in Figure 1, robots obtain knowledge of the outside world through sensors.Sensors provides a limited and incomplete view of the world. The raw data provided bythe sensors has finite resolution and is often noisy. Many robots are limited to sonar rangesensors, although some have video and other sensors. As a result, robots are limited toincomplete information. Robots need to interpret raw data so that they are able to functionin the world. As anyone in computer vision will testify, this is a difficult taskA fundamental problem for robots is that the world is dynamic. The environment changeswith time due to: actions taken by the robot, actions taken by other agents, and continuingprocesses in the environment (e.g. a rolling ball). In dynamic worlds, a robot may have torespond to changes within a limited period of time (e.g. the problem of an oncoming truck).The controller for a mobile robot is an embedded system with fixed computational hard-ware. Limited time coupled with fixed computational hardware results in a limited amountof computation that can be performed. Regardless of advances in technology, robots willonly be able to perform bounded computations. 
Effective robot architectures must provide astrategy for dealing with scarce computational resources [DB88]. A computational resourceis a multipurpose computing engine (typically a Von Neumann architecture computer) thatcan be used for a variety of tasks. The problem of distributing computational resources isoften the problem of sharing computer cycles.Robot domains can be classified according to how well the above characteristics apply.It is interesting to note that many architectures are designed for specific domains wherecertain problems are ignored. This taxonomy could be more rigorously developed to classifydomains and architectures, but this is beyond the scope of this thesis.8Chapter 2^ On Robot Architecture2.1.2 Robots have goalsAll robots have goals. These goals may be explicit in the mind of the designer, in theinternal structures of the robot, or externally attributed to the robot by an outside observer. InFigure 2 a robot arm is trying to pick up a block. It is easy to imagine how the goals mightbe represented in each of the designer, robot, and observer. It is possible for the robot to haveeither explicit or implicit goals. In the figure, these alternatives are shown by the predicatePickup(Block) and the binary encoding 1001110011000. For example, some programminglanguages compile down to circuit descriptions, so there are no predicates stored in the robot.(The circuit description is shown in the figure as a series of bits.)Pickup(Block)???--,..0 ObserverFigure 2: Where are the goals in a robot system?An observer may or may not be able to ascribe goals to a robot exhibiting complexbehaviour, so the goals of the robot need not necessarily be represented in observers. Incontrast, the designer is responsible for constructing the robot program and so must haveexplicit goals to program the robot. Goals describe function and purpose, and yet if thedesigner has no goals (for the robot), then what is the meaning of the robot program?9Chapter 2^ On Robot ArchitectureHence, a purposeful robot must contain implicit or explicit goals that are generated from theexplicit goals of the designer.When discussing or designing the architecture of a robot it is useful to describe thesub-goals that the robot is trying accomplish. The term transient goals describes the currentconcrete low level goals that a robot is trying to achieve through its actions. Transientgoals should be interpreted as active/current/operative goals. They have extent in time andtypically cannot be accomplished through a single atomic action. Transient goals changewith time: a robot may first try to accomplish one particular goal, while later a differentone. For example, a robot may have many goals that motivate movement from one room toanother. The transient goal in this situation would be to move through a certain doorway. Itis not the motivation, but rather the action attempted that reflects the transient goal. If onlya single action can be performed at a time, then there is only one transient goal active at atime. Transient goals are used to compare and classify approaches to robot architecture.Why are transient goals important? A robot operating in a dynamic world should seek toproduce intelligent behaviour through the selection of appropriate transient goals. If transientgoals are seldom evaluated, then a robot may continue to pursue a transient goal that is nolonger appropriate. Some important questions that robot architectures must answer are "Howare transient goals chosen?" 
and "When are transient goals changed?"Robots use control signals to drive actuators. At the lowest level of a robot controller,control signals are used to drive motors and other actuator mechanisms. Control signals canbe thought of as the level below which there is no further processing. Typically roboticsystems have a control rate associated with them that is a function of the hardware design.This is the maximum frequency that the robot can drive its actuators at.10Chapter 2^ On Robot ArchitectureRobot architectures can be compared and classified using transient goals and controlsignals. They will be differentiated from one another based on when transient goals areevaluated and can be changed. A complete reactive architecture is an architecture thatselects transient goals at the same rate as control signals. In such architectures, only controlsignals are generated, and transient goals are implicit in them. A partitioned architecture isone where transient goals are formed independently and usually at a slower rate than controlsignals. Both of these classes of architectures will be illustrated in the following sectionsthrough landmark robot architectures.The word reactive means the rapid response to stimulus. When applied to the wordbehaviour, it means that actions (1) occur within bounded time and (2) are in response tochanges in the environment. The term reactive architectures is used in this sense and manyof the architectures discussed in this chapter would fall under this classification. It doesnot mean that the robot has no internal state, although it has occasionally been used in thismanner.Some, but not all, reactive architectures are able to select transient goals at the same rateas control signals. These are named complete reactive architectures to note this distinguishingfeature. The word complete is used to indicate that a complete decision is made that includesthe selection of transient goals. Other architectures are labelled reactive because they respondto changes in the environment while pursuing specific transient goals, although they areunable to change transient goals.2.1.3 Criteria for Robot ArchitecturesThere is no consensus on how to build robot controllers. Every year new architecturesare proposed that are different from previous ones, yet the majority seem not to be designed11Chapter 2^ On Robot Architectureto test out a theory or to satisfy general criteria, but rather to work with a particular hardwareconfiguration on a specific problem. It is difficult to compare architectures that are associatedwith different domains, and to make matters worse, the target domains are rarely describedclearly.In this thesis, architectures will be evaluated according to their suitability for dynamicdomains. Architectures are judged on their underlying model and not on the performanceof a specific implementation. The following questions establish the criteria for comparingrobot architectures:• When are transient goals selected?• How are transient goals selected?• How are real-time constraints met?• What kinds of computations are permitted?• Are suitable abstractions provided?The main criteria in this thesis is based on when and how the transient goals of a robot areselected. 
Other issues of general importance to robot architecture such as sensory processing,the role of internal state (world modeling), general design issues are not addressed in thisthesis.Another omission from this thesis is a treatment of the correctness of robotic systems.Formal concepts from concurrent systems such as safety and liveness can be applied to roboticsystems. A safety property guarantees that an undesirable event never happens [MP92]. Aliveness property guarantees that a desirable event eventually happens (this means that thesystem is not susceptible to either deadlock or livelock). Constraint Nets, a computationalmodel for real-time systems, provides a thorough treatment of this [ZM92a; ZM92b; ZM93].12Chapter 2^ On Robot ArchitectureIndividual robots can be compared, but differences may be the result of technical detailswhich have nothing to do with their architectures. However, implemented systems providea lower bound on the utility of an architecture since limitations in the architecture are oftenreflected in the functionality of a robot. The robot controller described in this thesis will beexperimentally tested to establish a lower bound on the utility of reactive deliberation. Thiscriterion will also be used to highlight significant limitations of other architectures.Appropriate questions to ask of a robot are, "What tasks can it perform?" and "Howwell are they accomplished?" These measures of utility are determined by the observer ofa robot. A nice wording for these ideas is: "Intelligence is determined by the dynamicsof interaction with the world." and "Intelligence is in the eye of the observer." [Bro91].The essence of these ideas is that robots must achieve goals in an appropriate manner andinteract intelligently with the world.2.2 Complete Reactive ArchitecturesThe descriptor complete is used to describe a class of reactive architectures where bothtransient goals and control signals are evaluated at a system specific rate. In this section,the following architectures are described: the subsumption architecture, the concrete-situatedapproach and the situated automata. The common features of complete reactive architecturesare summarized in the last subsection.2.2.1 The Subsumption ArchitectureBrooks proposes the subsumption architecture that uses a hierarchy of task achievingbehaviours rather than the traditional Artificial Intelligence sense-model-plan-act framework[Bro86; Bro91]. The behaviours in the subsumption architecture are generic types ofbehaviour, not specific instances. For example, a hierarchy of behaviours that include generic13Chapter 2^ On Robot Architectureones like "plan changes to the world" is given in Figure 3. Some activity-specific examplesof behaviours are: seek-light, avoid-darkness, avoid-obstacles, and recharge [Kub92].reason about behaviour of objectsplan changes to the worldidentify objectsmonitor changesFromSensors build mapsToEffectors explorewanderavoid objectsFigure 3: Task-achieving Behaviours in the Subsumption Architecture [Bro86]Each task-achieving behaviour independently maps sensor inputs into desired controlsignal outputs. Behaviours principally interact with one another through a fixed set ofinhibition and suppression connections, although message passing between behaviours isallowed. These connections allow higher level behaviours to subsume the function of lowerlevels. Higher levels can suppress the outputs of lower levels when they wish to take controlof the robot. 
This form of arbitration, where the highest level behaviour wins, makes sensein the subsumption architecture where the highest level is the most sophisticated.Each behaviour is composed of a collection of simple computing elements: augmentedfinite state machines. The elements can store state, perform simple computations andcommunicate through fixed-topology connections. These design choices result in someinteresting features/limitations: there is no central locus of control; there is no central modelof the world; pointers and data structures cannot be easily implemented (and are discouragedsince they result in costly computations); and the search space of computations is bounded.14Chapter 2^ On Robot ArchitectureIn the subsumption architecture there is no explicit representation of goals and knowledge.Brooks argues that this is a good thing and promotes the slogan, "The world is its own bestmodel." [Bro91]. Since there is no explicit representation, the transient goals of the robotare implicit in the control signals it generates. The subsumption architecture is clearly acomplete reactive architecture since the transient goals and the control signals are evaluatedat the same rate. One additional note is that a behaviour does not guarantee an output. Forinstance, if a behaviour is performing a time consuming computation it may not produce anoutput on time. This feature permits computations that require more time than a sensing-action cycle. It is not clear, however, that slow behaviours can be effectively combined withmore responsive ones.The focus of the subsumption architecture is on engineering robots by starting out assimple as possible and developing them incrementally. This approach has been demonstratedon at least ten robots from the 50 gram Squirt to the much larger Herbert that has a robot armattached [Bro90]. However, the commitment to avoid representation seems to be a significanthinderance in the development of more sophisticated robots.2.2.2 Concrete-SituatedThe concrete -situated approach take by Agre and Chapman characterises real worldsituations as complex, uncertain, and immediate [AC87; Cha91]. Their approach has beenused to build software agents that play two different video games. Their system, althoughimplemented in LISP, is based on a connectionist approach to architecture where expensive(according to [Cha91]) data structures such as pointers and dynamic storage allocation areprohibited.The main thesis of the work is that intelligent activity can be produced from moment15Chapter 2^ On Robot Architectureto moment improvisation and need not result from the pursuit of detailed plans. The firstsystem designed on this basis, Pengi, has no internal state, while the more sophisticatedSonja has a small amount of distributed local memory. The action of the agent is determinedprincipally by the current situation. The implicit assumption is that the portion of the worldobservable by the agent contains sufficient information for making an intelligent decision.In such an environment, there is no need for memory, since the world itself acts as memorythat has perfect fidelity. The video games that Agre and Chapman's software agents livein are instances of environments that contain complete information. Unfortunately, the realworld is not composed of discrete blocks that form a finite 40 by 30 grid.Patterns of behaviour, called routines, result from interaction with the environment andnot from the explicit encoding of repetitive actions. 
Agre and Chapman have demonstratedthat simple rules for action can generate intelligent coherent behaviour. One criticism madeof this work is that the designer of the system is responsible for analysing the environmentand abstracting appropriate rules for behaviour [Mae90]. However, this is no different fromwhat is needed of the builders of other systems; the problem of automatic programming hasnot yet been solved.The system consists of a visual routine processor and a collection of action proposersas can be seen in Figure 4. The proposers correspond quite closely to behaviours. There isa proposer for each internal and external action that the agent can make. Internal actionsare computation requests sent to the vision routine processor, while external actions aremoves that the agent makes in the environment. Internal actions provide a central featureof the system: task directed sensing (or more accurately, situation directed computation).This approach to vision is gaining increasing recognition for solving some of the complexity16Chapter 2^ On Robot Architectureproblems in computer vision. Conflicts between proposers are ultimately resolved througha fixed priority scheme, so that only one action is enabled at a time. In general, multipleinternal actions will take place before the proposal for an external action is accepted.InternalActions-111.FromSensorsVisual RoutineProcessorCentral Systemwith Proposers ExternalActions^ ToEffectorsFigure 4: An abstraction of the Concrete-Situated ArchitectureEach change in the environment results in a new action. There is no separation betweenthe selecting transient goals (through proposers) and control signals (external actions); thewinning external action proposal is the control signal. Since transient goals and controlsignals are evaluated at the same rate, the concrete-situated approach is a complete reactivearchitecture.A valid criticism of the concrete-situated approach is that there are many problems inthe real world that do not arise in video games. Despite this, the work has made a significantcontribution to the theory of building intelligent systems. In the two video-game systems thesoftware agents could examine complex situations and generate intelligent actions througha distributed network of behaviours.2.2.3 Situated AutomataKaelbling and Rosenschein have created two separate robot architectures based on theRex programming language. The architecture presented in "An Architecture for IntelligentReactive Systems" [Kae90] appears to be a strongly related to the subsumption architecture,while the later situated automata approach [KR90], is more rigorous with its treatment of17Chapter 2^ On Robot Architecturesemantics. To avoid confusion, the architectures will be described in chronological order.However, outside of this subsection, only the situated automata approach will be referred to.The architecture described in "An Architecture for Intelligent Reactive Systems" [Kae90]is based on the Rex programming language. Rex is designed for the implementation of real-time embedded systems. It supports recursion and functional programming. Programs canbe compiled into hardware specifications that can be built or simulated by sequential code.All computations in Rex are guaranteed to terminate in constant time. Rex also supports anincremental planning system that is compatible with real-time operation. 
The planner verifiesthat its goal is unchanged and the context of the plan is satisfied by the environment beforecontinuing planning for a fixed period of time.The architecture proposed in [Kae90] looks like a cleaned-up version of the subsumptionarchitecture, although there are some significant differences. One notable difference is theseparation of the perception and action components of the robot. The perception componentstores the world model of the robot with data types ranging from raw sensor values to abstractworld models. The similarity with the subsumption architecture lies in the organization ofthe action component into hierarchically mediated behaviours. The behaviours described in[Kae90] are similar to the ones in [Bro86]. A fundamental difference, however, is that theformer fosters the use of meaningful symbols. The idea of situating planning in real-timeactivity (mentioned earlier) is a good one, but the details provided in the paper do not explainthe connection between the behaviour-based approach and planning.In [KR90] a more formal approach to robot synthesis, situated automata, is presented.The only connection with the previous paper appears to be the use of the Rex programminglanguage. Rather than using the earlier behavioural style of architecture, situated automata18Chapter 2^ On Robot Architecturebuilds on the perception-action split through two metalanguages, Ruler and Gapps, whichcompile into Rex expressions. Ruler is used to describe the perception component byestablishing connections between raw sensor data and propositional descriptions of theenvironment. (This is essentially the same as the approach to sensing in the earlierarchitecture.) Gapps is used to describe the action component of a robot. It has a formalsemantic grounding for describing robot programs.The behaviour of the robot is specified through a set of goal reduction rules as wellas a prioritized goal list. The goal reduction rules describe how actions of the robot leadto the accomplishment of the goals. The prioritized goal list ranks the goals according toimportance. Just, as the subsumption architecture, Gapps has a fixed hierarchy of goals. Thebehavioural specification are compiled into a finite set of condition-action pairs that form arobot program and can be compiled further into a circuit description.In situated automata a clock tick is defined as the minimum-cycle time needed to readinputs, perform some computation, and set outputs. A single action is generated from theset of condition-action pairs. The action component of the robot has no state: it is purelyreactive. There is, however, state in the perception component to reduce the effects of sensornoise and uncertainty. The action produced is a control signal as well as a transient goal.Thus situated automata is a complete reactive architecture.A serious drawback to Gapps is that it has a fixed ranking of goals and does not supportarbitrarily complicated computations such as planning. Some remedies to the planningproblem are discussed but it is not clear if they can be cleanly integrated with situatedautomata. It is claimed in [KR90] that the situated automata approach has been used in avariety of robotic applications, although none are described.19Chapter 2^ On Robot Architecture2.2.4 SummaryComplete reactive architectures select transient goals at the same rate as control signals,so the transient goals certainly can keep pace with the environment. The transient goals areimplicit in the control signals and not explicitly represented. 
The architectures described in this section have a strong focus on reactive behaviour. They are capable of moment to moment activity, but there is no projection or planning of future states.

Table 1 contrasts the complete reactive architectures. The architectures are listed in the rows, and the distinguishing features in the columns.

Architecture               Model of Computation   Limited Computation?   Focus on:                        Systems
Subsumption Architecture   AFSM                   No                     Engineered Solution              Many Robots (10)
Concrete-Situated          Circuit                Yes                    Improvisation through Routines   Video Game Players (2)
Situated Automata          Circuit                Yes (Possibly No)      Formal Semantics                 Variety of Applications (?)

Table 1: Distinguishing Features of Complete Reactive Architectures

The heading Model of Computation refers to the computational model used in the architecture. All of the architectures can compile down to a circuit level description. The restriction of the computational mechanism to circuits has significant impact on the capabilities of these architectures. For instance, all these systems can respond quickly to changes in the environment. A limitation of these architectures is the inability to perform search type algorithms needed for planning, since they have little or no memory.

The architectures with limited computation are guaranteed to complete all computations each system cycle. This means that only a small class of algorithms can be used by these architectures. Although the subsumption architecture permits unbounded computations, other commitments within the design philosophy limit the applicable algorithms. The issue of limited computation may lead to problems extending these architectures to more sophisticated systems.

The column Focus on contains the motivation or guiding principle of each architecture. Although these architectures share a large number of features, they have significantly different motivations. The only one concerned with semantics and correctness is the situated automata approach.

The Systems column lists robotic systems built using the architecture. It is included because experiments are needed to substantiate theories of robot architecture. The concrete-situated approach has only been demonstrated in a video game, so its extensibility has yet to be demonstrated. The subsumption architecture has had simple applications in dynamic domains documented; little has been published on the experimental use of the situated automata approach.

2.3 Partitioned Architectures

Partitioned architectures are architectures where transient goals are generated independent of, and typically at a slower rate than, control signals. The approaches discussed in this section are: Shakey the robot, Reactive Action Packages (RAP), ATLANTIS, and the dynamics of action selection. This section concludes with a discussion of the common features and differences between architectures.

2.3.1 Shakey

"Shakey" is the name of a mobile robot that was used in experiments in the late sixties and early seventies [Nil84]. This was one of the first attempts at creating an intelligent autonomous robot and strongly influenced successive attempts at building robots. Shakey is considered the "traditional" Artificial Intelligence approach to building robots.

[Figure 5: The Architecture of Shakey the Robot (the STRIPS planner and the PLANEX executor communicating through plans and a predicate world model)]

Figure 5 shows the architecture of Shakey the Robot. Shakey is partitioned into the STRIPS planner and the PLANEX executor. The planner is responsible for "high-level"
The planner is responsible for "high-level"reasoning and instructs the executor through plans. The executor is responsible for monitoringthe execution of the plan and communicates with the planner through an abstract world modelthat stores information in predicate form. The partition between planning and executionmonitoring that began with Shakey is reflected in all the partitioned architectures in thissection.The reasoning component of the robot is the STRIPS planner. It is assumed that therobot is given a high level goal by an external authority. The STRIPS planner performs adirected search through the space of possible actions to produce a plan. The plan consists ofa sequence of operators (actions) that should result the goal state when they are executed. Anexample operator actually used in Shakey is: GOTOADJROOM(ROOM1,DOOR,ROOM2)[Ni184]. This reads as: Go To Adjacent Room from Rooml through Door to Room2.22Chapter 2^ On Robot ArchitectureThe PLANEX component of the system is responsible for executing the plan. PLANEXiterates over a series of perceptions and actions. First, it checks to see if the goal state hasbeen achieved. If it has not, PLANEX selects an operator whose preconditions are true.If no operator is valid, then plan failure is reported to the STRIPS planner. The operatorsare typically selected in sequence, although some may be skipped if some external agentassists Shakey.Operators are associated with action routines called Intermediate-Level Actions (ILAs).ILAs are composed of a sequence of Low-Level Actions (LLAs) LLAs are the bottom ofthe abstraction hierarchy and are implemented in LISP. Each LLA has termination conditionsbased on simple sensory data. It is the LLAs that form feedback loops with the environmentto perform simple tasks. The ILAs allow the STRIPS planner to plan at a higher level ofabstraction the LLAs The operator (ILA) GOTOADJROOM is composed of the followingLLAs: DOORPIC, TURN, PLANJOURNEY, ROLL, etc. [Ni184].The operators defined in this system are composed of LLAs. LLAs are related to transientgoals since they are close to the idea of a simple action or activity. Hence, a STRIPS plan canbe thought of as a sequence of transient goals. The transient goals of Shakey are establishedduring planning and are committed to long afterwards. New plans are generated when thecurrent plan has been completed or there is a problem in execution. Shakey is a partitionedarchitecture because transient goals are evaluated independent of and usually far in advanceof the control signals.The assumptions made in building the planner for Shakey are very restrictive: thereis a single agent in the world, the world is static, actions are discrete, and all actions arepredictable. Each of the problems stated in Section 2.1.1 were engineered away or ignored.23ToEffectorsPlanner()World Model^Status SketchyPlans RAP LibraryRAP ExecutorInstructionsDataFromSensors ControllerChapter 2^ On Robot ArchitectureIt is worth noting that all of the assumptions made in Shakey are invalid in an environmentwith people in it. Shakey is not at all appropriate for dynamic environments.2.3.2 Reactive Action PackagesThe Reactive Action Packages (RAP) system is a descendant of Shakey with somesignificant improvements [Fir89]. Like Shakey, the decision making component of the robotis a planning system. Reactive action packages play exactly the same role as operatorsand ILAs in Shakey. 
The operator role of the reactive action packages is to maintain theplanner's delusion that there are abstract primitive actions in a continuous world [Fir92].The ILA role is to provide a means for describing robot programs. The RAP Executor actsas a bridge between an abstract planner and a controller embedded in the real world. Theentire architecture can be seen in Figure 6.Figure 6: The RAP ArchitectureEach RAP is a robot program that consists of a task, success criteria, and a numberof alternate methods for carrying out the task [Fir89]. RAP provides a more structuredapproach to robot programing than ILAs since each method within a RAP can be thoughtof as a simple behaviour-based program. In the latest version of RAP theory, methods24Chapter 2^ On Robot Architectureenable and disable collections of control and sensing modules in the controller rather thandirectly performing sensing and controlling the actuators [Fir92]. The controller continuouslyinteracts with the world to produce reactive behaviour. A notable improvement over Shakeyis that dynamic activities such as following someone across the room are now supported.Further, the behaviour-based approach has resulted in more robust control and the systemtolerates the failure of actions.The RAP system generates a "sketchy plan" composed of a sequence of tasks. Thesequentially executed methods within each task are equivalent to transient goals. LikeShakey, plans and transient goals are generated only when the preceding plan has beencompleted or has failed. As a result the problem of producing appropriate behaviour (asopposed to reactive behaviour) in a dynamic world is not addressed. RAP is a partitionedarchitecture for the same reasons as Shakey.2.3.3 ATLANTISATLANTIS (A Three-Layer Architecture for Navigating Through Intricate Situations) isdescribed as a direct intellectual descendent of the original RAP system [Gat92]. It has a threelayer architecture that consists of a deliberator and a controller connected by a sequencer thatis similar to the RAP executor. It allows an asynchronous traditional planner to operate witha reactive control mechanism. In ATLANTIS, the deliberator can perform any computations,so it is difficult to draw a clear schematic for the architecture. However, the figure for RAP,Figure 6, is fairly close, except that the planner/deliberator has access to sensor data andcan add to the world model.The deliberator is restricted to computations that are interruptible so that complexcomputations will not degrade the real-time performance of the robot. The architectural25Chapter 2^ On Robot Architecturemodel makes no commitment to any particular deliberation mechanism, although all lengthycomputations must be performed by the deliberator. The current implementation includes avision system, a world modeler, and a planner in the deliberator.The controller is composed of behaviour producing modules called primitive actions.Each primitive action is very similar to the control modules in the RAP system. Althoughthey are called primitive actions, there is no commitment to discrete actions and continuousactivities are supported.The deliberator shares computational resources with the controller and can operate whilethe controller pursues the current plan. Simultaneous planning and control allows thedeliberator to consider alternate plans. Plans (and hence transient goals) are evaluated inATLANTIS as available processing time allows. 
This makes it possible for a robot toproduce intelligent behaviour in dynamic environments.In conclusion, ATLANTIS attempts to alleviate some of the difficulties with the traditionalArtificial Intelligence approach to robotics. It has some features appropriate to dynamicdomains, such as the consideration of alternate plans. However, the commitment to theplans-as-communication view [AC90] prevents specific plan details from being computeduntil they are needed, resulting in a greater latency in response.23.4 The Dynamics of Action SelectionThe dynamics of action selection is a theory of action selection intended for autonomousagents [Mae89; Mae90]. It describes a mechanism for action selection and is not a completerobot architecture. The dynamics of action selection provides an alternative to traditionalplanning systems. It is discussed in this section because the dynamics of action selectioncould be inserted in the "planner" box of any of the other partitioned architectures. In26Chapter 2^ On Robot Architectureaddition, this material is relevant because the problems of action selection and the selectionof transient goals are essentially the same.Maes argues that a mechanism for action selection is needed for a robot operating in acomplex dynamic environment [Mae90]. Any such mechanism must be fast, favour actionsrelevant to the current situation, and be able to exploit opportunities. Further, there is a trade-off between "goal-oriented" and "situation-oriented" behaviour. "Goal-oriented" behaviourmeans that a robot prefers to follow an active plan rather than respond to the current situation.The commitment to pursue a plan irrespective of changes in the environment is inappro-priate. The decision made by a robot of what to do now should be based on the expectedutility of each action, not on the arbitrary notion of following a plan. The postulated trade-offdoes not apply. However, the inspiration for a systematic means of evaluating the appro-priateness of actions is insightful.The dynamics of action selection mechanism accepts high-level goals and predicatesabout the world as inputs. At each time step, it outputs an action. There is a competencemodule [Min86] for each possible action that is represented as a node in an activation network.Competence modules are connected through successor, predecessor and conflicter links thatrepresent dependencies and conflicts among the actions. Energy is injected into the networkfor each active goal and valid proposition. A small number of tunable global parametersare used to regulate the flow of energy through the links connecting the nodes. After a setnumber of energy spreading iterations, the competence module with the highest energy isselected as output.The dynamics of action selection is identified as a fast mechanism for robots, although itis not ideal in a number of ways. First, predicates describing the world are readily available27Chapter 2^ On Robot Architectureand are used to the exclusion of probabilistic models (actually, this is a planned modification).Second, all goals input the same amount of activation energy into the network, although itis unlikely that all goals are of equal importance. Third, in an experimental evaluation ofdifferent action selection mechanisms the dynamics of action selection fared very poorly[Tyr93].2.3.5 SummaryThe partitioned architectures are, roughly speaking, composed of a planning componentand an execution monitoring component. 
Transient goals are selected independently of and less frequently than control signals. This is advantageous because it is typically not necessary to select transient goals as frequently as control signals in dynamic environments. Transient goals are explicitly represented as actions. Actions are associated with activity producing modules that abstract away system specific implementation issues. This allows modules to be tested independently and reduces the complexity of the entire system.

Table 2 focuses on the selection of transient goals in partitioned architectures. The architectures are listed in the rows, and the distinguishing features in the columns. The issues that are important for partitioned architectures are different from those that are relevant to the complete reactive architectures. Hence, the differences between the headings of this table and the table on complete reactive architectures.

The question, "When are transient goals selected?" is the most important in dynamic worlds, since transient goals must be able to change with the environment. The heading Systems is included since robot theories must be tested to ensure validity. The bottom row provides a sneak preview of reactive deliberation, the architecture presented in this paper, since it is a partitioned architecture.

Architecture                   When are transient goals selected?                                 How are transient goals selected?        How far into the future do transient goals extend?   Systems
Shakey                         On failure or completion of plan.                                  Planner                                  Arbitrary Length Plans                                "Office" Robot
RAP                            On failure or completion of plan.                                  Planner                                  Arbitrary Length Plans                                Office Robot (1?)
ATLANTIS                       On failure or completion of plan, and as frequently as possible.   Planner (possibly other mechanisms)      Arbitrary Length Plans                                Outdoor Robots (>2)
Dynamics of Action Selection   As frequently as possible.                                         Activation Network                       One Action                                            Tested in Simulator
Reactive Deliberation          As frequently as possible.                                         Estimates of the utility of behaviours   One Action                                            Multiple Soccer Playing Robots

Table 2: Partitioned Architectures in Perspective

The top three architectures, Shakey, RAP, and ATLANTIS, exhibit similar characteristics, since they are all designed around planning systems. In Shakey and RAP, arbitrarily long sequences of actions are planned and then executed without much consideration for other actions. ATLANTIS is similar, but allows alternate plans to be considered. In all three, sequences of actions are generated, and not just a single action. The danger with these approaches is that much of the robot's success lies in its ability to predict future worlds. A robot must be able to operate in a dynamic world where there are multiple agents, actions are continuous, and the effects of non-trivial actions are unpredictable. To operate in a real environment, transient goals must be evaluated frequently: the blind pursuit of plans is a poor paradigm to follow.

The dynamics of action selection generates a single action at each time step, so it is able to select actions appropriate under the current circumstances. This approach is more appropriate for robots in dynamic environments than the approaches based on planning. However, the dynamics of action selection is hardly an ideal scheme.

One notable omission from this chapter is a survey of hierarchical architectures such as NASREM where the problem of controlling a robot is considered analogous to the problem of controlling an army or business [Alb81].
Hierarchical architectures in their general form have been successfully used for the control of robot work cells in manufacturing. However, their use in mobile robots has been limited and their applicability in uncertain and dynamic environments has yet to be demonstrated. For these reasons, hierarchical architectures have not been given detailed treatment. It is worth noting that partitioned architectures can be considered a special case of hierarchical architectures.

2.4 Additional Related Literature

This section describes additional previous work relevant to this thesis. The work is related to mechanisms that arbitrate between conflicting goals. Planning systems are not suitable for this; they do not decide which goal the agent should pursue, but rather accept a single goal as an input. One possible solution is a heuristic mechanism, such as the dynamics of action selection, that arbitrates between goals. A more rigorous approach is to use decision theory to evaluate the expected utility of each action.

Decision theory can be used to answer the question "What to do now?" (It can also be used to compute a sequence of actions.) Given an initial state w0 and finite sets of decisions D and world states W, decision theory requires that a value v[w] be assigned to each world state and, for each potential decision, that the conditional probability p[w | d, w0] of reaching each world state be specified. The expected value of each decision, EV[d], is then the sum of the values of each world state weighted by its conditional probability, as shown in the following equation:

    EV[d] = Σ_{w ∈ W} v[w] · p[w | d, w0]

The best decision is then simply the decision with the greatest expected value.

One problem with decision theory is that there are a lot of world states, and this leads to a very large number of conditional probabilities. The above computation must be performed for each initial state, so for n world states and m decisions¹, the number of conditional probabilities needed is n²m. Continuous properties must be discretized; this forces the designer of a system using decision theory to abstract away (hopefully) unimportant details of the world to reduce the number of conditional probabilities to be computed. A decision model for robots based on decision theory has been proposed [KD89]; however, it does not handle continuous variables, nor is it possible to do sophisticated spatial reasoning. For realistic robotic environments, decision theory does not appear to be a good approach; however, it motivates the approach taken in reactive deliberation of generating estimates of the utility of actions.

¹ The situation is actually worse than this. If there are k independent propositions, then n = 2^k. Also, m is exponential in the number of independent decisions, and quadratic in the number of different control values.

Bidding as a mechanism for action selection has been proposed before for autonomous robots. Minsky describes a central marketplace where mental proto-specialists compete for control [Min86]. Each proto-specialist represents a different goal and generates a bid based on the internal urgency of the goal. He argues that this approach is bound to result in unstable behaviour, but this conclusion is reached since only the urgencies of the goals and not the current situation are considered in his scheme. Once bids are based on both the importance of the goal and the current situation, the objections raised are invalid. The problem of mediating behaviours is also discussed in [Kae90] and follows the same (flawed) line of argument as in Minsky.

Action selection mechanisms based on bidding are referred to as drives in the psychological literature. Different mechanisms for action selection are compared through a simulation of a zebra living in an African savannah in [Tyr93]. He found that while the drives approach is effective, a free-flow decision hierarchy (neural network) works better due to its ability to combine preferences (actions). It makes sense to combine preferences in the simulation when the world is a discrete grid and there are only a small number of actions. It is unclear how preferences can be meaningfully combined in a real robot where actions, such as the desired direction of motion, are continuous.

Systems such as contract nets allow negotiation between agents to make a decision [Smi80]. This is very different from the bidding mechanism described above. With negotiation, the bids an agent makes are influenced by the bids of other agents, while with the bidding described above, there is no interaction among agents (behaviours, goals). Negotiation has been used for the control of multiple robots [Nor92].

2.5 Discussion

Through transient goals an abstract view of robot architecture is provided. From this viewpoint, landmark architectures have been partitioned into two groups: complete reactive architectures and partitioned architectures. Each group has a number of features that are useful in dynamic domains.

In complete reactive architectures, the transient goals of the robot are implicit in the control signals. These architectures can compile down to a circuit-level description and can only perform bounded computations, and so are able to respond quickly to changes in the environment.

Partitioned architectures have transient goals explicitly represented as actions. Actions are associated with activity-producing modules to abstract away system-specific implementation issues and increase modularity. One weakness typical of these architectures is the commitment to arbitrary-length plans.

The separation of the selection of transient goals from the frequent generation of control signals in partitioned architectures provides two advantages. First, transient goals can be selected at whatever rate is appropriate for the environment, since it is seldom necessary to select transient goals as frequently as control signals. Second, lengthy computations can be performed, while complete reactive architectures are restricted to those that can be done at the same rate as control signals.

It is not appropriate to rank the architectures described in this chapter according to their suitability for dynamic domains, since there is no clear basis for such a comparison. For instance, both the concrete-situated approach and the dynamics of action selection are only partial specifications of robot architectures and results have been limited to software simulations. Shakey, RAP, and ATLANTIS are very similar with only small differences, although ATLANTIS supports some features that recognize the time-limited nature of dynamic environments. The utility of the situated automata approach, although it appears limited, is not entirely clear.
The subsumption architecture has some good elements, but has not evolvedsignificantly since its inception in the mid-eighties.33Chapter 3Reactive DeliberationIn this chapter, the main results of this thesis are presented through an architecture calledreactive deliberation. It combines responsiveness to the environment with intelligent decisionmaking. Even deliberation must be to some extent reactive to changes in the environment.Although the name is an oxymoron, it is consistent with Artificial Intelligence nomenclature(cf. Reactive Planning).Although reactive deliberation is a partitioned architecture, it shares many of the featuresof the complete reactive architectures and contains the architectural elements needed indynamic domains. Reactive deliberation was influenced principally by RAP and the dynamicsof action selection, although all of the approaches discussed in the previous chapter havecontributed to it in some way.Reactive deliberation has two related architectural contributions. The first is the partitionof the architecture into deliberative and executive layers. The second is the decompositionof the deliberative layer into behaviours that represent the goals of the robot. Each of theseis motivated and explained in the following sections.34Chapter 3^ Reactive Deliberation3.1 The ArchitectureReactive deliberation is a partitioned architecture that follows the principles behindShakey. However, there are many ways to divide and conquer the problem of robot control.The difference between reactive deliberation and other approaches is the level of abstractionat which the split between reasoning and execution monitoring occurs. In this section, analternative way of selecting this split, one that is suited to dynamic domains, is presentedand justified.Reactive deliberation is partitioned into deliberative and executive layers. The delibera-tive layer decides what to do and how to do it, while the executive layer interacts with theenvironment in real-time through a set of action schemas that receive run-time parametersfrom the deliberative layer. The components that form each layer are collectively referredto as the deliberator and the executor respectively.A structural model illustrating the partition can be seen in Figure 7. The deliberatoris composed of multiple concurrently active behaviours, while the executor has only oneaction schema enabled at a time. The deliberator and the executor run asynchronously toallow the executor to continuously interact with the world and the deliberator to performtime consuming computations. More detailed information on behaviours and action schemasare provided in the following sections.The deliberator is responsible for answering the questions: "What to do now?" and"How should it be done?" Believers in the theory of plans-as-communication would arguethat these questions can and should be resolved independently [Gat92]. In this case a plannerdecides what to do based on an abstract world model, while the problem of resolving howeach action should be performed is postponed until it is to be executed. In a dynamic35Chapter 3^ Reactive DeliberationDeliberator behaviours•Sensand Sor Datatatus ExecutorActionParamandtersFromSensorsaction schemas ToEffectors-DPFigure 7: The Architecture of Reactive Deliberationenvironment, however, these questions are usually interrelated. Before committing to anaction, it is important to verify that the action is both feasible and more appropriate thanother actions. 
Partitioned architectures that follow the planning paradigm check to see if anaction is feasible, but not if there is a better action.Answering the question "How should it be done?" provides information about the utilityof an action. For example, detailed planning may show that one action is impossible, whileanother can be accomplished quickly. This suggests that generating plans at a high level ofabstraction may not provide an effective solution for the problem of action selection. Unlessall actions of the robot are feasible and the outcomes can be predicted at design time, thequestion "What to do now?" cannot be intelligently answered without also answering "Howshould it be done?"The interface between the two layers is fairly simple, yet reveals important structuralfeatures of reactive deliberation. The executor regularly sends filtered sensory data as wellas occasional status messages describing events or errors from the enabled schema. The36Chapter 3^ Reactive Deliberationdeliberator sends an action with parameters to the executor when the enabled schema hasaccomplished its objective or an action more appropriate the current situation is found. Thisis a central feature of the architecture: the deliberator can interrupt the current activity ofthe executor and provide it with a more appropriate one. Notice also that the deliberator isresponsible for generating a single action, whereas other partitioned architectures (exceptingthe dynamics of action selection) generate a complete plan (i.e. sequences of actions). Thisdistinction helps focus the deliberative activities on the immediate situation.3.2 The ExecutorThe executor is composed of a collection of action schemas. An action schema is arobot program that interacts with the environment in real-time to accomplish specific actions.Schemas receive run-time parameters from the deliberative layer that fully define the activity.The Action schemas exhibit the same level of complexity as controller modules in RAPand primitive actions in ATLANTIS. They are designed in the spirit of behaviour-basedapproaches, where each schema is experimentally verified. For example, typical schemasmight be: follow path, drive along wall, push box, and pickup block. All the schemastogether define the capabilities of the robot; they are independent of the robot's goals.The deliberator enables a single action schema with a set of run-time parameters thatfully defines the activity. Only one action schema is enabled at a time and it interacts withthe environment through a tight feedback loop. In the world of real-time control there is noroom for time consuming planning algorithms. Computations in action schemas are restrictedto those that can keep pace with the environment, so lengthy computations are performedin the deliberator.37Chapter 3^ Reactive DeliberationThe boundary between appropriate and inappropriate computations in the executor is afunction of the computing power of a particular system and specific environmental constraints.The idea is that any computations that can be performed within the time constraints ofthe environment are suitable for use in the executor. Computations which do not meettime constraints are relegated to the deliberator to avoid degrading the ability of the robotto interact in real-time. Regardless of advances in computing power, there will likely beinteresting algorithms that do not run in real-time. 
This suggests that the partition betweenthe executor and the deliberator is indicative of a technology-independent need to segmentcomputations.3.3 The DeliberatorIn this section, the internal specifications of the deliberator module are described. Thefocus of the deliberator is on an effective mechanism for selecting actions or transient goalsin a timely manner. There are a large number of architectural issues to be discussed, so themore peripheral ones have been relegated to the Subsection 3.3.2, Further Issues.A central feature of reactive deliberation is that the deliberator is composed of modulescalled behaviours that represent the goals of the robot. The notion of a behaviour is usedin the sense of Minsky's mental proto -specialists [Min86] with some important distinctions.In reactive deliberation, each behaviour computes an action and generates a bid reflectinghow suitable it is in the current situation. The most appropriate behaviour, and hence action,is determined in a distributed manner through inter-behaviour bidding. Some examplesof behaviours are: clean floor, recharge, deliver mail, and patrol building. Notice thatbehaviours characterize goals, not actions.38Chapter 3^ Reactive DeliberationA behaviour is a robot program that computes actions that may, if executed, bring abouta specific goal. Behaviours compute actions whereas action schemas perform actions. Eachbehaviour must perform the following: 1) select an action schema, 2) compute run-timeparameters for the schema, and 3) generate a bid describing how appropriate the action is.3.3.1 More Than Just A Theory of Action SelectionThe deliberator is responsible for ensuring the robot acts intelligently in its environment.It must decide what the robot should do now by selecting the action or transient goal topursue. Reactive deliberation does this while also addressing the problem of limited timeand bounded computational resources.The deliberator in reactive deliberation is composed of modules that each represent agoal or behaviour. Each behaviour evaluates the world and performs whatever planning isneeded to fully describe its associated action. A bid is produced by each behaviour thatreflects how beneficial it would be for the behaviour to gain control of the robot. In otherwords, a bid expresses the expected utility of an action given the current world state. Thebehaviour with the highest bid gains control of the robot's effectors.Each bid is an estimate of the expected utility that is based on the current state of theworld as well as the results of planning. Currently, the criteria for generating the bids arehand coded and tuned so that the most appropriate behaviour is active in each situation. Thisapproach requires the designer of a system to explicitly state the conditions under whichcertain behaviours are suitable and favourable. A simplified version of this appears in allthe complete reactive architectures. For example, the concrete-situated approach uses binarypreference relations to establish an ordering of proposers or actions.39Chapter 3^ Reactive DeliberationIn a real robot there is more to the problem of action selection than just deciding whatto do. In dynamic environments, a robot needs to quickly decide what to do and how to doit. The deliberator must keep pace with changes in the environment to produce intelligentbehaviour. Each behaviour is responsible for computing a bid and planning the action. 
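A minimal sketch in C (the language later used for the controller) may help fix these ideas. The type and function names below are invented for illustration; in particular, the sketch lets every behaviour bid before only the winner plans its action, whereas in the full scheme a bid may itself depend on the results of planning.

    typedef struct {
        int    schema;        /* which action schema to enable in the executor */
        double params[4];     /* run-time parameters for that schema           */
    } Action;

    typedef struct {
        const char *name;
        double (*bid)(const void *world);   /* estimate of the action's utility */
        Action (*plan)(const void *world);  /* fill in the action's details     */
    } Behaviour;

    /* One deliberation step: every behaviour bids on the current world state,
     * and the highest bidder plans the action that is sent to the executor.   */
    Action deliberate(const Behaviour *b, int n, const void *world)
    {
        int i, best = 0;
        double best_bid = b[0].bid(world);
        for (i = 1; i < n; i++) {
            double v = b[i].bid(world);
            if (v > best_bid) { best_bid = v; best = i; }
        }
        return b[best].plan(world);
    }

Because each behaviour carries its own bidding and planning code, new goals can be added without rewriting a central decision maker.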
Fixedcomputational resources need to be distributed among the behaviours, since it is typicallythe case that there is too much computation to be done.In reactive deliberation behaviours are not always active. Assuming the architecture isimplemented on a standard serial processor (which may be time sharing with the controlleror on a separate processor), the order of activation of behaviours must be scheduled. Atany given time there will be a ruling behaviour that has control of the robot executor. It isdesirable to schedule the ruling behaviour more frequently, since it may want to modify thecurrent action to reflect changes in the environment. Another observation is that behaviourswhose bids will be lower than the active behaviour's do not need to plan their actions. Thisis useful since planning an action is more computationally expensive than generating a bid.The exact mechanism for distributing computational resources is left unspecified as it isstrongly dependent on the real-time requirements of the system, the number of behaviours,and the resources need by each behaviour. However, the basic principle is to divide theavailable computational resources among the behaviours such that the ruling behaviourreceives more resources. This allows behaviours that perform minimal computations torespond quickly, while those that perform lengthy computations will respond slowly. Itmight be appropriate to allocate resources according to the importance and needs of eachbehaviour, but there are no provisions for this in the current implementation. There isno perfect architectural solution to the problem of limited computational resources. If the40Chapter 3^ Reactive Deliberationcomputations slow, then the robot will be slow too. The only possible solutions are to getmore powerful computers or better algorithms.Reactive deliberation is a good approach for robot domains that do not have a largenumber of goals (and hence behaviours). Most robots built to date have few capabilitiesand can only accomplish a small number of goals. It is clear that reactive deliberationmay become cumbersome if there are a large number of behaviours. On the other hand,mechanisms could be developed that might, for instance, switch banks of behaviours off insituations where they would never gain control.3.3.2 Further IssuesIn this subsection a number of issues peripheral to reactive deliberation are discussed.These are: starvation and stability, planning, multiple resources, distributed versus centraldecision making, learning, and multiple robots.Starvation and Stability There are two issues that must be addressed by any arbitrationmechanism: starvation and stability. Starvation occurs when a behaviour should gaincontrol but does not. This can be avoided by tuning the bids generated by each behaviour.For instance "impatient" behaviours might have bids that increase with time until thebehaviour gains control. A system is unstable if it alternates between two behaviours withoutaccomplishing the goal of either one. Stability can be accomplished through appropriatebidding. Behaviours that have terminating actions bias their bid upward to reflect the factthat they are closer to completing the action. This prevents the system from alternatingbetween two different behaviours without accomplishing either objective. 
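As an illustration of this kind of bid shaping, the fragment below biases a bid upward in proportion to how far the currently ruling behaviour has progressed toward completing its action. The bias constant and the progress measure are hypothetical; in practice such values are tuned by hand, and a complementary adjustment for non-terminating behaviours is described next.

    double shaped_bid(double raw_bid, double progress, int is_ruling)
    {
        /* progress is assumed to lie in [0,1]: fraction of the action completed */
        double bid = raw_bid;
        if (is_ruling)
            bid += 0.5 * progress;   /* hysteresis: favour finishing what was started */
        return bid;
    }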
Nonterminatingbehaviours may have to incorporate a "boredom" component in bidding so that if they41Chapter 3^ Reactive Deliberationhave control for a long period of time, they get bored and reduce their bid to allow otherbehaviours to gain control.Use of Plans The use of plans is permitted within the deliberator and may be of great useguiding the activities of the robot. For instance, a behaviour might include a planner anda plan monitor that would bid against other behaviours on the basis of the appropriatenessof the current plan step in the current situation. However, plans are inappropriate whenused as an abstract robot program simply handed over to an executor that follows it tothe exclusion of alternate actions. It is widely recognized that robots need to operate in adynamic environments that are unpredictable and full of uncertainty [Mac93]. An exceptionto traditional plans is universal plans [Sch87] where actions are described for all possibleworld states; however, this is not a tractable solution.Multiple Resources In the above description, all the behaviours compete for control of theentire robot: there is only a single resource. This can be extended to multiple independentresources, such as motion, sensor and manipulator control, where each behaviour producesa bid for each resource that it needs. If control were assumed only when all the desiredresources were available, the problem of deadlock could be avoided, although starvationmust again be solved by proper tuning of the system.Modularity and Concurrency Reactive deliberation is a distributed mechanism for actionselection. Each behaviour is responsible for evaluating the world according to its own criteria.There is no central decision maker that evaluates the world and decides the best course ofaction. One feature of reactive deliberation is the ability to add new behaviours withoutmodifying the bidding criteria of the established system; a new behaviour must, of course,42Chapter 3^ Reactive Deliberationbe tuned to be compatible with existing ones. Also, the behaviours are independent, sothey can have different representations and approaches to generating actions. For instance,a behaviour could incorporate a traditional planner to generate complex activities. Anothermajor advantage is that behaviours can be run concurrently on different processors, thusimproving the response time of the system.A Possible Unitary Framework Reactive deliberation need not be distributed. It couldbe implemented as a unitary system that consults a database to obtain bidding criteriaand the goal of each behaviour. Unfortunately a unitary system forces commitment toone representation for deliberative activities, whereas independent behaviours allow multiplerepresentations and distributed processing. However, one advantage to using a unitary systemis that a single representation system is easier to design and test.Learning It is possible to implement a learning system that would generate appropriatebids, rather than use the current method of tuning the bids by hand. Given a fixed set of inputsdescribing important domain attributes and the results of planning, appropriate bids could belearned. For example, neural networks are good for approximating nonlinear multivariablefunctions. However, the training set may be difficult to generate for dynamic domains.Inter-robot Cooperation Reactive deliberation can be extended to multiple robots, whereeach robot would broadcast its intended actions and a bid which estimates the appropriatenessof that action. 
Robots whose actions are in conflict will reevaluate their decision to includethe bid information of other robots. Robots with lower bids than other robots will lower theirinternal bid for that action, since that action is not a good one in the current situation. This43Chapter 3^ Reactive Deliberationshould result in another behaviour gaining control of the robot. It is not clear, however, thatsuch a mechanism will lead to effective inter-robot cooperation.3.4 DiscussionReactive deliberation contains a consistent set of architectural elements needed in dy-namic domains. It was developed to investigate some of the problems with producing real-time intelligent behaviour in dynamic environments. Reactive deliberation makes three con-tributions to robot architecture. -A new split between reasoning and control is needed because the questions "What to donow?" and "How should it be done?" are interrelated. The partition between reasoning andcontrol advocated in reactive deliberation makes the deliberator responsible for answeringthese questions. The executor is restricted to bounded-time computations to provide a tightfeedback loop with the environment. Reactive deliberation recognizes that (low level) robotcontrol is an on-line activity, while action selection and planning can be off-line activitieswithout degradation in performance. Complete reactive architectures and constraint nets areessentially on-line and do not take advantage of the ability to perform off-line computations.Other partitioned architectures will in general have greater lag since detailed planning ispostponed until the action is to be executed.The transient goals of a robot need to be evaluated at a rate commensurate with changesin the environment. This is an important claim since other partitioned architectures (exceptthe dynamics of action selection) do not meet this criterion. In reactive deliberation, therestriction of attention to the next action allows the deliberator focus on the best action now.Goal-oriented behaviours are a useful abstraction that allow effective goal arbitrationand sharing of scarce computational resources. Goal arbitration is accomplished through a44Chapter 3^ Reactive Deliberationdistributed bidding mechanism that reflects the expected utility of the actions of the robot,while computational resources are shared among behaviours.Table 2 on page 29 compares reactive deliberation with the partitioned architectures.The dynamics of action selection serves a similar role as reactive deliberation, but uses thespreading of energy in an activation network to select actions, rather than utility mindedbehaviours. The traditional planning-based architectures are quite different since theyconsider plans rather than individual actions. This bias prevents them from reacting tochanges in the environment.Reactive deliberation is not a panacea for robotic woes. A further disclaimer is that itis an incomplete robot architecture. Its focus is on the issues related to dynamic domains,ignoring a number of important robotic issues. The largest omissions are on sensing issuesand on world modeling. Sensor processing, fusion, and the development of maps and worldmodels are really hard problems. This work ignores them to focus on the problem of real-timeintelligent behaviour in dynamic environments.45Chapter 4Playing Soccer: A Real-WorldExperimentThis chapter links theory to practice describing a robot controller that has been con-structed using reactive deliberation. 
The controller has been designed so that the robot maycompete with another robot in a game of soccer. Soccer-playing was chosen as a domainfor experimentation since it requires real-time interaction with a dynamic environment andinvolves inter-robot competition.4.1 The Dynamite TestbedThe Dynamite testbed, which has grown out of the Dynamo (Dynamics and MobileRobots) project at UBC [BKL+93; BKM+93], consists of a fleet of radio controlled vehiclesthat receive commands from a remote computer. Robot position and orientation is determinedusing off-board visual sensing. The testbed allows autonomous robots to be controlled withrelative ease from an off-board computer. The testbed is intended for robotics experiments,so an effort was made to simplify the visual tracking of robots and the programmer interfacewhile retaining the challenges faced by real robots.Chapter 4^ Playing Soccer: AReal-World Experiment4.1.1 The Soccer FieldThe mobile robot bases are commercially available radio controlled vehicles. Two 1/24scale racing-cars are used in the experiment. (There are six racing cars in the lab altogether).Each is 22 cm long, 8 cm wide, and 4 cm high excluding the antenna. These cars have drivenunder computer control at speeds of 140 cm/s, well below the maximum speed of 6 m/s. Thesoccer field is 244 cm by 122 cm in size. Two cars and a ball are shown their environmentin Figure 8. The ball is the small object in the middle of the image; each of the cars is fittedwith two circular colour markers allowing the vision system to identify their position andorientation. The robots are neither as flexible nor as competent as human soccer players. Asa result, the environment is modified in two ways. First, there is a wall around the soccerfield which prevents the ball (and the players!) from going out of bounds. Second, thereare bathers to prevent the ball from getting trapped in the corners. Since these are Canadianrobots, it is not unreasonable for the soccer field to be shaped like an ice hockey rink.Figure 8: Robot Players Inhabiting the Soccer Field4.1.2 System OverviewThe hardware used in this system is shown in Figure 9. There is a single colour cameramounted in a fixed position above the soccer field (Figure 8 shows the image of the soccer47Chapter 4^ Playing Soccer: AReal-World Experimentfield viewed from the camera). The video output of the camera is transmitted to special-purpose video processing hardware named the DataCube in Figure 9. The DataCube is adataflow computer which has been programmed to classify image pixels into different colourclasses at video rate. This information is transmitted to a network of transputers which forma MIMD computer. Additional vision processing is performed on the transputers to find theposition, in screen coordinates, of the centroid of each coloured blob and to transform thesepositions from screen to world coordinates. The entire vision subsystem is called the VisionEngine [LBaJL91]. The Vision Engine produces the absolute position and of all the objectson the soccer field. The orientation is also produced for the cars, but not the ball. This isdone at 60 Hz with an accuracy in position of a few millimeters. For further information onthe Vision Engine and the vision software, refer to Appendix A.The reasoning and control components of a vehicle can be implemented on any number oftransputers out of the available pool. Each vehicle is controlled by a user program runningon two transputer nodes: one for the deliberator and one for the executor. 
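To indicate roughly how such a per-vehicle user program is organized, the sketch below shows the skeleton of an executor process. The data types and the receive/transmit helpers are hypothetical stand-ins for the testbed's message-passing interface, not its actual API.

    typedef struct { double x, y, theta; } Pose;     /* position and orientation    */
    typedef struct { double x, y; }        Point;    /* the ball has no orientation */
    typedef struct { Pose car, opponent; Point ball; } WorldState;
    typedef struct { double throttle, steering; }      Command;

    /* Hypothetical glue routines assumed to be supplied by the testbed. */
    void    receive_vision_update(WorldState *w);
    void    forward_to_deliberator(const WorldState *w);
    Command run_enabled_schema(const WorldState *w);
    void    transmit_command(const Command *c);

    /* Executor skeleton: a tight loop that turns the latest world state into
     * throttle and steering commands using whichever action schema is enabled. */
    void executor_loop(void)
    {
        WorldState w;
        Command    c;
        for (;;) {
            receive_vision_update(&w);
            forward_to_deliberator(&w);
            c = run_enabled_schema(&w);
            transmit_command(&c);
        }
    }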
An arbitrarynumber of nodes, labeled 1 to n in Figure 9, can be used in parallel to control n independentvehicles. The movement of all vehicles is controlled through a radio transmitter attachedto a single transputer node. Commands are transmitted to the vehicles at a rate of 50 Hz.System users are able to install their own planning and control routines and simply link into the vision and transmitter systems. Each user program is independent and does not effectthe operation of others. User programs can cooperate with one another by communicatingvia message passing.4.1.3 User InterfaceOne of the advantages of this environment is a clean user interface. The user program48RGB Camera(Single CCD)User Node: Reasoning & Control ;Radio Transputer Network ;Transmitter;Chapter 4^ Playing Soccer: AReal-World ExperimentFigure 9: Hardware Setupwhich is to perform reasoning ; planning and control is shielded from the some of thecomplexities in the world. The input to the user program is a sequence of vectors describingthe locations of all the objects. The output from the user program is the sequence of controlsignals sent to the vehicle. The control signals are throttle and steering angle. These controlthe power given to the motors and the angle of the front wheels.This interface is of significant benefit to the user. It allows one to focus on making therobot "do the right thing" instead of worrying about implementation details. It is possibleto create this interface since there is a robust vision system and a reconfigurable networkof off-board computers. Another benefit this abstraction provides is that user programs canbe connected to either a simulator or the real system. The simulator is described fully inAppendix B. It has proved to be an invaluable tool in developing robot programs.4.1.4 System LatencyThe vision system runs at the video standard of 60 Hz (7 = 16.7 ms). The rate of ourcommercial R/C equipment is 50 Hz (T = 20 ms). The rates do not match and there is nomechanism for providing synchronization. This problem is ignored by the action schemaand in the internal model of the vehicle dynamics. The rate mismatch problem causes anincrease in the latency of the servos of between 0 and 20 ms. It is possible for control49Chapter 4^ Playing Soccer: AReal-World Experimentsignals to be overwritten in the output buffer, however this will have no significant impactbecause control signals typically change smoothly except at separated discontinuities. Theoverall system latency (including rate mismatch latency) is estimated at 140 ms, but this ismainly due to mechanical servo-travel time. As a result, ignoring the problem has only asmall effect on the controllability of the robots.4.1.5 Soccer with Monster Trucks1/10 scale radio controlled trucks as well as the 1/24 scale racing cars have been used toplay soccer. In Figure 10 a racing car, Zeno, is in the foreground and a truck, Heraclitus, isin the background. (The names are inspired by Monty Python's soccer-playing philosopherssketch. Zeno and Heraclitus were two Greek philosophers particularly concerned withdynamic worlds.) A 12 inch (30 cm) ruler is also visible. The trucks were the originalvehicles used in soccer-playing experiments. The field used with the truck was 10 ft. (3 m)by 10 ft. which is too small because the truck's minimum turning circle is about 7 ft. 
In sucha constricted environment, navigation was difficult and resulted in odd-looking path plans.The Dynamite testbed, in contrast, has a width of about 2.5 turning circles and a length oftwice that. There is more than enough room to play and experiment in.4.1.6 Non-holonomic ConstraintsAs mentioned earlier, the robots used in the Dynamite testbed are radio controlled cars.These robots cannot translate and rotate freely in the plane, unlike the omnidirectional robotstypically used in indoor mobile robot experiments, such as Shakey. Car-like robots aresubject to non -holonomic kinematic constraints that limits the change in a robot's directionof motion [Lat91]. A holonomic constraint is a constraint that is a function of only the currentconfiguration (position and orientation) and time. Non-holonomic constraints are constraints50Chapter 4^ Playing Soccer: AReal-World ExperimentFigure 10: Two Dynamo Vehicles: Zeno (front) and Heraclitus (rear)with higher order (with respect to time) terms such as velocity that cannot be integratedout. In practical terms, this means that cars cannot move sideways and have motion that isbounded by minimum turning radius circles. Non-holonomic constraints can alternatively beinterpreted as constraints that do not reduce the size of the configuration space of the robot.The non-holonomic constraints affect the problem of robot control. Since motion isconstrained, it is no longer possible to move in the direction of the desired destination.Although motion can be accomplished without motion planning, there is a stronger need forplanning than with omnidirectional robots. Real life problems such as parallel parking areconnected with motion planning. Cars need to move backwards when parallel parking closeto the curb, just as the soccer-playing robots drive backwards to make a shot when the ballis very close to a wall.4.1.7 Design DecisionsThe Dynamite testbed has been engineered so that some important issues in the "realworld" are ignored. This reflects a careful research strategy where the problems faced in51Chapter 4^ Playing Soccer: AReal-World Experimentdynamic domains have been examined to the exclusion of other problems.The vision system is simple, but this thesis is not about a new approach to vision, norabout building robots to roam around offices or solve other industrial problems. A singleoff-board camera allows us to track at least six (and possibly more) robots at a time in afixed area, and that is all that is needed for experiments with multiple robots. Off-boardcomputation is used because it is easier to modify software and the few conceptual gains tobe made from engineering an on-board system are not relevant to this thesis.In Section 2.1.1 a number of characteristics of robotic domains were introduced. TheDynamite testbed is a structured, simple environment. (i.e. not unstructured and complex.)The problem of sensor ambiguity has been engineered away. Coloured blobs are mappedinto car and ball location and the locations of the goals and the walls of the field are knowna priori in the fixed world coordinates. The camera is calibrated to a coordinate systemthat is consistent with the fixed locations. The goal markings discernible in Figure 8 arefor the benefit of humans only; it has no effect on the vision system. The two importantcharacteristics that do apply to soccer-playing are that it is unpredictable and dynamic.Although the complexity of the testbed has been minimized, the experiments explore achallenging dynamic world. 
The cars are able to drive safely at scale speeds upwards of 80km/h in a confined space. These are autonomous mobile robots that function in a near-realworld. Robots playing soccer sounds like fun, but in no way is this a toy experiment.4.1.8 Importance of the testbed to this thesisAn important disclaimer: the author did not build this testbed. Although he has put asignificant amount of time and energy into it, others have contributed far more. This thesisshould be evaluated on the usefulness of reactive deliberation and not on the construction of52Shoot Wait^ClearServo^behavioursGo *ADefend Line^DeferuiAed Line^Home LineGo SomeLineChapter 4^ Playing Soccer: AReal-World Experimentthe Dynamite testbed. The testbed was, however, necessary for testing reactive deliberationand its use is an integral part of this thesis.4.2 The Soccer Controller Implemented in Reactive Deliberation4.2.1 OverviewFigure 11 shows the soccer-playing controller based on reactive deliberation. TheExecutor box lists the action schemas, while the Deliberator box lists the behaviours. Thebehaviours are responsible for accomplishing objectives, whereas the action schemas areprimitive operations or activities that the robot is capable of performing. For example, thego home line behaviour plans a path from the current position to a position in front of therobot's home goal, while the follow path schema causes the robot to track a specific pathtrajectory. The details of behaviours and action schemas will be explained further in thefollowing subsections.Deliberator•Sensor Dataand Status Executor •Action andParam tersFollowPath StopToEffectorsFromSensorsServoDefendFigure 11: The Reactive Deliberation Controlleraction schemasIdle53Chapter 4^ Playing Soccer: AReal-World ExperimentThe deliberator and executor each run on their own transputer nodes (processors) andcommunicate through message passing. The executor sends position and status informationto the deliberator at 60 Hz. The deliberator asynchronously updates the current action ofthe executor when appropriate.The controller has been implemented in C. (This is the only programming language ourlab supports on the transputer network.) It has been constructed over a period of a year,yet only an estimated 3 or 4 person-months have gone into design, coding, and testing. Thevarious parts of the controller total more than 5000 lines of code. This figure excludes thesimulator and vision system, since they were written by other members of the laboratory.4.2.2 ExecutorTable 3 shows the action schemas that have been written for the executor. Only oneaction schema is enabled at a time. The enabled schema sets the robots control outputs,throttle and steering angle, and sends messages to the active behaviour in the deliberatorwhen it is having problems. The enabled schema can also transfer control to other schemas.For instance, the follow path schema transfers control to the stop schema when it has reachedthe end of its path. Similarly, the stop schema transfers control to the idle schema whenthe robot has stopped moving.The follow path schema is by far the most complex one, since the robot is subject tonon-holonomic constraints. This schema follows the path (that consists of circular arcs andstraight line segments — see Appendix C) to within a certain tolerance measured in absoluteposition and heading errors. An example of a planned path and the actual path followedis shown in Figure 12. The path plot is for the center of the car. 
The box in the middle indicates the dimensions of the robot.

Action Schema | Parameters              | Function
Follow Path   | Path plan.              | Follow the path trajectory while minimizing tracking error.
Servo         | None.                   | Drive at the ball until it is struck (provided this will advance the ball).
Defend        | Offset of defense line. | Stay between the ball and the net and intercept incoming balls.
Stop          | None.                   | Reduce the speed of the car to zero.
Idle          | None.                   | Do nothing.

Table 3  Action Schema for Soccer-Playing

[Figure 12: Path Plot for the Car-like Dynamite. Axes: X (cm), Y (cm); legend: Playpen Walls, Dynamite, Planned Path, Actual Path.]

Even the servo schema is atypical of simple behavioural routines. This schema tries to servo the robot into the ball. However, rather than just driving straight at the ball, this routine servos the robot to a future location of the ball that is predicted using an internal model of the ball's dynamics.

The defend schema alternates between two modes. Normally, the robot stays between the ball and the center of the net. However, if the projected motion of the ball will carry it past the end of the robot, the robot moves to intercept it in an effort to keep the ball away from the net.

The levels of performance in control that are demonstrated in this thesis are due principally to the use of feed-forward control. At low speeds, feed-forward control is not needed, and a simple fuzzy logic controller is sufficient for accurate control of the robot. However, at higher speeds (1 m/s), simple control routines are not adequate. The future position and velocity of the robot is predicted using information about the latency in the sensing and control and a model of the robot dynamics. Feed-forward control uses the predicted state of the robot to determine the control signals, rather than the current state.

There are three executor subsystems that run independently of the enabled action schema. Alternatively, one could think of these routines as being instantiated in each schema. The sensor processing subsystem receives position data for all the objects in the soccer field from the vision engine at 60 Hz. A least-squares approximation is used to estimate the velocity of the ball and the cars. The communications subsystem updates the planner with position and velocity data at 60 Hz. The collision detection subsystem automatically transfers control to the stop schema to prevent collisions with other robots.

4.2.3 Deliberator

The deliberator implemented using reactive deliberation does not contain a traditional planner. Instead, the deliberator is composed of behaviours, listed in Table 4, that represent the goals of the robot. The action schema invoked by each behaviour is listed in the second column. It is possible for a behaviour to select different schemas, although this was found not to be needed. The last column identifies the goal of the behaviour. Each behaviour is responsible for generating a bid and planning an action.
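Before turning to the individual behaviours, a sketch of the bidding arithmetic may be useful. As described below, each bid combines a base value with weighted environment factors and a logarithmic factor derived from motion planning; the factor names, weights, and base value in the following fragment are purely illustrative rather than the tuned values used in the implementation.

    #include <math.h>

    double shoot_bid(double dist_to_ball_m, double opp_dist_to_ball_m,
                     double planned_travel_s, int already_on_path)
    {
        double bid = 6.0;                                   /* base bid             */
        bid -= 2.0 * fmin(dist_to_ball_m / 2.0, 1.0);       /* ball far away: worse */
        bid += 1.0 * fmin(opp_dist_to_ball_m / 2.0, 1.0);   /* uncontested: better  */
        bid -= 1.0 * log(1.0 + planned_travel_s);           /* slow plan: worse     */
        if (already_on_path)
            bid += 1.0;                                     /* stability bias       */
        if (bid < 0.0)  bid = 0.0;
        if (bid > 10.0) bid = 10.0;                         /* bids lie in [0,10]   */
        return bid;
    }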
However, in this implementation, planning is limited to motion planning.

Behaviour        | Action Schema | Goal
Shoot            | Follow Path   | Shoot the ball into the net (either directly or off a wall).
Clear            | Follow Path   | Clear the ball from the home end of the soccer field.
Servo at Ball    | Servo         | Advance the ball towards the opponent's goal.
Wait             | Idle          | Wait for an opportunity to do something.
Go Red Line      | Follow Path   | Drive to the center of the soccer field.
Defend Red Line  | Defend        | Play forward defence.
Go Home Line     | Follow Path   | Drive to the home net.
Defend Home Line | Defend        | Play goalie.

Table 4  Soccer-Playing Behaviours in Reactive Deliberation

Behaviours generate bids in the range [0,10] (floating point). The bids are simple algebraic formulas that are easily computed. Each behaviour has a basic bid that is modified through the addition and subtraction of weighted factors that depend on the environment and the results of motion planning. Complex conditions in the environment, such as the distance of the ball from one's home goal, are converted to factors in the range [0,1]. One exception is path plans, which are converted to factors based on the log of the expected travel time.

The shoot behaviour generates bids in a typical manner. It has a base bid that is modified with environment factors and planning results. Some of the factors include: the speed and heading of the ball, the Euclidean distance to the ball, the Euclidean distance of the other car to the ball, and the speed of the other car. Since shoot plans and follows a path, its bid is biased upwards once it has begun following a path (this is to provide stability in the bidding process).

The bids used here are crude estimates of the expected utility of the actions proposed by each behaviour. The utility of each action depends on the internal importance of the goal (the base bid), environmental conditions (factors), and the results of planning (log factors). The factors are combined with weights so that bids are an estimate (according to the designer) of utility.

The system currently alternates between the ruling behaviour and other behaviours. There is no time limit (in theory) on the computations, although in the implementation all the behaviours are restricted to bounded computations that do not last longer than 300 ms. This is a maximum, and many behaviours do not perform planning when their bids are too low, so there is little need for a more complex scheduling scheme.

The non-holonomic constraints on the car-like robots make motion planning difficult, since it is not possible to move the robot in an arbitrary direction. An algorithm based on Jumps [FW88] is used to provide fast motion planning. A number of paths are generated and the lowest cost (in terms of time) is selected. Extensive details can be found in Appendix C.

4.3 The Experiment

In this section, a series of soccer-playing experiments used to evaluate reactive deliberation is described. Different versions of the controller are compared to establish the utility of the architecture.

The deliberator is capable of performing at a number of levels of ability that reflect the incremental development of the controller. The simplest version alternates between offensive and defensive behaviours without any evaluation of the environment. The next gives control to the deliberator only when the car is stationary and the idle action schema is active.
Thefinal version exhibits the concurrent deliberation and execution that characterize reactivedeliberation.58Chapter 4^ Playing Soccer: AReal-World ExperimentThe purpose of the soccer playing robot is (not surprisingly) to play soccer. Successin soccer is measured by the number of soccer games won compared to the number lost.The primary objective in soccer is to win the game. Although real soccer players may havesecondary goal such as avoiding injury and personally scoring as many goals as possible,these will not be addressed. The way to win a game is to score more goals than the otherteam. This involves scoring goals on the other team and preventing goals from being scoredagainst one's own team.Soccer is a dynamic game; the locations of the ball and the robots are constantlychanging. To be successful in this domain, the robot must be able to accomplish a collectionof meaningful tasks and not just exhibit "cute" behaviours. In soccer, there is an easilyidentified measure of success — the score at the end of a soccer game. The competitivenature of soccer allows us to compare the adequacy of the robot players either with otherautonomous robots or with robots under direct control of a human being. Perhaps a betterdescription for the soccer domain is adversarial, rather than dynamic, since the opposingplayer is working directly against the objectives of the robot.The three controllers compete for the LCI Cup (Laboratory for Computational IntelligenceCup). The LCI Cup is a round-robin tournament where each controller plays the othersin a game of soccer. Presumably the fully implemented controller will be victorious,however, differences in the levels of play may indicate the relative importance of the differentarchitectural elements. Part of the tournament includes games where controllers play copiesof themselves! The purpose of these games is to get an idea of how random the games are.The soccer games are one-on-one tournaments between robots. With only one roboton each team, the focus is on dynamic domains and not on multi-robot cooperation. Each59Chapter 4^ Playing Soccer: AReal-World Experimentgames is ten minutes in duration, with five minute halves. At halftime the robot chassis areswitched to prevent a bias based on differences in the mechanical components. There is onlyone game played between each pair of controllers since the length of each game is arbitrary.For example, the total difference in score in two one minute games would be the same aswith one two minute game. The game duration was arbitrarily set at ten minutes, and thisproved to result in sufficiently high scores to allow differentiation between controllers.One aberration in the testing procedure is that the ball was not centered after each goal.There is no external process that referees the game and tells the robots when a goal hasofficially been scored and when to resume play. As a result, the robots are continuouslyplaying soccer: each half-game is five minutes of continuous play. To compensate for onerobot repeatedly scoring goals, a robot that had been scored upon had to clear the ball fromit's end before any more goals were counted. This greatly simplified the testing procedure.Soccer games with human beings have not been included in the LCI Cup. Humansare at a great disadvantage playing soccer through a radio-controlled car since it is difficultto achieve the accurate control needed to successfully shoot the ball. Games between themiddle version of the controller and many individuals have been held. 
In these informal games the computer usually won; however, a skilled and practised human could certainly win on a regular basis.

Chapter 5

Results and Evaluation

5.1 The LCI Cup

The winner of the 1993 LCI Cup is the fully implemented reactive deliberation controller. Table 5 shows the scores for all the games. The score for the controller in the row is the first number, and the score for the controller in the column is the second. (e.g. 11 - 1 means that the reactive deliberation controller scored 11 goals while the no-wit controller scored only 1.)

Controller            | No-wit | Half-wit | Reactive Deliberation
Reactive Deliberation | 11 - 1 | 7 - 4    | 3 - 3
Half-wit              | 6 - 3  | 5 - 5    |
No-wit                | 8 - 2  |          |

Table 5  Final Scores in the LCI Cup

The no-wit controller does not select goals based on the state of the world, but simply alternates between offensive and defensive behaviours. The half-wit controller gives control to the deliberator only when the car is stationary and the idle action schema is active. This occurs either when a schema has terminated or when a set maximum time is exceeded. The reactive deliberation controller performs concurrent deliberation and execution, as is intended of the architecture.

There is an element of chance in these soccer games: the scores are the result of a complex set of interactions between the robots and their environment. These results are partially repeatable because the same general results will emerge, but the actual scores will be different. For a better estimate of the results, the duration of the soccer games could be extended.

The rank of the controllers from best to worst is: reactive deliberation, half-wit, and no-wit. This ranking is probably reliable since each of the better controllers scored nearly twice as many goals (the scores are 7-4 and 6-3) as the controller ranked beneath it. The results of the games played with the same controller indicate that the better two controllers (reactive deliberation, half-wit) generate fairly consistent performance, while the no-wit controller produces somewhat random performance. The scores (5-5 and 3-3) should be interpreted as close scores, rather than identical. They do not show the underlying randomness that is present, as might be shown by a listing of when the goals were scored. The score 8-2 in the no-wit vs. no-wit game is a result of the almost random playing strategy of that controller.

5.2 Discussion

The difference in score between the reactive deliberation and half-wit controllers is significant. The only difference between these two controllers is that reactive deliberation considers alternate actions all the time, while the half-wit controller does so only when an action schema terminates. The reactive deliberation controller selects transient goals as frequently as possible and can interrupt actions. The half-wit controller is like the traditional planning-based architectures: alternate actions are considered only when the current action has terminated. This is evidence that the frequent evaluation of transient goals is critical to success in dynamic worlds. This is not a surprising result; what is surprising is that architectures such as Shakey and RAP do not include provisions for this.

The level of performance that the robots were able to achieve is partially due to the use of internal world models. As was mentioned in the last chapter, an internal model of the dynamics of the robot is used to provide feed-forward control.
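A minimal sketch of the kind of prediction this involves is given below, assuming a simple kinematic car model and a single lumped latency value; the actual vehicle model and latency handling in the implementation are more detailed. The controller then computes its control signals from the rolled-forward state rather than from the most recently sensed one.

    #include <math.h>

    typedef struct { double x, y, theta, speed; } CarState;

    /* Roll the robot state forward over the estimated sensing and actuation
     * latency before control signals are computed (kinematic bicycle model). */
    CarState predict_forward(CarState s, double steering_angle_rad,
                             double wheelbase_m, double latency_s)
    {
        const double dt = 0.001;               /* small integration step (s) */
        double t;
        for (t = 0.0; t < latency_s; t += dt) {
            s.x     += s.speed * cos(s.theta) * dt;
            s.y     += s.speed * sin(s.theta) * dt;
            s.theta += (s.speed / wheelbase_m) * tan(steering_angle_rad) * dt;
        }
        return s;
    }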
This is not a superfluouselement; it really is necessary for the robots to operate at speeds of 1 m/s. Brooks arguesthat "the world is its own best model" and that internal models are inappropriate [Bro91].Experiences with soccer-playing robots suggest that Brooks' slogan is wrong.If internal models are necessary, then how can the complete reactive architectures besuitable for dynamic domains? The subsumption architecture prohibits the use of explicitinternal models, thus limiting the knowledge a designer can encode in a robot. The concrete-situated approach focuses on building patterns of interaction with the environment based onthe way the world is now. To use feed-forward control, a history of control signals is needed.The situated automata approach only has internal state in the perception part of the controller,so one of the perception outputs could be the predicted location/state of the robot. Of thethree architectures, only the situated automata is not ruled out on the basis of internal model.The performance of the robot players is largely a function of the selection of transientgoals. The bidding mechanism has been fine-tuned through an iteration cycle with ob-servations of soccer games followed by incremental changes to the behaviours. A usefulabstraction that helps with this is the routines of action from the concrete-situated approach.The idea here is that a pattern of activity such as clear the ball, shoot, defend red line, . . . Inthe case of soccer-playing, the construction of successful robots does involve careful attention63Chapter 5^ Results and Evaluationto patterns of activity. This is an emergent result of the soccer-playing experiments.In conclusion, the reactive deliberation controller plays a nice, although not flawless,game of soccer. The competitive nature of soccer places very strict time constraints on therobots and allows different controllers to be easily compared. The dynamic and unpredictablenature of one-on-one robot soccer favours approaches that are concerned with the immediatesituation and reactive deliberation takes advantage of this.64Chapter 6Conclusions6.1 Summary and ContributionsThe purpose of this thesis is to investigate the structures needed in robot architecturesfor dynamic environments. The architectures surveyed were argued to be inadequate fordynamic domains. The weakness of the partitioned architectures such as RAP is due tothe commitment to arbitrary length plans and the infrequent evaluation of transient goals.Complete reactive architectures such as the subsumption architecture are limited by the lackof internal models and restricted computational structures.Reactive deliberation has been designed to present a set of structural elements needed indynamic domains. 
The main features of the architecture are the following:

• Reactive deliberation is partitioned into an executive layer and a deliberative layer.

• The executive layer interacts with the environment in real time through a set of action schemas that receive run-time parameters from the deliberative layer.

• The deliberative layer is composed of goal-oriented modules called behaviours that select transient goals (actions) in a distributed manner through utility estimates.

• Behaviours provide modularity in the design of robot controllers and allow effective goal arbitration as well as the distribution of limited computational resources.

The theoretical contributions of reactive deliberation to the design philosophy of robot architecture for dynamic environments are the following:

• The transient goals (actions) of a robot need to be evaluated at a rate commensurate with changes in the environment.

• A new split between reasoning and control is needed, since action selection cannot be suitably determined independently of detailed planning.

• Goal-oriented behaviours are a useful abstraction.

Soccer is a demanding dynamic domain; the locations of the ball and the robots are constantly changing. A controller based on reactive deliberation has been implemented to allow robots to compete in one-on-one games of soccer. Current functionality includes motion planning, ball shooting, and playing goal. The robots can drive under accurate control at speeds up to 1 m/s while simultaneously considering alternate actions.

In a soccer tournament called the LCI Cup, the effectiveness of the controller has been demonstrated. Specifically, the importance of modifying goals in response to changes in the environment has been shown. Further, the results suggest that the architectural elements in reactive deliberation are sufficient for real-time intelligent control in dynamic environments.

It has been postulated that reactive deliberation is nothing more than a kludge that has been hacked together solely for the purpose of soccer-playing robots. Although the soccer domain was the impetus for the development of reactive deliberation, the guiding principles of its design were drawn from other architectures. The extensive survey of other architectures contained in Chapter 2 stands as testimony to the attention that has been paid to previous work in this area. Reactive deliberation is not a kludge.

One criticism that has been made of this work is that the soccer domain is too reactive. Many tasks in the real world do not require that an agent be highly reactive to changes in the environment, and for such tasks reactive deliberation is irrelevant. The usefulness of reactive deliberation is in those tasks where a robot must be reactive to changes in the environment.

6.2 Future Work

Reactive deliberation is an architecture that focuses on the problems associated with dynamic domains. One possible extension to reactive deliberation is to address problems such as sensing and world modelling that have been hitherto ignored in this thesis. Perhaps more important is the need to test reactive deliberation with robots operating in other environments.

The cornerstone of reactive deliberation is action selection through behaviours that report utility estimates, as illustrated in the sketch below. One extension would be to develop a more formal mechanism for estimating the utility of actions in robots. Alternatively, attention could be focused on learning mechanisms, which would preclude the need for a more formal mechanism. With either extension, good abstractions are needed to minimize complexity, yet there are few theories on how to find good abstractions. Finally, reactive deliberation could be extended to multiple vehicles to test the appropriateness of local utility estimates with multiple robots.
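To make the arbitration mechanism concrete, the following sketch shows behaviours reporting utility bids and a deliberator selecting the highest bid on every cycle. The behaviours, utility formulas, and world-state fields are hypothetical examples constructed for illustration, not the bidding functions of the implemented controller.

```c
/* A minimal sketch of utility-based action selection among behaviours.
 * The behaviours, utilities, and world-state fields below are hypothetical. */
#include <stdio.h>

typedef struct {                 /* simplified world state from the vision system */
    double ball_x, ball_y;
    double robot_x, robot_y;
    int    ball_in_our_half;
} World;

typedef struct {
    const char *name;
    double (*utility)(const World *w);   /* each behaviour bids for control */
} Behaviour;

static double shoot_utility(const World *w)
{
    return w->ball_in_our_half ? 0.2 : 0.8;   /* attack when the ball is upfield */
}

static double defend_utility(const World *w)
{
    return w->ball_in_our_half ? 0.9 : 0.3;   /* fall back when threatened */
}

static double idle_utility(const World *w)
{
    (void)w;
    return 0.1;                               /* always a weak default bid */
}

/* Every cycle the deliberator re-evaluates all bids, so the selected
 * transient goal can change as soon as the world does. */
static const Behaviour *select_action(const Behaviour *bs, int n, const World *w)
{
    const Behaviour *best = &bs[0];
    double best_u = bs[0].utility(w);
    for (int i = 1; i < n; i++) {
        double u = bs[i].utility(w);
        if (u > best_u) { best_u = u; best = &bs[i]; }
    }
    return best;
}

int main(void)
{
    Behaviour behaviours[] = {
        { "shoot",  shoot_utility  },
        { "defend", defend_utility },
        { "idle",   idle_utility   },
    };
    World w = { 1.2, 0.4, -0.5, 0.0, 0 };     /* ball in the opponent's half */
    printf("selected behaviour: %s\n", select_action(behaviours, 3, &w)->name);
    return 0;
}
```

The point of the abstraction is that adding a new behaviour only requires adding one bidding function; the arbitration loop itself does not change.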
The problem of robot control in dynamic domains is still not fully solved. The outstanding problem specific to this area is that of scarce computational resources. The desire to respond rapidly to changes in the environment is in conflict with the need to assimilate new information and plan alternate actions. The complete reactive architectures use limited computational models and sacrifice the assimilation of information, while the partitioned architectures typically sacrifice rapid responses. The solution taken in reactive deliberation is to focus attention on the immediate situation and perform only directly relevant computations. In general, the computations that are appropriate with fixed computational resources are a function of the environment and the tasks the robot must achieve. More attention is needed to fully explore this trade-off.

The assessment of robotics architectures is difficult since they describe a design process. This thesis has demonstrated that reactive deliberation is a plausible architecture, and its advantages over other approaches have been argued. Just as reactive deliberation has been influenced by previous architectures, future architectures may be influenced by reactive deliberation. Reactive deliberation is an incomplete architecture specification that is focussed on dynamic worlds; no claim has been made that it is the ultimate robot architecture. Rather, robot architectures should be seen as a succession of improvements.

The experiments performed in this thesis only compare versions of a reactive deliberation controller with itself. The experiments should be extended so that reactive deliberation can compete with other architectures and with a human operator. Since motor control of the robots is difficult for human beings, a user interface could be constructed where the human does the deliberation and the computer does the path planning and control. This would pit the intelligence of reactive deliberation against the intelligence of a human operator and provide a better measure of the effectiveness of this architecture.

Appendix A
The Vision Subsystem

The Vision Engine is able to track the world-space position of the centroids of coloured targets placed on the robots at 60 Hz using the off-board camera. Figure 13 shows the pipelined setup of the hardware. There are four stages in converting from RGB video to (x, y, θ) configurations in the world: (1) classification into regions (on the DataCube), (2) finding the moments of the convex regions (on the MAXTRAN), (3) converting from (row, column) to (x, y) for each region, and (4) grouping regions into logical objects (i.e. (x, y) → (x, y, θ)).

Figure 13: The Vision Engine

The colour video camera is placed in a fixed position so that the entire soccer field/workspace is visible, as shown in Figure 14. The video is fed into the Datacube hardware, which is programmed to preprocess the image to simplify and speed up the centroid calculations. The Digicolour converts the analog video signal to a digital format. The MV200 converts the incoming colour pixels from RGB colour space into HSV colour space and then classifies them as belonging to either the background or a designated colour class. An example of the processed output is shown in Figure 15. The coloured blobs for the car and the ball show up as black in the figure, while the background is classified into black and white depending on the value of the image point. This stream of classified pixels is then run-length encoded and passed on to the transputers for further processing. The mapping of stage 1 is: video → blobs.

Figure 14: Output from the real camera
Figure 15: Processed output from the DataCube
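The per-pixel test of stage 1 can be illustrated with a small sketch: convert a pixel to hue, saturation, and value, and accept it into a colour class only if it is highly saturated with the right hue. The thresholds and hue band below are assumptions made for the example, not the tuned values used by the Vision Engine.

```c
/* A minimal sketch of the stage 1 pixel classification: RGB -> HSV followed
 * by a saturated-hue test.  Thresholds and hue bands are illustrative only. */
#include <math.h>
#include <stdio.h>

typedef struct { double h, s, v; } HSV;   /* h in degrees [0,360), s,v in [0,1] */

static HSV rgb_to_hsv(double r, double g, double b)
{
    double max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    double min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    double d = max - min;
    HSV out = { 0.0, max > 0.0 ? d / max : 0.0, max };
    if (d > 0.0) {
        if (max == r)      out.h = 60.0 * fmod((g - b) / d, 6.0);
        else if (max == g) out.h = 60.0 * ((b - r) / d + 2.0);
        else               out.h = 60.0 * ((r - g) / d + 4.0);
        if (out.h < 0.0) out.h += 360.0;
    }
    return out;
}

/* Accept only highly saturated pixels whose hue lies inside the class's band. */
static int in_class(HSV p, double hue_lo, double hue_hi)
{
    return p.s > 0.6 && p.v > 0.3 && p.h >= hue_lo && p.h <= hue_hi;
}

int main(void)
{
    HSV p = rgb_to_hsv(0.9, 0.2, 0.1);                 /* a bright red pixel */
    printf("red/orange target? %s\n", in_class(p, 0.0, 30.0) ? "yes" : "no");
    return 0;
}
```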
The moment program on the MAXTRAN transputer examines the incoming processed image and finds the centroid, in screen coordinates, of each disjoint, connected region. The mapping of stage 2 is: blobs → (row, column) coordinates of each blob.

The xform program performs stages 3 and 4 of the transformation on a single transputer node. In stage 3, xform corrects for radial lens distortion and finds the corresponding point in world space which satisfies both the perspective projection for the camera's viewing position and the constraint that all points must lie in a known plane (parallel to the floor). The mapping of stage 3 is (for each blob): (row, column) → (x, y).

Once the world-space position of each region has been found, it is necessary to map these regions onto targets. In the current implementation there will be only a single target of each colour class visible, and the region with the most pixels in each class is assumed to be the target. Then, since each robot has two targets on it, the position and orientation of the robot can be found directly from the positions of its targets. The accuracy in position is on the order of a few millimeters. The mapping of stage 4 is: (x, y) blobs → (x, y, θ) logical vehicles.

The robustness of the system varies considerably. The theory is as follows. By painting the targets with fluorescent paint, and then only accepting highly saturated colours of the appropriate hue, the pixel classification can be made with almost perfect accuracy. A further advantage of picking fluorescent colours is that the system is less sensitive to changes in the colour of the lighting. In reality, Stewart and I have spent hours twiddling with the boundaries in hue, saturation, and value for the colour classes. Even so, the system can produce incorrect classifications under certain conditions² that result in catastrophic errors. However, once these problems have been ironed out and the system is in a stable state, it works very well.

The system is quite fast for tracking with video cameras and has low latency. The Datacube preprocessing introduces only a few milliseconds of delay relative to the camera, and the moment program typically needs only 2 ms of processing time to find the regions' centroids (for a normal image this processing can be done while the image is being transferred to the MAXTRAN). The remaining calculations are fairly trivial and require at most 1 ms. Once the cost of transmission is included, the entire latency of the Vision Engine is about 5 ms, in contrast with a separation of 16.7 ms between video frames.

² Such as someone wearing blue jeans walking by.
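As an illustration of stage 4, the following sketch recovers a robot's (x, y, θ) from the world-space centroids of its two targets. The assumption that one target marks the front of the robot and the other the rear is made only for this example.

```c
/* A minimal sketch of stage 4: recovering a robot's (x, y, theta) from the
 * world-space centroids of its two colour targets.  The front/rear target
 * layout is an assumption made for this example. */
#include <math.h>
#include <stdio.h>

typedef struct { double x, y; } Point;
typedef struct { double x, y, theta; } Pose;

static Pose pose_from_targets(Point front, Point rear)
{
    Pose p;
    p.x = 0.5 * (front.x + rear.x);      /* robot centre = midpoint of targets */
    p.y = 0.5 * (front.y + rear.y);
    p.theta = atan2(front.y - rear.y,    /* heading points from rear to front  */
                    front.x - rear.x);
    return p;
}

int main(void)
{
    Point front = { 1.20, 0.55 };        /* example centroids in metres */
    Point rear  = { 1.05, 0.50 };
    Pose  p = pose_from_targets(front, rear);
    printf("x = %.3f m, y = %.3f m, theta = %.3f rad\n", p.x, p.y, p.theta);
    return 0;
}
```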
Appendix B
The Simulator

The simulator has proved invaluable in the development of the robot controller described in this thesis. Although a significant amount of work is required to build a simulator, it is well worth the time spent. It is easier to use the simulator than the real testbed for the following reasons: (1) real hardware breaks down and is unavailable from time to time; (2) physical resources such as computer hardware may have to be time-shared with other users; (3) the simulator can be used from any workstation, and multiple people can work simultaneously; (4) when a controller goes haywire, the carefully constructed robot does not get wrecked; (5) code can be tested quickly without booting up the real system (which takes time).

The version of the simulator that was used in the development of this thesis supported a single car in an infinite plane. A ball, walls, and another car could be hallucinated by the controller, so fairly extensive testing could be done even with a minimal simulator. All that the simulator supported was a coarse model of the vehicle's dynamics. Graphics were not used, since there was only one car and an xgraph plot can be used to illustrate both the planned path and the path traversed. (See Figure 12 on page 55.) Even this minimal setup provided enough facilities to do a lot of debugging.

The visual display that the latest version of the simulator generates is shown in Figure 16. The simulator supports dynamic models for the cars and the ball. In the image two cars can be seen (an arbitrary number of cars can be used) in the soccer field with a semi-visible ball between the cars. Collisions between the ball, cars, and walls are supported as well, so soccer games can be conducted in the simulation.

Figure 16: The Dynamite Simulator with Two Cars and a Ball
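The following sketch shows the kind of coarse vehicle model such a simulator can integrate: a kinematic car model with a bounded steering angle, stepped forward with Euler integration. The wheelbase, steering limit, and step size are assumptions made for the example, not the parameters of the Dynamite simulator.

```c
/* A minimal sketch of a coarse car model: kinematic "bicycle" model with a
 * bounded steering angle, integrated with Euler steps.  All constants assumed. */
#include <math.h>
#include <stdio.h>

#define WHEELBASE  0.20    /* metres, assumed                        */
#define MAX_STEER  0.50    /* radians, gives the minimum turn radius */
#define DT         0.01    /* integration step, seconds              */

typedef struct { double x, y, theta; } Config;

static double clamp(double v, double lo, double hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* One simulation step: speed v (m/s, negative = reverse), steering angle phi. */
static Config step(Config c, double v, double phi)
{
    phi = clamp(phi, -MAX_STEER, MAX_STEER);
    c.x     += v * cos(c.theta) * DT;
    c.y     += v * sin(c.theta) * DT;
    c.theta += (v / WHEELBASE) * tan(phi) * DT;
    return c;
}

int main(void)
{
    Config c = { 0.0, 0.0, 0.0 };
    for (int k = 0; k < 100; k++)          /* one second at 1 m/s, turning left */
        c = step(c, 1.0, 0.3);
    printf("x = %.2f  y = %.2f  theta = %.2f rad\n", c.x, c.y, c.theta);
    return 0;
}
```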
Appendix C
Path Planning

C.1 Introduction

Fast generation of motion plans is an essential part of a robot operating in a dynamic environment. The problem of planning for a car-like robot that is capable of travelling forward and in reverse is considered. Car-like robots are subject to non-holonomic kinematic constraints that limit the change in a robot's direction of motion.

Recently, there has been considerable interest in motion planning with non-holonomic constraints [JC89; RS90; BCL91; BLJ91]; however, much of the work fails to meet all three criteria: (1) fast execution, (2) efficient motion plans (i.e. forward as well as reverse motion must be provided), and (3) the planner can handle obstacles. There is no algorithm that satisfies all three criteria. My approach sacrifices optimality and completeness for speed by considering a restricted class of motion plans.

The basic problem is: given an initial configuration ci = (xi, yi, θi) in R² × S¹, find a path to a final configuration cf = (xf, yf, θf). A configuration describes the position and orientation of a robot. A car-like robot can move forward and backward, but not sideways. The rate of change of orientation with respect to path length is limited by the maximum turning angle of the wheels. The result is that the motion is bounded by a minimum turning radius circle.

The following section looks at metrics for comparing motion planning algorithms. This is followed by a survey of previous work that has been divided into two sections: one for motion planning without obstacles and one with obstacles. The algorithm used in this thesis is presented in Section C.5.

C.2 A Metric for Motion Planners

Previous work in motion planning under non-holonomic constraints has focussed on developing algorithms that compute optimal paths, that is, paths of lowest cost according to some definition of cost. Cost (cost1) is defined as the path length plus a penalty for each change in the direction of motion.

In robotics applications the real cost is a function of the time used to compute the path, the time used following the path, and the energy used following the path. The cost defined above captures those notions relevant to the output of a path planner. The time needed to compute the path is the measure that is used to compare different approaches.

The path length is the total distance that the turning centre of the car travels. Note that the average distance each wheel travels could also be used, but this will not, in general, produce the same costs. A change in the direction of motion occurs when the car changes from forward to reverse motion or vice versa. Changing direction is not desirable, since it takes time and requires energy. A penalty can be computed in units of distance that is representative of the cost of changing direction.

There are two other possible cost estimates: cost2 is just the path length, and cost3 is the path length plus a penalty for changes in the direction of motion plus a penalty for changing steering or curvature. Many planning algorithms use cost2, although cost1 and cost3 are more accurate. The penalty for changing curvature accounts for the limited rate at which the wheels of a vehicle can be turned.

Although cost3 is realistic for robots that will use these planning algorithms, the effect of changing curvature is ignored in the literature. One exception is [Eln93], where tangential and normal accelerations along the path are considered. The jump method that is extended in this thesis also ignores this cost. The justification is that ignoring the penalty for changing curvature greatly simplifies motion planning. Also, an appropriately designed low-level controller can compensate for the problem.

The difference between cost2 and cost1 is simply a matter of counting the number of reversals in motion. An optimal path using cost2 is the shortest path in units of distance. An optimal path using cost1 can be thought of as an optimal trade-off between distance, time, and energy consumed. In this thesis, cost1 will be used to evaluate the approach presented.

All three of the proposed cost functions implicitly account for the energy and time required to follow a path plan. The functions do not, however, account for the time required to plan a path. It is hard to evaluate this cost for two reasons. One, a robot may need to compare multiple paths before selecting one. Two, it is difficult to compare algorithms, since the approaches are varied.
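Computing the path-dependent part of these measures is straightforward once a candidate path is represented as a list of subpaths. The sketch below shows cost2 and cost1; the segment representation and the value of the reversal penalty are chosen arbitrarily for illustration.

```c
/* A minimal sketch of the cost measures for a path represented as a list of
 * subpath lengths and travel directions.  The reversal penalty is assumed. */
#include <stdio.h>

#define REVERSAL_PENALTY 0.5   /* cost of one change of direction, in metres */

typedef struct {
    double length;             /* subpath length in metres */
    int    forward;            /* 1 = forward, 0 = reverse */
} Segment;

/* cost2: path length only. */
static double cost2(const Segment *s, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) total += s[i].length;
    return total;
}

/* cost1: path length plus a penalty for every change in direction of motion. */
static double cost1(const Segment *s, int n)
{
    double total = cost2(s, n);
    for (int i = 1; i < n; i++)
        if (s[i].forward != s[i - 1].forward) total += REVERSAL_PENALTY;
    return total;
}

int main(void)
{
    Segment path[] = { { 1.2, 1 }, { 0.4, 0 }, { 0.8, 1 } };  /* two reversals */
    printf("cost2 = %.2f m, cost1 = %.2f m\n", cost2(path, 3), cost1(path, 3));
    return 0;
}
```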
C.3 Motion Planning in an Infinite Plane

In this section, motion planning for a robot in an infinite plane will be considered, since this is easier than motion planning in an arena or with obstacles. For instance, optimal paths under cost2 can be easily generated.

A set of optimal paths for a car that cannot change direction is described in [Dub57]. Since reversals are not allowed, cost2 is used and optimal means shortest. An optimal path consists of 3 subpaths that are either circular arcs (C) or straight line segments (S). Such paths are of the form CSC or CCC. The circular arcs have curvature equal to the inverse of the minimum turning radius. There are 6 possible combinations of segments that can result in an optimal path; however, there is no explicit formula for the optimal path. The only way to find it is to try each of the 6 combinations.

The results of [Dub57] were extended in [RS90] to a car that can change direction. The results are analogous. Here an optimal path consists of 5 subpaths, where subpaths can be zero in length. The paths are of the form CCSCC. There are 48 paths that must be considered to determine an optimal path. This method for computing paths is expensive even in the absence of obstacles. The paper describes a method for determining the shortest path using cost2. It is not clear that cost1, which is a preferable measure, can be incorporated easily, since paths that are currently discarded may be optimal.

[FW88] created the notion of jumps, where a jump is the concatenation of 3 subpaths: a circular arc, a straight line segment, and another circular arc (or CSC). The circular arcs are part of a minimum turning radius circle. [Lat91] has used jumps for motion planning with a car that cannot change direction. In such a case, there are at most four possible jumps between two configurations. Examples of such paths are shown in Figure 17. Each path is labelled with two letters: the first letter corresponds to the orientation of the initial circle, and the second to the final circle. For instance, "RL" refers to the path where the minimum turning radius circle to the right of the initial configuration is chosen, as well as the circle to the left of the final configuration. The algorithm presented in this thesis is an extension of this approach.

Figure 17: Jump paths with unidirectional motion

C.4 Motion Planning in a Convex Arena

In this section, we consider the problem of motion planning where the robot is located in a convex arena. It is assumed that the arena is the only static obstacle. There may be other moving objects, but they will not be considered, as it is assumed that their motion is unpredictable and that collision detection can be considered an on-line problem outside the scope of this thesis.

[RS90] have a method for computing optimal paths in an infinite plane that has not been extended to motion planning with obstacles. A motion planner that produces optimal paths in the presence of obstacles has yet to be created. None of the following methods is optimal.

Using the original jump notation for a unidirectional car, [FW88] propose an algorithm for deciding whether there exists a feasible path by considering contact points with obstacles. This approach is not sufficient for the problem under consideration, since it does not consider paths with reversals.

Geometric motion planners produce path plans that do not, in general, obey non-holonomic constraints. [BLJ91] discuss an iterative motion planner that runs as a post-processing stage after any geometric planner. The iterative planner modifies the path within a tolerance bound to provide a path that does obey non-holonomic constraints. This path will be locally optimal, but there is nothing to ensure that it is globally optimal. Further, since the two planners are disjoint, there is little evidence to suggest that efficient paths will be produced.

[JC89] approach the problem of motion planning by discretizing configuration space and building a directed search graph. Path planning can be done by searching this graph. The results presented are for generating paths without reversals, although the method can be extended to include paths with reversals at the expense of doubling the branching factor in the graph. The main problem with this approach is that a large number of possible paths are computed and discarded. It is too slow.
It is interesting to note that the performance of this algorithm will improve as complexity increases and free space is reduced.

[MC92] describes a method for planning that uses skeletons (safe path segments). A path consists of three subpaths: a path from the initial configuration to a skeleton, a concatenation of skeletons, and a path from the final skeleton to the final configuration. The approach is based on generating a lookup table for the shortest feasible path distance between two points in configuration space, which can then be used to compute skeletons. This approach, although complex, seems to be practical for a robot operating in an environment with many obstacles. The approach, however, does not meet our earlier requirement of fast execution.

C.5 A Fast Motion Planner

The approach used in this thesis, called jumps+, is an extension of the jumps approach to a car that can change direction. If there were no restrictions placed on motion there would be 64 paths to consider: (4 pairs of circles) × (4 inter-circle tangents) × (2 directions to reach the tangent on each circle)² = 64. The paths generated will not always be optimal. If an optimal path is of the form CSC (circular arc, line segment, circular arc), it will be found using jumps+. There are some optimal paths that cannot be found, such as CCC and CCSCC paths.

To shoot or drive into a ball in soccer, the robot can only approach its final configuration in one direction. As a result, only 32 of the 64 possible paths are of interest. A simple approach is used where all 32 possible paths are generated and then tested for collisions in order of increasing length. This is not very efficient, and could be greatly improved by interleaving path generation with collision tests.

This approach is reasonable for a convex arena. However, the planner is flawed because it (a) can find only some optimal paths, not all of them, and (b) fails to produce any path plan for certain initial and final configurations. It does, however, perform much less computation than any of the other path planning approaches, by considering only a restricted set of paths.

Possible extensions include the addition of static and moving obstacles. However, increasing the number of obstacles may not be feasible, since this will further reduce the chances that a CSC path exists between any two configurations. The problem of motion planning with moving obstacles could potentially be solved under this approach, since it is fast and could respond in real time.
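A minimal sketch of one of these candidates, the RR jump (right turn, straight segment, right turn), is given below. The turning radius and the example configurations are assumed values; a planner in the spirit of jumps+ would enumerate the remaining circle pairings and travel directions in the same way and keep the cheapest collision-free candidate.

```c
/* A minimal sketch of one "jump" candidate: the RR (right-turn, straight,
 * right-turn) path between two configurations for a car with minimum turning
 * radius R.  Only this single candidate is computed here. */
#include <math.h>
#include <stdio.h>

#define R 0.30                      /* minimum turning radius, assumed (m) */

typedef struct { double x, y, theta; } Config;

static double mod2pi(double a)      /* wrap an angle into [0, 2*pi) */
{
    const double two_pi = 2.0 * acos(-1.0);
    a = fmod(a, two_pi);
    return a < 0.0 ? a + two_pi : a;
}

/* Length of the RR jump, or -1 if the turning circles coincide. */
static double rr_jump_length(Config a, Config b)
{
    /* centres of the minimum-radius circles to the right of each heading */
    double c1x = a.x + R * sin(a.theta), c1y = a.y - R * cos(a.theta);
    double c2x = b.x + R * sin(b.theta), c2y = b.y - R * cos(b.theta);
    double dx = c2x - c1x, dy = c2y - c1y;
    double d  = sqrt(dx * dx + dy * dy);
    if (d == 0.0) return -1.0;
    double phi  = atan2(dy, dx);                 /* heading along the tangent */
    double arc1 = mod2pi(a.theta - phi);         /* clockwise turn onto it    */
    double arc2 = mod2pi(phi - b.theta);         /* clockwise turn off it     */
    return R * (arc1 + arc2) + d;
}

int main(void)
{
    Config start = { 0.0,  0.0, 0.0 };
    Config goal  = { 2.0, -1.0, -acos(-1.0) / 2.0 };   /* facing -y */
    printf("RR jump length: %.3f m\n", rr_jump_length(start, goal));
    return 0;
}
```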
Bibliography

[AC87] Philip Agre and David Chapman. Pengi: An implementation of a theory of activity. In Proceedings of AAAI-87, 1987.

[AC90] Philip Agre and David Chapman. What are plans for? In Pattie Maes, editor, Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back, pages 17-34. M.I.T. Press, 1990.

[Alb81] James Albus. Brains, behaviour, and robotics. BYTE Publications, 1981.

[Asi90] Isaac Asimov. Robot Visions. Penguin, 1990.

[BCL91] J-D. Boissonnat, A. Cerezo, and J. Leblond. Shortest paths of bounded curvature in the plane. Technical Report No. 1503, I.N.R.I.A., 1991.

[BKL+93] R. Barman, S. Kingdon, J. Little, A.K. Mackworth, D.K. Pai, M. Sahota, H. Wilkinson, and Y. Zhang. Dynamo: real-time experiments with multiple mobile robots. In Proceedings of the Intelligent Vehicles Symposium, Tokyo, 1993.

[BKM+93] R.A. Barman, S.J. Kingdon, A.K. Mackworth, D.K. Pai, M.K. Sahota, H. Wilkinson, and Y. Zhang. Dynamite: A testbed for multiple mobile robots. In Proceedings of the IJCAI Workshop on Dynamically Interacting Robots, 1993.

[BLJ91] Andre Bellaiche, Jean-Paul Laumond, and Paul Jacobs. Controllability of car-like robots and complexity of the motion planning problem with non-holonomic constraints. In Proceedings of the International Symposium on Intelligent Robotics, 1991.

[Bro86] Rodney A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, RA-2:14-23, 1986.

[Bro90] Rodney A. Brooks. Elephants don't play chess. In Pattie Maes, editor, Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back, pages 3-15. M.I.T. Press, 1990.

[Bro91] Rodney A. Brooks. Intelligence without reason. Technical Report 1293, M.I.T. A.I. Lab, 1991.

[Cha91] David Chapman. Vision, Instruction, and Action. MIT Press, 1991.

[DB88] Thomas Dean and Mark Boddy. An analysis of time dependent planning. In Proceedings of AAAI-88, 1988.

[Dub57] L.E. Dubins. On curves of minimal length with a constraint on average curvature and with prescribed initial and terminal positions and tangents. American Journal of Math, 1957.

[Eln93] A. Elnagar. Piecewise smooth and safe trajectory planning for mobile robots. In Proceedings of the International Conference on Intelligent Robotics and Systems, 1993. (Forthcoming).

[Fir89] R. James Firby. Adaptive Execution in Complex Dynamic Worlds. PhD thesis, Yale University, 1989.

[Fir92] R. James Firby. Building symbolic primitives with continuous control routines. In First International Conference on Artificial Intelligence Planning Systems, 1992.

[FW88] S. Fortune and G. Wilfong. Planning constrained motion. In Proceedings of the Fourth Symposium on Computational Geometry, 1988.

[Gat92] Erann Gat. Integrating planning and reacting in a heterogeneous asynchronous architecture for controlling real-world mobile robots. In Proceedings of AAAI-92, 1992.

[JC89] P. Jacobs and J. Canny. Planning smooth paths for mobile robots. In STOCS, Chicago, pages 445-459. Association for Computing Machinery, 1989.

[Kae90] Leslie Pack Kaelbling. An architecture for intelligent reactive systems. In Readings in Planning. Morgan Kaufman, 1990. Originally in Reasoning about Actions and Plans (Morgan Kaufman), 1987.

[KD89] Keiji Kanazawa and Thomas Dean. A model for projection and action. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 1989.

[KR90] Leslie Pack Kaelbling and Stanley J. Rosenschein. Action and planning in embedded agents. In Pattie Maes, editor, Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back, pages 35-48. M.I.T. Press, 1990.

[Kub92] Claus Ronald Kube. Collective robotic intelligence: A control theory for robot populations. Master's thesis, University of Alberta, 1992.

[Lat91] Jean-Claude Latombe. Robot Motion Planning. Kluwer, 1991.

[LBKL91] J. Little, R. Barman, S. Kingdon, and J. Lu. Computational architectures for responsive vision: the vision engine. In Proceedings of Computer Architectures for Machine Perception, Paris, 1991.

[Mac93] Alan Mackworth. On seeing robots. In A. Basu and X. Li, editors, Computer Vision: Systems, Theory, and Applications. World Scientific Press, 1993.

[Mae89] Pattie Maes. The dynamics of action selection. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 1989.

[Mae90] Pattie Maes. Situated agents can have goals. In Pattie Maes, editor, Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back, pages 49-70. M.I.T. Press, 1990.
[MC92] B. Mirtich and J. Canny. Using skeletons for nonholonomic path planning among obstacles. In Proceedings of the 1992 IEEE International Conference on Robotics and Automation, 1992.

[Min86] Marvin Minsky. The Society of Mind. Simon & Schuster Inc., 1986.

[MP92] Zohar Manna and Amir Pnueli. The temporal logic of reactive and concurrent systems. Springer-Verlag, 1992.

[Nil84] Nils Nilsson. Shakey the robot. Technical report, SRI International, 1984. Collection of earlier technical reports.

[Nor92] Fabrice Noreils. An architecture for cooperative and autonomous mobile robots. In Proceedings of the 1992 IEEE International Conference on Robotics and Automation, 1992.

[RS90] J.A. Reeds and L.A. Shepp. Optimal paths for a car that goes both forwards and backwards. Pacific Journal of Mathematics, 1990.

[Sch87] Marcel Schoppers. Universal plans for reactive robots in unpredictable environments. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 1987.

[Smi80] G. Smith. The contract net protocol: High-level communication and control in a distributed problem solver. IEEE Transactions on Computing, 29(12), 1980.

[Tyr93] Toby Tyrrell. Computational Mechanisms for Action Selection. PhD thesis, Edinburgh University, 1993.

[ZM92a] Ying Zhang and Alan Mackworth. Constraint nets: A semantic model for real-time embedded systems. Technical Report TR 92-10, University of British Columbia, 1992.

[ZM92b] Ying Zhang and Alan Mackworth. Will the robot do the right thing? Technical Report TR 92-31, University of British Columbia, 1992.

[ZM93] Ying Zhang and Alan Mackworth. Design and analysis of embedded real-time systems: An elevator case study. Technical Report TR 93-4, University of British Columbia, 1993.