UBC Theses and Dissertations

Improved action and path synthesis using gradient sampling Traft, Neil 2017

Improved Action and Path Synthesis using Gradient Sampling

by

Neil Traft

BSc. Computer Science, Tulane University, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Computer Science)

The University of British Columbia
(Vancouver)

October 2017

© Neil Traft, 2017

Abstract

An autonomous or semi-autonomous powered wheelchair would bring the benefits of increased mobility and independence to a large population of cognitively impaired older adults who are not currently able to operate traditional powered wheelchairs. Algorithms for navigation of such wheelchairs are particularly challenging due to the unstructured, dynamic environments older adults navigate in their daily lives. Another set of challenges is found in the strict requirements for safety and comfort of such platforms. We aim to address the requirements of safe, smooth, and fast control with a version of the gradient sampling optimization algorithm of [Burke, Lewis & Overton, 2005]. We suggest that the uncertainty arising from such complex environments be tracked using a particle filter, and we propose the Gradient Sampling with Particle Filter (GSPF) algorithm, which uses the particles as the locations at which to sample the gradient. At each step, the GSPF efficiently finds a consensus direction suitable for all particles or identifies the type of stationary point on which it is stuck. If the stationary point is a minimum, the system has reached its goal (to within the limits of the state uncertainty) and the algorithm naturally terminates; otherwise, we propose two approaches to find a suitable descent direction.
We illustrate the effectiveness of the GSPF on several examples with a holonomic robot, using the Robot Operating System (ROS) and Gazebo robot simulation environment, and also briefly demonstrate its extension to use a version of the RRT* planner instead of a value function.

Lay Summary

Some of those who are most in need of improved mobility are unable to operate a powered wheelchair safely and are denied this crucial improvement to their quality of life. If a robotic platform were able to liberate the user from the low-level tasks of collision avoidance and smooth steering, requiring only high-level directional commands, it would open up usage to a much wider set of individuals. Furthermore, this kind of human-robot teamwork would have wide-ranging applications far beyond powered wheelchairs.

Our work is a start down the path toward forms of human-robot teamwork that can meet the safety requirements needed to pass certification for operation of a powered wheelchair. It exhibits safety benefits over more traditional approaches while remaining efficient enough to be applied on a platform of limited power and compute such as a wheelchair.

Preface

The algorithm described in Section 3.4 was originally conceived by my supervisor, Ian Mitchell. This algorithm, along with the experiments in Sections 5.1, 5.2, and 5.4, was presented in our publication: N. Traft and I. M. Mitchell. Improved Action and Path Synthesis Using Gradient Sampling. In 55th IEEE Conference on Decision and Control, pages 6016–6023. IEEE, 2016. [37] Chapters 3, 4, and 5 were borrowed, with modifications and additions, from this paper.

The implementation of GSPF is based on an earlier version by students Branden Fung and Carolyn Shen.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
  1.1 Navigation Under Uncertainty
  1.2 Safe, Smooth Control
  1.3 Navigation with Value Functions
  1.4 An Example
2 Related Work
  2.1 Types of Uncertainty
    2.1.1 Selective Focus
    2.1.2 Where This Work Fits
  2.2 Types of Robustness
    2.2.1 Partially Observable Markov Decision Process (POMDP) Solvers
    2.2.2 Other Approaches
    2.2.3 Comparison To This Work
3 Problem & Background
  3.1 Problem Formulation
  3.2 Planning with a Value Function
  3.3 State Estimation with Particle Filters
  3.4 The Gradient Sampling Algorithm
4 Gradient Sampling with Particle Filter
  4.1 Gradient Sampling with State Uncertainty
    4.1.1 Contrasting Gradient Sampling and GSPF
  4.2 Classifying and Resolving Stationary Points
    4.2.1 Stationary Points Classification
    4.2.2 Resolution by Eigenvector
    4.2.3 Other Failures to Achieve Consensus
5 Experiments
  5.1 Single Obstacle
  5.2 Narrow Hallway
  5.3 Hallway Entrance
  5.4 Cluttered Scene
  5.5 Other Planners: RRT*
6 Conclusion
Bibliography

List of Tables

Table 5.1  Average Angular Difference Between Successive Actions
Table 5.2  Angular Change and Safety Metrics

List of Figures

Figure 1.1  Narrow corridor with value function.
Figure 1.2  Various descent methods in the narrow corridor.
Figure 1.3  Uncertainty in the optimal action.
Figure 3.1  Finding a "consensus" among different actions.
Figure 4.1  Illustration of proposition 1.
Figure 4.2  Using a quadratic approximation to resolve a saddle point.
Figure 5.1  Single obstacle scenario, costmap and value function.
Figure 5.2  Single obstacle scenario, planned path.
Figure 5.3  Relationship between state uncertainty and consensus action.
Figure 5.4  Navigating a narrow hallway under three competing methods.
Figure 5.5  Entering a narrow hallway under three competing methods.
Figure 5.6  Cluttered scenario, costmap and value function.
Figure 5.7  Cluttered scenario, planned path on value function.
Figure 5.8  Cluttered scenario, RRT* tree.
Figure 5.9  Cluttered scenario, planned path on RRT*.

Glossary

GSPF    Gradient Sampling with Particle Filter
LQG     Linear Quadratic Gaussian
MCL     Monte Carlo Localization
POMDP   Partially Observable Markov Decision Process
QMDP    Q-value Markov Decision Process
ROS     Robot Operating System
SGD     Stochastic Gradient Descent

Acknowledgments

I owe my sincere and heartfelt thanks to my supervisor, Dr. Ian M. Mitchell. He accepted a student who had less than stellar undergraduate grades and some minor exposure to robotics, took him under his wing, gave him ample freedom to explore, gave him direction when he needed it, and introduced him to countless new ideas. My tenure at UBC has been a defining experience; I cannot overstate its impact on my life, nor how lucky I have been to be part of this community.

I would also like to thank Dr. Dinesh Pai and Dr. Elizabeth Croft for allowing me the use of their labs and their robots. It was very rewarding to gain experience on such a wide variety of hardware with distinctly different sensors and linkages, and that has had its own special role in expanding my horizons.

I thank my parents for their love and patience sent from afar, and especially for their support and encouragement on this journey, even though they would have loved to have me close by in Massachusetts. It's good to be missed.

Finally, I would like to thank the Canadian Institutes of Health Research (CIHR) for funding this research through the CANWHEEL Emerging Team in Wheeled Mobility for Older Adults Grant #AMG-100925, and the Natural Sciences and Engineering Research Council of Canada (NSERC) for their support through Discovery Grant #298211.

Chapter 1

Introduction

This research was supported by CANWHEEL, an interdisciplinary team of clinical researchers and basic scientists.
The engineering team at CANWHEEL is concerned with bringing the benefits of power wheelchairs to a wider group of users than is currently possible, particularly by making wheelchairs easier to operate. A key breakthrough in this regard would be a power wheelchair that largely drove itself, or assisted in the avoidance of obstacles. Thus, one of the high-level goals of the researchers at CANWHEEL is to design, build, and evaluate an autonomous or semi-autonomous powered wheelchair.

Such a wheelchair should be able to operate in ordinary homes and hospitals, should offer a comfortable ride, and should above all have strong safety guarantees. These requirements constitute significant challenges to the underlying robotic system: navigation in an uncertain environment and safe, smooth control. In this thesis we propose a simple solution to these two common control problems.

1.1 Navigation Under Uncertainty

One of the most significant challenges on the way to realizing this goal is the dynamic, unstructured environments that this type of wheelchair will need to operate in to be successful. A powered wheelchair is most useful in the user's home, in a hospital, or in an elder care facility where older adults with mobility challenges frequently find themselves. Homes are varying and sometimes cluttered—this makes for an unstructured environment that is difficult to measure and map precisely. Hospitals are even more difficult because they are filled with other people, moving about to their own destinations. This makes for a dynamic environment whose future state is difficult to predict.

Despite decades of robotics research in these kinds of environments—for instance, the museum robot RHINO [6] of 1998 and robotic wheelchairs from at least as early as 1990 (see [14] for a survey of early progress in intelligent wheelchairs)—navigating safely and robustly in these environments is still a difficult and unsolved problem.

The difficulty of such environments is twofold:

1. Their unstructured, possibly cluttered nature gives rise to a large amount of uncertainty in the current and future state of the environment. Planning under uncertainty is a rich field and will be discussed in Chapter 2.

2. Their dynamic nature requires the robot to perceive and react quickly to changes in the environment.

These two features are difficult to reconcile. Planning under uncertainty requires the consideration of not a single observed state, but all possible states of the environment. The resulting computational complexity makes it much harder to plan as quickly as feature #2 requires.

1.2 Safe, Smooth Control

It is our goal to generate a safe and smooth control signal, given a map of obstacles. These two goals can be contradictory.

To synthesize safe controls, i.e. controls which avoid collisions to the best of the robot's ability, we require that the robot's movements be subject to a cost which increases as the robot approaches an obstacle. Given such a cost function, we generate an optimal control policy. In this framework, an optimal control is a safe control.

Many techniques have been developed for synthesizing (approximately) optimal feedback control inputs/actions in the control and robotics literature. Informally, we seek progression to a prescribed state with minimal cost and a smooth input sequence. This can often be accomplished for systems with linear dynamics and quadratic cost functions (or problems which can be reasonably approximated in that manner). However, the dynamics of a powered wheelchair are distinctly nonlinear, and a cost function which depends on cluttered obstacles will be significantly more complex than a simple quadratic.
Hence, our requirement for safety gives rise to nonsmooth feedback controls.

1.3 Navigation with Value Functions

The uncertainty in future state makes it important to plan via feedback control and not open loop: since we cannot know future states, we cannot plan a sequence of optimal actions ahead of time. One method for optimal feedback control is the use of value functions.

A feedback control requires us to define a navigation function over the robot's configuration space. A navigation function is a function from state space to control space, π : X → U. For each state in the state space, it gives an appropriate action, so the robot can make progress toward the goal from wherever it happens to find itself. For optimal control, the navigation function can be efficiently derived from the value function.

The value function returns the minimum cost to go to the goal from any given state. As we will explain in Chapter 3, these functions can be constructed as the solution to a dynamic programming problem. As our cost functions from the previous section are potentially nonconvex and nonsmooth, so too is the value function. The optimal action is the one which results in the maximum decrease in this function; i.e., to follow the direction of steepest descent. Thus, our solution to this problem amounts to steepest descent on a nonsmooth, nonconvex function—albeit a special class of nonconvex functions with a single global minimum, so there is no possibility of converging to a spurious local minimum.

1.4 An Example

In order to illustrate this challenge, consider the narrow corridor scenario shown in Figure 1.1. For simplicity, we work with an isotropic, holonomic vehicle in the plane: it can move in any direction at some bounded maximum speed. The vehicle is trying to reach a goal location on the right side of a narrow corridor. The objective is to minimize path length, but for safety purposes we penalize states that are close to the walls. To solve the scenario, we approximate a minimum-time-to-reach value function in the obstacle-free space, and then the resulting optimal feedback controller follows the gradient of the value function.

Figure 1.1: Narrow corridor example. Top: obstacles are gray, the goal location is a blue circle. Middle: contours of the value function; the corresponding cost function penalizes states near the walls. Bottom: vector field of the gradients of the value function (on a subsampled grid, for visibility). All gradients in the corridor have a rightward component, but there is an abrupt jump between upward and downward components.

The first problem arises even in numerical simulations where we know the exact state: chattering of the optimal control, as illustrated in Figure 1.2. A typical cyber-physical control system for vehicles and robots operates on a loop in which new control signals are generated at roughly periodic time intervals. If the control signals arise from gradients of a value function, this approach is mathematically equivalent to fixed-stepsize gradient descent (essentially forward Euler integration) and gives rise to significant chattering. A naive response is to use a variable-stepsize integrator such as MATLAB's ode23 or ode45. It turns out that these can generate decent paths, but they require an unreasonable number of tiny steps. Those familiar with numerical integration will immediately diagnose stiffness as the problem, and prescribe an implicit integrator such as MATLAB's ode23t or ode15s. These take fewer steps but require much longer to run, because they still generate many tiny steps in parts of the domain where the optimal path is straight.

Figure 1.2: Narrow corridor example paths. Every figure is overlaid with contours of the value function, and markers are placed along the trajectory at each step. Top: fixed stepsize path. Middle: adaptive stepsize path from MATLAB's ode15s (an implicit scheme). Bottom: sampled gradient path.
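This fixed-stepsize chattering is easy to reproduce in a few lines. The snippet below runs forward-Euler steepest descent on a toy value function ψ(x, y) = −x + k·|y|, whose kink along y = 0 mimics the abrupt gradient jump down the corridor center in Figure 1.1; the function, the constant k, and the stepsize are illustrative assumptions, not the thesis' implementation:

```python
import math

def grad_psi(x, y, k=3.0):
    """Gradient of a toy corridor value function psi(x, y) = -x + k*|y|.

    psi decreases to the right and has a kink (non-differentiability)
    along the corridor center y = 0, so the y-component of the gradient
    flips sign abruptly across the center line.
    """
    return (-1.0, k * math.copysign(1.0, y))

def fixed_step_descent(x0, y0, h=0.25, steps=20):
    """Forward-Euler steepest descent with a fixed stepsize h:
    each step moves distance h along the unit direction -grad/|grad|."""
    xs, ys = [x0], [y0]
    for _ in range(steps):
        gx, gy = grad_psi(xs[-1], ys[-1])
        norm = math.hypot(gx, gy)
        xs.append(xs[-1] - h * gx / norm)
        ys.append(ys[-1] - h * gy / norm)
    return xs, ys

xs, ys = fixed_step_descent(0.0, 0.05)
sign_flips = sum(1 for a, b in zip(ys, ys[1:]) if a * b < 0)
# The iterate crosses the corridor center on essentially every step,
# so sign_flips is close to the number of steps: this is the chattering.
```

Shrinking h reduces the amplitude of the oscillation but not its presence; the iterate keeps crossing the kink, which is exactly why adaptive integrators are forced into tiny steps there.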
In fact, the problem arises because the value function is (nearly) non-differentiable along the center of the corridor, and consequently the gradient is (nearly) discontinuous precisely where the optimal path lies. Put another way, the gradient descent differential equation is not just stiff in the normal sense, but infinitely so. Figure 1.2 also illustrates how our proposed approach generates a much more desirable solution, with large stepsizes and an accurately optimal path.

The second problem arises in physical systems in which the state is not accurately known. Figure 1.3 shows a commonly used non-parametric representation of state uncertainty called the particle filter: each point represents a possible state of the system. Every direction of the compass is covered by at least one particle's optimal action. The typical mechanism for choosing an action—extract the mean state or most probable state and use the corresponding optimal action—will choose one of these actions and will not even recognize that any action may be counterproductive. In fact, the system should refine its state estimate if possible before choosing an action.

Figure 1.3: State uncertainty, as represented by samples from a (simulated) particle filter. The figure shows a zoomed-in view of the left side of the corridor scenario from Figure 1.1. Each dot is a state sample, and the attached arrow shows the optimal action for that sample. Depending on which sample most accurately represents the actual state, the optimal action could be in any direction.

With these two problems in mind, the contributions of this thesis are to show:

• How the simple gradient sampling algorithm [8] borrowed from nonsmooth optimization can easily be combined with a standard particle filter representation of state uncertainty.

• That the resulting algorithm can at each step generate a productive action choice which takes into account current state uncertainty, or determine that no such action exists.

• That the resulting action choices nearly eliminate the chattering often observed over multiple steps when using gradient-descent-based path planners.

We illustrate its success in simple simulation with Figure 1.2, but more importantly we illustrate its success in a full-scale robotic simulation with state uncertainty and simulated noisy sensors and motion, using the widely adopted Robot Operating System (ROS) / Gazebo environment.

The algorithm does give rise to one undesirable behavior: it tends to steer toward not just minima of the value function but any stationary point, including saddle points. Consequently, we propose a procedure for categorizing whether a stationary point is the desired minimum, and discuss two approaches to resolve saddle points. A pleasant side-effect of the categorization algorithm is automatic determination of whether the goal has been reached to the degree possible given the current state uncertainty. Finally, we demonstrate that our algorithm is easily adapted to use other underlying optimal planners, such as RRT*.

Chapter 2

Related Work

The problem of "robustness to uncertainty" is broad and can be tackled in many ways.
Under this umbrella there exists everything from very general mathematical formalisms to highly domain-specific solutions to a narrow aspect of the problem. The Gradient Sampling with Particle Filter (GSPF) algorithm sits somewhere in between these two extremes, providing robustness to a very general set of systems and uncertainty, but with assumptions that limit its applicability in exchange for remaining tractable. It is a relatively uncomplicated way to account for existing state uncertainty and produce more stable paths, but does not attempt to model future uncertainty.

2.1 Types of Uncertainty

There are many sources of uncertainty in real-world robotic planning, often grouped into three main categories [42] (sometimes the second category is split to create a total of four groups, as in [18]):

• Motion uncertainty
• Environmental uncertainty
• Sensing uncertainty

Motion uncertainty is non-determinism in the robotic control system. It is an expression of the fact that the next state is a stochastic function of the current state and the control. Even when the current state and control are perfectly known, future robot states cannot be exactly predicted.

The future state of the environment around the robot is similarly non-deterministic. This is the result of the many possible forms of environmental uncertainty: our prior maps of the environment may not be exact; other objects in the environment may be subject to motion uncertainty; there may be other agents in the environment who operate under unknown motion models or whose controls we cannot observe.

Sensing uncertainty is non-determinism in the observation function.
It contributes uncertainty to both the state of the robot and the state of the environment. When taken together with the first two types of uncertainty, the result is that not only can the state not be predicted in the future, it cannot even be observed at the present time.

2.1.1 Selective Focus

Work which addresses planning under uncertainty may sometimes only address one or two of these categories. For instance, works which address adversarial games [33] or navigation among pedestrians [13, 38] often focus only on the uncertain intention of their opponent (i.e. environmental uncertainty).

More commonly, motion and sensing uncertainty are considered together, but environmental uncertainty is ignored. This is because many works assume a static world with an accurate map; in this case, the only uncertainty is in the state of the robot (the sources of that uncertainty being motion and sensing).

This is even the case for implementations of the very general Partially Observable Markov Decision Process (POMDP) model, since many POMDP solutions assume the cost and transition functions are time-invariant, which is only true under a static environment. Some attempt to get around this by computing a policy over the belief state of the entire environment, but this is impractical for all but the smallest environments, such as the Tag example in [28] or the intersection example in [2]. An exception is online POMDP solvers, which can update their policy according to new observations and beliefs [31], but these still have the difficulty of representing all possible observations (and thus all possible evolutions of the environment) which could occur from the current belief state.

2.1.2 Where This Work Fits

GSPF can deal with motion and sensing uncertainty by sampling from the state estimate. The formulations demonstrated in this thesis cannot deal well with environmental uncertainty, because the underlying planners assume a static environment and are not well-equipped for fast replanning.
However, if given a planner with that capability, the gradient sampling method would naturally accommodate it. Recent work in high-frequency replanning shows that some highly parallelized planners may be fast enough to introduce some robustness to dynamic obstacles [34].

2.2 Types of Robustness

In addition to there being many types of uncertainty to be robust to, there are many different ways that one can be robust. One might minimize the probability of collision, or maximize the probability of success. One might minimize some cost; this could be the expected cost, or the nominal cost, or the cost subject to some chance constraints on the probability of collision. Or one may simply minimize uncertainty directly, in the hope of enabling traditional collision avoidance approaches.

2.2.1 POMDP Solvers

Minimizing the expected cost of a policy is a common approach. This is the objective of the general POMDP formulation. The general applicability of POMDPs makes them very attractive, but they are also known to be PSPACE-complete (and thus very unlikely to be solvable in polynomial time) [26]. While there have been significant advances in making POMDPs tractable by sampling beliefs [2, 28], to our knowledge there is not yet any solution which can be applied in real time (replanning multiple times per second) to continuous state and action spaces—a prerequisite to employing the technique in robotics.

There has, however, been some promising recent work in online and real-time POMDP solvers. Monte Carlo Tree Search within the belief space has enabled online solvers in large state spaces [33], but it is yet to be applied to robotic applications or continuous action spaces. More recently, a similar concept called an Adaptive Belief Tree was combined with Generalized Pattern Search to enable the use of continuous action spaces, and although it is not quite as fast as we would like, the initial results are very promising [32].
Similarly to our method, both of these can be applied using only a black-box simulator of the robotic system, and do not require representing the full transition or observation probability densities. Like real-time performance, this is another essential practicality which many POMDP solutions lack—essential because enumerating or even sampling from the space of possible observations is very difficult.

2.2.2 Other Approaches

In the absence of the ideal POMDP solver, there has been a plethora of alternative methods of varying applicability. Some even boast features that are not enabled by POMDPs, such as directly maximizing the probability of success [34]. Others are slightly more limited, minimizing expected cost subject to an upper bound on the probability of collision [1, 9, 12]. Still more limited are the methods which only minimize expected cost, with no estimate of the probability of collision; this includes most methods based on Linear Quadratic Gaussian (LQG) control, e.g. [35, 41].

Moving on from there, there have been methods which minimize the cost of the nominal trajectory and then attempt to track this trajectory, rather than minimizing the expected cost of a feedback policy [21, 22] (although [22] does admit modeling risk in the objective function, something that is not often permitted). Alternatively, LQG-MP [40] does not optimize the cost directly, but is able to choose the plan with the lowest probability of collision from a set of candidate plans.

Some methods minimize the state uncertainty directly, rather than having this fall out naturally as a side effect of minimizing expected cost. The Bayesian framework presented in [9] can minimize uncertainty while keeping task time bounded, or minimize time while keeping uncertainty bounded. Belief Road Map [29] does not bound or minimize covariance over the whole trajectory, but it can minimize final covariance.

As methods vary in their attack on uncertainty, so do they vary in their assumptions and limitations.
It is common to address only Gaussian beliefs [1, 5, 9, 12, 27]. Many methods are designed specifically for linear, linear-quadratic, or linearized-quadratic systems [21, 22, 34, 40, 41]. Some methods are based on tracking a nominal trajectory, and if they deviate too far from the nominal plan their policy may no longer be applicable [5]—this includes most of the LQ methods as well. Similarly, some methods perform local optimizations on a nominal plan, and cannot claim to be globally optimal [27, 41].

Certain methods operate on discrete spaces/observations only [28]. BU-RRT* [21] requires polytopic obstacles. It also requires knowing and modeling all uncertainties present in the environment, as do many POMDP solvers [2, 28] (because they need to be able to fully enumerate transition and observation probability densities).

2.2.3 Comparison To This Work

GSPF does a bit more than minimizing the cost of a deterministic trajectory—at each step, it takes the state uncertainty into account to guarantee progress. We have a mechanism to identify when uncertainty is too large to guarantee progress, and to attempt relocalization in that event. In this way, the system localizes to the extent necessary to reach the goal, but otherwise does not explicitly work to minimize or bound uncertainty.

Although not intentional, GSPF does something very similar to the Q-value Markov Decision Process (QMDP) [20, 31]. QMDP is a POMDP approximation which assumes that all partial observability will disappear after a single step. The current state uncertainty and the motion uncertainty are represented, but it is assumed that in the next step the state will be perfectly known. Likewise, in GSPF the action chosen for each particle is optimal assuming that further evolution is deterministic, so the GSPF consensus direction likewise incorporates the assumption of deterministic future evolution.
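To make the consensus idea concrete, here is a minimal 2-D sketch: a direction d is productive for the whole particle set only when it has a positive inner product with every particle's sampled optimal action, which in the plane holds exactly when all the action directions fit inside an open half-plane. The angular-gap test and the function name below are illustrative simplifications of the convex-hull machinery actually used by GSPF (Chapter 4):

```python
import math

def consensus_direction(actions):
    """Return a unit vector d with d . a > 0 for every sampled action a,
    or None if no common productive direction exists.

    In 2-D such a d exists exactly when all (nonzero) sampled actions
    fit inside an open half-plane, i.e. when some angular gap between
    consecutive action directions exceeds pi.  This is a simplified 2-D
    stand-in for GSPF's convex-hull consensus test.
    """
    angles = sorted(math.atan2(ay, ax) for ax, ay in actions)
    n = len(angles)
    if n == 1:  # a single sample trivially agrees with itself
        norm = math.hypot(*actions[0])
        return (actions[0][0] / norm, actions[0][1] / norm)
    # Largest angular gap between consecutive directions, wrapping around.
    best_gap, best_i = -1.0, 0
    for i in range(n):
        gap = (angles[(i + 1) % n] - angles[i]) % (2.0 * math.pi)
        if gap > best_gap:
            best_gap, best_i = gap, i
    if best_gap <= math.pi:
        return None  # actions surround the origin: no consensus exists
    # All actions lie on an arc of length 2*pi - best_gap; aim at its middle.
    start = angles[(best_i + 1) % n]
    mid = start + (2.0 * math.pi - best_gap) / 2.0
    return (math.cos(mid), math.sin(mid))
```

When the sampled actions point in roughly opposing directions, as they do near the stationary points discussed in Chapter 4, the test returns None rather than committing to an action that is counterproductive for some particles.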
Because they do not consider partial observability, neither method is capable of information-gathering actions which may help reduce uncertainty, although GSPF is capable of identifying when the uncertainty is too large to guarantee progress.

QMDP resolves conflicting actions by choosing the one with the highest expected value (where the expectation is taken over only the current belief state). This typically requires discrete state and action sets so that this optimization can be performed efficiently. GSPF, on the other hand, is framed around a continuous action space, and in fact requires continuity of the actions and the value function. Rather than the maximum expected reward, we choose an action which guarantees a positive reward. This favors choosing actions safely over choosing actions greedily.

Chapter 3

Problem & Background

At a high level, we seek to choose a sequence of direction commands which will navigate a robot from its current location to a known goal location in a known map. Although the robot is able to move in any direction, the problem is non-trivial because the map contains obstacles which must be avoided. Furthermore, the robot is unsure of its current position, may not move exactly as commanded, and receives noisy sensor information from which it must estimate its position and movement. A very common approach to solve this problem is to plan deterministically optimal paths, track state uncertainty online, but then choose the action which is deterministically optimal for some single possible state.
Such an approach can be theoretically justified by the separation principle for linear time-invariant systems with Gaussian noise [3], but most mobile ground robots and their sensors are highly nonlinear, so the widespread adoption of this pipeline is due to its ease of implementation and frequent experimental success (modulo the chattering issue mentioned earlier).

In this chapter we outline the elements of our optimal navigation pipeline, and in the next chapter we will show how these elements can be naturally integrated. The individual elements are common robotic tools, but are not typically used together. We track uncertainty with the popular particle filter representation. Optimal actions are chosen by gradient descent on a value function approximation. These two elements will be unified by the gradient sampling algorithm from the nonsmooth optimization literature, which is also described in this chapter.

3.1 Problem Formulation

Our goal is to navigate the robot through the state space Ω to some compact target set T; we present results for Ω ⊆ R², but it is straightforward to extend the approach to higher dimensions. We seek paths x(·) : [t0, tf] → Ω that are optimal according to an additive cost metric

    ψ(x0) = inf_{x(·)} ∫_{t0}^{tf} c(x(s)) ds,        (3.1)

where x(t0) = x0, x(tf) ∈ T, and x(·) is drawn from the set of feasible paths such that x(t) ∈ Ω \ T for t0 ≤ t < tf. The value function ψ(x) measures the minimum cost to go from state x to T (this cost may not be achievable, but paths whose costs are arbitrarily close to ψ(x) exist). The cost function c(·) is assumed to be strictly positive and Lipschitz continuous.

The optimal solution of (3.1) depends on the feasible paths. Here we will assume only the simplest form of isotropic holonomic dynamics

    (d/dt) x(t) = ẋ(t) = u(t),        (3.2)

where ‖u(t)‖ ≤ 1 (all norms are assumed to be Euclidean, ‖·‖ = ‖·‖₂, unless otherwise specified).
We assume the input signal u(·) is measurable and hence x(t) is continuous.

3.2 Planning with a Value Function

The value function (3.1) satisfies a dynamic programming principle and can be shown to be the viscosity solution of the Eikonal equation (for example, see [39])

    ‖∇ψ(x)‖ = c(x),  for x ∈ Ω \ T;
    ψ(x) = 0,        for x ∈ T.    (3.3)

Under the dynamics (3.2), the viscosity solution ψ(x) of (3.3) is continuous and almost everywhere differentiable. The only local minima of ψ(x) occur at T, but the function will usually be non-convex: it can have saddle points in Ω \ T and local maxima at boundaries of Ω which are not also boundaries of T. It will typically not be differentiable at these critical points, as well as on other lower dimensional subsets of the domain.

Except on simple domains Ω and for simple cost functions c(x), it is not practical to find the viscosity solution of (3.3) analytically; however, there exist many efficient approaches to approximate it; for example [11, 16, 19, 39]. In order to handle complex domains and cost functions, approximations are generally constructed on a discrete grid.

Given the value function, the optimal state feedback action is easily extracted:

    u*(x) = −∇ψ(x) / ‖∇ψ(x)‖.    (3.4)

Consequently, generation of an optimal path is equivalent to gradient descent on ψ(x). Of course we typically cannot represent the exact solution of (3.2) and (3.4) either, so we seek an approximate path in the form of a sequence of waypoints {x(ti)} for some sequence of timesteps t0 < t1 < t2 < ··· < tf and some initial x0. A common approach is essentially a forward Euler integration with fixed timestep ∆t:

    ti+1 = ti + ∆t,
    x(ti+1) = x(ti) + ∆t u*(x(ti)).    (3.5)

Unfortunately, a straightforward implementation of (3.5) to generate paths from the value function (or even a fancier implementation with adaptive stepsize) falls prey to the well-established problem with gradient descent (as illustrated in the top two subplots of figure 1.2): the resulting paths chatter or take many steps to achieve the optimum.
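The forward Euler update (3.5) can be sketched as follows. This is an illustrative example of our own, not the thesis implementation: it assumes the analytic value function ψ(x) = ‖x‖ that results from unit cost c ≡ 1 and a point target at the origin, for which the descent action is −∇ψ(x)/‖∇ψ(x)‖ = −x/‖x‖.

```python
import numpy as np

def grad_psi(x):
    """Gradient of psi(x) = ||x||, defined away from the origin."""
    return x / np.linalg.norm(x)

def euler_path(x0, dt=0.1, max_steps=200, goal_tol=0.15):
    """Generate waypoints x(t_{i+1}) = x(t_i) + dt * u*(x(t_i)),
    stopping once the state is within goal_tol of the target."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(max_steps):
        if np.linalg.norm(x) < goal_tol:
            break
        u_star = -grad_psi(x)      # unit-norm descent action
        x = x + dt * u_star
        path.append(x.copy())
    return np.array(path)

path = euler_path([3.0, 4.0])      # marches straight toward the origin
```

On this radially symmetric ψ the path is well behaved; the chattering described above appears only when ψ has the kinked, steep-sided valleys discussed next.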
This outcome is not surprising, since the optimal paths often proceed down the middle of steep-sided valleys in the value function, and in these valleys the value function displays the large disparity in curvature that causes gradient descent such problems. In fact, the value function may not be differentiable there, so the gradients change discontinuously and the disparity in curvature is infinite.

A further complication arises because ψ(x) and hence ∇ψ(x) are approximated numerically. The numerical algorithms will typically return an approximation of ∇ψ(x) even at values of x where the true ψ(x) is not differentiable, and more generally the approximate values of ∇ψ(x) may be inaccurate near these regions where differentiability fails.

3.3 State Estimation with Particle Filters

The particle filter is a popular technique for state estimation or localization of systems with nonlinear dynamics and/or sensor models. We focus on a version commonly used in robotics called Monte Carlo Localization (MCL) [36]. The state estimate is represented by a collection of weighted samples {(w^(k)(t), x^(k)(t))}. This estimate is updated by predictions whenever the system state evolves and corrections whenever sensor readings arrive, typically in an alternating iteration. Predictions update only the state component by drawing a new sample

    x^(k)(ti+1) ∼ p(x(ti+1) | x^(k)(ti), u(ti)),    (3.6)

where p(x(ti+1) | x(ti), u(ti)) is the probability distribution over future states given past state and input; in other words, the dynamics (3.2) plus some motion noise. Corrections update only the weight component by multiplication:

    w^(k)(ti+1) = p(sensor reading | x^(k)(ti+1)) w^(k)(ti),

where p(sensor reading | x^(k)(ti+1)) models the probability of seeing the sensor reading given the particle's current state.

In MCL the particle representation is also regularly resampled, typically after each sensor reading.
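The prediction (3.6) and correction steps above can be sketched as below. This is a minimal illustration, not the MCL implementation used in the experiments; the 2D holonomic motion model with Gaussian noise and the beacon-range sensor model (range to a landmark at the origin) are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, u, dt, motion_std=0.05):
    """Prediction: draw x(t_{i+1}) ~ p(x(t_{i+1}) | x(t_i), u(t_i)),
    i.e., the dynamics plus some motion noise."""
    return particles + dt * u + rng.normal(0.0, motion_std, particles.shape)

def correct(particles, weights, measured_range, sensor_std=0.1):
    """Correction: multiply each weight by the likelihood of the reading
    given that particle's state, then renormalize."""
    predicted = np.linalg.norm(particles, axis=1)   # reading each particle expects
    likelihood = np.exp(-0.5 * ((measured_range - predicted) / sensor_std) ** 2)
    w = weights * likelihood
    return w / w.sum()

# One predict/correct cycle for 500 particles near (2, 0).
particles = rng.normal([2.0, 0.0], 0.5, size=(500, 2))
weights = np.full(500, 1.0 / 500)
particles = predict(particles, u=np.array([-1.0, 0.0]), dt=0.1)
weights = correct(particles, weights, measured_range=1.9)
```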
During resampling, a new collection of particle locations is drawn (with replacement) from the existing locations with probability proportional to the existing particles' weights, and the weights are all reset to unity. In the remainder of the paper we will work with the state estimate only after resampling, so we assume unit weights.

3.4 The Gradient Sampling Algorithm

It is well known that gradient descent performs poorly on nonsmooth optimization problems, and many algorithms have been proposed to overcome its limitations. Here we draw inspiration from the gradient sampling algorithm [8], which is designed to generate a sequence of high quality linesearch directions despite the presence of discontinuous and/or poorly approximated derivatives in the objective function.

The basic algorithm is given a function ψ and a starting point x(0), and aims to efficiently locate a minimum of the function. In each iteration starting at the current point x(t), it seeks to choose a point x(t+1) in a way that guarantees descent (so that ψ(x(t+1)) ≤ ψ(x(t))). For a differentiable function this is trivially achieved by performing a line search in the direction of the negative gradient to find an appropriate step size γ(t):

    x(t+1) = x(t) − γ(t) ∇ψ(x(t)) / ‖∇ψ(x(t))‖.

However, as we have mentioned before, we are interested in value functions which are not differentiable everywhere. This can be addressed through the use of the generalized gradient, also referred to as the Clarke subdifferential. Informally, the Clarke subdifferential is the compact set of directions of maximum descent from a given point. It can be defined in terms of the convex hull of gradients in the neighborhood around a point x̄ ∈ Rⁿ:

    ∂_C ψ(x̄) = conv{ lim_{r→∞} ∇ψ(x_r) : x_r → x̄, x_r ∈ Q },

where Q is a full-measure subset of points in the neighborhood of x̄ at which ψ is differentiable.
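As a concrete numerical illustration (ours, not drawn from the cited references), consider ψ(x) = |x| in one dimension: gradients sampled in any ε-ball around the kink at x = 0 take both values ±1, and their convex hull, the interval [−1, 1], recovers ∂_C ψ(0) and contains 0, flagging the stationary point.

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_psi(x):
    """Gradient of psi(x) = |x| wherever x != 0."""
    return np.sign(x)

eps = 0.1
# Sample points in the eps-ball around the kink at 0 (a.s. none lands on 0).
samples = eps * rng.uniform(-1.0, 1.0, size=50)
grads = grad_psi(samples)

# In 1D the convex hull of the sampled gradients is just the interval
# [min, max]; here it brackets 0, so 0 is inside the hull.
hull = (grads.min(), grads.max())
```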
See [10] for a full exposition.

Given the set ∂_C ψ(x̄), we can use any vector in this set as a valid direction of descent, and perform a line search as in the steepest descent case. However, this set is difficult to compute. We can instead approximate the generalized gradient by taking a sampling of gradients in the same neighborhood [7]. This is the key idea in the gradient sampling algorithm.

The algorithm takes a sampling of gradients at points within a radius ε,

    x^(k)(t) = x(t) + ε δx^(k),    (3.7)
    p^(k)(t) = ∇ψ(x^(k)(t)),    (3.8)

for k = 1, ..., K, where {δx^(k)} are sampled independently and uniformly from the unit ball.

Figure 3.1: Finding a "consensus" among different actions. Left: Gradient vectors (yellow) shown at the corresponding samples' locations, and the resulting consensus action (red). In this neighbourhood, the value function has a ridge. Right: The gradients (yellow) plotted in gradient space, their convex hull (blue) and p*(t) (red). The consensus action is a leftward movement, which is a descent direction for all samples.

Next, the algorithm computes the generalized gradient approximation

    P(t) = conv{ p^(1)(t), ..., p^(K)(t) }.

The algorithm chooses the point with the minimum norm:

    p*(t) = argmin_{p ∈ P(t)} ‖p‖².    (3.9)

The authors of [8] cite a desire to be conservative in making this choice, as well as the ability to prove convergence under this choice.

Thus, −p*(t) is a consensus direction: a direction of descent for all samples x^(k)(t). Any rescaling is also a consensus direction. Most importantly, p*(t) can easily be found by solving a simple convex quadratic optimization problem. The technique is illustrated in figure 3.1.

If ‖p*(t)‖ = 0, there is a Clarke ε-stationary point (conceptually a local minimum, maximum, or saddle point) somewhere within the ε-ball around x(t), and no direction can be agreed upon by all samples.
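The minimum-norm problem (3.9) is a small convex quadratic program over convex combinations of the sampled gradients. As an illustrative sketch (the solver choice here is our assumption, not the thesis implementation), a Frank-Wolfe iteration with exact line search suffices:

```python
import numpy as np

def min_norm_point(grads, iters=500):
    """Minimum-norm point p* in the convex hull of the rows of grads
    (a (K, d) array of gradient samples p^(k)), via Frank-Wolfe.
    Starting from a vertex and stepping along chords keeps the iterate
    inside the hull by construction."""
    p = grads[0].copy()
    for _ in range(iters):
        # Hull vertex most opposed to the current point (linear minimizer).
        s = grads[np.argmin(grads @ p)]
        d = s - p
        denom = d @ d
        if denom == 0.0:
            break
        # Exact line search for min_gamma ||p + gamma d||^2 on [0, 1].
        gamma = np.clip(-(p @ d) / denom, 0.0, 1.0)
        if gamma == 0.0:
            break                       # current point already optimal
        p = p + gamma * d
    return p

# Two gradients straddling a ridge: the consensus direction is along -p*.
p_star = min_norm_point(np.array([[1.0, 1.0], [1.0, -1.0]]))

# A hull that contains the origin: no consensus direction exists, and the
# minimum-norm point is (approximately) zero.
q = min_norm_point(np.array([[1.0, 0.0], [-1.0, 1.0], [-1.0, -1.0]]))
```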
In this case the radius ε is reduced and a new sample set (3.7)-(3.8) is obtained.

Otherwise, the algorithm performs an Armijo line search along the vector given by p*(t) to determine an appropriate step length s, and the update is given by

    x(ti+1) = x(ti) − s p*(t) / ‖p*(t)‖.    (3.10)

The algorithm terminates when ε shrinks to a predetermined threshold. When paired with a linesearch procedure that ensures sufficient descent, such as Armijo, it is shown to converge to a Clarke stationary point under suitable conditions. When we repurpose this algorithm for path planning, we will have to deal with the possibility of arriving at a stationary point which is not the desired minimum.

Chapter 4

Gradient Sampling with Particle Filter

The basic gradient sampling algorithm can generate a path for simulations, visualizations, and other situations where the initial condition x0 is known, the dynamics (3.2) are accurate, and the chosen input (3.4) is accurately implemented. Unfortunately, for most robotic systems these assumptions do not hold. In this chapter we consider how the algorithm can be adapted to the case where x(t) can only be estimated.

4.1 Gradient Sampling with State Uncertainty

The state estimate representation used by particle filters suggests a natural adaptation of the gradient sampling algorithm: instead of choosing the gradient sample locations with (3.7), use the particles' locations (3.6) directly. We call this version GSPF, and outline its key properties in the following propositions.

Proposition 1. If the solution p*(t) of (3.9) is such that ‖p*(t)‖ ≠ 0, then −p*(t) is a descent direction in the value function for all particles.

Proof. A direction d is a descent direction at a point x(t) if and only if d^T p(t) < 0, where p(t) is the gradient of the value function at x(t). The descent direction chosen by GSPF will be d = −p*(t). So to produce a descent direction for all particles, we must show that p*(t)^T p^(k)(t) > 0 for all k ∈ {1, ..., K}.
In other words, the minimum norm convex combination of gradients forms an acute angle with every particle's gradient. We will prove the stronger property that the minimum norm choice forms a descent direction with not just any single particle's gradient, but any convex combination of the gradients: p*(t)^T p > 0 for all p ∈ P(t).

Figure 4.1: The minimum norm direction p*(t) must form an acute angle with every point in the convex hull in order to be a direction of descent for all particles. We show that if this is not the case for a potential solution a, then a must not be minimum norm.

For simplicity of notation, let a be the proposed solution to (3.9), and let b ∈ C be any convex combination of gradient samples:

    C = { ∑_k λ_k p^(k)(t) : ∑_k λ_k = 1, λ_k ≥ 0 for all k }.

To be the true solution, the proposal a must have the minimum norm:

    a = argmin_{a′ ∈ C} ‖a′‖.

We will show that a must have a positive dot product with all gradient samples, because if it did not, then there would be a choice with a smaller norm:

    a^T b ≤ 0 ⟹ ∃a′ ∈ C : ‖a′‖ < ‖a‖.    (4.1)

Note that since the minimum norm over C is positive (‖p*(t)‖ > 0), every vector in the convex hull has positive magnitude; in particular ‖a‖ > 0 and ‖b‖ > 0.

Let c = b − a and D = { (1−λ)a + λb : λ ∈ [0,1] }. Note that D ⊂ C (by virtue of convexity, all points on the line segment between a and b are within the convex hull).

Let a₁ be the projection of a onto c:

    a₁ = (a^T c / c^T c) c
       = [ a^T (b−a) / ((b−a)^T (b−a)) ] (b−a)
       = [ (a^T b − ‖a‖²) / (‖a‖² + ‖b‖² − 2a^T b) ] (b−a).

Let ρ = (a^T b − ‖a‖²) / (‖a‖² + ‖b‖² − 2a^T b), so that

    a₁ = ρ(b−a).    (4.2)

Let a₂ be the rejection of a from c:

    a₂ = a − a₁ = a − ρ(b−a) = (1+ρ)a − ρb.

By Pythagoras, ‖a₁‖² + ‖a₂‖² = ‖a‖², so ‖a₂‖² = ‖a‖² − ‖a₁‖², and hence

    ‖a₂‖ < ‖a‖ if ‖a₁‖ > 0.    (4.3)

Now a₂ is a vector which lies somewhere on the line passing through a and b. If we let λ = −ρ, with ρ as defined at equation (4.2), we note that a₂ has the same structure as the elements of our set D:

    a₂ = (1+ρ)a − ρb = (1−λ)a + λb.

Expanding this choice of λ,

    λ = −ρ = (‖a‖² − a^T b) / (‖a‖² + ‖b‖² − 2a^T b),    (4.4)

we can see that if a^T b ≤ 0 then the above quantity is positive.
We can also see that the denominator is strictly greater than the numerator. Therefore 0 < λ < 1, which places a₂ within the set D.

Referring back to equation (4.2), and the fact that ‖a‖ > 0, ‖b‖ > 0, and a^T b ≤ 0 together imply ρ ≠ 0, we can also see that ‖a₁‖ > 0. Therefore, by equations (4.3) and (4.4), we have demonstrated our assertion (4.1): specifically, we have shown that a₂ ∈ C and ‖a₂‖ < ‖a‖, so a is not the minimum norm element of the convex hull C, and our leading assumption that p*(t) = a cannot be true if there exists a b such that a^T b ≤ 0.

The particle gradient samples are within the convex hull, so the property which holds for b holds for them as well: the minimum norm choice must yield a descent direction for all.

Proposition 2. If the solution p*(t) of (3.9) is such that ‖p*(t)‖ = 0, there is no direction which is a descent direction for all particles.

Proof. We will suppose that there is some direction d which is a direction of descent for all gradient samples, and derive a contradiction. Let the chosen direction d be the opposite of some gradient combination, so that −d = a ∈ C. Assume that a^T p^(k)(t) > 0 for all k ∈ {1, ..., K}.

Since (0,0) is one of the points in the set C, we know there is some setting of λ1, ..., λK such that ∑_k λ_k p^(k)(t) = (0,0). Call this point b. Therefore,

    a^T b = ∑_k λ_k a^T p^(k)(t) = 0.

All λ_k are nonnegative and at least one must be positive. Without loss of generality, consider all terms where λ_k > 0; there will be at least one such term, and the terms sum to 0. Therefore either a^T p^(k)(t) = 0 for all such terms, or at least one term satisfies a^T p^(k)(t) < 0. Thus, a contradiction: we cannot have a^T p^(k)(t) > 0 for all k ∈ {1, ..., K}. There is no choice of a which results in a descent direction for all particles.

4.1.1 Contrasting Gradient Sampling and GSPF

We note that in GSPF we do not have direct control over the step length s; instead, the step length is implicitly determined by how long (ti+1 − ti) the system evolves before the particle filter is again resampled.
This choice implicitly replaces (3.10) with

    x(ti+1) = x(ti) − (ti+1 − ti) p*(t) / ‖p*(t)‖.

It is straightforward to modify the probability distribution in (3.6) to take this change into account. From a theoretical point of view, the goal of the Armijo line search which provided the step size in the basic algorithm was to ensure a sufficient descent condition, which is then used in the proof of convergence to Clarke ε-stationary points of the value function. Assuming that resampling occurs sufficiently often, it should be possible to ensure a similar sufficient descent condition in the GSPF, but that by itself will not rescue the convergence proof because we have also replaced (3.10) with (3.6). Whether or not the convergence theorem still holds, GSPF appears to converge to stationary points in practice.

4.2 Classifying and Resolving Stationary Points

Adapting gradient sampling to the case where the system state is estimated by a particle filter is straightforward, but we no longer have direct control over the sampling radius ε, and so we must devise an alternative termination criterion. Furthermore, the gradient sampling algorithm converges to stationary points of any kind, but we seek a local minimum (which by construction (3.3) is guaranteed to be a global minimum and to occur at the target set); consequently, we must devise a mechanism for escaping stationary points which are not local minima.

Fortunately, we do have indirect control over our sampling radius: we can perform more sensor updates (and resamplings) in the hope that the spread of the state estimate is reduced with the introduction of more observations. In Chapter 5 we simulate a robot which can choose to use either a low or high precision sensor, and which would prefer to use the low precision version whenever possible to conserve power.
Similar multi-tiered sensing solutions may include cases where multiple sensors are available but not always deployed, or where the robot could re-orient a sensor with a limited field of view.

4.2.1 Stationary Point Classification

Before going to the trouble of gathering additional sensor readings, we should first determine whether a stationary point is a desirable minimum or an undesirable saddle point or maximum. To do so, we locally approximate the value function as a quadratic

    ψ̄(x) = ½(x − xc)^T A (x − xc) + b^T (x − xc) + c    (4.5)

in the neighborhood of the particles, where xc is the center of curvature and the matrix A is symmetric. Rather than fitting the value function directly, we fit the gradient of the quadratic approximation

    ∇ψ̄(x) = A(x − xc) + b    (4.6)

to the set of gradient samples {p^(k)(t)} that we have already collected (3.8). In our implementation, we set xc to be the mean of the particle locations {x^(k)(t)}, use least squares to fit A and b, update xc ← xc − A⁻¹b, and refit A and b (at which point b is very close to zero). The resulting matrix A is an approximation of the Hessian of the value function in the neighbourhood of the stationary point.

We note in passing that quasi-Newton optimization algorithms, such as BFGS, also approximate the Hessian of the objective function [24]. Such algorithms are designed to construct their approximation efficiently in high dimensions using a single objective function sample from each of multiple steps. In our case we wish to approximate a low dimensional Hessian which takes into account information from all of the current particles and only the current step, so we find the least squares fit described above more efficient and appropriate than an adapted quasi-Newton update.

If the stationary point is a minimum, A will be positive definite. Algorithmically, we compute the eigenvalues of A (relatively inexpensive for the low to moderate dimensional systems in which we are interested) and declare victory if they are all positive.
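The least-squares fit of (4.6) and the eigenvalue test can be sketched as follows, on a synthetic saddle of our own choosing, ψ(x, y) = (x² − y²)/2, with ∇ψ = (x, −y); the thesis applies the same fit to the particles' sampled value-function gradients.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_quadratic_model(points, grads, xc):
    """Least-squares fit of A and b in grad(x) = A (x - xc) + b,
    with unknowns ordered [A11, A12, A21, A22, b1, b2] (2D case)."""
    d = points - xc
    rows, rhs = [], []
    for dx, g in zip(d, grads):
        rows.append([dx[0], dx[1], 0.0, 0.0, 1.0, 0.0]); rhs.append(g[0])
        rows.append([0.0, 0.0, dx[0], dx[1], 0.0, 1.0]); rhs.append(g[1])
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    A = sol[:4].reshape(2, 2)
    return 0.5 * (A + A.T), sol[4:]        # symmetrize A, return (A, b)

# "Particles" scattered around the saddle at the origin, with exact gradients.
pts = rng.normal(0.0, 0.3, size=(50, 2))
grads = np.column_stack([pts[:, 0], -pts[:, 1]])    # grad psi = (x, -y)

A, b = fit_quadratic_model(pts, grads, xc=pts.mean(axis=0))
eigvals = np.linalg.eigvalsh(A)                     # ascending order
is_minimum = bool(np.all(eigvals > 0))              # False here: a saddle
```

The mixed-sign eigenvalues recovered here are exactly the signature that triggers the resolution procedures described next.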
If there are negative eigenvalues, then we can attempt to improve the state estimate through additional sampling (as described above) and thereby escape the stationary point.

4.2.2 Resolution by Eigenvector

In practice, improved sensing is not always available or is insufficient to resolve undesirable stationary points. Fortunately, the eigenvalue decomposition of the Hessian A also provides us with an alternative method to determine a reasonable action. Each eigenvector v corresponding to a negative eigenvalue of A is locally a direction of descent for the value function. By Proposition 2 there is no consensus direction of descent for all particles, but we can choose some direction from among these eigenvectors in the hope of escaping the stationary point.

If there is only a single negative eigenvalue (which must be the case at a saddle point in 2D) with corresponding eigenvector v, there are only two descent directions: v or −v. A simple voting procedure can determine which of these the majority of the particles prefer. Let

    α = ∑_{k=1}^{K} sign(−v^T p^(k)).

If α < 0 we travel in the direction of −v; otherwise we travel in the direction of +v. Figure 4.2 illustrates this procedure. If multiple eigenvalues of A are negative, one could use the simple voting procedure on the eigenvector associated with the most negative eigenvalue, or one could devise a more complex procedure for searching over the space spanned by the eigenvectors associated with all of the negative eigenvalues to find a direction favorable to more of the particles.

Figure 4.2: Using a quadratic approximation to resolve a saddle point. In both plots gradient samples from an area where there is a saddle point are shown in yellow. Left: Gradient vectors shown at their corresponding particle locations and the eigenvectors of the local Hessian approximation (blue). The vectors pointing inward correspond to a positive eigenvalue, while those pointing outward correspond to a negative eigenvalue. The action chosen (red) is upward, because more gradient samples agree with this sense of the eigenvector corresponding to the negative eigenvalue. Right: Gradient vectors shown in gradient space and their convex hull (blue). The convex hull contains the origin, so there is no consensus direction.

4.2.3 Other Failures to Achieve Consensus

In practice, there are scenarios which do not achieve consensus, and yet are not straddling stationary points. An example is shown in the "Hallway Entrance" example of Section 5.3: near the entrance to a hallway, it is possible for gradients to span more than 180°, yet there is no stationary point anywhere in this example. Thus, we cannot assume that lack of consensus signifies a stationary point.

Therefore, before we take action based on the quadratic approximation, we must determine if the results are sane. We do this by checking whether the center of curvature of the quadratic approximation (xc from equation (4.5)) is within the convex hull of the particles. If the quadratic's center is not in the span of the particles, then it is likely not a stationary point as we were expecting. In this case, we do not use the eigenvalues of the quadratic to determine a descent direction, and we must relocalize if we hope to continue.

Chapter 5

Experiments

To test GSPF in as realistic an environment as possible in silico, we use the MCL particle filter implementation found in ROS [30], and hook it to Gazebo [17] to simulate the robot's (noisy) motion and sensor systems. We have modified the original MATLAB code from [8] to accept the particle locations as the sample locations, and communicate with ROS through MATLAB's Engine API for C++. For convenience, we construct the value functions using a transformation of (3.3) to a time-dependent Hamilton-Jacobi PDE [25] that is easily solved using existing software [23, section 2.7].
Given the value function approximation, we approximate the gradients numerically at the nodes of the grid using (upwind) finite differences, and then (linearly) interpolate these approximate gradients to states which are not on the grid. The value function and gradient approximations are currently constructed offline.

In each of the following examples, we simulate a holonomic disc robot with a sweeping single-beam LIDAR range sensor. The robot can travel equally fast in any direction in the plane. The sensor has a 260° horizontal field of view. We can simulate either a low precision (noisier) or high precision version of the sensor. Unless otherwise specified, the high precision version of the sensor is used.

In all figures that depict paths taken by the robot, both the ground truth trajectory (green solid lines) and state estimate (blue stippled lines) are shown. The "state estimate" is the value returned by the MCL implementation, which is usually the mean of the particles' locations (precisely, it is the mean of the dominant cluster, and most of the time there is only one cluster).

Figure 5.1: Single obstacle scenario. Left: Costmap, with costs ranging from low (blue) to high (red). Costs are highest near the outer boundary of the domain and near the obstacle. Right: Contours of the resulting value function. Stationary points are the minimum at the goal (−2.0, 0.0) and the saddle point on the east side of the obstacle (0.9, 0.0). The horizontal ridge running through this saddle point is the decision boundary between going north or south around the obstacle.

5.1 Single Obstacle

Our first example has only a single symmetric obstacle, and is used to demonstrate the main features of the GSPF. The robot must travel around the obstacle from east (4.0, 0.2) to west (−2.0, 0.0) while avoiding a saddle point. Figure 5.1 illustrates the scenario.

The robot starts by using the noisier sensor and an initial state estimate with large covariance.
The optimal deterministic path goes northwest around the obstacle, but this cannot be determined from the noisy state estimate, which straddles the north/south decision boundary. The GSPF identifies that west is a consensus direction, so the robot moves that way. The state estimate improves, but not enough to resolve the choice between north and south; consequently, the system encounters the saddle point. As explained in Section 4.2, an approximate local Hessian is constructed and the presence of positive and negative eigenvalues identifies the stationary point as a saddle. Two approaches to resolve such undesirable stationary points were proposed, and we illustrate both in figure 5.2 and below.

Figure 5.2: The single obstacle example using two different stationary point resolution methods (zoomed to show only the relevant portion of the domain). Top: Improved localization. Bottom: Eigenvector voting. Ten trials of each are shown.

To demonstrate improved localization, we switch to the high precision version of the range finder and perform MCL sensor corrections (and resampling). Figure 5.3 shows how the covariance of the state estimate drops in one of the trials as these improved corrections are incorporated, until finally a consensus direction emerges (when the particles all end up on the north side of the decision boundary) and the robot escapes the saddle.

For resolution by eigenvector, we keep the noisy sensor. When we identify the saddle point, we examine the eigenvectors of the approximate Hessian. The particles vote on the direction of the eigenvector associated with the negative eigenvalue with which they most agree. The majority votes to go north, allowing escape from the saddle.

Figure 5.3: Covariance of the particle cloud and the consensus action direction as a function of time for a single run of the improved localization approach. The covariance of the state distribution decreases as we move west (θ ≈ 3.14 rad) until we get stuck on the saddle point (the grey shaded segment). Covariance further decreases as we take new observations from the improved sensor. We see a sudden change in direction (θ ≈ 1.57 rad) when the covariance drops sufficiently low that the particles no longer surround the saddle.

Whichever resolution procedure is used, once the robot escapes the saddle point GSPF continues easily around the north of the obstacle and then southwest toward the goal. When the particles are in the goal region another stationary point is identified. A new approximate Hessian is constructed, its positive eigenvalues confirm that we have reached the target, and GSPF terminates.

Figure 5.4: Navigating to a goal in the bottom right corner. In all cases, actions are chosen at roughly 0.05 m intervals. Top: Action chosen by steepest descent on the expected state generates chattering. Middle: Action chosen by SGD generates a more randomized, but still jagged path. Bottom: Action chosen by GSPF generates a smoother path.

Table 5.1: Average Angular Difference Between Successive Actions

    Update Rate   Expected State    SGD               GSPF
                  mean (variance)   mean (variance)   mean (variance)
    0.01 m        16.6° (0.05)      16.0° (0.08)      2.37° (0.02)
    0.05 m        16.6° (0.21)      22.6° (0.29)      0.54° (0.00)
    0.1 m         15.2° (0.04)      23.5° (0.79)      0.39° (0.00)
    0.2 m         23.3° (0.15)      29.4° (2.76)      0.01° (0.00)

5.2 Narrow Hallway

We illustrated in figure 1.2 that the basic gradient sampling algorithm can resolve the chattering problem encountered when the value function is (nearly) non-differentiable. In this section we illustrate that the same chattering behaviour can arise when using the default ROS approach of planning based on the expected state of the MCL filter, and that the chattering cannot be resolved simply by randomly sampling states.

The robot starts to the left of a narrow hallway, and must travel down the hallway to its goal on the bottom right.
The cost function is chosen to penalize states which are too close to the walls, and the resulting value function displays a steep-sided valley through the narrow hallway. Figure 5.4 illustrates a single run for each of the approaches. Choosing actions based on the expected state shows the same chattering behaviour as the fixed stepsize approach in figure 1.2, while GSPF shows a relatively smooth path.

We also include a comparison against actions chosen by Stochastic Gradient Descent (SGD) [4]. One might hypothesize that by taking a random sample from the state estimate, we might give all gradients equal contribution and offset opposing gradients. However, this does not introduce consensus among drastically opposing gradients; in fact, it can even exacerbate the problem by sampling from particles very close to the wall. Our results in Table 5.1 corroborate this and emphasize the need for consensus, not just equal contribution.

It is also possible that an improved path might be generated by adjusting the stepsize. To test that hypothesis we ran simulations at a variety of update rates and measured the angle between successive action choices:

    ∑_i | arctan(y(ti)/x(ti)) − arctan(y(ti−1)/x(ti−1)) |,  where p*(t) = [x(t), y(t)]^T.    (5.1)

We refer to this as the Angle Metric. Only the portion of each trajectory in the hallway was used (from x = −4.0 to x = 3.0). Results are shown in Table 5.1. The large heading changes characteristic of chattering persist for the steepest descent and SGD approaches over a wide range of update rates, while GSPF avoids the problem even at fast update rates. We were able to observe chattering by GSPF if we stringently limited the number of particles (around twenty or fewer), but such a small number is not enough to localize reliably anyway.

5.3 Hallway Entrance

One of the convenient advantages of our technique is the ability to halt the robot in the absence of consensus.
This often happens in situations where the robot is at risk of collision if it proceeds without taking uncertainty into account. Take, for instance, the situation illustrated in Figure 1.3. The mean state would command the robot to proceed directly east to enter the hallway. However, many of the particles are not centered on the entrance or are closer to the wall. If the true robot state is more accurately represented by one of these outliers, then the command based on the expected state will take it straight into collision. The robot would benefit from instead pausing and attaining better localization before proceeding.

We reconstructed this scenario in a ROS/Gazebo simulation to measure the potential safety improvements. The robot is initialized just outside the entrance to a narrow hallway, with a state estimate sampled from a 2D Gaussian with σxx = σyy = 0.1 meters. The correct direction for the robot to move is directly east, but some particles would quickly be in collision if they were commanded to move in that direction. From the knowledge available to the robot, no consensus can be reached.

This kind of scenario was touched upon in Section 4.2.3. We are not near a stationary point, and yet we are unable to reach consensus. Therefore GSPF will activate its high-fidelity sensor and attempt relocalization until consensus is reached. We compare this to the alternative approaches using the expected state estimate. Trials for all three methods are depicted in Figure 5.5.

Figure 5.5: Navigating the entrance to a narrow hallway. Compare to figure 5.4. Not depicted is how many particles are in collision; we only show the expected state here. Top: Action chosen by steepest descent on the expected state generates chattering. Middle: Action chosen by SGD generates an even less stable path. Bottom: Action chosen by GSPF generates a smoother path.

Table 5.2: Angular Change and Safety Metrics

    Statistic                  Expected State    SGD               GSPF
                               mean (variance)   mean (variance)   mean (variance)
    Angle Change               22.2° (0.3)       42.7° (3.7)       9.3° (2.1)
    Probability of Collision   15.2% (37.4)      46.6% (530.0)     2.9% (3.8)
    Mean Particle Cost         66.0 (0.4)        67.9 (3.6)        59.7 (1.2)

Table 5.2 shows the comparison between our three different control schemes on both the Angle Metric (Equation 5.1) and a new metric designed to measure safety. The Probability of Collision Metric is given by the maximum, over a single trial, of the fraction of particles in collision at any time:

    max_i (c_i / K),    (5.2)

where c_i is the number of particles in collision at time i. We also report the mean particle cost over the whole trial, which is a proxy for how "close" the particles are to being in collision (a cost of 99 or 100 is in collision; a cost of 0 is ≥ 5.75 meters from an obstacle; and costs 1-98 scale exponentially with proximity to an obstacle). All metrics are averaged over 10 trials. GSPF shows the best performance on all three metrics.

This demonstrates the natural effect that GSPF has on safety without any additional modifications. However, this is not a fair comparison to methods which explicitly consider safety. For example, one way to mitigate the risk of the traditional method would be to simulate the motion of each particle before issuing the command, and perform relocalization if this action would bring any particle into collision. We could then compare this technique to our own relocalization technique to see if they are comparable safety measures.
We leave this as future work in exploring the safety aspects of GSPF.

5.4 Cluttered Scene

Here we examine a slightly more complex scene with several obstacles of different shapes, as illustrated in Figure 5.6. The value function exhibits many saddle points (typically one near each separate obstacle) and ridges that represent decision surfaces.

Trials using action selection based on expected state and GSPF are shown in Figure 5.7. Eigenvector voting procedures were used to resolve all stationary points. We observe several differences in the paths generated by the two algorithms. First, GSPF does not make an immediate decision on which side of the first obstacle to pass, and sometimes eventually chooses to pass to the west. Second, GSPF exhibits a tendency to stay further away from obstacles. In both cases these behaviours arise because GSPF takes into account the position (and hence optimal action) of all particles.

The second behaviour can be considered an appropriate response to a potential collision (if the true state turns out to be the particle which is near the obstacle). The first, however, may generate paths which are worse than those which immediately choose a route around the obstacle, even if the route turns out to be slightly suboptimal for the true state. This behaviour occurs along ridges in the value function, so a potential solution is to approximate the Hessian at every step and examine its eigenvalues.
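One way to build such a Hessian estimate is a least-squares fit of an affine gradient model g(x) ≈ g0 + H(x − x̄) to the gradients sampled at the particle locations. The sketch below is an assumed construction for illustration (the exact procedure may differ from the one in Section 4.2); it recovers the Hessian of a saddle-shaped test function and flags the negative eigenvalue:

```python
import numpy as np

def approx_hessian_2d(particles, gradients):
    """Fit g(x) ~ g0 + H (x - xbar) by least squares, with H symmetric.

    particles: (K, 2) particle positions; gradients: (K, 2) gradients
    sampled at those positions. Returns (g0, H)."""
    xbar = particles.mean(axis=0)
    d = particles - xbar  # centered positions
    # Unknowns: g0x, g0y, Hxx, Hxy, Hyy (symmetry folds Hyx into Hxy).
    rows, rhs = [], []
    for (dx, dy), (gx, gy) in zip(d, gradients):
        rows.append([1, 0, dx, dy, 0]); rhs.append(gx)  # gx = g0x + Hxx dx + Hxy dy
        rows.append([0, 1, 0, dx, dy]); rhs.append(gy)  # gy = g0y + Hxy dx + Hyy dy
    sol, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float),
                              rcond=None)
    g0 = sol[:2]
    H = np.array([[sol[2], sol[3]], [sol[3], sol[4]]])
    return g0, H

def straddling_ridge(H, tol=-1e-3):
    """True if H has a (sufficiently) negative eigenvalue."""
    return np.min(np.linalg.eigvalsh(H)) < tol

# Example: gradients of f(x, y) = x^2 - y^2, whose Hessian is diag(2, -2),
# sampled at five noise-free particle positions.
pts = np.array([[0.1, 0.0], [-0.1, 0.1], [0.2, -0.1], [0.0, 0.2], [-0.2, -0.2]])
grads = np.column_stack([2 * pts[:, 0], -2 * pts[:, 1]])
g0, H = approx_hessian_2d(pts, grads)
print(H)                    # ~ [[2, 0], [0, -2]]
print(straddling_ridge(H))  # True
```

With K particles this costs only a small 2K × 5 least-squares solve per step, so checking the eigenvalues at every iteration is cheap.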
The presence of a (sufficiently) negative eigenvalue indicates that the particle filter is straddling a ridge in the value function (and is therefore likely headed to a saddle point anyway), so we could immediately apply one of the saddle point resolution approaches from Section 4.2 to choose a better direction.

5.5 Other Planners: RRT*

We have so far assumed that actions are chosen based on the (approximated) value function as discussed in Section 3.2; however, there is no reason the gradient sampling approach cannot be combined with other algorithms that generate action choices. Any planner which can quickly evaluate action choices for a set of sample states is suitable, although the stationary point procedures from Section 4.2 construct an approximate Hessian and so assume that actions are consistent with an underlying (but potentially unknown) value function. In this final example we demonstrate the use of GSPF with a multi-query version of RRT*, a convergent optimal planner based on Rapidly Exploring Random Trees [15].

Figure 5.6: The cluttered scenario. Top: Costmap, with costs ranging from low (blue) to high (yellow). Black indicates (inflated) obstacles. Obstacles are inflated by the robot radius (0.2 m) to prevent collision. Costs extend out to 6.0 m from the obstacles. Bottom: The value function for the cluttered scene. The goal location is in the bottom left. Contours of the value function and the resulting gradient vector field are shown.

Figure 5.7: Navigation with value function planner through the cluttered scene. The robot travels from the upper right to the lower left. Obstacles are shown in black. Ten trials are shown in each case. Top: Action chosen based on expected state. Bottom: Action chosen by GSPF.

Figure 5.8: The result of running RRT* on the cluttered scene. Compare to the vector field given by the value function in Figure 5.6.

Figure 5.8 shows the result of generating an RRT* plan starting from the goal state.
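Because the tree is rooted at the goal, every node's parent is the next node on its (approximately) optimal path home, so a multi-query policy reduces to a nearest-neighbor lookup. The sketch below uses tiny hypothetical node/parent arrays as stand-ins for a real RRT* tree:

```python
import numpy as np

class RRTStarPolicy:
    """Multi-query navigation function backed by an RRT* tree grown
    from the goal. Each node stores its parent, i.e. the next node on
    its path toward the goal (node 0)."""

    def __init__(self, nodes, parents, v_max=0.5):
        self.nodes = np.asarray(nodes, dtype=float)    # (N, 2) positions
        self.parents = np.asarray(parents, dtype=int)  # parents[i] -> index
        self.v_max = v_max                             # max robot speed (m/s)

    def action(self, state):
        """Approximately optimal action at `state`: a 2D velocity toward
        the parent of the nearest tree node, scaled to maximum speed."""
        state = np.asarray(state, dtype=float)
        i = np.argmin(np.linalg.norm(self.nodes - state, axis=1))
        d = self.nodes[self.parents[i]] - state
        n = np.linalg.norm(d)
        return np.zeros(2) if n == 0.0 else self.v_max * d / n

# Toy tree: goal at the origin; a node at (1, 0) whose parent is the
# goal; a node at (2, 0) routed through (1, 0).
policy = RRTStarPolicy(nodes=[[0, 0], [1, 0], [2, 0]], parents=[0, 0, 1])
print(policy.action([2.1, 0.0]))  # heads west toward (1, 0) at v_max
```

A brute-force nearest-neighbor search keeps the sketch self-contained; a real implementation would likely use a k-d tree so that per-particle queries stay cheap.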
This tree acts as our navigation function: at any state, it returns the (approximately) optimal action from the RRT node which is the nearest neighbor to that state (in this case a 2D vector toward the goal, scaled to the maximum speed of the robot). This experiment is performed on the same cluttered scenario as the previous section. We start the robot with a slightly different initial state to force it to start on the decision boundary for which way it should pass the first obstacle, as this boundary is slightly different according to RRT*.

Figure 5.9 illustrates the performance of GSPF under this type of navigation function. GSPF behaves very similarly to the way it behaved with the value function. The results are also very similar to typical planning by expected state, but the expected state approach had a hard time satisfying the RRT* termination criterion and hence tended to chatter around the goal location. GSPF's termination criterion naturally takes the distribution of the particles into account and hence did not suffer from this issue.

Figure 5.9: Navigation with RRT* planner through the cluttered scene. The robot travels from the upper right to the lower left. Obstacles are shown in black. Ten trials are shown in each case. Top: Action chosen based on expected state. Bottom: Action chosen by GSPF.

Chapter 6

Conclusion

We observed that the gradient sampling algorithm from [8] can be used to resolve the chattering problem commonly encountered when generating optimal paths from value function approximations. We then proposed the GSPF algorithm, which uses the particles as the gradient sample locations, thereby naturally and efficiently generating consensus directions suitable for all particles or detecting that no such consensus can be reached.
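The consensus step just summarized amounts to finding the minimum-norm element of the convex hull of the gradients sampled at the particles, in the spirit of gradient sampling [8]: its negation is the search direction, and a (near) zero minimum norm signals that no consensus exists. A minimal sketch using a small Frank-Wolfe iteration for this low-dimensional QP (an illustrative solver, not necessarily the one used in the experiments):

```python
import numpy as np

def min_norm_point(G, iters=500):
    """Minimum-norm point of conv(G) via Frank-Wolfe with exact line
    search. G: (K, d) array of sampled gradients."""
    K = G.shape[0]
    lam = np.full(K, 1.0 / K)   # convex weights, start at the centroid
    for _ in range(iters):
        p = lam @ G             # current point in the hull
        j = np.argmin(G @ p)    # vertex minimizing the linearized objective
        d = G[j] - p
        dd = d @ d
        if dd < 1e-16:
            break               # no progress possible
        t = np.clip(-(p @ d) / dd, 0.0, 1.0)  # exact step toward vertex j
        lam = (1 - t) * lam
        lam[j] += t
    return lam @ G

def consensus_direction(G, eps=1e-6):
    """Unit descent direction suitable for all sampled gradients, or
    None when the particles cannot reach consensus."""
    p = min_norm_point(G)
    n = np.linalg.norm(p)
    return None if n < eps else -p / n

# Agreeing gradients yield a consensus direction; opposing gradients
# (e.g. particles straddling a ridge or minimum) yield None.
print(consensus_direction(np.array([[1.0, 0.2], [1.0, -0.2]])))  # points west
print(consensus_direction(np.array([[1.0, 0.0], [-1.0, 0.0]])))  # None
```

With only K particles in 2D the QP is tiny, so even this simple iterative solver is far from the bottleneck of a planning step.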
When no consensus exists we use the eigenvalues of an approximate Hessian to diagnose whether we have arrived at the goal or are stuck on an undesirable stationary point; in the latter case, two approaches were described for finding a descent direction. The scheme was illustrated on three examples in the ROS/Gazebo simulation environment. GSPF was also briefly demonstrated using an RRT* planner instead of a value function approximation. Although not illustrated, it is also straightforward to apply the gradient sampling approach to systems whose state uncertainty is characterized by parametric representations, such as the Kalman filter.

In the future we intend to further explore the use of GSPF with other planners and its extension to anisotropic and non-holonomic dynamics. The use of a faster, multi-query planner will bring us closer to robust operation in bustling hospital settings, and the implementation atop a non-holonomic differential drive system will constitute a crucial step toward our goal of developing a safe, comfortable robotic wheelchair.

Bibliography

[1] A.-A. Agha-Mohammadi, S. Chakravorty, and N. M. Amato. FIRM: Sampling-based feedback motion planning under motion uncertainty and imperfect measurements. The International Journal of Robotics Research, 33(2):268–304, 2014. → pages 11

[2] H. Bai, D. Hsu, and W. S. Lee. Integrated perception and planning in the continuous space: A POMDP approach. The International Journal of Robotics Research, 33(9):1288–1302, 2014. → pages 9, 10, 12

[3] D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. → pages 14

[4] L. Bottou. Online algorithms and stochastic approximations. In D. Saad, editor, Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK, 1998. → pages 35

[5] A. Bry and N. Roy. Rapidly-exploring random belief trees for motion planning under uncertainty. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 723–730. IEEE, 2011. → pages 11, 12

[6] W. Burgard, A. B.
Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun. The interactive museum tour-guide robot. In Proceedings of the 15th AAAI Conference on Artificial Intelligence, pages 11–18, 1998. → pages 2

[7] J. V. Burke, A. S. Lewis, and M. L. Overton. Approximating subdifferentials by random sampling of gradients. Mathematics of Operations Research, 27(3):567–584, 2002. → pages 18

[8] J. V. Burke, A. S. Lewis, and M. L. Overton. A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM Journal on Optimization, 15(3):751–779, 2005. → pages 7, 17, 19, 30, 45

[9] A. Censi, D. Calisi, A. De Luca, and G. Oriolo. A Bayesian framework for optimal motion planning with uncertainty. In Robotics and Automation (ICRA), 2008 IEEE International Conference on, pages 1798–1805. IEEE, 2008. → pages 11

[10] F. H. Clarke. Optimization and Nonsmooth Analysis. SIAM, 1990. → pages 18

[11] E. Cristiani and M. Falcone. Fast semi-Lagrangian schemes for the Eikonal equation and applications. SIAM Journal on Numerical Analysis, 45(5):1979–2011, 2007. → pages 16

[12] N. E. Du Toit and J. W. Burdick. Robot motion planning in dynamic, uncertain environments. IEEE Transactions on Robotics, 28(1):101–115, 2012. → pages 11

[13] S. Ferguson, B. Luders, R. C. Grande, and J. P. How. Real-time predictive modeling and robust avoidance of pedestrians with uncertain, changing intentions. In Algorithmic Foundations of Robotics XI, volume 107, pages 161–177. Springer, 2015. → pages 9

[14] T. Gomi and A. Griffith. Developing intelligent wheelchairs for the handicapped. In Assistive Technology and Artificial Intelligence, pages 150–178. Springer, 1998. → pages 2

[15] S. Karaman and E. Frazzoli. Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research, 30(7):846–894, 2011. → pages 42

[16] R. Kimmel and J. A. Sethian. Optimal algorithm for shape from shading and path planning. Journal of Mathematical Imaging and Vision, 14(3):237–244, 2001.
→ pages 16

[17] N. Koenig and A. Howard. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In Intelligent Robots and Systems (IROS), 2004 IEEE/RSJ International Conference on, volume 3, pages 2149–2154. IEEE, 2004. → pages 30

[18] S. M. LaValle and R. Sharma. On motion planning in changing, partially predictable environments. The International Journal of Robotics Research, 16(6):775–805, 1997. → pages 8

[19] F. Li, C.-W. Shu, Y.-T. Zhang, and H. Zhao. A second order discontinuous Galerkin fast sweeping method for Eikonal equations. Journal of Computational Physics, 227(17):8191–8208, 2008. → pages 16

[20] M. L. Littman, A. R. Cassandra, and L. P. Kaelbling. Learning policies for partially observable environments: Scaling up. In Proceedings of International Conference on Machine Learning, 1995. → pages 12

[21] B. D. Luders and J. P. How. An optimizing sampling-based motion planner with guaranteed robustness to bounded uncertainty. In Proceedings of the American Control Conference (ACC), pages 771–777. IEEE, 2014. → pages 11, 12

[22] B. D. Luders, S. Karaman, and J. P. How. Robust sampling-based motion planning with asymptotic optimality guarantees. In AIAA Guidance, Navigation, and Control Conference (GNC), 2013. → pages 11

[23] I. M. Mitchell. A toolbox of level set methods (version 1.1). Department of Computer Science, University of British Columbia, Vancouver, BC, Canada, Tech. Rep. TR-2007-11, June 2007. → pages 30

[24] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 1999. → pages 27

[25] S. Osher. A level set formulation for the solution of the Dirichlet problem for Hamilton-Jacobi equations. SIAM Journal on Mathematical Analysis, 24(5):1145–1152, 1993. → pages 30

[26] C. H. Papadimitriou and J. N. Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–450, 1987. → pages 10

[27] S. Patil, G. Kahn, M. Laskey, J. Schulman, K. Goldberg, and P.
Abbeel. Scaling up Gaussian belief space planning through covariance-free trajectory optimization and automatic differentiation. In Algorithmic Foundations of Robotics XI, volume 107, pages 515–533. Springer, 2015. → pages 11, 12

[28] J. Pineau, G. Gordon, S. Thrun, et al. Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence, volume 3, pages 1025–1032, 2003. → pages 9, 10, 12

[29] S. Prentice and N. Roy. The belief roadmap: Efficient planning in belief space by factoring the covariance. The International Journal of Robotics Research, 2009. → pages 11

[30] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng. ROS: an open-source robot operating system. In ICRA Workshop on Open Source Software, volume 3, page 5, 2009. → pages 30

[31] S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa. Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32:663–704, 2008. → pages 9, 12

[32] K. M. Seiler, H. Kurniawati, and S. P. Singh. An online and approximate solver for POMDPs with continuous action space. In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages 2290–2297. IEEE, 2015. → pages 11

[33] D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems, pages 2164–2172, 2010. → pages 9, 10

[34] W. Sun, S. Patil, and R. Alterovitz. High-frequency replanning under uncertainty using parallel sampling-based motion planning. IEEE Transactions on Robotics, 31(1):104–116, 2015. → pages 10, 11

[35] W. Sun, J. van den Berg, and R. Alterovitz. Stochastic extended LQR: Optimization-based motion planning under uncertainty. In Algorithmic Foundations of Robotics XI, volume 107, pages 609–626. Springer, 2015. → pages 11

[36] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, 2005. → pages 17

[37] N. Traft and I. M. Mitchell. Improved action and path synthesis using gradient sampling.
In Decision and Control (CDC), 2016 IEEE 55th Conference on, pages 6016–6023. IEEE, 2016. → pages iv

[38] P. Trautman and A. Krause. Unfreezing the robot: Navigation in dense, interacting crowds. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages 797–803. IEEE, 2010. → pages 9

[39] J. N. Tsitsiklis. Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control, 40(9):1528–1538, 1995. → pages 15, 16

[40] J. Van Den Berg, P. Abbeel, and K. Goldberg. LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information. The International Journal of Robotics Research, 30(7):895–913, 2011. → pages 11

[41] J. Van Den Berg, S. Patil, and R. Alterovitz. Motion planning under uncertainty using iterative local optimization in belief space. The International Journal of Robotics Research, 31(11):1263–1278, 2012. → pages 11, 12

[42] M. P. Vitus and C. J. Tomlin. A hybrid method for chance constrained control in uncertain environments. In Decision and Control (CDC), 2012 IEEE 51st Annual Conference on, pages 2177–2182. IEEE, 2012. → pages 8

