UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Autonomous, wireless sensor network-assisted target search and mapping Beyme, Steffen 2014

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2014_november_beyme_steffen.pdf [ 1.08MB ]
JSON: 24-1.0167007.json
JSON-LD: 24-1.0167007-ld.json
RDF/XML (Pretty): 24-1.0167007-rdf.xml
RDF/JSON: 24-1.0167007-rdf.json
Turtle: 24-1.0167007-turtle.txt
N-Triples: 24-1.0167007-rdf-ntriples.txt
Original Record: 24-1.0167007-source.json
Full Text

Full Text

Autonomous, Wireless Sensor Network-AssistedTarget Search and MappingbySteffen BeymeDipl.-Ing. Electrical Engineering, Humboldt-Universität zu Berlin, 1991A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFDoctor of PhilosophyinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Electrical and Computer Engineering)The University Of British Columbia(Vancouver)October 2014c© Steffen Beyme, 2014AbstractThe requirements of wireless sensor networks for localization applications arelargely dictated by the need to estimate node positions and to establish routes todedicated gateways for user communication and control. These requirements addsignificantly to the cost and complexity of such networks.In some applications, such as autonomous exploration or search and rescue,which may benefit greatly from the capabilities of wireless sensor networks, itis necessary to guide an autonomous sensor and actuator platform to a target, forexample to acquire a large data payload from a sensor node, or to retrieve the targetoutright.We consider the scenario of a mobile platform capable of directly interrogatingindividual, nearby sensor nodes. Assuming that a broadcast message originatesfrom a source node and propagates through the network by flooding, we studyapplications of autonomous target search and mapping, using observations of themessage hop count alone. Complex computational and communication tasks areoffloaded from the sensor nodes, leading to significant simplifications of the nodehardware and software.This introduces the need to model the hop count observations made by the mo-bile platform to infer node locations. Using results from first-passage percolationtheory and a maximum entropy argument, we formulate a stochastic jump processwhich approximates the message hop count at distance r from the source. We showthat the marginal distribution of this process has a simple analytic form whose pa-rameters can be learned by maximum likelihood estimation.Target search involving an autonomous mobile platform is modeled as a stochas-tic planning problem, solved approximately through policy rollout. The cost-to-goiiat the rollout horizon is approximated by an open-loop search plan in which pathconstraints and assumptions about future information gains are relaxed. It is shownthat the performance is improved over typical information-driven approaches.Finally, the hop count observation model is applied to an autonomous mappingproblem. The platform is guided under a myopic utility function which quantifiesthe expected information gain of the inferred map. Utility function parameters areadapted heuristically such that map inference improves, without the cost penalty oftrue non-myopic planning.iiiPrefaceChapters 2 to 4 are based on manuscripts that to date have either been published, oraccepted or submitted for publication, in peer-reviewed journals and conferences.All manuscripts were co-authored by the candidate as the first author, with revi-sions and comments by Dr. Cyril Leung. In all these works, the candidate had theprimary responsibility for conducting the research, the design and performance ofsimulations, results analysis and preparation of the manuscripts, under the supervi-sion of Dr. Cyril Leung. The following list summarizes the publications resultingfrom the candidate’s PhD work:• S. Beyme and C. Leung, “Modeling the hop count distribution in wirelesssensor networks”, Proc. of the 26th IEEE Canadian Conference on Electri-cal and Computer Engineering (CCECE), pages 1–6, May 2013.• S. Beyme and C. Leung, “A stochastic process model of the hop count dis-tribution in wireless sensor networks”, Elsevier Ad Hoc Networks, vol. 17,pages 60–70, June 2014.• S. Beyme and C. Leung, “Rollout algorithm for target search in a wirelesssensor network”, Proc. of the IEEE 80th Vehicular Technology Conference,Sept. 2014. Accepted.• S. Beyme and C. Leung, “Wireless sensor network-assisted, autonomousmapping with information-theoretic utility”, 6th IEEE International Sym-posium on Wireless Vehicular Communications, Sept. 2014, Accepted.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . 11.2 Thesis Organization and Contributions . . . . . . . . . . . . . . . 42 Stochastic Process Model of the Hop Count in a WSN . . . . . . . . 62.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.1.1 Motivation and Related Work . . . . . . . . . . . . . . . 72.1.2 Chapter Contribution . . . . . . . . . . . . . . . . . . . . 82.1.3 Chapter Organization . . . . . . . . . . . . . . . . . . . . 92.2 Wireless Sensor Network Model . . . . . . . . . . . . . . . . . . 92.2.1 Stochastic Geometry Background . . . . . . . . . . . . . 92.2.2 First-Passage Percolation . . . . . . . . . . . . . . . . . . 11v2.3 Stochastic Process Model for the Hop Count . . . . . . . . . . . . 142.3.1 Jump-type Lévy Processes . . . . . . . . . . . . . . . . . 142.3.2 Maximum Entropy Model . . . . . . . . . . . . . . . . . 172.3.3 Maximum Likelihood Fit of the Hop Count Process . . . . 192.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . 212.4.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . 212.4.2 Hop Count Distribution . . . . . . . . . . . . . . . . . . . 222.4.3 Localization of Source Node . . . . . . . . . . . . . . . . 232.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263 Rollout Algorithms for WSN-assisted Target Search . . . . . . . . . 333.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333.1.1 Background and Motivation . . . . . . . . . . . . . . . . 343.1.2 Chapter Contribution . . . . . . . . . . . . . . . . . . . . 363.1.3 Chapter Organization . . . . . . . . . . . . . . . . . . . . 373.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373.2.1 Wireless Sensor Network Model . . . . . . . . . . . . . . 373.2.2 Autonomous Mobile Searcher . . . . . . . . . . . . . . . 393.2.3 Formulation of Target Search Problem . . . . . . . . . . . 403.3 Approximate Online Solution of POMDP by Rollout . . . . . . . 443.3.1 Rollout Algorithm . . . . . . . . . . . . . . . . . . . . . 443.3.2 Parallel Rollout . . . . . . . . . . . . . . . . . . . . . . . 443.4 Heuristics for the Expected Search Time . . . . . . . . . . . . . . 453.4.1 Constrained Search Path . . . . . . . . . . . . . . . . . . 463.4.2 Relaxation of Search Path Constraint . . . . . . . . . . . 463.5 Information-Driven Target Search . . . . . . . . . . . . . . . . . 503.5.1 Mutual Information Utility . . . . . . . . . . . . . . . . . 503.5.2 Infotaxis and Mutual Information . . . . . . . . . . . . . 513.6 A Lower Bound on Search Time for Multiple Searchers . . . . . . 523.7 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . 533.7.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . 533.7.2 Idealized Observations . . . . . . . . . . . . . . . . . . . 553.7.3 Empirical Observations . . . . . . . . . . . . . . . . . . . 56vi3.8 Statistical Dependence of Observations . . . . . . . . . . . . . . 573.8.1 Mitigation of Observation Dependence . . . . . . . . . . 593.8.2 Explicit Model of Observation Dependence . . . . . . . . 613.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644 WSN-assisted Autonomous Mapping . . . . . . . . . . . . . . . . . . 674.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674.1.1 Background and Motivation . . . . . . . . . . . . . . . . 684.1.2 Chapter Contribution . . . . . . . . . . . . . . . . . . . . 704.1.3 Chapter Organization . . . . . . . . . . . . . . . . . . . . 704.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704.2.1 Wireless Sensor Network Model . . . . . . . . . . . . . . 704.2.2 Autonomous Mapper . . . . . . . . . . . . . . . . . . . . 724.3 Mapping Path Planning . . . . . . . . . . . . . . . . . . . . . . . 744.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . 764.4.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . 764.4.2 Simulation of Map Inference . . . . . . . . . . . . . . . . 774.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . 815.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 815.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 845.2.1 Parametric Models of the Hop Count Distribution . . . . . 845.2.2 Statistical Dependence of Hop Count Observations . . . . 855.2.3 Simulation-based Observation Models . . . . . . . . . . . 855.2.4 Multi-Modal Observations . . . . . . . . . . . . . . . . . 85Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86A Necessary Condition for the Hop Count Process . . . . . . . . . . . 97B Proof of Strong Mixing Property . . . . . . . . . . . . . . . . . . . . 99viiC Constrained-Path Search as Integer Program . . . . . . . . . . . . . 102C.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102C.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104C.2.1 Minimizing the Expected Search Time . . . . . . . . . . . 104C.2.2 Maximizing the Detection Probability . . . . . . . . . . . 106C.3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . 107C.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108D Infotaxis and Mutual Information . . . . . . . . . . . . . . . . . . . 114D.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114D.2 Proof of Equivalence . . . . . . . . . . . . . . . . . . . . . . . . 114D.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116E Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117viiiList of TablesTable 4.1 Map average entropy and MSE, wi adapted according to (4.16) 78Table 4.2 Map average entropy and MSE, wi adapted according to (4.17) 78ixList of FiguresFigure 2.1 Realization of a random geometric graph . . . . . . . . . . . 10Figure 2.2 CDFs of the translated Poisson model and the empirical hopcount in a 2D network, for mean node degree 8 . . . . . . . . 25Figure 2.3 CDFs of the translated Poisson model and the empirical hopcount in a 2D network, for mean node degree 16 . . . . . . . 26Figure 2.4 CDFs of the translated Poisson model and the empirical hopcount in a 2D network, for mean node degree 40 . . . . . . . 27Figure 2.5 Kullback-Leibler divergence (KLD) between empirical distri-bution and translated Poisson distribution . . . . . . . . . . . 28Figure 2.6 CDFs of the translated Poisson model and the empirical hopcount in a 1D network, for mean node degree 8 . . . . . . . . 29Figure 2.7 CDFs of the translated Poisson model and the empirical hopcount in a 1D network, for mean node degree 16 . . . . . . . 30Figure 2.8 CDFs of the translated Poisson model and the empirical hopcount in a 1D network, for mean node degree 40 . . . . . . . 31Figure 2.9 CDFs of the normalized localization error, given 8 hop countobservations, for mean node degrees 8 and 40. . . . . . . . . . 32Figure 3.1 Dynamic Bayes Network representing a POMDP . . . . . . . 42Figure 3.2 CDF of the search time for rollout under random action selec-tion, compared to the search time for myopic mutual informa-tion utility. Rollout horizon H = 4, hop counts generated bymodel M3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57xFigure 3.3 CDF of the search time for rollout under random action se-lection, compared to the search time for non-myopic mutualinformation utility. Rollout horizon H = 4, hop counts gener-ated by model M3 . . . . . . . . . . . . . . . . . . . . . . . 58Figure 3.4 CDFs of the search time for rollout under 3 different base poli-cies: random, constant and greedy action selection. Rollouthorizon H = 4, hop counts generated by model M3 . . . . . . 59Figure 3.5 CDF of the search time for rollout under random action, com-pared to 2 parallel rollout approaches: random and constantaction selection, random and greedy action selection. Rollouthorizon H = 4, hop counts generated by model M3 . . . . . . 60Figure 3.6 CDF of the search time for rollout under random action, com-pared to 2 parallel rollout approaches: random and constantaction selection, random and greedy action selection. Rolloushorizon H = 2, hop counts generated by model M3 . . . . . . 61Figure 3.7 CDFs of the search time for rollout under random action, withhorizon H = 4, for the 3 hop count generation models M1, M2and M3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62Figure 3.8 CDFs of the search time for rollout under random action, withhorizon H = 4 and for the 3 hop count generation models M1,M2 and M3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 63Figure 3.9 Deviation of local average hop count from the model mean . . 65Figure 3.10 Correlation between hop count observations, as a function ofthe distance from the source node and the inter-observationdistance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66Figure 4.1 True map, inferred map and autonomous mapper path . . . . . 79Figure 4.2 Average entropy per map element . . . . . . . . . . . . . . . 80Figure 4.3 Mean square error between true and inferred map . . . . . . . 80Figure C.1 Search path of minimum expected search time . . . . . . . . . 109Figure C.2 Search path of maximum probability of detection . . . . . . . 110xiFigure C.3 Cumulative probability of detection for the two search policiesof minimum expected search time and maximum probabilityof detection, and for the optimal search without path constraint 111Figure C.4 Search path for unpenalized objective function . . . . . . . . 112Figure C.5 Cumulative probability of detection for unpenalized objectivefunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113xiiList of AcronymsThe following acronyms are frequently used in this thesis:KLD Kullback-Leibler divergenceMLE Maximum likelihood estimationMSE Mean square errorPOMDP Partially observable Markov decision processWSN Wireless sensor networkxiiiAcknowledgmentsThe experience of graduate school at UBC has been rewarding and fulfilling, and Iowe a debt of gratitude to many people here, whose support made this possible.I reserve special thanks for my thesis supervisor and mentor, Dr. Cyril Leung,whom I have had the privilege to work with. His invaluable insight and continuedencouragement have been an inspiration throughout this journey of thesis research.He let me explore with great freedom and offered the guidance and the supportwithout which this work would not have progressed to this point.I would like to thank my supervisory committee members, Dr. Vikram Krish-namurthy and Dr. Z. Jane Wang, for their advice and much appreciated feedback.I would also like to thank the faculty, staff and fellow students in the Depart-ment of Electrical and Computer Engineering, who all contributed to create a stim-ulating research environment.This work was supported in part by the Natural Sciences and Engineering Re-search Council (NSERC) of Canada under Grants OGP0001731 and 1731-2013and by the UBC PMC-Sierra Professorship in Networking and Communications.Finally, I owe enduring gratitude to my parents, for the love and encouragementthey have provided. My greatest thanks go to my wife, Beatriz, and to our children,Carl and Alex, whose love and patience have let me see this thesis through.xivTo my FamilyxvChapter 1Introduction1.1 Background and MotivationBoth target localization and mapping are important applications of wireless sensornetworks (WSNs) [80] as well as autonomous robotics [110]. Much research con-tinues to be devoted to advance the state-of-the-art in these fields and to expand andenhance the operational capabilities of these complementary technologies. Thisthesis considers the joint application of wireless sensor networks and autonomousrobotics.Some applications, such as autonomous exploration or search and rescue, wouldbenefit from both the pervasive sensor coverage which only WSNs can provide, andthe mobility of a single or multiple autonomous sensor and actuator platforms. Ina joint application, a typical mission would involve the ad hoc deployment of aWSN (due to the circumstances of the mission, often in a random manner) and anautonomous platform able to locate one or several “targets”, or inferring a map,by interacting with the WSN. The objective can range from collecting a large datapayload from a sensor node that has reported an event of interest (but energy con-straints prevent the actual, recorded observation data from being forwarded throughthe wireless sensor network), moving more sophisticated sensing or actuating ca-pabilities closer to the site of interest (as in planetary exploration), to retrieval ofthe target outright. A survey of related applications at the intersection of WSNsand autonomous robotics can be found in [91]. A method for localization and1autonomous navigation using WSNs was recently described in [22].Many WSNs used for localization are organized around the concept of location-aware sensor nodes which forward individual target location estimates to one orseveral sink nodes, using multi-hop communication. There can be varying degreesof local cooperation between nodes to enhance the sensing and localization perfor-mance [80]. The purpose of the sinks is to aggregate, or “fuse”, information frommultiple sensor nodes to improve the location estimates and ultimately, to serveas gateways through which a user communicates with the WSN. The sinks canbe dedicated nodes, or may be dynamically selected from the population of sen-sor nodes (for example based on the available amount of energy) to act as fusioncenters. Node location awareness can be addressed by the use of special-purposelocalization hardware (e.g. GPS or time/angle of arrival) and associated protocols.The need to establish node positions and to maintain multi-hop routes from thesensor nodes to the statically or dynamically assigned fusion centers thus drivesmany of the hardware and communication requirements of WSNs and contributessignificantly to their cost and complexity.As an approach to reduce the cost and complexity of WSNs used by the ap-plications considered in this thesis, we assume that the sensor nodes are location-agnostic and that an autonomous platform, assumed to be location-aware, acts as amobile sink by directly interrogating nearby sensor nodes. Instead of discoveringand maintaining routes to the sink, a simple message broadcast protocol is respon-sible for information dissemination in the WSN. As a consequence, the need forspecial-purpose hardware to support node localization, as well as complex routingstrategies and dedicated sink nodes for performing in-situ sensor fusion, is elimi-nated. By offloading expensive computational and communication tasks from thesensor nodes to the autonomous platform, a significant simplification of the sen-sor node hardware and software requirements can be achieved. In this thesis, onlyobservations of the hop count of a message originating from a source node areassumed to be available to the autonomous platform, to infer the location of thesource node.Although we do not pursue the concept further in this thesis, an autonomoussensor platform allows for the seamless integration of WSN hop count observationswith the platform’s on-board sensing capabilities, which are often sophisticated and2complementary to the WSN and may include laser or ultrasound ranging, imagingetc. Similarly, it is conceivable to use hop count based localization in combinationwith other low-cost methods of WSN node localization, to obtain overall improvednode location estimates. This includes for example methods based on receivedsignal strength (RSS), which add little or no extra cost to the sensor nodes [12].The joint application of WSNs and autonomous platforms for target search andmapping raises the need for an observation model which relates the statistics of thehop count of a broadcast message to the distance from the source node in an appro-priately defined WSN. This problem is central to localization methods referred to as“range-free” [102]. However, for many reasonable models of WSNs, the character-ization of the hop count statistics, given the source-to-sink distance, remains a chal-lenging, open problem for which only approximations are known, which in manycases can only be evaluated at significant computational cost [24, 72, 82, 107].Under certain assumptions, which include linear observations and a Gaussian errormodel, the Kalman filter is optimal for localization (these assumptions are some-what relaxed in the extended Kalman filter) [57]. However, the hop count obser-vations in a WSN are not well characterized by this model, generally requiringnonparametric Bayesian methods to compute the a posteriori probability densityfunction of the target. Typically performed by grid or particle filters, these methodsrequire a large number of numerical evaluations of the observation model [110]. Itis therefore important, to develop models that have low computational complexityand characterize the hop count reasonably well. This is the subject of Chapter 2 ofthis thesis.The search for a (generally moving) target by an autonomous platform basedon observations of the hop count of a broadcast message can be described as astochastic planning problem. The general framework for this type of problem isthe partially observable Markov decision process (POMDP) [69]. Unfortunately,for most problems of practical relevance, solving the POMDP exactly is compu-tationally intractable. This has given rise to the need for approximate, suboptimalsolutions which can achieve acceptable performance, many of which are based ononline Monte Carlo simulation [85, 106]. However, even suboptimal planning al-gorithms can still present formidable computational challenges. In this thesis, wepropose the use of an efficient heuristic to limit the computational cost of a policy3rollout algorithm [8, 17], which is the subject of Chapter 3. The computationalrequirements of Monte Carlo methods such as policy rollout further magnify theneed for observation models of low complexity.Closely related to localization is autonomous mapping [25, 108], which weconsider in Chapter 4. Autonomous mapping platforms are typically equipped withsensors such as range and direction finders or cameras, and are therefore gearedtowards the mapping of physical objects (or obstacles) in the environment. In con-trast, the pervasive sensor coverage of WSNs enables the dense mapping of quan-tities such as the concentration of chemicals, vibrations and many others, whosemeasurement requires close physical contact with the sensor. With WSN-assistedmapping, the data association problems [110] inherent in many mapping applica-tions can be sidestepped quite easily. Another difficult problem in autonomousmapping is the planning of an optimal path along which observations are made,such that the mean error between the true and the inferred map is minimized, usu-ally over a finite time horizon. Due to the curse of dimensionality, this problem isgenerally intractable and good approximate techniques to find a near-optimal pathare required for any practical application.1.2 Thesis Organization and ContributionsThesis OrganizationThe subject of Chapter 2 is the derivation of an observation model, which relatesthe hop count distribution of a broadcast message in a suitably defined WSN to thesource-to-sink distance. We evaluate the model by comparison with the empiricalhop count in a simulated WSN. This model is the basis for Chapter 3, in which westudy target search by an autonomous platform as a stochastic planning problem,and use Monte Carlo techniques to solve it approximately. In Chapter 4, we studythe problem of path planning for an autonomous platform which relies on hop countobservations from the WSN to infer the map of sensor measurements. As in thepreceding chapter, approximate solution methods are developed as the path plan-ning problem is generally intractable. Finally, Chapter 5 summarizes conclusionsfrom this work and provides a few suggestions for future research.4ContributionsThe thesis makes the following contributions:• In Chapter 2, we formulate a stochastic jump process whose marginal dis-tribution has a simple analytical form and models the hop count of the first-passage path from a source to a sink node.• In Chapter 3, we model the target search as an infinite-horizon, undiscountedcost, online POMDP [69] and solve it approximately through policy rollout[8]. The terminal cost at the rollout horizon is described by a heuristic basedon a relaxed, optimal search problem.• We show that a target search problem described in terms of an explicit trade-off between exploitation and exploration (referred to as “infotaxis” [112]), ismathematically equivalent to a target search with a myopic mutual informa-tion utility.• A lower bound on the expected search time for multiple uniformly dis-tributed searchers is given in terms of the searcher density, based on thecontact distance [4] in Poisson point processes.• We propose an integer autoregressive INAR(1) process [1] for translatedPoisson innovations, as a model for the statistical dependence of hop countobservations in a WSN.• In Chapter 4, we propose a myopic, information-theoretic utility functionfor path planning in an autonomous mapping application. Utility functionparameters are heuristically adapted to offset the myopic nature of the utilityand achieve improved performance of the map inference.5Chapter 2Stochastic Process Model of theHop Count in a WSN2.1 IntroductionIn this chapter, we consider target localization in randomly deployed multi-hopwireless sensor networks, where messages originating from a sensor node arebroadcast by flooding and the node-to-node message delays are characterized byindependent, exponential random variables. Using asymptotic results from first-passage percolation theory and a maximum entropy argument, we formulate astochastic jump process to approximate the hop count of a message at distancer from the source node. The resulting marginal distribution of the process has theform of a translated Poisson distribution which characterizes observations reason-ably well and whose parameters can be learned, for example by maximum like-lihood estimation. This result is important in Bayesian target localization, wheremobile or stationary sinks of known position may use hop count observations, con-ditioned on the Euclidean distance, to estimate the position of a sensor node oran event of interest within the network. For a target localization problem with afixed number of hop count observations, taken at randomly selected sensor nodes,simulation results show that the proposed model provides reasonably good locationerror performance, especially for densely connected networks.62.1.1 Motivation and Related WorkTarget localization in wireless sensor networks (WSNs) is an active area of re-search with wide applicability. Due to power and interference constraints, the vastmajority of WSNs convey messages via multiple hops from a source to one orseveral sinks, mobile or stationary. Localization techniques which exploit the in-formation about the Euclidean distance from a sensor node, contained in the hopcount of a message originating from that node, are referred to as range-free [102].Range-free localization is applicable to networks of typically low-cost, low-powerwireless sensor nodes without the hardware resources needed to accurately mea-sure node positions, neighbor distances or angles (for example using GPS, time orangle of arrival). It is therefore an attractive approach in situations where a com-promise is sought between localization accuracy on one hand, and cost, size andpower efficiency on the other.Various hop-count based localization techniques for WSNs have been pro-posed; for a survey see e.g. [102]. Relating hop count information to the Eu-clidean distance between sensor nodes, exemplified by the probability distributionof the hop count conditioned on distance, remains a challenging problem. Exceptin special cases, such as one-dimensional networks [2], only approximations canbe obtained; such approximations are often in the form of recursions, which tendto be difficult to evaluate [24, 72, 107].Moreover, the hop count depends on the chosen path from the source to the sinkand is therefore a function of the routing method employed by the network. For ex-ample, an approximate closed-form hop count distribution is proposed in [82] andevaluated for nearest, furthest and random neighbor routing, in which a forwardingnode selects the next node from a semi-circular neighborhood oriented towards thesink, under the assumption that this neighborhood is not empty. Some of the ex-isting localization algorithms, such as the DV-hop algorithm [77] and its variants,define the hop count between nodes as that of the shortest path [24, 72, 107]. Otherlocalization algorithms use the hop count of a path established through greedy for-ward routing [20, 59, 76, 113, 117], that is, a path which makes maximum progresstoward the sink with every hop. In most cases, the overhead incurred by establish-ing and maintaining routes is not negligible. Simpler alternatives may be needed7when sensor nodes impose more severe complexity constraints.This chapter is motivated by the range-free target localization problem in net-works of position-agnostic wireless sensor nodes, which broadcast messages usingflooding under the assumption that node-to-node message delays can be character-ized by independent, exponential random variables. This is a reasonable assump-tion in situations where sensor nodes enter a dormant state while harvesting energyfrom the environment and wake up at random times, or when the communicationchannel is unreliable and retransmissions are required. Under these conditions, afirst-passage path emerges as the path of minimum passage time from a source toa sink. Networks of this type can be described in terms of first-passage percolation[23, 33].Localization of the source node may be performed by mobile or stationarysinks able to fuse hop count observations Z ∈ N to infer the location X ∈ R2 ofthe source node, where pX(x) denotes the a priori pdf of X . By Bayes’ rule, thea posteriori pdf of the source location is pX(x|z)∼ pZ(z|x) pX(x), conditioned onobserving the hop count z at the sink position. Knowledge of the observation modelpZ(z|x), that is, the conditional pdf of the hop count, given the source location hy-pothesis x, is essential for the success of this approach, which may be complicatedfurther by the presence of model parameters whose values are not known a prioriand must be learned on- or off-line. Bayesian localization involves a large numberof numerical evaluations of the observation model, due to the typically large spaceof location hypotheses. This creates a need for observation models with low com-putational complexity, which may outweigh the need for high accuracy in someapplications.2.1.2 Chapter ContributionThe main contribution of this chapter is the formulation of a stochastic jump pro-cess whose marginal distribution has a simple analytical form, to model the hopcount of the first-passage path from a source to a sink, which is at distance r. Incontrast to earlier works [20, 24, 72, 76, 82, 107, 113, 117], which generally usegeometric arguments to derive expressions for the hop count distribution, our ap-proach utilizes the abstract model of a jump process and describes the hop count in8terms of the marginal distribution of this process. Starting with a stochastic processof stationary increments satisfying a strong mixing condition, we make a simplify-ing independence assumption which allows the hop count to be modeled as a jumpLévy process with drift [6, 29]. We show that, consistent with our assumptionsabout the hop count, the maximum entropy principle [52] leads to the selectionof a translated Poisson distribution as the marginal distribution of the hop countmodel process.2.1.3 Chapter OrganizationThis chapter is structured as follows: in Section 2.2, we review relevant conceptsfrom stochastic geometry and first-passage percolation and introduce our wirelesssensor network model. Our main result, the stochastic process {Zr} which modelsthe observed hop count distribution at distance r from a source node, is derived inSection 2.3. We describe how the parameters of this process can be learned usingmaximum likelihood estimation. In Section 2.4, simulation results are presentedwhich show that in sensor networks of the type considered in this chapter, themarginal distribution of the model process approximates the empirical hop countreasonably well. Furthermore, we study the localization error due to the approx-imation by comparison with a fictitious, idealized network in which observationsare generated as independent draws from our model. Proofs of propositions usedto derive our model are given in Appendix A and B. This chapter is based onmanuscripts published in [9, 10].2.2 Wireless Sensor Network Model2.2.1 Stochastic Geometry BackgroundThe geometry of randomly deployed WSNs is commonly described by Gilbert’sdisk model [35], a special case of the more general Boolean model [104]. Givena spatial Poisson point process Pλ = {Xi : i ∈ N} of density λ on R2, two sensornodes are said to be linked if they are within communication range R of each other.Gilbert’s model induces a random geometric graph Gλ ,R = {Pλ ,ER} with nodeset Pλ and edge set ER (Figure 2.1). The node density λ and the communication9range R are related through the mean node degree δ = piλR2, so that the graphcan be defined equivalently in terms of a single parameter as Gδ . Without loss ofgenerality, it is convenient to condition the point process on there being a node atthe origin. By Slivnyak’s Theorem [4], if we remove the point at the origin fromconsideration, we have a Poisson point process.llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllFigure 2.1: Realization of a random geometric graph Gδ restricted to [0,1]2Of key interest in the study of Gδ is continuum percolation [13, 100], that is,the conditions under which a cluster of infinitely many connected nodes emergesin such a graph, and the probability that the node under consideration (without lossof generality, the origin) belongs to this cluster. Percolation is widely known toexhibit a phase transition from a subcritical regime characterized by the existenceof only clusters with a finite number of nodes, to the formation of a single infinitecluster with probability 1, as δ is increased above a critical threshold δc. Thepercolation probability is defined as the probability that the origin belongs to theinfinite cluster. It is zero for values of δ below the threshold, and an increasingfunction of δ above. The phase transition is interesting in its own right and much10research is devoted to the analysis of critical phenomena [100]. Just above thecritical threshold, the incipient infinite connected cluster has characteristics whichare generally undesirable for a communication network, such as large minimumpath lengths, which diverge at the critical threshold. A universal property of allsystems exhibiting a phase transition is that characteristic quantities display power-law divergence near the threshold [100]. Therefore, the mean node degree (orequivalently, the node density or the transmission range) is generally chosen by thedesigner so that the network operates sufficiently deep in the supercritical regime,i.e. it is strongly supercritical. As the simulations will show, the model hop countdistribution is a good approximation of the empirical hop count for large values ofthe mean node degree, but deviates more noticeably when the mean node degreeapproaches the critical threshold δc.2.2.2 First-Passage PercolationThe dissemination of messages in some broadcast WSNs has been described interms of first-passage percolation [23, 33, 34, 64], where every node forwards thebroadcast message to all of its neighbors, so that over time, a message cluster forms(illustrating the relationship between first-passage percolation and random growthmodels [30, 84]). This type of information dissemination is commonly referred toas flooding. We assume that the node-to-node message delays can be characterizedby independent, exponentially distributed random variables with a common mean.This is a reasonable assumption for WSNs in which the nodes enter a dormant state[23, 39], while replenishing their energy storage by harvesting from unpredictableenvironmental sources, and thus transmit at random times. The independence ofthe node-to-node delays in the standard first-passage percolation model impliesthat transmissions do not interfere with each other. Various approaches to mitigateinterference exist: for simplicity, we postulate the allocation of orthogonal trans-mission resources with a sufficiently large reuse factor [75], so that interferencecan be neglected. In addition, we assume that multiple simultaneous broadcastsinitiated by the same or by different source nodes do not interact, i.e. hop count ob-servations associated with different broadcasts are independent. Under real-worldconditions, this would not be very efficient and one might consider e.g. schemes11of message aggregation. Finally, unless some form of authentication is introduced,it is conceivable to disrupt such a system intentionally. Messages convey the cu-mulative number of hops, until observed by a sink. The initial message is taggedwith a unique ID by the source node, so that duplicates can be recognized eas-ily and discarded by the receiving nodes to prevent unnecessary (and undesirable)retransmissions.First-passage percolation was introduced in [41] as a model of e.g. fluid trans-port in random media and has been extensively studied since [47]. In first-passagepercolation, the edges E of a graph G are assigned independent, identically dis-tributed non-negative random passage times {Θ(e) : e ∈ E } with the common lawFΘ. The passage time for a path γ in G is then defined as the sum of the edgepassage timesT (γ) =∑e∈γΘ(e). (2.1)A path γ(x,y) from node x to y, defined by the ordered set of edges {(x,v1),(v1,v2), . . . ,(vn−1,y)} has a hop count of |γ | = n. Let Γ(x,y) be the set of allpaths connecting x and y. Then, the first-passage time from x to y isT (x,y) = infγ∈Γ(x,y){T (γ)} . (2.2)The first-passage time so defined induces a metric on the graph, with the minimiz-ing paths referred to as first-passage paths, or geodesics. Because the edge passagetimes are non-negative, geodesics cannot contain loops, that is, they are necessar-ily self-avoiding paths connecting the end points. Rather than studying the randomgeometric graph Gδ , it is sometimes more convenient to consider the unit distancegraph on the square lattice Z2, for example when appealing to translational invari-ance. In fact, most classical results in first-passage percolation were establishedin the latter setting [41, 95, 96, 115]. On the square lattice, first-passage time isoften studied between points mex and nex on the x-axis with m < n, where ex isthe unit vector in the direction of the x-axis, and denoted Tm,n = T (mex,nex). The12first-passage time is a subadditive stochastic process, that isT0,n ≤ T0,m +Tm,n whenever 0 < m < n. (2.3)Under suitable conditions, the first-passage time is characterized by the asymptotictime constant, µ , defined throughlimn→∞T0,nn= µ(FΘ) = µ. (2.4)This limit has been shown to exist almost surely (Kingman’s subadditive ergodictheorem [61], and version with weaker conditions in [65]). Despite much effort,many open problems remain in first-passage percolation; in particular, it is notknown if the first-passage time T0,n has a limiting distribution [47].Compared to the rather extensive study of the first-passage time constant µ andalthough closely related, the path length, or hop count, has attracted less attentionin the literature on first-passage percolation. Let N0,n denote the hop count of thefirst-passage path between o and nex with first-passage time T0,n (if this path is notunique, select the one with the smallest hop count to break ties). Suppose now thatall edge passage times Θ(e) are subject to a small perturbation s. Intuitively, thehop count N0,n is found to be the derivative, if one exists, of the first-passage timewith respect to s [96]N0,n =ddsT0,n∣∣∣s=0(2.5)Let (FΘ⊕ s)(t) = FΘ(t− s) denote the law of the perturbed edge passage times.Under the assumption that Θ is bounded below away from zero, i.e. FΘ(a) = 0 forsome a > 0, it is shown in [41] that the time constant µ(FΘ⊕ s) has a derivativewith respect to s for almost all s, denoted by µ ′(s). Furthermore, if µ ′(0) exists, itis established thatlimn→∞N0,nn= µ ′(0) in probability. (2.6)This result implies that number of hops per unit distance is asymptotically con-stant. The technical assumption that FΘ(t) is bounded away from zero is naturally13satisfied in the sensor network setting, because message forwarding by any realsensor node incurs a minimum, non-zero latency.It is important to point out that the first-passage path, and therefore the hopcount, is invariant to the common mean of the edge passage times. This can beproven by a simple change of the unit of time. In turn, the mean edge passagetime may be sensitive to the operating conditions of the WSN, such as the averageenergy available from environmental sources. A very desirable consequence is,that the statistics of the hop count are completely determined by design parametersand do not depend on operating conditions.The asymptotically linear relationship between the first-passage time and theEuclidean distance from the source node has been demonstrated in [23, 33, 64],essentially validating (2.4) for the types of WSNs studied by the authors. To thebest of our knowledge, the model presented in this chapter is the first to expressthe hop count of first-passage paths at distance r from the source as the marginaldistribution of a suitable stochastic jump process.2.3 Stochastic Process Model for the Hop Count2.3.1 Jump-type Lévy ProcessesDespite much effort, general results for the limiting distribution of the first-passagetime and by extension, the hop count, remain elusive, among other unsolved prob-lems in the theory of first-passage percolation [47]. Moreover, if the objective is todescribe the hop count distribution in WSNs, where the typical distances from thesource can not be characterized as “asymptotic”, knowledge of limiting distribu-tions is of little value. Therefore, an attempt is made here to model the hop countby a stochastic process, motivated by several simplifying assumptions. The result-ing process is characterized by a simple parametric marginal distribution pZr(z) forthe hop count Z at distance r. Simulation results are presented in Section 2.4.2 tovalidate the model. We also evaluate its performance in a sensor node localizationproblem in Section 2.4.3.Initially, we consider a WSN on the square lattice and specifically, the sequenceof sensor nodes located at {knex : k ≥ 0}, where n ∈ N defines their spacing. Due14to the subadditivity property of the first-passage time (2.3), we haveT0,νn ≤ν−1∑k=0Tkn,(k+1)n, ν ∈ N. (2.7)Suppose now that we approximate the hop count N0,νn of the first-passage path bya stochastic process of the formNˆ0,νn =ν−1∑k=0Nkn,(k+1)n, ν ∈ N. (2.8)A necessary, although not sufficient condition for such a process to approximatethe true hop count without systematic bias is therefore, that N0,νn is neither sub-nor superadditive. In Appendix A, we give a proof that the hop count of the first-passage path is not subadditive. The proof that the hop count is not superadditiveuses an analogous argument and is therefore is omitted. The process (2.8) has thefollowing property, a proof of which is given in Appendix B:Proposition 1. The hop count increments {N0,n,Nn,2n, . . .} are strongly mixing.That is to say, the hop count variables N0,n and Nkn,(k+1)n may be consideredindependent for k→ ∞, or asymptotically independent. We use the mixing propertyto motivate a simplifying independence assumption for the increments of our hopcount model process. The assumption of i.i.d. edge passage times implies thatthe hop count increments are also stationary. Stochastic processes with stationaryand independent increments form the class of processes known as Lévy processes[6, 50].Definition 2.1. A stochastic process Xr is said to be a Lévy process, if it satisfiesthe following conditions:1. stationary increments: the law of Xr−Xs is the same as the law of Xr−s−X0for 0≤ s < r,2. independent increments: Xr−Xs is independent of Xt , for 0≤ t ≤ s < r,3. X0 = 0,4. sample paths are right continuous with left limits.15Among these, jump-type Lévy processes [29] restricted to integer jump sizescan be considered prototypes for the hop count process. Lévy processes may fur-thermore include a deterministic component, which in our model represents theminimum hop count required to reach a sink at distance r > 0 due to the finite,one-hop transmission range R. The jump component of a Lévy process with inte-ger jumps is characterized by a Lévy measure [6] of the formm(x) =∑jλ jδ (x−n j), n j ∈ Z\{0}. (2.9)The Lévy measure describes the distribution of the jumps of the stochastic process,in terms of the jump sizes n j and the associated intensities λ j. The measure isbounded, that is∑jλ j < ∞. (2.10)Later it will become necessary to specify a Lévy measure for our model. Fornow, we are interested in the general form of the marginal distribution of the Lévyprocess, which can be determined from its characteristic function, given by theLévy-Khintchine formula [6]. For a measure of the form (2.9), the characteristicfunction isϕXr(u) = exp(r[iub+∑jλ j(eiun j −1)]), (2.11)which describes a compound Poisson process [50]Xr = br+Nr∑i=0Yi, (2.12)where b ∈ R denotes the rate of the deterministic, linear drift of the process and Nris a Poisson variable with expectation λ r where λ =∑λ j. The random variables Yiare independent and identically distributed with fY (·) = λ−1m(x), also known asthe compounding distribution.16The expectation of the jump Lévy process is linear in the distance r, asEXr =−iϕ ′Xr(0) = r(b+∑jλ jn j), (2.13)which implies, that the average number of hops per unit distance is a constant,EXrr= b+∑jλ jn j. (2.14)This property of the model process is consistent with the asymptotic behavior ofthe hop count in first-passage percolation (2.6), i.e. for sufficiently large distances,we expect the average hop count to increase linearly in r.Equation (2.13) imposes constraints on the support and the mean of the distri-bution of the jump component of the process Xr, respectively. As the deterministiccomponent br in (2.13) represents the minimum hop count to reach a node at dis-tance r, the support of the jump component of the process must be restricted to thenon-negative integers. The mean of the jump component of Xr is given by (2.13)as λr = r∑λ jn j.2.3.2 Maximum Entropy ModelWe now turn to the selection of a specific Lévy measure, whose general structureis given by (2.9), for the jump component of the process Xr. This can be viewedas a model identification problem. Appealing to the law of parsimony (Occam’srazor, [42, 70]), this heuristic can be used to select among all possible modelsone that is consistent with our knowledge and incorporates the least number of apriori assumptions or parameters. This criterion suggests the selection of the Lévymeasure m(x) = λδ (x−1), giving rise to the simple Poisson process.A more powerful argument can be based on the maximum entropy principle,due to Jaynes [51, 52], which states that among all distributions which satisfy ana priori given set of constraints representing our knowledge, we should select theone which is least informative, or more formally, has the maximum entropy subjectto the constraints. The entropy of a probability mass function f (k) = { fk : k ∈ Z+}17is defined as [19]H( f ) =−∞∑k=0fk log fk. (2.15)Consider for now the process with the characteristic functionϕXr(u) =(p1− eiu(1− p))r(2.16)corresponding to the negative binomial distribution NB(r, p), with support on thenon-negative integers. This distribution is infinitely divisible and therefore, it is thedistribution of some Lévy process [50]. If and only if r = 1, the marginal distri-bution of this process is geometric with parameter p. Among the discrete distribu-tions supported on Z+ and subject to a mean constraint, the geometric distributionis well-known to possess maximum entropy [54]. Consequently, the Lévy process(2.16) achieves maximum entropy, however only for r = 1.A maximum entropy Lévy process, on the other hand, corresponds to an in-finitely divisible distribution which has maximum entropy for all r ≥ 0, undersuitable constraints. This condition is satisfied by the Poisson distribution, and[54, Theorem 2.5] asserts its maximum entropy status within the class of ultralog-concave distributions [66], which includes all Bernoulli sums and thus, the bi-nomial family. This maximum entropy result reinforces the choice of a Poissonvariable Yλr as the jump component of the Lévy process with drift, such thatXr = br+Yλr . (2.17)Up to this point, we have ignored that the drift term br is real-valued. In ourapplication, the drift term represents the minimum hop count to reach a node atdistance r due to the finite transmission radius R and it is appropriate to replace itby the integerζr =⌈rR⌉. (2.18)We obtain the main result of this chapter, the hop count model process {Zr : r∈R+}18with mean µr, asZr = ζr +Yλr (2.19)where ζr is the minimum hop count and Yλr is a Poisson variable with a mean givenby λr = µr−ζr. The process Zr has the marginal probability mass functionpZr(z) = TPois(z;µr,ζr), Pois(z−ζr;λr), z≥ ζr (2.20)which is referred to as the translated Poisson distribution [5].2.3.3 Maximum Likelihood Fit of the Hop Count ProcessThe maximum likelihood principle is used to obtain an estimate of a distributionparameter θ , defined as the parameter value which maximizes the likelihood func-tion. The maximum likelihood estimator (MLE) has the desirable property of beingasymptotically efficient, that is, unbiased and approaching the Cramér-Rao lowerbound on the variance of θˆ for increasing sample sizes [58].The mean (2.13) of the hop count model process Zr is assumed to increaselinearly with the distance r from the source node x, at an a-priori unknown rate andintercept (θ1, θ2) ∈Θ, where Θ= R+× [1,∞), which we want to estimate basedon observations of the hop count at different sites, given x. The mean number ofhops as a function of distance r is approximated asµr =0 if r = 0,θ1r+θ2 if r > 0.(2.21)The intercept θ2 ≥ 1 reflects the fact that for any sensor node other than the source,the hop count must be at least one. Furthermore, at any distance r the hop countmust be greater or equal than the minimum hop count given by (2.18).An important property of the hop count process is, that θ1, θ2 are invariantto the choice of the mean node-to-node passage time, which can be shown by asimple change of units of time. The process parameters depend only on the densityand the transmission range of the sensor nodes. This implies that the MLE can19be performed off-line at the network’s design time, while the mean node-to-nodemessage delay is allowed to vary (slowly) with operating conditions, such as theaverage energy harvested by the sensor nodes.An observation z with minimum hop count ζ is modeled by a translated Poissonvariable Z with probability mass function (2.20)pZr(z) =λ z−ζ(z−ζ )! exp(−λ ). (2.22)where λ = µ−ζ from (2.19). By substituting λ = θ1r+θ2−ζ we obtain the pmfof the observation, parameterized by the rate θ1 and intercept θ2,pZ(z|r;θ) =(θ1r+θ2−ζ )z−ζ(z−ζ )! exp(− (θ1r+θ2−ζ )). (2.23)We now consider a sample of n independent (though in general, not identicallydistributed) observations z1:n with joint conditional probabilityp(z1:n|r1:n;θ) =n∏i=1pZ(zi|ri;θ). (2.24)For maximum likelihood estimation problems, it is often convenient to use the log-likelihood function, defined as the logarithm of (2.24) and interpreted as a functionof the parameter vector θ given the observations,`(θ |z1:n,r1:n) = ln p(z1:n|r1:n;θ) (2.25)=n∑i=1ln pZ(zi|ri;θ). (2.26)With (2.23), the log-likelihood for our problem becomes`(θ |z1:n,r1:n) =n∑i=1[(zi−ζi) ln(θ1ri +θ2−ζi)− ln(zi−ζi)!− (θ1ri +θ2−ζi)]. (2.27)20If a maximum likelihood estimate exists, then it isθˆML = argmaxθ∈Θ`(θ |z1:n,r1:n). (2.28)The partial derivatives of the log-likelihood function with respect to θ are zero atany local extremum. Hence, the maximum likelihood estimate is a solution of thesystem of likelihood equations∂`∂θ1=n∑i=1[zi−ζiθ1ri +θ2−ζi−1]ri = 0∂`∂θ2=n∑i=1[zi−ζiθ1ri +θ2−ζi−1]= 0. (2.29)Multiplication with the common denominator transforms (2.29) into a system ofbivariate polynomials in θ1 and θ2f (θ1,θ2) = 0g(θ1,θ2) = 0 (2.30)which has the same zeros as (2.29). By Bezout’s theorem [105], this system hasdeg( f ) ·deg(g) zeros (θ1, θ2) ∈ C2 which can by easily computed by typical nu-merical solvers even for large polynomial degrees (> 1000), subject to the con-straint that (θ1, θ2) ∈ Θ, where Θ= R+× [1,∞). For every zero satisfying theconstraint, negative definiteness of the Hessian of ` must be ascertained for a lo-cal maximum of the likelihood to exist. The maximum likelihood estimate for ourproblem is identified with the solution (θ1, θ2) which globally maximizes `, subjectto (θ1, θ2) ∈Θ.2.4 Simulation Results2.4.1 Simulation SetupThe translated Poisson distribution was applied to model the empirical hop countobserved in simulated broadcast WSNs. For this purpose, our simulation generates103 realizations of random geometric graphs Gδ on the unit square [0,1]2, where21the number of nodes is a Poisson variable with density λ = 4000 nodes per unitarea. The source node is located at the center, a setup which minimizes bound-ary effects. We consider mean node degrees ranging from δ = 8 to δ = 40, whichare representative of weakly as well as strongly supercritical networks (the criti-cal mean node degree is δc = piλc(R)R2 ≈ 4.52, based on a value of λc(1)≈ 1.44for the empirical critical node density as given in [63] for R = 1). Node-to-nodedelays are modeled by independent exponential random variables with unit mean.The first-passage paths are computed as the minimum-delay paths from the sourceto all nodes within a thin annular region defined by r±∆. The simulation resultsare not sensitive to the exact choice of ∆, provided that ∆< r/10. Based on the hopcounts of a sample of 103 nodes at random distances from the source, we determinethe model process parameters θ1,θ2 using maximum-likelihood estimation, as de-scribed in Section Hop Count DistributionFor selected distances r from the source, Figures 2.2, 2.3 and 2.4 show the cdfsof the hop count, corresponding to mean node degrees δ = 8, 16 and 40, respec-tively. We observe, that the translated Poisson model closely approximates theempirical hop count in strongly supercritical networks (δ = 40, Figure 2.4). Forweakly supercritical networks (δ = 8, Figure 2.2), the empirical hop count distri-bution at larger values of r has a noticeably heavier upper tail compared to thetranslated Poisson model. However, the fit improves quickly with increasing nodedegree (δ = 16, Figure 2.3). To assess the goodness of fit as a function of meannode degree δ and distance r from the source, we compute the Kullback-Leiblerdivergence (KLD, [19]) between the empirical and the model distribution. Figure2.5 indicates that the translated Poisson distribution is an increasingly good fit atlarger distances and higher mean node degrees. As the mean node degree variesfrom δ = 8 to δ = 40, the rate of improvement is large at first and then graduallydecreases.While the stochastic process model for the hop count was derived by relying onproperties of two-dimensional networks, which exhibit continuum percolation andasymptotic properties of the hop count (2.6), the resulting process has no formal22dependency on the dimensionality of the problem. It is therefore reasonable toevaluate the suitability of the translated Poisson distribution as a model for the hopcount in one-dimensional WSNs. To this end, we generate realizations of WSNs on[0,r], by placing nodes uniformly at random with density λ = δ/(2R). The sourcenode is located at 0. Communication range R and node-to-node delays are definedas in the two-dimensional setting described earlier. To relax constraints on the first-passage path between 0 and r, we extend the network from [0,r] to [−A,r+A] (thesimulation results show no sensitivity to this extension for A≥ 2R)The cdfs shown in Figures 2.6, 2.7 and 2.8, corresponding to mean node de-grees δ = 8, 16 and 40, respectively, demonstrate that the translated Poisson dis-tribution is also a good fit for the empirical hop count in one-dimensional WSNs.Similar to our observations in two-dimensional networks, the fit improves as thedistance and mean node degree increase.2.4.3 Localization of Source NodeIn this section, we evaluate the localization error which results from the applicationof the translated Poisson distribution as a model for the hop count in a WSN. Weuse the same network model and simulation setup as described in Section 2.4.2. Anode located at the center x0 of the unit square [0,1]2 is the source of a broadcastmessage. Hop count observations are made at randomly chosen nodes in the WSN,conceivably by a mobile sink which can interrogate a node at its current position sto obtain the node’s hop count information. The mobile sink, which is assumed toknow its own position, then estimates the location of the source node, given all thehop count observations.A second, fictitious WSN serves as a baseline for comparison. In this network,the hop count at an interrogated node a distance r away from the source node isgenerated by drawing from the translated Poisson distribution Z ∼ TPois(zi;µ,ζ ),with µ = θ1r+θ2 (2.21) and ζ = dr/Re (2.18). These observations are idealizedin the sense that their statistics are characterized exactly by the observation model.At the mobile sink, let a random variable X ∈ [0,1]2 describe the (unknown)location of the source node and zi ∈ N be the hop count observed at position si ∈[0,1]2, i = 1, . . . ,n. The mobile sink applies Bayes’ rule to compute the posterior23pdf of the source node location aspX(x|z1:n) = α pX(x)n∏i=1TPois(zi;µi,ζi) ∀x ∈ [0,1]2, (2.31)where α is a constant to normalize the posterior pdf, and pX(x) is the prior distri-bution of the source location. The parameters of the translated Poisson distribu-tion are µi = θ1ri +θ2 (2.21) and ζi = dri/Re (2.18), with the Euclidean distanceri = ‖si−x‖ between the source location hypothesis x and the mobile sink positionsi. The parameters θ1,θ2 are determined off-line by maximum-likelihood estima-tion (2.21). In practice, the a posteriori distribution pX(x|z1:n) is often computedrecursively, i.e. one observation at a time, by a Bayes filterpX(x|z1:k) = αk pX(x|z1:k−1)TPois(zk;µk,ζk) ∀x, (2.32)where pX(x|z0) = pX(x) is the prior distribution of the source location, which isuniform in our simulation. In order for the Bayes filter to be computationallytractable, we discretize the location hypothesis space [0,1]2 to a grid of J = 32×32cells x j, j = 1, . . . ,J. This form of the Bayes filter is referred to as a grid filter [110].Since in a grid filter, the mode of the distribution is necessarily quantized, we usethe posterior mean as our estimate xˆ of the source location,xˆ =J∑j=1x j pX(x j|z1:n). (2.33)The a posteriori pdf of the source location tends to become unimodal and symmet-ric as the number of randomly taken hop count observations increases, so that theposterior mean is increasingly representative of the maximum a posteriori estimate(MAP).In the same fashion, we perform source node localization in the idealized, base-line WSN and denote the resulting a posteriori pdf of the source location qX(x|z1:n).Figure 2.9 shows the cdfs of the normalized localization errore = ‖xˆ− x0‖/R (2.34)corresponding to pX(x|z1:n) (2.32) and qX(x|z1:n) with n = 8 randomly chosen ob-240 2 4 6 8 = 0.5RCDFl modelempiricalllllll l l l l l0 2 4 6 8 10 = 1Rl modelempiricalllllllll l l l l l0 5 10 = 2Rhop countCDFl modelempiricall l llllllll ll l l l l l0 5 10 15 20 25 30 350. = 8Rhop countl modelempiricall l l l l l l l l l l l l lllllllllllll ll l l l l l l l lhop countCDFFigure 2.2: Comparison of the translated Poisson model and the empiri-cal first-passage hop count (with 95% confidence interval) in a two-dimensional network with mean node degree δ = 8. CDFs shown forsource-to-sink distances r ∈ {0.5R, R, 2R, 8R}.servations, for δ = 8 and δ = 40. We observe that the localization error becomessmaller with increasing mean node degree, δ . For δ = 8, the probability that the lo-calization error exceeds 3R is approximately 0.075. This is likely due to the heavierupper tail observed in the empirical hop count distribution for weakly supercriticalWSNs, as seen in Figure 2.2. It is known that the robustness of Bayesian inferencesuffers, if the distribution of the observation noise has heavier tails than the like-lihood function modeling it [90]. For δ = 40, the probability that the localizationerror exceeds 3R is less than 0.025.250 2 4 6 8 = 0.5RCDFl modelempiricalllllll l l l l l0 2 4 6 8 10 = 1Rl modelempiricalllllllll l l l l l0 5 10 = 2Rhop countCDFl modelempiricall l llllllll ll l l l l l0 5 10 15 20 25 30 350. = 8Rhop countl modelempiricall l l l l l l l l l l l l llllllllllll ll l l l l l l l l lhop countCDFFigure 2.3: Comparison of the translated Poisson model and the empiri-cal first-passage hop count (with 95% confidence interval) in a two-dimensional network with mean node degree δ = 16. CDFs shown forsource-to-sink distances r ∈ {0.5R, R, 2R, 8R}.2.5 ConclusionIn this chapter, we have proposed a new approach to model the hop count distri-bution between a source and a sink node in broadcast WSNs, in which messagepropagation is governed by first-passage percolation and node-to-node delays arecharacterized by i.i.d. exponential random variables. We utilize the abstract modelof a stochastic jump process to describe the hop count. By making a simplifyingindependence assumption and using a maximum entropy argument, the hop countmodel process is shown to have a translated Poisson marginal distribution. Sim-ulation results confirm that the empirical hop count distribution is generally wellapproximated by this model. The simulation results also show that the error re-sulting from the application of the translated Poisson model in a target localization260 2 4 6 8 = 0.5RCDFl modelempiricallllllll l ll l0 2 4 6 8 10 = 1Rl modelempiricall llllllll l l l l0 5 10 = 2Rhop countCDFl modelempiricall l lllllllll ll l l l l0 5 10 15 20 25 30 350. = 8Rhop countl modelempiricall l l l l l l l l l l l l llllllllllll ll l l l l l l l l lhop countCDFFigure 2.4: Comparison of the translated Poisson model and the empiri-cal first-passage hop count (with 95% confidence interval) in a two-dimensional network with mean node degree δ = 40. CDFs shown forsource-to-sink distances r ∈ {0.5R, R, 2R, 8R}.problem is small with high probability, although for node degrees approaching thecritical percolation threshold, the performance degrades. Due to its low computa-tional complexity, we expect this model to be a good candidate for the observationmodel in Bayesian target localization applications in low-cost WSNs which rely onhop count information alone.While the translated Poisson distribution is the most parsimonious result con-sistent with our assumptions about the hop count process, the general form of thejump component of the process is a compound Poisson distribution which, givenadditional information or assumptions, may provide an improved fit of the empiri-cal hop count.270.0050.0100.0200.0500.1000.2000.5000.5 1.0 2.0 4.0 8.0normalized distance r/RKLDllllllδ = 8δ = 12δ = 16δ = 24δ = 40Figure 2.5: Kullback-Leibler divergence (KLD) between empirical distribu-tion and translated Poisson distribution, for δ ∈ {8, 12, 16, 24, 40} andat source-to-sink distances r ∈ {0.5R, R, 2R, 4R, 8R}. The KLD de-creases with mean node degree δ .280 1 2 3 4 5 6 = 0.5RCDFl modelempiricallllll l l l0 2 4 6 = 1Rl modelempiricalllllll ll l0 2 4 6 8 = 2Rhop countCDFl modelempiricall lllllll ll l l0 5 10 15 = 8Rhop countl modelempiricall l l l l l l l l llllllllll l l lhop countCDFFigure 2.6: Comparison of the translated Poisson model and the empiri-cal first-passage hop count (with 95% confidence interval) in a one-dimensional network with mean node degree δ = 8. CDFs shown forsource-to-sink distances r ∈ {0.5R, R, 2R, 8R}.290 2 4 6 8 = 0.5RCDFl modelempiricalllllll l l l l l0 2 4 6 8 = 1Rl modelempiricallllllll ll l l l0 2 4 6 8 10 = 2Rhop countCDFl modelempiricall llllllll ll l l l0 5 10 15 20 = 8Rhop countl modelempiricall l l l l l l l l l llllllllll ll l l l lhop countCDFFigure 2.7: Comparison of the translated Poisson model and the empiri-cal first-passage hop count (with 95% confidence interval) in a one-dimensional network with mean node degree δ = 16. CDFs shown forsource-to-sink distances r ∈ {0.5R, R, 2R, 8R}.300 2 4 6 8 = 0.5RCDFl modelempiricallllllll l l l l0 2 4 6 8 10 = 1Rl modelempiricalllllllll ll l l l0 5 10 = 2Rhop countCDFl modelempiricall l llllllll ll l l l l l0 5 10 15 20 25 300. = 8Rhop countl modelempiricall l l l l l l l l l l llllllllll ll l l l l l l lhop countCDFFigure 2.8: Comparison of the translated Poisson model and the empiri-cal first-passage hop count (with 95% confidence interval) in a one-dimensional network with mean node degree δ = 40. CDFs shown forsource-to-sink distances r ∈ {0.5R, R, 2R, 8R}.310.0 0.5 1.0 1.5 2.0 2.5δ = 8CDFl idealized hop countempirical hop countlllllll ll l l l l0.0 0.5 1.0 1.5 2.0 2.5δ = 24CDFl idealized hop countempirical hop countllllllll ll l l l0.0 0.5 1.0 1.5 2.0 2.5δ = 40CDFl idealized hop countempirical hop countllllllll ll l l lnormalized localization error eCDFFigure 2.9: CDFs of the normalized localization error e = ‖xˆ− x0‖/R, givenhop count observations at 8 randomly selected nodes, for mean nodedegrees δ ∈ {8, 24, 40}. We compare localization based on empiricalfirst-passage hop counts with idealized hop counts (independent drawsfrom the observation model).32Chapter 3Rollout Algorithms forWSN-assisted Target Search3.1 IntroductionAn autonomous, mobile platform is tasked with finding the source of a broad-cast message in a randomly deployed network of location-agnostic wireless sensornodes. Messages are assumed to propagate by flooding, with random node-to-nodedelays. In WSNs of this type, the hop count of the broadcast message, given thedistance from the source node, can be approximated by a simple parametric dis-tribution. The autonomous platform is able to interrogate a nearby sensor node toobtain, with a given success probability, the hop count of the broadcast message.In this chapter, we model the search as an infinite-horizon, undiscounted cost,online POMDP and solve it approximately through policy rollout. The cost-to-goat the rollout horizon is approximated by a heuristic based on an optimal searchplan in which path constraints and assumptions about future information gains arerelaxed. This cost can be computed efficiently, which is essential for the applicationof Monte Carlo methods, such as rollout, to stochastic planning problems.We present simulation results for the search performance under different basepolicies as well as for parallel rollout, which demonstrate that our rollout approachoutperforms methods of target search based on myopic or non-myopic mutual in-formation utility. Furthermore, we evaluate the search performance for different33generative models of the hop count to quantify the performance loss due to the useof an approximate observation model and the rather significant effect of statisti-cal dependence between observations. We discuss how to explicitly account fordependence by adapting an integer autoregressive model to describe the hop count.3.1.1 Background and MotivationWireless sensor networks (WSNs) have been an object of growing interest in thearea of target localization. The achievable localization accuracy depends on manyfactors, among which are the nature of the observed phenomena, sensor modalities,the degree of uncertainty about sensor node locations as well as processing andcommunication capabilities. Typically, sensor observations are processed in situand reduced to position estimates, which must be routed through the WSN viamultiple hops to dedicated sink nodes, in order to be accessible by the network’suser. In some applications, for example autonomous exploration or search andrescue, an essential feature is that the position estimates reported by the WSN areused to assist and guide a mobile sensor and actuator platform (or searcher, forshort) to the target. The main goal for the searcher is to make contact with thetarget, for example to acquire large amounts of payload information from a sensorthat has observed and reported an event of interest, or to retrieve the target outright.Alternative to the use of WSNs as described, it is conceivable for a mobilesearcher in the deployment area of the WSN to interrogate nearby sensor nodesdirectly to gather information, based on which the position of the target can beestimated. Thereby, expensive computational and communication tasks can be of-floaded from the sensor nodes. A further possibility is to integrate informationobtained from the WSN with the searcher’s on-board, perhaps more sophisticatedsensing capabilities, thus mitigating the need for the WSN to perform very preciselocalization. This can result in a significant simplification of the sensor node hard-ware and software requirements and reduced node energy consumption. Consistentwith this goal, we assume that the sensor nodes are location-agnostic and unawareof the distances from, or angles between neighbor nodes. The information madeavailable by the WSN to the searcher is assumed to be statistically related to thetarget location, e.g. noisy measurements of the distance.34In this chapter, we consider the use of randomly deployed WSNs in which,starting from a source node, messages are disseminated by flooding and the node-to-node transmission delays are random. In such a network, provided that thenode density exceeds a critical threshold, a random cluster of all nodes that havereceived the broadcast message grows over time, starting with the source node thatinitiated the broadcast upon detection of an event of interest, in a process known asfirst-passage percolation [15]. For such a network, it was shown in Chapter 2 thatthe message hop count distribution, parameterized by the distance from the sourcenode, can be approximated by a simple stochastic process model.We are motivated by the problem of a mobile, autonomous searcher which isgiven the task of locating a (generally moving) target, that is, the source node ofa broadcast message in a WSN of the type studied in Chapter 2, relying on ob-servations of the message hop count alone. We focus on the question of how thesearcher can home in and make contact with the target in the shortest expectedtime, given only the hop count observations. Making contact with the target is de-fined here as observing a hop count of zero. The most general framework for thistype of optimal, sequential decision problem under uncertain state transitions andstate observations is the partially observable Markov decision process (POMDP)[56, 69, 79, 110]. POMDPs optimally blend the need for exploration to reduceuncertainty with making progress toward the goal state. Unfortunately, for all butsmall problems (in terms of the sizes of the state, action and observation spaces),solving POMDPs is computationally intractable. This is due to the fact that thesolution is quite general, and must be computed for every possible belief state. Thebelief state (or information state) is the POMDP concept of expressing the uncer-tainty about the true, unobservable state as a probability distribution over all states.For search problems such as the one considered here, it can be more productive toconsider only the current belief state and plan the search with respect to the actu-ally reachable belief space. This approach is referred to as online POMDP [85].Online POMDPs have the additional advantage over offline solutions of being ableto (quickly) respond to changing model parameter values. However, in many prac-tical applications, online POMDPs still present a formidable computational chal-lenge, compounded by the need to operate in real-time. Therefore, online POMDPsare most often solved approximately and hence, suboptimally [17]. One class of35approximate techniques is based on Monte Carlo techniques. Among these, pol-icy rollout [8, 17] is characterized by a guarantee of improved performance overthe base policy on which the rollout relies. Furthermore, rollout algorithms lendthemselves to parallelization and have the anytime property, which means that theexecution can be stopped at any time, yielding an approximate solution. However,the accuracy of the solution increases the longer the algorithm is allowed to run.3.1.2 Chapter ContributionThis chapter makes the following contributions:1. We propose a rollout-based algorithm which minimizes the expected time-to-detection, to guide a mobile searcher to a target. The search is modeled asan infinite-horizon, undiscounted cost, online POMDP and solved approx-imately through policy rollout. As a sample-based method, rollout strictlyapplies only to finite-horizon problems. To approximate the terminal cost-to-go at the rollout horizon, we propose the use of a heuristic based on anoptimal search plan, where search path constraints and assumptions aboutfuture information gains are relaxed. This terminal cost can be computedefficiently in O(n logn) time, thus making rollout a viable solution approachfor this online planning problem. We show through simulation, that ourrollout approach outperforms popular methods of target search based on amyopic or non-myopic mutual information utility.2. We show, that a formulation of the target search problem in terms of anexplicit tradeoff between exploitation and exploration (referred to as “info-taxis” [112]), is equivalent to a myopic mutual information utility.3. An argument based on the contact distance in Poisson point processes moti-vates the formulation of a lower bound on the expected search time for mul-tiple searchers, which are assumed to be distributed uniformly at random, interms of the searcher density.4. Simulations of a stationary target search indicate that the statistical depen-dence between observations can not be ignored without significant penalty.36We discuss mitigating strategies and propose an integer autoregressive pro-cess as a model of the observation dependence. This model is derived byadapting the INAR(1) process [1] to translated Poisson innovations.3.1.3 Chapter OrganizationThe chapter is organized as follows: In Section 3.2, our system model is intro-duced. The main contribution, a heuristic for the expected search time at the rollouthorizon, is derived in Section 3.4.2. Because of its use as a reference for perfor-mance evaluation, myopic and non-myopic mutual information utilities are brieflyreviewed in Section 3.5. We present simulation results in Section 3.7 that show theperformance improvement of the rollout algorithm over existing techniques. Wealso discuss the loss of performance due to statistical dependence of the empiricalhop count observations, and present mitigating strategies, including a model forthe observation dependence, in Section 3.8. Conclusions for future work are drawnin Section 3.9. To simplify the notation for probabilities, we omit the names ofrandom variables when this is unambiguous.3.2 System Model3.2.1 Wireless Sensor Network ModelRandomly deployed WSNs are frequently described by Gilbert’s disk model [35,40], which we adopt in this chapter. Without loss of generality, the deploymentarea is assumed to be the unit square [0,1]2 ⊂ R2. Sensor nodes are distributedaccording to a spatial Poisson point process Pλ of density λ , restricted to thedeployment area. Two sensor nodes are said to be linked if they are within com-munication range R of each other. The sensor nodes Pλ and their communicationlinks ER form a random geometric graph Gδ = {Pλ ,ER} [81] with mean node de-gree δ = piλR2. The mean node degree must exceed a critical threshold for a largeportion of the network to be connected.In order to simulate a mobile searcher making hop count observations whileoperating in the deployment area of such a WSN, we discretize both the searcherposition and the sensor node coordinates. Rather than placing sensor nodes ran-37domly in [0,1]2 according to a Poisson point process, we restrict possible node lo-cations to a finite set of N = M2 grid coordinates, C = {0,1/M, . . . ,(M−1)/M}2,indexed by the set X = {1, . . .N}. We refer to X as the set of cells. A cell isoccupied by a sensor node with the uniform probability p. The resulting randomnode placement over C approximates a Poisson point process with density λ = pNrestricted to [0,1]2, and converges to a Poisson point process in the limit of de-creasing cell size (this is one way to define a Poisson point process).The sensor node in cell x which initiates a message broadcast upon the pre-sumed detection of an event of interest, is designated as the target. Every nodeforwards the message to its neighbors within the communication range R, and du-plicates of the message are discarded in order to avoid unnecessary retransmissions.Each node-to-node link has an associated transmission delay. We assume that thenode-to-node delays are independent, exponentially distributed random variableswith a common mean. This is a reasonable assumption for networks where com-munication is unreliable and retransmissions are required, or where sensor nodesharvest energy from unreliable environmental sources and may be dormant for ran-dom periods. Furthermore, we assume that interference between transmissions isnegligible. In such a network, the observed hop count at the node in cell x′ is thehop count of the first-passage path, that is the pathγ∗ = argminγ∈Γ(x,x′){Td(γ)} (3.1)which minimizes the message transmission delay Td(γ) for all paths γ ∈ Γ(x,x′)connecting nodes x and x′. In Chapter 2 it was established that for networks withstrongly supercritical mean node degree δ , the hop count at distance r from thetarget is well approximated by a translated Poisson distribution,fZ(z;r) = Pois(z−ζ (r);λ (r)) z≥ ζ (r) (3.2)where λ (r) = µ(r)−ζ (r), with a mean hop count given by µ(r) = θ1r + θ2 andminimum hop count ζ (r) = dr/Re. The parameters θ1,θ2 can be learned in variousways, for example through maximum likelihood estimation as described in Section2.3.3. It is worth noting that θ1,θ2 are invariants of the common mean node-to-38node passage time (which in turn may be sensitive to operating conditions, suchas the average available energy), i.e. θ1,θ2 only depend on the network designparameters.In the simulations, we use different generative models of the hop count. Thisenables the independent study of the relative performance of various search algo-rithms and further effects due to non-ideal observations.Definition 3.1. We define the following generative models of the hop count:M1: observations are the first-passage hop counts in a WSN realization which ismodeled as a random geometric graph with exponentially distributed node-to-node passage times, and are referred to as empirical hop counts,M2: observations are first-passage hop counts, but are randomly sampled frommultiple, independent realizations of WSNs. Hop counts generated by thismodel are referred to as independent, empirical hop counts,M3: observations are independent draws from the translated Poisson model (3.2),and are referred to as idealized hop counts.It was verified through simulations of the generative models M1 and M3 inSection 2.4.3, that the smallest average target localization error is achieved usingthe idealized hop count observations generated by M3. In Section 2.4.3, the obser-vations are taken at randomly selected nodes of the WSN. With increasing meannode degree of the WSN, the localization error performance of M1 approaches thatof M3. In this chapter, simulations are designed to evaluate the search-time perfor-mance for different generative models, whereby the observations are constrainedto be made along the path of the mobile searcher.3.2.2 Autonomous Mobile SearcherThe mobile searcher acts by moving among the cells defined by the grid coordi-nates C = {cx}x∈X . In each visited cell, the searcher attempts to acquire the hopcount of the broadcast message. If the cell is occupied by a sensor node, the mes-sage hop count is observed with probability q. Otherwise, the searcher receivesno response at all. The target is said to be detected when a hop count of zero is39observed. Time is assumed to be discrete. At every time step k, the searcher maydecide to stay in the cell it is currently visiting, or move to one of the four neighborcells. Any searcher action incurs a cost, that is, an increment of search time.3.2.3 Formulation of Target Search ProblemA POMDP is a general model for the optimal control of systems with uncertainstate transitions and state observations [69, 110]. In target search problems for-mulated as POMDPs, the target motion is modeled as a Markov chain, searcheractions may have uncertain effects, and only noisy observations of at least onestate variable are available. Since a focus of this chapter is the study of a hop countobservation model applied to target search, we can restrict attention to a stationarytarget and assume, that the searcher position is completely determined by the ac-tions (that is, the searcher position is known without ambiguity). It is worth point-ing out that certain instances of search problems can be modeled as multi-armedbandits, for which index policies exist under suitable conditions [71, 97]. In thischapter, online methods of policy rollout will be used to compute an approximate,suboptimal solution for the target search problem.Any POMDP can be defined in terms of a tuple 〈S ,A ,T,J,Z ,O〉, where Sis the state space, A the set of admissible actions, T the state transition function, Jthe cost function (commonly, the reward function R is specified instead), Z is theset of observations and O defines the state observation law.Definition 3.2. The POMDP for the target search problem is defined by• S =X ×Y is a finite, joint state space, where X is the range of the par-tially observed target position andY =X the range of the searcher position.A state is represented by the vector s = (x,y)T .• A = {Ay}y∈Y is a family of action sets indexed by the set of all possiblesearcher positions. An action set Ay ⊆ {stay,north,east,south,west,D} de-fines the possible moves the searcher can make from its current position y inthe next time step, and is augmented by an action D denoting detection. Ifthe target has not been detected by time k, the searcher’s next action is con-strained by ak+1 ∈Ayk\{D}. If the target is detected at time k (by observinga message hop count of zero), then ak+n = D, for all n > 0.40• T :S ×Ay×S → [0,1] is the transition kernel describing the state dynam-ics. In our model, where the target is assumed to be a stationary sensor nodeand the searcher position is completely determined, we haveT (s,a,s′), Pr{Sk+1 = s′|Sk = s,a} (3.3)=δx′,x δy′,y a = Dδx′,x δy′,y+a otherwise(3.4)• J :Ay→ R specifies the cost of executing the action a,J(a),0, if a = D1, if a ∈Ay(3.5)• Z = N0∪{φ} is the set of observations of message hop counts, N0, aug-mented by a possibility of making no observation, denoted φ .• O :S ×Ay×Z → [0,1] is the hop count observation model, defined asO(s,a,z),1, a = DPr{Z = z|s}, otherwise(3.6)wherePr{0|s}=q, if x = y0, otherwise(3.7)Pr{φ |s}=1−q, if x = y1−qp, otherwise(3.8)Pr{n|s}=0, if x = yqp fZ(n;r(s)), otherwise(3.9)for n > 0. Here, fZ(n;r(s)) is the translated Poisson distribution (3.2), andr(s) = ‖cx− cy‖2 is the Euclidean distance between target and searcher po-sitions.41Figure 3.1: Dynamic Bayes Network representing a POMDPSince the state is only partially observable, the mobile searcher maintains abelief state, that is, a joint probability mass function b(s) = Pr{S = s} of the statevariable S, to represent the uncertainty about the true state. Due to independence,b(s) = bX(x)bY (y). The belief state bk at time k is a sufficient statistic [110] of allpast history of actions a1:k = a1, . . . ,ak and observations z1:k = z1, . . . ,zk, given ana priori belief state b0 with bX ,0(x) = 1/N and bY,0(y) = δy,y0 . The belief state hasthe Markov property: after taking an action and making an observation, the newbelief state is conditionally independent of the past, given the previous belief state,bk(s) = Pr{s|z1:k,a1:k,b0}= Pr{s|zk,ak,bk−1}. (3.10)The a posteriori belief state can therefore be updated recursively by applyingBayes’ rule,bk(s) = η O(s,ak,zk) ∑σ∈ST (σ ,ak,s)bk−1(σ) (3.11)where the normalization factor η is given byη−1 = ∑s∈SO(s,ak,zk) ∑σ∈ST (σ ,ak,s)bk−1(σ) (3.12)We use the following shorthand notation for the belief state update:bk = B(bk−1,ak,zk) (3.13)42Any POMDP can be represented graphically as a dynamic Bayesian network (Fig-ure 3.1) which is a model of the relevant variables and their relationships.Given a stationary policy, i.e. a mapping pi :B→Ay from the space B ofbelief states to actions in the set of admissible actions, which governs the actionselection of the searcher, the expected infinite-horizon, undiscounted cost isJpi∞(bk) = E[∞∑n=1J(ak+n)∣∣∣∣bk](3.14)where ak+n = pi(bk+n−1) and the expectation is with respect to the bk+n for n≥ 1.It has been shown in [92] for a path-constrained search problem, that the infinite-horizon, undiscounted cost is finite under suitable assumptions. To make this prob-lem tractable for rollout, we introduce the terminal costJ0(b,a) =0, if a = DTh(b), otherwise(3.15)where Th(b) is a heuristic cost which approximates the expected search time at therollout horizon under relaxed assumptions. The heuristic cost will be derived inSection 3.4, resulting in the following approximate cost-to-goJpiH(bk) = E[J0(bk+H ,ak+H)+H∑n=1J(ak+n)∣∣∣∣bk]. (3.16)It is convenient to define the Q-function [110], which quantifies the cost of takingthe action a in belief state b, and selecting actions according to policy pi thereafter,Qpi(b,a) = J(a)+Eb′[JpiH−1(b′) |b,a](3.17)where b′ = B(b,a,z) (3.13), i.e. b′ is the belief state conditioned on the next actiona and observation z.433.3 Approximate Online Solution of POMDP by Rollout3.3.1 Rollout AlgorithmIn a rollout algorithm, the expectation in (3.17) is approximated as a sample aver-age of the cost, obtained through Monte Carlo simulation of a fixed number, W , ofsystem trajectories (Algorithm 1). A trajectory starts in sample state s∼ bk, takesobservation zk+1 ∼ Pr(Z = z|s,ak+1) and selects the next action ak+1 according tothe base policy pi . Observation and action selection are repeated up to the rollouthorizon H, with time step n incurring the cost J(ak+n). At the horizon, the ter-minal, heuristic cost J0(bk+H ,ak+H) is evaluated and added. The effective rolloutpolicy pi ′ is then obtained (implicitly) by selecting the action which minimizes theaverage cost,ak+1 = pi ′(bk) = argminα∈Ayk\{D}Qpi(bk,α). (3.18)It is guaranteed that the rollout policy pi ′ is an improvement over the base policy,in the sense that for the cost, we have Jpi′H (b)≤ JpiH(b). Rollout is effectively asingle-step policy iteration [48]. It should be noted however, that an improvementin a heuristic cost does not generally imply an improvement in the true, infinite-horizon, undiscounted cost. The base policy pi is typically chosen such that theaction selection is computationally very simple. In this chapter, we adopt a numberof different base policies and evaluate their search time performance by simulation.3.3.2 Parallel RolloutA generalization of the rollout approach, which uses multiple base policies pi ∈Πfrom a predefined set of policies, Π, and is referred to as parallel rollout, wasintroduced in [16]. In parallel rollout, the Q-function is computed asQΠ(b,a) = J(a)+Eb′[minpi∈ΠJpiH−1(b′) |b,a](3.19)Parallel rollout is based on the intuition that multiple base policies allow a morethorough exploration of the reachable belief space than a single base policy alone.It is shown in [16] that the finite-horizon cost for parallel rollout is no higher than44the lowest cost achievable by any of the component policies pi ∈Π. However,with a heuristic cost, this guarantee does not imply an improvement of the actualperformance, and simulation must be used to evaluate the search time for differentpolicy sets Π.3.4 Heuristics for the Expected Search TimeHeuristics can be used to approximate the expected cost-to-go of a POMDP. Theuse of a heuristic is appropriate when the computation of the exact Q-functionis impractical, but an approximation of the expected cost-to-go can be obtainedby making reasonable, simplifying assumptions about the future behavior of thesystem under consideration [17]. Heuristics can be either generic or application-specific. As an example of a generic heuristic, the QMDP [67] is based on theassumption that the state of the system becomes observable after the next timestep, thus reducing the problem to the solution of an MDP. The performance re-sulting from the application of such a generic heuristic can vary drastically withthe specific application. The quality of a heuristic for a given problem can beimproved, when its design incorporates “expert knowledge” about the applicationdomain. Any performance gain due to the improved heuristic, however, comes atthe expense of general applicability.In this section, we derive heuristics for the terminal cost at the rollout horizon,which approximate the remaining expected search time for the mobile searcher.The heuristics themselves are based on solutions of relaxed, optimal search prob-lems.A first observation about the future behavior of the system, prevalent especiallyat the later stages of the search, is that the hop count information available to thesearcher tends to get exhausted, as each sensor node conveys at most a single hopcount observation. In this section we will therefore assume, that no useful informa-tion in the form of hop count observations z > 0 can be obtained beyond the rollouthorizon. Furthermore, we propose that the searcher action constraints in Definition3.2 (collectively referred to as search path constraint) can be relaxed when the cellsof high probability of containing the target are spatially concentrated during thelater stages of the search, thus requiring the searcher to travel only short distances.453.4.1 Constrained Search PathIn this section, we consider a search time heuristic based on the assumption thatthe searcher will not be able to acquire useful hop count information (other thandetecting the target, defined as z = 0) beyond the rollout horizon. This assumptionleads to the requirement of solving an optimal, path-constrained search problem.The path-constrained search is a computationally challenging problem, which hasbeen shown to be of NP-complete complexity [111].A number of algorithms for the solution of optimal, path-constrained searchproblems, involving both static and moving targets, have been proposed and an-alyzed by various authors. These include dynamic programming to compute anoffline solution of a POMDP [27, 92], as well as branch-and-bound techniquesto solve linear or non-linear integer program formulations of the search problem[28, 38, 68, 86]. In most of these approaches, the optimization goal is chosen tobe the maximization of the probability of detecting the target, subject to a costconstraint (such as a finite search time horizon). The objective in our applicationis to minimize the expected search time. We show in Appendix C how to formu-late this problem as an integer linear program and compare its performance to thecorresponding integer linear program for the maximum probability of detection.Unfortunately, the computational complexity of the path-constrained searchproblem prevents its use as a heuristic to approximate the future expected searchtime in a Monte Carlo based rollout algorithm. In the following section, we con-sider the relaxation of the path constraint, resulting in a heuristic which yields toan efficient solution approach.3.4.2 Relaxation of Search Path ConstraintAs the search progresses and more hop count observations are gathered, we gen-erally find that the target pmf tends to concentrate around the most likely targetposition. At this stage of the search, the searcher will be traveling shorter distancesand dwell longer in cells with high probability of containing the target, so that thepath constraint becomes less imposing.These considerations motivate the construction of a heuristic through relax-ation of the search path constraint which limits the searcher to stay in the current46cell or move to an immediate neighbor cell. We will further assume that no usefulinformation in the form of hop count observations z > 0 can be obtained beyondthe rollout horizon.At the rollout horizon, let the target location be described by the belief statebX . Recall that the detection probability per look is q, given that the target node isin the visited cell. We derive the heuristic by starting with a discrete search plannot subject to a path constraint [103]. Let l(x, `−1) denote the number of looksthat have been placed in cell x so far, out of a total of `−1 looks. The detectionfunction for the `-th look, conditioned on cell x containing the target, is given bythe geometric distributionβ (x, `) = q(1−q)l(x,`−1). (3.20)The probability of failing to detect the target in `−1 looks and succeeding on the`-th look, placed in cell x, is thereforebX(x)β (x, `). (3.21)Under the assumption that each look incurs the same cost of one search time incre-ment, the optimal search plan, that is, the sequence of looks which minimizes theexpected search time, is shown in [103] to beξ ∗` = argmaxx∈XbX(x)β (x, `) (3.22)that is, the next look is always placed in the cell with the highest probability ofdetecting the target (with ties broken arbitrarily). The search commences in the cellof highest prior target probability bX(x) and, if the target is not detected, continuesto look in this cell until for some x′ ∈X ,bX(x′)β (x′, `)≥ bX(x)β (x, `). (3.23)The search then expands by allocating search effort to both cells x and x′. The47expected search time isTh(b) =∞∑`=1`bX(ξ ∗` )β (ξ ∗` , `) (3.24)Computing this expectation, however, involves the optimal discrete search planξ ∗ (an infinite sequence) which can only be evaluated recursively. We show that,if the assumption of searching by discrete looks is relaxed, the expected searchtime can be computed in explicit form. In order to allow the search effort to beapplied continuously rather than in discrete increments, the search parameter u,also referred to as search intensity, is introduced in [53] and defined byq = 1− exp(−1/u), (3.25)where q is the detection probability per discrete look. Let bX(x), x ∈X be sortedin decreasing order, such that bX(x1)≥ ·· · ≥ bX(xN), and define bX(xN+1) = 0.The optimal continuous time search [103] starts in cell x1 at t1 = 0, and generallyexpands from n to n+1 cells after a dwell time of ∆tn = tn+1− tn, given bybX(xn+1) = bX(xn)exp[−∆tnnu], (3.26)where the search effort parameter nu reflects the expanding search which involvesmore and more cells. It is shown in [53], that the optimal search is equivalent tothe greedy maximization of the entropy of the target pmf.Since the bX(x) are sorted in decreasing order, there exists an m≤ N for whichbX(xm)> 0 and bX(xn) = 0 for n > m. The sequence tn is given through (3.26) bythe recursiontn+1 = tn +nu log[bX(xn)bX(xn+1)], n = 1, . . . ,m. (3.27)An explicit expression for tn istn = u log[∏ni=1 bX(xi)bnX(xn)], n = 1, . . . ,m. (3.28)As the search never expands to involve more than m cells, the final dwell time48interval ∆tm is unbounded above, i.e. tm+1→ ∞. The expected search time for thisproblem isTh(b) = u−1m∑n=1∫ tn+1tnτ bX(xn)exp[−τ− tnnu]dτ (3.29)where the factor u−1 normalizes the continuous pdf. With (3.26), the expectedsearch time with the proposed heuristic can be expressed asTh(b) =m∑n=1n [bX(xn)(tn+nu)−bX(xn+1)(tn+1+nu)] (3.30)where limtm+1→∞ bX(xm+1)(tm+1+nu) = 0. The cell y, visited by the searcher at therollout horizon, does not generally coincide with the cell x1 of maximum probabil-ity of containing the target. It can be argued that, if the distribution bX is relativelyconcentrated around the cell x1, the searcher has to travel a number of time stepsequal toϑ(b) = M‖cy− cx1‖1 (3.31)before encountering cells of significant probability of containing the target. Thiscan be interpreted as a limited form of a search path constraint. The extra cost interms of search time is included in the heuristic for the expected search time at therollout horizon (3.30),Th(b) = ϑ(b)+m∑n=1n [bX(xn)(tn+nu)−bX(xn+1)(tn+1+nu)] . (3.32)The heuristic search time 3.32 can be computed efficiently. Due to the requirementto sort the m non-zero probability values contained in bX , the run time is of the or-der O(m logm), that is, the complexity of the best known sort algorithms [62]. Theremaining steps of computing 3.32 can be performed in linear time (or logarithmictime when fully parallelized). Speed gains of several orders of magnitude over theinteger linear programming approach in Section 3.4.1 are typical.493.5 Information-Driven Target SearchThe search objective can be described informally as a reduction of the initial uncer-tainty about the target location. Consequently, algorithms that maximize a suitablemeasure of information gain between the a priori and a posteriori belief states,though suboptimal, are readily applicable to many search problems. We considerhere the maximization of the mutual information, which is a special case of themore general Rényi divergence [17]. Many information-driven algorithms havebeen proposed, although for some variants [74, 112] it can be shown, as in Section3.5.2, that they reduce to the basic form of maximization of the mutual information.Differences between algorithms relate more often to the belief state representation,for example as grid-based, mixture or particle filters, and the approximations usedto compute the information gain [43, 78, 87, 93, 94].Due to its widespread application, we use information-driven search as a base-line to evaluate the search-time performance of the proposed, heuristic rollout al-gorithms against.3.5.1 Mutual Information UtilityInformation-driven target search algorithms can be classified broadly into methodsbased on a myopic [31, 43–45, 93, 110] or non-myopic [87, 94] mutual informa-tion utility. It is characteristic for all of these methods to select the action whichmaximizes the mutual information between the current state Sk ∼ bk and the futureobservations {Zk+n}n=1,...,H , that isak+1 = argmaxa∈AykI(Xk;Zk+1:k+H |a). (3.33)The mutual information is defined [19] as the difference between the entropy ofthe state, and the conditional entropy of the state given the observations (omittingtime index k)I(S;Z1:H |a) = H(S)−EZ1:H [H(S|z1:H ,a)]. (3.34)50In the myopic case, i.e. for H = 1, the mutual information can be computed exactlybased on its definition,I(S;Z|a) = H(S)− ∑z∈ZPr{Z = z|a}H(S|z,a). (3.35)For H > 1 however, computing the expectation in (3.34) quickly becomes imprac-tical, as the width of the POMDP policy tree, (|A ||Z |)H , grows exponentially inthe horizon. By applying Monte Carlo techniques, the expectation in (3.34) is re-placed by a sample average. Therefore, the rollout algorithm (Algorithm 1) lendsitself to computing a sample average of the information gains of a number, W , ofsystem trajectories. As in Section 3.2.2, a trajectory starts in a sample state s∼ b,takes an observation z∼ Pr(Z = z|s,a) and selects the next action according to thebase policy pi . The sample average of the information gain H(Xk)−H(Xk|z1:H)over W trajectories is then interpreted as an approximation of the non-myopic mu-tual information under base policy pi (Algorithm 3). The rollout policy is obtainedimplicitly by selecting the action which maximizes the approximate mutual infor-mation utility,ak+1 = argmaxa∈AykIpi(Xk;Zk+1:k+H |a). (3.36)3.5.2 Infotaxis and Mutual InformationA generalized information-driven search strategy referred to as “infotaxis” wasproposed in [112], in which the optimal seacher behavior is characterized as atradeoff between the need to perform exploitative and exploratory, information-gathering actions, motivated by concepts from reinforcement learning [106]. Thistradeoff is reflected in the infotactic utility function used in [112] and [74], whichis a weighted sum of terms favouring either greedy or exploratory behaviour.It can be shown that the infotactic utility can be reduced to the myopic mutualinformation (3.34) with H = 1. A proof for this equivalence is given in AppendixD. Infotaxis offers an interesting viewpoint by showing explicitly, that maximiz-ing the mutual information inherently balances exploitation and exploration in anoptimal way, when the overall goal is the entropy reduction of the belief state.513.6 A Lower Bound on Search Time for MultipleSearchersIt has been demonstrated, that multiple cooperating searchers can achieve a muchimproved search time performance [32, 36, 46, 49, 73, 89]. This improvementcan be interpreted as a form of diversity gain. Other factors being equal (suchas the observation model), the search time therefore depends to a large extent onthe initial, spatial distribution of the searchers and the degree of their cooperation.Ideally, the distributed searchers share the same information, or belief state, whichis formally equivalent to a centralized approach. In more realistic models however,communication constraints limit the amount of, or the rate at which informationmay be shared between searchers.Since deploying and operating the searchers is expensive, it is of significantinterest to characterize mean search times as a function of the number of searchers.A trivial lower bound on the search time is the time for the searcher closest to thetarget to reach it, provided that the target position is known. Assuming that thesearchers are initially distributed uniformly at random in the plane, and that thetime for multiple cooperating searchers to obtain an unambiguous fix on the targetis small compared to the time required to reach it, we obtain a theoretical lowerbound on the search time as a function of the number of searchers per unit area. Inour application, this is a reasonable assumption in light of the results in Chapter 2which show that generally, only a few observations from random positions sufficeto achieve good target localization.We assume furthermore, that all searchers travel at the same, constant speed v,so that the searcher closest to the observed target will reach it first.Under these assumptions, the mean search time can be expressed in terms ofthe contact distance in a spatial Poisson point process Pλ representing the initialpositions of the searchers. The contact distance is a well-defined random variable,given by the shortest distance from the (fixed) target x to the closest searcher anddenoted d(x,Pλ ) [4]. The distribution function of the contact distance isPr{d(x,Pλ )< r}= 1− exp(−λpir2) (3.37)52and its mean isE [d(x,Pλ )] =12√1λ . (3.38)With the mean contact distance, we obtain the lower bound on the mean searchtime as a function of the searcher density per unit area, λ , asE[Tsearch]≥12v√1λ . (3.39)This lower bound offers the insight that in order to reduce the expected search timeby a factor of a through the deployment of multiple searchers, the density of thesearchers must be increased by at least a2.3.7 Simulation Results3.7.1 Simulation SetupIn the simulations, policy rollout [8] is used to approximately solve the onlinePOMDP formulation of the search problem in Definition 3.2, which uses messagehop count observations to estimate the target location [10]. The proposed rolloutalgorithm approximates the infinite horizon, undiscounted cost at the rollout hori-zon by heuristic (3.32).We compare the search times obtained by the rollout algorithm to approacheswhich use myopic or non-myopic mutual information utility to control the searcher,similar to [43, 74, 87, 94, 112]. All simulations share the same message hop countobservation model [10] to estimate the target location. We furthermore evaluate theperformance of some extensions of the basic rollout approach (Algorithm 1), suchas different base policies, as well as parallel rollout [16] of multiple base policies.The WSN deployment area is assumed to be the unit square. The search spaceis defined by dividing the deployment area into a square grid of N search cells.The size of a grid cell thus limits the accuracy to which the target position can bedetermined. On the other hand, the cost of a single update of the a posteriori targetprobability distribution is proportional to the size of the search space, N, whichis incurred by the rollout algorithm |A |HW times per time step. As the searcher53requires a number of time steps proportional to√N to cover a given distance, thetotal simulation time increases as N3/2 (ignoring other overhead). A compromisebetween target localization accuracy and simulation time on a contemporary multi-core processor has been determined to be N = 32×32. Sensor node locationsare restricted to the discrete cell coordinates, an appropriate assumption given thediscrete nature of the search problem. A cell is occupied by a sensor node withprobability p = 0.75. The cell at the center of the deployment area is declared asthe source of a broadcast message and the hop counts assigned to each sensor nodeare generated by the models M1, M2 or M3 defined in Section 3.2.1. The searcher’sinitial position is random, but at a fixed Euclidean distance of d = 10/√N awayfrom the target.To make the search time performance of different algorithms or base policiescomparable, the simulations are designed to use the same set of hop count realiza-tions (generated by either one of the models M1 to M3) and the same set of initialstates for every case.The searcher’s hop count observation model is a translated Poisson distribution(3.2), given the distance r from the hypothesized target and the communicationradius R. The parameters θ1,θ2 governing the mean hop count of (3.2) are deter-mined off-line by maximum likelihood estimation as in Chapter 2, based on thechosen node occupation probability p = 0.75 and a network mean node degree ofδ = 40.The mobile searcher has a success probability q = 0.75 of making a hop countobservation, given that the visited cell is occupied by a sensor node. The searchercan either stay in the cell currently visited, or move to one of its immediate neigh-bor cells, subject to search space boundary constraints.The rollout parameters used in the simulation are: the rollout horizon is setto H = 4 (except for one test case which uses H = 2), the number of trajectoriessampled per action is W = 200. The base policies used by the rollout algorithmare:random: selects an action a ∈Ayk+n from the current action set uniformly atrandom.constant: repeatedly executes the initial action a ∈Ayk .54greedy: selects the action which maximizes the a posteriori probability bX ofdetecting the target at the simulated searcher position after the nextaction.3.7.2 Idealized ObservationsTo evaluate and compare the performance of the rollout and information-driven al-gorithms (Figures 3.2 to 3.6), we apply the generative hop count model M3 definedin Section 3.2.1. Recall that in this model, the observed hop counts are independentdraws from the translated Poisson distribution, given the distance from the sourcenode. This represents an idealized situation, as the observation statistics match thesearcher’s observation model exactly.Figure 3.2 shows the cdfs of the search time for the rollout algorithm (undera base policy of random action selection and with heuristic terminal cost (3.32)),and for a myopic, mutual information-driven algorithm. Rollout achieves a betteraverage search time, as well as a 100% success rate. With the myopic approach,the searcher suffers from an inability to revisit cells in which the hop count hasbeen observed previously, so that no further information gain is possible. This canprevent the searcher from further approaching the target. As Figure 3.3 shows, witha non-myopic, mutual information-driven algorithm (3.36), the searcher is able toreach the target with a 100% success rate, although in terms of the average searchtime, it is still outperformed by the rollout algorithm.Figure 3.4 shows the result of comparing the 3 base policies defined earlier inthis section. Selecting base policies is guided by heuristics and is also to a large ex-tent dictated by computational complexity considerations in the Monte Carlo sim-ulation context. We find that, out of the set of base policies evaluated, the randomaction selection yields the best search time performance. This result indicates thatthe solution of our problem benefits from a relatively high degree of exploration(by random walk) of the reachable belief space to reduce uncertainty, whereas thegreedy policy grants too much weight to unreliable information about the targetlocation, embodied by the current belief state.Figure 3.5 shows the cdfs of the search time for parallel rollout (3.19). Wedefine 2 base policy sets, which are combinations of random action selection with55constant action selection, and random action selection with greedy action selection,respectively. Due to the use of a heuristic for the terminal cost, we cannot rely on aguarantee of performance improvement over the single policy case (the cdf for thesingle policy rollout under random action selection is shown for reference). Theparallel rollout of random and constant action selection is seen to have improvedperformance over random action selection alone most of the time, but not always.By reducing the rollout horizon to H = 2, any performance differences between thepolicies diminish (Figure 3.6). This is expected, considering that for H = 1 (themyopic case), the choice of base policy becomes altogether immaterial.3.7.3 Empirical ObservationsThe search times obtained for the generative model M3 constitute a lower bound onthe search time performance, because M3 generates idealized hop count observa-tions which are independent and whose distribution matches that of the observationmodel (3.2). Another set of simulations is designed to evaluate the performance forthe empirical models M1 and M2, both of which generate observations based onfirst-passage hop counts in WSNs. Observations generated by M1 represent a re-alistic set of empirical hop counts in a single instance of a WSN, whereas M2generates observations to be statistically independent, by virtue of being sampledfrom independent realizations of WSNs. Figure 3.7 compares the cdfs of the searchtime for the 3 generative models. The performance loss experienced under modelM2 is attributable to the approximate nature of the observation model (3.2). Theadditional performance loss under the realistic model M1 is explained by the statis-tical dependence of the empirical observations. This rather significant effect shows,that the naïve Bayes assumption of independent observations incurs a large perfor-mance penalty in this application. We discuss aproaches of mitigating observationdependence in Section 3.8.The relative performance advantage of rollout over the non-myopic, information-driven approach (Figure 3.3) is generally preserved under the generative hop countmodels M1 and M2, as Figure 3.8 shows.560 20 40 60 80 100 1200. timeCDFl rollout (randomAction)mutual information (myopic)l l l l l l llllllllll ll l ll l l l l l lFigure 3.2: CDF of the search time for rollout under random action selec-tion, compared to the search time for myopic mutual information utility.Rollout horizon H = 4, hop counts are generated by model M3. Errorbars indicate the 95% confidence intervals.3.8 Statistical Dependence of ObservationsIn order to gain insight into the statistical dependence between hop count obser-vations, we compute the local average deviation of the hop count from the modelmean, that is, we estimate E[z− µ(r)]. Local averaging is achieved by Gaussiankernel smoothing with σ = 0.1. The average deviation of the hop count from themodel mean over the WSN deployment area [0,1]2 is shown in Figure 3.9 for asource node at the center, and for the generative hop count models M1-M3 definedin Section 3.2.1, respectively.It is helpful to picture, somewhat informally, the broadcast message propagat-ing through the WSN as a tree-like process. Clearly, the hop counts on relatedbranches are dependent random variables. As a consequence, we find that locally,there can be a noticeable average deviation of the empirical hop count (M1) fromthe model mean (Figure 4.1a). For independent empirical hop counts as generated570 20 40 60 80 100 1200. timeCDFl rollout (randomAction)mutual information (nonmyopic)l l l l l l llllllllll ll l ll l l l l l lFigure 3.3: CDF of the search time for rollout under random action selec-tion, with horizon H = 4, compared to the search time for non-myopicmutual information utility. Rollout horizon H = 4, hop counts are gen-erated by model M3. Error bars indicate the 95% confidence intervals.by model M2, the deviations are much smaller (Figure 4.1b), similar to the inde-pendent hop counts drawn from the translated Poisson model M3 (Figure 4.1c).The statistical dependence between hop count observations can be more for-mally characterized in terms of the correlation coefficientρ = E[(Zk−µk)(Zk−1−µk−1)]σZkσZk−1(3.40)where Zk−1 is the hop count observed at distance r from the target, and Zk the hopcount at a distance dr away from the first observation. The correlation coefficientρ(r,dr), as a function of the distance r from the target and the inter-observationdistance dr, is shown in Figures 3.10a, 3.10b, 3.10c, for the hop count models M1,M2 and M3 described in Section 3.2.1, averaged over 8 WSN realizations.It seems clear that in order to approach the search time performance achievedby the rollout algorithm under idealized hop count observations M3, the depen-580 20 40 60 80 100 1200. timeCDFl rollout (randomAction)rollout (constantAction)rollout (greedyAction)l l l l l l llllllllll ll l ll l l l l l lFigure 3.4: CDFs of the search time for rollout under 3 different base poli-cies: random, constant and greedy action selection. Rollout horizonH = 4, hop counts are generated by model M3. Error bars indicate the95% confidence intervals.dence between empirical observations must be addressed. The value of model M2is in establishing a lower bound for the expected search time achievable by anytechnique for mitigating the dependence of the empirical observations (under oth-erwise identical rollout parameters).We discuss two approaches to reduce the impact of observation dependence.It may be possible to achieve acceptable search-time performance by effectivelyavoiding the use of (strongly) dependent observations. Failing this approach, theobservation dependence must be taken into account explicitly in the observationmodel.3.8.1 Mitigation of Observation DependenceThe dependence of the empirical hop count observations (M1) can be reduced inthe following manner: rather than relying on a single broadcast, the source nodelaunches a sequence of broadcasts, each one effectively inducing a new, indepen-590 20 40 60 80 100 1200. timeCDFl rollout (randomAction)parallel rollout (random, constant)parallel rollout (random, greedy)l l l l l l llllllllll ll l ll l l l l l lFigure 3.5: CDF of the search time for rollout under random action, com-pared to 2 parallel rollout approaches: random and constant action se-lection, random and greedy action selection. Rollout horizon H = 4,hop counts are generated by model M3. Error bars indicate the 95%confidence intervals.dent realization of the random node-to-node delays. When interrogated by thesearcher, a node reports the hop count of one (e.g. selected at random) of thebroadcast messages received. Unfortunately, this approach is not energy efficient.Figure 3.10a suggests that the correlation between successive observations ofthe empirical hop count decreases with the distance between two observations. Wemay increase the average inter-observation distance by reducing the observationprobability q. Unfortunately, this also reduces the number of hop count observa-tions available to the searcher in a given time, and contributes to longer searchtimes. A reduction of the number of observations made by a single searcher, how-ever, can be offset by deploying multiple cooperating searchers, which can increasethe spatial diversity of the hop count observations, thereby further reducing depen-dence between observations.In our application, it is also possible to explicitly determine the degree, in terms600 20 40 60 80 100 1200. timeCDFl rollout (randomAction)parallel rollout (random, constant)parallel rollout (random, greedy)l l l l l l lllllllll ll l ll l l l l l l lFigure 3.6: CDF of the search time for rollout under random action, com-pared to 2 parallel rollout approaches: random and constant action se-lection, random and greedy action selection. Rollous horizon H = 2,hop counts are generated by model M3. Error bars indicate the 95%confidence intervals.of graph distances, to which observations are related. This approach requires, thateach message does not only convey the ID of its source node, but the entire historyof IDs of nodes visited. It is then easy for the searcher to determine the most recent,common ancestor node for a given pair of observations, which is indicative of theirdegree of dependence. The searcher can then decide how to make use of a newobservation, if at all.3.8.2 Explicit Model of Observation DependenceIn order to explicitly take observation dependence into account, we propose to aug-ment the hop count observation model (3.2) by computing the conditional proba-bility distribution of the current hop count, fZ(zk|zk−1;rk,rk−1), given the previousobservation and based on the correlation coefficient. The correlation coefficient isassumed to be known and expressed as a function of the distance from the target610 20 40 60 80 100 1200. timeCDFl rollout (idealized observations)rollout (empirical, independent)rollout (empirical)l l l l l l llllllllll ll l ll l l l l l lFigure 3.7: CDFs of the search time for rollout under random action, withhorizon H = 4, for the 3 hop count generation models M1, M2 and M3.Error bars indicate the 95% confidence intervals.and the inter-observation distance, as in Figure 3.10.Models for stationary processes of count data have been developed by variousauthors [11, 55, 114]. We consider the discrete-valued, auto-regressive INAR(1)model with support restricted to the non-negative integers [1], as an analogue tothe linear, auto-regressive AR(1) model. We adapt the INAR(1) model to describethe hop count observation at time k by lettingZk−ζk = ρ ◦ (Zk−1−ζk−1−∆ζ )+(1−ρ)◦ (Vk−ζk) (3.41)where Zk and Zk−1 represent the current and the previous hop counts, ζk and ζk−1are the minimum hop counts (3.2) due to the finite transmission radius R, and∆ζ = ζk− ζk−1. The variable Vk is referred to as the innovation. The correlationcoefficient ρ is assumed to be non-negative. The binomial thinning operator fordiscrete random variables, denoted ◦, can be interpreted as an analogue to the mul-tiplication by a scalar a ∈ [0,1]. Binomial thinning of a discrete random variable Z620 20 40 60 80 100 1200. timeCDFl rollout (idealized observations)rollout (empirical, indep.)rollout (empirical)nonmyopic MI (idealized observations)nonmyopic MI (empirical, indep.)nonmyopic MI (empirical)Figure 3.8: CDFs of the search time for rollout under random action, withhorizon H = 4 and for the 3 hop count generation models M1, M2 andM3. For comparison, the corresponding results for the non-myopicmutual information utility are shown to demonstrate that the perfor-mance advantage of our search time heuristic over the non-myopic,information-driven approach is generally preserved, regardless of thechoice of generative hop count model. Error bars indicate the 95% con-fidence intervals.is defined in [101] bya◦Z ,Z∑i=1Bi (3.42)where the Bi are i.i.d. Bernoulli random variables, with Pr{Bi = 1} = a andPr{Bi = 0} = 1− a. If the innovation Vk in (3.41) is an i.i.d. translated Poissonvariable with parameter λ , and provided that ζk = ζk−1, it can be shown that thestationary distribution of Zk is also translated Poisson with parameter λ . This fol-lows from the stationary behavior of the INAR(1) model for Poisson innovations63[55]. Given the hop count observation Zk−1 = zk−1, the conditional probability dis-tribution fZ(zk|zk−1;rk,rk−1) is the convolution of a binomial distribution with aPoisson distribution [55], that isfZ(zk|zk−1;λ ,ζ ) =m∑n=0(zk−1−ζn)ρn(1−ρ)zk−1−ζ−neλ1−ρ(λ1−ρ)zk−ζ(zk−ζ−n)!(3.43)where m = min(zk − ζ , zk−1 − ζ ) and λ , ζ are the parameters of the translatedPoisson innovation Vk, with µ(rk) = λ +ζ and ζ = drk/Re as defined in (3.2).3.9 ConclusionWe have proposed rollout algorithms to approximately solve a path-constrainedsearch problem, involving an autonomous searcher observing message hop countsin a WSN, formulated as an online POMDP. The rollout approach uses a novelheuristic for the terminal cost-to-go, making the approach computationally tractable.It was shown, by simulation, that the algorithms outperform both myopic and non-myopic mutual information-driven search approaches. A comparison of the searchperformance under different base policies and parallel rollout shows the poten-tial for further performance improvement, though likely to be gained at additionalcomputational cost.We have also evaluated the search performance under different generative mod-els for the hop count, varying from idealized to more realistic, to quantify the per-formance loss due to statistically dependent observations. We have discussed meth-ods of reducing dependence, and adapted an integer autoregressive process modelto account for the dependence explicitly. Future research will focus on models ofthe observation dependence and their performance evaluation.64 −2  −1.5  −1  −0.5  −0.5  −0.5  −0.5  0  0  0  0  0.5  0.5  1  1  1  1.5  1.5 0.0 0.2 0.4 0.6 0.8 Empirical (M1) 0  0  0 0.0 0.2 0.4 0.6 0.8 Empirical, independent (M2) 0  0  0  0 0.0 0.2 0.4 0.6 0.8 Idealized (M3)Figure 3.9: Deviation of the local average hop count from the model meanover the unit square [0,1]2, with source node at center.65distance from source nodeinter−observation distance 0.1  0.2  0.3  0.4  0.5  0.6  0.7 0.05 0.10 0.15 0.20 0.25 0.30 0.350. Empirical (M1)distance from source nodeinter−observation distance 0  0  0  0  0  0.1  0.2  0.3  0.4  0.6 0.05 0.10 0.15 0.20 0.25 0.30 0.350. Empirical, independent (M2)distance from source nodeinter−observation distance 0  0  0.1  0.2  0.3  0.6 0.05 0.10 0.15 0.20 0.25 0.30 0.350. Idealized (M3)Figure 3.10: Correlation between observations for different generative hopcount models, as a function of the distance from the source node andthe inter-observation distance.66Chapter 4WSN-assisted AutonomousMapping4.1 IntroductionA mobile, autonomous platform is assisted by a wireless sensor network in its taskof inferring a map of the spatial distribution of a physical quantity that is mea-sured by the sensor nodes. Sensor nodes initiate a broadcast in the network, whenthe measured quantity assumes a value in the range of interest. Specifically, weconsider randomly deployed networks of location-agnostic wireless sensor nodes,which broadcast messages by flooding. The node-to-node delays are assumed tobe random. In networks of this type, the hop count of a broadcast message, giventhe distance from the source node, can be approximated by a simple parametricdistribution. The mobile platform can interrogate a nearby sensor node to obtain,with a given success probability, the hop counts of the broadcast messages origi-nating from different source nodes. By fusing the information from successive hopcount observations, the mobile platform infers the locations of the source nodesand thereby, the spatial distribution of the quantity of interest. The path taken bythe mobile platform should minimize the resulting mapping error as quickly aspossible. We propose an information-driven path planning approach, in which themobile platform acts by maximizing a weighted sum of myopic, mutual informa-tion gains. We show by simulation, that a suitable adaptation of the weights is67effective at reducing the error between the true and the inferred map, by preventingthe information gain to be dominated by only a few source nodes.4.1.1 Background and MotivationExploration and mapping of unknown environments is a fundamental applicationof autonomous robotics [109, 110]. In its most general form, referred to as thesimultaneous localization and mapping (SLAM) problem [3, 25], neither a map ofthe environment nor the location of the autonomous sensor platform within it, areknown a priori and must be inferred from observations.Much research has been devoted to the problem of generating maps of physi-cal objects or obstacles in those environments by autonomous platforms typicallyequipped with sensor technologies such as laser or ultrasound range finders, imag-ing sensors etc. [88, 98]. In order to enhance the situational awareness of anautonomous platform even further, we may be faced with the task of determiningthe spatial distribution of a physical quantity, for example the concentration of achemical, which cannot observed from a distance, but whose measurement requiresdirect physical contact with the sensor. In this case, it would be rather impracticalto use an autonomous sensor platform to obtain a dense map of the quantity ofinterest. Instead, wireless sensor networks (WSNs), due to their pervasive sensingability, seem to be a more appropriate solution to this task. In principle, messagescan convey sensor readings and corresponding node locations (provided the nodesare location-aware) via multiple hops through the WSN and to a data sink, wherea spatial map of the observed quantity is then easily generated. In a surveillanceapplication described in [21], sensor nodes are assumed to establish their own po-sitions by GPS or other localization methods, which are subsequently used in thereporting of events of interest to an unmanned aerial vehicle (UAV). However,inherent cost and complexity limitations of most practical WSNs can make an ap-proach requiring full node location awareness and routing capability difficult andexpensive to implement. When deployed in unknown and possibly inhospitableenvironments, it is reasonable to assume that the sensor node placement is random(with a known average density), and that the node locations are not known a priori.The requirement to establish accurate sensor node locations and multi-hop routes68to the data sink adds considerable complexity to the hardware and software, whichoften conflicts with the sensor node energy and cost budgets, especially those im-posed by very simple, low-cost and low-power devices.We consider an approach in which a mobile data sink (or “mapper”) operatesin the area to be mapped and in which a WSN has been deployed, by placing sen-sor nodes uniformly at random to take measurements of a quantity of interest. Weassume that due to cost and operational constraints, we are limited to use location-agnostic sensor nodes. The mapper can interrogate nearby sensor nodes directly toobtain message hop count information, based on which the source locations associ-ated with the received messages are estimated. The hop count is statistically relatedto the distance between the source node and the interrogated node. The mobile plat-form, which is assumed to be location-aware, infers a spatial map of the quantity ofinterest based on the hop count information. Messages are disseminated from thesource nodes through flooding with random node-to-node transmission delays, ina process known as first-passage percolation [15]. We assume furthermore, that in-terference between message transmissions can be neglected, see also Section 2.2.2.For such a network, it was shown in Chapter 2 that the message hop count, condi-tioned on the distance from the source node, can be approximated by the marginaldistribution of a simple stochastic process.This chapter is motivated by the problem of an autonomous mapper whichis given the task of estimating the spatial distribution of a physical quantity, byrelying solely on observations of message hop counts in WSNs of the type studiedin Chapter 2. For simplicity, suppose that we are only interested in obtaining abinary map, indicating the sensor node locations at which the measured quantityhas a value exceeding a threshold. Sensor nodes measuring sub-threshold valuesdo not initiate a message broadcast, which reduces network traffic. The problemfor the mapper can be described as finding an path, along which a maximum ofinformation is gathered to infer the map.We focus on determining a path along which the autonomous mapper interro-gates the wireless sensor nodes to acquire message hop count observations asso-ciated with each map element. The objective is to reduce the error between theinferred and the true map as much as possible, over a finite time horizon.694.1.2 Chapter ContributionWe propose a path planning method based on a parametric, myopic information-theoretic utility, which is defined as a weighted sum of the mutual informationbetween the random variables describing the source node locations and future hopcount observations. The weights are adapted using heuristics to offset the myopicnature of the utility, in order to improve mapping performance relative to a plainmutual information utility, but without the expense of true non-myopic path plan-ning.4.1.3 Chapter OrganizationThe chapter is organized as follows. Our system model, which includes a wirelesssensor network component and an autonomous mapper component, is presentedin Section 4.2. The wireless sensor network model is largely identical to the onedescribed in Section 3.2.1. The mapping path planning algorithm is developed inSection 4.3. We present simulation results in Section 4.4 and draw conclusions inSection 4.5. To simplify notation of probabilities, we omit the names of randomvariables when this is unambiguous.4.2 System Model4.2.1 Wireless Sensor Network ModelWe start by recalling the main characteristics of the WSN model used in Section3.2.1 for target search. The WSN deployment area is assumed to be the unit square[0,1]2 ⊂ R2. Sensor nodes are distributed according to a spatial Poisson pointprocess of density λ , restricted to the deployment area. The nodes have a com-munication range of R, thus inducing a random geometric network graph Gδ withmean node degree δ = piλR2. We assume that the design parameters λ ,R are cho-sen such that the mean node degree exceeds a critical threshold, above which thenetwork is connected with high probability.For the simulation of an autonomous mapper making hop count observations,it is convenient to discretize both the mapper position and the sensor node coordi-nates. Rather than placing sensor nodes randomly in [0,1]2 according to a Poisson70point process, we restrict possible node locations to a finite set of N = L2 grid co-ordinates, C = {0,1/L, . . . ,(L−1)/L}2, indexed by the set X = {1, . . .N}. Werefer to X as the set of cells. A cell is occupied by a sensor node with probabil-ity p. The resulting random node placement over C approximates a Poisson pointprocess with density λ = pN restricted to [0,1]2.The true mapM to be inferred by the mapper is defined as the subset of sensornodes in X , who initiate a broadcast in response to measuring a value in excess ofa predefined threshold,M = {x ∈X : x is source of a broadcast}. (4.1)The sensor node positions are described by the set of random variables M(i) withrealizations m(i), where i = 1 . . . |M |.Sensor nodes are assigned unique IDs, which are used to tag messages by thenode which initiates a broadcast. Every node forwards the messages to all of itsneighbors within the communication range; however, duplicates of a message withthe same ID are discarded in order to avoid unnecessary retransmissions. Eachnode-to-node link has a transmission delay associated with it. We assume that thenode-to-node delays are independent, exponentially distributed random variableswith a common mean. This is a reasonable assumption for networks where com-munication is unreliable and retransmissions are required, or where sensor nodesharvest energy from unreliable environmental sources and may be dormant for ran-dom periods. In such a network, the observed hop count at node x′ is the hop countof the first-passage path, that is the pathγ∗ = argminγ∈Γ(x,x′){Td(γ)} (4.2)which minimizes the message transmission delay Td(γ) for all paths γ ∈ Γ(x,x′)connecting nodes x and x′. In Chapter 2, it was established that for networks withstrongly supercritical mean node degree δ , the hop count at distance r from thetarget can be approximated by a translated Poisson distribution,fZ(z;r) = Pois(z−ζ (r);λ (r)) z≥ ζ (r) (4.3)71where λ (r) = µ(r)−ζ (r), with a mean hop count given by µ(r) = θ1r + θ2 andminimum hop count ζ (r) = dr/Re. The parameters θ1,θ2 may be inferred usingmaximum likelihood estimation as described in Section 2.3.3. It is worth notingthat θ1,θ2 are invariant to the choice of the common mean node-to-node delay,which may depend on operating conditions such as the average available energy.In other words, θ1,θ2 only depend on network design parameters and can thereforebe estimated off-line.For the simulation of map inference in this chapter, we apply the idealizedgenerative hop count model M3, as defined in Section Autonomous MapperThe autonomous mapper acts by moving, at discrete times, among the cells definedby the grid coordinates C = {cx}x∈X . At every time step k, the mapper movesto one of the four neighbor cells, subject to mapping area boundary conditions.We assume that the mapper position, given by the variable y ∈X , is completelydetermined by the action. Note that in many mapping applications, this assumptionis relaxed to account for uncertainty in the mapper position [110].If the visited cell is occupied by a sensor node, a set-valued observation ofmessage hop counts is obtained with probability q. Otherwise, the mapper receivesno response at all. A single observation z =(z(1), . . . ,z(|M |))is a set of messagehop counts associated with all source nodes in the set M defining the true map.Formally, we define the set of observations as Z = N0∪{φ}, where N0 are hopcounts and φ denotes the observation of no response. The mapper‘s observationmodel Pr{z(i)|m(i),y} is defined byPr{0|m(i),y}=q, if m(i) = y0, otherwise(4.4)Pr{φ |m,y}=1−q, if m = y1−qp, otherwise(4.5)Pr{n|m,y}=0, if m(i) = yqp fZ(n;r(m,y)), otherwise(4.6)72for n > 0. Here, fZ(n;r(m(i),y)) is the translated Poisson model (4.3), with theEuclidean distance r(m(i),y) = ‖cm(i) − cy‖2 between the map element at positionm(i) and the mapper at position y.A significant advantage of the proposed mapping method is that, owing to theunique message IDs, the feature association problem inherent in many mappingapplications [110, 116], is sidestepped: there is never ambiguity about the associa-tion of a hop count observation z(i) with the originating map element.Inference of the mapM from hop count observations requires the computationof a belief state over the space of maps, that is, an a posteriori probability distri-bution, conditioned on the history of mapper actions and observations. The beliefstate has the Markov property: after taking an action and making an observation,the new belief state is conditionally independent of the past, given the previousbelief state,bk(m) = Pr{m|z1:k,y1:k,b0}= Pr{m|zk,yk,bk−1}. (4.7)It is generally impractical to compute the joint a posteriori probability distributiondue to the combinatorial size of the hypothesis space: the number of binary maphypotheses for our problem is(N|M |)(4.8)However, by assuming that the map elements M(i) are pairwise independent ran-dom variables, we need to compute only the marginal a posteriori probability dis-tributions, asPr{m|zk,yk,bk−1}=|M |∏i=1Pr{m(i)|z(i)k ,yk,bk−1}(4.9)With every new set-valued hop count observation, the marginal a posteriori beliefstates can be updated recursively by applying Bayes’ rule,bk(m(i))= η Pr{z(i)k |m(i),yk}bk−1(m(i))(4.10)73where the normalization factor η is given byη−1 = ∑m(i)∈XPr{z(i)k |m(i),yk}bk−1(m(i))(4.11)4.3 Mapping Path PlanningIn this section, we formulate a mapping path planning algorithm based on aninformation-theoretic utility. Myopic, information-theoretic utilities have beenused for path planning in a variety of target search and tracking problems [43,110, 112] as well as mapping applications [14, 99].Assume that at time k, the mapper considers its next action ak+1 ∈Ayk , whichtakes the mapper to position yk+1. Then, for a given map element M(i), i.e. thesource node of a broadcast, the expected gain in information about its location isthe mutual information [19]I(M(i);Z(i)k+1|ak+1). (4.12)The information gain for a map, under the assumed pairwise independence of itselements, is the sum of the mutual information of the elements. We define themapper utility for taking the action ak+1 at the next time step as the weighted sumof the mutual informationU(ak+1) =|M |∑i=1w(i)k I(M(i);Z(i)k+1 |ak+1). (4.13)The weights w(i)k may be adapted to offset the myopic nature of the utility, withoutincurring the computational cost of a true, non-myopic planning algorithm. Themapper selects the action which maximizes the utility (4.13) at the next time step,ak+1 = argmaxα∈AykU(α). (4.14)In many situations, the mapper is expecting to realize a large information gain at thenext time step, by detecting a nearby map element, which is equivalent to observinga hop count of zero. The detection of a map element M(i) has a high information74content as the entropy H(M(i)) becomes zero at once, while observations of mapelements further away do not generally promise any large information gains. Amyopic utility function prevents the mapper from taking future information gainsappropriately into account, by excessive focus on imminent detections. As a result,the mapper may dwell in the neighborhood of some nodes for an extended timeseeking to make detections, even though the location estimates of these nodes mayalready be sufficiently accurate for the purposes of mapping, and likely to improveover the remaining horizon.We propose two heuristics for adapting the weights w(i)k in the utility function(4.13). The first heuristic simply excludes map elements from further considerationin the path planning, for which the minimum hop count observed up to time k,z(i)min,k = minn=0...kz(i)n (4.15)is smaller than a predefined threshold z0, by adapting the weights throughw(i)k =1, if z(i)min,k ≥ z00, otherwise.(4.16)The second heuristic is motivated by the intuition, that map elements for whichsufficient information has been acquired, have low entropy. Weights are adaptedbased on the entropy of corresponding map elements (in addition to the minimumhop count),w(i)k =H(M(i))Hmax, if z(i)min,k ≥ z00, otherwise,(4.17)where H(M(i)) is the entropy of map element M(i), and Hmax = logN denotes themaximum entropy for any map element, that is, the entropy of the a priori uniformbelief state over N cells for the location of a map element.754.4 Simulation Results4.4.1 Simulation SetupWe have simulated the inference of a binary map M ⊂ X by an autonomousmapper, assisted by a wireless sensor network which is represented by a generativehop count model.The WSN deployment area is assumed to be the unit square. The mapping gridis defined by dividing the deployment area into N = 32×32 cells. Sensor nodelocations are restricted to the discrete cell coordinates, an appropriate assumptiongiven the discrete nature of the mapping problem. A cell is occupied by a sensornode with probability p = 0.75.Let the map represent the spatial distribution of some physical quantity (e.g.concentration, temperature, presence/absence) measured by the sensor nodes. De-spite the multitude of possible phenomena we might be interested in mapping, itis reasonable to assume that most physical processes are spatially correlated, withi.i.d. processes being rather uncommon.The hop count is modeled in an idealized fashion as described in Section 3.2.1:observations are generated as independent draws from a translated Poisson distri-bution (4.3), given the distance r from the map element associated with the hopcount observation. The model parameters θ1,θ2 governing the mean hop count of(4.3) are determined off-line by maximum likelihood estimation as described inChapter 2, given a node occupation probability p = 0.75 and a network mean nodedegree of δ = 40.The mapper has a success probability q = 0.75 of making a set-valued obser-vation of hop counts, z =(z(1), . . . ,z(|M |)), given that the visited cell is occupiedby a sensor node.The mapper then moves to the immediate neighbor cell which maximizes theutility (4.14), subject to mapping area boundary constraints.764.4.2 Simulation of Map InferenceThe result of the map inference is assessed in terms of two metrics, the averageentropy per inferred map element,H¯ =1|M ||M |∑i=1H(M(i)) (4.18)and the mean square error between the true map and the inferred map, which wedefine asmse =1|M ||M |∑i=1(1−b(M (i)))2, (4.19)where the b(M (i)) denote the a posteriori probabilities of the elements of the truemap, M .The weights wi in (4.13) are used to mitigate the myopic nature of the utilityfunction, and are adapted heuristically, as described by (4.16) and (4.17), respec-tively.Simulation results for a single, random realization of a map are shown in Figure4.1, with simulation parameters as defined in the preceding section. The averageentropy per map element and the mean square error as functions of time are shownin Figure 4.2 and Figure 4.3. The final average entropy and the MSE after 150time steps are summarized in Table 4.1 and Table 4.2, for different values of theminimum hop count parameter z0 which controls the weights wi. We find thatthe adaptive, weighted mutual information utility is more effective at reducing theerror between the true and the inferred map than a plain mutual information utility(corresponding to (4.16) with z0 = 0). We explain this result by the adaptation ofthe weights wi in the path planning process, which assigns more importance to theinformation gain from map elements whose location is still relatively uncertain. Itis also easy to see from Tables 4.1 and 4.2 that for larger z0, the performance startsto decrease again as map elements are excluded from the utility, and thus from pathplanning, too soon.77Table 4.1: Map average entropy and MSE, wi adapted according to (4.16)z0 H¯ [bit] MSE0 3.39 0.721 3.39 0.722 2.23 0.523 2.07 0.474 2.34 0.54Table 4.2: Map average entropy and MSE, wi adapted according to (4.17)z0 H¯ [bit] MSE0 2.43 0.561 2.43 0.562 1.94 0.443 2.26 0.514 2.41 0.554.5 ConclusionWe have proposed a mapping approach involving an autonomous mapper inter-acting closely with a randomly deployed wireless sensor network, using the hopcount based observation model developed in Chapter 2 to infer a map of a physicalquantity measured by the sensor nodes. The path of the mobile sink is governedby a utility function based on a weighted sum of the myopic, mutual information(4.13) between the random variables describing the individual map elements andthe associated future hop count observations. We propose and simulate a heuristicfor adapting the weights wi in the utility function, based on a minimum hop countcriterion (4.16) and the entropy of the map elements (4.17). It is found that theweighted mutual information utility is effective in improving map inference rela-tive to the plain mutual information utility, in terms of both the average entropy permap element and the mean square error between the true and the inferred map, andwithout the cost and complexity of a true, non-myopic algorithm.78(a) True Map(b) Inferred Map(c) Mapping PathFigure 4.1: The true map, the inferred map and the autonomous mapper path,shown with hop count observations and non-responses over 150 timesteps.790 50 100 15002468time stepsavg. entropy [bit]Figure 4.2: The average entropy per map element over time0 50 100 1500. stepsmseFigure 4.3: The mean square error between true and inferred map over time80Chapter 5Conclusions and Future WorkThis chapter provides the conclusions of this thesis and suggests directions of fu-ture research.5.1 ConclusionsThis thesis addresses several challenges that are encountered in joint applicationsof WSNs and autonomous platforms. The need for such applications arises forexample in autonomous mapping or search and rescue, where the pervasive sen-sor coverage of a WSN and the mobility of an autonomous platform (which mayhave additional on-board sensing and actuating capabilities specific to the missionrequirements) complement each other well. Assuming that the autonomous plat-form is able to directly interrogate nearby sensor nodes, complex computationaland communication tasks can be offloaded from the sensor nodes, leading to po-tentially significant simplifications of the node hardware and software. The WSNrequirements laid out in this thesis are very simple: sensor nodes are location-agnostic and messages are disseminated by flooding, which eliminates the need forspecial-purpose localization hardware and complex routing techniques.In Chapter 2, we have proposed a new approach to model the hop count dis-tribution between a source and a sink node in broadcast WSNs, in which messagepropagation can be described by first-passage percolation. A limitation of the first-passage percolation model is the assumption of i.i.d. node-to-node message delays,81a condition which may not be satisfied if node-to-node transmissions interfere witheach other. In this thesis, we assume that such interference can be neglected andfurthermore, in Chapter 4, that multiple simultaneous broadcasts do not interact.Our approach to model the hop count differs from related works in that the hopcount distribution arises as a maximum entropy model, given a priori informa-tion about the hop count process from first-passage percolation theory and undera few simplifying assumptions. The resulting, approximate hop count distributionis shown to have a translated Poisson distribution. The maximum entropy methodis very general and can accommodate additional a priori information, such as newtheoretical results e.g. about higher moments of the first-passage path length.Simulation results confirm, that for WSNs which are characterized as stronglysupercritical (i.e. with a mean node degree δ  δc), the empirical hop count dis-tribution is well approximated by our model. In these WSNs, the localization errorresulting from the application of the translated Poisson model is generally small.For node degrees approaching the critical percolation threshold δc, as well as forsmall source-to-sink distances, the error performance degrades however. This canbe attributed in part to some of the a priori assumptions made in the maximumentropy model, which are only known to hold asymptotically, such as the linear-ity of the mean hop count with distance, or the independence of the hop countincrements. Since the network is not connected below the critical threshold, butrather consists of a collection of disconnected nodes and small connected clusters,a broadcast message cannot propagate through (most) of the WSN and the hopcount model breaks down in the subcritical regime.Due to its low computational complexity, we expect the proposed, approximatehop count distribution to be a good candidate for use as the observation model inBayesian target localization or mapping applications involving low-cost WSNs,which rely on message hop count (and possibly other information, such as RSSI).In these applications, the observation model must be evaluated repeatedly, typicallyfor a very large number of location hypotheses. The proposed hop count model hasthe advantage, that the model parameters are functions only of the sensor node den-sity and transmission range, and can therefore be estimated off-line. Importantly,the model parameters are invariant to the mean node-to-node message delay, whichin turn may depend on operating conditions, such as the average power available82to the sensor nodes.In the remaining chapters, we have considered two generic applications of amobile platform assisted by a WSN: autonomous target search and autonomousmapping.In Chapter 3, we have proposed algorithms to approximately solve a path-constrained target search problem involving an autonomous searcher assisted by aWSN, using the hop count observation model developed in Chapter 2. The searchproblem is formulated as a POMDP, whose solution is generally intractable. Wefind an approximate solution by using policy rollout, i.e. an online Monte Carloplanning method. Our rollout approach uses a novel heuristic to approximate theterminal cost-to-go at the rollout horizon, thereby making the approach computa-tionally efficient. Our simulations are restricted to the case of a stationary target,due to the focus on the hop count observation model and the resulting performanceof rollout, compared to mutual information methods.It was shown, by simulation, that the rollout algorithm consistently outper-forms non-myopic mutual information-driven search approaches, at comparablecomputational cost. A comparison of the search-time performance under differentbase policies and parallel rollout shows potential for further performance improve-ments, although any performance gain will likely come at additional computationalcost. We have evaluated the search-time performance under different generativemodels for the hop count, varying from idealized to more realistic, to quantify theperformance loss due to statistically dependent observations, given the source nodelocation. We find that in our application, a reliance on the naïve Bayes assumptioncarries a significant performance penalty. Therefore, we adapt an integer autore-gressive process model to the hop count observations, which accounts for theirdependence explicitly.Also in Chapter 3, we have shown that a target search problem described interms of an explicit tradeoff between exploitation and exploration (referred to as“infotaxis” [112]), is mathematically equivalent to a target search with a myopicmutual information utility. This result confirms the intuition that a mutual informa-tion utility optimally balances exploratory and greedy behavior, when informationgain is the objective. Furthermore, in Chapter 3 we provide a lower bound onthe expected search time for multiple uniformly distributed searchers in terms of83the searcher density, based on the contact distance in Poisson point processes: thetime to reach the target is inversely proportional to the square root of the searcherdensity.In Chapter 4, we have considered an autonomous mapping approach involvingan autonomous mapper interacting closely with a WSN, using the hop count basedobservation model developed in Chapter 2, to infer a map of a physical quantitymeasured by the sensor nodes. The problem of planning an optimal path for themapper is generally intractable. Myopic planning approaches are common, butmay provide only poor performance. In our approach, the path of the mapper isgoverned by a parametrized utility function based on a weighted sum of the my-opic, mutual information between the random variables describing the individualmap elements and the associated future hop count observations. We propose andsimulate a heuristic for adapting the parameters of this utility function based onboth a minimum hop count criterion and the entropy of the map elements. It isfound that the weighted mutual information utility is effective in improving mapinference over a plain mutual information utility, in terms of both the average en-tropy per map element and the mean square error between the true and the inferredmap, but without the cost and complexity of a truly non-myopic algorithm.5.2 Future Work5.2.1 Parametric Models of the Hop Count DistributionWhile the translated Poisson model has been shown to arise as the maximum en-tropy distribution consistent with our a priori assumptions about the hop count,the stochastic jump process admits a more general compound Poisson distribution,providing additional degrees of freedom in the model identification. Given furtherinformation or assumptions about the hop count process (for example higher mo-ments of the length of the first-passage path), maximum entropy modeling mayyield an improved fit for the hop count distribution.845.2.2 Statistical Dependence of Hop Count ObservationsWe have found that the statistical dependence of the hop count observations has asignificant, degrading effect on the search-time performance. Future research willfocus on models of the statistical observation dependence and their performanceevaluation. It is an open question whether a parametric approach, such as ourinteger autoregressive model, or a simulation-based approach will be more efficientto address this issue, and more work is required to determine the tradeoffs.5.2.3 Simulation-based Observation ModelsAt present, the autonomous searcher relies on simple parametric models for boththe hop count distribution and the statistical dependence of the observations, whichhave the advantage of low computational cost. It is likely however, that furtherperformance improvement will be gained only at the cost of increased complexityof the parametric formulations. Under these conditions, an alternative is to con-sider Monte Carlo simulation to model the hop count observation statistics. Theincreased computational cost of simulation may be offset by the elimination of in-creasingly complex parametric modeling. More importantly, simulation providesmuch greater flexibility in the choice of WSN message propagation mechanisms,for which no parametric hop count models are known, or are exceedingly complex.5.2.4 Multi-Modal ObservationsIn many WSNs, additional information related to sensor node locations is availableat little or no extra cost (e.g. received signal strength, or RSSI). Future work willdetermine how such information can be optimally “fused” with the hop count infor-mation to improve localization performance. Furthermore, in joint applications ofWSNs and autonomous platforms, it is reasonable to assume that the mobile plat-form is equipped with on-board sensors (e.g. laser or ultrasound ranging, imagingsensors etc.). Information fusion algorithms have to be designed to make optimaluse of all available sensors, WSN as well as on-board.85Bibliography[1] M. A. Al-Osh and A. A. Alzaid. First-order integer-valued autoregressiveINAR(1) process. Journal of Time Series Analysis, 8(3):261–275, May1987.[2] N. Antunes, G. Jacinto, and A. Pacheco. On the minimum hop count andconnectivity in one-dimensional ad hoc wireless networks.Telecommunication Systems, 39(2):137–143, 2008.[3] J. Aulinas, Y. Petillot, J. Salvi, and X. Lladó. The SLAM Problem: ASurvey. In Proceedings of the 11th Conference on Artificial IntelligenceResearch and Development, pages 363–371, Oct. 2008.[4] A. Baddeley, I. Bárány, and R. Schneider. Spatial point processes and theirapplications. In Stochastic Geometry, volume 1892 of Lecture Notes inMathematics, pages 1–75. Springer, 2007.[5] A. D. Barbour and T. Lindvall. Translated Poisson Approximation forMarkov Chains. Journal of Theoretical Probability, 19(3):609–630, Dec.2006.[6] R. F. Bass. Stochastic Processes. Cambridge University Press, 2011.[7] M. Berkelaar, K. Eikland, and P. Notebaert. Lpsolve 5.5, Open SourceMixed-Integer Linear Programming System.http://lpsolve.sourceforge.net/5.5/, 2004.[8] D. P. Bertsekas and D. A. Castañon. Rollout algorithms for stochasticscheduling problems. J. Heuristics, 5(1):89–108, Apr. 1999.[9] S. Beyme and C. Leung. Modeling the hop count distribution in wirelesssensor networks. In Proceedings of the 26th Annual IEEE CanadianConference on Electrical and Computer Engineering (CCECE), pages 1–6,May 2013.86[10] S. Beyme and C. Leung. A stochastic process model of the hop countdistribution in wireless sensor networks. Ad Hoc Networks, 17:60–70, June2014.[11] A. Biswas and P. X.-K. Song. Discrete-valued ARMA processes. Statistics& Probability Letters, 79(17):1884–1889, Sept. 2009.[12] J. Blumenthal, R. Grossmann, F. Golatowski, and D. Timmermann.Weighted Centroid Localization in Zigbee-based Sensor Networks. InIEEE International Symposium on Intelligent Signal Processing, 2007,pages 1–6, Oct. 2007.[13] B. Bollobás and O. Riordan. Percolation. Cambridge University Press,2006.[14] F. Bourgault, A. A. Makarenko, S. B. Williams, B. Grocholsky, and H. F.Durrant-Whyte. Information based adaptive robotic exploration. InProceedings of the IEEE/RSJ International Conference on IntelligentRobots and Systems (IROS), pages 540–545, Sept. 2002.[15] S. R. Broadbent and J. M. Hammersley. Percolation processes. I. Crystalsand Mazes. Proceedings of the Cambridge Philosophical Society, 53:629–641, July 1957.[16] H. S. Chang, R. Givan, and E. K. P. Chong. Parallel Rollout for OnlineSolution of Partially Observable Markov Decision Processes. DiscreteEvent Dynamic Systems, 14(3):309–341, July 2004.[17] E. P. Chong, C. M. Kreucher, and A. O. Hero III. Partially ObservableMarkov Decision Process Approximations for Adaptive Sensing. DiscreteEvent Dynamic Systems, 19(3):377–422, Sept. 2009.[18] COIN. COmputational INfrastructure for Operations Research.http://www.coin-or.org, 2009.[19] T. M. Cover and J. A. Thomas. Elements of Information Theory.Wiley-Interscience, 2006.[20] S. De, A. Caruso, T. Chaira, and S. Chessa. Bounds on hop distance ingreedy routing approach in wireless ad hoc networks. Int. J. Wire. Mob.Comput., 1(2):131–140, Feb. 2006.[21] E. P. de Freitas, T. Heimfarth, A. Vinel, F. R. Wagner, C. E. Pereira, andT. Larsson. Cooperation among wirelessly connected static and mobile87sensor nodes for surveillance applications. Sensors, 13(10):12903–12928,Sept. 2013.[22] N. Deshpande, E. Grant, and T. Henderson. Target localization andautonomous navigation using wireless sensor networks; a pseudogradientalgorithm approach. IEEE Systems Journal, 8(1):93–103, Mar. 2014.[23] O. Dousse, P. Mannersalo, and P. Thiran. Latency of wireless sensornetworks with uncoordinated power saving mechanisms. In Proceedings ofthe 5th ACM International Symposium on Mobile Ad Hoc Networking andComputing, pages 109–120, May 2004.[24] S. Dulman, M. Rossi, P. Havinga, and M. Zorzi. On the hop count statisticsfor randomly deployed wireless sensor networks. Int. J. Sen. Netw., 1(1):89–102, Sept. 2006.[25] H. Durrant-Whyte and T. Bailey. Simultaneous localization and mapping:part i. IEEE Robot. Autom. Mag., 13(2):99–110, June 2006.[26] R. Durrett. Probability: Theory and Examples. Cambridge UniversityPress, 2010.[27] J. N. Eagle. The Optimal Search for a Moving Target When the SearchPath is Constrained. Operations Research, 32(5):1107–1115, Oct. 1984.[28] J. N. Eagle and J. R. Yee. An optimal branch-and-bound procedure for theconstrained path, moving target search problem. Operations Research, 38(1):110–114, 1990.[29] E. Eberlein. Jump-type Lévy processes, pages 439–455. Springer, 2009.[30] M. Eden. A Two-dimensional Growth Process. In Proceedings of the 4thBerkeley Symposium on Mathematical Statistics and Probability, Volume 4:Contributions to Biology and Problems of Medicine, pages 223–239, 1961.[31] D. Fox, W. Burgard, and S. Thrun. Active markov localization for mobilerobots. Robotics and Autonomous Systems, 25(3-4):195–207, Nov. 1998.[32] T. Furukawa, F. Bourgault, B. Lavis, and H. Durrant-Whyte. RecursiveBayesian search-and-tracking using coordinated UAVs for lost targets. InProceedings of the 2006 IEEE International Conference on Robotics andAutomation (ICRA), pages 2521–2526, May 2006.88[33] R. Ganti and M. Haenggi. Dynamic Connectivity and Packet PropagationDelay in ALOHA Wireless Networks. In Conference Record of the 41stAsilomar Conference on Signals, Systems and Computers (ACSSC’07),pages 143–147, Nov. 2007.[34] R. Ganti and M. Haenggi. Bounds on the information propagation delay ininterference-limited ALOHA networks. In Proceedings of the 7thinternational conference on Modeling and Optimization in Mobile, Ad Hoc,and Wireless Networks, pages 513–519, June 2009.[35] E. N. Gilbert. Random Plane Networks. Journal of the Society forIndustrial and Applied Mathematics, 9(4):533–543, Dec. 1961.[36] V. Gintautas, A. A. Hagberg, and L. M. A. Bettencourt. Leveragingsynergy for multiple agent infotaxis. In Proceedings of Social Computing,Behavioral Modeling, and Prediction, Los Alamos National Laboratory,Jan. 2008.[37] R. M. Gray. Probability, Random Processes, and Ergodic Properties.Springer Publishing Company, 2nd edition, 2009.[38] A. Guitouni and H. A. Masr. A Nonlinear Mixed Integer Program forSearch Path Planning Problem. In Proceedings of the 4th MultidisciplinaryInternational Scheduling Conference: Theory and Applications (MISTA2009), pages 277–290, Aug. 2009.[39] S. Guo, Y. Gu, B. Jiang, and T. He. Opportunistic flooding inlow-duty-cycle wireless sensor networks with unreliable links. InProceedings of the 15th Annual International Conference on MobileComputing and Networking (MobiCom ’09), pages 133–144, Sept. 2009.[40] M. Haenggi, J. Andrews, F. Baccelli, O. Dousse, and M. Franceschetti.Stochastic geometry and random graphs for the analysis and design ofwireless networks. IEEE J. Sel. Areas Commun., 27(7):1029–1046, Sept.2009.[41] J. M. Hammersley and D. J. A. Welsh. First-passage percolation,subadditive processes, stochastic networks and generalized renewal theory.Bernoulli, Bayes, Laplace Anniversary Volume (Neyman, J. and LeCam, L.,eds), pages 61–110, 1965.[42] M. H. Hansen and B. Yu. Model Selection and the Principle of MinimumDescription Length. Journal of the American Statistical Association, 96:746–774, 1998.89[43] G. Hoffmann and C. Tomlin. Mobile Sensor Network Control UsingMutual Information Methods and Particle Filters. IEEE Trans. Autom.Control, 55(1):32–47, Jan. 2010.[44] G. Hoffmann, S. Waslander, and C. Tomlin. Mutual Information Methodswith Particle Filters for Mobile Sensor Network Control. In Proceedings ofthe 45th IEEE Conference on Decision and Control, pages 1019–1024,Dec. 2006.[45] G. M. Hoffmann. Autonomy for Sensor-Rich Vehicles: Interaction betweenSensing and Control Actions. PhD thesis, Stanford University, Sept. 2008.[46] G. M. Hoffmann, S. L. Waslander, and C. J. Tomlin. DistributedCooperative Search using Information-Theoretic Costs for Particle Filterswith Quadrotor Applications. In Proceedings of the AIAA Guidance,Navigation, and Control Conference, pages 21–24, 2006.[47] C. D. Howard. Models of First-Passage Percolation. In Probability onDiscrete Structures, pages 125–174. Springer, 2004.[48] R. A. Howard. Dynamic Programming and Markov Processes. MIT Press,1960.[49] J. Hu, L. Xie, K.-Y. Lum, and J. Xu. Multiagent information fusion andcooperative control in target search. IEEE Trans. Control Syst. Technol., 21(4):1223–1235, July 2013.[50] K. Ito. Stochastic Processes: Lectures given at Aarhus University.Springer, 2004.[51] E. T. Jaynes. Prior probabilities. IEEE Trans. Syst. Sci. Cybern., 4:227–241, 1968.[52] E. T. Jaynes. On the rationale of maximum-entropy methods. Proceedingsof the IEEE, 70(9):939–952, Sept. 1982.[53] E. T. Jaynes. Entropy and Search Theory. In C. R. Smith andJ. W. T. Grandy, editors, Maximum-Entropy and Bayesian Methods inInverse Problems, pages 443–454. D. Reidel, 1985.[54] O. Johnson. Log-concavity and the maximum entropy property of thePoisson distribution. Stochastic Processes and their Applications, 117(6):791–802, June 2007.90[55] R. C. Jung, G. Ronning, and A. Tremayne. Estimation in conditional firstorder autoregression with discrete support. Statistical Papers, 46(2):195–224, Apr. 2005.[56] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and actingin partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, May 1998.[57] R. E. Kalman. A new approach to linear filtering and prediction problems.Journal of Fluids Engineering, 82(1):35–45, 1960.[58] S. M. Kay. Fundamentals of Statistical Signal Processing: EstimationTheory. Prentice-Hall, Inc., 1993.[59] H. Keeler and P. Taylor. A stochastic analysis of a greedy routing schemein sensor networks. SIAM Journal on Applied Mathematics, 70(7):2214–2238, Apr. 2010.[60] C. Keller. Applying optimal search theory to inland SAR: Steve Fossettcase study. In Proceedings of the 13th Conference on Information Fusion(FUSION’10), pages 1–8, 2010.[61] J. F. C. Kingman. Subadditive ergodic theory. The Annals of Probability, 1(6):883–899, Dec. 1973.[62] D. E. Knuth. The Art of Computer Programming, Volume 3: Sorting andSearching. Addison Wesley Longman Publishing Co., Inc., 2nd edition,1998.[63] Z. Kong and E. M. Yeh. Characterization of the Critical Density forPercolation in Random Geometric Graphs. In Proceedings of the IEEEInternational Symposium on Information Theory (ISIT’07), pages 151–155,June 2007.[64] Z. Kong and E. M. Yeh. Information dissemination in large-scale wirelessnetworks with unreliable links. In Proceedings of the 4th AnnualInternational Conference on Wireless Internet, pages 321–329, Nov. 2008.[65] T. M. Liggett. An improved subadditive ergodic theorem. The Annals ofProbability, 13(4):1279–1285, Nov. 1985.[66] T. M. Liggett. Ultra Logconcave Sequences and Negative Dependence.Journal of Combinatorial Theory, Series A, 79(2):315–325, Aug. 1997.91[67] M. L. Littman, A. R. Cassandra, and L. P. Kaelbling. Learning Policies forPartially Observable Environments: Scaling Up. In Readings in Agents,pages 495–503. Morgan Kaufmann Publishers Inc., 1998.[68] N. Lo, J. Berger, and M. Noel. Toward optimizing static target search pathplanning. In Proceedings of the IEEE Symposium on ComputationalIntelligence for Security and Defence Applications (CISDA), pages 1–7,2012.[69] W. S. Lovejoy. A survey of algorithmic methods for partially observedMarkov decision processes. Annals of Operations Research, 28(1):47–65,1991.[70] D. J. C. MacKay. Information Theory, Inference and Learning Algorithms.Cambridge University Press, 2002.[71] A. Mahajan and D. Teneketzis. Multi-armed bandit problems. InFoundations and Applications of Sensor Management, pages 121–151.Springer, 2008.[72] G. Mao, Z. Zhang, and B. Anderson. Probability of k-hop connectionunder random connection model. IEEE Commun. Lett., 14(11):1023–1025,Nov. 2010.[73] A. Marjovi, J. Nunes, P. Sousa, R. Faria, and L. Marques. Anolfactory-based robot swarm navigation method. In Proceedings of the2010 IEEE International Conference on Robotics and Automation (ICRA),pages 4958–4963, May 2010.[74] J.-B. Masson, M. B. Bechet, and M. Vergassola. Chasing information tosearch in random environments. J. Phys. A: Math. Theor., 42(43):434009,Oct. 2009.[75] A. Molisch. Wireless Communications. Wiley, 2nd edition, 2010.[76] S. Nath, V. Ekambaram, A. Kumar, and P. V. Kumar. Theory andAlgorithms for Hop-Count-Based Localization with Random GeometricGraph Models of Dense Sensor Networks. ACM Trans. Sen. Netw., 8(4):1–38, Sept. 2012.[77] D. Niculescu and B. Nath. DV Based Positioning in Ad Hoc Networks.Telecommunication Systems, 22:267–280, 2003.92[78] U. Orguner, P. Skoglar, D. Tornqvist, and F. Gustafsson. Combinedpoint-mass and particle filter for target tracking. In Proceedings of the 2010IEEE Aerospace Conference, pages 1–10, Mar. 2010.[79] C. Papadimitriou and J. N. Tsitsiklis. The Complexity of Markov DecisionProcesses. Math. Oper. Res., 12(3):441–450, Aug. 1987.[80] N. Patwari, J. Ash, S. Kyperountas, A. Hero, R. Moses, and N. Correal.Locating the nodes: cooperative localization in wireless sensor networks.IEEE Signal Process. Mag., 22(4):54–69, July 2005.[81] M. Penrose. Random Geometric Graphs (Oxford Studies in Probability).Oxford University Press, USA, 2003.[82] G. Rahmatollahi and G. Abreu. Closed-form hop-count distributions inrandom networks with arbitrary routing. IEEE Transactions onCommunications, 60(2):429–444, 2012.[83] R. Rezaiifar and A. M. Makowski. From optimal search theory tosequential paging in cellular networks. IEEE J.Sel. A. Commun., 15(7):1253–1264, Sept. 1997.[84] D. Richardson. Random Growth in a Tessellation. MathematicalProceedings of the Cambridge Philosophical Society, 74:515–528, Nov.1973.[85] S. Ross, J. Pineau, S. Paquet, and B. Chaib-draa. Online planningalgorithms for POMDPs. J Artif Intell Res., 32(2):663–704, July 2008.[86] J. O. Royset and H. Sato. Route optimization for multiple searchers. NavalResearch Logistics (NRL), 57(8):701–717, 2010.[87] A. Ryan and J. K. Hedrick. Particle filter based information-theoretic activesensing. Robotics and Autonomous Systems, 58(5):574–584, May 2010.[88] Z. A. Saigol, R. W. Dearden, J. L. Wyatt, and B. J. Murton.Information-lookahead planning for AUV mapping. In Proceedings of the21st International Joint Conference on Artificial intelligence, pages1831–1836, July 2009.[89] S. Sarid, A. Shapiro, E. Rimon, and Y. Edan. Classifying theheterogeneous multi-robot online search problem into quadratic timecompetitive complexity class. In Proceedings of the 2011 IEEEInternational Conference on Robotics and Automation (ICRA), pages4962–4967, May 2011.93[90] I. C. Schick and S. K. Mitter. Robust recursive estimation in the presenceof heavy-tailed observation noise. The Annals of Statistics, 22(2):1045–1080, 1994.[91] S. Shue and J. Conrad. A survey of robotic applications in wireless sensornetworks. In Proceedings of 2013 IEEE Southeastcon, pages 1–5, Apr.2013.[92] S. Singh and V. Krishnamurthy. The optimal search for a Markovian targetwhen the search path is constrained: the infinite-horizon case. IEEE Trans.Autom. Control, 48(3):493–497, Mar 2003.[93] P. Skoglar. Planning Methods for Aerial Exploration and Ground TargetTracking. PhD thesis, Linköping University, Oct. 2009.[94] P. Skoglar, U. Orguner, and F. Gustafsson. On information measures basedon particle mixture for optimal bearings-only tracking. In Proceedings ofthe 2009 IEEE Aerospace Conference, pages 1–14, March 2009.[95] R. T. Smythe and J. C. Wierman. First-Passage Percolation on the SquareLattice, I. Advances in Applied Probability, 9(1):38–54, Mar. 1977.[96] R. T. Smythe and J. C. Wierman. First-Passage Percolation on the SquareLattice, III. Advances in Applied Probability, 10(1):155–171, Mar. 1978.[97] N.-O. Song and D. Teneketzis. Discrete search with multiple sensors.Mathematical Methods of Operations Research, 60(1):1–13, 2004.[98] A. Souza, R. Maia, R. Aroca, and L. Goncalves. Probabilistic robotic gridmapping based on occupancy and elevation information. In Proceedings ofthe 2013 16th International Conference on Advanced Robotics (ICAR),pages 1–6, Nov 2013.[99] C. Stachniss, G. Grisetti, and W. Burgard. Information Gain-basedExploration Using Rao-Blackwellized Particle Filters. In Proceedings ofRobotics: Science and Systems (RSS), pages 65–72, June 2005.[100] D. Stauffer and A. Aharony. Introduction to percolation theory. Taylor &Francis, 1992.[101] F. W. Steutel and K. van Harn. Discrete analogues of self-decomposabilityand stability. The Annals of Probability, 7(5):893–899, Oct. 1979.94[102] R. Stoleru, T. He, and J. A. Stankovic. Range-Free Localization. In SecureLocalization and Time Synchronization for Wireless Sensor and Ad HocNetworks, volume 30, pages 3–31. Springer US, 2007.[103] L. Stone. Theory of Optimal Search. Academic Press, 1975.[104] D. Stoyan, W. S. Kendall, and J. Mecke. Stochastic Geometry and ItsApplications, 2nd Edition. Wiley, 1996.[105] B. Sturmfels. Polynomial Equations and Convex Polytopes. The AmericanMathematical Monthly, 105(10):907–922, Dec. 1998.[106] R. S. Sutton and A. G. Barto. Introduction to Reinforcement Learning.MIT Press, 1st edition, 1998.[107] X. Ta, G. Mao, and B. Anderson. Evaluation of the Probability of K-HopConnection in Homogeneous Wireless Sensor Networks. In Proceedings ofthe 2007 IEEE Global Telecommunications Conference, pages 1279–1284,Nov. 2007.[108] P. Thompson, E. Nettleton, and H. F. Durrant-Whyte. Distributed largescale terrain mapping for mining and autonomous systems. In 2011IEEE/RSJ International Conference on Intelligent Robots and Systems,IROS 2011, San Francisco, CA, USA, September 25-30, 2011, pages4236–4241, 2011.[109] S. Thrun. Robotic Mapping: A Survey. In Exploring Artificial Intelligencein the New Millennium, pages 1–35. Morgan Kaufmann Publishers Inc.,2003.[110] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics (IntelligentRobotics and Autonomous Agents). The MIT Press, 2005.[111] K. E. Trummel and J. R. Weisinger. The complexity of the optimal searcherpath problem. Oper. Res., 34(2):324–327, Mar. 1986.[112] M. Vergassola, E. Villermaux, and B. I. Shraiman. ’Infotaxis’ as a strategyfor searching without gradients. Nature, 445(7126):406–409, Jan. 2007.[113] S. Vural and E. Ekici. On Multihop Distances in Wireless Sensor Networkswith Random Node Locations. IEEE Trans. Mobile Comput., 9(4):540–552, Apr. 2010.[114] C. H. Weiss. Thinning operations for modeling time series of counts - asurvey. Advances in Statistical Analysis, 92(3):319–341, Aug. 2008.95[115] J. C. Wierman. First-Passage Percolation on the Square Lattice, II.Advances in Applied Probability, 9(2):283–295, June 1977.[116] R. Wong, J. Xiao, S. Joseph, and Z. Shan. Data association forsimultaneous localization and mapping in robotic wireless sensor networks.In Proceedings of the 2010 IEEE/ASME International Conference onAdvanced Intelligent Mechatronics (AIM), pages 459–464, July 2010.[117] M. Zorzi and R. Rao. Geographic random forwarding (GeRaF) for ad hocand sensor networks: multihop performance. IEEE Trans. Mobile Comput.,2(4):337–348, Dec. 2003.96Appendix ANecessary Condition for the HopCount ProcessIn this Appendix, it is shown that neither subadditivity nor superadditivity holdfor the hop count of the first-passage path in the random geometric graph G . Weutilize this property in Section 2.3.1 as a necessary condition for the hop count tobe modeled as a sum of increments (2.8).Proof. For given nodes o, x, y ∈ G , the first-passage path γ(o,y) has a passagetime which satisfies To,y ≤ To,x +Tx,y, due to subadditivity (2.3). Suppose that theassociated hop count is subadditive, i.e. No,y ≤ No,x +Nx,y, with equality if x ison the first-passage path. Assume that x is not on the first-passage path. Then,there is at least one edge not common with either of the paths γ(o,x) or γ(x,y),denoted (u,v). We construct a new graph realization G ′ by opening edge (u,v)and instantiating a new node w such that the Euclidean distances ‖u−w‖ ≤ R and‖v−w‖ ≤ R, i.e. node w is linked to both u and v. Furthermore, set the edgepassage times Θ(u,w), Θ(w,v) such that Θ(u,w) +Θ(w,v) = Θ(u,v); for anyadditional edges that may form by inserting w, choose edge passage times largeenough so that the passage times of the paths γ(o,y), γ(o,x) and γ(x,y) remainunchanged from G . However, the hop count of γ(o,y) has now increased by 1. Wecan repeat inserting nodes in this manner, until No,y > No,x +Nx,y. This contradictsour assumption that the hop count is subadditive.97By an analogous argument, it can be shown that the hop count is not superad-ditive, either.98Appendix BProof of Strong Mixing PropertyIn this Appendix, it is shown that the increments of the hop count process on theunit distance graph, defined on the square lattice Z2, are asymptotically indepen-dent, or more formally, strongly mixing. We use this property in Section 2.3.1 tomotivate a simplifying independence assumption, as one of the defining propertiesof the increments of a Lévy process.A network represented by the unit distance graph on the square lattice Z2 maybe characterized as a dynamical system (Ω,F ,µ,Σ), that is, a probability spacetogether with a measurable transformation Σ : Ω→Ω [37]. Here, Ω= RE+ is theconfiguration space of the network, formed by the Cartesian product of its i.i.d.edge passage time variables {Θ(e)}e∈E . Furthermore, F is a σ -algebra and µ isa probability measure preserved under the transformation Σ : Ω→Ω, defined as ashift of the i.i.d. edge passage time variables by one unit to the left, that isµ(F) = µ(Σ−1F) for any F ∈F . (B.1)The measure-preserving transformation Σ and a random variable N0,n : Ω→ Z+,taking values in the measurable space (Z+,A ), together define the strictly station-ary process of the hop count incrementsNkn,(k+1)n(ω) = N0,n(Σkn(ω)) (B.2)Given the adjacent hop count increments N0,n and Nn,2n, the arbitrary hop count99events A,B ∈A and their pre-images F,G ∈F correspond throughF = N−10,n (A) = {ω : N0,n(ω) ∈ A} (B.3)G = N−1n,2n(B) = {ω : Nn,2n(ω) ∈ B} (B.4)It is common to write {N0,n ∈ A} etc. for these events.Definition B.1 ([26]). A dynamical system (Ω,F ,µ,Σ) with a measure-preserv-ing (shift) transformation Σ is said to be strongly mixing, if for all measurable setsF,G ∈F the conditionlimk→∞µ(F ∩Σ−kG) = µ(F)µ(G) (B.5)is satisfied.Proposition 1. The increments {N0,n,Nn,2n, . . .} are strongly mixing.Proof. Without loss of generality, let n = 2 j for j = 0,1, . . .. We define a two-parameter family of translated and scaled cylinder sets C (t,s)⊂ Z2, where t ∈ Zand s ∈ N, byC (t,s) ={1, . . . ,n}×Z ∪ o if t = 0, s = 1{n+1, . . . ,2n}×Z ∪ nex if t = n, s = 1{t− (s−1) j+1, . . . , t +(s+1) j}×Z otherwise.(B.6)Let N0,n|C (0,k) denote the hop count variable associated with the first-passage pathconnecting o and nex, restricted to the cylinder C (0,k), for k = 1,2, . . .. Similarly,Nkn,(k+1)n|C (kn,k) denotes the hop count variable associated with the first-passagepath connecting knex and (k+1)nex, restricted to the cylinder C (kn,k). For everyk, the cylinder setsC (0,k) andC (kn,k) define two disjoint subgraphs on the squarelattice. First-passage paths restricted to disjoint subsets of the square lattice clearlyhave independent hop counts, therefore we haveµ({N0,n|C (0,k) ∈ A}∩{Nkn,(k+1)n|C (kn,k) ∈ B})=µ{N0,n|C (0,k) ∈ A}µ{Nkn,(k+1)n|C (kn,k) ∈ B}.(B.7)100By the shift-invariance of the measure µ ,µ({N0,n|C (0,k) ∈ A}∩Σ−(k−1)n{Nn,2n|C (n,k) ∈ B})=µ{N0,n|C (0,k) ∈ A}µ{Nn,2n|C (n,k) ∈ B}.(B.8)Taking k→ ∞ relaxes the cylinder restriction, and we obtainlimk→∞µ({N0,n ∈ A}∩Σ−(k−1)n{Nn,2n ∈ B})=µ{N0,n ∈ A}µ{Nn,2n ∈ B},(B.9)that is, the hop count increments satisfy the strong mixing condition (B.5).101Appendix CConstrained-Path Search asInteger ProgramWe consider the problem of finding a target in a wireless sensor network. Varia-tions of this problem include, for example, paging a user in some ad hoc network,or a mobile searcher able to interrogate nodes in a sensor network. The mobilesearcher looks for the target in the current cell and if unsuccessful, passes to one ofthe adjacent cells. This imposes a constraint on the possible sequences of visitedcells, or the search path. We assume that an a-priori probability distribution of thelocation of the target is known to the path planning algorithm. The path planner’sobjective is to minimize the expected time to detection. We show that this problemcan be expressed and solved as an integer linear program, and evaluate its perfor-mance in comparison with a maximum probability of detection search, as well asan unconstrained search.C.1 IntroductionFor the rollout algorithm proposed in Chapter 3, a possible heuristic to compute theexpected search time is based on the assumption that beyond the rollout horizon, nofurther hop count observations (other than detecting the target, defined as observinga hop count of zero) are made. This assumption is reasonable, especially withthe search progressing to the late stage, when hop count observations tend to be102exhausted in the vicinity of the target.In this Appendix, we consider the path-constrained search for a target node ina simplified wireless sensor network. It is assumed that a belief state of the target isavailable to the searcher, inferred from past hop count observations up to the rollouthorizon, as described in Section 3.2.2. Within the scope of the algorithms studiedin this Appendix, we refer to the belief state as the a priori probability distributionof the target.While optimal search problems are commonly associated with e.g. search andrescue operations or exploration [60], they are encountered also in applicationssuch as target detection or paging in wireless sensor or ad hoc networks [83]. Thetheory of optimal search traces its origins to anti-submarine operations during WW-II. The general setup involves a target hidden in some search space, whose where-abouts are characterized by an a priori probability distribution, and a searcher try-ing to detect the target, typically such that the cumulative probability of detectionis maximized, subject to a cost constraint, e.g. a finite search time horizon. If thesearch effort is applied continuously and the searcher’s movement is not restrictedby a path constraint, the problem can be solved analytically under suitable condi-tions for the detection function [103]. If the search is performed by taking discrete“looks” and a path constraint is imposed, numerical optimization techniques mustgenerally be resorted to. Path-constrained search is a computationally challengingproblem in so far as its complexity has been shown to be NP-complete [111]. Anumber of algorithms for the solution of the optimal, path-constrained search havebeen published by various authors. These include approaches based on dynamicprogramming (POMDP) [27] as well as branch-and-bound (non-linear) integer pro-gramming [28, 38, 68, 86].In the majority of related works, the search planner’s objective is the maximiza-tion of the target detection probability over a finite time horizon. In this Appendix,we consider the minimization of the expected search time, and show how to ex-press and solve this search problem as an integer linear program. We compare thedetection performance due to our objective function to the search with maximumprobability of detection, and also to the upper bound on the cumulative detectionprobability obtained by the optimal search with path constraint relaxation.The Appendix is structured as follows: in Section C.2, we state our assump-103tions, and formulate integer linear programs with the objectives of minimum ex-pected search time and maximum probability of detection, respectively. In SectionC.3, we provide some simulation results for the optimal search path and comparethe cumulative detection probability obtained for the different objective functions.Conclusions are drawn in Section C.4.C.2 System ModelThe target search is performed on a 2-dimensional search space consisting of Jcells, j = 1, . . . ,J. The target’s whereabouts are characterized by an a priori prob-ability distribution p j over the search space. If the target is in the cell interrogatedby the searcher, it will be detected with probability q = 1. We do not allow falsedetections in case of an absent target. If the searcher does not detect the target inthe present cell j, it moves to a cell in the set N ( j), defined as the set of neigh-bors of j. We assume that the searcher is initially located in cell µ , and the targetin cell ν . The search proceeds through time steps k = 1, . . . ,H, with the horizonH ≤ J. The search path, i.e. the sequence of cells j(k), k = 1, . . . ,H followed bythe searcher is denoted by ω . Then, the probability of detecting the target on thesearch path isPω =H∑k=1p j(k). (C.1)We formulate the search problem as a binary integer linear program. A set ofbinary decision variables x j[k] is used to indicate the location of the searcher in cellj at time k.C.2.1 Minimizing the Expected Search TimeThe objective is to minimize the expected search time over all paths of length H,given that the target is on the search path ω ,minimize E[K|ν ∈ ω] =H∑k=1kp j(k)Pω(C.2)104where Pω is given by (C.1). Because there is no obvious way to transform thisobjective into a linear function of binary decision variables, we consider the relatedproblemminimizeH∑k=1k p j(k). (C.3)If the horizon is H < J, a problem with this objective function is that a search plan-ner could minimize it by simply avoiding cells with high probability of containingthe target, and visit cells with low probability instead. The objective function istherefore augmented with a term that penalizes such attempts by giving weight tothe probability of the cells not visited:minimizeH∑k=1k p j(k)+(H +1)(1−Pω). (C.4)In terms of the binary decision variables x j[k], the objective function (C.4) is nowexpressed asH∑k=1kJ∑j=1p j x j[k]+ (H +1)H∑k=1J∑i=1pi(1− xi[k]) (C.5)=H∑k=1J∑j=1k p j x j[k]+ (H +1) p j (1− x j[k]))=H∑k=1J∑j=1(k−H−1) p j x j[k]+ (H +1)H∑k=1J∑j=1p j=H∑k=1J∑j=1(k−H−1) p j x j[k]+ (H +1)H (C.6)Before we state the constraints of the search problem, we show that no smallervalue can be obtained for the objective function (C.6) by exchanging a cell on thesearch path for one of the unvisited cells of lower probability.Proof. Referring to (C.5), let j be a cell on, and i a cell off the optimal search path,with p j > pi. We show that the objective value increases if we exchange p j for pi.From (C.5), we see that the contribution of cells j and i to the value of the objective105function is k p j +(H +1) pi. We determine thatk p j +(H +1) pi < k pi +(H +1) p j (C.7)holds if and only if p j > pi, for k ≤ H. Therefore, visiting cell i instead of j canonly increase the value of the objective function (C.5).Constant terms in the objective function (C.6) are ignored during optimization,so that the final optimization problem takes the formminimizeH∑k=1J∑j=1(k−H−1) p j x j[k] (C.8)subject toJ∑j=1x j[k] = 1 all k (C.9)H∑k=1x j[k]≤ 1 all j (C.10)∑i∈N ( j)xi[k]− x j[k+1]≥ 0 all j,k (C.11)xµ [1] = 1 (C.12)x j[k] ∈ {0,1} all j,k (C.13)where j = 1, . . . ,J and k = 1, . . . ,H. Constraint (C.9) ensures that one and onlyone cell can be visited at each time k, (C.10) prevents the searcher from looking incells already visited and (C.11) restricts the searcher to move only to neighbors ofthe current cell. Constraint (C.12) finally defines the searcher’s initial position.C.2.2 Maximizing the Detection ProbabilityFor comparison, we also present an integer linear program describing the maxi-mum detection probability search. Here, the search planner simply maximizes theprobability of detecting the target over all paths of length H, with no particularweight given to early detection. As can be seen, this problem differs from theminimum expected time search only in the objective function.106maximizeH∑k=1J∑j=1p j x j[k] (C.14)subject toJ∑j=1x j[k] = 1 all k (C.15)H∑k=1x j[k]≤ 1 all j (C.16)∑i∈N ( j)xi[k]− x j[k+1] all j,k (C.17)xµ [1] = 1 (C.18)x j[k] ∈ {0,1} all j,k (C.19)C.3 Performance EvaluationTo evaluate and compare the performance of the two search problems (C.8) and(C.14), we define a 10×7 grid as the search space (J = 70), on which the searcheris allowed to move subject to the next-neighbor constraint. This setup representsa wireless sensor network with cell centers corresponding to node locations, asdescribed in Section 3.2.1. The a priori target distribution over the search space isdefined by a Gaussian mixture with the components0.28 N ((4,2), 0.8)+0.72 N ((7,5), 2.0) . (C.20)The mobile searcher is initially placed in cell j = 1 with the grid coordinates (1,1).The search starts at time k = 1 and has a horizon of H = 35 time steps.Both problems (C.8) and (C.14) give rise to integer linear programs of sizeJ×K = 2450 binary variables and 2486 constraints. This relatively large problemsize dictates a programmatic generation of the problem description, suitable forinput into an integer linear program solver. We use the open-source lpSolvepackage [7] to generate the problem description, and solve it using the Cbc mixed-integer linear program solver [18].The computed optimal search paths for the minimum expected search time107(Figure C.1) and the maximum probability of detection (Figure C.2) show that inorder to minimize the expected time to detection, the searcher visits cells with higha priori target probabilities as early as possible, unlike the maximum probabilityof detection search. As can be seen in Figure C.3, the cumulative probability ofdetection for the minimum expected time search increases at a faster rate initially.Reaching the horizon however, the maximum probability of detection search willachieve a higher total cumulative probability of detection, Pω , as expected. InFigure C.3 we compare the results of both search plans to the upper bound resultingfrom the relaxation of the search path constraint. In this case, where the searcheris free to move to any node within the search space in a single time step, it is wellknown that the optimal search policy is to visit the cells in descending order of thea priori target probability [103]. This policy is optimal for both the probability ofdetection and expected search time.To demonstrate the effectiveness of the penalized objective function (C.4) forthe minimum expected time search, we provide an example for a search using theunpenalized objective function (C.3), which would enable the search planner tominimize the objective function simply by visiting cells with low a priori targetprobabilities. The resulting search path in Figure C.4 shows that this is indeedthe consequence, and Figure C.5 confirms, that the unpenalized objective function(C.3) is not suitable for achieving a high probability of detection.C.4 ConclusionWe have formulated an integer linear program to solve the optimal, path-constrainedsearch in a wireless sensor network with a minimum expected search time objec-tive. We are able to show for the simulated configuration, that the expected searchtime, conditioned on a target located on the search path, is an improvement overthe search time obtained by the maximum probability of detection search, at theexpense of a very small loss of the total detection probability over the entire searchhorizon. Over most of the search horizon, the cumulative probability of detectionachieved by the minimum expected time search approaches more closely the upperbound given by a search plan with relaxed path constraint.Unfortunately, as a heuristic to approximate the future expected cost within108Figure C.1: Search path ω of minimum expected search time (C.8). Thea-priori target probability is a Gaussian mixture (C.20) over a searchspace of size J = 10×7. The search horizon is H = 35. The expectedsearch time is E[K|ν ∈ ω] = 17.28, with a cumulative probability ofdetection of Pω = 0.793.the framework of Monte Carlo solution methods for online POMDPs, integer pro-gramming is not practical due to its computational complexity. The solution timesfor problems of a size as studied here exceed that of other search time heuristics asproposed in Section 3.4.2 by about 4 orders of magnitude.109Figure C.2: Search path ω of maximum probability of detection (C.14). Thea-priori target probability is a Gaussian mixture (C.20) over a searchspace of size J = 10×7. The search horizon is H = 35. The expectedsearch time is E[K|ν ∈ ω] = 20.05, with a cumulative probability ofdetection of Pω = 0.804.1100 5 10 15 20 25 30 350. of Detectionmin. expected search timemax. probability of detectionrelaxation of path constraintFigure C.3: Comparison of the cumulative probability of detection, Pω , forthe two search policies of minimum expected search time and maxi-mum probability of detection, and for the optimal search without pathconstraint. The minimum expected time search tends to maximize theprobability of detection early, whereas the maximum probability of de-tection search maximizes over the entire horizon and achieves a higherprobability of detection when reaching the horizon, which is evident inthis example.111Figure C.4: Search path ω of minimum expected search time for the unpe-nalized objective function (C.3). Ignoring the penalty term introducedin (C.4), the search planner would be permitted to minimize the objec-tive function by visiting only cells with low a-priori probability, likelymissing the target. Compare with Figure C.1.1120 5 10 15 20 25 300. of Detectionmin. expected search timeFigure C.5: Cumulative probability of detection for the unpenalized, min-imum expected search time objective function (C.3). Ignoring thepenalty term introduced in (C.4), the search planner would be permit-ted to minimize the objective function by visiting only cells with lowa-priori probability so that the cumulative probability of detection re-mains small. Compare with Figure C.3.113Appendix DInfotaxis and Mutual InformationD.1 BackgroundA generalized search strategy referred to as “infotaxis” was introduced in [112], inthe context of localizing a diffusive source in dilute conditions, based on discretedetections of particles emitted by the source. In the framework of infotaxis, the aposteriori belief state bX for the source position X , conditioned on the history of ob-servations, is updated for every new observation using Bayes’ rule. The searcher,currently positioned at y, selects the next action a which maximally reduces theentropy H(X) of the belief state. The entropy reduction problem is posed as anoptimal tradeoff between the need to perform exploitative (i.e. greedy) and ex-ploratory actions, motivated by concepts from reinforcement learning [106]. Thistradeoff is reflected in the infotactic utility function [112, Equation (1)], which is aweighted sum of terms favoring either greedy or exploratory behavior.D.2 Proof of EquivalenceWe show that the infotactic utility is precisely the mutual information between thecurrent belief state and myopic, future observations.Proof. Adapted to our notation, the infotactic utility [112, Equation (1)] for theexpected change of entropy after taking action a and observing Z, can be written114as follows:−∆H(X) = bX(y′)H(X)− [1−bX(y′)] ∑z∈N0ρz∆H(X |Z = z,a). (D.1)The ρz are the probabilities of detecting z particles, z ∈ N0, at the next searcherposition y′ = y+a. For the purposes of this proof, we do not require further detailsof the observation model used in [112]. In the framework of infotaxis, the right-hand side of (D.1) is the convex sum of a greedy term, that is, the expected entropyreduction upon the imminent detection of the source, and an exploratory term de-scribing the entropy reduction due to additional particle detections. Let the eventof detecting the source be interpreted as another observation, denoted z = D. Theconditional entropy after detecting the source is H(X |Z = D,a) = 0, eliminatingany uncertainty about the source location. Then−∆H(X) = bX(y′) [H(X)−H(X |Z = D,a)]+ [1−bX(y′)] ∑z∈N0ρz [H(X)−H(X |Z = z,a)] . (D.2)We define the augmented set of observationsZ =N0∪{D}, and the correspondingobservation probabilities pZ(z|a) aspZ(z|a) =bX(y+a) if z = D,[1−bX(y+a)]ρz otherwise.(D.3)We can then express (D.2) as−∆H(X) = ∑z∈ZpZ(z|a)[H(X)−H(X |z,a)] (D.4)= H(X)−EZ[H(X |z,a)]. (D.5)The RHS of (D.5) is the defining expression of the mutual information [19] be-tween the belief state X and the expected future observation Z, given the action a,that isI(X ;Z|a) = H(X)−EZ[H(X |z,a)]. (D.6)115We have thus shown that the infotactic utility given by [112, Equation (1)] is equiv-alent to the mutual information.D.3 ConclusionWe conclude that infotaxis is equivalent to search strategies based on a myopic,mutual information utility. Nonetheless, infotaxis offers an interesting viewpointby showing explicitly, that maximizing the mutual information inherently balancesexploitation and exploration in an optimal way, when the overall goal is the reduc-tion of the entropy of the belief state.116Appendix EPseudocode117Algorithm 1 Pseudocode for a rollout algorithm to compute a heuristic,approximate expected search time, given the single base policy pi baseRequire: A , b, pi base, H,W1: for all a ∈Ayk\{D} do2: ; initialize average search time3: Q[a] := 04: for i := 1 to W do5: ; simulate system trajectory6: a˜ := a7: b˜ := bX8: x˜ ∼ bX9: y˜ := yk +a10: for n := 1 to H do11: Q[a] := Q[a]+ 1W J(a˜)12: z˜∼ O(x˜, y˜)13: if z˜ = 0 then14: break15: end if16: b˜ := B(b˜, a˜, z˜)17: a˜ := pi base(b˜)18: y˜ := y˜+ a˜19: end for20: if z˜ 6= 0 then21: ; add heuristic, terminal search time22: Q[a] := Q[a]+ 1W J0(b˜, a˜)23: end if24: end for25: end for26: ; select next action27: a := argmin Q[a]28: return a118Algorithm 2 Pseudocode for a parallel rollout algorithm to compute aheuristic, approximate expected search time, given the set Π of base policiesRequire: A , b,Π, H,W1: for all a ∈Ayk\{D} do2: ; initialize average search time3: Q[a] := 04: for i := 1 to W do5: for all pi base ∈Π do6: ; simulate system trajectory7: J[pi base] := 08: a˜ := a9: b˜ := bX10: x˜ ∼ bX11: y˜ := yk +a12: for n := 1 to H do13: J[pi base] := J[pi base]+ 1W J(a˜)14: z˜∼ O(x˜, y˜)15: if z˜ = 0 then16: break17: end if18: b˜ := B(b˜, a˜, z˜)19: a˜ := pi base(b˜)20: y˜ := y˜+ a˜21: end for22: if z˜ 6= 0 then23: ; add heuristic, terminal search time24: J[pi base] := J[pi base]+ 1W J0(b˜, a˜)25: end if26: end for27: Q[a] := Q[a]+ argmin J[pi]28: end for29: end for30: ; select next action31: a := argmin Q[a]32: return a119Algorithm 3 Pseudocode for rollout algorithm to compute the approximate,nonmyopic mutual information, given the base policy pi baseRequire: A , b, pi base, H,W1: for all a ∈Ayk\{D} do2: initialize mutual information3: I[a] := H(bX)4: for i := 1 to W do5: ; simulate system trajectory6: a˜ := a7: b˜ := bX8: x˜ ∼ bX9: y˜ := yk +a10: for n := 1 to H do11: z˜∼ O(x˜, y˜)12: if z˜ = 0 then13: break14: end if15: b˜ := B(b˜, a˜, z˜)16: a˜ := pi base(b˜)17: y˜ := y˜+ a˜18: end for19: if z˜ 6= 0 then20: ; subtract posterior entropy21: I[a] := I[a]− 1W H(b˜)22: end if23: end for24: end for25: ; select next action26: a := argmax I[a]27: return a120


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items