WHERE DID IT GO? REGAINING A LOST TARGET FOR ROBOT VISUAL SERVOING

by

SINA RADMARD

B.Sc., Yazd University, Yazd, Iran, 2006
M.Sc., Sharif University of Technology, Tehran, Iran, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Mechanical Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

January 2016

© Sina Radmard, 2015

Abstract

When a robotic visual servoing/tracking system loses sight of its target, the servo fails due to loss of input. To resolve this problem, a search method is required to generate efficient actions and bring the target back into the camera field of view (FoV) as soon as possible. For high dimensional platforms such as a camera-mounted manipulator, an eye-in-hand system, such a search must address the difficult challenge of generating efficient actions in an online manner while satisfying visibility and kinematic constraints. This work considers two common scenarios of visual servoing/tracking failure: when the target leaves the camera FoV, and when visual occlusions (occlusions for brevity) disrupt the process. To handle the first scenario, a novel algorithm called lost target search (LTS) is introduced to plan efficient sensor actions online. To handle the second scenario, an improved algorithm called the lost target recovery algorithm (LTRA) allows a robot to look behind an occluder during active visual search and re-acquire its target in an online manner. The overall algorithm is then implemented on a telepresence platform to evaluate the necessity and efficacy of autonomous occlusion handling for remote users. Occlusions can occur when users in remote locations are engaged in physical collaborative tasks, and they can lead to frustration and inefficient collaboration between the collaborators. Therefore, two human-subjects experiments were conducted (N = 20 and 36, respectively) to investigate the following interlinked research questions: a) what are the impacts of occlusion on telepresence collaborations, and b) can autonomous handling of occlusions improve the telepresence collaboration experience for remote users? Results from the first experiment demonstrate that occlusions introduce a significant social interference that requires collaborators to reorient or reposition themselves. Subsequently, results from the second experiment indicate that the use of an autonomous controller yields a remote user experience that is more comparable (in terms of vocal non-verbal behaviors, task performance and perceived workload) to collaborations performed by two co-located parties.

These contributions represent a step forward in making robots more autonomous and user friendly while interacting with human co-workers, a necessary next step for the successful adoption of robots in human environments.

Preface

This thesis is based on the expansion of several manuscripts, produced through the collaboration of multiple researchers. The material from the publications has been modified to make the thesis coherent.

A version of Chapter 2 has been published in: S. Radmard and E. A. Croft, "Active Target Search for High Dimensional Robotic Systems," Auton. Robots, 2015. The contribution of the author was in developing, implementing and evaluating the method, and writing the manuscript. Prof. Croft assisted with her valuable suggestions and with editing the manuscript.

A version of Chapter 3 has been submitted as: S. Radmard, D. Meger, J. J. Little and E. A.
Croft, "Resolving Occlusion in Active Visual Target Search of High Dimensional Robotic Systems," 2015. The contribution of the author was in developing, implementing, evaluating the method and writing the manuscript. David Meger, Professor’s Little’s doctoral student, assisted with suggestions, setting up the experimental set-up and editing the manuscript. Profs. Croft and Little assisted with their valuable suggestions and editing the manuscript. A version of Chapter 4 has been submitted as:  S. Radmard, A. Moon and E. Croft, "Impacts of Visual Occlusion and its Resolution in Telepresence Collaborations," 2015. The contribution of the author was in user study design, conducting studies, collecting and analyzing data, evaluating the results and writing the manuscript. AJung Moon, Professor’s Croft’s doctoral student, assisted with human-subjects study design, data analysis and editing the manuscript. Prof. Croft assisted with her valuable suggestions and editing the manuscript.  The user studies described in this chapter were performed with the approval of the Behavioral Research Ethics Board (BREB) at the University of British Columbia (UBC), under ethics application #H14-01937 “Telepresence and Occlusion Study”.  iv  Table of Contents Abstract .............................................................................................................................................................. ii Preface ............................................................................................................................................................... iii Table of Contents ............................................................................................................................................. iv List of Tables ................................................................................................................................................... vii List of Figures ................................................................................................................................................. viii Glossary ........................................................................................................................................................... xii Acknowledgements ........................................................................................................................................ xiv Chapter 1 ˗ Introduction ................................................................................................................................... 1 1.1 MOTIVATION .............................................................................................................................................. 1 1.2 PROBLEM FORMULATION ........................................................................................................................... 2 1.3 BACKGROUND ............................................................................................................................................ 4 1.3.1 Searching for a Target outside the Agent’s FoV ................................................................................. 5 1.3.2 Searching for an Occluded Target....................................................................................................... 7 1.3.3 Impacts of the Search Algorithm in Telepresence .............................................................................. 
8 1.4 THESIS OBJECTIVES .................................................................................................................................... 9 1.5 THESIS OUTLINE ....................................................................................................................................... 11 Chapter 2 - Active Target Search for High Dimensional Robotic Systems ................................................ 12 2.1 INTRODUCTION ......................................................................................................................................... 12 2.2 RELATED WORK ....................................................................................................................................... 12 2.2.1 Recursive Bayesian Estimation ......................................................................................................... 13 2.2.2 Cost Functions .................................................................................................................................. 15 2.2.3 Planning Horizon .............................................................................................................................. 16 2.3 PROBLEM FORMULATION ......................................................................................................................... 18 v  2.4 METHODOLOGY ........................................................................................................................................ 19 2.4.1 Observation Model ............................................................................................................................ 20 2.4.2 Cost Function .................................................................................................................................... 23 2.4.3 Constraints ........................................................................................................................................ 25 2.4.4 Planning Horizon .............................................................................................................................. 26 2.5 RESULTS ................................................................................................................................................... 32 2.5.1 Experimental Platform ...................................................................................................................... 32 2.5.2 Recursive Bayesian Estimation ......................................................................................................... 33 2.5.3 Cost Function .................................................................................................................................... 35 2.5.4 Planning Horizon .............................................................................................................................. 38 2.5.5 Generalizing to Higher Dimensions .................................................................................................. 39 2.6 CONCLUSION ............................................................................................................................................ 42 Chapter 3 - Resolving Occlusion in Active Visual Target Search of High Dimensional Robotic Systems ........................................................................................................................................................................... 
44 3.1 INTRODUCTION ......................................................................................................................................... 44 3.2 RELATED WORK ....................................................................................................................................... 45 3.3 PROBLEM FORMULATION ......................................................................................................................... 47 3.4 METHODOLOGY ........................................................................................................................................ 48 3.4.1 Cost Function .................................................................................................................................... 49 3.4.2 Constraints ........................................................................................................................................ 52 3.4.3 Optimization ..................................................................................................................................... 57 3.5 RESULTS ................................................................................................................................................... 63 3.5.1 Simulation Results ............................................................................................................................ 63 3.5.2 Experimental Set-up .......................................................................................................................... 67 3.5.3 Experimental Results ........................................................................................................................ 68 3.6 CONCLUSION ............................................................................................................................................ 70 Chapter 4 - Visual Occlusion Handling in Telepresence Collaborations .................................................... 72 4.1 INTRODUCTION ......................................................................................................................................... 72 vi  4.2 RELATED WORK ....................................................................................................................................... 74 4.3 PRELIMINARY STUDY ............................................................................................................................... 76 4.3.1 Telepresence Platform ....................................................................................................................... 76 4.3.2 Experimental Task ............................................................................................................................ 77 4.3.3 Manual Control Interfaces ................................................................................................................ 80 4.3.4 Selection of Manual Interface ........................................................................................................... 81 4.4 MAIN USER STUDY DESIGN ..................................................................................................................... 82 4.4.1 Autonomous Mode Controller .......................................................................................................... 82 4.4.2 Participants ........................................................................................................................................ 
82 4.4.3 Procedure .......................................................................................................................................... 83 4.4.4 Measures ........................................................................................................................................... 84 4.5 RESULTS ................................................................................................................................................... 85 4.5.1 Impacts of Occlusion on Telepresence Collaborations (H1, H2) ...................................................... 86 4.5.2 Autonomous Handling of Occlusions to Improve Telepresence Collaborations (H3-H6) ............... 87 4.6 DISCUSSION .............................................................................................................................................. 91 4.7 CONCLUSION ............................................................................................................................................ 93 Chapter 5 - Conclusions .................................................................................................................................. 95 5.1 SUMMARY ................................................................................................................................................ 95 5.2 CONTRIBUTIONS ....................................................................................................................................... 96 5.3 FUTURE WORK ......................................................................................................................................... 97 Bibliography ..................................................................................................................................................... 99 Appendix ......................................................................................................................................................... 110 A.1 STUDY ADVERTISEMENT ....................................................................................................................... 110 A.2 CONSENT FORMS ................................................................................................................................... 112 A.3 QUESTIONNAIRES .................................................................................................................................. 115  vii  List of Tables Table 2-1. LTS algorithm parameters in simulation .......................................................................................... 34       viii  List of Figures Figure 1-1. A schematic representation of a high dimensional eye-in-hand system searching for a moving target, a) when he target leaves the FoV, b) when the target is occluded.  Figure is generated using Barrett Technology WAM CAD model. .......................................................................................................................... 3 Figure 2-1. A scaled detection model in 𝓕𝒊 for a specific depth with depth coefficient ℵ(𝒅𝒌) = 𝟏. ................ 22 Figure 2-2. Schematic representation of the detection model. ........................................................................... 22 Figure 2-3. An example of the penalty function for a joint with joint limits of [−𝟑 𝟑] 𝒓𝒂𝒅. ........................... 26 Figure 2-4. A simplified 2D search when sensor action in one step does not return any information. ............. 27 Figure 2-5. 
Simulated greedy planning in a simplified 2D search when sensor action in one step does not return any information. All of the stages are addressed in Algorithm 1. ........................................................... 30 Figure 2-6. Experimental platform. .................................................................................................................... 33 Figure 2-7. Grid-based versus particle filter to solve for RBE of the target considering different number of grids/particles from time step and travel distance point of view........................................................................ 35 Figure 2-8. Common cases where a) (2-25) can cause a delay and b) (2-26) might fail by getting stuck in a local minimum. .................................................................................................................................................. 36 Figure 2-9. Entropy-based cost function versus PD cost function from time step and travel distance point of view for 100 simulation trials. ........................................................................................................................... 36 Figure 2-10. Designed experiment to lose the target during tracking, a) the moment that the searching phase is triggered, b) the moment that the target is recovered......................................................................................... 37 Figure 2-11. Entropy-based cost function versus PD cost function from time step, time and travel distance point of view for 40 experimental trials. ............................................................................................................ 38 Figure 2-12. Greedy planning versus RHC from time step and travel distance point of view for 100 simulation trials. .................................................................................................................................................................. 38 ix  Figure 2-13. 2-DoF (pan-tilt) search versus 7-DoF (high dimensional) search from time step and travel distance point of view for 100 simulation trials. ................................................................................................ 39 Figure 2-14. A sample simulation of 2-DoF (pan-tilt) search versus 7 DoF (high dimensional) search. a) shows a 2-DoF search at time step 18, b) shows a 7-DoF search at time step 10 and c) compares the CDP of these two search methods. .................................................................................................................................. 40 Figure 2-15. 2-DoF (pan-tilt) search versus 7-DoF (high dimensional) search from time step, time and travel distance point of view for 20 experimental trials. .............................................................................................. 41 Figure 2-16. A sample experiment comparing 2-DoF (pan-tilt) search versus 7-DoF (high dimensional) search: a) tracking phase, b) the moment that the target leaves the camera FoV, c) the moment that the target is recovered in a 2-DoF search, d) the moment that the target is recovered in a 7-DoF search. ........................ 41 Figure 3-1. a) Detecting only a single edge of the occluder. b) A view of the occlusion plane with the projected camera FoV, defining different types of occlusion edges. 1-3 represent known edges, 4 represents an unknown edge and 5-7 represent potential edges. ......................................................................................... 50 Figure 3-2. 
A scaled detection model in 𝓕𝒊 for a specific depth with depth coefficient ℵ(𝒅𝒌) = 𝟏, a) without occlusion, b) with occlusion .............................................................................................................................. 54 Figure 3-3. Schematic representation of the detection model. ........................................................................... 55 Figure 3-4. A schematic representation of a shadow plane and critical points, 𝒙𝒄𝒓, in the vicinity of the closest points on the shadow plane to the camera and robot’s base location. ................................................................ 57 Figure 3-5. One-step look-ahead planning using LTRA-RS. Samples are derived randomly in C-space. a) Rectangular board occlusion scenario. b) Human occlusion scenario. .............................................................. 58 Figure 3-6. One-step look-ahead planning using LTRA-CS. Samples are derived in Cartesian space and transferred to C-space using IK. a) Rectangular board occlusion scenario. b) Human occlusion scenario. ...... 59 Figure 3-7. a) An example of the penalty functions, 𝝍(𝒒𝒊) and 𝒘(𝒒𝒊), for a joint with joint limits of [−𝟑 𝟑] 𝒓𝒂𝒅.  b) Sample distribution around a critical point derived from (3-26). ............................................ 61 Figure 3-8. One-step look-ahead planning using LTRA-IS. Critical points are transferred from Cartesian to C-space using IK as seeding points for intelligent sampling. a) Rectangular board occlusion scenario. b) Human occlusion scenario. ............................................................................................................................................. 62 Figure 3-9. Comparison of the three LTRAs given 6 initial robot configurations for a rectangular board occlusion with respect to time step and total travel distance cost for 100 simulation trials. ............................. 64 x  Figure 3-10. Comparison of the three LTRAs given 6 initial robot configurations for a human occlusion with respect to time step and total travel distance cost for 100 simulation trials. ...................................................... 65 Figure 3-11. Comparison of the three LTRAs given Elbow Down initial robot configuration for a human occlusion with respect to computational cost for 100 simulation trials. ............................................................ 67 Figure 3-12. a) The articulated telepresence platform. b) An occlusion scenario where the person on the left represents the interlocutor and the person wearing a yellow T-shirt is the occluder. ........................................ 68 Figure 3-13. Comparison of the three LTRAs with random initial robot configurations with respect to time step, time and total travel distance cost for 20 experimental trials. ................................................................... 69 Figure 3-14. A stationary-interlocutor/moving-occluder telepresence scenario. In each stage a room view and a mounted camera view is presented. The person with the ARTag is the interlocutor while the person with yellow T-shirt is the occluder. In this sample scenario the LTRA-IS solver recovers the interlocutor in one step as the occluder approaches the pink bucket. ............................................................................................... 70 Figure 4-1. The 7-DoF telepresence platform used in our experiments consists of a tablet and a Kinect mounted on its end-effector. 
.............................................................................................................................. 77 Figure 4-2. A sample Jigsaw puzzle for the pilot used for the collaborative task. ............................................ 78 Figure 4-3. Schematic representation of the experimental telepresence collaboration set-up used for both preliminary and main studies. Both the interlocutor and the occluder were experimenters. The numbers in front of the interlocutor indicate the order in which the interlocutor completed the puzzle. The numbers in front of the telepresence platform represent the puzzle pieces at which the occluder moved to occlude the pilot. ................................................................................................................................................................... 79 Figure 4-4. Demonstration of an occlusion introduced during the experiments and its resolution. The ‘Mounted camera view’ shows an example of what the pilot, the participant, saw during the experiment. ..... 80 Figure 4-5. Manual mode interfaces. a) Leap-based interface that operates through hand gestures. b) Keyboard-based interface that operates through a set of keys. .......................................................................... 81 Figure 4-6. Overview of the experimental procedure. All participants completed three remote collaboration trials and five sets of questionnaires. We alternated the Manual and Autonomous modes in between participants to account for possible order effect. ............................................................................................... 84 Figure 4-7. Occlusion impacts in Stationary mode. a) Speech energy is significantly higher in Occlusion condition compared to No-occlusion. b) There is no significant difference in speech percentage of Occlusion xi  and No-occlusion. c) The task performance is significantly higher in No-occlusion condition as its piece completion time is significantly lower. .............................................................................................................. 87 Figure 4-8. Comparison of objective measures across different interaction modes. a) Speech energy is highest in Stationary, followed by Manual, Autonomous and co-located collaboration. b) Speech percentage is also highest in Stationary, followed by Manual, Autonomous and co-located collaboration. c) Task completion time is also highest in Stationary, followed by Manual, Autonomous and co-located collaboration. d) Occlusion resolution time is significantly lower in Autonomous mode compared to Manual mode. ............... 89 Figure 4-9. Comparison of subjective measures. a) TPI measures on the sense of presence across the three conditions illustrate that Stationary mode provides the lowest perceived sense of spatial and social presence. b) NASA TLX measures illustrate that Autonomous mode has significantly lower perceived workload compared to Manual model. .............................................................................................................................. 
90  xii  Glossary 1D/2D/3D/7D  one/two/three/seven dimensional CDP  cumulative detection probability C-space Configuration space DoF  degree of freedom EKF   extended Kalman filter FK  forward kinematics FoV  field of view GMM  Gaussian mixture model IK  inverse kinematics KF  Kalman filter  LTRA   lost target recovery algorithm LTS   lost target search MRP  mobile remote presence  NBV  next best view NP-hard non-deterministic Polynomial-time hard PD  probability of detection PDF  probability density function PRM  probabilistic roadmap  RBE  recursive Bayesian estimation  RHC  receding horizon control ROS   robot operating system RRT  rapidly exploring random tree xiii  SIR  sampling importance resampling SMC  sequential Monte Carlo TLX  task load index TPI  temple presence inventory UAV  unmanned aerial vehicle UKF  unscented Kalman filter xiv  Acknowledgements I would like to thank Professor Elizabeth Croft for her supervision and insight into my research throughout my doctoral study. I am very much indebted to her for her valuable lessons, mentorship and support since the first day I met her in CARIS lab. I am grateful of my supervisory committee, Professor James Little, Professor Farrokh Sassani and Professor Ryozo Nagamune, for their valuable feedback during the course of this thesis. Thanks also go to Professor Nando de Freitas, David Meger, AJung Moon, Ambrose Chan, Cole Shing, Joost Hilte, Louisa Hardjasa and Alex Toews in particular, who supported me in my research, and all of the volunteers who participated in my studies. I was fortunate during my candidacy to be surrounded by great members of the CARIS and CEL labs at UBC: Professor Mike Van der Loos, Jeswin, Tom, Susana, Wesley, Ergun, Eric, Simon, Amir, Chris, Jenny, Tina, Navid, Bulmaro, Philip, Matthew, Ben, Shalaleh, Mahsa, Sara, Justin, Ehsan, Danial, Marius and Masih. I would like to acknowledge the sources grants that supported my research: the Natural Science and Engineering Research Council of Canada (NSERC), the Canada Foundation for Innovation (CFI) and the UBC Institute for Computing, Information and Cognitive Systems (ICICS). Finally, but most importantly, I would like to thank my family, specifically my parents, whose unconditional love and support have assisted me throughout my studies and my life.   1  Chapter 1 ˗ Introduction 1.1 Motivation Robots are rapidly becoming incorporated into our daily lives. Development of prototype robotic assistants such as the Fetch robot1, the Baxter2 and the Sawyer3 and the rise of commercial robotic products such as the Roomba4, have demonstrated both interest and applications for robots that can function successfully in human environments. The ability for robots to work in semi-structured and even unstructured environments, that are far different than current robotic workcells, is important to the successful adoption of robots in human environments.  This transition is enabled by ongoing research in vision guided robot control or, visual servoing [1], [2],  that allows robots to operate within the “vision based” world in which humans work.  In addition, having a single camera sensory platform in robotics is cost effective, easy to maintain and computationally not that expensive. Therefore the use of a camera as the main or only feedback sensor has great appeal in areas like surveillance, assembly line part picking, manipulation, inspection, rescue robots, soccer robots, planetary rovers, etc. [3]. 
(Footnotes 1-4, referenced in the preceding paragraph: 1. A mobile manipulator, Fetch Robotics, http://fetchrobotics.com/fetchandfreight/. 2. A collaborative robot, Rethink Robotics, http://www.rethinkrobotics.com/products-software/. 3. A collaborative robot, Rethink Robotics, http://www.rethinkrobotics.com/products-software/. 4. iRobot Roomba®, a vacuum cleaning robot.)

Given the limited-range nature of a camera, it is necessary for all of these applications to keep the camera field of view (FoV) focused on the task. However, a common problem associated with using a camera as the feedback sensor is losing sight of the object that the camera was viewing. In manipulation, an object may evade the camera FoV. During an inspection, a human co-worker may occlude the object of interest from the camera. In such situations the robot needs to acquire new data to locate the lost object. The new data could be obtained from other sensor platforms if available; alternatively, the robot could acquire new data by searching for the target, based on the past data it has collected.

A review of the current literature suggests some solutions for specific scenarios that involve searching for a lost target. Acquiring data from other available sensor platforms such as laser scanners or sonar rangefinders (widely used in outdoor applications) or other stationary cameras (more applicable to indoor tasks) is a common solution [4], [5]. However, sensor fusion has its own challenges in online applications due to the increased amount of data that needs to be processed. Furthermore, there are the cost, calibration and maintenance of the extra sensor, which may not be affordable for the particular application.

A small number of recent research contributions, e.g. [6], propose visual tracking algorithms that overcome temporary occlusion. Although these contributions were made in the context of a single-camera platform, they do not address the case where the target leaves the FoV, or where the occlusion is not temporary. Nevertheless, these works advance the ability of robotic platforms to become more autonomous in their working environment. The ability to look for and find targets in the case of a more significant loss of target location is a natural next step to improve robot autonomy in a wide variety of applications.

1.2 Problem Formulation

The majority of robotic vision applications share the common task of object tracking. Such a task involves object detection, trajectory estimation and hardware manipulation, all of which depend on some visual cues of the target. Therefore, it is important for the robot to maintain visibility of the task at hand. Although methods to avoid loss of target visibility should be employed, given the less structured nature of the human environments where robots will operate, it is likely that loss of task visibility will inevitably occur. Given this likelihood, it is desirable that the robot can find the target in an autonomous/automatic fashion. This thesis directly addresses this problem to equip robots with automated methods to efficiently handle lost target scenarios; that is, to quickly re-acquire the target so the robots can return to their visual tasks autonomously. This problem is defined in the context of visual target tracking for a single camera mounted on a high degree of freedom (DoF) robot to address a wide range of applications.
Irrespective of the specific visual task at hand prior to the target being lost, robots should find the lost target efficiently and then locate it within a safe region of the acquired image. The specific focus is on recovery of visual tracking through active visual search in two common scenarios. One scenario involves the target leaving the FoV (Figure 1-1.a), and the other involves visual occlusions in dynamic environments (Figure 1-1.b). An example of the first scenario is loss of a target traveling on a conveyer belt due to an abrupt change in the conveyer belt's speed or direction during an inspection or grasping process. An example of the second scenario is the loss of a target when a human co-worker intrudes between the robot and the target of interest.

[Figure 1-1. A schematic representation of a high dimensional eye-in-hand system searching for a moving target, a) when the target leaves the FoV, b) when the target is occluded. Figure is generated using the Barrett Technology WAM CAD model.]

To address this problem, in the first instance, a fast and efficient online planner - an adaptive planner that updates and replans as new information arrives - is required to search for a moving object when a high dimensional camera-mounted robotic platform loses track of the target. Without loss of generality, this work considers a camera-mounted manipulator - a 7-DoF eye-in-hand system - as the exemplar high dimensional robotic platform, while the target moves in 3D as depicted in Figure 1-1.a. When the target state is not observable, probabilistic models can capture uncertainties in the target location and sensory observations. The target information then needs to be updated as new information becomes available through observations. A planner or decision maker is also required to plan optimal sensory observations while considering sensor agent constraints. This process is the foundation of probabilistic search strategies originating from operational research work [7], [8].

As the next step, visual occlusions (herein referred to as occlusions) must be resolved to recover a lost moving target. Here, this work classifies occlusions into two types. The first type involves a moving target or moving occluder that results in temporary loss of visibility; however, the target quickly reappears within the FoV without the need for camera repositioning. This type, namely a "temporary occlusion", is addressed by Tsai et al. [9]. This thesis considers a second type of occlusion, namely a "persistent occlusion". Resolving this type of occlusion requires the search for a camera pose (manipulator configuration) that restores visibility of the lost target so that tracking can resume. Resolution of this type of occlusion requires the robot to actively look around the occluder. The resulting new configuration must avoid singularities and joint limits while ensuring the target is inside the camera FoV.

For an algorithm to restore the view of the target in the presence of persistent occlusions, it must first obtain an understanding of the occluder extents. This involves: i) constructing a map that splits the world into occlusion regions, free-space, and unobserved areas, ii) continuously updating this map with new sensory data, and iii) permitting queries about "how much new information will be gained about the occlusion if the robot is moved to position 𝒒".
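As a concrete illustration of requirements i)-iii), the sketch below (a minimal illustration under stated assumptions, not the implementation developed in Chapter 3) keeps a coarse labelled voxel map of the workspace and answers the information-gain query for a candidate camera pose by counting the not-yet-observed voxels that the pose would bring into view. The voxel resolution, the in_fov/blocked visibility tests and the label names are placeholders introduced only for this example.

# Minimal sketch of an occlusion map supporting the query "how much new
# information would be gained if the camera moved to pose q?".  Illustrative
# only; the visibility tests in_fov(...) and blocked(...) are assumed callables
# that return boolean arrays over query points.
import numpy as np

FREE, OCCLUDED, UNKNOWN = 0, 1, 2

class OcclusionMap:
    def __init__(self, bounds, res=0.1):
        # bounds: [[xmin, xmax], [ymin, ymax], [zmin, zmax]] in metres.
        self.bounds = np.asarray(bounds, dtype=float)
        self.res = res
        shape = np.ceil((self.bounds[:, 1] - self.bounds[:, 0]) / res).astype(int)
        self.labels = np.full(shape, UNKNOWN, dtype=np.uint8)

    def _centers(self):
        # World coordinates of every voxel centre, as an (N, 3) array.
        idx = np.indices(self.labels.shape).reshape(3, -1).T
        return self.bounds[:, 0] + (idx + 0.5) * self.res

    def update(self, cam_pose, in_fov, blocked):
        # Relabel voxels seen from the current pose: visible voxels become FREE,
        # voxels inside the FoV but behind the occluder become OCCLUDED.
        c = self._centers()
        fov = in_fov(cam_pose, c)
        occ = blocked(cam_pose, c)
        flat = self.labels.reshape(-1)
        flat[fov & ~occ] = FREE
        flat[fov & occ] = OCCLUDED

    def expected_new_information(self, cam_pose, in_fov):
        # Planner query: how many voxels not yet known to be free would this
        # candidate pose bring into the FoV (a crude proxy for new information).
        flat = self.labels.reshape(-1)
        return int(np.sum(in_fov(cam_pose, self._centers()) & (flat != FREE)))

A planner can then rank candidate configurations by expected_new_information() before committing to a motion, which is exactly the kind of query such a map must support.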
Once a partial or complete map is obtained, the algorithm needs to reason about whether looking around the known occluder boundaries can restore the view of the lost target.  In contrast to most of the vision-related control schemes in robotics, there is no specific goal image or goal pose to achieve. Instead, the algorithm must recover the target as soon as possible from any view or pose. This is highlighted in robotic telepresence application where the camera and a local user can take various formations. In robotic telepresence, a user in a remote location embodies a robot with various levels of control over it to communicate with the local user. This application further allows to investigate the impacts of the achieved autonomy in human environments where robots will collaborate and operate. Then the algorithm’s performance needs to be compared against the way humans handle lost target scenarios in telepresence as a baseline.  In order to develop an algorithm to handle different scenarios of a lost target in high-dimensional robotics vision and investigate its potential applications, a review of the related works is necessary. This is the focus of the next section, where the related background is provided in three specific subsections. It reviews the works related to the two lost target scenarios and then the specific application of robotic telepresence in more detail. 1.3 Background Active visual search is a subset of a more general problem, active vision, in robotics and computer vision. Active vision (active perception) was introduced in [10] as “a problem of controlling strategies applied to the data acquisition process which will depend on the current state of the data interpretation and the goal or the task of the process”. A recent survey on active vision in robotics, [11], reviews a variety of application fields from object recognition and modeling to site reconstruction and inspection, surveillance, localization and mapping, navigation and exploration, tracking and search, robotic manipulation and assembly.  Researchers in vision-guided robotics primarily focus on maintaining the target within the camera FoV [12]–[17]. The work in [12] proposed a control strategy using Cartesian manipulability gradient and a 5  joint limit penalty function to increase the camera’s tracking region. The authors demonstrated that their proposed strategy overcame tracking failures arising from boundaries due to robot joint limits or kinematic singularities. Later, [18] adopted a risk-based tracking approach to minimize the risk of an evading target becoming occluded by environmental obstacles. However, the system exited with failure every time the target was not localized. Therefore, maintaining visibility is not always possible due to factors such as limited camera FoV, system constraints, occlusions and poor lighting conditions. For example, during visual servoing in [19] if the target leaves the FoV or is occluded, a search is launched only in the image space until the target is found or the time spent on searching passes a predefined time constraint. In other words, if the target does not reappear in a predictable and narrowly localized site, such algorithms will fail. The ability to robustly look for and find truly “lost” targets is a natural next step to improve robot autonomy in wide range of applications. Therefore, next section reviews the works focused on searching for a lost target when the target is outside sensor agent’s range in various robotics applications. 
1.3.1 Searching for a Target outside the Agent’s FoV Two closely related fields relevant to the target re-acquisition problem are: search, and pursuit-evasion [20]. In search the target is unaware, and moves independently of, the searcher agent state [21], [22] while in pursuit-evasion an evader with full awareness of the searcher state creates the worst-case adversary [23], [24]. This thesis focuses on lost target search also known as one-sided search, as opposed to pursuit-evasion or adversarial search.  A related area of research, where a robotic platform with a single, limited-range sensor applies various searching methods, involves unmanned aerial vehicles (UAVs), where one or more UAVs search for one or more lost targets (at sea or in the countryside). One of the early implementations of UAVs search for a single target is presented in [25] using a grid-based recursive Bayesian estimation (RBE), [26]. Further improvements were later provided by the same group in [27], [28]. Considering the same 2D platform, other forms of RBE and optimal planners were applied in [29]–[31] among others.  Various search algorithms have also been implemented on wheeled mobile robot platforms equipped with mounted sensors (usually a camera) to search for an object in an indoor environment. A stereo camera is the only sensor mounted on a mobile robot platform in [32], [33] to search for a stationary target in an indoor environment. The work in [34] further improves the object search to large dynamic cluttered indoor environments by utilizing environment semantics and additional sensors for obstacle avoidance.  6  There are a few applications of search methods for robotic manipulators. A sample-based motion planning was developed in [35] for an eye-in-hand system with mobile base to search for a stationary target in a known 3D indoor environment. This approach is presented in simulation and its offline planning process is stated to be time consuming as it covers the whole environment. In contrast, real world implementation of a novel, multistage optimization algorithm to search for partially observed or unobserved stationary targets in 3D environment using multi-DoF eye-in-hand systems is presented in [36]. While this approach considers robot joint limits and singularities, it is slow even for stationary targets due to the adopted image-based deterministic cost functions. To speed up the process, a probabilistic search is suggested in [37] for robotic manipulators, but only simulation results of a manipulator searching for a moving target in 2D are presented.  While an eye-in-hand manipulator platform often has six or more DoFs, the majority of the probabilistic search techniques in robotics focus on target search in 2D. The first implementation of a 2.5D multi-agent target search with height maps is claimed in [38]. Recently, sample-based viewpoint planning techniques are adopted in [34], [35] for mobile robots/manipulators to search for a stationary target in 3D. In addition, [37] simulates searching for a moving target in 2D and mentions development of more general search algorithms to higher dimensions as future work. In a survey of recent search and pursuit-evasion advancements in mobile robotics field [20], the extension of efficient algorithms for practical search into 3D and higher dimensions is mentioned as an open problem. 
Also, bounded speed, constrained mission endurance and refined sensing models are listed as ongoing practical challenges in the field, and these are some of the problems that this research tries to tackle. From a complexity perspective, the active object search is shown to be NP-hard (non-deterministic Polynomial-time hard) in general even considering a single agent searching for a single target [39], while under various simplifying assumptions, it is proved to be NP-complete [39], [40]. On the other hand, this problem becomes further intractable as the spatial dimensions of the sensor and target state as well as the temporal dimension of planning process increase. High-DoF searching agents, target motion in higher dimensions and planning over a time horizon (beyond one step) add to the complexity of the problem. Therefore, an approximate solution is all one can guarantee, and the aim is to find an approximate algorithm that can search efficiently in an online fashion. This algorithm should avoid the curse of dimensionality encountered in RBE and optimize a high dimensional cost function to plan a trajectory with a reasonable planning horizon. This search algorithm is further challenged in presence of environmental constraints like occlusions. In the next section, a review of various robotic vision works that address different types of occlusions is presented.  7  1.3.2 Searching for an Occluded Target Occlusions are another important source of tracking failure in robotic vision. Researchers have addressed partial occlusion handling in the robotic manipulation literature mainly through implementation of robust object recognition algorithms. For example in [41], the authors used scale-invariant feature transform (SIFT) features to recognize and estimate the pose of partially occluded objects. The authors of [42] used a robust error measure to track partially occluded objects. This measure assumes that the error between a template and the image is not normally distributed.  Temporary occlusion handling is also addressed in the literature mainly through robust tracking techniques in the image domain. For example, [43] presents a 2D probabilistic tracking-by-detection approach in a particle filtering framework to track dynamically moving persons in complex scenes with occlusions. Later in [44], a robust context-based tracking algorithm estimates target locations during temporary occlusions using the relative motion cues between the target and contextual features.   Resolving persistent occlusion has been addressed in a closely related field of next best view (NBV) planning literature. One early work [45] plans viewing angles based on the edges of shadow regions appeared in a range image. The higher parts of a scene occlude the scanning surface, causing shadow regions to appear in the range image during 3D modeling of an unknown scene. The authors adopted a geometry-based approach to model the occluded regions through planning a 1D viewing angle. Later in [46], occlusion-free viewpoints are computed for a stationary target in a cluttered environment. They defined the visibility region of a feature to be “the open and possibly empty set consisting of all viewpoints in free space for which the feature is visible in its entirety”. Then, they computed the visibility region of a target of interest in an assumed a priori known polyhedral world. 
In fact, they adopted an extensive geometrical approach to compute the visibility region that is not necessary if rapid recovery of a dynamic occluded target is the main goal.  As reported, in the case of a persistent occlusion, visibility reasoning, even around well-determined occluder extents, is not trivial [46]. The manipulator operates in Configuration space (C-space), and the occluder exists as a set of 3D boundaries in the robot’s workspace. On one hand, robot workspace planning is hampered by nonlinear maps of joint limits and singularities of robotic manipulators. On the other hand, densely mapping the entire 3D boundary into C-space with inverse kinematics (IK) is both computationally expensive and problematic for robots with high DoF that can have multiple (or, if redundant, infinite) IK solutions [47].  8  The work on NBV planning in [48], [49], where they perform C-space planning during 3D modeling of object shapes, provides a great insight. However, their approach of transferring the entire 3D workspace information into C-space is expensive. An expensive planner that cannot operate online is unsuitable for visual tracking where the agent has to keep up with the target. Therefore, an online planner that is applicable to higher dimensional search with constraints is required, where the agent state space is different from the target and occlusion state spaces. This online planner would allow for applying the algorithm on a real world robotic application and investigating its efficacy in practice.  An interesting application of visual tracking is in robotic telepresence where smooth and continuous tracking plays an important role. In robotic telepresence, occlusion as an environmental interference can deteriorate the telepresence experience. Therefore, the current state of robotic telepresence in understanding the occlusion impacts and handling environmental constraint is reviewed next.    1.3.3 Impacts of the Search Algorithm in Telepresence Robotic telepresence systems (robot-mediated videoconferencing) allow users in two different geographical locations to have audio-visual communication to converse and collaborate with each other [50], [51]. Increasing number of these systems are being used in populated areas such as museums [52], health care centers [53], and offices [54]. In such applications, there is an increased chance that visual contact between a pilot (the person remotely located behind the display, using the robot as a proxy) and an interlocutor (the person co-located with the robot watching the display/robot) will be disrupted due to occlusion by other humans and even other robotic telepresence systems. Occlusions can be problematic since they can negatively impact human-human and human-robot interactions. In human social motor behavior context [55], occlusion is demonstrated to lead to a less stable interpersonal coordination of movement between people during social interactions. Social learning literature suggests that a collaborating agent’s beliefs about his/her partner can diverge during an instructional task when an occlusion occurs [56]. In proxemics (the study of the use of space in social interactions), occlusion is introduced as an extrinsic environmental interference in complex environments [57]. Complementing these findings are studies that emphasize the importance of occlusion-free environments in fostering bidirectional communication. 
In telepresence and videoconferencing literature it is well-documented that maintaining eye contact and use of gaze cues during bidirectional communications positively impact perceived social presence [58], [59]. Therefore, it is likely that occlusion negatively affects the experience of using teleconference platforms as a remote collaboration 9  tool. However, as far as the author is aware, there has not been a systematic investigation that demonstrates and attempts to quantify the impacts occlusion can have on telepresence collaborations. When communication happens across a conventional, stationary videoconferencing platform, it is nearly impossible for the remote user to actively avoid or resolve the occlusion. When a robotic telepresence system is used, on the other hand, a remote user can resolve occlusions by manipulating the robot’s camera position/orientation. However, the task of controlling the robot can distract the user from the collaborative task at hand. Therefore, if occlusion poses significant hindrance to telepresence collaborations, it is important to understand how such hindrance could be mediated. Attempts to handle occlusions in the context of telepresence robots have been reported in the past. For example in [60], the authors adopted a threshold-based waiting policy to handle temporary occlusions that occurred while a mobile remote presence (MRP) system was following a person in office environments. However, the impacts of the adopted occlusion-handling technique was not investigated and user feedback was not considered as a design guideline. Design of a telepresence robot controller based on user studies usually includes a comparison between a manual and autonomous control modes [61], [62]. Findings from these studies reveal a trade-off between the two modes depending on the measures used and tasks considered. For example, a user study involving a telepresence walk-and-talk scenario [61] demonstrates that autonomous navigation of a telepresence robot lowers the pilot’s cognitive workload and improves the task performance and perceived safety compared to manual navigation. On the other hand, manually controlling a telepresence robot’s translational movement can produce a stronger social telepresence compared to fully autonomous control of the robot as stated in [62]. This thesis investigates the efficacy of the developed autonomous control mode in the context of resolving occlusions and compares it against a manual control mode in telepresence collaborations. 1.4 Thesis Objectives To address the shortcomings of the defined problem within the current literature, summarized at the end of each section in 1.3, the research objectives are defined as follows. This thesis attempts to provide an algorithm that searches for a lost target in high dimensional robotics in an online manner. It addresses the case where the target moves in 3D Cartesian space while the sensor plans in higher dimensions (usually 6 or 7D) of C-space as opposed to the common 2D lost target search problems. The thesis aims to develop an observation model and a cost function that efficiently searches in higher dimensions while considering 10  sensor constraints and robot kinematics constraints. To evaluate this algorithm, the thesis provides comparisons with current available algorithms in the literature on a real robotic eye-in-hand system. This work then further improves the cost function to tackle occlusions as an environmental constraint. 
This requires to identify feasible visibility restoring configurations (a 3D property) in dynamic environments while keeping the computational cost low. The resulting algorithm, lost target recovery algorithm (LTRA), should reason around the occlusion and recover the target with few predict/plan/act/observe iterations. Then, it is required to compare the LTRA on a real world robotic application of telepresence against existing method/s to actively resolve occlusions during remote communications. This thesis then explores what impacts occlusion has on remote collaborations and how different degrees of autonomy in resolving occlusions affect the collaboration experience. To investigate these interlinked objectives, two human-subjects experiments were conducted. This allows one to understand human behaviors in telepresence collaborations when visual occlusions are introduced, and to evaluate the necessity of the need to handle occlusions in telepresence robotics. This thesis then tries to evaluate the efficacy of applying LTRA on this real world robotics application from user’s point of view and empirically investigate its implications. The goal of this work is to develop a novel algorithm for high-dimensional robotics vision to search for a lost target in common scenarios. These common scenarios include when the target leaves the camera FoV and when an occlusion, as an environmental constraint, occludes the target in real time. This work then wants to investigate the efficacy of such an algorithm in a real world robotic application of telepresence. Achieving these goals require answering the following research questions: I. What strategies are most effective for a high dimensional active visual search for a lost target? How does one measure search effectiveness? II. How can the sensor constraints, robot kinematics constraints and visual environmental constraints be effectively addressed in the proposed searching algorithm? III. What is the added complexity level of the search algorithm in presence of occlusions, and what are the required improvements in the algorithm to handle occlusions? IV. What are the impacts of occlusion on telepresence collaborations? V. Can the developed autonomous handling of occlusions improve telepresence collaboration experience for remote users? These questions are addressed in detail in the following chapters as outlines in the next section.  11  1.5 Thesis Outline The following chapter introduces the lost target search (LTS) and utilizes the latest available information from the target just prior to leaving the FoV to initiate an optimal online search. It explains various features of the LTS algorithm and provides simulation comparisons with common methods existing in the literature. The capabilities of the general LTS algorithm is then demonstrated on a laboratory scale 7-DoF eye-in-hand system tracking a fast moving target. In chapter 3, the LTS algorithm is further extended to handle occlusions as a more complex lost target scenario. This chapter elaborates the key differences between the resulting algorithm, LTRA, and LTS; then demonstrates its capabilities in simulation and experiment on an improved version of the platform used in chapter 2, representing a telepresence system. The comparison of the proposed LTRA with an existing method in the literature is also presented.  Chapter 4 focuses on the application of the LTRA in real world robotic telepresence experience. It presents the design of two interlinked human-subjects studies. 
These studies are designed to investigate the impacts of occlusions on a telepresence experience and how effective the proposed LTRA can be from user’s point of view. The efficacy of the proposed LTRA on the telepresence platform is also compared against human user performance to handle occlusions. The research is then concluded in chapter 5 and a summary of the research contributions are provided. The future research avenues of this work are also discussed in this chapter. 12  Chapter 2 - Active Target Search for High Dimensional Robotic Systems 2.1 Introduction As explained in the previous chapter, when a robotic visual servoing/tracking system loses sight of the target, the servo fails due to loss of input. To resolve this problem a search method, namely LTS, to generate efficient actions to bring the target back into the camera FoV as soon as possible is required. For high dimensional platforms, like a camera-mounted manipulator or an eye-in-hand system, such a search must address the difficult challenge of generating efficient actions in an online manner while avoiding kinematic constraints. This chapter develops an online planner for probabilistic search that is applicable to higher dimensional search with constraints, where the agent and target state spaces are different. An observation model is introduced that forms the bridge between the agent space and the target space. The work in this chapter implements a sequential Monte Carlo (SMC) method [63] to handle nonlinear systems with non-Gaussian noise models in RBE (recursive Bayesian estimation). During this process, different approximate solutions with some common cost functions and planners are compared to drive the search in simulation and on a physical eye-in-hand platform.  The following section provides some necessary background in the LTS field. Section 2.3 presents the problem in detail. Then the methodology is presented and the contributions are elaborated in Section 2.4 followed by the results in Section 2.5. These last three sections are presented in similar subsections to highlight different aspects of this chapter’s contributions. Finally this chapter is concluded in Section 2.6.   2.2 Related Work A probabilistic search is structured around target belief state evolution and sensor optimal trajectory planning. A target belief state - posterior distribution of the target state also known as filtering density – evolves by RBE while relying on two main sources of information. One is the target’s a priori 13  information, obtained from tracking before losing sight of the target in our case, and the other one is new observations. In addition, an RBE contains two stages of prediction and update, where it propagates the target state by a state transition model and updates the belief state using an observation model through Bayes’ rule. In this process the target’s a priori information is presented by a probability density function (PDF) and both of the state transition model and observation model contain noise terms. As both the target and the sensor have motion models, we refer to the target motion model and sensor motion model as the state transition model and sensor motion model respectively throughout the thesis. Depending on the noise terms and the models, closed-form and approximate solutions exist for RBE [64]. For example linear models with Gaussian noise terms would result in Kalman filter (KF) with closed-form solution for the target’s belief state. 
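To make the closed-form case concrete: for a linear state transition and observation model with Gaussian noise, the prediction and update stages above reduce to the standard Kalman filter recursion. A minimal sketch is given below (Python/NumPy, with generic matrices F, Q, H, R standing in for a concrete model; this is an illustration of the recursion, not code from this thesis).

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Prediction stage of RBE for a linear-Gaussian model: propagate the
    belief mean x and covariance P through the state transition model."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, H, R):
    """Update stage: fuse a new observation z through Bayes' rule, which
    stays Gaussian and therefore has a closed-form solution."""
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```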
If the models are nonlinear or the noise terms are non-Gaussian, then approximate solutions exist [65]. For example, Extended Kalman filters (EKF) and Unscented Kalman filters (UKF) are used when the system is non-linear but the noise terms are still Gaussian. In the presence of non-Gaussian noise in nonlinear systems, approximate grid-based methods (like Hidden Markov models) are used when the continuous state space can be decomposed into grids (cells), otherwise SMC is applied. Even though EKFs are used for tracking a moving target and localizing a stationary target as in [66], [67], most of the LTS literature relies on the last two methods to solve RBE as discussed later.  Irrespective of the adopted RBE technique, a sensor trajectory planner is required to incorporate the new information, captured through new observations, in the planning process in order to make optimal decisions. Therefore, a cost function based on the target belief state is considered in information-theoretic applications to include all of the observations thus far in the planning process. Then the planner can be designed to make optimal decisions for one- or multi-step look-ahead depending on the planning horizon.  Finally, this whole probabilistic search process can be formulated as a partially observed Markov decision process (POMDP) as in [34], [68], [69] that incorporates all of the aforementioned steps. Here we discuss the related literature of RBE solutions, various choices for cost functions and the planning horizon for LTS to better understand the potential options for an efficient online high dimensional LTS algorithm for a moving target in 3D. 2.2.1 Recursive Bayesian Estimation  As the camera observation model and a high dimensional robotic platform are nonlinear in general with various noise terms, closed-form solutions of RBE are not considered here. On the other hand, as the dimensionality of the target and sensor state increases, the tractability of the search algorithm greatly 14  depends on the RBE approximation method. Approximate grid-based methods cover the whole feasible space while suffering from curse of dimensionality. Randomization breaks this curse for SMC methods, but they can get stuck in a local minimum. Although methods to overcome this problem exist (e.g., resampling for particle filter), alternative target belief state evolution options of approximate grid-based methods and particle filters have been adopted in literature as we summarize below. Approximate grid-based methods are used extensively in UAV search application were the search space is tessellated into 2D grids for target search and tracking [21], [22], [25], [27], [31], [70]. Another grid-based 2D search where a mobile robot decomposes an indoor environment search space into a convex tessellation is presented in [69]. This convexity criterion is considered to guarantee that the agent has a line-of-site to a target in the same cell. Approximate 2D grid-based search is also adopted in [37] where a manipulator searches for a moving target in 2D. In fact, the suggested future research work to improve the target search to 3D is one of the motivations of our work. Approximate grid-based methods are also used in 3D object search. For example, both the search space and the action space are tessellated in [32] to visually search for an object in 3D environment using a mobile robot. Even though the target exists in a 3D environment and the robot moves in 3D, the implemented algorithm only uses the vertices of the 2D grid. 
In a similar application, [33] uses two observation models - crude and refined - for global and local search of a mobile robot for a stationary 3D object. However, it uses both of these models to update a 2D grid-based probability map recursively to plan the search. Finally, a 3D grid map of an indoor environment is used in [34] to search for different targets. However, environment semantics are used to significantly reduce the number of feasible grids and speed up the search process. In fact, the computational cost is one of the main drawbacks of tessellating 3D search spaces, a problem that SMC avoids. Using a particle filter as an SMC method is widely adopted in probabilistic LTS. For example, a mobile robot searches for a moving target in an indoor 2D environment in [71] using a particle filter and tries to minimize its expected acquisition time. In order to apply non-linear models and reduce computational costs, [28] uses a particle filter to allow multiple UAV agents to search for multiple targets in 2D over a planning horizon in a timely fashion. A similar application of a particle filter in UAV 2D search allows for horizon planning in [29], [72]. Finally, [73] shows that particle filter estimation performance outperforms the approximate grid-based method in a 2D search.  In this chapter, we compare the performance of grid-based methods versus particle filters for RBE approximation in a simplified scenario based on the work in [73]. This belief state estimation is then used 15  to form a cost function to drive the search optimally. To this end, in the next section we review the related literature on choice of cost function. 2.2.2 Cost Functions There are multiple choices for cost functions in the literature for search and exploration purposes. In a few examples, the trace or determinant of the covariance matrix of a set of estimated states is used. For instance in [66], the trace of the target’s position estimate covariance matrix at each time step is considered as the cost function for target localization. The same cost function, calculated over the entire planning horizon, is used in [74]. For a similar application of online sensing and planning [68], the trace of the covariance matrix of the estimated state (robot pose and location of map features) at the end of a planning horizon forms the cost function.      A popular choice of cost function in the field of LTS is maximizing probability of detection (PD), or in other words, minimizing the probability of no detection. Maximization of the probability of detecting a target, marginalized over all possible target states, is used as a cost function in [25], [27], [33], [73]. Detecting a target for the first time at each iteration while considering no-detection likelihood in the previous iterations is used as a cost function in [70], [72]. The cumulative sum of the probability of detecting the target for the first time, or cumulative detection probability (CDP), is also used in [31], [75], [76]. The incorporation of some additional terms like the inverse of travel distance along with the PD in the cost function is also reported in [32], [35], [37]. Another popular cost function, based on information theory, maximizes the information gathered over a set of actions by minimizing the belief state entropy. There are numerous applications of this objective function in the field of robotics. 
For example, in a robot localization application using an adaptive particle filter approach, the number of particles is adapted based on the entropy of the robot’s state with respect to its environment [77]. In [71], for a target search application, the authors utilize a target state entropy cost function that considers an additional travel distance term to compensate for the scan area while planning ahead one step at a time. In [29] a novel approach is presented to predict the expected target belief state entropy in an active sensing problem that drives a planner over a horizon. The authors show that a controller based on the maximum PD resembles the outcome of a minimum entropy controller when the PD is low.  Following on this statement, we also compare the performance of the two most common cost functions of PD and belief state entropy while considering additional terms to consider sensor action’s cost. We present the result of this comparison both in simulation and on our platform by considering a greedy (one-16  step look-ahead) planning approach. The most promising cost function is then used to decide on the planning horizon while considering various horizon planning techniques in the literature.  These are presented next.  2.2.3 Planning Horizon In the target search and localization literature, sensor planning is applied in multi-step look-ahead fashion or simplified to one-step look-ahead, also known as greedy or myopic planning. Multi-step look-ahead planning is a well-known approach in control theory, known as model predictive control (MPC) or receding horizon control (RHC) [78]. An optimization problem is solved at each time step to find an optimal series of actions while planning over a time horizon. Then the control input at that time step is the first action in that series of optimal actions. This optimization problem allows consideration of the control limitations and system constraints. However, solving this optimization problem for continuous nonlinear systems with high degrees of uncertainty in higher dimensions is prohibitive [79]. RHC is an NP-hard problem as dimensions grow exponentially by planning horizon, and the only solution is to discretize the problem and turn the optimization into a combinatorial optimization problem. Various methods to discretize the target state space, the sensor’s control action or combination of both, are proposed and compared in the literature. For example, grid-based planning is use in [27], [70] to solve the unconstrained optimization for RHC of UAVs. A fixed five step horizon planning is used in [27], while [70] uses a variable horizon planning technique to decrease the dimensionality of the optimization as the sensor approaches the target. In the latter, when the sensor is far from the target, the planning horizon is increased, and as it approaches the target, short-term plans are made using a gradient-based optimizer.  Another method to turn the optimization into a combinatorial problem is by discretizing the control action. This approach is applied to mobile robots and UAVs in [69], [75], [80]. In [69] and [75] a limited step look-ahead planning is considered due to the exponential nature of the RHC. For a set of Ψ number of discretized control actions in one step of a 𝜏-step horizon, Ψ𝜏 possible paths exist. In order to reduce this number, a rollout policy approach is proposed in [80], that reduces the considered possible paths from Ψ𝜏 to Ψ + (𝜏 − 1)Ψ2. 
This heuristic approach initiates a greedy search from the next step at any given time while marginalizing over all possible discretized actions at that time. This rollout policy for horizon planning of a UAV with restricted motion model is claimed to guarantee its navigation safety in a hostile environment. Another popular approach to solve RHC is based on dynamic programming. The search for multiple stationary targets in a static environment is solved using dynamic programming in [81] while dividing the 17  environment into a set of distinct regions. In [72] the authors adopt dynamic programming principle to plan over a horizon for UAVs with discrete set of control actions. They represented the target belief state by particle filter, and compared the cost of different branches of control actions across search trees. Branch and bound techniques are also used for finite horizon planning in information gathering and object search objectives. The optimal search problem for a target in indoor environments is solved by branch and bound in [76] and [82]. Informative path planning in outdoor environments uses branch and bound in [83] to speed up the planning  while relying on a monotonic cost function of variance reduction. In all of these solutions based on branch and bound, the agent’s control action space is discretized in various forms to generate nodes for the search tree. Random trees provide the base for another horizon planning approach. The agent’s action is planned based on rapidly exploring random trees (RRT) in [74] while the target state estimation is evolved using particle filter. The authors introduced shuffling and randomized pruning - techniques adopted from genetic algorithms - to improve computational efficiency of a random tree where the tree nodes are generated by biased sampling. The overall computational cost of this approach is 𝜏Ψ2. Although they were able to solve a search problem in 3D using random tree, the results are based on offline global open-loop optimization rather than in an online receding horizon fashion. Further improvements are provided in [30] to consider various alternatives of information objectives for an informative motion planning problem. The incremental sample-based approach of RRT is also compared against rapidly exploring random graphs (RRG) and probabilistic roadmaps (PRM). However, generating trajectories in higher dimensions and in an online fashion are mentioned as future work that is another motivation of this chapter.  When the sensor’s action and state space are high dimensional, discretization to solve RHC fails due to the curse of dimensionality. To handle this, multiple Monte Carlo methods are proposed in the literature to approximate the cost function over a horizon. In [68] a state variance cost function is approximated by samples over a horizon of 𝜏 while Bayesian optimization is used to minimize the number of queries to find the optimal path (policy). Linear interpolation of samples is also used in [29] to approximate an entropy-based cost function over a horizon of 𝜏. However, approximating any cost function over a planning horizon with samples is a costly process as it requires calculation of the posterior distribution for each of the samples.  Computational cost is a major drawback of all the aforementioned approximation techniques to solve RHC. In order to avoid the computational cost for an online planning problem, the tendency in the literature is to adopt the simplified greedy planning approximation [79]. 
Greedy planning approach is 18  implemented on UAV in [21], [22], [25], [71], [84], on mobile robots in [32], [66], [67], and on manipulators in [35], [37], [73] to search for a target. However, a greedy approach can get stuck in local minima. To overcome this problem, various heuristic methods are built upon greedy planning in [31], [33], [34]. We also propose a heuristic greedy approach to avoid getting stuck in local minima and compare its performance against a fast RHC method adopted from literature.  Based on this review, in this chapter we present a particle filter approach to evolve the belief state of a target in 3D after leaving the camera FoV. Then an entropy-based cost function is proposed considering sensory platform constraints. This cost function is utilized in a heuristic greedy planner to drive an online optimization for a high dimensional robotic platform to efficiently search for the lost target. Each of these major components is compared against common methods in the literature and simulation and experimental results on an eye-in-hand system are presented to show the superior performance of our proposed algorithm.       2.3 Problem Formulation When a moving target is inside the camera FoV, the target can be visually tracked and followed (visual tracking/servoing). Herein, we refer to this situation as the “tracking phase”. To improve the tracking performance, state estimation techniques can be applied to estimate the target motion based on some pre-assumed motion models [9]. When the target leaves the FoV, no new data about its state is available to the sensor, and tracking/servoing algorithms fail. However, the knowledge about the target based on the last available information prior to leaving the FoV can be used in an RBE to estimate its state over time.  We refer to this situation as the “searching phase”. The state transition model of the target irrespective of its situation with respect to the FoV is represented by:  𝒙𝑘+1 = 𝑓𝑘+1(𝒙𝑘, 𝒘𝑘), (2-1) or in the probabilistic form, 𝑝(𝒙𝑘+1|𝒙𝑘), also known as process model. Here 𝒙𝑘 ∈ 𝑅𝑛𝑥 is the target state at time step 𝑘 and 𝑓𝑘+1: 𝑅𝑛𝑥 × 𝑅𝑛𝑤 → 𝑅𝑛𝑥 is any possible function of the target motion. The independent and identically distributed (i.i.d) process noise, 𝒘𝑘, represents the uncertainty in target state. Herein, the dimension of a vector α is written as 𝑛𝛼. The sensor motion model is given by: 19   𝒒𝑘+1 = 𝑔𝑘+1(𝒒𝑘, 𝒖𝑘), (2-2) where, 𝒒𝑘 ∈ 𝑅𝑛𝑞 is the sensor state at time step 𝑘 and 𝑔𝑘+1: 𝑅𝑛𝑞 × 𝑅𝑛𝑢 → 𝑅𝑛𝑞 is any possible function of the sensor motion given its control input 𝒖𝑘. Here we assume perfect sensor localization assuming a perfect robot model. This is a valid assumption as any uncertainty in the robot model can be captured in the observation model noise term. The mounted camera on the robot provides this observation model:  𝒛𝑘 = ℎ𝑘(𝒙𝑘, 𝒒𝑘 , 𝒗𝑘), (2-3) or in the probabilistic form, 𝑝(𝒛𝑘|𝒙𝑘, 𝒒𝑘), also known as likelihood. Here 𝒛𝑘 is the observation at time step 𝑘 and ℎ𝑘: 𝑅𝑛𝑥 × 𝑅𝑛𝑞 × 𝑅𝑛𝑣 → 𝑅𝑛𝑧 is any possible observation function considering i.i.d measurement noise 𝒗𝑘. Given a series of observations 𝒛1:𝑘 = {𝒛𝑖, 𝑖 = 1, … , 𝑘}, RBE calculates the target belief state or posterior distribution of 𝑝(𝒙𝑘|𝒛1:𝑘) recursively through a prediction and update stage. The prediction stage combines a priori information with state transition model through Chapman-Kolmogorov equation as follows:  𝑝(𝒙𝑘+1|𝒛1:𝑘) = ∫ 𝑝(𝒙𝑘+1|𝒙𝑘)𝑝(𝒙𝑘|𝒛1:𝑘) 𝑑𝒙𝑘 . 
(2-4) As new observations become available, Bayes’ rule updates the target belief state as follows:   𝑝(𝒙𝑘+1|𝒛1:𝑘+1) =𝑝(𝒙𝑘+1|𝒛1:𝑘)𝑝(𝒛𝑘+1|𝒙𝑘+1, 𝒒𝑘+1)∫ 𝑝(𝒙𝑘+1|𝒛1:𝑘)𝑝(𝒛𝑘+1|𝒙𝑘+1, 𝒒𝑘+1)𝑑𝒙𝑘+1 . (2-5) The process starts with (2-4) from 𝑝(𝒙1|𝒛1:1) ≡ 𝑝(𝒙0) where 𝒛1:1 is the last available observation from the tracking phase. The goal is to generate an optimal or sub-optimal sensor control input 𝒖𝑘 to find the lost target as soon as possible with minimum cost. Therefore, a cost function based on target state and as a function of sensor control input is formed. Depending on how the cost function is formed, a single control input 𝒖𝑘 or a sequence of control inputs 𝒖𝑘:𝑘+𝜏 over a horizon 𝜏 is planned at time step 𝑘, though only 𝒖𝑘 is applied at each step to acquire new observation. Then (2-5) is used to update the target belief state. This whole process iterates until the target is recovered and the searching phase is terminated.   2.4 Methodology In order to plan efficient sensor actions, the first step is to solve the RBE to evolve the target belief state. To handle the nonlinearities in the observation model and some state transition models as well as any 20  noise term, we apply the Sampling Importance Resampling (SIR) particle filter with residual resampling step as detailed in [63], [65]. In this case, target’s a priori distribution 𝑝(𝒙𝑘|𝒛1:𝑘) starting from 𝑝(𝒙0) is approximated by a set of 𝑁 weighted Dirac delta functions as follows:  𝑝(𝒙𝑘|𝒛1:𝑘) ≈ ∑ 𝜔𝑘𝑖 𝛿(𝒙𝑘 − 𝒙𝑘𝑖 )𝑁𝑖=1, (2-6) where, 𝒙𝑘𝑖  is one of the 𝑁 particles. The particles transit through state transition model in SIR, while their normalized associated weights, {𝜔𝑘𝑖 , 𝑖 = 1: 𝑁}, are calculated based on the principle of importance sampling through the observation model and resampling stage such that ∑ 𝜔𝑘𝑖 = 1𝑁𝑖=1 . In this chapter, the particles are points (𝒙𝑘𝑖 = [𝑥, 𝑦, 𝑧]𝑘𝑖 ) in Cartesian space representing target location probability, as 𝒙 is expressed in Cartesian coordinates (𝑛𝑥 = 3). We also express our sensor state in the robot configuration space (typically a 6 or 7 dimensional space for a robotic manipulator) where the vector 𝒒 represents a joint angle vector (𝒒𝑘 = [𝑞1, … , 𝑞𝑛𝑞]𝑘). By choosing to plan sensor actions in configuration space we avoid the need for robot singularity handling and feasibility checks through IK and facilitate joint limit avoidance by implementing them as constraints. Therefore, the observation model in (2-3) must bridge between these two spaces in order to update the belief state by particle filter.  2.4.1 Observation Model The whole process of search and tracking can be handled through particle filter as long as a thorough observation model is available. During tracking phase a detection event represents the observation, 𝒛𝑘 =𝑫𝑘, while during searching phase, a no-detection event represents the worst-case observation scenario by considering 𝒛𝑘 = ?̅?𝑘. As the observations in the tracking and searching phase are complementary, the detection and no-detection models are combined in a Bernoulli model to cover both cases as follows:  𝑝(𝒛𝑘|𝒙𝑘, 𝒒𝑘) = [𝑝(𝑫𝑘|𝒙𝑘 , 𝒒𝑘)]𝐼1(𝒙𝑘,𝒒𝑘). [𝑝(?̅?𝑘|𝒙𝑘, 𝒒𝑘)]𝐼2(𝒙𝑘,𝒒𝑘), 𝐼𝑖 = 0 𝑜𝑟 1. (2-7) Here we focus on the detection model as the no-detection model used in searching phase is simply the complementary part. We propose a continuous detection model based on a pin-hole camera model that encapsulates the depth parameter as well. A similar approach is considered in [85] to generate a geometry-based camera observation model called sensed sphere.  
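Before turning to the geometric sensed-sphere model of [85], the SIR recursion described above can be sketched as follows (Python/NumPy; the state-transition sampler and the detection/no-detection likelihood are passed in as placeholder callables). This is an illustrative sketch under those assumptions, not the thesis implementation.

```python
import numpy as np

def residual_resample(weights, rng):
    """Residual resampling: deterministic copies from floor(N*w), then a
    multinomial draw on the leftover fractional weights."""
    N = len(weights)
    counts = np.floor(N * weights).astype(int)
    remainder = N * weights - counts
    n_left = N - counts.sum()
    if n_left > 0:
        counts += rng.multinomial(n_left, remainder / remainder.sum())
    return np.repeat(np.arange(N), counts)

def sir_step(particles, weights, transition, likelihood, z, q, rng):
    """One SIR cycle: predict through the state transition model (2-4),
    re-weight with the observation model as in the update stage (2-5),
    then resample."""
    particles = transition(particles, rng)           # prediction stage
    weights = weights * likelihood(z, particles, q)  # update stage
    total = weights.sum()
    if total > 0:                                    # near-zero total weight is the
        weights = weights / total                    # case handled by the planner's switch
        idx = residual_resample(weights, rng)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```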
This sensed sphere is a subsection of a sphere generated by intersecting camera view volume defined by camera bounding planes and concentric spheres of different radiuses centered at camera location. The observation probability is considered 21  constant within this discretized sensed sphere which does not address depth and boundary effects in image processing.  To consider the image edge effect and a nominal detection depth range, and bridge between the Cartesian space and configuration space we define our heuristic detection model in the image frame, ℱ𝑖, considering [𝑢, 𝑣, 𝑑]𝑘 coordinates. The image plane is a rectangular plane with poor detection capability on the edges. To encapsulate these features, we construct our detection model from the product of four sigmoid functions centered close to the edges of the image plane. When 𝐿𝑢 and 𝐿𝑣 represent the width of the image plane in 𝑢 and 𝑣 directions in pixels with a safety bound of size 𝜖, the model is expressed as follows:   𝑝(𝑫𝑘|[𝑢, 𝑣, 𝑑]𝑘) = ℵ(𝑑𝑘). ∏11 + 𝑒(−1)𝑖𝛼(𝑢𝑘−𝑙𝑢).11 + 𝑒(−1)𝑖𝛼(𝑣𝑘−𝑙𝑣)2𝑖=1, 𝑙𝑢 = (−1)𝑖(𝐿𝑢 2⁄ − 𝜖),  𝑙𝑣 = (−1)𝑖(𝐿𝑣 2⁄ − 𝜖),  (2-8) where 𝛼 defines the slope of the sigmoid functions. In this model, ℵ(𝑑𝑘) is the depth coefficient to account for the depth effect in the detection model. This model is depicted in Figure 2-1 for the case when ℵ(𝑑𝑘) = 1. The schematic representation of the overall detection model is presented in Figure 2-2 when the depth coefficient is defined as follows: This coefficient is formed by product of two sigmoid functions of depth. One with a sharp slope (𝛽𝑠) represents the shortest range, 𝑑𝑠, that the object is detectable by the camera. This is usually the depth at which the entire target fits in the FoV or the target image deblurs. The second one with a mild slope (𝛽𝑙) defines the longest detection range, 𝑑𝑙, at which the target is so far away that is no longer detectable. Both 𝑑𝑠 and 𝑑𝑙 depend on the target size, sensor specifications, lighting condition and the detection algorithm that can be calculated or experimentally examined for a given platform in model base detection algorithms. In the case of a 3D camera, like Microsoft Kinect5, that the depth information is extracted from point cloud data, this information is provided by the sensor’s manufacturer.                                                              5 Xbox®, “Xbox is either a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries.”  ℵ(𝑑𝑘) =11 + 𝑒−𝛽𝑠(𝑑𝑘−𝑑𝑠).11 + 𝑒𝛽𝑙(𝑑𝑘−𝑑𝑙). (2-9) 22   Figure 2-1. A scaled detection model in 𝓕𝒊 for a specific depth with depth coefficient ℵ(𝒅𝒌) = 𝟏.  Figure 2-2. Schematic representation of the detection model. The model presented in (2-8) is defined in the image space, while the particles are expressed in the Cartesian space. To transfer all of the particles defined in the world frame to the image frame, we consider the robot base frame, ℱ𝑏, as the world frame. In this case all of the points are expressed through homogenous transformation matrices 𝚻𝑜𝑏 . Here, 𝚻𝑜𝑏 , 𝚻𝑒𝑏  and 𝚻𝑐𝑒  represent transformation matrices from the object frame to the base frame, the robot end-effector frame to the base frame and the camera frame to the end-effector frame respectively. Thus, any object can be expressed in the camera frame, 𝚻𝑜𝑐 , by the following kinematic transformation:           1 0.5 0.1 z 1 1 0.5 0.1 z ℵ 23   𝚻𝑜𝑐 = 𝚻−1𝑐𝑒 𝚻−1𝑒𝑏 (𝒒𝑘) 𝚻𝑜𝑏 . 
(2-10) Here, 𝚻𝑐𝑒  is the camera extrinsic parameters matrix captured through camera calibration, while 𝚻𝑒𝑏  is a function of the sensor state 𝒒𝑘.  Note that, 𝚻𝑒𝑏  is calculated through the robot’s forward kinematics (FK) and accounts for the sensor motion model in (2-2) affecting the overall observation model. Therefore, any point 𝒙𝑘 at time step 𝑘 expressed in the base frame ℱ𝑏 can be expressed in the image frame ℱ𝑖 as a function of 𝒒𝑘 as follows:  [𝑢𝑣𝑑] 𝑖𝑘= Κ[𝐼|0] [𝑥𝑦𝑧1] 𝑐 𝑘⟼ [𝑢𝑣𝑑] 𝑖𝑘(𝒒𝑘) = Κ[𝐼|0] 𝚻−1𝑐𝑒 𝚻−1𝑒𝑏 (𝒒𝑘) [𝑥𝑦𝑧1] 𝑏𝑘. (2-11) Here Κ is the camera calibration matrix defined by camera intrinsic parameters, and [𝐼|0] is a matrix formed by a 3x3 identity matrix and a zero vector. Substituting (2-11) and (2-9) into (2-8) provides the heuristic detection model that is a function of target and sensor state (𝒙𝑘 and 𝒒𝑘). This detection model or its complement is then used to update the target belief state in the particle filter update stage. It is also used to form different cost functions, whereby substituting (2-2) in (2-11), optimal sensor control actions can be derived to make efficient observations. In the next section we develop a cost function to make efficient sensor actions in the searching phase. Therefore, the no-detection model is considered as the observation model from now on unless stated otherwise. As target is recovered, the same procedure can be followed by adopting the detection model instead to continue with the tracking phase.  2.4.2 Cost Function The entropy of a random variable in information theory is a measure of its uncertainty. For a continuous random variable 𝜃 with PDF 𝑝(𝜃), the entropy is calculated as follows:  H(𝑝(𝜃)) = − ∫ 𝑝(𝜃) log 𝑝(𝜃) 𝑑𝜃. (2-12) As more information becomes available, the random variable entropy reduces. Therefore, the entropy of the target belief state can provide us with a cost function to make informative optimal observations. As observations are collected over time, the entropy is calculated as follows:  H(𝑝(𝒙𝑘+1|𝒛1:𝑘+1)) = − ∫ 𝑝(𝒙𝑘+1|𝒛1:𝑘+1) log 𝑝(𝒙𝑘+1|𝒛1:𝑘+1) 𝑑𝒙𝑘 . (2-13) 24  Utilizing the particle filter approximation, one can substitute (2-6) in (2-13) to achieve the following approximation:   H(𝑝(𝒙𝑘+1|𝒛1:𝑘+1)) ≈ − ∑ 𝜔𝑘+1𝑖 log 𝜔𝑘+1𝑖𝑁𝑖=1 (2-14) that is the same as the entropy of a discrete random variable. Even though the particles’ location is not addressed in this approximation, this is a sufficient measure to reduce uncertainty of the target belief state based on the particles’ weight as long as the weights are updated through observations. The case to account for particles’ location when the weights are not updated will be handled later when we propose our greedy planning. We also consider an additional travel distance term in the cost function to minimize the sensor action cost. This addresses the facts that in high-DoF robotic platforms multiple robot configurations can result in the same end-effector location or different observations might result in similar entropy reductions. The resulting cost function is then as follows:  Γ(𝒙𝑘+1, 𝒒𝑘 , 𝒒𝑘+1) = −(1 − 𝜌) ∑ 𝜔𝑘+1𝑖 log 𝜔𝑘+1𝑖𝑁𝑖=1+ 𝜌(𝒒𝑘 − 𝒒𝑘+1)TΦ(𝒒𝑘 − 𝒒𝑘+1). (2-15) Here 0 < 𝜌 ≤ 1  is a coefficient to adjust the emphasis on travel distance, while Φ is a diagonal matrix assigning different positive weights to different robot joints as they may not have equivalent travel costs. This cost function is similar to the one presented in [71], however we will further improve it later to account for sensor constraints and particle locations. 
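To show how (2-8), (2-9), (2-11) and (2-15) fit together in practice, the sketch below evaluates the detection probability of a set of particles for a candidate robot configuration and then the entropy-plus-travel cost. It is written in Python/NumPy under stated assumptions: the calibration matrix and the two transforms are taken as given, the principal point is assumed to lie at the image centre, and the function names are ours rather than the thesis code.

```python
import numpy as np
from scipy.special import expit as sigmoid  # numerically stable logistic function

def detection_prob(pts_world, K, T_cam_to_ee, T_ee_to_base,
                   Lu=640, Lv=480, eps=10, alpha=10.0,
                   beta_s=1e4, beta_l=10.0, d_s=0.1, d_l=1.5):
    """p(D | x, q) of (2-8): project world points into the image via (2-11),
    apply the four-sigmoid field-of-view term and the depth coefficient (2-9)."""
    pts_h = np.c_[pts_world, np.ones(len(pts_world))].T        # homogeneous, 4 x N
    pts_c = (np.linalg.inv(T_cam_to_ee) @ np.linalg.inv(T_ee_to_base) @ pts_h)[:3]
    d = pts_c[2]                                               # depth along the optical axis
    uv = (K @ pts_c) / d                                       # pixel coordinates
    u = uv[0] - Lu / 2.0                                       # re-centre (principal point
    v = uv[1] - Lv / 2.0                                       #  assumed at the image centre)
    fov = (sigmoid(alpha * (u + Lu / 2 - eps)) * sigmoid(-alpha * (u - Lu / 2 + eps)) *
           sigmoid(alpha * (v + Lv / 2 - eps)) * sigmoid(-alpha * (v - Lv / 2 + eps)))
    depth = sigmoid(beta_s * (d - d_s)) * sigmoid(-beta_l * (d - d_l))
    return fov * depth

def entropy_travel_cost(weights_next, q, q_next, rho=0.5, Phi=None):
    """Cost (2-15): weight entropy of the (hypothetically updated) belief
    plus a weighted joint-space travel term."""
    w = weights_next[weights_next > 0]
    entropy = -np.sum(w * np.log(w))                           # eq. (2-14)
    Phi = np.eye(len(q)) if Phi is None else Phi
    dq = np.asarray(q) - np.asarray(q_next)
    return (1 - rho) * entropy + rho * dq @ Phi @ dq
```

In the planner, weights_next would be the particle weights after a hypothetical no-detection update at the candidate configuration q_next, so that configurations expected to carve more probability mass out of the belief receive a lower cost.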
We will also compare the performance of this cost function against the PD cost function. The PD cost function that we chose for comparison is the probability of no-detection marginalized over all possible target states. This is the same as the normalization factor in the update formula in (2-5) for the no-detection likelihood, and represents the volume under the target belief state PDF after being carved by the no-detection likelihood. Considering the normalization factor in (2-5) in its particle filter representation and adding the travel distance term provides an alternative cost function, similar to the one in [35]:

$$\Gamma(\boldsymbol{x}_{k+1}, \boldsymbol{q}_k, \boldsymbol{q}_{k+1}) = (1-\rho)\sum_{i=1}^{N} \omega_k^i\, p(\boldsymbol{z}_{k+1}|\boldsymbol{x}_{k+1}^i, \boldsymbol{q}_{k+1}) + \rho\,(\boldsymbol{q}_k - \boldsymbol{q}_{k+1})^{\mathrm{T}}\Phi(\boldsymbol{q}_k - \boldsymbol{q}_{k+1}). \quad (2\text{-}16)$$

Irrespective of the adopted cost function, the planner should be able to account for the robot motion limitations, such as joint and velocity limits, via constraints. These limitations and constraints are discussed in the following section.

2.4.3 Constraints

The constraints applied in the literature are usually of a simple form, such as a maximum turning rate or speed for UAVs and mobile robots, implemented in the sensor motion model. However, in the case of high-DoF robotic manipulators with nonlinear joint-limit constraints, a constrained optimization is required. To avoid this complexity, we introduce a penalty function into our cost functions in (2-15) and (2-16) in the following form to account for the joint limits:

$$\Psi(\boldsymbol{q}_k) = \prod_{i=1}^{n_q}[\psi(q_i)]_k, \qquad [\psi(q_i)]_k = \left(1 + e^{-\gamma\frac{(q_i - q_i^{min})(q_i^{max} - q_i)}{(q_i^{max} - q_i^{min})^2}}\right)_k, \quad i = 1, \ldots, n_q. \quad (2\text{-}17)$$

In this penalty function, $q_i^{min}$ and $q_i^{max}$ represent the minimum and maximum joint limits of joint $i$ respectively. The coefficient $\gamma$ is chosen to be a large number so that the cost increases sharply close to the joint limits. This penalty function is depicted in Figure 2-3 for a sample joint with joint limits of $[-3\; 3]$ rad and various $\gamma$. The penalty function is applied to the cost functions in (2-15) and (2-16) as follows:

$$\Gamma_c(\boldsymbol{x}_{k+1}, \boldsymbol{q}_k, \boldsymbol{q}_{k+1}) = \Psi(\boldsymbol{q}_k)\,\Gamma(\boldsymbol{x}_{k+1}, \boldsymbol{q}_k, \boldsymbol{q}_{k+1}). \quad (2\text{-}18)$$

Figure 2-3. An example of the penalty function $\psi$ for a joint with joint limits of $[-3\; 3]$ rad and $\gamma = 10, 100, 1000$.

The same approach can be adopted to take the velocity and acceleration limits of a robotic manipulator into account. However, an alternative approach is to implement these limits in the sensor motion model in (2-2). In this chapter we implement the velocity limits as constraints in the sensor motion model, as we manipulate our hardware through velocity control. This is given by:

$$\boldsymbol{q}_{k+1} = \boldsymbol{q}_k + \boldsymbol{u}_k\Delta, \qquad 0 < |\boldsymbol{u}_k| < \dot{\boldsymbol{q}}_{max}, \quad (2\text{-}19)$$

where $\Delta$ is the time step in the control loop and $\dot{\boldsymbol{q}}_{max}$ is the robot's maximum velocity vector in joint space. Substituting this equation into (2-11) provides an observation model as a function of robot velocity. This allows us to plan observations in a greedy or RHC fashion, as discussed next.

2.4.4 Planning Horizon

Even though the low computational cost of greedy planning is appealing for a fast and efficient online LTS, it has the potential to get stuck in local minima [31]. As indicated in the literature, RHC can ameliorate this issue and also has other benefits in this context. For example, the navigational challenges of a UAV in a hostile environment make RHC inevitable, as discussed in [80]. RHC is also helpful for cases where sensors have a limited FoV or constrained footprint [29], [70]. In such systems, if the sensor location is far from the prior target distribution, a greedy search would not return an optimal solution.
This is the case where the particles' weights do not get updated, as mentioned in Section 2.4.2. It is a rare scenario in UAV search, as a broad prior PDF usually covers the entire search space based on the map, the available information about the terrain, and the lost target [31].

Figure 2-4. A simplified 2D search when sensor action in one step does not return any information (the particles, the sensor footprint and the maximum sensor range in one step are shown).

In eye-in-hand systems, even if the robot has limited navigational challenges apart from joint limits, the mounted camera is a limited-FoV sensor, and the lost target has a narrow prior PDF based on the limited information prior to leaving the FoV. Therefore RHC is an inevitable choice unless we address the cases where the next-step observation over the entire range of sensor motion does not return any information. In this case, as depicted in Figure 2-4 for a simplified 2D search, the cost of any action is similar and the system lacks a policy to make an action. Instead of horizon planning for such a scenario, we propose a switching objective function to solve this challenge for greedy planning. Whenever the observation model does not return any new information, we switch from minimizing the belief state entropy (or probability of no-detection) in (2-18) to minimizing the distance between the agent and the observation location that maximizes information gain. The switch happens when the observation model does not carve out any volume from the target belief state PDF. It is implemented through the detection model in (2-8) when the weights of all particles are returned close to zero. Considering the particles' weight vector and a small number $\varepsilon$, the switching condition is:

$$\|\boldsymbol{\omega}_k\| < \varepsilon, \quad \text{when } \boldsymbol{z}_k = \boldsymbol{D}_k. \quad (2\text{-}20)$$

When the switch happens, we propose to apply a sample-based approach that is more aligned with information gain. In fact, we try to sample from $p(\boldsymbol{z}_{k+1}|\boldsymbol{z}_{1:k})$ when $\boldsymbol{z}_k = \boldsymbol{D}_k$, that is, the distribution of the observation one step ahead given all previous observations. This is the PDF that appears in the denominator of the update formula in (2-5) for the detection likelihood:

$$p(\boldsymbol{z}_{k+1}|\boldsymbol{z}_{1:k}) = \int p(\boldsymbol{x}_{k+1}|\boldsymbol{z}_{1:k})\, p(\boldsymbol{z}_{k+1}|\boldsymbol{x}_{k+1}, \boldsymbol{q}_{k+1})\, d\boldsymbol{x}_{k+1}. \quad (2\text{-}21)$$

By sampling from this distribution, we are simulating a combinatorial optimization technique to find an approximately optimal sensor state $\hat{\boldsymbol{q}}_{k+1}$ that would minimize the belief state entropy in the next step if the observation could happen at that state. We then try to minimize the distance from the current sensor state $\boldsymbol{q}_k$ to that configuration, as it is not possible to jump from $\boldsymbol{q}_k$ to $\hat{\boldsymbol{q}}_{k+1}$ in one step. This generates an action towards an optimal future location when no new sensor action in one step from $\boldsymbol{q}_k$ can return new information. The distribution in (2-21) is formed by the target state propagated through the state transition model, together with the detection model. We follow the approach presented in [29] to randomly sample from this distribution when the target PDF is approximated by the particle filter. The particle filter allows us to use the particle locations and weights to approximate the distribution in (2-21) as follows:

$$p(\boldsymbol{z}_{k+1}|\boldsymbol{z}_{1:k}) = \sum_{i=1}^{N} \omega_k^i\, p(\boldsymbol{z}_{k+1}|\boldsymbol{x}_{k+1}^i). \quad (2\text{-}22)$$

In order to facilitate the random sampling process from (2-22), we need to replace the detection model with a distribution for which random sampling techniques already exist.
A multivariate Gaussian distribution in particle’s space (Cartesian space) with mean and covariance as functions of 𝒙𝑘+1𝑖  results in a multivariate Gaussian mixture model (GMM) as follows:  𝑝(𝒛𝑘+1|𝒛1:𝑘) = ∑ 𝜔𝑘𝑖𝑁𝑖=1𝒩(𝜇(𝒙𝑘+1𝑖 ), ∑(𝒙𝑘+1𝑖 )). (2-23) We sample 𝑁𝑠 times (𝒙𝑘+1𝑠 , 𝑠 = 1: 𝑁𝑠) from this distribution while our number of samples is much smaller than our number of particles (𝑁𝑠 < 𝑁). Then each sample in Cartesian space is simply associated with a camera transformation matrix, 𝚻𝑘+1𝑠𝑐𝑏 , based on geometry by assuming that only camera orientation should change to point at the sample position. This is a valid assumption as long as the target is not lost due to an occlusion. These camera transformation matrices, { 𝚻𝑘+1𝑠 −1𝑐𝑏 , 𝑠 = 1: 𝑁𝑠}, replace 𝚻−1𝑐𝑒 𝚻−1𝑒𝑏 (𝒒𝑘) in (2-11) to provide simulated sample observations based on our heuristic observation model. Then the sample observation that gains maximum information (minimum entropy) results in an optimal sample point ?̂?𝑘+1.   ?̂?𝑘+1 = argmin𝒙𝑘+1𝑠(− ∑ 𝜔𝑘+1𝑖 log 𝜔𝑘+1𝑖𝑁𝑖=1) , 𝑠 = 1: 𝑁𝑠. (2-24) 29  Algorithm 1. LTS algorithm 1: Begin with target belief state at time 𝑘, 𝑝(𝒙𝑘|𝒛1:𝑘). (Figure 2-5 a) 2: Apply the state transition model and propagate the particles using (2-4). (Figure 2-5 b) 3: If ‖𝝎𝑘‖ ≥ 𝜀, 𝑤ℎ𝑒𝑛 𝒛𝑘 = 𝑫𝑘, then: a) Find the action 𝒒𝑘+1 in (2-19) that minimizes (2-18) using (2-15).     Else: b) Generate a GMM, 𝑝(𝒛𝑘+1|𝒛1:𝑘), using the propagated particles and (2-23). (Figure 2-5 c) c) Sample 𝑁𝑠 times from the GMM to find 𝒙𝑘+1𝑠  and associated 𝚻𝑘+1𝑠𝑐𝑏 . (Figure 2-5 d) d) Use (2-24) to find ?̂?𝑘+1 and ?̂?𝑘+1𝑐𝑏 . (Figure 2-5 e) e) Find ?̂?𝑘+1 based on ?̂?𝑘+1𝑐𝑏  through IK. f) Plan 𝒒𝑘+1 to minimize the distance between 𝒒𝑘 and ?̂?𝑘+1 considering (2-17) and (2-19). (Figure 2-5 f) 4: Make an observation at 𝒒𝑘+1 and update the belief state using (2-5). 5: Resample the particles using residual resampling. Finally the ?̂?𝑘+1 associated optimal camera transformation matrix, ?̂?𝑘+1𝑐𝑏 , results in the optimal sensor state, ?̂?𝑘+1, through one call of IK. This approach allows us to have different target and sensor spaces while bridging between them with low computational cost. It is through this switching greedy approach that we account for particle locations in the cost function and provide a tool to avoid getting stuck in local minima. Our overall greedy LTS algorithm including the switching cost function is presented in Algorithm 1. The stage a) in Algorithm 1 uses a constrained nonlinear solver (fmincon in MATLAB) to generate optimal 𝒖𝑘 that minimizes (2-18). The connection between 𝒖𝑘 in (2-19) and (2-18) is built by a series of substitutions: (2-19) in (2-11), (2-11) in (2-8), (2-8) in (2-7), (2-7) in (2-6), (2-6) in (2-15), and (2-15) in (2-18).  Moreover, to visualize the optimal sensor action planning process in one step when switched, we present a simplified 2D search where the target and sensor state space are the same as depicted in Figure 2-5. 30    a)                                                                                      b)   c)                                                                                     d)  e)                                                                                      f) Figure 2-5. Simulated greedy planning in a simplified 2D search when sensor action in one step does not return any information. All of the stages are addressed in Algorithm 1. 
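Read as code, the switching control flow of Algorithm 1 might look like the following schematic sketch (Python-style; the constrained optimizer, GMM sampler, kinematics routines and belief update are abstracted behind a hypothetical `model` interface, so this illustrates the decision logic rather than reproducing the thesis implementation).

```python
import numpy as np

def lts_step(particles, weights, q, model, rng, eps=1e-5, n_samples=40):
    """One predict/plan/act/observe iteration of the greedy LTS planner (Algorithm 1)."""
    # Steps 1-2: propagate the belief through the state transition model.
    particles = model.transition(particles, rng)

    # Step 3: would a detection from a configuration reachable in one step
    # re-weight any particle?
    w_det = weights * model.p_detect(particles, q)
    if np.linalg.norm(w_det) >= eps:
        # a) Informative case: minimise the constrained entropy + travel cost (2-18),
        #    e.g. with an fmincon-style constrained nonlinear solver.
        q_next = model.minimize_cost(particles, weights, q)
    else:
        # b)-c) Switch: build the GMM (2-23) from the propagated particles and
        #       draw candidate observation points from it.
        goals = model.sample_gmm(particles, weights, n_samples, rng)
        # d) Keep the sample whose simulated observation leaves the lowest
        #    weight entropy (2-24).
        best = min(goals, key=lambda s: model.simulated_entropy(particles, weights, s))
        # e)-f) One IK call gives the goal configuration; step toward it within
        #       the joint-limit penalty (2-17) and velocity limits (2-19).
        q_goal = model.inverse_kinematics(best)
        q_next = model.step_towards(q, q_goal)

    # Steps 4-5: act, observe, update the belief via (2-5) and resample.
    z = model.observe(q_next)
    particles, weights = model.update_and_resample(particles, weights, z, q_next)
    return particles, weights, q_next
```

The `model` methods named here (transition, p_detect, minimize_cost, sample_gmm, simulated_entropy, inverse_kinematics, step_towards, observe, update_and_resample) are placeholders for the corresponding pieces developed in Sections 2.3 and 2.4.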
A similar approach to our switching cost function is proposed in [84], where the objective function switches between the CDP and the agent's distance to the mode of the target PDF when the CDP goes below a certain threshold. Their switch is the same as replacing the optimal sample in (2-24) with:

$$\hat{\boldsymbol{x}}_{k+1} = \boldsymbol{x}_{k+1}^i, \quad i = \arg\max_i(\omega^i). \quad (2\text{-}25)$$

This approach is interpreted as a tradeoff between the detection reward along the path and the distance to the most probable location of the target. The same approach is adopted in [37] in the form of a multi-objective function, where the PD and the distance of the sensor from the mode of the target PDF are combined linearly. However, it is claimed that the detection objective tends to explore local areas with high probability before moving the sensor to the area of maximum probability. Navigating towards the peak of the target belief state PDF has also been applied as the only planner in the general search method in [33] and is known as streaming Drosophila search [22]. Another switching approach is used in [28] to plan greedy actions through a linear combination of a search and a capture objective. When the sensor agent is far from the target, the distance between the agent and the weighted average of the particles (the mean of the target belief state PDF) is minimized, and as it gets closer, the uncertainty in the target state is minimized by minimizing the sum of variances on the state. This switch is the same as replacing the optimal sample in (2-24) with:

$$\hat{\boldsymbol{x}}_{k+1} = \sum_{i=1}^{N} \omega^i \boldsymbol{x}_{k+1}^i. \quad (2\text{-}26)$$

In the results section we will demonstrate the benefits of using (2-24) instead of (2-25) and (2-26) while keeping the rest of the algorithm consistent. We believe our approach is more general, as it can include both (2-25) and (2-26) as preselected samples for (2-24). We will also compare the performance of our greedy planning with a switching cost function against an RHC method. We choose a computationally cheap method to approximate the cost function over a horizon, simplifying the information-theoretic approach presented in [29] as follows to reduce the computational cost and make it more suitable for online LTS planning. Instead of constructing an approximate cost function from samples over a planning horizon $\tau$ to plan the optimal control action, we choose the action associated with the sample with minimum cost as a worst-case (minimax) approach. We also simplify the entropy calculation to (2-14) instead of constructing a continuous distribution from the particles for entropy measurement. Finally, we implement our proposed greedy planner to drive an efficient search on a high dimensional robotic platform. This implementation fully reveals the potential of our proposed observation model to update our knowledge of the target state in 3D Cartesian space while the sensor acts in 7D configuration space. It also shows rapid recovery of a lost target for visual servoing and tracking applications, where online and efficient algorithms are required to keep the target in the FoV as much as possible.
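The practical difference between (2-24) and the alternatives (2-25) and (2-26) discussed above lies only in how the interim goal point is chosen from the particle set; the contrast can be summarised in a few lines (Python/NumPy, with `entropy_if_viewed` a hypothetical callable returning the simulated post-observation weight entropy of a candidate viewpoint).

```python
import numpy as np

def goal_mode(particles, weights):
    """Eq. (2-25): head toward the most probable particle (the mode of the PDF)."""
    return particles[np.argmax(weights)]

def goal_mean(particles, weights):
    """Eq. (2-26): head toward the weighted mean of the particles."""
    return weights @ particles

def goal_min_entropy(samples, entropy_if_viewed):
    """Eq. (2-24): head toward the sampled viewpoint whose simulated observation
    leaves the belief with the lowest weight entropy."""
    return min(samples, key=entropy_if_viewed)
```

Because (2-24) only ranks candidate points, the mode and the mean can simply be appended to the sample set, which is the sense in which the proposed switch generalizes the other two.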
Simulation and experimental results and comparisons are presented in detail in the next section.  2.5 Results In order to compare grid-based versus particle filter, the effect of different cost functions and horizon planning versus greedy search, we first implement our search strategy on two of the manipulator’s wrist joints. Treating the mounted camera as a pan-tilt search platform as a result would simplify our comparisons and make them alike the most common 2D search in the UAV literature. We also believe that as long as there is no occlusion, a pan-tilt search is a feasible search for online robotic vision applications. We then present improved performance by providing the results of our high dimensional search algorithm implementation.  As our LTS method is not analytically tractable, we compare its various aspects through multiple repeated simulation and experimental trials for various initial conditions and present the statistically analyzed empirical results. The quality measures used for comparison depend on the simulation trial termination condition as it is handled variously in the literature [31]. One termination condition is when CDP reaches a certain threshold, then the number of steps are compared as a measure [21]. A specific time is also considered as another termination condition while CDPs are compared as a measure [31].  We declare the target to be recovered in simulation when the CDP reaches 0.98. We then use the number of steps as the search speed measure to compare among different methods. We also consider the travel distance of the sensor during search as a measure to account for sensor action cost. Here, the travel distance is measured by the Euclidean distance in joint space. In each set of simulation comparisons the result of 100 simulation trials are provided in this chapter. 2.5.1 Experimental Platform To implement our proposed online high dimensional LTS algorithm, we consider a visual servoing platform incorporating in a 7-DoF robotic arm (Barrett Technology WAM) equipped with a PrimeSense Carmine 1.09 camera tracking an object tagged with an ARTag6 as presented in Figure 2-6. The RGB images are only used with a model-based ARTag detection to detect and localize the target in 3D with respect to the camera.                                                        6 ARTag is a fiduciary marker system to support augmented reality [142]. The ARTag tracking library is available at: http://wiki.ros.org/ar_track_alvar 33   Figure 2-6. Experimental platform. We utilize a visual servoing algorithm capable of docking to and then following a target moving at 5 cm/s. The target is mounted on a toy train moving along a stadium shaped train track with circumference of 178 cm. As the toy train’s maximum speed is about 8 cm/s, the robot cannot keep up with the target, and it leaves the FoV. At that point our algorithm is triggered to quickly recover the target along the uneven track as quickly as possible. A similar platform is simulated where the train track can have any random shape, with a constant acceleration motion model, to validate the capabilities of our system. Here we compare each aspect of our search algorithm step by step, starting with the RBE solution in the next section. 2.5.2 Recursive Bayesian Estimation To compare the performance of particle filter versus grid-based method in solving the RBE of the target we considered a pan-tilt search with our proposed switching constrained cost function in (2-18) and greedy planning in simulation. 
We further assumed that our target is stationary but placed outside the camera FoV at random initial positions. As the relative initial placement of the sensor and target affects the search, we kept the robot initial configuration the same and generated 100 random initial locations for the target. Both the particle filter and the grid-based method solved the RBE for the 100 random initial target locations using the stationary-target case of the LTS parameters in Table 2-1.

Table 2-1. LTS algorithm parameters in simulation

Parameter | Stationary Target | Moving Target
p(x_0) | N(μ(x_0), diag(2.25, 2.25, 0) × 10^-2) | unif(x_0 − [0.1, 0.1, 0], x_0 + [0.1, 0.1, 0])
w_k | N(μ(x_k), diag(1.5, 1.5, 0.015) × 10^-3) | N(μ(x_k), diag(1.5, 1.5, 0.015) × 10^-3)
[x, y, z]_0 | [0.3, −0.5, 0.4] ≤ [x, y, z]_0 ≤ [1, 0.5, 0.4] | [0.3, −0.5, 0.4] ≤ [x, y, z]_0 ≤ [1, 0.5, 0.4]
[ẋ, ẏ, ż]_0 | [ẋ, ẏ, ż]_0 = [0, 0, 0] | −[0.2, 0.2, 0] ≤ [ẋ, ẏ, ż]_0 ≤ [0.2, 0.2, 0]
[ẍ, ÿ, z̈]_0 | [ẍ, ÿ, z̈]_0 = [0, 0, 0] | −[5, 5, 0] × 10^-3 ≤ [ẍ, ÿ, z̈]_0 ≤ [5, 5, 0] × 10^-3
α | 10 | 10
[L_u, L_v] | [640, 480] | [640, 480]
ϵ (image-edge safety bound) | 10 | 10
[β_s, β_l] | [10^4, 10] | [10^4, 10]
[d_s, d_l] | [0.1, 1.5] | [0.1, 1.5]
N | 64 | 64
N_s | 40 | 40
ρ | 0.5 | 0.5
ε (switch threshold) | 10^-5 | 10^-5
q_0 | [−0.5, 0.9, 0, 1.7, 0, 0.5, 0] | [−0.5, 0.9, 0, 1.7, 0, 0.5, 0]
q_min | [−2.6, −2.0, −2.8, −0.9, −4.76, −1.6, −3.0] | [−2.6, −2.0, −2.8, −0.9, −4.76, −1.6, −3.0]
q_max | [2.6, 2.0, 2.8, 3.1, 1.24, 1.6, 3.0] | [2.6, 2.0, 2.8, 3.1, 1.24, 1.6, 3.0]
q̇_max | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Δ | 0.05 | 0.05
γ | 10^4 | 10^4

We further evaluated the effect of increasing the number of particles/grids on the LTS performance for the same 100 initial conditions, as presented in Figure 2-7.

Figure 2-7. Grid-based versus particle filter to solve for RBE of the target considering different numbers of grids/particles, from the time step and travel distance point of view.

In all of the scenarios, the particle filter outperforms the grid-based method from the time step and travel distance point of view. Therefore, it is our choice to solve the RBE for our LTS from this point forward. In addition, both measures decrease by about 4% and 10% while consecutively quadrupling and then octupling the number of particles in the particle filter (originally 16). As the computational cost of the particle filter increases at least linearly with the number of particles, we choose to use 64 particles for our experiments in the following sections. Note that the T-bars in Figure 2-7, as well as in the upcoming figures, represent 95% of the range of the measured parameter given multiple (100 in this case) random initial conditions.

2.5.3 Cost Function

To compare the effect of the different cost functions in (2-15) and (2-16), we first address our proposed switching method presented in (2-24) and justify it against the alternatives in (2-25) and (2-26). Here we demonstrate the common drawback of each alternative option. The shortcoming of the switch presented in (2-25) arises when all the particles have the same weight, e.g., at the beginning when no observation has been made. In this case any particle can represent the mode of the target PDF, as depicted in Figure 2-8 a), delaying the information gathering process. The drawback of the switch in (2-26) arises when there is a bifurcation in the particles' distribution, in which case the target PDF mean is a local minimum at which no particle gets updated, as depicted in Figure 2-8 b).
On the other hand, our proposed switch in (2-24), which always tries to gain maximum information, is a more general switch, as it can include both of the alternative switches as two hardcoded samples. The only argument against our proposed switch is its computational cost, which is a fraction of the cost of the entire algorithm as long as $N_s < N$. Therefore, we proceed with our proposed switch from now on.

Figure 2-8. Common cases where a) (2-25) can cause a delay and b) (2-26) might fail by getting stuck in a local minimum.

Considering our proposed switch, we compare the cost functions in a pan-tilt search for a moving target with the parameters in Table 2-1 and greedy planning. We again kept the robot initial configuration fixed while generating 100 random constant-acceleration motion models with random initial positions and velocities for the target. We kept this repository of random target motions consistent in all of the simulation evaluations from this point forward. This allowed us to track the effects of the cost function, horizon planning and sensor dimensionality as presented later. The simulation results for the two cost function choices are presented in Figure 2-9. Even though the entropy-based cost function outperforms the PD cost function from the time step point of view, its travel distance is only slightly better. Comparing within each trial, the entropy-based pan-tilt greedy search has fewer time steps in 55% of the trials, but less travel distance in only 39% of the trials, compared to the PD pan-tilt greedy search. This similar performance confirms the statement in [29] that at low detection probability, "the maximum probability of detection has been shown to nearly solve the minimum entropy problem".

Figure 2-9. Entropy-based cost function versus PD cost function from the time step and travel distance point of view for 100 simulation trials.

Figure 2-10. Designed experiment to lose the target during tracking: a) the moment that the searching phase is triggered, b) the moment that the target is recovered.

We also compare the cost functions' performance in real experiments. In order to provide consistent and challenging conditions for comparison, we designed an experiment in which the target is intentionally occluded for a certain period of time. We placed a tunnel (depicted in Figure 2-10) over a section of the train track, making the target suddenly invisible during tracking for about 8.1 s (65 cm) while the train was moving at around 8 cm/s. While our visual servoing algorithm tracked the target using all 7 DoFs, a pan-tilt search was triggered as soon as the target went into the tunnel. This experiment was conducted 40 times for each cost function, and the results are presented in Figure 2-11. Among the 40 experimental trials for each of the cost functions, 15% failed and were excluded from the results in Figure 2-11.
We defined a trial to fail when the train finishes a lap without being recovered by the robot, i.e., the target would have been recovered if the robot just had waited for it to show up again. However, all of the 15% failed trials did recover the target in the train’s second lap (after coming out of the tunnel again). Apart from the number of time step and travel distance, we compare the actual time it took the planner to recover the target from the moment that the target left the FoV including the 8.1 s it spent under the tunnel. This comparison indicates that, even in a greedy searching phase, our robot control loop rate slowed down to about 8 Hz consistently for both cost functions compared to about 16 Hz in the tracking phase. Finally, the experimental results presented in Figure 2-11 confirm the simulation results that both cost functions perform similarly from all measures point of view. However, we choose the entropy-based cost function for our LTS algorithm upon its slightly better performance for further evaluation in the following sections. 38   Figure 2-11. Entropy-based cost function versus PD cost function from time step, time and travel distance point of view for 40 experimental trials. 2.5.4 Planning Horizon To compare the performance of our proposed greedy algorithm against a RHC, we considered a pan-tilt search in simulation only with the same initial conditions as last section. The results of our entropy-based pan-tilt greedy search presented in the last section (Figure 2-9) is compared against a computationally cheap RHC with horizon planning of 5 and 40 trajectory samples as explained before. The results as presented in Figure 2-12 show the superior performance of our proposed greedy algorithm for both measures. Comparing within each trial, the entropy-based pan-tilt greedy search has fewer time steps in 89% of the trials while only in 56% of the trials has less travel distance than entropy-based pan-tilt RHC. However, one cannot conclude that RHC has lower performance in LTS in general as our presented RHC is highly simplified to satisfy low computational cost for online applications. We believe that a more sophisticated RHC as presented in [29] has the potential to beat the greedy performance, but its computational cost would make it intractable for online real world visual servoing/tracking implementation. This is supported by the fact that even our simplified RHC in its current format for pan-tilt search is still more expensive than our greedy approach, as computing the belief state for each of the trajectory samples is costly. The RHC computational cost is further discussed in the next section.   Figure 2-12. Greedy planning versus RHC from time step and travel distance point of view for 100 simulation trials. 020406080100120140Time step05101520Time (s)02468Travel distance EntropyPD051015202530Time step00.20.40.60.811.2Travel distance GreedyRHC39  2.5.5 Generalizing to Higher Dimensions Another benefit of a greedy search is its capability to increase the problem dimensionality. Given our LTS problem, we can expand the search over all of the robot’s DoFs. However, in RHC this would add exponentially to the solution space and make it intractable. For example in [72], replacing a UAV’s fixed camera with a gimbal camera is presented as a challenge as it adds to the sensor space dimensions and requires more branching in the path optimization problem. 
We have formulated our cost function in (2-18) in such a way that it can easily extend the sensor state to higher dimensions while considering all of the constraints. Therefore, we compare the performance of our pan-tilt search (Figure 2-9 & Figure 2-12) against the full 7-DoF search in simulation considering the same initial conditions. The results, presented in Figure 2-13, show that the 7-DoF search outperforms the pan-tilt search in both measures. Comparing within each trial also shows that our entropy-based 7-DoF greedy search has fewer time steps in 77% of the trials and less travel distance in 89% of the trials than the entropy-based pan-tilt greedy search. The main reason, as observed during simulation trials, is that the extra DoFs of the sensor allow it to zoom back quickly to reacquire the target. This action provides a broader FoV around the lost target's vicinity, resulting in a quicker recovery. This outcome is depicted in a sample simulation trial in Figure 2-14, comparing the sensor path and CDP of the two search methods.

Figure 2-13. 2-DoF (pan-tilt) search versus 7-DoF (high dimensional) search in terms of time steps and travel distance for 100 simulation trials.

Figure 2-14. A sample simulation of 2-DoF (pan-tilt) search versus 7-DoF (high dimensional) search: a) shows a 2-DoF search at time step 18, b) shows a 7-DoF search at time step 10 and c) compares the CDP of these two search methods.

To validate our simulation work, we compared the performance of our pan-tilt versus 7-DoF search in an experiment for the actual case where our visual servoing algorithm could not keep up with our faster moving target, as described in Section 2.5.1. During 20 experimental trials for each of the search methods (pan-tilt versus 7-DoF), the robot lost the fast-moving target at some point during the tracking phase. At that point the LTS was triggered and remained active until the target was recovered. The results were compared in terms of time steps, time and travel distance, as presented in Figure 2-15. In these experimental results, the 7-DoF search outperforms the pan-tilt search in terms of time steps and travel distance, confirming the simulation results. However, the actual target recovery time was longer for the 7-DoF search than for the pan-tilt search. This is due to the higher computational cost of the 7-DoF search compared to the pan-tilt search, as these planners controlled the robot on average at 3 Hz and 5 Hz respectively during the searching phase.

Figure 2-15. 2-DoF (pan-tilt) search versus 7-DoF (high dimensional) search in terms of time steps, time and travel distance for 20 experimental trials.

The zooming-back nature of the 7-DoF search actions to recover the target was usually observed during the experimental trials. A sample trial is depicted in Figure 2-16 for both the pan-tilt and 7-DoF search. The tracking phase of the experiment is shown in Figure 2-16 a), while the moment that the searching phase is triggered is shown in Figure 2-16 b).
The moment the pan-tilt search recovers the target, solely by reorienting the mounted camera, is depicted in Figure 2-16 c). In contrast, Figure 2-16 d) shows the recovery moment for the 7-DoF search, where planned zooming back in addition to camera reorientation has led to target recovery.

Figure 2-16. A sample experiment comparing 2-DoF (pan-tilt) search versus 7-DoF (high dimensional) search: a) tracking phase, b) the moment that the target leaves the camera FoV, c) the moment that the target is recovered in a 2-DoF search, d) the moment that the target is recovered in a 7-DoF search.

2.6 Conclusion

In this chapter we discussed the problem of an eye-in-hand system, as a high dimensional robotic platform, actively recovering a target that leaves the camera FoV in an online manner. As robots move into more complex and unstructured environments, solving this LTS problem is highly valuable for visual tracking/servoing applications. It increases the robot's level of autonomy by providing a transition that recovers a lost target as soon as possible with minimum cost. In this chapter we addressed this problem in a holistic manner and systematically compared it to other approaches in the literature. A novel observation model was presented to relate the space in which the target state evolves to the space in which the agent plans the search, while these two spaces can be of different types and dimensions. The problem was formulated as an optimization problem to plan fast and low-cost sensor actions in higher dimensions while respecting the robot's kinematic constraints.

We believe that the particle filter, our novel observation model, the switching cost function and greedy planning are the key steps towards online search in higher dimensions. The particle filter allows us to evolve the target belief state cheaply, which is suitable for higher-dimensional target spaces such as 3D Cartesian space. Our observation model provides the link between the target space and the robot C-space. Our switching cost function avoids getting stuck in local minima while planning informative sensor actions that respect the robot's kinematic constraints. The greedy planning approach also allows increasing the problem dimensionality with low computational cost, making it suitable for high dimensional online LTS. We then compared the performance of various features of our overall algorithm in simulation trials against other common methods to show the superior performance of our proposed algorithm. Among the entropy-based pan-tilt greedy search, PD pan-tilt greedy search, entropy-based pan-tilt RHC and entropy-based 7-DoF greedy search that were applied to the same 100 randomly generated simulation trials, our entropy-based 7-DoF greedy search outperformed all of the other methods. Comparing within each trial of all methods, it has fewer time steps in 72% of the trials and less travel distance in 77% of the trials. Real-world implementation of the algorithm was also presented on an eye-in-hand visual servoing platform while servoing to and tracking a moving target. We demonstrated that our high dimensional online LTS is capable of recovering a fast-moving target quickly after it leaves the FoV due to the limitations of our visual servoing algorithm.
Our results also showed that a high dimensional search improves the performance, as extra DoFs provide better clearing actions. In addition, a pan-tilt search might not provide feasible views due to environmental constraints. For example, a lost target scenario caused by an occlusion cannot be handled through a pan-tilt search due to the physics of the scenario. In such scenarios, the high dimensional search should be used to reposition the camera in addition to potential reorientation through planning. Solving the optimization problem in the presence of occlusions is the focus of the next chapter.

Chapter 3 - Resolving Occlusion in Active Visual Target Search of High Dimensional Robotic Systems

3.1 Introduction

In the previous chapter an algorithm was developed to actively search for a target that is outside the camera FoV of a high dimensional eye-in-hand system, without considering environmental constraints. In this chapter the LTS algorithm is improved to handle visual occlusions that disrupt visual tracking in those systems. This chapter enhances the observation model and the cost function presented in the previous chapter to consider occlusions. To allow efficient planning, exhaustive mapping of the 3D occluder into C-space is avoided, and instead informed samples are used to strike a balance between target search and information gain.

The overview of the LTRA developed in this chapter is as follows: a target tracking module estimates the occluded target's position based on the latest observations and a motion model. Simultaneously, a simple probabilistic occlusion map is constructed. Based on this continuously evolving map and the estimated target state, a planning module selects new arm configurations. The planner selects configurations such that the expected resulting observations are likely to update the information about the target state and the occluder extent while minimizing the action cost. This process is repeated until the target is recovered.

The planning module in this chapter is represented by a constrained optimization problem that does not permit a closed-form solution. Therefore, this chapter proposes two Monte Carlo (i.e., sampling-based) optimization approaches that are efficient in their calls to IK. One approach, adopted from the NBV literature, uses Cartesian space sampling and one call to IK per sample. The other, novel approach uses a few calls to IK at critical points defined around known occluder boundaries and the frontiers of unknown space. These critical points provide informative seeds to direct our C-space sampling, which increases its efficiency. The two approaches are then compared against the common approach of random sampling in C-space. The author is aware that all of these approaches come at the price of returning only finite-sample approximations of the optimal configuration. Therefore, in the text below, such solutions are referred to as approximate solutions, noting that they are not strictly optimal solutions of the original optimization problem.

This chapter extends the work in [86] by improving the cost function and solution techniques. It also includes new simulation and experimental results from an application of the algorithm that is presented in [87]. This chapter first reviews some relevant literature in the next section. Section 3.3 explains the details of the problem. Then the algorithm is presented and the contributions are elaborated in Section 3.4. The results are presented in Section 3.5, followed by conclusions in Section 3.6.
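To make the overview above concrete, the following minimal Python sketch illustrates the predict-plan-act-observe loop that constitutes the LTRA. All object and function names here (robot, camera, particle_filter, occluder_map, plan_next_config, detect_target) are hypothetical placeholders for the components described in this chapter, not the actual implementation.

```python
def lost_target_recovery(robot, camera, particle_filter, occluder_map,
                         plan_next_config, detect_target):
    """One-step-at-a-time searching phase: iterate until the target is re-acquired."""
    while True:
        particle_filter.predict()                      # propagate the target belief with its motion model
        q_next = plan_next_config(particle_filter,     # choose the next arm configuration by minimizing
                                  occluder_map,        # the planning cost over candidate configurations
                                  robot.current_config())
        robot.move_to(q_next)                          # act: execute the planned configuration
        image = camera.capture()                       # observe from the new viewpoint
        occluder_map.update(image, q_next)             # resolve potential/unknown occluder edges
        detection = detect_target(image)
        particle_filter.update(detection, q_next)      # re-weight particles via the observation model
        if detection is not None:                      # target visible again: searching phase ends,
            return detection                           # tracking phase resumes
```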
3.2 Related Work

A field closely related to active target search, under active vision, is NBV planning, which is mainly used to map an environment or model an object. As compared in [88], both fields try to render previously unseen areas visible in an efficient manner, albeit with different goals. Both problems are formulated as optimization problems for which approximate solutions are offered in the literature. Therefore, we review some of the related literature in NBV planning, where similar approaches are adopted to form a cost function and to solve it approximately.

Combining various goals/constraints as separate terms in the cost function is a common practice in NBV planning. An example is provided in [89], where one or a team of mobile manipulators equipped with limited-range 3D vision sensors map a stationary target through NBV planning. The cost function in this work tries to maximize the new information, maintain or improve the map accuracy and minimize the exploration time by linearly combining these terms. The NBV planning in [90] uses a different information gain approach based on surface reconstruction uncertainty and combines it with the cost of motion among poses to form the cost function. The depth images along the planned trajectory are then used to model the object in 3D. To solve these cost functions efficiently in NBV planning, approximate sampling-based approaches are commonly employed. For example, a set of uniform random samples and a set of predefined samples in Cartesian space are used in [89] and [91] respectively to select the observer's next viewpoint to reconstruct object models. Both methods also consider workspace limits by checking the kinematic constraints. The best viewpoint at each iteration in [90], [92] is picked from a pool of sample viewpoints in 3D on a sphere centered around a stationary object of interest. The feasibility of these samples is checked in [90] through IK, while the sensor kinematic constraints are not considered in [92]. However, these random sampling approaches in 3D with limited constraints might not be efficient when the overall goal goes beyond modeling an object.

The aforementioned sampling-based approaches to solve a cost function in NBV planning are a subset of sampling-based robot motion planning. Sampling-based motion planners in general generate fast but sub-optimal solutions in C-space between a start and a goal configuration while satisfying kinematic and environmental constraints (mainly obstacles) [93]. A recent review of various sampling-based motion planners is provided in [94]. A comparison of the two most common planners, PRM, a multiple-query motion planner, and RRT, an incremental single-query motion planner, is also provided in [95]. In this work, we adopt a sampling-based planner to actively search for an occluded target in a one-step fashion, as explained later. It is a type of single-query sampling approach that finds a goal configuration rather than planning a path to reach one.

Early principles and foundations of active search were established in [88], [96]. The work in [96] focuses on the complexity analysis of active visual search, while the work in [88] presents various approximate solutions to a specific simplified search. The approximate solutions for specific search scenarios have significantly improved in recent years thanks to higher computational power. However, the extension of active search to higher dimensions still imposes significant challenges.
For example, a mobile robot in [32] visually searches for a stationary object in 3D, though the implemented algorithm only uses the vertices of a 2D grid by tessellating both the search space and the action space. In fact, a recent survey of search in mobile robots describes the extension of efficient algorithms for practical search into higher dimensions as an open problem [20]. Further, the few previous methods that apply high dimensional search to eye-in-hand systems only present simulation results and do not consider occlusions [35], [37]. Therefore, high dimensional search to recover a moving occluded target in a dynamic environment, while considering the kinematic and environmental constraints, is still not fully explored.

Resolving occlusion, as an environmental constraint, adds to the complexity of the active search problem and is addressed in only a few places. A multistage optimization algorithm to search for partially observed stationary targets in 3D environments using multi-DoF eye-in-hand systems is presented in [36]. While this approach considers robot joint limits and singularities, it is slow even for stationary targets due to the adopted image-based deterministic cost function. A probabilistic cost function is adopted in [97] to search for a stationary target while detecting and avoiding occlusions using a tessellated 2D map. That paper describes a passive approach for handling occlusions, while active strategies are mentioned as potential future work. In [88], the authors compare two model-based approaches for selecting viewpoints against two model-free approaches: a fixed viewpoint approach and a viewpoint selection approach based on occluding edges. Their findings demonstrate that extensive modeling of the occlusions might not be necessary in a search task. However, this was only applied to the simple 1D search case of circular rotations around the cluttered environment.

Here we adopt a simple modeling technique for the occlusion that provides us with information about occluding edges. We then use these occluding edges to select visibility-enabling viewpoints in higher dimensions. Our method efficiently ensures that the camera position allows rays emanating from it to reach the target without passing through the occlusion. At the same time, the orientation of the camera is steered towards the target/occlusion to maximize the information gain. This combines the two stages of "where to move next" and "where to look next" in a 3D mobile robot search introduced in [85].

We must also address the related problem of constructing occluder maps, as these enable our visibility reasoning. Our method accomplishes this using the concept of entropy maps, motivated by Zhang et al. [98]. Therein, a visual search strategy for eye-in-hand manipulator planning explicitly avoids uninteresting regions of the world. The authors utilized a novel extremum seeking control (ESC) over entropy maps, constructed from images, in real time, without a priori information. The same concept enables our action planner to locate viewpoints that are likely to yield additional information about the target or occluder extents. We explain the details of this action planner later to solve our high dimensional active visual target search problem.

3.3 Problem Formulation

When a moving target is inside the camera FoV, the target can be visually tracked and followed (visual tracking/servoing). Herein, we refer to this situation as the "tracking phase".
When an occluder blocks the target, no new data about the exact target state is available to the sensor, and tracking/servoing algorithms fail. However, the knowledge about the target based on the last available information prior to occlusion can be used to estimate its state over time. We refer to this situation as the "searching phase". A state transition model, $p(\mathbf{x}_k|\mathbf{x}_{k-1})$, propagates the target state, $\mathbf{x}_{k-1} \in \mathbb{R}^{n_x}$, to time step $k$. Herein, the dimension of a vector $\alpha$ is written as $n_\alpha$. The mounted camera then provides an observation model, $p(\mathbf{z}_k|\mathbf{x}_k, \mathbf{q}_k)$, also known as the likelihood. Here $\mathbf{z}_k$ is the observation and $\mathbf{q}_k \in \mathbb{R}^{n_q}$ is the sensor state (robot state) at time step $k$. We then update the target belief state, $p(\mathbf{x}_k|\mathbf{z}_{1:k})$, recursively through a prediction and update stage using an RBE technique [26]. The update is performed after making sensor actions, $\mathbf{q}_{k+1}$, that have been chosen by a planning algorithm. This planning algorithm formulates the searching process to resolve the occlusion and recover the target as an optimization problem.

We propose a planning cost function that strikes a compromise between exploring the occlusion's extents and exploiting target recovery while minimizing the cost of making actions. We address the FoV constraint through our observation model, while applying joint limits as a constraint on the optimization problem. We then solve the proposed nonlinear optimization problem using different solvers to avoid extensive mapping of the environment, which increases efficiency. This achieves the overall goal of our system: performing sensor actions that restore a clear view of the target (i.e., the ray emanating from the camera to the target does not pass through the occluder, thus terminating the searching phase) in an efficient manner. Planning efficient sensor actions is the main technical focus of this chapter in the chain of predict, plan, act, and observe. This chain is what constitutes our LTRA, and it iterates in a one-step fashion until the occlusion is resolved and the target is recovered. At this point the searching phase terminates and the tracking phase resumes.

3.4 Methodology

In order to plan efficient sensor actions to resolve the occlusion and recover the target, the first step is to implement an RBE to evolve the target belief state. To handle the nonlinearities in the observation model and some state transition models, as well as any noise terms, we apply the SIR (sampling importance resampling) particle filter with a residual resampling step, as detailed in [63], [65]. In this case, the target belief state, $p(\mathbf{x}_k|\mathbf{z}_{1:k})$, given a set of observations, $\mathbf{z}_{1:k}$, starting from $p(\mathbf{x}_0)$, is approximated by a set of $N$ weighted Dirac delta functions as follows:

$p(\mathbf{x}_k|\mathbf{z}_{1:k}) \approx \sum_{i=1}^{N} \omega_k^i \, \delta(\mathbf{x}_k - \mathbf{x}_k^i)$,   (3-1)

where $\mathbf{x}_k^i$ is one of the $N$ particles. The particles are propagated through the state transition model in SIR, while their normalized associated weights, $\{\omega_k^i, i = 1{:}N\}$, are calculated based on the principle of importance sampling through the observation model and the resampling stage, such that $\sum_{i=1}^{N} \omega_k^i = 1$. In this work, the particles are points ($\mathbf{x}_k^i = [x, y, z]_k^i$) in Cartesian space representing probable target locations, as $\mathbf{x}$ is expressed in Cartesian coordinates ($n_x = 3$). We also express our sensor state in the robot C-space (typically a 6- or 7-dimensional space for a robotic manipulator), where the vector $\mathbf{q}$ represents a joint angle vector ($\mathbf{q}_k = [q_1, \ldots, q_{n_q}]_k$).
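For concreteness, one SIR update implementing the approximation in (3-1), with a residual resampling step, can be sketched as follows. The `transition` and `likelihood` callables stand in for the state transition model and the observation model introduced later; they are assumptions of this sketch rather than the thesis code.

```python
import numpy as np

def sir_step(particles, weights, transition, likelihood):
    """One SIR particle-filter step approximating p(x_k | z_{1:k}) as in (3-1).

    particles : (N, 3) array of Cartesian target hypotheses [x, y, z]
    weights   : (N,) normalized importance weights
    transition: callable propagating particles through p(x_k | x_{k-1})
    likelihood: callable returning p(z_k | x_k, q_k) for each particle
    """
    N = len(particles)
    particles = transition(particles)                # prediction stage
    weights = weights * likelihood(particles)        # importance weighting via the observation model
    weights /= np.sum(weights)

    # Residual resampling: keep floor(N*w_i) copies of each particle,
    # then draw the remaining particles multinomially from the residuals.
    counts = np.floor(N * weights).astype(int)
    residual = N * weights - counts
    n_extra = N - counts.sum()
    if n_extra > 0:
        counts += np.random.multinomial(n_extra, residual / residual.sum())
    indices = np.repeat(np.arange(N), counts)
    return particles[indices], np.full(N, 1.0 / N)   # resampled set with uniform weights
```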
By choosing to plan sensor actions in C-space, we avoid the need for robot singularity handling and feasibility checks through IK, and we facilitate joint limit avoidance by implementing the limits as a constraint. Therefore, an observation model must bridge between these two spaces in order to update the target belief state with the particle filter. The updated target belief state is a major driver of the sensor action planner and is included in the cost function, as explained in the following section.

3.4.1 Cost Function

We formulate the task of finding next-best-views for target recovery as optimization of a mixed-initiative cost function defined in C-space. We linearly combine all of the different factors, following common practice in NBV planning and target acquisition problems [71], [89], [99]. We consider the factors of "exploiting target recovery", $\Gamma_{\mathrm{Expt}}$, and "exploring for the occlusion map", $\Gamma_{\mathrm{Expr}}$, while minimizing the "cost of making actions", $\Gamma_{\mathrm{Dist}}$. The resulting cost function is then as follows:

$\Gamma(\mathbf{x}_{k+1}, \mathbf{q}_k, \mathbf{q}_{k+1}) = \Gamma_{\mathrm{Expt}}(\mathbf{x}_{k+1}, \mathbf{q}_{k+1}) + \Gamma_{\mathrm{Expr}}(\mathbf{q}_{k+1}) + \Gamma_{\mathrm{Dist}}(\mathbf{q}_k, \mathbf{q}_{k+1})$.   (3-2)

We design all of these terms to result in small positive values, and only $\Gamma_{\mathrm{Expt}}$ may fall outside the $[0, 1]$ range. This increases the weight of the exploiting-target-recovery term, which we consider the most important goal. We later improve this cost function to account for sensor constraints.

3.4.1.1 Exploiting Target Recovery

The target recovery term of our cost function favors motions that reduce our uncertainty in the target state. The entropy of a random variable in information theory is a measure of its uncertainty. For a continuous random variable $\theta$ with PDF $p(\theta)$, the entropy is calculated as follows:

$H(p(\theta)) = -\int p(\theta) \log p(\theta) \, d\theta$.   (3-3)

As more information becomes available, the entropy is reduced. Therefore, the entropy of the target belief state can provide us with a cost function for making informative observations of the target. As observations are collected over time, the entropy is calculated as follows:

$H(p(\mathbf{x}_{k+1}|\mathbf{z}_{1:k+1})) = -\int p(\mathbf{x}_{k+1}|\mathbf{z}_{1:k+1}) \log p(\mathbf{x}_{k+1}|\mathbf{z}_{1:k+1}) \, d\mathbf{x}_{k+1}$.   (3-4)

Utilizing the particle filter approximation, one can substitute (3-1) into (3-4) to obtain the following approximation for the exploitation cost:

$\Gamma_{\mathrm{Expt}}(\mathbf{x}_{k+1}, \mathbf{q}_{k+1}) \approx -\sum_{i=1}^{N} \omega_{k+1}^i \log \omega_{k+1}^i$,   (3-5)

which is the same as the entropy of a discrete random variable. Each particle's associated weight is calculated through our observation model, which is introduced later. One should note that the particles' locations are not addressed in (3-5). However, this approximation based on the particles' weights is a sufficient measure to reduce the uncertainty of the target belief state as long as the weights are updated through observations. Our optimization solver, as discussed later, will always try to observe some particles while resolving the occlusion.

3.4.1.2 Exploring Occlusion

Our occlusion mapping process includes: i) the probabilistic model used to represent the system's partial knowledge of the occluder extents from a sequence of images; and ii) the information-theoretic measure used to capture representation completeness and fidelity and to compare the value of making each new observation. Here, we assume the occluder is a convex hull of known and potential edges. Our map construction process is illustrated in Figure 3-1.

Figure 3-1. a) Detecting only a single edge of the occluder.
b) A view of the occlusion plane with the projected camera FoV, defining different types of occlusion edges. Edges 1-3 represent known edges, edge 4 represents an unknown edge, and edges 5-7 represent potential edges.

In the first camera image where the target is occluded, usually only a subset of the occluder's edges is detected. We label these as known edges (e.g., edge 1 in Figure 3-1.b). The boundaries of the image limit our spatial perception. We name the limits of known occlusion regions caused by image extents as potential edges (e.g., edges 5, 6, and 7 in Figure 3-1.b). Until mapping is complete, there are unknown edges (e.g., edge 4 in Figure 3-1.b). Our system assumes such edges exist because the occluder is a connected, bounded object, but has no knowledge of their location until they are directly observed in the sensor. Our probabilistic mapping model is formed from the edge set by a flood-fill operation. Regions surrounded by known and potential edges are occluded with certainty. The occlusion/visibility of regions beyond potential edges is considered uniformly uncertain, as our only a priori knowledge of the occluder shape is that it is bounded and connected. Finally, areas outside known edges are considered visible with certainty.

Our mapping algorithm incrementally constructs the edge map with each new image observation in order to resolve potential and unknown edges into known edges. Here we use a combination of occupancy grid and convex polygon models to represent the occluder map. The occupancy grid is used to calculate the information gain, while the convex polygon model is used for our visibility condition, as explained later. Each camera motion that observes a previously unseen portion of the occluder will lead to some additional map information, but we encourage actions that lead to as much new knowledge as possible. This is captured by the expected information gain, $IG(\mathbf{q}_{k+1})$, as:

$IG(\mathbf{q}_{k+1}) = H_{p_k}(m) - E\left(H_{p_{k+1}}(m \mid \mathbf{q}_{k+1})\right)$,   (3-6)

where $H_{p_k}(m)$ refers to the entropy of the occupancy probability, $p_k$, at time step $k$, of a map, $m$, defined as:

$H_{p_k}(m) = -\sum_j p_k(m_j) \log\left(p_k(m_j)\right)$.   (3-7)

Here $m_j$ denotes a grid cell of the occupancy grid map with index $j$. Hence, the information gain is defined as a measure over the set of all unexplored grid cells. This is computed by a ray casting procedure similar to [100].

Finally, we compute the maximum possible information gain, $IG_t$, by resolving all of the remaining edges in the occupancy grid to the known state within a single step. This is used to normalize each computed $IG(\mathbf{q}_{k+1})$ and obtain $\Gamma_{\mathrm{Expr}}(\mathbf{q}_{k+1})$ in the range $[0, 1]$:

$\Gamma_{\mathrm{Expr}}(\mathbf{q}_{k+1}) = IG(\mathbf{q}_{k+1}) / IG_t$.   (3-8)

3.4.1.3 Cost of Making Actions

We favour goal configurations that move the arm as little as possible. Therefore, we consider a travel distance term in the cost function to minimize the sensor action cost. This addresses the fact that in high-DoF robotic platforms, multiple robot configurations can result in the same end-effector location, and that different observations might result in similar entropy reductions. The travel distance cost is then as follows:

$\Gamma_{\mathrm{Dist}}(\mathbf{q}_k, \mathbf{q}_{k+1}) = (\mathbf{q}_k - \mathbf{q}_{k+1})^{\mathrm{T}} \Phi (\mathbf{q}_k - \mathbf{q}_{k+1})$.   (3-9)

Here $\Phi$ is a diagonal matrix assigning different positive weights to different robot joints, as they may not have equivalent travel costs. It also places $\Gamma_{\mathrm{Dist}}(\mathbf{q}_k, \mathbf{q}_{k+1})$ in the range $[0, 1]$.
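Before turning to the constraints, the following sketch shows how the three terms of (3-2) combine for a single candidate configuration. The expected particle weights and the expected information gain are assumed to be supplied by the observation model of Section 3.4.2.1 and by the ray casting procedure above; the function signature is illustrative rather than the thesis's actual interface.

```python
import numpy as np

def candidate_cost(weights_next, info_gain, info_gain_max, q_now, q_candidate, phi):
    """Combined planning cost (3-2) for one candidate configuration q_candidate."""
    # Exploitation term (3-5): entropy of the particle weights expected after
    # observing from q_candidate with the (no-)detection observation model.
    w = np.clip(np.asarray(weights_next, dtype=float), 1e-12, None)
    w /= w.sum()
    gamma_expt = -np.sum(w * np.log(w))

    # Exploration term (3-8): expected occupancy-grid information gain, normalized by IG_t.
    gamma_expr = info_gain / info_gain_max

    # Travel-cost term (3-9): weighted squared joint-space displacement.
    dq = np.asarray(q_now) - np.asarray(q_candidate)
    gamma_dist = dq @ phi @ dq

    return gamma_expt + gamma_expr + gamma_dist
```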
3.4.2 Constraints

To solve the cost function in (3-2), one should consider the sensor constraints to ensure feasible solutions. A camera-mounted manipulator, as a limited-range sensor agent, is subject to FoV and kinematic constraints. We first address the FoV constraint by introducing a novel observation model that considers simple but efficient occlusion models. We then address the joint limits constraint, a position constraint, by restructuring the cost function in (3-2). We also simplify the implementation of the velocity and acceleration constraints by considering their maximum values in the manipulator's motion model.

3.4.2.1 FoV Constraint

The FoV constraint arises from two effects. One is the limited-range characteristic of the camera model. The second is imposed by the presence of an occlusion. Both of these effects can be addressed in a single observation model. This observation model is also used to update the target belief state in the particle filter, and to incorporate new information from the occlusion model. In fact, the whole process of search and tracking can be handled through a particle filter as long as a thorough observation model is available. During the tracking phase, a detection event represents the observation, $\mathbf{z}_k = \mathbf{D}_k$, while during the searching phase, a no-detection event represents the worst-case observation scenario by considering $\mathbf{z}_k = \bar{\mathbf{D}}_k$. As the observations in the tracking and searching phases are complementary, the detection and no-detection models are combined in a Bernoulli model to cover both cases as follows:

$p(\mathbf{z}_k|\mathbf{x}_k, \mathbf{q}_k) = \left[p(\mathbf{D}_k|\mathbf{x}_k, \mathbf{q}_k)\right]^{I_1(\mathbf{x}_k,\mathbf{q}_k)} \cdot \left[p(\bar{\mathbf{D}}_k|\mathbf{x}_k, \mathbf{q}_k)\right]^{I_2(\mathbf{x}_k,\mathbf{q}_k)}, \quad I_i = 0 \text{ or } 1$.   (3-10)

Here we focus on the detection model, as the no-detection model used in the searching phase is simply its complement. We propose a continuous detection model based on a pin-hole camera model that encapsulates the depth parameter as well. A similar approach is considered in [85] to generate a geometry-based camera observation model called the sensed sphere. This sensed sphere is a subsection of a sphere generated by intersecting the camera view volume, defined by the camera bounding planes, and concentric spheres of different radii centered at the camera location. The observation probability is considered constant within this discretized sensed sphere, which does not properly address depth and boundary effects in image processing.

To consider the image edge effect and a nominal detection depth range, and to bridge between the Cartesian space and C-space, we define our heuristic detection model in the image frame, $\mathcal{F}_i$, considering $[u, v, d]_k$ coordinates. The image plane is rectangular, with poor detection capability at the edges. To encapsulate these features, we construct our detection model from the product of four sigmoid functions centered close to the edges of the image plane. When $L_u$ and $L_v$ represent the width of the image plane in the $u$ and $v$ directions in pixels, with a safety bound of size $\epsilon$, the model is expressed as follows:

$p(\mathbf{D}_k|[u, v, d]_k) = \mathfrak{f} \cdot \aleph(d_k) \cdot \prod_{i=1}^{2} \frac{1}{1 + e^{(-1)^i \alpha (u_k - l_u)}} \cdot \frac{1}{1 + e^{(-1)^i \alpha (v_k - l_v)}}, \quad l_u = (-1)^i (L_u/2 - \epsilon), \; l_v = (-1)^i (L_v/2 - \epsilon)$,   (3-11)

where $\alpha$ defines the slope of the sigmoid functions. In this model, $\mathfrak{f}$ and $\aleph(d_k)$ are respectively the occlusion and depth coefficients, accounting for the occlusion and depth effects in the detection model. This model is depicted in Figure 3-2 with and without an occlusion for the case where $\aleph(d_k) = 1$.

To incorporate the occlusion in the observation model, we assume all occluders to be convex polyhedra. It can also be shown that a pin-hole camera model preserves convexity [101].
Therefore, the occluder's projection into the image plane is a convex polygon. We then define the occlusion coefficient based on a convex polygon with $i_i$ edges as follows:

$\mathfrak{f} = \begin{cases} 0, & [u \; v]^{\mathrm{T}} \in \mathcal{P} \\ 1, & [u \; v]^{\mathrm{T}} \notin \mathcal{P} \end{cases}, \qquad \mathcal{P} = \left\{ [u \; v]^{\mathrm{T}} : A [u \; v]^{\mathrm{T}} \le b \right\}$.   (3-12)

Here $A \in \mathbb{R}^{i_i \times 2}$ is a matrix of the normal vectors of all edges and $b \in \mathbb{R}^{i_i \times 1}$ is a vector of distances from the origin. The depth coefficient is then defined as follows:

$\aleph(d_k) = \frac{1}{1 + e^{-\beta_s (d_k - d_s)}} \cdot \frac{1}{1 + e^{\beta_l (d_k - d_l)}}$.   (3-13)

This coefficient is formed by the product of two sigmoid functions of depth. One, with a sharp slope ($\beta_s$), represents the shortest range, $d_s$, at which the object is detectable by the camera. This is usually the depth at which the entire target fits in the FoV or the target image deblurs. The second, with a mild slope ($\beta_l$), defines the longest detection range, $d_l$, at which the target becomes too far away to be detectable. Both $d_s$ and $d_l$ depend on the target size, sensor specifications, lighting conditions and the detection algorithm, and can be calculated or experimentally determined for a given platform in model-based detection algorithms. In the case of a 3D camera, such as the Microsoft Kinect, the depth information is extracted from point cloud data, and the parameters $d_s$ and $d_l$ are provided by the sensor's manufacturer. A schematic representation of the overall detection model is presented in Figure 3-3.

Figure 3-2. A scaled detection model in $\mathcal{F}_i$ for a specific depth with depth coefficient $\aleph(d_k) = 1$: a) without occlusion, b) with occlusion.

Figure 3-3. Schematic representation of the detection model.

The model presented in (3-11) is defined in the image space, while the particles are expressed in Cartesian space. To transfer the particles defined in the world frame to the image frame, we consider the robot base frame, $\mathcal{F}_b$, as the world frame. In this case all of the points, $\mathbf{x}_k$, are expressed in the base frame as ${}^b\mathbf{x}_k$. The next step is to express all of the points in the camera frame, $\mathcal{F}_c$, through a homogeneous transformation matrix ${}^b_c\mathbf{T}(\mathbf{q}_k)$. This represents the transformation matrix from the camera frame to the base frame and is calculated through the kinematic chain. The last step, expressing ${}^b\mathbf{x}_k$ in the image frame, $\mathcal{F}_i$, uses the camera calibration matrix, $\mathrm{K}$, defined by the camera intrinsic parameters. This process is as follows:

${}^i[u \; v \; d]_k^{\mathrm{T}}(\mathbf{q}_k) = \mathrm{K}[I|0] \, {}^b_c\mathbf{T}^{-1}(\mathbf{q}_k) \, {}^b[x \; y \; z \; 1]_k^{\mathrm{T}} \;\Longmapsto\; {}^i[u \; v \; d]_k^{\mathrm{T}} = \mathrm{K}[I|0] \, {}^c[x \; y \; z \; 1]_k^{\mathrm{T}}$.   (3-14)

Here $[I|0]$ is a matrix formed by a 3×3 identity matrix and a zero vector. Substituting (3-13) and (3-14) into (3-11) provides the heuristic detection model as a function of the target and sensor states ($\mathbf{x}_k$ and $\mathbf{q}_k$). This detection model or its complement is then used to update the target belief state in the particle filter update stage. It is also used to form the exploiting-target-recovery term of the cost function in (3-5) to make efficient sensor actions in the searching phase. Therefore, the no-detection model is considered as the observation model during the searching phase. Once the target is recovered, the same procedure can be followed by adopting the detection model instead of the no-detection model to continue with the tracking phase.
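The following Python sketch illustrates the detection model of (3-11)-(3-14) for a single Cartesian point. The image size, sigmoid slopes and depth limits are placeholder values rather than the thesis's calibrated parameters, and the principal point is assumed to be at the image centre so that pixel coordinates can be measured from the centre as in (3-11).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detection_probability(x_world, T_cam_to_base, K, occ_A=None, occ_b=None,
                          Lu=640.0, Lv=480.0, eps=20.0, alpha=0.1,
                          d_s=0.5, d_l=4.0, beta_s=20.0, beta_l=2.0):
    """p(D_k | x, q) for one Cartesian point, following (3-11)-(3-14) (illustrative values)."""
    # (3-14): express the point in the camera/image frame via the kinematic chain and intrinsics.
    x_h = np.append(np.asarray(x_world, dtype=float), 1.0)
    uvd = K @ (np.linalg.inv(T_cam_to_base) @ x_h)[:3]
    d = uvd[2]
    if d <= 0.0:                               # point behind the camera: not detectable
        return 0.0
    u = uvd[0] / d - Lu / 2.0                  # pixel coordinates measured from the image centre
    v = uvd[1] / d - Lv / 2.0

    # FoV term of (3-11): product of four sigmoids centred just inside the image borders.
    lu, lv = Lu / 2.0 - eps, Lv / 2.0 - eps
    fov = (sigmoid(alpha * (u + lu)) * sigmoid(-alpha * (u - lu)) *
           sigmoid(alpha * (v + lv)) * sigmoid(-alpha * (v - lv)))

    # Depth coefficient (3-13): sharp lower cut-off at d_s, mild upper cut-off at d_l.
    aleph = sigmoid(beta_s * (d - d_s)) * sigmoid(-beta_l * (d - d_l))

    # Occlusion coefficient (3-12): zero if [u, v] lies inside the occluder's convex polygon.
    f = 1.0
    if occ_A is not None and np.all(occ_A @ np.array([u, v]) <= occ_b):
        f = 0.0
    return f * aleph * fov
```

In the searching phase, the complement of this probability would serve as the no-detection likelihood used to re-weight each particle.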
3.4.2.2 Kinematic Constraint

The sensor kinematic constraints are usually applied in simple forms in the literature, such as a maximum turning rate or speed for UAVs and mobile robots, implemented in the sensor motion model. In the case of high-DoF robotic manipulators with the nonlinear constraint of joint limits, a constrained optimization is required. This constrained optimization problem, acting as the sensor action planner, plans actions (observations) in a greedy fashion as follows:

$\mathbf{q}_{k+1} = \arg\min_{\mathbf{q}} \Gamma(\mathbf{x}_{k+1}, \mathbf{q}_k, \mathbf{q}), \quad \text{s.t.} \;\; \mathbf{q}_{min} \le \mathbf{q} \le \mathbf{q}_{max}$.   (3-15)

It uses the cost function in (3-2) while considering the minimum and maximum joint limit vectors, $\mathbf{q}_{min}$ and $\mathbf{q}_{max}$ respectively. In order to efficiently utilize stochastic optimization, we modify our cost function with a penalty function that accounts for the joint limits:

$\Psi(\mathbf{q}_k) = \prod_{i=1}^{n_q} [\psi(q_i)]_k, \qquad [\psi(q_i)]_k = \left(1 + e^{-\gamma \frac{(q_i - q_i^{min})(q_i^{max} - q_i)}{(q_i^{max} - q_i^{min})^2}}\right)_k, \quad i = 1, \ldots, n_q$.   (3-16)

In this penalty function, $q_i^{min}$ and $q_i^{max}$ represent the minimum and maximum limits of joint $i$ respectively. The coefficient $\gamma$ is chosen to be a large number so that the cost increases sharply close to the joint limits. This penalty function is depicted in Figure 3-7.a for a sample joint with various values of $\gamma$. It is applied to the cost function in (3-2) as follows:

$\Gamma_c(\mathbf{x}_{k+1}, \mathbf{q}_k, \mathbf{q}_{k+1}) = \Psi(\mathbf{q}_{k+1}) \, \Gamma(\mathbf{x}_{k+1}, \mathbf{q}_k, \mathbf{q}_{k+1})$.   (3-17)

The same approach can be adopted to account for the velocity and acceleration limits of a robotic manipulator. However, for simplicity, in this chapter we consider motion at the manipulator's maximum velocity and acceleration. As a result, the constrained optimization problem in (3-15) is restructured as follows to plan actions:

$\mathbf{q}_{k+1} = \arg\min_{\mathbf{q}} \Gamma_c(\mathbf{x}_{k+1}, \mathbf{q}_k, \mathbf{q})$.   (3-18)

Solving (3-18) would result in optimal sensor actions to resolve the occlusion and recover the target. However, its solution is not analytically tractable, and approximate solutions are all one can guarantee. This is the focus of the next section, which introduces several approximate sampling-based approaches.

3.4.3 Optimization

The cost function in (3-17) is nonlinear and nonconvex due to the nature of high dimensional robotic systems and our proposed observation model. Therefore, a numerical technique is required, as the optimization does not have a closed-form solution. An exhaustive search in C-space easily becomes intractable due to the high dimensionality of the solution space. Therefore, we turn this problem into a Monte Carlo optimization problem and apply three different sampling-based solvers. Irrespective of the solver, the resulting configuration places the camera in the target's visibility region. Here we review and define some terminology before using it in this section.

The visibility region, $R_{vis}$, and the occluded region are defined in [46] as complementary regions of the free space around a target. The boundary between these two regions is formed by the occluder itself and what we define as shadow planes (visibility planes). A shadow plane is constructed around the target by occluder edges; each detected edge of the occluder defines a shadow plane. Figure 3-4 illustrates the geometry used to construct a shadow plane from the target's estimated position, $\hat{\mathbf{x}}_{k+1}$, and an occluding edge.

Figure 3-4. A schematic representation of a shadow plane and critical points, $\mathbf{x}_{cr}$, in the vicinity of the closest points on the shadow plane to the camera and the robot's base location.
The target's estimated position in this chapter is calculated as the expectation of the particle filter:

$\hat{\mathbf{x}}_{k+1} = \sum_{i=1}^{N} \omega^i \mathbf{x}_{k+1}^i$.   (3-19)

3.4.3.1 Random Sampling

The "Lost Target Recovery Algorithm – Random Sampling" (LTRA-RS) is our benchmark algorithm for solving the cost function in (3-17). This technique was also adopted in [89]–[92] for NBV planning, where uniform random samples were generated near the current robot configuration or on a sphere centered around a stationary object. Here we sample randomly from a multivariate uniform distribution in the robot's $n_q$-dimensional C-space, where $n_q$ is the number of joints. Given the joint limits $[\mathbf{q}_{min}, \mathbf{q}_{max}]$, the proposal distribution is as follows:

$\pi = \begin{cases} \frac{1}{\prod_i (q_i^{max} - q_i^{min})}, & \text{if } q_i^{min} < q_i < q_i^{max} \\ 0, & \text{otherwise} \end{cases}, \quad i = 1, \ldots, n_q$.   (3-20)

These joint-vector samples cover the whole robot task space, as shown in Figure 3-5, and are directly used to solve (3-18). However, a significant number of these samples fall into the occluded region and cannot recover the target, as shown by the red stars in Figure 3-5.

Figure 3-5. One-step look-ahead planning using LTRA-RS. Samples are derived randomly in C-space. a) Rectangular board occlusion scenario. b) Human occlusion scenario.

3.4.3.2 Cartesian Space Sampling

The "Lost Target Recovery Algorithm – Cartesian Sampling" (LTRA-CS) is adopted from [48], [49] in NBV planning, where the goal is to model an object of interest efficiently. In [49], a desired set of viewpoints is created by ray-tracing an octree representation of the workspace including the object. This set of desired viewpoints is then mapped into C-space for efficient and obstacle-free path planning. Here we create our set of desired viewpoints by finding the point closest to the current camera location on each ray created by casting the estimated target location through the edges of the mapped occluder, depicted as a blue plane in Figure 3-6. To improve robustness, we uniformly expand the surface area of the mapped occluder by a bounding strip (a tunable factor estimated from the image processing precision). This allows the viewpoints to satisfy a so-called visibility condition by falling in the visibility region. A similar technique is adopted in [102] to avoid local minima while planning sub-goal configurations around obstacles in a sampling-based local motion planner. A set of points is uniformly distributed along the edges of this expanded occluder (red stars in Figure 3-6). Then, the estimated target location (black point in Figure 3-6) is projected through these expanded edge points to generate the sample viewpoints, i.e., Cartesian space samples, shown as blue stars (the estimated target location is ray-cast through these points into the robot task space, and one viewpoint per ray is found as the closest point on that ray to the current camera location). Each viewpoint is then mapped into C-space through a call to IK (a minimal sketch of this generation and selection step is given below).
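As an illustration of the Cartesian sampling and selection step just described, the sketch below generates candidate viewpoints by casting rays from the estimated target position through points on the expanded occluder edges, keeping the point on each ray closest to the current camera position, and then picks the IK-feasible candidate that minimizes the penalized cost (3-17). The `inverse_kinematics` and `penalized_cost` callables are assumed interfaces standing in for the numerical IK (each viewpoint is assumed to be paired with an orientation looking at the estimated target) and for $\Psi(\mathbf{q})\Gamma(\cdot)$ respectively; this is a sketch, not the thesis implementation.

```python
import numpy as np

def cartesian_viewpoints(x_target, edge_points, cam_pos):
    """LTRA-CS candidate generation: one viewpoint per expanded occluder edge point,
    taken as the point on the ray (target -> edge point) closest to the camera."""
    viewpoints = []
    for p in edge_points:
        direction = p - x_target
        direction /= np.linalg.norm(direction)
        t = max(np.dot(cam_pos - x_target, direction), 0.0)  # closest-point parameter on the ray
        viewpoints.append(x_target + t * direction)
    return viewpoints

def select_next_config(viewpoints, inverse_kinematics, penalized_cost):
    """Map each viewpoint to C-space with IK and keep the feasible one of minimum cost (3-17), (3-18)."""
    best_q, best_cost = None, np.inf
    for vp in viewpoints:
        q = inverse_kinematics(vp)           # assumed to return None when no feasible solution exists
        if q is None:
            continue
        c = penalized_cost(q)                # Psi(q) * Gamma(x_{k+1}, q_k, q)
        if c < best_cost:
            best_q, best_cost = q, c
    return best_q
```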
For every feasible result, the terms in (3-17) are calculated, and the sample configuration with minimum cost is the next best action to perform.

Figure 3-6. One-step look-ahead planning using LTRA-CS. Samples are derived in Cartesian space and transferred to C-space using IK. a) Rectangular board occlusion scenario. b) Human occlusion scenario.

3.4.3.3 Intelligent Sampling

The "Lost Target Recovery Algorithm – Intelligent Sampling" (LTRA-IS) refers to a method that derives samples in C-space around critical points. Critical points (shown as pink diamonds in Figure 3-8) represent two types of heuristically promising actions: "zooming back" from, and "looking around", the occluder. Both of these types of points are generated in Cartesian space and then transferred into C-space using IK. The "zooming back" critical point is a single point in Cartesian space that moves the camera back slightly along the ray cast from the estimated target location to the current camera location. There also exist two "looking around" critical points per shadow plane, placed in the vicinity of the closest points on the shadow plane to the current camera and the robot's base location (depicted in Figure 3-4). Since IK is computationally expensive, we only use IK to transfer these critical points, $\mathbf{x}_{cr}$, from Cartesian space to C-space, $\mathbf{q}_{cr}$, providing informative seeds to direct our C-space sampling. The informative seeds with feasible $\mathbf{q}_{cr}$ form the means of a multivariate Gaussian distribution in C-space.

The samples generated in C-space from this multivariate Gaussian distribution should satisfy certain conditions. They should satisfy the joint limits (joint limit condition), fall in the visibility region when transferred into Cartesian space (visibility condition), and orient towards the estimated target location (orientation condition). We satisfy these conditions by carefully choosing the covariance of the multivariate Gaussian distribution in C-space. We first satisfy the visibility and orientation conditions by forming a heuristic covariance matrix in Cartesian space and then transforming it into a covariance matrix in C-space. Our heuristic places the samples in the close vicinity of the shadow plane, in the visibility region. One possible distribution is to place the samples on a disk centered at the critical points and parallel to the shadow plane. The covariance matrix of such a distribution in Cartesian space is obtained by carefully choosing the eigenvectors (directions of the variance) and eigenvalues (magnitudes of the variance). We generate an orthonormal set of vectors, $\mathbf{V}$, in Cartesian space in which the augmented shadow plane normal vector, $\mathbf{v}_1$, is the first vector (eigenvector):

$\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_6], \qquad \mathbf{v}_1 = \begin{bmatrix} \mathbf{n}_{cr} \\ \mathbf{0}_{3\times1} \end{bmatrix}$.   (3-21)
The shadow plane normal vector, $\mathbf{n}_{cr}$, is augmented by a zero vector to avoid any rotation of the samples around the critical points, in order to satisfy the orientation condition. Then, by choosing all of the eigenvalues, $\lambda_i$, to be small except $\lambda_2$ and $\lambda_3$, we can distribute the samples parallel to the shadow plane. This results in the following covariance matrix in Cartesian space:

$\Sigma_{\mathbf{X}} = \mathbf{V} \boldsymbol{\lambda} \mathbf{V}^{\intercal}, \qquad \boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_6 \end{bmatrix}$.   (3-22)

Using a Jacobian transformation between the camera frame and the robot joint space we have:

$\delta\mathbf{X} = J(\mathbf{q})\, \delta\mathbf{q} \;\Leftrightarrow\; \delta\mathbf{q} = J^{\dagger}(\mathbf{q})\, \delta\mathbf{X}$,   (3-23)

where $\delta\mathbf{X} = [\delta x \; \delta y \; \delta z \; \delta u \; \delta v \; \delta w]^{\intercal}$, and $J^{\dagger}(\mathbf{q})$ is the pseudoinverse of the Jacobian matrix at the point $\mathbf{q}$. Therefore, the transformed covariance matrix in C-space is given by:

$\Sigma_{\mathbf{q}} = J^{\dagger}(\mathbf{q}_{cr})\, \Sigma_{\mathbf{X}}\, J^{\dagger}(\mathbf{q}_{cr})^{\intercal}$.   (3-24)

Finally, we define a penalty term, $W_{\mathbf{q}_{cr}}$, penalizing proximity to the joint limits, similar to [103], to satisfy the joint limit condition:

$W_{\mathbf{q}_{cr}} = \begin{bmatrix} w(q_1) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & w(q_{n_q}) \end{bmatrix}_{\mathbf{q}_{cr}}, \qquad [w(q_i)]_{\mathbf{q}_{cr}} = \left(1 - e^{-\gamma \frac{(q_i - q_i^{min})(q_i^{max} - q_i)}{(q_i^{max} - q_i^{min})^2}}\right), \quad i = 1, \ldots, n_q$.   (3-25)

The coefficient $\gamma$ is the same as in (3-16), and this penalty function is also depicted in Figure 3-7.a for a sample joint with joint limits of $[-3, 3]$ rad and various values of $\gamma$.

Figure 3-7. a) An example of the penalty functions, $\psi(q_i)$ and $w(q_i)$, for a joint with joint limits of $[-3, 3]$ rad. b) Sample distribution around a critical point derived from (3-26).

The penalty function is applied to the covariance matrix as follows, resulting in a multivariate GMM in C-space around the $n_p$ feasible mapped critical points:

$\pi = \sum_{j=1}^{n_p} \mathcal{N}(\mathbf{q} \mid \mathbf{q}_{cr,j}, \Sigma_j), \qquad \Sigma_j = W_{\mathbf{q}_{cr}} \Sigma_{\mathbf{q}} W_{\mathbf{q}_{cr}}^{\intercal}$.   (3-26)

A schematic representation of the samples derived from this distribution around one critical point is presented in Figure 3-7.b. Samples derived from this proposal distribution satisfy all of the considered conditions, except for the visibility condition in two cases: one is for the samples around the critical point representing the zooming-back action, and the other is when the Jacobian matrix at a critical point is singular. Finally, one should note that the distribution in (3-26) depends on feasible IK solutions for the critical points. In fact, not all of the critical points return a feasible IK solution, as illustrated by the pink diamonds without any samples around them in Figure 3-8. In cases where there is no feasible solution, the robot would not make an action. To avoid these situations, we derive 20% of the samples randomly from (3-20) and the remaining 80% from (3-26). This is a common practice in sampling-based motion planning when the information from the environment is inaccurate. For example, in [104] a C-space explorative sampler compensates for the exploitative sampling-based planner, which is applicable when accurate information is available.

Figure 3-8. One-step look-ahead planning using LTRA-IS. Critical points are transferred from Cartesian to C-space using IK as seeding points for intelligent sampling. a) Rectangular board occlusion scenario. b) Human occlusion scenario.
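The construction of the LTRA-IS proposal around a single feasible critical point, following (3-21)-(3-26), can be sketched as below. The in-plane variance magnitudes, the choice of the in-plane basis vectors and the value of gamma are illustrative assumptions; the Jacobian and the critical-point configuration are assumed to come from the robot model and IK respectively.

```python
import numpy as np

def intelligent_proposal_covariance(n_cr, jacobian, q_cr, q_min, q_max,
                                    lam_in_plane=0.05, lam_small=1e-4, gamma=100.0):
    """Covariance of the C-space Gaussian around one mapped critical point q_cr, per (3-22)-(3-26)."""
    # Orthonormal basis (3-21): first direction is the augmented shadow-plane normal [n_cr; 0],
    # the next two span the in-plane translations, the remaining three are the rotational axes.
    n = np.asarray(n_cr, dtype=float)
    n /= np.linalg.norm(n)
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(n, a); t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    V = np.zeros((6, 6))
    V[:3, 0], V[:3, 1], V[:3, 2] = n, t1, t2
    V[3:, 3:] = np.eye(3)

    # (3-22): large variance only along the two in-plane directions (lambda_2, lambda_3).
    lam = np.full(6, lam_small)
    lam[1] = lam[2] = lam_in_plane
    sigma_x = V @ np.diag(lam) @ V.T

    # (3-23)-(3-24): map the Cartesian covariance into C-space with the Jacobian pseudoinverse.
    J_pinv = np.linalg.pinv(jacobian)
    sigma_q = J_pinv @ sigma_x @ J_pinv.T

    # (3-25)-(3-26): shrink the variance of joints that sit close to their limits.
    s = (np.asarray(q_cr) - q_min) * (q_max - np.asarray(q_cr)) / (q_max - q_min) ** 2
    W = np.diag(1.0 - np.exp(-gamma * s))
    return W @ sigma_q @ W.T
```

Samples for each feasible critical point would then be drawn from the Gaussian with this covariance and mixed with uniformly drawn samples, following the 20%/80% split described above.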
-101 024-0.500.511.5BoardTargetXYZ   WAMConvex HullMapped AreaObserved AreaObserved ParticlesOccluded ParticlesEstimated TargetShadowed SamplesVisible SamplesCritical Points-1-0.500.51-1012345-0.500.511.5HumanXYZ WAMTarget  Convex HullMapped AreaObserved AreaObserved ParticlesOccluded ParticlesEstimated TargetShadowed SamplesVisible SamplesCritical Pointsz63  3.5 Results Evaluation of a multi-faceted (non-analytically tractable) planning approach such as ours requires multiple repeated trials to test robustness to a variety of configurations.  To validate the system, we implemented our LTRA on a detailed simulator and in a real world experiment.  Although our algorithm uses a one-step look-ahead planning approach, the system may fail to resolve the occlusion in the first step if the occlusion is not fully mapped or the particle filter does not track the target accurately. Therefore, we consider the number of steps to recover the target as a performance metric. We also use the total Euclidean distance travelled by the robot (in C-space) as another performance metric representing the cost of making actions. Computational cost is another metric to compare the three solvers presented in Section 3.4.3. To ensure computation is fairly compared, we used 64 particles to track the occluded target and considered 100 samples for each of the three solvers to solve (3-18) in each trial, unless stated otherwise. 3.5.1 Simulation Results Our detailed simulator consists of a Barrett WAM 7-DoF robot model, a moving target, and two different simulated occluders.  The first occluder is a large rectangular board and the second is a simplified human model constructed from real data captured through a Microsoft Kinect skeleton tracker. Each joint angle command issued by our planner results in a simulated image that is used to drive subsequent planning. Target tracking and occlusion analysis are also simulated, although for simplicity we assume error-free visual measurements. We also use a numerical 7-DoF IK7 to implement the mapping between Cartesian space and C-space. Multiple trials involving 6 different initial conditions of the robot and 100 random initial conditions of the target were executed for each occluder, and planning results are aggregated to compare methods. The various robot’s initial conditions capture a reasonable range of configurations and are referred to as: ‘Elbow Down’, ‘Home’, ‘Elbow Up’, and 3 other random configurations. The 100 random initial conditions of the target were generated to pass behind the stationary occlusion relative to the camera. As the relative initial placement of sensor and target affects the search, we kept this repository of random target motions consistent in all of the simulation evaluations.                                                        7 An existing ROS package available at https://code.google.com/p/lis-ros-pkg/. 64    Figure 3-9. Comparison of the three LTRAs given 6 initial robot configurations for a rectangular board occlusion with respect to time step and total travel distance cost for 100 simulation trials. A single trial starts with the camera tracking the target using the wrist joints (pan-tilt tracking). Our LTRA starts as soon as the target is occluded.  Then the planner provides a new robot pose that drives simulated robot motion. Afterwards, the target and occluder information are updated accordingly. This process is repeated until the target is recovered.  
The results of the first round of simulation trials considering the rectangular board occlusion are presented in Figure 3-9 for each of the LTRA-RS, -CS and -IS. Note that, the T-bars in this and the following figures represent 95% of the range of the measured parameter given multiple (100 in this case) random initial conditions. The number of steps to recover the target mainly varies between 1 to 2.5 steps for all three LTRAs. LTRA-IS always requires fewer steps than LTRA-RS, while LTRA-CS requires fewer steps than LTRA-RS in all but two robot’s initial conditions. There is no particular pattern when comparing LTRA-CS and LTRA-IS. This inconsistency is investigated later in this section. For total travel distance cost, in all cases, the LTRA-CS yields the minimum, and LTRA-IS yields less total travel distance cost compared to LTRA-RS.  00.511.522.53Home       Elbow Down     Elbow Up      Random1        Random2       Random3TimestepLTRA-RSLTRA-CSLTRA-IS024681012Home       Elbow Down     Elbow Up      Random1        Random2       Random3Total travel distance cost LTRA-RSLTRA-CSLTRA-IS65    Figure 3-10. Comparison of the three LTRAs given 6 initial robot configurations for a human occlusion with respect to time step and total travel distance cost for 100 simulation trials. The simulation results of the trials with a human occluder are presented in Figure 3-10. The number of steps in this case mainly varies between 1 to 2 steps for all three LTRAs. LTRA-CS and LTRA-IS always require fewer steps than LTRA-RS, while LTRA-CS requires fewer steps than LTRA-IS in the majority of the robot’s initial conditions. These results are consistent when comparing all LTRAs in terms of total travel distance cost. One of the main reasons for a higher average number of steps than expected for the LTRA-CS and LTRA-IS when compared to LTRA-RS in the board occlusion simulation is IK failure. In this simulation, the large occluder limits the robot’s workspace significantly. As a result, in some of the simulation trials the numerical IK did not return any feasible results. Finding approximate solutions to recover the target in the LTRA-CS and LTRA-IS depends on feasible IK solutions, thus their results were adversely affected in this simulation.  In comparison, in the human occluder simulation, the smaller size of the occluder 00.511.522.5Home       Elbow Down     Elbow Up      Random1        Random2       Random3TimestepLTRA-RSLTRA-CSLTRA-IS012345678910Home       Elbow Down     Elbow Up      Random1        Random2       Random3Total travel distance cost (rad)LTRA-RSLTRA-CSLTRA-IS66  model permits more feasible IK solutions, resulting in a more consistent pattern among different solvers. As shown in this simulation, the solvers’ performance improves in the order of LTRA-RS, LTRA-IS and LTRA-CS as long as IK does not fail.  The computational cost of IK also affects the overall computational cost for the LTRA-CS and LTRA-IS solvers. We evaluated the effect of IK computation by looking at the relative overall computation of each solver. We measured the process time (CPU time) of each LTRA from the moment the target was lost to the moment that it was recovered for the human occluder simulation. To fully understand the effect of IK, we also evaluated the results for the cases of doubling the number of samples and using a different IK. Here, we only present the results for one robot’s initial condition as the results do not vary noticeably among considered initial conditions. 
Using the numerical 7 DoF IK and 100 samples to solve (3-18), LTRA-CS required the largest CPU time followed by LTRA-IS and then LTRA-RS (shown in Figure 3-11.a). The large difference between LTRA-CS and LTRA-RS CPU time is mainly due to IK computation. LTRA-IS takes les CPU time than LTRA-CS as fewer calls to IK are required. This difference was highlighted when we doubled the number of samples to 200. As a result, the CPU time of both LTRA-RS and -IS almost doubled, Figure 3-11.b. However, the CPU time of LTRA-CS is more than doubled as proportionally more samples returned feasible IK solutions. Doubling the number of samples, also improved the performance of each solver in the expense of more computational cost (the number of steps and total travel distance cost reduced in general). We then used a 6 DoF IK with closed-form solution, keeping joint 3 of the simulated 7 DoF WAM fixed. To reduce computational cost, we used kinematic decoupling and restricted the solutions to the front half-sphere/elbow down configurations [105]. By applying this IK, the CPU time was reduced for LTRA-IS and notably reduced for LTRA-CS (comparing Figure 3-11.a and Figure 3-11.c). The CPU time of LTRA-RS remained the same as IK is not required in this solver. The LTRA-CS CPU time was less than LTRA-RS, Figure 3-11.c, because the IK failed for most of the LTRA-CS samples, and upon IK failure the rest of the computations (calculation of (3-17)) were discarded. The IK failure also affected the relative performance of the solvers (from number of steps and total travel distance cost perspectives) by returning more inconsistent results.  67   a)                                        b)                                                  c) Figure 3-11. Comparison of the three LTRAs given Elbow Down initial robot configuration for a human occlusion with respect to computational cost for 100 simulation trials. Overall, the choice of the type of IK used is a trade-off between computational cost (speed) and feasibility (performance). A better and more computationally expensive IK results in better performing (from number of steps and total travel distance cost perspectives) but slower LTRA-CS/IS compared to LTRA-RS. In the next section, we use the fast 6 DoF IK for LTRA-CS/IS to investigate our LTRA and compare the relative performance of the three solvers in a real world experiment. 3.5.2 Experimental Set-up In order to verify our proposed LTRA and various sampling-based optimization solvers, we implemented them on a high dimensional telepresence platform with a stationary base. A robotic telepresence system allows users in two different geographical locations to have audio-visual communication. Our algorithm provides the system with an autonomous mode control to rapidly resolve occlusions and minimize its impact on the quality of interactions between users. Our telepresence system incorporates in a 3-D camera (Microsoft Kinect) and a 7" tablet mounted on a 7-DoF robotic arm (Barrett Technology WAM) as shown in Figure 3-12.a. However, we only actuated six of the robot’s DoF and utilized the computationally cheaper 6 DoF IK (keeping joint 3 fixed). One person represented an interlocutor (a local user) communicating with a remote user through the platform while another person performed the role of an occluder depicted in Figure 3-12.b. 
We used open-source Kinect skeleton tracking software in our experiment to track the occluder, and constructed a convex hull around the occluder's tracked frame. Meanwhile, we were only interested in recovering the interlocutor's head as our target, rather than his whole body, when it was occluded. To simplify the interlocutor detection and speed up our image processing, we attached an ARTag to the interlocutor's torso. As a result, we were able to implement all aspects of our software, including image acquisition, image processing, skeleton tracking, path planning, and robot control, on one computer. All of these software components were connected through ROS Fuerte (Robot Operating System, 5th distribution release) in our implementation.

3.5.3 Experimental Results

Each solver was tested over 20 trials where the occluder and interlocutor were asked to deliberately create occlusion scenarios, while keeping the scenarios similar across the different solvers. These scenarios included moving-interlocutor/stationary-occluder (6 trials), stationary-interlocutor/moving-occluder (6 trials), and moving-interlocutor/moving-occluder (8 trials), alternating two at a time. The interlocutor and occluder referenced markers placed on the ground to help them keep their relative distance to the robot consistent. The 20 trials for each solver were conducted consecutively without restarting the robot; therefore, random initial conditions were considered for the robot in these experimental trials. To further challenge the LTRA algorithm in general, the interlocutor stopped right behind the occluder in the scenarios with a moving interlocutor. This resulted in an abrupt change in the target motion model, making it harder for the particle filter to track the occluded interlocutor.

The results of the experimental trials are presented in Figure 3-13. The number of steps mainly varies between 1 and 2.5 steps for all three LTRAs, while LTRA-IS has the minimum number of steps. We also treated trials taking more than 4 steps to resolve the occlusion as failures. Under this regime, the LTRA-RS, LTRA-CS and LTRA-IS failed at rates of 25%, 12.5% and 8.33%, respectively. All of these results indicate that our LTRA algorithm in general is capable of rapidly resolving occlusions and that the LTRA-IS algorithm is best for recovering the lost target with a low failure rate.

We further compared the time it took from the moment that the interlocutor was lost to the moment that he was recovered. One should note that this time includes the robot motion, which follows a filtered trapezoidal joint velocity profile with a maximum velocity of 1 rad/s and a maximum acceleration of 6 rad/s². As shown in Figure 3-13, LTRA-CS took the least amount of time to recover the target, followed closely by LTRA-IS. This is due to the high failure rate of our simple and computationally cheap IK, as explained in Section 3.5.1.
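The joint velocity limits above put a floor on how quickly any solver can recover the target once a new viewpoint is chosen. A minimal sketch of the traversal time implied by an (unfiltered) trapezoidal profile with the stated limits is shown below; the filtering applied in the actual controller is omitted.

```python
import math

def trapezoidal_motion_time(delta_q, v_max=1.0, a_max=6.0):
    """Traversal time (s) of a joint displacement delta_q (rad) under a
    trapezoidal velocity profile with the limits quoted in the text
    (v_max = 1 rad/s, a_max = 6 rad/s^2); the filtering is ignored."""
    dq = abs(delta_q)
    q_ramp = v_max ** 2 / a_max            # distance spent ramping up and down
    if dq < q_ramp:                        # triangular profile: v_max never reached
        return 2.0 * math.sqrt(dq / a_max)
    return dq / v_max + v_max / a_max      # ramp-up + cruise + ramp-down
```

Under these limits, for example, a 1 rad joint move takes about 1.17 s, which is why the recovery time is closely tied to how far the joints must travel.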
In terms of total travel distance cost, LTRA-IS has the lowest cost, followed by LTRA-CS, and from this perspective both of these solvers are more efficient than the baseline LTRA-RS. In fact, all of the experimental results indicate the superior performance of the two proposed solvers compared to the baseline LTRA-RS. The overall experimental results also favour LTRA-IS over LTRA-CS. An example of a stationary-interlocutor/moving-occluder scenario using LTRA-IS is depicted in Figure 3-14; in this sample scenario the interlocutor is recovered in one step.

Figure 3-13. Comparison of the three LTRAs with random initial robot configurations with respect to time step, time and total travel distance cost for 20 experimental trials.

Figure 3-14. A stationary-interlocutor/moving-occluder telepresence scenario. In each stage a room view and a mounted camera view are presented. The person with the ARTag is the interlocutor, while the person with the yellow T-shirt is the occluder. In this sample scenario the LTRA-IS solver recovers the interlocutor in one step as the occluder approaches the pink bucket.

3.6 Conclusion

In this chapter we presented a novel algorithm to resolve occlusions in an online manner in single-camera, high dimensional eye-in-hand systems. Persistent occlusion is a source of visual target tracking failure in dynamic environments. However, it has not been addressed in the active visual search literature for high dimensional robotic systems when rapid recovery of a moving target is necessary.

We first defined the goals of the active visual search problem as a multi-objective cost function that captures many intuitive aspects of this problem. Our proposed cost function rapidly trades off searching for the target against acquiring further information about the occluder extents and the cost of taking actions. At the same time, singularities and joint limits are handled by C-space planning and by constraints imposed on the cost function, respectively. We then introduced an observation model to account for the FoV constraint. This observation model is capable of updating the target belief state in the presence of occlusions. The active construction of the occluder map was then utilized to solve the cost function efficiently. It enabled us to search for viewpoints that allow the robot to look around the occluder. We applied three different approximate sampling-based optimization solvers to solve this optimization problem. LTRA-RS, a commonly practiced random sampling approach, was adopted as the baseline. We then presented LTRA-CS, a Cartesian space sampler adapted from the NBV planning literature. We also introduced a novel approach, LTRA-IS, that explores the C-space while exploiting the regions around informative seeds.

Our proposed solvers rarely employ expensive IK; rather, computations are performed efficiently in C-space. These solvers showed superior performance against the baseline approach in both simulation and experimental results. Where higher computational cost can be tolerated, LTRA-CS performs better by applying a more accurate IK, while LTRA-IS is preferred to strike a balance between computational cost and performance. Our results on a high dimensional robotic platform also demonstrated the capabilities of our LTRA in general to resolve occlusions and rapidly recover moving targets.
In the next chapter, we evaluate the efficacy of applying the LTRA to a real-world robotics application from the user's point of view and empirically investigate its implications.

Chapter 4 - Visual Occlusion Handling in Telepresence Collaborations

4.1 Introduction

Previous chapters first developed a high dimensional active search algorithm and then improved it to resolve occlusions in high dimensional eye-in-hand systems. One of the application areas of such systems is robotic telepresence, where they can improve user experience. In this regard, the LTRA has the potential to improve the telepresence experience by resolving occlusions. Occlusions can occur when users in remote locations are engaged in physical collaborative tasks. This can lead to frustration and inefficient collaboration between the collaborators. When the communication happens across a conventional, stationary videoconferencing platform, it is impossible for a remote user to actively avoid or resolve the occlusion. When a robotic telepresence system is used, on the other hand, the system or the remote user can resolve occlusions by controlling the robot's motion. The aim of this chapter is to design a better user interface to improve the remote collaboration experience.

The author conducted two human-subjects experiments (N=20 and N=36, respectively) to investigate the following interlinked research questions: a) what are the impacts of occlusion on telepresence collaborations, and b) can an autonomous handling of occlusions improve the telepresence collaboration experience for remote users? In both studies, the focus was on the pilot's perspective of remote collaborations rather than that of the interlocutor. The author used the same collaborative experimental task in both studies, where the task required subjects to maintain visual contact with each other. During the task, a third person (an occluder) walked in between the interlocutor and the robot in order to create instances of occlusion.

In the first study (preliminary study), 20 participants were invited to participate in a collaborative task with an experimenter while co-located and telepresent. This study examined the possibility that occlusions can prompt users to intentionally change their position/orientation or that of their proxy, the telepresence platform. Task performance and vocal non-verbal behavior measures were used to study the impacts occlusion has on telepresence collaboration.

In the second study (main study), a total of 36 participants were provided with three different modes of control to resolve occlusions during robot-mediated videoconferencing: Stationary, Manual, and Autonomous. The Stationary mode served as the baseline, where the robot remained stationary regardless of occlusion presence. For the Manual mode, a user-friendly interface empirically designed in the preliminary study was used to manually control the robot. The Autonomous mode used the novel LTRA algorithm to automatically resolve occlusions. This study then compared the three control modes in terms of the participant's task performance, time to resolve the occlusion, vocal non-verbal behavior, sense of presence, perceived workload and user preference.

With the two interlinked studies, this chapter tests the following hypotheses:

H1. Occlusions during telepresence collaborations significantly deteriorate the task performance and disrupt the vocal non-verbal behavior.
H2. Occlusion during both remote and co-located collaboration is an impactful extrinsic environmental interference that necessitates a change in the spatial relationship between the two collaborating parties.

H3. A telepresence system that autonomously resolves occlusions significantly improves the vocal non-verbal behavior (indicative of a more fluent communication) of the pilot compared to that of the Manual and Stationary modes.

H4. Autonomous handling of occlusions significantly improves task performance of a remote collaborative task when compared to the Manual and Stationary modes.

H5. Occlusion resolution during telepresence collaborations generates the highest sense of presence in Manual mode, followed by Autonomous mode and then Stationary mode.

H6. The pilot's perceived workload of a collaborative task is significantly reduced in Autonomous mode compared to Manual mode during telepresence collaborations.

By evaluating these hypotheses, one will better understand the nature of spatial relationships in remote collaborations. Studying different types of controllers will also contribute to the design of telepresence platform controllers used to handle occlusions, permitting smooth and unencumbered communication between the pilot and the interlocutor. This chapter will provide timely input to designers as the demand for consumer telepresence platforms increases and these systems expand into more complex environments.

The following section provides some background. Section 4.3 then presents the preliminary study; the same telepresence system and experimental task are used in the main study. Findings from the preliminary study informed the main study presented in Section 4.4. The results and discussion are presented in Sections 4.5 and 4.6, respectively. The chapter concludes in Section 4.7.

4.2 Related Work

In order to study the impacts of occlusion on remote collaboration and the repositioning/reorienting of robotic proxies, we first review the telepresence literature that takes different types of spatial relationships into account. In this section, we highlight two aspects of these closely related telepresence studies. The cause and effect of various spatial relationships is the first aspect of interest. The second aspect relates to the methods that enable a pilot to control the spatial relationships in telepresence interactions.

Spatial relationships between humans, and between humans and robots, have been extensively studied in the fields of social science and social robotics. One of the widely used theories in social science is the interpersonal distance theory [106], which introduces four voluntarily created spatial relationship classes (intimate, personal, social and public) as a set of proxemics features. F-formations [107] represent another well-known proxemics theory, categorizing the dynamics of spatial and orientation relationships between two or more people. These theories are widely applied in human-robot interaction (HRI). Examples of HRI proxemics studies include extracting human spatial arrangements with a robot [108], investigating spatial relationships in a human-robot collaboration [109] and increasing the social interaction of robots with humans [110].

In the telepresence literature, the studies that focus on the effect of spatial relationships can be categorized into two general groups. The first group investigates these effects from an interlocutor's point of view.
For example, in [111] the change of a telepresence platform's height affected how the interlocutor perceived the persuasiveness and authority of the pilot. Two different heights for an MRP were manually adjusted for this study prior to the experiments. In another study, involving elderly participants video-conferencing with their families [112], an automatically directed display was found useful based on a custom usability questionnaire. An audio-visual attention control system was in charge of directing this tabletop pan-tilt display. The swivel motion of a display has also been shown to result in more conversational engagement, to help communicate the pilot's focus of attention, and to be preferred by users compared to a stationary platform [113]. In that study, two different modes of control, directing the display with a mouse (explicit mode) and directing the display by following the pilot's head turns (implicit mode), were compared against a stationary mode. Continuing with the effects of telepresence platforms' spatial relationships from the interlocutor's point of view, MeBot's expressive modes and body poses show improvements in psychological involvement, interaction engagement/enjoyment and cooperation level [114]. MeBot is a desktop telepresence platform with a high number of DoF. Its expressive modes and body poses were controlled by tracking the pilot's head gestures. A passive controller also allowed the pilot to manually control the arm and the shoulder. The pointing gestures of a similar telepresence platform, performed through predefined commands, are also shown to convey spatial information, consistent with human-human interaction [115].

As reviewed so far, various effects of spatial relationships are covered in telepresence studies from the interlocutor's point of view. However, none of those that we are aware of address the effects of occlusion. These studies also rarely consider designing and investigating various controllers to improve spatial relationships from the user's standpoint. This latter aspect is more often addressed in the second group of telepresence studies, focused on the effects of spatial relationships from the pilot's perspective. We discuss this group of the literature in more detail, as our work also focuses on the pilot side of telepresence collaboration.

In a recent study of remote collaboration using an MRP system, the ability to move the telepresence platform in the remote space was found to increase the pilot's sense of presence [116]. The authors carefully designed the collaborative construction task condition to cause high levels of mobility. However, the study only compared a keyboard-based manual control of the MRP against a stationary mode and did not consider an autonomous control mode. The authors also showed, contrary to one of their hypotheses, that manual mobility control degraded task performance. One explanation offered is the high level of attention required to simultaneously perform the task and maneuver the MRP. The other suggested reason is that the utilized interface failed to provide the pilot with sufficient understanding of the spatial positioning of the platform with respect to the remote environment. In another study focused on the pilot's point of view [62], manual control of a telepresence robot by a user is suggested to produce a strong perceived social telepresence, while the same is not true for an automatic control mode.
In fact, the perceived social telepresence, the degree to which telepresence resembles face-to-face interaction, was significantly better only in a manual mode compared to stationary and autonomous modes. The authors designed a task with various presentation spots in a laboratory, giving a tour to an MRP pilot, to elicit movability; they defined movability as involving much shorter travel distances, in contrast to mobility. They then used a keyboard-based manual mode and an autonomous mode (controlled by an experimenter) to provide movability as a spatial relationship. The F-formations created by pilots, as another spatial relationship, are shown to be correlated with the pilots' perceived sense of spatial and social presence as well as ease of use [117], [118]. An MRP system was manually driven around using a mouse interface by 21 participants during a home-visit task interacting with two elderly actors. Different F-formations were observed in various situations through video coding. The authors attributed this to the scripted actions of the interlocutors, space constraints, the MRP's wide-angle lens camera and the pilots' navigational skills. They concluded that improving the interface to control the robot leads to a better quality of interaction.

Building on these studies of spatial behaviours focused on the pilot, we investigate the effects of occlusion as an extrinsic environmental interference in remote collaborations. We also investigate the potential for occlusions to elicit repositioning/reorienting behaviour of the robot proxy. We then provide a user-friendly controller for the pilot to resolve occlusions and minimize their negative impacts. In the next section, we outline our preliminary study aimed at investigating the impacts of occlusion on face-to-face collaborations.

4.3 Preliminary Study

We conducted a preliminary study, approved by the UBC BREB, with 20 participants (10 females and 10 males) to investigate the impacts occlusion has on face-to-face collaborations. First, we conducted a human-human interaction study where a participant, co-located with a confederate, collaborated with the confederate on a vision-dependent task. We observed how the co-located, collaborating parties reacted to occlusions by qualitatively video coding the participants' reactions to occlusions. We observed that the participants always took an action (e.g., moving their head) when they could not see the confederate. Then, the participants were moved to a different location (remote room), where they performed the same collaborative task while manually controlling our telepresence platform using two different manual interfaces. This provided us with a within-subject comparison of gesture-based and keyboard-based interfaces. This preliminary study informed the main experiment (Section 4.4): it served to validate the effectiveness of the designed collaborative task, and to select a user-friendly manual interface that can be used to provide a fair comparison against an autonomous control of the telepresence platform. Detailed results of this preliminary study are presented in [119].

In this section, we describe our telepresence platform and the collaborative Jigsaw puzzle task used for both the preliminary and main studies. Then we present the two manual interfaces considered in the preliminary study and their results.
4.3.1 Telepresence Platform

In conversation and collaborative meeting scenarios, a telepresence platform becomes equivalent to a videoconferencing display when it is positioned in front of an interlocutor [116]. It has also been shown that spatial positioning/orienting of telepresence platforms during remote conversation can positively affect the telepresence experience. These spatial behaviors are mainly possible thanks to the higher DoF of the telepresence platforms used, as in [111], [112], [114]. Higher DoF that allow for an independently articulated head and torso (providing independent motion for the camera/display) on a robot base are even recommended as a design guideline to facilitate facial contact during walking conversations [120].

Following the recommendations from the literature, we designed a high-DoF telepresence platform that is capable of resolving occlusions through spatial reconfiguration. Our platform consists of a Skype-enabled 7″ tablet and a Microsoft Kinect mounted on a 7-DoF Barrett WAM arm (see Figure 4-1). (Skype™ is a trademark of Skype; our research is not affiliated with, sponsored, authorised by or otherwise associated with the Skype group of companies.) This platform represents a decoupled torso design, as proposed in [87], and is controlled via ROS. The end-effector (camera and display) motion of the robot is decoupled from its base, which reduces the burden of controlling the robot's base when it is approximately located in a proper formation with the interlocutor. Although the base of our platform is stationary, the outcome of this work can be applied to any telepresence platform that can be remotely controlled to reposition/reorient.

Figure 4-1. The 7-DoF telepresence platform used in our experiments consists of a tablet and a Kinect mounted on its end-effector.

4.3.2 Experimental Task

We designed a collaborative Jigsaw puzzle task that heavily relies on continued visual contact between the two collaborators. In this task, we gave the pilot, a participant, a completed and numbered version of a 9-piece puzzle (see Figure 4-2). The participant was then told that the interlocutor, a confederate, was given the same but un-numbered puzzle pieces scattered in a bin (see Figure 4-3). The participant was asked to instruct the confederate to find and place the odd-shaped puzzle pieces one at a time, in the order in which the pieces were numbered. Unbeknownst to the participant, the confederate's pieces were numbered on the back side in order to ensure consistency in the confederate's behaviour, especially the time it took her to find each piece, across all experimental conditions and participants.

Throughout the task the occluder, an experimenter, intentionally walked into the line of sight between the participant and the confederate multiple times. This introduced multiple occlusions and disruptions of visual contact. As a result, various spatial formations between the interlocutor and the participant/telepresence platform could be formed to resolve the occlusion. The participants, as the instructors, were free to reposition/reorient themselves (or the telepresence platform, in the case of remote collaboration) or to request the confederate to move. The interlocutor's actions were limited to following the pilot's instructions throughout the task. The bin and an empty puzzle board to place the pieces on were located in front of the confederate, who was facing the telepresence platform.
The same confederate played the role of the interlocutor for the entire experiment to ensure that all participants had a similar experience. After the confederate found each piece, the participant was required to confirm the piece visually before allowing the confederate to place the piece on the board. Likewise, the confederate was instructed to make sure that she had the participant's confirmation before placing the piece on the board.

Figure 4-2. A sample Jigsaw puzzle for the pilot used for the collaborative task.

Figure 4-3. Schematic representation of the experimental telepresence collaboration set-up used for both the preliminary and main studies. Both the interlocutor and the occluder were experimenters. The numbers in front of the interlocutor indicate the order in which the interlocutor completed the puzzle. The numbers in front of the telepresence platform represent the puzzle pieces at which the occluder moved to occlude the pilot.

In each mode, four occlusions were intentionally introduced on specific puzzle pieces. The occluder walked in between the two collaborating parties as soon as the pilot started to instruct these four specific puzzle pieces. In all cases, participants took an action to resolve the occlusion. The occluder left the FoV when the pilot confirmed that the interlocutor had the right piece. An occlusion scenario and its resolution by the telepresence system are depicted in Figure 4-4.

Figure 4-4. Demonstration of an occlusion introduced during the experiments and its resolution. The 'Mounted camera view' shows an example of what the pilot, the participant, saw during the experiment.

4.3.3 Manual Control Interfaces

Differences in user perception and task performance when using manual and autonomous control modes of a telepresence robot can depend on the particular manual control interface used. In order to provide a fair comparison between autonomous and manual resolution of occlusions in our main study, we designed and empirically compared the efficacy of two manual interfaces. A non-contact, gesture-based interface using the Leap Motion™ controller and a conventional keyboard-based interface were designed to remotely control the camera/display of our telepresence platform. We conducted a user study with the 20 participants of our preliminary study using the aforementioned Jigsaw puzzle task.

Gesture-based Interface: The Leap Motion controller has been used to control robotic manipulators before [121], [122]. We designed our Leap-based interface such that the user's hand reference frame is mapped to the reference frame of the robot end-effector (see Figure 4-5.a). The hand displacement was mapped with a gain of two to the robot end-effector displacement (the Leap Motion controller workspace is approximately half the size of the WAM workspace). In order to avoid robot singularities, the forward/backward motion of the mounted camera was constrained. Therefore, the user's hand displacement generated a planar end-effector motion consisting of up/down and left/right movement, which is suitable for the experimental task. The hand orientation was mapped directly to control the end-effector orientation. Therefore, the hand motion controlled 5 DoFs of the mounted camera/display on the 7-DoF robot end-effector.
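A minimal sketch of this mapping is given below. The hand pose is assumed to already be available from the Leap SDK (abstracted here as plain position/orientation arguments), and the assignment of Leap axes to robot axes is an illustrative choice rather than the exact convention used in the study.

```python
import numpy as np

GAIN = 2.0   # the Leap workspace is roughly half the size of the WAM workspace

def hand_to_end_effector(palm_pos_mm, palm_rpy, palm_pos0_mm, ee_pos0):
    """Map the current palm pose to an end-effector set-point.
    palm_pos_mm / palm_pos0_mm: palm position now and at engagement (mm);
    palm_rpy: palm roll/pitch/yaw (rad); ee_pos0: end-effector position (m)
    at engagement. Forward/backward motion is locked to avoid singularities."""
    d = (np.asarray(palm_pos_mm, float) - np.asarray(palm_pos0_mm, float)) / 1000.0
    ee_pos = np.asarray(ee_pos0, float).copy()
    ee_pos[1] += GAIN * d[0]              # palm left/right -> end-effector left/right
    ee_pos[2] += GAIN * d[1]              # palm up/down    -> end-effector up/down
    # ee_pos[0] (forward/backward) is intentionally left unchanged
    ee_rpy = np.asarray(palm_rpy, float)  # orientation passed through directly
    return ee_pos, ee_rpy
```

Together, the two translational components and the three pass-through orientation components give the 5 controlled DoFs described above.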
Keyboard-based Interface: A conventional keyboard-based [116], [123] robot control mode was also implemented. On a regular keyboard, five pairs of keys were used to control the robot end-effector, with each pair controlling a specific DoF of the end-effector (as indicated in Figure 4-5.b). The two translational pairs (A/D and W/S keys) displaced the end-effector in 5 mm increments, while the three rotational pairs (arrow keys and Delete/End keys) controlled the end-effector's roll, pitch and yaw in 1° increments.

4.3.4 Selection of Manual Interface

In this preliminary user study, the conventional keyboard-based interface demonstrated significantly superior performance in all usability measures considered. We considered objective measures including task completion time and travel distance of the robotic arm. Our subjective measures included questions from the NASA Task Load Index (TLX) [124] and System Usability Scale (SUS) [125] questionnaires, designed to measure perceived workload and usability, respectively. Even though gesture-based interfaces are becoming more and more popular in robotic teleoperation applications [126], [127], our gesture-based interface performed poorly compared to the keyboard-based interface. The keyboard-based interface was also preferred over the Leap-based interface by 14 out of 20 participants. We suspect this was caused by two factors: our interface implementation for telepresence, and the unfamiliarity of the majority of the participants with the Leap Motion controller. Therefore, we chose the keyboard-based interface as the manual mode interface in the main study. Detailed results of this preliminary study are presented in [119].

Figure 4-5. Manual mode interfaces. a) Leap-based interface that operates through hand gestures. b) Keyboard-based interface that operates through a set of keys.

4.4 Main User Study Design

Informed by the results from the preliminary study, we conducted a within-subjects experiment, approved by the UBC Behavioral Research Ethics Board (application #H14-01937), to address the question "Can an autonomous handling of occlusions improve the telepresence collaboration experience for remote users?" and to further investigate the impacts occlusion has on telepresence collaborations. Utilizing the aforementioned Jigsaw puzzle task, we tested three different modes of control (Stationary, Manual and Autonomous) to handle occlusions during three trials of remote collaboration.

In the Stationary mode, the participants, in the remote room, were not given any control over the robot, which remained stationary. For the Stationary mode, in case the occlusion was not resolved, the occluder was trained to occlude the FoV for 30 s. This time was chosen based on the average occlusion resolution time of 32.22 s we observed in the keyboard-based interface condition of our preliminary study. In the Manual mode, the participants were trained to use the keyboard-based interface outlined in Section 4.3. In the Autonomous mode, the robot resolved the occlusions for the participants. In this section, we outline the autonomous controller, the participants, the procedure employed in our experiment, and the measures collected from this study.
4.4.1 Autonomous Mode Controller

We implemented a novel autonomous controller to resolve occlusions and keep the interlocutor in the FoV at all times. This controller is based on the LTRA developed in [86], [87], where the interlocutor is considered the target and an occlusion is treated as triggering a lost-target scenario. We use a skeleton tracking software package (openni_tracker, available at http://wiki.ros.org/openni_tracker) to track the occluder, and apply a one-step look-ahead probabilistic search to rapidly resolve the occlusion. The probabilistic search is formulated as a constrained optimization problem that returns feasible robot configurations in real time. An additional constraint is applied to keep the mounted camera/display horizontal at all times, to avoid a tilted view of the room and a tilted display.

4.4.2 Participants

We recruited 36 new participants (18 females and 18 males) with ages ranging from 19 to 60 (Mean = 25.42, SE = 2.26). Each participant was compensated $10 CAD and was given a chance to win an extra $5 by beating the previous task completion time scores, as a means of motivation. None of the participants were color blind, and 30 of them had played Jigsaw puzzles in the past. The participants' gaming experience ranged from less than once a month (17/36), to at least once a month (8/36), to at least twice a week (11/36). The participants' usage of video conferencing systems also varied from less than once a month (14/36), to at least once a month (13/36), to more than twice a week (9/36). For six of the participants, the Skype connection halted or slowed down to a level where audio/video signals were not recognizable; we excluded the data from these participants in our analysis.

4.4.3 Procedure

The participants met the principal experimenter in the remote room, where they completed a demographics questionnaire upon arrival. Then, they were introduced to our confederate and secondary experimenter (interlocutor and occluder, respectively) via Skype, and were instructed on the rules of the Jigsaw puzzle task and their role as an instructor to the interlocutor.

Stationary mode was the first and baseline control mode for all participants. This control mode allowed us to observe the impacts of occlusion on participants' vocal non-verbal behavior and task performance. These effects were measured through video coding and through signal processing of the audio recorded by a microphone. Note that the Stationary mode resembles the video conferencing systems commonly available today, where there is no means for the remote person to modify the FoV. The participants were informed that the occluder, but not the interlocutor, would hear them during the experiment. Apart from that, they were free to employ any instruction strategy they liked to complete the task (e.g., to ask the confederate to move during occlusion).

Afterwards, the order in which the Manual and Autonomous modes were presented was alternated between participants to account for possible order effects. The Manual mode allowed a pilot to manually control the telepresence FoV, while the Autonomous mode resolved occlusions and restored the interlocutor to the FoV automatically. Prior to starting the Manual mode trial, participants were taught how to use the keyboard-based interface by watching a short video. The principal experimenter also demonstrated the interface to the participants afterwards to show the extent of the system's capabilities.
Then, the participants were asked to familiarize themselves with the interface until they felt comfortable using it. They were also told to explore all of the interface's capabilities before starting the trial. Prior to the beginning of the trial, the participants were made aware that the use of the manual control interface was entirely optional in completing the trial. After each trial, the participants completed a series of questionnaires. At the end of the entire experiment, they were asked to indicate their preferred interaction mode and to provide final comments. An overview of the experimental procedure is presented in Figure 4-6. All three trials of the experiment were video and audio recorded for analysis purposes.

Figure 4-6. Overview of the experimental procedure. All participants completed three remote collaboration trials and five sets of questionnaires. We alternated the Manual and Autonomous modes between participants to account for possible order effects.

4.4.4 Measures

We considered objective and subjective measures to study the impacts of occlusion on co-located and remote collaborations. These measures also allowed us to study the effects of the various modes of control during remote collaboration.

We computed speech energy (the sum of the squared speech amplitude values) and speaking time, two speech characteristics from sociometric measures, to investigate the vocal non-verbal behavior of the participants (H1 and H3). These measures have been used in other telepresence studies through audio signal processing [113], [118]; both of these studies even concluded that objective sociometric measures are more demonstrative than subjective questionnaires. These measures are in fact discussed as social signals during communication in [128]. We used a scaled magnitude of the speech amplitude in Pa (pascal) to calculate the speech energy, and used speech percentage instead of speaking time by normalizing it over the total conversation time. As the experiment was designed such that occlusion happens on specific pieces, we broke down the conversation into the periods of time spent working on occluded versus non-occluded pieces. We annotated videos from the Stationary mode to extract the time segment of each puzzle piece and labeled the segments into Occlusion and No-occlusion conditions. We used these time segments and labels on the audio signal to distinguish between occluded and non-occluded pieces and to compare their associated sociometric measures as one aspect of H1.

The other aspect of H1, the occlusion impact on task performance, was measured by the difference between the times to complete a piece for the Occlusion versus No-occlusion conditions in Stationary mode. The task completion time as a measure of task performance has commonly been used (e.g., [61], [116]). It might appear that we have a systematic bias in the completion time for the Occlusion condition since the occluder consistently occluded the interlocutor for 30 seconds in this condition, preventing the participant from visually confirming the puzzle piece; however, the pilot was free to ask the interlocutor to move as a method to resolve the occlusion.

We then compared some of the objective measures from the main study with those of the preliminary study in order to determine which of the three control modes produced the closest outcome to the co-located collaboration. This allows us to bring remote communications as close as possible to co-located human-human communications, which is one of the main goals of telepresence robotics.
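As a concrete reference for the two sociometric measures described above, the sketch below computes speech energy and speech percentage for one conversation segment. The 100 ms framing and the simple amplitude-threshold voice activity detector are illustrative assumptions; the exact windowing and detector used in this work are not specified here.

```python
import numpy as np

def speech_measures(signal_pa, fs, vad_threshold=0.01, frame_s=0.1):
    """Speech energy (sum of squared amplitudes) and speech percentage for a
    mono segment. signal_pa: samples scaled to pascals; fs: sampling rate (Hz).
    A frame is counted as speech if its peak amplitude exceeds vad_threshold."""
    x = np.asarray(signal_pa, dtype=float)
    energy = float(np.sum(x ** 2))                    # speech energy

    frame = max(1, int(frame_s * fs))                 # e.g. 100 ms frames
    n = len(x) // frame
    if n == 0:
        return energy, 0.0
    frames = x[: n * frame].reshape(n, frame)
    voiced = np.max(np.abs(frames), axis=1) > vad_threshold
    speech_pct = 100.0 * float(voiced.mean())         # speaking time / total time
    return energy, speech_pct
```

Running this per labeled segment (Occlusion versus No-occlusion, or per control mode) yields the values that the statistical comparisons below operate on.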
We compared the speech percentage and speech energy from the sociometric measures over the entire conversation for the co-located collaboration and the three control modes, to study whether a telepresence system that autonomously resolves occlusions significantly improves the remote communication (H3). We also compared the overall task completion time as a measure of task performance in the various interaction modes to explore H4. As a measure of the efficiency of the Manual and Autonomous modes, we compared the occlusion resolution time, the time it took the participant/system to resolve an occlusion from the moment the occlusion occurred. The occlusion resolution time was measured through frame-by-frame analysis of the recorded videos for the two modes. We also qualitatively coded video recordings of the co-located collaboration and the Manual mode in order to observe behavioural changes in participants in the presence of occlusion. This helped us to evaluate whether occlusion is indeed an interference that necessitates a change in the spatial relationship between the two collaborating parties (H2).

The subjective measures were captured through a set of questionnaires. We used the Temple Presence Inventory (TPI) questionnaire (a questionnaire applicable to the use of most media and media content) [129] to investigate whether the pilot's sense of presence is highest in Manual mode, followed by Autonomous mode and then Stationary mode (H5). This questionnaire has been used in numerous previous telepresence studies [114], [117], [118]. In this study, we used the questions from the spatial presence, engagement and social presence aspects of the TPI questionnaire. In addition, we employed the NASA TLX questionnaire [124] to examine whether the Autonomous mode reduces the perceived workload of the pilot compared to the Manual mode (H6). This questionnaire has been used in various telepresence and teleoperation usability studies before [61], [130]–[132]. We also checked the users' preference among the different remote collaboration modes at the end of the experiment in order to further assess H2.

4.5 Results

In this section we first present the results associated with the impacts of occlusion. Then we compare the measures across the Stationary, Manual and Autonomous modes to investigate effective means of occlusion resolution in remote collaboration. Detailed interpretations of these results are presented later in Section 4.6.

4.5.1 Impacts of Occlusion on Telepresence Collaborations (H1, H2)

In this subsection, we present the results addressing the question, what are the impacts of occlusion on telepresence collaborations? Participants' overall instruction approach was observed to be a verbal explanation of each puzzle piece accompanied by visually showing it. In the co-located collaboration studied in the preliminary study, we observed that the participants always took action(s) to avoid or resolve occlusions; actions such as leaning to the sides, stretching arms to the sides and leaving the chair were observed from the recorded videos. In the Stationary mode of the main study, the majority of the participants (25 out of 30) opted to verbally describe the puzzle piece and confirm it during occlusion, even though they were free to ask the interlocutor to move.
This was reflected in the vocal behavior and completion time of the Occlusion versus No-occlusion conditions of Stationary mode as follows. By design, each participant had five non-occluded and four occluded conversation segments in different orders, which were later labeled through video annotation. We then conducted a two-tailed paired samples t-test on the averages of these sets of segments for speech energy, speech percentage and piece completion time.

The speech energy, as presented in Figure 4-7.a, is significantly higher (t(29)=11.31, p<0.001) with Occlusion (M=0.56, SE=0.06) compared to No-occlusion (M=0.28, SE=0.04). However, as shown in Figure 4-7.b, we did not find any significant difference in participants' speech percentage (t(29)=0.47, p=0.64) between Occlusion (M=36.76, SE=2.76) and No-occlusion (M=36.27, SE=3.24). The participants also took significantly longer (t(29)=15.71, p<0.001) to complete a piece in Occlusion (M=0:59, SE=0:03) than in No-occlusion (M=0:31, SE=0:02), as shown in Figure 4-7.c. The difference between the means of these two times is slightly less than our designed occlusion time (30 s), because only a few participants asked the interlocutor to move as a method to resolve the occlusion in Stationary mode.

Figure 4-7. Occlusion impacts in Stationary mode (* p<0.05). a) Speech energy is significantly higher in the Occlusion condition compared to No-occlusion. b) There is no significant difference in the speech percentage of Occlusion and No-occlusion. c) Task performance is significantly higher in the No-occlusion condition, as its piece completion time is significantly lower.

4.5.2 Autonomous Handling of Occlusions to Improve Telepresence Collaborations (H3-H6)

In this section, we present the results associated with our second research question, can an autonomous handling of occlusion improve the telepresence collaboration experience for remote users?

4.5.2.1 H3. Vocal Non-Verbal Behaviour

We analyzed the entire Stationary mode audio signal to compare it against the Manual and Autonomous modes as well as the co-located collaboration experiment from the preliminary study. We used R [133] and lme4 [134] to perform two linear mixed-effects analyses, on speech energy and on speech percentage. We considered the control mode as a within-subject fixed effect, gender as a between-subject fixed effect, and intercepts for participants as a random effect in the model. For both speech energy and speech percentage, the assumptions of homoscedasticity and normality of residuals were satisfied based on residual plots and Q-Q plots, respectively. P-values were obtained by likelihood ratio tests of the full model with the effect in question against the model without that effect.
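The analysis itself was run in R with lme4; the sketch below shows an analogous formulation in Python's statsmodels, purely as a reference for the model structure (random intercept per participant, maximum-likelihood fits so that the likelihood ratio test is valid). It is not the pipeline used in this work, and the data-frame column names are illustrative.

```python
import statsmodels.formula.api as smf
from scipy import stats

def mode_effect_lrt(df, dv="energy"):
    """Likelihood-ratio test for the control-mode effect on a dependent
    variable (e.g., speech energy), with gender as a between-subject fixed
    effect and a random intercept per participant ('pid')."""
    full = smf.mixedlm(f"{dv} ~ C(mode) + C(gender)", df,
                       groups=df["pid"]).fit(reml=False)
    reduced = smf.mixedlm(f"{dv} ~ C(gender)", df,
                          groups=df["pid"]).fit(reml=False)
    lr = 2.0 * (full.llf - reduced.llf)
    dof = len(full.fe_params) - len(reduced.fe_params)  # 3 extra mode levels here
    return lr, stats.chi2.sf(lr, dof)
```

Dropping the gender term from the full model instead gives the corresponding gender effect, mirroring the likelihood-ratio procedure described above.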
We found significant main effects of control mode on speech energy (χ²(3)=130.88, p<0.01). There were no significant gender or interaction effects. Pairwise comparison across the control modes using the Bonferroni method shows that speech energy in the co-located collaboration (M=0.93, SE=0.18) is significantly lower than that of the Stationary (M=3.63, SE=0.15, p<0.001), Manual (M=2.11, SE=0.15, p<0.001) and Autonomous (M=1.97, SE=0.15, p<0.001) modes, as presented in Figure 4-8.a. While we found the speech energy of both the Autonomous and Manual modes to be significantly lower than that of Stationary (p<0.001), we did not find a significant difference between the Autonomous and Manual modes (p=1.00).

We also found significant main effects of control mode on speech percentage (χ²(3)=31.37, p<0.001). There was a significant gender effect (χ²(1)=10.14, p<0.01), but no interaction effect. We found that the speech percentage of male participants (M=27.78, SE=1.34) is significantly lower than that of female participants (M=33.90, SE=1.34, p<0.01). As presented in Figure 4-8.b, pairwise comparison using the Bonferroni method shows that the speech percentage in the co-located collaboration (M=25.09, SE=1.68) is significantly lower than that of Stationary (M=36.17, SE=1.37, p<0.001) and that of Manual (M=31.81, SE=1.37, p<0.05), but not significantly lower than that of Autonomous (M=30.30, SE=1.37, p=0.11). The speech percentages of both the Autonomous and Manual modes are also significantly lower than that of Stationary (p<0.001 and p<0.05, respectively), but no significant difference is found between Manual and Autonomous (p=1.00).

4.5.2.2 H4. Overall Task Completion Time

To compare the task completion time across the three control modes with that of the co-located collaboration, we again performed a linear mixed-effects analysis. We considered the interaction mode as a within-subject fixed effect, gender as a between-subject fixed effect, and intercepts for participants as a random effect in the model. The homoscedasticity and normality assumptions were satisfied based on residual plots and Q-Q plots, respectively. P-values were obtained by likelihood ratio tests of the full model with the effect in question against the model without that effect.

We found significant main effects of control mode on the overall task completion time (χ²(3)=167.53, p<0.001). There was no significant gender or interaction effect. Pairwise comparison using the Bonferroni method shows that the overall task completion times of all interaction modes are significantly different (p<0.001) from each other. As shown in Figure 4-8.c, task completion is fastest in the co-located collaboration (M=3:21, SE=0:10), followed by Autonomous (M=4:26, SE=0:08), Manual (M=4:53, SE=0:08), and finally Stationary (M=6:34, SE=0:08).

4.5.2.3 Occlusion Resolution Time (an Efficiency Measure)

We video coded both the Manual and Autonomous modes to capture the time it took each mode to resolve an occlusion. As occlusion occurred four times in each trial, in alternating order, we took the average of the occlusion resolution time per participant for each mode. A two-tailed paired samples t-test revealed that it took participants significantly longer (t(29)=7.07, p<0.001) to resolve occlusions manually in Manual mode (M=0:09, SE=0:02) than it took the system to resolve them autonomously in Autonomous mode (M=0:03, SE=0:00) (see Figure 4-8.d).

Figure 4-8. Comparison of objective measures across different interaction modes (* p<0.05). a) Speech energy is highest in Stationary, followed by Manual, Autonomous and co-located collaboration. b) Speech percentage is also highest in Stationary, followed by Manual, Autonomous and co-located collaboration. c) Task completion time is also highest in Stationary, followed by Manual, Autonomous and co-located collaboration. d) Occlusion resolution time is significantly lower in Autonomous mode compared to Manual mode.
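A minimal sketch of this paired comparison is shown below, here applied to per-participant mean occlusion resolution times; the array names are illustrative, and the same call pattern applies to the other paired tests reported in this chapter.

```python
import numpy as np
from scipy import stats

def paired_resolution_test(manual_s, auto_s):
    """Two-tailed paired-samples t-test on per-participant mean occlusion
    resolution times (seconds); the two arrays are aligned by participant."""
    manual_s = np.asarray(manual_s, dtype=float)
    auto_s = np.asarray(auto_s, dtype=float)
    t, p = stats.ttest_rel(manual_s, auto_s)     # two-tailed by default
    return t, p, float(np.mean(manual_s - auto_s))
```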
4.5.2.4 H5. Sense of Presence (TPI)

We conducted a one-way repeated measures ANOVA on three measures of the TPI (spatial presence, engagement, and social presence) across the three control modes, as presented in Figure 4-9.a. Only the social presence measure violated the assumption of sphericity according to Mauchly's test of sphericity (χ²(2)=8.13, p<0.01). We corrected for this assumption violation using the Greenhouse-Geisser method (ε=0.799).

We found significant main effects for spatial presence (F(2, 58)=13.97, p<0.001). Pairwise comparison using the Bonferroni method showed that the spatial presence of Stationary (M=3.10, SE=0.26) is significantly less than that of Manual (M=3.81, SE=0.24, p<0.001) and Autonomous (M=3.66, SE=0.27, p<0.01). There is no significant difference in spatial presence between Manual and Autonomous (p=0.80). While we did not find any significant main effect of engagement across the control modes (F(2, 58)=1.85, p=0.17), we found significant main effects for social presence (F(1.60, 46.32)=7.74, p<0.01). Pairwise comparison using the Bonferroni method indicates that the social presence of Stationary (M=4.49, SE=0.14) is significantly lower than that of Manual (M=5.08, SE=0.17, p<0.01). We observed a trend that Autonomous (M=4.92, SE=0.18) generally has a higher social presence than Stationary, but this trend is not significant (p=0.07). There is also no significant difference in social presence between Manual and Autonomous (p=0.42). Finally, we did not find any significant gender effects on any of the TPI measures or the control modes.

4.5.2.5 H6. Workload (NASA TLX)

We conducted a two-tailed paired samples t-test on the NASA TLX questionnaire results. As presented in Figure 4-9.b, the Manual mode (M=48.74, SE=5.85) has a significantly higher perceived workload (t(29)=3.17, p<0.01) than the Autonomous mode (M=43.06, SE=6.13).

Figure 4-9. Comparison of subjective measures (* p<0.05). a) TPI measures on the sense of presence across the three conditions illustrate that Stationary mode provides the lowest perceived sense of spatial and social presence. b) NASA TLX measures illustrate that Autonomous mode has a significantly lower perceived workload compared to Manual mode.

4.5.2.6 User Preference

Among the 30 participants, none indicated the Stationary mode as their preferred choice for remote collaboration. The Manual mode was the preferred choice for seven of the participants, while the remaining 23 chose the Autonomous mode.
4.6 Discussion

In order to investigate the impacts occlusion has on remote collaborations, we closely examined the findings from the Stationary mode of the main study. Dividing the collaboration into Occlusion and No-occlusion conditions, we found that speech energy is significantly higher when occlusion is present than when it is not, while we did not find any significant difference in speech percentage (Figure 4-7.a/b). Our results also suggest that occlusion significantly hinders task performance, as reflected in the longer completion time of the Occlusion condition (Figure 4-7.c). These results support our hypothesis, H1, that occlusions during telepresence collaborations significantly deteriorate the task performance and disrupt the vocal non-verbal behavior.

As occlusion disrupts visual contact, its impact can be interpreted as the effect of using a different medium. In [135], audio-only communication is compared with video conferencing only, video conferencing with a drawing tool, and face-to-face communication during a collaborative task; the authors found that task completion time and total word count decrease in that order. Our results on the various interaction modes' effects on speech percentage and overall task completion time in Figure 4-8.b/c show a similar trend to the media effects observed in [135]. In this regard, our Stationary mode resembles audio-only communication, while the Manual and Autonomous modes resemble video conferencing and assistive video conferencing, respectively.

Our observation of the co-located collaboration from the preliminary study demonstrated that participants always took action(s) to resolve occlusions, perhaps as a means to minimize the negative impacts of the occlusion. In our main study, all of the participants moved the telepresence camera/display in Manual mode when occlusion was present. Moreover, no participant preferred the Stationary mode over the other modes. These findings support H2, that occlusion is in fact an impactful extrinsic environmental interference that elicits a change in position/orientation during remote collaborations.

Our findings on the two speech characteristics considered in this chapter also demonstrate a trend: both speech energy and speech percentage are highest in Stationary, followed by Manual, Autonomous and co-located collaboration (Figure 4-8.a/b). In the literature, speech energy and speech percentage during social interactions are coarsely used as measures of vocal excitement [136] and activity [137], respectively. Hence, one may interpret an increase in speech energy and percentage as an improvement in vocal non-verbal behavior. However, in our vision-dependent collaborative task, the co-located collaboration, our benchmark, demonstrated the lowest speech energy and speech percentage. A similar outcome, lower speech energy, is also reported in [138] when comparing collaborations in a co-located setting with a distributed setting (a simulated conference call setting). One should also note that in telepresence technologies, we share the ultimate goal of creating a telepresence interaction as close as possible to co-located interactions. Therefore, we assert that lower, rather than higher, speech energy and speech percentage in our experiment imply an interaction that more closely resembles the co-located collaboration.

We found that the Manual and Autonomous modes demonstrate significantly lower speech energy and percentage than the Stationary mode, yet that of the Autonomous mode is not significantly lower than that of the Manual mode (Figure 4-8.a/b).
This partially supports H3, that autonomous resolution of occlusion significantly improves the vocal non-verbal behaviour of the pilot over the other two modes of control. However, it suggests that the ability to resolve occlusions, or having shorter occlusion encounters in general, yields improved vocal non-verbal behavior in the pilot (behavior that best mimics co-located interactions), irrespective of whether the occlusion is resolved manually or autonomously.

The ability to resolve occlusions during remote collaborations also has an effect on task performance. This effect is apparent from the fact that the Manual and Autonomous modes yielded significantly shorter task completion times compared to that of the Stationary mode (Figure 4-8.c). The Autonomous mode also showed significantly better task performance compared to the Manual mode, supporting H4. This result also indicates that, with the LTRA algorithm used to autonomously resolve occlusions for the pilot, we can achieve a telepresence task performance that is closest to that of our co-located counterpart.

We found only partial support for H5, that a pilot's perceived sense of presence is highest in Manual mode, followed by Autonomous and Stationary modes. This hypothesis was adopted from the telepresence literature, which claims that mobility increases the sense of presence [116] and that social presence in particular is stronger for a manual mode compared to stationary and autonomous modes of operating an MRP [62]. Both the Manual and Autonomous modes of our main study show a significantly higher sense of spatial presence, while only the Manual mode shows a significantly higher sense of social presence compared to the Stationary mode (Figure 4-9.a). Therefore, we conclude that mobility yields improvements in the sense of spatial presence but not necessarily in that of social presence.

Comparing the perceived workload between the Autonomous and Manual modes provided support for H6. The perceived workload of our collaborative task is significantly less when occlusions are handled autonomously rather than manually (Figure 4-9.b). This, together with our relative task performance results for the two modes, is aligned with the findings in [61] comparing an autonomous mode with a manual mode on a different telepresence platform. The occlusion resolution time between the two modes also indicates that our Autonomous mode is more efficient in resolving occlusions compared to the Manual mode (Figure 4-8.d). The Autonomous mode is also the preferred choice of the majority of the participants.

Finally, moving a telepresence robot in an environment during a remote collaboration inherently involves the robot's management of space, as the robot acts as the pilot's proxy. Hence, various proxemics factors are at play. In a human-robot interaction proxemics study [139], a data-driven model is developed to autonomously control the proxemic behavior of the robot. It is then discussed as future work that a dynamic, socially situated proxemic behavior is required to handle extrinsic environmental interference, such as occlusion, in complex environments. Even though occlusion is only one of the extrinsic environmental interferences on proxemic behavior, the outcome of our work, comparing different modes of control, can be generalized to telepresence controller design for other dynamic proxemic behaviors.

4.7 Conclusion

In this research we studied the impacts occlusion has on human-human collaboration both when the collaborating agents are co-located and when they are telepresent.
This advances our knowledge of human behaviors during telepresence interactions in complex environments. In addition, telepresence systems provide the pilot with the capability of resolving occlusions through robot motion. Thus, we also focused on designing a controller that mimics co-located interactions, reduces the pilot's perceived workload, and increases task performance and the sense of presence. We believe that developing a better controller for robot-mediated interactions is a key stepping stone toward seamless collaboration using telepresence systems.
We designed a vision-dependent collaborative task to conduct a co-located and remote collaboration experiment. We introduced occlusions in each trial to observe participants' vocal non-verbal (communication) behavior as well as task performance. We found that occlusions in both co-located and remote collaborations trigger participants to change their spatial relationships with respect to their partner. With co-located collaboration serving as a benchmark for the remote collaboration modes, our results demonstrate that occlusions deteriorate task performance and negatively affect the vocal non-verbal behavior of the pilot.
We then learned that providing a means to resolve occlusions in remote collaborations moves the vocal non-verbal behavior towards that of co-located collaborations. Dynamic spatial relationships were enabled through a high-DoF telepresence platform with an articulated head and base design. We empirically designed a user-friendly manual interface and applied a novel autonomous LTRA to control the dynamic spatial behaviors.
We also investigated whether the autonomous resolution of occlusions helps further mitigate the negative impacts of occlusion. Our results support the findings from the literature that autonomous, rather than manual, handling of spatial relationships in remote collaborations improves task performance and the pilot's perceived workload. However, there appears to be a trade-off between improving task performance and perceived workload on the one hand and the sense of social presence on the other. While we found that handling occlusions either manually or autonomously improves the sense of social and spatial presence, the sense of social presence is better when occlusions are handled manually than autonomously.
Chapter 5 - Conclusions
5.1 Summary
In this thesis, the focus was on the problem of high dimensional robotic vision losing track of the target of interest. Given the semi-structured or unstructured nature of the human environments in which robots are intended to work, regaining visual cues of the target as quickly as possible is of high importance to the tracking algorithm. Online algorithms were therefore developed for high dimensional robotic vision to rapidly recover the lost target while considering sensor constraints, robot kinematic constraints and visibility constraints. The algorithms addressed two common scenarios of tracking failure.
Chapter 2 introduced the online LTS algorithm to search for lost targets when they leave the camera FoV. A particle filter was applied to estimate the target location in 3D, while the proposed observation model provided the bridge between the robot's high dimensional C-space and the target's Cartesian space. The LTS algorithm was then applied to an eye-in-hand visual servoing platform while tracking and servoing to a fast-moving target.
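To make the mechanics just summarized concrete, the sketch below, a generic illustration rather than the implementation used in this thesis, propagates a particle belief over the target's 3D position, applies a negative-information update when the camera looks and does not detect the target (a simple cone-and-depth visibility test standing in for the full observation model), and evaluates the entropy of the resulting belief, the kind of quantity an entropy-based cost function would drive down. The function names, FoV half-angle, depth limits and detector reliability are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, motion_std=0.05):
    """Propagate 3D target hypotheses with a simple random-walk motion model."""
    return particles + rng.normal(0.0, motion_std, particles.shape)

def detection_prob(particles, cam_pos, cam_axis,
                   half_angle_deg=25.0, min_depth=0.3, max_depth=3.0):
    """Probability that each hypothesized target point would be detected:
    inside the camera's FoV cone and within a usable depth range."""
    rel = particles - cam_pos
    depth = rel @ cam_axis                          # distance along the optical axis
    dist = np.linalg.norm(rel, axis=1) + 1e-9
    in_cone = depth / dist > np.cos(np.radians(half_angle_deg))
    in_range = (depth > min_depth) & (depth < max_depth)
    return np.where(in_cone & in_range, 0.95, 0.0)  # nominal detector reliability

def update_not_seen(particles, weights, cam_pos, cam_axis):
    """Negative-information update: the camera looked but did not see the target."""
    w = weights * (1.0 - detection_prob(particles, cam_pos, cam_axis))
    return w / (w.sum() + 1e-12)

def belief_entropy(particles, weights, cell=0.25):
    """Entropy of the belief over a coarse 3D grid (lower means more certain)."""
    cells = np.floor(particles / cell).astype(int)
    _, inv = np.unique(cells, axis=0, return_inverse=True)
    p = np.bincount(inv, weights=weights)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# One predict/observe cycle: target last seen near (1, 0, 1) m,
# camera at height 1 m looking along +x.
particles = rng.normal([1.0, 0.0, 1.0], 0.3, size=(2000, 3))
weights = np.full(len(particles), 1.0 / len(particles))

particles = predict(particles)
weights = update_not_seen(particles, weights,
                          cam_pos=np.array([0.0, 0.0, 1.0]),
                          cam_axis=np.array([1.0, 0.0, 0.0]))
print("belief entropy after a missed observation:", belief_entropy(particles, weights))
```

In this toy cycle, even a view that fails to detect the target is informative: particles inside the viewed volume are down-weighted, concentrating the belief elsewhere.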
The results of testing various features revealed that the application of an entropy-based cost function, and planning in higher dimensions in a greedy fashion, are essential features of the online LTS algorithm.
Chapter 3 presented the improved LTS algorithm to handle visual occlusions that disrupt visual tracking in high dimensional eye-in-hand systems. The resulting LTRA algorithm allowed a robot to look behind an occluder during active visual target search and re-acquire its target in an online manner. A particle filter continuously estimated the target location when it was not visible, and an enhanced observation model updated the target belief state while taking occlusions into account. Meanwhile, a simple but efficient map of the occluder extents was built to compute potential occlusion-clearing motions. To allow efficient planning, exhaustive mapping of the 3D occluder into C-space was avoided; instead, informed samples struck a balance between target search and information gain. The mixed-initiative cost function balanced the goal of gaining more information about the target and occluder extent against the sensor action cost. The introduced approximate sampling-based optimization solvers, with efficient data-driven proposals, then allowed for planning one step at a time.
The results of Chapter 3 indicated that the proposed LTRA, applied to an articulated telepresence system, can quickly obtain clear views of the interlocutor when occlusion is persistent and significant camera motion is required. This was an indication that the disparate planning goals of the system can be effectively balanced using the multi-objective cost function. The results also showed that the proposed solvers outperform a common approach in the literature.
Finally, two human-subjects studies were designed in Chapter 4 to investigate the necessity and efficacy of the LTRA algorithm in a robotic telepresence application. Results from the first study demonstrated that occlusions introduce a significant social interference that requires collaborators to reorient or reposition themselves in space. Subsequently, the second experiment evaluated the efficacy of the developed autonomous occlusion handling for remote users. Self-reported and performance metrics of the users were compared when they were asked to manually resolve occlusions and when the autonomous controller resolved the occlusions for them. Results from this experiment indicated that the use of an autonomous controller yields a remote user experience that is more comparable (in terms of vocal non-verbal behaviors, task performance and perceived workload) to collaborations performed by two co-located parties. However, the studies found that the remote user feels more present when s/he manually handles occlusions than when occlusions are handled autonomously. Finally, the implications of a better controller design for similar robot-mediated social interactions were discussed.
5.2 Contributions
The major contributions of this research are summarized as follows:
- A fast online search algorithm, LTS, is developed that is applicable to high dimensional robotic vision search for a moving target in 3D. This algorithm is then improved to handle visual occlusions. The improved algorithm, LTRA, identifies feasible visibility-restoring configurations (a 3D property) in dynamic environments without expensive C-space mapping and with minimal calls to IK. This algorithm does not require a priori knowledge of the occluder size or shape, and the target motion can be arbitrary.
- A novel observation model is introduced to incorporate the sensor constraints into the planning process of LTS. This observation model considers the camera boundaries and depth effects while bridging between the Cartesian space and C-space. The model is then improved for LTRA to include a simple but efficient occlusion model to account for visibility constraints.
- A multi-objective cost function that avoids the kinematic constraints of a multi-DoF robotic platform was designed and evaluated for LTS. This entropy-based cost function is then improved in LTRA to ensure the target is recovered, and the occluder mapped, within a small number of predict/plan/act/observe iterations.
- Two new sampling-based solvers are developed to solve the nonlinear and nonconvex cost function of LTRA. These solvers provide efficient approximate solutions that are applicable to online high dimensional search scenarios.
- Two interlinked human-subjects studies are designed for the telepresence application. One study investigates the impacts of occlusions when introduced in a collaboration context and contributes to the current understanding of human behaviors in telepresence. The second study focuses on how different degrees of autonomy in resolving occlusions affect the collaboration experience. During these two studies, various objective and subjective measures were examined to provide a thorough insight into the investigated impacts.
The minor contributions of this research are summarized as follows:
- The developed LTS algorithm is implemented both in simulation and on a physical eye-in-hand platform, and comparisons of this online search algorithm with relevant, previously published algorithms are provided.
- In the presence of occlusions, comparisons of different solvers for the online algorithm, LTRA, are provided both in simulation and on a physical articulated telepresence platform. This algorithm provides an autonomous control mode for the telepresence application to resolve occlusions and keep the interlocutor in the FoV at all times.
- An intuitive manual mode interface is designed, and evaluated in a user study, to remotely control the articulated telepresence platform. This manual mode interface is used in the human-subjects studies as a baseline to evaluate the efficacy of the novel autonomous mode controller.
5.3 Future Work
This research developed a novel online planner for high dimensional eye-in-hand systems to search for lost targets and presented one of its applications in telepresence robotics. Following this work, a number of potential future research avenues are suggested as follows:
- Apart from occlusions, other environmental constraints such as obstacles and self-collision introduce additional challenges when planning over the entire set of the robot's DoFs. These challenges, known as self-collision and whole-arm collision avoidance in high-DoF robotic path planning, are ongoing research areas. Limiting the LTS solution space in higher dimensions by considering such environmental constraints is a natural next step, specifically in cluttered environments.
- To speed up the LTRA developed in this work, one proposal is to adopt the efficient mapping technique presented in [140], [141].
Their planar convex polygon extraction technique based on depth images appears to be suitable to replace the combination of occupancy grid and convex polygon models used to map the occluder in this work. In fact, their technique models the environment as a set of plane polygons and stores the corner points and normal vectors of each polygon without constructing an occupancy grid map.   Another area of improvement is to implement adaptive balancing between exploitative and explorative samples of the LTRA-IS sampling strategy in Chapter 3. This is based on the balancing approach presented in [104] to direct exploitative samples towards feasible critical points and explore the C-space when not enough feasible solutions for critical points are available.    In the application of this research on telepresence, this research only considered occlusion as one of the extrinsic environmental interferences eliciting a change in spatial relationships during remote collaborations. The belief is that one can adopt the outcomes of different modes of control from this research to better handle dynamic spatial behaviors in telepresence communications.   In the telepresence studies, the underlying task involved in the remote collaboration can affect the collaboration communication outcomes and the efficacy of one control mode over another. Conducting similar studies while considering other communication tasks and interaction fluency measures as well as considering interlocutor’s perspective are interesting future research avenues.   Finally, a follow up experiment of this research is to apply the developed algorithms on mobile robots with a mounted pan-tilt camera. A commercialized MRP is an example of such platforms to search for an interlocutor, follow that person in a crowded environment with instances of occlusions, and try to keep a smooth conversation between the pilot and the interlocutor.  99  Bibliography [1] F. Chaumette and S. Hutchinson, “Visual Servo Control. I. Basic Approaches,” IEEE Robotics and Automation Magazine, vol. 13, no. 4, pp. 82–90, 2006. [2] F. Chaumette and S. Hutchinson, “Visual Servo Control. II. Advanced Approaches,” IEEE Robotics & Automation Magazine, vol. 14, no. 1, pp. 109–118, 2007. [3] D. Kragic and M. Vincze, “Vision for Robotics,” Found. Trends Robot., vol. 1, no. 1, pp. 1–78, 2009. [4] A. Krause, “Optimizing Sensing Theory and Applications,” PhD Thesis, Carnegie Mellon University, 2008. [5] A. Bakhtari, M. D. Naish, M. Eskandari, E. A. Croft, and B. Benhabib, “Active-Vision-Based Multisensor Surveillance - An Implementation,” IEEE Trans. Syst. Man. Cybern., vol. 36, no. 5, pp. 668–679, 2006. [6] C. Y. Tsai and K. T. Song, “Dynamic Visual Tracking Control of a Mobile Robot with Image Noise and Occlusion Robustness,” Image Vis. Comput., vol. 27, no. 8, pp. 1007–1022, 2009. [7] B. O. Koopman, “Search and Its Optimization,” Am. Math. Mon., vol. 86, no. 7, pp. 527–540, 1979. [8] L. D. Stone, Theory of Optimal Search. Academic Press, 1975. [9] C. Y. Tsai, K. T. Song, X. Dutoit, H. Van Brussel, and M. Nuttin, “Robust Visual Tracking Control System of a Mobile Robot Based on a Dual-Jacobian Visual Interaction Model,” Rob. Auton. Syst., vol. 57, no. 6–7, pp. 652–664, Jun. 2009. [10] R. Bajcsy, “Active Perception,” Proc. IEEE, vol. 76, no. 8, pp. 966–1005, 1988. [11] S. Chen, Y. Li, and N. M. Kwok, “Active Vision in Robotic Systems: A Survey of Recent Developments,” Int. J. Rob. Res., vol. 30, no. 11, pp. 1343–1377, 2011. [12] B. J. Nelson and P. K. 
Khosla, “Strategies for Increasing the Tracking Region of an Eye-in-Hand System by Singularity and Joint Limit Avoidance,” Int. J. Rob. Res., vol. 14, no. 3, pp. 255–269, Jun. 1995. [13] G. Chesi, K. Hashimoto, D. Prattichizzo, and A. Vicino, “A Switching Control Law for Keeping 100  Features in the Field of View in Eye-in-Hand Visual Servoing,” in IEEE International Conference on Robotics and Automation, 2003, pp. 3929–3934. [14] R. Murrieta-Cid, B. Tovar, and S. Hutchinson, “A Sampling-based Motion Planning Approach to Maintain Visibility of Unpredictable Targets,” Auton. Robots, vol. 19, no. 3, pp. 285–300, Dec. 2005. [15] D. Panagou and V. Kumar, “Cooperative Visibility Maintenance for Leader-Follower Formations in Obstacle Environments,” IEEE Trans. Robot., vol. 30, no. 4, pp. 831–844, 2014. [16] G. Chesi, K. Hashimoto, D. Prattichizzo, and A. Vicino, “Keeping Features in the Field of View in Eye-in-Hand Visual Servoing: A Switching Approach,” IEEE Trans. Robot., vol. 20, no. 5, pp. 908–913, 2004. [17] N. R. Gans, G. Hu, K. Nagarajan, and W. E. Dixon, “Keeping Multiple Moving Targets in the Field of View of a Mobile Camera,” IEEE Trans. Robot., vol. 27, no. 4, pp. 822–828, 2011. [18] C. Y. Lee, H. Gonzalez-Banos, and J. C. Latombe, “Real-time Tracking of an Unpredictable Target Amidst Unknown Obstacles,” in 7th International Conference on Control, Automation, Robotics and Vision, 2002, vol. 2, pp. 596–601. [19] D. Kragi, “Visual Servoing for Manipulation: Robustness and Integration Issues,” PhD Thesis, Numerical Analysis and Computer Science, Royal Institute of Technology, 2001. [20] T. H. Chung, G. A. Hollinger, and V. Isler, “Search and Pursuit-Evasion in Mobile Robotics,” Auton. Robots, vol. 31, no. 4, pp. 299–316, Jul. 2011. [21] L. F. Bertuccelli and J. P. How, “Search for Dynamic Targets with Uncertain Probability Maps,” in American Control Conference, 2006, pp. 737–742. [22] T. H. Chung and J. W. Burdick, “Analysis of Search Decision Making Using Probabilistic Search Strategies,” IEEE Trans. Robot., vol. 28, no. 1, pp. 132–144, 2012. [23] B. P. Gerkey, G. Gordon, and S. Thrun, “Visibility-based Pursuit-evasion with Limited Field of View,” Int. J. Rob. Res., vol. 25, no. 4, pp. 299–315, Apr. 2006. [24] G. Hollinger, A. Kehagias, and S. Singh, “GSST: Anytime Guaranteed Search,” Auton. Robots, vol. 29, no. 1, pp. 99–118, Apr. 2010. [25] F. Bourgault, A. Göktogan, T. Furukawa, and H. F. Durrant-Whyte, “Coordinated Search for a Lost Target in a Bayesian World,” Adv. Robot., vol. 18, no. 10, pp. 979–1000, 2004. [26] N. Bergman, “Recursive Bayesian Estimation: Navigation and Tracking Applications,” PhD Dissertation, Linköping University, 1999. [27] B. Lavis, T. Furukawa, and H. F. Durrant Whyte, “Dynamic Space Reconfiguration for Bayesian Search and Tracking with Moving Targets,” Auton. Robots, vol. 24, no. 4, pp. 387–399, Jan. 2008. 101  [28] C. F. Chung and T. Furukawa, “Coordinated Pursuer Control using Particle Filters for Autonomous Search-and-Capture,” Rob. Auton. Syst., vol. 57, no. 6–7, pp. 700–711, Jun. 2009. [29] A. Ryan and J. K. Hedrick, “Particle Filter Based Information-Theoretic Active Sensing,” Rob. Auton. Syst., vol. 58, no. 5, pp. 574–584, May 2010. [30] G. A. Hollinger and G. S. Sukhatme, “Sampling-based Robotic Information Gathering Algorithms,” Int. J. Rob. Res., vol. 33, no. 9, pp. 1271–1287, Jun. 2014. [31] L. Lin, M. A. Goodrich, and S. Member, “Hierarchical Heuristic Search Using a Gaussian Mixture Model for UAV Coverage Planning,” IEEE Trans. Cybern., vol. 
44, no. 12, pp. 2532–2544, 2014. [32] K. Shubina and J. K. Tsotsos, “Visual Search for an Object in a 3D Environment using a Mobile Robot,” Comput. Vis. Image Underst., vol. 114, no. 5, pp. 535–547, May 2010. [33] J. Ma, T. H. Chung, and J. Burdick, “A Probabilistic Framework for Object Search with 6-DOF Pose Estimation,” Int. J. Rob. Res., vol. 30, no. 10, pp. 1209–1228, Jun. 2011. [34] A. Aydemir, A. Pronobis, G. Moritz, and P. Jensfelt, “Active Visual Object Search in Unknown Environments Using Uncertain Semantics,” IEEE Trans. Robot., vol. 29, no. 4, pp. 986–1002, 2013. [35] J. Espinoza, A. Sarmiento, R. Murrieta-Cid, and S. Hutchinson, “Motion Planning Strategy for Finding an Object with a Mobile Manipulator in Three-Dimensional Environments,” Adv. Robot., vol. 25, no. 13–14, pp. 1627–1650, Jan. 2011. [36] Y. Zhang, J. Shen, and N. Gans, “Real-Time Optimization for Eye-in-Hand Visual Search,” IEEE Trans. Robot., vol. 30, no. 2, pp. 325–339, Apr. 2014. [37] S. Webb and T. Furukawa, “Belief-Driven Manipulator Visual Servoing for Less Controlled Environments,” Adv. Robot., vol. 22, no. 5, pp. 547–572, 2008. [38] A. Kolling, A. Kleiner, K. Sycara, and M. Lewis, “Pursuit-Evasion in 2.5D Based on Team-Visibility,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010, pp. 4610–4616. [39] A. Andreopoulos and J. K. Tsotsos, “A Theory of Active Object Localization,” in 12th IEEE International Conference on Computer Vision, 2009, pp. 903–910. [40] Y. Ye, J. K. Tsotsos, I. B. M. T. J. Watson, Y. Heights, and N. York, “A Complexity-Level Analysis of the Sensor Planning Task for Object Search,” Comput. Intell., vol. 17, no. 4, pp. 605–620, 2001. [41] D. Kragic, M. Björkman, H. I. Christensen, and J. O. Eklundh, “Vision for Robotic Object Manipulation in Domestic Settings,” Rob. Auton. Syst., vol. 52, no. 1, pp. 85–100, 2005. [42] M. A. Greminger and B. J. Nelson, “A Deformable Object Tracking Algorithm Based on the Boundary Element Method that is Robust to Occlusions and Spurious Edges,” Int. J. Comput. Vis., 102  vol. 78, no. 1, pp. 29–45, 2008. [43] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool, “Robust Tracking-by-Detection using a Detector Confidence Particle Filter,” in IEEE International Conference on Computer Vision, 2009, pp. 1515–1522. [44] F. Xiong, O. I. Camps, and M. Sznaier, “Dynamic Context for Tracking Behind Occlusions,” in Computer Vision - ECCV 2012, Lecture Notes in Computer Science, vol. 7576, 2012, pp. 580–593. [45] J. Maver and R. Bajcsy, “Occlusions as a Guide for Planning the Next View,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 5, pp. 417–433, 1993. [46] K. Tarabanis, R. Y. Tsai, and A. Kaul, “Computing Occlusion-Free Viewpoints,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 3, pp. 279–292, 1996. [47] S. Wang, J. Bao, and Y. Fu, “Real-time Motion Planning for Robot Manipulators in Unknown Environments Using Infrared Sensors,” Robotica, vol. 25, no. 2, pp. 201–211, Mar. 2007. [48] Y. Yu and K. Gupta, “C-space Entropy: A Measure for View Planning and Exploration for General Robot-Sensor Systems in Unknown Environments,” Int. J. Rob. Res., vol. 23, no. 12, pp. 1197–1223, 2004. [49] L. Torabi and K. Gupta, “An Autonomous Six-DOF Eye-in-Hand System for in situ 3D Object Modeling,” Int. J. Rob. Res., vol. 31, no. 1, pp. 82–100, 2012. [50] A. Kristoffersson, S. Coradeschi, and A. Loutfi, “A Review of Mobile Robotic Telepresence,” Adv. Human-Computer Interact., vol. 2013, pp. 1–17, Jan. 2013. [51] K. M. 
Tsui, M. Desai, and H. A. Yanco, “Exploring Use Cases for Telepresence Robots,” in Proceedings of the 6th international conference on Human-robot interaction - HRI ’11, 2011, pp. 11–18. [52] G. Giannoulis, M. Coliou, G. S. Kamarinos, M. Roussou, P. Trahanias, A. Argyros, D. Tsakiris, A. Cremers, D. Schulz, W. Burgard, D. Haehnel, V. Savvaides, P. Friess, D. Konstantios, and A. Katselaki, “Enhancing Museum Visitor Access Through Robotic Avatars Connected to the Web,” in Museums and the Web, 2001. [53] InTouch Health, “InTouch Health Receives FDA Clearance for the RP-VITA Remote Presence Robot,” Press Release, 2013. [Online]. Available: http://www.intouchhealth.com/in-the-news/press-releases/01-08-2013/. [Accessed: 07-Oct-2015]. [54] M. K. Lee and L. Takayama, “‘Now, I Have a Body’: Uses and Social Norms for Mobile Remote Presence in the Workplace,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2011, pp. 33–42. [55] R. C. Schmidt and M. J. Richardson, “Dynamics of Interpersonal Coordination,” in Understanding Complex Systems, Springer Berlin Heidelberg, 2008, pp. 281–308. 103  [56] C. Breazeal, J. Gray, and M. Berlin, “An Embodied Cognition Approach to Mindreading Skills for Socially Intelligent Robots,” Int. J. Rob. Res., vol. 28, no. 5, pp. 656–680, 2009. [57] R. Mead, “Space, Speech, and Gesture in Human-Robot Interaction,” in Proceedings of the 14th ACM International Conference on Multimodal interaction - ICMI ’12, 2012, p. 333. [58] B. van Dijk, J. Zwiers, R. op den Akker, O. Kulyk, H. Hondorp, D. Hofs, and A. Nijholt, “Conveying Directional Gaze Cues to Support Remote Participation in Hybrid Meetings,” Towar. Auton. Adapt. Context. Multimodal Interfaces, pp. 412–428, Jan. 2011. [59] Y. Bondareva and D. G. Bouwhuis, “Determinants of Social Presence in Videoconferencing,” in Proceedings of the Workshop on Environments for Personalized Information Access, 2004, pp. 1–9. [60] A. Cosgun, D. A. Florencio, and H. I. Christensen, “Autonomous Person Following for Telepresence Robots,” in IEEE International Conference on Robotics and Automation, 2013, pp. 4335–4342. [61] W. C. Pang, G. Seet, and X. Yao, “A Study on High-Level Autonomous Navigational Behaviors for Telepresence Applications,” Presence Teleoperators Virtual Environ., vol. 23, no. 2, pp. 155–171, Aug. 2014. [62] H. Nakanishi, Y. Murakami, D. Nogami, and H. Ishiguro, “Minimum Movement Matters: Impact of Robot-Mounted Cameras on Social Telepresence.,” in Computer-Supported Cooperative Work and Social Computing, 2008, pp. 303–312. [63] A. Doucet, N. de Freitas, and N. Gordon, Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001. [64] D. J. Simon, Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. Wiley-Interscience, 2006. [65] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking,” IEEE Trans. Signal Process., vol. 50, no. 2, pp. 174–188, 2002. [66] K. Zhou, S. Member, and S. I. Roumeliotis, “Optimal Motion Strategies for Range-Only Constrained Multisensor Target Tracking,” IEEE Trans. Robot., vol. 24, no. 5, pp. 1168–1185, 2008. [67] J. Vander Hook, P. Tokekar, and V. Isler, “Cautious Greedy Strategy for Bearing-only Active Localization: Analysis and Field Experiments,” J. F. Robot., vol. 31, no. 2, pp. 296–318, Mar. 2014. [68] R. Martinez-Cantin, N. de Freitas, E. Brochu, J. Castellanos, and A. 
Doucet, “A Bayesian Exploration-Exploitation Approach for Optimal Online Sensing and Planning with a Visually Guided Mobile Robot,” Auton. Robots, vol. 27, no. 2, pp. 93–103, Aug. 2009. 104  [69] G. Hollinger, S. Singh, J. Djugash, and A. Kehagias, “Efficient Multi-robot Search for a Moving Target,” Int. J. Rob. Res., vol. 28, no. 2, pp. 201–219, Feb. 2009. [70] J. Tisdale, Z. Kim, and J. K. Hedrick, “Autonomous UAV Path Planning and Estimation,” IEEE Robotics and Automation Magazine, no. June, pp. 35–42, 2009. [71] J. P. Ramirez, E. A. Doucette, J. W. Curtis, and N. Gans, “Moving Target Acquisition Through State Uncertainty Minimization,” in American Control Conference, 2014, pp. 3425–3430. [72] C. Geyer, “Active Target Search from UAVs in Urban Environments,” in IEEE International Conference on Robotics and Automation, 2008, pp. 2366–2371. [73] S. Radmard and E. A. Croft, “Approximate Recursive Bayesian Filtering Methods for Robot Visual Search,” in IEEE International Conference on Robotics and Biomimetics, 2011, pp. 2067–2072. [74] M. Kobilarov, J. E. Marsden, and G. S. Sukhatme, “Global Estimation in Constrained Environments,” Int. J. Rob. Res., vol. 31, no. 1, pp. 24–41, Oct. 2011. [75] P. Niedfeldt, R. Beard, B. Morse, and S. Pledgie, “Integrated Sensor Guidance using Probability of Object Identification,” in American Control Conference, 2010, pp. 788–793. [76] H. Lau, S. Huang, and G. Dissanayake, “Probabilistic Search for a Moving Target in an Indoor Environment,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006, pp. 3393–3398. [77] G. Cen, N. Matsuhira, J. Hirokawa, H. Ogawa, and I. Hagiwara, “New Entropy-Based Adaptive Particle Filter for Mobile Robot Localization,” Adv. Robot., vol. 23, no. 12–13, pp. 1761–1778, Jan. 2009. [78] J. Mattingley, Y. Wang, and S. Boyd, “Code Generation for Receding Horizon Control,” in IEEE International Symposium on Computer-Aided Control System Design, 2010, pp. 985–992. [79] E. Sommerlade, “Active Visual Scene Exploration,” PhD Thesis, Department of Engineering Science, University of Oxford, 2010. [80] X. Tian, Y. Bar-Shalom, and K. R. Pattipati, “Multi-step Look-Ahead Policy for Autonomous Cooperative Surveillance by UAVs in Hostile Environments,” in 47th IEEE Conference on Decision and Control, 2008, pp. 2438–2443. [81] H. Lau, S. Huang, and G. Dissanayake, “Optimal Search for Multiple Targets in a Built Environment,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005, pp. 3740–3745. [82] J. Cabanillas, E. F. Morales, and L. E. Sucar, “An Efficient Strategy for Fast Object Search Considering the Robot’s Perceptual Limitations,” Adv. Artif. Intell. - IBERAMIA 2010, Lect. Notes Comput. Sci., vol. 6433, pp. 552–561, 2010. 105  [83] J. Binney and G. S. Sukhatme, “Branch and Bound for Informative Path Planning,” in IEEE International Conference on Robotics and Automation, 2012, pp. 2147–2154. [84] E.-M. Wong, F. Bourgault, and T. Furukawa, “Multi-vehicle Bayesian Search for Multiple Lost Targets,” in IEEE International Conference on Robotics and Automation, 2005, pp. 3169–3174. [85] Y. Ye and J. K. Tsotsos, “Sensor Planning for 3D Object Search,” Comput. Vis. Image Underst., vol. 73, no. 2, pp. 145–168, Feb. 1999. [86] S. Radmard, D. Meger, E. A. Croft, and J. J. Little, “Overcoming Unknown Occlusions in Eye-in-Hand Visual Search,” in IEEE International Conference on Robotics and Automation, 2013, pp. 3075–3082. [87] S. Radmard and E. A. 
Croft, “Overcoming Occlusions in Semi-Autonomous Telepresence Systems,” in 16th International Conference on Advanced Robotics, 2013, pp. 1–6. [88] L. Wixson, “Viewpoint Selection for Visual Search,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1994, pp. 800–805. [89] V. A. Sujan and S. Dubowsky, “Efficient Information-based Visual Robotic Mapping in Unstructured Environments,” Int. J. Rob. Res., vol. 24, no. 4, pp. 275–293, 2005. [90] M. Krainin, B. Curless, and D. Fox, “Autonomous Generation of Complete 3D Object Models Using Next Best View Manipulation Planning,” in IEEE International Conference on Robotics and Automation, 2011, pp. 5031–5037. [91] K. Jonnalagadda, R. Lumia, G. Starr, and J. Wood, “Viewpoint Selection for Object Reconstruction Using only Local Geometric Features,” in IEEE International Conference on Robotics and Automation, 2003, pp. 2116–2122. [92] Y. F. Li and Z. G. Liu, “Information Entropy-Based Viewpoint Planning for 3-D Object Reconstruction,” IEEE Trans. Robot., vol. 21, no. 3, pp. 324–337, 2005. [93] S. M. Lavalle, “Sampling-based Motion Planning,” in Planning Algorithms, Cambridge University Press, 2006, pp. 185–248. [94] M. Elbanhawi and M. Simic, “Sampling-based Robot Motion Planning: A Review,” IEEE Access, vol. 2, pp. 56–77, 2014. [95] S. Karaman and E. Frazzoli, “Sampling-based Algorithms for Optimal Motion Planning,” Int. J. Rob. Res., vol. 30, no. 7, pp. 846–894, 2011. [96] J. K. Tsotsos, “On the Relative Complexity of Active vs. Passive Visual Search,” Int. J. Comput. Vis., vol. 7, no. 2, pp. 127–141, 1992. [97] X. Chen and S. Lee, “Visual Search of an Object in Cluttered Environments for Robotic Errand Service,” in IEEE International Conference on Systems, Man, and Cybernetics, 2013, pp. 4060–4065. 106  [98] Y. Zhang, M. Rotea, and N. Gans, “Sensors Searching for Interesting Things: Extremum Seeking Control on Entropy Maps,” in 50th IEEE Conference on Decision and Control, 2011, pp. 4985–4991. [99] J. I. Vasquez-Gomez, E. Lopez-Damian, and L. E. Sucar, “View Planning for 3D Object Reconstruction,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009, pp. 4015–4020. [100] L. Freda, G. Oriolo, and F. Vecchioli, “Sensor-Based Exploration for General Robotic Systems,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008, pp. 2157–2164. [101] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge University Press, 2004. [102] S. Byrne, W. Naeem, and S. Ferguson, “An Intelligent Configuration-Sampling Based Local Motion Planner for Robotic Manipulators,” in 9th International Workshop on Robot Motion and Control - Workshop Proceedings, 2013, pp. 147–153. [103] S. Radmard, D. Meger, E. A. Croft, and J. J. Little, “Overcoming Occlusions in Eye-in-Hand Visual Search,” in American Control Conference, 2012, pp. 4102–4107. [104] M. Rickert, A. Sieverling, and O. Brock, “Balancing Exploration and Exploitation in Sampling-Based Motion Planning,” IEEE Trans. Robot., vol. 30, no. 6, pp. 1305–1317, 2014. [105] M. W. Spong, S. Hutchinson, and M. Vidyasagar, “Forward and Inverse Kinematics,” in Robot Modeling and Control, Wiley, 2005, pp. 496–523. [106] E. T. Hall, The Hidden Dimension: Man’s Use of Space in Public and Private. London : Bodley Head, 1966. [107] A. Kendon, “Spatial Organization in Social Encounters: the F-formation System,” in Conducting Interaction: Patterns of Behavior in Focused Encounters, New York: Cambridge University Press, 1990. [108] H. 
Huettenrauch, K. S. Eklundh, A. Green, and E. a. Topp, “Investigating Spatial Relationships in Human-Robot Interaction,” in IEEE International Conference on Intelligent Robots and Systems, 2006, pp. 5052–5059. [109] H. Kuzuoka, Y. Suzuki, J. Yamashita, and K. Yamazaki, “Reconfiguring Spatial Formation Arrangement by Robot Body Orientation,” in 5th ACM/IEEE International Conference on Human-Robot Interaction, 2010, pp. 285–292. [110] R. Mead, A. Atrash, and M. J. Matarić, “Automated Proxemic Feature Extraction and Behavior Recognition: Applications in Human-Robot Interaction,” Int. J. Soc. Robot., vol. 5, no. 3, pp. 367–378, May 2013. [111] I. Rae, L. Takayama, and B. Mutlu, “The Influence of Height in Robot-Mediated Communication,” in Proceedings of the 8th ACM/IEEE International Conference on Human-robot 107  Interaction - HRI ’13, 2013, pp. 1–8. [112] R. Yan, K. P. Tee, Y. Chua, and Z. Huang, “A User Study for an Attention-Directed Robot for Telepresence,” in 11th International Conference on Smart Homes and Health Telematics, 2013, pp. 110–117. [113] D. Sirkin, G. Venolia, J. Tang, G. Robertson, T. Kim, K. Inkpen, M. Sedlins, B. Lee, and M. Sinclair, “Motion and Attention in a Kinetic Videoconferencing Proxy,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 6946, pp. 162–180, 2011. [114] S. O. Adalgeirsson, C. Breazeal, and S. Ö. Aðalgeirsson, “MeBot A Robotic Platform for Socially Embodied Telepresence,” in Proceedings of 5th ACM/IEEE International Conference on Human-Robot Interaction, 2010, no. 2007, pp. 15–22. [115] J. J. Cabibihan, W. C. So, S. Saj, and Z. Zhang, “Telerobotic Pointing Gestures Shape Human Spatial Cognition,” Int. J. Soc. Robot., vol. 4, no. 3, pp. 263–272, Apr. 2012. [116] I. Rae, B. Mutlu, and L. Takayama, “Bodies in motion: Mobility, Presence, and Task Awareness in Telepresence,” in Proceedings of the 32nd annual ACM conference on Human factors in computing systems - CHI ’14, 2014, pp. 2153–2162. [117] A. Kristoffersson, K. Severinson Eklundh, and A. Loutfi, “Measuring the Quality of Interaction in Mobile Robotic Telepresence: A Pilot’s Perspective,” Int. J. Soc. Robot., vol. 5, no. 1, pp. 89–101, Aug. 2012. [118] A. Kristoffersson, S. Coradeschi, K. S. Eklundh, and A. Loutfi, “Towards Measuring Quality of Interaction in Mobile Robotic Telepresence using Sociometric Badges,” Paladyn J. Behav. Robot., vol. 4, no. 1, pp. 34–48, 2013. [119] S. Radmard, A. Moon, and E. A. Croft, “Interface Design and Usability Analysis for a Robotic Telepresence Platform,” in IEEE RO-MAN: The 24th IEEE International Symposium on Robot and Human Interactive Communication, 2015. [120] B. Cohen, J. Lanir, R. Stone, and P. Gurevich, “Requirements and Design Considerations for a Fully Immersive Robotic Telepresence System,” in The 6th ACM/IEEE Conference on Human-Robot Interaction, Social Robotic Telepresence Workshop, 2011, pp. 16–22. [121] S. Arts, M. Sam, F. Hernoux, R. Bearee, L. Gajny, O. Gibaru, E. Nyiri, L. Motion, F. Hernoux, R. Béarée, L. Gajny, E. Nyiri, J. Bancalin, and O. Gibaru, “Leap Motion pour la capture de mouvement 3D par spline L 1 . Application à la robotique,” Sci. Arts Métiers, 2013. [122] D. Bassily, C. Georgoulas, J. Güttler, T. Linner, T. Bock, and T. U. München, “Intuitive and Adaptive Robotic Arm Manipulation using the Leap Motion Controller,” in International Symposium on Robotics, 2014, pp. 78–84. [123] C. Guo and E. 
Sharlin, “Exploring the Use of Tangible User Interfaces for Human-robot Interaction: A Comparative Study,” in Proceedings of the SIGCHI Conference on Human Factors 108  in Computing Systems - CHI ’08, 2008, pp. 121–130. [124] S. G. Hart and L. E. Staveland, “Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research,” in Human Mental Workload, 1988, pp. 139–183. [125] J. Brooke, “SUS - A Quick and Dirty Usability Scale,” in Usability evaluation in industry, P. W. Jordan, B. Thomas, B. A. Weerdmeester, and A. L. McClelland, Eds. London: Taylor and Francis, 1996, pp. 189–194. [126] G. Du, P. Zhang, J. Mai, and Z. Li, “Markerless Kinect-Based Hand Tracking for Robot Teleoperation,” Int. J. Adv. Robot. Syst., vol. 9, no. 36, pp. 1–10, 2012. [127] M. Obaid, F. Kistler, M. Häring, R. Bühling, and E. André, “A Framework for User-Defined Body Gestures to Control a Humanoid Robot,” Int. J. Soc. Robot., vol. 6, no. 3, pp. 383–396, Apr. 2014. [128] A. Vinciarelli, M. Pantic, and H. Bourlard, “Social Signal Processing: Survey of an Emerging Domain,” Image Vis. Comput., vol. 27, no. 12, pp. 1743–1759, Nov. 2009. [129] M. Lombard, T. B. Ditton, and L. Weinstein, “Measuring Presence : The Temple Presence Inventory,” in 12th Annual International Workshop on Presence, 2009. [130] A. M. Walker, D. P. Miller, and C. Ling, “Spatial Orientation Aware Smartphones for Tele-operated Robot Control in Military Environments: A Usability Experiment,” Proc. Hum. Factors Ergon. Soc. Annu. Meet., vol. 57, no. 1, pp. 2027–2031, Sep. 2013. [131] J. A. Adams and H. Kaymaz-Keskinpala, “Analysis of Perceived Workload when using a PDA for Mobile Robot Teleoperation,” in IEEE International Conference on Robotics and Automation, 2004, pp. 4128–4133. [132] A. Kiselev and A. Loutfi, “Using a Mental Workload Index as a Measure of Usability of a User Interface for Social Robotic Telepresence,” in Ro-Man Workshop on Social Robotic Telepresence, 2012, pp. 3–6. [133] R Core Team, “R: A Language and Environment for Statistical Computing.” Vienna, Austria, 2015. [134] W. S. Bates D, Maechler M, Bolker B, “lme4: Linear Mixed-Effects Models Using Eigen and S4.,” R package version 1.1-7, 2014. [Online]. Available: http://cran.r-project.org/package=lme4. [Accessed: 07-Oct-2015]. [135] A. D. I. Kramer, L. M. Oh, and S. R. Fussell, “Using Linguistic Features to Measure Presence in Computer-Mediated Communication,” in Proceedings of the SIGCHI conference on Human Factors in computing systems - CHI ’06, 2006, pp. 913–916. [136] H. Hung and D. Gatica-Perez, “Estimating Cohesion in Small Groups Using Audio-Visual Nonverbal Behavior,” IEEE Trans. Multimed., vol. 12, no. 6, pp. 563–575, 2010. [137] J. R. Curhan and A. Pentland, “Thin Slices of Negotiation: Predicting Outcomes from 109  Conversational Dynamics within the First 5 Minutes.,” J. Appl. Psychol., vol. 92, no. 3, pp. 802–811, 2007. [138] T. Kim, A. Chang, L. Holland, and A. S. Pentland, “Meeting Mediator: Enhancing Group Collaboration using Sociometric Feedback,” in Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, 2008, pp. 457–466. [139] R. Mead and M. J. Matari, “Probabilistic Models of Proxemics for Spatially Situated Communication in HRI,” in The 9th ACM/IEEE International Conference on Human-Robot Interaction, Algorithmic Human-Robot Interaction Workshop, 2014, pp. 3–9. [140] J. Biswas and M. 
Veloso, “Planar Polygon Extraction and Merging from Depth Images,” in IEEE International Conference on Intelligent Robots and Systems, 2012, pp. 3859–3864. [141] J. Biswas and M. Veloso, “Depth Camera Based Indoor Mobile Robot Localization and Navigation,” in IEEE International Conference on Robotics and Automation, 2012, pp. 1697–1702. [142] “ARTag,” Wikipedia, the Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/ARTag. [Accessed: 07-Oct-2015].            110  Appendix  This appendix presents the details of the study advertisement used to recruit participants, consent forms, and questionnaires used in the human-subjects studies for this thesis. All of these documents, presented in figures, were approved by the Behavioral Research Ethics Board (BREB) at the University of British Columbia (UBC), under ethics application #H14-01937 “Telepresence and Occlusion Study”. A.1 Study advertisement  Figure A-1 presents the study advertisement for both preliminary and main human-subject studies in Chapter 4 that was posted around campus of the University of British Columbia.               111   Figure A-1. The call for volunteers that was posted around to recruit participants for human-subject studies. 112  A.2 Consent forms  The consent forms presented to the study participants for both preliminary and main human-subject studies in Chapter 4 are shown in Figure A-2 and Figure A-3 consecutively.  113   Figure A-2. Consent form used in the preliminary human-subject study. 114   Figure A-3. Consent form used in the main human-subject study.  115  A.3 Questionnaires  During the preliminary human-subject study in Chapter 4, the participants were presented with a series of questions ranging from demographics to questions from NASA TLX and SUS usability questionnaires followed by preference and comments. These questions are presented in Figure A-4 to A-8.  During the main human-subject study in Chapter 4, the participants completed a series of questions ranging from demographics to questions from spatial presence, engagement and social presence aspects of TPI questionnaire and NASA TLX followed by preference and comments. These questions are presented in Figure A-9 to A-16. 116   Figure A-4. Contents of the questionnaire for the preliminary study (page 1 of 5). 117   Figure A-5. Contents of the questionnaire for the preliminary study (page 2 of 5). 118   Figure A-6. Contents of the questionnaire for the preliminary study (page 3 of 5). 119   Figure A-7. Contents of the questionnaire for the preliminary study (page 4 of 5). 120   Figure A-8. Contents of the questionnaire for the preliminary study (page 5 of 5). 121   Figure A-9. Contents of the questionnaire for the main study (page 1 of 8). 122   Figure A-10. Contents of the questionnaire for the main study (page 2 of 8). 123   Figure A-11. Contents of the questionnaire for the main study (page 3 of 8). 124   Figure A-12. Contents of the questionnaire for the main study (page 4 of 8). 125   Figure A-13. Contents of the questionnaire for the main study (page 5 of 8). 126   Figure A-14. Contents of the questionnaire for the main study (page 6 of 8). 127   Figure A-15. Contents of the questionnaire for the main study (page 7 of 8). 128   Figure A-16. Contents of the questionnaire for the main study (page 8 of 8). 
