@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Applied Science, Faculty of"@en, "Electrical and Computer Engineering, Department of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Miehling, Erik J."@en ; dcterms:issued "2011-09-16T16:53:28Z"@en, "2011"@en ; vivo:relatedDegree "Master of Applied Science - MASc"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description "This thesis formulates a stochastic scheduler for use in adaptive resource management of a single Ground Moving Target Indicator (GMTI) radar when faced with tracking multiple, weakly maneuvering targets. The general problem involves first determining a priority allocation of the L targets, then determining the optimal time to spend using the specified allocation. We present a framework for computing the threat estimate and associated priority of each target in a surveillance environment, termed the radar macro-manager. The threat level of each target is computed based on its heading and proximity relative to a set of user-specified static assets. We present a weight assignment algorithm based on the geography of the surveillance region and use eigenvector centrality to assign vulnerability weights to each asset. The error in the threat level is computed based on the error-covariance matrix of each target, provided by a Kalman filter. Both the threat level and threat error are used to compute the respective priority rank distributions. From the priority distributions of each target we specify a queue of tasks to maximize a reward function associated with processing the queue. The queue is determined with the aid of structural results from the field of optimal issuing which involves ordering the priority rank distributions with respect to the monotone likelihood ratio. In particular, we compute an optimal queue which specifies the order in which we observe individual targets. The length of each target observation is controlled by an external stochastic process, termed the radar micro-manager. The problem of determining this optimal stopping time is formulated as a sequential decision process, a type of Markov decision process. We provide conditions such that the optimal policy is characterized by a monotone threshold policy on the partially ordered set of positive definite error covariance matrices of each target. Given that the optimal policy is monotone, we can efficiently approximate its form with an affine hyperplane using a hybrid random search - Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm. Detailed numerical simulations evaluate the performance of both the radar macro-manager and radar micro-manager."@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/37417?expand=metadata"@en ; skos:note "Stochastic Target Scheduling for Radar Resource Management Threat Assessment and Optimal Threshold Policies by Erik J. Miehling B.A.Sc, University of British Columbia, 2009 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Applied Science in THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering) The University Of British Columbia (Vancouver) September 2011 c© Erik J. Miehling, 2011 Abstract This thesis formulates a stochastic scheduler for use in adaptive resource manage- ment of a single Ground Moving Target Indicator (GMTI) radar when faced with tracking multiple, weakly maneuvering targets. 
The general problem involves first determining a priority allocation of the L targets, then determining the optimal time to spend using the specified allocation. We present a framework for computing the threat estimate and associated priority of each target in a surveillance environment, termed the radar macro-manager. The threat level of each target is computed based on its heading and proximity relative to a set of user-specified static assets. We present a weight assignment algorithm based on the geography of the surveillance region and use eigenvector centrality to assign vulnerability weights to each asset. The error in the threat level is computed based on the error-covariance matrix of each target, provided by a Kalman filter. Both the threat level and threat error are used to compute the respective priority rank distributions. From the priority distributions of each target we specify a queue of tasks to maximize a reward function associated with processing the queue. The queue is determined with the aid of structural results from the field of optimal issuing, which involves ordering the priority rank distributions with respect to the monotone likelihood ratio. In particular, we compute an optimal queue which specifies the order in which we observe individual targets. The length of each target observation is controlled by an external stochastic process, termed the radar micro-manager. The problem of determining this optimal stopping time is formulated as a sequential decision process, a type of Markov decision process. We provide conditions such that the optimal policy is characterized by a monotone threshold policy on the partially ordered set of positive definite error covariance matrices of each target. Given that the optimal policy is monotone, we can efficiently approximate its form with an affine hyperplane using a hybrid random search - Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm. Detailed numerical simulations evaluate the performance of both the radar macro-manager and radar micro-manager.

Preface

The research conducted over the duration of my Master's degree concerned the field of stochastic control and optimization, primarily motivated by applications in defence. The work presented in this dissertation is based on the papers listed below.
1. V. Krishnamurthy, R. Bitmead, M. Gevers, and E. Miehling. Sequential detection with mutual information stopping cost. Submitted to IEEE Transactions on Signal Processing, 2011.
2. E. Miehling, K. Topley, V. Krishnamurthy, and M. Goulding. Threat Assessment and Stochastic Target Prioritization for Radar Resource Management. In progress, 2011.
The contributions of the dissertation's author in the above publications are outlined as follows. Chapter 2 discusses the radar manager structure and the target model. The author's contributions lie in analyzing the validity of modeling assumptions such as the LTI nature of the model and the inclusion of a probability of detection in the model to account for missed detections.
Chapter 3 focuses on the task prioritization component of the radar manager and is largely comprised of material from the (in progress) paper, "Threat Assessment and Stochastic Target Prioritization for Radar Resource Management." The majority of the work presented in this chapter is that of the dissertation's author, namely, the formulation of the threat function, the threat rank function, and the application of optimal issuing policies to the problem of stochastic task prioritization. Chapter 4 deals with the task scheduling component of the radar manager. The work presented in this chapter is based on the theory and ideas found in the paper, "Sequential detection with mutual information stopping cost." The author helped in the application of the monotone threshold policy to the GMTI radar problem. Much effort was made by the author to ensure seamless integration between the two levels of the radar manager. The numerical results in Chapter 5 were also solely the work of the dissertation author. All simulations were coded in Matlab with significant effort made to ensure that the simulation scenarios were relevant to real-world surveillance situations.

Table of Contents
Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1 Introduction
  1.1 Background and Motivation
    1.1.1 Why Dynamic Programming?
    1.1.2 Motivation
  1.2 Contribution and Main Results
    1.2.1 Main Results
  1.3 Thesis Structure
2 Radar Manager and Target Dynamics
  2.1 Radar Manager Structure
  2.2 Target Model
  2.3 Bayesian Tracker
3 Radar Macro-manager
  3.1 Radar Macro-manager Structure
  3.2 Macro-manager Scheduler
    3.2.1 Optimal Issuing Policy
    3.2.2 Target Prioritization
  3.3 Threat Function
    3.3.1 Threat Estimate
    3.3.2 Asset Weight Assignment
    3.3.3 Threat Estimate Uncertainty
  3.4 Threat Rank Function
    3.4.1 Distribution Assignment
4 Radar Micro-manager
  4.1 Radar Micro-manager Structure
  4.2 Optimal Stopping as a Sequential Decision Process
    4.2.1 Decision Epochs
    4.2.2 Target States and State Update
    4.2.3 Action Space
    4.2.4 Operating and Stopping Costs
    4.2.5 Objective: Optimal Stopping Time
  4.3 Monotone Threshold Policy
  4.4 Parameterized Policy
    4.4.1 Parameterizations
    4.4.2 Policy Estimation via Stochastic Approximation
5 Numerical Results
  5.1 Macro-manager Numerical Results
  5.2 Micro-manager Numerical Results
    5.2.1 Target Fly-by
    5.2.2 Persistent Surveillance
6 Conclusions and Future Work
  6.1 Summary of Work Accomplished
  6.2 Future Work
Bibliography
A Appendix: Tables
B Appendix: Derivations
  B.1 Target Model Derivation
  B.2 Approximation and Justification of Linear Gaussian State Space Model
  B.3 Kalman Filtering with Intermittent Observations
  B.4 Threat Error Covariance Derivation
C Appendix: Proofs
  C.1 Proofs: Macro-manager

List of Tables
Table A.1 Rate of change of Jacobian matrix for various running times.
Table A.2 Ratio of second-order term to first-order term of Taylor series expansion, ζ = 0.1.
Table A.3 Ratio of second-order term to first-order term of Taylor series expansion, ζ = 0.8.

List of Figures
Figure 1.1 The categorization for radar resource management algorithms.
Figure 2.1 A high-level schematic of the proposed radar manager.
Figure 2.2 The two time scale set-up of the radar manager corresponding to the target priority queue.
Figure 2.3 The three time scales of the radar manager corresponding to the target priority allocation vector.
Figure 2.4 Typical fly-by scenario of a GMTI radar.
Figure 3.1 A schematic of the radar macro-manager.
Figure 3.2 Utility function, d(x), representing the valuation of an observation at a given value of accumulated threat reward.
Figure 3.3 Multi-target, multi-asset scenario.
Figure 3.4 Radial velocity component and range vector used to determine threat of target to asset.
Figure 3.5 A typical target-asset scenario.
Figure 3.6 Threat level estimates (both first and second order) compared to the true threat value for L = 4 targets.
Figure 3.7 Typical road network with asset locations.
Figure 3.8 Road network represented as a graph.
Figure 3.9 True threat error covariance compared to first-order approximation of threat error.
Figure 3.10 True threat error covariance compared to second-order approximation of threat error.
Figure 3.11 Form of target rank assignment functions.
Figure 3.12 Distribution of threat for a fixed target trajectory at a particular time.
Figure 3.13 The dependency of the mean and variance of a binomial distribution on the parameter p for a fixed parameter N.
Figure 3.14 The least-squares estimate of the parameter p of a binomial distribution for a given rank mean and variance.
Figure 3.15 The relative errors in the rank mean and rank variance obtained from the least-squares estimate, p_n^{l*}.
Figure 4.1 A schematic of the radar micro-manager.
Figure 4.2 Timing of events for the micro-manager.
Figure 4.3 Deterministic sample-path cost, D(P, τ), versus τ.
Figure 4.4 Sample convergence plot of the SPSA algorithm.
Figure 5.1 Average gain in reward for each target observation using various ordering policies.
Figure 5.2 Performance of optimal policy for each initial target set, m.
Figure 5.3 Typical target and GMTI sensor trajectories.
Figure 5.4 Figure showing one cycle of the micro-manager with accuracy of state estimates for an arbitrary priority vector.
Figure 5.5 Surface plot representing dependence of the sample-path cost on probability of detection and operating cost.
Figure 5.6 Plot of sample-path cost of various periodic policies and the parameterized policy versus ordered initial conditions.
Figure 5.7 Representation of the persistent surveillance scenario in GMTI systems.
Figure 5.8 Track estimates in persistent surveillance case over multiple micro-manager scheduling intervals, with true tracks and track estimates.
Figure 5.9 Plot of log-determinants for each target over multiple micro-manager scheduling intervals.
Figure B.1 Acceleration during the kth iteration of length T.

Acknowledgments
The research conducted over the course of my Master's degree would not have been possible without the guidance and support of several individuals. First of all, my utmost gratitude is directed towards my Master's supervisor, Professor Vikram Krishnamurthy. Your knowledge helped to guide my research along novel and interesting paths, while your patience provided me with the freedom to think through and solve interesting research problems. I would also like to thank the Department of National Defence (DRDC-Ottawa) for generously funding the research during my Master's degree.
I am honored to have had the opportunity to have worked directly with multiple defence scientists and engineers throughout my Master’s de- gree. In particular, Dr. Bhashyam Balaji and Dr. Jack Ding at Defence Research and Development Canada (DRDC) - Ottawa and Dr. Martie Goulding at MacDon- ald, Dettwiler and Associates (MDA) Corporation, were invaluable in bridging the gap between the theory and application of the algorithms developed. Furthermore, I would like to thank my family for their continuous encouragement throughout the course of my degree. Your endless support has always made it possible for me to find the courage to tackle intimidating new problems throughout my life. Lastly, I would like to extend my appreciation to all members of the Statistical Sig- nal Processing laboratory at the University of British Columbia. The exceptional knowledge of each one of you helped me immensely throughout my degree. – Erik J. Miehling xii Dedication To my loving parents xiii Chapter 1 Introduction With the advent of state-of-the-art technology comes new capabilities and an as- sociated need for advanced methodologies in order to realize the technology’s full potential. For example, with the introduction of the internet, came the ability for computers to communicate over a network. In order to ensure this communica- tion was efficient and reliable, a set of communication protocols were developed, termed the TCP/IP (Transmission Control Protocol and Internet Protocol). Another example, reaching further into the past, concerns the introduction of the computer and the development of the operating system. With the initial introduction of the computer in the early 1950s, tasks were only able to be processed one at a time, forcing other queued tasks to wait until the completion of the previous job. The introduction of the concept of an operating system allowed the sharing of com- puter resources between multiple jobs, through the use of timesharing and batch processing, resulting in a much higher rate of use. The need for novel algorithms is especially pertinent in the field of radars and surveillance. A radar is a system which sends pulses of electromagnetic (radio) waves towards an object and deduces characteristics of objects in its path based on properties of the reflected signal. The original motivating factor for the technol- ogy was for the use in military applications, but today, radar systems are used in a variety of applications including the areas of astronomy, air traffic control, me- teorology, and surveillance, among others. Classically, in applications of surveil- lance, systems had multiple, mechanically steered radars, with each radar devoted 1 to a particular function (tracking, search, weapon assignment). However, with the development of Electronically Steered Phased Array Radars (ESPARs), it became possible for one radar to perform many functions, eliminating the need for multi- ple, dedicated radars. This was made possible by the ability of ESPAR systems to change radar parameters (beam position, dwell time, waveform, and energy level [20]) instantaneously. Naturally, this raises the question: How does one optimally change the radar parameters to best fit the operating mode and to achieve the best possible performance of the radar? The techniques used to control the radar pa- rameters are classified under the term radar resource management (RRM). 
1.1 Background and Motivation The field of radar resource management has been an area of increasing interest over the past few decades. The specific purpose of any RRM algorithm is to analyze the mission profile and generate appropriate tasks through the assignment of particular radar operating modes. Due to the diversity of mission profiles and the correspond- ing tasks, modern multi-function radars (MFRs) may be faced with a multitude of possible jobs to perform. Since the radar has a finite amount of resources (time, energy, or processing power [20]) to expend, clearly a challenge would arise in a scenario where the number of tasks outweighs the available resources. In this sce- nario, tradeoffs would need to be made. It would be of great interest to determine the “best” tradeoff by specifying a method to optimally allocate the finite amount of resources between the multiple tasks. The method of determining an optimal tradeoff is dependent upon the opti- mization goal in mind. The general techniques of RRM are commonly considered to be either track maintenance, target search, or scheduling [62, 92]. This disserta- tion will focus on the last of the aforementioned methods, the issue of scheduling, as well as the associated task of determining the priority of each task. The need for determining task priorities is relevant in any scenario where there are more tasks to perform than can reasonably be completed within a limited amount of time (or with a limited amount of resources). It would then be logical to prescribe some sort of prioritization, such that the more important tasks are handled earlier than less important tasks. Before delving into the details of the formulation of models 2 and results of the thesis, a background of other algorithms is presented. The complete problem of optimally allocating the ESPAR’s resources, while considering the possibility for a variety of operating modes, is a highly complex stochastic control problem. As a result, the work in the literature either make sig- nificant assumptions or focus on a subset of the full problem in order to obtain a solution. All radar scheduling techniques can be separated into two main cate- gories: non-adaptive and adaptive [20, 99]. Non-adaptive techniques [29, 67], that is, heuristic schedulers, are based on rule-based design. The task prioritization and scheduling is typically pre-computed and static, enabling the convenience of real- time use of the scheduler, but coming at the sacrifice of unknown, potentially poor, performance. Advanced multi-function radars primarily use adaptive scheduling algorithms, thus the remaining portion of the literature review will be spent dis- cussing various adaptive scheduling algorithms. An adaptive scheduling algorithm is one in which the prioritization and scheduling is determined dynamically to opti- mize system performance. The increased performance, over non-adaptive methods, comes at a higher (sometimes intractable) level of computation. Various algorithms work to alleviate the computation required to obtain an optimal solution. We cate- gorize these algorithms based on the classification found in [20], which states that the vast majority of adaptive algorithms can be divided into four main categories: artificial intelligence, Quality of Service Resource Allocation Model (Q-RAM), resource-aided, and dynamic programming algorithms. The reader is directed to Figure 1.1 for a visual representation of the algorithm categorization. 
The first main category within the field of adaptive scheduling algorithms is the well-researched area of artificial intelligence. Many methods in various subfields of artificial intelligence have been applied to the problem of both task prioritization and scheduling. First, techniques in the subfield of neural networks (models based on the interconnected network of nodes found in biological neural networks) have been used to solve both of the aforementioned problems: prioritization [46] and scheduling [39]. The discussion in [46] introduces a procedure for assigning prior- ity values to targets based on the structure of an artificial neuron. The priority of a target is computed based on a weighted sum of five factors: range, radial velocity, acceleration, rank, and signal (friend or foe). The weights are computed via a learn- ing method using back propagation. The associated problem of task scheduling has 3 Radar Resource Management Adaptive Algorithms Non-adaptive Algorithms Resource-aided Algorithms Dynamic Waveform Algorithms Adaptive Update- Rate Algorithms Dynamic Programming Artificial Intelligence Neural Networks Expert Systems Fuzzy Logic Q-RAM Figure 1.1: The categorization for radar resource management algorithms. also been studied with the use of neural networks. The discussion in [39] analyzes a formulation of task interleaving for track maintenance of multiple targets, taking advantage of the idle time between sending and receiving a pulse. The optimal interleaving network is based on a Hopfield network [36], a special type of neural network. Another subfield of artificial intelligence, termed an expert system, has been applied to the problems of prioritization and scheduling. An expert system is a system which includes a database of human (expert) knowledge used to resolve discrepancies in results and clarify uncertainties. One of the first applications of expert systems to the problem of radar resource allocation can be found in [94]. Here, the authors use an expert system to analyze the surveillance environment and modify the radar parameters and modes accordingly. The work of [70] has applied the theory to the selection of the transmit signal by controlling the signal’s type of modulation, carrier frequency, pulse duration, and pulse repetition interval (PRI). The last main subfield of artificial intelligence is the area of fuzzy logic. Fuzzy logic is a form of logic in which there exists more than just the binary outcomes: “true” and “false,” allowing for a truth value ranging anywhere in between. Mul- tiple papers in the literature analyze how fuzzy logic can be applied to solve the 4 task prioritization problem [62–64, 95]. Characteristics of targets and tasks are in- herently vague, thus descriptions such as friendly or dangerous or high-threat can be represented mathematically with the aid of fuzzy logic, providing flexibility in priority assignment and scheduling requirements [95]. Another category of algorithms which aim to allocate resources according to a measure of Quality of Service (QoS), termed Q-RAM algorithms, solve both the task prioritization and task scheduling problems simultaneously. The main idea behind Q-RAM techniques is, given a finite set of tasks, to assign an operating setpoint reflecting how many resources are allocated to each task such that the sys- tem utility is maximized [20]. 
The structure of Q-RAM algorithms consist of three main components [27]: 1) Q-RAM block, solves a convex optimization problem to maximize utility by considering many system parameters; 2) schedulability enve- lope, defines a region of schedulability as a convex constraint for the optimization problem; and 3) template-based scheduler, generates a dwell schedule based on parameters provided by the Q-RAM block. The scheduler in turn provides infor- mation on the scheduling of tasks to the Q-RAM block such that the scheduling constraint can be modified if necessary. The prominent work of [74] helped to provide the core foundation for the application of QoS management to resource allocation. Q-RAM can be treated as a nonlinear or integer programming problem, solved offline, resulting in the application to be restricted to static environments [35]. Classically, this prevented Q-RAM algorithms from being applied to the highly-dynamic problem of radar resource management; however, recent papers have adapted Q-RAM techniques such that they are more applicable [28, 34, 84]. In particular, [84] formulated a template-based dwell scheduler with timing and en- ergy constraints. Tasks at run-time were then fit into the appropriate pre-computed templates. Other methods to improve optimization time [28, 34] have helped to make the radar application of Q-RAM more practical and relevant. The next main category of radar resource management algorithms are termed resource-aided algorithms. The purpose of these algorithms is not to provide a specific task prioritization or schedule, but instead to modify the lower-level be- haviour and parameters of the radar to obtain improved functionality. The two main approaches in this class of algorithms modify either, 1) The radar waveform, or 2) The update rate of the Bayesian tracker. The first category typically termed 5 dynamic waveform algorithms aim to modify the waveform signal to reduce the demand for resources (time, energy, or processing power). The precise choice of the waveform depends on the operating mode in question. The work of Kershaw and Evans discuss optimal waveform selection for target tracking purposes and probabilistic data association [43, 44]. Their results rely on the assumption of an optimal receiver, such that the one-step ahead waveform optimization problem can be solved to minimize the mean-squared tracking error at the next time step. Other work studies the waveform selection problem for the application of target classi- fication [87]. The choice of waveform minimizes the decision time by essentially maximizing the information in the returned radar signal. Additional papers in the literature approach the waveform selection problem for target detection [52], by minimizing the time taken to detect new targets, and target tracking [53], by min- imizing the tracking error of each target. The second category of resource-aided algorithms, termed adaptive update rate algorithms control the sampling rate of the tracker. Traditional trackers use uniform update rates for measurements; however, it is possible to use an adaptive update rate and achieve improved radar perfor- mance. The work of Keuk and Blackman [93] deals with optimizing radar param- eters (SNR, track sharpness, detection threshold) with respect to radar load. They also discuss a calculation of the target revisit time, which is based upon the esti- mated lack of information regarding the respective target. 
An alternate adaptive update rate algorithm [85], is based upon an IMM model, the purpose of which is to estimate and predict the target states as well as estimate the process noise. The update interval is computed from the process noise estimation to reduce the number of track updates per unit time. Additional work [81] builds upon previous adaptive update rate algorithms by considering scenarios with high rates of missed detections and thus improving the performance of the tracker in the presence of clutter. The final method of radar resource management discussed, and a method used in this dissertation, is the area of (stochastic) dynamic programming. Dynamic programming is a tool that allows optimization problems to be broken down such that decisions can be made in stages [10]. Current decisions are made to not only achieve a low present cost, but also low expected costs for all future decision epochs. Stochastic dynamic programming, a generalization of dynamic program- 6 ming which no longer assumes that the state transitions are deterministic, has been well researched in the area of radar resource management for the problem of both task prioritization and task scheduling. The work of Strömberg and Grahn [89] present algorithms for search and track functions based on dynamic programming; however, their solution method relies on a brute force optimization (computational requirements grow as the square of the number of tasks to be scheduled), making the algorithm intractable for large (realistic) surveillance applications. This compu- tational burden of dynamic programming approaches is typically termed the curse of dimensionality. However, work has been done to drastically improve the com- putational requirements. In particular, the application of the theory of multi-armed bandits to sensor resource management [50, 97] makes stochastic dynamic pro- gramming a more attractive, practical approach. Specifically, these papers discuss how to compute the Gittins index, which allows the general multi-armed bandit problem to be separated into independent optimization problems, allowing the for- mulations of optimal and suboptimal beam scheduling algorithms. The aforemen- tioned formulations assume that the states of the “un-played arms” stay constant; however, more recent work has offered a generalization of the multi-armed bandit problem to allow for the states of the un-played arms to vary over time. This is termed the restless bandit problem and has been applied to the problem of mini- mization tracking error [83] in a radar surveillance scenario. 1.1.1 Why Dynamic Programming? As will be discussed later, the component of the radar manager in this dissertation which deals with the task scheduling problem is built upon the theory of stochas- tic dynamic programming. In order to provide justification for this choice, the disadvantages and shortcomings of other radar resource management algorithms are presented. First, a common theme in the artificial intelligence algorithms is the requirement for a form of a knowledge base. This necessitates learning meth- ods, in the case of neural networks, or generation of fuzzy rules, in the case of fuzzy logic. The creation of these knowledge bases is typically slow and compu- tationally expensive, making these methods largely impractical and inefficient for use in radar resource management [20]. The next class of algorithms that were 7 discussed were Q-RAM methods. 
Since these methods rely on assigning an op- erating setpoint, each setpoint typically needs to be evaluated before determining the optimal choice. For large or dynamic systems, this is not tractable [27]. Also, analysis has been carried out on the performance of Q-RAM techniques and it was shown that the resulting performance is far from optimal [37]. As stated in Section 1.1, resource-aided algorithms do not offer a solution to the task prioritiza- tion or scheduling problems, rather, they provide improved performance in systems already employing radar resource management techniques. As a result, a discus- sion of their performance drawbacks would not help to provide justification for our choice of a resource management algorithm. The final resource management algo- rithm that was discussed was the area of dynamic programming. As discussed in the previous section, dynamic programming algorithms typically suffer from the curse of dimensionality, making their application to the dynamic problem of radar resource management impractical. However, the approach offered in this disserta- tion builds upon structural results discussed in earlier work [49, 99] and applies the theory to resource management in GMTI systems. The structural results enable the dynamic programming problem to be efficiently solved and effectively applied to realistic radar applications. 1.1.2 Motivation The motivation for the work presented in this dissertation is to offer a new, novel approach to radar resource management. This approach aims to offer a complete radar resource manager (in combination with the results of [51]). It consists of two main functions, task prioritization and task scheduling, and is based on struc- tural results in the fields of optimal issuing policies and dynamic programming, respectively. The task prioritization component, termed the macro-manager, takes a semantic approach to prioritization, by ranking targets based on the operator’s incentive to observe them. The reason for this approach is that it is typically quite difficult to exactly quantify the reward for observing a particular target before an observation has been made. For example, consider an image of a scenario, it is not clear how to assign a number value to the amount of information obtained from 8 a person observing the image.1 The other component of the proposed radar re- source management algorithm, termed the micro-manager, is responsible for the task scheduling component. Based on the structural results of [49], we extend the theory to the application of observation scheduling in a GMTI scenario. By pre- senting the algorithms in terms of structural results, we avoid the typical drawbacks of the resource management techniques, allowing our algorithms to be applied in realistic systems. The goal for developing radar resource management algorithms is to improve one’s ability to make conclusions on the intent of targets based on observation of their states. However, the reader should keep in mind that the algorithms presented in this dissertation are intended to aid the human operator. One of the goals of the scheduler, from a design perspective, is to compute the threat of multiple targets more quickly and more accurately than a human operator would otherwise be able to do so without the aid of this system. For this reason, the scheduler proposed is thought to be more of an addition, rather than a replacement, to the human element of target surveillance. 
1.2 Contribution and Main Results In this thesis, we consider a general scenario where a single electronically steered GMTI radar is tasked with tracking multiple targets. Under these conditions, one would be interested in determining a management scheme for optimally tracking all targets with a certain degree of accuracy. We propose a dual time-scale method of radar resource management which performs task prioritization and task schedul- ing simultaneously. The component of the radar manager responsible for task pri- oritization, operating on the slow time-scale, is termed the radar macro-manager. In particular, the macro-manager specifies one of two prioritizations: 1) a priority allocation vector, or 2) a priority queue of target observations, although the main focus of this dissertation is on the latter of the two prioritization choices. The micro-manager operates on the fast time-scale and is responsible for determining the length of the scheduling interval for a given task prioritization. This is achieved 1For readers who are interested in the conflict between semantic importance and resource utiliti- zation, see [34] for a good discussion. 9 by using structural results in the field of dynamic programming and Markov deci- sion processes. There is an inherent tradeoff in the micro-management problem. Since it is assumed that our radar can only observe one target at a given time, by observing one target and improving the accuracy of its measurements, we are simultaneously predicting the states of all other targets, resulting in poorer mea- surements over time. The goal is to minimize the uncertainty of the current target through a function of not only its own level of uncertainty, but also the uncertainty of all other targets in the system. The resulting dual time-scale radar manager performs two objectives, prioritization and scheduling, concurrently. 1.2.1 Main Results The main results of this dissertation are summarized in the following points, • Full formulation of dual time-scale radar manager: The proposed scheduler offers a full formulation of a dual-time scale system for radar resource man- agement. The macro-manager, responsible for determining the task prioriti- zation, operates on the slower of the two time scales. When the prioritized list has been computed (based on target states estimated via a Kalman fil- ter) it is then passed to the micro-manager. The micro-manager operates on the fast time scale and is in charge of determining the optimal stopping time associated with the prescribed task prioritization. • Target Prioritization Queue: The work here builds upon the ideas developed in [49, 51] by presenting the addition of a complete formulation of the radar macro-manager. We define the prioritized task list, computed using a struc- tural result from the field of optimal issuing policies [78], as a prioritized queue of target observations. The queue is computed from a measurement of threat of each target based on the following three factors: 1) Target proximity and heading relative to sensitive assets; The threat for each target is computed based on the target state (position and velocity) rela- tive to every asset in the surveillance region. We build upon the formulation found in [42] by allowing for targets with non-constant velocity and also consider the range of the target to the asset in the scenario where the velocity is directed away from the asset (unlike in [66]). 
From the threat level of each 10 target-asset pair, we compute a weighted sum based on the vulnerability to compute the threat level of each target. 2) Measure of vulnerability of each asset; We use a method from the field of link analysis [21] termed eigenvector centrality. Given a road network and known asset locations, we construct a graph and the associated adja- cency matrix based on a measure of each road’s accessibility. From this, the dominant eigenvector is computed, representing the centrality of each node. Knowing which of the nodes correspond to assets allows us to determine the relative vulnerability of each asset from the dominant eigenvector. The total threat of each target to all assets is then computed as the weighted (vulnera- bility) sum of each target-asset threat. 3) Threat estimation error of each target; Out of the papers in the literature dealing with threat assessment of a target, almost all do not consider the both the error in the target state estimation as well as the target estimate as part of the threat computation. In this dissertation, when computing the priority of a target’s observation, we consider both the threat estimate and its associated error, reflecting the fact that, given two targets with an equivalent threat es- timate, we should be more concerned about a target with a high uncertainty, than that of a low uncertainty. The threat estimate error is computed based on a second-order Taylor expansion of the threat model, using the error co- variance matrix returned from the Kalman filter tracker. • Optimal threshold policies in GMTI systems: The results of [49] discussing conditions for the existence of an optimal monotone threshold policy are ex- tended to the application of target tracking using a GMTI radar. In GMTI systems, the radar is mounted on a moving aircraft and measurements are taken in terms of range, range rate, and azimuth, which are all nonlinear in the Cartesian target states (position and velocity). As a result, we linearize the measurement model and show that it is nearly linear and time-invariant for typical operating conditions, thus warranting the use of a Kalman fil- ter. We also model the fact that the radar will inherently miss detections by modifying the Kalman filter updates accordingly. Missed detections are due to multiple factors [61] such as an excess of external noise, adverse target 11 behaviour (velocity of target drops below the minimum detectable velocity [76]), or the presence of measurement clutter. Details of the target model can be found in Section 2.2. • Numerical results: We present a thorough analysis of the performance of the proposed scheduler. In order to accurately evaluate the performance of each component of the radar manager, we run numerical simulations for both the radar macro-manager and micro-manager, comparing the performance of each to other methods. Simulations are motivated by typical scenarios in surveillance, such as a fly-by scenario, where the aircraft follows a constant- velocity trajectory, and a persistent surveillance scenario where the target region is continually observed. 1.3 Thesis Structure The remaining chapters of the thesis are laid out as follows. Chapter 2 will discuss the general radar manager, the observation and target model, and the legacy tracker. Chapter 3 analyzes the problem of determining a task prioritization, termed the radar macro-manager. 
The process of computing the prioritized task list involves formulating the threat function, computing asset vulnerabilities, and calculating the threat estimate uncertainty. Chapter 4 discusses radar micro-management problem, responsible for determining the optimal time to use the specified task prioritization provided by the macro-manager. We derive conditions to ensure the optimal policy takes a monotone (threshold) form. We state a stochastic approximation algorithm that forms an affine hyperplane approximation to the policy. Detailed numerical simulations can be found in Chapter 5. Simulations analyze the performance of both levels of the radar manager. Finally, Chapter 6 presents conclusions and po- tential directions for future work. Derivations of models and proofs of the main results of the thesis are located in Appendices B and C, respectively. 12 Chapter 2 Radar Manager and Target Dynamics 2.1 Radar Manager Structure The radar manager is divided into two main levels of abstraction. These two layers are termed the macro-manager and the micro-manager. The macro-manager con- tains three elements: the threat function, the threat rank function, and the macro- manager scheduler. The macro-manager is responsible for determining a target prioritization of the L targets in the system. Depending on the type of radar and general surveillance structure in question, this prioritization could correspond to one of a variety of alternatives. Two potential choices are listed below, 1. Target priority queue; queue of L targets in the system; this queue specifies the order in which targets should be observed, 2. Target priority allocation vector; ν = [ ν1,ν2, . . . ,νL ] , where∑∀l ν l = 1. The value ν l ≥ 0 corresponds to the assigned priority of target l. The target prioritization is determined from the Bayesian state estimates of the target states (from the Bayesian tracker contained in the radar micro-manager) as well as the location of sensitive assets in the observation environment. The details of the macro-manager are discussed in Chapter 3. The micro-manager block 13 contains three elements: the electronically steered multi-function radar, a Bayesian tracker, and the micro-manager scheduler itself. The micro-manager is responsible for determining the optimal stopping time associated with the current assigned target prioritization. This decision is based off of calculating the optimal stopping set using the system and model parameters. The micro-manager is discussed in Chapter 4. A high-level schematic outlining the interaction between the micro- manager and macro-manager can be seen in Figure 2.1. Macro-manager(1) Micro-manager(3) (fast time-scale, k)(4) Bayesian estimate(5) ... (9) Highest priority target(10) Lower priority targets(11) (fast time-scale, n)(2) a (slow time-scale, n)(2) Target Prioritization Figure 2.1: A high-level schematic of the proposed radar manager. The macro-manager (see Chapter 3), which determines the optimal prior- ity allocation and specifies the desired queue (observation order) of the targets, l ∈ {1,2, . . . ,L}, operates on the slow time-scale, n. The micro- manager (see Chapter 4) determines the optimal time to continue using the specified priority allocation and operates on the fast time-scale, k. It is necessary to provide some high-level details of the radar operation before proceeding to the following sections. In particular, we differentiate between the operation of the radar when a particular choice of target prioritization is made. 
The first choice considered for the target prioritization is the target priority queue. In 14 this case, the macro-manager, which operates on the slow time-scale, n, specifies a queue of the L targets which maximizes a defined reward. In this prioritization choice, the micro-manager receives measurements on the fast time-scale, k, and processes the queue sequentially, devoting all resources to the current target in the queue (filtering) while devoting no resources to the other L−1 targets (predicting).1 The time-scale interaction for this prioritization can be seen in Figure 2.2. The Micro-manager time scale Macro-manager time scale n = 1 n = 2 n = 3 n = 4 n = 5 k = 1 . . .= 1 . . .= 1. . .. . .= 2 = 2 = 2} } } 1st Queued Target 1 Queued Target 2nd Queued Target 2 Queued Target Lth Queued Target τo12 τ o2 2 τ oL 2 Figure 2.2: The two time scale set-up of the radar manager corresponding to the target priority queue. Targets are observed in an order specified by the macro-manager while the L−1 other targets are predicted based on their respective models. length of a given iteration at the macro-manager level, n, is defined by the sum of the optimal tracking times that the micro-manager specifies for each target in the queue τo1n ,τo2n , . . . ,τoLn , where each ol takes on a unique target index l ∈L . The length of a tracking interval, n, is highly dependent upon the level of uncertainty in the targets, but is typically on the order of 10 to 50 iterations, which corresponds to 1 to 5 seconds at the typical sampling time of T = 0.1s. It should be noted that the iterations of this time scale are not uniformly spaced, unlike that of the micro-manager time scale, k. The operation of the radar manager is slightly different in the case of the tar- get prioritization taking the form of a priority allocation vector, ν . We introduce a higher resolution tracker time-scale, t, at which individual observations are re- 1This corresponds to the case of the target priority vector specifying ν l̄ = 1 for the highest priority target, l̄, and ν l = 0 for all of the L−1 other target, l 6= l̄. 15 ceived. Each iteration of t is assumed to be on the order of milliseconds. The inclusion of the tracker time scale can be seen via the following argument. It is useful to consider the priority vector, ν , as a vector specifying the ratio of the micro-management cycle spent looking at each target. For example, for a priority vector ν = [ν1,ν2] = [0.7,0.3], the radar would observe target l = 1 for 70% of the tracking time, τ , and target 2 the remainder of the time (30%).2 Since the stopping time τ is unknown, we switch between the targets on a much faster time-scale, termed the tracker time scale, such that the tracking time ratios specified in the priority vector ν are satisfied for each time k. Since a decision to stop or continue with the current priority vector is made at the micro-manager time-scale, k, we are ensured to satisfy the prescribed priority allocation, regardless of the time we re- lease control back to the macro-manager. The reader is instructed to refer to Figure 2.3 for a graphical representation of the various time-scales. A detailed analysis of the micro-management problem for the target priority vector case is presented in [51]. n = 1 n = 2 n = 3 n = 4 k = 1 k = 2 k = 3 t = 1 t = 2 t = 3 . . . t = 100 . . . Micro-manager time scale Tracker time scale Macro-manager time scale k = τ2 Figure 2.3: The three time scales of the radar manager corresponding to the target priority allocation vector. 
Measurements are received at the tracker time scale Before delving into further details of the radar manager’s components, it is useful to provide context by discussing the details of our target model and Bayesian 2Equivalently, we can interpret ν as the ratio the radar’s SNR devoted to each target. 16 tracker, as well as any fundamental assumptions and approximations. 2.2 Target Model Given that the main application of the theory presented in this thesis is for GMTI systems, we assume that our targets are confined to a two-dimensional physical world. See Figure 2.4 for a typical tracking scenario with measurements from a aircraft-based GMTI radar. θd Λp A0 ψ rs (ξ̇xk , ξ̇yk) (ξxk , ξyk) x y z ξzk z) = (0, 0, 0) Figure 2.4: Typical fly-by scenario of a GMTI radar. The quantity A0 is defined to be the center of our surveillance region, Λp, and is set to be located at the point (x,y,z) = (0,0,0). The term rs is called the range swath. The platform is assumed to be located at (x,y) = (ξxk ,ξyk) and moving at a velocity (ẋ, ẏ) = (ξ̇xk , ξ̇yk) with platform altitude defined to be constant at ξzk =√ ξ 2xk +ξ 2 yk tan(θd), where θd is the angle of depression, typically 10 ◦− 15◦. The quantity, ψ , is termed the squint angle, with a value of 90◦ corresponding to the broadside-looking geometry [76]. The tracker assumes that all targets l ∈ {1,2, ...,L} follow a model based on the discrete white noise acceleration (DWNA) model [8, 11]. The form of this 17 discrete-time dynamic system is as follows [51], slk+1 = Fs l k +Gw l k (2.1) zlk = { h(slk,ξk)+ 1√ ν l∆ vlk with probability p l d /0 with probability 1− pld . (2.2) As stated, the targets are assumed to be moving on a two-dimensional plane, thus the state vector contains information regarding the position, xk,yk and velocity, ẋk, ẏk, resulting in a state vector for target l of slk = [ xlk, ẋ l k,y l k, ẏ l k ]T . Also, zlk is the observation vector, and Gwlk, 1√ ν l∆ vlk are zero-mean Gaussian random vectors with covariance matrices Q ≥ 0 (positive definite ordering) and R(ν l∆) ≥ 0, re- spectively. The quantity ν l represents the priority allocation given to target l. It is assumed that the measurement noise, vk, is independent from the state, sk. Also, the process noise, wk, is assumed to be independent of s0, ...,sk and z0, ...,zk. The platform state information at time k is captured in the vector ξk. Similar to the state vector for each target, the platform state vector includes information about both the position and velocity, but now regarding the GMTI platform, thus ξk = [ ξxk , ξ̇xk ,ξyk , ξ̇yk ]T , denoting the x and y positions and velocities, respectively. The state matrix, F , which models the physics of the non-maneuvering target, and the process noise gain matrix, G, are defined as follows,3 F =  1 T 0 0 0 1 0 0 0 0 1 T 0 0 0 1  , G =  1 2 T 2 0 T 0 0 12 T 2 0 T  (2.3) where T is the sampling interval. The process noise covariance matrix, Q, is de- 3The justification for these matrices and a derivation of the model can be found in Appendix B.1. 18 fined as, Q =  1 4 T 4σ2x 12 T 3σ2x 0 0 1 2 T 3σ2x T 2σ2x 0 0 0 0 14 T 4σ2y 12 T 3σ2y 0 0 12 T 3σ2y T 2σ2y  Where the values of the process noise standard deviations, σx and σy, control the intensity of the process noise and are dependent upon the (expected) target being tracked.4 The random movements of the targets are accounted for via an injected (white) process noise [11]. 
The measurement noise is defined as, R(ν l∆) = E [( 1√ ν l∆ vk )( 1√ ν l∆ vk )T] (2.4) = 1 ν l∆ E [ vkvTk ] (2.5) = 1 ν l∆  σ 2 r 0 0 0 σ2a 0 0 0 σ2ṙ  (2.6) where the variances, σ2r , σ2a , and σ2ṙ representing the variance in range, azimuth, and range rate, respectively, are chosen depending on the application and radar in question [92]. As will be discussed in Section B.3, we assume that we are re- ceiving measurements from a radar which has a probability of detection less than one, pld < 1, accounting for the possibility of missed detections. As can be seen in Equation (2.2), the observation for target l at time k, zlk, takes on one of two values. If the detection is missed, then zlk = /0, which can be interpreted as an observa- tion being received, but with an infinite noise variance. On the other hand, if the measurement is successfully received, then zlk = h(s l k,ξk)+ 1√ ν l∆ vlk. The nonlinear measurement function, h(slk,ξk), maps the target state and platform state to range, azimuth, and range rate (Doppler) values. It is defined as follows (with the target 4It should be noted that in radar tracking applications, in order to fit a model to the targets, one needs to have an idea of the type of target being tracked before measurements are actually made. As stated earlier, we have assumed a non-maneuvering target model, which for the purposes of GMTI and ground moving targets, is commonly accepted. 19 index, l, omitted), h(sk,ξk) =  √ (xk−ξxk)2+(yk−ξyk)2+ξ 2zk arctan ( yk−ξyk xk−ξxk ) (xk−ξxk )(ẋk−ξ̇xk )+(yk−ξyk )(ẏk−ξ̇yk )√ (xk−ξxk )2+(yk−ξyk )2+ξ 2zk  (2.7) Notice that the above range, azimuth, and range rate calculations are based on the relative positions and velocities of the target and the platform. This is equivalent to (the hypothetical situation of) treating either the target or the platform as being fixed in space, then taking measurements. We can approximate the above non-linear measurement model, by a Linear Gaussian State Space Model (LGSSM). The validity of this approximation is ach- ieved by considering typical target behaviour and showing that the measurement model, namely the non-linear measurement function, h(sk,ξk), can be well approx- imated by a linear, time-invariant function. The resulting LGSSM is as follows, the details of which can be found in Appendix B.2. slk+1 = Fs l k +Gw l k (2.8) zlk = { Hlslk + 1√ ν l∆ vlk w.p. p l d /0 w.p. 1− pld (2.9) 2.3 Bayesian Tracker The prevailing feature of any control system is that it is able to be added to the existing black-box system, to modify and regulate the output to something that is desired. Thus, the radar control system proposed in this thesis must be able to seamlessly interact with the pre-existing, legacy Bayesian tracker. In common GMTI systems, state estimation performed by the Bayesian tracker is achieved via a Kalman filter. As can be seen in the Appendix (Section B.2), the system model, Equation (2.9), is approximately a linear and time-invariant, Gaussian model on the micro-management time-scale, thus a Kalman filter is warranted. The Bayesian tracker receives measurements in polar coordinates, correspond- 20 ing to range, azimuth, and range rate (see Equation (2.7)), from the multifunction radar and aims to create “tracks” of each of the targets in the surveillance envi- ronment. 
These tracks are represented by two quantities, the filtered state estimate of each target l, denoted ŝlk|k, and a measure of “how good” the state estimate is, represented by an associated error covariance matrix, Plk|k. These two quantities are defined as follows, ŝlk|k = E { slk ∣∣zl0,zl1, . . . ,zlk} (2.10) Plk|k = E {( slk− ŝlk|k )( slk− ŝlk|k )T ∣∣zl0,zl1, . . . ,zlk} (2.11) which are computed via a Kalman filter using the measurements {zl1,zl2, . . . ,zlk}. For notational simplicity, we will write the sequence of measurements as Zlk ={ zl0,z l 1, . . . ,z l k } . Using a Kalman filter, the solutions to the above equations are, termed the measurement update equations [3], are ŝlk|k = ŝ l k|k−1+Pk|k−1H [ HT Plk|k−1H +R ]−1( zlk−HT ŝlk|k−1 ) (2.12) Plk|k = P l k|k−1−Plk|k−1H [ HT Plk|k−1H +R ]−1 HT Plk|k−1 (2.13) where ŝlk|k−1 = E { slk |Zk−1 } and Plk|k−1 = E {( slk− ŝlk|k−1 )( slk− ŝlk|k−1 )T |Zk−1} are the conditional mean and error covariance, respectively. These quantities are initialized by ŝ0|−1 = ŝ0 and P0|−1 = P0 ≥ 0. We wish to obtain representation for the quantities ŝlk+1|k =E { slk+1 |Zk } and Plk+1|k =E {( slk− ŝlk+1|k )( slk− ŝlk+1|k )T |Zk} as functions of the conditional mean, ŝlk|k−1, and covariance, P l k|k−1. The follow- ing equations, termed the time update equations, express solutions for ŝlk+1|k and 21 Plk+1|k, ŝlk+1|k = Fŝ l k|k (2.14) = Fŝlk|k−1+FPk|k−1H [ HT Plk|k−1H +R ]−1( zlk−HT ŝlk|k−1 ) (2.15) Plk+1|k = FP l k|kF T +Q (2.16) = FPlk|k−1F T +Q−FPlk|k−1H [ HT Plk|k−1H +R ]−1 HT Plk|k−1F T (2.17) where Equation (2.17) is termed the Riccati equation. In the absence of any mea- surements, the state and covariance are updated by the following equations, ŝlk+1|k = Fŝ l k|k−1 (2.18) Plk+1|k = FP l k|k−1F T +Q. (2.19) The state and error covariance updates in Equations (2.15) and (2.17) are com- puted assuming that we have access to all measurements in Zk. However, the radar will inherently miss detections (due to an excess of external noise, adverse target behaviour, presence of measurement clutter) resulting in some of the measurements to be missed. This results in the normal Riccati recursion, Equation (2.17), to no longer hold. The modified equations, taking the non-unity probability of detection into account, are defined as follows, R(Plk ,z l k) def = FPlkF T +Q−1 /0C(zlk)FPlkHT (HPlkHT +R)−1HPlkFT (2.20) L(Plk) def = FPlkF T +Q (2.21) The details regarding the derivation of the above equations can be found in Ap- pendix B.3. 22 Chapter 3 Radar Macro-manager The purpose of the radar macro manager is to develop an automated, systematic method for characterizing and predicting potential high threat situations and con- sequently be able to take appropriate preemptive measures. Given that the theme of this document is motivated by surveillance applications, these preemptive mea- sures are completely passive and refer to how to control the radar. Recall the two choices stated in Section 2.1 for target prioritization: 1) Target priority queue, and 2) Target priority allocation vector. The analysis in this section will be focused upon the former of the two prioritization choices, the target priority queue. Recall that this queue specifies the order in which the radar observes each of the l targets. In particular, we wish to specify the optimal order in which targets should be observed as to maximize the immediate reward for performing target observations. 
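Before developing the macro-manager, the covariance recursions of Section 2.3 can be summarized in a few lines. The sketch below implements the two updates of Equations (2.20) and (2.21): a full Riccati step when a detection is received and a prediction-only step when it is missed, with a boolean flag playing the role of the indicator on the missed-detection event. It is a minimal illustration under the LGSSM approximation, not the tracker used in the simulations, and the toy system at the end is purely illustrative.

```python
import numpy as np

def covariance_update(P, F, H, Q, R, detected):
    """One-step error covariance update with a possibly missed detection.
    detected=True  -> Riccati update, Equation (2.20)
    detected=False -> prediction only, Equation (2.21)"""
    if not detected:
        return F @ P @ F.T + Q
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # gain applied to the prior covariance
    return F @ (P - K @ H @ P) @ F.T + Q

# Illustrative usage with a 2-state toy system.
F = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.5]])
P = np.eye(2)
P = covariance_update(P, F, H, Q, R, detected=True)
P = covariance_update(P, F, H, Q, R, detected=False)   # missed detection
```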
3.1 Radar Macro-manager Structure The ultimate goal of radar surveillance is to determine the intent of each target in our system. Computing the threat level of each target based on its local behaviour provides some useful insight into its true intentions. The process of drawing con- clusions about a target based on its interaction with its environment can be clas- sified within the field of situational awareness. Situation awareness (occasionally referred to as situation analysis in the literature) is a topic which is concerned with 23 measuring spatial and temporal characteristics of the current environment and ex- tracting relevant information to make optimal decisions and predict the future state of the environment. In other words, in the context of the problem discussed in this document, given information about the current state of each target in our system, we can calculate and predict how it will likely behave in the near future, resulting in information which the radar scheduler can use to make opportunistic scheduling decisions. The radar macro-manager in this document consists of three main components. First, the threat function receives Bayesian estimates from the tracker from which it computes an estimate of the threat level as well as an associated uncertainty esti- mate. Second, the threat rank function receives the threat estimate for each target, defined as the pair (r̂ln,e l n), and ranks the targets according to their priority by as- signing a random variable, X ln, and associated probability distributions, defined by a mean and a variance. From the mean and variance values, we assign a specific distribution to each target’s rank such that it belongs to a class of likelihood ratio orderable distributions. Lastly, the macro-manager scheduler receives the L prior- ity ranks, {X1n ,X2n , . . . ,XLn }, and produces a target prioritization. Specifically, we use the ranks to determine an optimally ordered queue specifying the order to ob- serve the L targets in the system. We also specify a definition of a tracking time constraint vector, computed from the optimal queue, which modifies the behaviour of the micro-manager by imposing a maximum tracking time constraint dependent upon the priority of each target. A schematic of the macro-manager can be seen in Figure 3.1. Since the main theoretical result of the macro-manager is the ap- plication of the optimal issuing policy to the problem of determining an optimal observation queue, we will present this result first. Immediately following, we will state how the threat and priority of each target is computed based on the target information. 3.2 Macro-manager Scheduler As stated earlier, the macro-manager could specify one of two prioritizations: a target priority queue, or a target priority vector. The theory presented in this section is devoted to determining this optimal priority queue, whereas the presentation of 24 Bayesian estimate(5) Threat Function Macro-manager Scheduler Macro-manager State Estimate, sln Error Covariance Matrix, P ln Target Prioritization Threat Estimate, r̂lnThreat Error, e l n X lnThreat Rank Function Figure 3.1: A schematic of the radar macro-manager. the priority vector is omitted. In systems where the total value obtained from processing a queue of jobs is dependent upon the order in which the tasks are performed, it would be of great interest to come up with the order that maximizes the total value. 
This is precisely the role of the macro-manager scheduler; to determine the optimal order to observe each of the L targets in our environment. Since the total number of orders for L targets is L!, it would be intractable to perform an exhaustive search of all orders in systems with a even a modest number of targets. Fortunately, we can make use of structural results developed in the literature to compute this optimal order. The theory behind optimal issuing was largely motivated by problems in in- ventory management. The problem of optimal depletion of a finite inventory has been the topic of research for over half a century. Some of the early papers in this 25 field, [5, 19, 32] worked to provide conditions on the items (when the supply and demand are assumed known) such that one of either Last-In-First-Out (LIFO) or First-In-First-Out (FIFO) ordering policies was optimal. Later research focused on the issue of optimal stock depletion under the assumption of random item rankings. Derman and Klein in particular were among the first to study the problem of opti- mal issuing when the value of each item was considered to be stochastic. Pierskalla and Roach [69] extended the problem to deal with stochastic supply and demand for an application to perishable inventory (inventory control for a blood bank). The theory from optimal stock depletion has made its way in many other lines of research over the years. One of the applications involves attempting to deter- mine the optimal order such to maximize the total field-life (life-time) of a group of items. This problem was first approached assuming each item had a determinis- tic field-life [12], then was generalized to allow for stochastic field-lives [13, 78]. In the case of stochastic field-life of an item, optimal issuing orders were achieved by ordering the field-life distributions of each item with respect to both monotone likelihood ratio ordering [13, 78] and hazard-rate ordering [22]. The field-life of each item, in both the deterministic and stochastic cases, is assumed to be con- trolled by a so-called field-life survival function, which models how the field-life distributions change over time. Albright [1] worked to derive results for many dif- ferent forms of the field-life survival function, considering two scheduling cases: independent and dependent. The independent case assumes that the process that defines at what time the items are used is independent on the field lives of each item and the order in which they are used. Conversely, the dependent case considers the fact that the field lives of successive items are dependent upon their locations in the queue. Even though many different forms of the field-life survival function were considered, results for the dependent case (more interesting than the independent case; however, considerably more complex) were limited to fairly simplistic sur- vival functions. The theory of optimal issuing has also been applied to maximizing the net present value of a list of R&D projects [31], maximizing accuracy of pre- dictive form filling [54], and minimizing the makespan (total processing time) of a queue of jobs [14, 16, 30, 71]. We first outline the results of the problem of optimal issuing of items with stochastic field-lives originally discussed in [78]. This involves introducing the 26 definitions and lemmas used to obtain the results. 
We then apply Ross’ result to the problem of determining an optimal order for observing the L targets in our system, and outline the main proofs in the context of our problem. 3.2.1 Optimal Issuing Policy We start by introducing three types of stochastic orders, used to determine if one random variable dominates another random variable. Most stochastic orders are not complete orders, but rather partial orders, since not every pair of distributions are orderable. It is necessary, for the purposes of the following theory, to state the definition of three types of (partial) stochastic orders, expected value dominance, first-order stochastic dominance, and likelihood ratio dominance.1 Definition 3.2.1. Let X and Y be random variables with positive support. 1. Expected Value Dominance: The random variable X is said to be dominate the random variable Y in expectation if, E{X} ≥ E{Y} (3.1) 2. First-order Stochastic Dominance: The random variable X is said to be stochastically larger than the random variable Y if, P(X > w)≥ P(Y > w) for all w. (3.2) This is denoted by X ≥st Y . 3. Likelihood Ratio Dominance: Let X ∼ f (x) and Y ∼ g(x), where “∼” means “distributed as.” Then X dominates Y in a likelihood ratio sense if, f (x) g(x) increases in x. (3.3) This is denoted by X ≥LR Y . 1For the purposes of this paper, we will only need to consider random variables that are confined to a positive domain. 27 The likelihood ratio order is a stronger condition than first-order stochastic dominance, thus X ≥LR Y ⇒ X ≥st Y , but the converse, in general, is not true. First-order stochastic dominance also implies dominance in expectation. Thus, X ≥LR Y ⇒ X ≥st Y ⇒ E{X} ≥ E{Y} (3.4) The problem of maximizing the field-life of a stockpile of n items is discussed in [78](p.118) and will be restated here. Each of the n items has a ranking, rep- resented by the random variables Xi, i = 1,2, . . . ,n. The field-life of the ith item if released into the field at time t is Li(t) = Xid(t). The items are assumed to be placed into the field sequentially and replace immediately the moment the previ- ous item expires. That is, when item i reaches the end of its field-life, item i+ 1 will immediately be placed into the field. The function d(t), termed the field-life survival function, models how the rankings change over time. For the example in [78], it is assumed that d(t) is a non-negative, non-decreasing, concave function, thus the items are assumed to improve with time. The logic behind this particular choice of d(t) is clearly dependent upon the particular application. A relevant ex- ample for this form of d(t) is a stockpile of batteries, each in their own charger, where the charge of each battery increases more quickly at a low charge than when the battery is reaching a full charge. Conversely, the case of d(t) being decreas- ing function would fit an example where the products are worsening in time in the stockpile, such is the case in perishable inventory.2 The main result is that given a likelihood ratio ordering of the ranks, X1 ≤LR X2 ≤LR . . .≤LR Xn, and non-negative, non-decreasing, concave function d(t), the policy which issues the first item, then the second item, and so on, until the last item, will maximize the probability of a maximum total field-life. That is, the theory specifies to issue items in order of increasing likelihood ratios of their respective distributions. 
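Definition 3.2.1(3) suggests a simple computational test for two distributions on a common finite support: X dominates Y in the likelihood ratio sense if the ratio of probability mass functions f(x)/g(x) is non-decreasing in x. The sketch below is a minimal implementation of that check for discrete PMFs; it is illustrative only and assumes both PMFs are strictly positive on the shared support.

```python
import numpy as np
from math import comb

def lr_dominates(f, g):
    """Return True if a random variable with PMF f dominates one with PMF g
    in the likelihood ratio sense, i.e. f[x]/g[x] is non-decreasing in x.
    Both PMFs are given on the same ordered support and assumed positive."""
    ratio = np.asarray(f, dtype=float) / np.asarray(g, dtype=float)
    return bool(np.all(np.diff(ratio) >= 0))

# Example: two binomial PMFs on {0, ..., N}; the one with the larger success
# probability dominates (cf. Lemma 3.4.3 later in the chapter).
N = 10
pmf = lambda p: np.array([comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1)])
print(lr_dominates(pmf(0.7), pmf(0.4)))   # True
print(lr_dominates(pmf(0.4), pmf(0.7)))   # False
```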
The alternate case, for d(t) a non-negative, non-decreasing, convex function, implying that items are improving at an increasing rate while in the stockpile, is 2This is clearly the more relevant case for the problem of perishable inventory; however, there is a surprising lack of literature that analyze decreasing d(t) in the stochastic case. 28 modified and applied in the following section to determining an optimal target prioritization. For this case, the optimal order to maximize the total expected field- life is to issue items in decreasing ranks; n→ n−1→ . . .→ 1. The proof follows from a similar argument discussed in Ross [78] but is presented in the Appendix C in the context of our problem for intuition behind the operation of the scheduler. 3.2.2 Target Prioritization The optimal issuing policy provided by Ross [78] can be applied to the problem of determining an optimal order to observe the L targets. First, we define the threat rank of a target as a quantity which is assigned based on the target’s rank, X ln. As can be seen later, in Section 3.4, a target, l, with a high level of incentive (a high level of threat) at time n is assigned a high rank mean, µX ln . The incentive (immediate reward) of observing a particular target is analogous to the field-life of an item, as discussed in the optimal issuing problem statement in Section 3.2.1. The immediate reward of observing target l at an accumulated reward value, x, is defined as T ln = X l nd(x). The function d(x), seen in Figure 3.2, represents the valuation (utility) of the immediate reward of observing the current target, based on a given level of accumulated reward, x. The form of d(x) is positive (d(0)> 0), d(x) x d(0) Figure 3.2: Utility function, d(x), representing the valuation of an observa- tion at a given value of accumulated threat reward. non-decreasing, convex function, indicating that we value a gain, x̄ > 0, in threat reward more at a higher accumulated reward value, x2, than an equivalent gain at a lower value, x1. That is, d(x2+ x̄)−d(x2)> d(x1+ x̄)−d(x1) for x2 > x1 (3.5) 29 The intuition behind the shape of this function can be understood by the following argument. Notice that x is an accumulated variable, as is time, t, in the example discussion in [78]. As targets in the queue are observed, the accumulated reward, x, is incremented accordingly. It is clearly evident that an accumulation of a threat reward x̄ requires a positive amount of time t. As a result, a higher accumulated reward translates into a longer time before some targets are observed. Thus, the perceived threat of these targets should be higher and as a result, a higher util- ity for their observation is assigned the longer we wait to make the observation, thus justifying the increasing slope of the function d(x). In summary, the goal of the macro-manager is to find an order to maximize the total accumulated reward, which is accumulated at an amount dependent upon the threat estimate and threat error of the target under observation. Thus, essentially, we are ordering our target observations to maximize the reward proportional to the incentive for observing each target. The ordering problem is now formally presented in the context of the target scheduling problem. The set O contains all L! possible permutations of the or- ders of target observations. We define an order as o = (o1,o2, . . . ,oL) ∈ O which corresponds to one of the permutations of O . Each of elements o1,o2, . . . ,oL is assigned a unique target index from 1,2, . . . 
,L such that no two elements have the same target index. The optimal order, denoted o∗, maximizes the expected total threat reward, denoted T oLn . That is, o∗ = argmax o∈O {E{T oLn }} (3.6) T̄ oLn = maxo∈O {E{T oLn }} (3.7) where the reward T oLn is defined recursively as follows, T o1n = X̄ o1 n d(0) (3.8) T o2n = T o1 n + X̄ o2 n d(T o1 n ) (3.9) ... (3.10) T oLn = T oL−1 n + X̄ oL n d(T oL−1 n ) (3.11) 30 where X̄oln is the realization of the random threat rank variable, X ol n . Note that T oL n itself is a random variable, and dependent upon the order in which we observe targets, making the problem of determining the optimal order non-trivial. The optimal ordering policy for the target observations is stated formally in the following theorem. Theorem 3.2.2. Suppose that there are L targets with ranks, X1n ,X2n , . . . ,XLn which can be arranged according to the likelihood ratio order (Definition 3.2.1) as fol- lows, X1 ≤LR X2 ≤LR . . .≤LR XL (3.12) Then for a non-negative, non-decreasing, convex utility function d(x), the policy that maximizes the probability for all x that the total reward exceeds x, is to observe the targets in the following order, o∗ = (L,L−1, . . . ,1) (3.13) and is termed the optimal order. Proof. See Appendix C.1.4. The above theorem states that the optimal order, defined in Equation (3.6), cor- responding to the maximum threat reward, Equation (3.7), is achieved by ordering the observations according to decreasing likelihood ratios of their respective ranks, X1n ,X 2 n , . . . ,X L n . We now state the details of the modeling of the threat and priority of each target, necessary for the computation of the aforementioned priority ranks and optimal observation queue. 31 3.3 Threat Function The definition of the problem of threat assessment, and of threat itself, can vary widely depending on the context. A fairly general definition of threat assessment, provided by Allouche et al. [80], which seems to apply to all treatments of threat analysis in the literature, is as follows: “The analysis of the past, present and ex- pected actions of external entities, and their consequences, to identify menacing situations and quantitatively establish the degree of their impact on the mission, the intents, the plans, the actions and the human and material assets of some valu- able units to be protected, taking into account the defensive actions that could be performed to reduce, avoid or eliminate the identified menace.” The precise inter- pretation of threat is dependent on the particular situation in question. In [68], the author focuses on a scenario where an UAV must reach a destination by taking the path of least threat. A high threat value in this context corresponds to a scenario where the UAV is likely to be detected by a radar. Threat in [72] is defined from the perspective of vehicle tracking systems, and thus refers to a scenario where two objects may collide. The definition of threat in the context of target tracking and radar systems typically refers to the perceived risk that a target presents to the radar itself. Since in our application, the radar measurements are assumed to be of ground-moving targets performed via an aircraft-mounted GMTI radar, we do not maintain the same definition of target threat. Instead, we consider a set of sensi- tive assets; a collection of areas which we wish to protect from enemy targets, and compute the threat of each target in relation to these assets. 
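As a small illustration of Theorem 3.2.2 and the recursion in Equations (3.8)–(3.11), the expected total reward of different observation orders can be compared by Monte Carlo: draw realizations of the ranks, accumulate the reward for a candidate order, and average. The sketch below is a toy check under assumed binomial ranks and an assumed convex utility d(x); the distributions, utility, and sample size are placeholders, not values from the thesis simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

def total_reward(order, ranks, d):
    """Accumulated reward of Equation (3.11) for one realization of the ranks,
    processing targets in the given order."""
    total = 0.0
    for l in order:
        total += ranks[l] * d(total)
    return total

def expected_reward(order, p, N, d, runs=20000):
    """Monte Carlo estimate of the expected total reward for ranks Bin(N, p[l])."""
    vals = [total_reward(order, rng.binomial(N, p), d) for _ in range(runs)]
    return float(np.mean(vals))

d = lambda x: 1.0 + 0.1 * x**2            # non-negative, non-decreasing, convex
p = np.array([0.2, 0.5, 0.8])             # target index 2 has the largest rank (LR sense)
N = 10

print(expected_reward([2, 1, 0], p, N, d))  # decreasing-LR order of Theorem 3.2.2
print(expected_reward([0, 1, 2], p, N, d))  # increasing-LR order, lower on average
```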
Some papers in the literature deal with the the problem of threat evaluation in the context of military applications and defence. The problem of asset protection is discussed in [18, 80] (termed threat reference point); however, both references lack an explicit mathematical formulation for determining the threat to these as- sets based on information on the target behaviour. Their analysis is thorough, but only in a qualitative sense; with the work of [79] providing a thorough review of threat evaluation and weapon assignment. More detailed formulations of threat functions are discussed in [42, 64, 66]. The analysis for threat evaluation in [42] is broken down into three main components: 1) Proximity: A quantity Time-to- Closest-Point-of-Approach (TPCA) is defined which represents the expected time 32 for the target in question to reach a particular region centered around an asset (with a lower TPCA value representing a higher threat); 2) Capability: based on esti- mations of the target’s weapons systems, fuel capacity, and general lethality of a target; 3) Intent: a value which can be inferred from various factors such as head- ing, range rate, and altitude. The discussion in [64] is slightly more general than that of [42] and includes a decision tree for target priority assessment which takes into account factors such as target trajectory (heading, range rate), hostile/friendly ID, weapon systems, and track quality. The threat function formulation discussed in [66] makes an assumption that the target poses no threat to an asset if its rate of motion is directed away from the given asset. In reality, due to a potential abil- ity for the target to change directions in a short time frame, the proximity of the target-asset pair should still be taken into account, even if the velocity is directed away from the asset. This extension is considered in our formulation. It is the aim of the following sections to provide an explicit, mathematical link between target behaviour and fixed assets and the respective levels of threat, building upon the work already developed throughout the literature. 3.3.1 Threat Estimate In this section, we address the factors that can be used to determine a target’s level of threat. We are provided with estimates of the targets’ states, both position and velocity, as well as knowledge of any sensitive assets in the environment. We start with determining the threat of a given target to a particular asset in our system. An asset could be a friendly target that we are trying to protect or simply just a region we wish the target to stay away from. In any case, the closer the target is to a given asset, the higher we should consider its threat. Note that assets are defined as points inR2 and are chosen by the system designer. Consider the following scenario, with two targets and three assets as seen in Figure 3.3. The coordinates (xp1 ,y p 1),(x p 2 ,y p 2), ...,(x p A,y p A) represent the locations of our assets, where A is the total number of assets. The quantity ṙl = [ẋl, ẏl] represents the instantaneous velocity vector of target l ∈ L = {1,2, ...,L} and ra,l represents the range vector from target l to asset a. We wish to calculate the component of a given target’s velocity in the direction of a particular asset. Consider Figure 3.4 for the 33 Future (unknown) tracks Prior (known) tracks Figure 3.3: Multi-target, multi-asset scenario. structure of a sample calculation of the threat of target l to asset a. 
Figure 3.4: Radial velocity component and range vector used to determine threat of target to asset. Define the target-asset threat (TAT) function as, γ̃a,l = ṙa,l ra,l (3.14) 34 where, ṙa,l = [xpa − xl,ypa − yl] · [ẋl, ẏl]√ (xpa − xl)2+(ypa − yl)2 (3.15) ra,l = √ (xpa − xl)2+(ypa − yl)2 (3.16) The vector ṙa,l is the magnitude of the projection of the target’s velocity vector onto the direction vector between target l and asset a. Also, the value ra,l is the distance between the target and the asset. This TAT function is directly proportional to the velocity of the target with respect to the asset, that is, γ̃a,l will be positive when the target is moving towards the asset and negative when the target is moving away from the asset. Also, the function is inversely proportional to the distance between the target and asset. This makes intuitive sense, as when the target is close to the asset, it should be considered to be more of a threat. There are some issues with the current formulation of the TAT function that should be addressed. Consider the case where we have two assets (xp1 ,y p 1) and (xp2 ,y p 2), and the target is located equidistant between the two but moving directly towards, say, the first asset. At this moment, the TAT functions values between the target and the two assets will exactly cancel out, that is, γ1,1 = ṙ1,1 r1,1 and γ2,1 = ṙ2,1 r2,1 = ṙ2,1 r1,1 =− ṙ1,1 r1,1 ⇒ γ1,1+ γ2,1 = 0 (3.17) Thus, in the logical next step where we wish to combine the TAT functions for multiple targets and assets to compute the total threat of a target, a simple addition would result in the threat of the above target to be zero, when in reality, it is still a very real threat to the asset located at (xp1 ,y p 1). To alleviate this problem, we restrict the TAT functions to map to only positive numbers. One way of doing this is to use the exponential function as follows, γa,l = eṙa,l ra,l (3.18) The threat value is exponential in the projected velocity ṙa,l , and thus negative quantities are mapped to a dampened positive value. Since the TAT function for 35 each target-asset pair is guaranteed to be a positive number, we can perform a weighted sum to determine the threat of the lth target to all of the assets in our system, Γ(sl) = A ∑ a=1 waγa,l (3.19) where wa > 0,∑a wa = 1, are the importance weights of each asset in our system. A high value of wa implies that we consider asset a as an important asset. In essence, the TAT function considers both the proximity threat as well as the threat associated with the direction and velocity in relation to all assets in our surveillance region. The formal definition for the TAT function is as follows, Definition 3.3.1 (Target-Asset Threat Function). The Target-Asset Threat (TAT) function, denoted Γ, maps the state vector of target l, sl ∈R4×1 to a positive threat value, for each target, rl ∈ R+. That is, Γ : R4×1→ R+, rl = Γ(sl) = A ∑ a=1 waγa,l (3.20) where γa,l is defined in Equation (3.18). For some intuition behind the calculated Γ(sl) value, consider the following argument. By computing this quantity, we are essentially calculating the target’s capability to be a hazard to the assets in the system. Following the argument out- lined in [18], the logic here is that, in general, a target will not have an intent to attack unless it is capable of one. Thus by computing the capability we are thus gaining information on its intent and can the use Γ(sl) as a measure of the target’s threat. A typical target-asset scenario can be seen in Figure 3.5. 
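The target-asset threat function of Definition 3.3.1, with the exponential form of Equation (3.18), can be evaluated in a few lines. The sketch below uses the asset locations and importance weights quoted for the scenario of Figure 3.5; the target state is an illustrative placeholder.

```python
import numpy as np

def tat(s, assets, weights):
    """Target-asset threat Gamma(s) = sum_a w_a * exp(rdot_a) / r_a.
    s       : target state [x, x_dot, y, y_dot]
    assets  : asset positions, shape (A, 2)
    weights : importance weights w_a (positive, summing to one)"""
    x, vx, y, vy = s
    total = 0.0
    for (xa, ya), w in zip(assets, weights):
        dx, dy = xa - x, ya - y
        r = np.hypot(dx, dy)                 # target-asset distance, Equation (3.16)
        rdot = (dx * vx + dy * vy) / r       # velocity towards the asset, Equation (3.15)
        total += w * np.exp(rdot) / r        # exponential TAT term, Equation (3.18)
    return total

# Asset locations and weights from the scenario of Figure 3.5; target state assumed.
assets = np.array([[220.0, -150.0], [-170.0, 230.0], [400.0, 20.0]])
weights = np.array([1/3, 1/2, 1/6])
s = np.array([0.0, 2.0, 0.0, -1.5])
print(tat(s, assets, weights))
```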
As can be seen from the figure, the state estimates do not provide completely accurate information of the true target tracks, that is, there is an error associated with the state estimates. As a result, we would also expect a discrepancy between the true threat and the estimated threat. This is indeed the case, as seen in Figure 3.6. 36 −400 −200 0 200 400 600 800 −500 0 500 1000 x−coordinate (in meters) y− co or di na te (in m ete rs) Figure 3.5: A typical target-asset scenario. True target tracks are repre- sented by the colored (solid) tracks, whereas the dashed lines repre- sent the state estimates. There are a total of three assets in the system, which are assumed to be fixed with coordinates (all in units of meters) (xp1 ,y p 1) = (220,−150), (xp2 ,yp2) = (−170,230), and (xp3 ,yp3) = (400,20) with importance weights w1 = 1/3, w2 = 1/2, w3 = 1/6, respectively. The radar platform is moving along a linear, constant velocity path with states, [ ξxk , ξ̇xk ,ξyk , ξ̇yk ]T = [10km,53m/s,−30km,85m/s]T at a con- stant altitude of ξz = 5575. Consider a set of optimal target state estimates, denoted ŝln, provided from our Bayesian tracker. Then one may expect the optimal mean, r̂ln, of the threat estimate to simply be, r̂ln = E{rln} ≈ Γ(ŝln) (3.21) However, the above expression is not accurate if the function, Γ, is highly nonlin- ear or the level of noise is high [60]. This arises from the fact that the expectation of a function of a random variable is, in general, not equal to the function of the ex- pected value of a random variable. That is, r̂ln = E{rln}= E{Γ(sln)} 6= Γ(E{sln}) = 37 Γ(ŝln), where sln denotes the true, unknown state vector of each target.3 Thus we wish to find a better approximation of the threat estimate to the true threat value. One way of doing this is to expand Γ(s) in a Taylor series expansion. That is, rln = Γ(s l n) = Γ(ŝ l n)+∇sΓ (s l n− ŝln)+ 1 2 ∇2sΓ (s l n− ŝln)⊗ (sln− ŝln)+ . . . (3.22) Where, for the sake of notation, we have omitted the notation denoting that the partial derivative Jacobian, ∇sΓ ∈R1×4, and Hessian, ∇2sΓ ∈R1×4 2 , are dependent upon the target, l, and time, n, indices since they are evaluated at the state estimate, sln = ŝ l n. The Jacobian vector is as follows (the Hessian is omitted due to its large size), ∇sΓ= A ∑ a=1 ( wae −(ẋx̄a+ẏȳa) ra [ (ẏx̄a−ẋȳa)ȳa−rax̄a r4a − x̄ar2a (ẋȳa−ẏx̄a)x̄a−raȳa r4a − ȳar2a ]∣∣∣ s=ŝ ) (3.23) where x̄a = x− xpa , ȳa = y− ypa , and ra = √ x̄2a+ ȳ2a. It should be noted that the Jacobian and Hessian can be computed numerically, where we would only rely on function evaluations to approximate the entire function, the essence of Taylor series expansions. However, in this problem, since we have complete knowledge of the threat function, we are able to analytically compute the Jacobian and Hessian offline as a function of the state, providing the advantage of an immediate value for a given state. If we now only consider the first order approximation, that is, keeping the first two terms of the RHS of Equation (3.22), then take expectations of both sides, we obtain, E{rln}= r̂ln = E { Γ(ŝln)+∇sΓ (s l n− ŝln) } (3.24) = E { Γ(ŝln) } +∇sΓ E { (sln− ŝln) } (3.25) = Γ(ŝln) (3.26) Since (sln− ŝln) is assumed to be zero-mean, then E{(sln− ŝln)}= 0. We can see that 3This should be clear for readers who are aware of Jensen’s inequality. 38 the above does not give us any more information than Equation (3.21). This first- order estimate can be seen in Figure 3.6 as the black-dotted line. 
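The gap between E{Γ(s)} and Γ(E{s}) noted above is easy to reproduce numerically: perturb a nominal state with zero-mean Gaussian errors drawn from an assumed covariance and compare the Monte Carlo mean of Γ(s) with Γ(ŝ). The covariance below is an assumption chosen only to make the bias visible, and the threat function is restated in compact form so the sketch runs on its own.

```python
import numpy as np

rng = np.random.default_rng(1)

assets = np.array([[220.0, -150.0], [-170.0, 230.0], [400.0, 20.0]])
weights = np.array([1/3, 1/2, 1/6])

def tat(s):
    """Compact restatement of the TAT function of Definition 3.3.1."""
    dx, dy = assets[:, 0] - s[0], assets[:, 1] - s[2]
    r = np.hypot(dx, dy)
    return float(np.sum(weights * np.exp((dx * s[1] + dy * s[3]) / r) / r))

# Assumed state estimate and (illustrative) error covariance of that estimate.
s_hat = np.array([0.0, 2.0, 0.0, -1.5])
P = np.diag([50.0, 0.5, 50.0, 0.5])

samples = s_hat + rng.multivariate_normal(np.zeros(4), P, size=20000)
mc_mean = np.mean([tat(s) for s in samples])

print("Gamma(s_hat)     :", tat(s_hat))
print("E{Gamma(s)} (MC) :", mc_mean)   # differs from Gamma(s_hat); this is the
                                       # bias the second-order term corrects
```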
We now consider the second order approximation, by keep the first three terms of Equation (3.22) and taking expectations, E{rln}= r̂ln = E { Γ(ŝln)+∇sΓ (s l n− ŝln)+ 1 2 ∇2sΓ (s l n− ŝln)⊗ (sln− ŝln) } (3.27) = Γ(ŝln)+ 1 2 ∇2sΓ E { (sln− ŝln)⊗ (sln− ŝln) } (3.28) = Γ(ŝln)+ 1 2 ∇2sΓ vec ( (sln− ŝln)(sln− ŝln)T ) (3.29) = Γ(ŝln)+ 1 2 ∇2sΓ vec ( Pln ) (3.30) Where the steps (3.28) and (3.29) are obtained from [60]. The vec() operator stacks the columns of a matrix argument, thus the quantity vec ( Pln ) ∈ R42×1 is a vector created by stacking the columns of the covariance matrix Pln. Equation (3.29) offers a refined estimate; the accuracy of which can be observed as the red-dotted line in Figure 3.6. The additional term, 12∇ 2 sΓ vec ( Pln ) , an approximation of the bias, is necessary in the presence of nonlinearities of Γ and noise in the state sln to ensure that filter estimates are unbiased [6, 26, 55, 60]. Given the knowledge of the uncertainty in our state estimates (contained in the error covariance matrix Pln), and now that we have reasonable estimates of the threat level, we can better estimate the error in these estimates. But first, we analyze a small extension to problem of assigning asset weights. 3.3.2 Asset Weight Assignment We now discuss a short extension of the problem of assigning weights to the assets. The weight of an asset is dependent upon a measure of its vulnerability. In typical tracking scenarios, chiefly in urban-surveillance scenarios, the motion of targets is restricted to roadways. Assuming that the assets we wish to protect are located on roadways as well, we aim to compute the the weight of each asset as a represen- tation of its accessibility from nearby roadways. The tools used to determine this vulnerability weight are obtained from the field of eigenvector centrality. Google’s PageRank algorithm is a form of eigenvector centrality, discussed in [21]. 39 200 400 600 800 1000 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 Measurement iterations Tr ue a nd E st im at ed T hr ea t o f T ar ge t 1 200 400 600 800 1000 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 Measurement iterations Tr ue a nd E st im at ed T hr ea t o f T ar ge t 2 200 400 600 800 1000 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 Measurement iterations Tr ue a nd E st im at ed T hr ea t o f T ar ge t 3 200 400 600 800 1000 0 0.5 1 1.5 2 2.5 Measurement iterations Tr ue a nd E st im at ed T hr ea t o f T ar ge t 4 Figure 3.6: Threat level estimates compared to the true threat value for L = 4 targets. Starting in the top-left and moving right, plots correspond to target indices l = 1,2,3,4. The solid (colored) line represents the true threat value, whereas the black-dotted and red-dotted lines represent the first-order and second-order approximations respectively. Measurement iterations of length T = 0.1s. Consider the following road network, in Figure 3.7, representing a typical urban-surveillance scenario with assets situated at either the intersections of roads or at the termination of a road. Assuming a ground map of all access routes to each asset is known, we can construct a graph G = {V,W} with a node set V and a weighted adjacency matrix W , as in Figure 3.8. The node set V consists of all assets, all intersections of routes, and all route termination points. The weighted adjacency matrix is W ∈ R|V |×|V |, where |V | (represents the cardinality of set V) is the total number of nodes in the graph. The matrix W contains a positive element Wi j if and only if there is an access route from node j to i. 
We assume that all 40 Figure 3.7: Typical ground map with 14 nodes. The three red dots indicate asset locations. There are three road types; large, medium and small, representing the accessibility of the respective road way. ŵ1 ŵ2 ŵ3 ŵ1 ŵ1 ŵ1 ŵ2 ŵ1 ŵ1 ŵ1 ŵ1 ŵ2 ŵ1 ŵ1 ŵ1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Figure 3.8: Graphical representation of Figure 3.7. Filled nodes indicate as- set locations, defined by the asset index set ī = (1,8,11), where hollow dots indicate road termination points or intersections. The three road types are assigned respective weights (ŵ1, ŵ2, ŵ3). 41 weights Wi j are contained in [0,1/U ], and that each path weight, represented by the values (ŵ1, ŵ2, ŵ3), is defined by the system-designer to be proportional to the ease with which a target can travel along the corresponding route. In other words, a narrow alley-way would be assigned a weight near zero, whereas a super high-way would be assigned a weight near 1/U . To account for the possibility that a target might approach an asset from a non-access route (i.e. over a rocky terrain, across a river, or through a series of interconnecting buildings), we assign to the map a constant d ∈ [0,1/U ] indicating the possibility that a target might travel along a non-access route. For instance, if all assets were located in the middle of a desert, then d would be near 1/U since there would be little need for a target to remain on a designated road. We now use the weighted adjacency matrix to compute the eigenvector cen- trality. We define a new matrix, termed the adjusted weighted adjacency matrix as Ŵ ∈ R|V |×|V | where each element satisfies Ŵi j = maxi, j∈V{Wi j,d}. For any choice of d ∈ (0,1/|V |], the matrix Ŵ is a positive matrix, and thus by the Perron- Frobenius Theorem, Ŵ will have a unique positive right-eigenvector, m ∈ R|V |×1, corresponding to the eigenvalue equal to the spectral radius of Ŵ . The vector m is thus the principal eigenvector of Ŵ . We define m̄ as the normalized eigenvector, m, such that m̄i = mi/(∑i∈V mi), resulting in ∑i∈V m̄i = 1. Now, we can define the well-known measure of the graph centrality of node i ∈V as the element m̄i. Recall, the precise weights of the roads are chosen by the designer to represent how easily a target can move along a respective path. We define them arbitrarily as follows, (ŵ1, ŵ2, ŵ3,d) = (0.8,0.5,0.1,0.05)/|V |. For these values, the graph in Figure 3.8, has the associated weighted adjacency matrix and corresponding normalized principal eigenvector of, 42 W =  d ŵ2 d d d d d d d d d d d d ŵ2 d ŵ1ŵ1 d d d d d d d d d d d ŵ1 d d d d d d d d d d d d d ŵ1 d d ŵ1 d d ŵ1 d d d d d d d d d ŵ1 d ŵ1ŵ2 d d d d d d d d d d d ŵ1 d d d d d d d d d d d d d ŵ2 d d ŵ2 d d ŵ3 d d d d d d ŵ1 d d ŵ2 d ŵ1ŵ1 d d d d d d d d d d d ŵ1 d d d d d d d d d d d d d ŵ1 d d ŵ1 d d ŵ1 d d d d d d ŵ3 d d ŵ1 d ŵ1ŵ1 d d d d d d d d d d d ŵ1 d d d d d d d d d d d d d ŵ1 d d d d d d d d d d d d ŵ1 d d d d  , m̄ =  0.032 0.066 0.040 0.095 0.071 0.041 0.055 0.113 0.054 0.051 0.102 0.129 0.090 0.059  T (3.31) We now index the above vector, m̄, by the asset index set ī = [1,8,11] to obtain a measure of the vulnerability the assets in our system. Notice that the asset vul- nerabilities in this example are m̄ī = [0.032,0.113,0.102] indicating that the assets index by node index i = 8 and i = 11 are roughly twice as vulnerable as the asset indexed by i= 1. 
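Computationally, the construction above reduces to a few lines of linear algebra: floor the adjacency weights at d, take the principal eigenvector of the resulting positive matrix (real and positive by the Perron-Frobenius Theorem), normalize it, and read off the entries at the asset nodes. The sketch below does this for a small assumed graph, not the 14-node map of Figure 3.7, whose full adjacency structure is not reproduced here.

```python
import numpy as np

def asset_vulnerabilities(W, d, asset_nodes):
    """Normalized eigenvector centrality of the asset nodes.
    W           : weighted adjacency matrix, entries in [0, 1/|V|]
    d           : off-road accessibility floor in (0, 1/|V|]
    asset_nodes : indices of the asset nodes in the graph"""
    W_hat = np.maximum(W, d)                      # adjusted (positive) adjacency matrix
    eigvals, eigvecs = np.linalg.eig(W_hat)
    m = np.abs(np.real(eigvecs[:, np.argmax(np.real(eigvals))]))  # Perron vector
    m_bar = m / m.sum()                           # normalized centrality
    return m_bar[asset_nodes]

# Small illustrative 5-node graph with two assets (nodes 0 and 3).
V = 5
w1, w2, d = 0.8 / V, 0.5 / V, 0.05 / V
W = np.zeros((V, V))
W[0, 1] = W[1, 0] = w1      # a large road between nodes 0 and 1
W[1, 2] = W[2, 1] = w2
W[2, 3] = W[3, 2] = w1
W[3, 4] = W[4, 3] = w2
print(asset_vulnerabilities(W, d, asset_nodes=[0, 3]))
```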
Using the importance weight, wa, defined in the previous section, we now define the weighted vulnerability value for each asset a as, w̄a = m̄īawa ∑a m̄īawa (3.32) such that ∑a w̄a = 1. Recall the definition of the TAT function from Section 3.3.1, but now with the updating asset weights w̄a, rl = Γ(sl) = A ∑ a=1 w̄aγa,l (3.33) where γa,l is defined in Equation (3.18). The TAT function now considers the prox- imity of the threat, the accessibility of the asset, and also the threat associated with the target velocity in relation to all assets in our surveillance region. 43 3.3.3 Threat Estimate Uncertainty From the previous sections, we have a function which maps the estimate of the target states to an estimate of the target threat. Since the state estimates themselves are not precise, that is, there is an associated error covariance matrix, the resulting threat estimate also has a degree of uncertainty. Using the error covariance matrix provided at the end of the previous micro-manager tracking interval, Pln (beginning of the current macro-manager interval), and knowledge of the exact form of the threat function, we wish to derive the associated threat variance, eln, corresponding the the threat estimate, r̂ln. The general problem of determining the propagation of uncertainty can be stated formally as follows, Given a nonlinear function f : RN → RM and a random vector x ∈ RN with an associated mean x̂ and error covariance matrix Px ∈ RN×N , define y = f (x) where y ∈ RM. Determine the resulting mean ŷ and error covariance matrix Py ∈ RM×M of the random vector y. There are many different approaches one can take to obtain a solution to the above problem. The methods found in the literature can be divided into two main categories: empirical and analytical. Empirical methods, and the most general of all the approaches, largely fall under the classification of Monte Carlo methods [4, 33]. However, these approaches are very computationally expensive and cannot be performed in real-time. Conversely, the analytical methods of finding a solu- tion to the above problem rely on finding a valid approximation of moments of the function of random variables. The usual method for this approximation is via Taylor series expansion. Some methods in the literature only include up to the first-order term [65], which are reasonable approximations assuming the function is roughly linear in the region near the linearization. However, when the function exhibits highly nonlinear behaviour, as in the case of the threat function, we are re- quired to retain higher-order terms of the series to achieve more accurate analytic estimates. Various papers in the literature discuss second-order moment approx- imations [25, 56, 96, 100], albeit, many of them contain awkward notation and complicated representations. A very nice treatment of expressing higher order ap- 44 proximations to moments of functions of random variables is seen in [60] where an explicit representation of the Taylor expansion is presented using Kronecker products and partial derivative block matrices. We start by defining e = sln− ŝln, where e ∈R4×1 is assumed to be a zero-mean random vector. We wish to compute the error covariance of the threat, defined as (where we have omitted the target index l and the time index n), PΓ = E { (r−E{r})(r−E{r})T} (3.34) where r is the true threat value. 
If we only take the first-order approximation of Equation (3.22) and compute the error covariance matrix4 then the resulting ap- proximation is, PΓ ≈ E { (r−E{r})(r−E{r})T} (3.35) = E { (Γ(ŝ)+∇sΓ e−Γ(ŝ))(Γ(ŝ)+∇sΓ e−Γ(ŝ))T } (3.36) = E { (∇sΓ e)(∇sΓ e)T } (3.37) = E { ∇sΓ eeT (∇sΓ)T } (3.38) = ∇sΓ E { eeT } (∇sΓ)T (3.39) = ∇sΓ P (∇sΓ)T (3.40) where P is the error covariance matrix Pln with indices l and n omitted. The result of this error covariance approximation can be seen in Figure 3.9, which is evidently not a very good estimate of the true error. We now present the second-order approximation to the error covariance of the threat. The method is similar; however, we now keep two terms of the expansion in Equation (3.22). Assuming Gaussian noise, the expression for the approximation of the error covariance of the threat is, PΓ ≈ ∇sΓ P (∇sΓ)T + 14∇ 2 sΓ ( D− ssT )(∇2sΓ)T (3.41) 4In this case, the error covariance matrix is in fact just a scalar, since Γ : R4×1→ Rm×m, where m = 1. In general the error covariance matrix of the output variable is of dimension m×m. 45 100 200 300 400 500 0 0.05 0.1 0.15 0.2 Measurement iterations Tr ue a nd E st im at ed T hr ea t E rro r, l= 1 100 200 300 400 500 0 0.05 0.1 0.15 0.2 0.25 0.3 Measurement iterations Tr ue a nd E st im at ed T hr ea t E rro r, l= 2 100 200 300 400 500 0 0.01 0.02 0.03 0.04 0.05 Measurement iterations Tr ue a nd E st im at ed T hr ea t E rro r, l= 3 100 200 300 400 500 0 0.2 0.4 0.6 0.8 1 Measurement iterations Tr ue a nd E st im at ed T hr ea t E rro r, l= 4 Figure 3.9: True threat error covariance compared to first-order approxima- tion of threat error. True threat error is computed by taking the absolute difference between the true threat value and the first-order approxima- tion to the threat mean. Colored (not red) lines represent the true threat error where red-dotted lines represent the estimated value. With the details of the derivation presented in the Appendix (Section B.4). The second-oder error estimate accuracy compared to the true threat error can be seen in Figure 3.10. 46 100 200 300 400 500 0 0.05 0.1 0.15 0.2 Measurement iterations Tr ue a nd E st im at ed T hr ea t E rro r, l= 1 100 200 300 400 500 0 0.05 0.1 0.15 0.2 0.25 0.3 Measurement iterations Tr ue a nd E st im at ed T hr ea t E rro r, l= 2 100 200 300 400 500 0 0.01 0.02 0.03 0.04 0.05 Measurement iterations Tr ue a nd E st im at ed T hr ea t E rro r, l= 3 100 200 300 400 500 0 0.2 0.4 0.6 0.8 1 Measurement iterations Tr ue a nd E st im at ed T hr ea t E rro r, l= 4 Figure 3.10: True threat error covariance compared to second-order approx- imation of threat error. The true threat error is computed in the same way as the first-order case (see caption of Figure 3.9); however, we now use the second-order approximation to the threat mean. Colored (not red) lines represent the true threat error where red-dotted lines represent the estimated value. Comparing the first-order error approximation (Figure 3.9) to the second-order approximation (Figure 3.10) for each target clearly shows the improvement in ac- curacy. Note that the “true” threat error is not the same in the two plots, since the threat mean estimate was different in the first and second order cases. 3.4 Threat Rank Function From the previous section, we now have an estimate of the threat value for each target l as well as an associated estimation in the uncertainty of the computed threat value. 
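In code, these two quantities can be obtained together by forming the Jacobian and Hessian of Γ numerically and applying the second-order mean correction of Equation (3.30) with the first-order variance of Equation (3.40); the additional second-order variance term of Equation (3.41) depends on quantities derived in Appendix B.4 and is omitted from this sketch. The state estimate, covariance, and simplified threat function below are illustrative assumptions.

```python
import numpy as np

def numerical_jac_hess(f, s, h=1e-3):
    """Central finite-difference Jacobian (1x4) and Hessian (4x4) of a scalar f."""
    n = len(s)
    jac, hess = np.zeros(n), np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n); ei[i] = h
        jac[i] = (f(s + ei) - f(s - ei)) / (2 * h)
        for j in range(n):
            ej = np.zeros(n); ej[j] = h
            hess[i, j] = (f(s + ei + ej) - f(s + ei - ej)
                          - f(s - ei + ej) + f(s - ei - ej)) / (4 * h**2)
    return jac, hess

def threat_mean_and_var(f, s_hat, P):
    """Second-order mean (Equation (3.30)) and first-order variance (Equation (3.40))."""
    jac, hess = numerical_jac_hess(f, s_hat)
    r_hat = f(s_hat) + 0.5 * np.trace(hess @ P)   # equals (1/2) * Hessian row times vec(P)
    var_r = jac @ P @ jac
    return r_hat, var_r

# Illustrative usage with the simplified TAT function from the earlier sketches.
assets = np.array([[220.0, -150.0], [-170.0, 230.0], [400.0, 20.0]])
weights = np.array([1/3, 1/2, 1/6])
def tat(s):
    dx, dy = assets[:, 0] - s[0], assets[:, 1] - s[2]
    r = np.hypot(dx, dy)
    return float(np.sum(weights * np.exp((dx * s[1] + dy * s[3]) / r) / r))

s_hat = np.array([0.0, 2.0, 0.0, -1.5])
P = np.diag([50.0, 0.5, 50.0, 0.5])
print(threat_mean_and_var(tat, s_hat, P))
```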
Given a target with a particular threat estimate, denoted as the pair (r̂ln,e l n), it would be useful to assign a priority rank to the target. We denote the rank of 47 targets in our system, at time n, by the random variables, X1n ,X 2 n , . . . ,X L n (3.42) where X ln refers to the rank of target l. The rank can be interpreted as a measure of the incentive (the immediate reward) to observe a particular target. Thus the rank assigned to a target is proportional to the level of threat and the uncertainty of that target. The threat rank function function, see Figure 3.1, serves the purpose of assigning this rank to each target. With knowledge of the rank of each target in the surveillance region, we aim to compute an optimal policy to maximize the total (immediate) reward associated with the observations of targets at various levels of threat. It is important to notice that the level of threat of a target is a quantity that is completely controlled by the state of the targets. Since it is assumed that the targets behave independently of the actions of the radar (as opposed to a game theoretic formulation [47, 82]), we have no effect on the value of the threat. We do however, have an effect on the value of the threat uncertainty. By observing a particular target, we consequently reduce the uncertainty in the state estimates, and since the threat error, eln, is based off this uncertainty, we reduce the threat error as a result. The purpose of the macro-manager scheduler is not to simply reduce the un- certainty in the threat error of all targets, regardless of the threat value, but instead, maximize a reward for observing a target at a specific threat error and the threat value. This follows the same argument regarding the semantic approach to priori- tization (as discussed in the motivation section of the introduction). Consider the case where we have an equivalent threat error in two targets of differing threat esti- mates; this should translate into different incentives for observing each target, and thus different rewards. Observe the following example scenario for justification. Consider two targets, l = 1 and l = 2, with threat estimates and errors obeying the following relationships, r̂1n > r̂ 2 n and e l n = e 2 n. (3.43) That is, the value of the threat estimate of target l = 1 is higher than that of target 48 l = 2 but they both have the same threat error. From a rather intuitive standpoint, it is evident that an uncertainty in a high threat should be of more concern than an uncertainty in a low threat. As mentioned earlier, since a target’s rank of a target is a measure of it’s level of threat, our threat reward function should capture this concept and assign a higher rank to target l = 1. The threat rank function is responsible for specifying the rank of each target. We consider both the level of threat and the threat variance in the definition of the rank, X ln = br̂ l n+ e l n (3.44) Given the random variable, we wish to define it by a specific distribution. For simplicity, we consider the family of probability distributions that can be defined by the two-parameters, mean and variance. 
The mean and variance of the above rank can be computed as follows, assuming r̂ln is the only random quantity and e l n is completely deterministic, µX ln = E{X ln} = E{brln+ eln} = bE{rln}+ eln = br̂ln+ e l n (3.45) σ2X ln = Var{X l n} = Var{brln+ eln} = b2Var{rln} = b2eln (3.46) where the coefficient b≥ 0 is chosen by the system-designer to define which factor, threat or error, they consider to be a higher threat. The justification for the form of rank mean and rank variance functions is as follows. First, we qualitatively describe the labeled vertices in Figure 3.11 as fol- lows: (1) Low error in a low threat value, (2) Low error in a high threat value, (3) High error in a low threat value, (4) High error in a high threat value. Recall that 49 0 1 2 3 4 5 0 1 2 3 4 5 2 4 6 8 10 0 1 2 3 4 5 0 1 2 3 4 5 2 4 6 8 10 Estimate, r̂ln (a) (b) µXln (1) σ2Xln Error, eln (3) (4) (2) Estimate, r̂ln Error, eln Figure 3.11: The general form of the functions specifying the target rank mean, µX ln , and target rank variance, σ 2 X ln . The target rank mean in- creases in both the threat estimate, r̂ln, and the threat estimate error, e l n. The target rank variance increases monotonically in eln and is indepen- dent of r̂ln. the rank, X ln, of a target is a measure of its level of threat, thus the target with the highest error (highest uncertainty) in the highest threat estimate should be consid- ered to be the highest threat target and thus be assigned the highest rank mean, µX ln . On the contrary, the target with the lowest error (lowest uncertainty) in the lowest threat estimate should be considered the safest target and be given the lowest rank mean. This intuition is reflected in Figure 3.11(a), with the highest and lowest ranks represented by vertices (4) and (1), respectively. The vertices (2) and (3), corresponding to the cases low error in a high threat, and high error in a low threat, respectively, are shown to be assigned roughly the same rank mean, with a slightly higher rank assigned to case (3). If we are indifferent between the cases (2) and (3), we would set the coefficient b = 1. However, if we consider case (2) to be a higher threat than case (3), we would set b > 1. Conversely, the case b < 1 corresponds to considering case (3) to be a higher threat than case (2). The rank mean function exhibits the following dependency on the threat estimate, r̂ln, and threat error, e l n. 1.) Consider two targets with equivalent threat estimates, r̂1n = r̂ 2 n. Then µX1n > 50 µX2n for e 1 n > e 2 n. 2.) Consider two targets with equivalent threat error estimates, e1n = e 2 n. Then µX1n > µX2n for r̂ 1 n > r̂ 2 n. The above relationships are intuitive: the target rank mean increases in both the threat estimate and the level of threat error. The rank variance is only dependent upon the threat error, by Equation (3.46). 3.4.1 Distribution Assignment Given that we have values for the rank mean, µX ln , and rank variance, σ 2 X ln , of the random variables, X1n ,X 2 n , . . . ,X L n , we now aim to specify the probability distribu- tions of the target ranks. To provide rationale for assigning a particular distribution to the target rank, we perform numerical simulations. We consider a fixed true tar- get trajectory and compute the track estimate multiple times, with each evaluation resulting in a threat value. We then produce a histogram of the threat ranks from the threat estimates, the results of which can be seen in Figure 3.12. 
0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 Threat Rank Pr ob ab ilit y Figure 3.12: The distribution of threat rank values for a given fixed trajectory. The threat rank evaluations result in a unimodal, symmetric distribution. For simplicity, we quantize the threat rank level into N = 10 discrete levels. Thus the rank distribution of each target will have the same (discrete) support. As a result, 51 we consider the class of discrete distributions. Consider the binomial distribution, a discrete probability distribution with the following PMF, Bin(N, p) = ( N r ) pr(1− p)N−r (3.47) with mean, µb =N p, and variance, σ2b =N p(1− p). For a fixed N, the dependence of the mean and variance on the parameter, p, can be seen in Figure 3.13. (a) (b) p = 1/2 = 1 p = 1 σ2bµb N/4 N/ Figure 3.13: The dependency of the (a) mean, µb, and (b) variance, σ2b , of a binomial distribution on the parameter p for a fixed parameter N. From Equations (3.45 and 3.46) we have expressions for both the rank mean and rank estimate. Given that we have fixed the possible threat values to N = 10 for every target, we now must choose the remaining parameter, p∈ [0,1], to best fit µX ln and σ2X ln . We do this by computing the least squares estimation of the parameter, p. We start by defining the objective: we wish to obtain the least squares estimate of the probability parameter, pln, to minimize the sum of the square of residuals. That is, pl∗n = argmin pln∈[0,1] ‖ρ ln‖22 (3.48) The quantity, ρ ln, is defined as, ρ ln = [ µX ln σ2X ln ] − [ N pln N pln(1− pln) ] = [ µX ln−N pln σ2X ln−N p l n(1− pln) ] (3.49) For simplicity, we will omit the time, n, and target, l, indices for the remainder of 52 the calculations. Given the form of ρ , Equation (3.48) represents a nonlinear least- squares problem, with ‖ρ‖22 = (µ−N p)2 + ( σ2−N p(1− p))2. The following lemma characterizes the solution of the nonlinear least-squares problem. Lemma 3.4.1. For any n > 0, mean µ ∈ R and variance σ2 ≥ 0, the non-linear least-squares optimization, p∗ = argmin p∈C ( µ−np)2+ (σ2−np(1− p))2 (3.50) has the following unique real solution, p∗ = 1 2 + a 3(22/3)n ( b+ √ b2+4a3 )1/3 − ( b+ √ b2+4a3 )1/3 3(24/3)n (3.51) where a = 3n(4σ2+n) and b = 54n2(−2µ+n). Proof. See Appendix C.1.1. We refer to the real solution as the least-squares estimate of the parameter p, denoted by pl∗n ∈ R. As can be seen in Figure 3.14, the value is confined to the range [0,1]. We show that this is indeed true in the following lemma. Lemma 3.4.2. The least-squares estimate pl∗n ∈ R is confined to the range [0,1] for N,σ2 > 0 and µ ∈ [0,N]. Proof. See Appendix C.1.2. The fit of the least-squares estimate, pl∗n , is now analyzed. We plot each term of the objective function, ‖ρ ln‖22, in Figure 3.15. The absolute error values are defined 53 σ2Xln µXln 0 0.5 1 1.5 2 2.5 2 4 6 8 10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 pl∗n Figure 3.14: The least-squares estimate of the parameter pln of a binomial distribution for a given rank mean, µX ln , and variance, σ 2 X ln . Notice that for µX ln ∈ [0,N] and σ2X ln ∈ [0,N/4] the least-squares estimate is confined to pl∗n ∈ [0,1]. (a) (b) µXlnµXln σ2Xln |eµ| σ2Xln | | |eσ2 | 0 0.5 1 1.5 2 2.5 0 2 4 6 8 10 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 0 0.5 1 1.5 2 2.5 0 2 4 6 8 10 0 1 2 3 4 5 6 7 8 9 Figure 3.15: The relative errors in the (a) rank mean, µX ln , and (b) rank vari- ance, σ2X ln , obtained from the least-squares estimate, p l∗ n . 
54 as, ∣∣eµ ∣∣= ∣∣∣µX ln−N pl∗n ∣∣∣ µX ln , (3.52) |eσ2 |= ∣∣∣σ2X ln−N pl∗n (1− pl∗n )∣∣∣ σ2X ln (3.53) Note that since the least-square estimate pl∗n is dependent on both the rank mean and rank variance, the error quantities ∣∣eµ ∣∣ and |eσ2 | are also dependent upon these variables. Given the least-squares estimate, pl∗n , the binomial approximation to the rank of each target is defined as, X̂ ln ∼ Bin(N, pl∗n ) (3.54) Recall, from Theorem 3.2.2, we are required to order the ranks of all targets, X1n ,X 2 n , . . . ,X L n according to the likelihood ratio order. Since we have assigned a bi- nomial distribution to approximate each target’s threat rank, denoted X̂1n , X̂ 2 n , . . . , X̂ L n , we are now interested in conditions such that we can order the binomial distribu- tions. Recall that we have quantized the threat for each target to a finite number of discrete levels, thus all targets’ binomial distributions have a common parameter, N. We present the following lemma stating conditions on a binomial distribution’s parameter p, such that likelihood ratio dominance between two binomial random variables is achieved. Lemma 3.4.3. Consider two random variables, denoted K1∼Bin(N, p1) and K2∼ Bin(N, p2) where N is a positive integer. Then K1 ≥LR K2 if and only if, p1 ≥ p2 (3.55) Proof. See Appendix, Section C.1.3. 55 As a result of Lemma 3.4.3, by ordering the binomial parameters we are or- dering the corresponding binomial random variables, X̂1n , X̂ 2 n , . . . , X̂ L n , with respect to the likelihood ratio order. Consequently, the corresponding target observation order is specified, the performance of which is analyzed in the numerical results in Section 5. We now introduce a constraint for the operation of the micro-manager which takes into account the priority rank of each target. Maximum Tracking Time Ratio Vector The purpose of this section is to discuss a method to incorporate the threat level of each target into the operation of the micro-manager. As the reader will see later, the goal of the micro-manager controller is to maximize the mutual information be- tween a target’s observations and states. That is, the micro-manager minimizes the error in the track estimates, defined by their respective error covariances, of each target. An assumption of the macro-manager is that we will process all items in the queue. Consider the case where the micro-manager does not consider the threat of the targets as a factor in its operation. This would result in the micro-manager achieving optimal (in a sense defined by the micro-manager) state estimates of each target, regardless of its level of threat. It is clear that this is not ideal in most scenarios, as we would care much more about the accuracy of states estimates of a high threat target than the accuracy of estimates of a low threat target. Thus, the micro-manager needs to take the threat level of each target into account. We consider the threat of each target at the level of the micro-management controller by imposing a maximum tracking time constraint for each queued target, τ̄oln , dependent upon its level of threat. The justification here is that a target should be allowed a shorter tracking time if it is not considered a high threat, reflecting the fact that we wish to not spend many resources on a low threat target. 
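Putting Sections 3.2 to 3.4 together computationally: each target's rank mean and variance (Equations (3.45) and (3.46)) are mapped to a binomial parameter, and by Lemma 3.4.3 sorting targets by decreasing p produces the likelihood ratio ordering required by Theorem 3.2.2. The sketch below replaces the closed-form root of Lemma 3.4.1 with a simple grid search over p in [0,1], so it is only an illustrative stand-in for that expression, and the threat estimates and errors are placeholder numbers.

```python
import numpy as np

def fit_binomial_p(mu, var, N=10, grid=10001):
    """Least-squares fit of the binomial parameter p (Equation (3.48)) by a
    dense grid search over [0, 1]; a stand-in for the closed form of Lemma 3.4.1."""
    p = np.linspace(0.0, 1.0, grid)
    cost = (mu - N * p) ** 2 + (var - N * p * (1 - p)) ** 2
    return p[np.argmin(cost)]

def observation_order(r_hat, e, b=1.0, N=10):
    """Optimal observation queue: rank mean/variance (Equations (3.45)-(3.46)),
    binomial fit, then sort by decreasing p (Lemma 3.4.3 with Theorem 3.2.2)."""
    mu = b * np.asarray(r_hat) + np.asarray(e)
    var = b**2 * np.asarray(e)
    p = np.array([fit_binomial_p(m, v, N) for m, v in zip(mu, var)])
    return np.argsort(-p), p              # target indices, highest priority first

# Illustrative threat estimates and threat errors for L = 4 targets.
order, p = observation_order(r_hat=[0.3, 0.05, 0.2, 0.8], e=[0.02, 0.01, 0.1, 0.05])
print("observation order:", order, "fitted p:", p)
```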
The maximum tracking time of the l-th target in the queue is defined as,

\bar{\tau}_n^{o_l} = \max\left\{ \left\lceil \iota_n^{o_l} \, \tau_n^{o_1} \right\rceil, \underline{\tau} \right\}     (3.56)

where \tau_n^{o_1} is the optimal stopping time specified by the micro-manager for the first target in the queue, and \lceil \cdot \rceil denotes the ceiling function, which rounds its argument up to the nearest integer. The quantity, \iota_n, is termed the tracking time constraint vector, with each element defined as,

\iota_n^{o_l} = \frac{\bar{\tau}_n^{o_l}}{\bar{\tau}_n^{o_1}} = \frac{\bar{\tau}_n^{o_l}}{\tau_n^{o_1}} \;\overset{\text{def}}{=}\; \frac{ E\{\hat{X}_n^{o_l}\} }{ E\{\hat{X}_n^{o_1}\} }     (3.57)

which specifies the ratio of the tracking time allotted to each target relative to the stopping time of the first target in the queue, \tau_n^{o_1}. We define this ratio as the expected rank of the respective target relative to the expected rank of the first target in the queue. Recall the optimal ordering theory presented in Section 3.2.2 and the result in Equation (3.4); we know that the following is satisfied,

\hat{X}_n^{o_1} \ge_{LR} \hat{X}_n^{o_2} \ge_{LR} \ldots \ge_{LR} \hat{X}_n^{o_L} \;\Rightarrow\; E\{\hat{X}_n^{o_1}\} \ge E\{\hat{X}_n^{o_2}\} \ge \ldots \ge E\{\hat{X}_n^{o_L}\}     (3.58)

for the order (o_1, o_2, \ldots, o_L). Thus, E\{\hat{X}_n^{o_1}\} is the maximum expected rank of all targets. As a result, we can write the tracking time ratios in Equation (3.57) as,

\iota_n^{o_1} = 1 \ge \iota_n^{o_2} \ge \iota_n^{o_3} \ge \ldots \ge \iota_n^{o_L}     (3.59)

The above specifies that, for the first target in the queue, o_1, the micro-manager should track for 100% of the specified stopping time, defined as \tau_n^{o_1}, and track targets o_2, o_3, \ldots, o_L for the ratios of \tau_n^{o_1} specified by the respective values \iota_n^{o_2}, \iota_n^{o_3}, \ldots, \iota_n^{o_L}. By imposing maximum tracking time constraints on the lower priority targets, we are making the estimates of the respective targets suboptimal; however, this is by design, as we care less about the accuracy of the state estimates of lower priority targets. This sacrifice comes with the reward of processing the target queue in a shorter time-frame, freeing up the resources of the radar for additional functions.

Chapter 4

Radar Micro-manager

The target prioritization provided by the macro-manager is computed at a particular instant in time, using the current (and potentially past) observed target states. Since the targets evolve over time, their respective threat to assets will vary, and the ideal prioritization will also change in time. This implies that at some point we wish to stop using the current target prioritization and give control back to the macro-manager to provide an updated prioritization. The decision on whether to continue or stop with the provided allocation is based upon noisy, Bayesian estimates of the target states, with the stochastic nature of the stopping time problem arising from the fact that the estimates are assumed to be received with a non-unity probability of detection. The purpose of the micro-manager is to determine the optimal time to return control back to the macro-manager, thus providing a solution to the task scheduling problem.

In general, this allocation can refer to a variety of prioritizations. The allocations mentioned throughout the dissertation are: 1) a priority vector, or 2) a prioritized queue of target observations. Referring to the target and measurement model from Section 2.2, in the case of the prioritized queue, the tracker devotes all of its resources to the current target in the queue and no resources to the remaining targets. Mathematically, assuming target \bar{l} is the current target in the queue, this is represented by \nu^{\bar{l}} = 1 and \nu^l = 0 for l \ne \bar{l}, stating that target \bar{l} is devoted a Kalman filter while all other targets are updated using Kalman predictors.
It is evident that the priority vector allocation is a more general form than that of the prioritized queue; thus we will present the micro-management optimization problem in terms of the priority vector, following the results presented in [51].

The problem of resource management using POMDPs has been well researched. Multiple papers have formulated the resource management problem as a POMDP [15, 41, 75]; however, the solution techniques used for these formulations are only tractable for small, unrealistic scenarios. In realistic applications, solutions to POMDPs are intractable (the problem is PSPACE-hard [58]). For this reason, structural results are obtained to make the solutions of POMDPs reflecting realistic scenarios tractable. In particular, we are concerned with determining necessary and sufficient conditions such that the optimal solution varies monotonically with the system parameters, a concept termed monotone comparative statics (optimal solutions which vary monotonically with respect to their parameters, with the paramount source being the work of Topkis [90]). There has been a large amount of work on structural results for POMDPs, with the prominent works being those of Albright [2] and Lovejoy [59]. In particular, Albright [2] provides conditions for a two-state system such that the optimal reward and optimal action are both monotone in the current information state. Lovejoy [59] generalizes the results of Albright to n states by invoking the monotone likelihood ratio (partial) order on the set of probability distributions on the information state. Krishnamurthy [48] further generalizes the results of Lovejoy by considering non-standard POMDPs (nonlinear in the information state) and by introducing an ordering that results in less restrictive sufficient conditions to achieve a monotone form of the optimal policy.

The interest in this chapter is the application of the previously formulated structural results to the problem of radar resource management in GMTI systems. Work has been presented using structural results in the context of radar resource management [48, 49]. Specifically, [48] presents sufficient conditions on the cost functions, dynamics of the Markov chain, and target and observation probabilities such that the optimal policy has a threshold structure with respect to the monotone likelihood ratio ordering on lines spanning the information simplex. The work of [49] applies the results of [48] to the problem of determining the optimal time to stop observing a particular target. Recent work [51] generalizes the results of [49] by determining conditions that characterize a monotone threshold policy for the optimal stopping time associated with a prescribed priority allocation (instead of prescribing the optimal time to spend observing a single target). The presentation of the sufficient conditions, such that the optimal policy takes a monotone threshold form, will follow the discussion in [51]. Once we have shown the monotone structure, we are able to efficiently compute a parameterized approximation to the policy. The parameterized policy is computed by minimizing the sample-path cost, a process which is based on the target and observation model and the system parameters. Since the computation is based only on the model, it is performed offline, allowing the optimal action to be obtained (in real time) based solely on the current state of the system.
4.1 Radar Micro-manager Structure

The radar micro-manager block consists of three main components, as can be seen in Figure 4.1. First, the multi-function radar, with an electronically steered beam, is responsible for transmitting and receiving radar pulses, from which target data are generated. Second, the Bayesian tracker obtains the information for all targets from the multi-function radar and forms track estimates and associated error covariance matrices. If necessary, we direct the reader to Section 2.3, where the details of the Bayesian tracker were originally discussed. Lastly, the micro-manager scheduler is responsible for specifying the optimal action (from the optimal policy) based on the Bayesian estimates received from the tracker and the target prioritization provided by the radar macro-manager. The action generated by the scheduler controls how long the radar uses the allocation provided by the macro-manager. The focus of this chapter is on the contents of the micro-manager scheduler block, primarily the sufficient conditions such that the optimal policy takes a monotone threshold form.

Figure 4.1: A schematic of the radar micro-manager (multi-function radar, Bayesian tracker, and micro-manager scheduler).

4.2 Optimal Stopping as a Sequential Decision Process

We formulate the radar micro-manager as a multivariate sequential decision process, a form of POMDP. We start by introducing the components of a sequential decision process, which is comprised of the following five ingredients [73]:

1. Decision epochs: defines when decisions can be made.
2. State space: the set of allowable states in which the system can reside.
3. Transition probabilities: the probability that a particular state-to-state transition will occur given a particular action.
4. Action space: the set of actions available to the decision maker.
5. Costs and rewards: the immediate costs and rewards associated with choosing a particular action in a particular state.

In addition to the above five components, we define an optimization objective, such that the actions are chosen to minimize a cost (or, equivalently, maximize a reward), the details of which are provided in Section 4.2.5.

A Markov decision process (MDP) is a specific type of sequential decision model in which we place the restriction that the choice of an action for time m+1 is based solely on the information available at time m. Also, for the purposes of the micro-manager, we focus on a generalization of an MDP in which it is assumed that we cannot directly observe the underlying states, termed a Partially Observed Markov Decision Process (POMDP). This is justified by the fact that we are unable to observe the true tracks of our targets due to the presence of noise and measurement errors.

4.2.1 Decision Epochs

The decision epochs refer to the times at which the micro-manager can specify an action. For both listed target prioritizations (Section 2.1), the decision epochs are the iterations k on the micro-manager time scale (see Figures 2.2 and 2.3). That is,

k = \{1, 2, \ldots\}.     (4.1)

The above implies that the decision epochs are an infinite set; however, in Section 4.2.5 we formulate the micro-manager optimization as an optimal stopping time problem, with a stopping rule.
Thus the decision epoch set is indeed finite, but since the stopping time is unknown (stochastic) we define the set of decision epochs as stated.

4.2.2 Target States and State Update

The information required to make a decision concerning the optimal action is contained within the state of the system. We define the state of the micro-manager problem to be the set of error covariance matrices,

\mathcal{P}_k = (\bar{P}_k, P_k) = \left( \{\bar{P}_k^1, \bar{P}_k^2, \ldots, \bar{P}_k^L\}, \{P_k^1, P_k^2, \ldots, P_k^L\} \right)     (4.2)

where \bar{P}_k^l = E\{ (s_k^l - E(s_k^l))(s_k^l - E(s_k^l))^T \} = \mathcal{L}^{k-1}(P_0^l) is the purely predicted (measurement-free) estimate and can be computed for any time k solely from the initial error covariance matrix P_0. The notation \mathcal{L}^{k-1}(\cdot) refers to the (k-1)-th iteration of the Lyapunov function. The quantity P_k^l is measurement-dependent and is defined as P_k^l = E\{ (s_k^l - E(s_k^l | z_{k-1}^l))(s_k^l - E(s_k^l | z_{k-1}^l))^T \} = \mathcal{R}(P_{k-1}^l, z_{k-1}^l), termed the filtered estimate. The quantity \bar{P}_k^l can be interpreted as the error covariance of the state estimate when not conditioned on any measurements, whereas P_k^l is conditioned on the measurement z_{k-1}^l.

The state update is dictated by the modified Riccati update and the Lyapunov update. Based on the current state P_k^l and the received measurement z_k^l, we compute the next state P_{k+1}^l using one of the following equations,

P_{k+1}^l = \mathcal{R}(P_k^l, z_k^l) \overset{\text{def}}{=} F P_k^l F^T + Q - \mathbb{1}_{\emptyset^c}(z_k^l) \, F P_k^l H^T (H P_k^l H^T + R)^{-1} H P_k^l F^T     (4.3)

P_{k+1}^l = \mathcal{L}(P_k^l) \overset{\text{def}}{=} F P_k^l F^T + Q     (4.4)

where the modified Riccati update, Equation (2.20, 4.3), is used if we are filtering a particular target and, conversely, the Lyapunov update, Equation (2.21, 4.4), is used if we are predicting a particular target, as first defined in Section 2.3. The indicator \mathbb{1}_{\emptyset^c}(z_k^l) is one when a measurement is received and zero when the detection is missed. The complete state update is dependent upon the target prioritization specified by the macro-manager.

1. Target Priority Allocation Vector

The covariances \bar{P}_k^l are updated, as stated previously, using the Lyapunov equation. The covariances P_k^l are updated using the Riccati equation, taking into account the priority vector \nu = [\nu^1, \nu^2, \ldots, \nu^L]. Thus,

\mathcal{P}_{k+1} = \left( \mathcal{L}(\bar{P}_k^l), \mathcal{R}(P_k^l, z_k^l) \right) \quad \text{for each } l \in \mathcal{L}     (4.5)

2. Target Priority Queue

Let o_l be the current target in the observation queue. For a given macro-management cycle, n, and micro-manager tracking interval k = 1, 2, \ldots, \tau_n^{o_l}, target o_l is updated via the Riccati equation,

P_{k+1}^{o_l} = \mathcal{R}(P_k^{o_l}, z_k^{o_l})     (4.6)

whereas the states of all other targets l \ne o_l are updated via the measurement-free Lyapunov equation,

P_{k+1}^l = \mathcal{L}(P_k^l) \quad \text{for all } l \ne o_l     (4.7)

Similarly, as the queued targets are processed, state updates are made accordingly, with the radar manager only filtering the current queued target and predicting all other targets.

The rationale for listing multiple choices of the target prioritization is to illustrate the fact that the micro-manager specifies the optimal time to terminate any allocation provided by the macro-manager, and does not just serve the purpose of specifying the optimal time to stop tracking a particular target (although this is one of the choices of the prioritization). A decision on the action for the next iteration k+1 is dictated by an optimal policy u_{k+1} = \eta^*(\mathcal{P}_{k+1}) (discussed in Section 4.2.5) and is based on the updated state \mathcal{P}_{k+1}. The timeline of events for the micro-manager can be seen in Figure 4.2.

Figure 4.2: Micro-manager timing diagram (current state \mathcal{P}_k, propagation of targets, measurement z_k^l received, state update to \mathcal{P}_{k+1}, next action u_{k+1} = \eta^*(\mathcal{P}_{k+1})).
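The two covariance updates of Equations (4.3) and (4.4) can be sketched directly. The snippet below assumes the model matrices F, H, Q, R of Section 2.2 are supplied by the caller, and it uses a boolean flag in place of the indicator to signal a missed detection; it is a minimal illustration, not the tracker implementation used in the simulations.

    import numpy as np

    def lyapunov_update(P, F, Q):
        """Measurement-free (prediction) update, Eq. (4.4): L(P) = F P F' + Q."""
        return F @ P @ F.T + Q

    def riccati_update(P, F, H, Q, R, detected):
        """Modified Riccati (filtering) update, Eq. (4.3).  When the measurement
        is missed (detected = False) the correction term is dropped and the update
        reduces to the Lyapunov prediction."""
        P_pred = F @ P @ F.T + Q
        if not detected:
            return P_pred
        S = H @ P @ H.T + R                              # innovation covariance
        correction = F @ P @ H.T @ np.linalg.solve(S, H @ P @ F.T)
        return P_pred - correction

Applying riccati_update to the currently observed target and lyapunov_update to all others reproduces the priority-queue state update of Equations (4.6) and (4.7).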
4.2.3 Action Space

Recall that the objective of the micro-manager is to determine the optimal time to stop using the current target prioritization and release control back to the macro-manager. Given that the problem is to find the optimal stopping time to terminate tracking the current target or the current priority allocation, the action set is simply,

\mathcal{U} = \{u_1, u_2\}     (4.8)

where u_1 \to stop and u_2 \to continue, which refers to whether the scheduler chooses to stop or continue with the current prioritization.

4.2.4 Operating and Stopping Costs

Operating Cost

It is now necessary to define the operating cost and stopping cost of the radar, corresponding to the actions u_2 and u_1, respectively. The operating cost is assumed to be fixed in time k, and independent of the current state and measurements. We denote the operating cost as,

\bar{C}(s_k, Z_k, u_2) = c_\nu > 0     (4.9)

Mutual Information Stopping Cost

The stopping cost is dependent upon both the states of the targets, s_k^l, and the measurements, Z_k^l. We define the cost in terms of a measure of mutual information between the target states and the target measurements. The definition of mutual information is as follows.

Definition 4.2.1 (Cover and Thomas [17]). The mutual information between two random variables, X and Y, is defined as,

I(X; Y) = \int_x \int_y p(x, y) \log\left( \frac{p(x, y)}{p(x)\, p(y)} \right) dy\, dx     (4.10)

where p(x) and p(y) represent the probability distributions of X and Y, respectively, with p(x, y) representing the joint probability distribution.

With some manipulation, the mutual information can be written as,

I(X; Y) = \int_x \int_y p(x, y) \log\left( \frac{p(x, y)}{p(x)\, p(y)} \right) dy\, dx     (4.11)
        = \int_x \int_y p(x, y) \log\left( \frac{p(x|y)}{p(x)} \right) dy\, dx     (4.12)
        = \int_x \int_y p(x, y) \left[ \log(p(x|y)) - \log(p(x)) \right] dy\, dx     (4.13)
        = \int_x \int_y p(x, y) \log(p(x|y))\, dy\, dx - \int_x \int_y p(x, y) \log(p(x))\, dy\, dx     (4.14)
        = -\int_x p(x) \log(p(x))\, dx - \left( -\int_x \int_y p(x, y) \log(p(x|y))\, dy\, dx \right)     (4.15)

The entropy (representing the amount of hidden information in a random variable) of a continuous random variable X, termed the differential entropy, is defined as,

H(X) = -\int_x p(x) \log(p(x))\, dx     (4.16)

The conditional entropy of a random variable X, conditioned on another random variable Y, is defined as,

H(X|Y) = -\int_x \int_y p(x, y) \log(p(x|y))\, dy\, dx     (4.17)

The conditional entropy H(X|Y) represents the amount of uncertainty "left over" in X given knowledge about Y. Equation (4.15) can be written in terms of differential entropies as,

I(X; Y) = H(X) - H(X|Y)     (4.18)

Note that mutual information is commutative, in the sense that the uncertainty reduction in X by knowing Y is the same as the uncertainty reduction in Y by knowing X, that is, I(X; Y) = I(Y; X).

Given that the Bayesian tracker is providing the scheduler with target state estimates that are distributed as multivariate Gaussian random variables, it is necessary to state the following result on the entropy of a multivariate Gaussian distribution.

Theorem 4.2.2 (Entropy of a multivariate Gaussian distribution, [17] p. 249). Let X \in \mathbb{R}^n have a multivariate Gaussian distribution with mean \mu_X and covariance K_X. Then

H(X) = \frac{1}{2} \log\left( (2\pi e)^n |K_X| \right)     (4.19)

where |K_X| denotes the determinant of K_X.

Define \bar{l} as the highest priority target. In the case of the macro-manager assigning a priority vector, \bar{l} is defined as \bar{l} = \arg\max_{l \in \mathcal{L}} (\nu^l). If the macro-manager has specified a priority queue, the current target in the queue is assigned the index \bar{l}.
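Theorem 4.2.2 and Equation (4.18) translate directly into a short computation on the covariance matrices supplied by the tracker. The following sketch (names illustrative) evaluates the Gaussian differential entropy and the resulting mutual information between a target's state and its measurements.

    import numpy as np

    def gaussian_entropy(K):
        """Differential entropy of an n-dimensional Gaussian (Theorem 4.2.2):
        H = 0.5 * log((2*pi*e)^n * det(K))."""
        n = K.shape[0]
        sign, logdet = np.linalg.slogdet(K)
        assert sign > 0, "covariance must be positive definite"
        return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)

    def mutual_information(P_bar, P):
        """I(s; Z) = H(s) - H(s|Z) for Gaussian estimates (Eq. 4.18): the
        difference between the predicted-only and filtered entropies."""
        return gaussian_entropy(P_bar) - gaussian_entropy(P)

Since the (2*pi*e)^n terms cancel in the difference, the mutual information reduces to a difference of log-determinants of the predicted and filtered error covariances, which is the quantity used in the stopping cost below.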
From the definition of \mathcal{P}_k in Section 4.2.2, and Theorem 4.2.2, the mutual information of Equation (4.18), at time k for target l, can now be defined in terms of the covariances \bar{P}_k^l and P_k^l as,

I(s_k^l; Z_k^l) = H(s_k^l) - H(s_k^l | Z_k^l)     (4.20)
             = \frac{1}{2} \log\left( (2\pi e)^n |\bar{P}_k^l| \right) - \frac{1}{2} \log\left( (2\pi e)^n |P_k^l| \right)     (4.21)
             = \frac{1}{2} \left[ \log|\bar{P}_k^l| + n \log(2\pi e) - \log|P_k^l| - n \log(2\pi e) \right]     (4.22)
             = \frac{1}{2} \left[ \log|\bar{P}_k^l| - \log|P_k^l| \right]     (4.23)
             = \frac{1}{2} \log \frac{|\bar{P}_k^l|}{|P_k^l|}     (4.24)

We define a more general form of the above mutual information expression through the inclusion of positive constants \alpha, \beta \in \mathbb{R}_+^L,

\hat{I}\left( P_k^l, \alpha^l, \beta^l \right) = \alpha^l \log|\bar{P}_k^l| - \beta^l \log|P_k^l|.     (4.25)

The constants \alpha, \beta are chosen to affect the behaviour of the scheduler. Note that when \alpha^l = \beta^l = 1/2 for all l \in \mathcal{L}, the expression in Equation (4.25) agrees with the classical definition of mutual information in Equation (4.24). We now define a stopping cost of the following form [51],

\bar{C}(\mathcal{P}_k, \alpha, \beta, u_k = u_1) = -\hat{I}\left( P_k^{\bar{l}}, \alpha^{\bar{l}}, \beta^{\bar{l}} \right) + F\left[ \hat{I}\left( P_k^l, \alpha^l, \beta^l \right);\; l \ne \bar{l} \right]     (4.26)

where F : \mathbb{R}^{L-1} \to \mathbb{R} is a monotone (order preserving) function of its L-1 arguments.

The stopping cost, Equation (4.26), is an intuitive choice. For a clear understanding, consider a scenario where we do not allow for the possibility of missed detections, that is, p_d = 1. As the radar observes target l = \bar{l}, the conditional differential entropy H(s_k^{\bar{l}} | Z_k^{\bar{l}}) will decrease as we condition upon more measurements. Thus, by the definition of mutual information, see Equation (4.20), the mutual information between the target states s_k^{\bar{l}} and the measurements Z_k^{\bar{l}} will monotonically increase over time, that is,

\hat{I}\left( P_{k+1}^{\bar{l}}, \alpha^{\bar{l}}, \beta^{\bar{l}} \right) \ge \hat{I}\left( P_k^{\bar{l}}, \alpha^{\bar{l}}, \beta^{\bar{l}} \right)     (4.27)

This results in the first term of the stopping cost decreasing as k increases. Simultaneously, the mutual information between the states of the unobserved targets, s_k^{l \ne \bar{l}}, and their respective measurements, Z_k^{l \ne \bar{l}}, will decrease in time k. As a result, since F is assumed to be monotone, the stopping cost will decrease in k,

\bar{C}(\mathcal{P}_k, \alpha, \beta, u_1) \ge \bar{C}(\mathcal{P}_{k+1}, \alpha, \beta, u_1)     (4.28)

The cost of stopping becomes "cheaper" the longer we spend observing target l = \bar{l}. Given that the operating cost of the radar is a fixed constant in time k, see Equation (4.9), the cost of stopping will eventually become less than the cost of continuing, and it would be logical for the scheduler to specify the stop action, u_1. Recall that, in general, we do allow for the possibility of missed detections, thus p_d < 1. As a result, the state \mathcal{P}_k is a random variable, and the inequalities stated in Equations (4.27) and (4.28) are no longer guaranteed to hold; thus stopping in finite time (\tau < \infty) is not assured. To alleviate this problem, we impose a maximum stopping time, \bar{\tau}, such that if k = \bar{\tau}, the action u_k = u_{\bar{\tau}} = u_1 is automatically chosen. (This maximum stopping time is encoded in the \iota_n vector defined by the macro-manager.) Thus we have defined a stopping rule, and a decision to stop will be made at some time \tau < \infty almost surely (with probability one).

Recall the discussion from Section 2.1 concerning the purpose of the macro-manager. The specific stopping cost is dependent upon the prioritization provided by the macro-manager.

1. Target Priority Allocation Vector

In this case, the macro-manager provides the micro-manager with a vector, \nu = [\nu^1, \nu^2, \ldots, \nu^L], corresponding to the priority allocated to each target. The stopping cost is then a function of the mutual information (and follows directly from Equation (4.26)),

\bar{C}(\mathcal{P}_k, \alpha, \beta, u_1) = -\alpha^{\bar{l}} \log|\bar{P}_k^{\bar{l}}| + \beta^{\bar{l}} \log|P_k^{\bar{l}}| + F\left[ \alpha^l \log|\bar{P}_k^l| - \beta^l \log|P_k^l|;\; l \ne \bar{l} \right]     (4.29)
2. Target Priority Queue

In the case of the macro-manager specifying a target priority queue, the radar is devoting all of its resources to target \bar{l} and none of its resources to targets l \ne \bar{l}, that is, \nu^{\bar{l}} = 1 and \nu^l = 0 for all l \ne \bar{l}. Since we are only filtering target \bar{l} and predicting all other targets l \ne \bar{l}, the following holds,

\bar{P}_k^l = P_k^l \quad \text{for all } l \ne \bar{l}     (4.30)

and the mutual information in Equation (4.24) is zero. As a result, we set \alpha^l = 0 for all l, and consider the difference in differential entropy as the stopping cost for the case of the target priority queue,

\bar{C}(\mathcal{P}_k, u_1) = -\hat{I}\left( P_k^{\bar{l}}, 0, \beta^{\bar{l}} \right) + F\left[ \hat{I}\left( P_k^l, \mathbf{0}, \beta^l \right);\; l \ne \bar{l} \right]     (4.31)
                        = \beta^{\bar{l}} \log|P_k^{\bar{l}}| - F\left[ \beta^l \log|P_k^l|;\; l \ne \bar{l} \right]     (4.32)

where \mathbf{0} is a vector of zeros of length L-1. The above expression is of the same form (up to constants) as the difference between the marginal differential entropy and the conditional differential entropy,

\bar{C}(\mathcal{P}_k, u_1) = H(P_k^{\bar{l}}) - F\left[ H(P_k^l);\; l \ne \bar{l} \right]     (4.33)
 = \frac{1}{2} \log\left( (2\pi e)^n |P_k^{\bar{l}}| \right) - F\left[ \frac{1}{2} \log\left( (2\pi e)^n |P_k^l| \right);\; l \ne \bar{l} \right]     (4.34)
 = \frac{n}{2} \log(2\pi e) + \frac{1}{2} \log|P_k^{\bar{l}}| - F\left[ \frac{n}{2} \log(2\pi e) + \frac{1}{2} \log|P_k^l|;\; l \ne \bar{l} \right]     (4.35)

The subsequent optimization problem involving the aforementioned stopping cost is not affected by the presence of additive constants; thus we can safely ignore them and denote the stopping cost as in Equation (4.32).

The precise choice of the function F can vary depending on the scenario, as long as the monotonicity property is preserved. The choice of F will affect the behaviour of the micro-manager scheduler. A few possible forms are:

• Cumulative function: F[(\cdot);\; l \ne \bar{l}] = \frac{1}{L-1} \sum_{l \ne \bar{l}} (\cdot)
• Maximum function: F[(\cdot);\; l \ne \bar{l}] = \max_{l \ne \bar{l}} (\cdot)
• Minimum function: F[(\cdot);\; l \ne \bar{l}] = \min_{l \ne \bar{l}} (\cdot)

The optimization problem, termed the optimal stopping time problem, is now formally presented.

4.2.5 Objective: Optimal Stopping Time

The purpose of the radar micro-manager is to determine the optimal time to continue tracking with the target prioritization provided by the macro-manager. This can be formulated as a member of the class of problems termed optimal stopping time problems. An optimal stopping time problem is any problem where we are able to choose when to stop a particular process in order to minimize a cost (or, equivalently, maximize a reward). We provide a simple example of an optimal stopping time problem to provide background for the reader. The following example presents a scenario where a seller must choose whether to accept or reject consecutive offers on an asset.

Example 4.2.3 (The Asset Selling Problem [10]). Consider a person selling an asset for which they are offered an amount of money, m_k \in [0, M] (independent random variables), at each time period, k = 1, 2, \ldots, N-1. The person's choices at each period k are "sell", for which they sell the asset at price m_k, and "do not sell", causing them to wait until the next period for the subsequent offer. It is assumed that the person cannot later accept previously rejected offers, and that they must accept the offer m_{N-1}. Given that the accepted offer will gain a constant interest rate of r > 0 until time k = N, what is the policy which maximizes the person's profit at the N-th period?

The optimal policy in the previous example specifies accepting the offer (action "sell") if it is greater than the expected revenue discounted to the current time, and otherwise "do not sell".
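To make the stopping structure of Example 4.2.3 concrete, the sketch below performs backward induction under the additional assumption that the offers are i.i.d. uniform on [0, M] and that an offer accepted at period k grows by the factor (1+r)^(N-k) until period N. The resulting per-period thresholds realize the accept/reject rule stated above; this is an illustrative sketch, not the worked solution of [10].

    import numpy as np

    def asset_selling_thresholds(N, M, r):
        """Backward induction for the asset-selling example (Example 4.2.3).
        Returns the acceptance threshold for periods k = 1, ..., N-1:
        sell at period k iff the offer m_k >= t_k (t_{N-1} = 0, forced sale)."""
        thresholds = np.zeros(N)
        # Continuation value entering period N-2: the forced sale at N-1
        # yields E[m * (1+r)] = (M/2) * (1+r) at time N.
        cont = 0.5 * M * (1.0 + r)
        for k in range(N - 2, 0, -1):
            growth = (1.0 + r) ** (N - k)          # value at time N of selling now
            t = min(cont / growth, M)              # sell iff m * growth >= cont
            accept_part = growth * (M**2 - t**2) / (2.0 * M)   # E[m*growth; m >= t]
            reject_part = (t / M) * cont
            cont = accept_part + reject_part       # expected optimal value from period k
            thresholds[k] = t
        return thresholds[1:]

    print(asset_selling_thresholds(N=10, M=100.0, r=0.05))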
The solution was obtained by rewriting the problem in the form of Bellman's equation [9] and using standard dynamic programming techniques to calculate the optimal policy. A complete worked solution can be found in [10].

Before discussing the full micro-manager optimal stopping time problem, we first present the following deterministic scenario with a unity probability of detection, p_d = 1, to offer some insight into the objective of the micro-manager.

Example 4.2.4. For this framework, consider the stopping time problem with a unity probability of detection, p_d = 1, and fixed initial error covariance matrices, P_0 = \hat{P}_0, \bar{P}_0 = \hat{P}_0. The optimal policy aims to determine the optimal time to stop tracking using the current allocation, \tau_d, which satisfies the following,

\tau_d = \arg\min_{\tau \in [0, \tau_{\max}]} \left\{ (\tau - 1) c_\nu + \bar{C}(\mathcal{P}_\tau, u_1) \mid P_0 = \hat{P}, \bar{P}_0 = \hat{P} \right\}     (4.36)

where \tau_{\max} is the maximum allowable stopping time, imposed to ensure the micro-manager tracking cycle stops in finite time. We define the following quantity,

D(\mathcal{P}, \tau) = (\tau - 1) c_\nu + \bar{C}(\mathcal{P}_\tau, u_1)     (4.37)

We solve this optimization numerically by plotting D(\mathcal{P}, \tau) versus \tau for the fixed initial error covariances. Figure 4.3 shows the dependence of D(\mathcal{P}, \tau) on \tau for various operating costs, c_\nu.

Figure 4.3: Deterministic sample-path cost, D(\mathcal{P}, \tau), versus stopping time \tau, for operating costs c_\nu = 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.

Analyzing Figure 4.3, we can see that there exists a time at which the accumulated operating cost exceeds the stopping cost, corresponding to the minimum of the respective plot. The optimal deterministic stopping time for each value of c_\nu is achieved at the minimum of the respective plot. The optimal deterministic policy thus assigns the optimal stopping times \tau_d = (13, 10, 8, 7, 6, 6) for c_\nu = (0.5, 0.6, 0.7, 0.8, 0.9, 1.0), respectively.

The previous example illustrates that the optimal policy in the deterministic case is found by simply locating the time associated with the minimum of a deterministic function. However, in the stochastic case, where p_d < 1, noise is introduced, causing the given function, D(\mathcal{P}, \tau), not to have a well-defined minimum. One may ask why the policy for the stochastic case would not just follow a procedure similar to the one discussed above, but now plotting the expectation of the function, D(\mathcal{P}, \tau), for each stopping time. There is an inherent loss of performance associated with doing this: we would be averaging D(\mathcal{P}, \tau) for each deterministic stopping time, thus removing the stochastic aspect of the optimal policy. That is, the nature of the optimal policy for the stochastic case is that it will not always assign a fixed stopping time given a particular set of initial covariance matrices; instead, it prescribes an optimal stopping time that is dependent upon the sequence of measurements received up until the stopping time. To further justify this argument, we formally introduce the stochastic optimal stopping time problem.

Let \eta be a stationary policy which maps the error covariance set, \mathcal{P}_k, at time k to an action u_k,

\eta : \mathcal{P}_k \to u_k \in \{u_1, u_2\}     (4.38)

where, as defined earlier, u_1: (stop) and u_2: (continue).
We wish to find the policy that minimizes the sample-path cost, defined as,

J_\eta(P) = E_\eta \left\{ \sum_{k=1}^{\tau} \bar{C}(\mathcal{P}_k, u_k) \,\middle|\, P_0 = P, \bar{P}_0 = P \right\}     (4.39)
        = E_\eta \left\{ (\tau - 1) c_\nu + \bar{C}(\mathcal{P}_\tau, u_1) \mid P_0 = P, \bar{P}_0 = P \right\}     (4.40)

That is,

\eta^*(P) = \arg\min_{\eta} \{ J_\eta(P) \}     (4.41)

The policy \eta^*(P) is termed the optimal policy. The typical solution method for dynamic programming problems begins by writing the problem in terms of a set of equations termed Bellman's dynamic programming equations [9]. Writing the optimal stopping time problem in this form results in the following,

\bar{V}(P) = \min\left\{ \bar{C}(P),\; c_\nu + E_z\left[ V\left( \mathcal{R}(P^{\bar{l}}, z^{\bar{l}}), \mathcal{L}(\bar{P}^{\bar{l}}), \mathcal{R}(P^l, z^l), \mathcal{L}(\bar{P}^l) \right) \right] \right\}     (4.42)

\eta(P) = \arg\min\left\{ \bar{C}(P),\; c_\nu + E_z\left[ V\left( \mathcal{R}(P^{\bar{l}}, z^{\bar{l}}), \mathcal{L}(\bar{P}^{\bar{l}}), \mathcal{R}(P^l, z^l), \mathcal{L}(\bar{P}^l) \right) \right] \right\}     (4.43)

where l \ne \bar{l} indexes the remaining L-1 targets. The optimal stopping set is defined by,

S_{stop} = \{ P : \eta^*(P) = u_1 \}     (4.44)

It is the goal of the micro-manager to characterize the optimal policy, \eta^*(P), which generates the optimal stopping set, such that the sample-path cost, Equation (4.39), is minimized. Techniques typically used to solve the Bellman equation rely on properties that are unfortunately not satisfied by the above formulation. For example, the state space, defined as the set of error covariance matrices \mathcal{P}_k, is uncountably infinite. As a result, in order to solve Equations (4.42) and (4.43), we rely on structural results, formulated in [51] and outlined in the following section.

4.3 Monotone Threshold Policy

The main theoretical results behind the form of the optimal stopping time policy are now presented, based on the recent work of Krishnamurthy et al. [51]. The main theorem of this section states the existence of an optimal threshold policy, \eta^*(P), which is monotone in the error covariance matrices of the targets, P. Sufficient conditions for this monotone policy are obtained by exploiting monotone properties of the Lyapunov and Riccati filter updates, the stopping cost (stochastic observability), and the value function.

Before stating the main theory of the micro-manager scheduler, it is necessary to introduce some notation. We define \mathcal{M} to be the set of all S \times S real, symmetric, positive semi-definite covariance matrices. We say that one matrix, A \in \mathcal{M}, dominates another, B \in \mathcal{M}, in the positive-definite sense, denoted A \succeq B, if x^T A x \ge x^T B x for all non-zero x \in \mathbb{R}^{S \times 1}. Also, we denote A \succ B if x^T A x > x^T B x for non-zero x. Note that this is a partial order, since not all matrices can be ordered with respect to the positive-definite order; thus [\mathcal{M}, \succeq] is a partially ordered set.

The following theorem is the main result of the micro-management scheduler and concerns the structure of the optimal policy \eta^*(P).

Theorem 4.3.1 (Krishnamurthy et al. [51]). Given the sequential decision problem with sample-path cost,

J_\eta(P) = E_\eta \left\{ (\tau - 1) c_\nu + \bar{C}(\mathcal{P}_\tau, u_1) \mid P_0 = P, \bar{P}_0 = P \right\}

and stopping set,

S_{stop} = \{ P : \eta^*(P) = u_1 \}

with a general stopping cost (stochastic observability) of,

\bar{C}(\mathcal{P}_k, \alpha, \beta, u_1) = -\alpha^{\bar{l}} \log|\bar{P}_k^{\bar{l}}| + \beta^{\bar{l}} \log|P_k^{\bar{l}}| + F\left[ \alpha^l \log|\bar{P}_k^l| - \beta^l \log|P_k^l|;\; l \ne \bar{l} \right]

the following results hold:

1. Target Priority Allocation Vector: The optimal decision policy \eta^*(P) is increasing in P^{\bar{l}}, decreasing in \bar{P}^{\bar{l}}, decreasing in P^l, and increasing in \bar{P}^l for each l \ne \bar{l}, on the poset [\mathcal{M}, \succeq].

2. Target Priority Queue: The optimal policy \eta^*(P) is increasing in P^{\bar{l}} and, for each l \ne \bar{l}, decreasing in P^l on the poset [\mathcal{M}, \succeq].

Proof. See Krishnamurthy et al. [51].

The proof of Theorem 4.3.1 relies on the following structural monotonicity results on the partially ordered set [\mathcal{M}, \succeq].
The details of how these results are used can be found in the complete proof in [51].

(1) Monotone properties of the Lyapunov and Riccati update equations: Two sets of results are necessary: (i) if P_1 \succeq P_2, then \mathcal{R}(P_1, z) \succeq \mathcal{R}(P_2, z) and \mathcal{L}(P_1) \succeq \mathcal{L}(P_2) [3]; and (ii) the functions |\mathcal{R}(P, z)| / |P| and |\mathcal{L}(P)| / |P| are monotone non-increasing in P.

(2) Monotone cost function: Using result (1)(ii) above, it can be shown that the stopping cost is monotone non-increasing in P^{\bar{l}}, \bar{P}^l and monotone non-decreasing in P^l, \bar{P}^{\bar{l}} for all l \ne \bar{l}.

(3) Monotone policy: Using properties of supermodularity [90], namely the fact that for supermodular (submodular) functions f(x, u), the \arg\min_u \{f(x, u)\} is non-increasing (non-decreasing) in the argument x, it is shown that the optimal policy \eta^*(P) is non-decreasing in P^{\bar{l}} and non-increasing in P^l, l \ne \bar{l}.

The monotonicity results are shown using induction on the value iteration algorithm. In particular, the value iteration algorithm, a fixed point iteration of the Bellman equation, is applied recursively over the iterations k = 1, 2, \ldots. The procedure uses the aforementioned structural results to prove the existence of an optimal monotone threshold policy, \eta^*(P).

4.4 Parameterized Policy

Given that we have proved the existence of the optimal policy \eta^*(P) and shown that it is monotone, we are able to parameterize it and efficiently compute an approximation. Knowledge of the monotone structure of the optimal policy still requires us to define the precise form of the approximation. The two main categories of parameterizations discussed are linear and affine hyperplanes. Since both of these are inherently only approximations to the true policy, the behaviour of the parameterized policy will indeed be suboptimal, the performance of which can be seen in the numerical results presented in Chapter 5.

4.4.1 Parameterizations

There are multiple parameterizations that could be used to approximate the optimal policy. The only restriction is that the parameterized policy satisfies the conditions stated in Theorem 4.3.1. The following parameterization, Equation (4.45), is an affine hyperplane approximation which, upon inspection, can be seen to satisfy the aforementioned conditions of Theorem 4.3.1. Both linear and affine approximations were computed; however, the affine policy approximations offered much better performance than their linear counterparts.

\eta_\theta(\mathcal{P}_k) = \begin{cases} u_1 & \text{if } -\theta^{\bar{l}\,T} P_k^{\bar{l}} \theta^{\bar{l}} + \bar{\theta}^{\bar{l}\,T} \bar{P}_k^{\bar{l}} \bar{\theta}^{\bar{l}} + \sum_{l \ne \bar{l}} \left[ \theta^{l\,T} P_k^l \theta^l + \bar{\theta}^{l\,T} \bar{P}_k^l \bar{\theta}^l \right] \ge 1 \\ u_2 & \text{otherwise} \end{cases}     (4.45)

The optimal policy is thus parameterized by the parameter vectors, \theta^l, \bar{\theta}^l, with l = \{1, 2, \ldots, L\}. We define the complete parameter vector, \theta, as the vertical concatenation of the 2L parameter vectors, such that \theta is of dimension 2SL \times 1. In order to ensure that the parameterized policy, Equation (4.45), satisfies the conditions in Theorem 4.3.1, we need to specify the allowed values of \theta. A necessary and sufficient condition for the parameterized policy \eta_\theta(P) to be monotone increasing in P^{\bar{l}}, \bar{P}^l and monotone decreasing in \bar{P}^{\bar{l}}, P^l for l \ne \bar{l}, is \theta \in \Theta = \mathbb{R}^{2SL \times 1}. Also, to ensure that the optimal parameter vector is unique (up to arbitrary scaling), we require that for each l, \theta^{l\,T} \theta^l = 1 (and similarly for \bar{\theta}^l). This is guaranteed by parameterizing each according to a spherical coordinate parameterization,

\theta_\phi^l = \left[ \cos(\phi_1^l),\; \sin(\phi_1^l)\cos(\phi_2^l),\; \sin(\phi_1^l)\sin(\phi_2^l)\cos(\phi_3^l),\; \sin(\phi_1^l)\sin(\phi_2^l)\sin(\phi_3^l) \right]^T

where \phi^l \in \mathbb{R}^{3 \times 1}. We define \phi as the vertical concatenation of all \phi^l and \bar{\phi}^l, thus \phi \in \mathbb{R}^{6L \times 1}.
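A direct transcription of the affine policy of Equation (4.45), together with the spherical coordinate map that enforces the unit-norm constraint, is sketched below. The sign convention follows Equation (4.45) as stated; the function and variable names are illustrative and are not part of the original formulation.

    import numpy as np

    def spherical_to_unit(phi):
        """Map three angles to a unit 4-vector (the spherical parameterization
        used for each theta^l), guaranteeing theta' theta = 1."""
        p1, p2, p3 = phi
        return np.array([
            np.cos(p1),
            np.sin(p1) * np.cos(p2),
            np.sin(p1) * np.sin(p2) * np.cos(p3),
            np.sin(p1) * np.sin(p2) * np.sin(p3),
        ])

    def parameterized_policy(P, P_bar, theta, theta_bar, l_hi):
        """Affine-hyperplane policy of Eq. (4.45).  P and P_bar are lists of the
        filtered and predicted error covariances; theta and theta_bar are lists of
        unit parameter vectors; l_hi is the index of the highest priority target.
        Returns 'stop' (u1) or 'continue' (u2)."""
        lhs = (-theta[l_hi] @ P[l_hi] @ theta[l_hi]
               + theta_bar[l_hi] @ P_bar[l_hi] @ theta_bar[l_hi])
        for l in range(len(P)):
            if l == l_hi:
                continue
            lhs += theta[l] @ P[l] @ theta[l] + theta_bar[l] @ P_bar[l] @ theta_bar[l]
        return "stop" if lhs >= 1.0 else "continue"

Because each theta vector is produced by spherical_to_unit, the scaling ambiguity of the hyperplane is removed and the search for the policy reduces to a search over the angle vector phi of dimension 6L.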
Stochastic approximation algorithms can now be used to estimate the optimal parameter vector, \phi^*, and in turn the optimal parameterized policy \eta_\theta^*(P).

4.4.2 Policy Estimation via Stochastic Approximation

There are many methods available for determining the optimal parameter vector; gradient search, simulated annealing, random search, genetic algorithms, and neural networks are just a few of the many algorithms available. Gradient search methods obtain a local minimum of the cost function by moving in a decreasing direction, determined by the gradient. This is a relatively efficient approach; however, since the algorithm only finds a local minimum, many initial conditions must be used in order to have a reasonable chance of finding the global minimum. The method of simulated annealing attempts to intentionally move in locally "worse" directions (opposing the gradient) to prevent being caught in local minima. A guaranteed way to ensure that the search does not get trapped in a local minimum, given a large enough search space, is to use a basic random search method. The issue with random search is that the necessary search space grows exponentially as the size of the parameter being estimated increases. That is, for a discrete-valued parameter vector of length 8 where each element can take on 100 possible values, there are 100^8 possible combinations. For our problem, the parameter vector is continuous-valued, not discrete, so the search space becomes uncountably large. Adding noise into the problem requires a certain degree of averaging to find the true minimum, increasing the search time further and making this method rather impractical for our problem. As a result, we decided to pursue a hybrid gradient search and random search method. Genetic algorithms and neural networks are also used for stochastic search and optimization; however, they will not be discussed here. For a good overview of simulation optimization and stochastic approximation algorithms, see [23].

Typically it is difficult to devise a fair metric for measuring the performance of the various stochastic approximation algorithms, making it difficult to compare them and to choose the most appropriate method for the problem at hand. This is due to the stochastic nature of the problem: we cannot obtain an exact value for a given minimum, but rather only an estimate. The only way to reduce the variance of this estimate is to run multiple averaging runs, which can be very time consuming, and which also leaves the possibility of drawing false conclusions when comparing different approximation methods if the variance is not sufficiently reduced. Following the reasoning in [88], we measure the efficiency of each method by how many function evaluations the algorithm requires. For gradient search algorithms, the best known techniques are perturbation analysis and likelihood ratio/score function methods. We chose a perturbation analysis method, simultaneous perturbation stochastic approximation (SPSA) to be exact, since it only requires two evaluations of the cost function (independent of the size of the parameter being estimated), making it more efficient than the common Kiefer-Wolfowitz algorithm, which requires 2n evaluations, where n is the length of the parameter vector being estimated.
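The core SPSA update is compact: each iteration perturbs the full parameter vector once in a random ±1 direction and uses only two cost evaluations to form a gradient estimate. The sketch below applies this update to a toy noisy quadratic; the gain and perturbation constants (including the decay exponents) are illustrative, commonly used choices and are not the values used in the simulations. Algorithm 1, presented next, applies the same update to the sample-path cost of the parameterized policy.

    import numpy as np

    def spsa_minimize(cost, phi0, n_iter=1000, eps=0.1, delta=0.05,
                      zeta=0.602, gamma=0.101, seed=0):
        """Minimal SPSA sketch: two cost evaluations per iteration,
        independent of the dimension of phi."""
        rng = np.random.default_rng(seed)
        phi = np.array(phi0, dtype=float)
        for n in range(n_iter):
            eps_n = eps / (1 + n) ** zeta            # gain sequence
            delta_n = delta / (1 + n) ** gamma       # perturbation size
            omega = rng.choice([-1.0, 1.0], size=phi.shape)   # Bernoulli +/- 1
            j_plus = cost(phi + delta_n * omega)
            j_minus = cost(phi - delta_n * omega)
            grad_hat = (j_plus - j_minus) / (2.0 * delta_n * omega)
            phi = phi - eps_n * grad_hat             # descent step
        return phi

    # Toy example: noisy quadratic with minimum at [1, -2, 3]
    target = np.array([1.0, -2.0, 3.0])
    noisy_cost = lambda phi: np.sum((phi - target) ** 2) + 0.01 * np.random.randn()
    print(spsa_minimize(noisy_cost, phi0=np.zeros(3)))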
The algorithm used to determine the approximation to the threshold policy using SPSA is outlined below.

Algorithm 1: Multi-Linear Threshold Policy Approximation

Step 0: Initialization
  Choose the highest priority target l̄ ∈ L.
  Define the initial covariance matrices P^l̄_0 and P^l_0 for l ≠ l̄.
  Define the SPSA parameters. Set n = 1.

Step 1: Stochastic Optimization (SPSA Loop)
  Compute the initial threshold coefficients, φ_0, by sampling from the set Φ = R^{6L×1}.
  For n = 0, 1, 2, ..., N_max
    Define the SPSA parameters, ε_n = ε / (1+n)^ζ, δ_n = δ / (1+n)^γ, ω_n(i) = ±1 w.p. 0.5, where i = {1, 2, ..., SL}.
    Perturb the parameter vector, φ^±_n = φ_n ± δ_n ω_n.
    Simulate J_{φ^+_n} and J_{φ^-_n}:
      For k = 1, 2, ..., K_max
        Update the states accordingly (see Section 4.2.2).
        Choose the action based on the policy, u^±_{k+1} = η_{φ^±_n}(P_{k+1}).
        If u^+_{k+1} = stop, then J_{φ^+_n} = Σ_{m=1}^{k+1} C̄(P_{k+1}, u^+_m).
        If u^-_{k+1} = stop, then J_{φ^-_n} = Σ_{m=1}^{k+1} C̄(P_{k+1}, u^-_m).
      End For;
    End Simulate;
    Compute the gradient estimate, ∇̂_φ J = (J_{φ^+_n} − J_{φ^-_n}) / (2 δ_n ω_n).
    Update the parameter vector, φ_{n+1} = φ_n − ε_n ∇̂_φ J.
  End For;

Step 2: Termination
  The parameter vector corresponding to the best multi-linear approximation to the threshold policy is given by φ* = argmin_{φ∈Φ} J_φ = φ_{N_max}.

In practice there may be differences between the true (noise-free) minimum and the terminal estimate provided by the algorithm. The above only holds if N_max is sufficiently large, which is typically impractical for most applications. Thus the above procedure should be monitored by a human operator to ensure convergence to a (possibly local) minimum. An example convergence plot of the SPSA algorithm can be seen in Figure 4.4.

Figure 4.4: Convergence of the SPSA algorithm to a minimum of the objective function (sample-path cost versus SPSA iteration, n).

Chapter 5

Numerical Results

In this section, we present numerical results to display the performance of the radar manager. We first generate scenarios for the radar macro-manager and evaluate the performance of the computed optimal order. Then, we analyze the micro-manager by evaluating its performance in two common radar applications, the target fly-by problem and the persistent surveillance problem. Comparisons to other algorithms are made in both the macro-manager and micro-manager simulation sections. All simulations were carried out for L = 4 targets.

Below is the general protocol outlining the operation and interaction of the radar manager components. The simulations are based on the following operation.

Protocol 1: Dual Time-scale Radar Manager Operation

Step 0: Initialization
  Define the asset locations, (x^p_1, y^p_1), (x^p_2, y^p_2), ..., (x^p_A, y^p_A).
  Construct the graph G = {V, W} and compute the vulnerability weights, w̄_a (Section 3.3.2).

Step 1: Target Prioritization (Macro-manager)
  Compute the threat estimate, r̂^l_n, of each target.
  Compute the threat estimate error, e^l_n, of each target.
  Assign the priority distributions, X̂^l_n, from the least-squares binomial parameters, p^{l*}_n.
  Order the binomial parameters to construct the optimal observation queue, o*.
  Compute the maximum tracking time vector, ι_n.

Step 2: Target Scheduling (Micro-manager)
  For i = 1, 2, ..., L,
    Observe the i-th target in the observation queue o*, and set k = 1.
    While k < ι_n(i),
      Update the covariance matrices of each target accordingly (Section 4.2.2).
      Compute the action u_{k+1} = η*(P_{k+1}).
      If u_{k+1} = stop, break; End If;
      Increment the fast time step, k = k + 1.
    End While;
  End For;
  Increment the slow time step, n = n + 1, and return to Step 1.
5.1 Macro-manager Numerical Results

The performance of the macro-manager is now analyzed. Recall that the purpose of the macro-manager was to determine a target prioritization, specifically an optimal target observation queue, based on the target state estimates, the targets' error covariances, and the asset locations. To evaluate the performance of the macro-manager, we compute the optimal order associated with various sets of initial target scenarios and compare the optimal order's performance with the performance of all other possible observation orders (for a system with L targets, this translates into L! possible orders) on each of the initial target scenarios. We assume M sets of initial conditions, indexed by m. This process allows us to illustrate the flexibility of the application of the optimal order policy. For simplicity, each of the scenarios is assumed to have the same fixed asset locations.

The initial target conditions used for the following simulations were generated as follows. The true target positions, x_0^1, x_0^2, \ldots, x_0^L, and target velocities, \dot{x}_0^1, \dot{x}_0^2, \ldots, \dot{x}_0^L, were generated by sampling from the uniform distributions U(-100, 100) and U(-4, 4), respectively. For simulation purposes (to construct the initial error covariance matrices) it was necessary to also generate initial target state estimates. These estimates were generated by the inclusion of an additive Gaussian noise term. The initial position estimates were computed as \hat{x}_0^l = x_0^l + w_x^l and the velocity estimates as \dot{\hat{x}}_0^l = \dot{x}_0^l + w_{\dot{x}}^l, where w_x^l \sim N(0, 10^2) and w_{\dot{x}}^l \sim N(0, 2^2) for all l. Similarly, the true states and state estimates for the y component were generated using the same parameters. The true target state vectors and state estimate vectors were then constructed as s_0^l = [x_0^l, \dot{x}_0^l, y_0^l, \dot{y}_0^l]^T and \hat{s}_0^l = [\hat{x}_0^l, \dot{\hat{x}}_0^l, \hat{y}_0^l, \dot{\hat{y}}_0^l]^T, allowing the initial error covariance matrix to be computed as P_0^l = (s_0^l - \hat{s}_0^l)(s_0^l - \hat{s}_0^l)^T. The optimal order was then computed following the procedure outlined in Step 1 of Protocol 1. The stopping time of each target observation was taken to be a random time, \tau, sampled from the uniform distribution U(\underline{\tau}, \bar{\tau}), where \underline{\tau} = 5 and \bar{\tau} = 10 iterations, which corresponds to a random stopping time (at T = 0.1 s) of 0.5–1 seconds.

Figure 5.1 presents a comparison of the average reward associated with processing each queue item using various policies. The reward for each policy (order) is computed over all initial target states, m \in M, and averaged. Notice that the average reward of the optimal policy dominates all other policy rewards for each target observation. Practically, this is of interest since, if the user decides to terminate the queue before completion, the optimal policy will have achieved the highest reward (out of all policies) for the targets already observed. Figure 5.1 shows the average reward over all initial target states, but it is also of interest to analyze the performance of the optimal order for each m \in M. Figure 5.2 shows the reward associated with the optimal order for each set of initial target states. Notice that the optimal policy achieves the maximum reward for nearly all of the initial conditions.
The few cases where an alternate order results in a higher reward are due to the random motion of the target resulting in a different (from the initial) priority rank realization by the time the corresponding target is observed.

Figure 5.1: Average gain in reward for each target observation using various ordering policies (optimal order policy, random order policy, and deterministic order policies).

Figure 5.2: Performance of the optimal policy for each initial target set, m (optimal order policy, random order policy, and deterministic order policies).

The performance of the optimal order policy is dependent upon how close the actual threat rank realization (by the time the target is observed) is to the initial threat ranks. This is due to the fact that the computation of the optimal order is based solely on the initial estimates of the target states. The performance would then be optimal if the targets were assumed to be static over a macro-management interval, n (since in this scenario, the threat rank realizations would be equivalent to the initial random variables used to compute the optimal order). In reality, the targets are not fixed in time; however, given the time-scale at which the macro-manager operates and the nature of the (slow moving) targets, the states of the targets do not change at a rate such that the updated order (if it were computed at the time the target was observed) would differ from the initial optimal order.

5.2 Micro-manager Numerical Results

To illustrate the micro-management algorithms proposed in this dissertation, we consider two important GMTI surveillance problems – the target fly-by problem and the persistent surveillance problem. (A version of this section can be found in [51].)

5.2.1 Target Fly-by

We consider 4 ground targets that are tracked by a GMTI platform, as seen in Figure 5.3. The nominal range from the GMTI sensor to the target region is approximately r̃ = 30 km.

Figure 5.3: Typical target and GMTI sensor trajectories.

For this example, the initial (at the start of the micro-manager cycle) estimated and true target states of the four targets are:

\hat{s}_0^1 = [130, 5.5, 84, 8.1]^T, \quad s_0^1 = [100, 3, 40, 7]^T
\hat{s}_0^2 = [-47.88, -2.38, 210.41, 0.418]^T, \quad s_0^2 = [-20, -4, 200, 1]^T
\hat{s}_0^3 = [55.84, 2.37, 121.74, 9.56]^T, \quad s_0^3 = [50, 2, 95, 10]^T
\hat{s}_0^4 = [-55.13, 5.75, -68.41, -6.10]^T, \quad s_0^4 = [-70, 5, -50, -6]^T

Since we are assuming a constant platform height, and since the range to our target region may change slightly over the tracking interval, it is assumed that our depression angle, \theta_d, and squint angle, \psi, will change with time in order to keep the region of interest illuminated.

We assume in this example that the most uncertain target is regarded as the highest priority. Based on the previous state estimates, the mean square error values are MSE(\hat{s}_0^1) = 710.87, MSE(\hat{s}_0^2) = 222.16, MSE(\hat{s}_0^3) = 187.37, and MSE(\hat{s}_0^4) = 140.15. Thus, target l = 1 is the most uncertain, and we denote \bar{l} = 1. The simulation parameters are as follows: number of targets L = 4; sampling time T = 0.1 s; probability of detection p_d = 0.75; track standard deviations of the target model \sigma_x = \sigma_y = 0.5 m; measurement noise parameters \sigma_r = 20 m, \sigma_a = 0.5°, \sigma_{\dot{r}} = 5 m/s; and platform states [p_x, \dot{p}_x, p_y, \dot{p}_y] = [10 km, 53 m/s, -30 km, 85 m/s]. We assume a target priority vector of \nu = [\nu^1, \nu^2, \nu^3, \nu^4] = [0.6, 0.39, 0.008, 0.002].
Recall from Section 2.2 that the target priority determines the covariance of the observation noise. We chose an operating cost of c_\nu = 0.8 and a stopping cost as in Equation (4.29) (with the cumulative function F(\cdot) as discussed in Section 4.2.4) with \alpha^{1,\ldots,4} = \beta^{2,\ldots,4} = 0.05 and \beta^1 = 5. The parameterized policy chosen for this example was that of Equation (4.45).

Figure 5.4 illustrates the performance of the micro-manager when tracking the four targets using the optimal parameterized policy. Notice that the state estimates of the current target, l = 1, settle near the true track near the end of the tracking cycle. Meanwhile, the estimates for the other three targets start to drift off their respective true tracks near the end of the tracking cycle.

Figure 5.4: One cycle of the micro-manager, showing the accuracy of the state estimates for an arbitrary priority vector. True tracks (solid line), track estimates (dashed line), and error ellipses. An operating cost of c_\nu = 0.8 and a stopping cost as in Equation (4.29) were used. Dots represent the start of each track.

We found that the average stopping time for a given cost depends largely on the initial uncertainties in the states of each target. In general, the higher the uncertainty for the high priority target, the longer it will take before control is returned to the macro-manager.

We now explore the sensitivity of the sample-path cost (achieved with the parameterized policy) to various system parameters. The two parameters which are varied are the probability of detection, p_d, and the operating cost, c_\nu; the effects can be seen in Figure 5.5. The sample-path cost increases with the operating cost and decreases with the probability of detection. Notice also that the plot is annotated with the corresponding average stopping times achieved with the parameterized policy. Higher values of the operating cost, c_\nu, cause the micro-manager to specify the "stop" action sooner than lower values of c_\nu. As can be seen in the figure, neither the sample-path cost nor the average stopping time is very sensitive to changes in the probability of detection. However, varying the operating cost has a large effect on both the sample-path cost and the associated average stopping time.

The performance of the optimal parameterized policy is measured using multiple initial conditions. As seen in Figure 5.6, the optimal parameterized policy achieves the performance of the best periodic policy for any initial condition. That is, the optimal parameterized policy is the lower envelope of all possible periodic stopping times, for each initial condition. The optimal period for the periodic policy is highly dependent upon the initial condition. This is illustrated in Figure 5.6(b). Notice that for various initial conditions, the optimal stopping time changes frequently. The main performance advantage of the optimal parameterized policy is that it achieves virtually the same cost as the optimal periodic policy, but for any initial condition. Note that the slight vertical shift of the cost associated with the parameterized policy is likely due to the loss in performance associated with approximating the monotone threshold curve by an affine function.
Figure 5.5: Surface plot representing the dependence of the sample-path cost (achieved by the parameterized policy) on the probability of detection, p_d, and the operating cost, c_\nu. The sample-path cost increases in the operating cost, but decreases in the probability of detection. The labelled vertices correspond to average stopping times of 73, 6, 78, and 6 iterations.

5.2.2 Persistent Surveillance

Persistent surveillance involves exhaustive surveillance of a region over long time intervals, typically over a period of several hours or weeks [77]. This has recently received much attention in the defense literature since it has the potential to provide critical, long-term battlefield information. For example, by tracking targets for long periods of time using aerial-based radars, such as DRDC-Ottawa's XWEAR radar [7] or the U.S. Air Force's new Gorgon Stare Wide Area Airborne Surveillance System, operators can "rewind the tapes" in order to determine the origin of any target of interest [98]. This dissertation is concerned with achieving the best estimates during the entire tracking period.

Figure 5.6: Plot of the sample-path cost of various periodic policies and the parameterized policy (thick dashed line) versus ordered initial conditions, for stopping times \tau = 11, 12, \ldots, 16. Initial conditions are ordered with respect to the cost achieved using the parameterized policy for that particular initial condition. Notice that the parameterized policy performs just as well as the best deterministic stopping time for any particular initial condition, illustrating the versatility of the parameterized policy.

Consider Figure 5.7, illustrating the persistent surveillance setup. Here r̃ is the nominal range from the target region to the GMTI platform, assumed for our simulations to be approximately 30 km. The points on the GMTI track labeled (1)–(72) correspond to locations where we evaluate the Jacobian (3.23); the platform state at location n \in \{1, 2, \ldots, 72\} is defined as p = [\tilde{r}\cos(n \cdot 5°), -\tilde{v}\sin(n \cdot 5°), \tilde{r}\sin(n \cdot 5°), \tilde{v}\cos(n \cdot 5°)]. Assume a constant platform orbit speed of 250 m/s (or approximately 900 km/h [91]) and a constant altitude of approximately 5000 m. Assuming 72 divisions along the 30 km radius orbit, the platform sensor takes 10.4 seconds to travel between the track segments. Using a similar analysis as in the Appendix, the measurement model changes less than 5% in l2-norm in 10.4 s; thus the optimal parameter vector is approximately constant on each track segment.

Figure 5.7: Representation of the persistent surveillance scenario in GMTI systems. The GMTI platform (aircraft) orbits the target region in order to obtain persistent measurements as long as targets remain within the target region. The nominal range from the platform to the target region is assumed to be 30 km.

Simulation parameters for this example are as follows: number of targets L = 4; sampling time T = 0.1 s; probability of detection p_d = 0.9; track standard deviations of the target model \sigma_x = \sigma_y = 0.5 m; and measurement noise parameters \sigma_r = 20 m, \sigma_a = 0.5°, \sigma_{\dot{r}} = 5 m/s. The platform velocity is now changing (assume a constant
speed of \tilde{v} = 250 m/s), unlike in the previous example, where only the position of the platform was changing. Since the linearized model will be quite different at each of the pre-specified points, (1)–(72), along the GMTI track, we need to pre-compute the optimal threshold policy at each of the respective locations. We then switch between the optimal strategies when we reach that location on the track.

We consider the special case where the radar devotes all its resources to one target, and none to the other targets. That is, we assume a target priority vector of \nu = [1, 0, 0, 0]. In this case, the first target is allocated a Kalman filter, with all the other targets allocated Kalman predictors. Since the threshold parametrization vectors depend on the target's state and measurement models, the first target l = \bar{l} has a unique parameter vector, whereas the targets l \ne \bar{l} all have the same parameter vectors. Also, \bar{\theta}^l = \bar{\theta} for all l \in \{1, 2, \ldots, L\}.

The choice of the operating cost c_\nu should be lower in the persistent surveillance case than in the fly-by example. The intuition behind this is that in the fly-by example, where we defined a target priority vector of \nu = [0.6, 0.39, 0.008, 0.002], the radar is switching between the targets to satisfy the tracking time ratio (on the fast time scale, t) defined in \nu. Thus the operating cost per iteration k should be lower in the persistent surveillance case, where we are assumed to be looking at only one target (\nu = [1, 0, 0, 0]). A value of c_\nu = 0.1 < 0.8 satisfies this requirement. The values of \alpha and \beta can be chosen to modify the general behaviour of the scheduler. The values chosen in this example were \alpha^1 = 0.25, \alpha^{2,\ldots,4} = 0, \beta^1 = 0.25, \beta^{2,\ldots,4} = 1, with Equation (4.45) as the parameterized policy and Equation (4.29) (with the cumulative function F(\cdot)) as the stopping cost.

The optimal parameterized policy was computed using the algorithm of Section 4.4.2 (Algorithm 1) at each of the 72 locations on the GMTI sensor track. As the GMTI platform orbits the target region, we switch between these parameterized policy vectors, thus continually changing the adopted tracking policy. For the purposes of these simulations (illustrating the effects of the micro-manager) we chose a simple macro-manager as follows,

\bar{l} = \arg\max_{l = 1, \ldots, L} \left\{ \log|P^l| \right\}     (5.1)

We then assign the priority vector as \nu^{\bar{l}} = 1 and \nu^l = 0 for all l \ne \bar{l}. Figure 5.8 shows track estimates over multiple macro-management scheduling intervals while the scheduler is using the policy associated with the second of the 72 locations in Figure 5.7. The corresponding values of the log-determinants of each of the targets' error covariance matrices are shown in Figure 5.9.

Figure 5.8: Track estimates in the persistent surveillance case over multiple micro-manager scheduling intervals, with true tracks (solid line) and track estimates (starred line). The red portion of each target's track estimate corresponds to when the radar is filtering that target, whereas the cyan portion is where the state is being predicted. Dots represent the start of each track.

Figure 5.9: Plot of the log-determinants for each target over multiple micro-manager scheduling intervals.
Chapter 6
Conclusions and Future Work

This chapter summarizes the main points of the document and also suggests logical extensions and generalizations of the ideas discussed in the dissertation.

6.1 Summary of Work Accomplished

This dissertation analyzed the problem of tracking L targets with a single multifunction radar with the goal of optimally allocating the radar's resources. The problem was divided into two main components: the radar macro-manager, responsible for assigning priorities to tasks, and the radar micro-manager, responsible for scheduling each of the specified tasks in time. The main result of the macro-manager was the application of structural results from the field of optimal issuing. The structural results were used to determine an optimal observation queue to maximize a reward proportional to the incentive for observing each target. In order to compute the optimal queue, the threat of each target was computed based on multiple factors: 1) the target's proximity and heading relative to fixed assets; 2) the error in the state estimate of each target; and 3) the computed vulnerability of each asset. A constraint on the maximum tracking time for each target is also computed based on the relative priority ranks. The specified observation queue and maximum tracking time vector are then passed to the micro-manager. The micro-manager determines the optimal time to continue tracking each target, subject to the maximum tracking time constraint. Sufficient conditions were presented such that the optimal stopping time policy takes a monotone threshold form. Due to the monotone structure, an efficient numerical algorithm, a hybrid random search/gradient search method, was constructed to compute the approximation to the policy. Detailed numerical results were presented to evaluate the performance of both the macro-manager and micro-manager. We considered two realistic GMTI surveillance problems: the target fly-by problem and the persistent surveillance problem.

6.2 Future Work

1. Region-based Scheduler: The work presented in this dissertation provided the framework for a dual time-scale radar manager consisting of both task prioritization and task scheduling components for individual targets. An interesting extension to the work presented here would be to formulate the task prioritization and scheduling problems on a region basis. This would involve reformulating the macro-manager such that the priority ranks represent the priority of a particular region.

2. Minimization of Tracking Makespan: One interesting extension to the radar resource management problem would be to optimize the tracking makespan. The tracking makespan is defined as the total time taken to observe a queue of L targets, where the tracking time for each target is specified by the radar micro-manager. However, given that the goal is to minimize the error in the level of threat of each target, minimizing the tracking makespan would only minimize threat error if the state estimate error were proportional to the target's threat estimation error. That is, the higher the uncertainty in the target, the higher the level of threat in that target. However, this is not the case in general.
Consider Figure 3.5 for a visual reference to the fol- lowing argument. Consider a specific class of targets shown in the figure, namely weakly maneuvering targets. In the case where a target is moving away from an asset (as is the case for target l = 3), even though the state estimate is diverging from the true track, the threat error approaches zero. This makes intuitive sense as both the estimated target state and the true target state do not represent threats given their respective trajectories, albeit being different. However, if the target is moving towards an asset or if the 96 targets are considered to be highly maneuvering, the relationship between a target’s state estimate and its threat estimate may be complex. It is possible to determine this relationship by coming up with a predicted threat (and cor- responding predicted threat error) based on the target’s current behaviour. It would be interesting to take advantage of the predicted threat error and use this information to improve the performance of the macro-manager schedul- ing algorithm. This would require a generalization of the theory presented in the literature (regarding scheduling of jobs with wait-dependent processing times) to allow for nonlinear target-dependent degradation functions, since each target’s predicted threat error is dependent upon not only the target, but also its location to various assets in the environment. As of now, there does not exist any theory that can handle this scenario. 3. Precedence Constraints: In some tracking scenarios, it may be useful to perform tasks of a certain type before certain other tasks. A useful exam- ple of this in the context of radar tracking would be the desire to observe a particular region using a low resolution tracking mode (to detect a target), before using a higher-resolution, more expensive mode to track any targets in that particular region. The work of Garey [24] considers task scheduling problems with so-called precedence constraints. It would be interesting to apply the results of this topic to the radar tracking problem. 97 Bibliography [1] S. C. Albright. Optimal stock depletion policies with stochastic lives. Management Science, 22(8):852–857, 1976. [2] S. C. Albright. Structural results for partially observable markov decision processes. Operations Research, 27(5):1041–1053, 1979. [3] B. D. O. Anderson and J. B. Moore. Optimal Filtering. Prentice-Hall, Cliffs, NJ, 1979. [4] G. Anderson. Error propagation by the monte carlo method in geochemical calculations. Geochimica et Cosmochimica Acta, 40(12):1533 – 1538, 1976. [5] K. J. Arrow, T. Harris, and J. Marschak. Optimal inventory policy. Econometrica, 19(3):250–272, 1951. [6] M. Athans, R. Wishner, and A. Bertolini. Suboptimal state estimation for continuous-time nonlinear systems from discrete noisy measurements. Automatic Control, IEEE Transactions on, 13(5):504 – 514, Oct. 1968. [7] B. Balaji, A. Damini, and K. Wang. Persistent gmti surveillance: theoretical performance bounds and some experimental results. Radar Sensor Technology XIV, 7669(1):766910, 2010. [8] Y. Bar-Shalom and X. R. Li. Estimation and tracking - Principles, techniques, and software. Artech House, Northwood, MA, 1993. [9] R. E. Bellman. Dynamic Programming. Princeton University Press. Princeton, NJ, 1957. [10] D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1 and 2. Athena Scientific. Belmont, MA, 1995. 98 [11] S. S. Blackman and R. Popoli. Design and Analysis of Modern Tracking Systems. Artech House, 1999. [12] M. 
Brown and S. M. Ross. Optimal issuing policies. Management Science, 19(11):1292–1294, 1973. [13] M. Brown and H. Solomon. Optimal issuing policies under stochastic field lives. Journal of Applied Probability, 10(4):761–768, 1973. [14] S. Browne and U. Yechiali. Scheduling deteriorating jobs on a single processor. Operations Research, 38(3):495–498, 1990. [15] J. P. L. Cadre and S. Laurent-Michel. Optimizing the receiver maneuvers for bearings-only tracking. Automatica, 35(4):591 – 606, 1999. [16] E. Campbell, J. Hasenbein, and V. Tardif. Optimal scheduling disciplines in systems with wait-dependent service times. 2000. [17] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, Aug. 1991. [18] I. Dall. Threat assessment without situation assessment. In Information, Decision and Control. IDC 99., pages 365 –370, 1999. [19] C. Derman and M. Klein. Inventory depletion management. Management Science, 4(4):450–456, 1958. [20] Z. Ding. A survey of radar resource management algorithms. In Electrical and Computer Engineering, 2008. CCECE 2008. Canadian Conference on, pages 1559–1564, May 2008. [21] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010. [22] E. G. Enns and P. F. Ehlers. Optimal issuing policies for non-identical units with stochastic lifetimes. Journal of Applied Probability, 13(1):183–189, 1976. [23] M. C. Fu. Optimization for simulation: Theory vs. practice. INFORMS Journal on Computing, 14(3):192–215, July 2002. [24] M. R. Garey. Optimal task sequencing with precedence constraints. Discrete Mathematics, 4(1):37 – 56, 1973. 99 [25] I. Y. Gejadze, F.-X. Le Dimet, and V. Shutyaev. On optimal solution error covariances in variational data assimilation problems. Journal of Computational Physics, 229:2159–2178, Mar. 2010. [26] A. Gelb. Applied Optimal Estimation. MIT Press, Cambridge, MA, 1974. [27] S. Ghosh. Scalable QoS-based Resource Allocation. PhD thesis, Carnegie Mellon University, 2004. [28] S. Ghosh, R. Raj Rajkumar, J. Hansen, and J. Lehoczky. Integrated qos-aware resource management and scheduling with multi-resource constraints. Real-Time Systems, 33:7–46, 2006. [29] B. Gillespie, E. Hughes, and M. Lewis. Scan scheduling of multi-function phased array radars using heuristic techniques. In Radar Conference, 2005 IEEE International, pages 513 – 518, May 2005. [30] K. D. Glazebrook. On permutation policies for the scheduling of deteriorating stochastic jobs on a single machine. Journal of Applied Probability, 30(1):184–193, 1993. [31] D. Granot and D. Zuckerman. Optimal sequencing and resource allocation in research and development projects. Management Science, 37(2): 140–156, 1991. [32] J. A. Greenwood. Issue priority: Last in first out (lifo) vs first in first out (fifo) as a method of issuing items from supply storage. Naval Research Logistics Quarterly, 2(4):251, 1955. [33] F. Gustafsson and G. Hendeby. On nonlinear transformations of stochastic variables and its application to nonlinear filtering. In Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pages 3617 –3620, Apr. 2008. [34] J. Hansen, S. Ghosh, R. Rajkumar, and J. Lehoczky. Resource management of highly configurable tasks. In Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International, page 116, Apr. 2004. [35] F. Harada, T. Ushio, and Y. Nakamoto. Adaptive resource allocation control for fair qos management. 
Computers, IEEE Transactions on, 56(3): 344 –357, Mar. 2007. [36] J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. In Proceedings of the National Academy of Sciences of the USA, volume 79, pages 2554–2558, Apr. 1982. 100 [37] A. Irci, A. Saranli, and B. Baykal. Study on q-ram and feasible directions based methods for resource management in phased array radar systems. Aerospace and Electronic Systems, IEEE Transactions on, 46(4):1848 –1864, Oct. 2010. [38] R. S. Irving. Integers, polynomials, and rings. Springer-Verlag New York, 2004. [39] A. Izquierdo-Fuente and J. Casar-Corredera. Optimal radar pulse scheduling using a neural network. In Neural Networks. IEEE World Congress on Computational Intelligence., IEEE International Conference on, volume 7, pages 4588 –4591, June 1994. [40] A. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press. New York, NY., 1970. [41] S. Ji, R. Parr, and L. Carin. Nonmyopic multiaspect sensing with partially observable markov decision processes. Signal Processing, IEEE Transactions on, 55(6):2720 –2730, June 2007. [42] F. Johansson and G. Falkman. A bayesian network approach to threat evaluation with application to an air defense scenario. Information Fusion, 2008 11th International Conference on, pages 1 –7, July 2008. [43] D. Kershaw and R. Evans. Optimal waveform selection for tracking systems. Information Theory, IEEE Transactions on, 40(5):1536 –1550, Sept. 1994. [44] D. Kershaw and R. Evans. Waveform selective probabilistic data association. Aerospace and Electronic Systems, IEEE Transactions on, 33 (4):1180 –1188, Oct. 1997. [45] A. Klenke and L. Mattner. Stochastic ordering of classical discrete distributions. ArXiv e-prints, Mar. 2009. [46] W. Komorniczak, T. Kuczerski, and J. Pietrasinski. The priority assignment for detected targets in multi-function radar. In Microwaves, Radar and Wireless Communications. 2000. MIKON-2000. 13th International Conference on, volume 1, pages 244 –247, 2000. [47] C. Kreucher, D. Blatt, and A. Hero. Adaptive multi-modality sensor scheduling for detection and tracking of smart targets. In Digital Signal Processing, 2005. 101 [48] V. Krishnamurthy and D. Djonin. Structured threshold policies for dynamic sensor scheduling - partially observed markov decision process approach. Signal Processing, IEEE Transactions on, 55(10):4938 – 4957, Oct. 2007. [49] V. Krishnamurthy and D. Djonin. Optimal threshold policies for multivariate pomdps in radar resource management. Signal Processing, IEEE Transactions on, 57(10):3954 –3969, Oct. 2009. [50] V. Krishnamurthy and R. Evans. Hidden markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking. Signal Processing, IEEE Transactions on, 49(12):2893 –2908, Dec. 2001. [51] V. Krishnamurthy, R. Bitmead, M. Gevers, and E. Miehling. Sequential detection with mutual information stopping cost. submitted to IEEE Transactions on Signal Processing, 2011. [52] B. La Scala, W. Moran, and R. Evans. Optimal adaptive waveform selection for target detection. In Radar Conference, 2003. Proceedings of the International, pages 492 – 496, Sept. 2003. [53] B. La Scala, M. Rezaeian, and B. Moran. Optimal adaptive waveform selection for target tracking. In Information Fusion, 2005 8th International Conference on, volume 1, pages 552–557, July 2005. [54] A. Lad, Y. Yang, R. Ghani, and B. Kisiel. Toward optimal ordering of prediction tasks. In SDM’09, pages 884–893, 2009. [55] D. Lerro and Y. Bar-Shalom. 
Tracking with debiased consistent converted measurements versus ekf. Aerospace and Electronic Systems, IEEE Transactions on, 29(3):1015 –1022, July 1993. [56] G. Leu and R. Baratti. An extended kalman filtering approach with a criterion to set its tuning parameters; application to a catalytic reactor. Computers & Chemical Engineering, 23(11-12):1839 – 1849, 2000. [57] X. Liu and A. Goldsmith. Kalman filtering with partial observation losses. In Decision and Control, 2004. CDC. 43rd IEEE Conference on, volume 4, pages 4180 – 4186, Dec. 2004. [58] W. Lovejoy. A survey of algorithmic methods for partially observed markov decision processes. Annals of Operations Research, 28:47–65, 1991. 102 [59] W. S. Lovejoy. Some monotonicity results for partially observed markov decision processes. Operations Research, 35(5):736–743, 1987. [60] D. Manolakis. Kronecker product based second order approximation of mean value and covariance matrix in nonlinear transformations. Signal Processing Letters, IEEE, 18(1):43 –46, Jan. 2011. [61] J. I. Marcum. Statistical theory of target detection by pulsed radar. Technical report, United States Air Force, Project RAND, Dec. 1947. [62] S. Miranda, C. Baker, K. Woodbridge, and H. Griffiths. Knowledge-based resource management for multifunction radar: a look at scheduling and task prioritization. Signal Processing Magazine, IEEE, 23(1):66 – 76, Jan. 2006. [63] S. Miranda, C. Baker, K. Woodbridge, and H. Griffiths. Fuzzy logic approach for prioritisation of radar tasks and sectors of surveillance in multifunction radar. Radar, Sonar Navigation, IET, 1(2):131 –141, Apr. 2007. [64] S. L. C. Miranda, C. J. Baker, K. Woodbridge, and H. D. Griffiths. Simulation methods for prioritising tasks and sectors of surveillance in phased array radars. International Journal of Simulation, 5(1-2):1825, 2004. [65] B. Ochoa and S. Belongie. Covariance propagation for guided matching. 3rd Workshop on Statistical Methods in Multi-Image and Video Processing, 2006. [66] N. Okello and G. Thoms. Threat assessment using bayesian networks. Proceedings of the Sixth International Conference on Information Fusion, 2003. [67] A. Orman, C. Potts, A. Shahani, and A. Moore. Scheduling for a multifunction phased array radar system. European Journal of Operational Research, 90(1):13 – 25, 1996. [68] M. Pelosi. Range Limited UAV Trajectory using Terrain Masking under Radar Detection Risk. PhD thesis, Nova Southeastern University, 2010. [69] W. P. Pierskalla and C. D. Roach. Optimal issuing policies for perishable inventory. Management Science, 18(11):603–614, 1972. [70] J. Pietrasinski and W. Komorniczak. Application of artificial intelligence in radar resource management. In Microwaves and Radar, 1998. MIKON ’98., 12th International Conference on, volume 1, pages 138 –142, May 1998. 103 [71] M. L. Pinedo. Scheduling: Theory, Algorithms, and Systems. Prentice-Hall, 3rd edition, 2008. [72] A. Polychronopoulos, M. Tsogas, A. Amditis, S. L. Andreone, and F. Tango. Dynamic situation and threat assessment for collision warning systems: the euclide approach. 2004. [73] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, Apr. 1994. [74] R. Rajkumar, C. Lee, J. Lehoczky, and D. Siewiorek. A resource allocation model for qos management. In Real-Time Systems Symposium, 1997. Proceedings., The 18th IEEE, pages 298 –307, Dec. 1997. [75] M. Rezaeian. Sensor scheduling for optimal observability using estimation entropy. 
In Pervasive Computing and Communications Workshops, 2007. PerCom Workshops ’07. Fifth Annual IEEE International Conference on, pages 307 –312, Mar. 2007. [76] J. A. Richards. Gmti radar minimum detectable velocity. Technical report, Sandia National Laboratories, Apr. 2011. [77] R. Rimey, W. Hoff, and J. Y. Lee. Recognizing wide-area and process-type activities. In Information Fusion, 2007 10th International Conference on, pages 1–8, July 2007. [78] S. Ross. Introduction to Stochastic Dynamic Programming. New York, Academic Press, 1983. [79] J. N. Roux and J. H. van Vuuren. Threat evaluation and weapon assignment decision support: A review of the state of the art. ORiON, 23:151–186, 2007. [80] J. Roy, S. Paradis, and M. Allouche. Threat evaluation for impact assessment in situation analysis systems. Signal Processing, Sensor Fusion, and Target Recognition XI. SPIE., 4729(329), 2002. [81] P. Sarunic and R. Evans. Adaptive update rate tracking using imm nearest neighbour algorithm incorporating rapid re-looks. Radar, Sonar and Navigation, IEE Proceedings -, 144(4):195 –204, Aug. 1997. [82] C. Savage and B. L. Scala. Sensor management for tracking smart targets. Digital Signal Processing, 19(6):968 – 977, 2009. DASP’06 - Defense Applications of Signal Processing. 104 [83] B. L. Scala and B. Moran. Optimal target tracking with restless bandits. Digital Signal Processing, 16(5):479 – 487, 2006. [84] C.-S. Shih, S. Gopalakrishnan, P. Ganti, M. Caccamo, and L. Sha. Template-based real-time dwell scheduling with energy constraint. In Real-Time and Embedded Technology and Applications Symposium, 2003. Proceedings. The 9th IEEE, pages 19 – 27, May 2003. [85] H.-J. Shin, S.-M. Hong, and D.-H. Hong. Adaptive-update-rate target tracking for phased-array radar. Radar, Sonar and Navigation, IEE Proceedings -, 142(3):137 –143, June 1995. [86] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla, M. Jordan, and S. Sastry. Kalman filtering with intermittent observations. Automatic Control, IEEE Transactions on, 49(9):1453 – 1464, Sept. 2004. [87] S. Sowelam and A. Tewfik. Waveform selection in radar target classification. Information Theory, IEEE Transactions on, 46(3):1014 –1029, May 2000. [88] J. Spall. Introduction to Stochastic Search and Optimization. Wiley, New York, NY, 2003. [89] D. Stromberg and P. Grahn. Scheduling of tasks in phased array radar. In Phased Array Systems and Technology, IEEE International Symposium on, pages 318 –321, Oct. 1996. [90] D. Topkis. Supermodularity and Complementarity. Princeton University Press. Princeton, NJ, 1998. [91] US-Air-Force. E-8c joint stars factsheet. http://www.af.mil/information/factsheets/factsheet.asp?id=100, Sept. 2007. [92] P. van Genderen. State-of-the-art and trends in phased array radar. In Perspectives on Radio Astronomy: Technologies for Large Antenna Arrays, pages 1–10, 2000. [93] G. van Keuk and S. Blackman. On phased-array radar tracking and parameter control. Aerospace and Electronic Systems, IEEE Transactions on, 29(1):186 –194, Jan. 1993. [94] V. Vannicola and J. Mineo. Expert system for sensor resource allocation. In Proceedings of the 33rd Midwest Symposium on Circuits and Systems, 1990. 105 [95] M. Vine. Fuzzy logic in radar resource management. In Multifunction Radar and Sonar Sensor Management Techniques (Ref. No. 2001/173), IEE, pages 5/1 – 5/4, Nov. 2001. [96] Y. Wang and G. S. Chirikjian. Nonparametric second-order theory of error propagation on motion groups. The International Journal of Robotics Research November, 27(11-12), Dec. 
2008.

[97] R. Washburn, M. Schneider, and J. Fox. Stochastic dynamic programming based approaches to sensor resource management. In Information Fusion, 2002. Proceedings of the Fifth International Conference on, volume 1, pages 608–615, 2002.
[98] R. Whittle. Gorgon Stare broadens UAV surveillance. Aviation Week, a division of The McGraw-Hill Companies, Nov. 2010.
[99] J. Wintenby and V. Krishnamurthy. Hierarchical resource management in adaptive airborne surveillance radars. Aerospace and Electronic Systems, IEEE Transactions on, 42(2):401–420, Apr. 2006.
[100] P. J. Winzer. Accuracy of error propagation exemplified with ratios of random variables. Review of Scientific Instruments, 71(3):1447–1454, Mar. 2000.

Appendix A: Tables

            D(s̄_10, ξ_10)   D(s̄_50, ξ_50)   D(s̄_100, ξ_100)
  s^a_0        0.0010           0.0052            0.0104
  s^b_0        0.0009           0.0049            0.0104
  s^c_0        0.0010           0.0059            0.0119
  s^d_0        0.0007           0.0040            0.0080
  s^e_0        0.0010           0.0053            0.0112

Table A.1: Rate of change of the Jacobian matrix for various running times.

            E(s_10, s̄_10, ξ_10, 0.1)   E(s_50, s̄_50, ξ_50, 0.1)   E(s_100, s̄_100, ξ_100, 0.1)
  s^a_0        0.00019091                  0.0010597                   0.01395
  s^b_0        0.00020866                  0.0011699                   0.014375
  s^c_0        0.00019165                  0.0010813                   0.01453
  s^d_0        0.0002008                   0.0011065                   0.011844
  s^e_0        0.0002294                   0.0012735                   0.015946

Table A.2: Ratio of the second-order term to the first-order term of the Taylor series expansion, ζ = 0.1.

            E(s_10, s̄_10, ξ_10, 0.8)   E(s_50, s̄_50, ξ_50, 0.8)   E(s_100, s̄_100, ξ_100, 0.8)
  s^a_0        0.00019267                  0.0011104                   0.014211
  s^b_0        0.00021104                  0.0012164                   0.014633
  s^c_0        0.00019447                  0.0011228                   0.014838
  s^d_0        0.00020148                  0.0011386                   0.012178
  s^e_0        0.00023083                  0.0013596                   0.016603

Table A.3: Ratio of the second-order term to the first-order term of the Taylor series expansion, ζ = 0.8.

Appendix B: Derivations

B.1 Target Model Derivation

This section derives the discrete white noise acceleration target model, referring to Blackman and Popoli [11] and Bar-Shalom [8]. There are multiple ways to derive the following model; however, the following direct derivation in the discrete-time domain is simplest. It is common to use a constant velocity model to represent the behaviour of targets. In this case, it would be logical to set the acceleration equal to zero, then apply integration to obtain expressions for the velocity and position of the target. However, in practice, targets do not have a constant velocity, but rather undergo small deviations. Thus we equate the acceleration of a target to white noise instead, accounting for the small changes in velocity,

    a_k = w_k    (B.1)

where the white noise process is characterized by

    E[w_k] = 0    (B.2)
    E[w_k w_l] = σ_w² δ_kl    (B.3)

where δ_kl is the Kronecker delta function.

For the purposes of this derivation, we assume motion only along one dimension. Since we have assumed that the x and y dynamics of the target are independent of each other, the generalization to two dimensions is straightforward. Recall the form of the state equation,

    s^l_{k+1} = F_1 s^l_k + G_1 w^l_k    (B.4)

where F_1 is the state transition matrix (for one dimension) and G_1 is the gain vector. From elementary physics, we know that a particle moving at a velocity ẋ_k for a time T will travel a distance of ∫_0^T ẋ_k dt = T ẋ_k. This justifies the form of the state transition matrix defined below. Recall from Equation (B.1) that the acceleration during the kth iteration is constant (see Figure B.1).

Figure B.1: Acceleration during the kth iteration of length T.

Thus, the change in displacement for the target is w_k T²/2, with a gain in velocity of w_k T.
Thus the state matrix, F_1, and the gain vector, G_1, are defined as follows,

    F_1 = [ 1  T ; 0  1 ] ,    G_1 = [ T²/2 ; T ]    (B.5)

Similar to before, the covariance matrix of the process noise is defined as,

    Q = E[G_1 w_k w_k G_1^T] = G_1 E[w_k w_k] G_1^T = G_1 σ_w² G_1^T
      = [ T⁴σ_w²/4   T³σ_w²/2 ;  T³σ_w²/2   T²σ_w² ]    (B.6)

Extending this to two dimensions is straightforward. As mentioned earlier, we assume the filtering across coordinates is decoupled [8]. Also, since each component of the target's motion is assumed to be governed by the same model, the matrices describing the multi-dimensional system are just block-diagonal repetitions of the matrices in the one-dimensional case. That is,

    F = [ 1  T  0  0
          0  1  0  0
          0  0  1  T
          0  0  0  1 ] ,    (B.7)

with process noise covariance matrix,

    Q = [ T⁴σ_x²/4   T³σ_x²/2   0          0
          T³σ_x²/2   T²σ_x²     0          0
          0          0          T⁴σ_y²/4   T³σ_y²/2
          0          0          T³σ_y²/2   T²σ_y²   ]

B.2 Approximation and Justification of the Linear Gaussian State Space Model

In order to apply Kalman filtering techniques, we require our state and observation equations to follow an LGSSM. An LGSSM requires the state and measurement matrices to be linear and time-invariant and the noise terms to be zero-mean, Gaussian random variables. Since the model discussed in Section 2.2 is clearly not linear or time-invariant (due to the measurement process), it is necessary to approximate the nonlinear, time-varying model by a linear, time-invariant model. We then analyze how accurate this approximation is for typical tracking scenarios.

The first step is to linearize the model. We follow Jazwinski [40] and linearize around a deterministic target trajectory, s̄^l_{k+1} = F s̄^l_k, and nominal measurement, z̄^l_k = h(s̄^l_k, ξ_k). Letting s̃^l_k = s^l_k − s̄^l_k and z̃^l_k = z^l_k − z̄^l_k, the first-order Taylor series expansion around the nominal trajectory results in the following model,

    s̃^l_{k+1} = F s̃^l_k + G w^l_k    (B.8)

    z̃^l_k = ∇_s h(s̄^l_k, ξ_k) s̃^l_k + R_1(s^l_k, s̄^l_k, ξ_k, γ) + (1/√(ν^l ∆)) v^l_k    with probability p^l_d
    z̃^l_k = ∅                                                                          with probability 1 − p^l_d    (B.9)

where R_1(s^l_k, s̄^l_k, ξ_k, γ) represents the linearization error and is bounded as

    ‖R_1(s^l_k, s̄^l_k, ξ_k, γ)‖ ≤ (1/2) ‖(s^l_k − s̄^l_k)^T ∇²h(ζ, ξ) (s^l_k − s̄^l_k)‖ ,

with ζ = γ s^l_k + (1 − γ) s̄^l_k for some γ ∈ [0, 1]. This expression is dependent upon the Hessian tensor, ∇²h(s̄^l_k, ξ_k) ∈ R^{3×4×4}. The first term in Equation (2.2) corresponds to the Jacobian, ∇_s h(s̄^l_k, ξ_k), defined as (with indices k and l omitted for simplicity)

    ∇_s h(s, ξ) = ∂h_i(s, ξ)/∂s_j
      = [ δ_x/r                                  0       δ_y/r                                  0
          −δ_y/(δ_x² + δ_y²)                     0       δ_x/(δ_x² + δ_y²)                      0
          δ̇_x/r − δ_x(δ_x δ̇_x + δ_y δ̇_y)/r³     δ_x/r   δ̇_y/r − δ_y(δ_x δ̇_x + δ_y δ̇_y)/r³     δ_y/r ] ,

    r = √(δ_x² + δ_y² + ξ_z²)    (B.10)

with relative coordinates defined as δ_x = x − ξ_x and δ_y = y − ξ_y, and similarly for the relative velocities δ̇_x and δ̇_y.

We seek to evaluate the accuracy of the linear and time-invariant approximation. First, consider the following quantities,

    E(s_k, s̄_k, ξ_k, ζ) = (1/2) ‖(s_k − s̄_k)^T ∇²h(ζ s_k + (1 − ζ) s̄_k, ξ_k) (s_k − s̄_k)‖ / ‖∇_s h(s̄_k, ξ_k) s̃_k‖    (B.11)

    D(s̄_k, ξ_k) = ‖∇_s h(s̄_k, ξ_k) − ∇_s h(s̄_0, ξ_0)‖ / ‖∇_s h(s̄_k, ξ_k)‖    (B.12)

The first ratio, E(s_k, s̄_k, ξ_k, ζ), computes the error in linearizing the model by comparing the magnitude of the second-order term to that of the first-order term of the Taylor series expansion of h(s, ξ). Recall from the definition of R_1(s_k, s̄_k, ξ_k, ζ) that

    ‖R_1(s_k, s̄_k, ξ_k, ζ)‖ / ‖∇_s h(s̄_k, ξ_k) s̃_k‖ ≤ E(s_k, s̄_k, ξ_k, ζ)    (B.13)

Thus by evaluating E(s_k, s̄_k, ξ_k, ζ) for typical tracking scenarios, we are bounding the linearization error. The second ratio, D(s̄_k, ξ_k), evaluates how much the first-order approximation of h(s, ξ) varies over time, and thus indicates the accuracy of the time-invariance property of the new model.
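The Jacobian in (B.10) is straightforward to evaluate numerically. The short Python sketch below is an illustration only; the state ordering s = [x, ẋ, y, ẏ], the appended platform altitude, and the variable names are assumptions consistent with the derivation above, not code from the thesis.

```python
import numpy as np

def gmti_jacobian(s, xi):
    """Jacobian of the range/azimuth/range-rate measurement, Equation (B.10).
    s  = [x, x_dot, y, y_dot]                      -- target state (assumed ordering)
    xi = [xi_x, xi_x_dot, xi_y, xi_y_dot, xi_z]    -- platform state, altitude appended"""
    dx, dxd = s[0] - xi[0], s[1] - xi[1]    # relative position / velocity, x
    dy, dyd = s[2] - xi[2], s[3] - xi[3]    # relative position / velocity, y
    r = np.sqrt(dx**2 + dy**2 + xi[4]**2)   # slant range
    g = dx**2 + dy**2                       # squared ground range
    c = (dx * dxd + dy * dyd) / r**3        # common term in the range-rate row
    return np.array([
        [dx / r,            0.0,    dy / r,            0.0],
        [-dy / g,           0.0,    dx / g,            0.0],
        [dxd / r - dx * c,  dx / r, dyd / r - dy * c,  dy / r],
    ])

# Example evaluation at initial state s^a_0 with the platform state used in the example below.
H = gmti_jacobian(np.array([100.0, 3.0, 40.0, 7.0]),
                  np.array([-35e3, 100.0, -15e3, 20.0, 10203.2]))
```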
Consider the following significantly different initial target states (a)–(e),

    s^a_0 = [100  3  40  7],    s^b_0 = [−20  −4  200  1],    s^c_0 = [50  2  95  10],
    s^d_0 = [−70  5  −50  −6],    s^e_0 = [150  −15  10  0]    (B.14)

The platform states are defined as follows,

    ξ_k = [ξ_{x_k}, ξ̇_{x_k}, ξ_{y_k}, ξ̇_{y_k}]^T = [−35 km, 100 m/s, −15 km, 20 m/s]^T    (B.15)

Note that we have assumed that the platform moves along a linear track with a constant speed. Recall the equation relating the platform altitude to the x and y displacement of the aircraft, ξ_z = √(ξ_x² + ξ_y²) tan θ_d. Assuming a depression angle of θ_d = 15°, the corresponding platform altitude is ξ_z = 10203.2 m.

We now allow the system to propagate according to the following model: T = 0.1 s, σ_x = σ_y = 0.5, σ_r = 20 m, σ_ṙ = 5 m/s, σ_a = 0.5°, with a true track variability parameter of σ_p = 1.5. The value σ_p is used in place of σ_x and σ_y to simulate the target tracks (treated as true data for simulation purposes). The tables in Appendix A present the values of the quantities in Equations (B.11) and (B.12) at various running times, k. Notice that the analysis was performed for a maximum iteration count of k = 100. At a sampling time of T = 0.1 s, this corresponds to a maximum usage length of 10 s for a specific tracking allocation. As will be seen later in the simulations section, the typical stopping time was around k = 10 to 15 iterations (1 to 1.5 seconds), thus the assumption of a maximum time of 10 seconds is quite conservative.

Consider the tables in Appendix A. Table A.1 represents the rate of change of the Jacobian matrix with respect to the micro-manager tracking time, k, while Tables A.2 and A.3 compare the ratio of the second-order term of the Taylor series expansion to the first-order term, for ζ = 0.1 and ζ = 0.8, respectively. We can see that the values in these tables are small, with approximate bounds of

    ‖R_1(s_k, s̄_k, ξ_k, ζ)‖ / ‖∇_s h(s̄_k, ξ_k) s̃_k‖ ≤ E(s_k, s̄_k, ξ_k, ζ) ≤ 0.016    (B.16)

    D(s̄_k, ξ_k) = ‖∇_s h(s̄_k, ξ_k) − ∇_s h(s̄_0, ξ_0)‖ / ‖∇_s h(s̄_k, ξ_k)‖ ≤ 0.012    (B.17)

for any of the listed initial states and for all k ≤ 100. Thus, on the micro-manager time scale, we can safely state that the system is well-approximated by the following linear, time-invariant state space model (see footnote 1),

    s^l_{k+1} = F s^l_k + G w^l_k    (B.18)

    z^l_k = H^l s^l_k + (1/√(ν^l ∆)) v^l_k    w.p. p^l_d
    z^l_k = ∅                                 w.p. 1 − p^l_d    (B.19)

where the linearized measurement model is defined as H^l = ∇_s h(s̄^l_k, ξ_k), evaluated using the initial target and platform states.

Footnote 1: The Gaussian property and covariance calculations follow similarly under the assumption that the SNR is moderate, allowing the higher-order terms to be omitted.
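Under the model just stated, the matrices of the LTI approximation can be assembled directly from the parameters above. The sketch below is an illustrative assumption rather than the thesis' code: it reuses the gmti_jacobian function from the previous sketch and assumes the measurement-noise covariance takes the usual diagonal form R = diag(σ_r², σ_a², σ_ṙ²).

```python
import numpy as np

T = 0.1            # sampling time (s)
sx = sy = 0.5      # process noise standard deviations

# Per-axis DWNA blocks (Equations (B.5)-(B.6)), repeated block-diagonally as in (B.7).
F1 = np.array([[1.0, T], [0.0, 1.0]])
def Q1(var):
    return var * np.array([[T**4 / 4, T**3 / 2], [T**3 / 2, T**2]])

F = np.block([[F1, np.zeros((2, 2))], [np.zeros((2, 2)), F1]])
Q = np.block([[Q1(sx**2), np.zeros((2, 2))], [np.zeros((2, 2)), Q1(sy**2)]])

# Linearized measurement matrix at the initial target and platform states, and an
# assumed diagonal measurement-noise covariance built from sigma_r, sigma_a, sigma_rdot.
s0 = np.array([100.0, 3.0, 40.0, 7.0])                 # initial target state s^a_0
xi0 = np.array([-35e3, 100.0, -15e3, 20.0, 10203.2])   # platform state with altitude xi_z
H = gmti_jacobian(s0, xi0)                             # from the previous sketch
R = np.diag([20.0**2, np.deg2rad(0.5)**2, 5.0**2])
```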
B.3 Kalman Filtering with Intermittent Observations

Since we cannot assume that the radar successfully receives all observations, we must reconsider the classical Kalman filtering problem, this time accounting for the presence of missed detections. The presence of intermittent observations causes the normal (p_d = 1) Riccati recursion, Equation (2.17), to no longer hold. Note that it is assumed here that the observations are dropped completely, rather than only partially received as in [57]. The general recursion, stated below, takes the probability of detection into account and is derived in [86]. We provide an outline of the derivation here to give the reader some intuition into the filtering equations.

The derivation begins by treating the arrival of an observation as a Bernoulli random variable, κ, with PMF f(κ; p_d) = p_d^κ (1 − p_d)^{1−κ}, where p_d is the probability of detection. The output noise for a given time step k and target l, υ_k = (1/√(ν^l ∆)) v^l_k, is defined as (index l omitted)

    Pr(υ_k | κ_k) = N(0, R)      if κ_k = 1
    Pr(υ_k | κ_k) = N(0, σ² I)   if κ_k = 0    (B.20)

for a given σ². The idea of the derivation is to use a "dummy" observation with a fixed variance when the observation is missed, and then take the limit as σ approaches infinity. Incorporating this idea into the equations for the state and covariance updates, we obtain

    ŝ^l_{k|k} = ŝ^l_{k|k−1} + P_{k|k−1} H^T [ H P_{k|k−1} H^T + 1_{{1}}(κ_k) R + 1_{{0}}(κ_k) σ² I ]^{−1} (z^l_k − H ŝ^l_{k|k−1})    (B.21)

    P^l_{k|k} = P^l_{k|k−1} − P^l_{k|k−1} H^T [ H P^l_{k|k−1} H^T + 1_{{1}}(κ_k) R + 1_{{0}}(κ_k) σ² I ]^{−1} H P^l_{k|k−1}    (B.22)

where the indicator function is defined as

    1_A(x) = 1 if x ∈ A,    1_A(x) = 0 if x ∉ A    (B.23)

Taking the limit as σ² → ∞, it can be shown [86] that Equations (B.21) and (B.22), when combined with Equations (2.14) and (2.16), form the modified state update and the modified Riccati update,

    ŝ^l_{k+1} = F ŝ^l_k + κ_k F P^l_k H^T [ H P^l_k H^T + R ]^{−1} (z^l_k − H ŝ^l_k)    (B.24)

    P^l_{k+1} = F P^l_k F^T + Q − κ_k F P^l_k H^T (H P^l_k H^T + R)^{−1} H P^l_k F^T    (B.25)

where we use the simplifying notation ŝ^l_k = ŝ^l_{k|k−1} and P^l_k = P^l_{k|k−1}. Note that in the deterministic update the Riccati recursion is guaranteed to converge to some stationary matrix; however, due to the random process κ_1, κ_2, ..., κ_k, the update is now stochastic and convergence is no longer guaranteed. A numerical lower bound on the value of p_d that ensures convergence is discussed in [86]. For the purposes of this thesis, we consider a probability of detection high enough that we do not need to worry about the possibility of the modified Riccati update diverging from the stationary matrix (see footnote 2).

Note that κ_k = 1 − 1_∅(z^l_k) = 1_{∅^c}(z^l_k), thus the modified Riccati update is in fact measurement-dependent (and model-dependent), unlike the classical Riccati update in Equation (2.17), which depends only upon the model. We define the following error covariance update functions (the measurement-dependent Riccati update and the measurement-free Lyapunov update) for later use,

    R(P^l_k, z^l_k) := F P^l_k F^T + Q − 1_{∅^c}(z^l_k) F P^l_k H^T (H P^l_k H^T + R)^{−1} H P^l_k F^T    (B.26)

    L(P^l_k) := F P^l_k F^T + Q    (B.27)

Note that the Lyapunov update, Equation (B.27), is equivalent to the Riccati update, Equation (B.26), for a given time k if z^l_k = ∅. That is, when an observation is missed, the covariance update is equivalent to simply predicting the error covariance.

Footnote 2: This stationary matrix is no longer unique (unlike that of the standard algebraic Riccati equation) due to the stochastic nature of the modified Riccati update.
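A minimal Python sketch of the two covariance updates (B.26)–(B.27) is shown below. The function names and the use of None to represent a missed observation (z = ∅) are illustrative assumptions, not the thesis' implementation. With the matrices F, Q, H, and R from the earlier sketches, calling riccati_update with z = None reproduces the prediction-only behaviour noted above.

```python
import numpy as np

def riccati_update(P, z, F, Q, H, R):
    """Measurement-dependent Riccati update R(P, z), Equation (B.26); a missed
    observation (z is None, representing z = "empty set") reduces it to (B.27)."""
    P_pred = F @ P @ F.T + Q
    if z is None:                       # observation missed: prediction only
        return P_pred
    S = H @ P @ H.T + R                 # innovation covariance
    return P_pred - F @ P @ H.T @ np.linalg.solve(S, H @ P @ F.T)

def lyapunov_update(P, F, Q):
    """Measurement-free Lyapunov update L(P), Equation (B.27)."""
    return F @ P @ F.T + Q
```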
B.4 Threat Error Covariance Derivation

The details of the second-order Taylor approximation of the threat error are presented here. Keeping two terms of the expansion in Equation (3.22), we obtain the following estimate, with s = E{vec(e e^T)} = vec(P) ∈ R^{16×1},

    P_Γ ≈ E{(r − E{r})(r − E{r})^T}    (B.28)

       = E{ ( Γ(ŝ) + ∇_sΓ e + (1/2) ∇²_sΓ (e ⊗ e) − ( Γ(ŝ) + (1/2) ∇²_sΓ s ) )
            ( Γ(ŝ) + ∇_sΓ e + (1/2) ∇²_sΓ (e ⊗ e) − ( Γ(ŝ) + (1/2) ∇²_sΓ s ) )^T }    (B.29)

       = E{ ( ∇_sΓ e + (1/2) ∇²_sΓ (e ⊗ e) − (1/2) ∇²_sΓ s )
            ( ∇_sΓ e + (1/2) ∇²_sΓ (e ⊗ e) − (1/2) ∇²_sΓ s )^T }    (B.30)

       = ∇_sΓ E{e e^T} (∇_sΓ)^T + (1/2) ∇_sΓ E{e (e ⊗ e)^T} (∇²_sΓ)^T − (1/2) ∇_sΓ E{e s^T} (∇²_sΓ)^T
         + (1/2) ∇²_sΓ E{(e ⊗ e) e^T} (∇_sΓ)^T + (1/4) ∇²_sΓ E{(e ⊗ e)(e ⊗ e)^T} (∇²_sΓ)^T − (1/4) ∇²_sΓ E{(e ⊗ e) s^T} (∇²_sΓ)^T
         − (1/2) ∇²_sΓ E{s e^T} (∇_sΓ)^T − (1/4) ∇²_sΓ E{s (e ⊗ e)^T} (∇²_sΓ)^T + (1/4) ∇²_sΓ E{s s^T} (∇²_sΓ)^T    (B.31)

Now, using the fact that e is zero-mean, the above simplifies to

    P_Γ ≈ ∇_sΓ P (∇_sΓ)^T + (1/2) ∇_sΓ C (∇²_sΓ)^T + (1/2) ∇²_sΓ C^T (∇_sΓ)^T + (1/4) ∇²_sΓ (D − s s^T) (∇²_sΓ)^T    (B.32)

where C = E{e (e ⊗ e)^T} and D = E{(e ⊗ e)(e ⊗ e)^T} = E{(e e^T) ⊗ (e e^T)}. Since the PDF of the noise term e is symmetric about its mean (e is a Gaussian-distributed random variable), the third-order moments contained in the matrix C are zero [60]. The expression for the error covariance of the threat then reduces to

    P_Γ ≈ ∇_sΓ P (∇_sΓ)^T + (1/4) ∇²_sΓ (D − s s^T) (∇²_sΓ)^T    (B.33)

The matrix D contains fourth-order moments and, along with the matrix s s^T, is of dimension 16×16. The matrix D can equivalently be regarded as a 4×4 block matrix with 4×4 matrix elements,

    D = E[ e_1² e e^T      e_1 e_2 e e^T   e_1 e_3 e e^T   e_1 e_4 e e^T
           e_2 e_1 e e^T   e_2² e e^T      e_2 e_3 e e^T   e_2 e_4 e e^T
           e_3 e_1 e e^T   e_3 e_2 e e^T   e_3² e e^T      e_3 e_4 e e^T
           e_4 e_1 e e^T   e_4 e_2 e e^T   e_4 e_3 e e^T   e_4² e e^T ]    (B.34)

    ⇒ D_{mn,ij} = E{e_m e_n e_i e_j}    (B.35)

where e_i denotes the ith element of the error vector e ∈ R^{4×1}, and m, n and i, j refer to the block's and element's row and column, respectively. For Gaussian PDFs, D can be computed from the error covariance matrix, P, using the following Gaussian approximation [26] (p. 192),

    D_{mn,ij} = P_{mn} P_{ij} + P_{mi} P_{nj} + P_{mj} P_{ni}    (B.36)

The elements of the matrix s s^T can be written in terms of the elements of the covariance matrix P as

    (s s^T)_{mn,ij} = P_{mi} P_{nj}    (B.37)

Appendix C: Proofs

C.1 Proofs: Macro-manager

C.1.1 Proof of Lemma 3.4.1

Proof. We begin by computing the derivatives of ‖ρ‖₂² with respect to p,

    ∂/∂p ‖ρ‖₂² = 4n²p³ − 6n²p² + 4n(n + σ²)p − 2n(µ + σ²)    (C.1)

    ∂²/∂p² ‖ρ‖₂² = 12n²p(p − 1) + 4n(n + σ²)    (C.2)

The general solution method to determine the optimal parameter p in the least-squares sense is to compute the gradient and equate it to zero. Since we only have one parameter, we have a single gradient equation, given by Equation (C.1). In order to ensure that we minimize ‖ρ‖₂² when equating the gradient to zero, we must first be certain that ‖ρ‖₂² is convex in p. This is equivalent to checking whether the second derivative, Equation (C.2), is positive for all p,

    ∂²/∂p² ‖ρ‖₂² > 0    (C.3)
    12n²p(p − 1) + 4n(n + σ²) > 0    (C.4)
    3np(p − 1) + n + σ² > 0    (C.5)

Referring to Figure 3.13(b), we can see that the maximum value of the variance of a binomial distribution, σ_b² = np(1 − p), is n/4, achieved at p = 1/2. Thus the first term in Equation (C.5) is bounded below by −3n/4, that is,

    3np(p − 1) + n + σ² ≥ −3n/4 + n + σ² = n/4 + σ² > 0    (C.6)

since n, σ² > 0. As a result, the second derivative of ‖ρ‖₂² with respect to p is strictly positive, and equivalently ‖ρ‖₂² is convex in p. Using the convexity result, we can minimize ‖ρ‖₂² by equating the first derivative, Equation (C.1), to zero,

    ∂/∂p ‖ρ‖₂² = 0    (C.7)
    ⇒ 4n²p³ − 6n²p² + 4n(n + σ²)p − 2n(µ + σ²) = 0    (C.8)
    ⇒ 2np³ − 3np² + 2(n + σ²)p − σ² − µ = 0    (C.9)

Equation (C.9) is a cubic in p. There are three solutions to this equation, only one of which is real, as we now show. From [38], it follows that the cubic, Equation (C.9), has a unique real root if and only if the following condition holds,

    3n²σ² + 12nσ⁴ + 16σ⁶ − 27n²µ + 27nµ² + 7n³ > 0 .    (C.10)

We now show that for any n > 0 and σ² ≥ 0, the inequality (C.10) follows.
Since σ2 ≥ 0, we have, 3n2σ2+12nσ4+16σ6−27n2µ+27nµ2+7n3 ≥−27n2µ+27nµ2+7n3 . (C.11) Let g(n,µ) = −27n2µ + 27nµ2 + 7n3. Differentiating g(n,µ) with respect to µ yields, g(n,µ)µ =−27n2+54nµ . (C.12) The second derivative of g(n,µ) with respect to µ equals 54n and thus is strictly positive for any n > 0. Setting the right-hand side of (C.12) to zero and solving for µ then yields the unique global minimizer µ∗ of g(n,µ), that is µ∗ = n/2. Since 120 g(n,µ∗) = n3/4 > 0, the inequality (C.11) implies (C.10) must hold for any n > 0. Applying Theorem ?? in cite ??, the unique solution to (C.9) is then given by (3.51). C.1.2 Proof of Lemma 3.4.2 Proof. By Lemma 3.4.1 there is a unique real solution p∗ to (3.50) for any variance σ2≥ 0, mean µ ∈R, and n> 0. We now show that for any mean µ ∈ [0,n], variance σ2 ≥ 0, and n > 0, if p∗ /∈ [0,1] then there exists a p̃ ∈ [0,1] such that ( µ−np̃)2+ (σ2−np̃(1− p̃))2 < (µ−np∗)2+ (σ2−np∗(1− p∗))2 (C.13) and thus any p∗ /∈ [0,1] cannot be the unique real solution to (3.50). First assume p∗ > 1 and let p̃ = 1. We then have for any mean µ ∈ [0,n], (µ−np∗)2 = µ2−np∗(2µ−np∗) = µ2+n ( 2µ(1− p∗)+n((p∗)2−1)− (2µ−n)) ≥ µ2+n ( 2n(1− p∗)+n((p∗)2−1)− (2µ−n)) = µ2+n ( n(p∗−1)2− (2µ−n)) > (µ−np̃)2 , (C.14) where the first inequality holds since p∗ > 1, µ ∈ [0,n], and n > 0, and the last inequality holds since p∗ > 1 and n > 0. Next assume p∗ < 0 and let p̃ = 0. We then have for any mean µ ≥ 0, (µ−np∗)2 = µ2+np∗(np∗−2µ) > µ2 = (µ−np̃)2 , (C.15) where the inequality holds since np∗ < 0 and (np∗−2µ)< 0 for any µ ≥ 0, p∗ < 0 and n > 0. Lastly, note that any p∗ /∈ [0,1] implies p∗(1− p∗) < 0, thus for any variance 121 σ2 ≥ 0 and n > 0 we have,( σ2−np∗(1− p∗))2 = σ4+np∗(1− p∗)(np∗(1− p∗)−2σ2) > σ4 = ( σ2−np̃(1− p̃))2 , (C.16) where p̃ ∈ {0,1}. Combining (C.14)-(C.16) yields (C.13). C.1.3 Proof of Lemma 3.4.3 Proof. Before proving Lemma 3.4.3, we state the following lemma from Mattner [45], Lemma C.1.1 (Mattner [45]). Consider two random variables, de- noted X ∼ Bin(n1, p1) and Y ∼ Bin(n2, p2). Then X ≥LR Y if and only if, p2 = 0 or n1 ≥ n2 and n1 p11− p1 ≥ n2 p2 1− p2 (C.17) Define f1 = Bin(N, p1) and f2 = Bin(N, p2) corresponding to the two binomial random variables, K1 ∼ f1 and K2 ∼ f2, with a common parameter N. By Lemma C.1.1, we know that for X ∼ Bin(n1, p1) and Y ∼ Bin(n2, p2), we require that either p2 = 0 or n1 ≥ n2 and n1 p11−p1 ≥ n2 p2 1−p2 in order for X ≥LR Y . Ignoring the trivial case p2 = 0, clearly n1 ≥ n2 is satisfied for n1 = n2 = N, then a sufficient condition for K1 ≥LR K2 is, N p1 1− p1 ≥ N p2 1− p2 (C.18) ⇒ p1 1− p1 ≥ p2 1− p2 (C.19) ⇒ p1 ≥ p2 (C.20) 122 C.1.4 Proof of Theorem 3.2.2 This proof is restated from Ross [78] in the context of our problem. It is necessary to state and prove some lemmas before proving the main result. Lemma C.1.2. Suppose that X1 and X2 are two, independent random variables with probability distributions f and g, respectively and let X1 ≥LR X2. If h(x1,x2) is a real-valued function satisfying, h(x2,x1)≥ h(x1,x2) whenever x1 ≥ x2 (C.21) then, h(X2,X1)≥st h(X1,X2) (C.22) Proof. We first define two new random variables, U and V , from the random variables X1 and X2, as U =max(X1,X2) and V =min(X1,X2). Next, we condition on U and V with u and v, respectively, u≥ v. The conditional distribution of h(X1,X2) is concentrated on the two points, h(u,v) and h(v,u). The following probability, λ1 ≡ P(X1 = max(X1,X2), X2 = min(X1,X2)|U = u,V = v) = f (u)g(v) f (u)g(v)+ f (v)g(u) (C.23) is assigned to the smaller point h(u,v). 
Similarly, h(X2,X1) is also concentrated on the two points h(u,v) and h(v,u) with probability, λ2 ≡ P(X2 = max(X1,X2), X1 = min(X1,X2)|U = u,V = v) = g(u) f (v) g(u) f (v)+g(v) f (u) (C.24) given to the smaller point h(u,v). Recall the hypotheses u ≥ v and X1 ≥LR X2 and notice that the expressions (C.23) and (C.24) have a common denominator, thus, λ1 ≥ λ2⇒ f (u)g(v)≥ g(u) f (v) (C.25) 123 Therefore, conditional on U = u and V = v, h(X2,X1) is larger than h(X1,X2) with respect to first order stochastic dominance, P(h(X2,X1)≥ a|U = u,V = v)≥ P(h(X1,X2)≥ a|U = u,V = v) (C.26) Taking expectations of both sides results in the desired result, h(X2,X1)≥st h(X1,X2). (C.27) Lemma C.1.3. If d(x) is a non-negative convex function, then for all 0 < r1 < r2 and y≥ 0, r1d(y)+ r2d(y+ r1d(y))≤ r2d(y)+ r1d(y+ r2d(y)) . (C.28) Proof. Fix y≥ 0. If d(y) = 0 then (C.28) is satisfied since both sides vanish. We thus assume d(y) > 0. Note that y ≥ 0 and 0 < r1 < r2 implies, 0≤ y < y+ r1d(y)< y+ r2d(y) . (C.29) Next observe that convexity of d(x) implies that for any t,h > 0, d(y+ t)< tt+h d(y+ t+h)+ h t+h d(y) , ⇒ (t+h)d(y+ t)− td(y+ t+h)−hd(y)< 0 , ⇒ (t+h)(d(y+ t)−d(y))− t(d(y+ t+h)−d(y))< 0 , ⇒ d(y+t)−d(y)t < d(y+t+h)−d(y)t+h . Hence ( d(y+t)−d(y))/t is increasing in t, thus from (C.29) we have, d(y+ r1d(y))−d(y) r1d(y) < d(y+ r2d(y))−d(y) r2d(y) . (C.30) 124 Rearranging (C.30) as follows then yields (C.28), (d(y+ r1d(y))−d(y))r2d(y)< (d(y+ r2d(y))−d(y))r1d(y) ⇒ r2d(y+ r1d(y))− r2d(y)< r1d(y+ r2d(y))− r1d(y) , ⇒ r1d(y)+ r2d(y+ r1d(y))< r2d(y)+ r1d(y+ r2d(y)) . (C.31) Lemma C.1.4. If d(y) is a non-negative, non-decreasing, convex func- tion, then x+ td(x) is an increasing function of x, x≥ 0, for all t ≥ 0. Proof. Given the form of the function d, we know that d′(x)≥ 0 for all x. By differentiating x+ td(x) with respect to x, we obtain 1+ td′(x) which is positive for all x, t ≥ 0, thus x+ td(x) in increasing in x for all x, t ≥ 0. We now use the aforementioned results to prove Theorem 3.2.2. Proof. We first introduce some notation; the target sequence l1, l2, . . . , lL corre- sponds to the sequence of ranks X̂1, X̂2, . . . , X̂L. The proof of the main result is based on induction. The base case, with two targets, can be shown using Lemmas C.1.2 and C.1.3. The total accumulated reward is maximized by observing the tar- get with the larger rank (with respect to likelihood ratio ordering) first, thus the base case, with L = 2 holds. Induction Hypothesis: Assume that for L−1 targets it is optimal to observe tar- gets in order of decreasing priority ranks (with respect to the likelihood ratio order). We now analyze the L target scenario, first considering the class of policies that specify the target observation queue at time zero, termed static policies. There are L! of these policies, corresponding to the L! possible orders to observe L targets. Consider any static policy. Let target l j be the target that is observed last in the queue. Using Lemma C.1.4, it follows that among the queues which specify target l j as the last observation, the queue resulting in the maximum total accumulated 125 reward is the one which stochastically maximizes the accumulated reward for ob- serving the first L−1 targets. Recalling the induction hypothesis, of all observation queues that specify target l j as the last observation, the queue which stochastically maximizes the accumulated reward is the one which observes the first L−1 targets in order of decreasing priority ranks (in a likelihood ratio sense). 
Interchange Argument: If l_j is not the target with the smallest priority rank, X̂_1, that is, l_j ≠ l_1, then the best observation queue is the one which specifies the observation of the target with the smallest priority rank immediately prior to the last target observation l_j. However, if we let z denote the level of accumulated reward when only targets l_1 and l_j remain, then the problem is now one with two target observations and a function d_z(x) = d(z + x). Since we have defined the function d to be non-negative and convex, it follows from the base case of the induction argument that if we interchange the observations of targets l_j and l_1, we achieve a stochastically larger accumulated reward. Thus for any static policy, there exists one that observes target l_1 last and has a stochastically larger reward. However, out of all the queues that specify observing target l_1 last, the one with the stochastically largest accumulated reward is (by Lemma C.1.4 and the induction hypothesis) the order provided by the theorem.

We now consider the class of observation queues which allow for the option of changing the order of the remaining observations after the first k targets have been observed. The policies associated with these queues are referred to as dynamic policies. It is shown that the optimal policy in the static case (above) also stochastically maximizes the accumulated reward in the class of dynamic policies. Suppose k = 1; then it follows from the interchange argument that it is optimal not to assign a queue that specifies anything other than observing targets in order of decreasing priority ranks. If this is true for k − 1 completed observations then, by the same argument, it is true when k observations have been completed since, by the induction hypothesis, we would only use this opportunity at most once. Since the result is true for all k, the optimal policy in the static case also stochastically maximizes the reward in the class of dynamic policies."@en ; edm:hasType "Thesis/Dissertation"@en ; vivo:dateIssued "2011-11"@en ; edm:isShownAt "10.14288/1.0072219"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Electrical and Computer Engineering"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "Attribution-ShareAlike 3.0 Unported"@en ; ns0:rightsURI "http://creativecommons.org/licenses/by-sa/3.0/"@en ; ns0:scholarLevel "Graduate"@en ; dcterms:title "Stochastic target scheduling for radar resource management : threat assessment and optimal threshold policies"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/37417"@en .