META-LEVEL PATTERN ANALYSIS FOR TARGET TRACKINGbyMUSTAFA H. FANASWALAB.Sc., American University of Sharjah, 2007M.A.Sc., Carleton University, 2009A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES(Electrical and Computer Engineering)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)April 2015c© Mustafa H. Fanaswala, 2015A B S T R A C TClassical target tracking operates on a fast time-scale (order of seconds) duringwhich target dynamics are constrained by the physical laws of motion. This disser-tation is motivated by pattern analysis on a meta-level (order of minutes or larger)during which the intent of a target manifests. On such a coarse time-scale, Marko-vian models quantifying the physical laws of motion are not useful in detectinganomalous behavior. This is due to the hierarchical nature of plans and goal-oriented targets. In this dissertation, several novel stochastic models are devisedto capture long-range dependencies within target trajectories for the joint purposeof classification and enhanced tracking. The behavior of targets on the meta levelis captured through both positional features (destination) as well as movementpatterns (shape). Such features are used with context-free grammar models andreciprocal Markov models (one dimensional Markov random fields) for modelingspatial trajectories with a known end point. The intent of a target is assumed tobe a function of the shape of the trajectory it follows and its intended destination.The stochastic grammar models developed are concerned with trajectory shapeclassification while the reciprocal Markov models are used for destination predic-tion. Towards this goal, Bayesian signal processing algorithms with polynomialcomplexity are also presented. The versatility of such models is illustrated withtracking applications in radar-based and optical surveillance.iiP R E FA C EThis dissertation has been prepared by Mustafa Fanaswala in its entirety andis composed of some original parts as well as parts that have previously beenpublished. Chapter 1 is the original work of the author and provides a big pic-ture view of the problem area and motivations for the proposed solutions andapproaches. Chapter 2 is a background chapter introducing Markovian modelsthat are the benchmark against which all models introduced in this thesis arecompared. Hidden reciprocal models are presented in Chapter 3 with novel sig-nal processing tools for inference. The material in Chapter 3 is joint work withmy supervisor Prof. Vikram Krishnamurthy and Prof. Langford White that haspreviously appeared in [1] and [2]. Prof. White is credited with the originalformulation of hidden reciprocal models and the optimum filtering and state es-timation algorithms. Chapter 4 introduces stochastic context-free grammars andtheir associated inference algorithms. The chapter is primarily background ma-terial with a novel contribution towards the computation of prefix probabilitiesfor long trajectory sequences. Chapter 5 also comprises of background materialto set up the state-space model framework for legacy tracking and the switchingstate-space model framework for road-constrained tracking.The material in Chapter 6, 7 and 8 is joint work with my supervisor Prof.Vikram Krishnamurthy. The material in Chapters 6-8 is application-oriented andhas also been previously published. All applications in this dissertation have beenpeer-reviewed in international conferences and journals. The work presented inChapter 6 combines elements of publications [3], [2] and [4]. All the novel trajec-tory models presented are my original contribution. The probabilistic constraintson the rule probabilities are also my original contribution. In addition, all numer-ical simulations have been carried out by me. Chapter 7 is based on work thathas appeared in [5]. The formulation of the syntactic track-before-detect prob-lem is my original contribution, together with the particle filtering methodologyproposed. In addition, all numerical simulations have been carried out by me.Finally, the work in Chapter 8 is original work with a new application in jointgesture recognition and tracking. A version of the chapter appears in [6].iiiTA B L E O F C O N T E N T SAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii1 introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 21.1.1 Meta-level Tracking for Spatio-temporal Trajectory Analysis . 21.1.2 Syntactic Track-Before-Detect . . . . . . . . . . . . . . . . . . . 31.1.3 Gestural Command Recognition . . . . . . . . . . . . . . . . . 41.2 Scope of Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 markovian models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.1 Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.2 Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . 92.3 Inference Using Hidden Markov Models . . . . . . . . . . . . . . . . 102.3.1 Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 102.3.2 State Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.3.3 The Viterbi Algorithm . . . . . . . . . . . . . . . . . . . . . . . 132.3.4 Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . 142.3.5 Implementation Issues . . . . . . . . . . . . . . . . . . . . . . . 153 reciprocal models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173.1 Reciprocal Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173.1.1 Alternative Construction from Markov Transitions . . . . . . 193.2 Hidden Reciprocal Chains . . . . . . . . . . . . . . . . . . . . . . . . . 203.3 Inference Using Hidden Markov Bridges . . . . . . . . . . . . . . . . 213.3.1 Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 213.3.2 State Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 23iv3.3.3 The Viterbi Algorithm . . . . . . . . . . . . . . . . . . . . . . . 234 stochastic grammar models . . . . . . . . . . . . . . . . . . . . . . . 254.1 Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.2 Grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.3 Chomsky Hierarchy of Languages . . . . . . . . . . . . . . . . . . . . 274.4 Stochastic Languages and Stochastic Grammars . . . . . . . . . . . . 284.5 Stochastic Regular Languages, Markov Chains and Hidden MarkovModels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.6 Stochastic Context-Free Languages and Grammars . . . . . . . . . . 324.6.1 Inference Using Stochastic Context-Free Grammars . . . . . . 345 state space models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.1 Linear-Gaussian Dynamical System . . . . . . . . . . . . . . . . . . . 395.2 Linear-Gaussian State Space Model . . . . . . . . . . . . . . . . . . . . 395.3 Inference Using Linear-Gaussian State Space Models . . . . . . . . . 405.3.1 Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405.4 Non-linear State Space Models . . . . . . . . . . . . . . . . . . . . . . 425.5 Inference Using Non-linear State Space Models . . . . . . . . . . . . . 425.5.1 Extended Kalman Filter . . . . . . . . . . . . . . . . . . . . . . 425.5.2 Particle Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435.6 Switching State Space Models . . . . . . . . . . . . . . . . . . . . . . . 455.7 Inference Using Switching State Space Models . . . . . . . . . . . . . 465.7.1 The Interacting Multiple-Model Filter . . . . . . . . . . . . . . 465.7.2 The Multiple-Model Particle Filter . . . . . . . . . . . . . . . . 486 meta-level tracking for spatio-temporal trajectory anal-ysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506.1.1 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . 526.1.2 Organization and Results . . . . . . . . . . . . . . . . . . . . . 536.2 Hierarchical Tracking Framework to Assist Human Operators . . . . 546.2.1 Base-Level Tracker . . . . . . . . . . . . . . . . . . . . . . . . . 556.2.2 Tracklet Estimation . . . . . . . . . . . . . . . . . . . . . . . . . 556.2.3 Trajectory Classification Objective . . . . . . . . . . . . . . . . 576.3 Models for Anomalous Patterns . . . . . . . . . . . . . . . . . . . . . . 576.3.1 Random Walk Models . . . . . . . . . . . . . . . . . . . . . . . 576.3.2 Reciprocal Models for Destination-Aware Trajectories . . . . . 586.3.3 Destination-aware Paths . . . . . . . . . . . . . . . . . . . . . . 596.3.4 Target Rendezvous . . . . . . . . . . . . . . . . . . . . . . . . . 606.3.5 Palindrome Paths . . . . . . . . . . . . . . . . . . . . . . . . . . 61v6.3.6 Linear Trajectories . . . . . . . . . . . . . . . . . . . . . . . . . 626.3.7 Arc-like Trajectories . . . . . . . . . . . . . . . . . . . . . . . . . 626.3.8 Rectangle Trajectories . . . . . . . . . . . . . . . . . . . . . . . 626.4 Constraints on SCFG Rule Probabilities . . . . . . . . . . . . . . . . . 636.5 Data Fusion in a Multi-Sensor Network . . . . . . . . . . . . . . . . . 676.6 Illustrative Example: Multi-target Pattern Of Life Change Detection 686.7 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706.7.1 Single-Target Scenarios . . . . . . . . . . . . . . . . . . . . . . . 706.7.2 Pattern of Life Analysis . . . . . . . . . . . . . . . . . . . . . . 756.7.3 Data Fusion Using Multiple Sensors . . . . . . . . . . . . . . . 776.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807 trajectory constrained track-before detect . . . . . . . . . . . 927.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 927.1.1 Why Use Stochastic Context-Free Grammars for TrajectoryModeling? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 927.1.2 Why Consider Trajectory-Constrained TBD? . . . . . . . . . . 957.1.3 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . 967.2 SCFG-driven Multiple Model Track-Before-Detect . . . . . . . . . . . 987.2.1 Trajectory-Constrained Model . . . . . . . . . . . . . . . . . . . 987.2.2 Estimation Objective . . . . . . . . . . . . . . . . . . . . . . . . 1007.3 Modeling and Inference Using SCFGs . . . . . . . . . . . . . . . . . . 1017.3.1 SCFG Models for Anomalous Trajectories . . . . . . . . . . . . 1017.3.2 Comparison between SCFGs and HMMs . . . . . . . . . . . . 1037.3.3 Inference using SCFGs . . . . . . . . . . . . . . . . . . . . . . . 1057.4 Bayesian Filtering of Syntactic TBD . . . . . . . . . . . . . . . . . . . . 1057.4.1 Multiple-Model SCFG Particle Filter . . . . . . . . . . . . . . . 1067.4.2 Rao-Blackwellised Multiple-Model SCFG Particle Filter . . . . 1087.5 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1107.5.1 Reduced State-Space "Toy" Example . . . . . . . . . . . . . . . 1127.5.2 Syntactic TBD Examples . . . . . . . . . . . . . . . . . . . . . . 1137.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1148 gestural command recognition . . . . . . . . . . . . . . . . . . . . 1198.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1198.1.1 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . 1208.1.2 Organization and Contributions . . . . . . . . . . . . . . . . . 1218.2 Switching State Space Models for Gestural Commands . . . . . . . . 1238.2.1 Mode-Dependent State Dynamics . . . . . . . . . . . . . . . . 1238.2.2 Camera Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . 1258.3 Gesture Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128vi8.3.1 Right-angular Patterns . . . . . . . . . . . . . . . . . . . . . . . 1298.3.2 Digital-signal like Patterns . . . . . . . . . . . . . . . . . . . . . 1298.3.3 Triangular and Trapezoidal Patterns . . . . . . . . . . . . . . . 1298.3.4 Palindromic Patterns . . . . . . . . . . . . . . . . . . . . . . . . 1298.4 Rao-Blackwellized Syntactic Tracking for Classification . . . . . . . . 1308.4.1 SCFG-based Multiple Model Particle Filter . . . . . . . . . . . 1308.4.2 Earley-Stolcke Parser For SCFG Inference . . . . . . . . . . . . 1328.5 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1338.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1349 conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1369.1 Discussion of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1369.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1399.2.1 Approximate Inference Using Context-Sensitive Grammars . 1399.2.2 Grammatical Structure Inference . . . . . . . . . . . . . . . . . 1399.2.3 Novel Application Domains . . . . . . . . . . . . . . . . . . . . 140bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142viiL I S T O F TA B L E STable 1 The Chomsky Hierarchy . . . . . . . . . . . . . . . . . . . . . 28viiiL I S T O F F I G U R E SFigure 1 State transition diagram of an equivalent Mealy machine . . 30Figure 2 State transition diagram of an equivalent Moore machine . . 31Figure 3 Example of a meta-level tracker output overlaid on noisyradar measurements. . . . . . . . . . . . . . . . . . . . . . . . 51Figure 4 Conceptual sketches of trajectories composed of velocityand position tracklets. . . . . . . . . . . . . . . . . . . . . . . . 52Figure 5 Hierarchical tracking framework system diagram . . . . . . 54Figure 6 Quantized position tracklets and velocity tracklets . . . . . . 56Figure 7 State transition diagram of a position-tracklet based ran-dom walk model. . . . . . . . . . . . . . . . . . . . . . . . . . 58Figure 8 Destination-aware trajectory example . . . . . . . . . . . . . . 60Figure 9 Rendezvous trajectory example . . . . . . . . . . . . . . . . . 60Figure 10 Palindrome trajectory example . . . . . . . . . . . . . . . . . 61Figure 11 Example of an arc trajectory with a minimal grammar rep-resentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63Figure 12 Example of an m-rectangle trajectory with a minimal gram-mar representation . . . . . . . . . . . . . . . . . . . . . . . . 64Figure 13 Multi-target pattern of life change as an anomaly . . . . . . . 68Figure 14 Depiction of shape and destination-constrained trajectories . 81Figure 15 Base-level tracker output for an m-rectangle and arc trajectory 82Figure 16 Online classification results for m-rectangle and arc trajec-tories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82Figure 17 Receiver operating characteristic curves and recognition ratesbetween SCFGs and HMMs . . . . . . . . . . . . . . . . . . . 83Figure 18 Online classification of destination . . . . . . . . . . . . . . . 84Figure 19 ROC curves for destination-aware and palindrome trajectories 84Figure 20 ROC curves for fused SCFG and RP models . . . . . . . . . . 85Figure 21 ROC curves for multi-target pattern of analysis . . . . . . . . 86Figure 22 An external building surveillance example . . . . . . . . . . 87Figure 23 Online fused classification in noiseless scenario with cam-eras observing linear segments of trajectory . . . . . . . . . . 88Figure 24 Online fused classification with cameras observing linearsegments of trajectory in noise . . . . . . . . . . . . . . . . . . 89Figure 25 An internal hallway surveillance example . . . . . . . . . . . 90Figure 26 Online fused classification of a noiseless rectangular trajec-tory in the internal hallway scenario . . . . . . . . . . . . . . 91ixFigure 27 Online fused classification of a noisy rectangular trajectoryin the internal hallway scenario . . . . . . . . . . . . . . . . . 91Figure 28 The Chomsky Hierarchy and examples of long-range de-pendent trajectories . . . . . . . . . . . . . . . . . . . . . . . . 93Figure 29 Deficiencies of conventional track-before-detect . . . . . . . . 94Figure 30 Syntactic track-before-detect system framework . . . . . . . . 95Figure 31 Grammatical description of an arc and an m-rectangle . . . . 102Figure 32 Comparison between HMMs and SCFGs . . . . . . . . . . . . 116Figure 33 Performance of SCFG and HMM-based switching on track-ing a reduced dimension state-space model . . . . . . . . . . 117Figure 34 Syntactic-TBD numerical simulation results . . . . . . . . . . 118Figure 35 The gesture recognition and tracking system framework . . 120Figure 36 Gesturelet directions and depiction of perspective projec-tion onto sensor pixels . . . . . . . . . . . . . . . . . . . . . . 122Figure 37 Intermediate steps for finger/hand detection . . . . . . . . . 127Figure 38 Spatio-temporal gesture patterns following syntactic rules . 128Figure 39 Simulation results for tracking and gesture classification . . 135xL I S T O F A L G O R I T H M S1 Earley-Stolcke Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372 SCFG-driven Multiple Model Particle Filter . . . . . . . . . . . . . . 1093 Rao-Blackwellised Particle Filter for Syntactic TBD . . . . . . . . . . 1114 Rao-Blackwellised SCFG Particle Filter . . . . . . . . . . . . . . . . . 132xiA C K N O W L E D G E M E N T SThis dissertation has been a labor of love for a majority of the time that I haveworked on it. However, in the moments that it turned onerous, many peoplearound me helped to make the journey smoother, memorable and enjoyable. Iwould like to thank Prof. Vikram Krishnamurthy, for being an excellent super-visor and a role model on how to conduct cutting-edge academic research. Attimes, his insight and wisdom seemed almost otherworldly and his ability to cutdown a problem to its core always amazes me. I hope I have imbibed much ofthis thinking for use in my future endeavors. In addition, I would also like tothank Dr. Bhashyam Balaji, Dr. Martie Goulding and Prof. Langford White fortheir valuable expertise and many discussions on various topics related to the dis-sertation.No journey can be completed without the first few steps which were made en-tirely possible only because of my parents. I would like to thank my parents,Mumtaz and HaiderAli Fanaswala, for their unwavering belief and support inmy life. My brother, Adnan Fanaswala, has always kept me a pedestal, which Ihave fallen off many times, but he refuses to acknowledge my failings. He hasbeen a rock of support and a fountain of advice far beyond his years. My wife,Heidi Wollenberg, has been through thick and thin with me. Her joyful approachto life has been one of the most important revelations in my adult life.Finally I would also like to thank my colleagues, Omid, Maziyar, Eric and Maryamat the Statistical Signal Processing Laboratory for the everyday banter and cama-raderie during the course of my education at the University of British Columbia.xii1 I N T R O D U C T I O NPattern analysis is defined in the scope of this thesis as the statistical analysis ofall signals generated in the world. These signals include images, sounds, writtentext, DNA, protein strings, spike trains in neurons, or financial and climactictime series. Pattern theory proposes that the types of patterns (and the hiddenvariables needed to describe these patterns) that are found in one class of signalswill often be found in the others and that their characteristic variability will besimilar. It is thus useful to seek generative models for signals for various signalprocessing tasks.In this dissertation, we are concerned with the tracking of a dynamic processlike the position and velocity of a target. On the fast time-scale (order of seconds),target dynamics are constrained by the physical laws of motion. Markovian mod-els can be used on such time-scale because physical variables like position andvelocity do not change instantaneously by large amounts. However, when at-tempting to extract higher-level reasoning about a target’s trajectory pattern, theinstantaneous position and velocity of a target are less meaningful. Meta-leveltracking seeks to interpret target trajectories to uncover the semantics of the tar-get movements. Such reasoning is typically performed on the order of a fewminutes. At such a scale, Markovian assumptions about the target dynamics donot necessarily hold. This dissertation seeks to address this issue by devisinggenerative models to interpret target trajectories having long-range dependenciesthat are not Markovian in nature. Such dependencies naturally arise due to thehierarchical structure inherent in planned activities.The traditional machine learning [10] or statistical pattern recognition [11, 12]approach toward the analysis of patterns studies one or more data sets with thegoals of (a) fitting probability distributions to each data set, (b) finding optimaldecision rules for classifying new data into the correct data set, and (c) separatinga single data set into clusters when it appears to be a mixture. The central issue insuch an approach is to fully model the complexities of the data source but not theaccidental variations of the specific data set (bias-variance trade-off). As a result,the emphasis is on finding discriminative models for the data set which are onlyuseful for crude categorization. On the other hand, the approach espoused in thisdissertation emphasizes finding generative models for signals which can be usednot only for characterization, but also denoising, tracking etc.The solution presented in this dissertation is to apply a more expressive andgeneral model than Markov models to characterize the sequential process. The1generative mechanism of the sequential process is encoded as a tree-like processusing a grammar-constrained linguistic framework. The objective of the thesis isto formulate a meta level tracking framework, introduce and analyze the use ofgrammar models as the knowledge representation model, and discuss properties,applications, and algorithms involved.1.1 summary of contributionsConventional fast-time scale trackers provide real time state estimates based on se-quential measurements. However, while the state estimates provide a true pictureof the process under observation, it does not give any interpretation or higher-level reasoning. This dissertation proposes a two-layer hierarchical approachwhere the interpretation capability is implemented based on output from con-ventional legacy trackers. The proposed solution is to interpret a target trajectoryas being comprised of geometric primitives called tracklets. The sequential gener-ation of the tracklets is governed by a grammar model which permits analysis ofthe filtered observations within the context of the generative syntactic model. Thesyntactic patterns observed in the track estimates contain valuable informationfor interpreting and visualizing complicated sequential data. The algorithms aredeveloped for radar-based tracking, visual surveillance as well as gesture recog-nition where gestures can be interpreted as hand trajectories.1.1.1 Meta-level Tracking for Spatio-temporal Trajectory AnalysisThe analysis of spatio-temporal trajectories is made possible by the joint use ofmeta-level tracking and modeling trajectories via stochastic context-free gram-mars. On coarse time scales, anomalous trajectories can signify target intentthrough their shape and eventual destination. Such trajectories exhibit complexspatial patterns and have well defined destinations with long-range dependenciesimplying that Markov (random-walk) models are unsuitable. How can estimatedtarget tracks be used to detect anomalous trajectories such as circling a buildingor going past a sequence of checkpoints? This dissertation develops context-freegrammar models and reciprocal Markov models (one dimensional Markov ran-dom fields) for modeling spatial trajectories with a known end-point. The intentof a target is assumed to be a function of the shape of the trajectory it followsand its intended destination. The stochastic grammar models developed are con-cerned with trajectory shape classification while the reciprocal Markov modelsare used for destination prediction. We also generalize the reciprocal process asa special case of a time-varying context-free grammar. The main contributionsrelated to this topic in Chapter 6 are:21. A meta-level tracking framework is formulated using a base level tracker(operating on a fast time-scale) and a higher-level intelligence layer (operat-ing on a slower time-scale). The detection of anomalous trajectories is formu-lated as a supervised classification problem. The SCFG and RP models usethe output from base-level tracking algorithms to either provide feedback toenhance the tracker or to perform higher-level inference recognizing anoma-lous trajectories. As a result, the tools developed are legacy-compatible.2. SCFGs are used as a modeling framework for spatial patterns like arcs, rect-angles, closed paths etc. They are also used to model move-stop-move be-havior which is a common evasive tactic used by targets. Towards this end, aquantized representation of velocity directions are used as low-level features.This representation allows a novel application in modeling the destinationof a target by placing constraints on the SCFG rule probabilities. As a result,both shape and destination can be captured in the same generative SCFGmodel.3. While SCFGs are able to constrain the final destination in an expected sense,anomalous trajectories which are sensitive to the exact destination require anexplicit position-based representation. This is carried out through the noveluse of RPs and Markov bridges. Towards this end, a grid-based quantizationof Cartesian space is used as low-level features. Such a representation allowsdestination prediction of the target which can further be used to enhance theaccuracy of the base-level tracker.4. Novel trajectory models are proposed for various shapes, destination-awaretargets and other anomalous scenarios like change in pattern of life, ren-dezvous events and target re-tracing their path. These involve both single-target and multi-target scenarios. In addition, stochastic reciprocal pro-cesses are generalized as a special case of time-varying stochastic context-free grammars.5. The use of SCFGs for shape classification and the use of RPs for destina-tion prediction is combined in a probabilistic fusion framework to providean inference that is able to use both shape and destination cues to detectanomalous trajectories of interest.1.1.2 Syntactic Track-Before-DetectConventional target tracking algorithms employ a detect-then-track approach thatperforms filtering based on target detection. A target is detected by applying a3hard threshold on the sensor measurement utilizing a suitable metric, for exam-ple, a constant false alarm ratio. Such a method of first detecting targets andsubsequently forming tracks is more efficient in terms of computational complex-ity. However, in dimly lit (low signal-to-noise ratio) conditions, the backgroundclutter is often at a comparable strength to the target returns and hard threshold-ing leads to overwhelming spurious detections. A track-before-detect (TBD) ap-proach uses multiple frames of the raw sensor measurements with the objectiveof avoiding a hard thresholding decision. Consequently, TBD algorithms jointlyestimate the existence of the target (detection) as well as track its kinematic state(filtering).In this dissertation, we enhance the TBD approach by exploiting target trajec-tory patterns developed in Chapter 6. Conventionally, maneuvering targets aremodeled using jump Markov state space models. However, modeling maneu-vers via Markov chains does not facilitate modeling complex spatial trajectorieslike u-turns, closed trajectories and circling patterns. The main contribution inChapter 7 is to model maneuvering targets as a stochastic context-free grammar(SCFG) modulated state space model. The proposed track-before-detect approachexploits higher level information regarding the intent and/or trajectory patternof the target. We show that the resulting syntactic TBD algorithms can success-fully detect and track targets at significantly lower SNR than conventional TBDalgorithms. The main contributions related to this topic in Chapter 7 are:1. A novel formulation of the syntactic track-before-detect using target dynam-ics and measurement process that allow embedding trajectory constraints.This formulation is not specific to a specific sensing modality and can beused with either radar or infra-red sensors.2. The Earley-Stolcke parser for SCFG is combined in a novel manner within aparticle filtering solution to perform state estimation of the target trajectoryin low SNR conditions. In addition, a Rao-Blackwellised version is alsopresented for variance-reduction in the state estimates.1.1.3 Gestural Command RecognitionA gesture is another example of a spatio-temporal pattern and can be consideredas a trajectory. While the syntactic track-before-detect problem involves highlynon-linear measurements, gesture recognition using visual or time-of-flight sen-sors involves a mildly non-linear measurement process that can be made linearin homogeneous coordinates and using an adaptive measurement matrix. In thisdissertation, we examine the problem of gestural command recognition using avision-based approach. A physics-based generative model is proposed for ges-4tures utilizing a regime-switching state space model. The 3D coordinates of thehand/finger is the state variable of interest whose evolution is driven by the ges-turelets composing a specific gesture. A perspective projection through a pin-holecamera is used as the sensing modality with an additional “detector” stage toconvert the image measurement into a point measurement. The joint trackingand classification of the gesture is done through a filter bank of model-matchedfilters using a novel Rao-Blackwellised approach. The main contributions relatedto gesture recognition in Chapter 8 are:1. A novel formulation of the joint tracking and classification of static gesturerecognition. This involves a regime-switching state space model with ges-turelets as the geometric primitive of a gesture.2. The development of a library of novel gestures which are highly discrimina-tive and are invariant to user signing speed under the SCFG formalism.3. The use of the projective Kalman filter to incorporate the non-linear effectsof the perspective projection due to the pinhole camera model.4. A novel Rao-Blackwellised particle filter combining the projective Kalmanfilter and the Earley-Stolcke parser is used together with a novel scalingtrick in the Earley-Stolcke parser to account for long trajectories.1.2 scope of researchThis section describes what is not within the scope of the thesis. There are twomain approaches towards meta-level pattern analysis espoused in this disserta-tion. The hierarchical tracking framework is used mainly for classification andvisualization purposes. This involves splitting the tracking problem on two timescales. A conventional tracker is run on the fast time-scale while the SCFG trackerruns at the coarse time-scale. The other approach involves embedding trackletsas modes of a switching state-space model. This allows the joint tracking andclassification of target trajectories. This dissertation is thus restricted to modelingframeworks that fit within either of these approaches.The analysis and interpretation of time-series involved with object tracking arebased on algorithms formulated using mainly SCFGs, and it is not straightfor-ward to extend them to more general grammars such as context sensitive or unre-stricted grammar. In terms of knowledge representation, the algorithms derivedare applicable mainly to interpret syntactic patterns in one dimensional sequen-tial data. Even though more complicated grammars such as graph grammars andtree grammars may be applied to recognize patterns in higher dimensional data,they are not considered in this thesis; one dimensional sequential data is the most5common data representation for tracking applications. Moreover, the emphasisof this thesis is on novel models and the associated signal processing related toapplication in tracking. The parameter estimation problem of learning the ruleprobabilities or transition probabilities of the SCFG and RP models respectivelyare not addressed in a detailed manner. There is an established literature base forthe learning problem in SCFGs which can be utilized to learn for more generalpurpose grammars. The scope of this thesis is limited to simpler grammars whoserule probabilities can be determined using system-theoretic constraints that havebeen developed in the dissertation. Moreover, in the case of RP models, variousconstruction methods are provided in the dissertation for the associated three-point transition probabilities. In addition to the scope imposed by the models,assumptions are also made in the application areas to limit the scope, and theywill be described next.In the hierarchical two-scale tracking approach, the aim is to a) develop usefultrajectory models for anomalous patterns incorporating long-term dependenciesand b) develop signal processing tools for the classification and state-estimationusing SCFG and RP models. In this context, we do not impose any constraints onthe algorithms used in the low-level tracker which is highly desirable for legacypurposes. The only assumption made is that the low-level tracker is able to pro-vide filtered estimates of the target position and velocity. Moreover, the low-leveltracker is also responsible to resolve data association issues and tracking errorsrelated to non-ideal detection (such as missed detections and false tracks). Thehigher-level SCFG tracker assumes such a scenario and only deals with trackletdetection errors. The scalability of this approach towards multi-target scenariosis fairly trivial because the low-level tracker is burdened with the data associa-tion problem. As long as the legacy tracker is able to separate targets, trajectoryclassification can be independently carried out on each target track.In the regime-switching state space model, the emphasis is on the joint track-ing and classification of the target trajectories. Target tracking, in this case, isconsidered to be a hybrid state estimation problem. As a result, the treatmentin this dissertation is mainly focused on the methodology used to incorporatetrajectory constraints into the filtering framework and on sub-optimal filteringalgorithms using SCFG trajectory models. There is no distinction between a low-level or a high-level tracker. Additionally, in this case also, we consider idealdetection and do not deal with data association issues. The main motivation to-wards using SCFG models is to show deviation from Markovian assumption. Asa result, only hidden Markov model based comparisons are used in numerical ex-periments. Other models such as semi-Markov and hierarchical Markov modelsmay afford better performance but they still suffer from the bounded-order de-pendency as vanilla HMMs. In this approach, the extension to multiple targets is6not straight-forward. If the targets are well separated, then existing data associa-tion algorithms can be used. However, in the case of interacting targets, the use oftrajectory constraints can also help disambiguate paths to allow for multi-targettracking. However, such a scenario has not been explored in this thesis.72 M A R K O V I A N M O D E L SIn this chapter, a review of Markovian models is provided to use as a benchmarkfor comparison against novel models introduced in this thesis. The most naturalway to treat temporal data is to ignore the sequential aspects and treat each theobservations as identical and independently distributed (i.i.d) variables. Such anapproach fails to exploit the sequential dependencies present in the data, such ascorrelations between observations that are “close” in the sequence. Time-series aredata records for which the constituent data points can be naturally ordered. Thisorder often corresponds to an underlying single physical dimension, typicallytime t, though any other single dimension may be used. The time-series modelswe consider are probability models over a collection of random variables x1, . . . , xTwith individual variables xt indexed by time t. A probabilistic time-series modelrequires a specification of the joint distribution P{x1, . . . , xT}. In the case of theobserved data xt being discrete, the joint probability table for P{x1, . . . , xT} hasexponentially many entries. It is infeasible to independently specify all the expo-nentially many entries. Consequently, there is a need for simplified models underwhich these entries can be parameterised in a lower-dimensional manner.For consistency with the causal nature of time, we consider the cascade decom-position of the joint distributionP{x1:T} =T∏t=1P{xt|x1:t−1} = P{x1}T∏t=2P{xt|x1:t−1} (1)with the convention P{xt|x1:t−1} = P{x1} for t = 1. As a simplifying assump-tion, it is often useful to assume that the influence of the immediate past is morerelevant than the remote past. In Markovian models, only a limited number ofprevious observations influence the future. In the following discussion, all dy-namical models use the notion of a state to represent the variable xt.Definition 1. The state xt of a dynamical system is the smallest subset of system variablesthat can represent the entire configuration of the system at any given time t.For example, a simple dynamical model for human speech uses the amplitudeof the speech signal as the state variable. In radar tracking, the evolution of atarget is represented using its instantaneous position, velocity and acceleration asstate variables. In contrast, a dynamical model of video transmission uses an 8-bitdiscrete intensity to represent the state of a pixel in the video at time t.8Let X denote the set of all possible distinguishable states x of a dynamicalsystem (also called the state space). Let nx ∈ N denote the dimension of thestate space. Depending on the system being modeled, the state space can becontinuous-valued (X ⊂ Rnx) or discrete-valued such that |X | = NX (where |X |represents the size of the set).2.1 markov chainsA Markov chain defined on the state sequence x1:T is a probabilistic model inwhich the following conditional independence assumption holds:P{xt|x1:t−1} = P{xt|xt−L:t−1} (2)where L ≥ 1 is the order of the Markov chain and xt = ∅ for t < 1. Theprobability law in (2) implies that a future state only depends on the immediateL lagged states. In the remainder of this chapter, a first order Markov chain isused to simplify notation. This does not result in any loss of generality becausea Lth order Markov chain can be expressed as a first-order Markov chain with ahigher dimensional state space X L. As a result, the specification of a first orderMarkov chain only requires knowledge of the initial state distribution P{x1} andthe transition distribution P{xt|xt−1}. For a discrete-valued state space X , thetransition distribution of a first order Markov chain can be represented usinga NX × NX transition matrix Mt. In most cases, the transitions are only state-dependent and are independent of time. The resulting Markov transition matrixM = {Mij} obtained by dropping the dependence on time t is called homogeneousor stationary. The initial state distribution pi = {pii} is a probability mass vectorsuch that∑i pii = 1.Definition 2. A discrete-valued Markov chain GMC = 〈X ,pi, M〉 is a 3-tuple consistingof a finite state space X , an initial state distribution pi and a transition matrix M witheach element Mij = P{xt = j|xt−1 = i} representing the conditional probability oftransitioning from state i ∈ X to state j ∈ X .2.2 hidden markov modelsA hidden Markov model (HMM) is a doubly stochastic process defined by an un-derlying Markov chain on hidden (or ‘latent’) states x1:T that are not directly ob-servable. The observed variables y1:T are dependent on the hidden states throughan emission probability P{yt|xt}. This defines a joint distributionP{x1:T, y1:T} = P{y1|x1}P{x1}T∏t=2P{yt|xt}P{xt|xt−1} (3)9characterized by the transition distribution P{xt|xt−1} and the emission distribu-tion P{yt|xt}. As defined in Sec. 2.1, the transition distribution P{xt = j|xt−1 = i}for a finite-state HMM is represented by the NX × NX matrix M = {Mij}. If theobservations yt also take discrete values k from a finite observation set Y , the emis-sion distribution P{yt = k|xt = j} can be represented using a NX × NY matrixC = {Cj,k}. For continuous outputs, the emission distribution selects one of NXpossible output distributions P{yt|xt = j}, j ∈ {1, . . . , NX }. A common assump-tion is to use a parametric density like the Gaussian distribution. In such a case,the emission probabilities of the HMM are represented by a vector of distributionsC = Cj(yt), j ∈ {1, . . . , NX }.Definition 3. A discrete-valued hidden Markov model with discrete observations GHMM =〈X ,Y ,pi, M, C〉 is a 5-tuple consisting of a finite state space X , a finite observation spaceY , an initial state distribution pi, a transition matrix M with each element Mij = P{xt =j|xt−1 = i} representing the conditional probability of transitioning from state i ∈ X tostate j ∈ X and an emission probability structure C representing the conditional proba-bility of emitting observation y ∈ Y when the latent state is j ∈ {1, . . . , NX }.2.3 inference using hidden markov modelsThe basic probabilistic queries that we are in interested in computing for hiddenMarkov models are the following:2.3.1 Model EvaluationThe model evaluation problem seeks to compute the likelihoodP{y1, y2, . . . , yT; GHMM} of the observations y1:T being generated by a specifiedmodel GHMM. This quantity is also called the probability of the evidence P{y1:T}where the explicit dependence on the model GHMM is dropped for simpler no-tation. The likelihood computation can be performed by summing up over allhidden paths P{y1:T} = ∑x1:T P{x1:T, y1:T}. This can be efficiently computed us-ing the forward algorithm.2.3.1.1 The Forward AlgorithmConsider the joint probability P{x1:T, y1:T} = P{y1:T|x1:T}P{x1:T} of a hiddenstate sequence x1:T and the observation sequence y1:T. The factorization involvesthe conditional probability of the observation sequence given a state sequenceP{y1:T|x1:T} =T∏t=1P{yt|xt}, (4)10where each observation yt depends only on the underlying state xt. The prob-ability of the state sequence P{x1:T} can be factorized using the cascade de-composition in (1). Consequently, the probability of the observation sequencey1, . . . , yt, . . . , yT given a specific HMM GHMM can be calculated in a brute-forcemanner by enumerating every possible state sequence x1, . . . , xt, . . . , xT of lengthT and marginalizing their joint probabilityP{y1:T} =∑x1:TP{y1:T|x1:T}P{x1:T}=∑x1,...,xTT∏t=1P{yt|xt}P{xt|xt−1}. (5)The interpretation of the computation in (5) is that at each time instant t, the HMMtransitions from a state xt−1 to a subsequent state xt with the transition probabilityP{xt|xt−1}. Simultaneously, an observation yt is emitted with emission probabil-ity P{yt|xt}. Since there are NTX possible state sequences for a given model GHMM,the evaluation of (5) involves 2TNTX calculations. However, a computationally ef-ficient procedure called the forward algorithm can be used in practice.Consider the forward variable αt(i) = P{y1:t, xt = i; GHMM} defined as the prob-ability of the partial observation sequence y1, . . . , yt (until time t) and the state iat time t given a model GHMM. The forward variable can be solved inductively asfollows:initialization The forward probabilities αt are initialized as the probabilityof the initial state being i and emitting the observation y1 such thatα1(i) = piiCi(y1), 1 ≤ i ≤ NX . (6)induction The induction step can be understood by using the trellis diagram inFig. showing how state j can be reached at time t + 1 from the NX possiblestates at time t. Since the transition of each state i is considered to all statesin X , the induction step runs in O(N2X ) time. The key idea is that becausethere are only NX states, all the possible state sequences will merge intothese NX states regardless of the length of the observation sequence.αt+1(j) =[NX∑i=1αt(i)Mij]Cj(yt+1),1 ≤ t ≤ T − 11 ≤ j ≤ NX(7)termination Finally, the probability of the observation sequence can be ob-tained by marginalizing the forward probability αT at the final time T overall the states.P{y1:T; GHMM} =NX∑i=1αT(i) (8)112.3.1.2 The Backward AlgorithmIn a similar manner, consider the backward variable βt(i) = P{yt+1:T|xt = i}defined as the probability of the partial observation sequence from t + 1 to thefinal time T, given state i at time instant t. An induction procedure can also beused for the backward probabilitiesinitialization The backward probabilities βt are initialized by arbitrarily defin-ing βT to be 1 for all states i such thatβT(i) = 1, 1 ≤ i ≤ NX . (9)induction The induction in (10) shows that in order to have been in state iat time instant t, and to account for the observation sequence from timet + 1 onwards, all the states j at time t + 1 must be considered. This is ac-complished by the transition term from state i to j, the emission probabilityterm Cj(yt+1) and the remaining partial observation sequence from state jrepresented by the previous backward probability βt+1(j).βt(i) =NX∑j=1MijCj(yt+1)βt+1(j),t = T − 1, T − 2, . . . , 11 ≤ i ≤ NX(10)termination The backward probability can also be used to compute the likeli-hood of the observation sequence byP{y1:T; GHMM} =NX∑i=1β1(i) (11)2.3.2 State EstimationThe state of the underlying Markov chain is a “hidden” variable which can beestimated based on the observations y1:t. An optimality criterion is required tosolve the state estimation as best as possible in some sense. For example, smooth-ing refers to computing the marginal distribution of the hidden state P{xt|y1:T}when the entire observation sequence is received. Such a criterion seeks the statesxt that are individually most likely at each time t and it maximizes the expectednumber of correct individual states. It can be implemented by defining the poste-rior probability variable γt(i) = P{xt = i|y1:T} as the probability of being in state12i at time instant t, given the entire observation sequence y1:T. Using the forwardαt and backward βt probabilitiesγt(i) =P{xt = i, y1:T}P{y1:T}= P{xt = i, y1:T}∑NXi=1 P{xt = i, y1:T}= αt(i)βt(i)∑NXi=1 αt(i)βt(i). (12)Using γt(i), the individually most likely state x∗t is chosen at time instant t asx∗t = arg maxi∈X[γt(i)] , 1 ≤ t ≤ T. (13)2.3.3 The Viterbi AlgorithmAs opposed to the individually most likely state estimates in (13), the single beststate sequence x∗1:T for the observation sequence y1:T can be recovered using adynamic programming algorithm. Define the Viterbi probabilityνt(i) = maxx1,x2,...,xt−1P{x1, x2, . . . , xt−1, xt = i, y1:t} (14)which computes the best score (highest probability) along a single path that ac-counts for the first t observations and ends in state i. By induction, the Viterbiprobabilities can be recursively computed asνt+1(j) = maxi[νt(i)Mij]Cj(yt+1). (15)To retrieve the optimal state sequence x∗1:T, an auxiliary array ψt(j) is used to keeptrack of the argument (state) that maximized (15) for each t and j. The completealgorithm is as follows:initialization The Viterbi probabilities νt are initialized similar to the forwardprobabilities as the joint probability of the initial state being i and emittingthe observation y1. The auxiliary array is initialized with 0 for each state.ν1(i) = piiCi(y1), 1 ≤ i ≤ NX (16)ψ1(i) = 0. (17)13induction The induction in (15) is very similar to the forward algorithm exceptthat the sum has been replaced by the max operator.νt(j) = maxi[νt−1(i)Mij]Cj(yt),2 ≤ t ≤ T1 ≤ j ≤ NX(18)ψt(j) = arg maxi[νt−1(i)Mij],2 ≤ t ≤ T1 ≤ j ≤ NX(19)termination The Viterbi algorithm terminates by choosing the max state argu-ment for the Viterbi probabilitiesx∗T = arg maxi[νT(i)] . (20)path back-tracking Finally, the optimal state sequence can be recovered byback-tracking over the auxiliary array such thatx∗t = ψt+1(x∗t+1), t = T − 1, T − 2, . . . , 1. (21)2.3.4 Parameter EstimationThe estimation problem involves finding the “right” model parameter values θthat specify a model GHMM(θ) most likely to produce sample paths similar to agiven data set D. A maximum likelihood (ML) method called the Baum-Welch al-gorithm can be adopted to solve the optimization problem arg maxθ P{D; GHMM(θ)}.To describe the re-estimation procedure of HMM parameters, define the probabil-ity ζt(i, j) = P{xt = i, xt+1 = j|y1:T; GHMM} of being in state i at time instant tand in state j at time t + 1 given the entire observation sequence y1:T and a givenmodel GHMM. From the definition of forward and backward variables, ζt(i, j) canbe written asζt(i, j) =P{xt = i, xt+1 = j, y1:T}P{y1:T}=αt(i)MijCj(yt+1)βt+1(j)P{y1:T}=αt(i)MijCj(yt+1)βt+1(j)∑NXi=1 ∑NXj=1 αt(i)MijCj(yt+1)βt+1(j). (22)The smoothing probability variable γt(i) = ∑NXj=1 ζt(i, j) can be obtained by marginal-izing over the state at time t+ 1. If we sum γt(i) over the time index t, the resultant14quantity can be interpreted as the expected number of times that state i is visited.Equivalently,∑T−1t=1 γt(i) can also be interpreted as the as the expected numberof transitions made from state i. Similarly, the summation∑T−1t=1 ζt(i, j) can beinterpreted as the expected number of transitions from state i to state j. Usingthese quantities, the following re-estimation formulas can be used to obtain theparameters of an HMMpii = γ1(i) (23)Mij =∑T−1t=1 ζt(i, j)∑T−1t=1 γt(i). (24)If the current model parameters are defined as θ and the re-estimated modelparameters from (23) are denoted by θ, then it is shown in [13] thatP{y1:t; GHMM(θ)} > P{y1:t; GHMM(θ)}.The re-estimation formulas in (23) can be derived directly using the expectation-maximization algorithm [14]. When the observations are discrete-valued, theemission probabilities can be re-estimated byCjk =∑Tt=1, s.t yt=k γt(j)∑Tt=1 γt(j). (25)2.3.5 Implementation IssuesThe computation of the model likelihoods consists of the sum of a large numberof product terms in (3). Since, each transition probability and emission probabil-ity term is less than 1, each term of the forward probability αt(i) or the backwardprobability βt(i) exponentially decreases to zero as t starts to become large. Forsufficiently large t, the dynamic range will exceed the precision range of mod-ern computers. To deal with such underflow issues, a scaling procedure can beincorporated into the forward, backward and Viterbi procedures.The basic scaling approach multiplies αt(i) by a scaling coefficient ct that is in-dependent of i (it depends only on t) with the goal of keeping the scaled αt(i)within the dynamic range of the computer for 1 ≤ t ≤ T. A similar scaling isapplied to the βt(i) terms and finally, at the end of the computation, the scalingcoefficients are canceled out exactly. Define αˆt(i) as the appropriately scaled for-ward probability, α˜t(i) as the local scaled version and αt(i) as the unscaled version.For t = 1, the initial forward probability α1(i) is calculated using (6). The locallyscaled version α˜1(i) = α1(i), with c1 =[∑NXi=1 α1(i)]−1. Finally, the scaled version15αˆ1(i) = c1α1(i). For each 2 ≤ t ≤ T, the local unscaled version is first computedaccording to the inductionα˜t+1(j) =[NX∑i=1αˆt(i)Mij]Cj(yt+1),1 ≤ t ≤ T − 11 ≤ j ≤ NX.The scaling coefficient ct =[∑NXi=1 α˜t(i)]−1is then determined using the localscaled version which, in turn, is used to obtain the scaled version αˆt(i) = ctα˜t(i).Next, the backward probability βt(i) terms are computed by using the same scalefactors ct obtained from the forward procedure. Hence, the scaled βs are of theform βˆt(i) = ctβt(i). Each scale factor effectively restores the magnitude of the αterms to 1. Due to the comparable magnitudes of the α and β terms, using thesame scaling factors on the βs is an effective method to keep the computationwithin reasonable bounds.In order to compute the probability of the observation sequence, we can use theproperty that∏Tt=1 ct ∑NXi=1 αT(i) = 1. Since P{y1:t} = ∑Tt=1 αt(i), the observationlikelihood can be computed asP{y1:T} =1∏Tt=1 ct. (26)163 R E C I P R O C A L M O D E L SReciprocal processes are one-dimensional Markov random fields (MRFs), althoughthey are not Markov processes in the usual sense. However, any Markov process isreciprocal. Such class of processes is more suitable than Markov models for char-acterizing one-dimensional spatial phenomenon and temporal processes whichare probabilistically determined by the future in addition to past values. Recip-rocal processes share the fundamental property of MRFs in that their statisticalproperties may be specified by a nearest neighbor condition.3.1 reciprocal chainsConsider a random process xt indexed on the set t ∈ {1, . . . , T} for some integerT ≥ 3. We assume that the process takes values in some set X with finite car-dinality NX ≥ 1. The state space X is the set of integers X = {1, . . . , NX } ( adiscrete-state process). A stochastic process is defined to be reciprocal in natureifP{xt|xt′ , t′ 6= t} = P{xt|xt−1, xt+1}, (27)for each t = 3, . . . , T − 1. As a result, xt is conditionally independent of its valueat all time points x1, . . . , xt−2, xt+2, . . . , xT given its neighbors xt−1 and xt+1. Fora finite state process, the 3-point transitions can be denoted using a 3D arrayQ = {Qijl} where element Qijl = P{xt = j|xt−1 = i, xt+1 = l}. The RC modelis specified by its 3-point transition functions in (27) along with a given jointdistribution Π = P{x1, xT} on the end-points of the process. The end-point dis-tribution Π = {Πim} is a probability mass matrix such that ∑NXi=1 ∑NXm=1 Πim = 1.Each element Πim represents the joint probability of the chain starting in state iand terminating in the state m at time T.Definition 4. A discrete-valued reciprocal chain GRC = 〈X ,Π, Q〉 is a 3-tuple consist-ing of a finite state space X , an end-point distribution Π and a 3-point transition matrixQ with each element Qijl = P{xt = j|xt−1 = i, xt+1 = l} representing the conditionalprobability of transitioning to state j ∈ X given that the state at the previous time pointxt−1 = i and the state at the future time instant xt+1 = l.A reciprocal chain can be considered to be composed of NX Markov bridges,each Markov bridge corresponding to the end-point xT = m being fixed at one17of the m ∈ NX states. The important role of Markov bridges become apparentwhen we consider the joint distribution of the states of a RC, using direct Bayes’conditioning and (27)),P{x1:T} = P{x1, xT}T−1∏t=2P{xt|xt−1, xt+1}. (28)The factorization in (28) follows directly from the Hammersley-Clifford theorem[15]. The product terms in (28) are the state transitions for a Markov bridge“pinned” at xT and depend only on the 3-point transitions in (27). A Markovbridge can be modeled from an RC model utilizing the following property of areciprocal process ([16], property (a3), p.80). Let 1 ≤ s < t < u < v ≤ T, thenP{xu|xs, xv}P{xt|xs, xu} = P{xt|xs, xv}P{xu|xt, xv}. (29)Let s = t− 1, u = t + 1, v = T, then the probability law represented by a Markovbridge isP{xt|xt−1, xT} =P{xt|xt−1, xt+1}P{xt+1|xt, xT}P{xt+1|xt−1, xT}. (30)We can then fully specify the set of NX Markov bridges using (30) by the backwardrecursion for t = T − 2, T − 3, . . . , 1Bmij (t) = P{xt+1 = j|xt = i, xT = m}=QijlBmjl (t + 1)(NX∑j′=1Qij′lBmj′l(t + 1))−1. (31)The last term on the right-hand side of (31) is the normalization constant. TheMarkov bridge transition probabilities are initialized at T− 1 by Bmij (T− 1) = Qijm.It can be shown that the quantity on the right hand side of (31) is independentof the index l. As mentioned in [16], this property is the RC analogue to theChapman-Kolmogorov equation for Markov processes. Since the terms Bmj,·(t + 1)form a probability distribution for each i, m and t, there is at least l for whichBmj,l(t + 1) is non-zero. This index may be selected to ensure (31) is well-defined.From a numerical perspective, it is appropriate to choose a value of l (for eachj, m, t) which maximizes the value of Bmj,l(t + 1).The specification of the NX Markov bridges from the specified 3-point transi-tions requires a total complexity of O(N3X T), where each MB requires complexityO(N2X T). For a given Markov bridge model pinned at a final state m, the setof T − 1 MB transition matrices are denoted using Bmt = {Bmij (t)}. Each Bmt is18a NX × NX stochastic matrix such that ∑NXi=1 Bmij (t) = 1. Additionally, a Markovbridge with final state m has an initial state distribution pimi given by the condi-tional distributionpimi = P{x1 = i|xT = m} =Πim∑NXi′=1 Πi′m. (32)Definition 5. A finite state Markov bridge GMB = 〈X , m,pim, Bmt 〉 is a 4-tuple consist-ing of a finite state space X , a final destination state m to be reached at time T, an initialstate distribution conditioned upon the final state pim = {pimi } and a set of time-varyingtransition matrices Bmt with each element Bmij (t) = P{xt = j|xt−1 = i, xT = m} repre-senting the conditional probability of transitioning to state j ∈ X given that the state atthe previous time point xt−1 = i and the desired final state is m at time T.The representation of reciprocal processes using the concept of Markov bridgesallows a convenient characterization for fixing the end-point of a target’s trajectory.This permits consideration of “destination-constrained" trajectories which can beimparted with the intent of the target. The starting and ending point of thetrajectory is thus constrained by a Markov bridge model θn ∈ Θ. Each suchtrajectory with different ending points can be represented using a Markov bridgemodel. If we consider the intent of the target as deciding which of the n regionsthe target will end up in, the intent inference task amounts to classifying theobserved trajectory of the target to one of the N models in Θ.3.1.1 Alternative Construction from Markov TransitionsIn many case, 3-point transitions are not available for the construction of a Markovbridge model. In such cases, suppose we have the time-varying Markov transitionmatrices Mi,j(t) = P{xt+1 = j|xt = i} and an initial state distribution pii =P{x1 = i}. To generate an MB, we must ensure that the probability of the endpointxT = m is unity for some state m ∈ X . Consider the transition probabilitiesBmij (t) = P{xt+1 = j|xt = i, xT = m} which specify the probability of transitioningto state j, given the current state i at time t and the final destination state m attime T. Using Bayes’ rule, it can be shown [16] that the transition probabilitiesBmij (t)Bmij (t) = P{xt+1 = j|xt = i, xT = m}= P{xt+1 = j|xt = i}P{xT = m|xt+1 = j}P{xT = m|xt = i}=Mi,j(t)Φj,m(t + 1, T)Φi,m(t, T), (33)19where Φ(s, t) = Φi,m(s, t) is defined for s ≤ t byΦ(s, t) ={I s = tMt−1Mt−2 . . . Ms s < t, (34)and I is the identity matrix. In line 2 of (33), we have applied the Markov propertythat xT is conditionally independent of xt given xt+1. In order to construct an RC,endpoints (x1 = i, xT) are drawn from the endpoint distribution Π and constructa Markov bridge starting at x1 = i using the transition probabilities defined in(33). The equivalent 3-point transitions for this process is specified by the Markovtransitions Mt asQijm =Mji(t− 1)Mik(t)Φjm(t− 1, t + 1). (35)3.2 hidden reciprocal chainsSuppose that the reciprocal chain x1:T is observed via the observation process y1:T.The observation yt is assumed to be conditionally independent of xt′ and yt′ giventhe state xt. Equivalently, the state xt is hidden and is observed due to corruptionby a mutually independent noise process which is also statistically independentof xt. This conditional independence implies thatP{y1:T|x1:T} =T∏t=1P{yt|xt}.The process y1:T is called a hidden reciprocal chain (HRC). The conditional ob-servation likelihoods (or emission probabilities) of the HRC is denoted by C. Inthe case of a finite alphabet observation set Y , the observation likelihood canbe represented by a NX × NY matrix such that Cjk = P{yt = k|xt = j}, wherek ∈ Y , j ∈ X . In the more general case of a continuous-valued observation pro-cess, the observation likelihood is represented by an array of probability distribu-tions Cj(yt) = P{yt|xt = j}. Each of the NX Markov bridges characterizing thereciprocal chain can be augmented with the observation likelihoods C to define ahidden Markov bridge.Definition 6. A discrete-valued hidden Markov bridge GHMB = 〈X ,Y , m,pim, Bmt , C〉is a 6-tuple consisting of a finite state space X , an observation space Y , a final destinationstate m ∈ X , an initial state distribution pim conditioned on the final state m , a setof time-varying transition matrices Bmt with each element Bmij (t) = P{xt = j|xt−1 =i, xT = m} representing the conditional probability of transitioning to state j ∈ X given20the state i ∈ X at the previous instant and the final state m ∈ X at time T, andan emission probability structure C representing the conditional probability of emittingobservation y ∈ Y when the hidden state is j ∈ X .3.3 inference using hidden markov bridgesAnalogous to the HMM case, probabilistic inference using hidden Markov bridgesconstitute the following:3.3.1 Model EvaluationThe model evaluation problem using an HRC model seeks to compute the prob-ability of the observation sequence P{y1, y2, . . . , yT; GHRC} being generated by aspecific hidden reciprocal chain model GHRC. A procedure similar to the forwardalgorithm for HMMs can be used when dealing with Markov bridges.3.3.1.1 The Forward AlgorithmConsider the forward variable αmt (i) = P{y1:t, xt = i|xT = m; GHMB} defined asthe probability of the partial observation sequence y1, . . . , yt (until time t) and thestate i at time t given the final state xT = m and a specified model GHMB. Theforward variable can be solved inductively as follows:initialization The forward probabilities αmt are initialized as the product ofthe probability of the initial state being i given that the final state is m andthe probability of emitting the observation y1 such thatαm1 (i) = P{x1 = i|xT = m}P{y1|x1 = i}= pimi Ci(y1), 1 ≤ i ≤ NX . (36)induction The induction step recursively computes the forward probabilitysuch thatαmt+1(j) =NX∑i=1P{xt+1 = j, xt = i, y1:t+1|xT = m}=[NX∑i=1P{xt+1 = j|xt = i, xT = m}P{xt = i, y1:t|xT = m}]× Cj(yt+1)=[NX∑i=1Bmij (t)αmt (i)]Cj(yt+1),1 ≤ t ≤ T − 11 ≤ j ≤ NX(37)21termination Finally, the probability of the observation sequence given the fi-nal state m can be obtained by marginalizing the forward probability αmT atthe final time T over all the states.P{y1:T|xT = m; GHMB} =NX∑i=1αmT (i) (38)The observation likelihoodP{y1:T} =NX∑m=1P{y1:T|xT = m}P{xT = m}can be computed by marginalizing over the joint probability of the observa-tion sequence y1:T and all final states xT. The end state probability distri-bution P{xT} is obtained by marginalizing the endpoint distribution Π overall initial states for each final state 1 ≤ m ≤ NX .3.3.1.2 The Backward AlgorithmConsider the backward variable βmt (i) = P{yt+1:T|xt = i, xT = m} defined asthe probability of the partial observation sequence from t + 1 to the final time T,given state i at time instant t and the pinned final state m at time T . An inductionprocedure can also be used for the backward probabilitiesinitialization The backward probabilities are initialized by defining βmT−1such thatβmT−1(i) = P{yT|xT−1 = i, xT = m}= Cm(yT), 1 ≤ m ≤ NX . (39)induction The induction step uses the definitionβmt (i) =NX∑j=1P{yt+1:T, xt+1 = j|xt = i, xT = m}=NX∑j=1Cj(t + 1)P{yt+2:T|xt+1 = j, xT = m}× P{xt+1 = j|xt = i, xT = m}=NX∑j=1Cj(yt+1)βmt+1(j)Bmij (t),t = T − 1, T − 2, . . . , 11 ≤ i ≤ NX(40)22termination The backward probability can also be used to compute the likeli-hood of the observation sequence given the final state m byP{y1:T|xT = m; GHMB} =NX∑i=1βm1 (i) (41)3.3.2 State EstimationState estimation using hidden Markov bridges can be implemented by definingthe smoothing density γmt (i) = P{xt = i|y1:T, xT = m} as the probability of beingin state i at time instant t, given the entire observation sequence y1:T and the finalstate m. Using the forward αmt and backward βmt probabilitiesγmt (i) =P{xt = i, y1:T|xT = m}P{y1:T|xT = m}= P{xt = i, y1:T|xT = m}∑NXi=1 P{xt = i, y1:T|xT = m}= αmt (i)βmt (i)∑NXi=1 αmt (i)βmt (i). (42)The conditioning on the final state m can be removed by marginalizing the joint aposteriori probabilitiesγt(i) =NX∑m=1γmt (i)P{xT = m}, (43)where P{xT = m} is obtained from the endpoint distribution Π. Using γt(i), theindividually most likely state x∗t for a hidden reciprocal chain is chosen at timeinstant t asx∗t = arg maxi∈X[γt(i)] , 1 ≤ t ≤ T. (44)3.3.3 The Viterbi AlgorithmAs opposed to the individually most likely state estimates in (13), the maximumlikelihood state sequence x∗1:T for the observation sequence y1:T can be recoveredusing a dynamic programming algorithm. Define the Viterbi probabilityνmt (i) = maxx1,x2,...,xt−1 P{x1, x2, . . . , xt−1, xt = i, y1:t|xT = m} (45)23as the best score over all paths from time 1, . . . , t − 1, ending in state xt = i,given that the final state at time T is state m. Using dynamic programming, thepath with the highest probability at the next time step in the hidden reciprocalchain xt+1 = j can be calculated utilizing the Markov bridge property and theobservation likelihoodsνmt+1(j) = maxi[νmt (i)Bmij (t)]Cj(yt+1). (46)To retrieve the optimal state sequence x∗1:T, a back-pointer ψmt (j) is used to keeptrack of the argument (state) that maximized (46) for each t and j. The completealgorithm is as follows:initialization The Viterbi probabilities νmt are initialized similar to the for-ward probabilities as the probability of the initial state being i and emittingthe observation y1 given the final state m. The auxiliary array is initializedwith 0 for each state.ν1(i) = pimi Ci(y1), 1 ≤ i ≤ NX (47)ψ1(i) = 0. (48)induction The induction in (15) is very similar to the forward algorithm exceptthat the sum has been replaced by the max operator.νmt (j) = maxi[νmt−1(i)Bmij (t− 1)]Cj(yt),2 ≤ t ≤ T1 ≤ j ≤ NX(49)ψt(j) = arg maxi[νmt−1(i)Bmij (t− 1)],2 ≤ t ≤ T1 ≤ j ≤ NX(50)termination At the final time T, the final Viterbi probabilities νmT for eachMarkov bridge model defined by its end point m have been calculated. Themost likely value of the final end point xT (this is the same as choosingthe most likely Markov bridge) can be found by maximizing across all NXMarkov bridgesx∗T = arg maxm′[νmT (m′)P{xT = m′}]. (51)path back-tracking Finally, the optimal state sequence can be recovered byback-tracking over the auxiliary array associated with the Markov bridgethat maximizes (51) such thatx∗t = ψx∗Tt+1(x∗t+1), t = T − 1, T − 2, . . . , 1. (52)244 S T O C H A S T I C G R A M M A R M O D E L SIn this chapter, a review of stochastic grammar models is presented together withrelevant inference algorithms for the tasks involved in this thesis. Grammar mod-els are uniquely suited to signals which arise from a hierarchical combination ofsub-units. These are compositional models which are specified by rules determin-ing the manner in which sub-units can combine to form higher-order structure. Inthis chapter, several types of grammar models are presented which can be shownto subsume both Markov chain and reciprocal chain models.4.1 languagesConsider a finite, non-empty set V of symbols, called the alphabet. From theindividual symbols v ∈ V we construct strings, which are finite sequences ofsymbols from the alphabet. For example, if the alphabet V = {a, b}, then ababand aaabbba are strings on V . A special symbol e to denote an empty string - astring without any symbols.The infinite set of all finite length strings formed by a concatenation of symbolsfrom V is denoted by the positive (transitive) closure V+ and V∗ = V+ ∪ e denotesthe Kleene (reflexive and transitive) closure of the alphabet V . For example, ifV = {a, b}, thenV+ = {a, b, aa, bb, ab, ba, aaa, bbb, aab, . . .}V∗ = {e, a, b, aa, bb, ab, ba, aaa, bbb, aab, . . .} (53)Definition 7. A set of strings all of which are chosen from a set V∗ where V is a particularalphabet, is called a language λ. If V is an alphabet, and λ ⊆ V∗, then λ is a languageover V .Notice that a language over V need not include strings with all the symbols ofV , so once we have established that λ is a language over V , we can also concludethat it is a language over any alphabet that is a superset of V .4.2 grammarsThe definition of a formal language in Def. 7 is extremely broad as it looselydefines a language as set of strings over an alphabet V . However, such a definition25precludes the imposition of structure that is inherent in most languages. Hence, amore useful way of defining formal languages is through the use of grammars. Agrammar can be colloquially defined as a set of rules which are used to constructa language λ contained in V∗ over some alphabet V . These rules allow us toreplace strings of intermediary symbols with strings of symbols contained in Vto form an element of the language. By placing restrictions on the rules, differenttypes of languages can be developed.Definition 8. A formal grammar G is a quadruple, G = 〈X ,V ,R, S〉, where X is afinite set of non-terminal symbols, V is the alphabet (a finite set of terminal symbols), Ris a finite set of grammatical production (re-write) rules, S ∈ X is a special sentencestart symbol. Elements of the non-terminal set X = {X1 = S, X2, . . .} are the variableswhich can be re-written by using production rules in R. The words forming the stringsin the language are represented by concatenations of elements of the terminal set V ={v1, v2, . . .}. In general, R is a partially defined function R : (V ∪ X )∗ → V ∪ X )∗).However, certain restrictions applied to the production rules allow us to define useful typesof grammars.In the sequel, non-terminal symbols are represented using capital letters, andterminal symbols are represented using lower case letters. To illustrate the use ofgrammars in defining a language, consider the following grammar G = (X ,V ,R, S):X = {S, X2}V = {a, b}R ={S→ a X2 | bX2 → b S | a}(54)Using the rules specified in (54), a number of strings can be derived by repeatedapplication of the grammar production rules:1. S⇒ b2. S⇒ a X2 ⇒ a a3. S⇒ a X2 ⇒ a b S⇒ a b b4. S⇒ a X2 ⇒ a b S⇒ a b a X2 ⇒ a b a a5. S⇒ a X2 ⇒ a b S⇒ a b a X2 ⇒ a b a b S⇒ a b a b b6. S⇒ a X2 ⇒ a b S⇒ a b a X2 ⇒ a b a b S⇒ . . .⇒ a b a b a b . . . a b b26The special arrow⇒ in a derivation S⇒ a X2 is used to denote that non-terminal Sderives the string a X2 through the application of grammar production rules. Thelanguage generated in such a manner contains an infinite number of strings thatcan be of arbitrary length. The strings start with either a or b. If a string startswith b, then it only contains only symbols. Strings terminate with either a a orb b, and consist of a distinct repeated pattern a b. The use of a few simple rules in(54) is able to generate a rich set of symbol sequence with a highly characteristicpattern. This simple example illustrates the power of grammatical representationof languages.Definition 9. A language λ(G) generated by the grammar G is the set of all sequencesconsisting of terminal symbols that can be generated by starting with the non-terminalS and recursively applying production rules to new non-terminals until the resultingsymbol sequence only contains terminal symbols.4.3 chomsky hierarchy of languagesThe production rules R in Def. 8 are vaguely defined as a mapping from onesymbol string to another symbol string. The seminal work of Noam Chomsky[17] used restrictions on the properties of the production rules tp develop a hier-archy of languages known as the Chomsky hierarchy. The grammatical rules inRtake the following form depending on its membership in the Chomsky hierarchyshown in Fig. 28a:1. For a regular or finite-state grammar (RG), the rules in R are independentof the context in which the substitution to the non-terminal symbols areapplied. They are of the formXi → v Xj or Xi → v where v ∈ V , Xi, Xj ∈ X . (55)2. For a context-free grammar (CFG), the rules inR extend the form of regulargrammar rules to include combinations of non-terminals on their right-handside, taking the formX → ψ, X ∈ X , ψ ∈ (X ∪ V)∗ , (56)where ψ is a combination of non-terminal and terminal symbols. The lefthand side of the rule is still restricted to only one non-terminal symbol. Thenotation (X ∪ V)∗ denotes (possibly empty) arbitrary length combinationsof elements of the non-terminal set X and the terminal set V .27Table 1: The Chomsky HierarchyGrammar Rule Structure LanguageRG Xi → vXj, X → v Regular or Finite-State Language (RL)CFG X → ψ Context-Free Language (CFL)CSG α X β→ α ψ β Context-Sensitive Language (CSL)UG α X β→ ψ Unrestricted Language (UL)3. For a context-sensitive grammar (CSG), the rules in R extend the form ofcontext-free rules to include context around the non-terminal being writtenon the left-hand side, giving rules of the formα X β→ α ψ β, X ∈ X , α, β, ψ ∈ (X ∪ V)∗ . (57)4. For an unrestricted grammar (UG), any production rules of the formα X β→ ψ, X ∈ X , α, β, ψ ∈ (X ∪ V)∗ (58)are allowed. These types of grammars are also called type-0 grammars.Chomsky also classified languages in terms of the grammars that can be used todefine them. The hierarchy of grammars and languages are illustrated using Ta-ble 1 and Fig. 28a. Each inner circle of Fig. 28a is a subset of the outer circle. Thus,context-sensitive languages are a special form (more restrictive) of unrestrictedlanguages, context-free languages are less expressive than context-sensitive lan-guages and regular languages comprise the most restrictive class.4.4 stochastic languages and stochastic grammarsLanguages λ(G) induced by a grammar G can form powerful generative modelsas illustrated by the example in Def. 8. However, the use of grammars as signalmodels requires a probabilistic framework to deal with the stochastic variationinherent in the real world. Moreover, the incorporation of a probability measureover strings derived from a language provides a ranking mechanism for variousinference tasks in signal processing applications. A deterministic language orgrammar can be extended into the domain of stochastic languages by inducing aconditional probability distribution on the selection of a production rule when agiven non-terminal is chosen for re-writing.28Definition 10. A stochastic formal grammar G(P) is a formal grammar G = 〈X ,V ,R,P , S〉augmented with a family of probability lawsP = {pij|∑jpij = 1, i = 1, 2, . . . , |X |, j ∈ nXi}on the production rules R = {rij, i = 1, 2, . . . , |X |, j ∈ nXi}. Each non-terminal Xihas nXi ∈N alternative rules which can be selected to re-write Xi.Definition 11. A stochastic language λ(G(P)) generated by the stochastic grammarG(P) is the set of all sequences consisting of terminal symbols that can be generatedby starting with the start symbol S and recursively using production rules applied withprobability P to new non-terminals until the resulting symbol sequence only containsterminal symbols.We illustrate the generative process involving a stochastic language by extend-ing the grammar shown previously in Def. 8. Consider the stochastic grammarG = (X ,V ,R,P , S):X = {S, X2}V = {a, b}R =r11 : S0.9−→ a X2r12 : S0.1−→ br21 : X20.9−→ b Sr22 : X20.1−→ a(59)The non-terminal X1 has 2 alternative rules r11, r12 (and similarly for non-terminalX2). Each rule rij : Xipij−→ α has an associated conditional probability pij of beingselected from its nXi alternative rules when the non-terminal Xi on its left handside is chosen for re-writing. A valid probability distribution must be definedover all alternative rules for a non-terminal Xi by ensuring that ∑j∈nXi pij = 1.4.5 stochastic regular languages , markov chains and hidden markovmodelsThe definition of a Markov chain in Def. 2 and a hidden Markov model in Def. 3bears many similarities to the definition of a stochastic grammar in Def. 10. Discrete-state, discrete-time Markov chains are considered as the equivalent representa-tions of a stochastic regular language[18]. Such a representation has been success-fully used in bioinformatics and computational genomics [19], in natural languageand speech processing [20] etc.29Figure 1: The state transition diagram of a Markov chain analogous to the grammar in(59).A finite-state Markov chain is defined as a triple GMC = 〈X ,pi, M〉, where Xis the state space of the Markov chain, pi is the NX × 1 vector of the initial stateprobability distribution and M is the NX × Nxset transition matrix. The relation-ship between stochastic regular grammars and Markov chains can be illustratedby examining the transition structure of the grammar in (59). A Markov chainGMC analogous to the production rules R of (59) can represented byX = {S1, S2, T}, pi =100 , M =0 0.9 0.10.1 0 0.90 0 1 , (60)where M and pi are defined with respect to the state ordering of X in (60). Thestate transition diagram of the Markov chain in (60) is shown in Fig. 1. A new“terminating” state T is added to the state space of the Markov chain to accountfor rules r12 and r22 that do not have any non-terminals on their right hand side. Itis interpreted as an absorbing state with a self-transition probability of 1 and is de-picted using a double circle. As such, a Markov chain can only capture the transi-tion dynamics of the grammar and cannot address the generation and transforma-tion aspects of a stochastic regular grammar. This is due to the fact that a Markovchain cannot incorporate the terminal symbols V . A hidden Markov model can ad-dress this shortcoming as they are particularly suitable for representing stochasticlanguages of finite-state discrete-event systems observed in noisy environments.They separate the uncertainty in the model attributed to the observation processfrom the uncertainties associated with the system’s functionality. Generally speak-ing, HMMs are Markov chains indirectly observed through a noisy process. Theyare formally defined in Ch. 2 as a 5-tuple GHMM = 〈X ,Y ,pi, M, C〉 where X is thestate space of the Markov chain, Y is a finite observation alphabet (terminal set),pi is the NX × 1 initial state distribution vector of the underlying Markov chain,M is the NX × NX transition matrix of the Markov chain and C is an NX × NY30Figure 2: The state transition diagram of a hidden Markov model analogous to the gram-mar in (59) defined as a Moore machine. The double circle represents an absorb-ing state (with no transitions to other non-terminals). The notation S/a denotesthat a transition to the state S is associated with an emission of the terminalsymbol a.observation likelihood matrix relating the probability of a state j ∈ X emitting anobservation k ∈ Y .The grammar in (59) can also be used to illustrate how HMMs relate to stochas-tic RGs. The Markov chain for this grammar (defined by (60)) can be extendedby incorporating the alphabet V of the grammar in (59) through the structure ofthe observation probability matrix C. However, this extension requires a trans-formation of the structure of the Markov chain in Fig. 1. The observation proba-bility matrix C associates symbols of the alphabet V with the states of the chain,whereas stochastic RGs associate generation of non-terminals with transitions ofa finite state machine. The former case is known in the literature as the Mooremachine, and the latter is referred to as the Mealy machine [21]. Therefore, to ac-commodate for the structural constraints of the HMM, the Markov chain in Fig.1has to be converted to the Moore machine form as described in detail in [21]. Theresulting HMM has the following structurepi =0.9000.1, M =0 0.1 0.9 00.9 0 0 0.10 0 1 00 0 0 1, C =1 00 11 00 1, (61)where the matrices are indexed according the ordering of the state space X ={S, X2, T1, T2} and the terminal alphabet set Y = {a, b}. The resulting state transi-tion diagram of the HMM is shown in Fig. 2 where two terminating states T1, T2have been augmented to the state space.314.6 stochastic context-free languages and grammarsThe stochastic context-free grammar (SCFG) formalism is a generalization of theHMM. In most stochastic models for sequential data, finite state or even simplen-gram approaches dominate. A major reason for this is that although most stan-dard algorithms for stochastic finite-state models (i.e., HMMs) have generalizedversions for SCFGs, they become computationally more demanding, and oftenintractable in practice [22]. However, the m-gram models have a large number ofparameters and fail to capture hierarchical structure inherent in many time-series.SCFGs provide a flexible framework for devising generative models of time-seriesusing a hierarchical structure.Many natural signals like speech and images, can be viewed as a hierarchicalcomposition of atomic sub-units combining with syntactic rules. At each level inthe hierarchy, the atomic unit is made up of groups of atomic units from lowerlevels in the hierarchy. Such a hierarchical decomposition of a signal is called aparse tree in formal language theory. A parse tree ψ = {r1, r2, . . . , rN} is a sequenceof rules applied in order to the left-most non-terminal of the pre-terminal stringγ1, γ2, . . . , γN. Each pre-terminal string γn+1 ∈ (X ∪ V)∗ is obtained by applyinggrammar rule rn to the left-most non-terminal in γn. The result of applying rN onthe pre-terminal string γN results in a terminal sequence v1, v2, . . . , vT. Note thatthe index N refers to the number of production rules applied to the start symbolS in order to obtain a terminal string. There is a one-to-one correspondencebetween the depth of the parse tree and the number of production rules re-writesinvolved in the generation of the terminal string provided that only the left-mostnon-terminal is expanded at each layer of the parse tree. The index T refers to thelength of the terminal string which is, in general, different from the depth of thetree.SCFGs can encode hierarchical structure into the description of a time-series byspecifying permissible re-write rules for of each non-terminal. Note that regulargrammars can never give rise to branching in the hierarchical decomposition rep-resented by a parse tree. This is due to the restrictive nature of regular grammarrules that can only contain one non-terminal on the right hand side of the gram-mar rule. As a result, regular grammars give rise to a linear directed graphicalstructure. However, the more expressive nature of stochastic context-free rulesallow branching structure to emerge.Definition 12. A stochastic context-free grammar G = 〈X ,V ,R,P , S〉 is a quintuple,where X is a finite set of non-terminal symbols, V is the alphabet (a finite set of terminalsymbols), R is a finite set of grammatical production (re-write) rules, P is a rule probabil-ity set {pm}|R|m=1 assigning a probability pmdef= P{rm} to each rule rm ∈ R and S ∈ X32is a special sentence start symbol. The rule set R of an SCFG contains only context-freerules of the form in (56). The rule probabilities create a conditional probability distribu-tion over all alternative expansions of a non-terminal Xi such that ∑nXii=1 pm = 1.0 for eachXi ∈ X . The number of alternative expansions of a nonterminal Xi is represented by nXi .The context-free nature of the grammar rules implies conditional independenceof the rule application process. As a result, the probability of a parse tree is theproduct of the probabilities of the rules appliedP{ψ} =N∏n=1rn. (62)The parse tree ψ generating an input sentence is unobserved and hence thesequence of rule choices can be considered a hidden variable analogous to thelatent states of a hidden Markov model. More generally, every terminal sequencex1:T defines a set of possible parse trees Ψ which could generate the observedterminal sequence. As a result the probability of a terminal sequence x1:T givenan SCFG model G isP{x1:T|G} =∑ψ∈ΨP{x1:T, ψ}=∑ψ∈ΨP{x1:T|ψ}P{ψ}=∑ψ∈ΨN∏n=1rn. (63)The reduction to (63) results due to the fact that P{x1:T|ψ} = 1.0 for a fully ob-served noise-free terminal string. This is because, knowledge of the entire parsetree implies knowledge about the final terminal string. The expression in (63)implies that the probability of a terminal string is equal to the sum of the proba-bilities of all possible parse tree deriving that string.While the parse tree of the SCFG is unobserved, an additional hidden layer canbe imposed on the SCFG through a noise observation process. In this dissertation,such a process is called a hidden SCFG analogous to the terminology prevalent instatistical signal processing.Definition 13. A discrete-valued hidden stochastic context-free grammar GSCFG =〈X ,V , Vˆ ,R,P , S, C〉is a 7-tuple consisting of the same parameters as an SCFG with an additional observationset Vˆ and an observation probability matrix C = {Cj(vˆi)}, where Cj(vˆi) = P{vˆi|vj}.A sample y1:T from a hidden SCFG can be viewed as a process with two layersof hidden variables. The first layer of hidden variables is the parse tree ψ =33{r1, . . . , rN} composed of a sequence of production rules. The second layer ofhidden variables are the noisy-free terminal symbols x1:T, with each xt = vj ∈ V .These are only observed through a measurement process resulting in observationsy1:T with each yt = vˆi ∈ Vˆ .The probability of a measured terminal sequence y1:T given an SCFG model GisP{y1:T|G} =∑ψ∈ΨP{y1:T, ψ}=∑ψ∈ΨP{y1:T|ψ}P{ψ}=∑ψ∈ΨT∏t=1P{yt|xt}N∏n=1rn. (64)4.6.1 Inference Using Stochastic Context-Free GrammarsIn this dissertation, we are mainly concerned with (a) computing likelihoodsP{y1:T; GSCFG}. and (b) recovering the parse tree ψ and most likely clean terminalsequence arg maxx1:T P{x1:T|y1:T; GSCFG} from an observed sentence y0:T given anSCFG model GSCFG. The model likelihoods are used to classify between differentanomalous patterns while the most likely parse tree estimation is used to recoverthe “clean” terminal sequence originally generated by the SCFG. The conventionalalgorithm for computing terminal string probabilities is the inside-outside algo-rithm [23] which is a generalization of the forward-backward algorithm used forHMMs. The Cocke-Young-Kasami parser is a chart-based Viterbi algorithm thatused the inside-outside algorithm together with a reverse pass over the input toobtain the most likely parse tree. However, both the inside-outside algorithm andthe CYK parser require the grammar to be in a restricted form called the Chomskynormal form (CNF). In this dissertation, the Earley-Stolcke parser [22] is used dueto its ability to deal with unrestricted context-free rules. Moreover, it is capableof generating likelihoods for partial sentences y1:t due to its incremental left toright operation. The inside-outside algorithm is only capable of generating fullsentence probabilities.The Earley-Stolcke parser builds left-most derivations of a string by keepingtrack of all possible derivations that are consistent with the observations y1:t upto a certain epoch t. As more observations are revealed, the set of possible deriva-tions (each of which corresponds to a partial parse tree) can either expand as newchoices are introduced, or shrink as a result of resolved ambiguities. As eachobservation yt is received by the parser, a set of states ut = {unt , n = 1, 2, . . .} are34created which represent all the different derivations that can explain the observa-tions y1:t.An Earley state is the basic control structure used in the operations of the Earley-Stolcke parser. Each state unt ∈ ut represents a re-write rule rm ∈ R such that theobservation yt is derived from its right-hand side. An Earley state is denoted bytt′X → λ · Yµ [α, γ]. The upper-case letters X and Y are non-terminals, λ and µare substrings of non-terminals and terminals. The index t represents the currentepoch. The dot ‘·’ is a marker demarcating the portion of the right-hand side thathas already been recognized by the parser. The index t′ is a back-pointer backto the epoch at which the parser re-wrote non-terminal X using an instance ofproduction rule rm. Each state is a link in an incomplete chain of rule choices thatcould have generated the observed sequence y1:t.Each state is also associated with a forward probability α and an inner prob-ability γ. The forward probability α(unt ) of a state unt represents the probabilityof the grammar GSCFG deriving the observations y1:t while passing through thestate unt = tt′X → λ · Yµ at epoch t. This is analogous to the definition of forwardprobability for the case of an HMM GHMM[14]. The inner probability γ(unt ) ofa state unt is defined as the probability of a particular rule generating an infix ofthe sentence aˆt′ :t represented by the accumulated probabilities of all derivationsstarting with a state tt′X → λ ·Yµ and ending at the statettX → λ Y · µ.For the purposes of dealing with the start symbol, the Earley-Stolcke parser isinitialized with a dummy state 00 → ·S [α = 1, γ = 1]. At each epoch, the states inan Earley set ut are processed in order, by performing one of three operations oneach state. These operations may add more states to ut and may also put statesin a new state set ut+1. Whenever an operation attempts to add a new state, it islinked to an existing state. Such a sequence of linked states represents differentrules choices that could have generated the input string.1) The prediction operation is applied to states when there is a non-terminal to theright of the dot. It causes the addition of one new state to ut for each alternativeproduction rule of that non-terminal. The dot is placed at the beginning of theproduction rule in each new state. The back-pointer is set to t, since the statewas created in ut. Thus the predictor adds to ut all productions which mightgenerate sub-strings beginning with xt+1. More formally, for a state tt′X → λ · Yµin the state set ut, the predictor adds a new predicted state ttY → ·ν for each of thealternative production rules (Y → ν) ∈ R. A link is thus created between thesestates.2) The scanning operation, on the other hand, is applicable only in the case whenthere is a terminal a to the right of the dot. The scanner compares that symbola with the observation yt+1, and if they match, it adds the state to ut+1, with thedot moved over one symbol in the state to indicate that the terminal symbol has35been scanned. If tt′X → λ · aµ exists and P{yt+1|a} > 0, the scanning operationadds a new scanned state t+1t X → λa · µ to state set ut+1. The symbol likelihoodP{yt+1|xt+1 = a} > 0 allows us to incorporate the noisy observation process intoSCFG generation.3) The third operation, the completion operation, is applicable to a state if its dotis at the end (tt′Y → ν.) of its production rule. Such a state is called a “complete”state. For every complete state, the parser back-tracks to the state set ut′ indicatedby the pointer in the complete state, and adds all states from ut′ to ut which haveY (the non-terminal corresponding to that production) to the right of the dot. Itmoves the dot over Y in such states. Intuitively, ut′ is the state set where Y wasfirst expanded. The parser has seen evidence yt′ :t that Y → ν was indeed used,so we move the dot over the Y in these states to show that its terminal yield beensuccessfully scanned. A completion operation adds a new state tt′′X → λY · µ(called a completed state) using t′t′′X → λ ·Yµ andtt′Y → ν·, where t′′ ≤ t′ ≤ t.The Earley-Stolcke parser continues in this manner until all the observation y1:Tsymbols have been scanned. If the final state set uT contains the state 00S′ → S·,then the algorithm terminates successfully. It represents a successful parse of thesentence x1, . . . , xt, . . . , xT. The recursive updates of the forward probability α andinner probability γ for each of the state operations is summarized in Algorithm 1.To deal with underflow issues, a state independent scaling must be applied atevery epoch. Define the prefix probability ζt = P{y1:t|GSCFG} as the probabilityof the grammar GSCFG generating the observations y1, . . . , yt as the prefix of acomplete sentence y1, . . . , yT. Philosophically, this involves a summation over allpossible suffixes β ∈ (X ∪ V)∗. However, in [22], it is shown that this is equivalentto the sum of the forward probabilities α of all scanned states in the state set utsuch thatζt =∑β∈(X∪V)∗P{y1, . . . , yt, β} =∑n∈utα(tt′X → λa · µ), (65)where n is a scanned state. An appropriate choice for the scaling factor ct in eachstate set is to use the inverse of the one-step prediction probability defined asζt|t−1 = P{yt|y1:t−1} =P{y1, . . . , yt}P{y1, . . . , yt−1}= ζtζt−1. (66)The scaling factor c1 = 1/ζ1 is initialized using the prefix probability of thefirst observation y1. The scaling factor at all other epochs t = 2, . . . , T are com-puted as ct = 1/ζt|t−1. The forward α and γ probabilities of all scanned statesare multiplied by the scaling factor to prevent underflow issues. The likelihood36Lt = P{y1:t|GSCFG} of the observation sequence y1:t can be computed from thesequence of scaling factors c1, . . . , ct asLt =1∏tt′=1 ct′. (67)An algorithmic description for one cycle of the Earley-Stolcke parser operationis given in Algorithm 1. The notation + = on lines 13, 14 and 17 of Algorithm1 denotes an accumulated sum over all repeated states in the state set being pro-cessed. RL(Z, Y) and RU(Z, Y) are pre-computed |X | × |X | matrices representingcollapsed factors to account for looping production rules into a single step withmodified probability updates. Further details can be found in [22].Algorithm 1 Earley-Stolcke Parser1: function EarleyStolckeParser(u0:t−1, aˆt)Scanning2: for t−1t′ X → λ · aµ [α, γ] ∈ ut−1 do3: Add tt′X → λa · µ [α, γ] if P{aˆt|a} > 0.4: α′ = αP{aˆt|a}5: γ′ = γP{aˆt|a}6: ζt = ∑n∈ut α(tt′X → λa · µ)7: Compute ct = 1ζt|t−1 by computing ζt|t−1 from (66)8: Scale α, γ for all scanned states using ctCompletion9: for tt′Y → ν · [α′′, γ′′] ∈ ut do10: for t′t′′X → λ · Zµ [α, γ] ∈ ut′ do11: if RU(Z, Y) 6= 0 then12: Add tt′′X → λZ · µ [α′, γ′]13: α′+ = αγ′′RU(Z, Y)14: γ′+ = γγ′′RU(Z, Y)Prediction15: for tt′X → λ · Zµ [α, γ] ∈ ut do16: Add ttY → ·ν [α′, γ′] if RL(Z, Y) 6= 0,17: α′+ = αRL(Z, Y)P{Y → ν}18: γ′ = P{Y → ν}19: return ut, ct37In addition to sequence likelihoods Lt, we are interested in recovering the se-quence of clean tracklets x1:T. For an SCFG model, this involves computing theViterbi parse of the observations y1:T. A Viterbi parse recovers the sequence of rulechoices (or the parse tree ψ∗) that assigns maximum probability to y1:T among allpossible parses. The same sequence of operations as described in Algorithm 1 canbe used to recover the Viterbi parse. However, each state unt also keeps tracks ofits Viterbi path probability φ. The Viterbi probability φ is propagated in the samemanner as the inner probability γ. During completion, the accumulated sum + =is replaced by a maximum over all products that contribute to the completed state.The complete state tt′Y → ν· at the current epoch t associated with the maximumis recorded as the predecessor of tt′′X → λZ · µ. The Viterbi probability updatealso does not use the collapsing loop factor RU(Z, Y) as loops can only lowerthe probability of a derivation. Once the final state 00S′ → S· is accepted by theparser, a recursive procedure can back-track over the Earley state sets to recoverthe Viterbi parse ψ∗ [22]. The leaf nodes of the parse tree can then be traversed torecover the clean terminals x1:T.385 S TAT E S PA C E M O D E L SState-space models and extensions to switching state-space models are reviewedin this chapter. In previous chapters, the state space X of stochastic models likehidden Markov models, hidden reciprocal chains and stochastic grammars com-prised of a finite set of symbols. A state space model (SSM) generalizes the hiddenstates to be continuous in nature. Such models are useful for characterizing theanalog nature of real-world signals and are often called “physics-based” models.5.1 linear-gaussian dynamical systemIn contrast to the discrete-state models of Ch. 2 - 4, parametric continuous statedistributions are not automatically closed under operations such as products andmarginalization. The development of practical algorithms for which inferenceand learning can be carried out efficiently, the form of the continuous transitionP{xt|xt−1} must be restricted. A simple yet powerful class of such transitionsare the linear dynamical systems which define the temporal evolution of a statevariable xt according to the discrete-time linear update equationxt = Ftxt−1 + vt, (68)where Ft is the transition matrix at time t and vt ∼ N (vt|vt, Qt) is additive Gaus-sian noise with mean vt and co-variance matrix Qt. Such a model is equivalent toa first order Markov chain with transition probability P{xt|xt−1} = N (xt; Ftxt−1 + vt, Qt).At t = 1, the process has an initial state distribution P{x1} = N (x1; v1, R1). Fort > 1, if the parameters are time-independent such that Ft ≡ F, vt ≡ v, Qt ≡ Q,the process is called time-invariant.5.2 linear-gaussian state space modelA linear Gaussian state space model defines a stochastic linear dynamical systemin a hidden space over a sequence of observations y1:T. Each observation yt is alinear function of the latent (hidden) state xt. Such a model can also be consid-ered as a form of a linear dynamical system on the joint variables st = (xt, yt)with parts involving the state variable xt missing. These are powerful generativemodels for time-series and their use is widespread ranging from application ineconometrics, speech processing, target tracking etc.39Definition 14. A linear-Gaussian state space model is defined by a transition model fora state variable xt ∈ Rnx and an emission model for observations yt ∈ Rny such thatxt = Ftxt−1 + vt, vt ∼ N (vt; vt, Qt) (transition) (69)yt = Htxt + wt, wt ∼ N (wt; wt, Rt) (emission), (70)where Ft is the transition matrix, Ht is the observation matrix, vt, Qt is the mean andco-variance of the state noise and wt, Rt is the mean and co-variance of the observationnoise.The transition and emission models define a first order Markov modelP{x1:T, y1:T} = P{x1}P{y1|x1}T∏t=2P{xt|xt−1}P{yt|xt} (71)with the initial state probabilities, transition probabilities and observation likeli-hoods defined by Gaussian distributionsP{x1} = N (x1; v1, R1) (72)P{xt|xt−1} = N (xt; Ftxt−1 + vt, Qt) (73)P{yt|xt} = N (yt; Htxt−1 + wt, Rt) (74)Due to the Markov nature of the transitions and the form of the probabilities, suchmodels are commonly referred to Gauss-Markov state space models.5.3 inference using linear-gaussian state space modelsFor the discrete-state models in Ch. 2 - 4, various probabilistic queries such asmodel evaluation, state estimation and parameter estimation were considered.However, for continuous-state space models, we are mainly interested in the on-line state estimation of the latent state variable xt given all the observations ytobserved until the present time instant t. This is called the filtering problem andis the primary task in online tracking of the state.5.3.1 Kalman FilterThe Kalman filter exact is an algorithm for exact Bayesian filtering of linear-Gaussian state space models. Define the marginal posterior at time t byP{xt|y1:t} = N (xt; xˆt,Σt) , (75)is a Gaussian with mean xˆt and co-variance Σt. Since all the involved densities areGaussian in nature, the the prediction and update steps for the density in (75) canbe performed in closed form using a prediction step and a measurement updatestep as outlined below.405.3.1.1 Kalman PredictionThe Kalman prediction step can be derived from the relationP{xt|y1:t} =∫P{xt|xt−1, y1:t−1}P{xt−1|y1:t−1}=∫N (xt; Ftxt−1 + vt, Qt)N (xt−1; xˆt−1,Σt−1) dxt−1= N(xt; xˆt|t−1,Σt|t−1), (76)where xˆt|t−1 , Ftxˆt−1 + vt and Σt|t−1 = FtΣt−1Fᵀt + Qt.5.3.1.2 Kalman UpdateThe Kalman update step can be computed using Bayes’ rule asP{xt|y1:t} ∝ P{yt|xt}P{xt|y1:t−1}= N (xt|xˆt,Σt) , (77)where xˆt = xˆt|t−1 + Ktrt and Σt = (I − KtHtΣt|t−1). The residual variable rt isthe difference between the predicted observation yˆt and the actual observation ytsuch thatrt , yt − yˆt (78)yˆt , E{yt|y1:t} = Htxˆt|t−1 + wt (79)and Kt is the Kalman gain matrix given byKt = Σt|t−1Hᵀt S−1t . (80)The residual co-variance matrix St is computed usingSt , cov{rt|y1:t−1}= HtΣt|t−1Hᵀt + Rt. (81)The equation for the update of the mean of the Gaussian posterior xˆt = xˆt|t−1 +Ktrt implies that the new mean estimate after receiving the observation yt canbe obtained from the predicted mean xˆt|t−1 by adding a correction factor Ktrt.The correction factor is a weighted version of the error signal rt. If Ht = I, thenKt = Σt|t−1S−1t , which is the ratio between the co-variance of the prior (predictedmean from the transition model) and the co-variance of the measurement error. Ifthe prior is a strong predictor of the mean or if the sensors are very noisy, |Kt|will be small and a small weight will be placed on the error signal. Conversely, ifthe prior is weak and the sensors are precise, then |Kt will be large, an the errorsignal will receive high weighting on the correction term.415.4 non-linear state space modelsIn the more general case of non-linear transitions or observations, there are noknown finite-dimensional filters to perform exact inference. Furthermore, non-Gaussian noise is also very common, e.g., due to outliers. As a result, we mustresort to approximate inference methods.Definition 15. A non-linear state space model is defined by a non-linear transition func-tion for a state variable xt ∈ Rnx and a non-linear emission function for observationsyt ∈ Rny such thatxt = ft (xt−1, vt) (transition) (82)yt = ht (xt, wt) (emission), (83)where ft is a non-linear transition function, ht is a non-linear observation function andvt, wt are the state and measurement noise.5.5 inference using non-linear state space models5.5.1 Extended Kalman FilterThe extended Kalman filter is predominantly used when the transition or observa-tion functions are stationary and non-linear but the state and measurement noiseare Gaussian. That is, we consider models of the formxt = f (xt−1) + vt, vt ∼ N (0, Qt) (84)yt = h (xt) + wt, wt ∼ N (0, Rt) , (85)where f (·) and h(·) are non-linear but differentiable functions. The basic idea isto linearize f and h about the previous state estimate xˆt using a first order Taylorseries expansion, and then to apply the standard Kalman filter equations. Thenoise variance in the equations (Q and R) is not changed, i.e., the additional errordue to linearization is not modeled. Thus, the stationary non-linear dynamicalsystem is approximated with a non-stationary linear dynamical system. In moredetail, the method works as follows. The measurement model is approximatedusingP{yt|xt} ≈ N(yt; h(xˆt|t−1) + Ht(yt − xˆt|t−1), Rt), (86)where Ht is the Jacobian matrix of h evaluated at the prior state estimateHij ,δhi(x)δxj(87)Ht , H|x=xˆt|t−1 . (88)42In a similar manner, the state dynamics are approximated usingP{xt|xt−1} ≈ N (xt; f (xˆt−1) + Ft(yt−1 − xˆt−1), Qt) , (89)where Ft is the Jacobian matrix of f evaluated at the prior state estimateFij ,δ fi(x)δxj(90)Ft , F|x=xˆt−1 . (91)Given the approximations in (89) and (86), the filtering density can be computedusing the following Kalman equations:xˆt|t−1 = f (xˆt−1) (92)Σt|t−1 = FtΣt−1Fᵀt + Qt (93)Kt = Σt|t−1Hᵀt (HtΣt|t−1Hᵀt + Rt)−1 (94)xˆt = xˆt|t−1 + Kt(yt − h(xˆt|t−1)) (95)Σt = (I−KtHtΣt|t−1). (96)There are two main cases when the EKF works poorly. The first scenario occurswhen the state co-variance is large. In this case, the prior distribution of the statevariable is broad, so a significant amount of probability mass is spread throughdifferent parts of the transition function that are far away from the mean, wherethe function has been linearized. The other detrimental scenario occurs when thefunction is highly non-linear near the current mean. The unscented Kalman filter(UKF) can overcome both these limitations and a detailed exposition can be foundin [10], [24].5.5.2 Particle FilterThe most general case of non-linear and non-Gaussian state space models can-not be appropriately handled using the approximate algorithms of the extendedKalman filter or the unscented Kalman filter. While these methods enjoy many ofthe benefits of the Bayesian approach and are about as fast as optimization-basedpoint estimation methods, they assume conjugate priors and exponential familylikelihoods (leading to limited domain application).Particle filtering is a Monte Carlo (simulation based) algorithm for recursiveBayesian inference. The basic idea is to approximate the belief state (of the entirestate trajectory) using a weighted set of particlesP{x1:t|y1:t} ≈N∑n=1wnt δx1:t(xn1:t), (97)43where δ(·) is the Dirac delta function and {xn1:t, n = 1, . . . , N} is a set of supportpoints (particles) that are used to approximate the belief state with associatedweights wnt that are chosen according to the principle of importance sampling[25]. Consequently, the true posterior is represented using a discrete weightedapproximation {xn1:t, wnt }Nn=1.If the samples xn1:t are drawn from an importance (proposal) density pi{x1:t|y1:t},then the weightswnt ∝P{xn1:t|y1:t}pi{xn1:t|y1:t}. (98)However, if the proposal density is chosen to factorize such thatpi{xn1:t|y1:t} = pi{xnt |xn1:t−1, y1:t} pi{xn1:t−1|y1:t−1},then sequential sampling can be carried out with a new measurement yt to obtainsamples xn1:t ∼ pi{x1:t|y1:t} by augmenting each of the existing samples xn1:t−1 ∼pi{x1:t−1|y1:t−1} with the new state xit ∼ pi{xt|x1:t−1, y1:t}. The new randommeasure obtained in this fashion constitutes the predicted belief state.To derive the measurement update, the pdf P{x1:t|y1:t} is first expressed asP{x1:t|y1:t} =P{yt|x1:t, y1:t−1}P{x1:t|y1:t−1}P{yt|y1:t−1}= P{yt|x1:t, y1:t−1}P{xt|x1:t−1, y1:t−1}P{x1:t−1|y1:t−1}P{yt|y1:t−1}= P{yt|xt}P{xt|xt−1}P{yt|y1:t−1}P{x1:t−1|y1:t−1}∝ P{yt|xt}P{xt|xt−1}P{x1:t−1|y1:t−1} (99)The incremental weight update can be derived by substituting (99) into (98) aswnt ∝P{yt|xnt }P{xnt |xnt−1}P{xn1:t−1|y1:t−1}pi{xnt |xn1:t−1, y1:t}pi{xn1:t−1|y1:t−1}= wnt−1P{yt|xnt }P{xnt |xnt−1}pi{xnt |xn1:t−1, y1:t}(100)Furthermore, if pi{xt|x1:t−1, y1:t} = pi{xt|xt−1, yt}, then the proposal density isonly dependent on the previous state xt−1 and the instantaneous measurement yt.This is particularly useful in the case when only a filtered estimate of posteriorP{xt|y1:t} is required at each time step. In such scenarios, only xnt need be stored,and so one can discard the path, xn1:t−1, and the history of observations, y1:t−1.The modified weight is thenwnt ∝ wnt−1P{yt|xnt }P{xnt |xnt−1}pi{xnt |xnt−1, yt}, (101)44and the posterior filtered density P{xt|y1:t} can be approximated as the empiricaldistributionP{xt|y1:t} ≈N∑n=1wnt δxt(xnt ).In statistical literature, this procedure is referred to as sequential importance sam-pling which consists of recursive propagation of importance weights wnt and sup-port points xnt as each measurement is received sequentially. A more detailedstudy of the particle filter and its associated operations can be found in [25].5.6 switching state space modelsMany engineering applications involves dealing with nonlinear dynamic systemscharacterized by a few possible modes (or regimes) of operation. In tracking, forexample, this applies to maneuvering targets, where system behavior patterns arereferred to as system modes. These types of problems are often referred to ashybrid-state estimation problems involving both continuous-valued target stateand a discrete-valued mode variable.Definition 16. A switching state space model with a discrete mode variable at ∈ Ataking values in the mode set A is defined by the following dynamic and measurementequations:xt = ft (xt−1, at, vt) (transition) (102)yt = ht (xt, at, wt) (emission), (103)where ft is in general, a time-varying non-linear transition function, ht is a time-varyingnon-linear observation function and vt, wt are the state and measurement noise. The modeat is in effect during the sampling period (t− 1, t].The mode variable at is commonly modeled by a time-homogeneous nA statefirst-order Markov chain with transitional probabilitiesMij = P{at = j|at−1 = i}, i, j ∈ A. (104)The transition probability matrix of the Markov chain is represented by the nA ×nA matrix M and the initial mode probabilities are represented by the vector ß.The conceptual recursive solution to the state estimation (filtering) problem usingswitching state space models can be encapsulated using:P{xt, at|y1:t} =∑iMij∫P{xt|xt−1, at = j}P{xt−1, at−1 = i|y1:t−1}dxt−1(105)P{xt, at = j|y1:t} =P{yt|xt, at = j}P{xt, at = j|y1:t−1}∑j∫P{yt|xt, at = j}P{xt, at = j|y1:t−1}dxt(106)455.7 inference using switching state space modelsThere are no known finite dimensional filters for the state estimation of switchingstate space models. However, certain sub-optimal filters have been devised toobtain acceptable tracking performance.5.7.1 The Interacting Multiple-Model FilterA special case of a general hybrid system is the jump Markov linear system (JMLS)that can be considered to be a generalization of a linear system. However, whilethe Kalman filter can be used for optimal state estimation in the case of linearsystems, there are only suboptimal algorithms for the JMLS.Definition 17. A jump Markov linear system can be defined using the dynamics andmeasurement equations:xt = Ft−1(at)xt−1 + vt−1(at) (107)yt = Ht(at)xt + wt−1(at), (108)where Ft(at) is a time-varying state transition matrix dependent on the mode at, Ht(at)is a time-varying measurement matrix dependent on the mode at, vt−1(at) and wt(at)are mode dependent zero-mean white Gaussian noise processes with covariances Qt−1(at)and Rt(at) respectively.The JMLS is a non-linear system because xt or yt do not depend linearly on thehybrid state st = [xᵀt , at]ᵀ. The JMLS becomes linear only if the mode variable atis fixed (given). The number of parameters required to characterize the posteriorpdf P{xt|y1:t} of the JMLS grows exponentially with time.Suppose that the mode history up to time index t is denoted as al1:t = {al1, al2, . . . , alt}, l =1, . . . , ntA, where alt denotes the particular value that the mode variable takes dur-ing the sampling period (t, t + 1] in the lth mode sequence. The posterior filteringdensity can be expressed as a Gaussian mixtureP{xt|y1:t} =ntA∑l=1P{xt|al1:t, y1:t}P{al1:t|y1:t}, (109)46where P{al1:t|y1:t} is the probability of a particular mode sequence given measure-ments y1:t. Using Bayes rule, this mode sequence likelihood can be expressedasP{al1:t|y1:t} = P{al1:t|yt, y1:t−1}= P{yt|al1:t, y1:t−1}P{al1:t|y1:t−1}P{yt|y1:t−1}∝ P{yt|al1:t, y1:t−1}P{alt|al1:t−1}P{al1:t−1|y1:t−1}. (110)The posterior filtering density given the lth mode sequence is a Gaussian (for ajump-Markov linear system) and can be computed using a Kalman filter withparameters that correspond to each mode that is active at time t.The problem with (109) is that the number of mixture components (alternativemode histories) in the Gaussian sum grows exponentially with time t. Hence, apractical filter for a jump-Markov linear system has to be sub-optimal, based onsome approximations. Many alternatives exist in the tracking literature [24]. Inthis dissertation, the interacting multiple model filter is used due to its establishedempirical performance in the literature.The interacting multiple model (IMM) filter uses a bank of nA model-matchedfilters. The input to the jth model-matched filter is an interaction (mixture) ofall nA model-matched filters. Thus, before the model-matched filtering step, theIMM performs the “mixing” of models. The posterior density at t− 1 for model jis represented by the Gaussian density N(xjt; xˆjt−1|t−1,Σjt−1|t−1)wherexˆjt−1|t−1 =nA∑i=1µi|jt−1xˆit−1|t−1 (111)Σjt−1|t−1 =nA∑i=1µi|jt−1[Σit−1|t−1 +(xˆit−1|t−1 − xˆjt−1|t−1)(112)×(xˆit−1|t−1 − xˆjt−1|t−1)ᵀ]. (113)The weights in (111) and (112) are the mixing probabilitiesµi|jt−1 = P{at−1 = i|at = j, y1:t−1}. (114)Using Bayes’ rule, it follow thatµi|jt−1 =piijP{at−1 = i|y1:t−1}∑nAi=1 piijP{at−1 = i|y1:t−1}, (115)where piij is the transition probabilities of the Markov switching mode variable.After the model-matched filtering, the posterior density of the mode j at time t47is represented by N(xjt; xˆjt|t,Σjt|t). Another application of Bayes’ rule results inupdating the mode probabilities asP{at = j|y1:t} =Λjt ∑nAi=1 piijP{at−1 = i|y1:t−1}∑nAj=1 Λjt ∑nAi=1 piijP{at−1 = i|y1:t−1}, (116)where Λjt = P{yt|y1:t−1, at = j} is the model conditioned likelihood function. Theoutput of the IMM is a Gaussian mixture, however, it is usually collapsed into themean and covariance of the mixture asxˆt|t =nA∑i=1P{at = i|y1:t}xˆit|t (117)Σt|t =nA∑i=1P{at = i|y1:t} (118)×[Σit|t +(xˆit|t − xˆt|t) (xˆit|t − xˆt|t)ᵀ](119)5.7.2 The Multiple-Model Particle FilterJump-Markov systems with non-linear dynamics or measurements are widelyfound in different applications [25]. Similar to the particle filter methodology inSec. 5.5.2, the multiple-model particle filter (MMPF) has been proposed to per-form non-linear filtering with switching dynamical models. Such a problem be-longs to a wider class of hybrid state estimation problems where the augmentedstate space st consists of both a continuous-valued part xt and a discrete-valuedpart at. The components of the continuous-valued vector are usually target kine-matic variables such as position, velocity, acceleration etc. The discrete-valuedvector could be a combination of mode variable, target attributes or a data associ-ation vector.The multiple model particle filter in a sequential Monte Carlo approximationof the conceptual solution in (105). Let the augmented state vector be definedas before st = [xᵀt , at]ᵀ. The initial densities P{x0} and P{a1} are assumed to beknown a priori and are part of the model specifications. Let {snt , wnt }Nn=1 denote arandom measure that characterizes the posterior density P{st|y1:t} such that eachparticle snt consists of two components: xnt and ant .The first step of the MMPF is to generate a random set {ant }Nn=1 based on the pre-vious mode particles {ant−1}Nn=1 and the transitional probability matrix M = [Mij],where i, j ∈ A. The sampling of the new mode particles can be performed by sam-pling from the discrete distribution Mant−1,j that is obtained by selecting the ant−148row of M. The next step in the generic MMPF performs a regime conditioned se-quential importance sampling by using an appropriate proposal sampling densityxnt ∼ pi{xt|xnt−1, ant , yt}. The incremental weight update is then carried out usingwnt ∝ wnt−1P{yt|xnt , ant }P{xnt |xnt−1, ant }pi{xnt |xnt−1, ant , yt}(120)together with the appropriate normalization. The above forms a general templatefor the MMPF. Specific details are dependent on the choice of the proposal densityand the form of the dynamics and measurement equations.496 M E TA - L E V E L T R A C K I N G F O R S PAT I O - T E M P O R A L T R A -J E C T O RY A N A LY S I S6.1 introductionPhysical sensor-based target tracking is a classical problem that has been stud-ied in great detail [24, 26]. This chapter presents middleware algorithms to helphuman surveillance operators interpret tracks in order to detect and visualize sus-picious spatio-temporal target trajectories. While state space models are ideal fortarget tracking, the main idea presented in the sequel is that stochastic context-free grammar (SCFG) models and other non-Markovian models like reciprocalprocesses are useful for modeling and interpreting trajectories.Several models have been presented in this chapter for specific target trajec-tories signifying malicious intent. Both single-target and multiple-target anoma-lous scenarios are examined. More specifically, 8 trajectory models are presented:(i) random walk, (ii) reciprocal process, (iii) linear, (iv) arc-like, (v) rectangular,(vi) destination-aware, (vii) palindromic and (viii) target rendezvous trajectories.Our modeling framework focuses on the human-sensor interface (middleware)tasked with high-level reasoning and visualization from lower-level sensor mea-surements. Bayesian signal processing algorithms are also developed to performmodel classification, fusion and change detection using novel SCFG and RP mod-els.ExampleConsider the scenario in Fig. 3 where a target executing a trapezoidal trajectory(dashed blue line) is observed in noise. This trajectory could signify a target avoid-ing an obstacle by deviating away, passing the obstacle and then returning to itsprevious path. Given noisy point measurements, how can one devise algorithmsto detect whether the target executed such a pattern? This is a non-trivial estima-tion problem (that cannot be solved efficiently with template matching techniques)since exponentially-many scaled versions of the shape need to be considered1.SCFG models offer an efficient framework to model common shapes (shown in1 As an equivalent example, consider a string albmcmdn where l, m and n are unknown positiveintegers such that l + 2m+ n = k. How can one extract the arc trajectory bmcm from a noisy versionof the string? If b and c were composite alphabets that were the union of r other symbols, template50−30 −20 −10 0 10 20 30 40 50 60 70−30−20−10010203040506070X−PositionY−Position Radar detectionsBase−level trackerMeta−level trackerFigure 3: A radar makes noisy observations (green dots) that can be tracked by a conven-tional tracker as shown by the solid red line. However, useful information likeshape and destination is often obscured by the estimation process. The mainaim in meta-level tracking is to detect and track higher-level characteristics ofa trajectory like the movement patterns shown as blue arrows. In this example,a target executes an arc movement (the open trapezoidal shape) that can be as-sociated with various anomalous behaviors. Our two-scale approach uses thebase-level tracker output to track intent-driven trajectories having long-rangedependencies.Fig. 4a) in a scale-invariant manner. The trajectory classification and modeling ap-proach proposed in this thesis can also be viewed as a “visualization” layer thatis able to assist the human analyst by extracting suspicious spatio-temporal pat-terns embedded in noisy tracks as depicted in Fig. 3. Efficient polynomial-timealgorithms exist to perform classification using SCFG models which are used inSec. 6.7 to demonstrate the effectiveness of SCFG models over competing hiddenMarkov models.matching would require listing an exponential number∑k/2m=0[k− (2m− 1)]rm+1 of possibilities, forall possible choices of l, m and n, since the values of these integers are not known.51NW 11TH ST.W. PARK PL.NW 10TH ST.N. WALKER AVE.N. DEWEY AVE.N. DEWEY AVE.C LA SSEN DR .NW 10TH ST.CLASS EN DR . N. WALKER AVE.H O S P I T A LEMERGENCYRADAR(a) (b)Figure 4: (a) A conceptual sketch of a city-wide radar surveillance application. Targettrajectories evolve over roads and are constrained to certain shapes due to thegeometry imposed by urban environments. The trajectory in red depicts an m-rectangle while that in blue represents an arc. (b) The road network in a city canbe treated as a directed graph with nodes representing intersections and edgesrepresenting connecting streets. The trajectory in blue shows a palindrome pathrepresenting a target retracing its path. Such paths can be considered anomalousin many settings. The red trajectory depicts rendezvousing targets which is ofgreat interest in many crime-related events. Such targets are called “destination-aware” because the evolution of their trajectory is influenced by the destinationend-point.6.1.1 Literature SurveyThe classification and tracking of anomalous spatio-temporal trajectories arise inmany application areas such as target tracking using radars[2], gesture recogni-tion using optical [27] and time-of-flight sensors[28], human action recognitionin camera networks[29], gait analysis [30], network packet traces[31], vehiculargeo-position coordinates[32] etc. A common approach in such sequential patternrecognition problems, is the use of hidden Markov models to capture the tem-poral dependence of observations. Such an approach is taken in [33] by usingflow vectors on objects being tracked in video sequences. Spatial nodes of interestare first isolated using a Gaussian mixture modeling technique. Routes are thencreated by clustering different trajectories and a high-level hidden Markov modelis learned for each route. However, such a model cannot incorporate destination-specific information. Moreover, long-term dependencies in the trajectory are alsolost due to the Markov assumption.52Our work is related to the approach taken in [34] and [35]. A non-probabilisticcontext-free grammar approach is used in [34] to identify two-person interactionslike hugs, hand-shakes, kicks and punches to enforce syntactic structure on de-tected events. In [35], SCFGs are used to recognize cheating actions in card-gamesat casinos. Our work departs from them significantly as we consider trajectorymodeling in a tracking situation and not an action recognition system.6.1.2 Organization and ResultsThe chapter is organized as follows:1) Hierarchical Tracking FrameworkSec. 6.2, formulates a hierarchical two time-scale tracking system as depicted inFig. 5. The algorithms presented use the track estimates from an existing “base-level” tracker to perform filtering and change detection at the middleware layer.Thus, they are at a higher layer of abstraction than conventional tracking and arefully compatible with existing trackers.2) Modeling of Anomalous PatternsThe main contribution of this chapter is the novel modeling of various anomalouspatterns like target rendezvous, path re-tracing etc. based on SCFG models thatare presented in Sec. 6.3. Two different kinds of models are considered based oneither movement patterns (velocity-based) or the specific sequence of sites visitedthe target (position-based). Movement patterns can model complex geometricshapes while position tracklet sequences can model destination-specific anoma-lies.3) System-Theoretic Constraints On Rule ProbabilitiesIn Sec. 6.4, constraints on the rule probabilities are derived by fixing the expectedtrajectory length. When used in combination with a particular shape, constrain-ing the length of the trajectory enables knowledge of the target destination in aprobabilistic sense.4) Multi-Sensor FusionIn Sec. 6.5, an alternative scenario of inference in a multi-sensor network is ex-amined and modifications are used in the grammar modeling approach to enableinference.53SENSOR MEASUREMENTSBASE TRACKERTRACKLET ESTIMATORDECISION LAYER CLASSIFICATION VITERBI PARSEVISUALIZATION LAYER TRAJECTORY VISUALIZATION MODELSSCFG INFERENCE USING EARLEY-STOLCKE PARSERFigure 5: The proposed hierarchical tracking framework to assist human radar operators.A base-level tracker T outputs filtered state estimates at a fast time-scale τ. Thetracklet estimator aggregates state estimates from the base tracker and emitsquantized tracklets aˆt at a slower time scale t. A set of SCFG models for differ-ent threat scenarios are considered in the classification problem. The sequenceof noisy tracklets aˆ0:T is fed into the Earley-Stolcke parser to either performmodel classification to generate alarms or as a visualization tool to recover thesuspicious trajectory.5) Multi-target Change DetectionIn Sec. 6.6, a multi-target change detection approach is presented for pattern oflife analysis in aggregate behavior of multiple targets. A likelihood-based ap-proach is presented to detect changes in the target dynamics based on traffic flowinformation.6) Numerical SimulationsThe detection performance of the SCFG models are illustrated through extensivenumerical simulations in Sec. 6.7. SCFG models are compared to HMMs andour results show a significant increase in detection probability. We also show vianumerical simulation, that changes in the traffic flow can lead to reliable detectionof change in pattern of life using SCFG models. We use both camera-based wide-area surveillance and radar-based scenarios as proof-of-concept.6.2 hierarchical tracking framework to assist human operatorsIn this section, a system-level description of the tracking framework is presented(as depicted in Fig. 5). A conventional (base-level) target tracker operates on the54fast time scale, while the higher level “middleware layer” operates on a slowertime scale.6.2.1 Base-Level TrackerThe base-level tracker is a Bayesian filter operating on the fast time scale (orderof seconds) denoted by τ = 1, 2, . . .. The base-level tracker can be represented asan operator T that uses sensor measurements zτ to update a posterior filteringdistribution Fτ over the position and velocity of the target byFτ = T(Fτ−1, zτ). (121)For example, consider a target represented by its kinematic state s = [sx, sy, sx˙, sy˙]ᵀ.The state variables (sx, sy) refer to the position of the target while (sx˙, sy˙) refer tothe velocity of the target in Cartesian co-ordinates. Classical target tracking usesa state space modelsτ+1 = f (sτ, wτ),zτ = h(sτ) + vτ,where wτ and vτ represent the state and measurement noise respectively. Thestate transition and measurement functions are represented by f (·) and h(·) re-spectively. A base-level tracker estimates the target state trajectory from theradar measurements in a causal manner. This is a filtering problem involvingcomputation of (an approximation to) the a posteriori filtering distribution Fτ =P{sτ|z0, z1, . . . , zτ}.6.2.2 Tracklet EstimationLet t = 1, 2, . . . denote a slower time scale (order of minutes), which we call theepoch scale, at which the human analyst makes decisions. The human analystreduces the posterior distributions Fτ to the mean position/velocity and qualityof estimate (e.g. variance). More specifically, given the sequence of track distribu-tions, Ft = {Fτt , . . . , Fτt+1−1}, define a tracklet on the slow time scale asaˆt = H(Ft) ∼ P{·|at}. (122)Here, the tracklet aˆt denotes, a quantization of an average state vector sˆt of thetarget obtained from the tracklet estimator at epoch t, and can be viewed as anoisy version of the “true” underlying quantized position and/or unit velocityvector of the target denoted as at.55(a)4pi3 2pi 4pi04pi−2pi−4pi3−North-EastNorthSouth South-EastWestSouth-WestNorth-WestEast(b)Figure 6: (a) shows the discretization of a surveillance space Λ into grids (xi, yi), eachof size ∆x × ∆y. Each grid element is lexicographically ordered into a set Vpos.There are a total of |Vpos| bins in the surveillance space.(b) shows the velocitytracklets v ∈ Vvel representing cardinal radial directions.Two different types of tracklets are considered: (a) the position tracklets aˆpostwhich are output by the position tracklet estimator Hpos and (b) velocity trackletsaˆvelt which are quantized throughHvel. Tracklets are used as syntactic sub-units oftarget trajectories. The SCFG shape models utilize velocity tracklets as sub-unitsof the trajectory shape while SCFG destination-specific pattern models utilize po-sition tracklets as sub-units of a goal-directed trajectory (following a specific pat-tern of visited sites).The position tracklet estimator operates on a discretized surveillance spaceΛ over which the target is observed. At each time instant, the position track-let estimator quantizes the average (sˆxt , sˆyt ) state estimates to the closest element(xi, yi) ∈ Λ on the discretized 2-D grid Λ as shown in Fig. 6a. The velocity track-let estimator utilizes the average velocity estimate (sˆx˙t , sˆy˙t ) to find the direction ofmotion of the target. The possible directions of motion of the target are quan-tized into 8 radial cardinal directions from the set Vvel = {~a = −pi,~b = −3pi4 ,~c =−pi2 ,~d = −pi4 ,~e = 0, ~f = pi4 ,~g = pi2 ,~h = 3pi2 }. The cardinal directions representedby the mode are shown in Fig. 6b. Each mode is represented with a lowercasealphabet and an→ superscript to denote that it is a unit directional vector.566.2.3 Trajectory Classification ObjectiveA target trajectory is associated with a specific spatio-temporal pattern depend-ing on its shape and/or the pattern of sites visited by the target. Each trajectoryis assumed to be generated by a model Gk ∈ G, k = {1, . . . , K}, where there areK = |G| different types of anomalous patterns under consideration. Develop-ment of these models is the main contribution in this chapter and is describedin Sec. 6.3. As a target moves in a region of interest (ROI), it generates trackletsaˆt, t = 0, 1, . . .. The anomalous trajectory classification task is then defined as find-ing the model G∗ ∈ G that has the highest probability of explaining the observedtracklet sequence a0, . . . , aT,G∗ = arg maxGk∈GP{Gk|aˆ0, . . . , aˆT}. (123)6.3 models for anomalous patternsIn this section, 8 types of models for anomalous target trajectories are presented.The motivation behind using SCFG models like arcs, rectangles, palindromes etc.is that such trajectories cannot be exclusively generated by Markov models2. Aformal proof of this assertion can be constructed using the Pumping Lemma forregular grammars (HMMs) [36], [21].6.3.1 Random Walk ModelsRandom walks are the most elementary of Markov models for target trajectoriesevolving on a road network represented as a finite set of nodes Vpos = {vi, i =1, . . . , |Vpos|}; see Fig. 6a. The target position at can be represented using thebin vi ∈ Vpos in which it resides at instant t. A Markov chain model GMC oftarget dynamics over such a road network can be parameterised using a priordistribution P{a0 = vi} over the initial state a0 at time t = 0 and a possibly time-varying transition matrix At(i, j) = P{at = vj|at−1 = vi}. A transition matrix forsuch random walk models can be calculated based on physical distance betweennodes. The state transition diagram of such a model is shown in Fig. 7.2 This implies that a Markov chain can generate a sample path that is an instance of a shape withsome finite probability. However, it cannot do so with probability 1 unless the state space isextended artificially to the length of the trajectory.57ENDSFigure 7: The state transition diagram for target trajectories on a road network. For a ran-dom walk model, the transition probabilities pnbr are non zero only for neigh-boring nodes that are connected by streets. The neighbor probabilities are uni-formly distributed over all connected neighbors. For the Markov bridge model,the transition probabilities are time-varying.6.3.2 Reciprocal Models for Destination-Aware TrajectoriesA human target following anomalous behavior rarely moves according to a ran-dom walk model. The sequence of actions taken by an intelligent agent are oftenpre-meditated on a global scale with random variation at the local scale. Withthis assumption in mind, the local dynamics of a target are more appropriatelymodeled using a reciprocal process.A discrete-time reciprocal process at ∈ Vpos is a one-dimensional Markov ran-dom field with the non-causal propertyP{at = vj|as 6=t} = P{at = vj|at−1 = vi, at+1 = vl}, (124)parameterised by the homogeneous 3-point transitionsQ(i, j, l) = P{at = vj|at−1 = vi, at+1 = vl}with vi, vj and vl ∈ Vpos. In urban environments, traffic information can be usedto estimate the 3-point transitions Q(i, j, l) between intersections in a road net-work. This is described in Sec. 6.7.1.58Using reciprocal dynamics, a destination-aware trajectory is defined by a priorifixing the final destination of the target aT = vk. The resulting Markov bridge ischaracterized by a probability transition lawP{at|at−1, aT} =P{at|at−1, at+1}P{at+1|at, aT}P{at+1|at−1, aT},that induces a backwards recursion for time-varying two-point Markov transitionsBkt (i, j) =Q(i, j, l)Bkt+1(j, k)(∑j′∈VposQ(i, j′, l)Bkt+1(j′, l))−1, (125)where the second term on the right hand side of (125) is the normalization con-stant. The time-varying transitionsBkt (i, j) = P{at = vj|at−1 = vi, aT = vk}refer to 2-point transitions of a target with final destination aT = vk. To ensurethat the final destination of the target aT = vk, we initialize the recursion withBkT−1(i, j) = 1.0 for j = k and 0.0 otherwise. Additionally, at time t = T − 2, thetarget transitions according to BkT−2(i, j) = Q(i, j, k). A more detailed treatmentcan be found in [2].6.3.3 Destination-aware PathsA destination-aware path is a target trajectory which is heading towards a knowndestination aT = vk while following local dynamics according to the 3-point transi-tions Q(i, j, l). The Markov bridge approach can also be viewed as a time-varyingstochastic context-free grammar for destination-specific trajectories.A destination-aware trajectory satisfies the constraint P{aT = vk} = 1.0. Adestination-aware SCFG model can be defined using a starting rule of the formS → viXivk with rule probability P{S → viXivk} = P{a0 = vi|aT = vk}. Thedestination-constrained SCFG model is characterized by rules of the form Xi →vjXj with time-varying rule probabilities given by the Markov bridge transitionssuch that P{Xi → vjXj} = Bkt (i, j), where Bkt (i, j) represents the probability in(125). Such rules are only created for neighboring nodes vi, vj that are connectedby an edge. Suppose vk represented the position of a sensitive asset like anembassy or a check-point. Using the approach presented above, a grammaticalmodel GSCFGk represents all trajectories with target destination vk.59Figure 8: The blue star node is the destination node and the red square node is the initialstarting point. The target trajectory follows a destination-aware model to travelbetween the initial and destination nodes.6.3.4 Target RendezvousConsider the trajectories followed by two targets3represented by position trackletsa1t and a2t . A rendezvous is defined as an anomalous event where two targets meetat the same position vk at time T. Such a situation is depicted in Fig. 9. The ren-dezvous of two targets at a given node vk can be considered as a destination-awaretrajectory and modeled using the grammar rules described in the previous section.Define the multiple target position tracklet sequence at =[a1t , a2t]ᵀas a vector con-Figure 9: The two targets represented by their trajectories aˆ1t and aˆ2t follow a rendezvousmodel to meet at the blue star destination node.catenation. The rendezvous of two targets is then modeled as a destination-awaretrajectory a0, . . . , aT−1, aT = [vk, vk]ᵀ with the final state of both targets pinned tothe intended meeting point vk at time T. The pre-computed time-varying transi-tion matrix Bkt (i, j) is used in an extended state-space model to define a higherorder (two target) transition matrixBk,kt (m, n) = Bkt (i, j)⊗ Bkt (i, j), (126)3 The restriction to two targets is for simplification of notation and can be extended to an arbitrarynumber of targets. However, the state space increases exponentially with the number of targets.60where i, j ∈ Vpos and m, n ∈ Vpos × Vpos. The ⊗ operator represents the Kro-necker product between two matrices. The notation Bk,kt (m, n) represents the time-varying transition probability P{at = n|at−1 = m, aT = [vk, vk]ᵀ}. The grammarmodel for target rendezvous has the same structure as destination-aware trajec-tory models. However, the non-terminal and terminal sets are different due to theexpansion in the state space. The time-varying rule probabilities are specified in(126).6.3.5 Palindrome PathsA palindrome refers to a sequence a0a1 . . . aT−1aT = aTaT−1 . . . a1a0 such that re-versing the sequence results in the same pattern. In the context of target trajec-tories using position tracklets, a palindromic paths refers to a target re-tracingits previously traversed path. Such paths can be indicative of behaviors such assearching for a dropped/lost item. A simple grammar for a palindromic path isshown in Fig. 10 which is characterized by rules of the form Xi → vjXjvj. Due tothe inside-outside (branching) manner in which an SCFG generates a sequence,we can ensure the palindrome property by emitting a matching terminal on theleft and right hand side when transitioning to a new node vj.Figure 10: A target re-traces its path to form a palindrome trajectory. The midpoint of thetrajectory is the folding point at which the two halves of the trajectory representthe same nodes.Remark: The next few models generate geometric shapes using velocity trackletsas geometric primitives. The clean terminals at and noisy observations aˆt takevalues from the cardinal direction set Vvel shown in Fig. 6b. These generalize ourpast work [2] by incorporating system uncertainty in the movement patterns ofthe target. This generalization is important because a target attempting to travelin a specific geometric shape does not have a global view of the evolution of itstrajectory. As a result, the local movement patterns of the target are subject tosmall perturbations. The internal process noise is incorporated by allowing each61directional movement to have a finite probability of perturbation into neighboringradial directions.6.3.6 Linear TrajectoriesLinear trajectories are straight paths that are generated by constant velocity targetdynamics obeying local Markov dependency. Linear grammar models are repre-sented using the compact form Gline = {~an} implying that the model can generateall trajectories involving n movements of a target in the direction represented bythe unit vector ~a. A simple regular grammar for lines is characterized by rules ofthe form S→~aS |~a with~a ∈ Vvel representing the target’s direction of motion.6.3.7 Arc-like TrajectoriesThe compact form for arc-like trajectories is Garc = {~an~b+~cn} which is charac-terized by an equal movements in opposing directions represented by the unitvectors~a and ~c (shown in Fig. 11a). The notation~b+ denotes an arbitrary numberof movements in the direction represented by ~b. A simple grammar capable ofgenerating arcs of all lengths is shown in Fig. 11b. We use the notion of arcs torepresent u-turn and open trapezoidal patterns.6.3.8 Rectangle TrajectoriesWe consider the modified-rectangle language (with associated grammar shownin Fig. 12b as Gm-rectangle = {~am~b+~cm~d∗}. The modified-rectangle grammar canmodel any trajectory comprising of four sides at right angles (not necessarily aclosed curve) with at least two opposite sides being of equal length. The notation~b+ and ~d∗ represent an arbitrary number of movements in the correspondingdirections represented by that mode. A full rectangle with both opposite sides ofequal length cannot be modeled by an SCFG [21].Summary: In formal language theory, it is known (and provable using pumpinglemmas [36]) that trajectories like arcs, rectangles and palindromes are impossibleto generate exclusively using Markov models [36]. Additionally, the incorporationof reciprocal dynamics using time-varying rule probabilities allows destination-aware and rendezvous trajectories to be efficiently modeled. The highlight ofthis section was to illustrate the use of such non-Markovian models in trajectoryclassification.62Overall arc-like trajectoryInternal process noise (dither)(a)Arcr1 : S→ AXC [p1]r2 : X → AXC [p2]r3 : X → B [p3]r4 : A→~a [p4]r5 : B→~bB [p5]r6 : B→~b [p6]r7 : C → ~c [p7](b)Figure 11: (a) show depictions of an arc-like trajectory with system uncertainty. The incor-poration of internal process noise serves the philosophical purpose of modelingoverall geometry using self-embedding rules to generate shapes having sideswith equal lengths while also allowing for internal perturbations. In the case ofa human target, such a process noise models the inability of the target to mon-itor their trajectory in a global fashion. (b) shows a simple grammar model forarc-like trajectories.6.4 constraints on scfg rule probabilitiesIn this section, we describe two constraint conditions which ensure that the SCFGis well-posed and also allows us to constrain the expected destination of a target.The first condition is called the consistency constraint (described in [37]) whichensures that the grammar is able to terminate in a finite length string. The secondcondition constrains the final destination of the target in an expected sense. Thisresults from working out the expression for the first order moments of an SCFGin terms of the rule probabilities.The production rule probabilities of a grammar can be chosen such that theexpected value of the final destination is equal to the intended final destination.We assume that the target maintains a constant speed in the direction of motion. Ifthe target speed is κ m/s, then each estimated tracklet zt represents the movementof the target by κ meters in a radial direction represented by zt. The total distanceSt traveled by the target until time t is given bySt =t∑τ=1κzτ. (127)63Overall m-rectangle trajectoryInternal process noise (dither)(a)m-Rectangler1 : S→ AXCD [p1]r2 : X → AXC [p2]r3 : X → B [p3]r4 : A→~a [p4]r5 : B→~bB [p5]r6 : B→~b [p6]r7 : C → ~c [p7]r8 : D → ~dD [p8]r9 : D → ~d [p9](b)Figure 12: (a) shows an example of a m-rectangle trajectory with internal perturbationsand global shape. The grammar in (b) is able to generate all sizes of trajectoriessatisfying such a shape.The co-ordinate system used is the Cartesian plane which is represented by theunit vectors~a and~b as shown in Fig. 6b. Consequently, every trajectory consistingof a sequence of tracklets zt can be written as a linear combination of~a and~b. Eachof the other unit directional vectors can also be written as a linear combination of~a and~b using vector addition.Consider a target moving in an m-rectangle trajectory like that in Fig. 12a. Let’sassume that the target starts at an initial position (x1, y1) which represents theposition vector x1~a + y1~b. We wish to constrain the final destination of the targetto be at (xT, yT) representing the position vector xT~a + yT~b. This implies that thetotal distance traveled by the target is (xT − x1)~a + (yT − y1)~b. The total distanceST can be computed as the sum of the number of times the target moves in eachof the directions in Q (because of the constant speed assumption). This is givenbyST =T∑t=1zt = (xT − x1)~a + (yT − y1)~b, (128)=(N~a~a + N~b~b + N~c~c + . . . + N~h~h),64where T is the total length of the string and Nqj is the number of times the targetmoves in direction qj ∈ Q. Our interest is from a modeling perspective and hencewe would like to constrain the total distance the target travels. It turns out thatthe probabilistic nature of the grammar production rules allows us to constrainthe total distance traveled if we can bound the expected value of the total distancetraveled. This is obtained from (128) by replacing each quantity with its expectedvalue viz., the expected number of times E{Nqj} that the target travels in each ofthe directions qj ∈ Q. To compute these expected values, we need to define thefollowing matrices.The matrix E has |N | rows indexed by non-terminals and |P| columns indexedby production rules. An element Ei,j of the matrix E has value pj if productionrule aj has non-terminal Ni on its left and 0 otherwise. Here pj is the probabilityof choosing rule aj.The matrix C has |P| rows indexed by production rules and |N | columns in-dexed by non-terminals. An element Ci,j of the matrix C is the number of occur-rences of non-terminal Nj on the right hand side of production rule ai.The stochastic expectation matrix N = E · C is an |N | × |N | matrix indexed bynon-terminals. An element Ni,j of the matrix N is the expected number of timesa non-terminal Nj will occur when Ni is re-written using exactly one productionrule. The stochastic expectation matrix N plays an important part in determiningthe consistency of a grammar. A grammar is consistent if it terminates in a finitelength terminal string. A grammar G is consistent if the largest eigenvalue ρ(N)of its stochastic expectation matrix N is less than one [37].We are also interested in the matrix N∞ for any number of production rulesbeing applied. This is given by N∞ =∑∞i=0 Ni. This series converges as shown in[38] toN∞ = (I − N)−1, (129)where I is a |N | × |N | identity matrix. The matrix N∞ is called the non-terminalexpectation matrix. Each element N∞i,j represents the expected number of non-terminals Nj that could result from a non-terminal Ni taking into account anarbitrary number of production rules being applied.The matrix T has |P| rows indexed by production rules and |Q| columns in-dexed by terminals. An element Ti,j of the matrix T represents the number oftimes the terminal qj appears on the right hand side of the production rule ai.The matrix W = E · T has |N | rows indexed by non-terminals and |Q| columnsindexed by terminals. An element Wi,j of the matrix W represents the expectednumber of instances of terminal qj resulting from one re-write of the non-terminalNi.65The terminal expectation matrix W∞ = N∞ ·W has |N | rows indexed by non-terminals and |Q| rows indexed by terminals. Each element W∞i,j represents thenumber of instances of terminal qj resulting from an arbitrary application of pro-duction rules starting from a non-terminal Ni.Using the matrices defined above, we can now compute the expected wordlength of the grammar G. The expected word length is defined as the total numberof terminals qj derived starting from a particular non-terminal Ni. Since we areinterested in the final word length, we consider only the non-terminal S which isthe starting symbol (with index 1). This can be represented asE{Nqj} = W∞1,j. (130)Using the expected values from (130), the total distance traveled by the targetcan be constrained which is the same as constraining the end-point of the trajec-tory given that we know the initial starting point. The constraint equation is givenby∑jW∞1,j ~qj = (xT − x1)~a + (yT − y1)~b, (131)after resolving the 8 radial directions into constituent combinations of ~a and ~b.The value of each element W∞i,j is a function of the production rule probabili-ties p1, . . . , p|P|. Using the constraint in (131) and the consistency constraint, weobtain one inequality constraint (the consistency constraint) and one equality con-straint (the expected word length constraint). The exact form of these constraintsdepends on the form of the grammatical rules. In the case of the grammars de-scribed in Sec. 6.3, the consistency constraint turns out to be a trivial constraint.The eigenvalues of the stochastic expectation matrix N are specific rule probabil-ities. Since a rule probability is 0 ≤ pm ≤ 1, the consistency criteria is triviallysatisfied for these grammars. As a result, we are only left with one equality con-straint to satisfy.In the case of the grammar in Fig. 12b, the expected word length constraint re-sults in a multi-linear equation (after multiplying by the common factor through-out) of the formp2 + p3 − 1p2 − 1−1p6 − 1+ p3 + p4(p2 − 1)(p9 − 1)− κ = 0, (132)where κ is the distance between xT and x1. Since we are faced with only oneequation in 5 unknowns, where each unknown 0 ≤ pm ≤ 1, a brute-force ap-proach is used to obtain a feasible solution. A 5-dimensional grid is created andthe constraint equation in (132) is swept to find a feasible solution set.666.5 data fusion in a multi-sensor networkIn this chapter, we have mainly considered single sensor (radar) surveillance ofa target. However, there are various multi-sensor applications such as wide-areacamera sensor networks that are amenable towards a meta-level tracking frame-work. This section describes how to use the observations made by a networkof sensors to make a final decision regarding the multiple hypotheses (differenttrajectories) being tested. A single sensor in the network is only able to make amyopic inference on the intent of a target being tracked. For example, we knowthat a rectangular trajectory as seen in Fig. 12 consists of 4 linear segments. Asensor observing only one of the linear segments of a trajectory is more likely toclassify the trajectory as being a line. However, when the different parts of thetrajectory as observed by different sensors is jointly considered, a better inferencecan be made.The following approach to data fusion is utilized in this chapter. Each sensortransmits the posterior probability of each SCFG model given the observed stringat that sensor node to a central fusion node. The central fusion node combines theposterior probabilities received from each sensor node to obtain a fused densitywhich is used to determine the intent of the target being tracked. This is doneby assigning the intent to the model with the highest posterior probability. Thecombination formula (also called a consensus rule) CF : [0, 1]J → [0, 1] is a map-ping from the posterior probabilities computed by all the J camera sensors in thenetwork to a fused posterior density obtained at the central node. Consider thecase of J sensors each computing the posterior probability of a given SCFG modelamong M possible models provided with the observation string n1, . . . , nk. Theposterior probability of the m-th model computed by the j-th sensor is given bypj(GSCFGm |n1, . . . , nk). With this setup, we examine two consensus rules viz., thelinear opinion pool and the logarithmic opinion pool [39].linear opinion pool is a consensus rule which takes the formCF(p1, p2, . . . , pJ) =J∑i=1αi pi, (133)where∑i αi = 1. The linear opinion pool is simple, it yields a probabilitydistribution and the weights αi can be used to impart a notion of sensorreliability to the individual sensors in the camera network. However, thelinear opinion pool is not a Bayesian decision maker because the consensusrule is not derived from the joint probability using Bayes rules.67kv(a)kvANOMALY(b)Figure 13: (a) The node vk represents the location of a sensitive asset and it also forms theend-point of a destination-aware trajectory model GSCFGk . In a normal patternof life, local traffic flows towards vk with sample trajectories shown in greendashed lines. (b) When an anomalous event occurs near the node vk, the pat-tern of life switches to an abnormal state with sample trajectories (shown insolid red) arriving at vk using aberrant paths attempting to avoid the anoma-lous event.logarithmic opinion pool is an alternative to the linear opinion pool andthe consensus rule takes the formCF(p1, p2, . . . , pJ) =∏Ji=1 pαii∫∏Ji=1 pαii, (134)where it is usually assumed that∑i αi = 1. The logarithmic opinion pooltreats the posterior probabilities coming from each sensor independentlyand results in a Bayesian decision maker. The main problem with thismethod is to decide how to assign the weights αi.6.6 illustrative example : multi-target pattern of life change de-tectionIn this section, multiple targets are incorporated into anomaly detection as achange detection problem. The aggregate behavior of all targets moving in thesurveillance space Λ is assumed to arise from reciprocal dynamics characterizedby the 3-point transitions Q(i, j, l). An anomaly can be defined as a change in theunderlying reciprocal dynamics such that a normal regime Q(1)(i, j, l) is in effectin the time interval t = {0, 1, . . . , t0 − 1} and an abnormal regime Q(2)(i, j, l) is ineffect in the time interval t = {t0, t0 + 1, . . . , T}.68In [2], a surveillance application is described which seeks to detect changes inthe pattern of life of a local population. Often, the local population is sympathetictowards insurgent and rebel groups. Information about anomalous events likethe installation of an improvised explosive device (IED) can quickly propagatethrough the local population which manifests itself in the change of traffic pat-terns around the area where the IED was placed. A pictorial depiction of such ascenario is shown in Fig. 13. A sensitive asset is located at node vk. The normaloperation of targets follows the behavior shown in dashed green lines. However,a hypothetical anomaly in the vicinity of node vk causes target behavior to changesignificantly as depicted by the solid red lines.Change detection can be described as the problem of determining when, duringthe observation interval (or if at all), the underlying reciprocal dynamics switchesbetween two known models. We impose no prior knowledge on the switchingtime (considering it as an unknown deterministic quantity). Consider a stackedvector aˆt =[aˆ1t , . . . , aˆnt , . . . , aˆNt]ᵀrepresenting the noisy position tracklets aˆnt ofeach target n = {1, . . . , N} existing in the surveillance space Λ at time t. The totalnumber of targets is denoted by N. For notational convenience, the number oftargets is assumed to be fixed and all targets begin moving at epoch t = 0. Thelikelihood computation of the multi-target tracklet sequence P{aˆ0:T} factorisesover individual targets because the trajectory of each target is assumed indepen-dent of other targets and only depends on the underlying reciprocal dynamics.Recall that destination-aware targets have their final state pinned to some des-tination vk reached in a fixed number of movements from epoch 0 to epoch T.Consequently, they are expected to arrive at their destination after T time steps.The vector tracklet sequence aˆ0:T obeys one of the following hypotheses:H1: The tracklet sequence aˆn0:T of each target n being tracked was generated by anSCFG G(1) with underlying reciprocal dynamics Q(1).Ht0 : For some t0 ∈ {1, . . . , T − 2}, the tracklets aˆnt were generated by an SCFGG(1) with normal underlying reciprocal dynamics Q(1) for t = {0, . . . , t0 − 1} andby an SCFG G(2) with anomalous reciprocal dynamics Q(2) for t = {t0, . . . , T}.HT−1: The entire tracklet sequence aˆn0:T of each target n was generated by an SCFGG(2) with underlying reciprocal dynamics Q(2).Our aim is to provide a maximum likelihood technique to (a) detect if a changeoccurred from the normal operating regime and (b) the time t0 at which thechange occurred. Towards this goal, consider the log-likelihood L(T, t0) = P{aˆ0:T|Ht0}69of the noisy observed multi-target tracklet sequence aˆ0, . . . , aˆT of all targets exist-ing in the surveillance spaceL(T, t0) = log P{aˆ0, . . . , aˆT|Ht0}=N∑n=1log P{aˆn0 , . . . , aˆnT|Ht0}=N∑n=1log P{aˆn0 , . . . , aˆnt0−1|G(1)}︸ ︷︷ ︸from Earley-Stolcke parser in (67)+ log P{aˆnt0 , . . . , aˆnT|G(2)}︸ ︷︷ ︸from Earley-Stolcke parser in (67)(135)The multi-target anomaly detection problem is cast as a change detection prob-lem seeking the hypothesis Ht0 with the maximum likelihood of explaining theobserved vector tracklet sequencet∗0 = arg max L(T, t0), (136)where it is understood that t∗0 = 1 implies that the normal regime was in oper-ation during the observation period and t∗0 = T − 1 implies that the anomalousregime was in operation during the entire observation period t = 0, . . . , T. Thelog likelihood in (135) can also be computed in a recursive manner for each targetusing the prefix probability in (65).6.7 numerical examplesIn Sec. 6.7.1, the classification performance of single-target trajectories is consid-ered by examining a) only SCFG models, b) only RP models and c) fused SCFGand RP models. Sec. 6.7.2 explores the use of grammatical models towards ren-dezvous detection for two targets and change detection in aggregate target behav-ior for pattern of life analysis.6.7.1 Single-Target ScenariosThe set of experiments in Sec. 6.7.1.1 demonstrates individual examples for theclassification of anomalous trajectories based on SCFG shape models. The sce-nario being considered is depicted in Fig. 14a. The next set of experiments inSec. 6.7.1.2 deals with the use of the SCFG and RP models in both shape anddestination-constrained trajectories for pattern of life analysis. We contrast theSCFG representation with that of hidden reciprocal models. The scenario be-ing considered is depicted in Fig. 14b. We demonstrate the effectiveness of ourapproach using receiver operating characteristic (ROC) curves for detection per-formance. In each simulation, a base tracker is used to extract tracklets which are70then input into the appropriate classifier. The SCFG filter refers to the Bayesiansignal processing algorithm described in Sec. 4.6.1. An extended Kalman filter(EKF) which is explained in Chapter 5 is employed for tracking the target us-ing a constant acceleration model for the target dynamics in all the subsequentsimulations.6.7.1.1 Velocity-Tracklet Based SimulationsConsider the scenario depicted below in Fig. 14a, where the target moves in aspecific trajectory around a sensitive asset (like an embassy or a security check-point). The shape of the trajectory correlates with the intent of the target and isof great importance if it can be reliably detected. The two shapes considered hereare an m-rectangle trajectory which depicts circling behavior and an arc trajectorywhich can be used to signify avoidant behavior. A clean and estimated version ofthe m-rectangle trajectory traversed by a target is shown in Fig. 15a while that ofan arc trajectory is shown in Fig. 15b.The estimated state of the target output by the base-level tracker is used togenerate tracklets that form the input to the SCFG filter. The quantization of thevelocity tracklets has a de-noising effect on the state estimates. As a result, evenfairly unsophisticated base-level trackers can be used to obtain relatively cleanstrings of the target trajectory. The quantization noise can be captured using theprobability mass functionP{zt = qj|zt = qi} =0.7 if i = j,0.1 if |i− j| = pi4 ,0.05 if |i− j| = pi2 ,0 otherwise(137)For the m-rectangle trajectory in Fig. 15, the grammar model probabilities arecomputed using Bayes’ rule and the assumption of a uniform prior over all gram-mar models. The posterior probability distribution is shown in Fig. 16a. We ob-serve that the line grammar initially has the highest posterior probability becausethe target has not yet made a turn. As soon as the first turn is made, the posteriorprobability of the line vanishes. After the third turn is made, the m-rectangletrajectory becomes the more likely grammar model.The detection performance of SCFG and HMM models is compared in the fol-lowing manner. A continuous-valued target state trajectory s0:T is generated forthree different kinds of shapes viz., lines, arcs and m-rectangles. For example, anarc (like that shown in Fig. 3) can be generated by concatenating the trajectoriesgenerated by 3 different constant velocity (CV) models. The first CV model is71initialized using a normal distribution having a mean with the position at originand a unit velocity in the NE direction such that P{s0} ∼ N([0, 0, 1√2, 1√2]ᵀ,Σ).The co-variance matrix Σ can be chosen to generate trajectories with differentsignal-to-noise ratios (SNRs). This model is then used to generate the upwardpart of the arc for τ = 0, . . . , τ1 time points. The second CV model is initial-ized using the final position of the target in CV model 1 and with velocity mag-nitude vmagτ1 =√(sx˙τ1)2 + (sy˙τ1)2 equal to the final velocity of the target in CVmodel 1. However, the velocity magnitude is concentrated in the x-axis suchthat P{sτ1+1} ∼ N([sxτ1 , syτ1 , vmagτ1 , 0]ᵀ,Σ). Finally, the last part of the arc trajectoryis created by running a 3rd CV model for time points τ = τ2 + 1, . . . , T such thatthe initial state P{sτ2+1} ∼ N([sxτ2 , syτ2 ,vmagτ2√2,−vmagτ2√2]ᵀ,Σ).In all simulations, we generate trajectories of length 1000 time points at thefine time-scale τ. A noisy radar observation process is used and the measure-ments zτ are fed into an extended Kalman filter (detailed in Sec. 5.5.1). The filterestimates are then aggregated every 20 time points and velocity tracklets aˆt aregenerated at the slower time-scale t. For the shape-based trajectories, we create4 types of models GSCFG = {GSCFGline , GSCFGarc , GSCFGm-rectangle, GSCFGarbitrary} and GHMM ={GHMMline , GHMMarc , GHMMm-rectangle, GHMMarbitrary} using both SCFG and HMM frameworks.Models Garbitrary for arbitrary trajectories are also added to each model set. Thetransition matrix of the arbitrary HMM is set up such that the transition proba-bility from tracklet vi to vj is inversely proportional to the angular separation ofthe directions represented by vi and vj. It can represent jumps from any move-ment direction to all the other directions. Such a model is the archetype of arandom walk. A similar model is also used in grammatical form for the arbitrarySCFG model. Equivalent HMMs for lines, arcs and rectangles are created using aleft-to-right transition structure and the transition probabilities are learned fromsynthetically generated example sequences for each shape. The re-estimation pro-cedure for discrete-valued HMMs is used (which is described in Sec. 2.3.4). Asseen from the receiver-operating-characteristic (ROC) curves in Fig. 17, the SCFGmodels have a much better detection capability than corresponding HMM mod-els. This is intuitively due to the fact they form better generative models as theycan capture the inherent hierarchical branching in the pattern generating process.6.7.1.2 Position-Tracklet Based SimulationsConsider the target trajectories shown in Fig. 14b which are of a m-rectangle trajec-tory ending at different destinations. Using the constraints described in Sec. 6.4,a trajectory can be forced to end at a certain point (expected destination). The72rule probabilities of a template m-rectangle grammar are solved for using theseconstraints and three different grammar models are produced (one for each desti-nation). The trajectory in Fig. 15a is the same as the one having destination A inFig. 14b. The posterior probability of the grammar models is computed using thetrajectory with destination A as the input and the result is shown in Fig. 18a. Forthis example, we notice that after the 3rd turn, the correct destination is chosen asthe model with highest posterior probability. This is because for an m-rectangletrajectory, displacement only occurs in one dimension. Since two opposite sidesof an m-rectangle should be of the same size, only one of the remaining sidesneeds to be observed, before the model can predict the destination.Using the same set of 3 destinations depicted in Fig. 14b, a set of Markov bridgemodels can be constructed using the development in Sec. 6.3.3 and Chapter 3.The trajectory with destination A is then used as input to classify the destinationbased on the signal processing of RP models. The results from Sec. 3.3 are used toobtain the posterior probabilities (where each Markov bridge represents one of thedestinations). The posterior probabilities can be seen in Fig. 18b. We immediatelynotice that the RP formulation is unable to distinguish between the models untilnearly the end of the trajectory. However, at a certain point in the trajectory,the posterior probability of the correct destination increases until it reaches itsdestination with probability 1.Incorporation of traffic information: Position-tracklet trajectories depend on theincorporation of the 3-point transitions Q(i, j, l) described in Sec. 6.3.1 and Chap-ter 3. Traffic authorities in various cities collect and store turns ratio and trafficflow information over road networks using traffic cameras and induction loop sen-sors. Traffic flow ω(i, j) is a count of vehicles traveling between two neighboringnodes i and j. The traffic flow is undefined for nodes that are not connected by aroad. The turns ratio κ(i, j, l) at node vj represents the proportion of cars turningtowards node vl when arriving at node vj from node vi. The 3-point transitionsQˆ(i, j, l) can be empirically estimated using these quantities asQˆ(i, j, l) = ω(i, j)ω(j, l)∑j′ ω(i, j′)ω(j′, l)≈ω(i, j)κ(i, j, l)∑j′ ω(i, j′)κ(i, j′, l), (138)where ω(i, j)ω(j, l) denotes the number of vehicles that travel first from node vito node vj and finally onto node vl. However, this quantity cannot be computedwithout specifically identifying each vehicle. Consequently, the approximationin (138) is used. In the absence of traffic data, auxiliary information from cellu-lar localization or vehicular GPS traces can also be used to estimate Q(i, j, l). Insuch cases, the estimation does not require the approximation in (138) as each cellphone/vehicle can be uniquely identified. However, the traces must be quantizedto grid positions in the surveillance space. For the purposes of simulations, a syn-73thetic three-point transition matrix is used such that the 3-pt transitions Q(i, j, l)at each node vj are inversely proportional to the distance between vl and a des-tination node vk. Such a choice biases the target to make transitions to reach thedestination vk in the shortest number of hops. Destination-aware trajectories: Usingthe 3-point transitions described above and the time-varying transitions in (125),a destination-aware trajectory model GSCFGk is created for an arbitrary node vkwhich was chosen as an “interesting” node hypothetically containing a sensitiveasset. We then simulate multiple trajectories of length T with node vk as thedestination. These trajectories are observed using a noisy process such thatP{aˆt = vi|at = vj} =0.8, for vi = vj0.2|Nj|, for vi ∈ Nj, (139)whereNj refers to the set of intersections connected to node vj by a street and |Nj|refers to the numbers of connected nodes. This noise distribution was chosen toincorporate averaging and quantization effects from the tracklet estimator in (122).As before, models for both SCFG and HMM frameworks are created such thatGSCFG = {GSCFGk , GSCFGi1 , . . . , GSCFGiN} and GHMM = {GHMMk , GHMMii , . . . , GHMMiN}.The models GiN are destination-constrained but with final destination at somenode in 6= k. We chose several such nodes within 2-3 hops of the intended desti-nation k. The detection performance can be seen in Fig. 19a.Palindrome trajectories: Examples are also provided for the detection of palin-drome paths over the road network Λ. A palindrome trajectory is artificially sim-ulated by first generating an arbitrary trajectory for bT/2c time-points and thenappending the reversed sequence to create a re-traced path. The same noisy obser-vation process as in (139) is used to obtain tracklet measurements aˆ0:T. We use adestination-aware model with the same start a0 = vk and end aT = vk points repre-sented as Gk,k as well as a more general SCFG model allowing random transitionslike a random walk within the model set GSCFG = {GSCFGpalindrome, GSCFGk,k , GSCFGgeneral}.The equivalent HMM has a fully connected state transition diagram and therule probabilities are learned using Baum-Welch re-estimation [14] from examplepalindrome trajectories generated from its generative SCFG model. The perfor-mance results are shown in Fig. 19b.The target rendezvous event can be considered as an expanded state-space ver-sion of the destination-aware trajectory model. Simulations show similar resultsto the single-target destination-aware case. As a result, simulation results are notshown for such events.746.7.2 Pattern of Life AnalysisSingle-target pattern of life analysis is defined in the context of this chapter as thedeviation in shape or destination of a target trajectory. This scenario is examinedin Sec. 6.7.2.1 and compares an SCFG-only model with a fused SCFG and RPmodel. Multiple-target pattern of life analysis is defined as the change in theaggregate behavior of many targets in a particular area traveling between a pre-established source and destination. This scenario is examined in Sec. 6.7.2.2.6.7.2.1 Single Target Pattern of Life AnalysisIn this section, we compare the performance of a destination-constrained SCFG(as presented in Sec. 6.4 with the fusion of an SCFG shape model and an RPdestination model. SCFGs can be used to effectively determine complex spatialpatterns of a trajectory. However, the destination can only be predicted in anexpected sense. Alternatively, reciprocal processes cannot incorporate shape con-straints on trajectories but are well suited towards predicting the destination of atarget. If the intent of target is modeled as a function of its shape and destination,then the conditional joint probability of shape and destination achieves a form ofinter-model fusion.Suppose that there are K syntactic trajectory patterns of interest (like arcs, rect-angles, closed loops, move-stop-move etc.) and N destinations of interest in anintent inference task. Each trajectory shape of interest is modeled through theuse of a SCFG GkSCFG, k = {1, . . . , K} and each destination of interest is mod-eled by a Markov bridge GnMB, n = {1, . . . , N}. The set of all target intentsI = {Im|Im ∈ Gk × Gn, m = {1, . . . , KN}} under consideration is given by theCartesian product of the SCFG and Markov bridge models. Under the indepen-dence assumption of target shape and target destination, the posterior probabilityof the target intent is given byP{Im|z1:t} = P{GkSCFG|z1:t}P{GnMB|z1:t} (140)where z1:t = {z1, . . . , zt} are understood to be velocity tracklets when dealingwith SCFG models and position tracklets when dealing with Markov bridge mod-els. The individual posterior probabilities of the SCFG model GkSCFG and Markovbridge model GnMB can be computed using the filter equations presented in Sec. 3.3and Sec. 4.6.1 respectively.Using the joint shape and destination classifier of (140), our aim is to detect asignificant departure of a target or targets from a pre-established routine trajectory.This involves a trajectory either deviating in shape between known starting pointand destination point or a target deviating in both shape and destination point.The example demonstrated in Fig. 14 is used for the numerical study.75The flavor of the experiment is as follows: Consider that targets traveling be-tween point A and point B are tracked and its trajectory on the X-Y plane isrecorded. A normal trajectory has a linear shape between point A and point B. Asensitive asset is located between points A and B. A suspicious trajectory, on theother hand, involves any kind of circling or avoidant behavior between points Aand B. For instance, we would like to classify m-rectangle shaped trajectories withpoint B as its destination. In the experiment, we create 100 differently shaped tra-jectories between point A and point B. Of these, 25 trajectories are arc-shaped, 50are m-rectangles and 25 are arbitrarily shaped. The m-rectangle trajectories formpositive examples while the other shapes represent negative examples. We com-pare two classifiers, viz., the destination constrained SCFG and a fusion of the RPmodel (for destination prediction) and the SCFG model (for shape classification).The receiver operating characteristics for these classifiers are shown in Fig. 20a.The fused SCFG and RP classifier outperforms the destination constrained SCFGclassifier. On the equal error rate guide, the classification accuracy is 83% for thefused SCFG and RP model while it is 73% for the constrained SCFG model. Thearea under the curve (AUC) for the fused SCFG and RP model classifier is 0.8879while the AUC for the destination constrained SCFG is 0.8012. We feel that theexpected destination constraint makes the constrained SCFG a weak classifier fordestination prediction. It would be prudent to point out that constraining higherorder moments of the sentence length may lead to better classification accuracy.For example, the variance of the sentence length could place a more stringentconstraint on the destination through creating a narrow distribution around theexpected destination.As demonstrated in Fig. 20a, the expected destination constraint does the notmake the constrained SCFG very amenable towards destination prediction. How-ever, it does make the SCFG more well-behaved in terms of its performance whenonly considering shape classification. In Fig. 20b, an SCFG model is only used forshape classification of the models in Sec. 6.3.6-6.3.8. We see one distinct clusterof similar ROC curves in Fig. 20b. These curves are obtained for SCFG modelsrecovered by using feasible solutions generated from (132). The other curves areobtained by randomly choosing probabilities for the m-rectangle grammar modelthat do not satisfy any destination constraint. The classification accuracy is anaverage of 73% in the former case while it is lowered to 69% in the latter.6.7.2.2 Multiple-Target Pattern of Life AnalysisIn this section, multiple target pattern of life analysis is formulated as a changedetection problem untilizing three-point transition matrices created from trafficinformation. The multi-target change detection problem in Sec. 6.6 is examinedin this section. The normal operating regime consists of a regular 50 node network76as shown in Fig. 4b. The normal regime is simulated by making the three-pointtransitions Q(i, j, l) inversely proportional to the distance between the node vland the destination node vk to bias the target to reach the destination as quicklyas possible. An anomalous regime is then created by removing edges around achosen “interesting” node vk. In addition, the 3-pt transitions at each node vjare chosen proportional to the distance between the node vl and the destinationnode vk. Such a choice is opposite to the normal regime and it biases the target to“avoid” transitions taking it closer to the destination node vk.Various single-target destination-aware trajectories are simulated heading to-wards the destination node vk using the normal regime. After a specific amountof time t0, the trajectory models are switched to the abnormal regime. Using theprocedure outlined in Sec. 6.6, the different hypotheses Ht0 are tested and thechange point detection performance results are shown in Fig. 21.6.7.3 Data Fusion Using Multiple SensorsWhile we have mostly examined radar-based scenarios in the prequel, this sectionexamines visual surveillance using a network of non-overlapping camera sensors.The first scenario is that of surveillance of the exterior of a building while thesecond scenario is that of surveillance inside the corridors of a building. Differentpeople can traverse the same path with different speeds or different cameras mayhave different frame rates. These may lead to trajectories of different scale. How-ever, the recursive self-embedding property of an SCFG imparts scale-invarianceto the intent inference task. We highlight this as a key feature of the approach pre-sented and demonstrate these concepts numerically in the following simulations.6.7.3.1 External Building ScenarioThe first scenario proposed is that of an external building surveillance scenario.We initially simulate the case where the mode estimates are noiseless and thenexamine the more interesting noisy case. Consider the surveillance of a security-sensitive building (like an embassy) depicted in Fig. 22(a). An alternative place-ment of the cameras for the same scenario is shown in Fig. 22(b). The surveillanceis carried out using a network of cameras on the exterior walls of the building.A typical CCTV camera has an angle of view of 700 which heavily restricts theobservable field of view. In this simple scenario, we assume that the cameras donot have zoom, pan or tilt capabilities. With four cameras, each covering one sideof the building, wide unobservable areas called blind spots exist where targetscannot be reliably tracked. The scenario shown in Fig. 22 depicts some differenttrajectories that a target can exhibit. The usefulness of the scale invariance or self-77embedding property of the SCFG model is apparent in the fact that the target canmaneuver at different distances away from the building thus resulting in variousscaled versions of the shapes being considered. The SCFG model can account forthis scaling implicitly.The problem of observation gaps is tackled by assuming that the cameras aresynchronized and the maneuvers of the target in unobservable areas is repre-sented by a special terminal symbol x which has a uniform distribution over allthe possible emitting non-terminals. This implies that the grammar model foreach shape is augmented by a new terminal symbol represent a “unobserved” or“missed” detection. Such a procedure can also be used in non-ideal detection byincorporating a probability of detection over the terminal symbols as is commonin radar tracking.Noiseless target mode estimates: We first consider noiseless tracking in thepresence of large observation gaps and non-overlapping views of the cameras.The target is maneuvering around the building in a square trajectory and largeparts of the trajectory are occluded from each sensors view. The actual intendedtrajectory is codified by the directional tracklet sequence {aabbccdd}. However,each of the sensors only observe a linear portion of the trajectory. For example,sensor 1 only observes the sequence {aaxxxxxx} while sensor 3 observes thesequence {xxxxccxx}. The posterior probabilities computed at each of the sensorsis shown in Fig. 23(a)-(d). Finally, the data fusion approach of a linear pool in(133) and a logarithmic pool in (134) as outlined in Sec. 6.5 is used to obtain acombined posterior distribution over each model in Fig. 23(e)-(f). The pooling iscarried out assuming that each sensor is equally reliable and hence equal weightsare assigned in the consensus rule. However, in scenarios where certain sensorsare known to be more reliable or when certain sensors have larger fields of view,the weights can be appropriately chosen to reflect this reliability. Due to thelarger computational times required for the parsing, we choose to work withsmall strings that aptly provide proof-of-concept.We observe in Fig. 23(a)-(d) that the individual sensors cannot discern the intentof the trajectory because of the large observation gaps. Each individual sensoronly sees a line and hence each individual sensor classifies the trajectory as aline in a specified direction. A peculiarity is observed for sensor 4 which has ahigher posterior probability for the m-rectangle as compared to a line. This is dueto the fact that a smaller m-rectangle turns out to have a higher likelihood thana long line. Nonetheless, in Fig. 23(e)-(f), we immediately notice that the initialclassification is that of a line, however, as soon as the target appears to turn, theprobability of the trajectory being a line decreases and the other trajectories takeprecedence. At the final instance, the m-rectangle receives the largest posteriorprobability which results in a correct classification of the target trajectory. Both78the linear and logarithmically pooled probabilities appear similar and there is nosignificant difference worth commenting upon.Noisy target mode estimates: Next, a noisy version of the same problem issimulated for the square trajectory by considering noisy estimates of the tracklets.For example, sensor 1 observes the string {ahxxxxxx} and sensor 3 observes thestring {xxxx f cxx} while the intended actual trajectory is {aabbccdd}. The sec-ond symbol in the sensor 1 measurement is perturbed from a to h (this impliesa perturbation of pi4 radians). A similar perturbation occurs in the other sensors.The posterior probabilities at each of the sensors is not useful for inferring the tra-jectory due to the large observation gaps. However, from the fused probabilitiesin Fig. 24(a)-(b), it can be observed that while the linearly pooled probabilitiesindicate the m-rectangle as having the largest posterior probability, the rectan-gular arc also receives a high score. Moreover, these posterior probabilities arebelow 0.5 which does not imply a high confidence of discernibility. On the otherhand, the logarithmically pooled probabilities fare better by clearly indicating them-rectangle as the model with the largest posterior probability. Due to the noisyestimates of the tracklets, we also notice that the trapezoidal arc also has a signif-icant posterior probability in contrast with the noiseless case.6.7.3.2 Internal Hallway ScenarioThe second scenario that we consider is shown in Fig. 25(a). This scenario em-ulates that of surveillance in the interior of a building within its hallways. Anexample application could be monitoring of school or office hallways for suspi-cious behavior. Because of the narrow hallways, the field of view of camerasused indoors have to be larger. The usefulness of the scale-invariance property ofSCFGs becomes clear when we consider that a target can traverse shapes of dif-ferent sizes if different corridors are used. A larger square trajectory is shown inFig. 25(b). Alternatively, different people could traverse the hallways at differentspeeds thus leading to the self-embedding of the directional tracklets. This occursbecause the tracklets are simply unit directional vectors without any notion ofmagnitude. In both cases, the underlying SCFG modeling is able to recognize thetrajectory without ad-hoc modifications to account for scale.Noiseless target mode estimates: The noiseless case of an m-rectangle trajec-tory is considered in Fig. 26. The intended trajectory is given by the trackletsequence {aabbbccddd}. This scenario is different from the earlier external build-ing scenario because of the nature of the camera placement in indoor scenarios.Some of the cameras do not merely measure linear sections of the trajectory. Forexample, the string observed by sensor 1 is {aabxxxxxxd} which includes turn in-formation as well as the tracklet at the end of the trajectory. Intuitively, we would79expect a better intent inference which is apparent in the posterior probabilities inFig. 26.Noisy target mode estimates: Finally, a noisy version of the rectangular tra-jectory is simulated and the posterior probabilities are shown in Fig. 27. Theintended trajectory is the same, viz., {aabbbccddd}. However, as before, randomperturbations occur due to the noisy estimates provided by the tracklet estimator.For example, the tracklets observed at sensor 3 is {xxxxxggcxx}, where the sixthand seventh symbols have been perturbed from c’s to g’s (which corresponds toan error of pi4 radians). The resulting fused posterior probabilities are shown inFig. 27. The fused probabilities in Fig. 27(a)-(b) are able to incorporate informa-tion from all the sensors and correctly classify the trajectory as an m-rectangleeven in the presence of noise and large observation gaps.6.8 conclusionIn this chapter, a hierarchical tracking framework is proposed to assist humanradar operators for the detection and forensic analysis of anomalous trajectorypatterns. Geometric primitives like movement patterns and quantized positionsare used as syntactic sub-units within a stochastic context-free grammar frame-work to perform anomaly detection. The expressive power of SCFG models is ableto capture long-term dependencies in intent-driven trajectories while utilizing ef-ficient polynomial time algorithms for their inference. The Earley-Stolcke parseris modified using a scaling trick to allow processing long tracklet sequences. Anovel interpretation of reciprocal processes using time-varying SCFG rules is alsopresented. We devised novel models for anomalous events like target rendezvousand palindrome paths. Furthermore, the pattern of like analysis problem was for-mulated as a change detection problem using multiple target trajectories. Finally,a comprehensive numerical evaluation of our proposed models was carried out toshow an increase in the detection performance over conventional Markov models.801 1(x , y )T TSecurityCheck-PointAvoidantTrajectory (arc)StraightTrajectory(line)(x , y ) LineType 1 ArcType 2 Arcm-RectangleAvoidantTrajectory (arc)CirclingTrajectory (m-rectangle)(a)1 1(x , y )TT(x , y )Destination ADestination BDestination C(x’ , y’ ) T T3.5 km6 km10 km(x’’ , y’’ )TT(b)Figure 14: (a) Depiction of different anomalous trajectory shapes and associated intent.(b) Targets traveling to different destination using the same trajectory shape.81−4 −2 0 2 4 6051015Base−level tracker input and outputx coordinate of target statey coordinate of target state Original TrajectoryNoisy MeasurementsEstimated Trajectory(a)0 20 40 60 80 100 12005101520Base−level tracker input and outputx coordinate of target statey coordinate of target state Original TrajectoryNoisy MeasurementsEstimated Trajectory(b)Figure 15: (a) An m-rectangle trajectory, its noisy measurements and estimated outputfrom the base level tracker. (b) An arc trajectory, its noisy measurements andestimated output from the base level tracker.aaaahabbbbbbbbbbbbbbbf fcccccccgdddddddddddddd00.20.40.60.81Posterior probability of each grammar model P{Gk|z0 … zt}SequenceP{G k|z 0 … z t} ArcLinem−Rectangle(a)aaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbccccccccc00.20.40.60.81Posterior probability of each grammar model P{Gk|z0 … zt}SequenceP{G k|z 0 … z t} ArcLinem−Rectangle(b)Figure 16: The posterior probability of different shapes when tracking a target followinga (a) m-rectangle trajectory and an (b) arc trajectory.820 0.2 0.4 0.6 0.8 100.20.40.60.81True PositiveFalse Positive SCFG − AUC = 0.73833HMM − AUC = 0.66498Equal Error Rate GuideBaseline(a)0 0.2 0.4 0.6 0.8 100.20.40.60.81True PositiveFalse Positive SCFG − AUC = 0.93157HMM − AUC = 0.84163Equal Error Rate GuideBaseline(b)0 0.2 0.4 0.6 0.8 100.20.40.60.81Arc Recognition RateThreshold SCFGHMM(c)0 0.2 0.4 0.6 0.8 100.20.40.60.81Rectangle Recognition RateThreshold SCFGHMM(d)Figure 17: (a) The ROC curve for SCFG arc-like trajectories with area under the curve(AUC) = 0.738 shows better performance than HMMs that have an AUC of0.664. The SCFG models also have a better absolute recognition rate for higherthresholds as seen in (c). In (b), the ROC curve for SCFG m-rectangle trajecto-ries with AUC = 0.928 shows much better performance than HMMs with anAUC of 0.841. The SCFG rectangle models also have better absolute recognitionrates as shown in (d).83b a a b b b b b b b b c c c c g d d d d d d d00.10.20.30.40.50.60.70.80.9Posterior probability of each grammar model P{Gk|z0 … zt}SequenceP{G k|z 0 … z t} Destination CDestination BDestination A(a)0 20 40 60 80 100 12000.20.40.60.81Posterior prob. of each Markov bridge model P{θk|Z0 … Zt}TimeP{θ k|Z 0 … Z t} True DestinationDestination BDestination C(b)Figure 18: (a) The posterior probability of different destinations using an m-rectangle tra-jectory as depicted in Fig. 14b. (b) Markov bridge model classification fordestination prediction.0 0.2 0.4 0.6 0.8 100.20.40.60.81True PositiveFalse Positive SCFG − AUC = 0.73495HMM − AUC = 0.50006Equal Error Rate GuideBaseline(a)0 0.2 0.4 0.6 0.8 100.20.40.60.81True PositiveFalse Positive SCFG − AUC = 0.91497HMM − AUC = 0.8232Equal Error Rate GuideBaseline(b)Figure 19: (a) The ROC curve for SCFG destination-aware trajectory shows a marked im-provement over HMMs. The area under the curve for SCFG models is 0.734while the AUC for HMMs is equivalent to a random classifier with AUC =0.50. In (b), the ROC curve for SCFG palindrome trajectories with AUC =0.91 also shows a marked improvement over an HMM learned from examplepalindromes generated by an SCFG.840 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100.20.40.60.81False Positive RateTrue Positive Rate Fused SCFG & Markov BridgeSCFG destination−constrainedRandom ClassifierEqual Error Rate Guide(a)0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100.20.40.60.81False Positive RateTrue Positive Rate Feasible set probs.Feasible set probs.Feasible set probs.Random probs.Random probs.Equal Error Rate Guide(b)Figure 20: (a) The receiver operating characteristics of the fused SCFG and RP modelclassifier versus a destination constrained SCFG. (b) The ROC curve for onlySCFG based shape classification with and without destination constraints.850 0.2 0.4 0.6 0.8 100.10.20.30.40.50.60.70.80.91True PositiveFalse Positive t0 = 0, AUC = 0.90944t0 = 7, AUC = 0.81038t0 = 15, AUC = 0.76573t0 = 20, AUC = 0.7876t0 = 24, AUC = 0.92433EERRandom(a)0 0.2 0.4 0.6 0.8 100.10.20.30.40.50.60.70.80.91True PositiveFalse Positive t0 = 0, AUC = 0.5657t0 = 7, AUC = 0.52134t0 = 15, AUC = 0.54647t0 = 20, AUC = 0.53376t0 = 24, AUC = 0.41514EERRandom(b)0 0.2 0.4 0.6 0.8 100.20.40.60.81Change Detection RateThreshold t0 = 0t0 = 7t0 = 15t0 = 20t0 = 24(c)0 0.2 0.4 0.6 0.8 100.20.40.60.81Change Detection RateThreshold t0 = 0t0 = 7t0 = 15t0 = 20t0 = 24(d)Figure 21: (a) shows the SCFG model ROC curves for various switching times in trajecto-ries of length 25. In comparison, in (b), we observe that HMMs cannot capturethe target dynamics and hence perform poorly for pattern of life analysis. Thisresult is in agreement with the result shown in Fig. 19a as HMMs cannot rep-resent destination-aware trajectories. The absolute recognition rates for bothSCFG and HMMs are shown in (c) and (d) respectively. The best performanceof the SCFG change point detection algorithm is when all target trajectories areeither entirely simulated under the normal regime or the abnormal regime.8670 7070EMBASSY70BLIND-SPOTFIELD OF VIEWSquare TrajectoryArc TrajectoryMissingTracks Line Trajectory(a)EMBASSY70BLIND-SPOTFIELD OF VIEWSquare TrajectoryArc TrajectoryMissingTracks Line Trajectory70 7070(b)Figure 22: An external building surveillance example. The camera placement is at thecenter of the building walls in (a) while the cameras are placed at the cornersin (b).87a a x x x x x x00.10.20.30.40.50.60.70.80.91Observed symbol sequencePosterior probability of each modelPosterior probability of different trajectories given the observed string Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(a)x x b b x x x x00.10.20.30.40.50.60.70.80.91Observed symbol sequencePosterior probability of each modelPosterior probability of different trajectories given the observed string Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(b)x x x x c c x x00.10.20.30.40.50.60.70.80.9Observed symbol sequencePosterior probability of each modelPosterior probability of different trajectories given the observed string Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(c)x x x x x x d d00.10.20.30.40.50.60.70.80.9Observed symbol sequencePosterior probability of each modelPosterior probability of different trajectories given the observed string Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(d)a a b b c c d d00.050.10.150.20.250.30.350.40.450.5Actual Intended TrajectoryPosterior probability using a linear poolPosterior probability of different trajectories after sensor fusion Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(e)a a b b c c d d00.10.20.30.40.50.60.70.80.9Actual Intended TrajectoryPosterior probability using a logarithmic poolPosterior probability of different trajectories after sensor fusion Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(f)Figure 23: Online posterior probabilities calculated for a noiseless trajectory in Fig.22a.The posterior probabilities at each of the four sensors are shown in (a), (b), (c)and (d). The linearly pooled probabilities are shown in (e) and the logarithmi-cally pooled probabilities are shown in (f).88a a b b c c d d00.050.10.150.20.250.30.350.40.450.5Actual Intended TrajectoryPosterior probability using a linear poolPosterior probability of different trajectories after sensor fusion Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(a)a a b b c c d d00.10.20.30.40.50.60.7Actual Intended TrajectoryPosterior probability using a logarithmic poolPosterior probability of different trajectories after sensor fusion Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(b)Figure 24: Online posterior probabilities calculated for a noisy trajectory in Fig.22a. Thelinearly pooled probabilities are shown in (a) and the logarithmically pooledprobabilities are shown in (b).89BANK VAULTPERSONAL LOCKERSSECURITY ROOMELEVATOR SHAFTOFFICESDRY WALLDRY WALLELEVATOR SHAFTCeiling-mounted Camera Target tracklets Un-observable trackletsField-of-view125(a)BANK VAULTPERSONAL LOCKERSSECURITY ROOMELEVATOR SHAFTOFFICESDRY WALLDRY WALLELEVATOR SHAFTCeiling-mounted Camera Target tracklets Un-observable trackletsField-of-viewBlind-spot125(b)Figure 25: An internal hallway surveillance example. A rectangular trajectory is shownin (a) and a square trajectory is shown in (b) which are both cases of the m-rectangle language. This example shows the scale invariance of the model dueto different sized m-rectangles.90a a b b b c c d d d00.10.20.30.40.50.60.70.8Combined Symbol StringPosterior probability using a linear poolPosterior probability of different trajectories after sensor fusion Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(a)a a b b b c c d d d00.10.20.30.40.50.60.70.80.91Combined Symbol StringPosterior probability using a logarithmic poolPosterior probability of different trajectories after sensor fusion Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(b)Figure 26: The noiseless rectangular trajectory in Fig25a. The linearly pooled probabilitiesare shown in (a) and the logarithmically pooled probabilities are shown in (b).a a b b b c c d d d00.10.20.30.40.50.60.70.8Intended Actual StringPosterior probability using a linear poolPosterior probability of di!erent trajectories after sensor fusion Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(a)a a b b b c c d d d00.10.20.30.40.50.60.70.80.9Intended Actual StringPosterior probability using a logarithmic poolPosterior probability of di!erent trajectories after sensor fusion Line (Up)Line (Right)Line (Down)Line (Left)Arc (Trapezoidal)Up Arc (Rectangular)m−Rectangle(b)Figure 27: The noisy rectangular trajectory in Fig25a. The linearly pooled probabilitiesare shown in (a) and the logarithmically pooled probabilities are shown in (b).917 T R A J E C T O RY C O N S T R A I N E D T R A C K - B E F O R E D E T E C T7.1 introductionConventional target tracking algorithms employ a detect-then-track approach thatperforms filtering based on target detection. A target is detected by applying ahard threshold on the sensor measurement utilizing a suitable metric, for exam-ple, a constant false alarm ratio. Such a method of first detecting targets andsubsequently forming tracks is more efficient in terms of computational complex-ity. However, in dimly lit (low signal-to-noise ratio) conditions, the backgroundclutter is often at a comparable strength to the target returns and hard threshold-ing leads to overwhelming spurious detections. A track-before-detect (TBD) ap-proach uses multiple frames of the raw sensor measurements with the objectiveof avoiding a hard thresholding decision. Consequently, TBD algorithms jointlyestimate the existence of the target (detection) as well as track its kinematic state(filtering).In this chapter, we enhance the TBD approach by exploiting target trajectorypatterns. Conventionally, maneuvering targets are modeled using jump Markovstate space models[24, 25]. However, modeling maneuvers via Markov chainsdoes not facilitate modeling complex spatial trajectories like u-turns, closed trajec-tories and circling patterns as shown in Fig. 28b. The main idea of this chapter isto model maneuvering targets as a stochastic context-free grammar (SCFG) mod-ulated state space model. The proposed track-before-detect approach exploitshigher level information regarding the intent and/or trajectory pattern of the tar-get. We show that the resulting syntactic TBD algorithms can successfully detectand track targets at significantly lower SNR than conventional TBD algorithms.7.1.1 Why Use Stochastic Context-Free Grammars for Trajectory Modeling?An SCFG is defined formally in Sec. 4.4 and is presented in analogy to a hiddenMarkov model (HMM) to aid signal processing readers who are unfamiliar withthis formalism. SCFGs have been studied extensively in natural language process-ing and are a generalization of Markov chains (as shown in Fig. 28a). The ex-pressive power of SCFGs enables scale-invariant modeling of complex spatial tra-jectories with variable-order long-range dependencies that naturally arise whenhumans (or human operated objects) move in an environment[40]. Such patterns92REGULARCONTEXT-FREECONTEXT-SENSITIVEUNRESTRICTEDSCFGHMM(a)ARCCLOSED LOOPRECTANGLE(b)Figure 28: (a) shows the Chomsky hierarchy of grammars. Polynomial time algorithmsfor inference are only known for the class of context-free and regular gram-mars. SCFGs belong to the class of context-free grammars which are more gen-eral and expressive than regular grammars (that contain HMMs). (b) showsexamples of target trajectories that are scale-invariant and display recursiveembedding. This is a conceptual sketch of a road network on which targettrajectories are evolving. An urban landscape constrains targets to travel pre-dominantly along certain directions. The rectangular trajectories shown havethe same destination but are of different sizes. They can be captured by thesame stochastic context free grammar model without a corresponding increasein computational complexity even though they exhibit memory of differentorders.cannot be generated by Markov chains (this is proved in computer science using"pumping lemmas" [36]). In Fig. 28a, we show the Chomsky hierarchy of gram-matical models which depicts that SCFGs are a more expressive generalization ofMarkov models.Inference using SCFGs can also be performed in polynomial time using ef-ficient algorithms like the inside-outside algorithm [23] and the Earley-Stolckeparser[22]. This makes SCFGs practically relevant unlike more general context-sensitive grammars (see Fig. 28a) where inference is known to be NP-complete[41].Additionally, SCFGs have a compact formal representation in terms of productionrules that allow human intuition to be easily codified into high-level rules. This,in turn, permits the design of high-level Bayesian signal processing algorithms todetect trajectories of interest. The ability for the designer to encode domain exper-tise into a knowledge base is important because the lack of sufficient field data isa limiting factor in training anomaly recognition systems. From an information-theoretic perspective, it is shown in [23] that the predictive power of SCFGs, asmeasured by its predictive entropy, is greater than that of an analogous hidden93(a)−5 0 5 10 15024681012X−positionY−position TrueEstimated(b)0 5 10 15 20 2500.20.40.60.81TimeProbability Pdetectionthreshold(c)Figure 29: (a) shows the perimeter surveillance proof-of-concept application. A dis-mounted target walks around the compound wall in a rectangular trajectory.In (b), a TBD particle filter algorithm with a standard constant velocity modeltogether with a constant turn model is used in a 2dB SNR scenario. The fil-ter diverges midway through the evolution of the target trajectory. In (c), theprobability of detection is shown with the selected threshold. We observe thatalmost half the time, the target is not even detected reliably. In Sec. 7.5, wedemonstrate the performance increase obtained by using SCFG trajectory mod-els as shown in Fig. 34c.Markov model with the same number of free parameters. These characteristicsmake SCFGs an ideal trajectory modeling tool.944pi3 2pi 4pi 04pi−2pi−4pi3−North-EastNorthSouth South-EastWestSouth-WestNorth-WestEast(a)RADAR SENSOR MEASUREMENTTARGETTRAJECTORY SYNTACTIC TBDSCFG TRAJECTORYMODELSPARTICLE FILTEREARLEY-STOLCKE FILTERMETA-LEVEL KNOWLEDGEFROM HUMAN OPERATOR(b)Figure 30: (a)The modes of operation of a target which are represented by directionalmotion models in the 8 quantized radial directions. We refer to ’North-East’,’South’ etc. as cardinal directions in this chapter.(b)The syntactic TBD sys-tem architecture. An image-based sensor measurement from a radar or FLIR(forward-looking infra-red) can form the input to our system. A knowledgebase of radar operator intuition can be used to build SCFG models which helpthe particle filter to operate in low SNR conditions when certain trajectories areexhibited by a target.7.1.2 Why Consider Trajectory-Constrained TBD?Track-before-detect plays an important role for situational awareness in low SNRconditions. The main idea of our approach is for the human operator to utilizemeta-level information about suspicious trajectories to make the TBD algorithmmore sensitive in dimly lit environments. Increased sensitivity, in this scenario, isdefined as an increase in the tracking and detection performance at low signal-to-noise ratios when the target exhibits a suspicious trajectory. Consider the scenariodepicted in Fig. 29a that can occur in dismount-moving target indicator (DMTI)radar applications for highly mobile targets. It depicts a hypothetical situation inwhich a human guard (or vehicle) is patrolling the perimeter of a compound ina circling pattern. Humans are capable of turning on the spot and instantaneouslyreversing their motion which represents a significant deviation from a constantvelocity motion model. Such trajectories are also not well modeled by constantturn models. A standard track-before-detect particle filter with two modes ofoperation (a constant velocity and a constant turn mode) diverges and is unable95to reliably detect and track a simulated rectangular trajectory at an SNR of 2dBas shown in Fig. 29b. However, an SCFG-switching multiple model approachcan accurately detect and track the target as shown in Sec. 7.5. As the targetmoves in the surveillance environment, it generates a sequence of modes q1, . . . , qkwhich constitute a trajectory λ ∈ L. The set L denotes a class of scale-invariant,rotation-invariant trajectories with recursive embedding that can model shapessuch as lines, arcs, m-rectangles and closed-loops as shown in Fig. 28b. In thischapter, we will choose L to be a set of trajectories that can be generated bya stochastic context-free grammar. In the sequel, we show that modeling thetrajectory followed by the target using directional modes allows our syntacticTBD particle filter to operate better than conventional TBD algorithms (withouttrajectory constraints) in lower SNR conditions. Even a small SNR differenceof 3dB can amount to a target being detected with high fidelity (Pd = 0.99) at∼ 16dB) or it being marginally detected (Pd = 0.5 at 13dB) [42].We approach the multi-frame TBD problem as a tracking problem involvinga high-dimensional sensor measurement that is also a highly non-linear func-tion of the state contaminated with non-Gaussian noise. As depicted in Fig. 30b,our syntactic TBD approach utilizes a hybrid particle filter to propagate a mixedcontinuous-discrete state given the entire image sequence of radar measurementswithout performing any hard thresholding at each measurement instant. The out-put of the TBD algorithm is a posterior filtering density from which the target canbe simultaneously detected and tracked. In addition, the mode estimates can beused as a trajectory visualization tool. The mode estimates can also be used ina feedback loop to aid a higher level decision-layer in situational awareness typeapplications.7.1.3 Literature SurveyA brief survey of the literature on the two major components of this chaptera) non-linear filtering and b) trajectory pattern recognition are presented in thissection.On Non-linear Filtering in TBDThe main difficulty in the TBD problem is the highly non-linear relationship be-tween the sensor measurement image and the target state. A rich history of non-linear filtering exists in the TBD literature starting from the use of an extendedKalman filter in [43] and point mass (HMM) filter approximations in [44, 45].An efficient alternative to state-space discretization is to use particle filters tosolve the non-linear estimation problem, see [46, 47]. The histogram probabilistic96multi-hypothesis tracking (H-PMHT) algorithm [48] is an efficient multi-target al-ternative to TBD as it does not threshold the sensor observations and also doesnot use likelihood ratios. A random finite set approach is taken in [49] for mul-tiple targets which uses the probability hypothesis density of a multi-Bernoullirandom finite set. This approach has been shown to be equivalent to a particlefilter TBD approach in [50]. Finally, the recent work [51] addresses the compu-tational complexity of TBD algorithms by using two detection thresholds to firstproduce a small set of detections and then exploiting space-time correlations. Atextbook treatment of hybrid (mixed continuous and discrete state variables) stateestimation techniques can be found in [24].On Trajectory Pattern RecognitionThe modeling of complex spatial trajectories considered stems from research inthe syntactic pattern recognition community [37]. Trajectory modeling has beenwidely studied in the action recognition community. The study in [52] uses atwo-tier approach towards goal recognition in a wireless LAN scenario. A hid-den Markov model (HMM) is used as a feature-detector in the lower tier while ahigher-order HMM is used to enforce syntactic structure in the upper tier. How-ever, a standard HMM suffers from an exponential self-transition probabilitywhen the same state is visited for a long duration. As a result, repeatedly vis-iting the same state exponentially decreases the model likelihood. Consequently,a non-stationary hidden semi-Markov model is proposed in [53] to account forself-transitions. SCFGs can also effectively model such self-transitions using itsself-embedding property (explained in Sec. 4.4).The grammar modeling approach advocated is also related to the approachtaken in [54] where attributes are associated with a stochastic context-free gram-mar to enforce constraints on the applicability of the production rules. The do-main of interest is the detection of certain anomalous activities like carjackingin parking lots. In this chapter, constraints are also enforced albeit on system-theoretic quantities resulting in a convenient initialization for the rule probabili-ties of an SCFG.To the best of our knowledge, constraints on the trajectory patterns have notbeen examined in the context of track-before-detect algorithms in the literature. Incontrast to our past work [55], this chapter considers a highly non-linear measure-ment function for which linear approximations like the extended Kalman filterand extensions like the variable-state interacting multiple model (VS-IMM) usedin [55] cannot be suitably applied. Moreover, the particle filtering solution appliedin this chapter is different from the approximations used in [55]. The more recentwork [2] uses similar trajectory models but is significantly different because itcompletely bypasses the tracker and builds a meta-level inference layer on top97of the base-level tracker. This approach was taken to ensure legacy-compatibilitywith older tracking systems yet providing the prediction and detection capabili-ties of SCFGs to pick out anomalous trajectories.The chapter is organized as follows. The track-before-detect problem is for-mulated as an SCFG-driven multiple-model tracking problem. Details about theswitching state space model used and the sensor characteristics are provided inSec. 7.2. In Sec. 7.3, a brief review of SCFGs is provided together with grammarmodels for common trajectories which can be used as building blocks of morecomplex trajectories. In Sec. 7.4, a particle filtering solution for the SCFG-drivenTBD problem is presented together with a Rao-Blackwellised version. This solu-tion is then used to present numerical experiments in comparison with a Markov-modulated multiple-model TBD particle filter in Sec. 7.5. Finally, we presentconcluding remarks in Sec. 6.8.7.2 scfg-driven multiple model track-before-detectIn this section, the track-before-detect problem is formulated as a constrainedsearch within trajectories satisfying syntactic patterns. The trajectory dynamicsare modeled using an SCFG-mode sequence, the target state dynamics follow aswitching constant velocity model with directional process noise and the sensorcharacteristics are described assuming a line-of-sight radar sensor measurement.7.2.1 Trajectory-Constrained ModelConsider a target moving in the x-y plane according to a discrete-time dynamicmodel of the formxk+1 = Fxk + Gvk(qk), (141)where k is the discrete-time index, qk ∈ Q is a mode, vk is the state process noiseand xk = [xk, x˙k, yk, y˙k]T is the state vector comprising of the position and velocitycomponents in the x and y coordinate axes. The transition matrix F and noisegain G are respectively,F =1 0 δT 00 1 0 δT0 0 1 00 0 0 1, G =δT22 00 δT22δT 00 δT,98where δT is sampling interval. The mode-dependent state process noise vk is awhite Gaussian process with co-variance matrixQ = ρqk ·[σ2a 00 σ2o]· ρTqk ,with ρqk =[sin qk cos qk− cos qk sin qk],where the superscript T denotes the transpose operation, σ2a is the uncertaintyalong the direction indicated by qk and σ2o is the uncertainty along the orthogonaldirection to qk. The modes qk serve to modulate the state process noise vk(qk) andcause it to switch between different variance values.The observations zk from the radar sensor provide data over a discretized two-dimensional domain consisting of an M× N grid of resolution bins of side length∆. In particular, the measurement zk = {zm,nk : m = 1, . . . , M, n = 1, . . . , N} is animage of measured intensities given byzm,nk ={hm,n(xk) + wm,nk ek = 1wm,nk ek = 0,where hm,n(xk) is the target spread function that represents the contribution ofthe target intensity to the (m, n)th bin. The target existence ek is a binary variablerepresenting the absence (ek = 0) or the presence (ek = 1) of a target. The targetspread function hm,n(·) is modeled by a Gaussian spreadhm,n(xk) =∆2 I2piσ2hexp[−(m∆− xk)2 − (n∆− yk)22σ2h](142)where I is the constant amplitude of the target and σ2h is the variance of the Gaus-sian spread function. The measurement noise wm,nk is assumed to be dominatedby Rayleigh distributed clutter. When no target is present, the likelihood of theobservation zm,nk at bin (m, n), follows a Rayleigh distributionp(zm,nk |xk, ek = 0) =2zm,nkPexp−(zm,nk )2P,99where P is the average Rayleigh clutter power determined by calculating the meanpower across the measurement frame. When a target is present, the likelihood ofthe observation follows a Ricean distributionp(zm,nk |xk, ek = 1) =2zm,nkP× exp(hm,n(xk)2 − (zm,nk )2P)I0(2zm,nk hm,n(xk)P),where I0(·) is the modified Bessel function of zero order. The filtering solutionin the sequel requires us to introduce the measurement likelihood ratio in a bin(m, n) asl(zm,nk |xk) = exp(−hm,n(xk)22P)I0(zm,nk hm,n(xk)P). (143)The target existence ek is modeled as a two state Markov chain with transitionmatrixΠe =[1− Pbirth PbirthPdeath 1− Pdeath](144)When the mode sequence is considered to arise from a Markov chain, the transi-tion matrix of the modes is given byΠq =[piq(k, l)]= e−(pi−||k−l|−pi|)2∑m e−(pi−||k−m|−pi|)2 , k, l, m ∈ Q. (145)This transition probability function assigns maximum probability to the samemode such that k = l and assigns an exponentially decaying probability to neigh-boring modes. We use Πq = P{qk|qk−1} to parametrize the Markov chain modelgenerating trajectory sequences q1:k (as an alternative to SCFG models).7.2.2 Estimation ObjectiveWe seek to obtain filtered state, mode and target existence estimates E{xk, qk, ek|z1:k}given the measurement sequence of sensor images z1:k. For each instant k, thepresence (or absence) of the target is indicated by the random variable ek whichis modeled as a two-state Markov chain. The target is assumed to follow a tra-jectory which switches between constant velocity models in certain accelerationdirections qk ∈ Q = {0, pi4 , pi2 , 3pi4 , pi, 5pi4 , 3pi2 , 7pi4 }. These acceleration directions are100shown in Fig. 30a. A sequence of target modes (or maneuvers) is defined as a tra-jectory q1:k which is modeled using SCFGs. The precise formulation of the SCFGtrajectory models is presented in Sec. 7.3. As shown in Fig. 30b, a particle filteris combined together with the Earley-Stolcke parser to perform multi-model stateestimation for a jump SCFG non-linear state space model.7.3 modeling and inference using scfgsSuppose we are interested in tracking dimly lit targets following a specific trajec-tory pattern such as a circling behavior shown in Fig. 29a. How can this infor-mation be encoded into a tractable model and be used with a TBD algorithm?In this section, we provide further insight into SCFGs by contrasting them withhidden Markov models. Finally, we describe how inference using SCFG modelscan be used within the trajectory-constrained framework. The development andnotation in this section follows our previous work in [2]. The formal definition ofan SCFG can be reviewed in Chapter 4 and the trajectory models used are definedin Sec. 6.3.7.3.1 SCFG Models for Anomalous TrajectoriesIn this section, we model various trajectories of interest using stochastic context-free grammar models. For all the SCFG models considered, each type of trajectoryshape or pattern has an associated grammar model L with a common set ofterminals V = Q that represent the possible modes of operation. They mayhave different rule spaces R and/or non-terminal spaces N . While modelingtrajectory shapes using grammar models, we will focus on the structure of theproduction rules. The rule probabilities are chosen so that certain system-theoreticconditions in Sec. 6.3 are satisfied. The models described below have previouslybeen considered in Chapter 6. We include only the grammatical descriptions toprovide a unified description.Linear TrajectoriesWe denote straight paths as linear trajectories that are generated by target dynam-ics obeying local Markov dependency. Linear grammar models are representedusing the compact form Lline = {~an} implying that the model can generate alltrajectories involving n movements of a target in the direction represented by theunit vector ~a. A simple regular grammar for lines is characterized by rules of theform S→~aS |~a with~a ∈ Q representing the target’s direction of motion.101Arcr1 : S→ AXC [p1]r2 : X → AXC [p2]r3 : X → B [p3]r4 : A→~a [p4]r5 : B→~bB [p5]r6 : B→~b [p6]r7 : C → ~c [p7](a)m-Rectangler1 : S→ AXCD [p1]r2 : X → AXC [p2]r3 : X → B [p3]r4 : A→~a [p4]r5 : B→~bB [p5]r6 : B→~b [p6]r7 : C → ~c [p7]r8 : D → ~dD [p8]r9 : D → ~d [p9](b)Figure 31: An arc grammar in (a) and an m-rectangle grammar in (b). Only the pro-duction rules are shown. Capital case refers to abstract non-terminals andlower-case refers to terminal modes. A rule re-writing is analogous to a statetransition in Markov models. A non-terminal N currently being considered forexpansion is replaced by the string on the right hand side of the rule rm chosen(by sampling the conditional distribution induced by all alternative rules) forexpansion. Each rule in the conditional distribution is associated with a ruleprobability pm.Arc-like TrajectoriesThe sentential form for arc-like trajectories is Larc = {~an~b+~cn} which is character-ized an equal number of movements in opposing directions represented by theunit vectors~a and ~c. The notation~b+ denotes an arbitrary number of movementsin the direction represented by ~b. A simple grammar capable of generating arcsof all lengths is shown in Fig. 31(a). We use the notion of arcs to represent u-turnand open trapezoidal patterns.Rectangle TrajectoriesRectangular trajectories are of the sentential form Lrectangle = {~an~bm~cn~dm}, wherethe target moves an equal number of times in opposing directions ~a,~c and ~b, ~d.102However, it can be shown using a pumping lemma that a complete rectangle can-not be modeled by a context-free grammar [36]. A more expressive formalismcalled context-sensitive grammars (see Fig. 28a) are required. However, thereare no known polynomial time algorithms to perform inference with context-sensitive models. Instead, we consider the modified-rectangle language (withassociated grammar shown in Fig. 31(b)) as Lm-rectangle = {~am~b+~cm~d∗}. Themodified-rectangle grammar can model any trajectory comprising of four sidesat right angles (not necessarily a closed curve) with at least two opposite sidesbeing of equal length. The notation ~b+ and ~d∗ represent an arbitrary number ofmovements in the corresponding directions represented by that mode.SCFG Model EstimationThe models presented in Sec. 7.3.1 happen to have compact sentential forms rep-resenting simple geometric shapes. For more complicated trajectory patterns, weadvocate the construction of the rule structure by domain experts incorporatingintuitive knowledge into the rule-based framework of SCFGs. For example, aradar operator typically sees many anomalous trajectories and can subconsciouslycodify the evolution of the trajectory into high-level rules. A discussion of othergrammar construction techniques is provided in [36]. A learning based approachcan also be undertaken to estimate the rules through training data and a typicalapproach based on Bayesian model merging is presented in [22].In addition to the syntactic rules comprising a grammar model, we are alsorequired to choose the rule probabilities governing the conditional application ofa particular rule in the generation of the trajectory sequence. Traditionally, a max-imum likelihood approach can be taken [22] by estimating the rule probabilitiesthat maximize the likelihood of some pre-obtained training data given a candidategrammar model. However, since our models have relatively simple structure, weuse the consistency and expected word length constraints from [2] to estimate ruleprobabilities. Moreover, given that expectation-maximization type algorithms forestimating the rule probabilities of SCFGs are highly prone to getting stuck inlocal minima, such constraints can be used to find appropriate initial values forthe rule probabilities.7.3.2 Comparison between SCFGs and HMMsIn this section, for the reader’s convenience, we provide insight into SCFGs byan analogy with hidden Markov models (HMMs). In order to facilitate our com-parison, we use an example scenario depicted in Fig. 32 in which we seek to103generate the mode sequence q1:5 = ~f ~f ~e ~d ~d. This mode pattern represents anarc-like trajectory (~f 2~e ~d2) with two movements in the ’North-East’ direction, onemovement in the ’East’ direction and two matching movements in the ’South-East’ direction. The SCFG model which can generate such a trajectory is shownin Fig. 32a. It has a non-terminal set N SCFGexample = {S, NE, X, E, SE} where NE repre-sents the ’North-East’ direction as depicted in Fig. 30a. The special symbol X is aself-embedding non-terminal because it can repeatedly call itself while producingan equal number of opposing NE and SE non-terminals. The SCFG terminal setVSCFGexample = {d, e, f } is comprised of the unit vectors corresponding to the cardinaldirections. For example, as depicted in Fig. 30a, the terminal corresponding tothe ’North-East’ cardinal direction is represented by the unit vector ~f . We definean equivalent discrete-observation hidden Markov model specified by a set of un-observed states NHMMexample = {S, NE, E, SE, END} that is similar to the non-terminalset of the SCFG. The HMM state set does not have the self-embedding symbolX and has been augmented with a special END state to guarantee finite-lengthtermination. The HMM observation space is equivalent to the terminal set of theSCFG.In direct analogy to an HMM, the non-terminals of an SCFG are abstract unob-served states while the terminals are observed symbols. In the context of trajectorymodeling, the non-terminals represent structural parts of a shape. For example,in Fig. 32a, an arc is structurally decomposed as NE, E and SE segments. Theterminal symbols correspond to unit movements in a cardinal direction. We no-tice from Fig. 32 that while the HMM state sequence evolves as a linear directedgraph, the SCFG behaves like a branching process. The transitions of an HMMfrom one state to the next can be written as regular grammar rules of the formNk → qk+1Nk+1 which represents a transition from the current state Nk to thenext state Nk+1 while emitting an observation qk+1. This corresponds to the de-piction of the HMM evolution as a linear directed graph in Fig. 32b. However,the SCFG has more complex rules that manifest in a branching process as shownin Fig. 32a. SCFGs are known to be a special class of multi-type Galton-Watsonstochastic branching processes[40, 38]. The defining characteristic of an SCFG isthe presence of self-embedding rules like X → NE X SE in Fig. 32a. Such a rule,repeatedly calls itself to generate equal opposing movements that manifests as anunbounded dependency and imparts scale-invariance to SCFG trajectory models.Analogous to the (hidden) Markov model case, there are several probabilisticqueries that can be computed for a symbol sequence generated from an SCFG. (a)We can compute the likelihood of a sequence P{q1, . . . , qK|LSCFG} given a certainSCFG model. While the forward algorithm [13] is commonly used for HMMs,the inside algorithm [23] is a generalization that can be used for SCFGs. (b) Wecan estimate the hidden sequence of rules used in the derivation of a sequence104analogous to hidden state sequence estimation in HMMs. The Viterbi algorithmcan be used in conjunction with the inside algorithm for this purpose. (c) Finally,we can learn the rule probabilities from a dataset using a maximum-likelihoodapproach called the inside-outside algorithm [23] which is similar to the forward-backward algorithm used in Baum-Welch[13] re-estimation for HMMs.7.3.3 Inference using SCFGsThe main quantity that we are interested for the hybrid state estimation problemin Sec. 7.2.2 is the one-step ahead prediction P{qk|q1:k−1;LSCFG} that can be com-puted from a left-right pass over an observed terminal sequence. Calculation ofthe one-step ahead prediction requires the notion of the “prefix” probability Pk ofa symbol sequence given an SCFG model LSCFG. The computation of the prefixprobability is represented byPk = ∑ν∈(N∪V)∗P{q1, . . . , qk, ν}, (146)where (N ∪ V)∗ denotes all possible arbitrary combinations of non-terminal andterminal symbols. The expression in (146) represents the probability that the se-quence q1:k is the prefix of a sentence generated from a SCFG model LSCFG and itconceptually requires a summation over all possible suffixes. The one-step predic-tion utilizes the probabilistic rules of the SCFG model to predict the next mode inthe sequence. This quantity is important in describing a suitable transitional den-sity for the particle filter solution in Sec. 7.4. The one-step prediction probabilityP{qk = v|q1:k−1} =P{q1:k−1, qk = v}P{q1:k−1}= Pk(ν)Pk−1, (147)where v ∈ Q is an element of the finite mode set Q and we have implicitly as-sumed computation with respect to a particular SCFG model LSCFG and excludedit from the expressions for the sake of brevity. The Earley-Stolcke parser [22] pro-vides an efficient algorithm to compute the quantity in (146) and the related one-step prediction probability in (147). A brief algorithmic description is provided inSec. 4.6.1.7.4 bayesian filtering of syntactic tbdIn this section, a particle-filter based solution is derived for the multiple modeltrack before detect problem outlined in Sec. 7.2. Finite-dimensional filters are105not known for non-linear and non-Gaussian measurement processes. The switch-ing state-space model also introduces a posterior density with an exponentiallyincreasing number of components. As a result, we employ a particle filteringapproach with efficient use of the Earley-Stolcke parser as a proposal density gen-erator. Finally, in Sec. 7.4.2, a Rao-Blackwellised scheme is outlined for variancereduction in the estimates.7.4.1 Multiple-Model SCFG Particle FilterConsider the extended target density P{q1:k−1, e1:k−1, x1:k−1|z1:k−1} that we areinterested in for the multi-frame TBD problem. We define an extended targetstate sk−1 = {qk−1, ek−1, xk−1} and approximate the prior extended state densitywith an empirical random measure such thatP{q1:k−1, e1:k−1, x1:k−1|z1:k−1} ≈ {w(i)k−1, s(i)1:k−1}Npi=1, (148)where Np is the number of particles used. A sequential importance samplingapproach is used to obtain the posterior density in the usual manner using aprediction step and an update step. The prediction step can be decomposed asP{q(i)1:k, e(i)k , x(i)1:k|z1:k−1} = P{s(i)k−1|z1:k−1}× P{x(i)k |q(i)1:k, e(i)1:k, x(i)1:k−1, z1:k−1}× P{e(i)k |q(i)1:k, e(i)1:k−1, x(i)1:k−1, z1:k−1}× P{q(i)k |q(i)1:k−1, e(i)1:k−1, x(i)1:k−1, z1:k−1}. (149)In (149), the first term on the right hand side represents the prior density from(148). To generate new samples for the hybrid state s(i)k , we use the respectivetransitional densities as proposal functions for the particle filter propagation. Thecontinuous-valued state is propagated by sampling from the state transition func-tion in (141). The target existence is propagated using the birth-death Markovtransition matrix in (144). Finally, the mode is propagated using the one-stepprediction probability in (147). Mathematically, we sample Np particles from thebootstrap proposal such that s(i)k ∼ pi{sk|s(i)1:k−1}, wherepi{s(i)k |s(i)1:k−1} = fq(i)k(x(i)k−1, e(i)k−1:k, v(i)k )× P{e(i)k = n|e(i)k−1 = m}P{q(i)k = v|q(i)1:k−1} (150)If the predicted target existence e(i)k = 1, then the following possibilities can occur:106Target Birth: When e(i)k−1 = 0 and e(i)k = 1, the target state is drawn as a samplefrom a birth proposal density pibirth{x(i)k |zk} that is obtained in the following man-ner. The target position components are sampled from a uniform density overpositions in the surveillance region where zk > γ, where γ is an intensity thresh-old. The newborn particles are thus positioned in regions of the surveillance areawhere the most recent sensor measurement has a large value. The target velocitycomponent is sampled from a uniform proposal density ∼ U [−vmax, vmax], wherevmax ∈ R2 is the maximum assumed velocity in the x and y directions.Target Continuance: When e(i)k−1 = 1 and e(i)k = 1, the particle stays alive andthe target state is drawn from the continuance proposal which is taken to be thetransitional prior defined in (141).If the predicted target existence enk = 0, the target state is undefined. This isrepresented by xk = φ. A new sampled particle s(i)k = {x(i)k , e(i)k , q(i)k } is thenappended to the particle representation such that s(i)1:k = {s(i)1:k−1, s(i)k }. The mea-surement update for the particle filter is given byP{x1:k, e1:k, q1:k|z1:k} ∝P{zk|x1:k, e1:k, q1:k}P{x1:k, e1:k, q1:k|z1:k−1}, (151)where P{zk|x1:k, e1:k, q1:k} = P{zk|xk, ek} is the likelihood of the sensor measure-ment given by (143) and P{x1:k, e1:k, q1:k|z1:k−1} is the predicted density given by(150). The measurement likelihood described in (143) is a product of i.i.d randomvariables at each bin location (m, n). However, it is common in TBD literature [46]to limit the influence of a target in a spatial bin (m, n) to a small neighborhoodcentered around that bin. We denote this neighborhood by Cm(xk) indicating theaffected bins in the X-dimension and Cn(xk) indicating the affected bins in the Y-dimension. The incremental un-normalized importance weights for the bootstrapparticle filter are thenw˜(i)k =∏m∈Cm(x(i)k )∏n∈Cn(x(i)k )l(zm,nk |x(i)k ) if e(i)k = 11, if e(i)k = 0.(152)Finally, the weights are normalized and an appropriate resampling step is carriedout. At each stage of the particle filtering algorithm, we can compute estimatesfrom the particle representation of the extended state {s(i)k , w(i)k }Npi=1 Target exis-tence eˆk can be computed aseˆk =∑Npi=1 e(i)kNp. (153)107Target presence is then declared if eˆk is above a threshold value. This can be usedto initiate a track based on the estimated target state given byxˆk =∑Npi=1 x(i)k e(i)k∑Npi=1 e(i)k(154)In a similar manner, we can also compute mode estimates at each time instantk by computing the proportion of particles in each mode and selecting the mostlikely modeqˆk = arg maxv∈V∑Npi=1 I(q(i)k = v)e(i)k∑Npi=1 e(i)k. (155)An algorithmic description of the SCFG multiple model particle filter for syntactictrack-before detect is presented in Algorithm 2.7.4.2 Rao-Blackwellised Multiple-Model SCFG Particle FilterIn this section, we make use of analytical sub-structure present in the problem toreduce the variance of our estimates by using a Rao-Blackwellised version of theparticle filter in Sec. 7.4.1.Consider the filtering density P{ek−1, q1:k−1, x1:k−1|z1:k−1} decomposed asP{ek−1, q1:k−1, x1:k−1|z1:k−1} =P{ek−1|q1:k−1, x1:k−1, z1:k−1}P{q1:k−1, x1:k−1|z1:k−1}. (156)We observe that conditioned on the state sequence x1:k−1 and the mode sequenceq1:k−1, the target existence is generated by a Markov chain. Consequently, theHMM filter can be used to exploit analytical sub-structure for the conditional tar-get existence density ζk−1(l) = P{ek−1 = l|q1:k−1, x1:k−1, z1:k−1}. The continuous-valued target state xk−1 and the discrete-valued mode qk−1 are represented byan extended state sk−1 = {qk−1, xk−1} and their conditional density is approxi-mated by a set of Np weighted random particles as the empirical random measure{s(i)1:k−1, w(i)k−1}Npi=1.We can now write down the prediction and update steps of a recursive Bayesianfilter for the density in (156). The prediction of the conditional target existencedensity is given byζ(i)k|k−1(p) =∑lP{ek = p|ek−1 = l}ζ(i)k−1(l), (157)108Algorithm 2 SCFG-driven Multiple Model Particle Filter1: function SCFG-MMPF({s(i)1:k−1, w(i)k−1}, zk)2: Sample q(i)k ∼ P{qk|q(i)1:k−1;LSCFG} using (147)3: Sample e(i)k ∼ P{ek|e(i)k−1} using (144)4: for i← 1 to Np do5: if e(i)k = 1, e(i)k−1 = 0 (target birth) then6: Sample x(i)k ∼ pibirth{zk}7: else if e(i)k = 1, e(i)k−1 = 1 (continuation) then8: Sample x(i)k ∼ Fx(i)k−1 + Gvk(q(i)k )9: else if e(i)k = 0 (target death) then10: x(i)k = ∅11: Evaluate w˜(i)k using (152)12: Wk = ∑Npi=1 w˜(i)k13: Normalize w(i)k =w˜(i)kWk, ∀i = {1, . . . , Np}14: if Neffective < threshold then15: RESAMPLE{s(i)k , w(i)k }16: for i← 1 to Np do17: w(i)k ←1Np18: return {s(i)1:k, w(i)k }109for l ∈ {0, 1} and i = 1, . . . , Np. The prediction for the random measure {s(i)1:k−1, w(i)k−1}Npi=1is performed using a proposal density piRBPF{s(i)k |s(i)1:k−1, z1:k−1}. We choose to usethe popular bootstrap proposal such that prediction of the extended target stateis given bypiRBPF{s(i)k |s(i)1:k−1} = fq(i)k(x(i)k−1, v(i)k )P{q(i)k = v|q(i)1:k−1}, (158)where fq(i)k(x(i)k−1, v(i)k ) is the state transition function in (141) and P{q(i)k = v|q(i)1:k−1}is the one-step prediction in (147).The measurement update for the target existence is given byζ(i)k|k(l) = P{zk|x(i)k , e(i)k = l}P{s(i)k |s(i)1:k−1, z1:k−1} (159)Finally, the measurement update for the extended state is done using the incre-mental weight update such thatP{s(i)1:k|z1:k} ∝P{zk|s(i)1:k−1, z1:k−1}P{s(i)1:k|z1:k−1}piRBPF{s(i)k |s(i)1:k−1, z1:k−1}=∑lP{zk|x(i)k , e(i)k = l}ζ(i)k|k−1(l)w˜(i)k−1 (160)In (159) and (160), the term P{zk|x(i)k , e(i)k } is evaluated using the measurementlikelihood in (152). An algorithmic description of the Rao-Blackwellised particlefilter is provided in Algorithm 3.7.5 numerical examplesIn this section, we provide two types of numerical examples. First, a reducedstate space "toy" example is detailed to illustrate the concepts in this chapter ina tutorial fashion. Then, we perform simulations on a realistic real-world sce-nario. We evaluate the detection and tracking performance of syntactic track-before-detect at various SNR levels in a Monte-Carlo fashion. There are twocompeting architectures used in all simulations. The first type of architecturecalled “Markov-MMPF” (Markov-switching multiple model particle filter) consid-ers multiple models switching according to a Markov chain model as describedby (145). The second architecture is called “SCFG-MMPF” (SCFG-switching mul-tiple model particle filter) which considers mode switching behavior driven by anSCFG.110Algorithm 3 Rao-Blackwellised Particle Filter for Syntactic TBD1: function RBPF({s(i)1:k−1, w(i)k−1}, ζ(i)k−1, zk)2: for i← 1 to Np do3: Predict ζ(i)k|k−1(p) using (157)4: Sample s(i)k ∼ piRBPF{s(i)k |s(i)1:k−1} using (158)5: Update importance weights w˜(i)k using (160)6: Wk = ∑Npi=1 w˜(i)k7: for i← 1 to Np do8: Normalize w(i)k =w˜(i)kWk, ∀i = {1, . . . , Np}9: Update ζ(i)k|k using (159)10: if Neffective < threshold then11: RESAMPLE{s(i)k , w(i)k }12: for i← 1 to Np do13: w(i)k ←1Np14: return {x(i)1:k, q(i)1:k, w(i)k }, ζ(i)k1117.5.1 Reduced State-Space "Toy" ExampleConsider the one-dimensional non-linear switching stochastic volatility model[56] commonly used in quantitative finance. The mode (called the drift param-eter) represents a “volatility” state which is assumed to be either a low-volatilitystate qk = a or a high-volatility state qk = b. The modes cause the log-volatility toswitch between two states in a linear auto-regressive processxk = α(qk) + φxk−1 + σvk, vk ∼ N (0, 1)where α(qk = 0) = −5.0 and α(qk = 1) = −2.0, φ = 0.5 and σ2 = 0.1. The binary-valued mode qk is assumed to arise from an SCFG with form anb+an havinggrammatical rules S→ aSa [0.6] | X [0.4] and X → bX [0.5] | b [0.5]. Such a modelcaptures scenarios in which the volatility of a certain asset follows a pattern ofbeing in the low volatility state for equal time periods with a high volatility periodof arbitrary length in between. We are interested in tracking the log-volatilityunder such a scenario. The observations are conditionally independent given thelatent state xk such thatzk = exp(xk2)wk, wk ∼ N (0, 1).The conditional probability distributions for the state variables xk and the obser-vations zk are given byP{q1} ∼ P1(v) using (146),P{x1|q1} ∼ N(x1; 0,σ21− [α(q1) + φ]2),P{xk|qk, xk−1} ∼ N (xk; α(qk) + φxk−1, σ2),P{zk|xk} ∼ N (zk; 0, exp xk)Using the model specified above, 1000 Monte-Carlo runs were simulated by chang-ing the measurement noise variance from 1.0 to 20.0 and keeping the state noisevariance fixed at 1.0. We show the results of the Monte-Carlo experiment inFig. 33. The continuous-valued state xˆk estimates are evaluated by computing theroot mean square (RMS) error for each Monte-Carlo run asRMSE =√1K ∑k‖xˆk − xk‖22 (161)The discrete-valued mode is compared using the notion of a confusion matrix. Ifthe discrete mode q takes |Q| values, then a |Q| × |Q| confusion matrix Mq(i, j)112enumerates the number of times the true mode i is estimated as mode j. Themode detection rate is then taken as the trace of the normalized confusion matrix.The normalization in each row i is with respect to the total number of times modeq = i appears in the simulated sequence. A perfect mode estimation would resultin trace(Mnormalizedq (i, j)) = 1.0.The SCFG-MMPF of Algorithm 2 (without the existence variable computations)is able to track the log-volatility with a lower root mean square error (RMSE) andalso detects the modes with a higher accuracy. The Markov-MMPF uses the samealgorithm but the proposal density for the mode comes from the correspond-ing Markov transition matrix representing Markovian switching of the volatilitymodes. We also observe that the variance of the RMSE over all Monte-Carlo runsat a particular noise variance is smaller for the SCFG-MMPF.7.5.2 Syntactic TBD ExamplesWe show, through numerical simulations, that constraining a particle filter TBDsolution along trajectory models results in a reduced error in mode qk, existenceek and state xk estimates compared to competing architectures. The detectionperformance is analyzed by calculating the probability of detection Pd over 1000Monte Carlo runs of the corresponding TBD filter. If H0 represents the hypothesisthat a target is absent and H1 represents the hypothesis that a target is present,then the detection probability Pd is given by the number of times that H1 is trueand the syntactic TBD filter chooses H1 for each of the K frames of availablesensor measurements. The tracking performance is measured using the root meansquare error (RMSE) of the target position and velocity.In all simulations, the time step δT = 0.1 seconds. The process noise co-varianceis constructed using σa = 1.0 and σo = 0.01. A sequence of 100 frames is generatedin each experiment with M = N = 40, ∆ is dependent on the extents of thetrajectory for visualization purposes, the target intensity is assumed constant atI = 20 and the sensor spread parameter σh = 0.7. The Rayleigh clutter powerP = 3. The sensor spread function is also truncated to affect only the closest 8neighbors of the center pixel (i, j).We show an example of a raw sensor measurement in Fig. 34a. Each framedepicts the returned intensities at a single time instant. A white pixel indicatesa high returned intensity while a black pixel indicates a small returned intensity.An operator looking at thresholded sensor measurements would find it impossi-ble to extract target tracks from visual inspection alone. The syntactic TBD particlefilter uses the following parameters. The target existence is modeled by a birthprobability Pbirth = 0.1 and a death probability Pdeath = 0.1. The intensity thresh-old γ = 0.2 max(zk) and the maximum velocity is vmax = [1, 1]T units/sec. The113number of particles used in the filter Np = 1000. Finally, a track is initiated anda target is declared present if the estimated target existence eˆk > 0.7. A simulatedrectangle trajectory and the TBD track output is shown in Fig. 34c. The input tothe TBD tracker is the sequence of sensor measurements shown in Fig. 34. Thisparticular configuration amounts to a 2dB signal-to-noise ratio. We observe thatthe SCFG derived mode sequence is able to track the target better than the caseof a Markov chain based mode sequence.In Fig. 34e, we show the results of running 1000 Monte Carlo runs of a multiplemodel particle filter as the SNR is changed. A range of SNR values is swept bychanging the target intensity and the noise variance. The plot of Pd versus SNRis shown in Fig. 34d which does not show a significant difference between thecompeting models. However, the SCFG trajectory models perform much better interms of estimation error. The decrease in the root mean squared error (RMSE)of the X and Y position is shown in Fig. 34e. A similar decrease in the RMSE ofthe X and Y velocity estimates is obtained for the SCFG trajectory models overthe Markov chain based model as depicted in Fig. 34f. The detection probabilityperformance for both types of architectures are very similar. We hypothesize thatthis is due to the fact that the measurement likelihood is independent of the mode.As a final note, we remark that all simulations were also run with an equivalentsingle model (without multiple modes of operation) particle filter. However, weare unable to show performance comparison as the regular particle filter divergesfor most Monte-Carlo runs.7.6 conclusionThis chapter has addressed the question: suppose one is interested in estimatingthe state of a dimly lit target that is moving according to a specified class of trajec-tories (intent). How can a TBD algorithm be derived to exploit this information?We presented stochastic-context free grammar models arising in natural languageprocessing to model complex spatial trajectory patterns. The syntactic modelingframework facilitates incorporating meta-level information from human operatorsregarding suspicious trajectory patterns into the TBD problem. A novel particlefiltering algorithm coupled with the Earley-Stolcke parser was derived to estimatethe target state and existence. Such a modeling approach can be viewed as a gen-eralization of jump Markov state-space models to jump SCFG state-space models.We also derive a Rao-Blackwellised version of our proposed particle filter to re-duce the variance in the estimates. Finally, the numerical simulations demonstratea marked increase in tracking performance (RMSE) over competing Markov chainbased trajectory models.114In its present form, the proposed particle filter requires running Np Earley-Stolcke parsers in parallel for each particle. We attempted to exploit analytical sub-structure in the SCFG mode sequence by deriving a Rao-Blackwellised versionusing a decision-directed scheme. In such a scheme, previous estimates of themode qˆ1:k drive a single Earley-Stolcke parser rather than maintaining separateEarley-Stolcke parsers for each particle. However, we found that errors in thepast estimates of the mode cause the filter to diverge. In future work, we seek toexplore approximations enabling more efficient conditioning on the mode.115NENE SESESXXEeff ddf f e d d(a) ENDf ef f dS NE SENE NE Ef f e df(b)Figure 32: (a) depicts the generation process of an SCFG as a branching process. The treeon the right is also called a parse tree or a derivation because the sequence isderived by repeated application of the production rules shown in the top left.The | in the production rules denotes an alternate re-writing rule. Circles withdouble boundaries represent observations while regular circles represent latentstates. The derived mode sequence q1:5 is an arc with a trapezoidal shape asshown in the bottom left. In (b), the generation process of an HMM is depictedas a linear directed graph. While the SCFG can always ensure equal opposingmovements in every sample path due to self-embedding, the HMM cannotensure this as observed in the sample HMM sequence to the left.1160 20 40 60 80 100−11−10−9−8−7−6−5−4−3TimeLog−Volatility True StateMarkov−MMPFSCFG−MMPF(a)0 5 10 15 20 250.511.522.5Noise VarianceRMS error Markov−MMPFSCFG−MMPF(b)0 5 10 15 20 250.70.750.80.850.90.95Noise VarianceTrace of confusion matrix Markov−MMPFSCFG−MMPF(c)Figure 33: (a) shows the simulated and estimated log-volatility at a noise variance of 2.0for one of the 1000 Monte Carlo runs. It can be visually observed that theSCFG-MMPF performs better than the Markov-MMPF. (b) shows the RMS er-ror between the true target state and the SCFG and Markov chain versions.Similarly, (c) shows the mode estimation rate. In either case, the SCFG versionperforms better than the Markov chain version.117(a) (b)−5 −4 −3 −2 −1 0 1 2 3 4−10123456X PositionY Position True StateMarkov−MMPFSCFG−MMPF(c)−5 0 5 10 15 200.650.70.750.80.850.90.951SNR (dB)Detection Probability Markov−MMPFSCFG−MMPF(d)−5 0 5 10 15 20051015202530SNR (dB)RMS Error (Position) Markov−MMPFSCFG−MMPF(e)−5 0 5 10 15 2000.511.52SNR (dB)RMS Error (Velocity) Markov−MMPFSCFG−MMPF(f)Figure 34: (a) and (b) are examples of sensor measurements at time instants k = 10, 81.Each sensor measurement is an image of returned intensity values. The darkbins corresponds to weak radar returns while the lighter bins correspond tostrong radar returns.(c) shows the simulated and estimated track at an SNR of2dB of one of the 1000 Monte Carlo runs. (d) shows the change in the detectionprobability with SNR. (e) shows the RMS error in position (X and Y combined)of the SCFG and Markov chain versions. Similarly, (f) shows the RMS error invelocity (X and Y combined) of the competing models. In either case, the SCFGtrajectory models perform better than the Markov chain version.1188 G E S T U R A L C O M M A N D R E C O G N I T I O N8.1 introductionGestural command recognition is an integral part of modern human-computerinteraction (HCI). Conventionally, HCI has been dominated by the ubiquitousmouse and more recently by touch interfaces. However, modern applications de-mand new technologies that do not rely on direct interaction with the computingdevice. As a result, speech and gestural commands have been garnering increasedattention as a natural and un-restrictive medium for human-computer interfaces.In this chapter, we propose a novel static gesture recognition technique utiliz-ing a joint tracking and classification approach. A syntactic framework is used todefine gestures as a sequence of geometric primitives called gesturelets. Stochas-tic context-free grammars (SCFGs) are used as a generative model for complexspatio-temporal patterns composed of gesturelets from a small alphabet set. Theexpressive power of SCFGs is able to capture long-term dependencies in the ges-ture. Moreover, the self-embedding property of SCFGs is used for scale-invariantrecognition of each gesture. As a result, our approach is robust to user variation insigning speed. Since SCFGs form very precise generative models of the gesturesused in this chapter, we are also able to solve the gesture spotting problem in a con-venient manner. Our proposed approach is also agnostic to the sensor modalityused because it primarily depends on movement patterns of the hand that can beobtained from vision-based algorithms, time-of-flight sensors and accelerometer-based devices.A physics-based generative model is proposed for gestures utilizing a regime-switching state space model. The 3D coordinates of the hand/finger is the statevariable of interest whose evolution is driven by the gesturelets composing a spe-cific gesture. A perspective projection through a pin-hole camera is used as thesensing modality with an additional “detector” stage to convert the image mea-surement into a point measurement. We use a simple edge-based detector forproof-of-concept purposes. However, this stage can be easily replaced by moreaccurate and robust detectors. The switching state space model presents a hybridestimation problem because in addition to the continuous valued 3D coordinates,we are also interested in recovering the discrete-valued modal state (gesturelet)driving the state-space model. We propose a novel Rao-Blackwellised particle fil-119IMAGE SEQUENCEGESTURE IN 3D SPACESENSOR HAND DETECTOR DETECTION SEQUENCEGESTURE MODEL 1GESTURE MODEL NIMM FILTERIMM FILTERGESTURE MODEL 1LIKELIHOODGESTURE MODEL NLIKELIHOODRECOGNIZE COMMANDFigure 35: The gesture recognition system diagramter to perform hybrid state estimation. In addition, the inference of SCFG modelsis carried out using a modified Earley-Stolcke parser.8.1.1 Literature SurveyIn this section, a brief survey of relevant work from the gesture recognition litera-ture is provided. Most non-static communicative gestures do not simultaneouslychange in both the posture or configuration of the hand and the gross motion ofthe hand [57]. In this chapter, we constrain ourselves to gestures which do notimpart any information through a particular configuration or pose of the hand.In other words, we can treat the hand as a point in three dimensional space. Atypical gestural pattern comprises of three parts: preparation, stroke and retraction.The majority of the information content is concentrated in the stroke phase of thegesture. In this chapter, we only deal with modeling the stroke phase of commandgestures. The preparation and retraction phases of the gesture are not explicitlymodeled. However, the nature of the SCFG modeling framework proposed allowsus to conveniently solve the gesture spotting problem with only a small modelingchange as explained in Sec. 8.3.On Gesture Recognition TechniquesA gesture is treated as a space-time trajectory in [58] where the 3D coordinatesof the hand is reduced to a 2D coordinate by a plane-fitting approach. The rel-120ative difference between 2D coordinates are then used as alphabets in a discretehidden Markov model to perform recognition. Such an approach is only ableto provide a discriminative model for gestures (as opposed to our physics-basedgenerative model). Moreover, additional heuristics are required to account fortemporal variation in gestures due to different signers. The SCFG framework isable to perform scale-invariant recognition due to the expressive power of its self-embedding rules. When the Markovian assumption is invalid, certain variationslike the coupled HMM[59] and the hierarchical HMM[60] have also been usedfor complex interactions in gestures. However, such techniques cannot modelunbounded long-range dependencies which are conveniently captured by the hi-erarchical branching structure of SCFG models.On Hybrid State EstimationOur proposed approach uses a regime-switching state-space model as a genera-tive model of gestures. However, finite dimensional filters do not exist for suchmodels and sub-optimal approaches must be used[24]. Since we reduce the handto a point measurement (either the centroid of the hand, or a finger tip), we canleverage the extensive literature on switching state-space models from the radartracking and econometrics community. A review of hybrid estimation in suchmodels can be found in [24]. When the regime changes are Markovian, a de factorstandard is to use the interacting multiple model (IMM) filter for tracking. How-ever, our proposed approach involves an SCFG-modulated regime change whichis not Markovian. We use a novel multiple model particle filter approach as analternative. Multiple model particle filters have been used previously for manyapplications such as tracking in high noise environments [47] and for tracking thestate of financial markets.8.1.2 Organization and ContributionsIn this section, we delineate the main results and contributions.1. In Sec. 8.2, a physics-based model using an SCFG-modulated jump statespace approach is proposed as a probabilistic generative model for spatio-temporal gesture patterns. Moreover, gestures are defined as combinationsof syntactic sub-units called gesturelets. Each gesturelet is a direction ofmovement from a small alphabet. Due to the small size of the alphabet andthe potentially large number of gestures, our proposed library of gesturescreates a high perplexity dataset which presents a significant challenge. Tothe best of our knowledge, such an approach is novel and has not beenpresented in the gesture recognition literature.1214pi3 2pi 4pi 04pi−2pi−4pi3−(a) (b)Figure 36: (a) shows the gesturelets and the radial angular directions that they represent.(b) shows the projection of a point in 3D space onto the camera sensor array.2. Novel SCFG and HMM models are presented in Sec. 8.3 for each of theunique gestural patterns used in our library of gestural commands. It can beshown using the Pumping Lemma for regular grammars that such modelscannot be modeled using hidden Markov models [36].3. Gesture recognition is solved jointly with the filtering problem in Sec. 8.4 toprovide a useful framework for applications involving continuous trackingof gestures. A novel Rao-Blackwellised particle filter solution is proposedto solve the hybrid state estimation problem posed by the SCFG-modulatedstate space model.4. In Sec. 8.5, numerical simulations are carried out on a real-world datasetshowing the effectiveness of our approach in comparison to hidden Markovmodels.5. The inference of SCFG models is explained in Chapter 4 using the Earley-Stolcke parser modified with a scaling trick to account for long gesture se-quences. The incremental left-to-right operation of the Earley-Stolcke parserallows computation of the prefix probability of partial gesture sequencesthat are used as a bootstrap proposal in our SCFG particle filter.1228.2 switching state space models for gestural commandsIn this section, the switching state space model relating the temporal evolution ofa gesture in 3D to the sensor observation is described. A user’s hand is assumedto move in one of |Q| modes, where each mode q is associated with a particularstate dynamical model tuned to that mode. The modes refer to gesturelets whichare geometric primitives of gestural patterns. They correspond to unit vectors in8 quantized radial directions as shown in Fig. 36a. The gesture dynamics can besummarized byqk ∼ P{qk|q1:k−1} (mode dynamics), (162)xk ∼ f (xk−1, qk, wk) (state dynamics), (163)zk ∼ h(xk, vk) (measurement function), (164)where qk is a gesturelet, xk = [xk, yk, zk]T is the 3D location of the centroid ofthe hand and zk is the sensor measurement. The evolution of the gesturelets ismodeled using a stochastic process as depicted in (162). In this chapter, we usehidden Markov models and stochastic context-free grammars as generative mod-els for the gesturelets q1:k. The state transition f (·) is modeled in (163) using aqk dependent process noise wk and (164) defines a sensing mechanism h(·) cor-rupted by measurement noise vk. The measurement process used is the projectivepin-hole camera model using homogeneous coordinates.8.2.1 Mode-Dependent State DynamicsThe 3D position and velocity of the hand with respect to a world coordinatesystem is represented by the state variable xk = [xk, yk, zk, 1, x˙k, y˙k, z˙k, 0]ᵀ in ho-mogeneous coordinates [61]. The evolution of the state can be modeled by amode-driven constant velocity state space modelxk =Fxk−1 + Gwk(qk), (165)zk =Hxk + vk, (166)123whereF =1 0 0 0 δτ 0 0 00 1 0 0 0 δτ 0 00 0 1 0 0 0 δτ 00 0 0 1 0 0 0 00 0 0 0 1 0 0 00 0 0 0 0 1 0 00 0 0 0 0 0 1 00 0 0 0 0 0 0 1and (167)G =δτ22 00 δτ220 00 0δτ 00 δτ0 00 0, (168)are the state transition matrix and the process noise gain matrix respectively. Thesampling time is represented by δτ. Only the state dynamics are mode-dependentthrough a modulated process noise wk(qk) ∼ N{0, Q(qk)} such that the processnoise in each mode is normally distributed with a mode-dependent mean µ(qk)and covariance Q(qk). For the radial directions shown in Fig. 36a, we use a zeromean process noise with covarianceQ(q) = ρ[σ2o 00 σ2a]ρT,ρ(q) =[sin(q) cos(q)− cos(q) sin(q)],where σ2o is the variance orthogonal to the mode direction represented by q andσ2a is the variance along the direction of mode q. The rotation matrix ρ is usedfor proper orientation to the mode q. Also, we use σ2a σ2o to model more124uncertainty along the mode and less in directions orthogonal to it. This providesdiscriminative power to each mode. The process noise is a 2-dimensional randomvariable restricted to small accelerations (nearly constant velocity model) only inthe x and y directions. The simplifying approach in our modeling approach isthat the gesture occurs in the x − y plane and that there is negligible movementin the z direction.8.2.2 Camera SensorIn this section, the sensor observation is modeled as the perspective projection of apinhole camera model. Consider a world reference coordinate system as shown inFig. 36b. A 3D point in real world homogeneous coordinates is represented as x =[x, y, z, 1]. The projection operation is a non-linear operation which is the resultof first transforming the world coordinates to a camera-centric frame representedby the action of the augmented 3× 4 rigid-body transformation matrix [M|t] andthen transforming onto the image sensor plane using the intrinsic camera matrixK. The camera projection matrix P = K [M|t], where K represents the intrinsicparameters of the camera, M is the orientation of the camera-centered frame withreference to the world coordinate frame and t is the translation of the camera-centered frame from the world origin. More specifically, the camera intrinsicmatrixK =fx 0 cx0 fy cy0 0 1 , (169)where fx, fy are the focal lengths (in pixels) of the camera in the x and y axes re-spectively. The correction factors cx and cy represent the principal point offset (inpixels) of the camera. The projected 2D homogeneous coordinates u˜ = [u˜, v˜, w˜]ᵀare obtained by the action of the camera projection matrix u˜ = Px. The imagecoordinates z = [u, v]ᵀ are obtained by the normalization to the z = 1 plane suchthat u = u˜w˜ and v = v˜w˜ . The perspective projection operation is shown in Fig. 36b.The non-linear operation involved in the perspective projection is not amenabletowards use with a Kalman filter. However, inspired by the projective Kalmanfilter [62], an adaptive measurement matrix Hk = αkP can be designed to incorpo-rate the effects of the non-linear operation such thatαk =1P3 · xˆk|k−1I4×4. (170)The notation P3 refers to the 3rd row of the projection matrix P and xˆk|k−1 is thepredicted state using the dynamics in (165). The perspective projection model125projects a point in world coordinates into pixel coordinates on a image. We as-sume that a detection operator D operates on each image captured by the cameraand outputs the centroid of the hand (or tip of the finger) as a point measurement.However, this operation is inherently error prone and is typically composed of a)sub-pixel measurement error that can be modeled as a Gaussian and b) impulsivedetection errors due to occlusions or background subtraction issues. In this chap-ter, we assume ideal detection and the measurement noise is assumed to comeentirely from sub-pixel measurement errors. As a result, the measurement noiseis modeled byukvk1 = Hkxkykzk1+ vk, (171)where the measurement noise vk ∼ N{0, R} is assumed to be zero-mean Gaussianwith covariance R. The error statistics of the detection module on ground truthdata is used to obtain the measurement noise covariance R by fitting with a zero-mean Gaussian as explained next.Finger/Hand Detection AlgorithmA simple detection algorithm is used for all numerical simulations presented inthis chapter. We assume that a reasonably constant background is present inthe scene being monitored by the sensor. In particular, the user’s hand is as-sumed to be reliably segmented from the background. A number of differentbackground subtraction/removal algorithms have been proposed in the literature[63]. A clustering method is used in conjunction with the CIELab color spacebecause Euclidean distance between colors in this space is meaningful (unlike theRGB space) and is related to human perception of color difference. Two dominantclusters are extracted from a clustering of the CIELab colors of each pixel in animage. The centers of the clusters are used as reference colors for the backgroundand the foreground. Each pixel in the image is then compared using the Euclideandistance to the background/foreground reference colors resulting in a binary im-age where the foreground is white and the background is black. An example ofthe background subtraction procedure is shown in Fig. 37b. Once the hand hasbeen isolated, a Canny edge detector is used together with various morphologi-cal operations to obtain a smooth contour of the hand. The pixel coordinate ofthe hand is represented as a centroid by using the centroid of the convex hull fitaround the hand contour. On the other hand, the pixel coordinate of the finger126(a) (b)(c)Figure 37: (a) shows an example image used in characterizing the noise statistics of the de-tection algorithm. (b) shows the background removal process using Euclideandistance in the CIELab space. (c) shows the detection of the finger tip usinga combination of geometric rules involving convexity defects and the convexhull of the hand contour.is obtained by examining convexity defects of the hand contour together with ageometric comparison of the convex hull. An example of this procedure is shownin Fig. 37c.Finally, we run our procedure on a dataset of manually labeled image sequencesto obtain the measurement error (in pixels) of our detection routine. A Gaussianfit to the measurement errors provides a good approximation which is then usedas the noise statistics of (171).127DIGITAL SIGNAL-LIKE PATTERNSRIGHT-ANGULAR PATTERNS(a)TRIANGULAR PATTERNS(b)TRAPEZOIDAL PATTERNS(c)PALINDROMIC PATTERNStransformedR(d)Figure 38: (a) shows patterns having a right angular part. These can be used as directionalcommands signifying operations like “next” or “previous”. Also depicted arepatterns looking like a digital signal. (b) shows triangular patterns which arealso called arcs. (c) shows trapezoidal patterns which are closely related to arcs.Finally, (d) depicts symmetric patterns which can be classified as palindromicpatterns. Such patterns read as the same sequence when traversed left-to-rightor right-to-left.8.3 gesture modelsIn this section, we describe HMM and SCFG gesture models for the gesturalpatterns shown in Fig. 38.1288.3.1 Right-angular PatternsThe right-angular gestural models are characterized by sentences of the forma2ngn. They can be used to denote directional commands such as “left”, “right”,“next”, “previous” etc. The primary SCFG rule for such patterns is of the formX → A A X G with X being a self-embedding rule. Repeated applications of sucha production rule generates 2 a’s for every g forming a perfect generative model.8.3.2 Digital-signal like PatternsDigital-signal like patterns are represented by rules of the form engmekcmen. Theseare intuitive patterns that can be used for positional commands like “top”, “bot-tom”, “front”, “back” etc depending on the orientation of the hump. The mostcharacteristic feature of this gesture is an equal number of movements in oppositedirections represented by the gesturelet directions g and c. These can be capturedby a self-embedding rule of the form X → G X C to generate equal number of g’sand c’s. In addition, an equal of e’s can be generated by a different self-embeddingrule of the form S→ E S E.8.3.3 Triangular and Trapezoidal PatternsTriangular gestural models are characterized by sentences of the form f ndn orf ndnam with two neighboring sides of equal length. Such patterns are differenti-ated from trapezoidal patterns of the form f nemdn or f nemdnak. A recurring themein SCFG models of such patterns is the self-embedding rule X → F X D to ensureequal lengths for the gesturelets f and d. In previous work [2], such patterns arealso called “arc-like” patterns.8.3.4 Palindromic PatternsIn the context of a gestural pattern, a palindrome is defined as a sequence ofgesturelets that form the original gesture when reversed. Due to the directionalnature of the gesturelets, a palindrome gesture represents symmetric gesturesaround the horizontal axis, the vertical axis or the origin. The characteristic formof palindromes is wwR, where w is a string of gesturelets and wR is a reversedversion of w. It is not straightforward to construct an equivalent HMM for apalindrome. As a result, in Sec. 8.5, a fully connected HMM is learned fromexample palindromic gestures.1298.4 rao-blackwellized syntactic tracking for classificationIn this section, a Rao-Blackwellised filter is proposed to track gestures from image-based point measurements. We first present a novel Rao-Blackwellised multiplemodel particle filter that uses the one-step prediction probability of an SCFGmodel as a mode proposal density. Then, we describe the gesture recognition taskas a model classification problem using likelihoods computed from the Earley-Stolcke parser.8.4.1 SCFG-based Multiple Model Particle FilterIn this section, we make use of analytical sub-structure present in the problemto reduce the variance of state estimates by using a Rao-Blackwellised particle fil-ter. Consider the filtering density P{q1:k−1, x1:k−1|z1:k−1} whose posterior densitysatisfies the following recursionP{q1:k, x1:k|z1:k} =P{zk|qk, xk}P{qk, xk|q1:k−1, x1:k−1}P{zk−1|z1:k−1}× P{q1:k−1, x1:k−1|z1:k−1}. (172)However, the recursion in (172) involves intractable integrals. Therefore, one hasto resort to some form of numerical approximations. The particle filter providessuch an approximation by representing the filtering density with an empiricalrandom measure {(q(i)1:k, x(i)1:k), w(i)k }Npi=1. This approximation can be recursively up-dated using (172).By considering the following factorization on the filtering densityP{q1:k, x1:k−1|z1:k−1} =P{x1:k−1|q1:k−1, z1:k−1}P{q1:k−1|z1:k−1}, (173)it is possible to utilize analytical sub-structure present in the system to design amore efficient algorithm. Denote the conditional probability distribution of themode as the modal probability χ1:k = P{q1:k|z1:k}. We observe that conditionedon the mode sequence q1:k, the density P{x1:k−1|q1:k−1, z1:k−1} is Gaussian and canbe computed analytically using the optimal Kalman filter if the marginal posteriordensity P{q1:k|z1:k} is known.The modal density satisfies the alternative recursionP{q1:k|z1:k} =P{zk|q1:k−1, z1:k−1}P{qk|q1:k−1}P{zk|z1:k−1}× P{q1:k−1|z1:k−1}, (174)130where the term P{zk|q1:k−1, z1:k−1} is implicitly dependent on past base state val-ues x1:k. Instead of approximating the entire filtering density, we use a weightedset of samples {q(i)1:k, w(i)k }Npi=1 to only represent the marginal posterior distributionχ1:k. The marginal density of the base state x1:k is a Gaussian mixtureP{x1:k|z1:k} =∫P{x1:k|q1:k, z1:k}dχ1:k=∫P{x1:k|q1:k, z1:k}Np∑i=1w(i)k δq(i)1:k(q1:k)=Np∑i=1w(i)k P{x1:k|q(i)1:k, z1:k} (175)that can be computed efficiently with a Kalman filter bank. The Rao-Blackwellisedparticle filter proposed only samples the modal state qk ∼ ζk|1:k−1 from the one-step prediction probability in (147). For each sampled modal state, the meanand covariance of the continuous-valued base state is updated using exact com-putations. In particular, we sample q(i)k and then propagate the mean xˆ(i)k andcovariance Σ(i)k of xk with a Kalman filter as followsxˆ(i)k|k−1 = Fxˆ(i)k−1|k−1Σ(i)k|k−1 = FΣ(i)k−1|k−1Fᵀ + GQ(q(i)k )GᵀS(i)k = HkΣ(i)k|k−1Hᵀk + RRᵀz(i)k|k−1 = Hkxˆ(i)k|k−1xˆ(i)k|k = xˆ(i)k|k−1 + Σ(i)k|kHᵀk (S(i)k )−1[zk − z(i)k|k−1]Σ(i)k|k = Σ(i)k|k − Σ(i)k|kHᵀk (S(i)k )−1HkΣ(i)k|k (176)The Kalman filter can be used to optimally represent the base state using anormal distribution P{x1:k−1|q1:k−1, z1:k−1} = N{xk; xˆk−1|k−1,Σk−1|k−1}. Conse-quently, the Kalman filter can be used to exploit analytical sub-structure for theconditional target state density. The conditional density of the discrete-valuedmode history q1:k−1 is approximated by a set of Np weighted random particles asthe empirical random measure {q(i)1:k−1, w(i)k−1}Npi=1.The prediction for the random measure {q(i)1:k−1, w(i)k−1}Npi=1 is performed using asuitable proposal density piRBPF{q(i)k |q(i)1:k−1, z1:k−1}. We choose to use the popularbootstrap proposal such that prediction of the modal state is given bypiRBPF{q(i)k = v|q(i)1:k−1} = ζ(i)k|k−1(v), (177)131where ζ(i)k|k−1(v) is the SCFG one-step prediction probability in (147) (described inSec. 8.4.2) for v ∈ V and i = 1, . . . , Np. At the end of each cycle of the sequentialimportant sampling step, the base state marginal is computed using (175) suchthat xˆk|k = E{xk|z1:k−1} and Σk|k = cov(xk|z1:k).The measurement update for the modal state is performed by the incrementalimportance weight update for the particles. In the case of using the bootstrapproposal, the importance weights reduce to the mode likelihoodP{zk|q1:k−1, z1:k−1} = N{zk; zk|k−1, Sk}. (178)Each stage of the RBPF is described in Algorithm 4.Algorithm 4 Rao-Blackwellised SCFG Particle Filter1: function RBPF({q(i)1:k−1, w(i)k−1}, xˆk−1,Σk−1zk)2: for i← 1 to Np do3: Sample q(i)k ∼ piRBPF{q(i)k |q(i)1:k−1} using (177)4: Update xˆ(i)k ,Σ(i)k using (176)5: Update importance weights w˜(i)k using (178)6: Wk = ∑Npi=1 w˜(i)k7: for i← 1 to Np do8: Normalize w(i)k =w˜(i)kWk, ∀i = {1, . . . , Np}9: Update xˆk,Σk using (175)10: if Neffective < threshold then11: RESAMPLE{q(i)k , w(i)k }12: for i← 1 to Np do13: w(i)k ←1Np14: return {q(i)1:k, w(i)k }, xˆk,Σk8.4.2 Earley-Stolcke Parser For SCFG InferenceThe main quantity that we are interested for the hybrid state estimation problemis the one-step ahead prediction P{qk|q1:k−1; GSCFG} that can be computed from132a left-right pass over an observed terminal sequence. Calculation of the one-stepahead prediction requires the notion of the “prefix” probability ζk of a symbolsequence given an SCFG model GSCFG. The computation of the prefix probabilityis represented byζk = ∑ν∈(X∪V)∗P{q1, . . . , qk, ν}, (179)where ν is an arbitrary length string in (X ∪ V)∗ denoting all possible arbitrarycombinations of non-terminal and terminal symbols. The expression in (146) rep-resents the probability that the sequence q1:k is the prefix of a sentence generatedfrom a SCFG model GSCFG and it conceptually requires a summation over all pos-sible suffixes. However, it is shown in [22] that the prefix probability ζk can beefficiently computed. A description of this is provided in Sec. 4.6.1. The one-stepprediction utilizes the probabilistic rules of the SCFG model to predict the nextmode in the sequence. This quantity is important in describing a suitable proposaldensity for the particle filter in Sec. 8.4.1. The one-step prediction probabilityP{qk = v|q1:k−1} =P{q1:k−1, qk = v}P{q1:k−1}= ζk(v)ζk−1, (180)where v ∈ Q is an element of the finite mode set Q and we have implicitly as-sumed computation with respect to a particular SCFG model GSCFG and excludedit from the expressions for the sake of brevity. The Earley-Stolcke parser [22] pro-vides an efficient algorithm to compute the quantity in (179) and the related one-step prediction probability in (180). A brief algorithmic description is provided inAlgorithm 1.8.5 numerical examplesIn this section, simulations are carried out using synthetic data assuming an idealpin-hole camera without any lens distortion effects. The characteristics of a cell-phone camera are assumed with focal length of 4mm and a camera sensor formatof 13.2”. A trajectory is simulated in 3D coordinates in the plane at z = 1m perpen-dicular to the optical axis following the different gesture shapes. At this distance,the field of view of the camera is approximately 2m which can be calculated usingθ = 2 ∗ arctan d2 f, (181)where θ is the angle of view, d is the sensor width (which can be obtained bylooking up the sensor format) and f is the focal length of the camera. The field133of view (FOV) can then be obtained from trigonometry. The different shapespresented in Sec. 8.3 are then simulated by using a simple composite of partsmodels. To avoid bias in our experiments, the ground truth is not generated bythe SCFG models. The composite of parts model build each gesture model bysimulating linear parts of the trajectory. For example, a triangle gesture (like thatshown in Fig. 38b) can be generated by concatenating the trajectories generatedby 3 different constant velocity (CV) models. The first CV model is initializedusing a normal distribution having a mean with the position at origin and aunit velocity in the NE direction such that P{x0} ∼ N{[0, 0, 1, 1, 1√2 ,1√2, 0, 0]ᵀ,Σ}.The co-variance matrix Σ can be chosen to generate trajectories with differentsignal-to-noise ratios (SNRs). This model is then used to generate the upwardpart of the arc for τ = 0, . . . , τ1 time points. The second CV model is initial-ized using the final position of the hand in CV model 1 and with velocity mag-nitude vmagτ1 =√(x˙τ1)2 + (y˙τ1)2 equal to the final velocity of the target in CVmodel 1. However, the velocity magnitude is concentrated in the x-axis suchthat P{xτ1+1} ∼ N{[xτ1 , yτ1 , 1, 1, vmagτ1 , 0, 0]ᵀ,Σ}. Finally, the last part of the arctrajectory is created by running a 3rd CV model for time points τ = τ2 + 1, . . . , Tsuch that the initial state P{xτ2+1} ∼ N{[xτ2 , yτ2 , 1, 1,vmagτ2√2,−vmagτ2√2,0]ᵀ,Σ}.The perspective projection is then applied to the location component of the tar-get state contaminated with additive noise producing sensor measurements thatact as input to the proposed filter. The additive noise models the efficacy of thehand/finger localization (in Sec. 8.2.2) used in segmenting the hand from thebackground of the image. It is typically impulsive in nature, but a simplifyingGaussian assumption can be made analogous to [62]. A filter bank is maintainedwith each filter tuned to a particular gesture model. The model with the maxi-mum likelihood at the end of the static gesture is chosen as the correct model andthe filtered state estimate from that model is used for performance metrics. Theroot mean square error in position and velocity is used as a tracking metric andmodel mismatch rates are used as a classification metric. The results are shownin Fig. 39. The SCFG models have lower RMSE than the Markov chain basedmodels.8.6 conclusionA novel physics-based model utilizing an SCFG modulated state-space is formu-lated as a generative model of various gestural commands. The joint tracking andclassification of gestural commands is carried out using a novel Rao-BlackwellisedSCFG particle filter employing a projective Kalman filter for the continuous statevariable. In the numerical simulations, it can be observed that the SCFG mod-1340 5 10 15 2001020304050RMSE in position (cm)SNR (dB) MarkovianSCFG(a)0 5 10 15 20024681012RMSE in velocity (cm/s)SNR (dB) MarkovianSCFG(b)0 0.2 0.4 0.6 0.8 100.20.40.60.81True Positive RateFalse Positive Rate Markovian (AUC = 0.7122)SCFG (AUC = 0.7676)Random ClassifierEqual Error Rate(c)Figure 39: (a) shows the RMSE in position (x, y only) (b) shows the RMSE in velocity(x,y only) and (c) shows the receiver operating characteristics for the triangulargesture models.els outperform Markov chain based gesture models. In particular, at low SNRs(when the hand/finger detection algorithm is performing poorly), the long-rangedependencies in the spatial pattern inform better tracking and classification.1359 C O N C L U S I O N SMeta-level pattern analysis in target tracking is concerned with the characteriza-tion of spatio-temporal trajectories as syntactic stochastic models. Syntactic mod-els for spatio-temporal patterns are formulated using geometric primitives likemovement patterns and positional features. These primitives are then combinedusing stochastic models to devise generative models for scale-invariant patternssignifying target intent. The conventional state space model is shown to be defi-cient when dealing with higher-level reasoning such as anomalous behavior. Thelong-term dependencies of a target plan are incorporated into target dynamicseither through a two-scale hierarchical tracking framework or via the use of aregime-switching state space model. The two models developed are the stochas-tic context-free grammar and the stochastic reciprocal model addressing shapeand destination respectively. The stochastic reciprocal model is shown to be aspecial case of time-varying context-free grammars. The major highlights of thisthesis are in 1) the modeling of spatio-temporal trajectories using such models, 2)the novel signal processing associated with SCFG and RP models, 3) the formu-lation of a two-tier hierarchical tracking system, 4) the formulation of a regime-switching state space model and 5) the novel applications in anomaly detection,track-before-detect and gesture recognition.9.1 discussion of resultsThe research in this dissertation mainly examined time-series of a target stateconsisting of physical variables such as position, velocity and acceleration. A gen-erative model of spatio-temporal trajectories indicating target intent is formulatedusing the concept of SCFG and RP models. The major domain applications are 1)in radar and visual surveillance to detect anomalous trajectories, 2) in joint targetdetection and tracking under low SNR conditions and 3) static gesture recogni-tion and tracking using visual data. The common element in all the applicationsconsidered is the modeling of characteristic spatio-temporal patterns using SCFGand RP models.In Chapter 2, Markovian models are introduced which form the basis of mostconventional tracking systems. The laws of physics obey the Markovian assump-tion at fine-scales. The position, velocity and acceleration of a target can onlychange in a continuous fashion. Consequently, the state of a target at the current136instant in time can be assumed to be dependent only on its immediate past. Inthis chapter, we introduced the terminology used in discrete time-series and thenotion of a hidden state variable. The forward-backward algorithm is introducedas the main inference algorithm used in the classification, state estimation andparameter estimation of HMMs.In Chapter 3, stochastic reciprocal models are introduced as generative modelsfor time-series with destination-constrained trajectories. A reciprocal model ischaracterized by a joint probability distribution on the starting and final state at achosen final time instant. In addition, the transitions between states are governedby a three-point transition probability depending on the immediate past and theimmediate future. Inference using RP models are made possible through the useof a Markov bridge. The three-point transition probabilities are used togetherwith a known final state to backtrack towards an initial state. As a result, pre-computed transition matrices were used within a forward-backward algorithmlike procedure for classification and state estimation. The parameter estimationof RP models was not addressed in this dissertation. The major limitation of theRP approach is the assumption of a known final time at which the target reachesits destination.In Chapter 4, grammar models are introduced as generative models for sym-bolic time-series with hierarchical structure. Such models have found widespreadapplication in application such as protein folding structure, natural language pro-cessing and action recognition. The stochastic context-free grammar is identifiedas the most feasible modeling framework from Chomsky’s hierarchy of grammarmodels due to the availability of polynomial algorithms for inference. More gen-eral frameworks like context-sensitive models have no known polynomial infer-ence algorithms. The Earley-Stolcke algorithm is presented as a top-down parserfor the classification and Viterbi parsing of symbol sequences.In Chapter 5, continuous-time state space models, and more general counter-parts such as switching-state space models, are presented. The probabilistic in-ference using classical methods like the Kalman filter, particle filter and the in-teracting multiple model filter were outlined in preparation for the extension toSCFG-modulated state space models.Chapter 6 contains the bulk of the work on hierarchical two-time scale trackingusing SCFG and RP models. The problem setting is that of an anomaly detec-tion framework seeking to classify a known malicious trajectory amongst a modelset of target trajectories. Both radar-based and visual surveillance scenarios wereconsidered with suitable numerical experiments. The major contribution is thatof using SCFG models to constrain the shape of a trajectory and the use of RPmodels to constrain the target destination. In addition, the SCFG models areimbibed with destination-awareness by deriving first-order moment constraints137on the rule probabilities. The shape and destination constraints are then com-bined using a probabilistic fusion of the individual likelihoods. Several modelsfor trajectories like random walk, destination-aware, rendezvous and palindromepaths were developed together with the signal processing machinery for classi-fication and state estimation. Numerical simulations demonstrated significantperformance increase over competing hidden Markov models.Chapter 7 deviated from the hierarchical two-scale tracking framework and di-rectly incorporated SCFG dynamics within a regime-switching state space model.The assumption of jump Markovian state space models is challenged by adoptinga jump SCFG state space model for target dynamics exhibiting particular trajec-tory patterns. A finite set of directional modes are defined for a maneuveringtarget where the switching between modes is governed by a sequence generatedby an SCFG. The state estimation problem in such a scenario presented a hybridfiltering problem. A sub-optimal particle filtering procedure is developed usingthe prefix probability of the Earley-Stolcke parser as a proposal density for a mul-tiple model particle filter. It is shown that using trajectory models within thephysical state space dynamics allows tracking targets in low SNR with a betterfidelity than competing hidden Markov models.Chapter 8 addressed a new application domain of static gesture recognition.If the hand is represented as a point object, then the evolution of a gesture canbe captured as a spatio-temporal trajectory obeying generation dynamics of anSCFG. Several novel gestural patterns are modeled using SCFGs incorporatinginvariance to signer speed/variation. The joint classification and tracking of ges-tures is then solved using a bank of filters. Moreover, the non-linear projectionin camera sensors is captured using an adaptive linear measurement process sothat the continuous state variables can be filtered using a Kalman filter while thediscrete-state mode variable utilized a particle filter. Such a Rao-Blackwellisationscheme was shown to reduce the variance in the state estimation error. Thischapter also addressed the unavailability of real world data for radar tracking sce-narios. A controlled environment was used to capture real-world gestural datafor numerical experiments.The use of non-Markovian models postulated in this thesis leads to better per-formance when the underlying trajectory dynamics on a coarse-time scale havelong-range dependencies. However, the use of the Earley-Stolcke filter as opposedto an HMM filter comes with a higher computational cost. For example, inferencewith HMMs is O(N2T), where N is the number of states and T is the length of thetrajectory. Comparatively, the Earley-Stolcke filter has a worst case computationalcomplexity of O(N3T3), where N is the number of non-terminals. However, thenumber of states in an HMM used to model a trajectory is usually far greater than138the number of non-terminals used in an equivalent SCFG. This is due to the factthat a non-terminal can represent a hierarchical relationship.9.2 future workThere are several avenues for future work associated with meta-level pattern anal-ysis in target tracking.9.2.1 Approximate Inference Using Context-Sensitive GrammarsThe use of stochastic context-free grammar models is mainly motivated for theiruse in explaining long-range branching dependencies in the time-series. More-over, the existence of polynomial inference algorithms also plays a large part inthe effectiveness of SCFG models. However, certain patterns involving cross-overdependencies (such as copy languages found in RNA/DNA strings) can only begenerated using context-sensitive grammar models. A small class of languagescalled mildly context-sensitive [64] permit polynomial time algorithms for infer-ence and may be used to devise novel models for other anomalous trajectoriesinvolving cross-over dependencies. Additionally, there is very little publishedwork on approximate inference methods for context-sensitive grammar models.For example, the work presented in [65] decomposes a context-sensitive gram-mar as a series of causal stochastic context-free grammars and provides statisticalalgorithms for parameter estimation using such a decomposition.9.2.2 Grammatical Structure InferenceGrammatical inference consists of two parts: estimation of the grammar structure,and the estimation of the model parameters [66, 67]. The parameter estimationof an SCFG, given pre-determined rules can be solved using the inside outsidealgorithm or the Earley-Stolcke parser. The challenge remains, however, for theestimation of the structure (the determination of the form of each rule). In [37] theinduction of regular grammar rules is proposed based on the idea of construct-ing a “canonical” grammar that generates only the strings in the training dataand then replaces pairs of non-terminals with a single non-terminal repeatedly toattempt generalization.In [67], the application of Bayesian model merging is proposed to iterativelyconstruct non-terminals that can generate the strings in the training data. A typi-cal approach to model merging comprises of a data assimilation step and a structuremerging step. Data assimilation builds an initial model by explicitly accommodat-139ing each training example in a training data set. Consequently, the initial modelwill grow with the amount of data and will not exhibit generalization. Struc-ture merging uses a merging operator on the current model to obtain a sequenceof new models. The merging operator coalesces sub-structures in the currentmodel that previously explain shared data points into a composite structure inthe merged model. As a result of the structure merging process, the model movesfrom a naive instance-based model to a more complex generalized model.Another possible approach may be to use genetic programming which hasfound widespread use in problems outside the standard function optimizationformulation. Genetic programming is an extension of genetic algorithms. Whilethe genetic algorithm operates on a population of fixed length binary strings, ge-netic programming operates on a population of parse trees. The mutation andcross-over operators of genetic programming can be iteratively applied to a pop-ulation of parse trees in the search of a tree that minimizes a suitable cost func-tion. Because of its ability to deal with tree structure and discrete knowledgerepresentation, genetic programming may be a plausible candidate in solving thegrammatical structure identification problem.9.2.3 Novel Application DomainsThe main motivation in the use of SCFG models is to challenge the Markovianassumption in time-series analysis involving long-range dependent effects. Forexample, factorial HMMs [68] have enjoyed recent success in the load disaggre-gation problem which involves recognizing the power consumption of individualappliances from an aggregated measurement of total power consumption in ahousehold. The use of an HMM implies that the power consumption trend of aparticular appliance does not contain long-term patterns. However, simple intu-ition suggests that using a particular appliance at a specific time or in a specificplace may affect its own use or the use of other appliances in the future. Such ahierarchical structure is ideally suited towards modeling using SCFG models.In a similar vein, the hierarchical structure involved in the generation of plansis also amenable towards SCFG modeling. For example, the work [69] utilizes aprobabilistic state-dependent grammar to model the plan generation process ofan agent. Using such a model, various plan recognition queries can be answered.Another application which was considered during the course of this thesis, isthat of measurement control using feedback from a syntactic tracker. The scenariois the following: A radar is tasked with persistent surveillance of a surveillancespace containing different assets and targets. The assets can be considered to befriendly and the targets are considered to be hostile. The task of the measurementcontrol algorithm is to measure the threat in the surveillance making use of its140finite radar scheduling resources. The main use of the SCFG modeling frameworkis in the quantification of threat based on the spatio-temporal trajectories exhibitedby different targets.Finally, there is considerable interest in developing stochastic control algorithmsfor an SCFG generation process. The rich theory of Markov decision processes(MDPs) and partially observed Markov decision processes (POMDPs) has enabledvarious applications in communications theory, operations research etc. If theunderlying state of the world evolves as an SCFG, similar decision algorithms canbe devised for optimal control of such processes.141B I B L I O G R A P H Y[1] M. Fanaswala, V. Krishnamurthy, and L. White, “Destination-aware targettracking via syntactic signal processing,” in 36th IEEE International Conferenceon Acoustics, Speech and Signal Processing (ICASSP), May 2011, pp. 3692–3695.[2] M. Fanaswala and V. Krishnamurthy, “Detection of anomalous trajectory pat-terns in target tracking via stochastic context-free grammars and reciprocalprocess models,” IEEE Journal of Selected Topics in Signal Processing, vol. 7,no. 1, pp. 76–90, 2013.[3] V. Krishnamurthy and M. Fanaswala, “Intent inference via syntactic track-ing,” Digital Signal Processing, vol. 21, no. 5, pp. 648–659, 2011.[4] M. Fanaswala and V. Krishnamurthy, “Spatio-temporal trajectory models formeta-level target tracking,” IEEE Aerospace and Electronic Systems Magazine,vol. 30, no. 1, pp. 16–31, Jan 2015.[5] ——, “Syntactic models for trajectory constrained track-before-detect,” IEEETransactions on Signal Processing, vol. 62, no. 23, pp. 6130–6142, Dec 2014.[6] ——, “Meta-level tracking for gestural intent recognition,” in 40th IEEE Inter-national Conference on Acoustics, Speech and Signal Processing (ICASSP), April2015.[7] B. Balaji, M. Fanaswala, and V. Krishnamurthy, “Stochastic context-free gram-mars for scale-dependent intent inference,” in Proceedings of the SPIE, vol.8745, 2013.[8] M. Fanaswala and V. Krishnamurthy, “Syntactic track-before-detect,” in 5thIEEE International Workshop on Computational Advances in Multi-Sensor Adap-tive Processing (CAMSAP), Dec 2013, pp. 41–44.[9] ——, “Spatio-temporal trajectory models for target tracking,” in 17th Interna-tional Conference on Information Fusion (FUSION), July 2014, pp. 1–8.[10] K. P. Murphy, Machine learning: a probabilistic perspective. Cambridge, MA:MIT Press, 2012.[11] K. Fukunaga, Introduction to Statistical Pattern Recognition. Elsevier Science,1990.142[12] A. Webb and K. Copsey, Statistical Pattern Recognition. Wiley, 2011.[13] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, “A maximization technique oc-curring in the statistical analysis of probabilistic functions of markov chains,”The Annals of Mathematical Statistics, vol. 41, no. 1, pp. 164–171, 02 1970.[14] L. R. Rabiner, “A tutorial on hidden Markov models and selected applica-tions in speech recognition,” in Proceedings of the IEEE, 1989, pp. 257–286.[15] X. Guyon, Random Fields on a Network: Modeling, Statistics, and Applications.Springer, 1995.[16] B. Jamison, “Reciprocal processes: The stationary Gaussian case,” Ann. Math.Statist., vol. 41, pp. 1624–1630, 1970.[17] N. Chomsky, “On certain formal properties of grammars,” Information andControl, vol. 2, no. 2, pp. 137 – 167, 1959.[18] C. Cassandras and S. Lafortune, Introduction to Discrete Event Systems.Springer, 2008.[19] I. Salvador and J.-M. Benedi, “Rna modeling by combining stochastic context-free grammars and n-gram models,” International Journal of Pattern Recogni-tion and Artificial Intelligence, vol. 16, no. 03, pp. 309–315, 2002.[20] C. D. Manning and H. Schütze, Foundations of Statistical Natural LanguageProcessing. MIT Press, 1999, vol. 999.[21] J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages,and Computation. Addison-Wesley, 1979.[22] A. Stolcke, “An efficient probabilistic context-free parsing algorithm thatcomputes prefix probabilities,” Computational Linguistics, vol. 21, no. 2, pp.165–201, 1995.[23] K. Lari and S. J. Young, “The estimation of stochastic context-free grammarsusing the inside-outside algorithm,” Computer Speech and Language, vol. 4,no. 1, pp. 35 – 56, 1990.[24] Y. Bar-Shalom, T. Kirubarajan, and X. Li, Estimation with Applications to Track-ing and Navigation. New York, NY, USA: John Wiley & Sons, Inc., 2002.[25] B. Ristic, S. Arulampalam, and N. Gordon, Beyond the Kalman Filter: ParticleFilters for Tracking Applications. Artech House, 2004.143[26] M. Mallick, V. Krishnamurthy, and B. Vo, Integrated Tracking, Classification,and Sensor Management: Theory and Applications. Wiley, 2012.[27] J. Mohandoss, “A novel approach to detect anomalous behaviour using ges-ture recognition,” in Proceedings of the Fourth International Conference on Signaland Image Processing 2012 (ICSIP 2012). Springer India, 2013, vol. 222, pp.113–126.[28] M. Takahashi, M. Fujii, M. Naemura, and S. Satoh, “Human gesture recog-nition system for TV viewing using time-of-flight camera,” Multimedia Toolsand Applications, vol. 62, no. 3, pp. 761–783, 2013.[29] K. G. Derpanis, M. Sizintsev, K. J. Cannons, and R. P. Wildes, “Action spot-ting and recognition based on a spatiotemporal orientation analysis,” IEEETransactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp. 527–540, 2013.[30] J. W. Palmer, K. F. Bing, A. C. Sharma, and J. B. Perkins, “Exploitation ofradar doppler signatures for gait analysis,” in Forty Sixth Asilomar Conferenceon Signals, Systems and Computers (ASILOMAR). IEEE, 2012, pp. 629–632.[31] G. Androulidakis, V. Chatzigiannakis, and S. Papavassiliou, “Networkanomaly detection and classification via opportunistic sampling,” IEEE Net-work, vol. 23, no. 1, pp. 6–12, 2009.[32] S. Liu, L. Ni, and R. Krishnan, “Fraud detection from taxis’ driving behav-iors,” IEEE Transactions on Vehicular Technology, vol. 63, no. 1, pp. 464–472, Jan2014.[33] B. T. Morris and M. M. Trivedi, “Trajectory learning for activity understand-ing: Unsupervised, multilevel, and long-term adaptive approach,” IEEETransactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp.2287–2301, 2011.[34] M. S. Ryoo and J. K. Aggarwal, “Recognition of composite human activitiesthrough context-free grammar based representation,” in Proceedings of the2006 Conference on Computer Vision and Pattern Recognition (CVPR ’06), 2006,pp. 1709–1718.[35] A. Bobick and Y. Ivanov, “Action recognition using probabilistic parsing,” inProceedings of the 1998 Conference on Computer Vision and Pattern Recognition(CVPR’98), June 1998, pp. 196 –202.144[36] A. V. Aho and J. D. Ullman, The theory of parsing, translation, and compiling.Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1972.[37] K. S. Fu, Syntactic Pattern Recognition and Applications. Englewood Cliffs, NJ,USA: Prentice-Hall, Inc., 1982.[38] C. S. Wetherell, “Probabilistic languages: A review and some open ques-tions,” ACM Comput. Surv., vol. 12, pp. 361–379, December 1980.[39] J. Benediktsson and P. Swain, “Consensus theoretic classification methods,”IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 4, pp. 688–704,1992.[40] U. Grenander, Elements of Pattern Theory. Johns Hopkins University Press,1996.[41] S.-C. Zhu and D. Mumford, “A stochastic grammar of images,” Foundationsand Trends in Computer Graphics and Visualization, vol. 2, no. 4, pp. 259–362,Jan. 2006.[42] M. Skolnik, Introduction to Radar Systems. McGraw-Hill, 1962.[43] P. Maybeck and D. Mercier, “A target tracker using spatially distributed in-frared measurements,” Automatic Control, IEEE Transactions on, vol. 25, no. 2,pp. 222–225, 1980.[44] Y. Barniv, “Dynamic programming solution for detecting dim moving tar-gets,” Aerospace and Electronic Systems, IEEE Transactions on, vol. AES-21, no. 1,pp. 144–156, Jan 1985.[45] M. Bruno, “Bayesian methods for multi-aspect target tracking in image se-quences,” Signal Processing, IEEE Transactions on, vol. 52, no. 7, pp. 1848–1861,2004.[46] D. J. Salmond and H. Birch, “A particle filter for track-before-detect,” inProceedings of the American Control Conference, Arlington, VA, USA, June 2001,pp. 3755–3760.[47] D. Samuel J, R. Mark G, and C. Brian, “A comparison of detection perfor-mance for several track-before-detect algorithms,” EURASIP Journal on Ad-vances in Signal Processing, vol. 2008, 2007.[48] M. Walsh, M. Graham, R. Streit, T. Luginbuhl, and L. Mathews, “Trackingon intensity-modulated sensor data streams,” in Aerospace Conference, 2001,IEEE Proceedings., vol. 4, 2001, pp. 4/1901–4/1909 vol.4.145[49] B.-N. Vo, B.-T. Vo, N.-T. Pham, and D. Suter, “Joint detection and estimationof multiple objects from image observations,” Signal Processing, IEEE Transac-tions on, vol. 58, no. 10, pp. 5129–5141, 2010.[50] S. Davey, M. Rutten, and N. Gordon, “Track-before-detect techniques,” inIntegrated Tracking, Classification, and Sensor Management, M. Mallick, V. Krish-namurthy, and B.-N. Vo, Eds. John Wiley and Sons, 2013, pp. 43–74.[51] E. Grossi, M. Lops, and L. Venturino, “A novel dynamic programming algo-rithm for track-before-detect in radar systems,” Signal Processing, IEEE Trans-actions on, vol. 61, no. 10, pp. 2608–2619, 2013.[52] J. Yin, X. Chai, and Q. Yang, “High-level goal recognition in a wireless LAN,”in Proceedings of the 19th National Conference on Artifical intelligence. AAAIPress, 2004, pp. 578–583.[53] E. Marhasev, M. Hadad, G. A. Kaminka, and U. Feintuch, “The use of hid-den semi-Markov models in clinical diagnosis maze tasks,” Intelligent DataAnalysis, vol. 13, no. 6, pp. 943–967, 2009.[54] S.-W. Joo and R. Chellappa, “Attribute grammar-based event recognition andanomaly detection,” in Proceedings of the 2006 Conference on Computer Visionand Pattern Recognition Workshop, ser. CVPRW ’06. Washington, DC, USA:IEEE Computer Society, 2006, pp. 107–115.[55] A. Wang, V. Krishnamurthy, and B. Balaji, “Intent inference and syntactictracking with GMTI measurements,” IEEE Transactions on Aerospace and Elec-tronic Systems, vol. 47, no. 4, pp. 2824–2843, 2011.[56] Y. Bao, C. Chiarella, and B. Kang, “Particle filters for Markov switchingstochastic volatility models,” Quantitative Finance Research Centre, Univer-sity of Technology, Sydney, Tech. Rep., January 2012.[57] F. Quek, “Unencumbered gestural interaction,” MultiMedia, IEEE, vol. 3,no. 4, pp. 36–47, 1996.[58] Y. Nam, K. Wohn et al., “Recognition of space-time hand-gestures using hid-den markov model,” in ACM symposium on Virtual reality software and technol-ogy, 1996, pp. 51–58.[59] M. Brand, N. Oliver, and A. Pentland, “Coupled hidden markov models forcomplex action recognition,” in Computer Vision and Pattern Recognition, 1997.Proceedings., 1997 IEEE Computer Society Conference on. IEEE, 1997, pp. 994–999.146[60] G. S. Chambers, S. Venkatesh, G. A. West, and H. H. Bui, “Hierarchical recog-nition of intentional human gestures for sports video annotation,” in PatternRecognition, 2002. Proceedings. 16th International Conference on, vol. 2. IEEE,2002, pp. 1082–1085.[61] R. Szeliski, Computer Vision: Algorithms and Applications, ser. Texts in com-puter science. Springer, 2010.[62] C. Canton-Ferrer, J. R. Casas, A. M. Tekalp, and M. Pardas, “Projectivekalman filter: Multiocular tracking of 3d locations towards scene understand-ing,” in Machine Learning for Multimodal Interaction. Springer, 2006, pp. 250–261.[63] M. Cristani, M. Farenzena, D. Bloisi, and V. Murino, “Background sub-traction for automated multisensor surveillance: a comprehensive review,”EURASIP Journal on Advances in signal Processing, vol. 2010, p. 43, 2010.[64] K. Nakamura and K. Imada, “Towards incremental learning of mildlycontext-sensitive grammars,” in 10th International Conference on Machine Learn-ing and Applications and Workshops (ICMLA), 2011, vol. 1, Dec 2011, pp. 223–228.[65] J. Huang, D. Schonfeld, and V. Krishnamurthy, “A new context-sensitivegrammars learning algorithm and its application in trajectory classification,”in 2012 19th IEEE International Conference on Image Processing (ICIP), Sept 2012,pp. 3093–3096.[66] G. Chanda and F. Dellaert, “Grammatical methods in computer vision: Anoverview,” Georgia Institute of Technology, Tech. Rep., 2004.[67] A. Stolcke and S. Omohundro, “Inducing probabilistic grammars by Bayesianmodel merging,” in Grammatical Inference and Applications. Springer, 1994,pp. 106–118.[68] A. Zoha, A. Gluhak, M. Nati, and M. Imran, “Low-power appliance moni-toring using factorial hidden markov models,” in IEEE Eighth InternationalConference on Intelligent Sensors, Sensor Networks and Information Processing,2013, April 2013, pp. 527–532.[69] D. V. Pynadath and M. P. Wellman, “Probabilistic state-dependent grammarsfor plan recognition,” in Proceedings of the Sixteenth conference on Uncertainty inArtificial Intelligence. Morgan Kaufmann Publishers Inc., 2000, pp. 507–514.147
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Meta-level pattern analysis for target tracking
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Meta-level pattern analysis for target tracking Fanaswala, Mustafa H. 2015
pdf
Page Metadata
Item Metadata
Title | Meta-level pattern analysis for target tracking |
Creator |
Fanaswala, Mustafa H. |
Publisher | University of British Columbia |
Date Issued | 2015 |
Description | Classical target tracking operates on a fast time-scale (order of seconds) during which target dynamics are constrained by the physical laws of motion. This dissertation is motivated by pattern analysis on a meta-level (order of minutes or larger) during which the intent of a target manifests. On such a coarse time-scale, Markovian models quantifying the physical laws of motion are not useful in detecting anomalous behavior. This is due to the hierarchical nature of plans and goal-oriented targets. In this dissertation, several novel stochastic models are devised to capture long-range dependencies within target trajectories for the joint purpose of classification and enhanced tracking. The behavior of targets on the meta level is captured through both positional features (destination) as well as movement patterns (shape). Such features are used with context-free grammar models and reciprocal Markov models (one dimensional Markov random fields) for modeling spatial trajectories with a known end point. The intent of a target is assumed to be a function of the shape of the trajectory it follows and its intended destination. The stochastic grammar models developed are concerned with trajectory shape classification while the reciprocal Markov models are used for destination prediction. Towards this goal, Bayesian signal processing algorithms with polynomial complexity are also presented. The versatility of such models is illustrated with tracking applications in radar-based and optical surveillance. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2015-04-16 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0166215 |
URI | http://hdl.handle.net/2429/52813 |
Degree |
Doctor of Philosophy - PhD |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2015-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2015_may_fanaswala_mustafa.pdf [ 5.99MB ]
- Metadata
- JSON: 24-1.0166215.json
- JSON-LD: 24-1.0166215-ld.json
- RDF/XML (Pretty): 24-1.0166215-rdf.xml
- RDF/JSON: 24-1.0166215-rdf.json
- Turtle: 24-1.0166215-turtle.txt
- N-Triples: 24-1.0166215-rdf-ntriples.txt
- Original Record: 24-1.0166215-source.json
- Full Text
- 24-1.0166215-fulltext.txt
- Citation
- 24-1.0166215.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0166215/manifest