Adaptive Trend Change Detection and Pattern Recognition in Physiological Monitoring by Ping Yang A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Electrical and Computer Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) June 2009 c Ping Yang 2009 Abstract Advances in monitoring technology have resulted in the collection of a vast amount of data that exceeds the simultaneous surveillance capabilities of expert clinicians in the clinical environment. To facilitate the clinical decisionmaking process, this thesis solves two fundamental problems in physiological monitoring: signal estimation and trend-pattern recognition. The general approach is to transform changes in different trend features to nonzero shifts by calculating the model-based forecast residuals and then to apply a statistical test or Bayesian approach on the residuals to detect changes. The EWMA-Cusum method describes a signal as the exponentially moving weighted average (EWMA) of historical data. This method is simple, robust, and applicable to most variables. The method based on the Dynamic Linear Model (refereed to as Adaptive-DLM method) describes a signal using the linear growth model combined with an EWMA model. An adaptive Kalman filter is used to estimate the second-order characteristics and adjust the change-detection process online. The Adaptive-DLM method is designed for monitoring variables measured at a high sampling rate. To address the intraoperative variability in variables measured at a low sampling rate, a generalized hidden Markov model is used to classify trend changes into different patterns and to describe the transition between these patterns as a first-order Markov-chain process. Trend patterns are recognized online with a quantitative evaluation of the occurrence probability. In addition to the univariate methods, a test statistic based on Factor Analysis is also proposed to investigate the inver-variable relationship and to reveal subtle clinical events. A novel hybrid median filter is also proposed to fuse heartrate measurements from the ECG monitor, pulse oximeter, and arterial BP monitor to obtain accurate estimates of HR in the presence of artifacts. These methods have been tested using simulated and clinical data. The EWMA-Cusum and Adaptive-DLM methods have been implemented in a software system iAssist and evaluated by clinicians in the operating room. The results demonstrate that the proposed methods can effectively detect trend changes and assist clinicians in tracking the physiological state of a patient during surgery. ii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . xii 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.1 Cognitive challenges in physiological monitoring 1.1.2 Standard alarm system . . . . . . . . . . . . . . 1.1.3 Decision making in physiological monitoring . . 1.2 Problem statement . . . . . . . . . . . . . . . . . . . . 1.2.1 Aims of study . . . . . . . . . . . . . . . . . . . 1.2.2 Scope of application . . . . . . . . . . . . . . . . 1.2.3 Challenges . . . . . . . . . . . . . . . . . . . . . 1.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 1 2 4 6 6 7 7 9 2 Literature review . . . . . . . . . . . . . . . . . 2.1 Overview of clinical decision-support systems 2.2 Early stage trend-change detection . . . . . . 2.3 Diagnostic monitoring . . . . . . . . . . . . . 2.4 Temporal abstraction . . . . . . . . . . . . . 2.5 Artifact removal and signal estimation . . . . 2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 11 12 15 18 22 23 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Table of Contents 3 Change detection techniques and study procedure 3.1 Level-shift change detection . . . . . . . . . . . . . . 3.1.1 Uncertainty in level shifts . . . . . . . . . . . 3.1.2 Hypothesis testing . . . . . . . . . . . . . . . 3.1.3 Sequential probability ratio test . . . . . . . 3.1.4 Bayesian approaches . . . . . . . . . . . . . 3.2 Study procedure . . . . . . . . . . . . . . . . . . . . 3.2.1 Data acquisition . . . . . . . . . . . . . . . . 3.2.2 Data annotation . . . . . . . . . . . . . . . . 3.2.3 Model selection . . . . . . . . . . . . . . . . 3.2.4 Performance evaluation . . . . . . . . . . . . 4 Two-level change detection based on the EWMA 4.1 Method . . . . . . . . . . . . . . . . . . . . . . . . 4.1.1 EWMA-based forecasting model . . . . . . 4.1.2 Cusum-based multilevel change detection . 4.1.3 Abruptness of trend changes . . . . . . . . 4.1.4 Alert management . . . . . . . . . . . . . . 4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Example case . . . . . . . . . . . . . . . . 4.2.2 Performance evaluation . . . . . . . . . . . 4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 25 25 26 27 30 31 31 31 31 33 model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 34 34 35 36 36 37 38 38 40 5 Adaptive trend change detection based on the linear growth dynamic model . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Linear growth dynamic linear model . . . . . . . . . . 5.1.2 Adaptive Kalman filter . . . . . . . . . . . . . . . . . 5.1.3 EWMA of the DLM predictions . . . . . . . . . . . . 5.1.4 Adaptive change detection . . . . . . . . . . . . . . . 5.1.5 Trend patterns . . . . . . . . . . . . . . . . . . . . . . 5.1.6 Trigg’s tracking signal . . . . . . . . . . . . . . . . . . 5.2 Test on heart rate trend signals . . . . . . . . . . . . . . . . 5.2.1 Q and R estimation . . . . . . . . . . . . . . . . . . . 5.2.2 Parameter tuning . . . . . . . . . . . . . . . . . . . . 5.2.3 Comparison with Trigg’s tracking signal . . . . . . . . 5.2.4 Sensitivity analysis . . . . . . . . . . . . . . . . . . . 5.3 Results on EtCO2 , MVexp, and Ppeak . . . . . . . . . . . . 5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 44 44 45 48 49 49 50 50 51 52 53 55 57 58 iv Table of Contents 6 Adaptive change detection based on the GHMM . . . . . 6.1 Generalized hidden Markov model . . . . . . . . . . . . . . 6.2 Fixed-point switching Kalman smoother . . . . . . . . . . . 6.2.1 State-space intra-segmental model . . . . . . . . . . 6.2.2 Review of state estimation with the GHMM . . . . 6.2.3 Fixed-point Kalman smoother . . . . . . . . . . . . 6.2.4 Generalized Pseudo Bayesian algorithm . . . . . . . 6.2.5 Overall procedure . . . . . . . . . . . . . . . . . . . 6.2.6 Computational and space complexity . . . . . . . . 6.3 Adaptive Cusum test based on the GHMM . . . . . . . . . 6.3.1 Prerequisites for the intra-segmental model . . . . . 6.3.2 Offline solution: extended Viterbi algorithm . . . . 6.3.3 Online solution: beam search with Cusum pruning . 6.3.3.1 Beam search scheme . . . . . . . . . . . . 6.3.3.2 Adaptive Cusum test . . . . . . . . . . . . 6.4 Application to NIBPmean monitoring . . . . . . . . . . . . 6.4.1 Parameter estimation for the Markov chain regime . 6.4.2 Parameter estimation for the intra-segmental models 6.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.1 Results of the switching Kalman smoother . . . . . 6.5.1.1 Results on simulated data . . . . . . . . . 6.5.1.2 Results on clinical data . . . . . . . . . . . 6.5.2 Results of the adaptive Cusum test . . . . . . . . . 6.5.3 Summary of the results . . . . . . . . . . . . . . . . 6.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 63 66 66 66 67 67 70 72 74 74 74 78 79 81 83 84 85 86 87 87 89 90 92 93 7 Multivariate change detection based on factor analysis 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 7.2 Factor analysis . . . . . . . . . . . . . . . . . . . . . . . . 7.2.1 Maximum likelihood estimation of a factor model 7.2.2 Change detection based on the FA . . . . . . . . . 7.3 Simulated test . . . . . . . . . . . . . . . . . . . . . . . . 7.3.1 Generation of simulated scenarios . . . . . . . . . 7.3.2 Case study . . . . . . . . . . . . . . . . . . . . . . 7.3.3 Performance of the Cusum test on F and E . . . 7.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 97 98 99 100 102 102 104 107 108 . . . . . . . . . . v Table of Contents 8 Artifact detection and data reconciliation . . . . . . 8.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . 8.1.1 Univariate temporal filtering . . . . . . . . . . 8.1.1.1 Kalman filter as MMSE estimator . . 8.1.1.2 Median filter as optimal L-1 filter . . 8.1.2 Data reconciliation with structural redundancy 8.1.3 Dynamic data reconciliation . . . . . . . . . . 8.1.3.1 Kalman filter for data reconciliation 8.1.3.2 A novel hybrid median filter . . . . . 8.2 Evaluation of performance . . . . . . . . . . . . . . . 8.2.1 Simulation test . . . . . . . . . . . . . . . . . . 8.2.2 Case studies with clinical data . . . . . . . . . 8.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 111 111 112 113 115 117 117 118 121 121 125 127 9 Software implementation and clinical testing . . 9.1 Software implementation: iAssist . . . . . . . . . 9.2 Clinical study . . . . . . . . . . . . . . . . . . . . 9.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 129 130 132 . . . . . . . . 10 Conclusion and future work . . . . . . . . . . . . . . . . . . . 134 10.1 Summary: work accomplished . . . . . . . . . . . . . . . . . 134 10.2 Future work: the road ahead . . . . . . . . . . . . . . . . . . 136 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 I Covariance estimation in the adaptive Kalman filter . . . . . 157 II List of physiological variables . . . . . . . . . . . . . . . . . . 159 vi List of Tables 4.1 5.1 5.2 5.3 6.1 6.2 6.3 6.4 Detection results of the EWMA-Cusum 2-level alert method for EtCO2 , MVexp, Ppeak, and NIBPmean . . . . . . . . . . 41 Change detection results of the Adaptive-DLM method and the TTS approach on three groups of HR signals . . . . . . . 56 Distribution of the Q, R estimates generated by the AdaptiveDLM method for HR, EtCO2 , MVexp, and Ppeak . . . . . . 57 Area under the entire or part of the ROC curve of the AdaptiveDLM method for EtCO2 , MVexp, and Ppeak . . . . . . . . . 58 Segmental states in physiological trend signals . . . . . . . . . Number of the possible point states for st−1 given sk in the GHMM-based switching Kalman smoother . . . . . . . . . . . Performance of change detection of the standard Cusum test, Adaptive Cusum test, and GHMM-based switching Kalman smoother on simulated data . . . . . . . . . . . . . . . . . . . Performance of change detection of the standard Cusum test, Adaptive Cusum test, and GHMM-based switching Kalman smoother on clinical data . . . . . . . . . . . . . . . . . . . . 64 73 95 95 7.1 Detection of hemorrhage and varying depth of anesthesia using the Cusum test on the statistics F and E based on a factor model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 8.1 Comparison of the signal estimation accuracy of the hybrid median filter and the Kalman filter for the simulated cases in Phase I: cases without artifacts . . . . . . . . . . . . . . . . . 125 Comparison of the signal estimation accuracy of the hybrid median filter and the Kalman filter for the simulated cases in Phase II: cases with artifacts . . . . . . . . . . . . . . . . . . 125 8.2 9.1 Clinical evaluation of the real-time performance of iAssist . . 131 vii List of Figures 1.1 1.2 Intraoperative information flow processed by an anesthesiologist Cognitive model for physiological monitoring . . . . . . . . . 3.1 Probability distributions of the forecast residuals for two ferent patterns . . . . . . . . . . . . . . . . . . . . . . . V-mask on a Cusum plot . . . . . . . . . . . . . . . . . Interface of an event annotation software tool eViewer . 3.2 3.3 4.1 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 6.1 6.2 6.3 6.4 3 5 dif. . . . . . . . . 27 28 32 Change-point detection and alert management process of the EWMA-Cusum method illustrated on an EtCO2 signal . . . . 39 Schematic of the Adaptive-DLM trend-change detection method Adaptive Kalman filtering process . . . . . . . . . . . . . . . Q and R estimates generated by the Adaptive-DLM method for a simulated HR trend signal . . . . . . . . . . . . . . . . . Process of the Adaptive-DLM method illustrated on a HR trend signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . Detection results of the Adaptive-DLM method and the TTS approach on two example signals . . . . . . . . . . . . . . . . ROC curves of the Adaptive-DLM method for HR . . . . . . Time delay of the Adaptive-DLM method for trend-change detection in HR . . . . . . . . . . . . . . . . . . . . . . . . . . ROC curves of the Adaptive-DLM method for EtCO2 , MVexp, and Ppeak . . . . . . . . . . . . . . . . . . . . . . . . . . Two example NIBPmean trend signals: similar patterns have different degrees of significance . . . . . . . . . . . . . . . . . Graphical model for the generalized hidden Markov model . . Overall procedure of the fixed-point switching Kalman smoother Linked lists for compact storage of the conditional estimates in the GHMM-based switching Kalman smoother . . . . . . . 45 46 52 53 54 56 57 59 62 64 71 73 viii List of Figures 6.5 6.6 6.7 6.8 6.9 6.10 6.11 6.12 6.13 7.1 7.2 7.3 8.1 8.2 8.3 Design of the upper arm of the test mask for the GHMMbased adaptive Cusum test . . . . . . . . . . . . . . . . . . . Estimation accuracy of the switching Kalman smoother with different smoothing window sizes . . . . . . . . . . . . . . . . Signal-to-noise ratio of the change probability estimated using the switching Kalman smoother with different smoothing window sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . Estimation error and the computational complexity of the switching Kalman smoother with different pre-thresholds hW ROC curves of the GHMM-based switching Kalman smoother for simulated data and clinical data . . . . . . . . . . . . . . . Online pattern recognition process of the switching Kalman smoother demonstrated on an NIBPmean trend signal . . . . Change detection results of the GHMM-based adaptive Cusum test and the standard Cusum test on an NIBPmean trend signal ROC curves of the standard Cusum test and the adaptive Cusum test on simulated data . . . . . . . . . . . . . . . . . . ROC curves of the standard Cusum test and the adaptive Cusum test on clinical data . . . . . . . . . . . . . . . . . . . 8.6 8.7 9.1 Real-time performance evaluation with iAssist 8.5 88 89 90 91 92 93 94 94 Intraoperative events cause changes in a factor model . . . . 101 Variations of the magnitudes of common factors (F ) and variations of the covariance structure (E) during bleeding . . . . 105 Variations of the magnitudes of common factors (F ) and variations of the covariance structure (E) during light anesthesia 106 Standard, structural, and hybrid median filters . . . . . . . . Measurement redundancy in the HR measurements . . . . . . Kalman filter used for estimating heart rate from multiple measurement channels . . . . . . . . . . . . . . . . . . . . . . Effect of the hybrid median filter on two channels of measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Estimation accuracy of the hybrid median filter compared with that of the Kalman filter . . . . . . . . . . . . . . . . . . Performance of the hybrid median filter in clinical case (1) . . Performance of the hybrid median filter in clinical case (2) . 8.4 82 113 115 118 122 123 126 127 . . . . . . . . 131 ix Acronyms ANN AR ARDS AUC AUPC BN BP CPV DLM ECG EEG EM EWMA FA FDA GHMM GLR GPB HMM HR ICU MAP MAP ML MMSE MSE MV OR PCA Artificial Neural Network Autoregressive Adult Respiratory Distress Syndrome Area Under a ROC Curve Area Under Part of the Curve Bayesian Network Blood Pressure Cumulative Percent Variance Dynamic Linear Model Electrocardiogram Electroencephalography Expectation-Maximization Exponentially Moving Weighted Average Factor Analysis Food and Drug Administration Generalized Hidden Markov Model Generalized Likelihood Ratio Generalized Pseudo Bayesian Hidden Markov Model Heart Rate Intensive Care Unit Maximum a Posteriori Mean Arterial Pressure Maximum Likelihood Minimum Mean Square Error Mean Square Deviation Moving Average Operating Room Principal Component Analysis x Acronyms RMS RMSE ROC RSD SN SPRT SQA STD TA TTS WLS Root Mean Square Root Mean Square Error Receiver Operating Characteristic Relative Standard Deviation Signal-to-Noise Sequential Probability Ratio Test Software Quality Audit Standard Deviation Temporal Abstraction Trigg’s Tracking Signal Weighted-Least-Square xi Acknowledgements I would like to thank my supervisors, Professor Guy A. Dumont and Professor J. Mark Ansermino. Working with them has been the most rewarding experience in my life so far. They have provided so much support and guidance, and at the same time enough freedom for me to explore my interests, throughout the course of my PhD study. This thesis could not have been completed without their help. I have learned from them not only how to conduct research, but also how to cooperate with different people and how to lead a team. They also set an example of dedication and patience that will be with me for the rest of my life. I would also like to thank Chris Brouse and William Magruder. Chris Brouse worked closely with me on the physiological monitoring project. He developed a software tool to apply some of the methods proposed in this thesis and conducted most of the clinical study. William Magruder provided me with enormous help in editing and revising my publications and the whole thesis. I deeply appreciate their help. I also want to extend thanks to Joanne Lim, Simon Ford, Natasha McCartney, and all the other people who have helped over the past five years. It has been a great pleasure working with Mande Leung, Prasad Shrawane, Ginna Ng, Pedram Ataee, Ali Shahidi Zandi, Behnam Molavi, and all my other fellow students in the laboratory of Electrical and Computer Engineering in Medicine. They have kindly provided me with coffee and kept me in good company while I wrote the thesis. Special thanks for that! Finally I want to thank my family for their unconditional love and support all these years. They make everything I have accomplished meaningful. xii Chapter 1 Introduction Advances in technology have resulted in exponential growth in the amount of physiological data collected in the clinical environment. Despite the technological advances, improvement in patient safety and surgery outcomes is limited by the imperfect simultaneous monitoring capabilities of humans. An effective physiological monitoring system is required to reduce the cognitive overload and facilitate the decision-making process in critical care environments. This thesis aims to estimate physiological signals from noisy measurements and detect clinically relevant trend changes. These two problems are fundamental for the task of physiological monitoring and can be solved by methodologies based on statistical forecasting models. The results of trend-pattern recognition can assist clinicians in tracking variations in the physiological state of a patients and provide semantic temporal abstracts for knowledge-based diagnosis. 1.1 1.1.1 Motivation Cognitive challenges in physiological monitoring The clinical environment has experienced significant changes in recent decades due to the development of technology. In the 1960s, most clinical environments were equipped with only very basic monitoring devices, often including a blood pressure (BP) monitor, an electrocardiogram (ECG), and a flow monitor. Clinicians had to manually record measurements from these stand-alone devices [134]. Over the next two decades, sophisticated monitoring technologies such as pulse oximetry, spectrometrical gas analysis, and intravascular BP measurement were introduced to the clinical environment. Basic medical devices have also been enhanced through the use of high-frequency measurement and electronic recording. The architecture of physiological monitoring systems was also redesigned with the development of modular communication interfaces to various devices (see Figure 1.1). A central console is used to integrate all the measurements from different monitoring devices, resulting in a flexible and ex1 1.1. Motivation tendable monitoring system. The first commercial integrated physiological monitor was introduced in the early 1990s [151]. Today central physiological monitors have become standard equipments in the Operating Room (OR) and Intensive Care Unit (ICU). The purpose of physiological monitoring is to identify and correct undesirable situations and to optimize the patient’s care. Clinicians, as the operators of this process, use the information from different sources to gradually reduce the range of diagnostic possibilities and make final decision. For example, in the OR, to keep the patient in a stable state, the anesthesiologist undertakes multiple tasks during surgery (see Figure 1.1). The anesthesiologist needs to monitor physiological variables, observe clinical signs such as eye blinks and chest movement, and exchange information with other team members. The information from different sources is integrated with expert knowledge to produce a real-time evaluation of a patient’s state. Physiological variables are the most important indicators of the patient’s physiological state and require constant vigilance and context-sensitive analysis. A typical central physiological monitor provides integrated collection and display of the physiological data but lacks integral data analysis [150]. Data analysis is still mostly performed by clinicians. According to Miller [87], when human memory is used for conscious deliberations about a topic of concern (such as a patient’s current or past states), only seven chunks of information can be remembered at any given time. It is impossible for even the most experienced clinicians to maintain constant surveillance of all the physiological variables in the current clinical environment. This cognitive overload causes fatigue in clinicians and reduces the potential of new technology in enhancing patient safety [29, 91, 138]. 1.1.2 Standard alarm system Most physiological monitors have a built-in alert system that monitors each of the variables individually. Typically, if the value of a variable strays outside a preset range, an alarm will be triggered. Unfortunately, false and unnecessary alarms are so frequent (over 90% in the survey [74]) that the alarms represent a disturbance rather than cognitive assistance in real clinical settings [23, 77, 86, 141]. In many cases, clinicians simply disable the audible alarms at the beginning of a case. Real-time analysis of physiological variables has long been recognized as the bottleneck of the information flow in the critical care environment [28]. Clinicians have demonstrated growing awareness of the benefits of automatic physiological monitoring [38, 152]. 2 1.1. Motivation Physiological monitor Variables HR ECG waveform Blood Pressure Temperature SpO2 Respiratory variables End-tidal CO2 Capnogram Anesthetic fraction O2 Consumption Noise, false alarms Fatigue Perturbation Decision support system Surgeon Nurses Students Anesthesiologist Observing patient Treatment Airway management Anesthesia maintenance Fluid management Figure 1.1: Information flow processed by an anesthesiologist during surgery: A large amount of data is received from from different sources at a high frequency. The anesthesiologist needs to integrate all the information to make a real-time decision about the patient’s state and simultaneously apply interventions to maintain the patient in a desirable state. A decisionsupport system is needed to reduce the cognitive load on the anesthesiologist in the task of patient monitoring. 3 1.1. Motivation 1.1.3 Decision making in physiological monitoring The aim of physiological monitoring is to provide a context-sensitive evaluation of the patient’s physiological state. Clinicians analyze the variations of a patient’s physiological variables over time and the relations between these variables to derive a decision. This cognitive process can be divided according to the level of interpretation into five steps: noise reduction, trend-change detection, temporal reasoning, scenario summarization, and diagnosis (see Figure 1.2). From the bottom up, each step combines new information with the results of the previous step, and generates an abstract more relevant to the final diagnosis. The multidimensional, interrelated, time-varying numerical data are transformed into semantic descriptions of a patient’s pathophysiological states that can be further used in treatment planning. Below are the subtasks handled in each step: • Signal estimation: This step removes artifacts in the raw data and estimates the true values of each variable. • Trend-change detection: A numerical signal level does not offer a direct indication of patient status. Instead, changes in trend patterns reflect a patient’s physiological dynamics and are of greater interest in patient monitoring. This step removes random variations in a physiological trend signal and recognizes trend patterns resulting from a changed physiological state. A trend segment can be treated as a basic unit of information in the perception and analysis of trend signals. • Symptom extraction: A symptom as the basic manifestation of clinical events often consists of multiple trend segments over time and/or from different variable. Trend patterns detected in the pervious step is synchronized and congregated in this step according to domain knowledge to construct a meaningful trend complex. • Scenario summarization: This step prioritizes clinical symptoms, given the patient demographic information, premedication, and treatment sequence, and removes irrelevant information in the current context. For example, a decrease in Heart Rate (HR) and BP, after an increase of Propofol infusion rate, can be treated as normal responses, since Propofol is known to depress the cardiovascular system. • Diagnosis: Clinicians use their expert knowledge to analyze scenarios and derive a final evaluation of a patient’s physiological state and the condition of the life-supporting equipments. Hierarchical structures have also been adopted by other researchers to describe the cognitive process in physiological monitoring [76, 90, 127, 144]. 4 1.1. Motivation Diagnosis Knowledge Scenario Scenario summarization Context Symptoms Demographics Premeditation Health history Treatment ... Symptom extraction Trends Anesthesiologist Trend-change detection Interpretation Clean data Dynamic features inter-variable relationship Noise reduction Raw measurements Time Figure 1.2: The cognitive model of physiological monitoring: expert knowledge is used to process numerical data and generate interpretations of different levels of clinical relevance. A decision-support system does not need to accomplish all the information processing tasks involved in the entire decision-making process. An effective support at any level of interpretation can reduce the cognitive requirement of the corresponding subtask and facilitate clinicians’ overall reasoning process. Human factors studies have revealed that transferring the numerical measurements to clinicians’ awareness is the most time-consuming process [28]. Therefore automatic signal estimation and trend-change detection may significantly reduce clinicians’ cognitive load [43]. Different knowledge representation techniques and reasoning paradigms should be used to solve the subtasks at different interpretation levels [143]. Signal estimation and trend segmentation are data-intensive procedures, i.e., decisions at both levels of interpretation are mostly based on sequential 5 1.2. Problem statement data. The knowledge used in these two steps is the difference of dynamic characteristics between the relevant trend changes and random variations in a physiological variable. This type of knowledge is independent of contextual information and can easily be translated into mathematical models. The subtasks at the higher levels of interpretation are knowledge-intensive. The outcomes of these inference steps depend heavily on the availability and accuracy of the expert knowledge. The knowledge formulation and the inference paradigms should all be designed with consideration of the target clinical events. 1.2 Problem statement This project aims to solve the problems of signal estimation and trendchange detection in the clinical environment. As pointed out in Section 1.1.3, the purpose of these two subtasks is to summarize the dynamics of a physiological variable and provide a temporal context for the higher-level knowledgebased reasoning. The results of trend-change detection also provide early alerts for clinicians before adverse events occur. 1.2.1 Aims of study A trend is defined as a sustained, unidirectional change in a variable’s mean level [4, 85]. When a patient is experiencing a systematic physiological change, the first sign is very often that a physiological variable has changed its trend direction. Trend direction is therefore the most important feature of a trend pattern, classifying signal segments as increasing, decreasing, or steady. In addition, the trend duration and incremental rate are also used to characterize trend segments more specifically. In multivariate study, a trend pattern is used to refer to the relation between the changes in different physiological variables. The aim of this thesis is to detect clinically relevant trend changes. A clinically relevant change often has large amplitude or a long duration, or both. A clinically relevant change indicates that a patient is experiencing a systematic physiological variation that requires awareness from a clinician. Some clinically relevant changes may not require an action, but if the contextual information is required to rule out the action, the changes are also considered clinically relevant. For example, a treatment-induced change is viewed as clinically relevant because knowledge about the treatment effect is required to suppress an action. 6 1.2. Problem statement This project aims to solve the trend-change-detection problem using methodologies based on statistical forecasting models. The knowledge about a signal’s temporal characteristics is formulated as stochastic dynamical models with the uncertainty due to measurement noise and imperfect knowledge represented by random variables. An appropriate forecasting model is crucial for the performance of this type of methods. The basic assumption in this thesis is that, with an appropriate forecasting model, clinically relevant trend changes can be transformed to statistically significant non-zero shifts in the forecast residuals, and clinically irrelevant variations, in contrast, cause insignificant variations around zero in the residuals. A desired change-detection method not only should detect a change, recognize the change pattern, but it should also provide some quantitative evaluation of the certainty of detection. 1.2.2 Scope of application The OR and ICU are two clinical settings in severe need of decision support. Many events in both environments are life-threatening and require immediate attention. However, clinicians need to attend to other tasks, therefore are not able to maintain continuous vigilance on all the variables [164]. The methods in this thesis are designed for intraoperative monitoring and validated using data collected during surgery. Physiological trend signals from the ICU have similar dynamic characteristics to the corresponding signals from the OR. Since the solutions for noise reduction and trendchange detection are independent of contextual information, the methods proposed in this thesis could be applied to the ICU setting. More than 20 physiological trend signals have been included in this study (see Appendix II for a complete list of the variables). These signals together indicate the conditions of a patient’s cardiovascular, pulmonary, and metabolic systems, and the integrity of the ventilation circuit. A trend signal records the time-varying levels of a physiological variable. Although waveform variables such as ECG or Electroencephalography (EEG) are significant physiological indicators, they are not included in this study. 1.2.3 Challenges Artifacts Although modern physiological monitoring devices have incorporated basic denoising functions, the existence of artifacts still remains a significant challenge for intraoperative physiological monitoring. A recent survey [74] revealed that 1/3 of the false alarms given by current physiolog7 1.2. Problem statement ical monitors during postoperative monitoring of cardiac operated patient were triggered by artifacts. Artifacts are often caused by interference with the measurement system or by system errors, presenting in trend signals as numerically distant deviations from the surrounding observations. In this thesis, artifacts are grouped into two types according to their duration: Transient artifacts are very brief (usually less than 10 seconds), with energy concentrated in a frequency band higher than the relevant trend changes. Short-peak artifacts are typically sustained for a relatively long period of time (sometimes more than a few minutes). The frequency spectrum of short-peak artifacts has a considerable overlap with that of the physiological trend changes; therefore, most band-pass filters cannot remove this type of artifact. Removal of short-peak artifacts is a critical problem in physiological monitoring. Background noise is another source of measurement uncertainty. Background noises are small-amplitude random variations caused by physiological fluctuations and environmental noise under normal conditions. Background noise can be assumed to follow a Gaussian distribution. Most statistical methods work well in the presence of background noise. In this thesis, noise is used to refer to the overall uncertainty due to the background noise and artifacts. Intraoperative and inter-patient variability Regulated by homoeostatic mechanisms, a patient’s physiological system compensates for intraoperative interventions and can achieve homeostasis with variables at new levels. The pattern of how a physiological variable responds to a clinical event is partly determined by the nature of the event, and partly by the patient’s condition. As both factors are hardly predictable before surgery, the dynamic characteristics of trend signals, including the mean level and second-order properties, change significantly over time and across patients. Inability to adapt to the intraoperative and inter-patient variability is a major cause for the high false-alarm rate of most monitoring systems [95]. A real-time change detection method should track the variations in both the signal level and the higher-order statistics online and adjust its configuration accordingly, in order to maintain a consistent performance. However, in the presence of artifacts, fast-tracking and robust detection are two competing objectives [157]. Multidimensionality Most clinical events result in trend changes in multiple variables. The relationship between the variations in different variables is highly sensitive to the contextual information. Comparing the real-time 8 1.3. Outline inter-variable relationship with the normal response in the same context can reveal events that only cause subtle changes in each individual signal. Measurement redundancy Measurement redundancy exists when a variable is measured from more than one independent source, or, more generally, when a set of independent measurements is related by a verified equation that remains valid during physiological variations. On one hand, measurement redundancy represents a problem for multivariate analysis. Redundancy provides no information about the physiological state of a patient, but dominates the inter-variable covariance structure and therefore may bury relevant features. Measurement redundancy should be removed before multivariate analysis. On the other hand, redundant measurements can be used improve the accuracy of signal estimation. 1.3 Outline The aim of this thesis is to recognize clinically relevant trend changes in physiological trend signals online in the presence of noise and intraoperative variability. The thesis starts with the extension of a traditional trend-detection method, and then shifts the focus to online adaptive trend-monitoring. This thesis is organized into 10 chapters. This chapter has introduced the problems of signal estimation and trend-change detection, and highlighted the potential challenges that need to be addressed in the following chapters. Chapter 2 reviews the studies that have been done on clinical decision support from the 1960s up to 2008. Each project is reviewed in terms of the level of interpretation, methodology, and target clinical application. In spite of many advances, trend-change detection still remains as an unsolved problem. Chapter 3 introduces the approaches for level-shift detection, which form the basis for model-based trend-change detection. The study procedure and performance evaluation process used in this thesis are also described. The methods proposed in Chapters 4, 5, 6, and 7, based on different forecasting models, are designed to detect changes in different trend features. In Chapter 4 and Chapter 5, trend patterns are classified in terms of the direction of change. In Chapters 4, a trend signal is modeled as the Exponentially Weighted Moving Average (EWMA) of historical data. The Cumulative Sum (Cusum) test is used to evaluate forecast residuals and determine whether a trend change has occurred. Chapter 5 describes an adaptive trend-change detection method for monitoring the variables mea9 1.3. Outline sured at a high sampling rate. In this method, the signal is described using the linear growth state-space model combined with an EWMA model. An adaptive Kalman filter is used to estimate the second-order characteristics and adjust the change-detection process online. Chapter 6 addresses the intraoperative variability in the variables measured at a low sampling rate, by modeling the transition relationship between different trend patterns using a generalized hidden Markov model. Based on this model, a switching Kalman filter and an adaptive Cusum test are proposed for online trend-pattern recognition. Chapter 7 investigates the potential of Factor Analysis in representing the relationship between trend changes in different variables. Based on a factor model, two test statistics are proposed to capture intraoperative events including light anesthesia and intermediate hemorrhage. In Chapter 8 the problem of artifact removal and signal estimation is addressed in the framework of data reconciliation. Measurement redundancy is utilized to obtain more robust estimates of the physiological variables than would be possible from a single sensor source. A novel hybrid median filter is proposed as an implementation of the dynamic data-reconciliation process for the HR measurements. Some of the proposed methods have been implemented in a software system called iAssist. Chapter 9 provides a brief description of the architecture of this system and its real-time performance in the clinical test. Chapter 10 summarizes the contributions of this thesis and points out the issues that require further study. Appendix I describes the derivation of the adaptive Kalman filtering process used in Chapter 5. The physiological variables studied in this project are listed in Appendix II. 10 Chapter 2 Literature review Automatic measurement and recording came into clinical use in the late 1960s. Ever since then, researchers from both biomedical and clinical fields have worked continuously toward real-time interpretation of physiological data in order to assist decision making in physiological monitoring. This chapter reviews the studies of clinical decision support from the 1960s up to the current time (2008). The author searched the online databases PubMed, IEEE Xplore, Science Direct, and ACM portal using the combinations of two groups of keywords: group A included “artifact (artefact) removal”, “trend change”, “temporal abstraction”, and “decision-making support”; group B included “patient monitoring”, “physiological monitoring”, “critical care”, “anesthesia (anaesthesia)”, and “vital sign”. The resulting publications were then checked in Citeseer, an open-access citation search engine, to find the related publications. This chapter starts with a brief overview of the development of physiological monitoring, followed by a detailed review in chronological order. 2.1 Overview of clinical decision-support systems Physiological monitoring has been an active topic of research since the 1960s. Over the years, as the technology and the understanding of the cognitive process involved in physiological monitoring have developed, various techniques have been adopted in physiological monitoring to provide decision support at different levels of interpretation. Most studies in the 1960s and 1970s were focused on automatic trendchange detection. Many researchers applied the statistical tests used for Statistical Process Control (SPC) to physiological monitoring. Endresen [42] and Avent [8] reviewed the popular models and statistical tests used during this time period. Since the late 1970s, the successful application of Artificial Intelligence (AI) in industrial problems has driven some researchers to investigate the 11 2.2. Early stage trend-change detection potential of AI for decision support in the clinical environment. Many intelligent monitoring systems were developed during this period, and most of them attempt to derive a diagnosis directly from signal measurements. However, the performance of these systems was unsatisfactory in practice. The unsuccessful application of first-generation intelligent monitoring systems has raised awareness of the important role of “time” in physiological monitoring [143]. Temporal abstraction has emerged as a research topic, aimed at summarizing temporal variations in physiological signals. Techniques used in AI, statistical tests based on forecasting models, and curve-fitting were used for temporal abstractions. Uckun [144] reviewed the physiological monitoring systems developed from the early 1980s to the 1990s with an emphasis on the target application. Other surveys include [14, 45, 75, 129], with [45] dedicated to trend-change detection. The studies in each stage are reviewed in the following, in terms of methodology, clinical applications, and target level of interpretation. It will be demonstrated that, trend-change detection is an essential part of a clinical decision support system, and despite all the efforts dedicated to trend-change detection, this topic remains a challenge in physiological monitoring. 2.2 Early stage trend-change detection Physiological monitoring studies started in the early 1960s when data loggers were introduced to medical applications [134]. Many methods developed during this period were based on statistical forecasting models and focused on trend monitoring. A signal model and a statistical test are the two components for a model-based change detection method. An appropriate model should translate the trend variations of interest as nonzero deviations in the filtering or forecast residuals. Testing the residuals could reveal whether a change has occurred. Many important models and test charts used in SPC were adopted in physiological monitoring. Methods based on the EWMA model The most widely used model was the EWMA model, equivalent to the Box-Jenkins Autoregressive Integrated Moving Average (ARIMA) (0,1,1) model. As the name indicates, an EWMA model generates predictions based on the accumulation of historical data, and the influence of historical data on the predictions attenuates exponentially as time passes by (see Chapter 4 for details). An appropriately configured EWMA model can adapt to a new plateau very quickly while translating increasing and decreasing ramps into sustained non-zero residu12 2.2. Early stage trend-change detection als. This property makes the EWMA model very suitable for trend-change detection. The Trigg’s Tracking Signal (TTS) [139] is one of the most widely used statistics with the EWMA model for change detection. The TTS is defined as the ratio of the exponentially smoothed residual and the exponentially smoothed absolute residual. One of the desirable properties of TTS is that its value ranges from -1 to 1, and each value corresponds to a statistical significance. However, the TTS, if set to be robust against impulsive artifacts, often generates false alarms for sustained changes of negligible amplitude [42]. Despite this weakness, the TTS as a milestone in statistical physiological monitoring is still widely used by researchers to obtain a comparative evaluation of their work [72, 85, 161]. Hope [62] addressed the limitations of TTS in 1973 by multiplying the value of TTS by a weighting function and referred to the new index as the Patient Condition Factor (PCF). The original weighting function used in [62] ranges from -1 to 1, and increases linearly with the magnitude of the forecast residuals. The PCF method reduced false detection rates compared with the TTS; however, the relationship between the index magnitude and level of statistical significance in TTS were also lost. Empirical testing is needed to determine the alarm limits for PCF-based testing [8]. Conventional SPC charts were also adopted in patient monitoring. Among these charts, the Cusum test [100] gained great popularity. Stoodley and Mirnia [131] modeled the HR signal using an EWMA model plus a piecewise constant incremental factor. The two one-sided Cusum test [10] was used on the EWMA forecast residuals to detect trend changes. The detected trend changes were classified using heuristics into transients, ramps, and steps. This method demonstrated considerable potential for trend monitoring. The Cusum test was also used [56, 120, 128]. Methods based on other integral models In addition to the standard EWMA model, other integral models were also used to describe physiological signals. The nonseasonal Holt-Winter model [61] extends the EWMA model by assuming that both the signal mean level and the slope of changes are time-varying and follow the EWMA processes. This model is equivalent to a Box-Jenkins ARIMA(0,2,2) model. It was used in [4] to monitor intracranial pressure. Compared with the EWMA model, the non-seasonal Holt-Winter model tends to over-fit most medical series and was only suitable for shortterm forecasting [8, 42, 131]. The Exponentially Mapped Past (EMP) is another statistic based the 13 2.2. Early stage trend-change detection integration of signal history. The EMP of an input signal is generated by passing the signal through a low-pass resistor-capacitor filter or by an equivalent digital process [99]. When used for change detection, EMP filters with different time constants are usually connected in parallel or in series. The differences between the outputs of these filters converge to an affine linear function of the slope for a ramp change. The Student’s t-test is usually performed on the output differences to detect a trend change. The cascaded EMP system is preferred in practice because the outputs of a cascaded system are less affected by the unwanted oscillations caused by the unmatched phase shifts of different integrators if the time constants are chosen appropriately. A cascaded digital EMP system was implemented in a bedside monitor called Patient Alarm Warning (PAW) system for detecting bradycardia, tachycardia, and sustained hypotension or hypertension [59, 135]. Although the PAW system has great potential for trend-change detection, it tends to miss changes of a large amplitude but a short duration [42]. Furthermore, PAW is a very inflexible system. It is often necessary to change the time constants of the integrators for a particular variable or patient. Methods based on the ARMA model The Box-Jenkins ARMA model was used in [42] to describe HR and Mean Arterial Pressure (MAP) trend series. In an ARMA model, the present level of the variable is expressed as a linear combination of past values, and the present and past value of a random series, i.e., a combination of the autoregressive (AR) and Moving Average (MA) components. A time series described by an ARMA model is assumed to be stationary. Unfortunately, this assumption is not valid for most physiological variables, as a patient’s status varies over time in most clinical situations. If an ARMA model is used to describe a physiological trend signal, not only the model parameters but also the order of model have to be adjusted in real time [42]. Estimating an ARMA model requires a substantial number of observations (>50 according to [26]) and intensive computation. The ARMA model was therefore not recommended for modeling physiological trend signals [42]. Multi-state Kalman filter and Bayesian forecasting In contrast to the approaches based on hypothesis testing, the multi-state Kalman filter takes the Bayesian approach and provides a flexible framework for modeling and analyzing time series subject to abrupt changes. Initially introduced by Harrison and Stevens in 1976 for forecasting in economics [57], the multistate Kalman filter was soon adopted by many researchers for physiological 14 2.3. Diagnostic monitoring monitoring in different clinical scenarios [8, 32, 42, 51, 121, 124]. For example, in 1983 Smith and West [124] used the method for monitoring renal creatinine levels in patients who had recently received renal transplants. In the multi-state Kalman filter approach, a signal is treated as piecewise linear and is modeled by the linear growth model (see Chapter 5 for details). The linear growth model is a state-space model in which the final observation, signal mean, and slope of change are all subject to Gaussian disturbances. Depending on which factor is disturbed, trend changes exhibit transient, ramp, or step patterns. The prior probability of occurrence for each pattern is assumed to be constant over time and independent of the historical signal. Under this assumption, the multi-state Kalman filter is used in [124] to solve two problems: (1) estimating the signal in the presence of disturbances and (2) investigating the filtering residuals and estimating the posterior probability of occurrence for each pattern. Smith and West [124] also proposed the model estimation process. The multi-state Kalman filter estimates the probability of occurrence for each of the possible trend patterns online, thus providing a more informative summary of a signal’s temporal behavior than the statistical testing methods. This functionality is very desirable for a diagnostic inference application. However, the method has a few drawbacks: first, the linear growth model used in the method, similar to the ARIMA(0,2,2) [4], tends to be oversensitive to artifacts. Furthermore, the assumption of time-invariant prior occurrence probability for trend patterns is not compatible with clinicians’ cognitive processes. Most approaches proposed during this period were designed to detect variations in a signal’s mean level with very limited effort devoted to estimating the second-order statistics. Most of the integral models reviewed in this section could be categorized as generalized exponentially-smoothing models [33], which are also the models of choice in this thesis. 2.3 Diagnostic monitoring In the late 1970s, the development of AI and its successful application to industrial processes stimulated the development of intelligent diagnostic systems. MYCIN [118] (Stanford University), a rule-based expert system, was introduced in 1976, and achieved great successes in bacterial infection diagnosis. The same group soon extended the application of expert system to patient monitoring and introduced a ventilation management consulting 15 2.3. Diagnostic monitoring system called VM [44]. VM [44] is commonly thought to be the first diagnostic monitoring system. Ever since the introduction of VM, much effort has been directed toward intelligent diagnostic monitoring. AI technologies mimic how clinicians reason with their expertise in the presence of ambiguity and incomplete information. The knowledge used for diagnosis often has a very loose structure and therefore demands a more sophisticated framework of representation. A variety of AI techniques, such as rule-based systems, Bayesian belief networks, decision trees and artificial neural networks, have all been used in diagnostic monitoring. Knowledge representation Rule-based expert systems are widely employed for diagnostic reasoning, as most knowledge and inference processes involved in clinical practice are already in the symbolic form. A complete expert system consists of three major components: a knowledge base, which stores expert knowledge in the form of “if... then...” rules; a user interface for knowledge acquisition and real-time data input; and an inference engine that guides the reasoning process using appropriate search mechanisms. The belief behind the expert system is that the inference processes adopted by different experts are the same, but the contents of knowledge they possess are different. Based on this belief, the inference engine can be standardized for a broad application domain. There are many commercial and open-access inference engines available for medical diagnosis. Many rule-based expert systems proposed during this period provided ventilatory support [44, 65, 79, 122]. VM [44] was designed for assisting mechanical ventilation management. The Computerized Patient Advice System (COMPAS) [122] (Stanford University) utilized the HELP information system [122] developed by the same group to assist with respiratory therapy for patients with Adult Respiratory Distress Syndrome (ARDS). Ventilatory support remained an active application problem in the early 1990s [65, 79]. Anesthesia Expert Assist (AES) was a rule-based system that offered intelligent alarms and therapy recommendations to anesthesiologists for monitoring hemodynamics during aortocoronary bypass surgery [109, 114]. Although intuitive to use, rule-based expert systems have a very loose structure, and therefore cannot be trained using validated cases. Bayesian Network (BN) was also employed to represent clinical knowledge. BN is a probabilistic graphical model in which nodes are linked by directed arcs to represent the inference logics. Nodes represent observed or inferred facts, i.e, symptoms or diagnosis. Arcs, qualified by a table of conditional probabilities, encode the independence between the parent and 16 2.3. Diagnostic monitoring child nodes. ALARM [15] (Stanford University) is a BN-based system for monitoring mechanical ventilation. In ALARM, 37 nodes linked by 42 arcs represent the prior knowledge. It was claimed that 71% of clinical problems were correctly recognized in the operating room during a test [15]. An Artificial Neural Network (ANN) solves the diagnostic problem from the perspective of classification. An artificial neural network is a group of interconnected simple processing units (“neurons”). The weight for each connection can be adjusted to generate a more complex global relationship to describe the relationship between input and output that flows through the network. When used for diagnosis in physiological monitoring, the inputs are the temporal features of multiple physiological variables and contextual information, while the output is the target clinical events. The first ANNbased anesthesia monitoring system known to the author was [97] (University of Utah). In that system, 25 temporal features extracted from the combination of EtCO2 , airway pressure, and flow monitored at the mouthpiece were used as inputs to a three-layered ANN to identify alarm conditions. As parametric models, the BN and ANN models can both be adapted to annotated data. Challenges The first challenge for diagnostic monitoring is that a complete collection of knowledge is difficult to acquire. The knowledge collection in a successful diagnostic system should be able to link observations to all the events of interest. Specifying all the possible scenarios can be an intractable task. For this reason, knowledge acquisition is often the most tedious step in constructing an intelligent monitoring system. The second challenge is the availability of real-time contextual information. For example, in intraoperative monitoring, the patient demographic information, sequence of treatments, and environmental disturbances are all essential for making a diagnosis decision. However, it is almost impossible to obtain some of the information without manual input. The representation of uncertainty is also an important issue for methods that operate at any level of interpretation. Fuzzy set, possibility theory, or subjective probability theory can be used to describe the information uncertainty in physiological monitoring [96]. For the tasks of signal estimation and trend-change detection, information uncertainty is mostly caused by measurement noise and random physiological variations, and is often represented by probabilistic components. The uncertainty in diagnostic inference is commonly attributed to the inability to perceive and assess all the contributing factors for a particular assertion, or the expression imprecision. 17 2.4. Temporal abstraction Probability theory is often used with the ANN and BN models. Fuzzy logic was adopted in the AES system [109] and used for anesthesia monitoring. The certainty factor used in VM [44] can be seen as a type of subjective probability. Although these theories express the information uncertainty in different fashions, the operations defined in them are quite similar. In spite of significant efforts devoted to the development of diagnostic monitoring systems, most of the reviewed systems remained in the prototype stage due to their unsatisfactory performance in the clinical environment. In the late 1980s and early 1990s, some researchers reflected on the work that had been done on the intelligent physiological monitoring systems. They recognized the hierarchical nature of the task of physiological monitoring, and also pointed out that the lack of effective temporal feature extraction was one of the major reasons for the unsatisfactory performance [107]. Researchers in human factors investigated the clinicians’ cognitive processes in physiological monitoring in an effort to identify the bottleneck in this information-processing task. They found that the primary effort for decision makers was not at the moment of decision making, but rather in the process of transferring the numerical data to the clinician’s cognizance [28]. It has been suggested that summarizing the temporal behavior of physiological variables may improve clinicians’ situation awareness [43, 73]. 2.4 Temporal abstraction In the late 1980s, researchers recognized the critical role of “time” in physiological monitoring. Many studies have been conducted on sequential data in order to summarize a signal’s temporal characteristic for diagnostic monitoring. This process is referred to as Temporal Abstraction (TA) [70, 101]. There is no commonly accepted definition for TA in the literature. Many TA studies from the early 1990s through to 2007 [129] summarize signals at different levels of interpretation. The problem of TA, from the perspective of pattern-recognition, is often realized in two steps: feature extraction and classification. According to how the two steps are performed, the techniques used for TA can be categorized as AI-based, template-fitting, and model-based statistical test. When AI techniques are adopted, measurement data are often pre-segmented into a series of fixed-length slices. The basic features of these slices, such as mean and standard deviation, are calculated and used as input for TA. The pre-segmentation step, although it reduces the temporal uncertainty, does 18 2.4. Temporal abstraction not locate the starting and end points for trend patterns. The challenge for AI-based TA is how to construct an effective classifier (model-based or rule-based) to identify the temporal patterns in these signal slices. In the template-fitting methods, pre-defined temporal patterns are represented by a mathematical function (or a set of functions for a multivariate problem) over time. Measurements are compared with the templates to find the bestfitting pattern. For model-based methods, constructing an appropriate forecasting model is critical. If a model transforms the trend features of interest into nonzero residuals, testing the residuals can reveal trend changes. Temporal reasoning The application of AI on time series analysis is often referred to as temporal reasoning. Most AI-based techniques can incorporate signal measurements, as well as contextual information and pathophysiological knowledge, to provide an integrated data interpretation. When used for trend-change detection, temporal reasoning is performed on the basic features of pre-sliced signal segments. Data granularity, or the length of signal segments, is one of the most important factors that influence the performance of temporal reasoning. In most studies, the segmental length is empirically set for each variable, and then the features of interest are calculated for every slice. For example, in [22], a “characteristic span” is learnt from population data, then the slope and variability of measurements in each span is calculated. According to the magnitudes of the slope and variability, signal slices can be roughly described as unstable, stable, increase or decrease before more advanced temporal reasoning is performed. Rule-based systems are widely used for representing the temporal relationships between pre-sliced physiological signals. Trend patterns in a single variable can be recognized by investigating the relationship between the successive signal slices [24, 47, 113]. If used for recognizing complex signal patterns formed by multiple physiological variables, the knowledge representation and inference operations become more complicated. To facilitate the acquisition, maintenance, reuse, and sharing of expert knowledge, some researchers have proposed general frameworks for describing the primitives (events, observations, and contexts), relations, and operations that occur in the task of temporal reasoning in a standardized manner. A conceptualized framework can be applied to a variety of applications. For each application, the pre-defined functions in the framework should be implemented using specific methods. Yuval Shahar (Knowledge Systems Laboratory, Stanford University) proposed a knowledge-based temporal abstraction framework [115]. In this system, knowledge was classified into fours types: the structural, clas- 19 2.4. Temporal abstraction sification, temporal-semantic, and temporal-dynamic knowledge. The operations are divided into five TA mechanisms, including context formation, contemporaneous abstraction, temporal inference, temporal interpolation, and temporal pattern matching. RESUME [116] is a realization of this ontological system for patient monitoring. RESUME demonstrated the advantage of a modular, task-specific architecture for building a knowledge-based system. Another ontology system was also proposed by Uckun in [142]. A framework referred to as temporal constraint network was also used to represent the causal relations between time-stamped events. In temporal constraint networks, a node represents a time-stamped event and a directed arc represents a temporal constraint between the linked events. A temporal constraint network is often used to describe scenarios constituted by a set of trend abstracts. During monitoring, signal measurements are compared with the pre-defined temporal constraint networks to identify the scenarios of interest. This approach can be viewed as a template-fitting method that operates at a higher level of interpretation. Temporal constraint network was adopted in DYNASCENE [27], TOPAZ [106], and other systems [37, 117]. Artificial neural network is also used for TA [140, 146]. In TrendFinder (MIT Computer Science and Artificial Intelligence Laboratory), a signal is sliced with three different time resolutions [140]. The features of these slices, including mean, median, maximum, minimum, range, standard deviation, linear regression slope, and absolute value of the linear regression slope, together describe the signal’s short, mid-term, and long-term behavior. TrendFinder has been tested on a wide variety of physiological variables. The ANN, linear logistic regression, and decision tree were compared in [140] in terms of the performance of trend-pattern recognition. It was pointed out that the structure of a classifier only had very minimal influence on the performance, and that the completeness of knowledge was of greater importance. Template fitting A template is a collection of samples that together demonstrate a meaning more relevant to the final diagnosis. Trend templates are often formulated as regression functions, constrained by the template length, the model structure and parameters, and by the sampling interval. To use a template-fitting method, we need to establish a finite set of temporal patterns from medical knowledge and from the modeling of signal variations. During monitoring, signal measurements are compared with all predefined templates. A template is deemed to match the data if the measure of goodness-of-fit meets certain conditions. 20 2.4. Temporal abstraction TrenDx [52, 54] (MIT Computer Science and Artificial Intelligence Laboratory) is a well-known trend-pattern recognition system based on templatefitting. In TrenDx, variations in trend signals are classified into seven templates, each described by a regression model: a constant model represents the steady state, two linear models distinguish the increasing and decreasing ramps, and four quadratic models are used to represent the trend patterns that consist of a sharp increase or decrease followed by a plateau [52]. The duration of a template can be adjusted for a particular case in real time by repeatedly comparing signal measurements with the realizations of trend templates with different intervals. TrenDx was initially proposed as a univariate offline method and used for detecting anomalies in children’s size-weight growth curves [54]. It has been extended to recognize complex patterns (scenarios) in real time by fitting the measurements of multiple variables to the predefined template complex [53]. Online curve-fitting was also used for pre-segmentation. In [25], a trend signal is viewed as a series of connected straight lines and the Cusum test is used to determine whether a trend change has occurred and a new curvefitting process should be started. This method has been tested on both simulated data and clinical data from an ICU setting. Fuzzy course, a concept based on the fuzzy set theory, was proposed by Steimann [130] to represent the allowable deviation in a temporal template. Fuzzy course was implemented in a system called DIAMON [130] to detect ARDS, and was also used in [78] for physiological monitoring in anesthesia. Trend detection based on statistical forecasting models The multistate Kalman filter remains one of the most popular statistical forecasting methods for trend-change detection. In the original multi-state Kalman filter, trend patterns are reflected as short-lived disturbances in the signal model, and therefore the change detection performance deteriorates significantly in the presence of artifacts. To address this problem, some researchers [153] have modified the signal model used in the original multistate Kalman filter to generate sustained non-zero residuals for the trend patterns of interest. Based on the modified model, a statistical test and a Bayesian approach have been proposed to provide robust change detection [153]. The Bayesian method was later used in [32] for HR monitoring. The parameter estimation process has also been described in [153] for the modified multi-state Kalman filter. Principal Component Analysis (PCA) has been used in [48] to summarize the relationship between multiple variables. Visual display of the trajectory 21 2.5. Artifact removal and signal estimation of the principal components may facilitate the perception of variations in the inter-variable relationship and therefore may result in early detection of clinical events. A modified PCA process is used in [5] for noise reduction in patient monitoring. This modified PCA appears to be more effective in extracting the inter-variable relationships than the standard PCA. A physiological monitoring system should integrate the methodologies proposed at different levels of interpretation. For example, the segmentation results [25] have been further processed by a rule-based temporal reasoning system [24] to derive an abstract more relevant to the final diagnosis. YAQ [142] (Vanderbilt University) is a hybrid qualitative/quantitative ontology for physiological monitoring. In YAQ, the quantitative models summarize trend signals and provide inputs for a rule-based diagnostic module; the rule-based module, on the other hand, adjusts the trend analysis process according to the contextual information. This architecture was implemented in a patient monitoring system called SIMON [145] and used for ventilatory management for the newborn. 2.5 Artifact removal and signal estimation Most of the artifact removal methods proposed to date [132] are based on the difference in the dynamic characteristics (power-spectrum distributions in the frequency domain) of clinically relevant trend changes and artifacts. As pointed out in Section 1.2.3, a band-pass filter is not effective at suppressing sustained artifacts. The presence of artifacts remains a significant problem for physiological monitoring. Measurement redundancy can be used to obtain more robust signal estimates than would be possible from a single sensor. Feldman et al. [40] have proposed to estimate HR by fusing the HR measurements from the ECG monitor, pulse oximeter and the invasive BP monitor. This method assumes two possible noise distributions for each sensor: one for background noise and the other for artifacts. There are in total 16 hypotheses about the location of artifacts in the sensor matrix (the number of hypothesis is 2M +1 , with the number of sensors M=3). The measurement with the lowest likelihood of artifactual interference is selected and used in the Kalman filter to derive the fused estimate. The challenge for this method is the acquisition of the noise distributions; additionally, the hypothesis selection involves significant computational effort. These limitations may prevent this method from being implemented in practice. 22 2.6. Conclusion Biomedical manufacturers have shown great interest in sensor fusion [41, 123]. A recent method [123] (GE Healthcare, Chalfont St Giles, UK) fuses the HR measurements using a rule-based approach. The method detects artifacts in the HR from the ECG monitor by testing the measurements against preset limits, and suggests using the HR measurements from other sources only if the HR from the ECG monitor exceeds the limits. The thresholds in this method are very difficult to set. If they are too small, relevant abrupt changes can be distorted; if they are too large, the method may fail to detect many corrupted measurements. 2.6 Conclusion The survey above has revealed the weaknesses and strengths of different techniques when they are used for extracting information at different levels of interpretation. These studies, if integrated appropriately, could provide essential support for decision making in the clinical environment. Trend-change detection remains an unsolved problem in physiological monitoring. AI techniques, curve-templates, and statistical forecasting models can be used to represent the dynamic behaviors of physiological trend signals. Among these, statistical models that generate predictions based on historical data have demonstrated enough power of expression for the purpose of trend-change detection, and meanwhile maintain simplicity and flexibility. The methods proposed in this thesis are based on this type of statistical forecasting models. The statistical trend-change detection methods proposed in the 1970s and 1980s have provided a solid basis for this thesis. However, the measurement and recording technologies available in the clinical environment have improved significantly over the years. For example, HR in an early study [131] was sampled at every one minute, while the common sampling interval in the current clinical environment is 1 or 5 seconds. The signals measured in the current clinical environment have different statistical properties from those measured in the previous clinical environment, posing new challenges for trend-change detection. More advanced signal processing methods need to be employed to address these challenges. The increased computational power of a bedside monitor has also made the application of advanced signal processing possible in the clinical environment. The aim of this thesis is to recognize clinically relevant trend changes in physiological variables online in the presence of noise and intraoperative variability. The thesis starts with the extension of a traditional trend detec23 2.6. Conclusion tion method based on the EWMA model, and then shifts its focus to online adaptive trend monitoring. 24 Chapter 3 Change detection techniques and study procedure Model-based trend-change detection are generally realized in three steps: (1) modeling: describing a signal using a stochastic dynamic model; (2) detrending: calculating the difference between signal measurements and predictions or estimates, and (3) change detection: analyzing the residuals after detrending to identify trend changes from random variations. An appropriate signal model together with the detrending step should transform the target type of trend changes (which may include a switch of trend direction or a change in the incremental rate as pointed out in Chapter 1) into sustained level shifts (or rotation for multivariate analysis) in the forecast residuals. It is also desired that the magnitude of forecast residuals can be categorized to identify the trend patterns. Model selection is one of the major issues for model-based change detection and pattern recognition. In the following chapters, different models are adopted according to the signal characteristics and the target trend changes. This chapter is focused on detection techniques that analyze forecast residuals to determine whether a level shift has occurred. This chapter consists of three sections. The first section is focused on level-shift detection, a fundamental topic for the following chapters. The second section describes the study procedures adopted throughout the project. The last section describes the criteria used for performance evaluation. This section is focused on univariate level-shift detection. Consequently, all the variables herein are scalars. 3.1 3.1.1 Level-shift change detection Uncertainty in level shifts The forecast residual et is defined as the difference between the observation yt and the model-based forecast yˆt|t−1 . Ideally, a level shift caused by a change in the property of interest should have a clear onset and a constant 25 3.1. Level-shift change detection magnitude over the duration of the change. But in practice, relevant level shifts are often blurred by measurement noise and physiological random variations, as in Equation (3.1): (i) (i) (i) (i) et = yt − yˆt = dt + ut (3.1) (i) where dt is the mean of forecast residuals and carries the information about (i) different trend patterns, and ut represents the overall uncertainty in residuals. It is assumed that all the target trend patterns constitute a finite set (i) Z={z (1) , . . . , z (N ) }. The script ∗t used in Equation (3.1) indicates that the trend pattern z (i) is in effect at time t. For most signals in this thesis, (i) the random variable ut can be assumed to be independent over time and follow a Gaussian distribution with zero mean, although the variance of u may vary over time. Due to the presence of uncertainty, the residuals resulting from different patterns may overlap with one another. For example, in Figure 3.1, the distributions of a positive level shift and a residual with zero mean overlap in a range between their means. Measurements with the magnitude falling within this range are possibly generated by either distribution. The problem of trend-pattern recognition is to determine, in the presence of uncertainty, which pattern is in effect by analyzing the residual et . Detection of positive level shifts is used as an example in this chapter. The following sections introduce the application of hypothesis testing, sequential likelihood ratio test, and Bayesian estimation in trend-change detection. I will also describe how the detection performance is adjusted in each of these frameworks to compensate for the intraoperative variability. 3.1.2 Hypothesis testing A statistical hypothesis test makes the decision between a null hypothesis H0 and its alternative hypothesis H1. A test statistic must summarize the information in samples that is relevant to the hypothesis. Given a desired significance level α, the critical region [hα ∞] (or [−hα/2 + hα/2 ] for a two-sided test) can be calculated. One rejects H0 if the sample being tested falls in the critical region. For a hypothesis test, two sources of error may occur: • Type I error : also known as an α error, or a false positive, this is the error of accepting H1 while H0 is true. • Type II error : also known as a β error, a false negative, or a missed detection, this is the error of failing to reject H0 while H1 is true. 26 3.1. Level-shift change detection H0: stable 0 H1: level increase h α d+ Figure 3.1: Probability Distributions of the forecast residuals of two different patterns. The two distributions concentrate around different mean values but overlap over [0 d+ ]. Detection errors are unavoidable due to the presence of information uncertainty. For example, the simplest way of detecting the positive level shift in Figure 3.1 is directly comparing the residual value et with a threshold hα . This scheme produces many false positive detections due to the overlap between the distributions for H0 and H1. 3.1.3 Sequential probability ratio test The Sequential Probability Ratio Test (SPRT) or likelihood-ratio test was developed as a hypothesis test for sequential data [148]. In the SPRT test, whether the sample being tested supports the null hypothesis H0 or the alternative hypothesis H1 is often evaluated by the log-likelihood ratio: Lt = log Pr(et |H1) . Pr(et |H0) (3.2) In the SPRT, the log-likelihood ratio is often accumulated over time. The accumulated log-likelihood ratio has the effect of emphasizing sustained changes. To detect an increase, the null and alternative hypotheses in the accumulated SPRT are: H0: the data from t0 up to now are equal to zero; H1: a level shift of an amplitude d+ has occurred. (3.3) 27 3.1. Level-shift change detection If the signal distributions under H0 and H1 both follow the Gaussian distributions, have the same variance δ 2 , and are independent and identically distributed over time, the SPRT generates an important level-shift detection technique called the Cumulative Sum (Cusum) test. Cusum test and V-mask The statistic used in the Cusum test is the cumulated deviation from the target value. In the trend-change detection, the Cusum accumulates residuals e from t0 , the last time the Cusum was restarted, to the current instant t, as in Equation (3.4): t Ct = ek . (3.4) k=t0 A visual procedure proposed by Barnard in 1959 [10], known as the V-mask testing (see Figure 3.2) can be used with the Cusum statistic to perform change detection. A V-mask moves with the origin on the Cusum plot from the start of the sequence to the current sampling instant. At every sampling instant, the previous Cusum values are tested against the two arms of the V-mask. When the lower arm is crossed from within, a level increase is detected; when the upper arm is crossed, a level decrease is detected. The point at which the Cusum plot crosses the arm of the V-mask (t∗ in Figure 3.2) is identified as the start of the detected level shift. Cusum plot origin { h+ unit length d+ end start of the increase t* current time t Figure 3.2: The Cusum crosses a V-mask’s lower arm to detect an increase. 28 3.1. Level-shift change detection The shape of V-mask is determined by the critical region for the corresponding SPRT (Equation (3.3)). Therefore, performance of the Cusum test is entirely determined by the shape of V-mask. β Pr(et0 . . . et |H1) 1−β < log < 1 − α/2 Pr(et0 . . . et |H0) α/2 (3.5) where α/2 is the type-I error rate (α: the overall type-I error rate for the two-sided test which detects changes in both directions), and β is type-II error rate. With this critical region, the SPRT is carried out as follows: 1. If the likelihood ratio is greater than the upper threshold, the alternative hypothesis H1 is accepted, i.e., an increase is detected; 2. If the ratio is less than the lower threshold, H0 is accepted, i.e., a steady state is detected; 3. Otherwise, the evidence is not strong enough to support either H0 or H1, and the test should keep running. To realize this SPRT, the slope of the lower arm in Figure 3.2 is set to d+ /2, half the magnitude of the increase of minimum interest, and the rise distance h+ α , equivalent to the upper critical threshold in Equation (3.5), are calculated as below [128]: h+ α = δ2 δ2 1−β ) ≈ − log( log(α/2) d+ α/2 d+ (3.6) 2 where dδ+ is the normalizing factor. The approximation holds because the values of α and β are often very small. Theoretically, the lower threshold in Equation (3.5) is not reflected in the V-mask, as the rise distance associated with this threshold would be negative, but in practice, if a lower arm with a very small h+ 0 is not broken by any Cusum values for a considerable period of time, the signal can be declared as steady. Two one-sided Cusum The Cusum plot can cross V-masks of different rise distances. There exists a maximum h+ max (t) for each instant t that guarantees all the lower-arm strokes with h+ h+ max (t) will detect an increase. Since a rise distance is monotonically related to a false positive rate α, hmax (t) can be used as a measure of certainty for change detection. For example, the maximum h+ for the lower arm in Figure 3.2 reflects the certainty level for detecting an increase. 29 3.1. Level-shift change detection The certainty levels h± max are equivalent to Barnard’s one-sided Cusum ± C [10], and can be calculated iteratively as + + + et − d2 , 0) Ct+ = max(Ct−1 − − − Ct = min(Ct−1 + et − d2 , 0) (3.7) The two one-sided Cusum is compared with thresholds to detect level shifts. Adaptivity To obtain constant α- and β-rates overtime, the shape of Vmask should be adapted to the time-varying signal characteristics. The slope d should be updated if the magnitude of the change of minimum interest varies over time. The rise distances h should be adjusted according to the time-varying variance δt2 . Furthermore, if the sequential prior probability for H1 (Prt0 (H1)) and for H0 (Prt0 (H0)) are not always equal to one another over time, the likelihood ratio in Equation (3.5) should be modified as below: (H1/H0) Lt = log Pr(et0 . . . et |H1) Prt0 (H1) . Pr(et0 . . . et |H0) Prt0 (H0) (3.8) The rise distance of the V-mask h should be adjusted accordingly. The Cusum test is an optimal detection method for stationary and independent signals subject to abrupt changes in the mean, since the Cusum test minimizes the delay between the occurrence of a shift and its detection for any fixed false alarm rate [12]. Due to this advantage, the Cusum test is widely used in a broad variety of applications. 3.1.4 Bayesian approaches The probability density for the residual et , conditional on the fact that state z (i) is in effect at time t, can be calculated as below: (i) Pr(et |zt ) = N (i) (et ; 0, δt2 ). (3.9) With a statistical model that forecasts the future based on signal history, (i) Pr(et |zt ) represents the evidence provided by the new measurement yt for (i) zt . The probability distribution N (i) (0, δt2 ) should be updated online to the time-varying signal characteristics. A Bayesian estimator integrates the prior knowledge with the informa(i) tion carried by observations. A fixed-point smoother Pr(zt−k |yt0 . . . yt ), k<t, provides a robust estimate of the finite states, therefore is suitable for trend monitoring. One of the challenges in constructing a Bayesian estimator is (i) to develop the recursive process to update Pr(zt−k |yt0 . . . yt ) online. 30 3.2. Study procedure 3.2 Study procedure The studies in this thesis are carried out in four steps: data acquisition, data annotation, method development, and performance evaluation. 3.2.1 Data acquisition Following ethics approval, the variables listed in Appendix II were collected at British Columbia Children’s Hospital using a PC-based software tool S/5 Collect (GE Healthcare, Chalfont St Giles, UK). The measurements were sampled every 5 seconds. Patient demographic information and intraoperative events were recorded by the attending anesthesiologists. For some of the cases, the attending anesthesiologists were asked to indicate the changes they perceived during surgery, using standard intraoperative auditory and visual monitoring while performing standard anesthesia tasks. Their indications, referred to as intra-operative detection, were recorded in synchrony with the trend data. Data were recorded from induction of anesthesia until the end of surgery. Since NIBPmean was measured every 3 minutes, there were multiple samples for each measurement in the original NIBPmean trend signal. NIBPmean was therefore down-sampled to the measurement rate. 3.2.2 Data annotation Clinically relevant trend changes in the collected data were annotated by expert anesthesiologists after surgery. These anesthesiologists are all familiar with the concept and purpose of this study. Their annotations were used for model estimation and performance evaluation. A purpose-built software tool eViewer was used for viewing and post-op annotation [163]. The software provides an interface (see Figure 3.3) for the anesthesiologists to identify the clinically relevant trend changes, evaluate the degree of significance, and classify trend patterns. The operation interface is divided into three areas for signal display, annotation, and recording. eViewer was designed for annotating each signal individually, but has been updated to a multivariate version to support the annotation of multiple signals. 3.2.3 Model selection Model selection is crucial for the methods proposed in this thesis. An appropriate model should transform clinically relevant changes in the properties of interest into sustained level shifts in forecast residuals and meanwhile reduce 31 3.2. Study procedure Display Operation Recording Figure 3.3: Interface of eViewer the deviations caused by irrelevant variations. In addition, the model should utilize the information known from population data while being adapted to real time data. The model structure should also be intuitive to clinicians, so that expert knowledge can be incorporated into the signal model. An intuitive solution will also facilitate the acceptance by clinicians. Finally, it is desired that a model can be easily generalized to describe a wide variety of trend signals measured in different clinical environments. The models in the EWMA family have demonstrated their advantages for trend monitoring in the previous studies reviewed in Section 2.2. Therefore, the EWMA model and its variants are used in this thesis for describing the trend variations of a single variable. After the functional form of a signal model is determined, the model parameters that are believed to be constant across patients and during surgery are estimated from annotated population data, whereas the parameters that account for the time-varying characteristics should be estimated from real-time data. The unknown parameters should be kept as few as necessary. In addition, the rate of change of these parameters should be much slower than the sampling rate. Otherwise, their real-time estimates can be biased and have a high degree of variance [94]. 32 3.2. Study procedure Different forecasting models are used in Chapters 4 to 7 to detect trend changes in different physiological variables. These models are all based on the prerequisite that signals are not heavily contaminated with artifacts, or that artifacts have been effectively reduced. Although artifacts still may exist, their occurrence is infrequent; therefore, the uncertainty components in these methods are all assumed to follow the Gaussian distributions. All the proposed methods are realized in Matlab (Mathworks, Natick MA, USA). 3.2.4 Performance evaluation Trend-change detection methods eventually can be reduced to a binary decision between a null hypothesis H0 and an alternative hypothesis H1. The performance of this decision-making process should be tuned to arrive at the best tradeoff between sensitivity, specificity, and time delay. Sensitivity (true positive rate) and specificity (1-false positive rate) are often used to describe the accuracy of detection. As shown in Section 3.1, in statistical change detection, a threshold is often used to tune the tradeoff between sensitivity and specificity. A Receiver Operating Characteristic (ROC) curve records the relationship between sensitivity and (1-specificity) as the tuning threshold is varied [167]. The Area Under a ROC Curve (AUC) may be used as a measure of the overall detection performance [55]. ROC curves are also widely used to find the optimal operating setting for the general population. The performance for different thresholds are compared by the distance from the corresponding point to the upper left corner of the ROC figure, i.e., the point with sensitivity=1 and specificity=1. Conventionally, the tuning parameter with the shortest distance represents the optimal setting [3]. This criterion is used in this thesis. However, it should be noted that this criterion is based on the assumption that a false alarm and a missed detection have the same degree of negative impact on clinicians’ performance. This assumption is questionable, since false alarms may be more harmful than missed events. Some human factor researchers [36] claim that false alarms degrade both compliance and reliance, while missed events only reduce the user reliance of a monitoring system. Furthermore, time delay is almost unavoidable in trend-change detection, since the decision is made based on historical information. Although using more historical information can potentially improve the detection accuracy, time delay should be controlled within an acceptable range. Other factors such as computational overhead and storage requirements should also be considered if a method requires intensive computation. 33 Chapter 4 Two-level change detection based on the EWMA model The method proposed in this chapter, referred to as EWMA-Cusum method, is designed to detect changes in trend direction, i.e., to classify changes as increase, decrease, or steady, and to generate multilevel alerts according to the certainty of detection. The Exponentially Weighted Moving Average model is adopted in this chapter to produce one-point-ahead predictions based on the historical data. The one-sided Cusum of the forecast residuals are tested against two thresholds to detect change points with two different certainty levels. The temporal shapes of the detected changes are analyzed using heuristics to determine whether to trigger an alert. The method was tested using the clinical data of MVexp, EtCO2 , Ppeak, and NIBPmean. The work in this chapter has been submitted to IEEE Transactions on Information Technology in Biomedicine. 4.1 4.1.1 Method EWMA-based forecasting model The EWMA-based model is a widely used forecasting model for physiological monitoring. As reviewed in Chapter 2, a few important historical trenddetection methods such as Trigg’s Tracking Signal [139] and Hope’s Patient Condition Factor [62] are all based on the EWMA model. The forecasting model used in the proposed method is based on the standard EWMA model, or the ARIMA(0,1,1) model. The measurement yt for a physiological variable is assumed to be an EWMA of the historical data x ˆt contaminated by noise η as below: x ˆt = λyt−1 + (1 − λ)(ˆ xt−1 ); yt = x ˆ t + ηt ηt ∼ N (0, δ 2 ) (4.1) The noise ηt is assumed to be independent and identically distributed over time during a steady period and to follow a Gaussian distribution with 34 4.1. Method zero mean and variance δ 2 . The forgetting parameter λ∈(0 1) controls the influence of historical data on the signal predictions xˆ. The magnitude of λ adjusts the tradeoff between the robustness of detection and speed of tracking. A small λ increases the magnitude of prediction deviations for trend changes, but it also reduces the speed at which model predictions adapt to a new plateau. The predictions generated using this model deviate from the measurements during an increase or decrease, while automatically following the signal into a plateau phase. After detrending, trend changes are translated into non-zero levels shift in the forecast residuals. The incremental rate and direction of trend changes in the original signal are reflected in the level and direction of the level shifts. 4.1.2 Cusum-based multilevel change detection The one-sided Cusum statistics C ± are calculated as in Equation (3.7) [10]. As pointed out in Section 3.1.3, C ± is related with the V-mask testing and can be used as an indicator of change certainty. Comparing the two one-sided Cusum C ± with different thresholds can detect trend changes of different significance. In this chapter, C ± is compared with three thresholds in each + + − − − direction, with h+ 0 <h1 <h2 for detecting increases and h0 >h1 >h2 for detecting decreases, to detect trend changes of two different levels of certainty. A level-1 minor increase (decrease) is detected when a C + (C − ) crosses h+ 1 + − (h− ) from h (h ). A level-2 major increase (decrease) is detected when 1 0 0 − + − C + (C − ) crosses h+ 2 (h2 ) from h1 (h1 ). If the Cusum statistics in both + directions fall within [h− 0 h0 ], the signal has reached a new plateau. The Cusum test was originally designed for applications where a detected change would demand a prompt action to reset the system to a normal level. In the case of intraoperative patient monitoring, it is neither necessary nor realistic to interrupt surgery and return the patient’s physiological state to the original level. However, allowing the detected trend to continue introduces two problems for change detection. First, the influence from a detected change remains in the Cusum statistics and reduces the detection sensitivity in the future. Second, the oscillations around a detected trend can trigger repetitive detections. In practice, clinicians usually ignore the measurements recorded a long time ago (empirically set to T sampling intervals in this chapter) when evaluating the current trend dynamics. It is also assumed in this chapter that the measurements before the starting point of the most recently detected trend change has little influence on the current detection process. In order 35 4.1. Method to remove the influence of historical measurements, a length-constrained V-mask is used in the Cusum test. For example, the length of the upper arm of the V-mask for detecting decreases should be no longer than min(T, t−t∗ +1), where t∗ is the starting point of the most recent increase (see Figure 3.2 for how to locate t∗ ). This is equivalent to resetting the Cusum C − at max(t∗ , t−T ) to zero and recalculating its values between max(t∗ , t−T ) and t. To reduce redundant detections, the continuous extension of a detected trend is ignored. Repetitive detections due to oscillations around a threshold can be suppressed by comparing the starting locations of two successive changes in the same direction. If the starting locations of both changes fall within a short window of a predefined length (T /10 sampling intervals, for example), the second change is viewed as an extension of the first one, and therefore is ignored. 4.1.3 Abruptness of trend changes The time delay of detection, i.e., the distance between t∗ and t in Figure 3.2 characterizes the abruptness of the detected trend. If this distance is sufficiently short (less than T /10 sampling intervals in this chapter), then the trend can be recognized as an abrupt change. 4.1.4 Alert management A detected trend change does not necessarily demand immediate attention. Detected trend changes can be combined with other information to determine whether an alert should be triggered. Heuristics below are used to suppress uninformative alerts: A: For most abrupt changes, a level-1 and a level-2 changes are often generated immediately one after the other. To avoid repetitive alerts in such a situation, the level-1 abrupt change is delayed for a short period (T /10 sampling intervals). If a level-2 abrupt change of the same direction is detected within this period, then the level-2 alert is displayed and the level-1 change is discarded. B: To reduce false alerts for short-peak artifacts, the alert for an abrupt change is held for a short period (T /10 sampling intervals). If an abrupt change in the opposite direction is detected within this period and the signal level returns to within 2δ around the initial values, where δ is the standard deviation of measurement noise, then the two 36 4.2. Results changes are recognized as a short-peak artifact and the alerts for both changes are suppressed. C: For some variables, trend variations are relevant only when the signal level is within a particular range (critical range). If a change is detected outside the critical range, the alert is delayed until the signal enters this range. The alert is ignored if the signal does not enter this range before changing its direction. The critical range is predetermined in this chapter, but can also be adapted to the real time data. 4.2 Results The performance of the proposed EWMA-Cusum method was tested using the trend signals of end-tidal carbon dioxide (EtCO2 ), peak airway pressure (Ppeak), end-tidal minute volume (MVexp), and noninvasive mean blood pressure (NIBPmean) from 40 cases. Rules A and B were applied to all the variables, and Rule C was used for Ppeak in ventilated patients. The critical range for Ppeak was set to be above 15cmH2 O for increases or below 10cmH2 O for decreases. The forgetting length T was set to 3 sampling intervals (9 minutes at the sampling rate of 3 minutes) for NIBPmean and 48 sampling intervals (4 minutes at the sampling rate of 5 seconds) for EtCO2 , MVexp, and Ppeak. A fifth-order median filter (see Section 8.1.1.2 for details) was used to remove transient artifacts in EtCO2 , MVexp, and Ppeak, and a third-order filter was used on NIBPmean. The 40 cases were divided into a training group and a test group, each containing 20 cases, and spanning approximately 25 hours of anesthesia in total. Each trend signal was firstly filtered using a two-point moving average filter. The noise variance δ 2 for each physiological variable was set to the mean-square-deviation of the filtering residuals averaged over all the annotated steady segments. The forgetting parameter λ was set empirically by investigating the forecast residuals for the steady segments. With the chosen forgetting parameter, the mean absolute residual over a short period after the occurrence of a steady segment, averaged over all the annotated steady segments, should be less than 2δ. The short averaging period is set to 2 minutes for EtCO2 , Ppeak and MVexp (24 data points at the sampling rate of 5 seconds) and 9 minutes for NIBPmean (3 data points at the sampling rate of 3 minutes). The thresholds ±h1 and ±h2 control the change-detection performance. The magnitude of the level-2 certainty thresholds h2 was set to a level so that the resulting false detections are less than 5% of the annotated significant 37 4.2. Results changes in the training group. h1 was set to h2 /2, and h0 was set to h2 /5. Following parameter tuning, the EWMA-Cusum method was tested on the remaining 20 cases. Two anesthesiologists participated in the performance evaluation. The two-level alerts generated by the method were displayed on the unfiltered data and visually inspected by the anesthesiologists. They first evaluated the detection results independently, including the alert location, direction, and level of certainty. When the anesthesiologists’ opinions about a particular alert differed, these discrepancies were discussed, and the results agreed upon were used for performance evaluation. The anesthesiologists were also asked to annotate the changes of level-1 or level-2 significance that were missed by the method. 4.2.1 Example case An EtCO2 trend signal from a pediatric patient is used here to demonstrate how the method works and to show the results of each step (Figure 4.1). The changes were displayed at the detection locations. The Cusum test as a robust change detector successfully avoided detecting the short-peak artifacts during intubation, but falsely detected the short-peak artefact around t=80. An increased test threshold could avoid this false detection, but would results a missed level-2 decrease at t=293. Using Rule B (see Plot 4 and Plot 5), the short-peak artifacts at t=80 was recognized and suppressed without missing the decrease at t=293. The level-1 alert generated at t=40 for an increase (see Plot 2) was very close to the level-2 alert generated at t=43 for the same increase, due to the abruptness of this change. A similar situation was also observed around t=92. Rule A is used to suppress the unnecessary level-1 alerts in these situations (see Plot 4 and Plot 5). For the long gradual decrease after t=200, the level-1 alert is a very useful early warning. 4.2.2 Performance evaluation In the 20 validation cases, 55 level-1 and 33 level-2 alerts were generated for EtCO2 , 65 level-1 and 38 level-2 alerts for MVexp, 26 level-1 and 21 level-2 alerts for Ppeak, and 58 level-1 and 48 level-2 alerts for NIBPmean. The alerts produced by the EWMA-Cusum method were sorted into four types according to the correctness of the location, direction, and level of certainty: (1) alerts with the change direction and the certainty level both correctly detected, (2) alerts whose change direction was correctly detected but whose certainty level did not agree with the annotation, (3) alerts 38 4.2. Results Intubation Decrease in anesthetic concentration Return to spontaneous respiration End of surgery Plot 1: EtCO 2 measurements were contaminated by transient and short-peak artifacts due to intubation and patient movement. Plot 2: Transient artifacts were removed by median filtering. The signal was then detrended by subtracting the predictions from the filtered signal. C 251 Increase Plateau Decrease C 43 Rule A Rule A C 43 C 251 2 (%) Rule B Plot 3: The Cusums of the prediction residuals were tested against different Vmasks to determine the maximum rise distance that correspondeds to the certainty in Plot 4 for each direction. Plot 4: Three thresholds were used to in each direction to detect change points of two different levels of statistical significance and the occurrence of new plateaus. The algorithm detected level-2 increases at t=43 and 94, level-2 decreases at t=71 and 293, level-1 decreases at t=67 and 251, and a plateau at t=192. The increase and decrease in the large ellipse according to Rule B constitute a short-peak artifact. The level-1 change points in the small circles according to Rule A were redundant for the abrupt changes. Change points were also marked on plot 2. AlertsonEtCO Plot 5: The final 2-level alerts were only displayed for the relevant trend changes. Time (sampling interval: 5 seconds) Figure 4.1: The change-point detection and alert management process illustrated on an EtCO2 trend signal 39 4.3. Discussion with falsely detected change direction were false alerts, and Missed alerts. Missed alerts refer to the level-2 change points that were annotated by the anesthesiologists but missed by the method. The missed level-1 annotations were not treated as missed alerts, because missing an early notification was considered to be insignificant in this study. The alerts in category (1) and (2) were both considered to be true positive detections. The whole segment between two successive level-1 change points annotated by the anesthesiologists was taken as a non-change segment (negative instance) and used for evaluating the false positive rate. The performance of the proposed method was evaluated by comparing the detection results in the 20 test cases with the expert annotations (see Table 4.1). The detection results and the annotations are listed in Tables 4.1-(a) and (b) respectively. In Table 4.1-(c), the detection results are compared with the expert annotations in the same category. The true positive rates RT P L1 and RT P L2 evaluate the correctness of the results in both the change direction and certainty level, indicating the detection accuracy in a strict sense. RT P (L1+L2) evaluate the correctness of the results in the trend direction. The direction of over 89% of the annotated changes were correctly identified (see RT P (L1+L2) in Table 4.1-(c)), with the highest rate being 95.4% for NIBPmean. The certainty level of over 92% of the true positive detections in EtCO2 and NIBPmean agreed with the anesthesiologists’ evaluations (see NT ypeA /NT P Table 4.1-(c)). Both the false positive rate and the rate of missed changes were minimal. 4.3 Discussion The EWMA model is widely used for forecasting physiological trend signals, as it intuitively describes the temporal responses of most physiological variables to a clinical event. The simplicity of the model structure provides great robustness for trend-change detection. The EWMA-Cusum method can be used for monitoring most physiological trend signals routinely recorded in the clinical environment. An effective clinical monitoring method should utilize clinicians’ expertise together with statistical signal processing techniques to make decisions. In this study, heuristics about a signal’s temporal shapes are used after a statistical test to differentiate short-peak artifacts from clinically relevant trend changes. The detection results on clinical data demonstrate that appropriately configured heuristics can significantly improve the performance 40 4.3. Discussion Table 4.1: Detection results of the Cusum-based 2-level alert method in 20 clinical cases (a) Detection results NT ypeAL1 NT ypeAL2 EtCO2 MVexp Ppeak NIBPmean 48 57 20 52 29 32 16 44 NT ypeA NT P NF P NF N 77 89 36 96 83 102 45 104 5 1 2 2 3 4 1 3 NT ypeAL1 (NT ypeAL2 ): number of the level-1 (level-2) alerts with both the direction and certainty level correctly detected; NT ypeA : NT ypeAL1 + NT ypeAL2 ; NT P : number of the true positive alerts with the direction correctly detected; NF P : number of alerts of both certainty levels with the direction falsely detected; NF N : number of missed level-2 change points. (b) Expert annotations EtCO2 MVexp Ppeak NIBPmean NP L1 NP L2 NP (L1+L2) ˜N N ˜P N 56 72 27 60 36 42 22 49 92 114 49 109 56 72 27 60 36 42 22 49 NP L1 (NP L2 ): number of the level-1 (level-2) annotated changes; NP (L1+L2) : NP L1 +NP L2 ; ˜N : number of negative instances during which no alert should be given; N ˜ NP : number of significant trend changes. (c) Performance evaluation EtCO2 MVexp Ppeak NIBPmean RT P L1 RT P L2 85.7% 79.1% 74.0% 86.6% 80.5% 76.2% 72.7% 89.8% NT ypeA /NT P RT P (L1+L2) RF P 92.8% 87.3% 80.0% 92.3% 90.2% 89.4% 91.8% 95.4% 8.9% 2.3% 7.4% 3.3% RF N 8.4% 9.5% 4.5% 6.1% RT P L1 : NT ypeAL1 /NP L1 , true positive rate of level-1 change detection; RT P L2 : NT ypeAL2 /NP L2 , true positive rate of level-2 change detection; RT P (L1+L2) : NT P /NP (L1+L2) , the overall true positive rate; ˜N , false positive rate; RF N : NF N /N ˜P , false negative rate. RF P : NF P /N 41 4.3. Discussion of this method. However, the application of heuristics introduces extra time delays. Therefore the heuristics should only be used when necessary. The EWMA-Cusum method is similar to an early-stage change-detection method proposed by Stoodley and Mirnia [131], in which the Cusum test is also used with a forecasting model for trend-change detection. In Stoodley’s method, however, the signal is modeled as the EWMA of the historical data plus a piecewise constant incremental rate estimated from the previously detected trend. The advantage of inclusion of an adaptive incremental rate to the standard EWMA model is that, the predictions follow the detected trend changes automatically and repetitive detections are therefore avoided. However, the adaptation process also makes future performance depends heavily on the accuracy of current detection. A missed or falsely detected trend could deteriorate the performance for a long time. The test results demonstrate that this method has great potential for clinical use. However, it should be noted that the raters were in full perception of the detection results during data annotation. Assessment of the results may thus have been susceptible to rater bias. The use of a two-level alert system alleviates the typical tradeoff between specificity and sensitivity in the single-level alert method. By introducing thresholds of a lower certainty, the two-level alert method can detect changes that would be missed by the single-level alert method. Although the twolevel method may also detect minor changes that would be considered as insignificant by clinicians, these extra minor detections, if displayed appropriately, may have much less negative effect on the user compliance of a monitoring system than false alarms. The display of multi-level alerts is an important topic that requires further study. In clinical practice, trend changes with similar incremental rates and durations may have different degrees of clinical relevance. A signal’s local variability and the shape of the preceding trend segment have great influence on the clinical significance of the current measurement. However, the EWMA-Cusum method proposed in this chapter assumes constant secondorder signal characteristics for all the patients during surgery. The false detections due to this unrealistic assumption can be avoided by adapting the change detection process to a signal’s intraoperative variability online. Chapter 5 and Chapter 6 will be focused on adaptive trend monitoring. 42 Chapter 5 Adaptive trend change detection based on the linear growth dynamic model Chapter 4 has demonstrated the potential of the EWMA model and the Cusum test for detecting changes in the trend direction. However, the performance of the EWMA-Cusum method proposed in Chapter 4 is limited due to its unrealistic assumption of a constant second-order signal property. This chapter addresses the challenge of intraoperative and inter-patient variability, with a belief that the trend-change detection performance can be improved by adapting the detection process to a signal’s second-order statistical characteristics. The signal model used in this chapter is implemented in two steps. Firstly, the linear growth Dynamic Linear Model (DLM) describes both the signal level and the incremental rate as dynamic processes. The timevarying characteristics of a physiological trend signal is reflected in the linear growth DLM as unknown parameters, which can be estimated online using an adaptive Kalman filter. Secondly, an EWMA filter is used to smooth the predictions generated by the DLM. Since the signal level and incremental rate are both modeled as dynamic processes, the DLM can be used for tracking changes in both the signal level and trend speed. In this chapter, a statistical test based on the Cusum of forecast residuals is used with this two-phase model to detect changes in the trend direction. There are two prerequisites for the application of this method. First, as in the EWMA-Cusum method (Chapter 4), signals under investigation should not be heavily contaminated with artifacts, or artifacts have been effectively reduced. In addition, the signal’s second-order property, if varies, must vary at a much slower rate than the signal’s sampling rate. Most of the physiological variables measured per 1 or 5 seconds in the current clinical environment satisfy the requirement. The method was tested on the clinical data of HR, EtCO2 , MVexp, and RR. As an example, the performance on 43 5.1. Method HR is compared with that of the Trigg’s Tracking Signal to demonstrate the advantage of this adaptive monitoring method. The work in this chapter has been published in IEEE Transactions on Biomedical Engineering [161] and Proceedings of the 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society [163]. 5.1 5.1.1 Method Linear growth dynamic linear model Some physiological trend signals can be treated as piecewise linear and described by the linear growth DLM proposed by Harrison and Stevens [57]: ςt 1 1 ςt−1 νt = + ζt ∼ N (0, Qt ) b b w 0 1 t t−1 t (5.1) xt xt−1 F ζt 1 0 y = [ ] x + η η ∼ N (0, R ) t t t t t H where the real signal level ςt and its incremental rate bt are subject to system disturbances ζt , and the signal measurement yt is subject to measurement noise ηt . The two system-disturbance elements νt and wt in ζt , and the measurement noise ηt are independent of one another, and they are all independent over time and follow the Gaussian distributions. Both ηt and ζt vary greatly between patients and during surgery. Therefore ηt and ζt , in addition to the system state xt , are considered unknown in this chapter. How to solve this dual-estimation problem online represents a challenge. The linear growth DLM tends to be over-sensitive when used for trendchange detection [63]. That is to say, the predictions follow signal variations too rapidly, resulting in a short-lived non-zero shift in the forecast residuals when a trend change does occur. As pointed out in Chapter 3, short-lived residuals are difficult to detect in the presence of artifacts. The proposed adaptive trend monitoring method is realized in four steps (see Figure 5.1). In the first step, the signal level, the incremental rate, and the noise covariances Q and R are estimated using an adaptive Kalman filter. Then, to prolong the effect of trend changes in forecast residuals, an EWMA filter is used to smooth the predictions generated by the linear growth DLM. Q and R estimates are used to adjust the thresholds in the change-detection test. This test divides the signal into a series of straight lines. Finally, the 44 5.1. Method segments of the same trend direction are combined and change points are located. This method is referred to Adaptive-DLM method in this thesis. Figure 5.1: Schematic of the Adaptive-DLM trend-change detection method 5.1.2 Adaptive Kalman filter The noise variance Q and R, and the system state x are estimated using an adaptive Kalman filter. The adaptive Kalman filter consists of a filter, a fixed-point smoother, a process that updates Vk−1,k|t (see Equation (5.5) for the definition) and a recursive Q-R estimator (see Figure 5.2). The Kalman filter is a recursive filter that estimates the system states of a state-space model from a series of noisy measurements. According to the location of estimation k relative to the current instant t, the estimator is classified as smoother for k<t, filter for k=t, and predictor for k>t. In this chapter and the chapters that follow, the subscript k|t indicates that a variable at k is estimated conditionally on the observations y1...t ; the subscript t|t is simplified as t. The operators filter and smoother defined below represent the standard Kalman filtering and fixed-point smoothing processes respectively. These operators will also be used in Chapter 6 and Chapter 8. Filter This operator represents the standard Kalman filtering process [92]: (ˆ x t , V t , Kt , x ˆt|t−1 , Vt|t−1 , et , Vt,t−1|t ) = f ilter(ˆ xt−1 , Vt−1 , Θt , yt ) where Θt ={Ft , Ht , Qt , Rt , µt } represents the model parameters in effect at time t. µt is the mean of the system disturbance ζt . In the DLM in Equation (5.1), Ft =F , Ht =H and µt =0. The only time-varying parameters are Qt and Rt . 45 5.1. Method Figure 5.2: Adaptive Kalman filtering process Maximization algorithm EM: Expectation- The filtering operation is carried out in two steps. With the initial estimates x ˆ0 and V0 , the one-point-ahead prediction x ˆt|t−1 and its covariance Vt|t−1 are first calculated as below: x ˆt|t−1 = Ft x ˆt−1 + µt Vt|t−1 = Ft Vt−1 FtT + Qt (5.2) where Vt|t−1 =E{(xt −ˆ xt|t−1 )(xt −ˆ xt|t−1 )}. Then the innovation et , the innovation covariance St , the Kalman gain Kt , the estimate x ˆt , the estimate covariance Vt , and Vt,t−1|t are updated: et = y t − H t x ˆt|t−1 St = Ht Vt|t−1 HtT + Rt Kt = Vt|t−1 HtT St−1 x ˆt = x ˆt|t−1 + Kt et Vt = Vt|t−1 − Kt St KtT Vt,t−1|t = (1 − Kt Ht )F Vt−1|t−1 (5.3) where Vt = E{(xt − x ˆt|t ), (xt − x ˆt|t )} and Vt,t−1|t = E{(xt − x ˆt|t−1 ), (xt − x ˆt|t )}. Fixed-point smoother The standard fixed-point Kalman smoothing process [126] is represented as: (ˆ xt|t , Vk|t , Vt,k|t−1 ) = smoother(ˆ xk|t−1 , Vk|t−1 , Vt−1,k|t−2 , Kt−1 , et , Vt|t−1 , Θt ) 46 5.1. Method For every point in the smoothing window k∈(t−T +1 . . . t) (T is the window length), the fixed-point smoother updates the system estimates xˆk|t , its covariances Vk|t , and Vt,k|t−1 . The smoothing process is carried as Vt,k|t−1 = Vt−1,k|t−2 (Ft − Ft Kt−1 H)T Kk|t = Vt,k|t−1 H T (Rt + HVt|t−1 H T )−1 x ˆk|t = x ˆk|t−1 + Kk|t et T Vk|t = Vk|t−1 − Vt,k|t−1 H T Kk|t (5.4) where Vk|t =E{(xk −ˆ xk|t )(xk −ˆ xk|t )} and Vt,k|t−1 =E{(xt −ˆ xt|t−1 )(xk −ˆ xk|t−1 )}. Vk−1,k|t updating In addition to the standard fixed-point Kalman smoothing, Vk−1,k|t , defined as Vk−1,k|t =E{(xk −ˆ xk|t )(xk−1 −ˆ xk−1|t )}, is also estimated recursively: V(k−1,k)|t = V(k−1,k)|t−1 − Va (t, k)H T Ka (k − 1, t)T where Va (t, k) = Va (t − 1, k)(F − F Kt−1|t−1 H)T Ka (t, k) = Va (t, k)H T (Rt + HVt|t−1 H T )−1 (5.5) (5.6) with initial condition Va (t, t)=Vt−1,t|t (see Appendix I for the proof). As shown in Figure 5.2, the operators defined above are used to estimate the system states for the previous T points. These estimates are used in the following to estimate Q and R online. Q and R estimation In the DLM, the measurements y0...t and system states x0...t carry information about the unknown model parameters Q and R. The Q and R estimates should maximize the likelihood of x0...t and y0...t . However, since x0...t are unknown, their estimates x ˆt−T +1...t|t generated by the Kalman smoother are used instead, leading to a so-called ExpectationMaximization (EM) algorithm. The EM algorithm is a two-step recursive method. When used for online estimation, the E-step calculates the expectation of the log-likelihood of (Qt , Rt ) conditional on the measurements y0...t and the previous estimates ˆ t−1 , R ˆ t−1 ), with respect to the hidden system states x0...t ; the M-Step (Q maximizes this expectation with respect to Q and R to derive new estimates for the next iteration at time t+1. The process is summarized as: ˆ t−1 , R ˆ t−1 )} E : f (Q, R) = E{log Pr(x0...t , y0...t |(Q, R)|y0...t , (Q (5.7) ˆ ˆ M : (Qt , Rt ) = argmax f (Q, R). Q,R 47 5.1. Method The influence of the historical measurements on the Q and R estimates decays over time. Therefore yt−T +1...t and xt−T +1...t that fall in the smoothing window of the Kalman smoother, instead of all the previous data, are ˆ t and R ˆ t are updated recursively online as used for updating Q and R. Q ˆ t = 1 {(t − 1) · Q ˆ t−1 + Lt|t + t−1 γk|t } Q k=t−T t ˆ t = 1 {t · R ˆ t−1 + Vt|t + t−1 χk|t } R k=t−T t+1 (5.8) where L and V represent the influence of the forward filtering, and γ and χ represent the influence of the backward smoothing (see Appendix I for details). ˆ and R, ˆ the filtering innovations et can be treated With the estimated Q as white Gaussian noise with zero mean. The dynamic process of the systemstate estimates can be written as below [50]: ˆ t + HVt|t−1 H T )KtT ). x ˆt = F x ˆt−1 + Kt et ωt ∼ N (0, Kt (R ωt 5.1.3 (5.9) Wt EWMA of the DLM predictions Given the Kalman estimate x ˆt at time t, the system state for any future instant j, (j>t), can be calculated as x ˆj|t = F j−t x ˆt . (5.10) The multi-point-ahead predictions above result in multiple prediction values for every future instant. The predictions for time j conditional on observations up to different historical instants, i.e., x ˆj|0 . . . x ˆj|t , are averaged using an EWMA filter with a forgetting parameter λ∈(0 1). The averaged prediction x ˜j|t , initialized with x ˜j|0 = x ˆj|0 , can be calculated as below: x ˜j|t = λˆ xj|t + (1 − λ)˜ xj|t−1 1 t j−1. (5.11) The averaging process described above can be realized recursively. Given x ˜t−1|t−2 at time t−1, the prediction for time t is calculated as below: x ˜t|t−1 = λˆ xt|t−1 + (1 − λ)F x ˜t−1|t−2 . (5.12) Similarly, V˜t|t−1 , the covariance of the prediction x ˜t|t−1 , initialized with ˜ V1|0 = W0 , can also be calculated recursively as below: V˜t|t−1 = (1 − λ)2 F V˜t−1|t−2 F T + Wt . (5.13) 48 5.1. Method The filtered predictions for the measurements from t−Tf +1 to t are: [˜ yt−Tf +1|t−Tf . . . y˜t|t−1 ] = H[˜ xt−Tf +1|t−Tf . . . x ˜t|t−1 ]. (5.14) The standard deviation of the sum of these predictions is: t ht = ST D( (˜ yj|j−1 )) = ˜Q ˜ tH ˜T H (5.15) j=t−Tf +1 ˜ ˜ t represents the covariance of [˜ where H=[H . . . H] and Q xt−Tf +1|t−Tf . . . x ˜t|t−1 ] |m−n| ˜ ˜ V(t−|m−n|)|(t−|m−n|−1) , 1 m, n Tf . with Q2m−1:2m,2n−1:2n = ((1−λ)F ) ˜ t is determined by the Q and R estimates, ht is therefore adaptive. Since Q 5.1.4 Adaptive change detection The Cusum of the differences between the EWMA predictions y˜=H x ˜t|t−1 and the Kalman estimate yˆ=H x ˆt is tested against a threshold to decide whether a change has occurred. The cumulation window for the Cusum statistic C(n, t) is set to be no longer than the previous change point t∗i−1 with a maximum length of Tf : C(n, t) = t yj j=t−n (ˆ − y˜j|j−1 ) n min(t − t∗i−1 , Tf )−1. (5.16) C(n, t) accumulates the prediction residuals from t−n + 1 to t. The test thresholds are determined by testing the hypothesis that no change has occurred within the cumulation window [t−Tf +1 . . . t]. Hence the test thresholds are scaled to the standard deviation of the sum of the predictions in the cumulation window (see Equation (5.15)), i.e., h= ± σ×ht , where σ is used to adjust the detection sensitivity. A change point is detected when C(n, t) has broken either limit before reaching the end of the cumulation window. 5.1.5 Trend patterns A trend signal is segmented by the series of detected change points. Each segment is described as steady, increasing or decreasing by evaluating its average incremental rate. Successive segments of the same direction are combined, and only the initial change point triggers an alert. In summary, this method has six parameters: the initial Q0 and R0 , and the smoothing window size T in the adaptive Kalman filter, the cumulation window size Tf , the forgetting parameter λ in the EWMA predictor, and the sensitivity control σ in the Cusum test. 49 5.2. Test on heart rate trend signals 5.1.6 Trigg’s tracking signal Trigg’s tracking signal [139] has been previously used for change detection in patient monitoring (see Section 2.2). TTS is the ratio between the EWMA of forecast residuals and the EWMA of absolute forecast residuals: st = (1 − υ)et + υst−1 Mt = (1 − υ)|e|t + υMt−1 (5.17) Tt = st Mt where e is the model-forecast residuals and υ (0–1) is the forgetting parameter of TTS. The magnitude of TTS is related to the significance of change detection. Change points are identified when the TTS crosses the thresholds ±hT T S from zero (see Figure 5.5). A change point will be ignored, if it has the same trend direction with the previous change point and the TTS statistics between the previous change point and the current change point are consistently positive or negative. 5.2 Test on heart rate trend signals Heart rate is an important physiological variable that demands constant monitoring during surgery. Astute anesthesiologists often rely on subtle changes in the HR trend to evaluate the adequacy of depth of anesthesia as well as fluid administration. A seasoned anesthesiologist can perceive changes in HR of three to four beats per minute from the auditory signal [89]. The HR trend is usually sampled every 1 or 5 seconds during surgery, and can be described using the linear growth DLM. The trend signals of HR from the 40 cases used in Chapter 4 were used here to test the performance of the Adaptive-DLM method. The signal is pre-filtered using a third-order median filter (see Section 8.1.1.2 for details) before the test. The clinically relevant decreases and increases in each case were annotated by two expert anesthesiologists independently using eViewer (see Figure 3.3). The significant changes marked by different raters falling within a two-minute window and of the same direction were treated as agreed upon annotations, and the mean location of this pair was used as the reference location. Following this criterion, the two sets of annotations showed a high ratio of agreement [7]. Besides the agreed-upon events, 3% and 6% additional events were marked by each expert. The final inter-rater agreed change points were decided by discussion between the two experts. The proposed method was tested in four steps: (1) verifying the accuracy of the Q and R estimation algorithm using simulated signals; (2) tuning the 50 5.2. Test on heart rate trend signals initial values Q0 , R0 and the smoothing window size T using the training data; (3) testing the proposed Adaptive-DLM method and the TTS approach using test data and comparing their performances in detecting both increasing and decreasing changes; (4) analyzing the influence of the cumulation window size Tf and sensitivity control σ on the performance of the Adaptive-DLM method, and comparing the performance of the AdaptiveDLM method in detecting increases to that of intraoperative detection. The purpose is to compare the performance of the Adaptive-DLM method to the best performance of attending clinicians. Since only two decreases were detected intraoperatively in the all the test cases, the decreasing instances were not included in Step (4). The expert annotations were used as the reference to evaluate the detection results. Any change detected less than 2Tf sampling intervals after the expert annotation and showing the same direction was considered correct. The tolerable delay was extended to 3Tf points for the TTS approach, as a longer delay was expected. As in Chapter 4, the segment between two successive annotated changes points represents a negative instance. 5.2.1 Q and R estimation The results of Q, R estimation are illustrated in Figure 5.3 with a two-phase simulated signal. The signal is composed of a 500-point segment generated using the DLM with Q0 =[10 0; 0 0.1], R0 =64, followed by the second 500-point segment generated with Q0 =[5 0; 0 0.1], R0 =16. Starting with arbitrary initial values Q0 =[1 0; 0 0.01], R0 =1, with T =30, the estimates converged within the 50% intervals of the first set of true values after less than 50 steps, and then oscillated around the true values until t=500. From then on, the true values of Q(1, 1) and R was dramatically changed while Q(2, 2) remained constant. The Q, R estimates smoothly followed the changes, and converged around the new true values after 50 iterations. The adaptive Kalman filter was tested on 20 simulated signals, each composed of one thousand samples. When initialized with values with more than 100% deviation it took 43 steps for Q(1, 1), 46 steps for Q(2, 2), and 102 steps for R (3.6, 3.8, and 8.6 minutes respectively at the sampling rate of 5 seconds) to converge within the 30% interval around the true values. No deterministic oscillation was observed after the initial steps. 51 5.2. Test on heart rate trend signals Signal 200 Q=[10 0; 0 0.1] R=64 t = 500 Q=[5 0; 0 0.1]R=16 100 Q(2,2) Q(1,1) 0 20 10 0 0.2 0.1 R 0 100 50 0 0 50 250 550 750 t 1000 Figure 5.3: The results of Q and R estimation for a simulated trend signal 5.2.2 Parameter tuning Q and R estimation was applied to the training data with the Kalman smoothing window T =30. The Q and R estimates of the 50 iterations before the end of surgery were averaged to get final estimates for each case. Then the final Q and R estimates of all the training cases were averaged, and the resulting values Q0 (1, 1)=1.18±73.1%, Q0 (2, 2)=0.092±64.0%, and R0 =0.23±100.3% were used as initial settings in the further test. The forgetting parameter used in the EWMA filter was set empirically by investigating the forecast residuals for the steady segments. With the chosen forgetting parameter, the absolute difference between the Kalman estimates and the EWMA predictions, averaged over the first 2 minutes of every steady segment (24 data points at the sampling rate of 5 seconds) and then averaged over all the annotated steady segments, should be less than two times of the standard deviation of measurement noise. The forgetting parameter υ in the TTS approach was set in a similar manner. The threshold for TTS hT T S was set to minimize (sensitivity 2 + (1 − specif icity)2 ). The TTS results were smoothed using a 20-point moving average filter. 52 5.2. Test on heart rate trend signals 5.2.3 Comparison with Trigg’s tracking signal CUSUM Testing Heart Rate (beats/min) Figure 5.4 demonstrates how the Adaptive-DLM method detected change points and determined the trend directions according to the magnitude of incremental rate. As seen in the top two plots, the signal predictions and the testing limits both vary as the signal trend evolves, especially when a change is detected and the EWMA process resets. 140 120 increasing steady decreasing target values real data 100 80 200 100 CUSUM change points upper limit 0 −100 Slope 4 2 lower limit slope threshold 0 820 830 840 850 860 870 Time (interval:5s) 880 890 900 Figure 5.4: An example of how the Adaptive-DLM method detects change points in the heart rate trend signal, with Tf =10, σ=0.7. The signal predictions and the testing limits both vary with the signal trend. The best performance of the Adaptive-DLM method with Tf =20 was compared with that of the TTS approach. Both approaches were tested repeatedly on the test data, as σ or hT T S increased from 0.1 to 1.0. The optimal configurations were: σ=0.5 for the Adaptive-DLM method and hT T S =0.7 for the TTS approach. 53 5.2. Test on heart rate trend signals (a) Signal A Heart Rate 110 100 90 80 1 0.7 TTS 0.5 0 − 0.7 −0.5 −1 800 850 900 950 1000 1050 Time (interval:5s) 1100 1150 1200 (b) Signal B Heart Rate 160 140 120 100 80 1 0.7 TTS 0.5 0 −0.5 −1 Top Plots: − 0.7 0 100 increasing, by annotation decreasing, by annotation 200 300 Time (interval:5s) 400 increasing, by our algorithm decreasing, by our algorithm 500 increasing, by T index decreasing, by T index Figure 5.5: Detection results of the proposed Adaptive-DLM method and the TTS approach on two example signals. Signal A: mean level=98.2, Q=[0.95 0; 0 0.13], R=0.37; The Adaptive-DLM method and TTS both correctly detected the change points without introducing false positive results. Signal B: mean level=138.3, Q=[0.26 0; 0 0.036], R=0.16; The AdaptiveDLM method correctly detected the change points; TTS introduced 7 false positive detections. 54 5.2. Test on heart rate trend signals Figure 5.5 shows the detection results of the proposed Adaptive-DLM method and the TTS approach on two example signals. With the optimal conditions, the Adaptive-DLM method correctly detected all the change points in both examples with no false positive detections. In Signal A (Figure 5.5-(a)), the TTS approach detected the same number of true positive and false positive changes as the Adaptive-DLM method. In Signal B (Figure 5.5-(b)), the changes after t=100 are irrelevant, since the increase just before t=100 has a dramatically large amplitude. However, the TTS for these irrelevant variations are all close to 1, resulting in a series of false detections. This example demonstrated the typical limitation of the TTS approach. As pointed out in Section 2.2, TTS often produces false detections for changes with a long duration but a negligible amplitude. The delay of the TTS results in both examples are longer than that of the proposed Adaptive-DLM method. The detection results of the TTS approach with hT T S =0.7 and the proposed Adaptive-DLM method with σ=0.5 are compared in Table 5.1. In Table 5.1, Q denotes the mean of Q estimates averaged over the whole case; R denotes the mean of R estimates averaged over the whole case. The 20 test cases were classified into three groups according to the magnitude of Q and R relative to the initial Q0 and R0 respectively: (1) with many abrupt changes and a high degree of noise; (2) with many abrupt changes and a low degree of noise; (3) and with gradual changes and a low degree of noise. It shows that the performance of the Adaptive-DLM method was consistent over the three different groups. For groups 1 and 2, TTS achieved comparable results, while for group 3, it gave more false positive results. The average delay of the change points correctly detected by both methods was 10 points for the Adaptive-DLM method and 23 points for the TTS approach, i.e., the delay of the TTS approach was 65 seconds longer. 5.2.4 Sensitivity analysis To study the influence of the size of cumulation window Tf and the sensitivity parameter σ on the performance of detecting increases, three ROC curves, each with a fixed Tf =10, 20 or 30, were constructed by tracing the sensitivity and specificity as σ increased from 0.1 to 1 (see Figure 5.6). Expert annotated 47 increases in the test cases. The area under each ROC curve was calculated to evaluate the detection performance with the corresponding Tf . The AUC is 0.916 for Tf =10, 0.928 for Tf =20, and 0.882 for Tf =30. The attending anesthesiologists correctly identified 36 (77%) increases with 11 (23%) false positive results, 55 5.2. Test on heart rate trend signals Table 5.1: Change detection results of the Adaptive-DLM method and the TTS approach on three groups of HR trend signals (in percentage) Group Number 1 2 3 Overall TP FP TP FP TP FP TP FP the TTS approach 81.3 12.7 95.8 33.3 67.2 84.5 75.5 60.2 Our Algorithm 87.5 12.7 83.3 29.2 87.9 10.3 86.7 15.1 TP: true positive; FP: false positive; Q: average of Q estimates over the whole case; R: average of R estimates over the whole case. With r1=((Q(1, 1)/Q0 (1, 1))2 +(Q(2, 2)/Q0 (2, 2))2 )/2 and r2=R/R0 , the property of each group: (1) r1>1, r2>1, 4 cases, 16 events, 3712 points in total; (2) r1>1, r2<1, 7 cases, 24 events, 6024 points in total; (3) r1<1, r2<1, 9 cases, 58 events, 7960 points in total; No case with r1<1, r2 > 1. expert annotation 1.0 σ increasing Sensitivity 0.8 intraoperative detection 0.6 0.4 Tf=10, AUC=0.916 Tf=20, AUC=0.928 Tf=30, AUC=0.882 0.2 0 0.2 0.4 0.6 1−Specificity 0.8 1.0 Figure 5.6: ROC curves for trend-change detection on HR signals with different Cusum window sizes Tf , as σ increases from 0.1 to 1.0. The optimal operation point is: sensitivity=0.86 and specificity=0.89, which is obtained with σ=0.5, Tf =20. AUC: area under the ROC curve outperformed by the Adaptive-DLM method with σ of a wide range (around 0.45-0.65 for Tf =10 and 0.3-0.55 for Tf =20). The tradeoff between time delay and sensitivity was studied for Tf =10 and Tf =20. Each point in Figure 5.7 was obtained by calculating the average 56 5.3. Results on EtCO2 , MVexp, and Ppeak Table 5.2: Mean and Relative Standard Deviation of the Q, R estimates generated by the Adaptive-DLM method HR EtCO2 MVexp RR Q(1,1) 1.18 ± 73.1% 0.023 ± 156.3% 0.0021 ± 118.0% 0.020 ± 29.0% Q(2,2) 0.092 ± 64.0% 0.001 ± 97.1% 0.00059 ± 98.8% 0.016 ± 67.5% R R=0.23 ± 100.3% 0.0038 ± 153.5% 0.00042 ± 110% 0.013 ± 45.1% delay of the correctly detected increasing changes. It appears that N =20 resulted in a slightly longer delay than N =10. The delay for N =20 increased from around 8.2 points to 17.5 points, i.e., from 41 seconds to one minute and 27 seconds, as σ increased from 0.1 to 1.0, indicating that with a fixed window size, a reduced false positive rate resulted in a longer delay. 18 Delay (points) 16 Tf=20 14 Tf=10 12 10 8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Sensitivity Control σ 0.8 0.9 1.0 Figure 5.7: Time delay of the Adaptive-DLM method on the HR signals. 5.3 Results on EtCO2 , MVexp, and Ppeak The approach was also tested on EtCO2 , MVexp, and Ppeak following the same procedure as for HR [163]. Table 5.2 lists the means of the Q, R estimates and their standard deviations relative to the mean, obtained from the 20 training cases. For each variable, three ROC curves, each with a different window size of Tf =10, 20, or 30, were constructed (see Figure 5.8). Both the increasing and 57 5.4. Discussion Table 5.3: Area under the entire or part of the ROC curve of the AdaptiveDLM method EtCO2 (%) MVexp(%) RR(%) AUC AUPC AUC AUPC AUC AUPC Tf =10 90.3 25.3 96.2† 66.0§ 86.1 18.0 † § Tf =20 94.6 54.2 95.6 53.7 88.3 28.5§ Tf =30 93.0 44.1 91.1 27.9 90.1† 23.6 (1) The total number of annotated change points including both increases and decreases is 81 for EtCO2 , 69 for MVexp, and 51 for RR; (2) AUC: area under the entire ROC curve; AUPC: area under part of the ROC curve with sensitivity > 0.7 and specif icity > 0.7; (3) † : the largest AUC among different Tf ; § : the largest AUPC among different Tf . decreasing changes were included in the performance evaluation. The entire AUC and the Area Under Part of the Curve (AUPC) with true positive rate above 0.7 and false positive rate below 0.3 were calculated for each of the ROC curves. Table 5.3 lists the AUR and AUPC of interest of each ROC curve, in percentage of the corresponding coverage of a perfect ROC curve, i.e., 1.00 for AUR and 0.09 for AUPC. The optimal cumulation window sizes Tf found by the AUC and the AUPC are consistent for MVexp and EtCO2 . For RR, the AUR indicates Tf =30 is optimal but the AUPC indicates that Tf =20 is optimal. The optimal performance point, in terms of the distance to the left upper corner (0,1) of the ROC figure, is sensitivity=0.90 and specificity=0.01 for EtCO2 obtained with σ=0.9 and Tf =20, sensitivity=0.95 and specificity=0.90 for MVexp obtained with σ=1 and Tf =10, and sensitivity=0.80 and specificity =0.88 for RR obtained with σ=0.9 and Tf =30. 5.4 Discussion The large standard deviations of the Q, R estimates for HR, EtCO2 , MVexp, and Ppeak (see Table 5.2) demonstrate that Q, R estimates are not well concentrated. This confirms that fixed, empirically chosen parameters can be biased for many patients, and using fixed values could degrade the performance of a statistical change-detection method. The Adaptive-DLM method solved this problem by estimating the signal characteristics online. The EM estimation and EWMA prediction are both recursively realized; therefore, the computation overload is acceptable for online use. 58 5.4. Discussion (a) EtCO2 1.0 1.0 0.1 σ increasing Sensitivity 1.3 0.9 Tf=10 0.8 Tf=10 Tf=20 Tf=20 Tf=30 0 0.7 1.0 1−Specificity 0 Tf=30 0.1 0.2 0.3 (b) MVexp 1.0 1.0 0.3 σ increasing Sensitivity 1.5 0.9 Tf=10 0.8 Tf=10 Tf=20 Tf=20 Tf=30 0 0.7 1.0 1−Specificity 0 Tf=30 0.1 0.2 0.3 (c) RR 1.0 1.0 0.2 σ increasing Sensitivity 1.1 0.9 Tf=10 0.8 Tf=10 Tf=20 Tf=20 Tf=30 0 1−Specificity 0.7 1.0 0 Tf=30 0.1 0.2 0.3 Figure 5.8: ROC curves with different cumulation window sizes Tf . Left: entire ROC curves after spline smoothing; Right: enlarged areas of interest with sensitivity >0.7 and specificity >0.7. The largest area under curve is 94.6% obtained with Tf =20 for EtCO2 , 96.2% obtained with Tf =10 for MVexp, and 90.1% obtained with Tf =30 for RR. 59 5.4. Discussion The tests on HR demonstrated that the performance of the proposed Adaptive-DLM method is more consist between patients, compared with the TTS approach. The TTS approach tends to generate false detections for long variations with negligible amplitudes, since the mean absolute deviation and the mean deviation for a trend segment of long duration have the similar magnitude, no matter how small the change amplitude is. Hope in [62] addressed this limitation by multiplying the TTS with a weighting function related to the change amplitude. However, his solution introduced other problems (see Section 2.2 for details). The purpose of the Adaptive-DLM method is to detect changes in the trend direction. Signal segmentation as an intermediate procedure is performed according to the changes in both the incremental rate and signal level, therefore the statistical test used for segmentation does not reflect the degree of certainty of the final decision about the trend direction. The results of signal segmentation carry much information about other trend features in addition to the trend direction. More detailed descriptions about signal patterns, for example, “convex/concave increase”, can be derived from the results of signal segmentation. During data annotation, the anesthesiologists were blind to the detection results generated by the Adaptive-DLM method, therefore their annotations in this study were more objective than the annotations made on the detection results in Chapter 4. However, the post-op annotations can only be used as the gold standard if the annotations perfectly identify true physiological changes. Although the inter-rater agreement strategy was used to improve the reliability of annotation, any inconsistency between the annotations and the true clinically relevant changes may lead to biased results. 60 Chapter 6 Adaptive change detection and pattern recognition based on the generalized hidden Markov model In the previous chapters, the EWMA-Cusum method and Adaptive-DLM method have been proposed to detect changes in the trend direction. Although “direction” is the most important feature of a trend segment, in some clinical situations “incremental rate” and “duration” are also essential for evaluating the patient status. In this chapter, trend variations are sorted into several patterns according to these features and two detection methods are proposed to detect pattern transition online. As pointed out in the Discussion of Chapter 4, signal segments of a similar shape may have different degrees of clinical significance depending on the characteristics of the signal in the recent past. For instance, in Figure 6.1(a), the increase at t2 and decrease at t3 in the Non-Invasive Mean Blood Pressure trend signal are both significant changes, but the changes of similar temporal shapes at t2 and t3 in Figure 6.1-(b) are most likely insignificant, given the large amplitude and long duration of the preceding decrease. This is a typical manifestation of intraoperative physiological variability. For variables measured at a high sampling rate, the intraoperative variability can be handled by estimating the signal’s high-order characteristics online as in Chapter 5. However, some variables cannot be measured very frequently due to the restrictions of device design. For example, NIBP is measured during anesthesia by non-invasive occlusion of the brachial artery with a cuff. The systolic pressure is first detected by slowly releasing the cuff pressure until a distal pulse occurs. Diastolic and mean BPs are then measured by processing the turbulence in the artery. A measurement cycle takes 30 seconds. Since frequent inflation cause congestion of the arm, NIBP during surgery is generally measured at every 3 to 5 minutes. At such 61 Chapter 6. Adaptive change detection based on the GHMM NIBPmean (mm Hg) 80 (b) (a) 75 Significant Insignificant 70 65 60 t’1 t’2 t’3 t’4 ’ t1 ’ t2 ’ t3 ’ t4 Figure 6.1: Two example NIBPmean trend signals: similar patterns have different degrees of clinical significance a rate, a typical trend change is sustained only for 2-6 sampling intervals. The number of samples is not sufficient for the EM algorithm to estimate statistical properties in real time. Therefore, the Adaptive-DLM method proposed in Chapter 5 is not applicable to NIBP. Anesthesiologists usually recognize trend patterns visually, and use their expert knowledge about the transition between these patterns, together with the values of the most recent data to evaluate the probability of occurrence for every possible event. To mimic this cognitive process, a signal’s intraoperative variability should be modeled so that the information about the variability can be learnt from population data. The Hidden Markov Model (HMM) was initially introduced in the late 1960s and has been used in a wide range of applications to describe the transition between different states [105]. However, because each state in an HMM generates only one observation, the state duration follows a geometric distribution, which does not describe physiological trend signals. By allowing one state to generate a sequence of observations and to have an explicit duration distribution, the HMM is generalized to the Generalized Hidden Markov Model (GHMM), also known as the segmental hidden semi-Markov mode [98], or variable duration HMM [105]. The GHMM has been an active research topic since the late 1980s. Originally driven by applications in the fields of speech processing and recognition [34], it has attracted considerable research attention in recent years in the area of bioinformatics. The segmental structure of the GHMM provides a natural framework for describing human genomic sequences and the GHMM has been used to find genes by parsing a DNA sequence into a set of putative coding segments [20, 80]. The GHMM has recently been used 62 6.1. Generalized hidden Markov model for physiological monitoring and event diagnosis [154]. In this chapter, physiological trend signals are described using the GHMM framework. Based on the GHMM, two methods are proposed for change detection and pattern recognition. In the first method, the intra-segmental variations are described using a state-space model; then a Bayesian inference process is used with the fixed-point switching Kalman smoother to estimate the probability of occurrence for each pattern, as well as to estimate the true signal values online. Change points are detected by comparing the probability of change at every sampling instant with a fixed threshold. In the second method, the intra-segmental model transforms different trend patterns to forecast residuals of different magnitudes; then an adaptive Cusum test is used with a maximum posterior state-path finding algorithm to recognize the signal patterns online. Both methods have been tested on the NIBPmean trend signals. The results demonstrate that, by incorporating the pattern transition probability into a signal model, both methods performed better in change-point detection than the standard Cusum test. The adaptive Cusum test has been published in Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society [158]; the proposed fixed-point switching Kalman smoother has been published in International Journal of Adaptive Control and Signal Processing [159]. 6.1 Generalized hidden Markov model A physiological trend signal can be viewed as a random series of temporal shapes [25] and described using a first-order GHMM as shown in Figure 6.2(a). According to the direction, duration, and increment rate, the segmental shapes are classified into 9 patterns (see Table 6.1). Unlike the HMM, where each state generates only one data point, in the GHMM each segmental state corresponds to a sequence of data points. The values of these data are determined not only by the pattern of the segment they belong to, but also by the historical data within or before the segment. The GHMM is a three-layer model. The top layer is a Markov chain process. It is assumed that the probability of occurrence for each pattern, given all the patterns in the past, depends only on the previous pattern. The pattern transition is modeled as a first-order Markov chain, with the parameter set {Z, π (0) , A}, where Z={z (1) , . . . , z (N ) } contains the N predefined patterns labels (N =9 in this thesis). π (0) stores the initial prior probabilities; A : N ×N is the transition matrix with Aji representing the 63 6.1. Generalized hidden Markov model Table 6.1: Segmental states in physiological trend signals increase abrupt gradual long short long short decrease abrupt gradual long short long short z (1) z (5) z (2) z (3) z (4) z (6) z (7) z (8) steady z (9) (a) Generalized Hidden Markov Model z1 zm z2 ... Segmental states System states x1 x2 x3 x4 xt Observations y1 y2 y3 y4 yt ... ... (b) Switching Dynamic Linear Model s1 s2 s3 s4 st ... Point states System states Observations x1 x2 x3 x4 xt y1 y2 y3 y4 yt ... ... Figure 6.2: Graphical model for the GHMM probability that the segmental state will change from z (j) to z (i) . In this chapter, unless stated otherwise, subscripts are used as time stamps, parenthesized superscripts represent the indices of a pattern in Z. The lower two layers of the GHMM describe the intra-segmental variations. The middle layer predicts the system states xt , and the bottom layer describes how the measurement yt is observed in presence of noise. A segment with pattern z (i) that starts after the end of the previous segment tp can be described as below: • Duration: ¯l ∼ D(i) ¯l∈[L(i) L(i) c ] f 64 6.1. Generalized hidden Markov model • for each t∈(tp +1 . . . tp +¯l) (xt , yt ) = model(Θ(i) , yt−1 , xt−1 ) (6.1) The temporal shape of state z (i) is determined by the duration ¯l and intra(i) (i) segmental model. ¯l follows a distribution D(i) over the interval [Lc Lf ], which is determined solely by the current segmental state z (i) . The intrasegmental models for different patterns are assumed to have the same functional forms. Trend patterns are differentiated by the model parameters. The complexity of this model exists in that the segment boundaries are not deterministic. For a segment with pattern z (i) , the duration could be (i) (i) any value in the range of [Lc Lf ]. If we introduce a variable l to represent the length from the latest end point and a variable f to indicate whether the current point is an end point, a variable group s={z, l, f } will fully describe the state of each data point: the point-state label st ={z (i) , l, 1} indicates that a pattern z (i) ends at time t with a duration of l sampling intervals (l=¯l); st ={z (i) , l, 0} indicates that a pattern z (i) has sustained for l sampling intervals and is still continuing (l<¯l). Given the present point state st , the probability distribution for the future point state st+1 is independent of the past states s1...t−1 . The transition between point states is still a first-order Markov process, as demonstrated in Figure 6.2-(b), but can no longer be described using the original parameters. The transition is determined jointly by the segmental pattern transition and duration distributions. The resulting transition process is more complex and can only be described using a model of a much larger size. With the switching model under the regime of point states, the problem of change detection becomes one of finding the point with f =1, and the problem of pattern recognition becomes one of finding a states z that best interprets a segment of data. In the following, intra-segmental variations are described using two different models. Based on these models, a fixed-point switching Kalman smoother and an adaptive Cusum test are proposed to perform change detection and pattern recognition. 65 6.2. Fixed-point switching Kalman smoother 6.2 6.2.1 Fixed-point switching Kalman smoother State-space intra-segmental model In this method, the variations within a segment of pattern z (i) are described using a state-space model: System : xt = F (i) xt−1 + ζt ζ ∼ N (µ(i) , Q(i) ) Observation : yt = H (i) xt + ηt η ∼ N (0, R(i) ). (6.2) The parameters for this model are Θ(i) = {F (i) , µ(i) , Q(i) , R(i) }. The system matrix F (i) represents the influence of historical data on the current system states; The system disturbance ζ is independent over time and follows a Gaussian distribution with mean µ(i) and covariance Q(i) . The measurement noise η is also independent over time and follows a Gaussian distribution with zero mean and covariance R(i) . H (i) is the measurement matrix. The true signal level is H (i) x. This intra-segmental model under the regime of point states is a switching DLM [119] (see Figure 6.2-(b)). The change detection problem with this model can be solved using a Bayesian approach. 6.2.2 Review of state estimation with the GHMM The Bayesian approach was adopted by Smith and West in their multistate Kalman filter [124] to estimate trend patterns from signal observations. However, as pointed as in Chapter 2, in their multi-state Kalman filter, the duration of trend patterns are not explicitly modeled, and the transition relationship between patterns is ignored. The prior probabilities for different patterns are assumed to be constant for all the patterns over time. This model is an unrealistic simplification of the physiological trend signals. The forward-backward algorithm is an offline method widely used with the HMM for computing the joint probability of an observation sequence and the state regime. For regular HMM, the joint probability calculated by the algorithm can be normalized to generate the state estimate at a particular time point. But the normalization is very difficult to perform with the GHMM due to unknown segmental boundaries [93]. The switching Kalman filter has been used in many applications with some approximation methods that summarize the state propagation process, to estimate the probability of occurrence for predefined states online. However, most of the studies have either focused on the filtering [82, 119, 154, 156], or fixed-interval smoothing problem [92, 166]. No effort has been dedicated to the fixed-point smoothing process. In the section that follows, 66 6.2. Fixed-point switching Kalman smoother the fixed-point switching Kalman smoother with the GHMM is proposed to estimate the probability of occurrence for each possible trend pattern Pr(sk |Y1...t ) and the system state estimate x ˆk|t , k<t, online. 6.2.3 Fixed-point Kalman smoother For a DLM without the Markov-chain regime, the standard fixed-point Kalman smoother can be used to estimate the intra-segmental system states x for every point in the previous T -point smoothing window. Given the initial estimate x ˆ0 and its covariance V0 , measurements yt are processed iteratively using the Filter and Fixed-point smoother operators defined in Section 5.1.2: (ˆ x t , V t , Kt , x ˆt|t−1 , Vt|t−1 , et ) = f ilter(ˆ xt−1 , Vt−1 , Θt , yt ), and (ˆ xk|t , Vk|t , Vt,k|t−1 ) = smoother(ˆ xk|t−1 , Vk|t−1 , Vt−1,k|t−2 , Kt−1 , et , Vt|t−1 , Θt ). where k t and Θt contains the intra-segmental parameters in effect at time t. Using the two operators, the estimates x ˆk|t , Vk|t , and Vt,k|t−1 (see Section 5.1.2 for the definition) are updated after a new observation is received. In addition, Wt−1 (yt ) the probability density for the current measurement, conditional on all the historical data, is also calculated: Wt−1 (yt ) = N (et ; 0, St ). (6.3) Wt−1 (yt ) contains the information carried in the new observation, and will be used later to update the posterior probability for the point states at k t. Estimation accuracy can be improved by utilizing the information carried in the measurements after the evaluation point. However, an unnecessarily long window size T barely improves the estimation accuracy while increasing the computational overhead. T should be set to an appropriate length. A switching DLM consists of a set of DLM’s; the model in effect at any time is determined by the point state in effect. In the proposed GHMM, the intra-segmental system state x is influenced by the system state in the previous segment, in addition to the current pattern (see Figure 6.2). The system variable xt therefore is conditional on all the previous point states s1...t . The number of possible state sequences for s1...t grows exponentially as the number of observations increases. The state propagation process needs to be summarized to reduce the computational complexity. 6.2.4 Generalized Pseudo Bayesian algorithm The Generalized Pseudo Bayesian (GPB) algorithm [9] summarizes the influence of historical regime states whose individual influence is negligible 67 6.2. Fixed-point switching Kalman smoother given the states of more recent points. An nth-order GPB algorithm summarizes the influence of the states that are n points before the current time. A larger n requires more intensive computation, but may generate a more accurate approximation. The second-order GPB (GPB2) algorithm is a simple approximation approach widely used in the switching Kalman filter [82, 154, 166]. The GPB2 approach has been compared with other approximation methods with regard to the estimation accuracy and computational overhead in [102]. The study found no strong evidence in favor of more sophisticated methods. The GPB2 method is used in this study. The first step of this method is to update the transition possibilities; the second step is to summarize the conditional Kalman estimates. Inference The probability for the state transition from st−1 to st , denoted as Pr(st |st−1 ), is first calculated. For the switching DLM transformed from the GHMM, not all the point-state transitions are valid for st−1 → st , since the length l should grow continuously within its range. According to whether the previous point is an end point, Pr(st |st−1 ) in different situations is calculated as below: (1) if (zt = zt−1 , lt = lt−1 +1, ft = 1) & (zt−1 , lt−1 , ft−1 = 0) D(zt−1 ) (d = lt−1 +1) then Pr(st |st−1 ) = D(zt−1 ) (d lt−1 +1) (2) if (zt = zt−1 , lt = lt−1 +1, ft = 0) & (zt−1 , lt−1 , ft−1 = 0) D(zt−1 ) (d > lt−1 +1) then Pr(st |st−1 ) = D(zt−1 ) (d lt−1 +1) (3) if (lt = 1, ft = 1) & (zt−1 , lt−1 , ft−1 = 1) then Pr(st |st−1 ) = Azt−1 zt D(zt ) (d = 1) (4) if (lt = 1, ft = 0) & (zt−1 , lt−1 , ft−1 = 1) then Pr(st |st−1 ) = Azt−1 zt D(zt ) (d > 1) otherwise : Pr(st |st−1 ) = 0 (6.4) The segment duration for a pattern z (i) is determined by the duration density D(i) , and independent of the intra-segmental dynamic models. Pr(st |st−1 ) is used in the inference process defined in the below. In addition to Pr(st |st−1 ), the inputs and outputs of this inference process also include: s ,s ,s k t−1 t • Wt−1 (yt )= Pr(yt |sk , st−1 , st , y1...t−1 ): the conditional probability density for the current measurement yt given the point states for the instants k, t−1, and t and the previous observations y1...t−1 . This input is generated by the Kalman filter at time t via Equation (6.3). • Pr(st |st−1 ): the transition probability from st−1 to st at time t. 68 6.2. Fixed-point switching Kalman smoother • Wt (sk )= Pr(sk |y1...t ): the probability that the point state at time k is sk given the observations y1...t . • Wtsk (st )= Pr(st |sk , y1...t ): the probability that the point state at time t is st , given the observations y1...t and the point state at time k. • Wtsk ,st (st−1 )= Pr(st−1 |sk , st , y1...t ): the probability that the point state at t−1 is st−1 , given y1...t and the point states at time k and t. The notation Wab (c) above represents the probability of event(s) c conditional on event(s) b and the observations y1..a . The inference process is: { Wt (sk ), Wtsk (st ), Wtsk ,st (st−1 ) } = sk ,st−1 ,st sk inf er{ Wt−1 (sk ), Wt−1 (st−1 ), Pr(st |st−1 ), Wt−1 (yt ) } s ,s s ,s ,s k t−1 k t−1 t Wt−1 (st , yt ) = Pr(st |st−1 )Wt−1 (yt ) sk ,st−1 sk sk Wt−1 (yt , st , st−1 ) = Wt−1 (st−1 )Wt−1 (st , yt ) sk sk Wt−1 (st , yt ) = st−1 Wt−1 (yt , st , st−1 ) sk sk Wt−1 (yt ) = st Wt−1 (st , yt ) sk Wt−1 (st , yt ) sk Wt (st ) = sk Wt−1 (yt ) sk (yt ) Wt−1 (sk , yt ) = Wt−1 (sk )Wt−1 Wt−1 (yt ) = sk Wt−1 (sk , yt ) Wt−1 (sk , yt ) Wt (sk ) = Wt−1 (yt ) W sk (yt , st , st−1 ) Wtsk ,st (st−1 ) = t−1sk Wt−1 (st , yt ) s ,s (6.5) ,s k t−1 t The probability density Wt−1 (yt ) calculated in Equation (6.3) is normalized with Wt−1 (yt ) in this step. The output Wt (sk ) is used in the further process for pattern recognition and change detection. Wtsk ,st (st−1 ) and Wtsk (st ) are used as the weighting factors in the “collapsing” step. Collapsing Assume that for a switching DLM the previous unconditional estimate x ˆt−1 follows a Gaussian distribution. After passing it through an (m) M -state switching Kalman smother, we will get M updated estimates x ˆt , m∈1 . . . M , each conditional on a state label m. Since the smoothing pro(m) cess is a linear transformation, each of the updated estimates xˆt follows a Gaussian distribution. The probability of the updated unconditional estimate x ˆt can be described using a Gaussian mixture model. Collapsing is a weighted summing process to find a Gaussian distribution with the closest Kullback-Leibler distance to the Gaussian mixture [92]. The collapsing 69 6.2. Fixed-point switching Kalman smoother procedure for two Gaussian random variables X and Y is defined as in [92]. ¯ (m) =E(X|s=m) and Y¯ (m) =E(Y |s=m), the Given the conditional means X (m) conditional cross-variance Vx,y =cov(X, Y |s=m), and the probability of oc¯ currence for each condition W (m)= Pr(s=m), the unconditional means X, ¯ Y and the unconditional cross-variance Vx,y are calculated as below: ¯ = collapseI(X ¯ (m) , W (m)) X ¯ = Σm X ¯ (m) W (m) X (m) ¯ ¯ (m) ¯ ¯ (m) Vx,y = collapseII(Vx,y , X, X ,Y ,Y , W (m)) (m) ¯ (m) − X) ¯ T (Y¯ (m) − Y¯ )W (m) Vx,y = Σm Vx,y W (m) + Σm (X 6.2.5 (6.6) (6.7) Overall procedure From t0 =1, in every iteration, the switching Kalman smoother updates the (s ) intra-segmental system state estimate x ˆk|tk and the probability Wt (sk ), with the new observation yt , for every previous instant k in the smoothing window, k∈t−T +1 . . . t. This overall process is realized as below: —————Algorithm: fixed-point switching Kalman smoother ————— Initialization The switching fixed-point Kalman smoother is initialized using x0 ∼ N (ˆ x0 , V0 ) with no state regime. At t=1, conditional on the point (s1 ) (s ) (s ) (s ) state s1 , x ˆ1 , x ˆ1|01 , V1 1 , and v1|01 are calculated using the Kalman filter: (s1 ) (ˆ x 1 , V 1 , K1 , x ˆ1|0 , V1|0 , e1 ) = f ilter(ˆ x0 , V0 , Θt , yt ). (s ) W1 1 The probability for each of the candidate point states for t=1 and the inputs to the inf er operator for the next iteration are calculated: (s ) (0) (0) (s ) (s ) W1 1 =πs1 / i πsi , W1 0 =1, W1 1 (s1 )=1; Recursion From t=2, in every iteration, given the new observation yt , (sk ,st−1 ) (sk ,st−1 ) the estimates from the previous iteration, including xˆk|t−1 , x ˆt−1 , (s ,s ) (s ,s ) (s ,s ) (s ) k t−1 k t−1 , Vt−1k t−1 , x ˆt−1,k|t−2 , Wt−1 (sk ), and Wt−1k (st−1 ) for every k∈ Vk|t−1 max(1, t−T +1) . . . t, the fixed-point switching Kalman smoother is realized following the procedure shown in Figure 6.3. The results of the Kalman smoother are conditional on (sk , st−1 , st ). These estimates are collapsed to (sk , st ) and fed back for the next iteration at t=t + 1. ————— end of the fixed-point switching Kalman smoother ————— 70 6.2. Fixed-point switching Kalman smoother sk ... s t−1 P r(s t |s t−1 ) st yt s ,s f ilter, smoother ,s k t−1 t (yt ) Wt−1 inf er Wt−1 (s k ) sk Wt−1 (s t−1 ) s ,s k t−1 x ˆk|t−1 s ,s k t−1 x ˆt−1 sk ,st−1 Vt−1 s ,s k t−1 Vk|t−1 s ,s k t−1 Vt−1,k|t−2 sk ,st ,st Vk|t−1 s ,s ,st k t−1 Vt,k|t−1 sk ,st−1,st Vt s ,s ,st s ,s ,st k t−1 x ˆk|t−1 Delay k t−1 x ˆt|t−1 s ,st−1,st collapseI Delay x ˆtsk ,st sk ,st x ˆk|t Wt (s k ) Wtsk ,st (s t−1 ) x ˆt k s ,st−1,st , Wtsk ,st (s t−1 )) s ,st−1,st , Wtsk ,st (s t−1 )) = collapseI(ˆ xt k k = collapseI(ˆ xk|t s ,s ,s sk ,st k t−1 t x ˆt|t−1 = collapseI(ˆ xt|t−1 , Wtsk ,st (s t−1 )) Wtsk (s t ) k x ˆsk|t collapseI sk ,st x ˆk|t collapseII s ,st−1,st Vtsk ,st = collapseII(Vt k s ,s ,s s ,st−1,st ,x ˆt k s ,s s ,st−1,st ˆt k ,x ˆtsk ,st , x s ,s ,x ˆtsk ,st , Wtsk ,st (s t−1 )) ,s sk ,st sk ,st k t−1 t k t−1 k t−1 t k , Wtsk ,st (s t−1 )) ,x ˆt|t−1 ,x ˆt|t−1 ,x ˆsk|t−1 ,x ˆk|t−1 = collapseII(Vt,k|t−1 Vt,k|t−1 s ,st−1,st sk ,st Vk|t = collapseII(Vk|tk s ,st−1,st k ,x ˆk|t s ,st−1,st sk ,st k ,x ˆk|t ,x ˆk|t sk ,st ,x ˆk|t , Wtsk ,st (s t−1 )) Figure 6.3: Overall procedure of the fixed-point switching Kalman smoother (the parentheses for the state labels are not displayed); Please refer to Section 6.2.5 for a detailed description of the process. To solve the problem of trend-change detection and pattern recognition, (s ) in every iteration, Wt (sk ) and x ˆk|tk are used to calculate the following estimates for every k∈max(1, t−T +1) . . . t: 1. Wt (fk =1) the certainty of change; 2. zˆk|t,fk =1 the pattern at time k given the observations y1...t , and the fact that a change point has occurred at time k; 3. yˆk|t the unconditional signal estimate at time k. 71 6.2. Fixed-point switching Kalman smoother Wt (fk =1) = Wt (sk ) (6.8) sk ∈{sk |fk =1} zˆk|t,fk =1 = argmax {Wtfk =1 (zk )} zk = argmax { zk Wt (sk )/Wt (fk =1)} (6.10) sk ∈{sk |fk =1} (s ) yˆk|t = H (i) (6.9) x ˆk|tk Wt (sk ) (6.11) sk The level of Wt (fk =1) is compared with a threshold h. Wt (fk =1) h indicates that a change point has occurred at time k. 6.2.6 Computational and space complexity The GPB2 algorithm has a polynomial complexity [149]. If we treat the standard fixed-point Kalman smoothing process as one operation unit, the computation overhead for every sampling instant is determined by the number of possible combinations for (sk , st−1 , st ) and the smoothing window size T . If all the patterns have the same duration [1 L], for every sampling instant, there are 2N L possible point states. The naive state propagation could result in T (2N L)3 times of calculations, imposing a heavy demand on the processor. However, not all the point-state combinations are possible. The possible state combination (sk , st−1 , st ) needs to satisfy: Pr(st , st−1 , sk ) = Pr(sk ) Pr(st−1 |sk ) Pr(st |st−1 ) > 0 (6.12) The transition st−1 →st has to satisfy the conditions discussed in Equation (6.4). When ft−1 =0, st−1 only has two possible propagations, both are the one-point extensions of the previous state, i.e., {zt =zt−1 , lt =lt−1 +1, f =0 or 1}; when ft−1 =1, for each st−1 , the state st can be any of the 2N possible segmental patterns, i.e., {zt =z (1)...(N ) , lt =1, f =0 or 1}. Similarly, given sk , the number of possible point states for st−1 is less than N (t−k−1) (assuming T L) (see Table 6.2). If the point at time k has N L possible states for fk =1 and fk =0 each, the numbers of possible (sk , st−1 ) combinations for all the points in the smoothing window k∈(tT+1. . . t−1) add up to N 2 L(T −1)(T −2). Since a point state with the same segmental pattern zt and the same duration lt can share one Kalman smoothing operation, in every iteration, the computational complexity for a T -point switching Kalman filter is ≈ N 3 L(T −1)(T −2). 72 6.2. Fixed-point switching Kalman smoother Table 6.2: Number of the possible point states for st−1 given sk ft−1 = 1 ft−1 =0 fk =1 N L−1, t−k−1 N L, t−k−1 fk =0 N ( L−1, t−k−2 +1) N ( L, t−k−2 +1) l z 1 2 1 2 f=0 3 4 5 6 l z 1 2 3 f=1 4 5 6 3 0.1 0.2 l 2 3 0.3 Naive Matrix 0.4 Wt(sk) z 1 4 1 4 . . l 2 3 2 3 . . f index 0 1 0 2 1 3 1 4 . . . . st-1 4 1 5 st Legend Table st-1 sk index ... st-1 Legend Table 4 2 1 5 2 2 ... s k Legend Table . . . . 1 2 st 1 1 1 sk index 1 2 2 2 2 3 4 4 2 2 5 1 2 2 6 5 2 2 (st, st-1 , sk ) Mapping Table with ( s k , st) as key 3 4 5 (st-1 , sk ) Mapping Table with sk as key Figure 6.4: Linked lists for compact storage of the conditional estimates in the GHMM-based switching Kalman smoother A naive way of storing the conditional estimates also requires a vast amount of space. A complete collection of the estimates conditional on (sk , st−1 , st ) would occupy a space as large as (2N L)3 . As explained above, not all the (sk , st−1 , st ) combinations are possible. The “naive matrix” is therefore very sparse. Linked lists are used to obtain a compact storage. As shown in Figure 6.4, the memory is organized using a series of linked list referred to as “legend table” and “mapping table”. The “legend table” only stores the value of the non-zero estimates and their locations in the naive matrix and assigns a new index for each record. The mapping tables for the estimates conditional on (sk , st ) and on (sk , st−1 , st ) are built in a similar manner. In the mapping tables, the entries are organized in blocks, and each block has a unique key. For example, in the mapping table for (sk , st−1 , st ), the entries in the same block share the same (sk , st ). The scheme improves the speed of looking-up in the collapsing step. 73 6.3. Adaptive Cusum test based on the GHMM The probability of occurrence for each possible point state at every instant in the smoothing window Wt (sk ), k∈t−T +1 . . . t, is compared with a very small threshold hW in every step to remove the unpromising point states sk before the iteration reaches the end of the smoothing window. An appropriate hW may reduce the computational and space complexity without compromising the estimation accuracy. 6.3 6.3.1 Adaptive Cusum test based on the GHMM Prerequisites for the intra-segmental model The method presented in this section uses a SPRT method online as the heuristic to find the pattern sequence with the maximum local posterior probability approximately. To use this SPRT heuristic, the forecast residuals should have different magnitudes for different patterns. It is also desirable that the predefined patterns can be grouped according to certain properties, so that the state trellis can be narrowed down according to these properties. The second condition is satisfied in this thesis, as the trend patterns defined in Table 6.1 can be grouped according to the direction of change into increasing, decreasing, and steady. 6.3.2 Offline solution: extended Viterbi algorithm The extended Viterbi algorithm is a dynamic programming method that finds the optimal segmental state sequence as well as the segment boundaries for an observation series described by the GHMM, using the Maximum a Posteriori (MAP) criterion. A sequential data structure S is defined to store one of the candidate end-point sequences that segments the entire observation series y1 . . . y : S = {(zt1 , lt1 )t1 . . . (ztm , ltm )tm . . . (z , l )tM } (6.13) where t1 < . . . <tM < . Each node (ztm , ltm )tm in S represents a segmental end point (ftm =1) at time tm with the pattern ztm and duration ltm . (ztm , ltm )tm is referred to as an end-point state-node, and ztm is referred to as an end-point pattern. There are M end points in total, excluding the last point of the observation series. The successive nodes in S are linked together as below: if (ztm , ltm )tm is the m-th node in S, tm −ltm leads to the location of the (m−1)-th end point in S, i.e., tm−1 =tm −ltm . It is assumed that the start of an observation series can be controlled, and y1 is always the first observation generated by the first pattern, i.e., t1 =lt1 . However, 74 6.3. Adaptive Cusum test based on the GHMM the observations associated with the last pattern may be cut off before it reaches the segmental end. Therefore, tM < in Equation (6.13) and we do not recognize the last segment. There are more than one candidate end-point-sequences that can potentially be used to segment an observation series, since both the number of end points (M in S ) and the state of each end point are unknown. The Viterbi algorithm finds the sequence Sˆ that has the maximum posterior probability, or equivalently, Sˆ = argmax Pr(S , y1 . . . y ). (6.14) S The extended Viterbi algorithm is a recursive algorithm which starts with a forward process to find and store the local best state-path leading to each possible pattern for the current time, given that the current time is an end point. After reaching the end of the observation series, it tracks backward along the previously stored state-pathes to find the single best path for the complete observation series. To realize this process iteratively, an auxiliary function Pt (zt ) is defined to represent the maximum probability for a candidate end-point pattern zt at time t, and a data structure {Sˆt−ˆlt , ˆlt }t (zt ) is defined to record the state-path leading to Pt (zt ). In this data structure, t−ˆlt locates the end point of the previous segment. The Pt (zt ) function is also used in [49]. Pr(St−lt , zt , lt , ft =1, y1...t ) Pt (zt ) = ltmax ,St−lt (6.15) {Sˆt−ˆlt , ˆlt }t (zt ) = argmax Pr(St−lt , zt , lt , ft =1, y1...t ). l ,S t t−lt The extended Viterbi algorithm is carried out recursively as below: ———————— Algorithm: extended Viterbi algorithm ——————— Initialization The algorithm is initialized with P0 =1 and Sˆ0 ={ }. Forward recursion From t0 =1 to t= −1, in every forward iteration, Pt (zt ) and {Sˆt−ˆlt , ˆlt }t (zt ) for time t are calculated, given the partial observation series y1...t and the estimated Pk (zk ) and {Sˆk−ˆlk , ˆlk }k (zk ) for k<t. This process is realized in three steps. 75 6.3. Adaptive Cusum test based on the GHMM 1. Admissible (zt−lt , lt , zt ) First, the admissible (lt , zt−lt , zt )’s are identified. The states of successive end points at t−lt and t must satisfy the following conditions: zt−lt ∈{z (i) : Pt−lt (z (i) ) > 0, i∈1 . . . N } zt ∈ {z (i) : Azt−lt i > 0, i∈1 . . . N } t − lt {z (i) : (0) πi > 0, i∈1 . . . N } (z ) Lf t (z ) lt 1 t − lt = 0 t) min(L(z c , t) (6.16) (6.17) (6.18) (z ) where Lc t and Lf t are the lower and upper bounds for the duration of zt (see Section 6.1 for details). If the admissible (lt , zt−lt , zt ) set is not empty, P and Sˆ are updated. 2. P updating For every admissible (lt , zt−lt , zt ), a function Γt is calculated: Γt (lt , zt−lt , zt ) = Pt−lt (zt−lt )Azt−lt zt Pr{yt−lt +1...t |zt } D(zt ) (lt ) Ps (t,lt ,zt ) Pd (6.19) where Pt−lt (zt−lt ) is previously generated at t−lt , Azt−lt ,zt is the transition probability for zt−lt →zt , Ps indicates how well the measurements yt−lt +1...t fit the trend shape zt , and Pd represents the probability that the current segment has a duration lt given its segmental state zt . Given the current segmental state zt , Ps is assumed to be independent of both the segmental states z0...t−lt and the system states x0...t−lt of the historical segments. Pd only depends on the duration distribution of zt . Γt (lt , zt−lt , zt ) is maximized over all the admissible (lt , zt−lt )’s (see Equation (6.17-6.17)), resulting in a Pt (zt ) for every possible endpoint pattern zt for time t and the corresponding {ˆ zt−ˆlt , ˆlt }(zt ): Γt (lt , zt−lt , zt ) Pt (zt ) = ltmax ,zt−lt (6.20) zt−ˆlt , ˆlt }(zt ) = argmax Γt (lt , zt−lt , zt ). {ˆ l ,z t t−lt 3. Sˆ updating The newly identified {ˆ zt−ˆlt , ˆlt }(zt ) is attached to the existent state path {Sˆt−ˆlt −ˆlt−l , ˆlt−ˆlt }t−ˆlt (ˆ zt−ˆlt ) that leads to zˆt−ˆlt at time t−ˆlt , t 76 6.3. Adaptive Cusum test based on the GHMM resulting in the updated state path {Sˆt−ˆlt , ˆlt }t (zt ): {{Sˆt−ˆlt −ˆlt−l , ˆlt−ˆlt }t−ˆlt (ˆ zt−ˆlt ), {ˆ zt−ˆlt , ˆlt }(zt )} t z ˆ , ˆl ˆ ) ˆ , ˆlt }(zt ) ⇒{Sˆ ˆ ˆ , (ˆ t−lt −lt−lt t−lt ⇒{Sˆt−ˆlt , ˆlt }t (zt ). (6.21) t−lt t−lt If there is no admissible (lt , zt−lt , zt ), do nothing. Let t=t+1 and go to the next iteration. Termination and backtracking Forward recursion is stopped when reaching the end of signal t= . Many researches suggest calculate P (z ) and {Sˆ −ˆl , ˆl } (z ) as for other observations, and find the global MAP state sequence Sˆ by tracking backward along the path stored in {Sˆ ˆ , ˆl } (ˆ z ), −l with zˆ = argmaxz P (z ). However, this procedure may reduce the estimation accuracy, if there is no guarantee that y is the end of the last segment. Given the partial observations, the estimated zˆ for the last segment is very likely to be different from the true pattern. Therefore we suggest not recognizing the last pattern, and propose to estimate the MAP end-point sequence in Equation (6.13) using the following procedures. First, the admissibility of (z −l , l , z ) combinations for the last observation are evaluated using the conditions in Equation (6.17) and (6.17), and (z ) the duration condition in Equation (6.18) is modified to l min(Lc , ), where the lower bound for l is removed since is not required to be an end point. Then, for every admissible (l , z −l ), a function Υ is calculated: Υ(l , z −l )= P −l (z −l )Az −l z Pr{y −l +1... |z }D(z ) (d l ). (6.22) z Maximize Υ with respect to (l , z −l ) to generate (ˆl , zˆ −l ): (ˆl , zˆ −l ) = argmax Υ(lt , z l ,z −l ). (6.23) −l Then, track backward along the state path stored in {Sˆ −ˆl −ˆ l , ˆl −ˆl } −ˆ l (ˆ z −ˆl ) to identify the MAP end-point sequence Sˆ for the entire observation series. ———————– end of the extended Viterbi algorithm ———————— The Viterbi algorithm is impractical for online change detection, because the backward tracking starts from the end of a signal. To use this method in 77 6.3. Adaptive Cusum test based on the GHMM real time, a stopping rule is needed to decide when to start backtracking and to initiate a new segment. In addition, the computational and space complexity of the Viterbi algorithm is very high with the GHMM. It propagates though all the possible (lt , zt−lt , zt )’s in every iteration, although given the observational evidence, some of these state combinations have little promise to become part of the final MAP state sequence. Including the unpromising states in the forward recursion produces little improvement for the estimation accuracy but unnecessarily increases the complexity. It is desirable to remove the unpromising states before reaching the end of observation series, resulting in an approximation scheme called the beam search. 6.3.3 Online solution: beam search with Cusum pruning Beam search is a widely used approximation scheme for finding the optimal path in a state trellis. In the beam search, a heuristic function is often used to evaluate the promise of each state node. Similar to how we search around a dark room following a flashlight beam, the method only searches the promising nodes included in the promising state subset. To approximate the MAP estimation described in Section 6.3.2, a beam search scheme should evaluate the promise of every (lt , zt−lt , zt ) according to certain criteria, in addition to the admissible conditions, and remove the unpromising (lt , zt−lt , zt ) combinations. The promise of a (lt , zt−lt , zt ) combination is measured by the Γ function (see Equation (6.19)). Therefore, an appropriate criterion for selecting (lt , zt−lt , zt ) should be based on the minimum requirement for each component of the Γ function (Pt−lt (zt−lt ), Azt−lt ,zt , Ps and Pd ). The length-constrained Cusum test (see Section 4.1.2) is used to define the conditions that a (zt , lt ) combination at time t should fulfill to be included in the forward recursion. In the Cusum test, the Cusum of forecast residuals is compared with a length-constrained test mask to determine whether a trend signal has changed its direction. Since the trend patterns defined in this chapter can be grouped according to the direction into three groups (increasing, decreasing and steady), the Cusum test, by recognizing the change direction, can narrow down zt to one of these groups. The Cusum test can also roughly locate t∗ , the starting point of the newly detected change. It is reasonable to assume that the previous segment ends around t∗ , i.e., t−lt ∈[t∗ −∆T t∗ +∆T ]. zt−lt can be narrowed down to these patterns that have the same direction as the trend change previously detected by the Cusum test. Furthermore, a newly detected trend change also suggests that enough 78 6.3. Adaptive Cusum test based on the GHMM observational evidence has been observed for recognizing the previous segment. Therefore the backtracking process is triggered after a change point is detected by the Cusum test. The Cusum test thresholds are adjusted online to reflect the influence of the newly received measurements on the forecasting probability for the upcoming data. 6.3.3.1 Beam search scheme ˜ t ) is To realize the beam search process iteratively, an auxiliary function P(z defined to record the approximate local maximum probability leading to the ˜ t ) is similar to P(zt ) used in the Viterbi algorithm, current pattern zt . P(z ˜ is only optimized over the promising (lt , zt−l , zt ) but in every iteration, P t combinations in the beam, instead of all the admissible (lt , zt−lt , zt ) combinations. Since in physiological monitoring, the signal history from more than a few minutes ago has little clinical relevance, the beam search method only tracks back to identify the pattern before the newly detected change point. Therefore the data structure Sˆt for storing the local best pathes in the Viterbi algorithm is not used. Instead, the promising patterns for the ∗ , where m represents the total number of ongoing trend zt are stored in Zm end points that have been detected until the current iteration; t∗ records the starting point of the ongoing trend located by the Cusum test; the candidate ∗ patterns for the previous change point in Zm−1 are used as the promising states for zt−lt . The beam search process is carried out recursively as below: —————– Algorithm: beam search with the Cusum pruning ————— ˜ 0 =1, m=0, Z ∗ =Z, and Initialization The algorithm is initialized with P m ∗ t =0. The Cusum test mask is also initialized according to Section 3.1.3. ∗ , and P ˜ k , k<t, the Recursion From t0 =1, in every iteration, given y1...t , Zm ˜ beam search method calculates Pt , identifies the previous segmental state if a new trend change has been detected, and adjusts the Cusum test according to the new observation. 1. Beam updating The Cusum of forecast residuals is compared retrospectively with a length-constrained testing-mask to determine whether a new segment has occurred: • A broken lower arm indicates that an increase has occurred. In this case, all the increasing patterns can potentially be the segmental states of the newly detected trend, and the data points around t∗ , the starting point of the newly detected trend change identified by the Cusum test (see Figure 3.2 for how to locate 79 6.3. Adaptive Cusum test based on the GHMM t∗ ), can potentially be the latest end point. Therefore, m=m+1, ∗ ={z (1) , z (2) , z (3) , z (4) }, and t∗ is updated. Zm • A broken upper arm indicates that a decrease has occurred. There∗ ={z (5) , z (6) , z (7) , z (8) }, and t∗ is updated. fore, m=m+1, Zm • If no change has been detected after the Cusum reaches the full length of the test mask, the ongoing signal segment is either a ∗ = plateau or a long gradual change. Therefore, m=m+1, Zm {z (3) , z (7) , z (9) }, and t∗ in this case is set to the location of left end of the test mask. • If no new trend is detected, m=m. The current observation is treated as an extension of the pattern at the previous instant. ∗ and Z ∗ The updated t∗ , Zm m−1 are used together with the conditions in Equation (6.17-6.17) to construct the candidate (lt , zt−lt , zt ) combi˜ t (zt ) updating: nations (recorded in beamt ) for P ∗ zt−lt ∈{z (i) : Pt−lt (z (i) ) > 0, z (i) ∈Zm−1 } zt ∈ ∗} t−l {z (i) : Azt−lt i > 0, z (i) ∈Zm t 1 (0) {z (i) : πi ∗} > 0, z (i) ∈Zm (z ) Lf t , t−t∗ −∆T (z ) lt t − lt = 0 ∗ t) L(z c , t, t−t +∆T (6.24) (6.25) (6.26) (z ) where Lc t and Lf t are the lower and upper bounds for the duration of zt , and the tuning parameter ∆T is the allowable deviation of an end point from the location determined by the Cusum test. ˜ is updated and the Cusum test is adjusted. If the beamt is not empty, P ˜ updating The function Γ ˜ t (similar to Γt in Equation (6.19), 2. P ˜ t (zt )) is calculated for every but with Pt (zt ) substituted with P ˜ (lt , zt−lt , zt ) in beamt . Optimizing Γt (lt , zt−lt , zt ) over all the candidate (lt , zt−lt ) combinations in the beamt , we get the updated ˜ t (zt ) and the corresponding {ˆ P zt−ˆlt , ˆlt }(zt ). ˜ Pt (zt ) = max (lt ,zt−lt )∈beamt zt−ˆlt , ˆlt }(zt ) = {ˆ ˜ t (lt , zt−l , zt ) Γ t argmax (lt ,zt−lt )∈beamt ˜ t (lt , zt−l , zt ) Γ t (6.27) 3. Pattern recognition If a change is detected by the Cusum test, the previous trend pattern zˆt−ˆlt and the location of its end point t−ˆlt are identified. 80 6.3. Adaptive Cusum test based on the GHMM The sensitivity of the Cusum test is configured to detect the occurrence of a trend before it reaches the segmental end. As in the backward state-tracking step in the Viterbi algorithm, given the partial observations, we do not select the single best pattern for the ongoing trend. Instead, the posterior probability of the state-pathes are summarized over the promising patterns for the ongoing trend following the procedure as below. First, the promising (zt−lt , lt , zt ) combinations are identified and recorded in beam , using the conditions in Equation (6.24), (6.25) and (6.26), with the lower bound constraint dropped for lt , since the current instant t might not be the end point of the ongoing ˜ t (lt , zt−l ) summarizes the local maxitrend. Then, a function Υ t mum posterior probability for an end-point sequence which ends at t−lt with the last pattern being zt−lt , given y1...t . ˜ t (lt , zt−l ) = Υ t zt ˜ t−l (zt−l )Az zt Pr{yt−l +1...t |zt }D(zt ) (d P t t t t−lt lt ) (6.28) ˜ t (lt , zt−l ) with respect to (lt , zt−l ) to identify the Maximize Υ t t location and pattern for the segment before the ongoing trend change: ˜ t (lt , zt−l ) (ˆlt , zˆt−lt ) = argmax Υ (6.29) t (lt ,zt−lt )∈beam 4. Adjusting the Cusum test The shape of the test mask in the Cusum test is adapted online (see the next section). If the beam is empty, do nothing. t=t+1 and go to the next iteration. ———————————— end of beam search ——————————— 6.3.3.2 Adaptive Cusum test If an intra-segmental model transforms different patterns into different deviations in the forecast residuals, the shape of the test mask for the Cusum test can be designed to capture these patterns. For instance, the target patterns of the upper arm are decreases including and the steady pattern, i.e., {z (5) . . . z (8) , z (9) } (see Figure 6.5). Since the abrupt long decrease z (5) is a sustained version of the abrupt short decrease z (6) , and a gradual long decrease z (7) is a sustained version of the gradual short decrease z (8) , only 81 6.3. Adaptive Cusum test based on the GHMM d (9) d (8) { d (6) Overlapping area h (8) h (6) h (9) Figure 6.5: Design of the upper arm of the test mask for the GHMM-based adaptive Cusum test z (6) , z (8) and z (9) are considered. Following the guidelines described in Section 3.1.3 for V-mask design, a standard arm is generated for each of z (6) , z (8) and z (9) : for a target pattern z (i) , the slope of the arm is set as d(i) , half of the mean forecasting deviation caused by z (i) . The rise distance h(i) is set according to Equation (3.6). As the duration ranges of {z (6) , z (8) , z (9) } are connected head to tail with some overlapping area, the outline of these arms is used as the test mask. In this thesis the two arms in the overlapping area are simply connected as shown in Figure 6.5. The test mask in the overlapping areas could be smoothed using a more sophisticated method, such as averaging the two arms with the duration probabilities as weighting factors. To obtain a consistent type I and type II error rates, the rise distance h of each constituting arm is adjusted according to the updated probability of the corresponding target pattern for the upcoming trend change. The ratio (i)/(9) Rt|k between z (i) , i∈1 . . . 8, relative to the steady state z (9) for every point t−ˆlt <k<t in the span of the test mask is calculated as below: (i)/(9) Rt|k = = Pr(zk+1 = z (i) , fk+1...t = 0|y1...k , fk = 1) Pr(zk+1 = z (9) , fk+1...t = 0|y1...k , fk = 1) (i) , f ∗ Pr(zk+1 = z k+1...t = 0, zk , y1...k , fk = 1) zk ∈Zm Pr(zk+1 = z (9) , fk+1...t = 0, zk , y1...k , fk = 1) ˜ t−k) zk ∈Zm ∗ Pk (zk )Az z (i) k ≈ ˜ D(9) (d t−k) zk ∈Zm P (z )A ∗ k k zk z (9) ∗ zk ∈Zm (i) D (d (6.30) ˜ k (zk ), and Z ∗ is the promiswhere Pr(zk , y1...k , fk =1) is approximated by P m ing states for the latest detected trend change. 82 6.4. Application to NIBPmean monitoring (i)/(9) The updated ratio Rt|k is used to adjust the shape of the test mask. Every point on the test mask, from the initial point at time t retrospectively to every point k, t−ˆlt <k<t should be calculated following the V-mask cor(i) responding to z (i) with the rise distance hα,t|k as below: (i) hα,t|k = δ2 1 δ2 1−β (i)/(9) ) ≈ − log( (log Rt|k + log(α/2)) α/2 R(i)/(9) d(i) d(i) t|k (6.31) where d(i) is the expected magnitude of the level shifts caused by the target pattern z (i) , δ 2 is the covariance of measurement noise, and α and β are the type I and type II error rates (also see Section 3.1.3). (i) The initial rise distance hα,1 , i∈1 . . . 8, for each pattern z (i) can be cal(i)/(9) (0) (0) culated using Equation (6.31) with R1 =πi /π9 . The online adaptive (i) (i) rise distance hα,t|k is scaled to hα,1 as below: (i)/(9) (i) hα,t|k = log Rt|k + log(α/2) (0) (0) log πi /π9 + log(α/2) (i) hα,1 (6.32) (i) In practice, hα,1 is multiplied by a parameter σ to control the sensitivity. 6.4 Application to NIBPmean monitoring The proposed switching Kalman smoother and the adaptive Cusum test were both tested on the NIBPmean trend signals. This section describes the intra-segmental models used for NIBPmean and how the model parameters are estimated from the annotated data. A large number of annotated data are required to obtain accurate parameter estimates for the GHMM. The model parameters are acquired in two ways. An anesthesiologist who is familiar with the concept of the GHMM was interviewed regarding the model parameters using a questionnaire. In the interview, the expert was asked to enumerate all the possible patterns given the previous one, and weight the probability of occurrence for each of these possible patterns on a scale of 1-10. These weights were then normalized to form a transition matrix. The expert was also asked to give the range, mean, and standard deviation for the duration of each trend pattern. A second anesthesiologist annotated the trend patterns in 20 NIBPmean signals using eViewer (see Figure 3.3). Due to the limited number of data, the number of occurrences of some patterns are in sufficient for parameter 83 6.4. Application to NIBPmean monitoring estimation. Among the 97 annotated patterns, there were only 3 abrupt long increases (z (1) ) and 1 abrupt long decrease (z (5) ). Pattern transitions from abrupt long increases (z (1) ) to abrupt long decrease (z (5) ) and from abrupt long decrease (z (5) ) to steady (z (9) ) were not observed, although both transitions are possible in practice. The annotated data were used with the expert knowledge to calculate the final parameter estimates. 6.4.1 Parameter estimation for the Markov chain regime In the top-layer Markov chain, the unknown parameters include the initial probability vector π (0) and the transition matrix A. The jth row vector of A is denoted as π (j) , j∈1 . . . N , and the distributions of different row vectors of A are assumed to be independent with each other. The Dirichlet distribution can be used to describe the prior knowledge about π (j) , j∈0 . . . N [17, 154]. The Dirichlet distribution is often denoted as [17]: π = [π1 . . . πN ] ∼ Dir(m) (6.33) where m:1×N records the observation counts for the N rival events z (i) , and π:1×N represents the probabilities for the N rival events given the observational evidence m, s.t. i πi =1. The mean of a Dirichlet distribution m is E(π)= M , with M = i mi . If new measurements become available and among them z (i) is observed γi times, E(π) is updated as below: E(π) = 1 [m1 +γ1 . . . mN +γN ] i (mi +γi ) (6.34) π (j) was estimated as below: (j) πi (j) (j) = (j) (1 − ρ)mi + ρ˜ πi N i=1 (1 (j) (6.35) (j) − ρ)mi + ρ˜ πi (j) where πi is the ith element in the row vector π (j) , mi , j∈0 . . . N , de(0) notes the number of transitions from z (j) to z (i) , mi represents the count of occurrences of z (i) , and π ˜ (j) represents the parameters translated from the expert knowledge. ρ controls the weighting between the expert knowledge and observational evidence. To compensate for the limited number of annotated data, a larger ρ was assigned to the expert knowledge. 84 6.4. Application to NIBPmean monitoring 6.4.2 Parameter estimation for the intra-segmental models Duration D The duration for a trend pattern is assumed to follow the discretized truncated Gaussian distributions as in [49]. A discretized truncated Gaussian distribution is a variant of the Gaussian distribution with the values in the admissible range discretized to represent the number of sampling intervals, and the values outside the admissible range set to zero. The range, mean and variance for the duration of each pattern were first set by the expert, and then the expert settings were averaged with the sample mean and sample variance calculated from the annotated data. The intra-segmental variations were predicted based on the historical information. The forecasting process was formulated into different models for the switching Kalman smoother and the adaptive Cusum test. Intra-segmental DLM for the switching Kalman smoother The following intra-segmental models were used in the switching Kalman smoother: F= (1−λ) λ ; H=[1 (1−λ) λ (i) 0]; µ(i) =[µ1 0]T ; Q(i) = q (i) 0; ; R(i) =R (6.36) 0 0 where the system matrix F , measurement matrix H and measurement noise variance R are invariant across different patterns. The system state x contains x1 , the true signal value, and x2 , the EWMA of the historical signal true values, where x1 equals x2 plus a trend drift factor η. The EWMA forgetting parameter is fixed for all the patterns, so the distribution of the trend drift factor η determines the incremental rate. η was assumed to fol(i) low a Gaussian distribution with non-zero mean N (µ1 , q (i) ). Measurement noise is assumed to be independent and identically distributed over time and follow a Gaussian distribution. The DLM parameters were estimated from the annotated data using a method similar to [154]. The signal was firstly filtered using a two-point moving average filter. The variance of the filtering residuals was treated as the variance of measurement noise. The forgetting parameter used in the EWMA model was set by investigating the forecast residuals for the steady segments as in the previous chapters. With the chosen forgetting parameter, the mean absolute residual for the first three sampling points of a steady segment, averaged over all the annotated steady segments, should be less than two times of the standard deviation of measurement noise. After the forgetting parameter was determined, the EWMA model was to generate 85 6.5. Results signal predictions. The forecast residuals were treated as observations of the trend drift factor. The sample mean and sample variance of the forecast residuals for each individual segment were averaged over the segments of the same direction and speed, and then used as the mean and covariance of the trend drift factor η. Intra-segmental EWMA model for the adaptive Cusum test The EWMA model can transform the trends of different directions and incremental rates into forecast deviations of different directions and magnitudes. This model satisfies the requirement of the adaptive Cusum test, and therefore is used here (also see Equation (4.1)): x ˆt = λyt−1 + (1 − λ)(ˆ xt−1 ); yt = x ˆ t + ηt ηt ∼ N (d(i) , R) (6.37) where η determines the direction and incremental rate of a trend pattern. The forgetting parameter λ is estimated in a similar manner as in the DLM, and fixed for all the patterns. The sample mean and sample covariance of η (i) were calculated for each pattern using the annotated data. 6.5 Results Both methods were tested using simulated data and clinical data. In the simulated study, 30 simulated signals, each consisting of 10 segmental patterns, were generated for each method, using the corresponding intra-segmental model for NIBPmean. The total number of change points was 270 (excluding the end of case). The samples for the segmental pattern and segmental duration were generated using the Inverse Transform method [35]. The total number of data points was 2196 for the switching Kalman filter and 2184 for the adaptive Cusum test. To evaluate the performance on clinical data, 10 NIBPmean trend signals without heavy artifactual contamination were selected from day surgery cases. The boundaries and patterns of the clinically relevant changes in these cases were annotated by an anesthesiologist using eViewer. The annotations were used as a reference to evaluate the test results. The signals totaled 353 readings and contained 54 annotated segments. To evaluate the change detection performance, the positive and negative instances and the types of detection errors are defined as below. The end point of each segment (excluding the end of case) is taken as a positive instance; the whole segment of a sustained pattern between two successive 86 6.5. Results change points is taken as a negative instance. The number of negative instances is the same as the number of true positive instances. The recognized patterns were grouped into three types: (1) false detection falls more than 3 sampling intervals before or after the true location, or beyond the neighboring change points, or with a falsely detected change direction; (2) acceptable detection has a correct description of the change direction but a false description of the duration or a false description about the speed of change; and (3) accurate detection means that zˆ=z, and the location of the recognized change points are less than 2 sampling intervals away from the true location and not beyond the neighboring change points. Acceptable detections and accurate detections were both considered as true positive detections. 6.5.1 6.5.1.1 Results of the switching Kalman smoother Results on simulated data The proposed fixed-point switching Kalman smoother was tested on the simulated signals with different window sizes from T =1 to T =12. A standard fixed-point Kalman smoother with known segmental states and boundaries was also tested on the same data with the same T . The Root Mean Square (RMS) deviations of the estimates with T −1 point delay were averaged over the 30 simulated signals. The averaged RMS deviation was also calculated for the observations. Compared with the observations, both smoothers achieved better approximation of the true signal levels (see Figure 6.6). As the window size T increases, the estimation accuracy of the standard smoother improves. Compared with the standard smoother, the switching Kalman smoother has larger estimation errors when T 2. But the accuracy of the switching Kalman smoother improves as T increases, and reaches an accuracy close to that of the standard Kalman smoother at T =4. When T 7 the estimation accuracy of the switching Kalman smoother appears to have reached its plateau and once again deviates from the curve of the standard Kalman smoother. The window size T also influences the distribution of the estimated change probability. A Signal-to-Noise (SN) ratio was defined as the mean ∗ ∗ squares of Wt (ft−T +1 =ft−T +1 ) over the mean squares of Wt (ft−T +1 =ft−T +1 ) ∗ (f is the true value of f which is 1 for change points and 0 for non-change points). A high SN ratio indicates that change points can be identified with high certainty. The SN ratio of the smoothing Kalman smoother improves as the window size T increases and reaches a plateau at T =4 (see Figure 6.7). The plateau value of the SN ratio is ≈62. The plateau indicates that the 87 6.5. Results 3 D2: observation D1: with unknown point states D0: with known point states D1−D0 RMS deviation 2.5 2 1.5 1 0.5 0 2 4 6 8 10 12 smoothing window size T Figure 6.6: Root Mean Square (RMS) deviations of the smoothed estimates from the true values of the simulated signals as the smoothing window sizes T increases from 1 to 12 sampling intervals certainty of detection could not be further improved by increasing the size of smoothing window. The residual uncertainty explains why the estimation accuracy of the switching Kalman smoother is below that of the smoother with known segmental states. The smoothing window size was set to T =4 for further testing on the simulated signals and clinical data. Another important tuning parameter is the pre-threshold hW on Wt (sk ). With T =4, the influence of hW on the estimation accuracy and computational complexity of the switching Kalman smoother was evaluated. The number of smoothing computations performed per sampling instant increases from 45 to 550 as hW decreases from 10−2 to 10−9 , while the estimation error stops dropping when hW 0.01% (see Figure 6.8). The pre-threshold was therefore set at hW =0.01%, in order to reduce the computational complexity without compromising the estimation accuracy. The change probabilities Wt (ft−T +1 =1) generated with T =4 were tested against different thresholds, from h=30% to h=70%, with the step size ∆h=10%. As demonstrated on the ROC curve (Figure 6.9), with a decreased threshold h, the switching Kalman smoother generated more true positive detections, but also introduced more false positive detections. Among all the points on the ROC curve, the point corresponding to h=50% is closest to the upper left corner of the ROC figure. h=50% is therefore considered the 88 6.5. Results 70 signal−to−noise ratio 60 50 40 30 20 10 0 2 4 6 8 10 12 smoothing window size T Figure 6.7: Signal-to-noise ratio of the change probability estimated using the switching Kalman smoother with different smoothing window sizes optimal setting in the simulated test. With h=50%, the method detected 90.37% of the 270 change points including 73.3% accurate detections and 17.0% acceptable detections. The false positive rate was 6.67%. 6.5.1.2 Results on clinical data The online pattern recognition process of the fixed-point switching Kalman smoother is demonstrated with an example signal in Figure 6.10. Excluding the end of case, there are 7 segments with different patterns in the example signal. As the time delay increases, the change probabilities increase toward 100% around the change points and decrease elsewhere. This is not surprising, as using the information carried by the future data, the smoothing process improves the certainty of pattern recognition. When the allowable time delay was extended to 3 sampling intervals, all the change points were detected with a threshold h=50%. For the change points at t=11, 44, and 58, the recognized locations were one point after the annotation. At t=58 the probability of the pattern of short gradual decrease z (8) was 42%, slightly higher than the 39% probability of the pattern of short abrupt decrease z (6) , therefore, z (6) was recognized as z (8) . At t=31, the decreasing stroke of an artifact amidst a long steady segment was falsely recognized as a short abrupt decrease. Segments of more obvious patterns took a shorter period 89 6.5. Results Estimation error estimation error computational complexity 1.6 4 1.55 2 1.5 2 3 4 −log10(hW), 5 6 7 8 9 Computational complexity 6*100 1.65 0 hW:threshold on W(ft−T+1=1) Figure 6.8: Estimation error and computational overhead of the switching Kalman smoother as the pre-threshold hW decreases from 10−2 to 10−9 of time to recognize. For example, the change point at t=45 and t=63 could be detected with a delay of only one sampling interval. As in the simulation test, the clinical signals were tested repeatedly with different thresholds h. The change detection performance obtained with each threshold was also recorded by plotting an ROC curve (see Figure 6.9). The curve changes in a similar manner as seen in the simulated data, i.e., the true positive rate and the false positive rate both increase as the threshold decreases. However, the performance on the clinical data was worse than that on the simulated data. With the optimal threshold h=50%, the method recognized 29 trend changes with accurate pattern descriptions and 16 with acceptable pattern descriptions, out of the 54 annotated segments. Two segments were missed and 4 were detected with a false direction. The total number of false detections was 8. 6.5.2 Results of the adaptive Cusum test The standard Cusum test and the proposed adaptive Cusum test were both tested on the simulated signals and the clinical data. The detected changes were classified according to their directions and displayed at the detection locations in Figure 6.11. At t1 , by tracking backward along the upper arm of the fixed mask, the standard Cusum test falsely detected a decrease. This 90 6.5. Results 1 .3 .5.5 0.9 ture positive ratio .3.3 .4.4 0.95 .4 0.85 .4 .5 .5 0.8 .6 .6 0.75 .6 0.7 .6 0.65 0.6 0.55 0.5 .7 .7 0 simulated signals clinical data .7 .7 0.1 0.2 0.3 0.4 0.5 false positive ratio Figure 6.9: ROC curves summarize the performance of change detection of the GHMM-based switching Kalman smoother with different thresholds h on simulated data and clinical data. decrease was successfully avoided by the adaptive Cusum test, since the test mask in the adaptive Cusum test has a much wider openness at the same location. The segment before t2 was recognized as a long abrupt increase. Given that, the probability of a decrease for t>t2 became relatively lower than the initial prior probability. Therefore the rise distance of the upper arm of the adaptive mask became larger than the fixed mask. The change points detected at t=30 were false detections due to artifacts. The adaptive Cusum test and the standard Cusum test were tested on the simulated and clinical data, both repeatedly with different sensitivity (i) settings. The initial rise distance for every pattern z (i) was set to h10%,1 , corresponding to a 10% false positive rate, and the σ was changed from 0.7 to 1.1 in the simulated test, and from 0.6 to 1.0 in the clinical test, with the step size ∆σ=0.1. The ROC curves of the standard Cusum test and the adaptive Cusum test were compared in both the simulated study (see Figure 6.12) and the clinical-data study (see Figure 6.13). The performance of the adaptive Cusum test appeared to be better than the standard Cusum test on both the simulated and clinical data. 91 6.5. Results NIBPmean (mmHg) Annotation: z(7) z(2) z(7) artifact z(9) z(1) z(6) z(4) 18 31 44 55 58 63 70 65 60 55 50 45 9 11 2 point lag 3 point lag probability of being a change point 1 z(7) (2) z z(2) z(7) z(9) z(4) (8) z(1) z h=0.5 0.5 0 1 z(7) z(2) z(6) z(2) z(9) z(4) h=0.5 0.5 1 point lag 0 1 z(6) z(2) z(9) z(4) h=0.5 0.5 0 no lag 1 h=0.5 0.5 0 9 11 18 31 44 Time (interval: 3 minutes) 55 58 63 Figure 6.10: Online pattern recognition process of the switching Kalman smoother demonstrated with the results on an example NIBPmean trend signal. The smoothing window size T =4 and the certainty threshold h=50%. 6.5.3 Summary of the results The optimal performances of the standard Cusum test, adaptive Cusum test, and fixed-point switching Kalman smoother are compared with on another. The switching Kalman smoother and adaptive Cusum test performed better than the standard Cusum test on the simulated data (Table 6.3). However, with the clinical data (Table 6.4), the change-detection performances of all the three methods showed no obvious difference. The accuracy of pattern 92 BPmean (mmHg) 6.6. Discussion Increase −Adaptive Cusum Decrease Stable 70 Increase −Standard Cusum Decrease Stable 60 Insignificant 50 t2 t 1 10 Cusum 0 −10 −20 Fixed Mask −30 Adaptive Mask −40 0 10 20 30 40 Time (interval: 3 minutes) 50 60 Figure 6.11: Change detection results of the GHMM-based adaptive Cusum test and the standard Cusum test on an NIBPmean trend signal recognition of the switching Kalman smoother was obviously higher than that of the adaptive Cusum test in both the simulated test and clinical test. 6.6 Discussion The GHMM is a useful framework for incorporating the prior knowledge about the pattern transition learnt from experts or population data into the online trend monitoring process. The results on the simulated data and the clinical signals suggest that the proposed two methods have great promise for trend-change detection. Both methods provide detailed descriptions of trend patterns, which are desirable as inputs for a diagnostic system. The switching Kalman smoother performed better than the adaptive Cusum test in recognizing trend patterns, especially on the simulated data. This indicates that the second-order Generalized Pseudo Bayesian inference in the switching Kalman smoother provides a better summary of the signal history than the MAP estimation in the adaptive Cusum test. In addi93 6.6. Discussion 1 0.95 0.9 ture positive ratio .7 .8 .7 .8 .9 .9 0.85 0.8 1.0 1.0 0.75 0.7 0.65 1.1 1.1 0.6 Results of the standard Cusum test Results of the adaptive Cusum test 0.55 0.5 0 0.1 0.2 0.3 0.4 0.5 false positive ratio Figure 6.12: The ROC curves of the standard Cusum test and the adaptive Cusum test on simulated data. The sensitivity parameters σ was set to different values from 0.7 to 1.1. Both methods obtained the best performance with σ = 0.9. 1 .6 0.95 .7 .6 0.9 .7 ture positive ratio .8 0.85 .8 0.8 0.75 .9 0.7 1.0 0.65 .9 0.6 0.55 1.0 0.5 0 0.1 Results of the standard Cusum test Results of the adaptive Cusum test 0.2 0.3 0.4 0.5 false positive ratio Figure 6.13: The ROC curves of the standard Cusum test and the adaptive Cusum test on clinical data. The sensitivity parameters σ was set to different values from 0.6 to 1.0. Both methods obtained the best performance with σ = 0.8. 94 6.6. Discussion Table 6.3: Results on simulated data unit: percentage (number) Accurate Acceptable FP FN Standard Cusum 87.8% (237) 15.9% (43) 11.9% (32) Adaptive Cusum 60.0% (162) 91.8% (244) 8.2% (25) 4.1% (11) Switching KS 73.3% (198) 90.4% (244) 6.7% (18) 6.7% (18) Switching KS: switching Kalman smoother; FP: false positive; FN: false negative Table 6.4: Results on clinical data unit: percentage (number) Accurate Acceptable FP FN Standard Cusum 79.6% (43) 14.8% (8) 5.6% (3) Adaptive Cusum 35.2% (19) 85.2% (46) 11.1% (6) 3.7% (2) Switching KS 53.7% (29) 83.3% (45) 14.8% (8) 3.7% (2) Switching KS: switching Kalman smoother; FP: false positive; FN: false negative tion, state pruning was more precisely controlled in the switching Kalman smoother than in the adaptive Cusum test. In the switching Kalman smoother, the potential of every point state is evaluated by the probability of occurrence Wt (sk ), whereas in the adaptive Cusum test, the goodness of state pruning depends on the accuracy of the Cusum test. The existence of artifacts is one of the reasons for the false detections. Artifacts often have a very large amplitude and a brief duration. Artifacts of this type could be defined as an extra pattern in the GHMM, and recognized in the process of change detection. The performance of both methods can be improved if artifacts are effectively recognized and removed. The performance of both methods on clinical data was not as good as on simulated data. The reason for the degraded performance is that the model parameters could not describe the signal accurately due to the limited number of training cases. It is desired to train the model using a large number of annotated cases, and based on the new parameter estimates, to evaluate the performance of the state-estimation methods. It is also possible to estimate the parameters using the EM algorithm [93] using non-annotated data. The EM estimation can only be trusted when initialized with good ML estimates. The change detection performance with the obtained parameters should be compared with the performance obtained with the current settings. To reduce annotation bias, it would be desirable to have multiple anesthesiologists annotate the signals and use the common annotations as the reference. 95 6.6. Discussion The tradeoff between temporal granularity and efficiency is an unavoidable problem for any computationally intensive algorithm. The computational and space complexity for both methods increase with the length of the segmental pattern. Many physiological variables in the current clinical setting are measured at a much higher frequency than the NIBPmean. For example, HR is usually measured every 1 second or every 5 seconds. At such a sampling rate, a trend pattern typically lasts for over one hundred data points. When the proposed algorithm were directly used to monitoring the variables recorded at such a high frequency, the computational and space complexity could become intractable. To handle this problem,we recommend signals measured at a high sampling rate be segmented into small time slices and the proposed methods be applied on the features such as mean or median of these slices instead of the original data. However, this preprocessing scheme should only be used with the full awareness that it may reduce the resolution of pattern recognition. The size of the time slices should be decided according to the purpose of a specific application. In contrast to the non-adaptive Cusum test used in Chapter 4, the adaptive Cusum test based on the GHMM does not generate repetitive detections for sustain trend changes. In the proposed GHMM, the probability is very low for the transition between two segments of the same direction. Therefore, after a trend change is detected, the rise distance for the V-mask arms designed to detect the changes in same direction will automatically become very large. Repetitive detections are avoided in this way. The performance of the standard Cusum test in this chapter appearers worse than that in Chapter 4. This can be explained by the annotation bias. In Chapter 4, data annotation was performed on the detection results, therefore highly susceptible to bias. In this chapter, the annotation process was performed on the unprocessed data. The raters were not influenced by the detection results and therefore were more objective. The proposed switching Kalman smoother can be used to monitor trend signals of other physiological variables or time-series from other application areas, as long as the signal can be described using the framework of the GHMM, i.e., there are finite number of trend patterns, the transition between these patterns follows a first-order Markov process, and the intrasegmental variations can be appropriately described by a state-space model. The switching DLM used with the switching Kalman smoother is transformed from the GHMM in this chapter. The speciality about this model is reflected in the calculation process of Pr(st |st−1 ). Other than that, the proposed fixed-point switching Kalman smoother can be used for state estimation with general switching dynamic linear systems. 96 Chapter 7 Multivariate change detection based on factor analysis 7.1 Introduction The previous chapters have addressed the problem of adaptive trend-change detection in a single variable. The proposed univariate methods can be used on each individual trend signal in the sensor matrix to monitor the adequacy of a patient’s physiological status and the integrity of a monitoring system. More than twenty physiological variables are routinely measured during a standard surgical procedure. These variables interrelate closely, providing an integrated observation of a patient’s physiological state. Monitoring the signals separately is not the best solution for physiological monitoring. First, the occurrence of a clinical event often causes trend changes in multiple variables. For example, varying depth of anesthesia causes trend changes in almost all cardiodynamic and respiratory variables. These trend changes will trigger a large number of alerts if not fused together. This will increase the cognitive load of the attending anesthesiologist. Second, many adverse events can only be detected and identified by analyzing the interrelationship between the direction and amplitude of these trend changes. For example, BP may give an early indication of a cardiovascular problem in an anesthetized patient. However, in a typical case of moderate intraoperative hemorrhage, the sympathetic autonomic nervous system will cause an increase in HR to compensate for a reduction in stroke volume, which will result in a steady BP. Therefore, the drop in BP is usually small until blood loss exceeds compensatory mechanisms. Due to this homeostatic mechanisms, it is sometimes difficult for a univariate method to detect events effectively if the changes in other variables are ignored. In this chapter, Factor Analysis (FA) is used to address these problems by extracting the linear structure in the relationship between trend variations 97 7.2. Factor analysis in different variables. The extracted model represents the pharmacological effect of the anesthetics under normal conditions, and can be used to calculate predictions for the whole set of variables. As in the univariate study, investigating the forecast residuals reveals adverse events during surgery. The performance of the proposed method was evaluated using simulated signals in the scenarios of intraoperative hemorrhage and light anesthesia generated using a surgical simulation software tool. The method is intended to evaluate the potential and to identify the challenges when FA is applied to multivariate physiological monitoring. The work has been published in Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society [162]. 7.2 Factor analysis As most intraoperative events can only be detected by investigating the trends of physiological variables, the first step is detrending. The trend signals are centered using an EWMA forecaster (see Section 4.1.1). A patient’s intraoperative status varies under the regulation of several homeostatic control systems. If we assume that the variations in each physiological variable are a linear combination of the variations in the unobservable variables with a specific additive component, the residuals after detrending can be modeled as in Equation (7.1): X = Af + e (7.1) where X : n×1 represents the observed forecast residuals, loadings matrix A : n×m is a constant matrix, common factors f : m×1, represents independent latent variables that have influence on multiple observed variables, and specific factors e : n×1 is the vector of residuals, each of which only contributes to one particular variable. If the training data set is large enough and includes trend variations of different magnitudes and directions, X, f and e can be assumed to follow multivariate Gaussian distributions with zero mean, with the covariance matrices denoted as S=E(XX T ), φ=E(f f T ), and ψ=E(eeT ). FA separates the common factors f from the specific factors e and requires their covariance to satisfy: 1. φ=I: the common factors f are independent of each other and follow the normal distribution; 2. ψ is diagonal: the specific factors e are uncorrelated; 3. E(ef T )=0: f and e are uncorrelated. 98 7.2. Factor analysis Principal Component Analysis (PCA) is another popular technique for dimension reduction. However, many researchers have claimed that PCA should not be used as a model, as it only summarizes the training data and is not suitable for representing data beyond the original measurements [147]. There have been a number of comparative simulation studies, such as [125], in which PCA was found to be inferior to FA in finding the underlying structure in data simulated from a factor model. Although PCA and FA are conceptually different, the two techniques often give similar numerical results in many empirical studies, especially if the number of latent variables m is far less than the number of observed variables n, and ψ, the specific covariance only has small terms [68, 147]. The factor model estimated using the ML method (see below) is independent of the units of measurement of the signals. If a variable yi is multiplied by a constant, the ith row of the loading matrix A needs to be multiplied by the same constant, and the corresponding specific variance needs to be multiplied by the square of the constant [66]. This property is very desirable for physiological monitoring, as some physiological variables, such as Ppeak, may be measured in different units in clinical practice. The relationship between variations in the measured physiological variables during surgery are mainly influenced by anesthetic agents. It is assumed that in this chapter the same anesthetic is used throughout the maintenance phase of anesthesia. The method is intended to detect changes either in the covariance structure or in the trend of latent variables. This task is realized in two steps: model estimation and change detection. 7.2.1 Maximum likelihood estimation of a factor model Many methods have been proposed to estimate a factor model, including canonical correlation analysis [108], principal factor analysis [110], Maximum Likelihood (ML) method [66] and Bayesian method [104]. The ML and Bayesian methods have gained great popularity in recent years with the growing availability of high computational power. The ML FA is used for model estimation in this chapter. The first step in ML FA estimation is to calculate the sample covariance ˆ Sˆ is calculated by averaging XX T in the training set: matrix S. Sˆ = K T k=1 (Xk Xk ) (K−1) (7.2) where X is the vector of forecast residuals and k represents the kth sample vector in the training set. The total number of sample vectors in the training 99 7.2. Factor analysis ¯ can be calculated by normalizing each set is K. The correlation matrix Sˆ variable using the corresponding standard deviation. Under the multivariate normality assumption, the ML method estimates the loading matrix A and the specific covariance ψ from Sˆ by optimizing the log-likelihood logP r(X|A, ψ). This problem is equivalent to: (A, ψ) = argmin {tr(b) − log |b| − n} (7.3) A,ψ ˆ Many methods have been proposed for this optiwhere b=(AAT +ψ)−1 S. mization problem given the number of common factors [66, 67, 112]. Number of common factors The Generalized Likelihood Ratio (GLR) test is often used with the ML method to determine the number of common factors m for a factor model [66]. In the GLR test, K(tr(b) − log |b| − n) (K: sample size) has an asymptotic χ2 distribution under the null hypothesis that m=m. ˆ The number of degrees of freedom for this χ2 distribution is: 1 d = [(n − m) ˆ 2 − (n + m)]. ˆ 2 (7.4) The goodness-of-fit test usually needs to be carried out a number of times with m ˆ increased stepwise until a satisfactory significance is obtained. The PCA is often performed on the sample correlation matrix to generate the initial guess for m [30]. Structure of loading matrix The loading matrix A estimated by the ML method is not the only matrix that satisfies the conditions required by a FA model. All the matrices orthogonal to the estimated A, together with the estimated ψ, meet these conditions. Further constraints were imposed on A to find a loading matrix with a desirable structure. We assume that the variables tend to load either strongly or weakly on the common factors. An orthogonal rotation called Varimax [71] is used to drive the terms in A toward -1, 0, or 1 and away from intermediate values. Varimax maximizes: m Q= n { j=1 i=1 7.2.2 A4ij 1 − ( n n A2ij )2 }. (7.5) i=1 Change detection based on the FA Calculation of factor scores As mentioned before, the relationship between the variations in physiological variables is mainly determined by the 100 7.2. Factor analysis pharmacological effect of anesthetic agents. More than one FA model is needed to represent the normal responses of an average patient to a wide variety of anesthetic combinations. The first step of intraoperative monitoring is to select the right factor model according to contextual information. With the selected factor model, the common scores f and the specific scores e for new observations X can be calculated as below [137]: fˆ = AS −1 X eˆ = X−Afˆ (7.6) ct or Test statistics for change detection Intraoperative events either cause variations in the common scores f (A0 →A1 in Figure 7.1), and/or in the correlation structure (A0 →B0 or A1 →B1 in Figure 7.1). In the latter case, Specific factor I C om m on fa Model collapsed A 0 A 1 B 0 B 1 Model collapsed Uncertain area Specific factor II Figure 7.1: Intraoperative events may cause physiological signals to change along the direction defined by the common factors (A0 →A1 ) or in the covariance structure (A0 →B0 or A1 →B1 ). The structural changes A0 →B0 and A1 →B1 have the same degree of clinical significance in most scenarios. the deviations A0 →B0 and A1 →B1 , although with different magnitude |e|, 101 7.3. Simulated test have the same degree of clinical relevance. The test statistics F and E are designed to capture these changes: F (t) = fˆT fˆ E(t) = k=t k=(t−T ) k=t k=(t−T ) ˆ eˆ(k)T ψ −1 e(k) (7.7) X(k)T X(k) F measures the distance from the stable state where the common scores are zero, and E indicates the degree of rotation between the current covariance structure and the covariance under the normal conditions. In E, the specific scores e are scaled by the amplitude of local variations, so as to highlight the changes in the correlation structure. In Equation (7.7), the magnitude of local variations in the observations X and specific scores e are both filtered using a T -point moving average filter, before the ratio of deviation is calculated. Otherwise, when random oscillations are not removed by filters beforehand, the operation of division may exaggerate the uncertainty in these variations. For the same reason, if the magnitude of local variations in X is too small, E should not be used for change detection (see the uncertain area in Figure 7.1). 7.3 7.3.1 Simulated test Generation of simulated scenarios Simulated clinical cases were generated using a commercially available anesthesia trainer software Body Simulation (Advanced Simulation Corporation, WA, USA). The cardiopulmonary modeling in this software provides clinically realistic responses to drugs and other stimuli. An experienced anesthesiologist performed general anesthesia virtually using this software on healthy patients who underwent a surgical procedure for inguinal lymph node removal. All cases started with a modified rapid sequence of intravenous induction of anesthesia with propofol (100mg), morphine (5mg), and rocuronium (6mg) followed by endotracheal intubation. The patients were all on controlled ventilation. Anesthesia was maintained with the infusion of propofol. The procedure was performed on simulated patients of three different ages (44, 65, and 95 years). The cardiovascular characteristic parameters, such as the venous compliance, were adjusted to reflect the typical baroreceptor reflex in each age group. The scenarios of light anesthesia and slight to moderate intraoperative hemorrhage were simulated in each patient after endotracheal intubation. 102 7.3. Simulated test The propofol infusion rate was reduced from the baseline of 6mg/kg/h to different rates 1-3 times over the duration of each case, in order to simulate the scenario of light anesthesia. In the hemorrhage cases, the propofol infusion rate was kept constant at 6mg/kg/h, and a blood loss of 100ml/min for 10 minutes (Hemorrhage-A), 50ml/min for 10 minutes (Hemorrhage-B), and 25ml/min for 10 minutes (Hemorrhage-C) was introduced during the maintenance period of anesthesia. For each age group, 10 light anesthesia cases, 2 Hemorrhage-A cases, 2 Hemorrhage-B cases, and 2 Hemorrhage-C cases were generated. In total there were 30 light anesthesia cases, including 84 episodes of varying depth of anesthesia (53 decreases and 31 increases), and 18 cases of intraoperative hemorrhage. The trend signals of 12 physiological variables, including SpO2 , SvO2 , HR, MAP, MCVP, BPsys, BPdia, TVexp, EtO2 , EtCO2 , Ppeak, and RR (see Appendix II for more details of these variables), were recorded at a sampling rate of one second. The starting and end points for each event were annotated by the same anesthesiologist using M-eViewer, with reference to the event log in Body Simulation. The signals were detrended using an EWMA forecaster as in Chapter 4. The forgetting parameter λ in the EWMA model for each physiological variable was set empirically by investigating the filtering residuals for the steady segments. With the chosen forgetting parameter, the mean absolute residual over the first 2 minutes after the starting of every steady segment, averaged over all the annotated steady segments, should be less than two times of the standard deviation of measurement noise. Fifteen light anesthesia cases, 5 from each patient group, were used for model estimation. RR and Ppeak were excluded from the study since these values were steady during the entire maintenance phase of anesthesia in all the simulated cases. The number of common factors was initially set to 2, ¯ with a 90% CPV determined by eigenanalysis of the correlation matrix Sˆ threshold. The CPV method calculates the percentage of overall variances captured by the m principal eigenvalues and accepts m if the percentage is larger than the predefined threshold. The ML model estimation was implemented in Matlab using the standard optimization toolbox. m=2 was found to meet the 10% significance level and was accepted as the number of common factors. The remaining 15 light-anesthesia cases, containing 47 episodes of varying depth of anesthesia and the 18 intraoperative hemorrhage cases were used with the estimated FA model to verify the performance of the proposed test statistics. 103 7.3. Simulated test 7.3.2 Case study A hemorrhage scenario and a light-anesthesia scenario in the same 65-yearold patient were selected to demonstrate the temporal variations of the proposed FA-based statistics in different situations. The first case (see Figure 7.2) is a typical hemorrhage scenario. In this case, the hemorrhage started at a rate of 100ml/min after the cardiovascular vital signs reached the post-induction stable stage and then continued for 10 minutes until the patient eventually collapsed. Although intraoperative hemorrhage is a serious clinical event and demands immediate intervention, the resulting variations in the monitored physiological variables were very subtle. For example, the Mean Arterial Pressure (MAP) dropped from around 63mmHg to around 48mmHg before the collapse. The speed of change was around 1.5mmHg/minute, a rate that can be very difficult to detect during standard clinical monitoring. The proposed statistics summarize the changes in multiple variables and capture the subtle deviation from the normal covariance structure. During hemorrhage, F did not show obvious changes, while the elevated E over the bleeding period indicated that the FA model was no longer valid. This observation is not surprising, as F summarizes the changes in the magnitude of the common scores, and the variations in all the variables were very subtle during the bleeding, while the statistic E is designed to capture the changes in the relationship structure without considering the amplitude of variations. At the collapsing point, both F and E increased dramatically due to very large variations in amplitude (The transient decrease in E right after the patient collapse may be explained by the fact that the changes in the signals are not synchronized.) In the second case (see Figure 7.3), the depth of anesthesia was decreased, and then increased by changing the infusion rate of propofol. The F statistics increased significantly over the duration of varying depths of anesthesia. The E statistics stayed at a very low magnitude (<0.1), indicating that the variations were mainly due to the effects of drugs. The E statistics increased slightly at the beginning of the increased depth of anesthesia because the responses of different physiological variables to the increased propofol infusion rate were not synchronized, and thus resulted in a temporary deviation from the normal inter-variable relationship. 104 7.3. Simulated test 7.3.3 Performance of the Cusum test on F and E The F and E statistics were calculated and then averaged over every annotated episode in all the simulated cases. These averaged values, denoted as ¯ were grouped according to the patient status in the correspondF¯ and E, ¯s for the stable ing episode: SF¯s and SE¯s contain the mean values F¯s and E ¯ ¯ episodes, SF¯h and SE¯h contain the mean values Fh and Eh for the hemor¯va for rhage episodes, and SF¯va and SE¯va contain the mean values F¯va and E the episodes of varying depth of anesthesia. As demonstrated in the case study, when the patient status is stable, both F and E are close to zero. In contrast, when a clinically relevant event occurs during anesthesia, either F or E increases (or both increase). To detect intraoperative events online, the problem becomes change-point detection in the test statistics F and E. The Cusum test was used to detect upshifts in F and E. Although the uncertainty components in F and E do not follow the Gaussian distribution, the Cusum test, as a robust technique, can still be used for change detection in F and E. The length-constrained Cusum C + proposed in Chapter 4 was used here with the forgetting length set to 180 sampling intervals (3 minutes) for both statistics. The definition of C + (t) is repeated below: C + (t) = max(C + (t − 1) + e(t) − d+ /2, 0) (7.8) where e(t) represents the signal to be tested, i.e., F (t) or E(t). The Cusum statistics CF+ and CE+ were tested against the corresponding + thresholds h+ F and hE to detect changes in the amplitude of common factors or in the correlation structure. The target upshift of minimum interest d+ in Equation (7.8) and the test thresholds were set empirically. To detect the occurrence of a varied depth of anesthesia, the magnitudes of d+ F were set to half of the minimum of SF¯va , plus the average of SF¯s , and the test threshold h+ F was determined by trial-and-error to obtain the highest true positive rate. The Cusum test was performed twice on the E statistics. In + the first test, d+ E and hE were set similarly as in the test on the F statistics, + and fixed for all the cases. In the second test, d+ E and hE were manually adjusted for each case. The detection results were listed in Table 7.1. A true positive T PF (T PE ) detection refers to an event detected in an unstable episode by testing the statistic F (E). A false positive F PF (F PE ) detection refers to an event generated during a stable episode by testing the statistic F (E). If more than one positive detection was generated in the same episode, only the first one was counted. Missed events refer to the episodes with a real event 107 7.4. Discussion Table 7.1: Detection of hemorrhage and varying depth of anesthesia using the Cusum test on the statistics F and E based on a factor model Scenario T PF T PE F PF F PE Missed Varying DA 41 (87.3%) 2 (4.3%) 0 0 4 (8.5%) Bleeding 0 0 15 (83.3%) 0 13 (72.2%) 3 (16.6%) 0 15 (83.3%) 0 3 (16.6%) 3 (16.6%) Bleeding ∗ The test cases contained 47 episodes of varying depth of anesthesia and 18 episodes of bleeding. DA: depth of anesthesia; T PF (T PE ): true positive detection generated by testing on F (E); F PF (F PE ): false positive detection generated by testing on F (E); Bleeding0 : results with fixed settings for the Cusum test; Bleeding∗ : results with the Cusum test manually adjusted for each case. that was not detected by either F or E. The Cusum of F appears to be an effective indicator of varying depth of anesthesia. As for hemorrhage detection, the performance of E with the fixed Cusum settings (see Bleeding0 in Table 7.1) was unsatisfactory. Investigation of the failed cases revealed ¯ in the same status group (S ¯ , that the magnitudes of the episode means E Es SE¯h or SE¯va ) varied dramatically during different cases. The magnitudes ¯h for the hemorrhage episodes in some cases was even smaller than the of E ¯s for the stable episodes in other cases. If the magnitudes of magnitudes of E + dE and the threshold h+ E could both be adjusted for each case, the detection performance would be greatly improved (see Bleeding∗ in Table 7.1). 7.4 Discussion The results of the Cusum test appear to be highly correlated with the occurrence of simulated intraoperative events. The two simulated clinical scenarios are different in nature: the physiological variables still follow the interrelationship determined by the effect of anesthetic agents during variations in the depth of anesthesia, but not during bleeding. In the factor model, a variation in the depth of anesthesia presents itself as a trend change in the magnitude of common factors, while a hemorrhage presents itself as a sustained deviation from the normal correlation structure. The proposed statistics were able to capture and differentiate these events if the statistical test on them was appropriately configured. However, how to configure the statistical test still remains a challenge. In all the simulated cases, the F statistics for the stable episodes were consistently close to zero, and the increases in F caused by variations in the depth of anesthesia were significantly larger than zero. In contrast, these 108 7.4. Discussion was no constant baseline for the E statistics. Any variations not captured by the common factors, regardless of whether they were caused by intraoperative events, inter-patient variability, or unsynchronized responses of different physiological variables to a varied infusion rate, are all transferred to the residual part E, resulting in a great variability in E. Using fixed Cusum parameters generated a high rate of false detections. It is desirable to adapt the test online to the characteristics of each case. A potential solution is to detect trend changes instead of level shifts in the E statistics using the univariate methods proposed in the previous chapters. The statistic E summarizes the variations in multiple variables, but also discards some information. One of the discarded properties is the direction of change. As seen in Figure 7.3, the elevated E statistics over the annotated Span B were actually caused by two events: light anesthesia followed by an increase in the depth of anesthesia. The E statistics for the two events merged as one sustained increase, and can only be differentiated by checking the contribution of each individual common factor. Most missed episodes of varying depths of anesthesia in Table 7.1 can be attributed to this cause. There are a few challenges when applying the proposed method to real clinical data. The first challenge is data annotation. To extract an accurate factor model and to learn the characteristics of the proposed statistics in different scenarios, we need to collect a large volume of data from a variety of patient groups. All these data need to be classified according to their contextual information, and for each intraoperative event, the starting and end points of the relevant changes in each physiological signal need to be annotated. Secondly, the variables should be weighted according to their relative significance. Although weighting the variables does not change the structure of the FA model estimated using the ML method, the magnitude of the factor scores will be influenced by the weighting factors. Furthermore, in practice, the uncertainty in the signals does not follow the Gaussian distribution. For many events, the changes in different variables are not synchronized as well as in the simulated data. More sophisticated factor models, such as dynamic factor analysis [88], should be used to address this problem. The statistic F can summarize the variations in all the variables, and the statistic E can capture subtle events by testing the adequacy of a pre-learnt factor model. However, multivariate monitoring techniques, even with successful application to real clinical data, cannot completely replace univariate trend-change detection methods. Univariate methods usually perform better than multivariate methods in recognizing temporal features. How to fuse the results of multivariate methods with the results of univariate methods is an interesting research topic. 109 Chapter 8 Artifact detection and data reconciliation The existence of artifacts creates a major challenge for physiological trend monitoring. At each sampling time, yi (t) the measurement of the ith physiological variable in an M -dimensional sensor matrix is the level of the physiological variable at the site of interest yi∗ (t) (called the true signal level) plus the overall noise contamination, as described in Equation (8.1). The overall noise ni (t) refers to the distortion caused conjointly by background noise ηi (t) and artifacts ai (t). Since artifacts are often caused by interference or system malfunction, they are also referred to as gross errors. yi (t) = yi∗ (t) + ηi (t) + ai (t); ηi (t) ∼ N (0, Ri ) i∈1 . . . M (8.1) ni (t) This chapter estimates the signal true value yi∗ in the presence of background noise ηi and artifacts ai . The background noise ηi (t) is assumed to follow a Gaussian distribution and be independent over time and across sensors. The magnitude of the noise variances R can be different for different sensors, but is assumed to be constant over time for the same sensor. As pointed out in Chapter 1, artifacts ai are grouped into transient artifacts and sustained short-peak artifacts according to their frequency power spectrum or their duration. The frequency spectrum of short-peak artifacts has a considerable overlap with that of the physiological trend changes; therefore the presence of shortpeak artifacts may degrade the performance of the proposed trend-change detection methods significantly. The measurements contaminated with background noise still carry information about the signal true values and should be utilized in the process of signal estimation. However, if an artifact ai (t) is presented in the ith sensor, the measurement yi (t) often deviates far from the true physiological value. Using the artifact in signal estimation will distort the estimates for other sampling instants or other variables. A signal estimation method in 110 8.1. Methods physiological monitoring always consists of two processes: (1) artifact detection to identify and eliminate the contaminated measurements and (2) signal estimation to derive the true signal level. If a variable is contaminated with artifacts, the true values should be estimated from other information sources if measurement redundancy exists. Measurement redundancy commonly exists in the sensor network in the current critical care environment. Most physiological variables are measured at a frequency much higher than that of normal physiological variations. The measurements of a variable in a short time window can be viewed as repetitive observations, or are related in a known manner. These relationships are referred to as temporal redundancy. Furthermore, some signals are measured by more than one independent sensor, or are related by physiological mechanism or measurement principle. These relationships are referred to as structural redundancy. Both temporal redundancy and structural redundancy exist in intraoperative HR measurements. This chapter uses the example of HR trend signals to demonstrate how temporal and structural redundancy can be used to solve the problem of artifact detection and signal estimation. This chapter begins with a brief review of the temporal filtering techniques used in the previous chapters. The framework of data reconciliation is then introduced to formalize the problem of signal estimation in the presence of measurement redundancy. In the end, both temporal and structural information are used in the framework of data reconciliation, generating the stochastic dynamic data-reconciliation process. A hybrid median filter is proposed as an implementation of the dynamic data-reconciliation process for HR measurements. The results demonstrate that the accuracy of signal estimation can be greatly improved by incorporating structural redundancy into the signal estimation process. This is particularly true when signals are contaminated with short-peak artifacts. The proposed hybrid median filter provides a robust estimation of the HR signal without requiring strict assumptions about the signal’s characteristics, and therefore has great potential for practical use. The work in this chapter has been published in Journal of Clinical Monitoring and Computing [160]. 8.1 8.1.1 Methods Univariate temporal filtering Univariate filtering techniques utilize temporal redundancy to identify artifacts and generate signal estimates. Temporal filters formalize the knowledge about redundancy as stochastic dynamic models. Statistical tests based on 111 8.1. Methods the forecasting models can then be used to detect artifacts and estimate the true signal levels. Models used for the purpose of noise reduction are very different from those used for trend monitoring. Models used in trend monitoring, as seen in the previous chapters, are designed to summarize clinically relevant variations, with insignificant physiological oscillations reflected as model uncertainty. For the purpose of signal estimation, the dynamic models used in this chapter should be more sensitive. An appropriate model for noise reduction should be able to quickly follow all the variations in physiological variables, so that only measurement noise and artifacts are separated out in forecast residuals. One option for noise-reduction is to first detect artifacts and then estimate the signals from uncontaminated measurements. Another possibility is to use robust estimators to directly estimate signals from the artifactcontaminated measurements. Both schemes are explained in the following sections. The methods are based on different descriptions of the signals’ dynamics, and the results are optimized with respect to different criteria. 8.1.1.1 Kalman filter as MMSE estimator The Kalman filter is a recursive Minimum Mean Square Error (MMSE) estimator for the signals described by the state-space dynamic linear model. The dynamic linear growth model in Equation (5.1) follows the signal variations very quickly. This property is not desirable for trend monitoring, but serves the purpose of artifact detection very well, and is used here to describe the HR signal. Since background noise is assumed to follow a Gaussian distribution, the estimates generated using the Kalman filter are also optimal in the sense of maximum likelihood. Artifact detection should be performed before the standard Kalman filtering process (see Section 5.1.2) is used. The forecast residuals e should be tested at every sampling instant. Only the measurements not contaminated with artifacts should be used for updating the signal estimates. Since the Kalman filter is a linear filter and the background noise follows a Gaussian distribution, many statistical tests used for statistical process control can be used for artifact detection. These include the Shewhart (threshold) chart and Cusum test. The threshold method is the most widely used scheme because it introduces a minimal time delay. If a measurement is detected as an artifact, the prediction is used as the signal estimate. Most liner filters, including the moving average filter and the EWMA filter, can be used to estimate signal values after artifact detection. The 112 8.1. Methods (a) Temporal median filter y(t) y(t- 2 ) y(t- 1 ) (b) Structural median filter y1 (t) (c) Hybrid median filter y1(t) y2 (t) y2(t) y1(t- 1) y3 (t) Med [ y(t- 2 ), y(t- 1), y(t) ] Med [ y 3(t), y 2 (t), y 1 (t) ] y2 (t- 1 ) Med [ , y 1(t- 1 ), y 2 (t- 1 ), y 2(t), y 1 (t) ] Figure 8.1: Comparison of the standard, structural, and hybrid median filters that generate the estimate yˆ(t). estimation accuracy is partly determined by the threshold used in artifact detection and also by the accuracy with which the underlying dynamic models of the filters describe the signal. 8.1.1.2 Median filter as optimal L-1 filter Standard median filter A standard T -point moving median filter replaces the current measurement with the median of the most recent T raw data points (see Figure 8.1-(a), where T =3). The process is represented by: yˆ(t) = M ed [y(t−T +1) . . . y(t)] (8.2) where M ed represents the median operator. To find the median, the numbers in the filter window are first arranged in an increasing (decreasing) order. Then the median is the center value if T is even, or the average of the middle two values if T is odd. In this section, T is assumed to be odd, i.e., T = 2N +1, where N is an positive integer. The filter window moves one step forward after a new measurement is received. The properties of the (2N +1)-order standard median filter are described in [46] using three basic signal characteristics. A constant neighborhood is a region of at least N +1 consecutive constant values; an edge is a monotonic region of at least N +1 sampling intervals between two constant neighborhoods of different values; and an impulse is a maximum of N −1 points between two constant neighborhoods of the same value. The standard median filter completely removes impulses, but preserves the shape for edges. The delay introduced by the (2N +1)-order median filter is N sampling intervals. In essence, an appropriately configured median filter can remove transient noises without attenuating the signal edge. This property has estab113 8.1. Methods lished median filters as the filter of choice for denoising trend signals [22, 60, 81]. The standard median filter is used in the previous chapters for reducing transient artifacts. Median filter as L-1 estimator The assumption underlying the standard median filter is that the measurements falling within a short window T are samples of the same signal level y ∗ (t), as described in: y ∗ (t) − y(t) = ε(t) ∗ y (t) − y(t−1) = ε(t−1) (8.3) ... ∗ y (t) − y(t−T +1)=ε(t−T +1) where ε is a random variable with zero mean, representing the uncertainty about the equality relationship. The degree of variability of ε influences the credibility of the corresponding measurement. If the model in Equation (8.3) is valid, the result of the standard median filter is an optimal estimate of the true value that minimizes the mean absolute deviations (L-1 norm), i.e.: |˜ y − y(t−k+1)| }. yˆ(t) = argmax{ y˜ (8.4) k=1...T Every measurement in the filter window is assigned the same degree of credibility in the standard median filter. Weighted median filter Different credibility can be assigned to the measurements in the filter window by weighting the absolute deviations: w(k)|˜ y − y(t−k+1)| } yˆ(t) = argmax{ y˜ (8.5) k=1...T where w(k) is a positive integer. Given W = Tk=1 w(k), w(k)/W represents the relative weight for the kth measurement in the filter window. The weighted median filter [21] can be used to find the point that optimizes the criterion defined in Equation (8.5) as below: yˆ(t) = M ed [y(t−T +1) w(t−T +1) . . . y(t) w(t)] where the symbol (8.6) represents duplications, i.e., y(t) w(t) = y(t) . . . . . . y(t). (8.7) w(t) 114 8.1. Methods Electrical and mechanical cardiac events to produce heart beat Electrical potential ECG device HR ecg Pulsatile variation in plethysmograph Pulse oximeter HR SpO2 Pulsatile variation in arterial pressure Pressure transducer Physical principles Anesthesia monitor HR abp Measurement methods Recording Figure 8.2: Structural redundancy in the HR measurements: the mechanical and electrical events associated with each heartbeat are transmitted to and detected by different sensors. The weight assigned to each measurement in Equation (8.5) is expressed as the frequency with which this measurement appears in the filter data set. Temporal filters can effectively remove transients but have minimal effect on short-peak artifacts. Since the frequency spectrum of short-peak artifacts has a considerable overlap with that of the physiological trend changes, any univariate dynamic models that can reduce short-peak artifacts will also distort the relevant trend changes. Reducing short-peak artifacts represents a critical problem for physiological monitoring. 8.1.2 Data reconciliation with structural redundancy Structural redundancy in HR measurements A typical example of structural redundancy exists in HR measurements. Intraoperative HR is routinely measured by averaging the reciprocals of R-R intervals from the ECG. HR is also calculated by measuring the interval between peaks in the plethysmographic waveform from the pulse oximeter, and when available, from the interval between peaks of the waveform of invasive arterial BP. As the energy generated from the heart beating flows along different paths to these sensors (see Figure 8.2), the measurement of HR is exposed to different types of interference. Artifacts from different propagation paths are independent of one another. Electrocautery noise is a common artifact when the ECG (HRecg ) is used to determine HR. HR measured by a pulse oximeter (HRSpO2 ) is sensitive to patient motion, low perfusion states, and interference from ambient light [58]. Artifacts from invasive arterial BP waveforms (HRabp ) are frequently caused by sampling or flushing, disturbances of the sampling fluid 115 8.1. Methods column, and occasionally by catheter clotting [84]. The true pulse rates measured by HRSpO2 and HRabp are identical to the true HR. Situations where HR may differ considerably from pulse rate, such as fibrillation and other instances of abnormal rhythm, are not considered. The relationships between these measurements are described as: y1∗ (t) − y2∗ (t) = 0 y1∗ (t) − y3∗ (t) = 0 (8.8) where y1∗ , y2∗ , and y3∗ represent the true level of HRecg , HRSpO2 , and HRabp , respectively. The dimension of the sensor matrix is M =3. Framework of data reconciliation The signal measurements can be rectified so that the estimates are consistent with known relationships. This process is referred to as data reconciliation. The framework of data reconciliation for an M -dimensional sensor matrix is described below: min Φ(t) = M yi (t), yi (t)) i=1 φi (ˆ H(ˆ y1 (t) . . . yˆM (t)) = εH G(ˆ y1 (t) . . . yˆM (t)) + εG 0 s.t. (8.9) where the objective function Φ(t) is a distance from the original measurements. Prior knowledge about structural redundancy is formulated into equality or inequality constraints. The uncertainty factors εH and εG are included to allow for inexact knowledge about the constraints. Data reconciliation can be formulated as a problem of constrained optimization. Its implementation is determined by the formation of a specific problem. If uncertainty exists in the constraints (εH =0 or εG =0), the estimates can be found by optimizing the penalized objective function. The penalized objective function is the original objective function plus a measure of the deviation of measurements from the constraints [83]. For deterministic constrained optimization problems (εH =0 and εG =0), as with the HR measurements in Equation (8.8), one of the best-studied problems is the Weighted-Least-Square (WLS) estimation. If there are only equality constraints and the constraints are all in linear or some special bilinear forms, the optimal yˆ(t) in a WLS problem can be found through linear decomposition [31, 133]. Otherwise, the problem should be used to solve numerically. Structural median filter As in temporal filtering, artifact detection and data reconciliation can be accomplished in one step using a robust estimator. If the three sensors are all available, the L-1 norm can be used as the 116 8.1. Methods objective function in the data-reconciliation process, resulting in: =3 Φ(t) = M yi (t) − yi (t)| i=1 |ˆ yˆ1 (t) − yˆ2 (t) = 0 yˆ1 (t) − yˆ3 (t) = 0. s.t. (8.10) The constraints are incorporated into the objective function to generate: M =3 yˆ(t) = argmax{ y˜ |˜ yi (t) − yi (t)| }. (8.11) i=1 yˆ(t) can be found by using the structural median filter as shown in Figure 8.1-(b). As in the temporal median filter, the measurements from different sensors can be weighted according to their credibility. The structural median filter introduces no time delay, but it is effective when only one sensor is contaminated with artifacts. Another limitation is that, at least three sensor inputs should be available for the structural median filter to work, but HRabp is not routinely measured. 8.1.3 Dynamic data reconciliation The data reconciliation framework can be extended to include the signals’ dynamic characteristics [165]. In this section, the knowledge about both the temporal and structural redundancy is used in the artifact detection and data reconciliation processes to improve estimation accuracy. 8.1.3.1 Kalman filter for data reconciliation In this section, the Kalman filter is used to obtain the MMSE estimates of HR, subject to the dynamic characteristics (Equation (5.1)) and structural redundancy (Equation (8.10)). The signals are estimated in two steps (Figure 8.3). The first step is to detect artifacts and excluding the contaminated measurements from the data-reconciliation process. Artifacts can be detected using global schemes such as the χ2 -test on the whole set of measurements, or using local schemes such as a threshold test on each individual measurement channel [133]. Here artifacts are detected by testing the forecast residuals of each individual signal against a predetermined threshold σ. Only the measurements not contaminated with artifacts are then processed using a Kalman filter to find the estimate that has the MMSE criterion and satisfies the structural constraints. Given the equality constraints, the MMSE estimates can be found 117 8.1. Methods z-1 y1 y2 y3 Artifact Detection Kalman Filter H, R construction Figure 8.3: Kalman filter used for estimating heart rate from multiple measurement channels by augmenting the observation matrix and then performing the standard Kalman filter procedure (see Section 5.1.2) on the augmented system. For example, if HRSpO2 (denoted as y2 ) is contaminated with artifacts at time t, y2 will be removed from the measurement matrix, and the true value yt∗ is estimated by filtering the following augmented system: x(t) = F x(t−1) + ζ ζ(t) ∼ N (0, Q(t)) ˜ ˜ y = H(t)x(t) + η˜(t) η˜(t) ∼ N (0, R(t)) H ˜ ˜ = R1 0 H(t) = R(t) H 0 R3 (8.12) where Ri is the covariance matrix of measurement noise in the ith sen˜ ˜ sors. H(t) and R(t) should be constructed online according to the results of artifact detection. If all the sensors are contaminated with artifacts, the prediction is used as the signal estimate. 8.1.3.2 A novel hybrid median filter In this section, the dynamic data reconciliation problem is optimized with respect to the L-1 norm under the assumption that (1) the current HR equals the estimate for the previous sampling instant; (2) HR keeps constant over a short window T; and (3) HR is measured by multiple sensors. With T =2 and M =2, the optimal yˆ(t) is estimated through: M =2 min yi (t) − yi (t)| s.t. Φ(t) = i=1 |ˆ y ˆ (t) − y (t−1) = ε Dynamics−(1) 1 1 1 yˆ2 (t) − y2 (t−1) = ε2 Dynamics−(2) yˆ (t) − yˆ(t−1) = ε3 Dynamics−(3) 1 yˆ1 (t) − yˆ2 (t) = 0 Structual − (1) (8.13) 118 8.1. Methods where ε1...3 indicate the uncertainty about the dynamic constraints. The influence of the stochastic dynamic constraints on the final estimate can be realized by adding penalty terms to the original objective function. Here the penalty terms are also based on the L-1 norm. The penalized ˜ is: objective function Φ M =2 ˜ Φ(t)=λ yi (t) − yi (t)| + λ1 |ˆ y1 (t) − y1 (t−1)| 0 i=1 |ˆ +λ2 |ˆ y2 (t) − y2 (t−1)| + λ3 |ˆ y1 (t) − yˆ(t−1)| s.t. yˆ1 (t) − yˆ2 (t) = 0 (8.14) where λ0 , . . . , λ3 reflect the relative credibility of each term. If λ0 , . . . , λ3 are all set to one, replacing yˆ1 (t) and yˆ2 (t) with yˆ(t), we get: 1 2 ˜ Φ(t) = |ˆ y (t) − yˆ(t−1)| + |ˆ y (t) − yi (t−k)|. (8.15) k=0 i=0 Minimizing Equation (8.15) generates the results of the hybrid median filter M ed[ˆ y (t−1), y1 (t−1), y2 (t−1), y1 (t), y2 (t)] (see Figure 8.1-(c)). The hybrid median filter can be used to fuse the measurements from M 2 sensors, and the filter window size can be extended to T 2, as long as the assumption about the signal dynamics still holds. Then the penalized objective function becomes: ˜ Φ(t) = |ˆ y (t) − yˆ(t−T +1)| + T −1 k=0 M y (t) i=1 |ˆ − yi (t−k)|. (8.16) Weighting of the measurements Since the hybrid median filter is a recursive filter, its estimate is therefore based not only on the measurements in the filter window, but it is also influenced by all the historical data. The influence of historical data decreases exponentially. If a hybrid median filter with a T -point filter window is used to fuse the measurements from M independent sensors, the influence of historical data decreases exponentially in a block-wise fashion, with the base of the exponential rate being T M1+1 . The proof for this observation is given below. Statement: ˜ argmax Φ(t)=argmax {|˜ y (t) − yˆ(t−T +1)| + Ψ} y˜(t) y˜(t) 1 =argmax { W |˜ y (t) − yˆ(t−2(T −1))| y˜(t) 2(T −1) 1 +W k=T −1 M y (t) i=1 |˜ (8.17) − yi (t−k)| + Ψ} 119 8.1. Methods −1 M where Ψ = Tk=0 y (t) − yi (t−k)| is the non-recursive part in the fili=1 |˜ ter data set, and W = T M +1 is the number of elements in the filter data set. Proof: Equation (8.17) is equivalent to: argmax {W |˜ y (t) − yˆ(t−T +1)| + W Ψ} = y˜(t) argmax {|˜ y (t) − yˆ(t−2(T −1))| + y˜(t) 2(T −1) k=T −1 M y (t) i=1 |˜ − yi (t−k)| + W Ψ} (8.18) So the problem becomes to prove that SL : M ed[ˆ y (t−T +1) W, S] = SR :M ed[ˆ y (t−2(T −1)), Y (t−2(T −1)), . . . , Y (t−T +1), S] (8.19) S˜ where S represents [Y (t−T +1) W, Y (t−T +2) W, . . . , Y (t) W ], and Y (t) represents all the measurements from the M sensors at time t. From now on, the left-side data set in Equation (8.19) is denoted as SL , and the right-side data is denoted as SR . In Equation (8.19), the W data points in yˆ(t−T +1) W on the left side is replaced with the W readings in S˜ on the right side. It should also be ˜ = yˆ(t−T +1). noted that M ed[SL ] = yˆ(t) and M ed[S] Equation 8.19 is proven in the following three situations: 1. if yˆ(t)=ˆ y (t−T +1): the number of readings yˆ(t) in S˜ is equal to the number of readings yˆ(t), therefore replacing yˆ(t−T +1) W in SL with S˜ does not change the median of SL . 2. if yˆ(t)<ˆ y (t−T +1): no more than W readings in S˜ can be smaller 2 than yˆ(t). As demonstrated with the example below (W =5 and T =2), replacing yˆ(t−T +1) W with S˜ does not change the median of SL . SL = ◦ ◦ ◦ ◦ ◦ . . . • • • • • ↑ yˆ(t) SR = yˆ(t−1) W ◦ ◦◦◦◦ ↑ yˆ(t) . . . yˆ(t−1) where S˜ : [ yˆ(t−1) ] M ed [SL ] = M ed [SR ] (8.20) 120 8.2. Evaluation of performance 3. if yˆ(t)>ˆ y (t−T +1), Equation (8.19) can be proven in a similar. Therefore, the statement in Equation (8.17) is proven. The above analysis is for the hybrid median filter with each element in the filter window equally weighted, including yˆ(t). As in the temporal and structural median filters, measurements from different sensors or different sampling instants can be weighted according to their credibility. Degree of redundancy The degree of redundancy provided by the dynamic and structural constraints determines the maximum number of measurements that are allowed to be contaminated with artifacts. Too many artifacts may distort the estimate. For an L-1 estimator the degree of redundancy is half of (the number of independent constraints - the number of variables to be estimated). Therefore, to obtain a correct estimate, a maximum of 2 measurements out of [y1 (t−1), y2 (t−1), y1 (t), y2 (t)] are allowed to be contaminated with artifacts, i.e., either each of the sensors at time t is contaminated with a single-point artifact, or both samples from one of the sensors are contaminated with artifacts. The degree of redundancy can be increased by incorporating more independent sensors, or, if a longer delay is acceptable, by increasing the length of the filter window. The performance of the hybrid median filter with T =2 and M =2 is demonstrated in Figure 8.4. If a short peak occurs in one sensor (Figure 8.4(a)), or transients occur in both sensors (Figure 8.4-(b)), the variations are recognized as artifacts and removed. If short peaks occur in both sensors (Figure 8.4-(c)), a short peak is retained with 1-point delay. 8.2 Evaluation of performance The hybrid median filter with the default window size T =2 was tested using both simulated signals and clinical data. 8.2.1 Simulation test The simulation study was performed in two phases, with three scenarios in each phase. In phase I, three scenarios using one, two, and three sensors were used. The sensors were contaminated solely with background noise. In phase II, a range of one to three sensors was used, and, in addition to background noise, the measurements were contaminated with artifacts. 121 8.2. Evaluation of performance (a) Denoised signal Sensor 1 Sensor 2 (b) Denoised signal Sensor 1 Sensor 2 (c) Denoised signal Sensor 1 Sensor 2 Figure 8.4: Effect of the hybrid median filter on two channels of measurements: (a) a short peak in one sensor, (b) transients in both sensors, and (c) short peaks in both sensors. Thirty cases were generated for each of the six scenarios. For each case, the measurement signal was a combination of a true HR series and a noise series. For scenarios involving more than one sensor, multiple measurements were simulated for the same case by contaminating the same true HR series separately with a different noise series. The linear growth DLM in Chapter 5 was used to generate the true HR series. Each series contained 1,000 sample points. Background noise for the first phase of scenarios was generated using the Gaussian distribution N (0, δ 2 ) with δ=5. For phase II, in addition to the background noise, HR signals were contaminated with 100 randomly located 1-point transient artifacts and 10 randomly located short-peak artifacts. The amplitude of the transient artifacts was randomly generated using a uniform distribution over [-50 50]. The amplitude of the short peak artifacts followed a uniform distribution over [-50 50], and their duration was uniformly distributed over [2 10] sampling intervals. The hybrid median filter with T =2 proposed in Section 8.1.3.2 and the Kalman filer described in Section 8.1.3.1 were tested on the simulated data in both scenarios (Phase I and Phase II). For the hybrid median filter, no specific information about the test signals was required. For the Kalman filter method, the model parameters were all set to the true values used for generating the signals, and the threshold σ used for artifact detection was set to different magnitudes, from 1δ to 20δ with the step size being ∆σ=δ. δ is the standard deviation of the background noise. 122 8.2. Evaluation of performance 1 sensor, Kalman filter 2 sensors, Kalman filter 3 sensors, Kalman filter Hybrid median filter 2.5 Relative RMSE 2 1.5 1 1 sensor, hybrid median 0.5 2 sensors, hybrid median 3 sensors, hybrid median 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Artifact detection threshold σ (unit: δ) Figure 8.5: The relative Root Mean Square Errors (RMSE) of the Kalman filter with the model covariances set to the true values and the artifact thresholds σ changing from 1δ to 20δ, compared with the RMSE of the hybrid median filter. δ is the standard deviation of the background noise. The signals are contaminated with transient and short-peak artifacts. The performance of the two filters was evaluated by comparing their Root Mean Square Errors (RMSE). The RMSE qualifies the difference between the signal estimates yˆ and the true signal level y ∗ . The RMSE was calculated for each case and then averaged over all the cases for each scenario. The RMSE levels of the unfiltered data indicate the overall noise levels. The relative RMSE ratios are defined as the RMSE levels relative to the RMSE level of the unfiltered data. The relative RMSE ratios were calculated for the two filters to compare their performance in different scenarios. For the scenarios with artifacts (Phase II), the estimation accuracy of the Kalman filter obtained with different σ thresholds was compared with the estimation accuracy of the median filter (see Figure 8.5). For both filters, the accuracy of signal estimation improved as the number of sensors increased. In all the scenarios, the magnitude of the threshold σ has a great influence on the performance of the Kalman filter. When σ is too small, 123 8.2. Evaluation of performance the estimates of the Kalman filter deviate far from the signal true values, due to severe signal distortion. For σ’s of larger magnitudes, the estimation accuracy of the Kalman filter varies in the three scenarios. The performance in each scenario is described below. If only one sensor is available, the performance of the Kalman filter deteriorates as σ increases from 3δ to 9δ, indicating that the Kalman filter might have missed more artifacts with the larger σ thresholds. However, when the threshold σ becomes larger than the amplitude of most artifacts (σ>10δ), the method has little effect of artifact removal. Instead, when σ>10δ, the increased σ thresholds further reduce signal distortion. Therefore, the estimation accuracy of the Kalman filter improves as σ increases from 10δ and soon reaches peak performance. Except in the situation of severe signal distortion (σ 2δ), the optimal relative RMSE for the Kalman filter (marked with ∗ in Table 8.2) is 63%, when one sensor is available, obtained at the plateau area with σ=20δ, and the worst relative RMSE (marked with − in Table 8.2) is 111%, obtained with σ=9δ. The performance obtained at the plateau area is equivalent to the performance obtained without artifact removal (marked with 0 in Table 8.2). If more than one sensor is available, the influence of missed artifacts on the estimation accuracy of the Kalman filter is reduced for 3δ σ 9δ. The estimation accuracy of the Kalman filter improves as σ increases for σ 3δ. If two sensors are available, the optimal relative RMSE is 32%, obtained with σ=3δ, the worst relative RMSE is 50%, obtained with σ=8δ, and the relative RMSE without artifact removal is 47%, obtained with σ 11δ. If three sensors are available, the optimal relative RMSE is 24%, obtained with σ=3δ and the worst relative RMSE is 41%, obtained with σ 11δ, and the relative RMSE without artifact removal is also 41%, obtained with σ 11δ. The results of the simulated test in both phases are listed in Table 8.1 and Table 8.1, respectively. In all six scenarios (Phases I and II), the overall RMSEs of both the Kalman filter and the hybrid median filter are smaller than those of the unfiltered data. As the number of sensors increases, the overall RMSEs of both filters decrease. In the cases where the signals were contaminated with background noise only (Phase I), the Kalman filter performed better than the hybrid median filter. In the cases where the signals were contaminated with both background noise and artifacts (Phase II), the optimal performance obtained by the Kalman filter with the true model parameter is only marginally better than the performance of the proposed hybrid median filter. If the threshold for artifact-removal is not set appropriately, the estimation accuracy of the Kalman filter can be significantly worse than the hybrid median filter. 124 8.2. Evaluation of performance Table 8.1: Root mean square errors and relative root mean square errors (relative to the unfiltered data) for the simulated cases in Phase I: cases without artifacts Number of sensors one two three Unfiltered data Kalman filter Hybrid median filter 2.23 (100%) 1.43 (64%) 1.75 (78%) 2.23 (100%) 1.09 (49%) 1.34 (60%) 2.23 (100%) 0.96 (43%) 1.17 (52%) The true model parameters were used in the Kalman filtering process. Table 8.2: Root mean square errors and relative root mean square errors (relative to the unfiltered data) for the simulated cases in Phase II: cases with artifacts Number of sensors Unfiltered data Kalman filter− Kalman filter0 Kalman filter∗ Hybrid median filter one 11.57 (100%) 12.84 (111%) 7.29 (63%) 7.29 (63%) 7.65 (66%) two 11.57 (100%) 5.79 (50%) 5.44 (47%) 3.70 (32%) 4.28 (37%) three 11.57 (100%) 4.74 (41%) 4.74 (41%) 2.78 (24%) 3.24 (28%) Kalman filter− : the worst estimation accuracy of the Kalman filter with an poorly configured artifact-removal threshold; Kalman filter0 : the performance of the Kalman filter without artifact removal; Kalman filter∗ : the optimal estimation accuracy of the Kalman filter with an appropriately configured artifact-removal threshold. The true model parameters were used in the Kalman filtering process. 8.2.2 Case studies with clinical data Clinical cases were used to demonstrate the performance of the proposed hybrid median filter on intraoperative HR trend measurements. Case I included multiple and prolonged bursts of electrocautery during surgery. The HRecg measurements were corrupted with a significant number of electrocautery artifacts (Figure 8.6-(a)). Many of the electrocautery artifacts were short peaks, and the longest lasted for 26 sampling intervals (2 minutes and 10 seconds). HRSpO2 measurements were contaminated with movement artifacts at around t=2370. The longest short-peak artifact in the pulse oximeter lasted for 5 sampling intervals (25 seconds). For a univariate median filter, the window size required to remove these short-peak artifacts would 125 8.2. Evaluation of performance (a) (b) Movement artifact Induction End of surgery Physiological short variation HR SpO2 (c) HR ecg Artifacts from ECG Artifacts from pulse oximeter Figure 8.6: The hybrid median filter has fused the HR measurements from the ECG (HRecg ) and pulse oximeter (HRSpO2 ) into a single HR series, and removed the electrocautery artifacts from HRecg and movement artifacts from HRSpO2 . be at least 53 points for HRecg and 11 points for HRSpO2 . The remaining oscillations at the beginning and end of the case were caused by surgical intervention. The hybrid median filter fused HRecg and HRSpO2 into one single HR series (Figure 8.6-(b)). The movement artifacts presented on the pulse oximeter and most of the electrocautery artifacts in the ECG device were effectively removed from the HR estimates. In Case II, HRabp was measured from the invasive arterial BP monitor, in addition to HRecg and HRSpO2 . Artifacts were present in all sensors (see Figure 8.7-(a)). The oscillations in HRabp at the beginning of the case were produced by flushing the arterial line and movement of the patient and fluid column. During the procedure, adjustment of the patient position introduced movement artifacts, including the artifacts at around t=390 in HRSpO2 and t=700 in HRabp . Flushing of the arterial line introduced additional artifacts in HRabp at around t=380. In this case, the hybrid median filter removed most artifacts from the HR estimates (see Figure 8.7-(b),(c)). 126 8.3. Discussion HR ecg (a) (b) Movement artifacts HR SpO2 Movement artifacts Flushing the arterial line HR abp Movement artifact (c) Artifacts from arterial BP monitor Artifacts from pulse oximeter Flushing the arterial line and movement Artifacts from ECG Figure 8.7: The hybrid median filter has fused the HR measurements from the ECG (HRecg ), pulse oximeter (HRSpO2 ), and arterial BP monitor (HRabp ) into a single HR series. 8.3 Discussion When measurements from more than one sensor are used, both hybrid median filter and the Kalman filter generate more robust estimates of HR than would be possible from a single sensor source, by using both the signal’s temporal characteristics and the relationship between the measurements from multiple sensors. The hybrid median filter and the Kalman filter optimize the signal estimates with respect to different criteria. As a robust estimator, the hybrid median filter provides an integrated solution for the problem of artifact removal and signal estimation. The Kalman filter, as a linear filter, averages the measurements rather than selecting the artifact-free sensor; therefore, an effective artifact removal step is required for this method. In the simulated scenarios free of artifact, the Kalman filter method with the true model parameters performed marginally better than the hybrid median filter. This is not surprising, as the Kalman filter is the optimal filter for a Gaussian linear model. However, when artifacts were present, the hybrid median filter performed better than the Kalman filter for a wide range of artifact-detection thresholds, even if the model parameters for the Kalman filter were set to the true values. In reality, it is unlikely that the true model parameters will be unknown. The parameters estimated from 127 8.3. Discussion population data often deviate significantly (>100%) from the true settings (refer to the results in Chapter 5). In practice the uncertainty of the model parameters may cause the performance of the Kalman filter to degrade. In the first clinical case a few short peak variations were retained in the estimates of the hybrid median filter. This does not represent a failure of the hybrid median filter. These short variations were present in more than half of the sensors. It is undesirable to remove these short variations as they may well have clinical significance. One of the fundamental assumptions of the proposed hybrid median filter is that the sampling rate is sufficiently high that the physiological variation in the filter window can be ignored. This assumption might not hold if the sampling rate is very low (such as every few minutes). In this case, a temporal model should be used to describe the physiological dynamics between samples. Model predictions should replace the historical estimates and historical measurements used in the hybrid median filter. The hybrid median filter has a few advantages over a previously described method [40], which is a two-step solution based on artifact detection and Kalman filtering (see Section 2.5). Firstly, the hybrid median filter method is computationally simpler. The only computation required in the hybrid median filter is ordering the samples in the data set. The median filter intrinsically selects the best measurement and generates the estimate. Secondly, the hybrid median filter does not require any explicit assumptions about the distribution of artifacts. The Gaussian distribution and the uniform distribution were used only for generating signals in the simulated testing. In fact, median filters are able to remove most artifacts that follow long-tail distributions. Finally, the hybrid median filtering process is intuitive and the performance can be easily adjusted. These considerable benefits of this method make it highly suitable for clinical use. 128 Chapter 9 Software implementation and clinical testing The EWMA-Cusum method and the Adaptive-DLM method proposed in Chapter 4 and Chapter 5 have been implemented in a software system called iAssist. iAssist allows for real-time evaluation of the proposed trend-changedetection methods in the operating room. The software implementation and clinical study presented in this chapter was mostly conducted by Chris Brouse as part of his Master project, and the results have been published in Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society [19]. 9.1 Software implementation: iAssist The iAssist software system was developed in an object-oriented programming language Java (Sun Microsystems, Santa Clara, CA). iAssist supports the following functions: 1. 2. 3. 4. Data acquisition from multiple input sources; Signal processing using multiple algorithms; Display of results and reception of feedback using a graphical interface; Data storage to save the results to external devices. The system is realized in a modular framework, which allows for different implementations for each of the function modules mentioned above. These modules communicate via the common interfaces predefined in the framework and exchange data during intraoperative monitoring. The framework of iAssist provides great extensibility and flexibility. In the current version, the real clinical data are read from the datacollecting software S/5 Collect (GE Healthcare, Chalfont St Giles, UK) via the TCP/IP communication protocol. Our group is developing other data acquisition modules, aiming to read data directly from a wide range of physiological monitors. The EWMA-Cusum method proposed in Chapter 4 is implemented with a one-level threshold and applied to SpO2 and NIBPmean. 129 9.2. Clinical study The Adaptive-DLM method proposed in Chapter 5 is implemented and applied to HR, MVexp, EtCO2 , and RR. All the data are preprocessed using a median filter to reduce transient artifacts before being processed by the trend-change-detection methods. The detected trend changes are displayed as directional arrows on the signal trend trace (Figure 9.1). In addition to the visual display, the detected trend changes are encoded as different vibration patterns and sent via Bluetooth to a vibro-belt to provide tactile alerts to clinicians. A touch screen is used to receive feedback from clinicians. The algorithm configurations and detection results are formatted according to the ISO/IEEE 11073 international standard for modeling pointof-care medical device data [1], using XML-based deployment descriptors, and stored in ASCII text files. 9.2 Clinical study Following ethics approval, the performance of iAssist was evaluated in real time alongside the physiological monitors with S/5 Collect in the operating room at BC Children’s Hospital. The median filter was optimized for each variable. The sensitivity parameters of the trend-change-detection methods were configured before the start of each case, according to the results in the offline testing. However, the participating anesthesiologists were allowed to change the sensitivity configurations during surgery. Each event detected by iAssist in the maintenance period of anesthesia was graded by the attending anesthesiologists as “artifact” (A), “clinically insignificant” (CI), “clinically significant” (CS), “clinically significant with action taken” (CSAT), or “clinically significant due to intervention” (CSI) with reference to the graphical display of the trends and events marked on the trend display (Figure 9.1). Missed events were also noted. The usefulness of each event was rated on a 10cm visual analogue scale from “frustrating” (0) to “very useful” (10). Fifteen anesthesiologists completed the clinical evaluation in a variety of pediatric surgical cases of at least one hour in duration (38 cases; mean duration of 103 (STD 4.25) minutes). The mean number of events per case was 22.8 (STD 23.4). The clinical evaluations of the generated alerts are shown in Table 9.1. The proposed methods detected 13 events per hour of anesthesia. In total, over 50% of the events iAssist detected were rated as clinically significant (CS, CSAT, or CSI), and less than 7% were rated as artifacts. Only 6 significant events were reported as having been missed by the algorithms. The usefulness value ranged from 0 to 9.3 (9.3 for an EtCO2 event). 130 9.3. Discussion Figure 9.1: Clinicians evaluate the trend-change-detection results generated by iAssist during clinical testing. Table 9.1: Clinical evaluation of the real-time performance of iAssist EtCO2 HR MVexp RR SpO2 NIBPmean Total Total (N) 212 253 145 86 48 124 868 Artifact (%) 5.2 6.3 5.5 2.3 37.5 3.2 6.8 Insignificant (%) 24.1 36.4 20.7 9.3 20.8 28.2 26.0 Significant (%) 52.3 43.9 54.5 60.5 18.8 62.1 50.6 Not Rated (%) 18.4 13.4 19.3 27.9 22.9 6.5 16.6 EtCO2 , HR, MVexp, and RR were monitored using the Adaptive-DLM method; SpO2 and NIBPmean were monitored using the EWMA-Cusum method with a one-level certainty threshold. 131 9.3. Discussion 9.3 Discussion In the clinical test, over 50% alerts generated by iAssist were considered as clinically significant. In contrast, the false alarm rate of standard physiological monitors can be as high as 90% according to a survey [74]. The results demonstrate that the detection accuracy of the EWMA-Cusum method and the Adaptive-DLM method was superior to that of basic threshold-based alarms used in current physiological monitors. However, compared with the performance in the offline testing (see Table 4.1, Table 5.3 and Figure 5.6), both the sensitivity and specificity decreased in the clinical test. One of the contributing factors for the degraded performance is the frequent occurrence of artifacts. In the offline testing, we followed the Gaussian assumption on signal characteristics imposed by the change-detection methods and did not use any case with heavy artifact contamination. However, in the real-time clinical test, some signals were corrupted by artifacts. For example, HR signals in electrocautery surgeries were all significantly disturbed by the electrocautery artifacts, resulting in many false detections. Future inclusion of the hybrid median filter proposed in Chapter 8, or the wavelet-based denoising approach proposed by our group [18], is expected to significantly reduce the number of false alerts due to electrocautery noise. Furthermore, the tuning parameters were not fixed consistently at the optimal settings throughout the testing. In some cases, the attending clinicians adjusted the tuning parameters away from the recommended sensitivity setting, in order to reduce time delay by sacrificing detection accuracy. For example, the sensitivity parameter for SpO2 was set to detect changes of 3-5% over only a few minutes throughout the test, resulting in many artifacts (false detections). Some clinically significant trend changes due to intervention (one type of true positive detections) were annotated incorrectly by clinicians as clinically insignificant (CI) or artifacts (A). This evaluation bias is not surprising, since most CSI changes are anticipated responses for clinicians, and do not require immediate clinical intervention. However, from the perspective of trend-change detection, CSI detections notify the clinician who initiated the intervention about the physiological responses of a patient to the treatment, and should be treated as true positive detections. Had all the results been classified according to the classification system described in Section 9.2, the detection performance would appear better than in Table 9.1. The participating clinicians were given a high degree of freedom during the test, as one of the purposes of this pilot study was to promote the concept of trend-change detection and to encourage clinicians to explore the 132 9.3. Discussion functionality of iAssist. We have observed the attitudes of the participating clinicians toward iAssist throughout the test. It was noticed that, as clinicians became more familiar with iAssist and the underlying change-detection methods, they showed more interest in using the software and participating in the clinical testing. This observation indicates that the acceptance of a clinical decision-support system among its target users – clinicians, can be significantly improved by education and training, which agrees with the finding of a previous survey [13]. On the other hand, the technical complexity of a system, if too high, may hinder its acceptance among clinicians. Using methods based on intuitive concepts, like the EWMA-Cusum method realized in the current version of iAssist, is very important in the first stage of clinical study. Due to the freedom given to the participating clinicians, the results of this pilot study should only be taken as a preliminary evaluation of the real-time performance of the proposed methods. Many contributing factors were not controlled while conducting the test, neither were they taken into account in data analysis. One example is the influence of workload on the clinician’s viewpoint of trend-change detections. It should be noted that a significant proportion of alerts were not annotated (up to 27.9% for RR, see Table 9.1). These non-annotated alerts very likely occurred during the occurrence of critical events, a high-workload situation where the results of trend-change detection are expected to facilitate the decision making process but the attending clinicians could not divert their attention to iAssist to evaluate the results due to high workload. To reduce the influence of raters’ workload on the results of evaluation, it is desirable to have a research assistant to assist with alert annotation or to use a wireless headset instead of keyboard or touch-screen to receive feedback from clinicians. To obtain a better evaluation of the methods’ real time performance, more clinical tests should be performed following a better controlled testing procedure. Moreover, after the software system gains further acceptance among clinicians, other methods proposed in this thesis should be implemented in iAssist and evaluated in real time. 133 Chapter 10 Conclusion and future work 10.1 Summary: work accomplished This thesis has addressed the signal-estimation and trend-change-detection problems in physiological monitoring. The proposed univariate methods are designed to detect changes in different trend features. The EWMA-Cusum method (Chapter 4) can detect changes in trend direction. The AdaptiveDLM (Chapter 5) detects changes in trend direction, but can also be used to capture changes in the incremental rate by adjusting the sensitivity setting as well as the segment-congregation step. In the GHMM-based methods (Chapter 6), trend changes are classified according to their direction, incremental rate, and duration. More detailed trend abstraction can be generated online together with a quantitative estimation of the occurrence probability. The results of the proposed methods summarize the signals’ temporal dynamics, and can be used as inputs for a high-level diagnostic inference system. The accuracy of change detection, as demonstrated in both offline testing and real-time clinical testing, indicates that the proposed methods have great potential for clinical use. In clinical testing, the participating clinicians evaluated over 50% of the alerts generated by the EWMA-Cusum and the Adaptive-DLM methods as clinically relevant. Compared with over 90% irrelevant alarms generated by the standard alarm systems [74], the methods proposed in this thesis achieved a much higher detection accuracy. The proposed methods use different techniques to handle the challenge of intraoperative variability. The EWMA-Cusum method is based on a simple ARIMA(0,1,1) model and has the advantage of flexibility and robustness. Following adjustment, it can be applied to most physiological variables. However, the performance of this method is limited due to the fact that the second-order statistics of the monitored signals are unrealistically assumed to be constant during surgery and between patients. In contrast, the Adaptive-DLM method and the GHMM-based methods (the switching fixed-point Kalman smoother and the adaptive Cusum test) can adapt online to the intraoperative variability. The Adaptive-DLM method requires a good initial estimate of the second-order statistics (Q and R), 134 10.1. Summary: work accomplished and estimates the statistical characteristics of the signal online based on the recent data. The estimation process in the Adaptive-DLM method is based on the EM algorithm. If a change occurs in the second-order statistics, the Q-R estimation process needs to run a number of iterations before converging to a local optimum. Therefore the Adaptive-DLM method is only appropriate for physiological variables whose covariances vary much slower than the sampling rate. In the GHMM-based methods, the intraoperative transition between different patterns is modeled as a Markov-chain process. The formalized transition relationship can be learned from annotated population data, and then used online for pattern recognition. The methods based on the GHMM, including the switching Kalman smoother and the adaptive Cusum, are suitable for monitoring physiological variables sampled at a lower rate, such as NIBPmean, and can also be extended to monitor variables measured at a higher sampling rate, such as HR. The two GHMM-based methods have the highest computational complexity. To reduce the complexity, it is recommended that signals measured at a high sampling rate be down-sampled or segmented by linear approximation before being processed by the GHMM-based methods. The multivariate analysis (Chapter 7) performed in this thesis is a pilot study, intended to investigate the potential of linear dimensionalityreduction techniques to represent the inter-variable relationships. The proposed statistics based on the factor model appear to be effective at separating two different types of changes: changes in the magnitude of common factors and changes in the covariance structure. Therefore, these statistics provide a solid ground for summarizing repetitive alerts due to a common cause and for detecting clinical events with subtle manifestations. However, the covariance structure between physiological variables is sensitive to the contextual information, and may vary significantly between patients. The inter-patient variability is not addressed in the proposed method. The method should be tested using additional clinical data to find directions for further improvement. Finally, a solution for artifact-removal is proposed for HR using a novel hybrid median filter (Chapter 8) that utilizes the temporal and structural redundancy in the sensor network in the current clinical environment. Artifact removal, as a fundamental challenge in physiological monitoring, requires further investigation. 135 10.2. Future work: the road ahead 10.2 Future work: the road ahead The ultimate goal of intelligent physiological monitoring is to improve patient safety by providing decision-making support to the clinicians who are dedicated to total care of the individual patient. In a topic so closely related to the well-being of human subjects, every advance in the area of physiological monitoring should be validated using stringent evaluation. In the following sections I propose potential future work that can be carried out based on the methods proposed in this thesis. In addition, I discuss the issues of clinical testing, software design, and the ethical and legal issues that a clinical-decision support system should address before clinical application. Other subproblems in physiological monitoring Physiological monitoring is a multi-level task. Although this thesis has addressed one of the most important subproblems in this task – trend-change detection and pattern recognition – subproblems at other levels of interpretation (see Figure 1.2 for more details) still need to be addressed. Finally, the solutions to all the subproblems need to be integrated to build a complete clinical decision-support system. Artifact removal is a fundamental step for any physiological monitoring system. An appropriate artifact-removal solution should address three issues: artifact detection, signal estimation, and artifact explanation. As demonstrated in Chapter 8, artifact detection together with signal estimation is intended to improve the quality of input for the higher-level modules, and consequently to reduce the uncertainty of final decision making. Artifact explanation is intended to identify the underlying cause for artifacts, and decide whether to trigger an alert. Artifact explanation is necessary because some artifacts indicate device malfunction and may well deserve immediate attention. In addition to HR measurements, measurement redundancy also exists in other signals in the current clinical sensor network. For example, gas flows sampled from the same breathing circuit must satisfy some mass transfer equations. Utilizing the correlation between different variables is a promising direction for artifact removal and signal estimation. For the problem of artifact explanation, signal measurements alone might not provide sufficient information to determine the clinical relevance of artifacts. Indeed, it may be necessary to utilize the contextual information and trend dynamics of other variables in order, to derive a diagnostic interpretation of the detected artifacts. Semantic trend abstracts provide a solid basis for diagnostic inference. It is desirable to integrate temporal abstracts with the expert knowledge 136 10.2. Future work: the road ahead commonly used in physiological monitoring (such as the critical threshold for each variable) to derive alerts more relevant to clinicians. Clinicians naturally comprehend information from all sources and make a final diagnosis/decision without clear recognition of the individual cognitive processes. The early alerts may cause distraction or confusion to clinicians who are not familiar with the concept of trend-change detection. We found in the preliminary clinical test that simply triggering an alert for every trend change introduces a heavy cognitive load for anesthesiologists. To address these issues, we are developing a rule-based expert system. The system will be used to fuse trend abstracts with the standard threshold-based alarm scheme to provide alerts of higher clinical relevance [6, 39]. Although information integration is desirable, researchers need to decide on the target level of interpretation for the final results. Increased automated diagnosis may not necessarily provide better cognitive support and improved outcomes. As a matter of fact, clinicians demonstrate obvious reluctance to trust a decision-support system that produces specific diagnostic or therapeutic suggestions [13]. This attitude could reduce the user compliance for a decision support system. Furthermore, some contextual information required for diagnostic inference can only be obtained by manual input in the current clinical setting. The increased workload counteracts the benefits provided by automatic information processing. Another concern for diagnostic inference is knowledge completeness. As pointed out in Chapter 2, it is very difficult if not impossible to build a knowledge reservoir for detecting all the potential clinical events. Biomedical researchers should work closely with clinicians to identify the clinical subcategory in which a complete collection of clinical scenarios is relatively easier to obtain. Alert design and display The information extracted by the data interpretation methods needs to be transformed into some form and delivered to clinicians. The efficiency of information expression and quality of information transmission influence the amount of relevant information that clinicians can finally receive from a physiological monitoring system. A multi-level alert system (the EWMA-Cusum method) usually triggers more alerts than a single-level alert system, and may increase clinicians’ cognitive load. The display of multi-level alerts is an important topic that requires further study. Visual communication remains the predominant method currently used to relay information to clinicians. Our group has been investigating the relatively under-utilized sense of touch to trans- 137 10.2. Future work: the road ahead mit information during physiological monitoring [11]. This tactile display uses the body’s largest sensory organ, the skin, to provide subtle cues to clinicians based on features extracted from the data. Compared with visual and auditory displays, alerts sent via tactile devices may not distract from patient observation and do not disturb other individuals in the clinical environment. Clinical testing Extensive testing should be performed in the clinical environment to evaluate the performance of every component of a physiological monitoring system. We have conducted preliminary testing to verify the real-time performance of the proposed trend-change-detection methods, as well as the efficiency of tactile display. More real-time testing will be performed in the operating room to evaluate the combined performance of the proposed trend-change-detection methods and the tactile display. After a system is confirmed to be able to effectively detect and deliver early alerts to a clinician, the next question is whether adopting such a system in the clinical environment can enhance situation awareness of the attending clinician, and eventually improve patient safety. Many human-factor studies have been performed to prove the link between situation awareness and patient outcomes [103, 136]. However, extensive clinical testing will be needed to verify whether the alerts generated by our methods improve the situation awareness of clinicians in the operating room. Situation awareness in physiological monitoring refers to the perception and comprehension of any change in the clinical environment related to the well-being of patients, and the ability to evaluate a patient’s status based on the perceived changes [43]. Situation awareness can be assessed using a variety of approaches, including direct and indirect measures of performance, mental workload measures, and analytical measures of specific aspects of performance such as movement or communication [155]. In order to learn whether the proposed methods improve the clinicians’ situation awareness, we should first choose an assessment approach to measure the degree of situation awareness. We can then perform clinical testing to compare the degrees of situation awareness with and without the presence of early alerts. Ethical and legal issues Significant ethical and legal obstacles will need to be overcome before the work completed in this thesis can be adopted in the clinical environment. The software implementation of the proposed methods, iAssist, is currently in its prototype stage and is mainly used for conducting preliminary clinical testing. iAssist could be converted into 138 10.2. Future work: the road ahead a fully-developed software product for use in clinical practice. Here we discuss the ethical and legal requirements that a medical software product should comply with, using the US Food and Drug Administration (FDA) regulations as an example. The Federal Food, Drug, and Cosmetic Act (FD&C Act) [2], defines a medical device as: “an instrument, apparatus, implement, machine, contrivance, implant, in vitro reagent, or other similar or related article, including any component, part, or accessory, which is . . . intended for use in the diagnosis of disease or other conditions, or in the cure, mitigation, treatment, or prevention of disease . . . or intended to affect the structure or any function of the body”. According to this definition, a software implementation of any signal processing or decision-support system should be considered as a medical device and should be subject to all the FDA medical device regulations and guidelines. The FDA groups medical devices into three classes based on the level of control necessary to assure the safety and effectiveness of the device [111]. These groups range from Class I, which has a minimal potential for harm to the user, to Class III, which needs premarket approval. Under the current FDA policy, , software devices are grouped depending on the purpose of application into stand-alone devices and components, parts, or accessories to a pre-existing medical device. The level of concern for a software device is determined by its inherent risk, the degree of involvement of human intervention, and the level of regulation of the parent device. The level of risk involved with a physiological monitoring system is mainly determined by the relevance of its outcomes to the diagnosis and control of patient status. A system that contains built-in decision rules and generates diagnostic suggestions, in the opinion of some experts, is a standalone Class III device [16]. However, if a system only removes artifacts, and is only intended to be used as a supplementary module in an existing physiological monitor, the system is subject to less restrictive regulations. A software medical device differs from regular medical devices in many ways. Software is often used to realize complex functions, and the implementation is often easy to change. It is difficult to test a software device and keep track of its changes. The FDA currently requires a substantial volume of documentation to be submitted and examined for a premarket notification of a software product. Now the FDA is also considering Software Quality Audits (SQA) and certification as a replacement for premarket notification [111], based on the belief that a well managed software life-circle has an effect on the integrity of the resulting software system. Satisfying independent third-party SQA standards, such as IEC 62034 [69] (to be recognized by FDA), may remove the obligation to describe the software processes 139 10.2. Future work: the road ahead in detail in regulatory submissions to the FDA [111]. In addition to the FDA regulations, there are other ethical issues, such as patients’ attitudes toward the application of a decision support system. Any issue that may influence patients, clinicians, or the clinical environment during the planning, development, validation, and application phases of a software device should be handled with extreme care. It has been almost half a century since the first wave of research on automatic physiological monitoring emerged in the 1960s. Over the years, many problems in physiological monitoring have been investigated and many solutions proposed to assist clinical decision making. The computational power of a bedside monitor has increased to a level that makes advanced signal processing possible in the clinical environment. Moreover, the ever-increasing healthcare burden has significantly increased awareness of the potential benefits of a clinical decision support system among clinicians, medical device manufacturers, and the government. For all these reasons, we optimistically anticipate the clinical application of more advanced physiological monitoring systems in the near future. 140 Bibliography [1] ISO/IEEE 11073 Committee. http://www.ieee1073.org. [2] Federal Food, Drug, and Cosmetic Act, as Amended, Sec. 201(h). U.S. Government Printing Office, Washington, DC, 1993. [3] A. K. Akobeng. Understanding diagnostic tests 3: receiver operating characteristic curves. Acta Paediatrica, 96(5):644–647, 2007. [4] R. Allen. Time series methods in the monitoring of intracranial pressure I: Problems, suggestion for a monitoring scheme and review of appropriate techniques. Journal of Biomedical Engineering, 5(1):5– 18, 1983. [5] G. Alterovitz, D. Staelin, and J. H. Philip. Temporal patient state characterization using Iterative Order and Noise (ION) estimation: applications to anesthesia patient monitoring. Journal of Clinical Monitoring and Computing, 17(5):471–486, Oct 2003. [6] J. M. Ansermino, J. Lim, G. A. Dumont, C. Brouse, P. Yang, D. Dunsmuir, J. Daniels, and S. Schwarz. Clinical decision support in physiological monitoring. International Journal of Biomedical Engineering and Technology. to appear. [7] J. M. Ansermino, P. Yang, G. A. Dumont, J. Lim, N. Pegram, and C. Ries. Detecting an increase in heart rate in children using an adaptive change point detection algorithm. In IARS 79th Clinical and Scientific Congress, Honolulu, Hawaii, Mar 2005. [8] R. K. Avent and J. D. Charlton. A critical review of trend-detection methodologies for biomedical monitoring systems. Critial Review in Biomedical Engineering, 17(6):621–659, 1990. [9] Y. Bar-Shalom and X. R. Li. Estimation and Tracking: Principles, Techniques, and Software. Artech House, 1993. 141 Chapter 10. Bibliography [10] G. A. Barnard. Control charts and stochastic processes. Journal of the Royal Statistical Society. Series B (Methodological), 21(2):239–271, 1959. [11] P. Barralon, G. Ng, G. Dumont, S. K. W. Schwarz, and M. Ansermino. Development and evaluation of multidimensional tactons for a wearable tactile display. In Proceedings of the 9th international conference on Human computer interaction with mobile devices and services, pages 186–189, Singapore, 2007. [12] M. Basseville. Detecting changes in signals and systems—a survey. Automatica, 24(3):309–326, 1988. [13] P. C. Beatty. Preliminary survey into clinical attitudes to computer based decision support systems. British Journal of Anaesthesia, 79:140–141, 1997. [14] K. Becker, B. Thull, H. K¨asmacher-Leidinger, J. Stemmer, G. Rau, G. Kalff, and H.-J. Zimmermann. Design and validation of an intelligent patient monitoring and alarm system based on a fuzzy logic process model. Artificial Intelligence in Medicine, 11(1):33–53, 1997. [15] I. A. Beinlich and D. M. Gaba. The ALARM monitoring systemintelligent decision making under uncertainty. Anesthesiology, 71(A):336, 1989. [16] E. S. Berner, K. J. Hannah, and M. J. Ball, editors. Clinical Decision Support Systems: Theory and Practice. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1998. [17] A. H. Briggs, A. E. Ades, and M. J. Price. Probabilistic sensitivity analysis for decision trees with multiple branches: Use of the dirichlet distribution in a Bayesian framework. Medical Decision Making, 23(4):341–350, Aug 2003. [18] C. Brouse, G. A. Dumont, F. J. Herrmann, and J. M. Ansermino. A wavelet approach to detecting electrocautery noise in the ecg. IEEE Engineering in Medicine and Biology Magazine, 25(4):76–82, Jul-Aug 2006. [19] C. Brouse, G. A. Dumont, P. Yang, J. Lim, and J. M. Ansermino. iassist: A software framework for intelligent patient monitoring. In Proceedings of the 29th Annual International Conference of the IEEE 142 Chapter 10. Bibliography Engineering in Medicine and Biology Society, pages 3790–3793, Aug 2007. [20] C. Burge and S. Karlin. Prediction of complete gene structures in human genomic DNA. Journal of molecular biology, 268(1):78–94, Apr 1997. [21] A. Burian and P. Kuosmanen. Tuning the smoothness of the recursive median filter. IEEE Transactions on Signal Processing, [see also IEEE Transactions on Acoustics, Speech, and Signal Processing], 50(7):1631–1639, Jul 2002. [22] D. Calvelo, M. C. Chambrin, D. Pomorski, and P. Ravaux. Towards symbolization using data-driven extraction of local trends for icu monitoring. Artificial Intelligence in Medicine, 19(3):203–223, 2000. [23] M.-C. Chambrin. Alarms in the intensive care unit: how can the number of false alarms be reduced? Critical Care, 5(4):184–188, 2001. [24] S. Charbonnier. On line extraction of temporal episodes from ICU high-frequency data: A visual support for signal interpretation. Computer Methods and Programs in Biomedicine, 78(2):115–132, 2005. [25] S. Charbonnier, G. Becq, and L. Biot. On-line segmentation algorithm for continuously monitored data in intensive care units. IEEE Transactions on Biomedical Engineering, 51(3):484–492, Mar 2004. [26] C. Chattfield. The Analysis of Time-Series, 5th ed. Chapman & Hall, Holden Day, San Francisco, 1996. [27] A. I. Cohn, S. Rosenbaum, M. Factor, and P. L. Miller. DYNASCENE: an approach to computer-based intelligent cardiovascular monitoring using sequential clinical scenes. Methods of Information in Medicine, 29(2):122–131, 1990. [28] E. Coiera, V. Tombs, and T. Clutton-Brock. Attentional overload as a fundamental cause of human error in monitoring. Technical report, Hewlett Packard Laboratories, 1996. [29] J. Cooper, R. Newbower, and R. Kitz. An analysis of major errors and equipment failures in anesthesia management: considerations for prevention and detection. Anesthesiology, 60:34–42, 1984. 143 Chapter 10. Bibliography [30] M. D. Coovert and K. McNelis. Determining the number of common factors in factor analysis — a review and program. Educational and Psychological Measurement., 48(3):687–692, 1988. [31] C. M. Crowe. Data reconciliation — progress and challenges. Journal of Process Control, 6:89–98, 1996. [32] M. Daumer and M. Falk. On-line change-point detection (for state space models) using multi-process Kalman filters. Linear Algebra and its Applications, 284(1–3):125–135, 1998. [33] J. G. De Gooijer and R. J. Hyndman. 25 years of time series forecasting. International Journal of Forecasting, 22(3):443–473, 2006. [34] S. Deligne. Interference of variable-length acoustic units for continuous speech recognition. In Proceeding of International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1731–1734, Munich, Germany, 1997. [35] L. Devroye. Non-uniform Random Variate Generation. SpringerVerlag, New York, 1986. [36] S. R. Dixon, C. D. Wickens, and J. S. McCarley. On the independence of compliance and reliance: Are automation false alarms worse than misses? Human Factors, 49(4):564–572, Aug 2007. [37] M. Dojat, F. Pachet, Z. Guessoum, D. Touchard, and A. Harf. NeoGanesh: A working system for the automated control of assisted ventilation in ICUs. Artificial Intelligence in Medicine, 11(2):97–117, Oct 1997. [38] S. Dreiseitl and M. Binder. Do physicians value decision support? a look at the effect of decision support systems on physician opinion. Artificial Intelligence in Medicine, 33(1):25–30, 2005. [39] D. Dunsmuir, J. Daniels, C. Brouse, S. Ford, and J. M. Ansermino. A knowledge authoring tool for clinical decision support. Journal Journal of Clinical Monitoring and Computing, 22(3):189–198, Jun 2008. [40] M. H. Ebrahim, J. M. Feldman, and I. Bar-Kana. A robust sensor fusion method for heart rate estimation. Journal of Clinical Monitoring and Computing, 13(6):385–393, 1997. 144 Chapter 10. Bibliography [41] Z. Elghazzawi. Method and apparatus for removing artifact from physiological signals. United States Patent, (5971930), Oct 1999. [42] J. Endresen and D. W. Hill. The present state of trend detection and prediction in patient monitoring. Intensive Care Medicine, 3:15–26, 1977. [43] M. R. Endsley. Design and evaluation for situation awareness enhancement. In Proceedings of the Human Factors Society 32nd Annual Meeting, Human Factors Society, pages 385–388, Sep 1988. [44] L. M. Fagan. VM: Representing Time-dependent Relations in a Medical Detting. PhD Thesis. Stanford University, Stanford, CA, USA, 1980. [45] R. Fried and M. Imhoff. Alarm algorithms in critical care monitoring. Biometrical Journal, 46(1):90–102, 2004. [46] N. J. Gallagher and G. Wise. A theoretical analysis of the properties of median filters. IEEE Transactions on Acoustics, Speech, and Signal Processing [see also IEEE Transactions on Signal Processing], 29(6):1136–1141, Dec 1981. [47] D. Garfinkel, P. Matsiras, T. Mavrides, J. McAdams, and S. Auckburg. Patient monitoring in the operating room: Validation of instrument reading by artificial intelligence methods. In Proceedings of 13th Symposium on Computer Applications in Medical Care, pages 575–579, 1989. [48] U. Gather, R. Fried, V. Lanius, and M. Imho. Online monitoring of high dimensional physiological time series - a case study. Estad´ıstica (Revista del instituto interamericano de estad´ıstica), 53:259–298, 2003. [49] X. Ge. Segmental Semi-Markov Models and Applications to Sequence Analysis. PhD Thesis. University of California, Irvine, Dec 2002. [50] G. Goodwin and K. Sin. Prediction and Control. Prentice-Hall, first edition, 1984. [51] K. Gordon. The multi-state Kalman filter in medical monitoring. Computer Methods and Programs in Biomedicine, 23:147–154, 1986. [52] I. J. Haimowitz. Knowledge-based trend detection and diagnosis. PhD Thesis. Massachusetts Institute of Technology, Cambridge, MA, USA, 1994. 145 Chapter 10. Bibliography [53] I. J. Haimowitz and I. S. Kohane. Managing temporal worlds for medical trend diagnosis. Artificial Intelligence in Medicine, 8(3):299– 321, 1996. [54] I. J. Haimowitz, P. P. Le, and I. S. Kohane. Clinical monitoring using regression-based trend templates. Artificial Intelligence in Medicine, 7(6):473–496, 1995. [55] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29– 36, 1982. [56] M. J. Harrison and C. W. Connor. Statistics-based alarms from sequential physiological measurements. Anaesthesia, 62(10):1015–1023, 2007. [57] P. J. Harrison and C. F. Stevens. Bayesian forecasting (with discussion). Journal of the Royal Statistical Society: Series B, 38(3):205–247, 1976. [58] M. J. Hayes and P. R. Smith. A new method for pulse oximetry possessing inherent insensitivity to artifact. IEEE Transactions on Biomedical Engineering, 48(4):452–461, Apr 2001. [59] D. J. Hitchings, M. J. Campbell, and D. E. Taylor. Trend detection of pseudo-random variables using a exponentially mapped past statistical approach: an adjunct to computer assisted monitoring. Journal of Clinical Monitoring, 6(2):73–88, Apr 1975. [60] S. Hoare. Automatic artifact identification in anesthesia patient record keeping: a comparison of techniques. Medical Engineering and Physics, 22(8):547–553, 2000. [61] C. C. Holt. Forecasting seasonals and trends by exponentially weighted moving averages (originally published in 1957). International Journal of Forecasting, 20(1):5–10, 2004. [62] C. E. Hope, C. D. Lewis, I. R. Perry, and A. Gamble. Computed trend analysis in automated patient monitoring systems. British Journal of Anaesthesia, 45(5):440–449, 1973. [63] M. Imhoff, M. Bauer, U. Gather, and D. Lohlein. Time series analysis in intensive care medicine. Applied Cardiopulmonary Pathophysiology, 6:203–281, 1997. 146 Chapter 10. Bibliography [64] A. Ismail. On Dual Control and Adaptive Kalman Filtering with Application in the Pulp and Paper Industry. PhD Thesis. Department of Electrical & Computer Engineering, The University of British Columbia, Vancouver, Canada, Sep 2001. [65] A. Jawad and J. G. Cundill. An expert system for the mechanical ventilator settings. IEE Colloquium on Intelligent Decision Support Systems and Medicine, pages 1–3, Jun 1992. [66] K. J¨ oreskog. Some contributions to maximum likelihood factor analysis. Psychometrika, 32(4):443–482, Dec 1967. [67] R. Jennrich and S. Robinson. A Newton-Raphson algorithm for maximum likelihood factor analysis. Psychometrika, 34(1):111–123, Mar 1969. [68] I. T. Jolliffe. Principal Component Analysis. Springer, second edition, Oct 2002. [69] P. Jordan. Standard IEC 62304 - medical device software - software lifecycle processes. Software for Medical Devices, 2006. The Institution of Engineering and Technology Seminar on, pages 41–47, Nov. 2006. [70] M. G. Kahn. Modeling time in medical decision-support programs. Medical Decision Making, 11(4):249–264, 1991. [71] H. Kaiser. The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23:187–200, 1958. [72] R. Kennedy. A modified Trigg’s Tracking Variable as an ‘advisory’ alarm during anesthesia. International Journal of Clinical Monitoring and Computing, 12:197–204, 1995. [73] G. Klein and R. Calderwood. Decision models: some lessons from the field. IEEE Transactions on Systems, Man and Cybernetics, 21(5):1018–1026, Sep/Oct 1991. [74] E. Koski, A. M¨akivirta, T. Sukuvaara, and A. Kari. Frequency and reliability of alarms in the monitoring of cardiac postoperative patients. Journal of Clinical Monitoring and Computing, 7(2):129–133, 1990. [75] C. A. Kulikowski. History and development of artificial intelligence methods for medical decision making. In J. D. Bronzino, editor, The Biomedical Engineering Handbook, pages 2681–2698. IEEE Press, 1995. 147 Chapter 10. Bibliography [76] J. Larssan and B. Hayes-Roth. Guardian: intelligent autonomous agent for medical monitoring and diagnosis. IEEE Intelligent Systems and Their Applications, 13(1):58–64, Jan/Feb 1998. [77] S. T. Lawless. Crying wolf: False alarms in a pediatric intensive care unit. Critical Care Medicine, 22(6):981–985, 1994. [78] A. Lowe, R. W. Jones, and M. J. Harrison. Temporal pattern matching using fuzzy templates. Journal of Intelligent Information Systems, 13(1-2):27–45, 1999. [79] I. MacGill, J. Cade, R. Siganporia, and J. Packer. VAD: ventilation management in the ICU. In Proceedings of Third Annual IEEE Symposium on Computer-Based Medical Systems, pages 345–349, Jun 1990. [80] W. H. Majoros, M. Pertea, A. L. Delche, and S. L. Salzberg. Efficient decoding algorithms for generalized hidden Markov model gene finders. BMC Bioinformatics, 6:16, 2005. [81] A. Makivirta, E. Koski, A. Kari, and T. Sukuvaara. The median filter as a preprocessor for a patient monitor limit alarm system in intensive care. Computer Methods and Programs in Biomedicine, 34(2-3):139– 144, 1991. [82] V. Manfredi, S. Mahadevan, and J. Kurose. Switching Kalman filters for prediction and tracking in an adaptive meteorological sensing network. In Proceeding of 2005 Second Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, pages 197–206, Santa Clara, California, USA, Sep 2005. [83] D. Maquin, O. Adrot, and J. Ragot. Data reconciliation with uncertain models. ISA Transactions, 39:35–45, Feb 2000. [84] B. H. McGhee and M. E. J. Bridges. Monitoring arterial blood pressure: What you may not know. Critical Care Nurse, 22(2):66–70, Apr 2002. [85] W. W. Melek, Z. Lu, A. Kapps, and W. D. Fraser. Comparison of trend detection algorithms in the analysis of physiological time-series data. IEEE Transactions on Biomedical Engineering, 52(4):639–651, Apr 2005. 148 Chapter 10. Bibliography [86] C. Meredith and J. Edworthy. Are there too many alarms in the intensive care unit? an overview of the problems. Journal of Advanced Nursing, 21(1):15–20, 1995. [87] G. Miller. The magic number seven plus or minus two: some limits on our capacity for processing information. Psychological Review, 63:81– 97, 1956. [88] P. Molenaar, J. Gooijer, and B. Schmitz. Dynamic factor analysis of nonstationary multivariate time series. Psychometrika, 57(3):333–349, Sep 1992. [89] M. K. Molyneux, A. K. McIndoe, A. T. Lovell, and P. K. T. Mok. Continuous auditory monitoring. How well can we estimate absolute and changing heart rates? Can this be improved. In International Meeting on Medical Simulation, Albuquerque, NM, USA, Jan 2004. [90] F. A. Mora, G. Passariello, G. Carrault, and J. P. Le Pichon. Intelligent patient monitoring and management systems: a review. IEEE Engineering in Medicine and Biology Magazine, 12(4):23–33, Dec 1993. [91] H. J. Murff, V. L. Patel, G. Hripcsak, and D. W. Bates. Detecting adverse events for patient safety research: a review of current methodologies. Journal of Biomedical Informatics, 36(1-2):131–143, 2003. [92] K. Murphy. Switching Kalman filter. Technical report, Compaq Cambridge Research Lab, Cambridge, MA, 1998. [93] K. Murphy. Hidden semi-Markov models (segment models). Technical report, MIT Artificial Intelligence Lab, Nov 2002. [94] I. J. Myung. The importance of complexity in model selection. Journal of Mathematical Psychology, 44(1):190–204, 2000. [95] M. J. Navabi, K. C. Mylrea, and R. C. Watt. Detection of false alarms using an integrated anesthesia monitor. In Proceedings of the 11th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, volume 6, pages 1774–1775, Nov 1989. [96] K. C. Ng and B. Abramson. Uncertainty management in expert systems. IEEE Expert [see also IEEE Intelligent Systems and Their Applications], 5(2):29–48, Apr 1990. 149 Chapter 10. Bibliography [97] J. Orr and D. Westenskow. A breathing circuit alarm system based on neural networks. Anesthesiology, 71(A):339, 1989. [98] M. Ostendorf, V. V. Digalakis, and O. A. Kimball. From HMM’s to segment models: A unified view of stochastic modeling for speech recognition. IEEE Transation of Speech and Audio Processing, 4(5):360–378, 1996. [99] J. Otterman. The properties and methods for computation of exponentially-mapped-past statistical variables. IEEE Transactions on Automatic Control [see also IRE Transactions on Automatic Control], 5:11–17, Jan 1960. [100] E. S. Page. Continuous inspection schemes. Biometrika, 41:100–115, 1954. [101] W. A. Perkins and A. Austin. Adding temporal reasoning to expertsystem-building environments. IEEE Expert: Intelligent Systems and Their Applications, 05(1):23–30, 1990. [102] R. R. Pitre, V. P. Jilkov, and X. R. Li. A comparative study of multiple-model algorithms for maneuvering target tracking. In I. Kadar, editor, Proceedings of Signal Processing, Sensor Fusion, and Target Recognition XIV, volume 5809, pages 549–560, May 2005. [103] C. Pott, A. Johnson, and F. Cnossen. Improving situation awareness in anaesthesiology. In Proceedings of the 2005 annual conference on European association of cognitive ergonomics, pages 255–263. University of Athens, 2005. [104] Press and Shigemasu. Bayesian inference in factor analysis. In L. J. Glesser, editor, Contributions to Probability and Statistics, New York, 1989. Springer. [105] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, Feb 1989. [106] N. Ramaux, D. Fontaine, and M. Dojat. Temporal scenario recognition for intelligent patient monitoring. In Proceedings of the 6th Conference on Artificial Intelligence in Medicine in Europe, pages 331–342, 1997. 150 Chapter 10. Bibliography [107] I. Rampil. Intelligent detection of artifact. In J. Gravenstein, editor, The automated anesthesia record and alarm systems, pages 175–190, Boston, 1987. Butterworths. [108] C. Rao. Estimation and tests of significance in factor analysis. Psychometrika, 20(2):93–111, Jun 1955. [109] G. Rau. Man-machine communication for monitoring, recording and decision support in an anesthesia information system. In Proceedings of the 9th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 427–428, Boston, MA, 1987. [110] A. C. Rencher. Multivariate Statistical Inference and Applications. Wiley Interscience, Dec 1997. [111] Rockville. FDA regulation of medical device software (background). In the FDA/NLM Software Policy Workshop, Sep 1996. [112] D. Rubin and D. Thayer. EM algorithms for ML factor analysis. Psychometrika, 47(1):69–76, Mar 1982. [113] A. Salatian and J. Hunter. Deriving trends in historical and real-time continuously sampled medical data. Journal of Intelligent Information Systems, 13(1-2):47–71, 1999. [114] T. Schecke, G. Rau, H. J. Popp, H. Kasmacher, G. Kalff, and H. J. Zimmermann. A knowledge-based approach to intelligent alarms in anesthesia. IEEE Engineering in Medicine and Biology Magazine, 10(4):38–44, Dec 1991. [115] Y. Shahar. A framework for knowledge-based temporal abstraction. Artificial Intelligence, 90(1-2):79–133, 1997. [116] Y. Shahar and M. A. Musen. RESUME: A temporal-abstraction system for patient monitoring. Computers and biomedical research, 26(3):255–273, 1993. [117] S. Sharshar, L. Allart, and M.-C. Chambrin. A new approach to the abstraction of monitoring data in intensive care. Lecture Notes in Computer Science, 3581:13–22, 2005. [118] E. H. Shortliffe. Computer-Based Medical Consultations: MYCIN. Elsevier, North-Holland, New York, 1976. 151 Chapter 10. Bibliography [119] R. H. Shumway and D. S. Stoffer. Dynamic linear models with switching. Journal of the American Statistical Association, 86(415):763–769, 1991. [120] T. Sibanda and N. Sibanda. The Cusum chart method as a tool for continuous monitoring of clinical outcomes using routinely collected data. BMC Medical Research Methodology, 7(1):46, 2007. [121] D. Sittig, M. Factor, and D. PSittig. Physiologic trend detection and artifact rejection: a parallel implementation of a multi-state Kalman filtering algorithm. Computer Methods and Programs in Biomedicine, 31:1–10, 1990. [122] D. F. Sittig, N. L. Pace, R. M. Gardner, E. Beck, and A. H. Morris. Implementation of a computerized patient advice system using the HELP clinical information system. Computers and Biomedical Research, 22(5):474–487, 1989. [123] D. A. Sitzman and R. M. Farrell. System and method for selecting physiological data from a plurality of physiological data sources. United States Patent, (6801802), Oct 2004. [124] A. Smith and M. West. Monitoring renal transplants: an application of the multiprocess Kalman filter. Biometrics, 39:867–878, 1983. [125] S. Snook and R. Gorsuch. Principal component analysis versus common factor analysis: A Monte Carlo study. Psychological Bulletin, 106(1):148–154, Jul 1989. [126] T. S¨oderstr¨om. Discrete-time Stochastic Systems: Estimation and Control. Springer, second edition, 2002. [127] Y. Sowb, R. Loch, and E. Roth. Cognitive modeling of intraoperative critical events. In IEEE International Conference on Systems, Man, and Cybernetics, volume 3, pages 2533–2538, Oct 1998. [128] D. Spiegelhalter, O. Grigg, R. Kinsman, and T. Treasure. Riskadjusted sequential probability ratio tests: applications to bristol, shipman and adult cardiac surgery. International Journal for Quality in Health Care, 15(1):7–13, 2003. [129] M. Stacey and C. McGregor. Temporal abstraction in intelligent clinical data analysis: A survey. Artificial Intelligence in Medicine, 39(1):1–24, 2007. 152 Chapter 10. Bibliography [130] F. Steimann. Diagnostic Monitoring of Clinical Time Series. PhD thesis. Technical University of Vienna, Austria, 1995. [131] K. Stoodley and M. Mirnia. The automatic detection of transients, step changes and slope changes in the monitoring of medical time series. The Statistician, 28:163–170, 1979. [132] G. Takla, J. H. Petre, D. J. Doyle, M. Horibe, and B. Gopakumaran. The problem of artifacts in patient monitor data during surgery: a clinical and methodological review. Anesthesia Analgesia, 103(5):1196– 1204, Nov 2006. [133] A. C. Tamhane and R. S. H. Mah. Data reconciliation and gross error detection in chemical process networks. Technometrics, 27(4):409–422, 1985. [134] S. Tani, M. Fujikake, K. Minamitani, and Y. Yamanaka. Electronic aid in routine observation in hospital ward. In Digest of Papers for Scientific Program, 6th International Conference on Medical Electronics and Biological Engineering, Aug 1965. [135] D. E. M. Taylor, J. S. Whamond, and D. J. Hitchings. A probabilistic approach to patient monitor alarm system. In MEDINFO’74, page 746, North-Gikkabdm Ansterdan, 1974. [136] S. Taylor-Adams, A. Brodie, and C. Vincent. Safety skills for clinicians: An essential component of patient safety. Journal of Patient Safety, 4(3), Sep 2008. [137] G. H. Thomson. The Factorial Analysis of Human Ability. London University Press, 1951. [138] J. Tinker, D. Dull, R. Caplan, R. Ward, and F. Cheney. Role of monitoring devices in prevention of anesthetic mishaps: a closed claims analysis. Anesthesiology, 71(4):541–546, 1989. [139] D. Trigg. Monitoring a forecasting system. Operational Research Quarterly, 15:271–274, 1964. [140] C. L. Tsien. TrendFinder: Automated Detection of Alarmable Trends. PhD Thesis. Massachusetts Institute of Technology, 2000. [141] C. L. Tsien and J. C. Fackler. Poor prognosis for existing monitors in the intensive care unit. Critical Care Medicine, 25(4):614–619, Apr 1997. 153 Chapter 10. Bibliography [142] N. S. Uckun. An Ontology for Model-based Reasoning in Physiological Domains. PhD Thesis. Vanderbilt University, Nashville, TN, USA, 1992. [143] N. S. Uckun. Artificial intelligence in medicine: state-of-the-art and future prospects. Artificial Intelligence in Medicine, 5(2):89–91, 1993. [144] S. Uckun. Intelligent systems in patient monitoring and therapy management. International Journal of Clinical Monitoring and Computing, 11:11–241, 1994. [145] S. Uckun and B. M. Dawant. Qualitative modeling as a paradigm for diagnosis and prediction in critical care environments. Artificial Intelligence in Medicine, 4(2):127–144, 1992. [146] O. Uluyol, X. Wang, and L. H. Tsoukalas. A connectionist approach to trend detection. In Proceedings of International Conference on Information Intelligence and Systems, pages 255–259, 1999. [147] W. F. Velicer and D. N. Jackson. Component analysis versus common factor analysis: Some further observations. Multivariate Behavioral Research, 25(1):97–114, 1990. [148] A. Wald. Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2):117–186, 1945. [149] K. Watanabe and S. G. Tzafestas. Generalized pseudo-Bayes estimation and detection for abruptly changing systems. Journal of Intelligent and Robotic Systems, 7(1):95–112, Febuary 1993. [150] C. K. Waterson. Development directions in integrated anesthesia monitoring. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, volume 4, pages 1765–1766, Nov 1988. [151] J. P. Welch and N. M. Sims. Network for portable patient monitoring devices. United States Patent, (5319363), Jun 1994. [152] D. Westenskow, J. Orr, F. Simon, H. Bender, and H. Frankenberger. Intelligent alarms reduce anesthesiologist’s response time to critical faults. Anesthesiology, 77(6):1074–1079, 1992. [153] J. Whittaker and S. Fruhwirth-Schnatter. A dynamic change point model for detecting the onset of growth in bacteriological infections. Applied Statistics, 43(4):625–640, 1994. 154 Chapter 10. Bibliography [154] C. K. I. Williams, J. Quinn, and N. McIntosh. Factorial switching Kalman filters for condition monitoring in neonatal intensive care. In Advances in Neural Information Processing Systems 18, pages 1513– 1520, Cambridge, MA, 2006. MIT Press. [155] M. Wright, J. Taekman, and M. Endsley. Objective measures of situation awareness in a simulated medical environment. Quality and Safty in Health Care, 13:65–71, October 2004. [156] W. Wu, M. Black, D. Mumford, E. Bienenstock, and J. Donoghue. Modeling and decoding motor cortical activity using a switching Kalman filter. IEEE Transation on Biomedical Engineering, 51(6):933–942, Jun 2004. [157] J. Yamron, L. Gillick, S. Knecht, S. Lowe, and P. van Mulbregt. Statistical models for tracking and detection, 2000. [158] P. Yang, G. Dumont, and J. M. Ansermino. An adaptive Cusum test based on a hidden semi-Markov model for change detection in noninvasive mean blood pressure trend. In Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 3395–3398, New York, USA, Aug 2006. [159] P. Yang, G. A. Dumont, and J. Ansermino. Online pattern recognition based on a generalized hidden Markov model for intraoperative vital sign monitoring. International Journal of Adaptive Control and Signal Processing, in press, 2008. [160] P. Yang, G. A. Dumont, and J. Ansermino. Sensor fusion using a hybrid median filter for artifact removal in intraoperative heart rate monitoring. Journal of Clinical Monitoring and Computing, 23(2):75– 83, Apr 2009. [161] P. Yang, G. A. Dumont, and J. M. Ansermino. Adaptive change detection in heart rate trend monitoring in anesthetized children. IEEE Transactions on Biomedical Engineering, 53(11):2211–2219, 2006. [162] P. Yang, G. A. Dumont, S. Ford, and J. M. Ansermino. Multivariate analysis in clinical monitoring: Detection of intraoperative hemorrhage and light anesthesia. In Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 6215–6218, Aug 2007. 155 Chapter 10. Bibliography [163] P. Yang, G. A. Dumont, J. Lim, and J. M. Ansermino. Adaptive change point detection for respiratory variables. In Proceedings of the 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Shanghai, China, Sep 2005. [164] D. Young and J. Griffiths. Clinical trials of monitoring in anesthesia, critical care and acute ward care: a review. British Journal of Anaesthesia, 97(1):39–45, 2006. [165] L. Zhou, H. Su, and J. Chu. A study of nonlinear dynamic data reconciliation. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pages 1360–1365, Netherlands, Oct 2004. [166] T. Zhou and R. Shumway. One-step approximations for detecting regime changes in the state space model with application to the influenza data. Computational Statistics and Data Analysis, 52(5):2277– 2291, 2008. [167] M. H. Zweig and G. Campbell. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39(4):561–577, 1993. 156 Appendices I Covariance estimation in the adaptive Kalman filter This appendix describes the Q-R estimation process used in the adaptive Kalman filter in Chapter 5. Refer to [64] for more details. Analytical formats of Q and R estimates The probability density of (x0...t , y0...t ) is: V (x0...t , y0...t |θ) = V (y0...t |x0...t , θ) · V (x0...t |θ) (I-1) where θ is (Q, R). Given the proposed DLM (5.1), the expectation of the log-likelihood E in Equation (5.7) becomes: t f =− t+1 2 log |R| − 2 log |Q| t 1 −1 − 2 k=0 tr{R (εk|t εTk|t + HVk|t H T )} − 21 tk=1 tr{Q−1 (αk|t + F αk−1|t F T )} + o (I-2) where o represents terms independent of Q and R, tr denotes the matrix trace, and ˆk|t x ˆTk|t + Vk|t , αk|t = x βk|t = x ˆk|t x ˆTk−1|t + V(k−1,k)|t , (I-3) ε = y − H x ˆk|t k k|t with x ˆk|t and Vk|t denoting the state estimate and the estimation error covariance of xk , both conditional upon y0...t (k t); V(k−1,k)|t denoting E{(xk−1 − xk−1|t )(xk − xk|t )}. Maximizing f with respect to Q and R, we get the analytical answer of the Q and R estimates: ˆ t = 1 t (αk|t + F αk−1|t F T − βk|t F T − F β T ) Q k=1 k|t t Lk|t (I-4) t T ˆt = 1 (ε ε + HV HT ) R k|t k|t k=0 k|t t+1 Vk|t 157 I. Covariance estimation in the adaptive Kalman filter Recursive update of Q and R estimates Substituting the recursive realization of x ˆk|t , Vk|t and V(k−1,k)|t into Equation (I-3) yields: αk|t = αk|t−1 + Wk|t β = βk|t−1 + Uk|t (I-5) k|t εk|t = εk|t−1 − HKt,k|t−1 εt|t−1 where Wk|t = M (t, k, k) and Uk|t = M (t, k, k−1), with M (t, k, m) = Ka (t, k)εt|t−1 εTt|t−1 Ka (m, t)T − Va (t, k)H T Ka (m, t)T + Ka (t, k)εt|t−1 x ˆTm|t−1 + x ˆk|t−1 εTt|t−1 Ka (m, t)T − Va (t, k)H T Ka (m, t)T Using Equation (I-4) we get: ˆ t−1 + Lt|t + t−1 γk|t } ˆ t = 1 {(t − 1) · Q Q k=1 t ˆ t = 1 {t · R ˆ t−1 + Vt|t + t−1 χk|t } R k=0 t+1 (I-6) where γk|t = HKa (t, k)εt|t−1 εTt|t−1 Ka (t, k)T H T − εk|t−1 εTt|t−1 Ka (t, k)T H T − HKa (t, k)εt|t−1 εTk|t−1 − HVa (t, k)H T Ka (t, k)T H T and χk|t = Wk|t + F Wk−1|t F T − Uk|t F T − F U T . Since γ and χ vanish quickly as k moves away from t, a window T with a size of 2-3 times the dominant time constant of the Kalman filter is applied to finally get Equation (5.8). Proof of Equation (5.5) Consider an augmented state vector [xt pt qt ]T for t > k, where pt = xk , qt = xk−1 . The system model is: xt+1 F 00 xt I pt+1 = 0 I0 pt + 0 ζt (I-7) qt+1 0 0I qt 0 The prediction error covariance matrix of the augmented state vector is derived in a similar manner to the standard Kalman filter. One of the p,q matrix elements, Vt+1|t , which indicates the cross-covariance of pˆt+1|t and qˆt+1|t , is recursively updated by: p,q p,q x,p T T Vt+1|t = Vt|t−1 − [Vt|t−1 ] H [Ktx,q ]T (I-8) x,p T Let Va (t, k) represent [Vt|t−1 ] and Ka (k−1, t) represent [Ktx,q ]. As Vk,k−1|t = p,q Vt+1|t , Equation (5.5) is confirmed. Equations in (5.6) follow the standard Kalman filter iteration. 158 II. List of physiological variables II List of physiological variables Variable Unit Description HR HRecg HRSpO2 HRabp NIBPmean NIBPdia NIBPsys MAP BPsys BPdia MCVP SpO2 SvO2 MVexp RR(CO2 ) MVinsp TVexp TVinsp Ppeak EtCO2 (FeCO2 ) EtO2 (FeO2 ) FeAA FeN2 O FeN2 FiO2 FiAA FiN2 O FiN2 beats/min beats/min beats/min beats/min mmHg mmHg mmHg mmHg mmHg mmHg mmHg % % ml breaths/min ml ml ml mmHg % % % % % % % % % Number of heart beats per minute HR from electrocardiogram monitor HR from pulse oximeter HR from arterial blood monitor Noninvasive mean blood pressure Noninvasive diastolic blood pressure Noninvasive systolic blood pressure Invasive mean arterial blood pressure Invasive systolic arterial blood pressure Invasive diastolic arterial blood pressure Mean central venous blood pressure Oxygen saturation of arterial blood Mixed venous oxygen saturation End tidal minute volume Respiratory rate from capnograph Inspiratory minute volume End tidal volume Inspiratory tidal volume Peak airway pressure End tidal concentration of carbon dioxide End tidal concentration of oxygen End tidal concentration of anesthetic agent End tidal concentration of nitrous oxide End tidal concentration of nitrogen Inspiratory concentration of oxygen Inspiratory concentration of anesthetic agent Inspiratory concentration of nitrous oxide Inspiratory concentration of nitrogen 159
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Adaptive trend change detection and pattern recognition...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Adaptive trend change detection and pattern recognition in physiological monitoring Yang, Ping 2009
pdf
Page Metadata
Item Metadata
Title | Adaptive trend change detection and pattern recognition in physiological monitoring |
Creator |
Yang, Ping |
Publisher | University of British Columbia |
Date Issued | 2009 |
Description | Advances in monitoring technology have resulted in the collection of a vast amount of data that exceeds the simultaneous surveillance capabilities of expert clinicians in the clinical environment. To facilitate the clinical decision-making process, this thesis solves two fundamental problems in physiological monitoring: signal estimation and trend-pattern recognition. The general approach is to transform changes in different trend features to nonzero level-shifts by calculating the model-based forecast residuals and then to apply a statistical test or Bayesian approach on the residuals to detect changes. The EWMA-Cusum method describes a signal as the exponentially moving weighted average (EWMA) of historical data. This method is simple, robust, and applicable to most variables. The method based on the Dynamic Linear Model (refereed to as Adaptive-DLM method) describes a signal using the linear growth model combined with an EWMA model. An adaptive Kalman filter is used to estimate the second-order characteristics and adjust the change-detection process online. The Adaptive-DLM method is designed for monitoring variables measured at a high sampling rate. To address the intraoperative variability in variables measured at a low sampling rate, a generalized hidden Markov model is used to classify trend changes into different patterns and to describe the transition between these patterns as a first-order Markov-chain process. Trend patterns are recognized online with a quantitative evaluation of the occurrence probability. In addition to the univariate methods, a test statistic based on Factor Analysis is also proposed to investigate the inver-variable relationship and to reveal subtle clinical events. A novel hybrid median filter is also proposed to fuse heart-rate measurements from the ECG monitor, pulse oximeter, and arterial BP monitor to obtain accurate estimates of HR in the presence of artifacts. These methods have been tested using simulated and clinical data. The EWMA-Cusum and Adaptive-DLM methods have been implemented in a software system iAssist and evaluated by clinicians in the operating room. The results demonstrate that the proposed methods can effectively detect trend changes and assist clinicians in tracking the physiological state of a patient during surgery. |
Extent | 2005198 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-06-11 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
IsShownAt | 10.14288/1.0067275 |
URI | http://hdl.handle.net/2429/8932 |
Degree |
Doctor of Philosophy - PhD |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2009-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2009_fall_yang_ping.pdf [ 1.91MB ]
- Metadata
- JSON: 24-1.0067275.json
- JSON-LD: 24-1.0067275-ld.json
- RDF/XML (Pretty): 24-1.0067275-rdf.xml
- RDF/JSON: 24-1.0067275-rdf.json
- Turtle: 24-1.0067275-turtle.txt
- N-Triples: 24-1.0067275-rdf-ntriples.txt
- Original Record: 24-1.0067275-source.json
- Full Text
- 24-1.0067275-fulltext.txt
- Citation
- 24-1.0067275.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0067275/manifest