Modeling Dependencies in Multivariate Data

by Aline Talhouk

B.A., Concordia University, 1997
M.Sc., The University of British Columbia, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2013

© Aline Talhouk 2013

Abstract

In multivariate regression, researchers are interested in modeling a correlated multivariate response variable as a function of covariates. The response of interest can be multidimensional, and the correlation between the elements of the multivariate response can be very complex. In many applications, the association between the elements of the multivariate response is typically treated as a nuisance parameter. The focus is on estimating the regression coefficients efficiently, in order to study the average change in the mean response as a function of predictors. However, in many cases, the estimation of the covariance and, where applicable, the temporal dynamics of the multidimensional response is the main interest, as is the case in finance, for example. Moreover, the correct specification of the covariance matrix is important for the efficient estimation of the regression coefficients. These complex models usually involve some parameters that are static and some that are dynamic. Until recently, the simultaneous estimation of dynamic and static parameters in the same model has been difficult. The introduction of particle MCMC algorithms by Andrieu and Doucet (2002) has made it possible to consider such models. In this thesis, we propose a general framework for jointly estimating the covariance matrix of multivariate data as well as the regression coefficients. This is done under different settings, for different dimensions and measurement scales.

Preface

This work builds on work developed during my UBC Master's thesis (Tabet, 2007). In particular, the work concerning binary data and the inference of static correlation matrices in Chapters 3 and 4 extends the Master's thesis work. More specifically, in Chapter 3, we extend the work done in Tabet (2007) to include model selection, where a structure for the inverse correlation matrix is no longer assumed; rather, we estimate the structure directly from the data along with the other parameters. Another major change with respect to the Master's thesis arises in Chapter 4. The prior on the regression coefficients β that had previously been suggested in Tabet (2007) results in a posterior that depends on D, the expansion parameter. This situation is corrected in Chapter 4 by taking a conjugate prior for β that is centered at 0 and has a covariance matrix that depends on the correlation matrix R. Furthermore, the regression coefficients are sampled in the augmented space, and therefore we are able to integrate out β locally. Parts of the content of Chapters 3 and 4 were done in collaboration with, and under the supervision of, Dr. Kevin Murphy and Dr. Arnaud Doucet, and were published in the Journal of Computational and Graphical Statistics (Talhouk et al., 2012). The heart rate variability data example discussed in Chapters 1 and 5 is based on physiological data obtained from the Pediatric Anesthesia Research Team (PART) at the British Columbia Children's Hospital. This study was approved by the University of British Columbia and Children's & Women's Health Centre of British Columbia Research Ethics Board (CREB H09-00905).
Parents of all subjects provided written informed consent to participate in the study. More details about the study protocol and data collection are provided in Chris J. Brouse’s Ph.D dissertation (Brouse, 2013). iii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . xx Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2.1 Heart Rate Variability as an Indicator of Nociception 3 1.2.2 Stock Market Return of Technology Firms . . . . . . . 5 1.2.3 Women and Mathematics . . . . . . . . . . . . . . . . 8 1.2.4 Six Cities Data . . . . . . . . . . . . . . . . . . . . . . 8 1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . 9 2 Time-Varying Covariance . . . . . . . . . . . . . . . . . . . . 12 2.1 The Wishart Prior . . . . . . . . . . . . . . . . . . . . . . . . 14 2.1.1 Dynamic Modeling of the Covariance in the Literature 15 iv Table of Contents 2.1.2 The Inverse Wishart . . . . . . . . . . . . . . . . . . . 15 2.1.3 The Dynamic Linear Model . . . . . . . . . . . . . . . 16 2.1.4 The Wishart Autoregressive Process . . . . . . . . . . 17 2.1.5 The Hyperparameters of the WAR Prior . . . . . . . . 20 2.2 Bayesian Estimation with the WAR . . . . . . . . . . . . . . 24 2.3 Sampling using Sequential Monte Carlo (SMC) . . . . . . . . 24 2.3.1 Computation of the Marginal Likelihood . . . . . . . . 27 2.4 SMC Algorithm for the Wishart Autoregressive Process . . . 28 2.4.1 The Non-central Wishart Prior as Importance Distri- bution . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.4.2 A Central Wishart as Importance Distribution . . . . 29 2.4.3 Numerical Experiments using SMC . . . . . . . . . . . 33 2.5 Estimating Fixed Parameters . . . . . . . . . . . . . . . . . . 37 2.6 Particle MCMC for Fixed Parameter Estimation . . . . . . . 42 2.6.1 The Particle Independent Metropolis Hastings Algo- rithm (PIMH) . . . . . . . . . . . . . . . . . . . . . . 42 2.6.2 The Particle Marginal Metropolis Hastings Algorithm (PMMH) . . . . . . . . . . . . . . . . . . . . . . . . . 46 2.6.3 Estimating Both M and S . . . . . . . . . . . . . . . . 55 2.7 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . 59 3 Static Correlation . . . . . . . . . . . . . . . . . . . . . . . . 66 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.2 Conditional Independence . . . . . . . . . . . . . . . . . . . . 67 3.3 Gaussian Graphical Models . . . . . . . . . . . . . . . . . . . 68 3.4 Some Graph Theory . . . . . . . . . . . . . . . . . . . . . . . 68 3.5 Bayesian Estimation in Gaussian Graphical Models . . . . . . 71 3.5.1 Prior and Posterior for Covariance Matrices . . . . . . 72 3.5.2 Covariance Selection and Model Determination . . . . 75 3.6 Bayesian Estimation for Binary Data . . . . . . . . . . . . . . 
77 3.6.1 Probit Model with a Structured Correlation Matrix . 81 3.7 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 84 3.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 v Table of Contents 4 Estimating the Effects of Covariates . . . . . . . . . . . . . 88 4.1 Multivariate Model, Static Covariance . . . . . . . . . . . . . 89 4.1.1 Continuous Response . . . . . . . . . . . . . . . . . . 89 4.1.2 Binary Response . . . . . . . . . . . . . . . . . . . . . 91 4.2 Dynamic Covariance . . . . . . . . . . . . . . . . . . . . . . . 94 4.2.1 Prior Specification for (β,Σ−11:T ) . . . . . . . . . . . . . 94 4.2.2 Posterior and Sampling Scheme . . . . . . . . . . . . . 95 4.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 99 4.4 Summary and Discussion . . . . . . . . . . . . . . . . . . . . 102 5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 5.1 Heart Rate Variability as an Indicator of Nociception . . . . . 105 5.2 Stock Market Returns of Technology Firms . . . . . . . . . . 112 5.3 “Women and Mathematics” Data Set . . . . . . . . . . . . . . 114 5.4 Six Cities Data Set . . . . . . . . . . . . . . . . . . . . . . . . 116 6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 6.2.1 Numerical Evaluation of the Hypergeometric Function 123 6.2.2 Better Proposals for Static Parameters . . . . . . . . . 123 6.2.3 Improving Speed of Mixing and Avoiding Degeneracy 124 6.2.4 Other Possible Extensions . . . . . . . . . . . . . . . . 124 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Appendix A Distributions . . . . . . . . . . . . . . . . . . . . . 130 A.1 The Gaussian Distribution . . . . . . . . . . . . . . . . . . . . 130 A.1.1 The Univariate Representation . . . . . . . . . . . . . 130 A.1.2 Multivariate and Matrix-Variate Gaussian Distribution 131 A.2 The χ2 Distribution . . . . . . . . . . . . . . . . . . . . . . . 132 A.2.1 The Central χ2 . . . . . . . . . . . . . . . . . . . . . . 132 A.2.2 The Non-central χ2 . . . . . . . . . . . . . . . . . . . 133 A.3 The Wishart Distribution . . . . . . . . . . . . . . . . . . . . 134 vi Table of Contents Appendix B Conjugate analysis . . . . . . . . . . . . . . . . . . 140 B.1 Normal Wishart . . . . . . . . . . . . . . . . . . . . . . . . . 140 B.2 Normal Wishart with Covariates . . . . . . . . . . . . . . . . 141 Appendix C Heart Rate Data Results . . . . . . . . . . . . . . 144 C.1 Bolus Event 1 Data Results . . . . . . . . . . . . . . . . . . . 144 C.2 Bolus Event 2 Data Results . . . . . . . . . . . . . . . . . . . 145 C.3 Bolus Event 3 Data Results . . . . . . . . . . . . . . . . . . . 147 C.4 Bolus Event 4 Data Results . . . . . . . . . . . . . . . . . . . 148 C.5 Bolus Event 5 Data Results . . . . . . . . . . . . . . . . . . . 150 C.6 Bolus Event 6 Data Results . . . . . . . . . . . . . . . . . . . 151 C.7 Bolus Event 7 Data Results . . . . . . . . . . . . . . . . . . . 153 C.8 Bolus Event 8 Data Results . . . . . . . . . . . . . . . . . . . 154 C.9 Bolus Event 9 Data Results . . . . . . . . . . . . . . . . . . . 156 C.10 Bolus Event 10 Data Results . . . . . . . . . . . . . . . . . . 157 C.11 Bolus Event 11 Data Results . . . . . . . . . . . . . . . . . . 159 C.12 Bolus Event 12 Data Results . . . . . . . . . . . . . . . . . . 160 C.13 Bolus Event 13 Data Results . . . . . . 
. . . . . . . . . . . . 162 C.14 Bolus Event 14 Data Results . . . . . . . . . . . . . . . . . . 163 C.15 Bolus Event 15 Data Results . . . . . . . . . . . . . . . . . . 165 C.16 Bolus Event 16 Data Results . . . . . . . . . . . . . . . . . . 166 C.17 Bolus Event 17 Data Results . . . . . . . . . . . . . . . . . . 168 C.18 Bolus Event 18 Data Results . . . . . . . . . . . . . . . . . . 169 C.19 Bolus Event 19 Data Results . . . . . . . . . . . . . . . . . . 171 C.20 Bolus Event 20 Data Results . . . . . . . . . . . . . . . . . . 172 C.21 Bolus Event 21 Data Results . . . . . . . . . . . . . . . . . . 174 C.22 Bolus Event 22 Data Results . . . . . . . . . . . . . . . . . . 175 C.23 Bolus Event 23 Data Results . . . . . . . . . . . . . . . . . . 177 C.24 Bolus Event 24 Data Results . . . . . . . . . . . . . . . . . . 178 C.25 Bolus Event 25 Data Results . . . . . . . . . . . . . . . . . . 180 C.26 Bolus Event 26 Data Results . . . . . . . . . . . . . . . . . . 181 C.27 Bolus Event 27 Data Results . . . . . . . . . . . . . . . . . . 183 C.28 Bolus Event 28 Data Results . . . . . . . . . . . . . . . . . . 184 vii Table of Contents C.29 Bolus Event 29 Data Results . . . . . . . . . . . . . . . . . . 186 C.30 Bolus Event 30 Data Results . . . . . . . . . . . . . . . . . . 187 viii List of Tables 1.1 Different types of multivariate data . . . . . . . . . . . . . . . 3 1.2 Summary statistics of technology stock log-returns . . . . . . 7 1.3 Breakdown of wheezing cases by age group and smoking sta- tus of mother at onset. The percentages in parentheses are given with respect to each age group. . . . . . . . . . . . . . . 9 2.1 Posterior median of ρ obtained for different values of c and K. 49 2.2 Time to run different algorithms, in minutes. . . . . . . . . . 62 5.1 Table of edge marginal probabilities comparing results ob- tained by the algorithm developed in this paper and those by MC3 algorithm as outlined in Tarantola (2004) . . . . . . . . 116 5.2 Breakdown of wheezing cases by age group and smoking sta- tus of mother at onset. The percentages in brackets are per- centages with respect to each age group. . . . . . . . . . . . . 117 5.3 Posterior means of the parameters of interest. The first two columns correspond to using the saturated model for the Jef- freys’s prior of Lawrence et al. (2008) and the marginally uni- form prior. The last column corresponds to using a marginally uniform prior and performing model selection. . . . . . . . . . 118 5.4 Posterior predictive statistical checks with 95% intervals and p-values, indicating that the posterior predictive distribution of the proposed model is not statistically different from the observed data . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 ix List of Figures 1.1 A diagram depicting the multivariate longitudal response with n rows, J columns and over T time points. . . . . . . . . . . 1 1.2 The brain perceives pain through the nociception system, which prompts increased activity in the autonomic nervous system (ANS). In turn, the ANS activates the sympathetic nervous system (SNS), creating a stress response in the body that includes increased respiration, blood pressure and heart rate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Heart rate data for a 6 year old child undergoing a dental procedure. The child starts out in a nociceptive state. An anasthetic bolus is administered and the state of the child returns to normal. 
The top line is the raw measured heart rate and the bottom line is the filtered heart rate, with the mean removed. . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 A plot of the weekly return of Google, Microsoft, IBM and Apple stock for the period of September 1 2004 to February 1st, 2012 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.5 A graphical model representation of the multivariate longitu- dinal model which includes covariate information and covari- ate effects parameter. Observed variables are shaded, unob- served variables are clear. . . . . . . . . . . . . . . . . . . . . 11 2.1 Graphical representation of the multivariate longitudinal data. The observed variables are shaded, and the latent (unob- served) variables are not. Conditional on the precision ma- trices, the observed data are independent over time. . . . . . 13 x List of Figures 2.2 Precision simulated from the prior using different values of the hyper-parameters ρ, S and d. . . . . . . . . . . . . . . . . 22 2.3 Time series plots showing the effect of changing ρ on the marginal elements of Σ−1 in the bivariate case (J = 2). . . . . 23 2.4 Effective sample size and the mean squared error obtained by increasing the number of particles K. . . . . . . . . . . . . . . 35 2.5 Effective sample size and the mean squared error obtained by increasing the number of time steps T . . . . . . . . . . . . . 36 2.6 Effective sample size and the mean squared error obtained by increasing the number of observations n. . . . . . . . . . . . . 37 2.7 Effective sample size and the mean squared error obtained by increasing the number of observations n. . . . . . . . . . . . . 38 2.8 Marginal likelihood for different values of underlying ρ in varying dimensions, with T = 200 time steps. . . . . . . . . . 40 2.9 Error in the marginal likelihood for different values of under- lying ρ in varying dimensions, with T = 200 time steps. . . . 41 2.10 PIMH Simulation results showing the acceptance probability, the ESS and marginal liklihood as a function of Particles, MC iterations and number of times steps. . . . . . . . . . . . 45 2.11 The posterior traceplots of the parameter ρ obtained using the PMMH algorithm with different K and increasing c. . . . 49 2.12 Plot of the marginal likelihood obtained from running SMC with different values of ρ with K = 1000 particles. The max- imum is reached at ρ=0.83. . . . . . . . . . . . . . . . . . . . 50 2.13 Plot of the posterior estimate of ρ obtained using pMCMC with c = 0.2 and K = 1000. . . . . . . . . . . . . . . . . . . . 50 2.14 Plot of the true correlation and the median estimated corre- lation along with a 95% credible interval, obtained using the PMMH algorithm estimating M and fixing S. . . . . . . . . 52 2.15 Average entropy loss for Σ−1 computed over 30 independent experiments under a fixed ρ and using the PMMH algorithm. 54 2.16 Acceptance probability of PMMH on 30 different data sets, generated with different true value of ρ. . . . . . . . . . . . . 55 xi List of Figures 2.17 The posterior traceplots of the parameter ρ and s2 obtained using the PMMH algorithm with a fixed K = 2000 and si- multaneously increasing c1 and c2. . . . . . . . . . . . . . . . 57 2.18 The posterior traceplots of the parameter ρ and s2 obtained using the PMMH algorithm for c1 = c2 = 0.2. . . . . . . . . . 59 2.19 Posterior distribution of the estimated precision in the uni- variate case (J = 1) with 95% credible interval using the different algorithms. . . . . . . . . . . . . . . . . . . . . 
. . . 61 2.20 Plot of the true correlation and the median estimated corre- lation along with a 95% credible interval, obtained using the SMC algorithm with the Non-central Proposal. . . . . . . . . 63 2.21 Plot of the true correlation and the median estimated corre- lation along with a 95% credible interval, obtained using the PMMH algorithm estimating M and fixing S. . . . . . . . . 64 2.22 Plot of the true correlation and the median estimated corre- lation along with a 95% credible interval, obtained using the PMMH algorithm estimating both M and S. . . . . . . . . . 65 3.1 A decomposable graph and its junction tree decomposition. Each node of the junction tree represents a clique while ver- tices shared by adjacent nodes of the tree define the separators. 71 3.2 An example of two different graph structures, resulting in a fully connected precision in 3.2(a) versus a sparse precision matrix in 3.2(b). . . . . . . . . . . . . . . . . . . . . . . . . . 72 3.3 A graphical representation of the multivariate model with J = 3. In this model Y represents the observed binary data and Z represents the latent correlated Gaussian data. . . . . . . . 78 3.4 Most probable graphical structures and estimated posterior probabilities for simulated data. . . . . . . . . . . . . . . . . . 85 xii List of Figures 3.5 Matrix of posterior edge marginal probabilities, simulated data. Here the size of the square is indicative of the magni- tude of the posterior edge marginal probability and therefore the larger the square the higher the probability that an edge exists between the corresponding nodes. . . . . . . . . . . . . 86 3.6 Boxplot of the entropy loss over 50 datasets obtained using a fixed saturated correlation matrix (left) versus model selec- tion (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 4.1 Example of ancestral lineages generated by an SMC algo- rithm for K = 5 and T = 3. The shaded path is Σ −1(2) 1:3 = (Σ −1(3) 1 ,Σ −1(4) 2 ,Σ −1(2) 3 ) and the ancestral lineage is B (2) 1:3 = (3, 4, 2) (example adapted from Andrieu et al. (2010)) . . . . 97 4.2 Posterior estimates of β using Particle Gibbs . . . . . . . . . 101 4.3 Plot of the true correlation and the median estimated corre- lation along with a 95% credible interval, obtained using the PG algorithm with conditional SMC. . . . . . . . . . . . . . 103 4.4 Entropy loss computed at each time step and plotted against time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 5.1 Results for Bolus Event 1 . . . . . . . . . . . . . . . . . . . . 108 5.2 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 1. . . . . . . . . . 109 5.3 Results for Bolus Event 2 . . . . . . . . . . . . . . . . . . . . 110 5.4 Results for Bolus Event 4 . . . . . . . . . . . . . . . . . . . . 111 5.5 Posterior distribution of ρ obtained from running the PMMH algorithm on the four technology stock problem . . . . . . . . 113 5.6 The marginal likelihood evaluated and plotted for different scalar values for ρ on a grid. The maximum is reached at ρ = 1114 5.7 Most probable graphical structures and estimated posterior probabilities for the “Women and Mathematics” data set. . . 115 xiii List of Figures 5.8 Autocorrelation function for the marginal correlation parame- ters obtained from the Six Cities example under the saturated assumption. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5.9 The four most probable graph structures, with their associ- ated posterior probability . . . 
. . . . . . . . . . . . . . . . . 120 C.1 Results for bolus event 1 . . . . . . . . . . . . . . . . . . . . . 144 C.2 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 1. . . . . . . . . . 145 C.3 Results for Bolus Event 2 . . . . . . . . . . . . . . . . . . . . 145 C.4 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 2. . . . . . . . . . 146 C.5 Results for bolus event 3 . . . . . . . . . . . . . . . . . . . . . 147 C.6 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 3. . . . . . . . . . 148 C.7 Results for bolus event 4 . . . . . . . . . . . . . . . . . . . . . 148 C.8 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 4. . . . . . . . . . 149 C.9 Results for bolus event 5 . . . . . . . . . . . . . . . . . . . . . 150 C.10 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 5. . . . . . . . . . 151 C.11 Results for bolus event 6 . . . . . . . . . . . . . . . . . . . . . 151 C.12 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 6. . . . . . . . . . 152 C.13 Results for bolus event 7 . . . . . . . . . . . . . . . . . . . . . 153 C.14 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 7. . . . . . . . . . 154 C.15 Results for bolus event 8 . . . . . . . . . . . . . . . . . . . . . 154 C.16 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 8. . . . . . . . . . 155 C.17 Results for bolus event 9 . . . . . . . . . . . . . . . . . . . . . 156 C.18 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 9. . . . . . . . . . 157 xiv List of Figures C.19 Results for bolus event 10 . . . . . . . . . . . . . . . . . . . . 157 C.20 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 10. . . . . . . . . 158 C.21 Results for bolus event 11 . . . . . . . . . . . . . . . . . . . . 159 C.22 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 11. . . . . . . . . 160 C.23 Results for bolus event 12 . . . . . . . . . . . . . . . . . . . . 160 C.24 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 12. . . . . . . . . 161 C.25 Results for bolus event 13 . . . . . . . . . . . . . . . . . . . . 162 C.26 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 13. . . . . . . . . 163 C.27 Results for bolus event 14 . . . . . . . . . . . . . . . . . . . . 163 C.28 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 14. . . . . . . . . 164 C.29 Results for bolus event 15 . . . . . . . . . . . . . . . . . . . . 165 C.30 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 15. . . . . . . . . 166 C.31 Results for bolus event 16 . . . . . . . . . . . . . . . . . . . . 166 C.32 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 16. . . . . . . . . 167 C.33 Results for bolus event 17 . . . . . . . . . . . . . . . . . . . . 
168 C.34 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 17 . . . . . . . . 169 C.35 Results for bolus event 18 . . . . . . . . . . . . . . . . . . . . 169 C.36 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 18. . . . . . . . . 170 C.37 Results for bolus event 19 . . . . . . . . . . . . . . . . . . . . 171 C.38 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 19. . . . . . . . . 172 C.39 Results for bolus event 20 . . . . . . . . . . . . . . . . . . . . 172 C.40 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 20. . . . . . . . . 173 xv List of Figures C.41 Results for bolus event 21 . . . . . . . . . . . . . . . . . . . . 174 C.42 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 21. . . . . . . . . 175 C.43 Results for bolus event 22 . . . . . . . . . . . . . . . . . . . . 175 C.44 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 22. . . . . . . . . 176 C.45 Results for bolus event 23 . . . . . . . . . . . . . . . . . . . . 177 C.46 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 23. . . . . . . . . 178 C.47 Results for bolus event 24 . . . . . . . . . . . . . . . . . . . . 178 C.48 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 24. . . . . . . . . 179 C.49 Results for bolus event 25 . . . . . . . . . . . . . . . . . . . . 180 C.50 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 25. . . . . . . . . 181 C.51 Results for bolus event 26 . . . . . . . . . . . . . . . . . . . . 181 C.52 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 26. . . . . . . . . 182 C.53 Results for bolus event 27 . . . . . . . . . . . . . . . . . . . . 183 C.54 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 27. . . . . . . . . 184 C.55 Results for bolus event 28 . . . . . . . . . . . . . . . . . . . . 184 C.56 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 28. . . . . . . . . 185 C.57 Results for bolus event 29 . . . . . . . . . . . . . . . . . . . . 186 C.58 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 29. . . . . . . . . 187 C.59 Results for bolus event 30 . . . . . . . . . . . . . . . . . . . . 187 C.60 Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 30. . . . . . . . . 188 xvi Glossary Random and Observed Variables • V : a response variable of interest. It also may refer to a vertex in a graphical model. • Z: a continuous random variable. • Y : a binary 0/1 random variable. • X is a n× p design matrix of covariates observed without error. Parameters • β: is a p× J matrix of regression coefficients. 
• µ: is a notation to refer to the mean parameter • Σ: is a sequence of J × J covariance matrices of length T • S: scale parameter of the Wishart distribution • M : autoregressive parameter • u: latent Gaussian variable • d: degrees of freedom of the Wishart distribution • G: a graphical model, a collection of vertices xvii Glossary Indices • n : number of observations indexed by i = 1, . . . , n. • J : number of dependent variables indexed by j = 1, . . . , J . • T : number of time intervals that may or may not be equally spaced indexed by t = 1, . . . , T . • K: number of particles m = 1, . . . ,K Distribution Related Notation • g(.): the canonical link of a GLM • f(.): the likelihood of the observed data. • pi(.): the distribution of the prior or the posterior distribution of pa- rameters. • p(.): denotes the target distribution of interest. • q(.): denotes the importance distribution used to approximate the target of interest. • E(.): denotes the expected value • Var(.): denotes the variance • F(.): a parametric distribution that hasn’t been specified. Acronyms • MCMC: Markov Chain Monte Carlo • SMC: Sequential Monte Carlo • pMCMC: Particle Markov Chain Monte Carlo • AR: Autoregressive process • WAR: Wishart Autorepgressive process xviii Glossary Other Symbols • ⊗: Kroneker product • etr: exponential trace such that etr(A) ≡ exp (tr(A)) xix Acknowledgments First and foremost, I would like to express my heartfelt gratitude to my advisors during my time at UBC: Dr. Paul Gustafson, Dr. Arnaud Doucet and Dr. Kevin Murphy. I could not have asked for better role models, each inspirational, supportive, and patient. Through your guidance and support, I learned to be a better researcher and critical thinker. It has been a huge honor to work with you, and I appreciate your contributions of time, ideas, and funding to make my graduate school experience possible and successful. I would also like to thank my examining and supervisory committee: Dr. William Welch, Dr. Jim Zidek, and Dr. Jane Wang and Dr. Petros Dellaportas for their constructive, thoughtful and detailed feedback. I am very grateful to the anonymous reviewers at JCGS for their useful comments which helped improve the work presented herein. This thesis was partially funded by an NSERC CGS-D grant, I would like to thank NSERC for their generous support. A special thanks goes to all the faculty and staff in the department of Statistics at UBC. I am particularly happy to acknowledge my debt to Dr. John Petkau and to Rick White who helped plant and nurture my interest in consulting. My deepest thanks and acknowledgment goes to my collaborator and friend Chris Brouse for allowing me to use the data from his research and for assisting in the biological interpretation of the results. My time at UBC has been made enjoyable in large part due to the many friends that became a part of my life. I am grateful for the memorable times spent in the company of Roman Holenstein, Kimberly Fernandes, Steve Kanters, Mike Marin, Alexandra Romann, Luke Bornn, Julien Cornebise, Francois Caron, Pierre Jacob, Glen Goodvin, Michael Chiang, and Mark Schmidt. Your friendship has been most inspiring both professionally and xx Acknowledgments personally. I have been very fortunate to have many friends I was able to rely on for support outside the UBC community. Marianne Zakhour, I am forever in debt to you for your generosity, countless support and loyalty. Roxane Genot, your contagious laughter makes the darkest day a little brighter. 
Rodney Daughtrey, you have been my rock during the final stages of this Ph.D.; I am very grateful to have you in my life. Lastly, I would like to thank my family for their continued love, support and encouragement; Rhea for brightening my life with laughter and hope, my brother Ziad for teaching me the meaning of perseverance, my dad for inspiring me to leave my mark, and my mom, who is the real reason I had any chance to make it this far down the academic path. Thank you.

Aline Talhouk
UBC
July 2013

Dedication

To Rhea of Sunshine.

Chapter 1
Introduction

1.1 Motivation

In multivariate regression, researchers are interested in modeling a correlated multivariate response variable as a function of covariates. The response of interest, in the most general case, can be three dimensional and represented by a cube (Figure 1.1). Let V denote the multivariate longitudinal response, which consists of a vector of dimension J, measured for many units of observation n, over multiple time steps T. Each time step t = 1, . . . , T represents a time slice of the cube.

Figure 1.1: A diagram depicting the multivariate longitudinal response with n rows (units of observation), J columns (responses), and T time points.

The correlation between the elements of the multivariate response can be very complex. At time slice t, the J responses may be correlated, the n units of observation may be clustered, and, furthermore, the correlation might be changing smoothly over time. In this thesis, we do not consider clustered or spatially correlated units. Therefore, all units of observation (the rows of the cube in Figure 1.1) are assumed to be independent and identically distributed.

In certain applications, such as in biostatistics, the association between the elements of the multivariate response is typically treated as a nuisance parameter. The focus is on efficiently estimating the regression coefficients, in order to study the average change in the mean response as a function of predictors. However, in many cases, the estimation of the covariance, which encodes the co-variation and, where applicable, the temporal dynamics of the multidimensional response variable, is of interest. A common example arises in financial applications, where the primary aim is to model the volatility and the co-volatility of financial variables. Moreover, the correct specification of the covariance matrix is important for the efficient estimation of the regression coefficients.

In this thesis, we aim to develop methodology to jointly estimate the association between the elements of the multivariate response as well as the effect of covariates. We propose a framework that allows for an increase in each of the three dimensions (n, J and T) and that is suitable for both continuous and binary data measurements.

1.2 Data Examples

Multivariate data are very common and arise in many applications. They are referred to by different names, but each can be viewed as a special case of the cube in Figure 1.1. In Table 1.1, we summarize some of the different types of responses commonly encountered in applications, and link them back to the cube of Figure 1.1.
Data Type                  Units (n)   Outcomes (J)   Time (T)
Univariate Response        many        one            one
Multivariate Response      many        many           one
Longitudinal Response      many        one            many
Time Series                one         one            many, equally spaced
Repeated Measure           many        one            many, not always equally spaced
Multivariate Time Series   one         many           many, equally spaced
Multivariate Longitudinal  many        many           many, not always equally spaced
MV Repeated Measure        many        many           many, not always equally spaced

Table 1.1: Different types of multivariate data

In the next sections, we introduce some of the data examples to be used in this thesis, along with the research questions that we will be addressing.

1.2.1 Heart Rate Variability as an Indicator of Nociception

The first example comes from a health research application in neuroscience. The response variable of interest is a time series measurement of the heart rate of a patient during surgery (n = 1, J = 1, T = 418). In this example, we do not consider covariate information; rather, we would like to model the variability in the heart rate at each time step. A change in heart rate variability is an indicator of a stress response (Figure 1.2), which in turn indicates that a patient is feeling pain. In sick patients, a strong stress response can cause serious injury. Anesthesiologists try to minimize the stress response during surgery by giving patients drugs that block nociception. Finding the appropriate balance can be challenging: too much anesthesia can make the patient very ill, while too little anesthesia increases the stress response. The aim is to measure the variability in heart rate to assist anesthesiologists in determining the level of autonomic nervous system (ANS) activity.

The data under consideration for this example were obtained from the Pediatric Anesthesia Research Team (PART) at the British Columbia Children's Hospital. The patients are 6-year-old children undergoing dental procedures under general anesthesia. A baseline-stimulus response experiment was conducted, where a patient is in a baseline state, a stimulus occurs, and the patient exhibits a response changing the initial state.

Figure 1.2: The brain perceives pain through the nociception system, which prompts increased activity in the autonomic nervous system (ANS). In turn, the ANS activates the sympathetic nervous system (SNS), creating a stress response in the body that includes increased respiration, blood pressure and heart rate.

In the data we consider, the patient is experiencing pain, so that the baseline state is the nociceptive state. An anesthetic bolus (the stimulus) is administered. This can be a single shot or multiple shots applied over a few seconds. Subsequently, the state of the patient returns to normal, which is the anti-nociceptive state. There are a total of 30 such bolus events, which we analyze separately. Each event is unique, although some of these events are from the same procedure. In Figure 1.3, we plot the heart rate data from one such bolus event over time. The top line is the raw measurement of heart rate over T = 418 time steps. Typically the interest is not in modeling the mean but the variability of the heart rate; therefore, it is common to pre-process the data to remove the mean (the bottom line in Figure 1.3).
This is usually done by fitting an autoregressive linear regression model and working with the residuals.

Figure 1.3: Heart rate data for a 6-year-old child undergoing a dental procedure. The child starts out in a nociceptive state. An anesthetic bolus is administered and the state of the child returns to normal. The top line is the raw measured heart rate and the bottom line is the filtered heart rate, with the mean removed.

1.2.2 Stock Market Return of Technology Firms

In the next example we consider volatilities of asset returns, which are generally thought to be time-changing (Poon and Granger, 2003). Moreover, they are likely to be serially correlated. This suggests that the covariance matrix is smoothly and slowly changing in time. This has been well established in the financial time series literature, making multivariate stochastic volatility models an important topic of research in that field.

To illustrate this with an example, we consider the weekly log returns of technology stocks for the period of September 1, 2004 to February 1, 2012. The data were obtained from Yahoo! Finance. We look at log returns of closing values, adjusted for dividends and splits, for the following four technology assets: Google, Microsoft, IBM, and Apple. The time series are of dimension T = 387 time steps, with n = 1 and J = 4. A first step in the analysis of this model is to pre-process the data so that the response variable of interest is mean-centered. One way to do this is to fit a J-variate autoregressive ordinary least squares model and work with the fitted residuals instead of the raw data (more on this in Chapter 5). A plot of the mean-centered time series over time is shown in Figure 1.4 and the summary statistics are presented in Table 1.2. The sample correlation matrix is given by:

\hat{R} = \begin{pmatrix} 1.00 & 0.39 & 0.43 & 0.53 \\ 0.39 & 1.00 & 0.51 & 0.36 \\ 0.43 & 0.51 & 1.00 & 0.54 \\ 0.53 & 0.36 & 0.54 & 1.00 \end{pmatrix}. \qquad (1.1)

Figure 1.4: A plot of the weekly returns of Google, Microsoft, IBM and Apple stock for the period of September 1, 2004 to February 1, 2012.

            Google   Microsoft   IBM     Apple
Minimum     -0.07    -0.09       -0.07   -0.12
Q1          -0.01    -0.01       -0.01   -0.01
Median       0.00     0.00        0.00    0.00
Mean         0.00     0.00        0.00    0.00
Q3           0.01     0.01        0.01    0.02
Maximum      0.08     0.06        0.06    0.08
St. Dev.     0.02     0.02        0.01    0.02

Table 1.2: Summary statistics of technology stock log-returns

The main aim of this analysis is to provide an estimate of the 4 × 4 covariance matrix at each time point t, while accounting for the slow evolution of this matrix in time. Moreover, it would be desirable to estimate the smoothness parameter directly from the data instead of making assumptions about this parameter.

1.2.3 Women and Mathematics

Another example comes from research where the primary aim is to study the attitude of 1190 New Jersey high school students toward mathematics (Fowlkes et al., 1988a).
In this research, there is no longitudinal component (T = 1); rather, we only consider the association between the following six binary response variables measured for each individual:

V1 Lecture Attendance (Attend/Did not Attend)
V2 Gender (Male/Female)
V3 School Type (Urban/Suburban)
V4 Perceived need for Mathematics in future work (Yes/No)
V5 Subject Preference (Math/Science vs. Liberal Arts)
V6 Future Plans (College/Job)

In this example the response Y is an n × J matrix of binary observations with n = 1190 and J = 6. The primary aim here is to study the structure of association between the elements of the response.

1.2.4 Six Cities Data

The final example is a subset of the Six Cities study, a longitudinal study of the health effects of air pollution. Even though the data are longitudinal in this case, we will not assume that the covariance is changing over time. The data under consideration consist of repeated binary measures of the wheezing status of 537 children from Steubenville, Ohio. The objective is to model the probability of wheeze status over time as a function of the mother's smoking status during the first year of the study. The multivariate response variable of interest in this case is Vit, indicating the wheezing status of child i at annual visit t = (1, 2, 3, 4), corresponding to ages (7, 8, 9, 10). The full data set is an n × T matrix, with n = 537 and T = 4. The nature of the data is suggestive of an association between the T elements of the multivariate response variable. Even though the association is not assumed to change over time, it is possible that it may follow a particular structure. We are interested in estimating the association structure as well as assessing the effect of one covariate, the mother's smoking status at the beginning of the study (1=yes, 0=no), on the mean response. Table 1.3 breaks down the number of "cases" representing wheezing symptoms by age group and mother's smoking status. From these results, it appears that the percentage of wheezing children is higher for smoking mothers at each age.

Mother Smoking Status   Age 7         Age 8         Age 9         Age 10
Smoker                  32 (17.0%)    40 (21.3%)    36 (19.1%)    27 (14.4%)
Non Smoker              55 (15.8%)    51 (14.6%)    49 (14.0%)    36 (10.0%)
Total                   87 (16.2%)    91 (16.9%)    85 (15.8%)    63 (11.7%)

Table 1.3: Breakdown of wheezing cases by age group and smoking status of mother at onset. The percentages in parentheses are given with respect to each age group.

1.3 Thesis Organization

In this thesis, we propose a framework for estimating the covariance matrix of multivariate data under different settings, different dimensions, and different measurement scales. In Chapter 2, we explore methods and algorithms for estimating a sequence of covariance matrices that are smoothly changing in time. This is done for continuous data measurements and corresponds to multivariate stochastic volatility models, which are very popular in the econometrics literature. We develop algorithms that model the covariance Σt = Var(Vt) directly at each time step, as well as its evolution in time, Σt | Σt−1, taking into account the complex structure and the dynamic nature of the response. We use novel algorithms that allow for the simultaneous estimation of both static and time-varying parameters. We consider the performance of these algorithms as the dimension of the response increases (over n, J, and T). Moreover, we extend the algorithm to estimate some of the model's hyper-parameters.
In Chapter 3, the multivariate response variable under consideration is binary, with no longitudinal element. It consists of an n × J array of observations. Estimating a full correlation matrix requires the estimation of J(J − 1)/2 parameters; the number of parameters increases quadratically with the number of variables. Therefore, when dealing with high dimensional problems, imposing a structure of association is helpful both conceptually and computationally. Conceptually, not all of the variables may be associated, and therefore it might make more sense from a subject matter standpoint to impose those sparsity constraints. From a computational viewpoint, this reduces the number of free parameters to be estimated compared with the saturated model. For a properly specified structure, this results in more accurate and efficient parameter estimates. In Chapter 3, we appeal to the well-developed theory of covariance selection in Gaussian graphical models to impose sparsity constraints on the inverse covariance matrix. In the case of binary observations, for reasons of identifiability, the covariance matrix needs to be constrained to be a correlation matrix R (Chib and Greenberg, 1998), with diagonal elements Rii = 1 and off-diagonal elements Rij ∈ (−1, 1). The sparsity constraint will be imposed by setting some off-diagonal elements of the inverse correlation matrix, (R^{-1})ij, to 0. In Tabet (2007), we showed how this can be achieved using the theory of Gaussian graphical models when the sparsity structure is known. The main contribution of Chapter 3 is to propose an extension to the algorithm that allows the estimation of a sparse structure directly from the data.

In Chapter 4, in addition to estimating the association between the elements of the response, we also propose a methodology to estimate the effects of covariates. Even though the covariates X may be measured at each time step, i.e., time-varying covariates, their effect on the response, denoted by β, is assumed to be static. Figure 1.5 is a graphical model representation of the general multivariate longitudinal model with covariates. The unknown parameters are the covariance matrices Σ1:T and the covariate effect β. The response variables V1:T and the covariates X1:T are observed.

Figure 1.5: A graphical model representation of the multivariate longitudinal model, which includes covariate information and the covariate effects parameter. Observed variables are shaded, unobserved variables are clear.

We consider estimating covariate effects when the response is continuous and Σ is evolving dynamically in time, as depicted in Figure 1.5. We also consider a non-longitudinal (static) case with T = 1. Under this setting we propose methodology to estimate covariate effects for a binary response, while accounting for the identifiability constraints.

In Chapter 5, we show how the methodology developed in this thesis can be used to address real problems and applied to real data sets. We revisit the data examples that we introduced in the previous section. Finally, in Chapter 6, we discuss some of the limitations of the methodology and future work.

Chapter 2
Time-Varying Covariance

In this Chapter, we assume that the correlation between the elements of the multivariate response variable is slowly changing in time.
The aim is to estimate the model's underlying parameters, which in this case consist of a sequence of covariance matrices and a smoothing parameter that controls the degree of change from one time step to the next. In general the smoothing parameter is assumed to be static, while the covariances are assumed to be dynamic. The main contribution of this Chapter is to propose a novel method for simultaneously estimating the dynamic and the static parameters of the model using particle Markov Chain Monte Carlo (Andrieu and Doucet, 2002).

Throughout this Chapter, we assume the observed response variable is continuous, multivariate with J dimensions, longitudinal with T time steps, and with n observations. Under this scenario, the response is an n × J × T array of continuous observations corresponding to Figure 1.1. Let Z denote the multivariate response of interest and assume that at any given time point t, Zt arises from a matrix-Normal distribution, Zt ∼ MN_{n,J}(µ, Σ, Φ), where MN denotes the matrix-Normal distribution with probability density function given by

f(Z_t \mid \mu, \Sigma_t, \Phi) = \frac{|\Sigma_t^{-1}|^{n/2}\,|\Phi|^{-J/2}}{(2\pi)^{nJ/2}} \operatorname{etr}\left\{ -\frac{1}{2}\,\Sigma_t^{-1} (Z_t - \mu)' \Phi^{-1} (Z_t - \mu) \right\}, \qquad (2.1)

where etr denotes the exponential trace, µ represents the mean of Zt, and Σt is a J × J time-varying covariance matrix that depends on Σt−1 and encodes the dependence between the columns of Zt. The n × n matrix Φ encodes the dependence between the rows of Zt. In this research we assume that observations are independent and identically distributed; therefore we set Φ = I, the identity matrix. It is important to note that when Φ is equal to the identity matrix, the matrix-variate Gaussian simplifies to the multivariate Gaussian with probability density function given in Appendix A.1.2. Moreover, in this Chapter, for convenience, we model the data in terms of a precision matrix Σ_t^{-1} rather than a covariance matrix Σt.

Figure 2.1: Graphical representation of the multivariate longitudinal data. The observed variables are shaded, and the latent (unobserved) variables are not. Conditional on the precision matrices, the observed data are independent over time.

The response, Z1, . . . , ZT, is assumed to be mean-centered with µ = E(Zt) = 0 and conditionally independent given Σ_1^{-1}, . . . , Σ_T^{-1}. Intuitively, this means that the information in Zt depends on the information in Zt−1 only through what is captured in the change of Σ_t^{-1} | Σ_{t-1}^{-1}. Therefore, given the Markovian assumption of the model, the joint distribution of Z and Σ^{-1} decomposes according to the graph in Figure 2.1 and can be written as:

\pi(Z_{1:T}, \Sigma^{-1}_{1:T}) = f(Z_1 \mid \Sigma_1^{-1})\,\pi(\Sigma_1^{-1}) \prod_{t=2}^{T} f(Z_t \mid \Sigma_t^{-1})\,\pi(\Sigma_t^{-1} \mid \Sigma_{t-1}^{-1}). \qquad (2.2)

This is a non-linear dynamic state space model in which Zt is the observed variable and Σ_t^{-1} is the hidden state; the model is described by the observation density f(Zt | Σ_t^{-1}) and the transition density π(Σ_t^{-1} | Σ_{t-1}^{-1}). Under a Bayesian paradigm, the joint posterior of Σ^{-1}_{1:T} | Z_{1:T} is proportional to the joint distribution in (2.2) and differs only by a normalizing constant. The likelihood f(Zt | Σ_t^{-1}), also known as the observation density, is given in (A.5). Therefore, it remains to specify a joint prior on Σ^{-1}_{1:T}, which decomposes as the initial prior π(Σ_1^{-1}) and the transition density π(Σ_t^{-1} | Σ_{t-1}^{-1}).
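To make the generative structure concrete before turning to the prior, the following minimal sketch (Python/NumPy; the function name, array shapes, and the use of a pre-specified precision path are illustrative assumptions, not part of the thesis) draws Z_{1:T} from the observation density above: given Σ_t^{-1}, the n rows of Z_t are i.i.d. N_J(0, Σ_t), independently across time.

```python
import numpy as np

def simulate_observations(precisions, n, seed=None):
    """Draw Z_1, ..., Z_T from the observation model (2.1) with mu = 0 and Phi = I:
    given Sigma_t^{-1}, the n rows of Z_t are i.i.d. N_J(0, Sigma_t), and the Z_t
    are conditionally independent over time."""
    rng = np.random.default_rng(seed)
    T, J, _ = precisions.shape                  # precisions[t] = Sigma_t^{-1}, shape (J, J)
    Z = np.empty((T, n, J))
    for t in range(T):
        Sigma_t = np.linalg.inv(precisions[t])  # covariance at time t
        Z[t] = rng.multivariate_normal(np.zeros(J), Sigma_t, size=n)
    return Z
```

Any prior that generates a positive definite path Σ_{1:T}^{-1} can be plugged into such a sketch; the next section discusses Wishart-based choices for that prior.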
2.1 The Wishart Prior

The Wishart distribution plays a very important role in Bayesian estimation of the inverse covariance matrix, because it is the conjugate prior on inverse covariance matrices for multivariate Gaussian data. There exist several parameterizations of the Wishart distribution (see Appendix A.3 for the different parameterizations and their equivalence). Throughout this thesis, we may appeal to different parameterizations at different times, as needed. In this Chapter, we use the parametrization obtained by taking the sum of squared Gaussian variables. If a J × J precision matrix Σ^{-1} follows a Wishart distribution with degrees of freedom ν ≥ J and scale parameter Ω > 0, denoted by W_J(ν, Ω), its probability density function has mass only for positive definite matrices Σ^{-1}. The p.d.f. is given by

\left\{ 2^{\nu J/2}\,\Gamma_J(\nu/2)\,|\Omega|^{\nu/2} \right\}^{-1} |\Sigma^{-1}|^{\frac{1}{2}(\nu-J-1)}\,\operatorname{etr}\left(-\tfrac{1}{2}\,\Omega^{-1}\Sigma^{-1}\right), \qquad (2.3)

where Γ_J is the multivariate gamma function given by

\Gamma_J(a) = \pi^{\frac{1}{4}J(J-1)} \prod_{i=1}^{J} \Gamma\!\left[a - \tfrac{1}{2}(i-1)\right]. \qquad (2.4)

Because the Wishart is conditionally conjugate, and because it is the multidimensional generalization of the χ2 distribution, its use is equivalent to extending the univariate model to the multivariate case and allows the interpretation of hyperparameters over increasing dimensions. For this reason, we restrict our attention to prior distributions that involve the Wishart and inverse Wishart distributions.

2.1.1 Dynamic Modeling of the Covariance in the Literature

Modeling time-varying covariance and correlation matrices has been a very active area of research in econometrics, through the study of multivariate stochastic volatility models. A few approaches have been proposed for this class of models; see the paper by Asai et al. (2006) and, more recently, Chib et al. (2007) for comprehensive reviews. From this literature, we are particularly interested in models that use the Wishart family of distributions to model the dynamics over time.

2.1.2 The Inverse Wishart

Rather than working with the precision matrix, Philipov and Glickman (2006) model the covariance transition by proposing the following inverse Wishart prior for Σt (see Appendix A.3 for the relationship to the Wishart distribution):

\Sigma_t \mid \Sigma_{t-1} \sim \mathcal{IW}_J(\nu, S_{t-1}), \qquad (2.5)

where ν is the degrees of freedom and S_{t-1} is the scale parameter, which is used to encode the temporal dynamics and is defined as

S_t = \frac{1}{\nu}\,(A^{1/2})\,(\Sigma_t^{-1})^{d}\,(A^{1/2})'. \qquad (2.6)

Under this set-up, the hyperparameters in the model are: A, a positive definite matrix with Cholesky decomposition A = (A^{1/2})(A^{1/2})'; d, a scalar parameter; and ν, an integer degrees of freedom parameter. This approach has some nice properties; namely, for a time-invariant structure, this model simplifies to the usual Normal-inverse-Wishart model. Furthermore, the hyperparameters A and d determine the dynamic behaviour of the covariance structure through St. The matrix A can be seen as a measure of inter-temporal covariance. The scalar parameter d denotes the strength of the relationships over time (long-memory, persistence). When d = 0, an independent and identically distributed covariance over time is implied. A special case also arises when d = 1 and A = I, and corresponds to the matrix-variate random walk. On the other hand, this approach does not appear to result in stationary marginal distributions. Philipov and Glickman (2006) do not make any claims on that either; they only prove stationarity of the determinant |Σt| for values of |d| < 1. To confirm this, we performed a simulation study by sampling from this prior distribution for different values of A and d; indeed, the marginal distribution of the covariance matrices appears to change at each time point (results not shown). A minimal sketch of such a simulation is given below.
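The following sketch (Python/SciPy) illustrates one way such a forward simulation of the Philipov and Glickman (2006) transition in (2.5)–(2.6) could be carried out. The function names and hyperparameter values are illustrative assumptions, and SciPy's inverse Wishart parametrization is assumed to correspond to the one intended in (2.5); see Appendix A.3 for the conventions used in this thesis.

```python
import numpy as np
from scipy.stats import invwishart

def sym_matrix_power(P, d):
    # (P)^d for a symmetric positive definite P, via the eigendecomposition
    w, V = np.linalg.eigh(P)
    return (V * w**d) @ V.T

def simulate_pg_prior(T, A, d, nu, Sigma1, seed=None):
    """Forward-simulate Sigma_1, ..., Sigma_T from Sigma_t | Sigma_{t-1} ~ IW_J(nu, S_{t-1}),
    with S_{t-1} = (1/nu) A^{1/2} (Sigma_{t-1}^{-1})^d (A^{1/2})' as in (2.6)."""
    rng = np.random.default_rng(seed)
    A_half = np.linalg.cholesky(A)
    Sigmas = [Sigma1]
    for _ in range(1, T):
        S_prev = A_half @ sym_matrix_power(np.linalg.inv(Sigmas[-1]), d) @ A_half.T / nu
        Sigmas.append(invwishart.rvs(df=nu, scale=S_prev, random_state=rng))
    return np.array(Sigmas)

# e.g. J = 2: inspect how the marginal spread of Sigma_t drifts over time
# paths = simulate_pg_prior(T=500, A=np.eye(2), d=0.9, nu=5, Sigma1=np.eye(2), seed=1)
```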
2.1.3 The Dynamic Linear Model

Adopting a completely different approach, Carvalho and West (2007) propose an extension of the dynamic linear model for sequential forecasting and updating, and appeal to the hyper-inverse Wishart distribution that arises in Gaussian graphical models (see Appendix A.3 for the definition). The prior they place on Σt is Σt ∼ HIW_J(b_t, S_t), with

b_t = \delta b_{t+1} + 1, \qquad S_t = \delta S_{t-1} + e_t e_t',

where e_t = Z_t − f_t is the vector of realized forecast errors and f_t denotes the forecast at time t. The advantage of this approach is that, by using the hyper-inverse Wishart, a sparse covariance matrix can be estimated by imposing a structure on Σ_t^{-1}. However, issues arise with the hyperparameters, because at each time point the hyperparameter used in the prior is a function of the posterior parameters and the data from the previous time point. Furthermore, this process has been shown to be non-stationary.

2.1.4 The Wishart Autoregressive Process

An alternative approach uses the Wishart autoregressive process of order 1, introduced by Gourieroux et al. (2009) and denoted by WAR(1). This process constructs, at each time step, a Wishart-distributed matrix by summing over squared latent Gaussian variables. These latent variables evolve in time according to an autoregressive process. More specifically, for t = 1, . . . , T, we define an artificial latent variable u_t, a d × J matrix that follows a matrix-variate Gaussian distribution with initial distribution at t = 1 given by

u_1 \sim \mathcal{MN}_{d,J}(0, S, I_d), \qquad (2.7)

and subsequently,

u_t \mid u_{t-1} \sim \mathcal{MN}_{d,J}(M u_{t-1}', S - MSM', I_d). \qquad (2.8)

The temporal dependence is introduced by evolving u_t in time according to an autoregressive process of order 1, so that at each time step t we have d independent AR processes for vectors of length J. Squaring these vectors and adding them up results in a J × J matrix and is equivalent to computing

\Sigma_t^{-1} = u_t' u_t, \qquad (2.9)

where Σ_t^{-1} marginally follows a Wishart distribution with integer degrees of freedom d > J and scale parameter S.

Under this setup, Σ_1^{-1}, . . . , Σ_T^{-1} constitute a Wishart autoregressive process of order 1, which defines a matrix Markov process for Σ_t^{-1}, constructed using latent variables (Cameron, 1973). We are therefore able to build a joint prior on Σ_1^{-1}, . . . , Σ_T^{-1},

\pi(\Sigma^{-1}_{1:T}) = \pi(\Sigma_1^{-1}) \prod_{t=2}^{T} \pi(\Sigma_t^{-1} \mid \Sigma_{t-1}^{-1}), \qquad (2.10)

where the initial distribution at time point t = 1, given by π(Σ_1^{-1}), is the conjugate Wishart prior W_J(d, S), with degrees of freedom d and scale parameter S,
The dynamics of the WAR process are characterized by the degrees of freedom d > J , a symmetric positive semi-definite J × J matrix S to model the covariance of the latent process; and an unconstrained latent autoregres- sive matrixM , that encodes the temporal dependence between the sequences of covariance matrices. The Wishart autoregressive process can also be de- fined for any autoregressive order, therefore the Markov assumption of order 1 can be weakened. Moreover, while in this thesis we construct the prior using integer degrees of freedom only, this process is also defined for non- integer degrees of freedom, as discussed in Gourieroux et al. (2009), which allows for greater modeling flexibility. Finally, this process guarantees that covariance and precision matrices are symmetric and positive definite. Another attractive property of the Wishart autoregressive process is that it represents a generalization of the conjugate model used in the univariate case. When T = 1, the Wishart autoregressive process simplifies to the conjugate Wishart distribution for precision matrices, and for J = 1, it simplifies to the χ2 distribution. Moreover, we can directly show that this process is stationary, with marginal distribution ut ∼ MN d,J(0, S, Id) and Σ−1t ∼ WJ(d, S). This can 18 2.1. The Wishart Prior be seen by taking the iterated expectation and moving by induction starting with: E(u1) = 0 E(u2) = E(E(u2 | u1)) = E(Mu1) = ME(u1) = 0 ... E(ut) = E(E(ut | ut−1)) = E(Mut−1) = ME(ut−1) = 0 ... E(uT ) = E(E(uT | uT−1)) = E(MuT−1) = ME(uT−1) = 0 similarly, we can compute the marginal variance: Var(u1) = S Var(u2) = E(Var(u2 | µ1)) + Var(E(u2 | u1) = E [S −MSM ′] + Var(Mu1) = S −MSM ′ +MVar(u1)M ′ = S −MSM ′ +MSM ′ = S ... Var(ut) = E(Var(ut | ut−1)) + Var(E(ut | ut−1)) = E [S −MSM ′] + Var(Mut−1) = S −MSM ′ +MVar(ut−1)M ′ = S −MSM ′ +MSM ′ = S ... Var(uT ) = E(Var(uT | uT−1)) + Var(E(uT | uT−1)) = E [S −MSM ′] + Var(MuT−1) = S −MSM ′ +MVar(uT−1)M ′ = S −MSM ′ +MSM ′ = S 19 2.1. The Wishart Prior 2.1.5 The Hyperparameters of the WAR Prior The temporal dynamics of the WAR model are controlled by the hyperpa- rameterM . This can be seen from equations (2.7) and (2.8). To further illus- trate this, suppose we take M = ρIJ , where I is the identity matrix and ρ is a scalar parameter. The “memory” of the process is encoded in ρ. A large ρ implies covariances with high dependency between time points; on the other hand, when ρ = 0, there are no dynamics and at each time step we have an independent and identically distributed precision matrix Σ−1t ∼ WJ(S, d). The past matters more as ρ approaches 1. For ρ = 1 exactly, we have the same precision matrix at all time steps: Σ−11 = Σ −1 2 = · · · = Σ−1T . In this case, the time element becomes irrelevant, and the data can be stacked and analyzed simultaneously as i.i.d data generated from the same Σ−1. In this research, we only consider scenarios with 0 ≤ ρ ≤ 1. The hyperparameters S and d represent the scale matrix and degrees of freedom of the Wishart distribution respectively. Barnard et al. (2000) have shown that if Σ ∼ IWJ(d, S) and given the decomposition Σ = DRD, where R is a correlation matrix and D = diag(d1, . . . , dJ) is a diagonal matrix of standard deviations with di = (Σii) −1 > 0, then the resulting correlation matrix R has a marginally uniform prior only when S = I, the identity matrix, and for d = J + 1 (note that this value depends on the parametrization of the Wishart). 
This means that marginally Rij is uni- formly distributed in the [−1, 1] interval. A marginally uniform distribution has desirable properties which will be further explored in Chapter 3. To illustrate the effect of a change in M , S and d on the precision elements of Σ−1, we consider an empirical example where we simulate from the prior distribution in (2.10). We begin by considering the univariate case. First we examine the effect of the autoregressive parameter by varying M = ρ ∈ {0.7, 0.98, 0.999} and holding S = 1 and d = J + 1 = 2. In the second simulation, we fix ρ = 0.98 and d = 2 and we vary S ∈ {0.01, 0.5, 1, 4}. Finally, in the last experiment, we fix ρ = 0.98 and S = 1 and vary d ∈ {J + 1, J + 50}. Given the above three scenarios, for each hyperparameter experiment, 20 2.1. The Wishart Prior the latent Gaussian variables are generated according to the model in (2.7) and (2.8).The precisions are then computed using (2.9). Figure 2.2(a) plots the values of a precision simulated from the prior using different values of ρ and maintaining the scale matrix at S = 1 and degrees of freedom at d = 2. It can be noted that a larger ρ results in a smoother change in the precision over time. Furthermore, ρ has to be really large and very close to 1 before the trace plot flattens. For ρ = 0.7 the change is still not large enough to show any visible persistence over time. In Figure 2.2(b), we plot the values obtained by simulating a precision from the prior, this time holding ρ at 0.98 and d still set at 2 and varying S. We can see that a smaller S translates generally in smaller precisions, which means very variable data. Finally, in Figure 2.2(c), where ρ is set to 0.98 and S is set to 1 and we consider two possibilities of d. What we note is that a larger d also means larger precision. The same pattern holds for the bivariate case in Figure 2.3 and trivariate case (results not shown). Previous research shows that S controls the location of the precision while d controls the spread. For more on understanding the effect of hyperparameters on the covariance matrix the reader is referred to Tokuda et al. (2011). 21 2.1. The Wishart Prior 0 100 200 300 400 500 0 5 10 15 The Precision Over Time for Different Values of ρ time σ−1 ρ = 0.7 ρ= 0.98 ρ=0.999 (a) Varying ρ for S = 1 and d = J + 1 0 100 200 300 400 500 0 2 4 6 8 10 12 14 16 18 20 The Precision Over Time for Different Values of s time σ−1 s=0.01 s=0.5 s=1 s=2 (b) Varying S for ρ = 0.98 and d = J + 1 0 100 200 300 400 500 0 10 20 30 40 50 60 70 80 The Precision Over Time for Different Values of d time σ−1 d=J+1 d=J+50 (c) Varying d for ρ = 0.98 and S = 1 Figure 2.2: Precision simulated from the prior using different values of the hyper-parameters ρ, S and d. 22 2.1. The Wishart Prior 0 100 200 300 400 500 0 2 4 6 8 10 12 14 16 σ11 Over Time for Different Values of ρ time σ11 ρ = 0.7 ρ= 0.98 ρ=0.999 0 100 200 300 400 500 0 2 4 6 8 10 12 14 16 18 σ22 Over Time for Different Values of ρ time σ22 ρ = 0.7 ρ= 0.98 ρ=0.999 0 100 200 300 400 500 −10 −5 0 5 10 σ12 Over Time for Different Values of ρ time σ1 2 ρ = 0.7 ρ= 0.98 ρ=0.999 Figure 2.3: Time series plots showing the effect of changing ρ on the marginal elements of Σ−1 in the bivariate case (J = 2). 23 2.2. Bayesian Estimation with the WAR 2.2 Bayesian Estimation with the WAR Given observed continuous Gaussian data, and adopting the WAR of order 1 as a prior for the precision matrix, one can begin to think of performing Bayesian inference on the distribution of Σ−11:T | Z1:T . 
Posterior estimation at time point t = 1 is straightforward. The Wishart is the conjugate prior, therefore the posterior distribution of Σ−11 | Z1 is also a Wishart distribution with updated parametersWJ(d+n, (S−1 +SST )−1), where SST = Z ′1Z1 is the residual sum of squares. At time points t > 1, the non-central Wishart is not conjugate. The conditional posterior distribution pi(Σ−1t | Σ−11:t−1, Z1:t) simplifies to pi(Σ−1t | Σ−1t−1, Zt) due to the Markovian assumption on the model and can be written as pi(Σ−1t | Σ−1t−1, Zt) ∝ f(Zt | Σ−1t )pi(Σ−1t | Σ−1t−1). (2.13) This distribution is not available in closed form and therefore it is diffi- cult to sample directly from it. In cases where the posterior distribution is not available analytically, the problem is addressed indirectly by generating random samples for the target distribution of interest using Monte Carlo methods. We propose a sequential Monte Carlo (SMC) algorithm to sample from the posterior distribution, using two different proposal densities. SMC meth- ods are a general class of Monte Carlo methods that sample sequentially from a sequence of target probability densities of increasing dimensions. These algorithms are particularly suitable for non-linear, non-Gaussian models. 2.3 Sampling using Sequential Monte Carlo (SMC) The posterior distribution pi(Σ−11:T | Z1:T ) is multidimensional and not avail- able analytically. SMC methods are a set of simulation-based methods which provide a convenient and attractive approach to estimating recursively a se- quence of probability distributions. These methods are particularly suited 24 2.3. Sampling using Sequential Monte Carlo (SMC) for state space models. Using SMC, the intractable problem of sampling from a multidimensional target distribution, denoted by p(.), is decomposed into a series of simpler problems which can be tackled recursively. Using Bayes’ rule, the target posterior can be written as p ( Σ−11:T | Z1:T ) = pi ( Σ−11:T , Z1:T ) pi (Z1:T ) = pi ( Σ−11:T ) f ( Z1:T | Σ−11:T ) pi (Z1:T ) , (2.14) and the normalizing constant, also known as the marginal likelihood, is obtained by integrating over the latent state pi (Z1:T ) = ∫ pi ( Σ−11:T , Z1:T ) dΣ−11:T . (2.15) The joint distribution can be re-written as pi ( Σ−11:T , Z1:T ) = pi ( Σ−11:T−1, Z1:T−1 ) pi(Σ−1T | Σ−1T−1)f ( ZT | Σ−1T ) . (2.16) Therefore, the posterior p(Σ−11:T | Z1:T ) satisfies the following recursion p ( Σ−11:T | Z1:T ) ∝ p (Σ−11:T−1 | Z1:T−1)pi (Σ−1T | Σ−1T−1) f (ZT | Σ−1T ) . (2.17) Thus, we are able to break a high dimensional distribution into the product of distributions of lower dimensions. A general SMC algorithm for state space models can be described as a two-step recursion involving a sampling step and a correction step. The algorithm would proceed at time t = 1 by generating a sample of size K from a target distribution p(Σ−11 | Z1) of interest. In most cases it is not possible to sample directly from the target distribution p(.), therefore the K samples, also referred to as particles, are drawn instead from an importance distribution denoted by q(.), with support as large or larger than that of p(.). In order to correct for that approximation, the discrepancy between the tar- get and the importance distribution, known as the importance weights, are computed for each sample by taking the ratio between the two distributions p and q, evaluated for each particle m. The resulting weights are given by 25 2.3. 
Sampling using Sequential Monte Carlo (SMC) w1(Σ −1(m) 1 ) = f(Z1 | Σ−1(m)1 )pi(Σ−11 ) q(Σ−11 | Z1)(m) . (2.18) The K particles, are then re-sampled according to normalized weights W1 where W (m) 1 = w (m) 1∑K l=1w (l) 1 . (2.19) This resampling step corrects the approximation and ensures that the re- sulting weighted samples are distributed according to the target distribution of interest. At the following time step, t = 2, the identity resulting from the recur- sion in (2.17) is considered: p(Σ−11:2 | Z1:2) ∝ p(Σ−11 | Z1)pi(Σ−12 | Σ−11 )f(Z2 | Σ−12 ). The particles obtained from the previous step are used for p(Σ −1 1 | Z1) and an importance density q(Σ−12 | Σ−11 , Z2) is used to extend the dimension of the current particles to approximate pi(Σ−12 | Σ−11 )f(Z2 | Σ−12 ). This pro- vides samples that are distributed according to the approximate distribution p(Σ−11 | Z1)q(Σ−12 | Z2,Σ−11 ). Once more, in order to correct for the approx- imation, the importance weights w2 must be computed and normalized and the whole particle path is re-sampled according to the normalized weights W2. w2(Σ −1(m) 2 ) = f(Z2 | Σ−1(m)2 )pi(Σ−12 | Σ−1(m)1 ) q(Σ−12 | Z2,Σ−11 )(m) ; W (m) 2 = w (m) 2∑K l=1w (l) 2 . The resampling step results in particles that are distributed according to p(Σ−11 ,Σ −1 2 | Z1, Z2). This procedure is then repeated T times. At each step t the weights are computed and normalized according to wt(Σ −1(m) t ) = f ( ZT | Σ−1(m)T ) pi ( Σ−1T | Σ−1(m)T−1 ) q(Σ−1t | Zt,Σ−1t−1)(m) ; W (m) t = w (m) t∑K l=1w (l) t . (2.20) The resampling step ensures that the trajectories with small importance 26 2.3. Sampling using Sequential Monte Carlo (SMC) weights are eliminated and the ones with large weights are replicated. Fur- thermore, while the resampling step introduces some Monte Carlo variance, it reduces the accumulation of error over time. At each time step, the entire particle path corresponding to the particles that are chosen in the resam- pling step is kept and propagated forward and the remaining particles are discarded. This obviously results in the algorithm degenerating over time, because a small proportion of the weights contain all of the probability mass and hence most particles are eventually discarded. In many cases, it is simpler to work with log-weights for numerical rea- sons, since weights can be very small or very large, and numerical precision might be a concern when normalized. Furthermore, in order to prevent un- derflow or overflow, it is common to subtract the largest log-weight from all the log-weights prior to normalization. From equation (2.20) we can see the optimal choice of q(.) ≡ p(.) would result in weights equal to 1, meaning that all the particles survive and are propagated forward. A choice in importance distribution that is very dif- ferent from p(.) would result in a very rapid degeneracy of the algorithm. For this reason, the choice of importance density for the SMC algorithm is somewhat critical. 2.3.1 Computation of the Marginal Likelihood One of the nice by-products of the sequential Monte-Carlo sampler is that an estimate of the marginal likelihood p̂(Z1:T ) can be obtained directly from each SMC run. This estimate is evaluated recursively and is given by p̂(Z1:T ) = p̂(Z1) T∏ t=2 p̂(Zt | Z1:t−1), (2.21) where p̂(Zt | Z1:t−1) = 1 K K∑ m=1 wt(Σ −1(m) 1:t ), (2.22) 27 2.4. SMC Algorithm for the Wishart Autoregressive Process and where wt are the un-normalized weights at time point t. 
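The numerical safeguard just described, subtracting the largest log-weight before exponentiating, and the per-step factor p̂(Z_t | Z_{1:t-1}) in (2.22) fit in a few lines. The following is a generic utility sketch (function names are ours), not the exact code used in the experiments below.

import numpy as np

def normalize_log_weights(log_w):
    """Normalized weights W_t^{(m)} from unnormalized log-weights; the largest
    log-weight is subtracted first to avoid underflow/overflow."""
    w = np.exp(log_w - np.max(log_w))
    return w / w.sum()

def log_incremental_evidence(log_w):
    """log p_hat(Z_t | Z_{1:t-1}): the log of the average unnormalized weight,
    computed stably on the log scale."""
    m = np.max(log_w)
    return m + np.log(np.mean(np.exp(log_w - m)))

Summing the increments over t recovers log p̂(Z_{1:T}) in (2.21); each increment is simply the average of the K unnormalized weights at time t.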
This is an estimate of p(Zt | Z1:t−1) = ∫ wt(Σ −1(m) t )q(Σt | Zt,Σt−1)p(Σ1:t−1 | Z1:t−1)dΣ1:t. (2.23) 2.4 SMC Algorithm for the Wishart Autoregressive Process Estimation in the Wishart autoregressive process, at the first time step, results in an analytical solution that can be sampled exactly. Therefore at time point t = 1, we can sample directly from the target distribution p(Σ−11 | Z1) without the need for any correction. At time point t = 2, since the posterior distribution given in (2.13) is not available in closed form, we must choose an importance distribution to generate candidate samples. The weights can be computed for each sample as mentioned previously; particles paths are chosen according to those weights and propagated to the following time point. The procedure is repeated for all time points. In the next section, we propose two different importance densities to propose candidate samples that can be used for running the SMC algorithm with the Wishart autoregressive process. 2.4.1 The Non-central Wishart Prior as Importance Distribution The first approach to obtain samples from the target distribution p(Σ−11:T | Z1:T ) is to use the non-central Wishart prior distribution as an importance distribution by setting q(Σ−1t | Σ−1t−1, Zt) ≡ pi(Σ−1t | Σ−1t−1) ∼ W(d, S∗,Θ), with d, S∗ and θ as defined in Section 2.2. Using the prior as importance distribution, the importance weights at each time step t simplify to the likelihood Wt = f(Zt | Σ−1t )pi(Σ−1t | Σ−1t−1) q(Σ−1t | Σ−1t−1) = f(Zt | Σ−1t ). (2.24) 28 2.4. SMC Algorithm for the Wishart Autoregressive Process A detailed description of the implementation of the SMC algorithm is outlined in Algorithm (2.1). Algorithm 2.1 SMC using the Non-central Wishart Prior For particles m = 1, . . . ,K: At time point t = 1 • Sample Σ−1(m)1 i.i.d∼ WJ(d+ n, (S−1 + SST )−1) where SST = Z ′1Z1 At time point t > 1 • Sample Σ−1(m)t i.i.d∼ WJ(d, S−MSM ′, (S−MSM ′)−1M ′Σ−1(m)t−1 M)- A non-central Wishart. • Compute wt(Σ −1(m) t ) = f(Zt | Σ−1(m)t ) ; W (m)t = wt(Σ −1(m) t )∑K l=1wt(Σ −1(l) t ) , where f is the multivariate Gaussian p.d.f. evaluated at Zt and Σ −1(m) t for each particle m. • Re-sample the particles Σ−1(m)1:t with replacement according to the weights W (m) t . The performance of this proposal distribution is expected to be satisfac- tory for cases where observations are not too informative. This is because the proposal essentially ignores the observed data, therefore when the data does not carry too much information, the posterior is very close to the prior in shape. 2.4.2 A Central Wishart as Importance Distribution An alternate proposal is one that takes into account the data and uses the central Wishart distribution as an approximation to the non-central Wishart transition density. This is convenient because the central Wishart is the conjugate prior, and would therefore provide a closed form solution mak- ing it easier to sample approximately from p(Σ−1t | Σ−1t−1, Zt) by proposing 29 2.4. SMC Algorithm for the Wishart Autoregressive Process samples from a Wishart distribution such that q(Σ−1t | Σ−1t−1, Zt) ∝ f(Zt | Σ−1t )q(Σ −1 t | Σ−1t−1). The central Wishart approximation of the non-central Wishart has been suggested by Gupta and Nagar (2000), and is obtained by matching the first moment of the two distributions. Using the representation of non- central Wishart as the inner product of normal matrices, for t > 1, let ut ∼MN d,J(Mu′t−1, S∗, Id), then Σ−1t = u′tut and E ( Σ−1t | Σ−1t−1 ) = dS∗ +M ′Σ−1t−1M. 
(2.25) In this case it is easy to see that by taking q(Σ−1t | Σ−1t−1) to be a central Wishart with degrees of freedom d and location parameter S∗+ 1dM ′Σ−1t−1M , the first moment of this central Wishart would match to the one in (2.25). Furthermore, the second moments have been shown to differ only in terms O(d−1). Combining this proposal with the likelihood, results in an importance approximation to the posterior that is also a central Wishart with updated degrees of freedom d+n and scale parameter ((S−MSM ′+ 1dMΣ−1t−1M ′)−1+ SST )−1. The importance weights can then be used to correct this approxi- mation wt(Σ −1 t ) = p(Zt | Σ−1t )p(Σ−1t | Σ−1t−1) q(Σ−1t | Σ−1t−1, Zt) . (2.26) 30 2.4. SMC Algorithm for the Wishart Autoregressive Process We expand the target distribution of interest: p(Zt | Σ−1t )p(Σ−1t | Σ−1t−1) ∝ (2pi)−nJ/2 ∣∣Σ−1t ∣∣n/2 etr{−12Σ−1t Z ′tZt } × |S ∗|−d/2 ∣∣Σ−1t ∣∣ (d−J−1)2 2dJ/2ΓJ(d/2) etr { −1 2 S∗−1 ( Σ−1t +MΣ −1 t−1M ′)} 0F1(d/2; (1/4)S ∗−1MΣ−1t−1M ′S∗−1Σ−1t ) ∝ (2pi)−nJ/2 |S ∗|−d/2 2dJ/2ΓJ(d/2) ∣∣Σ−1t ∣∣(n+d−J−1)/2 × etr { −1 2 [ Σ−1t ( Z ′tZt + S ∗−1)+ S∗−1MΣ−1t−1M ′]} × 0F1(d/2; (1/4)S∗−1MΣ−1t−1M ′S∗−1Σ−1t ), where S∗ = S −MSM ′. The importance distribution is a central Wishart WJ(d+ n, ( ( S∗ + 1dMΣ −1 t M ′)−1 + ZtZ ′t)−1) and its pdf is given by q(Σ−1t | Σ−1t−1, Zt) = 2 (d+n)J2 ΓJ ( d+ n 2 ) ∣∣∣∣∣ ( S∗ + 1 d MΣ−1t M ′ )−1 + ZtZ ′ t ∣∣∣∣∣ −(d+n) 2  −1 |Σ−1t | 1 2 (d+n−J−1)etr { −1 2 Σ−1t [( S∗ + 1 d MΣ−1t M ′ )−1 + ZtZ ′ t ]} . (2.27) Finally we can compute the logarithm of the weight by considering the ratio of p(.) and q(.), log(wt(Σ −1 t )) = − d 2 log |S∗| − (d+ n) 2 log ∣∣∣∣∣ [ S∗ + 1 d MΣ−1t−1M ′ ]−1 + ZtZ ′ t ∣∣∣∣∣ − 1 2 tr {( S∗−1 − [ Sc+ 1 d MΣ−1t−1M ′ ]−1) Σ−1t + S ∗−1MΣ−1t−1M ′ } + log(0F1(d/2; (1/4)S ∗−1MΣ−1t−1M ′S∗−1Σ−1t )). (2.28) The SMC sampling algorithm that uses the central Wishart as a proposal is detailed in Alogirthm (2.2). A big limitation to this approach is the numerical evaluation of the 0F1 confluent hypergeometric function with matrix argument necessary for the 31 2.4. SMC Algorithm for the Wishart Autoregressive Process Algorithm 2.2 SMC using the central Wishart Approximation For particles m = 1, . . . ,K, At time point t = 1, • Sample Σ−1(m)1 i.i.d∼ WJ(d+ n, (S−1 + SST )−1) where SST = Z ′1Z1. At time point t > 1 • Sample Σ−1(m)t i.i.d∼ WJ(d+n, ((S−MSM ′+ 1dMΣ−1t−1M ′)−1+SST )−1- A centered Wishart with SST = Z ′tZt • Compute wt(Σ −1(m) t ) is the exponential of the expression in (2.28); W (m) t = wt(Σ −1(m) t )∑K l=1wt(Σ −1(l) t ) , for each particle m. • Re-sample the particles Σ−1(m)1:t with replacement according to the nor- malized weights W (m) t . 32 2.4. SMC Algorithm for the Wishart Autoregressive Process computation of the log weights in (2.28). Koev and Edelman (2006) propose an efficient algorithm and Matlab code to evaluate the hypergeometric func- tion. However, for certain values of Σ−1, their algorithm does not appear to produce accurate results. 2.4.3 Numerical Experiments using SMC In the first set of experiments, we consider the perfomance of the SMC algorithm under the two different proposals discussed in the previous section, while increasing one at a time the number of particles (K), the number of observations per time point (n), the number of time points (T ), and the dimensionality of the response at each time point (J). In order to evaluate each proposal, we monitor the computational effi- ciency of the algorithm as well as the accuracy of the estimates. 
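For concreteness, a minimal Python sketch of Algorithm 2.1 is given below. The interface (smc_war_prior, its arguments, and the toy data at the end) is ours rather than the original implementation, and the non-central Wishart transition is sampled through its latent Gaussian representation, which is equivalent in distribution to drawing from (2.12) directly.

import numpy as np
from scipy.stats import multivariate_normal, wishart

def smc_war_prior(Z, d, S, M, K, seed=None):
    """Sketch of Algorithm 2.1: SMC for the WAR(1) model with the prior as proposal.

    Z : (T, n, J) array of observations.
    Returns per-step ESS and log p_hat(Z_{1:T})."""
    rng = np.random.default_rng(seed)
    T, n, J = Z.shape
    S_star = S - M @ S @ M.T
    zero = np.zeros(J)
    # t = 1: exact conjugate draw Sigma_1^{-1} ~ W_J(d + n, (S^{-1} + Z_1' Z_1)^{-1})
    scale1 = np.linalg.inv(np.linalg.inv(S) + Z[0].T @ Z[0])
    prec = np.stack([wishart.rvs(df=d + n, scale=scale1, random_state=rng)
                     for _ in range(K)])
    # latent u consistent with each draw: first J rows hold a Cholesky factor
    u = np.zeros((K, d, J))
    for m in range(K):
        u[m, :J, :] = np.linalg.cholesky(prec[m]).T
    log_evidence, ess = 0.0, []
    for t in range(1, T):
        # propose from the prior transition: each latent row follows M row + noise
        u = u @ M.T + rng.multivariate_normal(zero, S_star, size=(K, d))
        prec = np.einsum('kdi,kdj->kij', u, u)            # Sigma_t^{-1} = u_t' u_t
        # importance weights reduce to the Gaussian likelihood, equation (2.24)
        log_w = np.array([multivariate_normal.logpdf(Z[t], zero,
                                                     np.linalg.inv(prec[m])).sum()
                          for m in range(K)])
        shift = log_w.max()
        w = np.exp(log_w - shift)
        W = w / w.sum()
        log_evidence += shift + np.log(w.mean())          # log p_hat(Z_t | Z_{1:t-1})
        ess.append(1.0 / np.sum(W ** 2))
        keep = rng.choice(K, size=K, p=W)                 # multinomial resampling
        u, prec = u[keep], prec[keep]
    return np.array(ess), log_evidence

# toy data just to exercise the sketch (T = 200, n = 1, J = 2)
Z = np.random.default_rng(2).standard_normal((200, 1, 2))
ess, logml = smc_war_prior(Z, d=3, S=np.eye(2), M=0.98 * np.eye(2), K=500, seed=0)

Switching to the central-Wishart approximation of Algorithm 2.2 would change only the proposal draw and the weight computation.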
In theory, both proposals are expected to produce similar results given enough compu- tational power. To assess computational efficiency we consider the average effective sample size (ESS), where ESS is the average over t = 1, . . . , T of ESSt given by ESSt = 1∑K m=1(W (m) t (Σ −1 t )) 2 . (2.29) We are interested in what the ESS represents as a percentage of the initial number of particles that are used in the model. The higher this percentage, the more computationally efficient the algorithm. In order to evaluate accuracy, for J = 1, we consider the average mean square error (MSE), which is the average over time of MSEt, the mean squared deviation of the estimated particles from the true variance used to generate the data and is given by MSEt = ∑K m=1(σ̂t −1(m) − σ−1t )2 K . (2.30) For J > 1 we are considering precision matrices instead of scalar preci- sions. In order to evaluate how different the estimated precision matrices are with respect to the true precision matrices used to generate the data, 33 2.4. SMC Algorithm for the Wishart Autoregressive Process the entropy loss was computed for each SMC particle at each time point using L (m) t (Σ̂ −1(m) t ,Σ −1 t ) = trace(Σ̂ −1(m) t Σt)− log |Σ̂−1(m)t Σt| − J, (2.31) where Σ̂ −1(m) t is an SMC particle at time point t. The average entropy loss was computed by averaging over all particles and time points L̄ = ∑T t=1( ∑K m=1 L (m) t /K) T . We first begin by looking at the scalar case, namely when J = 1. While this case is not particularly interesting from a computational standpoint, it enables us to compare the two SMC proposals. This is because the central and non-central Wishart simplify to the χ2 and non-central χ2 distribution respectively and we are able to implement the SMC using both the prior as well as the conditional prior that uses the central χ2 approximation to the conditional posterior as importance densities. Because we are able to evaluate the oF1 function easily in the scalar case, we can evaluate the non- central χ2 density as required by the algorithm. The hyperparameters used to generate the data were fixed in all the sim- ulations to S = IJ and M = 0.98IJ . The methods were implemented using both the Wishart and the χ2 distribution with both proposals. The idea is that the univariate and the multivariate implementations are expected to give identical results for J = 1. However, only the Wishart implementation can run when J > 1, and this is implemented only using the non-central Wishart proposal due to the difficulty in evaluating the oF1 function with matrix argument. In Figure 2.4, we plot the ESS and MSE against the number of particles K, obtained using the different proposals and implementations, for a fixed n = 1 observation unit and T = 200 time steps. We note that the proposal that uses the non-central Wishart/χ2 distribution results in a larger average effective sample size than the one that uses the central Wishart/χ2 approx- imation. The ESS tends to stabilize around 90% of the original number of 34 2.4. SMC Algorithm for the Wishart Autoregressive Process particles. This is not surprising given that the likelihood is not very infor- mative with only one observation per time point, making the posterior very similar to the prior. The method that uses the central Wishart/χ2 approxi- mation as a proposal results in a ESS around 45% of the initial number of particles used. We can also see that with enough particles, both methods give similar results in terms of the MSE statistic. 
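For reference, the entropy loss in (2.31) and its average over particles and time points can be computed with a short helper such as the following (an illustrative sketch, not the code used to produce the figures).

import numpy as np

def entropy_loss(prec_hat, Sigma_true):
    """Entropy loss (2.31): tr(Sigma_hat^{-1} Sigma) - log|Sigma_hat^{-1} Sigma| - J."""
    J = Sigma_true.shape[0]
    P = prec_hat @ Sigma_true
    _, logdet = np.linalg.slogdet(P)
    return np.trace(P) - logdet - J

def average_entropy_loss(prec_particles, Sigma_true_path):
    """Average of (2.31) over particles and time points (the quantity L-bar).
    prec_particles: (T, K, J, J) array; Sigma_true_path: (T, J, J) array."""
    T, K = prec_particles.shape[:2]
    return np.mean([entropy_loss(prec_particles[t, m], Sigma_true_path[t])
                    for t in range(T) for m in range(K)])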
Furthermore, we note that the central and non-central χ2 implementation give identical results to the central and non central Wishart implementation, which is what we expect for J = 1. 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 0 500 1000 1500 2000 2500 3000 3500 4000 4500 Effective Sample size Particles Non−Central χ2 Centeral χ2 Non−Central Wishart Central Wishart (a) Effective Sample Size 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 0 0.5 1 1.5 2 2.5 Mean Squared Error Particles Non−Central χ2 Centeral χ2 Non−Central Wishart Central Wishart (b) Mean Square Error Figure 2.4: Effective sample size and the mean squared error obtained by increasing the number of particles K. In the second set of experiments, using the same data set, we increase the number of time steps, while maintaining the number of observations at n = 1 and the number of particles fixed to K = 1000 particles. In Figure 2.5, we again plot the ESS and MSE this time against the number of time steps T , obtained using the different proposals and implementations. We can see that the ESS as well as the MSE estimates stabilize and reach stationarity at around T = 100. Furthermore, as expected, the ESS remains stable with increasing time steps and the Wishart/χ2 implementations continue to result in identical results. In the third set of experiments, we test the effect of increasing the number of observations from n = 1, . . . , 140, while maintaining the number of time 35 2.4. SMC Algorithm for the Wishart Autoregressive Process 0 50 100 150 200 250 300 350 400 450 500 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 Effective Sample size Time Points Non−Central χ2 Centeral χ2 Non−Central Wishart Central Wishart (a) Effective Sample Size 0 50 100 150 200 250 300 350 400 450 500 0 1 2 3 4 5 6 7 Mean Squared Error Time Points Non−Central χ2 Centeral χ2 Non−Central Wishart Central Wishart (b) Mean Square Error Figure 2.5: Effective sample size and the mean squared error obtained by increasing the number of time steps T . steps fixed at T = 200 and the number of particles at K = 10, 000. As the number of observations increases, the likelihood becomes more informative, leading the prior to become a worse approximation of the posterior. In Figure 2.6, we plot the ESS and MSE as a function of the number of observation per time point n. We can see that as the number of observations increases, the effective sample size in the method that uses the prior as an importance density deteriorates, from 90% for n = 1 dropping to 40% for n = 140. On the other hand, the method that uses the central χ2 / central Wishart approximation, which are the methods that account for the observed data, become more computationally efficient, with ESS increasing from 38 % for n = 1 to 78% when n = 140. In addition, we can see that given enough particles, both proposals give similar MSE as the number of observations increases. As expected with a larger n the MSE becomes smaller leading to more precise estimates. Finally, we extend our experiments to the multidimensional case, where we monitor the ESS and MSE as J increases. In this case however, we are not able to use the central Wishart approximation as a proposal, because there is no easy way to numerically evaluate the hypergeometric function, oF1, needed for the computation of weights. Some methods were tried, but 36 2.5. 
Estimating Fixed Parameters 0 50 100 150 0.6 0.8 1 1.2 1.4 1.6 1.8 x 104 Effective Sample size Number of Observations Non−Central χ2 Centeral χ2 Non−Central Wishart Central Wishart (a) Effective Sample Size 0 50 100 150 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 Mean Squared Error Number of Observations Non−Central χ2 Centeral χ2 Non−Central Wishart Central Wishart (b) Mean Square Error Figure 2.6: Effective sample size and the mean squared error obtained by increasing the number of observations n. they did not produce consistent numerical results. Therefore, we focus on evaluating to which extent the prior is able to serve as a suitable proposal density, at least in terms of effective sample size. In this experiment, the number of observations is kept at n = 1, number of time steps is fixed at T = 200 and the number of particles is set to K = 5000. In Figure 2.7, we plot the ESS and MSE against the dimension of the precision matrix at each time step J . We can see that the ESS percentage drops from around 90% for J = 1 to less than 30 % for J = 6. Similarly, the entropy loss seems to increase with larger dimensions. These results are not very surprising given the increase in the complexity of the problem under consideration. 2.5 Estimating Fixed Parameters In many situations, information about the hyperparameters is not available; estimating those from the data becomes of interest. For example, researchers may not know how smoothly the precision matrix varies from one time point to the next. In this case, it is useful to estimate the autoregressive parameter M from the data alongside other unkown parameters rather than arbitrarily fixing it to a specific value. From an application standpoint, it is reasonable to assume that this autoregressive parameter is static over time. We consider 37 2.5. Estimating Fixed Parameters 1 2 3 4 5 6 1000 1500 2000 2500 3000 3500 4000 4500 Effective Sample size Dimension of Σ−1t (a) Effective Sample Size 1 2 3 4 5 6 0 5 10 15 20 25 30 35 Entropy Loss Dimension of Σ−1t (b) Entropy Loss Figure 2.7: Effective sample size and the mean squared error obtained by increasing the number of observations n. the simple case M = ρI, where ρ is an unknown scalar; we also fix the other hyperparmeters to d = J + 1 and S = IJ , the identity matrix. A simple method to estimate the static autoregressive scalar parameter ρ, given the data, is to propose different values of ρ on a grid that cover the support of possible values of ρ, and evaluate the marginal likelihood corresponding to each proposed ρ. A parameter estimate is then obtained by choosing the value of ρ that maximizes the marginal likelihood. To illustrate how this method works, we perform simulations by taking a grid of values of ρ in the (0, 1) range and using SMC with the non-central Wishart as a proposal to compute the marginal likelihood for each value of ρ on that grid. The marginal likelihood is straight-forward to obtain from the SMC algorithm using (2.21) and (2.22). Moreover, for ρ = 0 and ρ = 1, we can compute the marginal likelihood analytically as the non-centrality parameter becomes 0 and the non-central Wishart distribution becomes a central Wishart distributon which is conjugate and a closed form solution is available (see Appendix B.1). The experiment is repeated for different underlying ‘true’ values of the parameter ρ = {0.7, 0.98, 0.999} and for dimensions J = {2, 3, 5}. The number of particles is fixed to K = 2500, in order to maintain a reasonable 38 2.5. 
Estimating Fixed Parameters ESS for all experiments. The number of time steps is fixed to T = 200 and the number of observations per time step is fixed to n = 1. The experiment was repeated 5 times in order to get some estimate of the Monte Carlo error. The error is obtained by computing the squared deviations from the average marginal likelihood over the five repetitions. Figure 2.8 shows plots of the marginal likelihood for the different dimen- sions with error bars to represent the Monte Carlo error obtained over the 5 repetitions. Figure 2.9 gives another depiction of the MC error from esti- mating the marginal likelihood. We note that as the number of dimensions increases, it becomes challenging to estimate the marginal likelihood. For example in Figure 2.8(a), the estimates of the marginal likelihood are in line with the theoretical marginal likelihood, however we can see that in Figure 2.8(b) and 2.8(c), for J = 3 and J = 5, the SMC estimate falls below the theoretical result. Moreover, the error bars get larger with an increase in dimension. Furthermore, for smaller values of the underlying true ρ, it seems more difficult to discriminate using the marginal likelihood since that seems mostly flat and only dips off past the true value. Taking a closer look at Figures 2.9, we note that the error in estimating the marginal likelihood becomes amplified as ρ gets closer to 1. This is because the proposal distribution tends to a point mass as ρ gets close to 1. The covariance matrix is not changing much in time as ρ→ 1, so we end up proposing the same covariance matrix with probability very close to 1. This also affects computations numerically, as we are required to take the Cholesky decomposition with values very close to 0 in the sampling of the non-central Wishart. 39 2.5. Estimating Fixed Parameters 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 −900 −800 −700 −600 −500 −400 −300 −200 −100 Marginal Likelihood for K=2500, n=1, J=2, and T=200 ρ Ma rgin al li keli hoo d ρ=0.7 ρ=0.98 ρ=0.999 (a) J = 2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 −1100 −1050 −1000 −950 −900 −850 −800 −750 Marginal Likelihood for K=2500, n=1, J=3, and T=200 ρ Ma rgin al li keli hoo d ρ=0.7 ρ=0.98 ρ=0.999 (b) J = 3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 −2400 −2200 −2000 −1800 −1600 −1400 −1200 Marginal Likelihood for K=2500, n=1, J=5, and T=200 ρ Ma rgin al li keli hoo d ρ=0.7 ρ=0.98 ρ=0.999 (c) J = 5 Figure 2.8: Marginal likelihood for different values of underlying ρ in varying dimensions, with T = 200 time steps. 40 2.5. Estimating Fixed Parameters 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 2 4 6 8 10 12 Error in Marginal Likelihood for K=2500, n=1, J=2, and T=200 ρ Err or i n M arg inal like liho od ρ=0.7 ρ=0.98 ρ=0.999 (a) J = 2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 2 4 6 8 10 12 14 Error in Marginal Likelihood for K=2500, n=1, J=3, and T=200 ρ Err or i n M arg inal like liho od ρ=0.7 ρ=0.98 ρ=0.999 (b) J = 3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 50 100 150 200 250 Error in Marginal Likelihood for K=2500, n=1, J=5, and T=200 ρ Err or i n M arg inal like liho od ρ=0.7 ρ=0.98 ρ=0.999 (c) J = 5 Figure 2.9: Error in the marginal likelihood for different values of underlying ρ in varying dimensions, with T = 200 time steps. 41 2.6. Particle MCMC for Fixed Parameter Estimation 2.6 Particle MCMC for Fixed Parameter Estimation The approach discussed in the previous section is one method that can be used to estimate a fixed a scalar M ; however, this method does not scale well for a more complex M . 
Particle Markov Chain Monte-Carlo (pMCMC) methods are a class of algorithms proposed by Andrieu et al. (2010) to build efficient proposal densities using SMC as part of an MCMC algorithm. This class of algorithms is particularly suited for estimation of parameters that are highly correlated or static over time. The SMC algorithm is used to generate a proposal, which is accepted with probability equal to a ratio of marginal likelihoods. In the next section we introduce the most basic pMCMC method, the particle independent Metropolis-Hastings algorithm. Later, we build on that algorithm to devise algorithms more suitable to the complexity of the multivariate longitudinal model. 2.6.1 The Particle Independent Metropolis Hastings Algorithm (PIMH) The particle independent Metropolis-Hastings (PIMH) is a simple particle MCMC algorithm, where SMC is used to propose a possible MCMC can- didate. As noted in Andrieu et al. (2010), PIMH on its own provides little advantage over SMC. This is mainly due to the computational complexity of PIMH. Therefore, given equal computational resources, SMC with enough particles can produce satisfactory results. Here, we present the algorithm as a stepping stone to introduce more complex algorithms that use SMC in combination with other MCMC transitions for estimating fixed parameters. The algorithm, detailed in Algorithm (2.3), proceeds to propose candi- dates Σ∗−11:T from an SMC approximation of p̂(. | Z1:T ). The new candidates, given the current state Σ−11:T are accepted with probability 1 ∧ p̂ ∗(Z1:T ) p̂(Z1:T )(s− 1) 42 2.6. Particle MCMC for Fixed Parameter Estimation where p̂∗(Z1:T ) is the marginal likelihood given the new candidate and p̂(Z1:T )(s − 1) is the marginal likelihood obtained at the previous itera- tion. This essentially means that if the marginal likelihood improves under the proposed candidate, it will be accepted. The PIMH algorithm is ergodic under weak conditions and leaves p(Σ−11:T | Z1:T ) invariant. Furthermore, the acceptance probability of the algorithm converges to 1 as the number of particles K →∞. Algorithm 2.3 Particle Independent Metropolis Hastings Algorithm (PIMH) Initialization, s = 0 • Run the SMC Algorithm targeting p(Σ−11:T | Z1:T ) and sample Σ−11:T (0) ∼ p̂(. | Z1:T ) and denote p̂(Z1:T )(0) the associated marginal likelihood estimate. For each subsequent MCMC iteration s ≥ 1 • Run an SMC Algorithm targeting p(Σ−11:T | Z1:T ) and sample Σ−1∗1:T ∼ p̂(. | Z1:T ) and denote p̂∗(Z1:T ) the associated marginal likelihood es- timate. • With probability 1 ∧ p̂ ∗(Z1:T ) p̂(Z1:T )(s− 1) set Σ−11:T (s) = Σ −1∗ 1:T and p̂(Z1:T )(s) = p̂ ∗(Z1:T ) otherwise setΣ−11:T (s) = Σ−11:T (s− 1) and p̂(Z1:T )(s) = p̂(Z1:T )(s− 1). In the next set of simulations, we implement the PIMH algorithm to pro- pose candidates from p(Σ−11:T | Z1:T ). The SMC algorithm with the prior as an importance distribution described in 2.1 is used as a mechanism to gen- erate proposals. The marginal likelihood is easily obtained using equation (2.21). For a fixed ρ = 0.98IJ , S = I, J = 3, and n = 1, data are generated and results are obtained by running the PIMH algorithm. We consider the effect resulting from increasing the number of particles K to 5000 (Figure 2.10(a)), while fixing the other parameters as above. In the first pane of 43 2.6. Particle MCMC for Fixed Parameter Estimation Figure 2.10(a), the acceptance probability is plotted as a function of the number of particles. 
The acceptance probability seems to follow an increas- ing trend with an increase in number of particles, which is what would be expected. The second pane is the ESS obtained from each SMC proposal within PIMH. As expected, ESS is increasing linearly as a function of the number of particles. The third pane shows an estimate of the marginal like- lihood obtained from SMC, as a function of increasing number of particles. This does not appear to give very fluctuating results. The results of the second set of experiments are outlined in Figure 2.10(b), where everything is held constant, except for the number of MCMC iterations which is increased up to s = 1000. The number of particles K is fixed to 1000. The first pane shows that the acceptance probability of the algorithm remains more or less constant, fluctuating between 12 % and 35%. Similarly, the ESS obtained from each SMC proposal within PIMH is sta- ble. This is good because the acceptance probability should not change as a function of the number of MCMC iterations. Finally, the marginal likelihood in the third pane is consistent with results from the previous experiment. The last Figure 2.10(c) shows the results of running the PIMH algorithm for a fixed number of particles K = 10000 and a fixed number of MCMC iteration s = 500, but this time increasing the length of the time series to T = 500. In this case, the acceptance probability seems to suffer consider- ably as T increases, which is expected given that this increases the degree of difficulty of the problem we are trying to solve. 44 2.6. Particle MCMC for Fixed Parameter Estimation 0 5000 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 Acceptance Probability Particles 0 5000 0 500 1000 1500 2000 2500 3000 3500 Effective Sample size Particles 0 5000 −389 −388.5 −388 −387.5 −387 −386.5 Marginal Likelihood Particles (a) Increasing the Number of Particles 0 500 1000 0.1 0.15 0.2 0.25 0.3 0.35 Acceptance Probability MC Iterations 0 500 1000 627.5 627.6 627.7 627.8 627.9 628 628.1 628.2 628.3 628.4 Effective Sample size MC Iterations 0 500 1000 −388.6 −388.4 −388.2 −388 −387.8 −387.6 −387.4 −387.2 −387 −386.8 −386.6 Marginal Likelihood MC Iterations (b) Increasing MCMC Iterations 0 500 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Acceptance Probability Time 0 500 580 590 600 610 620 630 640 650 Effective Sample size Time 0 500 −2000 −1800 −1600 −1400 −1200 −1000 −800 −600 −400 −200 0 Marginal Likelihood Time (c) Increasing the Number of Time Steps Figure 2.10: PIMH Simulation results showing the acceptance probability, the ESS and marginal liklihood as a function of Particles, MC iterations and number of times steps. 45 2.6. Particle MCMC for Fixed Parameter Estimation 2.6.2 The Particle Marginal Metropolis Hastings Algorithm (PMMH) As mentioned previously, PIMH on its own offers little in terms of compu- tational efficiency compared with SMC. However, in the presence of static parameters, particle MCMC methods become very useful. In the context of the Wishart autoregressive process, the autoregressive parameter M is assumed to be static. The particle marginal Metropolis-Hastings (PMMH) algorithm (Andrieu et al., 2010) builds a proposal for p(Σ−11:T ,M | Z1:T ). This is done by proposing joint updates of Σ−11:T and M . The proposed pair (Σ∗−11:T ,M ∗) is accepted with probability 1 ∧ p̂M∗(Z1:T )p(M ∗) p̂M (Z1:T )p(M) q(M |M∗) q(M∗ |M) . (2.32) If accepted, the update (Σ−11:T ,M)→ (Σ∗−11:T ,M∗) is made and a new pair is proposed. A general PMMH algorithm is detailed in Algorithm (2.4). 
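One MCMC sweep of this scheme needs nothing beyond an SMC routine that returns a sampled trajectory together with its marginal-likelihood estimate. A generic sketch is given below; smc_run, log_prior and propose are placeholders for problem-specific pieces and do not refer to the original implementation.

import numpy as np

def pmmh(Z, M0, smc_run, log_prior, propose, n_iter, seed=None):
    """Generic particle marginal Metropolis-Hastings.

    smc_run(Z, M)   -> (trajectory, log marginal-likelihood estimate) from one SMC pass
    log_prior(M)    -> log p(M)
    propose(M, rng) -> (M_new, log Hastings correction log[q(M | M*) / q(M* | M)])"""
    rng = np.random.default_rng(seed)
    M = M0
    traj, log_ml = smc_run(Z, M)
    chain = []
    for _ in range(n_iter):
        M_new, log_hastings = propose(M, rng)
        traj_new, log_ml_new = smc_run(Z, M_new)
        log_alpha = (log_ml_new + log_prior(M_new)
                     - log_ml - log_prior(M) + log_hastings)
        if np.log(rng.uniform()) < log_alpha:     # accept the pair (M*, Sigma*_{1:T})
            M, traj, log_ml = M_new, traj_new, log_ml_new
        chain.append((M, log_ml))
    return chain

In the runs reported below, smc_run would be one of the SMC filters above and propose the truncated-Gaussian random walk described next.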
As a first step, we consider a scalar static parameter such that M = ρIJ . We propose candidates from pρ(Σ −1 1:T | Z1:T ), using an SMC proposal described in Algorithm (2.2). The distribution p(ρ) is taken to be uniform on the interval (0, 1) and so the probability density function is equal to 1, for all values of ρ. The transition density for each ρ∗ ∼ q(. | ρ(s − 1)) is a random walk, centered at the previous value such that ρ∗ ∈ (0, 1) : ρ∗ = ρ(s−1) + c, (2.33) where c is a step size tuning parameter and  ∼ N (0, 1). In order to propose new values of ρ∗, we sample from a univariate truncated Gaussian distribu- tion. Sampling from this distribution is done using the algorithm of Robert (1995). The Hastings component of the ratio in (2.32) is given by q(ρ | ρ∗) q(ρ∗ | ρ) = Φ ( b−ρ∗ c ) − Φ ( a−ρ∗ c ) Φ ( b−ρ c ) − Φ (a−ρc ) , (2.34) where a and b are the lower and upper limits of truncation, ρ is the current 46 2.6. Particle MCMC for Fixed Parameter Estimation Algorithm 2.4 Particle Marginal Metropolis-Hastings (PMMH) Step 1: Initialization, s = 0 • Set M(0) arbitrarily and • Run an SMC algorithm targeting pM(0)(Σ−11:T | Z1:T ). Sample Σ−11:T (0) ∼ p̂M(0)(. | Z1:T ) and let p̂M(0)(Z1:T ) denote the estimated marginal likelihood. Step 2: For iteration s ≥ 1, • Sample M∗ ∼ q(. |M(s− 1)). • Run an SMC algorithm targeting pM∗(Σ−11:T | Z1:T ). Sample Σ−1∗1:T ∼ p̂M∗(. | Z1:T ) and let p̂M∗(Z1:T ) denote the estimated marginal likeli- hood. • Accept the pair {M∗,Σ−11:T } with probability 1 ∧ p̂M∗(Z1:T )p(M ∗) p̂M(s−1)(Z1:T )p(M) q(M(s− 1) |M∗) q(M∗ |M(s− 1)) . set M(s) = M∗, Σ1:T (s) = Σ∗1:T and p̂M(s)(Z1:T ) = p̂M∗(Z1:T ), other- wise set M(s) = M(s − 1), Σ−11:T (t) = Σ−11:T (s − 1) and p̂M(s)(Z1:T ) = p̂M(s−1)(Z1:T ). 47 2.6. Particle MCMC for Fixed Parameter Estimation value and ρ∗ is a new proposed value centered at ρ. In this case a = 0 and b = 1 are the truncation points. The tuning parameter of the proposal distribution c is typically chosen such that the acceptance probability is in the 20%- 30% range. Given this choice of target and proposal density, the acceptance proba- bility in (2.32) becomes 1 ∧ p̂ρ∗(Z1:T ) p̂ρ(s−1)(Z1:T ) Φ ( 1−ρ∗ c ) − Φ ( −ρ∗ c ) Φ ( 1−ρ(s−1) c ) − Φ (−ρ(s−1) c ) , (2.35) where the marginal likelihood is obtained readily from the SMC algorithm using equation (2.21). This PMMH sampler is ergodic and leaves p(ρ,Σ−11:T | Z1:T ) invariant. To test the PMMH algorithm, we conduct a set of simulations where we consider the computational efficiency of the algorithm by looking at the traceplots of ρ obtained from the PMMH algorithm as a function of an increasing number of particles K as c increases. We fix the underlying ’true’ ρ = 0.98, S = IJ and d = J+1, for J = 3 and n = 1. The starting value for ρ was set to 0.5. The resulting traceplots of the posterior distribution of ρ are shown in Figure 2.11. The acceptance probability appears to be decreasing as c increases and increasing as a function of the number of particles. For the particular data set we simulated, the acceptance probability goes from 16% to 29% for c = 0.2 as K increases from 250 to 1000. For a fixed K = 1000, increasing c decreases the acceptance probability from 54% to 19%. Moreover, a larger c allows a faster exploration of the parameter space, requiring fewer MCMC iterations. After 5000 MCMC iterations, we can note that the median value of ρ, given in Table 2.1, is closer to the true value with a larger c. 48 2.6. 
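The truncated-Gaussian random walk for ρ, together with the Hastings correction in (2.34), might be coded as follows. This is an illustrative sketch: the Gaussian kernels cancel in the correction, which is written here with the truncation mass of the proposal centred at the current value in the numerator.

import numpy as np
from scipy.stats import truncnorm, norm

def propose_rho(rho, c, rng):
    """Random-walk proposal on (0, 1), plus the log Hastings correction."""
    a, b = 0.0, 1.0
    rho_new = truncnorm.rvs((a - rho) / c, (b - rho) / c,
                            loc=rho, scale=c, random_state=rng)
    # log of the truncation mass of a N(x, c^2) proposal restricted to (0, 1)
    log_mass = lambda x: np.log(norm.cdf((b - x) / c) - norm.cdf((a - x) / c))
    return rho_new, log_mass(rho) - log_mass(rho_new)

# e.g. rho_star, log_h = propose_rho(0.5, c=0.2, rng=np.random.default_rng(7))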
Figure 2.11: Posterior traceplots of the parameter ρ obtained using the PMMH algorithm for K ∈ {250, 500, 1000} and c ∈ {0.001, 0.01, 0.2, 0.5}; acceptance probabilities range from about 9% (K = 250, c = 0.5) to about 54% (K = 1000, c = 0.001).

              c = 0.001   c = 0.01   c = 0.2   c = 0.5
   K = 250       0.53       0.79       0.77      0.78
   K = 500       0.51       0.79       0.77      0.78
   K = 1000      0.51       0.75       0.77      0.78

Table 2.1: Posterior median of ρ obtained for different values of c and K.

From the results of Table 2.1 and Figure 2.11, we see that the true value of ρ is not what the algorithm returns as the posterior median. However, when we consider the grid method for estimating ρ, discussed in the previous section, we obtain a similar result. In Figure 2.12, we plot the marginal likelihood, estimated using SMC, as a function of the values of ρ taken on a grid; the maximum is reached at ρ = 0.83, which is close to the posterior median returned from the PMMH algorithm depicted in Figure 2.13.

Figure 2.12: Plot of the marginal likelihood obtained from running SMC with different values of ρ, with K = 1000 particles. The maximum is reached at ρ = 0.83.

Figure 2.13: Plot of the posterior estimate of ρ obtained using pMCMC with c = 0.2 and K = 1000.

We would also like to consider the posterior distribution of Σ^{-1}_{1:T} obtained from the PMMH algorithm. However, rather than considering Σ^{-1}_{1:T} directly, we transform the particles from Σ^{-1}_{1:T} to R_{1:T}. We used the data from the experiment with c = 0.2 and K = 1000. In Figures 2.14(a), 2.14(b) and 2.14(c), we plot the median marginal posterior of each off-diagonal element of R_t, obtained from the PMMH algorithm, as a function of time. We also highlight the 95% credible interval, as well as the true value of each element of R_t used to generate the data. From these plots, we notice that the median estimate of the marginal elements of R_t seems to get closer to the truth as the series progresses. The credible interval is quite wide, indicating our uncertainty about the posterior. Intuitively, the marginal prior is flat and, in this case with n = 1, the likelihood is not very informative, resulting in a posterior that is quite spread out.
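The transformation from precision particles to the correlation scale used for these plots is elementary; for completeness, a one-function sketch (the name is ours):

import numpy as np

def precision_to_correlation(prec):
    """Map a precision matrix Sigma^{-1} to the correlation matrix R of Sigma,
    via the decomposition Sigma = D R D with D the diagonal of standard deviations."""
    Sigma = np.linalg.inv(prec)
    sd = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(sd, sd)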
Particle MCMC for Fixed Parameter Estimation 0 10 20 30 40 50 60 70 80 90 100 −1 0 1 time R12 True Correlations Median Corrleation (a) Marginal distribution of R12 0 10 20 30 40 50 60 70 80 90 100 −1 0 1 time R13 True Correlations Median Corrleation (b) Marginal distribution of R13 0 10 20 30 40 50 60 70 80 90 100 −1 0 1 time R23 True Correlations Median Corrleation (c) Marginal distribution of R23 Figure 2.14: Plot of the true correlation and the median estimated correla- tion along with a 95% credible interval, obtained using the PMMH algorithm estimating M and fixing S. 52 2.6. Particle MCMC for Fixed Parameter Estimation In the next set of simulations, we compare the covariance matrix esti- mates obtained from the PMMH algorithm where ρ is sampled using PMMH compared to being set to an arbitrary value. Thirty different data sets are generated using a fixed value of ρ = 0.4 with S = IJ and n = 1, J = 3. In each case, we compared the average entropy loss (defined in (2.31)). The experiment was repeated for ρ = 0.80 and ρ = 0.95. The results are outlined in Figure 2.15. In Figure 2.15(a), we consider the boxplots of the average entropy loss for three models where ρ is fixed, and one where ρ is sampled using PMMH. We can see that PMMH produced as good results as fixing to a value close to the true parameter. The loss was smaller on average com- pared to what is obtained by mis-specifying the model. In Figure 2.15(b), we show boxplots for the average entropy loss when the underlying true value of ρ is set to 0.8. The PMMH algorithm performs similar to the other algo- rithms with fixed values; however, it gives more consistent results between data sets. The average loss computed from the other algorithms seems to be more variable across data sets. In the last Figure 2.15(c), boxplots of the average entropy loss are produced for an underlying ρ value of 0.95. It appears that PMMH performs a little worse than a properly specified parameter, but does similar than a mis-specified model. In the previous simulation, the tuning parameter c was fixed at 0.3 across all experiments. This resulted in a wide range of acceptance probabilities plotted in Figure 2.16. The acceptance probability is very data dependent. This means that when considering a problem, one must ensure that the PMMH algorithm is tuned specifically to that problem. This is done by ad- justing c and the number of particles K to produce acceptance probabilities in the 20%-30% range. 53 2.6. Particle MCMC for Fixed Parameter Estimation 5 10 15 20 25 30 35 40 45 50 Fixed at 0.5 Fixed at 0.75 Fixed at 0.98 PMMH Ave rag e E ntro py Los s True ρ =0.4 (a) The true ρ underlying the data is set to 0.40 5 10 15 20 25 30 35 Fixed at 0.5 Fixed at 0.75 Fixed at 0.98 PMMH Ave rag e E ntro py Los s True ρ =0.8 (b) The true ρ underlying the data is set to 0.80 5 10 15 20 25 30 35 40 45 50 Fixed at 0.5 Fixed at 0.75 Fixed at 0.98 PMMH Ave rag e E ntro py Los s True ρ =0.95 (c) The true ρ underlying the data is set to 0.95 Figure 2.15: Average entropy loss for Σ−1 computed over 30 independent experiments under a fixed ρ and using the PMMH algorithm. 54 2.6. 
Figure 2.16: Acceptance probability of PMMH across 30 independent data sets generated with true ρ = 0.40, 0.80 and 0.95 (panels (a)-(c)).

2.6.3 Estimating Both M and S

In this section, we extend the model to account for both an unknown but static M and S. As mentioned earlier, while M controls the temporal dynamics between covariance matrices at different time steps, S controls the spread of each covariance matrix at each time step. We extend the PMMH algorithm introduced in the previous section to jointly propose candidates for M and S. The algorithm is identical to the one outlined in Algorithm (2.4), except that we now propose a new triplet (M*, S*, Σ^{-1*}_{1:T}). We assume that M and S are independent a priori, so that the acceptance probability of the MH ratio is given by

1 ∧ [ p̂_{M*,S*}(Z_{1:T}) p(M*) p(S*) q(M | M*) q(S | S*) ] / [ p̂_{M,S}(Z_{1:T}) p(M) p(S) q(M* | M) q(S* | S) ].

As before, if accepted, the update (M, S, Σ^{-1}_{1:T}) → (M*, S*, Σ^{-1*}_{1:T}) is made; otherwise the current state is kept at (M, S, Σ^{-1}_{1:T}). This process is repeated.

For simplicity, we begin by exploring this idea for the scalar parameters such that M = ρI_J and S = s²I_J. We fix d = J + 1 as before. The prior p(ρ) is uniform on (0, 1) as before and therefore cancels in the numerator and the denominator. As for s², we assume an inverse-χ² prior distribution. We propose candidates for p_{M,S}(Σ^{-1}_{1:T} | Z_{1:T}) using the SMC proposal of Algorithm (2.1). We can sample ρ, as previously, from a univariate Gaussian truncated to the (0, 1) region, centred at the current value of ρ and with variance c₁². As for S, there are a variety of proposals that can be used to generate candidates (Browne, 2006). For simplicity, we use another Gaussian truncated to the (0, ∞) region, centred at the current value of s² and with variance c₂². The above MH ratio becomes

1 ∧ [ p̂_{ρ*,s*²}(Z_{1:T}) p(s*²) / ( p̂_{ρ,s²}(Z_{1:T}) p(s²) ) ] × [ Φ((1−ρ*)/c₁) − Φ(−ρ*/c₁) ] / [ Φ((1−ρ)/c₁) − Φ(−ρ/c₁) ] × [ 1 − Φ(−s*²/c₂) ] / [ 1 − Φ(−s²/c₂) ],   (2.36)

where p̂_{ρ*,s*²}(Z_{1:T}) is readily available from the SMC algorithm. In order to test this algorithm, we generate data from a model with underlying ρ = 0.98 and S = 1. We generate a series of length T = 100 and run the algorithm using K = 2000 particles for 2000 MCMC iterations. In Figures 2.17(a) and 2.17(b), we monitor the posterior distribution of M and S respectively as we vary c₁ and c₂; we initialize both ρ(0) = s(0) = 0.5. Here we notice, similar to what was noted previously, that the acceptance probability decreases as a function of c₁ and c₂.
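Extending the single-parameter sampler to the pair (ρ, s²) only requires a second random-walk proposal with one-sided truncation. A sketch of that extra piece (the name is ours), again written with the truncation mass at the current value in the numerator of the correction:

import numpy as np
from scipy.stats import truncnorm, norm

def propose_s2(s2, c2, rng):
    """Random-walk proposal for s^2 on (0, inf): truncated N(s2, c2^2),
    plus the log ratio of the one-sided truncation normalizers."""
    s2_new = truncnorm.rvs(-s2 / c2, np.inf, loc=s2, scale=c2, random_state=rng)
    log_hastings = (np.log(1.0 - norm.cdf(-s2 / c2))
                    - np.log(1.0 - norm.cdf(-s2_new / c2)))
    return s2_new, log_hastings

The inverse-χ² log-prior on s² is then added to the log acceptance ratio along with the SMC marginal-likelihood estimates; the uniform prior on ρ contributes nothing.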
Figure 2.17: Posterior traceplots of ρ (panel (a)) and s² (panel (b)) obtained using the PMMH algorithm with fixed K = 2000 while simultaneously increasing c₁ and c₂; acceptance probabilities range from about 53% (c₁ = c₂ = 0.05) to about 7% (c₁ = 0.5, c₂ = 1).

In Figures 2.18(a) and 2.18(b), we consider the posterior distributions obtained for ρ and S when c₁ = c₂ = 0.2, which gave an acceptance probability of 27%. In this case, for this particular data set, and similar to when we were estimating only ρ, we find that the posterior distribution of ρ has a median value of 0.73 and that of s² has a median value of 1.04.

Figure 2.18: The posterior distributions of ρ and s² obtained using the PMMH algorithm for c₁ = c₂ = 0.2 (panel (a): ρ, median 0.736, 95% CI [0.527, 0.868]; panel (b): s², median 1.04, 95% CI [0.765, 1.44]).

2.7 Summary and Conclusion

In this Chapter, we have proposed a model that can be useful for understanding the dynamic correlations (or co-movement) in multidimensional longitudinal data. Using SMC, with fixed hyperparameters and given enough particles, we are able to estimate the correlation matrix reasonably well over time. However, in many cases the interest lies in estimating the hyperparameters, which are assumed to be static. We have shown that, through the use of particle MCMC, we are able to develop a valid algorithm to infer static as well as dynamic parameters.

As a final illustration of the different methods considered in this Chapter, we compare the results of the different algorithms on the same data set. In particular, we look at where the true precision parameters lie in comparison to the posterior median and 95% credible interval of the estimated parameter. In Figure 2.19 we plot the posterior distribution of the precision over time for a single dimension (J = 1). For this particular simulation, we generated data with n = 1, T = 200, a fixed underlying true ρ = 0.98, d = J + 1, and S = 1. We then ran the different algorithms with K = 2000 particles. For the PMMH algorithm, we ran 2000 MCMC iterations. When estimating only M, we set the step size to c = 0.1, giving a 30% acceptance probability. For estimating both M and S, we set c₁ = 0.05 and c₂ = 1, giving a 24% acceptance probability. We can see that all the algorithms produce very similar estimates of the median precision. The results are similar both when we estimate the hyperparameters and when we fix them to their true values.
Figure 2.19: Posterior distribution of the estimated precision over time in the univariate case (J = 1), with 95% credible intervals, using the different algorithms: (a) the non-central χ² proposal, (b) the central χ² proposal, (c) the PMMH algorithm estimating only M, and (d) the PMMH algorithm estimating M and S.

To obtain the results in Figures 2.20, 2.21 and 2.22, we generated data with n = 1, J = 3, T = 100, a fixed underlying true M = 0.98 I_J, d = J + 1, and S = I_J. We then ran the different algorithms (SMC with fixed M and S, PMMH with fixed S, and PMMH with neither parameter fixed) with K = 2000 particles. For the PMMH algorithm, we ran 2000 MCMC iterations. When estimating only M, we set the step size to c = 0.25, resulting in a 32% acceptance probability. When estimating both M and S, we set c_1 = 0.08 and c_2 = 0.8, resulting in a 22% acceptance probability. After obtaining the estimates of Σ^{-1}, we transform to R and consider the median off-diagonal elements as well as the 95% credible interval. In comparing the three methods, we note that the median estimate is very similar from one algorithm to the next. From Figure 2.20, however, we notice that the SMC algorithm begins to degenerate. The particle MCMC algorithm in Figures 2.21 and 2.22 does not degenerate, even if the SMC proposal does. This is what we would expect because, by construction, the PMMH algorithm uses SMC only to obtain one sample path approximately distributed according to the posterior; we do not need a good SMC approximation of the joint. Finally, we compare the time it takes to run each algorithm (Table 2.2); as expected, PMMH takes considerably longer to run than SMC.

          SMC (NCW Prop)   SMC (CW Prop)   PMMH (Est. M)   PMMH (Est. M and S)
J = 1          0.5              5.0             31.5               35.8
J = 3          0.3               –             611.9              578.9

Table 2.2: Time to run the different algorithms, in minutes.

The drawback of any Metropolis-Hastings scheme is that it depends heavily on the proposal distribution. The proposal we developed here as part of the pMCMC algorithm needs tuning, with respect to the step size of the random walk, for each data set. As we have seen in our many simulations, too small a value of c results in most candidates being accepted, but the algorithm moves very slowly and a longer chain is required to explore the space. Too large a value of c allows larger moves, but fewer candidates are accepted. Future work needs to explore methods to adaptively and automatically tune these parameters to optimize mixing (Gilks et al., 1998). Theoretical and empirical results suggest that the acceptance probability should be in the 15–50% range for a random walk algorithm (Robert, 1996). Finally, a better implementation of a method that evaluates the 0F1 hypergeometric function in higher dimensions needs to be considered. It is important that the method be efficient, in order to keep the computation time reasonable. This would be particularly helpful for data sets with a large number of observations n.
Figure 2.20: Plot of the true correlation and the median estimated correlation, along with a 95% credible interval, obtained using the SMC algorithm with the non-central proposal: (a) marginal distribution of R12, (b) marginal distribution of R13, (c) marginal distribution of R23.

Figure 2.21: Plot of the true correlation and the median estimated correlation, along with a 95% credible interval, obtained using the PMMH algorithm estimating M and fixing S: (a) marginal distribution of R12, (b) marginal distribution of R13, (c) marginal distribution of R23.

Figure 2.22: Plot of the true correlation and the median estimated correlation, along with a 95% credible interval, obtained using the PMMH algorithm estimating both M and S: (a) marginal distribution of R12, (b) marginal distribution of R13, (c) marginal distribution of R23.

Chapter 3

Static Correlation

3.1 Introduction

Estimation of the association structure in the multivariate case, under certain settings, simplifies greatly; for example, when the data under consideration are multivariate but not longitudinal, or longitudinal but not multivariate. Referring back to Figure 1.1 in the Introduction, this corresponds to a slice of the cube with dimensions n × J for a slice T = 1 if the data are non-longitudinal, or n × T for J = 1 when the data are longitudinal but not multivariate.

Moreover, when dealing with problems of high dimension, it can be very useful from a computational viewpoint to reduce the number of parameters to be estimated. This is accomplished by imposing a structure of association on the covariance matrix. Furthermore, in certain cases, the researcher may be interested in comparing different hypotheses of patterns of association. In other cases, the data might follow a natural grouping, where certain subsets of variables will express a higher degree of association within, and less association between, other groups of variables.

Imposing a structure of association is done by imposing constraints that force the inverse covariance to lie in a lower-dimensional space. The number of parameters to be estimated is reduced, which provides efficiency gains and facilitates interpretation. Nonetheless, without a priori knowledge or subject matter expertise, it might be difficult to know what structure to impose. In this case, it is possible to learn the structure from the data; however, this also adds to the computational effort.
If one is willing to make the assumption that the observed data follow a multivariate Gaussian distribution, the estimation of the covariance matrix under this setting is straightforward. It can be achieved by appealing to the well-developed theory of covariance selection in Gaussian graphical models (Dempster, 1972). The structure of association among variables is imposed through conditional independence, by fixing certain off-diagonal elements of the inverse covariance matrix to zero.

In this Chapter, we present methodology used for the Bayesian estimation of a full as well as a constrained covariance and correlation matrix in the context of Gaussian graphical models. We first present some of the well-established methodology used for Gaussian data. These methods are then extended to accommodate binary data, using the multivariate probit model. Estimation of the full and constrained covariance for binary data was addressed in Tabet (2007). Here, we build on this work and provide an extension to learn the structure of association underlying the binary data. Throughout this chapter we assume the data to be mean-centered at 0 with no predictors in the model.

3.2 Conditional Independence

The concept of conditional independence plays a key role in how constraints are imposed on the association between variables in this Chapter. Therefore, it is useful to consider its definition and interpretation more closely.

The covariance matrix Σ, or equivalently the correlation matrix R, encodes the marginal association between variables. Usually, when two variables exhibit a high marginal association, the pairwise estimate of their correlation or covariance is high. However, these two variables may both be affected by a third variable, which acts as a mediating variable or confounder. For example, consider the following three variables: smoking, drinking coffee and lung cancer. Drinking coffee and lung cancer may express a high degree of association; however, once information about smoking is available, drinking coffee and lung cancer become decoupled, because smoking accounts for the association between them. In order to filter out the indirect dependence caused by intermediate or confounding variables, one can control for all other variables in the model by considering the conditional dependence.

In the context of Gaussian data, conditional dependence is encoded by the precision matrix, the inverse covariance Σ^{-1}. Let σ^{ij} denote the ij-th element of Σ^{-1}. When σ^{ij} = 0, variables i and j are conditionally independent given all the other variables in the model. In general, most variables are marginally correlated to some degree, so a marginal correlation of zero is a strong indicator of independence; conditional independence, on the other hand, accounts for confounding variables and indirect associations, and therefore provides a stronger measure of direct dependence. Gaussian graphical models provide a convenient framework for imposing a conditional independence structure of association between Gaussian variables.
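A small numerical illustration of this point (not taken from the thesis) is given below: two variables constructed to depend on a common mediating variable have a clearly non-zero marginal correlation, yet the corresponding off-diagonal entry of the precision matrix is essentially zero. Variable names mirror the smoking example only loosely and are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A mediating-variable structure analogous to the smoking example:
# both x1 ("coffee") and x2 ("cancer") depend on z ("smoking"),
# but are conditionally independent given z.
z = rng.standard_normal(n)
x1 = 0.8 * z + rng.standard_normal(n)
x2 = 0.8 * z + rng.standard_normal(n)

data = np.column_stack([x1, x2, z])
Sigma = np.cov(data, rowvar=False)    # marginal associations
Omega = np.linalg.inv(Sigma)          # precision matrix: conditional associations

print(np.corrcoef(x1, x2)[0, 1])      # clearly non-zero marginal correlation (about 0.39)
print(Omega[0, 1])                    # approximately 0: x1 and x2 are conditionally independent given z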
3.3 Gaussian Graphical Models

Gaussian graphical models are a class of undirected graphical models that provide a graphical representation of the inverse covariance matrix Σ^{-1}, characterizing conditional independence between Gaussian variables. An edge is drawn between any two nodes in the graph unless these two nodes are conditionally independent given all the other variables in the model. This way, graphs provide a clear representation of the interrelationships between the variables in the model. This approach to modeling data is known as covariance selection (Dempster, 1972). The introduction of the hyper-inverse Wishart distribution by Dawid and Lauritzen (1993) as a conjugate prior for structured covariance matrices was central in the development of Bayesian approaches for inference in this class of models (Giudici (1996), Giudici and Green (1999), and Wong et al. (2003)).

3.4 Some Graph Theory

In this section, we present some key graph theory concepts and definitions necessary to understand the statistical models that will be developed in this Chapter. For a complete account of graphical model theory, the reader is referred to Lauritzen (1996).

An undirected graph is a pair G = (V, E), where V is a set of vertices representing variables and E, the edge set, is a subset of the set of unordered distinct pairs of vertices. Visually, each vertex i is a node representing the random variable i, and an edge (i, j) ∈ E is an undirected edge connecting nodes i and j unless they are conditionally independent. In the example of Figure 3.1(a), V1 ⊥⊥ V3 | V2 is used to denote that V1 is conditionally independent of V3 given V2. Below are some graph-theoretic definitions which we will call upon in this Chapter:

- A subgraph is a graph which has as its vertices some subset of the vertices of the original graph.
- A graph or a subgraph is complete or fully connected if there is an edge connecting any two nodes.
- A clique is a complete subgraph.
- Let A and B be subgraphs. A set S is said to separate A from B if all paths from A to B go through S.
- Subgraphs (A, B, S) form a decomposition of G if V = A ∪ B and S = A ∩ B, where S is complete and separates A from B.
- A sequence of subgraphs that cannot be further decomposed are the prime components of a graph.
- A graph is said to be decomposable if every prime component is complete.
- A junction tree is a tree whose set of vertices is the set of prime components of G, such that for any two prime components Pi and Pj and any T on the path between Pi and Pj, Pi ∩ Pj ⊂ T. This representation is critical for all computations done in graphical models.
- A distribution p(·) is Markov with respect to a graph G if, for any decomposition (A, B) of G, A ⊥⊥ B | A ∩ B, where A ∩ B is complete and separates A from B.
- If the joint distribution of a set of variables V is Markov with respect to a graph G, then it can be decomposed according to the graph structure as a ratio of products over prime components and separators,

p(V \mid G) = \frac{\prod_{P \in \mathcal{P}} p(V_P)}{\prod_{S \in \mathcal{S}} p(V_S)}, \qquad (3.1)

where \mathcal{P} is the set of all prime components and \mathcal{S} is the set of all separators.

Computational efficiencies are a direct consequence of the Markov properties in graphical models, because they give the ability to model subsets of variables locally.

In this research, we restrict our attention to decomposable graphs. Figure 3.1 is an example of a decomposable undirected graphical model, taken from Carvalho (2006). The original graph in 3.1(a) can be decomposed into its junction tree in 3.1(b). This graph has five cliques and four separators: {{1, 2, 5}, {2, 4, 5, 6}, {2, 3, 4}, {4, 6, 7}, {6, 7, 8, 9}} is the set of cliques and {{2, 5}, {2, 4}, {4, 6}, {6, 7}} is the set of separators.

Figure 3.1: A decomposable graph and its junction tree decomposition. Each node of the junction tree represents a clique, while vertices shared by adjacent nodes of the tree define the separators.
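As an illustration (not part of the thesis code), the cliques and separators of the graph in Figure 3.1 can be recovered numerically. The sketch below uses networkx: it checks chordality (equivalent to decomposability for undirected graphs) and builds a junction tree as a maximum-weight spanning tree over the maximal cliques, with edge weights given by the size of the clique intersections.

import itertools
import networkx as nx

# The 9-node graph of Figure 3.1(a), encoded by its cliques.
cliques = [{1, 2, 5}, {2, 4, 5, 6}, {2, 3, 4}, {4, 6, 7}, {6, 7, 8, 9}]
G = nx.Graph()
for c in cliques:
    G.add_edges_from(itertools.combinations(sorted(c), 2))

assert nx.is_chordal(G)  # decomposable <=> chordal

# Junction tree: complete graph on the maximal cliques, weighted by
# intersection size, then take a maximum-weight spanning tree.
maximal = [frozenset(c) for c in nx.chordal_graph_cliques(G)]
T = nx.Graph()
T.add_nodes_from(maximal)
for a, b in itertools.combinations(maximal, 2):
    if a & b:
        T.add_edge(a, b, weight=len(a & b))
jt = nx.maximum_spanning_tree(T)

separators = [set(a & b) for a, b in jt.edges()]
print(sorted(map(sorted, maximal)))
print(sorted(map(sorted, separators)))   # {2,5}, {2,4}, {4,6}, {6,7} for this graph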
3.5 Bayesian Estimation in Gaussian Graphical Models

Let Zi = (Zi1, . . . , ZiJ) denote the J-dimensional vector of continuous responses on the i-th observation (1 ≤ i ≤ n). Given a graph structure G, we assume that the Zi are i.i.d. NJ(0, Σ), following a matrix-variate normal distribution consistent with G. In this Chapter, we assume without loss of generality that the response is centered at 0, and the probability density function of the matrix-variate Gaussian distribution simplifies under the i.i.d. assumption to

f(Z \mid \Sigma, G) = \frac{1}{(2\pi)^{nJ/2}} |\Sigma|^{-n/2} \operatorname{etr}\left(-\tfrac{1}{2}\Sigma^{-1} Z Z'\right). \qquad (3.2)

In a Bayesian paradigm, given a graph structure, the inference is based on the posterior distribution

\pi(\Sigma \mid Z, G) \propto f(Z \mid \Sigma, G)\, \pi(\Sigma \mid G), \qquad (3.3)

where the likelihood f(Z | Σ, G) is given in (3.2) and π(Σ | G) is the prior on the covariance matrix.

3.5.1 Prior and Posterior for Covariance Matrices

When the graph G is fully connected, as in the example of Figure 3.2(a), this corresponds to a saturated model where all the elements of Σ^{-1} are different from 0 and no constraints are imposed.

Figure 3.2: An example of two different graph structures, resulting in a fully connected precision matrix in 3.2(a) versus a sparse precision matrix in 3.2(b).

In this case, the prior for Σ is the conjugate inverse Wishart prior, denoted Σ ∼ IW_J(ν, Ω), with degrees of freedom ν and inverse scale matrix Ω defined as in Dawid and Lauritzen (1993):

\pi(\Sigma) = \frac{\left|\tfrac{\Omega}{2}\right|^{\frac{\nu+J-1}{2}}}{\Gamma_J\!\left(\tfrac{\nu+J-1}{2}\right)}\, |\Sigma|^{-(\nu+2J)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left[\Sigma^{-1}\Omega\right]\right), \qquad (3.4)

where Γ_J denotes the multivariate gamma function. Note that this convention for the inverse Wishart distribution is different from the one used in the previous Chapter; refer to Appendix A.3 for the equivalence. The resulting posterior distribution is available analytically in closed form and is given by

\pi(\Sigma \mid Z, G) \propto f(Z \mid \Sigma, G)\, \pi(\Sigma \mid G)
\propto |\Sigma|^{-n/2} \operatorname{etr}\left(-\tfrac{1}{2}\Sigma^{-1} Z Z'\right) \times |\Sigma|^{-(\nu+2J)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left[\Sigma^{-1}\Omega\right]\right)
\propto |\Sigma|^{-(n+\nu+2J)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left[\Sigma^{-1}\left(\Omega + Z Z'\right)\right]\right),

which is an inverse Wishart, denoted IW(n + ν, Ω + ZZ'), with updated degrees of freedom and inverse scale matrix.

On the other hand, consider a structured graph as in the example in Figure 3.2(b). The precision matrix is sparse, with elements σ^{ij} = 0 whenever an edge is missing between variables i and j. In this case the inverse Wishart is no longer a valid prior distribution.

Under a structured graph assumption, the hyper-inverse Wishart distribution, defined by Dawid and Lauritzen (1993), is a locally conjugate family of Markov probability distributions for covariance matrices on decomposable graphs. Let Σ be a J × J covariance matrix consistent with the decomposable graph G; then Σ follows a hyper-inverse Wishart distribution, denoted HIW_J(ν, Ω, G), with degrees of freedom ν > 0 and location matrix Ω > 0, which can be decomposed as

\pi(\Sigma \mid \nu, \Omega, G) = \frac{\prod_{P \in \mathcal{P}} \pi(\Sigma_P \mid \nu, \Omega_P)}{\prod_{S \in \mathcal{S}} \pi(\Sigma_S \mid \nu, \Omega_S)}. \qquad (3.5)
The hyper-inverse Wishart distribution is the unique hyper-Markov distribution for Σ that is consistent with clique marginals that are inverse Wishart distributed. This means that for a given decomposable graph structure G, if Σ ∼ HIW_J(ν, Ω, G), then the covariance matrix of each prime component P follows an inverse Wishart distribution, Σ_P ∼ IW_J(ν, Ω_P).

As seen earlier, the multivariate Gaussian distribution is Markov with respect to any decomposable graph G, and therefore the joint likelihood of the Gaussian variables Z decomposes according to the graph structure as a ratio of products over prime components and separators,

f(Z \mid \Sigma, G) = \frac{\prod_{P \in \mathcal{P}} f(Z_P \mid \Sigma_P)}{\prod_{S \in \mathcal{S}} f(Z_S \mid \Sigma_S)}, \qquad (3.6)

where \mathcal{P} is the set of all prime components and \mathcal{S} is the set of all separators. We can therefore write the posterior distribution as

\pi(\Sigma \mid Z, G) \propto f(Z \mid \Sigma, G)\, \pi(\Sigma \mid G)
\propto \frac{\prod_{P \in \mathcal{P}} f(Z_P \mid \Sigma_P)}{\prod_{S \in \mathcal{S}} f(Z_S \mid \Sigma_S)} \cdot \frac{\prod_{P \in \mathcal{P}} \pi(\Sigma_P \mid \nu, \Omega_P)}{\prod_{S \in \mathcal{S}} \pi(\Sigma_S \mid \nu, \Omega_S)}
\propto \frac{\prod_{P \in \mathcal{P}} f(Z_P \mid \Sigma_P)\, \pi(\Sigma_P \mid \nu, \Omega_P)}{\prod_{S \in \mathcal{S}} f(Z_S \mid \Sigma_S)\, \pi(\Sigma_S \mid \nu, \Omega_S)}
= \frac{\prod_{P \in \mathcal{P}} \pi(\Sigma_P \mid Z_P)}{\prod_{S \in \mathcal{S}} \pi(\Sigma_S \mid Z_S)}.

The posterior distribution is also hyper-inverse Wishart, with updated parameters

\Sigma \mid (Z, \nu, \Omega, G) \sim HIW_J(n + \nu,\; \Omega + Z Z').

Efficient methods for sampling from the hyper-inverse Wishart distribution rely on the junction tree representation of the graph and on matrix results for inverse Wishart distributions (Carvalho and West, 2007). Intuitively, the zero structure is fixed in the inverse covariance, and matrix completion methods are used to ensure that the resulting covariance is positive semi-definite and unique.

3.5.2 Covariance Selection and Model Determination

If the structure of the inverse correlation matrix is unknown, it is possible to estimate it directly from the data. This is achieved by making the graph structure G an unknown parameter, assigning it a prior distribution π(G), and considering its posterior distribution for inference,

p(G \mid Z) \propto p(Z \mid G)\, p(G). \qquad (3.7)

The marginal likelihood is given by the following integral over the space of positive definite symmetric matrices consistent with the structure of G,

p(Z \mid G) = \int_{\Sigma \in M(G)} f(Z \mid \Sigma, G)\, \pi(\Sigma \mid G)\, d\Sigma, \qquad (3.8)

where M(G) is the set of all symmetric positive definite matrices consistent with G.

For decomposable Gaussian graphical models, the marginal likelihood p(Z | G) is easily computed because the covariance matrix can be integrated out. This is due to the fact that the hyper-inverse Wishart is a locally conjugate prior and the posterior admits a closed form. In the case of decomposable models, this integral is given by a ratio of normalizing constants from the prior and the posterior,

p(Z \mid G) = (2\pi)^{-nJ/2}\, \frac{h(G, \nu, \Omega)}{h(G, \nu_n, \Omega_n)}, \qquad (3.9)

where ν and Ω are the hyperparameters of the prior on Σ, ν_n = ν + n and Ω_n = Ω + ZZ' are the updated posterior hyperparameters, and h denotes the normalizing constant of the hyper-inverse Wishart, given by

h(G, \nu, \Omega) = \frac{\prod_{C \in \mathcal{C}} \left|\tfrac{\Omega_C}{2}\right|^{\frac{\nu+|C|-1}{2}} \Gamma_{|C|}\!\left(\tfrac{\nu+|C|-1}{2}\right)^{-1}}{\prod_{S \in \mathcal{S}} \left|\tfrac{\Omega_S}{2}\right|^{\frac{\nu+|S|-1}{2}} \Gamma_{|S|}\!\left(\tfrac{\nu+|S|-1}{2}\right)^{-1}}. \qquad (3.10)

The posterior distribution over graphs can be computed as

p(G \mid Z) = \frac{p(Z \mid G)\, \pi(G)}{\sum_{i=1}^{D_c} p(Z \mid G_i)\, \pi(G_i)}, \qquad (3.11)

where D_c is the total number of decomposable graphs. The marginal likelihood p(Z | G) can be computed using (3.9). If we assume that all decomposable graphs of any given size are equally likely, we can set a uniform prior π(G) = 1/D_c.
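For concreteness, a minimal sketch of the computation in (3.9)–(3.10) is given below, under the Dawid–Lauritzen parameterization used in this Chapter. The function names and the representation of cliques and separators as lists of variable labels are assumptions made for illustration; the result could be plugged directly into the graph move of (3.12).

import numpy as np
from scipy.special import multigammaln


def log_hiw_norm_const(cliques, separators, nu, Omega, idx):
    """log h(G, nu, Omega) from (3.10); `idx` maps variable labels to
    row/column positions in Omega."""
    def term(block):
        pos = [idx[v] for v in block]
        sub = Omega[np.ix_(pos, pos)]
        k = len(pos)
        a = (nu + k - 1) / 2.0
        sign, logdet = np.linalg.slogdet(sub / 2.0)
        return a * logdet - multigammaln(a, k)

    return sum(term(c) for c in cliques) - sum(term(s) for s in separators)


def log_marginal_likelihood(Z, cliques, separators, nu, Omega, idx):
    """log p(Z | G) for a decomposable graph, as in (3.9); Z is stored
    as an n x J array (rows are observations), so the J x J cross-product
    is Z.T @ Z."""
    n, J = Z.shape
    Omega_n = Omega + Z.T @ Z
    return (-0.5 * n * J * np.log(2 * np.pi)
            + log_hiw_norm_const(cliques, separators, nu, Omega, idx)
            - log_hiw_norm_const(cliques, separators, nu + n, Omega_n, idx))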
This prior on G, as noted by Armstrong et al. (2009), will tend to favour graphs of medium size, where the size of a graph is given by the number of its edges. There are other sparsity-inducing priors which can be considered: a Bernoulli prior on the inclusion of edges, for example, would penalize a large number of edges, and other priors are possible as well (Bornn and Caron, 2011). However, for illustration purposes and computational convenience, in this thesis we limit our consideration to the uniform prior π(G) = 1/D_c.

Enumerating all possible graph structures can be a very arduous task, since the number of possible graphs is 2^{J(J-1)/2}. For graphs with more than 6 nodes, the computational problem becomes daunting. As an alternative, for higher dimensions, we can sample the graph from its posterior distribution by appealing to well-developed MCMC methods for sampling decomposable graphs (Giudici and Green, 1999). The Monte Carlo sampling algorithm suggested by Giudici and Green (1999) to sample from the posterior distribution starts with a current graph G_c and proceeds by randomly selecting a pair of vertices such that the addition or the deletion of an edge between these two vertices would result in a decomposable graph. Because the current and proposed graphs differ by only one edge, these moves are considered local moves and provide very important computational gains. The proposed graph, denoted G_p, is then accepted according to the following Metropolis-Hastings acceptance probability:

\min\left\{1,\; \frac{p(Z \mid G_p)\, \pi(G_p)}{p(Z \mid G_c)\, \pi(G_c)}\right\}. \qquad (3.12)

This results in samples generated from the posterior distribution over all graphs, p(G | Z). It is important to note that most MCMC approximations of the posterior are expected to be slow to converge in these algorithms, particularly for large graphical models with many nodes.

3.6 Bayesian Estimation for Binary Data

In this section, we change the focus from a continuous response to an observed binary response. If the data under consideration are binary response variables, it is common to use the multivariate probit model, where the likelihood is written in terms of latent Gaussian variables. This is an extension of the linear model introduced in Chapter 1, using a probit link instead of the identity link. As discussed in the Introduction, in this class of models the covariance matrix needs to be constrained to a correlation matrix, as in Chib and Greenberg (1998), in order to ensure identifiability.

Let Yi = (Yi1, . . . , YiJ) denote the J-dimensional vector of observed binary 0/1 responses on the i-th subject (1 ≤ i ≤ n), and R = (rij) a J × J correlation matrix. In the multivariate probit model we have

\operatorname{pr}(Y_i = y_i \mid R) = \int_{A_{iJ}} \cdots \int_{A_{i1}} N_J(u; 0, R)\, du, \qquad (3.13)

where A_{ij} is the interval (0, ∞) if y_{ij} = 1 and (−∞, 0) otherwise, and N_k(u; m, Σ) is the density of a k-variate Gaussian distribution with argument u, mean vector m and covariance matrix Σ. Figure 3.3 provides a graphical representation of the multivariate probit model: the observed binary data are independent conditional on the latent Gaussian variables.

Estimation in this class of models is not simple, since a prior specification on R is not straightforward; there are no conjugate priors for correlation matrices. Several priors on R have been proposed in the literature. In an influential paper, Chib and Greenberg (1998) use a jointly uniform prior for the correlation, π(R) ∝ 1. This prior poses several problems.
Computationally, it results in a posterior that is not easy to sample from. Chib and Greenberg (1998) propose a Metropolis-Hastings (MH) random walk algorithm to sample the correlation matrix, drawing the correlation coefficients in blocks. The resulting proposal is not guaranteed to be a correlation matrix and, moreover, as with random walk algorithms in general, the mixing is slow in high dimensions. Furthermore, Barnard et al. (2000) show that this prior tends to push marginal correlations to zero in high dimensions, making it very informative marginally. Similarly, the Jeffreys prior used by Liu (2001) tends to push marginal correlations to ±1 in high dimensions.

Figure 3.3: A graphical representation of the multivariate probit model with J = 3. In this model Y represents the observed binary data and Z represents the latent correlated Gaussian data.

Barnard et al. (2000) propose using a marginally uniform prior for R given by

\pi(R) \propto |R|^{\frac{J(J-1)}{2} - 1} \left(\prod_{i=1}^{J} |R_{ii}|\right)^{-(J+1)/2}, \qquad (3.14)

where R_{ii} is the principal submatrix of R. This prior was suggested but not used in Zhang et al. (2006). Despite not being conjugate, this prior can be used within the context of a parameter expansion and data augmentation (PXDA) algorithm, as shown in Tabet (2007), to develop an efficient MCMC algorithm for inferring the posterior distribution of R.

Given n binary observations Y = (Y1, . . . , Yn) and the prior density π(R), we are interested in the posterior density

\pi(R \mid Y) \propto \pi(R) \prod_{i=1}^{n} \operatorname{pr}(Y_i \mid R).

The standard approach in the latent model is to introduce explicitly the latent variables Z = (Z1, . . . , Zn) and to consider instead

\pi(R, Z \mid Y) \propto \pi(R) \prod_{i=1}^{n} N_J(Z_i \mid R)\, I(Z_i \in A_i). \qquad (3.15)

To sample from π(R, Z | Y), we use an MCMC algorithm as in Tabet (2007). The algorithm alternates between simulating the latent Gaussian data and estimating the model parameters. The simulation of Z, the latent variables, is standard. The full conditional density of Z factorizes as

\pi(Z \mid Y, R) = \prod_{i=1}^{n} \pi(Z_i \mid Y, R),

where π(Z_i | Y, R) ∝ N_J(Z_i | R) I(Z_i ∈ A_i) is a truncated multivariate Gaussian. For J = 1, we can sample from it directly. Otherwise, we sample approximately from it by cycling through a series of univariate truncated Gaussians (Geweke, 1991). In each step Z_{ij} is simulated from π(Z_{ij} | Z_{i,-j}, R), which is a univariate Gaussian distribution truncated to [0, ∞) if Y_{ij} = 1 and to (−∞, 0) if Y_{ij} = 0. To sample the univariate truncated Gaussians, we use the efficient algorithm of Robert (1995).

To sample the correlation matrix R, we adopt a parameter expansion framework. We define a transformation of the latent variables W = ZD, where D = diag(d1, . . . , dJ) is the expansion parameter with d_i > 0. It follows that

\pi(W \mid R, D) = \mathcal{MN}_{n,J}(0,\; DRD,\; I_n).

We then define the new posterior

\pi(R, D, W \mid Y) \propto \pi(R)\, \pi(D \mid R)\, \pi(W \mid R, D) \prod_{i=1}^{n} I(W_i \in A_i). \qquad (3.16)

Due to the lack of identifiability mentioned in Chapter 1, the marginals of R under (3.15) and (3.16) are the same whatever the density π(D | R). We set π(D | R) = \prod_{i=1}^{J} π(d_i | R) with

d_i^2 \sim \mathcal{IG}\left((J+1)/2,\; r^{ii}/2\right), \qquad (3.17)

where the notation r^{ij} is used to refer to the ij-th element of R^{-1} and \mathcal{IG}(a, b) denotes the inverse-gamma distribution with shape parameter a and scale parameter b.
The reason for this choice is that it follows from Barnard et al. (2000) that Σ = DRD ∼ IW(2, I_J) when R follows (3.14), where IW(ν, Ω) denotes the inverse Wishart distribution with degrees of freedom ν and inverse scale matrix Ω defined as in Dawid and Lauritzen (1993). Hence, to sample (R, D) jointly from π(R, D | Y, W) = π(R, D | W), we can simply perform a change of variables Σ = DRD, sample Σ according to the resulting full conditional density π(Σ | W), and then compute R = D^{-1}ΣD^{-1} where d_i = (σ_{ii})^{1/2} and σ_{ii} is the i-th diagonal element of Σ. It is easy to check that

\pi(\Sigma \mid W) = \mathcal{IW}(2 + n,\; I_J + W W'). \qquad (3.18)

The full algorithm is summarized in Algorithm (3.1).

Algorithm 3.1 PXDA Algorithm (Saturated Model)
Given a current value of (R, Z, D),
• Sample Z_{ij} ∼ π(Z_{ij} | Z_{i,-j}, R) for all i, j.
• Sample D using (3.17) and compute W = ZD.
• Sample Σ ∼ IW(2 + n, I_J + WW').
• Compute R = D^{-1}ΣD^{-1} where d_i = (Σ_{ii})^{1/2}.

3.6.1 Probit Model with a Structured Correlation Matrix

In the multivariate probit model, the association between the J columns of the n × J matrix of binary responses Y = (Y^1, . . . , Y^J) is modeled via the correlations between the J columns of the n × J matrix of latent Gaussian variables Z = (Z^1, . . . , Z^J), distributed according to (3.2).

As we have seen in a previous section, conditional independence assumptions on the Gaussian latent variables are encoded by the inverse correlation matrix R^{-1}, such that r^{ij} = 0 implies that Z^i and Z^j are conditionally independent given all the other variables in the model. Furthermore, we saw that Gaussian graphical models provide a convenient framework for imposing a conditional independence structure of association between variables. As noted in Webb and Forster (2008), the interpretation of conditional independence on the latent variables Z does not translate to the observed binary variables, so that Z^1 ⊥⊥ Z^2 | Z^3 does not imply Y^1 ⊥⊥ Y^2 | Y^3, but does imply Y^1 ⊥⊥ Y^2 | Z^3.

In the probit regression setting, we must define a prior on R and not on Σ. Marginally, we would like the correlation r_{ij} to be uniform on (−1, 1) if variable i is conditionally dependent on variable j, and we would like r^{ij}, the ij-th element of R^{-1}, to be null if they are conditionally independent:

\begin{cases} r^{ij} = 0 & \text{if } Z^i \perp Z^j \mid Z^{-i,-j}; \\ r_{ij} \sim U(-1, 1) & \text{otherwise.} \end{cases} \qquad (3.19)

Because the Gaussian graphical model is a graphical representation of the covariance, or equivalently the correlation matrix, conditional independence between variables i and j implies r^{ij} = 0. Moreover, when variables i and j are conditionally dependent, they belong, by the definition of a decomposable graph, to the same prime component P. From the properties of the hyper-inverse Wishart, we know that Σ_P is saturated and follows an inverse Wishart distribution. Therefore, all the methods developed in Section (3.5) can be applied locally, at the prime component level, to define a marginally uniform prior on R_P. In order to obtain the marginally uniform prior on the correlations under the parametrization of Dawid and Lauritzen (1993), the degrees of freedom should be set to 2, regardless of the size of the prime component, and the location parameter for each prime component, Ω_P, should be set to the identity matrix. For each prime component, we then have the marginally uniform prior

\pi(R_P) \propto |R_P|^{\frac{|P|(|P|-1)}{2} - 1} \left(\prod_{i \in P} |R_{P,ii}|\right)^{-\frac{|P|+1}{2}}, \qquad (3.20)

where |P| is the cardinality of the prime component P and R_{P,ii} is the principal submatrix of R_P.
Following the same reasoning as earlier, we consider the scaled latent Gaussian variables W = ZD, which follow a matrix-variate normal distribution MN_{n,J}(0, Σ, I_n). The PXDA algorithm can now be extended to accommodate a structured inverse correlation matrix, with the only difference being that Σ is obtained by first sampling Σ | W, G ∼ HIW(ν_n, Ω_n), where ν_n = 2 + n and Ω_n = I_J + WW', and then obtaining R using R = D^{-1}ΣD^{-1}. The algorithm is summarized in Algorithm (3.2).

Algorithm 3.2 PXDA Algorithm (Structured Precision Matrix)
Given a current value of (R, Z, D),
• Sample Z_{ij} ∼ π(Z_{ij} | Z_{i,-j}, R) for all i, j.
• Sample D using (3.17) and compute W = ZD.
• Sample Σ | W, G ∼ HIW(ν_n, Ω_n) where ν_n = 2 + n and Ω_n = I_J + WW'.
• Compute R = D^{-1}ΣD^{-1} where d_i = (Σ_{ii})^{1/2}.

As in the continuous case, we can extend the model to include sampling of the graph structure given the data. Once the expansion parameter is sampled and the latent variables are transformed Z → W, all the procedures applicable to Gaussian data are valid in this expanded space, and therefore sampling of the graph structure proceeds exactly as in Section 3.5.2. Then, given the graph structure, a hyper-inverse Wishart distribution can be used to impose the constraint on the precision matrix so that it is consistent with the graph structure. The complete algorithm using covariance selection is summarized in Algorithm (3.3).

Algorithm 3.3 PXDA Algorithm (Covariance Selection)
Given a current value of (G, R, Z, D),
• Sample Z_{ij} ∼ π(Z_{ij} | Z_{i,-j}, R) for all i, j from a truncated multivariate Gaussian.
• Sample D using (3.17) and compute W = ZD.
• Propose a decomposable graph G_p by adding or deleting an edge and set G = G_p with the probability given in (3.12).
• Sample Σ | W, G ∼ HIW(ν_n, Ω_n), where ν_n = 2 + n and Ω_n = I_J + WW'.
• Compute R = D^{-1}ΣD^{-1} where d_i = (Σ_{ii})^{1/2}.
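The following is a minimal Python sketch of one sweep of the saturated-model PXDA update (Algorithm 3.1); the structured and covariance-selection variants of Algorithms 3.2 and 3.3 replace the inverse-Wishart draw with a hyper-inverse Wishart draw (and a graph move), which is not shown. The n × J storage of Y and Z, and the helper name pxda_sweep, are assumptions made for illustration; with that orientation, the J × J cross-product is W.T @ W.

import numpy as np
from scipy.stats import invgamma, invwishart, truncnorm


def pxda_sweep(Y, Z, R):
    """One sweep of the saturated-model PXDA sampler for the multivariate probit."""
    n, J = Y.shape
    Rinv = np.linalg.inv(R)

    # 1. Gibbs update of the latent Gaussians: each Z_ij | Z_i,-j is a univariate
    #    truncated normal (truncated to [0, inf) if Y_ij = 1, to (-inf, 0) otherwise).
    for j in range(J):
        others = [k for k in range(J) if k != j]
        coef = -Rinv[j, others] / Rinv[j, j]          # conditional mean coefficients
        sd = np.sqrt(1.0 / Rinv[j, j])                # conditional standard deviation
        mean = Z[:, others] @ coef
        lower = np.where(Y[:, j] == 1, (0.0 - mean) / sd, -np.inf)
        upper = np.where(Y[:, j] == 1, np.inf, (0.0 - mean) / sd)
        Z[:, j] = truncnorm.rvs(lower, upper, loc=mean, scale=sd)

    # 2. Sample the expansion parameter D and rescale: d_i^2 ~ IG((J+1)/2, r^ii/2), as in (3.17).
    d = np.sqrt(invgamma.rvs(a=(J + 1) / 2.0, scale=np.diag(Rinv) / 2.0))
    W = Z * d                                         # W = Z D with D = diag(d)

    # 3. Sample Sigma | W and map back to a correlation matrix.  The text's
    #    IW(2 + n, .) uses the Dawid-Lauritzen convention; scipy's standard
    #    inverse-Wishart parameterization maps to df = (2 + n) + J - 1.
    Sigma = invwishart.rvs(df=n + J + 1, scale=np.eye(J) + W.T @ W)
    s = np.sqrt(np.diag(Sigma))
    R_new = Sigma / np.outer(s, s)                    # R = D^{-1} Sigma D^{-1}
    return Z, R_new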
3.7 Simulation Results

The aim of the first simulation is to demonstrate the ability of the algorithm to recover the true graph structure underlying observed binary variables given a design matrix of covariates. We generate n = 500 independent data points from the model, where the correlation matrix R used to generate the Gaussian observations was set to

R = \begin{pmatrix} 1.000 & 0.000 & 0.000 & -0.491 & 0.000 \\ 0.000 & 1.000 & -0.296 & 0.000 & 0.116 \\ 0.000 & -0.296 & 1.000 & 0.000 & -0.392 \\ -0.491 & 0.000 & 0.000 & 1.000 & 0.000 \\ 0.000 & 0.116 & -0.392 & 0.000 & 1.000 \end{pmatrix} \qquad (3.21)

corresponding to the inverse correlation

R^{-1} = \begin{pmatrix} 1.318 & 0.000 & 0.000 & 0.647 & 0.000 \\ 0.000 & 1.096 & 0.325 & 0.000 & 0.000 \\ 0.000 & 0.325 & 1.278 & 0.000 & 0.464 \\ 0.647 & 0.000 & 0.000 & 1.318 & 0.000 \\ 0.000 & 0.000 & 0.464 & 0.000 & 1.182 \end{pmatrix} \qquad (3.22)

We ran the MCMC algorithm with model selection, as described in Section 3.5.2, on the simulated data, and N = 50,000 Monte Carlo samples were collected. Of the 822 possible decomposable structures for a graph with five nodes, the algorithm explored 271 graphs. From Figure 3.4, we see that the true graph structure underlying the data was the most probable structure sampled by the algorithm and accounts for approximately 15% of the posterior mass. In Figure 3.5, we show an image of the matrix of posterior edge marginal probabilities. These probabilities were approximated by the empirical frequencies with which an edge between any two nodes was included in the sampled graphs. When compared to the image of the true inverse correlation matrix underlying the data, all the non-zero entries in R^{-1} correspond to edges that have considerably higher posterior marginal probabilities.

Figure 3.4: Most probable graphical structures and estimated posterior probabilities for simulated data; the five most probable structures have estimated posterior probabilities of approximately 0.15, 0.11, 0.06, 0.05 and 0.03, and the most probable structure coincides with the true structure.

Figure 3.5: Matrix of posterior edge marginal probabilities, simulated data. Here the size of the square is indicative of the magnitude of the posterior edge marginal probability; the larger the square, the higher the probability that an edge exists between the corresponding nodes.

In order to assess convergence, multiple runs were undertaken with different over-dispersed starting values, and very similar results were obtained in all cases.

In order to explore the benefits of performing model selection in comparison to fitting a saturated model, we ran a simulation comparing the two alternatives. In this simulation, 50 different data sets were generated using the same graph structure as above, but with a different true underlying correlation matrix each time (consistent with the graph structure). For simplicity, no covariates were added in this case. For each data set, we fit both the saturated model and the model selection procedure. To evaluate the posterior estimators provided in each case, the entropy loss was computed for each Monte Carlo sample using

L(\hat{R}, R) = \operatorname{trace}(\hat{R} R^{-1}) - \log|\hat{R} R^{-1}| - J,

where R̂ is the Monte Carlo sample estimate of the correlation matrix. Figure 3.6 shows the box plots of the average entropy loss recorded for each data set. We can see that using the model selection procedure resulted in a significantly smaller entropy loss (paired t-test p-value < 0.01).

Figure 3.6: Boxplot of the entropy loss over 50 datasets obtained using a fixed saturated correlation matrix (left) versus model selection (right).
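For reference, the entropy loss defined above can be computed with the small illustrative helper below (not part of the thesis code); it equals zero when R̂ = R.

import numpy as np

def entropy_loss(R_hat, R):
    # L(R_hat, R) = trace(R_hat R^{-1}) - log|R_hat R^{-1}| - J
    J = R.shape[0]
    A = R_hat @ np.linalg.inv(R)
    sign, logdet = np.linalg.slogdet(A)
    return np.trace(A) - logdet - J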
3.8 Conclusion

In this Chapter, we looked at a simpler multivariate model where the time element is irrelevant, either because of the nature of the data (not a time series) or because the association is assumed to be static over time. Estimating a full precision matrix requires the estimation of J(J−1)/2 parameters; the number of parameters increases quadratically with the number of variables under consideration. Therefore, when dealing with high-dimensional problems, imposing a structure of association is helpful both conceptually and computationally. From a computational viewpoint, this reduces the number of free parameters to be estimated compared with a full model. Hence, for a properly specified structure, it results in more accurate and efficient estimates of the parameters. Furthermore, researchers are often able to identify or reject certain parsimonious structures based on their conceptual understanding of the data and their subject matter expertise. We appealed to the theory of Gaussian graphical models to infer the underlying graphical structure and estimate a covariance matrix consistent with that structure under a unified framework. We extended existing methodology to allow for a binary response by using a probit link. Throughout this Chapter, we assumed that the model is mean-centered at 0. In the next Chapter, we consider adding covariates.

Chapter 4

Estimating the Effects of Covariates

It is often the case that the variable of interest V depends on certain variables referred to as predictors or covariates. Under this setting, the aim of the analysis is to estimate the effect of the covariates on the multivariate response while accounting for the correlation between the elements of the response. This corresponds to a multivariate linear regression model, which assumes a linear relationship between the covariates and the multivariate response.

In this Chapter, we extend the methodology introduced in Chapters 2 and 3 to include the estimation of the effects of covariates while simultaneously estimating the association between the elements of the multivariate response. In the simple multivariate case, discussed in Chapter 3, the response under consideration is either an n × T matrix, if it is a univariate time series with static covariance, or an n × J matrix, if it is a multivariate cross-sectional response. For clarity, this will be referred to as the multivariate model. The more complex case arises when the response is multivariate, multi-observation and longitudinal with a dynamic covariance. This is the model discussed in Chapter 2. Under that setting, the observed response is an n × T × J cube, depicted in Figure 1.1. This will be referred to as the multivariate longitudinal model.

We begin by presenting an extension of the multivariate model of Chapter 3. We discuss standard posterior estimation of parameters in multivariate Bayesian regression. We then show how the same concept can be extended in the context of the parameter expansion framework for a binary response, introduced in Chapter 3. This has been done previously in Tabet (2007); however, the prior on β, the covariate effect, that had been previously suggested results in a dependence on D, the expansion parameter, in the posterior. In this Chapter, we correct this by taking a conjugate prior for β that is conditional on R, the correlation between the elements of the multivariate response, and centered at 0. Subsequently, we extend the methodology introduced in Chapter 2 to add the inference of the covariate effects in the context of the multivariate longitudinal model. This is a more complex model that involves the estimation of dynamic as well as static parameters. We appeal to a variant of the particle MCMC algorithm to address this. Throughout this Chapter, we assume that the effect of the covariates on the response is static, i.e. β is not time-varying, even though the covariates may be measured at different time steps.

4.1 Multivariate Model, Static Covariance

We begin by considering models with a covariance matrix that does not change in time; these are models that are either multivariate and cross-sectional or univariate and longitudinal, corresponding to the representation discussed in Chapter 3. For simplicity, we illustrate the modeling approach for a cross-sectional response of dimension n × J.

4.1.1 Continuous Response

For a continuous response, the linear model can be written as

Z = X\beta + \epsilon, \qquad \epsilon_i \overset{i.i.d.}{\sim} F(0, \Sigma), \qquad (4.1)

where Z is an n × J matrix of observed continuous responses, X is an n × p design matrix of observed covariates, and β is a p × J matrix of regression coefficients.
The error term ε is a matrix with rows independent and identically distributed according to a distribution F with mean 0 and Σ = (σ_{ij}), a J × J covariance matrix. In this model, the unknown parameters of interest are (β, Σ). If we assume that the distribution of the error terms F is Gaussian, the likelihood of the data is multivariate Gaussian, with pdf given by

f(Z \mid X, \beta, \Sigma) = \frac{1}{(2\pi)^{nJ/2}} |\Sigma|^{-n/2} \operatorname{etr}\left(-\tfrac{1}{2}\Sigma^{-1}(Z - X\beta)(Z - X\beta)'\right). \qquad (4.2)

A Bayesian approach models the posterior distribution of the parameters given the data, π(β, Σ | Z) ∝ f(Z | β, Σ) π(β, Σ). We follow a default modeling approach by choosing a conjugate prior for β that depends on Σ; this choice of prior results in mathematical convenience when dealing with the binary response case. The joint prior becomes π(β, Σ) = π(β | Σ) π(Σ). For β, we follow a standard Bayesian conjugate approach and select the matrix-variate Gaussian distribution

\pi(\beta \mid \Sigma) \sim \mathcal{MN}_{pJ}(0_{p \times J}, \Sigma, \Psi), \qquad (4.3)

where Ψ is a p × p matrix of hyperparameters representing the prior belief about the dispersion of the rows of β, and we assume that the dispersion across the columns of β follows Σ, as for the data. This means that, a priori, we believe that the effect of a given covariate on the different responses varies according to Σ, which is the same way the responses co-vary. Furthermore, it is not uncommon to assume that Ψ is a diagonal matrix with very large diagonal elements, so that the prior is uninformative; this means that we assume the effects of the covariates on a given response to be independent a priori. For a prior on Σ, we also choose a standard Bayesian conjugate prior, the inverse-Wishart prior for covariance matrices,

\pi(\Sigma) \sim \mathcal{IW}_J(d, S), \qquad (4.4)

where S is a J × J scale matrix and d is an integer representing the degrees of freedom.

Given n observations Z and the prior density π(β, Σ), we are interested in the posterior density π(β, Σ | Z). The above conjugate priors result in a posterior distribution that is available analytically: it is inverse-Wishart for Σ and matrix normal for β, with π(β, Σ | Z) = π(Σ | Z) π(β | Σ, Z). It is easy to check (see Appendix B.2 for algebraic details) that

\pi(\Sigma \mid Z) \sim \mathcal{IW}_J(d + n,\; Z'Z + S - B'\Xi^{-1}B), \qquad (4.5)

and

\pi(\beta \mid Z, \Sigma) \sim \mathcal{MN}_{pJ}(B, \Sigma, \Xi), \qquad (4.6)

where Ξ^{-1} = X'X + Ψ^{-1} and B = ΞX'Z. This is multivariate Bayesian linear regression, where samples from the posterior are collected by alternating between sampling β | Σ, Z and Σ | Z. When sampling Σ | Z, we are able to integrate out β so that it does not affect the sampling; however, since β is conditional on Σ, posterior samples of Σ are needed to obtain new samples of β. Algorithm 4.1 summarizes the steps for sampling from the joint posterior of (β, Σ).

Algorithm 4.1 Multivariate Bayesian Linear Regression
Given a current value for (β, Σ), the parameters are updated as follows:
• Sample Σ | Z ∼ IW_J(d + n, Z'Z + S − B'Ξ^{-1}B).
• Sample β | Σ, Z ∼ MN_{pJ}(B, Σ, Ξ),
where Ξ^{-1} = X'X + Ψ^{-1} and B = ΞX'Z.
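A minimal Python sketch of one draw from the posterior of Algorithm 4.1 is given below. It is illustrative only: scipy's standard inverse-Wishart parameterization is used, so the degrees of freedom should be mapped to the thesis's convention (Appendix A.3) as appropriate, and the function name gibbs_step is hypothetical.

import numpy as np
from scipy.stats import invwishart


def gibbs_step(Z, X, Psi, d, S, rng):
    """One draw of (Sigma, beta) from the multivariate Bayesian linear regression posterior."""
    n, J = Z.shape
    p = X.shape[1]

    Xi_inv = X.T @ X + np.linalg.inv(Psi)          # Xi^{-1} = X'X + Psi^{-1}
    Xi = np.linalg.inv(Xi_inv)
    B = Xi @ X.T @ Z                               # B = Xi X'Z

    # Sigma | Z ~ IW(d + n, Z'Z + S - B' Xi^{-1} B)
    scale = Z.T @ Z + S - B.T @ Xi_inv @ B
    Sigma = invwishart.rvs(df=d + n, scale=scale, random_state=rng)

    # beta | Sigma, Z ~ MN(B, row covariance Xi, column covariance Sigma):
    # draw via vec(beta) ~ N(vec(B), Sigma kron Xi) using Cholesky factors.
    Lr = np.linalg.cholesky(Xi)
    Lc = np.linalg.cholesky(Sigma)
    beta = B + Lr @ rng.standard_normal((p, J)) @ Lc.T
    return Sigma, beta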
4.1.2 Binary Response

When the observed data are not continuous, we instead observe n binary observations y = (y1, . . . , yn). As we have seen in Chapter 3, the covariance matrix Σ must be constrained, for identifiability reasons, to be a correlation matrix R. The likelihood of the data is written using a Gaussian latent representation,

\operatorname{pr}(Y_i = y_i \mid X, \beta, R) = \int_{L_{iJ}} \cdots \int_{L_{i1}} N_J\big((X\beta)_i, R\big)\, dZ_i,

where (Xβ)_i is the i-th row of Xβ and L_{ij} is the interval [0, ∞) if y_{ij} = 1 and (−∞, 0) otherwise. In this case, we are interested in the posterior density

\pi(\beta, R, Z \mid y) \propto \pi(\beta, R) \prod_{i=1}^{n} N_J(Z_i \mid X_i\beta, R)\, I(Z_i \in L_i). \qquad (4.7)

In order to sample from the expression in (4.7), we implement a Gibbs sampling approach similar to the one in the previous Chapter, where we sample the latent Gaussian variables from

\pi(Z \mid y, \beta, R) = \prod_{i=1}^{n} \pi(Z_i \mid y, \beta, R),

where π(Z_i | y, β, R) ∝ N_J((Xβ)_i, R) I(Z_i ∈ L_i) is a multivariate truncated Gaussian, easily sampled as in the previous Chapter using the algorithms of Geweke (1991) and Robert (1995). Furthermore, we again adopt a parameter expansion approach to obtain samples of (β, R), by defining a transformation of the latent variables W = ZD, where D = diag(d1, . . . , dJ) is the expansion parameter with d_i > 0. It follows that

\pi(W \mid \beta, R, D) = \mathcal{MN}_{n,J}(X\beta D,\; DRD,\; I_n).

We then define the new posterior

\pi(\beta, R, D, W \mid y) \propto \pi(R)\, \pi(\beta \mid R)\, \pi(D \mid R)\, \pi(W \mid \beta, R, D) \prod_{i=1}^{n} I(W_i \in L_i). \qquad (4.8)

As before, we also set π(D | R) = \prod_{i=1}^{J} π(d_i | R) with

d_i^2 \sim \mathcal{IG}\left((J+1)/2,\; r^{ii}/2\right). \qquad (4.9)

To sample (β, R, D) jointly from π(β, R, D | y, W) = π(β, R, D | W), we first sample D from its prior distribution and perform the change of variables Σ = DRD and γ = βD. The independent prior on the regression coefficients β that had been previously suggested in Tabet (2007) results in a dependence on D, the expansion parameter, in the posterior; this does not arise here, because we take a conjugate prior for β that is centered at 0 and has a covariance matrix that depends on the correlation matrix R. We therefore sample (Σ, γ) according to their resulting full conditional density π(γ, Σ | W) = π(Σ | W) π(γ | W, Σ), and then compute R = D^{-1}ΣD^{-1} and β = γD^{-1}, where d_i = (σ_{ii})^{1/2} and σ_{ii} is the i-th diagonal element of Σ. These are the same posterior distributions used in the previous section for Gaussian data,

\pi(\Sigma \mid W) = \mathcal{IW}(2 + n,\; W'W + I_J - B'\Xi^{-1}B), \qquad (4.10)

and

\pi(\gamma \mid W, \Sigma) = \mathcal{MN}_{pJ}(B, \Sigma, \Xi), \qquad (4.11)

where Ξ^{-1} = X'X + Ψ^{-1} and B = ΞX'W. A general PXDA algorithm which includes covariates is given in Algorithm (4.2).

Algorithm 4.2 PXDA with Covariates
Given a current value for (β, R, Z), we update these parameters as follows:
• Sample Z_{ij} ∼ π(Z_{ij} | Z_{i,-j}, β, R) for all i, j using the algorithms of Geweke (1991) and Robert (1995).
• Sample D using (4.9) and compute W = ZD.
• Sample (Σ, γ) using (4.10)–(4.11).
• Compute R = D^{-1}ΣD^{-1} and β = γD^{-1}, where d_i = (Σ_{ii})^{1/2}.

4.2 Dynamic Covariance

Let Z_1, . . . , Z_T represent multivariate continuous longitudinal observations assumed to arise from a Gaussian distribution. The covariance matrix is assumed to be dynamic, and the covariates are measured at each time step as well; however, the effect of those covariates on the response is usually not changing in time. The linear model is written as

Z_{it} = X_{it}\beta + \epsilon_{it}, \qquad (4.12)

where X_{it} is a 1 × p vector of covariates, assumed to be measured without error, for subject i at time point t, and β is a p × J matrix of regression coefficients, assumed to be static in time. Finally, ε_{it} ∼ N(0, Σ_t), where Σ_t is a J × J covariance matrix. In this model, the parameters of interest are (β, Σ^{-1}_{1:T}), a combination of static and dynamic parameters.

4.2.1 Prior Specification for (β, Σ^{-1}_{1:T})

For Bayesian inference we must consider a joint prior for (β, Σ^{-1}_{1:T}).
Unlike in the previous section, we are not able to specify a prior for β that depends on Σ, since we are assuming that β is fixed while Σ varies in time. We therefore take an independent prior for β and Σ^{-1}_{1:T},

\pi(\beta, \Sigma^{-1}_{1:T}) = \pi(\beta)\, \pi(\Sigma^{-1}_{1:T}). \qquad (4.13)

We already saw in Chapter 2 that a suitable prior for Σ^{-1}_{1:T} is the Wishart autoregressive process, such that

\pi(\Sigma^{-1}_{1:T}) = \pi(\Sigma^{-1}_{1}) \prod_{t=2}^{T} \pi(\Sigma^{-1}_{t} \mid \Sigma^{-1}_{t-1}), \qquad (4.14)

where π(Σ^{-1}_1) is a Wishart distribution W_J(d, S) and π(Σ^{-1}_t | Σ^{-1}_{t-1}) is the non-central Wishart distribution W(d, S^*, Θ) with degrees of freedom d, scale parameter S^* = S − MSM' and non-centrality parameter Θ = (S − MSM')^{-1} M'Σ^{-1}_{t-1}M.

It remains to specify a prior on the unknown matrix of regression coefficients β. We propose a matrix-variate Gaussian prior for β,

\pi(\beta) \sim \mathcal{MN}_{pJ}(0, I_J, \Psi_P), \qquad (4.15)

with known diagonal covariance matrix Ψ_P indicating our prior belief about the degree of association between the p rows of β; we assume that the J columns of β are independent a priori, and so we set the association between columns to the identity matrix. The Gaussian distribution is conjugate to the likelihood and results in a posterior that can be sampled from easily. Furthermore, this prior on β is not very informative for large values of the diagonal elements of the covariance matrix Ψ_P.

4.2.2 Posterior and Sampling Scheme

The resulting posterior distribution of interest is π(β, Σ^{-1}_{1:T} | Z_{1:T}). Sampling from this posterior is not available in closed form, as we have seen in Chapter 2. Intuitively, one can think of a Gibbs sampling scheme that iteratively samples from π(β | Σ^{-1}_{1:T}, Z_{1:T}) and π(Σ^{-1}_{1:T} | Z_{1:T}, β). In this case, one can use a particle MCMC targeting π(Σ^{-1}_{1:T} | Z_{1:T}, β) and then, given the sampled values of Σ^{-1}_{1:T}, update π(β | Σ^{-1}_{1:T}, Z_{1:T}). Unfortunately, under this scenario, Andrieu et al. (2010) show that a direct application of Algorithm (2.4) would not result in an invariant density for π(β, Σ^{-1}_{1:T} | Z_{1:T}). In order to address this, they suggest the use of a special type of particle MCMC update called the conditional SMC update.

The conditional SMC update is similar to the regular SMC update; however, it requires keeping track of the "ancestral lineage". We define B^{(m)}_t to be the index of the ancestor particle of Σ^{-1(m)}_{1:T} at generation time t. A graphical illustration is provided in Figure 4.1. A conditional SMC update ensures that a prespecified particle path Σ^{-1}_{1:T}, with ancestral lineage B_{1:T}, is "frozen" and survives all the resampling steps, while the remaining K − 1 "free" particles are generated as usual. Therefore, for each sampled value of the static parameter β, the current value of Σ^{-1}_{1:T} is ensured to survive all the re-sampling steps, and K − 1 new samples from p(Σ^{-1}_{1:T} | Z_{1:T}) are obtained at each step. A detailed description of the conditional SMC algorithm is provided in Algorithm 4.3. The resulting particle Gibbs algorithm would therefore iterate between sampling Σ^{-1}_{1:T} using Algorithm 4.3, conditional on a current draw of β and the observed data, and then, given the new value of Σ^{-1}_{1:T} and the data, drawing a new value of β ∼ π(β | Σ^{-1}_{1:T}, Z_{1:T}).
The posterior distribution of β can be obtained by considering a slice of the linear model for a given sequence of time-varying covariance matrices. For observation i, we have Z_i = X_i β + ε_i, where Z_i is T × J, X_i is T × p, β is p × J and ε_i is T × J. We can apply the vec operator, which vectorizes a matrix by stacking its columns on top of one another; then we have

\operatorname{vec}(Z_i') = (X_i \otimes I_J)\, \operatorname{vec}(\beta') + \operatorname{vec}(\epsilon_i'),

where vec(Z_i') is TJ × 1, (X_i ⊗ I_J) is TJ × pJ, vec(β') is pJ × 1, ⊗ denotes the Kronecker product, and we used the identity vec((X_iβI_J)') = vec(I_Jβ'X_i') = (X_i ⊗ I_J) vec(β'). If we let z_i = vec(Z_i'), β_* = vec(β') and \mathbf{X}_i = X_i ⊗ I_J, then we have

z_i \overset{i.i.d.}{\sim} N_{TJ}(\mathbf{X}_i \beta_*, \Sigma^*),

where Σ^* is a TJ × TJ block-diagonal covariance matrix constructed from Σ_1, . . . , Σ_T, and \mathbf{X}_i is obtained by stacking the design matrices from all time points for each individual. For i = 1, . . . , n, the joint pdf is given by

(2\pi)^{-\frac{TJn}{2}} |\Sigma^*|^{-n/2} \exp\left(-\tfrac{1}{2} \sum_{i=1}^{n} (z_i - \mathbf{X}_i\beta_*)' \Sigma^{*-1} (z_i - \mathbf{X}_i\beta_*)\right).

Figure 4.1: Example of ancestral lineages generated by an SMC algorithm for K = 5 and T = 3. The shaded path is Σ^{-1(2)}_{1:3} = (Σ^{-1(3)}_1, Σ^{-1(4)}_2, Σ^{-1(2)}_3) and the ancestral lineage is B^{(2)}_{1:3} = (3, 4, 2) (example adapted from Andrieu et al. (2010)).

The prior on β_* is obtained by taking the vectorization of the matrix-variate Gaussian in (4.15), which becomes

(2\pi)^{-pJ/2}\, |\Psi \otimes I_J|^{-1/2}\, \operatorname{etr}\left(-\tfrac{1}{2} (\Psi \otimes I_J)^{-1} \beta_* \beta_*'\right). \qquad (4.16)

Algorithm 4.3 Conditional SMC
Define B^{(m)}_t to be the index of the ancestor particle of Σ^{-1(m)}_{1:T} at generation time t.
Step 1: Let Σ^{-1}_{1:T} = (Σ^{-1(B_1)}_1, Σ^{-1(B_2)}_2, . . . , Σ^{-1(B_{T-1})}_{T-1}, Σ^{-1(B_T)}_T) be the path associated with the ancestral lineage.
Step 2: For t = 1:
• For m ≠ B_1, sample Σ^{-1(m)}_1 i.i.d. ∼ W(d + n, (S^{-1} + SS_T)^{-1}), where SS_T = (Z_1 − X_1β)'(Z_1 − X_1β).
Step 3: At time points t = 2, . . . , T:
• For m ≠ B_t, sample Σ^{-1(m)}_t i.i.d. ∼ W(d, S − MSM', (S − MSM')^{-1}M'Σ^{-1(m)}_{t-1}M), a non-central Wishart.
• Compute w_t(Σ^{-1(m)}_t) = f(Z_t | Σ^{-1(m)}_t) and W^{(m)}_t = w_t(Σ^{-1(m)}_t) / Σ_{l=1}^{K} w_t(Σ^{-1(l)}_t), where f is the multivariate Gaussian p.d.f. evaluated at Z_t and Σ^{-1(m)}_t for each particle m.
• Re-sample the particles Σ^{-1(m)}_{1:t} with replacement according to the weights W^{(m)}_t using the conditional multinomial resampling algorithm.

Combining the above expression with the prior on β_* results in the following posterior distribution for β_*:

\pi(\beta_* \mid Z_{1:T}, \Sigma^{-1}_{1:T}) = N_{pJ}(\beta_n, \Psi_n), \qquad (4.17)

with

\beta_n = \Psi_n \sum_{i=1}^{n} \mathbf{X}_i' \Sigma^{*-1} z_i \qquad \text{and} \qquad \Psi_n^{-1} = (\Psi \otimes I_J)^{-1} + \sum_{i=1}^{n} \mathbf{X}_i' \Sigma^{*-1} \mathbf{X}_i.

The particle Gibbs (PG) algorithm, summarized in Algorithm (4.4) below, results in an invariant distribution p(β_*, Σ^{-1}_{1:T} | Z_{1:T}).

Algorithm 4.4 Particle Gibbs (PG)
Step 1: Initialization, s = 0:
• Set β_*(0) and B_{1:T}(0) arbitrarily.
Step 2: For s = 1, . . . , N:
• Sample β_*(s) ∼ π(β_* | Σ^{-1}_{1:T}(s − 1), Z_{1:T}) as in equation (4.17).
• Run a conditional SMC algorithm targeting p(Σ^{-1}_{1:T} | β_*(s), Z_{1:T}), conditional on Σ^{-1}_{1:T}(s − 1).
• Sample Σ^{-1}_{1:T}(s) ∼ p̂(Σ^{-1}_{1:T} | β_*(s), Z_{1:T}); in this case B_{1:T}(s) is implicitly sampled as well.
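To illustrate the β update in Step 2 of Algorithm 4.4, the following is a small Python sketch of the Gaussian full conditional (4.17). The storage of the data as arrays Z[i, t, :] and X[i, t, :] and of the current precision draws as Sigmas_inv[t], as well as the function name, are assumptions made for illustration.

import numpy as np


def sample_beta_star(Z, X, Sigmas_inv, Psi, rng):
    """Draw beta_* = vec(beta') from the Gaussian full conditional (4.17).
    Z: (n, T, J) responses, X: (n, T, p) covariates, Sigmas_inv: (T, J, J)
    current precision matrices, Psi: (p, p) prior covariance of the rows of beta."""
    n, T, J = Z.shape
    p = X.shape[2]

    prec = np.kron(np.linalg.inv(Psi), np.eye(J))     # (Psi kron I_J)^{-1}
    lin = np.zeros(p * J)
    for i in range(n):
        for t in range(T):
            # X_it kron I_J maps vec(beta') to the mean of the J-vector Z_it;
            # the block-diagonal Sigma* is handled implicitly by summing over t.
            Xit = np.kron(X[i, t][None, :], np.eye(J))   # (J, pJ)
            prec += Xit.T @ Sigmas_inv[t] @ Xit
            lin += Xit.T @ Sigmas_inv[t] @ Z[i, t]

    Psi_n = np.linalg.inv(prec)                        # posterior covariance
    beta_n = Psi_n @ lin                               # posterior mean
    draw = rng.multivariate_normal(beta_n, Psi_n)
    return draw.reshape(p, J)                          # rows of beta recovered from vec(beta')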
4.3 Simulation Results

To test the PG algorithm, continuous data Z were generated according to Z_{it} ∼ N(X_{it}β, Σ_t). In this example, with fixed n = 1, J = 3, p = 2 and T = 100, covariates were generated by drawing uniformly from [−0.5, 0.5]. The matrix of regression coefficients underlying the model was fixed to

\beta = \begin{pmatrix} 2 & 3 & 1 \\ -1 & 5 & 2 \end{pmatrix}. \qquad (4.18)

Furthermore, to simulate Σ_{1:T}, we used the WAR process with d_0 = 4, S_0 = I_J, and we fixed M = 0.98 I_J. The number of particles used at each Gibbs step was set to K = 5000 and the total number of Gibbs iterations performed was 2000. With those settings, the algorithm took 3 days to run.

Figure 4.2 shows the marginal distribution of β, as well as trace plots and autocorrelation (ACF) plots. We can see that the resulting posterior distribution for each element of β appears to be centered at the true value in most cases, and the true value is contained in the 95% credible interval for all of the elements of the matrix β. Furthermore, the ACF plots indicate that the autocorrelation between samples quickly drops to zero, which is a sign of good mixing of the algorithm.

Figure 4.2: Posterior estimates of β using particle Gibbs: for each element of β, the trace plot, the posterior histogram with the true value marked, and the sample autocorrelation function.

In Figure 4.3, we take a look at the posterior distribution of R_{1:T} = D^{-1/2}_{1:T} Σ_{1:T} D^{-1/2}_{1:T}, which we are able to compute from the Σ^{-1}_{1:T} samples obtained from the particle Gibbs algorithm. We plot the median of the marginal elements R_{ij} over time, as well as a 95% credible interval. The first thing we note is that the algorithm is degenerating: credible intervals only appear after time point t = 75, because up to that point the distribution is approximated by a unique particle, due to the re-sampling. Furthermore, the intervals do not always contain the true value of the correlation; however, this may be due to the algorithm degenerating. Having said that, the algorithm is picking up on the general trend and, given the complexity of the problem, the resulting point estimates may be reasonable. Finally, in order to evaluate the estimated sequences of matrices jointly as a whole, rather than marginally, we plot the average entropy loss computed at each time step against time in Figure 4.4. We can see that the entropy loss seems to vary randomly over time. The lack of trend in the entropy loss is a good sign that the loss is not accumulating, and that the variability is simply due to Monte Carlo error.

4.4 Summary and Discussion

In this Chapter, we have extended the models discussed in Chapters 2 and 3 to include covariate information and to estimate the dependence between the elements of the response simultaneously. When the covariance matrix is static in time, we showed how we are able to extend Bayesian estimation to binary data, using the parameter expansion for data augmentation framework. This allows the proper simultaneous estimation of the covariate effects, the covariance and its structure. For a dynamic covariance matrix, we implemented the particle Gibbs algorithm to alternate between sampling of the dynamic precision matrix and the fixed covariate effects.
In order to ensure the validity of the particle Gibbs sampler, we construct the proposal using conditional SMC, where we condition on maintaining a prespecified path through the sampler. While the PG sampler is ergodic, it currently mixes poorly. This was shown to be particularly true when the number of particles is small relative to T (Whiteley, 2010), because degeneracy of the SMC path causes the variables that are left out at any two consecutive iterations to be strongly dependent. Future work should consider improving the mixing by adding a backward simulation step to the algorithm, as suggested by Whiteley et al. (2010), Olsson and Ryden (2011) and Lindsten and Schon (2011); this strategy has been shown to drastically alleviate the degeneracy problem. Another alternative is ancestor sampling, as suggested by Lindsten et al. (2012).

Figure 4.3: Plot of the true correlation and the median estimated correlation, along with a 95% credible interval, obtained using the PG algorithm with conditional SMC. Panels: (a) marginal distribution of R12; (b) marginal distribution of R13; (c) marginal distribution of R23.

Figure 4.4: Entropy loss for Σ computed at each time step and plotted against time.

Chapter 5 Applications

In this chapter, we showcase how the methodology developed in this thesis can be applied to real datasets and used to address real problems. We revisit the examples discussed in Chapter 1.

5.1 Heart Rate Variability as an Indicator of Nociception

The brain perceives pain through the nociception system, which prompts increased activity in the autonomic nervous system (ANS). In turn, the ANS activates the sympathetic nervous system (SNS), creating a stress response in the body that includes increased respiration, blood pressure and heart rate. In sick patients, a strong stress response can cause serious injury. A patient exists in a homeostatic balance between a state of sympathetic (stress) and parasympathetic tone (a frequency of neural firing). When in a state of extreme stress, the sympathetic tone is high and the parasympathetic tone is low; when in a state of extreme relaxation, the opposite is true. Heart rate variability (HRV) is a measurement of the interaction between sympathetic and parasympathetic activity in autonomic functioning. In general, changes in HRV are highly correlated with morbidity and mortality.

The first example we consider, introduced in Section 1.2.1, is data coming from an experiment conducted by the Pediatric Anesthesia Research Team (PART) at the British Columbia Children's Hospital. The heart rates of six-year-old children undergoing a dental procedure under general anesthesia were monitored (more details about the experiment can be found in Brouse (2013)). In this thesis we specifically focus on scenarios where a child undergoing the dental procedure experiences a painful stimulus, which means the child is in a nociceptive state. The anesthesiologist responds by giving the child an anesthetic bolus.
The bolus consists of a single shot or of multiple shots applied in succession. To analyze the available data, each bolus event is considered separately. The first and last 120 seconds are treated as padding and are ignored. The baseline (nociceptive) period runs from 121 seconds to 180 seconds and is 60 seconds long. The response (anti-nociceptive) period runs from 180 seconds from the end to 121 seconds from the end and is also 60 seconds long. The anesthetic bolus occurs at 180 seconds from the beginning, but there may be more than one: any number of anesthetic boluses may be given between 180 and 240 seconds from the end. This is illustrated in Figure 5.1(a). The line on top is the observed heart rate. We can see that in the nociceptive period, prior to receiving the anesthetic bolus, the heart rate behaves differently than after the bolus, when the patient no longer feels pain. The line at the bottom is the same data with the mean filtered out; this is done by working with the residuals obtained from fitting a first-order autoregressive model, so that the only source of variability over time is the variance parameter. In mathematical notation, the observed heart rate at time point t is denoted Vt. The data are assumed to follow an autoregressive process that relates the heart rate to its value at the previous time step through an autoregressive parameter A, such that Vt = A Vt−1 + εt, with εt ∼ N(0, σ2t). Since we are only interested in inferring σ21:T, we pre-process the data to remove the mean and work with the residuals Zt = Vt − ÂVt−1.

If we adopt a baseline-stimulus setup, we have, for each bolus event, two datasets that we consider independently: one corresponding to the nociceptive period and one to the anti-nociceptive period. The data in each period have dimensions J = 1, n = 1, and T = 60. The sequence of variances can then be inferred using the autoregressive process introduced in Chapter 2. Clearly, from Figure 5.1(a), we expect the pattern of variability to change from the nociceptive period to the anti-nociceptive period. In each case, we run the PMMH algorithm to obtain an estimate of the variance as well as of the underlying smoothness parameter ρ. Because we are working in the univariate case, we have the option of using either the non-central χ2 or the central χ2 approximation as a proposal in the algorithm. Both produced similar results, and therefore, for consistency, we only present the results obtained using the non-central χ2 proposal. For each bolus event, we run the PMMH algorithm with K = 1000 particles and 5000 MCMC iterations. In order to obtain an acceptance probability in the 20−30% range, the variance of the random walk proposal for ρ, the autoregressive hyperparameter, had to be adjusted for each bolus event and for each period. For the first bolus event it was set to c = 0.4 in the nociceptive period and to c = 0.05 in the anti-nociceptive period. In Figures 5.1(c) and 5.1(d), we show the trace plot and histogram of the posterior distribution of ρ in each of the two periods. From this figure and other examples, we can see that a larger step size results in a larger exploration of the space. 107 5.1.
Heart Rate Variability as an Indicator of Nociception 50 100 150 200 250 300 350 400 −20 0 20 40 60 80 100 120 time Bolus H e a r t R a t e Nociceptive Period Anti Nociceptive Period (a) The raw and centered heart rate data −1 0 1 2 3 4 N oc ic ep tio n log (σ) vs Time −1 0 1 2 3 4 time A nt i− N oc ic ep tio n (b) Posterior distribution of log(σ) in the nociception and anti-nociception phase. 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 100 200 300 400 500 Nociceptive Period Median −− 0.943 95 CI −− [0.684,0.994] Acc. Prob −− 24.9% (c) Histogram and trace plot of the posterior distribution of ρ in the no- ciception phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Anti−Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 500 1000 1500 2000 2500 Anti−Nociceptive Period Median −− 0.995 95 CI −− [0.907,1] Acc. Prob −− 23.8% (d) Histogram and traceplot of the posterior distribution of ρ in the anti- nociception phase Figure 5.1: Results for Bolus Event 1 The posterior distribution of each log(σt) obtained in the nociceptive period and anti-nociceptive period is plotted in Figure 5.1(b) over time. We note that during the nociceptive period, the log standard deviation appears to be fluctuating in time compared with the anti-nociceptive period, which is a lot smoother. This is also reflected in the estimate of ρ which is higher in the anti-nociceptive period (median 0.99). Figure 5.2 contains side by side boxplots comparing the estimates of ρ in both periods. Bolus event 1 is a very “clean example” that showcases well how the variability of the heart rate between the two different periods differs. Results from bolus event 2 in Figure 5.3, are a little less straightforward to interpret. 108 5.1. Heart Rate Variability as an Indicator of Nociception 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 NC ANC Posterior of ρ Figure 5.2: Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 1. In this case, the painful stimulus occurred partway through the nociceptive period. The anesthesiologist responded quickly and an anesthetic bolus was given, which led the heart rate to begin readjusting. However, likely another stimulus event occurred to counter the effect of the bolus. The behaviour in the data is mirrored in the estimated results. We observe that ρ is the same in both periods and the estimates of log σ are fluctuating for both periods, and not showing the same level of smoothness that we saw in the anti-nociceptive period of the previous bolus event example. 109 5.1. Heart Rate Variability as an Indicator of Nociception 50 100 150 200 250 300 350 400 −20 0 20 40 60 80 100 120 time Bolus Heart Rate Data for Bolus Event 2 Anti Nociceptive Period Nociceptive Period (a) The raw and centered heart rate data (b) Posterior distribution of log(σ) in the nociception and anti-nociception phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 200 400 600 800 1000 Nociceptive Period Median −− 0.985 95 CI −− [0.859,0.999] Acc. Prob −− 26% (c) Histogram and traceplot of the pos- terior distribution of ρ in the nocicep- tion phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Anti−Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 500 1000 1500 Anti−Nociceptive Period Median −− 0.963 95 CI −− [0.302,0.999] Acc. 
Prob −− 29.1% (d) Histogram and traceplot of the posterior distribution of ρ in the no- ciception phase Figure 5.3: Results for Bolus Event 2 Bolus event 4 in Figure 5.1 is another example of non-typical results. In this case it is suspected that an anesthetic bolus was given in anticipation of a stimulus that didn’t occur. From Figure 5.4(a), we can see that the pattern in the variability of heart rate over time does not change. Reliably, the estimates obtained from the algorithm also mirror that. The estimates for ρ are high and the estimates of log(σ) are very smooth over time. 110 5.1. Heart Rate Variability as an Indicator of Nociception 50 100 150 200 250 300 350 400 −20 0 20 40 60 80 100 120 time Bolus Heart Rate Data for Bolus Event 4 Anti Nociceptive Period Nociceptive Period (a) The raw and centered heart rate data (b) Posterior distribution of log(σ) in the nociception and anti-nociception phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 500 1000 1500 2000 Nociceptive Period Median −− 0.998 95 CI −− [0.974,1] Acc. Prob −− 35% (c) Histogram and traceplot of the pos- terior distribution of ρ in the nocicep- tion phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Anti−Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 500 1000 1500 2000 2500 Anti−Nociceptive Period Median −− 0.996 95 CI −− [0.838,1] Acc. Prob −− 25.1% (d) Histogram and traceplot of the posterior distribution of ρ in the no- ciception phase Figure 5.4: Results for Bolus Event 4 One of the limitations of the baseline-stimulus setup, is that the windows that are used to collect the data are fixed. The researchers did not want to manipulate the window so as to not introduce bias, but there were many other confounding events that occured during the procedure which clearly affect the results. In most cases, the findings resulting from the use of the PMMH algorithm appear to be clinically relevant and correspond to findings obtained using other methods. All the results obtained from of all the experiments are presented in Appendix C. Future work on this dataset, should consider a changepoint detection algorithm that can take the full range of the data and detect the point(s) at which a change occurs and see 111 5.2. Stock Market Returns of Technology Firms whether it matches the observed data. Furthermore, as noted in Chapter 2, being able to adaptively set the step size for each data region would be useful. 5.2 Stock Market Returns of Technology Firms In the next example, we consider the stock market log-returns of the four technology firms: Google, Microsoft, IBM and Apple. A description of the data and the pre-processing was given in section 1.2.2 of Chapter 1. The pre-processing step with the multivariate autoregressive filter en- sures that the only parameter evolving in time is the covariance matrix. This in turns ensures that the conditional independence assumption of Chapter 2 is met. The analysis is done as with the previous example on the residuals, denoted by Z, which are obtained from fitting the AR filter. In this case, Z is an n× J × T array with n = 1, J = 4, and T = 386. We assume that the data follows a matrix-variate Gaussian distribution with Zt | Σ−1t ∼MN n,J(0n×J ,Σ−1t , In) where the unknown parameter of interest is the sequence of T precision matrices Σ−11 , . . . ,Σ −1 T . 
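A minimal sketch of this pre-processing step, assuming a first-order autoregressive filter fitted by ordinary least squares (the exact filter used in Chapter 1 may differ in its details), is given below in Python/NumPy; the function name and the simulated series standing in for the real log-returns are illustrative only.

```python
import numpy as np

def ar1_residuals(V):
    """Remove the conditional mean with a first-order (vector) autoregressive
    filter, V_t = A V_{t-1} + eps_t, and return the residuals
    Z_t = V_t - A_hat V_{t-1} used as input to the covariance model.

    V : (T, J) array of observed series (e.g. log-returns of the J stocks).
    Returns a (T-1, J) array of residuals.
    """
    Y = V[1:]          # responses V_2, ..., V_T
    X = V[:-1]         # lagged values V_1, ..., V_{T-1}
    # Least-squares fit of the J x J autoregressive coefficient matrix
    # (acting on row vectors)
    A_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ A_hat

# Example with simulated data standing in for the real log-returns
rng = np.random.default_rng(0)
V = rng.standard_normal((387, 4)).cumsum(axis=0) * 0.01
Z = ar1_residuals(V)   # (386, 4), matching T = 386 and J = 4
```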
We place a Wishart autoregressive prior on the sequence of matrices as described in Chapter 2 such that pi(Σ−11:T ) = pi(Σ −1 1 ) T∏ t=1 pi(Σ−1t | Σ−1t−1) and where Σ−11 ∼ WJ(d, S−1, 0) Σ−1t | Σ−1t−1 ∼ WJ(d, S −MSM ′, (S −MSM ′)−1M ′Σ−1t−1M) In this model, we fix the hyperparmeter S = IJ , the identity matrix and d = J+1 = 5. This choice of hyperparameters results in a prior on Σ that in turn results in a marginally uniform prior on the correlation R between the 112 5.2. Stock Market Returns of Technology Firms stocks, where R is obtained from the decomposition R = D−1ΣD−1 where D is a diagonal matrix with D = √ diag(Σ). Therefore, the aim here is to infer the autoregressive parameter M = ρI as well as the precision sequence Σ−1t from the data. We run the PMMH algorithm (2.4) of Chapter 2, to obtain samples of ρ and Σ−11:T using the prior as a proposal distribution for Σ1:T and using the random walk proposal for ρ with starting value ρ0 = 0.5 and different c to make sure that the acceptance probability is between 25% and 50% which indicates that the sampler is moving and exploring the space of possible values. For this example, we tried running the algorithm with different number of particles and for different number of iterations. However, it appears that regardless of the step size of the random walk, once the random walk reaches ρ = 1, it gets stuck there. For illustration, in Figures 5.5 we show posterior samples of ρ obtained from running the PMMH algorithm with K = 2000 SMC iterations and s = 5000 MCMC samples. 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Time Series Plot of ρ , AP=1 % with c=0.01 (a) Time series plot Posterior Distribution of ρ 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Median −− 0.996 95 CI −− [0.993,0.997] (b) Histogram Figure 5.5: Posterior distribution of ρ obtained from running the PMMH algorithm on the four technology stock problem One of the advantages of assuming M = ρI is that because ρ is a scalar, we can estimate the marginal likelihood using different values of ρ on a grid to see why the algorithm is getting stuck. The results are plotted in Figure 113 5.3. “Women and Mathematics” Data Set 5.6. We can see that clearly the marginal likelihood peaks at 1. Therefore, it is no surprise that the algorithm is getting stuck at this boundary. Finally, we can perhaps conclude that the posterior is concentrated in a (0.9, 1) region and that there might be evidence against a stationary stochastic volatility model in this example. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 −1000 −500 0 500 1000 1500 2000 2500 ρ M ar gi na l li ke lih oo d Figure 5.6: The marginal likelihood evaluated and plotted for different scalar values for ρ on a grid. The maximum is reached at ρ = 1 5.3 “Women and Mathematics” Data Set In the next example we switch gears and we look at an application where the correlation is assumed to be static. We illustrate the methods discussed in Chapter 3 by applying the model selection algorithm to the “Women and Mathematics” data set (Fowlkes et al., 1988b). This data was analyzed by Tarantola (2004) and by Madigan and Raftery (1994) using the MC3 algorithm, to learn the structure of a discrete graph. The primary aim is to study the attitude of 1190 New Jersey high school students toward mathematics. The six binary variables collected are: X1 Lecture Attendance (Attend/Did not Attend) X2 Gender (Male/Female) X3 School Type (Urban/Suburban) 114 5.3. 
“Women and Mathematics” Data Set Figure 5.7: Most probable graphical structures and estimated posterior probabilities for the “Women and Mathematics” data set. X4 “I will be needing Mathematics in my future work” (Agree/Disagree) X5 Subject Preference (Math/Science vs. Liberal Arts) X6 Future Plans (College/Job) We ran the algorithm for N = 200, 000 iterations. Multiple runs with differ- ent starting values and very similar results were observed in all cases. Our method identifies six structures with probabilities higher that 3% (0.03), accounting for 38% of the total probability mass. The most probable struc- tures are depicted in figure 5.7. All of the probable graphs chosen by our algorithm show that X1, lec- ture attendance, was independent of everything else conditional on all the other variables in the graph. Similarly, all of the probable models include edges between [X5,X2,X4], showing conditional association between gen- der, subject preference and whether they will need mathematics in their fu- ture. These results are consistent with those found by others. Our method also detects with high probability the clique [X6,X5,X4], suggesting a con- ditional association between future plans, subject preference, and whether 115 5.4. Six Cities Data Set the respondents thought there will be needing mathematics in their future. Moreover, a relationship between X6, future plans and X3, type of school chosen, was detected. Comparing our results with those of Tarantola (2004), we note that while their most probable graph does not correspond to the most probable one obtained here, some of their most probable graphs are identical to the ones we obtained. Furthermore, if we compare the edge marginal probability, we see that in their hierarchical models, with a certain value of the hyperparameters (λ0 = 64 in the non-hierarchical model and f = 64 in the hierarchical model), the edge marginals they obtained are very similar to the ones we have (see table 5.1). The graph they report in their paper corresponds to the hierarchical model with f = 1. Edges (2,1) (3,1) (4,1) (5,1) (6,1) (3,2) (4,2) (5,2) This paper 0.11 0.10 0.14 0.13 0.13 0.04 0.90 1.00 Non Hierarchical λ0 = 64 0.10 0.12 0.15 0.13 0.15 0.11 1.0 1.0 Hierarchical f = 64 0.12 0.15 0.17 0.15 0.16 0.12 1.0 1.0 Edges (6,2) (4,3) (5,3) (6,3) (5,4) (6,4) (6,5) This paper 0.20 0.22 0.56 1.00 1.00 1.00 0.98 Non Hierarchical λ0 = 64 0.02 0.81 0.33 1.0 1.0 0.80 0.79 Hierarchical f = 64 0.03 0.80 0.43 1.0 1.0 0.89 0.88 Table 5.1: Table of edge marginal probabilities comparing results obtained by the algorithm developed in this paper and those by MC3 algorithm as outlined in Tarantola (2004) 5.4 Six Cities Data Set In the last example we consider the data set used by Chib and Greenberg (1998), a subset of the Six Cities study, a longitudinal study of the health effects of air pollution. The data under consideration consists of repeated binary measures of wheezing status of 537 children from Stuebenville, Ohio. The objective is to model the probability of wheeze status over time as a function of the mother’s smoking status during the first year of the study. The response variable of interest yij indicates the wheezing status of child i at j = (1, 2, 3, 4) corresponding to age (7, 8, 9, 10). The nature of the data is suggestive that there exists an association between the elements of the 116 5.4. Six Cities Data Set multivariate response variable. We are interested in assessing the effects of the mother’s smoking status at onset (1=yes, 0=no) . 
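As in Chapters 3 and 4, the model fitted here is a multivariate probit, in which each binary response is the indicator that a latent Gaussian variable with correlation matrix R is positive. As a rough illustration of the latent-variable (data augmentation) step only, and not of the full PX-DA sampler actually used in the thesis, a single Gibbs-style sweep over the latent variables might be sketched as follows; the code is Python/SciPy and the array names are assumptions of the sketch.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_latent_probit(y, mu, R, Z, rng):
    """One data-augmentation sweep for the multivariate probit model:
    update each latent Z_ij from its univariate normal full conditional,
    truncated to (0, inf) if y_ij = 1 and to (-inf, 0) if y_ij = 0.

    y  : (n, J) binary responses
    mu : (n, J) linear predictors X_i beta for each subject
    R  : (J, J) correlation matrix of the latent variables
    Z  : (n, J) current latent values (updated in place and returned)
    """
    n, J = y.shape
    R_inv = np.linalg.inv(R)
    for i in range(n):
        for j in range(J):
            # Conditional N(m, v) of Z_ij given the other coordinates
            v = 1.0 / R_inv[j, j]
            others = [k for k in range(J) if k != j]
            m = mu[i, j] - v * R_inv[j, others] @ (Z[i, others] - mu[i, others])
            lo, hi = (0.0, np.inf) if y[i, j] == 1 else (-np.inf, 0.0)
            a, b = (lo - m) / np.sqrt(v), (hi - m) / np.sqrt(v)
            Z[i, j] = truncnorm.rvs(a, b, loc=m, scale=np.sqrt(v), random_state=rng)
    return Z
```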
Table 5.2 breaks down the number of “cases” representing wheezing symptoms, by age group and mother smoking status. From looking at these results, it appears that the percentages of wheezing children is higher for smoker mothers at each age. Child’s Age Mother Smoking Status 7 8 9 10 Smoker 32 (17.0%) 40 (21.3%) 36 (19.1%) 27 (14.4%) Non Smoker 55 (15.8%) 51 (14.6%) 49 (14.0%) 36 (10.0%) Total 87 (16.2%) 91 (16.9%) 85 (15.8%) 63 (11.7 %) Table 5.2: Breakdown of wheezing cases by age group and smoking status of mother at onset. The percentages in brackets are percentages with respect to each age group. As in Lawrence et al. (2008), we only use the mother’s smoking status at onset as a covariate (x1) as well as an intercept term: Pr(Yi,j = 1) = βj,0 + βj,1xi,1. For the prior (4.3) on β, we set Ψ = 105Ip. We fit the probit model using a saturated correlation matrix as in Lawrence et al. (2008) but also estimating the structure of association from the data. We generated N = 50, 000 samples and used 3, 000 samples as burn-in. Table 5.3 presents the posterior means for β and R. These results are compared to those presented in Lawrence et al. (2008). From the results in Table 5.3, we note that the results obtained using our proposed method and prior do not differ significantly from those presented in Lawrence et al. (2008). The estimates for the regression coefficients are robust to the correlation structure as well as the prior specification. The marginal posterior correlations are slightly smaller under the marginal uni- form priors which is consistent with the fact that Jeffreys’ prior tends to push marginal correlations to ±1. Furthermore, in order to compare the performance of the algorithm from a computational standpoint we look at 117 5.4. Six Cities Data Set Saturated Model Selection Jeffreys Marginally uniform Marginally uniform β11 -0.983 -0.922 -0.922 β12 0.011 0.032 0.031 β21 -1.029 -0.929 -0.983 β22 0.219 0.223 0.222 β31 -1.054 -1.012 -1.013 β32 0.166 0.181 0.182 β41 -1.235 -1.180 -1.181 β42 0.150 0.167 0.169 r12 0.592 0.498 0.495 r13 0.535 0.455 0.422 r14 0.573 0.487 0.485 r23 0.700 0.588 0.589 r24 0.571 0.494 0.469 r34 0.641 0.548 0.552 Table 5.3: Posterior means of the parameters of interest. The first two columns correspond to using the saturated model for the Jeffreys’s prior of Lawrence et al. (2008) and the marginally uniform prior. The last col- umn corresponds to using a marginally uniform prior and performing model selection. the plots of the autocorrelation function in Figure 5.8 and we compare that to Figure 1 in Lawrence et al. (2008). We can see that the proposed algo- rithm fairs well in comparison and shows significant autocorrelation to only about 10 to 15 lags compared to 25 to 30 lags for the algorithm in Lawrence et al. (2008) which in turn performs significantly better than the algorithm in Chib and Greenberg (1998). Because our method performs correlation selection, whereby the graph is sampled alongside the other parameters, we can obtain a posterior distri- bution over graph structures. Figure 5.9 depicts the most probable graphs that were obtained. The saturated model used by Chib and Greenberg (1998) and Lawrence et al. (2008) was the second most probable graph with posterior probability of 0.26. The most probable graph structure (43% of posterior mass) appears to be the one that suggests that the wheezing status of the child age 9 is conditionally independent of the wheezing status at age 118 5.4. 
Six Cities Data Set 0 5 10 15 20 −0.5 0 0.5 1 R 1 2 0 5 10 15 20 −0.5 0 0.5 1 R 1 3 0 5 10 15 20 −0.5 0 0.5 1 R 1 4 0 5 10 15 20 −0.5 0 0.5 1 R 2 3 0 5 10 15 20 −0.5 0 0.5 1 R 2 4 0 5 10 15 20 −0.5 0 0.5 1 R 3 4 Figure 5.8: Autocorrelation function for the marginal correlation parameters obtained from the Six Cities example under the saturated assumption. 119 5.4. Six Cities Data Set 4 1 3 2 Post= 0.43 4 1 3 2 Post= 0.26 4 1 3 2 Post= 0.25 Most Probable Structures 4 1 3 2 Post= 0.02 Figure 5.9: The four most probable graph structures, with their associated posterior probability 7, given the other variables in the model. Finally, in order to assess the fit of the model to the data, we use posterior predictive checks suggested in Gelman et al. (2003). Posterior predictive checks are useful for detecting systematic differences between the model and the observed data. They are done by generating replicated data sets from the posterior predictive distribution of the statistical model. The replicated data is then compared with the observed data set with respect to some features of interest. For this example, we took L = 25, 000 samples from the joint posterior distribution, and generated a data set Y l for each (βl, Rl), l = 1, . . . , L. The resulting binary data set is then compared with the observed data with respect to the following three quantities • % of kids that never experienced wheezing at any age. • % of kids that experienced wheezing at every age. • % of kids that did not experience wheezing at onset, however, once they have they have continued for more than 2 time points. Each of the above measures, T (y) was computed under the observed data and compared to the quantity obtained under the replicated artificial data. The Bayesian p-value is used to assess whether differences are statistically 120 5.4. Six Cities Data Set significant. The Bayesian p-value is given by Pr(T (Y rep) > T (y)|y) and is estimated from simulations using ∑L l=1 I ( T (Y l) > T (y) ) L. The p-values are computed for the case where the correlation structure is inferred from the data. The results in Table 5.4 suggest that the posterior predictive distribution produces data that are in reasonable agreement with the observed data. Indeed the p-values obtained are neither too close to 0 nor too close to 1. Test Variable T (y) 95% Intervals for T (Y rep) p-value % never exp. wheezing 66.1 [55.12 67.59] 0.08 % always exp. wheezing 3.3 [ 1.11 4.47] 0.21 % incident wheezing 2.4 [1.86 5.77] 0.93 Table 5.4: Posterior predictive statistical checks with 95% intervals and p- values, indicating that the posterior predictive distribution of the proposed model is not statistically different from the observed data 121 Chapter 6 Conclusion 6.1 Summary In this research work we have laid out a general and unified framework to model dependencies in multivariate data. More specifically, under a Bayesian paradigm, we considered the problem of estimating covariance ma- trices and covariate effects under different settings and for different data types. For continuous data, in Chapter 2, we extended the Bayesian linear model and proposed algorithms that allow the estimation of time-varying covariance matrices using the Wishart autoregressive process. The Wishart autoregressive process is the multivariate generalization of the χ2, a com- monly used prior for the variance parameter in the univariate case. Therefore this natural extension allows for easy interpretation of the model parameters and guarantees stationarity. 
However, we have seen that the algorithms needed to implement the WAR model are computationally expensive and do not scale well as the number of time steps, T, and the number of variables, J, increase simultaneously. This is particularly difficult in the presence of static parameters. This dynamic model for continuous data was extended in Chapter 4 to allow for the estimation of covariate effects using a particle Gibbs algorithm.

In Chapter 3, we discussed methods for estimating a static covariance that is unknown but fixed over time. We also discussed algorithms to estimate the static covariance or correlation matrix when the observed data are binary. Furthermore, we introduced covariance selection models and Gaussian graphical models as a way to estimate a sparse structure for the inverse correlation matrix of multivariate binary data using the probit model. This model was further developed in Chapter 4 by adding linear predictors and estimating their effect.

6.2 Future Work

Finally, here are some directions to consider in future work.

6.2.1 Numerical Evaluation of the Hypergeometric Function

One of the first roadblocks in scaling the sequential Monte Carlo algorithm for the estimation of a time-varying covariance matrix is the efficient and accurate numerical evaluation of the 0F1 hypergeometric function. Accurate evaluation of the hypergeometric function is needed for the central Wishart importance distribution used in the SMC algorithm. As we saw in the simulation results, the more informative the likelihood is, the more the posterior differs from the prior, and the worse the non-central Wishart prior becomes as a proposal. Future work in numerical analysis could speed up, and make feasible, the use of the proposal that accounts for the observed data in dimensions J larger than 1.

6.2.2 Better Proposals for Static Parameters

In Chapter 2 we discussed the need to devise better proposals for the estimation of the fixed autoregressive parameter M. The current random walk proposal can be improved by automatically tuning the step size. However, it might make sense from an application standpoint to come up with proposals that allow for an unconstrained as well as a structured matrix M. Lina et al. (2012) suggest stacking the rows of the matrix M and proposing from a multivariate Normal distribution; this could be a good starting point. Moreover, as we have seen in our simulations in the scalar case, the effect of ρ is not linear, in that values have to be very close to 1 before we are able to detect persistence in the covariance matrix. Perhaps one could consider a transformation to the real line (or, equivalently, a multiplicative random walk), which might be preferable in this case.

6.2.3 Improving Speed of Mixing and Avoiding Degeneracy

Particle MCMC algorithms are known to be computationally expensive, and therefore future work must include ways to speed up these algorithms to make them practical to use. Such methods could include more efficient implementations in a faster programming language, as well as parallelisation over particles (Lee et al., 2010).

In the case of particle Gibbs, methods that include a backward simulation step within the algorithm should be considered, as suggested in Whiteley (2010) and Lindsten and Schon (2011). The aim of these strategies is to avoid particle degeneracy and improve mixing.
Other alternatives that could be beneficial include ancestor sampling as suggested by Lindsten et al. (2012). 6.2.4 Other Possible Extensions Other extensions that could be attempted once the above mentioned issues have been addressed include extending the model to run for binary data in the dynamic case. Under this scenario, the response is a multidimensional longitudinal vector of binary observations for each subject. One can con- sider extending the parameter expansion for data augmentation (PXDA) algorithm to the dynamic case, moreover, learning a structured covariance structure over time, and possibly estimating the structure. This obviously adds to the complexity of the model and should not be attempted before resolving issues with mixing and speed. Other natural extensions to ac- commodate mixed scales as well as ordinal and count scales could also be considered. 124 Bibliography C. Andrieu and A. Doucet. Particle filtering for partially observed gaussian state space models particle filtering for partially observed gaussian state space models. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 64(4), 2002. C. Andrieu, A. Doucet, and R. Holenstein. Particle markov chain monte carlo methods. Journal of the Royal Statistical Society B, (72):1–33, July 2010. H. Armstrong, C.K. Carter, K. Wong, and R. Kohn. Bayesian covariance matrix estimation using a mixture of decomposable graphical models. Statistics and Computing, 19:303–316, 2009. M. Asai, M. McAleer, and J. Yu. Multivariate stochastic volatility: A review. Econometric Reviews, 25(2):145–175, 2006. J. Barnard, R. McCulloch, and X. Meng. Modeling covariance matrices in terms of standard deviations and correlations, with application to shrink- age. Statistica Sinica, 10:1281–1311, 2000. Luke Bornn and François Caron. Bayesian clustering in decomposable graphs. Bayesian Analysis, 6(4):829–846, 2011. C.J. Brouse. Monitoring nociception during general anesthesia with heart rate variability. PhD Thesis, 2013. W. J. Browne. Mcmc algorithms for constrained variance ma- trices. Comput. Stat. Data Anal., 50(7):1655–1677, April 2006. ISSN 0167-9473. doi: 10.1016/j.csda.2005.02.008. URL http://dx.doi.org/10.1016/j.csda.2005.02.008. 125 Bibliography M.A. Cameron. A note on functions of markov processes with an application to a sequence of χ22statistics. Journal of Applied Probability, 10(4):pp. 895–900, 1973. C. Carvalho. Structure and sparsity in high-dimensional multivariate anal- ysis. PhD Thesis, 2006. Carlos M. Carvalho and Mike West. Dynamic matrix-variate graphical mod- els. Bayesian Analysis, 2(1):69–98, 2007. S. Chib and E. Greenberg. Analysis of multivariate probit models. Biometrika, 85(2):347–361, 1998. S. Chib, Y. Omori, and M. Asai. Multivariate stochastic volatility. Technical report, 2007. A. Dawid and S. Lauritzen. Hyper markov laws in the statistical analysis of decomposable graphical models. The Annals of Statistics, 21(3):1272– 1317, sep 1993. A. Dempster. Covariance selection. Biometrics, 28(1):157–175, mar 1972. E. B. Fowlkes, A. E. Freeny, and J. M. Landwehr. Evaluating logistic models for large contingency tables. 83:611–622, 1988a. E. B. Fowlkes, A.E. Freeny, and J.M. Landwehr. Evaluating logistic models for large contingency tables. Journal of the American Sta- tistical Association, 83(403):611–622, 1988b. ISSN 01621459. URL http://www.jstor.org/stable/2289283. A. Gelman, John B. Carlin, Hal S. Stern, and D.B. Rubin. 
Bayesian Data Analysis, Second Edition (Chapman & Hall/CRC Texts in Statistical Sci- ence). Chapman and Hall/CRC, 2 edition, July 2003. J. Geweke. Efficient simulation from the multivariate normal and student– t distributions subject to linear constraints. Computing Science and Statistics: Proceedings of the Twenty–Third Symposium on the Interface, Alexandria, VA: American Statistical Association, pp., 1991. 126 Bibliography W.R. Gilks, G.O. Roberts, and S.K. Sahu. Adaptive markov chain monte carlo through regeneration. Journal of the American Statistical Associa- tion, 93(443):1045–1054, 1998. P. Giudici. Learning in graphical gaussian models. Bayesian Statistics 5: Proceedings of the Fifth Valencia International Meeting, June 5-9,1994, pages 621–628, 1996. P. Giudici and P.J. Green. Decomposable graphical gaussian model deter- mination. Biometrika, 86:785–801, 1999. C. Gourieroux, J. Jasiak, and R. Sufana. The wishart autoregressive process of multivariate stochastic volatility. Journal of Econometrics, 150(2):167– 181, June 2009. A. Gupta and D. Nagar. Matrix Variate Distributions. Chapman and Hall, 2000. P. Koev and A. Edelman. The efficient evaluation of the hypergeometric function of a matrix argument. Mathematics of Computation, pages 833– 846, 2006. S. Lauritzen. Graphical Models. Clarendon Press, Oxford, 1996. E. Lawrence, D. Bingham, C. Liu, and V.N.N. Nair. Bayesian inference for multivariate ordinal data using parameter expansion. Technometrics, 50 (2):182–191, February 2008. A. Lee, C. Yau, M.B. Giles, A. Doucet, and C.C. Holmes. On the utility of graphics cards to perform massively parallel simulation of advanced monte carlo methods. Journal of Computational and Graphical Statistics, 19(4):769–789, 2010. Ming Lina, Changjiang Liub, and Linlin Niua. Bayesian estimation of wishart autoregressive stochastic volatility model. 2012. F Lindsten and T. B Schon. On the use of backward simulation in the particle gibbs sampler, 2011. 127 Bibliography F. Lindsten, M.I. Jordan, and T.B. Schon. Ancestor sampling for particle gibbs. arXiv preprint arXiv:1210.6911, 2012. C. Liu. Bayesian analysis of multivariate probit models - discussion on the art of data augmentation by van dyk and meng. Journal of Computational and Graphical Statistics, 10:75–81, 2001. D. Madigan and A.E. Raftery. Model selection and accounting for model un- certainty in graphical models using occam’s window. Journal of the Amer- ican Statistical Association, 89(428):1535–1546, 1994. ISSN 01621459. URL http://www.jstor.org/stable/2291017. Jimmy Olsson and Tobias Ryden. Rao-blackwellization of particle markov chain monte carlo methods using forward filtering backward sampling. Signal Processing, IEEE Transactions on, 59(10):4606–4619, 2011. A. Philipov and M.E. Glickman. Multivariate Stochastic Volatility Via Wishart Random Processes. Journal of Business and Economic Statistics, 24(3):313–327, 2006. Ser-Huang Poon and Clive WJ Granger. Forecasting volatility in financial markets: A review. Journal of Economic Literature, 41(2):478–539, 2003. C. Robert. Simulation of truncated normal variables. Statistics and Com- puting, 5(2):121–125, 1995. G.O. Robert. Adaptive markov chain monte carlo through regeneration. Chapman and Hall/CRC, 1996. A. Tabet. Bayesian inference in the multivariate probit model. Master’s Thesis, 2007. A. Talhouk, A. Doucet, and K. Murphy. Efficient bayesian inference for mul- tivariate probit models with sparse inverse correlation matrices. 
Journal of Computational and Graphical Statistics, 21(3):739–757, 2012. 128 Bibliography C. Tarantola. MCMC model determination for discrete graphical models. Statistical Modeling, 4(1):39–61, 2004. doi: 10.1191/1471082X04st063oa. URL http://smj.sagepub.com/cgi/content/abstract/4/1/39. T. Tokuda, B. Goodrich, I. Van Mechelen, A. Gelman, and F. Tuerlinckx. Visualizing distributions of covariance matrices. 2011. E. Webb and J. Forster. Bayesian model determination for multivariate ordinal and binary data. Computational Statistics and Data Analysis, 52: 2632–2649, January 2008. N Whiteley. Discussion on particle markov chain monte carlo methods. Journal of the Royal Statistical Society B, (72):306–307, July 2010. Nick Whiteley, Christophe Andrieu, and Arnaud Doucet. Efficient bayesian inference for switching state-space models using discrete particle markov chain monte carlo methods. arXiv preprint arXiv:1011.2437, 2010. F. Wong, C. Carter, and R. Kohn. Efficient estimation of covariance selection models. Biometrika, 90(4):809–830, 2003. X. Zhang, W.J. Boscardin, and T. Belin. Sampling correlation matrices in bayesian models with correlated latent variables. Journal of Computa- tional and Graphical Statistics, 15(4):880–896, December 2006. 129 Appendix A Distributions We adopt the convention of using small letters to denote scalars, capital letters to denote matrices and vectors. A.1 The Gaussian Distribution A.1.1 The Univariate Representation If Z1:n i.i.d∼ N (µ, σ2) is an n-dimensional vector with scalar mean µ and variance σ2 then f(Z | µ, σ2) = (2pi)−n/2(σ2)−n/2 exp ( −1 2 σ−2sst ) (A.1) where sst = ∑n i=1(Zi − µi)2 is the sum of squared distance from the mean. The log likelihood is given by l(µ, σ−2; z1:n) = −n 2 log(2pi) + n log σ−2 − 1 2 σ−2sst (A.2) Random Number Generation To generate n samples with mean µ and variance σ2 • Sample z∗ ∼ N (0, 1) using the randn function in matlab. • Compute z = σz∗ + µ. 130 A.1. The Gaussian Distribution A.1.2 Multivariate and Matrix-Variate Gaussian Distribution If Z ∼ NJ(µ,Σ) with mean vector µ (J × 1) and variance-covariance matrix Σ (J × J), then the probability density function is given by f(Z | µ,Σ) = 1 (2pi)nJ/2 |Σ−1|n/2etr ( −1 2 Σ−1(Z − µ)(Z − µ)′) ) (A.3) The log likelihood is given by l(Z | µ,Σ) = −nJ 2 log(2pi) + n 2 log(|Σ−1|) + tr ( −1 2 Σ−1(Z − µ)(Z − µ)′ ) (A.4) The Matrix-Variate Normal Distribution The matrix Variate Nor- mal distribution is the generalization of multivariate Gaussian vectors to multivariate Gaussian matrices. A random matrix X (n × J) is said to follow a matrix Normal distribution X ∼ MN n×J (X;Mn×J ,ΣJ×J ,Ωn×n) with pdf is given by f(X |M,Σ,Ω) = |Σ −1|n/2|Ω|−J/2 (2pi)nJ/2 etr { −1 2 Σ−1(X −M)′Ω−1(X −M) } (A.5) where M is n× J , Σ is J × J and Ω is n× n. In this case, M is the mean of the matrix X, Σ encodes the covariance between the columns of X and Ω is the covariance between the rows of X. The matrix normal is related to the multivariate normal in the follow- ing way, for X ∼ MN n×J (X;Mn×J ,ΣJ×J ,Ωn×n) if and only if vec(X) ∼ N (vec(M),Σ⊗Ω) where vec(X) is the vector notation obtained by stacking the columns of X and ⊗ is the kronecker product. The log likelihood is given by l(Σ,Ω,M ;X) = −nJ 2 log(2pi) + n 2 log|Σ−1| − J 2 log |Ω|+ tr { −1 2 Σ−1(X −M)′Ω−1(X −M) } (A.6) 131 A.2. The χ2 Distribution Random Variable Generation To generate a random variable from a n × J matrix-variate Gaussian Variable with mean M(n×J) and covariance ΣJ and In, we use the built-in matlab functions mvnrnd. 
• Generate X=mvnrnd(M,Σ,1) A.2 The χ2 Distribution A.2.1 The Central χ2 If w ∼ χ2(d, s) with degree of freedom d and scale parameter s then f(w | d, s) = 1 2d/2Γ(d/2) 1 s exp ( −1 2 w s )(w s )d/2−1 (A.7) The log p.d.f. is given by l(d, s;w) = −d 2 log(2)− log(Γ(d/2)) + d 2 log (w s ) − log(w)− w/2s (A.8) Expectation The expectation of a standard χ2 variable is equal to its degrees of free- dom. To obtain the expectation of a scaled χ2 we can start with the Gaussian representation. Let x1:d iid∼ N (0, s), the ∑di=1( xi√s)2 ∼ χ2(d) with expected value d. E ( d∑ i=1 ( xi√ s )2) = d (A.9) E ( d∑ i=1 (xi) 2 ) = sd (A.10) Random Number Generation To Generate n random variables according to χ2(d, S) • Generate w′ ∼ χ2(d) using matlab function chi2rnd(d, n). 132 A.2. The χ2 Distribution • Compute w = sw′ A.2.2 The Non-central χ2 If w ∼ χ2(d, s, θ) with degrees of freedom d, non-centrality parameter θ and scale parameter s, then f(w | d, θ, s) = 1 2d/2Γ(d/2) 1 s exp ( −1 2 ( θ + w s ))(w s )d/2−1 0F1 ( ; d/2; θ 4 w s ) (A.11) Here we use the representation of the non-central χ2 variable w as the sum of squared Gaussian variables w = ∑d i=1(xi) 2, where x ∼ N (µ, s) and θ is related to the mean of the Gaussian variable and is defined as θ = d∑ i=1 ( µi√ s )2 the log p.d.f given by l(d, s;w) = −d 2 log(2)− log(Γ(d/2)) + d 2 log (w s ) − log(w)− 1 2 ( θ + w s ) + log ( 0F1 ( ; d/2; θ 4 w s )) (A.12) Expectation E ( d∑ i=1 ( xi√ s )2) = d+ θ (A.13) E ( d∑ i=1 (xi) 2 ) = s(d+ θ) (A.14) Note that when the non-centrality parameter θ = 0 the non-central χ2 reduces to the central χ2 Random Number Generation 133 A.3. The Wishart Distribution To Generate n random variables according to a non-central χ2(d, S, θ), a matlab function can be readily used. This function uses the Poisson weighted mixture representation. • Generate w′ ∼ χ2(d, I, λ) using matlab function chi2rnd(d, λ, n). • Compute w = sw′, to obtain a scaled variable. A.3 The Wishart Distribution The Central Wishart Distribution If a J × J covariance matrix Σ ∼ WJ(d, S), with degrees of freedom d and scale parameter S, the the p.d.f is given by{ 2dJ/2ΓJ (d/2) |S|d/2 }−1 |Σ| 12 (d−J−1)etr(−1 2 S−1Σ ) (A.15) Where ΓJ is the multivariate Gamma given by ΓJ(a) = pi 1 4 J(J−1) J∏ i=1 Γ [ a− 1 2 (i− 1) ] (A.16) There exist several parameterization of the Wishart distribution. The one that we most frequently use throughout (unless otherwise noted) is the result of taking the outer product of two Gaussian vectors. The log pdf is given by l(d, S; Σ) = −dJ 2 log(2)− ΓJ ( d 2 ) − d 2 log |S|+ (d− J − 1) 2 log |Σ| + tr { −1 2 S−1Σ } (A.17) From Gupta and Nagar (2000) Theorem 3.2.2 page 88: LetX ∼MN dJ(0, S, Id) and define Σ = X ′X, then for d ≥ J , Σ ∼ WJ(d, S) with density given by (A.15). 134 A.3. The Wishart Distribution Another theorem of importance is the invariance of the Wishart distri- bution (Theorem 3.3.1 p 90 Gupta and Nagar (2000)). If Σ ∼ WJ(d, S) then for any A(J × J) non-singular matrix, AΣA′ ∼ WJ(d,ASA′). Expected Values Let Σ = (σij) ∼ WJ(d, S), then E (Σ) = dS, E (σij) = dsij (A.18) Random Variable Generation To generate a random matrix Σ ∼ WJ(d, S) 1. Generate d observations from X ∼ NdJ(0, IJ) 2. Compute Z = X ′X. In this case, Z ∼ WJ(d, IJ). 3. Let S = AA′, where A is a lower triangular matrix obtained using the Cholesky decomposition. We can use the invariance property and compute Σ = AZA′. The distribution of which will be Σ ∼ WJ(d, S) as desired. 
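The three-step recipe above can be written compactly in code. The sketch below uses Python/NumPy rather than the Matlab built-ins referenced elsewhere in this appendix, and the function name wishart_rvs is illustrative; scipy.stats.wishart offers an equivalent ready-made sampler.

```python
import numpy as np

def wishart_rvs(d, S, rng):
    """Draw Sigma ~ W_J(d, S) by the outer-product construction:
    generate d i.i.d. N_J(0, I_J) rows, form Z = X'X ~ W_J(d, I_J), then use
    the invariance property with the Cholesky factor S = A A' to obtain
    Sigma = A Z A' ~ W_J(d, S).
    """
    J = S.shape[0]
    X = rng.standard_normal((d, J))   # d rows of N_J(0, I_J)
    Z = X.T @ X                       # W_J(d, I_J)
    A = np.linalg.cholesky(S)         # lower-triangular factor, S = A A'
    return A @ Z @ A.T

# Example: one draw with d = 10 degrees of freedom and a 3 x 3 scale matrix
rng = np.random.default_rng(1)
S = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
Sigma = wishart_rvs(10, S, rng)
```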
The Inverse Wishart Distribution A matrix Σ(J × J) is said to have an inverse Wishart distribution, denoted by IWJ(d, ψ) with degrees of freedom d an location matrix ψ(p× p) if it’s pdf is given by { 2(d−J−1)J/2ΓJ (d/2) |ψ|−(d−J−1)/2 }−1 |Σ|− 12 (d)etr(−1 2 Σ−1ψ ) (A.19) Let Σ ∼ IWJ(d, ψ), then Σ−1 ∼ WJ(d− J − 1, ψ−1). The Standard Inverse Wishart Distribution There are several parametrization of the inverse Wishart distribution. We will list below the ones that we use in this work. 135 A.3. The Wishart Distribution 1. The parametrization used in Gupta and Nagar (2000) Let Σ ∼ IW (m, IJ), then fJ(Σ|ν) ∝ |Σ|− 12 (m) exp(−1 2 tr(Σ−1)) (A.20) With Expectation: E(Σ) = IJ m− 2J − 2 The Matlab function invwishirnd from the MCMC tool box (http://www.mathworks.com/matlabcentral/fileexchange/ by David Shera), implements sampling from this distribution. 2. The parametrization used in Barnard et al. (2000) Let Σ ∼ IW (ν, IJ), then fJ(Σ|ν) ∝ |Σ|− 12 (ν+J+1) exp(−1 2 tr(Σ−1)) (A.21) With Expectation: E(Σ) = IJ ν − J − 1 This corresponds to A.20, with m = ν+J+1. This parametrization is implemented in the matlab function iwishrnd in the STAT toolbox. 3. The parametrization used in Dawid and Lauritzen (1993) Let Σ ∼ IW (δ, IJ), then fJ(Σ|δ) ∝ |Σ|− 12 (δ+2J) exp(−1 2 tr(Σ−1)) (A.22) With Expectation: E(Σ) = IJ δ − 2 This corresponds to A.20, with δ = ν − J + 1 . 136 A.3. The Wishart Distribution In the one dimensional case all three parametrization reduce to an inverse Chi Square: f(σ2|v) ∝ (σ)− 12 (v+2) exp(− 1 2σ2 ) (A.23) In this case we could see that ν and δ is parametrization 2 and 3 are equivalent and they are equal to v, and in parametrization 1, m = v + 2. The Non-Central Wishart Distribution Σ ∼ WJ(d, S,Θ) follows an non central Wishart distribution with degrees of freedom d, a scale parameter S and a non-centrality parameter Θ, with pdf given by { 2dJ/2ΓJ (d/2) |S|d/2 }−1 |Σ| 12 (d−J−1)etr{−1 2 Θ } etr ( −1 2 S−1Σ ) 0F1 ( ; 1 2 d; 1 4 ΘS−1Σ ) (A.24) where ΓJ (d/2) = ∫ A0 exp{tr(−A)}(detA)(d−J−1)/2dA is the multi- dimen- sional gamma function and 0F1 is the hypergeometric function of matrix argument, and the density is defined on positive definite matrices. The hypergeometric function has a series expansion: 0F1(; d 2 ; 1 4 ΘS−1Σ) = ∞∑ p=0 ∑ l Cl( 1 4ΘS −1Σ) (d/2)lp! (A.25) where Σl denotes summation over all partitions l = (p1, . . . , pm), p1 ≥ · · · ≥ pm ≥ 0 of p integers, (d/2)l is the generalized hypergeometric coefficient (d/2)l = ∏m i=1(d/2− (i− 1)/2)p1 with (a)p1 = a(a+ 1) . . . (a+ pi − 1), and Cl( 1 4ΘS −1Σ) is the zonal polynomial associated with partition l. The log pdf of the non-centered Wishart is given by − dJ 2 log(2)−ΓJ ( d 2 ) − d 2 log |S|+ (d− J − 1) 2 log |Σ|− 1 2 tr { Θ + S−1Σ } + log { 0F1 ( ; 1 2 d; 1 4 ΘS−1Σ )} (A.26) 137 A.3. The Wishart Distribution This parametrization of the non-central Wishart arises by taking the outer product of Gaussians. From Gupta and Nagar (2000) Theorem 3.5.1 p 114. Let Xd×J ∼ MN dJ(Md×J , SJ , Id), then ΣJ = X ′X ∼ WJ(d, S,Θ) with Θ = S−1M ′M . 
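A sampling sketch that follows this outer-product construction directly is given below (Python/NumPy, with an illustrative function name). It draws the d Gaussian rows with means given by the rows of M and common covariance S, forms X'X, and its Monte Carlo average can be checked against the expectation dS + M'M given further down.

```python
import numpy as np

def noncentral_wishart_rvs(d, S, M, rng):
    """Draw Sigma ~ W_J(d, S, Theta) with Theta = S^{-1} M'M using the
    construction of Theorem 3.5.1: X is d x J with independent rows
    X_i ~ N_J(M_i, S), and Sigma = X'X.

    d : degrees of freedom (number of Gaussian rows)
    S : (J, J) scale matrix
    M : (d, J) mean matrix of X
    """
    A = np.linalg.cholesky(S)                  # S = A A'
    E = rng.standard_normal((d, S.shape[0]))   # rows ~ N_J(0, I_J)
    X = M + E @ A.T                            # rows ~ N_J(M_i, S)
    return X.T @ X

# Sanity check against the expectation E(Sigma) = d S + M'M
rng = np.random.default_rng(2)
J, d = 2, 8
S = np.array([[1.0, 0.4], [0.4, 2.0]])
M = np.full((d, J), 0.5)
avg = np.mean([noncentral_wishart_rvs(d, S, M, rng) for _ in range(20000)], axis=0)
# avg should be close to d * S + M.T @ M
```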
Furthermore, Theorem 3.5.2 gives a useful result: Let X ∼MN dJ(M,SJ , Id) and let A(q×J) be any matrix of rank q ≤ J Then AX ′XA′ ∼ Wq(d,ASA′, (ASA′)−1AM ′MA′) (A.27) Expectation Let Σ ∼ WJ(d, SJ ,Θ) be a non-central Wishart, with Θ = S−1M ′M E(Σ) = dSJ +M ′M (A.28) Random Number Generation To Generate a random variable W ∼ WJ (d, SJ ,Θ) with Θ = S−1M ′M We can write W = S1/2 ( J∑ i=1 (Xi + m̃i)(Xi + m̃i) ′ + V ) S1/2 where V ∼ WJ(d − J, IJ) and X ∼ Nj(0, I), and m̃i are the columns of a M̃ = chol(S−1/2M ′MS−1/2). Under this formulation we can see that 138 A.3. The Wishart Distribution E(W ) = E [ S1/2 ( J∑ i=1 (Xi + m̃i)(Xi + m̃i) ′ + V ) S1/2 ] (A.29) = S1/2E [ J∑ i=1 ( XiX ′ i +Xim̃ ′ i + m̃iX ′ i + m̃im̃ ′ i ) + V ] S1/2(A.30) = S1/2 [ E ( J∑ i=1 XiX ′ i ) + E(V ) + J∑ i=1 m̃im̃ ′ i ] S1/2 (A.31) = S1/2 ( JI + (d− J)I + J∑ i=1 m̃im̃ ′ i ) S1/2 (A.32) = S1/2 ( dI + M̃ ′M̃ ) S1/2 = dS +M ′M (A.33) 139 Appendix B Conjugate analysis B.1 Normal Wishart Let Z | Σ−1 ∼ N (0,Σ−1) and Σ−1 ∼ W(d0, S0), the joint distribution is given by pi(Z,Σ−1) = 1 (2pi)nJ/2 |Σ−1|n/2etr ( −1 2 ZZ ′Σ−1 ) × 1 C0 |Σ−1| 12 (d0−J−1)etr ( −1 2 S−10 Σ −1 ) (B.1) where C0 = 2 d0J/2ΓJ (d0/2) |S0|d0/2 is the normalizing constant for the prior distribution and d0 and S0 are the hyperparameters for the degrees of freedom and the scale respectively. The prior is conjugate to the likelihood therefore the posterior distribu- tion is given by pi(Σ−1 | Z) ∼ W(dn, Sn) (B.2) with dn = d0 +n and Sn = (S −1 0 +ZZ ′)−1. The marginal likelihood is given in closed form by p(Z) = Cn C0 1 (2pi)nJ/2 (B.3) = ΓJ (dn/2) |Sn|dn/2 ΓJ (d0/2) |S0|d0/2 1 pinJ/2 (B.4) In the case, where we have multiple time points, the marginal likelihood 140 B.2. Normal Wishart with Covariates decomposes p(Z1, . . . , ZT ) = p(Z1, . . . , ZT−1)p(ZT | Z1, . . . , ZT−1) (B.5) = p(Z1) T∏ t=2 p(Zt | Zt−1) (B.6) In the special case where Σ−11 , . . . ,Σ −1 T are independent and identically dis- tributed (This is when ρ = 0) p(Z1, . . . , ZT ) = T∏ t=1 p(Zt) (B.7) = T∏ t=1 ΓJ ((d+ n)/2) |(S−10 + ZtZ ′t)−1|(d+n)/2 ΓJ (d0/2) |S0|d0/2 1 pinJ/2 (B.8) In the other special case where Σ−11 = · · · = Σ−1T , corresponding to the WAP case when ρ = 1, the time element becomes irrelevant and the data can be stacked and the joint distribution of data and parameter can be written as pi(Z,Σ−1) = 1 (2pi)nTJ/2 |Σ−1|nT/2etr ( −1 2 ZZ ′Σ−1 ) × 1 C0 |Σ−1| 12 (d0−J−1)etr ( −1 2 S−10 Σ −1 ) (B.9) which also results in a Wishart with the following hyperparameters: dn = d0 + nT and Sn = (S −1 0 + ZZ ′)−1, and where Z is the rows of Z1, . . . , Zt stacked. The marginal likelihood in this case has the same as (B.4) but with the new hyperparameters. B.2 Normal Wishart with Covariates The multivariate linear model is defined by 141 B.2. Normal Wishart with Covariates Z = Xβ +  (B.10)  Z11 . . . Z1J ... . . . ... Zn1 . . . ZnJ  =  x11 . . . x1p ... . . . ... xn1 . . . xnp ×  β11 . . . β1J ... . . . ... βp1 . . . βpJ +  11 . . . 1J ... . . . ... n1 . . . nJ  (B.11) Where Z is a n × J matrix of continuous responses, X is a n × p design matrix observed without error, β is a p×J matrix of regression coefficients and  is a n×J matrix of Gaussian error terms. 
In this case, Zn×J ∼ MNn,J (Xβn×p,ΣJ , In) follows a matrix-variate Gaussian density given by f(Z | β,Σ) = (2pi)− 12Jn|In|− J2 |ΣJ |−n2 etr { −1 2 Σ−1J (Z −Xβ)′I−1n (Z −Xβ) } (B.12) where Σ is a J × J covariance matrix representing the association between the errors of the different elements of the response and In is an n×n matrix represent- ing the association between the different observations and is the identity because observations are assumed to be independent and identically distributed. We use a conjugate prior for β which is also a matrix-variate Normal, βp×J | ΣJ ∼MNn,J (0p×J ,ΣJ ,Ψp) with a p× J mean matrix of zeros and Ψp is a p× p matrix of hyperparamters representing respectively the prior belief on the location and the dispersion of the distribution of β, with density given by (2pi)− 1 2pJ |Ψ|− J2 |Σ|− p2 etr { −1 2 Σ−1J β ′Ψ−1p β } (B.13) and ΣJ ∼ IW(2, I), whereI is the identity matrix using Lauritzen parametriza- tion as in paper is proportional to |Σ|−(2+2J)/2etr { −1 2 Σ−1 } (B.14) We would like to compute the posterior distribution: pi(Σ, β | Z) ∝ f(Z | Σ, β)pi(β | Σ)pi(Σ) (B.15) Putting it all together, we get 142 B.2. Normal Wishart with Covariates pi(Σ, β | Z) ∝ |Σ|−n+p+2+2J2 |Ψ|− J2 etr { −1 2 Σ−1J [ (Z −Xβ)′(Z −Xβ) + β′Ψ−1p β + IJ ]} (B.16) The term (Z −Xβ)′(Z −Xβ) + β′Ψ−1β + I inside the square brackets can be re-written by completing the square as (β −M)′Ξ−1(β −M)−M ′Ξ−1M + C (B.17) where M = ΞX ′Z and Ξ−1 = X ′X + Ψ−1 and C = Z ′Z + IJ . Therefore we are able to re-write equation (B.16) as: pi(Σ, β | Z) ∝ |Ψ|− J2 |K| J2 | [ Σ|− (n+2+2J)2 etr { −1 2 Σ−1 [ C −M ′Ξ−1M]}]︸ ︷︷ ︸ ∝IWJ (Σ;2+n,Z′Z+I−M ′Ξ−1M) × [ |Σ|− p2 |K|− J2 etr { −1 2 Σ−1 [ (β −M)′Ξ−1(β −M)]}]︸ ︷︷ ︸ ∝MNJp(β;M,Σ,Ξ) From the previous expression , we can see that we are able to integrate out β. This leads us to pi(Σ | Z) ∼ IWJ(2 + n,Z ′Z + I −M ′Ξ−1M) (B.18) and pi(β | Z,Σ) ∼MN pJ(M,Σ,Ξ) (B.19) 143 Appendix C Heart Rate Data Results C.1 Bolus Event 1 Data Results 50 100 150 200 250 300 350 400 −20 0 20 40 60 80 100 120 time Bolus H e a r t R a t e Nociceptive Period Anti Nociceptive Period (a) The raw and centered heart rate data −1 0 1 2 3 4 N oc ic ep tio n log (σ) vs Time −1 0 1 2 3 4 time A nt i− N oc ic ep tio n (b) Posterior distribution of log(σ) in the nociception and anti-nociception phase. 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 100 200 300 400 500 Nociceptive Period Median −− 0.943 95 CI −− [0.684,0.994] Acc. Prob −− 24.9% (c) Histogram and traceplot of the pos- terior distribution of ρ in the nocicep- tion phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Anti−Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 500 1000 1500 2000 2500 Anti−Nociceptive Period Median −− 0.995 95 CI −− [0.907,1] Acc. Prob −− 23.8% (d) Histogram and traceplot of the posterior distribution of ρ in the anti- nociception phase Figure C.1: Results for bolus event 1 144 C.2. Bolus Event 2 Data Results 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 NC ANC Posterior of ρ Figure C.2: Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 1. 
C.2 Bolus Event 2 Data Results 50 100 150 200 250 300 350 400 −20 0 20 40 60 80 100 120 time Bolus Heart Rate Data for Bolus Event 2 Anti Nociceptive Period Nociceptive Period (a) The raw and centered heart rate data (b) Posterior distribution of log(σ) in the nociception and anti-nociception phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 200 400 600 800 1000 Nociceptive Period Median −− 0.985 95 CI −− [0.859,0.999] Acc. Prob −− 26% (c) Histogram and traceplot of the pos- terior distribution of ρ in the nocicep- tion phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Anti−Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 500 1000 1500 Anti−Nociceptive Period Median −− 0.963 95 CI −− [0.302,0.999] Acc. Prob −− 29.1% (d) Histogram and traceplot of the posterior distribution of ρ in the no- ciception phase Figure C.3: Results for Bolus Event 2 145 C.2. Bolus Event 2 Data Results 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 NC ANC Posterior of ρ Figure C.4: Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 2. 146 C.3. Bolus Event 3 Data Results C.3 Bolus Event 3 Data Results 50 100 150 200 250 300 350 400 −20 0 20 40 60 80 100 120 time Bolus Heart Rate Data for Bolus Event 3 Anti Nociceptive Period Nociceptive Period (a) The raw and centered heart rate data (b) Posterior distribution of log(σ) in the nociception and anti-nociception phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 100 200 300 Nociceptive Period Median −− 0.979 95 CI −− [0.93,0.999] Acc. Prob −− 25.1% (c) Histogram and traceplot of the pos- terior distribution of ρ in the nocicep- tion phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Anti−Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 100 200 300 400 500 Anti−Nociceptive Period Median −− 0.925 95 CI −− [0.145,0.997] Acc. Prob −− 37.3% (d) Histogram and traceplot of the posterior distribution of ρ in the no- ciception phase Figure C.5: Results for bolus event 3 147 C.4. Bolus Event 4 Data Results 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 NC ANC Posterior of ρ Figure C.6: Boxplot of the posterior of ρ in the nociceptive state (NC) and anti-nociceptive (ANC) for bolus event 3. C.4 Bolus Event 4 Data Results 50 100 150 200 250 300 350 400 −20 0 20 40 60 80 100 120 time Bolus Heart Rate Data for Bolus Event 4 Anti Nociceptive Period Nociceptive Period (a) The raw and centered heart rate data (b) Posterior distribution of log(σ) in the nociception and anti-nociception phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 500 1000 1500 2000 Nociceptive Period Median −− 0.998 95 CI −− [0.974,1] Acc. Prob −− 35% (c) Histogram and traceplot of the pos- terior distribution of ρ in the nocicep- tion phase 500 1000 1500 2000 2500 3000 3500 4000 4500 0 0.2 0.4 0.6 0.8 1 Anti−Nociceptive Period 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 500 1000 1500 2000 2500 Anti−Nociceptive Period Median −− 0.996 95 CI −− [0.838,1] Acc. Prob −− 25.1% (d) Histogram and traceplot of the posterior distribution of ρ in the no- ciception phase Figure C.7: Results for bolus event 4 148 C.4. 
Figure C.8: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 4.

C.5 Bolus Event 5 Data Results

Figure C.9: Results for bolus event 5. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.941, 95% CI [0.317, 0.998], acceptance probability 29.9%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.782, 95% CI [0.0694, 0.999], acceptance probability 59.1%).

Figure C.10: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 5.

C.6 Bolus Event 6 Data Results

Figure C.11: Results for bolus event 6. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.926, 95% CI [0.506, 0.994], acceptance probability 26.7%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.99, 95% CI [0.819, 1], acceptance probability 21%).

Figure C.12: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 6.

C.7 Bolus Event 7 Data Results
Figure C.13: Results for bolus event 7. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.945, 95% CI [0.667, 0.994], acceptance probability 28.7%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.902, 95% CI [0.658, 0.977], acceptance probability 30.1%).

Figure C.14: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 7.

C.8 Bolus Event 8 Data Results

Figure C.15: Results for bolus event 8. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.934, 95% CI [0.694, 0.984], acceptance probability 23.5%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.99, 95% CI [0.941, 1], acceptance probability 20%).

Figure C.16: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 8.

C.9 Bolus Event 9 Data Results

Figure C.17: Results for bolus event 9. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.897, 95% CI [0.49, 0.99], acceptance probability 31.6%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.978, 95% CI [0.904, 0.998], acceptance probability 20.5%).

Figure C.18: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 9.

C.10 Bolus Event 10 Data Results
Figure C.19: Results for bolus event 10. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.968, 95% CI [0.782, 0.998], acceptance probability 24.6%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.986, 95% CI [0.875, 1], acceptance probability 23.9%).

Figure C.20: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 10.

C.11 Bolus Event 11 Data Results

Figure C.21: Results for bolus event 11. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.956, 95% CI [0.714, 0.995], acceptance probability 21.7%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.998, 95% CI [0.98, 1], acceptance probability 36.4%).

Figure C.22: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 11.

C.12 Bolus Event 12 Data Results

Figure C.23: Results for bolus event 12. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.949, 95% CI [0.802, 0.991], acceptance probability 24.1%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.975, 95% CI [0.825, 0.999], acceptance probability 25.2%).

Figure C.24: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 12.
C.13 Bolus Event 13 Data Results

Figure C.25: Results for bolus event 13. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.912, 95% CI [0.768, 0.971], acceptance probability 24%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.9, 95% CI [0.758, 0.96], acceptance probability 24.8%).

Figure C.26: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 13.

C.14 Bolus Event 14 Data Results

Figure C.27: Results for bolus event 14. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.989, 95% CI [0.871, 1], acceptance probability 24.8%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.984, 95% CI [0.844, 1], acceptance probability 24.2%).

Figure C.28: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 14.

C.15 Bolus Event 15 Data Results

Figure C.29: Results for bolus event 15. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.987, 95% CI [0.947, 0.999], acceptance probability 30.9%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.996, 95% CI [0.316, 1], acceptance probability 27%).
Figure C.30: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 15.

C.16 Bolus Event 16 Data Results

Figure C.31: Results for bolus event 16. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.966, 95% CI [0.734, 1], acceptance probability 21%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.895, 95% CI [0.735, 0.962], acceptance probability 27.6%).

Figure C.32: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 16.

C.17 Bolus Event 17 Data Results

Figure C.33: Results for bolus event 17. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.994, 95% CI [0.966, 1], acceptance probability 22.4%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.921, 95% CI [0.58, 0.994], acceptance probability 31.9%).

Figure C.34: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 17.

C.18 Bolus Event 18 Data Results
Figure C.35: Results for bolus event 18. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.884, 95% CI [0.616, 0.981], acceptance probability 33.8%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.993, 95% CI [0.965, 1], acceptance probability 27.6%).

Figure C.36: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 18.

C.19 Bolus Event 19 Data Results

Figure C.37: Results for bolus event 19. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.893, 95% CI [0.69, 0.967], acceptance probability 25.5%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.962, 95% CI [0.794, 0.996], acceptance probability 24.7%).

Figure C.38: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 19.

C.20 Bolus Event 20 Data Results

Figure C.39: Results for bolus event 20. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.878, 95% CI [0.4, 0.972], acceptance probability 33%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.96, 95% CI [0.312, 0.998], acceptance probability 25.8%).

Figure C.40: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 20.
C.21 Bolus Event 21 Data Results

Figure C.41: Results for bolus event 21. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.886, 95% CI [0.55, 0.993], acceptance probability 31.1%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.887, 95% CI [0.36, 0.985], acceptance probability 33.2%).

Figure C.42: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 21.

C.22 Bolus Event 22 Data Results

Figure C.43: Results for bolus event 22. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.988, 95% CI [0.934, 0.999], acceptance probability 24.3%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.995, 95% CI [0.982, 0.999], acceptance probability 28.9%).

Figure C.44: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 22.

C.23 Bolus Event 23 Data Results
Figure C.45: Results for bolus event 23. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.964, 95% CI [0.305, 0.999], acceptance probability 24.7%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.937, 95% CI [0.502, 0.994], acceptance probability 27.7%).

Figure C.46: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 23.

C.24 Bolus Event 24 Data Results

Figure C.47: Results for bolus event 24. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.998, 95% CI [0.939, 1], acceptance probability 20.1%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.994, 95% CI [0.863, 1], acceptance probability 22.9%).

Figure C.48: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 24.

C.25 Bolus Event 25 Data Results

Figure C.49: Results for bolus event 25. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 1, 95% CI [0.917, 1], acceptance probability 27.7%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.999, 95% CI [0.758, 1], acceptance probability 25%).

Figure C.50: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 25.

C.26 Bolus Event 26 Data Results
Figure C.51: Results for bolus event 26. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.999, 95% CI [0.994, 1], acceptance probability 25.3%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.999, 95% CI [0.587, 1], acceptance probability 36.4%).

Figure C.52: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 26.

C.27 Bolus Event 27 Data Results

Figure C.53: Results for bolus event 27. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.996, 95% CI [0.989, 0.999], acceptance probability 23.5%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.965, 95% CI [0.849, 0.998], acceptance probability 27.7%).

Figure C.54: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 27.

C.28 Bolus Event 28 Data Results

Figure C.55: Results for bolus event 28. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.979, 95% CI [0.866, 0.998], acceptance probability 28.2%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.999, 95% CI [0.991, 1], acceptance probability 25.9%).

Figure C.56: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 28.
C.29 Bolus Event 29 Data Results

Figure C.57: Results for bolus event 29. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.996, 95% CI [0.443, 1], acceptance probability 51%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.999, 95% CI [0.44, 1], acceptance probability 32.8%).

Figure C.58: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 29.

C.30 Bolus Event 30 Data Results

Figure C.59: Results for bolus event 30. (a) Raw and centered heart rate data; (b) posterior distribution of log(σ) in the nociception and anti-nociception phases; (c) histogram and traceplot of the posterior of ρ, nociception phase (median 0.993, 95% CI [0.117, 0.999], acceptance probability 28%); (d) histogram and traceplot of the posterior of ρ, anti-nociception phase (median 0.971, 95% CI [0.0691, 0.998], acceptance probability 37.6%).

Figure C.60: Boxplot of the posterior of ρ in the nociceptive (NC) and anti-nociceptive (ANC) states for bolus event 30.