UBC Theses and Dissertations

Prediction of age of onset and penetrance and its application to clinical trial design for Huntington… Brinkman, Ryan Remy 2002

Prediction of age of onset and penetrance and its application to clinical trial design for Huntington Disease

Ryan Remy Brinkman
B.Sc, Carleton University, 1989

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES
Genetics Graduate Program

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July 2001
© Ryan Remy Brinkman, 2001

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
Vancouver, Canada

ABSTRACT

Huntington Disease (HD) is a progressive neurodegenerative disorder caused by a CAG repeat expansion. The disease presents with motor disturbances, psychiatric symptoms, and cognitive decline. At the start of this thesis, there was no reliable method for predicting the age-specific likelihood of onset of HD. I hypothesized that being able to predict age of onset would be useful both for patients at risk for HD (patients) and for clinical studies. For patients, it would provide knowledge about their future age of onset; clinically, it would aid the creation of clinical risk groups for stratifying patients in clinical trials and the design of treatment regimes (in the case of a potentially hazardous therapy for HD). I first used data from the University of British Columbia HD clinic to demonstrate the utility of CAG-specific survival analysis.
I then assembled what is believed to be the largest cohort of HD patients analyzed to date (3452 individuals from 40 centers worldwide) and developed a novel parametric survival model to estimate the age-specific likelihood of onset. The probability estimates of the model proved to be very accurate, with a mean 95% confidence interval of 2%. I also developed a nonparametric survival model to predict the age-specific likelihood of death from HD. I used the parametric model to estimate the age- and CAG-specific penetrance of HD and demonstrated how my analyses might be used to aid in the design of presymptomatic clinical trials. Specifically, I investigated how using the model can reduce the sample size, cost and time necessary to conduct a trial. Further, my analyses indicated a larger variance in age of onset for lower CAG repeat lengths, which could be of importance in future studies of factors that modify onset of HD.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
1.1 Clinical and molecular aspects of Huntington Disease
1.2 Neuropathology of Motor Dysfunction
1.3 Localization
1.4 Interacting Proteins
1.5 Apoptosis
1.6 Age of Onset
1.7 Similarities to Other CAG Repeat Disorders
1.8 Survival Analysis
1.8.1 Nonparametric survival analysis
1.8.2 Parametric
1.9 Thesis Objectives
1.9.1 Nonparametric Prediction of Onset Using a UBC Cohort
1.9.2 Nonparametric Prediction of Onset Using a Worldwide Cohort
1.9.3 Parametric Prediction of Onset Using a Worldwide Cohort
1.9.4 Parsimonious Model for Predicting Onset Using a Worldwide Cohort
1.9.5 Determining the Accuracy of the Predictive Model
1.9.6 Prediction of Age at Death Using a UBC Cohort
1.9.7 Penetrance
1.9.8 Using the Predictive Model for the Design of Clinical Trials
2 Materials and Methods
2.1 Subjects
2.1.1 UBC cohort
2.1.2 Worldwide cohort
2.2 CAG Determination
2.2.1 UBC cohort
2.2.2 Worldwide cohort
2.3 Age of onset determination
2.3.1 UBC cohort
2.3.2 Worldwide cohort
2.4 Survival Analysis
2.4.1 Assumptions
2.4.2 Predicting the age and CAG-specific likelihood of onset
2.4.2.1 Nonparametric survival analysis
2.4.2.2 Parametric survival analysis
2.4.2.3 Parsimonious model
2.4.3 Predicting the age and CAG-specific likelihood of death
2.5 Determining the Accuracy of the Predictive Model
2.6 Penetrance
2.7 Using the Predictive Model for the Design of Clinical Trials
3 Nonparametric Prediction of Onset Using The UBC Cohort
3.1 Introduction
3.2 Results
3.3 Conclusion
4 Nonparametric prediction of onset of HD using a Worldwide cohort
4.1 Introduction
4.2 Results
4.3 Conclusion
5 Parametric Prediction of Onset Using a Worldwide cohort
5.1 Introduction
5.2 Results
5.3 Conclusion
6 Parsimonious Model for Predicting Onset Using a Worldwide Cohort
6.1 Introduction
6.2 Results
6.2.1 Conditional probability tables
6.3 Conclusion
7 Assessing the Accuracy and Clinical Utility of the Predictive Model
7.1 Introduction
7.1.1 Brier Scores
7.2 Results
7.3 Conclusion
8 Penetrance
8.1 Introduction
8.2 Results
8.3 Conclusion
9 Using the Predictive Model for the Design of Clinical Trials
9.1 Introduction
9.2 Results
9.3 Conclusion
10 Predicting age of death using the UBC Cohort
10.1 Introduction
10.2 Results
10.3 Conclusion
11 Discussion
11.1 Summary of Results
11.1.1 Prediction of onset
11.1.2 Clinical trials
11.1.3 Penetrance
11.1.4 CAG-specific influence of factors modifying age of onset
11.2 Future Investigations
11.2.1 Identification of individuals with extreme phenotypes using the parsimonious model
11.2.2 Models for other triplet diseases
11.2.3 Stochastic model of disease progression
11.2.4 Extensive clinical trial design
11.2.5 Conclusion
12 Bibliography
Appendix I Supplementary Figures
Appendix II Programs for determining the Brier Score
12.1 Program to calculate Brier Score based on predictive model
12.2 Program to calculate best possible Brier Score based on perfect model and dataset
Appendix III Conditional Probability Tables
Appendix IV Contributing Centers (worldwide cohort)

List of Tables

Table 1 Distribution of UBC cohort. Affected individuals and presymptomatic individuals at risk for HD who have a CAG greater than 28
Table 2 Age distribution of presymptomatic individuals in the UBC cohort. Data shown for individuals with a CAG between 30 and 35
Table 3 Cumulative probability of onset at different ages based on nonparametric analysis of the UBC cohort. Data shown for individuals with a CAG repeat between 39 and 50
Table 4 Median age at onset based on nonparametric survival analysis of the UBC cohort. Data shown for individuals with a CAG between 39 and 50
Table 5 Distribution of affected and asymptomatic individuals at risk for HD in the worldwide cohort
Table 6 Mean, maximum and standard deviation of the 95% CI of the prediction of age of onset made using nonparametric survival analysis of the UBC and worldwide cohorts
Table 7 Log-likelihood of the parametric models by CAG repeat length
Table 8 Mean, maximum and standard deviation of 95% CI of the prediction of the age-specific likelihood of onset made using nonparametric and parametric models with the UBC and worldwide cohorts
Table 9 Censoring rates for worldwide cohort
Table 10 Mean age of onset estimates for censoring groups based on an adjusted grouping of censor rates
Table 11 Cumulative probability of onset at different ages based on parsimonious model
Table 12 Mean, maximum and standard deviation of 95% CI of the prediction of age of onset made using nonparametric, parametric and parsimonious model with the UBC and worldwide cohorts
Table 13 Mean age of onset of HD based on the parsimonious model, conditional on CAG and current age
Table 14 Median age of onset of HD based on the parsimonious model, conditional on CAG and current age
Table 15 Data used to estimate penetrance of CAG expansion in HD Gene, by CAG in the UBC cohort
Table 16 Penetrance of HD estimated by the parametric model
Table 17 Age and CAG distribution of presymptomatic individuals with CAG expansion from the worldwide cohort
Table 18 Size of a clinical trial required for detecting (80% power, p = 0.05) a delay of onset among presymptomatic individuals using both CAG and age as variables
Table 19 Distribution of affected and deceased individuals in the UBC cohort
Table 20 Conditional Probability of Onset for an Individual with 36 CAG Repeats
Table 21 Conditional Probability of Onset for an Individual with 37 CAG Repeats
Table 22 Conditional Probability of Onset for an Individual with 38 CAG Repeats
Table 23 Conditional Probability of Onset for an Individual with 39 CAG Repeats
Table 24 Conditional Probability of Onset for an Individual with 40 CAG Repeats
Table 25 Conditional Probability of Onset for an Individual with 41 CAG Repeats
Table 26 Conditional Probability of Onset for an Individual with 42 CAG Repeats
Table 27 Conditional Probability of Onset for an Individual with 43 CAG Repeats
Table 28 Conditional Probability of Onset for an Individual with 44 CAG Repeats
Table 29 Conditional Probability of Onset for an Individual with 45 CAG Repeats
Table 30 Conditional Probability of Onset for an Individual with 46 CAG Repeats
Table 31 Conditional Probability of Onset for an Individual with 47 CAG Repeats
Table 32 Conditional Probability of Onset for an Individual with 48 CAG Repeats
Table 33 Conditional Probability of Onset for an Individual with 49 CAG Repeats
Table 34 Conditional Probability of Onset for an Individual with 50 CAG Repeats
Table 35 Conditional Probability of Onset for an Individual with 51 CAG Repeats
Table 36 Conditional Probability of Onset for an Individual with 52 CAG Repeats
Table 37 Conditional Probability of Onset for an Individual with 53 CAG Repeats
Table 38 Conditional Probability of Onset for an Individual with 54 CAG Repeats
Table 39 Conditional Probability of Onset for an Individual with 55 CAG Repeats
Table 40 Conditional Probability of Onset for an Individual with 56 CAG Repeats

List of Figures

Figure 1 Symptoms of HD
Figure 2 Cumulative probability of onset for an individual with 40 repeats based on a nonparametric survival analysis of the UBC cohort. Error bars represent 95% CI.
Figure 3 Cumulative probability of onset for a CAG between 39 and 50 based on nonparametric survival analysis of the UBC cohort
Figure 4 Cumulative probability of onset for a CAG between 39 and 50 based on a nonparametric survival analysis of the worldwide cohort
Figure 5 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 41 CAG repeats, based on the worldwide cohort
Figure 6 Comparison of the logistic and normal distributions. The thin gray line is a logistic distribution with mean 0 and scale = √3/π (therefore variance = 1). The thick black line is a standard normal distribution with mean 0 and variance 1.
Figure 7 Cumulative probability of onset for a CAG between 39 and 50 based on the worldwide cohort. Smooth curves represent the logistic survival curves and staircase lines the nonparametric survival curves
Figure 8 Exponential relationship between the location parameter of the logistic survival distribution and repeat size. Dashed lines represent the 95% CI
Figure 9 Exponential fit of the scale parameter of the logistic survival distribution and repeat size. Dashed lines represent the 95% CI
Figure 10 Cumulative probability of onset for a CAG between 39 and 56 based on the parsimonious model
Figure 11 Population estimates of mean age of onset for CAG repeat lengths 36 to 60. The • symbols and solid line indicate the range of data that was used to fit the exponential curves. The o symbols and long dashed lines indicate CAG lengths for which the model's predictions were extended. Small dashed lines indicate 95% CI; larger spaces between dashes indicate repeats for which the model's predictions were extended
Figure 12 Population estimates of standard deviation of age of onset for CAG repeat lengths 36 to 60. The • symbols and solid line indicate the range of data that was used to fit the exponential curves. The o symbols and long dashed lines indicate CAG lengths for which the model's predictions were extended. Small dashed lines indicate 95% CI; larger spaces between dashes indicate repeats for which the model's predictions were extended
Figure 13 Cumulative probability of onset for 53 repeats. Staircase lines represent the nonparametric (Kaplan-Meier) analysis with bars representing 95% CI. Smooth curves represent prediction based on the parsimonious model with dashed lines representing 95% CI
Figure 14 Cumulative probability of onset for a CAG between 39 and 50 based on the worldwide cohort, showing the parsimonious model (smooth curves) compared to the nonparametric distribution model (staircase lines)
Figure 15 Distribution of age of onset for individuals with 36 to 56 CAG repeats based on the parsimonious model
Figure 16 Cumulative probability of onset predicted by a parsimonious model developed with 80% of the data, compared to the observed onset for the hold-out sample for 41-42 CAG repeats. Black staircase lines represent the nonparametric (Kaplan-Meier) analysis. Green smooth curves with solid lines represent a parametric model based on 80% of the data. Blue short dashed lines represent the Brier Scores of the nonparametric prediction, based on the holdout sample, and the red long dashed lines represent the Brier Scores of the parametric model predictions, based on the modeling sample
Figure 17 Cumulative probability of death, at a particular age, for a CAG repeat length of 45. Staircase line represents nonparametric survival analysis of the UBC cohort and smooth line represents logistic survival curve. Error bars represent 95% CI
Figure 18 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 39. Error bars represent 95% CI, based on the UBC cohort
Figure 19 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 41. Error bars represent 95% CI, based on the UBC cohort
Figure 20 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 42. Error bars represent 95% CI, based on the UBC cohort
Figure 21 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 43. Error bars represent 95% CI, based on the UBC cohort
Figure 22 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 44. Error bars represent 95% CI, based on the UBC cohort
Figure 23 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 45. Error bars represent 95% CI, based on the UBC cohort
Figure 24 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 46. Error bars represent 95% CI, based on the UBC cohort
Figure 25 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 47. Error bars represent 95% CI, based on the UBC cohort
Figure 26 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 48. Error bars represent 95% CI, based on the UBC cohort
Figure 27 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 49. Error bars represent 95% CI, based on the UBC cohort
Figure 28 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 50. Error bars represent 95% CI, based on the UBC cohort
Figure 29 Cumulative probability of being affected, at a particular age, for a CAG repeat length of 50. Error bars represent 95% CI, based on the UBC cohort
Figure 30 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 36 CAG repeats, based on the worldwide cohort
Figure 31 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 37 CAG repeats, based on the worldwide cohort
Figure 32 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 38 CAG repeats, based on the worldwide cohort
Figure 33 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 39 CAG repeats, based on the worldwide cohort
Figure 34 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 40 CAG repeats, based on the worldwide cohort
Figure 35 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 42 CAG repeats, based on the worldwide cohort
Figure 36 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 43 CAG repeats, based on the worldwide cohort
Figure 37 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 44 CAG repeats, based on the worldwide cohort
Figure 38 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 45 CAG repeats, based on the worldwide cohort
Figure 39 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 46 CAG repeats, based on the worldwide cohort
Figure 40 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 47 CAG repeats, based on the worldwide cohort
Figure 41 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 48 CAG repeats, based on the worldwide cohort
Figure 42 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 49 CAG repeats, based on the worldwide cohort
Figure 43 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 50 CAG repeats, based on the worldwide cohort
Figure 44 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 51 CAG repeats, based on the worldwide cohort
Figure 45 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 52 CAG repeats, based on the worldwide cohort
Figure 46 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 53 CAG repeats, based on the worldwide cohort
Figure 47 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 54 CAG repeats, based on the worldwide cohort
Figure 48 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 55 CAG repeats, based on the worldwide cohort
Figure 49 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for 56 CAG repeats, based on the worldwide cohort
Figure 50 Cumulative probability of onset for 41 repeats. Staircase lines represent the nonparametric (Kaplan-Meier) analysis with bars representing 95% CI. Smooth curves represent prediction based on the parametric model with dashed lines representing 95% CI
Figure 51 Cumulative probability of onset for 45 repeats. Staircase lines represent the nonparametric (Kaplan-Meier) analysis with bars representing 95% CI. Smooth curves represent prediction based on the parametric model with dashed lines representing 95% CI
Figure 52 Cumulative probability of onset for 49 repeats. Staircase lines represent the nonparametric (Kaplan-Meier) analysis with bars representing 95% CI. Smooth curves represent prediction based on the parametric model with dashed lines representing 95% CI
Figure 53 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for six distribution families fit to age of death data for 41 CAG repeats, based on the UBC cohort
Figure 54 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for six distribution families fit to age of death data for 42 CAG repeats, based on the UBC cohort
Figure 55 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for six distribution families fit to age of death data for 42 CAG repeats, based on the UBC cohort
Figure 56 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for six distribution families fit to age of death data for 43 CAG repeats, based on the UBC cohort
Figure 57 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for six distribution families fit to age of death data for 44 CAG repeats, based on the UBC cohort
Figure 58 Quantile-quantile plots for estimated residuals stratified by quartiles of population density for six distribution families fit to age of death data for 45 CAG repeats, based on the UBC cohort
Figure 59 Cumulative probability of death, at a particular age, for a CAG repeat length of 41. Staircase line represents the nonparametric model, smooth curve the logistic distribution fit to the same data. Error bars and dotted lines represent 95% CI, based on the UBC cohort
Figure 60 Cumulative probability of death, at a particular age, for a CAG repeat length of 42. Staircase line represents the nonparametric model, smooth curve the logistic distribution fit to the same data. Error bars and dotted lines represent 95% CI, based on the UBC cohort
Figure 61 Cumulative probability of death, at a particular age, for a CAG repeat length of 43. Staircase line represents the nonparametric model, smooth curve the logistic distribution fit to the same data. Error bars and dotted lines represent 95% CI, based on the UBC cohort
Figure 62 Cumulative probability of death, at a particular age, for a CAG repeat length of 44. Staircase line represents the nonparametric model, smooth curve the logistic distribution fit to the same data. Error bars and dotted lines represent 95% CI, based on the UBC cohort
Figure 63 Cumulative probability of onset predicted by a parametric model developed with 80% of the data, compared to the observed onset for the hold-out sample for 43-44 CAG repeats. Black staircase lines represent the nonparametric (Kaplan-Meier) analysis. Green smooth curves with solid lines represent a parametric model based on 80% of the data. Blue short dashed lines represent the Brier Scores of the nonparametric prediction, based on the holdout sample, and the red long dashed lines represent the Brier Scores of the parametric model predictions, based on the modeling sample
Figure 64 Cumulative probability of onset predicted by a parametric model developed with 80% of the data, compared to the observed onset for the hold-out sample for 45-47 CAG repeats. Black staircase lines represent the nonparametric (Kaplan-Meier) analysis. Green smooth curves with solid lines represent a parametric model based on 80% of the data. Blue short dashed lines represent the Brier Scores of the nonparametric prediction, based on the holdout sample, and the red long dashed lines represent the Brier Scores of the parametric model predictions, based on the modeling sample
Figure 65 Cumulative probability of onset predicted by a parametric model developed with 80% of the data, compared to the observed onset for the hold-out sample for 48-56 CAG repeats. Black staircase lines represent the nonparametric (Kaplan-Meier) analysis. Green smooth curves with solid lines represent a parametric model based on 80% of the data. Blue short dashed lines represent the Brier Scores of the nonparametric prediction, based on the holdout sample, and the red long dashed lines represent the Brier Scores of the parametric model predictions, based on the modeling sample

Acknowledgements

I would like to thank all patients who have contributed blood samples and doctors who contributed clinical information to HD DNA banks and databases. Without their selfless contributions, this research could never have happened. It was a privilege to work with Dr. Michael Hayden. As a supervisor no one could hope for more. He let me find my own way, but was always ready to bring me back to what was important. I thank my committee members, Drs. Ann Rose, Jan Friedman and Dessa Sadovnick, for their thoughtful comments over the years, which led me to look at things from a wonderfully different perspective. I would like to thank Dr. Elisabeth Almqvist, for so much laughter and interesting science, and always both at the same time; Dr. Blair Leavitt, for taking the time to show me why we do this; Odell Loubster for all the technical assistance; and the rest of the Hayden lab, and BNW, for being just the great bunch of people they really are. I thank my collaborators, Drs. Danial Falush and Doug Langbehn, for all their statistical mentoring, and Yinshan Zhao and Michael Jones for helpful comments regarding certain challenges in the statistical analysis. Enormous thanks to my parents, Theo and Elisabeth. Everything good that is in here came from them. And most of all, my love, thanks, and undying appreciation to Fiona, the first "Dr. Brinkman", for making this all worthwhile. Without her constant, loving support, this endeavor would not have been possible, or even worthwhile. This thesis was supported in part by a scholarship from the Medical Research Council of Canada.

CHAPTER 1 INTRODUCTION

1.1 Clinical and molecular aspects of Huntington Disease

Huntington Disease (HD) has been reported in practically all countries.
HD occurs equally in both sexes with an overall prevalence of about 10 per 100,000 in Caucasian populations, with lower frequencies reported in non-Caucasians [1,2]. HD presents in adults with motor disturbances, memory deficits, psychiatric symptoms, and cognitive decline [1] (Figure 1). The most striking feature is chorea, which occurs in approximately 90% of all affected individuals. However, the initial signs of the disease can be subtle, with individuals in the very early stages being unable to perform complex facial movements such as blowing, whistling and frowning. Oculomotor dysfunction, especially of rapid tracking movements, occurs in 60-80% of patients, who demonstrate increased latency and diminished speed of eye movements [3-5]. As the disease progresses there is increased writhing, jerking and twisting of different body parts, although the face, hands and head are particularly affected. Other motor disturbances include dysarthria and gait disorders [6]. Rigidity and dysphagia occur in the advanced stages of the disease. Mental disturbances including dementia, depression and personality changes (increased irritability, impulsiveness and aggression) may precede movement dysfunction by a decade or more [6], but cannot be considered diagnostic for HD [7]. Recent memory is significantly affected [8,9]. Severe weight loss is also a striking characteristic [10]. The disease is inexorably progressive, leading to profound functional disability and death over a period of ten to thirty years after onset of the first symptoms [1,11-13]. Five to ten percent of cases occur before age 20, with patients showing bradykinesia, rigidity, severe dementia and a more rapidly progressing disease. Even before the availability of predictive testing (first by linkage, then by mutation identification), the family history left little doubt as to the diagnosis and the hereditary nature of HD. There is no cure for HD, although symptomatic treatments are available to lessen chorea and depression.
[Figure 1 Symptoms of HD. Chart of clinical symptoms by stage (initial, early, middle, late) along a time line of 1 year, 2-5 years, 6-10 years and >10 years. Symptoms shown: oculomotor dysfunction, mood changes, involuntary movements, depression, difficult to get along with, clumsiness, dysarthria, unsteady gait, intellectual decline, memory loss, weight loss, hypertonicity, loss of speech, rigidity, bowel control, bladder control. Adapted from: Kirkwood, Su, Conneally, Foroud. Progression of symptoms in the early and middle stages of Huntington Disease, Archives of Neurology, 58, 273-278 (2001).]

The HD gene is located within 4p16.3 and encodes a 3136 residue (350-kDa) protein, huntingtin (htt) [15]. Although the exact function of htt is unknown, normal htt expression has an anti-apoptotic effect [16], is required for normal development [17,18] and hematopoiesis [19], and has been implicated in vesicle transport, endocytosis, and as a part of the cytoskeleton [15,20]. The mutation responsible for the clinical manifestation of disease is an expansion of a CAG trinucleotide region in exon 1 of the gene, encoding a lengthened polyglutamine tract [15]. The general population has between 6 and 35 CAG repeats, with 99% having fewer than 30 repeats. Persons affected with HD have a CAG repeat size (CAG) between 36 and 250 [21-23]. The deletion of one htt allele does not result in HD, indicating that the disease does not result solely from a loss of function [24,25]. HD has previously been considered to be a true dominant disorder. However, there are problems with this conclusion, as it is based on the observation that patients homozygous for expanded CAG repeats have a disease similar in severity and rate of progression to that of their heterozygous siblings [26-28]. The difficulty is that the subjects of one study were assessed only by linkage [26], and the complexity of comparing age of onset between repeats makes the conclusions suspect. Successive generations within HD families tend to have a younger age of onset (anticipation) [29].
The polymorphic CAG allele of normal chromosomes is transmitted from generation to generation in a Mendelian fashion. However, mutant HD alleles are unstable, and upon transmission offspring tend to acquire larger (1-4 units) repeats, although decreases of 1-2 repeats also occur [29]. The observation of mosaicism in sperm, the greater frequency of instability during paternal transmission, and the knowledge that monozygotic twins have an identical CAG all point to gametogenesis as the primary source of instability [30], which is thought to occur through Msh2-dependent gap repair [31]. There is evidence that instability also occurs in a non-replication-based manner in neurons of the affected regions of the human brain, where changes of up to 13 trinucleotides are seen [32,33]. Ten-fold greater changes (up to a 160 repeat increase) have been observed in the post-mitotic striata of older mice [34].

1.2 Neuropathology of Motor Dysfunction

The hallmark of disease pathology in HD is diffuse brain atrophy, with severe neuronal loss occurring in the basal ganglia. In the corpus striatum, the caudate nucleus and putamen (collectively the neostriatum, or simply the striatum) are particularly affected [1], with significant loss also seen in the neocortex [13,35]. In these regions, it is the medium-sized spiny neurons containing the neurotransmitters γ-aminobutyric acid (GABA) and enkephalin that are most vulnerable, while GABA/Substance P neurons are less affected. Nicotinamide adenine dinucleotide phosphate diaphorase (NADPH-d) positive neurons are relatively spared [36]. Motor dysfunction in HD can be related to the pattern of neuronal loss in different components of the basal ganglia-thalamo-cortical circuit. The striatum collects and processes input from the entire cerebral cortex and substantia nigra and sends output through other parts of the basal ganglia to areas of the frontal cortex that have been implicated in motor planning and execution [37].
While the subthalamic nucleus does not experience a loss of neurons in HD, its dysfunction is thought to be the crucial event that produces chorea [38,39]. Degeneration of the GABA-enkephalin medium spiny neurons projecting from the striatum releases the normal suppression of neurons in the external globus pallidus, resulting in their becoming hyperactive. GABA/Substance P neurons normally suppress the subthalamic nucleus, resulting in increased depression of the activity of glutamatergic neurons in the subthalamic nucleus. The subthalamic nucleus normally exerts an excitatory effect on the internal globus pallidus. Therefore, a hypofunctional subthalamic nucleus causes a reduction of the normal inhibitory action of the internal globus pallidus upon the thalamus. It is this disinhibition of the thalamus that ultimately leads to involuntary choreic movements [40]. The rigid state in late-stage HD is thought to result from the later loss of striatal GABA-Substance P containing neurons projecting to the internal segment of the globus pallidus [38]. Early oculomotor dysfunction in HD likely comes from the loss of striatal GABA-Substance P neurons [41]. These degenerate earlier than the GABA-Substance P neurons projecting to the globus pallidus and normally act to inhibit the inhibitory effect of neurons projecting from the substantia nigra on neurons of the tectum mesencephali. The loss of striatal GABA-Substance P containing projections therefore results in over-inhibition of saccade initiation and other oculomotor abnormalities [38].

1.3 Localization

Wild type htt is ubiquitously expressed in many different peripheral tissues, not only the direct targets of HD [42-44]. While mutant and wild type htt co-localize in all regions, some [45] but not all [46] studies have found reduced expression of mutant htt in the cortex and striatum.
Htt, associated with microtubules in dendrites and with synaptic vesicles in axon terminals, is thought to serve in synaptic function or intracellular trafficking as well as transport along the cytoskeleton [47-50]. Microscopic ubiquitinated aggregates containing N-terminal fragments of htt (including the polyglutamine expansion) are found in (a) the cytoplasm and nucleus of neurons of HD patients, (b) some transgenic models [51,52], and (c) neuronal and non-neuronal cell models [53-55]. However, the distribution of nuclear aggregates does not always correspond to the selective pathology of the disease [56]. Furthermore, it has been shown that nuclear localization, but not aggregation, is required to induce toxicity in transfected neuronal cell culture systems [57]. There is, however, some controversy about this matter [58]. Inclusions can be degraded naturally by the body, and degradation may be accompanied by reversal of neurological signs [59]. While aggregates may act in a protective manner by sequestering toxic polyglutamine-expanded protein, they may still have an indirect role in pathogenesis. For example, increased resistance of inclusions to proteasomes may make cells sensitive to stress [57].

Several mechanisms have been proposed for the way the extended polyglutamine tract could self-associate. One possibility is through transglutaminase-mediated cross-linking via isopeptide bonds between the glutamine tract of htt and lysine residues in neighboring proteins [60]. An alternative mechanism could be through polar-zipper interactions, where an expanded glutamine tract can form stable hairpins consisting of anti-parallel polyglutamine-containing strands held together by hydrogen bonds [61,62]. In vitro evidence has supported both of these hypotheses [58,63].

1.4 Interacting Proteins

Several proteins have been found to interact with htt.
As htt is ubiquitously expressed but shows a brain-specific effect, an understanding of the interaction(s) of brain-specific proteins with htt could give insight into the possible normal and pathogenic roles of the protein.

Several lines of evidence point to htt having a role in the cytoskeleton. Its interaction with huntingtin-associated protein (HAP1) is modulated through the polyglutamine tract. While HAP1's precise role is unknown, it is a brain-specific protein that interacts with cytoskeletal components (the p150 subunit of dynactin, the pericentriolar protein PCM-1 and microtubules) [50,64,65]. The lowest levels of HAP1 mRNA expression correspond with the areas of greatest pathological cell loss in HD (i.e. the caudate putamen, globus pallidus and neocortex) [66]. The interaction of htt and the huntingtin interacting protein 1 (HIP1) is inversely correlated to polyglutamine chain length [67]. HIP1, a proapoptotic protein, is the human homologue of the yeast protein Sla2p, which is essential for proper function of the cytoskeleton [68]. This has led to the hypothesis that modulation of the interaction between htt and HIP1 leads to altered membrane-cytoskeleton interaction. Furthermore, mHip1R, a protein closely related to HIP1, associates with both actin filaments and clathrin-coated pits and vesicles [20]. This suggests a role in endocytosis, perhaps by linking the actin cytoskeleton to coated pits, facilitating vesicle budding [20]. Finally, the association of htt with microtubules, likely through polymerized tubulin, suggests that htt may have a role in intracellular transport or axonal transport [69].

Htt has also been found to interact with a ubiquitin-conjugating enzyme in a manner independent of polyglutamine length. This suggests a possible role for htt in the catabolic pathway, based on the role of ubiquitination in directing the target protein to the proteasome for degradation [70].
Detection of ubiquitinated forms of mutant disease proteins within neuronal intracellular inclusions suggests that polyglutamine expansion leads to an unusually stable conformation of the protein that is resistant to proteolysis, sequestering the toxic polyglutamine-expanded protein in a protective manner [51,71].

Htt has also been found to interact with several other proteins, including calmodulin, CREB-binding protein and mSin3a, cystathionine β-synthase, Grb2 and RasGAP, HYB-A, -B, -C, MLK2, N-CoR, p53, SH3GL3, Shc and EGF receptor [72]. However, the relationships of these to normal or mutant function of htt are not as well understood.

1.5 Apoptosis

Programmed cell death through apoptosis is a necessary part of natural development to remove excess cells (including neurons) and to maintain homeostasis. Caspases are proteases that have been directly implicated in the execution of apoptosis [73]. Aberrant activation of the apoptotic pathway leads to a premature loss of cells. Cleavage of htt occurs at two caspase-3 sites downstream of the polyglutamine region of htt. The length of the polyglutamine tract appears to determine the susceptibility of htt to caspase-3 cleavage [74,75]. The amino-terminal fragment generated by caspase-3 cleavage can also induce apoptosis, leading to an accelerating cascade [75]. Expression of a dominant-negative caspase-1 mutant delays onset of symptoms and extends survival [76]. Caspase inhibitors diminish the toxicity of htt and reduce aggregate formation [77]. Neurons from mice transgenic for expanded (48 or 89 repeats), but not normal length htt (16 repeats), show increased TUNEL staining, indicative of apoptotic death [78]. Perhaps most interesting is the finding that the overall incidence of cancer (the antithesis of apoptosis) is significantly lower among HD patients, but not among their healthy relatives [79].

Huntingtin itself may play a role in regulating the balance between cell proliferation and cell death.
Studies suggest that wild-type huntingtin has an anti-apoptotic effect [16,80]. Cells expressing wild type htt are protected from cell death induced by death receptors, by pro-apoptotic Bcl-2 family members, and by caspase-9. This likely occurs through the effect of wild type htt on mitochondrial or post-mitochondrial apoptotic effects [16]. The full-length protein also modulates the toxicity of the polyglutamine expansion [16]. Furthermore, htt is required for normal hematopoiesis [19]. Wild type htt may also sequester HIP1 and prevent it from inducing the apoptotic pathway [68]. Additional evidence for the role of wild type htt in cell survival is that it up-regulates transcription of brain derived neurotrophic factor (BDNF) [81] and that reduced (up to 82%) BDNF expression has been found in the caudate and putamen of HD patients compared to age matched controls [82]. Together, this evidence leads to the conclusion that disruption of normal htt function in the brains of HD patients causes insufficient neurotrophic support for striatal neurons.

1.6 Age of Onset

For patients who are given information that they have inherited a CAG in the HD range (greater than 35 repeats), the question often changes from whether they will develop HD to when the disease will manifest. However, it can be difficult to pinpoint a precise age of onset of HD. Gradual changes in behavior and movement can occur over a period of many years with no clear threshold. Nevertheless, a long-term study of the cohort used to identify the HD gene found that individuals with a completely normal assessment have only a 3% chance of developing definite HD within the next 3 years [83]. The difficulty in precisely estimating age of onset on an individual basis does not, however, preclude analysis of this endpoint, as the average HD population shows a clear and consistent pattern [13].
Numerous studies have described a significant inverse relationship between CAG and age of onset for HD [84-99], with CAG accounting for approximately 60% of the variation in age of onset. The mean age of onset of HD is 40 years [100], although individuals with more than 60 repeats almost invariably present with juvenile-onset HD [89,95]. There are conflicting data as to whether there is a significant correlation between CAG and the rate of progression of the disease after onset [84,85,96,101]. Using this relationship to obtain a mean age of onset for a particular CAG is not clinically applicable, as the range of predicted onset for a particular CAG is very broad. Therefore, most authors have recommended against using this method to predict the age of onset for an individual patient [6,86-88].

Studies prior to this thesis investigating the relationship between CAG and age of onset have not included presymptomatic individuals with a CAG in the affected range in the analyses. This prevents a complete understanding of the relationship between CAG and age of onset of HD. Survival analysis provides the statistical means to incorporate information about individuals who carry an expanded htt allele but are clinically asymptomatic at the time of assessment. While there have been two analyses of age of onset including presymptomatic individuals based on life-table or survival analysis, these were performed before the HD gene was identified, and included heterozygotes of unknown genotype [102,103].

One model to predict onset was developed by Aylward subsequent to the discovery of the HD gene [104]. However, this model (age at onset = [-0.81 × repeat length] + [0.51 × parental onset age] + 54.87) was derived from a stepwise multiple regression analysis based on only 50 parent-child pairs from one HD clinic.
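As a worked illustration of the regression quoted above, the following minimal sketch evaluates it for a single individual. The function name and the example inputs (44 repeats, parental onset at age 45) are assumptions introduced here for illustration only; they are not values from the original study.

```python
def aylward_predicted_onset(repeat_length, parental_onset_age):
    """Point estimate of age at onset from the regression quoted in the text.

    Coefficients are taken from the formula as written above; the function
    name and argument names are hypothetical.
    """
    return -0.81 * repeat_length + 0.51 * parental_onset_age + 54.87


# Hypothetical example inputs, chosen only for illustration.
example = aylward_predicted_onset(repeat_length=44, parental_onset_age=45)
print(round(example, 2))
```

For these inputs the formula gives approximately 42.2 years. The result is a single point estimate with no indication of the spread around it.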
Furthermore, the importance of parental age of onset in predicting an individual's age of onset has been disputed in a more recent, and comprehensive, analysis [105]. Finally, the Aylward model was derived using only symptomatic individuals, excluding information from those individuals who have an expanded htt allele but lived disease free for many years.

1.7 Similarities to Other CAG Repeat Disorders

At present, eight other neurological diseases are known to result from a CAG repeat expansion: dentatorubral pallidoluysian atrophy (DRPLA); spinobulbar muscular atrophy (SBMA/AR); and spinocerebellar ataxia types 1, 2, 3, 6, 7 and 12 (SCA1, SCA2, SCA3 (or Machado-Joseph disease, MJD), SCA6, SCA7 (or CACNA1A) and SCA12) [106-108]. However, the mutant proteins show no similarity to each other (outside of the CAG repeat region), and only a certain, disease-specific subset of neurons is affected in each disease. The inverse relationship between increased CAG repeat length and age of onset for all diseases associated with CAG expansion is well documented [109-117]. They all have a progressive neuronal dysfunction beginning in mid-life with neuronal loss and death 10-20 years after onset; dominant inheritance (with the exception of SBMA, which is X-linked); and somatic and germline repeat instability leading to anticipation. Polyglutamine inclusions in HD, DRPLA, and SCA1, 3 and 7 are primarily found in the nucleus or perinuclear region and are ubiquitin positive [118]. Except for SCA12, where the expansion occurs in the promoter, the wild type non-pathogenic proteins contain around 20 consecutive polyglutamines, while the disease forms have a polyglutamine tract of greater than about 35 glutamine residues. Seven of the diseases are predicted to contain caspase cleavage sites by amino acid sequence, and htt, atrophin-1, the androgen receptor and ataxins-2, 3, 6 and 7 are all specific substrates for one or more caspases [77].
1.8 Survival Analysis

The focus of this thesis is to further the understanding of age of HD onset using survival analysis. Survival analysis is a method for the analysis of events that occur over time. This technique has been applied to many fields of health research [119]. One unique feature of survival analysis is the modeling of the "time-to-event" in the presence of "censored" cases. Censoring occurs where the time of the critical event has not been recorded, but is known to occur after some point [120]. For example, disease onset is censored for an individual with 43 repeats who moves out of contact with the clinic after the genetic test is done. While there are different types of censoring, this thesis is concerned with right censoring, i.e. where the time of HD onset for some patients is only known to have occurred after the last clinic assessment, or where "at risk" individuals died before onset. Two general types of survival models are non/semi-parametric and parametric.

1.8.1 Nonparametric survival analysis

Nonparametric survival analysis, often called Kaplan-Meier analysis, provides an unbiased estimate of the survival function. It is more efficient and more widely used than parametric survival models when no suitable theoretical distributions are known [121]. Kaplan-Meier analysis is based on estimating conditional probabilities at each time point when an event occurs and taking the product limit of those probabilities to estimate the survival rate at each time point. To summarize this procedure, let n be the total number of individuals whose survival times, censored or not, are available. Relabel the n survival times in order of increasing magnitude such that $t_{(1)} \le t_{(2)} \le \dots \le t_{(n)}$. Then the estimated survival function at time t is given by Equation 1, where r runs through those positive integers for which $t_{(r)} \le t$ and $t_{(r)}$ is uncensored [121].

$$\hat{S}(t) = \prod_{t_{(r)} \le t} \frac{n - r}{n - r + 1} \qquad \text{(Equation 1)}$$

The resulting survival curves are then used to predict the probability of an event occurring before a given time point. For example, the estimated median survival time is the 50th percentile, or the value of t at S(t) = 0.50.

Cox's proportional hazards model allows for analysis that is slightly more complex and assumes that the hazard for patients belonging to one risk group is a constant times that of patients in another risk group (i.e. that they are proportional) [119]. The Cox model is not a fully parametric model, as it does not specify the form of the underlying hazard. The hazard function can be viewed as the approximate probability of an individual experiencing onset in the next instant, given that onset has not yet occurred.

1.8.2 Parametric

Parametric survival models are more efficient when survival times follow a probability distribution (e.g. exponential, logistic). These lead to smaller standard errors, easier interpretation of the results and more precise statistical inferences compared to nonparametric models [122]. However, the use of an inappropriate parametric model can lead to very poor predictions. Therefore, the underlying assumptions of the model being used must be carefully examined. The Kaplan-Meier estimator provides an excellent tool for initial exploration of the data and for suggesting and verifying possible parametric models, as it involves no underlying assumptions.

1.9 Thesis Objectives

This thesis is primarily concerned with the survival analysis of HD patients.
In this regard I:
(1) First used nonparametric analysis to predict onset using a UBC HD patient cohort
(2) Developed a large, international collaboration of HD clinics
(3) Used increasingly complex parametric survival analysis to refine estimates of the likelihood of onset
(4) Used these estimates to calculate the size a clinical trial would need to be in order to detect a delay of onset among presymptomatic individuals
(5) Estimated the age and CAG-specific penetrance of HD
(6) Predicted the age of death for HD patients

1.9.1 Nonparametric Prediction of Onset Using a UBC Cohort

At the start of this thesis, there was no reliable method for predicting the age-specific likelihood of onset of HD. I hypothesized that being able to predict the likelihood of onset of HD would not only be useful for patients and their families desiring knowledge about their future risk of onset, but also clinically, both in creating clinical risk groups for stratifying patients in clinical trials, and in the design of treatment regimes (in the case of a potentially hazardous therapy for HD). I hypothesized that inclusion of CAG repeat size in a survival analysis would increase the accuracy of the prediction of the likelihood of onset of HD by a certain age. I therefore used nonparametric survival analysis to predict the age-specific probability of onset using a cohort of individuals recruited from the University of British Columbia (UBC) HD clinic, who were either at-risk for HD (i.e. asymptomatic but with a CAG greater than 35) or affected.

1.9.2 Nonparametric Prediction of Onset Using a Worldwide Cohort

Once I had established that CAG-specific survival analysis was a viable method to estimate the age-specific likelihood of HD onset, there was a need to replicate the analyses on another set of patients to verify and extend my findings.
I hypothesized this would demonstrate the clinical utility of this analysis by providing clinicians with confirmation of the applicability of the analysis on a larger set of HD patients and by the agreement in the results.

1.9.3 Parametric Prediction of Onset Using a Worldwide Cohort

The S-shaped distribution of the nonparametric survival curves for both the UBC and worldwide cohorts led me to hypothesize that it would be possible to fit a parametric model to the survival curves of the larger, worldwide cohort. I hypothesized that this would allow predictions of onset with smaller confidence intervals, thus increasing the usefulness of the predictive model.

1.9.4 Parsimonious Model for Predicting Onset Using a Worldwide Cohort

There appeared to be a relationship between the parameters specifying the individual survival curves obtained using the best fitting parametric distribution. I then hypothesized that it would be possible to use this relationship to extend the parametric model into a unified (parsimonious) model incorporating information from a wide variety of repeats. Compared to my earlier method of individually fitting survival curves to each CAG repeat, I hypothesized that the parsimonious model would be more efficient, leading to smaller confidence intervals.

1.9.5 Determining the Accuracy of the Predictive Model

Once the parsimonious model had been developed, it was important to validate the clinical utility of the predictions. I hypothesized that it would be possible to validate the model by splitting the worldwide cohort into modeling and testing samples, and that there would be little difference between the predictions provided by the parsimonious model developed using the modeling cohort and the nonparametric estimates obtained using the remaining "hold out" sample.
1.9.6 Prediction of Age at Death Using a UBC Cohort

The success in developing nonparametric survival models to predict onset led me to hypothesize that it would also be possible to use survival analysis to predict the age-specific likelihood of death for HD patients, an endpoint of obvious clinical interest.

1.9.7 Penetrance

While there had been case reports of reduced penetrance for HD before the start of this thesis, I hypothesized that accurate numerical estimates of penetrance could be based on the parsimonious model of likelihood of onset.

1.9.8 Using the Predictive Model for the Design of Clinical Trials

A method for implementation of clinical trials to detect a delay of onset in presymptomatic individuals had not been reported before this thesis. I hypothesized that the generation of survival curves, and especially the parsimonious model, would aid in the design of clinical trials in presymptomatic gene carriers, by targeting for recruitment those individuals who will likely have onset during the course of a clinical trial.

CHAPTER 2 Materials and Methods

2.1 Subjects

2.1.1 UBC cohort

I* created one database with clinical and demographic information derived from two independent databases. A clinical resource from the UBC Huntington Disease Medical Clinic, Department of Medical Genetics, UBC Hospital (HD Clinic) contained information on individuals from throughout British Columbia, though most were from the greater Vancouver area. The second database contained information from the DNA diagnostic testing service provided by the Hayden Laboratory as part of its ongoing research activities on HD. This component contained information on individuals seen at the local clinic as well as information from patients whose DNA was sent to the laboratory for DNA diagnostic testing, including patients from Canada and around the world.
I combined patients' records from the two databases, amalgamating fields and removing duplicate entries (based on name, date of birth and family relationships) as appropriate. I consulted patients' charts at the HD clinic both to resolve inconsistencies and to collect further information, primarily date of birth, disease onset and last clinic visit, and I updated the database as the new data became available. Data were continuously updated throughout the course of my graduate program by DNA technicians, summer students, genetic counselors, research associates and myself, creating a "living" database. The database now comprises 5089 patients and family members and includes a wealth of patient and DNA sample information.

* I have used "I" throughout this thesis to distinguish my contributions to this thesis from those of other individuals, who are specifically acknowledged as appropriate.

Of the individuals in the amalgamated database, 1593 are affected with HD and 2244 are presymptomatic but at-risk (first or second degree relatives of an affected individual). As an initial study demonstrating proof of concept, I performed survival analysis on a cohort of 728 affected and 321 asymptomatic at-risk individuals. Of these, there were only 32 individuals with 36 to 38 repeats, and 65 individuals with a CAG greater than 50, in the initial UBC cohort. Therefore, patients with these repeats were excluded from the survival analysis, since there were too few individuals with these CAG sizes to undertake rigorous statistical analysis. The remaining 661 affected and 205 presymptomatic patients comprised the UBC cohort and represented 90% of individuals in the Hayden laboratory HD database having a CAG in the range of 36 or above.
2.1.2 Worldwide cohort

Based on the success of the initial survival analysis, I established a collaborative group (Appendix IV) of HD centers, whose names were provided by the Canadian HD society, the Huntington Study Group and the Genetic Counselor Group, to verify and extend my analyses. Forty centers (Appendix IV) contributed data anonymously to this study, including centers in Europe (7), Asia (1), Africa (2) and North America (30). The UBC cohort was also included, plus approximately 500 additional individuals whom I identified as previously having had their CAG repeat length measured only through less accurate methods, or as lacking CAG repeat length information altogether. All data were examined for outliers, and suspect data were confirmed with contributing centers. One individual was removed from the dataset because their age of onset, which could not be confirmed on further checking, was three standard deviations from the mean and 15 years later than the next oldest individual with that CAG. I obtained ethical approval for the study from the University of British Columbia. Centers supplying the data also obtained such approval from their local governing bodies.

Since no affected persons were observed for a CAG of 35 or less, individuals with a CAG greater than this were initially considered the cohort at risk. The outcome variable was age of onset or last known age unaffected. The worldwide cohort comprised 2298 affected and 615 asymptomatic at-risk individuals with a CAG between 36 and 56. For repeat lengths greater than 56, there were not enough data to assure stable estimates of the CAG-specific curves.
2.2 CAG Determination

2.2.1 UBC cohort

For the initial study on the UBC cohort, the CAG repeat was assessed for all samples by excluding the CCG repeat, using PCR analysis with primers that flanked only the CAG repeat [22] or by using primers that flanked both the CAG repeat and the CCG repeat [123] followed by analysis with primers that flank only the CCG repeat [124]. Phasing of the CCG repeat was performed by pedigree studies when necessary. The CAG repeat size was assessed by comparison to sequenced clones of known CAG size.

2.2.2 Worldwide cohort

To minimize differences in CAG determination, the initial study included only those individuals who had their CAG determined exclusive of the adjacent polymorphic CCGn stretch, using cloned standards for accurate sizing [22,125-127].

2.3 Age of onset determination

2.3.1 UBC cohort

Strict criteria were used to determine the age of onset by incorporating careful verification of clinical information by a neurologist (Dr. Michelle Mezei) and a clinical geneticist (Dr. Elisabeth Almqvist), both from the Hayden laboratory, as well as myself, to resolve any possible discrepancies in key individuals who were indicated to be unaffected at a very old age. I performed an assessment of age of onset through a retrospective review of both patient charts and DNA data. The two clinicians conducted telephone interviews with patients, family members, genetic counsellors and physicians. Age of onset was defined as the first time a patient had either neurological or psychiatric symptoms that represented a permanent change from the normal state. As an additional analysis, I compared the difference in age of neurological and psychiatric onset. The age used for analysis of all asymptomatic individuals was the oldest age when their clinical status was last directly confirmed at the Genetics clinic in Vancouver or by the local attending physician.
Particular attention was paid to confirmation of current age and clinical status of all asymptomatic, at-risk individuals in the HD database older than 65 years of age.

2.3.2 Worldwide cohort

Age at onset was defined as the first time when neurological signs that represented a permanent change from the normal state were identified in a patient. The age used in the analysis of presymptomatic individuals was the oldest age when a physician last directly confirmed their clinical status. Note that this definition of age of onset excluded psychiatric onset, which was included in the initial analysis. This was due to concerns about the possibility that psychiatric symptoms are not solely due to HD.

2.4 Survival Analysis

2.4.1 Assumptions

Several assumptions about the nature of censoring were made throughout this thesis. As at-risk individuals were either still presymptomatic at the end of the study or were lost to follow-up, the information from presymptomatic individuals was right censored [128]. I assumed that being presymptomatic at a certain time was only indicative that the time of onset exceeded that time, and carried no prognostic information about subsequent survival times, for either that individual or other individuals (i.e. censoring was noninformative and individuals were not lost to follow-up because of impending onset) [128]. This assumption is supported if the patients' examination scheme is noninformative for the disease process [129]. This is true if the full likelihood of onset is proportional to the likelihood obtained, which is true under any examination scheme that is stochastically independent of likelihood of onset. This occurs if:
1. Patients were examined at regular intervals (usually, but not necessarily, yearly)
2. Patients were examined in a more or less random fashion, but examination times were independent of the subjects' disease history
3. Clinicians chose the next examination time dependent on the patient's state at the current examination [129]

As most clinic visits for patients under a physician's care for HD fall under one of the three above cases (and not, for example, where a patient whose symptoms suggest HD is advancing may self-select to be (or not to be) examined by a doctor), my assumption that censoring was noninformative and independent of the disease process is valid [129]. Based on these assumptions, survival analysis could utilize information from both presymptomatic and affected individuals [128,130].

2.4.2 Predicting the age and CAG-specific likelihood of onset

2.4.2.1 Nonparametric survival analysis

Kaplan-Meier survival analysis was used to calculate the cumulative probability of having onset of HD by a certain age for a particular CAG. Nonparametric (Kaplan-Meier) survival curves were constructed using S-PLUS 2000 [131]. As there were no affected persons with CAG less than 36, unaffected individuals with a CAG greater than 35 were considered at-risk, from birth to either neurological or psychiatric (specifically for the UBC cohort) onset or until death or last contact.

2.4.2.2 Parametric survival analysis

The first step in building the parametric model was finding a probability distribution that gave a close fit to the observed survival distributions from individual repeats. I fit 12 CAG-specific parametric models (weibull, extreme, lognormal, loggaussian, gaussian, logistic, loglogistic, logexponential, exponential, lograyleigh, rayleigh and normal) [131] to the nonparametric survival curves. These "fits" were examined by visual inspection of quantile-quantile plots, to identify the distribution that was most linear [132], and the maximum likelihoods of each of the parametric models for each CAG were compared to assess the best fit. The plausibility of a Cox proportional hazards model [133] was also tested.
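The nonparametric step described in Section 2.4.2.1 can be illustrated with a minimal product-limit sketch for a single CAG repeat length. This is not the S-PLUS code used for the thesis: the function names and the six example records are assumptions introduced here, and the data are invented purely to show how right-censored (presymptomatic) individuals contribute to the risk set without counting as onsets.

```python
from collections import Counter


def kaplan_meier(ages, onset_observed):
    """Product-limit estimate of the probability of remaining onset-free.

    ages: age at onset, or age last confirmed unaffected (right-censored).
    onset_observed: True where onset was observed, False where censored.
    Returns a list of (age, survival) pairs at each age with an onset.
    """
    onsets = Counter(a for a, seen in zip(ages, onset_observed) if seen)
    leaving = Counter(ages)  # everyone leaves the risk set at their own age
    at_risk = len(ages)
    survival = 1.0
    curve = []
    for age in sorted(leaving):
        d = onsets.get(age, 0)
        if d:
            # Conditional probability of staying onset-free past this age.
            survival *= (at_risk - d) / at_risk
            curve.append((age, survival))
        at_risk -= leaving[age]
    return curve


def median_onset_age(curve):
    """Smallest age at which the survival estimate falls to 0.5 or below."""
    for age, survival in curve:
        if survival <= 0.5:
            return age
    return None  # median not reached within the observed ages


# Invented example records for one hypothetical repeat length.
example_ages = [35, 40, 40, 45, 50, 55]
example_onset = [True, True, False, True, False, True]
example_curve = kaplan_meier(example_ages, example_onset)
```

The cumulative probability of onset by a given age, as reported in the text, is one minus the survival value returned here. With no tied ages the update reduces to the (n - r)/(n - r + 1) factor of the standard product-limit formula.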
2.4.2.3 Parsimonious model

As discussed previously, I explored mathematical functions that gave a good least-squares fit to the nonparametric survival curve for each CAG repeat [134]. Of the 12 probability distributions tested, the logistic distribution gave the best fit. Equation 2 gives the cumulative distribution function (CDF) of a logistic function.

$$\mathrm{CDF}_{\mathrm{LOGISTIC}} = \frac{1}{1 + \exp\left(\dfrac{-(\mathrm{Age} - \mathrm{location})}{\mathrm{scale}}\right)} \qquad \text{(Equation 2)}$$

The location parameter is the value of the distribution at the 50th percentile, and the scale parameter can be related to the variance of the distribution by Equation 3 and therefore to the standard deviation by Equation 4.

$$\mathrm{Variance}_{\mathrm{LOGISTIC}} = \frac{(\pi \cdot \mathrm{scale})^{2}}{3} \qquad \text{(Equation 3)}$$

$$\mathrm{S.D.}_{\mathrm{LOGISTIC}} = \frac{\pi \cdot \mathrm{scale}}{\sqrt{3}} \qquad \text{(Equation 4)}$$

The individual survival curves for each CAG, as specified by the logistic distribution, seemed to be related in that they were somewhat evenly distributed, though the longer repeats seemed to cluster together and had steeper survival curves (see Results). I explored simple mathematical functions that gave a good least-squares fit to the relationship between CAG and both the location and scale parameters of the individual survival curves (in other words, exploring mathematical relationships that described the relationship between the survival curves for the different repeats) [134]. Exponential functions of the form shown by Equation 5 (where exp signifies exponent) provided excellent fits to both the relationship of CAG to the location parameter of the various CAG-specific logistic survival curves and the fit between CAG and the scale parameter and its transformations (variance and standard deviation).
$$a + b\,\exp(-\mathrm{CAG})$$ Equation 5

This prompted me to hypothesize that it would be possible to generate a mathematical model (the parsimonious model) of the age-specific likelihood of onset based on the logistic survival distribution (Equation 2), but relating the survival distributions for each CAG through the exponential relationship of both the location and scale parameters (Equation 5). I further hypothesized that it would be possible to use the data from the worldwide cohort to estimate the parameters for this model. The complexity of the derivation of the formula and of the parameter values led me to seek the support of two statisticians. Drs. Doug Langbehn (Departments of Psychiatry and Biostatistics, University of Iowa College of Medicine) and Daniel Falush (Max-Planck Institut für Infektionsbiologie) provided assistance in obtaining the mathematical solution of the parsimonious model, as indicated below.

The first step in building the parsimonious model was to ensure that all the CAG repeats could be combined through the exponential relationship of the location and scale parameters of the logistic distribution. Previous research [135] had shown that there was likely underascertainment of individuals with lower CAG repeats. This underascertainment was supported by the fit of the exponential function to the location (mean age of onset) parameter, which predicted that the average age of onset for an individual with 36 repeats was 95 (Section 6.2, Table 13). This indicated that many such individuals would likely live symptom free for their entire life and thus never come to the attention of a center for HD research or an HD clinic, unless they were identified through an affected family member. Inclusion of the data provided for lower repeats would therefore likely bias the parsimonious model.
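As a minimal sketch of the model structure described in this section, the code below combines the logistic CDF of Equation 2 with location and scale parameters that depend on CAG through an exponential function in the spirit of Equation 5. Several things here are assumptions rather than the thesis's method: the rate constant `c` inside the exponential (the extracted equation does not show one legibly), every numeric parameter value (hypothetical placeholders, not estimates from the worldwide cohort), and all function names.

```python
import math


def exp_decay(cag, a, b, c):
    """Exponential function of CAG, a + b * exp(-c * CAG).

    The rate constant c is an assumption added for illustration.
    """
    return a + b * math.exp(-c * cag)


def logistic_cdf(age, location, scale):
    """Equation 2: cumulative probability of onset by a given age."""
    return 1.0 / (1.0 + math.exp(-(age - location) / scale))


def logistic_sd(scale):
    """Equation 4: standard deviation implied by the scale parameter."""
    return math.pi * scale / math.sqrt(3.0)


# Hypothetical placeholder parameters, NOT estimates from the thesis.
LOCATION_PARAMS = (20.0, 14000.0, 0.146)
SCALE_PARAMS = (1.0, 1500.0, 0.13)


def onset_probability(age, cag):
    """Probability of onset by `age` for a given CAG under the sketch model."""
    location = exp_decay(cag, *LOCATION_PARAMS)
    scale = exp_decay(cag, *SCALE_PARAMS)
    return logistic_cdf(age, location, scale)


if __name__ == "__main__":
    for cag in (41, 45, 50):
        print(cag, round(onset_probability(45.0, cag), 3))
```

The point of the design is that a single set of six numbers describes every CAG-specific curve, instead of a separate location and scale for each repeat length.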
To estimate the repeat length at which underascertainment no longer had a significant effect, the overall censoring distribution in the sample was calculated. Centers were then grouped after adjusting for the CAG repeat length and age distribution within each center. The concordance between the CAG-specific mean age of onset estimates for the different clinics was then determined. Based on the ascertainment analysis, I concluded that only repeats between 41 and 56 should be included in the design of the parsimonious model.

Using the Mathematica symbolic algebra program [136] and incorporating custom-built programs, Dr. Langbehn derived the parsimonious model, likelihood, score, and observed statistical information equations [137]. The parameter values were estimated by applying the general theory of parametric censored survival regression, specifically maximum likelihood estimation [138]. Because of the almost complete redundancy (technically, collinearity) between certain parameters in the model, traditional optimization techniques were not sufficient to find the solutions to the likelihood equations. Therefore, the likelihood equations were solved numerically using empirical line search methods implemented in the Optimization Toolbox of Matlab [139]. Solutions were forced to converge to approximately 1 part in $10^{11}$. To check convergence to a consistent maximum, the solution was verified several times, starting with different initial parameter estimates. The analytically derived score equations were used to verify the solution's correspondence to a critical point on the likelihood surface. The estimated covariance matrix of the parameters was obtained by inverting the observed information matrix. The statistical significance and goodness-of-fit of the parametric model were assessed by the chi-square approximations provided by twice the negative log likelihood of the relevant model estimates.
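The role of censoring in the likelihood can be illustrated with a short sketch. It assumes the simplest case: one logistic distribution for a single CAG length, with presymptomatic individuals treated as right-censored at their age at last contact. The thesis model instead ties location and scale to CAG and was solved in Matlab, so the function below (a hypothetical name and layout) only shows how observed onsets contribute a density term while censored individuals contribute a survival term.

```python
import math


def logistic_log_likelihood(records, location, scale):
    """Log-likelihood of right-censored ages under a logistic distribution.

    records: iterable of (age, onset_observed) pairs.
    Observed onsets contribute log f(age); censored ages contribute log S(age).
    """
    total = 0.0
    for age, onset_observed in records:
        z = (age - location) / scale
        # log S(age) = -log(1 + exp(z)), computed in a numerically stable way.
        log_surv = -(max(z, 0.0) + math.log1p(math.exp(-abs(z))))
        if onset_observed:
            # log f(age) = z - log(scale) - 2 * log(1 + exp(z))
            total += z - math.log(scale) + 2.0 * log_surv
        else:
            total += log_surv
    return total


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the cohort.
    data = [(48.0, True), (50.0, True), (52.0, True), (45.0, False)]
    print(logistic_log_likelihood(data, 50.0, 3.0))
```

Maximizing this function over `location` and `scale` with a numerical optimizer gives the maximum likelihood estimates for that one distribution; twice the difference in log-likelihood between nested models is the quantity the chi-square approximation refers to.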
The delta method was applied to the estimated parameter covariance matrix to obtain approximate confidence intervals for all survival or failure probabilities (including conditional probabilities) and for CAG-specific means and standard deviations [140]. In the case of probabilities, a symmetric normal-distribution approximation to the confidence interval of the logit of the probability was first derived. These intervals were then transformed into nonsymmetrical intervals on the probability scale. Similarly, symmetrical confidence intervals for the natural logarithms of mean ages of onset and their standard deviations were derived and then transformed to the original scale of years of age.

Given that an individual is presymptomatic at age x, I calculated the conditional probability of onset at some later point (x + t) for individuals with a CAG between 36 and 56, aged between 0 and 90 years (by year), where t was between 5 and 35 years (in 5-year increments), using Equation 6.

$$S_T(t) = \frac{S_X(x + t)}{S_X(x)}, \quad t > 0$$ Equation 6

The variance of the conditional probability was estimated via the delta method in a manner similar to that described above.

2.4.3 Predicting the age and CAG-specific likelihood of death

In addition to predicting age of onset, I used survival analysis to determine the probability of death by a particular age for an individual with a specific CAG size. As an initial investigation, individuals with a CAG greater than 36 were selected from the UBC cohort. Data were analyzed using Kaplan-Meier survival analysis and included 542 individuals with repeats between 41 and 45, of whom 75 had died. The majority (60%) of persons in the database with greater than 36 repeats have a CAG in this range. Other repeats were excluded from the analysis because the small numbers of individuals with those repeats precluded rigorous statistical analysis.
I subsequently used those individuals from the extended UBC cohort who had either age of onset or age of death information available in a parametric analysis. Nonparametric survival analysis was performed using the same methods as previously described. Quantile-quantile plots were constructed for 6 distribution families. The fit of each distribution was judged by eye (as previously described). As the logistic distribution appeared to have the best fit, it was chosen for further characterization. Logistic models were fit to the data for each repeat with sufficient numbers. These were compared to the nonparametric models as a further test of goodness-of-fit.

2.5 Determining the Accuracy of the Predictive Model

Given the large sample size, I anticipated little overfitting, in which a model is fit so closely that it captures small, random fluctuations in the data. To validate the model, Dr. Langbehn refit a parsimonious model using a random subsample of 80% of the data. The other 20%, held out for cross validation, was divided into 4 strata: CAG 41-42, CAG 43-44, CAG 45-47, and CAG 48-56. The parametric estimates of the cumulative onset distributions for each of the four hold-out strata were calculated using the CAG length prevalences in those strata. The resultant estimated survival curves were compared to the empirical Kaplan-Meier estimates for the strata. I also calculated the accuracy of the model in predicting the outcome for the hold-out sample for each of the 4 groups using the Brier Score (Equation 7) [141].
$$\text{Brier Score} = \frac{1}{n}\sum_{i=1}^{n}\left\{\left(0 - \hat{\pi}(t^* \mid X_i)\right)^2 I(T_i \le t^*, \delta_i = 1)\left(1/G(T_i)\right) + \left(1 - \hat{\pi}(t^* \mid X_i)\right)^2 I(T_i > t^*)\left(1/G(t^*)\right)\right\}$$ Equation 7

For each individual, $\delta_i$ is the censoring code indicating whether the patient was observed to have onset ($\delta_i = 1$) or was lost to follow-up ($\delta_i = 0$); $t^*$ is the time for which the estimation is being made; $T_i$ is the age of onset or censoring; $\hat{\pi}(t^* \mid X_i)$ is the prediction of the probability of onset from the model; $G(t^*)$ is the Kaplan-Meier estimate of the censoring distribution for the age at which the prognosis is being made; and $G(T_i)$ is the Kaplan-Meier estimate of the censoring distribution when the patient was last observed (lost to follow-up or had onset). Patients were classified into three categories. Brier Scores were assigned weights using computer programs I developed in Perl (http://www.perl.com) according to methods described by Graf [141] (Appendix II).

2.6 Penetrance

I initially examined penetrance based on the UBC cohort, and then extended the analysis to the worldwide cohort. The initial investigation into penetrance was based on the number of individuals who lived to an old age. Old age was defined using the Statistics Canada data for expected lifetime of 75 years for males and 80 years for females [142]. Drs. M. Mezei and E. Almqvist obtained confirmation of current age and clinical status for all presymptomatic, at-risk individuals in the UBC cohort aged greater than 65 years. The analysis was extended by a numerical estimate of penetrance based on the parsimonious model.

2.7 Using the Predictive Model for the Design of Clinical Trials

The distribution of presymptomatic individuals by CAG and age from the 40 clinics of the worldwide cohort was used as an estimate of the distribution of individuals who would comprise the target cohort for a clinical trial designed to detect a delay of onset.
To develop such a model, I first determined the age and repeat range for which the greatest number of untreated presymptomatic individuals would show symptoms of the disease within a four-year period. This was based on the CAG and age distribution of presymptomatic individuals and their conditional probability of onset as predicted by the parsimonious model. I assumed a balanced randomization of treatment and placebo with regard to both age and CAG. I also assumed no dropouts, and that follow-up determinations of clinical status would be performed every 6 months for 4 years. I assumed that the treatment effect would follow an accelerated life model (more accurately, a decelerated life model in this case), with the progression towards disease onset from the point of treatment initiation decreased to either 80%, 50% or 20% of the original rate.

Sample size calculations were based on the log rank survival test [138]. Conditional on the assumed age and CAG distribution and the conditional model of onset probability, Dr. Langbehn then calculated the mean contribution per subject to the log rank statistic. The sample size was obtained by dividing the appropriate critical point of the one degree of freedom (d.f.) non-central chi square distribution by the mean likelihood contribution. The non-centrality parameter was determined by the assumed percent slowing of disease onset. The theory justifying this approach in slightly different contexts has been described by both Agresti [143] and O'Brien [144]. Given the assumptions, the log-rank test proved slightly more powerful than other candidate tests from the Harrington and Fleming class [138,145].

CHAPTER 3 NONPARAMETRIC PREDICTION OF ONSET USING THE UBC COHORT

The work presented in this chapter has contributed to one publication: Brinkman, R.R., Mezei, M.M., Theilmann, J., Almqvist, E., and Hayden, M.R. (1997) The likelihood of being affected with Huntington disease by a particular age, for a specific CAG size.
American Journal of Human Genetics 60, 1202-10.

3.1 Introduction

I hypothesized that CAG-specific survival curves would enable the estimation of the probability of onset of HD by a certain age in a patient with a particular repeat. I therefore used nonparametric (Kaplan-Meier) survival analysis to estimate the age-specific probability of onset using individuals in the UBC cohort who were either at-risk (i.e. presymptomatic but with a CAG repeat length greater than 35) or affected with HD.

3.2 Results

The distribution of affected and presymptomatic at-risk individuals in the UBC cohort with a CAG greater than 28 is shown in Table 1. There were no affected individuals with less than 36 CAG repeats. The ages of presymptomatic persons with a repeat between 30 and 35 are shown in Table 2. Nine hundred and sixty three individuals (728 affected) had repeats greater than 35. Of these, 866 individuals (90%) from 445 families had a repeat between 39 and 50. Ninety seven percent of affected individuals had a neurological age of onset, with the remainder (n=23) having psychiatric onset. The linear association between log age of onset and log CAG size was significant (P < 0.001) with an r² value of 0.73.

The cumulative probability of onset for a CAG of 40 increased in an S-shaped manner (Figure 2). Similar results were seen for repeats from 39 to 50, the range for which there was enough data to construct curves (see Appendix I, Figure 18 to Figure 29). The cumulative probability of onset at five-year intervals for a given CAG repeat is shown in Table 3, with the complete age distribution for each CAG shown in Figure 3. The mean 95% confidence interval (95% CI) for the age-specific likelihood of onset with a given CAG was 20%, with a standard deviation of 10%. The largest 95% CI was 55%. As CAG increases from 39 to 50 there is a significant increase in the probability of onset (P < 0.0005) by a given age.
For example, while an individual with 40 repeats has only a 13% probability of having onset by age 45, this increases to 32% for someone with 42 repeats, 73% for 44 repeats and 100% for a person with 46 repeats (Table 3, Figure 3).

Differences of a single CAG had a significant effect on the expected age of onset for an individual. There was a significant linear trend between CAG and median age of onset (P < 0.001; r² = 0.96), with the median age of onset decreasing by 3.4 years (± 0.2) for each CAG increase between 39 and 50 (Table 4). For example, while 50% of persons with 40 repeats will be affected by age 59, this decreases to age 37 for 45 repeats, and to age 27 for 50 repeats.

To assess the effect of any possible bias introduced by the inclusion of multiple individuals from 445 families, I randomly selected two individuals from each family and repeated the analysis. There was no significant difference in the results obtained for 39 to 49 repeats, suggesting that I did not introduce any obvious bias into the cohort by including two or more individuals from any one family for this repeat range.

While the mean difference between the age of neurological and psychiatric onset was small (0.8 years), there was a large standard deviation (± 4.7 years) and some patients had much earlier psychiatric symptoms. This result is based on the analysis of the 164 individuals having both types of onset.

[Table 1: distribution of affected and presymptomatic at-risk individuals in the UBC cohort with a CAG greater than 28. Table contents not recoverable from the source text.]

[Figure 2: cumulative probability of onset (y-axis: probability of onset). Figure not recoverable from the source text.]

Table 2 Age distribution of presymptomatic individuals in the UBC cohort. Data shown for individuals with a CAG between 30 and 35.

             Number of individuals with a CAG of
AGE (years)   30   31   32   33   34   35   TOTAL
0-19           1    0    1    2    1    3      8
20-29          1    1    2    5    2    4     15
30-39          3    0    1    2    1    4     11
40-49          1    4    6    1    0    5     17
50-59          0    2    1    5    2    7     17
60-64          0    0    0    1    2    1      4
65-69          0    0    0    0    0    0      0
71-74          1    0    0    0    0    2      3
75-79          1    0    0    1    1    1      4
80-84          0    1    0    1    0    1      3
85-89          0    0    0    0    0    2      2
90-95          0    0    0    0    0    1      1
Total          8    8   11   18    9   31     85

Table 3 Cumulative probability of onset at different ages based on nonparametric analysis of the UBC cohort. Data shown for individuals with a CAG repeat between 39 and 50. Entries are cumulative probability (95% CI) by age of subject (years). Rows with fewer than four entries omit empty cells; the column positions of those entries are not preserved in the source text.

CAG repeat size: 39 (n = 21), 40 (n = 111), 41 (n = 98), 42 (n = 129)
30   .02 (.05-.00)
35   .02 (.05-.00)   .02 (.05-.00)   .05* (.09-.01)*
40   .07* (.20-.00)*   .08 (.13-.02)   .12 (.18-.05)   .14 (.20-.07)
45   .13 (.19-.05)   .21 (.30-.12)   .32 (.41-.23)
50   .16 (.33-.00)   .21 (.30-.12)   .38 (.48-.26)   .58 (.66-.47)
55   .36 (.46-.25)   .55 (.64-.42)   .81 (.87-.71)
60   .61 (.70-.47)   .77 (.85-.65)   .99 (1.00-.91)
65   .36 (.59-.00)   .80 (.88-.67)   .88* (.94-.78)*   1.00 (NA)
70   .68 (.87-.20)   .90 (.96-.77)   .94 (.98-.85)
75   .79b (.94-.28)b   .95 (.99-.82)   .98 (1.00-.88)
80
85   1.00* (NA)*

CAG repeat size: 43 (n = 116), 44 (n = 123), 45 (n = 76), 46 (n = 63)
20   .03 (.07-.00)
25   .01 (.02-.00)   .05 (.10-.00)   .06 (.12-.00)
30   .05 (.10-.01)   .04 (.08-.01)   .17 (.25-.08)   .10 (.17-.02)
35   .18 (.26-.11)   .22 (.30-.14)   .37 (.47-.24)   .41 (.52-.27)
40   .39 (.48-.29)   .49 (.58-.39)   .72 (.81-.59)   .86 (.93-.73)
45   .56 (.65-.45)   .73 (.80-.62)   .91 (.96-.80)   1.00 (NA)
50   .87 (.93-.78)   .89 (.94-.80)   1.00* (NA)*
55   .93 (.97-.85)   .96 (.99-.88)
60   .99* (1.00-.91)*   1.00* (NA)*
65   1.00 (NA)

CAG repeat size: 47 (n = 48), 48 (n = 35), 49 (n = 30), 50 (n = 16)
15   .07 (.15-.00)
20   .04 (.10-.00)   .06 (.13-.00)   .30 (.45-.12)   .19* (.36-.00)*
25   .16 (.25-.04)   .15 (.26-.02)   .41 (.57-.20)   .39 (.59-.09)
30   .36 (.49-.20)   .46 (.61-.26)   .53 (.68-.30)   .73 (.89-.32)
35   .64 (.76-.45)   .78 (.88-.57)   .77 (.89-.53)   1.00 (NA)
40   .89 (.95-.72)   .89 (.96-.69)   .95* (.99-.70)*
45   1.00 (NA)   1.00 (NA)   1.00 (NA)

* Values are for 1 year greater (CAG repeat sizes 42, 43, and 49), 1 year less (CAG repeat sizes 39-41 and 50), or 2 years less (CAG repeat sizes 44 and 45) than the stated interval.
b Value is for an individual 71 years of age.

Table 4 Median age at onset based on nonparametric survival analysis of the UBC cohort. Data shown for individuals with a CAG between 39 and 50.

CAG   Median onset in years (95% CI)
39    66 (59-72)
40    59 (56-61)
41    54 (52-56)
42    49 (48-50)
43    44 (42-45)
44    42 (40-43)
45    37 (36-39)
46    36 (35-37)
47    33 (31-35)
48    32 (30-34)
49    28 (25-32)
50    27 (24-30)

3.3 Conclusion

My initial study clearly showed that by incorporating CAG as a parameter in a survival analysis-based estimate of age of onset, it was possible to predict the likelihood of being affected by a particular age with a mean 95% CI of 20%. This represented the first survival analysis of onset of HD subsequent to the discovery of the HD gene and (at the time) the most accurate statistical analysis of age of onset. I also confirmed the prior assessments of the range of repeats in affected persons as greater than 35.
The degree of association between CAG and age of onset was significant (P < 0.001) with an r² of 0.73, which is higher than prior estimates of around 0.687 [29,85]. This is probably due to definitive determination of CAG and confirmation of clinical status.

Incorporation of censored individuals (individuals who are presymptomatic and at-risk) into the analysis of age of onset of HD was more efficient than using age of onset alone, as it allowed the inclusion of 44% more data than if these individuals were ignored. Furthermore, it addresses the estimation bias introduced by only including individuals with onset [130]. Survival curves were based on individuals with a CAG between 39 and 50, which represented 90% of individuals in the UBC cohort having a CAG greater than 35.

The results of this preliminary study must, however, be interpreted with caution. Although highly accurate methodologies were used to calculate repeat length, and individuals' last age of being unaffected was rigorously ascertained, this analysis could not immediately be extrapolated to other laboratories due to possible inter-laboratory variability in the methods of CAG and age of onset assessment. Nevertheless, it was somewhat reassuring that comparison of CAG determined on the same samples between laboratories with significant experience in assessment of CAG revealed few differences in assessment of CAG [146]. Furthermore, many centers use a standardized assessment of age of onset, the Unified Huntington's Disease Rating Scale (UHDRS), as a tool in the standardized diagnosis and follow-up of HD [147-149].

Another cautionary note is that these analyses were dependent on data from affected and unaffected persons from families with HD. Therefore, these data may not apply equally to presymptomatic individuals in the general population who have an expanded CAG. However, it is extremely rare to find such individuals in the general population.
In addition, it should be recognized that a potential bias in the database is under-representation of presymptomatic persons at increased risk for developing HD. This would result in over-estimation of the age of onset in this study. However, this analysis did include 661 affected and 205 presymptomatic individuals at-risk, and while this does not represent a complete assessment of CAG in a defined population of affected and at-risk individuals, there was no systematic bias in the ascertainment of data.

While my analysis indicated that on average there is not a large difference between neurological and psychiatric onset, the latter is not always solely due to HD, and can be confounding. To avoid this possible bias, psychiatric onset was not used in the analyses presented in the forthcoming chapters.

The probability curves derived in the initial study could not be used to predict the particular age of onset for an individual, due to the large (20%) confidence interval of the predictive model, the lack of a complete range of age-specific estimates (a result of the nonparametric analysis), and the fact that it was based only on the results of one center. However, this analysis suggested the clinical utility of the methodology in providing estimates of symptom-free survival to an individual seeking additional information as part of a predictive testing program. This prompted my further study of survival analysis-based methods for the estimation of the age-specific probability of onset, as discussed in the following chapters.

CHAPTER 4 NONPARAMETRIC PREDICTION OF ONSET OF HD USING A WORLDWIDE COHORT

The work presented in the following chapters relating to the development of the predictive model, and its use in predicting penetrance of HD and in the design of clinical trials, has contributed to one submitted publication: Brinkman, R.R., Langbehn, D., Falush, D., Paulsen, J. and Hayden, M.R.
on behalf of an international Huntington Disease collaborative group. Predicting age of onset and penetrance and designing clinical trials for Huntington Disease. Submitted to Nature Medicine (July 25, 2001).

4.1 Introduction

A predictive model fails to provide useful clinical information if it has broad confidence limits. My initial analysis [150] was based on a nonparametric survival analysis of individuals with a CAG between 39 and 50 using data from one center. However, the average size of the 95% confidence intervals was 20%, which is too imprecise for use in presymptomatic counseling [105]. Further analyses were therefore warranted on a larger cohort, to both verify and extend the preliminary findings. To do this I recruited 40 clinics from nine countries on four continents.

4.2 Results

Individuals were from a cohort ascertained through 40 centers (Appendix IV). The distribution of affected and presymptomatic at-risk individuals with repeats greater than 35 is shown in Table 5. Three thousand four hundred and fifty two individuals (2634 affected, 818 presymptomatic) met this CAG criterion. There were no affected individuals with less than 36 repeats. The nonparametric survival curves are shown in Figure 7. The results were nearly identical to those obtained for the UBC cohort, but the mean confidence limit was reduced to 13% compared to the previous 20% (Table 6). The curves failed the assumption of a proportional hazard relationship (p < 0.0001), a prerequisite for applying the familiar Cox model of survival analysis.

[Table 5: distribution of affected and presymptomatic at-risk individuals with repeats greater than 35 in the worldwide cohort. Table contents not recoverable from the source text.]

[Figure (y-axis: probability of onset): not recoverable from the source text.]

Table 6 Mean, maximum and standard deviation of the 95% CI of the prediction of age of onset made using nonparametric survival analysis of the UBC and worldwide cohorts

                    95% CI size
Cohort              Mean   Standard deviation   Maximum
UBC cohort          0.20   0.10                 0.55
Worldwide cohort    0.13   0.10                 0.45

4.3 Conclusion

The nonparametric survival analysis of the worldwide cohort confirmed the utility of this method to provide estimates of the age-specific likelihood of onset. The estimates based on the worldwide cohort were more accurate than those obtained using the smaller UBC cohort (Table 6).
This analysis also provided indirect evidence that there was little difference in the way clinics worldwide determine both repeat size and age of onset, a result that has previously been found in direct comparisons between centers [146]. However, this increase in accuracy may not have been due solely to the increased cohort size. This analysis also used only those patients with a more accurate measure of CAG than in my initial study (CAG was determined exclusive of the adjacent CCG repeat for all patients) [127].

CHAPTER 5 PARAMETRIC PREDICTION OF ONSET USING A WORLDWIDE COHORT

5.1 Introduction

Compared to nonparametric methods, parametric survival curves have smaller standard errors, more precise statistical inferences, and easier interpretation [151]. The similar S-shape of all the CAG-specific survival curves led me to hypothesize that a parametric probability distribution [152] could be fit to each.

5.2 Results

The logistic distribution had the best average fit to the nonparametric survival curves for all repeats, as shown by the goodness-of-fit of the quantile-quantile plots. For example, Figure 5 shows the fit of 6 distribution families to a survival analysis of 41 repeats. The logistic distribution gave a fit closest to a straight line. Similar results were obtained for all CAG repeats (Appendix I, Figure 30 to Figure 49).

The maximum likelihood is the value of the likelihood function when the parameters are replaced by their maximum likelihood estimates, and measures the extent to which the data are fitted by a particular model. The larger the value, the better the fit between the model and the observed data [153]. The log-likelihood analysis also indicated that the logistic fit was consistently best (i.e. largest) for each CAG length, suggesting a common shape of the survival distributions (Table 7). The normal distribution had a fit that was almost as good as the logistic. A comparison between a logistic and normal distribution is shown in Figure 6.
Note that while the logistic distribution has a notably higher peak at its median (Figure 6(a)), it also has longer tails (Figure 6(b)). Despite these differences, there is usually little divergence between models fit to a normal or a logistic distribution [154]. I chose the logistic distribution because it had a slightly better overall fit than the normal distribution and because its cumulative distribution function has a closed form, allowing for easier derivations [154]. The mean, maximum and standard deviation of the 95% CI of the prediction of age-specific likelihood of onset were all smaller for the parametric model than for the nonparametric estimates (Table 8). For example, the mean size of the 95% CI was 13% for the nonparametric model and only 3% for the parametric model.

[A rotated table and a probability plot appeared here; their contents are not recoverable from the extracted text.]

Figure 6 Comparison of the logistic and normal distributions, shown (a) around the median and (b) in the lower tail (x from -2.8 to -2.0). The thin gray line is a logistic distribution with mean 0 and scale = √3/π (therefore variance = 1). The thick black line is a standard normal distribution with mean 0 and variance 1.
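The comparison in Figure 6 can be reproduced numerically. The following is a minimal sketch, assuming only the standard density and distribution formulas for the two families; the function names are illustrative and are not taken from the thesis. It shows that a unit-variance logistic distribution is more peaked at its median and carries more probability in the far tail than the standard normal.

```python
import math

# Scale chosen so that the logistic variance, s^2 * pi^2 / 3, equals 1.
LOGISTIC_SCALE = math.sqrt(3) / math.pi


def logistic_pdf(x, scale=LOGISTIC_SCALE):
    """Density of a logistic distribution with location 0."""
    z = math.exp(-x / scale)
    return z / (scale * (1 + z) ** 2)


def logistic_cdf(x, scale=LOGISTIC_SCALE):
    """Closed-form cumulative distribution function of the logistic."""
    return 1 / (1 + math.exp(-x / scale))


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_cdf(x):
    """Standard normal CDF; has no closed form, so it is written with erf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


# Compare the peak (x = 0) and the lower tail (x = -2.8).
for x in (0.0, -2.8):
    print(x, logistic_pdf(x), normal_pdf(x), logistic_cdf(x), normal_cdf(x))
```

The logistic CDF needs only an exponential, whereas the normal CDF requires the error function; this is the practical meaning of the closed form mentioned above.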
[Figure: plot with the axis "Probability of onset"; the remaining content and caption are not recoverable from the extracted text.]

Table 8 Mean, maximum and standard deviation of the 95% CI of the prediction of the age-specific likelihood of onset made using nonparametric and parametric models with the UBC and worldwide cohorts.

                                    95% CI size
Cohort            Model             Mean    Standard deviation    Maximum
UBC cohort        Nonparametric     0.20    0.10                  0.55
Worldwide cohort  Nonparametric     0.13    0.10                  0.45
Worldwide cohort  Parametric        0.03    0.06                  0.37

5.3 Conclusion

I used a combination of methods to compare alternative distribution models fitted to the observed set of survival data obtained from the worldwide cohort. The logistic survival curves showed an extremely good fit to the nonparametric survival curves for all the repeats for which there was enough data to perform the test (Table 7, Figure 5, Figure 30 to Figure 49). While one other study of the parametric distribution of age of onset for HD found that a log-normal model showed a good fit, that study was conducted before the genetic defect was known [155]. As expected, using the parametric model resulted in a significantly (10-fold) smaller average size of the 95% CI, compared to the results using a nonparametric model (Table 8).

The parametric survival curves accentuated the relationship between the CAG-specific survival curves. Each survival curve seemed to be separated from its neighbors by a consistent amount, though the curves for higher repeats seemed to be clustered more closely together. Furthermore, survival curves for lower CAG repeats seemed to increase less steeply than those for higher CAG repeats, with a stepwise increase in steepness between repeats.
This observation led me to hypothesize that it would be possible to develop a parsimonious model exploiting this interrelatedness between different repeats.

CHAPTER 6 PARSIMONIOUS MODEL FOR PREDICTING ONSET USING A WORLDWIDE COHORT

6.1 Introduction

Based on my observations of the interrelatedness of the parametric survival curves, I hypothesized that it would be possible to use this relationship to extend the parametric model into a unified (parsimonious) model. I hypothesized that it would be possible to incorporate information from a wide variety of repeats into one model, in contrast to my earlier method of individually fitting survival curves to each CAG repeat. This would allow all the patients to be used in estimating the parameters of the distribution, leading to smaller confidence intervals.

6.2 Results

The location and scale parameters for each CAG-specific survival curve estimated by the logistic model followed a regular curvilinear pattern (Figure 8 and Figure 9). I found that exponential functions of the form shown in Equation 8 provided excellent fits to both the mean and variance of the age of onset.

y = a + b exp[-(CAG)/c]    (Equation 8)

The squared nonlinear correlations (r²) for these fits were .99 and .96 for the location and scale respectively. However, the exponential curve fit the location and scale parameter estimates for fewer than 41 repeats (Figure 8), and the scale parameter estimates for more than 56 repeats (Figure 9), less well than it fit the estimates between 41 and 56 repeats. The exponential relationship for both parameters indicated that it might be possible to derive a mathematical model relating the individual CAG-specific survival curves together. However, it was important to first understand the lack of fit at the extreme repeats. The lower CAG repeats corresponded to an increased proportion of censoring (Table 5). Only 16% of individuals with 36 repeats were observed to have onset.
The average age of onset (66 years) indicated that many individuals would not have onset of HD within their lifetime. When the centers were split according to their censoring ratio, the different censoring groups gave increasingly different estimates for the mean age of onset for repeat lengths less than 41, but similar estimates for larger repeat lengths (Table 9 and Table 10). The low, middle and high censoring groups had 467, 2538 and 275 subjects respectively. The low censoring group had notably more pessimistic estimates for the lowest and highest CAG repeat ranges. However, the high censoring group provided little data for the high CAG repeat ranges, so estimates in this range (high censoring/high CAG repeat length) should be considered with caution. The high censoring group also provided higher age of onset estimates than the mid-range censoring group, though the difference was not as severe. These differences were statistically significant (p < .05) only for CAG repeats from 42 to 51. These results imply that the survival curves are reliable for a CAG of 41 or greater, but are increasingly biased by incomplete ascertainment for shorter repeats. Furthermore, there were only sufficient patients (at least 20 per repeat, provided that the censoring ratio is less than 0.5) up to 56 repeats to derive accurate estimates of the parameters [156]. For example, there were on average 12 individuals per CAG between 57 and 60, while there were at least 20 for each CAG between 37 and 56. Therefore, only the 2913 individuals (84% of the sample) who had a CAG between 41 and 56 (Table 2) were used for the development of the parsimonious model.
Table 9 Censoring rates for worldwide cohort

Residual   Expected   Observed   Mean    Center   Number of     Censoring
censoring  censoring  censoring  age              individuals   group
rate       rate       rate
-0.23      0.23       0          41.48   39       42            low
-0.16      0.17       0.01       41.11   3        145           low
-0.15      0.15       0          51.00   37       36            low
-0.14      0.19       0.06       40.77   10       193           low
-0.12      0.12       0          48.84   8        51            low
-0.08      0.16       0.07       45.67   34       27            medium
-0.08      0.10       0.02       45.05   16       82            medium
-0.08      0.21       0.13       41.84   2        237           medium
-0.07      0.07       0          58.5    31       2             medium
-0.06      0.21       0.15       41.22   13       60            medium
-0.06      0.20       0.15       42.82   21       205           medium
-0.05      0.27       0.22       39.78   7        59            medium
-0.05      0.24       0.19       44.07   9        216           medium
-0.03      0.19       0.16       45.95   27       44            medium
-0.02      0.23       0.21       47.41   20       34            medium
0          0.19       0.19       41.59   18       167           medium
0.01       0.11       0.13       48.75   5        8             medium
0.01       0.07       0.09       42.38   1        47            medium
0.02       0.23       0.25       39.19   14       36            medium
0.03       0.27       0.30       40.90   0        923           medium
0.04       0.31       0.35       39.65   29       34            medium
0.05       0.31       0.36       40.27   32       11            medium
0.06       0.19       0.24       48.48   35       99            medium
0.06       0.41       0.47       40.5    12       62            medium
0.07       0.28       0.35       40.65   4        147           medium
0.08       0.32       0.40       38.16   19       25            medium
0.12       0.27       0.38       47.69   40       13            medium
0.20       0.23       0.43       44.43   26       7             high
0.22       0.35       0.57       47.14   25       7             high
0.23       0.38       0.61       37.20   38       80            high
0.24       0.22       0.46       50.77   11       13            high
0.24       0.42       0.67       38.67   30       6             high
0.27       0.30       0.57       40.54   23       28            high
0.28       0.22       0.50       48.77   22       48            high
0.30       0.36       0.67       36.29   36       21            high
0.38       0.32       0.7        38.81   17       27            high
0.38       0.43       0.82       38.55   6        11            high
0.40       0.31       0.71       45.14   28       7             high
0.50       0.35       0.85       33.25   33       20            high

Table 10 Mean age of onset estimates for censoring groups based on an adjusted grouping of censor rates.
CAG   overall   high    medium   low
38    72.81     79.34   78.11    67.43
39    66.51     71.47   69.84    62.86
40    60.94     64.73   62.80    58.67
41    56.03     58.96   56.80    54.82
42    51.70     54.02   51.69    51.29
43    47.88     49.79   47.35    48.05
44    44.51     46.17   43.64    45.08
45    41.53     43.07   40.49    42.34
46    38.90     40.41   37.81    39.84
47    36.59     38.14   35.53    37.54
48    34.54     36.19   33.58    35.43
49    32.74     34.52   31.93    33.49
50    31.15     33.09   30.52    31.71
51    29.74     31.87   29.32    30.08
52    28.50     30.82   28.30    28.58
53    27.41     29.92   27.43    27.20
54    26.44     29.16   26.69    25.94
55    25.59     28.50   26.06    24.78
56    24.84     27.93   25.52    23.72

Based on my findings that it was possible to fit (1) a logistic distribution to each of the CAG-specific survival curves (Equation 2), (2) an exponential relationship between CAG and the location parameter of each of the CAG-specific logistic survival curves, especially over the repeat range of 41 to 56 (Equation 8), and (3) an exponential relationship between CAG and the scale parameter of each of the CAG-specific logistic survival curves, especially over the repeat range of 41 to 56 (Equation 8), I hypothesized that it would be possible to relate all the CAG repeat-specific survival curves together into one parsimonious model, based on the data for 41 to 56 repeats. This assumption was based on the fact that, taken together, these three relationships specify all the information required to obtain a likelihood of onset for any combination of CAG and age. I also hypothesized that it would be possible to use the data from the entire worldwide cohort to estimate the parameters of this model, in contrast to my previous method of fitting a separate survival curve to the data from each CAG. Based on these hypotheses, Dr. Langbehn estimated the parsimonious survival model for HD onset given in Equation 9. This equation is of the same form as the logistic function upon which it is based (Equation 2), with the incorporation of terms to account for the exponential functions (Equation 5) of both the scale and location parameters.
Parameter values were estimated as described in the Methods section (2.4.2.3).

$$\frac{1}{1+\exp\left[\dfrac{\pi\left(-21.55-\exp[9.56-0.146(\mathrm{CAG})]+\mathrm{Age}\right)}{\cdots}\right]} \qquad \text{(Equation 9)}$$

[The denominator inside the exponential, which carries the CAG-dependent scale term, is not recoverable from the extracted text.]

The evaluation of Equation 9 for repeats between 39 and 56 and ages between 0 and 100 is shown in Figure 10, illustrating the smooth transition between repeats of both the steepness of the survival curves and the median age of onset predicted by the parsimonious model. The good fit between the nonparametric and parsimonious models of disease onset for 53 repeats is shown in Figure 13. Results for 41, 45 and 49 repeats are shown in Figure 50 to Figure 52 (Appendix I). Similar agreement was observed for the entire range of repeats. The approximate p-values of each of the six terms estimated within the model were highly significant (minimum chi square = 13.819, 1 d.f., p = .0002). Finally, the goodness-of-fit of our model was satisfactory when compared to a saturated logistic model, in which a separate 2-parameter model was fit for each CAG (chi square = 35.165, 26 d.f., p = .108).

The estimates of the mean and standard deviation of age of onset given by the parsimonious model are shown in Figure 11 and Figure 12 respectively. The lines in Figure 11 and Figure 12, while similar to the initial least squares fits (Figure 8 and Figure 9), are actually derived from the maximum likelihood model discussed below. The sample size decreased markedly for repeats greater than 53 (Table 2), contributing to the decreased apparent fit of the exponential model in that region. The confidence intervals displayed in Figure 11 and Figure 12 are for the estimated true population values and are not directly applicable to the estimates derived from the data of one repeat at a time (points in the figure). These points contain additional sampling error that contributes to their dispersion around the true population values.
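The location term of Equation 9 can be read off directly: the argument of the exponential is zero when Age = 21.55 + exp[9.56 - 0.146(CAG)], and because the logistic distribution is symmetric this age is both the mean and the median predicted at birth. The sketch below evaluates that expression; the function name is illustrative, and the scale term of the model is deliberately left out.

```python
import math


def mean_onset_age(cag):
    """Location (mean = median) of the logistic onset distribution.

    Uses the three location coefficients that appear in Equation 9;
    the CAG-dependent scale term is not modelled here.
    """
    return 21.55 + math.exp(9.56 - 0.146 * cag)


# Compare with the "At birth" row of Table 13 (95, 63, 52 and 26 years
# for 36, 40, 42 and 56 repeats respectively).
for cag in (36, 40, 42, 56):
    print(cag, round(mean_onset_age(cag), 1))
```

The printed values agree with the tabulated means to within about one year, which is the precision to be expected from coefficients reported to three significant figures.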
Figure 14 shows the goodness-of-fit of the CAG- and age-specific estimates of the likelihood of onset made by the parsimonious model compared to nonparametric estimates for the same repeat. Numerical values for predictions of the cumulative probability of onset based on the parsimonious model are shown in Table 11. A comparison of the mean size of the 95% confidence limits of the age-specific likelihood of onset of the nonparametric model for the UBC and worldwide cohorts and of the parametric and parsimonious models is shown in Table 12, and graphically for a selection of repeats in Figure 14. The average size of the 95% CI for individuals less than 91 years of age over the CAG range of 41 to 56 for the parsimonious model was 2% of the estimated age-specific probability of onset.

Under the best-fit parametric model, the variance in age of onset is larger for shorter repeats, as illustrated by the greater spread in the distribution of age of onset (Figure 15). For high repeat lengths, most individuals are predicted to have onset within a very narrow range, with a smooth transition along the range of repeats.

Figure 10 Cumulative probability of onset for a CAG between 39 and 56 based on the parsimonious model.

Figure 11 Population estimates of mean age of onset for CAG repeat lengths 36 to 60 (x-axis: CAG repeat size). The • symbols and solid line indicate the range of data that was used to fit the exponential curves. The o symbols and long dashed lines indicate CAG lengths for which the model's predictions were extended. Small dashed lines indicate the 95% CI; larger spaces between dashes indicate repeats for which the model's predictions were extended.

Figure 12 Population estimates of standard deviation of age of onset for CAG repeat lengths 36 to 60 (x-axis: CAG repeat size). The • symbols and solid line indicate the range of data that was used to fit the exponential curves. The o symbols and long dashed lines indicate CAG lengths for which the model's predictions were extended. Small dashed lines indicate the 95% CI; larger spaces between dashes indicate repeats for which the model's predictions were extended.

[Two figures (axes: "Cumulative probability of onset" and "Probability of onset"); their captions are not recoverable from the extracted text.]

Table 11 Cumulative probability of onset at different ages based on the parsimonious model.
Age of subject   Cumulative probability (95% CI) for a repeat of
(years)          36                  37                  38                  39
15               0.00 (0.00-0.02)    0.00 (0.00-0.01)    0.00 (0.00-0.01)    0.00 (0.00-0.00)
20               0.00 (0.00-0.02)    0.00 (0.00-0.01)    0.00 (0.00-0.01)    0.00 (0.00-0.01)
25               0.00 (0.00-0.03)    0.00 (0.00-0.02)    0.00 (0.00-0.01)    0.00 (0.00-0.01)
30               0.00 (0.00-0.04)    0.00 (0.00-0.03)    0.00 (0.00-0.02)    0.01 (0.00-0.01)
35               0.00 (0.00-0.05)    0.01 (0.00-0.04)    0.01 (0.00-0.03)    0.01 (0.00-0.03)
40               0.01 (0.00-0.06)    0.01 (0.00-0.05)    0.01 (0.00-0.04)    0.02 (0.01-0.04)
45               0.01 (0.00-0.08)    0.02 (0.00-0.07)    0.02 (0.01-0.07)    0.04 (0.02-0.07)
50               0.02 (0.00-0.10)    0.03 (0.01-0.10)    0.04 (0.02-0.10)    0.07 (0.04-0.12)
55               0.03 (0.01-0.13)    0.04 (0.01-0.13)    0.07 (0.03-0.15)    0.13 (0.08-0.19)
60               0.04 (0.01-0.16)    0.07 (0.03-0.17)    0.12 (0.07-0.21)    0.23 (0.17-0.30)
65               0.06 (0.02-0.20)    0.11 (0.05-0.23)    0.20 (0.13-0.30)    0.37 (0.29-0.44)
70               0.10 (0.03-0.25)    0.17 (0.09-0.30)    0.32 (0.23-0.42)    0.53 (0.45-0.61)
75               0.14 (0.06-0.30)    0.26 (0.16-0.39)    0.45 (0.36-0.55)    0.69 (0.61-0.76)
80               0.21 (0.10-0.37)    0.37 (0.26-0.50)    0.60 (0.50-0.69)    0.81 (0.74-0.87)
85               0.29 (0.17-0.45)    0.49 (0.38-0.61)    0.73 (0.62-0.82)    0.90 (0.83-0.94)

                 40                  41                  42                  43
15               0.00 (0.00-0.00)    0.00 (0.00-0.00)    0.00 (0.00-0.00)    0.00 (0.00-0.00)
20               0.00 (0.00-0.00)    0.00 (0.00-0.00)    0.00 (0.00-0.00)    0.00 (0.00-0.00)
25               0.00 (0.00-0.01)    0.00 (0.00-0.01)    0.01 (0.00-0.01)    0.01 (0.01-0.01)
30               0.01 (0.00-0.01)    0.01 (0.01-0.01)    0.01 (0.01-0.02)    0.02 (0.02-0.03)
35               0.01 (0.01-0.03)    0.02 (0.02-0.03)    0.04 (0.03-0.04)    0.06 (0.05-0.07)
40               0.03 (0.02-0.05)    0.05 (0.04-0.07)    0.09 (0.08-0.10)    0.16 (0.14-0.17)
45               0.06 (0.04-0.09)    0.11 (0.09-0.14)    0.20 (0.18-0.22)    0.35 (0.33-0.36)
50               0.13 (0.09-0.17)    0.23 (0.20-0.26)    0.40 (0.37-0.42)    0.60 (0.58-0.62)
55               0.24 (0.19-0.29)    0.41 (0.38-0.45)    0.63 (0.60-0.66)    0.81 (0.79-0.83)
60               0.40 (0.35-0.46)    0.62 (0.59-0.66)    0.82 (0.80-0.83)    0.92 (0.91-0.93)
65               0.59 (0.53-0.64)    0.80 (0.76-0.83)    0.92 (0.91-0.93)    0.97 (0.97-0.98)
70               0.76 (0.70-0.80)    0.90 (0.88-0.92)    0.97 (0.96-0.97)    0.99 (0.99-0.99)
75               0.87 (0.83-0.90)    0.96 (0.94-0.97)    0.99 (0.98-0.99)    1.00 (1.00-1.00)
80               0.93 (0.90-0.96)    0.98 (0.97-0.99)    1.00 (0.99-1.00)
85               0.97 (0.95-0.98)    0.99 (0.99-0.99)

                 44                  45                  46                  47
15               0.00 (0.00-0.00)    0.00 (0.00-0.00)    0.00 (0.00-0.00)    0.00 (0.00-0.00)
20               0.00 (0.00-0.01)    0.01 (0.00-0.01)    0.01 (0.01-0.01)    0.01 (0.01-0.02)
25               0.01 (0.01-0.01)    0.02 (0.02-0.02)    0.03 (0.02-0.04)    0.05 (0.04-0.05)
30               0.04 (0.03-0.04)    0.06 (0.05-0.07)    0.10 (0.09-0.11)    0.16 (0.14-0.17)
35               0.10 (0.09-0.12)    0.18 (0.16-0.19)    0.28 (0.26-0.30)    0.41 (0.39-0.44)
40               0.27 (0.25-0.29)    0.42 (0.40-0.44)    0.59 (0.56-0.61)    0.73 (0.70-0.75)
45               0.53 (0.51-0.55)    0.71 (0.69-0.73)    0.84 (0.82-0.85)    0.91 (0.90-0.92)
50               0.78 (0.76-0.80)    0.89 (0.88-0.90)    0.95 (0.94-0.96)    0.97 (0.97-0.98)
55               0.92 (0.90-0.93)    0.96 (0.96-0.97)    0.99 (0.98-0.99)    0.99 (0.99-0.99)
60               0.97 (0.97-0.98)    0.99 (0.99-0.99)    1.00 (0.99-1.00)    1.00 (1.00-1.00)
65               0.99 (0.99-0.99)    1.00 (1.00-1.00)    1.00 (1.00-1.00)
70               1.00 (1.00-1.00)

                 48                  49                  50                  51
15               0.00 (0.00-0.01)    0.01 (0.01-0.01)    0.01 (0.01-0.01)    0.01 (0.01-0.02)
20               0.02 (0.02-0.02)    0.03 (0.02-0.03)    0.04 (0.03-0.05)    0.05 (0.04-0.07)
25               0.07 (0.06-0.08)    0.10 (0.09-0.12)    0.15 (0.13-0.17)    0.20 (0.17-0.23)
30               0.23 (0.21-0.26)    0.32 (0.30-0.35)    0.42 (0.39-0.45)    0.52 (0.48-0.55)
35               0.55 (0.52-0.57)    0.66 (0.64-0.69)    0.76 (0.73-0.78)    0.82 (0.79-0.85)
40               0.83 (0.81-0.85)    0.89 (0.87-0.91)    0.93 (0.91-0.94)    0.95 (0.94-0.96)
45               0.95 (0.94-0.96)    0.97 (0.96-0.98)    0.98 (0.98-0.99)    0.99 (0.98-0.99)
50               0.99 (0.98-0.99)    0.99 (0.99-0.99)    1.00 (0.99-1.00)    1.00 (1.00-1.00)
55               1.00 (1.00-1.00)    1.00 (1.00-1.00)

Table 12 Mean, maximum and standard deviation of the 95% CI of the prediction of age of onset made using nonparametric, parametric and parsimonious models with the UBC and worldwide cohorts.
                                    95% CI size
Cohort            Model             Mean    Standard deviation    Maximum
UBC cohort        Nonparametric     0.20    0.10                  0.55
Worldwide cohort  Nonparametric     0.13    0.10                  0.45
Worldwide cohort  Parametric        0.03    0.06                  0.37
Worldwide cohort  Parsimonious      0.02    0.03                  0.06

Figure 15 Distribution of age of onset for individuals with 36 to 56 CAG repeats based on the parsimonious model.

6.2.1 Conditional probability tables

The prediction of onset from Equation 9 gives the age-specific probability of onset as predicted at birth. An estimate that is typically of greater practical relevance is the conditional probability of onset, which takes into account an individual's current age and which can be derived from Equation 6. For example, a 40-year-old individual with 42 repeats could be told that on average (50% chance) they are likely to have onset by age 54 (Table 13). However, the average age of onset for presymptomatic individuals aged 50 with 42 repeats is 58. The median age of onset is presented in Table 14. Conditional predictions, as well as mean and median ages of onset for individuals aged less than 91 years with a CAG between 36 and 56, are presented in Appendix III.
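The conditioning step can be illustrated with the at-birth values in Table 11. The sketch below assumes the standard identity P(onset by t | onset-free at a) = [F(t) - F(a)] / [1 - F(a)], where F is the cumulative probability predicted at birth; whether this is written in exactly the same way as Equation 6 is an assumption, and the function names and the linear interpolation between tabulated ages are illustrative choices only.

```python
def conditional_onset_probability(cum_at_target, cum_at_current):
    """P(onset by target age | still onset-free at current age)."""
    if not 0 <= cum_at_current < 1:
        raise ValueError("cumulative probability at current age must be in [0, 1)")
    return (cum_at_target - cum_at_current) / (1 - cum_at_current)


def conditional_median_age(table, current_age):
    """Age by which half of those onset-free at current_age have onset.

    `table` maps age -> at-birth cumulative probability; the median is
    found by linear interpolation between tabulated ages (an assumption).
    """
    f_now = table[current_age]
    target = f_now + 0.5 * (1 - f_now)
    ages = sorted(a for a in table if a >= current_age)
    for lo, hi in zip(ages, ages[1:]):
        if table[lo] <= target <= table[hi]:
            frac = (target - table[lo]) / (table[hi] - table[lo])
            return lo + frac * (hi - lo)
    raise ValueError("target probability lies outside the tabulated range")


# Table 11, 42 repeats: cumulative probability of onset by selected ages.
cag42 = {40: 0.09, 45: 0.20, 50: 0.40, 55: 0.63, 60: 0.82}

# Chance that a 40-year-old with 42 repeats has onset by age 55.
print(round(conditional_onset_probability(cag42[55], cag42[40]), 2))  # 0.59

# Conditional median age of onset for the same person.
print(conditional_median_age(cag42, 40))
```

The interpolated conditional median is close to 53 years, which agrees with the entry for current age 40 and 42 repeats in Table 14.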
Table 13 Mean age of onset of HD based on the parsimonious model, conditional on CAG and current age

Current age   Mean age of onset for a CAG of
(years)       36   37   38   39   40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56
At birth      95   85   77   69   63   57   52   48   45   41   39   36   34   33   31   30   29   28   27   26   26
5             95   85   77   69   63   57   52   48   45   41   39   36   34   33   31   30   29   28   27   26   26
10            95   85   77   69   63   57   52   48   45   41   39   36   34   33   31   30   29   28   27   26   26
15            95   85   77   69   63   57   52   48   45   41   39   36   34   33   31   30   29   28   27   27   26
20            95   85   77   69   63   57   52   48   45   42   39   37   35   33   32   31   30   29   28   28   27
25            95   85   77   69   63   57   52   48   45   42   39   37   35   34   33   32   31   31   30   30   30
30            96   85   77   69   63   57   53   49   45   42   40   38   37   36   35   35   34   34   34   34   34
35            96   86   77   70   63   58   53   49   46   44   42   41   40   39   39   39   39   39   39   38   38
40            96   86   77   70   64   58   54   51   48   46   45   44   44   44   44   44   43   43   43   43   43
45            96   86   78   70   64   59   56   53   51   50   49   49   49   49   49   48   48   48   48   48   48
50            96   86   78   71   65   61   58   56   55   54   54   54   54   54   54   53   53   53   53   53   53
55            97   87   79   72   67   64   62   60   60   59   59   59   59   59   59   58   58   58   58   58   58
60            97   88   80   74   70   67   66   65   65   64   64   64   64   64   64   63   63   63   63   63   63
65            98   89   82   77   73   72   70   70   69   69   69   69   69   69   69   68   68   68   68   68   68
70            99   91   84   80   78   76   75   75   74   74   74   74   74   74   74   73   73   73   73   73   73
75            101  93   87   84   82   81   80   80   79   79   79   79   79   79   79   78   78   78   78   78   78
80            103  95   91   88   87   86   85   85   84   84   84   84   84   84   84   83   83   83   83   83   83

Table 14 Median age of onset of HD based on the parsimonious model, conditional on CAG and current age

Age           Median age of onset for a CAG of
(years)       36   37   38   39   40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56
At birth      95   85   77   69   63   57   52   48   45   41   39   36   34   33   31   30   29   28   27   26   26
5             95   85   77   69   63   57   52   48   45   41   39   36   34   33   31   30   29   28   27   26   26
10            95   85   77   69   63   57   52   48   45   41   39   36   34   33   31   30   29   28   27   26   26
15            95   85   77   69   63   57   52   48   45   41   39   36   34   33   31   30   29   28   27   26   26
20            95   85   77   69   63   57   52   48   45   41   39   36   35   33   31   30   29   28   28   27   27
25            95   85   77   69   63   57   52   48   45   42   39   37   35   33   32   31   30   30   29   29   29
30            95   85   77   69   63   57   52   48   45   42   39   38   36   35   34   34   33   33   33   33   33
35            95   85   77   69   63   57   53   49   45   43   41   40   39   38   38   38   38   38   38   37   37
40            95   85   77   69   63   58   53   50   47   45   44   43   43   43   43   43   42   42   42   42   42
45            96   86   77   70   63   58   54   52   50   49   48   48   48   48   47   47   47   47   47   47   47
50            96   86   77   70   64   60   57   55   54   53   53   53   53   53   52   52   52   52   52   52   52
55            96   86   78   71   66   62   60   59   58   58   58   58   58   58   57   57   57   57   57   57   57
60            96   87   79   73   68   66   64   64   63   63   63   63   63   62   62   62   62   62   62   62   62
65            97   87   80   75   72   70   69   68   68   68   68   68   68   67   67   67   67   67   67   67   67
70            97   89   82   78   76   74   74   73   73   73   73   73   73   72   72   72   72   72   72   72   72
75            99   90   85   82   80   79   79   78   78   78   78   78   78   77   77   77   77   77   77   77   77
80            100  93   88   86   85   84   84   83   83   83   83   83   83   82   82   82   82   82   82   82   82

6.3 Conclusion

The parsimonious model I developed was based on the largest collection of HD patients reported to date, drawn from a worldwide collaboration of 40 HD centers. This study indicated that it is possible to derive a clinically useful model (i.e. one with small confidence limits) that expresses the relationship between having a certain repeat size and the probability that disease onset will occur by a certain age. By incorporating both affected and at-risk individuals, more powerful statistical techniques could be applied, allowing the development of a parametric survival model that predicts the age-specific probability of onset with narrow (2%) 95% confidence limits.

The parsimonious model has six parameters and was made using all the data from the cohort. This resulted in a more precise model compared to the case when scale and location are estimated independently for each CAG between 41 and 56, which requires 32 parameters. The likelihood ratio test showed that, despite having many more parameters, individually fit survival curves contain little more information. The remaining evidence in the data provided no statistically significant improvement in the fit of a logistic survival model to the data.
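The likelihood ratio comparison can be checked by hand. The saturated model has 2 parameters for each of the 16 repeat lengths from 41 to 56 (32 in total) and the parsimonious model has 6, giving 26 degrees of freedom. The sketch below computes the corresponding chi-square tail probabilities with the standard library only, using the closed-form series that holds for an even number of degrees of freedom and the complementary error function for 1 degree of freedom; the function names are illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def chi2_sf_one_df(x):
    """Upper-tail chi-square probability for 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


saturated_params = 2 * len(range(41, 57))  # location and scale per CAG
parsimonious_params = 6
df = saturated_params - parsimonious_params

print(df, chi2_sf_even_df(35.165, df))  # goodness-of-fit against saturated model
print(chi2_sf_one_df(13.819))           # least significant single model term
```

Both results are consistent with the reported values of p = .108 (26 d.f.) and p = .0002 (1 d.f.).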
The large sample size also helped avoid overfitting the model to random, potentially aberrant features of the data.

There is some risk of bias in the worldwide cohort sample resulting from the clinic-based sampling of affected patients. My previous attempt to detect an effect of pedigree size on estimates did not find any significant correlation, suggesting that probands do not contribute a measurable bias. However, previous studies have suggested there is underascertainment of individuals with lower repeats (e.g. as few as one out of 20 individuals with 38 repeats are ascertained) [135]. This is consistent with the data presented in Table 13, which predicts that the mean age of onset for individuals with 38 repeats is 77 years, approaching the limit of a normal lifespan. Finally, given that only 9-20% of individuals approached by testing centers take part in predictive testing, there is also the possibility that people who choose to be tested are not representative of the HD population as a whole [157, 158]. Therefore, a direct survival analysis of individuals with a CAG between 36 and 41 likely overestimates their probability of onset.

Underascertainment of individuals with smaller repeats is also supported by the different estimates of mean age of onset from the different centers, dependent on their censoring ratios. The low, medium and high censoring groups gave increasingly different estimates for the mean and variance in age at onset for repeat lengths less than 41, but gave similar estimates for larger repeat lengths (Tables 9 and 10). I attempted to avoid this bias by estimating the parameters of the parsimonious model using only individuals with a CAG between 41 and 56. Extrapolating the model provided estimates for individuals with fewer than 41 repeats. One alternative solution would have been to exclude those centers with abnormally low censoring rates.
However, this would have decreased the sample size and introduced possible bias through the pre-selection of eligible patients for the parametric models.

The confidence limits were calculated as if, conditional on repeat length, all of the observations were statistically independent. However, there may have been some residual dependencies, since many of the subjects came from common pedigrees. While this normally increases the width of confidence intervals, most of the contributing centers could not provide suitable information to correct for it. The main source of pedigree-based dependence is almost certainly repeat length. Having accounted for this as the main predictor under study, I am confident that I have accounted for as much of the pedigree-based dependence as possible. It is quite possible that the remaining within-pedigree effects are negligible.

By exploiting the multifactorial relationship of the survival curves I was able to develop a model for which the average 95% confidence limit for a CAG repeat range between 39 and 50 was 2%, compared to the nonparametric (20% and 13% for the UBC and worldwide cohorts respectively) and parametric (3%) models (Table 12). This was due largely to the efficiency of the parsimonious model in using all the available data instead of fitting individual survival curves to each CAG. The mean size of the 95% CI obtained using the parametric model was not improved on dramatically by moving to the parsimonious model (Table 12). However, there was a six-fold decrease in the maximum size of the 95% CI.

The approximate interpretation of the parsimonious confidence interval is that, given hypothetical replications of the entire experiment under identical conditions, this interval would cover the true parameters 95% of the time. (If the parsimonious model were correct, as assumed, the true parameters would also perfectly follow some function of the general exponential form that we are using.)
A confidence interval directly applicable to the estimates obtained from the individual CAG lengths would also take the expected variance of these estimates into account. This is a likely explanation of why the "coverage", especially for the standard deviations, appears poor.

While there was a clear skew in the mean age of onset dependent on the censoring rate of the clinics for short repeats, no such skew was observed for larger repeat lengths. This difference is likely due to ascertainment. This method of dividing the data may exaggerate real differences. However, the sampling clearly is not uniform across centers and therefore cannot be consistently representative of the same population, violating one of the assumptions required for unbiased survival analysis. It is also impossible, in any formal way, to say which centers have taken more or less representative samples from the underlying HD population. However, the centers with especially low censoring (i.e., almost everyone they have studied already had onset) are probably the least representative. Bias due to underascertainment was likely minimized by restricting the model development to only those individuals with a repeat between 41 and 56, as the censoring groups all had similar estimates of mean onset over this range, compared to lower repeats.

CHAPTER 7 ASSESSING THE ACCURACY AND CLINICAL UTILITY OF THE PREDICTIVE MODEL

7.1 Introduction

A predictive model's value should be determined not only by how many zeros are in the associated P-values, but also by its ability to sensibly predict outcomes with some success [159]. It is important to validate any predictive model both statistically and clinically to provide evidence that the model is adequate for clinical use. I therefore assessed the degree of overfit of the model through the ability of a model (based on 80% of the cohort) to predict the probability of onset of the remaining 20% of the cohort (the hold-out sample).
This included the development of Brier Scores to estimate the accuracy of the predictions.

7.1.1 Brier Scores

In developing a prediction model, it is important to provide an estimate of its accuracy. A common method is a comparison of observed and predicted event rates for individuals [159]. The Brier Score was originally developed for judging the inaccuracy of probabilistic weather forecasts but is finding greater acceptance as a measure of accuracy for survival models, as it provides an unbiased estimate of the predictive value of a model using all the available data [141, 160-163]. The Brier Score measures the average discrepancy between the predicted probability of an event occurring (e.g. the probability of precipitation) and the observed outcome (did it rain or not) [141]. A perfect predictive model has a Brier Score of "0". A completely inaccurate model which, for example, predicts that 100% of patients will have onset by a certain age when 0% actually do so will have a Brier Score of "1" for that age. A method has recently been developed [141] that allows the calculation of Brier Scores for survival curves with censoring. The Brier Score is preferable to alternative methods of evaluating model accuracy (e.g. calibration curves), which are only evaluated at a single time point. Should other predictive models be developed, it is possible to compare them using their Brier Scores.

7.2 Results

When the model was re-estimated using only 80% of the data, there was no significant overfitting evident when the predictions were compared to the observed onsets in the other 20% "hold-out" sample for the 41 to 42 repeat data (Figure 16). Similar results were observed for the other 3 CAG groups (Appendix I, Figure 63 to Figure 65).
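As a rough illustration of the Brier Score described in Section 7.1.1, the sketch below computes the mean squared difference between predicted probabilities and observed 0/1 outcomes at a single age. It deliberately ignores censoring, so it is not the censoring-adjusted method of reference 141; the function name and the toy inputs are illustrative only.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Simplified version without any adjustment for censored observations.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Perfect predictions score 0.
print(brier_score([1.0, 0.0, 1.0], [1, 0, 1]))  # 0.0

# Predicting onset for everyone when nobody has onset scores 1.
print(brier_score([1.0, 1.0, 1.0], [0, 0, 0]))  # 1.0

# A predicted probability of 0.5 scores 0.25 whatever the outcome.
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25
```

The last case shows why scores are least favourable near the age at which half of a group is expected to have had onset: a prediction of 0.5 is as uninformative as a single probability can be.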
Assuming that there was enough data in the hold-out sample to generate a model that specifies the underlying probability model accurately, the Brier Score of that model (when tested against the data used to generate the model) can provide a benchmark for comparing the performance of a model built and tested on a separate sample. I found that the Brier Scores obtained using the parsimonious model to predict the age of onset for individuals in the hold-out sample (red line in Figure 16) were nearly as good as those obtained when Kaplan-Meier analysis of the hold-out sample itself was used to generate the predicted age of onset distribution (blue line in Figure 16). The average Brier Scores of the parsimonious model for ages 0 to 90 years for the 48 to 56, 45 to 47, 43 to 44 and 41 to 42 CAG repeat groups were small (0.04, 0.05, 0.06 and 0.07 respectively), indicating that the parsimonious model provides accurate predictions of age of onset.

Figure 16 Cumulative probability of onset predicted by a parsimonious model developed with 80% of the data, compared to the observed onset for the hold-out sample for 41-42 CAG repeats. Staircase line represents the nonparametric (Kaplan-Meier) analysis. Smooth curve with solid line represents a parametric model based on 80% of the data. Short dashed line represents the Brier Scores of the nonparametric prediction, based on the holdout sample, and the long dashed line represents the Brier Scores of the parametric model predictions, based on the modeling sample. Horizontal axis: age (years).

7.3 Conclusion

The good fit of the parsimonious model developed with 80% of the data to the nonparametric results obtained from the 20% holdout sample is indicative of the clinical utility of the model. The maximal Brier Score observed at the 50th percentile is indicative of the inherent difficulty of correctly choosing between two events that occur with equal frequency (e.g.
the probability of onset by age 57, predicted at birth, for an individual with 41 repeats; Table 13). The rapid improvement in Brier Score on each side of the 50th percentile indicates that the overall performance of the model makes it a useful tool in clinical practice.

Several criteria have been proposed to aid in the critical appraisal of probability models164-166. I have endeavored to adhere to these principles during the course of my analysis as follows.

1. The cohort used in developing the predictive model is well defined, followed for a sufficient period of time, and is comparable to the patients for whom the model will be used to provide predictions. The model I developed is based on thousands of patients, many followed at yearly clinic visits since 1984. Furthermore, the cohort comprised patients drawn from 40 clinics from nine countries on four continents. It is therefore reasonable to conclude that the model is generally applicable to presymptomatic HD patients.

2. The clinical state predicted by the model should be relevant to the patient. Age of onset, defined as initial signs of neurological dysfunction, is a well-established clinical state of obvious interest. Furthermore, an accurate, quantitative estimate of the age-specific probability of onset can give the patient, the family, and the physician important information that can be used in making treatment decisions and may help in other aspects of life planning.

3. All variables required by the model should make clinical sense and be available at minimal expense when the prediction is to be made. As CAG length is routinely and reliably determined for patients at risk for HD, physicians have timely access to all the patient data required to make predictions using the model125.
The importance of repeat size in determining age of onset is undisputed; this contrasts with the possibility that some parameters could have been incorrectly chosen had logistic regression been used to select parameters from a list159;167.

4. A model should be accurate in that the degree of uncertainty in the probability estimate should be small enough that estimates are meaningful when making predictions. By exploiting the multifactorial relationship of the individual survival curves I was able to develop a model which estimated the age-specific likelihood of onset for which the mean 95% CI for a CAG repeat range of 41 to 56 was 2%, clearly within any reasonable bounds.

5. It is important that the model performs well and can be applied to clinics other than those used to develop the model. I tested the degree of overfit of the parsimonious model by splitting the cohort into a model development component and a test component. The model was developed using the same model building steps as were used in the development of the parsimonious model, but using only 80% of the data, and then tested on the remaining 20%. There was excellent agreement between the predictions of the two estimates, indicating that the model passed an essential test of credibility168. There does not appear to be a great deal of dissimilarity between patients from different centers, as they generally have the same expectancy of onset regardless of origin, an observation that has been made by others2.

6. The model should have advantages over traditional methods of prediction and be easy to implement. There were no reliable predictions of age of onset available for HD patients before the start of this thesis. This work represents the first useful predictive model for HD.
In an attempt to make the model as "user-friendly" as possible, I calculated detailed conditional probability tables for individuals of any age, rather than single cut-points of 5 or 10 years (Appendix III), as suggested by established guidelines for the reporting of statistics169.

CHAPTER 8 PENETRANCE

8.1 Introduction

A disease is only partially penetrant if not all individuals manifest symptoms within a normal lifetime. HD has previously been considered to be 100% penetrant, with all carriers of the HD expansion manifesting the disease1,2. However, recent case reports have suggested21 that on rare occasions HD may not be fully penetrant. For instance, one study found that four of seven individuals over the age of 70 with 36 repeats had no signs or symptoms of HD. One individual with 39 repeats died at age 95 with no definite clinical or pathological evidence of HD21. I used both the UBC and worldwide cohorts to investigate the number of cases of nonpenetrance that occurred, and the parsimonious model to provide numerical estimates of the age- and CAG-specific penetrance of HD.

8.2 Results

No individuals in the UBC cohort with a CAG repeat length greater than 41 remained presymptomatic older than age 56 (Table 15). This result indicated that clinical manifestation of the disease was fully penetrant within a normal lifespan for this CAG repeat range. There were, however, several individuals with a CAG repeat length between 36 and 41 who did not manifest symptoms of HD within a normal expected lifespan (Table 15). For example, there were two males aged 78 and 85 with 36 CAG repeats and one male aged 75 with 39 CAG repeats who were presymptomatic. There was also one female with 38 CAG repeats and a male with 40 repeats who were not affected until age 84, and a male with 41 repeats who was not affected until age 75 (Table 15).
By defining onset at greater than or equal to 75 years for males and greater than or equal to 81 years for females as beyond the normal lifespan142, the data showed complete penetrance with a CAG repeat length of 42 or above. Reduced penetrance may, however, occur within the range of 36 to 41 CAG repeats. These data require validation in other independently ascertained large groups of patients, as the numbers in my study were too small to allow for meaningful penetrance estimates for each specific repeat size. However, it was obvious that there is a trend to increasing penetrance with increasing repeat length in the 36 to 41 repeat range, up to 90% for 39 CAG repeats and 99% for 41 CAG repeats (Table 15).

The analysis of penetrance was aided by the development of the parsimonious model, which predicts that there is a substantial probability that individuals with less than 40 CAG repeats will not have onset within their lifespan. For example, based on the model, I predict only a 21% chance for an individual with 36 repeats to have onset before age 80 years (Table 15). Thirteen of the 410 (three percent) individuals with a CAG less than 41 were older than 74 and still presymptomatic. The oldest presymptomatic individual was a 90 year old with 37 repeats (Table 5). The oldest presymptomatic individual with a CAG of 41 was 71, and the model predicts a 56% chance of onset by age 75, and an 80% chance by age 80 (Table 20 to Table 40, supplementary tables). There were no presymptomatic individuals older than 70 with a CAG greater than 41 (Table 5).

Table 15 Data used to estimate penetrance of CAG expansion in HD Gene, by CAG in the UBC cohort

Number of Individuals
CAG      Affected   Unaffected Individuals   Total
29-35    0          9                        9
36       1          2                        3
37       4          0                        4
38       2          1                        3**
39       8          1                        9**
40       64         1                        65
41       74         1                        75
>41      575        0                        575

* Age greater than 75 years (males) or greater than 80 years (females)
** Including individuals reported by Rubinsztein21.
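The observed penetrance figures quoted from Table 15 can be checked with a short sketch. It assumes, as an illustrative simplification, that observed penetrance for a repeat length is the number of affected individuals divided by the row total; the counts are copied from Table 15, and the names TABLE_15 and observed_penetrance are hypothetical.

```python
# Affected and unaffected counts by CAG repeat length, copied from Table 15.
TABLE_15 = {
    36: (1, 2),
    37: (4, 0),
    38: (2, 1),
    39: (8, 1),
    40: (64, 1),
    41: (74, 1),
}


def observed_penetrance(affected, unaffected):
    """Fraction of individuals in a row who were affected (crude estimate)."""
    total = affected + unaffected
    if total == 0:
        raise ValueError("row has no individuals")
    return affected / total


penetrance_by_cag = {
    cag: observed_penetrance(affected, unaffected)
    for cag, (affected, unaffected) in TABLE_15.items()
}

for cag in sorted(penetrance_by_cag):
    print(f"CAG {cag}: {penetrance_by_cag[cag]:.0%}")
```

With these counts the ratio is 8/9 for 39 repeats and 74/75 for 41 repeats, in line with the approximate 90% and 99% figures in the text. The small row totals for 36 to 39 repeats are the reason the text cautions against repeat-specific estimates from these data alone.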
[Table (apparently Table 16): contents not recoverable from the extracted text]

8.3 Conclusion

My initial results150 supported the case reports of Rubinsztein21, who investigated individuals with 30 to 40 repeats. I confirmed that the lower limit of CAG in individuals who manifest HD is 36. This lower limit is supported by the fact that there were 31 presymptomatic individuals with a CAG of 35, including one man who was 93 years old. A larger proportion of affected patients with 36 to 39 repeats presented with late-onset disease. However, this analysis and previous studies are limited by their small sample sizes, precluding accurate numerical estimation of the degree of non-penetrance. This prompted a further survey of the penetrance of HD at different repeat lengths using the parsimonious model, as the age-specific probability of onset can be used to provide an estimate of penetrance. The model-based extrapolations suggest that many individuals with a CAG less than 41 will not show symptoms of HD within their lifetime, including up to 79% of those with 36 repeats (Table 16). This analysis provides the first numeric estimate of penetrance of HD by age and CAG.
CHAPTER 9 USING THE PREDICTIVE MODEL FOR THE DESIGN OF CLINICAL TRIALS

9.1 Introduction

The ultimate aim of HD research is to provide insights that will slow or stop the progression of HD in affected persons and delay or prevent onset in at-risk persons, i.e. those with greater than 35 repeats. Although therapeutics have been identified to decrease the rate of progression in some neurological diseases, no agents have been identified to delay the course of any triplet repeat disease. Should an intervention prove effective in slowing the rate of HD in diagnosed individuals, there will be an immediate, compelling need to test these treatment strategies on individuals who are considered presymptomatic but are gene carriers for the disease. Without the ability to accurately estimate the probability of onset, clinical trials to detect a delay of onset among presymptomatic individuals are difficult to design. Before this thesis, there was no well-established methodology for predicting age of onset in HD. Consequently, there was no adequate method to design a clinical trial to detect a delay in age of onset induced by an experimental therapeutic. The development of the parsimonious model allows the investigation of alternative trial designs so that when a therapeutic does become available, a trial can be implemented as quickly and cost effectively as possible. The challenges for presymptomatic trials for HD include the inclusion of enough individuals who would be expected to have onset during the trial, while ensuring the trial design is practical in that it can be completed in a timely manner with a reasonable balance between power and cost.

9.2 Results

The clinical trial design was based on assumptions realistic for many clinical trials, including an 80% power to detect a 20%, 50% or 80% slowing of the progression towards onset within four years (p = 0.05).
A 50% delay of progression indicates that the rate of approaching onset is reduced by half, such that a patient's risk of onset at four years is reduced to the risk normally expected at two years. A four-year trial provides a balance between ensuring that a potential treatment would have time to have an observable effect and keeping the trial duration within reasonable limits to control cost. The parametric model can be used in a similar manner to explore alternative scenarios.

In designing a clinical trial for persons at risk for HD, one would ideally test a compound on a cohort of individuals who would be expected to have onset of symptoms within a short period. A double-blind, placebo-controlled approach would necessitate seeing a difference in onset between treated and control groups. The issue is how to enroll those persons most likely to have onset in the near future, so as to reach statistical significance between the treated and untreated groups more rapidly. My results indicate that it is necessary to take into account both CAG and age. Individuals with a larger CAG are more likely to have earlier onset (Table 5 and Table 13). Keeping CAG constant, as one chooses older at-risk individuals, the likelihood that they will have onset in a shorter period also increases (Table 13), although the number of eligible individuals in that category decreases (Table 17). Conversely, if one recruits younger patients to ensure that enough individuals can be enrolled in a trial, more individuals would be enrolled who are far from onset, and who therefore add little power to the trial. The challenge therefore is to strike a balance between age and the number of patients needed to detect an effect. For example, unrestricted enrollment of the presymptomatic cohort described in Table 17 (a summary of the worldwide cohort) in a clinical trial designed to detect a 50% delay of onset within four years would require 592 individuals (Table 18).
However, individuals with less than 38 repeats would contribute almost no information (Table 13). Table 13 indicates that a clinical trial using only individuals aged greater than 40 would be preferable. The likelihood of finding over 350 presymptomatic individuals of this advanced age (based on the data provided by the 40 centers for HD research, and the parsimonious model) is low, precluding this as a viable design. In contrast, a clinical trial based on restricting enrollment to those individuals older than 36 with 40 or 41 repeats, older than 31 with 42 repeats, and older than 26 with a CAG between 43 and 56 provided a good compromise between the statistical power contributed by each patient and the proportion of the available cohort that could be included. Such a study would require 416 at-risk individuals to detect a 50% delay in the age-specific probability of onset (Table 13). This is feasible based on the number of asymptomatic patients who have already participated in predictive testing and are known to have a CAG expansion.

[Tables (apparently Table 17 and Table 18): contents not recoverable from the extracted text]

9.3 Conclusion

The challenges in designing a feasible presymptomatic clinical trial include recruitment of enough individuals with a reasonable risk of onset during a trial that can be completed in a timely manner with a reasonable balance between power and cost. My results show that careful consideration of the enrollment criteria can result in a 28% reduction in the number of patients required (e.g. from 592 patients for a trial to detect a 50% increase in age of onset using unrestricted enrollment, to 416 using the restricted enrollment criteria).
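The cost comparison made in this section can be reproduced with a small arithmetic sketch. The patient counts (592 and 416), the cost of $10,000 per patient per year, and the four-year duration are taken from the text; the function name trial_cost and the assumption that cost scales linearly with enrollment are illustrative simplifications.

```python
COST_PER_PATIENT_YEAR = 10_000  # dollars, as quoted in this section
TRIAL_YEARS = 4


def trial_cost(n_patients, cost_per_patient_year=COST_PER_PATIENT_YEAR,
               years=TRIAL_YEARS):
    """Total cost of a trial, assuming cost is linear in enrollment and time."""
    return n_patients * cost_per_patient_year * years


unrestricted_patients = 592  # 50% delay of onset, unrestricted enrollment
restricted_patients = 416    # 50% delay of onset, restricted enrollment

savings = trial_cost(unrestricted_patients) - trial_cost(restricted_patients)
print(f"Patients saved: {unrestricted_patients - restricted_patients}")
print(f"Savings over {TRIAL_YEARS} years: ${savings:,}")
```

Under these assumptions the difference of 176 patients corresponds to the $7,040,000 savings quoted for the trial designed to detect a 50% delay of onset.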
Restricted enrollment would both reduce patient exposure to experimental treatments that may have significant side effects and decrease both the overall time and cost of the trial. For pharmaceutical companies, time is an important barrier to implementation of trials for late onset neurodegenerative diseases, since a patent is issued long before a clinical trial begins, not when the drug is marketed. At a typical cost of $10,000 per patient per year for a four-year HD clinical trial, using a restricted enrollment design could result in a total savings of $7,040,000 over four years if a trial were designed to have an 80% power to detect a 50% delay of onset (p = 0.05), compared to a similar trial designed with unrestricted enrollment (Table 18). Even for a clinical trial to detect a dramatic (80%) decrease in the age-specific probability of onset, using the restricted enrollment criteria can result in considerable ($2,160,000) savings over a four-year trial. Estimates based on the criteria that we have proposed suggest that the number and type (age and repeat size) of patients required to detect at least a 50% decrease in the age-specific probability of onset will be attainable for a multi-center trial. However, as a clinical trial to detect a 20% delay of onset would require approximately 3000 patients, this likely precludes a trial attempting to detect this level of effect.

CHAPTER 10 PREDICTING AGE OF DEATH USING THE UBC COHORT

Part of the work presented in this chapter has contributed to one publication: Wellington, C. L.; Brinkman, R. R.; O'Kusky, J. R., and Hayden, M. R. Toward understanding the molecular pathology of Huntington's disease. Brain Pathol. 1997 Jul; 7(3):979-1002.

10.1 Introduction

In addition to predicting age of onset, I used survival analysis to determine the probability of death by a particular age for an individual with a specific CAG size170.
10.2 Results

Individuals from the UBC cohort with a CAG repeat length greater than 36 repeats were initially selected. The data were analyzed using Kaplan-Meier survival analysis. I included 542 individuals with CAG lengths of 41-45, of whom 75 had died. The majority (60%) of persons in the database with a CAG greater than 36 had a CAG repeat length in this range. Other repeats were excluded from the analysis, as the small numbers of individuals for these particular CAG sizes precluded rigorous statistical analyses. The analysis predicted that while only 4% of persons with 41 CAG repeats (n=74) would die by 60 years of age, this increases to 30% for 43 CAG repeats (n=93) and 83% for 45 CAG repeats (n=66).

After the development of the parametric model for predicting the age-specific likelihood of onset, the UBC cohort was used to investigate the possibility that a parametric model could be fit to the individual CAG survival curves in a similar manner as was done for the age of onset analysis. Six hundred and sixty-eight individuals who had shown symptoms of HD were selected from the UBC cohort based on having available information. Of these, age at death was known for 115 (Table 19). As was observed for the probability of onset analysis, the logistic distribution gave a good fit to the nonparametric survival analysis estimating the probability of death. The cumulative probability of death for a CAG of 45 is shown in Figure 17. Additional CAG repeats for which there was sufficient information (CAG 41 to 44) are shown in Appendix I, Figure 59 to Figure 62.

[Table and figure (apparently Table 19 and Figure 17): contents not recoverable from the extracted text]

10.3 Conclusion

Kaplan-Meier curves for age of death were similar in shape to those based on age of onset data. Extending the analysis to a parametric model indicated that logistic distributions gave an accurate age-specific likelihood of age of death for all the CAG repeats tested, coinciding with the model used for the prediction of age of onset. Unfortunately, as sufficient data on age of death to construct survival curves were only available for repeat lengths between 41 and 45, it was not possible to resolve the controversy surrounding the hypothesis that the duration of HD is associated with repeat size84;90;101. While a recent analysis of almost 3000 patients has shown that duration of disease is influenced by the age of onset, with juvenile and late onset patients having the shortest duration, repeat size was not taken into account171.
My analysis not only provides insight into progression of the disease, but may also be used in concert with age of onset prediction to aid those affected with HD to plan for their future.

CHAPTER 11 DISCUSSION

11.1 Summary of Results

During the course of my thesis, I developed a series of survival-based analyses to accurately predict the age-specific probability of onset of HD and the age-specific probability of death subsequent to onset of disease. I propose that these predictions will be useful for genetic counseling, clinical management and clinical trial design, and for patients at risk and their family members. The analysis also provides additional insights into the penetrance of the disease and underascertainment of disease onset.

11.1.1 Prediction of onset

Properly developed and validated predictions can influence clinical practice166. My research should help clinicians provide accurate predictions of onset to patients. However, caution should be used while discussing this information with patients. Individuals at risk for HD can use the information presented in this thesis in planning for their futures, should they decide to undergo testing. However, while it is possible to use the model I have presented to estimate probability of onset very accurately, with narrow confidence intervals, we still cannot necessarily predict the future with a high degree of accuracy for all repeat lengths. Furthermore, even a perfect predictive model does not provide any certainty as to whether a patient will experience onset before the time of prediction159. For example, the model may estimate that there is a 60% likelihood of onset by age 50, but this includes a chance that onset will occur as early as 20. The parsimonious model I have developed does give the most accurate predictions of onset of HD currently available.
While predictive uncertainty in late onset diseases such as HD will probably always be a reality, physicians can with some confidence use predictions based on our model to provide counseling for individuals and families who are at risk for HD and desire age of onset information.

11.1.2 Clinical trials

Statistical techniques alone cannot replace a thoughtful approach to the design of any clinical trial. However, assessing a potential therapeutic option as quickly as practical is an ethical imperative, so that successful treatments can be made available to all persons at risk for HD. The predictive model should be of assistance in future clinical trial designs involving presymptomatic individuals by aiding investigators in designing a clinical trial for HD that has a higher chance of detecting a delay of onset of symptoms, using fewer individuals. It represents an excellent practical example of how pharmacogenetics can immediately improve trial design and reduce costs for drug development.

11.1.3 Penetrance

My results should give new hope to those individuals who have a CAG less than 41, and they provide the first numeric estimate of penetrance of HD by age and CAG. I have shown for the first time that penetrance for HD can be accurately estimated, and is in fact quite low for individuals with lower (i.e. less than 41) repeats.

11.1.4 CAG-specific influence of factors modifying age of onset

The significant association between the variance of the probability of onset and CAG indicates that the contribution of modifiers (both genetic and environmental) is less obvious in individuals with higher repeat sizes (e.g. greater than 44), which I assume is due to the overwhelming effect of polyglutamine length. Similarly, the larger variance in the probability of onset for lower repeats could be indicative of modifiers playing a greater role when the CAG size is smaller (Figure 15).
Differences in the CAG distribution of the cohorts used to investigate the influence of modifiers such as apolipoprotein on age of onset of HD could be responsible for differing findings of significance of the effect of different alleles on age of onset172;173. In the future, it might be helpful to compare the effects of potential modifiers on individuals with lower (e.g. greater than 35 but less than 42) versus higher repeat size.

11.2 Future Investigations

11.2.1 Identification of individuals with extreme phenotypes using the parsimonious model

The parsimonious model can be used to assign a likelihood of observing an individual with a very early or a late onset, including identifying those individuals who lived disease free longer than would have been expected. Individuals who had onset in the lower or upper fifth percentile, or who are still disease free at the 95th percentile, have extreme phenotypes, given their genotype. Families whose members cluster at extremes of their CAG-specific survival curves, or alternatively, at opposite ends of the spectrum, would be very interesting for further in-depth research. The identification of these individuals through the application of the parsimonious model would provide a valuable resource for use in genotyping, or other studies investigating factors that modify onset, and could provide a lead to targets that may prove effective in delaying onset. For example, individuals in a family with early onset for their repeat size may have shared genetic factors that contribute to this extreme phenotype. Pharmaceuticals to block a gene effect are easier to design, so this family would be of particular interest if it were large enough, or if enough other families with the same sub-phenotype could be collected. By blocking the effect of a modifying gene that causes earlier age of onset, onset may be delayed for other HD patients.
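The percentile-based screen for extreme phenotypes described in Section 11.2.1 can be sketched as follows. The sketch assumes a logistic cumulative distribution for age of onset, which the thesis reports fits the CAG-specific curves well, but the mean and scale values used here (50 and 5 years) are hypothetical placeholders and not the fitted parameters of the parsimonious model; the function names are also illustrative.

```python
import math


def logistic_cdf(age, mean, scale):
    """Cumulative probability of onset by a given age under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(age - mean) / scale))


def classify_phenotype(age, onset_observed, mean, scale, tail=0.05):
    """Flag individuals in the extreme tails of their CAG-specific curve.

    age is the age of onset if onset_observed is True, otherwise the
    current age of a person who is still disease free.
    """
    percentile = logistic_cdf(age, mean, scale)
    if onset_observed and percentile <= tail:
        return "early onset"
    if onset_observed and percentile >= 1.0 - tail:
        return "late onset"
    if not onset_observed and percentile >= 1.0 - tail:
        return "disease free beyond expectation"
    return "typical"


# Hypothetical parameters for one CAG repeat length (illustration only).
HYPOTHETICAL_MEAN = 50.0
HYPOTHETICAL_SCALE = 5.0

early = classify_phenotype(30, True, HYPOTHETICAL_MEAN, HYPOTHETICAL_SCALE)
survivor = classify_phenotype(70, False, HYPOTHETICAL_MEAN, HYPOTHETICAL_SCALE)
typical = classify_phenotype(50, True, HYPOTHETICAL_MEAN, HYPOTHETICAL_SCALE)
```

In practice the mean and scale would be replaced by the CAG-specific values estimated by the model, so that each individual is compared against the curve for their own repeat length.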
11.2.2 Models for other triplet diseases

The relationship between repeat size and age of onset is well documented for many of the CAG repeat disorders107. It is possible that the methods I used for the development of the mathematical model of onset can be applied to these disorders with equal success, providing more accurate prediction of onset for these other disorders as well.

11.2.3 Stochastic model of disease progression

The prognostic model I have developed could be extended into a stochastic process model to describe the progression of an individual with HD through different disease states. There are several recognized stages of HD174. It is unclear if the time between these stages is dependent on repeat size. The answer to this question would help in the design of clinical trials, and in the treatment of individuals. The parsimonious model, along with the logistic model for age of death, could be used as the foundation for the development of a more comprehensive model incorporating multiple endpoints.

11.2.4 Extensive clinical trial design

A breakthrough in any neurological disorder such as Parkinson's, Lou Gehrig's or Alzheimer's disease may be used in the development of a treatment for HD, as these diseases have a similar neurodegenerative basis (i.e. plaque formation could be analogous to inclusion formation) and may respond to similar compounds. Once a compound has been identified that has the potential for delaying onset of HD, more in-depth designs for clinical trials can be prepared, including such factors as recruitment and drop-out rates based on information about side effects and frequency of treatment.

11.2.5 Conclusion

Genetic tests are rightly celebrated as the first clinical fruits of the revolution in molecular biology. I have used a large sample and specialized statistical methods to accurately predict clinical outcome based on a familiar genetic test.
My results are important today because they allow the clinician to confidently inform patients about what their positive test result actually means. My results will likely be more important tomorrow, since they also provide a basis for targeting medical interventions and designing clinical trials. 122 B I B L I O G R A P H Y 123 Reference List 1. Hayden, Michael R. Huntington's Chorea. (1981). New York, Springer-Verlag. 2. Harper, Peter S. Huntington's Disease. (1996). Great Britain, The University Press. Major Problems in Neurology. Warlow, C. P. and van Gijn, J. 3. Starr, A. A disorder of rapid eye movements in Huntington's chorea. Brain 90, 545-64 (1967). 4. Bollen, E. et al. Horizontal and vertical saccadic eye movement abnormalities in Huntington's chorea. J Neurol Sci 74, 11-22 (1986). 5. Lasker, A.G., Zee, D.S., Hain, T.C., Folstein, S.E., and Singer, H.S. Saccades in Huntington's disease: slowing and dysmetria. Neurology 38, 427-31 (1988). 6. Harper, P.S. Clinical consequences of isolating the gene for Huntington's disease. BMJ 307, 397-8 (1993). 7. Gilliam, T.C. and Gusella, J.F. Huntington's Diease. Butterworths, Boston (1988). 8. Aminoff, M.J., Marshall, J., Smith, E.M., and Wyke, M.A. Pattern of intellectual impairment in Huntington's chorea. Psychol Med 5, 169-72 (1975). 9. Caine, E.D. and Shoulson, I. Psychiatric syndromes in Huntington's disease. Am J Psychiatry 140, 728-33 (1983). 10. Sanberg, P.R., Fibiger, H.C., and Mark, R.F. Body weight and dietary factors in Huntington's disease patients compared with matched controls. Med J Aust 1, 407-9 (1981). 11. Roos, R.A., Hermans, J., Vegter-van der Vlis, M., van Ommen, G.J., and Bruyn, G.W. Duration of illness in Huntington's disease is not related to age at onset. J Neurol Neurosurg Psychiatry 56, 98-100 (1993). 12. Newcombe, R.G., Walker, D.A., and Harper, P.S. Factors influencing age at onset and duration of survival in Huntington's chorea. Ann Hum Genet 45, 387-96 (1981). 13. Harper, P.S. 
In Harper, P.S. (ed.) The Natural history of Huntington's disease. W.B. Saunders Company Ltd, Toronto (1996).
14. Katzman, R. Differential diagnosis of dementing illnesses. Neurol Clin 4, 329-40 (1986).
15. The Huntington's Disease Collaborative Research Group. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. Cell 72, 971-83 (1993).
16. Rigamonti, D. et al. Wild-type huntingtin protects from apoptosis upstream of caspase-3. J Neurosci 20, 3705-13 (2000).
17. Nasir, J. et al. Targeted disruption of the Huntington's disease gene results in embryonic lethality and behavioral and morphological changes in heterozygotes. Cell 81, 811-23 (1995).
18. Duyao, M.P. et al. Inactivation of the mouse Huntington's disease gene homolog Hdh. Science 269, 407-10 (1995).
19. Metzler, M. et al. Huntingtin is required for normal hematopoiesis. Hum Mol Genet 9, 387-94 (2000).
20. Engqvist-Goldstein, A.E., Kessels, M.M., Chopra, V.S., Hayden, M.R., and Drubin, D.G. An actin-binding protein of the Sla2/Huntingtin interacting protein 1 family is a novel component of clathrin-coated pits and vesicles. J Cell Biol 147, 1503-18 (1999).
21. Rubinsztein, D.C. et al. Phenotypic characterization of individuals with 30-40 CAG repeats in the Huntington disease (HD) gene reveals HD cases with 36 repeats and apparently normal elderly individuals with 36-39 repeats. Am J Hum Genet 59, 16-22 (1996).
22. Kremer, B. et al. A worldwide study of the Huntington's disease mutation. The sensitivity and specificity of measuring CAG repeats. N Engl J Med 330, 1401-6 (1994).
23. Nance, M.A., Mathias-Hagen, V., Breningstall, G., Wick, M.J., and McGlennen, R.C. Analysis of a very large trinucleotide repeat in a patient with juvenile Huntington's disease. Neurology 52, 392-4 (1999).
24. White, J.K. et al. Huntingtin is required for neurogenesis and is not impaired by the Huntington's disease CAG expansion. Nat Genet 17, 404-10 (1997).
25.
Persichetti, F. et al. Differential expression of normal and mutant Huntington's disease gene alleles. Neurobiol Dis 3, 183-90 (1996).
26. Wexler, N.S. et al. Homozygotes for Huntington's disease. Nature 326, 194-7 (1987).
27. Myers, R.H. et al. Late onset of Huntington's disease. J Neurol Neurosurg Psychiatry 48, 530-4 (1985).
28. Durr, A. et al. Homozygosity in Huntington's disease. J Med Genet 36, 172-3 (1999).
29. Ranen, N.G. et al. Anticipation and instability of IT-15 (CAG)n repeats in parent-offspring pairs with Huntington disease. Am J Hum Genet 57, 593-602 (1995).
30. MacDonald, M.E. et al. Gametic but not somatic instability of CAG repeat length in Huntington's disease. J Med Genet 30, 982-6 (1993).
31. Kovtun, I.V. and McMurray, C.T. Trinucleotide expansion in haploid germ cells by gap repair. Nat Genet 27, 407-11 (2001).
32. Ireland, M.J., Reinke, S.S., and Livingston, D.M. The impact of lagging strand replication mutations on the stability of CAG repeat tracts in yeast. Genetics 155, 1657-65 (2000).
33. Telenius, H. et al. Somatic and gonadal mosaicism of the Huntington disease gene CAG repeat in brain and sperm. Nat Genet 6, 409-14 (1994).
34. Kennedy, L. and Shelbourne, P.F. Dramatic mutation instability in HD mouse striatum: does polyglutamine load contribute to cell-specific vulnerability in Huntington's disease? Hum Mol Genet 9, 2539-44 (2000).
35. Kuemmerle, S. et al. Huntington aggregates may not predict neuronal death in Huntington's disease. Ann Neurol 46, 842-9 (1999).
36. Graveland, G.A., Williams, R.S., and DiFiglia, M. Evidence for degenerative and regenerative changes in neostriatal spiny neurons in Huntington's disease. Science 227, 770-3 (1985).
37. Graybiel, A.M., Aosaki, T., Flaherty, A.W., and Kimura, M. The basal ganglia and adaptive motor control. Science 265, 1826-31 (1994).
38. Kremer, B., Weber, B., and Hayden, M.R. New insights into the clinical features, pathogenesis and molecular genetics of Huntington disease.
Brain Pathol 2, 321-35 (1992).
39. Albin, R.L. et al. Preferential loss of striato-external pallidal projection neurons in presymptomatic Huntington's disease. Ann Neurol 31, 425-30 (1992).
40. Vonsattel, J.P. and DiFiglia, M. Huntington disease. J Neuropathol Exp Neurol 57, 369-84 (1998).
41. Albin, R.L., Young, A.B., and Penney, J.B. The functional anatomy of basal ganglia disorders. Trends Neurosci 12, 366-75 (1989).
42. Strong, T.V. et al. Widespread expression of the human and rat Huntington's disease gene in brain and nonneural tissues. Nat Genet 5, 259-65 (1993).
43. Li, S.H. et al. Huntington's disease gene (IT15) is widely expressed in human and rat tissues. Neuron 11, 985-93 (1993).
44. Gutekunst, C.A. et al. Identification and localization of huntingtin in brain and human lymphoblastoid cell lines with anti-fusion protein antibodies. Proc Natl Acad Sci U S A 92, 8710-4 (1995).
45. Sapp, E. et al. Huntingtin localization in brains of normal and Huntington's disease patients. Ann Neurol 42, 604-12 (1997).
46. Sharp, A.H. et al. Widespread expression of Huntington's disease gene (IT15) protein product. Neuron 14, 1065-74 (1995).
47. DiFiglia, M. et al. Huntingtin is a cytoplasmic protein associated with vesicles in human and rat brain neurons. Neuron 14, 1075-81 (1995).
48. Persichetti, F. et al. Normal and expanded Huntington's disease gene alleles produce distinguishable proteins due to translation across the CAG repeat. Mol Med 1, 374-83 (1995).
49. Trottier, Y. et al. Cellular localization of the Huntington's disease protein and discrimination of the normal and mutated form. Nat Genet 10, 104-10 (1995).
50. Engelender, S. et al. Huntingtin-associated protein 1 (HAP1) interacts with the p150Glued subunit of dynactin. Hum Mol Genet 6, 2205-12 (1997).
51. Davies, S.W. et al. Formation of neuronal intranuclear inclusions underlies the neurological dysfunction in mice transgenic for the HD mutation. Cell 90, 537-48 (1997).
52. DiFiglia, M. et al.
Aggregation of huntingtin in neuronal intranuclear inclusions and dystrophic neurites in brain. Science 277, 1990-3 (1997).
53. Peters, M.F. et al. Nuclear targeting of mutant Huntingtin increases toxicity. Mol Cell Neurosci 14, 121-8 (1999).
54. Kim, M. et al. Mutant huntingtin expression in clonal striatal cells: dissociation of inclusion formation and neuronal survival by caspase inhibition. J Neurosci 19, 964-73 (1999).
55. Lunkes, A. and Mandel, J.L. A cellular model that recapitulates major pathogenic steps of Huntington's disease. Hum Mol Genet 7, 1355-61 (1998).
56. Gutekunst, C.A. et al. Nuclear and neuropil aggregates in Huntington's disease: relationship to neuropathology. J Neurosci 19, 2522-34 (1999).
57. Saudou, F., Finkbeiner, S., Devys, D., and Greenberg, M.E. Huntingtin acts in the nucleus to induce apoptosis but death does not correlate with the formation of intranuclear inclusions. Cell 95, 55-66 (1998).
58. Scherzinger, E. et al. Self-assembly of polyglutamine-containing huntingtin fragments into amyloid-like fibrils: implications for Huntington's disease pathology. Proc Natl Acad Sci U S A 96, 4604-9 (1999).
59. Yamamoto, A., Lucas, J.J., and Hen, R. Reversal of neuropathology and motor dysfunction in a conditional model of Huntington's disease. Cell 101, 57-66 (2000).
60. Green, H. Human genetic diseases due to codon reiteration: relationship to an evolutionary mechanism. Cell 74, 955-6 (1993).
61. Perutz, M. Polar zippers: their role in human disease. Protein Sci 3, 1629-37 (1994).
62. Perutz, M.F. Glutamine repeats and inherited neurodegenerative diseases: molecular aspects. Curr Opin Struct Biol 6, 848-58 (1996).
63. Karpuj, M.V. et al. Transglutaminase aggregates huntingtin into nonamyloidogenic polymers, and its enzymatic activity increases in Huntington's disease brain nuclei. Proc Natl Acad Sci U S A 96, 7388-93 (1999).
64. Li, X.J. et al. A huntingtin-associated protein enriched in brain with implications for pathology.
Nature 378, 398-402 (1995).
65. Gutekunst, C.A. et al. The cellular and subcellular localization of huntingtin-associated protein 1 (HAP1): comparison with huntingtin in rat and human. J Neurosci 18, 7674-86 (1998).
66. Page, K.J., Potter, L., Aronni, S., Everitt, B.J., and Dunnett, S.B. The expression of Huntingtin-associated protein (HAP1) mRNA in developing, adult and ageing rat CNS: implications for Huntington's disease neuropathology. Eur J Neurosci 10, 1835-45 (1998).
67. Kalchman, M.A. et al. HIP1, a human homologue of S. cerevisiae Sla2p, interacts with membrane-associated huntingtin in the brain. Nat Genet 16, 44-53 (1997).
68. Hackam, A.S. et al. Huntingtin interacting protein 1 induces apoptosis via a novel caspase-dependent death effector domain. J Biol Chem 275, 41299-308 (2000).
69. Tukamoto, T., Nukina, N., Ide, K., and Kanazawa, I. Huntington's disease gene product, huntingtin, associates with microtubules in vitro. Brain Res Mol Brain Res 51, 8-14 (1997).
70. Kalchman, M.A. et al. Huntingtin is ubiquitinated and interacts with a specific ubiquitin-conjugating enzyme. J Biol Chem 271, 19385-94 (1996).
71. Ferrigno, P. and Silver, P.A. Polyglutamine expansions: proteolysis, chaperones, and the dangers of promiscuity. Neuron 26, 9-12 (2000).
72. Cattaneo, E. et al. Loss of normal huntingtin function: new developments in Huntington's disease research. Trends Neurosci 24, 182-8 (2001).
73. Kumar, S. Regulation of caspase activation in apoptosis: implications in pathogenesis and treatment of disease. Clin Exp Pharmacol Physiol 26, 295-303 (1999).
74. Goldberg, Y.P. et al. Cleavage of huntingtin by apopain, a proapoptotic cysteine protease, is modulated by the polyglutamine tract. Nat Genet 13, 442-9 (1996).
75. Wellington, C.L. et al. Caspase cleavage of gene products associated with triplet expansion disorders generates truncated fragments containing the polyglutamine tract. J Biol Chem 273, 9158-67 (1998).
76. Ona, V.O. et al.
Inhibition of caspase-1 slows disease progression in a mouse model of Huntington's disease. Nature 399, 263-7 (1999).
77. Wellington, C.L. et al. Inhibiting caspase cleavage of huntingtin reduces toxicity and aggregate formation in neuronal and nonneuronal cells. J Biol Chem 275, 19831-8 (2000).
78. Reddy, P.H. et al. Behavioural abnormalities and selective neuronal loss in HD transgenic mice expressing mutated full-length HD cDNA. Nat Genet 20, 198-202 (1998).
79. Sorensen, S.A., Fenger, K., and Olsen, J.H. Significantly lower incidence of cancer among patients with Huntington disease: An apoptotic effect of an expanded polyglutamine tract? Cancer 86, 1342-6 (1999).
80. Leavitt, B.R. et al. Wild-type huntingtin reduces the cellular toxicity of mutant huntingtin in vivo. Am J Hum Genet 68, 313-24 (2001).
81. Zuccato, C. et al. Loss of Huntingtin-Mediated BDNF Gene Transcription in Huntington's Disease. Science (2001).
82. Ferrer, I., Goutan, E., Marin, C., Rey, M.J., and Ribalta, T. Brain-derived neurotrophic factor in Huntington disease. Brain Res 866, 257-61 (2000).
83. Penney, J.B. Jr et al. Huntington's disease in Venezuela: 7 years of follow-up on symptomatic and asymptomatic individuals. Mov Disord 5, 93-9 (1990).
84. Kieburtz, K. et al. Trinucleotide repeat length and progression of illness in Huntington's disease. J Med Genet 31, 872-4 (1994).
85. Duyao, M. et al. Trinucleotide repeat length instability and age of onset in Huntington's disease. Nat Genet 4, 387-92 (1993).
86. Barron, L.H. et al. A study of the Huntington's disease associated trinucleotide repeat in the Scottish population. J Med Genet 30, 1003-7 (1993).
87. Andrew, S.E. et al. The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington's disease. Nat Genet 4, 398-403 (1993).
88. MacMillan, J.C. et al. Molecular analysis and clinical correlations of the Huntington's disease mutation. Lancet 342, 954-8 (1993).
89. Stine, O.C. et al.
Correlation between the onset age of Huntington's disease and length of the trinucleotide repeat in IT-15. Hum Mol Genet 2, 1547-9 (1993).
90. Brandt, J. et al. Trinucleotide repeat length and clinical progression in Huntington's disease. Neurology 46, 527-31 (1996).
91. Masuda, N. et al. Analysis of triplet repeats in the huntingtin gene in Japanese families affected with Huntington's disease. J Med Genet 32, 701-5 (1995).
92. Yapijakis, C. et al. Linkage disequilibrium between the expanded (CAG)n repeat and an allele of the adjacent (CCG)n repeat in Huntington's disease patients of Greek origin. Eur J Hum Genet 3, 228-34 (1995).
93. Trottier, Y., Biancalana, V., and Mandel, J.L. Instability of CAG repeats in Huntington's disease: relation to parental transmission and age of onset. J Med Genet 31, 377-82 (1994).
94. Novelletto, A. et al. Analysis of the trinucleotide repeat expansion in Italian families affected with Huntington disease. Hum Mol Genet 3, 93-8 (1994).
95. Legius, E. et al. Limited expansion of the (CAG)n repeat of the Huntington gene: a premutation (?). Eur J Hum Genet 2, 44-50 (1994).
96. Ashizawa, T., Wong, L.J., Richards, C.S., Caskey, C.T., and Jankovic, J. CAG repeat size and clinical presentation in Huntington's disease. Neurology 44, 1137-43 (1994).
97. de Rooij, K.E. et al. Borderline repeat expansion in Huntington's disease. Lancet 342, 1491-2 (1993).
98. Simpson, S.A., Davidson, M.J., and Barron, L.H. Huntington's disease in Grampian region: correlation of the CAG repeat number and the age of onset of the disease. J Med Genet 30, 1014-7 (1993).
99. Craufurd, D. and Dodge, A. Mutation size and age at onset in Huntington's disease. J Med Genet 30, 1008-11 (1993).
100. Hayden, M. Basal ganglia disorders. Churchill Livingston, New-York (1996).
101. Illarioshkin, S.N. et al. Trinucleotide repeat length and rate of progression of Huntington's disease. Ann Neurol 36, 630-5 (1994).
102. Newcombe, R.G.
A life table for onset of Huntington's chorea. Ann Hum Genet 45, 375-85 (1981).
103. Adams, P. B. Statistical Analysis of Age at Onset in Huntington's Disease. Georgia State University, Georgia State University. (1986).
104. Aylward, E.H. et al. Basal ganglia volume and proximity to onset in presymptomatic Huntington disease. Arch Neurol 53, 1293-6 (1996).
105. Squitieri, F. et al. Family and Molecular Data for a Fine Analysis of Age at Onset in Huntington Disease. Am J Med Genet 95, 366-373 (2000).
106. Cummings, C.J. and Zoghbi, H.Y. Fourteen and counting: unraveling trinucleotide repeat diseases. Human Molecular Genetics 9, 909-916 (2000).
107. Gusella, J.F. and MacDonald, M.E. Molecular genetics: unmasking polyglutamine triggers in neurodegenerative disease. Nat Rev Neurosci 1, 109-15 (2000).
108. O'Hearn, E., Holmes, S.E., Calvert, P.C., Ross, C.A., and Margolis, R.L. SCA-12: Tremor with cerebellar and cortical atrophy is associated with a CAG repeat expansion. Neurology 56, 299-303 (2001).
109. La Spada, A.R. et al. Meiotic stability and genotype-phenotype correlation of the trinucleotide repeat in X-linked spinal and bulbar muscular atrophy. Nat Genet 2, 301-4 (1992).
110. Giunti, P. et al. The trinucleotide repeat expansion on chromosome 6p (SCA1) in autosomal dominant cerebellar ataxias. Brain 117 (Pt 4), 645-9 (1994).
111. Koide, R. et al. Unstable expansion of CAG repeat in hereditary dentatorubral-pallidoluysian atrophy (DRPLA). Nat Genet 6, 9-13 (1994).
112. Nagafuchi, S. et al. Dentatorubral and pallidoluysian atrophy expansion of an unstable CAG trinucleotide on chromosome 12p. Nat Genet 6, 14-8 (1994).
113. Ranum, L.P. et al. Molecular and clinical correlations in spinocerebellar ataxia type I: evidence for familial effects on the age at onset. Am J Hum Genet 55, 244-52 (1994).
114. Komure, O. et al.
DNA analysis in hereditary dentatorubral-pallidoluysian atrophy: correlation between CAG repeat length and phenotypic variation and the molecular basis of anticipation. Neurology 45, 143-9 (1995).
115. Maciel, P. et al. Correlation between CAG repeat length and clinical features in Machado-Joseph disease. Am J Hum Genet 57, 54-61 (1995).
116. Maruyama, H. et al. Molecular features of the CAG repeats and clinical manifestation of Machado-Joseph disease. Hum Mol Genet 4, 807-12 (1995).
117. Takiyama, Y. et al. Evidence for inter-generational instability in the CAG repeat in the MJD1 gene and for conserved haplotypes at flanking markers amongst Japanese and Caucasian subjects with Machado-Joseph disease. Hum Mol Genet 4, 1137-46 (1995).
118. Wanker, E.E. Protein aggregation and pathogenesis of Huntington's disease: mechanisms and correlations. Biol Chem 381, 937-42 (2000).
119. Klein, J. P. and Moeschberger, M. L. Survival Analysis. Statistics for Censored and Truncated Data. New York, Springer. Statistics for Biology and Health. Dietz, K., Gail, M., Krickeberg, K., and Singer, B. (1997).
120. Kaplan, E.L. and Meier, P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457-481 (1958).
121. Lee, Elisa T. Statistical methods for survival data analysis. (1992). New York, Wiley.
122. Claxton, K. and Posnett, J. An economic approach to clinical trial design and research priority-setting. Health Econ 5, 513-24 (1996).
123. Goldberg, Y.P., Andrew, S.E., Clarke, L.A., and Hayden, M.R. A PCR method for accurate assessment of trinucleotide repeat expansion in Huntington disease. Hum Mol Genet 2, 635-6 (1993).
124. Andrew, S.E., Goldberg, Y.P., Theilmann, J., Zeisler, J., and Hayden, M.R. A CCG repeat polymorphism adjacent to the CAG repeat in the Huntington disease gene: implications for diagnostic accuracy and predictive testing. Hum Mol Genet 3, 65-7 (1994).
125. ACMG/ASHG statement.
Laboratory guidelines for Huntington disease genetic testing. The American College of Medical Genetics/American Society of Human Genetics Huntington Disease Genetic Testing Working Group. Am J Hum Genet 62, 1243-7 (1998).
126. Warner, J.P., Barron, L.H., and Brock, D.J. A new polymerase chain reaction (PCR) assay for the trinucleotide repeat that is unstable and expanded on Huntington's disease chromosomes. Mol Cell Probes 7, 235-9 (1993).
127. Bruland, O. et al. Accurate determination of the number of CAG repeats in the Huntington disease gene using a sequence-specific internal DNA standard. Clin Genet 55, 198-202 (1999).
128. Lagakos, S.W. General right censoring and its impact on the analysis of survival data. Biometrics 35, 139-56 (1979).
129. Gruger, J., Kay, R., and Schumacher, M. The validity of inferences based on incomplete observations in disease state models. Biometrics 47, 595-605 (1991).
130. Leung, K.M., Elashoff, R.M., and Afifi, A.A. Censoring issues in survival analysis. Annu Rev Public Health 18, 83-104 (1997).
131. S-PLUS. MathSoft, Inc. 99. Seattle, Washington, MathSoft, Inc. (1999).
132. Harrell Jr., F.E. Regression modelling strategies with applications to survival analysis and logistic regression. University of Virginia, Charlottesville (2001).
133. Regina C. Elandt-Johnson and Norman L. Johnson. Survival Models and Data Analysis. (1980). New York, John Wiley and Sons. Wiley series in probability and mathematical statistics, applied section.
134. TableCurve2D. SPSS Science. 2000. Chicago, IL, SPSS Science.
135. Falush, D., Almqvist, E.W., Brinkmann, R.R., Iwasa, Y., and Hayden, M.R. Measurement of Mutational Flow Implies Both a High New-Mutation Rate for Huntington Disease and Substantial Underascertainment of Late-Onset Cases. Am J Hum Genet 68, 373-385 (2000).
136. Mathematica. Wolfram Research Inc. 2000. Champaign, IL, Wolfram Research Inc. (2000).
137. Serfling, R. J. Approximation Theorems of Mathematical Statistics. (1980).
New York, Wiley.
138. Cox, D. R. and Oakes, D. Analysis of Survival Data. (1984). New York, Chapman and Hall.
139. Optimization Toolbox. The Mathworks, Inc. 2000. Champaign, IL, The MathWorks, Inc. (2000).
140. Bishop, Y. M., Fienberg, T. S., and Holland P. Discrete Multivariate Analysis. (1975). Cambridge, MA, MIT Press.
141. Graf, E., Schmoor, C., Sauerbrei, W., and Schumacher, M. Assessment and comparison of prognostic classification schemes for survival data. Stat Med 18, 2529-45 (1999).
142. Statistics Canada. Life tables, Canada and provinces, 1990-1992. Catalogue 84-537. Ottawa, Statistics Canada. (1995).
143. Agresti, A. Categorical Data Analysis. (1990). New York, Wiley.
144. O'Brien, R. G. Using the SAS system to perform power analysis for log-linear models. 778-782. (1986). Cary, NC, SAS Institutes, Inc. The Proceedings of the Eleventh Annual SAS Users Group (SUGI) Conference.
145. Harrington, D. P. and Fleming, T. R. A class of rank test procedures for censored survival data. Biometrika 69, 553-566 (1982).
146. Marshall, Frederick J., Penney, J., Oakes, D., Kieburtz, K., Shoulson, I., and the Huntington Study Group. Inter-Laboratory Variability of (CAG)n Determinations in Huntington's Disease, unpublished.
147. Huntington Study Group. Unified Huntington's Disease Rating Scale: reliability and consistency. Mov Disord 11, 136-42 (1996).
148. Siesling, S., van Vugt, J.P., Zwinderman, K.A., Kieburtz, K., and Roos, R.A. Unified Huntington's disease rating scale: a follow up. Mov Disord 13, 915-9 (1998).
149. Siesling, S., Zwinderman, A.H., van Vugt, J.P., Kieburtz, K., and Roos, R.A. A shortened version of the motor section of the Unified Huntington's Disease Rating Scale. Mov Disord 12, 229-34 (1997).
150. Brinkman, R.R., Mezei, M.M., Theilmann, J., Almqvist, E., and Hayden, M.R. The likelihood of being affected with Huntington disease by a particular age, for a specific CAG size. Am J Hum Genet 60, 1202-10 (1997).
151. Lee, E.T. and Go, O.T.
Survival analysis in public health research. Annu Rev Public Health 18, 105-34 (1997).
152. Yu, S.S. and Voit, E.O. A graphical classification of survival distributions. Lifetime data: models in reliability and survival analysis. The Netherlands (1996).
153. Collett, D. Modelling survival data in medical research. Chapman & Hall, New York (1994).
154. Banks, R.B. Growth and diffusion phenomena: mathematical frameworks and applications. Springer-Verlag, New York (1994).
155. Armenian, H.K. and Khoury, M.J. Age at onset of genetic diseases: an application for Sartwell's model of the distribution of incubation periods. Am J Epidemiol 113, 596-605 (1981).
156. Balakrishnan, N. Maximum Likelihood Estimation Based on Complete and Type II Censored Samples. In Balakrishnan, M. (ed.) Handbook of the Logistic Distribution. Marcel Dekker, Inc., New York (1992).
157. Meiser, B. and Dunn, S. Psychological impact of genetic testing for Huntington's disease: an update of the literature. J Neurol Neurosurg Psychiatry 69, 574-8 (2000).
158. Tyler, A., Ball, D., and Craufurd, D. Presymptomatic testing for Huntington's disease in the United Kingdom. The United Kingdom Huntington's Disease Prediction Consortium. BMJ 304, 1593-6 (1992).
159. Altman, D.G. and Royston, P. What do we mean by validating a prognostic model? Stat Med 19, 453-73 (2000).
160. Margolis, D.J. et al. Validation of a melanoma prognostic model. Arch Dermatol 134, 1597-601 (1998).
161. Mussurakis, S., Buckley, D.L., and Horsman, A. Prediction of axillary lymph node status in invasive breast cancer with dynamic contrast-enhanced MR imaging. Radiology 203, 317-21 (1997).
162. Mackillop, W.J. and Quirt, C.F. Measuring the accuracy of prognostic judgments in oncology. J Clin Epidemiol 50, 21-9 (1997).
163. Knorr, K.L. et al. Making the most of your prognostic factors: presenting a more accurate survival model for breast cancer patients. Breast Cancer Res Treat 22, 251-62 (1992).
164. Braitman, L.E.
and Davidoff, F. Predicting clinical states in individual patients. Ann Intern Med 125, 406-12 (1996).
165. Laupacis, A., Wells, G., Richardson, W.S., and Tugwell, P. Users' guides to the medical literature. V. How to use an article about prognosis. Evidence-Based Medicine Working Group. JAMA 272, 234-7 (1994).
166. Laupacis, A., Sekar, N., and Stiell, I.G. Clinical prediction rules. A review and suggested modifications of methodological standards. JAMA 277, 488-94 (1997).
167. Harrell, F.E., Lee, K.L., Califf, R.M., Pryor, D.B., and Rosati, R.A. Regression modelling strategies for improved prognostic prediction. Stat Med 3, 143-52 (1984).
168. Wyatt, J.C. and Altman, D.G. Prognostic models: clinically useful or quickly forgotten? British Medical Journal 311, 1539-1541 (1995).
169. Lang, T. A. and Secic, M. Assessing time to an event as an endpoint. 137-46. How to Report Statistics in Medicine: Annotated Guidelines for Authors, Editors, and Reviewers. American Coll of Physicians. (1997).
170. Wellington, C.L., Brinkman, R.R., O'Kusky, J.R., and Hayden, M.R. Toward understanding the molecular pathology of Huntington's disease. Brain Pathol 7, 979-1002 (1997).
171. Foroud, T., Gray, J., Ivashina, J., and Conneally, P.M. Differences in duration of Huntington's disease based on age at onset. J Neurol Neurosurg Psychiatry 66, 52-6 (1999).
172. Kehoe, P., Krawczak, M., Harper, P.S., Owen, M.J., and Jones, A.L. Age of onset in Huntington disease: sex specific influence of apolipoprotein E genotype and normal CAG repeat length. J Med Genet 36, 108-11 (1999).
173. Panas, M., Avramopoulos, D., Karadima, G., Petersen, M.B., and Vassilopoulos, D. Apolipoprotein E and presenilin-1 genotypes in Huntington's disease. J Neurol 246, 574-7 (1999).
174. Kirkwood, S.C., Su, J.L., Conneally, P., and Foroud, T. Progression of symptoms in the early and middle stages of Huntington disease. Arch Neurol 58, 273-8 (2001).

APPENDIX I SUPPLEMENTARY FIGURES
[Supplementary figures: plots of probability of onset, cumulative probability of onset, and probability of death against age. The plot images and their captions are not recoverable from the text extraction.]
[Figures 63 to 65: plots not recoverable from the text extraction; x-axis: Age (years).]

Figure 63 Cumulative probability of onset predicted by a parametric model developed with 80% of the data, compared to the observed onset for the hold-out sample for 43-44 CAG repeats. Staircase line represents the nonparametric (Kaplan-Meier) analysis. Smooth curve with solid line represents a parametric model based on 80% of the data. Short dashed line represents the Brier Scores of the nonparametric prediction, based on the holdout sample, and the long dashed line represents the Brier Scores of the parametric model predictions, based on the modeling sample.

Figure 64 Cumulative probability of onset predicted by a parametric model developed with 80% of the data, compared to the observed onset for the hold-out sample for 45-47 CAG repeats. Staircase line represents the nonparametric (Kaplan-Meier) analysis. Smooth curve with solid line represents a parametric model based on 80% of the data. Short dashed line represents the Brier Scores of the nonparametric prediction, based on the holdout sample, and the long dashed line represents the Brier Scores of the parametric model predictions, based on the modeling sample.

Figure 65 Cumulative probability of onset predicted by a parametric model developed with 80% of the data, compared to the observed onset for the hold-out sample for 48-56 CAG repeats. Staircase line represents the nonparametric (Kaplan-Meier) analysis. Smooth curve with solid line represents a parametric model based on 80% of the data.
Short dashed line represents the Brier Scores of the nonparametric prediction, based on the hold-out sample, and the long dashed line represents the Brier Scores of the parametric model predictions, based on the modeling sample.

APPENDIX II PROGRAMS FOR DETERMINING THE BRIER SCORE

12.1 Program to calculate Brier Score based on predictive model

The following program is written in Perl and calculates Brier Scores and weights for a test dataset, based on predictions generated by a model. Predictions are to be provided in a text file (predict.txt). Predict.txt should contain one record per line with 2 fields: (1) Age of prediction and (2) Prediction ($prdctage and $prdct respectively). Censoring distributions are to be provided in a file called "censor.txt". The format of censor.txt is currently set up to accept censoring distribution information as provided by SPLUS-2000. The second field in SPLUS output is the age ($age) and the fifth field is the probability of a patient being censored (0 individual has onset and 1 individual is censored). Note that in order to calculate the censoring distribution, the censoring codes must be reversed from their usual meaning (the 0 and 1 codes must be swapped before the censoring distribution is calculated). Data to be tested is to be provided in a file called data.txt having 3 fields: (1) Patient Identifier, (2) Patient Risk Status (0 for censored, 1 for onset) and (3) Age ($id, $risk, $age respectively) separated by whitespace. Gtot should add up to the number of individuals in the test sample, give or take a decimal place or two due to rounding.
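As a reading aid, the quantity that the listing in this section accumulates for each prognosis age can be written as a censoring-weighted Brier Score. The notation is an assumption introduced here for illustration and is not taken from the thesis: n is the number of individuals in data.txt ($npatients), T_i and \delta_i are the age and risk status of individual i ($patage, $patrisk), \hat{\pi}(t) is the prediction for age t ($pihatt), and \hat{G} is the censoring distribution read from censor.txt (%G). Individuals censored at or before age t contribute nothing to either sum.

```latex
% Sketch of what the Perl listing appears to compute; symbols are illustrative.
\[
BS(t) = \frac{1}{n}\sum_{i=1}^{n}\left[
  \frac{\bigl(0-\hat{\pi}(t)\bigr)^{2}\,\mathbf{1}(T_i \le t,\ \delta_i = 1)}{\hat{G}(T_i)}
  + \frac{\bigl(1-\hat{\pi}(t)\bigr)^{2}\,\mathbf{1}(T_i > t)}{\hat{G}(t)}
\right]
\]
\[
G_{tot}(t) = \sum_{i=1}^{n}\left[
  \frac{\mathbf{1}(T_i \le t,\ \delta_i = 1)}{\hat{G}(T_i)}
  + \frac{\mathbf{1}(T_i > t)}{\hat{G}(t)}
\right]
\]
```

A small example with invented numbers: take t = 40 and \hat{\pi}(40) = 0.6, with four individuals: onset at 35, onset at 38, censored at 30, and still event-free at 45. Suppose \hat{G} = 0.75 at ages 35, 38 and 40. The contributions are 0.36/0.75 = 0.48, 0.36/0.75 = 0.48, 0, and 0.16/0.75 ≈ 0.213, so BS(40) ≈ 1.173/4 ≈ 0.293. The weights sum to 3/0.75 = 4, the number of individuals, which is the check on Gtot described above.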
#!/usr/bin/perl

# predict.txt has first column INTEGER age field and second column decimal
# prediction of onset decreasing 1 to 0
open (PREDICT,"predict.txt") || die "Can't open predict: $!\n";
while (<PREDICT>){
    chomp;
    ($prdctage,$prdct)=split (/\s+/,$_);
    #print "$prdctage $prdct\n";
    $guess=$prdct;
    $predict{$prdctage}=$guess;
    #print ("Adding at >$prdctage< >$predict{$prdctage}<>$guess<\n");
}

# censor.txt: the second field is the age, the fifth field is the
# censoring probability
open (CENSOR,"censor.txt") || die "Can't open censor: $!\n";
while (<CENSOR>){
    ($field1,$age,$field2,$field3,$prob)=split /\s+/;
    $censor{$age}=$prob;
}

# Carry the most recent censoring estimate forward so that $G{age} is
# defined for every integer age from 0 to 120
$Ghat=1.00;
for ($glookup = 0;$glookup <= 120;$glookup++){
    if ($censor{$glookup}>0){
        $Ghat=$censor{$glookup};
    }
    $G{$glookup}=$Ghat;
}
#foreach $key (keys(%G)){
#    print ("at $key Ghat >$G{$key}<\n");
#}

# data.txt: patient identifier, risk status (0 censored, 1 onset), age
open (DATA,"data.txt") || die "Can't open data: $!\n";
$npatients=0;
while (<DATA>){
    ($id,$risk,$age)=split /\s+/;
    $idrisk{$id}=$risk;
    $idage{$id}=$age;
    $npatients=$npatients+1;
}

# One Brier Score per age for which predict.txt supplies a prediction > 0
for ($i=0;$i<100;$i++){
    if ($predict{$i} > 0) {
        $prognosisforage=$predict{$i};
        $pihatt=$prognosisforage;
        $progage=$i;
        # start the sums for this age from zero
        $BS=0;
        $Gtot=0;
        #print ("\n\n\n making prognosis for age $progage guessed >$prognosisforage< \n\n\n");
        foreach $key (keys(%idrisk)){
            $patrisk=$idrisk{$key};
            $patage=$idage{$key};
            $GhatT=$G{$patage};     # censoring estimate at the patient's age
            $Ghatt=$G{$progage};    # censoring estimate at the prognosis age
            # category 1: onset at or before the prognosis age
            if ($patage <= $progage && $patrisk == 1){
                $category=1;
            }
            # category 2: still under observation after the prognosis age
            if ($patage > $progage){
                $category=2;
            }
            # category 3: censored at or before the prognosis age
            if ($patage <= $progage && $patrisk == 0){
                $category=3;
            }
            if ($category ==1){
                $weight=1/$GhatT;
                $term=0-$pihatt;
                $termsquared=$term*$term;
                $contrib=$termsquared*$weight;
                #print ("weight is >$weight< term >$term< termsquared >$termsquared< contrib>$contrib< \n");
            }
            if ($category ==2){
                $weight=1/$Ghatt;
                $term=1-$pihatt;
                $termsquared=$term*$term;
                $contrib=$termsquared*$weight;
                #print ("weight is >$weight< term >$term< termsquared >$termsquared< contrib>$contrib< xx\n");
            }
            if ($category ==3){
                $contrib=0;
                $weight=0;
                #print ("weight is >$weight< term >$term< termsquared >$termsquared< contrib>$contrib< \n");
            }
            $Gtot=$Gtot+$weight;
            #print ("$progage $key $patrisk $patage $category GhatT >$GhatT< pihat >$pihatt< Ghatt >$Ghatt< $term $termsquared contrib >$contrib< weight >$weight<\n");
            $BS=$BS+$contrib;
            #print ("Gtot >$Gtot<\n");
        }
        $BS=$BS/$npatients;
        # output: prognosis age, Brier Score, sum of weights
        print "$progage $BS $Gtot\n";
    }
}

12.2 Program to calculate best possible Brier Score based on perfect model and dataset

The following program is written in Perl and calculates Brier Scores and weights for a test dataset, based on predictions generated by a model. Predictions are to be provided in a text file (predict.txt). Predict.txt should contain one record per line with 2 fields: (1) Age of prediction and (2) Prediction ($prdctage and $prdct respectively). Censoring distributions are to be provided in a file called "censor.txt". The format of censor.txt is currently set up to accept censoring distribution information as provided by SPLUS-2000. The second field in SPLUS output is the age ($age) and the fifth field is the probability of a patient being censored (0 individual has onset and 1 individual is censored).
Note that in order to calculate the censoring distribution, the censoring codes must be reversed from their usual meaning (the 0 and 1 codes must be swapped before the censoring distribution is calculated). Data to be tested is to be provided in a file called data.txt having 3 fields: (1) Patient Identifier, (2) Patient Risk Status (0 for censored, 1 for onset) and (3) Age ($id, $risk, $age respectively) separated by whitespace. Unlike the previous program, this program calculates Brier Scores for a range of ages from 0 to 90. Gtot should add up to the number of individuals in the test sample, give or take a decimal place or two due to rounding.

#!/usr/bin/perl

$progage=0;

# Only the second field of predict.txt is used; the predictions are taken
# in file order, the first line corresponding to age 0
open (PREDICT,"predict.txt") || die "Can't open predict: $!\n";
while (<PREDICT>){
    chomp;
    ($goopO,$prdct)=split(/\s+/,$_);
    #print "prdct $prdct\n";
    $guess=$prdct;
    push(@predict,$guess);
}
#print (">@predict<\n");

# censor.txt: the second field is the age, the fifth field is the
# censoring probability
open (CENSOR,"censor.txt") || die "Can't open censor: $!\n";
while (<CENSOR>){
    ($field1,$age,$field2,$field3,$prob)=split /\s+/;
    $censor{$age}=$prob;
}

# Carry the most recent censoring estimate forward so that $G{age} is
# defined for every integer age from 0 to 120
$Ghat=1.00;
for ($glookup = 0;$glookup <= 120;$glookup++){
    if ($censor{$glookup}>0){
        $Ghat=$censor{$glookup};
    }
    $G{$glookup}=$Ghat;
}
#foreach $key (keys(%G)){
#    print ("at $key Ghat >$G{$key}<\n");
#}

# data.txt: patient identifier, risk status (0 censored, 1 onset), age
open (DATA,"data.txt") || die "Can't open data: $!\n";
$npatients=0;
while (<DATA>){
    ($id,$risk,$age)=split /\s+/;
    #print ("id >$id< risk >$risk< age >$age<\n");
    $idrisk{$id}=$risk;
    $idage{$id}=$age;
    $npatients=$npatients+1;
}
#print ("npatients >$npatients<\n");

# One Brier Score per line of predict.txt; $progage counts up from 0
foreach $prognosisforage (@predict) {
    $pihatt=$prognosisforage;
    #$pihatt=1;
    # start the sums for this age from zero
    $BS=0;
    $Gtot=0;
    #print ("\n\n\n making prognosis for age $progage guessed >$prognosisforage< \n\n\n");
    foreach $key (keys(%idrisk)){
        $patrisk=$idrisk{$key};
        $patage=$idage{$key};
        $GhatT=$G{$patage};     # censoring estimate at the patient's age
        $Ghatt=$G{$progage};    # censoring estimate at the prognosis age
        #print ("patage >$patage< progage >$progage<\n");
        # category 1: onset at or before the prognosis age
        if ($patage <= $progage && $patrisk == 1){
            $category=1;
        }
        # category 2: still under observation after the prognosis age
        if ($patage > $progage){
            $category=2;
        }
        # category 3: censored at or before the prognosis age
        if ($patage <= $progage && $patrisk == 0){
            $category=3;
        }
        if ($category ==1){
            $weight=1/$GhatT;
            $term=0-$pihatt;
            $termsquared=$term*$term;
            $contrib=$termsquared*$weight;
            #print ("weight is >$weight< term >$term< termsquared >$termsquared< contrib>$contrib< xx\n");
        }
        if ($category ==2){
            $weight=1/$Ghatt;
            $term=1-$pihatt;
            $termsquared=$term*$term;
            $contrib=$termsquared*$weight;
            #print ("weight is >$weight< term >$term< termsquared >$termsquared< contrib>$contrib< xx\n");
        }
        if ($category ==3){
            $contrib=0;
            $weight=0;
            #print ("weight is >$weight< term >$term< termsquared >$termsquared< contrib>$contrib< xx\n");
        }
        $Gtot=$Gtot+$weight;
        #print ("progage >$progage< key>$key< risk>$patrisk< age>$patage< cag>$category< GhatT >$GhatT< pihat >$pihatt< Ghatt >$Ghatt< $term $termsquared contrib >$contrib< weight >$weight<\n");
        $BS=$BS+$contrib;
        #print ("Gtot >$Gtot<\n");
    }
    $BS=$BS/$npatients;
    # output: prognosis age, Brier Score, sum of weights
    print "$progage $BS $Gtot\n";
    $progage=$progage+1;
}

APPENDIX III CONDITIONAL PROBABILITY TABLES
[Conditional probability tables omitted: cell values are not recoverable from the extracted text.]
CO- 00 CO CO CO: 00 co 00 0 0 0 0 0 0 0 0 0 0 0 1--. r-ovco 00, co-o o o o 00 co co co d o 00 00. co co d o : N- 00 CO co co co o d d , . 00 coi c o : co co co. d . d ' d : co 00 co- co © d 00 00 co' co d d eo 00 co O) d d ' • CO 00 04 CO. d d co; o»; 1 00 CO d co "d-00 co, CO CO; d d • 1 CO CO O) CO d d * r t 00 CO co 01, d d CO CO CO 00 co a>< co co d o d d 1 [...-|.;: 1 1 00 co; 00 co, co co co co 0 0 0 , 0 CO CO 00 CO. CO CO CO CO 0 0 , 0 0 CO CO CO CD CD CD CO 1 0 co: I N . r-- CM tf COI 00' 00. co: 00 eo; o o 0 0 O ! LO CO- CO d : d o tf I N . h-d d o • • I N - . O 00 © • d o o o o LOCO; 00 CO 00; CO d | ! d OO! CO CO' CO-CM CM co co, CO co-co co; co co-CD CO, CD' "tf tf CO CD ^ * * N — V " t f tf CO CD o o o o o o 0 0 o o o CO; O' 00 CO O rt OJ OJ rt.rt C N ! CD • CO CD CM CN co co CN' CN CD OJ •tf -tf. CD CO. tf tf tf tf CD CD CO : co; 0 0 , 0 0 , 0 . 0 CN CM CD CO CN CN; CN CN: CO CO! CD Ol O O Oi O O O O O O O O ! O O O O O O O O O O O O CM 00 CO co: d d i CM CO O CM l"v h- 00 COS 0 0 d : d IO tv. CQ CO 00! CO O COI 00 OJ O O o l o o CO CO d d CM CM CN CM CO CO CO CO d d o d coco1 co '• co co co co co co co co co CD CO CD 3 CO; CO-CO' CO OJ CD - CO. CD CO 0 0 0 0 0 0 0 0 0 ci da 00 CN CN, CO d : d ' I N . CN •tf; LO O - O ""IMS co 0 LO CO d d COi LO 00 CO CO' CO CO CO d i d 0 0 CN CO CO rv. -v. rv. 0 0 0 0 tf . tf LO . Co LO 5? io" •v. I N . rv. I N . -v. -v. iv . d d d d d : d d SO LO I N . I N . — : — ^ N LO LO LO N- 1 r>- I N , S tf - CO CN CN O. |N.< 1 , o o o o o . CO 00 CO I N - xsi LO 00 rt CO LO CO r-~ COI CD I CO. CO tf. tf LO LO LO; CO CO CO CO CO CO CO d o d d . d o d o d i d d 0 d 0 d 0 o d d o d d d 0 d d d o d O rt v . I N . I N . I N . ; -v. . t-; CN I N . CN CN (V- IN. CN, CN CN' N., fN. |N,. to O: CN' CO d ' d O LO' co tf tf 1 tf. o o d d i CO tf, r-IO LO O CO LO CO CO CO rv. OJ O rt rt CM: CN i COi CO ii. COi CO' CO CO CO tf tf. tf tf CO CO l-v I N . [ N . |v.. |N. |N.|v.,fN.-|^ |N. | N . | N . ; | N . O O O O O O O O O O O . 
O O O ' O ' O ' O o o 0 . 0 : 0 O to C CO CD CO t >i o C O O r t C M C O t f L O C O I N - c O C O O CNCOCOCOCOCOCOCOCOCOCOtf rtCNCOtfLOCOrNOOCOOrtCMCOtfLOCO'v. tf tf tf tf tf tf tf tf tf LOLOLOLOLOLOLOLO 236 CD CD LO CO §18! op< o o|OI of© of o 818 o of CD CD >. o CO 8P8 © ^ Oi O P i I ' I Oi O © O 6 6 0 O; OiO ol 0 - 0 oj0; o l . , Oi O- O' Oj Ot Oj'O Ol O OJO Oil Oi < 0 o 01 o > oi o 3 O O O' or o o O' O! O 0 , 0 , 0 o o. o, o oi o o o 'Si o oi of P r P . of-o 0 J 0 8 8 CD CD LO CM O'O' OOIO ofo o, O O 8 o , S o S o Q O , O o. © O ©; O O.O O'O.OO o,01 o; o: OjO, o o o o o o o o o o o! © o i Oi o > o O i o o o o o old of oi 0 , 0 0 Oi Of o I 1 I , 1,^ I I I I I I I I I I I i " I I O OiO'O.O'O O O'O o o o o o © o OjO 0 , 0 0 O'O 0 . 0 o 0 , 0 OiO 0; o o d d 0 0 0 O Ol O 0 0 0 ores 0L0I OiOlO OSOi o — • — - — , * — - — w - — - >—- >—. * — O O O O O; O i O' O O O O O.OlO'O'O pj of©; pip] p;o 0 o p'p« p;p: P [ P j P o: o o: o ol o, o o > © 0 1 o < o o oi 237 CD CU >. LO CO o o o o o o. I I I t I , I I o o o o, o>o o o o o o o O. I O I oloi o Ol Oi o 5 fro CD cu o CO •tl O' Ol o Di O OJ O H i j i',f i O O; O f Oi O O i ol o I oi o o 5 o O' o: CO CD LO CN O O O O' O'O' o O O O.O' o.oi o I I I I I I O O o o o o o p o p o p 3 C TZ, ZZ. C H o o o o« o o o o o o o o o LO CD .o CO X I • o CO c o +^  TJ c o o CO CD >» LO 00 CO 00, CO CO CO CO CD^CD* CDj OJf CD; CD! CD d d d'd d i d! d CO CO CO CO' CO' CO CD CJ CD Ol O) CD d o d d d d CO, 00 CD d d CO CD Tf Tt Tf CD • CD CD CD d d< d d Tt, Tt ~ CD CN. CM, CN CD OJ; CD d'd CO COI CD • CD! d'd! co" co; CDI CD did; LCTI to! Is. 9* 9« CNS CM] r*—' s j m 18 tCM CDFOJ to CN Fs •7*3 to; CO CD >> LO I LO; 9} 9; CN; CM O, ©' O! o Tf Tfl |s- h-IrJi s-> Tf; Ttj Tf is. rs.| rs. o o o oi o' o; o CD CD CO c CD 1_ o 00 CD O.T- CN CO Tf LO CO CO, CD I CD CD CD CD CD I • 238 239 240 I. 241 CD CD LO CO O i O . O ' O ' O i O o o o o o o o o o o o o I I I I I I I I I o o o o o o o o o O O O' O'O] o o o f o Si 8 LSI oto , o O Oi o CD CD o CO 8! 
Ol-Oj Ol O1 o o o| o o; of O; of o j o o PIP p ' © o ° c o LO CD -CJ CD jQ • o CD CD >» LO CN O Oi o o O O O o o o r o O! 0 , 0 O O O O O O O O O o o o o • o, o tot o • O o o , o i o : o , o O' o ' o - 0 : 0 ! o 0 , 0 o; o Oi O iO © 0 0 CO CD >. o CN o f o ol:o CD o> CD5.CD 81 at\ at I at} at at ot at, 00 o" O; d' 0 co! 00 co; 00 CD, CD CO' CD CD ; CD CD, CD d ' d ' *> 1 CO' 00 O"H<D 0 , 0 CD c g S c o O CO CD pDCB O O I C O J iOJi CDfCDi o o o o d 0 d CDj CD CD' CD CO- 00! CO OJ CD CD o d d CO* CO' COj CO CD CJ CD CD O O O O 00 '00 CD- CD df'd CD C D CO c CD o i II i i • is. 00 OJ O T - CN CO CO, 00 , COj CD i CD] CD I CD 242 CO CD OJ CD-CD a: o < o CJ) tf CO T J > T J c CO CD CO c o 4— o >* CO X J o CO c o T J c o O CO CO _0J X J CO 243 244 245 246 247 248 t l l i 111 1 1 co co o CM co to CO rv. co co o T - CM CO tf LOi t o t o s r - r» r-- rv. rv. rv. CO CO CO CO 00 00 " j ^ "x J 249 CO CO >> LO CO 0 , 0 1 0 O- < o: o o; o co co >. o CO O Q O i i o o o i i O' o o-© o 6 o o o o o o o , o, o o« > o 54 O I I '©! o o, o 3j O ii o dl 6 o o O < o < o1 oi o o o o CO CO >. LO CM o; oj o O . O l O 6:©I6 o'o. o o d o o o o OjO o o o.o- ol ol o o' oioi io O' oj oto Oj Oi o oto LO col CO • O CO CO >. o CM O, O OiO Ol o o o o- o o: O'O1 o O O O; o o o O) o ©j o a c | ZZJ. a ^: Ol 0|;0| OiOi O Ol O |>Oj o O J O N '<*N . CO! CO I 'C0| CO i COi CO CO' CO COi CO f col CO - - - - - - - -- - - - - - - --ol o [o oj ofo CO c g -5 c o I O CO CO CO CO CO | v . I V . , co C CO CO CO t >\ o CO S. C O C O O i - C N C O t f L O co oo oooococococococn 250 251 252 253 CO CD >. O CO oto] oto] o OJO! oiol o o Of ©j o| O t p i 8 S i S 1 ©•oi o O L O of© CO CD O' o, O 1 O O i O . O ' O q j p l 0 | ,0 i i ' i i r o o o o o;©! o'o ©;© o! © o, © O ' O 254 ro 0 Q . 0 (Z CD < O CNJ LO CD •g > _C C CO to c O •4— o >, -4—' Zj CD - Q O CD c o T3 C o O CD CO 0) X I CD 255 256 257 CO cu >. CO CO CD r— 00,-cos cn: oil OJ o 9 CM COI CD | pj T J - coi CD. 
CD' O O O coi s CD j CD dl d COj 00 CD CD d; d CD] CD CD, CD Oi o o o:o i o O ' O . o t o o o oi o; O iO OjO o o O K O H O S3 0 : 0 O O O Oi O O ' O , O i O o o r o 1 o o o o o o o oi o o; o oi o oi o o; o o o, o t o o(-o o o o o o o CO cu >. o CO 9 9 9 COi CO CO, i H [•dl o i o l o^o' CO I CO f-JcO| O} CMS CD CD; CD d, d. d " I CD 00 CO 001 CO oj d col o>! Q O O O Q ; o o o o o o o ol o d d di o; o i o l o © dl o i o o: o o! O I O O ' O o o o o o o T - CM o o CM CO o o o o o o o o o o LOC-- CD, CM O O O 0|"-r! d i cu CD C CO cu cu o CMCOTfLOCDts-COCD CM CO Tt ID CD". t*~' 00 Tt'O) CM I CMl d'.d. LO O! CO! CN! CO co Tt; Tt|iq| LO d d d 1 di d CD O T— CN CO Tf LO i - l CN l CM CN CN' CN I CM CD CN 258 259 260 O UO CO CD .O • o o ; o CO' COJ CO I CO, CO I CD* CO I CO' CO j CO' CO [ CO d} o< d p : d , d ! • i- i 1,1 i . t , oo oo oo co oo i oo: CO i COi CO CO: CO ] CO' d! d ; d 'd- d i d ! CO c g C O O CO CO co co. co, co, d i d co co co- co d ' d co col CO CO; d i d ' CO; CO CO' CO d d I I 00 CO CO CO d d co co CO CO d o CO CO >> COj CO' d i d ; i i ? CO' CO cofco' d ' ©) COlCO CD' CO' d>d • i i CO > CO' co I co' 2J 2J LOJLO, CO) OT? d|'dj co, co; co p: d o 1 • co! co' COj C0| 2- Si LO'IT)! CO i COi d o ! OO CO d o I T -tf; -tf |v- ,|v. CO CO CO CO d i d • i CO CO CO,OT 2- 2 LO: LO co co d , d CO CO 00j 00 d -tf J -tf d ' d o- (v. |v- iv. Of.Oi o Ol O |v-i rv. IV- |V. o OO- 00 d i d i !" i •tf |-tf d,'d Iv-'-i-v, |V.J|V. d f d CO CO CO .•^ c CO k_ l_ Z> o N-J 00 00 CO rfio" 261 CM CN CM CM CM CN CN 262 263 264 to CO LO CO o o 0 0 , 0 : 0 O O O O L O I O • i M O f O ] O O ^ O 1 o ol o O I O ] o t o 0 0 0 0 0 0 O " O O OJ o 0 0 0 0 p o p rt rt rt CO CD > , o CO Si O ' O 0 ^ 0 0 1 o o 0 0 0 1 1 \ 1 ~ 1 1 • J 1 * 1 0 0 0 o ! o o o o o O O ' O o o o i o o o o; o'o- oio| o O J O , O O O i O T— i rt' rt I rtj rt 0 o ' o 0 1 o o CO CD >> LO CM o o o ' o o i o ' O ' 0 o p j p , p i p pj o o o o o . i o o p p p i rt—• ^ ^ in I<t« . -II O O O O I O o 0 1 0 - 0 0 o o rt'i. 
o o o o O l S l o o O O j O O O i r O O - O o l o ) O o rt, rt J-^i o [ o ! 0 0 O f O o o o LO col .0. co .O. • o CO CD > . o CM CO c g rt T 3 C o to CD CO CO c CO 1_ 1_ ZJ o CO OO CO- C O , COi CO I s . (v. , (v.! | v . 1 | s . i s . d d o d l d ' d is., co \ co o rti CM COI COi CO CO C0| CO CO, CO O l o tf LO CD ' CO 265 266 267 268 ro CO >> o CO O! O 0 ! O i O o o', Ol O 0> OI O O, O o i o d o Oi O o o o o 0, O O; o o ! o o i 5 O LO CO JO CO JO o CO c o s c o o CO CO >. LO CN Oi o o' o , Oi o o, o i O, O! 0 | O o o i o , o o o 0 . 0 : 0 O O s o 01 o o o O O ;0 O ! 0 to Ol O O' O Oj o o o p p Oj O pj 01 p o o ' o CO CO >, o CM ,Oj O ' O OtOl o o p p; p p p p p CO CO o CO CO CT), oj Oj'dS 'dj oC'ol ,o | OI'Oj c o c o CO CO d i d 0 0 0 0 CNfCN" CN; CO \ 00 00 d ,dj d 1 1 1 1 •tfttfi -<t |N-riN-' r-co co co co 0 . 0 Pip LO?LO cn: co d d CO CO d d d l d i d , d CN; CN 00 = 00 d 1 d • 1 tf.'-* d d COi 00 ! CO-IN., IN. |N. 0 0 0 00 00: 00 00 |N |N. IN. |N. I o o o d 00 00 |N- J N . dS-'o CO CO CO c CO I I Z5 o a IN.| co I coi o 00] 00 j cot CO T- CM; CO COi CO ICO; tf!LO co^ co 269 270 Wj LO LO, CD* CO CO. 
APPENDIX IV

CONTRIBUTING CENTERS (WORLDWIDE COHORT)

Participating Centers and Number of Patients (number of presymptomatic at-risk, number of affected patients tested)

Austria
Harald Aschauer, Department of General Psychiatry, University Hospital for Psychiatry, Vienna (1,8)

Belgium
Eric Legius, Center for Human Genetics, Leuven (2,81)
Verellen Lannoy, Centre de Genetique Humaine et Unite de Genetique Medicale, Bruxelles (24,80)

Canada
Tillie Chiu, Children's Hospital of Eastern Ontario, Ottawa (14,7)
Cathy Gillies, Janice Schween, Thunder Bay District Health Unit, Thunder Bay (5,2)
Heather Hogg, Jill Beis, Christie Riddel, Medical Genetics, IWK Grace Health Centre, Halifax (0,36)
Odell Loubser, Ryan Brinkman, Elisabeth Almqvist, Susan Creighton, Michael Hayden, Department of Medical Genetics, University of British Columbia, Vancouver (294,668)
Wendy Meschino, Department of Genetics, North York General Hospital, North York (50,31)
David Rosenblatt, Maria Galvez, Division of Medical Genetics, Department of Medicine, McGill University, Montreal (25,26)
Anaar Sajoo, Sandra Farrell, The Credit Valley Hospital, Mississauga (3,4)

Germany
Elke Holinski-Feder, Martin Daumer, Michael Scholz, Department of Medical Genetics, University of Munich, Munich (0,52)

Italy
Paola Mandich, Emilio Di Maria, Department of Neurological Sciences and Vision, University of Genova, Genova, and Andrea Novelletto, Department of Cell Biology, University of Calabria, Rende (40,184)

Japan
Ichiro Kanazawa, Jun Goto, Department of Neurology, University of Tokyo Hospital, Tokyo (35,182)

South Africa
Jacquie Greenberg, Alison September, Department of Human Genetics, University of Cape Town Medical School, Cape Town (3,46)
Amanda Krause, Department of Human Genetics, South African Institute for Medical Research and University of the Witwatersrand, Johannesburg (3,5)

Sweden
Gabrielle Ahlberg, Center of Molecular Medicine, Karolinska Hospital, Stockholm (7,27)
Ingela Landberg, Ulf Kristoffersson, Department of Clinical Genetics, University Hospital, Lund (2,25)

United States of America
Bonnie Baty, Department of Pediatrics, University of Utah Health Sciences Center, Salt Lake City (4,3)
Robin Bennett, Thomas Bird, Department of Medical Genetics, University of Washington, Seattle (32,143)
Laurie Carr, Susan Perlman, Department of Neurology, University of California, Los Angeles (0,42)
Kimberly Quaid, Indiana University/Purdue University, Indianapolis (22,8)
Kathleen Delp, Spectrum Health Genetics, Grand Rapids (4,3)
Mahala Earnhart, Brad Hiner, Movement Disorder Center, Marshfield Clinic, Marshfield (4,7)
Carolyn Gray, Richard M. Dubinsky, Department of Neurology, University of Kansas Medical Center, Kansas City (6,9)
Madaline Harrison, Department of Neurology, University of Virginia Health System, Charlottesville (7,39)
Don Higgins, Departments of Neurology and Neuroscience, Ohio State University, Columbus (13,49)
Danna Jennings, Yale, New Haven (18,14)
John Johnson, Linda Beischel, Shodair Hospital, Helena (10,15)
Karen Kovak, Oregon Health Sciences University, Portland (0,2)
Katie Leonard, Baylor College of Medicine, Houston (17,4)
Richard H. Myers, Nat Couropmitree, Beth Knowlton, Boston University School of Medicine, Boston (32,226)
Martha Nance, Park Nicollet Clinic, St. Louis Park (12,198)
Mark E. Nunes, United States Air Force Medical Genetics Center, Keesler Air Force Base, Mississippi (6,8)
Jane Paulsen, Beth Turner, Departments of Psychiatry and Neurology, University of Iowa, Iowa City (9,2)
Guerry Peavy, Jody Corey-Bloom, Mark Jacobson, University of California, San Diego (9,54)
Adam Rosenblatt, Christopher Ross, Johns Hopkins University School of Medicine, Baltimore (2,158)
Kathleen Shannon, Rush Presb/St. Lukes Medical Center, Chicago (9,29)
Elaine Spector, University of Colorado Health Sciences Center DNA Diagnostic Laboratory, Maureen Leehey, University of Colorado School of Medicine, Lauren Seeberger, Colorado Neurologic Institute, Denver (14,23)
Carrie Stoltzfus, David R. Witt, Elaine Louie, Genetics Department, Kaiser Permanente, Northern California, San Jose (29,38)
Andrea Zanko, Division of Medical Genetics, University of California, San Francisco (51,96)
