@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix dc: . @prefix skos: . vivo:departmentOrSchool "Medicine, Faculty of"@en, "Population and Public Health (SPPH), School of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Marra, Carlo Armando"@en ; dcterms:issued "2009-11-30T21:39:44Z"@en, "2004"@en ; vivo:relatedDegree "Doctor of Philosophy - PhD"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description "Objectives: The primary objectives of this study were to: 1) compare the properties of commonly utilized indirect utility assessment instruments (the Health Utilities Index Mark 2 and 3 [HUI2 and HUI3], the EuroQol [EQ-5D], and the Short Form 6-D [SF-6D]) in terms of feasibility, reliability, construct validity and longitudinal construct validity (responsiveness) in rheumatoid arthritis (RA); and 2) determine if, when utilized to act as quality weights in the estimation of quality adjusted life years (QALYs) in an economic evaluation, the application of scores from the different instruments would result in different incremental cost per QALY ratios. The primary hypotheses of this study were that there would be differences between these instruments in terms of their properties and that using their scores to estimate QALYs in an economic evaluation of an intervention for RA would result in significantly different estimates. Methods: Three hundred and twenty patients between 19 and 90 years of age diagnosed with RA residing in the Greater Vancouver Regional District or rural Okanagan region of British Columbia were recruited. Patients were administered a questionnaire containing the HUI2, HUI3, EQ-5D, SF-6D, a disease-specific instrument (the Rheumatoid Arthritis Quality of Life [RAQoL] questionnaire), and a disability index (the Health Assessment Questionnaire [HAQ]). 
In addition, questions were asked regarding RA management (including drug use and toxicity), RA severity (including swollen and tender joints, pain visual analogue scale, RA duration, patient global assessment of disease activity VAS, and self-perceived RA severity and control), socio-economic status, and RA health utilization. Questionnaires were administered at baseline and at three and six months thereafter. In a subset of patients, an additional questionnaire was administered within five weeks of the three-month questionnaire to determine reliability. Results: Scores obtained with the HUI2, HUI3, EQ-5D, and the SF-6D were significantly different, had low agreement, and appeared to be measuring mostly physical function and pain. All the instruments displayed cross-sectional construct validity and were able to discriminate between different levels of severity of RA. However, when their scores were used to estimate QALYs in an economic evaluation of RA, there was a twofold difference between the lowest (using the HUI3) and highest (using the SF-6D) incremental cost per QALY ratios. Further examination revealed that the scores achieved with the indirect utility assessment instruments were influenced by annual household income despite adjustment for RA severity and other chronic diseases. Finally, in longitudinal analyses, the disease-specific RAQoL displayed the highest reliability and sensitivity to change, with the HUI3 and SF-6D scores being the most responsive of the indirect utility assessment instruments in measuring positive change. Conclusions: Although all indirect utility assessment measures appear to be able to assess generic HRQL in RA, when used as quality weights to estimate QALYs in an economic evaluation, they yielded vastly different estimates of the incremental cost-effectiveness ratio that could result in different policy recommendations. 
The scores of these instruments could also be influenced by income leading to possible bias in cost-effectiveness analyses. The HUI3 and SF-6D were responsive to positive changes in RA. The RAQoL displayed excellent properties and is a suitable disease-specific HRQL instrument for RA."@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/15983?expand=metadata"@en ; dcterms:extent "17188556 bytes"@en ; dc:format "application/pdf"@en ; skos:note "OUTCOME MEASURES IN ECONOMIC EVALUATIONS OF RHEUMATOID ARTHRITIS
by CARLO ARMANDO MARRA
B.Sc. (Pharm.), University of British Columbia, 1992
Pharm.D., University of British Columbia, 1995
A THESIS SUBMITTED IN THE PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES
Department of Health Care and Epidemiology
We accept this thesis as conforming to the required standard
Aslam H. Anis
John M. Esdaile
Jacek Kopec
Anthony E. Boardman
THE UNIVERSITY OF BRITISH COLUMBIA
May 06, 2004
© Carlo Armando Marra, 2004

The work presented in this thesis was conceived, conducted and disseminated by the doctoral candidate. The co-authors of the manuscripts that comprise part of this dissertation made contributions only as is commensurate with a dissertation committee or as experts in a specific area as it pertains to the work. The co-authors provided direction and support. The co-authors reviewed each manuscript prior to submission for publication and offered critical evaluations; however, the candidate was responsible for the writing and the final content of these manuscripts.
Aslam H.
Anis, Ph.D., Chair, Supervisory Committee

ABSTRACT

Objectives: The primary objectives of this study were to: 1) compare the properties of commonly utilized indirect utility assessment instruments (the Health Utilities Index Mark 2 and 3 [HUI2 and HUI3], the EuroQol [EQ-5D], and the Short Form 6-D [SF-6D]) in terms of feasibility, reliability, construct validity and longitudinal construct validity (responsiveness) in rheumatoid arthritis (RA); and 2) determine if, when utilized to act as quality weights in the estimation of quality adjusted life years (QALYs) in an economic evaluation, the application of scores from the different instruments would result in different incremental cost per QALY ratios. The primary hypotheses of this study were that there would be differences between these instruments in terms of their properties and that using their scores to estimate QALYs in an economic evaluation of an intervention for RA would result in significantly different estimates.

Methods: Three hundred and twenty patients between 19 and 90 years of age diagnosed with RA residing in the Greater Vancouver Regional District or rural Okanagan region of British Columbia were recruited. Patients were administered a questionnaire containing the HUI2, HUI3, EQ-5D, SF-6D, a disease-specific instrument (the Rheumatoid Arthritis Quality of Life [RAQoL] questionnaire), and a disability index (the Health Assessment Questionnaire [HAQ]). In addition, questions were asked regarding RA management (including drug use and toxicity), RA severity (including swollen and tender joints, pain visual analogue scale, RA duration, patient global assessment of disease activity VAS, and self-perceived RA severity and control), socio-economic status, and RA health utilization. Questionnaires were administered at baseline and at three and six months thereafter.
In a subset of patients, an additional questionnaire was administered within five weeks of the three-month questionnaire to determine reliability.

Results: Scores obtained with the HUI2, HUI3, EQ-5D, and the SF-6D were significantly different, had low agreement, and appeared to be measuring mostly physical function and pain. All the instruments displayed cross-sectional construct validity and were able to discriminate between different levels of severity of RA. However, when their scores were used to estimate QALYs in an economic evaluation of RA, there was a twofold difference between the lowest (using the HUI3) and highest (using the SF-6D) incremental cost per QALY ratios. Further examination revealed that the scores achieved with the indirect utility assessment instruments were influenced by annual household income despite adjustment for RA severity and other chronic diseases. Finally, in longitudinal analyses, the disease-specific RAQoL displayed the highest reliability and sensitivity to change, with the HUI3 and SF-6D scores being the most responsive of the indirect utility assessment instruments in measuring positive change.

Conclusions: Although all indirect utility assessment measures appear to be able to assess generic HRQL in RA, when used as quality weights to estimate QALYs in an economic evaluation, they yielded vastly different estimates of the incremental cost-effectiveness ratio that could result in different policy recommendations. The scores of these instruments could also be influenced by income leading to possible bias in cost-effectiveness analyses. The HUI3 and SF-6D were responsive to positive changes in RA. The RAQoL displayed excellent properties and is a suitable disease-specific HRQL instrument for RA.
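The abstract's central point, that the instrument supplying the quality weights can change the incremental cost per QALY, can be illustrated with a short sketch. The incremental cost and the utility gains below are hypothetical placeholders, chosen only so that the lowest and highest ratios differ twofold in the direction the abstract reports; they are not results from this thesis, and the function name `icer` is an assumption made for illustration.

```python
# Minimal sketch: how the utility instrument used as the quality weight
# changes the incremental cost per QALY. All numbers are hypothetical.

def icer(delta_cost, delta_qaly):
    '''Incremental cost-effectiveness ratio: extra cost per extra QALY.'''
    if delta_qaly <= 0:
        raise ValueError('incremental QALYs must be positive for a ratio')
    return delta_cost / delta_qaly

# Hypothetical incremental cost of a new therapy versus its comparator.
incremental_cost = 30000.0

# Hypothetical mean utility gain sustained for one year, by instrument.
utility_gain = {'HUI3': 0.20, 'EQ-5D': 0.15, 'SF-6D': 0.10}
years = 1.0

# QALYs gained = utility gain x duration; the same cost is divided by each.
ratios = {name: icer(incremental_cost, gain * years)
          for name, gain in utility_gain.items()}

for name, ratio in ratios.items():
    print(name, round(ratio))
```

Because the same incremental cost is divided by a smaller QALY gain, an instrument that registers a smaller utility improvement produces a larger cost per QALY, which is how identical trial data could lead to different policy recommendations.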
TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
  1.1 RHEUMATOID ARTHRITIS: EPIDEMIOLOGY, ECONOMIC BURDEN AND ECONOMIC EVALUATION
  1.2 RESEARCH NEEDS AND STUDY JUSTIFICATION
  1.3 STUDY HYPOTHESIS, OBJECTIVES, AND THESIS ORGANIZATION
  1.4 SUMMARY
  1.5 REFERENCES
CHAPTER 2: BACKGROUND
  2.1 COST-EFFECTIVENESS ANALYSIS AND THE QALY
  2.2 PREFERENCE-BASED, INDIRECT UTILITY ASSESSMENT MEASURES
    2.2.1 The Health Utilities Index (HUI) Mark 2 and 3
    2.2.2 The EuroQol (EQ-5D)
    2.2.3 The Short Form 6D (SF-6D)
  2.3 EMPIRIC COMPARISONS BETWEEN THE INDIRECT UTILITY ASSESSMENT INSTRUMENTS
    2.3.1 Comparisons between the Health Utilities Index Mark 2 and Mark 3
    2.3.2 Comparisons across Indirect Utility Assessment Instruments Outside of Musculoskeletal Diseases
    2.3.3 Comparisons across Indirect Utility Assessment Instruments within Musculoskeletal Diseases
  2.4 QUALITY WEIGHTINGS IN THE ESTIMATION OF QALYS IN COST-UTILITY ANALYSES IN RA: WHAT ARE INVESTIGATORS USING?
  2.5 SUMMARY
  2.6 REFERENCES
CHAPTER 3: A COMPARISON OF FOUR INDIRECT METHODS OF ASSESSING UTILITY VALUES IN RHEUMATOID ARTHRITIS
  3.1 FOREWORD
  3.2 INTRODUCTION
  3.3 METHODS
    3.3.1 Measures
    3.3.2 Data Analysis
  3.4 RESULTS
    3.4.1 Comparison of Utility Scores
    3.4.2 Analysis of Agreement
    3.4.3 Exploratory Factor Analysis
  3.5 DISCUSSION
  3.6 REFERENCES
CHAPTER 4: COMPARISON OF GENERIC, INDIRECT UTILITY MEASURES (THE HUI2, HUI3, SF-6D, AND THE EQ-5D) AND DISEASE-SPECIFIC INSTRUMENTS (THE RAQOL AND THE HAQ) IN RHEUMATOID ARTHRITIS
  4.1 FOREWORD
  4.2 INTRODUCTION
  4.3 METHODS
    4.3.1 Sample
    4.3.2 Measures
    4.3.3 Data Analysis
  4.4 RESULTS
    4.4.1 Sample
    4.4.2 Description of Global and Single-Attribute Utilities
    4.4.3 Construct Validity
  4.5 DISCUSSION
  4.6 REFERENCES
CHAPTER 5: NOT ALL QALYS ARE EQUAL: THE IMPACT OF USING DIFFERENT INDIRECT UTILITY MEASURES ON ESTIMATING THE COST-UTILITY OF INFLIXIMAB IN RHEUMATOID ARTHRITIS
  5.1 FOREWORD
  5.2 INTRODUCTION
  5.3 METHODS
    5.3.1 Clinical Trial Data Source
    5.3.2 Overview of Model
    5.3.3 Transition Probability Matrices and Statistical Modeling
    5.3.4 Mortality Rate
    5.3.5 Utilities and QALYs
    5.3.6 Cost Estimation
    5.3.7 Survival Analysis
    5.3.8 Cost-Utility and Probabilistic Analysis
    5.3.9 Univariate Sensitivity Analysis
  5.4 RESULTS
    5.4.1 Simulation Results
    5.4.2 Utility and QALY Values
    5.4.3 Cost-Utility and Probabilistic Analysis
    5.4.4 Traditional Sensitivity Analysis
  5.5 DISCUSSION
  5.6 REFERENCES
CHAPTER 6: THE IMPACT OF LOW FAMILY INCOME ON SELF-REPORTED HEALTH OUTCOMES IN PATIENTS WITH RHEUMATOID ARTHRITIS WITHIN A PUBLICLY-FUNDED HEALTH CARE ENVIRONMENT
  6.1 FOREWORD
  6.2 INTRODUCTION
  6.3 METHODS
    6.3.1 Study Sample and Design
    6.3.2 Generic Health-Related Quality of Life Measurement
    6.3.3 Functional Status Measurement
    6.3.4 RA Specific Quality of Life Measure
    6.3.5 Clinical Measurements
    6.3.6 Socioeconomic Status
    6.3.7 Statistical Analysis
  6.4 RESULTS
  6.5 DISCUSSION
  6.6 REFERENCES
CHAPTER 7: ARE INDIRECT UTILITY MEASURES RELIABLE AND RESPONSIVE IN RHEUMATOID ARTHRITIS PATIENTS?
  7.1 FOREWORD
  7.2 INTRODUCTION
  7.3 METHODS
    7.3.1 Study Sample
    7.3.2 Measures
    7.3.3 Data Analysis
  7.4 RESULTS
    7.4.1 Demographics and Missing Values
    7.4.2 Reliability
    7.4.3 Validity of the Transition Questionnaire
    7.4.4 Responsiveness
    7.4.5 Flexible Polytomous Regression Techniques
    7.4.6 Change in Unweighted Domain Scores (EQ-5D, SF-6D) and Single Attribute Utilities (HUI2, HUI3)
  7.5 DISCUSSION
  7.6 REFERENCES
CHAPTER 8: GENERAL DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS
  8.1 SUMMARY OF STUDY FINDINGS
  8.2 UNIQUE CONTRIBUTIONS, IMPACT, AND IMPLICATIONS
  8.3 STUDY STRENGTHS AND LIMITATIONS
    8.3.1 Strengths
    8.3.2 Limitations
  8.4 RECOMMENDATIONS
    8.4.1 Further Research
  8.5 CONCLUSIONS
  8.6 REFERENCES
APPENDIX I
APPENDIX II
APPENDIX III
APPENDIX IV
APPENDIX V
APPENDIX VI

LIST OF TABLES

TABLE 2.1: SOURCE OF PREFERENCES USED FOR QALY WEIGHTS IN ECONOMIC EVALUATIONS OF RA
TABLE 3.1: COMPARISON OF THE INDIRECT UTILITY ASSESSMENT INSTRUMENTS
TABLE 3.2: CLINICAL CHARACTERISTICS OF THE STUDY PARTICIPANTS
TABLE 3.3: OVERALL MEAN AND MEDIAN UTILITY SCORES FROM THE INSTRUMENTS IN THE SAMPLE OF RA PATIENTS
TABLE 3.4: INTRACLASS CORRELATIONS AND 95% CONFIDENCE INTERVALS BETWEEN INSTRUMENTS
TABLE 3.5: ROTATED FACTOR PATTERN MATRIX
TABLE 3.6: FACTOR CORRELATION MATRIX
TABLE 3.7: RELATIVE PRATT INDEX SCORES ASSESSING RELATIVE CONTRIBUTION OF EACH FACTOR TO THE MODEL'S ADJUSTED R2
TABLE 4.1: OVERVIEW OF MAUT INSTRUMENT PROPERTIES
TABLE 4.2: CHARACTERISTICS OF THE STUDY PARTICIPANTS
TABLE 4.3: MULTI-ATTRIBUTE AND SINGLE ATTRIBUTE UTILITY SCORES FROM THE MAUT INSTRUMENTS
TABLE 4.4: DOMAIN RESPONSES FOR THE MAUT INSTRUMENTS
TABLE 4.5: RELATIONSHIP BETWEEN RA SEVERITY AND CONTROL AND THE GLOBAL UTILITY SCORES FOR EACH OF THE MAUT INSTRUMENTS
TABLE 4.6: DICHOTOMOUS MEASURES OF RA SEVERITY
TABLE 4.7: CORRELATIONS (SPEARMAN'S RHO) FOR MULTI-ATTRIBUTE AND SELECT SINGLE ATTRIBUTE UTILITY SCORES WITH RA SEVERITY
TABLE 4.8: SIMPLE LINEAR REGRESSION ANALYSES FOR OVERALL INSTRUMENT SCORES AND HAQ
TABLE 5.1: OBSERVED TRANSITION PROBABILITY MATRICES FOR METHOTREXATE FROM THE ATTRACT TRIAL (FROM WEEK 30 TO WEEK 54)
TABLE 5.2: OBSERVED TRANSITION PROBABILITY MATRICES FOR INFLIXIMAB FROM THE ATTRACT TRIAL (FROM WEEK 30 TO WEEK 54)
TABLE 5.3: CALCULATED WEEKLY TRANSITION PROBABILITY MATRIX FOR METHOTREXATE
TABLE 5.4: CALCULATED WEEKLY TRANSITION PROBABILITY MATRIX FOR INFLIXIMAB
TABLE 5.5: UNIT COSTS (IN CANADIAN DOLLARS), OTHER PARAMETERS AND EQUATIONS IN THE MARKOV MODEL
TABLE 5.6: MULTIPLE LINEAR REGRESSION MODELS OF THE INDIRECT UTILITY MEASURES
TABLE 5.7: DISCOUNTED QALYS GENERATED BY INDIRECT UTILITY METHOD IN THE MARKOV MODEL
TABLE 5.8: EXPECTED COSTS AND INCREMENTAL COST-UTILITY RATIOS GENERATED BY THE INDIRECT UTILITY METHODS
TABLE 5.9: UNIVARIATE SENSITIVITY ANALYSIS - INCREMENTAL COST-UTILITY RATIO (INCREMENTAL COST PER QALY) BY INDIRECT UTILITY METHOD
TABLE 6.1: CHARACTERISTICS OF THE STUDY PARTICIPANTS (N = 313)
TABLE 6.2: PROPERTIES OF THE MEASURES OF SOCIOECONOMIC STATUS (SES) IN OUR SAMPLE
TABLE 6.3: UNIVARIATE ASSOCIATIONS WITH THE GENERIC HRQL MEASURES (THE SF-6D AND THE HUI3)
TABLE 6.4: UNIVARIATE ASSOCIATIONS WITH THE GENERIC HRQL MEASURES (THE HUI2 AND THE EQ-5D)
TABLE 6.5: UNIVARIATE ANALYSIS WITH THE DISEASE-SPECIFIC MEASURES (THE HAQ AND THE RAQoL)
TABLE 6.6: COMPARISON OF RA AND HEALTH STATUS MEASURES ACROSS DIFFERENT SOCIAL CLASSES
TABLE 7.2: TEST-RETEST RELIABILITY
TABLE 7.3: INTRACLASS CORRELATION COEFFICIENT VALUES FOR GENERIC AND DISEASE-SPECIFIC HRQL MEASURES FOR THOSE REPORTING NO CHANGE IN THEIR RHEUMATOID ARTHRITIS BETWEEN 0 AND 6 MONTHS
TABLE 7.4: MINIMALLY IMPORTANT DIFFERENCES REPORTED IN THE LITERATURE AND DERIVED FROM THE SAMPLE USING ANCHOR-BASED APPROACHES
TABLE 7.5: CORRELATIONS BETWEEN THE TRANSITION QUESTION AND CHANGES IN RHEUMATOID ARTHRITIS OUTCOME VARIABLES FROM 0 TO 6 MONTHS
TABLE 7.6: DIFFERENCES AND RESPONSIVENESS STATISTICS FROM BASELINE TO 6 MONTHS STRATIFYING THE SAMPLE BY THE TRANSITION QUESTION
TABLE 7.7: DIFFERENCES AND RESPONSIVENESS STATISTICS FROM BASELINE TO 6 MONTHS STRATIFYING THE CATEGORIES CREATED FROM PATIENT GLOBAL ASSESSMENT OF DISEASE SEVERITY VAS
TABLE 7.8: RANKINGS OF RESPONSIVENESS OF MEASURES ACCORDING TO THE RESPONSIVENESS STATISTIC AND THE EXTERNAL CRITERIA OF CHANGE (EITHER RESPONSES TO THE PATIENT TRANSITION QUESTION OR TO THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY VAS)
TABLE 7.9: ASSOCIATIONS BETWEEN INSTRUMENT UNWEIGHTED DOMAINS / SINGLE ATTRIBUTE SCORE CHANGES AND SELF-REPORTED CHANGE FROM 0 TO 6 MONTHS

LIST OF FIGURES

FIGURE 3.1: DISTRIBUTIONS OF GLOBAL UTILITY VALUES ACROSS THE MAUT INSTRUMENTS
FIGURE 3.2: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND HUI3 VS. THE AVERAGE SCORE WITHIN PATIENTS
FIGURE 3.3: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI3 AND THE SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS
FIGURE 3.4: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI3 AND EQ-5D VS. THE AVERAGE SCORE OF THESE TWO INSTRUMENTS WITHIN PATIENTS
FIGURE 3.5: BLAND-ALTMAN PLOT OF THE DIFFERENCE BETWEEN THE EQ-5D AND SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS
FIGURE 3.6: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND THE SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS
FIGURE 3.7: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND EQ-5D VS. THE AVERAGE SCORE OF THESE TWO INSTRUMENTS WITHIN PATIENTS
FIGURE 4.1: BOX PLOT OF MAUT INSTRUMENT GLOBAL UTILITY SCORES
FIGURE 5.1: A SCHEMATIC REPRESENTATION OF THE MARKOV, HAQ-BASED MODEL USED FOR THE COST-EFFECTIVENESS ANALYSIS
FIGURE 5.2: KAPLAN-MEIER SURVIVAL CURVES FROM THE 100,000 MONTE CARLO SIMULATIONS
FIGURE 5.3: INCREMENTAL COSTS AND QALYS FROM 1000 2ND ORDER MONTE CARLO SIMULATIONS
FIGURE 5.4: COST-UTILITY ACCEPTABILITY CURVES FOR EACH INDIRECT UTILITY MEASURE
FIGURE 6.1: GENERIC HRQL BY SELF-REPORTED ANNUAL INCOME (HUI3 AND SF-6D)
FIGURE 6.2: GENERIC HRQL BY SELF-REPORTED ANNUAL INCOME (HUI2 AND EQ-5D)
FIGURE 6.3: RAQOL SCORE AND HAQ DISABILITY INDEX BY SELF-REPORTED INCOME
FIGURE 7.1: AGREEMENT BETWEEN THE PATIENT TRANSITION QUESTION AND CHANGES USING MID CUTOFFS FOR THE GENERIC AND DISEASE-SPECIFIC INSTRUMENTS
FIGURE 7.2: SCATTERPLOT OF HUI2 UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.3: SCATTERPLOT OF HUI3 UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.4: SCATTERPLOT OF EQ-5D UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.5: SCATTERPLOT OF SF-6D UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.6: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI2 AND THE TRANSITION QUESTION
FIGURE 7.7: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI3 AND THE TRANSITION QUESTION
FIGURE 7.8: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE EQ-5D AND THE TRANSITION QUESTION
FIGURE 7.9: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE SF-6D AND THE TRANSITION QUESTION
FIGURE 7.10: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE RAQOL AND THE TRANSITION QUESTION
FIGURE 7.11: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HAQ AND THE TRANSITION QUESTION
FIGURE 7.12: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI2 AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.13: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI3 AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.14: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE EQ-5D AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.15: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE SF-6D AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.16: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE RAQOL AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.17: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HAQ AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY

ACKNOWLEDGEMENTS

A large debt of gratitude is owed to my co-supervisors, Aslam Anis and John Esdaile, for providing me with invaluable mentorship and support. From the outset, they provided sound advice, amazing opportunities and lots of good ideas. I would have been lost without them. Most sincere thanks to my committee members: Stephen Marion, who shared so unselfishly of his time imparting a small fraction of his immense knowledge to me, and Jacek Kopec, for mentorship and counsel that has benefited me greatly and improved the quality of my work. The support of dedicated research assistants made this work possible: Barbara Vinduska, Amir Adel Rashidi, and Janet Pursed. Also, many thanks to the rheumatologists who facilitated recruitment: Robert Offer, Andrew Chalmers, Kamran Shojania, Barry Koehler, Graham Reid, Dan Macleod, Alice Klinkhoff, John Kelsall, Milton Baker, and Diane Lacaille.
Best wishes and heartfelt thanks to all of the study participants. Special thanks to all of the people who shared their time and knowledge with me along the way, especially Daphne Guh, Chris Richardson, and John Woolcott. Also, Larry Lynd, who forged the path ahead of me, motivated me to embark on this career path, and left very large footprints that were difficult to fill. Most of all, I want to thank my amazing family, especially my wife Fawziah, for pushing me to follow my goals and for providing me with unyielding support and advice from the beginning that gave me the necessary strength and motivation to begin and complete this work. Without her many compromises, dedication and commitment, none of this would have been possible. To her, I dedicate this thesis. Finally, Yasmin and Noah, thank you both for the constant reminders about the things that are truly important in life and for keeping me grounded in reality. This research was generously supported by grants from the Canadian Arthritis Network. Thanks to the Canadian Institutes of Health Research, the Arthritis Society and the Michael Smith Foundation for Health Research for their fellowship support.

CHAPTER 1 INTRODUCTION

1.1 RHEUMATOID ARTHRITIS: EPIDEMIOLOGY, ECONOMIC BURDEN AND ECONOMIC EVALUATION

Rheumatoid arthritis (RA) is a chronic, progressive, inflammatory disease that afflicts approximately 300,000 Canadians.[1,2] The cause of this condition is unknown and there is no cure. This disease affects the physical functioning of patients as well as their psychological and social health [4] and eventually progresses to substantial disability through the loss of mobility, increased morbidity, and premature mortality.[5-9] The incidence of death from cardiovascular disease [10], infection [11] and cancer [12] is significantly higher than that experienced in the general population. RA can occur at any age but its onset peaks between the ages of 40 and 60 years.
[14] The prevalence of RA is approximately 0.5 to 1% of the adult population and the incidence appears to have decreased over the past 4 decades (from 61.2 per 100,000 in 1955 to 1964 to 32.7 per 100,000 in 1985 to 1994).[2,15] However, this finding is from one population-based, longitudinal study in a specific geographic area (Rochester, Minnesota) and may not be generalizable to areas with different ethnicity (this sample was 96% white) or environmental exposures. Other prevalence data are difficult to find and are often not population-based. Using administrative databases in British Columbia, a population-based estimate of 27,710 RA cases (mean (SD) age of 64.1 (17), 67% women) was identified, translating into a prevalence rate of 0.76%.[16]

Although the incidence of RA may be decreasing, results from epidemiological evaluations show that the premature mortality associated with this condition has not changed over the last several decades despite the development of more effective interventions.[2,17,18] However, a recent analysis has determined that the use of methotrexate (MTX) is associated with a substantial survival benefit in patients with RA, despite these patients having had worse prognostic factors for mortality prior to being treated with this agent.[9] After adjusting for confounding by indication (specifically, in this case, patients with more severe disease having a higher probability of being prescribed MTX), this beneficial effect on mortality was demonstrated in comparisons with patients who were taking other disease modifying antirheumatic drugs (DMARDs) or patients taking no DMARDs (the adjusted hazard ratio of MTX compared to no DMARDs was 0.2, 95% confidence interval (CI) 0.1 to 0.7). The mortality hazard ratio for comparisons between those using MTX and those with no MTX use (i.e.
other DMARDs) was 0.4 (95% CI 0.2 to 0.8) for all-cause mortality, 0.3 (95% CI 0.2 to 0.7) for cardiovascular mortality, and 0.6 (95% CI 0.2 to 1.2) for non-cardiovascular mortality. Thus, from the results of this analysis, it would appear that the application of an effective treatment such as MTX is associated with survival benefits.

In another study examining survival in RA,[7] Wolfe et al. demonstrated that the strongest predictor of mortality in this disease group was longitudinal changes in the Health Assessment Questionnaire (HAQ). A one standard deviation increase in the HAQ (a higher HAQ represents more severe disease) resulted in a 26.2% greater increase in the odds ratio for mortality compared to the next most powerful predictor of mortality, the patient-completed global severity index. Interestingly, a change from the fourth quartile to the first quartile on these measures would yield an estimated reduction in mortality of 50% for the HAQ and 33% for the patient-completed global severity index. Thus, since the HAQ and other self-reported measures are the most highly predictive of future mortality, lowering the HAQ with effective drug therapy (as shown by Wolfe et al. [9]) or improving self-rated health should result in improved survival in RA. There are many other examples in the literature of the effect of drug therapy on the HAQ in patients with RA; however, these patients have not been followed for a long enough period of time to determine if this reduction in HAQ translates into a reduction in mortality.[19-23] Thus, one can only postulate whether the reduction in HAQ due to drug treatment (other than MTX) observed in these trials results in a lower mortality rate.

The severity of RA has been shown to be significantly related to a reduction in health-related quality of life (HRQL) [24,25] and HRQL in RA has been shown to be worse than in other forms of arthritis.
[26] The finding that lower HRQL is associated with higher RA severity has been shown using both disease-specific measures, such as the Rheumatoid Arthritis Quality of Life Questionnaire (RAQoL) [27,28] and the Arthritis Impact Measurement Scales (AIMS),[29,30] and, to a lesser extent, preference-based measures of HRQL such as the EuroQol (EQ-5D).[31]

As with the humanistic burden that RA exerts in terms of mortality and reduction in HRQL, the economic burden of RA to society is substantial and is thought to rival that of coronary artery disease.[32] Since RA usually starts in the 4th or 5th decade of life [33] and the disease cannot be cured, the costs attributable to this condition are often compounded over several years. There are a number of studies in the literature which attempt to examine the direct and indirect costs associated with RA from a variety of perspectives. The findings of these analyses show a great degree of variability in the estimated costs, partly due to differences in the costing methodologies employed (some utilized charges instead of costs),[47] in which variables were included in the ascertainment of direct medical costs, and in methodological problems in the determination of cost-of-illness in RA.[48] Despite these limitations, each study concluded that the costs to manage RA were substantial and, when assessed, the indirect costs were a large portion of the overall costs to manage RA.

Several reviews examining studies that investigate the costs to manage RA have been published.[48-50] Most of the studies included in these reviews were conducted in an era when new, expensive drug therapies for RA, such as the tumour necrosis factor (TNF) alpha blockers and newer non-steroidal anti-inflammatory drugs, were not yet available. Specifically, in a review by Lubeck,[49] hospitalizations generally accounted for >60% whereas drug costs generally accounted for <25% of direct costs in RA.
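Returning briefly to the methotrexate mortality hazard ratios quoted earlier in this section, a small helper makes them easier to read by checking whether each 95% confidence interval excludes 1, the value that indicates no difference in hazard. This is a reading aid under the usual convention rather than a reanalysis; the function name `excludes_null` is an assumption, and the estimates are copied from the text.

```python
# Sketch: does a hazard ratio's 95% CI exclude 1 (no difference in hazard)?
# Estimates are those quoted in the text for MTX use versus no MTX use.

def excludes_null(lower, upper, null_value=1.0):
    '''True when the interval lies entirely on one side of the null value.'''
    return upper < null_value or lower > null_value

# outcome: (hazard ratio, lower 95% limit, upper 95% limit)
hazard_ratios = {
    'all-cause mortality': (0.4, 0.2, 0.8),
    'cardiovascular mortality': (0.3, 0.2, 0.7),
    'non-cardiovascular mortality': (0.6, 0.2, 1.2),
}

summary = {}
for outcome, (hr, lower, upper) in hazard_ratios.items():
    summary[outcome] = excludes_null(lower, upper)
    print(outcome, hr, (lower, upper), 'CI excludes 1:', summary[outcome])
```

Read this way, the all-cause and cardiovascular estimates are distinguishable from no effect, whereas the non-cardiovascular interval (0.2 to 1.2) spans 1.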
Pugner et al. [50] found that the mean annual direct cost to manage RA was $5,425 (1998 U.S. dollars) and that the median percentages of this total due to hospitalization and drugs were 47% and 16%, respectively. Few studies have attempted to quantify indirect costs but, in those that have,[36,39,40,42,43] estimates have ranged from $1,082 to $37,501 (1996 U.S. dollars) per patient. However, again in the determination of indirect costs, there was a lack of clarity in the studies on how the results were determined and/or important methodological issues that make comparisons across the studies difficult.

The only Canadian study that was published in this earlier era of RA drug treatment was by Clarke et al., who examined cohorts of individuals with rheumatoid arthritis in Saskatchewan.[42,43] In this longitudinal study of almost 1000 patients with RA, annual direct and indirect costs were determined to be $4,656 and $1,597, respectively (1994 Canadian dollars). The authors determined resource utilization and direct medical costs from assessing the number of physician visits, medications, diagnostic tests, and inpatient care. Inpatient care was associated with almost two-thirds of the direct medical costs. Indirect costs were assessed through productivity loss using the human capital approach. Since most of the patients were >60 years of age and no longer considered themselves to be in the work force, the authors did not include these individuals in the calculation of indirect costs, resulting in small estimates.

However, the results of these studies are no longer accurate because of the introduction and increasing use of biological drug therapy (infliximab, etanercept and adalimumab) as well as a new class of nonsteroidal anti-inflammatory drugs (the cyclo-oxygenase [COX] 2 specific inhibitors) [51,a] in the past few years, which have caused medication costs to manage RA to skyrocket.
[35] These biological agents are effective but are extremely expensive, with annual acquisition costs from $12,000 to over $20,000 per patient.

A recent analysis examining costs in the biological era was published by Michaud et al. In a sample of 7,527 patients with RA answering semi-annual questionnaires from January 1999 to December 2001, direct medical costs (calculated from physician and other health professional visits, radiologic examinations, laboratory and other tests, outpatient surgeries, hospitalizations and medications) were determined. In the entire sample, the mean direct cost was $9,519 (2001 U.S. dollars), of which 66% was due to drug costs and 16% and 17% were due to hospital costs and outpatient costs, respectively. In those receiving biological agents, the annual direct cost was $19,016 compared to $6,164 (2001 U.S. dollars) in those not receiving these agents.

[Footnote a: Reference 51 was authored by the candidate during the tenure of the doctoral program and has been inserted as Appendix I.]

The use of these agents may have had an impact on indirect costs as measured by productivity losses as well. In a study by Yelin et al.,[52] the association between the use of etanercept and employment outcomes was investigated in a sample of 497 RA patients. In order to ensure eligibility for employment, only patients between 18 and 64 were included in the study. In structured telephone interviews, patients were asked questions regarding their employment status in the year of diagnosis (75% in the etanercept group vs. 77% in the non-etanercept group) and in the study year (71% in the etanercept group vs. 55% in the non-etanercept group). After adjusting for demographics, overall health status, duration of RA, RA status, and occupation type, the difference increased to 20% (95% CI 9% to 32%; 53% vs. 73% employed in the non-etanercept and etanercept groups, respectively).
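As a rough consistency check on the Michaud et al. cost figures quoted above, the share of patients receiving biological agents can be backed out from the three reported means. This assumes the overall mean is a weighted average of the two subgroup means over the same sample, which is an assumption made here for illustration; the text does not report this share, and the variable names are hypothetical.

```python
# Sketch: back out the implied share of patients on biological agents
# from the mean annual direct costs quoted in the text (2001 U.S. dollars).
# Assumes overall mean = p * mean_biologic + (1 - p) * mean_other.

mean_overall = 9519.0
mean_biologic = 19016.0
mean_other = 6164.0

# Solve the weighted-average equation for p.
implied_share = (mean_overall - mean_other) / (mean_biologic - mean_other)

# How many times more costly the biologic-treated group was on average.
cost_ratio = mean_biologic / mean_other

# Approximate dollar components of the overall mean, using the quoted shares.
shares = {'drugs': 0.66, 'hospital': 0.16, 'outpatient': 0.17}
components = {k: round(mean_overall * v, 2) for k, v in shares.items()}

print(round(implied_share, 3), round(cost_ratio, 2), components)
```

Under that assumption the means are consistent with roughly a quarter of the sample receiving biological agents, and with direct costs about three times higher in that group.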
Thus, it would appear that, at least among those of working age, etanercept has the potential to reduce the indirect costs associated with RA.

With respect to the impact that these new biological agents have on HRQL and HAQ scores, a few studies have shown benefit. Using the Short Form 36 (SF-36), both infliximab and etanercept have been shown to improve HRQL (at least in the short term) over MTX in randomized controlled trials.[53,54] In a recently published observational study of patients either being treated with infliximab or with stable RA, the responsiveness of the SF-36, the EQ-5D, the standard gamble, and the Short Form 6-D (SF-6D) was evaluated.[55] In the group treated with infliximab, large responsiveness indices (effect sizes > 0.80) were observed for the SF-36 physical component score and for relevant domains of the SF-36 (bodily pain, physical functioning, role physical, social functioning and vitality). For preference-based measures, the SF-6D was highly responsive (effect size of 1.40), the EQ-5D was moderately responsive (effect size of 0.67) and the standard gamble was poorly responsive (effect size of 0.49). However, the sample size of this study was small (60 patients on infliximab and 24 patients with stable RA), so the results were not conclusive.

Although economic evaluations of pharmacotherapies are not new in RA, there has been an explosion of published cost-effectiveness analyses since the introduction of leflunomide (a new DMARD), new biological agents, pharmacogenetic technologies and COX-2 specific inhibitors.[59-70,b] Although several of these analyses have many shortcomings, a detailed discussion of these is beyond the scope of this chapter. A critical review[71] pertaining to the cost-effectiveness literature of biological agents will soon be available.
However, a couple of limitations of these analyses have direct relevance to this thesis, namely the use of randomized controlled trial data to estimate outcomes and the attempt (or lack thereof) to incorporate HRQL data into the outcome variables. Wolfe et al.[72] make a compelling case that the short-term efficacy data derived from randomized controlled trials are not suitable for extrapolation to long-term cost-effectiveness results and that observational drug-treatment databases should be utilized. This argument is based on evidence that treatment outcomes derived from observational databases can often differ from those derived from randomized controlled trials in RA.[73,c]

Footnote b: Of note, the candidate authored one of the cost-effectiveness analyses on the pharmacogenetics technologies (reference 66) during his tenure as a doctoral student and it has been included as Appendix II.
Footnote c: Of note, the candidate authored a paper (reference 73) during the tenure of his doctoral program that showed that the efficacy and toxicity of cyclosporin in an observational database were different from those reported in randomized controlled trials and this has been included as Appendix III.

The second major limitation of many of these cost-effectiveness analyses is the lack of an attempt to integrate preference-based generic HRQL measures into the economic evaluations, or the application of instruments or techniques that have not undergone appropriate testing in RA. For example, many of the analyses report either costs alone or cost-effectiveness ratios using naturalistic units (such as reduction in swollen joints or in proportions of patients improving using standard criteria).
[56,57,60,61,66,67,69] Other analyses utilized direct preference elicitation techniques such as a visual analogue scale (VAS), time trade-off (TTO) or standard gamble (SG), although the TTO and SG have been shown to be poorly responsive and poorly correlated with clinical outcomes in patients with rheumatoid arthritis.[55,74,75] In the recent cost-effectiveness analyses of biological agents,[63-65] the most commonly applied method to obtain preference scores was the EQ-5D, which has been shown to be both responsive and valid in rheumatoid arthritis.[31] Other instruments that are commonly utilized to derive preference-based scores in cost-utility analyses have not yet been applied in assessing the cost-effectiveness of treatments for rheumatoid arthritis. These instruments include the Health Utilities Index Mark 2 (HUI2) and Mark 3 (HUI3), and the SF-6D.[76] However, research in RA and other disease states suggests that scores obtained with these systems are not interchangeable and that the choice of system could have a profound impact on the estimation of incremental cost-effectiveness ratios.[77,78]

1.2 RESEARCH NEEDS AND STUDY JUSTIFICATION

The pursuit of appropriate and efficient use of health care resources has created a need for economic evaluations of new and existing treatments in order to inform decision-making. For diseases such as RA that are chronic and incurable with a documented impact on HRQL, the need to integrate HRQL data into treatment assessment is critical. Compounding this point is the evolving field of RA treatment, which has brought about several new, effective but very expensive agents in the past few years. Thus, through the cost-utility analysis framework, preference-based measures of HRQL can be used to inform resource allocation decisions in health care.
[79,80] This is done through the calculation of the quality adjusted life year (QALY), which is commonly used in the denominator of the incremental cost-utility ratio calculation.[80,81] As originally described by Weinstein et al.,[82] \"the quality adjusted life year approach assigns to each period of time a weight, ranging from 0 to 1, corresponding to the health-related quality of life during that period, where a weight of 1 corresponds to optimal health, and a weight of 0 corresponds to a health state judged to be equivalent to death\". Thus, QALYs relating to a health outcome are expressed as the value (weighting) given to a particular health state multiplied by the time spent in that state. The weightings used in the calculation of QALYs are derived from preferences for health states, which can be measured directly through the application of various methodologies such as the standard gamble (SG), time trade-off (TTO), person trade-off, and rating scales.[79,80] However, due to the expense and inconvenience associated with administering many of the direct approaches, generic, preference-based questionnaires have been developed which integrate health into a single index (where death is anchored at zero and perfect health at one). These questionnaires typically consist of a health classification system with an associated scoring formula that assigns preference-weighted values to the health states defined by the classification system and integrates the different aspects of health into a single index. The questionnaires that are most commonly utilized are the Health Utilities Index Mark 2 and 3 (HUI2 and HUI3), the EuroQol (EQ-5D) and the Short Form 6D (SF-6D).[79,80] However, despite their widespread use, there have been few comparative studies addressing their strengths, weaknesses and interchangeability.
[80] Empirical assessment of instruments designed to measure or value HRQL involves examining feasibility, reliability, validity and responsiveness.[83] Feasibility refers to the ability of the instrument to be used in practice and accepted by respondents.[81] Reliability refers to the stability of responses if the conditions under examination remain unchanged.[83] Validity is the extent to which an instrument measures the property it is intended to measure.[84] Much of the literature on the validity of HRQL measures focuses on the discriminative properties of instruments, which is also the approach commonly employed for preference-based instruments.[85] Another essential property of HRQL instruments is the ability to detect change over time and the extent to which this change is important or meaningful. Finally, it is not clear whether these instruments, which all purport to assess the same construct (namely, a single index score of HRQL), are interchangeable and whether, if used as the weights for QALYs in an economic evaluation of pharmacotherapy for RA, they would result in comparable outcomes and potentially similar policy decisions.

This study was conceived based on the need to compare the properties of the most commonly utilized indirect, preference-based measures in terms of their cross-sectional construct validity, differences in the aspects of health that they assess, potential biases in their scores in terms of the effects of income, and their sensitivity to change and responsiveness. Chapter 2 provides a detailed review of these preference-based, indirect utility instruments, studies that compare their properties, and the use of preference-based measures as QALY weights in economic evaluations of RA.
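As a hedged illustration of the QALY arithmetic described earlier in this section (a preference weight multiplied by the time spent in a health state, summed over states), consider a hypothetical patient who spends two years in a state weighted 0.7 followed by three years in a state weighted 0.5. These weights and durations are invented for illustration and are not results from this study; discounting is ignored.

```latex
\[
\text{QALYs} = (0.7 \times 2) + (0.5 \times 3) = 1.4 + 1.5 = 2.9
\]
```

If a second instrument assigned weights of 0.6 and 0.4 to the same two states, the same five years would yield (0.6 x 2) + (0.4 x 3) = 2.4 QALYs, which is the kind of between-instrument difference that this study set out to examine.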
1.3 STUDY HYPOTHESIS, OBJECTIVES, AND THESIS ORGANIZATION

The overall aim of this study was to evaluate and compare the different properties of the four indirect utility instruments (HUI2, HUI3, SF-6D, EQ-5D) and to assess whether using the scores generated by their different systems in the same cost-effectiveness framework would result in different outcomes. The primary hypothesis of this study was that quality adjusted life year (QALY) estimates obtained using these instruments would be different and would result in different incremental cost-utility ratios and, therefore, potentially different policy decisions.

The first objective of this study was to determine if, on a cross-sectional basis, the indirect utility instruments would yield similar utility values in patients with RA and, if not, whether the assessed domains of health were similar among the instruments.

The second objective was to determine if these indirect utility assessment instruments displayed cross-sectional construct validity in the assessment of patients with RA and how well they compared, in this regard, to the disease-specific RAQoL and to a disability status measure, the HAQ.

The third objective was to determine if the utilization of the different utility values generated by the indirect utility instruments in a cost-utility analysis of a new drug therapy (infliximab plus MTX) compared to usual therapy (MTX alone) for rheumatoid arthritis would result in different estimates of the incremental cost per QALY gained.

The fourth objective was to determine if the results generated by the indirect utility instruments are influenced by socioeconomic status and, if so, whether this could bias the results of cost-utility analyses.

The fifth objective was to determine the longitudinal validity of the instruments in rheumatoid arthritis in terms of their ability to be responsive to changes that patients experience in their RA.
This thesis comprises eight chapters, organized chronologically, that address each of the objectives in order. This first chapter provides a brief introduction to: 1) the epidemiology of RA; 2) the effect of RA on mortality; 3) direct and indirect costs of RA; 4) the rapidly evolving field of pharmacotherapy for RA; 5) the impact of new strategies on work productivity and HRQL; 6) published cost-effectiveness analyses in RA; and 7) the use of preference-based measure scores as weighting factors for QALYs in economic evaluations of RA. Chapter 2 provides a detailed literature review of indirect utility instruments as weightings for QALYs in the economic evaluation of interventions for RA. Specifically, the indirect utility measures, their properties, their use in the calculation of quality adjusted life years, how their properties have compared in other disease states, and the application of these instruments in cost-utility analyses in RA are reviewed in detail. Chapters 3 and 4 present the cross-sectional analysis of the baseline results from a sample of RA patients who participated in our longitudinal study. Chapter 5 compares how the application of these utility instruments in a decision-analytic Markov model for a new pharmacotherapy in RA results in vastly different incremental cost per QALY ratios. Chapter 6 examines how these indirect utility instruments are influenced by annual income and how this could potentially bias economic evaluations of therapies for RA. Chapter 7 presents the longitudinal validity analyses of these instruments in terms of responsiveness. Chapters 3 through 7 and Appendices I, II and III are each stand-alone manuscripts that have been published, are in press, or are under review by a major peer-reviewed scholarly journal.
The work presented in this thesis was conceived, conducted, and disseminated by the doctoral candidate, as has been declared by the co-supervisors of the candidate (Appendix IV). The final chapter provides a summary of the research findings and outlines the strengths, limitations, unique contributions and potential impact of the findings of this study.

1.4 SUMMARY

Approximately 300,000 Canadians have been diagnosed with RA. Due to its chronic, debilitating nature, the direct costs associated with the management of this condition and the indirect costs due to lost employment are substantial. Functional status and HRQL have been shown to be reduced in patients with RA. New therapies, specifically biological DMARDs, have the potential to improve HRQL and functional status and to offset some of the indirect costs associated with lost productivity and potentially some of the direct costs of management (such as hospitalizations), although their acquisition costs are large. Therefore, cost-effectiveness analyses of these new agents that integrate appropriate preference-based measures of HRQL into years of life are required. Often, owing to their convenience and availability, instruments that estimate society's preferences for health states are used to accomplish this task. However, preliminary evidence suggests that the application of these instruments in economic evaluations could result in vastly different cost-effectiveness outcomes. Therefore, further research is required to compare the scores and properties of these instruments in patients with RA. This study focused on the comparison of the scores and properties of four indirect utility assessment instruments (the HUI2, the HUI3, the EQ-5D, and the SF-6D) in patients with RA. The evaluation of these instruments in this population required recruitment of a sample of patients with RA for direct comparison of scores and the evaluation of their longitudinal properties.
In addition, a HAQ-based Markov model was created to test the hypothesis that incremental cost per QALY ratios would be different using the various indirect utility instrument scores as weightings for QALYs.

1.5 REFERENCES

1. The Arthritis Society. Accessed on the Internet, January 5, 2004 at: http://www.arthritis.ca
2. Doran MF, Pond GR, Crowson CS, O'Fallon WM, Gabriel SE. Trends in the incidence and mortality of rheumatoid arthritis. Arthritis Rheum 2002;46:625.
3. American College of Rheumatology Subcommittee on Rheumatoid Arthritis Guidelines. Guidelines for the Management of Rheumatoid Arthritis. 2002 Update. Arthritis Rheum 2002;46:328-346.
4. Yelin E, Callahan LF, for the National Arthritis Data Work Group. The economic cost and social psychological impact of musculoskeletal conditions. Arthritis Rheum 1995;38:1351-1356.
5. Pincus T, Callahan LF. The 'side effects' of rheumatoid arthritis: Joint destruction, disability, and early mortality. Br J Rheumatol 1993;32(suppl 1):28-37.
6. Gabriel SE, Crowson CS, Kremers HM, Doran MF, Turesson C, O'Fallon WM, Matteson EL. Survival in rheumatoid arthritis. A population-based analysis of trends over 40 years. Arthritis Rheum 2003;48:54-58.
7. Wolfe F, Michaud K, Gefeller O, Choi HK. Predicting mortality in patients with rheumatoid arthritis. Arthritis Rheum 2003;48:1530-1542.
8. Wong JB, Ramey DR, Singh G. Long-term morbidity, mortality, and economics of rheumatoid arthritis. Arthritis Rheum 2001;44:2746-2749.
9. Choi HK, Hernan MA, Seeger JD, Robins JM, Wolfe F. Methotrexate and mortality in patients with rheumatoid arthritis: A prospective study. Lancet 2002;359:1173-1177.
10. Mutru O, Laakso M, Isomaki H, Koota K. Cardiovascular mortality in patients with rheumatoid arthritis. Cardiology 1989;76:71-77.
11. Wolfe F, Cathey MA. The assessment and prediction of functional disability in rheumatoid arthritis. J Rheumatol 1991;18:1298-1306.
12.
Doran MF, Crowson CS, Pond GR, O'Fallon WM, Gabriel SE. Frequency of infection in patients with rheumatoid arthritis compared with controls: a population-based study. Arthritis Rheum 2002;46:2287-2293.
13. Cibere J, Sibley J, Haga M. Rheumatoid arthritis and the risk of malignancy. Arthritis Rheum 1997;40:1580-1586.
14. Tugwell P. Pharmacoeconomics of drug therapy of rheumatoid arthritis. Rheumatology 2000;39(suppl 1):43-47.
15. Hochberg MC, Spector TD. Epidemiology of rheumatoid arthritis: update. Epidemiol Rev 1990;12:247-252.
16. Lacaille D, Anis AH, Guh D, Esdaile JM. Assessing the quality of care for RA at the population level. Arthritis Rheum 2002;46(suppl):s626-s626.
17. Gabriel SE, Crowson CS, O'Fallon WM. Mortality in rheumatoid arthritis: Have we made an impact in 4 decades? J Rheumatol 1999;26:2529-2533.
18. Coste J, Jougla E. Mortality from rheumatoid arthritis in France, 1970-1990. Int J Epidemiol 1994;23:545-552.
19. Scott DL, Strand V. The effects of disease-modifying anti-rheumatic drugs on the Health Assessment Questionnaire score. Lessons from the leflunomide clinical trials database. Rheumatology 2002;41:899-909.
20. Kremer JM. Rational use of new and existing drugs for rheumatoid arthritis. Ann Intern Med 2001;134:695-706.
21. Maini R, St. Clair EW, Breedveld F, Furst D, Kalden J, Weisman M, et al., for the ATTRACT Study Group. Infliximab (chimeric anti-tumour necrosis factor alpha monoclonal antibody) versus placebo in rheumatoid arthritis patients receiving concomitant methotrexate: a randomized Phase III trial. Lancet 1999;354:1932-1939.
22. Boers M, Verhoeven A, Markusse H, Van der Laar M, Westhovens R, Van Denderen J. Randomised comparison of combined step-down prednisolone, methotrexate and sulphasalazine with sulphasalazine alone in early rheumatoid arthritis. Lancet 1997;350:309-318.
23. Moreland L, Schiff M, Baumgartner S, Tindall E, Fleischmann R, Bulpitt K.
Etanercept therapy in rheumatoid arthritis: a randomised, controlled trial. Ann Intern Med 1999;130:478-486.
24. Bendtsen P, Akerlind I, Hornquist JO. Assessment of quality of life in rheumatoid arthritis: methods and implications. Pharmacoeconomics 1994;286-298.
25. Nichol MB, Harada ASM. Measuring the effects of medication use on health-related quality of life in patients with rheumatoid arthritis: A review. Pharmacoeconomics 1999;16(5 Pt 1):433-448.
26. Dominick KL, Ahern FM, Gold CH, Heller DA. Health-related quality of life among older adults with arthritis. Health and Quality of Life Outcomes 2004;2:5.
27. Neville C, Whalley D, McKenna S, Le Comte M, Fortin PR. Adaptation and validation of the rheumatoid quality of life scale for use in Canada. J Rheumatol 2001;28:1505-1510.
28. de Jong Z, van der Heijde D, McKenna SP, Whalley D. The reliability and construct validity of the RAQoL: a rheumatoid arthritis-specific quality of life instrument. Br J Rheumatol 1997;36:878-883.
29. Lorish CD, Abraham N, Austin JS, Bradley LA, Alarcon GS. A comparison of the full and short versions of the Arthritis Impact Measurement Scales. Arthritis Care Res 1991;4:168-173.
30. Buchbinder R, Bombardier C, Yeung M, Tugwell P. Which outcome measures should be used in rheumatoid arthritis clinical trials? Clinical and quality-of-life measures' responsiveness to treatment in a randomized controlled trial. Arthritis Rheum 1995;38:1568-1580.
31. Hurst N, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: validity, responsiveness, and reliability of EuroQol (EQ-5D). Br J Rheumatol 1997;36:551-559.
32. Callahan LF. Economics of rheumatoid arthritis. Rheumatoid Arthritis 1999;2:3-5.
33. Maetzel A, Strand V, Tugwell P, Wells G, Bombardier C.
Cost effectiveness of adding leflunomide to a 5-year strategy of conventional disease-modifying antirheumatic drugs in patients with rheumatoid arthritis. Arthritis Rheum 2002;47:655-661.
34. Michaud K, Messer J, Choi HK, Wolfe F. Direct medical costs and their predictors in patients with rheumatoid arthritis. A three-year study of 7,527 patients. Arthritis Rheum 2003;48:2750-2762.
35. Ward MM, Javitz HS, Yelin EH. The direct cost of rheumatoid arthritis. Value in Health 2000;4:243-252.
36. Meenan RF, Yelin EH, Henke CJ, et al. The costs of rheumatoid arthritis: A patient-oriented study of chronic disease costs. Arthritis Rheum 1978;21:827-833.
37. Lubeck DP, Spitz PW, Fries JF, et al. A multicentre study of annual health service utilization and costs in rheumatoid arthritis. Arthritis Rheum 1986;29:488-493.
38. Yelin E, Wanke LA. An assessment of the annual and long-term direct costs of rheumatoid arthritis. Arthritis Rheum 1999;42:1209-1218.
39. Stone CE. The lifetime economic costs of rheumatoid arthritis. J Rheumatol 1984;11:819-827.
40. Gabriel SE, Crowson CS, Campion ME, et al. Direct medical costs unique to people with arthritis. J Rheumatol 1997;24:719-725.
41. Yelin E. The costs of rheumatoid arthritis - absolute, incremental, and marginal estimates. J Rheumatol 1996;23:47-51.
42. Clarke AE, Zowall H, Levinton C, et al. Direct and indirect medical costs incurred by Canadian patients with rheumatoid arthritis: A 12 year study. J Rheumatol 1997;24:1051-1060.
43. Clarke A, Levinton C, Joseph L, Penrod J, Zowall H, Sibley J, et al. Predicting the short term direct medical costs incurred by patients with rheumatoid arthritis. J Rheumatol 1999;26:1068-1075.
44. Liang MH, Larson M, Thompson M, et al. Costs and outcomes in rheumatoid arthritis and osteoarthritis. Arthritis Rheum 1984;27:522-529.
45. van Jaarsveld CHM, Jacobs JWG, Schrijvers AJP, et al.
Direct cost of rheumatoid arthritis during the first six years: A cost-of-illness study. Br J Rheumatol 1998;37:837-847.
46. Pincus T. The underestimated long term medical and economic consequences of rheumatoid arthritis. Drugs 1995;50(suppl 1):1-14.
47. Finkler SA. The distinction between costs and charges. Ann Intern Med 1982;96:102-109.
48. Cooper NJ. Economic burden of rheumatoid arthritis: a systematic review. Rheumatology 2000;39:28-33.
49. Lubeck DP. A review of the direct costs of rheumatoid arthritis: managed care versus fee-for-service settings. Pharmacoeconomics 2001;19:811-818.
50. Pugner KM, Scott DI, Holmes JW, Hieke K. The costs of rheumatoid arthritis: an international, long-term view. Semin Arthritis Rheum 2000;29:305-320.
51. Marra CA, Esdaile JM, Sun H, Anis AH. The cost of COX inhibitors: how selective should we be? J Rheumatol 2000;27:2731-2733.
52. Yelin E, Trupin L, Katz P, Lubeck D, Rush S, Wanke L. Association between etanercept use and employment outcomes among patients with rheumatoid arthritis. Arthritis Rheum 2003;48:3046-3054.
53. Kosinski M, Kujawski SC, Martin R, Wanke LA, Buatti MC, Ware JE Jr, Perfetto EM. Health-related quality of life in early rheumatoid arthritis: Impact on disease and treatment response. Am J Manag Care 2002;8:231-240.
54. Blumenauer B, Cranney A, Clinch J, Tugwell P. Quality of life in patients with rheumatoid arthritis: which drugs might make a difference? Pharmacoeconomics 2003;21:927-940.
55. Russell AS, Conner-Spady B, Mintz A, Mallon C, Maksymowych WP. The responsiveness of generic health status measures in patients with rheumatoid arthritis receiving infliximab. J Rheumatol 2003;30:941-947.
56. Anis AH, Tugwell PX, Wells GA, Stewart DG. A cost-effectiveness analysis of cyclosporine in rheumatoid arthritis. J Rheumatol 1996;23:609-616.
57. Kavanaugh A, Heudebert G, Cush J, Jain R.
Cost evaluation of novel therapeutics in rheumatoid arthritis (CENTRA): a decision analysis model. Semin Arthritis Rheum 1996;25:297-307.
58. Verhoeven AC, Bibo JC, Boers M, Engel GL, van der Linden SJ. Cost-effectiveness and cost-utility of combination therapy in early rheumatoid arthritis: randomized comparison of combined step-down prednisolone, methotrexate and sulphasalazine with sulphasalazine alone. Br J Rheumatol 1998;37:1102-1109.
59. Maetzel A, Strand V, Tugwell P, Wells G, Bombardier C. Cost effectiveness of adding leflunomide to a 5-year strategy of conventional disease-modifying antirheumatic drugs in patients with rheumatoid arthritis. Arthritis Rheum 2002;47:655-661.
60. Choi HK, Seeger JD, Kuntz KM. A cost-effectiveness analysis of treatment options for patients with methotrexate-resistant rheumatoid arthritis. Arthritis Rheum 2000;43:2316-2327.
61. Choi HK, Seeger JD, Kuntz KM. A cost-effectiveness analysis of treatment options for methotrexate-naive rheumatoid arthritis. J Rheumatol 2002;29:1156-1165.
62. Wong JB, Singh G, Kavanaugh A. Estimating the cost-effectiveness of 54 weeks of infliximab for rheumatoid arthritis. Am J Med 2002;113:400-408.
63. Kobelt G, Jonsson L, Young A, Eberhardt K. The cost-effectiveness of infliximab (Remicade) in the treatment of rheumatoid arthritis in Sweden and the United Kingdom based on the ATTRACT study. Rheumatology 2003;42:326-335.
64. Brennan A, Bansback NJ, Reynolds A, Conway P. Modeling the cost-effectiveness of etanercept in adults with rheumatoid arthritis in the UK. Rheumatology 2004;43:62-72.
65. Kobelt G, Eberhardt K, Geborek P. TNF inhibitors in the treatment of rheumatoid arthritis in clinical practice: Costs and outcomes in a follow-up study of patients with RA treated with etanercept or infliximab in southern Sweden. Ann Rheum Dis 2004;63:4-10.
66. Marra CA, Esdaile JM, Anis AH.
Practical pharmacogenetics: The cost-effectiveness of screening for thiopurine s-methyltransferase polymorphisms in patients with rheumatological conditions treated with azathioprine. J Rheumatol 2002;36:1851-1855.
67. Oh KT, Anis AH, Bae SC. Pharmacoeconomic analysis of thiopurine methyltransferase polymorphism screening by polymerase chain reaction for treatment with azathioprine in Korea. Rheumatology 2004;43:156-163.
68. Spiegel MRB, Targownik L, Dulai GS, Gralnek IM. The cost-effectiveness of cyclo-oxygenase-2 selective inhibitors in the management of chronic arthritis. Ann Intern Med 2003;138:795-806.
69. Lee KK, You JH, Ho JT, Suen BY, Yung MY, Lau WH, Lee WV, Sung JY, Chan FK. Economic analysis of celecoxib versus diclofenac plus omeprazole for the treatment of arthritis in patients at risk of ulcer disease. Aliment Pharmacol Ther 2003;18:217-222.
70. Maetzel A, Krahn M, Naglie G. The cost effectiveness of rofecoxib and celecoxib in patients with osteoarthritis or rheumatoid arthritis. Arthritis Rheum 2003;49:283-292.
71. Bansback N, Regier D, Brennan A, Ara R, Shojania K, Esdaile JM, Anis AH, Marra CA. Improving the methods for economic evaluation of rheumatoid arthritis: A review of the literature pertaining to biologic DMARDs. Drugs (in press).
72. Wolfe F, Michaud K, Pincus T. Do rheumatology cost-effectiveness analyses make sense? Rheumatology 2004;43:4-6.
73. Marra CA, Esdaile JM, Guh D, Fisher JH, Chalmers A, Anis AH. The effectiveness and toxicity of cyclosporin A in rheumatoid arthritis: longitudinal analysis of a population-based registry. Arthritis Rheum 2001;45:240-245.
74. Verhoeven AC, Boers M, van der Linden S. Responsiveness of the core set, response criteria, and utilities in early rheumatoid arthritis. Ann Rheum Dis 2000;59:966-974.
75. Tijhuis GJ, Jansen SJ, Stiggelbout AM, Zwinderman AH, Hazes JM, Vlieland TP.
Value of the time trade off method for measuring utilities in patients with rheumatoid arthritis. Ann Rheum Dis 2000;59:892-897.
76. Kopec JA, Willison KD. A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
77. Suarez-Almazor ME, Conner-Spady B. Rating of arthritis health states by patients, physicians and the general public. Implications for cost-utility analysis. J Rheumatol 2001;28:648-656.
78. O'Brien BJ, Spath M, Blackhouse G, Severens JL, Dorian P, Brazier J. A view from the bridge: agreement between the SF-6D utility algorithm and the Health Utilities Index. Health Econ 2003;12:975-981.
79. Kopec JA, Willison KD. A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
80. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
81. Dolan P. The measurement of health-related quality of life for use in resource allocation decisions in health care. Chapter 32. In: Handbook of Health Economics, Vol. 1. Edited by Culyer AJ, Newhouse JP. London, U.K. Elsevier Science 2000.
82. Weinstein MC, Stason WB. Foundations of cost-effectiveness analysis for health and medical practices. N Engl J Med 1977;296:716.
83. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to their Development and Use. 2nd edition. Oxford University Press, 1995.
84. Hays RD, Anderson R, Revicki D. Psychometric considerations in evaluating health related quality of life measures. Qual Life Res 1993;2:441-449.
85. Maddigan SL, Feeny DH, Johnson JA, for the DOVE investigators. Construct validity of the RAND-12 and the Health Utilities Index Mark 2 and 3 in type 2 diabetes. Qual Life Res 2004 (in press).
86. Liang MH, Lew RA, Stucki G, Fortin PR, Daltroy L.
Measuring clinically important changes with patient-oriented questionnaires. Med Care 2002;40(suppl):II-45-II-51.

CHAPTER 2 BACKGROUND

2.1 COST-EFFECTIVENESS ANALYSIS AND THE QALY

In recent years, cost-effectiveness analysis has emerged as the preferred technique for economic evaluation in health care.[1] Cost-effectiveness analysis shows the relationship between the incremental net resources used (costs) and the net health benefits generated (effects) between a specific intervention and an alternative strategy.[2] As such, the incremental cost-effectiveness ratio (ICER) can be calculated, which is simply the ratio of the difference between two interventions' costs and the difference between their effectiveness, as follows:

$$\mathrm{ICER} = \frac{\Delta \mathrm{Cost}}{\Delta \mathrm{Effect}}$$

Rather than express the outcomes in cost-effectiveness analysis in terms of naturalistic units (such as number of tender joints reduced), analysts have sought outcome measures that permit comparisons across conditions. This framework would inform societal decision-making such that competing interventions that produce the greatest gain in health for the resources expended could be identified. One potential way to permit cross-indication comparisons (i.e. comparisons of cost-effectiveness across disease states) would be to utilize life expectancy as the measured outcome.[1] However, this approach would not consider the health-related quality of life associated with various interventions and would bias funding decisions against those interventions imparting mainly HRQL benefits while favoring only those interventions that result in improvements in survival. As such, diseases such as rheumatoid arthritis (RA), where improvements in survival due to interventions are small (when compared to cancer therapy or HIV pharmacotherapy) but improvements in HRQL are paramount, would be hard-pressed to compete for scarce health resources.
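The ICER defined earlier in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the model used in this thesis: the function name `icer` and every number below are hypothetical.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    '''Incremental cost-effectiveness ratio: difference in costs over difference in effects.'''
    delta_cost = cost_new - cost_old
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        # The ratio is undefined when the two strategies are equally effective.
        raise ValueError('incremental effect is zero; the ICER is undefined')
    return delta_cost / delta_effect


# Hypothetical inputs: costs in dollars, effects in any common unit
# (for example life years, or QALYs in a cost-utility analysis).
example_ratio = icer(cost_new=30000.0, cost_old=10000.0,
                     effect_new=5.0, effect_old=4.5)
```

Because the effect measure sits in the denominator, any change in how effects are measured changes the ratio even when costs are held fixed.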
An outcome measure that integrates both years of life and HRQL into a single metric would address this bias in funding decisions.1,6 The use of quality adjusted life years (QALYs) is an attempted solution to incorporate both potential life prolongation and improvement in HRQL.1 Neumann et al. stated that \"QALYs represent the benefit of a health intervention in terms of time in a series of quality-weighted health states, in which the quality weights reflect the desirability of living in the state, typically from perfect health (weighted 1.0) to dead (weighted 0.0).\"1 Therefore, once the quality weights are obtained for each health state experienced by an individual, they are multiplied by the duration of time spent in the health state. The products of these calculations are then summed to obtain the total number of QALYs for that person in the following manner:
Total QALYs (Q_T) = Σ_{i=1}^{T} U_i(q_i) · t · D_i
Where: U_i(q_i) = the quality of life in period i (measured by utilities); t = the time interval of the period in years; D_i = the discount factor of period i.
As such, the incremental cost-effectiveness ratio becomes:
ICER = ΔCost / ΔQALYs
The QALY approach is not without controversy and other competing methods have been suggested such as the Healthy-Years Equivalents (HYEs), disability adjusted life years (DALYs), and saved young life equivalents (SAVE).1,2,6 Each of these techniques has its relative advantages and disadvantages, which are beyond the scope of this chapter; nonetheless, QALYs have remained the outcome measure of choice in the health economic literature.1,6 However, the QALY approach rests on several assumptions, which include, but are not limited to: 1) utility independence; 2) constant proportionality trade-off; and 3) risk attitude over life years.1,6 Even assuming that the QALY approach is correct and its assumptions are met, there is still the issue of what should be utilized as the source of weights for the health states in the QALY calculation. Certain conditions for these weights must be met which include: 1) that they be based on preferences for health states; 2) that they be measured on an interval scale; and 3) that they be anchored at perfect health (1.0) and death (0.0).2 For the latter anchoring requirement, health states can be valued to be worse than death and have negative weights associated with them.1,2 Also, the terminology used to describe these weights can be problematic as researchers have used the terms \"utility\", \"value\" and \"preference\" interchangeably.1 However, as Drummond et al.2 describe, the term \"utility\" is reserved for preferences that are measured under conditions of uncertainty that satisfy the axioms of expected utility theory (the standard gamble [SG]). \"Values\" are preferences that are measured under conditions of certainty and thus include rating scale and time-tradeoff elicited scores, whereas \"preferences\" is a general term that encompasses both and describes the desirability of a set of outcomes.
Sources for weights which meet the aforementioned assumptions come from both directly elicited techniques and indirectly elicited techniques.1,2,7 Although an in-depth discussion of techniques to directly elicit preference values to be used as weightings for QALYs is beyond the scope of this chapter, a brief description is provided. Directly elicited techniques encompass the methods to elicit values and utilities as described above (namely the SG, the time trade-off [TTO], and the rating scale [RS]). Within this framework, it is recommended to use choice-based techniques (SG or TTO) over scaling methods (RS).1,2 The SG is grounded in von Neumann-Morgenstern expected utility theory (EUT) and, in its usual form, asks respondents to choose between a particular, intermediate health state with certainty and a gamble involving a probability of a better or worse outcome than the certain outcome.6,7 The goal with the SG approach is to find the probability (p) in the gamble at which the respondent is indifferent between the certain and uncertain alternatives.2 Although the SG has long been considered the preferred method due to its strong ties to EUT, a recent qualitative study investigated what thought processes respondents invoke in formulating their SG responses and found that some respondents were incorporating inappropriate information into their choices. In addition, Dolan argues that since there is evidence that people violate the assumptions of EUT, much of the appeal of the SG is lost.6 The TTO technique involves asking a respondent to make tradeoffs between a shorter life span in perfect health versus a longer life span in the health state in question. The time in full health is varied until the respondent is indifferent between the two alternatives.1,2 The TTO choice is not made under uncertainty so the values that it elicits are not considered utilities, at least under EUT. Recently, a study appeared that has cast doubt on the ability to use the TTO as weighting for QALYs due to the violation of the constant proportional time trade-off assumption.9 Finally, the RS technique asks respondents to indicate ratings for health states (or their own health state) on a scale (usually a vertical or horizontal line) with endpoints of \"worst\" and \"best\" health states usually represented by 0 and 100.
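The QALY sum defined earlier in this section can be sketched in Python as follows. This is an illustration under stated assumptions rather than the calculation used in this thesis: the discount factor is assumed to take the common form D_i = 1 / (1 + r)^((i - 1) t), which the text does not specify, and the function names, utilities and costs are hypothetical.

```python
def total_qalys(utilities, years_per_period=1.0, annual_rate=0.0):
    # Sum over periods i = 1..T of utility * period length * discount factor.
    # Assumption: D_i = 1 / (1 + r) ** ((i - 1) * t), so the first
    # period is undiscounted. The text does not give the form of D_i.
    total = 0.0
    for i, utility in enumerate(utilities, start=1):
        discount = 1.0 / (1.0 + annual_rate) ** ((i - 1) * years_per_period)
        total += utility * years_per_period * discount
    return total


def cost_per_qaly(delta_cost, qalys_new, qalys_old):
    # Incremental cost divided by incremental QALYs.
    return delta_cost / (qalys_new - qalys_old)


# Hypothetical two-year profiles, undiscounted, for illustration only.
usual_care = total_qalys([0.5, 0.75])   # 1.25 QALYs
treatment = total_qalys([0.75, 1.0])    # 1.75 QALYs
print(cost_per_qaly(10000.0, treatment, usual_care))  # 20000.0
```

Because the utilities enter the denominator directly, replacing one set of quality weights with another changes the cost per QALY even when costs and survival are held fixed.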
To allow for the possibility of health states worse than death, the RS line is often anchored at the \"worst\" and \"best\" imaginable health states.6,7 In comparing directly elicited preference scores, application of the various techniques results in different preference weights.1 The SG approach almost always generates scores that are higher than the TTO method, and both are usually greater than RS scores.1,2,7 Since people are risk averse, they are less willing to accept the gamble outcome presented in the SG and more willing to accept the certainty. As well, since people have positive time preference and value years of life in the near future more than years of life in the distant future, they would be more willing to give up years of life at the end of a profile as in the TTO.6 Both of these assumptions would lead to higher SG scores than TTO values. Indirect preference or utility assessment techniques involve the use of generic health classification systems in the form of a questionnaire.1,2 Through completion of the health classification system, respondents are assigned a health state which, in turn, is valued using a scoring function that applies preference weights from another population (i.e. society). Due to their relative ease and low cost to administer when compared to the SG or TTO techniques, these questionnaires are widely applied. These instruments commonly utilize multi-attribute utility theory (MAUT) to combine many attributes into a single utility value. As it is beyond the scope of the chapter to describe MAUT in detail, the reader is referred to reviews for a complete description and assumptions involved with this theory.1,2 The most common examples of these questionnaires include the Health Utilities Index Mark 2 and Mark 3 (HUI2 and HUI3), the EuroQol (EQ-5D) and the Short-Form 6D (SF-6D).
Each of these systems assesses different domains of health and relies on different scoring functions/methods to determine preference scores.10 Other preference-based measures that have been less commonly applied are the Quality of Well-Being scale, the Finnish 15-D, and the Assessment of Quality of Life (AQoL).2,10 In the Canadian Coordinating Office of Health Technology Assessment's Guidelines for the Economic Evaluation of Pharmaceuticals,11 it is stated that due to the lack of head-to-head comparisons of these systems, there is little information to guide users in the choice of instrument. As such, it advises users \"to study the alternative systems, to select in advance the one that best suits the study objectives, to justify the selection in the study protocol, and to stick with it. It is not appropriate to try a variety of approaches and simply pick the one that puts the product in the best light\".11 Dolan echoes this advice and states that the \"evidence on the responsiveness of one measure relative to another is in short supply\" and that further within-patient comparisons are necessary.6 Neumann et al. voice concerns about the potential lack of sensitivity to important changes in particular disease states that might be experienced in the application of these systems.1 Finally, Hawthorne et al. surmise that sensitivity of these instruments might be context specific with some instruments being more sensitive to health states in some diseases when compared to others.10 In the next sections of this chapter, I will review the four most commonly utilized indirect utility assessment instruments, the comparative data that exist and the choice of QALY weightings used in the published cost-utility analyses of treatments for RA.
2.2 PREFERENCE-BASED, INDIRECT UTILITY ASSESSMENT MEASURES
2.2.1. The Health Utilities Index (HUI) Mark 2 and 3
The HUI Mark 2 and 3 systems (HUI2 and HUI3) are generic preference-based measures which, when used together, describe almost 1,000,000 unique health states.12 The HUI2 and HUI3 health classification systems were designed to directly link the multi-attribute health status classification system used to describe health status with preference-based, multi-attribute utility functions. The preference-based scoring functions convert the descriptive health classifications into values for each attribute and a single value for overall HRQL.12,15 The HUI2 was originally developed to assess the global morbidity burden of childhood cancer. The content of this instrument was based on a study in which lay raters ranked 15 attributes of health according to importance.16 The HUI3 was developed to improve upon the definitions used in the HUI2, be applicable in both clinical and general population health studies, and to have structural independence among its attributes (i.e. such that all combinations of levels in the system are possible). Since its creation, the HUI3 has been used in a variety of clinical studies and in five major population health surveys in Canada.14 The HUI2 and HUI3 were developed with the intention of capturing 'within the skin' attributes of health status.15 The HUI2 classification system originally consisted of 7 attributes: Sensation (vision, hearing, speech), Mobility, Emotion, Cognition, Self-care, Pain, and Fertility. Although fertility was initially included to assess sub-fertility and infertility sequelae associated with childhood cancer and its treatment, this attribute has been dropped from the current HUI questionnaires. The 8 attributes of the HUI3 classification system are: Vision, Hearing, Speech, Ambulation, Dexterity, Emotion, Cognition, and Pain.
Although certain attributes have the same names across the classification systems, they have different underlying constructs. The Pain attribute assesses severity of pain in the HUI3 system whereas, in the HUI2 system, the frequency of pain and its control are considered. Similarly, for the Emotion attributes, in the HUI3 system there is a focus on happiness versus depression but in the HUI2 system, distress and anxiety are assessed. Finally, the Cognition attribute in the HUI2 system concentrates on learning whereas in the HUI3, the ability to solve day to day problems is assessed. Therefore, studies sometimes combine the HUI2 and HUI3 to take advantage of both of their properties.17-19 However, the creators recommend that the HUI3 be specified as the measure for primary analysis due to its larger descriptive system (972,000 possible health states vs. 24,000 possible health states in the HUI2), structural independence, and availability for comparison with population norms. The authors also suggest that adding the HUI2 to the HUI3 provides an efficient source to conduct sensitivity analysis on the utility values. Therefore, the HUI2 and HUI3 are often administered together and have been formatted for interviewer-administration (face-to-face or by telephone) or self-completion.12 There are two versions for the viewpoints of the questionnaires: a self-assessment version where information is collected from people about their own health and a proxy assessment version, where health status information is collected from people other than the subjects. Also, there are different health status assessment periods that can be captured by the HUI systems which are classified as \"current\" or \"usual\". There are three \"current\" versions which assess periods of the past 1-week, 2-weeks and 4-weeks. The \"usual\" version does not specify time recall periods.
The \"current\" versions are recommended for clinical or health economic evaluations whereas the \"usual\" version is recommended for use in population health surveys. Both the HUI2 and HUI3 scoring systems are based upon multiplicative multi-attribute utility functions.13,14 This facilitates calculation of HRQL index scores, where dead has a utility of 0 and healthy has a utility of 1.0. Single attribute utility scores can also be calculated for each attribute in the HUI2 and HUI3. Both systems allow for calculation of utility values less than zero (health states considered worse than death) with the lowest possible scores being -0.03 for the HUI2 and -0.36 for the HUI3.12 The utility scores are assumed to have interval scale properties, whereas the attribute levels do not have interval scale properties.13,14 HUI utility scores represent mean community preferences, and the scores have been calculated from preference scores measured in accordance with von Neumann-Morgenstern expected utility theory and extensions of this theory to accommodate multiattribute utility functions.13,14,20,21 The investigators obtained preferences (using the SG in four marker states and a rating scale for the remainder in a random sample of the population living around Hamilton, Ontario) for single-deficit states, including \"corner\" states in which the deficit is set to the worst level.13,14,22 Rating scale value scores were converted to utility scores using a power function determined from the relationship between the SG and rating scale value scores in the marker states.13,14 A number of the attributes of the HUI2 and HUI3 are specifically relevant to the study of rheumatoid arthritis, including Mobility, Emotion (from both systems), Self-Care, Pain (from both systems), Ambulation, and Dexterity.
The HUI2 and HUI3 do not contain items that explicitly inquire about social roles, family roles, energy, work/productivity, and personality. These dimensions, which may be considered 'outside the skin', may be important to patients with rheumatoid arthritis, particularly social functioning, work/productivity, and energy. However, the scoring functions of the HUI2 and HUI3 may indirectly capture some of these 'outside the skin' attributes. There has been some work to characterize what would represent the minimally important difference (MID) for the HUI2 and the HUI3.12 Grootendorst et al. concluded that differences on the HUI3 of 0.03 or more should be considered to be clinically important,28 whereas Samsa et al. determined, from a small random sample of 160 patients from a Veterans Administration hospital, that 0.02 (95% confidence interval 0.01 to 0.05) was a clinically meaningful difference. Based upon these results and the fact that the smallest difference between utility scores between levels of an HUI attribute is 0.05, the creators of the HUI systems recommend that a difference of 0.05 (and possibly smaller) is likely meaningful. However, further research is required to substantiate these recommendations in a variety of different diseases including RA. To date, no published studies could be identified that examine the properties of the HUI2 or HUI3 specifically in RA. However, a few studies examine the properties of the HUI3 system (but not the HUI2) in people with one of several musculoskeletal diseases, of which RA was one.26-28,30 Since most of these are comparative studies with other preference-based measures, they will be discussed in detail in section 2.3.
Analysis of an arthritis patient sub-sample from a population health survey using the HUI3 found the greatest burden of morbidity in pain with very small, but significant (due to large sample sizes), differences in ambulation and cognition compared to a reference group without stroke or arthritis.28 However, the major limitation of this analysis was that the diagnosis of arthritis was self-reported and could have represented one of a number of conditions including osteoarthritis, rheumatoid arthritis or other rheumatic conditions. This limitation is substantiated by the relatively high mean HUI3 utility score for patients reporting \"arthritis\" in this sample (0.84), which was significantly lower than the mean control score (those without arthritis or stroke) of 0.92 but not as low as would be expected if the sample had been limited to only RA.
2.2.2. The EuroQol (EQ-5D)
The EQ-5D was designed as a cardinal index of health for describing and valuing HRQL. The main objective in creating this instrument was to develop a standardized measure for describing and valuing HRQL that could be used to generate cross-national comparisons of health status. The dimensions of the instrument were selected after a detailed review of several generic health status measures. The instrument consists of a descriptive health state classification system and a visual analog scale 'health thermometer' (the VAS component). The descriptive health state classification system consists of 5 domains (Mobility, Self-Care, Usual Activities, Anxiety/Depression, and Pain/Discomfort), each with 3 response levels (no problems, some problems, extreme problems). The health 'thermometer' represents a subjective, global evaluation of the respondent's health status on a scale between 0 and 100, where 0 represents worst imaginable health state and 100 represents best imaginable health.
Three types of data are produced for each patient: a health state vector or profile describing the extent of problems on each of the 5 domains, a population-weighted health-index based on the health state vector (the EQ-5D index or utility score), and a VAS-based self-rated assessment of HRQL. The EQ-5D was intended for self-completion and the recall period refers to the present (today).33 The scoring algorithm typically applied to the descriptive health classification system is the UK-based York scoring system.34 This scoring system was generated from interviews of a sample of the general UK population. Respondents were asked to rank and then value hypothetical EQ-5D health states using the Time Trade-Off approach.34 Although no Canadian-based scoring 'tariff' has been developed for the EQ-5D, a scoring model has been generated for VAS-based valuations in an adult US sample. A study of differences between a European and Canadian-based sample of EQ-5D valuations found VAS valuations for EQ-5D health states were comparable for all domains other than Usual Activities.36 Scores on the EQ-5D range from -0.56 to 1.00 where negative values represent health states worse than death. This range is the widest of all utility values determined by any of the preference-based instruments. In addition, the brevity of the EQ-5D has been considered to be a strength in the study of HRQL. In a large randomized comparison (over 1100 patients in each group), the response rate of the EQ-5D was shown to be higher than that of the SF-6D in severely disabled persons after stroke (66% with no missing data vs. 55%, p<0.0001, respectively).35 However, the advantage of brevity of the EQ-5D leads to a major limitation in its health classification system - the few domains of health assessed and the small number of health states described.
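The size of a descriptive system follows from multiplying the number of response levels across its attributes, which the sketch below illustrates. The EQ-5D levels (5 domains with 3 levels each) are stated in this chapter; the per-attribute level counts for the HUI2, HUI3 and SF-6D are not given in the text and are assumed here from the published descriptive systems, with the HUI2 count including the original Fertility attribute. The names LEVELS and count_health_states are hypothetical.

```python
from math import prod

# Response levels per attribute. Only the EQ-5D row is stated in the
# chapter; the other rows are assumptions taken from the published
# descriptive systems (HUI2 includes the original Fertility attribute).
LEVELS = {
    'EQ-5D': [3, 3, 3, 3, 3],
    'HUI2': [4, 5, 5, 4, 4, 5, 3],
    'HUI3': [6, 6, 5, 6, 6, 5, 6, 5],
    'SF-6D': [6, 4, 5, 6, 5, 5],
}


def count_health_states(levels):
    # Every combination of one level per attribute is one health state.
    return prod(levels)


for name, levels in LEVELS.items():
    print(name, count_health_states(levels))
# EQ-5D 243
# HUI2 24000
# HUI3 972000
# SF-6D 18000
```

Under these assumed level counts the products reproduce the figures quoted in this chapter, which makes plain that the EQ-5D's brevity (five attributes, three levels) is what limits it to 243 states.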
Of all the preference-based instruments, the EQ-5D has the smallest number of health states described (243) compared to the HUI2 (24,000), HUI3 (972,000) and the SF-6D (18,000). In addition, it has the fewest domains of health that are assessed by its system. Therefore, in studies of chronic conditions such as RA, these might not be sufficient to accurately describe impairments in HRQL. For example, the EQ-5D lacks dimensions of HRQL that may be impacted by RA such as dexterity, social functioning and vitality. The ubiquitous dimension 'usual activities' has the potential to elicit some information on personality, family roles, or productivity, but the bundling of so many potential aspects of HRQL obscures interpretation. Also, many researchers have found that there are many gaps in the distribution of scores achieved by the EQ-5D, especially in the mid-utility range (between 0.30 and 0.5), with a clustering of scores in the upper-utility area possibly leading to ceiling effects.30,38 Another concern about the EQ-5D relating to the small number of health states described is its responsiveness and sensitivity to change. Floor and ceiling effects have also been observed to a greater degree in the EQ-5D than in the Short Form-12 (SF-12), an abbreviated version of the Short Form-36 (SF-36).39 In studies comparing the EQ-5D to the SF-36, the EQ-5D index score was less responsive to change and less able to discriminate between groups than the SF-36.40-42 In a recent study comparing the EQ-5D to the SF-6D (a preference-based measure derived from the SF-36 - see below for more details), the authors found that the SF-6D was more sensitive than the EQ-5D in detecting small changes in patients who had undergone liver transplantation.43 Specifically, in RA, there have been two studies that have examined the construct validity, responsiveness and reliability of the EQ-5D. Hurst et al. first reported on the validity of the EQ-5D in terms of its ability to measure both current health status and change in health status in a small sample of 55 patients with RA.44 At baseline, the EQ-5D index scores were significantly correlated with other condition-specific measures including loss of function, joint pain, joint tenderness and mood. In addition, EQ-5D change scores were correlated with changes in these measures. In a larger study with 233 RA patients, these results were replicated.45 In addition, the investigators found that the test-retest reliability of the EQ-5D utility score after two weeks (ICC=0.78, 95% CI 0.60-0.96) was higher than that of all other clinical measures except the Health Assessment Questionnaire (HAQ) scores. Therefore, it appears that the EQ-5D has demonstrated reliability and cross-sectional and longitudinal construct validity in RA. However, from these results, it was unclear how these properties for the EQ-5D would compare to other preference-based instruments in the assessment of RA. In a study comparing the responsiveness of generic health status measures in patients with RA who were either receiving infliximab or not receiving infliximab, SF-6D scores were found to be consistently higher than EQ-5D scores. These authors found lower test-retest reliability for the EQ-5D (ICC=0.66) compared to the SF-6D (ICC=0.72). In addition, these authors found mean changes almost two-fold greater in the EQ-5D than the SF-6D in those patients receiving infliximab. However, effect sizes as a measure of responsiveness were larger in the SF-6D due primarily to the smaller standard deviation at baseline (0.07 for the SF-6D vs. 0.30 for the EQ-5D). Other studies comparing the EQ-5D to other preference-based measures have been conducted outside of RA and will be discussed in detail below.30,43 No studies were identified that established what is thought to be the MID in index scores of the EQ-5D. It has been hypothesized that the MID is 0.03 since this represents the smallest change in the utility values that can occur as a result of a one category change in a single dimension. Further research is necessary to characterize the MID for the EQ-5D.
2.2.3. The Short Form 6D (SF-6D)
The most recent of the preference-based, indirect utility assessment instruments, the SF-6D, was created by Brazier et al. in an effort to develop a scoring algorithm that derives preference-based scores from the SF-36.46,47 The SF-36 is one of the most widely utilized HRQL measures and contains 36 questions assessing these eight dimensions: physical functioning, role limitation due to physical health, social functioning, vitality, bodily pain, mental health, role limitation due to emotional problems, and general health.48 The SF-6D revised the SF-36 into a six-dimensional health state classification system assessing physical functioning, role limitations, social functioning, pain, mental health, and vitality. The SF-6D health classification system defines health states by a respondent selecting one level from each of the six dimensions. Each dimension has four to six levels and thus, 18,000 possible health states are defined in this manner.22,47 To assess preferences for the multi-attribute health states defined by the SF-6D system, the creators used an interviewer administered SG in a representative sample from the UK.47 The boundaries of the SF-6D utility scores are from 0.30 to 1.00 with a score of 1.00 being indicative of \"full health\". The MID of the SF-6D, based upon a meta-analysis of seven longitudinal studies, was determined to be 0.033 (95% CI 0.029 to 0.037).49 Due to the newness of this preference-based, indirect utility assessment instrument, there have been few published studies in which it has been utilized. However, the use of this measure is increasing, and with the availability of many SF-36 datasets that could be converted into preference-based measures, it is anticipated that the application of this measure will continue to grow.26,30,43,49-51 Of note, the creators of the SF-6D state that, when compared to the EQ-5D, the fact that the SF-6D has a much larger descriptive system may result in greater sensitivity.47 Specific studies comparing results obtained with the SF-6D and other preference-based measures will be described in detail below.
2.3 EMPIRIC COMPARISONS BETWEEN THE INDIRECT UTILITY ASSESSMENT INSTRUMENTS
2.3.1. Comparisons between the Health Utilities Index Mark 2 and Mark 3
One of the first studies published in the literature examining differences between scores obtained from the HUI2 and HUI3 was authored by Neumann et al. These investigators compared scores achieved with the two HUI systems in a cross-sectional sample of 679 patients with Alzheimer's disease (AD) and their caregivers. In addition, the investigators utilized the scores obtained by the two systems in a decision-analytic, Markov-model based, economic evaluation of a new drug to determine what impact using the different utility values would have on the incremental cost-effectiveness ratios. When patients completed the questionnaires, their mean (SD) utility scores were lower on the HUI3 (0.22 [0.26]) than on the HUI2 (0.53 [0.21]). However, when caregivers completed the questionnaires as proxies for the patients, similar results were found between the two systems (mean score [SD] on the HUI3 of 0.87 [0.14] and HUI2 0.87 [0.11]). Both systems appeared to have construct validity in terms of their ability to discriminate between severity levels of AD.
For the HUI3, patient scores ranged from 0.47 (0.24) for questionable AD to -0.23 (0.08) for terminal AD, compared with a range of 0.73 (0.15) to 0.14 (0.07) for the HUI2. In the cost-effectiveness analysis, the results were more economically attractive when the scores for the HUI3 were used as compared to the HUI2. Maddigan et al. examined the construct validity of the two HUI systems in 394 patients with type 2 diabetes in rural communities in Alberta and subsequently compared the scores of the two systems and examined reasons for their differences.53,54 The mean score of the HUI2 was higher (0.78, SD 0.18) than the mean score of the HUI3 (0.64, SD 0.30). Using the \"known groups\" approach to the assessment of construct validity, the investigators found that the HUI2, HUI3 and the RAND-12 all discriminated across subgroups of individuals representative of more and less advanced diabetes or differing levels of disease severity. For example, disease severity measures were associated with impairment on the vision, ambulation, dexterity and pain attributes on the HUI3 and impairments on self-care and mobility attributes of the HUI2. Overall scores were lower in those above the median duration of diabetes than those below and in those whose diabetes was managed using insulin compared to diet alone. In the paper comparing the HUI2 to the HUI3 scores and examining the extent to which each of these systems detect differences associated with varying levels of type 2 diabetes severity or disease advancement, 372 individuals were available for analysis. Specifically, differences were investigated in single attribute and overall HUI2 and HUI3 utility scores of groups with presumed differences in disease severity or stability of control. Severity of type 2 diabetes was defined based upon those receiving insulin therapy (most severe) to those treated with diet alone (least severe).
Stability of control was defined based upon absenteeism from work, emergency room visits, and hemoglobin A1c values. Relative to HUI2 scores, larger differences were seen in HUI3 scores for individuals defined as having more advanced type 2 diabetes. Both the pain and emotion attributes of the HUI3 categorized a larger proportion of the sample as having moderate to severe impairment than corresponding attributes in the HUI2 system. These observations prompted the authors to conclude that, due to the greater range of possible scores (including the wide range of states valued as worse than death) and its superior ability (relative to the HUI2) to discriminate between those with moderate or severe impairment as compared to mild or no impairment, the HUI3 may be a better instrument to utilize in type 2 diabetes. The responsiveness of the HUI systems was recently compared to the SF-36 and disease-specific measures (the Harris Hip Scale, Western Ontario and McMaster University Osteoarthritis Index (WOMAC), and McMaster-Toronto Arthritis Patient Preference Disability Questionnaire (MACTAR)) in patients undergoing hip arthroplasty. Feeny et al. evaluated the responsiveness of these questionnaires in 90 patients (out of a possible 553 patients who had been initially referred for hip disease). Questionnaires were applied prior to surgery and post-surgery, facilitating comparisons between pairs of measures for each patient. The responsiveness statistics utilized were the effect size (ES), the standardized response mean (SRM), the relative efficiency statistic (RE) and the paired t-test. Some form of improvement was detected by the overall/summary scores for all of the instruments and in many of the domain scores of the SF-36 and the single-attributes for the HUI systems. All of the overall scores had large ES statistics, including the HUI2 and HUI3.
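Two of the responsiveness statistics named above can be sketched as follows. The definitions used here are the conventional ones and are an assumption, since the text does not spell them out: the ES divides the mean change by the standard deviation of baseline scores, and the SRM divides the mean change by the standard deviation of the change scores. The function names and the paired scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    # Assumed definition: mean change / SD of baseline scores.
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    # Assumed definition: mean change / SD of the change scores.
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical paired utility scores before and after an intervention.
before = [0.4, 0.5, 0.6]
after = [0.5, 0.7, 0.9]
print(round(effect_size(before, after), 2))                 # 2.0
print(round(standardized_response_mean(before, after), 2))  # 2.0
```

Because the ES is scaled by the baseline standard deviation, an instrument with tightly clustered baseline scores can show a larger ES than one with a larger mean change but a wider baseline spread.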
For example, for the SF-36, improvements were observed in the physical functioning, bodily pain and vitality domains as well as the physical component summary score; for the HUI2, improvements were observed in the pain and self-care attributes; and finally, for the HUI3, improvements were observed in the pain and ambulation attributes. Surprisingly, the mobility attribute of the HUI2 was not responsive. Overall, as hypothesized, the disease-specific measures were the most responsive, but the generic measures yielded acceptable responsiveness statistics and would be suitable for use in this context.

Another publication from this data set examined differences between community-based preferences (based on responses to the HUI2 and HUI3) and individual preferences (based upon SG utilities).56 The investigators examined agreement (as measured by intraclass correlation coefficients [ICC]) between the different utility assessment techniques and compared the mean scores between instruments. Mean scores were significantly higher for the SG when compared to the HUI3 (0.62, SD 0.31 vs. 0.52, SD 0.21) but not the HUI2 (0.62, SD 0.31 vs. 0.62, SD 0.19). However, the ICCs were low, ranging from 0.06 (agreement between the SG and the HUI2) to 0.09 (agreement between the SG and the HUI3). Thus, the authors concluded that the HUI2 was a good proxy for directly measured SG at the group level; however, this conclusion cannot be applied at the individual level (as evidenced by the low agreement).

Feeny et al. conducted another study examining the relationships between the HUI2, HUI3 and directly measured SG scores at both the individual and group level in a sample of 140 teenage survivors of extremely low birthweight (ELBW) and 124 control group teens. Again, mean SG scores were compared to HUI2 and HUI3 scores, and agreement between the SG and the HUI2 and HUI3 was assessed using the ICC.
For the ELBW group, the SG, HUI2 and HUI3 mean (SD) scores were 0.90 (0.20), 0.89 (0.14), and 0.80 (0.22), compared to the control scores of 0.93 (0.11), 0.95 (0.09) and 0.89 (0.13), respectively. The differences between the ELBW and control HUI2 and HUI3 scores were statistically significant (p<0.0001 and p<0.0002) and clinically important. However, no such difference was observed in SG scores between the ELBW group and the controls. Also, although there were no differences between mean SG and HUI2 scores, mean SG and HUI2 scores were significantly different (p<0.001) from HUI3 scores, with the latter being systematically lower. In the assessment of agreement, the ICCs between the SG and the HUI2 or HUI3 were very low, indicating poor agreement at the individual level. Agreement between the HUI2 and HUI3 was moderate at 0.63 (95% CI 0.01 to 0.73) in the sample and control groups combined. Again, the authors concluded that, at the group level, results from the HUI2 and the SG are interchangeable, but this relationship did not hold at the individual level.

2.3.2. Comparisons across Indirect Utility Assessment Instruments Outside of Musculoskeletal Diseases

The first study comparing preference-based scores derived from indirect utility assessment instruments arose within the framework of a randomized clinical trial of 561 patients being treated with tirilazad mesylate or placebo for aneurysmal subarachnoid hemorrhage.58 The EQ-5D VAS, the HUI2 utility scores and a rating scale were used as measures of patient preferences. The scoring function of the EQ-5D was not considered, as the authors stated that they were assessing patient preferences (rather than societal preferences). The measures of preferences tended to have higher agreement at lower levels of functioning and poor agreement at higher levels of functioning.
Since this study did not directly compare the scores that are typically utilized as QALY weights, little can be concluded from this research. However, this study likely sparked the interest of other investigators in conducting comparisons between the indirect utility assessment instruments and was the forerunner to a body of research.

The creation of the Assessment of Quality of Life (AQoL) questionnaire led investigators to compare its properties with those of the HUI3, EQ-5D, SF-6D and a Finnish preference-based measure, the 15-D.10 Of note, the SF-6D scores were calculated by an older algorithm which has since been changed.46,47 The investigators administered these instruments to residents of Victoria, Australia. The sample was selected to provide a heterogeneous, representative sample of community members weighted by socioeconomic status, chronically ill patients attending outpatient clinics in two of Melbourne's largest hospitals, and inpatients from three hospitals. The response rate was 58% (n=396), 43% (n=334) and 58% (n=266) for the community, outpatient and inpatient samples, respectively, for a total of 976 respondents. The investigators found that the distributions of the scores of the five instruments were quite different, with the AQoL, HUI3 and EQ-5D having a greater range of scores and lower values than the SF-6D and the 15-D. However, when broken down by sample type (community, outpatient, inpatient) and by age (16-35, 36-50, 51-65, and >66), all the instruments displayed a monotonically decreasing relationship across sample and age groupings. Spearman's correlations between each pair of instrument scores revealed high (>0.60) correlation coefficients for all comparisons. The AQoL and the 15-D had the highest correlation (0.80) whereas the EQ-5D and the HUI3 had the lowest (0.64).
Finally, in a more detailed analysis of patterns of agreement between the instruments, it was revealed that a change in the average score on the SF-6D and the 15-D corresponded to a much greater change in the scores predicted by the other three instruments. To assess ceiling effects (where scores cluster at the highest end of the scale), the scores of the other instruments were plotted when the score of the instrument of interest was at its maximum value of 1.00. By examining the dispersion in the other scores, the investigators determined the ability of the other instruments to detect differences in health states when the instrument of interest was at its ceiling. The results showed that the dispersion of scores for the other instruments when the AQoL, 15-D, or SF-6D were at 1.00 was minimal, suggesting that these instruments had a relatively high ceiling. However, when a utility value of 1.00 was achieved with the HUI3 and the EQ-5D, there was significant dispersion of scores in the other instruments, suggesting a possible low ceiling. Thus, it would appear that despite having a wider range, both the HUI3 and the EQ-5D display ceiling effects that are not experienced by the SF-6D.

Bosch and Hunink compared the HUI3 and the EQ-5D in 88 patients treated for intermittent claudication in the Netherlands.59 These patients completed the HUI3, EQ-5D, RAND 36-Item Health Survey 1.0, TTO, SG and rating scale before revascularization and at follow-up 1 month after the procedure. After revascularization, improvements were mostly noted in the HUI3 attributes of pain and ambulation, compared to the mobility, usual activities and pain/discomfort domains in the EQ-5D system.
It was hypothesized that since TTO scores are usually lower than SG scores, the mean TTO and EQ-5D scores (the EQ-5D uses the TTO in its scoring function) would be lower than the mean SG and HUI3 scores (the HUI3 uses the SG in its scoring function, as described above). Prior to treatment, the EQ-5D mean (SD) score (0.57 (0.25)) was significantly lower than the HUI3 mean (SD) score (0.66 (0.20), p<0.01). Also, as hypothesized, the TTO mean (SD) scores (0.82 (0.17)) were lower than the SG mean (SD) scores (0.91 (0.14)). However, at 1 month after the procedure, there were no differences between the HUI3 and the EQ-5D scores (0.77 (0.21) vs. 0.79 (0.23), respectively). To investigate agreement at the individual level, the investigators determined the ICC values between the HUI3 and EQ-5D at baseline (0.49) and at 1 month after the procedure (0.66). The ICC between the changes in the HUI3 and EQ-5D scores was poor (0.30). The authors concluded that studies utilizing the mean values of these systems (such as cost-utility analyses) would conclude a lower impact of revascularization on HRQL if the HUI3 was used instead of the EQ-5D, due to the smaller changes in utility scores.

Longworth and Bryan compared the EQ-5D and the SF-6D in 524 liver transplant patients (90% response rate for at least one questionnaire) in the UK.43 Investigators administered the HRQL questionnaires from the point of listing on the transplant list and then at 3 month intervals until transplantation. After transplantation, HRQL questionnaires were given at 3, 6, 12 and 24 months. At the conclusion of the study, there were 1462 data pairs (at two consecutive time points) with which to compare the two indirect utility assessment instruments.
When the mean scores of the two instruments at baseline (listing time) were compared with those at 12 months post-transplantation, the EQ-5D detected a significant improvement in HRQL (mean score increased from 0.52 to 0.61, mean change of 0.09, 95% CI 0.03 to 0.14); however, the SF-6D did not show a significant change (mean score increased from 0.61 to 0.62, mean change of 0.01, 95% CI -0.04 to 0.05). In pre-transplantation measurements, the SF-6D was found to have a much narrower spread and a more symmetrical distribution than the EQ-5D. Due to its relatively high lower bound of 0.30, SF-6D scores could not dip lower than this value. Conversely, no patients were scored as 1.00 (full health) by the SF-6D system. On the other hand, the EQ-5D had a sizeable proportion of respondents classified in health states worse than dead (negative values) at any time point prior to transplantation and a number of patients scoring full health at all time points prior to transplantation. In post-transplantation measurements, the results were similar, with fewer of the EQ-5D scores in the "worse than dead" range but more at the full health point (1.00). The distribution of the SF-6D scores was similar to that observed prior to transplantation, with a few patients reporting full health (1.00). Of note, there were also gaps in the distribution of the EQ-5D scores, the most noticeable being between 0.37 and 0.50 and between 0.88 and 1.00. Although the correlation between the EQ-5D and SF-6D scores was high (0.76, p<0.001), there was a large amount of variation in the scores across the measures. In the examination of ceiling effects, when the EQ-5D was scored as 1.00 (a total of 237 paired observations), only 22 of the corresponding SF-6D scores were also at full health. The remaining SF-6D scores ranged from 0.57 to 0.99 with a mean score of 0.82. Thus, it would appear that towards the higher range of utility scores, the SF-6D showed greater sensitivity.
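The ceiling-effect check described above (looking at the spread of one instrument's scores among observations where the other instrument is at 1.00) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the tolerance, and the five paired observations are hypothetical and are not the study's data or the authors' code.

```python
from statistics import mean


def comparator_scores_at_ceiling(pairs, ceiling=1.0, tol=1e-9):
    """Summarize comparator scores for pairs whose reference score is at the ceiling.

    `pairs` is a sequence of (reference_score, comparator_score) tuples.
    """
    selected = [comp for ref, comp in pairs if abs(ref - ceiling) <= tol]
    if not selected:
        raise ValueError("no observations at the reference ceiling")
    return {
        "n": len(selected),
        "n_also_at_ceiling": sum(1 for s in selected if abs(s - ceiling) <= tol),
        "min": min(selected),
        "max": max(selected),
        "mean": mean(selected),
    }


# Hypothetical (reference, comparator) utility pairs, illustrative only.
paired_scores = [(1.0, 0.82), (1.0, 1.0), (0.7, 0.65), (1.0, 0.57), (0.5, 0.60)]

summary = comparator_scores_at_ceiling(paired_scores)
print(summary)
```

A wide range of comparator scores among observations at the reference ceiling suggests that the reference instrument is failing to distinguish health states that the comparator can still separate.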
However, the reverse was true when floor effects were examined. More respondents indicated the lowest levels on the SF-6D domains than on the EQ-5D domains. For example, 42% and 21% of respondents indicated the most severe levels on the role limitation and vitality domains, respectively, of the SF-6D questionnaire, whereas the largest proportion indicating the worst level on any EQ-5D domain was 14% (usual activities). Therefore, from the results of this analysis, it would appear that despite having better properties at the upper end of the utility range, the SF-6D displays floor effects. This finding is likely limited to disease states where the burden of disease is large, as in organ transplantation. As such, the use of the SF-6D over instruments such as the EQ-5D may underestimate the magnitude of HRQL improvements in these types of conditions and undervalue treatment in cost-utility analysis. This finding is somewhat in agreement with the statement by Brazier et al. that "any greater sensitivity [of the SF-6D] would be most likely in groups experiencing mild to moderate health problems and in those expected to experience comparatively small changes or where small differences are expected between interventions."47

O'Brien et al. examined the level of agreement between the SF-6D utility algorithm and the HUI3 in patients at increased risk of sudden cardiac death participating in a randomized trial of implantable defibrillator therapy.51 The SF-6D and the HUI3 questionnaires were completed at baseline by 246 patients, generating cross-sectional scores. The mean values from the HUI3 (0.61, 95% CI 0.60 to 0.63) and the SF-6D (0.58, 95% CI 0.54 to 0.62) were significantly different (p<0.03). As shown in other studies, the range of the HUI3 scores was much greater (-0.21 to 1.00) than that of the SF-6D scores (0.30 to 0.95). The distributions of the scores of the two systems were again quite different, with the SF-6D passing the Kolmogorov-Smirnov test for normality.
The distribution of the HUI3 scores failed statistical tests for normality and followed a skewed, bimodal pattern. Agreement, assessed using the intraclass correlation coefficient (ICC), was low (ICC 0.42, 95% CI 0.31 to 0.52). In their discussion, the authors raise several interesting points on potential reasons for the differences observed in the scores from the two instruments. Firstly, the SF-6D considers different domains of health while the HUI3 is based on "within the skin" attributes. Secondly, although their scores are both based on the SG, SF-6D health states were valued directly while the HUI3 health states were valued by RS and converted to SG scores by a statistical power function. Finally, although the absolute scores for these instruments were different in a cross-sectional study, it is not clear from their results if difference scores would vary to the same extent in a longitudinal analysis.

Siderowf et al.60 compared the scores of three preference-based instruments, the EQ-5D, the Disability and Distress Index (DDI), and the HUI2, in 100 patients with idiopathic Parkinson's disease (PD). The DDI contains four functional domains: general mobility, usual activities, self-care, and social and personal relations. Responses on these domains are combined with an overall rating for the dimension "distress" and the entire system is scored between -1.486 and 1.0. While the DDI appears to be preference-based, it does not provide utility values. Construct validity of the three preference-based instruments was determined by comparing their scores with the total Unified PD Rating Scale [UPDRS] (a widely used disease-specific, symptom severity rating system), the Hoehn and Yahr scale (a disease severity scale in PD) and the Beck Depression Inventory. The three instruments' discriminative ability was tested by dividing the study sample into upper and lower halves and quartiles based on the UPDRS.
Overall, the mean (SD) scores of the three instruments were: EQ-5D 0.59 (0.27), HUI2 0.75 (0.18) and DDI 0.93 (0.17), which were significantly different (p<0.001 for pairwise comparisons). Only the EQ-5D supplied scores that were negative (health states valued as worse than death). The scores of the instruments were moderately to strongly correlated, with Pearson correlation coefficients of 0.74 (HUI2 with the EQ-5D), 0.62 (EQ-5D with the DDI) and 0.56 (DDI with the HUI2). All three instruments were significantly correlated with disease-specific measures. Generally, the DDI had lower correlations with these measures than the HUI2 and EQ-5D. In terms of discriminative ability, the HUI2 and EQ-5D were superior to the DDI in their ability to distinguish between severities of PD. None of the instruments was able to distinguish between subjects with and without motor fluctuations or drug-induced dyskinesia. In their discussion of the research findings the authors raise an interesting point: because all the instruments yielded scores that were correlated with the disease-specific measures, they might be measuring functional status much more than preferences. To test this hypothesis, further studies examining correlations and agreement with directly elicited preference techniques such as the SG and TTO are needed.

In another study, 36 clinical experts scored the HUI2, HUI3 and EQ-5D classification systems according to literature reports on eight sequelae associated with childhood meningitis, and the scores obtained using the three health classification systems were compared.61 The sequelae chosen in the valuation exercise were deafness, minor hearing loss, epilepsy, mild mental retardation, severe mental retardation combined with tetraplegia, paresis of the leg, and mild mental retardation combined with epilepsy and paresis of the leg.
For each of the sequelae, the investigators constructed a short, structured synopsis that reported on relevant domains. In general, scores on the HUI2 and the EQ-5D were comparable except for severe retardation combined with tetraplegia, which was scored, on average, at -0.15 (0.13) with the EQ-5D and 0.12 (0.03) with the HUI2. Interestingly, for the same health state, the score on the HUI3 was -0.33 (0.02). All health states were scored significantly lower using the HUI3 than the other systems (p<0.05 for all). However, the HUI2 and HUI3 had the same ranking for the health states. Rankings with the EQ-5D system were similar except that, in contrast to the HUI systems, it ranked epilepsy lower than mild hearing loss and leg paresis lower than deafness. Using various measures of agreement, there were significant differences across all three of the instruments for each of the sequelae, suggesting that they were not interchangeable. From their results, the authors concluded that sensitivity analyses of QALY weightings must be employed in cost-utility analysis in order to account for the observed differences in scores.

Lubetkin et al. examined the relationship between the SF-12, the EQ-5D and the HUI3 for overall scores and in analogous domains of health in a convenience sample of 301 participants (77% participation rate) at an inner-city community health centre in New York City.62 Participants were mainly from ethnic minorities, had low annual incomes (90% earned less than $30K), and had low education (47% had high school graduation or less). Using Pearson's correlation coefficients, correlations between the overall scores ranged from 0.41 (SF-12 with EQ-VAS) to 0.69 (HUI3 with EQ-5D index).
Considering just the preference-based measures, correlations between similar domains (using Kendall's tau for ordinal variables) were 0.59 (between the HUI3 ambulation attribute and the EQ-5D mobility domain), 0.58 (between the HUI3 pain attribute and the EQ-5D pain domain), and 0.55 (between the HUI3 emotion attribute and the EQ-5D anxiety/depression domain). Areas of impairment most frequently detected by the HUI3 were pain, vision, cognition and emotion, whereas, for the EQ-5D, pain/discomfort and anxiety/depression were impaired most often. The authors concluded that despite differences in the structure of these systems, correlations between related aspects were moderate to strong and participants demonstrated consistency in responses across analogous items.

From a population health perspective, there have been comparisons of the HUI3 and the EQ-5D both in the UK and in Canada.63,64 In the UK comparison, the EQ-5D, a modified version of the HUI3 and the SF-12 were compared within a general population sample.63 The modified version of the HUI3 that was used was an eight-item questionnaire (one item for each domain) that is available at no charge from the developers. The authors stated that they could not afford the substantial fees associated with using the standard HUI3 questionnaire. The three instruments were evaluated in terms of their feasibility, coverage (such that there should be a broad range of responses across an instrument's items), and discrimination (ability to discriminate between individuals based upon self-rated health status, measurable morbidity, and socioeconomic status). All instruments showed feasibility in that there were low non-response rates (less than 6% for all the items across all questionnaires). The SF-12 had a broad distribution of scores across its items, although there was still heavy skewing towards responses indicative of good health.
However, the HUI3 and the EQ-5D scores were highly skewed on all dimensions, with the majority reporting full health. In the EQ-5D, respondents were least likely to report problems on the self-care domain but most likely to report pain/discomfort. Forty-nine percent of respondents indicated no problems on all five domains of the EQ-5D. For the HUI3, respondents were least likely to report problems on the speech dimension but most likely to report decrements in the pain dimension. In the sample, there were 35 distinct health states (out of the possible 243) described by the EQ-5D system compared to 126 distinct health states defined by the HUI3. Finally, in terms of their ability to discriminate between self-reported health states and socioeconomic status, all three instruments had acceptable levels of performance. The SF-12 summary scores could not discriminate among people with different education levels. The authors concluded that, despite the differences in their descriptive systems and scoring functions, overall there was no discerning feature to pick one over the other as a population health measure.

The Canadian study attempted to assess the relationship between the HUI3 and the EQ-5D at both the descriptive and scoring level.64 The analysis was performed on answers given by 1,477 respondents to the Statistics Canada 1998 National Population Survey pilot study. Both the HUI and the EQ-5D mean scores declined with increasing age and with decreasing self-perceived health. People who reported chronic conditions had lower scores, and people with more severe problems had a larger change in the scores. The HUI3 and EQ-5D scores were moderately correlated (Spearman correlation coefficient of 0.58), as were answers to self-rated health questions (coefficients varying between 0.48 and 0.56).
The instruments' mean scores had reasonable agreement in less healthy respondents and in respondents of younger age (16-34) but, in healthier respondents, the mean scores were less similar. Thus, based on these results, it was decided to continue to use the HUI3 for the Canadian National Population Health Surveys.

2.3.3. Comparisons across Indirect Utility Assessment Instruments within Musculoskeletal Diseases

Recently, three studies comparing the various properties of some of the indirect utility instruments in samples with rheumatologic conditions were published.26,27,30 Of these, only one study was conducted exclusively in patients with RA,26 while in the other two studies, patients had a mixture of musculoskeletal diseases. The study conducted exclusively in RA, authored by Russell et al., examined the reliability and responsiveness of the SF-36, SF-6D, EQ-5D, SG, the modified HAQ, and a pain VAS in two groups of RA patients (Group 1 consisted of 24 patients with stable RA and Group 2 consisted of 60 patients beginning infliximab therapy). Patients in Group 2 were assessed prior to being initiated on infliximab therapy and after 14 weeks of infliximab treatment. Test-retest reliability was estimated for each instrument in the stable patient group using the ICC, whereas responsiveness was assessed using the paired t-test, effect size (ES) and standardized response mean (SRM). For all the measures, the ICC ranged from 0.50 (role emotional domain of the SF-36) to 0.92 (physical functioning domain of the SF-36). The preference-based measures had moderate reliability (ICCs of: EQ-5D 0.66, SF-6D 0.72, SG 0.73). However, the sample from which these results were derived was very small (n=24) and thus these estimates may not be robust. In terms of responsiveness, for Group 2, all the overall scores and domain scores for the SF-36 detected significant changes from baseline to the second measurement.
The SRM and ES were the largest for the pain VAS, the EQ-5D VAS, the SF-36 physical component scores, and the SF-36 vitality domain. In terms of the preference-based measures, the SRM and ES values were 0.67 and 0.64 for the EQ-5D, 1.40 and 0.87 for the SF-6D, and 0.49 and 0.43 for the SG. Despite the fact that the change described by the EQ-5D system was twice that described by the SF-6D, its responsiveness statistics were much smaller, mainly due to the larger SD of the baseline and change scores of the EQ-5D. The authors concluded that the SF-6D might be preferable to the EQ-5D in measuring clinically relevant improvement in RA.

Conner-Spady et al. assessed the interchangeability of preference-based, indirect utility assessment instruments (the EQ-5D, HUI3, and the SF-6D) in a longitudinal study. One hundred and sixty-one patients (of the 252 initially approached) with at least one of several rheumatological conditions (51% had RA, 19% had low back pain, 14% had knee osteoarthritis, 12% had fibromyalgia and 3% had psoriatic arthritis) participated in the baseline questionnaire, and 98 patients had data both at baseline and 12 months later. Of the 98 patients. The mean scores (SD) of the instruments at baseline were 0.49 (0.31), 0.50 (0.54) and 0.62 (0.14) for the EQ-5D, HUI3 and SF-6D, respectively. Distributions of the three measures were very different, with the EQ-5D having a bimodal distribution with two gaps, one between 0.28 and 0.50 and another between 0.88 and 1.00. The HUI3 score distribution was more continuous with a wide range from -0.21 to 1.00. The SF-6D had a normal-appearing distribution. Of the three instruments, it appeared that the EQ-5D had some ceiling effects when compared to the other two instruments.
For specific domains that were analogous across the instruments, more than 97% reported decrements on the pain domains across all instruments, 42% (HUI3) to 77% (SF-6D) reported impairment in mental health, and 52% (HUI3) to 98% (SF-6D) reported impairments in mobility, ambulation, or physical functioning. Responsiveness of the three instruments from baseline to 12 months was assessed using a self-reported change question (dividing the group into "better", "same" and "worse" subgroups) and the ES. After 12 months, 41% reported their health to be better, 31% the same and 28% worse compared with baseline. Using a repeated measures ANOVA, the investigators found a significant tool effect, with significantly higher SF-6D scores, and a significant tool by time by group interaction, with the EQ-5D scores showing a significantly greater mean improvement than the other two instruments (changes of 0.15, 0.07 and 0.05 for the EQ-5D, HUI3, and the SF-6D, respectively). For the group reporting their health to be "worse", the EQ-5D showed a significantly greater mean decrease (0.19) than either the HUI3 (0.05) or the SF-6D (0.03). For the "better" and "same" groups, there were no significant differences in the ES between the instruments. However, for the "worse" group, the EQ-5D had a significantly larger ES than the HUI3 and SF-6D. The authors concluded that the instruments, although measuring a similar underlying construct, were not interchangeable and could result in substantially different estimates if used in a cost-utility analysis. The main limitation of this study was the inclusion of several disease states, which may have influenced different domains covered by the various instruments. Thus, it was difficult to separate out the performance of the instruments in any particular disease state.

Finally, the EQ-5D was compared to the HUI3 in a sample of patients with rheumatic diseases in Singapore.
Specifically, the authors compared overall utility scores, test-retest reliability, and construct validity of these instruments in 114 patients with rheumatic diseases (49 had RA, 31 had lupus, and 24 had osteoarthritis).27 Test-retest reliability was assessed using ICC values. Construct validity of the instruments was assessed by dichotomizing SF-36 scores, pain VAS scores, tender points, and the number of other acute/chronic conditions at their median values and conducting t-tests and Mann-Whitney U tests on the HUI3 or EQ-5D scores in each of these groups (i.e. mean HUI3 scores in those above and below the median of the SF-36 would be compared). Agreement between responses on analogous domains of the instruments was also examined. The test-retest reliability of the EQ-5D was 0.64 compared to 0.75 for the HUI3. The means (SD) of the preference-based scores were 0.75 (0.21) for the EQ-5D and 0.76 (0.17) for the HUI3. The correlation between the two instruments' baseline scores was 0.45. The EQ-5D system classified patients into 16 unique health states whereas the HUI3 system classified patients into 72 states. For the pain dimensions of the two instruments, 78% reported deficits on the EQ-5D while 90% reported decrements in this domain on the HUI3. As expected for both instruments, patients classified as having worse health status had lower scores than those classified as having better health status (by all of the criteria). Correlations between the two preference-based measures and the SF-36 domain scores ranged from 0.23 to 0.55 for the EQ-5D and from 0.29 to 0.49 for the HUI3 (with the highest correlation for both instruments being with the "bodily pain" domain). The authors concluded that the EQ-5D and the HUI3 performed equally well in assessing HRQL although they measured different dimensions.
As with the previous study, the inclusion of multiple disease states makes interpretation of the results difficult. In addition, despite collecting longitudinal data, responsiveness was not assessed.

2.4 QUALITY WEIGHTINGS IN THE ESTIMATION OF QALYS IN COST-UTILITY ANALYSES IN RA: WHAT ARE INVESTIGATORS USING?

With the availability of new, effective and costly pharmacotherapeutic interventions for RA, economic evaluations of these therapies are becoming more common.65-79 Increasingly, the methodology utilized to conduct these analyses falls under the cost-utility framework and an incremental cost per QALY is often calculated.2,67,68,71-74,77,79,80 This approach is supported by the publication of a recent consensus-based reference case for economic evaluations of programs or interventions in the management of RA. Recommendations outlined in the consensus document advocate the use of QALYs as outcome measures but also state that disease-specific measures could be considered. The consensus document also attempts to address which sources should be utilized for QALY weightings and states that both direct and indirect (specifically naming the EQ-5D and the HUI3) methods are acceptable. As outlined in previous sections of this chapter, the use of different quality weighting sources in the estimation of QALYs across economic evaluations (even within the same disease area) could lead to very different estimates of the incremental number of QALYs and, therefore, of the incremental cost-effectiveness (or cost-utility) ratio.
26,30,43,52,60 Although the magnitude of this potential problem has not been directly explored in an actual economic evaluation of a therapy or intervention for RA, Suarez-Almazor and Conner-Spady examined it using a hypothetical intervention and results from small surveys conducted with the EQ-5D, RS, TTO and SG techniques in the general public (n=104), patients with RA (n=51) and health professionals (n=43). Significant differences were found between the scores achieved with the different preference-based methods, both by technique and by the sample that was surveyed. As such, the incremental cost per QALY calculated using these weights for a hypothetical intervention in RA ranged from $40,000 to $220,000.

Therefore, the question arises as to what investigators have been using as preference weights for the calculation of QALYs in economic evaluations in RA. For example, if standardization has already occurred through the mutual yet independent selection of an instrument or technique that appears to be the best suited to measure elements that are germane to RA, then there may be little or no problem in this regard. However, if several different weightings are applied in the economic evaluations in the literature without an attempt to standardize the outcomes, the results would be very difficult to compare. As such, Table 2.1 was compiled to examine the different sources of QALY weights that appear in economic evaluations of RA interventions. As can be seen from Table 2.1, sources for weighting come from both directly assessed and indirectly assessed preference measures. The SG is the most commonly applied utility weighting, having been used four times in economic evaluations. The RS and EQ-5D were the next most commonly used instruments, followed by the TTO. The SF-6D and the HUI systems have not yet been applied in economic evaluations of interventions or treatments for RA.
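To make concrete why the choice of quality weights matters for the incremental cost per QALY, the following minimal sketch holds the incremental cost fixed and varies only the utility gain attributed to the intervention. All names and numbers are hypothetical assumptions chosen for illustration (a utility gain sustained over one year, no discounting); they are not taken from the studies cited above.

```python
def incremental_cost_per_qaly(delta_cost, delta_utility, years):
    """Incremental cost divided by incremental QALYs.

    Incremental QALYs are approximated as a constant utility gain
    sustained over `years` (an illustrative simplification).
    """
    delta_qalys = delta_utility * years
    if delta_qalys <= 0:
        raise ValueError("no QALY gain; the ratio is undefined")
    return delta_cost / delta_qalys


# Hypothetical inputs: same incremental cost, different utility gains
# depending on which instrument supplies the quality weights.
delta_cost = 10_000.0
utility_gain_by_instrument = {"instrument_A": 0.05, "instrument_B": 0.10}

icers = {
    name: incremental_cost_per_qaly(delta_cost, gain, years=1.0)
    for name, gain in utility_gain_by_instrument.items()
}
for name, ratio in icers.items():
    print(f"{name}: ${ratio:,.0f} per QALY")
```

Because the utility gain sits in the denominator, halving the measured gain doubles the ratio, so modest differences between instruments can move an intervention across a cost-effectiveness threshold.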
2.5 SUMMARY

The QALY is the preferred measure used to integrate both years of life and health-related quality of life into the effectiveness measure in economic evaluations. The quality weightings for QALYs are based upon HRQL measured on a scale anchored at 0 (dead) and 1.00 (full health). Preference-based weightings for the estimation of QALYs are generally recommended and can be either directly elicited or indirectly elicited by using a health classification and scoring system. Techniques to directly elicit preferences include the RS, SG, and TTO methods. Indirect measurement techniques that are widely used include the HUI2, HUI3, EQ-5D, and the SF-6D. The application of the preference-based, indirect measurement techniques is expanding due to their ease and low cost of administration when compared to the directly elicited techniques. In the literature, there have been a number of recent studies comparing the indirect preference measurement systems in a variety of disease states. Results from these studies suggest that although the instruments perform reasonably well on their own in terms of feasibility, reliability, validity and responsiveness, there are important differences among them. These differences could result in significant variation in the calculation of incremental cost-effectiveness ratios when different instruments are applied as the quality weights in the estimation of QALYs. In RA, there has been little work comparing the properties of the various indirect, preference-based, utility assessment instruments. The published economic evaluations that have included the QALY as an outcome measure make use of several different weighting sources (RS, TTO, SG, and the EQ-5D), making comparisons of outcomes between studies difficult.
In addition, there have been no studies within RA that determine the potential impact of using different indirect sources of QALY weightings on the outcomes of economic evaluations. As such, additional work is required in these areas.

2.6 REFERENCES

1. Neumann PJ, Goldie SJ, Weinstein MC. Preference-based measures in economic evaluation in health care. Annu Rev Public Health 2000;21:587-611.
2. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
3. Choi HK, Hernan MA, Seeger JD, Robins JM, Wolfe F. Methotrexate and mortality in patients with rheumatoid arthritis: A prospective study. Lancet 2002;359:1173-7.
4. Progress in cancer control over the past few decades. National Cancer Institute of Canada. Accessed on the Internet, January 25, 2004 at http://www.ncic.cancer.ca/ncic/internet/standard/0,3621,84658243_85787780_91036035_langld-en,00.html
5. Hogg RS, Heath KV, Yip B, Craib KJ, O'Shaughnessy MV, Schechter MT, Montaner JS. Improved survival among HIV-infected individuals following initiation of antiretroviral therapy. JAMA 1998;279:450-454.
6. Dolan P. The measurement of health-related quality of life for use in resource allocation decisions in health care. Chapter 32. In: Handbook of Health Economics, Vol. 1. Edited by Culyer AJ, Newhouse JP. London, U.K. Elsevier Science 2000.
7. Tengs TO, Wallace A. One thousand health-related quality of life estimates. Med Care 2000;38:583-637.
8. Baker R, Robinson A. Responses to standard gambles: are preferences 'well constructed'? Health Econ 2004;13:37-48.
9. Dolan P, Stalmeier P. The validity of time trade-off values in calculating QALYs: constant proportional time trade-off versus the proportional heuristic. J Health Econ 2003;22:445-58.
10. Hawthorne G, Richardson J, Day NA. A comparison of the Assessment of Quality of Life (AQoL) with four other generic utility instruments. Ann Med 2001;33:358-70.
11. Canadian Coordinating Office for Health Technology Assessment. Guidelines for Economic Evaluation of Pharmaceuticals, Canada. Ottawa: The Canadian Coordinating Office for Health Technology Assessment (CCOHTA), 2nd edition, 1997.
12. Horsman J, Furlong W, Feeny D, Torrance G. The Health Utilities Index (HUI®): concepts, measurement properties and applications. Health and Quality of Life Outcomes 2003;1:54 (available from http://hqlo.com/content/1/1/54).
13. Torrance GW, Feeny DH, Furlong WJ, Barr RD, Zhang Y, Wang Q. Multiattribute utility function for a comprehensive health status classification system: Health Utilities Index Mark 2. Med Care 1996;34:702-722.
14. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, DePauw S, Denton M, Boyle M. Multiattribute and single-attribute utility functions for the Health Utilities Index Mark 3 system. Med Care 2002;40:113-128.
15. Feeny DH, Torrance G, Furlong WJ. Chapter 26: Health Utilities Index. In: Quality of Life and Pharmacoeconomics in Clinical Trials, Second Edition, edited by B. Spilker, Lippincott-Raven Publishers, Philadelphia, 1996: 239-251.
16. Feeny D, Furlong W, Barr RD. A comprehensive multiattribute system for classifying the health status of survivors of childhood cancer. J Clin Oncol 1992;10:923-928.
17. Maddigan SL, Feeny DH, Johnson JA for the DOVE Investigators. Construct validity of the RAND-12 and the Health Utilities Index Mark 2 and 3 in type 2 diabetes. Qual Life Res 2004 (in press).
18. Blanchard C, Feeny D, Mahon JL, Bourne R, Rorabeck C, Stitt L, Webster-Bogaert S. Is the Health Utilities Index responsive in total hip arthroplasty patients? J Clin Epidemiol 2003;56:1046-1054.
19. Pickard AS. Responsiveness of generic health status measures in stroke. Doctor of Philosophy thesis. University of Alberta. 2002.
20. von Neumann J, Morgenstern O. Theory of games and economic behaviour. Princeton NJ: Princeton University Press, 1944.
21. Keeney RL, Raiffa H. Decisions with multiple objectives: Preferences and value tradeoffs. 2nd ed. New York, NY: Cambridge University Press, 1993.
22. Kopec JA, Willison KD. A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
23. Dominick KL, Ahern FM, Gold CH, Heller DA. Health related quality of life among older adults with arthritis. Health and Quality of Life Outcomes 2004;2:5 (available at www.hqlo.com/content/2/1/5).
24. Salaffi F, Stancati A, Carotti M. Responsiveness of health status measures and utility-based methods in patients with rheumatoid arthritis. Clin Rheumatol 2002;21:478-487.
25. Yelin E, Trupin L, Katz P, Lubeck D, Rush S, Wanke L. Association between etanercept use and employment outcomes among patients with rheumatoid arthritis. Arthritis Rheum 2003;48:3046-3054.
26. Russell AS, Conner-Spady B, Mintz A, Mallon C, Maksymowych WP. The responsiveness of generic health status measures as assessed in patients with rheumatoid arthritis receiving infliximab. J Rheumatol 2003;30:941-947.
27. Luo N, Chew LH, Fong KY, Koh DR, Ng SC, Yoon KH, Vasoo S, Li SC, Thumboo J. A comparison of the EuroQol-5D and the Health Utilities Index Mark 3 in patients with rheumatic disease. J Rheumatol 2003;30:2268-2274.
28. Grootendorst P, Feeny D, Furlong W. Health Utilities Index Mark 3: evidence of construct validity for stroke and arthritis in a population health survey. Med Care 2000;38:290-299.
29. Samsa G, Edelman D, Rothman M, Williams GR, Lipscomb J, Matchar D. Determining clinically important differences in health status measures. A general approach with illustrations to the Health Utilities Index Mark II. Pharmacoeconomics 1999;15:141-155.
30. Conner-Spady B, Suarez-Almazor ME. Variation in the estimation of quality adjusted life-years by different preference-based instruments. Med Care 2003;41:791-801.
31. Brooks R. EuroQol: the current state of play. Health Policy 1996;37:53-72.
32. Coons SJ, Rao S, Keininger DL, Hays RD. A comparative review of generic quality of life instruments. Pharmacoeconomics 2000;17:13-35.
33. Brooks R, Rabin R, de Charro F (eds.). The measurement and valuation of health status using EQ-5D: A European perspective (Evidence from the EuroQol BIOMED Research Programme). Kluwer Academic Publishers, Netherlands, 2003.
34. Dolan P. Modeling valuations for the EuroQol health states. Med Care 1997;35:1095-1108.
35. Johnson JA, Coons SJ, Ergo A, Szava-Kovats G. Valuation of EuroQol (EQ-5D) health states in an adult US sample. Pharmacoeconomics 1998;13:421-433.
36. Dorman PJ, Slattery J, Farrell B, Dennis MS, Sandercock PA. A randomised comparison of the EuroQol and Short Form-36 after stroke. United Kingdom collaborators in the International Stroke Trial. BMJ 1997;315:461.
37. Pickard AS, Weijnen ThJG, Niewenhuizen MGM, Johnson JA, de Charro FTh. A comparison of Canadian and European VAS-based valuations of EQ-5D health states (abstract). Can J Clin Pharmacol 2001;8:23.
38. Wolfe F, Hawley DJ. Measurement of the quality of life in rheumatic disorders using the EuroQol. Br J Rheumatol 1997;25:675-682.
39. Johnson JA, Pickard AS. Comparison of the EQ-5D and SF-12 health surveys in a general population survey in Alberta, Canada. Med Care 2000;38:115-121.
40. Hollingworth W, Mackenzie R, Todd CJ, Dixon AK. Measuring changes in quality of life following magnetic resonance imaging of the knee: SF-36, Euroqol or Rosser index? Qual Life Res 1995;4:325-334.
41. Essink-Bot M-L, Krabbe PFM, Bonsel GJ, Aaronson NK. An empirical comparison of four generic health status measures: The Nottingham Health Profile, the Medical Outcomes Study 36-item Short Form Health Survey, the COOP/WONCA charts and the EuroQol instrument. Med Care 1997;35:522-537.
42. Jenkinson C, Stradling J, Petersen S. How should we evaluate health status? A comparison of three methods in patients presenting with obstructive sleep apnoea. Qual Life Res 1998;7:95-100.
43. Longworth L, Bryan S. An empirical comparison of EQ-5D and SF-6D in liver transplant patients. Health Econ 2003;12:1061-1067.
44. Hurst NP, Jobanputra P, Hunter M, Lambert M, Lochhead A, Brown H. Validity of Euroqol--a generic health status instrument--in patients with rheumatoid arthritis. Economic and Health Outcomes Research Group. Br J Rheumatol 1994;33:655-662.
45. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: Validity, responsiveness and reliability of the EuroQol (EQ-5D). Br J Rheumatol 1997;36:551-559.
46. Brazier J, Usherwood T, Harper R, Thomas K. Deriving a preference-based single index from the UK SF-36 Health Survey. J Clin Epidemiol 1998;51:1115-1128.
47. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271-292.
48. Ware JE, Sherbourne CD. The MOS 36-item Short Form Health Survey (SF-36). Med Care 1992;30:473-483.
49. Walters SJ, Brazier JE. What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health and Quality of Life Outcomes 2003;1:4 (available at http://www.hqlo.com/content/1/1/4).
50. Schackman BR, Goldie SJ, Freedberg KA, Losina E, Brazier J, Weinstein MC. Comparison of health state utilities using community and patient preference weights derived from a survey of patients with HIV/AIDS. Med Decis Making 2002;22:27-38.
51. O'Brien BJ, Spath M, Blackhouse G, Severens JL, Dorian P, Brazier J. A view from the bridge: agreement between the SF-6D utility algorithm and the Health Utilities Index. Health Econ 2003;12:975-981.
52. Neumann PJ, Sandberg EA, Araki SS, Kuntz KM, Feeny D, Weinstein MC. A comparison of HUI2 and HUI3 utility scores in Alzheimer's disease. Med Decis Making 2000;20:413-422.
53. Maddigan SL, Feeny DH, Johnson JA for the DOVE Investigators. Construct validity of the RAND-12 and Health Utilities Index Mark 2 and 3 in type 2 diabetes. Qual Life Res (in press).
54. Maddigan SL, Feeny DH, Johnson JA for the DOVE Investigators. A comparison of the Health Utilities Indices Mark 2 and Mark 3 in type 2 diabetes. Med Decis Making 2003;23:489-501.
55. Blanchard C, Feeny D, Mahon JL, Bourne R, Rorabeck C, Stitt L, Webster-Bogaert S. Is the Health Utilities Index responsive in total hip arthroplasty patients? J Clin Epidemiol 2003;56:1046-1054.
56. Feeny D, Blanchard C, Mahon JL, Bourne R, Rorabeck C, Stitt L, Webster-Bogaert S. Comparing community-preference-based and direct standard gamble utility scores: evidence from elective total hip arthroplasty. Inter J Technol Assess 2003;19:362-372.
57. Feeny D, Furlong W, Saigal S, Sun J. Comparing directly measured standard gamble scores to HUI2 and HUI3 utility scores: group- and individual level comparisons. Soc Sci Med 2004;58:799-809.
58. Glick HA, Polsky D, Willke RJ, Schulman KA. A comparison of preference assessment instruments used in a clinical trial: Responses to the visual analog scale from the EuroQol EQ-5D and the Health Utilities Index. Med Decis Making 1999;19:265-275.
59. Bosch JL, Hunink MGM. Comparison of the Health Utilities Index Mark 3 (HUI3) and the EuroQol EQ-5D in patients treated for intermittent claudication. Qual Life Res 2000;9:591-601.
60. Siderowf A, Ravina B, Glick HA. Preference-based quality-of-life in patients with Parkinson's disease. Neurology 2002;59:103-108.
61. Oostenbrink R, Moll HA, Essink-Bot ML. The EQ-5D and the Health Utilities Index for permanent sequelae after meningitis. A head-to-head comparison. J Clin Epidemiol 2002;55:791-799.
62. Lubetkin EI, Gold MR. Areas of decrement in health-related quality of life (HRQL): Comparing the SF-12, EQ-5D and HUI3. Qual Life Res 2003;12:1059-1067.
63. Macran S, Weatherly H, Kind P. Measuring population health. A comparison of three generic health status measures. Med Care 2003;41:218-231.
64. Statistics Canada. A head-to-head comparison of two generic health status measures in the household population: McMaster Health Utilities Index (Mark 3) and the EQ-5D. Ottawa: Statistics Canada, 2003.
65. Anis AH, Tugwell PX, Wells GA, Stewart DG. A cost-effectiveness analysis of cyclosporine in rheumatoid arthritis. J Rheumatol 1996;23:609-616.
66. Kavanaugh A, Heudebert G, Cush J, Jain R. Cost evaluation of novel therapeutics in rheumatoid arthritis (CENTRA): a decision analysis model. Semin Arthritis Rheum 1996;25:297-307.
67. Verhoeven AC, Bibo JC, Boers M, Engel GL, van der Linden SJ. Cost-effectiveness and cost-utility of combination therapy in early rheumatoid arthritis: randomized comparison of combined step-down prednisolone, methotrexate and sulphasalazine with sulphasalazine alone. Br J Rheumatol 1998;37:1102-1109.
68. Maetzel A, Strand V, Tugwell P, Wells G, Bombardier C. Cost effectiveness of adding leflunomide to a 5-year strategy of conventional disease-modifying antirheumatic drugs in patients with rheumatoid arthritis. Arthritis Rheum 2002;47:655-661.
69. Choi HK, Seeger JD, Kuntz KM. A cost-effectiveness analysis of treatment options for patients with methotrexate-resistant rheumatoid arthritis. Arthritis Rheum 2000;43:2316-2327.
70. Choi HK, Seeger JD, Kuntz KM. A cost-effectiveness analysis of treatment options for methotrexate-naive rheumatoid arthritis. J Rheumatol 2002;29:1156-1165.
71. Wong JB, Singh G, Kavanaugh A. Estimating the cost-effectiveness of 54 weeks of infliximab for rheumatoid arthritis. Am J Med 2002;113:400-408.
72. Kobelt G, Jonsson L, Young A, Eberhardt K. The cost-effectiveness of infliximab (Remicade) in the treatment of rheumatoid arthritis in Sweden and the United Kingdom based on the ATTRACT study. Rheumatology 2003;42:326-335.
73. Brennan A, Bansback NJ, Reynolds A, Conway P. Modeling the cost-effectiveness of etanercept in adults with rheumatoid arthritis in the UK. Rheumatology 2004;43:62-72.
74. Kobelt G, Eberhardt K, Geborek P. TNF inhibitors in the treatment of rheumatoid arthritis in clinical practice: Costs and outcomes in a follow-up study of patients with RA treated with etanercept or infliximab in southern Sweden. Ann Rheum Dis 2004;63:4-10.
75. Marra CA, Esdaile JM, Anis AH. Practical pharmacogenetics: The cost-effectiveness of screening for thiopurine s-methyltransferase polymorphisms in patients with rheumatological conditions treated with azathioprine. J Rheumatol 2002;36:1851-1855.
76. Oh KT, Anis AH, Bae SC. Pharmacoeconomic analysis of thiopurine methyltransferase polymorphism screening by polymerase chain reaction for treatment with azathioprine in Korea. Rheumatology 2004;43:156-163.
77. Spiegel MRB, Targownik L, Dulai GS, Gralnek IM. The cost-effectiveness of cyclo-oxygenase-2 selective inhibitors in the management of chronic arthritis. Ann Intern Med 2003;138:795-806.
78. Lee KK, You JH, Ho JT, Suen BY, Yung MY, Lau WH, Lee WV, Sung JY, Chan FK. Economic analysis of celecoxib versus diclofenac plus omeprazole for the treatment of arthritis in patients at risk of ulcer disease. Aliment Pharmacol Ther 2003;18:217-222.
79. Maetzel A, Krahn M, Naglie G.
The cost effectiveness of rofecoxib and celecoxib in patients with osteoarthritis or rheumatoid arthritis. Arthritis Rheum 2003;49:283-292.
80. Bae S-C, Corzillius M, Kuntz KM, Liang MH. Cost-effectiveness of low dose corticosteroids versus non-steroidal anti-inflammatory drugs and COX-2 specific inhibitors in the long-term treatment of rheumatoid arthritis. Rheumatology 2003;42:46-53.
81. Maetzel A, Tugwell P, Boers M, Guillemin F, Coyle D, Drummond M, Wong JB, Gabriel SE on behalf of the OMERACT 6 Economics Research Group. Economic evaluation of programs or interventions in the management of rheumatoid arthritis: Defining a consensus based reference case. J Rheumatol 2003;30:891-896.
82. Suarez-Almazor ME, Conner-Spady B. Rating of arthritis health states by patients, physicians, and the general public. Implications for cost-utility analysis. J Rheumatol 2001;28:648-656.

TABLE 2.1: SOURCE OF PREFERENCES USED FOR QALY WEIGHTS IN ECONOMIC EVALUATIONS OF RA

Preference-Elicitation Technique; DMARD; NSAID; Decision-analysis; Clinical or Observational Trial; Reference #
Direct
  RS  ✓ ✓  67,68,71,79
  TTO  77,80
  SG  ✓ ✓  67,68,79,80
Indirect
  EQ-5D  ✓  72,73,74
  SF-6D
  HUI2
  HUI3

\"DMARD\" refers to a study examining the cost-effectiveness of a disease-modifying antirheumatic drug; \"NSAID\" refers to a study examining the cost-effectiveness of a traditional NSAID or COX-2 specific inhibitor; \"Decision analysis\" refers to the methodology used to perform the economic evaluation; \"Clinical or Observational Trial\" refers to whether the economic evaluation was conducted alongside a clinical or observational trial; \"RS\" = rating scale; \"TTO\" = time tradeoff; \"SG\" = standard gamble; \"EQ-5D\" = EuroQol index score; SF-6D = Short-Form 6D index score; HUI2 = Health Utilities Index Mark 2 index score; HUI3 = Health Utilities Index Mark 3 index score.
CHAPTER 3

A COMPARISON OF FOUR INDIRECT METHODS OF ASSESSING UTILITY VALUES IN RHEUMATOID ARTHRITIS

3.1 FOREWORD

This chapter is a cross-sectional comparison of four indirect utility instruments (HUI2, HUI3, SF-6D, EQ-5D) in a sample of patients with rheumatoid arthritis. The content of this chapter has been accepted for publication in Medical Care. The candidate is first author on this manuscript, developed the hypotheses, entered and manipulated the data, performed the statistical analyses, and wrote the final manuscript. Co-authors of the study included Daphne Guh, a statistician; Drs. Andy Chalmers and Barry Koehler, rheumatologists who participated in the recruitment of patients; Dr. John Brazier, the developer of the SF-6D; and Drs. Anis, Esdaile, and Kopec, members of the supervisory committee.

3.2 INTRODUCTION

To integrate quality of life into economic analyses, the effectiveness of a health intervention is measured using a metric known as \"utility\" where values of zero and 1.0 equal death and perfect health, respectively (some measures permit values less than zero for health states ranked worse than death). Utilities are used to calculate quality-adjusted life years (QALYs) gained by adjusting survival by the average utility weight derived from the outcome of that health intervention. Cost per QALY gained is a unique and preferred measure of the economic value of different interventions, because it permits comparison both within and across disease groups, thereby facilitating funding allocation decisions.[1]

A variety of methods exist for measuring health-related quality of life.[2] However, in order to integrate such measures into an economic evaluation, the common approach is to use QALYs as the outcome and preference-based assessments (often referred to as \"utilities\") as the source of weightings to assign quality to life-years.[1] The use of a pre-scaled index is often the most convenient and least expensive means of achieving this approach. While no validated index is available for economic evaluations specifically in musculoskeletal disease, several generic preference measures appear suitable for adaptation to economic evaluations in RA.[3,4] Examples of these instruments include the Health Utilities Index 2 and 3 (HUI2 and HUI3), the Short Form 6D (SF-6D), and the EuroQol (EQ-5D). The major characteristics of these instruments are summarized in Table 3.1 and comprehensive reviews of these instruments are available elsewhere.[1,5] It is important to point out that there is no \"gold standard\" among these instruments and each likely has its own advantages and disadvantages.

Although a few studies have examined the appropriateness of individual indirect utility instruments specifically in RA,[6-9] no study has directly compared these measures in the same RA population. However, Conner-Spady et al. recently reported on the interchangeability of preference-based instruments (the EQ-5D, the SF-6D, and the HUI3) in providing weights for QALYs.[10] Specifically, they compared the global utility scores in a sample of 161 patients with five different musculoskeletal conditions.

With the increased popularity of economic evaluations of new therapies and programmes, the impact of the choice of utility measure used to weight QALYs is uncertain. It is important to evaluate these instruments in terms of their agreement and also to identify specific deficits of preference-based measures in RA.
Thus, the primary objectives of this study were 1) to compare the global utility scores from the HUI2, HUI3, EQ-5D and the SF-6D at both a sample level and within individuals in a clinically heterogeneous sample of RA patients; and 2) to determine the extent to which global utility scores from the indirect utility assessment instruments were representative of dimensions of health status measured in a sample of RA patients.

3.3 METHODS

Three hundred and thirteen individuals participated in the study. In order to be included in the study, subjects had to have a rheumatologist-confirmed diagnosis of RA (as defined by the American College of Rheumatology diagnostic criteria),[11] receive rheumatology care within the province of British Columbia in one of the urban study areas (Vancouver, Richmond) or one of the rural study areas (Vernon and Penticton), consent to answering the questionnaires and be sufficiently proficient in English to answer the questionnaires. Recruitment of RA patients began in October 2001 and ended in September 2002. Ethical approval for this study was obtained through the University of British Columbia's Behavioural Ethics Committee and informed consent was obtained from each of the participants.

Eight private practice rheumatologists' offices from the study areas referred subjects as part of their interactions in routine clinical practice. In addition, two of these rheumatologists' practices sent letters to all of their patients with RA inviting them to participate in the survey. All patient questionnaires were self-administered, self-completed and submitted via mail. The study physicians' offices supplied additional information from the patients' health records.

3.3.1. Measures

3.3.1.1 Clinical

Participants were asked questions regarding their RA and medication history (including recent adverse events).
Other self-reported clinical variables included swollen and tender joint count (using the mannequin-based 42 joint count methodology),[12] a 10 cm pain visual analogue scale (VAS) and patient global assessment of disease activity (10 cm VAS). Erythrocyte sedimentation rate (ESR) values closest to the date of completion of the questionnaire (within 1 month) were extracted from the patient's chart for those patients whose rheumatologist used this measure for patient monitoring. In addition, the attending rheumatologists were asked to complete a physician global assessment of disease activity (10 cm VAS) for each patient.

3.3.1.2 Questionnaires

Respondents self-completed three questionnaires allowing for the scoring of four indirect utility assessment instruments (the HUI2, the HUI3, the EQ-5D, and the SF-6D).

3.3.1.3 Hypotheses

Since all the instruments purport to measure the same construct (namely, a global utility value), the values obtained with the different instruments should, in theory, agree within individual subjects. However, since each instrument has been constructed (in terms of domains assessed) and valued in different ways, we hypothesized that there would be significant differences between the instruments. In addition, we hypothesized that the global utility scores achieved with the different MAUT instruments would be represented by different dimensions of health status.

3.3.2. Data Analysis

Descriptive statistics were used to characterize the study sample. Repeated measures ANOVA was used to compare global utility values across instruments, with Bonferroni's correction to adjust for multiple comparisons between instruments. Due to the skewed distribution of some of the instruments' global utility scores, nonparametric tests were also applied. Since the results using both approaches agreed, only the results of the parametric tests are reported here.
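As a minimal sketch of the multiple-comparison adjustment described above: four instruments give six pairwise comparisons, so a Bonferroni correction divides the family-wise significance level by six. The family-wise level of 0.05 is an assumption made here for illustration; the threshold actually used is not stated in the text.

```python
from itertools import combinations

instruments = ['HUI2', 'HUI3', 'EQ-5D', 'SF-6D']

# Every unordered pair of instruments is one post hoc comparison
pairs = list(combinations(instruments, 2))

alpha = 0.05                              # assumed family-wise error rate
bonferroni_alpha = alpha / len(pairs)     # per-comparison threshold

def is_significant(p_value):
    # A comparison is declared significant only below the adjusted threshold
    return p_value < bonferroni_alpha

for first, second in pairs:
    print(f'{first} vs {second}: compare p against {bonferroni_alpha:.4f}')
```

Under this assumption, a pairwise p-value has to fall below roughly 0.0083 rather than 0.05 to be declared significant.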
Agreement among the utility scores obtained from the four instruments was assessed using the Intraclass Correlation Coefficient (ICC) with a two-way mixed effect model such that the subject effect is random and the measure effect is fixed.[13] Bland-Altman plots were used to examine patterns of inter-instrument agreement between every possible combination of instruments.[14] These plots are useful for revealing any relationship between the differences and the averages, for detecting systematic bias and for identifying possible outliers. Since, in theory, these instruments should have perfect agreement as they are attempting to measure the same global utility, the difference scores should be randomly distributed closely around the zero line. By convention, if 95% of the differences fall within zero ± 1.96 times the standard deviation of the differences and are not interpreted to be clinically important, the two methods may be used interchangeably. Minimally important differences (MID) in the utility values obtained by the MAUT instruments are thought to be between 0.03 and 0.07.[9,15,16]

Exploratory factor analysis was utilized to identify the dimensions of health measured by the questions and to determine if similar domains loaded onto the identified dimensions. Raw answers for all of the questions from the SF-6D, the EQ-5D and the HUI2/3 questionnaire were utilized for the factor analysis. Due to the skewed, categorical nature of the data, techniques based on polychoric correlations were used. Based on the nature of the instruments, it was expected that the factor analysis would identify the following dimensions: functional ability, pain, cognition, hearing/vision/sensation, and mental/emotional health. Unweighted least squares was utilized as the estimation method for factor extraction. Promax rotation was used as it allows factors to be reasonably correlated.
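The Bland-Altman quantities described earlier in this section can be sketched as follows. The paired scores are invented for illustration, the MID threshold of 0.03 is the lower bound of the range quoted above, and the limits are centred on the mean difference (the conventional form), which is an assumption about how the plots were constructed.

```python
from statistics import mean, stdev

def bland_altman(scores_a, scores_b, mid=0.03):
    # One difference and one average per participant
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    averages = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)      # systematic difference between the two instruments
    sd = stdev(diffs)       # sample standard deviation of the differences
    return {
        'averages': averages,
        'diffs': diffs,
        'bias': bias,
        'lower': bias - 1.96 * sd,
        'upper': bias + 1.96 * sd,
        'n_exceeding_mid': sum(1 for d in diffs if abs(d) > mid),
    }

# Invented utility scores for six participants on two hypothetical instruments
instrument_a = [0.90, 0.85, 0.70, 0.55, 0.40, 0.20]
instrument_b = [0.88, 0.80, 0.72, 0.65, 0.55, 0.45]

result = bland_altman(instrument_a, instrument_b)
bias, lower, upper = result['bias'], result['lower'], result['upper']
print(f'bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})')
```

Plotting the differences against the averages, with horizontal lines at the bias and the two limits, gives the usual Bland-Altman display; in this invented data set the differences widen as the average utility falls.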
The criteria used to determine the appropriate number of factors consisted of the scree test, an eigenvalue greater than one, the presence of residuals greater than 0.05 between the observed and the reproduced correlation matrices, and the overall interpretation of the solution.[17] The rotated factor pattern matrix, which represents a matrix that is uncontaminated by overlap among factors, was selected for interpretation of the factor solution.[17]

To determine the relative proportion of variation in the global utility scores explained by the factor scores produced by the exploratory factor analysis, the methods of Richardson and Zumbo[18] were adapted and each global utility score was regressed onto the saved factor scores. To determine the relative contribution of each explanatory variable (i.e., each factor) to the regression equation, a relative Pratt index score was generated. This index quantifies the contribution each independent variable makes to the overall regression equation by partitioning the model R² into the proportion attributable to each independent variable.[19] One should note that the index is based on a geometric layout of regression and does not make any assumptions regarding the distribution of the variables. In using the modified Pratt Index, we were able to determine which aspects of health were relevant to RA patients in the different overall utility scores.

3.4 RESULTS

Three hundred and thirteen (245 female) respondents with confirmed RA completed the baseline questionnaire. One hundred and ninety-seven (63%) patients were recruited directly by the study rheumatologists whereas 116 were recruited via the mail survey. The completion rates differed for direct recruitment (91%) compared to mail recruitment (38%) after accounting for invalid mailings [address problems (n=69), death (n=6), reported not having RA (n=3), and already recruited (n=1)].
The final sample represented a clinically heterogeneous cohort of patients with RA (Table 3.2). A more detailed description of our cohort is available elsewhere.[20]

3.4.1 Comparison of Utility Scores

Summaries of the instrument utility scores are presented in Table 3.3. There were few missing values in our sample for any of the four instruments. The distributions of the utility values obtained from the four instruments were markedly different (Figure 3.1). The HUI3 global utility score was the lowest of the four instruments for 151 (50%) of the participants, followed by the SF-6D utility score in 96 participants (32%). The HUI2 global utility score was the highest score in 141 participants (47%), followed by the EQ-5D score in 92 (30%) participants. Sixteen participants (5%) scored negative values on at least one measure, with the HUI3 global utility score being negative in 5% (n=15) of participants and the EQ-5D utility index in 1% (n=2). Both were negative in only one participant.

For the SF-6D, a total of 223 health states were defined (the most of any of the instruments). The most common SF-6D health vector was '121212' (n=9, 2.9%), followed by '523434' (n=6, 1.9%) and '323323' (n=5, 1.6%). No participant indicated no problems ('111111') or the worst health state ('645655').

A total of 35 different health states were described by the EQ-5D health profile (vectors). The most frequent health vector in the sample for the EQ-5D was '21221', with 51 (16.2%) indicating this response (some problems in mobility and usual activities and moderate pain/discomfort, with no problems in self-care or anxiety/depression). The other most common vectors were '11121' (n=39, 12.4%), '21222' (n=27, 8.6%), and '22222' and '22221' (both n=26, 8.3%). Twenty-three (7.3%) indicated no problems ('11111') whereas the worst health state in the sample was '33331' (n=1, 0.3%).

Using the HUI2, a total of 136 health states were defined.
The most common health states were '211112' (n=31, 10.2%), where respondents indicated that they required equipment to hear or see or speak and occasional pain without disruption of normal activities, followed by '111112' (n=12, 3.9%).

For the HUI3, a total of 217 health states were defined. Four (1%) respondents indicated no problems ('11111111') and 1% (n=5) indicated no problems except for mild or moderate pain that prevented no activities ('11111112'). The most common health state vectors were '21111112' (n=14, 4.6%), indicating normal vision with glasses and mild or moderate pain preventing no activities, and '21112112' (n=13, 4.2%), indicating normal vision with glasses, mild limitations in the use of hands and fingers and mild or moderate pain that prevented no activities. The worst health state for the HUI3 system was '51144215' (n=1, 0.3%).

Using repeated measures ANOVA, there were significant differences among the utility scores obtained by the four instruments within individuals. In examining pairwise differences between the utility scores obtained from the different instruments, all comparisons were significant (p<0.005).

3.4.2 Analysis of Agreement

The ICC for all of the measures (HUI2, HUI3, EQ-5D, SF-6D) was 0.67 (95% C.I. 0.62 to 0.71). The pairwise ICC values are summarized in Table 3.4. The Bland-Altman plots are presented in Figures 3.2-3.7. In general, for all of the plots, there appeared to be more agreement in the higher utility values compared to the lower utility values across instruments. All plots reveal that a substantial proportion of the observations in the lower utility values fall outside the area of zero ± 1.96 times the standard deviation of the differences and most of these differences between any two instruments exceed the MID.

3.4.3 Exploratory Factor Analysis

Bartlett's Test of Sphericity was used to determine whether the correlation matrix differed from the identity matrix.
The results of this test supported the use of factor analysis (p < 0.0001). There were five factors with eigenvalues greater than one; thus, a factor analysis extracting these factors was performed. This analysis accounted for 74% of the variation in the original variables. Five factors were extracted and interpreted as follows (with strongest loading indicators in brackets): 1) Factor 1 = Physical Functioning (pain, mobility/ambulation, self-care, dexterity, usual activity, physical health, role limitations, social limitations); 2) Factor 2 = Emotional/Mental Health (anxiety, happiness, and mental health); 3) Factor 3 = Speech (speaking); 4) Factor 4 = Cognition (cognition, hearing); and 5) Factor 5 = Vision (vision). The rotated factor pattern matrix showing the individual loadings from the raw questions is shown in Table 3.5. The factor correlations are shown in Table 3.6. With the exception of physical functioning/pain and emotional/mental health, there were no moderate or strong correlations among the factors.

For multiple linear regressions of the global utility scores (dependent variable), the results revealed that the physical functioning/pain factor score contributed the most in explaining the variance in all of the global utility scores. However, for the HUI2/3, cognition factor scores explained a large proportion of the variance in the global utility scores whereas the emotional/mental factor scores explained a large component of the variance in the SF-6D and EQ-5D utility scores (Table 3.7).

3.5 DISCUSSION

This is the first study to report on the results of administering and comparing the four instruments in a relatively large sample of participants with RA. The low number of missing values for each of the instruments attests to the fact that they are suitable to be self-administered in this type of cohort. There were significant differences in utility scores between instruments.
In addition, the level of agreement was much lower than theoretically postulated (i.e. an ICC approaching 1.0) with ICC values ranging from 0.56 to 0.79. From the Bland-Altman plots, agreement was much poorer at lower utility values than higher utility values. Also, it would appear that the global utility scores from the various systems are mostly measuring physical functioning/pain; however, the HUI2/3 are also measuring cognition while the SF-6D and EQ-5D are also measuring emotional/mental health. This finding is not surprising considering that the HUI systems were created to be \"within the skin\" measures; that is, they are primarily concerned with impairment rather than disability or handicap. On the other hand, the SF-6D is based on the psychometric instrument, the SF-36 [21]. As such, it is a measure of handicap and includes social and role functioning which are not assessed by the other instruments.

The SF-6D, due to its high lower boundary of +0.30, does not provide a wide range of utility values. This phenomenon is illustrated in the Bland-Altman plot comparing the global utilities from the SF-6D to the EQ-5D. However, this observation is not unexpected as Brazier stated that this instrument might be most appropriate in \"groups experiencing mild to moderate health problems and in those expected to experience comparatively small changes\" [21]. Brazier also states that one of the potential advantages of the SF-6D is the much larger size of its predictive system as compared to the EQ-5D. This appears to be the case as there were 188 more health states defined using the SF-6D compared to the EQ-5D system within our cohort. This may give the SF-6D greater sensitivity to change in longitudinal studies, especially in those who are mildly or moderately impaired.
The EQ-5D appears to have significant ceiling effects with 21% of the study participants reporting either no problems on all the domains or some problems on one domain only (with the remainder having no problems). More individuals reported having no problems under the EQ-5D classification system than reporting the lowest level of deficit on the SF-6D system. In addition, due to the limited number of health states that were reported by the sample, there was a lack of variability in the responses to the five questions. This lack of variation in the responses and low number of possible descriptive health states may impede the sensitivity of the EQ-5D in longitudinal studies when compared to the other instruments.

The HUI2 and HUI3 displayed superior agreement to the other instruments as they had the highest pair-wise ICC values. However, as shown by the Bland-Altman plot comparing these two instruments, the global utility values defined by them tend to be quite different especially at the lower end of the utility scale. Both of these systems appear to define sufficient health states to enable them to discriminate small changes between patients or within patients over time, which is in agreement with previous research [22].

There have been comparisons between instruments in other disease states. For example, there have been comparisons between the HUI2 and HUI3 in Alzheimer's disease [19], the HUI3 and the SF-6D in patients at an elevated risk of sudden cardiac death [23], and the SF-6D, HUI3, EQ-5D, the Assessment of Quality of Life Questionnaire (AQoL), and the Finnish 15D in a sample of Australian Residents [24]. Generally, the results of these studies agreed with ours in that there were significantly different values achieved both between instruments and within individuals using the different instruments.

There are several limitations to this study.
First of all, the scoring functions for both the EQ-5D and SF-6D global utility values were derived from samples of the UK population which may differ from preferences given by those in Canada [21,25]. Secondly, although the SF-6D and the HUI systems utilized the standard gamble as a valuation technique for health states described by their attributes, there are several differences with how the health states are valued. For example, the SF-6D health states were valued directly, whereas the HUI health states were directly valued by a rating scale that was transformed (using a power function) into a standard gamble utility. The EQ-5D health states were valued using the time trade-off technique. Also, the HUI systems utilize multiattribute utility theory and a multiplicative model of scoring whereas the SF-6D and EQ-5D use empiric methods and additive scoring models. Thus, differences between the health utilities obtained may be confounded by the utility evaluation and valuation methodology. In addition, we have studied a single cohort with a specific disease state thus limiting the generalizability to other diseases.

As we only analyzed cross-sectional data, it is not clear if changes in the utility values obtained from the MAUT instruments would be similar. If these changes are similar across instruments, O'Brien et al. postulate that this would increase the comfort level for inter-study comparability of QALYs [20]. Further research examining the longitudinal validity (responsiveness and sensitivity to change) of these MAUT instruments is presently ongoing. The results from this research will further guide which of these instruments is the most appropriate to utilize in the assessment of RA-related outcomes.
A recent study comparing the MAUT instruments in a sample from the Australian general population recommended that researchers should select a MAUT instrument that is sensitive to the health states that are being investigated [24].

Although it is good practice in quality of life research to randomize the order of administration of questionnaires to enable the evaluation of an \"order effect\", we did not randomize the sequence of questionnaires. In piloting of the survey in the first 25 patients, we found that patients were not completing the questionnaires in the order in which they were presented or in any other identifiable systematic pattern. Thus, a formal assessment of an \"ordering effect\" was not feasible.

In conclusion, the utility values obtained from the four MAUT instruments were statistically and clinically different. The SF-6D is bounded by a relatively high floor (+0.30) and the EQ-5D displayed important ceiling effects that could limit their ability to detect change in patients at the higher or lower limits of the utility scale. The HUI2 and HUI3 did not appear to suffer from these limitations. It is unlikely that the utility values from the MAUT instruments tested, if used as the weightings for QALYs in studies examining RA, would result in comparable estimates.

3.6 REFERENCES

1. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
2. Streiner DL, Norman GR. Health measurement scales. 2nd edition. Oxford Medical Publications, Oxford. 1995.
3. Green C, Brazier J, Deverill M. Valuing health-related quality of life. Pharmacoeconomics 2000;17:151-165.
4. Coons SJ, Rao S, Keininger DL, Hays RD. A comparative review of generic quality-of-life instruments. Pharmacoeconomics 2000;17(1):13-35.
5. Kopec JA, Willison KD. A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
6. Verhoeven AC, Boers M, van der Linden. Responsiveness of the core set, response criteria, and utilities in early RA. Ann Rheum Dis 2000;59:966-974.
7. Hurst NP, Kind P, Ruta D, et al. Measuring health-related quality of life in rheumatoid arthritis: Validity, responsiveness, and reliability of EuroQOL (EQ-5D). Br J Rheumatol 1997;36:551-559.
8. Wolfe F, Hawley DJ. Measurement of the quality of life in rheumatic disorders using the EuroQOL. Br J Rheumatol 1997;36:786-793.
9. Grootendorst P, Feeny D, Furlong W. Health Utilities Index Mark 3: evidence of construct validity for stroke and arthritis in a population health survey. Med Care 2000;38:290-299.
10. Conner-Spady B, Suarez-Almazor ME. Variation in the estimation of quality-adjusted life-years by different preference-based measures. Med Care 2003;41:791-801.
11. Arnett FC, Edworthy SM, Bloch DA, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315-324.
12. Wong AL, Wong WK, Harker J, et al. Patient self-report tender and swollen joint counts in early rheumatoid arthritis. Western Consortium of Practicing Rheumatologists. J Rheumatol 1999;26:2551-2561.
13. Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin 1979;2:420-428.
14. Altman DG, Bland JM. Measurement in medicine: The analysis of method comparison studies. The Statistician 1983;32:307-317.
15. Drummond MF. Introducing economic and quality of life measurements into clinical studies. Ann Med 2001;33:344-349.
16. Samsa G, Edelman D, Rothman ML, et al. Determining clinically important differences in health status measures. A general approach with illustration to the Health Utilities Index Mark II. Pharmacoeconomics 1999;15:141-155.
17. Tabachnick BG, Fidell LS. Using multivariate statistics (3rd Edition). Harper Collins Publishers, Inc. New York, New York. 1996.
18. Richardson CG, Zumbo BD. A statistical examination of the Health Utility Index-Mark III as a summary measure of health status for a general population health survey. Social Indicators Research 2000;51:171-191.
19. Thomas DR, Hughes E, Zumbo BD. On variable importance in linear regression. Social Indicators Research 1998;45:253-275.
20. Marra CA, Woolcott JC, Shojania K, et al. An assessment of the construct validity of four indirect utility measures in rheumatoid arthritis. Soc Sci Med (submitted).
21. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271-292.
22. Neumann PJ, Sandberg EA, Araki SS, et al. A comparison of HUI2 and HUI3 utility scores in Alzheimer's disease. Med Decis Making 2000;20:413-422.
23. O'Brien BJ, Spath M, Blackhouse G, et al. A view from the bridge: agreement between the SF-6D utility algorithm and the Health Utilities Index. Health Econ 2003;12:975-981.
24. Hawthorne G, Richardson J, Atherton Day N. A comparison of the Assessment of Quality of Life (AQOL) with four other generic utility instruments. Ann Med 2001;33:358-370.
25. Dolan P. Modeling valuations for EuroQol health states. Medical Care 1997;35:1095-1108.
TABLE 3.1: COMPARISON OF THE INDIRECT UTILITY ASSESSMENT INSTRUMENTS

Instrument | Dimensions/Domains/Attributes | # of Possible Health States | Valuation Technique | Boundaries
HUI2 | Sensation (vision, hearing, speech), Mobility, Emotion, Cognition, Self-care, Pain | 24,000 | Standard Gamble | -0.03 to 1.00
HUI3 | Vision, Hearing, Speech, Ambulation, Dexterity, Emotion, Cognition, Pain | 972,000 | Standard Gamble | -0.36 to 1.00
SF-6D | Physical Function, Role Limitation, Social Function, Pain, Mental Health, Vitality | 18,000 | Standard Gamble | 0.30 to 1.00
EQ-5D | Mobility, Usual Activities, Self-Care, Pain, Anxiety | 243 | Time Trade Off | -0.59 to 1.00

TABLE 3.2: CLINICAL CHARACTERISTICS OF THE STUDY PARTICIPANTS

Parameter | Mean | SD*
Age (yrs) | 61.5 (25.9) | 26
ESR | 24.52 (21.02) | 21
RAQoL (range 0-30) | 12.82 (8.28) | 8
RA Duration (yrs) | 13.87 (11.41) | 11
HAQ | 1.10 (0.77) | 0.8
Pt. Global Assessment | 59.82 (25.86) | 26
MD Global Assessment | 20.88 (23.39) | 23
Pain VAS | 43.12 (27.02) | 27
EQ-5D Health Thermometer | 65.02 (19.27) | 19
Tender Joint Count | 15.09 (11.99) | 12
Swollen Joint Count | 9.14 (9.66) | 10

Self-Reported RA Severity | N | %
Very Mild | 9 | 2.9%
Mild | 34 | 10.9%
Moderate | 120 | 38.3%
Severe | 110 | 35.1%
Very Severe | 27 | 8.6%
Missing | 13 | 4.2%

Self-Reported RA Control | N | %
Very Well Controlled | 33 | 10.6%
Well Controlled | 76 | 24.3%
Adequately Controlled | 123 | 39.3%
Not Well Controlled | 61 | 19.5%
Not Controlled At All | 7 | 2.2%
Missing | 13 | 4.2%

* SD = Standard Deviation

TABLE 3.3: OVERALL MEAN AND MEDIAN UTILITY SCORES FROM THE INSTRUMENTS IN THE SAMPLE OF RA PATIENTS

Instrument | Mean* | 95% CI Lower | 95% CI Upper | Median | IQR* | N
SF-6D | 0.63 | 0.61 | 0.64 | 0.60 | 0.12 | 302
EQ-5D | 0.66 | 0.63 | 0.69 | 0.74 | 0.19 | 308
HUI2 | 0.71 | 0.69 | 0.73 | 0.75 | 0.28 | 304
HUI3 | 0.53 | 0.50 | 0.57 | 0.56 | 0.44 | 303

* IQR = interquartile range
* P-value <0.0001 for comparison of mean utility scores using repeated measures ANOVA (based on n=302)
* P<0.005 for all pairwise comparisons with Bonferroni correction

TABLE 3.4: INTRACLASS CORRELATIONS AND 95% CONFIDENCE INTERVALS BETWEEN INSTRUMENTS

Instrument | HUI2 | HUI3 | SF-6D | EQ-5D
HUI2 | 1.00 | 0.79 (0.74-0.83) | 0.66 (0.54-0.72) | 0.68 (0.61-0.74)
HUI3 | | 1.00 | 0.56 (0.48-0.64) | 0.66 (0.59-0.72)
SF-6D | | | 1.00 | 0.59 (0.51-0.66)
EQ-5D | | | | 1.00

TABLE 3.5: ROTATED FACTOR PATTERN MATRIX

Indicator (Questionnaire)* | Loadings (table columns: Factor 1, 2, 3, 4, 5)
Pain (SF-6D) | 0.88
Mobility (EQ-5D) | 0.86
Self-care (EQ-5D) | 0.86
Pain (HUI2/3) | 0.83
Pain (EQ-5D) | 0.81
Physical Functioning (SF-6D) | 0.79
Pain with medications (HUI2/3) | 0.78
Ambulation (HUI2/3) | 0.77
Self-care (HUI2/3) | 0.76
Social Limitations (SF-6D) | 0.73, 0.33
Dexterity (HUI2/3) | 0.68
Vitality (SF-6D) | 0.62, 0.32
Role Limitations | 0.60, 0.43
Emotion (HUI2/3) | 0.89
Anxiety (EQ-5D) | 0.85
Mental Health (SF-6D) | 0.77
Happiness (HUI2/3) | 0.69
Speaking to those you know (HUI2/3) | 0.84
Speaking to those you don't know (HUI2/3) | 0.93
Memory (HUI2/3) | 0.37, 0.72
Think/Solve Problems (HUI2/3) | 0.51, 0.70
Hearing (HUI2/3) | 0.63
Vision - Reading (HUI2/3) | 0.86
Vision - Recognizing a friend (HUI2/3) | 0.84

* Indicators with loadings less than 0.30 have been omitted to improve interpretability

TABLE 3.6: FACTOR CORRELATION MATRIX

Factor | 1 | 2 | 3 | 4 | 5
Physical Functioning/Pain | 1.00
Emotional/Mental Health | -0.32 | 1.00
Speech | -0.08 | -0.16 | 1.00
Cognition | -0.15 | -0.10 | -0.15 | 1.00
Vision | -0.15 | -0.05 | -0.07 | -0.04 | 1.00

TABLE 3.7: RELATIVE PRATT INDEX SCORES ASSESSING RELATIVE CONTRIBUTION OF EACH FACTOR TO THE MODEL'S ADJUSTED R²

Instrument, Factor | Pearson Correlation | Standardized Beta Weight | Relative Pratt Index Score* | Model Adjusted R-square
HUI2 | | | | 0.78
Physical Functioning/Pain | -0.82 | -0.75 | 0.79 |
Cognition | -0.50 | -0.33 | 0.21 |
HUI3 | | | | 0.86
Physical Functioning/Pain | -0.86 | -0.77 | 0.77 |
Cognition | -0.53 | -0.36 | 0.23 |
SF-6D | | | | 0.83
Physical Functioning/Pain | -0.90 | -0.77 | 0.83 |
Emotional/Mental Health | -0.68 | -0.21 | 0.17 |
EQ-5D | | | | 0.51
Physical Functioning/Pain | -0.63 | -0.71 | 0.87 |
Emotional/Mental Health | -0.55 | -0.13 | 0.13 |

* The Relative Pratt Index represents the proportion of the model R-square that is explained by the variable

FIGURE 3.1: DISTRIBUTIONS OF GLOBAL UTILITY VALUES ACROSS THE MAUT INSTRUMENTS
[Four histogram panels: HUI2, HUI3, EQ-5D, SF-6D; horizontal axis: Global Utility Values]

FIGURE 3.2: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND HUI3 VS. THE AVERAGE SCORE WITHIN PATIENTS
[Plot; horizontal axis: Average of HUI2 and HUI3]
The dotted lines represent ±1.96 times the standard deviation around the difference. The clustering of the data on the right around zero, compared to that on the left, indicates that the two utility scores have better agreement in those with higher values and that, at lower values, the HUI2 yields much higher scores than the HUI3.

FIGURE 3.3: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI3 AND THE SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS
[Plot; horizontal axis: Average of HUI3 and SF-6D]
The dotted lines represent ±1.96 times the standard deviation around the difference. The pattern of observations reveals that for higher utility values, the SF-6D tends to have lower scores than the HUI3 whereas the converse is true at lower utility values (which is expected since the SF-6D is bounded at +0.30).

FIGURE 3.4: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI3 AND EQ-5D VS. THE AVERAGE SCORE OF THESE TWO INSTRUMENTS WITHIN PATIENTS
[Plot; horizontal axis: Average of HUI3 and EQ-5D]
The dotted lines represent ±1.96 times the standard deviation around the difference. From the pattern of the points, it appears that the EQ-5D is lower than the HUI3 at higher values but this relationship reverses at lower values. Also, the odd linear patterns of the points are due to the gaps in the EQ-5D scoring system.

FIGURE 3.5: BLAND-ALTMAN PLOT OF THE DIFFERENCE BETWEEN THE EQ-5D AND SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS
[Plot; horizontal axis: Average of EQ-5D and SF-6D]
The dotted lines represent ±1.96 times the standard deviation around the difference. There would appear to be higher agreement on the right compared to the left, indicating that the two utility scores have better agreement in those with higher values and that, at lower values, the SF-6D yields much higher scores than the EQ-5D.

FIGURE 3.6: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND THE SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS

0.8). Comparing the effect sizes across the different indirect utility and disease-specific instruments allowed for a comparison in the instruments' abilities to discriminate between groups of different disease severity, with a larger effect size indicating better discriminative ability.
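The effect-size comparison described above can be sketched as follows. The exact effect-size formula is not visible in this part of the text, so the sketch assumes Cohen's d with a pooled standard deviation; the function name and the utility values are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    # Standardized mean difference between two groups, using the pooled
    # sample variance as the denominator (assumed form of the effect size).
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical global utility scores for two severity groups.
lower_severity = [0.80, 0.70, 0.90, 0.60]
higher_severity = [0.50, 0.40, 0.60, 0.30]

d = cohens_d(lower_severity, higher_severity)
```

Computing the same statistic for each instrument on the same two groups gives values that can be compared directly, since d is expressed in standard deviation units rather than in each instrument's own scale.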
To further assess and compare the construct validity among the instruments, relationships between continuous clinical variables and the MAUT global utilities, the single attribute utilities, the HAQ and the RAQoL were assessed with Spearman's correlations. It was postulated that strong correlations/relationships would exist between the overall scores from all the instruments and measures of RA severity. For the single attribute utility scores, it was postulated that strong correlations/relationships would exist between mobility, self-care and pain (from the HUI2), ambulation, dexterity and pain (from the HUI3), physical functioning, role limitations, pain and vitality (from the SF-6D), mobility, self-care, usual activities, and pain/discomfort (from the EQ-5D) and the continuous measures of RA severity. Due to the skewed nature of the data, non-parametric correlations (Spearman's rho) were calculated. A Spearman's rho of > 0.50 or < -0.50 was considered to be strong, while values between -0.49 and -0.30 or between 0.30 and 0.49 were considered moderate and values between -0.30 and 0.30 were considered to be weak [25].

According to the methods outlined by Samsa et al. [26], MID values were calculated for each of the MAUT instruments using the calculated effect sizes. In brief, the methodology is as follows: 1) using Cohen's criteria [24], the absolute value of the effects under consideration were considered to be small (d=0.20); 2) the standard deviation of the global utility from the instruments were determined (Table 4.2); 3) a preliminary estimate of the MID was determined by multiplying the effect size (0.2) by the standard deviation for each of the summary scores. Differences in scores for each dichotomous clinical parameter were compared to the MID estimate to determine if they were clinically important.
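The three steps above amount to a single multiplication followed by a comparison. The sketch below is illustrative only: the function names and the standard deviation are hypothetical, and the actual standard deviations are those reported in Table 4.2.

```python
SMALL_EFFECT = 0.2  # Cohen's criterion for a small effect (d = 0.20)


def mid_from_effect_size(sd, effect_size=SMALL_EFFECT):
    # Preliminary MID estimate: small effect size times the standard
    # deviation of the instrument's summary score.
    if sd < 0:
        raise ValueError('sd must be non-negative')
    return effect_size * sd


def exceeds_mid(mean_a, mean_b, mid):
    # A between-group difference is treated as clinically important when
    # its absolute value is at least as large as the MID estimate.
    return abs(mean_a - mean_b) >= mid


hypothetical_sd = 0.25  # assumed value, not taken from Table 4.2
mid = mid_from_effect_size(hypothetical_sd)
```

With this assumed standard deviation the estimate is 0.05 utility units, so a hypothetical group difference of 0.10 would count as clinically important while a difference of 0.02 would not.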
Thus, our hypothesis was that, using the methods of estimating effect size-based clinically important differences (CID) from cross-sectional data, the differences in overall instrument scores between groups of different RA severity would exceed the minimally important difference (MID).

The HAQ was also utilized as a means to estimate the MID for the overall utility scores. Using simple, ordinary least squares (OLS), linear regression, the MAUT overall utility scores and the RAQoL score (independent variables) were regressed on the HAQ score (dependent variable). Since it is well accepted that the MID for the HAQ is 0.25, the MID for each instrument was estimated from the beta coefficient from the regression model to produce a 0.25 change in HAQ score.

Descriptive statistics were used to characterize the study sample. Parametric tests (t-tests and ANOVA) were used to test for important differences between those with complete and those with missing data from the MAUT instruments. Both parametric (t-tests and ANOVA) and non-parametric (Mann Whitney tests and Kruskal-Wallis tests) tests were used due to the skewed nature of the data. However, since the parametric and non-parametric approaches agreed, only the results of the parametric tests are reported.

4.4 RESULTS

4.4.1 Sample

Three hundred and thirteen (245 female) respondents with confirmed RA completed the baseline questionnaire. One hundred and ninety seven (63%) patients were recruited directly by the study rheumatologists whereas 116 were recruited via mail. The completion rates of the surveys differed according to the method of recruitment. For direct recruitment by a study rheumatologist, 91% completed the baseline questionnaire, whereas for recruitment by mail, there was a 38% completion rate after accounting for invalid mailings (returned due to address problems (n=69), patient had died (n=6), patient did not have RA (n=3), and patient already recruited by a different rheumatologist (n=1)).
Those recruited by mail tended to be older (62 vs. 58 years, p=0.01), had RA for a longer period of time (15 vs. 10 years, p=0.0002), had better perceived control of their RA (16% vs. 26% rated as \"not well controlled\" or \"not controlled\" at all, p=0.03) and included more females (84% vs. 74%, p=0.03) than those recruited directly by a rheumatologist.

The final sample represented a clinically heterogeneous cohort of patients with RA (Table 4.2). There were few missing values in the HRQL or clinical variables. For the HRQL questionnaires, the lowest completion rate was for the SF-6D with 11 (<4%) missing values. There were no significant differences in demographic or RA characteristics identified between those with complete and missing values.

4.4.2 Description of Global and Single-Attribute Utilities

In Table 4.3, a summary of the results of the multiattribute and single attribute utility values for the MAUT instruments is displayed. In Table 4.4, the specific domain responses for the MAUT instruments are given. A comparison of the distributions of the utility scores is illustrated in Figure 4.1.

4.4.3 Construct Validity

As hypothesized, all of the MAUT instrument global utility scores were lower in groups thought to have higher RA severity and most of these relationships were statistically significant (Tables 4.5 and 4.6). For self-reported disease-severity and control (each with five categories of responses), there was a gradient across all the instruments' global scores with the highest level of severity/control having the lowest utility and vice versa (Table 4.5). The Spearman correlation coefficients were very similar across both the disease-specific and generic instruments. Other relationships between dichotomous, disease severity indicators and the summary scores for each of the instruments are shown in Table 4.6.
For all of the severity variables, the hypothesized relationship of a better score (a higher global utility for the MAUT instruments and lower scores for the disease-specific instruments) was found to be valid. Of these, 29 of the 36 were significant at p<0.05. For the effect size analysis, 32 out of the 36 calculated effect sizes exceeded Cohen's lower limit of 0.2 (Table 4.6). The HAQ and RAQoL were generally better able to discriminate among the groups of lower and higher severity as indicated by the larger effect sizes. However, all of the MAUT instruments appeared to have discriminative ability, with the HUI3 having the largest magnitude in overall differences across the dichotomous severity measures.

For the correlation analysis between the multi-attribute and single-attribute utility scores and the disease severity measures, all the expected correlations were in the hypothesized direction and most were highly significant. Strong correlations were observed consistently with the RAQoL score, the patient global VAS, the HAQ disability score, and the pain VAS with the MAUT global utility scores (Table 4.7). For the single attribute utility scores that were postulated to be highly correlated, consistent strong correlations existed only with the RAQoL scores and the HAQ disability scores. Generally, with the exception of the pain/discomfort single attribute scores from the EQ-5D, the pain VAS was also strongly correlated with the pain single attribute scores, the mobility single attribute score from the HUI2, and the physical functioning and the role limitations single attribute scores from the SF-6D. The patient global VAS was strongly correlated with the pain single attribute scores from the HUI2, HUI3, and the SF-6D (along with the physical functioning and role limitations single attribute scores from this measure) and the usual activities single attribute score from the EQ-5D.
For the disease specific measures, the RAQoL score was strongly correlated with all of the disease severity measures with the exception of RA duration, whereas the HAQ displayed a similar correlation pattern as the global utility scores with strong correlations with the RAQoL, the patient global VAS, and the pain VAS.

Using the effect size methodology to estimate the MID, these values for each of the MAUT instruments were 0.04 for the HUI2, 0.06 for the HUI3, 0.03 for the SF-6D, and 0.05 for the EQ-5D. For the disease-specific measures, the estimated MID was 1.70 for the RAQoL and 0.15 for the HAQ disability index. As can be seen in Table 4.6, the differences in global scores between naturally occurring groups based on clinical characteristics generally exceed these MID estimates.

Finally, the results of the simple, linear regression revealed strong associations between the MAUT instruments' global utility values and the RAQoL (dependent variables) and the HAQ disability index (Table 4.8). In estimating the MID of the MAUT instruments and the RAQoL using the accepted MID of the HAQ and the beta coefficients from the linear regression, it was found that the MID estimates were in general agreement with those determined by the effect size methodology (0.04 vs. 0.03 for the HUI2, 0.07 vs. 0.06 for the HUI3, 0.03 and 0.05 for the SF-6D and the EQ-5D, respectively). For the RAQoL, both methodologies yielded similar results (1.70 and 2.0).

4.5 DISCUSSION

This is the first study to examine the construct validity of these four generic, MAUT instruments simultaneously in a relatively large cohort of participants with a single, well-defined, chronic disease. In addition, it is the first to compare the generic MAUT instruments to two disease specific measures (the HAQ and the RAQoL) in their relative abilities to discriminate across RA severity.
Finally, the estimates of the MID values from each of the instruments both serve as a comparison with prior MID values estimated in the literature (HUI2 and HUI3, and the HAQ)⁴,²⁰,²¹ and provide new information for those instruments without prior MID estimates (SF-6D, EQ-5D, and the RAQoL). The low number of missing values (all < 4%) for each of the instruments attests that they are suitable for self-administration. Overall, all the instruments tend to discriminate across disease severity based on multiple criteria, with worse scores being associated with measures indicating a higher severity of RA. The differences in overall scores across known, naturally occurring groups (Table 4.6) generally support construct validity for each of the instruments. However, there were some important differences among the generic MAUT instruments. For example, only the EQ-5D and the SF-6D overall scores were significantly different between those experiencing adverse events to RA drug therapy over the previous three months and those who did not. Similarly, of the MAUT instruments, only the HUI2 and HUI3 overall scores were significantly different between groups with and without RA hospitalization and other chronic diseases. It is not clear why these differences among the instrument scores exist, but they may be due to the different aspects of health that are represented by each of the systems. Of note, differences across groups based on RA severity were consistently significant for the RAQoL and the HAQ disability index (with the exception of adverse events for the latter). Contrary to the hypothesis, strong correlations between overall instrument scores or selected single attribute utility scores and RA duration were not observed. Similarly, few strong correlations were observed between SJC or TJC and the scale results.
All other hypothesized relationships were found to be significant. The duration of RA is a somewhat imprecise measure of severity, as some patients have severe, aggressive disease from the onset and others have a slow, insidious disease process. As shown in Table 4.4, more individuals reported having no problems under the EQ-5D and the HUI2 classification systems than reported the lowest level of deficit on the other systems. This lack of variation in the responses and the low number of possible descriptive health states may impede the sensitivity of the EQ-5D and HUI2 in longitudinal studies when compared to the other instruments. Conversely, both the SF-6D and the HUI3 tended to have responses across the full range of severity for most of the domains assessed. Therefore, the SF-6D and the HUI3 may have a higher degree of sensitivity to the disease burden of RA. The estimates of MID that we obtained using the effect-size and regression methodologies closely agreed for the HUI2 and HUI3 (within 0.01) and were exactly the same for the SF-6D and the EQ-5D. However, although the HUI2 MID estimates agree with what has been postulated in the literature (Drummond stated that a difference of 0.03 in utility values should be used as the basis of sample size calculations for the HUI2 and HUI3),²⁷ the estimates we obtained for the HUI3 MID were higher at 0.06 and 0.07. Moreover, the MID for the SF-6D estimated by Brazier et al. was identical to those that we obtained using different methodologies.²³ However, using any of the criteria, it would appear that, for the global utility values, the differences between groups in Table 4.6 exceed the MID. The results of this study lend support to the construct validity of all of the generic MAUT instruments and the disease-specific instruments in RA and provide some detail regarding their limitations and strengths.
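The two MID approaches compared above can be sketched numerically. The sketch assumes that the effect-size MID is 0.2 times the baseline standard deviation (an assumption that reproduces the reported values from the SDs in Table 4.3, but is not stated explicitly in this section), and that the regression-based MID is the absolute beta coefficient from Table 4.8 multiplied by the accepted HAQ MID of 0.25.

```python
SMALL_EFFECT = 0.2   # assumed multiplier for the effect-size approach
HAQ_MID = 0.25       # accepted HAQ MID quoted in Table 4.8

# Baseline SDs of the global utility scores (Table 4.3)
BASELINE_SD = {"HUI2": 0.20, "HUI3": 0.29, "SF-6D": 0.13, "EQ-5D": 0.24}

# Change in global utility per 1.0 change in HAQ (beta coefficients, Table 4.8)
BETA_PER_HAQ_UNIT = {"HUI2": -0.16, "HUI3": -0.29, "SF-6D": -0.12, "EQ-5D": -0.19}


def mid_effect_size(sd):
    """MID as a small effect size expressed in score units (assumed rule)."""
    return SMALL_EFFECT * sd


def mid_regression(beta):
    """MID implied by anchoring on the HAQ MID through the regression slope."""
    return abs(beta) * HAQ_MID


for name in BASELINE_SD:
    es = mid_effect_size(BASELINE_SD[name])
    reg = mid_regression(BETA_PER_HAQ_UNIT[name])
    print(f"{name}: effect-size MID = {es:.3f}, regression MID = {reg:.3f}")
```

Both estimates depend on inputs rounded to two decimals, so differences of about 0.01 between the two approaches should not be over-interpreted.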
As mentioned, Brazier et al.'s concern is that the score of a preference-based measure may fail to detect a hypothesized difference either because the difference is not valued by patients or because the scoring system is insensitive. We believe that we have chosen health states in RA that would clearly result in hypothesized differences in preferences. Moreover, we did see differences among these health states in the anticipated directions, which would appear to address Brazier's concern that these instruments might not identify changes appropriately. Another limitation was that most of the data obtained were self-reported without subsequent verification from clinical records. Thus, it is possible that study participants did not accurately or objectively describe the severity of their RA. However, we believe that the risk of this is low, as previous research has shown that patient self-reporting of symptoms is valid and reliable in RA.¹⁹ In conclusion, the overall and certain single-attribute scores of both the generic MAUT instruments and the disease-specific instruments are all able to distinguish between groups that were defined by measures of RA severity. As expected, the disease-specific instruments appeared to be slightly superior to the generic measures; however, the MAUT instruments appeared to have construct validity for RA. Effect-size and regression-based estimates of the MID for the instruments agreed, providing a comparison with the MID values previously postulated in the literature and new information for those measures without prior estimates. Most of the differences across RA severity in our cohort exceeded the MID for overall scores using any criteria.

REFERENCES

1. Tijhuis GJ, de Jong Z, Zwinderman AH, Zuijderduin WM, Jansen LMA, et al. The validity of the Rheumatoid Arthritis Quality of Life (RAQoL) questionnaire. Rheumatology 2001;40:1112-1119.
2. Guyatt GH, Feeny DH, Patrick DL. Measuring health-related quality of life. Ann Intern Med 1993;118:622-629.
3. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
4. Torrance GW, Feeny DH, Furlong WJ, Barr RD, Zhang Y, Wang Q. Multiattribute utility function for a comprehensive health status classification system: Health Utilities Mark 2. Medical Care 1996;34:702-722.
5. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, et al. Multiattribute and single-attribute utility functions for the Health Utilities Index Mark 3 system. Medical Care 2002;40:113-128.
6. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271-292.
7. The EuroQol Group. EuroQoL - a new facility for the measurement of health-related quality of life. Health Policy 1990;16:199-208.
8. Lubeck DP. Health-related quality of life measurements and studies in rheumatoid arthritis. Am J Manag Care 2002;8:811-820.
9. Bruce B, Fries JF. The Stanford Health Assessment Questionnaire: a review of its history, issues, progress, and documentation. J Rheumatol 2003;30:167-178.
10. De Jong Z, Van Der Heijde, McKenna SP, Whalley D. The reliability and construct validity of the RAQoL: A rheumatoid arthritis-specific quality of life instrument. Br J Rheumatol 1997;36:878-883.
11. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. 2nd Edition. Oxford University Press, New York, 1995.
12. Brazier J, Deverill M. A checklist for judging preference-based measures of health related quality of life: Learning from psychometrics. Health Econ 1999;8:41-51.
13. Maddigan SL, Feeny DH, Johnson JA. Construct validity of the RAND-12 and Health Utilities Index Mark 2 and 3 in Type 2 diabetes. Qual Life Res (in press).
14. Grootendorst P, Feeny D, Furlong W. Health Utilities Index Mark 3: evidence of construct validity for stroke and arthritis in a population health survey. Med Care 2000;38:290-299.
15. Neumann PJ, Sandberg EA, Araki SS, Kuntz KM, Feeny D, Weinstein MC. A comparison of HUI2 and HUI3 utility scores in Alzheimer's disease. Med Decis Making 2000;20:413-422.
16. Bosch JL, Hunink MGM. Comparison of the Health Utilities Mark 3 (HUI3) and the EuroQol EQ-5D in patients treated for intermittent claudication. Quality of Life Research 2000;9:591-601.
17. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: Validity, responsiveness, and reliability of EuroQOL (EQ-5D). Br J Rheumatol 1997;36:551-559.
18. Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315-324.
19. Wong AL, Wong WK, Harker J, Sterz M, Bulpitt K, Park G, Ramos B, Clements P, Paulus H. Patient self-report tender and swollen joint counts in early rheumatoid arthritis. Western Consortium of Practicing Rheumatologists. J Rheumatol 1999;26:2551-2561.
20. Redelmeier DA, Lorig K. Assessing the clinical importance of symptomatic improvements: an illustration in rheumatology. Arch Intern Med 1993;153:1337-1342.
21. Wells GA, Tugwell P, Kraag GR, Baker PR, Groh J, Redelmeier DA. Minimum important difference between patients with rheumatoid arthritis: the patient's perspective. J Rheumatol 1993;20:557-560.
22. Johnson JA. (2003) Personal communication.
23. Walters SJ, Brazier JE. What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health Qual Life Outcomes 2003;11:4-12.
24. Cohen J. Statistical power analysis for the behavioural sciences. 2nd ed.
Hillsdale (NJ): Lawrence Erlbaum Assoc., 1988. 25. Cohen J. A power primer. Psychol B u l l 1992;112;155-159. 26. Samsa G , Edelman D , Rothman M L , Will iams G R , Lipscomb J, et al.. Determining clinically important differences in health status measures. A general approach with 125 illustration to the Health Utilities Index Mark II. Pharmacoeconomics 1999;15:141-155. Drummond M . Introducing economic and quality of life measurements in clinical studies. A n n M e d 2001;33:344-349. 126 TABLE 4.1: OVERVIEW OF MAUT INSTRUMENT PROPERTIES Dimensions/Domains/Attributes #of Possible Health States Valuation Technique Boundaries HUI2 Sensation (vision, hearing, speech), Mobil i ty , Emotion Cognition, Self-care, Pain 24,000 Standard Gamble -0 .03-1 .00 HUI3 Vis ion , Hearing, Speech, Ambulation, Dexterity, Emotion, Cognition, Pain 972,000 Standard Gamble -0 .36-1 .00 SF-6D Physical Function, Role Limitation, Social Function, Pain, Mental Health, Vitality 18,000 Standard Gamble 0 .30 -1 .00 EQ-5D Mobil i ty , Usual Activities, Self-Care, Pain, Anxiety 243 Time Trade Off -0 .59-1 .00 127 TABLE 4.2: CHARACTERISTICS OF THE STUDY PARTICIPANTS Parameter Mean (SD)* Median (IQR)* Age (yrs) 61.5(25.9) 63.0(19.0) ESR 24.52 (21.02) 18.0 (24.0) RAQoL score (range 0 - 30) 12.82 (8.28) 12.5(13.0) RA Duration (yrs) 13.87(11.41) 12.00(15.67) HAQ Disabilty Index (range 0 - 3.00) 1.10(0.77) 1.125 (1.375) Patient Global Assessment (100 mm VAS) 59.82(25.86) 65.00 (37.00) MD Global Assessment (100mm VAS) 20.88 (23.39) 10.5(29.0) Pain VAS (100mm VAS) 43.12(27.02) 42.00 (44.00) EQ-5D Health Thermometer 65.02(19.27) 70 (30) Tender Joint Count (range 0 - 28) 15.09(11.99) 12.00(15.00) Swollen Joint Count (range 0 -28) 9.14(9.66) 6.00(11.00) Self-Reported RA Severity, n % Very Mild 9 3% Mild 34 11% Moderate 120 38% Severe 110 35% Very Severe 27 9% Missing 13 4% Self-Reported RA Control, n % Very Well Controlled 33 11% Well Controlled 76 24% Adequately Controlled 123 39% Not Well Controlled 61 19% 
Not Controlled At All 7 2% Missing 13 4% Adverse Drug Reaction to RA Medication in Last 3 Months, n % Yes 108 35% No 202 65% Hospitalized For RA in Last 12 Months, n% Yes 45 15% No 253 85% Missed Work or School Due to RA in Last 12 Months, n% Yes 59 37% No 100 63% Purchased or Rented Equipment for RA in Last 12 Months, n% Yes 72 28% No 183 72% Used Allied Health Professional/Home Care Services for RA in Last 12 months, n % Yes 129 42% No 174 58% Concomitant Chronic Illness Other Than RA, n% Yes 192 62% No 118 38% * SD = Standard deviation; IQR = interquartile range 128 TABLE 4.3: MULTI-ATTRIBUTE AND SINGLE ATTRIBUTE UTILITY SCORES FROM THE MAUT INSTRUMENTS N Mean STD* Median IQR* Min. Max. HUI2 Global Utility Score 304 0.71 0.20 0.75 0.28 0.11 1.00 HUI2 Single Attribute Scores Sensation 304 0.95 0.04 0.95 0.61 1.00 Mobility 304 0.96 0.06 0.97 0.03 0.73 1.00 Emotion 304 0.96 0.05 0.93 0.07 0.70 1.00 Cognition 304 0.98 0.03 1.00 0.05 0.65 1.00 Self care 304 0.99 0.03 1.00 0.03 0.80 1.00 Pain 304 0.86 0.15 0.85 0.12 0.38 1.00 H U D Global Utility Score 303 0.53 0.29 0.56 0.44 -0.16 1.00 HUD Single Attribute Scores Vision 303 0.98 0.04 0.98 0.75 1.00 Hearing 303 0.98 0.05 1.00 _ 0.61 1.00 Speech 303 0.99 0.02 1.00 0.81 1.00 Ambulation 303 0.94 0.08 0.93 0.07 0.58 1.00 Dexterity 303 0.89 0.11 0.95 0.19 0.56 1.00 Emotion 303 0.95 0.08 1.00 0.05 0.46 1.00 Cognition 303 0.95 0.08 1.00 0.05 0.42 1.00 Pain 303 0.88 0.11 0.90 0.19 0.55 1.00 SF-6D Global Utility Score 302 0.63 0.13 0.60 0.12 0.31 1.00 SF-6D Single Attribute Scores Physical Functioning 302 0.91 0.04 0.90 0.06 0.83 1.00 Role Limitations 302 0.93 0.03 0.93 - 0.90 1.00 Social Functioning 302 0.92 0.03 0.92 _ 0.88 1.00 Pain 302 0.92 0.05 0.92 0.05 0.79 1.00 Mental Health 302 0.95 0.03 0.94 - 0.83 1.00 Vitality 302 0.98 0.02 0.98 0.93 1.00 EQ-5D Global Utility Score 308 0.66 0.24 0.74 0.19 -0.21 1.00 EQ-5D Single Attribute Scores Mobility 308 0.91 0.08 0.85 0.15 0.34 1.00 Self-care 308 0.94 0.09 1.00 0.18 0.44 1.00 
Usual Activities 308 0.90 0.11 0.88 0.12 0.56 1.00 Pain 308 0.70 0.16 0.80 0.27 0.26 1.00 Anxiety/Depression 308 0.94 0.09 1.00 0.15 0.41 1.00 * SD = Standard deviation; IQR = interquartile range - = <0.0001 129 TABLE 4.4: DOMAIN RESPONSES FOR THE MAUT INSTRUMENTS Cognition (n= 308) Self Care (n=308) Emotion (n=308) Pain (n=307) Sensation (n=306) Mobility (n=307) HUI2 Levels* Count % Count % Count % Count % Count % Count % 1 185 60 212 67 152 49 19 6 42 14 145 47 2 115 37 81 27 137 45 124 40 229 74 105 34 3 7 2 11 5 17 5 103 34 33 11 53 17 4 1 1 4 1 2 1 49 16 2 1 4 1 5 - - - - - - 12 4 _ _ 0 0 6 - - - - - - - - - -Pain (n=306) Emotion (n=306) Vision (n= 309) Hearing (n=308) Speech (n=310) Cognition (n=308) Ambulation (n=306) Dexterity (n=307) HUI3 Levels* Count % Count % Count % Count % Count % Count % Count % Count % 1 22 7 157 50 45 15 284 90 294 95 185 60 145 48 74 24 2 93 31 105 34 249 80 6 2 7 2 19 6 104 33 99 32 3 111 36 33 11 4 1 8 3 7 2 54 17 29 9 43 14 4 64 21 10 3 6 2 8 3 2 1 41 13 24 8 73 24 5 16 5 1 1 5 2 1 1 0 0 8 3 3 1 16 5 6 - - 4 1 - - 2 1 - - 1 1 1 1 2 1 130 Physical Functioning (n= 307) Role Limitat (n=302) ons Social Functic (n=302) ming Pain (n=303) Mental Health (n=303) Vitality (n=308) SF-6D Levels* Count % Count % Count % Count % Count % Count % 1 7 3 50 17 44 15 12 4 58 19 2 1 2 54 18 192 63 75 24 55 18 123 41 71 23 3 75 24 11 4 129 4 3 72 24 104 34 108 35 4 50 16 4 9 16 4 3 14 81 27 15 5 84 27 5 95 31 - - 11 4 58 19 3 1 38 12 6 26 8 - - - - 25 8 - - 5 2 Mobility (n=309) Usual Activiti (n=309] es Self Care (n=308) Pain/ Disco rr (n=308] ifort Anxiety/ Depression (n=308) EQ-5D Levels Count % Count % Count % Count % Count % 1 119 39 94 30 219 71 35 11 197 63 2 189 61 202 65 86 28 245 80 108 35 3 1 <1 13 5 3 2 28 9 3 2 For the HUI2, HUD, SF-6D and EQ-5D, higher levels represent higher degrees of limitation or increased symptoms. 
For the SF-6D, Physical Functioning and Pain have six levels, Social Functioning, Mental Health and Vitality have five levels and Role Limitations has four levels. For the HUI2 all the domains have four levels except for Mobility and Pain which have five. For the HUI3, all the domains have six levels with the exception of Pain, Vision and Speech which U d V C five 131 TABLE 4.5: RELATIONSHIP BETWEEN RA SEVERITY AND CONTROL AND THE GLOBAL UTILITY SCORES FOR E A C H OF T H E MAUT INSTRUMENTS Mean Score (SD) Global Utility Score. Mean rSD^ HAQ ^ RAQoL' HUI2< HUI3* SF-6D} EQ-5D* Self-reported RA severity Very mi ld 0.28 (0.43) 3.22 (3.27) 0.89 (0.09) 0.80 (0.25) 0.83 (0.12) 0.89 (0.14) M i l d 0.43 (0.51) 5.15 (4.40) 0.84 (0.12) 0.75 (0.21) 0.74 (0.12) 0.84 (0.13) Moderate 1.01 (0.64) 11.09 (6.65) 0.76 (0.15) 0.59 (0.24) 0.65 (0.10) 0.71 (0.14) Severe 1.37 (0.72) 16.76 (7.86) 0.63 (0.20) 0.43 (0.28) 0.58 (0.12) 0.58 (0.27) Very Severe 1.68 (0.83) 18.96 (8.54) 0.58 (0.23) 0.28 (0.34) 0.50(0.11) 0.47 (0.30) Spearman's Correlation Coefficient 0.46* 0.52* -0.47* -0.46* -0.55* -0.45* Self-reported RA control Very well controlled 0.65 (0.78) 6.15(7.02) 0.82 (0.20) 0.74 (0.26) 0.73 (0.14) 0.81(0.18) Wel l controlled 0.79 (0.68) 8.0 (6.11) 0.79 (0.13) 0.66 (0.24) 0.69 (0.11) 0.77 (0.13) Adequately controlled 1.09 (0.70) 13.10(6.97) 0.73 (0.17) 0.54 (0.25) 0.62 (0.11) 0.67 (0.11) Not well controlled 1.63 (0.59) 20.37 (6.50) 0.55(0.19) 0.31 (0.24) 0.52 (0.09) 0.48 (0.28) Not controlled at all 2.04 (0.38) 24.0 (3.51) 0.42 (0.08) 0.00 (0.11) 0.46 (0.08) • 0.25 (0.23) Spearman's Correlation Coefficient 0.45* 0.59* -0.49* -0.52* -0.58* -0.50* Comparison (using A N O V A ) o f mean values stratified by severity/control - all p<0 0001 *p<0.0001 132 T A B L E 4.6: D I C H O T O M O U S M E A S U R E S O F R A S E V E R I T Y Parameter Adverse Events to RA Drug Therapy in Last 3 months Yes No Effect Size Hospitalized in Past Year for RA Yes No Effect Size Other Chronic Diseases 1 or 
more None Effect Size Days Off Work/School Due to RA in Past Year Yes No Effect Size Use of Allied Health/Home Services** for RA in Past Year Yes No Effect Size HAQ Mean Score (SD) 1.20 (0.72) 1.06 (0.77) 0.19 1.38(0.73) 1.05 (0.77) 0.44 1.19(0.76) 0.97 (0.75)* 0.29 1.27 (0.71) 0.83 (0.75) J 0.60 1.34 (0.72) 0.80 (0.74) f 0.74 RAQoL 14.7 (7.9) 11.7 (8.2)* 0.37 14.9 (7.9) 12.3 (8.2)* 0.32 13.6 (8.3) 11.6 (8.1)* 0.24 16.4 (7.7) 10.1 (7.8) f 0.81 15.1 (7.9) 11.0 (8.1) f 0.51 HUI2 0.68 ((0.20) 0.73 (0.19) 0.26 0.63 (0.21) 0.72 (0.19)* 0.45 0.69 (0.20) 0.74 (0.19)* 0.26 0.67 (0.19) 0.76 (0.19) 0.47 0.66 (0.20) 0.75 (0.18) | 0.47 Global Utility Score, Mean HUD 0.50 (0.29) 0.55 (0.29) 0.17 0.44 (0.27) 0.55 (0.29)* 0.39 0.51 (0.29) 0.58 (0.29)* 0.24 0.49 (0.27) 0.62 (0.29) 0.46 0.45 (0.28) 0.60 (0.28) t 0.54 SF-6D 0.60 (0.11) 0.65 (0.14) 0.40 0.60 (0.11) 0.63 ((0.13) 0.25 0.62 (0.13) 0.64(0.13) 0.15 0.59 (0.11) 0.67 (0.13) J 0.66 0.59 (0.12) 0.65 (0.13) f 0.48 (SD) EQ-5D 0.61 (0.27) 0.69 (0.21) 0.33 0.61 (0.26) 0.67 (0.23) 0.24 0.65 (0.24) 0.68 (0.25) 0.12 0.59 (0.28) 0.72 (0.22) 0.52 0.60 (0.26) 0.71 (0.22) f 0.46 Rent or Purchase Equipment for RA in Past Year Yes No Effect Size 1.34 (0.69) 0.90 (0.76) t 0.61 16.6 (8.1) 11.8 (8.0) f 0.60 0.62(0.21) 0.74 (0.18) t 0.61 ••Physiotherapy, Occupational Therapy, Massage Therapy, Home Care, f p-value < 0.0001 from a t-test between the two groups , * p-value <0.001, p-value < 0.01, * p-value < 0.05 Large effect sizes (>0.5) are highlighted in bold text 0.39 (0.29) 0.58 (0.27) f 0.68 0.58 (0.11) 0.63 (0.13) 0.42 0.69 (0.22) 0.57 (0.27) % 0.49 133 T A B L E 4.7: C O R R E L A T I O N S F O R M U L T I A T T R I B U T E A N D S E L E C T S I N G L E A T T R I B U T E U T I L I T Y S C O R E S W I T H R A S E V E R I T Y MEASURE RA Duration RAQoL Swollen Tender Joint Patient HAQ Pain (years) Score Joint Count Count Global VAS Disability VAS (0-30) (0-28) (0-28) (mm) Score (mm) HUI2 Global Utility -0.11 -0.70| -0.38t -0.44| 0.55| 
-0.66| -0.59t Mobility (HUI2) -0.25t -0.52| -0.37f -0.36t 0.49t -0.64t -0.44t Self-Care (HUI2) -0.10 -0.55$ -0.36f -0.34t 0.42t -0.60| -0.40t Pain (HUI2) -0.04 -0.62t -0.45t -0.42| 0.51| -0.54| -0.59t HUD Global Utility -0.20$ -0.75t -0.42t -0.48t 0.58| -0.76t -0.60t Ambulation (HUD) -0.24t -0.53t -0.37t -0.36| 0.49t -0.65| -0.44t Dexterity (HUD) -0.24$ -0.62| -0.47| -0.43f 0.44| -0.68| -0.39t Pain (HUD) -0.09 -0.70| -0.51| -0.45| 0.57t -0.6 It -0.70t SF-6D Global Utility -0.17$ -0.80t -0.47| -0.53f 0.63t -0.73t -0.62t Physical Functioning -0.21$ -0.64f -0.42| -0.41| 0.50| -0.69t -0.50t Role Limitations -0.14* -0.57t -0.36t -0.36t 0.42| -0.50t -0.42t Social Functioning -0.16$ -0.70f -0.42f -0.44| 0.56t -0.63t -0.52t Pain -0.18$ -0.74| -0.50f -0.48| 0.62| -0.67t -0.63t EQ-5D Global Utility -0.11 -0.70t -0.47t -0.42t 0.58| -0.6 It -0.60t Mobility (EQ-5D) -0.17J -0.53t -0.4 It -0.37t 0.46| -0.53t -0.44t Usual Activities (EQ-5D) -0.10 -0.64t -0.44| -0.45| 0.51| -0.57t -0.51t Pain/Discomfort(EQ-5D) 0.00 -0.34f -0.26| -0.32t 0.30t -0.3 It -0.32t Self-Care (EQ-5D) -0.14* -0.59t -0.34| -0.37f 0.43f -0.60t -0.43t RAQoL Score 0.20$ - 0.53t 0.54t -0.62| 0.76t 0.62t HAQ Disability Score 0.28$ 0.76f 0.48| 0.46$ -0.53f - • 0.54t * p <0.05; * p<0.01;f p <0.001; Variables for which correlations were hypothesized to be strong are in bolded type in the measures column; Correlations considered to be strong are in bold type in the results column 134 TABLE 4.8: SIMPLE LINEAR REGRESSION ANALYSES FOR OVERALL INSTRUMENT SCORES AND HAQ DEPENDENT VARIABLES Variable HAQ Disability Index (0-3.0) Minimally Important Difference (MID) Estimation DEPENDENT VARIABLES HUI2 Global Utility Beta-coefficient (SE)* Model p-value R 2 -0.16(0.01) O.0001 0.43 For each 1.0 change in HAQ, the HUI2 changes 0.16. 
Therefore, a 0.25 change in HAQ (MID) results in a 0.04 change in the HUI2 (estimated MID).

Dependent variable: HUI3 Global Utility. Beta-coefficient (SE)*: -0.29 (0.01); model p-value: <0.0001; R²: 0.58. For each 1.0 change in HAQ, the HUI3 changes 0.29. Therefore, a 0.25 change in HAQ (MID) results in a 0.07 change in the HUI3 (estimated MID).

Dependent variable: SF-6D Global Utility. Beta-coefficient (SE)*: -0.12 (0.007); model p-value: <0.0001; R²: 0.53. For each 1.0 change in HAQ, the SF-6D changes 0.12. Therefore, a 0.25 change in HAQ (MID) results in a 0.03 change in the SF-6D (estimated MID).

Dependent variable: EQ-5D Global Utility. Beta-coefficient (SE)*: -0.19 (0.01); model p-value: <0.0001; R²: 0.37. For each 1.0 change in HAQ, the EQ-5D changes 0.19. Therefore, a 0.25 change in HAQ (MID) results in a 0.05 change in the EQ-5D (estimated MID).

Dependent variable: RAQoL Score. Beta-coefficient (SE)*: 8.13 (0.003); model p-value: <0.0001; R²: 0.57. For each 1.0 change in HAQ, the RAQoL changes 8.13. Therefore, a 0.25 change in HAQ (MID) results in a 2.0 change in the RAQoL (estimated MID).

* SE = Standard Error

FIGURE 4.1: BOX PLOT OF MAUT INSTRUMENT GLOBAL UTILITY SCORES
[Box plot; x-axis: MAUT instrument. Outliers are marked by an open circle.]

CHAPTER 5
NOT ALL QALYS ARE EQUAL: THE IMPACT OF USING DIFFERENT INDIRECT UTILITY MEASURES ON ESTIMATING THE COST-UTILITY OF INFLIXIMAB IN RHEUMATOID ARTHRITIS

5.1 FOREWORD

This manuscript is currently under review under the same title for publication in Medical Decision Making. The candidate is first author of this manuscript, which is co-authored by Drs. Stephen Marion, Fred Wolfe, John Esdaile, Monique Gignac, Ann Clarke and Aslam Anis. In addition, a statistician, Ms. Daphne Guh, was a co-author. Dr. Stephen Marion completed all of the complicated mathematical modeling, such as the construction of the transition probability matrix, and instructed the candidate in these methods. Drs.
Wolfe, Gignac and Clarke provided access to databases that facilitated estimation of the costs and effectiveness outcomes. Ms. Guh provided assistance with the statistical analysis. Drs. Anis and Esdaile are co-supervisors of the candidate. The candidate's role in this manuscript involved the development of the primary hypothesis, study design, model design (with Dr. Marion), statistical analyses and the writing of the final manuscript.

5.2 INTRODUCTION

With the introduction of new, expensive biological agents for the treatment of rheumatoid arthritis (RA), the costs to manage this debilitating chronic disease have increased. This observation has drawn a great deal of attention as governments and third-party payers struggle to make decisions about how to incorporate these therapeutic agents into their funding envelopes.¹ Because of the limited funds available for health care, RA drugs often compete for funding with those used to treat other disease areas. However, unlike HIV and cancer, arthritis cripples and does not immediately result in death. Drugs used for RA tend to improve quality of life and do not immediately affect mortality. Because of this, treatments for RA can be undervalued when compared to treatments for other chronic diseases. Economic evaluations have become increasingly important in the allocation of funding to treatments and programmes. Thus, if RA patients are not to be short-changed by policies and decisions, then measures that incorporate quality of life changes into economic studies are key. In order to integrate quality of life into economic analyses, the effectiveness of the health interventions in question is measured using a metric known as "utility," which ranges in value from 0 (dead) to 1 (perfect health). Results are reported as cost per quality-adjusted life year (QALY) gained, where QALYs are derived by incorporating the utilities as weights in the life expectancy calculation.
Cost per QALY gained is a unique and preferred measure of the economic value of different interventions because it permits comparison across disease groups, thereby facilitating funding allocation decisions. The ultimate goal is a comparison of the cost per QALY gained so as to prioritize funding according to cost/QALY. This determination of a cost/QALY permits comparison across therapies within disease states (i.e. Drug A vs. Drug B, or drug vs. physiotherapy vs. surgery) as well as comparisons across diseases (RA vs. lupus or congestive heart disease). To determine the utility weightings for QALYs, the use of a pre-scaled multi-attribute utility index is often the most convenient and least expensive approach. While no validated multi-attribute utility index is available for economic evaluations specifically in musculoskeletal disease, several generic measures of HRQOL (health related quality of life) appear suitable for adaptation to economic evaluations in RA.³,⁴ Such generic utility-based instruments for use in economic evaluations are the Health Utilities Index Mark 2 and 3 (HUI2 and HUI3), the EuroQol 5D (EQ-5D) and the Short Form 6-D (SF-6D).⁵ These measures each capture different attributes and domains, and thus assess different aspects of quality of life, and utilize different methodologies to calculate the utility score. However, all of these instruments purport to integrate the health states obtained from the population under study with predetermined societal preference ratings for said states to produce an overall index score. There is no "gold standard" among these instruments and each likely has its own advantages and disadvantages. However, little is known about how the choice of instrument influences the outcome in economic evaluations of RA drug therapy.
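To make the cost/QALY logic concrete, the following sketch computes QALYs as a sum of yearly utility weights and an incremental cost per QALY between two strategies. All numbers are hypothetical and chosen only to show how two instruments that assign different utility gains to the same clinical improvement lead to different ratios; the function names, the annual time step, and the convention of leaving the first year undiscounted are assumptions of this sketch, not features of the model described in this chapter.

```python
def qalys(annual_utilities, discount_rate=0.0):
    """Sum of yearly utility weights; year 0 is undiscounted (assumed convention)."""
    return sum(u / (1.0 + discount_rate) ** year
               for year, u in enumerate(annual_utilities))


def icer(cost_new, qalys_new, cost_old, qalys_old):
    """Incremental cost per QALY gained of the new strategy over the old one."""
    delta_qalys = qalys_new - qalys_old
    if delta_qalys == 0:
        raise ValueError("no QALY difference; the ratio is undefined")
    return (cost_new - cost_old) / delta_qalys


YEARS = 10
COST_STANDARD = 20_000.0   # hypothetical 10-year cost, standard therapy
COST_NEW = 120_000.0       # hypothetical 10-year cost, new therapy

# Hypothetical mean utilities under each strategy, as scored by two instruments
INSTRUMENT_A = {"standard": 0.50, "new": 0.55}
INSTRUMENT_B = {"standard": 0.60, "new": 0.70}


def icer_for(instrument):
    q_old = qalys([instrument["standard"]] * YEARS)
    q_new = qalys([instrument["new"]] * YEARS)
    return icer(COST_NEW, q_new, COST_STANDARD, q_old)


icer_a = icer_for(INSTRUMENT_A)
icer_b = icer_for(INSTRUMENT_B)
print(f"Instrument A: {icer_a:,.0f} per QALY; Instrument B: {icer_b:,.0f} per QALY")
```

With identical costs, the instrument that registers the smaller utility gain produces the larger cost per QALY, which is the kind of divergence this chapter sets out to quantify.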
Thus, the primary objective of this study was to determine the impact on incremental cost effectiveness ratios (ICERs) of using different indirect utility instruments in an economic evaluation of a new biological agent (infliximab) used to treat RA. A secondary objective of the study was to determine the incremental costs, QALYs, and cost-utility of infliximab over standard therapy for the treatment of active, refractory RA.

5.3 METHODS

5.3.1 Clinical Trial Data Source

The largest randomized controlled trial to evaluate the efficacy of infliximab in patients with RA refractory to other disease modifying antirheumatic drugs (DMARDs) was the ATTRACT trial.⁶ Patients were eligible for this trial if they had active RA (6 or more tender and swollen joints and symptoms or signs [at least two of morning stiffness for at least 45 minutes, an erythrocyte sedimentation rate of at least 28 mm/hr, and serum C-reactive protein of at least 2.0 mg/dL]) despite methotrexate (MTX) doses of more than 12.5 mg per week. A total of 428 patients from 34 centers in the United States, Canada, and Europe were randomly allocated to one of five treatment groups: MTX alone; MTX plus 3 mg/kg of infliximab every 4 or 8 weeks; or MTX plus 10 mg/kg of infliximab every 4 or 8 weeks. At baseline, most patients were also receiving non-steroidal anti-inflammatory drugs (74%) and corticosteroids (61%). DMARDs other than MTX were not allowed and, if necessary, were withdrawn before beginning study treatment. The 5 treatment groups were comparable with respect to age (median, 54 years), gender distribution (78% were women), disease duration (8-9 years), functional class, number of swollen and tender joints, and levels of C-reactive protein.
The primary outcome of the study was a comparison between groups of ACR20 at 54 weeks (American College of Rheumatology function class: a 20% or greater improvement in swollen and tender joint counts and a 20% or greater improvement in 3 of patient's global assessment, physician's global assessment, physical disability score [as measured by the Health Assessment Questionnaire (HAQ) disability index], erythrocyte sedimentation rate, and patient's assessment of pain). At the end of the 54 weeks, 52% of the infliximab-treated patients had achieved an ACR 20% response compared with 17% of MTX-only controls (P<0.001). Improvement with infliximab was also evident when comparing patient outcomes in terms of ACR 50% (33% vs. 18%) and ACR 70% (18% vs. 3%).

5.3.2 Overview of Model

The framework of our Markov decision-analytic model is presented as a schematic diagram in Figure 5.1 and is outlined in Appendix I. Consistent with the inclusion/exclusion criteria of the ATTRACT trial,⁶ the decision model was used to compare the costs and effects of two different drug therapy strategies for adult patients with RA refractory to standard therapy including MTX: 1) intravenous infliximab 3 mg/kg every 8 weeks and intravenous MTX of at least 12.5 mg/week; and 2) continued usual RA management with MTX as described above. The time horizon for our model was ten years, since we assumed that this would likely be the minimal time frame over which infliximab would be used in clinical practice, and the perspective for both costs and outcomes was societal. We utilized a 3% discount rate for both costs and QALYs, as generally recommended in recent guidelines. The infliximab plus MTX strategy was based on the pooled results across the three infliximab treatment arms in the ATTRACT trial.⁶ Similar to Wong et al., because of the similarity of the observed outcomes across the different doses and dosing intervals for the infliximab arms in the ATTRACT trial, these were pooled to estimate the health state transitions associated with infliximab treatment. Treatment with infliximab was assumed to be continuous during the ten year model unless, during individual patient simulations, three consecutive months residing in HAQ states > 2.0 were observed. At this point, the costs of infliximab were doubled (indicating a shortening in the dosage interval from 8 weeks to 4 weeks, which has been shown to be the course most rheumatologists would take considering the lack of response).⁹ Finally, if no improvement in HAQ was made after a further three consecutive months, it was assumed that infliximab would be discontinued and the MTX alone strategy would be adopted. The MTX alone strategy was based on the results achieved from the MTX plus placebo arm in the ATTRACT trial.⁶ In defining health states for our Markov model, we utilized the HAQ in increments of 0.125 as discrete health states (scores range from 0, which is perfect function, to 3.0, which is maximum functional impairment) (Figure 5.1). The HAQ is the best predictor of mortality in RA,¹⁰ work disability,¹¹ and health care resource utilization.¹² Therefore, within the model, transitions were made between different HAQ-defined health states while costs and outcomes (QALYs) were accrued according to the treatment strategy. All-cause mortality functioned as the absorbing state in the model.

5.3.3 Transition Probability Matrices and Statistical Modeling

Wong et al. published transition probabilities based upon an on-treatment analysis from the ATTRACT trial for the period of baseline to 30 weeks and 30 weeks to 54 weeks.
These investigators grouped the HAQ into four discrete health states (HAQ 0 = no disability; HAQ 0.1 to 1.0 = mild impairment; HAQ 1.1 to 2.0 = moderate impairment; and HAQ 2.1 to 3.0 = advanced impairment) and reported the transitions to and from these health states. Wong et al. also reported their transition probabilities as single state transitions across a 30 week (baseline to 30 weeks) or 24 week (30 weeks to 54 weeks) time frame. Since there was substantial improvement in the first 30 weeks of the ATTRACT trial in both the placebo and the infliximab arms [6], we classified this time frame as being an adjustment phase where considerable regression to the mean was occurring in both treatment arms. Thus, we did not utilize this time frame to estimate the long-term transition probabilities. Instead, we utilized Wong et al.'s 30 to 54 week transition probabilities for the MTX (Table 5.1) and infliximab plus MTX (Table 5.2) strategies as the basis of our long-term transition probability matrix. However, we used the baseline HAQ distribution for all arms in the ATTRACT trial as the baseline HAQ distribution for entry into the model.

Therefore, the underlying mathematical model that we adopted was a continuous time Markov process. As such, transitions can occur at any time, not just at discrete weekly or monthly time points, and the state of the patient with respect to RA is fully defined by knowing which of 25 HAQ states she/he is in at any time. There may be some error, however, in the measurement of the HAQ state. The true HAQ is the measured HAQ +/- an error. The transitions in HAQ state at any time-point are always to one of the neighboring HAQ states (i.e. from HAQ 0.250 to HAQ 0.375 or HAQ 0.125). The interval between transitions is assumed to be exponentially distributed with a rate parameter that depends on the current HAQ; i.e.
the distribution of between-event times is: r exp(-r t) (where \"r\" is the transition rate and is a function of the HAQ level from which the transition will occur). The transition rate has three multiplicative components: i) a purely random fluctuation component which is equal in both directions, but which is larger for HAQ scores in the middle of the range than at the extremes; ii) a systematic excess tendency for drift in either an upward or a downward direction; and iii) a factor which allows the systematic drift to increase or decrease (and even reverse) across the range of possible HAQ states. This assumption reduces the 600 (24*25) independent transition probabilities in an unstructured model to 5 parameters in this structured transition rate model. The observed 4 x 4 transition matrix (from Wong et al.) arises by running the underlying model for 24 weeks and then collapsing the resultant 25 x 25 transition matrix to 4 x 4. The 5 parameters were estimated by standard maximum likelihood methods for non-linear models (S-Plus® 6.1 for Windows procedure NLMIN, but with a more stringent convergence criterion than the default). The estimates for the two treatment strategies were carried out independently. Mortality over six months was ignored in these calculations. The 25 x 25 weekly transition probability matrices for the MTX and infliximab plus MTX strategies are shown in Table 5.3 and Table 5.4, respectively. Mortality was then estimated from other data (see below) and superimposed on the HAQ transition rate model (see Appendix I for details of the model and Appendix II for the C-code for the model).

5.3.4 Mortality Rate

We modeled the all-cause mortality rate from a large dataset spanning 1974 through 1999, consisting of 1,922 consecutive RA patients seen at the Wichita (Kansas) Arthritis Center, an outpatient rheumatology clinic.
Demographic, clinical, laboratory, and self-report data (including HAQ) were obtained at each follow-up clinic visit. The details of this data set in regard to mortality have been reported previously [10,13]. The death rate was calculated using Poisson regression with time at risk as an offset variable. The covariates in the model were age, age-squared, the HAQ and HAQ-squared. From these data, we determined the probability of death by HAQ state as people aged during the simulation (Table 5.5). All patients were assumed to be 52.6 ± 9.2 years of age at the time of initiating the treatments as this was the mean age of enrollment in the ATTRACT study [6]. The regression model fit was assessed by examining the model deviance divided by its degrees of freedom.

5.3.5 Utilities and QALYs

The primary outcome measure used in the analysis was QALYs over a ten year period associated with the use of either treatment strategy. Utility weights for the determination of QALYs were calculated using multiple linear regression models assessing the relationship between the HAQ score, age and utility values. The models were estimated using baseline data from a longitudinal study of 317 RA patients comparing different utility instruments (the HUI2, the HUI3, the EQ-5D, and the SF-6D). Details of this study can be found elsewhere [14,15]. The fit of each model was assessed using R² and residual plots.

5.3.6 Cost Estimation

Our cost analysis was performed from the Canadian societal perspective. The cost components include both the direct medical costs and the indirect costs incurred by loss of work due to RA. The methodology used to calculate each cost category is outlined below. All costs were deflated by the Consumer Price Index for healthcare products and are in 2002 Canadian dollars [16]. Unit costs and other equations in our model are summarized in Table 5.5.
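As a brief aside on Section 5.3.5, the accrual of discounted QALYs from HAQ-based utility weights can be sketched as follows. This is a minimal illustration, not the thesis model: the function names, the placeholder regression coefficients, the choice to hold age fixed, and the convention of discounting each weekly cycle by its elapsed time in years are all assumptions made for the example. Only the 3% annual discount rate and the ten year horizon come from the text.

```python
# Illustrative sketch only: names and coefficients are hypothetical and are
# NOT the regression estimates reported in Table 5.6.

WEEKS_PER_YEAR = 52

def utility_from_haq(haq, age, intercept, beta_haq, beta_age):
    # Linear predictor of utility from the HAQ score and age.
    return intercept + beta_haq * haq + beta_age * age

def discounted_qalys(weekly_utilities, annual_rate=0.03):
    # Each weekly cycle contributes utility/52 QALYs, discounted by the
    # time elapsed (in years) since the start of the model.
    total = 0.0
    for week, utility in enumerate(weekly_utilities):
        years_elapsed = week / WEEKS_PER_YEAR
        total += (utility / WEEKS_PER_YEAR) / (1.0 + annual_rate) ** years_elapsed
    return total

# Example: a patient who remains at HAQ 1.5 for ten years (placeholder coefficients).
u = utility_from_haq(haq=1.5, age=52.6, intercept=0.9, beta_haq=-0.2, beta_age=0.0)
print(round(discounted_qalys([u] * 520), 2))
```

In a full simulation the weekly utilities would change as the patient moves between HAQ states; the constant sequence here simply isolates the effect of discounting.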
5.3.6.1 Direct Drug Costs

According to Schering Canada, infliximab is marketed at a cost of $909.51 per 100 mg vial (2002 Canadian dollars). Since the infliximab strategy used in our model (and in clinical practice) is dosed based upon body weight, we applied a weight of 66 kg as reported in another Canadian clinical trial of RA drug therapy [17], resulting in the use of two vials every eight weeks. However, a report out of the United States suggested that the average weight of infliximab users in clinical practice was 77 kg [9]. Furthermore, Malone et al. revealed in their report that 78% of clinicians surveyed gave patients the entire vial when the calculated dose based on weight was less than the entire vial [9]. Thus, we used a blended cost assuming that 67% received 2 vials and 33% received 3 vials of infliximab. This assumption was tested in univariate sensitivity analysis. In addition, the costs of pharmacy (for preparation), nursing (for preparation and monitoring), and baseline TB screening tests consisting of a chest X-ray, a PPD skin test, and a rheumatologist follow-up visit (for the infliximab strategy) were obtained from the provincial medical fee guide and included in the model. The cost of MTX is $1.00 per 2.5 mg tablet and $9.75 per 50 mg vial. An analysis of MTX prescriptions in 2000 for all RA patients in the province of British Columbia, based on a population-based RA cohort, showed that 90% of MTX prescriptions were oral tablets, 5% were injectable solution with preservative and 5% were injectable solution without preservative. The monitoring costs of anti-nuclear antibodies and anti-DNA antibodies done twice a year were included in the infliximab strategy, whereas other monitoring costs were assumed to be the same across the two strategies.
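The blended infliximab acquisition cost described above can be worked through as a short calculation. This is a sketch under stated assumptions: it covers drug acquisition only (pharmacy, nursing, screening and monitoring costs are excluded), the variable and function names are hypothetical, and the cost equations actually used in the model (Table 5.5) may be specified differently.

```python
# Illustrative arithmetic only; names are hypothetical.
VIAL_COST_CAD = 909.51        # per 100 mg vial, 2002 Canadian dollars (from the text)
SHARE_TWO_VIALS = 0.67        # share of patients assumed to receive 2 vials
SHARE_THREE_VIALS = 0.33      # share of patients assumed to receive 3 vials
DOSING_INTERVAL_WEEKS = 8     # one infusion every 8 weeks

def blended_weekly_drug_cost(vial_cost, share_two, share_three, interval_weeks):
    # Expected number of vials per infusion, weighted by patient share.
    expected_vials = 2 * share_two + 3 * share_three
    # Spread the per-infusion cost evenly over the dosing interval.
    return vial_cost * expected_vials / interval_weeks

weekly_cost = blended_weekly_drug_cost(
    VIAL_COST_CAD, SHARE_TWO_VIALS, SHARE_THREE_VIALS, DOSING_INTERVAL_WEEKS
)
print(round(weekly_cost, 2))  # about 264.89 dollars per week
```

Under these assumptions the expected dose is 2.33 vials per infusion, which gives a weekly acquisition cost within the $200 to $500 range explored in the univariate sensitivity analysis.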
5.3.6.2 Other Direct Costs

Other direct costs besides drug costs included in this study were derived from a longitudinal study of 1063 Canadian patients who reported semi-annually on their health services utilization over the preceding 6 months between 1983 and 1994. A detailed description of the determination of these costs is available from the literature [19]. The direct costs of RA care comprised long-term care, rehabilitation, nursing homes, health professional visits, medications, diagnostic tests, acute hospitalization, emergency department visits, ambulance services, dialysis and outpatient surgeries. Using a subset of the database, a mixed-effect regression model was generated to estimate direct cost (log transformed) over the next 6-month period; the predictors were gender (fixed effect), disease duration (fixed effect) and HAQ (random effect) at 0 months, with adjustment for within-patient correlation of observations over time. These costs were divided by 24 to give average weekly costs by HAQ health state for input into the Markov model with weekly state transitions.

5.3.6.3 Indirect Costs

The indirect cost caused by work disability due to RA was estimated from a prospective longitudinal cohort of 120 employed RA patients recruited in Ontario, Canada from September 1999 to December 2001 [20]. In the self-report questionnaire, participants were asked the number of days missed due to RA in the past 6 months and their regular weekly working hours. Participants' disability level was assessed by items drawn from the HAQ and the Multidimensional Functional Assessment Questionnaire, supplemented with additional items to assess discretionary activities such as hobbies and leisure pursuits. A multiple linear regression model was constructed between work capacity and the disability score.
Using the disability score as a proxy for the HAQ score, we estimated the work capacity for our study groups based on the baseline HAQ and improvement in HAQ. A gender-weighted average income of the Canadian population aged 45-64 was multiplied by work missed to estimate the cost of lost work capacity. For the model, once the age of the cohort reached 65, these indirect costs were no longer accrued as it was assumed that patients would have retired.

5.3.7 Survival Analysis

From the 100,000 simulations described below (50,000 for each treatment strategy), we conducted Kaplan-Meier survival analysis and Cox regression. Since time at risk and the occurrence of death as a binary variable were tracked, we estimated the probability of survival over the 10 years of the model. The log-rank test was used to test the null hypothesis that the survival time was the same for the two treatment strategies. Cox regression was used to determine the hazard ratio associated with the use of infliximab plus MTX as compared to MTX alone. Right censoring was used for those who had not died after the 10 year time horizon of the Markov model.

5.3.8 Cost-Utility and Probabilistic Analysis

The analysis of the state-transition model provides expected costs and expected QALYs over a 10 year follow-up period. If the infliximab strategy was both more effective and more costly, we calculated the incremental cost-utility ratio as the additional cost per QALY gained. To quantify the precision of our cost-utility estimates, we conducted probabilistic analyses. We utilized both 1st order (random walk) and 2nd order (random draws from specified distributions) Monte Carlo simulation methods. For the 2nd order simulations, we conducted 1000 iterations. For each of the 2nd order iterations with sampling from the specified distributions, 50 random walks were conducted for each strategy. Therefore, a total of 100,000 simulations were conducted (50,000 per treatment strategy).
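The nesting of 2nd order draws and 1st order random walks described above, and the resulting simulation count, can be sketched as follows. This is a structural illustration only: the function names are hypothetical, `toy_walk` is a placeholder outcome generator rather than the Markov model, and the Bernoulli probability, the uniform baseline HAQ draw and all cost and QALY magnitudes are invented for the example. The only values taken from the text are the 1000 outer iterations, the 50 walks per strategy, and the age of 52.6 ± 9.2 years.

```python
import random

# Illustrative sketch only. The toy functions below are hypothetical stand-ins
# and are NOT the Markov model described in this chapter.

def draw_parameters(rng):
    # 2nd order step: one draw from each specified input distribution.
    return {
        'female': rng.random() < 0.5,               # Bernoulli; p = 0.5 is a placeholder
        'age': rng.gauss(52.6, 9.2),                # normal; values quoted in Section 5.3.4
        'baseline_haq': rng.randrange(25) * 0.125,  # placeholder for the trial distribution
    }

def toy_walk(rng, strategy, params):
    # 1st order step: one random walk returning (cost, QALYs). Placeholder numbers.
    base_cost, base_qaly = (20000.0, 4.0) if strategy == 'MTX' else (60000.0, 5.0)
    return base_cost * rng.uniform(0.8, 1.2), base_qaly * rng.uniform(0.8, 1.2)

def run_simulations(n_outer=1000, n_inner=50, seed=1):
    rng = random.Random(seed)
    results = {'MTX': [], 'IFX+MTX': []}
    for _ in range(n_outer):                 # 2nd order iterations
        params = draw_parameters(rng)
        for strategy in results:
            for _ in range(n_inner):         # 1st order random walks per strategy
                results[strategy].append(toy_walk(rng, strategy, params))
    return results

def incremental_cost_utility_ratio(results):
    def mean(values):
        return sum(values) / len(values)
    d_cost = mean([c for c, _ in results['IFX+MTX']]) - mean([c for c, _ in results['MTX']])
    d_qaly = mean([q for _, q in results['IFX+MTX']]) - mean([q for _, q in results['MTX']])
    return d_cost / d_qaly                   # additional cost per QALY gained

results = run_simulations()
print(sum(len(v) for v in results.values()))  # 100000 simulated walks in total
print(round(incremental_cost_utility_ratio(results)))
```

The ratio is only meaningful when the new strategy is both more costly and more effective, which is why the placeholder outcomes are constructed that way.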
For this model, probability distributions were defined for three sets of key model parameters: 1) gender was assumed to follow a Bernoulli distribution; 2) baseline age was assumed to follow a normal distribution; and 3) baseline HAQ scores were assumed to follow the distribution at randomization in the MTX arm of the ATTRACT trial [6,8]. These variables were chosen as they were key variables in the cost calculation (Table 5.5) and the baseline HAQ distribution sets the starting point for the transition probability matrices. The 95% confidence region surrounding the incremental cost-utility ratio was estimated using Fieller's theorem [22]. This analysis was also used to generate plots on the cost-effectiveness plane using each of the indirect utility measurements and to generate cost-acceptability curves with the often quoted threshold of society's willingness to pay (WTP) of $50,000 per QALY as the ceiling ratio [23].

5.3.9 Univariate Sensitivity Analysis

For parameter estimates that were uncertain but where evidence of prior probability distributions was uncertain or inapplicable, we conducted deterministic, univariate sensitivity analysis. Specifically, we calculated the cost per QALY for discount rates of 0%, 3% (base case), 5%, and 7%. In addition, to account for the potential of higher doses of infliximab being used to achieve the same benefit (Malone et al. reported that approximately 1/3 of infliximab patients receive a higher dose than 3 mg/kg every eight weeks [9]), we varied the weekly cost of infliximab from $200 to $500 per week.

5.4 RESULTS

5.4.1 Simulation Results

Under the assumptions of our model, the mean final HAQ states for those still alive after 10 years or those who expired during the simulations were 2.40 ± 0.41 for the MTX alone strategy and 1.38 ± 0.92 for the infliximab plus MTX strategy (p<0.0001 by Student's t-test).
The Kaplan-Meier survival curve from the analysis of the 100,000 simulated patients is presented in Figure 5.2. The result of the log-rank test revealed that there was a significant benefit in survival by using infliximab plus MTX over MTX alone (p<0.0001). The hazard ratio associated with infliximab plus MTX was 0.63 (95% confidence interval 0.62 to 0.65, p<0.0001) when compared to MTX alone.

Similar to Wong et al., to compare the benefit of infliximab as predicted by our model with that achieved in the ATTRACT trial [6], we determined the predicted mean HAQ score for the infliximab plus MTX and the MTX alone arms after 54 weeks. Our model predicted a mean difference in improvement in HAQ score of 0.4 for infliximab plus MTX versus MTX alone after 54 weeks, which was identical to that observed in the ATTRACT trial.

5.4.2 Utility and QALY Values

The results of the multiple linear regression analyses of the indirect utility measures and HAQ are presented in Table 5.6. The discounted QALYs produced by using these equations in the Markov model by treatment strategy are provided in Table 5.7. The SF-6D produced the highest estimates of QALYs owing to its high lower bound (0.30) as compared to the other three instruments, which permit utility values less than zero (presumably, those health states valued less than death). The HUI3 produced the largest incremental difference between the infliximab plus MTX and the MTX alone strategies.

5.4.3 Cost-Utility and Probabilistic Analysis

The results of the expected costs, expected QALYs and the incremental cost-utility ratios (with 95% confidence limits generated by the 1st and 2nd order Monte Carlo simulations) of using the infliximab plus MTX over the MTX alone strategy are presented in Table 5.8.
The mean incremental cost per QALY was highest for the utility weightings provided by the SF-6D and lowest for the utility weightings provided by the HUI3, with the HUI2 and EQ-5D utility weightings providing estimates in the middle of these results. The results of the probabilistic analysis are shown graphically on the cost-effectiveness plane in Figure 5.3. Finally, in Figure 5.4, the cost-acceptability curves for each of the indirect utility measures are shown. These results suggest that, at a ceiling ratio of $50,000 per QALY, the HUI3 and EQ-5D would most likely yield results below this figure (100% and 99% probability, respectively) as compared to the HUI2 and SF-6D (8% and 0% probability, respectively). For a ceiling ratio of $100,000 per QALY, the results indicate that estimates obtained with any of the indirect utility estimates would be judged to be cost-effective (100% probability).

5.4.4 Traditional Sensitivity Analysis

Results of the univariate sensitivity analysis of varying discount rates and the cost of infliximab are presented in Table 5.9. The incremental cost-utility ratios are relatively robust to the different discount rates. However, increasing the cost of infliximab causes a large increase in the incremental cost per QALY for all of the indirect utility measures.

5.5 DISCUSSION

Our analysis reveals that there is considerable variation in the incremental cost per QALY when different indirect utility instruments are used as weightings for QALY estimation in economic evaluation of new therapies for RA. It appears that the SF-6D yields the least optimistic while the HUI3 yields the most optimistic incremental cost per QALY gained. These findings were further supplemented by the results from the cost-acceptability curves which showed that, under a ceiling ratio of society's WTP of $50,000 per QALY, the HUI3 and EQ-5D based results had a 100% probability of being under this limit.
Also, under the assumptions of our model, we demonstrated that the addition of infliximab to MTX in patients with refractory RA results in an improvement in both quality of life (regardless of the indirect utility measurement technique employed) and survival. However, this benefit comes at an increased cost which is due mostly to the acquisition cost of the drug.

Recently, a significant amount of attention has been paid to the fact that the available indirect utility instruments could result in drastically different results when applied to the calculation of QALYs in rheumatology [15,16,24,25]. Conner-Spady et al. administered the EQ-5D, the SF-6D, and the HUI3 in a consecutive sample of rheumatology patients (98 patients were included in the analysis; of these, 51% had RA with the remainder having other rheumatological conditions) at baseline, 3, 6 and 12 months [23]. They calculated a theoretical QALY by summing the average QALY by instrument for each time interval (for example, 0 to 3 months). They found that the EQ-5D derived QALYs were larger in those reporting better health than HUI3 or SF-6D derived QALYs. The authors concluded that the three indirect tools they tested were not interchangeable, which could have important ramifications for economic evaluations. The analysis by Luo et al., also conducted in a relatively small sample (n=114) of patients with a variety of rheumatic diseases, concluded that the HUI3 and the EQ-5D performed \"equally well\" in measuring utilities in rheumatic diseases based upon assessment of construct validity [26]. While we agree that both of these instruments have construct validity in RA [15], we agree with Conner-Spady et al. that they are clearly not interchangeable for use in economic analysis as QALY weightings.
Our analysis, which is based on indirect utility assessment using the HUI2, HUI3, SF-6D, and EQ-5D from a sample of 317 patients with rheumatologist-confirmed RA [15,16], is the first attempt at quantifying the differences in indirect utility weightings for QALY measurement in an actual economic evaluation in rheumatoid arthritis. From the incremental cost-utility results, it can be seen that there was a relative difference of over 100% in incremental cost per QALY between the lowest (from the HUI3 derived QALYs) and highest (from the SF-6D derived QALYs) estimates. In addition, the range in incremental cost per QALY generated spans the often-quoted threshold of $50,000 per QALY for programs to be funded, making decision-making difficult. For example, using HUI3 or EQ-5D generated QALYs and the $50,000 per QALY threshold as shown in the cost-acceptability curves (Figure 5.4), decision-makers might determine that infliximab is an economically attractive strategy. However, the same model yields less optimistic findings with the HUI2 and the SF-6D derived QALYs, potentially resulting in a conflicting decision. Thus, given the wide range of incremental cost per QALY generated by using the various instruments, policies regarding new medications should not be based on a single measure wherever possible. At the very least, economic evaluations should either attempt to explore this issue in sensitivity analysis or the choice of outcome measures should be standardized across economic evaluations of rheumatoid arthritis. These findings substantiate the conjecture by Conner-Spady et al. that there might be important implications in economic evaluation of employing different indirect utility instruments [25].

Our analysis builds on those previously described as the utility estimates are derived from a much larger, homogeneous (all patients had RA) sample.
In addition, we utilized all four of the most commonly utilized indirect utility instruments, thus permitting a complete comparison amongst them. Our findings are similar to those of others who have compared the outcomes in economic analyses using different indirect utility measures. For example, a study by Neumann et al. [26] showed that incremental cost-effectiveness estimates for a new drug for Alzheimer's disease were more economically attractive when using the HUI3 as compared to the HUI2.

With respect to the secondary objective of our study, namely to determine the incremental cost per QALY of adding infliximab to MTX in the treatment of refractory RA, we found that the cost per additional discounted QALY of adopting the infliximab strategy under the assumptions of our model ranged from $38,161 to $58,991 depending on the utility weighting method utilized. Our results generally agree with other recently published evaluations of the cost-effectiveness of infliximab in that, under certain conditions, the infliximab strategy could be construed to be economically attractive, although there are important methodological differences [1,8,27]. All of these analyses (including ours) make use of a Markov model based upon states derived from the Health Assessment Questionnaire (HAQ) and utilize the ATTRACT trial as the primary source of clinical data. Wong et al. extended the results from the 54 week follow-up of the ATTRACT trial over the lifetime of RA patients [8]. These authors found that, from the perspective of society, the infliximab strategy had an incremental cost-utility ratio of $9,100 (US) per QALY gained versus MTX monotherapy. Kobelt et al. conducted a similar analysis from the Swedish healthcare perspective over a 1 year and a 2 year period.
[28] These investigators found that, under the assumptions of their model, the incremental cost-utility ratio for the infliximab over the MTX strategy was 3440 euros per QALY (from the Swedish societal perspective) and 34,800 euros per QALY (from the UK societal perspective). Many of the differences in the results across these studies can be explained by basic differences in the construction of the Markov models. Wong et al. utilized a 4x4 transition probability matrix, Kobelt et al. utilized a 6x6 matrix, while ours was 25x25 (with a continuous, underlying process), allowing for increased sensitivity in the relationship between HAQ, cost and QALYs. In addition, the utility weightings employed by each of the studies were different. Wong et al. utilized a visual analogue scale (VAS) with zero equal to death and one equal to perfect health, whereas Kobelt et al. utilized the EQ-5D. In both cases, the sample size from which these values were derived or the methodology used to integrate them into the Markov model was not clear. In addition, while the VAS is a preference-based measure, it is not a choice-based method and thus does not yield a utility. The comparison of these two studies is a perfect illustration of how weightings for QALYs vary considerably across economic analyses even for the same drug therapy.

Our analysis assumes that infliximab is continued for ten years for those who respond to therapy. This assumption is not unreasonable as it is conceivable that these refractory patients will continue to receive a drug therapy that is as costly as infliximab for the duration of their disease if it is successful. However, there are several limitations to our study. Since the available results for the ATTRACT trial only account for 54 weeks of follow-up, the transition probabilities could only be estimated from within this time window.
We have ignored the occurrence, costs and outcomes associated with adverse events in the infliximab arm, which could include infusion reactions, superficial upper respiratory tract infections, demyelination, and serious opportunistic infections. However, in the ATTRACT trial, there was no difference in the number of serious adverse events (requiring hospitalization or judged to be life-threatening) between the two treatment groups [6]. Many adverse events for the tumor necrosis factor alpha inhibitors were identified through post-marketing surveillance. As such, due to either the lack of significant medical intervention required for the management of these adverse events, their low probability of occurrence, their elimination through monitoring that we have built into the model (for example, activation of latent TB is a concern with anti-TNF alpha therapy but we have included screening mechanisms prior to treatment for those in the infliximab strategy) and/or their negligible impact on utility, we determined that the costs incurred to manage these complications and the associated changes in utility would not affect the overall model.

A recent study has shown that patients encountered in clinical practice are quite different from those enrolled in the ATTRACT trial [28]. In fact, of those patients with \"long-term\" RA in a clinical practice, only 5% would have fit the inclusion/exclusion criteria of the ATTRACT trial. This finding may seriously limit the applicability of all of the infliximab cost-utility analyses to the majority of patients encountered in clinical practice. A recent editorial by Wolfe et al. outlines the potential pitfalls of extrapolating randomized controlled trial data in the conduct of long-term cost-effectiveness studies. Further research in this regard is warranted.

Other limitations with the model involve the use of HAQ health states (in 0.125 increments) as the underlying predictor of utility, mortality and work disability.
Wolfe recently outlined the non-linear nature of the HAQ (changes in the HAQ score in the range of 1 to 2 represent much less change in function than changes in the HAQ score in the range from 0 to 1) [30]. While we did not assume linearity in the HAQ in terms of the probability calculation, we made this assumption in the regression models. However, as stated by Wolfe, the HAQ is \"a good, sensitive questionnaire, the best we have to date and one that has stood the test of time.\" [30] Thus, despite its limitations, it is still likely the best method currently available for defining RA health states for transition models such as the one we describe.

The use of different indirect utility measurement methods as weightings for QALYs yields quite different incremental cost-utility ratios in the economic evaluation of new therapies for RA. Thus, these differences should be explored in sensitivity analysis or the choice of outcome measures should be standardized across economic evaluations of rheumatoid arthritis. Infliximab as add-on therapy for patients with RA who are refractory to MTX results in additional years of life even when an adjustment for quality is made. Depending on the method of measurement of utility adopted and the ceiling ratio of society's WTP for a QALY, infliximab may represent good value in certain health care environments.

5.6 REFERENCES

1. Jobanputra P, Barton P, Bryan S, Fry-Smith A, et al. The clinical effectiveness and cost-effectiveness of new drug treatments for rheumatoid arthritis: etanercept and infliximab. Accessed on the Internet June 2002 at http://www.nice.org.uk/pdf/RAAssessmentReport.pdf.
2. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
3. Green C, Brazier J, Deverill M. Valuing health-related quality of life. Pharmacoeconomics 2000;17:151-165.
4.
Coons SJ, Rao S, Keininger DL, Hays RD. A comparative review of generic quality-of-life instruments. Pharmacoeconomics 2000;17:13-35.
5. Kopec JA, Willison KD. A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
6. Maini R, St Clair EW, Breedveld F, Furst D, Kalden J, Weisman M, et al. for the ATTRACT Study Group. Infliximab (chimeric anti-tumour necrosis factor alpha monoclonal antibody) versus placebo in rheumatoid arthritis patients receiving concomitant methotrexate: a randomized phase III trial. Lancet 1999;354:1932-1939.
7. Canadian Coordinating Office of Health Technology Assessment. Guidelines for the economic evaluation of pharmaceuticals: Canada. 2nd Edition. Ottawa: Canadian Coordinating Office of Health Technology Assessment (CCOHTA); 1997.
8. Wong J, Singh G, Kavanaugh A. Estimating the cost-effectiveness of 54 weeks of infliximab for rheumatoid arthritis. Am J Med 2002;113:400-408.
9. Malone DC, Ortmeier BG. Cost effectiveness analysis of etanercept (Enbrel) versus infliximab (Remicade) in the treatment of rheumatoid arthritis patients (abstract). Arthritis Rheum 2002;46(suppl.):s95.
10. Wolfe F, Michaud K, Gefeller O, Choi HK. Predicting mortality in patients with rheumatoid arthritis. Arthritis Rheum 2003;48:1530-1542.
11. Lajas C, Abasolo L, Bellajdel B, Hernandez-Garcia C, Carmona L, Vargas E, Lazaro P, Jover JA. Costs and predictors of costs in rheumatoid arthritis: a prevalence-based study. Arthritis Rheum 2003;49:64-70.
12. Ethgen O, Kahler KH, Kong SX, Reginster JY, Wolfe F. The effect of health related quality of life on reported use of health care resources in patients with osteoarthritis and rheumatoid arthritis: a longitudinal analysis. J Rheumatol 2002;29:1147-1155.
13. Choi HK, Hernan MA, Seeger JD, Robins JM, Wolfe F.
MTX therapy and mortality in patients with rheumatoid arthritis: a prospective study. Lancet 2002;359:1173-1177.
14. Marra CA, Woolcott JC, Shojania K, Offer R, Kopec J, Brazier JE, Esdaile JM, Anis AH. A comparison of generic, indirect utility measures (the HUI2, HUI3, SF-6D, and the EQ-5D) and disease-specific instruments (the RAQoL and the HAQ) in rheumatoid arthritis. Soc Sci Med (submitted).
15. Marra CA, Esdaile JM, Guh D, Kopec JA, Brazier JE, Chalmers A, Koehler B, Anis AH. A comparison of four indirect methods of assessing utility values in rheumatoid arthritis. Med Care (submitted).
16. Consumer Price Index. Statistics Canada. Retrieved January 10, 2003 from www.statcan.ca/english/subjects/cpi/cpi-en.htm.
17. Tsakonas E, Fitzgerald AA, Fitzcharles MA, Cividino A, Thorne JC, et al. Consequences of delayed therapy with second-line agents in rheumatoid arthritis: a 3 year follow-up on the hydroxychloroquine in early rheumatoid arthritis (HERA) study. J Rheumatol 2000;27:623-629.
18. Lacaille D, Anis A, Guh D, Esdaile J. Assessing the quality of care for RA at a population level. Arthritis Rheum 2002;46:S626.
19. Clarke AE, Zowall H, Levinton C, Assimakopoulos H, Sibley JT, et al. Direct and indirect medical costs incurred by Canadian patients with rheumatoid arthritis: A 12 year study. J Rheumatology 1997;24:1051-1060.
20. Anis AH, Sun HY, Gignac M. The indirect costs of illness disability in an incident cohort of arthritis patients. Arthritis Rheum 2002;46:s91.
21. Number of income recipients and their average income in constant dollars by sex and age groups. Statistics Canada. Retrieved December 8, 2001, from http://www.statcan.ca/english/census96/
22. Briggs AH, O'Brien BJ, Blackhouse G. Thinking outside the box: Recent advances in the analysis and presentation of uncertainty in cost-effectiveness studies. Annu Rev Public Health 2002;23:377-401.
Goeree R, O 'Br ien B J , Blackhouse G , Marshall J, Briggs A , Lad R. Cost-effectiveness and cost-utility o f long-term management strategies for heartburn. Value in Health 2002;5:312-324. Conner-Spady B , Suarez-Almazor M E . Variation in the estimation o f quality-adjusted life-years by different preference-based instruments. M e d Care 2003;41:791-801. 160 Luo N , Ling-Huo C , K o k - Y o n g F , Dow-Rhoon K , Swee-Cheng N , et al.. A comparison of the EuroQoI-5D and the Health Utilities Index Mark 3 in patients with rheumatic disease. J Rheumatol 2003;30:2268-2274. Neumann PJ , Sandberg E A , Arak i SS, Kuntz K M , Feeny D , Weinstein M C . A comparison o f HUI2 and H U D utility scores in Alzheimer's disease. M e d Decis Making 2000;20:413-422. Kobelt G , Jonsson L , Young A , Eberhardt K . The cost-effectiveness o f infliximab (Remicaide®) in the treatment of rheumatoid arthritis in Sweden and the United Kingdom based on the A T T R A C T study. Rheumatology 2003:42:326-335. Sokka T, Pincus T, Eligibi l i ty of patients in routine care for major clinical trials of anti-tumor necrosis factor alpha agents in rheumatoid arthritis. Arthritis Rheum 2003;48:313-318. Wolfe F, Michaud K , Pincus T. Do rheumatology cost-effectiveness analyses make sense? Rheumatol 2004;43:4-6. Wolfe F. The psychometrics of functional status questionnaires: room for improvement. J Rheumatol 2002; 29:865-868. 
TABLE 5.1: OBSERVED TRANSITION PROBABILITY MATRICES FOR METHOTREXATE FROM THE ATTRACT TRIAL (FROM WEEK 30 TO WEEK 54)

HAQ SCORE GROUPS*    0        0.1-1.0   1.1-2.0   2.1-3.0
0                    0.5      0.143     0         0
0.1-1.0              0.5      0.524     0.063     0
1.1-2.0              0        0.333     0.781     0.2
2.1-3.0              0        0         0.156     0.8

*HAQ transition states defined as 0 = no impairment; 0.1 to 1 = mild impairment; 1.1 to 2 = moderate impairment; 2.1 to 3 = advanced impairment

TABLE 5.2: OBSERVED TRANSITION PROBABILITY MATRICES FOR INFLIXIMAB FROM THE ATTRACT TRIAL (FROM WEEK 30 TO WEEK 54)

HAQ SCORE GROUPS*    0        0.1-1.0   1.1-2.0   2.1-3.0
0                    0.679    0.079     0         0
0.1-1.0              0.286    0.822     0.158     0
1.1-2.0              0.035    0.089     0.806     0.342
2.1-3.0              0        0.01      0.036     0.658

*HAQ transition states defined as 0 = no impairment; 0.1 to 1 = mild impairment; 1.1 to 2 = moderate impairment; 2.1 to 3 = advanced impairment

TABLE 5.3: CALCULATED WEEKLY TRANSITION PROBABILITY MATRIX FOR METHOTREXATE

0.922 0.084 0.006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0.074 0.805 0.109 0.01 0.001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0.005 0.103 0.735 0.136 0.016 0.002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0.009 0.131 0.665 0.158 0.023 0.003 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0.001 0.016 0.16 0.601 0.173 0.03 0.004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0.002 0.025 0.183 0.548 0.181 0.034 0.005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0.003 0.035 0.201 0.511 0.184 0.036 0.005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0.005 0.044 0.214 0.489 0.183 0.036 0.004 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0.001 0.007 0.052 0.223 0.483 0.178 0.033 0.004 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0.001 0.009 0.056 0.228 0.493 0.171 0.028 0.002 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0.001 0.009 0.056 0.229 0.518 0.16 0.021 0.001 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0.001 0.009 0.052 0.225 0.559 0.145 0.015 0.001 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0.001 0.007 0.044 0.213 0.614 0.125 0.009 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0.001 0.005 0.034 0.192 0.68 0.101 0.005 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0.003 0.024 0.164 0.751 0.077 0.003 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.014 0.13 0.819 0.054 0.001 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.008 0.095 0.878 0.035 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.004 0.065 0.923 0.021 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.04 0.955 0.012 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.024 0.975 0.006 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.013 0.987 0.003 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.007 0.994 0.001 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.003 0.997 0.001 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.999 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 1

The upper left hand corner represents the probability of transition from HAQ 0 to HAQ 0 in a one week time frame (p1,1). The rows represent the probability of transition from HAQ states in increments of 0.125 (for example, the second row, first column represents the probability of a transition from HAQ 0.125 to HAQ 0 [p2,1]). Similarly, the columns are in 0.125 increments in HAQ (for example, the first row represents probabilities of transition to HAQ 0, the second row represents transitions to HAQ 0.125). The bottom right hand corner represents the probability of transitioning from HAQ 3.0 (p25,25).

TABLE 5.4: CALCULATED WEEKLY TRANSITION PROBABILITY MATRIX FOR INFLIXIMAB

[illegible] 0.029 0.001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0.034 0.928 0.037 0.001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0.001 0.042 0.909 0.04? 0.002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0.001 0.052 0.887 0.059 0.002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0.002 0.062 0.863 0.072 0.004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0.003 0.073 0.837 0.086 0.005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0.004 0.084 0.81 0.1 0.007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0.005 0.094 0.783 0.114 0.01 0.001 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0.006 0.103 0.757 0.128 0.012 0.001 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0.008 0.112 0.733 0.14 0.015 0.001 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0.009 0.118 0.712 0.152 0.018 0.002 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0.001 0.01 0.123 0.694 0.161 0.02 0.002 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0.001 0.011 0.126 0.68 0.169 0.022 0.002 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0.001 0.012 0.127 0.67 0.175 0.024 0.002 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0.001 0.012 0.127 0.664 0.178 0.025 0.002 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.012 0.125 0.663 0.18 0.025 0.002 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.012 0.121 0.667 0.179 0.024 0.002 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.011 0.116 0.675 0.176 0.022 0.002 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.01 0.11 0.688 0.171 0.02 0.001 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.008 0.103 0.704 0.163 0.017 0.001 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.007 0.095 0.725 0.153 0.015 0.001 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.006 0.086 0.748 0.142 0.012 0.00?
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.004 0.076 0.774 0.128 0.00?
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.003 0.066 0.801 0.11
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.002 0.058 0.87

The upper left hand corner represents the probability of transition from HAQ 0 to HAQ 0 in a one week time frame (p1,1). The rows represent the probability of transition from HAQ states in increments of 0.125 (for example, the second row, first column represents the probability of a transition from HAQ 0.125 to HAQ 0 [p2,1]). Similarly, the columns are in 0.125 increments in HAQ (for example, the first row represents probabilities of transition to HAQ 0, the second row represents transitions to HAQ 0.125). The bottom right hand corner represents the probability of transitioning from HAQ 3.0 (p25,25).

TABLE 5.5: UNIT COSTS (IN CANADIAN DOLLARS), OTHER PARAMETERS AND EQUATIONS IN THE MARKOV MODEL

Model Parameter                  Parameter / Equation
Infliximab*                      $264.89/week
Methotrexate 15 mg/week          $6.05/week
Pharmacist preparation¥          $6.59/week
Nursing monitoring¥              $19.02/week
Laboratory monitoring‡           $5.66/week
Chest X-ray†                     $84.62 once
PPD skin test†                   $8.42 once
Rheumatology clinic visit†       $72.62 once
Proportion females               78%
Mean age, yrs/100 (SD)           0.52 (0.09)
6-month direct medical costs     exp(6.49 - 1.18*age + 0.15*(female) + 0.39*HAQ + 0.5*1.66)
Annual indirect costs            (1 - workcapacity) * (47,085*(male) + 33,774*(female))
Work capacity                    1.09 - (0.18*HAQ)
Annual death rate                exp(-11.42 + (17.05*age) - (0.26*HAQ) - (7.95*age^2) + (0.29*HAQ^2))

* Based on an assumption that 67% receive 2 x 100 mg vials and 33% receive 3 x 100 mg vials
- not applicable
¥ For the preparation, administration and monitoring of infliximab infusion
‡ For the monitoring of anti-nuclear antibodies and anti-DNA antibodies twice a year
† For the screening of latent tuberculosis prior to infliximab therapy
In the equations, \"male\" and \"female\" refer to the proportion of that gender as reported in the ATTRACT trial; \"HAQ\" refers to the disability index from 0 to 3.0; and \"age\" refers to the age in years divided by 100.
TABLE 5.6: MULTIPLE LINEAR REGRESSION MODELS OF THE INDIRECT UTILITY MEASURES

Dependent Variable     Independent Variable   Beta Coefficient   P-Value    Model R²
HUI2 Utility Score     Intercept              0.88               <0.0001    0.43
                       HAQ                    -0.17              <0.0001
                       Age*                   0.03               0.71
HUI3 Utility Score     Intercept              0.83               <0.0001    0.59
                       HAQ                    -0.29              <0.0001
                       Age*                   0.01               0.55
SF-6D Utility Score    Intercept              0.69               <0.0001    0.54
                       HAQ                    -0.13              <0.0001
                       Age*                   0.13               0.002
EQ-5D Utility Score    Intercept              0.72               <0.0001    0.38
                       HAQ                    -0.20              <0.0001
                       Age*                   0.25               0.008

All models based on a sample size of 317 respondents
*Age is age in years divided by 100

TABLE 5.7: DISCOUNTED QALYS GENERATED BY INDIRECT UTILITY METHOD IN THE MARKOV MODEL

Discounted* Mean QALY ± SD
Indirect Utility Method   Infliximab plus MTX Strategy   MTX Alone Strategy   Mean Difference ± SD
HUI2                      5.38 ± 0.89                    4.12 ± 0.63          1.27 ± 0.43
HUI3                      3.74 ± 1.27                    1.74 ± 0.72          1.99 ± 0.64
SF-6D                     4.74 ± 0.69                    3.75 ± 0.50          0.98 ± 0.35
EQ-5D                     4.72 ± 0.94                    3.30 ± 0.58          1.43 ± 0.47

*Discounted at 3% per annum

TABLE 5.8: EXPECTED COSTS AND INCREMENTAL COST-UTILITY RATIOS GENERATED BY THE INDIRECT UTILITY METHODS

Strategies                    Cost $ per Patient*   95% Confidence Limits†
(1) MTX alone                 $133,737              $131,007 - $136,466
(2) Infliximab plus MTX       $199,729              $197,747 - $201,713
Difference (2) - (1)          $65,993               $64,412 - $67,573

Incremental Cost per QALY*
HUI2-QALY                     $52,078               $48,850 - $55,528
HUI3-QALY                     $33,092               $30,887 - $35,436
SF-6D-QALY                    $67,005               $62,773 - $71,540
EQ-5D-QALY                    $46,159               $43,086 - $49,438

* Discounted at 3% per annum
† Generated by application of Fieller's theorem

TABLE 5.9: UNIVARIATE SENSITIVITY ANALYSIS - INCREMENTAL COST-UTILITY RATIO (INCREMENTAL COST PER QALY) BY INDIRECT UTILITY METHOD

              Discount 0%   Discount 5%   Discount 7%   Infliximab Cost* $200 to $500 per week
HUI2-QALY     $56,318       $51,302       $53,762       $40,918 to $147,454
HUI3-QALY     $39,190       $34,851       $35,951       $27,485 to $98,207
SF-6D-QALY    $75,142       $84,902       $77,483       $46,616 to $169,237
EQ-5D-QALY    $53,518       $47,830       $49,609       $37,084 to $138,098

* At the base case discount rate (3%) for both costs and QALYs

FIGURE 5.1: A SCHEMATIC REPRESENTATION OF THE MARKOV, HAQ-BASED MODEL USED FOR THE COST-EFFECTIVENESS ANALYSIS
\"STAGE\" REFERS TO THE MARKOV TRANSITION CYCLE. FOR SIMPLICITY, ONLY TRANSITIONS FROM HAQ 0 ARE SHOWN.

FIGURE 5.2: KAPLAN-MEIER SURVIVAL CURVES FROM THE 100,000 MONTE CARLO SIMULATIONS
[Figure: horizontal axis, Follow-up Time (Years), 0 to 12]
SURVIVAL CURVES FOR THOSE ON INFLIXIMAB PLUS MTX (UPPER BLUE LINE) AND MTX ALONE (LOWER RED LINE). THE DIAMONDS (◊) REPRESENT THOSE WHO WERE CENSORED.

FIGURE 5.3: FIELLER'S THEOREM CONFIDENCE LIMITS PLOTTED ON THE COST-EFFECTIVENESS PLANE
[Figure: horizontal axis, QALY difference (Δ QALY)]
Each ellipse within each utility-defined QALY covers 5%, 50% and 95% of integrated joint density between cost and the QALY differences. The lines are the upper and lower confidence limits using Fieller's theorem.
As noted by Briggs et al. (ref. 22), the \"wedge\" defined by Fieller's confidence limits falls inside the 95% ellipse.

FIGURE 5.4: COST-UTILITY (or COST-EFFECTIVENESS) ACCEPTABILITY CURVES FOR EACH INDIRECT UTILITY MEASURE
[Figure: Cost-Effectiveness Acceptability Curve (Fieller's Theorem); horizontal axis, Value of ceiling ratio ($), 30,000 to 70,000]

APPENDIX I: MARKOV MODEL

To conduct a cost-effectiveness analysis of infliximab plus methotrexate (MTX) compared to MTX alone in the treatment of severe rheumatoid arthritis using the ATTRACT trial as a source of clinical outcomes, we defined a Markov model with 26 health states: 25 states based upon increments of 0.125 in a Health Assessment Questionnaire (HAQ) score from 0 (no disability) to 3.0 (worst level of disability) and one absorbing state, death. The length of our Markov cycle was one week. An important assumption made was that patients in a given initial health state have a constant probability per unit time of making a transition into any other given state, independently of how much time has already passed in the initial state. This assumption, referred to as the Markov property, was essential in modeling the prognosis with a finite number of states. We also assumed that, given that death does not occur, the intermediate transition probabilities from one HAQ state to another are gender and age independent and therefore constant through time. The probability of all-cause mortality in a unit of time for people with rheumatoid arthritis, however, is a function of the HAQ state and age, which we have denoted as p(h,a). As mentioned, the final transition probabilities were the combination of the intermediate transition probabilities and the mortality probability. The transition matrix below describes the transition probabilities in the Markov model (also shown in Tables 5.3 and 5.4).

T(h,a) =
| p1,1(1-p(h,a))   p1,2(1-p(h,a))   ...   p(h,a) |
| p2,1(1-p(h,a))   p2,2(1-p(h,a))   ...   p(h,a) |
| ...                                             |
| 0                0                ...   1      |

Where
p1,1 = HAQ 0 to HAQ 0
p1,2 = HAQ 0 to HAQ 0.125
p2,1 = HAQ 0.125 to HAQ 0
p25,25 = HAQ 3.0 to HAQ 3.0
pi,26, i = 1, 2, 3, ..., 26 = state i to death
p(h,a) = all-cause, age and HAQ-dependent mortality experienced by people with rheumatoid arthritis

The utility accrued for cycle t, referred to as the cycle sum, was calculated by:

Cycle Sum = Σ (i = 1, 2, ..., 26) Pi(t) × Ui

where Pi(t) was the distribution of patients in the 26 health states at cycle t and Ui was the utility associated with state i. Simulations were run for 10 years (520 cycles). The cycle sum was then added to a running total, the cumulative utility, which is what was required for cost-utility analysis.

The distribution of patients at each cycle (i.e. one week) was calculated using the transition matrices T(h,a). Given the distribution of patients at age t0 among the 26 health states (p1(t0), p2(t0), p3(t0), ..., p26(t0)), the distribution of patients at age t was given by:

(p1(t), p2(t), p3(t), ..., p26(t)) = (p1(t0), p2(t0), p3(t0), ..., p26(t0)) × Π (age = t0+1 to t) T(HAQ, age)

where the right hand side of the equation was the matrix product of the row vector (p1(t0), p2(t0), p3(t0), ..., p26(t0)) and t - t0 transition matrices.
APPENDIX II: C-CODE USED TO RUN THE MARKOV MODEL

/* — The following code was compiled under lcc-win32 — */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include \"uniform.c\"
#include \"ltqnorm.c\"

#define ranunif genrand_real3
#define setseed init_genrand
#define str(x) #x
#define xstr(x) str(x)

//changeable:
#define RESTART
#define MM 25
#define MM1 24
#define ENDSTAGE 520 /* 520 weeks = 10 years */
#define SEED 14357
#define RATE 0.03 /* discount rate */
#define BLKN 1000 /* for 2nd order Monte Carlo - # of simulations */
#define BLKZ 50 /* for 1st order Monte Carlo - # of patients simulated per 2nd order Monte Carlo */
#define CFLX 500 /* for univariate sensitivity analysis = cost of infliximab up to 500 dollars */
#define MCCOUT sim500.txt

int haqmcc(void);

// chooses the first element of a vector of length mm that equals or exceeds ru
int rchoose(double *xx, int mm, double ru)
{
    int jj;
    int ii;
    jj = mm-1;
    for (ii = 0; ii <= mm-1; ii++)
        if (ru < xx[ii]) { jj = ii; break; }
    return jj;
}

// the function that does the simulations:
int haqmcc(void)
{
    //data input
    char *rnames=\"block outtime age0 gender dead haqi0 haqi1 finaldose rcost rqaly1 rqaly2 rqaly3 rqaly4 inflix\";
    FILE *fpt, *fout;
    double xcol=0; //intermediate, now misnamed
    double **ctmtx, **ctflx, **ctmat, *chaq0, *chaq;
    int nread=0;
    int ii, jj, stage, iter, block;
    int haqi, haqi0, trak, dose0, dose, outtime, gender, dead;
    double *vhaq=(double *)malloc(MM*sizeof(double));
    unsigned long s=SEED;
    double age, age0, haq, lpdying, rate, dfact;
    double directcost, indirectcost, wcage, cinflix, cmethotrexate=6.05;
    double cost, rcost;
    double log52=log(52);
    double *workcapacity=(double *)malloc(MM*sizeof(double));
    double *cinfliximab=(double *)malloc(MM*sizeof(double));
    double *bchoices=(double *)malloc(2*sizeof(double));
    double util[4]={0,0,0,0};
    double rqaly[4]={0,0,0,0};

    //read matrix ctmtx in
    //ctmtx is matrix with columns that are cumulated versions of the columns of the mtx transition matrix
    ctmtx=(double **)malloc(MM*sizeof(double *));
    ctmtx[0]=(double *)malloc(MM*MM*sizeof(double));
    for(ii=1;ii<=24;ii++) ctmtx[ii]=ctmtx[ii-1]+MM;
    if((fpt=fopen(\"ctmtx.txt\",\"r\"))==NULL) {
        printf(\"%s\",\"Error opening file ctmtx for reading\");
        return 1; // stop here rather than read from a NULL stream
    }
    for(ii=0;ii<=24;ii++) {
        for(jj=0;jj<=24;jj++){
            nread=nread+fscanf(fpt,\"%le\", &xcol);
            ctmtx[jj][ii]=xcol; //column major order
        }
    }
    fclose(fpt);

    //read matrix ctflx in
    //ctflx is formed from the transition matrix for the infliximab group
    nread=0;
    ctflx=(double **)malloc(MM*sizeof(double *));
    ctflx[0]=(double *)malloc(MM*MM*sizeof(double));
    for(ii=1;ii<=24;ii++) ctflx[ii]=ctflx[ii-1]+MM;
    if((fpt=fopen(\"ctflx.txt\",\"r\"))==NULL) {
        printf(\"%s\",\"Error opening file ctflx for reading\");
        return 1;
    }
    for(ii=0;ii<=24;ii++) {
        for(jj=0;jj<=24;jj++){
            nread=nread+fscanf(fpt,\"%le\", &xcol);
            ctflx[jj][ii]=xcol; //column order
        }
    }
    fclose(fpt);

    //read vector chaq0 in
    //baseline cumulative HAQ distribution
    nread=0;
    chaq0=(double *)malloc(MM*sizeof(double));
    if((fpt=fopen(\"newchaq0.txt\",\"r\"))==NULL) {
        printf(\"%s\",\"Error opening file chaq0 for reading\");
        return 1;
    }
    for(ii=0;ii<=24;ii++) {
        nread=nread+fscanf(fpt,\"%le\", &xcol);
        chaq0[ii]=xcol;
    }
    fclose(fpt);

#ifdef RESTART
    fout=fopen(xstr(MCCOUT),\"w\");
    fprintf(fout,\"%s\",rnames);
#else
    fout=fopen(xstr(MCCOUT),\"a\");
#endif /* RESTART */

    setseed(s);
    bchoices[1]=1.0;
    for(ii=0;ii<=MM1;ii++){
        vhaq[ii]=(double)ii/8; // possible haq values
        workcapacity[ii]=1.09132 - 0.18404*vhaq[ii];
        workcapacity[ii]= (workcapacity[ii]>1) ? 1 : workcapacity[ii]; // correct in expected calc.
        cinfliximab[ii]= (CFLX*0.67)+(2*CFLX*0.33) + 25.50; // double vial use
    }

    for(block=1;block<=BLKN;block++){
        //constants that are not altered during the Markov process can go here.
        age0 = .52+0.092*ltqnorm(ranunif());
        bchoices[0]=1-0.2243;
        gender=rchoose(bchoices,2,ranunif());
        haqi0=rchoose(chaq0,MM,ranunif());
        iter=1;
        for(dose0=0;dose0<=1;dose0++){
            for(iter=1;iter<=BLKZ;iter++) {
                // initial values for quantities that do change, go here.
                haqi=haqi0;
                haq=vhaq[haqi0];
                dead=0;
                dose=dose0;
                age=age0;
                ctmat=(dose==0) ? ctmtx : ctflx;
                rcost=0;
                for(ii=0;ii<4;ii++) rqaly[ii]=0;
                outtime=0;
                for(stage=0;stage < ENDSTAGE ;stage++) {
                    lpdying=-11.4184+(17.0462*age)-(0.2582*haq)-(7.9548*age*age)+(0.2903*haq*haq)-log52;
                    dead=(ranunif() [...] =1) { //if dose==0, infliximab is permanently discontinued, so no need to track
                        if(haq>=2.0) trak=trak+1; else trak=0;
                        if(trak==12) {dose=(dose+1)%3;cinflix=dose*cinfliximab[haqi];trak=0;}
                    }
                    if(dose<1) ctmat=ctmtx; // \"else\" would give wrong result. Would fail to change cmat.
                    age=age + 1.0/5200; // one week in units of years/100; 1.0 forces floating-point division
                    chaq=ctmat[haqi];
                    haqi=rchoose(chaq,MM,ranunif());
                    haq=vhaq[haqi];
                    outtime=stage;
                }
                fprintf(fout,\"\\n%d %d %f %d %d %d %d %d %ld %f %f %f %f %d\", block, outtime, age0, gender, dead, haqi0, haqi, dose, (long)rcost, rqaly[0], rqaly[1], rqaly[2], rqaly[3], dose0);
                //printf(\"%f\",clock()/CLOCKS_PER_SEC);
            }
        }
        if(block % 50 == 0) printf(\"\\n%d\",block);
    }
    fprintf(fout,\"\\n\");
    fclose(fout);
    return 0;
}

int main(void){
    int rtv;
    time_t t1,t2,tt;
    t1=time(&tt);
    rtv=haqmcc();
    t2=time(&tt);
    printf(\"\\n\\n%f\",difftime(t2,t1));
    return rtv;
}

/*
 * For calculation of ltqnorm
 * Lower tail quantile for standard normal distribution function.
 * This function returns an approximation of the inverse cumulative
 * standard normal distribution function. I.e., given P, it returns
 * an approximation to the X satisfying P = Pr{Z <= X} where Z is a
 * random variable from the standard normal distribution.
 * The algorithm uses a minimax approximation by rational functions
 * and the result has a relative error whose absolute value is less
 * than 1.15e-9.
 *
 * Author: Peter J. Acklam
 * Time-stamp: 2002-06-09 18:45:44+0200
 * E-mail: jacklam@math.uio.no
 * WWW URL: http://www.math.uio.no/~jacklam
 * C implementation adapted from Peter's Perl version
 */
#include <math.h>
#include <errno.h>

/* Coefficients in rational approximations. */
static const double a[] = { -3.969683028665376e+01, 2.209460984245205e+02, -2.759285104469687e+02, 1.383577518672690e+02, -3.066479806614716e+01, 2.506628277459239e+00 };
static const double b[] = { -5.447609879822406e+01, 1.615858368580409e+02, -1.556989798598866e+02, 6.680131188771972e+01, -1.328068155288572e+01 };
static const double c[] = { -7.784894002430293e-03, -3.223964580411365e-01, -2.400758277161838e+00, -2.549732539343734e+00, 4.374664141464968e+00, 2.938163982698783e+00 };
static const double d[] = { 7.784695709041462e-03, 3.224671290700398e-01, 2.445134137142996e+00, 3.754408661907416e+00 };

#define LOW 0.02425
#define HIGH 0.97575

double ltqnorm(double p)
{
    double q, r;
    errno = 0;
    if (p < 0 || p > 1) {
        errno = EDOM;
        return 0.0;
    } else if (p == 0) {
        errno = ERANGE;
        return -HUGE_VAL /* minus \"infinity\" */;
    } else if (p == 1) {
        errno = ERANGE;
        return HUGE_VAL /* \"infinity\" */;
    } else if (p < LOW) {
        /* Rational approximation for lower region */
        q = sqrt(-2*log(p));
        return (((((c[0]*q+c[1])*q+c[2])*q+c[3])*q+c[4])*q+c[5]) / ((((d[0]*q+d[1])*q+d[2])*q+d[3])*q+1);
    } else if (p > HIGH) {
        /* Rational approximation for upper region */
        q = sqrt(-2*log(1-p));
        return -(((((c[0]*q+c[1])*q+c[2])*q+c[3])*q+c[4])*q+c[5]) / ((((d[0]*q+d[1])*q+d[2])*q+d[3])*q+1);
    } else {
        /* Rational approximation for central region */
        q = p-0.5;
        r = q*q;
        return (((((a[0]*r+a[1])*r+a[2])*r+a[3])*r+a[4])*r+a[5])*q / (((((b[0]*r+b[1])*r+b[2])*r+b[3])*r+b[4])*r+1);
    }
}

/* calculates uniform
A C-program for MT19937, with initialization improved 2002/1/26.
Coded by Takuji Nishimura and Makoto Matsumoto.
Before using, initialize the state by using init_genrand(seed) or init_by_array(init_key, key_length).
Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura, All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any feedback is very welcome.
http://www.math.keio.ac.jp/matumoto/emt.html
email: matumoto@math.keio.ac.jp
*/

#include <stdio.h>

/* Period parameters */
#define N 624
#define M 397
#define MATRIX_A 0x9908b0dfUL   /* constant vector a */
#define UPPER_MASK 0x80000000UL /* most significant w-r bits */
#define LOWER_MASK 0x7fffffffUL /* least significant r bits */

static unsigned long mt[N]; /* the array for the state vector */
static int mti=N+1; /* mti==N+1 means mt[N] is not initialized */

/* initializes mt[N] with a seed */
void init_genrand(unsigned long s)
{
    mt[0]= s & 0xffffffffUL;
    for (mti=1; mti<N; mti++) {
        mt[mti] = (1812433253UL * (mt[mti-1] ^ (mt[mti-1] >> 30)) + mti);
        mt[mti] &= 0xffffffffUL; /* for >32 bit machines */
    }
}

/* initialize by an array with array-length */
/* init_key is the array for initializing keys */
/* key_length is its length */
void init_by_array(init_key, key_length)
unsigned long init_key[], key_length;
{
    int i, j, k;
    init_genrand(19650218UL);
    i=1; j=0;
    k = (N>key_length ? N : key_length);
    for (; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1664525UL)) + init_key[j] + j; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++; j++;
        if (i>=N) { mt[0] = mt[N-1]; i=1; }
        if (j>=key_length) j=0;
    }
    for (k=N-1; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL)) - i; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++;
        if (i>=N) { mt[0] = mt[N-1]; i=1; }
    }
    mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
}

/* generates a random number on [0,0xffffffff]-interval */
unsigned long genrand_int32(void)
{
    unsigned long y;
    static unsigned long mag01[2]={0x0UL, MATRIX_A};
    /* mag01[x] = x * MATRIX_A for x=0,1 */
    if (mti >= N) { /* generate N words at one time */
        int kk;
        if (mti == N+1) /* if init_genrand() has not been called, */
            init_genrand(5489UL); /* a default initial seed is used */
        for (kk=0;kk
[...]

[...] (values > 0.50 or < -0.50 were considered to be strong, while values from -0.49 to -0.30 or from 0.30 to 0.49 were considered moderate and values between -0.30 and 0.30 were considered to be weak).
2 2 The dependent variables for all other analyses were the global utility scores for the SF-6D, the HUI2 , the H U D , the E Q - 5 D and the overall R A Q o L and H A Q disability score. Univariate associations between the dependent variables and demographic characteristics (age, and gender), disease severity measures (duration of R A , self-reported pain, swollen/tender joint count, self-reported severity and control), and measures of SES were assessed using simple linear regression. Comparisons of self-reported health and R A severity measures across categories of SES were conducted. Statistical comparisons were made using either A N O V A , Student's t-test or %2 test, where appropriate. For the primary analysis, ordinary least squares (OLS) regression was used to adjust for R A clinical measures and then to assess the relationship between SES and the dependent variables. Each SES variable was modeled separately. A l l two-way interactions between SES and R A clinical measures were also tested in the multiple regression models. Mode l fit was 193 assessed using adjusted R and standardized residuals were plotted against standardized predicted scores to assess each model for homoscedasticity. To account for the possibility that there is an inter-relationship between the dependent variables (the generic or disease-specific H R Q L or functional status score) and annual household income, two-stage least squares (TSLS) regression was used. For T S L S , the \"problematic\" predictor variable (in our case, income) must be continuous rather than categorical. Therefore, for the T S L S analysis, the self-reported annual income variable was converted to a measure in increments of $10,000.00 (as reported on the original questionnaire). Instrumental variables are not influenced by others in the model but have influence on the variable of interest. 
Thus, for the first stage of the regression, the instrumental variables used in our analysis to predict annual household income were marital status, number of people in the household, and educational status which were all highly correlated with income but lowly correlated with H R Q L or functional status measures. In the second stage, the predicted values of income were regressed on the generic H R Q L scores (HUI2, H U B , SF-6D, or EQ-5D), the disease-specific H R Q L scores ( R A Q o L ) and the H A Q scores to yield unbiased parameter estimates. These parameters were compared to O L S regression coefficients (from models using the same annual household income variable) to determine how closely they matched. 6.4 RESULTS Characteristics of the study participants are presented in Table 6.1. The average age of the sample was 61.5 ± 25.9 years, there were more women (79%) and males tended to be 194 older (66 vs. 60 years, p=0.004). The mean number of years since being diagnosed with R A was 13.9 ± 11.4. Fifty eight (19%) of the 313 participants did not report their annual household income and were therefore excluded from the income analysis at the individual level. To determine i f this subgroup was different from the group that reported annual income levels, we compared the values for the variables presented in Table 6.1 between these two groups. There were no differences between these variables in the two subgroups adding confidence that no bias was introduced through these missing data. The sample was well distributed across levels of SES (Table 6.2). Although 50 (16%) had an annual household income below $20,000, 47 (15%) reported incomes over $70,000, 15 of which exceeded $100,000. 
There was a significant relationship between self-reported annual income and number of household members with 56% of those reporting income of less <20K per year being the only household member compared with 17% and 9% of those reporting annual household incomes of 20 - 5 OK and >50K per year, respectively (pO.OOOl). The mean number of household members was also significantly lower for those with an annual household income reported as <20K (mean 1.7, standard deviation 1.1), compared with those reporting annual household incomes of >50K per year (mean 2.6, standard deviation 1.4, pO.OOOl). Wi th respect to self-reported education, 114 (51%) completed at least one year of post secondary education (median 1.0) and 52 (17%) received at least a bachelors degree; 32 (10%>) completed at least five years of post-secondary education. The sample was also heterogeneous for the contextual measures (neighbourhood median income, prevalence of having received at least a bachelor's degree, and percent neighbourhood unemployment), with participants residing in neighbourhoods with unemployment rates varying from 1% to 195 18% (median = 9%) and the prevalence of a bachelor's degree ranging from 2% to 52% (median=T2%). The self-reported measures of SES (annual income and education) were moderately correlated (Spearman's rho 0.33, pO.0001) . However, the contextual measures (tended to be more highly correlated amongst themselves with correlation coefficients ranging from 0.56 to 0.73 (all pO.0001) . Correlations between self-reported and contextual measures were mostly low with coefficients ranging from 0.11 to 0.31 (all p<0.05). Unadjusted associations between the generic H R Q L measures (the SF-6D, the H U B , the HUI2 , and the EQ-5D) , or the R A Q o L and the H A Q , and demographic, SES and R A severity variables are presented in Tables 6.3, 6.4 and 6.5, respectively. 
There were no associations between age and any of the generic health related quality of life measures; however, there was a significant positive association with the H A Q disability index (p=0.02). On average, men had significantly better generic and disease-specific quality o f life and functional status as measured by most of the instruments. Most of the R A severity variables were significant across all of the H R Q L instruments and the H A Q . N o associations were found between contextual measures of SES (neighbourhood median income, prevalence of bachelor's degrees, and proportion of neighbourhood unemployed) and any of the generic or disease-specific H R Q L measures or the H A Q (Tables 6.3, 6.4 and 6.5) with the exception of median neighbourhood income and the E Q - 5 D (p=0.02). For both self-reported income and education levels, comparison of mean values of all measures (SF-6D, H U B , HUI2 , E Q - 5 D , R A Q o L , H A Q ) by A N O V A showed a significant gradient across SES categories (Table 6.6). For example, lower levels of income were associated with poorer generic and disease-specific H R Q L and physical function. Results were confirmed with nonparametric tests. In general, all measures of health utilization 196 (hospitalization, use of professional services, and use of physical aides/equipment), joint damage, and health status showed a consistent gradient of worse functioning in those in lower self-reported SES categories (self-reported annual family income and self-reported education level) (Table 6.6). There were no differences across SES categories for type or number of D M A R D s or the use of prednisone over the past three months. There were no gradients or associations between any of these variables and the contextual measures of SES. 
For the proximate (self-reported) SES measures, the differences in these variables were statistically significant for most of the subjective RA severity measures (self-reported RA severity, patient global assessment of disease activity) and for the global scores of both the generic and disease-specific quality of life measures. Of note, most of the physically-based clinical measures (such as joint counts) were not significantly different between the SES levels, suggesting no physical differences in disease severity. The results of the OLS regressions with adjustment for disease severity measures show significant associations between self-reported income and the HUI3 and SF-6D overall scores and the HAQ (Figures 6.1, 6.2 and 6.3). There were no other significant associations for other measures of SES (self-reported education or contextual measures) after adjustment for disease severity. To account for differences in household size across self-reported annual household income categories, we included number of people in the household in the regression models. Other measures of disease management and severity that were tested but did not improve the overall fit of the model included number and type of DMARDs used within the past three months, number of other chronic diseases, swollen joint count (collinear with tender joint count) and erythrocyte sedimentation rate. Of note, the RAQoL did not significantly differ across self-reported income categories. All differences in the β-estimates between the lowest and highest income categories exceeded the MID for the SF-6D, HUI3, and the HAQ. The results of the TSLS regression analyses also reveal that self-reported annual income was significantly associated with most of the generic HRQL measures (p=0.03, p=0.03 and p=0.04 for the HUI2, HUI3 and SF-6D, respectively) and the HAQ (p=0.002), but not with the RAQoL (p=0.07) or the EQ-5D (p=0.14).
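The logic of comparing OLS with TSLS estimates can be illustrated with a minimal sketch. It is not the analysis performed in this study: the data are simulated, the variable names and coefficient values are hypothetical, and the instrument is an assumption made purely for illustration (the instruments used in the study's TSLS models are not described here). When there is no feedback from the outcome to income, the two estimators should return similar coefficients for income.

```python
import numpy as np

def ols(X, y):
    # Least-squares coefficients for y = X @ beta + error
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def tsls(exog, endog, instruments, y):
    # Stage 1: regress the possibly endogenous regressor (income)
    # on the exogenous covariates plus the instrument(s)
    Z = np.column_stack([exog, instruments])
    endog_hat = Z @ ols(Z, endog)
    # Stage 2: regress the outcome on the exogenous covariates
    # and the stage-1 fitted values; the last coefficient is income
    X2 = np.column_stack([exog, endog_hat])
    return ols(X2, y)

# Simulated data (all names and values are hypothetical assumptions)
rng = np.random.default_rng(0)
n = 5000
severity = rng.normal(size=n)        # stand-in for disease severity
instrument = rng.normal(size=n)      # assumed instrument for income
income = 0.8 * instrument + rng.normal(size=n)
# Outcome with no feedback from utility to income
utility = 0.60 + 0.03 * income - 0.10 * severity + rng.normal(scale=0.1, size=n)

exog = np.column_stack([np.ones(n), severity])
beta_ols = ols(np.column_stack([exog, income]), utility)[-1]
beta_tsls = tsls(exog, income, instrument, utility)[-1]
print(round(beta_ols, 3), round(beta_tsls, 3))
```

In this simulated setting the two income coefficients agree closely, which is the pattern interpreted as little or no feedback between income and the outcome. A marked divergence would instead suggest that the OLS estimate is biased by reverse causation.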
A comparison of the beta-coefficients between OLS and TSLS regression revealed close agreement for the HUI2 (0.02 vs. 0.04), HUI3 (0.03 for both), the SF-6D (0.01 for both), the EQ-5D (0.03 vs. 0.04), the HAQ (-0.09 vs. -0.08) and the RAQoL (-0.50 vs. -0.70). Thus, there likely was little, if any, feedback between income and HRQL or functional status in the sample, adding further credence to the use of OLS.

6.5 DISCUSSION

This study demonstrates a consistent and significant gradient in both generic and disease-specific HRQL, functional status and other self-reported health measures across income and education categories. This gradient is maintained even after adjustment for RA severity and number of people in the household for self-reported income (but not education) categories. Of note, there was no significant gradient across SES measures for physically defined RA severity measures (disease process and joint damage measures). Thus, these results highlight how SES impacts a well-defined chronic disease such as RA by influencing how patients perceive and report their health status. These findings become particularly important when one considers that self-rated health predicts mortality even after controlling for a wide range of factors (demographic, psychosocial, prior illness, physician's assessments and physiological measures) [23]. Thus, from our results, we have determined that low SES predicts poor self-reported health independently of RA severity and may thus be a strong contributing factor to the early mortality and substantial morbidity seen in RA patients with low SES [24,25]. Another important finding is that the magnitudes of utility values assessed by both the HUI3 and the SF-6D vary significantly by SES independently of RA severity measures. In addition, although not statistically significant for the HUI2 and the EQ-5D, there was a pattern of higher scores in higher income groups.
This finding has potentially important ramifications for the results of cost-utility analyses of therapies for RA, as investigators need to ensure balance between treatment groups not only in clinical and disease-specific factors but also in SES. Therefore, in order to avoid potential confounding or bias due to SES in economic evaluation, one would need to: 1) verify SES at baseline in randomized controlled trials (RCT) where utility measures will be used in a cost-utility analysis; 2) control for SES in observational studies where such measures may be used; and 3) ensure that the results obtained from the sample will be generalizable to the population of interest (generalizability is limited to the extent that studies do not report SES and/or the SES of the sample is not similar to that of the general population). Sculpher and O'Brien outline some additional concerns with using the results of indirectly assessed utility scores that are influenced by income in cost-utility analysis [26]. They state that in cost-benefit analysis, where willingness to pay is often utilized to assign monetary value to benefits, ability to pay biases such data in favour of the more affluent. QALYs, as used as outcomes in cost-utility analyses, are thought to avoid this potential bias. However, the authors argue (and our results suggest) that this may not be the case. Specifically, the authors state that the effects of income could come into play when individuals are asked to value health states to generate utilities for the indirect utility instrument scoring functions or when the instrument is applied in the field. In our study, the latter scenario is applicable. In this situation, the authors state that there is no reason why income effects should be excluded, as these could be a relevant component of illness that may contribute to deficits in health status.
However, these income effects could bias cost-utility analysis in at least two different ways: 1) when cross-national comparisons are being made and there are differences between countries in the levels of income maintenance available to the sick; and 2) through the possibility of double-counting if income effects of reduced health have already been factored into the valuation stage of the instrument. For example, reduced quality of life that is mediated by loss of income should be counted in the denominator of the cost-effectiveness ratio. However, if one also includes loss of income as an indirect cost in the numerator, then there is a potential to count these effects twice: in the numerator and in the denominator. In RA, while it is well established that there are associations between low SES and morbidity and mortality, the mechanisms behind these associations are largely unknown. Callahan et al. [27] reported that scores on a helplessness scale appeared to mediate a component of the association between formal education level and five-year mortality. In a study attempting to identify a partial explanation for the association between low education and poor outcome in RA, Katz [28] identified that self-care was strongly associated with education and thus concluded that low education was a proxy for a constellation of factors responsible for poor health outcomes. Therefore, the differences in self-reported health that we observed on both the generic and disease-specific HRQL and the HAQ scales might be indicative of helplessness or inability to complete self-care tasks in patients with lower SES. Our results generally support the findings of Brekke et al. [29] and McEntegart et al. [30], who showed that self-reported health outcomes, but not objective indices of disease activity, differed across groups based upon SES. Specifically, McEntegart et al.
[30] revealed how patients living in more deprived areas in Scotland had poorer HAQ scores as compared to those living in more affluent areas. Similarly, Brekke et al., who conducted their study as a comparison of RA patients from affluent west Oslo to those from deprived east Oslo, extended these findings to disease-specific and generic quality of life measures. Both of these analyses used contextual measures of SES. The study by McEntegart et al. utilized the Carstairs index (a composite score using postal code that draws on measures of overcrowding, male unemployment, social class and car ownership), while Brekke et al. utilized neighbourhood factors (such as income, education, employment, mortality, housing standard and proportion of third world citizens) to define the two areas of Oslo as affluent or deprived. Our findings build on those previously reported by including multiple measures of SES, including those directly reported by the patient as opposed to only neighbourhood-level analyses, and by adding two preference-based, generic HRQL instruments and the RAQoL. Since we collected patient-specific RA drug treatment data, we were able to determine that there were no treatment differences across SES categories that could have influenced self-reported outcomes. Similarly, since all of our subjects were under the care of rheumatologists, any differences in specialist versus non-specialist care that may have been due to SES and could potentially have influenced self-reported outcomes were avoided. In addition, our study is the first to examine if this relationship holds true in a North American country with universal access to health care. Of note, we adjusted our model by the number of people living in the household.
While we found that there was a significant difference in number of people per household across self-reported annual income, with higher levels of income reported by those with larger families, this variable was not significantly associated with the self-reported health variables, did not affect the magnitude or significance of the association of annual household income with the dependent variables, and did not significantly improve the multiple linear regression model fit. Another point of interest that arises from the results of our study was the lack of a consistent gradient (except for the HUI3) across the income categories for the adjusted models of self-reported generic and disease-specific HRQL and functional status (Figures 6.1 and 6.2). For example, with the SF-6D, it appears that the biggest difference across income categories is between the middle and highest groups rather than between the lowest and highest groups. These results raise the possibility that there may not be a perfect gradient across the three separate income categories and that it may be a dichotomous phenomenon (i.e. high versus low income), with an annual household income cut-off of approximately $50,000 defining the two groups. Another possible explanation is that some other factor is influencing the self-reported health outcomes for the middle income category, making it lower than both the high and low income categories. Of interest, in our study, there was a low correlation between the proximate (self-reported) and contextual measures of SES. With both annual household income and education, there were strong univariate associations with self-reported health. However, once adjustments for RA severity were made, only self-reported annual income remained significant.
A possible explanation for the lack of association between the self-reported HRQL measures and education and contextual SES measures is that they may not be indicative of SES in elderly populations such as those with RA. Our sample comprised mostly subjects who had worked in an era when there was less emphasis on education. Therefore, in contrast to a younger, employed sample of asthmatics from the same geographical area where education and income were highly correlated [8], results from our sample revealed that these two variables were less correlated. Similarly, in the aforementioned asthma sample, there were strong correlations between contextual and proximate measures of SES that were not observed in our RA sample, indicating that these measures of SES may be more robust for younger participants who are more likely to be currently employed. Another finding from our sample that supports this premise is that older individuals (>50 years of age) who were still in the work force tended to have less education but similar income to those working individuals less than 50 years old. While there were significant gradients across SES (as defined by annual household income) for both of the generic HRQL measures and the HAQ after adjustment for RA severity, similar findings were not observed for the RAQoL. Despite significant univariate gradients across SES as defined by annual household income and education, the RAQoL did not display a clear SES gradient in the multiple linear regression analysis. We postulate that the reason for this is that the RAQoL is capturing items that are so germane to RA that the variance in its score is explained mostly by the objective and subjective disease severity measures. Indeed, the addition of annual household income had a negligible impact on the model R² in the multiple linear regression analysis of the RAQoL, whereas it improved the model fit in all the other analyses.
Finally, it can be argued that the results using OLS regression only reveal an association between self-reported annual household income and HRQL or functional status without the ascertainment of directionality (i.e. is it the low income that is causing the low HRQL/functional status or vice versa?). We utilized TSLS regression to account for this and found no evidence to support that the low income was \"caused\" by the low HRQL or functional status (i.e. the beta coefficients achieved by OLS were not biased). This finding is plausible in our sample since most participants were elderly and retired and their current annual household income was likely not influenced by their current HRQL or functional status. Our study shows that even in a country such as Canada with universal access to health care, the impact of RA on self-reported health is strongly associated with SES as measured by annual income, even after adjusting for disease severity. Because self-reported health has been strongly associated with mortality and morbidity, there are important implications for intervention. In addition, these findings should be considered in the context of cost-utility analysis to prevent biasing of utility values obtained from preference-based instruments. In the event that studies do not investigate or report SES, or if the SES in the study sample differs significantly from the population of interest, the results of the analysis may have poor generalizability. Further research should focus on the mediating factors that contribute to this social gradient in self-reported health outcomes in RA.

6.6 REFERENCES

1. Idler EL, Benyamini Y. Self-rated health and mortality: A review of twenty-seven community studies. J Health Social Behaviour 1997;38:21-37.
2. Idler EL, Kasl S. Health perceptions and survival: Do global evaluations of health status really predict mortality? J Gerontol 1991;46:S55-S65.
3. Pincus T, Sokka T.
Quantitative measures for assessing rheumatoid arthritis in clinical trials and clinical care. Best Pract Res Clin Rheumatol 2003;17:753-81.
4. Wolfe F, Michaud K, Gefeller O, Choi HK. Predicting mortality in patients with rheumatoid arthritis. Arthritis Rheum 2003;48:1530-1542.
5. Franks P, Gold MR, Fiscella K. Sociodemographics, self-rated health, and mortality in the US. Soc Sci Med 2003;56:2505-2514.
6. Alter DA, Naylor CD, Austin PC, Chan BT, Tu JV. Geography and service supply do not explain socio-economic gradients in angiography use after acute myocardial infarction. Can Med Assoc J 2003;168:261-264.
7. Hawker GA, Wright JG, Glazier RH, Coyte PC, Harvey B, et al. The effect of education and income on need and willingness to undergo total joint arthroplasty. Arthritis Rheum 2002;46:3331-3339.
8. Lynd LD, Pare PD, Bai T, Fitzgerald JM, Anis AH. A cross-sectional evaluation of the relationship between socioeconomic status and the magnitude of short-acting beta-agonist use in asthma. Chest (accepted December 2003).
9. Wood E, Montaner JS, Chan K, Tyndall MW, Schechter MT, et al. Socioeconomic status, access to triple therapy, and survival from HIV-disease since 1996. AIDS 2002;16:2065-2072.
10. Marra CA, Woolcott JC, Shojania K, Offer R, Kopec J, Brazier JE, Esdaile JM, Anis AH. A comparison of generic, indirect utility measures (the HUI2, HUI3, SF-6D, and the EQ-5D) and disease-specific instruments (the RAQoL and the HAQ) in rheumatoid arthritis. Soc Sci Med (submitted).
11. Marra CA, Esdaile JM, Guh D, Kopec JA, Brazier JE, Chalmers A, Koehler B, Anis AH. A comparison of four indirect methods of assessing utility values in rheumatoid arthritis. Med Care (submitted).
12. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271-292.
13. Walters SJ, Brazier JE.
What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health Qual Life Outcomes 2003;11:4-12.
14. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, et al. Multiattribute and single-attribute utility functions for the Health Utilities Index Mark 3 system. Med Care 2002;40:113-128.
15. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
16. Bruce B, Fries JF. The Stanford Health Assessment Questionnaire: a review of its history, issues, progress, and documentation. J Rheumatol 2003;30:167-178.
17. Redelmeier DA, Lorig K. Assessing the clinical importance of symptomatic improvements: an illustration in rheumatology. Arch Intern Med 1993;153:1337-1342.
18. Wells GA, Tugwell P, Kraag GR, Baker PR, Groh J, Redelmeier DA. Minimum important difference between patients with rheumatoid arthritis: the patient's perspective. J Rheumatol 1993;20:557-560.
19. De Jong Z, Van Der Heijde, Mckenna SP, Whalley D. The reliability and construct validity of the RAQoL: A rheumatoid arthritis-specific quality of life instrument. Br J Rheumatol 1997;36:878-883.
20. Wong AL, Wong WK, Harker J, Sterz M, Bulpitt K, Park G, Ramos B, Clements P, Paulus H. Patient self-report tender and swollen joint counts in early rheumatoid arthritis. Western Consortium of Practicing Rheumatologists. J Rheumatol 1999;26:2551-2561.
21. BC Stats. 1996 Census of Canada. Victoria, BC: Ministry of Finance and Corporate Relations, Government of BC; 1998 March 9, 1998.
22. Cohen J. A power primer. Psychol Bull 1992;112:155-159.
23. Idler EL, Angel RJ. Self-rated health and mortality in the NHANES-I epidemiologic follow-up study. J Pub Health 1990;80:446-452.
24. Maiden N, Capell HA, Madhok R, Hampson R, Thomson EA.
Does social disadvantage contribute to the excess mortality in rheumatoid arthritis patients? Ann Rheum Dis 1999;58:525-529.
25. Pincus T, Callahan LF, Sale WG, Brooks AL, Payne LE, et al. Severe functional declines, work disability, and increased mortality in seventy-five rheumatoid arthritis patients studied over nine years. Arthritis Rheum 1984;27:864-872.
26. Sculpher MJ, O'Brien BJ. Income effects of reduced health and health effects of reduced income. Med Decis Making 2000;20:207-215.
27. Callahan LF, Cordray DS, Wells G, Pincus T. Formal education and five-year mortality in rheumatoid arthritis: Mediation by helplessness scale scores. Arthritis Care Res 1996;9:463-472.
28. Katz PP. Education and self-care activities among persons with rheumatoid arthritis. Soc Sci Med 1998;46:1057-1066.
29. Brekke M, Hjortdahl P, Thelle DS, Kvien TK. Disease activity and severity in patients with rheumatoid arthritis: relations to socio-economic equality. Soc Sci Med 1999;48:1743-1750.
30. McEntegart A, Morrison E, Capell HA, Duncan MR, Porter D, et al. Effect of social deprivation on disease-severity and outcome in patients with rheumatoid arthritis. Ann Rheum Dis 1997;56:410-413.
TABLE 6.1: CHARACTERISTICS OF THE STUDY PARTICIPANTS (N = 313)
Parameter | Mean | SD
Age (yrs) | 61.5 | 25.9
RA Duration (yrs) | 13.87 | 11.41
Pain VAS (mm), 0 to 100 | 43.12 | 27
Tender Joint Count, 0 to 50 | 15.09 | 12
Swollen Joint Count, 0 to 50 | 9.14 | 9.67
Erythrocyte Sedimentation Rate (mm/hr) | 24.71 | 21.01
HAQ Disability Index, 0 to 3.0 | 1.1 | 0.77
RAQoL Score, 0 to 28 | 12.73 | 8.48
HUI2 Global Utility Score, -0.03 to 1.00 | 0.71 | 0.19
HUI3 Global Utility Score, -0.36 to 1.00 | 0.53 | 0.29
EQ-5D Global Utility Score, -0.59 to 1.00 | 0.67 | 0.24
SF-6D Global Utility Score, 0.31 to 1.00 | 0.63 | 0.13
Parameter | N | %
Self-Reported RA Severity
Very Mild | 9 | 3%
Mild | 34 | 11%
Moderate | 120 | 38%
Severe | 110 | 35%
Very Severe | 27 | 9%
Self-Reported RA Control
Very Well Controlled | 33 | 11%
Well Controlled | 76 | 24%
Adequately Controlled | 123 | 39%
Not Well Controlled | 61 | 19%
Not Controlled At All | 7 | 2%
Hospitalized For RA in Last 12 Months | 45 | 15%
Missed Work or School Due to RA in Last 12 Months | 59 | 19%
Purchased or Rented Equipment for RA in Last 12 Months | 72 | 23%
Used Allied Health Professional/Home Care Services in Last 12 Months | 129 | 42%
Concomitant Chronic Illness Other Than RA | 192 | 62%

TABLE 6.2. PROPERTIES OF THE MEASURES OF SOCIOECONOMIC STATUS (SES) IN OUR SAMPLE
Contextual SES Measures | Mean | SD
Neighbourhood Median Income | 20040 | 4313
Bachelor's Education (%) | 15.2 | 10.2
Neighbourhood Unemployment Rate (%) | 8.7 | 2.6
Self-Reported SES Measures (Proximate) | Number | %
Education Completed
Less than High School/Trade | 75 | 24%
High School/Trade | 169 | 54%
Bachelor's | 52 | 17%
Missing | 17 | 5%
Annual Household Income
<$20,000 | 50 | 16%
$20,000 to $50,000 | 115 | 37%
>$50,000 | 90 | 28%
Missing | 58 | 19%

TABLE 6.3 UNIVARIATE ASSOCIATIONS WITH THE GENERIC HRQL MEASURES (THE SF-6D AND THE HUI3)
Factor | SF-6D Regression Coefficient (SE) | SF-6D p-value | HUI3 Regression Coefficient (SE) | HUI3 p-value
DEMOGRAPHICS
Age | -0.0002 (0.0006) | NS | -0.002 (0.001) | NS
Gender (Female is reference) | 0.05 (0.02) | <0.0001 | 0.08 (0.04) | 0.04
RA SEVERITY VARIABLES
Years since diagnosis | -0.002 (0.0006) | 0.001 | -0.006 (0.001) | 0.0001
No. of other chronic diseases | -0.02 (0.005) | 0.006 | -0.03 (0.01) | 0.004
Erythrocyte sedimentation rate | -0.002 (0.0006) | 0.0003 | -0.003 (0.001) | 0.0007
Tender joint count | -0.006 (0.0005) | <0.0001 | -0.01 (0.001) | <0.0001
Swollen joint count | -0.006 (0.0007) | <0.0001 | -0.01 (0.002) | <0.0001
Global pain VAS | -0.003 (0.0002) | <0.0001 | -0.006 (0.0005) | <0.0001
Patient global assessment VAS | 0.003 (0.0002) | <0.0001 | 0.007 (0.0005) | <0.0001
HAQ disability index score | -0.12 (0.08) | <0.0001 | -0.29 (0.01) | <0.0001
Hospitalization in last year* | -0.03 (0.02) | NS | -0.10 (0.05) | 0.03
Home/Health services for RA* | -0.05 (0.02) | 0.0009 | -0.15 (0.03) | <0.0001
Purchase/rent RA equipment* | -0.05 (0.02) | 0.005 | -0.19 (0.04) | <0.0001
Missed work/school in last year* | -0.07 (0.02) | 0.0004 | -0.14 (0.05) | 0.004
RA self-reported severity (overall) | | <0.0001 | | <0.0001
Very mild | 0.33 (0.04) | <0.0001 | 0.51 (0.10) | <0.0001
Mild | 0.24 (0.03) | <0.0001 | 0.47 (0.08) | <0.0001
Moderate | 0.15 (0.02) | <0.0001 | 0.31 (0.05) | <0.0001
Severe | 0.08 (0.02) | 0.0003 | 0.15 (0.06) | 0.0008
Very severe | Ref | | Ref |
RA self-reported control (overall) | | <0.0001 | |
Very well controlled | 0.28 (0.05) | 0.0001 | 0.74 (0.10) | <0.0001
Well controlled | 0.24 (0.04) | 0.0001 | 0.66 (0.10) | <0.0001
Adequately controlled | 0.17 (0.04) | 0.0001 | 0.54 (0.09) | <0.0001
Not well controlled | 0.07 (0.04) | 0.12 | 0.31 (0.10) | 0.002
Not controlled at all | Ref | | Ref |
PROXIMATE SES FACTORS
Education completed (overall) | | <0.0001 | | NS
None | -0.06 (0.02) | 0.01 | -0.11 (0.05) | 0.04
High school/Trade | -0.04 (0.02) | 0.03 | -0.06 (0.05) | 0.17
Bachelors education | Ref | | Ref |
Yrs. post-secondary education | 0.005 (0.003) | NS | |
Annual household income (overall) | | <0.0001 | | 0.0003
< $20,000 | -0.09 (0.02) | <0.0001 | -0.21 (0.05) | <0.0001
$20,000 - $50,000 | -0.06 (0.02) | <0.0001 | -0.10 (0.04) | 0.01
> $50,000 | Ref | | Ref |
CONTEXTUAL SES FACTORS
Median neighbourhood income† | 0.017 (0.018) | NS | 0.04 (0.04) | NS
% bachelors education | 0.0002 (0.0008) | NS | 0.0006 (0.002) | NS
Neighbourhood unemployment | -0.0005 (0.002) | NS | -0.006 (0.005) | NS
* Reference category is \"no\"
† For categories of $10,000

TABLE 6.4 UNIVARIATE ASSOCIATIONS WITH THE GENERIC HRQL MEASURES (THE HUI2 AND THE EQ-5D)
Factor | HUI2 Regression Coefficient (SE) | HUI2 p-value | EQ-5D Regression Coefficient (SE) | EQ-5D p-value
DEMOGRAPHICS
Age | -0.01 (0.0009) | NS | 0.0008 (0.001) | NS
Gender (Female is reference) | 0.04 (0.03) | NS | 0.05 (0.03) | NS
RA SEVERITY VARIABLES
Years since diagnosis | 0.002 (0.001) | 0.04 | -0.003 (0.001) | 0.03
No. of other chronic diseases | -0.03 (0.008) | 0.0008 | -0.02 (0.01) | 0.09
Tender joint count | -0.007 (0.0008) | <0.0001 | -0.008 (0.001) | <0.0001
Erythrocyte sedimentation rate | -0.002 (0.0007) | 0.01 | -0.002 (0.0008) | 0.03
Swollen joint count | -0.008 (0.001) | <0.0001 | -0.01 (0.001) | <0.0001
Global pain VAS | -0.004 (0.0003) | <0.0001 | -0.005 (0.0004) | <0.0001
Patient global assessment VAS | 0.004 (0.0003) | <0.0001 | 0.005 (0.0004) | <0.0001
Hospitalization in last year* | -0.077 (0.032) | 0.02 | -0.05 (0.04) | 0.18
Home/Health services for RA* | -0.09 (0.02) | <0.0001 | -0.11 (0.03) | <0.0001
Purchase/rent RA equipment* | -0.12 (0.03) | <0.0001 | -0.12 (0.03) | 0.0003
Missed work/school in last year* | 0.08 (0.03) | 0.009 | 0.13 (0.04) | 0.002
RA self-reported severity (overall) | | <0.0001 | | <0.0001
Very mild | 0.31 (0.03) | <0.0001 | 0.41 (0.08) | <0.0001
Mild | 0.27 (0.07) | <0.0001 | 0.36 (0.05) | <0.0001
Moderate | 0.18 (0.04) | <0.0001 | 0.24 (0.04) | <0.0001
Severe | 0.05 (0.04) | NS | 0.12 (0.04) | 0.02
Very severe | Ref | | Ref |
RA self-reported control (overall) | | <0.0001 | | <0.0001
Very well controlled | 0.40 (0.07) | <0.0001 | 0.56 (0.08) | <0.0001
Well controlled | 0.37 (0.07) | <0.0001 | 0.52 (0.08) | <0.0001
Adequately controlled | 0.31 (0.06) | <0.0001 | 0.42 (0.08) | <0.0001
Not well controlled | 0.13 (0.07) | 0.05 | 0.22 (0.08) | 0.006
Not controlled at all | Ref | | Ref |
PROXIMATE SES FACTORS
Education completed (overall) | | NS | | 0.02
None | -0.07 (0.04) | 0.04 | -0.11 (0.04) | 0.01
High school/trade | -0.05 (0.03) | NS | -0.02 (0.04) | NS
Bachelors education | Ref | | Ref |
Yrs. post-secondary education | 0.006 (0.005) | NS | 0.002 (0.006) | NS
Annual household income (overall) | | 0.002 | | 0.001
< $20,000 | -0.11 (0.03) | 0.0006 | -0.15 (0.04) | 0.0004
$20,000 - $50,000 | -0.07 (0.03) | 0.02 | -0.09 (0.03) | 0.009
> $50,000 | Ref | | Ref |
CONTEXTUAL SES FACTORS
Median neighbourhood income† | 0.000002 (0.000003) | NS | 0.000008 (0.000004) | 0.02
% bachelors education | 0.0001 (0.001) | NS | 0.001 (0.001) | NS
Neighbourhood unemployment | -0.003 (0.004) | NS | -0.008 (0.005) | 0.08
* Reference category is \"no\"
† For categories of $10,000

TABLE 6.5 UNIVARIATE ANALYSIS WITH THE DISEASE-SPECIFIC MEASURES (THE HAQ AND THE RAQOL)
Factor | HAQ Regression Coefficient (SE) | HAQ p-value | RAQoL Regression Coefficient (SE) | RAQoL p-value
DEMOGRAPHICS
Age | 0.007 (0.004) | 0.03 | -0.032 (0.05) | NS
Gender (Female is reference) | -0.35 (0.11) | 0.001 | -3.57 (1.13) | 0.002
RA SEVERITY VARIABLES
Years since diagnosis | 0.02 (0.003) | <0.0001 | 0.15 (0.05) | 0.002
No. of other chronic diseases | 0.10 (0.03) | 0.002 | 1.02 (0.33) | 0.002
Tender joint count | 0.03 (0.003) | <0.0001 | 0.38 (0.03) | 0.0001
Erythrocyte sedimentation rate | 0.008 (0.003) | 0.007 | 0.08 (0.03) | 0.004
Swollen joint count | 0.03 (0.004) | <0.0001 | 0.42 (0.04) | 0.0001
Global pain VAS | 0.02 (0.001) | <0.0001 | 0.19 (0.01) | 0.0001
Patient global assessment VAS | -0.01 (0.001) | <0.0001 | -0.19 (0.01) | 0.0001
Hospitalization in last year* | 0.33 (0.13) | 0.008 | 2.64 (1.32) | 0.05
Home/Health services for RA* | 0.43 (0.09) | <0.0001 | 4.02 (0.93) | 0.0001
Purchase/rent RA equipment* | 0.48 (0.10) | <0.0001 | 4.76 (1.11) | 0.0001
Missed work/school in last year* | 0.42 (0.12) | 0.0005 | 6.28 (1.27) | 0.0001
RA self-reported severity (overall) | | <0.0001 | | 0.0001
Very mild | -1.40 (0.26) | <0.0001 | -15.73 (2.69) | 0.0001
Mild | -1.26 (0.17) | <0.0001 | -13.81 (1.80) | 0.0001
Moderate | -0.68 (0.14) | <0.0001 | -7.97 (1.49) | 0.0001
Severe | -0.31 (0.15) | 0.03 | -2.34 (1.02) | 0.12
Very severe | Ref | | Ref |
RA self-reported control (overall) | | <0.0001 | | 0.0001
Very well controlled | -1.39 (0.28) | <0.0001 | -17.85 (2.75) | 0.0001
Well controlled | -1.25 (0.27) | <0.0001 | -16.01 (2.60) | 0.0001
Adequately controlled | -0.94 (0.26) | <0.0001 | -10.98 (2.56) | 0.0001
Not well controlled | -0.40 (0.27) | 0.13 | -3.98 (2.63) | 0.15
Not controlled at all | Ref | | Ref |
PROXIMATE SES FACTORS
Education completed (overall) | | 0.05 | | NS
None | 0.30 (0.13) | 0.02 | 3.10 (1.45) | 0.03
High school/trade | 0.26 (0.12) | 0.03 | 1.88 (1.27) | 0.14
Bachelors education | Ref | | Ref |
Yrs. post-secondary education | -0.03 (0.02) | NS | -0.10 (0.22) | NS
Annual household income (overall) | | 0.0001 | | 0.03
< $20,000 | 0.61 (0.13) | 0.0001 | 3.43 (1.48) | 0.02
$20,000 - $50,000 | 0.50 (0.10) | 0.0001 | 2.69 (1.18) | 0.02
> $50,000 | Ref | | Ref |
CONTEXTUAL SES FACTORS
Median neighbourhood income† | 0.1 (0.1) | NS | -0.0001 (0.001) | NS
% bachelors education | 0.0004 (0.002) | NS | -0.02 (0.05) | NS
Neighbourhood unemployment | -0.0005 (0.002) | NS | 0.024 (0.15) | NS
* Reference category is \"no\"
† For categories of $10,000

TABLE 6.6: COMPARISON OF RA AND HEALTH STATUS MEASURES ACROSS DIFFERENT SOCIAL CLASSES
Parameter | Self-Reported Annual Family Income (<20K, 20-50K, >50K, p-value) | Self-Reported Education Level

[Figure 6.1. Two panels, HUI3 and SF-6D, each plotted against self-reported annual income (<20K, 20-50K, >50K). HUI3: p = 0.05 (type III sums of squares), model R² = 0.66. SF-6D: p = 0.02 (type III sums of squares), model R² = 0.71.]
The points indicate the least squares means and the lines indicate their 95% confidence intervals. HRQL measures were adjusted for RA duration, pain VAS, self-reported RA control and severity, tender joint count, RAQoL score, and number of people in the household. Higher scores indicate better HRQL.

FIGURE 6.2: GENERIC HRQL BY SELF-REPORTED ANNUAL INCOME (HUI2 AND EQ-5D)
[Two panels, HUI2 and EQ-5D, each plotted against self-reported annual income (<20K, 20-50K, >50K). HUI2: p = 0.45 (type III sums of squares), model R² = 0.61. EQ-5D: p = 0.20 (type III sums of squares), model R² = 0.63.]
The points indicate the least squares means and the lines indicate their 95% confidence intervals. HRQL measures were adjusted for RA duration, pain VAS, self-reported RA control and severity, tender joint count, RAQoL score, and number of people in the household. Higher scores indicate better HRQL.

FIGURE 6.3: RAQOL SCORE AND HAQ DISABILITY INDEX BY SELF-REPORTED INCOME
[Two panels, RAQoL Score and HAQ Score, each plotted against self-reported annual income (<20K, 20-50K, >50K). RAQoL: p = 0.85 (type III sums of squares), model R² = 0.62. HAQ: p = 0.0003 (type III sums of squares), model R² = 0.51.]
The points indicate the least squares means and the lines indicate their 95% confidence intervals. HRQL measures were adjusted for RA duration, pain VAS, self-reported RA control and severity, tender joint count, RAQoL score, and number of people in the household. Higher scores indicate better HRQL.

CHAPTER 7 ARE INDIRECT UTILITY MEASURES RELIABLE AND RESPONSIVE IN RHEUMATOID ARTHRITIS PATIENTS?

7.1 FOREWORD

This manuscript is currently under review under the same title in Quality of Life Research. The candidate is the first author of the manuscript, which is co-authored by Daphne Guh, who provided statistical support; Dr. Amir Adel Rashidi, who assisted with data entry, database construction, and data manipulation; Dr. Jacek Kopec, a member of the candidate's committee, who supplied analytical advice; Dr.
Michal Abrahamowicz, who invented the polytomous regression techniques for responsiveness and instructed the candidate in their application; and Dr. John Brazier, who developed the SF-6D and provided methodological advice. Drs. Aslam Anis and John Esdaile, co-supervisors of the candidate, were also co-authors on the manuscript. The candidate's role in the manuscript involved the conception of the research question, development of the primary hypothesis and methodology, all statistical analyses, and the writing of the final manuscript.

7.2 INTRODUCTION

Improvement in health-related quality of life (HRQL) is considered to be one of the most important goals in the management of rheumatoid arthritis (RA) [1]. As such, HRQL and health status measures have often been used as outcomes in clinical trials and studies assessing a variety of interventions in RA [2-5]. A variety of instruments that assess RA-specific HRQL (for example, the Arthritis Impact Measurement Scales (AIMS) and the Rheumatoid Arthritis Quality of Life questionnaire (RAQoL)) or generic HRQL or function (such as the Short Form 36 (SF-36)) have been applied to the assessment of RA [2,6,7]. Preference-based or indirect utility measures are generic HRQL measures that are often used in clinical and observational studies, as the scores that they generate can be utilized to calculate quality adjusted life-years (QALYs) and can thus be integrated into cost-utility analyses [8]. Examples of these instruments include the Health Utilities Index Mark 2 (HUI2) and Mark 3 (HUI3), the EuroQol (EQ-5D), and the Short Form 6D (SF-6D). All of these instruments have been previously applied in the assessment of patients with RA [9-11]. Responsiveness is often defined as the ability of an instrument to measure change; however, there are multiple definitions of responsiveness in the literature.
These definitions can be divided into three categories:[13] 1) the ability of an instrument to detect change in general (also referred to as \"sensitivity\" by Liang et al.[14]); 2) the ability of an instrument to detect clinically important change; and 3) the ability of an instrument to detect real changes in the concept being measured. There has been little work in the evaluation and comparison of responsiveness (using any definition) of the indirect utility instruments. A recent study by Conner-Spady et al.[11] examined the responsiveness of three preference-based measures of HRQL (the EQ-5D, HUI3, and SF-6D) in a sample of patients with at least one of several types of rheumatological conditions. To our knowledge, there have been no evaluations of the responsiveness of the RAQoL in RA in North American populations, although one has been published in a Swedish sample.[7] Therefore, there remains a need for more research to assess the responsiveness of these measures, to compare their characteristics, and to determine how their properties compare to disease-specific measures. Finally, since the indirect utility measures are often used as the source of weightings used for QALYs in cost-utility studies in RA, it is important that they are determined to be reliable, valid and responsive in this disease state. Therefore, the primary purpose of this study was to examine the reliability and responsiveness of the indirect utility instruments, the RAQoL and the HAQ from baseline to six months in a sample of rheumatoid arthritis patients. A secondary purpose was to examine the reliability and validity of using a patient-completed transition question as the external criterion to assess responsiveness in RA.
7.3 METHODS
7.3.1 Study Sample
To be included, subjects had to have a rheumatologist-confirmed diagnosis of RA (as defined by the American College of Rheumatology diagnostic criteria),[15] receive rheumatology care within the province of British Columbia, consent to answer the questionnaires, be sufficiently proficient in English to answer the questionnaires, and be willing to participate in follow-up surveys. Recruitment of RA patients began in October 2001 and ended in September 2002. Ethical approval for this study was obtained through the University of British Columbia's Behavioural Ethics Committee and informed consent was obtained from each of the participants. Eight private rheumatologists' offices from the study areas referred subjects into the sample during their interactions in routine clinical practice. In addition, two of the eight rheumatologists' practices sent letters to all of their patients with RA inviting them to participate in the survey. All patient questionnaires were self-administered, self-completed and submitted via mail. The study rheumatologists' offices supplied additional information from the patients' health record.
7.3.2 Measures
Participants were asked to complete a questionnaire at baseline and three and six months thereafter. The questionnaire consisted of sections devoted to socio-economic, clinical and functional status and quality of life assessment instruments.
7.3.2.1 Clinical
Participants were asked questions regarding their RA and medication history, including adverse reactions over the past three months. Other self-reported clinical variables included swollen joint count (SJC) and tender joint count (TJC) (using the mannequin-based 42 joint count methodology),[16] a 10 cm pain visual analogue scale (VAS), a patient global assessment of disease activity (10 cm VAS),[1] and RA severity and RA control (both using a 5 point Likert scale).
Erythrocyte sedimentation rate (ESR) values closest to the date of completion of the questionnaire (within 1 month) were extracted from the patient's chart for those patients whose rheumatologist used this measure for patient monitoring. In addition, the attending rheumatologists were asked to complete a physician global assessment of disease activity (10 cm VAS) for each patient.[1] For the six-month questionnaire, participants were also asked to complete a five point Likert scale that assessed changes in their RA since answering the baseline questionnaire. The question asked was \"Overall, how would you describe changes in your rheumatoid arthritis since answering the FIRST questionnaire (i.e. about 6 months ago)?\" Response choices included \"Much Worse\", \"Somewhat Worse\", \"The Same\", \"Somewhat Better\" and \"Much Better\". These questions are referred to as \"patient transition questions\" for the remainder of the manuscript. To increase the number of patients in each category, responses to these questions were collapsed into three categories as follows: (1) worse (included \"much worse\" and \"somewhat worse\"); (2) the same; and (3) better (included \"much better\" and \"somewhat better\"), an approach similar to that adopted by other investigators.[9,12,14] The sample of RA patients in our study experienced \"natural\" courses of their disease over time rather than exposure to a treatment of known efficacy administered in a randomized design. In group level analyses, average change scores can mask the proportion of patients with follow-up scores that differ from those at baseline. Because of this, we carried out separate analyses for each of the distribution-based responsiveness measures according to our collapsed transition question criteria (\"worse\", \"the same\", or \"better\"). This is the same approach used by other investigators.[11,17]
7.3.2.2 Health Status and HRQL Measures
7.3.2.2.1 Health Assessment Questionnaire (HAQ) Disability Index
The HAQ is a measure of physical disability that assesses ability to complete everyday tasks in areas such as dressing and grooming, rising, eating, walking, personal hygiene, reach, grip and other activities (such as getting into and out of a car). Each of these areas is assigned a section score that is further adjusted to account for the use of any aids, devices or help from another person. Section scores are then summed and averaged to give an overall score between 0.0 (best possible function) and 3.0 (worst function). A HAQ score difference of 0.25 is said to represent the minimally important difference (MID).[18,19]
7.3.2.2.2 Rheumatoid Arthritis Quality of Life Questionnaire (RAQoL)
The RAQoL consists of 30 questions (answered yes/no) that assess such aspects of RA as moods and emotions, social life, hobbies, everyday tasks, personal and social relationships, and physical contact. The RAQoL is scored by assigning a point for each affirmative response and no points for negative responses. Thus, scores range from 0 (least severity) to 30 (highest severity). To date, the MID for the RAQoL has been estimated to be approximately 2.00.
7.3.2.2.3 Preference Based Measures - MAUT Instruments
The indirect utility assessment instruments used in the questionnaire were the HUI2, HUI3, SF-6D, and the EQ-5D. In a cross-sectional analysis in patients with RA, the MID for the overall utility scores was determined to be 0.03 to 0.04 for the HUI2, 0.06 to 0.07 for the HUI3, and 0.03 to 0.05 for the SF-6D and the EQ-5D.[20] In another analysis of seven longitudinal studies examining SF-6D global utility scores, investigators estimated the MID to be 0.033 (95% CI: 0.029 to 0.037).
[10] A recent comprehensive review of the similarities and differences across these instruments is available; such a comparison is beyond the scope of this research paper.
7.3.3 Data Analysis
7.3.3.1 Reliability
The reliability of a transition question to assess changes in health status has not previously been studied.[10] To determine the reliability of the patient transition question and the other questionnaires, a second questionnaire was sent to a randomly selected group of 50 patients immediately after receipt of their follow-up questionnaire, with instructions to complete and return it within 5 weeks. The five week period was chosen because it was determined a priori to be the time window in which changes (either improvement or deterioration) in their RA would be unlikely. Patients were instructed to answer the transition question in relation to their baseline questionnaire. Another approach to test-retest reliability was the \"stable groups\" approach, comparing scores from patients who reported that they remained stable from 0 to 6 months. For all analyses, intraclass correlation coefficients (two-way mixed effect model such that the subject effect was random and the instrument effect was fixed) were calculated for the overall scores from the two time periods.
7.3.3.2 Validity of the Transition Question
Similarly, no assessments have been conducted to determine the validity of the transition question with respect to assessing rheumatoid arthritis.
[10] To determine agreement between the results from the collapsed transition question and changes in the HRQL instruments (divided into \"worse\", \"same\" and \"better\" categories using MID values from the literature as cut-off points), a weighted kappa (using quadratic weights) was calculated, where 1.00 signifies perfect agreement and 0.00 signifies no agreement.[22] We also estimated the MID using values that were calculated from this study using those who scored either \"somewhat better\" or \"somewhat worse\" on the transition question, assuming that the change experienced by these patients was equivalent to the MID as noted by other investigators.[10] Both anchor- and distribution-based approaches to assessing MID values have been shown to yield similar values in many situations.[23,24] To further examine the validity of using the transition question as the external criterion, we examined the Spearman's correlation between the transition question and changes in variables that were found to exhibit strong correlation with the generic and disease-specific HRQL measures in cross-sectional analyses (patient global assessment of disease severity, pain visual analogue scale (VAS), and the HAQ disability score). Spearman's rho values of > 0.50 or < -0.50 were considered to be strong, values from -0.49 to -0.30 or from 0.30 to 0.49 were considered moderate, and values between -0.30 and 0.30 were considered to be weak. For purposes of this analysis, a correlation of moderate or greater was considered to be evidence of validity.
7.3.3.3 Measures of Responsiveness
Our analysis focused on the assessment of responsiveness to change in RA for the indirect utility measures (the HUI2, HUI3, SF-6D and the EQ-5D), the RAQoL and the HAQ. Analysis for responsiveness was completed for the baseline and six month pairs of responses.
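The agreement analysis described in Section 7.3.3.2 can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names, the ordinal coding (0 = worse, 1 = same, 2 = better) and the choice to treat a change exactly equal to the MID as a real change are all assumptions made for the example.

```python
def classify_change(delta, mid, higher_is_better=True):
    # Map a change score to 0 (worse), 1 (same) or 2 (better) using an MID cut-off.
    # Assumption: a change exactly equal to the MID counts as a real change.
    if not higher_is_better:
        delta = -delta  # for the HAQ and RAQoL, higher scores mean worse health
    if delta >= mid:
        return 2
    if delta <= -mid:
        return 0
    return 1


def quadratic_weighted_kappa(ratings_a, ratings_b, n_categories):
    # Cohen's weighted kappa with quadratic disagreement weights.
    # Ratings are ordinal codes 0 .. n_categories - 1.
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError('ratings must be non-empty and of equal length')
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for i, j in zip(ratings_a, ratings_b):
        observed[i][j] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_categories)]
    col = [sum(observed[i][j] for i in range(n_categories))
           for j in range(n_categories)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = ((i - j) / (n_categories - 1)) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical example: transition-question categories versus MID-based categories
transition = [2, 1, 0, 2, 1, 0]
utility_change = [0.08, 0.01, -0.06, 0.00, 0.02, -0.10]
mid_based = [classify_change(d, mid=0.03) for d in utility_change]
kappa = quadratic_weighted_kappa(transition, mid_based, 3)
```

With quadratic weights, a disagreement between adjacent categories (for example same versus better) is penalised one quarter as heavily as a disagreement between worse and better.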
For each patient who had data on all instruments at each of the pair of visits, the difference between the two corresponding indirect utility, RAQoL, and HAQ scores was calculated. In the primary analysis of responsiveness, the results were stratified into patients classified as \"better\", \"the same\", or \"worse\" according to the collapsed transition question. In addition, in a secondary analysis utilizing the patient global assessment of disease activity (called \"patient global\" hereafter), the percentage improvement over baseline was calculated using the following formula:
$$\frac{\text{6mos.patientglobal} - \text{baseline.patientglobal}}{\text{baseline.patientglobal}}$$
According to this formula, patients were classified as: 1) \"better\" if the patient global had changed by > 20%; 2) \"the same\" if the patient global had changed by > -20% and < 20%; and 3) \"worse\" if the patient global had changed by < -20%. Spearman correlation coefficients were calculated to determine the correlation between this classification criterion and the collapsed transition question. All the indices of responsiveness (as described below) were calculated for the subgroups defined by this criterion. Five distribution-based approaches were employed to assess responsiveness:
1) the effect size (ES) using the following formula:
$$\mathrm{ES} = \frac{\mathrm{mean}(x_1 - x_2)_{\mathrm{totalgroup}}}{SD_{\mathrm{totalgroup}}}$$
where: $x_1$ = mean score at 6 months for the entire group; $x_2$ = mean score at baseline for the entire group; $SD_{\mathrm{totalgroup}}$ = standard deviation at baseline for the entire group. An effect size of 1 indicates a change in magnitude equivalent to one standard deviation. We adopted the criteria of Cohen, where absolute values of effect sizes (d) can be categorized as small (< 0.5), medium (0.5 to 0.8), or large (> 0.8).[27] Positive values reflect improvement while negative values reflect worsening for the indirect utility instruments, while the converse is true for the HAQ and the RAQoL.
2) the standardized response mean (SRM) using the following formula:
$$\mathrm{SRM} = \frac{\mathrm{mean}(x_1 - x_2)_{\mathrm{totalgroup}}}{SD(x_1 - x_2)_{\mathrm{totalgroup}}}$$
where $x_1$ = mean score at 6 months for the entire group; $x_2$ = mean score at baseline for the entire group; $SD(x_1 - x_2)_{\mathrm{totalgroup}}$ = the standard deviation (SD) for the change in scores in the entire group. The absolute values of the SRM are regarded as either small (< 0.5), medium (0.5 to 0.8) or large (> 0.8) and the signs (either positive or negative) are interpreted as for the ES.[28]
3) the control standardized response mean (CSRM) using the following formula:
$$\mathrm{CSRM} = \frac{\mathrm{mean}(x_1 - x_2)_{\mathrm{totalgroup}}}{SD(x_1 - x_2)_{\mathrm{nochange}}}$$
where the mean change in the total group refers to the mean change in the subgroups of \"worse\", \"same\", and \"better\" (as per the external criteria) and the standard deviation is taken from the subgroup reporting no change. The criterion for the size of the CSRM is the same as for the ES and SRM.
4) the relative efficiency statistic (RE) using the following formula:
$$\mathrm{RE} = \left(\frac{t_{\mathrm{comparison}}}{t_{\mathrm{goldstandard}}}\right)^2$$
Given the information on the superior responsiveness of disease-specific over generic measures,[30] we selected the RAQoL as the \"gold standard\" against which to compare each of the instruments. The measure with the highest RE has the highest power for a given sample size, or requires fewer patients to achieve a given level of statistical power.[12,13]
5) paired sample t-tests reported as a p-value.
Since the standard errors of the distribution-based approaches are not defined, we used bootstrap methods to estimate 95% confidence intervals (CI) for the ES, SRM, and the CSRM.[10] Rather than conduct a large number of statistical tests, the 95% CIs were investigated to determine the degree of overlap between the values generated across the HRQL measures.
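The five indices listed above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code used in the study: the function names are hypothetical, the bootstrap is a simple percentile bootstrap (the text does not say which bootstrap variant was used), and the data are invented.

```python
import random
import statistics


def _changes(baseline, followup):
    # Change score per patient: six-month score minus baseline score
    return [f - b for b, f in zip(baseline, followup)]


def effect_size(baseline, followup):
    # ES: mean change divided by the SD of the baseline scores
    return statistics.mean(_changes(baseline, followup)) / statistics.stdev(baseline)


def standardized_response_mean(baseline, followup):
    # SRM: mean change divided by the SD of the change scores
    changes = _changes(baseline, followup)
    return statistics.mean(changes) / statistics.stdev(changes)


def control_srm(baseline, followup, stable_baseline, stable_followup):
    # CSRM: mean change in a subgroup divided by the SD of change
    # in the subgroup reporting no change
    stable_changes = _changes(stable_baseline, stable_followup)
    return statistics.mean(_changes(baseline, followup)) / statistics.stdev(stable_changes)


def paired_t(baseline, followup):
    # Paired t statistic: mean change divided by its standard error
    changes = _changes(baseline, followup)
    return statistics.mean(changes) / (statistics.stdev(changes) / len(changes) ** 0.5)


def relative_efficiency(t_comparison, t_gold_standard):
    # RE: squared ratio of the paired t statistics
    return (t_comparison / t_gold_standard) ** 2


def bootstrap_ci(stat, baseline, followup, n_boot=2000, seed=0):
    # Percentile bootstrap 95% CI (assumed variant), resampling patients with replacement
    rng = random.Random(seed)
    n = len(baseline)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(stat([baseline[i] for i in idx], [followup[i] for i in idx]))
        except (statistics.StatisticsError, ZeroDivisionError):
            continue  # skip degenerate resamples with zero variance
    estimates.sort()
    return estimates[int(0.025 * len(estimates))], estimates[int(0.975 * len(estimates)) - 1]


# Invented utility scores for four patients
baseline = [0.5, 0.6, 0.7, 0.8]
followup = [0.6, 0.8, 0.7, 1.0]
es = effect_size(baseline, followup)
srm = standardized_response_mean(baseline, followup)
```

In this sketch the ES and SRM share a numerator and differ only in the denominator, which is why the two indices can rank instruments differently when baseline variability and change variability diverge.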
Also, since it is well known that these indices sometimes generate conflicting results,[12,31] we ranked the order of the values according to the responsiveness statistic and calculated the overall median value across the responsiveness statistics to determine the overall rank. The distribution-based methods described above do not provide answers to practical questions such as, for example, how likely is a decrease of a specified amount in the utility score (as measured by the indirect instruments) to represent actual deterioration? Thus, we utilized a flexible polytomous regression model to assign probabilities of a patient's improvement, status quo, or deterioration (as defined by the transition question) to different levels of change in the indirect utility and disease-specific HRQL measures.[17] The results of this polytomous regression are presented in a graph of 3 curves, each of which describes how the estimated probability of a respective outcome (improvement, no change, or worsening as defined by the collapsed transition question or the patient global assessment of disease activity question) changes as a function of the difference in two consecutive scores. Finally, we examined associations between changes in either the unweighted domain scores of the EQ-5D and the SF-6D (as these instruments do not typically calculate single-attribute utility values) or the single-attribute utility scores of the HUI2 and HUI3 with the external criteria. The purpose of these analyses was to investigate which domains/single attributes were most likely to change in response to improvement or worsening in RA (as defined by the external criteria). Statistical analysis using the Kruskal-Wallis test was employed. Conservatively, we defined a clear association if the test was significant for the domain or single attribute with both external criteria.
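The ranking step described at the start of this passage (rank the instruments within each responsiveness statistic, then take the median rank across statistics) can be sketched as below. The function names and numbers are hypothetical, and ranking on the absolute value of each statistic is an assumption made here so that instruments scored in opposite directions can be compared.

```python
import statistics


def rank_instruments(values):
    # Rank 1 = largest absolute statistic; tied values share the average rank
    ordered = sorted(values, key=lambda name: -abs(values[name]))
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(values[ordered[j + 1]]) == abs(values[ordered[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for name in ordered[i:j + 1]:
            ranks[name] = average_rank
        i = j + 1
    return ranks


def median_rank(stat_tables):
    # stat_tables maps a statistic name to {instrument: value}
    per_instrument = {}
    for table in stat_tables.values():
        for name, rank in rank_instruments(table).items():
            per_instrument.setdefault(name, []).append(rank)
    return {name: statistics.median(ranks) for name, ranks in per_instrument.items()}


# Invented values for three placeholder instruments A, B and C
tables = {
    'ES': {'A': 0.9, 'B': -0.5, 'C': 0.2},
    'SRM': {'A': 0.7, 'B': -0.8, 'C': 0.1},
    'CSRM': {'A': 1.0, 'B': 0.6, 'C': 0.3},
}
overall = median_rank(tables)
```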
7.4 RESULTS
7.4.1 Demographics and Missing Values
Of the 320 RA patients who returned the baseline questionnaire, 239 returned the six month questionnaire, for a 75% follow-up response rate. Characteristics of our baseline sample have been described in detail elsewhere. Baseline characteristics of those who completed the six month questionnaire compared to those who did not are shown in Table 7.1. For most of the variables examined, there were no differences between the baseline characteristics of those who completed the six month questionnaire and those who did not. However, for all of the instrument scores except the HUI2, those who completed the six month questionnaire appeared to have poorer baseline mean HRQL scores than those who did not, but this relationship was statistically significant only for the HAQ. Other variables that differed between the subgroups were self-reported severity and the proportion who worked outside the home in the past 12 months (both favouring those completing only the baseline questionnaire).
7.4.2 Reliability
Test-retest reliability for the collapsed categories of the transition question (\"worse\", \"same\", and \"better\"), using the follow-up questionnaire and a subsequent questionnaire within 5 weeks of these responses, yielded 38 valid responses (received within the stated time frame) and perfect agreement in 36 patients. Both of the patients not showing agreement returned their reliability questionnaire 14 days after the follow-up questionnaire: one assessed change in his/her RA at 3 months as \"better\" but \"worse\" 14 days later, while the other assessed change at 3 months as \"worse\" but \"better\" 14 days later. Therefore, the ICC value for the collapsed transition question was 0.80 (95% CI 0.64 to 0.89) with these two responses included and 1.00 if these are eliminated.
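For readers who want to see how an ICC of this kind is obtained, the following is a minimal sketch of a two-way mixed-effects ICC (subjects random, measurement occasion fixed). The text does not state whether the consistency or absolute-agreement form was used, nor whether single or average measures were reported; the sketch assumes the single-measure consistency form, often written ICC(3,1). The function name and example data are hypothetical.

```python
def icc_two_way_mixed(scores):
    # scores: one row per subject, one column per measurement occasion
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical test-retest pairs coded 0 = worse, 1 = same, 2 = better
pairs = [[0, 0], [1, 1], [2, 2], [2, 0], [1, 1], [0, 0]]
icc = icc_two_way_mixed(pairs)
```

Because the denominator is driven by between-subject variance, a small number of patients who switch between the extreme categories can lower the coefficient noticeably, which is consistent with the drop from 1.00 to 0.80 reported above for two discordant patients.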
The results for the test-retest reliability approach for the generic and disease-specific instruments are shown in Table 7.2. Using the stable groups approach (i.e. those reporting no change from 0 to 6 months), we also determined ICC values to examine the reliability for the generic and disease-specific instruments (Table 7.3). Results were similar to the test-retest reliability approach in that reliability of the EQ-5D overall score appeared to be the lowest while the RAQoL and the HAQ displayed the highest reliability.
7.4.3 Validity of the Transition Questionnaire
For the 0 to 6 months transition question, 96 (40%) reported improvement, 85 (36%) reported no change and 58 (24%) reported worsening. Of these, 222 patients had pairs of answers on all questionnaires to permit comparisons (89 reporting improvement, 77 reporting status quo and 56 reporting worsening). For the secondary external criterion (as defined by categorization of the patient global assessment of disease severity VAS), patient global scores were available for these 222 pairs and were classified as follows: 65, 118, and 39 reporting improvement, status quo and worsening, respectively, using the criterion described in the Methods section. The two external criteria had fairly low agreement (weighted kappa 0.30, 95% CI 0.20 to 0.41). To examine the agreement between results of the collapsed transition question (improved, status quo, worsened) and the HRQL measures categorized based upon the literature-based MID values, we plotted the results as bar graphs and calculated weighted kappa values (Figure 7.1). For all the instruments, agreement between the transition question and the categories of the HRQL values was relatively low (weighted kappa ranging from 0.15 to 0.28). MID values calculated from our longitudinal sample using the anchor-based approach yielded values somewhat smaller than those reported in the literature (Table 7.4).
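The secondary external criterion referred to above (percentage change in the patient global assessment, cut at 20%) can be sketched as follows. The function names are hypothetical and the example values are invented. Two details are assumptions because the text does not settle them: a change of exactly 20% is placed in the middle category, and a baseline of zero is rejected because the ratio is undefined. The direction of the labels follows the Methods text as printed.

```python
from collections import Counter


def percent_change(baseline, six_months):
    # (6 month patient global - baseline patient global) / baseline patient global
    if baseline == 0:
        raise ValueError('percentage change is undefined for a baseline of zero')
    return (six_months - baseline) / baseline


def classify_patient_global(baseline, six_months, cutoff=0.20):
    # Labels follow the Methods text as printed: > 20% better, < -20% worse
    change = percent_change(baseline, six_months)
    if change > cutoff:
        return 'better'
    if change < -cutoff:
        return 'worse'
    return 'the same'


# Invented (baseline, six month) pairs on a 10 cm VAS
pairs = [(5.0, 7.0), (5.0, 3.0), (5.0, 5.5), (4.0, 4.0)]
counts = Counter(classify_patient_global(b, s) for b, s in pairs)
```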
Spearman's correlations between the transition question responses and changes in the patient global assessment of disease activity VAS, the pain VAS, and the HAQ disability score are shown in Table 7.5. Correlations between the transition question responses and the RA outcome measures were similar in magnitude to those between the RA outcome measures. Changes in the patient global assessment of disease activity VAS and the HAQ score displayed moderate correlation with the transition question.
7.4.4 Responsiveness
The mean change scores for each of the instruments between six months and baseline are shown for the entire sample and stratified by results of the transition question in Table 7.6 and for the categories defined by the patient global assessment of disease activity in Table 7.7. As hypothesized, for many of the instruments, since the sample was experiencing \"natural\" changes in their disease over time, the change scores for the entire sample tended to obscure the changes in the subgroups. Scatterplots of the indirect utility scores over time (from three measurements at baseline, 3 and 6 months) are presented in Figures 7.2 to 7.5 with ordinary least squares regression lines depicting the overall trends. Most of these lines had slopes in the hypothesized direction (positive for \"better\" and negative for \"worse\" as defined by the collapsed transition question). For those who reported their RA as \"the same\", slopes of the regression line tended to be positive across the indirect utility measures. Also of note, within each instrument, the average scores at baseline differed between the three groups defined by the collapsed transition question, with those stating that their RA had worsened having both lower baseline indirect utility scores (for the HUI2, HUI3, SF-6D, and the EQ-5D) and higher (and therefore worse) RAQoL and HAQ scores.
Those reporting that their RA was \"the same\" after six months tended to have better HRQL scores for all the instruments at baseline than the \"better\" and \"worse\" categories. For each of the measures, on average, changes for those reporting \"better\" or \"worse\" were in the appropriate direction (i.e. for the indirect utility scores, positive and negative, respectively; whereas, for the RAQoL and the HAQ, this was reversed). These findings were similar when the external criterion for change was changed to categories based upon changes in the patient global assessment of disease severity VAS (Table 7.7). The indices of responsiveness (ES, SRM, CSRM, paired t-test and the RE) and their associated 95% CIs are presented in Table 7.6 for those who responded as better, the same or worse according to the transition question and in Table 7.7 for the categories defined by the patient global rating of disease severity VAS; the rankings of the various responsiveness statistics are shown in Table 7.8. Generally, the results of the various responsiveness statistics tended to agree within each of the instruments (Table 7.8) and there was little overlap between their 95% CIs. Overall, the RAQoL was the most consistently responsive of the instruments tested regardless of which of the external criteria were applied. Depending on whether the change was classified as either \"worse\" or \"better\" and which of the external criteria were applied, the indirect utility instruments and the HAQ displayed varying degrees of responsiveness. For example, the EQ-5D appeared to be responsive in those who were classified as \"worse\" irrespective of which external criterion was applied but unresponsive in those classified as \"better\".
The HAQ appeared to be relatively responsive in both those classified as better or worse using the patient transition question to define the groups, but less responsive (in relation to the other instruments) when the patient global assessment of disease severity criterion was applied. The HUI3 appeared to be relatively unresponsive except in those classified as \"better\" by the patient global assessment of disease severity. The HUI2 was consistently ranked among the middle in responsiveness and the SF-6D appeared to be more responsive in those classified as \"better\" (by either criterion) than those classified as \"worse\".
7.4.5 Flexible Polytomous Regression Techniques
Results from the flexible polytomous regressions exploring responsiveness are shown in Figures 7.6 to 7.17. The curves on each figure correspond to the 3 types of outcome (worse, same, better) as defined by the external criteria (patient transition question or the patient global assessment of disease activity). Each curve shows how the estimated probabilities of a specific response vary depending on the observed change in the scores of the instruments. In general, the results of using the patient global assessment of disease activity VAS appear to be better able to discriminate between those patients whose RA has improved, worsened or stayed the same than the transition question. This is evident in all of the graphs as there is a sharper delineation between the three curves (worse, better and same) in Figures 7.12 to 7.17 than in the corresponding Figures 7.6 to 7.11 (for example, comparing Figure 7.6 to Figure 7.12, of which both examine changes in the HUI2). Overall, the RAQoL appeared to be most responsive in both Figure 7.10 and Figure 7.16 as compared to the other instruments using the same external criterion. For example, in Figure 7.16, there is very good discrimination between the three curves as shown by their degree of separation.
The probability of being classified as \"the same\" is high (approximately 60%) if the difference between the two scores is zero. Similarly, this probability decreases as we move in either direction and becomes extremely small when the difference is ±20. As the difference in the scores gets larger in the positive direction (recall that larger values in the RAQoL reflect worse HRQL), the probability of being classified as \"worse\" grows to > 80% when the difference in scores is approximately 15 and almost 100% when the difference is 20. These values are similar to those displayed for negative values (reflecting improvement) in the RAQoL and the dashed curve labeled as \"better\". For the indirect utility instruments, in the graphs using the patient transition question as the external criterion for change, there was generally fairly poor discrimination between the curves, with significant overlap between the probabilities of being classified \"better\", \"worse\" and \"same\" across the range of difference scores (Figures 7.6 to 7.10). Using the patient global assessment of disease activity VAS criterion, the curves for all the indirect utility instruments showed much better discrimination between those classified as \"better\" and \"worse\" (Figures 7.12 to 7.15). However, for those classified as the \"same\", there was considerable overlap between these probabilities and the probabilities for \"better\" and \"worse\". The HUI3 appeared to be the best able to discriminate in this regard (Figure 7.13). Thus, it would seem that although these instruments can discriminate change well (according to the external criterion) in those who improve or worsen, those that stay the same yield somewhat problematic difference scores. This finding could be a property of the instruments or may be a reflection of the cut-off values of our external criterion.
Similarly, for the HAQ, the patient global assessment of disease severity VAS criterion appeared to result in better discrimination between the curves; however, as with the indirect utility measures, there was considerable overlap between the \"same\" category and the other categories.
7.4.6 Change in Unweighted Domain Scores (EQ-5D, SF-6D) and Single Attribute Utilities (HUI2, HUI3)
The associations between the instrument unweighted domains (EQ-5D and SF-6D) and the single attribute scores (HUI2 and HUI3) and the external criteria of change are shown in Table 7.9. For the EQ-5D, pain/discomfort, anxiety/depression and self-care, and, for the SF-6D, physical and social functioning, role limitations and pain met our criteria for statistical significance. For the single attributes from the HUI systems, ambulation, emotion, and pain (from the HUI3) and mobility, emotion and pain (from the HUI2) met the criteria. Of note, there were more significant associations between the domains/single attributes and the changes defined by the patient global assessment of disease severity categories than the patient transition question responses. For example, with the EQ-5D there was a significant association for the mobility domain with the patient global assessment of disease severity VAS defined changes but not with the other external criterion. For the SF-6D, HUI3, and HUI2 there were significant associations for the vitality domain, the dexterity single attribute, and the sensation single attribute, respectively, using the patient global assessment of disease severity VAS defined changes. Of note, for the self-care single attribute in the HUI2, there was a significant association with the patient transition defined changes but not with the other criterion.
7.5 DISCUSSION
This study is the first to compare the reliability and longitudinal changes in scores obtained with four indirect utility instruments (HUI3, HUI2, EQ-5D, SF-6D), a disease-specific measure (the RAQoL), and a disability measure (the HAQ) in a sample of patients with rheumatoid arthritis. Our results demonstrate that while the generic, preference-based measures yielded scores that were generally reliable, they had lower responsiveness (as assessed by multiple methodologies) in RA than the disease-specific RAQoL. The indirect utility measures did, however, yield moderate responsiveness statistics when the patient global assessment of disease severity was applied as the external criterion for change. The domains and attributes of the indirect utility instruments that were commonly associated with the external criteria for change in RA tended to be pain, ambulation/physical functioning, and emotional/mental health. We also examined the reliability and validity of utilizing a patient-completed transition question to function as the primary external criterion of change, which is a common approach.[9-11,14,17] However, the main concern with this approach is recall bias. Some literature suggests that retrospective estimates of the initial state are often highly correlated with the present state and uncorrelated with the initial state.[22] Another concern deals with the starting point of individuals who are rating their health changes. For example, an individual starting at a lower point in health or function may rate a small change as significant, whereas a person of higher function may regard a change of the same magnitude as insignificant.[29] Despite these concerns, there has been little work in evaluating the reliability and validity of transition questions.[10] In this study, we found that the reliability of the transition question was acceptable using the test-retest approach.
However, from the validity standpoint, we found that the responses from the transition question were only weakly to moderately correlated with commonly accepted clinical variables used to assess RA (such as the pain VAS, the patient global assessment of disease severity VAS, and the HAQ).[1] The patient transition defined changes also had low agreement with previously defined MID values and the changes defined by the patient global assessment of disease severity VAS. In addition, we found that there were fairly large mean differences in the instruments between the time points for individuals who were classified as being the \"same\" from their RA perspective (sometimes the change in this category was of similar magnitude as those classified as \"worse\" or \"better\"). This point was illustrated in the polytomous regression plots, where there was considerable overlap between the \"same\" and \"better\" or \"worse\" curves. While this finding could be the result of shortcomings of the instruments in assessing changes in RA, these findings were not observed when a different external criterion was applied (categories based upon the patient global assessment of disease activity VAS). Also, several single attributes that were expected to have significant associations with changes in RA were significantly associated with changes in the patient global assessment VAS and not the patient transition question changes (mobility (EQ-5D), vitality (SF-6D) and dexterity (HUI3)). Therefore, categorization of the patient global assessment of disease activity VAS appears to be a superior external criterion for RA than the patient transition question, as it was expected that these domains/single attributes would be associated with changes in RA. We would therefore hypothesize that these are the main factors that are driving the observed changes in the global utility scores.
Generally, dividing the sample into \"worse\", \"same\" and \"better\" using the patient global assessment of disease severity VAS categories seemed to define these groups more accurately than the patient transition question. This point is illustrated by the larger responsiveness statistics for all of the instruments, the smaller amount of change in all of the instruments in those classified as having their RA the \"same\" as at baseline, and a greater magnitude of change (either negative or positive) in those classified as having their RA \"worse\" or \"better\" than at baseline. Using the transition question as the external criterion resulted in small ES, SRM and CSRM statistics for virtually all of the instruments and non-significant p-values on the paired t-tests for those who reported having improved or worsened from baseline for many of the indirect utility measurements (Table 7.6). Conversely, when applying the classification according to the patient global assessment of disease severity VAS (Table 7.7), many of the responsiveness statistics for those classified as having their RA improved or worsened relative to baseline can be interpreted as moderate or large, and the paired t-tests for those who improved or worsened were significant for all of the instruments.

The indirect utility instruments displayed different properties in this study. Reliability was acceptable for all of the scores except for the EQ-5D (ICC 0.46 to 0.52 depending on the methodology employed). This finding is considerably lower than previously reported in rheumatoid arthritis (ICC of 0.73 using the stable groups approach and 0.78 using test-retest reliability).9 The difference between these findings may be due to the five-week window for resubmission of the reliability questionnaires in our study compared to two weeks in the other analysis. In the longer time frame, it is possible that there was a higher probability of change.
This change may have penalized the EQ-5D much more than the other scales, as there is a term in the EQ-5D scoring function (N3) that subtracts 0.269 if the lowest level (3) occurs on at least one domain. Thus, a one-category change in response (from \"2\" to \"3\") in a single domain can profoundly reduce the EQ-5D utility score. However, other instruments that were found to be more responsive than the EQ-5D (the RAQoL and the HAQ) were stable over this time frame.

The HUI2 and the HUI3 generally had low responsiveness statistics when the patient transition question was used as the external criterion and moderate responsiveness statistics when the categories of the patient global assessment of disease activity VAS were applied. Their relative rankings were towards the middle or bottom of all the instruments regardless of the external criterion applied, except for the \"better\" category as defined by the patient global assessment of disease activity. For this category, the HUI3 had the highest responsiveness statistics on three measures (the ES, the SRM, and the paired-sample t-test). This was likely because the mean change in this category was quite large (0.17), almost half of the baseline score. In the polytomous regression plots, the HUI3 appeared to have less overlap between the same and the better or worse curves than the other indirect utility instruments (e.g., Figure 7.13), which may make it more responsive in RA. As expected, the sensation attribute (HUI2), the vision, hearing and speech attributes (HUI3) and the cognition attributes (both scales) were not associated with change in RA as defined by the external criteria. Of note, although one would have expected dexterity (HUI3) and self-care (HUI2) to be consistently associated with changes in RA, each was significant for only one of the external criteria.
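The effect of the N3 term on the EQ-5D score, discussed earlier in this section, can be made concrete with a short sketch. It assumes the widely used UK time trade-off tariff for the three-level EQ-5D (Dolan 1997); the text confirms only the N3 value of 0.269, so the remaining coefficients and the function name are assumptions added for illustration.

```python
# Decrements for level 2 and level 3 on each dimension.
# Assumption: UK TTO tariff; only N3 = 0.269 is stated in the text.
DECREMENTS = {
    'mobility': (0.069, 0.314),
    'self_care': (0.104, 0.214),
    'usual_activities': (0.036, 0.094),
    'pain_discomfort': (0.123, 0.386),
    'anxiety_depression': (0.071, 0.236),
}
CONSTANT = 0.081  # subtracted once if any dimension is above level 1
N3 = 0.269        # subtracted once if any dimension is at level 3


def eq5d_utility(state):
    # state maps each dimension name to a level (1, 2 or 3).
    levels = list(state.values())
    if all(level == 1 for level in levels):
        return 1.0
    score = 1.0 - CONSTANT
    for dimension, level in state.items():
        if level == 2:
            score -= DECREMENTS[dimension][0]
        elif level == 3:
            score -= DECREMENTS[dimension][1]
    if 3 in levels:
        score -= N3
    return round(score, 3)


full_health = {name: 1 for name in DECREMENTS}
moderate_pain = dict(full_health, pain_discomfort=2)
extreme_pain = dict(full_health, pain_discomfort=3)

drop = eq5d_utility(moderate_pain) - eq5d_utility(extreme_pain)
```

Under these assumed coefficients, moving from level 2 to level 3 on pain alone lowers the score from 0.796 to 0.264, which shows how a single one-category shift between test and retest could depress the ICC for this instrument.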
The SF-6D generally had low responsiveness statistics when the patient transition question was used as the external criterion and moderate responsiveness statistics when the categories of the patient global assessment of disease activity VAS were applied. This latter finding was especially true for the \"better\" category. In the rankings of the responsiveness statistics, the SF-6D ranked much higher for those classified as improved (median rankings of 3 for both external criteria) than for those classified as worsened (median rankings of 5 and 6). One of the problems with the responsiveness of the SF-6D when using our external criteria was the amount of change experienced by those categorized as the \"same\". The paired t-tests for this category were significant for both external criteria, indicating a substantial mean change (0.04 in Table 7.6, which was as large as that in those reporting improvement, and 0.02 in Table 7.7). These results are further illustrated in Figures 7.9 and 7.14, with the probability of being scored as the same being somewhat constant over the range of SF-6D change scores.

As anticipated, the RAQoL was the most responsive to changes in both positive and negative directions, which is in agreement with other research comparing disease-specific to generic HRQL instruments.30 The responsiveness statistics were generally moderate to large irrespective of the external criterion of change applied and were consistently in the top 2 in the rankings (Table 7.8). In addition, the results of the polytomous regressions reveal well-delineated curves for same, better and worse without a large degree of overlap (Figures 7.10 and 7.15).

Results for the HAQ revealed that this instrument performed approximately equivalently for both of the external criteria, with responsiveness statistics of similar magnitude.
However, when compared to the other instruments, the HAQ rankings were among the highest for responsiveness statistics calculated from categories defined by the patient transition question but were either in the middle (for those categorized as worse) or at the bottom (for those categorized as better) for responsiveness statistics calculated from categories defined by the patient global assessment of disease severity VAS. Although the reason for this finding is not obvious, perhaps the patient transition question is capturing mostly changes in elements of disability (as measured by the HAQ) rather than the other aspects/domains of RA that are captured by the other instruments.

In summary, the RAQoL was consistently the most responsive of the tested instruments. Among the overall utility scores of the indirect utility instruments, the EQ-5D appeared to be the most responsive to worsening but not to improvement. Conversely, the HUI3 and SF-6D were superior in detecting improvement, but the SF-6D also detected changes in those classified as the \"same\". Thus, in RA clinical trial situations where a known effective intervention is to be applied and there is a large probability of positive change, the SF-6D and the HUI3 would be superior to the other instruments. However, changes in the SF-6D might be larger, as many patients classified as the same by other criteria would, in fact, improve using this scale. The HUI2 appeared to be fairly non-responsive in RA in comparison to the other measures.

We located two other studies that compared the responsiveness of indirect utility instruments in longitudinal samples of patients with a variety of musculoskeletal diseases.
In the study that was conducted exclusively in RA, investigators examined the reliability and responsiveness of the SF-36, SF-6D, EQ-5D, standard gamble (SG), the modified HAQ, and a pain VAS in two groups of RA patients (Group 1 consisted of 24 patients with stable RA and Group 2 consisted of 60 patients beginning infliximab therapy). Patients in Group 2 were assessed before starting infliximab therapy and after 14 weeks of infliximab treatment. Test-retest reliability was estimated for each instrument in the stable patient group using the ICC, whereas responsiveness was assessed using the paired t-test, effect size (ES) and standardized response mean (SRM). Across all the measures, the ICC ranged from 0.50 (role emotional domain from the SF-36) to 0.92 (physical functioning domain from the SF-36). The preference-based measures had moderate reliability (ICCs: EQ-5D 0.66, SF-6D 0.72, SG 0.73). However, the sample from which these results were derived was very small (n=24), and thus these estimates are not stable. In terms of responsiveness, for Group 2, all the overall scores and domain scores for the SF-36 detected significant changes from baseline to the second measurement. The SRM and ES were the largest for the pain VAS, the EQ-5D VAS, the SF-36 physical component scores, and the SF-36 vitality domain. For the preference-based measures, the SRM and ES values were 0.67 and 0.64 for the EQ-5D, 1.40 and 0.87 for the SF-6D, and 0.49 and 0.43 for the SG. Although the change described by the EQ-5D system was twice that described by the SF-6D, its responsiveness statistics were much smaller, mainly due to the larger SD of the baseline and change scores of the EQ-5D. The authors concluded that the SF-6D might be preferable to the EQ-5D in measuring clinically relevant improvement in RA.
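The dependence of the responsiveness statistics on score variability, noted above, follows directly from how they are constructed. The sketch below uses the conventional definitions (ES: mean change divided by the SD of baseline scores; SRM: mean change divided by the SD of change scores; CSRM: mean change divided by the SD of change in a stable comparison group). The function names and the sample data are hypothetical, and the CSRM definition is inferred from the values in Tables 7.6 and 7.7 rather than stated in this section.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    # Follow-up minus baseline, paired by patient.
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    # ES: mean change divided by the SD of the baseline scores.
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    # SRM: mean change divided by the SD of the change scores.
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def control_srm(changes_in_changed_group, changes_in_stable_group):
    # CSRM (assumed definition): mean change in the group that changed,
    # divided by the SD of change in the group classified as stable.
    return mean(changes_in_changed_group) / stdev(changes_in_stable_group)


# Hypothetical utility scores for four patients classified as improved.
baseline = [0.5, 0.6, 0.4, 0.7]
follow_up = [0.6, 0.8, 0.5, 0.7]
es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)

# Hypothetical change scores in patients classified as stable.
stable_changes = [0.00, 0.05, -0.05, 0.02]
csrm = control_srm(change_scores(baseline, follow_up), stable_changes)
```

Because the SD sits in the denominator, an instrument with widely dispersed scores can show a larger mean change and still produce smaller ES and SRM values than an instrument with a narrow score distribution.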
In the other study, conducted in patients who had one of many rheumatic diseases,11 investigators used a five-point transition question for patients to self-report their health changes from baseline to 12 months and subsequently collapsed the transition question to three categories (better (n=40), same (n=30), and worse (n=28)). The results of the responsiveness analyses were somewhat different from ours. The ES for those categorized as better or worse was moderate (0.53 and -0.58, respectively) for the EQ-5D and was much larger than those reported for the HUI3 and the SF-6D. Possible reasons for these differences include the sample size (n=98 pairs vs. n=222 pairs in our sample), sample characteristics (only half had RA, while the balance was a mixture of other rheumatologic conditions with potentially more or less propensity for change), the recall period for the transition question (12 months vs. 6 months), and the lack of validity testing of the transition question (i.e., we found that the transition question may not be the most valid external criterion for categorizing the sample).

Blanchard et al. examined the responsiveness of the HUI2 and the HUI3 (and other, disease-specific measures) in a sample of osteoarthritis patients (n=90) undergoing total hip arthroplasty (THA). These investigators found that, although the disease-specific measures were the most responsive to this known, effective intervention, the preference-based measures had responsiveness statistics that were moderate to large in magnitude and were able to detect change resulting from THA. In addition, the pain (both HUI2 and HUI3), ambulation (HUI3) and self-care (HUI2) single attributes had moderate to large ES.

Although the findings of the present study may provide valuable information on the application of these instruments in the assessment of changes experienced by patients with RA, there are some limitations to be considered when interpreting the results.
Firstly, the results are specific to RA patients undergoing \"natural\" changes in their disease and may not generalize to other populations. Secondly, while the patient transition question appeared to suffer from several limitations in assigning categories of change to RA patients over six months, this finding might not generalize to other disease states/processes or shorter time frames of recall. In using the secondary external criterion of change, we adopted ± 20% from baseline as our cut-points for assigning categories of change. Although this criterion is similar to the one the American College of Rheumatology has used in determining the ACR 20 criteria,1 and it appeared to perform better than the patient transition question, other cut-points may yield better results. Further studies are necessary to determine such cut-points.

We conclude that the reliability of all the instruments (except the EQ-5D) and the patient transition question was acceptable. The patient transition question might not be a valid external criterion for assigning categories of change to patients with RA, and categories defined by the patient global assessment of disease severity appeared to perform better in this regard. The RAQoL was the most responsive, although all the instruments were capable of detecting change to some degree. The HUI3 and the SF-6D may be the best indirect utility instruments to use in clinical trials of RA where a known effective intervention is to be applied.

7.6 REFERENCES

1. American College of Rheumatology Subcommittee on Rheumatoid Arthritis Guidelines for the Management of Rheumatoid Arthritis: 2002 Update. Arthritis Rheum 2002;46:326-348.
2. Lipsky PE, van der Heijde DM, St Clair EW, Furst DE, Breedveld FC, et al. Infliximab and methotrexate in the treatment of rheumatoid arthritis. Anti-Tumor Necrosis Factor Trial in Rheumatoid Arthritis with Concomitant Therapy Study Group. N Engl J Med 2000;343:1594-1602.
3. Blumenauer B, Cranney A, Clinch J, Tugwell P. Quality of life in patients with rheumatoid arthritis: which drugs might make a difference? Pharmacoeconomics 2003;21:927-940.
4. Scott DL. Leflunomide improves quality of life in rheumatoid arthritis. Scand J Rheumatol Suppl 1999;112:23-29.
5. Zhao SZ, Fiechtner JI, Tindall EA, Dedhiya SD, Zhao WW, et al. Evaluation of health-related quality of life of rheumatoid arthritis patients treated with celecoxib. Arthritis Care Res 2000;13:112-121.
6. Hammond A, Young A, Kidao R. A randomised controlled trial of occupational therapy for people with early rheumatoid arthritis. Ann Rheum Dis 2004;63:23-30.
7. Eberhardt K, Duckberg S, Larsson BM, Johnson PM, Nived K. Measuring health related quality of life in patients with rheumatoid arthritis - reliability, validity, and responsiveness of a Swedish version of RAQoL. Scand J Rheumatol 2002;31:6-12.
8. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
9. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: Validity, responsiveness and reliability of EuroQol (EQ-5D). Br J Rheumatol 1997;36:551-559.
10. Walters SJ, Brazier JE. What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health Qual Life Outcomes 2003;11:4-12.
11. Conner-Spady B, Suarez-Almazor ME. Variation in the estimation of quality-adjusted life-years by different preference-based instruments. Med Care 2003;41:791-801.
12. Blanchard C, Feeny D, Mahon JL, Bourne R, Rorabeck C, et al. Is the Health Utilities Index responsive in total hip arthroplasty patients? J Clin Epidemiol 2003;56:1046-1054.
13. Terwee CB, Dekker FW, Wiersinga, Prummel MF, Bossuyt PMM. On assessing the responsiveness of health-related quality of life instruments: Guidelines for instrument evaluation. Qual Life Res 2003;12:349-362.
14. Liang MH, Lew RA, Stucki G, Fortin PR, Daltroy L. Measuring clinically important changes with patient-oriented questionnaires. Med Care 2002;40(suppl):II-45 - II-51.
15. Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315-324.
16. Wong AL, Wong WK, Harker J, Sterz M, Bulpitt K, Park G, Ramos B, Clements P, Paulus H. Patient self-report tender and swollen joint counts in early rheumatoid arthritis. Western Consortium of Practicing Rheumatologists. J Rheumatol 1999;26:2551-2561.
17. Fortin PR, Abrahomowicz, Clarke AE, Neville C, Du Berger R, et al. Do lupus disease activity measures detect clinically important changes? J Rheumatol 2000;27:1421-1428.
18. Redelmeier DA, Lorig K. Assessing the clinical importance of symptomatic improvements - an illustration in rheumatology. Arch Intern Med 1993;153:1337-1342.
19. Wells GA, Tugwell P, Kraag GR, Baker PR, Groh J, Redelmeier DA. Minimum important difference between patients with rheumatoid arthritis: the patient's perspective. J Rheumatol 1993;20:557-560.
20. Marra CA, Woolcott JC, Shojania K, Offer R, Kopec J, Brazier JE, Esdaile JM, Anis AH. An assessment of the construct validity of four indirect utility measures in rheumatoid arthritis. Social Science and Medicine (submitted).
21. Kopec JA, Willison KD. A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
22. Streiner DL, Norman GR. Health measurement scales. A practical guide to their development and use. 2nd Ed. Oxford University Press, London, 1995.
23. Norman GR, Sridhar FG, Guyatt GH, Walter SD. Relation of distribution- and anchor-based approaches in interpretation of changes in health-related quality of life. Med Care 2001;39:1039-1047.
24. Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in health-related quality of life. The remarkable universality of half a standard deviation. Med Care 2003;41:582-592.
25. Cohen J. A power primer. Psychol Bull 1992;112:155-159.
26. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials 1991;12:142S-158S.
27. Cohen J. Statistical power analysis for the behavioural sciences. 2nd ed. Hillsdale (NJ): Lawrence Erlbaum Assoc., 1988.
28. Tidermark J, Bergstrom G, Svensson O, Tornkvist, Ponzer S. Responsiveness of the EuroQol (EQ-5D) and the SF-36 in elderly patients with displaced femoral neck fractures. Qual Life Res 2003;12:1069-1079.
29. Liang M, Larson M, Cullen K, Schwartz A. Comparative measurement efficiency and sensitivity of five health status instruments for arthritis. Arthritis Rheum 1985;28:542-547.
30. Wiebe S, Guyatt G, Weaver B, Matijevic S, Sidwell C. Comparative responsiveness of generic and specific quality-of-life instruments. J Clin Epidemiol 2003;56:52-60.
31. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol 1997;50:239-246.
32. Russell AS, Conner-Spady B, Mintz A, Mallon C, Maksymyowych WP. The responsiveness of generic health status measures as assessed in patients with rheumatoid arthritis receiving infliximab. J Rheumatol 2003;30:941-947.
TABLE 7.1: BASELINE CHARACTERISTICS OF THOSE SUPPLYING BOTH BASELINE AND SIX MONTH QUESTIONNAIRES COMPARED TO THOSE WHO ONLY COMPLETED THE BASELINE QUESTIONNAIRE

Characteristic | Completed both baseline and six month | Completed only baseline
N | 239 | 81
Age, y (SD) | 62.1 (13.1) | 58.1 (12.8)
Duration of RA, y (SD) | 13.8 (11.2) | 14.7 (12.7)
Tender joints, range 0-28 (SD) | 14.9 (11.5) | 15.6 (13.4)
Swollen joints, range 0-28 (SD) | 8.8 (8.5) | 10.0 (9.3)
Pain VAS, range 0-100 mm (SD) | 38.5 (26.2) | 44.4 (27.1)
Patient Global VAS of disease activity, range 0-100 mm (SD) | 59.0 (25.2) | 62.8 (27.4)
Other chronic diseases | 1.2 (1.3) | 1.4 (1.8)
HUI2 utility score (SD) | 0.71 (0.19) | 0.71 (0.20)
HUI3 utility score (SD) | 0.52 (0.29) | 0.56 (0.29)
EQ-5D utility score (SD) | 0.65 (0.24) | 0.69 (0.23)
SF-6D utility score (SD) | 0.62 (0.13) | 0.64 (0.14)
RAQoL score (SD) | 12.90 (8.1) | 12.20 (8.8)
HAQ score, range 0-3.0 | 1.2 (0.8) | 1.0 (0.8)*
Female (%) | 79.2 | 78.4
Hospitalized for RA in last 12 months (%) | 16.5 | 10.8
Used allied health professional/home care services for RA in last 12 months (%) | 44.3 | 36.8
Self-reported RA severity (%)† | |
  Very Mild | 1.3 | 7.8
  Mild | 10.2 | 14.2
  Moderate | 42.2 | 35.1
  Severe | 37.4 | 33.8
  Very Severe | 8.9 | 9.1
Self-reported RA control (%) | |
  Very Well Controlled | 10.2 | 13.0
  Well Controlled | 25.8 | 24.7
  Adequately Controlled | 42.6 | 37.6
  Not Well Controlled | 18.7 | 23.4
  Not Controlled At All | 2.7 | 1.3
Working outside the home in last 12 months (%) | 17.4 | 30.9‡

SD = standard deviation
* p=0.04 by Student t-test
† p=0.04 by Chi-square (df=4, Chi-square = 9.8)
‡ p=0.01 by Chi-square (df=1, Chi-square = 6.7)

TABLE 7.2: TEST-RETEST RELIABILITY

Instrument | Attribute/Domain | ICC | 95% CI
HUI2 | Overall | 0.77 | 0.59 to 0.88
HUI2 | Sensation | 0.81 | 0.63 to 0.90
HUI2 | Mobility | 0.84 | 0.71 to 0.91
HUI2 | Emotion | 0.72 | 0.52 to 0.84
HUI2 | Cognition | 0.75 | 0.57 to 0.87
HUI2 | Self-care | 0.36 | 0.05 to 0.60
HUI2 | Pain | 0.52 | 0.25 to 0.72
HUI3 | Overall | 0.81 | 0.66 to 0.90
HUI3 | Vision | 0.88 | 0.78 to 0.94
HUI3 | Hearing | 0.23 | -0.05 to 0.47
HUI3 | Speech | 1.00 | -
HUI3 | Ambulation | 0.82 | 0.68 to 0.90
HUI3 | Dexterity | 0.78 | 0.63 to 0.88
HUI3 | Emotion | 0.66 | 0.43 to 0.81
HUI3 | Cognition | 0.46 | 0.15 to 0.68
HUI3 | Pain | 0.59 | 0.34 to 0.76
SF-6D | Overall | 0.89 | 0.79 to 0.94
SF-6D | Physical functioning | 0.81 | 0.67 to 0.90
SF-6D | Role limitations | 0.73 | 0.55 to 0.85
SF-6D | Pain | 0.82 | 0.68 to 0.90
SF-6D | Mental health | 0.77 | 0.60 to 0.87
SF-6D | Vitality | 0.74 | 0.55 to 0.85
SF-6D | Social functioning | 0.80 | 0.64 to 0.88
EQ-5D | Overall | 0.46 | 0.18 to 0.68
EQ-5D | Mobility | 0.74 | 0.55 to 0.85
EQ-5D | Self-care | 0.51 | 0.23 to 0.71
EQ-5D | Usual activities | 0.71 | 0.50 to 0.84
EQ-5D | Pain/Discomfort | 0.72 | 0.52 to 0.84
EQ-5D | Anxiety/Depression | 0.73 | 0.55 to 0.85
HAQ | Overall | 0.97 | 0.93 to 0.98
RAQoL | Overall | 0.93 | 0.86 to 0.96

Questionnaire results compared to repeat results obtained within 35 days for global scores, unweighted domain scores (EQ-5D and SF-6D), and single attribute utility scores (HUI2 and HUI3). Results are intraclass correlation coefficients (ICC) with 95% confidence intervals (CI).

TABLE 7.3: INTRACLASS CORRELATION COEFFICIENT VALUES FOR GENERIC AND DISEASE-SPECIFIC HRQL MEASURES FOR THOSE REPORTING NO CHANGE IN THEIR RHEUMATOID ARTHRITIS BETWEEN 0 AND 6 MONTHS

Measure | ICC* | 95% CI
HUI2 | 0.72 | 0.60 to 0.81
HUI3 | 0.78 | 0.67 to 0.85
SF-6D | 0.78 | 0.68 to 0.86
EQ-5D | 0.52 | 0.35 to 0.66
HAQ | 0.90 | 0.84 to 0.93
RAQoL | 0.86 | 0.78 to 0.91

* ICC = intraclass correlation coefficient (two-way mixed effect model such that the subject effect was random and the instrument effect was fixed)

TABLE 7.4: MINIMALLY IMPORTANT DIFFERENCES REPORTED IN THE LITERATURE AND DERIVED FROM THE SAMPLE USING ANCHOR-BASED APPROACHES

Instrument | Literature-derived MID* values | Author, Reference # | Anchor-based MID* (SD): Worse | Anchor-based MID* (SD): Better
HUI2 | 0.03 - 0.04 | 8 | -0.05 (0.19) | 0.04 (0.13)
HUI3 | 0.06 - 0.07 | 8 | -0.03 (0.25) | 0.02 (0.22)
EQ-5D | 0.03 - 0.05 | 8 | -0.06 (0.21) | 0.03 (0.22)
SF-6D | 0.033 | 9 | -0.02 (0.07) | 0.02 (0.09)
RAQoL | 2.00 | 8 | 1.39 (5.12) | -1.72 (5.56)
HAQ | 0.25 | 6,7 | 0.12 (0.50) | -0.10 (0.37)

* MID = minimally important differences

TABLE 7.5: CORRELATIONS BETWEEN THE TRANSITION QUESTION AND CHANGES IN RHEUMATOID ARTHRITIS OUTCOME VARIABLES FROM 0 TO 6 MONTHS

Variable | Transition Question | Patient Global Assessment of Disease Activity VAS | Pain VAS | HAQ
Transition Question* | 1.00 | | |
Patient Global Assessment of Disease Activity VAS (100mm) | 0.41 | 1.00 | |
Pain VAS (100mm) | 0.26 | -0.44 | 1.00 |
Health Assessment Questionnaire (HAQ) Score (0 - 3.0) | 0.31 | -0.29 | 0.39 | 1.00

* Transition question - 5 point scale with responses \"Much Worse\", \"Somewhat Worse\", \"The Same\", \"Somewhat Better\", \"Much Better\"
All correlations p < 0.0001

TABLE 7.6: DIFFERENCES AND RESPONSIVENESS STATISTICS FROM BASELINE TO 6 MONTHS STRATIFYING THE SAMPLE BY THE TRANSITION QUESTION

Instrument | Group | N | Baseline Mean | SD | Range | Mean Change | SD | Effect Size (95% CI) | SRM (95% CI) | CSRM (95% CI) | Paired t-test | RE
HUI3 | Overall | 222 | 0.52 | 0.29 | -0.16 to 1.00 | 0.03 | 0.22 | | | | |
HUI3 | Worse | 56 | 0.44 | 0.3 | 0.16 to 0.95 | -0.03 | 0.25 | -0.10 (-0.31 to 0.13) | -0.12 (-0.56 to 0.08) | -0.18 (-0.53 to 0.22) | 0.4 | 0.12
HUI3 | Same | 77 | 0.6 | 0.26 | -0.05 to 1.00 | 0.03 | 0.17 | 0.12 (-0.03 to 0.26) | 0.18 (-0.14 to 0.31) | 0.18 (-0.14 to 0.31) | 0.13 |
HUI3 | Better | 89 | 0.51 | 0.3 | -0.15 to 0.97 | 0.07 | 0.24 | 0.23 (0.08 to 0.41) | 0.29 (0.01 to 0.40) | 0.41 (0.13 to 0.74) | 0.005 | 0.39
HUI2 | Overall | 222 | 0.71 | 0.19 | 0.13 to 1.00 | 0.02 | 0.16 | | | | |
HUI2 | Worse | 56 | 0.66 | 0.21 | 0.14 to 0.97 | -0.03 | 0.19 | -0.14 (-0.41 to 0.10) | -0.16 (-0.39 to 0.16) | -0.18 (-0.71 to 0.16) | 0.22 | 0.25
HUI2 | Same | 77 | 0.76 | 0.16 | 0.16 to 1.00 | 0.03 | 0.17 | 0.19 (-0.05 to 0.27) | 0.18 (-0.05 to 0.39) | 0.18 (-0.05 to 0.39) | 0.12 |
HUI2 | Better | 89 | 0.7 | 0.2 | 0.18 to 1.00 | 0.06 | 0.15 | 0.30 (0.16 to 0.47) | 0.40 (0.10 to 0.52) | 0.35 (0.23 to 0.81) | 0.0001 | 0.72
EQ-5D | Overall | 222 | 0.65 | 0.24 | -0.03 to 1.00 | -0.001 | 0.2 | | | | |
EQ-5D | Worse | 56 | 0.6 | 0.25 | -0.03 to 1.00 | -0.04 | 0.21 | -0.16 (-0.44 to 0.06) | -0.19 (-0.66 to -0.02) | -0.21 (-0.55 to 0.06) | 0.04 | 0.73
EQ-5D | Same | 77 | 0.72 | 0.19 | 0.08 to 1.00 | -0.02 | 0.19 | -0.11 (-0.36 to 0.16) | -0.11 (-0.41 to 0.02) | -0.11 (-0.41 to 0.02) | 0.48 |
EQ-5D | Better | 89 | 0.62 | 0.26 | 0.02 to 1.00 | 0.04 | 0.2 | 0.15 (0.01 to 0.31) | 0.20 (0.12 to 0.59) | 0.21 (-0.01 to 0.45) | 0.02 | 0.24
SF-6D | Overall | 222 | 0.62 | 0.13 | 0.31 to 1.00 | 0.03 | 0.09 | | | | |
SF-6D | Worse | 56 | 0.59 | 0.13 | 0.31 to 0.95 | -0.01 | 0.08 | -0.08 (-0.24 to 0.08) | -0.13 (-0.44 to 0.15) | -0.13 (-0.39 to 0.14) | 0.26 | 0.21
SF-6D | Same | 77 | 0.64 | 0.11 | 0.42 to 0.94 | 0.04 | 0.08 | 0.36 (0.19 to 0.56) | 0.50 (0.31 to 0.70) | 0.50 (0.31 to 0.70) | 0.0001 |
SF-6D | Better | 89 | 0.62 | 0.13 | 0.31 to 1.00 | 0.04 | 0.11 | 0.31 (0.11 to 0.49) | 0.36 (0.16 to 0.58) | 0.50 (0.19 to 0.78) | 0.001 | 0.52
RAQoL | Overall | 222 | 13.1 | 8.1 | 0 to 30 | -1.26 | 5.14 | | | | |
RAQoL | Worse | 56 | 14.9 | 7.7 | 0 to 30 | 1.43 | 4.25 | 0.19 (0.04 to 0.33) | 0.34 (-0.10 to 0.45) | 0.38 (0.08 to 0.70) | 0.02 | 1.00
RAQoL | Same | 77 | 10.8 | 7.4 | 0 to 27 | -1.23 | 3.74 | -0.17 (-0.19 to 0.05) | -0.33 (-0.39 to 0.07) | -0.33 (-0.39 to 0.07) | 0.004 |
RAQoL | Better | 89 | 13.8 | 8.4 | 0 to 30 | -3 | 5.94 | -0.36 (-0.51 to -0.20) | -0.51 (-0.22 to 0.60) | -0.80 (-0.42 to -1.20) | <0.0001 | 1.00
HAQ | Overall | 222 | 1.15 | 0.77 | 0 to 2.63 | -0.06 | 0.45 | | | | |
HAQ | Worse | 56 | 1.28 | 0.74 | 0 to 2.63 | 0.16 | 0.48 | 0.22 (0.04 to 0.38) | 0.33 (0.06 to 0.65) | 0.46 (0.08 to 0.84) | 0.009 | 1.21
HAQ | Same | 77 | 0.99 | 0.79 | 0 to 2.38 | -0.07 | 0.35 | -0.09 (-0.28 to 0.02) | -0.20 (-0.56 to -0.10) | -0.20 (-0.56 to 0.10) | 0.09 |
HAQ | Better | 89 | 1.21 | 0.75 | 0 to 2.50 | -0.18 | 0.46 | -0.24 (-0.38 to -0.11) | -0.39 (-0.69 to -0.30) | -0.51 (-0.84 to -0.23) | 0.0001 | 0.71

TABLE 7.7: DIFFERENCES AND RESPONSIVENESS STATISTICS FROM BASELINE TO 6 MONTHS STRATIFYING THE CATEGORIES CREATED FROM PATIENT GLOBAL ASSESSMENT OF DISEASE SEVERITY VAS

Instrument | Group | N | Baseline Mean | SD | Range | Mean Change | SD | Effect Size (95% CI) | SRM (95% CI) | CSRM (95% CI) | Paired t-test | RE
HUI3 | Overall | 222 | 0.52 | 0.29 | -0.16 to 1.00 | 0.03 | 0.22 | | | | |
HUI3 | Worse | 39 | 0.47 | 0.28 | -0.16 to 0.85 | -0.10 | 0.21 | -0.36 (-0.04 to -0.65) | -0.46 (-0.07 to -0.88) | -0.53 (-0.11 to -0.82) | 0.001 | 0.78
HUI3 | Same | 118 | 0.63 | 0.25 | -0.07 to 0.97 | 0.01 | 0.19 | 0.05 (-0.04 to 0.24) | 0.07 (-0.06 to 0.31) | 0.07 (-0.06 to 0.31) | 0.45 |
HUI3 | Better | 65 | 0.37 | 0.28 | -0.15 to 1.00 | 0.17 | 0.23 | 0.60 (0.28 to 0.72) | 0.73 (0.29 to 0.80) | 0.89 (0.32 to 1.02) | <0.0001 | 0.74
HUI2 | Overall | 222 | 0.71 | 0.19 | 0.13 to 1.00 | 0.02 | 0.16 | | | | |
HUI2 | Worse | 39 | 0.69 | 0.19 | 0.19 to 0.94 | -0.06 | 0.14 | -0.33 (-0.63 to -0.07) | -0.44 (-0.10 to -0.80) | -0.46 (-0.13 to -0.89) | 0.01 | 0.82
HUI2 | Same | 118 | 0.77 | 0.17 | 0.18 to 1.00 | 0.02 | 0.14 | 0.10 (-0.08 to 0.18) | 0.13 (-0.12 to 0.24) | 0.13 (-0.12 to 0.24) | 0.17 |
HUI2 | Better | 65 | 0.63 | 0.18 | 0.24 to 0.97 | 0.09 | 0.17 | 0.49 (0.39 to 0.83) | 0.52 (0.48 to 1.01) | 0.64 (0.56 to 1.25) | 0.0001 | 1.02
EQ-5D | Overall | 222 | 0.65 | 0.24 | -0.03 to 1.00 | -0.001 | 0.20 | | | | |
EQ-5D | Worse | 39 | 0.65 | 0.21 | 0.08 to 1.00 | -0.12 | 0.19 | -0.55 (-0.16 to -0.52) | -0.63 (-0.19 to -0.85) | -0.69 (-0.70 to -0.68) | 0.0004 | 1.14
EQ-5D | Same | 118 | 0.73 | 0.19 | 0.08 to 1.00 | -0.02 | 0.17 | -0.09 (-0.17 to 0.01) | -0.10 (-0.34 to 0.01) | -0.10 (-0.12 to -0.10) | 0.30 |
EQ-5D | Better | 65 | 0.53 | 0.21 | 0.02 to 0.88 | 0.09 | 0.22 | 0.36 (0.16 to 0.52) | 0.43 (0.29 to 0.73) | 0.56 (0.55 to 0.57) | 0.0012 | 0.61
SF-6D | Overall | 222 | 0.62 | 0.13 | 0.31 to 1.00 | 0.03 | 0.09 | | | | |
SF-6D | Worse | 39 | 0.60 | 0.11 | 0.37 to 0.95 | -0.03 | 0.07 | -0.24 (-0.02 to -0.49) | -0.35 (-0.04 to -0.87) | -0.30 (-0.03 to -0.64) | 0.04 | 0.62
SF-6D | Same | 118 | 0.66 | 0.13 | 0.37 to 1.00 | 0.02 | 0.09 | 0.18 (0.05 to 0.31) | 0.26 (0.09 to 0.45) | 0.26 (0.09 to 0.45) | 0.006 |
SF-6D | Better | 65 | 0.57 | 0.11 | 0.31 to 0.89 | 0.06 | 0.09 | 0.54 (0.32 to 0.79) | 0.62 (0.41 to 0.85) | 0.68 (0.39 to 1.03) | <0.0001 | 0.90
RAQoL | Overall | 222 | 13.1 | 8.1 | 0 to 30 | -1.26 | 5.14 | | | | |
RAQoL | Worse | 39 | 13.2 | 8.2 | 1 to 29 | 2.72 | 4.87 | 0.33 (0.21 to 0.83) | 0.56 (0.20 to 0.67) | 0.71 (0.30 to 0.97) | 0.002 | 1.00
RAQoL | Same | 118 | 10.4 | 7.3 | 0 to 29 | -1.03 | 3.85 | -0.14 (-0.08 to 0.28) | -0.27 (-0.08 to -0.29) | -0.27 (-0.08 to -0.29) | 0.005 |
RAQoL | Better | 65 | 17.1 | 7.7 | 0 to 30 | -4.29 | 6.20 | -0.56 (-0.18 to -0.75) | -0.69 (-0.27 to -1.08) | -1.11 (-0.23 to -1.59) | <0.0001 | 1.00
HAQ | Overall | 222 | 1.15 | 0.77 | 0 to 2.63 | -0.06 | 0.45 | | | | |
HAQ | Worse | 39 | 1.06 | 0.71 | 0 to 2.63 | 0.25 | 0.46 | 0.34 (0.11 to 0.44) | 0.50 (0.28 to 0.88) | 0.63 (0.26 to 0.92) | 0.002 | 0.97
HAQ | Same | 118 | 0.97 | 0.77 | 0 to 2.63 | -0.07 | 0.39 | -0.08 (-0.06 to -0.25) | -0.17 (-0.12 to -0.46) | -0.17 (-0.12 to -0.46) | 0.08 |
HAQ | Better | 65 | 1.53 | 0.63 | 0 to 2.50 | -0.22 | 0.43 | -0.35 (-0.32 to -0.76) | -0.50 (-0.48 to -0.92) | -0.57 (-0.44 to -1.24) | 0.0002 | 0.72

TABLE 7.8: RANKINGS OF RESPONSIVENESS OF MEASURES ACCORDING TO THE RESPONSIVENESS STATISTIC AND THE EXTERNAL CRITERIA OF CHANGE (EITHER RESPONSES TO THE PATIENT TRANSITION QUESTION OR TO THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY VAS)

Columns 2 to 7: Patient Transition Question. Columns 8 to 13: Patient Global Assessment of Disease Activity VAS Categories.

Measure | ES | SRM | CSRM | Paired t-test | RE | Median Rank | ES | SRM | CSRM | Paired t-test | RE | Median Rank
Worse
HUI2 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 5 | 4 | 4 | 4
HUI3 | 5 | 6 | 4 | 6 | 6 | 6 | 2 | 4 | 4 | 2 | 5 | 5
EQ-5D | 3 | 3 | 3 | 3 | 3 | 3 | 1 | 1 | 2 | 1 | 1 | 1
SF-6D | 6 | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 6 | 5 | 6 | 6
RAQoL | 2 | 1 | 2 | 2 | 2 | 2 | 4 | 2 | 1 | 3 | 2 | 2
HAQ | 1 | 2 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 3 | 3 | 3
Better
HUI2 | 3 | 2 | 5 | 2 | 3 | 3 | 4 | 4 | 4 | 4 | 1 | 4
HUI3 | 5 | 5 | 4 | 5 | 5 | 5 | 1 | 1 | 2 | 1 | 5 | 1
EQ-5D | 6 | 6 | 6 | 6 | 6 | 6 | 5 | 6 | 6 | 6 | 6 | 6
SF-6D | 2 | 4 | 3 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3
RAQoL | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 2 | 2 | 2
HAQ | 4 | 3 | 2 | 3 | 3 | 3 | 6 | 5 | 5 | 5 | 5 | 5

ES = Effect Size; SRM = Standardized Response Mean; CSRM = Control Standardized Response Mean; RE = Relative efficiency; Ties in rankings are possible.
TABLE 7.9: ASSOCIATIONS BETWEEN INSTRUMENT UNWEIGHTED DOMAINS / SINGLE ATTRIBUTE SCORE CHANGES AND SELF-REPORTED CHANGE FROM 0 TO 6 MONTHS

Instrument | Changes in Domain/SA | P-value: Patient Transition Question | P-value: Patient Global Assessment of Disease Severity VAS Categories
EQ-5D | Usual activities | 0.41 | 0.22
EQ-5D | Pain/discomfort | 0.03 | 0.003
EQ-5D | Mobility | 0.19 | 0.01
EQ-5D | Anxiety/Depression | 0.05 | 0.002
EQ-5D | Self-care | 0.003 | <0.001
SF-6D | Physical functioning | 0.003 | <0.001
SF-6D | Role limitations | 0.001 | 0.008
SF-6D | Social functioning | <0.001 | 0.002
SF-6D | Pain | 0.003 | <0.001
SF-6D | Mental Health | 0.08 | 0.19
SF-6D | Vitality | 0.08 | 0.01
HUI3 | Vision | 0.92 | 0.07
HUI3 | Hearing | 0.77 | 0.50
HUI3 | Speech | 0.84 | 0.97
HUI3 | Ambulation | 0.01 | <0.001
HUI3 | Dexterity | 0.41 | <0.001
HUI3 | Emotion | 0.003 | 0.001
HUI3 | Cognition | 0.63 | 0.58
HUI3 | Pain | 0.007 | <0.001
HUI2 | Sensation | 0.99 | 0.05
HUI2 | Mobility | 0.02 | <0.001
HUI2 | Emotion | 0.004 | 0.004
HUI2 | Cognition | 0.36 | 0.99
HUI2 | Self-care | 0.009 | 0.23
HUI2 | Pain | 0.003 | <0.0001

FIGURE 7.1: AGREEMENT BETWEEN THE PATIENT TRANSITION QUESTION AND CHANGES USING MID CUTOFFS FOR THE GENERIC AND DISEASE-SPECIFIC INSTRUMENTS
[Bar charts not reproduced. Each panel shows the frequency of transition question responses (worse, same, better) within categories of change between 6 months and baseline defined by the MID cutoff.]
HUI2 (cutoff 0.04): Weighted Kappa = 0.24 (95% CI 0.13 - 0.35)
HUI3 (cutoff 0.07): Weighted Kappa = 0.17 (95% CI 0.06 - 0.28)
EQ-5D (cutoff 0.05): Weighted Kappa = 0.19 (0.09 - 0.28)
SF-6D (cutoff 0.033): Weighted Kappa = 0.15 (95% CI 0.04 - 0.25)
RAQoL (cutoff 2.00): Weighted Kappa = 0.28 (95% CI 0.17 - 0.38)
HAQ (cutoff 0.250): Weighted Kappa = 0.23 (95% CI 0.12 - 0.33)

FIGURE 7.2: SCATTERPLOT OF HUI2 UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.3: SCATTERPLOT OF HUI3 UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.4: SCATTERPLOT OF EQ-5D UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.5: SCATTERPLOT OF SF-6D UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
[Scatterplots not reproduced. Horizontal axis: Time (Days). In Figures 7.2 to 7.5, the lines represent a least squares regression fit to characterize the trend for change over time by category.]

FIGURE 7.6: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI2 AND THE TRANSITION QUESTION
FIGURE 7.7: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI3 AND THE TRANSITION QUESTION
FIGURE 7.8: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE EQ-5D AND THE TRANSITION QUESTION
FIGURE 7.9: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE SF-6D AND THE TRANSITION QUESTION
FIGURE 7.10: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE RAQOL AND THE TRANSITION QUESTION
FIGURE 7.11: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HAQ AND THE TRANSITION QUESTION
FIGURE 7.12: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI2 AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.13: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI3 AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.14: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE EQ-5D AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.15: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE SF-6D AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.16: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE RAQOL AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.17: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HAQ AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
[Plots not reproduced. In Figures 7.6 to 7.17, the horizontal axis is the change in the instrument score from 0 to 6 months (6 month score minus baseline score).]

CHAPTER 8 GENERAL DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS

8.1 SUMMARY OF STUDY FINDINGS

The results of this study provided significant insights into the limits of interchangeability of indirect utility assessment instruments in the assessment of HRQL in patients with rheumatoid arthritis (RA). In addition, it provided evidence that although each of these instruments displayed cross-sectional construct validity in RA, they yielded quite different results and hence could lead to different policy decisions when utilized in an economic evaluation for RA interventions. As well, annual household income appeared to impact on the scores achieved with these instruments, which is inconsistent with the assumptions underlying utility measurements.
Finally, for longitudinal construct validity, as expected and consistent with previous research findings,[1] the disease-specific measure (the RAQoL) performed the best. However, the HUI3 and SF-6D appeared to be suitable for measuring RA improvement, whereas the EQ-5D was the most responsive of the preference-based measures in terms of detecting worsening.

In terms of the interchangeability of the scores from the instruments, it was found that, on a cross-sectional basis, mean scores were significantly different. In addition, when agreement was analyzed, the scores from the instruments had low to moderate agreement, and agreement tended to be better at the higher end of the utility scores (towards perfect health) and much worse at the lower end (as scores approached states judged to be near or worse than death). In comparison to the other instruments, it appeared that the EQ-5D tended to suffer from limitations previously described in the literature - namely gaps in its distribution, especially at the mid-point of its scale.[2,3] These differences are likely due to the various domains, levels of choice within each domain, scoring algorithms, and the valuation techniques that are intrinsic to each system. However, despite the differences in scores, in our RA sample, each instrument appeared to be measuring mostly physical functioning and pain. In addition, the HUI2 and HUI3 systems were also assessing cognition while the EQ-5D and SF-6D were measuring emotional/mental health.

In terms of construct validity in RA, the preference-based instruments all appeared to perform as hypothesized and almost as well as the RAQoL and the HAQ. All of the scores from the indirect utility assessment instruments were lower in naturally occurring dichotomous groups (such as use of equipment for RA in the past twelve months - \"yes\" or \"no\") thought to have higher RA severity.
In addition, there was an observed gradient of scores in the hypothesized direction across self-reported disease severity and control, with moderate to strong correlations. Of note, although the measures of association tended to be slightly higher for the RAQoL and the HAQ, the preference-based measures performed almost as well in their ability to discriminate between groups of known RA severity or symptoms. The overall scores and relevant attributes (for example, mobility, self-care and pain within the HUI2) within each of the instruments performed as expected, with moderate to large, significant correlations with RA clinical variables. In addition, weak associations were found between attributes not thought to be directly affected by RA (such as sensation, vision, and hearing) and RA clinical variables. Minimally important differences in the instrument scores were found to be somewhat higher than had been hypothesized or observed in other studies of the HUI systems,[4,5] and approximately the same for the SF-6D,[6] EQ-5D and the HAQ.[7]

Other investigators have noted that differences observed in the mean and individual scores across the instruments might result in radically different estimations of QALYs and, thus, outcomes of economic evaluations.[8-10] When the different utility scores were applied to a decision-analytic, Markov-based, cost-effectiveness analysis of infliximab plus MTX versus MTX alone, results presented in Chapter 5 substantiated these findings and hypotheses. For example, between the lowest (using HUI3-derived QALYs) and highest (using SF-6D-derived QALYs) of the incremental cost-effectiveness ratios generated through the use of scores from the HUI2, HUI3, EQ-5D and SF-6D as QALY weights, there was a difference of over 100% ($33,092 per QALY, 95% CI $30,887 to $35,436 for the HUI3 vs. $67,005 per QALY, 95% CI $62,773 to $71,540 for the SF-6D).
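The size of this gap can be checked with simple arithmetic. The sketch below is illustrative only: it uses the two point estimates quoted above, ignores the confidence intervals, and assumes a ceiling ratio of $50,000 per QALY as the funding threshold. The function names are hypothetical and are not part of the original analysis.

```python
# Illustrative arithmetic only. The two ICERs are the point estimates quoted
# in the text; the ceiling ratio is an assumed funding threshold.

def relative_difference(low, high):
    '''Return how much larger `high` is than `low`, as a fraction of `low`.'''
    return (high - low) / low


def is_fundable(icer, ceiling):
    '''True when the ICER does not exceed the ceiling ratio.'''
    return icer <= ceiling


icers = {'HUI3': 33092.0, 'SF-6D': 67005.0}  # dollars per QALY
ceiling = 50000.0  # assumed ceiling ratio, dollars per QALY

rel = relative_difference(icers['HUI3'], icers['SF-6D'])
print(f'SF-6D ICER exceeds HUI3 ICER by {rel:.1%}')

for name, icer in icers.items():
    verdict = 'at or below' if is_fundable(icer, ceiling) else 'above'
    print(f'{name}: {verdict} the assumed ceiling')
```

Under these assumptions the two instruments fall on opposite sides of the ceiling, which is the sense in which the choice of instrument alone could change a funding decision.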
Considering that the ceiling ratio commonly cited for policy-makers is $50,000 per QALY,[11] different funding decisions could be made depending on which indirect utility assessment instrument was utilized.

The impact of the different utility instruments on cost-effectiveness analysis is not limited to their differences in scores. As Sculpher and O'Brien wrote, scores on preference-weighted measures, if truly reflecting society's preferences for health states, should not be influenced by income.[12] However, there is evidence that scores on other generic, non-preference-based measures vary by income and other measures of socioeconomic status in RA. Results presented in Chapter 6 clearly show that prior to adjustment for RA severity, each of the scores from any of the HRQL instruments (the RAQoL, HAQ, and the indirect utility instruments) had a significant gradient within categories of income (Table 6.5). After adjustment for RA severity, which could be inherently biasing these findings (sicker people have more severe RA and are poorer), and number of people in the household, there still was a clear gradient across income for all of the indirect utility assessment instrument scores (however, only the HUI3 and SF-6D scores were still significantly associated with income in our sample). Of note, the HAQ also demonstrated this relationship whereas the RAQoL did not.

Other crucial properties of any HRQL instrument are reliability and longitudinal construct validity (which may also be referred to as responsiveness).[14-16] Little work has been done in this regard on any of the indirect utility assessment instruments (both within and outside of RA) and for the RAQoL.
In addition, although patient transition questions (where patients have rated changes in their health retrospectively over a specified time period as \"better\", \"worse\" or \"the same\") have been utilized as an external criterion of change in responsiveness studies,[6,9] there is little information about the reliability and validity of this technique in RA.

As documented in Tables 7.2 and 7.3, we found that all of the indirect assessment instruments were reliable, with the exception of the EQ-5D. This finding is both in agreement[17] and in contrast[18] with previous studies conducted in RA. Consistently, using various methodologies and external criteria for change, the RAQoL was determined to be the most responsive of the tested instruments. The indirect utility instruments were poorly to moderately responsive, with the SF-6D and the HUI3 being the most responsive for RA improvement and the EQ-5D being the most responsive for RA worsening. The HUI2 was the least responsive to changes in RA, and the performance of the HAQ was dependent on the external criterion of change that was applied. In terms of using the patient transition question as the external criterion of change, it was found to be moderately correlated with other RA outcome measures but poorly associated with changes in the HRQL measures. Using categories of the patient global assessment of disease severity question appeared to yield much more valid results.

8.2 UNIQUE CONTRIBUTIONS, IMPACT, AND IMPLICATIONS

The manuscripts comprising this dissertation contribute significantly to the current body of literature both methodologically and through the findings as they relate to disease-specific HRQL and generic measures that often are utilized as QALY weightings in the cost-utility framework.
[19] Thus, the findings in this study have significant implications both in the assessment of HRQL and in the application of the indirect utility assessments in economic evaluations of interventions for RA.

In terms of the feasibility of these instruments in RA, all of the instruments tested appear to have been accepted by the RA patients and were almost entirely successfully completed. In addition, they all appeared to be appropriate measures for RA in that they were able to discriminate between levels of disease severity and were strongly correlated with RA clinical measures. Therefore, in terms of these properties for the generic measures, there was little evidence that one should be chosen over the other to assess HRQL in RA. As a disease-specific measure, the RAQoL was slightly better than the generic measures in terms of construct validity and thus should be a preferred measure for the assessment of RA-specific HRQL.

In terms of the application of their scores as QALY weights to be used in economic evaluation, we found a large impact on the incremental cost-effectiveness ratio by utilizing the different indirect utility assessment instrument scores. Considering the lack of advice on QALY weighting standardization either within the rheumatology research community or from economic evaluation guidelines,[21] these results are extremely useful in illustrating the magnitude of this problem. As such, these findings should guide either the standardization of QALY weighting measures within economic evaluations for RA or make it implicit that sensitivity analyses that cover the wide range of utility values that are possible with these instruments are performed. In addition, these results will be useful for informing economic evaluations occurring outside of RA and rheumatological conditions as this issue is likely also relevant in different disease states.
Thus, an effort should be made for QALY weighting standardization both within and across disease states, especially if comparisons are to be made between economic evaluations while allocating limited health care resources.

The finding that annual household income was associated with scores on the preference-based measures was problematic as utility scores should be neutral with respect to income. As such, this research provides empiric evidence for the concern raised by Sculpher and O'Brien[12] and will serve as an impetus for future work in this area to address this limitation of the current instruments or prevent double-counting in economic evaluations. These results also show that generic quality of life, despite RA severity, was lower in those who are poorer, which was similar to other investigators' findings.[13] Since self-reported health (including HRQL) is strongly and independently associated with morbidity and mortality, there should be new focuses for interventions in RA that do not focus solely on clinical or severity measures. Another benefit of this research was the discovery that population-based, census-level or post-code-based variables (neighbourhood median income, percentage with baccalaureate degrees and neighbourhood unemployment) are poorly correlated with RA patient self-reported socioeconomic data. As such, this research can inform other studies conducted in this area and prevent erroneous results based upon incorrect assumptions.

With respect to responsiveness, our results showed that not only does the RAQoL have cross-sectional construct validity but it also was reliable and responsive to change. Thus, the RAQoL appeared to be a feasible, reliable, valid, and responsive instrument - all properties necessary for an excellent tool.[14] There were no reasons why this new measure should not be integrated as an outcome measure in the assessment of RA in both clinical trials and in observational studies.
With respect to the indirect utility assessment instruments, the responsiveness analysis provided guidance about which measures should be adopted in certain situations. For example, in the application of a known, effective intervention, the SF-6D or HUI3 would be suitable choices considering their cross-sectional and longitudinal construct validity (for the latter, their ability to detect change is superior in measuring improvement in RA rather than worsening). The EQ-5D may be better suited when a preference-based measure is needed to detect worsening. The HUI2 appeared to be relatively unresponsive in patients with RA.

8.3 STUDY STRENGTHS AND LIMITATIONS

8.3.1 Strengths

This study has a number of specific strengths that enhance the credibility of the results. First of all, it was conducted in a relatively large sample of patients in a homogeneous, well-delineated disease group, RA. This fact was important as there should not be differential response between the instruments if some domains were more valid or responsive within different diseases. In addition, although recruitment occurred within one province, due to the cosmopolitan nature of British Columbia, many of the participants came from racially diverse backgrounds, thus increasing the applicability of our results.

Another advantage of our methodology was the inclusion and comparison of the four most popular indirect utility assessment instruments (the HUI2, HUI3, EQ-5D and SF-6D) as well as a disease-specific measure (RAQoL) and a well-established disability measure (HAQ). Thus, we were able to perform between-instrument comparisons within the indirect utility assessment classification and also compare their individual and collective performances relative to disease-specific measures. These additional data are improvements over many of the comparative studies, which have only examined two or three of the instruments, leaving many questions unanswered (see Chapter 2).
Due to our prospective sampling and questionnaire methodology, we were able to directly ask questions to the RA patients that could not otherwise have been assessed through administrative data or retrospective chart reviews. As such, we could assess construct validity based upon other variables (such as outlined in Table 4.6) rather than limiting our analysis to commonly collected clinical data. In addition, we could ascertain many sociodemographic variables including annual household income directly from respondents rather than relying on census-level data, which, as we have shown, could have led to inaccurate results (Chapter 6).

For the cost-effectiveness analysis of infliximab (Chapter 5), we were fortunate to have access to some very large and comprehensive datasets that increased the validity of the results. For example, we modeled all-cause mortality based upon more than 1900 consecutive RA patients followed since 1974 at the Wichita (Kansas) Arthritis Center, an outpatient rheumatology clinic.[23,24] We also had access to the dataset which formed the basis of the only comprehensive Canadian RA cost-of-illness study that has been published to date.[25] Resource utilization and cost data were collected during a longitudinal study of 1063 Canadian RA patients who reported semi-annually on their health services utilization over the preceding 6 months during 1983 and 1994.[25] These direct costs of RA care were comprehensive and comprised long-term care, rehabilitation, nursing homes, health professional visits, medications, diagnostic tests, acute hospitalization, emergency department visits, ambulance services, dialysis and outpatient surgeries. Finally, to estimate indirect costs due to productivity losses, we had access to data from a prospective longitudinal sample of 120 employed RA patients recruited in Ontario, Canada from September 1999 to December 2001.
In the self-report questionnaire, participants were asked the number of days missed due to RA in the past 6 months and their regular weekly working hours.

We were also able to divide the HAQ into smaller discrete health states for the state-transition model than have been previously used by investigators utilizing HAQ-based Markov models. The problem with dividing the HAQ scores into four to six discrete health states is that much smaller changes in this instrument have been shown to be predictive of large changes in resource utilization,[29] work productivity[30] and mortality.[24] Thus, in these models, the loss of transitions between smaller changes in HAQ resulted in a less accurate assessment of costs and outcomes. As such, the fact that we were able to create a 25-state transition model (based on an underlying continuous process), with costs and outcomes associated with each of the discrete states, allowed for more accurate modeling of the progression of the disease and prediction of economic outcomes.

In the reliability and responsiveness analyses, we were able to assess these properties using more than one method. For reliability assessment, we used two versions of test-retest methodology, which produced very similar intraclass correlation coefficients, adding to the consistency of our results. In addition, since it has been demonstrated that the various responsiveness statistics often give different results,[31,32] we used multiple methods to assess this property and examined the consistency of the results generated by these analyses. Besides using commonly recommended responsiveness statistics (ES, SRM, CSRM, paired t-tests, and RE),[15] we also utilized a new graphical method developed by Abrahamowicz and colleagues.
[33] This new method is potentially advantageous for the clinical translation of responsiveness as clinicians can examine the probability of true improvement (as assessed by the external criterion) based on a given change in the HRQL score. Therefore, our recommendations for responsiveness were based upon multiple criteria and are likely considerably more robust than if only one methodology was utilized. The rankings across the different statistics were, for the most part, in agreement and thus strengthened our conclusions. In addition, we also utilized two external criteria for change - the patient transition question and a categorization of the patient global assessment of disease severity visual analogue scale. Therefore, we examined consistency across results achieved with both external criteria in our assessment of responsiveness.

8.3.2 Limitations

As with any study, this one was not without its limitations, none of which significantly affected the findings. In each of the stand-alone manuscripts, limitations have been stated but, nonetheless, they merit review.

With respect to the questionnaire process, there were a few potential threats to the internal validity of the study. For example, since this was a natural history study without a known, effective intervention or a control group, people's HRQL was assumed to change (either positively or negatively) or stay the same over time due to their RA. However, this assumption was only true if all extraneous variables that might affect health status could be held constant over time. Of course, at the individual level, events may have occurred that influenced their HRQL that were independent of their RA (i.e. development or worsening of co-morbidities, personal problems). However, on a group level, it would be unlikely for these events to influence the changes in health status captured by the HRQL instruments.
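As a brief aside on the responsiveness statistics named in Section 8.3.1, the sketch below shows how two of them, the effect size (ES) and the standardized response mean (SRM), are commonly computed. It is a minimal illustration under stated assumptions, not the analysis code used in this study: the scores are hypothetical, the function names are invented for the example, and the sample (n - 1) standard deviation is assumed.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    '''ES: mean change divided by the standard deviation of baseline scores.'''
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    '''SRM: mean change divided by the standard deviation of the changes.'''
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical utility scores for five patients (not study data).
baseline = [0.50, 0.60, 0.40, 0.70, 0.55]
follow_up = [0.58, 0.64, 0.52, 0.72, 0.59]

print(round(effect_size(baseline, follow_up), 2))
print(round(standardized_response_mean(baseline, follow_up), 2))
```

Because the two statistics use different denominators, they can rank instruments differently, which is one reason for reporting several statistics side by side.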
It was possible that the methods used to administer the questionnaire may have influenced the results to some extent. The ordering of the HRQL instruments was not randomized because of the self-completed nature of the study and due to the observation, during piloting, that respondents were completing the questionnaires in different orders rather than how they were presented. Also, because respondents were permitted to take the questionnaire home with them (or were mailed the questionnaire at home) and were instructed to mail it, once completed, back to the investigator, it was possible that some of the respondents answered the questions over several days. Thus, it was conceivable that some patients filled out one HRQL measure on a given day when their health status was different from when they completed other HRQL measures. For instruments such as the EQ-5D with an immediate recall period, this could result in potentially biased results during cross-sectional comparisons with the other instruments.

Another area of concern with longitudinal studies is that the experience of people lost to follow-up may be systematically different than those who decide to stay in the study. To alleviate some concern, baseline summary scores and demographic and clinical characteristics of those who were retained for the duration of the study were compared to dropouts (Table 7.1). Patient characteristics and HRQL scores between the groups were generally very similar. The few statistically significant differences between the two groups tended to favour the group that completed only the baseline questionnaires as they were found to have significantly better mean baseline HAQ scores, lower self-reported RA severity, and a higher percentage were working outside the home. The retention rate for those who completed both the baseline and 6-month assessments was 239 out of the 320 (75%).
This is a reasonable retention rate for any longitudinal study and is evidence of the considerable efforts expended by the project team to track and follow up the patients.

A problem with any study assessing the responsiveness of HRQL instruments has to do with the definitions of this property that exist in the literature.[15,16] For the most part, consensus is lacking on what constitutes a responsive measure, and how responsiveness should be quantified. In addition, attempts to further refine and define this property of HRQL measures have led to the creation of new and, sometimes conflicting, definitions.[15,16,37]

Also, with repeated administration of the questionnaires over time, there was the possibility that respondents may have undergone what is deemed \"response shift\".[34] This process involves a change in the calibration or self-valuation of a particular health state over time. This finding (the adaptation to long-term conditions leading to bias in self-assessments of HRQL) has recently been illustrated.[35,36] Another competing theory has been deemed \"implicit theory of change\". In this theory, patients who respond to a questionnaire regarding health where there is retrospective recall (i.e. \"Overall, how would you describe changes in your rheumatoid arthritis since answering the FIRST questionnaire? (i.e. about six months ago)\") reconstruct the previous state such that the prospective judgment is more valid. For example, patients begin with their perceived present health state and then infer, based upon impression rather than a conscious comparison between two time points, what their health state must have been. Either of these conditions would have been a threat to validity in our study. These theories are especially applicable when one considers that one of our external criteria, the patient transition question, was based upon recall of a health state from baseline.
Because no 'gold standard' exists for establishing change in HRQL in RA, two external anchor-based criteria were used to evaluate responsiveness. One of these was the patient transition question, a 5-point Likert scale evaluating how the respondent's RA had changed since the last questionnaire. However, there has been some criticism about this approach as recall of a previous state in patients with a chronic disease is subject to recall bias (implicit change theory) or could be influenced by response shift, as described above.[38] However, in defense of this criterion of change, since this question was asked only at the follow-up and not at the baseline, error associated with its measurement will only occur once and would not be compounded with baseline measurement error. The other anchor-based criterion was a categorization of the patient global assessment of disease activity VAS. The cutoff values applied in this study were derived from the definitions of positive change associated with effective treatment as defined by the American College of Rheumatology criteria,[39] but they may not be the optimal cutoff points for change in this natural history RA sample. In addition, the definition for \"worse\" associated with this criterion (> 20% reduction in the VAS where \"0\" is anchored at \"Very Poor\" and \"1\" is anchored at \"Very Well\") has not been widely used.

Comparisons of difference scores for patients whose RA was defined as \"worse\", \"same\" or \"better\" across the two anchors assisted in distinguishing between problems that were associated with the external anchors of change (i.e. misclassification error). The direction and magnitude of mean difference scores of patients using the different external anchors of change provided insights into the appropriateness of these external anchors.
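One simple way to inspect an external anchor along these lines is to average the change scores within each anchor category and compare each mean with a minimally important difference (MID). The sketch below is illustrative only: the data and the MID value are hypothetical, the function names are invented for the example, and it assumes an instrument on which higher scores mean better health (as with the utility measures, but not the HAQ or RAQoL).

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_category(changes, categories):
    '''Average the change scores within each anchor category.'''
    groups = defaultdict(list)
    for change, category in zip(changes, categories):
        groups[category].append(change)
    return {category: mean(values) for category, values in groups.items()}


def anchor_behaves_as_expected(means, mid):
    '''Check the mean change in each category against the MID.'''
    return (
        means['better'] >= mid          # improvement reaches the MID
        and means['worse'] <= -mid      # worsening reaches the MID
        and abs(means['same']) < mid    # stable group stays inside the MID
    )


# Hypothetical change scores (6 months minus baseline) and anchor categories.
changes = [0.10, 0.08, 0.01, -0.02, -0.09, -0.07]
categories = ['better', 'better', 'same', 'same', 'worse', 'worse']
mid = 0.05  # hypothetical minimally important difference

means = mean_change_by_category(changes, categories)
print(means)
print(anchor_behaves_as_expected(means, mid))
```

An anchor whose stable group shows a mean change as large as the MID would fail this check, which is the kind of pattern that suggests misclassification by the anchor.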
The categorization of the patient global assessment of disease activity criterion demonstrated all 3 of the following characteristics that were intuitively desirable (using literature values for the MID)[40]: (1) positive mean change scores that equal or surpass the MID for a summary score based on patients classified as \"better\"; (2) negative mean change scores that equal or surpass the MID for a summary score in patients classified as \"worse\"; (3) change scores, either positive or negative, that were less than the MID for patients classified as \"same\" (Table 7.7). For the patient transition question, these characteristics did not apply, and there were some instances where the group classified as \"same\" had a greater mean difference score than the groups classified as either \"worse\" or \"better\" (Table 7.6). For example, changes in the \"same\" group for the HUI2 and HUI3 were of equal magnitude (although in the opposite direction) to those classified as \"worse\". For the SF-6D, those classified as \"same\" had changes equal and in the same direction as those classified as \"better\". This finding was especially problematic as this change was significant (p<0.0001). The RAQoL also had a significant amount of change (p=0.004) in the same direction (although not of equal magnitude) as those classified as better. For these and other reasons, the patient global assessment of disease activity categories was deemed to be a superior external criterion.

The choice of time periods over which to evaluate responsiveness was based upon the investigators' estimates of the likelihood of change over a given time period. As such, to allow for the greatest opportunity for potential change in RA, responsiveness was evaluated between baseline and 6 months. However, further examination of responsiveness could occur between other time periods such as between baseline and 3 months, or even 3 and 6 months.
While these analyses might capture smaller meaningful change, we believe that such a short time frame (3 months) might not be adequate for most patients to experience true changes in their disease in a natural history study of RA. Finally, although we conducted analyses to determine which of the domains/single attributes of the indirect utility assessment instruments were associated with changes as defined by both external criteria, we did not specifically evaluate the responsiveness of each of these domains/attributes.

In Chapter 5, as in all decision-analytic studies, there were some limitations and important assumptions that could influence the interpretation of the results. First of all, as the transition probability matrices were estimated from a randomized, controlled trial, the results of the analysis really reflected cost-efficacy and not cost-effectiveness. Wolfe et al. make this important distinction in a recent editorial, pointing out that outcomes achieved with drug therapy in RA clinical trials are often quite different than observed in clinical practice.[41] Secondly, we have assumed that the impact on HAQ of infliximab would impart a survival benefit similar to that observed with MTX. While this assumption is likely reasonable, it will be many years before this finding can be substantiated in an observational study. Also, although we estimated direct costs from a large Canadian sample followed longitudinally for a number of years, there are some limitations with these data. They were drawn from the province of Saskatchewan, which has few rheumatologists, and thus the costs of care may be conservative. In addition, these costs come from an era when other new drug therapies (such as the COX-2 specific inhibitors) and other drug strategies (such as combination DMARD therapy) were not yet available or utilized, thus further underestimating the costs likely to be incurred today.
From the productivity database, in order to integrate the costs into the HAQ-based model, we had to find a relationship between scores on the HAQ and ability to work. Unfortunately, the entire HAQ questionnaire was not collected in this study and only certain HAQ questions and elements from the Multidimensional Functional Assessment Questionnaire were available. From these, we constructed a disability index that served as a proxy for the HAQ. On further testing in the prospectively collected sample, we found that there was high correlation between this proxy score and the actual HAQ. Furthermore, we simulated 100,000 individual patients' results (50,000 for each treatment strategy), which may have reduced the confidence limits around the cost-effectiveness ratio through a reduction in variability. Therefore, with fewer simulated patients, the confidence intervals may have been wider, with a larger degree of overlap between the results achieved by the different indirect utility assessment instruments. Finally, we used a ceiling ratio of $50,000 per QALY as the threshold for what decision-makers might consider to be a fundable program. However, despite widespread use of this ceiling value in the literature, there is evidence that this value might be too low.[42]

With respect to the paper outlining the effect of income on the scores achieved with the HRQL instruments (Chapter 6), there were a few limitations. Annual income data were collected in increments of $10,000 (starting with the first category of <$20,000) up to a maximum of $100,000 (the last category was >$100,000) and then further categorized to <$20,000, $20,000 to $50,000, and >$50,000. Some people declined to answer this question, resulting in 19% (n=58) missing values for this variable. It is possible that the income (and other characteristics) of these individuals was systematically different than those who answered the question, resulting in a bias.
However, when we compared all other available characteristics between these two groups, no differences were detected, thus giving some confidence that no bias was introduced. Collapsing a variable, as was done with income, can sometimes lead to a loss of information. However, when measuring socioeconomic status, rather than continuous income, definitions utilized in other studies have often been based on similar categories. Income, as an indicator of SES, is almost always classified depending on the number of subjects. For example, in large analyses (such as the Black report and the National Population Health Survey analyses in Canada),[43,44] income has been categorized. This categorization aids in interpretability as, instead of finding a linear relationship between continuous income and HRQL score, categorizing income allowed us to classify people as being of low, medium or high income and to calculate the gradient in HRQL scores associated with this classification. Also, with our sample's size, it was helpful to classify, as the classification decreased the noise. There is much greater noise from misclassification of a person when using $10K increments (e.g., a person can report their income as $40K/yr when, in fact, it is closer to $30K/yr). By classifying this way, one is more likely to classify an individual into the correct group even if there is some error in the reporting. This may not be the case if we were utilizing a more objective source of data (e.g., income tax reports) but is most likely with self-report information due to recall bias. In the relative income hypothesis,[45] health is related to one's income (or social status) relative to others in their population, not to their absolute income. This hypothesis facilitates the division of the sample into categories (such as low/intermediate/high) of social status based upon income. Population-based studies (with very large sample sizes) inherently use quintiles or deciles.
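The collapsing of income described above can be sketched in a few lines. This is only an illustration under stated assumptions: each reported $10,000 bracket is represented by its lower bound (0 for the <$20,000 bracket), the function name collapse_income is hypothetical, and the labels low, medium and high stand for <$20,000, $20,000 to $50,000, and >$50,000.

```python
def collapse_income(bracket_floor):
    '''Map the lower bound of a reported income bracket to a coarse category.

    Assumption: brackets are identified by their lower bound in dollars,
    and None marks a respondent who declined to answer.
    '''
    if bracket_floor is None:
        return None  # missing value, kept as missing
    if bracket_floor < 20_000:
        return 'low'
    if bracket_floor < 50_000:
        return 'medium'
    return 'high'


# A respondent who reports $40K/yr when the true figure is nearer $30K/yr
# still lands in the same coarse category.
reported = collapse_income(40_000)
actual = collapse_income(30_000)
print(reported, actual)
```

With the coarse categories, a reporting error of one bracket changes the classification only when it crosses the $20,000 or $50,000 boundary.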
Since this study was not population-based and is based upon a relatively small sample (when compared to the population), we were restricted to categorizing our data into larger categories from which one can draw statistical inference. Finally, these categories of income have been applied successfully in another sample.[46]
For many of the analyses in our study, we have utilized parametric statistics. However, there is some evidence that the scores obtained on HRQL instruments can be interpreted as ordinal. There are numerous publications that support the analysis of these types of data using conventional, parametric statistical techniques provided that some criteria are met. In particular, a recent publication demonstrated that, when an ordinal health-related quality of life scale has more than seven categories, it can be treated as continuous provided that most of the categories are occupied.[47] Also, where the distribution is spread over a number of categories, it can be assumed that the data were generated from a continuous distribution. The indirect utility assessment instruments have more than 100 possible scores, the HAQ has 25 categories and the RAQoL has 30. In our sample, the distributions of the scores of the instruments are spread across many of the \"categories\", and it appears that the underlying distribution is continuous. These findings are well known in the psychometric literature and were described almost two decades ago by Bollen et al.[48] and Johnson and Creech.[49]
8.4 RECOMMENDATIONS
The results from this study give rise to some important recommendations. First of all, since application of the different indirect utility assessment instruments in an economic evaluation results in incremental cost-effectiveness ratios that differ by as much as 100%, guidelines should be developed for use of the scores from these instruments as weightings for QALYs.
Without standardization of the application of these instruments, it will be difficult to compare the results of economic evaluations both within and across disease groups. Thus, at the very least, sensitivity analysis should be employed which covers the full spectrum of QALYs that could be achieved with the different indirect utility assessment instruments. Without these changes, the limitation of not being able to compare across studies directly challenges one of the advantages to using the incremental cost per QALY approach, namely, the ability to compare across studies based on the same outcome metric. However, having stated the above recommendation, within RA and likely other disease states, it is difficult to compare among the instruments based upon their properties as HRQL scales. For example, all the indirect utility assessment instruments displayed construct validity in RA and likely are suitable as generic HRQL measures. However, in terms of reliability, responsiveness and ability to measure positive changes in RA, the HUI3 and the SF-6D appeared to be superior to the EQ-5D. The finding that scores of the indirect utility assessment instruments vary by income reveals the possibility that bias could be introduced into economic evaluations in a number of ways. In clinical trials where an economic evaluation is to be conducted alongside, care should be taken to ensure that annual household income is balanced between the two or more arms that are being compared.
As an external criterion for change, the transition question that we adopted did not appear to be the best anchor-based approach to utilize. Categories based upon the patient global assessment of change VAS appeared to perform better and thus should be utilized in future studies when an external criterion is desired. Finally, the disease-specific RAQoL displayed excellent properties as an HRQL instrument.
Not only was it feasible, well received by patients, and valid (as measured using the known-groups approach to construct validity), it was also highly responsive, even in a natural history study of RA. As such, investigators should consider using this scale as an outcome in clinical trials and observational studies in RA.
8.4.1 Further Research
Although we have recommended that there be a standardized approach to the application of scores from these indirect utility assessment instruments in economic evaluations, a lot of effort will be required to implement this task. Research should be performed in various disease states comparing the properties of these instruments to determine which performs the best. After this research is conducted, a systematic approach will need to be adopted to determine which of the instruments, when applied over several disease states, yields the most valid responses.
Further comparisons of preference-based measures within RA need to be performed. For example, responses obtained with the indirect utility measures, which reflect society's preferences, should be compared with measures such as the standard gamble that can reflect patient preferences. Comparisons of how differences in scores between the two approaches influence economic evaluation should be performed.
Economic evaluations utilizing measured changes over time in scores obtained with the indirect utility assessment instruments should be performed. We are in the process of applying this methodology to a cost-effectiveness analysis of a new pharmacotherapeutic intervention for RA.
Methods to remove the impact of income from the scores achieved with the indirect utility instruments should be investigated. In addition, with economic evaluations already performed using these instruments, it would be useful to know if, and to what extent, double-counting has occurred and if bias has been introduced.
The RAQoL should be compared to other RA-specific HRQL measures such as the AIMS systems to determine its relative performance. Also, further examination of the responsiveness of each of the domains/attributes of the indirect utility assessment instruments should be performed to further evaluate the scales' relative merits in the assessment of RA.
Finally, further research is required to investigate the mechanisms of reduced HRQL in RA secondary to lower income despite adjustment for disease severity and other chronic diseases. Considering the association between self-reported health and morbidity and mortality, this finding could result in further improvements in health beyond those offered from effective drug therapies.
8.5 CONCLUSIONS
The choice of indirect utility assessment instruments in estimating QALYs in economic evaluations for RA appears to have been somewhat arbitrary and there has been little guidance in selecting which instrument should be used. However, due to their ease and low cost of administration, there has been an increased use of scores from indirect utility assessment instruments as QALY weightings despite the lack of comparative data on their strengths and weaknesses. As with other diseases, these observations are relevant to RA where, due to a recent explosion in the availability of effective, expensive drug therapies, economic evaluations are becoming increasingly popular. This study provides good evidence that the scores achieved by these instruments are not interchangeable, and when used to estimate QALYs in economic evaluation, could result in very different information for policy decision-makers. Through an exploration of their properties, it can be concluded that all the indirect utility assessment instruments are feasible, valid and, for the most part, reliable (with the exception of the EQ-5D).
However, there are important differences between the instruments in terms of which domains of health they assess, the scores that they achieve and their relative responsiveness to changes in RA. The HUI3 and the SF-6D, based on their properties, appear to be the instruments with the most desirable properties to be utilized in RA. However, when their scores were applied to estimate QALYs in an economic evaluation for a pharmacotherapeutic intervention for RA, the outcomes (in terms of incremental cost per QALY) using the SF-6D were more than twofold that of the HUI3 (with those achieved with the EQ-5D and the HUI2 being in between).
Other important findings of this study include that annual household income is positively associated with scores achieved with the indirect utility assessment instruments. The fact that income influences the scores achieved by all the indirect utility assessment instruments (but, in our sample, only significantly for the HUI3 and the SF-6D) implies that bias might be introduced into economic evaluations that utilized these scores as QALY weightings.
Another finding of this study was that the RAQoL had excellent psychometric properties, thus making it a useful outcomes measurement tool for both clinical trials and observational studies in RA. In addition, in the assessment of responsiveness, we found that the use of a patient transition question as an anchor-based, external criterion for change in RA was not as useful as using categorizations of the patient global assessment of disease activity VAS. Finally, the finding that generic HRQL is lower in patients with lower income makes future research in determining the mechanism behind this observation imperative as self-reported health is associated with morbidity and mortality.
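The sensitivity of the incremental cost per QALY to the choice of quality weights, noted above, can be illustrated with a short sketch. Everything numeric here except the $50,000 per QALY ceiling ratio is a hypothetical input chosen for illustration and is not a result of this study; the instrument labels and the function name icer are likewise assumptions.

```python
CEILING = 50_000  # ceiling ratio per QALY discussed in this chapter


def icer(delta_cost, delta_qaly):
    '''Incremental cost per QALY gained (incremental cost / incremental QALYs).'''
    if delta_qaly <= 0:
        raise ValueError('QALY gain must be positive for a meaningful ratio')
    return delta_cost / delta_qaly


# Hypothetical inputs for illustration only; these are not study results.
delta_cost = 20_000
qaly_gain_by_instrument = {'instrument_A': 0.50, 'instrument_B': 0.25}

results = {name: icer(delta_cost, gain)
           for name, gain in qaly_gain_by_instrument.items()}

for name, ratio in results.items():
    verdict = 'below' if ratio <= CEILING else 'above'
    print(f'{name}: {ratio:,.0f} per QALY ({verdict} the ceiling)')
```

In this made-up case the same incremental cost falls on either side of the ceiling depending only on which instrument supplied the weights, which is the kind of divergence a sensitivity analysis across instruments is meant to expose.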
In conclusion, although all indirect utility assessment measures appear to demonstrate the minimal requirements to assess generic HRQL in RA, when used as quality weights in the calculation of QALYs in an economic evaluation, they give vastly different results that could result in different policy recommendations. The scores of these instruments could also be biased by income. The RAQoL displayed excellent properties and is a suitable disease-specific HRQL instrument for RA.
8.6 REFERENCES
1. Wiebe S, Guyatt G, Weaver B, Matijevic S, Sidwell C. Comparative responsiveness of generic and specific quality-of-life instruments. J Clin Epidemiol 2003;56:52-60.
2. Wolfe F, Hawley DJ. Measurement of the quality of life in rheumatic disorders using the EuroQol. Br J Rheumatol 1997;25:675-682.
3. Longworth L, Bryan S. An empirical comparison of EQ-5D and SF-6D in liver transplant patients. Health Econ 2003;12:1061-1067.
4. Horsman J, Furlong W, Feeny D, Torrance G. The Health Utilities Index (HUI®): concepts, measurement properties and applications. Health and Quality of Life Outcomes 2003;1:54 (available from http://hqlo.com/content/1/1/54).
5. Grootendorst P, Feeny D, Furlong W. Health Utilities Index Mark 3: evidence of construct validity for stroke and arthritis in a population health survey. Med Care 2000;38:290-299.
6. Walters SJ, Brazier JE. What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health and Quality of Life Outcomes 2003;1:4 (available at http://www.hqlo.com/content/1/1/4).
7. Redelmeier DA, Lorig K. Assessing the clinical importance of symptomatic improvements - an illustration in rheumatology. Arch Intern Med 1993;153:1337-1342.
8. Oostenbrink R, Moll HA, Essink-bot ML. The EQ-5D and the Health Utilities Index for permanent sequelae after meningitis. A head to toe comparison. J Clin Epidemiol 2002;55:791-799.
9.
Conner-Spady B, Suarez-Almazor ME. Variation in the estimation of quality adjusted life-years by different preference-based instrument. Med Care 2003;41:791-801.
10. Neumann PJ, Sandberg EA, Araki SS, Kuntz KM, Feeny D, Weinstein MC. A comparison of HUI2 and HUI3 utility scores in Alzheimer's disease. Med Decis Making 2000;20:413-422.
11. Goeree R, O'Brien BJ, Blackhouse G, Marshall J, Briggs A, Lad R. Cost-effectiveness and cost-utility of long-term management strategies for heartburn. Value in Health 2002;5:312-324.
12. Sculpher M, O'Brien BJ. Income effects of reduced health and health effects of reduced income. Med Decis Making 2000;20:207-215.
13. Brekke M, Hjortdahl P, Thelle DS, Kvien TK. Disease activity and severity in patients with rheumatoid arthritis: relations to socio-economic equality. Soc Sci Med 1999;48:1743-1750.
14. Dolan P. The measurement of health-related quality of life for use in resource allocation decisions in health care. Chapter 32. In: Handbook of Health Economics, Vol. 1. Edited by Culyer AJ, Newhouse JP. London, U.K. Elsevier Science 2000.
15. Terwee CB, Dekker FW, Wiersinga, Prummel MF, Bossuyt PMM. On assessing the responsiveness of health-related quality of life instruments: Guidelines for instrument evaluation. Qual Life Res 2003;12:349-362.
16. Liang MH, Lew RA, Stucki G, Fortin PR, Daltroy L. Measuring clinically important changes with patient-oriented questionnaires. Med Care 2002;40(suppl):II-45 - II-51.
17. Russell AS, Conner-Spady B, Mintz A, Mallon C, Maksymyowych WP. The responsiveness of generic health status measures as assessed in patients with rheumatoid arthritis receiving infliximab. J Rheumatol 2003;30:941-947.
18. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: Validity, responsiveness and reliability of EuroQol (EQ-5D). Br J Rheumatol 1997;36:551-559.
19.
Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
20. Maetzel A, Tugwell P, Boers M, Guillemin F, Coyle D, Drummond M, Wong JB, Gabriel SE on behalf of the OMERACT 6 Economics Research Group. Economic evaluation of programs or interventions in the management of rheumatoid arthritis: Defining a consensus based reference case. J Rheumatol 2003;30:891-896.
21. Canadian Coordinating Office for Health Technology Assessment: Guidelines for Economic Evaluation of Pharmaceuticals, Canada. Ottawa: The Canadian Coordinating Office for Health Technology Assessment (CCOHTA) 2nd edition, 1997.
22. Idler EL, Benyamini Y. Self-rated health and mortality: A review of twenty-seven community studies. J Health Social Behaviour 1997;38:21-37.
23. Choi HK, Hernan MA, Seeger JD, Robins JM, Wolfe F. MTX therapy and mortality in patients with rheumatoid arthritis: a prospective study. Lancet 2002;359:1173-1177.
24. Wolfe F, Michaud K, Gefeller O, Choi HK. Predicting mortality in patients with rheumatoid arthritis. Arthritis Rheum 2003;48:1530-1542.
25. Clarke AE, Zowall H, Levinton C, Assimakopoulos H, Sibley JT, et al. Direct and indirect medical costs incurred by Canadian patients with rheumatoid arthritis: A 12 year study. J Rheumatology 1997;24:1051-1060.
26. Anis AH, Sun HY, Gignac M. The indirect costs of illness disability in an incident cohort of arthritis patients. Arthritis Rheum 2002;46:s91.
27. Wong JB, Singh G, Kavanaugh A. Estimating the cost-effectiveness of 54 weeks of infliximab for rheumatoid arthritis. Am J Med 2002;113:400-408.
28. Kobelt G, Jonsson L, Young A, Eberhardt K. The cost-effectiveness of infliximab (Remicaide) in the treatment of rheumatoid arthritis in Sweden and the United Kingdom based on the ATTRACT study. Rheumatology 2003;42:326-335.
29.
Lajas C, Abasolo L, Bellajdel B, Hernandez-Garcia C, Carmona L, Vargas E, Lazaro P, Jover JA. Costs and predictors of costs in rheumatoid arthritis: a prevalence-based study. Arthritis Rheum 2003;15;49:64-70.
30. Ethgen O, Kahler KH, Kong SX, Reginster JY, Wolfe F. The effect of health related quality of life on reported use of health care resources in patients with osteoarthritis and rheumatoid arthritis: a longitudinal analysis. J Rheumatol 2002;29:1147-1155.
31. Blanchard C, Feeny D, Mahon JL, Bourne R, Rorabeck C, et al. Is the Health Utilities Index responsive in total hip arthroplasty patients? J Clin Epidemiol 2003;56:1046-1054.
32. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol 1997;50:239-246.
33. Fortin PR, Abrahomowicz, Clarke AE, Neville C, Du Berger R, et al. Do lupus disease activity measures detect clinically important changes? J Rheumatol 2000;27:1421-1428.
34. Brossart DF, Clay DL, Willson VL. Methodological and statistical considerations for threats to internal validity in pediatric outcome data: response shift in self-report outcomes. J Pediatr Psychol 2002;27:97-107.
35. Groot W. Adaptation and scale of reference bias in self-assessments of quality of life. J Health Econ 2000;19:403-420.
36. Daltroy LH, Larson MG, Eaton HM, Phillips CB, Liang MH. Discrepancies between self-reported and observed physical function in the elderly: the influence of response shift and other factors. Soc Sci Med 1999;48:1549-61.
37. Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol 2000;53:459-468.
38. Norman G. Hi! How are you today? Response shift, implicit theories and differing epistemologies. Qual Life Res 2003;12:239-49.
39. Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS, et al.
The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315-324.
40. Pickard AS. Responsiveness of generic health status measures in stroke. Doctor of Philosophy thesis. University of Alberta. 2002.
41. Wolfe F, Michaud K, Pincus T. Do rheumatology cost-effectiveness analyses make sense? Rheumatology 2004;43(1):4-6.
42. Ubel PA. What is the price of life and why doesn't it increase at the rate of inflation? Ann Intern Med 2003;163:1637-41.
43. Gray AM. Inequalities in health. The Black Report: a summary and comment. Int J Health Serv 1982;12(3):349-80.
44. Statistics Canada. National Population Health Survey, 1996-1997, Public use microdata files. 1998; Ottawa: Statistics Canada. 82-M0009XCB.
45. Wilkonson RG. Income distribution and life expectancy. BMJ 1992;304:165-8.
46. Lynd LD, Pare PD, Bai T, Fitzgerald JM, Anis AH. A cross-sectional evaluation of the relationship between socioeconomic status and the magnitude of short-acting beta-agonist use in asthma. Chest (accepted December 2003).
47. Walters SJ, Campbell MJ, Lall R. Design and analysis of trial with quality of life as an outcome: a practical guide. J Biopharm Stat 2001;11:155-176.
48. Bollen KA, Barb KH. Pearson's r and coarsely categorized data. Am Soc Rev 1981;46:232-9.
49. Johnson DR, Creech JC. Ordinal measures in multiple indicator models: A simulation study of categorization error. Am Soc Rev 1983;48:398-407.
APPENDIX I
THE COST OF COX INHIBITORS: HOW SELECTIVE SHOULD WE BE?
Marra CA, Esdaile JM, Sun H, Anis AH. J Rheumatol 2000;27:2731-3
Canada's national health insurance scheme covers the cost of outpatient physician visits and acute hospital care for all Canadians. Prescription drug costs are not universally covered and the extent of coverage varies by province.
However, at the very least, those over 65 years and the financially indigent are covered in every province and some also provide universal coverage after an initial deductible/co-payment. It has been estimated that almost three-quarters of Canadians have some form of coverage (either public, private or both) for prescription medications.[1] Thus, it is of concern that total expenditures on prescription drugs have increased more rapidly than any other health care component. In 1999, they accounted for over 15% of total health care expenditures.[2] The significance of this is underscored by the fact that in 1993 the total expenditures on prescription drugs exceeded the total expenditures on physicians for the first time.[3] Over the past decade, this increase in overall drug expenditures can be attributed to both an increase in the number of drugs being consumed and the higher average price of individual drugs. Since 1995, Canadians have spent on average 9% more each year on drugs. In 1999, they spent 11.4% more than they had in 1998.[1]
For rheumatology, these trends in Canadian drug utilization and expenditures are even more pronounced than in other fields. In 1999, prescription drugs used in the treatment of arthritic diseases accounted for approximately 3% ($231 million) of the $8.3 billion (approximately $270 per person) spent on prescription drugs.[4] Although this figure is low compared to cardiovascular medications (15% of expenditures on prescription drugs or $1.2 billion), it represents one of the largest year over year increases in prescription drug purchases by therapeutic class. When compared to 1998, expenditures on arthritis drugs in 1999 increased by 27%. In addition, 4% (11,500) of all prescriptions in 1999 were for arthritis treatment, a 15% increase over the number in 1998.
These numbers are similar to those in the US where antiarthritics accounted for 3% ($3.7 billion) of the $112 billion prescription drug market (approximately $400 per person) and 4% (108,500) of all prescriptions.[4] However, despite the similarity in these percentages, the increases in prescription costs and numbers of prescriptions in the US (1999 vs 1998) were higher: 77% and 19%, respectively.
In part, this large increase in Canadian drug expenditures was due to the introduction of newly patented medications including the COX-2 selective inhibitors, celecoxib (Celebrex®) and rofecoxib (Vioxx®). Celecoxib was introduced to the Canadian market in April 1999 whereas rofecoxib was available only in October 1999. Consequently, due to its earlier availability, celecoxib had the larger impact on the Canadian prescription drug market during 1999. Specifically, purchases of celecoxib were almost $70 million compared to less than $3 million for rofecoxib.[4] In addition, celecoxib was the 48th most commonly prescribed drug in Canada (1.8 million prescriptions in 1999) and the 14th highest in prescription drug expenditures.[4] These drugs generated considerable advertising. For instance, in 1999, celecoxib was the 4th most widely promoted drug as measured by estimated advertising expenditures ($4 million), advertisement pages in professional journals (336 pages), health professional detailing (55,000), and samples distributed (47,000).
Of note, our data reveal that the COX-2 selective inhibitors have resulted in significant changes in the patterns and costs of both osteoarthritis and rheumatoid arthritis pharmacotherapy in Canada (Figures 1 and 2). For rheumatoid arthritis, there was an average of approximately 570 thousand prescriptions per year for NSAIDs from 1995 to 1998.
For 1999 and 2000 (forecasted data), we have estimated that there will be approximately 505 thousand prescriptions for both selective (260 thousand) and non-selective (245 thousand) COX inhibitors. While the number of prescriptions may have decreased, the annual cost of these therapies has risen from an average of $13 million to approximately $21 million. Interestingly, the impact of the COX-2 selective inhibitors is even more pronounced in osteoarthritis. From 1995-1998, on average, there were about 1.7 million annual prescriptions for NSAIDs resulting in annual expenditures of $44 million. After the introduction of COX-2 selective agents, these figures increased to almost 1.9 million prescriptions and $78 million in expenditures. Given that only the provinces of Alberta, Manitoba and Quebec were providing full reimbursement for at least one of the prescribed COX-2 selective agents during 1999 (other provinces providing partial reimbursement or restricted status coverage included Ontario, New Brunswick, Newfoundland, Nova Scotia, and Saskatchewan, while only British Columbia and Prince Edward Island currently do not provide coverage for either of these agents), these figures represent a dramatic shift in prescribing habits. Clearly, physicians have been convinced of the added benefit of the agents and Canadian patients, usually reluctant to pay for a drug when an alternative that is covered by the health care system exists, have 'bought into' this belief.
Authors of recent editorials have attempted to place the new specific COX-2 inhibitors into clinical perspective.[5-7] The International Cox-2 Study Group outlined the specific instances where these authors believe that the COX-2 specific inhibitors should be used in favour of classic COX nonspecific NSAIDs.
[5] These recommendations, for the most part, were based on expert opinion derived from epidemiological studies that evaluated the risks of serious gastrointestinal complications in certain patient groups. Since these recommendations are fairly restrictive, it is unlikely that they are being followed precisely, given the rapid increase in the use of COX-2 specific inhibitors seen in our data.
The COX-2 specific inhibitors possess efficacy equivalent to that of COX nonspecific NSAIDs in both RA [8] and OA.[9] From the available evidence, it has been estimated that the number of patients that one would need to treat with a COX-2 specific inhibitor (instead of COX nonspecific NSAIDs) to avoid a serious gastrointestinal event and death is approximately 130 and 1300, respectively.[6] Another editorial estimated the yearly incremental cost (in 1999 US dollars) of preventing one complicated ulcer with these new agents to be approximately $30,000.[7] However, it remains to be determined if this reduction in ulcer-related hospitalizations, morbidity, and mortality leads to a reduction in overall treatment costs with the COX-2 specific inhibitors over traditional agents.
The data suggest that COX-2 specific inhibitors are replacing the traditional NSAIDs. Of particular interest is the rapid increase in the utilization of COX-2 specific inhibitors in OA. Due to the higher risk-benefit ratio associated with using NSAIDs, the first line management of osteoarthritis has historically been acetaminophen.[10] However, NSAIDs have been shown to be moderately more effective than acetaminophen in osteoarthritis [11] and these patients are less willing than rheumatoid arthritis patients to accept the risk of gastrointestinal toxicity associated with nonspecific COX inhibitors.
[12] Clinicians may feel that patients prescribed COX-2 specific inhibitors will realize the additional benefit of an NSAID over acetaminophen, without the increase in risk of toxicity.
In the past, treatment for arthritis has been inexpensive relative to other disease groups. A sufficiently large number of NSAIDs existed in generic versions to allow considerable choice, and most disease suppressive medications are inexpensive. This is changing. The prospect of COX-2 selective antiinflammatory drugs being used for more than 50% of arthritis consumers, along with new second-line drugs such as leflunomide (Arava®), infliximab (Remicade®) and etanercept (Enbrel®) being used to treat rheumatoid arthritis, juvenile rheumatoid arthritis, spondyloarthritis and perhaps other inflammatory types of arthritis, is rapidly modifying the prescribing scene. Even so, arthritis overall will remain relatively inexpensive to treat given the considerable burden of disease in the community and the consequences to the individual and to society of not treating these diseases with effective medications. Nonetheless, the cost-effectiveness of the new therapies has to be unequivocally established using rigorous scientific methods.[13] Only then will large third party-payers such as health maintenance organizations, pharmacy benefit managers, and private insurers in the US and provincial drug plans in Canada be convinced that they represent \"good value\" for scarce health care dollars.
REFERENCES
1. Canadian Institute for Health Information. Health care in Canada 2000: A first annual report. Ottawa: The Institute; 2000.
2. Canadian Institute for Health Information. National health expenditure trends, 1975-1999. Ottawa: The Institute; 1999.
3. Anis AH. Pharmaceutical policies in Canada: Another example of provincial federal discord. CMAJ 2000;162:523-526.
4. IMS Health. Canadian pharmaceutical industry review, 1999.
Pointe-Claire: IMS Health; 1999.
5. Lipsky PE, Abramson SB, Breedveld FC, Brook P, Burmester R, et al. Analysis of the effect of COX-2 specific inhibitors and recommendations for their use in clinical practice. J Rheumatol 2000;27:1338-1340.
6. Freemantle N. Cost-effectiveness of non-steroidal anti-inflammatory drugs (NSAIDs) - what makes a NSAID good value for money. Rheumatol 2000;39:232-234.
7. Peterson WL, Cryer B. COX-1 sparing NSAIDs - Is the enthusiasm justified? JAMA 1999;282:1961-1963.
8. Simon LS, Weaver AL, Graham DY, Kivitz AJ, Lipsky PE, et al. Anti-inflammatory and upper gastrointestinal effects of celecoxib in rheumatoid arthritis: a randomized controlled trial. JAMA 1999;282:1921-1928.
9. Day R, Morrison B, Luza A, Castaneda O, Strusberg A, et al. A randomized trial of the efficacy and tolerability of the COX-2 inhibitor rofecoxib vs ibuprofen in patients with osteoarthritis. Rofecoxib/Ibuprofen Comparator Study Group. Arch Intern Med 2000 26;160:1781-1787.
10. Hochberg MC, Altman RD, Brandt KD, et al. Guidelines for the medical management of osteoarthritis. Part 1. Osteoarthritis of the hip. Arthritis Rheum 1995;38:1535-1540.
11. Eccles M, Freemantle N, Mason J. North of England evidence based guideline development project: summary guidelines for non-steroidal anti-inflammatory drugs versus basic analgesia in treating the pain of degenerative arthritis. BMJ 1998;317:526-530.
12. Bagge E, Traub M, Crotty M, et al. Are rheumatoid arthritis patients more willing to accept non-steroidal drug treatment risks than osteoarthritis patients? Br J Rheumatol 1997;36:470-472.
13. Canadian Coordinating Office for Health Technology Assessment. Guidelines for economic evaluations of pharmaceuticals: Canada. 2nd ed. Ottawa: CCOHTA; 1997.
FIGURE 1: EXPENDITURES ON PRESCRIPTION DRUGS BY THERAPEUTIC CLASS FOR OA AND RA COMBINED, IN CANADA OVER TIME
[Figure not reproduced. X-axis: Year, 1995 to 2000*.]
*Costs: 2000 figures based on an annualized projection from 1st quarter data (January to March 2000).

FIGURE 2: NUMBER OF PRESCRIPTIONS BY THERAPEUTIC DRUG CLASS FOR RA AND OA COMBINED, IN CANADA OVER TIME
[Figure not reproduced. X-axis: Year, 1995 to 2000*.]
*Costs: 2000 figures based on an annualized projection from 1st quarter data (January to March 2000).

APPENDIX II

PRACTICAL PHARMACOGENETICS: THE COST-EFFECTIVENESS OF SCREENING FOR THIOPURINE S-METHYLTRANSFERASE (TPMT) POLYMORPHISMS IN PATIENTS WITH RHEUMATOLOGICAL CONDITIONS TREATED WITH AZATHIOPRINE (AZA)

Marra CA, Esdaile JM, Anis AH. J Rheumatol 2002;36:1851-5

INTRODUCTION

The study of pharmacogenetics may help to explain the considerable variation in individuals' response to drug therapy.[1] With the recent ability to determine, prior to the initiation of therapy, genetic polymorphisms that are associated with the efficacy or toxicity of particular drugs, the potential exists to tailor drug regimens for individuals. Therefore, the adoption of these new diagnostic techniques could lead to an enhancement of the beneficial effects and a reduction of the adverse effects of therapeutic agents. However, as with any new, expensive technology, it is imperative to systematically assess the cost-effectiveness of this new approach in order to determine if scarce health care dollars should be allocated to its routine integration into the health care system. Thus, as the potential arises for genetic testing to be utilized to predict the outcomes associated with drugs used for rheumatological conditions, systematic evaluations must occur to assess their relative economic merit.

Azathioprine (AZA) interferes with the de novo synthesis of inosinic acid via feedback inhibition of 6-thioinosinic acid.
AZA also inhibits the interconversion of purine bases such as inosine to adenine and guanine ribonucleotides.[2,3,4] AZA, as with all thioguanines, is metabolized to the active metabolites referred to collectively as 6-thioguanine nucleotides (6-TGN), which, in turn, are inactivated by thiopurine S-methyltransferase (TPMT). The active metabolites, 6-TGN, accumulate in tissues and are thought to be responsible for many of AZA's serious toxicities.[2,3,4]

TPMT is a cytosolic enzyme that preferentially catalyzes the inactivation (S-methylation) of the active metabolites of the thiopurines (6-mercaptopurine, AZA and thioguanine).[5] TPMT shows codominant genetic polymorphism, with about 85-90% of people exhibiting high TPMT activity and approximately 10-15% of individuals having intermediate TPMT activity caused by heterozygosity at the TPMT locus.[5,6] About 1 in 300 individuals inherits TPMT deficiency as an autosomal recessive trait. Unfortunately, no phenotype allows for the detection of either TPMT deficiency or intermediate activity; thus these conditions, when identified, are usually only detected after a severe adverse reaction to a thiopurine medication.[5]

Case reports, case series and case-control studies suggest that a reduction in TPMT activity is associated with the development of toxicity to thiopurine medications.[3,5,7-11] Due to these findings, some investigators have advocated the prospective measurement of TPMT activity.[5,8-11] Unfortunately, TPMT assays are not clinically available, are influenced by several exogenous factors (e.g., recent red blood cell transfusion, diuretic therapy, alcoholism) and are expensive, thus precluding their use.[5] Yates CR et al. developed and validated polymerase chain reaction (PCR)-based methods for detection of TPMT mutations in the genomic DNA of humans.
[5] Specifically, they established the genetic basis for TPMT polymorphisms and discovered that, at the genetic level, the three groups of TPMT activity can be classified as homozygous wild-type (high activity), heterozygous mutants (intermediate activity) and homozygous mutants (deficiency). This PCR test for TPMT activity yielded the following sensitivities and specificities: 1) high activity, 100% and 100%; 2) intermediate activity, 95% and 100%; 3) deficiency, 100% and 100%. In addition, since PCR is relatively inexpensive, widely available, rapid, not affected by exogenous factors, and widely used clinically in other areas, it is a useful diagnostic test for the determination of TPMT activity.

Decision analysis has been widely employed as a tool to systematically assist technology assessment when there is uncertainty about clinical or economic outcomes.[12,13] This technique has been used extensively to model both the outcomes and cost-effectiveness of diagnostic testing in a variety of disease states.[14,15,16] However, to date, there has been no published analysis examining the cost-effectiveness of screening strategies based on enzyme polymorphisms.

The objective of this study was to model the cost-effectiveness of two alternate AZA treatment strategies in two common autoimmune rheumatologic conditions (rheumatoid arthritis [RA] and systemic lupus erythematosus [SLE]). The two strategies consist of: 1) PCR testing prior to initiation of AZA, resulting in dosage reduction in cases of reduced TPMT activity and deficiency; and 2) no testing, with usual therapy.
METHODS

Literature Review

To determine the incidence of AZA-induced adverse events (primarily agranulocytosis, leukopenia and pancytopenia), a systematic search of the English language literature using the computerized MEDLINE, EMBASE, Pre-MEDLINE and Cochrane Systematic Review databases from 1966 to September 2000 and a manual search of references from retrieved articles were performed. Search terms included AZA, TPMT, polymorphisms, rheumatoid arthritis, rheumatology, systemic lupus erythematosus, leukopenia, or pancytopenia.

A recent case report outlined the occurrence of severe pancytopenia in a 14-year-old girl with juvenile spondylarthritis on 3 mg/kg/day of AZA.[11] Further laboratory evaluation identified a homozygous deficiency of TPMT which led to a buildup of the AZA metabolite 6-thioguanine. After identification and withdrawal of the drug, the patient gradually improved over an eight-week course.

A prospective case-control study measured the TPMT enzyme activity in patients via HPLC methods. Cases were defined as longstanding RA patients on AZA (n=33) while the two control groups were patients with early RA (n=24) and healthy volunteers (n=42). The authors found that 14 out of the 33 (42%) AZA-treated patients developed severe toxicity and had to be withdrawn from therapy. Of these, 7 (50%) had intermediate TPMT activity and the toxicity exhibited was gastrointestinal intolerance or hematological cytopenia. The relative risk of developing severe toxicity in patients with intermediate TPMT activity compared to those with high TPMT activity was 3.1 (95% CI 1.6-6.2). The authors also postulated that intermediate TPMT activity was associated with both hematological and gastrointestinal toxicities.

A recent study has cast some doubt on whether the determination of patients' TPMT activity assessed via genotyping has any predictive value in determining the adverse reactions to AZA.[17]
The investigators correlated the adverse hematological toxicities with the TPMT genotype in 120 SLE patients attending their rheumatology clinic. The authors did not find a relationship between TPMT genotype and AZA-induced toxicity except for a single homozygote-deficient patient. However, this study has several limitations. Although 120 patients were tested, only 78 had ever received AZA. Of the seven patients found to have TPMT polymorphisms, only four had received AZA. Of these, three were heterozygotes who received very small doses of AZA (0.4 and 0.6 mg/kg) and thus would not be expected to develop toxicity. In addition, it is unclear how the authors ascertained adverse reactions to AZA, as no definitions or methods of collection (prospective or retrospective) are supplied.

In a recent Cochrane Systematic Review of the use of AZA in the treatment of rheumatoid arthritis, the efficacy and toxicity of AZA were evaluated.[18] Specifically, in comparison to patients who received placebo, those who received AZA were found to be 4.6 times more likely to withdraw from therapy due to adverse effects over a six-month treatment period. The most common adverse reactions were found to be gastrointestinal symptoms (15%), mucocutaneous reactions (26%) and hematological disturbances (9%). In addition, the incidence of leukopenia in these trials was 21%, but most cases resolved by lowering the dose of AZA.

Finally, Black et al.[19] evaluated the incidence of polymorphic inactivation of AZA by thiopurine methyltransferase and the development of clinical toxicity in a cohort of 67 patients from two rheumatology units who were prescribed AZA. Polymerase chain reaction-based assays were used to detect mutations in TPMT. Six of the 67 patients (9%) were heterozygous for mutant TPMT alleles. Of these, five patients discontinued therapy within one month of starting treatment due to leukopenia (< 3.5 × 10^9/L).
On further questioning, the remaining patient admitted to having not taken any AZA.

Decision Analytic Model

A predictive, decision analytic model was created using Data for Health Care by TreeAge Software (version 3.5, Williamstown, Mass.) to estimate the effects and costs of each of the strategies (Figure 1). Specifically, the strategies evaluated were: 1) a PCR test screening for TPMT reductions or deficiencies, followed by a reduction in dose based on the results of the test (as per the methods described by Yates et al.,[5] the PCR assays isolate and characterize two mutant alleles that are associated with TPMT deficiency, TPMT*2 and TPMT*3A); or 2) usual full-dose therapy with AZA. Because TPMT reductions or deficiencies result in an accumulation of the active metabolites, modified dosing could be implemented in order to avoid toxicity without compromising therapeutic effect. The dosing guidelines by TPMT genotype are as follows: 1) homozygous wild type (normal TPMT activity): target dose of 2.0-2.5 mg/kg/day; 2) heterozygous (reduced TPMT activity): target dose of 1.0 mg/kg/day; and 3) homozygous mutant (deficient TPMT activity): target dose of 0.25 mg/kg/day. The time horizon for the development of adverse events in the model was 6 months, so discounting of costs and effects was not required.[20] All costs were in 1999 Canadian dollars (1 Canadian dollar = 0.67 United States dollar).

Target Population

For the purposes of this analysis, only costs and outcomes of patients with rheumatological conditions (mainly RA and SLE) were included in the model.

Probabilities and Values Used in the Model

All probabilities and values that were used both in the base case and sensitivity analyses are detailed in Table 1.
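The genotype-based dosing guidelines listed in the model description above can be written as a small lookup. This is only a sketch: the genotype labels and the function name are hypothetical, and the per-kilogram values are the target doses quoted in the text.

```python
# Target AZA dose ranges in mg/kg/day, as quoted in the model description.
# The dictionary keys are hypothetical labels chosen for this sketch.
TARGET_DOSE_MG_PER_KG = {
    'homozygous_wild_type': (2.0, 2.5),   # normal TPMT activity
    'heterozygous': (1.0, 1.0),           # reduced TPMT activity
    'homozygous_mutant': (0.25, 0.25),    # deficient TPMT activity
}


def target_daily_dose_mg(genotype, weight_kg):
    # Return the (low, high) target daily dose in mg for a given body weight.
    if weight_kg <= 0:
        raise ValueError('weight_kg must be positive')
    low, high = TARGET_DOSE_MG_PER_KG[genotype]
    return (low * weight_kg, high * weight_kg)


# Example: target ranges for a hypothetical 60 kg patient of each genotype.
for genotype in TARGET_DOSE_MG_PER_KG:
    print(genotype, target_daily_dose_mg(genotype, 60))
```

Keeping the guideline in one table makes the dose reduction applied in the testing arm explicit and easy to vary in a sensitivity analysis.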
To estimate the incidence of hematological disturbances (agranulocytosis and/or leukopenia) that would be preventable through a prophylactic AZA dosage reduction in susceptible individuals, the results of the literature review previously described were evaluated. From the Cochrane Systematic Review of AZA in RA, it appears that approximately 9% of the RA patients experienced hematological toxicities in the pooled results of three published six-month randomized clinical trials, and up to 21% experienced leukopenia.[18] This figure is in agreement with the study by Black et al.[19] Gastrointestinal disturbance was not included in the model owing to the lack of a consistent correlation between its development and TPMT activity,[3,18] the idiosyncratic nature of gastrointestinal disturbances (i.e., in clinical trials, even placebo arms have high incidences of gastrointestinal events), and the lack of significant medical attention necessary to resolve most cases. From the results of the case-control study,[3] it also appears that the prospective evaluation of TPMT activity would not be sufficient to eliminate all cases of hematological toxicity. Thus, for the base case analysis, we assumed that 50% of the cases of hematological toxicity could be eliminated by the PCR-testing strategy through the identification of susceptible individuals and the implementation of a prophylactic dosage reduction.

To determine the incidence of true and false positive and true and false negative PCR test results, we utilized the sensitivity and specificity of the TPMT PCR genotype testing procedure outlined by Yates et al.[5] To convert these parameters into probabilities, we first converted the sensitivity and specificity values into both positive and negative likelihood ratios. The likelihood ratio expresses the odds that the test result occurs in patients with the disease versus those without the disease.
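The conversion from sensitivity and specificity to likelihood ratios, and from a prior probability to a post-test probability via Bayes' theorem in odds form, can be sketched as follows. The function names are hypothetical, and the prior of 0.11 in the usage lines is an illustrative value only, not a parameter reported for the model.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    # Positive LR = sensitivity / (1 - specificity); infinite when specificity is 1.
    if specificity == 1.0:
        lr_pos = math.inf
    else:
        lr_pos = sensitivity / (1.0 - specificity)
    # Negative LR = (1 - sensitivity) / specificity.
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(prior, lr):
    # Bayes' theorem in odds form: post-test odds = prior odds * likelihood ratio.
    # Assumes 0 < prior < 1.
    if math.isinf(lr):
        return 1.0
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * lr
    return post_odds / (1.0 + post_odds)


# Lower-bound test characteristics from Table 1 (sensitivity 76.2%, specificity 83.9%).
lr_pos, lr_neg = likelihood_ratios(0.762, 0.839)
print(round(lr_pos, 2), round(lr_neg, 2))

# Illustrative prior (an assumption of this sketch, not a value from the model).
prior = 0.11
print(post_test_probability(prior, lr_pos), post_test_probability(prior, lr_neg))
```

With the lower-bound sensitivity and specificity, the positive likelihood ratio rounds to 4.73, which agrees with the corresponding entry in Table 1.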
Thus, the positive likelihood ratio equals sensitivity/(1 - specificity) and the negative likelihood ratio equals (1 - sensitivity)/specificity. Decision probabilities were calculated from these likelihood parameters and the a priori probability of having a positive result using Bayes' revision.[22]

Costs Used in the Model

All cost data that were used in the model are summarized in Table 2. Only direct medical costs were considered. Since the PCR genotyping test is not available in local clinical laboratories and is not currently being utilized in this manner, we estimated the cost of this test from the costs of performing other PCR-based tests that are clinically available. Specifically, we estimated that the base case cost of the PCR genotype testing would be $100 Canadian per person.

From interviews with a hospital-based, academic hematologist and an office-based, community hematologist, we estimated the costs associated with caring for individuals who developed a hematological disturbance. Specifically, the experts felt that approximately 50% of these individuals would need to be hospitalized due to the risks associated with agranulocytosis and the underlying immunosuppression that occurs with drug treatment of their underlying illness. Those ill enough to be hospitalized were assumed to require intravenous antibacterials and granulocyte colony-stimulating factor (G-CSF). In addition, daily complete blood counts with differential and, if fever was present, blood cultures would be ordered. Average duration of hospitalization in those hospitalized was estimated to be 10 days. A fully-allocated hospital cost model was utilized to estimate the costs associated with resource utilization.[23] For those able to be managed as outpatients (estimated to be 50%), costs of physician visits and laboratory tests were determined from the Provincial Guide to Medical Fees and drug costs were determined from the IMS Health Canada database of provincial drug costs.
It was assumed that those managed as outpatients would incur two office visits and have two complete blood counts with differential. Approximately 30% would receive G-CSF as outpatient therapy.

Other outpatient costs would be incurred by those who were false positives with the PCR genotype testing. Since these individuals would be treated with a reduced dosage of AZA, the likelihood of treatment failure would be high. Thus, these individuals were assumed to incur an additional physician office visit and an additional prescription for a medication.

RESULTS

In the base case model, the normal dosing strategy cost $677 per patient whereas the genotype-directed dosing strategy cost $663 per patient. In the genotype dosing strategy, the number needed to treat to avoid one adverse event over six months was 20. Thus, the genotype-based dosing strategy dominated the usual dosing strategy (it was both more effective and less costly).

Sensitivity analyses

The most influential parameters on the results of the model in univariate sensitivity analyses are summarized in Figure 2. Since the PCR test was not performed by clinical laboratories in our region, we did not have an accurate cost estimate for this parameter. Thus, we utilized a wide range of test costs for the sensitivity analysis (Figure 3). As can be seen from the results of this analysis, the outcomes of the model are very sensitive to the value of this parameter, with a threshold value of approximately $114. However, the model was quite robust to plausible changes in the sensitivity and specificity of the PCR test, with threshold values of 0.929 and 0.947, respectively. A sensitivity analysis on the probability of preventable hematological cytopenias revealed the threshold value to be 4.4%, above which the genotype-based dosing strategy is less costly and below which the usual dosing is less costly.
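The threshold for the cost of the PCR test reported above can be checked with simple arithmetic. The sketch below assumes (this is not stated in the text) that every patient in the genotype arm incurs the test cost exactly once, so that the expected cost of that arm rises dollar for dollar with the test price; the function names are hypothetical.

```python
def genotype_arm_cost(pcr_cost, base_arm_cost=663.0, base_pcr_cost=100.0):
    # Expected cost per patient of the genotype-directed arm at a given test price,
    # assuming the arm cost changes one-for-one with the price of the test.
    return base_arm_cost + (pcr_cost - base_pcr_cost)


def pcr_cost_threshold(normal_arm_cost=677.0, base_arm_cost=663.0, base_pcr_cost=100.0):
    # Test price at which both strategies have the same expected cost.
    return base_pcr_cost + (normal_arm_cost - base_arm_cost)


threshold = pcr_cost_threshold()
print(threshold, genotype_arm_cost(threshold))
```

Under that assumption the break-even price is $114, where the genotype arm costs $677 per patient, the same as normal dosing. This is consistent with the threshold of approximately $114 reported in the sensitivity analysis.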
Finally, a sensitivity analysis on the probability of receiving inpatient care for the hematological cytopenias revealed the threshold value to be 44%, above which the genotype-based dosing strategy is less costly and below which the usual dosing is less costly.

DISCUSSION

We found that the prospective adoption of an AZA dosing strategy based on the molecular diagnosis of TPMT reduction and deficiency by genotyping results in a small cost reduction when using our base case estimates of costs and probabilities. In addition, this strategy results in avoidance of approximately half of all serious hematological toxicities associated with AZA therapy. The cost difference was sensitive to small changes in the cost of PCR and the probability of preventable hematological cytopenia. However, the clinical endpoint (the avoidance of hematological cytopenia) was robust across the range tested in our sensitivity analyses.

To our knowledge, this is the first economic evaluation to consider the impact of drug metabolizing enzymatic heterogeneity secondary to measurable genetic polymorphisms within the human genome on the disposition of a pharmaceutical agent and the subsequent development of adverse events. Other economic decision analyses have evaluated the cost-effectiveness of determining the genotype of HIV and the impact on drug resistance.[24,25] Similar to our study, these investigators found that the adoption of genetic diagnostic techniques can facilitate an improvement in the outcomes achieved with drug therapy.

One of the major impediments to implementing the prospective testing of TPMT activity through genotyping is the availability of the PCR test procedure in a clinical laboratory that permits a short turn-around time for the results.
Currently, there are clinical sites where this test has been implemented to prospectively diagnose patients and allow for the reduction of thiopurine dosage and the avoidance of drug toxicity.[4,5] We would therefore advocate the adoption of this technology should the necessary infrastructure (i.e., clinical laboratory) and expertise be available. In addition, it is possible that, as with any diagnostic test, clinicians may not order the PCR test prior to initiating therapy but may use it as a confirmatory mechanism once toxicity has developed. Obviously, the cost-effectiveness of using the PCR test in this manner would be considerably different from that determined in this study.

Our analysis has several limitations. First, we did not consider indirect costs as these were not available. Since the indirect costs would likely be higher in the strategy that included more adverse events, it is likely that the omission of these conservatively biases our results against the genotype-based dosing strategy. In addition, we did not consider health-related quality of life for the various health states represented in our model (leukopenia without infection, leukopenia with infection, and healthy) as these were not available. Future analyses should be performed such that these outcomes can be integrated into the model.

In summary, we found that the costs of determining TPMT activity through genotyping and a corresponding AZA dose reduction in susceptible individuals ranged from being slightly cost saving to relatively modest in light of its outcome benefit when compared with the usual approach to AZA dosing. Our analysis can act as a data-driven basis for policy decisions about genetic testing to avoid side effects associated with AZA in patients with rheumatic diseases.

REFERENCES

1. Evans WE, Relling MV. Pharmacogenomics: Translating functional genomics into rational therapeutics. Science 1999;286:487-491.
2. American College of Rheumatology.
Guidelines for monitoring drug therapy in rheumatoid arthritis. Arthritis Rheum 1996;39:723-731.
3. Stolk JN, Boerbooms AMT, de Abreu RA, de Koning DGM, van Beusekom HJ, et al. Reduced thiopurine methyltransferase activity and development of side effects of AZA treatment in patients with rheumatoid arthritis. Arthritis Rheum 1998;41:1858-1866.
4. Krynetski EY, Evans WE. Pharmacogenetics as a molecular basis for individualized drug therapy: the thiopurine S-methyltransferase paradigm. Pharmaceutical Research 1999;16:342-349.
5. Yates CR, Krynetski EY, Loennechen T, Fessing MY, Tai HL, Pui CH, Relling MV, Evans WE. Molecular diagnosis of thiopurine S-methyltransferase deficiency: genetic basis for AZA and mercaptopurine intolerance. Ann Intern Med 1997;126:608-614.
6. Weinshilboum RM, Sladek SL. Mercaptopurine pharmacogenetics: monogenic inheritance of erythrocyte thiopurine methyltransferase activity. Am J Hum Genet 1980;32:651-662.
7. McLeod HL, Miller DR, Evans WE. AZA-induced myelosuppression in thiopurine methyltransferase deficient heart transplant recipient. Lancet 1993;341:436.
8. Evans WE, Horner M, Chu YQ, et al. Altered mercaptopurine metabolism, toxic effects and dosage requirement in a thiopurine methyltransferase-deficient child with acute lymphocytic leukemia. J Pediatr 1991;119:985-989.
9. Lennard L, Gibson BE, Nicole T, Lilleyman JS. Congenital thiopurine methyltransferase deficiency and 6-mercaptopurine toxicity during treatment for acute lymphoblastic leukemia. Arch Dis Child 1993;69:577-579.
10. Evans WE, Hon YY, Bomgaars L, Coutre S, Holdsworth M, Janco R, et al. Preponderance of thiopurine S-methyltransferase deficiency and heterozygosity among patients intolerant to mercaptopurine or AZA. J Clin Oncol 2001;19:2293-2301.
11. Leipold G, Schutz E, Haas P, Oellerich M.
AZA induced severe pancytopenia due to homozygous two-point mutation of the thiopurine methyltransferase gene in a patient with juvenile HLA-B27-associated spondylarthritis. Arthritis Rheum 1997;40:1896-1898.
12. Kassirer JP, Moskowitz AJ, Lau J, Pauker SG. Decision analysis: A progress report. Ann Intern Med 1987;106:275-291.
13. O'Brien BJ. Economic evaluation of pharmaceuticals. Frankenstein's monster or vampire of trials? Med Care 1996;34:DS1-DS10.
14. Paterson DI, Schwartzman K. Strategies incorporating spiral CT for the diagnosis of acute pulmonary embolism: a cost-effectiveness analysis. Chest 2001;119:1791-1800.
15. Webb KH. Does culture confirmation of high-sensitivity rapid streptococcal tests make sense? A medical decision analysis. Pediatrics 1998;101:E2.
16. McMahon PM, Araki SS, Neumann PJ, Harris GJ, Gazelle GS. Cost-effectiveness of functional imaging tests in the diagnosis of Alzheimer disease. Radiology 2000;217:58-68.
17. Naughton MA, Battaglia E, O'Brien S, Walport MJ, Botto M. Identification of thiopurine methyltransferase (TPMT) polymorphism cannot predict myelosuppression in systemic lupus erythematosus patients taking AZA. Rheumatology 1999;38:640-644.
18. Suarez-Almazor ME, Spooner C, Belseck E. AZA for treating rheumatoid arthritis (Cochrane Review). In: The Cochrane Library, Issue 1, 2001. Oxford: Update Software.
19. Black AJ, McLeod HL, Capell HA, Powrie RH, Matowe LK, et al. Thiopurine methyltransferase genotype predicts therapy-limiting severe toxicity from AZA. Ann Intern Med 1998;129:716-718.
20. Canadian Coordinating Office for Health Technology Assessment. Guidelines for economic evaluations of pharmaceuticals: Canada. 2nd ed. Ottawa: CCOHTA; 1997.
21. Dawson-Saunders B, Trapp RG. Evaluating diagnostic procedures. In: Dawson-Saunders B, Trapp RG, editors. Basic and clinical biostatistics. Norwalk: Appleton and Lange; 1990:229-244.
22.
Data 3.5 for Health Care User's manual. TreeAge Software, Williamstown, MA. 1999.
23. Anis AH, Guh D, Stieb D, Leon H, Beveridge RC, et al. The costs of cardiorespiratory disease episodes in a study of emergency department use. Can J Public Health 2000;91:103-106.
24. Freedberg KA, Losina E, Weinstein MC, Paltiel D, Cohen CJ, et al. The cost effectiveness of combination antiretroviral therapy for HIV disease. N Engl J Med 2001;344:824-831.
25. Weinstein MC, Goldie SJ, Losina E, Cohen CJ, Baxter JD, et al. Use of genotypic resistance testing to guide HIV therapy: clinical impact and cost-effectiveness. Ann Intern Med 2001;134:440-450.

TABLE 1: PROBABILITIES AND VALUES USED IN THE DECISION MODEL

Parameter                              Base Case   Lower   Upper   Reference
Hematological cytopenia                0.09        0.03    0.15    18, 19
Sensitivity of PCR genotype test (%)   95.2        76.2    99.9    5
Specificity of PCR genotype test (%)   100         83.9    100     5
Positive likelihood ratio              ~∞          4.73    ~∞      5
Negative likelihood ratio              0.05        0.01    0.29    5
Inpatient care                         0.50        0.20    0.99    Experts

TABLE 2: COSTS USED IN THE DECISION MODEL

Parameter (costs¹)                      Base Case   Lower   Upper   Reference
Adverse event costs
  Inpatient                             2679        2592    2933    17
  Outpatient                            790         590     990     19
PCR costs                               100         50      200     BCGSC²
Treatment failure cost                  573         275     775     17, 19
AZA cost per 50 mg tablet³              0.67        0.63    0.80    IMS Health
Pharmacist dispensing fee per episode   6.62        2.08    11.36   Benefit Management Ltd.

¹ All costs in the model are 1999 Canadian dollars.
² British Columbia Genome Sequencing Centre.
³ Average Canadian cost per generic brand tablet as reported by IMS Health Canada.

FIGURE 1: DECISION ANALYTIC MODEL
[Decision tree diagram not reproduced. Branches: Normal dosing (no testing), leading to ADR (pADR) or no ADR (#); Genotype dosing, leading to Test+ (pTest+), which splits into True+ (pTrue+) and False+ (pFalse+), or Test- (pTest-), which splits into False- (pFalse-) and True- (pTrue-). Terminal nodes: cost1 to cost6.]

Decision Model Legend: SQUARE: decision node between the two strategies (normal and genotype dosing); CIRCLE: chance node from which probabilities of events emanate; TRIANGLE: terminal node denoting the final outcome and payoffs. Normal dosing refers to typical AZA dosing to a target dose of 2.0-2.5 mg/kg/day. Genotype dosing refers to the use of recommended reductions of AZA dosing according to genotype. pADR refers to the probability of hematological cytopenia in the normal dosing arm (the # is the corresponding probability of 1-pADR). The PCR testing properties (Test+, Test-, True+, False+, False-, True-) were derived from the sensitivity and specificity. All costs (cost1 to cost6) represent embedded formulae that calculate the costs associated with each decision path.

FIGURE 2: UNIVARIATE SENSITIVITY ANALYSES - MOST INFLUENTIAL PARAMETERS
[Figure not reproduced. Univariate sensitivity analyses of influential parameters; X-axis: Expected Value, $590.0 to $670.0. Parameters and ranges shown:
probability of avoidable cytopenia: 0.01 to 0.21
cost of PCR: 50 to 200
probability of inpatient care: 0.2 to 0.99
sensitivity of PCR: 0.86 to 0.99
specificity of PCR: 0.84 to 0.9999]
The dark vertical lines represent threshold points. The Y-axis of the graph refers to the expected value of the genotype dosing strategy.

FIGURE 3: SENSITIVITY ANALYSIS OF THE COST OF THE PCR TEST
[Figure not reproduced. Sensitivity analysis on cost of PCR; X-axis: cost of PCR, 50.0 to 200.0; series: Normal dosing, Genotype dosing. Threshold values: cost of PCR = 114.0, EV = $677.0.]
EV - Expected or threshold value. This is the point at which the value of the parameter being tested results in an equal cost in both strategies.

APPENDIX III

THE EFFECTIVENESS AND TOXICITY OF CYCLOSPORINE A IN RHEUMATOID ARTHRITIS: LONGITUDINAL ANALYSIS OF A POPULATION-BASED REGISTRY

Marra CA, Esdaile JM, Guh D, Fisher JH, Chalmers A, Anis AH. Arthritis Rheum 2001;45:240-5

INTRODUCTION

Rheumatoid arthritis (RA) is a chronic, inflammatory, systemic disease. Due to its progressive nature, extra-articular manifestations and the adverse effects of medications used for treatment, patients with RA have an increased risk of morbidity and mortality.[1,2] Aggressive pharmacotherapy is recommended early in the course of the disease in order to prevent severe disability and death.[3]

Methotrexate (MTX) has become the disease modifying antirheumatic drug (DMARD) of choice due to its superior efficacy and tolerability when compared to other agents.[4-8] However, MTX, used as monotherapy or in combination with another DMARD, is rarely associated with sustained disease remissions and many patients do not respond to and/or tolerate this agent.[7,8,9] Thus, newer treatments, such as cyclosporine A (CSA), have emerged as additional agents for patients who fail usual DMARD therapy.

Placebo-controlled or comparative, randomized trials of CSA in RA[10,11] have demonstrated that CSA is more effective than placebo and is at least as effective as penicillamine and azathioprine. However, treatment with CSA is associated with an increased incidence of reversible renal dysfunction and hypertension. Other significant adverse reactions include headache, tremor, gastrointestinal disturbances and gum hyperplasia.
Although these trials demonstrated efficacy, they were of short duration (2 to 12 months) and enrolled a selected group of patients willing to provide informed consent, thereby limiting somewhat their generalisability for the treatment of patients with RA. The objectives of the present study were to determine CSA's long-term effectiveness and toxicity, and the determinants of these, in a population-based inception cohort of patients with RA.

PATIENTS AND METHODS

Starting in January 1991, data were prospectively recorded for an inception cohort of all patients with RA treated with CSA in British Columbia (total population 3.4 million). To be eligible for reimbursement from the B.C. government for the costs of CSA, patients had to meet the revised criteria of the American College of Rheumatology for RA and have experienced an inadequate response to prior DMARD therapy including MTX. The same physicians and nurses evaluated all patients over time. CSA treatment was started at 2.5-3.0 mg/kg/day in 2 divided doses and was increased or decreased at monthly intervals until the minimum effective dose was achieved or toxicity developed. Discontinuation of CSA for ineffectiveness or toxicity was at the discretion of the attending rheumatologist in response to the regular monitoring parameters described below. The reason for CSA discontinuation was determined prospectively.

For the purposes of this study, data from January 1991 through December 1997 were analyzed. Data included demographics, disease history (years of RA, extra-articular manifestations), RA treatment history (previous DMARD use, prednisone dose at CSA initiation), baseline clinical assessments, and details of CSA treatment (initiation dose, maximum dose, number of dose changes, dose at discontinuation, and the reason for discontinuation). Concomitant therapies including MTX were also recorded.
Each patient was assessed monthly for the first three months and then at least every 3 months thereafter for CSA effectiveness (number of tender joints, duration of morning stiffness) and measurements of CSA toxicity. Patients were reviewed for toxicity (systolic blood pressure [SBP], diastolic blood pressure [DBP], and serum creatinine) weekly for the first month and then were followed on a monthly basis.

Statistical Analysis

Kaplan-Meier survival analysis methods were used to estimate the probability of discontinuing CSA due to either lack of efficacy or toxicity. The log-rank test was used to test the null hypothesis that the survival time (i.e., total duration of CSA treatment) was the same across levels of each categorical factor (sex, age group, years of RA prior to CSA treatment divided into those with RA < or > 10 years, concomitant MTX use). Right censoring was used for patients lost to follow-up prior to discontinuation of CSA. Survival time was used as a measure of overall CSA effectiveness.

To develop a model that predicted how long a patient continued on CSA therapy, Cox proportional hazards models were used to evaluate CSA discontinuations, adjusting for the effects of age, sex, duration of RA, combination with MTX (yes/no), year of CSA initiation, and baseline measures of disease severity (tender joint count and duration of morning stiffness) in a forward stepwise fashion. Validity of the proportional hazards assumption was explored by inspecting plots of log[-log(proportion continuing CSA)] versus log(time) for categorical independent variables in the model. For both Kaplan-Meier and Cox regression, significance was set at an α level of < 0.05 for the model and parameter estimates.
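The Kaplan-Meier (product-limit) estimate of the proportion of patients continuing CSA, with right censoring, can be sketched in a few lines. The function name and the toy data below are hypothetical and are not taken from the registry; the sketch follows the usual convention that patients censored at a given time are still counted as at risk at that time.

```python
def kaplan_meier(durations, events):
    # durations: time on treatment for each patient (for example, in months)
    # events: 1 if the drug was discontinued at that time, 0 if right censored
    # Returns a list of (time, estimated proportion still on treatment).
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        discontinued = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            discontinued += data[i][1]
            leaving += 1
            i += 1
        if discontinued > 0:
            survival *= 1.0 - discontinued / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical example: five patients, two of them right censored.
months = [3, 6, 6, 12, 24]
stopped = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(months, stopped):
    print(f'{t:>3} months: {s:.2f}')
```

The estimate only changes at times when a discontinuation occurs; censored patients leave the risk set without lowering the curve, which is how loss to follow-up is handled in this kind of analysis.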
Generalized estimating equation (GEE) models were utilized to examine the relationship between each of the dependent variables (efficacy variables [joint count and duration of morning stiffness] and toxicity variables [serum creatinine, SBP, and DBP]) and a number of explanatory variables (sex, age, duration of rheumatoid arthritis, combination treatment with MTX, number of previous DMARDs, number of extra-articular manifestations, and year of CSA start). GEE models were used because the repeated measures for each subject over time were assumed to be correlated [12]. Joint count data were assumed to follow a Poisson distribution and thus the log-link function was applied. Morning stiffness, systolic blood pressure, diastolic blood pressure and serum creatinine were assumed to be normally distributed and thus the identity-link function was applied. An exchangeable correlation matrix, which assumes that all observations are equally correlated, was utilized for all analyses. Significance was set at an α level of < 0.05 for the model and parameter estimates. Interactions between all significant covariates were tested in all regression models and residual plots were inspected to assess the validity of the models.

RESULTS

One hundred and thirty-three patients were started on CSA treatment, of whom 100 (75%) were female. The median age was 58 years (range 28-82) with a median duration of RA of 13 years (range 2-54). Ninety-nine (74%) patients had extra-articular manifestations of RA (median 1, range 0-5) prior to CSA therapy, with the most common being keratoconjunctivitis sicca (72 patients), rheumatoid nodules (68 patients), and vasculitis (15 patients). Patients had received a median of 5 previous treatments with DMARDs (range 2-8) with inadequate response or toxicity.
Previous DMARD use was as follows: MTX in 120 (90%), oral or parenteral gold products in 114 (86%), penicillamine in 90 (68%), azathioprine in 87 (65%), hydroxychloroquine in 83 (62%), sulfasalazine in 69 (52%), chlorambucil in 13 (10%), and cyclophosphamide in 6 (5%). At the time of CSA initiation, 102 patients (77%) were receiving prednisone (median dose 5 mg/day, range 2.5-60 mg) and 80 (60%) were receiving nonsteroidal anti-inflammatory drugs (NSAIDs). The median maximum CSA dose utilized was 4.1 mg/kg/day (range 2-7) whereas the median CSA dose at cessation was 3.3 mg/kg/day (range 1-6).

Twenty-seven patients (20%) received concomitant MTX therapy for at least a portion of their CSA treatment course. Of these, 26 (96%) had received previous MTX therapy prior to CSA initiation, 18 (67%) continued receiving MTX from the period prior to CSA initiation, and 9 (33%) initiated MTX treatment during CSA therapy. Neither baseline demographics nor treatment variables differed significantly between those treated with the combination therapy and those treated with CSA monotherapy, or between patients who were on MTX prior to initiating CSA and those who initiated MTX therapy during CSA.

Survival Analyses

Thirty-seven of the 133 patients (28%) discontinued CSA prematurely due to ineffectiveness (n=19) or toxicity (n=18). Of those discontinued due to toxicity, elevations in serum creatinine (n=10), hypertension (n=4), infections (n=3), and gingival hyperplasia (n=1) were the causes. On follow-up after CSA discontinuation, no patient developed elevated SCr or blood pressure secondary to CSA treatment. The remainder of the patients (n=96) were right-censored due to the continuation of CSA therapy beyond the end of the study (n=61), loss to follow-up (n=24), death unrelated to RA/CSA (n=6), or cessation of CSA unrelated to ineffectiveness or toxicity (n=5).
The median time to CSA discontinuation was 75 months (95% CI 38 to 112) as assessed by Kaplan-Meier analysis (Figure 1). Nearly all of the discontinuations due to ineffectiveness (18 out of 19) occurred in the initial 24 months of CSA treatment. For the remaining months of CSA treatment, most discontinuations were due to toxicity (n=6) as opposed to ineffectiveness (n=1). The evaluation of predictors of continued CSA treatment revealed that only combination with MTX (yes/no) was a significant predictor (p=0.012) (Figure 2). Gender, age (divided into 20 year increments), years of RA prior to CSA treatment (divided into those with RA < or > 10 years), and extra-articular manifestations (yes/no) were not significant predictors for CSA continuation.

Cox Proportional Hazards Model and Generalized Estimating Equations Models

The Cox proportional hazards model revealed that concomitant MTX and baseline affected joint count (divided by 10) were predictive of continued CSA therapy. MTX appears to be protective with respect to CSA discontinuation (relative hazard 0.22, 95% CI 0.10 to 0.94), whereas the baseline affected joint count was associated with an increased relative hazard of discontinuing CSA (1.28, 95% CI 1.00 to 1.64). The results from all of the GEE multivariate models are presented in Table 1 and are discussed below.

Effectiveness Variables

Only the variables time on CSA and the use of MTX were significantly associated with a reduction in the joint count. According to the model, for patients who received the MTX/CSA combination therapy, the joint count was reduced by 21% (95% CI 1% to 37%) when compared to those receiving CSA monotherapy. In addition, for each month of continued treatment with CSA, the joint count was reduced by 1% (95% CI 0% to 2%).

For duration of morning stiffness, only the time on CSA therapy was found to be significant and was not influenced by measured potential confounders.
Thus, for every month on CSA therapy, the duration of morning stiffness decreased by 2.0 minutes (95% CI 1.1 to 3.0).

Toxicity Variables

The number of previous DMARDs utilized prior to CSA therapy was significantly associated with increased SBP after adjusting for age and weight. With each additional DMARD utilized prior to starting CSA, SBP increased by 7.2 mmHg (95% CI 2.7 to 11.7). Similarly, both the time on CSA therapy and the number of previous DMARDs utilized prior to CSA therapy were significantly associated with increased DBP after adjusting for the effects of age and weight. For each additional month on CSA therapy, the model predicted an increase in DBP of 0.07 mmHg (95% CI 0.02 to 0.09). Likewise, for each additional DMARD utilized prior to CSA therapy, the diastolic blood pressure increased by 3.8 mmHg (95% CI 3.0 to 6.4). As expected, both age and weight were significantly associated with both SBP and DBP. No interactions between predictors were found to be significant in either model.

Both the time on CSA therapy and the number of previous DMARDs utilized prior to CSA therapy were significantly associated with increased serum creatinine after adjusting for the effects of age, weight, and years of RA prior to CSA therapy. The number of previously tried DMARDs had a large impact in predicting nephrotoxicity, as each additional DMARD resulted in an increase of 35 µmol/L in serum creatinine (95% CI 22 to 48). In addition, an interaction term between length of CSA treatment and the number of previous DMARDs was also significant. The concomitant use of MTX was not significantly associated with increases in any of the toxicity variables.

DISCUSSION

We report the first population-based, longitudinal experience with CSA in an inception cohort of RA patients. We note that CSA is both safe and effective over long-term use and that its effectiveness is enhanced when combined with MTX.
Other major findings of our analyses include that the clinical improvement associated with CSA is enhanced by the duration of CSA treatment, that there is a specific discontinuation pattern for CSA, and that previous DMARD therapy is a significant predictor for the development of toxicity.

Importantly, the combination of MTX with CSA resulted in additional clinical improvement over CSA monotherapy without additional toxicity. The use of combination CSA/MTX was associated with longer CSA use and an improvement in joint counts, which is in general agreement with the only randomized, double-blind trial evaluating the efficacy of MTX plus CSA [16]. In this multicenter trial, the combination of the two DMARDs was superior to MTX alone, resulting in a clinically significant improvement that was maintained for up to 12 months without additional toxicity. It would appear that the benefit of combination therapy also occurs when MTX is added to CSA, and is maintained over the long-term use of CSA. Thus, combination therapy with MTX and CSA should be considered for all patients being placed on CSA. Further randomized clinical trials could compare CSA/MTX to other recently successful combination therapies such as MTX plus infliximab [17] to determine their relative efficacy and cost-effectiveness.

Another notable result was that there was no chronic or life-threatening toxicity associated with long-term use of CSA. No patient developed severe or irreversible side effects, although almost half of all CSA discontinuations were due to toxicity (mainly transient increases in SCr and blood pressure). The duration of treatment with CSA and the number of previous DMARDs patients were exposed to prior to CSA appeared to be associated with increased SCr and elevated blood pressure. Previous DMARD use did not correlate with duration of RA or with age.
While we cannot explain this conclusively from our data, we hypothesize that the number of previous DMARDs may be a surrogate marker for the number or duration of NSAIDs utilized prior to CSA treatment. NSAIDs are a risk factor for the development of both hypertension and renal dysfunction in patients with RA [18-20]. Authors of a recent paper examining the factors that predict drug response in clinical trials for RA found that, amongst other factors, previous DMARD use was associated with a reduction in the likelihood of patient improvement (adjusted OR of 0.62) [21]. Therefore, from our data, it is not clear if the number of previously used DMARDs is a confounder or an independent risk factor for the development of toxicity from CSA. Of note, an interaction term between duration of CSA and number of DMARDs was significant. Thus, it would appear that renal toxicity is less likely to occur in patients who have been on CSA for a long time and have been on numerous DMARDs. This observation is likely due to the early discontinuation of CSA in people who are susceptible to nephrotoxicity, thus selecting for individuals who can tolerate the renal effects of this agent.

The time to discontinuation of CSA appeared to follow a specific pattern, with a higher discontinuation rate in the first 24 months of therapy followed by a plateau from 24 to 40 months, and then a more rapid rate of discontinuation after 40 months. This finding is consistent with the data from the Australian and French experiences [13,14]. For example, Johns et al. [13] found that patients with gastrointestinal disturbances, anxiety, tremors and hirsutism withdrew early from treatment, whereas later withdrawal was due to more serious toxicity (elevated serum creatinine or hypertension). The authors speculated that these early toxicities could have been managed with a dosage reduction rather than discontinuation.
In our cohort, patients with these early toxicities were managed with a reduction in CSA dosage and, thus, many early discontinuations were due to ineffectiveness rather than toxicity. This finding supports the assumption in our study that duration of treatment is a measure of overall drug effectiveness.

Other clinic-based, retrospective analyses of CSA in RA have been published [13-15]. In these, withdrawal because of ineffectiveness occurred in 13% to 36%, while toxicity accounted for 11% to 33% of withdrawals. As in our study, the major toxicities were elevated serum creatinine and hypertension, although one study reported cancer in three patients [15].

Strengths of our study include the relatively large sample size, the prolonged follow-up, and that it encompasses the entire population of RA patients treated with CSA in a defined geographic region (the province of B.C.). In addition, the same practitioners repeatedly assessed all RA patients in the cohort. Thus, many potential sources of selection and assessment bias were avoided. Our analysis contains patients treated with a combination of MTX plus CSA and patients treated with CSA monotherapy, such that comparisons of continuation time can be made. The utilization of Cox regression enabled the identification of determinants of CSA discontinuation, and the use of GEE models allowed for the adjustment of repeated, correlated measures for each subject rather than analyzing the data at single points in time. The GEE regression models also allowed for the determination of variables significantly associated with favorable response and toxicity to CSA.

Although the data were collected prospectively, a limitation was that the analysis was completed retrospectively. Information that might have been of interest, such as standardized measures of disability or longitudinal radiographic assessments, was not available.
Although we observed statistically significant differences between those patients treated with the combination of MTX and CSA and those on CSA monotherapy in terms of continuation time and clinical outcomes, treatment assignment was not random. Nonetheless, our results are consistent with those from a randomized trial and may augment the generalisability of the trial results. Finally, the cost-effectiveness of CSA relative to other DMARDs has been determined based upon the results of clinical trials with short follow-up [22]. However, the long-term cost-effectiveness of CSA is still unknown and will be a focus of further studies.

In summary, CSA appears to be both safe and effective for long-term use in patients with severe RA who have failed on multiple other therapies. Combining CSA with MTX both prolongs the duration of CSA therapy and reduces the number of affected joints. CSA was reasonably well tolerated with no irreversible adverse events, although a longer duration of CSA therapy was associated with the development of renal toxicity and increased DBP. The number of previous DMARDs used prior to CSA appears to be a determinant for the development of hypertension and renal dysfunction, although it is unclear if this parameter is a confounder or an independent risk factor.

REFERENCES

1. Klippel JH, Weyand CM, Wortmann RL (eds). Primer on the rheumatic diseases, 11th Ed. Arthritis Foundation, Atlanta, Georgia 1997.
2. Pincus T, Callahan LF. Early mortality in RA predicted by poor clinical status. Bull Rheum Dis 1992;41:1-4.
3. Tsakonas E, Fitzgerald AA, Fitzcharles MA, Cividino A, Thorne JC, M'Seffar A, Joseph L, Bombardier C, Esdaile JM. Consequences of delayed therapy with second-line agents in rheumatoid arthritis: a 3 year followup on the hydroxychloroquine in early rheumatoid arthritis (HERA) study. J Rheumatol 2000;27:623-629.
4. Mikuls T, O'Dell JR.
The treatment of rheumatoid arthritis: current trends in therapy. Arthritis Rheum 1999;42(suppl):S51. Abstract.
5. Pincus T, Marcum SB, Callahan LF. Long-term drug therapy for rheumatoid arthritis in seven rheumatology private practices: II. Second line drugs and prednisone. J Rheumatol 1992;19:1885-1894.
6. Felson DT, Anderson JJ, Meenan RF. Use of short-term efficacy/toxicity tradeoffs to select second-line drugs in rheumatoid arthritis. A meta-analysis of published clinical trials. Arthritis Rheum 1992;19:1117-1125.
7. Tugwell P, Pincus T, Yocum D, et al. Combination therapy with cyclosporine and methotrexate in severe rheumatoid arthritis. N Engl J Med 1995;333:137-141.
8. O'Dell JR, Haire C, Erikson N, et al. Treatment of rheumatoid arthritis with methotrexate alone, sulfasalazine plus hydroxychloroquine, or a combination of all three medications. N Engl J Med 1996;334:1287-1291.
9. Kirwan JR and the Arthritis and Rheumatism Council Low Dose Glucocorticoid Study Group. The effect of glucocorticoids on joint destruction in rheumatoid arthritis. N Engl J Med 1995;333:142-146.
10. Chaudhuri K, Torley H, Madhok R. Cyclosporin. Br J Rheumatol 1997;36:1016-1021.
11. Wells G, Haguenauer D, Shea B, Suarez-Almazor ME, Welch VA, Tugwell P. Cyclosporine for rheumatoid arthritis (Cochrane Review). In: The Cochrane Library, Issue 3, 1999. Oxford: Update Software.
12. Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13-22.
13. Johns KR, Littlejohn GO. The safety and efficacy of cyclosporine (Neoral) in rheumatoid arthritis. J Rheumatol 1999;26:2110-3.
14. Carpentier N, Bertin P, Druet Cabanac M, Abdeddaim M, Vergne P, Bonnet C, Treves R. Long-term cyclosporine continuation rates in rheumatoid arthritis patients. Rev Rhum (Engl Ed) 1999;66:245-249.
15. Pascalis L, Aresu G, Pia G. Long-term efficacy and toxicity of cyclosporine A + fluocortolone + methotrexate in the treatment of rheumatoid arthritis.
Clin Exp Rheumatol 1999;17:679-688.
16. Tugwell P, Pincus T, Yocum D, Stein M, Gluck O, Kraag G, et al. Combination therapy with cyclosporine and methotrexate in severe rheumatoid arthritis. N Engl J Med 1995;333:137-141.
17. Maini R, St. Clair EW, Breedveld F, Furst D, Kalden J, et al. Infliximab (chimeric anti-tumour necrosis factor alpha monoclonal antibody) versus placebo in rheumatoid arthritis patients receiving concomitant methotrexate: a randomized phase III trial. Lancet 1999;354:1932-1939.
18. Whelton A. Nephrotoxicity of nonsteroidal anti-inflammatory drugs: physiologic foundations and clinical implications. Am J Med 1999;106:13S-24S.
19. Ruoff GE. The impact of nonsteroidal anti-inflammatory drugs on hypertension: alternative analgesics for patients at risk. Clin Ther 1998;20:376-387.
20. Segasothy M, Chin GL, Sia KK, Zulfiqar A, Samad SA. Chronic nephrotoxicity of anti-inflammatory drugs used in the treatment of arthritis. Br J Rheumatol 1995;34:162-165.
21. Anderson JJ, Wells G, Verhoeven AC, Felson DT. Factors predicting response to treatment in rheumatoid arthritis. Arthritis Rheum 2000;43:22-29.
22. Anis AH, Tugwell PX, Wells GA, Stewart DG. A cost effectiveness analysis of cyclosporine in rheumatoid arthritis. J Rheumatol 1996;23:609-616.
TABLE 1: RESULTS OF THE MULTIVARIATE ANALYSES

Dependent variable / Outcome variables                        Parameter estimate   95% CI

Joint count (divided by 10)
  Intercept                                                   18.17*               14.73 to 22.42*
  Time on CSA (months)                                        0.99*                0.98 to 0.99*
  MTX combination (y/n)                                       0.79*                0.63 to 0.99*

Morning stiffness (minutes)
  Intercept                                                   166.8                115.3 to 218.4
  Time on CSA (months)                                        -2.0                 -3.0 to -1.1

Systolic blood pressure (mmHg)
  Intercept                                                   75.9                 61.0 to 90.8
  # of previous DMARDs                                        7.2                  2.7 to 11.6
  Age (years)                                                 0.60                 0.45 to 0.74
  Weight (kg)                                                 0.30                 0.12 to 0.48

Diastolic blood pressure (mmHg)
  Intercept                                                   65.1                 54.2 to 67.5
  Time on CSA (months)                                        0.07                 0.02 to 0.09
  # of previous DMARDs                                        3.8                  3.0 to 6.4
  Age (years)                                                 0.07                 0.07 to 0.19
  Weight                                                      0.16                 0.10 to 0.22

Serum creatinine (µmol/L)
  Intercept                                                   31.1                 16.6 to 45.6
  Time on CSA (months)                                        0.63                 0.42 to 0.84
  Duration of RA (years)                                      -0.46                -0.77 to -0.15
  Age (years)                                                 0.65                 0.45 to 0.85
  Sex (male as reference group)                               -14.8                -20.9 to -8.7
  # of previous DMARDs                                        34.8                 22.1 to 47.7
  Interaction between Time on CSA and # of previous DMARDs    -0M                  -0.47 to -0.26

* For joint count, the parameter estimates are rate ratios (RR) as the log-link was used. All other models are linear as the identity link was used.

FIGURE 1: SURVIVAL CURVE FOR CSA DISCONTINUATION
[Figure: Kaplan-Meier curve; horizontal axis: Duration of CSA treatment (months).]
The crosses (+) represent those individuals who were censored.

FIGURE 2: SURVIVAL CURVE FOR CSA DISCONTINUATION FOR THOSE ON CSA ALONE (SOLID LINE) AND COMBINATION CSA/MTX (DOTTED LINE)
[Figure: Kaplan-Meier curves; horizontal axis: Duration of CSA treatment (months).]
The crosses (+) represent those individuals who were censored.
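The footnote to Table 1 states that the joint-count estimates are rate ratios because a log link was used. As a brief worked illustration, the sketch below converts the MTX rate ratio and its confidence limits from Table 1 into the percent reductions quoted in the Results. The helper name `percent_change` is hypothetical, and the 12-month figure assumes only that the per-month rate ratio compounds multiplicatively, as a log-linear model implies; it is not a result reported by the authors.

```python
def percent_change(rate_ratio):
    """Percent change in the expected count implied by a rate ratio."""
    return (rate_ratio - 1.0) * 100.0


# Rate ratio for MTX combination (y/n) and its 95% CI, from Table 1.
mtx = {"estimate": 0.79, "lower": 0.63, "upper": 0.99}

# A rate ratio below 1 is a reduction; flip the sign to report it as such.
# The lower rate-ratio limit gives the largest reduction, and vice versa.
reduction = {name: round(-percent_change(rr)) for name, rr in mtx.items()}
print(reduction)

# Illustrative assumption: a per-month rate ratio of 0.99 (Table 1, time on
# CSA) compounds over months because the model is linear on the log scale.
twelve_month_rr = 0.99 ** 12
print(f"rate ratio after 12 months: {twelve_month_rr:.3f}")
```

The reductions of 21%, 1% and 37% match the values given in the text for the combination therapy (21%, 95% CI 1% to 37%).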
APPENDIX IV

RHEUMATOID ARTHRITIS ASSESSMENT QUESTIONNAIRE (SELF-ADMINISTERED)

SECTION I: RHEUMATOID ARTHRITIS and HEALTH CARE USE ASSESSMENT

You may decline to answer any question; however, please remember that it is very important that we get the most accurate and complete information we can, and that all the information you provide is completely confidential.

HC1. What is your current marital status?
• Single • Married • Married and separated • Common-law • Divorced • Widowed

HC2. What type of health insurance coverage do you have? [Please check all that apply]
• I don't currently have medical insurance • Plan C (Social assistance) • Plan E (Basic MSP) - self paid • Plan E (Basic MSP) - employer paid • Extended medical - self paid • Extended medical - employer paid • Prescription drug plan (3rd party coverage) • Other
If OTHER, please specify:

HC3. When were you first diagnosed (by a rheumatologist) with rheumatoid arthritis?
Approximate date of rheumatoid arthritis diagnosis [month / year]
• I don't know • I prefer not to answer this question

HC4. Do you currently, or have you previously, smoked cigarettes, cigars, or a pipe?
• Never smoked [Go to Question HC7] • Currently smoke • Quit smoking • Other • I prefer not to answer this question

HC5. If you have quit smoking, how long has it been since you last smoked?
• < 3 months • 3-6 months • 6-12 months • 1-5 years • > 5 years • I don't know

HC6. How much do you, or did you previously, smoke?
Amount smoked [per day]
• I don't know • I prefer not to answer this question

HC7. Over the past year have you been admitted to hospital due to your rheumatoid arthritis (i.e. joint surgeries)?
• Yes; How many times? • No • I don't know • I prefer not to answer this question
[If NO, go to Question HC9]

HC8. What was the total number of days you spent in the hospital because of your rheumatoid arthritis (i.e. joint surgeries) in the previous year?
Total number of hospital days due to rheumatoid arthritis in the previous year
• I don't know • I prefer not to answer this question

HC9. Over the past year, have you required any other services for your rheumatoid arthritis such as physiotherapy, occupational therapy, social work, diet/nutrition counselling, or in-home services (e.g. home care)?
• Yes • No [Go to Question HC11] • I don't know • I prefer not to answer this question

HC10. Could you specify the type of service(s) (e.g. physiotherapy, home care), the nature of the actual service(s) provided (e.g. physical conditioning, diet counseling) and number of visits?
Type of service / Nature of service / Number of visits

HC11. Over the past year, have you had to rent or purchase any equipment (e.g. wheelchair, kitchen aids) related to your rheumatoid arthritis?
• Yes • No • I don't know • I prefer not to answer this question
[Go to Question HC15]

HC12.
Type / Estimated cost (total per month) / Rented / Purchased
1. $ • •
2. $ • •
3. $ • •
4. $ • •
5. $ •
[Go to Question HC15]

HC13. [...] above, but you decided not to purchase or rent it?
• Yes • No • I don't know • I prefer not to answer this question

HC14. What type of equipment was it, and why did you decide not to acquire it?
Type of equipment
• I didn't think it would help • I couldn't find one • It was too expensive • I don't know • Other
If OTHER, please specify:

HC15. Over the past year, have you utilized any complementary methods of health care (e.g. herbal medications, homeopathic medications, acupuncture, healing touch) for the management of your rheumatoid arthritis?
• Yes • No • I don't know • I prefer not to answer this question
[Go to Section II]

HC16. Can you list these complementary methods of care and their estimated cost to you over the last year?
Type / Estimated cost
1. $
2. $
3. $
4. $
5.
$

SECTION II

This section is made up of questions [...] about yourself and anyone who might be living with you [...] education, income, and employment [...]

P1. What was your main activity during the past 12 months? [Please check only one]
• Working at a job • Looking for work • Unable to work due to health reasons • Going to school • Keeping house • Retired • Other
If OTHER, please specify:

P2. If your main activity was WORKING AT A JOB, what is your field of employment (e.g. secretary, nurse, laborer, store clerk, plumber, dentist)?
Brief description:

P3. If you worked, even if your main activity wasn't WORKING AT A JOB, on average approximately how much did you work over the past 12 months?
• > 35 hours / week • 30-35 hours / week • 20-29 hours / week • 10-19 hours / week • < 10 hours / week • Casual

P4. [...] employment lost the income from a missed work day, what would be the value of ONE day's lost income (before deductions)?
$ Value of one day's lost income
• I don't know • Not applicable • I prefer not to answer this question

P5. Over the past year, have you had to miss work or school because of your rheumatoid arthritis?
• Yes • No [Go to Question P7] • Not applicable

P6. [...] how many days of work and/or school that you have missed over the past two weeks and the past year because of your rheumatoid arthritis?
Columns: Over the past TWO WEEKS / Over the past 12 MONTHS
Rows: Days of WORK missed / Days of SCHOOL missed

P7. Over the past year, has anyone (e.g. spouse, partner, caregiver, friend) had to miss work or school because of your rheumatoid arthritis?
• Yes • No • I prefer not to answer this question
[Go to Question P9]

P8.
Can you specify the relationship of these individuals to you, and how many days of WORK/SCHOOL they have missed in the previous year because of your rheumatoid arthritis?
Columns: Relationship / Work days missed / School days missed
Rows: Person 1 / Person 2 / Person 3

P9. Have you ever had to change jobs because of your rheumatoid arthritis?
• Yes; What was your previous job? • No

P10. What is the highest grade (or year) of secondary (high school) or elementary school you have successfully completed?
Number (1-13) of grades of secondary and / or elementary school successfully completed
• Never attended school, or attended kindergarten only • I prefer not to answer this question

P11. How many years of post-secondary education (after high school) have you completed?
Number of years of post-secondary education
• None • Less than one year

P12. What certificates, degrees, or diplomas have you ever obtained? [Please check all that apply]
• None • High school diploma • Trades or non-university diploma • Undergraduate (bachelor's) university diploma • Degree in medicine, dentistry, vet medicine, optometry, chiropractory, etc. • Master or Doctorate degree (M.A., M.Sc., Ph.D., D.Sc., D.Ed.)

P13. Is there another primary, non-dependent adult living in your household (e.g. a spouse or partner)?
• Yes • No [Go to Question P21]

P14. What is this person's relationship to you?
• Spouse / Partner • Parent / Guardian • Sibling • Roommate • Other
If OTHER, please specify:

P15. What was this person's main activity during the last 12 months?
• Working at a job • Looking for work • Unable to work due to health reasons • Going to school • Keeping house • Retired • Other • I don't know • I prefer not to answer this question
If OTHER, please specify:

P16. If this person's main activity was WORKING AT A JOB, what is their field of employment (e.g. secretary, nurse, laborer, store clerk, plumber, dentist)?
Brief description

Value of one day's lost income
• I don't know • Not applicable • I prefer not to answer this question

P18. What is the highest grade (or year) of secondary (high school) or elementary school this person has successfully obtained?
Number (1-13) of grades of secondary and / or elementary school successfully completed
• Never attended school, or attended kindergarten only • I don't know • I prefer not to answer this question

P19. How many years of post-secondary education (after high school) has this person completed?
Number of years of post-secondary education
• None • Less than one year • I don't know • I prefer not to answer this question

P20. What certificates, degrees, or diplomas have they ever obtained? [Please check all that apply]
• None • High school diploma • Trades or non-university certificate • Undergraduate (bachelor's) university degree • Degree in medicine, dentistry, optometry, chiropractory, etc. • Masters or doctorate degree • I don't know • I prefer not to answer this question

P21. What was your approximate total household income from all sources for the previous year, before income tax deduction? [Including only your family members, not roommates that you don't share daily expenses with]
• less than $20,000 • $20,000 - $30,000 • $30,001 - $40,000 • $40,001 - $50,000 • $50,001 - $60,000 • $60,001 - $70,000 • $70,001 - $80,000 • $80,001 - $90,000 • $90,001 - $100,000 • greater than $100,000 • I don't know • I prefer not to answer this question

P22. Do you have access to an automobile?
• Yes • No
If YES: How many automobiles do you have? Total number of automobiles

P23. Do you have any children?
• Yes • No
If YES: How many children do you have? Total number of children
How many children currently live with you? Total number of children currently living with you

P24. How would you describe the type of dwelling that you currently live in?
• Single detached home • Duplex or Townhouse • Apartment or Condominium • Mobile home • Boarding room / Hotel / Rooming house • Other • I prefer not to answer this question
If OTHER, please specify:

P25. How many rooms does your current residence have, including kitchen, living room, bedroom?
Total number of rooms

P26. How many of these rooms are bedrooms?
Number of bedrooms

P27. How many adults and children (including yourself) live in your current residence?
Total number of adults
Total number of children

P28. Is your current residence:
• Owned by you or a member of household • Rented [even if no rent is paid] • Subsidized housing (you receive government rental assistance) • Other • I don't know • I prefer not to answer this question
If OTHER, please specify:

SECTION III: RHEUMATOID ARTHRITIS MEDICATION USE

• Yes; How many times? • No • I don't know • I prefer not to answer this question
[If NO, go to Question M3]

M2. What was your reason for not filling the prescription?
• It was too expensive • I didn't think I needed it • I didn't think it would help • I couldn't get to the pharmacy • Other
If OTHER, please specify:

M3. [...] do you have any other chronic diseases (such as high blood pressure, arthritis, diabetes, angina, depression) that have been diagnosed by your doctor?
• Yes • No • I don't know • I prefer not to answer this question
[Go to Question M5]

M4. If you have other chronic diseases in addition to your rheumatoid arthritis, can you list these diseases?
1.   2.   3.   4.   5.   6.   7.   8.

M7. Did you stop taking any ARTHRITIS MEDICATIONS during the PAST 3 MONTHS, regardless of reason?
• Yes • No [Go to Question M8]

M7 (continued) IF YES, please complete ALL THE BLANKS ON THE LINE for any medications that you have stopped and tell us about the medicine you are taking instead. (These medications should also be listed in question M6.)
Name of Medication You Stopped / If Stopped, Why? /
Did You Start Another Medication to Replace It? • Yes • No If Yes, Which Medication? M8. Overall, how would you describe the severity of your rheumatoid arthritis? [Please check one] • Very Mild • Mild • Moderate • Severe • Very Severe M9. Overall, how would you classify the control of your rheumatoid arthritis? [Please check one] • Very Well Controlled • Well Controlled • Adequately Controlled • Not Well Controlled • Not Controlled At All M10. Below is a line with '0' at the left-hand end and a '1' at the right-hand end. The '0' represents \"death\", the '1' represents \"perfect health\", and the area in between represents a state of health somewhere in between. Make a mark on the line at the point that you feel represents how you feel today. 0 Death 1 Perfect Health SECTION IV: DRUG SIDE EFFECT It is very important that we get the most accurate information that we possibly can. Some of the questions ask for very specific details, and deal with some things that may have happened over the three months, so please take your time and try to answer the questions as accurately as possible. Once again, please remember that any answers you give are completely confidential. S1. Over the PAST THREE MONTHS have you had any side effects from your rheumatoid arthritis medications? • Yes • No [Go to SECTION V] If YES, complete the rest of this section. DIRECTIONS 1. Write in the name of the drug causing the side effect(s). 2. Indicate whether you stopped the drug. 3. List side effect(s) for each drug. Please list any abnormal laboratory findings such as low white blood count, protein in urine, low platelets, kidney problems, anemia, liver problems. 4. Check the severity of each side effect. 5. Please indicate how important the side effect was to you by making a mark on the scale from 0 to 1, where 0 is \"Not at all\" and 1 is \"Very Much\". 6. If you need more room, please use the back of this questionnaire. A. 1. Drug Name: 2. 
Did you stop taking the drug because of a side effect? • Yes • No 3. List side effect 4. Severity of side effect • Mild • Moderate • Severe 5. How important was this side effect TO YOU? 0 Not at all 1 Very Much 1. Drug Name: 2. Did you stop taking the drug because of a side effect? • Yes • No 3. List side effect 4. Severity of side effect • Mild • Moderate • Severe 5. How important was this side effect TO YOU? 0 Not at all 1 Very Much 1. Drug Name: 2. Did you stop taking the drug because of a side effect? • Yes • No 3. List side effect 4. Severity of side effect • Mild • Moderate • Severe 5. How important was this side effect TO YOU? 0 Not at all 1 Very Much SECTION V: RHEUMATOID ARTHRITIS CLINICAL MEASURES This section asks questions regarding your rheumatoid arthritis that are commonly determined to see [...] controlled. You may decline to answer any questions, however please remember that it is very important that we get the most accurate and complete information we can. C1. Below is a line with '0' at the left-hand end and a '1' at the right-hand end. The '0' represents \"VERY WELL\", the '1' represents \"VERY POOR\", and the area in between represents a state of health somewhere in between. Considering all the ways that your arthritis affects you, rate how you are doing on the following scale by placing a mark on the line. 0 Very Well 1 Very Poor C2. Below is a line with '0' at the left-hand end and a '1' at the right-hand end. The '0' represents \"NO PAIN\", the '1' represents \"SEVERE PAIN\", and the area in between represents a state of pain somewhere in between. Make a mark on the line at the point that you feel represents how much pain you have had because of your arthritis IN THE PAST WEEK. 0 No Pain 1 Severe Pain C3. The following two pages are pictures of mannequins to help us determine your number of tender/painful joints (FIRST MANNEQUIN) and swollen joints (SECOND MANNEQUIN). 
You may decline to answer any question, however please remember that it is very important that we get the most accurate and complete information we can, and that all the information you provide is completely confidential. Indicate with an \"X\" in the circles below any joints which are PAINFUL or [...]. To test for pain, move your joints in a full range of motion and then squeeze the joint between your thumb and forefinger. [Figure: first mannequin, for marking tender/painful joints. Labelled joints: breast bone joint, jaw joint, shoulder, elbows, knuckles, mid-finger joints, middle of the foot, joints at base of toes, big toes.] To test for SWELLING, notice if your joint appears larger or bulging and squeeze to see if it feels like it is full of liquid or like a sponge. Indicate with an \"X\" in the circles below any joints which are swollen at present. [Figure: second mannequin, for marking swollen joints. Labelled joints: breast bone joint, jaw joint, shoulder, knuckles, mid-finger joints, middle of the foot, joints at base of toes, big toes.] Section VI Please choose the response that applies best to you at this time. Check only one box for each statement. 1. I have to go to bed earlier than I would like to • Yes • No 2. I'm afraid of people touching me • Yes • No 3. It's difficult to find comfortable shoes that I like • Yes • No 4. I avoid crowds because of my arthritis • Yes • No 5. I have difficulty getting dressed • Yes • No 6. I find it difficult walking from store to store • Yes • No 7. Household chores take me a long time • Yes • No 8. I sometimes have problems using the toilet • Yes • No 9. I often get frustrated • Yes • No 10. I frequently have to stop what I am doing to rest • Yes • No 11. I have difficulty using a knife and fork • Yes • No Please choose the response that applies best to you at this time. 12. I find it hard to concentrate 13. Sometimes I just want to be left alone 14. I have difficulty walking very far 15. 
I try to avoid shaking hands with people 16. I often get depressed 17. I'm unable to join in activities with my family or friends 18. I have difficulty taking a bath/shower (Please answer for the one you usually use) 19. Sometimes I have a good cry because of my arthritis 20. My arthritis limits the places I can go 21. Any amount of activity I do makes me feel tired 22. I feel dependent on others 23. My arthritis is constantly on my mind 24. I often get angry with myself (Response options for each of statements 12 to 24: • Yes • No) Please choose the response that applies best to you at this time. 25. It's too much effort to go out and see people • Yes • No 26. I sleep poorly at night • Yes • No 27. I find it difficult to take care of the people I am close to • Yes • No 28. I feel unable to control my arthritis • Yes • No 29. I avoid physical contact • Yes • No 30. I'm limited in the clothes I can wear • Yes • No This survey asks for your views about your health. This information will help you keep track of how you feel and how well you are able to do your usual activities. Answer every question by selecting the answer as indicated. If you are unsure about how to answer a question, please give the best answer you can. Please check the level that you feel represents your health in the following categories (i.e. 
Physical Functioning, Role Limitations, Social Functioning, Pain, Mental Health, Vitality) Physical Functioning • Your health does not limit you in vigorous activities • Your health limits you a little in vigorous activities • Your health limits you a little in moderate activities • Your health limits you a lot in moderate activities • Your health limits you a little in bathing and dressing • Your health limits you a lot in bathing and dressing Role limitations • You have no problems with your work or other regular daily activities as a result of your physical health or any emotional problems • You are limited in the kind of work or other activities as a result of your physical health • You accomplish less than you would like as a result of emotional problems • You are limited in the kind of work or other activities as a result of your physical health and accomplish less than you would like as a result of emotional problems Pain • You have no pain • You have pain but it does not interfere with your normal work (both outside the home and housework) • You have pain that interferes with your normal work (both outside the home and housework) a little bit • You have pain that interferes with your normal work (both outside the home and housework) moderately • You have pain that interferes with your normal work (both outside the home and housework) quite a bit • You have pain that interferes with your normal work (both outside the home and housework) extremely Mental health • You feel tense or downhearted and low none of the time • You feel tense or downhearted and low a little of the time • You feel tense or downhearted and low some of the time • You feel tense or downhearted and low most of the time • You feel tense or downhearted and low all of the time Social Functioning • Your health limits your social activities none of the time • Your health limits your social activities a little of the time • Your health limits your social activities some of the time • Your 
health limits your social activities most of the time • Your health limits your social activities all of the time Vitality • You have a lot of energy all of the time • You have a lot of energy most of the time • You have a lot of energy some of the time • You have a lot of energy a little of the time • You have a lot of energy none of the time EUROQOL (Page 1 of 2) Your own health state today By placing a tick in one box in each group below, please indicate which statements best describe your own health state today. Mobility • I have no problems in walking about • I have some problems in walking about • I am confined to bed Self-Care • I have no problems with self-care • I have some problems washing or dressing myself • I am unable to wash or dress myself Usual Activities (e.g. work, study, housework, family or leisure activities) • I have no problems with performing my usual activities • I have some problems with performing my usual activities • I am unable to perform my usual activities Pain/Discomfort • I have no pain or discomfort • I have moderate pain or discomfort • I have extreme pain or discomfort Anxiety/Depression • I am not anxious or depressed • I am moderately anxious or depressed • I am extremely anxious or depressed EUROQOL (Page 2 of 2) Your own health state today To help people say how good or bad a health state is, we have drawn a scale (rather like a thermometer) on which the best state you can imagine is marked 100 and the worst state you can imagine is marked 0. We would like you to indicate on this scale how good or bad your own health is today, in your opinion. Please do this by drawing a line from the box below to whichever point on the scale indicates how good or bad your health state is. 
[Figure: vertical visual analogue scale running from 100 (best imaginable health state) at the top to 0 (worst imaginable health state) at the bottom, with a box labelled \"Your own health state today\".] Health Utilities Index Mark 2 and Mark 3 Since the HUI2 and HUI3 are proprietary, rather than the full instruments, the attributes and levels for each scale (with a description) have been included below. Health status classification system: HUI2 Attribute Level Description SENSORY 1 Able to see, hear and speak normally for age 2 Requires equipment to see or hear or speak 3 Sees, hears, or speaks with limitations even with equipment 4 Blind, deaf or mute MOBILITY 1 Able to walk, bend, lift, jump and run normally for age 2 Walks, bends, lifts, jumps or runs with some limitations but does not require help 3 Requires mechanical equipment (such as canes, crutches, braces or wheelchair) to walk or get around independently 4 Requires the help of another person to walk or get around and requires mechanical equipment as well 5 Unable to control or use arms and legs EMOTION 1 Generally happy and free from worry 2 Occasionally fretful, angry, irritable, anxious, depressed, or suffering \"night terrors\" 3 Often fretful, angry, irritable, anxious, depressed or suffering \"night terrors\" 4 Almost always fretful, angry, irritable, anxious, depressed 5 Extremely fretful, angry, irritable or depressed usually requiring hospitalization or psychiatric institutional care COGNITIVE 1 Learns and remembers school work normally for age 2 Learns and remembers school work more slowly than classmates as judged by parents and/or teachers 3 Learns and remembers very slowly and usually requires special educational assistance 4 Unable to learn and remember SELF-CARE 1 Eats, bathes, dresses and uses the toilet normally for age 2 Eats, bathes, dresses or uses the toilet independently with difficulty 3 Requires mechanical equipment to eat, bathe, dress or use the toilet independently 4 Requires the help of another person 
to eat, bathe, dress or use the toilet PAIN 1 Free of pain and discomfort 2 Occasional pain. Discomfort relieved by non-prescription drugs or self-control activity without disruption of normal activities 3 Frequent pain. Discomfort relieved by oral medicines with occasional disruption of normal activities 4 Frequent pain; frequent disruption of normal activities. Discomfort requires prescription narcotics for relief 5 Severe pain. Pain not relieved by drugs and constantly disrupts normal activities Health status classification system: HUI3 VISION 1 Able to see well enough to read ordinary newsprint and recognize a friend on the other side of the street, without glasses or contact lenses. 2 Able to see well enough to read ordinary newsprint and recognize a friend on the other side of the street, but with glasses. 3 Able to read ordinary newsprint with or without glasses but unable to recognize a friend on the other side of the street, even with glasses. 4 Able to recognize a friend on the other side of the street with or without glasses but unable to read ordinary newsprint, even with glasses. 5 Unable to read ordinary newsprint and unable to recognize a friend on the other side of the street, even with glasses. 6 Unable to see at all. HEARING 1 Able to hear what is said in a group with at least three other people, without a hearing aid. 2 Able to hear what is said in a conversation with one other person in a quiet room without a hearing aid, but requires a hearing aid to hear what is said in a group conversation with at least three other people. 3 Able to hear what is said in a conversation with one other person in a quiet room with a hearing aid, and able to hear what is said in a group conversation with at least three other people, with a hearing aid. 
4 Able to hear what is said in a conversation with one other person in a quiet room, without a hearing aid, but unable to hear what is said in a group conversation with at least three other people even with a hearing aid. 5 Able to hear what is said in a conversation with one other person in a quiet room with a hearing aid, but unable to hear what is said in a group conversation with at least three other people even with a hearing aid. 6 Unable to hear at all. SPEECH 1 Able to be understood completely when speaking with strangers or friends. 2 Able to be understood partially when speaking with strangers but able to be understood completely when speaking with people who know me well. 3 Able to be understood partially when speaking with strangers or people who know me well. 4 Unable to be understood when speaking with strangers but able to be understood partially by people who know me well. 5 Unable to be understood when speaking to other people (or unable to speak at all). AMBULATION 1 Able to walk around the neighbourhood without difficulty, and without walking equipment. 2 Able to walk around the neighbourhood with difficulty; but does not require walking equipment or the help of another person. 3 Able to walk around the neighbourhood with walking equipment, but without the help of another person. 4 Able to walk only short distances with walking equipment, and requires a wheelchair to get around the neighbourhood. 5 Unable to walk alone, even with walking equipment; able to walk short distances with the help of another person, and requires a wheelchair to get around the neighbourhood. 6 Cannot walk at all. DEXTERITY 1 Full use of two hands and ten fingers. 2 Limitations in the use of hands or fingers, but does not require special tools or help of another person. 3 Limitations in the use of hands or fingers, is independent with use of special tools (does not require the help of another person). 
4 Limitations in the use of hands or fingers, requires the help of another person for some tasks (not independent even with use of special tools). 5 Limitations in use of hands or fingers, requires the help of another person for most tasks (not independent even with use of special tools). 6 Limitations in use of hands or fingers, requires the help of another person for all tasks (not independent even with use of special tools). EMOTION 1 Happy and interested in life. 2 Somewhat happy. 3 Somewhat unhappy. 4 Very unhappy. 5 So unhappy that life is not worthwhile. COGNITION 1 Able to remember most things, think clearly and solve day to day problems. 2 Able to remember most things, but have a little difficulty when trying to think and solve day to day problems. 3 Somewhat forgetful, but able to think clearly and solve day to day problems. 4 Somewhat forgetful, and have a little difficulty when trying to think or solve day to day problems. 5 Very forgetful, and have great difficulty when trying to think or solve day to day problems. 6 Unable to remember anything at all, and unable to think or solve day to day problems. PAIN 1 Free of pain and discomfort. 2 Mild to moderate pain that prevents no activities. 3 Moderate pain that prevents a few activities. 4 Moderate to severe pain that prevents some activities. 5 Severe pain that prevents most activities. For the standard version of the HUI Mark 2 and 3 self-administered, self-assessed \"one-week\" health status assessment contact: Health Utilities Inc., Dundas ON, Canada L9H 2V3, phone (905)525-9140 url: Health Assessment Questionnaire Please place an \"X\" in the box which best describes your usual abilities OVER THE PAST WEEK: DRESSING & GROOMING WITHOUT ANY DIFFICULTY Are you able to: Dress yourself, including shoelaces and buttons? Shampoo your hair? 
WITH SOME DIFFICULTY WITH MUCH DIFFICULTY UNABLE TO DO Are you able to: Stand up from a straight chair? Get in and out of bed? Are you able to: Cut your meat? Lift a full cup or glass to your mouth? Open a new milk carton? Are you able to: Walk outdoors on flat ground? Climb up five steps? Please check any AIDS OR DEVICES that you usually use for any of the above activities: • Devices used for dressing (buttonhook, zipper pull, etc.) • Built up or special utensils • Crutches • Cane • Wheelchair • Special or built up chair • Walker Please check any categories for which you usually need HELP FROM ANOTHER PERSON: • Dressing and Grooming • Arising • Eating • Walking Please place an \"X\" in the box which best describes your usual abilities OVER THE PAST WEEK: Are you able to: Wash and dry your body? Take a tub bath? Get on and off the toilet? Are you able to: Reach and get down a 5 pound object (such as a bag of sugar) from above your head? Bend down to pick up clothing from the floor? Are you able to: Open car doors? Open previously opened jars? Turn faucets on and off? ACTIVITIES Are you able to: Run errands and shop? Get in and out of a car? Do chores such as vacuuming or yard work? 
WITHOUT ANY DIFFICULTY WITH SOME DIFFICULTY WITH MUCH DIFFICULTY UNABLE TO DO Please check any AIDS OR DEVICES that you usually use for any of these activities: • Raised toilet seat • Bathtub bar • Long-handled appliances for reach • Bathtub seat • Long-handled appliances in bathroom • Jar opener (for jars previously opened) Please check any categories for which you usually need HELP FROM ANOTHER PERSON: • Hygiene • Reach • Gripping and opening things • Errands and chores APPENDIX V RHEUMATOID ARTHRITIS ASSESSMENT QUESTIONNAIRE: 3 MONTHS (SELF-ADMINISTERED) The content of the 3 month questionnaire was the same as the baseline questionnaire (from Section III onwards) and only the first page (see below) was added. Please print within the specified boxes (if possible) and mark all checked boxes with an \"X\" 1. Overall, how would you describe changes in your rheumatoid arthritis since answering our LAST questionnaire (about 3 months ago)? [Please check one] • Much Worse • Somewhat Worse • The Same • Somewhat Better • Much Better APPENDIX VI RHEUMATOID ARTHRITIS ASSESSMENT QUESTIONNAIRE: 6 MONTHS (SELF-ADMINISTERED) The content of the 6 month questionnaire was the same as the baseline questionnaire (from Section III onwards) and only the first page (see below) was added. Please print within the specified boxes (if possible) and mark all checked boxes with an \"X\" 1. Overall, how would you describe changes in your rheumatoid arthritis since answering our LAST questionnaire (about 3 months ago)? [Please check one] • Much Worse • Somewhat Worse • The Same • Somewhat Better • Much Better 2. Overall, how would you describe changes in your rheumatoid arthritis since answering our FIRST questionnaire (about 6 months ago)? 
[Please check one] • Much Worse • Somewhat Worse • The Same • Somewhat Better • Much Better "@en ; edm:hasType "Thesis/Dissertation"@en ; vivo:dateIssued "2004-05"@en ; edm:isShownAt "10.14288/1.0091775"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Health Care and Epidemiology"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en ; ns0:scholarLevel "Graduate"@en ; dcterms:title "Outcome measures in economic evaluations of rheumatoid arthritis"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/15983"@en .