UBC Theses and Dissertations

Outcome measures in economic evaluations of rheumatoid arthritis Marra, Carlo Armando 2004

OUTCOME MEASURES IN ECONOMIC EVALUATIONS OF RHEUMATOID ARTHRITIS

by

CARLO ARMANDO MARRA

B.Sc. (Pharm.), University of British Columbia, 1992
Pharm.D., University of British Columbia, 1995

A THESIS SUBMITTED IN THE PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
Department of Health Care and Epidemiology

We accept this thesis as conforming to the required standard

Aslam H. Anis
John M. Esdaile
Jacek Kopec
Anthony E. Boardman

THE UNIVERSITY OF BRITISH COLUMBIA
May 06, 2004
© Carlo Armando Marra, 2004

The work presented in this thesis was conceived, conducted and disseminated by the doctoral candidate. The co-authors of the manuscripts that comprise part of this dissertation made contributions only as is commensurate with a dissertation committee or as experts in a specific area as it pertains to the work. The co-authors provided direction and support. The co-authors reviewed each manuscript prior to submission for publication and offered critical evaluations; however, the candidate was responsible for the writing and the final content of the manuscripts.

Aslam H. Anis, Ph.D., Chair, Supervisory Committee

ABSTRACT

Objectives: The primary objectives of this study were to: 1) compare the properties of commonly utilized indirect utility assessment instruments (the Health Utilities Index Mark 2 and 3 [HUI2 and HUI3], the EuroQol [EQ-5D], and the Short Form 6-D [SF-6D]) in terms of feasibility, reliability, construct validity and longitudinal construct validity (responsiveness) in rheumatoid arthritis (RA); and 2) determine whether, when the scores from the different instruments are used as quality weights in the estimation of quality adjusted life years (QALYs) in an economic evaluation, they result in different incremental cost per QALY ratios.

The primary hypotheses of this study were that there would be differences between these instruments in terms of their properties and that using their scores to estimate QALYs in an economic evaluation of an intervention for RA would result in significantly different estimates.

Methods: Three hundred and twenty patients between 19 and 90 years of age diagnosed with RA and residing in the Greater Vancouver Regional District or the rural Okanagan region of British Columbia were recruited. Patients were administered a questionnaire containing the HUI2, HUI3, EQ-5D, SF-6D, a disease-specific instrument (the Rheumatoid Arthritis Quality of Life [RAQoL] questionnaire), and a disability index (the Health Assessment Questionnaire [HAQ]). In addition, questions were asked regarding RA management (including drug use and toxicity), RA severity (including swollen and tender joints, pain visual analogue scale [VAS], RA duration, patient global assessment of disease activity VAS, and self-perceived RA severity and control), socio-economic status, and RA health utilization. Questionnaires were administered at baseline and at three and six months thereafter. In a subset of patients, an additional questionnaire was administered within five weeks of the three-month questionnaire to determine reliability.

Results: Scores obtained with the HUI2, HUI3, EQ-5D, and the SF-6D were significantly different, had low agreement, and appeared to be measuring mostly physical function and pain. All the instruments displayed cross-sectional construct validity and were able to discriminate between different levels of severity of RA. However, when their scores were used to estimate QALYs in an economic evaluation of RA, there was a two-fold difference between the lowest (using the HUI3) and highest (using the SF-6D) incremental cost per QALY ratios.
Further examination revealed that the scores achieved with the indirect utility assessment instruments were influenced by annual household income despite adjustment for RA severity and other chronic diseases. Finally, in longitudinal analyses, the disease-specific RAQoL displayed the highest reliability and sensitivity to change, with the HUI3 and SF-6D scores being the most responsive of the indirect utility assessment instruments in measuring positive change.

Conclusions: Although all indirect utility assessment measures appear to be able to assess generic health-related quality of life (HRQL) in RA, when used as quality weights to estimate QALYs in an economic evaluation, they yielded vastly different estimates of the incremental cost-effectiveness ratio that could result in different policy recommendations. The scores of these instruments could also be influenced by income, leading to possible bias in cost-effectiveness analyses. The HUI3 and SF-6D were responsive to positive changes in RA. The RAQoL displayed excellent properties and is a suitable disease-specific HRQL instrument for RA.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
  1.1 RHEUMATOID ARTHRITIS: EPIDEMIOLOGY, ECONOMIC BURDEN AND ECONOMIC EVALUATION
  1.2 RESEARCH NEEDS AND STUDY JUSTIFICATION
  1.3 STUDY HYPOTHESIS, OBJECTIVES, AND THESIS ORGANIZATION
  1.4 SUMMARY
  1.5 REFERENCES
CHAPTER 2: BACKGROUND
  2.1 COST-EFFECTIVENESS ANALYSIS AND THE QALY
  2.2 PREFERENCE-BASED, INDIRECT UTILITY ASSESSMENT MEASURES
    2.2.1 The Health Utilities Index (HUI) Mark 2 and 3
    2.2.2 The EuroQol (EQ-5D)
    2.2.3 The Short Form 6D (SF-6D)
  2.3 EMPIRIC COMPARISONS BETWEEN THE INDIRECT UTILITY ASSESSMENT INSTRUMENTS
    2.3.1 Comparisons between the Health Utilities Index Mark 2 and Mark 3
    2.3.2 Comparisons across Indirect Utility Assessment Instruments Outside of Musculoskeletal Diseases
    2.3.3 Comparisons across Indirect Utility Assessment Instruments within Musculoskeletal Diseases
  2.4 QUALITY WEIGHTINGS IN THE ESTIMATION OF QALYS IN COST-UTILITY ANALYSES IN RA: WHAT ARE INVESTIGATORS USING?
  2.5 SUMMARY
  2.6 REFERENCES
CHAPTER 3: A COMPARISON OF FOUR INDIRECT METHODS OF ASSESSING UTILITY VALUES IN RHEUMATOID ARTHRITIS
  3.1 FOREWORD
  3.2 INTRODUCTION
  3.3 METHODS
    3.3.1 Measures
    3.3.2 Data Analysis
  3.4 RESULTS
    3.4.1 Comparison of Utility Scores
    3.4.2 Analysis of Agreement
    3.4.3 Exploratory Factor Analysis
  3.5 DISCUSSION
  3.6 REFERENCES
CHAPTER 4: COMPARISON OF GENERIC, INDIRECT UTILITY MEASURES (THE HUI2, HUI3, SF-6D, AND THE EQ-5D) AND DISEASE-SPECIFIC INSTRUMENTS (THE RAQOL AND THE HAQ) IN RHEUMATOID ARTHRITIS
  4.1 FOREWORD
  4.2 INTRODUCTION
  4.3 METHODS
    4.3.1 Sample
    4.3.2 Measures
    4.3.3 Data Analysis
  4.4 RESULTS
    4.4.1 Sample
    4.4.2 Description of Global and Single-Attribute Utilities
    4.4.3 Construct Validity
  4.5 DISCUSSION
  4.6 REFERENCES
CHAPTER 5: NOT ALL QALYS ARE EQUAL: THE IMPACT OF USING DIFFERENT INDIRECT UTILITY MEASURES ON ESTIMATING THE COST-UTILITY OF INFLIXIMAB IN RHEUMATOID ARTHRITIS
  5.1 FOREWORD
  5.2 INTRODUCTION
  5.3 METHODS
    5.3.1 Clinical Trial Data Source
    5.3.2 Overview of Model
    5.3.3 Transition Probability Matrices and Statistical Modeling
    5.3.4 Mortality Rate
    5.3.5 Utilities and QALYS
    5.3.6 Cost Estimation
    5.3.7 Survival Analysis
    5.3.8 Cost-Utility and Probabilistic Analysis
    5.3.9 Univariate Sensitivity Analysis
  5.4 RESULTS
    5.4.1 Simulation Results
    5.4.2 Utility and QALY Values
    5.4.3 Cost-Utility and Probabilistic Analysis
    5.4.4 Traditional Sensitivity Analysis
  5.5 DISCUSSION
  5.6 REFERENCES
CHAPTER 6: THE IMPACT OF LOW FAMILY INCOME ON SELF-REPORTED HEALTH OUTCOMES IN PATIENTS WITH RHEUMATOID ARTHRITIS WITHIN A PUBLICLY-FUNDED HEALTH CARE ENVIRONMENT
  6.1 FOREWORD
  6.2 INTRODUCTION
  6.3 METHODS
    6.3.1 Study Sample and Design
    6.3.2 Generic Health-Related Quality of Life Measurement
    6.3.3 Functional Status Measurement
    6.3.4 RA Specific Quality of Life Measure
    6.3.5 Clinical Measurements
    6.3.6 Socioeconomic Status
    6.3.7 Statistical Analysis
  6.4 RESULTS
  6.5 DISCUSSION
  6.6 REFERENCES
CHAPTER 7: ARE INDIRECT UTILITY MEASURES RELIABLE AND RESPONSIVE IN RHEUMATOID ARTHRITIS PATIENTS?
  7.1 FOREWORD
  7.2 INTRODUCTION
  7.3 METHODS
    7.3.1 Study Sample
    7.3.2 Measures
    7.3.3 Data Analysis
  7.4 RESULTS
    7.4.1 Demographics and Missing Values
    7.4.2 Reliability
    7.4.3 Validity of the Transition Questionnaire
    7.4.4 Responsiveness
    7.4.5 Flexible Polytomous Regression Techniques
    7.4.6 Change in Unweighted Domain Scores (EQ-5D, SF-6D) and Single Attribute Utilities (HUI2, HUI3)
  7.5 DISCUSSION
  7.6 REFERENCES
CHAPTER 8: GENERAL DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS
  8.1 SUMMARY OF STUDY FINDINGS
  8.2 UNIQUE CONTRIBUTIONS, IMPACT, AND IMPLICATIONS
  8.3 STUDY STRENGTHS AND LIMITATIONS
    8.3.1 Strengths
    8.3.2 Limitations
  8.4 RECOMMENDATIONS
    8.4.1 Further Research
  8.5 CONCLUSIONS
  8.6 REFERENCES
APPENDIX I
APPENDIX II
APPENDIX III
APPENDIX IV
APPENDIX V
APPENDIX VI

LIST OF TABLES

TABLE 2.1: SOURCE OF PREFERENCES USED FOR QALY WEIGHTS IN ECONOMIC EVALUATIONS OF RA
TABLE 3.1: COMPARISON OF THE INDIRECT UTILITY ASSESSMENT INSTRUMENTS
TABLE 3.2: CLINICAL CHARACTERISTICS OF THE STUDY PARTICIPANTS
TABLE 3.3: OVERALL MEAN AND MEDIAN UTILITY SCORES FROM THE INSTRUMENTS IN THE SAMPLE OF RA PATIENTS
TABLE 3.4: INTRACLASS CORRELATIONS AND 95% CONFIDENCE INTERVALS BETWEEN INSTRUMENTS
TABLE 3.5: ROTATED FACTOR PATTERN MATRIX
TABLE 3.6: FACTOR CORRELATION MATRIX
TABLE 3.7: RELATIVE PRATT INDEX SCORES ASSESSING RELATIVE CONTRIBUTION OF EACH FACTOR TO THE MODEL'S ADJUSTED R2
TABLE 4.1: OVERVIEW OF MAUT INSTRUMENT PROPERTIES
TABLE 4.2: CHARACTERISTICS OF THE STUDY PARTICIPANTS
TABLE 4.3: MULTI-ATTRIBUTE AND SINGLE ATTRIBUTE UTILITY SCORES FROM THE MAUT INSTRUMENTS
TABLE 4.4: DOMAIN RESPONSES FOR THE MAUT INSTRUMENTS
TABLE 4.5: RELATIONSHIP BETWEEN RA SEVERITY AND CONTROL AND THE GLOBAL UTILITY SCORES FOR EACH OF THE MAUT INSTRUMENTS
TABLE 4.6: DICHOTOMOUS MEASURES OF RA SEVERITY
TABLE 4.7: CORRELATIONS (SPEARMAN'S RHO) FOR MULTI-ATTRIBUTE AND SELECT SINGLE ATTRIBUTE UTILITY SCORES WITH RA SEVERITY
TABLE 4.8: SIMPLE LINEAR REGRESSION ANALYSES FOR OVERALL INSTRUMENT SCORES AND HAQ
TABLE 5.1: OBSERVED TRANSITION PROBABILITY MATRICES FOR METHOTREXATE FROM THE ATTRACT TRIAL (FROM WEEK 30 TO WEEK 54)
TABLE 5.2: OBSERVED TRANSITION PROBABILITY MATRICES FOR INFLIXIMAB FROM THE ATTRACT TRIAL (FROM WEEK 30 TO WEEK 54)
TABLE 5.3: CALCULATED WEEKLY TRANSITION PROBABILITY MATRIX FOR METHOTREXATE
TABLE 5.4: CALCULATED WEEKLY TRANSITION PROBABILITY MATRIX FOR INFLIXIMAB
TABLE 5.5: UNIT COSTS (IN CANADIAN DOLLARS), OTHER PARAMETERS AND EQUATIONS IN THE MARKOV MODEL
TABLE 5.6: MULTIPLE LINEAR REGRESSION MODELS OF THE INDIRECT UTILITY MEASURES
TABLE 5.7: DISCOUNTED QALYS GENERATED BY INDIRECT UTILITY METHOD IN THE MARKOV MODEL
TABLE 5.8: EXPECTED COSTS AND INCREMENTAL COST-UTILITY RATIOS GENERATED BY THE INDIRECT UTILITY METHODS
TABLE 5.9: UNIVARIATE SENSITIVITY ANALYSIS - INCREMENTAL COST-UTILITY RATIO (INCREMENTAL COST PER QALY) BY INDIRECT UTILITY METHOD
TABLE 6.1: CHARACTERISTICS OF THE STUDY PARTICIPANTS (N = 313)
TABLE 6.2: PROPERTIES OF THE MEASURES OF SOCIOECONOMIC STATUS (SES) IN OUR SAMPLE
TABLE 6.3: UNIVARIATE ASSOCIATIONS WITH THE GENERIC HRQL MEASURES (THE SF-6D AND THE HUI3)
TABLE 6.4: UNIVARIATE ASSOCIATIONS WITH THE GENERIC HRQL MEASURES (THE HUI2 AND THE EQ-5D)
TABLE 6.5: UNIVARIATE ANALYSIS WITH THE DISEASE-SPECIFIC MEASURES (THE HAQ AND THE RAQOL)
TABLE 6.6: COMPARISON OF RA AND HEALTH STATUS MEASURES ACROSS DIFFERENT SOCIAL CLASSES
TABLE 7.2: TEST-RETEST RELIABILITY
TABLE 7.3: INTRACLASS CORRELATION COEFFICIENT VALUES FOR GENERIC AND DISEASE-SPECIFIC HRQL MEASURES FOR THOSE REPORTING NO CHANGE IN THEIR RHEUMATOID ARTHRITIS BETWEEN 0 AND 6 MONTHS
TABLE 7.4: MINIMALLY IMPORTANT DIFFERENCES REPORTED IN THE LITERATURE AND DERIVED FROM THE SAMPLE USING ANCHOR-BASED APPROACHES
TABLE 7.5: CORRELATIONS BETWEEN THE TRANSITION QUESTION AND CHANGES IN RHEUMATOID ARTHRITIS OUTCOME VARIABLES FROM 0 TO 6 MONTHS
TABLE 7.6: DIFFERENCES AND RESPONSIVENESS STATISTICS FROM BASELINE TO 6 MONTHS STRATIFYING THE SAMPLE BY THE TRANSITION QUESTION
TABLE 7.7: DIFFERENCES AND RESPONSIVENESS STATISTICS FROM BASELINE TO 6 MONTHS STRATIFYING THE CATEGORIES CREATED FROM PATIENT GLOBAL ASSESSMENT OF DISEASE SEVERITY VAS
TABLE 7.8: RANKINGS OF RESPONSIVENESS OF MEASURES ACCORDING TO THE RESPONSIVENESS STATISTIC AND THE EXTERNAL CRITERIA OF CHANGE (EITHER RESPONSES TO THE PATIENT TRANSITION QUESTION OR TO THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY VAS)
TABLE 7.9: ASSOCIATIONS BETWEEN INSTRUMENT UNWEIGHTED DOMAINS / SINGLE ATTRIBUTE SCORE CHANGES AND SELF-REPORTED CHANGE FROM 0 TO 6 MONTHS

LIST OF FIGURES

FIGURE 3.1: DISTRIBUTIONS OF GLOBAL UTILITY VALUES ACROSS THE MAUT INSTRUMENTS
FIGURE 3.2: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND HUI3 VS. THE AVERAGE SCORE WITHIN PATIENTS
FIGURE 3.3: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI3 AND THE SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS
FIGURE 3.4: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI3 AND EQ-5D VS. THE AVERAGE SCORE OF THESE TWO INSTRUMENTS WITHIN PATIENTS
FIGURE 3.5: BLAND-ALTMAN PLOT OF THE DIFFERENCE BETWEEN THE EQ-5D AND SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS
FIGURE 3.6: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND THE SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS
FIGURE 3.7: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND EQ-5D VS. THE AVERAGE SCORE OF THESE TWO INSTRUMENTS WITHIN PATIENTS
FIGURE 4.1: BOX PLOT OF MAUT INSTRUMENT GLOBAL UTILITY SCORES
FIGURE 5.1: A SCHEMATIC REPRESENTATION OF THE MARKOV, HAQ-BASED MODEL USED FOR THE COST-EFFECTIVENESS ANALYSIS
FIGURE 5.2: KAPLAN-MEIER SURVIVAL CURVES FROM THE 100,000 MONTE CARLO SIMULATIONS
FIGURE 5.3: INCREMENTAL COSTS AND QALYS FROM 1000 2ND ORDER MONTE CARLO SIMULATIONS
FIGURE 5.4: COST-UTILITY ACCEPTABILITY CURVES FOR EACH INDIRECT UTILITY MEASURE
FIGURE 6.1: GENERIC HRQL BY SELF-REPORTED ANNUAL INCOME (HUI3 AND SF-6D)
FIGURE 6.2: GENERIC HRQL BY SELF-REPORTED ANNUAL INCOME (HUI2 AND EQ-5D)
FIGURE 6.3: RAQOL SCORE AND HAQ DISABILITY INDEX BY SELF-REPORTED INCOME
FIGURE 7.1: AGREEMENT BETWEEN THE PATIENT TRANSITION QUESTION AND CHANGES USING MID CUTOFFS FOR THE GENERIC AND DISEASE-SPECIFIC INSTRUMENTS
FIGURE 7.2: SCATTERPLOT OF HUI2 UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.3: SCATTERPLOT OF HUI3 UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.4: SCATTERPLOT OF EQ-5D UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.5: SCATTERPLOT OF SF-6D UTILITY SCORES OVER TIME STRATIFIED BY THE RESULTS OF THE COLLAPSED TRANSITION QUESTION
FIGURE 7.6: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI2 AND THE TRANSITION QUESTION
FIGURE 7.7: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI3 AND THE TRANSITION QUESTION
FIGURE 7.8: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE EQ-5D AND THE TRANSITION QUESTION
FIGURE 7.9: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE SF-6D AND THE TRANSITION QUESTION
FIGURE 7.10: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE RAQOL AND THE TRANSITION QUESTION
FIGURE 7.11: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HAQ AND THE TRANSITION QUESTION
FIGURE 7.12: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI2 AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.13: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HUI3 AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.14: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE EQ-5D AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.15: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE SF-6D AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.16: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE RAQOL AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY
FIGURE 7.17: RESULTS OF THE MULTI-RESPONSE MODEL OF THE ASSOCIATION BETWEEN A CHANGE IN THE HAQ AND THE PATIENT GLOBAL ASSESSMENT OF DISEASE ACTIVITY

ACKNOWLEDGEMENTS

A large debt of gratitude is owed to my co-supervisors, Aslam Anis and John Esdaile, for providing me with invaluable mentorship and support. From the outset, they provided sound advice, amazing opportunities and lots of good ideas. I would have been lost without them. Most sincere thanks to my committee members: Stephen Marion, who shared so unselfishly of his time imparting a small fraction of his immense knowledge to me, and Jacek Kopec, for mentorship and counsel that has benefited me greatly and improved the quality of my work. The support of dedicated research assistants made this work possible: Barbara Vinduska, Amir Adel Rashidi, and Janet Pursed. Also, many thanks to the rheumatologists who facilitated recruitment: Robert Offer, Andrew Chalmers, Kamran Shojania, Barry Koehler, Graham Reid, Dan Macleod, Alice Klinkhoff, John Kelsall, Milton Baker, and Diane Lacaille.
Best wishes and heartfelt thanks to all of the study participants. Special thanks to all of the people who shared their time and knowledge with me along the way, especially Daphne Guh, Chris Richardson, and John Woolcott. Also, Larry Lynd, who forged the path ahead of me, motivated me to embark on this career path, and left very large footprints that were difficult to fill. Most of all, I want to thank my amazing family; especially my wife Fawziah, for pushing me to follow my goals and for providing me with unyielding support and advice from the beginning that gave me the necessary strength and motivation to begin and complete this work. Without her many compromises, dedication and commitment, none of this would have been possible. To her, I dedicate this thesis. Finally, Yasmin and Noah, thank you both for the constant reminders about the things that are truly important in life and for keeping me grounded in reality.

This research was generously supported by grants from the Canadian Arthritis Network. Thanks to the Canadian Institutes of Health Research, the Arthritis Society and the Michael Smith Foundation for Health Research for their fellowship support.

CHAPTER 1: INTRODUCTION

1.1 RHEUMATOID ARTHRITIS: EPIDEMIOLOGY, ECONOMIC BURDEN AND ECONOMIC EVALUATION

Rheumatoid arthritis (RA) is a chronic, progressive, inflammatory disease that afflicts approximately 300,000 Canadians [1,2]. The cause of this condition is unknown and there is no cure. This disease affects the physical functioning of patients as well as their psychological and social health [4] and eventually progresses to substantial disability through the loss of mobility, increased morbidity, and premature mortality [5-9]. The incidence of death from cardiovascular disease [10], infection [11] and cancer [12] is significantly higher than that experienced in the general population. RA can occur at any age but its onset peaks between the ages of 40 and 60 years [14].

The prevalence of RA is approximately 0.5 to 1% of the adult population, and the incidence appears to have decreased over the past four decades (from 61.2 per 100,000 in 1955 to 1964 to 32.7 per 100,000 in 1985 to 1994) [2,15]. However, this finding is from one population-based, longitudinal study in a specific geographic area (Rochester, Minnesota) and may not be generalizable to areas with different ethnicity (this sample was 96% white) or environmental exposures. Other prevalence data are difficult to find and are often not population-based. Using administrative databases in British Columbia, a population-based estimate of 27,710 RA cases (mean (SD) age of 64.1 (17) years, 67% women) was identified, translating into a prevalence of 0.76% [16].

Although the incidence of RA may be decreasing, results from epidemiological evaluations show that the premature mortality associated with this condition has not changed over the last several decades despite the development of more effective interventions [2,17,18]. However, a recent analysis determined that the use of methotrexate (MTX) is associated with a substantial survival benefit in patients with RA, even though these patients had worse prognostic factors for mortality prior to being treated with this agent [9]. After adjusting for confounding by indication (specifically, in this case, patients with more severe disease having a higher probability of being prescribed MTX), this beneficial effect on mortality was demonstrated in comparisons with patients who were taking other disease modifying antirheumatic drugs (DMARDs) or patients taking no DMARDs (the adjusted hazard ratio of MTX compared to no DMARDs was 0.2, 95% confidence interval (CI) 0.1 to 0.7). The mortality hazard ratio for comparisons between those using MTX and those with no MTX use (i.e. other DMARDs) was 0.4 (95% CI 0.2 to 0.8) for all-cause mortality, 0.3 (95% CI 0.2 to 0.7) for cardiovascular mortality, and 0.6 (95% CI 0.2 to 1.2) for non-cardiovascular mortality. Thus, from the results of this analysis, it would appear that the application of an effective treatment such as MTX is associated with survival benefits.

In another study examining survival in RA [7], Wolfe et al. demonstrated that the strongest predictor of mortality in this disease group was longitudinal change in the Health Assessment Questionnaire (HAQ). A one standard deviation increase in the HAQ (a higher HAQ represents more severe disease) resulted in a 26.2% greater increase in the odds ratio for mortality compared to the next most powerful predictor of mortality, the patient completed global severity index. Interestingly, a change from the fourth quartile to the first quartile would reduce mortality by an estimated 50% for the HAQ and 33% for the patient completed global severity index. Thus, since the HAQ and other self-reported measures are the most highly predictive of future mortality, lowering the HAQ with effective drug therapy (as shown by Wolfe et al. [9]) or improving self-rated health should result in improved survival in RA. There are many other examples in the literature of the effect of drug therapy on the HAQ in patients with RA; however, these patients have not been followed for long enough to determine if this reduction in HAQ translates into a reduction in mortality [19-23]. Thus, one can only postulate whether the reduction in HAQ due to drug treatment (other than MTX) observed in these trials results in a lower mortality rate.

The severity of RA has been shown to be significantly related to a reduction in health-related quality of life (HRQL) [24,25], and HRQL in RA has been shown to be worse than in other forms of arthritis [26]. The finding that lower HRQL is associated with higher RA severity has been shown using both disease-specific measures such as the Rheumatoid Arthritis Quality of Life Questionnaire (RAQoL) [27,28] and the Arthritis Impact Measurement Scales (AIMS) [29,30], and, to a lesser extent, preference-based measures of HRQL such as the EuroQol (EQ-5D) [31].

As with the humanistic burden that RA exerts in terms of mortality and reduction in HRQL, the economic burden of RA to society is substantial and is thought to rival that of coronary artery disease [32]. Since RA usually starts in the 4th or 5th decade of life [33] and the disease cannot be cured, the costs attributable to this condition are often compounded over several years. There are a number of studies in the literature which attempt to examine the direct and indirect costs associated with RA from a variety of perspectives. The findings of these analyses show a great degree of variability in the estimated costs, partly due to differences in the costing methodologies employed (some utilized charges instead of costs) [47], differences in which variables were included in the ascertainment of direct medical costs, and methodological problems in the determination of cost-of-illness in RA [48]. Despite these limitations, each study concluded that the costs to manage RA were substantial and that, when assessed, the indirect costs were a large portion of the overall costs to manage RA.

Several reviews examining studies that investigate the costs to manage RA have been published [48-50]. Most of the studies included in these reviews were conducted in an era when new, expensive drug therapies for RA, such as the tumour necrosis factor (TNF) alpha blockers and newer non-steroidal anti-inflammatory drugs, were not yet available. Specifically, in a review by Lubeck [49], hospitalizations generally accounted for >60% whereas drug costs generally accounted for <25% of direct costs in RA.
Pugner et al. [50] found that the mean annual direct cost to manage RA was $5,425 (1998 U.S. dollars) and that the median percentages of this total due to hospitalization and drugs were 47% and 16%, respectively. Few studies have attempted to quantify indirect costs but, in those that have [36,39,40,42,43], the estimates have ranged from $1,082 to $37,501 (1996 U.S. dollars) per patient. However, again in the determination of indirect costs, a lack of clarity in how the results were determined and/or important methodological issues make comparisons across the studies difficult.

The only Canadian study that was published in this earlier era of RA drug treatment was by Clarke et al., who examined cohorts of individuals with rheumatoid arthritis in Saskatchewan [42,43]. In this longitudinal study of almost 1000 patients with RA, annual direct and indirect costs were determined to be $4,656 and $1,597, respectively (1994 Canadian dollars). The authors determined resource utilization and direct medical costs by assessing the number of physician visits, medications, diagnostic tests, and inpatient care. Inpatient care was associated with almost two-thirds of the direct medical costs. Indirect costs were assessed through productivity loss using the human capital approach. Since most of the patients were >60 years of age and no longer considered themselves to be in the work force, the authors did not include these individuals in the calculation of indirect costs, resulting in small estimates. However, the results of these studies are no longer accurate because of the introduction and increasing use of biological drug therapy (infliximab, etanercept and adalimumab) as well as a new class of nonsteroidal anti-inflammatory drugs (the cyclo-oxygenase [COX] 2 specific inhibitors) [51,a] in the past few years, which have caused medication costs to manage RA to skyrocket [35]. These biological agents are effective but are extremely expensive, with annual acquisition costs from $12,000 to over $20,000 per patient.

[a] Reference 51 was authored by the candidate during the tenure of the doctoral program and has been inserted as Appendix I.

A recent analysis examining costs in the biological era was published by Michaud et al. In a sample of 7,527 patients with RA answering semi-annual questionnaires from January 1999 to December 2001, direct medical costs (calculated from physician and other health professional visits, radiologic examinations, laboratory and other tests, outpatient surgeries, hospitalizations and medications) were determined. In the entire sample, the mean direct cost was $9,519 (2001 U.S. dollars), of which 66% was due to drug costs and 16% and 17% were due to hospital costs and outpatient costs, respectively. In those receiving biological agents, the annual direct cost was $19,016 compared to $6,164 (2001 U.S. dollars) in those not receiving these agents.

The use of these agents may have had an impact on indirect costs as measured by productivity losses as well. In a study by Yelin et al. [52], the association between the use of etanercept and employment outcomes was investigated in a sample of 497 RA patients. In order to ensure eligibility for employment, only patients between 18 and 64 were included in the study. In structured telephone interviews, patients were asked about their employment status in the year of diagnosis (75% employed in the etanercept group vs. 77% in the non-etanercept group) and in the study year (71% employed in the etanercept group vs. 55% in the non-etanercept group). After adjusting for demographics, overall health status, duration of RA, RA status, and occupation type, the difference increased to 20% (95% CI 9% to 32%; 53% vs. 73% employed in the non-etanercept and etanercept groups, respectively).
Thus, it would appear that, at least among those of working age, etanercept has the potential to reduce the indirect costs associated with RA.

With respect to the impact that these new biological agents have on HRQL and HAQ scores, a few studies have shown benefit. Using the Short Form 36 (SF-36), both infliximab and etanercept have been shown to improve HRQL (at least in the short term) over MTX in randomized controlled trials [53,54]. In a recently published observational study of patients either being treated with infliximab or with stable RA, the responsiveness of the SF-36, the EQ-5D, the standard gamble, and the Short Form 6-D (SF-6D) was evaluated [55]. In the group treated with infliximab, large responsiveness indices (effect sizes >0.80) were observed for the SF-36 physical component score and for relevant domains in the SF-36 (bodily pain, physical functioning, role physical, social functioning and vitality). For preference-based measures, the SF-6D was highly responsive (effect size of 1.40), the EQ-5D was moderately responsive (effect size of 0.67) and the standard gamble was poorly responsive (effect size of 0.49). However, the sample size of this study was small (60 patients on infliximab and 24 patients with stable RA), so the results were not conclusive.

Although economic evaluations of pharmacotherapies are not new in RA, there has been an explosion of published cost-effectiveness analyses since the introduction of leflunomide (a new DMARD), new biological agents, pharmacogenetic technologies and COX-2 specific inhibitors [59-70,b]. Although several of these analyses have many shortcomings, a detailed discussion of these is beyond the scope of this chapter. A critical review [71] pertaining to the cost-effectiveness literature of biological agents will soon be available.
However, a couple of limitations of these analyses have direct relevance to this thesis, namely the use of randomized controlled trial data to estimate outcomes and the attempt (or lack thereof) to incorporate HRQL data into the outcome variables. Wolfe et al.72 make a compelling case that the short-term efficacy data derived from randomized controlled trials are not suitable for extrapolation to long-term cost-effectiveness results and that observational drug-treatment databases should be utilized instead. This argument is based on the evidence that treatment outcomes derived from observational databases can often differ from those derived from randomized controlled trials in RA.73,c The second major limitation of many of these cost-effectiveness analyses is the lack of an attempt to integrate preference-based generic HRQL measures into the economic evaluations, or the application of instruments or techniques that have not undergone appropriate testing in RA. For example, many of the analyses report either costs alone or cost-effectiveness ratios using naturalistic units (such as reduction in swollen joints or in proportions of patients improving using standard criteria).56,57,60,61,66,67,69 Other analyses utilized direct preference elicitation techniques such as a visual analogue scale (VAS), time trade-off (TTO) or standard gamble (SG), although the TTO and SG have been shown to be poorly responsive and poorly correlated with clinical outcomes in patients with rheumatoid arthritis.55,74,75 The most commonly applied method to obtain preference scores in the recent cost-effectiveness analyses of biological agents63-65 was the EQ-5D, which has been shown to be both responsive and valid in rheumatoid arthritis.31 Other instruments that are commonly used to derive preference-based scores in cost-utility analyses have not yet been applied in assessing the cost-effectiveness of treatments for rheumatoid arthritis. These instruments include the Health Utilities Index Mark 2 (HUI2) and Mark 3 (HUI3), and the SF-6D.76 However, research in RA and other disease states suggests that scores obtained with these systems are not interchangeable and that the choice of instrument could have a profound impact on the estimation of incremental cost-effectiveness ratios.77,78

b Of note, the candidate authored one of the cost-effectiveness analyses on the pharmacogenetics technologies (reference 66) during his tenure as a doctoral student and it has been included as Appendix II.

c Of note, the candidate authored a paper (reference 73) during the tenure of his doctoral program that showed that the efficacy and toxicity of cyclosporin in an observational database was different than reported in randomized controlled trials and this has been included as Appendix III.

1.2 RESEARCH NEEDS AND STUDY JUSTIFICATION

The pursuit of the appropriate and most efficient use of health care resources has created the need to conduct economic evaluations of new and existing treatments in order to inform decision-making. For diseases such as RA that are chronic and incurable with a documented impact on HRQL, the need to integrate HRQL data into treatment assessment is critical. Compounding this point is the evolving field of RA treatment, which has brought about several new, effective but very expensive agents in the past few years. Thus, through the cost-utility analysis framework, preference-based measures of HRQL can be used to inform resource allocation decisions in health care.79,80 This is done through the calculation of the quality adjusted life year (QALY), which is commonly used in the denominator of the incremental cost-utility ratio calculation. As originally described by Weinstein et al.,82 "the quality adjusted life year approach assigns to each period of time a weight, ranging from 0 to 1, corresponding to the health-related quality of life during that period, where a weight of 1 corresponds to optimal health, and a weight of 0 corresponds to a health state judged to be equivalent to death". Thus, QALYs relating to a health outcome are expressed as the value (weighting) given to a particular health state multiplied by the time spent in that state.

The weightings used in the calculation of QALYs are derived from preferences for health states, which can be measured directly through the application of various methodologies such as the standard gamble (SG), time trade-off (TTO), person trade-off, and rating scales.79,80 However, due to the expense and inconvenience associated with administering many of the direct approaches, generic, preference-based questionnaires have been developed which integrate health into a single index (where death is anchored at zero and perfect health at one). These questionnaires typically consist of a health classification system with an associated scoring formula that assigns preference-weighted values to the health states defined by the classification system and integrates the different aspects of health into a single index. The questionnaires that are most commonly utilized are the Health Utilities Index Mark 2 and 3 (HUI2 and HUI3), the EuroQol (EQ-5D) and the Short Form 6D (SF-6D).79,80 However, despite their widespread use, there have been few comparative studies addressing their strengths, weaknesses and interchangeability.80

Empirical assessment of instruments designed to measure or value HRQL involves examining feasibility, reliability, validity and responsiveness.83 Feasibility refers to the ability of the instrument to be used in practice and accepted by respondents.81 Reliability refers to the stability of responses if the conditions under examination remain unchanged.83 Validity is the extent to which an instrument measures the property it is intended to measure.84 Much of the literature on the validity of HRQL measures focuses on the discriminative properties of instruments, which is also the approach commonly employed for preference-based instruments.85 Another essential property of HRQL instruments is the ability to detect change over time and the extent to which this change is important or meaningful. Finally, it is not clear if these instruments, which all purport to assess the same construct, namely a single index score of HRQL, are interchangeable and, if used as the weights for QALYs in an economic evaluation of pharmacotherapy for RA, would result in comparable outcomes and potentially similar policy decisions.

This study was conceived based on the need to compare the properties of the most commonly utilized indirect, preference-based measures in terms of their cross-sectional construct validity, differences in the aspects of health that they assess, potential biases in their scores in terms of the effects of income, and their sensitivity to change and responsiveness. Chapter 2 provides a detailed review of these preference-based, indirect utility instruments, of studies that compare their properties, and of the use of preference-based measures as QALY weights in economic evaluations of RA.
1.3 STUDY HYPOTHESIS, OBJECTIVES, AND THESIS ORGANIZATION

The overall aim of this study was to evaluate and compare the different properties of the four indirect utility instruments (HUI2, HUI3, SF-6D, EQ-5D) and to assess whether using the scores generated by their different systems in the same cost-effectiveness framework would result in different outcomes. The primary hypothesis of this study was that quality adjusted life year (QALY) estimates obtained using these instruments would be different and would result in different incremental cost-utility ratios and, therefore, potentially different policy decisions.

The first objective of this study was to determine if, on a cross-sectional basis, the indirect utility instruments would yield similar utility values in patients with RA and, if not, whether the assessed domains of health were similar among the instruments.

The second objective was to determine if these indirect utility assessment instruments displayed cross-sectional construct validity in the assessment of patients with RA and how well they compared, in this regard, to the disease-specific RAQoL and to a disability status measure, the HAQ.

The third objective was to determine if the utilization of the different utility values generated by the indirect utility instruments in a cost-utility analysis of a new drug therapy (infliximab plus MTX) compared to usual therapy (MTX alone) for rheumatoid arthritis would result in different estimates of the incremental cost per QALY gained.

The fourth objective was to determine if the results generated by the indirect utility instruments are influenced by socioeconomic status and, if so, could therefore bias the results of cost-utility analyses.

The fifth objective was to determine the longitudinal validity of the instruments in rheumatoid arthritis in terms of their ability to be responsive to changes that patients experience in their RA.
This thesis comprises eight chapters, organized chronologically, addressing each of the objectives in order. This first chapter provides a brief introduction to: 1) the epidemiology of RA; 2) the effect of RA on mortality; 3) direct and indirect costs of RA; 4) the rapidly evolving field of pharmacotherapy for RA; 5) the impact of new strategies on work productivity and HRQL; 6) published cost-effectiveness analyses in RA; and 7) the use of preference-based measure scores as weighting factors for QALYs in economic evaluations of RA. Chapter 2 provides a detailed literature review of indirect utility instruments as weightings for QALYs in the economic evaluation of interventions for RA. Specifically, the indirect utility measures, their properties, their use in the calculation of quality adjusted life years, how their properties have compared in other disease states, and the application of these instruments in cost-utility analyses in RA are reviewed in detail. Chapters 3 and 4 present the cross-sectional analysis of the baseline results from a sample of RA patients who participated in our longitudinal study. Chapter 5 shows how applying these utility instruments in a decision-analytic Markov model for a new pharmacotherapy in RA results in vastly different incremental cost per QALY ratios. Chapter 6 examines how these indirect utility instruments are influenced by annual income and how this could potentially bias economic evaluations of therapies for RA. Chapter 7 presents the longitudinal validity analyses of these instruments in terms of responsiveness. Chapters 3 through 7 and Appendices I, II and III are each stand-alone manuscripts, which have been published, are in press, or are under review by a major, peer-reviewed, scholarly journal.
The work presented in this thesis was conceived, conducted, and disseminated by the doctoral candidate, as has been declared by the co-supervisors of the candidate (Appendix IV). The final chapter provides a summary of the research findings and outlines the strengths, limitations, unique contributions and potential impact of the findings of this study.

1.4 SUMMARY

Approximately 300,000 Canadians have been diagnosed with RA. Due to its chronic, debilitating nature, the direct costs associated with the management of this condition and the indirect costs due to lost employment are substantial. Functional status and HRQL have been shown to be reduced in patients with RA. New therapies, specifically biological DMARDs, have the potential to improve HRQL and functional status, and to offset some of the indirect costs associated with productivity and potentially some of the direct costs of management (such as hospitalizations), although their acquisition costs are high. Therefore, cost-effectiveness analyses of these new agents that integrate appropriate preference-based measures of HRQL into years of life are required. Often, due to their convenience and availability, instruments that estimate society's preferences for health states are used to accomplish this task. However, preliminary evidence suggests that the application of these instruments in economic evaluations could result in vastly different cost-effectiveness outcomes. Therefore, further research is required to compare the scores and properties of these instruments in patients with RA.

This study focused on the comparison of the scores and properties of four indirect utility assessment instruments (the HUI2, the HUI3, the EQ-5D, and the SF-6D) in patients with RA. The evaluation of these instruments in this population required recruitment of a sample of patients with RA for direct comparison of scores and the evaluation of their longitudinal properties.
In addition, a HAQ-based Markov model was created to test the hypothesis that incremental cost per QALY ratios would differ when the various indirect utility instrument scores are used as weightings for QALYs.

1.5 REFERENCES

1. The Arthritis Society. Accessed on the Internet, January 5, 2004 at: http://www.arthritis.ca
2. Doran MF, Pond GR, Crowson CS, O'Fallon WM, Gabriel SE. Trends in the incidence and mortality of rheumatoid arthritis. Arthritis Rheum 2002;46:625.
3. American College of Rheumatology Subcommittee on Rheumatoid Arthritis Guidelines. Guidelines for the Management of Rheumatoid Arthritis. 2002 Update. Arthritis Rheum 2002;46:328-346.
4. Yelin E, Callahan LF, for the National Arthritis Data Work Group. The economic cost and social psychological impact of musculoskeletal conditions. Arthritis Rheum 1995;38:1351-1356.
5. Pincus T, Callahan LF. The 'side effects' of rheumatoid arthritis: Joint destruction, disability, and early mortality. Br J Rheumatol 1993;32(Suppl. 1):28-37.
6. Gabriel SE, Crowson CS, Kremers HM, Doran MF, Turesson C, O'Fallon WM, Matteson EL. Survival in rheumatoid arthritis. A population-based analysis of trends over 40 years. Arthritis Rheum 2003;48:54-58.
7. Wolfe F, Michaud K, Gefeller O, Choi HK. Predicting mortality in patients with rheumatoid arthritis. Arthritis Rheum 2003;48:1530-1542.
8. Wong JB, Ramey DR, Singh G. Long-term morbidity, mortality, and economics of rheumatoid arthritis. Arthritis Rheum 2001;44:2746-2749.
9. Choi HK, Hernan MA, Seeger SD, Robins JM, Wolfe F. Methotrexate and mortality in patients with rheumatoid arthritis: A prospective study. Lancet 2002;359:1173-1177.
10. Mutru O, Laakso M, Isomaki H, Koota K. Cardiovascular mortality in patients with rheumatoid arthritis. Cardiology 1989;76:71-77.
11. Wolfe F, Cathey MA. The assessment and prediction of functional disability in rheumatoid arthritis. J Rheumatol 1991;18:1298-1306.
12. Doran MF, Crowson CS, Pond GR, O'Fallon WM, Gabriel SE. Frequency of infection in patients with rheumatoid arthritis compared with controls: a population-based study. Arthritis Rheum 2002;46:2287-2293.
13. Cibere J, Sibley J, Haga M. Rheumatoid arthritis and the risk of malignancy. Arthritis Rheum 1997;40:1580-1586.
14. Tugwell P. Pharmacoeconomics of drug therapy of rheumatoid arthritis. Rheumatology 2000;39(suppl. 1):43-47.
15. Hochberg MC, Spector TD. Epidemiology of rheumatoid arthritis: update. Epidemiol Rev 1990;12:247-252.
16. Lacaille D, Anis AH, Guh D, Esdaile JM. Assessing the quality of care for RA at the population level. Arthritis Rheum 2002;46(suppl.):s626-s626.
17. Gabriel SE, Crowson CS, O'Fallon WM. Mortality in rheumatoid arthritis: Have we made an impact in 4 decades? J Rheumatol 1999;26:2529-2533.
18. Coste J, Jougla E. Mortality from rheumatoid arthritis in France, 1970-1990. Int J Epidemiol 1994;23:545-552.
19. Scott DL, Strand V. The effects of disease-modifying anti-rheumatic drugs on the Health Assessment Questionnaire score. Lessons from the leflunomide clinical trials database. Rheumatology 2002;41:899-909.
20. Kremer JM. Rational use of new and existing drugs for rheumatoid arthritis. Ann Intern Med 2001;134:695-706.
21. Maini R, St. Clair EW, Breedveld F, Furst D, Kalden J, Weisman M, et al., for the ATTRACT Study Group. Infliximab (chimeric anti-tumour necrosis factor alpha monoclonal antibody) versus placebo in rheumatoid arthritis patients receiving concomitant methotrexate: a randomized Phase III trial. Lancet 1999;354:1932-1939.
22. Boers M, Verhoeven A, Markusse H, Van der Laar M, Westhovens R, Van Denderen J. Randomised comparison of combined step-down prednisolone, methotrexate and sulphasalazine with sulphasalazine alone in early rheumatoid arthritis. Lancet 1997;350:309-318.
23. Moreland L, Schiff M, Baumgartner S, Tindall E, Fleischmann R, Bulpitt K. Etanercept therapy in rheumatoid arthritis: a randomised, controlled trial. Ann Intern Med 1999;130:478-486.
24. Bendtsen P, Akerlind I, Hornquist JO. Assessment of quality of life in rheumatoid arthritis: methods and implications. Pharmacoeconomics 1994;286-298.
25. Nichol MB, Harada ASM. Measuring the effects of medication use on health-related quality of life in patients with rheumatoid arthritis: A review. Pharmacoeconomics 1999;16(5 Pt 1):433-448.
26. Dominick KL, Ahern FM, Gold CH, Heller DA. Health-related quality of life among older adults with arthritis. Health and Quality of Life Outcomes 2004;2:5.
27. Neville C, Whalley D, McKenna S, Le Comte M, Fortin PR. Adaptation and validation of the rheumatoid quality of life scale for use in Canada. J Rheumatol 2001;28:1505-1510.
28. de Jong Z, van der Heijde D, McKenna SP, Whalley D. The reliability and construct validity of the RAQoL: a rheumatoid arthritis-specific quality of life instrument. Br J Rheumatol 1997;36:878-883.
29. Lorish CD, Abraham N, Austin JS, Bradley LA, Alarcon GS. A comparison of the full and short versions of the Arthritis Impact Measurement Scales. Arthritis Care Res 1991;4:168-173.
30. Buchbinder R, Bombardier C, Yeung M, Tugwell P. Which outcome measures should be used in rheumatoid arthritis clinical trials? Clinical and quality-of-life measures' responsiveness to treatment in a randomized controlled trial. Arthritis Rheum 1995;38:1568-1580.
31. Hurst N, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: validity, responsiveness, and reliability of EuroQol (EQ-5D). Br J Rheumatol 1997;36:551-559.
32. Callahan LF. Economics of rheumatoid arthritis. Rheumatoid Arthritis 1999;2:3-5.
33. Maetzel A, Strand V, Tugwell P, Wells G, Bombardier C. Cost effectiveness of adding leflunomide to a 5-year strategy of conventional disease-modifying antirheumatic drugs in patients with rheumatoid arthritis. Arthritis Rheum 2002;47:655-661.
34. Michaud K, Messer J, Choi HK, Wolfe F. Direct medical costs and their predictors in patients with rheumatoid arthritis. A three-year study of 7,527 patients. Arthritis Rheum 2003;48:2750-2762.
35. Ward MM, Javitz HS, Yelin EH. The direct cost of rheumatoid arthritis. Value in Health 2000;4:243-252.
36. Meenan RP, Yelin EH, Heke CJ, et al. The costs of rheumatoid arthritis: A patient-oriented study of chronic disease costs. Arthritis Rheum 1978;21:827-833.
37. Lubeck DP, Spitz PW, Fries JF, et al. A multicentre study of annual health service utilization and costs in rheumatoid arthritis. Arthritis Rheum 1986;29:488-493.
38. Yelin E, Wanke LA. An assessment of the annual and long-term direct costs of rheumatoid arthritis. Arthritis Rheum 1999;42:1209-1218.
39. Stone CE. The lifetime economic costs of rheumatoid arthritis. J Rheumatol 1984;11:819-827.
40. Gabriel SE, Crowson CS, Campion ME, et al. Direct medical costs unique to people with arthritis. J Rheumatol 1997;24:719-725.
41. Yelin E. The costs of rheumatoid arthritis - absolute, incremental, and marginal estimates. J Rheumatol 1996;23:47-51.
42. Clarke AE, Zowall H, Levinton C, et al. Direct and indirect medical costs incurred by Canadian patients with rheumatoid arthritis: A 12 year study. J Rheumatol 1997;24:1051-1060.
43. Clarke A, Levinton C, Joseph L, Penrod J, Zowall H, Sibley J, et al. Predicting the short term direct medical costs incurred by patients with rheumatoid arthritis. J Rheumatol 1999;26:1068-1075.
44. Liang MH, Larson M, Thompson M, et al. Costs and outcomes in rheumatoid arthritis and osteoarthritis. Arthritis Rheum 1984;27:522-529.
45. van Jaarsveld CHM, Jacobs JWG, Schrijvers AJP, et al. Direct cost of rheumatoid arthritis during the first six years: A cost-of-illness study. Br J Rheumatol 1998;37:837-847.
46. Pincus T. The underestimated long term medical and economic consequences of rheumatoid arthritis. Drugs 1995;50(suppl. 1):1-14.
47. Finkler SA. The distinction between costs and charges. Ann Intern Med 1982;96:102-109.
48. Cooper NJ. Economic burden of rheumatoid arthritis: a systematic review. Rheumatology 2000;39:28-33.
49. Lubeck DP. A review of the direct costs of rheumatoid arthritis: managed care versus fee-for-service settings. Pharmacoeconomics 2001;19:811-818.
50. Pugner KM, Scott DL, Holmes JW, Hieke K. The costs of rheumatoid arthritis: an international, long-term view. Semin Arthritis Rheum 2000;29:305-320.
51. Marra CA, Esdaile JM, Sun H, Anis AH. The cost of COX inhibitors: how selective should we be? J Rheumatol 2000;27:2731-2733.
52. Yelin E, Trupin L, Katz P, Lubeck D, Rush S, Wanke L. Association between etanercept use and employment outcomes among patients with rheumatoid arthritis. Arthritis Rheum 2003;48:3046-3054.
53. Kosinski M, Kujawski SC, Martin R, Wanke LA, Buatti MC, Ware JE Jr., Perfetto EM. Health-related quality of life in early rheumatoid arthritis: Impact on disease and treatment response. Am J Manag Care 2002;8:231-240.
54. Blumenauer B, Cranney A, Clinch J, Tugwell P. Quality of life in patients with rheumatoid arthritis: which drugs might make a difference? Pharmacoeconomics 2003;21:927-940.
55. Russell AS, Conner-Spady B, Mintz A, Mallon C, Maksymowych WP. The responsiveness of generic health status measures in patients with rheumatoid arthritis receiving infliximab. J Rheumatol 2003;30:941-947.
56. Anis AH, Tugwell PX, Wells GA, Stewart DG. A cost-effectiveness analysis of cyclosporine in rheumatoid arthritis. J Rheumatol 1996;23:609-616.
57. Kavanaugh A, Heudebert G, Cush J, Jain R. Cost evaluation of novel therapeutics in rheumatoid arthritis (CENTRA): a decision analysis model. Semin Arthritis Rheum 1996;25:297-307.
58. Verhoeven AC, Bibo JC, Boers M, Engel GL, van der Linden SJ. Cost-effectiveness and cost-utility of combination therapy in early rheumatoid arthritis: randomized comparison of combined step-down prednisolone, methotrexate and sulphasalazine with sulphasalazine alone. Br J Rheumatol 1998;37:1102-1109.
59. Maetzel A, Strand V, Tugwell P, Wells G, Bombardier C. Cost effectiveness of adding leflunomide to a 5-year strategy of conventional disease-modifying antirheumatic drugs in patients with rheumatoid arthritis. Arthritis Rheum 2002;47:655-661.
60. Choi HK, Seeger JD, Kuntz KM. A cost-effectiveness analysis of treatment options for patients with methotrexate-resistant rheumatoid arthritis. Arthritis Rheum 2000;43:2316-2327.
61. Choi HK, Seeger JD, Kuntz KM. A cost-effectiveness analysis of treatment options for methotrexate-naive rheumatoid arthritis. J Rheumatol 2002;29:1156-1165.
62. Wong JB, Singh G, Kavanaugh A. Estimating the cost-effectiveness of 54 weeks of infliximab for rheumatoid arthritis. Am J Med 2002;113:400-408.
63. Kobelt G, Jonsson L, Young A, Eberhardt K. The cost-effectiveness of infliximab (Remicade) in the treatment of rheumatoid arthritis in Sweden and the United Kingdom based on the ATTRACT study. Rheumatology 2003;42:326-335.
64. Brennan A, Bansback NJ, Reynolds A, Conway P. Modeling the cost-effectiveness of etanercept in adults with rheumatoid arthritis in the UK. Rheumatology 2004;43:62-72.
65. Kobelt G, Eberhardt K, Geborek P. TNF inhibitors in the treatment of rheumatoid arthritis in clinical practice: Costs and outcomes in a follow-up study of patients with RA treated with etanercept or infliximab in southern Sweden. Ann Rheum Dis 2004;63:4-10.
66. Marra CA, Esdaile JM, Anis AH. Practical pharmacogenetics: The cost-effectiveness of screening for thiopurine s-methyltransferase polymorphisms in patients with rheumatological conditions treated with azathioprine. J Rheumatol 2002;36:1851-1855.
67. Oh KT, Anis AH, Bae SC. Pharmacoeconomic analysis of thiopurine methyltransferase polymorphism screening by polymerase chain reaction for treatment with azathioprine in Korea. Rheumatology 2004;43:156-163.
68. Spiegel MRB, Targownik L, Dulai GS, Gralnek IM. The cost-effectiveness of cyclo-oxygenase-2 selective inhibitors in the management of chronic arthritis. Ann Intern Med 2003;138:795-806.
69. Lee KK, You JH, Ho JT, Suen BY, Yung MY, Lau WH, Lee WV, Sung JY, Chan FK. Economic analysis of celecoxib versus diclofenac plus omeprazole for the treatment of arthritis in patients at risk of ulcer disease. Aliment Pharmacol Ther 2003;18:217-222.
70. Maetzel A, Krahn M, Naglie G. The cost effectiveness of rofecoxib and celecoxib in patients with osteoarthritis or rheumatoid arthritis. Arthritis Rheum 2003;49:283-292.
71. Bansback N, Regier D, Brennan A, Ara R, Shojania K, Esdaile JM, Anis AH, Marra CA. Improving the methods for economic evaluation of rheumatoid arthritis: A review of the literature pertaining to biologic DMARDs. Drugs (in press).
72. Wolfe F, Michaud K, Pincus T. Do rheumatology cost-effectiveness analyses make sense? Rheumatology 2004;43:4-6.
73. Marra CA, Esdaile JM, Guh D, Fisher JH, Chalmers A, Anis AH. The effectiveness and toxicity of cyclosporin A in rheumatoid arthritis: longitudinal analysis of a population-based registry. Arthritis Rheum 2001;45:240-245.
74. Verhoeven AC, Boers M, van der Linden S. Responsiveness of the core set, response criteria, and utilities in early rheumatoid arthritis. Ann Rheum Dis 2000;59:966-974.
75. Tijhuis GJ, Jansen SJ, Stiggelbout AM, Zwinderman AH, Hazes JM, Vlieland TP. Value of the time trade off method for measuring utilities in patients with rheumatoid arthritis. Ann Rheum Dis 2000;59:892-897.
76. Kopec JA, Willison KD. A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
77. Suarez-Almazor ME, Conner-Spady B. Rating of arthritis health states by patients, physicians and the general public. Implications for cost-utility analysis. J Rheumatol 2001;28:648-656.
78. O'Brien BJ, Spath M, Blackhouse G, Severns JL, Dorian P, Brazier J. A view from the bridge: agreement between the SF-6D utility algorithm and the Health Utilities Index. Health Econ 2003;12:975-981.
79. Kopec JA, Willison KD. A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
80. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
81. Dolan P. The measurement of health-related quality of life for use in resource allocation decisions in health care. Chapter 32. In: Handbook of Health Economics, Vol. 1. Edited by Culyer AJ, Newhouse JP. London, U.K. Elsevier Science 2000.
82. Weinstein MC, Stason WB. Foundations of cost-effectiveness analysis for health and medical practices. N Engl J Med 1977;296:716.
83. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to their Development and Use. 2nd edition. Oxford University Press, 1995.
84. Hays RD, Anderson R, Revicki D. Psychometric considerations in evaluating health related quality of life measures. Qual Life Res 1993;2:441-449.
85. Maddigan SL, Feeny DH, Johnson JA for the DOVE investigators. Construct validity of the RAND-12 and the Health Utilities Index Mark 2 and 3 in type 2 diabetes. Qual Life Res 2004 (in press).
86. Liang MH, Lew RA, Stucki G, Fortin PR, Daltroy L. Measuring clinically important changes with patient-oriented questionnaires. Med Care 2002;40(suppl):II-45-II-51.

CHAPTER 2 BACKGROUND

2.1 COST-EFFECTIVENESS ANALYSIS AND THE QALY

In recent years, cost-effectiveness analysis has emerged as the preferred technique for economic evaluation in health care.1 Cost-effectiveness analysis shows the relationship between the incremental net resources used (costs) and the net health benefits generated (effects) between a specific intervention and an alternative strategy.2 As such, the incremental cost-effectiveness ratio (ICER) can be calculated, which is simply the ratio of the difference between two interventions' costs and the difference between their effectiveness, as follows:

$$\mathrm{ICER} = \frac{\Delta \mathrm{Cost}}{\Delta \mathrm{Effect}}$$

Rather than express the outcomes in cost-effectiveness analysis in terms of naturalistic units (such as number of tender joints reduced), analysts have sought outcome measures that permit comparisons across conditions. This framework would inform societal decision-making such that competing interventions that produce the greatest gain in health for the resources expended could be identified. One potential way to permit cross-indication comparisons (i.e. comparisons of cost-effectiveness across disease states) would be to utilize life expectancy as the measured outcome.1 However, this approach would not consider the health-related quality of life associated with various interventions and would bias funding decisions against those interventions that mainly improve HRQL while favoring only those interventions that result in improvements in survival. As such, diseases such as rheumatoid arthritis (RA), where improvements in survival due to interventions are small (when compared to cancer therapy or HIV pharmacotherapy) but improvements in HRQL are paramount, would be hard-pressed to compete for scarce health resources.
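The ICER defined earlier in this section can be illustrated with a minimal Python sketch. The function name `icer` and all cost and effect figures below are hypothetical and chosen only for illustration; they are not taken from this thesis or from any cited study.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: (difference in cost) / (difference in effect)."""
    delta_cost = cost_new - cost_old
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        # The ratio is undefined when both strategies yield the same effect.
        raise ValueError("ICER is undefined when the two strategies have equal effects")
    return delta_cost / delta_effect


# Hypothetical inputs: a new strategy costing 30,000 versus 10,000 for the
# alternative, yielding 5.5 versus 5.0 units of effect.
print(icer(cost_new=30000.0, cost_old=10000.0, effect_new=5.5, effect_old=5.0))  # 40000.0
```

The result is read as the additional cost per additional unit of effect gained by choosing the new strategy over the alternative; the unit of effect depends on the outcome measure selected.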
Therefore, an outcome measure that integrates both years of life and H R Q L into a single metric provides a solution to this problem. 1 ' 6 The use of quality adjusted life years ( Q A L Y s ) is an attempted solution to incorporate both potential life prolongation and improvement in H R Q L . 1 Neumann et al. stated that " Q A L Y s represent the benefit o f a health intervention in terms of time in a series of quality-weighted health states, in which the quality weights reflect the desirability o f l iving in the state, typically from perfect health (weighted 1.0) to dead (weighted 0.0)."1 Therefore, once the quality weights are obtained for each health state experienced by an individual, they are multiplied by the duration of time spent in the health state. The products of these calculations are then summed to obtain the total number of Q A L Y s for that person in the following manner: T Total QALYs(QT) = t^u^D, i=l Where: Uj(qj)= the quality of life in period i (measured by utilities); t = the time interval of period in terms of years; D i = discount factor of period i A s such, the incremental cost-effectiveness ratio becomes: 26 The Q A L Y approach is not without controversy and other competing methods have been suggested such as the Healthy-Years Equivalents (HYEs) , disability adjusted life years ( D A L Y s ) , and saved young life equivalents ( S A V E ) . 1 ' 2 ' 6 Each of these techniques has their relative advantages and disadvantages which are beyond the scope o f this chapter; nonetheless, Q A L Y s have remained the outcome measure o f choice i n the health economic literature. 1' 6 However, the Q A L Y approach has several assumptions on which it is based which include, but are not limited to: 1) utility independence; 2) constant proportionality trade-off; and 3) risk attitude over life years. 
1,6 However, assuming that the QALY approach is correct and its assumptions are met, there is still the issue of what should be utilized as the source of weights for the health states in the QALY calculation. Certain conditions for these weights must be met which include: 1) that they be based on preferences for health states; 2) that they be measured on an interval scale; and 3) that they be anchored at perfect health (1.0) and death (0.0).2 Under the latter anchoring requirement, health states can still be valued as worse than death and have negative weights associated with them.1,2 Also, the terminology used to describe these weights can be problematic as researchers have used the terms "utility", "value" and "preference" interchangeably.1 However, as Drummond et al.2 describe, the term "utility" is reserved for preferences that are measured under conditions of uncertainty that satisfy the axioms of expected utility theory (the standard gamble [SG]). "Values" are preferences that are measured under conditions of certainty and thus include rating scale (RS) and time trade-off (TTO) elicited scores, whereas "preferences" encompasses both and is a general term to describe the desirability of a set of outcomes.

Sources for weights which meet the aforementioned conditions come from both directly elicited techniques and indirectly elicited techniques.1,2,7 Although an in-depth discussion of techniques to directly elicit preference values to be used as weightings for QALYs is beyond the scope of this chapter, a brief description is provided. Directly elicited techniques encompass the methods to elicit values and utilities as described above (namely the SG, the TTO, and the RS). Within this framework, it is recommended to use choice-based techniques (SG or TTO) over scaling methods (RS).
1,2 The SG is grounded in von Neumann-Morgenstern expected utility theory (EUT) and, in its usual form, asks respondents to choose between a particular, intermediate health state with certainty and a gamble involving a probability of a better or worse outcome than the certain outcome.6,7 The goal with the SG approach is to find the probability (p) in the gamble at which the respondent is indifferent between the certain and uncertain alternatives.2 Although the SG has long been considered the preferred method due to its strong ties to EUT, a recent qualitative study investigated what thought processes respondents invoke in formulating their SG responses and found that some respondents were incorporating inappropriate information into their choices. In addition, Dolan argues that since there is evidence that people violate the assumptions of EUT, much of the appeal of the SG is lost.6

The TTO technique involves asking a respondent to make tradeoffs between a shorter life span in perfect health versus a longer life span in the health state in question. The time in full health is varied until the respondent is indifferent between the two alternatives.1,2 The TTO choice is not made under uncertainty, so the values that it elicits are not considered utilities, at least under EUT. Recently, a study appeared that has cast doubt on the ability to use the TTO as weighting for QALYs due to the violation of the constant proportional time trade-off assumption.9

Finally, the RS technique asks respondents to indicate ratings for health states (or their own health state) on a scale (usually a vertical or horizontal line) with endpoints of "worst" and "best" health states usually represented by 0 and 100.
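The SG and TTO tasks described above reduce to simple arithmetic once the point of indifference is found, and the resulting weights feed into the QALY sum given earlier in this section. The sketch below assumes the conventional textbook scoring for a chronic state preferred to death: the SG weight equals the indifference probability p of the better outcome, and the TTO weight equals the years in full health x divided by the years in the health state t. It also assumes a discount factor of D_i = 1/(1+r)^(i-1) applied per period. The function names, the discounting convention and all numbers are illustrative assumptions, not taken from the cited studies.

```python
def sg_utility(p_indifference):
    """Standard gamble: with full health = 1 and death = 0, the utility of the
    certain state equals the indifference probability of the better outcome."""
    if not 0.0 <= p_indifference <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return p_indifference


def tto_value(years_full_health, years_in_state):
    """Time trade-off: value = x / t at the point of indifference."""
    if years_in_state <= 0 or not 0 <= years_full_health <= years_in_state:
        raise ValueError("need t > 0 and 0 <= x <= t")
    return years_full_health / years_in_state


def total_qalys(weights, period_years=1.0, discount_rate=0.0):
    """Sum of t * U_i * D_i over consecutive periods.

    Assumed convention: D_i = 1 / (1 + r)**(i - 1), so the first period is
    undiscounted (enumerate starts at 0, which plays the role of i - 1).
    """
    return sum(
        period_years * weight / (1.0 + discount_rate) ** k
        for k, weight in enumerate(weights)
    )


# Hypothetical respondent: indifferent between 7 years in full health
# and 10 years in the health state being valued.
weight = tto_value(7, 10)
print(weight)
print(total_qalys([weight, weight, 0.8], discount_rate=0.05))
```

Keeping the elicitation step separate from the summation makes it easy to swap in weights from a different technique and see how the QALY total changes.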
To allow for the possibility for health states worse than death, the line is often anchored at the "worst" and "best" imaginable health states.6,7

In comparing directly elicited preference scores, application of the various techniques results in different preference weights.1 The SG approach almost always generates scores that are higher than the TTO method, and both are usually greater than RS scores.1,2,7 Since people are risk averse, they are less willing to accept the gamble outcome presented in the SG and more willing to accept the certainty. As well, since people have positive time preference and value years of life in the near future more than years of life in the distant future, they would be more willing to give up years of life at the end of a profile as in the TTO.6 Both of these tendencies would lead to higher SG scores than TTO values.

Indirect preference or utility assessment techniques involve the use of generic health classification systems in the form of a questionnaire.1,2 Through completion of the health classification system, respondents are assigned a health state which, in turn, is valued using a scoring function that applies preference weights from another population (i.e. society). Due to their relative ease and low cost to administer when compared to the SG or TTO techniques, these questionnaires are widely applied. These instruments commonly utilize multi-attribute utility theory (MAUT) to combine many attributes into a single utility value. As it is beyond the scope of the chapter to describe MAUT in detail, the reader is referred to reviews for a complete description of the theory and its assumptions.1,2 The most common examples of these questionnaires include the Health Utilities Index Mark 2 and Mark 3 (HUI2 and HUI3), the EuroQol (EQ-5D) and the Short-Form 6D (SF-6D).
Each of these systems assesses different domains of health and relies on different scoring functions/methods to determine preference scores.10 Other preference-based measures that have been less commonly applied are the Quality of Well-Being scale, the Finnish 15-D, and the Assessment of Quality of Life (AQoL).2,10

In the Canadian Coordinating Office for Health Technology Assessment's Guidelines for the Economic Evaluation of Pharmaceuticals,11 it is stated that due to the lack of head-to-head comparisons of these systems, there is little information to guide users in the choice of instrument. As such, it advises users "to study the alternative systems, to select in advance the one that best suits the study objectives, to justify the selection in the study protocol, and to stick with it. It is not appropriate to try a variety of approaches and simply pick the one that puts the product in the best light".11 Dolan echoes this advice and states that the "evidence on the responsiveness of one measure relative to another is in short supply" and that further within-patient comparisons are necessary.6 Neumann et al. voice concerns about the potential lack of sensitivity to important changes in particular disease states that might be experienced in the application of these systems.1 Finally, Hawthorne et al. surmise that sensitivity of these instruments might be context specific, with some instruments being more sensitive to health states in some diseases when compared to others.10

In the next sections of this chapter, I will review the four most commonly utilized indirect utility assessment instruments, the comparative data that exist and the choice of QALY weightings used in the published cost-utility analyses of treatments for RA.

2.2 PREFERENCE-BASED, INDIRECT UTILITY ASSESSMENT MEASURES

2.2.1.
The Health Utilities Index (HUI) Mark 2 and 3

The HUI Mark 2 and 3 systems (HUI2 and HUI3) are generic preference-based measures which, when used together, describe almost 1,000,000 unique health states.12 The HUI2 and HUI3 health classification systems were designed to directly link the multi-attribute health status classification system used to describe health status with preference-based, multi-attribute utility functions. The preference-based scoring functions convert the descriptive health classifications into values for each attribute and a single value for overall HRQL.12,15

The HUI2 was originally developed to assess the global morbidity burden of childhood cancer. The content of this instrument was based on a study in which lay raters ranked 15 attributes of health according to importance.16 The HUI3 was developed to improve upon the definitions used in the HUI2, to be applicable in both clinical and general population health studies, and to have structural independence among its attributes (i.e. such that all combinations of levels in the system are possible). Since its creation, the HUI3 has been used in a variety of clinical studies and in five major population health surveys in Canada.14

The HUI2 and HUI3 were developed with the intention of capturing 'within the skin' attributes of health status.15 The HUI2 classification system originally consisted of 7 attributes: Sensation (vision, hearing, speech), Mobility, Emotion, Cognition, Self-care, Pain, and Fertility. Although fertility was initially included to assess sub-fertility and infertility sequelae associated with childhood cancer and its treatment, this attribute has been dropped from the current HUI questionnaires. The 8 attributes of the HUI3 classification system are: Vision, Hearing, Speech, Ambulation, Dexterity, Emotion, Cognition, and Pain.
Although certain attributes have the same names across the classification systems, they have different underlying constructs. The Pain attribute assesses severity of pain in the HUI3 system whereas, in the HUI2 system, the frequency of pain and its control are considered. Similarly, for the Emotion attributes, in the HUI3 system there is a focus on happiness versus depression but in the HUI2 system, distress and anxiety are assessed. Finally, the Cognition attribute in the HUI2 system concentrates on learning whereas in the HUI3, the ability to solve day to day problems is assessed. Therefore, studies sometimes combine the HUI2 and HUI3 to take advantage of both of their properties.17-19 However, the creators recommend that the HUI3 be specified as the measure for primary analysis due to its larger descriptive system (972,000 possible health states vs. 24,000 possible health states in the HUI2), structural independence, and availability for comparison with population norms. The authors also suggest that adding the HUI2 to the HUI3 provides an efficient way to conduct sensitivity analysis on the utility values.

Therefore, the HUI2 and HUI3 are often administered together and have been formatted for interviewer-administration (face-to-face or by telephone) or self-completion.12 There are two versions for the viewpoints of the questionnaires: a self-assessment version, where information is collected from people about their own health, and a proxy assessment version, where health status information is collected from people other than the subjects. Also, there are different health status assessment periods that can be captured by the HUI systems, which are classified as "current" or "usual". There are three "current" versions which assess periods of the past 1 week, 2 weeks and 4 weeks. The "usual" version does not specify time recall periods.
The "current" versions are recommended for clinical or health economic evaluations whereas the "usual" version is recommended for use in population health surveys.

Both the HUI2 and HUI3 scoring systems are based upon multiplicative multi-attribute utility functions.13,14 This facilitates calculation of HRQL index scores, where dead has a utility of 0 and healthy has a utility of 1.0. Single attribute utility scores can also be calculated for each attribute in the HUI2 and HUI3. Both systems allow for calculation of utility values less than zero (health states considered worse than death) with the lowest possible scores being -0.03 for the HUI2 and -0.36 for the HUI3.12 The utility scores are assumed to have interval scale properties, whereas the attribute levels do not have interval scale properties.13,14 HUI utility scores represent mean community preferences, and the scores have been calculated from preference scores measured in accordance with von Neumann-Morgenstern expected utility theory and extensions of this theory to accommodate multi-attribute utility functions.13,14,20,21 The investigators obtained preferences (using the SG in four marker states and a rating scale for the remainder in a random sample of the population living around Hamilton, Ontario) for single-deficit states, including "corner" states in which the deficit is set to the worst level.13,14,22 Rating scale value scores were converted to utility scores using a power function determined from the relationship between the SG and rating scale value scores in the marker states.13,14

A number of the attributes of the HUI2 and HUI3 are specifically relevant to the study of rheumatoid arthritis, including Mobility, Emotion (from both systems), Self-Care, Pain (from both systems), Ambulation, and Dexterity.
The HUI2 and HUI3 do not contain items that explicitly inquire about social roles, family roles, energy, work/productivity, and personality. Some of these dimensions, which may be considered 'outside the skin', may be important to patients with rheumatoid arthritis, such as social functioning, work/productivity, and energy. However, the scoring functions of the HUI2 and HUI3 may indirectly capture some of these 'outside the skin' attributes.

There has been some work to characterize what would represent the minimally important difference (MID) for the HUI2 and the HUI3.12 Grootendorst et al. concluded that differences on the HUI3 of 0.03 or more should be considered to be clinically important,28 whereas Samsa et al. determined, from a small random sample of 160 patients from a Veteran's Administration hospital, that 0.02 (95% confidence interval 0.01 to 0.05) was a clinically meaningful difference. Based upon these results and the fact that the smallest difference in utility scores between levels of an HUI attribute is 0.05, the creators of the HUI systems recommend that a difference of 0.05 (and possibly smaller) is likely meaningful. However, further research is required to substantiate these recommendations in a variety of different diseases including RA.

To date, no published studies could be identified that examine the properties of the HUI2 or HUI3 specifically in RA. However, a few studies examine the properties of the HUI3 system (but not the HUI2) in people with one of several musculoskeletal diseases, of which RA was one.26-28,30 Since most of these are comparative studies with other preference-based measures, they will be discussed in detail in section 2.3.
Analysis of an arthritis patient sub-sample from a population health survey using the HUI3 found the greatest burden of morbidity in pain, with very small but statistically significant (due to large sample sizes) differences in ambulation and cognition compared to a reference group without stroke or arthritis.28 However, the major limitation of this analysis was that the diagnosis of arthritis was self-reported and could have represented one of a number of conditions including osteoarthritis, rheumatoid arthritis or other rheumatic conditions. This limitation is borne out by the relatively high mean HUI3 utility score for patients reporting "arthritis" in this sample (0.84), which was significantly lower than the mean control score (those without arthritis or stroke) of 0.92 but not as low as would be expected if the sample had been limited to RA.

2.2.2. The EuroQol (EQ-5D)

The EQ-5D was designed as a cardinal index of health for describing and valuing HRQL. The main objective in creating this instrument was to develop a standardized measure for describing and valuing HRQL that could be used to generate cross-national comparisons of health status. The dimensions of the instrument were selected after a detailed review of several generic health status measures.

The instrument consists of a descriptive health state classification system and a visual analog scale 'health thermometer' (the VAS component). The descriptive health state classification system consists of 5 domains (Mobility, Self-Care, Usual Activities, Anxiety/Depression, and Pain/Discomfort), each with 3 response levels (no problems, some problems, extreme problems). The health 'thermometer' represents a subjective, global evaluation of the respondent's health status on a scale between 0 and 100, where 0 represents the worst imaginable health state and 100 represents the best imaginable health state.
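Because the descriptive system has 5 domains with 3 levels each, the number of distinct health states follows directly (3^5 = 243). The sketch below enumerates them in Python. Encoding each state as a five-digit string such as "11223" is an assumed convention, the domain order simply follows the list in the text above, and the names `DOMAINS`, `LEVELS`, `states` and `describe` are illustrative.

```python
from itertools import product

DOMAINS = ["Mobility", "Self-Care", "Usual Activities",
           "Anxiety/Depression", "Pain/Discomfort"]
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some problems, 3 = extreme problems

# Every combination of one level per domain, written as a digit string.
states = ["".join(str(level) for level in combo)
          for combo in product(LEVELS, repeat=len(DOMAINS))]


def describe(state):
    """Map a profile string such as '11223' to a {domain: level} dictionary."""
    if len(state) != len(DOMAINS) or any(ch not in "123" for ch in state):
        raise ValueError("expected five digits, each 1, 2 or 3")
    return dict(zip(DOMAINS, (int(ch) for ch in state)))


print(len(states))
print(describe("11223"))
```

A scoring tariff would then assign one index value to each of these profiles; no tariff values are included here because none are given in the text.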
Three types of data are produced for each patient: a health state vector or profile describing the extent of problems on each of the 5 domains, a population-weighted health index based on the health state vector (the EQ-5D index or utility score), and a VAS-based self-rated assessment of HRQL. The EQ-5D was intended for self-completion and the recall period refers to the present (today).33

The scoring algorithm typically applied to the descriptive health classification system is the UK-based York scoring system.34 This scoring system was generated from interviews of a sample of the general UK population. Respondents were asked to rank and then value hypothetical EQ-5D health states using the Time Trade-Off approach.34 Although no Canadian-based scoring tariff has been developed for the EQ-5D, a scoring model has been generated for VAS-based valuations in an adult US sample. A study of differences between a European and a Canadian-based sample of EQ-5D valuations found that VAS valuations for EQ-5D health states were comparable for all domains other than Usual Activities.36

Scores on the EQ-5D range from -0.56 to 1.00, where negative values represent health states worse than death. This range is the widest of all utility values determined by any of the preference-based instruments. In addition, the brevity of the EQ-5D has been considered to be a strength in the study of HRQL. In a large, randomized comparison of the two instruments (sample size was over 1100 patients in each group), the response rate of the EQ-5D was shown to be higher than that of the SF-6D in severely disabled persons after stroke (66% with no missing data vs. 55%, p<0.0001, respectively).35

However, the advantage of brevity of the EQ-5D leads to a major limitation in its health classification system: the few domains of health assessed and the small number of health states described.
Of all the preference-based instruments, the EQ-5D has the smallest number of health states described (243) compared to the HUI2 (24,000), HUI3 (972,000) and the SF-6D (18,000). In addition, it has the fewest domains of health that are assessed by its system. Therefore, in studies of chronic conditions such as RA, these might not be sufficient to accurately describe impairments in HRQL. For example, the EQ-5D lacks dimensions of HRQL that may be impacted by RA such as dexterity, social functioning and vitality. The ubiquitous dimension 'usual activities' has the potential to elicit some information on personality, family roles, or productivity, but the bundling of so many potential aspects of HRQL obscures interpretation. Also, many researchers have found that there are many gaps in the distribution of scores achieved by the EQ-5D, especially in the mid-utility range (between 0.30 and 0.50), with a clustering of scores in the upper-utility area possibly leading to ceiling effects.30,38

Another concern about the EQ-5D relating to the small number of health states described is its responsiveness and sensitivity to change. Floor and ceiling effects have also been observed to a greater degree in the EQ-5D than in the Short Form-12 (SF-12), an abbreviated version of the Short Form-36 (SF-36).39 In studies comparing the EQ-5D to the SF-36, the EQ-5D index score was less responsive to change and less able to discriminate between groups than the SF-36.40-42 In a recent study comparing the EQ-5D to the SF-6D (a preference-based measure derived from the SF-36; see below for more details), the authors found that the SF-6D was more sensitive than the EQ-5D in detecting small changes in patients who had undergone liver transplantation.43

Specifically, in RA, there have been two studies that have examined the construct validity, responsiveness and reliability of the EQ-5D. Hurst et al.
first reported on the validity of the EQ-5D in terms of its ability to measure both current health status and change in health status in a small sample of 55 patients with RA.44 At baseline, the EQ-5D index scores were significantly correlated with other condition-specific measures including loss of function, joint pain, joint tenderness and mood. In addition, EQ-5D change scores were correlated with changes in these measures. In a larger study with 233 RA patients, these results were replicated.45 In addition, the investigators found that the test-retest reliability of the EQ-5D utility score after two weeks (ICC = 0.78, 95% CI 0.60-0.96) was higher than that of all other clinical measures except the Health Assessment Questionnaire (HAQ) scores. Therefore, it appears that the EQ-5D has demonstrated reliability and cross-sectional and longitudinal construct validity in RA. However, from these results, it was unclear how these properties for the EQ-5D would compare to other preference-based instruments in the assessment of RA.

In a study comparing the responsiveness of generic health status measures in patients with RA who were either receiving infliximab or not receiving infliximab, SF-6D scores were found to be consistently higher than EQ-5D scores. These authors found lower test-retest reliability for the EQ-5D (ICC = 0.66) compared to the SF-6D (ICC = 0.72). In addition, these authors found mean changes almost two-fold greater in the EQ-5D than the SF-6D in those patients receiving infliximab. However, effect sizes as a measure of responsiveness were larger in the SF-6D due primarily to the smaller standard deviation at baseline (0.07 for the SF-6D vs. 0.30 for the EQ-5D). Other studies comparing the EQ-5D to other preference-based measures have been conducted outside of RA and will be discussed in detail below.
30,43 No studies were found that identified the minimally important difference (MID) in index scores of the EQ-5D. It has been hypothesized that the MID is 0.03 since this represents the smallest change in the utility values that can occur as a result of a one category change in a single dimension. Further research is necessary to characterize the MID for the EQ-5D.

2.2.3. The Short Form 6D (SF-6D)

The most recent of the preference-based, indirect utility assessment instruments, the SF-6D, was created by Brazier et al. in an effort to derive preference-based scores from the SF-36.46,47 The SF-36 is one of the most widely utilized HRQL measures and contains 36 questions assessing these eight dimensions: physical functioning, role limitation due to physical health, social functioning, vitality, bodily pain, mental health, role limitation due to emotional problems, and general health.48 The SF-6D revised the SF-36 into a six-dimensional health state classification system assessing physical functioning, role limitations, social functioning, pain, mental health, and vitality. The SF-6D health classification system defines health states by a respondent selecting one level from each of the six dimensions. Each dimension has four to six levels and thus, 18,000 possible health states are defined in this manner.22,47 To assess preferences for the multi-attribute health states defined by the SF-6D system, the creators used an interviewer-administered SG in a representative sample from the UK.47 The boundaries of the SF-6D utility scores are from 0.30 to 1.00, with a score of 1.00 being indicative of "full health". The MID of the SF-6D, based upon a meta-analysis of seven longitudinal studies, was determined to be 0.033 (95% CI 0.029 to 0.037).
49 Due to the newness of this preference-based, indirect utility assessment instrument, there have been few published studies in which it has been utilized. However, the use of this measure is increasing, and with the availability of many SF-36 datasets that could be converted into preference-based measures, it is anticipated that the application of this measure will continue to grow.26,30,43,49-51 Of note, the creators of the SF-6D state that, when compared to the EQ-5D, the fact that the SF-6D has a much larger descriptive system may result in greater sensitivity.47 Specific studies comparing results obtained with the SF-6D and other preference-based measures will be described in detail below.

2.3 EMPIRIC COMPARISONS BETWEEN THE INDIRECT UTILITY ASSESSMENT INSTRUMENTS

2.3.1. Comparisons between the Health Utilities Index Mark 2 and Mark 3

One of the first published studies examining differences between scores obtained from the HUI2 and HUI3 was authored by Neumann et al. These investigators compared scores achieved with the two HUI systems in a cross-sectional sample of 679 patients with Alzheimer's disease (AD) and their caregivers. In addition, the investigators utilized the scores obtained by the two systems in a decision-analytic, Markov-model based, economic evaluation of a new drug to determine what impact using the different utility values would have on the incremental cost-effectiveness ratios. When patients completed the questionnaires, their mean (SD) utility scores were lower on the HUI3 (0.22 [0.26]) than on the HUI2 (0.53 [0.21]). However, when caregivers completed the questionnaires as proxies for the patients, similar results were found between the two systems (mean score [SD] on the HUI3 of 0.87 [0.14] and HUI2 0.87 [0.11]). Both systems appeared to have construct validity in terms of their ability to discriminate between severity levels of AD.
For the HUI3, patient scores ranged from 0.47 (0.24) for questionable AD to -0.23 (0.08) for terminal AD, compared with a range of 0.73 (0.15) to 0.14 (0.07) for the HUI2. In the cost-effectiveness analysis, the results were more economically attractive when the scores for the HUI3 were used as compared to the HUI2.

Maddigan et al. examined the construct validity of the two HUI systems in 394 patients with type 2 diabetes in rural communities in Alberta and subsequently compared the scores of the two systems and examined reasons for their differences.53,54 The mean score of the HUI2 was higher (0.78, SD 0.18) than the mean score of the HUI3 (0.64, SD 0.30). Using the "known groups" approach to the assessment of construct validity, the investigators found that the HUI2, HUI3 and the RAND-12 all discriminated across subgroups of individuals representative of more and less advanced diabetes or differing levels of disease severity. For example, disease severity measures were associated with impairment on the vision, ambulation, dexterity and pain attributes of the HUI3 and impairments on the self-care and mobility attributes of the HUI2. Overall scores were lower in those above the median duration of diabetes than those below and in those whose diabetes was managed using insulin compared to diet alone.

In the paper comparing the HUI2 to the HUI3 scores and examining the extent to which each of these systems detects differences associated with varying levels of type 2 diabetes severity or disease advancement, 372 individuals were available for analysis. Specifically, differences were investigated in single attribute and overall HUI2 and HUI3 utility scores of groups with presumed differences in disease severity or stability of control. Severity of type 2 diabetes was defined on a gradient from those receiving insulin therapy (most severe) to those treated with diet alone (least severe).
Stability of control was defined based upon absenteeism from work, emergency room visits, and hemoglobin A1c values. Relative to HUI2 scores, larger differences were seen in HUI3 scores for individuals defined as having more advanced type 2 diabetes. Both the pain and emotion attributes of the HUI3 categorized a larger proportion of the sample as having moderate to severe impairment than corresponding attributes in the HUI2 system. These observations prompted the authors to conclude that, due to the greater range of possible scores (including the wide range of states valued as worse than death) and its superior ability (relative to the HUI2) to discriminate between those with moderate or severe impairment as compared to mild or no impairment, the HUI3 may be a better instrument to utilize in type 2 diabetes.

The responsiveness of the HUI systems was recently compared to the SF-36 and disease-specific measures (the Harris Hip Scale, the Western Ontario and McMaster University Osteoarthritis Index [WOMAC], and the McMaster-Toronto Arthritis Patient Preference Disability Questionnaire [MACTAR]) in patients undergoing hip arthroplasty. Feeny et al. evaluated the responsiveness of these questionnaires in 90 patients (out of a possible 553 patients who had been initially referred for hip disease). Questionnaires were administered prior to surgery and post-surgery, facilitating comparisons between pairs of measures for each patient. The responsiveness statistics utilized were the effect size (ES), the standardized response mean (SRM), the relative efficiency statistic (RE) and the paired t-test. Some form of improvement was detected by the overall/summary scores for all of the instruments and in many of the domain scores of the SF-36 and the single attributes of the HUI systems. All of the overall scores had large ES statistics, including the HUI2 and HUI3.
For example, for the SF-36, improvements were observed in the physical functioning, bodily pain and vitality domains as well as the physical component summary score; for the HUI2, improvements were observed in the pain and self-care attributes; and finally, for the HUI3, improvements were observed in the pain and ambulation attributes. Surprisingly, the mobility attribute for the HUI2 was not responsive. Overall, as hypothesized, the disease-specific measures were most responsive but the generic measures yielded acceptable responsiveness statistics and would be suitable to be used in this context.

Another publication has resulted from this data set which examines differences between community-based preferences (based on responses to the HUI2 and HUI3) and individual preferences (based upon SG utilities).56 The investigators examined agreement (as measured by intraclass correlation coefficients [ICC]) between the different utility assessment techniques and compared the mean scores between instruments. Mean scores were statistically significantly higher for the SG when compared to the HUI3 (0.62, SD 0.31 vs. 0.52, SD 0.21) but not the HUI2 (0.62, SD 0.31 vs. 0.62, SD 0.19). However, the ICCs were low, ranging from 0.06 (agreement between the SG and HUI2) to 0.09 (agreement between the SG and the HUI3). Thus, the authors concluded that the HUI2 was a good proxy for directly measured SG at the group level; however, this conclusion cannot be applied at the individual level (as evidenced by the low agreement).

Feeny et al. conducted another study examining the relationships between the HUI2, HUI3 and directly-measured SG scores at both the individual and group level in a sample of 140 teenage survivors of extremely low birthweight (ELBW) and 124 control group teens. Again, mean SG scores were compared to HUI2 and HUI3 scores and agreement was assessed between the SG and the HUI2 and HUI3 using the ICC.
For the ELBW group, the SG, HUI2 and HUI3 mean (SD) scores were 0.90 (0.20), 0.89 (0.14), and 0.80 (0.22) compared to the control scores of 0.93 (0.11), 0.95 (0.09) and 0.89 (0.13), respectively. The differences between the ELBW and control groups in HUI2 and HUI3 scores were statistically significant (p<0.0001 and p<0.0002) and clinically important. However, no such differences were observed in SG scores between the sample and the controls. Also, although there were no differences between mean SG and HUI2 scores, both were significantly different (p<0.001) from HUI3 scores, with the latter being systematically lower. In the assessment of agreement, the ICCs between the SG and the HUI2 or HUI3 were very low, indicating poor agreement at the individual level. Agreement between the HUI2 and HUI3 was moderate at 0.63 (95% CI 0.01 to 0.73) in the sample and control groups combined. Again, the authors concluded that, at the group level, results from the HUI2 and the SG are interchangeable, but that this relationship did not hold at the individual level.

2.3.2. Comparisons across Indirect Utility Assessment Instruments Outside of Musculoskeletal Diseases

The first study comparing preference-based scores derived from indirect utility assessment instruments arose within the framework of a randomized clinical trial of 561 patients being treated with tirilazad mesylate or placebo for aneurysmal subarachnoid hemorrhage.58 The EQ-5D VAS, the HUI2 utility scores and a rating scale were used as measures of patient preferences. The scoring function of the EQ-5D was not considered, as the authors stated that they were assessing patient preferences (rather than societal preferences). The measures of preferences tended to have higher agreement at lower levels of functioning and poor agreement at higher levels of functioning.
Since this study did not directly compare the scores that are typically utilized as QALY weights, little can be concluded from this research. However, this study likely sparked the interest of other investigators in conducting comparisons between the indirect utility assessment instruments and was the forerunner of a body of research.

The creation of the Assessment of Quality of Life (AQoL) questionnaire led investigators to compare its properties with those of the HUI3, EQ-5D, SF-6D and a Finnish preference-based measure, the 15-D.10 Of note, the SF-6D scores were calculated by an older algorithm which has since been changed.46,47 The investigators administered these instruments to residents of Victoria, Australia. The sample was selected to provide a heterogeneous, representative sample of community members weighted by socioeconomic status, chronically ill patients attending outpatient clinics in two of Melbourne's largest hospitals, and inpatients from three hospitals. The response rate was 58% (n=396), 43% (n=334) and 58% (n=266) for the community, outpatient and inpatient samples, respectively, for a total of 996 respondents. The investigators found that the distributions of the scores of the five instruments were quite different, with the AQoL, HUI3 and EQ-5D having a greater range of scores and lower values than the SF-6D and the 15-D. However, when broken down by sample type (community, outpatient, inpatient) and by age (16-35, 36-50, 51-65, and >66), all the instruments displayed a monotonically decreasing relationship across sample types and age groupings. Spearman's correlations between each pair of instrument scores revealed high (>0.60) correlation coefficients for all comparisons. The AQoL and the 15-D had the highest correlation (0.80) whereas the EQ-5D and the HUI3 had the lowest (0.64).
Finally, in a more detailed analysis of patterns of agreement between the instruments, it was revealed that a change in the average score in the SF-6D and the 15-D corresponded to a much greater change in the scores predicted by the other three instruments. In the determination of ceiling effects (where scores cluster at the highest end of the scale), scores of the other instruments were plotted when the score of the instrument of interest was at its maximum value of 1.00. By examining the dispersion in the other scores, the investigators determined the ability of the other instruments to detect differences in health states when the instrument of interest was at its ceiling. The results showed that the dispersion of scores for the other instruments when the AQoL, 15-D, or SF-6D were at 1.00 was minimal, suggesting that these instruments had a relatively high ceiling. However, when a utility value of 1.00 was achieved with the HUI3 and the EQ-5D, there was significant dispersion of scores in the other instruments, suggesting a possible low ceiling effect. Thus, it would appear that despite having a wider range, both the HUI3 and the EQ-5D display ceiling effects that are not experienced by the SF-6D.

Bosch and Hunink compared the HUI3 and the EQ-5D in 88 patients treated for intermittent claudication in the Netherlands.59 These patients completed the HUI3, EQ-5D, RAND 36-Item Health Survey 1.0, TTO, SG and rating scale before revascularization and at follow-up 1 month after the procedure. After revascularization, improvements were mostly noted in the HUI3 attributes of pain and ambulation, compared to the mobility, usual activities and pain/discomfort domains in the EQ-5D system.
It was hypothesized that since TTO scores are usually lower than SG scores, the mean TTO and the EQ-5D (which uses the TTO in its scoring function) scores would be lower than the mean SG and the HUI3 (which uses the SG in its scoring function as described above) scores. Prior to treatment, the EQ-5D mean (SD) score (0.57 (0.25)) was significantly lower than the HUI3 mean (SD) score (0.66 (0.20), p<0.01). Also, as hypothesized, the TTO mean (SD) scores (0.82 (0.17)) were lower than the SG mean (SD) scores (0.91 (0.14)). However, at 1 month after the procedure, there were no differences between the HUI3 and the EQ-5D scores (0.77 (0.21) vs. 0.79 (0.23), respectively). To investigate agreement at the individual level, the investigators determined the ICC values between the HUI3 and EQ-5D at baseline (0.49) and at 1 month after the procedure (0.66). The ICC between the changes in the HUI3 and EQ-5D scores was poor (0.30). The authors concluded that studies utilizing the mean values of these systems (such as cost-utility analyses) would find a lower impact of revascularization on HRQL if the HUI3 was used instead of the EQ-5D, due to the smaller changes in utility scores.

Longworth and Bryan conducted a comparison of the EQ-5D and the SF-6D in 524 liver transplant patients (90% response rate for at least one questionnaire) in the UK.43 Investigators administered the HRQL questionnaires from the point of listing on the transplant list and then at 3-month intervals until transplantation. After transplantation, HRQL questionnaires were given at 3, 6, 12 and 24 months. At the conclusion of the study, there were 1462 data pairs (at two consecutive time points) with which to compare the two indirect utility assessment instruments.
When the mean scores of the two instruments were compared from baseline (listing time) to 12 months post-transplantation, the EQ-5D detected a significant improvement in HRQL (mean score increased from 0.52 to 0.61, mean change of 0.09, 95% CI 0.03 to 0.14); however, the SF-6D did not show a significant change (mean score increased from 0.61 to 0.62, mean change of 0.01, 95% CI -0.04 to 0.05). In pre-transplantation measurements, the SF-6D was found to have a much narrower spread and a symmetrical distribution when compared to the EQ-5D. Due to the SF-6D's relatively high lower bound of 0.30, scores could not fall below this value. Conversely, no patients were scored as 1.00 (full health) by the SF-6D system. On the other hand, the EQ-5D had a sizeable proportion of respondents classified in health states worse than dead (negative values) at any time point prior to transplantation and a number of patients scoring full health at all time points prior to transplantation. In post-transplantation measurements, the results were similar, with fewer of the EQ-5D scores being in the "worse than dead" range but more at the full health point (1.00). The distribution of the SF-6D scores was similar to that observed prior to transplantation, with a few patients reporting full health (1.00). Of note, there were also gaps in the distribution of the EQ-5D scores, with the most noticeable being between 0.37 and 0.50 and between 0.88 and 1.00. Although the correlation between the EQ-5D and SF-6D scores was high (0.76, p<0.001), there was a large amount of variation in the scores across the measures. In the examination of ceiling effects, when the EQ-5D was scored as 1.00 (a total of 237 paired observations), only 22 SF-6D scores were also at full health. The remaining SF-6D scores ranged from 0.57 to 0.99 with a mean score of 0.82. Thus, it would appear that towards the higher range of utility scores, the SF-6D showed greater sensitivity.
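The ceiling-effect check described here (looking at how one instrument's scores spread out when the other sits at 1.00) can be sketched as follows. This is only an illustration: the function name, the tolerance used to detect a score of 1.00, and the paired scores are hypothetical and are not taken from the study.

```python
from statistics import mean


def scores_at_ceiling(pairs, ceiling=1.0, tol=1e-9):
    """Summarize instrument B's scores among pairs where instrument A is at its ceiling.

    `pairs` holds (score_a, score_b) tuples; returns None if A never reaches the ceiling.
    """
    others = [b for a, b in pairs if abs(a - ceiling) <= tol]
    if not others:
        return None
    return {
        "n_at_ceiling": len(others),
        "n_other_also_at_ceiling": sum(1 for b in others if abs(b - ceiling) <= tol),
        "min_other": min(others),
        "max_other": max(others),
        "mean_other": mean(others),
    }


# Hypothetical (EQ-5D, SF-6D) pairs for illustration only.
paired_scores = [(1.0, 1.0), (1.0, 0.8), (1.0, 0.6), (0.7, 0.65)]
print(scores_at_ceiling(paired_scores))
```

A wide spread in the second instrument's scores when the first is at 1.00 suggests that the first instrument is no longer distinguishing between health states that the second still separates.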
However, the reverse was true when floor effects were examined. More respondents indicated the lowest levels on the SF-6D than on the EQ-5D domains. For example, 42% and 21% of respondents indicated the most severe levels on the role limitation and vitality domains, respectively, of the SF-6D questionnaire, whereas the largest proportion indicating the worst level on any EQ-5D domain was 14%, for usual activities. Therefore, from the results of this analysis, it would appear that despite having better properties at the upper end of the utility range, the SF-6D displays floor effects. This finding is likely limited to disease states where the burden of disease is large, as in organ transplantation. As such, the use of the SF-6D over instruments such as the EQ-5D may underestimate the magnitude of HRQL improvements in these types of conditions and undervalue treatment in cost-utility analysis. This finding is somewhat in agreement with the statement by Brazier et al. that "any greater sensitivity [of the SF-6D] would be most likely in groups experiencing mild to moderate health problems and in those expected to experience comparatively small changes or where small differences are expected between interventions."47

O'Brien et al. examined the level of agreement between the SF-6D utility algorithm and the HUI3 in patients at increased risk of sudden cardiac death participating in a randomized trial of implantable defibrillator therapy.51 The SF-6D and the HUI3 questionnaires were completed at baseline by 246 patients, generating cross-sectional scores. The mean values from the HUI3 (0.61, 95% CI 0.60 to 0.63) and the SF-6D (0.58, 95% CI 0.54 to 0.62) were significantly different (p<0.03). As shown in other studies, the range of the HUI3 scores was much greater (-0.21 to 1.00) than that of the SF-6D scores (0.30 to 0.95). The distributions of the scores of the two systems again were quite different, with the SF-6D passing the Kolmogorov-Smirnov test for normality.
The distribution of the HUI3 scores failed statistical tests for normality and followed a skewed, bimodal pattern. Agreement, assessed using the intraclass correlation coefficient (ICC), was low (ICC 0.42, 95% CI 0.31 to 0.52). In their discussion, the authors raise several interesting points on potential reasons for the differences observed in the scores from the two instruments. Firstly, the SF-6D considers different domains of health while the HUI3 is based on "within the skin" attributes. Secondly, although their scores are both based on the SG, SF-6D health states were valued directly with the SG while the HUI3 health states were valued by RS and converted to SG scores by a statistical power function. Finally, although the absolute scores for these instruments were different in a cross-sectional study, it is not clear from their results if difference scores would vary to the same extent in a longitudinal analysis.

Siderowf et al.60 compared the scores of three preference-based instruments, the EQ-5D, the Disability and Distress Index (DDI), and the HUI2, in 100 patients with idiopathic Parkinson's disease (PD). The DDI contains four functional domains: general mobility, usual activities, self-care, and social and personal relations. Responses on these domains are combined with an overall rating for the dimension "distress" and the entire system is scored between -1.486 and 1.0. While the DDI appears to be preference-based, it does not provide utility values. Construct validity of the three preference-based instruments was determined by comparing their scores with the total Unified PD Rating Scale (UPDRS) (a widely used disease-specific, symptom severity rating system), the Hoehn and Yahr scale (a disease severity scale in PD) and the Beck Depression Inventory. The three instruments' discriminative ability was tested by dividing the study sample into upper and lower halves and quartiles based on the UPDRS.
Overall, the mean (SD) scores of the three instruments were EQ-5D 0.59 (0.27), HUI2 0.75 (0.18) and DDI 0.93 (0.17), which were significantly different (p<0.001 for pairwise comparisons). Only the EQ-5D supplied scores that were negative (health states valued as worse than death). The scores of the instruments were moderately to strongly correlated, with Pearson correlation coefficients of 0.74 (HUI2 with the EQ-5D), 0.62 (EQ-5D with the DDI) and 0.56 (DDI with the HUI2). All three instruments were significantly correlated with disease-specific measures. Generally, the DDI had lower correlations with these measures than the HUI2 and EQ-5D. In terms of discriminative ability, the HUI2 and EQ-5D were superior to the DDI in their ability to distinguish between severities of PD. None of the instruments were able to distinguish between subjects with and without motor fluctuations or drug-induced dyskinesia. In their discussion of the research findings the authors raise an interesting point: namely, that because all the instruments yielded scores that were correlated with the disease-specific measures, they might be measuring functional status much more than preferences. In order to test this hypothesis, further studies examining correlations and agreement with directly elicited preference techniques such as the SG and TTO are needed.

In another study, 36 clinical experts scored the HUI2, HUI3 and EQ-5D classification systems according to literature reports on eight sequelae associated with childhood meningitis, and the scores obtained using these three health classification systems were compared.61 The sequelae chosen in the valuation exercise were deafness, minor hearing loss, epilepsy, mild mental retardation, severe mental retardation combined with tetraplegia, paresis of the leg, and mild mental retardation combined with epilepsy and paresis of the leg.
For each of the sequelae, the investigators constructed a short, structured synopsis that reported on relevant domains. In general, scores on the HUI2 and the EQ-5D were comparable except for the severe retardation and tetraplegia sequela, which was scored, on average, as -0.15 (0.13) with the EQ-5D and 0.12 (0.03) with the HUI2. Interestingly, for the same health state, the score on the HUI3 was -0.33 (0.02). All health states were scored significantly lower using the HUI3 than the other systems (p<0.05 for all). However, the HUI2 and HUI3 had the same ranking for the health states. Rankings with the EQ-5D system were similar except that, in contrast to the HUI systems, it ranked epilepsy lower than mild hearing loss and leg paresis lower than deafness. Using various measures of agreement, there were significant differences across all three of the instruments for each of the sequelae, suggesting that they were not interchangeable. From their results, the authors concluded that sensitivity analyses of QALY weightings must be employed in cost-utility analysis in order to account for the observed differences in scores.

Lubetkin et al. examined the relationship between the SF-12, the EQ-5D and the HUI3 for overall scores and in analogous domains of health in a convenience sample of 301 participants (77% participation rate) at an inner-city community health centre in New York City.62 Participants were mainly from ethnic minorities, had low annual incomes (90% earned less than $30K), and had low education (47% had high school graduation or less). Using Pearson's correlation coefficients, correlation between the overall scores ranged from 0.41 (SF-12 with EQ-VAS) to 0.69 (HUI3 with EQ-5D index).
Considering just the preference-based measures, correlations between similar domains (using Kendall's tau for ordinal variables) were 0.59 (between the HUI3 ambulation attribute and the EQ-5D mobility domain), 0.58 (between the HUI3 pain attribute and the EQ-5D pain domain), and 0.55 (between the HUI3 emotion attribute and the EQ-5D anxiety/depression domain). Areas of impairment most frequently detected by the HUI3 were pain, vision, cognition and emotion, whereas for the EQ-5D, pain/discomfort and anxiety/depression were impaired most often. The authors concluded that despite differences in the structure of these systems, correlations between related aspects were moderate to strong and participants demonstrated consistency in responses across analogous items.

From a population health perspective, there have been comparisons of the HUI3 and the EQ-5D both in the UK and in Canada.63,64 In the comparison in the UK, the EQ-5D, a modified version of the HUI3 and the SF-12 were compared within a general population sample.63 The modified version of the HUI3 that was used was an eight-item questionnaire (one item for each domain) that is available at no charge from the developers. The authors claimed that they could not afford the substantial fees associated with using the standard HUI3 questionnaire. The three instruments were evaluated in terms of their feasibility, coverage (such that there should be a broad range of responses across items), and discrimination (the ability to discriminate between individuals based upon self-rated health status, measurable morbidity, and socioeconomic status). All instruments showed feasibility in that there were low non-response rates (less than 6% for all the items across all questionnaires). The SF-12 had a broad distribution of scores across its items, although there was still heavy skewing towards responses indicative of good health.
However, the HUI3 and the EQ-5D scores were highly skewed on all dimensions, with the majority reporting full health. In the EQ-5D, respondents were least likely to report problems on the self-care domain but most likely to report pain/discomfort. Forty-nine percent of respondents indicated no problems on all five domains of the EQ-5D. For the HUI3, respondents were least likely to report problems on the speech dimension but most likely to report decrements in the pain dimension. In the sample, there were 35 distinct health states (out of the possible 243) described by the EQ-5D system compared to 126 distinct health states defined by the HUI3. Finally, in terms of their ability to discriminate between self-reported health states and socioeconomic status, all three instruments had acceptable levels of performance. The SF-12 summary scores could not discriminate among people with different education levels. The authors concluded that, despite the differences in their descriptive systems and scoring functions, overall there was no distinguishing feature to recommend one instrument over the others as a population health measure.

The Canadian study attempted to assess the relationship between the HUI3 and the EQ-5D at both the descriptive and scoring level.64 The analysis was performed on answers given by 1,477 respondents to Statistics Canada's 1998 National Population Survey pilot study. Both the HUI and the EQ-5D mean scores declined with increasing age and with decreasing self-perceived health. People who reported chronic conditions had lower scores, and people with more severe problems had a larger change in the scores. The HUI3 and EQ-5D scores were moderately correlated (Spearman correlation coefficient of 0.58), as were answers to self-rated health questions (coefficients varying between 0.48 and 0.56).
The instruments' mean scores had reasonable agreement in less healthy respondents and respondents of younger age (16-34) but, in healthier respondents, the mean scores were less similar. Thus, based on these results, it was decided to continue to use the HUI3 for the Canadian National Population Health Surveys.

2.3.3. Comparisons across Indirect Utility Assessment Instruments within Musculoskeletal Diseases

Recently, three studies comparing the various properties of some of the indirect utility instruments in samples with rheumatologic conditions were published.26,27,30 Of these, in only one of the studies did all patients have RA,26 while in the other two studies, patients had a mixture of musculoskeletal diseases. The study conducted exclusively in RA, authored by Russell et al., examined the reliability and responsiveness of the SF-36, SF-6D, EQ-5D, SG, the modified HAQ, and a pain VAS in two groups of RA patients (Group 1 consisted of 24 patients with stable RA and Group 2 consisted of 60 patients beginning infliximab therapy). Patients in Group 2 were assessed prior to being initiated on infliximab therapy and after 14 weeks of infliximab treatment. Test-retest reliability was estimated for each instrument in the stable patient group using the ICC, whereas responsiveness was assessed using the paired t-test, effect size (ES) and standardized response mean (SRM). For all the measures, the ICC ranged from 0.50 (role emotional domain of the SF-36) to 0.92 (physical functioning domain of the SF-36). The preference-based measures had moderate reliability (ICCs of: EQ-5D 0.66, SF-6D 0.72, SG 0.73). However, the sample from which these results were derived was very small (n=24) and thus these estimates may not be robust. In terms of responsiveness, for Group 2, all the overall scores and domain scores of the SF-36 detected significant changes from baseline to the second measurement.
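The two standardized responsiveness statistics named here can be sketched briefly. The definitions assumed are the conventional ones (ES as mean change divided by the SD of baseline scores, SRM as mean change divided by the SD of the change scores); the function names and the sample scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical utility scores before and after treatment.
before = [0.4, 0.5, 0.6]
after = [0.5, 0.7, 0.9]
print("ES: ", effect_size(before, after))
print("SRM:", standardized_response_mean(before, after))
```

Because both statistics divide by a standard deviation, an instrument with a wide spread of scores can show a larger mean change than a narrower instrument and still yield a smaller ES or SRM.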
Standardized response means (SRM) and ES were the largest for the pain VAS, the EQ-5D VAS, the SF-36 physical component scores, and the SF-36 vitality domain. In terms of the preference-based measures, the SRM and ES values were 0.67 and 0.64 for the EQ-5D, 1.40 and 0.87 for the SF-6D, and 0.49 and 0.43 for the SG. Despite the fact that the change described by the EQ-5D system was twice that described by the SF-6D, its responsiveness statistics were much smaller, mainly due to the larger SD of the baseline and change scores of the EQ-5D. The authors concluded that the SF-6D might be preferable to the EQ-5D in measuring clinically relevant improvement in RA.

Conner-Spady et al. assessed the interchangeability of preference-based, indirect utility assessment instruments (the EQ-5D, HUI3, and the SF-6D) in a longitudinal study. One hundred and sixty-one patients (of the 252 initially approached) with at least one of several rheumatological conditions (51% had RA, 19% had low back pain, 14% had knee osteoarthritis, 12% had fibromyalgia and 3% had psoriatic arthritis) participated in the baseline questionnaire and 98 patients had data both at baseline and 12 months later. Of the 98 patients. The mean scores (SD) of the instruments at baseline were 0.49 (0.31), 0.50 (0.54) and 0.62 (0.14) for the EQ-5D, HUI3 and SF-6D, respectively. Distributions of the three measures were very different, with the EQ-5D having a bimodal distribution with two gaps, one between 0.28 and 0.50 and another between 0.88 and 1.00. The HUI3 score distribution was more continuous, with a wide range from -0.21 to 1.00. The SF-6D had a normal-appearing distribution. Of the three instruments, it appeared that the EQ-5D had some ceiling effects when compared to the other two instruments.
For specific domains that were analogous across the instruments, >97% reported decrements on the pain domains across all instruments, 42% (HUI3) to 77% (SF-6D) reported impairment for mental health, and 52% (HUI3) to 98% (SF-6D) reported impairments on mobility, ambulation, or physical functioning. Responsiveness of the three instruments from baseline to 12 months was assessed using a self-reported change question (dividing the group into "better", "same" and "worse" subgroups) and the ES. After 12 months, 41% reported their health to be better, 31% the same and 28% worse compared with baseline. A repeated measures ANOVA showed a significant tool effect, with significantly higher SF-6D scores, and a significant tool by time by group interaction, with the EQ-5D scores showing a significantly greater mean improvement than the other two instruments (changes of 0.15, 0.07 and 0.05 for the EQ-5D, HUI3, and the SF-6D, respectively). For the group reporting their health to be "worse", the EQ-5D showed a significantly greater mean decrease (0.19) than either the HUI3 (0.05) or the SF-6D (0.03). For the "better" and "same" groups, there were no significant differences in the ES between the instruments. However, for the "worse" group, the EQ-5D had a significantly larger ES than the HUI3 and SF-6D. The authors concluded that the instruments, although measuring a similar underlying construct, were not interchangeable and could result in substantially different estimates if used in a cost-utility analysis. The main limitation of this study was the inclusion of several disease states, which may have influenced different domains covered by the various instruments. Thus, it was difficult to separate out the performances of the instruments in any particular disease state.

Finally, the EQ-5D was compared to the HUI3 in a sample of patients with rheumatic diseases in Singapore.
Specifically, the authors compared overall utility scores, test-retest reliability, and construct validity of these instruments in 114 patients with rheumatic diseases (49 had RA, 31 had lupus, and 24 had osteoarthritis).27 Test-retest reliability was assessed using ICC values. Construct validity of the instruments was assessed by dichotomizing SF-36 scores, pain VAS scores, tender points, and the number of other acute/chronic conditions at their median values and conducting t-tests and Mann-Whitney U tests on the HUI3 or EQ-5D scores in each of these groups (i.e., mean HUI3 scores in those above and below the median of the SF-36 would be compared). Agreement between responses on analogous domains of the instruments was also examined. The test-retest reliability of the EQ-5D was 0.64 compared to 0.75 for the HUI3. The means (SD) of the preference-based scores were 0.75 (0.21) for the EQ-5D and 0.76 (0.17) for the HUI3. Correlation between the two instruments' baseline scores was 0.45. The EQ-5D system classified patients into 16 unique health states whereas the HUI3 system classified patients into 72 states. For the pain dimensions of the two instruments, 78% reported deficits on the EQ-5D while 90% reported decrements in this domain on the HUI3. As expected for both instruments, patients classified as having worse health status had lower scores than those classified as having better health status (by all of the criteria). Correlations between the two preference-based measures and the SF-36 domain scores ranged from 0.23 to 0.55 for the EQ-5D and from 0.29 to 0.49 for the HUI3 (with the highest correlation for both instruments being with the "bodily pain" domain). The authors concluded that the EQ-5D and the HUI3 performed equally well in assessing HRQL although they measured different dimensions. As with the previous study, the inclusion of multiple disease states makes interpretation of the results difficult.
In addition, despite collecting longitudinal data, responsiveness was not assessed.

2.4 QUALITY WEIGHTINGS IN THE ESTIMATION OF QALYS IN COST-UTILITY ANALYSES IN RA: WHAT ARE INVESTIGATORS USING?

With the availability of new, effective and costly pharmacotherapeutic interventions for RA, economic evaluations of these therapies are becoming more common.65-79 Increasingly, the methodology utilized to conduct these analyses falls under the cost-utility framework and an incremental cost per QALY is often calculated.2,67,68,71-74,77,79,80 This approach is supported by the publication of a recent consensus-based reference case for economic evaluations of programs or interventions in the management of RA. Recommendations outlined in the consensus document advocate the use of QALYs as outcome measures but also state that disease-specific measures could be considered. The consensus document also attempts to address which sources should be utilized for QALY weightings and states that both direct and indirect (specifically naming the EQ-5D and the HUI3) methods are acceptable to utilize.

As outlined in previous sections of this chapter, the use of different quality weighting sources in the estimation of QALYs across economic evaluations (even within the same disease area) could lead to very different estimates of the incremental number of QALYs and, therefore, of the incremental cost-effectiveness (or cost-utility) ratio.26,30,43,52,60 Although the magnitude of this potential problem has not been directly explored in an actual economic evaluation of a therapy or intervention for RA, Suarez-Almazor and Conner-Spady utilized a hypothetical intervention and results from small surveys conducted with the EQ-5D, RS, TTO and SG techniques in the general public (n=104), patients with RA (n=51) and health professionals (n=43).
Significant differences were found between the scores achieved on the different preference-based methods by technique and by the sample that was surveyed. As such, the incremental cost per QALY calculated using these weights for a hypothetical intervention in RA ranged from $40,000 to $220,000. Therefore, the question arises as to what investigators have been using as preference weights for the calculation of QALYs in economic evaluations of RA. For example, if standardization has already occurred through the mutual yet independent selection of an instrument or technique that appears to be the best suited to measure elements that are germane to RA, then there may be little or no problem in this regard. However, if several different weightings are applied in economic evaluations in the literature without an attempt to standardize the outcomes, the results would be very difficult to compare. As such, Table 2.1 was compiled to examine the different sources for QALY weights that appear in economic evaluations of RA interventions. As can be seen from Table 2.1, sources for weighting come from both directly assessed and indirectly assessed preference measures. The SG is the most commonly applied utility weighting, being used four times in economic evaluations. The RS and EQ-5D were the next most commonly used instruments, followed by the TTO. The SF-6D and the HUI systems have not yet been applied in economic evaluations of interventions or treatments for RA.

2.5 SUMMARY

The QALY is the preferred measure used to integrate both years of life and health-related quality of life into the effectiveness measure in economic evaluations. The quality weightings for QALYs are based upon HRQL measures anchored at 0 (dead) and 1.00 (full health).
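To make the dependence of the cost per QALY on the choice of quality weights concrete, the following is a minimal sketch under simple assumptions: QALYs are computed as the sum of utility weight times duration, with no discounting, and the incremental cost-effectiveness ratio as incremental cost divided by incremental QALYs. The costs, utilities and instrument labels are hypothetical and do not come from any study cited here.

```python
def qalys(profile):
    """Sum of utility weight x years over a list of (weight, years) segments."""
    return sum(weight * years for weight, years in profile)


def icer(delta_cost, delta_qalys):
    """Incremental cost per QALY gained."""
    if delta_qalys == 0:
        raise ValueError("incremental QALYs are zero; the ratio is undefined")
    return delta_cost / delta_qalys


# Hypothetical one-year comparison: same incremental cost, but the utility
# gain attributed to treatment depends on which instrument supplies the weights.
incremental_cost = 10_000.0
utility_gain_by_instrument = {"instrument A": 0.15, "instrument B": 0.05}
comparator_utility = 0.50

for name, gain in utility_gain_by_instrument.items():
    delta = qalys([(comparator_utility + gain, 1.0)]) - qalys([(comparator_utility, 1.0)])
    print(name, round(icer(incremental_cost, delta)))
```

With the incremental cost held fixed, a threefold difference in the measured utility gain produces a threefold difference in the cost per QALY, which is why the source of the weights matters when results are compared across studies.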
Preference-based weightings for the estimation of QALYs are generally recommended and can be either directly elicited or indirectly elicited by using a health classification and scoring system. Techniques to directly elicit preferences include the RS, SG, and TTO methods. Indirect measurement techniques that are widely used include the HUI2, HUI3, EQ-5D, and the SF-6D. The application of the preference-based, indirect measurement techniques is expanding due to their ease and low cost of administration when compared to the directly elicited techniques. In the literature, there have been a number of recent studies comparing the indirect preference measurement systems in a variety of disease states. Results from these studies suggest that although the instruments perform reasonably well on their own in terms of feasibility, reliability, validity and responsiveness, there are important differences among them. These differences could result in significant variation in the calculation of incremental cost-effectiveness ratios when different instruments are applied as the quality weights in the estimation of QALYs. In RA, there has been little work comparing the properties of the various indirect, preference-based, utility assessment instruments. The published economic evaluations that have included the QALY as an outcome measure make use of several different weighting sources (RS, TTO, SG, and the EQ-5D), making comparisons of outcomes between studies difficult. In addition, there have been no studies within RA that determine the potential impact of using different indirect sources of QALY weightings on the outcomes of economic evaluations. As such, additional work is required in these areas.

2.6 REFERENCES

1. Neumann PJ, Goldie SJ, Weinstein MC. Preference-based measures in economic evaluation in health care. Annu Rev Public Health 2000;21:587-611.
2.
Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
3. Choi HK, Hernan MA, Seeger SD, Robins JM, Wolfe F. Methotrexate and mortality in patients with rheumatoid arthritis: A prospective study. Lancet 2002;359:1173-7.
4. Progress in cancer control over the past few decades. National Cancer Institute of Canada. Accessed on the Internet, January 25, 2004 at http://www.ncic.cancer.ca/ncic/internet/standard/0,3621,84658243_85787780_91036035_langld-en,00.html
5. Hogg RS, Heath KV, Yip B, Craib KJ, O'Shaughnessy MV, Schechter MT, Montaner JS. Improved survival among HIV-infected individuals following initiation of antiretroviral therapy. JAMA 1998;279:450-454.
6. Dolan P. The measurement of health-related quality of life for use in resource allocation decisions in health care. Chapter 32. In: Handbook of Health Economics, Vol. 1. Edited by Culyer AJ, Newhouse JP. London, U.K. Elsevier Science 2000.
7. Tengs TO, Wallace A. One thousand health-related quality of life estimates. Med Care 2000;38:583-637.
8. Baker R, Robinson A. Responses to standard gambles: are preferences 'well constructed'? Health Econ 2004;13:37-48.
9. Dolan P, Stalmeier P. The validity of time trade-off values in calculating QALYs: constant proportional time trade-off versus the proportional heuristic. J Health Econ 2003;22:445-58.
10. Hawthorne G, Richardson J, Day NA. A comparison of the Assessment of Quality of Life (AQoL) with four other generic utility instruments. Ann Med 2001;33:358-70.
11. Canadian Coordinating Office for Health Technology Assessment. Guidelines for Economic Evaluation of Pharmaceuticals, Canada. 2nd edition. Ottawa: The Canadian Coordinating Office for Health Technology Assessment (CCOHTA), 1997.
12. Horsman J, Furlong W, Feeny D, Torrance G.
The Health Utilities Index (HUI®): concepts, measurement properties and applications. Health and Quality of Life Outcomes 2003;1:54 (available from http://hqlo.com/content/1/1/54).
13. Torrance GW, Feeny DH, Furlong WJ, Barr RD, Zhang Y, Wang Q. Multiattribute utility function for a comprehensive health status classification system: Health Utilities Index Mark 2. Med Care 1996;34:702-722.
14. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, DePauw S, Denton M, Boyle M. Multiattribute and single-attribute utility functions for the Health Utilities Index Mark 3 system. Med Care 2002;40:113-128.
15. Feeny DH, Torrance G, Furlong WJ. Chapter 26: Health Utilities Index. In: Quality of Life and Pharmacoeconomics in Clinical Trials, Second Edition, edited by B. Spilker, Lippincott-Raven Publishers, Philadelphia, 1996:239-251.
16. Feeny D, Furlong W, Barr RD. A comprehensive multiattribute system for classifying the health status of survivors of childhood cancer. J Clin Oncol 1992;10:923-928.
17. Maddigan SL, Feeny DH, Johnson JA for the DOVE Investigators. Construct validity of the RAND-12 and the Health Utilities Index Mark 2 and 3 in type 2 diabetes. Qual Life Res 2004 (in press).
18. Blanchard C, Feeny D, Mahon JL, Bourne R, Rorabeck C, Stitt L, Webster-Bogaert S. Is the Health Utilities Index responsive in total hip arthroplasty patients? J Clin Epidemiol 2003;56:1046-1054.
19. Pickard AS. Responsiveness of generic health status measures in stroke. Doctor of Philosophy thesis. University of Alberta. 2002.
20. von Neumann J, Morgenstern O. Theory of games and economic behaviour. Princeton NJ: Princeton University Press, 1944.
21. Keeney RL, Raiffa H. Decisions with multiple objectives: Preferences and value tradeoffs. 2nd ed. New York, NY: Cambridge University Press, 1993.
22. Kopec JA, Willison KD.
A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
23. Dominick KL, Ahern FM, Gold CH, Heller DA. Health related quality of life among older adults with arthritis. Health and Quality of Life Outcomes 2004;2:5 (available at www.hqlo.com/content/2/1/5)
24. Salaffi F, Stancati A, Carotti M. Responsiveness of health status measures and utility-based methods in patients with rheumatoid arthritis. Clin Rheumatol 2002;21:478-487.
25. Yelin E, Trupin L, Katz P, Lubeck D, Rush S, Wanke L. Association between etanercept use and employment outcomes among patients with rheumatoid arthritis. Arthritis Rheum 2003;48:3046-3054.
26. Russell AS, Conner-Spady B, Mintz A, Mallon C, Maksymowych WP. The responsiveness of generic health status measures as assessed in patients with rheumatoid arthritis receiving infliximab. J Rheumatol 2003;30:941-947.
27. Luo N, Chew LH, Fong KY, Koh DR, Ng SC, Yoon KH, Vasoo S, Li SC, Thumboo J. A comparison of the EuroQol-5D and the Health Utilities Index Mark 3 in patients with rheumatic disease. J Rheumatol 2003;30:2268-2274.
28. Grootendorst P, Feeny D, Furlong W. Health Utilities Index Mark 3: evidence of construct validity for stroke and arthritis in a population health survey. Med Care 2000;38:290-299.
29. Samsa G, Edelman D, Rothman M, Williams GR, Lipscomb J, Matchar D. Determining clinically important differences in health status measures. A general approach with illustrations to the Health Utilities Index Mark II. Pharmacoeconomics 1999;15:141-155.
30. Conner-Spady B, Suarez-Almazor ME. Variation in the estimation of quality adjusted life-years by different preference-based instrument. Med Care 2003;41:791-801.
31. Brooks R. EuroQol: the current state of play. Health Policy 1996;37:53-72.
32. Coons SJ, Rao S, Keininger DL, Hays RD. A comparative review of generic quality of life instruments.
Pharmacoeconomics 2000;17:13-35.
33. Brooks R, Rabin R, de Charro F (eds.). The measurement and valuation of health status using EQ-5D: A European perspective (Evidence from the EuroQoL BIOMED Research Programme). Kluwer Academic Publishers, Netherlands, 2003.
34. Dolan P. Modeling valuations for the EuroQol health states. Med Care 1997;35:1095-1108.
35. Johnson JA, Coons SJ, Ergo A, Szava-Kovats G. Valuation of EuroQol (EQ-5D) health states in an adult US sample. Pharmacoeconomics 1998;13:421-433.
36. Dorman PJ, Slattery J, Farrell B, Dennis MS, Sandercock PA. A randomised comparison of the EuroQol and Short Form-36 after stroke. United Kingdom collaborators in the International Stroke Trial. BMJ 1997;315:461.
37. Pickard AS, Weijnen ThJG, Niewenhuizen MGM, Johnson JA, de Charro FTh. A comparison of Canadian and European VAS-based valuations of EQ-5D health states (abstract). Can J Clin Pharmacol 2001;8:23.
38. Wolfe F, Hawley DJ. Measurement of the quality of life in rheumatic disorders using the EuroQol. Br J Rheumatol 1997;25:675-682.
39. Johnson JA, Pickard AS. Comparison of the EQ-5D and SF-12 health surveys in a general population survey in Alberta, Canada. Med Care 2000;38:115-121.
40. Hollingworth W, Mackenzie R, Todd CJ, Dixon AK. Measuring changes in quality of life following magnetic resonance imaging of the knee: SF-36, Euroqol or Rosser index? Qual Life Res 1995;4:325-334.
41. Essink-Bot M-L, Krabbe PFM, Bonsel GJ, Aaronson NK. An empirical comparison of four generic health status measures: The Nottingham Health Profile, the Medical Outcomes Study 36-item Short Form Health Survey, the COOP/WONCA charts and the EuroQol instrument. Med Care 1997;35:522-537.
42. Jenkinson C, Stradling J, Petersen S. How should we evaluate health status? A comparison of three methods in patients presenting with obstructive sleep apnoea. Qual Life Res 1998;7:95-100.
43.
Longworth L, Bryan S. An empirical comparison of EQ-5D and SF-6D in liver transplant patients. Health Econ 2003;12:1061-1067.
44. Hurst NP, Jobanputra P, Hunter M, Lambert M, Lochhead A, Brown H. Validity of Euroqol--a generic health status instrument--in patients with rheumatoid arthritis. Economic and Health Outcomes Research Group. Br J Rheumatol 1994;33:655-662.
45. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: Validity, responsiveness and reliability of the EuroQol (EQ-5D). Br J Rheumatol 1997;36:551-559.
46. Brazier J, Usherwood T, Harper R, Thomas K. Deriving a preference-based single index from the UK SF-36 Health Survey. J Clin Epidemiol 1998;51:1115-1128.
47. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271-292.
48. Ware JE, Sherbourne CD. The MOS 36-item Short Form Health Survey (SF-36). Med Care 1992;30:473-483.
49. Walters SJ, Brazier JE. What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health and Quality of Life Outcomes 2003;1:4 (available at http://www.hqlo.com/content/1/1/4)
50. Schackman BR, Goldie SJ, Freedberg KA, Losina E, Brazier J, Weinstein MC. Comparison of health state utilities using community and patient preference weights derived from a survey of patients with HIV/AIDS. Med Decis Making 2002;22:27-38.
51. O'Brien BJ, Spath M, Blackhouse G, Severens JL, Dorian P, Brazier J. A view from the bridge: agreement between the SF-6D utility algorithm and the Health Utilities Index. Health Econ 2003;12:975-981.
52. Neumann PJ, Sandberg EA, Araki SS, Kuntz KM, Feeny D, Weinstein MC. A comparison of HUI2 and HUI3 utility scores in Alzheimer's disease. Med Decis Making 2000;20:413-422.
53.
Maddigan SL, Feeny DH, Johnson JA for the DOVE Investigators. Construct validity of the RAND-12 and Health Utilities Index Mark 2 and 3 in type 2 diabetes. Qual Life Res (in press).
54. Maddigan SL, Feeny DH, Johnson JA for the DOVE Investigators. A comparison of the Health Utilities Indices Mark 2 and Mark 3 in type 2 diabetes. Med Decis Making 2003;23:489-501.
55. Blanchard C, Feeny D, Mahon JL, Bourne R, Rorabeck C, Stitt L, Webster-Bogaert S. Is the Health Utilities Index responsive in total hip arthroplasty patients? J Clin Epidemiol 2003;56:1046-1054.
56. Feeny D, Blanchard C, Mahon JL, Bourne R, Rorabeck C, Stitt L, Webster-Bogaert S. Comparing community-preference-based and direct standard gamble utility scores: evidence from elective total hip arthroplasty. Inter J Technol Assess 2003;19:362-372.
57. Feeny D, Furlong W, Saigal S, Sun J. Comparing directly measured standard gamble scores to HUI2 and HUI3 utility scores: group- and individual level comparisons. Soc Sci Med 2004;58:799-809.
58. Glick HA, Polsky D, Willke RJ, Schulman KA. A comparison of preference assessment instruments used in a clinical trial: Responses to the visual analog scale from the EuroQol EQ-5D and the Health Utilities Index. Med Decis Making 1999;19:265-275.
59. Bosch JL, Hunink MGM. Comparison of the Health Utilities Index Mark 3 (HUI3) and the EuroQol EQ-5D in patients treated for intermittent claudication. Qual Life Res 2000;9:591-601.
60. Siderowf A, Ravina B, Glick HA. Preference-based quality-of-life in patients with Parkinson's disease. Neurology 2002;59:103-108.
61. Oostenbrink R, Moll HA, Essink-Bot ML. The EQ-5D and the Health Utilities Index for permanent sequelae after meningitis. A head to toe comparison. J Clin Epidemiol 2002;55:791-799.
62. Lubetkin EI, Golde MR. Areas of decrement in health-related quality of life (HRQL): Comparing the SF-12, EQ-5D and HUI3.
Qual Life Res 2003;12:1059-1067.
63. Macran S, Weatherly H, Kind P. Measuring population health. A comparison of three generic health status measures. Med Care 2003;41:218-231.
64. Statistics Canada. A head-to-head comparison of two generic health status measures in the household population: McMaster Health Utilities Index (Mark 3) and the EQ-5D. Ottawa: Statistics Canada, 2003.
65. Anis AH, Tugwell PX, Wells GA, Stewart DG. A cost-effectiveness analysis of cyclosporine in rheumatoid arthritis. J Rheumatol 1996;23:609-616.
66. Kavanaugh A, Heudebert G, Cush J, Jain R. Cost evaluation of novel therapeutics in rheumatoid arthritis (CENTRA): a decision analysis model. Semin Arthritis Rheum 1996;25:297-307.
67. Verhoeven AC, Bibo JC, Boers M, Engel GL, van der Linden SJ. Cost-effectiveness and cost-utility of combination therapy in early rheumatoid arthritis: randomized comparison of combined step-down prednisolone, methotrexate and sulphasalazine with sulphasalazine alone. Br J Rheumatol 1998;37:1102-1109.
68. Maetzel A, Strand V, Tugwell P, Wells G, Bombardier C. Cost effectiveness of adding leflunomide to a 5-year strategy of conventional disease-modifying antirheumatic drugs in patients with rheumatoid arthritis. Arthritis Rheum 2002;47:655-661.
69. Choi HK, Seeger JD, Kuntz KM. A cost-effectiveness analysis of treatment options for patients with methotrexate-resistant rheumatoid arthritis. Arthritis Rheum 2000;43:2316-2327.
70. Choi HK, Seeger JD, Kuntz KM. A cost-effectiveness analysis of treatment options for methotrexate-naive rheumatoid arthritis. J Rheumatol 2002;29:1156-1165.
71. Wong JB, Singh G, Kavanaugh A. Estimating the cost-effectiveness of 54 weeks of infliximab for rheumatoid arthritis. Am J Med 2002;113:400-408.
72. Kobelt G, Jonsson L, Young A, Eberhardt K.
The cost-effectiveness of infliximab (Remicade) in the treatment of rheumatoid arthritis in Sweden and the United Kingdom based on the ATTRACT study. Rheumatology 2003;42:326-335.
73. Brennan A, Bansback NJ, Reynolds A, Conway P. Modeling the cost-effectiveness of etanercept in adults with rheumatoid arthritis in the UK. Rheumatology 2004;43:62-72.
74. Kobelt G, Eberhardt K, Geborek P. TNF inhibitors in the treatment of rheumatoid arthritis in clinical practice: Costs and outcomes in a follow-up study of patients with RA treated with etanercept or infliximab in southern Sweden. Ann Rheum Dis 2004;63:4-10.
75. Marra CA, Esdaile JM, Anis AH. Practical pharmacogenetics: The cost-effectiveness of screening for thiopurine s-methyltransferase polymorphisms in patients with rheumatological conditions treated with azathioprine. J Rheumatol 2002;36:1851-1855.
76. Oh KT, Anis AH, Bae SC. Pharmacoeconomic analysis of thiopurine methyltransferase polymorphism screening by polymerase chain reaction for treatment with azathioprine in Korea. Rheumatology 2004;43:156-163.
77. Spiegel MRB, Targownik L, Dulai GS, Gralnek IM. The cost-effectiveness of cyclo-oxygenase-2 selective inhibitors in the management of chronic arthritis. Ann Intern Med 2003;138:795-806.
78. Lee KK, You JH, Ho JT, Suen BY, Yung MY, Lau WH, Lee WV, Sung JY, Chan FK. Economic analysis of celecoxib versus diclofenac plus omeprazole for the treatment of arthritis in patients at risk of ulcer disease. Aliment Pharmacol Ther 2003;18:217-222.
79. Maetzel A, Krahn M, Naglie G. The cost effectiveness of rofecoxib and celecoxib in patients with osteoarthritis or rheumatoid arthritis. Arthritis Rheum 2003;49:283-292.
80. Bae S-C, Corzillius M, Kuntz KM, Liang MH. Cost-effectiveness of low dose corticosteroids versus non-steroidal anti-inflammatory drugs and COX-2 specific inhibitors in the long-term treatment of rheumatoid arthritis.
Rheumatology 2003;42:46-53.
81. Maetzel A, Tugwell P, Boers M, Guillemin F, Coyle D, Drummond M, Wong JB, Gabriel SE on behalf of the OMERACT 6 Economics Research Group. Economic evaluation of programs or interventions in the management of rheumatoid arthritis: Defining a consensus based reference case. J Rheumatol 2003;30:891-896.
82. Suarez-Almazor ME, Conner-Spady B. Rating of arthritis health states by patients, physicians, and the general public. Implications for cost-utility analysis. J Rheumatol 2001;28:648-656.

TABLE 2.1: SOURCE OF PREFERENCES USED FOR QALY WEIGHTS IN ECONOMIC EVALUATIONS OF RA

Columns: Preference-Elicitation Technique | DMARD | NSAID | Decision-analysis | Clinical or Observational Trial | Reference #

Direct
  RS      check marks: 2 (column positions not recoverable)    References: 67,68,71,79
  TTO     check marks: none recoverable                        References: 77,80
  SG      check marks: 2 (column positions not recoverable)    References: 67,68,79,80
Indirect
  EQ-5D   check marks: 1 (column position not recoverable)     References: 72,73,74
  SF-6D   (no entries)
  HUI2    (no entries)
  HUI3    (no entries)

"DMARD" refers to a study examining the cost-effectiveness of a disease-modifying antirheumatic drug; "NSAID" refers to a study examining the cost-effectiveness of a traditional NSAID or COX-2 specific inhibitor; "Decision analysis" refers to the methodology used to perform the economic evaluation; "Clinical or Observational Trial" refers to whether the economic evaluation was conducted alongside a clinical or observational trial; "RS" = rating scale; "TTO" = time tradeoff; "SG" = standard gamble; "EQ-5D" = EuroQol index score; SF-6D = Short-Form 6D index score; HUI2 = Health Utilities Index Mark 2 index score; HUI3 = Health Utilities Index Mark 3 index score.

CHAPTER 3

A COMPARISON OF FOUR INDIRECT METHODS OF ASSESSING UTILITY VALUES IN RHEUMATOID ARTHRITIS

3.1 FOREWORD

This chapter is a cross-sectional comparison of four indirect utility instruments (HUI2, HUI3, SF-6D, EQ-5D) in a sample of patients with rheumatoid arthritis. The content of this chapter has been accepted for publication in Medical Care.
The candidate is first author on this manuscript, developed the hypotheses, entered and manipulated the data, performed the statistical analyses, and wrote the final manuscript. Co-authors of the study included Daphne Guh, a statistician; Drs. Andy Chalmers and Barry Koehler, rheumatologists who participated in the recruitment of patients; Dr. John Brazier, the developer of the SF-6D; and Drs. Anis, Esdaile, and Kopec, members of the supervisory committee.

3.2 INTRODUCTION

To integrate quality of life into economic analyses, the effectiveness of a health intervention is measured using a metric known as "utility", where values of zero and 1.0 equal death and perfect health, respectively (some measures permit values less than zero for health states ranked worse than death). Utilities are used to calculate quality-adjusted life years (QALYs) gained by adjusting survival by the average utility weight derived from the outcome of that health intervention. Cost per QALY gained is a unique and preferred measure of the economic value of different interventions because it permits comparison both within and across disease groups, thereby facilitating funding allocation decisions [1]. A variety of methods exist for measuring health-related quality of life [2]. However, in order to integrate such measures into an economic evaluation, the common approach is to use QALYs as the outcome and preference-based assessments (often referred to as "utilities") as the source of weightings to assign quality to life-years [1]. The use of a pre-scaled index is often the most convenient and least expensive means of achieving this approach. While no validated index is available for economic evaluations specifically in musculoskeletal disease, several generic preference measures appear suitable for adaptation to economic evaluations in RA [3,4]. Examples of these instruments include the Health Utilities Index Mark 2 and 3 (HUI2 and HUI3), the Short Form 6D (SF-6D), and the EuroQol (EQ-5D).
The major characteristics of these instruments are summarized in Table 3.1, and comprehensive reviews of these instruments are available elsewhere [1,5]. It is important to point out that there is no "gold standard" among these instruments and each likely has its own advantages and disadvantages. Although a few studies have examined the appropriateness of individual indirect utility instruments specifically in RA [6-9], no study has directly compared these measures in the same RA population. However, Conner-Spady et al. recently reported on the interchangeability of preference-based instruments (the EQ-5D, the SF-6D, and the HUI3) in providing weights for QALYs [10]. Specifically, they compared the global utility scores in a sample of 161 patients with five different musculoskeletal conditions. With the increased popularity of economic evaluations of new therapies and programmes, the impact of the choice of utility measure used to weight QALYs is uncertain. It is important to evaluate these instruments in terms of their agreement and also to identify specific deficits of preference-based measures in RA. Thus, the primary objectives of this study were 1) to compare the global utility scores from the HUI2, HUI3, EQ-5D and the SF-6D at both a sample level and within individuals in a clinically heterogeneous sample of RA patients; and 2) to determine the extent to which global utility scores from the indirect utility assessment instruments were representative of dimensions of health status measured in a sample of RA patients.

3.3 METHODS

Three hundred and thirteen individuals participated in the study.
In order to be included in the study, subjects had to have a rheumatologist-confirmed diagnosis of RA (as defined by the American College of Rheumatology diagnostic criteria) [11], receive rheumatology care within the province of British Columbia in one of the urban study areas (Vancouver, Richmond) or one of the rural study areas (Vernon and Penticton), consent to answering the questionnaires, and be sufficiently proficient in English to answer the questionnaires. Recruitment of RA patients began in October 2001 and ended in September 2002. Ethical approval for this study was obtained through the University of British Columbia's Behavioural Ethics Committee, and informed consent was obtained from each of the participants. Eight private practice rheumatologists' offices from the study areas referred subjects as part of their interactions in routine clinical practice. In addition, two of these rheumatologists' practices sent letters to all of their patients with RA inviting them to participate in the survey. All patient questionnaires were self-administered, self-completed and submitted via mail. The study physicians' offices supplied additional information from the patients' health records.

3.3.1. Measures

3.3.1.1 Clinical

Participants were asked questions regarding their RA and medication history (including recent adverse events). Other self-reported clinical variables included swollen and tender joint count (using the mannequin-based 42 joint count methodology) [12], a 10 cm pain visual analogue scale (VAS) and patient global assessment of disease activity (10 cm VAS). Erythrocyte sedimentation rate (ESR) values closest to the date of completion of the questionnaire (within 1 month) were extracted from the patient's chart for those patients whose rheumatologist used this measure for patient monitoring. In addition, the attending rheumatologists were asked to complete a physician global assessment of disease activity (10 cm VAS) for each patient.
3.3.1.2 Questionnaires

Respondents self-completed three questionnaires allowing for the scoring of four indirect utility assessment instruments (the HUI2, the HUI3, the EQ-5D, and the SF-6D).

3.3.1.3 Hypotheses

Since all the instruments purport to measure the same construct (namely, a global utility value), the values obtained with the different instruments should, in theory, agree within individual subjects. However, since each instrument has been constructed (in terms of domains assessed) and valued in different ways, we hypothesized that there would be significant differences between the instruments. In addition, we hypothesized that the global utility scores achieved with the different MAUT instruments would be represented by different dimensions of health status.

3.3.2. Data Analysis

Descriptive statistics were used to characterize the study sample. Repeated measures ANOVA was used to compare global utility values across instruments, with Bonferroni's correction to adjust for multiple comparisons between instruments. Due to the skewed distribution of some of the instruments' global utility scores, nonparametric tests were also applied. Since the results using both approaches agreed, only the results of the parametric tests are reported here. Agreement among the utility scores obtained from the four instruments was assessed using the Intraclass Correlation Coefficient (ICC) with a two-way mixed effect model such that the subject effect is random and the measure effect is fixed [13]. Bland-Altman plots were used to examine patterns of inter-instrument agreement between every possible combination of instruments [14]. These plots are useful for revealing a relationship between the differences and the averages, for detecting any systematic bias, and for identifying possible outliers.
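The two agreement statistics named above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the ICC is computed in its single-measure, consistency form for a two-way mixed model (often written ICC(3,1)), which is one of several variants compatible with the description given here, and the Bland-Altman limits are the conventional ones centred on the mean difference. The function names and the example utilities are hypothetical.

```python
from statistics import mean, stdev


def icc_consistency(scores):
    """ICC(3,1): two-way mixed model, single measure, consistency definition.

    `scores` is a list of rows, one row per subject and one column per instrument.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between instruments
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired scores a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical utilities for five subjects on two instruments (illustration only).
instrument_x = [0.85, 0.70, 0.55, 0.40, 0.20]
instrument_y = [0.88, 0.75, 0.62, 0.52, 0.41]

icc = icc_consistency([list(pair) for pair in zip(instrument_x, instrument_y)])
bias, lower, upper = bland_altman(instrument_x, instrument_y)
print(f"ICC(3,1) = {icc:.2f}")
print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```

In the made-up data the gap between the two instruments widens as utility falls, which is the kind of pattern a plot of differences against averages is meant to expose.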
Since, in theory, these instruments should have perfect agreement as they are attempting to measure the same global utility, the difference scores should be randomly distributed closely around the zero line. By convention, if 95% of the differences fall within zero ± 1.96 times the standard deviation of the mean difference and are not interpreted to be clinically important, the two methods may be used interchangeably. Minimally important differences (MID) in the utility values obtained by the MAUT instruments are thought to be between 0.03 and 0.07 [9,15,16].

Exploratory factor analysis was used to identify the dimensions of health measured by the questions and to determine if similar domains loaded onto the identified dimensions. Raw answers for all of the questions from the SF-6D, the EQ-5D and the HUI2/3 questionnaire were used for the factor analysis. Due to the skewed, categorical nature of the data, techniques based on polychoric correlations were used. Based on the nature of the instruments, it was expected that the factor analysis would identify the following dimensions: functional ability, pain, cognition, hearing/vision/sensation, and mental/emotional health. Unweighted least squares was used as the estimation method for factor extraction. Promax rotation was used as it allows factors to be reasonably correlated. The criteria used to determine the appropriate number of factors consisted of the scree test, an eigenvalue greater than one, the presence of residuals greater than 0.05 between the observed and the reproduced correlation matrices, and the overall interpretation of the solution [17]. The rotated factor pattern matrix, which represents a matrix that is uncontaminated by overlap among factors, was selected for interpretation of the factor solution [17].

Adapting the methods of Richardson and Zumbo [18] to determine the proportion of variation in the global utility scores explained by the factor scores produced by the exploratory factor analysis, each global utility score was regressed onto the saved factor scores. To determine the relative contribution of each explanatory variable (i.e., each factor) to the regression equation, a relative Pratt index score was generated. This index quantifies the contribution each independent variable makes to the overall regression equation by partitioning the model R² into the proportion attributable to each independent variable [19]. One should note that the index is based on a geometric layout of regression and does not make any assumptions regarding the distribution of the variables. Using the modified Pratt index, we were able to determine which aspects of health were relevant to RA patients in the different overall utility scores.

3.4 RESULTS

Three hundred and thirteen (245 female) respondents with confirmed RA completed the baseline questionnaire. One hundred and ninety-seven (63%) patients were recruited directly by the study rheumatologists, whereas 116 were recruited via the mail survey. The completion rates differed for direct recruitment (91%) compared to mail recruitment (38%) after accounting for invalid mailings (address problems (n=69), had died (n=6), reported that they did not have RA (n=3), and already recruited (n=1)). The final sample represented a clinically heterogeneous cohort of patients with RA (Table 3.2). A more detailed description of our cohort is available elsewhere [20].

3.4.1 Comparison of Utility Scores

Summaries of the instrument utility scores are presented in Table 3.3. There were few missing values in our sample for any of the four instruments. The distributions of the utility values obtained from the four instruments were markedly different (Figure 3.1).
The HUI3 global utility score was the lowest of the four instruments for 151 (50%) of the participants, followed by the SF-6D utility score in 96 participants (32%). The HUI2 global utility score was the highest score in 141 participants (47%), followed by the EQ-5D score in 92 (30%) participants. Sixteen participants (5%) scored negative values on at least one measure, with the HUI3 global utility score being negative in 5% (n=15) of participants and the EQ-5D utility index in 1% (n=2). Both were negative in only one participant. For the SF-6D, a total of 223 health states were defined (the most of any of the instruments). The most common SF-6D health vector was '121212' (n=9, 2.9%), followed by '523434' (n=6, 1.9%) and '323323' (n=5, 1.6%). No participant indicated no problems ('111111') or the worst health state ('645655'). A total of 35 different health states were described by the EQ-5D health profile (vectors). The most frequent health vector in the sample for the EQ-5D was '21221', with 51 (16.2%) indicating this response (some problems in mobility and usual activities and moderate pain/discomfort, with no problems in self-care or anxiety/depression). The other most common vectors were '11121' (n=39, 12.4%), '21222' (n=27, 8.6%), '22222' and '22221' (both n=26, 8.3%). Twenty-three (7.3%) indicated no problems ('11111'), whereas the worst health state in the sample was '33331' (n=1, 0.3%). Using the HUI2, a total of 136 health states were defined. The most common health states were '211112' (n=31, 10.2%), where respondents indicated that they required equipment to hear or see or speak and had occasional pain without disruption of normal activities, followed by '111112' (n=12, 3.9%). For the HUI3, a total of 217 health states were defined. Four (1%) respondents indicated no problems ('11111111') and 1% (n=5) indicated no problems except for mild or moderate pain that prevented no activities ('11111112').
The most common health state vectors were '21111112' (n=14, 4.6%), indicating normal vision with glasses and mild or moderate pain preventing no activities, and '21112112' (n=13, 4.2%), indicating normal vision with glasses, mild limitations in the use of hands and fingers, and mild or moderate pain that prevented no activities. The worst health state for the HUI3 system was '51144215' (n=1, 0.3%).

Using repeated measures ANOVA, there were significant differences among the utility scores obtained by the four instruments within individuals. In examining differences between the utility scores obtained from the different instruments, all pairwise comparisons were significant (p<0.005).

3.4.2 Analysis of Agreement

The ICC for all of the measures (HUI2, HUI3, EQ-5D, SF-6D) was 0.67 (95% CI 0.62 to 0.71). The pairwise ICC values are summarized in Table 3.4. The Bland-Altman plots are presented in Figures 3.2-3.7. In general, for all of the plots, there appeared to be more agreement in the higher utility values compared to the lower utility values across instruments. All plots reveal that a substantial proportion of the observations in the lower utility values fall outside the area of zero ± 1.96 times the standard deviation of the differences, and most of these differences between any two instruments exceed the MID.

3.4.3 Exploratory Factor Analysis

Bartlett's Test of Sphericity was used to determine whether the correlation matrix differed from the identity matrix. The results of this test supported the use of factor analysis (p < 0.0001). There were five factors with eigenvalues greater than one; thus, a factor analysis extracting these factors was performed. This analysis accounted for 74% of the variation in the original variables.
Five factors were extracted and interpreted as follows (with strongest loading indicators in brackets): 1) Factor 1 = Physical Functioning (pain, mobility/ambulation, self-care, dexterity, usual activity, physical health, role limitations, social limitations); 2) Factor 2 = Emotional/Mental Health (anxiety, happiness, and mental health); 3) Factor 3 = Speech (speaking); 4) Factor 4 = Cognition (cognition, hearing); and 5) Factor 5 = Vision (vision). The rotated factor pattern matrix showing the individual loadings from the raw questions is shown in Table 3.5. The factor correlations are shown in Table 3.6. With the exception of physical functioning/pain and emotional/mental health, there were no moderate or strong correlations among the factors.

For multiple linear regressions of the global utility scores (dependent variable), the results revealed that the physical functioning/pain factor score contributed the most in explaining the variance in all of the global utility scores. However, for the HUI2 and HUI3, cognition factor scores explained a large proportion of the variance in the global utility scores whereas the emotional/mental factor scores explained a large component of the variance in the SF-6D and EQ-5D utility scores (Table 3.7).

3.5 DISCUSSION

This is the first study to report on the results of administering and comparing the four instruments in a relatively large sample of participants with RA. The low number of missing values for each of the instruments attests to the fact that they are suitable to be self-administered in this type of cohort. There were significant differences in utility scores between instruments. In addition, the level of agreement was much lower than theoretically postulated (i.e., an ICC approaching 1.0), with ICC values ranging from 0.56 to 0.79. From the Bland-Altman plots, agreement was much poorer at lower utility values than at higher utility values.
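To make the agreement bands in the Bland-Altman plots concrete, the following sketch computes, for paired scores from two instruments, the per-patient averages and differences, the mean difference, and a band of ±1.96 standard deviations of the differences. Centering the band on the mean difference follows the usual limits-of-agreement convention and is an assumption here, since the Results describe the band around zero. The threshold of 0.03 is likewise assumed for illustration, and the paired utilities are invented, not study data.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Summarise agreement between two sets of paired scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired observations")
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    averages = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
    bias = mean(differences)
    half_width = 1.96 * stdev(differences)  # sample SD of the differences
    return {
        "averages": averages,          # x-axis of the plot
        "differences": differences,    # y-axis of the plot
        "bias": bias,
        "lower_limit": bias - half_width,
        "upper_limit": bias + half_width,
    }


# Hypothetical paired utilities for five patients (not study data).
hui2 = [0.90, 0.80, 0.70, 0.50, 0.40]
hui3 = [0.88, 0.75, 0.60, 0.30, 0.10]
result = bland_altman(hui2, hui3)

ASSUMED_MID = 0.03  # illustrative threshold for an important difference
exceeding_mid = sum(abs(d) > ASSUMED_MID for d in result["differences"])
print(round(result["bias"], 3), exceeding_mid)
```

In this invented example the differences widen as the average score falls, which is the pattern described for the study plots.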
Also, it would appear that the global utility scores from the various systems are mostly measuring physical functioning/pain; however, the HUI2 and HUI3 are also measuring cognition while the SF-6D and EQ-5D are also measuring emotional/mental health. This finding is not surprising considering that the HUI systems were created to be "within the skin" measures; that is, they are primarily concerned with impairment rather than disability or handicap. On the other hand, the SF-6D is based on the psychometric instrument, the SF-36.²¹ As such, it is a measure of handicap and includes social and role functioning, which are not assessed by the other instruments.

The SF-6D, due to its high lower boundary of +0.30, does not provide a wide range of utility values. This phenomenon is illustrated in the Bland-Altman plot comparing the global utilities from the SF-6D to the EQ-5D. However, this observation is not unexpected, as Brazier stated that this instrument might be most appropriate in "groups experiencing mild to moderate health problems and in those expected to experience comparatively small changes".²¹ Brazier also states that one of the potential advantages of the SF-6D is the much larger size of its predictive system as compared to the EQ-5D. This appears to be the case, as there were 188 more health states defined using the SF-6D compared to the EQ-5D system within our cohort. This may give the SF-6D greater sensitivity to change in longitudinal studies, especially in those who are mildly or moderately impaired.

The EQ-5D appears to have significant ceiling effects, with 21% of the study participants reporting either no problems on all the domains or some problems on one domain only (with the remainder having no problems). More individuals reported having no problems under the EQ-5D classification system than reported the lowest level of deficit on the SF-6D system.
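The ceiling-effect proportion quoted above can be derived from the health-state vectors themselves. The sketch below assumes that each EQ-5D response is stored as a five-character string of levels 1 to 3, and counts respondents who report no problems on every domain or level 2 on exactly one domain with no problems elsewhere. The function name and the sample responses are hypothetical.

```python
def at_or_near_ceiling(vector):
    """True for '11111', or for one domain at level 2 and the rest at level 1."""
    levels = [int(ch) for ch in vector]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError(f"not a valid EQ-5D vector: {vector!r}")
    # No level 3 anywhere, and at most one step away from full health.
    return max(levels) <= 2 and sum(level - 1 for level in levels) <= 1


# Hypothetical responses (not study data).
responses = ["11111", "11121", "21221", "22222", "11131", "21111"]
share = sum(at_or_near_ceiling(v) for v in responses) / len(responses)
print(f"{share:.0%}")
```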
In addition, due to the limited number of health states that were reported by the sample, there was a lack of variability in the responses to the five questions. This lack of variation in the responses and the low number of possible descriptive health states may impede the sensitivity of the EQ-5D in longitudinal studies when compared to the other instruments.

The HUI2 and HUI3 displayed superior agreement to the other instruments as they had the highest pairwise ICC value. However, as shown by the Bland-Altman plot comparing these two instruments, the global utility values defined by them tend to be quite different, especially at the lower end of the utility scale. Both of these systems appear to define sufficient health states to enable them to discriminate small changes between patients or within patients over time, which is in agreement with previous research.²²

There have been comparisons between instruments in other disease states. For example, there have been comparisons between the HUI2 and HUI3 in Alzheimer's disease,¹⁹ the HUI3 and the SF-6D in patients at an elevated risk of sudden cardiac death,²³ and the SF-6D, HUI3, EQ-5D, the Assessment of Quality of Life questionnaire (AQoL), and the Finnish 15D in a sample of Australian residents.²⁴ Generally, the results of these studies agreed with ours in that there were significantly different values achieved both between instruments and within individuals using the different instruments.

There are several limitations to this study. First, the scoring functions for both the EQ-5D and SF-6D global utility values were derived from samples of the UK population, whose preferences may differ from those of people in Canada.²¹,²⁵ Second, although the SF-6D and the HUI systems utilized the standard gamble as a valuation technique for health states described by their attributes, there are several differences in how the health states are valued.
For example, the SF-6D health states were valued directly using the standard gamble, whereas the HUI health states were valued on a rating scale and the values were then transformed (using a power function) into standard gamble utilities. The EQ-5D health states were valued using the time trade-off technique. Also, the HUI systems utilize multi-attribute utility theory and a multiplicative model of scoring whereas the SF-6D and EQ-5D use empiric methods and additive scoring models. Thus, differences between the health utilities obtained may be confounded by the utility evaluation and valuation methodology. In addition, we have studied a single cohort with a specific disease state, thus limiting the generalizability to other diseases.

As we only analyzed cross-sectional data, it is not clear if changes in the utility values obtained from the MAUT instruments would be similar. If these changes are similar across instruments, O'Brien et al. postulate that this would increase the comfort level for inter-study comparability of QALYs.²⁰ Further research examining the longitudinal validity (responsiveness and sensitivity to change) of these MAUT instruments is presently ongoing. The results from this research will further guide which of these instruments is the most appropriate to utilize in the assessment of RA-related outcomes. A recent study comparing the MAUT instruments in a sample from the Australian general population recommended that researchers should select a MAUT instrument that is sensitive to the health states that are being investigated.²⁴

Although it is good practice in quality of life research to randomize the order of administration of questionnaires to enable the evaluation of an "order effect", we did not randomize the sequence of questionnaires. In piloting of the survey in the first 25 patients, we found that patients were not completing the questionnaires in the order that they were presented or in another identifiable systematic pattern.
Thus, a formal assessment of an "ordering effect" was not feasible.

In conclusion, the utility values obtained from the four MAUT instruments were statistically and clinically different. The SF-6D is bounded by a relatively high floor (+0.30) and the EQ-5D displayed important ceiling effects that could limit their ability to detect change in patients at the higher or lower limits of the utility scale. The HUI2 and HUI3 did not appear to suffer from these limitations. It is unlikely that the utility values from the MAUT instruments tested, if used as the weightings for QALYs in studies examining RA, would result in comparable estimates.

3.6 REFERENCES

1. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
2. Streiner DL, Norman GR. Health measurement scales. 2nd edition. Oxford Medical Publications, Oxford. 1995.
3. Green C, Brazier J, Deverill M. Valuing health-related quality of life. Pharmacoeconomics 2000;17:151-165.
4. Coons SJ, Rao S, Keininger DL, Hays RD. A comparative review of generic quality-of-life instruments. Pharmacoeconomics 2000;17(1):13-35.
5. Kopec JA, Willison KD. A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
6. Verhoeven AC, Boers M, van der Linden. Responsiveness of the core set, response criteria, and utilities in early RA. Ann Rheum Dis 2000;59:966-974.
7. Hurst NP, Kind P, Ruta D, et al. Measuring health-related quality of life in rheumatoid arthritis: Validity, responsiveness, and reliability of EuroQOL (EQ-5D). Br J Rheumatol 1997;36:551-559.
8. Wolfe F, Hawley DJ. Measurement of the quality of life in rheumatic disorders using the EuroQOL. Br J Rheumatol 1997;36:786-793.
9. Grootendorst P, Feeny D, Furlong W. Health Utilities Index Mark 3: evidence of construct validity for stroke and arthritis in a population health survey. Med Care 2000;38:290-299.
10. Conner-Spady B, Suarez-Almazor ME. Variation in the estimation of quality-adjusted life-years by different preference-based measures. Med Care 2003;41:791-801.
11. Arnett FC, Edworthy SM, Bloch DA, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315-324.
12. Wong AL, Wong WK, Harker J, et al. Patient self-report tender and swollen joint counts in early rheumatoid arthritis. Western Consortium of Practicing Rheumatologists. J Rheumatol 1999;26:2551-2561.
13. Shrout PE, Fleiss JL. Intraclass Correlations: Uses in assessing rater reliability. Psychological Bulletin 1979;2:420-428.
14. Altman DG, Bland JM. Measurement in medicine: The analysis of method comparison studies. The Statistician 1983;32:307-317.
15. Drummond MF. Introducing economic and quality of life measurements into clinical studies. Ann Med 2001;33:344-349.
16. Samsa G, Edelman D, Rothman ML, et al. Determining clinically important differences in health status measures. A general approach with illustration to the Health Utilities Index Mark II. Pharmacoeconomics 1999;15:141-155.
17. Tabachnick BG, Fidell LS. Using multivariate statistics (3rd Edition). Harper Collins Publishers, Inc. New York, New York. 1996.
18. Richardson CG, Zumbo BD. A statistical examination of the Health Utility Index-Mark III as a summary measure of health status for a general population health survey. Social Indicators Research 2000;51:171-191.
19. Thomas DR, Hughes E, Zumbo BD. On variable importance in linear regression. Social Indicators Research 1998;45:253-275.
20. Marra CA, Woolcott JC, Shojania K, et al. An assessment of the construct validity of four indirect utility measures in rheumatoid arthritis.
Soc Sci Med (submitted).
21. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271-292.
22. Neumann PJ, Sandberg EA, Araki SS, et al. A comparison of HUI2 and HUI3 utility scores in Alzheimer's disease. Med Decis Making 2000;20:413-422.
23. O'Brien BJ, Spath M, Blackhouse G, et al. A view from the bridge: agreement between the SF-6D utility algorithm and the Health Utilities Index. Health Econ 2003;12:975-981.
24. Hawthorne G, Richardson J, Atherton Day N. A comparison of the Assessment of Quality of Life (AQOL) with four other generic utility instruments. Ann Med 2001;33:358-370.
25. Dolan P. Modeling valuations for EuroQol health states. Medical Care 1997;35:1095-1108.

TABLE 3.1: COMPARISON OF THE INDIRECT UTILITY ASSESSMENT INSTRUMENTS

Instrument  Dimensions/Domains/Attributes                  # of Possible   Valuation        Boundaries
                                                           Health States   Technique
HUI2        Sensation (vision, hearing, speech),           24,000          Standard Gamble  -0.03 to 1.00
            Mobility, Emotion, Cognition, Self-care, Pain
HUI3        Vision, Hearing, Speech, Ambulation,           972,000         Standard Gamble  -0.36 to 1.00
            Dexterity, Emotion, Cognition, Pain
SF-6D       Physical Function, Role Limitation, Social     18,000          Standard Gamble  0.30 to 1.00
            Function, Pain, Mental Health, Vitality
EQ-5D       Mobility, Usual Activities, Self-Care, Pain,   243             Time Trade Off   -0.59 to 1.00
            Anxiety

TABLE 3.2: CLINICAL CHARACTERISTICS OF THE STUDY PARTICIPANTS

Parameter                     Mean             SD*
Age (yrs)                     61.5 (25.9)      26
ESR                           24.52 (21.02)    21
RAQoL (range 0-30)            12.82 (8.28)     8
RA Duration (yrs)             13.87 (11.41)    11
HAQ                           1.10 (0.77)      0.8
Pt. Global Assessment         59.82 (25.86)    26
MD Global Assessment          20.88 (23.39)    23
Pain VAS                      43.12 (27.02)    27
EQ-5D Health Thermometer      65.02 (19.27)    19
Tender Joint Count            15.09 (11.99)    12
Swollen Joint Count           9.14 (9.66)      10

                              N                %
Self-Reported RA Severity
  Very Mild                   9                2.9%
  Mild                        34               10.9%
  Moderate                    120              38.3%
  Severe                      110              35.1%
  Very Severe                 27               8.6%
  Missing                     13               4.2%
Self-Reported RA Control
  Very Well Controlled        33               10.6%
  Well Controlled             76               24.3%
  Adequately Controlled       123              39.3%
  Not Well Controlled         61               19.5%
  Not Controlled At All       7                2.2%
  Missing                     13               4.2%

* SD = Standard Deviation

TABLE 3.3: OVERALL MEAN AND MEDIAN UTILITY SCORES FROM THE INSTRUMENTS IN THE SAMPLE OF RA PATIENTS

Instrument  Mean*  95% CI Lower  95% CI Upper  Median  IQR*  N
SF-6D       0.63   0.61          0.64          0.60    0.12  302
EQ-5D       0.66   0.63          0.69          0.74    0.19  308
HUI2        0.71   0.69          0.73          0.75    0.28  304
HUI3        0.53   0.50          0.57          0.56    0.44  303

* IQR = interquartile range
* P-value <0.0001 for comparison of mean utility scores using repeated measures ANOVA (based on n=302)
* P<0.005 for all pairwise comparisons with Bonferroni correction

TABLE 3.4: INTRACLASS CORRELATIONS AND 95% CONFIDENCE INTERVALS BETWEEN INSTRUMENTS

Instrument  HUI2   HUI3           SF-6D          EQ-5D
HUI2        1.00   0.79           0.66           0.68
                   (0.74-0.83)    (0.54-0.72)    (0.61-0.74)
HUI3               1.00           0.56           0.66
                                  (0.48-0.64)    (0.59-0.72)
SF-6D                             1.00           0.59
                                                 (0.51-0.66)
EQ-5D                                            1.00

TABLE 3.5: ROTATED FACTOR PATTERN MATRIX

Indicator (Questionnaire)*                    Loadings (Factors 1 to 5, left to right)
Pain (SF-6D)                                  0.88
Mobility (EQ-5D)                              0.86
Self-care (EQ-5D)                             0.86
Pain (HUI2/3)                                 0.83
Pain (EQ-5D)                                  0.81
Physical Functioning (SF-6D)                  0.79
Pain with medications (HUI2/3)                0.78
Ambulation (HUI2/3)                           0.77
Self-care (HUI2/3)                            0.76
Social Limitations (SF-6D)                    0.73  0.33
Dexterity (HUI2/3)                            0.68
Vitality (SF-6D)                              0.62  0.32
Role Limitations                              0.60  0.43
Emotion (HUI2/3)                              0.89
Anxiety (EQ-5D)                               0.85
Mental Health (SF-6D)                         0.77
Happiness (HUI2/3)                            0.69
Speaking to those you know (HUI2/3)           0.84
Speaking to those you don't know (HUI2/3)     0.93
Memory (HUI2/3)                               0.37  0.72
Think/Solve Problems (HUI2/3)                 0.51  0.70
Hearing (HUI2/3)                              0.63
Vision - Reading (HUI2/3)                     0.86
Vision - Recognizing a friend (HUI2/3)        0.84

* Indicators with loadings less than 0.30 have been omitted to improve interpretability

TABLE 3.6: FACTOR CORRELATION MATRIX

Factor                        1      2      3      4      5
Physical Functioning/Pain     1.00
Emotional/Mental Health      -0.32   1.00
Speech                       -0.08  -0.16   1.00
Cognition                    -0.15  -0.10  -0.15   1.00
Vision                       -0.15  -0.05  -0.07  -0.04   1.00

TABLE 3.7: RELATIVE PRATT INDEX SCORES ASSESSING RELATIVE CONTRIBUTION OF EACH FACTOR TO THE MODEL'S ADJUSTED R²

Instrument, Factor            Pearson      Standardized  Relative Pratt  Model Adjusted
                              Correlation  Beta Weight   Index Score*    R-square
HUI2                                                                     0.78
  Physical Functioning/Pain   -0.82        -0.75         0.79
  Cognition                   -0.50        -0.33         0.21
HUI3                                                                     0.86
  Physical Functioning/Pain   -0.86        -0.77         0.77
  Cognition                   -0.53        -0.36         0.23
SF-6D                                                                    0.83
  Physical Functioning/Pain   -0.90        -0.77         0.83
  Emotional/Mental Health     -0.68        -0.21         0.17
EQ-5D                                                                    0.51
  Physical Functioning/Pain   -0.63        -0.71         0.87
  Emotional/Mental Health     -0.55        -0.13         0.13

* The Relative Pratt Index represents the proportion of the model R-square that is explained by the variable

FIGURE 3.1: DISTRIBUTIONS OF GLOBAL UTILITY VALUES ACROSS THE MAUT INSTRUMENTS

[Figure: four histogram panels (HUI2, HUI3, EQ-5D, SF-6D); x-axis: Global Utility Values, binned from -0.29 to 1.00]

FIGURE 3.2: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND HUI3 VS. THE AVERAGE SCORE WITHIN PATIENTS

[Figure: x-axis: Average of HUI2 and HUI3]

The dotted lines represent ±1.96 times the standard deviation around the difference. The clustering of the data around zero on the right, compared to that on the left, indicates that the two utility scores have better agreement in those with higher values and that, at lower values, the HUI2 yields much higher scores than the HUI3.

FIGURE 3.3: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI3 AND THE SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS

[Figure: x-axis: Average of HUI3 and SF-6D]

The dotted lines represent ±1.96 times the standard deviation around the difference. The pattern of observations reveals that for higher utility values, the SF-6D tends to have lower scores than the HUI3 whereas the converse is true at lower utility values (which is expected since the SF-6D is bounded at +0.30).

FIGURE 3.4: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI3 AND EQ-5D VS. THE AVERAGE SCORE OF THESE TWO INSTRUMENTS WITHIN PATIENTS

[Figure: x-axis: Average of HUI3 and EQ-5D]

The dotted lines represent ±1.96 times the standard deviation around the difference. From the pattern of the points, it appears that the EQ-5D is lower than the HUI3 at higher values but this relationship reverses at lower values. Also, the odd linear patterns of the points are due to the gaps in the EQ-5D scoring system.

FIGURE 3.5: BLAND-ALTMAN PLOT OF THE DIFFERENCE BETWEEN THE EQ-5D AND SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS

[Figure: x-axis: Average of EQ-5D and SF-6D]

The dotted lines represent ±1.96 times the standard deviation around the difference. There would appear to be higher agreement on the right compared to the left, indicating that the two utility scores have better agreement in those with higher values and that, at lower values, the SF-6D yields much higher scores than the EQ-5D.

FIGURE 3.6: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND THE SF-6D VS. THE AVERAGE SCORE WITHIN PATIENTS

[Figure: x-axis: Average of HUI2 and SF-6D]

The dotted lines represent ±1.96 times the standard deviation around the difference. The pattern of observations reveals that for higher utility values, the SF-6D tends to have lower scores than the HUI2 whereas the converse is true at lower utility values (which is expected since the SF-6D is bounded at +0.30).

FIGURE 3.7: BLAND-ALTMAN PLOT OF DIFFERENCE BETWEEN THE HUI2 AND EQ-5D VS. THE AVERAGE SCORE OF THESE TWO INSTRUMENTS WITHIN PATIENTS

[Figure: x-axis: Average of HUI2 and EQ-5D]

The dotted lines represent ±1.96 times the standard deviation around the difference. Also, the odd linear patterns of the points are due to the gaps in the EQ-5D scoring system.

CHAPTER 4

A COMPARISON OF GENERIC, INDIRECT UTILITY MEASURES (THE HUI2, HUI3, SF-6D, AND THE EQ-5D) AND DISEASE-SPECIFIC INSTRUMENTS (THE RAQOL AND THE HAQ) IN RHEUMATOID ARTHRITIS

4.1 FOREWORD

This chapter is currently under second review, under the same title, in the journal Social Science and Medicine. The candidate is first author of this manuscript, which is co-authored by John Woolcott, a health economist; Drs. Kam Shojania and Robert Offer, clinical rheumatologists who assisted with patient recruitment; Dr. John Brazier, the developer of the SF-6D; and Drs. Aslam Anis, John Esdaile and Jacek Kopec, members of the candidate's committee. The candidate's role in this manuscript was the development of the primary hypothesis and methods, data entry and manipulation, statistical analysis, and writing of the final manuscript.

4.2 INTRODUCTION

Rheumatoid arthritis (RA) is a chronic, progressive disease that places a substantial burden on those afflicted and their families. Specifically, the disease itself, its treatments, and complications arising from both result in detrimental effects on many areas of life including physical, psychological, and social functioning.¹ Yet many clinical measures do not adequately capture the overall impact of the disease on individuals.
Furthermore, because the ultimate goal in any therapeutic intervention in RA is to improve health-related quality of life (HRQL), it is important to measure this outcome accurately.

The two basic categories of instruments used to measure HRQL are generic instruments and disease-specific instruments.² Utility, or preference-based, measures are an example of generic instruments that are derived from decision and utility theories. As such, preference-based approaches integrate different aspects of health into a single index anchored by a value of '1.00' for full health and '0' for death (health states considered worse than death can be represented by negative values). In turn, these measures are used in economic evaluations to integrate survival and HRQL into a single metric, the quality adjusted life year (QALY). Preferences for health outcomes can be considered to be utility values when they are elicited by choice-based response methods that are framed under uncertainty.³ Health utilities can be measured either directly (using trade-off techniques such as the standard gamble or time trade-off) or indirectly (using multidimensional HRQL questionnaires developed using multi-attribute utility theory [MAUT], such as the Health Utilities Index 2 and 3 [HUI2 and HUI3],⁴,⁵ the Short Form 6D [SF-6D],⁶ and the EuroQol [EQ-5D]⁷). Due to their ease of administration, these indirect measures are commonly used as the source of utility weightings in economic evaluations. A brief overview of these instruments has been provided in Table 4.1 and comprehensive reviews are available.

Disease-specific measures are commonly utilized to assess HRQL in RA. Specifically, the Health Assessment Questionnaire (HAQ) Disability Index⁹ is a commonly used disease-specific measure whereas the Rheumatoid Arthritis Quality of Life (RAQoL) questionnaire¹⁰ is a newly developed instrument.
The HAQ was originally developed as one of the first self-report, functional status (disability) measures and has become one of the dominant instruments in musculoskeletal diseases including rheumatoid arthritis.⁹ The HAQ has been utilized to assess disability for approximately two decades and is a mandated outcome for clinical trials in RA. The RAQoL is the first patient-completed instrument specifically designed for use with RA patients.¹⁰ It was derived directly from qualitative interviews with relevant patients and considers aspects of many areas of life that have been affected detrimentally by RA. The goal of the RAQoL is to be a comprehensive, disease-specific scale that will be more responsive to change than previous scales used in RA.

When comparing disease-specific to generic measures of HRQL, disease-specific measures focus on the particular problems that are often unique to the disease that they are developed to assess. As such, these measures may have greater ability to measure functional impairments resulting from the disease and detect smaller changes in health relative to generic measures. However, generic measures permit comparisons across disease states, which may provide useful data for health policy and resource allocation decision-making. In either case, there is agreement in the literature that any instrument utilized to assess HRQL needs to be valid and reliable,¹¹ and the most rigorous approach to establishing validity is construct validity. Generally, validity can be defined as the extent to which an instrument measures the property that it is intended to measure.¹¹ Construct validity is an assessment of the extent to which the scores of an instrument correlate with other hypothesized measures or indicators of the health concept or concepts of interest.

In a recent paper by Brazier et al.,¹² an argument is made against traditional means to assess construct validity for preference-based measures.
One method of assessing construct validity is to test a measure's ability to discriminate between groups hypothesized to differ in terms of health. However, Brazier et al. stated concern that the aspect of health used to subdivide groups might not reflect preferences (for example, age). To rectify these problems,¹⁰ Brazier et al. make a case for empirical validity (a form of construct validity) to be the "acid test" for preference-based measures. Specifically, they propose that a preference-based instrument should generate values that reflect people's preferences, and they supply a hierarchy of evidence for determining this: revealed preference data, stated preference data and hypothesized preferences. The authors state that when revealed preference data and stated preference data are lacking (which is common in health care), hypothesized preference data can be used. Hypothesized preferences are very similar to construct validity in that the researcher must hypothesize or construct the expected difference. Provided that care is taken in examining health states where clear preference differences would exist, the hypothesized preference approach should yield valid results. A method of assessing this is to examine whether the global utility scores of an instrument reproduce the expected differences between groups of patients. Thus, as long as care is taken in selecting the groups such that preferences for health states would be expected to differ, this would be an appropriate method to assess the construct validity of preference-based instruments.

There remains a gap in the literature regarding the assessment of construct validity for both global and single-attribute indirect measures of health utility.
Although construct validity has been investigated for the HUI2 and HUI3 in Type 2 diabetes,¹³ the HUI3 in self-reported stroke and arthritis,¹⁴ the HUI2 and HUI3 for Alzheimer's disease,¹⁵ the HUI3 and the EQ-5D in intermittent claudication,¹⁶ and the EQ-5D in rheumatoid arthritis,¹⁷ there are few comparative data across all four instruments in the same patient sample with a well-delineated, chronic disease. In addition, there are no data that compare the construct validity of disease-specific instruments like the Health Assessment Questionnaire (HAQ) and the Rheumatoid Arthritis Quality of Life Questionnaire (RAQoL). Thus, the objectives of this study were to examine the cross-sectional construct validity of the global and single-attribute scores from the indirect utility instruments in terms of their ability to distinguish between subgroups of individuals with different levels of RA severity, to compare the instruments with one another, and to compare them to disease-specific instruments. In addition, for each of the instruments, the minimally important difference was determined and was compared to previously defined values where available.

4.3 METHODS

4.3.1 Sample

Three hundred and thirteen individuals participated in the study. To be included, subjects had to have a rheumatologist-confirmed diagnosis of RA (as defined by the American College of Rheumatology diagnostic criteria),¹⁸ receive rheumatology care within the province of British Columbia in one of the study areas (Vancouver, Richmond, Vernon and Penticton), consent to answer the questionnaires, be sufficiently proficient in English to answer the questionnaires, and be willing to participate in follow-up surveys. Recruitment of RA patients began in October 2001 and ended in September 2002. Ethical approval for this study was obtained through the University of British Columbia's Behavioural Ethics Committee and informed consent was obtained from each of the participants.
Eight private rheumatologists' offices from the study areas referred subjects into the cohort during their interactions in routine clinical practice. In addition, two of these rheumatologists' practices sent letters to all of their patients with RA inviting them to participate in the survey. All patient questionnaires were self-administered, self-completed and submitted via mail. The study physicians' offices supplied additional information from the patients' health records.

4.3.2 Measures

4.3.2.1 Clinical

Participants were asked questions regarding their RA and medication history, including adverse reactions over the past three months. Other self-reported clinical variables included swollen joint count (SJC) and tender joint count (TJC) (using the mannequin-based 42 joint count methodology),¹⁹ a 10 cm pain visual analogue scale (VAS), a patient global assessment of disease activity (10 cm VAS), and RA severity and RA control (both using a 5-point Likert scale). Erythrocyte sedimentation rate (ESR) values closest to the date of completion of the questionnaire (within 1 month) were extracted from the patient's chart for those patients whose rheumatologist used this measure for patient monitoring. In addition, the attending rheumatologists were asked to complete a physician global assessment of disease activity (10 cm VAS) for each patient.

4.3.2.2 Questionnaires

4.3.2.2.1 Health Assessment Questionnaire (HAQ) Disability Index

The HAQ is a measure of physical disability that assesses a respondent's ability to complete everyday tasks in areas such as dressing and grooming, rising, eating, walking, personal hygiene, reach, grip and other activities (such as getting into and out of a car). Each of these areas is assigned a section score that is further adjusted to account for the use of any aids, devices or help from another person.
The section scores are then summed and averaged to give an overall score between 0.0 (best possible function) and 3.0 (worst function). A HAQ score difference of 0.25 is said to represent the minimally important difference (MID).[20,21]

4.3.2.2.2 Rheumatoid Arthritis Quality of Life Questionnaire (RAQoL)

The RAQoL consists of 30 questions (answered by yes/no) that assess such aspects of RA as moods and emotions, social life, hobbies, everyday tasks, personal and social relationships, and physical contact. The RAQoL is scored by assigning a point for each affirmative response and no points for negative responses. Thus, scores range from 0 (least severity) to 30 (highest severity). To date, the MID for the RAQoL has not been determined.

4.3.2.2.3 Preference Based Measures - MAUT Instruments

The multi-attribute utility theory (MAUT) based instruments used in the questionnaire were the HUI2, HUI3, SF-6D, and the EQ-5D. The major differences between these instruments are outlined in Table 4.1. The MID for the HUI2 and the HUI3 overall utility scores is considered to be 0.03.[4] For the EQ-5D, 0.03 has been postulated to be the MCID, as it is the smallest of the coefficients in the York weights (i.e., the smallest difference in moving from one level to another on any of the 5 dimensions).[22] Finally, for the SF-6D global utility scores, the MID has been estimated to be 0.033 (95% CI: 0.029 to 0.037) from an analysis examining seven longitudinal studies involving the SF-36.

In addition to the overall utility scores, single attribute utility scores can be determined for each of the dimensions that are assessed by the HUI2 and HUI3. Although single-attribute scores are not generally computed for the EQ-5D and SF-6D systems, they can be calculated within the scoring functions by holding all other domains constant at no impairment and determining the score for the domain of interest using the reported categorical response.
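To make the idea of holding all other domains at no impairment concrete, the following is a minimal sketch for the EQ-5D. It assumes the UK time trade-off (York) tariff with its commonly cited decrements; the coefficient values, domain keys and function names are illustrative assumptions that do not appear in this chapter and should be checked against the published tariff before any use.

```python
# Decrements ASSUMED from the UK (York) EQ-5D time trade-off tariff.
# They are not listed in this chapter; verify against the published value set.
CONSTANT = 0.081  # subtracted once if any domain is above level 1
N3 = 0.269        # subtracted once if any domain is at level 3
DECREMENTS = {
    "mobility":           {1: 0.0, 2: 0.069, 3: 0.314},
    "self_care":          {1: 0.0, 2: 0.104, 3: 0.214},
    "usual_activities":   {1: 0.0, 2: 0.036, 3: 0.094},
    "pain_discomfort":    {1: 0.0, 2: 0.123, 3: 0.386},
    "anxiety_depression": {1: 0.0, 2: 0.071, 3: 0.236},
}


def eq5d_utility(levels):
    """Global utility for a dict mapping every domain to level 1, 2 or 3."""
    if all(level == 1 for level in levels.values()):
        return 1.0  # full health carries no decrement at all
    score = 1.0 - CONSTANT
    score -= sum(DECREMENTS[domain][level] for domain, level in levels.items())
    if any(level == 3 for level in levels.values()):
        score -= N3
    return score


def single_attribute_score(domain, level):
    """Score one domain with every other domain held at level 1 (no impairment)."""
    levels = {name: 1 for name in DECREMENTS}
    levels[domain] = level
    return eq5d_utility(levels)


if __name__ == "__main__":
    for domain in DECREMENTS:
        scores = [round(single_attribute_score(domain, lvl), 3) for lvl in (1, 2, 3)]
        print(domain, scores)
```

Under these assumed weights, level 2 mobility with all other domains at level 1 scores 0.850, and the worst possible state scores -0.594, which is consistent with the lower boundary of -0.59 listed for the EQ-5D in Table 4.1.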
While we do not advocate the widespread adoption of this methodology, we used this approach in an exploratory fashion to facilitate comparisons between these instruments and the HUI systems. As such, these single attribute scores range from 0.0 to 1.0 and represent functional capacity on each of the dimensions independent of the other attributes in the instrument.

4.3.3 Data Analysis

As an assessment of construct validity of the MAUT instruments' global utility scores, the HAQ disability score and the RAQoL, tests of statistical significance were used to determine the ability of each summary score to discriminate between groups of differing disease severity. More severe or advanced RA was defined in terms of self-reported severity and control, recent adverse events to RA drug therapy, hospitalizations due to RA in the past year, other chronic diseases besides RA, absenteeism from work or school in the past year due to RA, use of allied health/home services, and use of special equipment for RA. For all scales, it was hypothesized that groups with more advanced or severe RA would have worse scores (lower global utilities for the MAUT instruments and higher scores for the disease-specific instruments).

The effect size, the standardized mean difference between two groups on a measured outcome, was calculated for each of the dichotomous clinical variables for each instrument. An effect size of 1 indicates a difference equivalent to one standard deviation. According to Cohen, the absolute value of effect sizes (d) can be categorized as small (d = 0.2 to 0.5), medium (d = 0.5 to 0.8), or large (d > 0.8). Comparing the effect sizes across the different indirect utility and disease-specific instruments allowed for a comparison of the instruments' abilities to discriminate between groups of different disease severity, with a larger effect size indicating better discriminative ability.
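The effect size calculation can be sketched as follows. The text does not state which standard deviation was used as the denominator, so the pooled standard deviation here is an assumption, as are the function names, the label for values below 0.2, and the handling of the boundary values 0.5 and 0.8. The group statistics in the example are hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference, using the pooled SD (an assumption)."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def cohen_category(d):
    """Label |d| with Cohen's categories as quoted in the text."""
    size = abs(d)
    if size < 0.2:
        return "below small"  # label chosen here; the text names no category
    if size < 0.5:
        return "small"
    if size <= 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical group summaries: mean, SD and n for a more severe (a)
    # and a less severe (b) group on a disability score.
    d = cohens_d(1.30, 0.70, 60, 0.85, 0.75, 100)
    print(f"d = {d:.2f} ({cohen_category(d)})")
```

Because the categories are applied to the absolute value, the same function works for instruments where a higher score is better (utilities) and for those where a higher score is worse (HAQ, RAQoL).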
To further assess and compare the construct validity among the instruments, relationships between continuous clinical variables and the MAUT global utilities, the single attribute utilities, the HAQ and the RAQoL were assessed with Spearman's correlations. It was postulated that strong correlations would exist between the overall scores from all the instruments and measures of RA severity. For the single attribute utility scores, it was postulated that strong correlations would exist between the continuous measures of RA severity and mobility, self-care and pain (from the HUI2), ambulation, dexterity and pain (from the HUI3), physical functioning, role limitations, pain and vitality (from the SF-6D), and mobility, self-care, usual activities, and pain/discomfort (from the EQ-5D). Due to the skewed nature of the data, non-parametric correlations (Spearman's rho) were calculated. A Spearman's rho of > 0.50 or < -0.50 was considered to be strong, values between -0.49 and -0.30 or between 0.30 and 0.49 were considered moderate, and values between -0.30 and 0.30 were considered to be weak.[25]

According to the methods outlined by Samsa et al.,[26] MID values were calculated for each of the MAUT instruments using the calculated effect sizes. In brief, the methodology is as follows: 1) using Cohen's criteria,[24] the absolute value of the effects under consideration was considered to be small (d = 0.20); 2) the standard deviation of the global utility from each instrument was determined (Table 4.2); 3) a preliminary estimate of the MID was determined by multiplying the effect size (0.2) by the standard deviation for each of the summary scores. Differences in scores for each dichotomous clinical parameter were compared to the MID estimate to determine if they were clinically important.
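The three steps above amount to a one-line calculation, sketched below. The standard deviations are the global utility SDs listed in Table 4.3; the dictionary and function names are illustrative assumptions rather than anything defined in the text.

```python
SMALL_EFFECT = 0.2  # Cohen's "small" effect size, step 1

# Step 2: SDs of the global utility scores as listed in Table 4.3.
GLOBAL_SD = {"HUI2": 0.20, "HUI3": 0.29, "SF-6D": 0.13, "EQ-5D": 0.24}


def effect_size_mid(sd, effect_size=SMALL_EFFECT):
    """Step 3: preliminary MID estimate = effect size x standard deviation."""
    return effect_size * sd


def exceeds_mid(mean_a, mean_b, mid):
    """True when the absolute difference between two group means exceeds the MID."""
    return abs(mean_a - mean_b) > mid


if __name__ == "__main__":
    for name, sd in GLOBAL_SD.items():
        print(name, round(effect_size_mid(sd), 2))
```

Rounded to two decimals, these reproduce the effect-size-based estimates reported in the Results (0.04, 0.06, 0.03 and 0.05 for the HUI2, HUI3, SF-6D and EQ-5D respectively).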
Thus, our hypothesis was that, using the methods of estimating effect size-based clinically important differences (CID) from cross-sectional data, the differences in overall instrument scores between groups of different RA severity would exceed the minimally important difference (MID).

The HAQ was also used as a means to estimate the MID for the overall utility scores. Using simple ordinary least squares (OLS) linear regression, the MAUT overall utility scores and the RAQoL score (dependent variables) were each regressed on the HAQ score (independent variable). Since it is well accepted that the MID for the HAQ is 0.25, the MID for each instrument was estimated as the change in that instrument's score that, according to the beta coefficient from the regression model, corresponds to a 0.25 change in HAQ score.

Descriptive statistics were used to characterize the study sample. Parametric tests (t-tests and ANOVA) were used to test for important differences between those with and without missing data from the MAUT instruments. For the group comparisons, both parametric (t-tests and ANOVA) and non-parametric (Mann-Whitney and Kruskal-Wallis) tests were used due to the skewed nature of the data. However, since the parametric and non-parametric approaches agreed, only the results of the parametric tests are reported.

4.4 RESULTS

4.4.1 Sample

Three hundred and thirteen (245 female) respondents with confirmed RA completed the baseline questionnaire. One hundred and ninety-seven (63%) patients were recruited directly by the study rheumatologists whereas 116 were recruited via mail. The completion rates of the surveys differed according to the method of recruitment. For direct recruitment by a study rheumatologist, 91% completed the baseline questionnaire, whereas for recruitment by mail, there was a 38% completion rate after accounting for invalid mailings (returned due to address problems (n=69), patient had died (n=6), patient did not have RA (n=3), and patient already recruited by a different rheumatologist (n=1)).
Those recruited by mail tended to be older (62 vs. 58 years, p=0.01), had RA for a longer period of time (15 vs. 10 years, p=0.0002), had better perceived control of their RA (16% vs. 26% rated as "not well controlled" or "not controlled at all", p=0.03) and included more females (84% vs. 74%, p=0.03) than those recruited directly by a rheumatologist. The final sample represented a clinically heterogeneous cohort of patients with RA (Table 4.2). There were few missing values in the HRQL or clinical variables. For the HRQL questionnaires, the lowest completion rate was for the SF-6D with 11 (<4%) missing values. There were no significant differences in demographic or RA characteristics identified between those with complete and missing values.

4.4.2 Description of Global and Single-Attribute Utilities

Table 4.3 summarizes the multi-attribute and single attribute utility values for the MAUT instruments, and Table 4.4 gives the specific domain responses. A comparison of the distributions of the utility scores is illustrated in Figure 4.1.

4.4.3 Construct Validity

As hypothesized, all of the MAUT instrument global utility scores were lower in groups thought to have higher RA severity, and most of these relationships were statistically significant (Tables 4.5 and 4.6). For self-reported disease severity and control (each with five categories of responses), there was a gradient across all the instruments' global scores, with the most severe and least well-controlled categories having the lowest utilities and vice versa (Table 4.5). The Spearman correlation coefficients were very similar across both the disease-specific and generic instruments. Other relationships between dichotomous disease severity indicators and the summary scores for each of the instruments are shown in Table 4.6.
For all of the severity variables, the hypothesized relationship held: groups with lower severity had better scores (a higher global utility for the MAUT instruments and lower scores for the disease-specific instruments). Of the 36 comparisons, 29 were significant at p<0.05. For the effect size analysis, 32 of the 36 calculated effect sizes exceeded Cohen's lower threshold of 0.2 (Table 4.6). The HAQ and RAQoL were generally better able to discriminate among the groups of lower and higher severity, as indicated by the larger effect sizes. However, all of the MAUT instruments appeared to have discriminative ability, with the HUI3 having the largest magnitude in overall differences across the dichotomous severity measures.

For the correlation analysis between the multi-attribute and single-attribute utility scores and the disease severity measures, all the expected correlations were in the hypothesized direction and most were highly significant. Strong correlations were observed consistently between the MAUT global utility scores and the RAQoL score, the patient global VAS, the HAQ disability score, and the pain VAS (Table 4.7). For the single attribute utility scores that were postulated to be highly correlated, consistent strong correlations existed only with the RAQoL scores and the HAQ disability scores. Generally, with the exception of the pain/discomfort single attribute scores from the EQ-5D, the pain VAS was also strongly correlated with the pain single attribute scores, the mobility single attribute score from the HUI2, and the physical functioning and the role limitations single attribute scores from the SF-6D. The patient global VAS was strongly correlated with the pain single attribute scores from the HUI2, HUI3, and the SF-6D (along with the physical functioning and role limitations single attribute scores from this measure) and the usual activities single attribute score from the EQ-5D.
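As a sketch of how the correlations were classified, the code below computes Spearman's rho as the Pearson correlation of average ranks and labels its strength with the thresholds given in the Methods. The handling of values exactly at 0.30 and 0.50 is an assumption, since the text leaves those boundaries open, and the function names and example data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined if either input is constant


def strength(rho):
    """Classify |rho|; treating exactly 0.30 and 0.50 as the higher class is an assumption."""
    size = abs(rho)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    # Hypothetical paired observations: pain VAS (mm) and a global utility score.
    pain_vas = [10, 25, 40, 40, 60, 85]
    utility = [0.92, 0.80, 0.83, 0.64, 0.55, 0.31]
    rho = spearman_rho(pain_vas, utility)
    print(f"rho = {rho:.2f} ({strength(rho)})")
```

A rank-based coefficient is used because, as noted in the Methods, the score distributions are skewed.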
For the disease-specific measures, the RAQoL score was strongly correlated with all of the disease severity measures with the exception of RA duration, whereas the HAQ displayed a correlation pattern similar to that of the global utility scores, with strong correlations with the RAQoL, the patient global VAS, and the pain VAS.

Using the effect size methodology, the estimated MID values for the MAUT instruments were 0.04 for the HUI2, 0.06 for the HUI3, 0.03 for the SF-6D, and 0.05 for the EQ-5D. For the disease-specific measures, the estimated MID was 1.70 for the RAQoL and 0.15 for the HAQ disability index. As can be seen in Table 4.6, the differences in global scores between naturally occurring groups based on clinical characteristics generally exceed these MID estimates.

Finally, the results of the simple linear regression revealed strong associations between the MAUT instruments' global utility values and the RAQoL (dependent variables) and the HAQ disability index (Table 4.8). In estimating the MID of the MAUT instruments and the RAQoL using the accepted MID of the HAQ and the beta coefficients from the linear regression, it was found that the MID estimates were in general agreement with those determined by the effect size methodology (0.04 vs. 0.03 for the HUI2, 0.07 vs. 0.06 for the HUI3, 0.03 and 0.05 for the SF-6D and the EQ-5D, respectively). For the RAQoL, both methodologies yielded similar results (1.70 and 2.0).

4.5 DISCUSSION

This is the first study to examine the construct validity of these four generic MAUT instruments simultaneously in a relatively large cohort of participants with a single, well-defined, chronic disease. In addition, it is the first to compare the generic MAUT instruments to two disease-specific measures (the HAQ and the RAQoL) in their relative abilities to discriminate across RA severity.
Finally, the estimates of the MID values from each of the instruments both serve as a comparison to prior MID values estimated in the literature (HUI2 and HUI3, and the HAQ)[4,20,21] and provide new information for those instruments without prior MID estimates (SF-6D, EQ-5D, and the RAQoL).

The low number of missing values (all < 4%) for each of the instruments attests to their suitability for self-administration. Overall, all the instruments tend to discriminate across disease severity based on multiple criteria, with worse scores being associated with measures indicating a higher severity of RA. The differences in the overall scores across known, naturally occurring groups (Table 4.6) generally support construct validity for each of the instruments. However, there were some important differences among the generic MAUT instruments. For example, only the EQ-5D and the SF-6D overall scores were significantly different between those experiencing adverse events to RA drug therapy over the previous three months and those who did not. Similarly, of the MAUT instruments, only the HUI2 and HUI3 overall scores were significantly different between groups with and without RA hospitalization and other chronic diseases. It is not clear why these differences among the instrument scores exist, but they may be due to the different aspects of health that are represented by each of the systems. Of note, differences across groups based on RA severity were consistently significant for the RAQoL and the HAQ disability index (with the exception of adverse events for the latter).

Contrary to the hypothesis, strong correlations were not observed between overall instrument scores or selected single attribute utility scores and RA duration. Similarly, few strong correlations were observed between SJC or TJC and the scale results.
All other hypothesized relationships were found to be significant. The duration of RA is a somewhat imprecise measure of severity, as some patients have severe, aggressive disease from the onset and others have a slow, insidious disease process.

As shown in Table 4.4, more individuals reported having no problems under the EQ-5D and the HUI2 classification systems than reporting the lowest level of deficit on the other systems. This lack of variation in the responses and low number of possible descriptive health states may impede the sensitivity of the EQ-5D and HUI2 in longitudinal studies when compared to the other instruments. Conversely, both the SF-6D and the HUI3 tended to have responses across the full range of severity for most of the domains assessed. Therefore, the SF-6D and the HUI3 may have a higher degree of sensitivity to the disease burden of RA.

The estimates of MID that we obtained using the effect-size and regression methodologies closely agreed for the HUI2 and HUI3 (within 0.01) and were exactly the same for the SF-6D and the EQ-5D. However, although the HUI2 MID estimates agree with what has been postulated in the literature (Drummond stated that a difference of 0.03 in utility values should be used as the basis of sample size calculations for the HUI2 and HUI3),[27] the estimates we obtained for the HUI3 MID were higher at 0.06 and 0.07. Moreover, the MID for the SF-6D estimated by Brazier et al. was identical to the values that we obtained using different methodologies.[23] However, using any of the criteria, it would appear that, for the global utility values, the differences between groups in Table 4.6 exceed the MID.

The results of this study lend support to the construct validity of all of the generic MAUT instruments and the disease-specific instruments in RA and provide some detail regarding their limitations and strengths.
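The regression-based MID estimate referred to above can be sketched as follows. The beta-coefficients are those listed in Table 4.8 and 0.25 is the accepted HAQ MID; the function names and the small data set passed to the slope function are illustrative assumptions.

```python
HAQ_MID = 0.25  # accepted MID for the HAQ disability index

# Beta-coefficients from Table 4.8: change in score per 1.0 change in HAQ.
BETA_PER_HAQ_UNIT = {
    "HUI2": -0.16,
    "HUI3": -0.29,
    "SF-6D": -0.12,
    "EQ-5D": -0.19,
    "RAQoL": 8.13,
}


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def regression_mid(beta, anchor_mid=HAQ_MID):
    """Change in an instrument's score corresponding to one anchor MID."""
    return abs(beta) * anchor_mid


if __name__ == "__main__":
    # Hypothetical data: utility falling by 0.2 for each 1.0 increase in HAQ.
    print("example slope:", ols_slope([0.0, 1.0, 2.0], [1.0, 0.8, 0.6]))
    for name, beta in BETA_PER_HAQ_UNIT.items():
        print(name, regression_mid(beta))
```

The absolute value is taken because the MID is a magnitude: utilities fall as HAQ rises, whereas the RAQoL rises with it.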
As mentioned, Brazier et al.'s concern is that a score of a preference-based measure may fail to detect a hypothesized difference simply because the difference is not valued by patients or because the scoring system is insensitive. We believe that we have chosen health states in RA that would clearly result in hypothesized differences in preferences. Moreover, we did see differences among these health states in the anticipated directions, which would appear to address Brazier's concern that these instruments might not identify changes appropriately. Another limitation was that most of the data obtained were self-reported without subsequent verification from clinical records. Thus, it is possible that study participants did not accurately or objectively describe the severity of their RA. However, we believe that the risk of this is low, as previous research has shown that patient self-reporting of symptoms is valid and reliable in RA.[19]

In conclusion, the overall and certain single-attribute scores of both the generic MAUT instruments and the disease-specific instruments are all able to distinguish between groups that were defined by measures of RA severity. As expected, the disease-specific instruments appeared to be slightly superior to the generic measures; however, the MAUT instruments appeared to have construct validity for RA. Effect-size and regression-based estimates of the MID for the instruments agreed, providing a comparison with the MID values previously postulated in the literature or new information for those measures without prior estimates. Most of the differences across RA severity in our cohort exceeded the MID for overall scores using any criteria.

REFERENCES

1. Tijhuis GJ, de Jong Z, Zwinderman AH, Zuijderduin WM, Jansen LMA, et al. The validity of the Rheumatoid Arthritis Quality of Life (RAQoL) questionnaire. Rheumatology 2001;40:1112-1119.
2. Guyatt GH, Feeny DH, Patrick DL. Measuring health-related quality of life. Ann Intern Med 1993;118:622-629.
3. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford, 1997.
4. Torrance GW, Feeny DH, Furlong WJ, Barr RD, Zhang Y, Wang Q. Multiattribute utility function for a comprehensive health status classification system: Health Utilities Index Mark 2. Med Care 1996;34:702-722.
5. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, et al. Multiattribute and single-attribute utility functions for the Health Utilities Index Mark 3 system. Med Care 2002;40:113-128.
6. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271-292.
7. The EuroQol Group. EuroQol - a new facility for the measurement of health-related quality of life. Health Policy 1990;16:199-208.
8. Lubeck DP. Health-related quality of life measurements and studies in rheumatoid arthritis. Am J Manag Care 2002;8:811-820.
9. Bruce B, Fries JF. The Stanford Health Assessment Questionnaire: a review of its history, issues, progress, and documentation. J Rheumatol 2003;30:167-178.
10. De Jong Z, Van Der Heijde, McKenna SP, Whalley D. The reliability and construct validity of the RAQoL: A rheumatoid arthritis-specific quality of life instrument. Br J Rheumatol 1997;36:878-883.
11. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. 2nd edition. Oxford University Press, New York, 1995.
12. Brazier J, Deverill M. A checklist for judging preference-based measures of health related quality of life: Learning from psychometrics. Health Econ 1999;8:41-51.
13. Maddigan SL, Feeny DH, Johnson JA. Construct validity of the RAND-12 and Health Utilities Index Mark 2 and 3 in Type 2 diabetes. Qual Life Res (in press).
14. Grootendorst P, Feeny D, Furlong W. Health Utilities Index Mark 3: evidence of construct validity for stroke and arthritis in a population health survey. Med Care 2000;38:290-299.
15. Neumann PJ, Sandberg EA, Araki SS, Kuntz KM, Feeny D, Weinstein MC. A comparison of HUI2 and HUI3 utility scores in Alzheimer's disease. Med Decis Making 2000;20:413-422.
16. Bosch JL, Hunink MGM. Comparison of the Health Utilities Index Mark 3 (HUI3) and the EuroQol EQ-5D in patients treated for intermittent claudication. Qual Life Res 2000;9:591-601.
17. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: Validity, responsiveness, and reliability of EuroQol (EQ-5D). Br J Rheumatol 1997;36:551-559.
18. Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315-324.
19. Wong AL, Wong WK, Harker J, Sterz M, Bulpitt K, Park G, Ramos B, Clements P, Paulus H. Patient self-report tender and swollen joint counts in early rheumatoid arthritis. Western Consortium of Practicing Rheumatologists. J Rheumatol 1999;26:2551-2561.
20. Redelmeier DA, Lorig K. Assessing the clinical importance of symptomatic improvements: an illustration in rheumatology. Arch Intern Med 1993;153:1337-1342.
21. Wells GA, Tugwell P, Kraag GR, Baker PR, Groh J, Redelmeier DA. Minimum important difference between patients with rheumatoid arthritis: the patient's perspective. J Rheumatol 1993;20:557-560.
22. Johnson JA. (2003) Personal communication.
23. Walters SJ, Brazier JE. What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health Qual Life Outcomes 2003;11:4-12.
24. Cohen J. Statistical power analysis for the behavioural sciences. 2nd ed. Hillsdale (NJ): Lawrence Erlbaum Assoc., 1988.
25. Cohen J. A power primer. Psychol Bull 1992;112:155-159.
26. Samsa G, Edelman D, Rothman ML, Williams GR, Lipscomb J, et al. Determining clinically important differences in health status measures. A general approach with illustration to the Health Utilities Index Mark II. Pharmacoeconomics 1999;15:141-155.
27. Drummond M. Introducing economic and quality of life measurements in clinical studies. Ann Med 2001;33:344-349.

TABLE 4.1: OVERVIEW OF MAUT INSTRUMENT PROPERTIES

Instrument | Dimensions/Domains/Attributes | # of Possible Health States | Valuation Technique | Boundaries
HUI2 | Sensation (vision, hearing, speech), Mobility, Emotion, Cognition, Self-care, Pain | 24,000 | Standard Gamble | -0.03 to 1.00
HUI3 | Vision, Hearing, Speech, Ambulation, Dexterity, Emotion, Cognition, Pain | 972,000 | Standard Gamble | -0.36 to 1.00
SF-6D | Physical Function, Role Limitation, Social Function, Pain, Mental Health, Vitality | 18,000 | Standard Gamble | 0.30 to 1.00
EQ-5D | Mobility, Usual Activities, Self-Care, Pain, Anxiety | 243 | Time Trade Off | -0.59 to 1.00

TABLE 4.2: CHARACTERISTICS OF THE STUDY PARTICIPANTS

Parameter | Mean (SD)* | Median (IQR)*
Age (yrs) | 61.5 (25.9) | 63.0 (19.0)
ESR | 24.52 (21.02) | 18.0 (24.0)
RAQoL score (range 0 - 30) | 12.82 (8.28) | 12.5 (13.0)
RA Duration (yrs) | 13.87 (11.41) | 12.00 (15.67)
HAQ Disability Index (range 0 - 3.00) | 1.10 (0.77) | 1.125 (1.375)
Patient Global Assessment (100 mm VAS) | 59.82 (25.86) | 65.00 (37.00)
MD Global Assessment (100 mm VAS) | 20.88 (23.39) | 10.5 (29.0)
Pain VAS (100 mm VAS) | 43.12 (27.02) | 42.00 (44.00)
EQ-5D Health Thermometer | 65.02 (19.27) | 70 (30)
Tender Joint Count (range 0 - 28) | 15.09 (11.99) | 12.00 (15.00)
Swollen Joint Count (range 0 - 28) | 9.14 (9.66) | 6.00 (11.00)

Self-Reported RA Severity | n | %
Very Mild | 9 | 3%
Mild | 34 | 11%
Moderate | 120 | 38%
Severe | 110 | 35%
Very Severe | 27 | 9%
Missing | 13 | 4%

Self-Reported RA Control | n | %
Very Well Controlled | 33 | 11%
Well Controlled | 76 | 24%
Adequately Controlled | 123 | 39%
Not Well Controlled | 61 | 19%
Not Controlled At All | 7 | 2%
Missing | 13 | 4%

Adverse Drug Reaction to RA Medication in Last 3 Months | n | %
Yes | 108 | 35%
No | 202 | 65%

Hospitalized For RA in Last 12 Months | n | %
Yes | 45 | 15%
No | 253 | 85%

Missed Work or School Due to RA in Last 12 Months | n | %
Yes | 59 | 37%
No | 100 | 63%

Purchased or Rented Equipment for RA in Last 12 Months | n | %
Yes | 72 | 28%
No | 183 | 72%

Used Allied Health Professional/Home Care Services for RA in Last 12 Months | n | %
Yes | 129 | 42%
No | 174 | 58%

Concomitant Chronic Illness Other Than RA | n | %
Yes | 192 | 62%
No | 118 | 38%

* SD = Standard deviation; IQR = interquartile range

TABLE 4.3: MULTI-ATTRIBUTE AND SINGLE ATTRIBUTE UTILITY SCORES FROM THE MAUT INSTRUMENTS

Score | N | Mean | SD* | Median | IQR* | Min. | Max.
HUI2 Global Utility Score | 304 | 0.71 | 0.20 | 0.75 | 0.28 | 0.11 | 1.00
HUI2 Single Attribute Scores
Sensation | 304 | 0.95 | 0.04 | 0.95 | | 0.61 | 1.00
Mobility | 304 | 0.96 | 0.06 | 0.97 | 0.03 | 0.73 | 1.00
Emotion | 304 | 0.96 | 0.05 | 0.93 | 0.07 | 0.70 | 1.00
Cognition | 304 | 0.98 | 0.03 | 1.00 | 0.05 | 0.65 | 1.00
Self care | 304 | 0.99 | 0.03 | 1.00 | 0.03 | 0.80 | 1.00
Pain | 304 | 0.86 | 0.15 | 0.85 | 0.12 | 0.38 | 1.00
HUI3 Global Utility Score | 303 | 0.53 | 0.29 | 0.56 | 0.44 | -0.16 | 1.00
HUI3 Single Attribute Scores
Vision | 303 | 0.98 | 0.04 | 0.98 | | 0.75 | 1.00
Hearing | 303 | 0.98 | 0.05 | 1.00 | - | 0.61 | 1.00
Speech | 303 | 0.99 | 0.02 | 1.00 | | 0.81 | 1.00
Ambulation | 303 | 0.94 | 0.08 | 0.93 | 0.07 | 0.58 | 1.00
Dexterity | 303 | 0.89 | 0.11 | 0.95 | 0.19 | 0.56 | 1.00
Emotion | 303 | 0.95 | 0.08 | 1.00 | 0.05 | 0.46 | 1.00
Cognition | 303 | 0.95 | 0.08 | 1.00 | 0.05 | 0.42 | 1.00
Pain | 303 | 0.88 | 0.11 | 0.90 | 0.19 | 0.55 | 1.00
SF-6D Global Utility Score | 302 | 0.63 | 0.13 | 0.60 | 0.12 | 0.31 | 1.00
SF-6D Single Attribute Scores
Physical Functioning | 302 | 0.91 | 0.04 | 0.90 | 0.06 | 0.83 | 1.00
Role Limitations | 302 | 0.93 | 0.03 | 0.93 | - | 0.90 | 1.00
Social Functioning | 302 | 0.92 | 0.03 | 0.92 | - | 0.88 | 1.00
Pain | 302 | 0.92 | 0.05 | 0.92 | 0.05 | 0.79 | 1.00
Mental Health | 302 | 0.95 | 0.03 | 0.94 | - | 0.83 | 1.00
Vitality | 302 | 0.98 | 0.02 | 0.98 | | 0.93 | 1.00
EQ-5D Global Utility Score | 308 | 0.66 | 0.24 | 0.74 | 0.19 | -0.21 | 1.00
EQ-5D Single Attribute Scores
Mobility | 308 | 0.91 | 0.08 | 0.85 | 0.15 | 0.34 | 1.00
Self-care | 308 | 0.94 | 0.09 | 1.00 | 0.18 | 0.44 | 1.00
Usual Activities | 308 | 0.90 | 0.11 | 0.88 | 0.12 | 0.56 | 1.00
Pain | 308 | 0.70 | 0.16 | 0.80 | 0.27 | 0.26 | 1.00
Anxiety/Depression | 308 | 0.94 | 0.09 | 1.00 | 0.15 | 0.41 | 1.00

* SD = Standard deviation; IQR = interquartile range; - = <0.0001

TABLE 4.4: DOMAIN RESPONSES FOR THE MAUT INSTRUMENTS

Cells show Count (%).

HUI2 Levels* | Cognition (n=308) | Self Care (n=308) | Emotion (n=308) | Pain (n=307) | Sensation (n=306) | Mobility (n=307)
1 | 185 (60) | 212 (67) | 152 (49) | 19 (6) | 42 (14) | 145 (47)
2 | 115 (37) | 81 (27) | 137 (45) | 124 (40) | 229 (74) | 105 (34)
3 | 7 (2) | 11 (5) | 17 (5) | 103 (34) | 33 (11) | 53 (17)
4 | 1 (1) | 4 (1) | 2 (1) | 49 (16) | 2 (1) | 4 (1)
5 | - | - | - | 12 (4) | - | 0 (0)
6 | - | - | - | - | - | -

HUI3 Levels* | Pain (n=306) | Emotion (n=306) | Vision (n=309) | Hearing (n=308) | Speech (n=310) | Cognition (n=308) | Ambulation (n=306) | Dexterity (n=307)
1 | 22 (7) | 157 (50) | 45 (15) | 284 (90) | 294 (95) | 185 (60) | 145 (48) | 74 (24)
2 | 93 (31) | 105 (34) | 249 (80) | 6 (2) | 7 (2) | 19 (6) | 104 (33) | 99 (32)
3 | 111 (36) | 33 (11) | 4 (1) | 8 (3) | 7 (2) | 54 (17) | 29 (9) | 43 (14)
4 | 64 (21) | 10 (3) | 6 (2) | 8 (3) | 2 (1) | 41 (13) | 24 (8) | 73 (24)
5 | 16 (5) | 1 (1) | 5 (2) | 1 (1) | 0 (0) | 8 (3) | 3 (1) | 16 (5)
6 | - | 4 (1) | - | 2 (1) | - | 1 (1) | 1 (1) | 2 (1)

SF-6D Levels* | Physical Functioning (n=307) | Role Limitations (n=302) | Social Functioning (n=302) | Pain (n=303) | Mental Health (n=303) | Vitality (n=308)
1 | 7 (3) | 50 (17) | 44 (15) | 12 (4) | 58 (19) | 2 (1)
2 | 54 (18) | 192 (63) | 75 (24) | 55 (18) | 123 (41) | 71 (23)
3 | 75 (24) | 11 (4) | 129 (43) | 72 (24) | 104 (34) | 108 (35)
4 | 50 (16) | 49 (16) | 43 (14) | 81 (27) | 15 (5) | 84 (27)
5 | 95 (31) | - | 11 (4) | 58 (19) | 3 (1) | 38 (12)
6 | 26 (8) | - | - | 25 (8) | - | 5 (2)

EQ-5D Levels | Mobility (n=309) | Usual Activities (n=309) | Self Care (n=308) | Pain/Discomfort (n=308) | Anxiety/Depression (n=308)
1 | 119 (39) | 94 (30) | 219 (71) | 35 (11) | 197 (63)
2 | 189 (61) | 202 (65) | 86 (28) | 245 (80) | 108 (35)
3 | 1 (<1) | 13 (5) | 3 (2) | 28 (9) | 3 (2)

* For the HUI2, HUI3, SF-6D and EQ-5D, higher levels represent higher degrees of limitation or increased symptoms.
For the SF-6D, Physical Functioning and Pain have six levels, Social Functioning, Mental Health and Vitality have five levels and Role Limitations has four levels. For the HUI2, all the domains have four levels except for Mobility and Pain, which have five. For the HUI3, all the domains have six levels with the exception of Pain, Vision and Speech, which have five.

TABLE 4.5: RELATIONSHIP BETWEEN RA SEVERITY AND CONTROL AND THE GLOBAL UTILITY SCORES FOR EACH OF THE MAUT INSTRUMENTS

HAQ and RAQoL: Mean Score (SD). HUI2, HUI3, SF-6D and EQ-5D: Global Utility Score, Mean (SD).

Category | HAQ | RAQoL | HUI2 | HUI3 | SF-6D | EQ-5D
Self-reported RA severity
Very mild | 0.28 (0.43) | 3.22 (3.27) | 0.89 (0.09) | 0.80 (0.25) | 0.83 (0.12) | 0.89 (0.14)
Mild | 0.43 (0.51) | 5.15 (4.40) | 0.84 (0.12) | 0.75 (0.21) | 0.74 (0.12) | 0.84 (0.13)
Moderate | 1.01 (0.64) | 11.09 (6.65) | 0.76 (0.15) | 0.59 (0.24) | 0.65 (0.10) | 0.71 (0.14)
Severe | 1.37 (0.72) | 16.76 (7.86) | 0.63 (0.20) | 0.43 (0.28) | 0.58 (0.12) | 0.58 (0.27)
Very severe | 1.68 (0.83) | 18.96 (8.54) | 0.58 (0.23) | 0.28 (0.34) | 0.50 (0.11) | 0.47 (0.30)
Spearman's Correlation Coefficient | 0.46* | 0.52* | -0.47* | -0.46* | -0.55* | -0.45*
Self-reported RA control
Very well controlled | 0.65 (0.78) | 6.15 (7.02) | 0.82 (0.20) | 0.74 (0.26) | 0.73 (0.14) | 0.81 (0.18)
Well controlled | 0.79 (0.68) | 8.0 (6.11) | 0.79 (0.13) | 0.66 (0.24) | 0.69 (0.11) | 0.77 (0.13)
Adequately controlled | 1.09 (0.70) | 13.10 (6.97) | 0.73 (0.17) | 0.54 (0.25) | 0.62 (0.11) | 0.67 (0.11)
Not well controlled | 1.63 (0.59) | 20.37 (6.50) | 0.55 (0.19) | 0.31 (0.24) | 0.52 (0.09) | 0.48 (0.28)
Not controlled at all | 2.04 (0.38) | 24.0 (3.51) | 0.42 (0.08) | 0.00 (0.11) | 0.46 (0.08) | 0.25 (0.23)
Spearman's Correlation Coefficient | 0.45* | 0.59* | -0.49* | -0.52* | -0.58* | -0.50*

Comparison (using ANOVA) of mean values stratified by severity/control: all p<0.0001. * p<0.0001

TABLE 4.6: DICHOTOMOUS MEASURES OF RA SEVERITY

HAQ and RAQoL: Mean Score (SD). HUI2, HUI3, SF-6D and EQ-5D: Global Utility Score, Mean (SD).

Parameter | Group | HAQ | RAQoL | HUI2 | HUI3 | SF-6D | EQ-5D
Adverse Events to RA Drug Therapy in Last 3 Months | Yes | 1.20 (0.72) | 14.7 (7.9) | 0.68 (0.20) | 0.50 (0.29) | 0.60 (0.11) | 0.61 (0.27)
 | No | 1.06 (0.77) | 11.7 (8.2)* | 0.73 (0.19) | 0.55 (0.29) | 0.65 (0.14) | 0.69 (0.21)
 | Effect Size | 0.19 | 0.37 | 0.26 | 0.17 | 0.40 | 0.33
Hospitalized in Past Year for RA | Yes | 1.38 (0.73) | 14.9 (7.9) | 0.63 (0.21) | 0.44 (0.27) | 0.60 (0.11) | 0.61 (0.26)
 | No | 1.05 (0.77) | 12.3 (8.2)* | 0.72 (0.19)* | 0.55 (0.29)* | 0.63 (0.13) | 0.67 (0.23)
 | Effect Size | 0.44 | 0.32 | 0.45 | 0.39 | 0.25 | 0.24
Other Chronic Diseases | 1 or more | 1.19 (0.76) | 13.6 (8.3) | 0.69 (0.20) | 0.51 (0.29) | 0.62 (0.13) | 0.65 (0.24)
 | None | 0.97 (0.75)* | 11.6 (8.1)* | 0.74 (0.19)* | 0.58 (0.29)* | 0.64 (0.13) | 0.68 (0.25)
 | Effect Size | 0.29 | 0.24 | 0.26 | 0.24 | 0.15 | 0.12
Days Off Work/School Due to RA in Past Year | Yes | 1.27 (0.71) | 16.4 (7.7) | 0.67 (0.19) | 0.49 (0.27) | 0.59 (0.11) | 0.59 (0.28)
 | No | 0.83 (0.75)‡ | 10.1 (7.8)† | 0.76 (0.19) | 0.62 (0.29) | 0.67 (0.13)‡ | 0.72 (0.22)
 | Effect Size | 0.60 | 0.81 | 0.47 | 0.46 | 0.66 | 0.52
Use of Allied Health/Home Services** for RA in Past Year | Yes | 1.34 (0.72) | 15.1 (7.9) | 0.66 (0.20) | 0.45 (0.28) | 0.59 (0.12) | 0.60 (0.26)
 | No | 0.80 (0.74)† | 11.0 (8.1)† | 0.75 (0.18)† | 0.60 (0.28)† | 0.65 (0.13)† | 0.71 (0.22)†
 | Effect Size | 0.74 | 0.51 | 0.47 | 0.54 | 0.48 | 0.46
Rent or Purchase Equipment for RA in Past Year | Yes | 1.34 (0.69) | 16.6 (8.1) | 0.62 (0.21) | 0.39 (0.29) | 0.58 (0.11) | 0.69 (0.22)
 | No | 0.90 (0.76)† | 11.8 (8.0)† | 0.74 (0.18)† | 0.58 (0.27)† | 0.63 (0.13) | 0.57 (0.27)‡
 | Effect Size | 0.61 | 0.60 | 0.61 | 0.68 | 0.42 | 0.49

** Physiotherapy, Occupational Therapy, Massage Therapy, Home Care.
† p-value < 0.0001 from a t-test between the two groups; ‡ p-value < 0.001; * p-value < 0.01 or < 0.05. Effect sizes >0.5 are treated as large.

TABLE 4.7: CORRELATIONS FOR MULTIATTRIBUTE AND SELECT SINGLE ATTRIBUTE UTILITY SCORES WITH RA SEVERITY

Measure | RA Duration (years) | RAQoL Score (0-30) | Swollen Joint Count (0-28) | Tender Joint Count (0-28) | Patient Global VAS (mm) | HAQ Disability Score | Pain VAS (mm)
HUI2 Global Utility | -0.11 | -0.70† | -0.38† | -0.44† | 0.55† | -0.66† | -0.59†
Mobility (HUI2) | -0.25† | -0.52† | -0.37† | -0.36† | 0.49† | -0.64† | -0.44†
Self-Care (HUI2) | -0.10 | -0.55‡ | -0.36† | -0.34† | 0.42† | -0.60† | -0.40†
Pain (HUI2) | -0.04 | -0.62† | -0.45† | -0.42† | 0.51† | -0.54† | -0.59†
HUI3 Global Utility | -0.20‡ | -0.75† | -0.42† | -0.48† | 0.58† | -0.76† | -0.60†
Ambulation (HUI3) | -0.24† | -0.53† | -0.37† | -0.36† | 0.49† | -0.65† | -0.44†
Dexterity (HUI3) | -0.24‡ | -0.62† | -0.47† | -0.43† | 0.44† | -0.68† | -0.39†
Pain (HUI3) | -0.09 | -0.70† | -0.51† | -0.45† | 0.57† | -0.61† | -0.70†
SF-6D Global Utility | -0.17‡ | -0.80† | -0.47† | -0.53† | 0.63† | -0.73† | -0.62†
Physical Functioning | -0.21‡ | -0.64† | -0.42† | -0.41† | 0.50† | -0.69† | -0.50†
Role Limitations | -0.14* | -0.57† | -0.36† | -0.36† | 0.42† | -0.50† | -0.42†
Social Functioning | -0.16‡ | -0.70† | -0.42† | -0.44† | 0.56† | -0.63† | -0.52†
Pain | -0.18‡ | -0.74† | -0.50† | -0.48† | 0.62† | -0.67† | -0.63†
EQ-5D Global Utility | -0.11 | -0.70† | -0.47† | -0.42† | 0.58† | -0.61† | -0.60†
Mobility (EQ-5D) | -0.17‡ | -0.53† | -0.41† | -0.37† | 0.46† | -0.53† | -0.44†
Usual Activities (EQ-5D) | -0.10 | -0.64† | -0.44† | -0.45† | 0.51† | -0.57† | -0.51†
Pain/Discomfort (EQ-5D) | 0.00 | -0.34† | -0.26† | -0.32† | 0.30† | -0.31† | -0.32†
Self-Care (EQ-5D) | -0.14* | -0.59† | -0.34† | -0.37† | 0.43† | -0.60† | -0.43†
RAQoL Score | 0.20‡ | - | 0.53† | 0.54† | -0.62† | 0.76† | 0.62†
HAQ Disability Score | 0.28‡ | 0.76† | 0.48† | 0.46‡ | -0.53† | - | 0.54†

* p<0.05; ‡ p<0.01; † p<0.001. Correlations of magnitude 0.50 or greater are considered strong.

TABLE 4.8: SIMPLE LINEAR REGRESSION ANALYSES FOR OVERALL INSTRUMENT SCORES AND HAQ

Variable: HAQ Disability Index (0-3.0)

Dependent variable: HUI2 Global Utility
Beta-coefficient (SE)*: -0.16 (0.01); Model p-value: <0.0001; R2: 0.43
Minimally Important Difference (MID) Estimation: For each 1.0 change in HAQ, the HUI2 changes 0.16.
Therefore, a 0.25 change in HAQ (MID) results in a 0.04 change in the HUI2 (estimated MID) DEPENDENT VARIABLES HUD Global Utility Beta-coefficient (SE)* Model p-value R2 -0.29 (0.01) <0.0001 0.58 For each 1.0 change in HAQ, the HUI3 changes 0.29. Therefore, a 0.25 change in HAQ (MID) results in a 0.07 change in the HUB (estimated MCID) DEPENDENT VARIABLES SF-6D Global Utility Beta-coefficient (SE)* Model p-value R2 -0.12(0.007) <0.0001 0.53 For each 1.0 change in HAQ, the SF-6D changes 0.12. Therefore, a 0.25 change in HAQ (MID) results in a 0.03 change in the SF-6D (estimated MID) DEPENDENT VARIABLES EQ-5D Global Utility Beta-coefficient (SE)* Model p-value R2 -0.19(0.01) O.0001 0.37 For each 1.0 change in HAQ, the EQ-5D changes 0.19. Therefore, a 0.25 change in HAQ (MID) results in a 0.05 change in the EQ-5D (estimated MID) DEPENDENT VARIABLES RAQoL Score Beta-coefficient (SE)* Model p-value R2 8.13 (0.003) O.0001 0.57 For each 1.0 change in HAQ, the RAQoL changes 8.13. Therefore, a 0.25 change in HAQ (MID) results in a 2.0 change in the RAQoL (estimated MID) * SI 3 = Standard Error 135 FIGURE 4.1: BOX PLOT OF MAUT INSTRUMENT GLOBAL UTILITY SCORES MAUT Instrument o Outliers are marked by an open circle. 136 CHAPTER 5 NOT A L L QALYS ARE EQUAL: THE IMPACT OF USING DIFFERENT INDIRECT UTILITY MEASURES ON ESTIMATING THE COST-UTILITY OF INFLIXIMAB IN RHEUMATOID ARTHRITIS 5.1 FOREWORD This manuscript is currently under review under the same title for publication in the Medical Decision Making. The candidate is first author of this manuscript which is co-authored by Drs. Stephen Marion, Fred Wolfe, John Esdaile, Monique Gignac, A n n Clarke and As lam Anis . In addition, a statistician, M s . Daphne Guh, was a co-author. Dr. Stephen Marion completed all of the complicated mathematical modeling such as the construction of the transition probability matrix and instructed the candidate in these methods. Drs. 
Wolfe, Gignac and Clarke provided access to databases that facilitated estimation of the costs and effectiveness outcomes. Ms. Guh provided assistance with the statistical analysis. Drs. Anis and Esdaile are co-supervisors of the candidate. The candidate's role in this manuscript involved the development of the primary hypothesis, study design, model design (with Dr. Marion), statistical analyses and the writing of the final manuscript.

5.2 INTRODUCTION

With the introduction of new, expensive biological agents for the treatment of rheumatoid arthritis (RA), the costs to manage this debilitating chronic disease have increased. This observation has drawn a great deal of attention as governments and third-party payers struggle to make decisions about how to incorporate these therapeutic agents into their funding envelopes.1 Because of the limited funds available for health care, RA drugs often compete for funding with those used to treat other disease areas. However, unlike HIV and cancer, arthritis cripples and does not immediately result in death. Drugs used for RA tend to improve quality of life and do not immediately impact on mortality. Because of this fact, treatments for RA can be undervalued when compared to treatments for other chronic diseases. Economic evaluations have become increasingly important in the allocation of funding to treatments and programmes. Thus, if RA patients are not to be short-changed by policies and decisions, then measures that incorporate quality of life changes into economic studies are key. In order to integrate quality of life into economic analyses, the effectiveness of the health interventions in question is measured using a metric known as "utility," which ranges in value from 0 (dead) to 1 (perfect health). Results are reported as cost per quality-adjusted life year (QALY) gained, where QALYs are derived by incorporating the utilities as weights in the life expectancy calculation.
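As an illustration of how utilities act as weights on time lived, the following Python sketch sums utility-weighted years and discounts them to the present. The function name, the yearly time step, the start-of-period discounting convention and the utility path are assumptions made for this example only and are not taken from the analysis; the 3% rate is the discount rate adopted in Section 5.3.2.

```python
def discounted_qalys(utilities, years_per_period=1.0, annual_rate=0.03):
    """Sum utility-weighted time, discounting each period to time zero.

    Assumption: each period's QALYs are discounted from the start of
    that period; other conventions (mid-period, continuous) also exist.
    """
    total = 0.0
    for period, utility in enumerate(utilities):
        start_year = period * years_per_period
        total += utility * years_per_period / (1.0 + annual_rate) ** start_year
    return total


# Hypothetical utility path: 0.6 for five years, then 0.5 for five years.
path = [0.6] * 5 + [0.5] * 5

print(round(discounted_qalys(path, annual_rate=0.0), 3))  # 5.5 undiscounted
print(round(discounted_qalys(path), 3))                   # discounted at 3%
```

With a positive discount rate, the same utility path always yields fewer QALYs than the undiscounted sum, because later years count for less.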
Cost per QALY gained is a unique and preferred measure of the economic value of different interventions, because it permits comparison across disease groups, thereby facilitating funding allocation decisions. The ultimate goal is a comparison of the cost per QALY gained so as to prioritize funding according to cost/QALY. This determination of a cost/QALY permits comparison across therapies within disease states (e.g. Drug A vs. Drug B, or drug vs. physiotherapy vs. surgery) as well as comparisons across diseases (RA vs. lupus or congestive heart disease). To determine the utility weightings for QALYs, the use of a pre-scaled multi-attribute utility index is often the most convenient and least expensive approach. While no validated multi-attribute utility index is available for economic evaluations specifically in musculoskeletal disease, several generic measures of HRQOL (health related quality of life) appear suitable for adaptation to economic evaluations in RA.3,4 Such generic utility-based instruments for use in economic evaluations are the Health Utilities Index Mark 2 and 3 (HUI2 and HUI3), the EuroQol 5D (EQ-5D) and the Short Form 6-D (SF-6D).5 These measures each capture different attributes and domains and thus assess different aspects of quality of life, and they utilize different methodologies to calculate the utility score. However, all of these instruments purport to integrate the health states obtained from the population under study with predetermined societal preference ratings for said states to produce an overall index score. There is no "gold standard" among these instruments and each likely has its own advantages and disadvantages. However, little is known about how the choice among these instruments influences the outcome in economic evaluations of RA drug therapy.
Thus, the primary objective of this study was to determine the impact on incremental cost effectiveness ratios (ICERs) from using different indirect utility instruments in an economic evaluation of a new biological agent (infliximab) used to treat RA. A secondary objective of the study was to determine the incremental costs, QALYs, and cost-utility of infliximab over standard therapy for the treatment of active, refractory RA.

5.3 METHODS

5.3.1 Clinical Trial Data Source

The largest randomized controlled trial to evaluate the efficacy of infliximab in patients with RA refractory to other disease modifying antirheumatic drugs (DMARDs) was the ATTRACT trial.6 Patients were eligible for this trial if they had active RA (6 or more tender and swollen joints and symptoms or signs [at least two of morning stiffness for at least 45 minutes, an erythrocyte sedimentation rate of at least 28 mm/hr, and serum C-reactive protein of at least 2.0 mg/dL]) despite methotrexate (MTX) doses of more than 12.5 mg per week. A total of 428 patients from 34 centers in the United States, Canada, and Europe were randomly allocated to one of five treatment groups: MTX alone; MTX plus 3 mg/kg of infliximab every 4 or 8 weeks; or MTX plus 10 mg/kg of infliximab every 4 or 8 weeks. At baseline, most patients were also receiving non-steroidal anti-inflammatory drugs (74%) and corticosteroids (61%). DMARDs other than MTX were not allowed and, if necessary, were withdrawn before beginning study treatment. The 5 treatment groups were comparable with respect to age (median, 54 years), gender distribution (78% were women), disease duration (8-9 years), functional class, number of swollen and tender joints, and levels of C-reactive protein.
The primary outcome of the study was a comparison of ACR20 (American College of Rheumatology function class: a 20% or greater improvement in swollen and tender joint counts and a 20% or greater improvement in 3 of the following: patient's global assessment, physician's global assessment, physical disability score (as measured by the Health Assessment Questionnaire [HAQ] disability index), erythrocyte sedimentation rate, and patient's assessment of pain) at 54 weeks between groups. At the end of the 54 weeks, 52% of the infliximab-treated patients had achieved an ACR 20% response compared with 17% of MTX-only controls (P<0.001). Improvement with infliximab was also evident when comparing patient outcomes in terms of ACR 50% (33% vs. 18%) and ACR 70% (18% vs. 3%).

5.3.2 Overview of Model

The framework of our Markov decision-analytic model is presented as a schematic diagram in Figure 5.1 and is outlined in Appendix I. Consistent with the inclusion/exclusion criteria of the ATTRACT trial,6 the decision model was used to compare the costs and effects of two different drug therapy strategies for adult patients with RA refractory to standard therapy including MTX: 1) intravenous infliximab 3 mg/kg every 8 weeks and intravenous MTX of at least 12.5 mg/week; and 2) continued usual RA management with MTX as described above. The time horizon for our model was ten years, since we assumed that this would likely be the minimal time frame over which infliximab would be used in clinical practice, and both costs and outcomes were assessed from the societal perspective. We utilized a 3% discount rate for both costs and QALYs as generally recommended in recent guidelines. The infliximab plus MTX strategy was based on the pooled results across the three infliximab treatment arms in the ATTRACT trial.6 Similar to Wong et al., because of the similarity of the observed outcomes in the different doses and dosing intervals for the infliximab arms in the ATTRACT trial, these were pooled to estimate the health state transitions associated with infliximab treatment. Treatment with infliximab was assumed to be continuous during the ten year model unless, during individual patient simulations, three consecutive months residing in HAQ states > 2.0 were observed. At this point, the costs of infliximab were doubled (indicating a shortening in the dosage interval from 8 weeks to 4 weeks, which has been shown to be the course most rheumatologists would take considering the lack of response).9 Finally, if no improvement in HAQ was made after a further three consecutive months, it was assumed that infliximab would be discontinued and the MTX alone strategy would be adopted. The MTX alone strategy was based on the results achieved from the MTX plus placebo arm in the ATTRACT trial.6

In defining health states for our Markov model, we utilized the HAQ in increments of 0.125 (scores range from 0, which is perfect function, to 3.0, which is maximum functional impairment) as discrete health states (Figure 5.1). The HAQ is the best predictor of mortality in RA,10 work disability,11 and health care resource utilization.12 Therefore, within the model, transitions were made between different HAQ-defined health states while costs and outcomes (QALYs) were accrued according to the treatment strategy. All-cause mortality functioned as the absorbing state in the model.

5.3.3 Transition Probability Matrices and Statistical Modeling

Wong et al. published transition probabilities based upon an on-treatment analysis from the ATTRACT trial for the period of baseline to 30 weeks and 30 weeks to 54 weeks.
These investigators grouped the HAQ into four discrete health states (HAQ 0 = no disability; HAQ 0.1 to 1.0 = mild impairment; HAQ 1.1 to 2.0 = moderate impairment; and HAQ 2.1 to 3.0 = advanced impairment) and reported the transitions to and from these health states. Wong et al. also reported their transition probabilities as single state transitions across a 30 week (baseline to 30 weeks) or 24 week (30 weeks to 54 weeks) time frame. Since there was substantial improvement in the first 30 weeks of the ATTRACT trial in both the placebo and the infliximab arms,6 we classified this time frame as being an adjustment phase where considerable regression to the mean was occurring in both treatment arms. Thus, we did not utilize this time frame to estimate the long-term transition probabilities. Instead, we utilized Wong et al.'s 30 to 54 week transition probabilities for the MTX (Table 5.1) and infliximab plus MTX (Table 5.2) strategies as the basis of our long-term transition probability matrix. However, we used the baseline HAQ distribution for all arms in the ATTRACT trial as the baseline HAQ distribution for entry into the model.

The underlying mathematical model that we adopted was a continuous time Markov process. As such, transitions can occur at any time, not just at discrete weekly or monthly time points, and the state of the patient with respect to RA is fully defined by knowing which of 25 HAQ states she/he is in at any time. There may be some error, however, in the measurement of the HAQ state: the true HAQ is the measured HAQ +/- an error. The transitions in HAQ state at any time-point are always to one of the neighboring HAQ states (i.e. from HAQ 0.250 to HAQ 0.375 or HAQ 0.125). The interval between transitions is assumed to be exponentially distributed with a rate parameter that depends on the current HAQ; i.e. the distribution of between-event times is r exp(-r t), where "r" is the transition rate and is a function of the HAQ level from which the transition will occur. The transition rate has three multiplicative components: i) a purely random fluctuation component which is equal in both directions, but which is larger for HAQ scores in the middle of the range than at the extremes; ii) a systematic excess tendency for drift in either an upward or a downward direction; and iii) a factor which allows the systematic drift to increase or decrease (and even reverse) across the range of possible HAQ states. This assumption reduces the 600 (24*25) independent transition probabilities in an unstructured model to 5 parameters in this structured transition rate model. The observed 4 x 4 transition matrix (from Wong et al.) arises by running the underlying model for 24 weeks and then collapsing the resultant 25 x 25 transition matrix to 4 x 4. The 5 parameters were estimated by standard maximum likelihood methods for non-linear models (S-Plus® 6.1 for Windows procedure NLMIN, but with a more stringent convergence criterion than the default). The estimates for the two treatment strategies were carried out independently. Mortality over six months was ignored in these calculations. The 25 x 25 weekly transition probability matrices for the MTX and infliximab plus MTX strategies are shown in Table 5.3 and Table 5.4, respectively. Mortality was then estimated from other data (see below) and superimposed on the HAQ transition rate model (see Appendix I for details of the model and Appendix II for the C-code for the model).

5.3.4 Mortality Rate

We modeled the all-cause mortality rate from a large dataset, spanning 1974 through 1999, consisting of 1,922 consecutive RA patients seen at the Wichita (Kansas) Arthritis Center, an outpatient rheumatology clinic.
Demographic, clinical, laboratory, and self-report data (including HAQ) were obtained at each follow-up clinic visit. The details of this data set in regard to mortality have been reported previously.10,13 The death rate was calculated using Poisson regression with time at risk as an offset variable. The covariates in the model were age, age-squared, the HAQ and HAQ-squared. From these data, we determined the probability of death by HAQ state as people aged during the simulation (Table 5.5). All patients were assumed to be 52.6 ± 9.2 years of age at the time of initiating the treatments, as this was the mean age at enrollment in the ATTRACT study.6 The regression model fit was assessed by examining the model deviance divided by its degrees of freedom.

5.3.5 Utilities and QALYs

The primary outcome measure used in the analysis was QALYs over a ten year period associated with the use of either treatment strategy. Utility weights for the determination of QALYs were calculated using multiple linear regression models assessing the relationship between the HAQ score, age and utility values. The models were estimated using baseline data from a longitudinal study of 317 RA patients comparing different utility instruments (the HUI2, the HUI3, the EQ-5D, and the SF-6D). Details of this study can be found elsewhere.14,15 The fit of each model was assessed using R² and residual plots.

5.3.6 Cost Estimation

Our cost analysis was performed from the Canadian societal perspective. The cost components include both the direct medical costs and indirect costs incurred by loss of work due to RA. The methodology used to calculate each cost category is outlined below. All costs were deflated by the Consumer Price Index for healthcare products and are in 2002 Canadian dollars.16 Unit costs and other equations in our model are summarized in Table 5.5.
5.3.6.1 Direct Drug Costs

According to Schering Canada, infliximab is marketed at a cost of $909.51 (2002 Canadian dollars) per 100 mg vial. Since the infliximab strategy used in our model (and in clinical practice) is dosed based upon body weight, we applied a weight of 66 kg as reported in another Canadian clinical trial of RA drug therapy,17 resulting in the use of two vials every eight weeks. However, a report out of the United States suggested that the average weight of infliximab users in clinical practice was 77 kg.9 Furthermore, Malone et al. revealed in their report that 78% of clinicians surveyed gave patients the entire vial when the calculated dose based on weight was less than the entire vial.9 Thus, we used a blended cost assuming that 67% received 2 vials and 33% received 3 vials of infliximab. This assumption was tested in univariate sensitivity analysis. In addition, the costs of pharmacy (for preparation), nursing (for preparation and monitoring), and baseline TB screening tests consisting of a chest X-ray, a PPD skin test, and a rheumatologist follow-up visit (for the infliximab strategy) were obtained from the provincial medical fee guide and included in the model. The cost of MTX is $1.00 per 2.5 mg tablet and $9.75 per 50 mg vial. An analysis of MTX prescriptions in 2000 for all RA patients in the province of British Columbia, based on a population-based RA cohort, showed that 90% of MTX prescriptions were oral tablets, 5% were injectable solution with preservative and 5% were injectable solution without preservative. The monitoring costs of anti-nuclear antibodies and anti-DNA antibodies done twice a year were included in the infliximab strategy, whereas other monitoring costs were assumed to be the same across the two strategies.
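The blended vial assumption above can be worked through numerically. The Python sketch below is illustrative only: the function names are hypothetical, and it covers drug acquisition alone (pharmacy, nursing, screening and monitoring costs are excluded). The vial price, vial size, dose, body weights, 67%/33% split and dosing intervals are the figures quoted in the text.

```python
import math

VIAL_COST = 909.51        # 2002 CAN$ per 100 mg vial (from the text)
VIAL_MG = 100             # vial size in mg (from the text)
DOSE_MG_PER_KG = 3        # base-case dose (from the text)
SHARE_TWO_VIALS = 0.67    # share of patients receiving 2 vials (from the text)
SHARE_THREE_VIALS = 0.33  # share of patients receiving 3 vials (from the text)


def vials_needed(weight_kg):
    """Whole vials opened per infusion, assuming any partial vial is used up."""
    return math.ceil(weight_kg * DOSE_MG_PER_KG / VIAL_MG)


def blended_infusion_cost():
    """Expected drug acquisition cost of one infusion under the 67%/33% split."""
    expected_vials = 2 * SHARE_TWO_VIALS + 3 * SHARE_THREE_VIALS
    return expected_vials * VIAL_COST


def weekly_drug_cost(weeks_between_infusions=8):
    """Average weekly acquisition cost for a given dosing interval."""
    return blended_infusion_cost() / weeks_between_infusions


print(vials_needed(66), vials_needed(77))   # 2 3
print(round(blended_infusion_cost(), 2))    # 2119.16
print(round(weekly_drug_cost(8), 2))        # 264.89
print(round(weekly_drug_cost(4), 2))        # 529.79
```

Under these assumptions, a 66 kg patient needs two vials and a 77 kg patient three, and halving the dosing interval from 8 to 4 weeks doubles the weekly acquisition cost, which is how the model represents dose escalation.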
5.3.6.2 Other Direct Costs

Other direct costs besides drug costs included in this study were derived from a longitudinal study of 1063 Canadian patients who reported semi-annually on their health services utilization over the preceding 6 months during 1983 and 1994. A detailed description of the determination of these costs is available from the literature.19 The direct costs of RA care comprised long-term care, rehabilitation, nursing homes, health professional visits, medications, diagnostic tests, acute hospitalization, emergency department visits, ambulance services, dialysis and outpatient surgeries. Using a subset of the database, a mixed-effect regression model to estimate direct cost (log transformed) over the next 6-month period was generated, where the predictors were gender (fixed effect), disease duration (fixed effect) and HAQ (random effect) at 0 months, and within-patient correlation of observations over time was adjusted for. These costs were divided by 24 to give average weekly costs by HAQ health state for input into the Markov model with weekly state transitions.

5.3.6.3 Indirect Costs

Indirect cost caused by work disability due to RA was estimated from a prospective longitudinal cohort of 120 employed RA patients recruited in Ontario, Canada from September 1999 to December 2001.20 In the self-report questionnaire, participants were asked the number of days missed due to RA in the past 6 months and their regular weekly working hours. Participants' disability level was assessed by items drawn from the HAQ and the Multidimensional Functional Assessment Questionnaire, supplemented with additional items to assess discretionary activities such as hobbies and leisure pursuits. A multiple linear regression model was constructed between work capacity and the disability score.
Using the disability score as a proxy of the HAQ score, we estimated the work capacity for our study groups based on the baseline HAQ and improvement in HAQ. A gender-weighted average income of the Canadian population aged 45 to 64 was multiplied by work missed to estimate the cost of lost work capacity. For the model, once the age of the cohort reached 65, these indirect costs were no longer accrued, as it was assumed that patients would have retired.

5.3.7 Survival Analysis

From the 100,000 simulations described below (50,000 for each treatment strategy), we conducted Kaplan-Meier survival analysis and Cox regression. Since time at risk and the occurrence of death as a binary variable were tracked, we estimated the probability of survival over the 10 years of the model. The log-rank test was used to test the null hypothesis that the survival time was the same between the treatment strategies. Cox regression was used to determine the hazard ratio associated with the use of infliximab plus MTX as compared to MTX alone. Right censoring was used for those who had not died after the 10 year time horizon of the Markov model.

5.3.8 Cost-Utility and Probabilistic Analysis

The analysis of the state-transition model provides expected costs and expected QALYs over a 10 year follow-up period. If the infliximab strategy was both more effective and more costly, we calculated the incremental cost-utility ratio as the additional cost per QALY. To quantify the precision of our cost-utility estimates, we conducted probabilistic analyses. We utilized both 1st order (random walk) and 2nd order (random draws from specified distributions) Monte Carlo simulation methods. For the 2nd order simulations, we conducted 1000 iterations. For each of the 2nd order iterations with sampling from the specified distributions, 50 random walks were conducted for each strategy. Therefore, a total of 100,000 simulations were conducted (50,000 per treatment strategy).
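The nesting of the two simulation orders, and the rule for reporting an incremental ratio, can be sketched as follows. This Python sketch is not the thesis model (which is given as C code in Appendix II): the function names are hypothetical, `toy_random_walk` is a stand-in that returns invented costs and QALYs, and only the iteration counts, the strategy labels and the baseline age distribution (52.6 ± 9.2 years) come from the text.

```python
import random

N_SECOND_ORDER = 1000   # 2nd order parameter draws (from the text)
N_FIRST_ORDER = 50      # 1st order random walks per strategy per draw (from the text)
STRATEGIES = ("MTX alone", "infliximab plus MTX")


def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost per QALY gained.

    Returns None unless the new strategy is both more costly and more
    effective, the case in which the text reports a ratio.
    """
    delta_cost = cost_new - cost_old
    delta_qaly = qaly_new - qaly_old
    if delta_cost > 0 and delta_qaly > 0:
        return delta_cost / delta_qaly
    return None


def toy_random_walk(rng, baseline_age, strategy):
    """Hypothetical stand-in for one patient-level walk through the model.

    The numbers are invented for illustration and ignore baseline_age;
    they are not results from the thesis.
    """
    if strategy == "infliximab plus MTX":
        return rng.gauss(150_000, 10_000), rng.gauss(5.0, 0.5)
    return rng.gauss(50_000, 10_000), rng.gauss(3.0, 0.5)


def simulate(seed=0):
    """Run the nested loops and collect (cost, QALY) pairs per strategy."""
    rng = random.Random(seed)
    results = {strategy: [] for strategy in STRATEGIES}
    for _draw in range(N_SECOND_ORDER):
        baseline_age = rng.gauss(52.6, 9.2)   # one 2nd order draw
        for strategy in STRATEGIES:
            for _walk in range(N_FIRST_ORDER):  # 1st order random walks
                results[strategy].append(
                    toy_random_walk(rng, baseline_age, strategy)
                )
    return results


results = simulate()
print({strategy: len(runs) for strategy, runs in results.items()})
# {'MTX alone': 50000, 'infliximab plus MTX': 50000}
print(icer(150_000.0, 5.0, 50_000.0, 3.0))  # 50000.0
```

The counts reproduce the arithmetic in the text: 1000 draws times 50 walks gives 50,000 simulated patients per strategy and 100,000 in total.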
For this model, probability distributions were defined for three sets of key model parameters: 1) gender was assumed to follow a Bernoulli distribution; 2) baseline age was assumed to follow a normal distribution; and 3) baseline HAQ scores were assumed to follow the distribution at randomization in the MTX arm of the ATTRACT trial.6,8 These variables were chosen as they were key variables in the cost calculation (Table 5.5), and the baseline HAQ distribution sets the starting point for the transition probability matrices. The 95% confidence region surrounding the incremental cost-utility ratio was estimated using Fieller's theorem.22 This analysis was also used to generate plots on the cost-effectiveness plane using each of the indirect utility measurements and to generate cost-acceptability curves with the often quoted threshold of society's willingness to pay (WTP) of $50,000 per QALY as the ceiling ratio.23

5.3.9 Univariate Sensitivity Analysis

For parameter estimates that were uncertain but for which evidence on prior probability distributions was uncertain or inapplicable, we conducted deterministic, univariate sensitivity analysis. Specifically, we calculated the cost per QALY at discount rates of 0%, 3% (base case), 5%, and 7%. In addition, to account for the potential of higher doses of infliximab being used to achieve the same benefit (Malone et al. reported that approximately 1/3 of infliximab patients receive a higher dose than 3 mg/kg every eight weeks9), we varied the cost of infliximab from $200 to $500 per week.

5.4 RESULTS

5.4.1 Simulation Results

Under the assumptions of our model, the mean final HAQ states for those still alive after 10 years or those who expired during the simulations were 2.40 ± 0.41 for the MTX alone strategy and 1.38 ± 0.92 for the infliximab plus MTX strategy (p<0.0001 by Student's t-test).
The Kaplan-Meier survival curve from the analysis of the 100,000 simulated patients is presented in Figure 5.2. The result of the log-rank test revealed that there was a significant benefit in survival by using infliximab plus MTX over MTX alone (p<0.0001). The hazard ratio associated with infliximab plus MTX was 0.63 (95% confidence interval 0.62 to 0.65, p<0.0001) when compared to MTX alone. Similar to Wong et al., to compare the benefit of infliximab as predicted by our model to that achieved in the ATTRACT trial,6 we determined the predicted mean HAQ score for the infliximab plus MTX and the MTX alone arms after 54 weeks. Our model predicted a mean difference in improvement in HAQ score of 0.4 for infliximab plus MTX versus MTX alone after 54 weeks, which was identical to that observed in the ATTRACT trial.

5.4.2 Utility and QALY Values

The results of the multiple linear regression analyses of the indirect utility measures and HAQ are presented in Table 5.6. The discounted QALYs produced by using these equations in the Markov model by treatment strategy are provided in Table 5.7. The SF-6D produced the highest estimations of QALYs secondary to its high lower bound (0.30) as compared to the other three instruments, which permit utility values less than zero (presumably, those health states valued less than death). The HUI3 produced the largest incremental difference between the infliximab plus MTX and the MTX alone strategies.

5.4.3 Cost-Utility and Probabilistic Analysis

The results of the expected costs, expected QALYs and the incremental cost-utility ratios (with 95% confidence limits generated by the 1st and 2nd order Monte Carlo simulations) of using the infliximab plus MTX over the MTX alone strategy are presented in Table 5.8.
The mean incremental cost per QALY was the highest for utility weightings provided by the SF-6D and the lowest for utility weightings provided by the HUI3, with the HUI2 and EQ-5D utility weightings providing estimates in the middle of these results. The results of the probabilistic analysis are shown graphically on the cost-effectiveness plane in Figure 5.3. Finally, in Figure 5.4, the cost-acceptability curves for each of the indirect utility measures are shown. These results suggest that for ceiling ratios below $50,000, the HUI3 and EQ-5D would most likely yield results below this figure (100% and 99% probability, respectively) as compared to the HUI2 and SF-6D (8% and 0% probability, respectively). For a ceiling ratio of $100,000 per QALY, the results indicate that estimates obtained with any of the indirect utility estimates would be judged to be cost-effective (100% probability).

5.4.4 Traditional Sensitivity Analysis

Results of the univariate sensitivity analysis of varying discount rates and the cost of infliximab are presented in Table 5.9. The incremental cost-utility ratios are relatively robust to the different discount rates. However, increasing the cost of infliximab causes a large increase in the incremental cost per QALY for all of the indirect utility measures.

5.5 DISCUSSION

Our analysis reveals that there is considerable variation in the incremental cost per QALY of using different indirect utility instruments as weightings for QALY estimation in economic evaluation of new therapies for RA. It appears that the SF-6D yields the least optimistic while the HUI3 yields the most optimistic incremental cost per QALY gained. These findings were further supplemented by the results from the cost-acceptability curves, which showed that, under a ceiling ratio of society's WTP of $50,000 per QALY, the HUI3 and EQ-5D based results had a 100% probability of being under this limit.
Also, under the assumptions of our model, we demonstrated that the addition of infliximab to MTX in patients with refractory RA results in an improvement in both quality of life (regardless of the indirect utility measurement technique employed) and survival. However, this benefit comes at an increased cost which is due mostly to the acquisition cost of the drug.

Recently, a significant amount of attention has been paid to the fact that the available indirect utility instruments could result in drastically different results when applied to the calculation of QALYs in rheumatology.15,16,24,25 Conner-Spady et al. administered the EQ-5D, the SF-6D, and the HUI3 in a consecutive sample of rheumatology patients (98 patients were included in the analysis; of these, 51% had RA with the remainder having other rheumatological conditions) at baseline, 3, 6 and 12 months.23 They calculated a theoretical QALY by summing the average QALY by instrument for each time interval (for example, 0 to 3 months). They found that the EQ-5D derived QALYs were larger in those reporting better health than HUI3 or SF-6D derived QALYs. The authors concluded that the three indirect tools they tested were not interchangeable, which could have important ramifications for economic evaluations. The analysis by Luo et al., also conducted in a relatively small sample (n=114) of patients with a variety of rheumatic diseases, concluded that the HUI3 and the EQ-5D performed "equally well" in measuring utilities in rheumatic diseases based upon assessment of construct validity.26 While we agree that both of these instruments have construct validity in RA,15 we agree with Conner-Spady et al. that they are clearly not interchangeable for use in economic analysis as QALY weightings.
Our analysis, which is based on indirect utility assessment using the HUI2, HUI3, SF-6D, and EQ-5D from a sample of 317 patients with rheumatologist-confirmed RA,15,16 is the first attempt at quantifying the differences in indirect utility weightings for QALY measurement in an actual economic evaluation in rheumatoid arthritis. From the incremental cost-utility results, it can be seen that there was a relative difference of over 100% in incremental cost per QALY between the lowest (from the HUI3 derived QALYs) and highest (from the SF-6D derived QALYs) estimates. In addition, the range of incremental cost per QALY estimates spans the often-quoted threshold of $50,000 per QALY for programs to be funded, making decision-making difficult. For example, using HUI3 or EQ-5D generated QALYs and the $50,000 per QALY threshold as shown in the cost-acceptability curves (Figure 5.4), decision-makers might determine that infliximab is an economically attractive strategy. However, the same model yields less optimistic findings with the HUI2 and the SF-6D derived QALYs, potentially resulting in a conflicting decision. Thus, given the wide range of incremental cost per QALY estimates generated by the various instruments, policies regarding new medications should not be based on a single measure wherever possible. At the very least, economic evaluations should either explore this issue in sensitivity analysis, or the choice of outcome measures should be standardized across economic evaluations of rheumatoid arthritis. These findings substantiate the conjecture by Conner-Spady et al. that employing different indirect utility instruments might have important implications for economic evaluation.25 Our analysis builds on those previously described, as the utility estimates are derived from a much larger, homogeneous (all patients had RA) sample.
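The relative difference of over 100% noted above can be checked from the ratios reported in Table 5.8 ($33,092 for the HUI3 and $67,005 for the SF-6D). The helpers below are a minimal sketch with hypothetical names. Dividing the rounded mean cost difference by the rounded mean QALY difference from Tables 5.7 and 5.8 will not reproduce the published ratios exactly, presumably because the thesis worked from unrounded values.

```c
/* Incremental cost-utility ratio: extra cost per extra QALY gained. */
double icer(double delta_cost, double delta_qaly)
{
    return delta_cost / delta_qaly;
}

/* Relative difference of the highest estimate over the lowest,
 * expressed as a fraction (1.0 means 100%). */
double relative_difference(double lowest, double highest)
{
    return (highest - lowest) / lowest;
}
```

With the Table 5.8 values, `relative_difference(33092.0, 67005.0)` comes to roughly 1.02, that is, the SF-6D based ratio is about double the HUI3 based ratio.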
In addition, we utilized all four of the most commonly utilized indirect utility instruments, thus permitting a complete comparison amongst them. Our findings are similar to those of others who have compared the outcomes of economic analyses using different indirect utility measures. For example, a study by Neumann et al.26 showed that incremental cost-effectiveness estimates for a new drug for Alzheimer's disease were more economically attractive when using the HUI3 as compared to the HUI2.

With respect to the secondary objective of our study, namely to determine the incremental cost per QALY of adding infliximab to MTX in the treatment of refractory RA, we found that the cost per additional discounted QALY of adopting the infliximab strategy under the assumptions of our model ranged from $38,161 to $58,991 depending on the utility weighting method utilized. Our results generally agree with other recently published evaluations of the cost-effectiveness of infliximab in that, under certain conditions, the infliximab strategy could be construed to be economically attractive, although there are important methodological differences.1,8,27 All of these analyses (including ours) make use of a Markov model based upon states derived from the Health Assessment Questionnaire (HAQ) and utilize the ATTRACT trial as the primary source of clinical data. Wong et al. extended the results from the 54 week follow-up of the ATTRACT trial over the lifetime of RA patients.8 These authors found that, from the perspective of society, the infliximab strategy had an incremental cost-utility ratio of $9,100 (US) per QALY gained versus MTX monotherapy. Kobelt et al. conducted a similar analysis from the Swedish healthcare perspective over a 1 year and a 2 year period.28 These investigators found that, under the assumptions of their model, the incremental cost-utility ratio for the infliximab over the MTX strategy was 3440 EURO per QALY (from the Swedish societal perspective) and 34,800 EURO per QALY (from the UK societal perspective).

Many of the differences in the results across these studies can be explained by basic differences in the construction of the Markov models. Wong et al. utilized a 4x4 transition probability matrix, Kobelt et al. utilized a 6x6 matrix, while ours was 25x25 (with a continuous, underlying process), allowing for increased sensitivity in the relationship between HAQ, cost and QALYs. In addition, the utility weightings employed by each of the studies were different. Wong et al. utilized a visual analogue scale (VAS) with death equal to zero and perfect health equal to one, whereas Kobelt et al. utilized the EQ-5D. In both cases, neither the sample size from which these values were derived nor the methodology used to integrate them into the Markov model was clear. In addition, while the VAS is a preference-based measure, it is not a choice-based method and thus not a utility. The comparison of these two studies is a clear illustration of how weightings for QALYs vary considerably across economic analyses, even for the same drug therapy.

Our analysis assumes that infliximab is continued for ten years for those who respond to therapy. This assumption is not unreasonable, as it is conceivable that these refractory patients will continue to receive a drug therapy that is as costly as infliximab for the duration of their disease if it is successful. However, there are several limitations to our study. Since the available results for the ATTRACT trial only account for 54 weeks of follow-up, the transition probabilities could only be estimated from within this time window.
We have ignored the occurrence, costs and outcomes associated with adverse events in the infliximab arm, which could include infusion reactions, superficial upper respiratory tract infections, demyelination, and serious opportunistic infections. However, in the ATTRACT trial, there was no difference in the number of serious adverse events (requiring hospitalization or judged to be life-threatening) between the two treatment groups.6 Many adverse events for the tumor necrosis factor alpha inhibitors were identified through post-marketing surveillance. As such, due to either the lack of significant medical intervention required for the management of these adverse events, their low probability of occurrence, their elimination through monitoring that we have built into the model (for example, activation of latent TB is a concern with anti-TNF alpha therapy, but we have included screening mechanisms prior to treatment for those in the infliximab strategy) and/or their negligible impact on utility, we determined that the costs incurred to manage these complications and the associated changes in utility would not affect the overall model.

A recent study has shown that patients encountered in clinical practice are quite different from those enrolled in the ATTRACT trial.28 In fact, among patients with "long-term" RA in a clinical practice, only 5% would have fit the inclusion/exclusion criteria of the ATTRACT trial. This finding may seriously limit the applicability of all of the infliximab cost-utility analyses to the majority of patients encountered in clinical practice. A recent editorial by Wolfe et al. outlines the potential pitfalls of extrapolating randomized controlled trial data in the conduct of long-term cost-effectiveness studies. Further research in this regard is warranted.

Other limitations of the model involve the use of HAQ health states (in 0.125 increments) as the underlying predictor of utility, mortality and work disability.
Wolfe recently outlined the non-linear nature of the HAQ (changes in the HAQ score in the range of 1 to 2 represent much less change in function than changes in the HAQ score in the range from 0 to 1).30 While we did not assume linearity in the HAQ in terms of the probability calculation, we made this assumption in the regression models. However, as stated by Wolfe, the HAQ is "a good, sensitive questionnaire, the best we have to date and one that has stood the test of time."30 Thus, despite its limitations, it is still likely the best method currently available for defining RA health states for transition models such as the one we describe.

The use of different indirect utility measurement methods as weightings for QALYs yields quite different incremental cost-utility ratios in the economic evaluation of new therapies for RA. Thus, these differences should be explored in sensitivity analysis, or the choice of outcome measures should be standardized across economic evaluations of rheumatoid arthritis. Infliximab as add-on therapy for patients with RA who are refractory to MTX results in additional years of life even when an adjustment for quality is made. Depending on the method of utility measurement adopted and the ceiling ratio of society's WTP for a QALY, infliximab may represent good value in certain health care environments.

5.6 REFERENCES

1. Jobanputra P, Barton P, Bryan S, Fry-Smith A, et al. The clinical effectiveness and cost-effectiveness of new drug treatments for rheumatoid arthritis: etanercept and infliximab. Accessed on the Internet June 2002 at http://www.nice.org.uk/pdf/RAAssessmentReport.pdf.
2. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
3. Green C, Brazier J, Deverill M. Valuing health-related quality of life. Pharmacoeconomics 2000;17:151-165.
4. Coons SJ, Rao S, Keininger DL, Hays RD. A comparative review of generic quality-of-life instruments. Pharmacoeconomics 2000;17:13-35.
5. Kopec JA, Willison KD. A comparative review of four preference-weighted measures of health-related quality of life. J Clin Epidemiol 2003;56:317-325.
6. Maini R, St Clair EW, Breedveld F, Furst D, Kalden J, Weisman M, et al. for the ATTRACT Study Group. Infliximab (chimeric anti-tumour necrosis factor alpha monoclonal antibody) versus placebo in rheumatoid arthritis patients receiving concomitant methotrexate: a randomized phase III trial. Lancet 1999;354:1932-1939.
7. Canadian Coordinating Office of Health Technology Assessment. Guidelines for the economic evaluation of pharmaceuticals: Canada. 2nd Edition. Ottawa: Canadian Coordinating Office of Health Technology Assessment (CCOHTA); 1997.
8. Wong J, Singh G, Kavanaugh A. Estimating the cost-effectiveness of 54 weeks of infliximab for rheumatoid arthritis. Am J Med 2002;113:400-408.
9. Malone DC, Ortmeier BG. Cost effectiveness analysis of etanercept (Enbrel) versus infliximab (Remicade) in the treatment of rheumatoid arthritis patients (abstract). Arthritis Rheum 2002;46(suppl.):s95.
10. Wolfe F, Michaud K, Gefeller O, Choi HK. Predicting mortality in patients with rheumatoid arthritis. Arthritis Rheum 2003;48:1530-1542.
11. Lajas C, Abasolo L, Bellajdel B, Hernandez-Garcia C, Carmona L, Vargas E, Lazaro P, Jover JA. Costs and predictors of costs in rheumatoid arthritis: a prevalence-based study. Arthritis Rheum 2003;49:64-70.
12. Ethgen O, Kahler KH, Kong SX, Reginster JY, Wolfe F. The effect of health related quality of life on reported use of health care resources in patients with osteoarthritis and rheumatoid arthritis: a longitudinal analysis. J Rheumatol 2002;29:1147-1155.
13. Choi HK, Hernan MA, Seeger JD, Robins JM, Wolfe F. MTX therapy and mortality in patients with rheumatoid arthritis: a prospective study. Lancet 2002;359:1173-1177.
14. Marra CA, Woolcott JC, Shojania K, Offer R, Kopec J, Brazier JE, Esdaile JM, Anis AH. A comparison of generic, indirect utility measures (the HUI2, HUI3, SF-6D, and the EQ-5D) and disease-specific instruments (the RAQoL and the HAQ) in rheumatoid arthritis. Soc Sci Med (submitted).
15. Marra CA, Esdaile JM, Guh D, Kopec JA, Brazier JE, Chalmers A, Koehler B, Anis AH. A comparison of four indirect methods of assessing utility values in rheumatoid arthritis. Med Care (submitted).
16. Consumer Price Index. Statistics Canada. Retrieved January 10, 2003 from www.statcan.ca/english/subjects/cpi/cpi-en.htm.
17. Tsakonas E, Fitzgerald AA, Fitzcharles MA, Cividino A, Thorne JC, et al. Consequences of delayed therapy with second-line agents in rheumatoid arthritis: a 3 year follow-up on the hydroxychloroquine in early rheumatoid arthritis (HERA) study. J Rheumatol 2000;27:623-629.
18. Lacaille D, Anis A, Guh D, Esdaile J. Assessing the quality of care for RA at a population level. Arthritis Rheum 2002;46:S626.
19. Clarke AE, Zowall H, Levinton C, Assimakopoulos H, Sibley JT, et al. Direct and indirect medical costs incurred by Canadian patients with rheumatoid arthritis: a 12 year study. J Rheumatol 1997;24:1051-1060.
20. Anis AH, Sun HY, Gignac M. The indirect costs of illness disability in an incident cohort of arthritis patients. Arthritis Rheum 2002;46:s91.
21. Number of income recipients and their average income in constant dollars by sex and age groups. Statistics Canada. Retrieved December 8, 2001, from http://www.statcan.ca/english/census96/
22. Briggs AH, O'Brien BJ, Blackhouse G. Thinking outside the box: recent advances in the analysis and presentation of uncertainty in cost-effectiveness studies. Annu Rev Public Health 2002;23:377-401.
23. Goeree R, O'Brien BJ, Blackhouse G, Marshall J, Briggs A, Lad R. Cost-effectiveness and cost-utility of long-term management strategies for heartburn. Value in Health 2002;5:312-324.
24. Conner-Spady B, Suarez-Almazor ME. Variation in the estimation of quality-adjusted life-years by different preference-based instruments. Med Care 2003;41:791-801.
25. Luo N, Ling-Huo C, Kok-Yong F, Dow-Rhoon K, Swee-Cheng N, et al. A comparison of the EuroQol-5D and the Health Utilities Index Mark 3 in patients with rheumatic disease. J Rheumatol 2003;30:2268-2274.
26. Neumann PJ, Sandberg EA, Araki SS, Kuntz KM, Feeny D, Weinstein MC. A comparison of HUI2 and HUI3 utility scores in Alzheimer's disease. Med Decis Making 2000;20:413-422.
27. Kobelt G, Jonsson L, Young A, Eberhardt K. The cost-effectiveness of infliximab (Remicade®) in the treatment of rheumatoid arthritis in Sweden and the United Kingdom based on the ATTRACT study. Rheumatology 2003;42:326-335.
28. Sokka T, Pincus T. Eligibility of patients in routine care for major clinical trials of anti-tumor necrosis factor alpha agents in rheumatoid arthritis. Arthritis Rheum 2003;48:313-318.
29. Wolfe F, Michaud K, Pincus T. Do rheumatology cost-effectiveness analyses make sense? Rheumatology 2004;43:4-6.
30. Wolfe F. The psychometrics of functional status questionnaires: room for improvement. J Rheumatol 2002;29:865-868.
TABLE 5.1: OBSERVED TRANSITION PROBABILITY MATRICES FOR METHOTREXATE FROM THE ATTRACT TRIAL (FROM WEEK 30 TO WEEK 54)

                      HAQ SCORE GROUPS*
HAQ SCORE GROUPS*     0        0.1-1.0   1.1-2.0   2.1-3.0
0                     0.5      0.143     0         0
0.1-1.0               0.5      0.524     0.063     0
1.1-2.0               0        0.333     0.781     0.2
2.1-3.0               0        0         0.156     0.8

* HAQ transition states defined as 0 = no impairment; 0.1 to 1 = mild impairment; 1.1 to 2 = moderate impairment; 2.1 to 3 = advanced impairment

TABLE 5.2: OBSERVED TRANSITION PROBABILITY MATRICES FOR INFLIXIMAB FROM THE ATTRACT TRIAL (FROM WEEK 30 TO WEEK 54)

                      HAQ SCORE GROUPS*
HAQ SCORE GROUPS*     0        0.1-1.0   1.1-2.0   2.1-3.0
0                     0.679    0.079     0         0
0.1-1.0               0.286    0.822     0.158     0
1.1-2.0               0.035    0.089     0.806     0.342
2.1-3.0               0        0.01      0.036     0.658

* HAQ transition states defined as 0 = no impairment; 0.1 to 1 = mild impairment; 1.1 to 2 = moderate impairment; 2.1 to 3 = advanced impairment

TABLE 5.3: CALCULATED WEEKLY TRANSITION PROBABILITY MATRIX FOR METHOTREXATE

0.922 0.084 0.006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ 0.074 0.SO5 0.10? 0.01 0.00 i 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.005 0.303 0.735 0.136 0.016 0.002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.009 0.131 0.665 0.15S 0.023 0.003 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.016 0.16 0.601 0.173 0.03 0.004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.002 0.025 0.183 0.54S 0.181 0.034 0.005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.003 0.035 0.20! 0.511 0.SS4 0.036 0.005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.005 0.044 0214 0.489 0.183 0.036 0.004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.007 0.052 0.223 0.483 0.178 0.033 0.004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.OG9 0.056 0.228 0.493 0.S71 0.028 0.002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 O.OO! 0.009 0.056 0.229 0.518 0.16 0.021 0.001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00!
0.009 0.052 0.225 0.559 0.145 0.015 0.001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.007 0.044 0.213 0.614 0.125 0.009 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.005 0.034 0.192 0.6S 0.101 0.005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.003 0.024 0.164 0.751 0.077 0.003 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00! 0.014 0.13 0.819 0.054 0.001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 o.oos 0.095 0.878 0.035 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.004 0.065 0.923 0.021 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 O.OOS 0.04 0.955 0.012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.024 0.975 0.006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.013 0.987 0.003 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.007 0.994 0.001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.003 0.997 0.001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.999 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 1.. The upper left hand corner represents the probability of transition from H A Q 0 to H A Q 0 in a one week time frame (pi,i). The rows represent the probability of transition from H A Q states in increments of 0.125 (for example, the second row, first column represents the probability of a transition from H A Q 0.125 to H A Q 0 [p2,i]). Similarly, the columns are in 0.125 increments in H A Q (for example, the first row represent probabilities of transition to a H A Q 0, the second row represents transitions to H A Q 0.125. The bottom right hand corner represents the probability of transitioning from H A Q 3.0 (p25,25)-164 TABLE 5.4: CALCULATED WEEKLY TRANSITION PROBABILTY MATRIX FOR INFLIXIMAB (0.m 0.029 0.00! 0 0 0 0 0 0.034 0.928 0.037 0.001 0 0 0 0 0.001 0.042 0.909 0.04? 0.002 0 0 0 0 0.00! 
0.052 0.837 0.059 0.002 0 0 0 0 0.002 0.062 0.363 0.072 0.004 0 0 0 0 0.003 0.073 0.837 0.086 0.005 0 0 0 0 0.004 0.084 0.81 0.1 0 0 0 0 0 0.005 0.094 0.783 0 0 0 0 0 0 0.606 0.103 0 0 0 0 0 0 0 0.008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.007 0 0 0 0 0 0 0 0.114 0.01 0.001 0 0 0 0 0 0.757 0.128 0.012 0.001 0 0 0 0 0.112 0.733 0.14 0.0!5 0.001 0 0 0 0.009 0.118 0.712 0.152 0.018 0.002 0 0 0.00! 0.01 0.123 0.694 0.56! 0.02 0.002 0 0 0.00! o.on 0.126 0.68 0.!69 0.022 0.002 0 0 0.00! 0.012 0.127 0.67 0.175 0.024 0 0 0 0.00! 0.012 0.1.27 0.664 0.178 0 0 0 0 0.00! 0.012 0.125 0.663 0 0 0 0 0 0.001 0.012 0.12! 0 0 0 0 0 0 0.00! o.on 0 0 0 0 0 0 0 0.00! 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 © 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 © 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.002 0 0 0 0 0 0 0 0 0.025 0.002 0 0 0 0 0 0 0 0.18 0.025 0.002 0 0 0 0 0 0 0.667 0.179 0.024 0.002 0 0 0 0 0 0.116 0.675 0.176 0.022 0.002 0 0 0 0 0.01 O.U 0.688 0.171 0.02 0.00! 0 0 © 0 0.008 0.103 0.704 0.S63 0.0! 7 0.001 0 0 0 0 0.007 0.095 0.725 0.153 0.0! 5 0.00! 0 0 0 0 0.006 0.086 0.748 0.142 0.012 0.00 0 0 0 0 0.004 0.076 0.774 0.128 0.00< 0 0 0 0 0 0.003 0.066 0.801 0.11 0 0 0 0 0 0 0.002 0.058 0.87 The upper left hand corner represents the probability of transition from H A Q 0 to H A Q 0 in a one week time frame (pi,i). 
The rows represent the probability of transition from HAQ states in increments of 0.125 (for example, the second row, first column represents the probability of a transition from HAQ 0.125 to HAQ 0 [p2,1]). Similarly, the columns are in 0.125 increments in HAQ (for example, the first row represents probabilities of transition to HAQ 0, and the second row represents transitions to HAQ 0.125). The bottom right hand corner represents the probability of transitioning from HAQ 3.0 to HAQ 3.0 (p25,25).

TABLE 5.5: UNIT COSTS (IN CANADIAN DOLLARS), OTHER PARAMETERS AND EQUATIONS IN THE MARKOV MODEL

Model Parameter                  Parameter       Equation
Infliximab*                      $264.89/week    -
Methotrexate 15 mg/week          $6.05/week      -
Pharmacist preparation¥          $6.59/week      -
Nursing monitoring¥              $19.02/week     -
Laboratory monitoring‡           $5.66/week      -
Chest X-ray†                     $84.62 once     -
PPD Skin Test†                   $8.42 once      -
Rheumatology Clinic Visit†       $72.62 once     -
Proportion females               78%             -
Mean age, yrs/100 (SD)           0.52 (0.09)     -
6 Month Direct Medical Costs     -               exp(6.49 - 1.18*age + 0.15*(female) + 0.39*HAQ + 0.5*1.66)
Annual Indirect Costs            -               (1 - workcapacity)*(47,085*(male) + 33,774*(female))
Work Capacity                    -               1.09 - (0.18*HAQ)
Annual Death Rate                -               exp(-11.42 + (17.05*age) - (0.26*HAQ) - (7.95*age^2) + (0.29*HAQ^2))

* Based on an assumption that 67% receive 2 x 100 mg vials and 33% receive 3 x 100 mg vials
- not applicable
¥ For the preparation, administration and monitoring of infliximab infusion
‡ For the monitoring of anti-nuclear antibodies and anti-DNA antibodies twice a year
† For the screening of latent tuberculosis prior to infliximab therapy
In the equations, "male" and "female" refer to the proportion of that gender as reported in the ATTRACT trial; "HAQ" refers to the disability index from 0 to 3.0; and "age" refers to the age in years divided by 100.
TABLE 5.6: MULTIPLE LINEAR REGRESSION MODELS OF THE INDIRECT UTILITY MEASURES

Dependent Variable     Independent Variable   Beta Coefficient   P-Value    Model R2
HUI2 Utility Score     Intercept              0.88               <0.0001    0.43
                       HAQ                    -0.17              <0.0001
                       Age*                   0.03               0.71
HUI3 Utility Score     Intercept              0.83               <0.0001    0.59
                       HAQ                    -0.29              <0.0001
                       Age*                   0.01               0.55
SF-6D Utility Score    Intercept              0.69               <0.0001    0.54
                       HAQ                    -0.13              <0.0001
                       Age*                   0.13               0.002
EQ-5D Utility Score    Intercept              0.72               <0.0001    0.38
                       HAQ                    -0.20              <0.0001
                       Age*                   0.25               0.008

All models based on a sample size of 317 respondents
* Age is age in years divided by 100

TABLE 5.7: DISCOUNTED QALYS GENERATED BY INDIRECT UTILITY METHOD IN THE MARKOV MODEL

                           Discounted* Mean QALY ± SD
Indirect Utility Method    Infliximab plus MTX Strategy   MTX Alone Strategy   Mean Difference ± SD
HUI2                       5.38 ± 0.89                    4.12 ± 0.63          1.27 ± 0.43
HUI3                       3.74 ± 1.27                    1.74 ± 0.72          1.99 ± 0.64
SF-6D                      4.74 ± 0.69                    3.75 ± 0.50          0.98 ± 0.35
EQ-5D                      4.72 ± 0.94                    3.30 ± 0.58          1.43 ± 0.47

* Discounted at 3% per annum

TABLE 5.8: EXPECTED COSTS AND INCREMENTAL COST-UTILITY RATIOS GENERATED BY THE INDIRECT UTILITY METHODS

Strategies                    Cost $ per Patient*   95% Confidence Limits†
(1) MTX alone                 $133,737              $131,007 - $136,466
(2) Infliximab plus MTX       $199,729              $197,747 - $201,713
Difference (2) - (1)          $65,993               $64,412 - $67,573

Incremental Cost per QALY*
HUI2-QALY                     $52,078               $48,850 - $55,528
HUI3-QALY                     $33,092               $30,887 - $35,436
SF-6D-QALY                    $67,005               $62,773 - $71,540
EQ-5D-QALY                    $46,159               $43,086 - $49,438

* Discounted at 3% per annum
† Generated by application of Fieller's theorem

TABLE 5.9: UNIVARIATE SENSITIVITY ANALYSIS - INCREMENTAL COST-UTILITY RATIO (INCREMENTAL COST PER QALY) BY INDIRECT UTILITY METHOD

               Discount 0%   Discount 5%   Discount 7%   Infliximab Cost* $200 to $500 per week
HUI2-QALY      $56,318       $51,302       $53,762       $40,918 to $147,454
HUI3-QALY      $39,190       $34,851       $35,951       $27,485 to $98,207
SF-6D-QALY     $75,142       $84,902       $77,483       $46,616 to $169,237
EQ-5D-QALY     $53,518       $47,830       $49,609       $37,084 to $138,098

* At the base case discount rate (3%) for both costs and QALYs

FIGURE 5.1: A SCHEMATIC REPRESENTATION OF THE MARKOV, HAQ-BASED MODEL USED FOR THE COST-EFFECTIVENESS ANALYSIS
"STAGE" REFERS TO THE MARKOV TRANSITION CYCLE. FOR SIMPLICITY, ONLY TRANSITIONS FROM HAQ 0 ARE SHOWN.

FIGURE 5.2: KAPLAN-MEIER SURVIVAL CURVES FROM THE 100,000 MONTE CARLO SIMULATIONS
[Figure: x-axis, Follow-up Time (Years), 0 to 12]
SURVIVAL CURVES FOR THOSE ON INFLIXIMAB PLUS MTX (UPPER BLUE LINE) AND MTX ALONE (LOWER RED LINE). THE DIAMONDS REPRESENT THOSE WHO WERE CENSORED.

FIGURE 5.3: FIELLER'S THEOREM CONFIDENCE LIMITS PLOTTED ON THE COST-EFFECTIVENESS PLANE
[Figure: x-axis, QALY difference (Δ QALY)]
Each ellipse within each utility-defined QALY covers 5%, 50% and 95% of the integrated joint density between cost and the QALY differences. The lines are the upper and lower confidence limits using Fieller's theorem. As noted by Briggs et al.
(ref. 22), the "wedge" defined by Fieller's confidence limits falls inside the 95% ellipse.

FIGURE 5.4: COST-UTILITY (or COST-EFFECTIVENESS) ACCEPTABILITY CURVES FOR EACH INDIRECT UTILITY MEASURE
[Figure: Cost-Effectiveness Acceptability Curve (Fieller's Theorem); x-axis, Value of ceiling ratio ($), 30000 to 70000]

APPENDIX I: MARKOV MODEL

To conduct a cost-effectiveness analysis of infliximab plus methotrexate (MTX) compared to MTX alone in the treatment of severe rheumatoid arthritis using the ATTRACT trial as a source of clinical outcomes, we defined a Markov model with 26 health states: 25 states based upon increments of 0.125 in a Health Assessment Questionnaire (HAQ) score from 0 (no disability) to 3.0 (worst level of disability) and one absorbing state, death. The length of our Markov cycle was one week. An important assumption was that patients in a given initial health state have a constant probability per unit time of making a transition into any other given state, independently of how much time has already passed in the initial state. This assumption, referred to as the Markov property, was essential in modeling the prognosis with a finite number of states. We also assumed that, given that death does not occur, the intermediate transition probabilities from one HAQ state to another are gender and age independent and therefore constant through time. The probability of all-cause mortality in a unit of time for people with rheumatoid arthritis, however, is a function of the HAQ state and age, which we have denoted as p(h,a). As mentioned, the final transition probabilities were the combination of the intermediate transition probabilities and the mortality probability. The transition matrix below describes the transition probabilities in the Markov model (also shown in Tables 5.3 and 5.4).

\[
T(h,a) =
\begin{pmatrix}
p_{1,1}\,(1-p(h,a)) & p_{1,2}\,(1-p(h,a)) & \cdots & p(h,a) \\
p_{2,1}\,(1-p(h,a)) & p_{2,2}\,(1-p(h,a)) & \cdots & p(h,a) \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]

Where
p_{1,1} = HAQ 0 to HAQ 0
p_{1,2} = HAQ 0 to HAQ 0.125
p_{2,1} = HAQ 0.125 to HAQ 0
p_{25,25} = HAQ 3.0 to HAQ 3.0
p_{i,26}, i = 1, 2, 3, ..., 26 = state i to death
p(h,a) = all-cause, age and HAQ-dependent mortality experienced by people with rheumatoid arthritis

The utility accrued for cycle t, referred to as the cycle sum, was calculated by:

\[
\text{Cycle Sum} = \sum_{i=1}^{26} p_i(t) \times U_i
\]

where p_i(t) was the distribution of patients in the 26 health states at cycle t and U_i was the utility associated with state i. Simulations were run for 10 years (520 cycles). The cycle sum was then added to a running total, the cumulative utility, which is what was required for cost-utility analysis.

The distribution of patients at each cycle (i.e. one week) was calculated using the transition matrices T(h,a). Given the distribution of patients at age t_0 among the 26 health states, (p_1(t_0), p_2(t_0), p_3(t_0), ..., p_26(t_0)), the distribution of patients at age t was given by:

\[
\bigl(p_1(t),\, p_2(t),\, p_3(t),\, \ldots,\, p_{26}(t)\bigr)
= \bigl(p_1(t_0),\, p_2(t_0),\, p_3(t_0),\, \ldots,\, p_{26}(t_0)\bigr)
\prod_{age = t_0 + 1}^{t} T(HAQ, age)
\]

where the right hand side of the equation was the matrix product of a row vector (p_1(t_0), p_2(t_0), p_3(t_0), ..., p_26(t_0)) and t - t_0 - 1 transition matrices.
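The cohort calculation described in this appendix, a row vector of state occupancies multiplied by a transition matrix each cycle and then weighted by state utilities, can be sketched as follows. This is a generic illustration for any number of states n, with hypothetical function names; the program in Appendix II instead simulates individual patients.

```c
#include <stddef.h>

/* One Markov cycle: next = dist * T, where dist is a row vector over n
 * states and T is an n-by-n matrix stored row-major, so T[i*n + j] is
 * the probability of moving from state i to state j. */
void markov_step(const double *dist, const double *T, size_t n, double *next)
{
    size_t i, j;

    for (j = 0; j < n; j++) {
        next[j] = 0.0;
        for (i = 0; i < n; i++)
            next[j] += dist[i] * T[i * n + j];
    }
}

/* Cycle sum: expected utility accrued in one cycle, the sum over states
 * of p_i(t) * U_i. The utility of the death state should be 0. */
double cycle_sum(const double *dist, const double *utility, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        total += dist[i] * utility[i];
    return total;
}
```

Calling `markov_step` once per week for 520 weeks and adding each `cycle_sum` to a running total gives the cumulative utility; because mortality depends on age, the matrix T would be rebuilt as age advances.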
APPENDIX II: C-CODE USED TO RUN THE MARKOV MODEL

/* --- The following code was compiled under lcc-win32 --- */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "uniform.c"
#include "ltqnorm.c"   /* also pulls in <math.h> for log, exp and pow */

#define ranunif genrand_real3
#define setseed init_genrand
#define str(x) #x
#define xstr(x) str(x)

//changeable:
#define RESTART
#define MM 25
#define MM1 24
#define ENDSTAGE 520   // 520 weeks = 10 years
#define SEED 14357
#define RATE 0.03      // discount rate
#define BLKN 1000      // for 2nd order Monte Carlo - # of simulations
#define BLKZ 50        // for 1st order Monte Carlo - # of patients simulated per 2nd order Monte Carlo
#define CFLX 500       // for univariate sensitivity analysis = cost of infliximab up to 500 dollars
#define MCCOUT sim500.txt

int haqmcc(void);

// chooses the first element of a vector of length mm that equals or exceeds ru
int rchoose(double *xx, int mm, double ru)
{
    int jj;
    int ii;
    jj = mm-1;
    for (ii = 0; ii <= mm-1; ii++)
        if (ru < xx[ii]) { jj = ii; break; }
    return jj;
}

// the function that does the simulations:
int haqmcc(void)
{
    //data input
    char *rnames="block outtime age0 gender dead haqi0 haqi1 finaldose rcost rqaly1 rqaly2 rqaly3 rqaly4 inflix";
    FILE *fpt, *fout;
    double xcol=0; //intermediate, now misnamed
    double **ctmtx, **ctflx, **ctmat, *chaq0, *chaq;
    int nread=0;
    int ii, jj, stage, iter, block;
    int haqi, haqi0, trak, dose0, dose, outtime, gender, dead;
    double *vhaq=(double *)malloc(MM*sizeof(double));
    unsigned long s=SEED;
    double age, age0, haq, lpdying, rate, dfact;
    double directcost, indirectcost, wcage, cinflix, cmethotrexate=6.05;
    double cost, rcost;
    double log52=log(52);
    double *workcapacity=(double *)malloc(MM*sizeof(double));
    double *cinfliximab=(double *)malloc(MM*sizeof(double));
    double *bchoices=(double *)malloc(2*sizeof(double));
    double util[4]={0,0,0,0};
    double rqaly[4]={0,0,0,0};

    //read matrix ctmtx in
    //ctmtx is matrix with columns that are cumulated versions of the columns of the mtx transition matrix
    ctmtx=(double **)malloc(MM*sizeof(double *));
    ctmtx[0]=(double *)malloc(MM*MM*sizeof(double));
    for(ii=1;ii<=24;ii++) ctmtx[ii]=ctmtx[ii-1]+MM;
    if((fpt=fopen("ctmtx.txt","r"))==NULL) printf("%s","Error opening file ctmtx for reading");
    for(ii=0;ii<=24;ii++) {
        for(jj=0;jj<=24;jj++){
            nread=nread+fscanf(fpt,"%le", &xcol);
            ctmtx[jj][ii]=xcol; //column major order
        }
    }
    fclose(fpt);

    //read matrix ctflx in
    //ctflx is formed from the transition matrix for the infliximab group
    nread=0;
    ctflx=(double **)malloc(MM*sizeof(double *));
    ctflx[0]=(double *)malloc(MM*MM*sizeof(double));
    for(ii=1;ii<=24;ii++) ctflx[ii]=ctflx[ii-1]+MM;
    if((fpt=fopen("ctflx.txt","r"))==NULL) printf("%s","Error opening file ctflx for reading");
    for(ii=0;ii<=24;ii++) {
        for(jj=0;jj<=24;jj++){
            nread=nread+fscanf(fpt,"%le", &xcol);
            ctflx[jj][ii]=xcol; //column order
        }
    }
    fclose(fpt);

    //read vector chaq0 in
    //baseline cumulative HAQ distribution
    nread=0;
    chaq0=(double *)malloc(MM*sizeof(double));
    if((fpt=fopen("newchaq0.txt","r"))==NULL) printf("%s","Error opening file chaq0 for reading");
    for(ii=0;ii<=24;ii++) {
        nread=nread+fscanf(fpt,"%le", &xcol);
        chaq0[ii]=xcol;
    }
    fclose(fpt);

#ifdef RESTART
    fout=fopen(xstr(MCCOUT),"w");
    fprintf(fout,"%s",rnames);
#else
    fout=fopen(xstr(MCCOUT),"a");
#endif /* RESTART */

    setseed(s);
    bchoices[1]=1.0;
    for(ii=0;ii<=MM1;ii++){
        vhaq[ii]=(double)ii/8; // possible haq values
        workcapacity[ii]=1.09132 - 0.18404*vhaq[ii];
        workcapacity[ii]= (workcapacity[ii]>1) ? 1 : workcapacity[ii]; // correct in expected calc.
        cinfliximab[ii]= (CFLX*0.67) + (2*CFLX*0.33) + 25.50; // double vial use
    }

    for(block=1;block<=BLKN;block++){
        //constants that are not altered during the Markov process can go here.
        age0 = .52+0.092*ltqnorm(ranunif());
        bchoices[0]=1-0.2243;
        gender=rchoose(bchoices,2,ranunif());
        haqi0=rchoose(chaq0,MM,ranunif());
        iter=1;
        for(dose0=0;dose0<=1;dose0++){
            for(iter=1;iter<=BLKZ;iter++) {
                // initial values for quantities that do change, go here.
                haqi=haqi0;
                haq=vhaq[haqi0];
                dead=0;
                dose=dose0;
                age=age0;
                trak=0; // initialized here; the listing otherwise reads trak before it is set
                ctmat=(dose==0) ? ctmtx : ctflx;
                rcost=0;
                for(ii=0;ii<4;ii++) rqaly[ii]=0;
                outtime=0;
                for(stage=0;stage<ENDSTAGE;stage++) {
                    lpdying=-11.4184+(17.0462*age)-(0.2582*haq)-(7.9548*age*age)+(0.2903*haq*haq)-log52;
                    dead=(ranunif()<exp(lpdying));
                    if(dead==1) break;
                    // stage/52 is integer division, so the factor changes once per completed year
                    dfact=1/pow((1+RATE),(stage/52));
                    // weekly costs
                    directcost=(exp(6.4924-1.183*age+0.1459*(1-gender)+0.3918*haq+0.5*1.6663))/26;
                    if(age<=0.65) wcage=workcapacity[haqi]; else wcage=1;
                    indirectcost=((1-wcage)*((47085*(1-gender) + 33774*gender))/52);
                    cost=(directcost+indirectcost+dose*cinfliximab[haqi]+cmethotrexate)*dfact;
                    util[0]= 0.8286 - 0.2947*haq + 0.0545*age; //ii+1: 1=HUI3, 2=SF-6D; 3=HUI2, 4=EQ5D
                    util[1]= 0.6863 - 0.126*haq + 0.1316*age;
                    util[2]= 0.8759 - 0.1673*haq + 0.0263*age;
                    util[3]= 0.7207 - 0.197*haq + 0.247*age;
                    for(ii=0;ii<=3;ii++) {
                        util[ii]=util[ii]*dfact; // if dead, utility, cost are zero
                        rqaly[ii]=rqaly[ii]+util[ii]/52;
                    }
                    rcost=rcost+cost;
                    if(dose>=1) { //if dose==0, infliximab is permanently discontinued, so no need to track
                        if(haq>=2.0) trak=trak+1; else trak=0;
                        if(trak==12) {dose=(dose+1)%3; cinflix=dose*cinfliximab[haqi]; trak=0;}
                    }
                    if(dose<1) ctmat=ctmtx; // "else" would give wrong result. Would fail to change cmat.
                    // 1.0/5200 (one week, with age in units of 100 years); 1/5200 is integer division and evaluates to 0
                    age=age + 1.0/5200;
                    chaq=ctmat[haqi];
                    haqi=rchoose(chaq,MM,ranunif());
                    haq=vhaq[haqi];
                    outtime=stage;
                }
                fprintf(fout,"\n%d %d %f %d %d %d %d %d %ld %f %f %f %f %d",
                        block, outtime, age0, gender, dead, haqi0, haqi, dose, (long)rcost,
                        rqaly[0], rqaly[1], rqaly[2], rqaly[3], dose0);
                //printf("%f",clock()/CLOCKS_PER_SEC);
            }
        }
        if(block % 50 == 0) printf("\n%d",block);
    }
    fprintf(fout,"\n");
    fclose(fout);
    return 0;
}

int main(void){
    int rtv;
    time_t t1,t2,tt;
    t1=time(&tt);
    rtv=haqmcc();
    t2=time(&tt);
    printf("\n\n%f",difftime(t2,t1));
    return rtv;
}

/* For calculation of ltqnorm
 * Lower tail quantile for standard normal distribution function.
 * This function returns an approximation of the inverse cumulative
 * standard normal distribution function. I.e., given P, it returns
 * an approximation to the X satisfying P = Pr{Z <= X} where Z is a
 * random variable from the standard normal distribution.
 * The algorithm uses a minimax approximation by rational functions
 * and the result has a relative error whose absolute value is less
 * than 1.15e-9.
 *
 * Author: Peter J. Acklam
 * Time-stamp: 2002-06-09 18:45:44+0200
 * E-mail: jacklam@math.uio.no
 * WWW URL: http://www.math.uio.no/~jacklam
 * C implementation adapted from Peter's Perl version
 */
#include <math.h>
#include <errno.h>
#include <stdio.h>

/* Coefficients in rational approximations. */
static const double a[] = {
    -3.969683028665376e+01,
     2.209460984245205e+02,
    -2.759285104469687e+02,
     1.383577518672690e+02,
    -3.066479806614716e+01,
     2.506628277459239e+00
};
static const double b[] = {
    -5.447609879822406e+01,
     1.615858368580409e+02,
    -1.556989798598866e+02,
     6.680131188771972e+01,
    -1.328068155288572e+01
};
static const double c[] = {
    -7.784894002430293e-03,
    -3.223964580411365e-01,
    -2.400758277161838e+00,
    -2.549732539343734e+00,
     4.374664141464968e+00,
     2.938163982698783e+00
};
static const double d[] = {
     7.784695709041462e-03,
     3.224671290700398e-01,
     2.445134137142996e+00,
     3.754408661907416e+00
};

#define LOW 0.02425
#define HIGH 0.97575

double ltqnorm(double p)
{
    double q, r;
    errno = 0;
    if (p < 0 || p > 1) {
        errno = EDOM;
        return 0.0;
    } else if (p == 0) {
        errno = ERANGE;
        return -HUGE_VAL /* minus "infinity" */;
    } else if (p == 1) {
        errno = ERANGE;
        return HUGE_VAL /* "infinity" */;
    } else if (p < LOW) {
        /* Rational approximation for lower region */
        q = sqrt(-2*log(p));
        return (((((c[0]*q+c[1])*q+c[2])*q+c[3])*q+c[4])*q+c[5]) /
               ((((d[0]*q+d[1])*q+d[2])*q+d[3])*q+1);
    } else if (p > HIGH) {
        /* Rational approximation for upper region */
        q = sqrt(-2*log(1-p));
        return -(((((c[0]*q+c[1])*q+c[2])*q+c[3])*q+c[4])*q+c[5]) /
                ((((d[0]*q+d[1])*q+d[2])*q+d[3])*q+1);
    } else {
        /* Rational approximation for central region */
        q = p - 0.5;
        r = q*q;
        return (((((a[0]*r+a[1])*r+a[2])*r+a[3])*r+a[4])*r+a[5])*q /
               (((((b[0]*r+b[1])*r+b[2])*r+b[3])*r+b[4])*r+1);
    }
}

/* calculates uniform

   A C-program for MT19937, with initialization improved 2002/1/26.
   Coded by Takuji Nishimura and Makoto Matsumoto.

   Before using, initialize the state by using init_genrand(seed)
   or init_by_array(init_key, key_length).

   Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
   All rights reserved.

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   1. Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

   2. Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

   3. The names of its contributors may not be used to endorse or promote
      products derived from this software without specific prior written
      permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

   Any feedback is very welcome.
   http://www.math.keio.ac.jp/matumoto/emt.html
   email: matumoto@math.keio.ac.jp
*/
#include <stdio.h>

/* Period parameters */
#define N 624
#define M 397
#define MATRIX_A 0x9908b0dfUL   /* constant vector a */
#define UPPER_MASK 0x80000000UL /* most significant w-r bits */
#define LOWER_MASK 0x7fffffffUL /* least significant r bits */

static unsigned long mt[N]; /* the array for the state vector */
static int mti=N+1; /* mti==N+1 means mt[N] is not initialized */

/* initializes mt[N] with a seed */
void init_genrand(unsigned long s)
{
    mt[0]= s & 0xffffffffUL;
    for (mti=1; mti<N; mti++) {
        mt[mti] = (1812433253UL * (mt[mti-1] ^ (mt[mti-1] >> 30)) + mti);
        /* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. */
        /* In the previous versions, MSBs of the seed affect */
        /* only MSBs of the array mt[]. */
        /* 2002/01/09 modified by Makoto Matsumoto */
        mt[mti] &= 0xffffffffUL; /* for >32 bit machines */
    }
}

/* initialize by an array with array-length */
/* init_key is the array for initializing keys */
/* key_length is its length */
void init_by_array(unsigned long init_key[], unsigned long key_length)
{
    int i, j, k;
    init_genrand(19650218UL);
    i=1; j=0;
    k = (N>key_length ? N : key_length);
    for (; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1664525UL))
                + init_key[j] + j; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++; j++;
        if (i>=N) { mt[0] = mt[N-1]; i=1; }
        if (j>=key_length) j=0;
    }
    for (k=N-1; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL))
                - i; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++;
        if (i>=N) { mt[0] = mt[N-1]; i=1; }
    }
    mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
}

/* generates a random number on [0,0xffffffff]-interval */
unsigned long genrand_int32(void)
{
    unsigned long y;
    static unsigned long mag01[2]={0x0UL, MATRIX_A};
    /* mag01[x] = x * MATRIX_A for x=0,1 */

    if (mti >= N) { /* generate N words at one time */
        int kk;

        if (mti == N+1)           /* if init_genrand() has not been called, */
            init_genrand(5489UL); /* a default initial seed is used */

        for (kk=0;kk<N-M;kk++) {
            y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
            mt[kk] = mt[kk+M] ^ (y >> 1) ^ mag01[y & 0x1UL];
        }
        for (;kk<N-1;kk++) {
            y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
            mt[kk] = mt[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1UL];
        }
        y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK);
        mt[N-1] = mt[M-1] ^ (y >> 1) ^ mag01[y & 0x1UL];

        mti = 0;
    }

    y = mt[mti++];

    /* Tempering */
    y ^= (y >> 11);
    y ^= (y << 7) & 0x9d2c5680UL;
    y ^= (y << 15) & 0xefc60000UL;
    y ^= (y >> 18);

    return y;
}

/* generates a random number on [0,0x7fffffff]-interval */
long genrand_int31(void)
{
    return (long)(genrand_int32()>>1);
}

/* generates a random number on [0,1]-real-interval */
double genrand_real1(void)
{
    return genrand_int32()*(1.0/4294967295.0);
    /* divided by 2^32-1 */
}

/* generates a random number on [0,1)-real-interval */
double genrand_real2(void)
{
    return genrand_int32()*(1.0/4294967296.0);
    /* divided by 2^32 */
}

/* generates a random number on (0,1)-real-interval */
double genrand_real3(void)
{
    return (((double)genrand_int32()) + 0.5)*(1.0/4294967296.0);
    /* divided by 2^32 */
}
/* generates a random number on [0,1) with 53-bit resolution */
double genrand_res53(void)
{
    unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6;
    return (a*67108864.0+b)*(1.0/9007199254740992.0);
}
/* These real versions are due to Isaku Wada, 2002/01/09 added */

/*
int main(void)
{
    int i;
    unsigned long init[4]={0x123, 0x234, 0x345, 0x456}, length=4;
    init_by_array(init, length);
    printf("1000 outputs of genrand_int32()\n");
    for (i=0; i<1000; i++) {
        printf("%10lu ", genrand_int32());
        if (i%5==4) printf("\n");
    }
    printf("\n1000 outputs of genrand_real2()\n");
    for (i=0; i<1000; i++) {
        printf("%10.8f ", genrand_real2());
        if (i%5==4) printf("\n");
    }
    return 0;
}
*/

CHAPTER 6 THE IMPACT OF LOW FAMILY INCOME ON SELF-REPORTED HEALTH OUTCOMES IN PATIENTS WITH RHEUMATOID ARTHRITIS WITHIN A PUBLICLY-FUNDED HEALTH CARE ENVIRONMENT

6.1 FOREWORD

This chapter has been accepted for publication in Rheumatology. The candidate is first author of this manuscript, which was co-authored by Dr. Larry Lynd, currently a post-doctoral fellow who had trained under Dr. Anis and whose thesis was on the relationship between socioeconomic status and beta-agonist use. Drs. Aslam Anis and John Esdaile, co-supervisors of the candidate, are also included as co-authors of the submitted manuscript. The candidate's role in this manuscript was the conception of the problem, data entry and manipulation, co-ordination of study recruitment, development and performance of all statistical analysis, and the writing of the final manuscript.

6.2 INTRODUCTION

As comprehensively outlined by recent reviews, it has been well established that self-rated health is an independent, strong predictor of morbidity and mortality.1,2 As such, self-reported health outcomes are increasingly being assessed in the evaluation of chronic disease states such as rheumatoid arthritis (RA).3,4 Socio-economic status (SES) has been shown to be strongly associated with self-reported health and, therefore, more attention is being directed towards its determination and its role in the development and progression of disease.5

Preference-based generic health related quality of life (HRQL) instruments are often used as self-reported measures of health and are increasingly used in economic evaluations as weighting factors for quality adjusted life years (QALYs). Similarly, disease-specific HRQL and functional status measures are often used as self-reported measures to fully assess the impact of chronic diseases and as monitoring tools in clinical practice. Despite the well-known association between self-reported health outcomes and SES, there has been little work evaluating the impact of SES on the results obtained using preference-based and disease-specific HRQL measures in patients with RA.

In Canada, there is a universal health care system that is governed by the principles outlined in the Canada Health Act. However, despite the principles of public administration, universality, accessibility, portability, and comprehensiveness outlined in the Act, there remain large socio-economic inequalities in health within our system.6,7 In British Columbia, these disparities have been investigated in well-established chronic diseases such as asthma and HIV/AIDS.9 However, to our knowledge, the role of SES in self-reported health outcomes experienced by those with RA in a North American country with universal health care has not been investigated to date.

Therefore, the purpose of this study was to investigate the relationship between SES and self-reported health outcomes (both preference-based generic and disease-specific HRQL and functional status) in a sample of RA patients.
Our hypothesis was that, despite adjustment for measures of disease severity, self-perceived generic and disease-specific HRQL and functional status would be worse in patients of lower SES.

6.3 METHODS

6.3.1 Study Sample and Design

Three hundred and thirteen English-speaking adults between 19 and 90 years of age, diagnosed with rheumatoid arthritis by a rheumatologist and residing in the Greater Vancouver Regional District (GVRD) or the rural Okanagan region of British Columbia, were recruited into this cross-sectional study. One hundred and ninety-seven (63%) patients were recruited directly by study rheumatologists, whereas 116 were recruited via targeted mailouts to individuals with RA identified by their rheumatologists. The institutional and university ethics review boards approved the study protocol, and informed consent was obtained from each participant.

6.3.2 Generic Health-Related Quality of Life Measurement

The Short Form 6D (SF-6D), the Health Utilities Index Mark 2 and Mark 3 (HUI2 and HUI3, respectively), and the EuroQoL (EQ-5D) were used to measure generic HRQL, all of which have been shown to have cross-sectional construct validity in patients with RA.10 Since these instruments measure different dimensions/attributes, all were included to assess a broader range of possible health outcomes.

Brazier et al. created the SF-6D to derive a preference-based measure of health from the Short Form-36.12 The SF-6D measures six dimensions, each with four to six levels: physical functioning, role limitation, social functioning, pain, mental health, and vitality. A total of 18,000 health states can be defined by this classification system. Of these states, 249 were valued using the standard gamble (SG) in a sample of 611 UK participants from the general population. Modeling was used to generate the remainder of the values for the health states.
Thus, in the final model, the boundaries of the SF-6D multi-attribute utility values were +0.30 and 1.00 (where zero is death and 1.00 is perfect health). The minimally important difference (MID) is thought to be about 0.03.10,13

The HUI2 was created to measure HRQL in pediatric cancer patients and captures up to seven attributes of health: sensation (vision, hearing, speech), mobility, emotion, cognition, self-care, pain, and fertility (optional).14,15 The HUI2 uses 4 or 5 levels on each attribute for a total of 24,000 possible unique health states. The boundaries on the overall multiattribute utility score are -0.03 to 1.00 and the MID is thought to be about 0.03 to 0.05.

The HUI3 was created initially to measure HRQL in the National Population Health Survey.14 The 8 attributes of the HUI3 are: vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain. Preference scores can be calculated for each attribute. The HUI3 system uses 5 or 6 levels for each attribute for a total of 972,000 possible unique health states.15 The boundaries on the overall multiattribute utility score on the HUI3 are -0.36 to 1.00 and the MID is thought to be about 0.06.10

The EQ-5D assesses five domains of health: mobility, self-care, usual activity, pain/discomfort, and anxiety/depression.15 Each domain has 3 levels, corresponding to a total of 243 possible unique health states. The boundaries on the EQ-5D are -0.59 to 1.00. The MID for the EQ-5D is thought to be from 0.03 to 0.05.10,15

6.3.3 Functional Status Measurement

The Health Assessment Questionnaire (HAQ)16 was one of the first self-reported functional status (disability) measures developed and has become one of the dominant instruments in musculoskeletal diseases including RA.16 The HAQ has been utilized to assess functional status for approximately two decades and is a mandated outcome for clinical trials in RA.
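As a cross-check on the health-state counts quoted in Section 6.3.2, each total is the product of the number of levels per dimension. The per-dimension level counts below are assumptions taken from the published classification systems as we understand them; the chapter itself gives only the ranges ("four to six", "4 or 5", "5 or 6", "3") and the totals.

```latex
\begin{align*}
\text{SF-6D (6 dimensions):}\quad & 6 \times 4 \times 5 \times 6 \times 5 \times 5 = 18{,}000 \\
\text{HUI2 (7 attributes):}\quad & 4 \times 5 \times 5 \times 4 \times 4 \times 5 \times 3 = 24{,}000 \\
\text{HUI3 (8 attributes):}\quad & 6^{5} \times 5^{3} = 7{,}776 \times 125 = 972{,}000 \\
\text{EQ-5D (5 domains):}\quad & 3^{5} = 243
\end{align*}
```

The spread from 243 to 972,000 states illustrates why the four instruments can differ in descriptive sensitivity even though all produce a single utility score.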
The HAQ is a measure of physical disability that assesses a respondent's ability to complete everyday tasks in areas such as dressing and grooming, rising, eating, walking, personal hygiene, reach, grip and other activities (such as getting into and out of a car). Each of these areas is assigned a section score that is further adjusted to account for the use of any aids, devices or help from another person. These scores are then summed and averaged to give an overall score between 0.0 (best possible function) and 3.0 (worst function). A change in HAQ score of 0.25 is considered to represent the MID.17,18

6.3.4 RA Specific Quality of Life Measure

The Rheumatoid Arthritis Quality of Life (RAQoL) questionnaire19 is a newly developed instrument and is the first patient-completed instrument specifically designed for use with RA patients.19 It was derived directly from qualitative interviews with relevant patients and considers aspects of many areas of life that are detrimentally impacted by RA. The RAQoL is meant to be a comprehensive, disease-specific scale that will be more responsive to change than previous scales used in RA. The RAQoL consists of 30 questions with binary responses that assess such aspects of RA as moods and emotions, social life, hobbies, everyday tasks, personal and social relationships, and physical contact. The RAQoL is scored by assigning one point for each affirmative response and no points for negative responses. Thus, scores range from 0 (best RA-specific quality of life) to 30 (worst RA-specific quality of life). We have estimated that the MID for the RAQoL is approximately 1.7 to 2.0.10

6.3.5 Clinical Measurements

In addition to demographic questions, participants were asked questions regarding their RA management and severity, including DMARD and prednisone therapy over the past three months.
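The scoring rules described in Sections 6.3.3 and 6.3.4 can be sketched in C as follows. This is a minimal illustration, not the scoring code used in the study: the function names `raqol_score` and `haq_index` are hypothetical, the HAQ function assumes that the eight section scores (each 0 to 3) have already been adjusted for aids, devices and help, and missing responses are not handled.

```c
#include <stddef.h>

#define RAQOL_ITEMS 30   /* number of binary RAQoL questions (Section 6.3.4) */
#define HAQ_SECTIONS 8   /* dressing/grooming, rising, eating, walking,
                            hygiene, reach, grip, other activities */

/* Hypothetical helper: one point per affirmative (nonzero) response.
 * Returns a score from 0 (best) to 30 (worst), or -1 on invalid input. */
int raqol_score(const int *responses, size_t n_items)
{
    size_t i;
    int score = 0;

    if (responses == NULL || n_items != RAQOL_ITEMS)
        return -1;
    for (i = 0; i < n_items; i++)
        if (responses[i])
            score++;
    return score;
}

/* Hypothetical helper: mean of the eight already-adjusted section scores,
 * giving an overall index between 0.0 (best) and 3.0 (worst). */
double haq_index(const double section[HAQ_SECTIONS])
{
    double sum = 0.0;
    int i;

    for (i = 0; i < HAQ_SECTIONS; i++)
        sum += section[i];
    return sum / HAQ_SECTIONS;
}
```

With these definitions, a respondent answering "yes" to two RAQoL items scores 2, which sits at the upper end of the estimated MID of 1.7 to 2.0.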
These questions included swollen and tender joint count (using the mannequin-based 42 joint count methodology),20 a 10 cm pain visual analogue scale (VAS), five-point Likert scales of self-perceived RA severity and control, and duration of RA. The erythrocyte sedimentation rate (ESR) was obtained from the health record. Health utilization measurements over the previous year, such as hospitalization, use of other professional services (physiotherapy, occupational therapy, home care, massage therapy, etc.), and the rental or purchase of physical aids (walker, wheelchair, cane, etc.), were also collected.

6.3.6 Socioeconomic Status

The association between SES and generic and disease-specific HRQL was tested at both the individual and population level of SES. Individual measures of SES were based on self-reported annual household income and education. Annual household income was classified as less than $20,000, from $20,000 to $50,000, and greater than $50,000, as previously defined. Since the number of people living in each household can have an impact on annual household income, this variable was included in all analyses. Education was classified based on the number of years of post-secondary education and the highest level of education completed, ordinally categorized as less than high school, high school or trade diploma, or at least a university bachelor's degree.

Using a Postal Code Conversion file, we determined the census tract where each participant's current residence was located. From this we determined the neighbourhood characteristics related to SES for each participant's current residence that could be derived from the province's (British Columbia) Census data.21 Census tract level variables deemed representative of SES included median neighbourhood income, the proportion of the population over 20 years of age completing at least a bachelor's education, and the neighbourhood unemployment rate.
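The three income bands defined in Section 6.3.6 could be encoded along the following lines. The type and function names are hypothetical, and the handling of the boundary values follows a literal reading of the text ("from $20,000 to $50,000" is taken to include both endpoints); the original questionnaire coding may have differed.

```c
/* Income bands from Section 6.3.6 (annual household income, dollars). */
typedef enum {
    INCOME_LT_20K,      /* less than $20,000 */
    INCOME_20K_TO_50K,  /* from $20,000 to $50,000 (endpoints assumed inclusive) */
    INCOME_GT_50K       /* greater than $50,000 */
} income_band;

/* Hypothetical helper: maps a reported income onto one of the three bands. */
income_band classify_income(double annual_household_income)
{
    if (annual_household_income < 20000.0)
        return INCOME_LT_20K;
    if (annual_household_income <= 50000.0)
        return INCOME_20K_TO_50K;
    return INCOME_GT_50K;
}
```

Keeping the bands in an ordered enumeration mirrors the ordinal treatment of income in the comparisons across SES categories.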
6.3.7 Statistical Analysis

Spearman's rho was used to examine the correlations between the different measures of SES (a Spearman's rho of > 0.50 or < -0.50 was considered to be strong, values between -0.49 and -0.30 or between 0.30 and 0.49 were considered moderate, and values between -0.30 and 0.30 were considered to be weak).22 The dependent variables for all other analyses were the global utility scores for the SF-6D, the HUI2, the HUI3 and the EQ-5D, and the overall RAQoL and HAQ disability scores. Univariate associations between the dependent variables and demographic characteristics (age and gender), disease severity measures (duration of RA, self-reported pain, swollen/tender joint count, self-reported severity and control), and measures of SES were assessed using simple linear regression.

Comparisons of self-reported health and RA severity measures across categories of SES were conducted. Statistical comparisons were made using either ANOVA, Student's t-test or χ² test, where appropriate.

For the primary analysis, ordinary least squares (OLS) regression was used to adjust for RA clinical measures and then to assess the relationship between SES and the dependent variables. Each SES variable was modeled separately. All two-way interactions between SES and RA clinical measures were also tested in the multiple regression models. Model fit was assessed using adjusted R², and standardized residuals were plotted against standardized predicted scores to assess each model for homoscedasticity.

To account for the possibility that there is an inter-relationship between the dependent variables (the generic or disease-specific HRQL or functional status score) and annual household income, two-stage least squares (TSLS) regression was used. For TSLS, the "problematic" predictor variable (in our case, income) must be continuous rather than categorical.
Therefore, for the TSLS analysis, the self-reported annual income variable was converted to a measure in increments of $10,000.00 (as reported on the original questionnaire). Instrumental variables are not influenced by others in the model but have influence on the variable of interest. Thus, for the first stage of the regression, the instrumental variables used in our analysis to predict annual household income were marital status, number of people in the household, and educational status, which were all highly correlated with income but weakly correlated with HRQL or functional status measures. In the second stage, the generic HRQL scores (HUI2, HUI3, SF-6D, or EQ-5D), the disease-specific HRQL scores (RAQoL) and the HAQ scores were regressed on the predicted values of income to yield unbiased parameter estimates. These parameters were compared to OLS regression coefficients (from models using the same annual household income variable) to determine how closely they matched.

6.4 RESULTS

Characteristics of the study participants are presented in Table 6.1. The average age of the sample was 61.5 ± 25.9 years, there were more women (79%), and males tended to be older (66 vs. 60 years, p=0.004). The mean number of years since being diagnosed with RA was 13.9 ± 11.4. Fifty-eight (19%) of the 313 participants did not report their annual household income and were therefore excluded from the income analysis at the individual level. To determine if this subgroup was different from the group that reported annual income levels, we compared the values for the variables presented in Table 6.1 between these two groups. There were no differences in these variables between the two subgroups, adding confidence that no bias was introduced through these missing data.

The sample was well distributed across levels of SES (Table 6.2).
Although 50 (16%) had an annual household income below $20,000, 47 (15%) reported incomes over $70,000, 15 of which exceeded $100,000. There was a significant relationship between self-reported annual income and number of household members, with 56% of those reporting income of <20K per year being the only household member compared with 17% and 9% of those reporting annual household incomes of 20-50K and >50K per year, respectively (p<0.0001). The mean number of household members was also significantly lower for those with an annual household income reported as <20K (mean 1.7, standard deviation 1.1) compared with those reporting annual household incomes of >50K per year (mean 2.6, standard deviation 1.4, p<0.0001). With respect to self-reported education, 114 (51%) completed at least one year of post-secondary education (median 1.0) and 52 (17%) received at least a bachelor's degree; 32 (10%) completed at least five years of post-secondary education. The sample was also heterogeneous for the contextual measures (neighbourhood median income, prevalence of having received at least a bachelor's degree, and percent neighbourhood unemployment), with participants residing in neighbourhoods with unemployment rates varying from 1% to 18% (median = 9%) and the prevalence of a bachelor's degree ranging from 2% to 52% (median = 12%).

The self-reported measures of SES (annual income and education) were moderately correlated (Spearman's rho 0.33, p<0.0001). However, the contextual measures tended to be more highly correlated amongst themselves, with correlation coefficients ranging from 0.56 to 0.73 (all p<0.0001). Correlations between self-reported and contextual measures were mostly low, with coefficients ranging from 0.11 to 0.31 (all p<0.05).
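The verbal labels applied to these correlations follow the thresholds set out in Section 6.3.7. A minimal sketch of that classification is given below; the names are hypothetical, and because the stated ranges leave the exact boundary values ambiguous, the sketch assumes that an absolute rho of 0.50 or more is strong and 0.30 or more is moderate.

```c
typedef enum { CORR_WEAK, CORR_MODERATE, CORR_STRONG } corr_strength;

/* Hypothetical helper: classifies a Spearman's rho by its magnitude.
 * Boundary handling (>= 0.50 strong, >= 0.30 moderate) is an assumption;
 * the text does not say which band the exact cut-off values belong to. */
corr_strength classify_rho(double rho)
{
    double magnitude = (rho < 0.0) ? -rho : rho;

    if (magnitude >= 0.50)
        return CORR_STRONG;
    if (magnitude >= 0.30)
        return CORR_MODERATE;
    return CORR_WEAK;
}
```

Under these assumptions, the income-education correlation of 0.33 is classed as moderate, the contextual-measure correlations of 0.56 to 0.73 as strong, and the lower end of the self-reported versus contextual range (0.11) as weak.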
Unadjusted associations between the generic HRQL measures (the SF-6D, the HUI3, the HUI2, and the EQ-5D), or the RAQoL and the HAQ, and demographic, SES and RA severity variables are presented in Tables 6.3, 6.4 and 6.5, respectively. There were no associations between age and any of the generic health related quality of life measures; however, there was a significant positive association with the HAQ disability index (p=0.02). On average, men had significantly better generic and disease-specific quality of life and functional status as measured by most of the instruments. Most of the RA severity variables were significant across all of the HRQL instruments and the HAQ. No associations were found between contextual measures of SES (neighbourhood median income, prevalence of bachelor's degrees, and proportion of neighbourhood unemployed) and any of the generic or disease-specific HRQL measures or the HAQ (Tables 6.3, 6.4 and 6.5), with the exception of median neighbourhood income and the EQ-5D (p=0.02).

For both self-reported income and education levels, comparison of mean values of all measures (SF-6D, HUI3, HUI2, EQ-5D, RAQoL, HAQ) by ANOVA showed a significant gradient across SES categories (Table 6.6). For example, lower levels of income were associated with poorer generic and disease-specific HRQL and physical function. Results were confirmed with nonparametric tests. In general, all measures of health utilization (hospitalization, use of professional services, and use of physical aids/equipment), joint damage, and health status showed a consistent gradient of worse functioning in those in lower self-reported SES categories (self-reported annual family income and self-reported education level) (Table 6.6). There were no differences across SES categories for type or number of DMARDs or the use of prednisone over the past three months.
There were no gradients or associations between any of these variables and the contextual measures of SES. For the proximate (self-reported) SES measures, the differences in these variables were statistically significant for most of the subjective RA severity measures (self-reported RA severity, patient global assessment of disease activity) and for the global scores of both the generic and disease-specific quality of life measures. Of note, most of the physically based clinical measures (such as joint counts) were not significantly different between the different SES levels, suggesting no physical differences in disease severity.

The results of the OLS regressions with adjustment for disease severity measures show significant associations between self-reported income and the HUI3 and SF-6D overall scores and the HAQ (Figures 6.1, 6.2 and 6.3). There were no other significant associations for other measures of SES (self-reported education or contextual measures) after adjustment for disease severity. To account for differences in household size across self-reported annual household income categories, we included number of people in the household in the regression models. Other measures of disease management and severity that were tested but did not improve the overall fit of the model included number and type of DMARDs used within the past three months, number of other chronic diseases, swollen joint count (collinear with tender joint count) and erythrocyte sedimentation rate. Of note, the RAQoL did not significantly differ across self-reported income categories. All differences in the β-estimates between the lowest and highest income categories exceeded the MID for the SF-6D, HUI3, and the HAQ.
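For reference, the two-stage procedure described in Section 6.3.7 can be written schematically as below. The notation is ours and is an assumption rather than the authors' specification: $Y_i$ is one of the outcome scores (HUI2, HUI3, SF-6D, EQ-5D, RAQoL or HAQ), $\mathbf{x}_i$ collects the RA severity covariates used in the OLS models, and the first stage lists only the three instruments named in the text (standard implementations usually also include $\mathbf{x}_i$ in the first stage).

```latex
\begin{align*}
\text{Stage 1:}\quad & \mathrm{Income}_i = \gamma_0 + \gamma_1\,\mathrm{MaritalStatus}_i
      + \gamma_2\,\mathrm{HouseholdSize}_i + \gamma_3\,\mathrm{Education}_i + v_i \\
\text{Stage 2:}\quad & Y_i = \beta_0 + \beta_1\,\widehat{\mathrm{Income}}_i
      + \boldsymbol{\delta}^{\top}\mathbf{x}_i + \varepsilon_i
\end{align*}
```

The coefficient $\beta_1$ from the second stage is the quantity compared with the corresponding OLS income coefficient; close agreement between the two suggests little feedback from the outcome to income.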
The results of the TSLS regression analyses also reveal that self-reported annual income was significantly associated with most of the generic HRQL measures (p=0.03, 0.03 and 0.04 for the HUI2, HUI3 and SF-6D, respectively) and the HAQ (p=0.002), but not with the RAQoL (p=0.07) or the EQ-5D (p=0.14). A comparison of the beta-coefficients between OLS and TSLS regression revealed close agreement for the HUI2 (0.02 vs. 0.04), the HUI3 (0.03 for both), the SF-6D (0.01 for both), the EQ-5D (0.03 vs. 0.04), the HAQ (-0.09 vs. -0.08) and the RAQoL (-0.50 vs. -0.70). Thus, there likely was little, if any, feedback between income and HRQL or functional status in the sample, adding further credence to the utilization of OLS.

6.5 DISCUSSION

This study demonstrates a consistent and significant gradient in both generic and disease-specific HRQL and functional status and other self-reported health measures across income and education categories. This gradient is maintained even after adjustment for RA severity and number of people in the household for self-reported income (but not education) categories. Of note, there was no significant gradient across SES measures for physically defined RA severity measures (disease process and joint damage measures). Thus, these results highlight how SES impacts a well-defined chronic disease such as RA by influencing how patients perceive and report their health status.

These findings become particularly important when one considers that self-rated health predicts mortality even after controlling for a wide range of factors (demographic, psychosocial, prior illness, physician's assessments and physiological measures).23 Thus, from our results, we have determined that low SES predicts poor self-reported health independently of RA severity and may thus be a strong contributing factor to the early mortality and substantial morbidity seen in RA patients with low SES.24,25

Another important finding is that the magnitudes of utility values assessed by both the HUI3 and the SF-6D vary significantly by SES independently of RA severity measures. In addition, although not statistically significant for the HUI2 and the EQ-5D, there was a pattern of higher scores in higher income groups. This finding has potentially important ramifications for results of cost-utility analyses of therapies for RA, as investigators need to ensure balance between treatment groups not only in clinical and disease-specific factors but also in SES. Therefore, in order to avoid potential confounding or bias due to SES in economic evaluation, one would need to: 1) verify SES at baseline in randomized controlled trials (RCTs) where utility measures will be used in a cost-utility analysis; 2) control for SES in observational studies where such measures may be used; and 3) ensure that the results obtained from the sample will be generalizable to the population of interest (i.e. to the extent that studies do not report SES and/or the SES is not similar to the general population).

Sculpher and O'Brien outline some additional concerns with using the results of indirectly assessed utility scores that are influenced by income in cost-utility analysis.26 They state that in cost-benefit analysis, where willingness to pay is often utilized to assign monetary value to benefits, ability to pay biases such data in favour of the more affluent. QALYs, as used as outcomes in cost-utility analyses, are thought to avoid this potential bias. However, the authors argue (and our results suggest) that this may not be the case. Specifically, the authors state that the effects of income could come into play when individuals are asked to value health states to generate utilities for the indirect utility instrument scoring functions or when the instrument is applied in the field. In our study, the latter scenario is applicable.
In this situation, the authors state that there is no reason why income effects should be excluded as these could be a relevant component of illness that may contribute to deficits in health status. However, these income effects could bias cost-utility analysis in at least two different ways: 1) when cross-national comparisons are being made and there are differences between countries in the levels of income maintenance available to the sick; and 2) the possibility of double-counting if income effects of reduced health have already been factored into the valuation stage of the instrument. For example, reduced quality of life that is mediated by loss of income should be counted in the denominator of the cost-effectiveness ratio. However, if one also includes loss of income as an indirect cost in the numerator, then there is a potential to count these effects twice: in the numerator and in the denominator.

In RA, while it is well established that there are associations between low SES and morbidity and mortality, the mechanisms behind these associations are largely unknown. Callahan et al.27 reported that scores on a helplessness scale appeared to mediate a component of the association between formal education level and five-year mortality. In a study attempting to identify a partial explanation for the association between low education and poor outcome in RA, Katz28 identified that self-care was strongly associated with education and thus concluded that low education was a proxy for a constellation of factors responsible for poor health outcomes. Therefore, the differences in self-reported health that we observed on both the generic and disease-specific HRQL and the HAQ scales might be indicative of helplessness or inability to complete self-care tasks in patients with lower SES.

Our results generally support the findings by Brekke et al.29 and McEntegart et al.
30 who showed that self-reported health outcomes, but not objective indices of disease activity, differed across groups based upon SES. Specifically, McEntegart et al.30 revealed how patients living in more deprived areas in Scotland had poorer HAQ scores as compared to those living in more affluent areas. Similarly, Brekke et al., who conducted their study as a comparison of RA patients from affluent west Oslo to those from deprived east Oslo, extended these findings to disease-specific and generic quality of life measures. Both of these analyses used contextual measures of SES. The study by McEntegart et al. utilized the Carstairs index (a composite score using postal code that draws on measures of overcrowding, male unemployment, social class and car ownership) while Brekke et al. utilized neighbourhood factors (such as income, education, employment, mortality, housing standard and proportion of third world citizens) to define the two areas of Oslo as affluent or deprived.

Our findings build on those previously reported by including multiple measures of SES, including those directly reported by the patient as opposed to only neighbourhood-level analyses, and by adding two preference-based, generic HRQL instruments and the RAQoL. Since we collected patient-specific RA drug treatment data, we were able to determine that there were no treatment differences across SES categories that could have influenced self-reported outcomes. Similarly, since all of our subjects were under the care of rheumatologists, any differences in specialist versus non-specialist care that may have been due to SES and could potentially have influenced self-reported outcomes were avoided. In addition, our study is the first to examine whether this relationship holds true in a North American country with universal access to health care.

Of note, we adjusted our model by the number of people living in the household.
While we found that there was a significant difference in number of people per household across self-reported annual income, with higher levels of income reported by those with larger families, this variable was not significantly associated with the self-reported health variables, did not affect the magnitude or significance of the association of annual household income with the dependent variables, and did not significantly improve the multiple linear regression model fit.

Another point of interest that arises from the results of our study was the lack of a consistent gradient (except for the HUI3) across the income categories for the adjusted models of self-reported generic and disease-specific HRQL and functional status (Figures 6.1 and 6.2). For example, with the SF-6D, it appears that the biggest difference across income categories is between the middle and highest groups rather than between the lowest and highest groups. These results bring up the possibility that there may not be a perfect gradient across the three separate income categories and that it may be a dichotomous phenomenon (i.e. high versus low income) with an annual household income cut-off of approximately $50,000 defining the two groups. Another possible explanation is that there is another factor that is somehow influencing the self-reported health outcomes for the middle income category, making it lower than both the high and low income categories.

Of interest, in our study, there was a low correlation between the proximate (self-reported) and contextual measures of SES. With both annual household income and education, there were strong univariate associations with self-reported health. However, once adjustments for RA severity were made, only self-reported annual income remained significant.
A possible explanation for the lack of association between the self-reported HRQL measures and education and contextual SES measures is that they may not be indicative of SES in elderly populations such as those with RA. Our sample was composed mostly of subjects who had worked in an era when there was less emphasis on education. Therefore, in contrast to a younger, employed sample of asthmatics from the same geographical area,8 where education and income were highly correlated, results from our sample revealed that these two variables were less correlated. Similarly, in the aforementioned asthma sample, there were strong correlations between contextual and proximate measures of SES that were not observed in our RA sample, indicating that these measures of SES may be more robust for younger participants who are more likely to be currently employed. Another finding from our sample that supports this premise is that older individuals (>50 years of age) who were still in the work force tended to have less education but similar income to those working individuals less than 50 years old.

While there were significant gradients across SES (as defined by annual household income) for both of the generic HRQL measures and the HAQ after adjustment for RA severity, similar findings were not observed for the RAQoL. Despite significant univariate gradients across SES as defined by annual household income and education, the RAQoL did not display a clear SES gradient in the multiple linear regression analysis. We postulate that the reason for this is that the RAQoL is capturing items that are so germane to RA that the variance in its score is explained mostly by the objective and subjective disease severity measures. Indeed, the addition of annual household income had a negligible impact on the model R² in the multiple linear regression analysis of the RAQoL, whereas it improved the model fit in all the other analyses.
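The comparison of model fit with and without annual household income can be illustrated with a minimal sketch. It is not the analysis performed in this study: the data are synthetic, the variable names (`severity`, `income`, `score`) are hypothetical, and the outcome is deliberately generated from the severity covariates alone, so that adding income dummies changes R² very little, which is the pattern described above for the RAQoL.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (intercept added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 300
severity = rng.normal(size=(n, 3))      # hypothetical RA severity covariates
income = rng.integers(0, 3, size=n)     # hypothetical income category: 0, 1 or 2
# Dummy-code income with category 0 as the reference level.
income_dummies = np.column_stack([income == 1, income == 2]).astype(float)

# Synthetic outcome driven by severity only (illustrative, not study data).
score = severity @ np.array([3.0, 2.0, 1.0]) + rng.normal(scale=2.0, size=n)

r2_base = r_squared(severity, score)
r2_full = r_squared(np.column_stack([severity, income_dummies]), score)
delta_r2 = r2_full - r2_base
print(f"R2 without income: {r2_base:.3f}, with income: {r2_full:.3f}, "
      f"change: {delta_r2:.4f}")
```

Because the two models are nested, R² cannot decrease when income is added; the quantity of interest is how small the increase is relative to the fit already achieved by the severity measures.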
Finally, it can be argued that the results using OLS regression only reveal an association between self-reported annual household income and HRQL or functional status without the ascertainment of directionality (i.e. is it the low income that is causing the low HRQL/functional status or vice versa?). We utilized TSLS regression to account for this and found no evidence to support that the low income was "caused" by the low HRQL or functional status (i.e. the beta coefficients achieved by OLS were not biased). This finding likely makes sense in our sample since most participants were elderly and retired and their current annual household income was likely not influenced by their current HRQL or functional status.

Our study shows that even in a country such as Canada with universal access to health care, the impact of RA on self-reported health is strongly associated with SES as measured by annual income even after adjusting for disease severity. Because self-reported health has been strongly associated with mortality and morbidity, there are important implications for intervention. In addition, these findings should be considered in the context of cost-utility analysis to prevent biasing of utility values obtained from preference-based instruments. In the event that studies do not investigate or report SES or if the SES in the study sample differs significantly from the population of interest, the results of the analysis may have poor generalizability. Further research should focus on the mediating factors that contribute to this social gradient in self-reported health outcomes in RA.

6.6 REFERENCES

1. Idler EL, Benyamini Y. Self-rated health and mortality: A review of twenty-seven community studies. J Health Social Behaviour 1997;38:21-37.
2. Idler EL, Kasl S. Health perceptions and survival: Do global evaluations of health status really predict mortality? J Gerontol 1991;46:S55-S65.
3. Pincus T, Sokka T.
Quantitative measures for assessing rheumatoid arthritis in clinical trials and clinical care. Best Pract Res Clin Rheumatol 2003;17:753-781.
4. Wolfe F, Michaud K, Gefeller O, Choi HK. Predicting mortality in patients with rheumatoid arthritis. Arthritis Rheum 2003;48:1530-1542.
5. Franks P, Gold MR, Fiscella K. Sociodemographics, self-rated health, and mortality in the US. Soc Sci Med 2003;56:2505-2514.
6. Alter DA, Naylor CD, Austin PC, Chan BT, Tu JV. Geography and service supply do not explain socio-economic gradients in angiography use after acute myocardial infarction. Can Med Assoc J 2003;168:261-264.
7. Hawker GA, Wright JG, Glazier RH, Coyte PC, Harvey B, et al. The effect of education and income on need and willingness to undergo total joint arthroplasty. Arthritis Rheum 2002;46:3331-3339.
8. Lynd LD, Pare PD, Bai T, Fitzgerald JM, Anis AH. A cross-sectional evaluation of the relationship between socioeconomic status and the magnitude of short-acting beta-agonist use in asthma. Chest (accepted December 2003).
9. Wood E, Montaner JS, Chan K, Tyndall MW, Schechter MT, et al. Socioeconomic status, access to triple therapy, and survival from HIV-disease since 1996. AIDS 2002;16:2065-2072.
10. Marra CA, Woolcott JC, Shojania K, Offer R, Kopec J, Brazier JE, Esdaile JM, Anis AH. A comparison of generic, indirect utility measures (the HUI2, HUI3, SF-6D, and the EQ-5D) and disease-specific instruments (the RAQoL and the HAQ) in rheumatoid arthritis. Soc Sci Med (submitted).
11. Marra CA, Esdaile JM, Guh D, Kopec JA, Brazier JE, Chalmers A, Koehler B, Anis AH. A comparison of four indirect methods of assessing utility values in rheumatoid arthritis. Med Care (submitted).
12. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271-292.
13. Walters SJ, Brazier JE.
What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health Qual Life Outcomes 2003;11:4-12.
14. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, et al. Multiattribute and single-attribute utility functions for the Health Utilities Index Mark 3 system. Med Care 2002;40:113-128.
15. Drummond MF, O'Brien B, Stoddart GL, Torrance GW (eds.). Methods for the economic evaluation of health care programmes. 2nd edition. Oxford Medical Publications, Oxford. 1997.
16. Bruce B, Fries JF. The Stanford Health Assessment Questionnaire: a review of its history, issues, progress, and documentation. J Rheumatol 2003;30:167-178.
17. Redelmeier DA, Lorig K. Assessing the clinical importance of symptomatic improvements: an illustration in rheumatology. Arch Intern Med 1993;153:1337-1342.
18. Wells GA, Tugwell P, Kraag GR, Baker PR, Groh J, Redelmeier DA. Minimum important difference between patients with rheumatoid arthritis: the patient's perspective. J Rheumatol 1993;20:557-560.
19. De Jong Z, Van Der Heijde, Mckenna SP, Whalley D. The reliability and construct validity of the RAQoL: A rheumatoid arthritis-specific quality of life instrument. Br J Rheumatol 1997;36:878-883.
20. Wong AL, Wong WK, Harker J, Sterz M, Bulpitt K, Park G, Ramos B, Clements P, Paulus H. Patient self-report tender and swollen joint counts in early rheumatoid arthritis. Western Consortium of Practicing Rheumatologists. J Rheumatol 1999;26:2551-2561.
21. BC Stats. 1996 Census of Canada. Victoria, BC: Ministry of Finance and Corporate Relations, Government of BC; March 9, 1998.
22. Cohen J. A power primer. Psychol Bull 1992;112:155-159.
23. Idler EL, Angel RJ. Self-rated health and mortality in the NHANES-I epidemiologic follow-up study. J Pub Health 1990;80:446-452.
24. Maiden N, Capell HA, Madhok R, Hampson R, Thomson EA.
Does social disadvantage contribute to the excess mortality in rheumatoid arthritis patients? Ann Rheum Dis 1999;58:525-529.
25. Pincus T, Callahan LF, Sale WG, Brooks AL, Payne LE, et al. Severe functional declines, work disability, and increased mortality in seventy-five rheumatoid arthritis patients studied over nine years. Arthritis Rheum 1984;27:864-872.
26. Sculpher MJ, O'Brien BJ. Income effects of reduced health and health effects of reduced income. Med Decis Making 2000;20:207-215.
27. Callahan LF, Cordray DS, Wells G, Pincus T. Formal education and five-year mortality in rheumatoid arthritis: Mediation by helplessness scale scores. Arthritis Care Res 1996;9:463-472.
28. Katz PP. Education and self-care activities among persons with rheumatoid arthritis. Soc Sci Med 1998;46:1057-1066.
29. Brekke M, Hjortdahl P, Thelle DS, Kvien TK. Disease activity and severity in patients with rheumatoid arthritis: relations to socio-economic equality. Soc Sci Med 1999;48:1743-1750.
30. McEntegart A, Morrison E, Capell HA, Duncan MR, Porter D, et al. Effect of social deprivation on disease-severity and outcome in patients with rheumatoid arthritis. Ann Rheum Dis 1997;56:410-413.
TABLE 6.1: CHARACTERISTICS OF THE STUDY PARTICIPANTS (N = 313)

Parameter | Range | Mean | SD
Age (yrs) | | 61.5 | 25.9
RA Duration (yrs) | | 13.87 | 11.41
Pain VAS (mm) | 0 to 100 | 43.12 | 27
Tender Joint Count | 0 to 50 | 15.09 | 12
Swollen Joint Count | 0 to 50 | 9.14 | 9.67
Erythrocyte Sedimentation Rate (mm/hr) | | 24.71 | 21.01
HAQ Disability Index | 0 to 3.0 | 1.1 | 0.77
RAQoL Score | 0 to 28 | 12.73 | 8.48
HUI2 Global Utility Score | -0.03 to 1.00 | 0.71 | 0.19
HUI3 Global Utility Score | -0.36 to 1.00 | 0.53 | 0.29
EQ-5D Global Utility Score | -0.59 to 1.00 | 0.67 | 0.24
SF-6D Global Utility Score | 0.31 to 1.00 | 0.63 | 0.13

Parameter | N | %
Self-Reported RA Severity | |
  Very Mild | 9 | 3%
  Mild | 34 | 11%
  Moderate | 120 | 38%
  Severe | 110 | 35%
  Very Severe | 27 | 9%
Self-Reported RA Control | |
  Very Well Controlled | 33 | 11%
  Well Controlled | 76 | 24%
  Adequately Controlled | 123 | 39%
  Not Well Controlled | 61 | 19%
  Not Controlled At All | 7 | 2%
Hospitalized For RA in Last 12 Months | 45 | 15%
Missed Work or School Due to RA in Last 12 Months | 59 | 19%
Purchased or Rented Equipment for RA in Last 12 Months | 72 | 23%
Used Allied Health Professional/Home Care Services in Last 12 Months | 129 | 42%
Concomitant Chronic Illness Other Than RA | 192 | 62%

TABLE 6.2: PROPERTIES OF THE MEASURES OF SOCIOECONOMIC STATUS (SES) IN OUR SAMPLE

Contextual SES Measures | Mean | SD
Neighbourhood Median Income | 20040 | 4313
Bachelor's Education (%) | 15.2 | 10.2
Neighbourhood Unemployment Rate (%) | 8.7 | 2.6

Self-Reported SES Measures (Proximate) | Number | %
Education Completed | |
  Less than High School/Trade | 75 | 24%
  High School/Trade | 169 | 54%
  Bachelor's | 52 | 17%
  Missing | 17 | 5%
Annual Household Income | |
  <$20,000 | 50 | 16%
  $20,000 to $50,000 | 115 | 37%
  >$50,000 | 90 | 28%
  Missing | 58 | 19%

TABLE 6.3: UNIVARIATE ASSOCIATIONS WITH THE GENERIC HRQL MEASURES (THE SF-6D AND THE HUI3)

Factor | SF-6D Global Utility Score: Regression Coefficient (SE) | p-value | HUI3 Global Utility Score: Regression Coefficient (SE) | p-value
DEMOGRAPHICS | | | |
Age | -0.0002 (0.0006) | NS | -0.002 (0.001) | NS
Gender (Female is reference) | 0.05 (0.02) | <0.0001 | 0.08 (0.04) | 0.04
RA SEVERITY VARIABLES | | | |
Years since diagnosis | -0.002 (0.0006) | 0.001 | -0.006 (0.001) | 0.0001
No. of other chronic diseases | -0.02 (0.005) | 0.006 | -0.03 (0.01) | 0.004
Erythrocyte sedimentation rate | -0.002 (0.0006) | 0.0003 | -0.003 (0.001) | 0.0007
Tender joint count | -0.006 (0.0005) | <0.0001 | -0.01 (0.001) | <0.0001
Swollen joint count | -0.006 (0.0007) | <0.0001 | -0.01 (0.002) | <0.0001
Global pain VAS | -0.003 (0.0002) | <0.0001 | -0.006 (0.0005) | <0.0001
Patient global assessment VAS | 0.003 (0.0002) | <0.0001 | 0.007 (0.0005) | <0.0001
HAQ disability index score | -0.12 (0.08) | <0.0001 | -0.29 (0.01) | <0.0001
Hospitalization in last year* | -0.03 (0.02) | NS | -0.10 (0.05) | 0.03
Home/Health services for RA* | -0.05 (0.02) | 0.0009 | -0.15 (0.03) | <0.0001
Purchase/rent RA equipment* | -0.05 (0.02) | 0.005 | -0.19 (0.04) | <0.0001
Missed work/school in last year* | -0.07 (0.02) | 0.0004 | -0.14 (0.05) | 0.004
RA self-reported severity | | <0.0001 | | <0.0001
  Very mild | 0.33 (0.04) | <0.0001 | 0.51 (0.10) | <0.0001
  Mild | 0.24 (0.03) | <0.0001 | 0.47 (0.08) | <0.0001
  Moderate | 0.15 (0.02) | <0.0001 | 0.31 (0.05) | <0.0001
  Severe | 0.08 (0.02) | 0.0003 | 0.15 (0.06) | 0.0008
  Very severe | Ref | | Ref |
RA self-reported control | | <0.0001 | |
  Very well controlled | 0.28 (0.05) | 0.0001 | 0.74 (0.10) | <0.0001
  Well controlled | 0.24 (0.04) | 0.0001 | 0.66 (0.10) | <0.0001
  Adequately controlled | 0.17 (0.04) | 0.0001 | 0.54 (0.09) | <0.0001
  Not well controlled | 0.07 (0.04) | 0.12 | 0.31 (0.10) | 0.002
  Not controlled at all | Ref | | Ref |
PROXIMATE SES FACTORS | | | |
Education completed | | <0.0001 | | NS
  None | -0.06 (0.02) | 0.01 | -0.11 (0.05) | 0.04
  High school/Trade | -0.04 (0.02) | 0.03 | -0.06 (0.05) | 0.17
  Bachelors education | Ref | | Ref |
Yrs. post-secondary education | 0.005 (0.003) | NS | |
Annual household income | | <0.0001 | | 0.0003
  < $20,000 | -0.09 (0.02) | <0.0001 | -0.21 (0.05) | <0.0001
  $20,000 - $50,000 | -0.06 (0.02) | <0.0001 | -0.10 (0.04) | 0.01
  > $50,000 | Ref | | Ref |
CONTEXTUAL SES FACTORS | | | |
Median neighbourhood income† | 0.017 (0.018) | NS | 0.04 (0.04) | NS
% Bachelors education | 0.0002 (0.0008) | NS | 0.0006 (0.002) | NS
Neighbourhood unemployment | -0.0005 (0.002) | NS | -0.006 (0.005) | NS

* Reference category is "no"
† For categories of $10,000

TABLE 6.4: UNIVARIATE ASSOCIATIONS WITH THE GENERIC HRQL MEASURES (THE HUI2 AND THE EQ-5D)

Factor | HUI2 Global Utility Score: Regression Coefficient (SE) | p-value | EQ-5D Global Utility Score: Regression Coefficient (SE) | p-value
DEMOGRAPHICS | | | |
Age | -0.01 (0.0009) | NS | 0.0008 (0.001) | NS
Gender (Female is reference) | 0.04 (0.03) | NS | 0.05 (0.03) | NS
RA SEVERITY VARIABLES | | | |
Years since diagnosis | 0.002 (0.001) | 0.04 | -0.003 (0.001) | 0.03
No. of other chronic diseases | -0.03 (0.008) | 0.0008 | -0.02 (0.01) | 0.09
Tender joint count | -0.007 (0.0008) | <0.0001 | -0.008 (0.001) | <0.0001
Erythrocyte sedimentation rate | -0.002 (0.0007) | 0.01 | -0.002 (0.0008) | 0.03
Swollen joint count | -0.008 (0.001) | <0.0001 | -0.01 (0.001) | <0.0001
Global pain VAS | -0.004 (0.0003) | <0.0001 | -0.005 (0.0004) | <0.0001
Patient global assessment VAS | 0.004 (0.0003) | <0.0001 | 0.005 (0.0004) | <0.0001
Hospitalization in last year* | -0.077 (0.032) | 0.02 | -0.05 (0.04) | 0.18
Home/Health services for RA* | -0.09 (0.02) | <0.0001 | -0.11 (0.03) | <0.0001
Purchase/rent RA equipment* | -0.12 (0.03) | <0.0001 | -0.12 (0.03) | 0.0003
Missed work/school in last year* | 0.08 (0.03) | 0.009 | 0.13 (0.04) | 0.002
RA self-reported severity | | <0.0001 | | <0.0001
  Very mild | 0.31 (0.03) | <0.0001 | 0.41 (0.08) | <0.0001
  Mild | 0.27 (0.07) | <0.0001 | 0.36 (0.05) | <0.0001
  Moderate | 0.18 (0.04) | <0.0001 | 0.24 (0.04) | <0.0001
  Severe | 0.05 (0.04) | NS | 0.12 (0.04) | 0.02
  Very severe | Ref | | Ref |
RA self-reported control | | <0.0001 | | <0.0001
  Very well controlled | 0.40 (0.07) | <0.0001 | 0.56 (0.08) | <0.0001
  Well controlled | 0.37 (0.07) | <0.0001 | 0.52 (0.08) | <0.0001
  Adequately controlled | 0.31 (0.06) | <0.0001 | 0.42 (0.08) | <0.0001
  Not well controlled | 0.13 (0.07) | 0.05 | 0.22 (0.08) | 0.006
  Not controlled at all | Ref | | Ref |
PROXIMATE SES FACTORS | | | |
Education completed | | NS | | 0.02
  None | -0.07 (0.04) | 0.04 | -0.11 (0.04) | 0.01
  High school/trade | -0.05 (0.03) | NS | -0.02 (0.04) | NS
  Bachelors education | Ref | | Ref |
Yrs. post-secondary education | 0.006 (0.005) | NS | 0.002 (0.006) | NS
Annual household income | | 0.002 | | 0.001
  < $20,000 | -0.11 (0.03) | 0.0006 | -0.15 (0.04) | 0.0004
  $20,000 - $50,000 | -0.07 (0.03) | 0.02 | -0.09 (0.03) | 0.009
  > $50,000 | Ref | | Ref |
CONTEXTUAL SES FACTORS | | | |
Median neighbourhood income† | 0.000002 (0.000003) | NS | 0.000008 (0.000004) | 0.02
% bachelors education | 0.0001 (0.001) | NS | 0.001 (0.001) | NS
Neighbourhood unemployment | -0.003 (0.004) | NS | -0.008 (0.005) | 0.08

* Reference category is "no"
† For categories of $10,000

TABLE 6.5: UNIVARIATE ANALYSIS WITH THE DISEASE-SPECIFIC MEASURES (THE HAQ AND THE RAQOL)

Factor | HAQ Score: Regression Coefficient (SE) | p-value | RAQoL Score: Regression Coefficient (SE) | p-value
DEMOGRAPHICS | | | |
Age | 0.007 (0.004) | 0.03 | -0.032 (0.05) | NS
Gender (Female is reference) | -0.35 (0.11) | 0.001 | -3.57 (1.13) | 0.002
RA SEVERITY VARIABLES | | | |
Years since diagnosis | 0.02 (0.003) | <0.0001 | 0.15 (0.05) | 0.002
No. of other chronic diseases | 0.10 (0.03) | 0.002 | 1.02 (0.33) | 0.002
Tender joint count | 0.03 (0.003) | <0.0001 | 0.38 (0.03) | 0.0001
Erythrocyte sedimentation rate | 0.008 (0.003) | 0.007 | 0.08 (0.03) | 0.004
Swollen joint count | 0.03 (0.004) | <0.0001 | 0.42 (0.04) | 0.0001
Global pain VAS | 0.02 (0.001) | <0.0001 | 0.19 (0.01) | 0.0001
Patient global assessment VAS | -0.01 (0.001) | <0.0001 | -0.19 (0.01) | 0.0001
Hospitalization in last year* | 0.33 (0.13) | 0.008 | 2.64 (1.32) | 0.05
Home/Health services for RA* | 0.43 (0.09) | <0.0001 | 4.02 (0.93) | 0.0001
Purchase/rent RA equipment* | 0.48 (0.10) | <0.0001 | 4.76 (1.11) | 0.0001
Missed work/school in last year* | 0.42 (0.12) | 0.0005 | 6.28 (1.27) | 0.0001
RA self-reported severity | | <0.0001 | | 0.0001
  Very mild | -1.40 (0.26) | <0.0001 | -15.73 (2.69) | 0.0001
  Mild | -1.26 (0.17) | <0.0001 | -13.81 (1.80) | 0.0001
  Moderate | -0.68 (0.14) | <0.0001 | -7.97 (1.49) | 0.0001
  Severe | -0.31 (0.15) | 0.03 | -2.34 (1.02) | 0.12
  Very severe | Ref | | Ref |
RA self-reported control | | <0.0001 | | 0.0001
  Very well controlled | -1.39 (0.28) | <0.0001 | -17.85 (2.75) | 0.0001
  Well controlled | -1.25 (0.27) | <0.0001 | -16.01 (2.60) | 0.0001
  Adequately controlled | -0.94 (0.26) | <0.0001 | -10.98 (2.56) | 0.0001
  Not well controlled | -0.40 (0.27) | 0.13 | -3.98 (2.63) | 0.15
  Not controlled at all | Ref | | Ref |
PROXIMATE SES FACTORS | | | |
Education completed | | 0.05 | | NS
  None | 0.30 (0.13) | 0.02 | 3.10 (1.45) | 0.03
  High school/trade | 0.26 (0.12) | 0.03 | 1.88 (1.27) | 0.14
  Bachelors education | Ref | | Ref |
Yrs. post-secondary education | -0.03 (0.02) | NS | -0.10 (0.22) | NS
Annual household income | | 0.0001 | | 0.03
  < $20,000 | 0.61 (0.13) | 0.0001 | 3.43 (1.48) | 0.02
  $20,000 - $50,000 | 0.50 (0.10) | 0.0001 | 2.69 (1.18) | 0.02
  > $50,000 | Ref | | Ref |
CONTEXTUAL SES FACTORS | | | |
Median neighbourhood income† | 0.1 (0.1) | NS | -0.0001 (0.001) | NS
% bachelors education | 0.0004 (0.002) | NS | -0.02 (0.05) | NS
Neighbourhood unemployment | -0.0005 (0.002) | NS | 0.024 (0.15) | NS

* Reference category is "no"
† For categories of $10,000

TABLE 6.6: COMPARISON OF RA AND HEALTH STATUS MEASURES ACROSS DIFFERENT SOCIAL CLASSES

Columns 2 to 5: Self-Reported Annual Family Income. Columns 6 to 9: Self-Reported Education Level (<HS, HS or T, BAC).

Parameter | <20K (n=49) | 20-50K (n=113) | >50K (n=88) | p-value | <HS (n=77) | HS or T (n=163) | BAC (n=51) | p-value
RA Duration (years) | 16.7 (13.7) | 13.1 (9.7) | 13.0 (11.7) | 0.14 | 16.7 (12.4) | 13.0 (10.6) | 13.3 (12.5) | 0.07
Number of people per household | [illegible in source] | 2.1 (0.8) | 2.6 (1.4) | <0.0001 | 2.2 (1.1) | 2.1 (0.9) | 2.5 (1.6) | 0.12
Erythrocyte sedimentation rate | 26.5 (24.2) | 25.9 (18.6) | 21.2 (17.1) | 0.32 | 24.4 (19.2) | 27.2 (23.6) | 20.1 (15.5) | 0.27
Number of swollen joints (0-54) | 9.6 (10.7) | 9.2 (9.3) | 8.1 (9.4) | 0.59 | 11.5 (9.9) | 8.5 (9.0) | 7.7 (9.2) | 0.03
Number of tender joints (0-54) | 15.4 (13.3) | 15.5 (11.1) | 13.2 (11.8) | 0.33 | 18.3 (13.5) | 14.3 (10.6) | 13.7 (12.6) | 0.03
Self-reported RA severity (1-5) | 3.8 (0.8) | 3.4 (0.9) | 3.2 (1.0) | 0.003 | 3.7 (0.8) | 3.4 (0.87) | 3.1 (1.0) | 0.1
Self-reported RA control (1-5) | 2.9 (1.1) | 2.7 (1.0) | 2.6 (1.0) | 0.38 | 2.9 (1.1) | 2.7 (0.98) | 2.8 (1.1) | 0.79
Disease activity - patients' global assessment (0-100) | 53.6 (26.4) | 58.8 (25.2) | 68.0 (25.2) | 0.004 | 54.2 (25.3) | 62.2 (25.9) | 64.7 (23.9) | 0.03
Hospitalization in last year (%)* | 2.5 | 6.2 | 5.8 | 0.89 | 2.1 | 9.9 | 2.8 | 0.17
Home/Health services for RA (%)* | 8.9 | 17.5 | 15.9 | 0.66 | 9.0 | 25.4 | 8.0 | 0.32
Purchase/rent RA equipment (%)* | 6.4 | 15.2 | 9.3 | 0.51 | 3.4 | 20.9 | 4.6 | 0.006
Missed work/school in last year (%)* | 4.0 | 7.2 | 8.4 | 0.37 | 2.4 | 12.3 | 4.1 | 0.04
Pain VAS (0-100) | 46.5 (29.0) | 45.1 (26.7) | 37.5 (27.8) | 0.09 | 47.6 (27.7) | 41.9 (26.9) | 43.3 (27.6) | 0.32
RAQoL Score (0-30) | 14.5 (8.4) | 13.2 (8.2) | 10.9 (8.6) | 0.04 | 14.6 (8.9) | 12.4 (7.9) | 10.8 (7.7) | 0.03
HAQ Disability Index (0.0-3.0) | 1.4 (0.7) | 1.3 (0.8) | 0.8 (0.7) | <0.0001 | 1.2 (0.8) | 1.1 (0.8) | 0.85 (0.7) | 0.02
SF-6D Global Score (0.30-1.00) | 0.58 (0.13) | 0.61 (0.13) | 0.67 (0.13) | 0.0002 | 0.60 (0.13) | 0.63 (0.12) | 0.67 (0.15) | 0.03
HUI3 Global Score (-0.36-1.00) | 0.41 (0.31) | 0.52 (0.28) | 0.63 (0.29) | 0.0002 | 0.48 (0.28) | 0.54 (0.29) | 0.61 (0.28) | 0.05
HUI2 Global Score (-0.03-1.00) | 0.64 (0.21) | 0.69 (0.18) | 0.76 (0.18) | 0.002 | 0.68 (0.20) | 0.71 (0.20) | 0.75 (0.16) | 0.11
EQ-5D Global Score (-0.59-1.00) | 0.58 (0.28) | 0.65 (0.23) | 0.73 (0.20) | 0.001 | 0.60 (0.27) | 0.68 (0.22) | 0.71 (0.23) | 0.02

FIGURE 6.1: GENERIC HRQL BY SELF-REPORTED ANNUAL INCOME (HUI3 AND SF-6D)
[Two panels; x-axis: Self-Reported Annual Income (<20K, 20-50K, >50K).]
HUI3: p = 0.05 (type III sums of squares); Model R² = 0.66
SF-6D: p = 0.02 (type III sums of squares); Model R² = 0.71
The points indicate the least squares means and the lines indicate their 95% confidence intervals. HRQL measures were adjusted for RA duration, pain VAS, self-reported RA control and severity, tender joint count, RAQoL score, and number of people in the household. Higher scores indicate better HRQL.

FIGURE 6.2: GENERIC HRQL BY SELF-REPORTED ANNUAL INCOME (HUI2 AND EQ-5D)
[Two panels; x-axis: Self-Reported Annual Income (<20K, 20-50K, >50K).]
HUI2: p = 0.45 (type III sums of squares); Model R² = 0.61
EQ-5D: p = 0.20 (type III sums of squares); Model R² = 0.63
The points indicate the least squares means and the lines indicate their 95% confidence intervals. HRQL measures were adjusted for RA duration, pain VAS, self-reported RA control and severity, tender joint count, RAQoL score, and number of people in the household. Higher scores indicate better HRQL.

FIGURE 6.3: RAQOL SCORE AND HAQ DISABILITY INDEX BY SELF-REPORTED INCOME
[Two panels; x-axis: Self-Reported Annual Income (<20K, 20-50K, >50K).]
RAQoL Score: p = 0.85 (type III sums of squares); Model R² = 0.62
HAQ Score: p = 0.0003 (type III sums of squares); Model R² = 0.51
The points indicate the least squares means and the lines indicate their 95% confidence intervals. HRQL measures were adjusted for RA duration, pain VAS, self-reported RA control and severity, tender joint count, RAQoL score, and number of people in the household. Higher scores indicate better HRQL.

CHAPTER 7

ARE INDIRECT UTILITY MEASURES RELIABLE AND RESPONSIVE IN RHEUMATOID ARTHRITIS PATIENTS?

7.1 FOREWORD

This manuscript is currently under review under the same title in Quality of Life Research. The candidate is the first author of the manuscript, which is co-authored by Daphne Guh, who provided statistical support; Dr. Amir Adel Rashidi, who assisted with data entry, database construction, and data manipulation; Dr. Jacek Kopec, a member of the candidate's committee who supplied analytical advice; Dr.
Michal Abrahamowicz, who invented the polytomous regression techniques for responsiveness and instructed the candidate in their application; and Dr. John Brazier, who developed the SF-6D and provided methodological advice. Drs. Aslam Anis and John Esdaile, co-supervisors of the candidate, were also co-authors on the manuscript. The candidate's role in the manuscript involved the conception of the research question, development of the primary hypothesis and methodology, all statistical analyses, and the writing of the final manuscript.

7.2 INTRODUCTION

Improvement in health-related quality of life (HRQL) is considered to be one of the most important goals in the management of rheumatoid arthritis (RA).1 As such, HRQL and health status measures have often been used as outcomes in clinical trials and studies assessing a variety of interventions in RA.2-5 A variety of instruments that assess RA-specific HRQL (for example, the Arthritis Impact Measurement Scales (AIMS) and the Rheumatoid Arthritis Quality of Life questionnaire (RAQoL)) or generic HRQL or function (such as the Short Form 36 (SF-36)) have been applied to the assessment of RA.2,6,7

Preference-based or indirect utility measures are generic HRQL measures that are often used in clinical and observational studies, as the scores that they generate can be utilized to calculate quality adjusted life-years (QALYs) and can thus be integrated into cost-utility analyses.8 Examples of these instruments include the Health Utilities Index Mark 2 (HUI2) and Mark 3 (HUI3), the EuroQol (EQ-5D), and the Short Form 6D (SF-6D). All of these instruments have been previously applied in the assessment of patients with RA.9-11

Responsiveness is often defined as the ability of an instrument to measure change; however, there are multiple definitions of responsiveness that exist in the literature.
These definitions can be divided into three categories:13 1) the ability of an instrument to detect change in general (also referred to as "sensitivity" by Liang et al.14); 2) the ability of an instrument to detect clinically important change; and 3) the ability of an instrument to detect real changes in the concept being measured.

There has been little work in the evaluation and comparison of responsiveness (using any definition) of the indirect utility instruments. A recent study by Conner-Spady et al.11 examined the responsiveness of three preference-based measures of HRQL (the EQ-5D, HUI3, and SF-6D) in a sample of patients with at least one of several types of rheumatological conditions. To our knowledge, there have been no evaluations of the responsiveness of the RAQoL in RA in North American populations, although one has been published in a Swedish sample.7 Therefore, there remains a need for more research to assess the responsiveness of these measures, to compare their characteristics, and to determine how their properties compare to disease-specific measures. Finally, since the indirect utility measures are often used as the source of weightings used for QALYs in cost-utility studies in RA, it is important that they are determined to be reliable, valid and responsive in this disease state.

Therefore, the primary purpose of this study was to examine the reliability and responsiveness of the indirect utility instruments, the RAQoL and the HAQ from baseline to six months in a sample of rheumatoid arthritis patients. A secondary purpose was to examine the reliability and validity of using a patient-completed transition question as the external criterion to assess responsiveness in RA.
7.3 METHODS

7.3.1 Study Sample

To be included, subjects had to have a rheumatologist-confirmed diagnosis of RA (as defined by the American College of Rheumatology diagnostic criteria),15 receive rheumatology care within the province of British Columbia, consent to answer the questionnaires, be sufficiently proficient in English to answer the questionnaires, and be willing to participate in follow-up surveys. Recruitment of RA patients began in October 2001 and ended in September 2002. Ethical approval for this study was obtained through the University of British Columbia's Behavioural Ethics Committee and informed consent was obtained from each of the participants.

Eight private rheumatologists' offices from the study areas referred subjects into the sample during their interactions in routine clinical practice. In addition, two of the eight rheumatologists' practices sent letters to all of their patients with RA inviting them to participate in the survey. All patient questionnaires were self-administered, self-completed and submitted via mail. The study rheumatologists' offices supplied additional information from the patients' health record.

7.3.2 Measures

Participants were asked to complete a questionnaire at baseline and three and six months thereafter. The questionnaire consisted of sections devoted to socio-economic, clinical and functional status and quality of life assessment instruments.

7.3.2.1 Clinical

Participants were asked questions regarding their RA and medication history including adverse reactions over the past three months. Other self-reported clinical variables included swollen joint count (SJC) and tender joint count (TJC) (using the mannequin-based 42 joint count methodology),16 a 10 cm pain visual analogue scale (VAS), a patient global assessment of disease activity (10 cm VAS),1 and RA severity and RA control (both using a 5 point Likert scale).
Erythrocyte sedimentation rate (ESR) values closest to the date of completion of the questionnaire (within 1 month) were extracted from the patient's chart for those patients whose rheumatologist used this measure for patient monitoring. In addition, the attending rheumatologists were asked to complete a physician global assessment of disease activity (10 cm VAS) for each patient [1].

In addition, for the six-month questionnaire, participants were asked to complete a five point Likert scale that assessed changes in their RA since answering the baseline questionnaire. The question asked was "Overall, how would you describe changes in your rheumatoid arthritis since answering the FIRST questionnaire (i.e. about 6 months ago)?". Response choices included "Much Worse", "Somewhat Worse", "The Same", "Somewhat Better" and "Much Better". These questions are referred to as "patient transition questions" for the remainder of the manuscript. To increase the number of patients in each category, responses to these questions were collapsed into three categories as follows: (1) worse (included responses "much worse" and "somewhat worse"); (2) the same; and (3) better (included "much better" and "somewhat better"), an approach similar to that adopted by other investigators [9,12,14].

The sample of RA patients in our study experienced "natural" courses of their disease over time rather than exposure to a treatment of known efficacy administered in a randomized design. In group level analyses, average change scores can mask the proportion of patients with follow-up scores that differ from those at baseline. Because of this, we carried out separate analyses for each of the distribution-based responsiveness measures according to our collapsed transition question criteria ("worse", "the same", or "better"). This is the same approach used by other investigators [11,17].
7.3.2.2 Health Status and HRQL Measures

7.3.2.2.1 Health Assessment Questionnaire (HAQ) Disability Index

The HAQ is a measure of physical disability that assesses ability to complete everyday tasks in areas such as dressing and grooming, rising, eating, walking, personal hygiene, reach, grip and other activities (such as getting into and out of a car). Each of these areas is assigned a section score that is further adjusted to account for the use of any aids, devices or help from another person. Section scores are then summed and averaged to give an overall score ranging from 0.0 (best possible function) to 3.0 (worst function). A HAQ score difference of 0.25 is said to represent the minimally important difference (MID) [18,19].

7.3.2.2.2 Rheumatoid Arthritis Quality of Life Questionnaire (RAQoL)

The RAQoL consists of 30 questions (answered yes/no) that assess such aspects of RA as moods and emotions, social life, hobbies, everyday tasks, personal and social relationships, and physical contact. The RAQoL is scored by assigning a point for each affirmative response and no points for negative responses. Thus, scores range from 0 (least severity) to 30 (highest severity). To date, the MID for the RAQoL has been estimated to be approximately 2.00.

7.3.2.2.3 Preference Based Measures - MAUT Instruments

The indirect utility assessment instruments used in the questionnaire were the HUI2, HUI3, SF-6D, and the EQ-5D. In a cross-sectional analysis in patients with RA, the MID for the overall utility scores was determined to be 0.03 to 0.04 for the HUI2, 0.06 to 0.07 for the HUI3, and 0.03 to 0.05 for the SF-6D and the EQ-5D [20]. In another analysis of seven longitudinal studies examining SF-6D global utility scores, investigators estimated the MID to be 0.033 (95% CI: 0.029 to 0.037).
[10] A recent comprehensive review of the similarities and differences across these instruments is available; a detailed comparison is beyond the scope of this research paper.

7.3.3 Data Analysis

7.3.3.1 Reliability

The reliability of a transition question to assess changes in health status has not previously been studied [10]. To determine the reliability of the patient transition question and the other questionnaires, a second questionnaire was sent to a randomly selected group of 50 patients immediately after receipt of their follow-up questionnaire, with instructions to complete and return it within 5 weeks. The five week period was chosen as this was determined a priori to be the time window in which changes (either improvement or deterioration) in their RA would be unlikely. Patients were instructed to answer the transition question in relation to their baseline questionnaire. Another approach to test-retest reliability was the "stable groups" approach, comparing scores from patients who reported that they remained stable from 0 to 6 months. For all analyses, intraclass correlation coefficients (two-way mixed effect model such that the subject effect was random and the instrument effect was fixed) were calculated for the overall scores from the two time periods.

7.3.3.2 Validity of the Transition Question

Similarly, no assessments have been conducted to determine the validity of the transition question with respect to assessing rheumatoid arthritis.
[10] To determine agreement between the results from the collapsed transition question and changes in the HRQL instruments (divided into "worse", "same" and "better" categories using MID values from the literature as cut-off points), a weighted kappa (using quadratic weights) was calculated, where 1.00 signifies perfect agreement and 0.00 signifies no agreement [22]. We also estimated the MID using values calculated from this study among those who scored either "somewhat better" or "somewhat worse" on the transition question, assuming that the change experienced by these patients was equivalent to the MID, as noted by other investigators [10]. Both anchor- and distribution-based approaches to assessing MID values have been shown to yield similar values in many situations [23,24]. To further examine the validity of using the transition question as the external criterion, we examined the Spearman's correlation between the transition question and changes in variables that were found to exhibit strong correlation with the generic and disease-specific HRQL measures in cross-sectional analyses (patient global assessment of disease severity, pain visual analogue scale (VAS), and the HAQ disability score). Spearman's rho values of > 0.50 or < -0.50 were considered to be strong, values between -0.49 and -0.30 or between 0.30 and 0.49 were considered moderate, and values between -0.30 and 0.30 were considered to be weak. For purposes of this analysis, a correlation of moderate or greater was considered to be evidence of validity.

7.3.3.3 Measures of Responsiveness

Our analysis focused on the assessment of responsiveness to change in RA for the indirect utility measures (the HUI2, HUI3, SF-6D and the EQ-5D), the RAQoL and the HAQ. Analysis for responsiveness was completed for the baseline and six month pairs of responses.
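The agreement analysis in Section 7.3.3.2 can be illustrated with a minimal sketch, using only the Python standard library. The function names, the use of ">=" at the MID cut-off, and the example values are assumptions made for illustration; the thesis does not state how ties at the cut-off were handled or which software was used.

```python
CATEGORIES = ["worse", "same", "better"]


def classify_change(change, mid, higher_is_better=True):
    """Label a change score using the MID as the cut-off (assumed inclusive)."""
    if not higher_is_better:  # e.g. HAQ and RAQoL, where higher scores are worse
        change = -change
    if change >= mid:
        return "better"
    if change <= -mid:
        return "worse"
    return "same"


def quadratic_weighted_kappa(ratings_a, ratings_b, categories=CATEGORIES):
    """Weighted kappa with quadratic disagreement weights for ordered categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of the two classifications.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - weighted_observed / weighted_expected
```

In use, one list would hold the collapsed transition question responses and the other the output of `classify_change` applied to each patient's change score with the literature-based MID for that instrument.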
For each patient who had data on all instruments at each of the pair of visits, the difference between the two corresponding indirect utility, RAQoL, and HAQ scores was calculated. In the primary analysis of responsiveness, the results were stratified into patients classified as "better", "the same", or "worse" according to the collapsed transition question.

In addition, in a secondary analysis utilizing the patient global assessment of disease activity (called "patient global" hereafter), the percentage improvement over baseline was calculated utilizing the following formula:

\[ \frac{\text{patient global}_{6\,\text{months}} - \text{patient global}_{\text{baseline}}}{\text{patient global}_{\text{baseline}}} \]

According to this formula, patients were classified as: 1) "better" if the patient global had changed by > 20%; 2) "the same" if the patient global had changed > -20% and < 20%; and 3) "worse" if the patient global had changed < -20%. Spearman correlation coefficients were calculated to determine the correlation between this classification criterion and the collapsed transition question. All the indices of responsiveness (as described below) were calculated for the subgroups defined by this criterion.

Five distribution-based approaches were employed to assess responsiveness:

1) the effect size (ES) using the following formula:

\[ ES = \frac{\text{mean}(x_1 - x_2)_{\text{total group}}}{SD_{\text{total group}}} \]

where:
x_1 = mean score at 6 months for the entire group
x_2 = mean score at baseline for the entire group
SD_total group = standard deviation at baseline for the entire group

An effect size of 1 indicates a change in magnitude equivalent to one standard deviation. We adopted the criteria of Cohen, where absolute values of effect sizes (d) can be categorized as small (< 0.5), medium (0.5 to 0.8), or large (> 0.8) [27]. Positive values reflect improvement while negative values reflect worsening for the indirect utility instruments, while the converse is true for the HAQ and the RAQoL.
2) the standardized response mean (SRM) using the following formula:

\[ SRM = \frac{\text{mean}(x_1 - x_2)_{\text{total group}}}{SD(x_1 - x_2)_{\text{total group}}} \]

where:
x_1 = mean score at 6 months for the entire group
x_2 = mean score at baseline for the entire group
SD(x_1 - x_2)_total group = the standard deviation (SD) for the change in scores in the entire group.

The absolute values of the SRM are regarded as either small (< 0.5), medium (0.5 to 0.8) or large (> 0.8) and the signs (either positive or negative) are interpreted as for the ES [28].

3) the control standardized response mean (CSRM) using the following formula:

\[ CSRM = \frac{\text{mean}(x_1 - x_2)_{\text{total group}}}{SD(x_1 - x_2)_{\text{no change}}} \]

where the mean change in the total group refers to the mean change in the subgroups of "worse", "same", and "better" (as per the external criteria) and the standard deviation is taken from the subgroup reporting no change. The criterion for the size of the CSRM is the same as for the ES and SRM.

4) the relative efficiency statistic (RE) using the following formula:

\[ RE = \left( \frac{t_{\text{comparison}}}{t_{\text{gold standard}}} \right)^{2} \]

Given the information on the superior responsiveness of disease-specific over generic measures [30], we selected the RAQoL as the "gold standard" against which to compare each of the instruments. The measure with the highest RE has the highest power for a given sample size, or requires fewer patients to achieve a given level of statistical power [12,13].

5) paired sample t-tests reported as a p-value

Since the standard errors of the distribution-based approaches are not defined, we used bootstrap methods to estimate 95% confidence intervals (CI) for the ES, SRM, and the CSRM [10]. Rather than conduct a large number of statistical tests, the 95% CIs were investigated to determine the degree of overlap between the values generated across the HRQL measures.
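The distribution-based indices listed above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the RE is taken as the squared ratio of paired t-statistics, and the confidence interval uses a simple percentile bootstrap because the text does not specify which bootstrap variant was applied.

```python
import random
from statistics import StatisticsError, mean, stdev


def change_scores(baseline, followup):
    """Per-patient change: 6-month score minus baseline score."""
    return [f - b for b, f in zip(baseline, followup)]


def effect_size(baseline, followup):
    """ES: mean change divided by the SD of the baseline scores."""
    return mean(change_scores(baseline, followup)) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """SRM: mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, followup)
    return mean(changes) / stdev(changes)


def control_srm(baseline, followup, stable_baseline, stable_followup):
    """CSRM: mean change divided by the SD of change in the 'same' subgroup."""
    stable_changes = change_scores(stable_baseline, stable_followup)
    return mean(change_scores(baseline, followup)) / stdev(stable_changes)


def paired_t(baseline, followup):
    """Paired-sample t statistic for the change scores."""
    changes = change_scores(baseline, followup)
    return mean(changes) / (stdev(changes) / len(changes) ** 0.5)


def relative_efficiency(t_comparison, t_gold_standard):
    """RE: squared ratio of the comparison t statistic to the gold-standard one."""
    return (t_comparison / t_gold_standard) ** 2


def bootstrap_ci(statistic, baseline, followup, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed variant), resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(baseline)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(statistic([baseline[i] for i in idx],
                                       [followup[i] for i in idx]))
        except (ZeroDivisionError, StatisticsError):
            continue  # skip degenerate resamples with zero variance
    estimates.sort()
    lower = estimates[int(alpha / 2 * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper
```

Each function would be applied separately within the "worse", "same" and "better" subgroups; for example, `bootstrap_ci(standardized_response_mean, baseline, followup)` returns an approximate 95% interval for the SRM in one subgroup.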
Also, since these indices are well known to sometimes generate conflicting results [12,31], we ranked the instruments according to their values on each responsiveness statistic and calculated the median across the responsiveness statistics to determine the overall rank.

The distribution-based methods described above do not provide answers to practical questions such as, for example, how likely is a decrease of a specified amount in the utility score (as measured by the indirect instruments) to represent actual deterioration? Thus, we utilized a flexible polytomous regression model to assign probabilities of a patient's improvement, status quo, or deterioration (as defined by the transition question) to different levels of change in the indirect utility and disease-specific HRQL measures [17]. The results of this polytomous regression are presented in a graph of 3 curves, each of which describes how the estimated probability of a respective outcome (improvement, no change, or worsening as defined by the collapsed transition question or the patient global assessment of disease activity question) changes as a function of the difference in two consecutive scores.

Finally, we examined associations between changes in either the unweighted domain scores of the EQ-5D and the SF-6D (as these instruments do not typically calculate single-attribute utility values) or the single-attribute utility scores of the HUI2 and HUI3 with the external criteria. The purpose of these analyses was to investigate which domains/single attributes were most likely to change in response to improvement or worsening in RA (as defined by the external criteria). Statistical analysis using the Kruskal-Wallis test was employed. Conservatively, we defined a clear association if the test was significant for the domain or single attribute with both external criteria.
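As an illustration of the Kruskal-Wallis comparison described above, the following sketch computes the tie-corrected H statistic for changes in one domain across the three external-criterion groups. It is a minimal standard-library version with hypothetical function names; the p-value relies on the chi-square approximation, whose upper tail for 2 degrees of freedom (three groups) is exactly exp(-H/2). The thesis does not state which software or exact procedure was used.

```python
import math


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one observation")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                              # number of tied observations
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)

    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def kruskal_wallis_three_groups(worse, same, better):
    """Return (H, p) for three groups using the chi-square approximation, df = 2."""
    h = kruskal_wallis_h([worse, same, better])
    return h, math.exp(-h / 2)
```

Because domain-level changes take only a few distinct values, ties are common, which is why the tie correction is included in this sketch.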
7.4 RESULTS

7.4.1 Demographics and Missing Values

Of the 320 RA patients who returned the baseline questionnaires, 239 returned the six month questionnaires, for a 75% follow-up response rate. Characteristics of our baseline sample have been described in detail elsewhere. Baseline characteristics of those who completed the six month questionnaires compared to those who did not are shown in Table 7.1. For most of the variables examined, there were no differences between the baseline characteristics of those who completed the six month questionnaire and those who did not. However, for all of the instrument scores, those who completed the six month questionnaires appeared to have poorer baseline mean HRQL scores than those who did not (with the exception of the HUI2), but this relationship was statistically significant only for the HAQ. Other variables that differed between the subgroups were self-reported severity and the proportion who worked outside the home in the past 12 months (both favouring those only completing the baseline questionnaire).

7.4.2 Reliability

Test-retest reliability for the collapsed categories of the transition question ("worse", "same", and "better") using the follow-up questionnaire and a subsequent questionnaire within 5 weeks of these responses yielded 38 valid (received within the stated time frame) responses and perfect agreement in 36 patients. The two patients not showing agreement both returned their reliability questionnaire 14 days after the follow-up questionnaire: one assessed change in his/her RA at 3 months as "better" but "worse" 14 days later, while the other assessed change at 3 months as "worse" but "better" 14 days later. Therefore, the ICC value for the collapsed transition question was 0.80 (95% CI 0.64 to 0.89) with these two responses included and 1.00 if these are eliminated. The results for the test-retest reliability approach for the generic and disease-specific instruments are shown in Table 7.2.
Using the stable groups approach (i.e. those reporting no change from 0 to 6 months), we also determined ICC values to examine the reliability of the generic and disease-specific instruments (Table 7.3). Results were similar to the test-retest reliability approach in that reliability of the EQ-5D overall score appeared to be the lowest while the RAQoL and the HAQ displayed the highest reliability.

7.4.3 Validity of the Transition Questionnaire

For the 0 to 6 months transition question, 96 (40%) reported improvement, 85 (36%) reported no change and 58 (24%) reported worsening. Of these, 222 patients had pairs of answers on all questionnaires to permit comparisons (89 reporting improvement, 77 reporting status quo and 56 reporting worsening). For the secondary external criterion (as defined by categorization of the patient global assessment of disease severity VAS) for these 222 pairs, results of the patient global scores were available and were classified as follows: 65, 118, and 39 reporting improvement, status quo and worsening, respectively, using the criteria described in the Methods section. The two external criteria had fairly low agreement (weighted kappa 0.30, 95% CI 0.20 to 0.41).

To examine the agreement between results of the collapsed transition question (improved, status quo, worsened) and the HRQL measures categorized based upon the literature-based MID values, we plotted the results as bar graphs and calculated weighted kappa values (Figure 7.1). For all the instruments, agreement between the transition question and the categories of the HRQL values was relatively low (weighted kappa ranging from 0.15 to 0.28). MID values calculated from our longitudinal sample using the anchor-based approach were somewhat smaller than those reported in the literature (Table 7.4).
Spearman's correlations between the transition question responses and changes in the patient global assessment of disease activity VAS, the pain VAS, and the HAQ disability score are shown in Table 7.5. Correlations between the transition question responses and the RA outcome measures were similar in magnitude to those between the RA outcome measures. Changes in the patient global assessment of disease activity VAS and the HAQ score displayed moderate correlation with the transition question.

7.4.4 Responsiveness

The mean change scores for each of the instruments between six months and baseline are shown for the entire sample and stratified by results of the transition question in Table 7.6, and for the categories defined by the patient global assessment of disease activity in Table 7.7. As hypothesized, for many of the instruments, since the sample was experiencing "natural" changes in their disease over time, the change scores for the entire sample tended to obscure the changes in the subgroups. Scatterplots of the indirect utility scores over time (from three measurements at baseline, 3 and 6 months) are presented in Figures 7.2 to 7.5 with ordinary least squares regression lines depicting the overall trends. Most of these lines had slopes in the hypothesized direction (positive for "better" and negative for "worse" as defined by the collapsed transition question). For those who reported their RA as "the same", slopes of the regression line tended to be positive across the indirect utility measures. Also of note, within each instrument, the average scores at baseline differed between the three groups defined by the collapsed transition question, with those stating that their RA had worsened having lower baseline indirect utility scores (for the HUI2, HUI3, SF-6D, and the EQ-5D) and higher (and therefore worse) RAQoL and HAQ scores.
Those reporting that their RA was "the same" after six months tended to have better HRQL scores for all the instruments at baseline than the "better" and "worse" categories. For each of the measures, on average, changes for those reporting "better" or "worse" were in the appropriate direction (i.e. for the indirect utility scores, positive and negative, respectively; whereas, for the RAQoL and the HAQ, this was reversed). These findings were similar when the external criterion for change was changed to categories based upon changes in the patient global assessment of disease severity VAS (Table 7.7).

The indices of responsiveness (ES, SRM, CSRM, paired t-test and the RE) and their associated 95% CIs for those who responded as better, the same or worse are presented in Table 7.6 according to the transition question and in Table 7.7 according to the patient global rating of disease severity VAS; the rankings of the various responsiveness statistics are shown in Table 7.8. Generally, the results of the various responsiveness statistics tended to agree within each of the instruments (Table 7.8) and there was little overlap between their 95% CIs. Overall, the RAQoL was the most consistently responsive of the instruments tested regardless of which of the external criteria were applied. Depending on whether the change was classified as either "worse" or "better" and which of the external criteria were applied, the indirect utility instruments and the HAQ displayed varying degrees of responsiveness. For example, the EQ-5D appeared to be responsive in those who were classified as "worse" irrespective of which external criteria were applied but unresponsive in those classified as "better".
The HAQ appeared to be relatively responsive in both those classified as better or worse using the patient transition question to define the groups, but less responsive (in relation to the other instruments) when the patient global assessment of disease severity criterion was applied. The HUI3 appeared to be relatively unresponsive except in those classified as "better" by the patient global assessment of disease severity. The HUI2 was consistently ranked among the middle in responsiveness and the SF-6D appeared to be more responsive in those classified as "better" (by either criterion) than those classified as "worse".

7.4.5 Flexible Polytomous Regression Techniques

Results from the flexible polytomous regressions exploring responsiveness are shown in Figures 7.6 to 7.17. The curves on each figure correspond to the 3 types of outcome (worse, same, better) as defined by the external criteria (patient transition question or the patient global assessment of disease activity). Each curve shows how the estimated probabilities of a specific response vary depending on the observed change in the scores of the instruments. In general, the patient global assessment of disease activity VAS appears to be better able to discriminate between those patients whose RA has improved, worsened or stayed the same than the transition question. This is evident in all of the graphs as there is a sharper delineation between the three curves (worse, better and same) in Figures 7.12 to 7.17 than in the corresponding Figures 7.6 to 7.11 (for example, comparing Figure 7.6 to Figure 7.12, both of which examine changes in the HUI2). Overall, the RAQoL appeared to be most responsive in both Figure 7.10 and Figure 7.16 as compared to the other instruments using the same external criterion. For example, in Figure 7.16, there is very good discrimination between the three curves as shown by their degree of separation.
The probability of being classified as "the same" is high (approximately 60%) if the difference between the two scores is zero. Similarly, this probability decreases as we move in either direction and becomes extremely small when the difference is ± 20. As the difference in the scores gets larger in the positive direction (recall that larger values in the RAQoL reflect worse HRQL), the probability of being classified as "worse" grows to > 80% when the difference in scores is approximately 15 and almost 100% when the difference is 20. These values are similar to those displayed for negative values (reflecting improvement) in the RAQoL and the dashed curve labeled as "better".

For the indirect utility instruments, in the graphs using the patient transition question as the external criterion for change, there was generally fairly poor discrimination between the curves, with significant overlap between the probabilities of being classified "better", "worse" and "same" across the range of difference scores (Figures 7.6 to 7.10). Using the patient global assessment of disease activity VAS criterion, the curves for all the indirect utility instruments showed much better discrimination between those classified as "better" and "worse" (Figures 7.12 to 7.15). However, for those classified as the "same", there was considerable overlap between these probabilities and the probabilities for "better" and "worse". The HUI3 appeared to be the best able to discriminate in this regard (Figure 7.13). Thus, it would seem that although these instruments can discriminate change well (according to the external criterion) in those who improve or worsen, those that stay the same yield somewhat problematic difference scores. This finding could be a property of the instruments or may be a reflection of the cut-off values of our external criterion.
Similarly, for the HAQ, the patient global assessment of disease severity VAS criterion appeared to result in better discrimination between the curves; however, as with the indirect utility measures, there was considerable overlap between the "same" category and the other categories.

7.4.6 Change in Unweighted Domain Scores (EQ-5D, SF-6D) and Single Attribute Utilities (HUI2, HUI3)

The associations between the instrument unweighted domains (EQ-5D and SF-6D) and the single attribute scores (HUI2 and HUI3) and the external criteria of change are shown in Table 7.9. For the EQ-5D, pain/discomfort, anxiety/depression and self-care, and, for the SF-6D, physical and social functioning, role limitations and pain met our criteria for statistical significance. For the single attributes from the HUI systems, ambulation, emotion, and pain (from the HUI3) and mobility, emotion and pain (from the HUI2) met the criteria. Of note, there were more significant associations between the domains/single attributes and the changes defined by the patient global assessment of disease severity categories than the patient transition question responses. For example, with the EQ-5D there was a significant association for the mobility domain with the patient global assessment of disease severity VAS defined changes but not with the other external criterion. For the SF-6D, HUI3, and HUI2 there were significant associations for the vitality domain, the dexterity single attribute, and the sensation single attribute, respectively, using the patient global assessment of disease severity VAS defined changes. Of note, for the self-care single attribute in the HUI2, there was a significant association with the patient transition defined changes but not the other criterion.
7.5 DISCUSSION

This study is the first to compare the reliability and longitudinal changes in scores obtained with four indirect utility instruments (HUI3, HUI2, EQ-5D, SF-6D), a disease-specific measure (the RAQoL), and a disability measure (the HAQ) in a sample of patients with rheumatoid arthritis. Our results demonstrate that while the generic, preference-based measures yielded scores that were generally reliable, they had lower responsiveness (as assessed by multiple methodologies) in RA than the disease-specific RAQoL. The indirect utility measures did, however, yield moderate responsiveness statistics when the patient global assessment of disease severity was applied as the external criterion for change. The domains and attributes of the indirect utility instruments that were commonly associated with the external criteria for change in RA tended to be pain, ambulation/physical functioning, and emotional/mental health.

We also examined the reliability and validity of utilizing a patient-completed transition question to function as the primary external criterion of change, which is a common approach [9-11,14,17]. However, the main concern with this approach is recall bias. Some literature suggests that retrospective estimates of the initial state are often highly correlated with the present state and uncorrelated with the initial state [22]. Another concern deals with the starting point of individuals who are rating their health changes. For example, an individual starting at a lower point in health or function may rate a small change as significant where a person of higher function may regard change of the same magnitude as insignificant [29]. Despite these concerns, there has been little work in evaluating the reliability and validity of transition questions [10]. In this study, we found that the reliability of the transition question was acceptable using the test-retest approach.
However, from the validity standpoint, we found that the responses from the transition question were only weakly to moderately correlated with commonly accepted clinical variables used to assess RA (such as the pain VAS, the patient global assessment of disease severity VAS, and the HAQ) [1]. The patient transition defined changes also had low agreement with previously defined MID values and the changes defined by the patient global assessment of disease severity VAS. In addition, we found that there were fairly large mean differences in the instruments between the time points for individuals who were classified as being the "same" from their RA perspective (sometimes the change in this category was of similar magnitude as those classified as "worse" or "better"). This point was illustrated in the polytomous regression plots where there was considerable overlap between the "same" and "better" or "worse" curves. While this finding could be the result of shortcomings of the instruments in assessing changes in RA, these findings were not observed when a different external criterion was applied (categories based upon the patient global assessment of disease activity VAS). Also, several single attributes that were expected to have significant associations with changes in RA were significantly associated with changes in the patient global assessment VAS and not the patient transition question changes (mobility (EQ-5D), vitality (SF-6D) and dexterity (HUI3)). Therefore, categorization of the patient global assessment of disease activity VAS appears to be a superior external criterion for RA than the patient transition question, as it was expected that these domains/single attributes would be associated with changes in RA. Therefore, we would hypothesize that these are the main factors that are driving the observed changes in the global utility scores.
Generally, dividing the sample into "worse", "same" and "better" using the patient global assessment of disease severity VAS categories seemed to more accurately define these groups than the patient transition question. This point is illustrated by the larger responsiveness statistics for all of the instruments, the smaller amount of change in all of the instruments in those classified as having their RA being the "same" as at baseline, and a greater magnitude of change (either negative or positive) in those classified as having their RA "worse" or "better" than baseline. Using the transition question as the external criterion resulted in small ES, SRM and CSRM statistics for virtually all of the instruments and non-significant p-values on the paired t-tests for those who reported to have improved or worsened from baseline for many of the indirect utility measurements (Table 7.6). Conversely, when applying the classification according to the patient global assessment of disease severity VAS (Table 7.7), many of the responsiveness statistics for those classified as having their RA improved or worsened over baseline can be interpreted as moderate or large, and all of the paired t-tests for those who improved or worsened were significant for all of the instruments.

The indirect utility instruments displayed different properties in this study. Reliability was acceptable for all of the scores except for the EQ-5D (ICC 0.46 to 0.52 depending on the methodology employed). This finding is considerably lower than previously reported in rheumatoid arthritis (ICC of 0.73 using the stable groups approach and 0.78 using test-retest reliability) [9]. The differences in these two findings may be due to the five week window for resubmission of the reliability questionnaires in our study compared to two weeks in the other analysis. In the longer time frame, it is possible that there was a higher probability for change.
This change may have penalized the EQ-5D much more than the other scales, as there is a term in the EQ-5D scoring function (N3) that subtracts 0.269 if a score of the lowest level (3) occurs on at least one domain. Thus, a one category change (from "2" to "3") in response in a single domain can have profound implications for reducing the EQ-5D utility score. However, other instruments which were found to be more responsive than the EQ-5D were stable (the RAQoL and the HAQ) over this time frame.

The HUI2 and the HUI3 generally had low responsiveness statistics utilizing the patient transition question as the external criterion and moderate responsiveness statistics when the categories of the patient global assessment of disease activity VAS were applied. Their relative rankings were towards the middle or bottom for all of the instruments regardless of the external criteria applied, except for the "better" category as defined by the patient global assessment of disease activity. For this category, the HUI3 had the highest responsiveness statistics in three categories (the ES, SRM, and the paired sample t-test). This was likely due to the observation that the mean change in this category was quite large (0.17), which was almost half of the baseline score. In the polytomous regression plots, the HUI3 appeared to have less overlap between the same and the better or worse curves than the other indirect utility instruments (i.e. Figure 7.13), which may make it more responsive in RA. As expected, the sensation attribute (HUI2), the vision, hearing and speech attributes (HUI3) and the cognition attributes (both scales) were not associated with the external criteria defined change in RA. Of note, although one would have expected dexterity (HUI3) and self-care (HUI2) to be consistently associated with changes in RA, each was significant for only one of the external criteria.
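The effect of the N3 term on the EQ-5D score, discussed above, can be made concrete with a short worked example. Only the N3 value of 0.269 comes from the text; the constant and the per-dimension decrements below are assumed here to be those of the commonly cited UK time trade-off tariff and should be checked against the published value set before any real use. The function name and the tuple representation of a health state are hypothetical.

```python
# Dimension order: mobility, self-care, usual activities,
# pain/discomfort, anxiety/depression.
# Decrements for levels 1, 2 and 3 (assumed UK TTO tariff values).
DECREMENTS = [
    (0.0, 0.069, 0.314),  # mobility
    (0.0, 0.104, 0.214),  # self-care
    (0.0, 0.036, 0.094),  # usual activities
    (0.0, 0.123, 0.386),  # pain/discomfort
    (0.0, 0.071, 0.236),  # anxiety/depression
]
CONSTANT = 0.081  # subtracted for any state other than full health (assumed)
N3 = 0.269        # subtracted once if any dimension is at level 3 (from the text)


def eq5d_index(state):
    """Utility score for a 5-tuple of levels (1, 2 or 3), one per dimension."""
    if len(state) != 5 or any(level not in (1, 2, 3) for level in state):
        raise ValueError("state must contain five levels, each 1, 2 or 3")
    if all(level == 1 for level in state):
        return 1.0
    score = 1.0 - CONSTANT
    for decrements, level in zip(DECREMENTS, state):
        score -= decrements[level - 1]
    if 3 in state:
        score -= N3
    return score


moderate_pain = eq5d_index((1, 1, 1, 2, 1))  # pain/discomfort at level 2
extreme_pain = eq5d_index((1, 1, 1, 3, 1))   # same state, pain at level 3
drop = moderate_pain - extreme_pain          # includes the one-off N3 penalty
```

Under these assumed weights, a single-level change on one dimension lowers the score by far more than the MID values quoted earlier for the indirect utility instruments, which is consistent with the explanation offered above for the low EQ-5D reliability.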
The SF-6D generally had low responsiveness statistics utilizing the patient transition question as the external criterion and moderate responsiveness statistics when the categories of the patient global assessment of disease activity VAS were applied. This latter finding was especially true for the "better" category. In the rankings of the responsiveness statistics, the SF-6D had much higher rankings for those classified as improved (median rankings of 3 for both external criteria) compared to those classified as worsened (median rankings of 5 and 6). One of the problems with the responsiveness of the SF-6D when using our external criteria was the amount of change experienced by those categorized as the "same". Both paired t-tests for this category using each of the external criteria were significant, indicating a large degree of mean change (0.04 in Table 7.6, which was as large as those reporting improvement, and 0.02 in Table 7.7). These results are further illustrated in Figures 7.9 and 7.14 with the probability of being scored as the same being somewhat constant over the range of SF-6