UBC Faculty Research and Publications

Do the methods used to analyse missing data really matter? An examination of data from an observational… Kaambwa, Billingsley; Bryan, Stirling; Billingham, Lucinda Jun 27, 2012


RESEARCH ARTICLE Open Access

Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients

Billingsley Kaambwa1*, Stirling Bryan2 and Lucinda Billingham3,4

Abstract

Background: Missing data is a common statistical problem in healthcare datasets from populations of older people. Some argue that arbitrarily assuming the mechanism responsible for the missingness, and therefore the method for dealing with this missingness, is not the best option, but is this always true? This paper explores what happens when extra information that suggests that a particular mechanism is responsible for missing data is disregarded and methods for dealing with the missing data are chosen arbitrarily.

Regression models based on 2,253 intermediate care (IC) patients from the largest evaluation of IC done and published in the UK to date were used to explain variation in costs, EQ-5D and Barthel index. Three methods for dealing with missingness were utilised, each assuming a different mechanism as being responsible for the missing data: complete case analysis (assuming missing completely at random, MCAR), multiple imputation (assuming missing at random, MAR) and Heckman selection model (assuming missing not at random, MNAR). Differences in results were gauged by examining the signs of coefficients as well as the sizes of both coefficients and associated standard errors.

Results: Extra information strongly suggested that missing cost data were MCAR. The results show that MCAR and MAR-based methods yielded similar results with sizes of most coefficients and standard errors differing by less than 3.4% while those based on MNAR-methods were statistically different (up to 730% bigger). Significant variables in all regression models also had the same direction of influence on costs. All three mechanisms of missingness were shown to be potential causes of the missing EQ-5D and Barthel data.
The method chosen to deal with missing data did not seem to have any significant effect on the results for these data as they led to broadly similar conclusions with sizes of coefficients and standard errors differing by less than 54% and 322%, respectively.

Conclusions: Arbitrary selection of methods to deal with missing data should be avoided. Using extra information gathered during the data collection exercise about the cause of missingness to guide this selection would be more appropriate.

Keywords: Missing data, Complete case analysis, Multiple imputation, Generalised linear model, Heckman selection, Observational data

* Correspondence: b.c.kaambwa@bham.ac.uk
1 Health Economics Unit, Public Health Building, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
Full list of author information is available at the end of the article

© 2012 Kaambwa et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Kaambwa et al. BMC Research Notes 2012, 5:330
http://www.biomedcentral.com/1756-0500/5/330

Background

Missing data is an unwanted reality in most evaluations of services for older people as it can lead to biased results as well as threats to the generalisability and power of the results obtained from analysing such data [1, 2]. Even under the best of conditions, missing data may result in a significant reduction in sample size leading to threats to external validity as a sample reduced in size may no longer be representative of the target population [3–5]. This is more problematic in circumstances where the likelihood of response is related to observed characteristics. Certain forms of missingness can reduce the statistical power of the analyses of the available data and therefore compromise the internal validity of a study, which is more serious [3, 6, 7]. A situation that can potentially lead to reduced statistical power is when the probability of response is associated with the values of the variable for which values are only partly observed, which is a possibility in a lot of cases [8].

The three main mechanisms that lead to missing data are: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). If data are MAR or MCAR, they can also be referred to as “ignorable” data while those MNAR are “non-ignorable” [8]. Missing data are said to be ignorable if the parameters that are used to model the missing data process are not related to the parameters used to model the observed data while non-ignorability exists if there is a systematic difference between responders and nonresponders even after accounting for all the observed data [7, 9]. There are various methods that have been proposed to deal with missing data with each of these methods premised on a specific missing data mechanism [1, 10, 11]. Croninger and Douglas [7] indicate that the choice of method used for coping with missing data is not crucial if there is not much missing data and/or the sample is big. This is because most methods will yield similar results in such circumstances. But as the level of missingness rises and/or the sample becomes smaller, the choice of method becomes potentially more significant. In this paper, we do not provide a detailed discussion of the various methods that can be used to deal with missing data. Interested readers can see Fielding et al. [12] for such a discussion.
In general though, complete case analysis (both listwise and pairwise deletion) can be performed when data are MCAR [13]. Approaches for use when data are MAR include listwise deletion, various imputation techniques, propensity adjustment strategy, raw maximum likelihood and expectation maximisation [1, 3, 6, 14]. When data are MNAR, panel selection models, including the Heckman, and pattern-mixture approaches can be used [15–17].

Most times, the method chosen to deal with missing data is not based on concrete evidence of the mechanism responsible for this missing data. It is consequently difficult to assess the accuracy of such methods because the data are by definition ‘missing’ [12]. It is a recognised fact that data often provide little or no information at all to help determine the correct mechanism behind missingness [3, 18]. In many scenarios, therefore, it is difficult, or even impossible, to know what mechanism is responsible for the missingness. Sometimes more than one mechanism may be responsible for different sets of missing data within the same evaluation [7, 19]. This therefore means that choosing among these alternative methods is not an easy task.

Curran et al. [19] suggest two approaches for determining the missing data mechanism: (1) hypothesis testing and (2) collecting extra information, during the data collection process, about why missing data is missing. In the absence of missing data being recovered and analysed, hypothesis testing can at best only rule out that missing data are MCAR with no way of confirming that data are actually MCAR [20].
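The limits of hypothesis testing noted here can be illustrated with a small simulation. The sketch below is not the authors' procedure and uses no ICNET data: the covariate `x`, the outcome `y`, the logistic missingness model and all coefficients are hypothetical choices made purely for illustration. It compares an observed covariate between cases with and without a missing outcome (a Welch t statistic) under each mechanism, alongside the complete-case mean of the outcome, whose true mean is 0.

```python
import math
import random
from statistics import fmean, variance


def welch_t(a, b):
    """Welch t statistic for the difference in means of two samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (fmean(a) - fmean(b)) / se


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def simulate(mechanism, n=20000, seed=0):
    """Return (t statistic for x, complete-case mean of y).

    x is always observed; y = 0.5 * x + e has true mean 0 and may be missing.
    All coefficients below are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    x_missing, x_observed, y_observed = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        e = rng.gauss(0, 1)  # part of y that no observed variable explains
        y = 0.5 * x + e
        if mechanism == "MCAR":
            p_missing = 0.3                      # unrelated to anything
        elif mechanism == "MAR":
            p_missing = logistic(-1 + 1.5 * x)   # depends on observed x only
        elif mechanism == "MNAR":
            p_missing = logistic(-1 + 1.5 * e)   # depends on unobserved part of y
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() < p_missing:
            x_missing.append(x)
        else:
            x_observed.append(x)
            y_observed.append(y)
    return welch_t(x_missing, x_observed), fmean(y_observed)


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        t_stat, cc_mean = simulate(mechanism)
        print(f"{mechanism}: t for x = {t_stat:7.2f}, complete-case mean of y = {cc_mean:6.3f}")
```

In this toy setting the test flags the MAR scenario, so MCAR can be ruled out there, but it looks the same under MCAR and under this particular MNAR scenario even though the complete-case mean is biased only in the latter. That is the sense in which a non-significant test cannot confirm MCAR.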
Provided enough data has been collected, it therefore seems that, where missing data is irrecoverable, it is only the latter approach that will give some fairly credible indication about whether data are MCAR, MAR or MNAR [8, 19].

This study explores what happens when extra information that suggests that a particular mechanism is responsible for missing data is disregarded and methods for dealing with the missing data are chosen arbitrarily. A dataset from the largest evaluation of intermediate care services done and published in the UK to date is used [21]. Intermediate care (IC) services are tailored to prevent admission to acute care or long-term care and also aid discharge from hospital for older people [21]. It is not usual practice for such extra information to be gathered as part of the data collection process in an evaluation such as that for IC, and the presence of this information therefore presented a unique opportunity to empirically compare different methods for dealing with missing data. As far as we are aware, this is the first time that this sort of analysis has been done on a dataset of older people in the UK. Using this dataset, which had missing data on several variables, the factors that explain variation in costs per patient, change in EQ-5D from admission to discharge (ΔEQ-5D) and change in the Barthel index from admission to discharge (ΔBarthel) of IC patients were explored in a regression modelling framework. These factors could be broadly divided into three groups: IC episode characteristics, descriptors of IC services and descriptors of IC-related services. Three methods incorporating techniques for dealing with missing data were used: (1) generalised linear models (GLMs) and ordinary least squares (OLS) on complete cases (assuming that missing data were MCAR), (2) GLM and OLS models on data obtained through multiple imputation (MI) (assuming missing data were MAR) and (3) Heckman selection models (assuming that missing data were MNAR).
We were interested in examining the signs of coefficients as well as the sizes of both coefficients and associated standard errors in the regression model results obtained.

Methods

Source of data

Data for this study were obtained from five anonymous case study sites in the UK which were part of the National Evaluation of the Costs and Outcomes of IC for Older People (ICNET) [21]. These sites were ‘whole systems’ of IC, i.e. areas with a specific geographical boundary. Quantitative data were collected by staff working for the IC services according to protocols set out by the evaluation team. Service staff completed study proforma, with or on behalf of their patients, at the point of admission to the service, for all IC admissions over a defined period. They completed discharge questions on the day of discharge, transfer to another IC service or as soon as possible following end of service provision. In addition, extra information on the reasons why some data were missing was obtained from IC coordinators, ICNET researchers’ observations and preliminary statistical analyses done on the ICNET dataset [21–23]. Data were collected between January 2003 and January 2004. Ethical approval was granted by the Trent Multicentre Research Ethics Committee.

Missing data in the ICNET dataset

The variables that were collected in the ICNET dataset, based on a sample of 2,253 patients, are presented in Table 1. Up to 42% of the data were missing for some variables in that dataset. Extra information about why data were missing was available for all dependent variables (cost per patient, ΔEQ-5D and ΔBarthel) but not available for nearly all of the independent variables. Because of this lack of information and for purposes of comparing the methods for dealing with missing data, a decision was made to focus on missingness only in the dependent variables.
Therefore, of the 2,253 observations, 1,536 were excluded from the cost analyses and 1,148 from the outcome analyses reported in this paper due to missing values in the independent variables. There were therefore no missing values in any of the independent variables (or in the interaction terms generated using these variables) used in the analyses. A flow chart showing how the samples used in the final regression models were arrived at is shown in Figure 1. A sample of 717 individuals was therefore used for the cost per patient models and 125 (17.4%) of these individuals had missing observations on the cost variable. For the ΔEQ-5D and ΔBarthel models, a sample of 1,105 individuals was utilised. Of this sample, 417 (37.7%) and 392 (35.5%) had missing values on the ΔEQ-5D and ΔBarthel variables, respectively.

The dependent variables

The cost per patient variable was calculated by combining resource data with budget information for the individual IC services.

The EQ-5D is an outcome measure whose construct validity when used on populations of older people has been well documented [24–26]. It comprises five dimensions of health: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. There are three levels of impairment in each dimension: no, some/moderate, and extreme problems. Using these responses, the EQ-5D is able to distinguish between 243 states of health [27, 28]. The UK-specific EQ-5D valuation algorithm was used in order to convert the EQ-5D health description into a valuation. EQ-5D scores have a range of −0.59 to 1: the maximum score of 1 represents perfect health and a score of 0 represents death [28]. Scores less than 0 represent health states that are worse than death [28–30]. Its generic nature makes it comparable across patient populations.

The Barthel Index (BI) is a non-utility based conventional clinical scale of functional independence which has been recommended by the Royal College of Physicians for routine use in the assessment of older people [31].
Its validity when used on a general population of older people has also been shown [32]. To measure a person’s level of functional independence, the BI uses 10 items, with items carrying different weights [33]. Two items (bathing and grooming) are rated on a two-point scale of 0 and 5, six (feeding, dressing, bowels, bladder, toilet use and stairs) on a three-point scale of 0, 5 and 10 and the last two items (transfers and mobility) are rated on a four-point scale of 0, 5, 10 and 15. The scores on each item are added to produce an overall score which ranges from 0 to 100. To standardise them, the overall scores used in this paper were divided by 5 and therefore ranged from 0 to 20 [34]. The higher the score recorded for an item, the greater the level of independence. The reliability, sensitivity and suitability for proxy-assessment of the BI have been shown elsewhere [33–35].

Reasons for missing data in the ICNET dataset

When data are MCAR, the probability of an item being missing is unrelated to any measured or unmeasured characteristic for that unit [36], while under MAR, the probability of an item having incomplete data depends on other variables in the dataset [1]. Data are MNAR when the probability of missingness depends on the unobserved values themselves, perhaps in addition to one or more other variables and/or the observed variables [37].

Because of time constraints placed on the data collection process, it was not possible to collect all of the cost data. No other reason was established as being responsible for the missing cost data.
This suggests that where cost data were missing, it would be reasonable to assume that these data were MCAR.

In terms of the missing data on the EQ-5D and Barthel, all three mechanisms (MCAR, MAR and MNAR) could be assumed as the reason for this missingness.

Firstly, information obtained from the IC coordinators about some of the missing EQ-5D and Barthel data indicated that some services did not routinely collect this information while some of the item non-responses were ascribed to administrative errors [21]. This suggested that it was plausible to assume that the missingness mechanism for such data was MCAR.

Secondly, the ΔEQ-5D and ΔBarthel scores were calculated by subtracting the scores at admission from those at discharge. A number of individuals had however been transferred to other services before the end of their IC episode. For some of these, it meant that their EQ-5D and Barthel scores at ‘discharge’ were not collected, making it impossible to compute the ΔEQ-5D and ΔBarthel variables. This could be seen as a situation where the missing data were MAR as the reason for the patient’s transfer was more often than not linked to their health or functional status, e.g. the more functionally independent an individual was, the more likely they were to be transferred to a less intensive form of IC. Additional statistical analyses on the IC dataset [23] also revealed that the Barthel scores were predictive of the missing EQ-5D values, further reinforcing the plausibility of the missing EQ-5D data being MAR.

Thirdly, the mean Barthel scores for some individuals who had missing EQ-5D scores were on average lower than those for individuals who did not have missing EQ-5D information [22]. Since some individuals with missing EQ-5D data were associated with lower Barthel scores, it means that, by virtue of the positive relationship between the two instruments [23], there is a possibility that these individuals would also have had lower EQ-5D scores had these been collected. It was therefore reasonable to assume that some of the missing data on the EQ-5D could also have been MNAR, i.e. the poorer one’s health status was, the more difficult it was to provide data on the EQ-5D. By the same token, some of the missing Barthel data could have been MNAR.

Table 1 Variables for use in economic analysis (with level of completeness)

| Variable | Description | Missing (%) |
|---|---|---|
| Episode Characteristics | | |
| Age | Age on 01/01/03 | 3 |
| Gender | 1 = Female, 0 = Male | 2 |
| Live alone | 1 = Individual lives alone, 0 = Otherwise | 9 |
| Barthel – Start | Barthel score at start of IC episode | 31 |
| Barthel – End | Barthel score at end of IC episode | 38 |
| EQ5D – Start | EQ-5D at start of IC episode | 40 |
| EQ5D – End | EQ-5D at end of IC episode | 41 |
| Change in EQ-5D | Difference between EQ-5D score at end and at start of IC episode | 42 |
| Change in Barthel | Difference between Barthel score at end and at start of IC episode | 41 |
| Cost | Cost per patient | 38 |
| Descriptors of IC Services | | |
| Type of service required | | 3 |
| Admission Avoidance service | 1 = Acute Admission Avoidance service, 0 = Otherwise | |
| Supported Discharge service | 1 = Supported discharge service, 0 = Otherwise | |
| Other Service | 1 = Other IC Services, 0 = Otherwise | |
| Type of IC | 1 = Residential IC, 0 = Non-Residential IC | 0 |
| Outcome of IC episode | | 13 |
| Transfer | 1 = Transferred before end of IC episode, 0 = Other outcome | |
| Complete | 1 = Completed IC episode, 0 = Otherwise | |
| Died | 1 = Patient Died, 0 = Otherwise | |
| Other Outcome | 1 = Alternative Outcome, 0 = Other outcome | |
| Stay Duration | Duration of service provision (number of days) | 17 |
| Descriptors of IC related services | | |
| Source of referral | | 3 |
| Referral – primary | 0 = Otherwise, 1 = Primary Care | |
| Referral – hospital | 0 = Otherwise, 1 = Hospital | |
| Referral – social | 0 = Otherwise, 1 = Social Services | |
| Referral – other | 0 = Otherwise, 1 = Other Sources | |
| Alternatives to IC services | | 18 |
| Alternative – Home | 0 = Else, 1 = Home | |
| Alternative – Hospital | 0 = Else, 1 = Hospital | |
| Alternative – other | 0 = Else, 1 = Other alternative | |

Choice of regression families

In this exercise, it was important to compare both the signs and sizes of coefficients (and sizes of associated standard errors) from the different regression models. Both costs per patient and outcome variables were skewed and heteroscedastic in their residuals. We chose the GLM as it is able to simultaneously deal with both problems [38, 39]. We also used log-transformation, where the natural log of the dependent variable was obtained [40], as another method for dealing with the skewed cost data despite several limitations associated with this approach [41, 42]. For the cost models, therefore, a decision was made for the GLM to be used for both the complete cases and the multiply imputed datasets while a log transformed cost per patient was used in the Heckman regression model. As the exponentiated coefficients from the GLM model have been shown to be easily comparable to the exponentiated counterparts obtained from a log-transformed model [43], the results of all the cost per patient models are presented in terms of exponentiated coefficients.

Figure 1 Flow chart showing the data used in the analyses. The chart starts from 2,253 observations (patients) in the ‘National evaluation of costs and outcomes of Intermediate Care’ dataset. Cost model: 1,536 obs with missing data on any of the independent variables were excluded, leaving 717 obs of which 125 had missing data on the cost variable only (complete case GLM on 592 obs assuming MCAR; GLM on 717 obs after MI of the 125 missing costs assuming MAR; Heckman model on 717 obs with 125 ‘censored’ assuming MNAR). Outcome models: 1,148 obs with missing data on any of the independent variables were excluded, leaving 1,105 obs of which 417 had missing data on the EQ-5D variable only and 392 on the Barthel variable only (complete case OLS on 688 and 713 obs, respectively, assuming MCAR; OLS on 1,105 obs after MI assuming MAR; Heckman models on 1,105 obs with 417 and 392 ‘censored’, respectively, assuming MNAR). obs = observations; MI = multiple imputation; GLM = Generalised linear model; OLS = Ordinary least squares; MCAR = missing completely at random; MAR = missing at random; MNAR = missing not at random.

A different approach was taken for the health-outcome dependent variables (ΔEQ-5D and ΔBarthel). This was because these variables also had negative values. As a result, log transformation of these variables would have required the use of a shift factor and the transformed variables would then have had to be appropriately retransformed once the results of the model had been obtained. However, for ease of analysis and comparison, a decision was made to use the raw scale of these variables.
As a result, OLS regressions were used for both the ΔEQ-5D and ΔBarthel in the regression on complete cases and on multiply imputed datasets. Further, OLS regressions on a raw scale have also been widely used for modelling such outcome data in the literature [44]. The raw scale of the two variables was also used in the Heckman selection models.

Approaches for dealing with the missing data

For our two samples (n = 717 and n = 1,105) obtained from the ICNET dataset, three methods, each assuming either MCAR, MAR or MNAR, were used. A regression framework was employed in the analysis and in general, the regression relationship between the outcomes of interest and the independent variables could be illustrated as [45]:

\[ Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \mu_i \]

where $Y_i$ denotes the outcome of interest (cost per patient, ΔEQ-5D or ΔBarthel) for the $i$th individual, $\beta_0, \dots, \beta_k$ are the coefficients, $X_{1i}, \dots, X_{ki}$ are the explanatory variables (both single and interaction terms) for the $i$th individual and $\mu_i$ is the stochastic error term for the $i$th individual.

A total of six sets of regression models (two for each method) were conducted:

Method 1 involved running regression models on complete cases (assuming that data were MCAR). A GLM was used to explain variation in ‘cost per patient’ while OLS models were run for cases where the dependent variables were ΔEQ-5D and ΔBarthel. Pairwise deletion, implying the use of all available data on the particular variables specified in each model, was the method used to arrive at the samples modelled as complete cases. As a result, disparate sample sizes of 592, 688 and 713 observations for the cost per patient, ΔEQ-5D and ΔBarthel models, respectively, were used (see Figure 1).

Method 2 involved running GLM and OLS regression models again to explain variation in costs per patient and outcomes (ΔEQ-5D and ΔBarthel), respectively. Here, however, we used multiply imputed datasets (assuming that data were MAR) based on a multivariate normal model [1]. Up to about 38% of the data were missing and multiply imputed datasets were created to account for these missing data before running GLM and OLS regression models. These analyses focussed on imputing values for the dependent variables where the independent variables were not missing, thereby creating complete datasets, i.e. 717 observations for the cost per patient model and 1,105 observations for the ΔEQ-5D and ΔBarthel models. The rationale for this particular imputation was to allow for direct comparison between the results obtained using this method and those produced by method 3 (described below), a comparison which required that essentially the same samples were analysed. Five sets of imputations were created following conventional practice [11]. Since up to 38% of the data were missing, these imputations led to point estimates that were at least $(1 + 0.38/5)^{-1} \approx 0.93$, i.e. 93%, as efficient as those based on $m = \infty$ imputations [1].

In method 3, Heckman selection models (assuming that missing data were MNAR) were run on the log of ‘cost per patient’, on ΔEQ-5D and ΔBarthel using ‘complete cases’. Whereas method 1 only considered cases where there was no missing data in either the dependent variable or the independent variables, method 3 considers all subjects including those that had missing cost, EQ-5D or Barthel information. The sample selection used a dummy variable equal to 1 if the dependent variable was not missing and equal to 0 if it was. Using this classification, 125 out of 717 observations were censored (missing) for the cost per patient model while 417 and 392, out of 1,105 observations, were censored for the ΔEQ-5D and ΔBarthel models, respectively.

Multiple imputations were conducted in NORM [46] while the rest of the analyses were done in STATA version 8.2 [47].

Results

The results of the above analyses are presented in Table 2 for the costs per patient models and Tables 3 and 4 for the ΔEQ-5D and ΔBarthel models, respectively.
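The relative-efficiency figure quoted in the Methods for five imputations can be checked with a few lines of arithmetic. The sketch below simply evaluates the expression $(1 + \gamma/m)^{-1}$ given in the text, where $\gamma$ is the fraction of missing information and $m$ the number of imputations. The function name and the extra values of `m` are illustrative assumptions, not part of the original analysis, which was carried out in NORM and STATA.

```python
def relative_efficiency(gamma, m):
    """Efficiency of an estimate based on m imputations, relative to m = infinity.

    gamma: fraction of missing information (between 0 and 1)
    m:     number of imputed datasets (at least 1)
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 / (1.0 + gamma / m)


if __name__ == "__main__":
    # gamma = 0.38 and m = 5 are the values quoted in the text;
    # the other values of m are shown only for comparison.
    for m in (3, 5, 10, 20):
        print(f"m = {m:2d}: relative efficiency = {relative_efficiency(0.38, m):.3f}")
```

With 38% missing information the gain from going beyond five imputations is small on this measure, which is the usual argument for the conventional choice of five.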
Cost per patient models

The results of the GLM regression model on complete cases (method 1) and GLM regression model on multiply imputed datasets (method 2) are similar. As shown in Table 2, significant predictors of cost per patient were the Barthel score at admission, IC function (acute admission avoidance service or not), type of IC (residential or not), whether one completed an IC episode, other IC outcome and whether the source of referral was primary care. All of the variables that were found to be significant in method 2 were also significant in method 1 with the exception of one (acute admission avoidance service) which was significant in method 2 only. Also, the sizes of the coefficients for nearly all of these variables differed by less than 3.4%, except the one for ‘completed IC episode’ which differed by about 14.4%. The sizes of the standard errors were also similar. Further, the variables significant in both models had the same direction of influence on costs per patient. On the other hand, the results obtained from the Heckman selection regression model (method 3) were markedly different. Many more variables were found to be non-significant, with only two variables (Barthel score at admission and acute admission avoidance service) shown to significantly influence costs per patient. The sizes of the coefficients in the Heckman model were also different from those of the other two methods. For instance, the coefficient for ‘acute admission avoidance service’ was about 730% bigger than that obtained in method 2 (6.723 compared with 0.812). The Mills ratios were −3.402 and −4.506 for the Heckman selection models with and without interactions, respectively. These were both statistically significant at the 5% level of significance.

Table 2 Comparison of results from three methods of regression analysis of costs per patient

[1] GLM on complete cases, n = 592 (a); [2] GLM on MI dataset, n = 717 (b); [3] Heckman on complete cases, n = 717, 125 obs censored (c)

| Variables | [1] Exp (Coeff) | [1] S.E. | [2] Exp (Coeff) | [2] S.E. | [3] Exp (Coeff) | [3] S.E. |
|---|---|---|---|---|---|---|
| Episode Characteristics | | | | | | |
| Age in 2003 | 0.996 | 0.003 | 0.997 | 0.002 | 1.000 | 0.012 |
| Gender | 0.982 | 0.063 | 1.009 | 0.060 | 1.085 | 0.281 |
| Lives alone | 1.052 | 0.059 | 1.047 | 0.056 | 1.106 | 0.275 |
| Barthel score at admission | 0.973 | 0.009** | 0.984 | 0.008* | 0.884 | 0.061* |
| EQ5D score at admission | 0.973 | 0.090 | 0.935 | 0.087 | 1.400 | 0.435 |
| Descriptors of IC Service | | | | | | |
| Acute Admission Avoidance Service | 0.930 | 0.129 | 0.812 | 0.092* | 6.723 | 0.960* |
| Type of IC | 3.181 | 0.079** | 3.150 | 0.070** | 5.146 | 1.274 |
| Transferred before end of IC episode | 1.144 | 0.310 | 1.259 | 0.258 | 1.316 | 1.422 |
| Completed IC episode | 2.094 | 0.300* | 2.396 | 0.248** | 4.611 | 1.318 |
| Other IC Outcome | 2.703 | 0.337** | 2.796 | 0.287** | 4.374 | 1.475 |
| Patient Died (Reference Group) | | | | | | |
| Descriptors of IC-related Services | | | | | | |
| Referral – Primary | 0.777 | 0.123* | 0.764 | 0.121* | 0.936 | 0.576 |
| Referral – Hospital | 0.914 | 0.158 | 0.777 | 0.134 | 4.523 | 0.930 |
| Referral – Other | 1.001 | 0.212 | 0.935 | 0.195 | 2.240 | 0.984 |
| Referral – Social Workers (Reference Group) | | | | | | |
| Alternative to IC – Other | 1.053 | 0.079 | 1.058 | 0.077 | 0.508 | 0.451 |
| Alternative to IC – Home | 1.121 | 0.074 | 1.058 | 0.070 | 1.112 | 0.329 |
| Alternative to IC – Hospital (Reference Group) | | | | | | |
| Interactions | | | | | | |
| Barthel score at admission*Type of IC | 1.031 | 0.018 | 1.017 | 0.097 | 1.131 | 0.092 |
| Acute Admission Avoidance Service*Type of IC | 1.214 | 0.163 | 1.217 | 0.136 | 0.579 | 0.752 |
| Transfer before IC end*Type of IC | 1.145 | 0.185 | 1.176 | 0.169 | 1.145 | 0.825 |
| Completed Episode*Type of IC | 1.152 | 0.195 | 1.112 | 0.162 | 0.240 | 0.952 |
| Other IC Outcome*Type of IC | 0.717 | 0.708 | 0.583 | 0.534 | 0.773 | 2.846 |
| Patient died*Type of IC (Reference group) | | | | | | |
| _constant | 1140.3 | 0.421** | 951.5 | 0.360** | 345.3 | 1.866** |

N: 592, 717, 717; Censored obs: 125; R-Squared: 0.359, 0.634; Rho: 0.950.
* 5% level of significance, ** 1% level of significance; Dependent variable: cost per patient for GLM and log of cost per patient for Heckman selection model; IC = Intermediate care; (a) Method 1 assumes that missing data are MCAR; (b) Method 2 assumes that missing data are MAR; (c) Method 3 assumes that missing data are MNAR.

Change in EQ-5D models

Here, the results from all three models/methods were broadly similar (Table 3). Significant predictors of ΔEQ-5D were gender, Barthel score at admission, EQ-5D score at admission, IC function, duration of service provision and the likely alternative had IC not been available (home and other alternative). Nearly all of the variables that were significant in one model were also significant in the other models. The only exceptions were the ‘duration of service provision’ and ‘Alternative to IC – Other*Type of IC’ (both only significant in method 2), ‘acute admission avoidance service’ (only significant in method 3) and ‘Alternative to IC – Home’ (significant only in methods 1 and 3). The sizes of the coefficients of variables commonly significant in all models differed at most by about 22% with the standard errors differing at most by 42% (Table 3). Further, the variables significant in all three models had the same direction of influence on the change in EQ-5D.

Table 3 Comparison of results from three methods of regression analysis (Change in EQ5D)

[1] OLS on complete cases, n = 688 (a); [2] OLS on MI dataset, n = 1105 cases (b); [3] Heckman on complete cases, n = 1105, 417 obs censored (c)

| Variables | [1] Coeff | [1] S.E. | [2] Coeff | [2] S.E. | [3] Coeff | [3] S.E. |
|---|---|---|---|---|---|---|
| Episode Characteristics | | | | | | |
| Age in 2003 | 0.000 | 0.001 | 0.000 | 0.001 | 0.000 | 0.001 |
| Gender | 0.046 | 0.022* | 0.051 | 0.018** | 0.054 | 0.024* |
| Lives alone | 0.020 | 0.020 | 0.015 | 0.017 | 0.029 | 0.023 |
| Barthel score at admission | 0.017 | 0.003** | 0.017 | 0.002** | 0.016 | 0.003** |
| EQ5D score at admission | −0.495 | 0.033** | −0.479 | 0.026** | −0.484 | 0.037** |
| Descriptors of IC Service | | | | | | |
| Acute Admission Avoidance Service | −0.038 | 0.027 | −0.017 | 0.021 | 0.156 | 0.042** |
| Duration of Service Provision | 0.000 | 0.000 | 0.001 | 0.000* | 0.000 | 0.000 |
| Descriptors of IC-related Services | | | | | | |
| Referral – Primary | −0.031 | 0.052 | −0.044 | 0.043 | −0.020 | 0.058 |
| Referral – Hospital | −0.098 | 0.051 | −0.053 | 0.042 | 0.020 | 0.059 |
| Referral – Other | −0.003 | 0.078 | 0.059 | 0.065 | 0.013 | 0.086 |
| Referral – Social Workers (Reference Group) | | | | | | |
| Alternative to IC – Other | −0.063 | 0.031* | −0.077 | 0.025** | −0.077 | 0.030* |
| Alternative to IC – Home | −0.045 | 0.023* | −0.028 | 0.019 | −0.046 | 0.022* |
| Alternative to IC – Hospital (Reference Group) | | | | | | |
| Interactions | | | | | | |
| Gender*Type of IC | −0.048 | 0.053 | −0.027 | 0.037 | −0.057 | 0.053 |
| Barthel score at admission*Type of IC | 0.003 | 0.004 | −0.002 | 0.003 | 0.004 | 0.004 |
| EQ5D score at admission*Type of IC | −0.098 | 0.083 | 0.061 | 0.057 | −0.118 | 0.082 |
| Acute Admission Avoidance Service*Type of IC | 0.110 | 0.064 | 0.039 | 0.039 | 0.086 | 0.063 |
| Alternative to IC – Other*Type of IC | 0.137 | 0.084 | 0.133 | 0.059* | 0.140 | 0.082 |
| Alternative to IC – Home*Type of IC | 0.086 | 0.106 | −0.027 | 0.049 | 0.070 | 0.104 |
| Alternative to IC – Hospital*Type of IC (Reference Group) | | | | | | |
| _constant | 0.157 | 0.101 | 0.093 | 0.084 | 0.100 | 0.105 |

N: 688, 1,105, 688; Censored obs: 417; R-Squared: 0.284, 0.266, 0.634; Rho: 0.950.
* 5% level of significance, ** 1% level of significance; Dependent variable: change in EQ-5D; IC = Intermediate care; (a) Method 1 assumes that missing data are MCAR; (b) Method 2 assumes that missing data are MAR; (c) Method 3 assumes that missing data are MNAR.

The Mills ratios were −0.284 and −0.143 for the Heckman selection models with and without interactions, respectively.
These were both statistically significant at the 5% level of significance.

Change in Barthel models

As in the ‘change in EQ-5D’ models, the results obtained from all three models/methods for the change in Barthel were broadly similar (Table 4). Significant predictors of ΔBarthel were the Barthel score at admission, IC function, outcome of IC episode (completed and other), the likely alternative had IC not been available (home and other) and an interaction term between the likely alternative had IC not been available and type of IC. All of the variables that were significant in one model were also significant in the other models, with the exception of ‘acute admission avoidance service’ (significant only in method 3), ‘Alternative to IC – Home*Type of IC’ (significant only in methods 1 and 3) and ‘Other IC Outcome’ (significant only in methods 1 and 2). However, the differences in the sizes of coefficients and standard errors of variables significant in all methods were larger in these models than in the ‘change in EQ-5D’ models. They differed at most by about 54% and 322% for coefficients and standard errors, respectively. The variables significant in all three models had the same direction of influence on the change in Barthel. The Mills ratios were −1.662 and −0.101 for the Heckman selection models with and without interactions, respectively. These were both statistically significant at the 5% level of significance.

Table 4 Comparison of results from three methods of regression analysis (Change in Barthel)
[1]a = OLS on complete cases, n = 712; [2]b = OLS on MI dataset, n = 1105 cases; [3]c = Heckman on complete cases, n = 1105, 392 obs censored

| Variables | [1] Coeff | [1] S.E. | [2] Coeff | [2] S.E. | [3] Coeff | [3] S.E. |
|---|---|---|---|---|---|---|
| Episode Characteristics | | | | | | |
| Age in 2003 | −0.010 | 0.009 | −0.009 | 0.007 | −0.011 | 0.009 |
| Gender | −0.007 | 0.208 | 0.097 | 0.164 | 0.037 | 0.218 |
| Lives alone | 0.225 | 0.190 | 0.181 | 0.150 | 0.320 | 0.202 |
| Barthel score at admission | −0.318 | 0.028** | −0.325 | 0.022** | −0.305 | 0.030** |
| EQ5D score at admission | −0.343 | 0.312 | −0.428 | 0.239 | −0.216 | 0.328 |
| Descriptors of IC Service | | | | | | |
| Acute Admission Avoidance Service | 0.103 | 0.218 | 0.060 | 0.167 | 0.728 | 0.337* |
| Duration of Service Provision | 0.008 | 0.003* | 0.011 | 0.003** | 0.006 | 0.003* |
| Descriptors of IC-related Services | | | | | | |
| Transfer before IC end | 4.084 | 2.452 | 0.559 | 0.607 | 2.713 | 2.348 |
| Completed Episode | 7.438 | 2.440** | 3.443 | 0.587** | 4.926 | 2.478* |
| Other IC Outcome | 6.640 | 2.477** | 2.921 | 0.656** | 4.727 | 2.432 |
| Patient died (Reference Group) | | | | | | |
| Alternative to IC – Other | −1.130 | 0.291** | −1.076 | 0.221** | −1.267 | 0.291** |
| Alternative to IC – Home | −0.709 | 0.223** | −0.667 | 0.169** | −0.669 | 0.219** |
| Alternative to IC – Hospital (Reference Group) | | | | | | |
| Interactions | | | | | | |
| Barthel score at admission*Type of IC | −0.071 | 0.050 | −0.035 | 0.027 | −0.072 | 0.051 |
| Acute Admission Avoidance Service*Type of IC | 0.592 | 0.575 | 0.131 | 0.354 | 0.599 | 0.589 |
| Duration of Service Provision*Type of IC | −0.006 | 0.008 | 0.001 | 0.006 | −0.006 | 0.008 |
| Transfer before IC end*Type of IC | −0.299 | 0.979 | −0.160 | 0.411 | −0.300 | 0.962 |
| Completed Episode*Type of IC | 1.053 | 0.830 | 0.374 | 0.424 | 1.055 | 0.816 |
| Other IC Outcome*Type of IC | 0.189 | 2.000 | 0.072 | 0.889 | 0.200 | 1.980 |
| Patient died*Type of IC (Reference Group) | | | | | | |
| Alternative to IC – Other*Type of IC | 0.796 | 0.793 | 0.968 | 0.543 | 0.795 | 0.778 |
| Alternative to IC – Home*Type of IC | 2.261 | 1.025* | 0.124 | 0.447 | 2.261 | 1.006* |
| Alternative to IC – Hospital*Type of IC (Reference Group) | | | | | | |
| _constant | 0.046 | 2.536 | 3.888 | 0.843 | 2.687 | 2.592 |
| N | 713 | | 1,105 | | 713 | |
| Censored obs | | | | | 392 | |
| R-Squared | 0.278 | | | | 0.634 | |
| Rho | | | | | 0.950 | |

* 5% level of significance, ** 1% level of significance. Dependent variable: change in Barthel. IC = Intermediate care.
a Method 1 assumes that missing data are MCAR; b Method 2 assumes that missing data are MAR; c Method 3 assumes that missing data are MNAR.

Discussion

The ICNET dataset had up to 42% and 38% of the data on EQ-5D and Barthel scores, respectively, missing while 31% of the sample had missing cost data. Of the other variables in the dataset, all but one (type of IC) had missing data, ranging from 3 to 18%. This situation is common to a vast number of health service research datasets for older people. If these missing data are simply ignored, biased and underpowered results may be obtained [1, 48]. The most appropriate method of dealing with this amount of missingness therefore had to be determined [19, 49]. The results of this analysis have shown that, when choosing methods to deal with missing data, it is more appropriate to use extra information about the cause of missingness gathered during the data collection exercise than to select such methods arbitrarily. There is, however, a need to carry out similar analyses in datasets based on individuals with different characteristics in order to discount the effect that attributes specific to this dataset, such as the age of respondents, may have had on these results.

The evidence gathered concerning the missing cost data strongly suggested MCAR as the reason for this missingness. When MAR-based methods were used for these data, the results obtained were not significantly different from those based on the MCAR assumption. These findings seem to bear out the position held by Schafer and Graham [50] and David et al. [51] that in many realistic applications, departures from MAR are not big enough to effectively invalidate the results of an MAR-based analysis.
A similar position was arrived at by Foster and Fang [8], who found that estimates based on listwise deletion (assuming MCAR) and those based on MI and ignorable maximum likelihood estimation (both assuming MAR) were comparable. The use of an MNAR-based method in the costs per patient model yielded results that were markedly different from those obtained when either MCAR or MAR was assumed. In particular, fewer significant variables were obtained in the MNAR-based method while, similar to the study by Foster and Fang [8], the sizes of the coefficients were larger. Therefore, different conclusions could potentially be reached if the MNAR assumption was made for the missing cost data. Care must therefore be taken not to apply MNAR-based methods when it is not absolutely clear that the missing data are MNAR, as MNAR approaches often require assumptions that cannot be validated from the data at hand [52]. MNAR-based approaches are best implemented as sensitivity analyses so as to assess how robust results are across different analytic approaches [53].

All three mechanisms of missingness were shown to be potential causes of the missing EQ-5D and Barthel data. When observations in the dependent variable are MAR while the independent variables are complete, Little [54] posits that the incomplete cases contribute no information to the regression where such a dependent variable is modelled. While some, as a consequence, have deleted cases with missing values on the dependent variable, an approach that effectively reduces to a complete case (regression) analysis [55], others have used imputed values of the dependent variable in subsequent regression analyses [16]. In this study, we did both despite the fact that we did not have outcome data that were purely MAR. The results from the ΔEQ-5D and ΔBarthel models show that the choice of mechanism did not have a very significant effect on the results.
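As a rough illustration of how coefficient sizes can be compared across the three methods, the sketch below takes the rounded Table 3 coefficients for the variables that are significant under all three methods and reports the widest relative gap and whether the signs agree. The percentage-difference definition used here (largest minus smallest magnitude, relative to the smallest) is an assumption, since the paper does not state its formula, and the function names are hypothetical. Because the inputs are rounded to three decimals, the output is only approximate.

```python
# Rounded coefficients copied from Table 3 (change in EQ-5D) for the variables
# that are significant under all three methods: (method 1, method 2, method 3).
TABLE3_COEFFICIENTS = {
    "Gender": (0.046, 0.051, 0.054),
    "Barthel score at admission": (0.017, 0.017, 0.016),
    "EQ5D score at admission": (-0.495, -0.479, -0.484),
    "Alternative to IC - Other": (-0.063, -0.077, -0.077),
}


def max_pct_difference(values):
    """Largest relative gap between methods, in percent.

    Assumed definition (not stated in the paper): the spread between the
    largest and smallest magnitude, relative to the smallest magnitude.
    """
    magnitudes = [abs(v) for v in values]
    smallest, largest = min(magnitudes), max(magnitudes)
    return 100.0 * (largest - smallest) / smallest


def same_sign(values):
    """True when every method gives the same direction of influence."""
    return all(v > 0 for v in values) or all(v < 0 for v in values)


def summarise(table):
    """Return (variable with the widest gap, that gap in percent, signs agree)."""
    gaps = {name: max_pct_difference(vals) for name, vals in table.items()}
    widest = max(gaps, key=gaps.get)
    signs_agree = all(same_sign(vals) for vals in table.values())
    return widest, gaps[widest], signs_agree


if __name__ == "__main__":
    variable, gap, signs_agree = summarise(TABLE3_COEFFICIENTS)
    print(f"Widest coefficient gap: {variable} ({gap:.1f}%)")
    print(f"Same direction of influence in all methods: {signs_agree}")
```

Under this assumed definition the widest gap is close to the 22% quoted in the results for the ΔEQ-5D models. Standard errors rounded to three decimals are too coarse for the same exercise, so they are left out of the sketch.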
Despite the sizes of the coefficients and standard errors being somewhat different, the results from all three methods were broadly comparable in that similar conclusions could have been reached on the back of running the models. A possible explanation for this may be that the reason for missing data could be ascribed to any one of the three mechanisms of missingness or indeed a combination of these mechanisms. Croninger and Douglas [7] also assert that MCAR and MAR-based methods are relatively robust if the sample size is modestly large even when missing data are MNAR. While the extra information gathered during the data collection process supported the assertion that the missing data were either MCAR, MAR or MNAR, the significant Mills ratios lent additional support to the MNAR assumption, as their significance in the selection models indicated the presence of significant selection bias. However, selection models, even though identifiable, should be treated with caution, especially when data are possibly not MNAR [56].

In this study, there were limitations in terms of accurately determining the reasons for the missing data, as this determination relied on the views of IC coordinators, investigators’ observations and some statistical analyses carried out on the ICNET dataset. Determinations based on this extra information were not definitive. Further, this information was only available for dependent variables. A more formal way of collecting this extra information may include adding questions, within the main data collection instrument, about why these data are missing, and this should be done for both dependent and independent variables. A critical evaluation of the responses to these questions will help inform the process of identifying the missingness mechanism. In the absence of hypothesis testing, however, this extra information provided the best insights into why the missing data were not collected. In addition, the exclusion of missing observations in the independent variables may have altered the missing data mechanisms. This, however, would mainly apply to cases where data were MAR. As the MAR mechanism was premised mainly on EQ-5D and Barthel, for which the missing observations were kept as low as possible, the probability of alterations in the missingness mechanisms was minimised. Finally, the use of untransformed OLS models for ΔEQ-5D and ΔBarthel in the presence of the skewed nature of the two variables could have potentially led to biased results. Tests of skewness performed on the variables, however, showed a low level of skewness (p values from the Shapiro-Wilk test for ΔEQ-5D and ΔBarthel were 0.047 and 0.042, respectively), implying that any bias resulting from the use of untransformed OLS models would also be minimal.

Conclusions

Many studies have emphasised the importance of determining the mechanism behind missing data before deciding on the technique to use [19, 49, 57]. This paper considered three different mechanisms that may be responsible for missing data and then discussed approaches that can be used to deal with the missing data. The results from this analysis suggest that the methods used to analyse missing data really do matter, especially when one is considering whether or not to use MNAR-based methods. Dealing with missing data is not easy, especially as the hypothesis-based techniques for detecting the pattern of missingness are limited in that they can only be used to rule out MCAR but cannot confirm this mechanism. Further, there are no hypothesis-test-based techniques available for determining if data are MAR or MNAR in cases where the missing data are irrecoverable.
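The asymmetry noted above, that a test can rule out MCAR but cannot confirm it, can be illustrated with a small simulation. The sketch below is not the authors' procedure: it uses a simple permutation test as a stand-in for the hypothesis-based techniques mentioned, on simulated data, and every name and parameter in it (the admission-score covariate, the missingness probabilities, the function names) is hypothetical. It compares the mean of a fully observed covariate between cases whose outcome is missing and cases whose outcome is observed.

```python
import random
from statistics import fmean


def simulate_covariate(mechanism, n=400, seed=1):
    """Simulate a fully observed covariate and split it by outcome missingness.

    Hypothetical set-up: under "MCAR" every case has the same chance of a
    missing outcome; otherwise (MAR) low-scoring cases are more likely to
    have a missing outcome.
    """
    rng = random.Random(seed)
    observed, missing = [], []
    for _ in range(n):
        score = rng.gauss(10.0, 4.0)  # e.g. a functional score at admission
        if mechanism == "MCAR":
            p_missing = 0.4
        else:
            p_missing = 0.7 if score < 10.0 else 0.1
        if rng.random() < p_missing:
            missing.append(score)
        else:
            observed.append(score)
    return observed, missing


def permutation_p_value(observed, missing, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in covariate means."""
    rng = random.Random(seed)
    pooled = list(observed) + list(missing)
    n_obs = len(observed)
    actual = abs(fmean(observed) - fmean(missing))
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_obs]) - fmean(pooled[n_obs:]))
        if diff >= actual:
            extreme += 1
    # The +1 terms keep the p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR"):
        obs, mis = simulate_covariate(mechanism)
        p = permutation_p_value(obs, mis)
        print(f"{mechanism}: p = {p:.4f}")
```

A small p-value is evidence against MCAR, because missingness then depends on an observed variable. A large p-value does not establish MCAR, and no comparison of this kind can separate MAR from MNAR, since that distinction depends on the unobserved values themselves.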
Assumptions about the missingness mechanism should therefore not be selected arbitrarily; using extra information gathered during the data collection exercise about the cause of missingness to guide this selection would be more appropriate. In the absence of this extra information, one of the MAR-based methods could be considered, as these were shown in this study and elsewhere to be robust for use even in cases where data are not strictly MAR.

Competing interests

The authors declare that they have no competing interests.

Acknowledgments

We are grateful to colleagues from the Universities of Birmingham and Leicester who participated in the National Evaluation of Intermediate Care Services from which data used in this study were obtained. We are also thankful to the intermediate care coordinators and the staff from the case-study sites who provided the quantitative data and clarified follow-up questions. The National Evaluation was funded by the Department of Health (Policy Research Programme) and the Medical Research Council. General research funding for BK is provided through UK Department of Health grants; LB is supported by Cancer Research UK and the Medical Research Council (grant number G0800808). The funders were not involved in the study design, in the writing of the manuscript or in the decision to submit the manuscript for publication.

Author details

1 Health Economics Unit, Public Health Building, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom.
2 Centre for Clinical Epidemiology and Evaluation, University of British Columbia, Research Pavilion 702-828 West 10th Ave, Vancouver, Canada.
3 Cancer Research UK Clinical Trials Unit (CRCTU), University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom.
4 MRC Midland Hub for Trials Methodology Research, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom.

Authors’ contributions

BK undertook the econometric analyses and wrote the first draft of the paper.
Subsequent drafts were contributed to by SB and LB who have approved the final version. BK will act as guarantor. All authors read and approved the final manuscript.

Received: 10 April 2012 Accepted: 27 June 2012 Published: 27 June 2012

References

1. Schafer JL: Analysis of Incomplete Multivariate Data. London: Chapman & Hall; 1997.
2. Biglan A, Severson H, Ary D, Faller C, Gallison C, Thompson R, et al: Do smoking prevention programs really work? Attrition and the internal and external validity of an evaluation of a refusal skills training program. J Behav Med 1987, 10:159–171.
3. Rubin DB: Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons; 1987.
4. Dow MM, Anthon Eff E: Multiple Imputation of Missing Data in Cross-Cultural Samples. Cross-Cultural Research 2009, 43:206–229.
5. Barry AE: How attrition impacts the internal and external validity of longitudinal research. J Sch Health 2005, 75:267–270.
6. Little RJA, Rubin DB: Statistical Analysis with Missing Data. New York: John Wiley; 1987.
7. Croninger RG, Douglas KM: Missing Data and Institutional Research. In Survey research. Emerging issues. New directions for institutional research #127. Edited by Umbach PD. San Francisco: Jossey-Bass; 2005:33–50.
8. Foster EM, Fang GY: Alternative methods for handling attrition: an illustration using data from the Fast Track evaluation. Eval Rev 2004, 28:434–464.
9. Kmetic A, Joseph L, Berger C, Tenenhouse A: Multiple imputation to account for missing data in a survey: estimating the prevalence of osteoporosis. Epidemiology 2002, 13:437–444.
10. Allison P: Missing data. Thousand Oaks, CA: Sage; 2000.
11. Schafer JL: Multiple imputation: a primer. Stat Methods Med Res 1999, 8:3–15.
12. Fielding S, Fayers P, Ramsay C: Predicting missing quality of life data that were later recovered: an empirical comparison of approaches. Clin Trials 2010, 7:333–342.
13. Raymond MR, Roberts DM: A comparison of methods for treating incomplete data in selection research. Educational and Psychological Measurement 1987, 47:13–26.
14. Allison PD: Multiple imputation for missing data: a cautionary tale. Sociological Methods and Research 2000, 28:301–309.
15. Hedeker D, Gibbons RD: Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods 1997, 2:64–78.
16. Schafer JL, Olsen MK: Multiple imputation for multivariate missing-data problems: A data analyst’s perspective. Multivariate Behavioral Research 1998, 33:545–571.
17. Heckman JJ: The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement 1976, 5:475–492.
18. Heitjan DF: Annotation: what can be done about missing data? Approaches to imputation. Am J Public Health 1997, 87:548–550.
19. Curran D, Bacchi M, Schmitz SF, Molenberghs G, Sylvester RJ: Identifying the types of missingness in quality of life data from clinical trials. Stat Med 1998, 17:739–756.
20. McKnight PE, McKnight KM, Sidani S, Figueredo AJ: Missing Data: A Gentle Introduction. New York: The Guilford Press; 2007.
21. ICNET: A National Evaluation of the Costs and Outcomes of Intermediate Care for Older People: Final Report. Leicester: The University of Leicester; 2005.
22. Kaambwa B, Bryan S, Barton P, Parker H, Martin G, Hewitt G, et al: Costs and health outcomes of intermediate care: results from five UK case study sites. Health Soc Care Community 2008, 16:573–581.
23. Kaambwa B, Billingham L, Bryan S: Mapping utility scores from the Barthel index. European Journal of Health Economics 2011 Nov 2 [Epub ahead of print].
24. Brazier JE, Walters SJ, Nicholl JP, Kohler B: Using the SF-36 and Euroqol on an elderly population. Qual Life Res 1996, 5:195–204.
25. Coast J, Peters TJ, Richards SH, Gunnell DJ: Use of the EuroQoL among elderly acute care patients. Qual Life Res 1998, 7:1–10.
26. Lyons RA, Crome P, Monaghan S, Killalea D, Daley JA: Health status and disability among elderly people in three UK districts. Age Ageing 1997, 26:203–209.
27. Brazier J, Roberts J, Tsuchiya A, Busschbach J: A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ 2004, 13:873–884.
28. Dolan P: Modeling valuations for EuroQol health states. Med Care 1997, 35:1095–1108.
29. Kind P, Hardman G, Macran S: UK population norms for EQ-5D. Discussion paper 172. York: University of York Centre for Health Economics; 1999.
30. Murphy R, Sackley CM, Miller P, Harwood RH: Effect of experience of severe stroke on subjective valuations of quality of life after stroke. J Neurol Neurosurg Psychiatry 2001, 70:679–681.
31. Sainsbury A, Seebass G, Bansal A, Young JB: Reliability of the Barthel Index when used with older people. Age Ageing 2005, 34:228–232.
32. Minosso JSM, Amendola F, Alvarenga MRM, de Campos Oliveira MA: Validation of the Barthel Index in elderly patients attended in outpatient clinics, in Brazil. Acta Paul Enferm 2010, 23:218–223.
33. Mahoney FI, Barthel D: Functional Evaluation: The Barthel Index. Md State Med J 1965, 14:61–65.
34. Wolfe CD, Taub NA, Woodrow EJ, Burney PG: Assessment of scales of disability and handicap for stroke patients. Stroke 1991, 22:1242–1244.
35. Shah S, Vanclay F, Cooper B: Improving the sensitivity of the Barthel Index for stroke rehabilitation. J Clin Epidemiol 1989, 42:703–709.
36. Musil CM, Warner CB, Yobas PK, Jones SL: A comparison of imputation techniques for handling missing data. West J Nurs Res 2002, 24:815–829.
37. Fielding S, Fayers PM, Ramsay CR: Investigating the missing data mechanism in quality of life outcomes: a comparison of approaches. Health Qual Life Outcomes 2009, 7:57.
38. McCullagh P, Nelder JA: Generalized linear models. 2nd edition. London: Chapman & Hall; 1989.
39. Manning WG, Mullahy J: Estimating log models: to transform or not to transform? J Health Econ 2001, 20:461–494.
40. Altman D: Practical statistics for medical research. 2nd edition. London: Chapman & Hall; 1991.
41. Cantoni E, Ronchetti E: A robust approach for skewed and heavy-tailed outcomes in the analysis of health care expenditures. J Health Econ 2006, 25:198–213.
42. Duan N: Smearing estimate: a nonparametric retransformation method. J Amer Statist Assoc 1983, 78:605–610.
43. Kilian R, Matschinger H, Loeffler W, Roick C, Angermeyer MC: A comparison of methods to handle skew distributed cost variables in the analysis of the resource consumption in schizophrenia treatment. J Ment Health Policy Econ 2002, 5:21–31.
44. Brazier JE, Yang Y, Tsuchiya A, Rowen DL: A review of studies mapping (or cross walking) non-preference based measures of health to generic preference-based measures. Eur J Health Econ 2010, 11:215–225.
45. Gujarati D: Basic Econometrics. 3rd edition. New York: McGraw-Hill, Inc; 1995.
46. Schafer JL: NORM: Multiple imputation of incomplete multivariate data under a normal model version 2. Software for Windows 95/98/NT. [http://www.stat.psu.edu/jls/misoftwa.html].
47. StataCorp LP: Intercooled Stata 8.2 for Windows. College Station, TX, US: StataCorp LP; 2004.
48. Roderick P, Low J, Day R, Peasgood T, Mullee MA, Turnbull JC, et al: Stroke rehabilitation after hospital discharge: a randomized trial comparing domiciliary and day-hospital care. Age Ageing 2001, 30:303–310.
49. Cohen J, Cohen P: Applied multiple regression/correlation analysis for the behavioral sciences. 2nd edition. Hillsdale, NJ: Erlbaum; 1983.
50. Schafer JL, Graham JW: Missing data: our view of the state of the art. Psychological Methods 2002, 7:147–177.
51. David M, Little RJA, Samuhel ME, Triest RK: Alternative Methods for CPS Income Imputation. Journal of the American Statistical Association 1986, 81:29–41.
52. Verbeke G, Molenberghs G: Linear Mixed Models for Longitudinal Data. New York: Springer; 2000.
53. Mallinckrodt CH, Sanger TM, Dube S, DeBrota DJ, Molenberghs G, Carroll RJ, et al: Assessing and interpreting treatment effects in longitudinal clinical trials with missing data. Biol Psychiatry 2003, 53:754–760.
54. Little RJA: Regression with missing X’s: a review. Journal of the American Statistical Association 1992, 87:1227–1237.
55. Von Hippel PT: Regression with missing Ys: An improved strategy for analyzing multiply imputed data. Sociological Methodology 2007, 37:83–117.
56. Glynn RJ, Laird NM, Rubin DB: Drawing Inferences from Self-selected Samples. In Selection modelling versus mixture modelling with nonignorable nonresponse. Edited by Wainer H. New York: Springer; 1986:115–142.
57. Orme JG, Reis J: Multiple regression with missing data. Journal of Social Service Research 1991, 9:61–91.

doi:10.1186/1756-0500-5-330
Cite this article as: Kaambwa et al.: Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients. BMC Research Notes 2012 5:330.

