{"@context":{"@language":"en","Affiliation":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","AggregatedSourceRepository":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","Citation":"https:\/\/open.library.ubc.ca\/terms#identifierCitation","CopyrightHolder":"https:\/\/open.library.ubc.ca\/terms#rightsCopyright","Creator":"http:\/\/purl.org\/dc\/terms\/creator","DateAvailable":"http:\/\/purl.org\/dc\/terms\/issued","DateIssued":"http:\/\/purl.org\/dc\/terms\/issued","Description":"http:\/\/purl.org\/dc\/terms\/description","DigitalResourceOriginalRecord":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","FullText":"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note","Genre":"http:\/\/www.europeana.eu\/schemas\/edm\/hasType","IsShownAt":"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt","Language":"http:\/\/purl.org\/dc\/terms\/language","PeerReviewStatus":"https:\/\/open.library.ubc.ca\/terms#peerReviewStatus","Provider":"http:\/\/www.europeana.eu\/schemas\/edm\/provider","Publisher":"http:\/\/purl.org\/dc\/terms\/publisher","PublisherDOI":"https:\/\/open.library.ubc.ca\/terms#publisherDOI","Rights":"http:\/\/purl.org\/dc\/terms\/rights","RightsURI":"https:\/\/open.library.ubc.ca\/terms#rightsURI","ScholarlyLevel":"https:\/\/open.library.ubc.ca\/terms#scholarLevel","Subject":"http:\/\/purl.org\/dc\/terms\/subject","Title":"http:\/\/purl.org\/dc\/terms\/title","Type":"http:\/\/purl.org\/dc\/terms\/type","URI":"https:\/\/open.library.ubc.ca\/terms#identifierURI","SortDate":"http:\/\/purl.org\/dc\/terms\/date"},"Affiliation":[{"@value":"Medicine, Faculty of","@language":"en"},{"@value":"Non UBC","@language":"en"},{"@value":"Obstetrics and Gynaecology, Department of","@language":"en"}],"AggregatedSourceRepository":[{"@value":"DSpace","@language":"en"}],"Citation":[{"@value":"BMC Medical Research Methodology. 
2016 Sep 21;16(1):123","@language":"en"}],"CopyrightHolder":[{"@value":"The Author(s).","@language":"en"}],"Creator":[{"@value":"Schummers, Laura","@language":"en"},{"@value":"Himes, Katherine P","@language":"en"},{"@value":"Bodnar, Lisa M","@language":"en"},{"@value":"Hutcheon, Jennifer A","@language":"en"}],"DateAvailable":[{"@value":"2018-05-15T22:15:15Z","@language":"en"}],"DateIssued":[{"@value":"2016-09-21","@language":"en"}],"Description":[{"@value":"Background:\r\n                Compelled by the intuitive appeal of predicting each individual patient\u2019s risk of an outcome, there is a growing interest in risk prediction models. While the statistical methods used to build prediction models are increasingly well understood, the literature offers little insight to researchers seeking to gauge a priori whether a prediction model is likely to perform well for their particular research question. The objective of this study was to inform the development of new risk prediction models by evaluating model performance under a wide range of predictor characteristics.\r\n              \r\n              \r\n                Methods:\r\n                Data from all births to overweight or obese women in British Columbia, Canada from 2004 to 2012 (n\u2009=\u200975,225) were used to build a risk prediction model for preeclampsia. The data were then augmented with simulated predictors of the outcome with pre-set prevalence values and univariable odds ratios. We built 120 risk prediction models that included known demographic and clinical predictors, and one, three, or five of the simulated variables. 
Finally, we evaluated standard model performance criteria (discrimination, risk stratification capacity, calibration, and Nagelkerke\u2019s r2) for each model.\r\n              \r\n              \r\n                Results:\r\n                Findings from our models built with simulated predictors demonstrated the predictor characteristics required for a risk prediction model to adequately discriminate cases from non-cases and to adequately classify patients into clinically distinct risk groups. Several predictor characteristics can yield well performing risk prediction models; however, these characteristics are not typical of predictor-outcome relationships in many population-based or clinical data sets. Novel predictors must be both strongly associated with the outcome and prevalent in the population to be useful for clinical prediction modeling (e.g., one predictor with prevalence \u226520\u00a0% and odds ratio \u22658, or 3 predictors with prevalence \u226510\u00a0% and odds ratios \u22654). Area under the receiver operating characteristic curve values of\u2009>0.8 were necessary to achieve reasonable risk stratification capacity.\r\n              \r\n              \r\n                Conclusions:\r\n                Our findings provide a guide for researchers to estimate the expected performance of a prediction model before a model has been built based on the characteristics of available predictors.","@language":"en"}],"DigitalResourceOriginalRecord":[{"@value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/65898?expand=metadata","@language":"en"}],"FullText":[{"@value":"RESEARCH ARTICLE Open Access\r\nPredictor characteristics necessary for building a clinically useful risk prediction model: a simulation study\r\nLaura Schummers1*, Katherine P. Himes2, Lisa M. Bodnar3 and Jennifer A. 
Hutcheon4\r\nAbstract\r\nBackground: Compelled by the intuitive appeal of predicting each individual patient\u2019s risk of an outcome, there is a growing interest in risk prediction models. While the statistical methods used to build prediction models are increasingly well understood, the literature offers little insight to researchers seeking to gauge a priori whether a prediction model is likely to perform well for their particular research question. The objective of this study was to inform the development of new risk prediction models by evaluating model performance under a wide range of predictor characteristics.\r\nMethods: Data from all births to overweight or obese women in British Columbia, Canada from 2004 to 2012 (n = 75,225) were used to build a risk prediction model for preeclampsia. The data were then augmented with simulated predictors of the outcome with pre-set prevalence values and univariable odds ratios. We built 120 risk prediction models that included known demographic and clinical predictors, and one, three, or five of the simulated variables. Finally, we evaluated standard model performance criteria (discrimination, risk stratification capacity, calibration, and Nagelkerke\u2019s r2) for each model.\r\nResults: Findings from our models built with simulated predictors demonstrated the predictor characteristics required for a risk prediction model to adequately discriminate cases from non-cases and to adequately classify patients into clinically distinct risk groups. Several predictor characteristics can yield well performing risk prediction models; however, these characteristics are not typical of predictor-outcome relationships in many population-based or clinical data sets. Novel predictors must be both strongly associated with the outcome and prevalent in the population to be useful for clinical prediction modeling (e.g., one predictor with prevalence \u226520 % and odds ratio \u22658, or 3 predictors with prevalence \u226510 % and odds ratios \u22654). 
Area under the receiver operating characteristic curve values of >0.8 were necessary to achieve reasonable risk stratification capacity.\r\nConclusions: Our findings provide a guide for researchers to estimate the expected performance of a prediction model before a model has been built based on the characteristics of available predictors.\r\nKeywords: Epidemiologic methods, Risk prediction model, Discrimination, Risk classification, Model performance, Area under the receiver operating characteristic curve\r\n* Correspondence: lauraschummers@mail.harvard.edu\r\n1Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA\r\nFull list of author information is available at the end of the article\r\n\u00a9 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/) applies to the data made available in this article, unless otherwise stated.\r\nSchummers et al. BMC Medical Research Methodology (2016) 16:123 DOI 10.1186\/s12874-016-0223-2\r\nBackground\r\nGiven the intuitive appeal of individual-level risk prediction, there is growing interest in developing clinical risk prediction models. By tailoring each individual\u2019s estimated risk of an adverse outcome according to their demographic and clinical characteristics, risk prediction models can distinguish high and low risk patients. 
This has the potential to improve health outcomes and reduce health care costs by identifying patients who would benefit from additional diagnostic procedures or treatment options, and those who would not.\r\nWhile the statistical steps used to build prediction models are well described and increasingly well implemented, the literature offers little insight to researchers seeking to gauge whether a prediction model is likely to perform well for their particular research question. Pepe and colleagues [1] have demonstrated previously that a single predictor must have an extremely strong association with the outcome in order to sufficiently improve a model\u2019s ability to discriminate cases from non-cases. However, it is difficult to generalize these findings to studies that aim to collect multiple predictors, ranging in prevalence and strength of association with the outcome. Thus, few researchers know how to assess the likelihood that their prediction model will perform adequately a priori based on the characteristics of the predictors they expect to collect in their study, or the extent to which the addition of novel predictors will improve the performance of existing models.\r\nPredicting an individual woman\u2019s risk of developing preeclampsia in pregnancy, a leading cause of maternal and perinatal morbidity [2], is of particular interest in perinatal epidemiology. Women identified as high risk in early pregnancy may benefit from increased prenatal surveillance, referral to tertiary care centers with high risk specialists in maternal-fetal medicine, or treatment with antiplatelet agents such as aspirin [2, 3]. Ruling out a high risk of preeclampsia would avoid unnecessary surveillance and maternal anxiety [2]. Accordingly, several research groups have built clinical risk prediction models for preeclampsia, each using commonly available demographic and clinical characteristics coupled with novel predictors unique to their data (e.g., biomarkers or imaging studies) [4\u20139]. 
Despite considerable clinical detail in these data sets, none of the models demonstrated sufficient performance for routine use in clinical practice.\r\nUsing the example of preeclampsia, we evaluated performance criteria (discrimination, risk stratification capacity, and calibration) of a model built using standard demographic and clinical predictors. We then augmented this data with simulated predictors ranging in prevalence and strength of association with preeclampsia, and built multiple clinical prediction models that incorporated one, three, or five of these new variables. The objective of this study was to guide the development of new risk prediction models by establishing the performance of risk prediction models under a wide range of predictor prevalence values and univariable odds ratios with the outcome of interest.\r\nMethods\r\nOur study population included all overweight or obese women (body mass index \u226525 kg\/m2) who gave birth to infants weighing at least 500 g or of at least 20 completed weeks of gestation in British Columbia, Canada from April 1, 2004 to March 31, 2012. We restricted to overweight and obese women because they are most likely to receive pre-pregnancy counselling on modifiable risk factors for adverse pregnancy outcomes such as preeclampsia, and are thus a population for whom a risk prediction model might be most useful. Data were obtained from the British Columbia Perinatal Data Registry [10], a high quality population-based data source administered by Perinatal Services BC that contains abstracted linked maternal and newborn antenatal and delivery admission medical record data [11]. Preeclampsia was identified using the International Classification of Diseases Version 10 (ICD-10) codes O11, O13-O16.\r\nWe built a logistic regression model predicting risk of preeclampsia using the following demographic and clinical characteristics: prepregnancy body mass index, maternal height, maternal age, parity, and smoking status. 
This is our \u201coriginal model\u201d. Assumptions of linearity were assessed for continuous variables (prepregnancy BMI, height, and maternal age). Linear, quadratic, categorical, and restricted cubic spline transformations were considered, and the transformation that minimized the Akaike Information Criterion (AIC) was selected. We used a \u2018full model\u2019 variable selection approach, in which all variables expected to predict preeclampsia on a priori grounds were included in the logistic regression model. This method is known to minimize bias that can be introduced by selecting variables according to statistical criteria [12]. Multi-collinearity between predictors was examined using Variance Inflation Factors, with a value >3 as an indicator of multicollinearity.\r\nTo determine the predictor characteristics necessary for a clinical prediction model to perform well in terms of discrimination and risk stratification capacity, we then augmented this \u201coriginal model\u201d with additional simulated predictors of preeclampsia, as might occur with attempts to improve current preeclampsia prediction models by adding biomarkers such as PlGF [13] or other novel predictors, such as sonographic imaging of placental morphology [14] that could potentially improve the performance of existing models. The prevalence values of the simulated predictors were set to 5 %, 10 %, 20 %, or 40 %, and strengths of association (univariable odds ratios) were set to range from 1 to 16. All simulated predictors were binary and independent of one another. We built clinical prediction models that included our original predictors, as well as one, three, or five of the simulated predictors. We then built 120 models to achieve every possible combination of these prevalence and odds ratio values. 
We repeated variable generation and all model building steps for the models built with the new variables 5000 times to account for variability in the random draws from the normal distribution used to create each simulated variable.\r\nEvaluation of model performance\r\nWe evaluated calibration of our models visually by comparing observed versus expected risks and formally using the Hosmer-Lemeshow goodness-of-fit test [12]. Discrimination was assessed using the c statistic (the area under the receiver operating characteristic curve, AUC, for binary outcomes), where 0.7 is commonly used to indicate minimally acceptable discrimination [12, 15]. We extended our assessment of discrimination by examining the proportion of the population classified into a risk stratum in which the likelihood ratio was greater than 10 or less than 0.1 [16]. Likelihood ratios were calculated by dividing the percentage of women with preeclampsia in each risk group by the percentage of women without preeclampsia in that risk group. The proportion of variability in the outcome that was explained by the predictors was measured using Nagelkerke\u2019s r2, a summary indicator of model performance [17].\r\nWe also examined risk stratification capacity, which reflects the extent to which the model is able to divide patients into groups with clinically distinct risk profiles (i.e., high risk vs. low risk) [18]. These risk profiles are intended to alter women\u2019s clinical management. Risk stratification capacity is most often assessed using deciles of predicted risk, which reflects arbitrary cutoffs based on statistical characteristics of the study population. Instead, we opted to use groupings of predicted risk that would reflect thresholds for treatment or surveillance decisions in clinical practice. We measured risk stratification capacity by assessing the proportion of the population classified into a clinically distinct risk group, defined as predicted risk greater than 15.0 % or less than 3.0 %. 
Given a population average risk of 8.4 %, these thresholds were determined by a maternal-fetal medicine physician (KPH) as the thresholds above or below which clinical management would be altered by the prediction score (that is, women with a predicted risk of >15.0 % would likely be managed as \u2018high risk\u2019, women with a predicted risk <3.0 % would likely be managed as \u2018low risk\u2019, while predicted probabilities of 3.0\u201315.0 % would be considered uninformative because they are clinically equivalent to the risk estimated in the absence of a model (8.4 %)).\r\nModel overfitting, or optimism, was evaluated with 200 bootstrap samples drawn with replacement from the original sample. We repeated all model building steps to fit the model in each bootstrap sample. Average optimism (the average of the difference between the observed AUC in the study population and the AUC in each bootstrap sample) was subtracted from the AUC in the study population to calculate the optimism-corrected AUC [12]. We followed these steps for the original model, as well as for all models including simulated predictors.\r\nSensitivity analysis\r\nWe conducted sensitivity analyses using different risk group definitions (2.0 %, 2.5 %, 18.0 %, 20.0 %) to evaluate how sensitive our findings were to this definition. To ensure that the performance of this prediction model did not reflect unique characteristics of preeclampsia, we built clinical prediction models for several other adverse pregnancy outcomes as sensitivity analyses. These outcomes were gestational diabetes, spontaneous preterm delivery before 32 weeks, indicated preterm delivery before 37 weeks, macrosomia, shoulder dystocia, cesarean delivery, postpartum hemorrhage requiring intervention to control bleeding, maternal mortality\/severe morbidity, stillbirth, NICU stay \u226548 h, and in-hospital newborn mortality (see Schummers [19] for detailed outcome definitions). 
In addition to those predictors included in the model for preeclampsia, we included additional outcome-specific predictors in some models (see footnote of Additional file 1 for complete list). All analyses were conducted using Stata Version 12.0 [20].\r\nResults\r\nOf the 334,861 births in British Columbia during the study period, 229,387 had available data on prepregnancy body mass index. Of these, the 75,225 overweight or obese women (body mass index \u226525) were included in this analysis. Table 1 presents the prevalence and odds ratios of the predictors of preeclampsia observed in our data. Predictors in our data ranged in prevalence from 0.4 % (history of neonatal death) to 43.3 % (nulliparity). Crude odds ratios ranged from 0.8 (history of stillbirth or spontaneous abortion) to 2.9 (pre-existing diabetes).\r\nThe Hosmer-Lemeshow goodness-of-fit test indicated adequate goodness of fit (calibration), with p = 0.33. Likewise, visual examination of observed versus predicted risks according to the original model indicated adequate calibration (data not shown). As shown in Table 2, the original model had poor risk stratification capacity, with only 19.2 % of the population classified into clinically distinct high or low risk groups (11.5 and 7.7 %, respectively). None of the strata had informative likelihood ratios (i.e., all likelihood ratios were between 0.1 and 10). Our original risk prediction model had an AUC of 0.68, slightly below the 0.7 value widely used as a threshold to indicate adequate discrimination performance [12]. 
The optimism-corrected AUC was also 0.68 (see Additional file 1), indicating minimal overfitting. Similarly, overall model performance appeared poor, with a Nagelkerke\u2019s r2 indicating that this model explained only 7.2 % of the variability in preeclampsia risk. The AUC obtained from the preeclampsia risk prediction model was comparable to those we built for other adverse pregnancy outcomes, with AUCs ranging from 0.59 for stillbirth to 0.66 for gestational diabetes. All models exhibited minimal overfitting, with estimated optimism ranging from 0.02 to <0.01. See Additional file 1 for the observed and optimism-corrected AUCs for the prediction models for all outcomes we examined.\r\nThe models built after adding simulated predictors demonstrate the prevalence and univariable odds ratio values necessary for a model to perform well in terms of discrimination, risk stratification capacity, and variability in outcome risk explained by the predictors. Figures 1, 2, and 4 show the model performance of each model by plotting the performance metric (AUC, proportion of the population classified into a clinically distinct risk group, and Nagelkerke\u2019s r2, respectively) on the y-axes against the odds ratios of simulated predictors on the x-axes. Each curve in the figure represents a specific prevalence of the simulated predictors in the population, ranging from 5.0\u201340.0 %. Each sub-figure A-C represents the number of simulated predictors added to the models (one, three, and five, respectively).\r\nThe starting point for the AUCs of all models built with simulated predictors is 0.68, the observed AUC for the original model. From Fig. 1a (left), we see that the odds ratio for a single added predictor must be at least 6, and the prevalence at least 20.0 %, to achieve an AUC of 0.8. One common predictor (prevalence \u226520.0 %) with an odds ratio of 16 yields an AUC approaching 0.85. In Fig. 
1b (center), we see that odds ratios for three common predictors (each with prevalence \u226520.0 %) need only reach a magnitude of 4 to produce a model with AUC of 0.8. Three common predictors with odds ratios of 16 can yield an almost perfect AUC, near 0.95. With five simulated predictors (Fig. 1c, right), rare predictors (5.0\u201310.0 % prevalence) can yield an AUC of 0.9, provided each odds ratio exceeds 10. Five common predictors (prevalence \u226520.0 %) with odds ratios of 3 to 4 can produce an AUC of 0.85, increasing to 0.95 as odds ratios increase. As with the original model, models including simulated predictors exhibited minimal overfitting, with AUC estimates remaining unchanged to 2 decimal places after correcting for optimism.\r\nFigure 2 depicts the risk stratification capacity of each model after adding simulated predictors according to the proportion of the population classified into a clinically distinct risk group (i.e., a risk group that is meaningfully high or low risk from a clinical perspective).\r\nTable 1 Clinical characteristics and risk factors for preeclampsia included in clinical prediction model in our data set and those from a previously published cohort study of preeclampsia risk\r\nPredictors in our data set | Prevalence, n (%) | Crude odds ratio (95 % CI)\r\nMaternal age a\r\n<20 | 1,624 (2.2) | 1.0 (0.8, 1.2)\r\n20\u201329 | 32,140 (42.7) | REF\r\n30\u201340 | 38,444 (51.1) | 1.0 (0.9, 1.0)\r\n\u226540 | 3,017 (4.0) | 1.4 (1.3, 1.6)\r\nPrepregnancy body mass index a\r\n25\u201329 | 46,979 (62.5) | REF\r\n30\u201334 | 17,692 (23.5) | 1.6 (1.5, 1.7)\r\n35\u201339 | 6,968 (9.3) | 2.1 (1.9, 2.3)\r\n\u226540 | 3,586 (4.8) | 2.8 (2.5, 3.1)\r\nMaternal height <60 in. | 4,280 (5.7) | 0.9 (0.8, 1.0)\r\nNulliparity | 32,571 (43.3) | 2.5 (2.4, 2.6)\r\nPre-existing diabetes | 769 (1.0) | 2.9 (2.4, 3.5)\r\nSmoking | 8,411 (11.2) | 0.9 (0.8, 1.0)\r\nHistory of stillbirth | 713 (0.9) | 0.8 (0.6, 1.1)\r\nHistory of neonatal death | 281 (0.4) | 1.0 (0.6, 1.5)\r\nHistory of spontaneous abortion | 18,046 (24.0) | 1.0 (0.9, 1.0)\r\naPrepregnancy body mass index and maternal age at birth were modeled using restricted cubic splines\r\nTable 2 Risk stratification capacity of the original model: observed vs. predicted risk\r\nPredicted risk (%) | No. of births per stratum (% of sample) | Observed risk, n (%) | Likelihood ratio (95 % CI)\r\n<3.0 | 5,788 (7.7) | 134 (2.3) | 0.3 (0.2\u20130.3)\r\n3.0\u20135.5 | 21,654 (28.8) | 876 (4.0) | 0.4 (0.4\u20130.4)\r\n5.5\u201312.0 a | 33,178 (44.1) | 2,846 (8.6) | 1.0 (1.0\u20131.1)\r\n12.0\u201315.0 | 5,960 (7.9) | 800 (13.4) | 1.7 (1.6\u20131.8)\r\n>15.0 | 8,645 (11.5) | 1,660 (19.2) | 2.7 (2.6\u20132.9)\r\nTotal | 75,225 (100.0 %) | 6,313 (8.4) | -\r\naGiven a baseline risk of 8.4 %, this category is clinically equivalent to the population average risk\r\nIn the original model, less than 20 % of the population was classified into a clinically distinct risk group (19.2 %); this is the baseline proportion for the models with simulated predictors. Figure 2a (left) shows the risk stratification for all models augmented with one simulated predictor. With one rare (5.0\u201310.0 % prevalence) simulated predictor added, none of the models classified 50 % of the population into clinically distinct risk groups, even with odds ratios of 16. With a more common predictor (20.0 % prevalence), an odds ratio of 8 was necessary to classify 50 % of the population into clinically distinct risk groups. 
One predictor of 40.0 % prevalence with an odds ratio of 6 was needed to classify 50 % of the population into a clinically distinct risk group, while an odds ratio of greater than 12 was needed to classify 75 % of the population into a clinically distinct risk group.\r\nFor models with three simulated predictors, shown in Fig. 2b, rare predictors (5.0\u201310.0 % prevalence) required odds ratios of 6 to 10, those with 20 % prevalence required odds ratios greater than 4, and common predictors (40.0 % prevalence) required odds ratios greater than 3 to classify 50 % of the population into clinically distinct risk groups. Models with rare predictors (5.0\u201310.0 % prevalence) were never able to classify 75 % of the cohort into clinically distinct risk groups, though more common predictors did reach 75 % with odds ratios from 8 to 12.\r\nNot surprisingly, models with five simulated predictors showed the best risk stratification performance, with lower required odds ratio and prevalence values (Fig. 2c). Five predictors of 5.0 % prevalence require odds ratios of 6 to classify 50 % of the cohort into clinically distinct strata and odds ratios of 12 to classify 75 % of the cohort. Models with 5 common predictors (20.0\u201340.0 % prevalence) classified 50 % of the population into clinically distinct risk groups with odds ratios of 4, and classified 75 % of the population with odds ratios of 8.\r\nFig. 1 Discrimination performance (measured by area under Receiver Operator Characteristic curve) of risk prediction models according to simulated predictor characteristics. The original risk prediction model was augmented with simulated predictors with prevalence from 5 to 40 % and odds ratios ranging from 1 to 16: a one added simulated predictor per model; b three added simulated predictors per model; c five added simulated predictors per model\r\nFig. 2 Proportion of population classified into a clinically distinct risk group (predicted risk <3.0 % or >15.0 %) from risk prediction models according to simulated predictor characteristics. The original risk prediction model was augmented with simulated predictors with prevalence from 5 to 40 % and odds ratios ranging from 1 to 16: a one added simulated predictor per model; b three added simulated predictors per model; c five added simulated predictors per model\r\nFigure 3 provides an alternative approach to examine the proportion of the population classified into clinically distinct risk groups. These histograms display the frequencies of different predicted risks according to two models with very different risk stratification capacities. In both sub-figures, the area in green (left) indicates the number of women with a predicted risk below 3.0 % (clinically distinct low risk group); the brown area (center) indicates the number of women with an uninformative predicted risk, not markedly different from the population average, or what we would predict for individuals based on a null model (3.0\u201315.0 %); the blue area (right) shows the number of women with predicted risk above 15.0 % (clinically distinct high risk group). The histogram on the left (Fig. 3a) shows predicted risks from a model with one simulated predictor added to the real data with an odds ratio of 1.5 and 5.0 % prevalence. The histogram on the right (Fig. 3b) shows predicted risks from a model with 5 simulated predictors with odds ratios of 6 and 40.0 % prevalence. A perfect model would classify 8.4 % of the population (the incidence of preeclampsia in this population) as high risk and the rest as low risk. As expected, the model on the left shows poor performance, with the majority of the population in the uninformative group (79.8 %), and far too few in the low risk group (8.6 %). The model on the right performs far better, and classifies the majority of the population (74.0 %) into a clinically distinct risk group. 
Appropriately, most (60.2 %) were classified into the low risk group, a small number were classified into the high risk group (13.8 %), and about a quarter (26.0 %) into the uninformative group.\r\nThe proportion of variability in preeclampsia risk that was explained by the predictors included in each model (Nagelkerke\u2019s r2) was plotted according to predictor characteristics in Fig. 4. The observed predictors included in the original model explained very little of the variability in preeclampsia risk (7.2 %). As shown in Fig. 4a, models including only 1 simulated predictor showed poor performance, even when the added predictor was strongly associated with the outcome (OR = 16) and prevalent in the population (\u226520.0 % prevalence). Model performance improved greatly with 3 and 5 added predictors. When 3 predictors with \u226520 % prevalence were included, models explained 50 % or more of the variability in preeclampsia risk when odds ratios were equal to 8 or more. With 5 added predictors, even uncommon predictors with large odds ratios (\u226510) were able to explain more than 50 % of the outcome variability.\r\nFig. 3 Histogram of predicted risk for each observation based on the original risk prediction model plus a one simulated predictor with an odds ratio of 1.5 and 5 % prevalence, and b five simulated predictors with odds ratios of 6 and 40 % prevalence. Green bars indicate a clinically distinct low risk group (predicted risk <3.0 %); brown bars indicate uninformative predicted risk (3.0\u201315.0 %); blue bars indicate a clinically distinct high risk group (predicted risk >15.0 %)\r\n
As expected, models with 5 common (≥20.0 % prevalence) predictors and odds ratios ≥12 demonstrated excellent performance, with r2 values approaching 75 %.

Table 3 combines the model performance measures of discrimination and risk stratification by presenting the proportion of the population classified into a stratum with an informative likelihood ratio. As the number, prevalence, and odds ratios of simulated predictors increase, model performance improves in terms of both discrimination and clinically relevant risk stratification capacity. This table illustrates a consistent relationship between discrimination according to the AUC and risk stratification capacity according to the proportion of the population classified into a clinically distinct risk group. Models that displayed minimum acceptable discriminative ability, assessed by an AUC of 0.7, exhibited poor risk stratification capacity, with 75 % of the population classified into a group that was clinically equivalent to the population baseline risk. In order to classify 50 % of the population into a clinically distinct risk group (high or low risk), AUCs of 0.85 were needed, while AUCs of 0.95 were needed for 75 % of the population to be classified into clinically distinct risk groups. A complete table with performance measures for all 120 models we built can be found in Additional file 2. These findings remained stable in our sensitivity analyses in which we considered small changes in the thresholds used to define clinically distinct low and high risk groups.

Fig. 4 Overall model performance (measured by the proportion of variability in the outcome explained by the predictors, or Nagelkerke's r2) of risk prediction models according to simulated predictor characteristics. The original risk prediction model was augmented with simulated predictors with prevalence from 5 to 40 % and odds ratios ranging from 1 to 16: (a) one added simulated predictor per model; (b) three added simulated predictors per model; (c) five added simulated predictors per model.

Table 3 Model performance measures according to odds ratio, number, and prevalence of simulated predictors

OR   No. added   Prevalence   Informative LR (%)a   Distinct risk group (%)b   AUC    Nagelkerke's r2 (%)
2    3           10 %         0.0                   27.2                       0.71   10.0
2    3           20 %         0.0                   29.9                       0.73   11.7
2    3           40 %         0.0                   34.8                       0.74   12.9
2    5           10 %         0.0                   28.9                       0.73   11.8
2    5           20 %         0.0                   33.3                       0.75   14.6
2    5           40 %         0.0                   39.2                       0.77   16.6
6    3           10 %         0.0                   54.4                       0.83   29.4
6    3           20 %         63.6                  63.6                       0.87   37.1
6    3           40 %         70.2                  70.0                       0.88   37.1
6    5           10 %         66.9                  66.9                       0.88   40.6
6    5           20 %         72.0                  72.0                       0.92   50.7
6    5           40 %         73.8                  73.8                       0.93   51.2

OR = odds ratio of simulated predictors; No. added = number of simulated predictors added to the original model; Prevalence = prevalence of simulated predictors; Informative LR (%) = proportion of the population with an informative likelihood ratio; Distinct risk group (%) = proportion of the population assigned to a clinically distinct risk group.
a Defined as the proportion of the population classified into a stratum with a likelihood ratio <0.10 or >10.0
b Defined as the proportion of the population classified into a stratum with predicted risk meaningfully different than the baseline rate of preeclampsia in the population (<0.03 or >0.15)

Discussion

Using the example of preeclampsia, our study established the predictor characteristics required for a risk prediction model to adequately discriminate cases from non-cases and to adequately classify patients into risk groups for whom distinct clinical management is warranted.
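The augmentation summarized in Fig. 4 and Table 3 (binary predictors simulated with pre-set prevalence and univariable odds ratios) could be implemented along the following lines. This is a hypothetical reconstruction, not the authors' code: it solves by bisection for the predictor prevalences among cases and non-cases that jointly satisfy the target marginal prevalence and odds ratio, then samples the predictor conditionally on outcome status.

```python
import numpy as np

def conditional_prevalences(p_marginal, odds_ratio, outcome_rate, tol=1e-10):
    """Find the predictor prevalence among non-cases (p0) and cases (p1) so that
    the marginal prevalence and the case/non-case odds ratio hit their targets."""
    lo, hi = 0.0, 1.0
    # Bisection on p0: the implied marginal prevalence is increasing in p0.
    for _ in range(200):
        p0 = (lo + hi) / 2
        odds1 = odds_ratio * p0 / (1 - p0)   # case odds = OR * non-case odds
        p1 = odds1 / (1 + odds1)
        marginal = outcome_rate * p1 + (1 - outcome_rate) * p0
        if marginal < p_marginal:
            lo = p0
        else:
            hi = p0
        if hi - lo < tol:
            break
    return p0, p1

rng = np.random.default_rng(1)
n, outcome_rate = 75_000, 0.05   # sample size and outcome rate are illustrative
y = rng.binomial(1, outcome_rate, n)
p0, p1 = conditional_prevalences(p_marginal=0.20, odds_ratio=6.0,
                                 outcome_rate=outcome_rate)
x = rng.binomial(1, np.where(y == 1, p1, p0))   # predictor sampled given outcome
```

By construction the simulated predictor's univariable odds ratio with the outcome and its marginal prevalence match the pre-set targets (up to sampling noise), which is what allows model performance to be tabulated against known predictor characteristics.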
Our approach of defining risk strata using clinically meaningful risk thresholds (rather than the more common method of using deciles of predicted risk) helped to establish the extent to which the application of the prediction models in clinical practice would likely influence clinical management decisions through improved identification of high and low risk patients. This approach helped to highlight that evaluation of a risk prediction model based on standard discrimination criteria alone may not provide a complete picture of the model's clinical utility. We found that, if an AUC threshold of 0.7 were used to indicate acceptable risk prediction model performance, a substantial proportion of risk prediction models would be of limited use in clinical practice due to their poor risk stratification performance.

While our original data were population-based and did not include novel clinical predictors, the characteristics of the preeclampsia predictors in our data are similar to those of other data sets with which researchers often aim to build risk prediction models. For example, a recently published prediction model for preeclampsia from a detailed clinical cohort included predictors with univariable odds ratios ranging from 0.5 to 2.9 (compared to 0.8 to 2.9 in our data) and prevalence values ranging from 3.9 to 50.3 % (compared to 0.4 to 43.3 %) [4]. Accordingly, the performance of our original model is expected to be similar to other models that aim to predict preeclampsia risk, and the findings from our simulation study are expected to be directly applicable to future work in this area.

Although risk stratification capacity is rarely the focus of risk prediction model performance evaluations, it is central to the overall aim of risk prediction models [18]. Risk stratification involves transformation of continuous values of predicted risk into binary or categorical groups in which different levels of intervention or monitoring are warranted.
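That transformation, mapping continuous predicted risks into strata and checking each stratum's likelihood ratio, can be sketched as follows. This is an illustration on simulated predicted risks, not the study data; only the 3 % and 15 % thresholds and the <0.10 / >10.0 likelihood-ratio cutoffs come from the paper.

```python
import numpy as np

def stratum_likelihood_ratios(p_hat, y, cuts=(0.03, 0.15)):
    """Likelihood ratio per predicted-risk stratum:
    P(stratum | case) / P(stratum | non-case)."""
    strata = np.digitize(p_hat, cuts)   # 0: low, 1: uninformative, 2: high
    lrs = {}
    for s in range(len(cuts) + 1):
        in_s = strata == s
        frac_cases = in_s[y == 1].mean()       # share of cases in this stratum
        frac_noncases = in_s[y == 0].mean()    # share of non-cases in it
        lrs[s] = frac_cases / frac_noncases if frac_noncases > 0 else float("inf")
    return strata, lrs

rng = np.random.default_rng(2)
n = 50_000
p_hat = rng.beta(0.4, 6.0, n)        # illustrative predicted risks, skewed low
y = rng.binomial(1, p_hat)           # outcomes drawn at the predicted risk

strata, lrs = stratum_likelihood_ratios(p_hat, y)
informative = {s for s, lr in lrs.items() if lr < 0.10 or lr > 10.0}
prop_informative = np.isin(strata, list(informative)).mean()
prop_distinct = ((p_hat < 0.03) | (p_hat > 0.15)).mean()
```

The two proportions computed at the end correspond to the two risk stratification columns of Table 3: the share of the population in a stratum with an informative likelihood ratio, and the share assigned to a clinically distinct (low or high) risk group.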
We evaluated risk stratification capacity based on meaningful thresholds for identifying high and low risk patients (rather than arbitrary quantiles of risk in our study population) to maximize the clinical applicability of our findings. The primary method by which a clinical prediction model can improve health outcomes is by correctly classifying patients into groups with distinct clinical management plans (binary or categorical groups). For example, risk prediction models have been used to differentiate prostate cancer patients who would benefit from radical prostatectomy from those who need only receive annual screening tests [21], to differentiate children admitted to hospital with cerebrospinal fluid pleocytosis who would benefit from parenteral antibiotics from those who would not [22], and to identify children at high risk of abuse or neglect who would benefit from an early intervention strategy [23]. Thus, for a prediction model to change the course of a patient's care, the model must perform well in terms of risk stratification capacity. We used a measure of risk stratification capacity that equally weighted a model's ability to classify patients into low or high risk groups in order for this methodological analysis to be most broadly applicable. However, it is important to note that the clinical implications of misclassifying high risk patients into a low risk group are often not equal to those of misclassifying low risk patients into a high risk group, and the relative importance of each depends on the specific research question.

Interestingly, the relationship between discrimination (AUC) and risk stratification capacity (proportion of the population classified into a clinically distinct risk group) was robust across our 120 models. As shown in Additional file 2, a 0.7 AUC threshold for adequate discrimination is consistent with 20–30 % of the population being classified into clinically distinct risk groups.
This held true even in our sensitivity analyses with slightly different definitions for clinically distinct risk groups. It calls into question the validity of an AUC of 0.7 indicating acceptable performance of a clinical prediction model. It is only when AUC values reach 0.8 or higher that any women are classified into a stratum with an informative likelihood ratio, and only when AUC values reach 0.85 that a sizeable proportion of the population (>40 %) is classified into such a stratum. While 0.7 is widely accepted as the lower limit of acceptable discrimination, this threshold originated from a footnote of an early study of prediction model performance [15], and has not since been formally evaluated. Our findings suggest that a higher AUC threshold (above 0.8) would better indicate a model's clinical utility, although further research is needed to formally identify the most appropriate threshold value.

The calibration of our original model was adequate, as measured by the Hosmer-Lemeshow goodness-of-fit test and visual comparison of observed versus predicted risks. Our findings are applicable to models with adequate calibration; models with poor calibration should not be used for risk prediction [20].

The findings of this simulation study must be interpreted in light of several limitations. First, evaluation of a prediction model's risk stratification capacity is heavily dependent on the clinical context for the particular research question at hand. For example, if classification into a high risk group would lead a clinician to perform a benign intervention, the threshold for defining the risk group would be less stringent than for an intervention that carries potential harms or side effects.
Thus, the particular thresholds we used to define clinically distinct risk groups for this simulation study were based on the clinical context of preeclampsia diagnosis and management, and may not be generalizable to other research questions. However, we do expect the broad take-home message of our findings to be generalizable to a wider array of clinical contexts, including prediction of outcomes that are not as rare as preeclampsia.

To preserve interpretability of our results, we did not build risk prediction models to simulate all situations that a research team may encounter. The predictors we simulated were binary and independent of one another, and all regression models were logistic. Continuous predictors may, in some cases, result in better predictive ability than binary predictors. Conversely, non-independent predictors may need to be more prevalent in the population and/or have higher univariable odds ratios with the outcome to achieve the same model performance we report. While risk prediction models are most often based on logistic regression models, extensions of this work to other model types, such as linear regression models or Cox proportional hazards models, merit further investigation.

Conclusions

Our findings can serve as a guide to researchers who seek to develop a risk prediction model. In particular, by examining the relationship between predictors' univariable odds ratios and prevalences and model performance, researchers and peer reviewers should be able to estimate a range of expected model performance parameters before a model has been built. This form of guidance has not previously been available to researchers, and may lead to increased efficiency of research efforts and funds.

Additional files

Additional file 1: Presents the observed and optimism-corrected area under the receiver-operator characteristic curve for risk prediction models built for 12 pregnancy and birth outcomes, and provides a complete list of included predictors for each model.
(DOCX 30 kb)

Additional file 2: Presents all model performance measures for each of the 120 models built using data augmented with simulated predictors. Performance measures include the area under the receiver-operator characteristic curve, the proportion of the population classified into a clinically distinct risk group, the proportion of the population with an informative likelihood ratio, and Nagelkerke's r2. (DOCX 47 kb)

Acknowledgements
The authors thank Terri Pacheco, Perinatal Services British Columbia, for her assistance in compiling the study data. All inferences, opinions, and conclusions drawn in this publication are those of the authors, and do not reflect the opinions or policies of Perinatal Services BC.

Funding
LS was supported by Training Grant T32HD060454 in Reproductive, Perinatal and Pediatric Epidemiology from the National Institute of Child Health and Human Development, National Institutes of Health and a Training Grant in Pharmacoepidemiology from the Harvard T.H. Chan School of Public Health. JAH holds New Investigator awards from the Canadian Institutes of Health Research and the Michael Smith Foundation for Health Research. KPH is supported by National Institutes of Health Grant K12HD063087.

Availability of data and materials
The data used for this study are administered by Perinatal Services BC. Perinatal Services BC provides access to these data for research purposes, but does not allow the data to be shared publicly in order to maintain confidentiality and privacy of individual health information. Researchers who wish to access these data may submit a data request directly to Perinatal Services BC.

Authors' contributions
Study and conceptual design: LS and JH. Data acquisition and analysis: LS and JH. Interpretation of data: LS, KH, LB, JH. Drafting of the manuscript: LS and JH. Critical revision of the manuscript for important intellectual content: LS, KH, LB, JH.
All authors have read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
The data used for this study are population-based and de-identified; hence individual consent to publish is not available or required for this study.

Ethics approval and consent to participate
The study was approved by the Research Ethics Board of the University of British Columbia/BC Children's and Women's Hospital (H-13-01707). The requirement for participant informed consent was waived by the UBC/BC Children's and Women's Research Ethics Board in accordance with Canada's Tri-Council Policy Statement on Ethical Conduct for Research Involving Humans (Article 3.7). As the study involved minimal risk to participants, the waiver was unlikely to adversely affect the welfare of participants due to the anonymous nature of the data. Given the large sample size and that the research pertains to past pregnancies, it would be highly impractical for the researchers to seek consent from the individuals to whom the information relates.

Author details
1 Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA. 2 Department of Epidemiology, Graduate School of Public Health, and Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Pittsburgh, A742 Crabtree Hall, 130 DeSoto Street, Pittsburgh, PA 15261, USA. 3 Department of Obstetrics, Gynecology, and Reproductive Sciences, Magee-Womens Research Institute, University of Pittsburgh, 300 Halket Street, Pittsburgh, PA 15213, USA. 4 Department of Obstetrics & Gynaecology, University of British Columbia, 4500 Oak Street C408, Vancouver, British Columbia V6H 3N1, Canada.

Received: 6 April 2016  Accepted: 7 September 2016

References
1. Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker.
Am J Epidemiol. 2004;159(9):882–90.
2. Steegers EAP, von Dadelszen P, Duvekot JJ, Pijnenborg R. Pre-eclampsia. Lancet. 2010;376:631–44.
3. Askie L, Duley L, Henderson-Smart D, Steward L, PARIS Collaborative Group. Antiplatelet agents for prevention of pre-eclampsia: A meta-analysis of individual patient data. Lancet. 2007;369(9575):1791–8.
4. North RA. In: Lyall F, Belfort M, editors. Classification and diagnosis of preeclampsia. 2007. p. 243.
5. Myatt L, Clifton RG, Roberts JM, et al. First-trimester prediction of preeclampsia in low-risk nulliparous women. Obstet Gynecol. 2012;119(6):1234–42.
6. Yu CKH, Smith GCS, Papageorghiou AT, Cacho AM, Nicolaides KH. An integrated model for the prediction of preeclampsia using maternal factors and uterine artery Doppler velocimetry in unselected low-risk women. Am J Obstet Gynecol. 2005;193:429–36.
7. Lee L, Sheu B, Shau W, et al. Mid-trimester B-hCG levels incorporated in a multifactorial model for the prediction of severe pre-eclampsia. Prenat Diagn. 2000;20:738–43.
8. August P, Helseth G, Cook EF, Sison C. A prediction model for superimposed preeclampsia in women with chronic hypertension during pregnancy. Am J Obstet Gynecol. 2004;191:1666–72.
9. Poon LCY, Kametas NA, Maiz N, Akolekar R, Nicolaides KH. First-trimester prediction of hypertensive disorders in pregnancy. Hypertension. 2009;53:812–8.
10. Perinatal Services BC [creator] (2013). Perinatal data registry. Data Stewardship Committee (2013). Data extract. http://www.perinatalservicesbc.ca/default.htm. Accessed 8 Aug 2015.
11. Frosst GO, Hutcheon JA, Joseph KS, Kinniburgh BA, Johnson C, Lee L. Validating the British Columbia Perinatal Data Registry: A chart re-abstraction study. BMC Preg Childbirth. 2015;15:123. doi:10.1186/s12884-015-0563-7.
12. Steyerberg EW. Clinical prediction models: A practical approach to development, validation, and updating, vol. 1.
New York: Springer; 2009. p. 446.
13. Staff AC, Benton SJ, von Dadelszen P, et al. Redefining preeclampsia using placenta-derived biomarkers. Hypertension. 2013;61(5):932–42.
14. Milligan N, Rowden M, Wright E, et al. Two-dimensional sonographic assessment of maximum placental length and thickness in the second trimester: A reproducibility study. J Matern Fetal Neonatal Med. 2014; epub ahead of print:1–7.
15. Swets JA. Measuring the accuracy of diagnostic systems. Science. 1988;240:1285–92.
16. Deeks JJ, Altman DG. Diagnostic tests 4: Likelihood ratios. BMJ. 2004;329(7458):882–90.
17. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina M, Kattan MW. Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology. 2010;21(1):128–38. doi:10.1097/EDE.0b013e3181c30fb2.
18. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115(7):928–35.
19. Schummers L, Hutcheon JA, Bodnar LM, Lieberman E, Himes KP. Risk of adverse pregnancy outcomes by prepregnancy body mass index: A population-based study to inform prepregnancy weight loss counseling. Obstet Gynecol. 2015;125(1):133–43. doi:10.1097/AOG.0000000000000591.
20. Stata Corp. Stata statistical software: Release 12. 2011.
21. Epstein JI, Walsh PC, Carmichael M, Brendler CB. Pathologic and clinical findings to predict tumor extent of nonpalpable (stage T1c) prostate cancer. JAMA. 1994;271(5):368–74.
22. Nigrovic LE, Kuppermann N, Macias CG, et al. Clinical prediction rule for identifying children with cerebrospinal fluid pleocytosis at very low risk of bacterial meningitis. JAMA. 2007;297(1):52–60.
23. Wilson ML, Tumen S, Ota R, Simmers AG. Predictive modeling: Potential applications in prevention services. Am J Prev Med.
2015;48(5):509–19.