Open Collections

UBC Faculty Research and Publications

Are potentially clinically meaningful benefits misinterpreted in cardiovascular randomized trials? A… Allan, G. M; Finley, Caitlin R; McCormack, James; Kumar, Vivek; Kwong, Simon; Braschi, Emelie; Korownyk, Christina; Kolber, Michael R; Lindblad, Adriennne J; Babenko, Oksana; Garrison, Scott Mar 20, 2017


Full Text


RESEARCH ARTICLE Open Access

Are potentially clinically meaningful benefits misinterpreted in cardiovascular randomized trials? A systematic examination of statistical significance, clinical significance, and authors’ conclusions

G. Michael Allan1*, Caitlin R. Finley1, James McCormack2, Vivek Kumar1, Simon Kwong1, Emelie Braschi3, Christina Korownyk1, Michael R. Kolber1, Adriennne J. Lindblad1, Oksana Babenko4 and Scott Garrison1

Abstract

Background: While journals and reporting guidelines recommend the presentation of confidence intervals, many authors adhere strictly to statistical significance testing. Our objective was to determine what proportion of not statistically significant (NSS) cardiovascular trials include potentially clinically meaningful effects in primary outcomes and whether these are associated with authors’ conclusions.

Methods: Cardiovascular studies published in six high-impact journals between 1 January 2010 and 31 December 2014 were identified via PubMed. Two independent reviewers selected trials with major adverse cardiovascular events (stroke, myocardial infarction, or cardiovascular death) as primary outcomes and extracted data on trial characteristics, quality, and primary outcome. Potentially clinically meaningful effects were defined broadly as a relative risk point estimate ≤0.94 (based on the effects of ezetimibe) and/or a lower confidence interval ≤0.75 (based on the effects of statins).

Results: We identified 127 randomized trial comparisons from 3200 articles. The primary outcomes were statistically significant (SS) favoring treatment in 21% (27/127), NSS in 72% (92/127), and SS favoring control in 6% (8/127). In 61% of NSS trials (56/92), the point estimate and/or lower confidence interval included potentially meaningful effects.
Both point estimate and confidence interval included potentially meaningful effects in 67% of trials (12/18) in which authors concluded that treatment was superior, in 28% (16/58) with a neutral conclusion, and in 6% (1/16) in which authors concluded that control was superior. In a sensitivity analysis, 26% of NSS trials would include potentially meaningful effects with relative risk thresholds of point estimate ≤0.85 and/or a lower confidence interval ≤0.65.

Conclusions: Point estimates and/or confidence intervals included potentially clinically meaningful effects in up to 61% of NSS cardiovascular trials. Authors’ conclusions often reflect potentially meaningful results of NSS cardiovascular trials. Given the frequency of potentially clinically meaningful effects in NSS trials, authors should be encouraged to continue to look beyond significance testing to a broader interpretation of trial results.

Keywords: Cardiovascular, Randomized controlled trials, Statistical significance, Clinical significance, Confidence intervals, Conclusions

* Correspondence: michael.allan@ualberta.ca
1Evidence-Based Medicine, Department of Family Medicine - Research Program, University of Alberta, 6-10 University Terrace, Edmonton, AB T6G 2T4, Canada
Full list of author information is available at the end of the article

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

Allan et al. BMC Medicine (2017) 15:58 DOI 10.1186/s12916-017-0821-9

Background

The preferred reporting of clinical outcomes in randomized controlled trials (RCTs) is described in the Consolidated Standards of Reporting Trials (CONSORT) statement [1]. Within CONSORT the use of confidence intervals is emphasized in preference to p-values. Confidence intervals describe the precision of the estimate and “are especially valuable in relation to differences that do not meet conventional statistical significance, for which they often indicate that the result does not rule out an important clinical difference” [1]. Editorials dating back almost 40 years have encouraged authors to use confidence intervals to describe the results of their studies rather than simply reporting the findings as statistically significant or not [2–4]. Despite this, the use of p-values in published articles remains approximately seven times more common than confidence intervals [5]. Furthermore, confidence intervals are often used in a manner similar to p-values, to dichotomize outcomes as statistically significant (SS) or not. We have previously written about three important clinical controversies resulting from this dichotomous activity [6].

Interpretation of trial results when primary outcomes are not statistically significant (NSS) is challenging. In particular, it can be difficult to put the potential clinical relevance of the NSS effect and confidence intervals in the context of the entire study results. Boutron and colleagues demonstrated that authors often place a favorable “spin” (positive portrayal) on trial results when the primary outcome is NSS [7]. Such spin occurred in 58% of abstract conclusions, 50% of main text conclusions, and 18% of titles.
Others have similarly reported spin in RCTs evaluating wound care [8] and surgical modalities [9, 10]. Although promotion of results may be common in NSS trial reporting, such evaluations assume that NSS results demonstrate no potentially clinically meaningful effect.

For these reasons we examined the primary outcomes and conclusions of RCTs in six major medical journals. We had two primary questions: (1) How often do the point estimates and confidence intervals of the primary outcome of NSS and SS trials include potentially clinically meaningful effects? and (2) Are the authors’ conclusions in the abstract of NSS trials influenced by potentially clinically meaningful point estimates and confidence intervals? We focused specifically on cardiovascular trials with major adverse cardiovascular events (MACE) because these are established, objective, patient-oriented outcomes that overlap between trials. Additionally, in large cardiovascular trials with hard clinical endpoints, statistical significance can be difficult to attain but the results have high clinical relevance. We hypothesized that authors of cardiovascular trials may discount potentially clinically meaningful effects identified in the confidence intervals and/or point estimates when the results are NSS.

Methods

We followed the basic approach described in PRISMA [11] because there is no agreed-on methodology for this type of study.

Eligibility and information sources

We included all cardiovascular RCTs of superiority design that evaluated preventive or interventional therapies regardless of the nature of the interventions, including medication, surgery, models of care, and lifestyle change. All comparators were valid, including placebo, active control, and no intervention. The primary outcome had to include at least one MACE: myocardial infarction, stroke, or cardiovascular death.
We used PubMed to identify relevant trials from five high-impact general medical journals and one high-impact specialty journal: New England Journal of Medicine (N Engl J Med), Lancet, Journal of the American Medical Association (JAMA), British Medical Journal (BMJ), Annals of Internal Medicine (Ann Intern Med), and Circulation.

Study search and selection

Between 17 March and 14 April 2015, we searched PubMed for papers using the full journal title (and abbreviation, if present) with PubMed limits for RCTs and date (1 January 2010 to 31 December 2014). In the case of Circulation, the term circulation could relate to medical/physiologic issues in addition to the journal, so we restricted the search field to “Journal”. For the other five journals we did not apply any search restrictions in order to minimize the unlikely chance of missing relevant articles. For each journal, two authors (from VK, SK, EB, and GMA) independently evaluated and selected studies for inclusion. We excluded studies of subgroups, re-analyses, and studies that were either extensions or follow-ups from previously published trials to avoid including the same data more than once. We also excluded non-inferiority designed studies because authors’ interpretations and conclusions of non-inferiority results are broader, and this would add complexity to our interpretation of abstract conclusions. Disagreements for inclusion were resolved by consensus.

Data extraction and management

Two authors (CF with VK or SK) independently extracted data from the trials. Disagreement was resolved with consensus or third-party review (GMA).

Data extraction on study characteristics included citation, type of intervention and control, primary versus secondary prevention population, mean age in study, and percentage of males studied. Data on traditional risk of bias included allocation concealment, blinding, analysis (intention to treat or per protocol), and withdrawals.
We also collected data on funding, and whether the trial was stopped early (if so, why) or extended. Data related to the primary outcome included the clinical endpoint, number of subjects in each study arm, number with the outcomes in each group, point estimate, confidence intervals, and p-values.

To evaluate the authors’ conclusions, the abstract conclusion was rated using a method derived from the technique of Als-Nielsen and colleagues [12]. We condensed the score from six to three possible conclusions: treatment superior, neutral, or control superior.

Assessing potentially meaningful effects

To assess if the primary outcome of an NSS trial included potentially meaningful effects, we focused on the point estimate and lower confidence interval. The margins of potentially clinically meaningful effect are undoubtedly debatable. Over 20 years ago, authors suggested that potentially clinically meaningful effects could be 25% or 50% relative risk reductions [13]. More recently, trials showing a relative risk reduction of 6% for ezetimibe [14] and 14% for empagliflozin [15] have been greeted with enthusiasm [16, 17]. We selected our margins of potentially meaningful effect liberally to be broad and inclusive, thereby ruling out what is likely not a clinically meaningful effect. We decided that the smallest potentially clinically meaningful effect was a 6% relative risk reduction, or a relative risk of 0.94, as reported by the IMPROVE-IT trial for ezetimibe [14]. For lower confidence intervals to include potentially meaningful effects, we selected a 25% relative risk reduction, or a relative risk of 0.75, as described in meta-analyses of statin trials [18], an established clinical therapy.

Analysis of results

Study characteristics and potential biases are presented descriptively. Relative effect estimates including relative risks, hazard ratios, rate ratios, and odds ratios were used for primary analysis.
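The two margins above can be applied mechanically to a trial's relative risk and lower confidence bound. The following is a minimal sketch, not the authors' code: the function and constant names are illustrative, the example trial is hypothetical, and the log-scale Wald interval is an assumption, since the article does not state which method was used when an interval had to be computed from event counts.

```python
import math
from typing import NamedTuple

# Margins taken from the Methods; all names in this sketch are illustrative.
POINT_ESTIMATE_MARGIN = 0.94  # 6% relative risk reduction (ezetimibe)
LOWER_CI_MARGIN = 0.75        # 25% relative risk reduction (statins)


class RelativeRisk(NamedTuple):
    estimate: float
    lower: float
    upper: float


def relative_risk(events_tx: int, n_tx: int, events_ctl: int, n_ctl: int,
                  z: float = 1.96) -> RelativeRisk:
    """Relative risk with a Wald interval on the log scale (assumed method).

    Requires at least one event in each arm; zero counts would need a
    continuity correction that is not shown here.
    """
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    se_log_rr = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    half_width = z * se_log_rr
    return RelativeRisk(rr, rr * math.exp(-half_width), rr * math.exp(half_width))


def margins_met(rr: RelativeRisk,
                point_margin: float = POINT_ESTIMATE_MARGIN,
                lower_margin: float = LOWER_CI_MARGIN) -> str:
    """Report whether 'both', 'either', or 'neither' margin is met."""
    hits = (rr.estimate <= point_margin) + (rr.lower <= lower_margin)
    return ("neither", "either", "both")[hits]


# Hypothetical trial: 90/1000 events on treatment versus 100/1000 on control.
example = relative_risk(90, 1000, 100, 1000)
print(example, margins_met(example))
```

In this hypothetical trial the interval crosses 1, so the result is not statistically significant, yet both margins are met. Passing stricter margins (for example `point_margin=0.85, lower_margin=0.65`) shows how sensitive the label is to the chosen thresholds.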
If not provided, relative risks and 95% confidence intervals were calculated.

Trials were initially categorized into three groups based on the statistical testing of the primary outcome: SS trials favoring control, SS trials favoring treatment, and NSS trials. Statistical significance was determined by hypothesis testing via the p-value first and, if not available, we determined if the confidence interval excluded 1 (the line of no effect).

To analyze and describe the results, the primary outcomes for all RCTs were presented on a forest plot with the potentially clinically meaningful thresholds for point estimate (≤0.94) and confidence interval (≤0.75) indicated. We categorized NSS trials as having (1) both the lower confidence interval and point estimate include potentially meaningful effects; (2) either the lower confidence interval or point estimate include a potentially meaningful effect; or (3) neither the lower confidence interval nor point estimate include a potentially meaningful effect. Among NSS trials, results were further stratified according to authors’ conclusions.

We used chi-square and independent samples median tests to examine if selected factors were associated with authors’ conclusions in NSS trials. Factors compared included type of control used in the trials, funding (industry, public, or mixed), point estimates, and lower confidence intervals.

Sensitivity analyses

We performed sensitivity analyses to examine the effect of some key variables on the proportion of NSS trials with potentially clinically meaningful effect. Because smaller trials may be expected to have broader confidence intervals, we performed an analysis of trials with <2000 patient-years and those with ≥2000 patient-years.
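The chi-square test named under Analysis of results can be sketched in a few lines of plain Python. This is an illustration rather than the authors' analysis: the function names are hypothetical, the counts are the threshold rows of Table 2 (control superior, neutral, treatment superior), and the closed-form p-value holds only for exactly two degrees of freedom, which is the case for these 2 × 3 tables.

```python
import math
from typing import List, Tuple


def chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value_2dof(statistic: float) -> float:
    """Upper-tail p-value; valid only for 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Table 2 counts by authors' conclusion: control superior, neutral, treatment superior.
POINT_ESTIMATE_ROWS = [[14, 29, 5], [2, 29, 13]]  # point estimate >0.94, then <=0.94
LOWER_CI_ROWS = [[11, 37, 3], [5, 21, 15]]        # lower CI >0.75, then <=0.75

for label, rows in (("point estimate", POINT_ESTIMATE_ROWS),
                    ("lower confidence interval", LOWER_CI_ROWS)):
    stat, dof = chi_square(rows)
    if dof != 2:
        raise ValueError("p_value_2dof only applies to 2 degrees of freedom")
    print(f"{label}: chi2 = {stat:.2f}, p = {p_value_2dof(stat):.3f}")
```

Under these assumptions the p-values round to 0.002 and 0.001, matching the values reported in Table 2. A statistics library would be the usual choice for tables of other shapes, such as the three-row funding comparison.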
Because primary prevention trials will have smaller absolute benefits for a given relative benefit, we performed an analysis of primary versus secondary prevention trials.

To determine how sensitive the results were to the threshold of potentially clinically meaningful effects, we increased the potentially meaningful relative risk reduction threshold for point estimates to ≥15% (or ≤0.85 relative risk) and for lower confidence intervals to ≥35% (or ≤0.65 relative risk).

Results

Study inclusion and characteristics

The flow of study exclusion and inclusion is detailed in Fig. 1. Of the original 3200 studies identified in our search, 127 RCTs met inclusion criteria. Agreement for study selection was 97% and for data extraction was 93%. General characteristics and risk of bias of included studies are outlined in Table 1 and the full list of the studies is included in Additional file 1. Secondary prevention trials (85%, 108/127) and community-based trials (58%, 74/127) were most common. The primary outcome included a range of one to ten combined outcomes (median, three) with myocardial infarction (80%), stroke (65%), and cardiovascular death (50%) being the most common. Overall, study quality was good: for example, 77% (98/127) described allocation concealment and 94% (119/127) performed intention-to-treat analysis. Most trials (75%, 95/127) were completed as planned but 22% (28/127) were stopped early for varying reasons (usually harm or futility) and 3% (4/127) were extended.

Statistical significance of primary outcome and conclusions

Figure 2 outlines the flow of study outcomes with categorization by statistical significance for the primary outcome and authors’ conclusions. The primary outcome was SS favoring treatment in 21% of trials (27/127) and all concluded treatment was superior.
The primary outcome was NSS in 72% of trials (92/127), of which 63% (58/92) had a neutral conclusion, 20% (18/92) concluded treatment was superior, and 17% (16/92) concluded control was superior. The primary outcome was SS favoring control in 6% (8/127) and all concluded control was superior.

Potentially clinically meaningful effects by statistical significance

Figure 3 provides the forest plot of all primary outcomes organized by statistical significance and by whether the confidence intervals and/or point estimates included potentially meaningful effects. Careful inspection of the forest plot reveals that a number of the NSS trials had lower confidence intervals and point estimates that appear similar to the effects in many of the SS trials. Among NSS trials, in 32% (29/92) both the lower confidence interval and point estimate included potentially meaningful effects, while in 29% (27/92) only one of the two included a potentially meaningful effect. Neither the lower confidence interval nor point estimate included potentially meaningful effects in 39% of NSS trials (36/92).

Fig. 1 Study flow

Potentially clinically meaningful effects and authors’ conclusions in not statistically significant trials

Figure 4 shows the findings based on authors’ conclusions and whether the lower confidence intervals and/or point estimates of the primary outcomes included potentially meaningful effects.
The lower confidence interval and point estimate included potentially meaningful effects in 67% of trials (12/18) with conclusions that treatment was superior compared to 6% of trials (1/16) with conclusions that control was superior. Neither the lower confidence interval nor point estimate included potentially meaningful effects in 11% of trials (2/18) with conclusions that treatment was superior compared to 63% of trials (10/16) with conclusions that control was superior. The point estimates and lower confidence intervals of trials with neutral authors’ conclusions were distributed relatively evenly.

Table 1 Study characteristics and risk of bias of the 127 included randomized controlled trials

Study characteristics
Journal, n (%)
  New England Journal of Medicine | 65 (51)
  Lancet | 23 (18)
  Journal of the American Medical Association | 20 (16)
  British Medical Journal | 5 (4)
  Annals of Internal Medicine | 1 (1)
  Circulation | 13 (10)
Setting, n
  Community | 74 (58)
  Hospital | 53 (42)
Primary or secondary prevention, n
  Primary | 19 (15)
  Secondary | 108 (85)
Experimental interventional, n
  Medication | 65 (51)
  Surgery | 32 (25)
  Models of care | 11 (9)
  Vitamin/supplement | 9 (7)
  Lifestyle | 4 (3)
  Diagnostics/other* | 6 (5)
Patient characteristics
  Median age (interquartile range), years | 63.8 (61.5–66.5)
  Percent males (interquartile range) | 72.0 (60.4–78.0)
Study size and duration
  Median study size (interquartile range) | 3020 (1319–8521)
  Median study duration (interquartile range), months | 24.0 (8.3–45.3)
Primary outcome included (median 3, range 1–10), n (%)
  Myocardial infarction | 101 (80)
  Stroke | 83 (65)
  Cardiovascular death | 64 (50)
  Overall death | 51 (40)
  Revascularization | 31 (25)
  Heart failure | 22 (17)
  Othera | 37 (29)
Risk of bias, n (%)
Planned trial duration
  Completed as planned | 95 (75)
  Extended | 4 (3)
  Stopped for benefit | 8 (6)
  Stopped for harm | 10 (8)
  Stopped for futility | 9 (7)
  Stopped for financial reasons | 1 (1)
Allocation concealment
  Yes | 98 (77)
  Unclear/no | 29 (23)
Blinding
  Double | 65 (51)
  Single | 13 (10)
  None | 49 (39)
Analysis
  Intention to treat | 119 (94)
  Modified intention to treat | 7 (6)
  Per protocol | 1 (1)
Sample size estimation
  Estimation attained | 83 (65)
  Estimation missed | 38 (30)
  No estimation given | 6 (5)
Withdrawal
  Number provided | 115 (91)
  Median (interquartile range) | 2.3 (0.5–7.0)
Funding
  Industry | 52 (41)
  Mixed | 46 (36)
  Public | 28 (22)
  Not described | 1 (1)
*Examples of other include stem cells and continuous positive airway pressure
aOther includes angina, thromboembolism, stent failure, cardiac arrest, renal outcomes, shock, peripheral vascular event, bleeding, arrhythmia, pericardial tamponade, respiratory failure, severe left ventricular dysfunction requiring mechanical support, hypertension, and/or aortic insufficiency

Factors associated with authors’ conclusions

Table 2 shows NSS trial abstract conclusions compared to selected study characteristics, including the type of comparator used (placebo or active), funding (industry or public), point estimate (median and threshold), and lower confidence interval (median and threshold). There was no association between conclusions and type of comparator or funding. Both median point estimates and median lower confidence intervals decline as authors’ conclusions change from control superior to neutral to treatment superior (both p ≤ 0.006). Additionally, the clinically meaningful thresholds for point estimates and lower confidence intervals were statistically significantly associated with authors’ conclusions (p ≤ 0.002). These findings consistently show a similar association for NSS trials: lower point estimates and/or confidence intervals that suggest potentially clinically meaningful effects are associated with authors concluding that treatment is superior.

Sensitivity analyses

Table 3 shows the sensitivity analyses. Subgroups of trial size or primary and secondary prevention generally had similar proportions of trials with potentially meaningful effects.
The only exception was the trial size subgroup examining the proportion of NSS trials with confidence intervals that suggested potentially meaningful effects: 71% (25/35) for smaller NSS trials with <2000 patient-years versus 28% (16/57) for larger NSS trials with ≥2000 patient-years. The proportion of larger NSS trials with a point estimate and/or confidence interval including potentially meaningful effects was 53% (30/57).

Lastly, NSS trials were re-examined using increased potentially clinically meaningful thresholds. The increased thresholds were a relative risk reduction of ≥15% for point estimates and ≥35% for lower confidence intervals. In 15% of NSS trials (14/92) both the increased point estimate and confidence interval included potentially meaningful effects, in 11% (10/92) only one of the two included a potentially meaningful effect, and in 74% (68/92) neither threshold was met.

Discussion

In 61% of NSS cardiovascular trials, the primary outcome had a confidence interval that included an effect similar to or better than statin therapy (relative risk reduction ≥25%) and/or a point estimate similar to or better than ezetimibe (≥6%). These results suggest that if we were to strictly focus on a dichotomous finding of whether results are SS or NSS, we run the risk of dismissing a treatment in almost two thirds of NSS trials that could potentially have meaningful effects. Furthermore, about one third of NSS trials had an even higher probability of potentially clinically meaningful effects because both confidence intervals and point estimates included potentially meaningful effects. In fact, visual inspection of Fig. 3 shows that the distribution of the effects is very similar between SS trials favoring treatment and NSS trials when both confidence interval and point estimates include potentially meaningful effects.
This further suggests that strict adherence to an arbitrary threshold for statistical significance may serve poorly as a judgment of treatment benefit.

Within NSS trials, authors’ conclusions were associated with the potentially meaningful effects in the confidence intervals and point estimates. For example, both the point estimate and confidence intervals included potentially meaningful effects in 67% of NSS trials in which the authors concluded treatment was superior. In contrast, both the point estimate and confidence intervals included potentially meaningful effects in only 6% of NSS trials in which the authors concluded control was superior. Past research suggested that just over half of NSS studies have conclusions that are unjustifiably positive and inconsistent with the results [7]. However, our study suggests that some of these favorable interpretations may relate to potentially meaningful benefits suggested in the confidence intervals and/or point estimates. Given this and the recommendations of CONSORT regarding the presentation of results [1], future research evaluating authors’ interpretations or conclusions of NSS trials should assess trial outcomes beyond statistical significance testing.

Fig. 2 Flow of trial primary outcome including presence of statistical significance and abstract conclusion

Fig. 3 Forest plot of trial primary outcomes organized by statistical significance testing and by whether potentially clinically meaningful effects are indicated by the point estimates (≤0.94) and/or the confidence intervals (≤0.75). RCT randomized controlled trial. * Not statistically significant as p-value adjusted for multiple comparisons

Potentially meaningful effects in the point estimates and confidence intervals are not the only factors influencing authors’ conclusions.
For example, 28% of NSS trials with a neutral conclusion had both a lower confidence interval and point estimate suggestive of potentially meaningful effects. Perhaps these authors are basing their conclusions solely on statistical significance but it is also possible that other elements of the trial results or intervention play a role: adverse events, costs, and secondary outcomes are all potentially relevant.

Our results were sensitive to two possibly predictable factors. First, trials of smaller size frequently have less precision in the estimate and thus broader confidence intervals. Within our study, this could result in more of the smaller trials having lower confidence intervals crossing a potentially meaningful threshold. This did occur but most of the trials included in this review were large. Therefore, the proportion of NSS trials in which the point estimate and/or the confidence interval included potentially meaningful effects was only slightly lower in larger trials (having ≥2000 patient-years) than overall (53% versus 61%, respectively). Second, modification of the thresholds of potentially clinically meaningful effects foreseeably reduced the proportion of trials with potentially meaningful effects. The proportion of NSS trials in which the point estimate and/or the confidence interval included potentially meaningful effects was 61% in our primary analysis but fell to 26% when the relative risk reduction thresholds were increased to ≥15% for point estimates and ≥35% for confidence intervals. However, even with these stricter criteria, a quarter of all NSS cardiovascular trials had potentially meaningful effects.

Fig. 4 The proportion of lower confidence intervals and/or point estimates that suggest potentially meaningful effects within conclusions from not statistically significant trials. Potentially meaningful lower confidence interval ≤0.75 relative risk; potentially meaningful point estimate ≤0.94 relative risk

Table 2 Abstract conclusions of included not statistically significant trials with a superiority design categorized by study characteristics

Characteristic | Authors’ conclusion in the abstract: Control superior | Neutral | Treatment superior | p-value
Number of studies | 16 | 58 | 18 |
Comparator
  Placebo/nothing: 49 studies (%) | 12 (24) | 29 (59) | 8 (16) | 0.15a
  Standard/active comparator: 43 studies (%) | 4 (9) | 29 (67) | 10 (23) |
Funding
  Industry: 37 studies (%) | 9 (24) | 21 (57) | 7 (19) | 0.14a
  Mixed: 35 studies (%) | 7 (20) | 20 (57) | 8 (23) |
  Public: 20 studies (%) | 0 | 17 (85) | 3 (15) |
Point estimate
  Median (interquartile range) | 0.99 (0.96–1.09) | 0.95 (0.90–1.01) | 0.88 (0.83–0.98) | 0.006b
  Point estimate >0.94: 48 studies (%) | 14 (29) | 29 (60) | 5 (10) | 0.002a
  Point estimate ≤0.94: 44 studies (%) | 2 (5) | 29 (66) | 13 (30) |
Lower confidence interval
  Median (interquartile range) | 0.84 (0.73–0.89) | 0.79 (0.66–0.86) | 0.66 (0.57–0.69) | 0.005b
  Confidence interval >0.75: 51 studies (%) | 11 (22) | 37 (73) | 3 (6) | 0.001a
  Confidence interval ≤0.75: 41 studies (%) | 5 (12) | 21 (51) | 15 (37) |
aChi-square
bIndependent samples median test

Despite our findings, it is important not to over-interpret our results and assume that we are suggesting that a 6% relative risk reduction is a meaningful effect in all populations. Nor would we suggest all researchers use these thresholds for sample size estimation and/or extend or repeat studies until these small benefits are entirely ruled out. All interventions, and the trials assessing their clinical value, need to be considered in the broader context of many relevant factors, including overall risk of the primary outcome, adverse events, costs, inconvenience, and alternative interventions. We hope this paper can draw attention to the need to use confidence intervals and describe potentially meaningful effects. Fortunately, it appears that a number of authors are already doing this.
Moreover, we support the advice [19] that authors and evidence-users move away from the dogmatic adherence to hypothesis testing that leads some to believe that a p-value of 0.049 means a positive trial and that the treatment works, while a p-value of 0.051 means a negative trial and that the treatment does not work.

There are some notable limitations to our study. First, there are many factors involved in how authors interpret their research but our study focused only on point estimates and confidence intervals of primary outcomes. Second, we focused on cardiovascular trials with hard clinical (MACE) endpoints and so confirmation is required to determine if results would be similar for research in other conditions like chronic obstructive pulmonary disease or infectious disease. Third, our definitions of potentially clinically meaningful effects may be seen as arbitrary or too generous. There is no agreed-on minimal clinically important effect for MACE outcomes so we derived our definition from established therapies, although some will certainly feel the thresholds are too generous. We used somewhat liberal thresholds because our goal was to determine if results included any “potentially” clinically meaningful effects but we also performed a sensitivity analysis with stricter criteria. While some will see these cut-offs as arbitrary, a goal of this paper is to reflect on the rigid adherence to the 0.05 statistical significance threshold, which itself can be considered arbitrary. Fourth, we used relative margins. The use of relative margins allows for easier comparison across trials because any assessment of absolute effects must also account for time. Fifth, although we assessed authors’ conclusions by focusing on abstract conclusions, this is a previously used method of rating conclusions [12] and the abstract conclusion is the most likely location for promotion of results [7].
It should also be noted that the abstract conclusions, like any part of the articles, may have been modified through the peer-review process and editorial recommendations. It is not possible to clarify to what degree, if any, this occurred but we suspect it is small.

Table 3 Sensitivity analysis of not statistically significant randomized controlled trials

Subgroups | Categories | Point estimate relative risk ≤0.94 | Point estimate relative risk >0.94 | Confidence interval relative risk ≤0.75 | Confidence interval relative risk >0.75
Study size (in patient-years) | <2000 patient-years (n = 35) | 16 (46%) | 19 (54%) | 25 (71%) | 10 (29%)
Study size (in patient-years) | ≥2000 patient-years (n = 57) | 28 (49%) | 29 (51%) | 16 (28%) | 41 (72%)
Primary versus secondary prevention | Primary (n = 13) | 5 (38%) | 8 (62%) | 5 (38%) | 8 (62%)
Primary versus secondary prevention | Secondary (n = 79) | 39 (49%) | 40 (51%) | 36 (46%) | 43 (54%)
Increase in potentially clinically meaningful thresholds | (n = 92) | ≤0.85: 16 (17%) | >0.85: 76 (83%) | ≤0.65: 22 (24%) | >0.65: 70 (76%)

Conclusions

In up to 61% of NSS cardiovascular trials, the primary outcome has a point estimate and/or confidence interval that includes potentially clinically meaningful effects. Furthermore, among the NSS cardiovascular trials, authors’ conclusions were positively associated with point estimates and lower confidence intervals that suggest greater potential effects. In fact, both the point estimates and confidence intervals included potentially meaningful effects in 67% of trials (12/18) in which the authors concluded that treatment was superior, compared to only 6% (1/16) in which authors concluded that control was superior. Given the frequency of NSS cardiovascular trials, it is reassuring that many authors look beyond statistical significance testing and consider the potentially meaningful clinical effects of their results.
Additionally, journals and evidence-users should be encouraged, as directed by CONSORT, to consider point estimates and confidence intervals in the context of potentially clinically meaningful effects and not strictly for hypothesis and statistical significance testing.

Additional file
Additional file 1: Supplementary material: included studies. (DOCX 61 kb)

Acknowledgements
None.

Funding
No external funding.

Availability of data and materials
The list of included studies and whether the results were statistically significant is available in the online supplement.

Authors’ contributions
GMA conceived of the study and GMA, SG, MRK, JM, CK, and AJL refined and designed the study. VK, SK, EB and GMA performed study selection while CRF, VK, SK, and GMA performed data extraction. GMA, SG, JM, MRK, CK, and AJL performed analysis. OB provided statistical advice and assistance with analysis. All authors helped refine the study concept, provided input in analysis, critically contributed to the manuscript, had full access to the data, and read and approved the final manuscript.

Competing interest
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Ethics approval and consent to participate
Not applicable.

Author details
1 Evidence-Based Medicine, Department of Family Medicine - Research Program, University of Alberta, 6-10 University Terrace, Edmonton, AB T6G 2T4, Canada.
2 Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, British Columbia, Canada.
3 Family Medicine, McGill University, Montreal, QC, Canada.
4 Medical Education, Department of Family Medicine, University of Alberta, Edmonton, AB, Canada.

Received: 13 November 2016 Accepted: 16 February 2017

References
1. Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869. doi:10.1136/bmj.c869.
2. Rothman KJ. A show of confidence. N Engl J Med. 1978;299:1362–3.
3. Gardner MJ, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J (Clin Res Ed). 1986;292:746–50.
4. Cummings P, Koepsell TD. P values vs estimates of association with confidence intervals. Arch Pediatr Adolesc Med. 2010;164:193–6.
5. Chavalarias D, Wallach JD, Li AHT, Ioannidis JPA. Evolution of reporting P values in the biomedical literature, 1990-2015. JAMA. 2016;315:1141–8.
6. McCormack J, Vandermeer B, Allan GM. How confidence intervals become confusion intervals. BMC Med Res Methodol. 2013;13(1):134.
7. Boutron I, Dutton S, Ravaud P, Altman DG. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA. 2010;303(20):2058–64.
8. Lockyer S, Hodgson R, Dumville JC, Cullum N. “Spin” in wound care research: the reporting and interpretation of randomized controlled trials with statistically non-significant primary outcome results or unspecified primary outcomes. Trials. 2013;14:371.
9. Patel SV, Van Koughnett JA, Howe B, Wexner SD. Spin is common in studies assessing robotic colorectal surgery: an assessment of reporting and interpretation of study results. Dis Colon Rectum. 2015;58(9):878–84.
10. Patel SV, Chadi SA, Choi J, Colquhoun PH. The use of “spin” in laparoscopic lower GI surgical trials with nonsignificant results: an assessment of reporting and interpretation of the primary outcomes. Dis Colon Rectum. 2013;56(12):1388–94.
11. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535. doi:10.1136/bmj.b2535.
12. Als-Nielsen B, Chen W, Gluud C, Kjaergard LL. Association of funding and conclusions in randomized drug trials: a reflection of treatment effect or adverse events? JAMA. 2003;290(7):921–8.
13. Moher D, Dulberg CS, Wells GA. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA. 1994;272(2):122–4.
14. Cannon CP, Blazing MA, Giugliano RP, McCagg A, White JA, Theroux P, Darius H, Lewis BS, Ophuis TO, Jukema JW, De Ferrari GM, Ruzyllo W, De Lucca P, Im K, Bohula EA, Reist C, Wiviott SD, Tershakovec AM, Musliner TA, Braunwald E, Califf RM, IMPROVE-IT Investigators. Ezetimibe added to statin therapy after acute coronary syndromes. N Engl J Med. 2015;372(25):2387–97.
15. Zinman B, Wanner C, Lachin JM, Fitchett D, Bluhmki E, Hantel S, Mattheus M, Devins T, Johansen OE, Woerle HJ, Broedl UC, Inzucchi SE, EMPA-REG OUTCOME Investigators. Empagliflozin, cardiovascular outcomes, and mortality in type 2 diabetes. N Engl J Med. 2015;373(22):2117–28.
16. Jarcho JA, Keaney Jr JF. Proof that lower is better–LDL cholesterol and IMPROVE-IT. N Engl J Med. 2015;372:2448–50.
17. Grant PJ. Empagliflozin in diabetes: a therapeutic light at the end of the cardiovascular tunnel? Diab Vasc Dis Res. 2015;12(6):394–5.
18. Taylor F, Huffman MD, Macedo AF, Moore TH, Burke M, Davey Smith G, Ward K, Ebrahim S. Statins for the primary prevention of cardiovascular disease. Cochrane Database Syst Rev. 2013;1:CD004816.
19. Hackshaw A, Kirkwood A. Interpreting and reporting clinical trials with results of borderline significance. BMJ. 2011;343:d3340. doi:10.1136/bmj.d3340.

