UBC Faculty Research and Publications

Identification of validated case definitions for chronic disease using electronic medical records: a…
Souri, Sepideh; Symonds, Nicola E; Rouhi, Azin; Lethebe, Brendan C; Garies, Stephanie; Ronksley, Paul E; Williamson, Tyler S; Fabreau, Gabriel E; Birtwhistle, Richard; Quan, Hude; McBrien, Kerry A
Feb 23, 2017

Full Text


PROTOCOL Open Access

Souri et al. Systematic Reviews (2017) 6:38 DOI 10.1186/s13643-017-0431-9

Identification of validated case definitions for chronic disease using electronic medical records: a systematic review protocol

Sepideh Souri¹, Nicola E. Symonds², Azin Rouhi³, Brendan C. Lethebe¹, Stephanie Garies¹,⁴, Paul E. Ronksley¹, Tyler S. Williamson¹, Gabriel E. Fabreau⁵, Richard Birtwhistle⁶, Hude Quan¹ and Kerry A. McBrien¹,⁴*

Abstract

Background: Primary care electronic medical record (EMR) data are being used for research, surveillance, and clinical monitoring. To broaden the reach and usability of EMR data, case definitions must be specified to identify and characterize important chronic conditions. The purpose of this study is to identify all case definitions for a set of chronic conditions that have been tested and validated in primary care EMR and EMR-linked data. This work will provide a reference list of case definitions, together with their performance metrics, and will identify gaps where new case definitions are needed.

Methods: We will consider a set of 40 chronic conditions, previously identified as potentially important for surveillance in a review of multimorbidity measures. We will perform a systematic search of the published literature to identify studies that describe case definitions for clinical conditions in EMR data and report the performance of these definitions. We will stratify our search by studies that use EMR data alone and those that use EMR-linked data. We will compare the performance of different definitions for the same conditions and explore the influence of data source, jurisdiction, and patient population.

Discussion: EMR data from primary care providers can be compiled and used to benefit the healthcare system. Not only does this work have the potential to further develop disease surveillance and health knowledge, but EMR surveillance systems can also provide rapid feedback to participating physicians regarding their patients. Existing case definitions will serve as a starting point for the development and validation of new case definitions and will enable better surveillance, research, and practice feedback based on detailed clinical EMR data.

Systematic review registration: PROSPERO CRD42016040020

Keywords: Systematic review, Electronic medical record, Chronic disease, Case definitions, Big data

* Correspondence: kamcbrie@ucalgary.ca
¹Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
⁴Department of Family Medicine, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Full list of author information is available at the end of the article

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Rationale

The collection and storage of vast amounts of health data is growing rapidly [1]. These “big data” include electronic medical record (EMR) data and traditional coded administrative health data. EMRs, which contain comprehensive demographic and clinical information including diagnoses, prescriptions, physical measurements, and laboratory test results, are increasingly used in the primary care setting to record patient information and provide patient care [2]. EMR data are used for research, surveillance, and clinical monitoring in many countries; however, their potential is largely untapped in Canada [3].

Administrative health data are routinely used for research and surveillance, as most are population-based, relatively inexpensive compared to primary data collection, and exist in a structured format [4]. Like administrative data, information contained in EMRs also has the potential to be collected in databases and used in research and public health surveillance [3]. EMR data can be used alone or, in some cases, linked to traditional coded administrative health data (EMR-linked data). An important step in conducting research using EMR data is to identify subgroups of patients with a specific disease or condition of interest using validated disease case definitions.

Case definitions, also referred to as phenotypes, are automated computerized algorithms applied to secondary data that allow for identification of specific cohorts within EMR databases without the need for manual chart review by a researcher or clinician [5]. In general, case definitions are validated against a gold standard for disease identification, most often manual review of patient charts. Researchers around the world have developed and validated case definitions for different disease conditions and applied them to EMR data. Validated disease case definitions have the potential to be modified and applied to various EMR databases to enable better surveillance, research, and practice feedback based on detailed clinical EMR data.

Chronic diseases are a significant burden to patients and the health care system. They include both physical and mental illnesses and affect at least one third of all Canadians [6]. Barnett et al. conducted a literature review, followed by a consensus exercise, to identify a set of 40 conditions likely to be chronic and to have a significant impact on patients’ treatment needs, function, quality of life, morbidity, and mortality [7]. A systematic review of case definitions applied to administrative health data identified validated algorithms to detect 30 of these conditions [8]. No previous work has identified and reported on validated disease case definitions for chronic disease in EMR or EMR-linked data.

Objective

The objective of this study is to identify all case definitions for a set of chronic conditions that have been tested and validated in primary care EMR and EMR-linked data. We will conduct a systematic review of primary studies that report on the development and validation of chronic disease case definitions for use in primary care EMR and EMR-linked data. This work will allow us to collect and report on a comprehensive set of chronic conditions with validated case definitions. Not only will this be a valuable resource for researchers using EMR databases, but knowledge of these existing definitions will also pave the way for development and validation of additional case definitions for diseases where such definitions are lacking.

Methods

We will perform a systematic review following a predetermined protocol, in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guidelines [9].

Data sources and search strategy

We will search MEDLINE and MEDLINE-in-Process (Ovid) and Embase (Ovid) with no date, country, or language restrictions. We will also search the bibliographies of all identified studies. Further, the websites for EMR and administrative databases (e.g., Clinical Practice Research Datalink [10], www.cprd.com) will be searched for bibliographic lists, and content experts will be contacted for information about other potential ongoing or unpublished studies.
The search of online databases will include three themes:
1. Electronic medical records
2. Case definition
3. Validation study

We will use a comprehensive set of MeSH terms and keyword searches for each of the three themes to ensure we capture all relevant references. For example, the term “EMR” may be synonymous with a number of relevant keywords (e.g., computerized medical records, electronic health record, EHR). These three searches will then be combined using the Boolean operator “AND.” Additional file 1 outlines our detailed MEDLINE search strategy. Terms used to define chronic conditions will be intentionally omitted to ensure capture of any chronic condition, including our pre-specified list of 40 conditions as shown in Table 1 [7].

Table 1 List of the 40 chronic disease conditions (Barnett et al. [7])
• Hypertension
• Depression
• Painful condition
• Asthma
• Coronary heart disease
• Treated dyspepsia
• Diabetes
• Thyroid disorders
• Rheumatoid arthritis
• Hearing loss
• Chronic obstructive pulmonary disease
• Anxiety and other somatoform disorders
• Irritable bowel syndrome
• New diagnosis of cancer in last 5 years
• Alcohol problems
• Other psychoactive substance misuse
• Treated constipation
• Stroke and transient ischemic attack
• Chronic kidney disease
• Diverticular disease of the intestine
• Atrial fibrillation
• Peripheral vascular disease
• Heart failure
• Prostate disorders
• Glaucoma
• Epilepsy
• Dementia
• Schizophrenia
• Psoriasis or eczema
• Inflammatory bowel disease
• Migraine
• Blindness and low vision
• Chronic sinusitis
• Learning disability
• Anorexia or bulimia
• Bronchiectasis
• Parkinson’s disease
• Multiple sclerosis
• Viral hepatitis
• Chronic liver disease

Study selection

Two reviewers will independently screen all abstracts. Articles that report original data for the development and validation of chronic disease case definitions in primary care EMR data or EMR-linked data will be considered for further review. The initial screen will be intentionally broad to capture any relevant literature. All citations where either reviewer feels that further review is warranted will be kept for full text review. Agreement will be quantified at this stage using the kappa statistic, and any disagreements will be resolved by consensus or by a third reviewer as needed. Bibliographic details from all stages of the review will be managed with the Synthesis software package [11].

The same two reviewers will scan full text articles for the following inclusion criteria:
1. The database under study is either a primary care EMR database or a primary care EMR database linked to at least one administrative health database.
2. There is a description of a computerized case definition for a specific disease or condition.
3. The condition or conditions under study include at least one of the 40 chronic conditions identified by Barnett et al. [7].
4. A clearly stated reference standard is used to validate the case definition.
5. Validity outcomes are reported (i.e., sensitivity, specificity, positive predictive value, negative predictive value, kappa, receiver operating characteristic curve, likelihood ratio).

Exclusion criteria: Non-human studies will be excluded. The study will be limited to diseases that present in a primary care setting. Studies reporting on dental health or other non-primary care settings will be excluded. We will also exclude studies where EMR data are based on patient self-report.

Data extraction

A data extraction form will be used to collect information from each included study. The following data elements will be extracted in duplicate: publication date, first author, country, EMR platform, administrative data sources (in the case of linked studies), description of case definition, disease(s) under study, and measures of validity (e.g., sensitivity, specificity).

Risk of bias assessment

Included studies will be assessed for quality using a component approach. We will use relevant items from the QUADAS quality assessment tool for diagnostic accuracy studies [12]. This tool includes an assessment of bias in several domains, including patient selection, the validation strategy, and reporting of outcomes. Two authors will independently assess risk of bias in each domain and report the risk of bias as high, low, or unclear. Disagreements will be resolved by discussion or with a third reviewer as needed.

Data synthesis

The number of articles identified, including those that are included and excluded, will be summarized using a flow chart. Results from included studies will be described in detail, grouped by disease or health condition, and reported for EMR and EMR-linked data separately. For each chronic condition, relevant elements from each study will be reported and summarized. Data will not be pooled, because the review spans many disease conditions and we anticipate heterogeneity between the databases used across the different studies. We will stratify our findings by data source (number and type), jurisdiction, and patient population. Finally, given the complementary nature of our review with that done by Tonelli et al. on case definitions in administrative health data [8], we will produce a comparison table that describes case definitions and their metrics for each of the three major types of data: EMR data alone, EMR-linked data, and administrative data alone.

In addition to summarizing case definitions and their performance metrics by disease condition, we will also perform a secondary analysis focused on the methods employed across case definitions. We will produce a detailed inventory of the combinations of variables used, the data fields accessed, and the computer programming methods used.
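Inclusion criterion 5 lists the validity outcomes that included studies must report, and the secondary analysis inventories the variables and data fields that case definitions combine. As an illustration only, the Python sketch below applies a hypothetical diabetes case definition (any ICD-9 code beginning with 250, any HbA1c of at least 6.5%, or any listed antidiabetic prescription) to four toy records, cross-tabulates the result against a hypothetical chart-review reference standard, and computes sensitivity, specificity, positive and negative predictive values, likelihood ratios, and Cohen's kappa from the resulting 2×2 table. The record fields, the rule, the drug list, the patient data, and every name (`PatientRecord`, `diabetes_case_definition`, `confusion_counts`, `validity_metrics`) are assumptions made for this example; none is a validated definition or a method taken from this protocol or its included studies.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified EMR extract for one patient (illustrative only)."""
    patient_id: str
    diagnosis_codes: list[str] = field(default_factory=list)  # e.g. ICD-9 billing codes
    hba1c_percent: list[float] = field(default_factory=list)  # laboratory results
    medications: list[str] = field(default_factory=list)      # prescribed drug names


# Hypothetical drug list for illustration; not a validated code set.
ANTIDIABETIC_DRUGS = {"metformin", "insulin", "gliclazide"}


def diabetes_case_definition(rec: PatientRecord) -> bool:
    """Hypothetical rule combining three EMR data fields (not a published definition):
    any ICD-9 code beginning with 250, OR any HbA1c >= 6.5%,
    OR any prescription from the illustrative drug list."""
    has_code = any(code.startswith("250") for code in rec.diagnosis_codes)
    has_lab = any(value >= 6.5 for value in rec.hba1c_percent)
    has_drug = any(drug.lower() in ANTIDIABETIC_DRUGS for drug in rec.medications)
    return has_code or has_lab or has_drug


def confusion_counts(predicted: list[bool], reference: list[bool]) -> tuple[int, int, int, int]:
    """Cross-tabulate case-definition output against the reference standard.
    Returns (true positives, false positives, false negatives, true negatives)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    return tp, fp, fn, tn


def _ratio(num: float, den: float) -> float:
    """Divide, returning NaN (undefined) instead of raising on a zero denominator."""
    return num / den if den else math.nan


def validity_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard 2x2 validity measures for a case definition vs. a reference standard."""
    n = tp + fp + fn + tn
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_obs = _ratio(tp + tn, n)
    p_exp = _ratio((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "lr_positive": _ratio(sens, 1 - spec),
        "lr_negative": _ratio(1 - sens, spec),
        "kappa": _ratio(p_obs - p_exp, 1 - p_exp),
    }


if __name__ == "__main__":
    records = [
        PatientRecord("p1", diagnosis_codes=["250.00"]),
        PatientRecord("p2", hba1c_percent=[7.1]),
        PatientRecord("p3", medications=["Metformin"]),
        PatientRecord("p4", diagnosis_codes=["401.9"], hba1c_percent=[5.4]),
    ]
    # Hypothetical chart-review results serving as the reference standard.
    chart_review = [True, True, False, False]
    predicted = [diabetes_case_definition(r) for r in records]
    counts = confusion_counts(predicted, chart_review)
    print(counts)
    print(validity_metrics(*counts))
```

The same kappa calculation applies to screening agreement between the two reviewers: pass one reviewer's include/exclude decisions as `predicted` and the other's as `reference`. Undefined ratios (for example, a positive likelihood ratio when specificity is 1) are returned as NaN rather than raising an error, a convenience choice made for this sketch.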
Within disease conditions for which there is more than one validated case definition, we will perform a descriptive analysis that compares the specifications of the case definitions and their relative performance.

Discussion

Data collected in primary care EMRs are becoming an important resource for conducting research and understanding disease patterns and prevalence. The recent and widespread uptake of EMRs in primary care has created a new source of detailed clinical information, not found in administrative health data, that has the potential to be used in research and surveillance [1, 3]. An essential step in the use of EMR data in research is applying validated disease case definitions to identify a group of patients with a condition under study.

We undertook this project to collect and report on all studies that have developed and validated disease case definitions using EMR data. Validated case definitions are important tools, since they can be adapted and applied to different EMR databases to conduct research. In addition, this study will allow us to understand the range of disease conditions for which validated case definitions have been developed and encourage further research to develop and validate case definitions for other disease conditions where such definitions do not exist.

Specifically, our results will improve our ability to analyze chronic diseases at the population level and, further, to examine the effects of multimorbidity. The existence of validated case definitions for EMRs will also allow precise characterization of individual patients, enabling physicians to tailor practice guidelines according to individual risk profiles, as well as enhance clinical feedback to physicians and practices by making quality metrics more specific to their practice panel. Additionally, this review will enable researchers to access the detailed clinical information contained in EMR data. Finally, our results will improve standardization of definitions used for disease conditions and will ultimately improve comparison of surveillance metrics at the international level.

Additional file

Additional file 1: Proposed search strategy for Ovid MEDLINE®. (DOCX 62 kb)

Abbreviations

EHR: Electronic health record; EMR: Electronic medical record; PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-analyses; QUADAS: Quality Assessment of Diagnostic Accuracy Studies

Acknowledgements

Not applicable.

Funding

This was an investigator-initiated project. No sources of funding are related to the research reported.

Availability of data and materials

All datasets and materials are publicly available.

Authors’ contributions

This review was conceived by PER, TSW, GEF, and KAM, and the protocol was designed with input by SS, NES, AR, BCL, SG, RB, and HQ. NES, AR, BCL, SG, and KAM designed the search strategy. RB and HQ contributed as knowledge users. SS, NES, AR, BCL, PER, and KAM drafted the manuscript, and all authors critically revised it and approved the final version. KAM will act as the guarantor for this review.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

All data will be obtained from publicly available materials and will not require ethics approval.

Author details

¹Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
²Faculty of Science, University of British Columbia, Vancouver, Canada.
³Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Canada.
⁴Department of Family Medicine, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
⁵Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
⁶Department of Family Medicine, Faculty of Health Sciences, Queen’s University, Kingston, Ontario, Canada.

Received: 25 July 2016 Accepted: 10 February 2017

References

1. Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA. 2013;309:1351–2.
2. Biro SC, Barber DT, Kotecha JA. Trends in the use of electronic medical records. Can Fam Physician. 2012;58:e21.
3. Birtwhistle R, Williamson T. Primary care electronic medical records: a new data source for research in Canada. CMAJ. 2015;187:239–40.
4. Quan H, Smith M, Bartlett-Esquilant G, Johansen H, Tu K, Lix L, Hypertension Outcome and Surveillance Team. Mining administrative health databases to advance medical science: geographical considerations and untapped potential in Canada. Can J Cardiol. 2012;28:152–4.
5. Williamson T, Green ME, Birtwhistle R, Khan S, Garies S, Wong ST, et al. Validating the 8 CPCSSN case definitions for chronic disease surveillance in a primary care database of electronic health records. Ann Fam Med. 2014;12:367–72.
6. Broemeling AM, Watson DE, Prebtani F. Population patterns of chronic health conditions, co-morbidity and healthcare use in Canada: implications for policy and practice. Healthc Q. 2008;11:70–6.
7. Barnett K, Mercer SW, Norbury M, Watt G, Wyke S, Guthrie B. Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study. Lancet. 2012;380:37–43.
8. Tonelli M, Wiebe N, Fortin M, Guthrie B, Hemmelgarn BR, James MT, et al. Methods for identifying 30 chronic conditions: application to administrative data. BMC Med Inform Decis Mak. 2015;15:31.
9. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med. 2009;151:264–9.
10. Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data resource profile: clinical practice research datalink (CPRD). Int J Epidemiol. 2015;44:827–36.
11. Yergens D. Synthesis v2.4 and v3.0. 2015. http://www.synthesis.info. Accessed 1 Jun 2015.
12. Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25.

