{"@context":{"@language":"en","Affiliation":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","AggregatedSourceRepository":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","Citation":"https:\/\/open.library.ubc.ca\/terms#identifierCitation","CopyrightHolder":"https:\/\/open.library.ubc.ca\/terms#rightsCopyright","Creator":"http:\/\/purl.org\/dc\/terms\/creator","DateAvailable":"http:\/\/purl.org\/dc\/terms\/issued","DateIssued":"http:\/\/purl.org\/dc\/terms\/issued","Description":"http:\/\/purl.org\/dc\/terms\/description","DigitalResourceOriginalRecord":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","FullText":"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note","Genre":"http:\/\/www.europeana.eu\/schemas\/edm\/hasType","IsShownAt":"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt","Language":"http:\/\/purl.org\/dc\/terms\/language","PeerReviewStatus":"https:\/\/open.library.ubc.ca\/terms#peerReviewStatus","Provider":"http:\/\/www.europeana.eu\/schemas\/edm\/provider","Publisher":"http:\/\/purl.org\/dc\/terms\/publisher","PublisherDOI":"https:\/\/open.library.ubc.ca\/terms#publisherDOI","Rights":"http:\/\/purl.org\/dc\/terms\/rights","RightsURI":"https:\/\/open.library.ubc.ca\/terms#rightsURI","ScholarlyLevel":"https:\/\/open.library.ubc.ca\/terms#scholarLevel","Subject":"http:\/\/purl.org\/dc\/terms\/subject","Title":"http:\/\/purl.org\/dc\/terms\/title","Type":"http:\/\/purl.org\/dc\/terms\/type","URI":"https:\/\/open.library.ubc.ca\/terms#identifierURI","SortDate":"http:\/\/purl.org\/dc\/terms\/date"},"Affiliation":[{"@value":"Medicine, Faculty of","@language":"en"},{"@value":"Non UBC","@language":"en"},{"@value":"Anesthesiology, Pharmacology and Therapeutics, Department of","@language":"en"}],"AggregatedSourceRepository":[{"@value":"DSpace","@language":"en"}],"Citation":[{"@value":"BMC Medical Research Methodology. 
2021 Aug 28;21(1):179","@language":"en"}],"CopyrightHolder":[{"@value":"The Author(s)","@language":"en"}],"Creator":[{"@value":"Petrosyan, Yelena","@language":"en"},{"@value":"Thavorn, Kednapa","@language":"en"},{"@value":"Smith, Glenys","@language":"en"},{"@value":"Maclure, Malcolm","@language":"en"},{"@value":"Preston, Roanne","@language":"en"},{"@value":"van Walravan, Carl","@language":"en"},{"@value":"Forster, Alan J.","@language":"en"}],"DateAvailable":[{"@value":"2021-10-04T21:18:44Z","@language":"en"}],"DateIssued":[{"@value":"2021-08-28","@language":"en"}],"Description":[{"@value":"Background\r\n                Since primary data collection can be time-consuming and expensive, surgical site infections (SSIs) could ideally be monitored using routinely collected administrative data. We derived and internally validated efficient algorithms to identify SSIs within 30\u2009days after surgery with health administrative data, using Machine Learning algorithms.\r\n              \r\n              \r\n                Methods\r\n                All patients enrolled in the National Surgical Quality Improvement Program from the Ottawa Hospital were linked to administrative datasets in Ontario, Canada. Machine Learning approaches, including a Random Forests algorithm and the high-performance logistic regression, were used to derive parsimonious models to predict SSI status. Finally, a risk score methodology was used to transform the final models into the risk score system. The SSI risk models were validated in the validation datasets.\r\n              \r\n              \r\n                Results\r\n                Of 14,351 patients, 795 (5.5%) had an SSI. First, separate predictive models were built for three distinct administrative datasets. 
The final model, including hospitalization diagnostic, physician diagnostic and procedure codes, demonstrated excellent discrimination (C statistics, 0.91, 95% CI, 0.90\u20130.92) and calibration (Hosmer-Lemeshow \u03c72 statistics, 4.531, p\u2009=\u20090.402).\r\n              \r\n              \r\n                Conclusion\r\n                We demonstrated that health administrative data can be effectively used to identify SSIs. Machine learning algorithms have shown a high degree of accuracy in predicting postoperative SSIs and can integrate and utilize a large amount of administrative data. External validation of this model is required before it can be routinely used to identify SSIs.","@language":"en"}],"DigitalResourceOriginalRecord":[{"@value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/79879?expand=metadata","@language":"en"}],"FullText":[{"@value":"RESEARCH Open AccessPredicting postoperative surgical siteinfection with administrative data: arandom forests algorithmYelena Petrosyan1, Kednapa Thavorn1,2,3,4*, Glenys Smith3, Malcolm Maclure5, Roanne Preston5,Carl van Walravan1,2,3 and Alan J. Forster1,3,6AbstractBackground: Since primary data collection can be time-consuming and expensive, surgical site infections (SSIs)could ideally be monitored using routinely collected administrative data. We derived and internally validatedefficient algorithms to identify SSIs within 30 days after surgery with health administrative data, using MachineLearning algorithms.Methods: All patients enrolled in the National Surgical Quality Improvement Program from the Ottawa Hospitalwere linked to administrative datasets in Ontario, Canada. Machine Learning approaches, including a RandomForests algorithm and the high-performance logistic regression, were used to derive parsimonious models topredict SSI status. Finally, a risk score methodology was used to transform the final models into the risk scoresystem. 
The SSI risk models were validated in the validation datasets.Results: Of 14,351 patients, 795 (5.5%) had an SSI. First, separate predictive models were built for three distinctadministrative datasets. The final model, including hospitalization diagnostic, physician diagnostic and procedurecodes, demonstrated excellent discrimination (C statistics, 0.91, 95% CI, 0.90\u20130.92) and calibration (Hosmer-Lemeshow \u03c72 statistics, 4.531, p = 0.402).Conclusion: We demonstrated that health administrative data can be effectively used to identify SSIs. Machinelearning algorithms have shown a high degree of accuracy in predicting postoperative SSIs and can integrate andutilize a large amount of administrative data. External validation of this model is required before it can be routinelyused to identify SSIs.Keywords: Surgical site infection, Administrative data, Machine learning, Random forests, Data mining, Predictive modeling\u00a9 The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you giveappropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate ifchanges were made. The images or other third party material in this article are included in the article's Creative Commonslicence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commonslicence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtainpermission directly from the copyright holder. 
To view a copy of this licence, visit http:\/\/creativecommons.org\/licenses\/by\/4.0\/.The Creative Commons Public Domain Dedication waiver (http:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/) applies to thedata made available in this article, unless otherwise stated in a credit line to the data.* Correspondence: kthavorn@ohri.ca1Clinical Epidemiology, Ottawa Hospital Research Institute, 1053 Carling Ave,Ottawa, Ontario K1Y 4E9, Canada2School of Epidemiology and Public Health, University of Ottawa, 75 LaurierAve E, Ottawa, Ontario K1N 6N5, CanadaFull list of author information is available at the end of the articlePetrosyan et al. BMC Medical Research Methodology          (2021) 21:179 https:\/\/doi.org\/10.1186\/s12874-021-01369-9BackgroundSurgical site infection (SSI) is common and consideredone of the most common types of postoperative compli-cations [1]. SSIs are associated with substantial morbid-ity and mortality, prolonged hospital duration of stay,increased hospital readmission rate, and financial burdento health care systems [1\u20135]. Previous research hasshown the importance of effective prevention strategiestargeting both short- and long-term consequences ofSSI, which requires an ability to track SSIs [2]. Since theprimary data collection can be time-consuming and ex-pensive, routinely collected health administrative dataoffer ample opportunities to identify and monitor SSIs,and assess the impact of prevention strategies, given awide population coverage and minimal costs and efforts.Several studies have developed some accurate adminis-trative algorithms to identify SSIs [6\u201310], while otherstudies have found that SSI identification using adminis-trative data is imprecise [11]. 
However, previous studies were often based on small sample sizes and\/or a limited set of pre-selected variables to predict SSIs. Machine learning approaches have been successfully applied to create predictive models in several fields of study, including automatic medical diagnostics [12, 13]. With interpretable model parameters and ease of use, logistic regression can generate excellent models and serves as a commonly accepted statistical tool. The Random Forests approach is used where regression assumptions may be violated because many predictors are associated with a small number of outcomes [14]. It can cope with inter-correlation between multiple explanatory variables, since each predictor is selected randomly at each stage of the learning process [15], unlike standard regression approaches. Previous studies have indicated that the Random Forests approach may have better prediction accuracy than other machine learning methods [16, 17]. We hypothesized that the use of machine learning approaches and a large data set with many features would improve the accuracy of SSI prediction. This study aimed to develop efficient algorithms to identify SSIs within 30 days after surgery using health administrative data.\nMaterial and methods\nThis study was divided into three stages. In the first stage, a Random Forests algorithm was used to perform a preliminary screening of variables and to rank the importance of candidate variables. In the second stage, the 30 most important variables from the first stage were input into the high-performance logistic regression to build interpretable and parsimonious models for all three administrative datasets used in this study. 
Finally, we used risk score modeling methodology to transform the final logistic models from the second stage into the risk score system.\nSelection and description of participants\nThis study was performed at The Ottawa Hospital (TOH), Canada, a 1200-bed academic health sciences center providing approximately 90% of the major surgical operations in a catchment area of 1.2 million people. We identified all patients at TOH aged 18 years and older who underwent surgery and were included in the American College of Surgeons National Surgical Quality Improvement Program (NSQIP) data collection between April 1, 2010, and March 31, 2015. The NSQIP uses trained Surgical Clinical Reviewers to collect data using a combination of chart review and follow-up from the preoperative period through 30 days postoperatively. Patients were excluded if: 1) they were not eligible for the Ontario Health Insurance Plan (OHIP) or had an invalid OHIP number, because this was required for linkage to health administrative datasets; or 2) they had missing admission, discharge, or surgery dates.\nPopulation-based health administrative datasets\nWe linked the NSQIP dataset to three distinct population-based health administrative datasets housed at the Institute for Clinical and Evaluative Sciences (ICES). ICES is an independent, non-profit research institute whose legal status under Ontario\u2019s health information privacy law allows it to collect and analyze health care and demographic data, without informed consent, for health system evaluation and improvement. The use of data in this project was authorized under section 45 of Ontario\u2019s Personal Health Information Protection Act, which does not require review by a Research Ethics Board. 
The datasets included: 1) the Discharge Abstract Database and Same Day Surgery Database, to identify hospitalization records (ICD-10 codes), including admission and discharge dates and diagnoses; 2) the Physician Services Database, to retrieve all claims for services provided by all eligible health care providers; and 3) the Ontario Health Insurance Plan (OHIP) database, which contains physician diagnostic codes (ICD-9 codes) and diagnosis descriptions. All patients were followed for 30 days from the time of their surgery. All databases were linked using anonymized unique identifiers and analyzed at ICES at the University of Ottawa, Ontario. This study was approved by the Ottawa Health Science Network Research Ethics Board.\nStudy outcome\nAll individuals who had any type of SSI (i.e., superficial, deep, or organ space) (Additional file 1) within 30 days after surgery, according to the definition of the NSQIP protocol, were defined as having experienced an SSI.\nStatistical analysis\nThis study utilized a 3-stage predictive modeling approach based on the hybrid modeling approaches developed in previous studies [14, 15, 18]. All stages described below were applied to each administrative dataset used in this study to generate three sub-models that contributed to the omnibus SSI model.\nStage 1 \u2013 model development using random forests algorithm\nDetails of the Random Forests method have been described elsewhere [19\u201321]. In short, each classification tree is built using a bootstrap sample of the data, and a random subset of variables is selected at each split, thereby constructing a large collection of decision trees with controlled variation [22, 23] (Additional file 2). The Random Forests trees are not pruned, so as to obtain low-bias trees. 
Every tree in the forest casts a \u201cvote\u201d for the best classification for a given observation, and the class receiving the most votes becomes the prediction for that observation. The study cohort was first divided randomly into derivation (70%) and validation (30%) samples (Additional file 3). Then, the derivation data were sampled to create an in-bag partition (2\/3) to construct the decision tree, and a smaller out-of-bag partition (1\/3) to test the constructed tree and evaluate its performance by computing: 1) misclassification error, 2) C-statistics, and 3) model performance (sensitivity, specificity, etc.). The optimal number of trees and the size of the variable subset at each node were selected using the \u201ctuneRF\u201d function in R to minimize the misclassification error. Random Forests calculates estimates of variable importance for classification using the permutation variable importance measure (VIM) [19], which is based on the decrease in classification accuracy when the values of a variable in a node of a tree are permuted randomly. Finally, 10-fold cross-validation was used to evaluate the Random Forests model. We identified subsets of the top 30 important diagnostic or procedure codes to predict SSIs, using a mean decrease accuracy value of 0.02 as a cut-off point. The Random Forests analyses were performed in R statistical software (version 3.3.2) using the \u201crandomForest\u201d package [21].\nStage 2 \u2013 stepwise model selection using high-performance logistic regression approach\nThe Random Forests algorithm was used to perform a preliminary screening of variables and to obtain importance ranks. Then, the selected top 30 important predictors were input into the high-performance logistic model with stepwise variable selection to find the best parsimonious model to predict SSIs [14, 24, 25]. 
High-performance logistic regression (PROC HPLOGISTIC) belongs to the high-performance analytics procedures that can be used to reduce the dimension or identify important variables to obtain parsimonious predictive models [26]. It permits several link functions and can handle ordinal and nominal data with more than two response categories [26]. The Schwarz Bayesian Criterion (SBC) was used as a penalized measure of fit for the logistic regression model to help avoid over-fitting.\nStage 3 \u2013 point system or risk scores\nWe used the methods suggested by Sullivan et al. [27] to summarize each logistic model from stage 2 as a point system. The point system or risk scores provide statistical information in a more clinically useful form than logistic regression models, as the generalizability of models developed from data from a single hospital or a small group of hospitals to other patient populations is questionable [28, 29]. Clinical prediction models and associated risk-scoring systems are popular statistical methods as they permit a rapid assessment of patient risk without the use of computers or other electronic devices [30]. The use of such points-based systems facilitates evidence-based clinical decision making [30]. The point system developed in this study was designed to predict the risk of postoperative SSIs, based on a patient\u2019s pre-procedural risk factors or predictors. The point score assigned to each predictor was derived from a well-fit logistic regression model. The point scores were developed for hospitalization (ICD-10) and physician (ICD-9) diagnostic codes, and physician procedure claims. All variables in the models were categorical, and the distance between a variable and its base category in regression coefficient units was equal to the size of the coefficient. 
For each variable, its distance from the base category in regression coefficient units was divided by this constant and rounded to the nearest integer to get its point value. Then, the obtained point scores were input into a logistic regression model and adjusted for other potential confounding factors suggested by the existing literature, including age, sex, surgical procedure, emergency case, concurrent surgical procedures, patient\u2019s physical status (ASA-5), and duration of surgery. The full model discrimination (C statistics or AUC) and calibration (Hosmer-Lemeshow (H-L) statistics) were assessed in the validation dataset. All methods were performed in accordance with the guidelines for developing and reporting Machine Learning predictive models in biomedical research [31]. The high-performance regression and point score assignment were performed in SAS 9.4 statistical software.\nResults\nWe identified 14,351 patients who underwent surgery from April 1, 2010 to March 31, 2015 and were enrolled into NSQIP at our hospital. An SSI was identified in 795 (5.5%) of these patients. Of these, 540 (68%) had superficial SSIs and 255 (32%) had deep or organ space SSIs. Descriptive statistics for patients in the study sample are reported in Additional file 4. The derivation and validation datasets were similar in terms of baseline covariates (Additional file 5).\nPredictive modeling for hospitalization diagnostic codes (ICD-10)\nWe identified 3085 hospitalization diagnostic (ICD-10) codes recorded within 30 days following the surgery date. These codes were then clustered into 994 three-digit hospitalization diagnostic codes that were used for further analyses. Stage 1: Given the large number of diagnostic codes (possible predictors), the Random Forests approach was used to identify a subset of the top 30 hospitalization diagnostic codes that best predict classification. 
We used800 classification trees and 46 variables available for split-ting at each tree node. The accuracy of the RandomForests model was 95.3%. The resulting SSI predictionmodel demonstrated positive predictive value (PPV) of98%, negative predictive value (NPV) of 97%, and AUC(area under the receiver operating characteristic curve) of0.78 (95% CI 0.77\u20130.79). The accuracy of the RandomForests model after a 10-fold cross-validation was 94.3%.Figure 1 presents the top 30 hospitalization diagnostic(ICD-10) codes for classification of SSIs that have beenidentified using the permutation VIM.Stage 2: The identified top 30 hospitalization diagnos-tic codes (ICD-10) codes were input into the high-performance logistic regression with a stepwise selectionto identify the best parsimonious model to predict SSIs.Table 1, model 1 presents the final model of sixhospitalization diagnostic codes to identify SSIs (AUC0.87, 95% CI 0.86\u20130.89).Stage 3: Risk scores for the final model of hospitalizationdiagnostic (ICD-10) codes are presented in Table 1, Model1 [27]. Among the entire cohort, 80.3% of patients had ascore of 0, 11.8% had a score of 1, and 7.9% had a scoreequal or greater than 2.Predictive modeling for physician diagnostic (ICD-9)codes.We identified 442 physician diagnostic 3-digit codes(using ICD-9-CA) recorded within 30 days following thesurgery date.Stage 1: Given a large number of diagnostic codes(possible predictors), the Random Forests approach wasused to identify a subset of 30 physician diagnostic codesthat best predicts SSIs. The best misclassification ratewas achieved by using 800 classification trees and 31variables available for splitting at each tree node. The ac-curacy of the Random Forests model was 94.7%. Theresulted SSI prediction model demonstrated PPV of98%, NPV of 96%, and AUC of 0.82 (95% CI 0.81\u20130.83).The accuracy of the model after a 10-fold cross-validation was 94.1%. 
Figure 2 presents the top 30 im-portant physician diagnostic (ICD-9) codes for predic-tion of SSIs that have been identified using VIM.Stage 2: The identified top 30 physician diagnosticcodes were input into the high-performance logistic re-gression model to identify the best parsimonious modelfor prediction of SSIs, using a stepwise selection ap-proach. Table 1, Model 2 presents the final models ofnine physician diagnostic codes to identify SSIs (AUC0.85, 95% CI 0.84\u20130.86).Stage 3: Risk scores for the final model of physiciandiagnostic codes are presented in Table 1, Model 2 [27].Among the entire cohort, 77.8% of patients had a scoreof 0, 7.7% had a score of 1, and 14.5% had a score equalor greater than 2.Predictive modeling for physician procedure claimsWe identified 2543 physician procedure claims recordedwithin 30 days following the surgery date. These codesthen were clustered into 610 three-digit codes that wereused for the further analyses.Stage 1: Given a large number of physician procedurecodes (possible predictors), Random forests approachwas used to identify a subset of 30 physician procedureclaims that best predicts SSIs. The best misclassificationrate was achieved by using 1000 classification trees and37 variables available for splitting at each tree node. Theaccuracy of the Random Forests model was 94.8%. Theresulted SSI prediction model demonstrated PPV of99%, NPV of 97%, and AUC of 0.82 (95% CI 0.81\u20130.83).The accuracy of the model after a 10-fold cross-validation was 94.4%. Figure 3 presents the top 30 phys-ician procedure claims that have been identified usingthe permutation VIM.Stage 2: The identified top 30 physician procedureclaims were input into the high-performance logistic re-gression model to identify the best parsimonious modelfor prediction of SSIs. We used a stepwise variableselection approach. 
Table 1, Model 3 presents the final model of 14 physician procedure claims to identify SSIs (AUC 0.84, 95% CI 0.83\u20130.85). Stage 3: Risk scores for the final model of physician procedure claims are presented in Table 1, Model 3 [27]. Among the entire cohort, 55.4% of patients had a score of 0, 11.9% had a score of 1, and 44.6% had a score equal to or greater than 2.\nFull model with total risk score of diagnostic and procedure codes\nIn the derivation cohort, the total scores of hospitalization diagnostic (ICD-10) codes, physician diagnostic (ICD-9) codes and physician procedure claims were included in the logistic regression model and adjusted for potential confounding factors, including surgical specialties, age, sex, duration of surgery, emergency case, ASA class and concurrent surgical procedures (Table 2).\nFig. 1 Description of top 30 hospitalization diagnostic (ICD-10) codes to identify SSIs. T81 \u2013 Operative complication (infection, hemorrhage, etc.); C54 \u2013 Malignant neoplasm of specified part of uterus; K65 \u2013 Peritonitis; B96 - Other bacterial agents as the cause of diseases classified elsewhere; K83 \u2013 Biliary duct infection, obstruction, perforation, or fistulation; Y83 - Surgical operation\/procedures as the cause of abnormal reaction of the patient\/or later complication; C51 - Malignant neoplasms of female genital organs; Y83 - Surgical operation\/procedures as the cause of abnormal reaction of the patient\/complication; C51 - Malignant neoplasms of female genital organs; K75 - Abscess of liver; L27 - Dermatitis and eczema; B95 - Streptococcus and staphylococcus as the cause of diseases; K42 - Umbilical hernia; A04 - Other bacterial intestinal infections; M71 \u2013 Bursal abscess, cyst, infection; N39 - Other disorders of urinary system; D05 - Carcinoma in situ of breast; C21 - Malignant neoplasm of anus and anal canal; T85 - Complications of internal prosthetic devices, implants and grafts; K26 - Duodenal ulcer; N43 - Other disorders of prostate; C25 - Malignant neoplasm of pancreas; A49 - Bacterial infection of unspecified site; K35 - Acute appendicitis; K92 - Other diseases of digestive system; K63 \u2013 Other diseases of intestine; K55 - Vascular disorders of intestine; G00 - Bacterial meningitis, unspecified; Y60 - Unintentional cut, puncture, perforation or haemorrhage during surgical and medical care; D62 - Acute posthaemorrhagic anemia; J80 - Acute respiratory distress syndrome\nThe full model had excellent discrimination (AUC 0.91; 95% CI, 0.90\u20130.92) and calibration (H-L statistics, 4.53, p = 0.402). The predicted probability threshold with the optimal operating characteristics [32] (e.g., the square of distance between the point (0, 1) on the upper left hand corner of ROC space and any point on ROC curve) was a predicted risk of 4% (sensitivity, 83.4%; specificity, 89.2%; PPV, 34.2%; and NPV, 99.1%). In the internal validation cohort, the full model remained strongly discriminative (AUC 0.89, 95% CI 0.88\u20130.90) and well calibrated (H-L statistics, 6.47, p = 0.487) (Fig. 
4).DiscussionWe used a 3-stage predictive modeling approach to de-rive and internally validate models to predict SSIs within30 days after surgical procedure. To the best of ourknowledge, this is the first study that used MachineLearning approaches to develop efficient algorithms foridentifying SSIs within 30 days after surgery by use ofhealth administrative data. The key finding of our studyis that the risk of SSIs can be reliably estimated usingroutinely collected administrative data, including phys-ician procedure claims, hospital (ICD-10) and physician(ICD-9) diagnostic codes. Our study results demonstratehigh performance of the Random Forests algorithm forTable 1 The best parsimonious models for prediction of SSIsModel 1. The best parsimonious model of hospitalization diagnostic (ICD 10) codesEffect *AOR, 95% CI Risk pointT81- Operative complication (infection, hemorrhage, etc.) 6.40 (5.08\u20138.01) 2K65 - Peritonitis 5.87 (3.88\u20137.88) 1B96 - Other bacterial agents causing infections 2.56 (1.84\u20133.47) 1K83 - Biliary duct infection, obstruction, perforation 6.32 (4.42\u20138.01) 3Y83 - Surgical operation\/procedures as the cause of abnormal reaction of the patient\/ or later complication 2.46 (1.97\u20133.07) 1B95 \u2013 Streptococcus\/ staphylococcus as the cause of diseases 3.25 (2.17\u20134.87) 1Model 2. 
The best parsimonious model of physician diagnostic (ICD 9) codesEffect AOR, 95% CI Risk point686 - Pyoderma, pyogenic granuloma, other local infections 8.13 (6.50\u20139.20) 3682- Cellulitis, abscess 4.70 (3.57\u20136.10) 2998 - Other complications of procedures 5.68 (4.77\u20136.78) 2556 - Ulcerative colitis 8.60 (6.31\u20139.18) 3685 - Pilonidal cyst with fistula, abscess 2.69 (1.52\u20133.76) 2560 - Intestinal obstruction without mention of hernia 2.97 (2.19\u20134.01) 2154 - Malignant neoplasm of rectum, rectosigmoid junction 4.37 (3.29\u20135.17) 2599 - Other disorders of urethra and urinary tract 2.04 (1.55\u20132.62) 1153 - Malignant neoplasm of colon 2.71 (2.02\u20133.22) 1Model 3. The best parsimonious model of physician procedure claimsEffect AOR, 95% CI Risk pointZ59 - Digestive system surgical procedure: colon\/biliary tract 7.38 (6.08\u20139.09) 4C46 - Infectious disease: hospital consult\/assessment 5.77 (4.66\u20137.43) 3Z10 - Skin\/subcutaneous tissue: incision of abscess or hematoma 7.88 (6.04\u20138.67) 3C03 - General surgery: hospital consult\/assessment 3.45 (2.86\u20134.19) 2H15 - Family practice: assessment on weekend 2.33 (1.80\u20133.01) 2S16 - Digestive system surgical procedures: intestine 1.98 (1.48\u20132.52) 1C20 - Obstetrics and gynecology assessment\/consult 2.25 (1.66\u20133.05) 2Z08 - Debridement of wound(s) and\/or ulcer(s) 4.01 (2.83\u20135.56) 3S21 - Digestive system surgical procedures: colon\/rectum 2.65 (1.91\u20133.62) 2R06 - Skin\/subcutaneous tissue: free island flaps 4.64 (2.58\u20136.36) 3C13 - Internal medicine: hospital assessment\/consult 1.96 (1.52\u20132.36) 1H13 - Family practice: assessment\/consult on weekdays 2.85 (2.18\u20133.52) 2C21 \u2013 Pain management: limited consultations 1.84 (1.55\u20132.10) 1R11- Operations of the breast: incision, excision, repair 2.81 (1.02\u20133.41) 3*AOR, 95% CI = Adjusted Odds Ratio, 95% Confidence IntervalPetrosyan et al. 
BMC Medical Research Methodology          (2021) 21:179 Page 6 of 11prediction of SSIs without pre-selection of possible pre-dictors given a small number of cases. We derived arelatively small set of variables to identify postoperativeSSIs, including 6 hospital diagnostic codes, 9 physiciandiagnostic codes, and 14 physician procedure claims.Several studies have examined the use of administrativedata to identify postoperative SSIs [6\u201310]. Our studyfindings are consistent with these studies [6, 10]. vanWalraven et al. [6], for example, found that administrativedata, including hospital diagnostic, emergency departmentvisit codes and physician procedure claims, can be effect-ively used to identify postoperative patients with a low riskof having SSIs within 30 days of their surgical procedure.In particular, the predictive probability threshold with theoptimal characteristics was a predicted risk of 5% (sensi-tivity, 82.1%, specificity, 85.6%, PPV, 27.7%). Additionally,Sands et al. found that [9] automated medical and claimrecords together can be used to screen for post dischargeSSIs, but the method they used identified only 10% ofprocedures as possible infections.Fig. 2 Description of the top 30 physician diagnostic (ICD-9) codes to identify SSIs. 
686 - Pyoderma, pyogenic granuloma, other local skininfections; 682 - Cellulitis, abscess; 998 - Other complications of procedures, not elsewhere classified; 556 - Ulcerative colitis; 685 - Pilonidal cyst orabscess; 739 - Nonallopathic lesions, not elsewhere classified; 332 - Parkinson\u2019s disease; 599 - Other disorders of urethra and urinary tract; 192 -Malignant neoplasm of other and unspecified parts of nervous system; 257 - Testicular dysfunction; 603 \u2013 Hydrocele; 560 - Intestinal obstructionwithout mention of hernia; 608 - Other disorders of male genital organs; 170 - Malignant neoplasm of bone and articular cartilage; 154 -Malignant neoplasm of rectum, rectosigmoid junction and anus; 821 - Fracture of femur; 075- Infectious mononucleosis, glandular fever; 917-Superficial injury of foot and toe(s); 788 - Symptoms involving urinary system; 153 \u2013 Malignant neoplasm of large intestine - excluding rectum;372 - Conjunctiva disorders (e.g., conjunctivitis, pterygium); 845 \u2013 Sprains and strains of ankle and foot; 591 \u2013 Hydronephrosis; 184 - Malignantneoplasm of vagina, vulva, other female genital organs; 156 - Malignant neoplasm of gallbladder and extra hepatic bile ducts; 290 - Seniledementia, presenile dementia; 569- Other disorders of intestine; 646 - Other complications of pregnancy (e.g., vulvitis, vaginitis, cervicitis, pyelitis,cystitis); 437- Other and ill-defined cerebrovascular disease; 346 - Other diseases of central nervous system (e.g., brain abscess, narcolepsy, motorneuron disease, syringomyelia)Petrosyan et al. BMC Medical Research Methodology          (2021) 21:179 Page 7 of 11The approach used in our study added a new contri-bution to the existing literature by incorporating muchlarger set of features as compared with the previousstudies. 
It was possible to include all available diagnostic or procedure codes to identify SSIs in this study, because the Random Forests approach is generally unaffected by the addition of irrelevant features and is robust to collinearity due to the use of random subsets of variables for tree splits. All the features included in this study were obtained from routinely collected data, and given the Fig. 3 Description of the top 30 physician procedure claims to identify SSIs. Z59 - Digestive system surgical procedure; C46 - Infectious disease - non-emergency hospital in-patient services: assessment\/consultation; Z10 - Integumentary system surgical procedures: incision of abscess\/haematoma; K07 - Family practice\/geriatrics acute and chronic home care supervision; K99 - Emergency department - special visit premium; C03 - General surgery, non-emergency hospital in-patient services - assessment, visits, consultations; A35 - Urology - consultations\/assessment; S16 - Digestive system surgical procedures; H15 - Family practice & practice in general - weekend and holidays: assessment\/care; C64 - General thoracic surgery - non-emergency hospital in-patient services: consultation assessment; H12 - Family practice & practice in general - nights assessment and care; C12 - Non-emergency hospital in-patient services: subsequent visits by the MRP; R11 - Integumentary system surgical procedures: operations of the breast; E08 - Hospital and institutional consultations\/assessments by MRP; C20 - Obstetrics and gynecology - non-emergency hospital in-patient services; Z08 - Debridement of wound(s) and\/or ulcer(s) extending into subcutaneous tissue, tendon, ligament, bursa and\/or bone; G55 - Diagnostic and therapeutic procedures, critical care; S21 - Digestive system surgical procedures: rectum; S65 - Male genital surgical procedures; Z74 - Respiratory surgical procedures; R62 - Musculoskeletal system surgical procedures - amputation; A20 - Obstetrics and gynecology - assessment or consultation;
Z22 - Musculoskeletal system surgical procedures; R06 - Myocutaneous, myogenous or fascia-cutaneous flaps, neurovascular island transfer, transplantation of free island skin and subcutaneous flap; A24 - Otolaryngology - assessment\/consultation; C13 - Internal and occupational medicine: non-emergency hospital in-patient services; C01 - Non-emergency hospital in-patient services, subsequent visits by the MRP; H13 - Family practice & practice in general - weekdays, evenings: assessment\/care; C21 - Consultations\/visits anaesthesia - non-emergency hospital in-patient services. Table 2 Full model of total risk scores for hospitalization diagnostic (ICD-10) codes, physician diagnostic (ICD-9) codes and physician procedure claims, adjusted for the study covariates. Each entry gives the effect, then the adjusted odds ratio with its 95% confidence interval in parentheses. Hospitalization diagnostic score: 2.12 (1.91\u20132.20); Physician diagnostic score: 1.88 (1.75\u20132.02); Physician procedure score: 1.45 (1.31\u20131.56); Age < 65 years: 1.74 (1.40\u20132.16); Log-operation duration, min: 1.52 (1.30\u20131.72); Surgical specialty, General surgery: 1.60 (1.20\u20132.15); Surgical specialty, Gynecology: 1.19 (0.80\u20131.76); Surgical specialty, Orthopedics: 0.77 (0.53\u20131.11); Surgical specialty, Plastics: 2.37 (1.59\u20133.51); Surgical specialty, Vascular: 1.75 (1.12\u20132.68); Surgical specialty, Other: Reference; Female: 1.18 (0.96\u20131.47); Concurrent surgical procedures, 1: 1.05 (0.67\u20131.63); Concurrent surgical procedures, 2+: 1.09 (0.67\u20131.75); Concurrent surgical procedures, 0: Reference; ASA class I: 0.87 (0.75\u20131.33); ASA class II: 1.21 (0.80\u20131.80); ASA class III: 1.10 (0.66\u20131.76); ASA class IV: 0.32 (0.04\u20131.03); ASA class V: Reference; Emergent case: 0.99 (0.79\u20131.20). Fig. 4 Receiver Operator Characteristics Curve (ROC curve) and *calibration plot for the full model with risk scores for hospitalization diagnostic (ICD-10) codes, physician diagnostic (ICD-9) codes, and physician procedure claims, adjusted for the **study covariates, in the validation cohort.
*In the calibration plot, the observed percentage of patients having an SSI within 30 days of surgery is plotted against the predicted SSI risk from the SSI risk model (horizontal axis). **Study covariates: surgical specialties, age, sex, duration of surgery, emergency case, ASA class, concurrent surgical procedures. complex etiology of SSIs, there might be variables that would be overlooked if we used a narrower search strategy guided by a priori clinical expectations. It would be inappropriate to interpret the identified diagnostic or procedure codes as either causes or consequences of SSIs. Random Forests allows us to select variables that influence prediction given a small sample size and an extremely small ratio of samples to variables (large \u201cp\u201d and small \u201cn\u201d). If the identified important variables are consistent with clinical knowledge, there will be more confidence in the derived model as a decision support tool. Several aspects of our study should be carefully considered. First, our study contained no information about outpatient antibiotic treatments because the Ontario health administrative data used for the study capture medication use only for people over the age of 65 and those on social assistance. Also, we did not include information about laboratory tests, because the Ontario health administrative data capture information only on outpatient laboratory tests, while laboratory tests performed during hospitalization are most important in predicting SSIs. Thus, information about antibiotic use and laboratory tests could substantially improve SSI identification. Second, our study and model captured SSIs that occurred within 30 days after the surgical procedure, so any SSI that occurred outside of this timeframe would have been missed.
Third, our study was conducted in a single teaching hospital, providing about 90% of the major surgical operations in a catchment area of 1.2 million people. Therefore, external validation is necessary to measure the model\u2019s utility in other hospitals and geographic regions. Finally, the coding systems used in the province of Ontario might not be available in other jurisdictions. Therefore, some modifications might be required before using our models in other health regions. Conclusion This study shows that health administrative data could be effectively used in identifying SSIs. Machine learning approaches have shown a high degree of accuracy in predicting postoperative SSIs and can integrate and utilize a large amount of administrative data. The results of our study are useful in advancing current and future efforts to use administrative data for patient safety surveillance and improvement. Further research should examine the use of machine learning approaches to identify SSIs, stratified by the specific types of surgical procedures. Abbreviations SSI: Surgical site infections; NSQIP: National Surgical Quality Improvement Program; OHIP: Ontario Health Insurance Plan; ICD: International Classification of Diseases; ASA: Patient\u2019s physical status; H-L: Hosmer-Lemeshow statistics; PPV: Positive predictive value; NPV: Negative predictive value; VIM: Variable importance measure; AOR: Adjusted odds ratio; CI: Confidence interval; AUC: Area under curve. Supplementary Information The online version contains supplementary material available at https:\/\/doi.org\/10.1186\/s12874-021-01369-9. Additional file 1. Includes American College of Surgeons \u2013 National Surgical Quality Improvement Program definition of different types of SSIs: superficial, deep and organ-space. Additional file 2.
Provides information about the Random Forests algorithm: constructing a large collection of decision trees with controlled variation, as well as how the multiple models are normally combined by \u2018voting\u2019. Additional file 3. Provides information about the data partitioning into derivation (70%) and validation (30%) samples. Additional file 4. Provides descriptive statistics for patients in the study sample. Additional file 5. Provides information on the derivation and validation datasets. Additional file 6. Provides information on the title of the manuscript, author list and affiliations. Acknowledgments This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). The opinions, results and conclusions reported in this paper are those of the authors and are independent from the funding sources. No endorsement by ICES or the Ontario MOHLTC is intended or should be inferred. Authors\u2019 contributions Yelena Petrosyan: conception and design, analysis and interpretation of data; drafting the article and revising it critically for important intellectual content; final approval of the version to be published. Kednapa Thavorn: conception and design, interpretation of data; critical revision of the article for important intellectual content; final approval of the version to be published. Glenys Smith: acquisition of data. Malcolm Maclure: conception, critical revision of the article for important intellectual content; final approval of the version to be published. Roanne Preston: conception and design, critical revision of the article for important intellectual content; final approval of the version to be published. Carl van Walraven: conception and design, critical revision of the article for important intellectual content; final approval of the version to be published.
Alan Forster: conception and design, acquisition of data, critical revision of the article for important intellectual content; final approval of the version to be published. Funding This study was supported by the Ontario Research Fund (RE05\u2013070) and a Canadian Institutes of Health Research (CIHR) grant. The study design, opinions, results and conclusions reported in this paper are those of the authors and are independent from the funding sources. Availability of data and materials The data that support the findings of this study are available at the Institute for Clinical Evaluative Sciences (ICES) (www.ices.on.ca\/DAS), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of ICES. Declarations Ethics approval This study was approved by the Ottawa Health Science Network Research Ethics Board. Consent to participate Not applicable. Consent for publication ICES is a prescribed entity under section 45 of Ontario\u2019s Personal Health Information Protection Act. Section 45 authorizes ICES to collect personal health information, without informed consent, for the purpose of analysis or compiling statistical information with respect to the management, evaluation or monitoring of the allocation of resources to or planning for all or part of the health system. Projects conducted under section 45, by definition, do not require review by a Research Ethics Board. This project was conducted under section 45 and approved by ICES\u2019 Privacy and Legal Office. Competing interests No researcher involved in this study had any declared or otherwise known conflicts of interest. Author details 1 Clinical Epidemiology, Ottawa Hospital Research Institute, 1053 Carling Ave, Ottawa, Ontario K1Y 4E9, Canada.
2 School of Epidemiology and Public Health, University of Ottawa, 75 Laurier Ave E, Ottawa, Ontario K1N 6N5, Canada. 3 Institute for Clinical and Evaluative Sciences, 1053 Carling Ave, Ottawa, Ontario K1Y 4E9, Canada. 4 The Ottawa Hospital - General Campus, 501 Smyth Road, PO Box 201B, Ottawa, ON K1H 8L6, Canada. 5 Department of Anesthesiology, Pharmacology and Therapeutics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada. 6 Department of Medicine, University of Ottawa, 75 Laurier Ave E, Ottawa, Ontario K1N 6N5, Canada. Received: 28 November 2020 Accepted: 28 July 2021 References 1. Pittet D, Harbarth S, Ruef C, Francioli P, Sudre P, Petignat C, et al. Prevalence and risk factors for nosocomial infections in four university hospitals in Switzerland. Infect Control Hosp Epidemiol. 1999;20(1):37\u201342. 2. Petrosyan Y, Thavorn K, Maclure M, Smith G, McIsaac DI, Schramm D, et al. Long-term health outcomes and health system costs associated with surgical site infections: a retrospective cohort study. Ann Surg. 2019. 3. Jenks PJ, Laurent M, McQuarry S, Watkins R. Clinical and economic burden of surgical site infection (SSI) and predicted financial consequences of elimination of SSI from an English hospital. J Hosp Infect. 2014;86(1):24\u201333. 4. Whitehouse JD, Friedman ND, Kirkland KB, Richardson WJ, Sexton DJ. The impact of surgical-site infections following orthopedic surgery at a community hospital and a university hospital: adverse quality of life, excess length of stay, and extra cost. Infect Control Hosp Epidemiol. 2002;23(4):183\u20139. 5. Badia JM, Casey AL, Petrosillo N, Hudson PM, Mitchell SA, Crosby C. Impact of surgical site infection on healthcare costs and patient outcomes: a systematic review in six European countries. J Hosp Infect. 2017;96(1):1\u201315. 6. van Walraven C, Jackson TD, Daneman N. Derivation and validation of the surgical site infections risk model using health administrative data. Infect Control Hosp Epidemiol. 2016;37(4):455\u201365. 7.
Grammatico-Guillon L, Baron S, Gaborit C, Rusch E, Astagneau P. Quality assessment of hospital discharge database for routine surveillance of hip and knee arthroplasty-related infections. Infect Control Hosp Epidemiol. 2014;35(6):646\u201351. 8. Rennert-May E, Manns B, Smith S, Puloski S, Henderson E, Au F, et al. Validity of administrative data in identifying complex surgical site infections from a population-based cohort after primary hip and knee arthroplasty in Alberta, Canada. Am J Infect Control. 2018;46(10):1123\u20136. 9. Sands K, Vineyard G, Livingston J, Christiansen C, Platt R. Efficient identification of postdischarge surgical site infections: use of automated pharmacy dispensing information, administrative data, and medical record information. J Infect Dis. 1999;179(2):434\u201341. 10. van Walraven C, Jackson TD, Daneman N. Administrative data measured surgical site infection probability within 30 days of surgery in elderly patients. J Clin Epidemiol. 2016;77:112\u20137. 11. Song X, Cosgrove S, Pass M, Perl T. Using hospital claim data to monitor surgical site infections for inpatient procedures. Am J Infect Control. 2008;36(3). 12. Cohen AM, Ambert K, McDonagh M. A prospective evaluation of an automated classification system to support evidence-based medicine and systematic review. AMIA Annu Symp Proc. 2010;2010:121\u20135. 13. Szlosek DA, Ferrett J. Using Machine Learning and Natural Language Processing Algorithms to Automate the Evaluation of Clinical Decision Support in Electronic Medical Record Systems. EGEMS (Wash DC). 2016;4(3):1222. 14. Maroco J, Silva D, Rodrigues A, Guerreiro M, Santana I, de Mendonca A. Data mining methods in the prediction of dementia: a real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC Res Notes. 2011;4:299. 15. Douglas PK, Harris S, Yuille A, Cohen MS.
Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief. Neuroimage. 2011;56(2):544\u201353. 16. Li J, Alvarez B, Siwabessy J, Tran M, Huang Z, Przeslawski R, et al. Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: predicting sponge species richness. Environ Model Softw. 2017;97:112\u201329. 17. Li J, Tran M, Siwabessy J. Selecting optimal random Forest predictive models: a case study on predicting the spatial distribution of seabed hardness. PLoS One. 2016;11(2):e0149089. 18. Bartz-Kurycki MA, Green C, Anderson KT, Alder AC, Bucher BT, Cina RA, et al. Enhanced neonatal surgical site infection prediction model utilizing statistically and clinically significant variables in combination with a machine learning algorithm. Am J Surg. 2018;216(4):764\u201377. 19. Breiman L. Random forests. Mach Learn. 2001;45:5\u201332. 20. Touw WG, Bayjanov JR, Overmars L, Backus L, Boekhorst J, Wels M, et al. Data mining in the life sciences with random Forest: a walk in the park or lost in the jungle? Brief Bioinform. 2013;14(3):315\u201326. 21. Liaw A, Wiener M. Classification and regression by random forest. R News. 2002;2(3):18\u201322. 22. Ozcift A. Enhanced cancer recognition system based on random forests feature elimination algorithm. J Med Syst. 2012;36(4):2577\u201385. 23. Diaz-Uriarte R, Alvarez de Andres S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 2006;7:3. 24. Doerken S, Avalos M, Lagarde E, Schumacher M. Penalized logistic regression with low prevalence exposures beyond high dimensional settings. PLoS One. 2019;14(5):e0217057. 25. Yao D, Yang J, Zhan X. A novel method for disease prediction: hybrid of random Forest and multivariate adaptive regression splines. J Comput. 2013;8(1):170\u20137. 26. Cohen R.
\u201cSAS Meets Big Iron: High Performance Computing in SAS Analytical Procedures,\u201d in Proceedings of the Twenty-Seventh Annual SAS Users Group International Conference. Cary, NC: SAS Institute Inc.; 2002. 27. Sullivan LM, Massaro JM, D'Agostino RB Sr. Presentation of multivariate data for clinical use: the Framingham study risk score functions. Stat Med. 2004;23(10):1631\u201360. 28. Wu C, Hannan EL, Walford G, Ambrose JA, Holmes DR Jr, King SB 3rd, et al. A risk score to predict in-hospital mortality for percutaneous coronary interventions. J Am Coll Cardiol. 2006;47(3):654\u201360. 29. Hong W, Lillemoe KD, Pan S, Zimmer V, Kontopantelis E, Stock S, et al. Development and validation of a risk prediction score for severe acute pancreatitis. J Transl Med. 2019;17(1):146. 30. Austin PC, Lee DS, D'Agostino RB, Fine JP. Developing points-based risk-scoring systems in the presence of competing risks. Stat Med. 2016;35(22):4056\u201372. 31. Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View. J Med Internet Res. 2016;18(12):e323-e. 32. Streiner DL, Cairney J. What's under the ROC? An introduction to receiver operating characteristics curves. Can J Psychiatr. 2007;52(2):121\u20138. Publisher\u2019s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Petrosyan et al.
BMC Medical Research Methodology          (2021) 21:179 Page 11 of 11","@language":"en"}],"Genre":[{"@value":"Article","@language":"en"}],"IsShownAt":[{"@value":"10.14288\/1.0402417","@language":"en"}],"Language":[{"@value":"eng","@language":"en"}],"PeerReviewStatus":[{"@value":"Reviewed","@language":"en"}],"Provider":[{"@value":"Vancouver : University of British Columbia Library","@language":"en"}],"Publisher":[{"@value":"BioMed Central","@language":"en"}],"PublisherDOI":[{"@value":"10.1186\/s12874-021-01369-9","@language":"en"}],"Rights":[{"@value":"Attribution 4.0 International (CC BY 4.0)","@language":"en"}],"RightsURI":[{"@value":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/","@language":"en"}],"ScholarlyLevel":[{"@value":"Faculty","@language":"en"},{"@value":"Researcher","@language":"en"}],"Subject":[{"@value":"Surgical site infection","@language":"en"},{"@value":"Administrative data","@language":"en"},{"@value":"Machine learning","@language":"en"},{"@value":"Random forests","@language":"en"},{"@value":"Data mining","@language":"en"},{"@value":"Predictive modeling","@language":"en"}],"Title":[{"@value":"Predicting postoperative surgical site infection with administrative data: a random forests algorithm","@language":"en"}],"Type":[{"@value":"Text","@language":"en"}],"URI":[{"@value":"http:\/\/hdl.handle.net\/2429\/79879","@language":"en"}],"SortDate":[{"@value":"2021-08-28 AD","@language":"en"}],"@id":"doi:10.14288\/1.0402417"}