UBC Faculty Research and Publications

The comparative recall of Google Scholar versus PubMed in identical searches for biomedical systematic… Bramer, Wichor M; Giustini, Dean; Kramer, Bianca M; Anderson, PF Dec 23, 2013

Full Text

METHODOLOGY Open Access

The comparative recall of Google Scholar versus PubMed in identical searches for biomedical systematic reviews: a review of searches used in systematic reviews

Wichor M Bramer1*, Dean Giustini2, Bianca MR Kramer3 and PF Anderson4

Abstract

Background: The usefulness of Google Scholar (GS) as a bibliographic database for biomedical systematic review (SR) searching is a subject of current interest and debate in research circles. Recent research has suggested GS might even be used alone in SR searching. This assertion is challenged here by testing whether GS can locate all studies included in 21 previously published SRs. Second, it examines the recall of GS, taking into account the maximum number of items that can be viewed, and tests whether more complete searches created by an information specialist will improve recall compared to the searches used in the 21 published SRs.

Methods: The authors identified 21 biomedical SRs that had used GS and PubMed as information sources and reported their use of identical, reproducible search strategies in both databases. These search strategies were rerun in GS and PubMed, and analyzed as to their coverage and recall. Efforts were made to improve searches that underperformed in each database.

Results: GS’ overall coverage was higher than PubMed (98% versus 91%) and overall recall is higher in GS: 80% of the references included in the 21 SRs were returned by the original searches in GS versus 68% in PubMed. Only 72% of the included references could be used as they were listed among the first 1,000 hits (the maximum number shown). Practical precision (the number of included references retrieved in the first 1,000, divided by 1,000) was on average 1.9%, which is only slightly lower than in other published SRs.
Improving searches with the lowest recall resulted in an increase in recall from 48% to 66% in GS and, in PubMed, from 60% to 85%.

Conclusions: Although its coverage and precision are acceptable, GS, because of its incomplete recall, should not be used as a single source in SR searching. A specialized, curated medical database such as PubMed provides experienced searchers with tools and functionality that help improve recall, and numerous options in order to optimize precision. Searches for SRs should be performed by experienced searchers creating searches that maximize recall for as many databases as deemed necessary by the search expert.

Keywords: Bibliographic databases, Information retrieval, Systematic reviews, Methodology, Literature searching, Reproducibility

* Correspondence: w.bramer@erasmusmc.nl
1Erasmus MC - University Medical Center Rotterdam, Medical Library, PO Box 2040, 3000 CA, Rotterdam, The Netherlands
Full list of author information is available at the end of the article

© 2013 Bramer et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Bramer et al. Systematic Reviews 2013, 2:115
http://www.systematicreviewsjournal.com/content/2/1/115

Background

For several years, information specialists have discussed which databases (and how many) should be used to perform exhaustive searches of the literature. Prior to 2004, the year of Google Scholar’s release, these discussions focused primarily on traditional databases such as Embase and MEDLINE [1,2].
Further, the general consensus had developed that searching a limited number of databases was insufficient where completeness was the goal [3-6]. In addition, the type of searching that is required to support systematic reviews (SRs) is more complex and time-consuming than searching for simple clinical queries. The demands placed on a searcher for the SR are much higher than for other searches because of the specific requirements of the SR [7], which is integral to science and must therefore be performed systematically, and made repeatable, verifiable and accountable [8,9].

Since 2004, Google Scholar (GS) has been widely used to locate specific items and aid in cumulating the scholarly literature. In 2005, Giustini [10] stated that GS produced acceptable results for browsing routines but results of low precision meant that its use for other searching was problematic. Since then, GS has improved its scope; from 2005 to 2012, its coverage of the literature rose from between 30% and 88% to between 98% and 100% [11,12]. An important unanswered question about GS remains: ‘Is GS advanced enough in its development to replace more sophisticated tools such as PubMed or Embase?’

In 2007, Shultz [13] provided an overview of criticism of GS that had been generated since its debut in 2004. Unfortunately, many of the original shortcomings identified between GS and traditional bibliographic databases such as MEDLINE and Embase are still in evidence: GS lacks a controlled vocabulary; search histories and sets cannot be built and manipulated; and wildcards and limits (for instance study types) cannot be used precisely. Only the first 1,000 citations of any search in GS are viewable and search strings must be kept under 256 characters.

Since 2004, a number of studies have examined the value of GS in biomedical searching. Falagas, Pitsouni, et al. [14] compared four databases including GS and PubMed, and concluded that GS retrieved more obscure items than other search tools.
Anders and Evans [15] focused on using advanced searching in GS and PubMed but, given major differences in the databases, their study made true comparisons difficult. In Nourbakhsh, Nugent, et al. [16], researchers found that the first 20 results in GS often produced more relevant hits than similar searches in PubMed. But since PubMed, until very recently, listed citations in chronological order (not by algorithms, as in GS) the authors’ conclusions are counter-intuitive. The frequently cited study by Walters [17] covered one topic (on older person migration), which is out-of-scope in a medical database. A recent study by Shariff, Bejaimal, et al. [18] examined search strategies designed by end users and compared the first 40 hits in GS and PubMed.

In 2013, Gehanno, Rollin, et al. [19] published a paper that generated important critical discussion of the value of GS. In their paper, Gehanno et al. used GS to locate all studies originally cited in a published SR. Having found all known items, the authors argued that GS, after some improvements to increase its search precision, could be used alone in searching for SRs.

The article by Gehanno et al. drew much attention to GS and resulted in some follow-up articles. Giustini and Kamel Boulos [20] argued that ‘known-item’ searching is a very different activity than locating articles by subject, as is attaining 100% recall of the (mostly unknown) relevant literature. Boeker, Vach, et al. [21] reinvestigated the results of Gehanno et al. using search strategies designed to match Medline strategies used for Cochrane systematic reviews. However, the authors designed the searches themselves and failed to account for the maximum number of results that can be retrieved in GS (1,000), although they mentioned this limitation in their manuscript. The low precision in GS as reported by Gehanno et al. and by Boeker et al. is mainly an artifact due to the large number of hits that are reported by GS.
Since search results cannot be viewed beyond the first 1,000 references, actual precision in GS should be calculated as the number of relevant references found in those first 1,000 references, divided by the number of hits that can actually be viewed, which is 1,000 at most.

The research done on GS for SRs is limited in methodology, which is crucial for evaluating GS for SR searching. In Table 1 we critique in more detail earlier research on GS.

Table 1 Limitations of current published research on the usability of Google Scholar for medical purposes

Limitations of research | References
Not testing for systematic reviews | [13-16,18,22-26]
Limited number of searches | [14-16,24-26]
Relevancy of results only determined by the authors | [13,14,16,23,25,26]
Not reviewing the first 1,000 hits in Google Scholar | [15,16,18,19,21,22]
Only using searches designed by the authors | [14-16,21,22]
Searches not comparable between the databases | [15,16,23,25]
Published more than five years ago | [13,14,23,24]
Only searching for known items | [13,14,19]
Only looking at coverage, not retrieval | [19]

Though it seems unlikely that an experienced information specialist would use GS as the sole database in a SR, a less experienced researcher, faced with the enormous task of performing a review without expert help, might be tempted to do so (based on the aforementioned research). At least one review is known that, after doing preliminary searches in a wide range of databases decided ultimately to use only PubMed and GS, but failed to notice their search strategy was not executable in GS, since it was over 500 characters long [27].

In this paper, the usability of GS in searching for SRs is considered, where relevancy is pre-determined by inclusion in papers that have been previously published. Both the original (identical) topical searches reported by those papers and searches improved by an information specialist are used in order to compare the recall within GS and PubMed. The aim of this paper is to discover whether the original authors would have found all included references by using GS only. When studies from the original SRs were not found, it is assessed whether a more exhaustive search strategy created by an information specialist would improve recall. Given its potential for one-stop searching, we assess whether GS can indeed replace the multiple databases required for the SR and locate all studies needed to conduct a SR.

Methods

In May 2013, PubMed and Embase were searched using the exact phrases ‘systematic review’ and ‘google scholar’ in title and/or abstract fields. Of the records identified, the full-text of relevant papers was retrieved on the open web or via subscriptions at the first author’s institution. The full-text and appendices of available articles were scanned for descriptions of the strategies used to search PubMed and GS.

Articles that clearly described identical search strategies were investigated further. The queries as performed in the initial searches in the SRs were recreated. If the SRs did not discuss a medical topic, the review was excluded as it was unlikely that PubMed would have been viewed as a valuable database in those instances. When the length of a reproduced search exceeded the maximum query length allowable in GS (256 characters) the review was also excluded.
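
The 256-character exclusion rule is simple to automate when screening reported search strategies. The following is a minimal sketch and not part of the original study: the function names and the example query are hypothetical, and it assumes that GS counts every character of the query string, including spaces and operators. The compaction helper applies the substitution mentioned in the Discussion (| in place of OR, and leaving out AND); it is a naive textual replacement that would also alter these words inside quoted phrases.

```python
import re

GS_MAX_QUERY_LENGTH = 256  # query limit reported in the article for Google Scholar


def fits_google_scholar(query: str, limit: int = GS_MAX_QUERY_LENGTH) -> bool:
    """Return True if the query length is within the limit (assumes all characters count)."""
    return len(query) <= limit


def compact_query(query: str) -> str:
    """Shorten a Boolean query by replacing ' OR ' with '|' and dropping explicit ' AND '.

    Naive sketch: operators that occur inside quoted phrases are rewritten too.
    """
    shortened = re.sub(r"\s+OR\s+", "|", query)
    return re.sub(r"\s+AND\s+", " ", shortened)


# Hypothetical example query, for illustration only.
example = "heart AND (infarct OR attack)"
compacted = compact_query(example)
print(compacted, len(example), len(compacted), fits_google_scholar(compacted))
```

A check of this kind only tells whether a strategy could have been executed in GS at all; it says nothing about whether the shortened query retrieves the same records.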
All inclusion and exclusion criteria are summarized in Table 2.

Searches that were reproducible were executed in both GS and PubMed, and the number of hits was documented accordingly. In PubMed, results were limited to before the MeSH date (field: [mhda]) of the original search date as stated in the article. The number of hits in PubMed was compared with the number originally reported (either for PubMed, or the total for all databases). If it did not exceed that number, the SR was included. In GS, search results were limited to the publication start year used by the original authors and the end publication year of the search date. Because publication dates can differ from search dates (because publication dates are generally added to the print version, while the electronic version might be available longer) we checked whether the list of includes contained articles with newer publication dates, and if so, changed the publication limits accordingly.

For each replicated search, the first 1,000 results of GS were saved in a Word document. Using the ‘find’ function in Word, occurrences of each included reference from the original SR were identified. Distinctive fragments of the title were searched but where no match was located, author names were searched. If a citation was not found among the first 1,000 results, GS coverage for that item was checked using author names and part of the title between double quotes. If the item was indeed present in GS, the reference was checked for retrieval in the search query (beyond the first 1,000 hits) by combining author names and distinctive title words with the full query (to check whether they had ranked low on Google’s PageRank algorithm).

We did not exclude hits that were citations only (by definition) but for those references, it was checked whether the citing articles, as linked in GS, were published before the citing systematic review. It could then be assumed that the citation was present in GS when the original authors performed their searches.
If the citation was only present in articles with a more recent publication year, the result was confidently discarded.

All included studies were searched in PubMed by searching for the complete reference. If PubMed did not reveal a match, a second attempt was performed using a combination of first author [1au], page number [pg] and publication year [dp]. Included references were collected using the PubMed Clipboard. Once all references were retrieved, clipboard contents were checked against the results of the replicated search.

The intention of this project was not to judge the quality of the replicated searches. In a later stage, we improved some searches to investigate whether more citations could be found. An experienced information specialist (WB) created improved search strategies for GS based on the original authors’ description of their research question, without taking into account the included references from that SR. A second search strategy was designed based on the frequency of words in the titles of included references of these SRs. For the searches that had missed the most included references in PubMed, an information specialist created a more comprehensive search strategy using MeSH terms and free text, without using the included references to determine search words.

Table 2 Inclusion and exclusion criteria

Inclusion criteria
  Systematic review in a medical topic
  Reporting the use of both Google Scholar and PubMed
  Reporting in reproducible detail an identical single phrase search for these databases
Exclusion criteria
  Length of search strategy greater than 256 characters
  Number of hits retrieved in PubMed exceeds the reported total number of hits reviewed

Results

Of the 578 SRs retrieved, the full-text was obtained for 453 articles. A total of 84 articles described in enough detail identical searches that could be rerun in PubMed and GS.
Eight articles were excluded because their search strategies exceeded the maximum search length allowed by GS (256 characters). Twenty articles were excluded because they made no mention of the number of hits retrieved. Two articles were excluded because the topic was non-medical and therefore their search strategies returned no results in PubMed. Seven articles were excluded because the authors used multiple search queries and 24 others were excluded because the numbers reported (for PubMed or total) did not match the number of hits retrieved for the replicated searches. See Figure 1 for a flow diagram of the in- and exclusion procedure. For one article the list of included references contained three references from beyond the search year, thus we decided to expand the publication date limits by one year.

In 21 articles, the cited searches for both GS and PubMed were identical, well-documented, and the number of hits in PubMed did not exceed the number of hits first reported. Additional file 1 describes the original and replicated searches along with other parameters and the detailed results. In eleven cases, the searches used in this research were exactly the same as those described in the full text or appendices of the studies. In ten instances, some minor changes had to be made. In some cases, Boolean operators (AND and OR) were not stated and nesting was not clearly laid out; proper searches cannot be performed without operators but what was intended was clear. If major changes were required, the reviews were excluded. For five references retrieved as citations, the citing articles all had a publication date later than the original search date, so these citations were ignored. A full list of references to all SRs included in this article can be found in Additional file 2.

Coverage

The total number of studies included by the SRs was 541. In GS, ten studies were not present, thus the overall coverage of GS reached 98%.
In PubMed, 48 references were not present, so PubMed had an overall coverage of 91%.

Recall

Of the total number of included studies that were reviewed (541), 389 (72%) were present in the first 1,000 hits of the original searches in GS. Forty-five articles had been retrieved by the search strategy in GS, but were not among the first 1,000 hits. If GS had allowed its users to review all search results, recall would thus have been 80%. The same searches retrieved 369 of the included references in PubMed (68%).

Precision

As shown in Figure 2, practical precision in GS has an average of 1.9% and a median of 1.7%. Average precision for SRs, according to Sampson, Tetzlaff, et al. [28] is around 2.9%. The practical precision in GS for the searches observed in this article is slightly below the reported average but 1.9% is nonetheless quite acceptable for SR searching, where, in order to be complete, researchers must browse through irrelevant hits to find important references.

Figure 1 Flow diagram of reviewed articles. Articles retrieved from databases: 578; no full text available: 125 excluded; full text scanned for search strategy: 453; GS strategy not mentioned or not equal to PubMed: 371 excluded; searches recreated and tested: 82; search string > 256 characters, numbers not replicable or not mentioned: 61 excluded; included: 21.

Figure 2 Practical precision of Google Scholar. Axis: observed practical precision (0% to 7%).

Improvement of search strategies in Google Scholar

163 included studies were not present in the first 1,000 hits of the original searches in GS. For five SRs where GS had missed more than ten included references (in total 110), we tried to improve the search strategies.
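
The coverage and recall percentages reported in the Results can be recomputed from the counts given in the text. The sketch below is illustrative only: the function names are hypothetical, and the practical precision example uses made-up numbers, since the per-search counts are reported only in Additional file 1.

```python
def coverage(total_included: int, not_in_database: int) -> float:
    """Share of included studies that are present in the database at all."""
    return (total_included - not_in_database) / total_included


def recall(retrieved_included: int, total_included: int) -> float:
    """Share of included studies returned by the search."""
    return retrieved_included / total_included


def practical_precision(included_in_viewable: int, total_hits: int, cap: int = 1000) -> float:
    """Included references among the viewable hits, divided by the viewable hits (at most `cap`)."""
    return included_in_viewable / min(total_hits, cap)


TOTAL = 541  # studies included by the 21 SRs

print(f"GS coverage:                 {coverage(TOTAL, 10):.0%}")
print(f"PubMed coverage:             {coverage(TOTAL, 48):.0%}")
print(f"GS recall, first 1,000 hits: {recall(389, TOTAL):.0%}")
print(f"GS recall, all hits:         {recall(389 + 45, TOTAL):.0%}")
print(f"PubMed recall:               {recall(369, TOTAL):.0%}")

# Hypothetical single search: 19 included references among the first 1,000 of 25,000 hits.
print(f"Practical precision:         {practical_precision(19, 25_000):.1%}")
```

Capping the denominator at the number of viewable hits is what separates practical precision from the overall precision computed on the total hit count that GS reports.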
Using the first improved search strategies, created without taking into account the included studies, retrieval in GS for these five SRs increased from 53% to 60%. The search strategies designed to capture as many included studies as possible (when the search strategies were based on words in the title of the included studies) resulted in 66% retrieval for these five SRs (Table 3). The improved search strategies for GS are available in Additional file 3.

Improvement of search strategies in PubMed

Of the references found in PubMed, 124 were not retrieved by replicating the original searches. Of these, 111 were included by the seven SRs in Table 4. The other SRs each had fewer than three included references that were not found in PubMed. We tried to create better searches for the research questions of these seven SRs to see if this would increase retrieval in PubMed. Using the improved search strategies increased retrieval for the seven SRs from 61% to 85%. The improved strategies for PubMed are available in Additional file 4.

Discussion

Literature searching in multiple databases can often be cumbersome and is always time-consuming if it is done well. GS offers an easy-to-use, familiar interface and relevance ranking, making simple searching for a few good articles much easier. However, the use of GS as a robust search tool is not without its challenges.

To focus on the differences between the databases, the research was restricted to SRs that used both GS and PubMed with identical search strategies. Although this research represents a small sample of all published SRs, and is not representative, we nonetheless believe our findings to be indicative of a trend. Had SRs been selected that did not describe their GS search strategies, it would have been necessary to create them, which was not the purpose of this research.
If SRs reported non-identical searches for GS and PubMed, the findings would be reviewing the ability of the original reviewers to translate their searches, which again was not the intention.

One of the most challenging aspects is the frequency with which Google changes its functionality without giving any prior notice to its users. In 2012, GS changed its advanced searching features and removed the ability to limit results to specific domains and disciplines (for instance medicine). In March 2013, GS reduced the maximum number of articles it can show per page from 100 to 20, and in June the tilde operator, which was very useful for searching for synonyms (frequently used as a replacement for truncation), was removed from the regular search engine (google.com); at the moment it is still available in GS. These changes have a major impact on searching, and when users asked whether some of these features would be reinstated, Google said little [29]. Even more threatening is the fact that since the end of 2011, GS has disappeared from the menu of regular Google, thus making it harder to find for those users who do not already know of its existence, although results from GS and a link to a search in GS (‘Scholarly articles for…’) often appear in the search results of regular Google. Because of Google's tendency to shut down applications it considers less frequently used (like recently Google Reader in June 2013), this might be a threat for the continuity of GS. And if GS were to be shut down, this would be a major threat to the replicability of the methodology of the SRs that were performed with a GS search.

Table 3 Systematic reviews of which more than ten included references were not retrieved in Google Scholar; performance of improved searches

Review | Number of included references | Includes retrieved by authors’ search | Includes retrieved by improved search #1 | Includes retrieved by improved search #2
Hasani | 78 | 26 | 26 | 37
Novak | 30 | 10 | 12 | 12
Verhoeven | 89 | 72 | 75 | 72
Navarese | 17 | 6 | 13 | 16
Belsey | 20 | 10 | 15 | 16
Total | 234 | 124 (53%) | 141 (60%) | 154 (66%)

Table 4 Systematic reviews that contributed most to the ‘not retrieved articles’ in PubMed; performance of improved searches

Review | Number of included references | Includes retrieved by authors’ search | Includes retrieved by improved search
Javan | 68 | 29 | 61
Hasani | 78 | 51 | 67
Verhoeven | 89 | 67 | 81
Navarese | 17 | 5 | 5
Novak | 30 | 19 | 26
Gupta | 16 | 8 | 12
Hardefeldt | 26 | 18 | 23
Total | 324 | 197 (61%) | 275 (85%)

Many published SRs report on the total number of hits in the databases they have used. However, the ratio between the number of hits we retrieved in PubMed and the total number of hits that were reported by the original authors varied widely. In one case the reported total was 108 times higher than the number we found in PubMed, while for many other SRs the ratio between numbers reported and what was retrieved was equal to one. Some authors opted to report the total number of citations found in GS, while others took into account the first 1,000, or just reported the number they felt was necessary to view. Still others ignored the number of hits in GS in their reporting and counted only relevant, unique hits. Consequently, the resulting numbers in our searches were at odds with what the original authors reported in their published reviews. It is recommended that authors of SRs that use GS as a primary source report only those hits that were actually reviewed from GS.

Search reproducibility was low due to inaccurate or incomplete reporting of search strategies. Many papers that were examined referred to their search strategies by listing the keywords and Boolean operators used in an illogical order [30]. Even in cases where searches were explicitly stated, the number of hits did not match the number of hits retrieved using the exact limiters and search parameters. To ensure transparency and reproducibility, authors of SRs should take care to follow the guidelines in the PRISMA Statement [7] for reporting search strategies.
This states that the number of studies screened should be reported, not the number a database claims to have found.

Importing all references into reference management software is now a standard feature in bibliographic databases such as PubMed and Embase. GS does not offer such a feature. With Zotero, all results from one page can be imported from GS. However, recently the maximum number of hits shown per page changed from 100 to 20, making downloading the full set of hits more time consuming. When the authors of this article used Zotero to import the contents of a single search into Endnote, after downloading 200 references a ‘Captcha’ was shown, as Google had detected that our ‘computer or network may be sending automated queries’. GS seemed disinclined to provide the flexibility required to properly search the literature for the SRs.

We observed that many items were found because GS indexes content beyond the abstract into the full-text of articles, including their references. Excerpts often show a part of the article containing the reference list with the searched words found in titles of referred articles. When searching for included references in the first 1,000, numerous false hits were encountered showing the included articles in the reference list of other retrieved articles. GS seems to perform citation tracking for articles by using search words in the title. This accounts for many of the extra hits (or ‘noise’) in GS’ results.

One limitation in this research (as in all retrospective research involving GS) is that one can never be certain of GS’ coverage at any specific point in time, which is a serious problem for searchers. We performed searches several years after the original reviews were performed (average search date was September 2010 but ranged from Jan 2007 to October 2012, while we searched GS in May/October 2013). Although we limited GS searches to the publication year of the original search date, the results will probably differ from those retrieved by the original authors.
Not surprisingly, the number of hits retrieved during our searching was often at odds with what was originally reported. Since replicable searching is essential in performing research, the inability to reproduce searches in GS severely limits its value to researchers. Bibliographic databases such as PubMed offer additional database management dates next to publication dates. These databases keep track of content, and note when changes are made. In PubMed, using the MeSH date field ([mhda]), one can be rather certain what a search result would have been on a given date. GS lacks these management dates, and only offers publication dates. Because of the absence of a clear date restriction feature, replicability of the search process (which is crucial to SR searching) is rendered far more problematic than would be the case in curated databases such as Medline and Embase.

We limited our searches to the publication dates of the last search date. However, this reduced the reported hits substantially. Even when a theoretical limit to publication years 1800 to 2099 was applied, the number of hits dropped. In simple queries, this seemed to affect only the number of hits, but not the resulting references. The first 1,000 articles remain largely the same, making this a good replacement for date limits in bibliographic databases. However, on the improved search strategies that were more complex, the number of hits reported often dropped by a factor of ten, even with the theoretical limit to all publication dates. Resulting references differed immensely, and the first 1,000 hits hardly contained any of the included references. Therefore, for the improved searches, publication date limits were not used. This is a newly identified problem when using GS for SR searching.

Though it was not our intention to judge the quality of the searches of the SRs, we believe that the quality of the searches was poor as they often only combined a few words with hardly any synonyms.
This is of course also due to the selection process: if an identical search is used in GS, none of the more sophisticated tools of PubMed (for instance MeSH terms, field codes and truncation) could have been used.

Improving search strategies is a major challenge in GS. As GS cannot store search histories, it is mostly impossible to build multi-set queries or evaluate changes made to search queries. Search strings are currently limited to 256 characters, and searchers have to select their keywords accordingly. This is further complicated by the fact that GS does not allow truncation as required by searchers. A feature that could replace truncation in GS is the tilde (~), which automatically searches for word variants. Although useful, the feature does not work in combination with Boolean operators like OR, and therefore cannot be used in exhaustive search queries. In addition, the feature was recently removed from Google's regular search engine, making its continued availability in GS uncertain. By using | instead of OR and simply leaving out ANDs, it is possible to use as many of those characters as possible for search words. Finally, GS has no feature for proximity or adjacency searching; parentheses can be used to search word variants in a double-quoted phrase ("(myocardial|heart) (infarct|attack)"), as well as asterisks (*), but the number of asterisks used marks the exact number of words allowed, where proximity searching in other databases generally describes a range of words. These missing features make a translation of a proper search query as designed for other databases difficult if not impossible. The limitations experienced in this research are presented in Table 5.

Though we wanted to improve the searches that returned the fewest included references, we could not draw conclusions about the effects of our improvements.
We do not know, for example, if the authors missed important references in their original searches. We assume that within the extra hits retrieved, extra relevant studies might have been found. Because GS was unable to retrieve all articles found by the authors, its results cannot be considered complete.

Conclusion

As we've shown, the coverage and precision of GS are acceptable. Although coverage is not 100%, for many investigators 98% might suffice for simple literature or narrative reviews of a topic. The overall precision of GS (using the total reported number of hits as denominator) is rather low, but the practical precision, calculated by the number of relevant hits in the first 1,000, with 1,000 as denominator is theoretically acceptable for SR searching, and highly dependent on the number of included references.

Our a priori question was ‘Is GS’s recall sufficient to be used on its own in systematic review (SR) searching?’ Overall retrieval in GS is 72%, which is too low for it to be used as a single database for the SR. PubMed fared similarly at 68%. The creation of better searches in GS proved to be a constant challenge. A high of 66% recall was achieved for the five searches that initially missed the most references from the SRs. In PubMed, our improved searches reached 85% recall.
Neither database was sufficient on its own to find all articles from previously published SRs.

Table 5 Comparison of Google Scholar and PubMed in systematic review searching

| Google Scholar | PubMed |
| --- | --- |
| Shows up to 1,000 results | Shows all results |
| Searches the full text of the article, and words on the webpage | Searches only bibliographic data and controlled vocabulary (MeSH terms) |
| No controlled vocabulary available | Controlled vocabulary (MeSH terms) added by skilled indexers and searchable (including ‘explode’) |
| Searches the broad aspect of science (filters limiting results to medical articles were removed) | Only contains articles on medical topics |
| No search history available (unable to compare or combine record sets) | Detailed search history available, flexibility in combining record sets to create complicated search strategies |
| Search queries limited to 256 characters | No limits on the length of search queries |
| No truncation allowed. Tilde can be used to search for variants, but cannot be used in OR relationship with other words. GS is said to search for word variants, but this is very rare and the mechanism is unclear. | Truncation allowed |
| Possibly automatic searching for synonyms (details unclear) | Automatic Term Mapping (details available) |
| Field names only for title (complete query) and author names | Field names for many fields can be assigned per synonym |
| No advanced limits (for publication type, human studies and so on) | Multiple advanced limits in the database itself, or available from third parties |
| Cannot accurately limit to search dates (no controlled updates) | Different date fields available to limit searches to results before a certain date |
| Cannot download results in bulk to reference management software | Multiple options to download the complete results set to reference management software |
| Proximity searching only with exact order and exact number of connecting words | No proximity search possible |

Researchers from other disciplines might find results in GS that are ‘good enough’. Similarly, medical professionals might find using PubMed (or another bibliographic database) on its own is good enough for day-to-day searching. Some may even prefer GS for initial searching to find ‘a few good articles’ due to its excellent relevance ranking. However, SRs require a complete view of all existing literature in a given area. This can only be achieved by performing exhaustive searches of relevant databases and websites in consultation with a trained information specialist. These important searches that support the SR methodology must be repeatable, verifiable and accountable, which poses a problem with GS.

There is therefore no reason why GS should be viewed as more suitable for performing SR searches than PubMed (or any other specialized database). We hope that our research will inform future authors and guide their use of GS. Authors of future SRs should continue to use GS but in concert with multiple other databases, not as a replacement of other databases.

Additional files

Additional file 1: Search strategies replicated from included articles and obtained results.
A full table of the original description of the searches performed by included SRs, the searches as we used them, restrictions of the original searches (that is, search date, and start year) and number of included references.

Additional file 2: Systematic reviews included in this article.
A reference list of all systematic reviews used in this article.

Additional file 3: Improvement of search strategies in Google Scholar.
A description of the improved search strategies for Google Scholar and the results obtained with them.

Additional file 4: Improvement of search strategies in PubMed.
A description of the improved search strategies for PubMed and the results obtained with them.

Abbreviations
GS: Google Scholar; SR: Systematic review.

Competing interests
The authors declare that they have no competing interests. No funding has been received for this research.

Authors’ contributions
WB designed the research, reviewed the included SRs, collected the data and optimized the final searches. DG, BK and PA provided feedback on the initial results and interpretation of the data. WB prepared the manuscript with revisions from DG, BK and PA. All authors read and approved the final manuscript.

Authors’ information
WB is an information specialist at Erasmus MC, Rotterdam University Academic Hospital. He holds a BSc in biology and information science. He performs exhaustive searches for more than 200 systematic reviews a year.
DG is the UBC Biomedical Branch Librarian at Vancouver General Hospital in Canada. He holds MLS and MEd degrees.
BK is an information specialist at Utrecht University Library, working at the University Medical Center Utrecht. She holds a PhD in Neurobiology.
PA is Emerging Technologies Librarian at the University of Michigan. She holds an MLIS and a BSc in psychology and music.

Acknowledgements
The authors thank Jacqueline Limpens (Information Specialist, AMC Amsterdam) for her insightful and critical comments on earlier drafts of the article. She insisted we should try to optimize the original, often suboptimal searches in both databases for a thorough comparison, which added real scientific value to our article.

Author details
1 Erasmus MC - University Medical Center Rotterdam, Medical Library, PO Box 2040, 3000 CA, Rotterdam, The Netherlands.
2 The University of British Columbia, UBC Biomedical Branch Library, Gordon and Leslie Diamond Health Care Centre, 2775 Laurel Street, Floor 2, Vancouver, BC V5Z 1M9, Canada.
3 Utrecht University Library, PO Box 80125, 3508 TC Utrecht, The Netherlands.
4 University of Michigan, Taubman Health Sciences Library, 1135 E Catherine St, Ann Arbor, MI 48109-5726, USA.

Received: 12 November 2013 Accepted: 13 December 2013 Published: 23 December 2013

doi:10.1186/2046-4053-2-115
Cite this article as: Bramer et al.: The comparative recall of Google Scholar versus PubMed in identical searches for biomedical systematic reviews: a review of searches used in systematic reviews. Systematic Reviews 2013 2:115.

