UBC Faculty Research and Publications

iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website
Good, Benjamin M; Kawas, Edward A; Kuo, Yu-Lin B; Wilkinson, Mark D
Dec 15, 2006

Full Text

BMC Bioinformatics (BioMed Central)

Open Access

Software

iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website

Benjamin M Good*, Edward A Kawas, Byron Yu-Lin Kuo and Mark D Wilkinson

Address: The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research, Providence Health Care/University of British Columbia, St. Paul's Hospital, Rm. 166, 1081 Burrard St. Vancouver, British Columbia, V6Z 1Y6, Canada

Email: Benjamin M Good* - goodb@interchange.ubc.ca; Edward A Kawas - ekawas@mrl.ubc.ca; Byron Yu-Lin Kuo - bkuo@mrl.ubc.ca; Mark D Wilkinson - mwilkinson@mrl.ubc.ca

* Corresponding author

Abstract

Background: User-scripts are programs stored in Web browsers that can manipulate the content of websites prior to display in the browser. They provide a novel mechanism by which users can conveniently gain increased control over the content and the display of the information presented to them on the Web. As the Web is the primary medium by which scientists retrieve biological information, any improvements in the mechanisms that govern the utility or accessibility of this information may have profound effects. GreaseMonkey is a Mozilla Firefox extension that facilitates the development and deployment of user-scripts for the Firefox web-browser. We utilize this to enhance the content and the presentation of the iHOP (information Hyperlinked Over Proteins) website.

Results: The iHOPerator is a GreaseMonkey user-script that augments the gene-centred pages on iHOP by providing a compact, configurable visualization of the defining information for each gene and by enabling additional data, such as biochemical pathway diagrams, to be collected automatically from third party resources and displayed in the same browsing context.

Conclusion: This open-source script provides an extension to the iHOP website, demonstrating how user-scripts can personalize and enhance the Web browsing experience in a relevant biological setting.
The novel, user-driven controls over the content and the display of Web resources made possible by user-scripts, such as the iHOPerator, herald the beginning of a transition from a resource-centric to a user-centric Web experience. We believe that this transition is a necessary step in the development of Web technology that will eventually result in profound improvements in the way life scientists interact with information.

Published: 15 December 2006
BMC Bioinformatics 2006, 7:534 doi:10.1186/1471-2105-7-534
Received: 04 September 2006
Accepted: 15 December 2006
This article is available from: http://www.biomedcentral.com/1471-2105/7/534
© 2006 Good et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

User-scripts are programs, typically written in JavaScript, that are installed inside web-browsers. They manipulate the content of specified sets of Web pages prior to their display in the browser. The name 'user-script' may be slightly misleading as a typical user of a Web browser will not likely write user-scripts (but see [1] for work on making this more feasible). The name might more appropriately be 'user-side-scripts' to convey the notion that the script operates within the user's browser and that its installation and activation are under the user's control.
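To make the idea concrete, the following is a minimal sketch of what a user-script can look like. It is not the iHOPerator source. The comment header follows the GreaseMonkey metadata-block convention, while the script name, the namespace, the `@include` pattern (built on the iHOP address given in reference [6]) and the helper `buildBannerText` are hypothetical choices made only for illustration.

```javascript
// ==UserScript==
// @name        Example gene-page banner (hypothetical)
// @namespace   http://example.org/userscripts
// @include     http://www.ihop-net.org/UniPub/iHOP/*
// ==/UserScript==

// Hypothetical helper: builds the text shown in the banner.
// Kept as a pure function so it can be checked outside a browser.
function buildBannerText(pageTitle) {
  return "Enhanced by a user-script: " + pageTitle;
}

// Modify the page only when a DOM is available, i.e. when the
// script is running inside a browser on a matching page.
if (typeof document !== "undefined" && document.body) {
  var banner = document.createElement("div");
  banner.textContent = buildBannerText(document.title);
  // Place the banner at the very top of the page body.
  document.body.insertBefore(banner, document.body.firstChild);
}
```

The `@include` line is what restricts the script to a specified set of Web pages; everything after the header is ordinary JavaScript that runs against the matching page.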
For brevity and to stay in alignment with common terminology, we will use 'user-scripts' throughout the rest of the text.

User-scripts can be used to perform tasks including, but not limited to: automatically adjusting style sheets, stripping unwanted advertisements, integrating the content of multiple Web resources, or introducing novel visualizations. Anyone capable of writing JavaScript can write and share user-scripts that alter the content displayed on any Web page. By writing or locating a suitable user-script, for example in a public repository such as userscripts.org [2], and installing it in their browser, users gain unprecedented control over the content that is ultimately displayed in their browser window. User-scripts thus offer an immediate mechanism by which the Web browsing experience can be shifted from its current resource-centred pattern of control towards a more user-centred view.

Here we introduce the iHOPerator, a user-script designed to provide an enhanced, customized view of the iHOP website, a key bioinformatics resource describing proteins, their properties, and the relationships that hold between them. We describe how the iHOPerator script generates and embeds a novel visualization of the contents of the iHOP Web pages and extends the content of those pages with information gathered from related, external Web resources. We conclude with a discussion of the potential implications of user-scripts, describing their relationship with the emerging Semantic Web in the life sciences.

iHOP

The iHOP database provides information about proteins that have been automatically associated with PubMed abstracts [3-5]. Using the iHOP website [6], it is possible to browse through the literature using hyperlinks that associate abstracts to one another using co-occurring genes. After identifying a gene of interest, a user may navigate to a page that contains the "defining information" for the gene.
This information consists of the gene's names in different databases, its source organism, and a potentially very long list of snippets of text that have been extracted from abstracts associated with the gene (Figure 1).

Tag clouds

Tag clouds are visually-weighted renditions of collections of words ('tags') that are used to describe something [7]. Tags in a cloud are sized, organized and coloured so as to illustrate aspects of the relationship between each tag and the entity that it describes. Tag clouds have recently gained popularity in 'social-tagging' applications such as Flickr [8], Connotea [9], and del.icio.us [10] because they provide a mechanism through which untrained users can quickly visualize the dominant features of voluminous databases and because they provide a visually based navigation paradigm that is complementary to text search and operates naturally over non-hierarchically organized information systems.

Implementation

The iHOPerator is a user-script, a JavaScript program that can be embedded in a Web browser such that it processes the contents of visited Web pages prior to their presentation to the user. Though a user-script may be instructed to process any set of Web pages (e.g. those from a particular domain), the iHOPerator is focused specifically on the gene-information pages of the iHOP website.

GreaseMonkey

At this time, most user-scripts require extensions to Web browsers such as GreaseMonkey [17] for Mozilla's Firefox, Creammonkey [18] for Apple's Safari, and Turnabout [19] for Microsoft's Internet Explorer. Though user-scripts for each of these browsers are written in JavaScript, there are no accepted standards for user-script extensions and thus scripts written for one browser may or may not work in another browser. As user-scripts become more popular, standardization efforts are likely to emerge that will improve script/browser interoperability; for the moment, however, the iHOPerator is built for Firefox and is thus dependent on the GreaseMonkey extension for its operation.

The GreaseMonkey/Firefox combination was chosen for this project because both are cross-platform, actively developed, open source, and because GreaseMonkey was the first and is still the most widely used browser extension for housing user-scripts. We utilize GreaseMonkey to add a tag cloud to pages describing genes on the iHOP website by processing the HTML and JavaScript present on those pages prior to presentation in the browser. As well, we extend the content of the website by utilizing the GreaseMonkey API to retrieve content from external HTTP-accessible resources.

Results

The purpose of the iHOPerator user-script is to enhance the user's experience when visiting the iHOP Web page. It does this by generating a tag cloud visualization of some of the information presented on the gene-information Web pages and by integrating additional content acquired from PubMed [11] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [12].

iHOPerator tag clouds

The iHOPerator script produces tag clouds based either on MESH keywords from the abstracts associated with a gene or on other genes that iHOP identifies as interacting with a gene. For example, Figure 2 shows a tag cloud generated using MESH terms gathered from abstracts associated with the gene Brca1 and Figure 3 shows a tag cloud composed of genes related to Brca1.
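The weighting behind such a cloud, with tag size driven by frequency and tag colour by average journal impact factor, can be sketched roughly as follows. This is an illustrative sketch only, not the iHOPerator code: the input shape (`{ tag, impactFactor }` objects), the function names, the `size-N` and `colour-N` class names, the bucket counts, and the choice to score frequency relative to the most frequent tag are all assumptions.

```javascript
// Map a value in [0, max] onto an integer bucket 1..levels.
function bucket(value, max, levels) {
  if (max <= 0) return 1;
  var b = Math.ceil((value / max) * levels);
  return Math.min(levels, Math.max(1, b));
}

// Escape characters that would otherwise be read as HTML mark-up.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// occurrences: array of { tag: string, impactFactor: number } (assumed shape).
// options: { sizeLevels, colourLevels, maxImpact } (all hypothetical settings).
function buildTagCloud(occurrences, options) {
  // Count occurrences and sum impact factors per tag.
  var stats = new Map();
  occurrences.forEach(function (o) {
    var s = stats.get(o.tag) || { count: 0, impactSum: 0 };
    s.count += 1;
    s.impactSum += o.impactFactor;
    stats.set(o.tag, s);
  });

  // Find the highest count so each tag can be scored relative to it.
  var maxCount = 0;
  stats.forEach(function (s) {
    if (s.count > maxCount) maxCount = s.count;
  });

  // Sort tags alphabetically, ignoring case.
  var tags = Array.from(stats.keys());
  tags.sort(function (a, b) {
    var x = a.toLowerCase(), y = b.toLowerCase();
    return x < y ? -1 : x > y ? 1 : 0;
  });

  // Emit one span per tag with a size class and a colour class.
  return tags.map(function (tag) {
    var s = stats.get(tag);
    var size = bucket(s.count, maxCount, options.sizeLevels);
    var colour = bucket(s.impactSum / s.count, options.maxImpact, options.colourLevels);
    return '<span class="size-' + size + ' colour-' + colour + '">' +
      escapeHtml(tag) + "</span>";
  }).join(" ");
}
```

A stylesheet would then bind each `size-N` class to a font size and each `colour-N` class to a shade, so changing the look of the cloud means editing the classes rather than the script logic.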
In both clouds, the size of each tag is used to display the frequency of occurrence of that tag (gene or keyword) in the context of abstracts associated with Brca1 and colour is used to highlight the impact factor of the journals in which the tags appear. From the user's perspective, these tag clouds appear to be embedded directly within the iHOP Web page (Figure 4).

The process of generating the tag clouds works as follows:

1. Extract tags (MESH keywords or interacting genes) embedded in the HTML of the page. (This is greatly facilitated by the presence of XML mark-up of these entities provided by the iHOP website).
2. Count the number of occurrences of each tag.
3. Calculate a score for the tag based on its relative frequency in the page.
4. Collect the impact factor assigned to each abstract and associate it with the appropriate tag. (Once again, this is facilitated by XML mark-up in the iHOP page).
5. Find the average impact factor associated with each tag.
6. Produce the HTML for the cloud by
   a. Assigning each tag to a predefined Cascading Style Sheet class whose size and colour are determined, respectively, by the frequency of occurrence of the tag in the page and by the average impact factor of the journals associated with the tag occurrences.
   b. Sorting the tags alphabetically.

Figure 1. Default iHOP page displaying the defining information for VEGF. The default iHOP gene-focused Web page without the enhancements provided by the iHOPerator script. The page is displaying the defining information for the gene VEGF. The top of the page displays alternate names while the bottom (extending well past the area that can be displayed in the figure) provides extractions from the text of abstracts associated with the gene.

The iHOPerator script also allows the user to customize the interface by selecting different ranges for the font sizes in the cloud and by specifying whether iHOPerator-generated content should be hidden, displayed in another window, or displayed within the iHOP Web page.

iHOPerator integration of third-party content

Aside from the tag-cloud based visualization (produced entirely using JavaScript operating within the browser), a key feature of the iHOPerator script is its ability to acquire and display third-party content related to the gene in the same browser-context. For example, the script utilizes GreaseMonkey's built-in support for AJAX (Asynchronous JavaScript and XML) to execute an asynchronous HTTP request that invokes a BioMoby [13] Web service workflow stored as a Java servlet that, when possible, provides KEGG pathway diagrams containing the gene of interest (Figure 5). The script also makes it possible for the user to access relevant external websites using an embedded IFRAME element. This allows the user to view the abstracts associated with the gene and/or MESH term of interest or to initialize a Web service browsing session using the Gbrowse Moby [14] BioMoby client application that originates with a gene selected from the cloud. Without the iHOPerator, each of these activities would require that the user find the additional resources themselves, learn how to use them, cut and paste search terms into them, and of course, navigate away from the iHOP website.

Figure 2. A tag cloud built from MESH terms associated with Brca1. This tag cloud was built automatically using the iHOPerator user-script. It is composed of MESH terms extracted from abstracts associated with the gene Brca1 (in mouse). Colour (redness) correlates with the impact factor of the journals where the term occurs. Size correlates with the number of times the term occurs in association with the gene, in this case Brca1.

Figure 3. A tag cloud built from genes related to Brca1. This tag cloud was built automatically using the iHOPerator user-script. It is composed of gene names extracted from abstracts associated with the gene Brca1 (in mouse). Colour (redness) correlates with the impact factor of the journals where the gene name occurs. Size correlates with the number of times the related gene name occurs in association with the gene in question, in this case Brca1.

Related work

Within the bioinformatics domain, only a few examples of user-scripts appear to exist so far. At the time of this writing, only two were listed at the primary global repository [2] and one was identified via Web search [15]. Both scripts listed on [2] facilitate the addition of bookmarks to articles listed in PubMed [11] to similar science-focused social bookmarking systems, Connotea [9] and CiteULike [16]. In the other, Pierre Lindenbaum provides a script that generates a TreeMap [17] visualization of Connotea reference collections [15].

Figure 4. The iHOP webpage enhanced by the iHOPerator user-script. The iHOP webpage after it has been enhanced with the iHOPerator user-script. Compare with Figure 1. The Web page now includes a tag cloud composed of MESH terms from abstracts associated with the gene Brca1 in mouse as well as a panel of controls for manipulating the new visualization. The number of terms used to build the cloud, the scale of the fonts used, the presence or absence of the cloud on the page, and the actions taken when the user clicks on an element of the cloud are all under the user's control.

Discussion

At present, Web browsers are the dominant technology used to satisfy the information gathering and visualization needs of life scientists. In their current form, browsers provide users with the ability to retrieve information from widely distributed sources, but essentially no means to integrate information from multiple sources and only a very constrained set of operations for manipulating the display of that information. Given the distributed nature of information on the Web and the diversity of user requirements in interacting with that information, this situation is unsatisfactory.

In most current implementations, Web browsers facilitate information transfer between only two parties: the resource provider, who determines all information presented, all links to external resources, and nearly all manner of visualizing that information; and the consumer, who essentially can only control which page they choose to view next. The typical Web browsing experience can thus be characterized as resource-centric because everything that the user sees on a Web page is governed entirely by the resource provider.

By introducing an additional layer of processing that occurs only at the discretion of the user (by choosing whether or not to install a given script), user-scripts offer a way to effect a transition towards a user-centric browsing experience. Though it has always been possible for the technically skilled to engineer their own software for processing Web content (e.g. the notorious 'screen-scraping' characteristic of early bioinformatics [18]), the arrival of popular browser extensions such as GreaseMonkey marks the beginning of a fundamental change in the way end-users can interact with the Web. Empowered with the ability to easily embed scripts directly into their browser and to find such scripts in public repositories, Web users can now more actively make decisions about what Web content they see and how that content is presented.

Despite its intriguing, paradigm-shifting nature, the user-script concept is not without its problems. Because Web content is still primarily provided as HTML, user-scripts must process HTML in order to function. This is problematic for two reasons: 1) HTML is not designed for knowledge or data representation and hence is difficult to parse consistently and 2) HTML representations may change frequently even when the underlying data does not. The former makes it challenging to write effective user-scripts, particularly scripts that are intended to operate over multiple Web pages. The latter makes these scripts brittle in the face of superficial changes to their inputs and thus potentially unreliable [18]. Since information on the Web is currently provided primarily as HTML, alterations to the structure of this content are frequent and necessary results of the need to keep the browsable interfaces up to date. To alleviate these problems, it would clearly be beneficial if the underlying data could be exposed in a manner that was independent of its HTML representation.

Figure 5. The iHOP webpage for IRF-3, enhanced with a tag cloud and a pathway diagram using the iHOPerator user-script. The iHOPerator user-script is shown providing access to a KEGG pathway diagram containing the gene IRF-3 within the context of the iHOP website. The diagram was retrieved as a result of a mouse-click on 'IRF-3' in the tag cloud.

The potential value of separating content from presentation provides motivation for the Semantic Web [19] initiative and the standards for the annotation of Web resources, such as the Resource Description Framework (RDF) [20] and the Web Ontology Language (OWL) [21], that have recently emerged from it. With these standards in place, content providers are encouraged to provide a representation of their data for visualization (HTML) in parallel with an additional representation of their data for machine-interpretation (RDF/OWL). This would enable those who wish to utilize the content in novel ways to process the more stable, machine-readable representations while remaining unaffected by visual modifications to the associated websites. Though widespread adoption of Semantic Web standards by the community may, in principle, enable the creation of powerful, user-centred applications that go beyond the capabilities of user-script enabled browsers [22], this process is occurring very slowly [23] and the problems faced by life scientists in gathering, integrating and interpreting information on the Web are pressing. In their current form, user-scripts, such as the iHOPerator, provide an immediate means to address these needs and thus should be more widely exploited to this end.

Conclusion

By adding the iHOPerator user-script to their browser, users gain access to 1) a novel method of visualizing and navigating the defining information about genes on the iHOP website and 2) enhancements to that information that are gathered automatically using external resources such as PubMed and KEGG.
The iHOPerator thus provides an extension to the iHOP website that demonstrates how user-scripts can be used to personalize and to enhance the Web browsing experience in a biological context.

User-scripts represent a small, but immediate and useful step in the direction of a user-centred rather than a resource-centred Web browsing experience. In contrast to other proposed routes to achieving this goal, they offer a mechanism that can be effected immediately using existing resources and representations to provide end-users with a straightforward way to exert greater control over what they see on the Web and how they see it.

Availability and requirements

• Project name: iHOPerator
• To install: Go to the project homepage and follow the installation instructions
• Project homepage: http://bioinfo.icapture.ubc.ca/iHOPerator/
• Operating system: any OS that supports the Mozilla Firefox Web browser
• Programming languages: JavaScript
• Other requirements: JavaScript enabled Firefox Web browser, GreaseMonkey Firefox extension, Internet connection
• License: FreeBSD

Authors' contributions

BMG instigated the project and drafted the manuscript. EAK wrote all of the software. BYK developed the project website and provided intellectual input throughout the project. MDW provided substantial advice and guidance during all phases of the project and assisted in the drafting of the manuscript. All authors read and approved the final manuscript.

Acknowledgements

MDW and BYK are supported by an award to the iCAPTURE Centre from the Michael Smith Foundation for Health Research. EAK is supported by an award from Genome Alberta, in part through Genome Canada, a not-for-profit organization leading Canadian genomics and bioinformatics research. BMG is supported by an award to the Better Biomarkers in Transplantation project from Genome British Columbia, in part through Genome Canada. Core laboratory funding provided by the Natural Sciences and Engineering Research Council of Canada (NSERC). Infrastructure support provided by IBM and SUN Microsystems.

References

1. Bolin M: End-User Programming for the Web. In Masters Thesis in Electrical Engineering and Computer Science Boston: Massachusetts Institute of Technology; 2005.
2. Userscripts.org - Universal Repository [http://userscripts.org/]
3. Hoffmann R, Krallinger M, Andres E, Tamames J, Blaschke C, Valencia A: Text mining for metabolic pathways, signaling cascades, and protein networks. Sci STKE 2005, 2005(283):pe21.
4. Hoffmann R, Valencia A: Implementing the iHOP concept for navigation of biomedical literature. Bioinformatics 2005, 21 Suppl 2:ii252-ii258.
5. Hoffmann R, Valencia A: A gene network for navigating the literature. Nat Genet 2004, 36(7):664.
6. iHOP - Information Hyperlinked over Proteins [http://www.ihop-net.org/UniPub/iHOP/]
7. Tag cloud - Wikipedia, the free encyclopedia [http://en.wikipedia.org/wiki/Tag_cloud]
8. Flickr [http://www.flickr.com/explore/]
9. Connotea: free online reference management for clinicians and scientists [http://www.connotea.org/]
10. Del.icio.us [http://del.icio.us/tag/]
11. Entrez PubMed [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed]
12. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource for deciphering the genome. Nucleic Acids Res 2004, 32(Database issue):D277-80.
13. Wilkinson MD, Links M: BioMOBY: an open source biological web services proposal. Briefings in bioinformatics 2002, 3(4):331-341.
14. GBrowse: MOBY-S Web Service Browser [http://mobycentral.icapture.ubc.ca/cgi-bin/gbrowse_moby]
15. A GreaseMonkey Script to Display SVG TreeMaps of Tags in Connotea [http://www.urbigene.com/gmconnoteasvg/]
16. CiteULike: A free online service to organize your academic papers [http://www.citeulike.org/]
17. Treemaps for space-constrained visualization of hierarchies [http://www.cs.umd.edu/hcil/treemap-history/]
18. Stein L: Creating a bioinformatics nation. Nature 2002, 417(6885):119-120.
19. Berners-Lee T, Hendler J, Lassila O: The Semantic Web. Scientific American 2001, 284(5):34-43.
20. W3C RDF Primer [http://www.w3.org/TR/rdf-primer/]
21. OWL Web Ontology Language Overview [http://www.w3.org/TR/owl-features/]
22. Quan D, Karger D: How to make a semantic web browser: New York, NY, USA. ACM Press; 2004:255-265.
23. Good BM, Wilkinson MD: The Life Sciences Semantic Web is Full of Creeps! Brief Bioinform 2006.
