UBC Faculty Research and Publications

Pegasys: software for executing and integrating analyses of biological sequences
Shah, Sohrab P; He, David Y; Sawkins, Jessica N; Druce, Jeffrey C; Quon, Gerald; Lett, Drew; Zheng, Grace X; Xu, Tao; Ouellette, BF F
Apr 19, 2004


Full Text

BioMed Central
BMC Bioinformatics
Open Access

Software

Pegasys: software for executing and integrating analyses of biological sequences

Sohrab P Shah, David YM He, Jessica N Sawkins, Jeffrey C Druce, Gerald Quon, Drew Lett, Grace XY Zheng, Tao Xu and BF Francis Ouellette*

Address: UBC Bioinformatics Centre, University of British Columbia, Vancouver, British Columbia, Canada

Email: Sohrab P Shah - sohrab@bioinformatics.ubc.ca; David YM He - david@bioinformatics.ubc.ca; Jessica N Sawkins - jessica@bioinformatics.ubc.ca; Jeffrey C Druce - jdruce@bioinformatics.ubc.ca; Gerald Quon - gtquon@uwaterloo.ca; Drew Lett - drewlett@bioinformatics.ubc.ca; Grace XY Zheng - gxz@interchange.ubc.ca; Tao Xu - taoxu@bioinformatics.ubc.ca; BF Francis Ouellette* - francis@bioinformatics.ubc.ca

* Corresponding author

Abstract

Background: We present Pegasys, a flexible, modular and customizable software system that facilitates the execution of heterogeneous biological sequence analysis tools and the integration of their results.

Results: The Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, masking repetitive sequences in genomic DNA as well as filters for database formatting and processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store their results. The software allows users to dynamically create analysis workflows at run-time by manipulating a graphical user interface. All analyses that are not serially dependent are executed in parallel on a compute cluster for efficiency of data generation. The uniform data model and backend relational database management system of Pegasys allow for results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format for further analyses in GFF-dependent tools, or GAME XML for import into the Apollo genome editor.
The modularity of the design allows for new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries.

Conclusions: The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License. All source code and documentation are available for download at http://bioinformatics.ubc.ca/pegasys/.

Published: 19 April 2004
BMC Bioinformatics 2004, 5:40
Received: 27 February 2004
Accepted: 19 April 2004
This article is available from: http://www.biomedcentral.com/1471-2105/5/40
© 2004 Shah et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Background

Pipelines for biological sequence analysis

Large scale sequence analysis is a complex task that involves the integration of results from numerous computational tools. For high-throughput data analysis, these tools must be tied together in a coordinated system that can automate the execution of a set of analyses in sequence or in parallel. To this end, a diverse array of software systems for biological sequence analysis has emerged in recent years. For example, the Ensembl pipeline [1] automates the annotation of several eukaryotic genomes, Mungall et al [2] have created a robust pipeline for annotation and analysis of the Drosophila genome, GenDB [3] is used as an annotation system for several prokaryotic genomes and Yuan et al [4] have published resources for annotating the rice and other plant genomes. These pipelines are extensive in their scope, are well-designed and meet their objectives.
In surveying these and other systems, we have identified three critical areas that are essential for building on the design of existing biological sequence analysis pipelines:

• There is a need for flexible architecture so that one software system can be used to analyse different data sets that may require different analysis tools.
• A system needs to allow for the inclusion of new tools in a modular fashion so the software architecture does not have to change with the addition of new tools.
• A system should provide the framework to facilitate data integration of analysis results from different tools that were computed on the same input.

The need for flexible architecture

The systems outlined above differ substantially from each other in their design and application, but share common attributes. The diversity is naturally reflective of the varied computational tasks that biologists working on different projects need to perform in order to analyse their data. A researcher working on bacteria will need different tools for her analyses than someone working on mouse. The specificity driven by the needs of a research project makes it impossible to use a pipeline designed for a particular data set for analysis of another data set that has inherent differences, such as the organism from which it was generated. As a result, numerous software pipelines have been created, many of which perform similar analyses (such as genome annotation) but on different data. For example, the concept of constructing a pipeline, or 'workflow', of data processing is common to nearly all high-throughput sequence analysis projects. This shared concept provides an opportunity to harness the commonality in software so that a new system need not be designed for every new project.

Incorporating new tools into existing frameworks

The bioinformatics community is faced with a challenging and dynamic environment where new computational tools and data sets for sequence analysis are constantly being generated.
Capitalizing on algorithmic and computational advances is critical to discovering more about the data being analysed. For a system that has a rigid pipeline that is 'hard coded', it may require a significant programming investment to incorporate a new tool. This may discourage biologists from integrating a new tool on the basis of logistics, rather than on the basis of scientific applicability. Therefore, a system should provide a framework that is designed for flexibility and extensibility.

Facilitating data integration

Genome annotation requires data integration. For example, ab initio prediction of gene structures on genomic sequence can be greatly enhanced by using supporting sequence similarity searches [5-7]. Concordance between different methodologies lends stronger support and gives more compelling evidence to an algorithm or a person trying to infer true biological features from computationally derived features [8]. It follows that any analysis pipeline or system should provide a design that facilitates integration of heterogeneous sources of data.

The Pegasys biological sequence analysis system

To meet the challenges outlined above we have designed and implemented Pegasys: a flexible, modular and customizable framework for biological sequence analysis. The software is implemented in the Java programming language and is Open Source, released under the GNU General Public License. The features of Pegasys allow it to be used on a wide variety of tasks and data. Analysis modules for pair-wise and multiple sequence alignment, ab initio gene prediction, masking of repetitive elements, prediction of RNA sequences and prediction of eukaryotic splice sites have been developed. A new set of analyses is performed by first creating a new 'workflow'. We define a workflow as a set of analyses a biologist wishes to perform on a single sequence or set of sequences. Each workflow has the following qualities: a) the analyses can be linked together such that output from one analysis can be used as input to a subsequent analysis, b) analyses can accept outputs from more than one analysis as input, and c) analyses that are not serially dependent can be executed in parallel.

Analysis tools in the Pegasys system are wrapped in modules that can easily be plugged into the system. The backend database system provides a data model that abstracts the concept of a computational feature and captures data from all the different analysis tools in the same framework. We have implemented data adaptors that can export computational results in General Feature Format [9] and Genome Annotation Markup Elements (GAME) XML [10] for import into the Apollo genome editor [11]. For simple workflows where data integration is not applicable, for example one analysis on an input sequence, raw, untransformed output from the analysis can also be retrieved.

The system is fronted by a graphical user interface that allows users to create workflows at run-time and have them executed on the Pegasys server. The GUI also allows users to save their workflows for repeat execution on different input, or using different reagents.

To demonstrate the utility of Pegasys in widely different bioinformatics tasks, we present three use cases of the system: a single application workflow, a workflow designed for formatting a database for BLAST [12,13] and searching the newly formatted database, and finally a workflow designed for genome annotation of eukaryotic genomic sequence.

We are releasing this work with the intention that a wide variety of sequence analyses in the bioinformatics research community will be enabled.
Full details of the availability, support and documentation of Pegasys can be found at http://bioinformatics.ubc.ca/pegasys/.

Implementation

The design of the Pegasys system is guided by three main principles: modularity, flexibility and data integration. With these principles in mind, we designed Pegasys with the following architecture.

Architecture and data flow

The architecture of the system has a layered topology that uses a client/server model. The client has a graphical user interface (see Figure 4) for the creation of workflows. Once a workflow is created, it is sent to the server where it is executed. The server is made up of separate layers for job scheduling, execution, database interaction, and adaptors. The connectivity between layers is shown in Figure 1. The application layer converts the workflow rendered in XML into a directed acyclic graph (DAG) of analyses in memory. While traversing the DAG, the application schedules all of the analyses on a distributed compute cluster and facilitates the flow of data so that a particular node's program is only executed once all of its inputs are ready (i.e. all of the 'parent' analyses are complete). As each analysis completes, the results are inserted into the backend database layer. Complete reports and computational features of a sequence are inserted into relational tables. Sophisticated queries on the data, in which results from selected programs can be integrated together over a portion or all of the input sequence, can then be run to compile data for output.
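The scheduling rule described above (a node's program runs only once all of its parent analyses are complete, while independent nodes run in parallel) can be illustrated with a short Java sketch. This is not the Pegasys implementation: the class and method names (`DagSchedulerSketch`, `runAll`, `figure2Dag`) are hypothetical, a local thread pool stands in for the compute cluster, and each 'analysis' merely records that it ran. The example graph follows the description of Figure 2, with the added assumption that v6 depends only on v5.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch; class and method names are not taken from Pegasys.
class DagSchedulerSketch {

    /**
     * Runs every node of a DAG on a thread pool. 'parents' maps each node to
     * the nodes whose output it needs; entries must be listed parents-first.
     * Returns the nodes in the order in which they finished.
     */
    static List<String> runAll(LinkedHashMap<String, List<String>> parents) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // stands in for the cluster
        List<String> finished = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> running = new HashMap<>();
        try {
            for (Map.Entry<String, List<String>> entry : parents.entrySet()) {
                String node = entry.getKey();
                CompletableFuture<?>[] deps = entry.getValue().stream()
                        .map(running::get)
                        .toArray(CompletableFuture<?>[]::new);
                // Start this node's "analysis" only when all parent futures are done.
                running.put(node, CompletableFuture.allOf(deps)
                        .thenRunAsync(() -> finished.add(node), pool));
            }
            // Wait for the whole workflow to complete.
            CompletableFuture.allOf(running.values().toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(finished);
    }

    /** The DAG of Figure 2A; the parent of v6 is an assumption. */
    static LinkedHashMap<String, List<String>> figure2Dag() {
        LinkedHashMap<String, List<String>> g = new LinkedHashMap<>();
        g.put("v1", List.of());           // input sequence
        g.put("v2", List.of("v1"));
        g.put("v3", List.of("v1"));
        g.put("v4", List.of("v2", "v3")); // waits for both parents
        g.put("v5", List.of("v3"));
        g.put("v6", List.of("v5"));
        return g;
    }

    public static void main(String[] args) {
        System.out.println("Completion order: " + runAll(figure2Dag()));
    }
}
```

The completion order can differ between runs (v2 and v3 may finish in either order, for instance), but a child never finishes before any of its parents.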
The data is exported from the system via the adaptor layer in various formats (currently GFF, GAME XML and raw output from each analysis tool are supported) for human interpretation or for import into other applications such as viewing tools (DAS [14]), editing tools (Apollo [11]) or statistical analysis tools such as R [15].

The Pegasys data structure

The core data structure of the Pegasys system is a DAG G(V, E), consisting of a set of nodes V and a set of edges E connecting the nodes (see Figure 2). The DAG data structure models a workflow created by a user of the Pegasys system. A node can take one of three forms: a) an input sequence, b) an individual run of a program in the system or c) an output node. An edge (v1, v2), where v1 and v2 are nodes in V, links data flow between v1 and v2. An edge represents a serial dependency, indicating that the input of v2 is tied to the output of v1. We refer to this relationship as a parent-child relationship: node v2 is a child of node v1 and node v1 is the parent of node v2. The edge ensures that the output format from v1 is consistent with the input format of v2. A node in the DAG can have more than one parent and therefore can have heterogeneous input from multiple sources. The edges in the graph are directional and can only connect two nodes that are executed one after another. The graph therefore has a chronological axis: the child nodes are executed after their parent nodes have completed.

The DAG is created dynamically at run time as the user manipulates the GUI (see The Graphical User Interface section). The user can create workflows using any combination of the available programs in Pegasys by dragging and dropping graphical icons that represent sequence analysis tools onto a canvas and linking them together with edges, in much the same way that one would use drawing tool software to create a flow diagram.
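The DAG definition given in this section can be sketched as a small Java class. This is a minimal illustration under stated assumptions, not the Pegasys data structure itself: the names (`WorkflowGraphSketch`, `NodeKind`, `addEdge`) are hypothetical, the rule that an output node cannot be a parent and an input node cannot be a child is an assumption made for the example, and the check that a parent's output format matches the child's input format is omitted for brevity. What the sketch does show is the three node forms, the parent-child edges, support for multiple parents, and the rejection of any edge that would introduce a cycle.

```java
import java.util.*;

// Hypothetical sketch of a workflow DAG; names are not taken from Pegasys.
class WorkflowGraphSketch {
    enum NodeKind { INPUT, PROGRAM, OUTPUT }

    private final Map<String, NodeKind> nodes = new LinkedHashMap<>();
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Set<String>> parents = new HashMap<>();

    void addNode(String id, NodeKind kind) {
        nodes.put(id, kind);
        children.put(id, new LinkedHashSet<>());
        parents.put(id, new LinkedHashSet<>());
    }

    /** Adds the edge (from, to); returns false if the edge is not allowed. */
    boolean addEdge(String from, String to) {
        if (!nodes.containsKey(from) || !nodes.containsKey(to)) return false;
        // Assumed rule: outputs have no children, inputs have no parents.
        if (nodes.get(from) == NodeKind.OUTPUT || nodes.get(to) == NodeKind.INPUT) return false;
        // If 'from' is already reachable from 'to', the new edge would close a cycle.
        if (reaches(to, from)) return false;
        children.get(from).add(to);
        parents.get(to).add(from);
        return true;
    }

    /** Graph search: is 'target' reachable from 'start' by following edges? */
    private boolean reaches(String start, String target) {
        Deque<String> pending = new ArrayDeque<>(List.of(start));
        Set<String> seen = new HashSet<>();
        while (!pending.isEmpty()) {
            String n = pending.pop();
            if (n.equals(target)) return true;
            if (seen.add(n)) pending.addAll(children.get(n));
        }
        return false;
    }

    Set<String> parentsOf(String id) { return parents.get(id); }

    /** Builds the graph of Figure 2A (the parent of v6 is an assumption). */
    static WorkflowGraphSketch figure2() {
        WorkflowGraphSketch g = new WorkflowGraphSketch();
        g.addNode("v1", NodeKind.INPUT);
        for (String p : List.of("v2", "v3", "v4", "v5")) g.addNode(p, NodeKind.PROGRAM);
        g.addNode("v6", NodeKind.OUTPUT);
        g.addEdge("v1", "v2"); g.addEdge("v1", "v3");
        g.addEdge("v2", "v4"); g.addEdge("v3", "v4");
        g.addEdge("v3", "v5"); g.addEdge("v5", "v6");
        return g;
    }

    public static void main(String[] args) {
        WorkflowGraphSketch g = figure2();
        System.out.println("Parents of v4: " + g.parentsOf("v4"));       // [v2, v3]
        System.out.println("v4 -> v2 accepted? " + g.addEdge("v4", "v2")); // false
    }
}
```

Checking for a cycle at the moment an edge is drawn keeps the graph a valid DAG at every step, which suits a workflow that is built interactively.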
Each program icon can be clicked to open a dialogue box that can take inputs for parameters that are supported by that particular program. Once all of the parameters for all the nodes have been filled in, the information for each node and their relationships to each other are compiled into a structured XML file. This file is then used as input to the Pegasys server that executes the analyses in parallel (described in the Architecture and data flow section) or can be saved for later editing or distribution.

During the execution of the DAG, the data structure can adjust itself to accommodate outputs generated from the nodes. Consider the edge (v3, v5) depicted in Figure 2 that connects an ab initio gene prediction program v3 with a sequence alignment program v5. In v5, the user wishes to search the coding regions from the output of v3 against a protein database. v5 cannot know how many genes will be predicted from v3 before v3 has terminated. Once v3 has terminated, however, v5 will replicate itself for each 'output unit' generated from v3 (see Figure 2B). In this case, v5 replicates itself for each of the coding regions and the DAG executes each 'copy' of v5 in parallel. This built-in elasticity maximizes parallel execution of analyses and therefore gives more efficient execution of the computations in the DAG.

The Program module

The Program module is the fundamental unit of the nodes of the aforementioned DAG in the application layer of the server and is a concrete instance of a node v ∈ V. 'Program' is an object oriented class that abstracts the concept of a Unix program that is natively compiled. Unix programs generally have a set of input command line parameters and output that is sent to the standard output, standard error or an output file. The Program class has a data structure to store a program's command line arguments and parameters.
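As an illustration of this abstraction, the following Java sketch wraps a command-line program, stores its flags and parameters, and captures standard output, standard error and the exit status when run. It is a sketch under stated assumptions rather than the Pegasys Program class: the class name `ProgramSketch` and its methods are hypothetical, and the RepeatMasker command line in `main` is only an illustrative invocation that is printed, not executed.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a wrapper around a Unix command-line program.
class ProgramSketch {
    private final String path;                          // location of the executable
    private final List<String> args = new ArrayList<>(); // flags and parameters, in order
    String stdout = "";
    String stderr = "";

    ProgramSketch(String path) { this.path = path; }

    /** Adds an argument that takes no value. */
    ProgramSketch flag(String name) { args.add(name); return this; }

    /** Adds an argument followed by its value. */
    ProgramSketch parameter(String name, String value) {
        args.add(name);
        args.add(value);
        return this;
    }

    /** The full argument vector: executable path followed by all arguments. */
    List<String> commandLine() {
        List<String> cmd = new ArrayList<>();
        cmd.add(path);
        cmd.addAll(args);
        return cmd;
    }

    /** Runs the program, captures both output streams and returns the exit status. */
    int execute() throws IOException, InterruptedException {
        // Redirecting to temporary files avoids blocking on full pipe buffers.
        File out = File.createTempFile("program", ".out");
        File err = File.createTempFile("program", ".err");
        try {
            Process process = new ProcessBuilder(commandLine())
                    .redirectOutput(out)
                    .redirectError(err)
                    .start();
            int exitStatus = process.waitFor();
            stdout = new String(Files.readAllBytes(out.toPath()));
            stderr = new String(Files.readAllBytes(err.toPath()));
            return exitStatus;
        } finally {
            out.delete();
            err.delete();
        }
    }

    public static void main(String[] unused) {
        // Illustrative invocation only; the program is not run here.
        ProgramSketch masker = new ProgramSketch("RepeatMasker")
                .flag("-nolow")
                .parameter("-species", "human");
        System.out.println(String.join(" ", masker.commandLine()));
    }
}
```

On a machine where the wrapped program is installed, calling `execute()` would run it and leave the captured text in `stdout` and `stderr`.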
The Program class also contains methods for setting the path to the program's location on the system, executing the program and capturing its output from a file, standard error and standard output streams. To abstract a sequence analysis program, we created a PegasysProgram class that extends Program by adding an input sequence attribute and a PegasysResultSet to store the results of the analysis. The PegasysResultSet is a hierarchical, recursive data structure that allows storage of nested analysis results. For example, a BLAST output has a list of similar sequences that each in turn has a list of high scoring pairs. Similarly, Genscan produces output that contains a list of predicted genes, each of which could have a promoter, a list of exons and a poly-A signal. PegasysResultSet captures the hierarchical nature of these results.

Figure 1. Diagram showing the client/server model and layering of the Pegasys architecture. Arrows between the layers indicate a transfer of data. The workflow created by manipulating the GUI in the client is sent as a Pegasys DAG XML file to the server. The application layer then processes the XML file, and sends jobs to the job scheduling layer. The analyses are then executed and the results are stored in the database. The adaptor layer takes results stored in the PegasysResultSet data structure in memory in the application layer and can create output in GFF or GAME XML format. This file is then returned to the GUI where it can be digested by the user or input into a visualization tool.
[Figure 1 diagram labels: Client on a workstation (GUI layer, XML layer); Server (application layer, job scheduling layer, execution layer, database layer, adaptor layer). Arrows: send/receive Pegasys DAG XML; schedule DAG for execution; send to compute cluster; write results to database; output data in a standard format; send/receive output in GFF, GAME XML, etc.]

For each sequence specific analysis tool in Pegasys, we created a class that extends PegasysProgram. Each of these classes implements its own methods that load the particular output of the program and parse it into its PegasysResultSet. For example, the locations of computational evidence such as predicted exons from a gene finding tool, or a high scoring pair from an alignment algorithm, are parsed along with a statistic and/or score when available. This architecture generalises a computational feature so that, programmatically, results from different analysis programs can be treated equally. As mentioned earlier, this allows the user to output results from different programs in a unified format such as GFF or GAME XML. In addition, it facilitates querying for all computational evidence computed on a segment of sequence that may be of interest to the biologist.

Creating a new PegasysProgram derivative involves writing a parser for the particular application that can extract data that is amenable to being loaded into a PegasysResultSet. The system, at the time of this writing, has PegasysPrograms for RepeatMasker [16], BLAST (blastn, blastp, blastx, tblastn, tblastx) [12,13], WU BLAST [17], the EMBOSS [18] implementation of Smith-Waterman [19], Genscan [20], HMMgene [21], Mlagan [22], Sim4 [23], TrnaScan-SE [24] and GeneSplicer [25].

Figure 2. Diagram showing an abstract representation of a Pegasys DAG. A): Consider v1: this could be an input sequence that is used by two sequence analysis programs v2 and v3. v4 is dependent on the output of both v2 and v3 and therefore cannot execute until v2 and v3 have completed. In this diagram, v2 and v3 will be executed in parallel, as will v4 and v5. B): DAG in the case where v3 produces two instances of the expected output to v5. The sub-DAG rooted at v5 replicates itself (v5a and v5b) for each instance of its input. All of the new sub-DAGs are executed in parallel.

The database

The backend database of the Pegasys system was created with the goal of maximizing information capture during execution of a workflow. The database tracks all parameters used for the invocations of analysis programs, all input sequences, and all output generated by computation.

The Pegasys schema

The Pegasys schema has three main tables: 'sequence' which stores the input sequences, 'program_run' which stores the information about an individual program's process on the system and 'pegasys_result' which stores the locations of computational features on the input sequence. Peripheral to the three core tables are seventeen meta tables that store information about the data in the core tables. The full schema is presented in Figure 3.

Figure 3. Diagram showing the relations of the Pegasys database model. There are three core tables to the database: sequence (shown in blue), program_run (shown in orange) and pegasys_result (shown in yellow). The meta tables for each of the three core tables are colour coded to match the corresponding core table. Primary keys are indicated with 'PK', foreign keys with 'FK' and indexed fields with 'I'.
[Tables and fields recovered from the Figure 3 diagram:]
sequence: sequence_id (PK), sequence, defline, accession (I1), version, hashcode (I2)
subseq: subseq_id (PK), sequence_id (FK1), start, stop
seq_has_subseq: parent_seq_id (PK, FK1, I1), subseq_id (PK, FK2, I2)
program_run: program_run_id (PK), class_id (FK2, I2), sequence_id (FK1, I1), process_id, exit_status, program_output, name, version, description, path, start_timestamp, finish_timestamp
class: class_id (PK), name
argument: argument_id (PK), name, description
flag: argument_id (PK, FK1), arg_set
parameter: argument_id (PK, FK1), value
program_run_has_argument: program_run_id (PK, FK1, I1), argument_id (PK, FK2, I2)
batch_run: batch_run_id (PK), batch_type_id (FK1, I1), owner, description, start_timestamp, finish_timestamp, finished_successfully
batch_type: batch_type_id (PK), batch_type, description
batch_run_programs: batch_run_id (PK, FK2, I1), program_run_id (PK, FK1, I2)
pegasys_result: pegasys_result_id (PK), evidence_type_id (FK2, I1), database_reagent_id (FK1, I2), program_run_id (FK3, I3), strand, start, stop, subject_strand, subject_start, subject_end, phase, frame, score, statistic, description, query_description, time_stamp, parent_id (FK4, I4)
evidence_type: evidence_type_id (PK), evidence_name, description
database_reagent: database_reagent_id (PK), database_format_id (FK1, I1), database_name, database_desc, path
database_format: database_format_id (PK), format
pegasys_xref: pegasys_xref_id (PK), id_string (I1), xref_type_id (FK1), version
xref_type: xref_type_id (PK), xref_type, description
result_has_xref: result_has_xref_id (PK), pegasys_result_id (FK1, I1), pegasys_xref_id (FK2, I2)

The 'program_run' table is designed to store all information on an invocation of an analysis tool. This facilitates reprocessing of results without having to recompute an analysis, and can also aid in diagnosing problems that are bound to occur in the system.
'program_run' stores the class that invoked the process, the raw unprocessed output of the program, the start and end time of the process and the exit status of the process. In addition, all command line arguments used to invoke the program are stored in support tables to 'program_run' in the structured tables 'argument', 'parameter', and 'flag'. Entries into 'program_run' can be grouped into batches for selective retrieval of analysis results.

The 'sequence' table stores the raw sequence string itself, a hash code for the sequence string generated by the java.lang.String.hashCode() function (a 32-bit value, so distinct strings are not strictly guaranteed distinct codes), an identifier for the sequence (by default the GenBank accession.version number) and a description of the sequence (by default the NCBI definition line of the FASTA file). This table does not store meta data about the sequence; rather, it is meant to store unique sequences used for computation. The system assumes additional information on the sequence is stored elsewhere. The uniqueness is enforced by ensuring all sequences have distinct hash codes, descriptions and identifiers.

Support tables for 'sequence' have been created to enable the analysis of sub-sequences of a larger input sequence. The subsequence relationship to the sequence is stored in the 'subseq' and 'seq_has_subseq' relations. These tables are useful for 'sliding window' analyses or when focusing in on small regions of interest of a larger input sequence.

The 'pegasys_result' table stores the results of the computations. It has attributes for a computational evidence type, a database reagent (if the result is from similarity searches or uses a particular model in ab initio predictions), the strand, start and end positions of the computational feature, a score and a statistic for the computational feature and a free-text description of the feature. If available, the strand, start and end position on the target sequence of an alignment are also recorded.
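Nested results such as a BLAST hit and its high scoring pairs (see The Program module section) can be held in a single flat table when each row carries a reference to its parent row, which is the role of the 'parent_id' field listed for 'pegasys_result' in Figure 3. The Java sketch below illustrates the idea in memory. It is an illustration only: the class `ResultTableSketch`, its methods and the evidence type name 'blastn_hsp' are hypothetical, and only a few of the table's attributes are modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rows of a flat result table linked by a parent reference.
class ResultTableSketch {

    /** One row; parentId is null for a top-level feature. */
    static final class Row {
        final int id;
        final Integer parentId;
        final String evidenceType;
        final int start;
        final int stop;
        final double score;

        Row(int id, Integer parentId, String evidenceType, int start, int stop, double score) {
            this.id = id;
            this.parentId = parentId;
            this.evidenceType = evidenceType;
            this.start = start;
            this.stop = stop;
            this.score = score;
        }
    }

    private final Map<Integer, Row> rows = new LinkedHashMap<>();
    private int nextId = 1;

    /** Inserts a row and returns its generated id; the parent must already exist. */
    int insert(Integer parentId, String evidenceType, int start, int stop, double score) {
        if (parentId != null && !rows.containsKey(parentId)) {
            throw new IllegalArgumentException("unknown parent row " + parentId);
        }
        int id = nextId++;
        rows.put(id, new Row(id, parentId, evidenceType, start, stop, score));
        return id;
    }

    /** All rows whose parent reference points at the given row. */
    List<Row> childrenOf(int parentId) {
        List<Row> result = new ArrayList<>();
        for (Row r : rows.values()) {
            if (r.parentId != null && r.parentId == parentId) result.add(r);
        }
        return result;
    }

    /** Nesting depth of a row: 0 for top-level rows, 1 for their children, and so on. */
    int depthOf(int id) {
        int depth = 0;
        for (Row r = rows.get(id); r.parentId != null; r = rows.get(r.parentId)) depth++;
        return depth;
    }

    public static void main(String[] args) {
        ResultTableSketch table = new ResultTableSketch();
        int hit = table.insert(null, "blastn_hit", 100, 900, 350.0);
        table.insert(hit, "blastn_hsp", 100, 400, 200.0); // 'blastn_hsp' is a made-up type name
        table.insert(hit, "blastn_hsp", 600, 900, 150.0);
        System.out.println("HSPs under the hit: " + table.childrenOf(hit).size());
    }
}
```

Because every level of the hierarchy is an ordinary row, a query over a region of the sequence can return hits, high scoring pairs, genes and exons uniformly, and the parent reference is followed only when the nesting matters.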
To support hierarchical computational evidence, the table has a 'parent_id' that is a self-referential foreign key. This enables relating a particular row entry in the table to another row in the table. In principle, the table supports arbitrarily deep nesting of hierarchical data types, although in practice results are no more than two levels deep.

The support tables for 'pegasys_result' allow cross-referencing of ids. For example, the system models the concept of linking out an identifier from the result of a database search so that the full sequence and meta data of that sequence can be easily retrieved. This cross-referencing of a 'pegasys_result' to an identifier is stored in the 'result_has_xref' relation. The type of identifier is labeled by a controlled vocabulary so that one can query on a particular type of cross-reference (such as accession number) as well as add a new type of cross-reference to the system.

Additional support tables to 'pegasys_result' are 'database_format', 'database_reagent' and 'evidence_type'. Each of these tables stores controlled nomenclature that is referenced by 'pegasys_result'. The 'database_format' table contains values such as blast, fasta, and genscan for BLAST formatted, FASTA formatted and Genscan training model respectively. The 'database_reagent' table stores the names and descriptions of sequence databases and statistical models that are used in the analysis, so that a user can query the Pegasys database for results from a particular database reagent. This structure also allows adding new database reagents into the system seamlessly. The 'evidence_type' table stores an ontology of computational evidence types, for example 'blastn_hit' or 'genscan_exon'. For each program that is part of the Pegasys system, the computational evidence types that it outputs must be recorded in the 'evidence_type' table prior to its use.

Database API

To communicate programmatically with the database, we have created a modular application programming interface (API). The PegasysDB class contains public methods for insertion and retrieval of sequences, analysis results and sets of results (from different programs) on a particular sequence. Application developers who wish to access data from a Pegasys database can use these high-level methods to rapidly store and access data in a straightforward manner without having to study the underlying schema of the database. The database API is built on JDBC, currently using the PostgreSQL JDBC driver, and so is largely independent of the backend relational database management system (RDBMS).

Adaptors

We have implemented several adaptors for exporting data from a PegasysProgram or set of PegasysPrograms that contain analysis results. The derived PegasysAdaptor classes all implement a print method to output data in a specific format. We currently have derived PegasysAdaptor classes for GAME XML for import into Apollo [11] and GFF [9], which can be imported into numerous tools and servers such as the Distributed Annotation System [14] (DAS) and Gbrowse [26]. The adaptor architecture is extensible and easily allows the development and inclusion of new adaptors for additional formats. The PegasysAdaptor classes serve as an important bridge from the Pegasys data structure to other well-used standards and permit interoperability between data computed using Pegasys and many other bioinformatics tools and databases.

Figure 4. Screenshot of the Pegasys GUI showing the three pane design. The visible pane is the canvas pane, which allows the user to create a workflow by clicking and dragging icons corresponding to the programs available to the system. The icons can be connected to each other through edges. The parameters used for the execution of each program can be set by double clicking the icon and filling in the dialogue box that appears (see Figure 5). Expected inputs and outputs for the edge can be set by double clicking the edge and filling in the dialogue (see Figure 6). This workflow will run RepeatMasker on the sequence specified in the File node and write the results to a text file whose path is specified in the text output node. The RepeatMasker analysis itself is run on the compute server and the results are communicated back to the client.

Parallelism

Our local installation of Pegasys runs on a 28 CPU distributed memory compute cluster that runs the OpenPBS parallel batch server [27]. We have implemented 'serial' parallelism in the system, meaning that each application is a serial process, but many serial processes can be run in parallel. It is important to note that this is distinct from parallelism where a single application is itself implemented using a message passing library that can use many distributed processors in a compute cluster environment. To enable serial parallelism, we implemented a Runnable thread class in the Pegasys application layer that can navigate a command line argument of a PegasysProgram, and create a script at runtime that is used to submit a job to a PBS job queue. To monitor job progress, we implemented a Java server called QstatServer that registers each job sent to the PBS job queue.
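The step of turning a program's command line into a job script at runtime can be sketched as follows. The exact script that Pegasys writes is not given in the text, so this is a guess at its general shape: the class `PbsScriptSketch` and its methods are hypothetical, the script uses only the standard `#PBS -N` job-name directive and the `PBS_O_WORKDIR` environment variable, and the resulting file would be handed to the PBS `qsub` command for submission. The Genscan command line in `main` is illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of building a PBS job script from a command line.
class PbsScriptSketch {

    /** Wraps an argument in single quotes so the shell passes it through unchanged. */
    static String quote(String arg) {
        return "'" + arg.replace("'", "'\\''") + "'";
    }

    /** Builds the text of a job script that runs one serial command. */
    static String script(String jobName, List<String> commandLine) {
        String command = commandLine.stream()
                .map(PbsScriptSketch::quote)
                .collect(Collectors.joining(" "));
        return String.join("\n",
                "#!/bin/sh",
                "#PBS -N " + jobName,       // job name shown in the queue
                "cd \"$PBS_O_WORKDIR\"",    // run from the directory of submission
                command) + "\n";
    }

    public static void main(String[] args) {
        // Illustrative command line; file names are placeholders.
        System.out.print(script("genscan_1", List.of("genscan", "HumanIso.smat", "seq.fa")));
    }
}
```

Each such script describes one serial process, so submitting one script per DAG node that is ready to run yields the 'serial' parallelism described above.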
The QstatServer maintains a hash table of jobs in the queue and informs the Pegasys application layer when a particular job has terminated. This architecture enables the Pegasys application server to execute jobs in sequence or in parallel according to the structure of the DAG that was sent by the client.

Pegasys and Java

The Pegasys system is implemented in the Java programming language. Java offers robust data typing that facilitates object-oriented programming. The principles and advantages of object-oriented design are well documented in the software engineering literature (see [28]). Java is becoming widely adopted in the bioinformatics software domain. For example, the Ensembl database has a Java API to programmatically access genome annotations [29]. The Biojava toolkit [30] is an extensive set of packages written in Java for sequence manipulation, analysis and processing. The Apollo genome editor [11], which we use with Pegasys, allows biologists and bioinformaticians to edit and create annotations in a sophisticated GUI and is written in Java. We have integrated the Biojava toolkit into Pegasys for manipulation of sequence files as well as parsing of BLAST output. Using Java also allows us to make use of the JDBC library for database connectivity, which facilitates standard database interactions independent of the RDBMS engine. To enable parallelism, we made use of the Thread class and the Runnable interface, which allow development of multi-threaded programs.

We have designed Pegasys in a layered architecture that consists of independent Java packages that can easily be imported into any external Java application that wishes to make use of them. These packages are described in the Pegasys user manual, available at: http://bioinformatics.ubc.ca/pegasys/. Implementing Pegasys in Java has brought the system strength and robustness that would not have been attainable using a scripting language. Pegasys provides a Java alternative to existing Perl-based sequence analysis systems such as GenDB [3] and BioPipe [31].

The Graphical User Interface

The Pegasys graphical user interface (GUI) is designed for ease of use while maximizing functionality. When the client is started, the user sees a simple three pane design (see Figure 4). On the left of the screen is a list of programs (the 'Tool Box') available to the user. The list is retrieved from the server as an XML configuration file when the client starts, ensuring that all the programs offered to the user by the client are available on the server. The canvas for drawing the workflow is on the upper right side of the screen, and on the bottom of the screen there is a console to view feedback from the client program.

The structure of the workflow the user creates on the canvas mirrors the structure of the DAG (see The Pegasys data structure section). The nodes of this DAG can be input files, output files, or programs, while the edges that connect the nodes manage the flow of input and output information. For example, the Genscan program node can produce several types of output, such as a list of nucleotide FASTAs of predicted transcripts or a list of amino acid FASTAs of the protein products. If a user connects a BLASTP node to this Genscan node, then the edge between these two nodes can be used to get the list of amino acid FASTAs from the Genscan node as input for the BLASTP node.

During the creation of the workflow, the user can modify the parameters of the analysis programs by double-clicking a node. This opens a Node Properties dialogue. An example for BLAST is pictured in Figure 5. The input/output types for each edge must be set during the creation of the workflow. This is done through the Edge Properties dialogue (see Figure 6).

Figure 5. Screenshot of the Node Properties dialogue window where users can input parameters for the analysis programs. There are three columns: the name of the parameter, its current value and a check box to indicate if this parameter is enabled. Disabled parameters will be excluded from the DAG XML, and consequently from the actual command that is executed on the server. All default values are set in the ProgramList.xml file that the server reads on startup.

Figure 6. Screenshot of the Edge Properties dialogue window where users set the inputs and outputs of an edge. The input/output values are selected with drop-down select bars so users can only select input/output types that are available to the two nodes. Incompatible input/output types for an edge are not allowed by the GUI and the user is alerted to the error. The input/output lists for each node are set in the ProgramList.xml file that the server reads on startup.

When the user has finished creating the workflow, it can be saved as an XML file representing the DAG. This XML file stores all the parameters for the nodes and edges that have been set by the user during the creation of the DAG. This file can be kept on the local hard drive and retrieved for later modification or distribution, or sent to the server to be executed on the compute cluster. The saved DAG can also be sent to the server using the command-line Java client for high-throughput or automated processing. When the processing is complete, the results are sent back to the GUI client to be saved as text files.

To ensure that the user's workflow is syntactically correct, the Pegasys client validates the workflow in real time. As the user draws nodes and edges, they are validated for correctness based on their requirements. For example, if a Program Node has a required parameter that is not filled in, the Pegasys client will display that node with a red 'X' beside it. Once this required parameter is filled in, the red 'X' will turn into a green tick mark, indicating the correctness of this node. Invalid edges are displayed in red, while correct ones are displayed in black. Typically, edges will be invalid if the 'output' and 'input' values of the edges are not set or do not match. If the workflow has a red edge or a node marked with a red 'X', the Pegasys client will not allow the user to send the workflow to the server and will output a warning to the 'Console' area.

The GUI component of the Pegasys system is implemented in C++, using the QT graphical libraries [32]. The QT libraries offer a "write once compile anywhere" approach. Because the QT components are natively compiled for their target operating system, GUI components written in C++/QT have a more native look and feel and give fast response times to the user. In addition, C++/QT can be compiled on all the major operating systems, giving it nearly the same level of portability as Java and facilitating the distribution of the Pegasys GUI client for most platforms.

XML configuration files

Communication between the client and server is mediated through XML files. There are three key XML files in the Pegasys client.
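
Before any workflow XML is sent to the server, the client applies the checks described in the Graphical User Interface section: every required node parameter must be filled in, and every edge must have matching 'output' and 'input' types. The following is a minimal sketch of that logic under stated assumptions, not the Pegasys implementation. It is written in Java for consistency with the Pegasys server, although the actual client is implemented in C++/QT. The names WorkflowValidationSketch, Node, Edge and validate, as well as the 'database' parameter and the type labels, are hypothetical and do not come from the Pegasys source code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the client-side workflow checks; not the Pegasys source. */
public class WorkflowValidationSketch {

    /** A program node with the names of its required parameters and the values set so far. */
    static class Node {
        final String name;
        final List<String> requiredParams;
        final Map<String, String> params = new HashMap<>();

        Node(String name, String... requiredParams) {
            this.name = name;
            this.requiredParams = Arrays.asList(requiredParams);
        }

        /** True (green tick) only if every required parameter has a non-empty value. */
        boolean isValid() {
            for (String required : requiredParams) {
                String value = params.get(required);
                if (value == null || value.isEmpty()) {
                    return false;
                }
            }
            return true;
        }
    }

    /** A directed edge that passes one output type of 'from' to one input type of 'to'. */
    static class Edge {
        final Node from;
        final Node to;
        String outputType; // null until chosen by the user
        String inputType;  // null until chosen by the user

        Edge(Node from, Node to) {
            this.from = from;
            this.to = to;
        }

        /** True (black edge) only if both types are set and they match. */
        boolean isValid() {
            return outputType != null && outputType.equals(inputType);
        }
    }

    /** Returns one message per invalid node or edge; an empty list means the workflow may be sent. */
    static List<String> validate(List<Node> nodes, List<Edge> edges) {
        List<String> problems = new ArrayList<>();
        for (Node node : nodes) {
            if (!node.isValid()) {
                problems.add("node " + node.name + ": required parameter not set");
            }
        }
        for (Edge edge : edges) {
            if (!edge.isValid()) {
                problems.add("edge " + edge.from.name + " -> " + edge.to.name
                        + ": input/output types unset or mismatched");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Node genscan = new Node("Genscan");
        Node blastp = new Node("BLASTP", "database"); // "database" is an assumed required parameter
        Edge edge = new Edge(genscan, blastp);
        List<Node> nodes = Arrays.asList(genscan, blastp);
        List<Edge> edges = Arrays.asList(edge);

        System.out.println(validate(nodes, edges).size()); // 2: one node and one edge are invalid

        blastp.params.put("database", "example_db");
        edge.outputType = "amino acid FASTA";
        edge.inputType = "amino acid FASTA";
        System.out.println(validate(nodes, edges).isEmpty()); // true
    }
}
```

In this sketch an empty result from validate corresponds to a workflow with no red 'X' nodes and no red edges, which is the condition under which the client allows the workflow to be sent to the server.
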

The first XML file, the Pegasys configuration file (PegasysConfig.xml), keeps track of the system settings for default output directories on the server, queuing time for the scheduler, location of Pegasys Java jar files, and database information. This file also contains the path to the second XML file, the program list file (ProgramList.xml), which lists all of the programs that are currently available on the Pegasys server and their associated parameters. The program list file needs to be updated whenever a new module is added to the server, or the parameters of an existing module are changed. It is kept on the server and is transmitted to the client every time the client starts up, to inform the users of the available programs on the server and their associated parameters.

The third XML file is the textual representation of the workflow. This file is generated by saving the workflow using the client. It can be sent to the server where it is parsed and then executed, or it can be re-opened at a later time for further modification. For each node on the canvas, its parameters, flags, and coordinates on the canvas are recorded in the DAG XML file. Edges have their start and end nodes recorded.

Communication via XML is one of the standard ways of disseminating information on the Internet. Both Java for the backend and QT for the client have ready-made parsers for XML. This allowed us to rapidly build the software components that exchange information between the client and the server.

Results and discussion

To illustrate the flexibility of Pegasys for diverse analyses, we chose three workflows to demonstrate as use cases for the system. The simplest workflow takes an input sequence, runs a single analysis on this sequence and saves the unprocessed results.

Figure 4 shows an example of detecting repeats in a genomic sequence using RepeatMasker. In this example, the unprocessed results are written to a text file. This example is almost as if RepeatMasker were run locally on the command line, except that all information about the parameters used, the input sequence and the results is logged to the Pegasys database.

Figure 7 shows a workflow that has two inputs. The first is a FASTA-formatted nucleotide sequence file. This file is used as input to 'formatdb', an application that transforms FASTA-formatted databases into a format that can be used by BLAST. The second input is a query sequence that will be used to search the newly formatted database using BLAST. The results of the search are output as a GFF-formatted text file.

Figure 7. Workflow showing a BLAST pipeline. A FASTA formatted database is to be formatted for BLAST using 'formatdb'. A query sequence is then searched against this new database using BLAST. The results are written to a text file in GFF format.

Figure 8 shows a workflow that would be suitable for annotation of eukaryotic genomic sequence. The output of this workflow would serve as the input for an annotation tool like Apollo. The DAG branches after the input sequence File node into a sub-DAG of analyses that work on the input as is and a sub-DAG that analyzes the input sequence after it has been masked for repeats with RepeatMasker. The unmasked sequence is analysed for tRNAs using tRNAscan-SE, and for protein coding genes using the ab initio gene predictors Genscan and HMMgene. The masked sequence is searched against a database of curated proteins using BLASTX and against a database compiled from ESTs, full-length cDNAs and mRNA sequences (dbTranscript). The results from the latter search are further processed by an application (bt2fasta) that filters all hits based on taxonomy (in this case the user-inputted NCBI taxon id of the source organism of the input sequence) and retrieves their full sequences. This results in an organism-specific database of FASTA formatted sequences consisting of the hits from the BLASTN search against dbTranscript. The unmasked input sequence is then used as input to Sim4, which in turn aligns the input sequence to the entries in the organism-specific database. Results for all analyses are then integrated into a GAME XML file for further interpretation using Apollo. The Pegasys XML DAG file that includes the parameters for all programs is available for download at http://bioinformatics.ubc.ca/pegasys/.

Figure 8. Workflow for genome annotation. This workflow executes ab initio gene prediction, tRNA detection, repeat detection, sequence similarity searching against protein and transcript databases and alignments of transcripts to genomic sequence. Results for all of these analyses are integrated into a single GAME XML output file that can be loaded into Apollo, where a user can create annotations on the original input sequence.

These use cases provide good examples of how Pegasys can be used in sequence-based bioinformatics analyses. The system itself is by no means limited to these examples. In theory any Unix program or script can be incorporated into the system, and Pegasys could be used for workflows for systems administration or other high-level scripting.

Comparison with other systems

As mentioned above, there are other systems that are similar to Pegasys in philosophy and approach. The DiscoveryNet platform [33] is a system that integrates bioinformatics tools based on Grid computing technologies. This system is a 'middleware' system that can be used to create workflows of annotation tools. Pegasys differs from the DiscoveryNet approach in two major ways. First, Pegasys provides a rigorously defined data model for storing computational features that is mapped to a relational backend database. The use case for DiscoveryNet describes output in the form of text-based flat files. Storing the data in a database allows it to be mined using SQL for selective sub-sets of computational evidence and gives users more control over what they are interpreting. Second, the Pegasys system is designed to create workflows on the fly using the GUI and XML. The DiscoveryNet genome annotation workflow was programmed, and any new workflow would also require programming investment.

DiscoveryNet uses the concept of web services and distributed computing. The architecture of Pegasys is extensible to web service based analyses. We plan on adding the capability of making remote calls to application servers and integrating their analysis results into the Pegasys framework. This would make Pegasys more flexible and extensible by combining the power of locally installed applications with remote web services.

Biopipe [31] is a framework for protocol-based bioinformatics. The protocols are developed with the goal of making results from computational analyses reproducible. This idea complements Pegasys quite well, and we envisage using Pegasys to encode protocols by creating workflow standards generated from the Pegasys GUI for specific types of analyses (e.g. genome annotation or mass spectrometry peptide fragment identification) that we can distribute to the Pegasys user community. This will facilitate cross-comparison of results from similar bioinformatics experiments performed on data sources in different research labs, or by colleagues working in the same lab.
In addition, Pegasys can be used to compare results of different protocols designed to address similar scientific problems.

Future directions

The work described in this paper has led us to consider many new challenges for future work on Pegasys. While the specifications, the data model and the software are mature enough to be used in a research setting, there remain many features and enhancements to the system that we are implementing in on-going work. We are adding new modules to Pegasys for distribution to the community. We are implementing Pegasys modules for the Infernal package that is driving the Rfam repository of families of functional RNAs [34]. Our genome annotation work to date has focused largely on eukaryotic systems, and we have therefore devoted most of our development time to applications tuned for eukaryotic animal analysis. We are adding modules for prokaryotic analysis (e.g. Glimmer [35,36]) and plants (Eugene [37]) to complement the current tools in Pegasys.

From a software perspective, we hope to make Pegasys inter-operable and compliant with additional existing Open Source bioinformatics standards and specifications, namely BioSQL and Chado, to allow data computed with Pegasys to be used in other systems that employ and interact with these specifications.

Conclusions

We have created a robust, modular, flexible software system for the execution and integration of heterogeneous biological sequence analyses. Pegasys can execute and integrate results from ab initio gene prediction, pair-wise and multiple sequence alignments, RNA gene detection and masking of repetitive sequences to greatly enhance and automate several levels of the biological sequence analysis process. The GUI allows users to create workflows of analyses by dragging and dropping icons on a canvas and joining processes together by connecting them with graphical 'edges'. Each analysis is highly configurable and users are presented with the option to change all parameters that are supported by the underlying program. Data integration is facilitated through the creation of a data model to represent computational evidence, which is in turn implemented in a robust backend relational database management system. The database API provides programmatic access to the results through high-level methods that implement SQL queries on the data. The Pegasys system is currently driving numerous diverse sequence analysis projects and can be easily configured for others. Implemented in Java, the backend of Pegasys is inter-operable with a growing number of bioinformatics tools developed in Java. Pegasys can output text files in standard formats that can then be imported into other tools for subsequent analysis or viewing. We are continually adding to Pegasys through the development of additional modules and methods of data integration. The flexibility, customization, modularity and data integration capabilities of Pegasys make it an attractive system to use in any high throughput sequence analysis endeavour. We are releasing the source code of Pegasys under the GNU General Public License with the hope that the bioinformatics community worldwide will make use of our efforts and in turn contribute improvements in the spirit of Open Source.

Availability and requirements

Pegasys is available at http://bioinformatics.ubc.ca/pegasys/ and is distributed under the GNU General Public License. Pegasys is designed to run on Unix based systems. Please consult the user manual (available with the distribution) for detailed installation and configuration instructions. The Pegasys server is written in Java and has the following dependencies: Java 1.3.1 or higher, PostgreSQL 7.3.*, JDBC driver for PostgreSQL 7.3.* and BioJava 1.2*. We have tested Pegasys on a distributed memory cluster (recommended) running OpenPBS 2.3.16 to administer the job scheduling. In theory an SMP system running OpenPBS should work, but this has not been tested. The system's analysis programs include the following: NCBI BLAST 2.2.3, WU BLAST 2.0, EMBOSS 2.7.1 (for Smith-Waterman implementation only), tRNAscan-SE 1.23, the LAGAN toolkit 1.2, Sim4, Genscan 1.0, HMMgene 1.1, MaskerAid (2001-11-08) and GeneSplicer. All of the analysis tools are freely available to academics. For details please consult the Pegasys manual available with the distribution. The server has successfully been deployed and tested on a 28 CPU Linux cluster running RedHat 7.3.

The client is written in C++ and requires the QT libraries version 3.11, and gcc version 3.2.2. The client has been tested on Linux Mandrake 9.x, Solaris 8, Mac OSX, and Windows 98/NT/ME/XP.

Authors' contributions

SS was the lead architect of the system, contributed to the design and implementation, and wrote most of this manuscript. DH was the principal developer and contributed to the design and implementation of the server and the GUI. JS contributed to the design of the project and provided requirements to the developers who were designing the system. GQ, GZ, JD, DL and TX all participated in the implementation of various components of the system. BFFO conceived of the project, guided its development, and edited this manuscript.

Acknowledgments

BFFO would like to acknowledge GenomeBC for funding this project. DL is supported by the CIHR/MSFHR Strategic Training Program in Bioinformatics http://bioinformatics.bcgsc.ca. TX is supported by CIHR grant #MOP-53259. We wish to thank Stefanie Butland, Joanne Fox and Yong Huang for critical reviews of this manuscript. We also thank Miroslav Hatas and Graeme Campbell for systems and software installation and maintenance for the Pegasys server.

References

1. Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, Durbin R, Eyras E, Gilbert J, Hammond M, Huminiecki L, Kasprzyk A, Lehvaslaiho H, Lijnzaad P, Melsopp C, Mongin E, Pettett R, Pocock M, Potter S, Rust A, Schmidt E, Searle S, Slater G, Smith J, Spooner W, Stabenau A, Stalker J, Stupka E, Ureta-Vidal A, Vastrik I, Clamp M: The Ensembl genome database project. Nucleic Acids Res 2002, 30:38-41.
2. Mungall CJ, Misra S, Berman BP, Carlson J, Frise E, Harris N, Marshall B, Shu S, Kaminker JS, Prochnik SE, Smith CD, Smith E, Tupy JL, Wiel C, Rubin GM, Lewis SE: An integrated computational pipeline and database to support whole-genome sequence annotation. Genome Biol 2002, 3(12):RESEARCH0081. Epub 2002 Dec 23. Review.
3. Meyer F, Goesmann A, McHardy A, Bartels D, Bekel T, Clausen J, Kalinowski J, Linke B, Rupp O, Giegerich R, Pühler A: GenDB – an open source genome annotation system for prokaryote genomes. Nucleic Acids Res 2003, 31(8):2187-2195.
4. Yuan Q, Ouyang S, Liu J, Suh B, Cheung F, Sultana R, Lee D, Quackenbush J, Buell C: The TIGR rice genome annotation resource: annotating the rice genome and creating resources for plant biologists. Nucleic Acids Res 2003, 31:229-233.
5. Korf I, Flicek P, Duan D, Brent MR: Integrating genomic homology into gene structure prediction. Bioinformatics 2001, 17:S140-S148. Suppl 1.
6. Mathé C, Déhais P, Pavy N, Rombauts S, Van Montagu M, Rouzé P: Gene prediction and gene classes in Arabidopsis thaliana. J Biotechnol 2000, 78(3):293-299.
7. Yeh R, Lim L, Burge C: Computational inference of homologous gene structures in the human genome. Genome Res 2001, 11(5):803-816.
8. Rogic S, Ouellette B, Mackworth A: Improving gene recognition accuracy by combining predictions from two gene-finding programs. Bioinformatics 2002, 18(8):1034-1045.
9. General Feature Format [http://www.sanger.ac.uk/Software/formats/GFF/index.shtml]
10. GAME XML DTD [http://flybase.bio.indiana.edu/annot/gamexml.dtd.txt]
11. Lewis SE, Searle SM, Harris N, Gibson M, Iyer V, Richter J, Wiel C, Bayraktaroglu L, Birney E, Crosby MA, Kaminker JS, Matthews BB, Prochnik SE, Smith CD, Tupy JL, Rubin GM, Misra S, Mungall CJ, Clamp ME: Apollo: a sequence annotation editor. Genome Biol 2002, 3(12):RESEARCH0082. Epub 2002 Dec 23. Review.
12. Altschul S, Gish W, Miller W, Myers E, Lipman D: Basic local alignment search tool. J Mol Biol 1990, 215(3):403-410.
13. Altschul S, Madden T, Schäffer A, Zhang J, Zhang Z, Miller W, Lipman D: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
14. Dowell R, Jokerst R, Day A, Eddy S, Stein L: The distributed annotation system. BMC Bioinformatics 2001, 2:7.
15. R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria 2003 [http://www.R-project.org]. [ISBN 3-900051-00-3]
16. Bedell J, Korf I, Gish W: MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics 2000, 16(11):1040-1041.
17. Gish W: WU BLAST 2.0. [http://blast.wustl.edu/blast/README.html]
18. Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 2000, 16(6):276-277.
19. Smith T, Waterman M: Identification of common molecular subsequences. J Mol Biol 1981, 147:195-197.
20. Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol 1997, 268:78-94.
21. Krogh A: Two methods for improving performance of an HMM and their application for gene finding. Proc Int Conf Intell Syst Mol Biol 1997, 5:179-186.
22. Brudno M, Do C, Cooper G, Kim M, Davydov E, Green E, Sidow A, Batzoglou S: LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res 2003, 13(4):721-731.
23. Florea L, Hartzell G, Zhang Z, Rubin G, Miller W: A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res 1998, 8(9):967-974.
24. Lowe T, Eddy S: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997, 25(5):955-964.
25. Pertea M, Lin X, Salzberg S: GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res 2001, 29(5):1185-1190.
26. Stein L, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich J, Harris T, Arva A, Lewis S: The generic genome browser: a building block for a model organism system database. Genome Res 2002, 12(10):1599-1610.
27. OpenPBS [http://www.openpbs.org]
28. Booch G: Object-oriented Analysis and Design with Applications. The Benjamin/Cummings Publishing Company; 1994.
29. Ensj [http://www.ensembl.org/java/]
30. BioJava.org [http://www.biojava.org]
31. Hoon S, Ratnapu K, Chia J, Kumarasamy B, Juguang X, Clamp M, Stabenau A, Potter S, Clarke L, Stupka E: Biopipe: a flexible framework for protocol-based bioinformatics analysis. Genome Res 2003, 13(8):1904-1915.
32. Trolltech – Qt Overview [http://www.trolltech.com/products/qt/index.html]
33. Rowe A, Kalaitzopoulos D, Osmond M, Ghanem M, Guo Y: The discovery net system for high throughput bioinformatics. Bioinformatics 2003, 19(Suppl 1):225-225.
34. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy S: Rfam: an RNA family database. Nucleic Acids Res 2003, 31:439-441.
35. Delcher A, Harmon D, Kasif S, White O, Salzberg S: Improved microbial gene identification with GLIMMER. Nucleic Acids Res 1999, 27(23):4636-4641.
36. Salzberg S, Delcher A, Kasif S, White O: Microbial gene identification using interpolated Markov models. Nucleic Acids Res 1998, 26(2):544-548.
37. Schiex T, Moisan A, Rouzé P: EUGENE: An Eukaryotic Gene Finder That Combines Several Sources of Evidence. In JOBIM 2000:111-125.
