@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Science, Faculty of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Castillo Arnemann, Javier José"@en ; dcterms:issued "2021-02-05T09:05:05Z"@en, "2021"@en ; vivo:relatedDegree "Master of Science - MSc"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description "Pseudomonas aeruginosa is a clinically-important, opportunistic pathogen that is the third leading cause of hospital infections in North America, the major cause of life-threatening chronic infections in patients with cystic fibrosis, and a major threat due to its high level of antibiotic resistance. To understand the complexity behind the adaptive behaviours of P. aeruginosa it is necessary to employ systems biology methods made possible by the ongoing revolution in high-throughput omics technologies. One powerful systems biology approach leverages existing molecular interaction databases to generate networks showing the interactions between the identified molecules. However, most existing interaction databases are focused on data for humans and other well-studied organisms; thus, there is a lack of systems biology tools to study medically-important bacterial pathogens such as P. aeruginosa. I developed the Pseudomonas aeruginosa Interaction Database, PaIntDB, to fill in this gap. It is an intuitive web-based tool for network-based systems biology analyses using protein-protein interactions (PPI). It enables the interpretation and visualization of omics studies including proteomics, RNA-Seq, and Tn-Seq. These high-throughput datasets are mapped onto PPI networks, which can be explored visually and filtered to uncover putative molecular pathways related to the conditions of study. PaIntDB employs the most comprehensive P. 
aeruginosa interactome to date, collected from a variety of resources, including interactions predicted computationally to further expand analysis capabilities. Two case studies demonstrate how PaIntDB can be used to quickly identify functional gene groups involved in growth in physiologically-relevant conditions and biofilm formation, and use these insights to derive new hypotheses about the underlying biology."@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/77248?expand=metadata"@en ; skos:note "NETWORK-BASED INTEGRATION AND VISUALIZATION OF HIGH-THROUGHPUT DATASETS IN PSEUDOMONAS AERUGINOSA by Javier José Castillo Arnemann B.Sc., Monterrey Institute of Technology and Higher Education, 2018 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Bioinformatics) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) February 2021 © Javier José Castillo Arnemann, 2021 The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled: Network-based integration and visualization of high-throughput datasets in Pseudomonas aeruginosa submitted by Javier José Castillo Arnemann in partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics. Examining Committee: Robert E. W. 
Hancock, Professor, Microbiology and Immunology, University of British Columbia, Supervisor
Tamara Munzner, Professor, Computer Science, University of British Columbia, Supervisory Committee Member
Fiona Brinkman, Professor, Molecular Biology and Biochemistry, Simon Fraser University, Supervisory Committee Member

Abstract

Pseudomonas aeruginosa is a clinically-important, opportunistic pathogen that is the third leading cause of hospital infections in North America, the major cause of life-threatening chronic infections in patients with cystic fibrosis, and a major threat due to its high level of antibiotic resistance. To understand the complexity behind the adaptive behaviours of P. aeruginosa it is necessary to employ systems biology methods made possible by the ongoing revolution in high-throughput omics technologies. One powerful systems biology approach leverages existing molecular interaction databases to generate networks showing the interactions between the identified molecules. However, most existing interaction databases are focused on data for humans and other well-studied organisms; thus, there is a lack of systems biology tools to study medically-important bacterial pathogens such as P. aeruginosa. I developed the Pseudomonas aeruginosa Interaction Database, PaIntDB, to fill in this gap. It is an intuitive web-based tool for network-based systems biology analyses using protein-protein interactions (PPI). It enables the interpretation and visualization of omics studies including proteomics, RNA-Seq, and Tn-Seq. These high-throughput datasets are mapped onto PPI networks, which can be explored visually and filtered to uncover putative molecular pathways related to the conditions of study. PaIntDB employs the most comprehensive P. aeruginosa interactome to date, collected from a variety of resources, including interactions predicted computationally to further expand analysis capabilities. 
Two case studies demonstrate how PaIntDB can be used to quickly identify functional gene groups involved in growth in physiologically-relevant conditions and biofilm formation, and use these insights to derive new hypotheses about the underlying biology.

Lay Summary

Pseudomonas aeruginosa is a leading cause of chronic and hospital infections, and a model organism for the ongoing issue of antibiotic resistance. The biology behind the adaptive lifestyle of P. aeruginosa is very complex and involves the interaction of thousands of genes. Current sequencing technologies allow the large-scale study of the entire genome, but the resulting datasets are huge and difficult to interpret for humans. The tool presented here, PaIntDB, enables the interpretation and visualization of these results as a network of interconnected genes, by mapping them onto a database of compiled protein-protein interactions. These networks can be filtered using biological and experimental information to generate smaller, manageable subnetworks that group the interacting genes according to their function. PaIntDB is available as a web application with an intuitive graphical interface, allowing researchers and non-experts to analyze their results without training in computer programming.

Preface

Dr. Robert E. W. Hancock conceived the idea of compiling the existing protein-protein and protein-metabolite data in P. aeruginosa to perform network-based analyses through a web application. Olga Solodova built the database by compiling and annotating the available P. aeruginosa interaction data. The idea to extend the application’s functionality to integrate RNA-Seq and Tn-Seq results came from Dr. Robert Hancock and Dr. Amy Lee. Corrie Belanger and Melanie Dostert were beta testers for the web application and performed the experiments that generated the data analyzed in Chapter 3. I had the idea to develop a visualization module as part of Dr. Tamara Munzner’s Information Visualization course. 
I developed the Python module to generate the networks and was responsible for designing and implementing the framework and web interface behind PaIntDB.

A version of Chapter 2 describing the design and use of PaIntDB was submitted for publication. Olga Solodova contributed the methods for creating the database, I drafted the manuscript and performed all further research, and Dr. Bhavjinder Dhillon and Dr. Robert Hancock reviewed and extended the manuscript for submission.

Castillo-Arnemann, J., Solodova, O., Dhillon, B. & Hancock, R. PaIntDB: Network-based integration and visualization of high-throughput results in Pseudomonas aeruginosa. Submitted.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Introduction
1.1 High-throughput sequencing approaches to study pathogenic bacteria
1.2 Network-based interpretation and integration of high-throughput results
1.3 Existing solutions
1.4 Proposed solution
1.5 Database creation
Chapter 2: Network-based interpretation and integration of high-throughput results with PaIntDB
2.1 Network generation
2.2 Network visualization
2.3 Subnetwork generation
Chapter 3: Applications of PaIntDB
3.1 Identification of genes important for growth in host-like conditions
3.1.1 Nucleotide metabolism
3.1.2 Iron uptake
3.2 Identification of genes involved in biofilm formation
3.2.1 Energy metabolism in biofilms
Chapter 4: Conclusion
4.1 Limitations
4.2 Future directions
Bibliography

List of Tables

Table 1. Comparison of existing software for network-based interpretation of high-throughput results in P. aeruginosa
Table 2. List of sources compiled to build the PaIntDB interaction database
Table 3. Network classes handle different input data

List of Figures

Figure 1. Database schema used to store the compiled interaction data
Figure 2. Workflow of a network-based analysis with PaIntDB
Figure 3. Graphical user interface to build PPI networks in PaIntDB
Figure 4. Graphical user interface to explore PPI networks. This is a combined network connecting RNA-Seq and Tn-Seq genes
Figure 5. Functional subnetworks generated with PaIntDB, showing genes involved in siderophore mediated iron transport
Figure 6. Subnetwork connecting differentially-expressed and essential gene products involved in nucleotide metabolism in P. aeruginosa PAO1 grown in serum
Figure 7. Genes involved in nucleotide metabolism were largely down-regulated in P. aeruginosa PAO1 grown in serum
Figure 8. Subnetwork connecting genes involved in iron uptake, regulation and pyoverdine biosynthesis in P. aeruginosa PAO1 grown in serum and treated with azithromycin
Figure 9. Subnetwork generated using the enriched GO term \"multi-organism process\" in P. aeruginosa PA14 growing as a biofilm
Figure 10. Subnetwork of genes involved in respiratory metabolism in P. aeruginosa PA14 biofilm

List of Abbreviations

cDNA: Complementary DNA
CSV: Comma-separated values
GO: Gene ontology
mRNA: messenger RNA
PPI: Protein-protein interactions
PCSF: Prize-collecting Steiner forest
RNA-Seq: RNA sequencing
T3SS: Type III secretion system
Tn-Seq: Transposon sequencing

Acknowledgements

I thank my supervisor, Dr. Robert E. W. Hancock, for his constant academic and personal support throughout my degree. I would also like to thank my committee, Dr. Tamara Munzner and Dr. Fiona Brinkman, for their valuable suggestions to improve my project. I want to thank all members of the Hancock laboratory for the great conversations and for providing a great space to work and grow as a researcher. In particular, thanks to Susan Farmer for keeping the lights on and spoiling us with food at any possible chance. I thank Dr. Amy Lee, who was a great source of guidance and mentorship. 
I also thank Corrie Belanger and Melanie Dostert, who were a huge help in testing the web application and interpreting the networks presented in Chapter 3.

Finally, I thank the Canadian Institutes of Health Research and Mitacs, who provided the funding for this research.

Dedication

To the universe, and all the cool beings that inhabit it.

Chapter 1: Introduction

Pseudomonas aeruginosa is a Gram-negative, opportunistic pathogen involved in cystic fibrosis, sepsis, burn infections and pneumonia, and is a leading cause of nosocomial infections. It is difficult to treat due to its innate, mutational, acquired and adaptive antibiotic resistance and has a plethora of virulence factors.1–3 P. aeruginosa also possesses a range of adaptive mechanisms including biofilm formation on surfaces,4,5 swarming and surfing motility,6,7 and quorum sensing8 that allow it to survive a wide variety of circumstances, particularly contributing to infections. These adaptive mechanisms are dependent on hundreds of genes and the altered expression of hundreds to thousands of gene products,7,9,10 making a systems biology approach necessary for comprehensively understanding the underlying biological processes, and providing the knowledgebase for devising potential new treatments against infection.

With the big-picture perspective provided by access to data-intensive genome-wide methods, the focus can shift from the role of individual genes to the interaction of groups of genes. This approach is possible due to the ongoing revolution in high-throughput technologies that enable the screening of the entire set of molecules of a certain type in a cell, namely omics studies. Transcriptomics,3,11–13 proteomics,14–16 and metabolomics11,17 represent high-throughput omics methods that have become crucial to understanding the signalling, regulatory and metabolic pathways that confer on P. aeruginosa its remarkable adaptive traits. 
1.1 High-throughput sequencing approaches to study pathogenic bacteria

The increased throughput and reduced costs of nucleic acid sequencing technologies have resulted in new applications for the study of pathogenic bacteria. RNA-Seq, which comprehensively analyzes the transcriptome, has become particularly valuable for identifying important genes and pathways involved in pathogenesis and antibiotic resistance.18 It has superseded microarray hybridization experiments due to its more accurate quantification of gene expression levels, scalability, lack of requirement for background subtraction and single nucleotide resolution. For this method, bacteria are grown under the conditions of interest, then mRNA transcripts are extracted and reverse-transcribed into cDNAs, which are then sequenced and quantified, resulting in a snapshot of the whole transcriptome. Finally, the expression data is analyzed to find genes with statistically significant changes across the studied conditions.19–21

Another recently developed methodology, Tn-Seq, involves the use of a promiscuous transposon, with many possible insertion sites, to create a library of mutant bacteria that each contain single transposon insertions into a particular gene. The library, containing hundreds of thousands of individual mutants, is then grown under varying treatments or conditions as specified in the study. Since the transposon insertions can induce loss-of-function in their target gene, some mutants will not be able to grow under the treatment. By sequencing the genomes of the initial and final populations, the relative quantification of each mutant is used to identify genes that are phenotypically essential for growth under the conditions employed.22 It is still unclear how changes in transcript abundance correlate with the phenotypic importance of a gene, making RNA-Seq and Tn-Seq complementary approaches.23

High-throughput RNA-Seq studies often return hundreds, if not thousands, of significantly differentially expressed genes. 
This makes it challenging to interpret the results in a biologically meaningful fashion that enables researchers to devise further experiments targeting specific operons or pathways. Usually the first step is to perform functional enrichment, where the significant genes are assigned to different sets and a statistical overrepresentation analysis such as Fisher’s exact test is applied to determine which sets include more genes than expected by chance. Many variations of this method exist employing different statistical approaches and tests,24,25 but the result is mostly the same: a list of enriched GO terms or pathways that summarize the observed changes in expression. The sets include functional annotations such as Gene Ontology (GO)26,27 or biological pathways from databases such as KEGG,28 MetaCyc29 or Reactome.30

From this point on, a biologist has to manually examine the enriched list and search the literature to find specific genes or pathways relevant to their studies. To streamline this time-consuming process, I proposed in this thesis to develop a network-based method of integrating, visualizing and filtering these large datasets to aid microbiologists in the generation of new hypotheses for further experiments.

1.2 Network-based interpretation and integration of high-throughput results

The flat gene lists returned by high-throughput experiments obviously paint an incomplete picture of the biology behind the observed changes. Genes do not act discretely or in isolation, but rather within the context of complex biochemical pathways with multi-level systems of regulation.31 Since all biological pathways involve enzymes and other molecules interacting with one another, modelling them as networks makes intuitive sense. In this view, nodes represent proteins (or any other biological molecule) and edges represent their interaction in the cell, including direct physical, metabolic or regulatory interactions. 
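
To make the node-and-edge view concrete, the following Python sketch stores a handful of interactions as an undirected network. It is only an illustration under stated assumptions: the protein labels, the record layout and the function name are hypothetical and are not taken from PaIntDB.

```python
from collections import defaultdict

# Hypothetical interaction records: (protein A, protein B, interaction type).
interactions = [
    ('protein_1', 'protein_2', 'physical'),
    ('protein_2', 'protein_3', 'regulatory'),
    ('protein_2', 'protein_4', 'metabolic'),
]

def build_network(records):
    # Map each node to {neighbour: interaction type}.
    # Edges are stored in both directions because they are undirected.
    network = defaultdict(dict)
    for a, b, kind in records:
        network[a][b] = kind
        network[b][a] = kind
    return network

network = build_network(interactions)

# All interaction partners of one protein, in a stable order.
partners = sorted(network['protein_2'])
```

A dictionary of dictionaries keeps neighbour lookups cheap and leaves room for one attribute per edge, which is enough for the kind of exploration described in this section.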
The hypothesis is that if two proteins or biomolecules interact within a cell they are involved in common biological purposes or events. Such interactions are termed protein-protein interactions (PPI) and for mammalian cells have been captured in the database InnateDB developed by our lab in collaboration.32

PPI are not binary, but rather a single protein or biomolecule can interact with many other biomolecules. This enables PPI to be visualized as networks that inherently capture relationships between the interacting molecules, and group them in a manner relevant to their overlapping functions, capturing pathways and common functional roles. Highly-connected nodes are termed hubs and are considered to have significant biological meaning, since they are thought to function by receiving and disbursing signals. Such hubs can be readily identified and represent potential drug targets and functional biomarkers, since their disruption would have the potential to significantly alter the bacterium’s survival prospects.

Network-based methods also enable the intuitive integration of different omics types, and can deliver similar information to data-driven complex multivariate statistical approaches.33,34 PPI are particularly useful to integrate RNA-Seq and Tn-Seq results, since genes can be easily mapped to proteins.

The generation of these networks relies on existing interaction databases that have been developed to organize, standardize and distribute the massive amounts of data resulting from high-throughput screenings.32,35,36 Systems biology studies often mine these databases to generate networks involving their genes of interest.13,37,38 This approach enables the contextualization of the experimental results with existing curated information about the identified molecules, combined with the advantages of the network model showing their interactions.

1.3 Existing solutions

Network-based methods are a multi-step process and often require multiple tools. 
Their application is not as common in bacterial studies due to the relative lack of interaction data when compared to humans and other well-studied organisms. Accordingly, the majority of existing software for network-based analyses is not suitable for bacteria.39–41 PPI networks for bacteria can be created using STRING,36 but the visualization features are very basic, and the networks are restricted to tens of proteins. In addition, STRING interactions are not manually curated and include text-mined interactions, making them unreliable for large-scale analyses. The available tools that can be used for large-scale PPI network analysis in P. aeruginosa are summarized in Table 1.

Table 1. Comparison of existing software for network-based interpretation of high-throughput results in P. aeruginosa.

Feature | PaIntDB | NetworkAnalyst | OmicsIntegrator | Cytoscape
Input | Gene list with optional expression data | Raw RNA-Seq counts or gene list | Gene list(s) and interactome | Gene list
Source of P. aeruginosa PPI data | Compiled from other databases and studies | STRING or Zhang42 | Uploaded by the user | STRING, KEGG or Reactome, with plugins
PPI Network Generation | Yes | Yes | Yes | Yes, with plugin
Network Visualization | Yes | Yes | No | Yes
Enrichment Analysis | Yes | Yes | No | Yes, with plugin
Biologically-informed node selection | Yes | Yes | No | Yes
Subnetwork Generation | Yes | No | Yes | No

OmicsIntegrator43 takes one or more lists of genes and an interactome as input, then returns a subnetwork of the genes that best link the data, using the Prize-Collecting Steiner Forest (PCSF) algorithm. It is not user-friendly since it is a command line application, the interactome and gene list(s) must be pre-processed by the user, and it requires external software to visualize the resulting networks.

Cytoscape44 is a well-known, community-driven, open source software for visualization and analysis of biological networks. 
It has hundreds of plugins for different network analyses, including GO term enrichment45 and generating PPI networks using data from STRING46 or other interaction databases. However, Cytoscape needs to be downloaded and installed, has a steep learning curve and needs a specific combination of plugins that must be installed individually depending on the analysis.

NetworkAnalyst47 contains a suite of advanced network analysis and visualization tools and is available online. By default, it employs manually-curated interactions from InnateDB to generate PPI networks out of gene lists or expression data. However, since InnateDB does not include bacterial data, for P. aeruginosa NetworkAnalyst uses either STRING interactions or a computationally-predicted interactome.42

A common problem in network visualization is the ‘hairball’ issue that occurs when networks get bigger and the overabundance of nodes and edges obscures the underlying structure. To handle this issue, NetworkAnalyst can prune the networks according to topological features. Topological methods are blind to the underlying biology and can therefore return networks that, while smaller and easier to visualize, might lose genes of functional importance. The user can also select nodes using enriched pathways or topological communities, but cannot create subnetworks using this selection. NetworkAnalyst also supports 3D visualization to aid the exploration of dense networks, but this approach has disadvantages including occlusion of some nodes, the inability to export it as a figure and the unfamiliarity of this representation for untrained users.

1.4 Proposed solution

In this thesis, I have developed a PPI network-based method to visually explore, interpret and integrate high-throughput results in P. aeruginosa, with particular attention to RNA-Seq and Tn-Seq. This methodology is implemented as a web-based tool called PaIntDB, which stands for Pseudomonas aeruginosa Interactions DataBase. 
PaIntDB has three components: an SQLite database containing the interaction data, a Python module to generate the PPI networks, and a web application with a graphical interface to visualize, filter and export the networks.

PaIntDB was designed to (1) compile the available interaction data in P. aeruginosa, (2) make use of this dataset to generate networks from high-throughput omics results, (3) enable the visualization and biologically-informed filtering of the networks, and (4) combine sophisticated analysis with user-friendliness such that microbiologists without programming experience can explore the relationships implicated by their high-throughput omics experiments. Chapter 2 describes the implementation and workflow of PaIntDB, while Chapter 3 demonstrates its use to identify genes related to biofilm formation and growth in host-like media.

1.5 Database creation

The first stage in this project was the creation of the interaction database itself. This was done in the lab by an undergraduate student, Olga Solodova. It was built by compiling all existing P. aeruginosa interaction data in the literature. To do this, protein identifiers, names, cellular locations and related gene ontology (GO) terms for P. aeruginosa PAO1 and PA14 proteins were collected from the Pseudomonas Genome Database48 to begin populating the PaIntDB database. Then, interaction data for P. aeruginosa strains PAO1 and PA14 was collected from existing interaction databases and individual studies. Interaction data was downloaded programmatically where possible, or from provided flat files. These interactions were then input into the database, ensuring annotations met all features according to the PSI-MI 2.5 standard.49 Figure 1 shows a diagram of the database schema.

Given that the collected interaction data for P. aeruginosa strains did not provide a complete picture of the interactome, additional interactions were predicted from orthologous mappings. This relied on the concept that if two proteins in P. 
aeruginosa have strong orthologs that are known to interact in another species, the likelihood is that these proteins also interact in P. aeruginosa, a concept previously introduced in the POINT (Prediction of Interactome) database.50 Thus 5,364 orthologous genes were identified between P. aeruginosa PAO1 and PA14 using OrtholugeDB51 based on widely-accepted reciprocal best-BLAST hit criteria. Experimentally-verified interactions in strain PAO1 were then predicted to exist between orthologous proteins in strain PA14, and vice-versa, and were annotated as such in PaIntDB.

Furthermore, since PPI data is limited for Pseudomonas species, additional interactions were also derived from the well-studied Escherichia coli K12 strain MG1655. In this case, orthologs between this E. coli strain and P. aeruginosa PAO1 (1,841 orthologs) and PA14 (1,776 orthologs) were also collected using OrtholugeDB and interactions predicted to exist between orthologous proteins if they are interactors in E. coli. Overall, these predicted interactions, annotated as such, resulted in the addition of 122,501 interactions, for a total of 157,427 interactions. The complete list of sources for PPI data is described below as Table 2.

Figure 1. Database schema used to store the compiled interaction data. Most table columns correspond to the required PSI-MI 2.5 fields.

Table 2. List of sources compiled to build the PaIntDB interaction database. P-P means protein-protein, P-M means protein-metabolite. The number of interactions for each category is included in parentheses.

Source | No. of Interactions | Organism | Interaction Type | Verified experimentally
Agile Protein Interaction DataAnalyzer (APID)52 | 63 | Pseudomonas aeruginosa (Pa) PAO1 | P-P | Yes
BindingDB53 | 2 | Escherichia coli (E. coli) | P-M | Yes
Database of Interacting Proteins (DIP)54 | 7,305 | E. coli | P-P | Yes (6,170), No (1,135)
EcoCyc55 | 10,512 | E. coli | P-P (2,395), P-M (8,117) | No
Galán-Vázquez31 | 1,560 | Pa PAO1 (1,513), Pa PA14 (47) | P-P | No
International Molecular Exchange (IMEx)56 | 15,230 | E. coli (15,135), Pa PAO1 (88), Pa PA14 (7) | P-P (14,870), P-M (360) | Yes (15,182), No (48)
IntAct57 | 39,082 | E. coli (38,980), Pa PAO1 (95), Pa PA14 (7) | P-P (38,384), P-M (698) | Yes (38,988), No (94)
Interaction Reference Index (iRefIndex)58 | 23,099 | E. coli (22,988), Pa PAO1 (104), Pa PA14 (7) | P-P | Yes (17,037), No (6,062)
Kyoto Encyclopedia of Genes and Genomes (KEGG)28 | 34,543 | Pa PA14 (14,849), Pa PAO1 (14,724), E. coli (4,970) | P-P (19,357), P-M (15,186) | No
mentha59 | 31,247 | E. coli (31,225), Pa PAO1 (21), Pa PA14 (1) | P-P | Yes (25,785), No (5,462)
The Molecular Interaction Database (MINT)60 | 327 | E. coli (312), Pa PAO1 (15) | P-P (319), P-M (8) | Yes
Microbial Protein Interaction Database (MPIDB)61 | 336 | E. coli (289), Pa PAO1 (47) | P-P (330), P-M (6) | Yes
RegulonDB62 | 1,845 | E. coli | P-P | No
UniProt63 | 23 | E. coli | P-P | Yes
XLinkDB64 | 297 | Pa PAO1 | P-P | Yes
Zhang42 | 38,439 | Pa PAO1 | P-P | No

Chapter 2: Network-based interpretation and integration of high-throughput results with PaIntDB

PaIntDB is available as a web application (https://www.paintdb.ca) accessible through any modern web browser. Since user-friendliness was a major goal, a full analysis can be performed using only its graphical user interface. It was built using the open source Dash framework for Python, including the Cytoscape.js65 API that is part of the framework to draw the networks. The steps to perform a network-based analysis with PaIntDB are summarized in Figure 2 and detailed in this Chapter.

Figure 2. Workflow of a network-based analysis with PaIntDB. The user uploads a list of genes and their interactions are mapped to build a network. 
This network can be explored visually within PaIntDB, filtered to find functional subnetworks, or exported for further analyses in NetworkAnalyst or Cytoscape.

2.1 Network generation

PaIntDB has three possible inputs: (i) a list of genes/proteins of interest derived from any high-throughput experiment, (ii) a list of differentially-expressed genes/proteins with associated expression and significance values, or (iii) a combination of differentially-expressed genes with a list of genes identified through Tn-Seq (Fig. 3A). To handle these different inputs, I created three network classes with specific attributes (Table 3). The gene identifiers employed are locus tags since these are used in high-throughput experiments.

After the data is uploaded, the user can select the P. aeruginosa strain to work with (PAO1 or PA14), the network order and the interaction detection method (Fig. 3B). When hovering over the question mark next to a parameter in the graphical interface, a short text pops up to explain its functionality.

Figure 3. Graphical user interface to build PPI networks in PaIntDB. (A) Input Selection. (B) Parameters to generate the network. (C) Selection of genes for GO term enrichment.

Table 3. Network classes handle different input data. The classes are hierarchical: BioNetwork is the parent class of DENetwork, which is the parent class of CombinedNetwork. Therefore, the attributes are cumulative. 
The significance source indicates the experiment in which the gene was identified.

Input | Network Class | Attributes
Gene list | BioNetwork | Gene name; Gene description; Cellular location
Differentially-expressed (DE) genes list | DENetwork | Log2 fold change; Adjusted p-value
DE genes list + Tn-Seq essential genes list | CombinedNetwork | Significance source

For the network order parameter, the user can select between a zero-order network, which only connects any input genes that interact directly, or a first-order network, which uses the input genes as ‘seeds’ then finds other interacting genes in the database that were not part of the original list to connect them. First-order networks are useful when the input gene list is short, since a zero-order network would result in many orphaned nodes.

With the detection method parameter, the user can select between using all of the interactions in the database, or only using the interactions that have been verified experimentally. Selecting experimental interactions results in a smaller, higher-confidence network, but biases the analysis towards well-studied genes and may result in the loss of many genes that are relevant to the studied conditions. After the parameters are selected, PaIntDB will map the interactions between the input genes using the relevant PPI data. The generated network consists of nodes representing proteins and edges representing their biophysical, biochemical or regulatory interactions.

To aid in the interpretation of the network, functional gene ontology (GO) term enrichment is performed using the GOATOOLS library [66], using GO terms obtained from the Pseudomonas Genome Database [48]. The user has the option to perform the enrichment with either all of the input genes or only the genes that were mapped to the network (Fig. 3C). Fisher’s exact test is employed with the usual 0.05 p-value cutoff to determine significant terms.
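The enrichment test can be illustrated with a minimal, standard-library Python sketch. It is not the GOATOOLS implementation: it computes a one-sided Fisher's exact test (the hypergeometric upper tail) for a single GO term, and the function name and the counts in the usage example are hypothetical.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, pop_hits, pop_size):
    # One-sided Fisher's exact test for over-representation.
    # study_hits: study genes annotated with the GO term
    # study_size: genes in the study (input) set
    # pop_hits:   background genes annotated with the GO term
    # pop_size:   genes in the background population
    # Returns P(X >= study_hits) under the hypergeometric distribution.
    max_hits = min(study_size, pop_hits)
    total = comb(pop_size, study_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 8 of 50 study genes carry a term found in 40 of 5000 genes.
p = enrichment_p_value(8, 50, 40, 5000)
is_significant = p < 0.05  # the cutoff described in the text
```

A term is reported as enriched when its p-value falls below the cutoff; the sketch applies no multiple-testing correction.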
These enriched GO terms can be used to filter the nodes when visualizing the network, as explained in section 2.3. The full enrichment results can be downloaded as a CSV file.

2.2 Network visualization

PaIntDB has a graphical interface to explore the generated networks interactively. Its main goal is to allow the quick and easy identification of interesting groups of genes involved in the conditions of study. To do this, the user can zoom, pan and select nodes with the mouse (Fig. 4C). Gene names, descriptions, and experimental information, if included, of the selected nodes are shown in a table that can be downloaded as a CSV file (Fig. 4D).

The nodes are positioned using the neato layout from the Graphviz graph drawing suite. This layout algorithm models the network as a physical system in which nodes are joined by virtual springs whose ideal length is proportional to the shortest path between them. The system is solved iteratively to place the nodes in a low-energy configuration [67].

Node size is mapped to node degree so that hub proteins, which are highly connected nodes, can be identified quickly. If differential expression data is included, the node color indicates up- or down-regulation, enabling the identification of co-regulated gene modules in the network. If Tn-Seq genes are included, the color mapping can be changed to indicate the experiment in which the genes were identified (Fig. 4A).

When the networks contain more than 1,000 nodes, the well-known hairball problem appears, where the over-abundance of nodes and edges makes visual inspection difficult and the underlying network structure is lost. Even with an ideal layout, extracting patterns from hundreds of on-screen nodes is still a challenging task. To tackle this issue, I implemented filters that take advantage of the user’s prior knowledge to generate smaller, biologically-relevant subnetworks.
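The difference between the two network orders from section 2.1, and the degree-based node sizing described above, can be sketched in plain Python. This is an illustration under stated assumptions rather than PaIntDB's code: the function names, the toy interactome, and the reading of a first-order network as the seeds plus their direct interactors are all hypothetical.

```python
from collections import Counter


def zero_order(interactions, seeds):
    # Keep only interactions whose two partners are both input genes.
    return [(a, b) for a, b in interactions if a in seeds and b in seeds]


def first_order(interactions, seeds):
    # Keep interactions that touch at least one input gene, so direct
    # neighbours from the database are pulled in as connectors.
    return [(a, b) for a, b in interactions if a in seeds or b in seeds]


def node_degrees(edges):
    # Degree = number of edges incident to each node; usable as node size.
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


# Toy interactome with hypothetical identifiers.
interactome = [('g1', 'g2'), ('g2', 'x1'), ('x1', 'g3'), ('x2', 'x3')]
seeds = {'g1', 'g2', 'g3'}

zero = zero_order(interactome, seeds)
first = first_order(interactome, seeds)
hub, hub_degree = node_degrees(first).most_common(1)[0]
```

In the toy data, the zero-order network leaves g3 unconnected, while the first-order network links it through x1, mirroring the orphaned-node argument in section 2.1.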
2.3 Subnetwork generation

Nodes can be selected according to their cellular location or the enriched GO terms containing them (Fig. 4B). If differential expression data is included, then it is possible to select up-regulated or down-regulated genes, and if a Tn-Seq dataset is included, there is another filter to select genes according to the experiment in which they were identified. All filters can be combined to fine-tune the selected nodes as desired, for example: “find all up-regulated genes identified through both Tn-Seq and RNA-Seq associated with DNA Repair and located in the cytoplasm”. Individual genes of interest can also be added by name or locus tag to the query.

Figure 4. Graphical user interface to explore PPI networks. This is a combined network connecting RNA-Seq and Tn-Seq genes. (A) Visual options. (B) Filters used to select specific nodes to extract subnetworks. These are generated dynamically depending on the network type. (C) Network view. Nodes can be selected and moved with the mouse. (D) Table showing details about the selected nodes, obtained either through the filters or with the mouse.

The next step is creating a subnetwork connecting the selected nodes using the smallest number of additional nodes possible; this involves the use of the Prize-collecting Steiner Forest (PCSF) algorithm, using the implementation included in OmicsIntegrator [43,68]. This algorithm assigns a weight to each selected node, called a prize, and to each edge, called a cost. It then finds the subnetwork that maximizes the prizes and minimizes the costs. PCSF uses the whole interactome as background, so it can identify genes, called Steiner nodes, that were not included in the original list to connect the existing ones. The user can choose whether to include these additional nodes, and also whether to include low-confidence edges in the solution (Fig. 5).
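The trade-off that the algorithm optimizes can be made concrete with a small scoring function. The sketch below is not the OmicsIntegrator implementation and does not solve the optimization; it only evaluates a simplified objective (collected prizes minus edge costs) for candidate subnetworks, using hypothetical gene names and prize values.

```python
def subnetwork_score(nodes, edges, prizes, costs):
    # Simplified prize-collecting objective: reward included prize nodes,
    # penalize every edge used to connect them.
    collected = sum(prizes.get(n, 0.0) for n in nodes)
    spent = sum(costs[frozenset(e)] for e in edges)
    return collected - spent


# Hypothetical prizes for selected genes; the connector has no prize.
prizes = {'geneA': 2.0, 'geneB': 1.5}
# Hypothetical edge costs; lower cost means higher confidence.
costs = {
    frozenset(('geneA', 'connector')): 0.5,
    frozenset(('connector', 'geneB')): 0.5,
}

# Candidate 1: keep only geneA, pay for no edges.
alone = subnetwork_score({'geneA'}, [], prizes, costs)
# Candidate 2: add a Steiner node to reach geneB as well.
linked = subnetwork_score(
    {'geneA', 'connector', 'geneB'},
    [('geneA', 'connector'), ('connector', 'geneB')],
    prizes,
    costs,
)
```

Here the connected candidate scores higher because geneB's prize outweighs the cost of the two extra edges, which is the situation in which a Steiner node is worth including.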
This filtering is independent from the network order parameter, thus subnetworks can be extracted from both zero-order and first-order networks.

For differential expression data, the prize is set to the gene’s absolute fold change to prioritize genes with large changes in expression. Tn-Seq genes, if included, are assigned the maximum prize, since it is assumed that they are of particular importance to the conditions of study. If no experimental data is included, then all genes are assigned the same prize. The interactions are assigned a cost of 0.5 if they are verified experimentally and a cost of 1 if they are not, since the higher the edge cost, the less likely it is to be included in the subnetwork.

Running the algorithm with this modelling returns a high-confidence subnetwork that best links the selected nodes. This approach, when combined with the versatile filtering capabilities, allows the user to quickly generate smaller, manageable networks related to specific biological functions of interest, as shown in Chapter 3. Any network or subnetwork can be exported as a .graphml file if the user chooses to perform topological analyses or customize its visual appearance using NetworkAnalyst or Cytoscape.

Figure 5. Functional subnetworks generated with PaIntDB, showing genes involved in siderophore-mediated iron transport. The user has the option to include low-confidence interactions or Steiner nodes in the subnetwork. Steiner nodes are genes identified with the Prize-Collecting Steiner Forest algorithm that are not included in the original data, but help connect other genes.
In this case, the addition of fur, a global regulator of responses to limiting iron, indicated with an arrow, connects hmuV and PA14_62350 to the rest of the network, and tightens the network by providing a point of connectivity for what would otherwise be distant nodes.

Chapter 3: Applications of PaIntDB

PaIntDB enables the rapid identification of functionally-related genes, turning spreadsheets with thousands of rows into digestible PPI subnetworks within minutes. These subnetworks show gene relationships within and across metabolic pathways, thus aiding in the generation of new hypotheses about the underlying biology. This Chapter demonstrates the integration of RNA-Seq and Tn-Seq results to identify and characterize the behaviour of key actors in host-like media growth and biofilm formation in P. aeruginosa. Such integration has not been performed previously. The experiments that generated these datasets were performed by PhD students Corrie Belanger and Melanie Dostert.

3.1 Identification of genes important for growth in host-like conditions

Antibiotic susceptibility assays traditionally use Mueller Hinton broth (MHB) as a growth medium. MHB is nutrient-rich and does not reflect the in vivo conditions where bacteria grow during an infection. Since the bacterium’s metabolism will change depending on the nutrient composition of the medium, there is a need to develop assays using host-like media to make the conclusions more applicable to actual infections. Human blood is a low-nutrient environment, so bacteria must synthesize many cofactors and metabolic intermediates that are readily available in MHB. Iron, an essential nutrient for P. aeruginosa, is sequestered by transferrin in the blood and thus not bioavailable.
The scarcity of iron and nucleotide precursors seems to be an important limiting factor for bacterial growth in blood [69,70]. Moreover, recent studies have shown that antibiotic susceptibilities change when using physiologically-relevant media [71–73].

The Hancock Lab is characterizing the behaviour of P. aeruginosa in host-like conditions to find novel activities of known antibiotics, where azithromycin in particular has shown enhanced antimicrobial activity under such conditions [73]. For this study, the host-like medium was RPMI, an enriched medium widely used for eukaryotic cell cultures, with the addition of human serum, to mimic wound exudate from infections or human blood. P. aeruginosa PAO1 was grown in RPMI with serum, and MHB was used as a control medium. Then, differentially-expressed genes were identified through RNA-Seq, with and without azithromycin treatment. Similarly, essential genes for the same growth and treatment conditions were identified using Tn-Seq.

3.1.1 Nucleotide metabolism

To find altered pathways involved in growth in serum, I created a zero-order network integrating 3,113 differentially-expressed genes from RNA-Seq with the corresponding 169 essential genes identified through Tn-Seq. The resulting network contained 2,067 nodes, so it was necessary to use the filters to obtain manageable subnetworks for analysis. To explore the regulatory changes in nucleotide synthesis in serum, I made a subnetwork using the enriched GO term “nucleotide metabolic process”, resulting in a smaller network with 74 genes.

Five genes participating in purine and pyrimidine metabolism (purEFL, pyrDC) were found to be essential and differentially-expressed, supporting the claims of the importance of nucleotide synthesis for growth in serum (Fig. 6). PA3505, a putative L-aspartate dehydrogenase [74], was essential but not differentially-expressed. Interestingly, most of the genes in the subnetwork were down-regulated (Fig. 7).
Since nucleotide precursors are lacking in serum, this could reflect the conservation of resources by the bacterium by reducing the production of the enzymes that metabolize them. Additional experiments would be needed to test this hypothesis.

Figure 6. Subnetwork connecting differentially-expressed (RNA-Seq) and essential gene products (Tn-Seq) involved in nucleotide metabolism in P. aeruginosa PAO1 grown in serum. Distinct network regions grouped genes involved in the (A) adenosine triphosphate (ATP) synthesis, (B) purine synthesis and (C) pyrimidine synthesis pathways.

Figure 7. Genes involved in nucleotide metabolism were largely down-regulated in P. aeruginosa PAO1 grown in serum.

3.1.2 Iron uptake

I used a similar approach to identify genes involved in azithromycin susceptibility in serum, creating a zero-order network combining 2,206 differentially-expressed genes with 130 essential genes. The resulting network contained 1,447 genes. In this case, we were interested in the regulatory changes induced by the low iron availability in serum and the azithromycin treatment, so I generated a subnetwork using the enriched GO terms “iron ion transport”, “iron import into cell” and “pyoverdine metabolic process”. Pyoverdines are siderophore molecules produced by P. aeruginosa and other pseudomonads to chelate and assimilate iron from the surrounding environment and are important virulence factors [75].

The subnetwork contained 10 small, disconnected components, so I included Steiner nodes to connect them into a larger network (Fig. 8). Generally, iron uptake and pyoverdine synthesis genes were up-regulated, presumably so the bacterium can extract as much iron as possible from the low-iron environment. Two sigma factors, FoxI and PA4896, were essential and differentially-expressed (Fig. 8 inset). Sigma factors are specific transcription factors and components of RNA polymerase exclusive to bacteria.
FoxI regulates the expression of FoxA [76], a receptor of ferrioxamine, another iron chelating agent. PA4896 regulates genes participating in the synthesis of pyocins [77], which are toxins active against closely-related strains. However, it is unclear how these specific genes relate to the azithromycin treatment, and additional experiments would be needed to find any specific connection.

Figure 8. Subnetwork connecting genes involved in iron uptake, regulation and pyoverdine biosynthesis in P. aeruginosa PAO1 grown in serum and treated with azithromycin. Most genes are up-regulated in this low-iron environment. Inset: foxI and PA4896, both sigma factors, are essential and differentially-expressed under the azithromycin treatment in serum. Steiner nodes (in gray) were added to connect the independent components.

3.2 Identification of genes involved in biofilm formation

P. aeruginosa is a leading cause of chronic infections due to its ability to form biofilms on medical devices and mucosal surfaces, where the bacterial cells lose motility, aggregate, and secrete an extracellular matrix that potentiates antibiotic resistance and protects them against the host immune system [78,79]. To identify and characterize genes involved in this process, P. aeruginosa PA14 was grown in planktonic and biofilm conditions and differentially-expressed genes were identified using RNA-Seq. Similarly, the transposon mutant libraries were grown in the same conditions to identify essential genes through Tn-Seq. For biofilms relative to planktonic cells, 1,302 differentially-expressed genes and 129 essential genes were identified. Once again, I integrated these datasets by building a zero-order combined network containing 729 nodes.

To filter this network, I selected the genes associated with the enriched GO term “multi-organism process”.
This GO term is defined as \"a biological process which involves another organism of the same or different species\", thus biofilms are by definition a multi-organism process. Pel is one of the exopolysaccharides that forms the extracellular matrix in P. aeruginosa biofilms [80], and predictably the genes that encode its biosynthesis were up-regulated (Fig. 9A). Genes encoding the Type III secretion system (T3SS) were included in this network since this system involves the interaction of the bacteria with the host cells, also making it a multi-organism process. The T3SS creates a molecular needle that penetrates the host cell membrane and secretes toxins directly into the cytoplasm during infection [81]. Biofilms are typically associated with chronic infections, so virulence systems such as the T3SS are not as necessary and thus down-regulated (Fig. 9B). However, the gene encoding one of the main components of the T3SS, pscF, was identified as essential for biofilm growth. Another essential gene, xcpU, is part of the Type II secretion system and has been previously shown to be essential for swarming motility [82] and for growth in airway mucus where cystic fibrosis chronic infections take place [83].

3.2.1 Energy metabolism in biofilms

Finally, we were interested in characterizing energy metabolism in P. aeruginosa biofilms. P. aeruginosa has versatile energy metabolism, allowing it to thrive in aerobic, microaerobic and anaerobic environments [84]. Biofilms are large three-dimensional structures, and due to their size, a decreasing gradient of nutrients and oxygen is created from the periphery to the center of the biofilm. Thus, diverse microenvironments exist within P. aeruginosa biofilms, and the bacteria must adapt their metabolism accordingly.

Figure 9. Subnetwork generated using the enriched GO term \"multi-organism process\" in P. aeruginosa PA14 growing as a biofilm. (A) Pel synthesis genes are up-regulated. (B) Genes in the type III secretion system are down-regulated.
XcpU is a component of the Type II secretion system. The genes pscF and xcpU are essential.

To examine this process, I generated a subnetwork using the enriched GO term “electron transfer activity” (Fig. 10). This subnetwork shows the up-regulation of genes involved in aerobic (sdhBCD, nuoGE, PA14_57570, -60, -40) and anaerobic respiratory chains (nor, nir, nar, nos, PA14_06790) and up-regulation of terminal oxidases with a range of affinities to oxygen (cioA, coxAB-coIII, PA14_10500, PA14_44340, -50, -60, -70, PA14_40000). The up-regulation of these distinct energy metabolism pathways supports the existence of metabolically-diverse subpopulations in biofilms.

Figure 10. Subnetwork of genes involved in respiratory metabolism in P. aeruginosa PA14 biofilm. Genes participating in both aerobic and anaerobic metabolism are upregulated, as well as terminal oxidases with varying affinities to oxygen.

Chapter 4: Conclusion

The network-based approach presented here enables the rapid interpretation, integration and visualization of large high-throughput datasets with thousands of significant genes. PaIntDB combines network analysis features that usually require more than one tool, and because it is implemented as a web application with an intuitive graphical interface, these features are available for users without any special training or a computational background.

The filtering system deals with the hairball issue in large networks more effectively than the existing alternatives, combining and expanding the biological selection feature found in NetworkAnalyst with the Prize-collecting Steiner Forest algorithm used in OmicsIntegrator to generate high-confidence, biologically-relevant subnetworks.
When there are many separate components, the addition of Steiner nodes can connect them to contextualize the different pathways within a larger network.

PaIntDB employs the most comprehensive Pseudomonas aeruginosa interactome to date. Following in the footsteps of the popular InnateDB [32], which contains curated interactions in the innate immune response, PaIntDB is the first step towards developing a similar knowledgebase of molecular interactions in a pathogenic bacterium. I demonstrated the use of PaIntDB to characterize the behaviour of functional gene groups involved in growth under physiologically-relevant conditions and during biofilm formation. I showed how the subnetworks naturally group the genes according to their pathways or operons, thus aiding in the generation of new hypotheses about the underlying biology behind the observed changes.

4.1 Limitations

The main caveat to this network-based approach is the limited availability of PPI data for bacteria. Overall, 3,829 proteins in strain PAO1 and 3,722 proteins in strain PA14 have at least one interaction recorded in PaIntDB, representing 67.3% and 62.2% of their respective genomes. When only considering interactions verified experimentally, the coverage is reduced to 30.5% in PAO1 and 26.5% in PA14. This must be taken into account when analyzing data with PaIntDB, since any analysis will always cover only a subset of the identified genes. Similarly, many genes that have interaction data are not yet annotated, and the networks often contain genes for which the function is still unknown, so PaIntDB can also be a valuable tool to generate hypotheses regarding their function based on their interactions with well-characterized genes.

Another assumption in this approach is that all molecular interactions happen at the same time.
PPI are dynamic and often involve the formation of short-lived protein complexes that only occur under certain environmental conditions or during different stages of growth and infection [85]. Therefore, the networks generated by PaIntDB include interactions that might not be actually happening under the conditions of study. However, this issue is common to most existing PPI network-based approaches, and overall, the primary goal of PaIntDB is to aid in generating new hypotheses and suggesting confirmatory experiments that reflect the conditions of study, rather than accurately reflecting the interactions happening in the cell at a specific time point.

The web application is responsive when exploring networks up to around 1,200 genes. As networks get larger than this number, performance falters when selecting nodes and generating subnetworks, taking more than 5 seconds for each, although the analyses can still be run and the network can be explored with fluidity. The Dash framework used to build the application requires the network to be stored client-side in the browser, introducing this latency when it is modified, sent to the server, then back to the client. An alternative approach for these larger networks would be to not draw them at all, using the filtering system to select the desired nodes in a table view, then generating and visualizing a smaller subnetwork.

4.2 Future directions

PaIntDB can take a list of genes derived from any omics experiment, but its integration features are focused on RNA-Seq and Tn-Seq due to their popularity for bacterial studies. In principle, functionality can be extended to integrate different combinations, such as transcriptomics with proteomics, and to allow the integration of more than two datasets. The database contains 14,363 protein-metabolite interactions that are unused in the application’s initial release.
These interactions could be used to integrate metabolomics with other omics datasets by mapping the metabolites to their interacting proteins in a similar fashion to MetaBridge [86], another web-based tool developed in our lab. Due to its open source nature and implementation, PaIntDB can be extended to include other bacterial species if their interaction data is compiled following the same database schema. The source code is available under a permissive BSD license, allowing the community to develop, modify and extend it. The network generation module and visualization interface are species-agnostic.

To address the limited GO annotations in P. aeruginosa, pathway enrichment support can be added utilizing existing databases such as KEGG [28], MetaCyc [29], or Reactome [30]. These would work using the same enrichment approach, binning the input genes into their respective pathways instead of GO terms, and would give the user the option to generate pathway-specific subnetworks. Similarly, the annotations from the Comprehensive Antibiotic Resistance Database (CARD) [87], which collects and organizes information about the resistome in P. aeruginosa and other pathogenic bacteria, could be added to enable the selection of genes linked with antibiotic resistance.

PaIntDB could take more advantage of the network topology to identify interesting structures and nodes. The only network topology statistic used so far is the simplest: node degree. Other topological information, such as betweenness or other centrality measures, could give a better measure of a node’s importance in the network. Community detection algorithms can also be implemented to find highly-interconnected modules, enabling the comparison of these topological groups with the biologically-informed subnetworks.

In its current implementation, PaIntDB assumes the user has pre-existing biological knowledge to select specific genes according to enriched GO terms or cellular locations they are already interested in.
GO terms have a hierarchical structure, so that filter could be improved by selecting terms at a specific level in the hierarchy, since most networks have 50-100 statistically enriched terms that are hard to organize mentally just by looking at a drop-down/search menu.

Bibliography

1. Moradali MF, Ghods S, Rehm BHA. Pseudomonas aeruginosa lifestyle: a paradigm for adaptation, survival, and persistence. Frontiers in Cellular and Infection Microbiology. 2017;7. doi:10.3389/fcimb.2017.00039
2. Pang Z, Raudonis R, Glick BR, Lin T-J, Cheng Z. Antibiotic resistance in Pseudomonas aeruginosa: mechanisms and alternative therapeutic strategies. Biotechnology Advances. 2019;37(1):177-192. doi:10.1016/j.biotechadv.2018.11.013
3. Sun E, Gill EE, Falsafi R, Yeung A, Liu S, Hancock REW. Broad-spectrum adaptive antibiotic resistance associated with Pseudomonas aeruginosa mucin-dependent surfing motility. Antimicrobial Agents and Chemotherapy. 2018;62(9). doi:10.1128/AAC.00848-18
4. Taylor PK, Yeung ATY, Hancock REW. Antibiotic resistance in Pseudomonas aeruginosa biofilms: Towards the development of novel anti-biofilm therapies. Journal of Biotechnology. 2014;191:121-130. doi:10.1016/j.jbiotec.2014.09.003
5. de la Fuente-Núñez C, Reffuveille F, Fernández L, Hancock RE. Bacterial biofilm development as a multicellular adaptation: antibiotic resistance and new therapeutic strategies. Current Opinion in Microbiology. 2013;16(5):580-589. doi:10.1016/j.mib.2013.06.013
6. Yeung ATY, Torfs ECW, Jamshidi F, et al. Swarming of Pseudomonas aeruginosa is controlled by a broad spectrum of transcriptional regulators, including MetR. Journal of Bacteriology. 2009;191(18):5592-5602. doi:10.1128/JB.00157-09
7. Pletzer D, Sun E, Ritchie C, et al. Surfing motility is a complex adaptation dependent on the stringent stress response in Pseudomonas aeruginosa LESB58. PLOS Pathogens. 2020;16(3):e1008444. doi:10.1371/journal.ppat.1008444
8. Smith RS, Iglewski BH. P.
aeruginosa quorum-sensing systems and virulence. Current Opinion in Microbiology. 2003;6(1):56-60. doi:10.1016/S1369-5274(03)00008-0
9. Coleman SR, Smith ML, Spicer V, Lao Y, Mookherjee N, Hancock REW. Overexpression of the small RNA PA0805.1 in Pseudomonas aeruginosa modulates the expression of a large set of genes and proteins, resulting in altered motility, cytotoxicity, and tobramycin resistance. mSystems. 2020;5(3). doi:10.1128/mSystems.00204-20
10. Breidenstein EBM, Khaira BK, Wiegand I, Overhage J, Hancock REW. Complex ciprofloxacin resistome revealed by screening a Pseudomonas aeruginosa mutant library for altered susceptibility. Antimicrobial Agents and Chemotherapy. 2008;52(12):4486-4491. doi:10.1128/AAC.00222-08
11. Han M-L, Zhu Y, Creek DJ, et al. Comparative metabolomics and transcriptomics reveal multiple pathways associated with polymyxin killing in Pseudomonas aeruginosa. mSystems. 2019;4(1). doi:10.1128/mSystems.00149-18
12. Alford MA, Baghela A, Yeung ATY, Pletzer D, Hancock REW. NtrBC regulates invasiveness and virulence of Pseudomonas aeruginosa during high-density infection. Frontiers in Microbiology. 2020;11. doi:10.3389/fmicb.2020.00773
13. Molina-Mora JA, Chinchilla-Montero D, Chavarría-Azofeifa M, et al. Transcriptomic determinants of the response of ST-111 Pseudomonas aeruginosa AG1 to ciprofloxacin identified by a top-down systems biology approach. Scientific Reports. 2020;10(1):13717. doi:10.1038/s41598-020-70581-2
14. Yan X, He B, Liu L, et al. Antibacterial mechanism of silver nanoparticles in Pseudomonas aeruginosa: proteomics approach. Metallomics. 2018;10(4):557-564. doi:10.1039/C7MT00328E
15. Piatek M, Griffith DM, Kavanagh K. Quantitative proteomic reveals gallium maltolate induces an iron-limited stress response and reduced quorum-sensing in Pseudomonas aeruginosa. Journal of Biological Inorganic Chemistry. 2020;25(8):1153-1165. doi:10.1007/s00775-020-01831-x
16. Coleman SR, Bains M, Smith ML, et al.
The small RNAs PA2952.1 and PrrH as regulators of virulence, motility and iron metabolism in Pseudomonas aeruginosa. Applied and Environmental Microbiology. Published online November 6, 2020. doi:10.1128/AEM.02182-20
17. Mielko KA, Jabłoński SJ, Milczewska J, Sands D, Łukaszewicz M, Młynarz P. Metabolomic studies of Pseudomonas aeruginosa. World Journal of Microbiology and Biotechnology. 2019;35(11). doi:10.1007/s11274-019-2739-1
18. McAdam PR, Richardson EJ, Fitzgerald JR. High-throughput sequencing for the study of bacterial pathogen biology. Current Opinion in Microbiology. 2014;19:106-113. doi:10.1016/j.mib.2014.06.002
19. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15(12):550. doi:10.1186/s13059-014-0550-8
20. Law CW, Chen Y, Shi W, Smyth GK. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology. 2014;15(2):R29. doi:10.1186/gb-2014-15-2-r29
21. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139-140. doi:10.1093/bioinformatics/btp616
22. van Opijnen T, Bodi KL, Camilli A. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nature Methods. 2009;6(10):767-772. doi:10.1038/nmeth.1377
23. Jensen PA, Zhu Z, van Opijnen T. Antibiotics disrupt coordination between transcriptional and phenotypic stress responses in pathogenic bacteria. Cell Reports. 2017;20(7):1705-1716. doi:10.1016/j.celrep.2017.07.062
24. Mathur R, Rotroff D, Ma J, Shojaie A, Motsinger-Reif A. Gene set analysis methods: a systematic comparison. BioData Mining. 2018;11(1):8. doi:10.1186/s13040-018-0166-8
25. Zyla J, Marczyk M, Domaszewska T, Kaufmann SHE, Polanska J, Weiner J. Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms. Bioinformatics. 2019;35(24):5146-5154.
doi:10.1093/bioinformatics/btz447
26. Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25-29. doi:10.1038/75556
27. Gene Ontology Consortium T. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research. 2019;47(D1):D330-D338. doi:10.1093/nar/gky1055
28. Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Research. doi:10.1093/nar/gkaa970
29. Caspi R, Billington R, Fulcher CA, et al. The MetaCyc database of metabolic pathways and enzymes. Nucleic Acids Research. 2018;46(D1):D633-D639. doi:10.1093/nar/gkx935
30. Jassal B, Matthews L, Viteri G, et al. The reactome pathway knowledgebase. Nucleic Acids Research. 2020;48(D1):D498-D503. doi:10.1093/nar/gkz1031
31. Galán-Vásquez E, Luna B, Martínez-Antonio A. The regulatory network of Pseudomonas aeruginosa. Microbial Informatics and Experimentation. 2011;1(1):3. doi:10.1186/2042-5783-1-3
32. Breuer K, Foroushani AK, Laird MR, et al. InnateDB: systems biology of innate immunity and beyond—recent updates and continuing curation. Nucleic Acids Research. 2013;41(Database issue):D1228-D1233. doi:10.1093/nar/gks1147
33. Wanichthanarak K, Fahrmann JF, Grapov D. Genomic, proteomic, and metabolomic data integration strategies. Biomarker Insights. 2015;10(Suppl 4):1-6. doi:10.4137/BMI.S29511
34. Lee AH, Shannon CP, Amenyogbe N, et al. Dynamic molecular changes during the first week of human life follow a robust developmental trajectory. Nature Communications. 2019;10(1):1-14. doi:10.1038/s41467-019-08794-x
35. Orchard S, Ammari M, Aranda B, et al. The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Research. 2013;42(Database issue):D358-63. doi:10.1093/nar/gkt1115
36. Szklarczyk D, Gable AL, Lyon D, et al.
STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research. 2019;47(D1):D607-D613. doi:10.1093/nar/gky1131
37. Anupama R, Sajitha Lulu S, Mukherjee A, Babu S. Cross-regulatory network in Pseudomonas aeruginosa biofilm genes and TiO2 anatase induced molecular perturbations in key proteins unraveled by a systems biology approach. Gene. 2018;647:289-296. doi:10.1016/j.gene.2018.01.042
38. Miryala SK, Anbarasu A, Ramaiah S. Systems biology studies in Pseudomonas aeruginosa PA01 to understand their role in biofilm formation and multidrug efflux pumps. Microbial Pathogenesis. 2019;136:103668. doi:10.1016/j.micpath.2019.103668
39. Karnovsky A, Weymouth T, Hull T, et al. Metscape 2 bioinformatics tool for the analysis and visualization of metabolomics and gene expression data. Bioinformatics. 2012;28(3):373-380. doi:10.1093/bioinformatics/btr661
40. Hu Z, Hung J-H, Wang Y, et al. VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology. Nucleic Acids Research. 2009;37(suppl_2):W115-W121. doi:10.1093/nar/gkp406
41. Warde-Farley D, Donaldson SL, Comes O, et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Research. 2010;38(suppl_2):W214-W220. doi:10.1093/nar/gkq537
42. Zhang M, Su S, Bhatnagar RK, Hassett DJ, Lu LJ. Prediction and analysis of the protein interactome in Pseudomonas aeruginosa to enable network-based drug target selection. PLOS ONE. 2012;7(7):e41202. doi:10.1371/journal.pone.0041202
43. Tuncbag N, Gosline SJC, Kedaigle A, Soltis AR, Gitter A, Fraenkel E. Network-based interpretation of diverse high-throughput datasets through the OmicsIntegrator software package. PLOS Computational Biology. 2016;12(4):e1004879. doi:10.1371/journal.pcbi.1004879
44. Shannon P, Markiel A, Ozier O, et al.
Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 2003;13(11):2498-2504. doi:10.1101/gr.1239303
45. Bindea G, Mlecnik B, Hackl H, et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009;25(8):1091-1093. doi:10.1093/bioinformatics/btp101
46. Doncheva NT, Morris JH, Gorodkin J, Jensen LJ. Cytoscape StringApp: Network analysis and visualization of proteomics data. Journal of Proteome Research. 2019;18(2):623-632. doi:10.1021/acs.jproteome.8b00702
47. Zhou G, Soufan O, Ewald J, Hancock REW, Basu N, Xia J. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Research. 2019;47(W1):W234-W241. doi:10.1093/nar/gkz240
48. Winsor GL, Griffiths EJ, Lo R, Dhillon BK, Shay JA, Brinkman FSL. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database. Nucleic Acids Research. 2016;44(D1):D646-653. doi:10.1093/nar/gkv1227
49. Kerrien S, Orchard S, Montecchi-Palazzi L, et al. Broadening the horizon – level 2.5 of the HUPO-PSI format for molecular interactions. BMC Biology. 2007;5(1):44. doi:10.1186/1741-7007-5-44
50. Huang T-W, Tien A-C, Huang W-S, et al. POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome. Bioinformatics. 2004;20(17):3273-3276. doi:10.1093/bioinformatics/bth366
51. Whiteside MD, Winsor GL, Laird MR, Brinkman FSL. OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis. Nucleic Acids Research. 2013;41(D1):D366-D376. doi:10.1093/nar/gks1241
52. Prieto C, De Las Rivas J. APID: Agile Protein Interaction DataAnalyzer. Nucleic Acids Research. 2006;34(Web Server issue):W298-W302. doi:10.1093/nar/gkl128
53. Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. 
BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Research. 2007;35(Database issue):D198-201. doi:10.1093/nar/gkl999
54. Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D. DIP: the Database of Interacting Proteins. Nucleic Acids Research. 2000;28(1):289-291.
55. Keseler IM, Mackie A, Santos-Zavaleta A, et al. The EcoCyc database: reflecting new knowledge about Escherichia coli K-12. Nucleic Acids Research. 2017;45(D1):D543-D550. doi:10.1093/nar/gkw1003
56. Orchard S, Kerrien S, Abbani S, et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nature Methods. 2012;9(4):345-350. doi:10.1038/nmeth.1931
57. Orchard S, Ammari M, Aranda B, et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Research. 2014;42(Database issue):D358-D363. doi:10.1093/nar/gkt1115
58. Razick S, Magklaras G, Donaldson IM. iRefIndex: A consolidated protein interaction database with provenance. BMC Bioinformatics. 2008;9(1):405. doi:10.1186/1471-2105-9-405
59. Calderone A, Castagnoli L, Cesareni G. mentha: a resource for browsing integrated protein-interaction networks. Nature Methods. 2013;10(8):690-691. doi:10.1038/nmeth.2561
60. Licata L, Briganti L, Peluso D, et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Research. 2012;40(D1):D857-D861. doi:10.1093/nar/gkr930
61. Goll J, Rajagopala SV, Shiau SC, Wu H, Lamb BT, Uetz P. MPIDB: the microbial protein interaction database. Bioinformatics. 2008;24(15):1743-1744. doi:10.1093/bioinformatics/btn285
62. Santos-Zavaleta A, Salgado H, Gama-Castro S, et al. RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12. Nucleic Acids Research. 2019;47(D1):D212-D220. doi:10.1093/nar/gky1077
63. The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Research. 
2019;47(D1):D506-D515. doi:10.1093/nar/gky1049
64. Schweppe DK, Zheng C, Chavez JD, et al. XLinkDB 2.0: integrated, large-scale structural analysis of protein crosslinking data. Bioinformatics. 2016;32(17):2716-2718. doi:10.1093/bioinformatics/btw232
65. Franz M, Lopes CT, Huck G, Dong Y, Sumer O, Bader GD. Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics. 2016;32(2):309-311. doi:10.1093/bioinformatics/btv557
66. Klopfenstein DV, Zhang L, Pedersen BS, et al. GOATOOLS: A Python library for Gene Ontology analyses. Scientific Reports. 2018;8(1):10872. doi:10.1038/s41598-018-28948-z
67. Gansner ER, North SC. An open graph visualization system and its applications to software engineering. Software - Practice and Experience. 2000;30(11):1203-1233.
68. Hegde C, Indyk P, Schmidt L. A nearly-linear time framework for graph-structured sparsity. In: IJCAI International Joint Conference on Artificial Intelligence. Vol 2016-January. 2016:4165-4169.
69. Weber BS, De Jong AM, Guo ABY, et al. Genetic and chemical screening in human blood serum reveals unique antibacterial targets and compounds against Klebsiella pneumoniae. Cell Reports. 2020;32(3):107927. doi:10.1016/j.celrep.2020.107927
70. Samant S, Lee H, Ghassemi M, et al. Nucleotide biosynthesis is critical for growth of bacteria in human blood. PLOS Pathogens. 2008;4(2):e37. doi:10.1371/journal.ppat.0040037
71. Colquhoun JM, Wozniak RAF, Dunman PM. Clinically relevant growth conditions alter Acinetobacter baumannii antibiotic susceptibility and promote identification of novel antibacterial agents. PLOS ONE. 2015;10(11):e0143033. doi:10.1371/journal.pone.0143033
72. Lin L, Nonejuie P, Munguia J, et al. Azithromycin synergizes with cationic antimicrobial peptides to exert bactericidal and therapeutic activity against highly multidrug-resistant gram-negative bacterial pathogens. EBioMedicine. 2015;2(7):690-698. doi:10.1016/j.ebiom.2015.05.021
73. 
Belanger CR, Lee AH-Y, Pletzer D, Dhillon BK, Falsafi R, Hancock REW. Identification of novel targets of azithromycin activity against Pseudomonas aeruginosa grown in physiologically relevant media. PNAS. Published online December 10, 2020. doi:10.1073/pnas.2007626117
74. Li Y, Kawakami N, Ogola HJO, et al. A novel L-aspartate dehydrogenase from the mesophilic bacterium Pseudomonas aeruginosa PAO1: molecular characterization and application for L-aspartate production. Applied Microbiology and Biotechnology. 2011;90(6):1953-1962. doi:10.1007/s00253-011-3208-4
75. Kang D, Revtovich AV, Chen Q, Shah KN, Cannon CL, Kirienko NV. Pyoverdine-dependent virulence of Pseudomonas aeruginosa isolates from cystic fibrosis patients. Frontiers in Microbiology. 2019;10. doi:10.3389/fmicb.2019.02048
76. Bastiaansen KC, van Ulsen P, Wijtmans M, Bitter W, Llamas MA. Self-cleavage of the Pseudomonas aeruginosa cell-surface signaling anti-sigma factor FoxR occurs through an N-O acyl rearrangement. Journal of Biological Chemistry. 2015;290(19):12237-12246. doi:10.1074/jbc.M115.643098
77. Llamas MA, Mooij MJ, Sparrius M, Vandenbroucke‐Grauls CMJE, Ratledge C, Bitter W. Characterization of five novel Pseudomonas aeruginosa cell-surface signalling systems. Molecular Microbiology. 2008;67(2):458-472. doi:10.1111/j.1365-2958.2007.06061.x
78. Gellatly SL, Hancock REW. Pseudomonas aeruginosa: new insights into pathogenesis and host defenses. Pathogens and Disease. 2013;67(3):159-173. doi:10.1111/2049-632X.12033
79. Ciofu O, Tolker-Nielsen T. Tolerance and resistance of Pseudomonas aeruginosa biofilms to antimicrobial agents—How P. aeruginosa can escape antibiotics. Frontiers in Microbiology. 2019;10. doi:10.3389/fmicb.2019.00913
80. Jennings LK, Storek KM, Ledvina HE, et al. Pel is a cationic exopolysaccharide that cross-links extracellular DNA in the Pseudomonas aeruginosa biofilm matrix. PNAS. 2015;112(36):11353-11358. doi:10.1073/pnas.1503058112
81. Hauser AR. 
The Type III secretion system of Pseudomonas aeruginosa: Infection by injection. Nature Reviews Microbiology. 2009;7(9):654-665. doi:10.1038/nrmicro2199
82. Overhage J, Lewenza S, Marr AK, Hancock REW. Identification of genes involved in swarming motility using a Pseudomonas aeruginosa PAO1 Mini-Tn5-lux mutant library. Journal of Bacteriology. 2007;189(5):2164-2169. doi:10.1128/JB.01623-06
83. Alrahman MA, Yoon SS. Identification of essential genes of Pseudomonas aeruginosa for its growth in airway mucus. Journal of Microbiology. 2017;55(1):68-74. doi:10.1007/s12275-017-6515-3
84. Arai H. Regulation and function of versatile aerobic and anaerobic respiratory metabolism in Pseudomonas aeruginosa. Frontiers in Microbiology. 2011;2. doi:10.3389/fmicb.2011.00103
85. Chen B, Fan W, Liu J, Wu F-X. Identifying protein complexes and functional modules—from static PPI networks to dynamic PPI networks. Briefings in Bioinformatics. 2014;15(2):177-194. doi:10.1093/bib/bbt039
86. Hinshaw SJ, Lee AHY, Gill EE, Hancock REW. MetaBridge: enabling network-based integrative analysis via direct protein interactors of metabolites. Bioinformatics. 2018;34(18):3225-3227. doi:10.1093/bioinformatics/bty331
87. Alcock BP, Raphenya AR, Lau TTY, et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Research. 2020;48(D1):D517-D525. 
doi:10.1093/nar/gkz935"@en ; edm:hasType "Thesis/Dissertation"@en ; vivo:dateIssued "2021-05"@en ; edm:isShownAt "10.14288/1.0395820"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Bioinformatics"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "Attribution-NonCommercial-NoDerivatives 4.0 International"@* ; ns0:rightsURI "http://creativecommons.org/licenses/by-nc-nd/4.0/"@* ; ns0:scholarLevel "Graduate"@en ; dcterms:title "Network-based integration and visualization of high-throughput datasets in Pseudomonas aeruginosa"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/77248"@en .