@prefix vivo: <http://vivoweb.org/ontology/core#> .
@prefix edm: <http://www.europeana.eu/schemas/edm/> .
@prefix ns0: <https://open.library.ubc.ca/terms#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2009/08/skos-reference/skos.html#> .

<http://dx.doi.org/10.14288/1.0395820>
  vivo:departmentOrSchool "Science, Faculty of"@en ;
  edm:dataProvider "DSpace"@en ;
  ns0:degreeCampus "UBCV"@en ;
  dcterms:creator "Castillo Arnemann, Javier José"@en ;
  dcterms:issued "2021-02-05T09:05:05Z"@en, "2021"@en ;
  vivo:relatedDegree "Master of Science - MSc"@en ;
  ns0:degreeGrantor "University of British Columbia"@en ;
  dcterms:description "Pseudomonas aeruginosa is a clinically-important, opportunistic pathogen that is the third leading cause of hospital infections in North America, the major cause of life-threatening chronic infections in patients with cystic fibrosis, and a major threat due to its high level of antibiotic resistance. To understand the complexity behind the adaptive behaviours of P. aeruginosa it is necessary to employ systems biology methods made possible by the ongoing revolution in high-throughput omics technologies. One powerful systems biology approach leverages existing molecular interaction databases to generate networks showing the interactions between the identified molecules. However, most existing interaction databases are focused on data for humans and other well-studied organisms; thus, there is a lack of systems biology tools to study medically-important bacterial pathogens such as P. aeruginosa. I developed the Pseudomonas aeruginosa Interaction Database, PaIntDB, to fill in this gap. It is an intuitive web-based tool for network-based systems biology analyses using protein-protein interactions (PPI). It enables the interpretation and visualization of omics studies including proteomics, RNA-Seq, and Tn-Seq. These high-throughput datasets are mapped onto PPI networks, which can be explored visually and filtered to uncover putative molecular pathways related to the conditions of study. PaIntDB employs the most comprehensive P. aeruginosa interactome to date, collected from a variety of resources, including interactions predicted computationally to further expand analysis capabilities. Two case studies demonstrate how PaIntDB can be used to quickly identify functional gene groups involved in growth in physiologically-relevant conditions and biofilm formation, and use these insights to derive new hypotheses about the underlying biology."@en ;
  edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/77248?expand=metadata"@en ;
  skos:note "NETWORK-BASED INTEGRATION AND VISUALIZATION OF HIGH-THROUGHPUT DATASETS IN PSEUDOMONAS AERUGINOSA

by

Javier José Castillo Arnemann

B.Sc., Monterrey Institute of Technology and Higher Education, 2018

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

February 2021

© Javier José Castillo Arnemann, 2021

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled: Network-based integration and visualization of high-throughput datasets in Pseudomonas aeruginosa, submitted by Javier José Castillo Arnemann in partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics.

Examining Committee:
Robert E. W. Hancock, Professor, Microbiology and Immunology, University of British Columbia (Supervisor)
Tamara Munzner, Professor, Computer Science, University of British Columbia (Supervisory Committee Member)
Fiona Brinkman, Professor, Molecular Biology and Biochemistry, Simon Fraser University (Supervisory Committee Member)

Abstract

Pseudomonas aeruginosa is a clinically-important, opportunistic pathogen that is the third leading cause of hospital infections in North America, the major cause of life-threatening chronic infections in patients with cystic fibrosis, and a major threat due to its high level of antibiotic resistance. To understand the complexity behind the adaptive behaviours of P. aeruginosa it is necessary to employ systems biology methods made possible by the ongoing revolution in high-throughput omics technologies. One powerful systems biology approach leverages existing molecular interaction databases to generate networks showing the interactions between the identified molecules. 
However, most existing interaction databases are focused on data for humans and other well-studied organisms; thus, there is a lack of systems biology tools to study medically-important bacterial pathogens such as P. aeruginosa. I developed the Pseudomonas aeruginosa Interaction Database, PaIntDB, to fill in this gap. It is an intuitive web-based tool for network-based systems biology analyses using protein-protein interactions (PPI). It enables the interpretation and visualization of omics studies including proteomics, RNA-Seq, and Tn-Seq. These high-throughput datasets are mapped onto PPI networks, which can be explored visually and filtered to uncover putative molecular pathways related to the conditions of study. PaIntDB employs the most comprehensive P. aeruginosa interactome to date, collected from a variety of resources, including interactions predicted computationally to further expand analysis capabilities. Two case studies demonstrate how PaIntDB can be used to quickly identify functional gene groups involved in growth in physiologically-relevant conditions and biofilm formation, and use these insights to derive new hypotheses about the underlying biology.

Lay Summary

Pseudomonas aeruginosa is a leading cause of chronic and hospital infections, and a model organism for the ongoing issue of antibiotic resistance. The biology behind the adaptive lifestyle of P. aeruginosa is very complex and involves the interaction of thousands of genes. Current sequencing technologies allow the large-scale study of the entire genome, but the resulting datasets are huge and difficult for humans to interpret. The tool presented here, PaIntDB, enables the interpretation and visualization of these results as a network of interconnected genes, by mapping them onto a database of compiled protein-protein interactions. 
These networks can be filtered using biological and experimental information to generate smaller, manageable subnetworks that group the interacting genes according to their function. PaIntDB is available as a web application with an intuitive graphical interface, allowing researchers and non-experts to analyze their results without training in computer programming.

Preface

Dr. Robert E. W. Hancock conceived the idea of compiling the existing protein-protein and protein-metabolite data in P. aeruginosa to perform network-based analyses through a web application. Olga Solodova built the database by compiling and annotating the available P. aeruginosa interaction data. The idea to extend the application’s functionality to integrate RNA-Seq and Tn-Seq results came from Dr. Robert Hancock and Dr. Amy Lee. Corrie Belanger and Melanie Dostert were beta testers for the web application and performed the experiments that generated the data analyzed in Chapter 3. I had the idea to develop a visualization module as part of Dr. Tamara Munzner’s Information Visualization course. I developed the Python module to generate the networks and was responsible for designing and implementing the framework and web interface behind PaIntDB.

A version of Chapter 2 describing the design and use of PaIntDB was submitted for publication. Olga Solodova contributed the methods for creating the database, I drafted the manuscript and performed all further research, and Dr. Bhavjinder Dhillon and Dr. Robert Hancock reviewed and extended the manuscript for submission.

Castillo-Arnemann, J., Solodova, O., Dhillon, B. & Hancock, R. 
PaIntDB: Network-based integration and visualization of high-throughput results in Pseudomonas aeruginosa. Submitted.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Introduction
1.1 High-throughput sequencing approaches to study pathogenic bacteria
1.2 Network-based interpretation and integration of high-throughput results
1.3 Existing solutions
1.4 Proposed solution
1.5 Database 
creation
Chapter 2: Network-based interpretation and integration of high-throughput results with PaIntDB
2.1 Network generation
2.2 Network visualization
2.3 Subnetwork generation
Chapter 3: Applications of PaIntDB
3.1 Identification of genes important for growth in host-like conditions
3.1.1 Nucleotide metabolism
3.1.2 Iron uptake
3.2 Identification of genes involved in biofilm formation
3.2.1 Energy metabolism in biofilms
Chapter 4: Conclusion
4.1 Limitations
4.2 Future directions
Bibliography

List of Tables

Table 1. 
Comparison of existing software for network-based interpretation of high-throughput results in P. aeruginosa
Table 2. List of sources compiled to build the PaIntDB interaction database
Table 3. Network classes handle different input data

List of Figures

Figure 1. Database schema used to store the compiled interaction data
Figure 2. Workflow of a network-based analysis with PaIntDB
Figure 3. Graphical user interface to build PPI networks in PaIntDB
Figure 4. Graphical user interface to explore PPI networks. This is a combined network connecting RNA-Seq and Tn-Seq genes
Figure 5. Functional subnetworks generated with PaIntDB, showing genes involved in siderophore mediated iron transport
Figure 6. Subnetwork connecting differentially-expressed and essential gene products involved in nucleotide metabolism in P. aeruginosa PAO1 grown in serum
Figure 7. Genes involved in nucleotide metabolism were largely down-regulated in P. aeruginosa PAO1 grown in serum
Figure 8. Subnetwork connecting genes involved in iron uptake, regulation and pyoverdine biosynthesis in P. aeruginosa PAO1 grown in serum and treated with azithromycin
Figure 9. Subnetwork generated using the enriched GO term \"multi-organism process\" in P. aeruginosa PA14 growing as a biofilm
Figure 10. 
Subnetwork of genes involved in respiratory metabolism in P. aeruginosa PA14 biofilm

List of Abbreviations

cDNA: Complementary DNA
CSV: Comma-separated values
GO: Gene ontology
mRNA: Messenger RNA
PPI: Protein-protein interactions
PCSF: Prize-collecting Steiner forest
RNA-Seq: RNA sequencing
T3SS: Type III secretion system
Tn-Seq: Transposon sequencing

Acknowledgements

I thank my supervisor, Dr. Robert E. W. Hancock, for his constant academic and personal support throughout my degree. I would also like to thank my committee, Dr. Tamara Munzner and Dr. Fiona Brinkman, for their valuable suggestions to improve my project.

I want to thank all members of the Hancock laboratory for the great conversations and for providing a great space to work and grow as a researcher. In particular, thanks to Susan Farmer for keeping the lights on and spoiling us with food at any possible chance. I thank Dr. Amy Lee, who was a great source of guidance and mentorship. I also thank Corrie Belanger and Melanie Dostert, who were a huge help in testing the web application and interpreting the networks presented in Chapter 3.

Finally, I thank the Canadian Institutes of Health Research and Mitacs, who provided the funding for this research.

Dedication

To the universe, and all the cool beings that inhabit it.

Chapter 1: Introduction

Pseudomonas aeruginosa is a Gram-negative, opportunistic pathogen involved in cystic fibrosis, sepsis, burn infections and pneumonia, and is a leading cause of nosocomial infections. It is difficult to treat due to its innate, mutational, acquired and adaptive antibiotic resistance and has a plethora of virulence factors [1–3]. P. 
aeruginosa also possesses a range of adaptive mechanisms including biofilm formation on surfaces [4,5], swarming and surfing motility [6,7], and quorum sensing [8] that allow it to survive a wide variety of circumstances, particularly contributing to infections. These adaptive mechanisms are dependent on hundreds of genes and the altered expression of hundreds to thousands of gene products [7,9,10], making a systems biology approach necessary for comprehensively understanding the underlying biological processes, and providing the knowledge base for devising potential new treatments against infection.

With the big-picture perspective provided by access to data-intensive genome-wide methods, the focus can shift from the role of individual genes to the interaction of groups of genes. This approach is possible due to the ongoing revolution in high-throughput technologies that enable the screening of the entire set of molecules of a certain type in a cell, namely omics studies. Transcriptomics [3,11–13], proteomics [14–16], and metabolomics [11,17] represent high-throughput omics methods that have become crucial to understanding the signalling, regulatory and metabolic pathways that confer on P. aeruginosa its remarkable adaptive traits.

1.1 High-throughput sequencing approaches to study pathogenic bacteria

The increased throughput and reduced costs of nucleic acid sequencing technologies have resulted in new applications for the study of pathogenic bacteria. RNA-Seq, which comprehensively analyzes the transcriptome, has become particularly valuable for identifying important genes and pathways involved in pathogenesis and antibiotic resistance [18]. It has superseded microarray hybridization experiments due to its more accurate quantification of gene expression levels, scalability, lack of requirement for background subtraction and single nucleotide resolution. 
For this method, bacteria are grown under the conditions of interest, then mRNA transcripts are extracted and reverse-transcribed into cDNAs, which are then sequenced and quantified, resulting in a snapshot of the whole transcriptome. Finally, the expression data is analyzed to find genes with statistically significant changes across the studied conditions [19–21].

Another recently developed methodology, Tn-Seq, involves the use of a promiscuous transposon, with many possible insertion sites, to create a library of mutant bacteria that each contain a single transposon insertion into a particular gene. The library, containing hundreds of thousands of individual mutants, is then grown under varying treatments or conditions as specified in the study. Since the transposon insertions can induce loss-of-function in their target gene, some mutants will not be able to grow under the treatment. By sequencing the genomes of the initial and final populations, the relative quantification of each mutant is used to identify genes that are phenotypically essential for growth under the conditions employed [22]. It is still unclear how changes in transcript abundance correlate with the phenotypic importance of a gene, making RNA-Seq and Tn-Seq complementary approaches [23].

High-throughput RNA-Seq studies often return hundreds, if not thousands, of significantly differentially expressed genes. This makes it challenging to interpret these results in a biologically meaningful fashion that enables researchers to devise further experiments targeting specific operons or pathways. Usually the first step is to perform functional enrichment, where the significant genes are assigned to different sets and a statistical overrepresentation analysis such as Fisher’s exact test is applied to determine which sets include more genes than expected by chance. 
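
As a rough illustration of the overrepresentation analysis described above, the following Python sketch computes a one-sided Fisher's exact test p-value for a single gene set from the hypergeometric distribution. The function name, the argument names and the example counts are assumptions made for illustration only; they are not taken from any particular enrichment tool, and real tools typically also correct for multiple testing.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, population_hits, population_size):
    # One-sided Fisher's exact test (upper hypergeometric tail): the
    # probability of drawing at least study_hits annotated genes when
    # study_size genes are sampled without replacement from a population
    # in which population_hits genes carry the annotation.
    upper = min(study_size, population_hits)
    total = comb(population_size, study_size)
    tail = sum(
        comb(population_hits, k)
        * comb(population_size - population_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: all 5 significant genes carry an annotation
# that is shared by only 5 of the 20 genes in the background set.
p_value = enrichment_p_value(5, 5, 5, 20)
is_enriched = p_value < 0.05
```

With these made-up counts the gene set would be called enriched at the 0.05 level; with zero annotated genes in the study list the same function returns 1.
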
Many variations of this method exist employing different statistical approaches and tests [24,25], but the result is mostly the same: a list of enriched GO terms or pathways that summarize the observed changes in expression. The sets include functional annotations such as Gene Ontology (GO) [26,27] or biological pathways from databases such as KEGG [28], MetaCyc [29] or Reactome [30].

From this point on, a biologist has to manually examine the enriched list and search the literature to find specific genes or pathways relevant to their studies. To streamline this time-consuming process, I proposed in this thesis to develop a network-based method of integrating, visualizing and filtering these large datasets to aid microbiologists in the generation of new hypotheses for further experiments.

1.2 Network-based interpretation and integration of high-throughput results

The flat gene lists returned by high-throughput experiments obviously paint an incomplete picture of the biology behind the observed changes. Genes do not act discretely or in isolation, but rather within the context of complex biochemical pathways with multi-level systems of regulation [31]. Since all biological pathways involve enzymes and other molecules interacting with one another, modelling them as networks makes intuitive sense. In this view, nodes represent proteins (or any other biological molecule) and edges represent their interaction in the cell, including direct physical, metabolic or regulatory interactions. The hypothesis is that if two proteins or biomolecules interact within a cell they are involved in common biological purposes or events. Such interactions are termed protein-protein interactions (PPI) and for mammalian cells have been captured in the database InnateDB developed by our lab in collaboration [32].

PPI are not binary, but rather a single protein or biomolecule can interact with many other biomolecules. 
This enables PPI to be visualized as networks that inherently capture relationships between the interacting molecules, and group them in a manner relevant to their overlapping functions, capturing pathways and common functional roles. Highly-connected nodes are termed hubs and are considered to have significant biological meaning since they are thought to function by receiving and disbursing signals. Such hubs can be readily identified and represent potential drug targets and functional biomarkers, since their disruption would have the potential to significantly alter the bacterium’s survival prospects. Network-based methods also enable the intuitive integration of different omics types, and can deliver similar information to data-driven complex multivariate statistical approaches [33,34]. PPI are particularly useful to integrate RNA-Seq and Tn-Seq results, since genes can be easily mapped to proteins.

The generation of these networks relies on existing interaction databases that have been developed to organize, standardize and distribute the massive amounts of data resulting from high-throughput screenings [32,35,36]. Systems biology studies often mine these databases to generate networks involving their genes of interest [13,37,38]. This approach enables the contextualization of the experimental results with existing curated information about the identified molecules, combined with the advantages of the network model showing their interactions.

1.3 Existing solutions

Network-based methods are a multi-step process and often require multiple tools. Their application is not as common in bacterial studies due to the relative lack of interaction data when compared to humans and other well-studied organisms. 
Accordingly, the majority of existing software for network-based analyses is not suitable for bacteria [39–41]. PPI networks for bacteria can be created using STRING [36], but the visualization features are very basic, and the networks are restricted to tens of proteins. In addition, STRING interactions are not manually curated and include text-mined interactions, making them unreliable for large-scale analyses. The available tools that can be used for large-scale PPI network analysis in P. aeruginosa are summarized in Table 1.

Table 1. Comparison of existing software for network-based interpretation of high-throughput results in P. aeruginosa.

Feature | PaIntDB | NetworkAnalyst | OmicsIntegrator | Cytoscape
Input | Gene list with optional expression data | Raw RNA-Seq counts or gene list | Gene list(s) and interactome | Gene list
Source of P. aeruginosa PPI data | Compiled from other databases and studies | STRING or Zhang [42] | Uploaded by the user | STRING, KEGG or Reactome, with plugins
PPI network generation | Yes | Yes | Yes | Yes, with plugin
Network visualization | Yes | Yes | No | Yes
Enrichment analysis | Yes | Yes | No | Yes, with plugin
Biologically-informed node selection | Yes | Yes | No | Yes
Subnetwork generation | Yes | No | Yes | No

OmicsIntegrator [43] takes one or more lists of genes and an interactome as input, then returns a subnetwork of the genes that best link the data, using the Prize-Collecting Steiner Forest (PCSF) algorithm. It is not user-friendly since it is a command line application, the interactome and gene list(s) must be pre-processed by the user, and it requires external software to visualize the resulting networks.

Cytoscape [44] is a well-known, community-driven, open source software for visualization and analysis of biological networks. It has hundreds of plugins for different network analyses, including GO term enrichment [45] and generating PPI networks using data from STRING [46] or other interaction databases. 
However, Cytoscape needs to be downloaded and installed, has a steep learning curve and needs a specific combination of plugins that must be installed individually depending on the analysis.

NetworkAnalyst [47] contains a suite of advanced network analysis and visualization tools and is available online. By default, it employs manually-curated interactions from InnateDB to generate PPI networks out of gene lists or expression data. However, since InnateDB does not include bacterial data, for P. aeruginosa NetworkAnalyst uses either STRING interactions or a computationally-predicted interactome [42].

A common problem in network visualization is the ‘hairball’ issue that occurs when networks get bigger and the overabundance of nodes and edges obscures the underlying structure. To handle this issue, NetworkAnalyst can prune the networks according to topological features. Topological methods are blind to the underlying biology and can therefore return networks that, while smaller and easier to visualize, might lose genes of functional importance. The user can also select nodes using enriched pathways or topological communities, but cannot create subnetworks using this selection. NetworkAnalyst also supports 3D visualization to aid the exploration of dense networks, but this approach has disadvantages including occlusion of some nodes, the inability to export it as a figure and the unfamiliarity of this representation for untrained users.

1.4 Proposed solution

In this thesis, I have developed a PPI network-based method to visually explore, interpret and integrate high-throughput results in P. aeruginosa, with particular attention to RNA-Seq and Tn-Seq. This methodology is implemented as a web-based tool called PaIntDB, which stands for Pseudomonas aeruginosa Interactions DataBase. 
PaIntDB has three components: an SQLite database containing the interaction data, a Python module to generate the PPI networks, and a web application with a graphical interface to visualize, filter and export the networks.

PaIntDB was designed to (1) compile the available interaction data in P. aeruginosa, (2) make use of this dataset to generate networks from high-throughput omics results, (3) enable the visualization and biologically-informed filtering of the networks, and (4) combine sophisticated analysis with user-friendliness such that microbiologists without programming experience can explore the relationships implicated by their high-throughput omics experiments. Chapter 2 describes the implementation and workflow of PaIntDB, while Chapter 3 demonstrates its use to identify genes related to biofilm formation and growth in host-like media.

1.5 Database creation

The first stage in this project was the creation of the interaction database itself. This was done in the lab by an undergraduate student, Olga Solodova. It was built by compiling all existing P. aeruginosa interaction data in the literature. To do this, protein identifiers, names, cellular locations and related gene ontology (GO) terms for P. aeruginosa PAO1 and PA14 proteins were collected from the Pseudomonas Genome Database [48] to begin populating the PaIntDB database. Then, interaction data for P. aeruginosa strains PAO1 and PA14 was collected from existing interaction databases and individual studies. Interaction data was downloaded programmatically where possible, or from provided flat files. These interactions were then input into the database, ensuring annotations met all features according to the PSI-MI 2.5 standard [49]. Figure 1 shows a diagram of the database schema.

Given that the collected interaction data for P. 
aeruginosa strains did not provide a complete picture of the interactome, additional interactions were predicted from orthologous mappings. This relied on the concept that if two proteins in P. aeruginosa have strong orthologs that are known to interact in another species, the likelihood is that these proteins also interact in P. aeruginosa, a concept previously introduced in the POINT (Prediction of Interactome) database [50]. Thus 5,364 orthologous genes were identified between P. aeruginosa PAO1 and PA14 using OrtholugeDB [51] based on widely-accepted reciprocal best-BLAST hit criteria. Experimentally-verified interactions in strain PAO1 were then predicted to exist between orthologous proteins in strain PA14, and vice-versa, and were annotated as such in PaIntDB.

Furthermore, since PPI data is limited for Pseudomonas species, additional interactions were also derived from the well-studied Escherichia coli K12 strain MG1655. In this case, orthologs between this E. coli strain and P. aeruginosa PAO1 (1,841 orthologs) and PA14 (1,776 orthologs) were also collected using OrtholugeDB, and interactions were predicted to exist between orthologous proteins if they are interactors in E. coli. Overall, these predicted interactions, annotated as such, resulted in the addition of 122,501 interactions, for a total of 157,427 interactions. The complete list of sources for PPI data is described below as Table 2.

Figure 1. Database schema used to store the compiled interaction data. Most table columns correspond to the required PSI-MI 2.5 fields.

Table 2. List of sources compiled to build the PaIntDB interaction database. P-P means protein-protein, P-M means protein-metabolite. The number of interactions for each category is included in parentheses.

Source | No. of interactions | Organism | Interaction type | Verified experimentally
Agile Protein Interaction DataAnalyzer (APID) [52] | 63 | Pseudomonas aeruginosa (Pa) PAO1 | P-P | Yes
BindingDB [53] | 2 | Escherichia coli (E. 
coli) | P-M | Yes
Database of Interacting Proteins (DIP) [54] | 7,305 | E. coli | P-P | Yes (6,170); No (1,135)
EcoCyc [55] | 10,512 | E. coli | P-P (2,395); P-M (8,117) | No
Galán-Vázquez [31] | 1,560 | Pa PAO1 (1,513); Pa PA14 (47) | P-P | No
International Molecular Exchange (IMEx) [56] | 15,230 | E. coli (15,135); Pa PAO1 (88); Pa PA14 (7) | P-P (14,870); P-M (360) | Yes (15,182); No (48)
IntAct [57] | 39,082 | E. coli (38,980); Pa PAO1 (95); Pa PA14 (7) | P-P (38,384); P-M (698) | Yes (38,988); No (94)
Interaction Reference Index (iRefIndex) [58] | 23,099 | E. coli (22,988); Pa PAO1 (104); Pa PA14 (7) | P-P | Yes (17,037); No (6,062)
Kyoto Encyclopedia of Genes and Genomes (KEGG) [28] | 34,543 | Pa PA14 (14,849); Pa PAO1 (14,724); E. coli (4,970) | P-P (19,357); P-M (15,186) | No
mentha [59] | 31,247 | E. coli (31,225); Pa PAO1 (21); Pa PA14 (1) | P-P | Yes (25,785); No (5,462)
The Molecular Interaction Database (MINT) [60] | 327 | E. coli (312); Pa PAO1 (15) | P-P (319); P-M (8) | Yes
Microbial Protein Interaction Database (MPIDB) [61] | 336 | E. coli (289); Pa PAO1 (47) | P-P (330); P-M (6) | Yes
RegulonDB [62] | 1,845 | E. coli | P-P | No
UniProt [63] | 23 | E. coli | P-P | Yes
XLinkDB [64] | 297 | Pa PAO1 | P-P | Yes
Zhang [42] | 38,439 | Pa PAO1 | P-P | No

Chapter 2: Network-based interpretation and integration of high-throughput results with PaIntDB

PaIntDB is available as a web application (https://www.paintdb.ca) accessible through any modern web browser. Since user-friendliness was a major goal, a full analysis can be performed using only its graphical user interface. It was built using the open source Dash framework for Python, including the Cytoscape.js [65] API that is part of the framework to draw the networks. The steps to perform a network-based analysis with PaIntDB are summarized in Figure 2 and detailed in this chapter.

Figure 2. Workflow of a network-based analysis with PaIntDB. The user uploads a list of genes and their interactions are mapped to build a network. 
This network can be explored visually within PaIntDB, filtered to find functional subnetworks, or exported for further analyses in NetworkAnalyst or Cytoscape.

2.1 Network generation

PaIntDB has three possible inputs: (i) a list of genes/proteins of interest derived from any high-throughput experiment, (ii) a list of differentially-expressed genes/proteins with associated expression and significance values, or (iii) a combination of differentially-expressed genes with a list of genes identified through Tn-Seq (Fig. 3A). To handle these different inputs, I created three network classes with specific attributes (Table 3). The gene identifiers employed are locus tags since these are used in high-throughput experiments. After the data is uploaded, the user can select the P. aeruginosa strain to work with (PAO1 or PA14), the network order and the interaction detection method (Fig. 3B). When hovering over the question mark next to a parameter in the graphical interface, a short text pops up to explain its functionality.

Figure 3. Graphical user interface to build PPI networks in PaIntDB. (A) Input Selection. (B) Parameters to generate the network. (C) Selection of genes for GO term enrichment.

Table 3. Network classes handle different input data. The classes are hierarchical: BioNetwork is the parent class of DENetwork, which is the parent class of CombinedNetwork. Therefore, the attributes are cumulative.
The significance source indicates the experiment in which the gene was identified.

Input | Network Class | Attributes
Gene list | BioNetwork | Gene name; Gene description; Cellular location
Differentially-expressed (DE) genes list | DENetwork | Log2 fold change; Adjusted p-value
DE genes list + Tn-Seq essential genes list | CombinedNetwork | Significance source

For the network order parameter, the user can select between a zero-order network, which only connects any input genes that interact directly, or a first-order network, which uses the input genes as ‘seeds’ then finds other interacting genes in the database that were not part of the original list to connect them. First-order networks are useful when the input gene list is short, since a zero-order network would result in many orphaned nodes.

With the detection method parameter, the user can select between using all of the interactions in the database, or only using the interactions that have been verified experimentally. Selecting experimental interactions results in a smaller, higher-confidence network, but biases the analysis towards well-studied genes and may result in the loss of many genes that are relevant to the studied conditions. After the parameters are selected, PaIntDB will map the interactions between the input genes using the relevant PPI data. The generated network consists of nodes representing proteins and edges representing their biophysical, biochemical or regulatory interaction.

To aid in the interpretation of the network, functional gene ontology (GO) term enrichment is performed using the GOATOOLS library,[66] using GO terms obtained from the Pseudomonas Genome Database.[48] The user has the option to perform the enrichment with either all of the input genes or only the genes that were mapped to the network (Fig. 3C). Fisher’s exact test is employed with the usual 0.05 p-value cutoff to determine significant terms.
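As a minimal sketch of the enrichment test described above, the following pure-Python function computes a one-sided Fisher's exact test (the hypergeometric upper tail) for a single GO term. It is an illustration only and not the GOATOOLS implementation; the function name, the gene counts and the choice of a one-sided test are assumptions made for the example.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, pop_hits, pop_size):
    # One-sided Fisher's exact test (hypergeometric upper tail): the
    # probability of drawing at least `study_hits` annotated genes when
    # `study_size` genes are sampled without replacement from a population
    # of `pop_size` genes, `pop_hits` of which carry the GO term.
    max_hits = min(study_size, pop_hits)
    favourable = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, max_hits + 1)
    )
    return favourable / comb(pop_size, study_size)


# Hypothetical counts: 40 of 5,000 genes carry the term, and 6 of them
# appear among 100 input genes.
p_value = enrichment_p_value(study_hits=6, study_size=100, pop_hits=40, pop_size=5000)
is_enriched = p_value < 0.05  # the 0.05 cutoff mentioned in the text
print(f'p = {p_value:.3g}, enriched: {is_enriched}')
```

In this toy case the term would be reported as enriched, because observing six annotated genes is far more than the single gene or so expected by chance.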
These enriched GO terms can be used to filter the nodes when visualizing the network, as explained in section 2.3. The full enrichment results can be downloaded as a CSV file.

2.2 Network visualization

PaIntDB has a graphical interface to explore the generated networks interactively. Its main goal is to allow the quick and easy identification of interesting groups of genes involved in the conditions of study. To do this, the user can zoom, pan and select nodes with the mouse (Fig. 4C). Gene names, descriptions, and experimental information, if included, of the selected nodes are shown in a table that can be downloaded as a CSV file (Fig. 4D).

The nodes are positioned using the neato layout from the Graphviz graph drawing suite. This layout algorithm models the network as a physical system of springs, in which each pair of nodes is pulled towards a separation proportional to the shortest path between them. The system is solved iteratively to place the nodes in a low-energy configuration.[67]

The node size is mapped to the node degree to quickly identify hub proteins, which are highly connected nodes. If differential expression data is included, the node color indicates up- or down-regulation, enabling the identification of co-regulated gene modules in the network. If Tn-Seq genes are included, the color mapping can be changed to indicate the experiment in which the genes were identified (Fig. 4A).

When the networks contain more than 1,000 nodes, the well-known hairball problem appears, where the over-abundance of nodes and edges makes visual inspection difficult and the underlying network structure is lost. Even with an ideal layout, extracting patterns from hundreds of on-screen nodes is still a challenging task. To tackle this issue, I implemented filters that take advantage of the user’s prior knowledge to generate smaller, biologically-relevant subnetworks.
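The degree-based node sizing described in this section, together with the network order parameter from section 2.1, can be illustrated with a small sketch. This is not the PaIntDB source code: the function names, the toy locus tags, the size formula and the reading of a first-order network as every interaction touching at least one input gene are all assumptions made for the example.

```python
from collections import Counter


def build_network(seeds, interactions, order=0):
    # interactions: iterable of (protein_a, protein_b) pairs.
    seeds = set(seeds)
    if order == 0:
        # Zero-order: keep only edges whose two ends are both input genes.
        return [(a, b) for a, b in interactions if a in seeds and b in seeds]
    # First-order: keep every edge that touches at least one input gene,
    # which pulls in direct neighbours that were not in the input list.
    return [(a, b) for a, b in interactions if a in seeds or b in seeds]


def node_sizes(edges, base=10, step=4):
    # Node degree is the number of edges touching a node; the drawn size
    # grows linearly with it so that hubs stand out.
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node: base + step * d for node, d in degree.items()}


# Hypothetical toy interactome using PAO1-style locus tags.
interactions = [
    ('PA0001', 'PA0002'),
    ('PA0001', 'PA0003'),
    ('PA0003', 'PA0004'),
    ('PA0002', 'PA0005'),
]
seeds = ['PA0001', 'PA0002', 'PA0004']

zero_order = build_network(seeds, interactions, order=0)
first_order = build_network(seeds, interactions, order=1)
sizes = node_sizes(first_order)
```

With these toy inputs, the zero-order network keeps a single edge and leaves PA0004 orphaned, whereas the first-order network adds PA0003 as a connecting neighbour, which is the behaviour that makes first-order networks useful for short gene lists.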
2.3 Subnetwork generation

Nodes can be selected according to their cellular location or the enriched GO terms containing them (Fig. 4B). If differential expression data is included, then it is possible to select up-regulated or down-regulated genes, and if a Tn-Seq dataset is included, there is another filter to select genes according to the experiment in which they were identified. All filters can be combined to fine-tune the selected nodes as desired, for example: “find all up-regulated genes identified through both Tn-Seq and RNA-Seq associated with DNA Repair and located in the cytoplasm”. Individual genes of interest can also be added by name or locus tag to the query.

Figure 4. Graphical user interface to explore PPI networks. This is a combined network connecting RNA-Seq and Tn-Seq genes. (A) Visual options. (B) Filters used to select specific nodes to extract subnetworks. These are generated dynamically depending on the network type. (C) Network view. Nodes can be selected and moved with the mouse. (D) Table showing details about the selected nodes, obtained either through the filters or with the mouse.

The next step is creating a subnetwork connecting the selected nodes using the smallest number of additional nodes possible; this involves the use of the Prize-collecting Steiner Forest (PCSF) algorithm, using the implementation included in OmicsIntegrator.[43,68] This algorithm assigns a weight, called a prize, to each selected node and a weight, called a cost, to each edge. It then finds the subnetwork that maximizes the collected prizes while minimizing the total edge cost. PCSF uses the whole interactome as background, so it can identify genes, called Steiner nodes, that were not included in the original list to connect the existing ones. The user can choose whether to include these additional nodes, and also whether to include low-confidence edges in the solution (Fig. 5).
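To make the trade-off between prizes and costs concrete, the sketch below scores a few candidate subnetworks with a simplified prize-collecting objective: the prizes of the selected nodes that are left out plus the costs of the edges that are used. It is not a PCSF solver and it omits the additional tuning parameters of the OmicsIntegrator implementation; the gene names, prizes and candidate subnetworks are hypothetical, and the edge costs follow the 0.5 and 1 scheme given in this section.

```python
def pcsf_objective(chosen_nodes, chosen_edges, prizes, costs):
    # Penalty: prizes of the selected (prize-carrying) nodes left out.
    missed = sum(p for node, p in prizes.items() if node not in chosen_nodes)
    # Cost: sum of the weights of the edges used to connect the nodes.
    paid = sum(costs[edge] for edge in chosen_edges)
    return missed + paid


# Hypothetical prizes, e.g. absolute fold changes of three selected genes.
prizes = {'geneA': 2.0, 'geneB': 1.5, 'geneC': 0.2}

# Hypothetical interactome: 0.5 for verified edges, 1.0 for predicted ones.
# 'geneX' carries no prize, so it can only enter as a Steiner node.
costs = {
    ('geneA', 'geneX'): 0.5,
    ('geneX', 'geneB'): 0.5,
    ('geneB', 'geneC'): 1.0,
}

candidates = {
    'empty': (set(), []),
    'A-X-B': ({'geneA', 'geneX', 'geneB'},
              [('geneA', 'geneX'), ('geneX', 'geneB')]),
    'all': ({'geneA', 'geneX', 'geneB', 'geneC'}, list(costs)),
}

scores = {
    name: pcsf_objective(nodes, edges, prizes, costs)
    for name, (nodes, edges) in candidates.items()
}
best = min(scores, key=scores.get)
```

Under these assumed weights the lowest score belongs to the candidate that links geneA and geneB through the Steiner node geneX and leaves out the low-prize geneC, since reaching geneC would cost more than its prize is worth.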
The subnetwork filtering is independent of the network order parameter, so subnetworks can be extracted from both zero-order and first-order networks.

For differential expression data, the prize is set to the gene’s absolute fold change to prioritize genes with large changes in expression. Tn-Seq genes, if included, are assigned the maximum prize, since it is assumed that they are of particular importance to the conditions of study. If no experimental data is included, then all genes are assigned the same prize. The interactions are assigned a cost of 0.5 if they are verified experimentally and a cost of 1 if they are not, since the higher the edge cost, the less likely it is to be included in the subnetwork.

Running the algorithm with these weights returns a high-confidence subnetwork that best links the selected nodes. This approach, when combined with the versatile filtering capabilities, allows the user to quickly generate smaller, manageable networks related to specific biological functions of interest, as shown in Chapter 3. Any network or subnetwork can be exported as a .graphml file if the user chooses to perform topological analyses or customize its visual appearance using NetworkAnalyst or Cytoscape.

Figure 5. Functional subnetworks generated with PaIntDB, showing genes involved in siderophore-mediated iron transport. The user has the option to include low-confidence interactions or Steiner nodes in the subnetwork. Steiner nodes are genes identified with the Prize-Collecting Steiner Forest algorithm that are not included in the original data, but help connect other genes.
In this case, the addition of fur, a global regulator of responses to limiting iron, indicated with an arrow, connects hmuV and PA14_62350 to the rest of the network, and tightens the network by providing a point of connectivity for what would otherwise be distant nodes.

Chapter 3: Applications of PaIntDB

PaIntDB enables the rapid identification of functionally-related genes, turning spreadsheets with thousands of rows into digestible PPI subnetworks within minutes. These subnetworks show gene relationships within and across metabolic pathways, thus aiding in the generation of new hypotheses about the underlying biology. This Chapter demonstrates the integration of RNA-Seq and Tn-Seq results to identify and characterize the behaviour of key actors in host-like media growth and biofilm formation in P. aeruginosa. Such integration has not been performed previously. The experiments that generated these datasets were performed by PhD students Corrie Belanger and Melanie Dostert.

3.1 Identification of genes important for growth in host-like conditions

Antibiotic susceptibility assays traditionally use Mueller Hinton broth (MHB) as a growth medium. MHB is nutrient-rich and does not reflect the in vivo conditions where bacteria grow during an infection. Since the bacterium’s metabolism will change depending on the nutrient composition of the medium, there is a need to develop assays using host-like media to make the conclusions more applicable to actual infections. Human blood is a low-nutrient environment, so bacteria must synthesize many cofactors and metabolic intermediates that are readily available in MHB. Iron, an essential nutrient for P. aeruginosa, is sequestered by transferrin in the blood and thus not bioavailable.
The scarcity of iron and of nucleotide precursors seems to be an important limiting factor for bacterial growth in blood.[69,70] Moreover, recent studies have shown that antibiotic susceptibilities change when using physiologically-relevant media.[71–73]

The Hancock Lab is characterizing the behaviour of P. aeruginosa in host-like conditions to find novel activities of known antibiotics; azithromycin in particular has shown enhanced antimicrobial activity under such conditions.[73] For this study, the host-like medium was RPMI, an enriched medium widely used for eukaryotic cell cultures, with the addition of human serum, to mimic wound exudate from infections or human blood. P. aeruginosa PAO1 was grown in RPMI with serum, and MHB was used as a control medium. Then, differentially-expressed genes were identified through RNA-Seq, with and without azithromycin treatment. Similarly, essential genes for the same growth and treatment conditions were identified using Tn-Seq.

3.1.1 Nucleotide metabolism

To find altered pathways involved in growth in serum, I created a zero-order network integrating 3,113 differentially-expressed genes from RNA-Seq with the corresponding 169 essential genes identified through Tn-Seq. The resulting network contained 2,067 nodes, so it was necessary to use the filters to obtain manageable subnetworks for analysis. To explore the regulatory changes in nucleotide synthesis in serum, I made a subnetwork using the enriched GO term “nucleotide metabolic process”, resulting in a smaller network with 74 genes.

Five genes participating in purine and pyrimidine metabolism (purEFL, pyrDC) were found to be essential and differentially-expressed, supporting the claims of the importance of nucleotide synthesis for growth in serum (Fig. 6). PA3505, a putative L-aspartate dehydrogenase,[74] was essential but not differentially-expressed.
Interestingly, most of the genes in the subnetwork were down-regulated (Fig. 7). Since nucleotide precursors are lacking in serum, this could reflect the conservation of resources by the bacterium by reducing the production of the enzymes that metabolize them. Additional experiments would be needed to test this hypothesis.

Figure 6. Subnetwork connecting differentially-expressed (RNA-Seq) and essential gene products (Tn-Seq) involved in nucleotide metabolism in P. aeruginosa PAO1 grown in serum. Distinct network regions grouped genes involved in the (A) adenosine triphosphate (ATP) synthesis, (B) purine synthesis and (C) pyrimidine synthesis pathways.

Figure 7. Genes involved in nucleotide metabolism were largely down-regulated in P. aeruginosa PAO1 grown in serum.

3.1.2 Iron uptake

I used a similar approach to identify genes involved in azithromycin susceptibility in serum, creating a zero-order network combining 2,206 differentially-expressed genes with 130 essential genes. The resulting network contained 1,447 genes. In this case, we were interested in the regulatory changes induced by the low iron availability in serum and the azithromycin treatment, so I generated a subnetwork using the enriched GO terms “iron ion transport”, “iron import into cell” and “pyoverdine metabolic process”. Pyoverdines are siderophore molecules produced by P. aeruginosa and other pseudomonads to chelate and assimilate iron from the surrounding environment and are important virulence factors.[75]

The subnetwork contained 10 small, disconnected components, so I included Steiner nodes to connect them into a larger network (Fig. 8). Generally, iron uptake and pyoverdine synthesis genes were up-regulated, presumably so the bacterium can extract as much iron as possible from the low-iron environment. Two sigma factors, FoxI and PA4896, were essential and differentially-expressed (Fig. 8 inset).
Sigma factors are specific transcription factors and components of RNA polymerase exclusive to bacteria. FoxI regulates the expression of FoxA,[76] a receptor of ferrioxamine, another iron chelating agent. PA4896 regulates genes participating in the synthesis of pyocins,[77] which are toxins active against closely-related strains. However, it is unclear how these specific genes relate to the azithromycin treatment, and additional experiments would be needed to find any specific connection.

Figure 8. Subnetwork connecting genes involved in iron uptake, regulation and pyoverdine biosynthesis in P. aeruginosa PAO1 grown in serum and treated with azithromycin. Most genes are up-regulated in this low-iron environment. Inset: foxI and PA4896, both sigma factors, are essential and differentially-expressed under the azithromycin treatment in serum. Steiner nodes (in gray) were added to connect the independent components.

3.2 Identification of genes involved in biofilm formation

P. aeruginosa is a leading cause of chronic infections due to its ability to form biofilms on medical devices and mucosal surfaces, where the bacterial cells lose motility, aggregate, and secrete an extracellular matrix that potentiates antibiotic resistance and protects them against the host immune system.[78,79] To identify and characterize genes involved in this process, P. aeruginosa PA14 was grown in planktonic and biofilm conditions and differentially-expressed genes were identified using RNA-Seq. Similarly, the transposon mutant libraries were grown in the same conditions to identify essential genes through Tn-Seq. For biofilms relative to planktonic cells, 1,302 differentially-expressed genes and 129 essential genes were identified. Once again, I integrated these datasets by building a zero-order combined network containing 729 nodes.

To filter this network, I selected the genes associated with the enriched GO term “multi-organism process”.
This GO term is defined as \"a biological process which involves another organism of the same or different species\", thus biofilms are by definition a multi-organism process. Pel is one of the exopolysaccharides that forms the extracellular matrix in P. aeruginosa biofilms,[80] and predictably the genes that encode its biosynthesis were up-regulated (Fig. 9A).

Genes encoding the Type III secretion system (T3SS) were included in this network since this system involves the interaction of the bacteria with the host cells, also making it a multi-organism process. The T3SS creates a molecular needle that penetrates the host cell membrane and secretes toxins directly into the cytoplasm during infection.[81] Biofilms are typically associated with chronic infections, so virulence systems such as the T3SS are not as necessary and thus down-regulated (Fig. 9B). However, the gene encoding one of the main components of the T3SS, pscF, was identified as essential for biofilm growth. Another essential gene, xcpU, is part of the Type II secretion system and has been previously shown to be essential for swarming motility[82] and for growth in airway mucus where cystic fibrosis chronic infections take place.[83]

3.2.1 Energy metabolism in biofilms

Finally, we were interested in characterizing energy metabolism in P. aeruginosa biofilms. P. aeruginosa has versatile energy metabolism, allowing it to thrive in aerobic, microaerobic and anaerobic environments.[84] Biofilms are large three-dimensional structures, and due to their size, a decreasing gradient of nutrients and oxygen is created from the periphery to the center of the biofilm. Thus, diverse microenvironments exist within P. aeruginosa biofilms, and the bacteria must adapt their metabolism accordingly.

Figure 9. Subnetwork generated using the enriched GO term \"multi-organism process\" in P. aeruginosa PA14 growing as a biofilm. (A) Pel synthesis genes are up-regulated.
(B) Genes in the type III secretion system are down-regulated. XcpU is a component of the Type II secretion system. The genes pscF and xcpU are essential.

To examine this process, I generated a subnetwork using the enriched GO term “electron transfer activity” (Fig. 10). This subnetwork shows the up-regulation of genes involved in aerobic (sdhBCD, nuoGE, PA14_57570, -60, -40) and anaerobic respiratory chains (nor, nir, nar, nos, PA14_06790) and up-regulation of terminal oxidases with a range of affinities to oxygen (cioA, coxAB-coIII, PA14_10500, PA14_44340, -50, -60, -70, PA14_40000). The up-regulation of these distinct energy metabolism pathways supports the existence of metabolically-diverse subpopulations in biofilms.

Figure 10. Subnetwork of genes involved in respiratory metabolism in P. aeruginosa PA14 biofilm. Genes participating in both aerobic and anaerobic metabolism are upregulated, as well as terminal oxidases with varying affinities to oxygen.

Chapter 4: Conclusion

The network-based approach presented here enables the rapid interpretation, integration and visualization of large high-throughput datasets with thousands of significant genes. PaIntDB combines network analysis features that usually require more than one tool, and because it is implemented as a web application with an intuitive graphical interface, these features are available for users without any special training or a computational background.

The filtering system deals with the hairball issue in large networks more effectively than the existing alternatives, combining and expanding the biological selection feature found in NetworkAnalyst with the Prize-collecting Steiner Forest algorithm used in OmicsIntegrator to generate high-confidence, biologically-relevant subnetworks.
When there are many separate components, the addition of Steiner nodes can connect them to contextualize the different pathways within a larger network.

PaIntDB employs the most comprehensive Pseudomonas aeruginosa interactome to date. Following in the footsteps of the popular InnateDB,[32] which contains curated interactions in the innate immune response, PaIntDB is the first step towards developing a similar knowledgebase of molecular interactions in a pathogenic bacterium. I demonstrated the use of PaIntDB to characterize the behaviour of functional gene groups involved in growth under physiologically-relevant conditions and during biofilm formation. I showed how the subnetworks naturally group the genes according to their pathways or operons, thus aiding in the generation of new hypotheses about the underlying biology behind the observed changes.

4.1 Limitations

The main caveat to this network-based approach is the limited availability of PPI data for bacteria. Overall, 3,829 proteins in strain PAO1 and 3,722 proteins in strain PA14 have at least one interaction recorded in PaIntDB, representing 67.3% and 62.2% of their respective genomes. When only considering interactions verified experimentally, the coverage is reduced to 30.5% in PAO1 and 26.5% in PA14. This must be taken into account when analyzing data with PaIntDB, since any analysis will always cover only a subset of all the identified genes. Similarly, many genes that have interaction data are not yet annotated, and the networks often contain genes for which the function is still unknown, so PaIntDB can also be a valuable tool to generate hypotheses regarding their function based on their interactions with well-characterized genes.

Another assumption in this approach is that all molecular interactions happen at the same time.
PPIs are dynamic and often involve the formation of short-lived protein complexes that only occur under certain environmental conditions or during different stages of growth and infection.[85] Therefore, the networks generated by PaIntDB include interactions that might not actually be happening under the conditions of study. However, this issue is common to most existing PPI network-based approaches, and overall, the primary goal of PaIntDB is to aid in generating new hypotheses and suggesting confirmatory experiments that reflect the conditions of study, rather than accurately reflecting the interactions happening in the cell at a specific time point.

The web application is responsive when exploring networks up to around 1,200 genes. As networks get larger than this number, performance falters when selecting nodes and generating subnetworks, taking more than 5 seconds for each, although the analyses can still be run and the network can be explored with fluidity. The Dash framework used to build the application requires the network to be stored client-side in the browser, introducing this latency when it is modified, sent to the server, then back to the client. An alternative approach for these larger networks would be to not draw them at all, using the filtering system to select the desired nodes in a table view, then generating and visualizing a smaller subnetwork.

4.2 Future directions

PaIntDB can take a list of genes derived from any omics experiment, but its integration features are focused on RNA-Seq and Tn-Seq due to their popularity for bacterial studies. In principle, functionality can be extended to integrate different combinations, such as transcriptomics with proteomics, and to allow the integration of more than two datasets. The database contains 14,363 protein-metabolite interactions that are unused in the application’s initial release.
These interactions could be used to integrate metabolomics with other omics datasets by mapping the metabolites to their interacting proteins in a similar fashion to MetaBridge,[86] another web-based tool developed in our lab. Due to its open source nature and implementation, PaIntDB can be extended to include other bacterial species if their interaction data is compiled following the same database schema. The source code is available under a permissive BSD license, allowing the community to develop, modify and extend it. The network generation module and visualization interface are species-agnostic.

To address the limited GO annotations in P. aeruginosa, pathway enrichment support can be added utilizing existing databases such as KEGG,[28] MetaCyc,[29] or Reactome.[30] These would work using the same enrichment approach, binning the input genes into their respective pathways instead of GO terms, and would give the user the option to generate pathway-specific subnetworks. Similarly, the annotations from the Comprehensive Antibiotic Resistance Database (CARD),[87] which collects and organizes information about the resistome in P. aeruginosa and other pathogenic bacteria, could be added to enable the selection of genes linked with antibiotic resistance.

PaIntDB could take more advantage of the network topology to identify interesting structures and nodes. The only network topology statistic used so far is the simplest: node degree. Other topological information, such as node centrality or betweenness, could give a better measure of a node’s importance in the network.
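To illustrate why betweenness can rank nodes differently from degree, the sketch below computes betweenness centrality for a small unweighted, undirected network using Brandes' algorithm. It is an illustration of the proposed extension and not part of PaIntDB; the function name and the toy network, two triangles joined through a single bridging node, are assumptions made for the example.

```python
from collections import deque


def betweenness(adjacency):
    # Brandes' algorithm for unweighted, undirected graphs.
    # adjacency: dict mapping each node to a list of its neighbours.
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        stack = []
        preds = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}  # number of shortest paths
        dist = {v: -1 for v in adjacency}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical network: triangles a-b-c and d-e-f joined through node x.
network = {
    'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'x'],
    'x': ['c', 'd'],
    'd': ['x', 'e', 'f'], 'e': ['d', 'f'], 'f': ['d', 'e'],
}
centrality = betweenness(network)
degree = {node: len(neighbours) for node, neighbours in network.items()}
```

In this toy network the bridging node x has the same degree as the peripheral node a, yet it lies on every shortest path between the two triangles and therefore receives the highest betweenness, which degree alone would not reveal.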
Community detection algorithms can also be29implemented  to  find  highly-interconnected  modules,  enabling  the  comparison  of  thesetopological groups with the biologically-informed subnetworks.In  its  current  implementation,  PaIntDB  assumes  the  user  has  pre-existing  biologicalknowledge to select specific genes according to enriched GO terms or cellular locations theyare already interested  in.  GO terms  have  a  hierarchical  structure,  so that  filter  could  beimproved by selecting terms at a specific level in the hierarchy, since most networks have 50-100 statistically enriched terms that are hard to organize mentally just by looking at a drop-down/search menu. 30Bibliography1. Moradali MF, Ghods S, Rehm BHA. Pseudomonas aeruginosa lifestyle: a paradigm foradaptation, survival, and persistence. Frontiers in Cellular and Infection Microbioliogy.2017;7. doi:10.3389/fcimb.2017.000392. Pang Z, Raudonis R, Glick BR, Lin T-J, Cheng Z. Antibiotic resistance in Pseudomonasaeruginosa: mechanisms and alternative therapeutic strategies. Biotechnology Advances. 2019;37(1):177-192. doi:10.1016/j.biotechadv.2018.11.0133. Sun E, Gill EE, Falsafi R, Yeung A, Liu S, Hancock REW. Broad-spectrum adaptive antibiotic resistance associated with Pseudomonas aeruginosa mucin-dependent surfingmotility. Antimicrobial Agents and Chemotherapy. 2018; 62(9). doi:10.1128/AAC.00848-184. Taylor PK, Yeung ATY, Hancock REW. Antibiotic resistance in Pseudomonas aeruginosa biofilms: Towards the development of novel anti-biofilm therapies. Journal of Biotechnology. 2014; 191:121-130. doi:10.1016/j.jbiotec.2014.09.0035. de la Fuente-Núñez C, Reffuveille F, Fernández L, Hancock RE. Bacterial biofilm development as a multicellular adaptation: antibiotic resistance and new therapeutic strategies. Current Opinion in Microbiology. 2013; 16(5):580-589. doi:10.1016/j.mib.2013.06.0136. Yeung ATY, Torfs ECW, Jamshidi F, et al. 
Swarming of Pseudomonas aeruginosa is controlled by a broad spectrum of transcriptional regulators, Including MetR. Journal of Bacteriology. 2009;191(18):5592-5602. doi:10.1128/JB.00157-097. Pletzer D, Sun E, Ritchie C, et al. Surfing motility is a complex adaptation dependent on the stringent stress response in Pseudomonas aeruginosa LESB58. PLOS Pathogens.2020;16(3):e1008444. doi:10.1371/journal.ppat.10084448. Smith RS, Iglewski BH. P. aeruginosa quorum-sensing systems and virulence. Current Opinion in Microbiology. 2003;6(1):56-60. doi:10.1016/S1369-5274(03)00008-09. Coleman SR, Smith ML, Spicer V, Lao Y, Mookherjee N, Hancock REW. Overexpression of the small RNA PA0805.1 in Pseudomonas aeruginosa modulates theexpression of a large set of genes and proteins, resulting in altered motility, cytotoxicity,and tobramycin resistance. mSystems. 2020;5(3). doi:10.1128/mSystems.00204-2010. Breidenstein EBM, Khaira BK, Wiegand I, Overhage J, Hancock REW. Complex ciprofloxacin resistome revealed by screening a Pseudomonas aeruginosa mutant Library for Altered Susceptibility. Antimicrobial Agents and Chemotherapy. 2008;52(12):4486-4491. doi:10.1128/AAC.00222-0811. Han M-L, Zhu Y, Creek DJ, et al. Comparative metabolomics and transcriptomics teveal multiple pathways associated with polymyxin killing in Pseudomonas aeruginosa. mSystems. 2019;4(1). doi:10.1128/mSystems.00149-183112. Alford MA, Baghela A, Yeung ATY, Pletzer D, Hancock REW. NtrBC regulates invasiveness and virulence of Pseudomonas aeruginosa during high-density Infection. Frontiers in Microbiology. 2020;11. doi:10.3389/fmicb.2020.0077313. Molina-Mora JA, Chinchilla-Montero D, Chavarría-Azofeifa M, et al. Transcriptomic determinants of the response of ST-111 Pseudomonas aeruginosa AG1 to ciprofloxacin identified by a top-down systems biology approach. Scientific Reports. 2020;10(1):13717. doi:10.1038/s41598-020-70581-214. Yan X, He B, Liu L, et al. 
Antibacterial mechanism of silver nanoparticles in Pseudomonas aeruginosa: proteomics approach. Metallomics. 2018;10(4):557-564. doi:10.1039/C7MT00328E15. Piatek M, Griffith DM, Kavanagh K. Quantitative proteomic reveals gallium maltolate induces an iron-limited stress response and reduced quorum-sensing in Pseudomonas aeruginosa. Journal of Biological Inorganic Chemistry. 2020;25(8):1153-1165. doi:10.1007/s00775-020-01831-x16. Coleman SR, Bains M, Smith ML, et al. The small RNAs PA2952.1 and PrrH as regulators of virulence, motility and iron metabolism in Pseudomonas aeruginosa. Applied and Environmenta Microbioliogy. Published online November 6, 2020. doi:10.1128/AEM.02182-2017. Mielko KA, Jabłoński SJ, Milczewska J, Sands D, Łukaszewicz M, Młynarz P. Metabolomic studies of Pseudomonas aeruginosa. World Journal of Microbioliogy and Biotechnology 2019;35(11). doi:10.1007/s11274-019-2739-118. McAdam PR, Richardson EJ, Fitzgerald JR. High-throughput sequencing for the study of bacterial pathogen biology. Current Opinion in Microbiology. 2014;19:106-113. doi:10.1016/j.mib.2014.06.00219. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15(12):550. doi:10.1186/s13059-014-0550-820. Law CW, Chen Y, Shi W, Smyth GK. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology. 2014;15(2):R29. doi:10.1186/gb-2014-15-2-r2921. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139-140. doi:10.1093/bioinformatics/btp61622. van Opijnen T, Bodi KL, Camilli A. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nature Methods. 2009;6(10):767-772. doi:10.1038/nmeth.13773223. Jensen PA, Zhu Z, van Opijnen T. 
Antibiotics disrupt coordination between transcriptional and phenotypic stress responses in pathogenic bacteria. Cell Reports. 2017;20(7):1705-1716. doi:10.1016/j.celrep.2017.07.06224. Mathur R, Rotroff D, Ma J, Shojaie A, Motsinger-Reif A. Gene set analysis methods: a systematic comparison. BioData Mining. 2018;11(1):8. doi:10.1186/s13040-018-0166-825. Zyla J, Marczyk M, Domaszewska T, Kaufmann SHE, Polanska J, Weiner J. Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms. Bioinformatics. 2019;35(24):5146-5154. doi:10.1093/bioinformatics/btz44726. Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25-29. doi:10.1038/7555627. Gene Ontology Consortium T. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research. 2019;47(D1):D330-D338. doi:10.1093/nar/gky105528. Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Research. doi:10.1093/nar/gkaa97029. Caspi R, Billington R, Fulcher CA, et al. The MetaCyc database of metabolic pathways and enzymes. Nucleic Acids Research. 2018; 46(D1):D633-D639. doi:10.1093/nar/gkx93530. Jassal B, Matthews L, Viteri G, et al. The reactome pathway knowledgebase. Nucleic Acids Research. 2020;48(D1):D498-D503. doi:10.1093/nar/gkz103131. Galán-Vásquez E, Luna B, Martínez-Antonio A. The regulatory network of Pseudomonas aeruginosa. Microbial Informatics and Experimentation. 2011;1(1):3. doi:10.1186/2042-5783-1-332. Breuer K, Foroushani AK, Laird MR, et al. InnateDB: systems biology of innate immunity and beyond—recent updates and continuing curation. Nucleic Acids Research. 2013;41(Database issue):D1228-D1233. doi:10.1093/nar/gks114733. Wanichthanarak K, Fahrmann JF, Grapov D. Genomic, proteomic, and metabolomic data integration strategies. Biomarker Insights. 2015;10(Suppl 4):1-6. doi:10.4137/BMI.S2951134. 
Lee AH, Shannon CP, Amenyogbe N, et al. Dynamic molecular changes during the first week of human life follow a robust developmental trajectory. Nature Communications. 2019;10(1):1-14. doi:10.1038/s41467-019-08794-x
35. Orchard S, Ammari M, Aranda B, et al. The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Research. 2013;42(Database issue):D358-63. doi:10.1093/nar/gkt1115
36. Szklarczyk D, Gable AL, Lyon D, et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research. 2019;47(D1):D607-D613. doi:10.1093/nar/gky1131
37. Anupama R, Sajitha Lulu S, Mukherjee A, Babu S. Cross-regulatory network in Pseudomonas aeruginosa biofilm genes and TiO2 anatase induced molecular perturbations in key proteins unraveled by a systems biology approach. Gene. 2018;647:289-296. doi:10.1016/j.gene.2018.01.042
38. Miryala SK, Anbarasu A, Ramaiah S. Systems biology studies in Pseudomonas aeruginosa PA01 to understand their role in biofilm formation and multidrug efflux pumps. Microbial Pathogenesis. 2019;136:103668. doi:10.1016/j.micpath.2019.103668
39. Karnovsky A, Weymouth T, Hull T, et al. Metscape 2 bioinformatics tool for the analysis and visualization of metabolomics and gene expression data. Bioinformatics. 2012;28(3):373-380. doi:10.1093/bioinformatics/btr661
40. Hu Z, Hung J-H, Wang Y, et al. VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology. Nucleic Acids Research. 2009;37(suppl_2):W115-W121. doi:10.1093/nar/gkp406
41. Warde-Farley D, Donaldson SL, Comes O, et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Research. 2010;38(suppl_2):W214-W220. doi:10.1093/nar/gkq537
42. Zhang M, Su S, Bhatnagar RK, Hassett DJ, Lu LJ.
Prediction and analysis of the protein interactome in Pseudomonas aeruginosa to enable network-based drug target selection. PLOS ONE. 2012;7(7):e41202. doi:10.1371/journal.pone.0041202
43. Tuncbag N, Gosline SJC, Kedaigle A, Soltis AR, Gitter A, Fraenkel E. Network-based interpretation of diverse high-throughput datasets through the OmicsIntegrator software package. PLOS Computational Biology. 2016;12(4):e1004879. doi:10.1371/journal.pcbi.1004879
44. Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 2003;13(11):2498-2504. doi:10.1101/gr.1239303
45. Bindea G, Mlecnik B, Hackl H, et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009;25(8):1091-1093. doi:10.1093/bioinformatics/btp101
46. Doncheva NT, Morris JH, Gorodkin J, Jensen LJ. Cytoscape StringApp: Network analysis and visualization of proteomics data. Journal of Proteome Research. 2019;18(2):623-632. doi:10.1021/acs.jproteome.8b00702
47. Zhou G, Soufan O, Ewald J, Hancock REW, Basu N, Xia J. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Research. 2019;47(W1):W234-W241. doi:10.1093/nar/gkz240
48. Winsor GL, Griffiths EJ, Lo R, Dhillon BK, Shay JA, Brinkman FSL. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database. Nucleic Acids Research. 2016;44(D1):D646-653. doi:10.1093/nar/gkv1227
49. Kerrien S, Orchard S, Montecchi-Palazzi L, et al. Broadening the horizon – level 2.5 of the HUPO-PSI format for molecular interactions. BMC Biology. 2007;5(1):44. doi:10.1186/1741-7007-5-44
50. Huang T-W, Tien A-C, Huang W-S, et al. POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome. Bioinformatics. 2004;20(17):3273-3276. doi:10.1093/bioinformatics/bth366
51.
Whiteside MD, Winsor GL, Laird MR, Brinkman FSL. OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis. Nucleic Acids Research. 2013;41(D1):D366-D376. doi:10.1093/nar/gks1241
52. Prieto C, De Las Rivas J. APID: Agile Protein Interaction DataAnalyzer. Nucleic Acids Research. 2006;34(Web Server issue):W298-W302. doi:10.1093/nar/gkl128
53. Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Research. 2007;35(Database issue):D198-201. doi:10.1093/nar/gkl999
54. Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D. DIP: the Database of Interacting Proteins. Nucleic Acids Research. 2000;28(1):289-291.
55. Keseler IM, Mackie A, Santos-Zavaleta A, et al. The EcoCyc database: reflecting new knowledge about Escherichia coli K-12. Nucleic Acids Research. 2017;45(D1):D543-D550. doi:10.1093/nar/gkw1003
56. Orchard S, Kerrien S, Abbani S, et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nature Methods. 2012;9(4):345-350. doi:10.1038/nmeth.1931
57. Orchard S, Ammari M, Aranda B, et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Research. 2014;42(Database issue):D358-D363. doi:10.1093/nar/gkt1115
58. Razick S, Magklaras G, Donaldson IM. iRefIndex: A consolidated protein interaction database with provenance. BMC Bioinformatics. 2008;9(1):405. doi:10.1186/1471-2105-9-405
59. Calderone A, Castagnoli L, Cesareni G. mentha: a resource for browsing integrated protein-interaction networks. Nature Methods. 2013;10(8):690-691. doi:10.1038/nmeth.2561
60. Licata L, Briganti L, Peluso D, et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Research. 2012;40(D1):D857-D861. doi:10.1093/nar/gkr930
61. Goll J, Rajagopala SV, Shiau SC, Wu H, Lamb BT, Uetz P.
MPIDB: the microbial protein interaction database. Bioinformatics. 2008;24(15):1743-1744. doi:10.1093/bioinformatics/btn285
62. Santos-Zavaleta A, Salgado H, Gama-Castro S, et al. RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12. Nucleic Acids Research. 2019;47(D1):D212-D220. doi:10.1093/nar/gky1077
63. The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Research. 2019;47(D1):D506-D515. doi:10.1093/nar/gky1049
64. Schweppe DK, Zheng C, Chavez JD, et al. XLinkDB 2.0: integrated, large-scale structural analysis of protein crosslinking data. Bioinformatics. 2016;32(17):2716-2718. doi:10.1093/bioinformatics/btw232
65. Franz M, Lopes CT, Huck G, Dong Y, Sumer O, Bader GD. Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics. 2016;32(2):309-311. doi:10.1093/bioinformatics/btv557
66. Klopfenstein DV, Zhang L, Pedersen BS, et al. GOATOOLS: A Python library for Gene Ontology analyses. Scientific Reports. 2018;8(1):10872. doi:10.1038/s41598-018-28948-z
67. Gansner ER, North SC. An open graph visualization system and its applications to software engineering. Software - Practice and Experience. 2000;30(11):1203-1233.
68. Hegde C, Indyk P, Schmidt L. A nearly-linear time framework for graph-structured sparsity. In: IJCAI International Joint Conference on Artificial Intelligence. Vol 2016-January. 2016:4165-4169.
69. Weber BS, De Jong AM, Guo ABY, et al. Genetic and chemical screening in human blood serum reveals unique antibacterial targets and compounds against Klebsiella pneumoniae. Cell Reports. 2020;32(3):107927. doi:10.1016/j.celrep.2020.107927
70. Samant S, Lee H, Ghassemi M, et al. Nucleotide biosynthesis is critical for growth of bacteria in human blood. PLOS Pathogens. 2008;4(2):e37. doi:10.1371/journal.ppat.0040037
71. Colquhoun JM, Wozniak RAF, Dunman PM.
Clinically relevant growth conditions alter Acinetobacter baumannii antibiotic susceptibility and promote identification of novel antibacterial agents. PLOS ONE. 2015;10(11):e0143033. doi:10.1371/journal.pone.0143033
72. Lin L, Nonejuie P, Munguia J, et al. Azithromycin synergizes with cationic antimicrobial peptides to exert bactericidal and therapeutic activity against highly multidrug-resistant gram-negative bacterial pathogens. EBioMedicine. 2015;2(7):690-698. doi:10.1016/j.ebiom.2015.05.021
73. Belanger CR, Lee AH-Y, Pletzer D, Dhillon BK, Falsafi R, Hancock REW. Identification of novel targets of azithromycin activity against Pseudomonas aeruginosa grown in physiologically relevant media. PNAS. Published online December 10, 2020. doi:10.1073/pnas.2007626117
74. Li Y, Kawakami N, Ogola HJO, et al. A novel L-aspartate dehydrogenase from the mesophilic bacterium Pseudomonas aeruginosa PAO1: molecular characterization and application for L-aspartate production. Applied Microbiology and Biotechnology. 2011;90(6):1953-1962. doi:10.1007/s00253-011-3208-4
75. Kang D, Revtovich AV, Chen Q, Shah KN, Cannon CL, Kirienko NV. Pyoverdine-dependent virulence of Pseudomonas aeruginosa isolates from cystic fibrosis patients. Frontiers in Microbiology. 2019;10. doi:10.3389/fmicb.2019.02048
76. Bastiaansen KC, van Ulsen P, Wijtmans M, Bitter W, Llamas MA. Self-cleavage of the Pseudomonas aeruginosa cell-surface signaling anti-sigma factor FoxR occurs through an N-O acyl rearrangement. Journal of Biological Chemistry. 2015;290(19):12237-12246. doi:10.1074/jbc.M115.643098
77. Llamas MA, Mooij MJ, Sparrius M, Vandenbroucke‐Grauls CMJE, Ratledge C, Bitter W. Characterization of five novel Pseudomonas aeruginosa cell-surface signalling systems. Molecular Microbiology. 2008;67(2):458-472. doi:10.1111/j.1365-2958.2007.06061.x
78. Gellatly SL, Hancock REW. Pseudomonas aeruginosa: new insights into pathogenesis and host defenses. Pathogens and Disease. 2013;67(3):159-173.
doi:10.1111/2049-632X.12033
79. Ciofu O, Tolker-Nielsen T. Tolerance and resistance of Pseudomonas aeruginosa biofilms to antimicrobial agents—How P. aeruginosa can escape antibiotics. Frontiers in Microbiology. 2019;10. doi:10.3389/fmicb.2019.00913
80. Jennings LK, Storek KM, Ledvina HE, et al. Pel is a cationic exopolysaccharide that cross-links extracellular DNA in the Pseudomonas aeruginosa biofilm matrix. PNAS. 2015;112(36):11353-11358. doi:10.1073/pnas.1503058112
81. Hauser AR. The Type III secretion system of Pseudomonas aeruginosa: Infection by injection. Nature Reviews Microbiology. 2009;7(9):654-665. doi:10.1038/nrmicro2199
82. Overhage J, Lewenza S, Marr AK, Hancock REW. Identification of genes involved in swarming motility using a Pseudomonas aeruginosa PAO1 Mini-Tn5-lux mutant library. Journal of Bacteriology. 2007;189(5):2164-2169. doi:10.1128/JB.01623-06
83. Alrahman MA, Yoon SS. Identification of essential genes of Pseudomonas aeruginosa for its growth in airway mucus. Journal of Microbiology. 2017;55(1):68-74. doi:10.1007/s12275-017-6515-3
84. Arai H. Regulation and function of versatile aerobic and anaerobic respiratory metabolism in Pseudomonas aeruginosa. Frontiers in Microbiology. 2011;2. doi:10.3389/fmicb.2011.00103
85. Chen B, Fan W, Liu J, Wu F-X. Identifying protein complexes and functional modules—from static PPI networks to dynamic PPI networks. Briefings in Bioinformatics. 2014;15(2):177-194. doi:10.1093/bib/bbt039
86. Hinshaw SJ, Lee AHY, Gill EE, Hancock REW. MetaBridge: enabling network-based integrative analysis via direct protein interactors of metabolites. Bioinformatics. 2018;34(18):3225-3227. doi:10.1093/bioinformatics/bty331
87. Alcock BP, Raphenya AR, Lau TTY, et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Research. 2020;48(D1):D517-D525. doi:10.1093/nar/gkz935"@en ;
  edm:hasType "Thesis/Dissertation"@en ;
  vivo:dateIssued "2021-05"@en ;
  edm:isShownAt "10.14288/1.0395820"@en ;
  dcterms:language "eng"@en ;
  ns0:degreeDiscipline "Bioinformatics"@en ;
  edm:provider "Vancouver : University of British Columbia Library"@en ;
  dcterms:publisher "University of British Columbia"@en ;
  dcterms:rights "Attribution-NonCommercial-NoDerivatives 4.0 International"@* ;
  ns0:rightsURI "http://creativecommons.org/licenses/by-nc-nd/4.0/"@* ;
  ns0:scholarLevel "Graduate"@en ;
  dcterms:title "Network-based integration and visualization of high-throughput datasets in Pseudomonas aeruginosa"@en ;
  dcterms:type "Text"@en ;
  ns0:identifierURI "http://hdl.handle.net/2429/77248"@en .