UBC Faculty Research and Publications

Gene co-expression network analysis in Rhodobacter capsulatus and application to comparative expression… Peña-Castillo, Lourdes; Mercer, Ryan G; Gurinovich, Anastasia; Callister, Stephen J; Wright, Aaron T; Westbye, Alexander B; Beatty, J T; Lang, Andrew S Aug 28, 2014

RESEARCH ARTICLE Open Access

Gene co-expression network analysis in Rhodobacter capsulatus and application to comparative expression analysis of Rhodobacter sphaeroides

Lourdes Peña-Castillo1,2*, Ryan G Mercer1, Anastasia Gurinovich2, Stephen J Callister3, Aaron T Wright3, Alexander B Westbye4, J Thomas Beatty4 and Andrew S Lang1*

Peña-Castillo et al. BMC Genomics 2014, 15:730
http://www.biomedcentral.com/1471-2164/15/730

Abstract

Background: The genus Rhodobacter contains purple nonsulfur bacteria found mostly in freshwater environments. Representative strains of two Rhodobacter species, R. capsulatus and R. sphaeroides, have had their genomes fully sequenced and both have been the subject of transcriptional profiling studies. Gene co-expression networks can be used to identify modules of genes with similar expression profiles. Functional analysis of gene modules can then associate co-expressed genes with biological pathways, and network statistics can determine the degree of module preservation in related networks. In this paper, we constructed an R. capsulatus gene co-expression network, performed functional analysis of identified gene modules, and investigated preservation of these modules in R. capsulatus proteomics data and in R. sphaeroides transcriptomics data.

Results: The analysis identified 40 gene co-expression modules in R. capsulatus. Investigation of the module gene contents and expression profiles revealed patterns that were validated based on previous studies supporting the biological relevance of these modules. We identified two R. capsulatus gene modules preserved in the protein abundance data. We also identified several gene modules preserved between both Rhodobacter species, which indicate that these cellular processes are conserved between the species and are candidates for functional information transfer between species. Many gene modules were non-preserved, providing insight into processes that differentiate the two species. In addition, using Local Network Similarity (LNS), a recently proposed metric for expression divergence, we assessed the expression conservation of between-species pairs of orthologs, and within-species gene-protein expression profiles.

Conclusions: Our analyses provide new sources of information for functional annotation in R. capsulatus because uncharacterized genes in modules are now connected with groups of genes that constitute a joint functional annotation. We identified R. capsulatus modules enriched with genes for ribosomal proteins, porphyrin and bacteriochlorophyll anabolism, and biosynthesis of secondary metabolites to be preserved in R. sphaeroides whereas modules related to RcGTA production and signalling showed lack of preservation in R. sphaeroides. In addition, we demonstrated that network statistics may also be applied within-species to identify congruence between mRNA expression and protein abundance data for which simple correlation measurements have previously had mixed results.

Keywords: Comparative transcriptomics, Module preservation, Gene-protein expression conservation, Rhodobacter capsulatus, Rhodobacter sphaeroides

* Correspondence: lourdes@mun.ca; aslang@mun.ca
1Department of Biology, Memorial University of Newfoundland, St. John’s, NL A1B 3X5, Canada
2Department of Computer Science, Memorial University of Newfoundland, St. John’s, NL, Canada
Full list of author information is available at the end of the article

© 2014 Peña-Castillo et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Species in the genus Rhodobacter are purple nonsulfur bacteria found mostly in freshwater environments [1]. A hallmark of purple nonsulfur bacteria is that they display tremendous physiological diversity [2]. Genome sequences are available from two Rhodobacter species, R. capsulatus [3] and R. sphaeroides [4], and transcriptional profiling studies have been performed with both species [5-8]. These two species have been widely studied as model organisms for anoxygenic photosynthesis, carbon and nitrogen fixation, chemotaxis and flagellar motility, and various regulatory systems including quorum sensing, two-component phosphorelays and those responsible for regulation in response to O2 and light [9-12]. R. capsulatus is also a model organism for study of a gene transfer agent, RcGTA, which is a virus-like particle that packages small segments of the genome of a GTA-producing cell that can then be transferred to recipient cells [13].

Weighted gene co-expression network analysis (WGCNA) has been widely used to analyze transcriptional profiles since its introduction in 2005 [14,15], and has proved to be a useful approach for the functional annotation of uncharacterized genes [16,17]. In a recent critical assessment of methods for constructing gene networks [18] WGCNA was found to be one of the methods that performed the best for constructing global co-expression networks. After network construction, functional analysis focuses on groups of tightly connected genes (known as modules) instead of single genes. Because genes within the same modules tend to maintain a consistent, correlated expression relationship independent of phenotype or experimental condition, such genes are assumed to be functionally associated, and shared regulatory and/or functional pathways may be inferred. In addition, WGCNA offers functionality to assess whether gene modules are preserved in other networks [19]. Preserved gene modules
Preserved gene modulesindicate biological processes that are conserved betweenspecies and may be candidates for functional informationtransfer between species. Non-preserved gene modulesreflect species-specific modules, which may provide insightinto biological processes that have diverged betweenspecies. Recently, a metric for expression divergence calledLocal Network Similarity (LNS) was proposed to assessexpression conservation of a pair of orthologs [20]. LNS isthe correlation between the correlations of the pair oforthologs’ expression and the expression patterns of allother identified orthologs. This metric differs from themodule preservation statistics obtained by WGCNA inthat it is applied to a pair of genes instead of to a genemodule. LNS and WGCNA may also be applied to diversedatasets such as mRNA expression and protein abun-dance data. Observations of low to moderate correlationsbetween mRNA expression and protein abundance dataare recurrent in the literature [21,22], indicating thatnetwork-based metrics of similarity may be more suitableto compare these two types of data.In this study, we constructed an R. capsulatus geneco-expression network, and took advantage of themodule preservation functionality in WGCNA to identifyR. capsulatus gene modules preserved in a collection ofpublished R. sphaeroides mRNA expression data, and in aR. capsulatus proteomics dataset. In addition, we calcu-lated LNS for all 2175 pairs of orthologs between the twoRhodobacter species, and we also applied this metric toassess whether R. capsulatus genes and proteins havesimilar co-expression relationships in the protein abundanceand mRNA expression data. We also related LNS toWGCNA module preservation statistics and investigatedthe effect of the size of the datasets in LNS. In sum, weproduced comparative transcriptomics resources to guidefurther functional studies of R. 
capsulatus, and, to the best of our knowledge, performed the first application of network-based expression preservation metrics between transcriptomics and proteomics data.

Results and discussion

R. capsulatus co-expression network

We used 48 gene expression experiments encompassing 23 different conditions and/or mutant strains for the 3571 genes on the R. capsulatus microarrays to construct a gene co-expression network using WGCNA. A total of 40 gene co-expression modules were identified. To assess the stability of modules, we performed a resampling analysis of cluster robustness as described in [23]. The results of cluster stability analysis indicated that module assignments were reasonably stable with many of the modules being identified in most resampled data sets (see Additional file 1). The modules varied in size from 18 to 696 genes with an average size of 87 genes. A total of 3,533 genes out of the 3,571 genes represented on the microarrays were assigned to modules. Thirty-seven modules had enrichment of at least one type of biological gene set (i.e., protein domain, biological pathway, protein complex or transcription unit), and 21 modules were related to at least one biological pathway, which indicated that the modules were biologically meaningful. Some modules of interest are discussed below to illustrate the validity of this analysis.

One gene co-expression module containing 43 genes (the orange module) was associated with the production of RcGTA. This module was enriched (p-value = 5.8e-35) with the RcGTA gene cluster (rcc01682 to rcc01698) [24]. It also contained the endolysin and holin genes (rcc00555 and rcc00556) required for RcGTA release [25,26], and genes predicted to be involved in DNA uptake and recombination, with two genes annotated as related to competence (comM and rcc02362) and three genes associated with DNA repair and protection and incorporation of DNA received from RcGTA particles (radC, recA and dprA) [27]. There were also two genes encoding predicted signal transduction proteins, rcc00042 encoding a sensor domain protein and rcc00645 encoding a diguanylate cyclase/phosphodiesterase, which had previously been identified as affected by the loss of the response regulator CtrA similar to the RcGTA gene cluster [5]. The trends for genes in this module were increased expression in the stationary phase relative to logarithmic phase, reduced expression in the ctrA and gtaI mutants but not in the cckA mutant, and greatly increased expression in the RcGTA overproducer strain, DE442 (Figure 1a).

Signal transduction and transcriptional regulation proteins affected by the loss of the response regulator protein CtrA [5] are significantly over-represented (p-value = 4.8e-22) among the 141 genes forming the pink module, with 17 out of the 23 previously identified proteins in this module. The pink module showed a significant enrichment of genes involved in chemotaxis (FDR-corrected p-value of 1.3e-34), two-component systems (FDR-corrected p-value of 8.6e-11), and flagellar assembly (FDR-corrected p-value of 2.9e-9). This module also contains all 17 R. capsulatus proteins containing a Methyl-accepting chemotaxis protein (MCP) signalling domain (FDR-corrected p-value of 1.7e-20). Genes in the pink module showed significantly lower expression in both the cckA and ctrA strains (Figure 1b). This corresponds to previous work that demonstrated that CtrA and CckA are required for expression of flagellar and chemotaxis genes [5,28,29]. Genes within this module have also been shown to be involved in control of motility [28] and expression of the RcGTA genes (rbaU, rbaV and rbaW; [30]). The darkturquoise module also contained an over-representation of flagellar genes (FDR-corrected p-value of 6.3e-11), and visually showed a very similar expression profile (Figure 1c) as the pink module. The one
The oneexception was that expression of genes in the darkturquoisemodule was elevated in the DE442 strain in the transitionand stationary growth phases while expression of genes inthe pink module was not. The median expression profilesof the pink and darkturquoise modules reciprocally correl-ate the most with each other (Pearson correlation of 0.76,p-value = 2.87e-5). Correlations between module median ex-pression profiles are shown in Additional file 2. In additionto flagellar genes, the darkturquoise module contained anover-representation of gas vesicle genes. In total, the orange,pink and darkturquoise modules represent 84% of whatwas previously identified as the CtrA “regulon” [5].Several modules showed patterns of expression thatwere most affected by the culture growth medium(heatmaps illustrating the expression profiles of all modulesare provided in Additional file 3). This included the darkredand orangered4 modules that showed a relative decrease inexpression in RCV medium and the cyan, greenyellow andpaleturquoise modules that showed increased expression inRCV medium. Not surprisingly, these modules containedmany genes involved in transport and various aspects ofmetabolism such as sugar and vitamin biochemistry.Two modules, midnightblue and salmon4, which showedhigh relative expression across all strains and conditionscontained many of the genes required for phototrophicgrowth. This included genes encoding the photosyntheticreaction centre, the light-harvesting complexes I and II, andbacetriochlorophyll and carotenoid pigment biosynthesisproteins. High expression of these genes is expectedbecause all RNA samples used in the microarray experi-ments came from cultures grown photoheterotrophically,and these genes are well characterized for their globalregulation by several key regulators [31].The darkgreen module contained genes responsible forsynthesis of the capsule, rcc0181-01086 and rcc01958-01960,required as an RcGTA receptor [32]. 
This module showedincreased expression in strain DE442 and decreased geneexpression in the gtaImutant in the logarithmic phase whengrown in RCV medium. This module also contains one ofthe R. capsulatus crispr-associated (cas) gene clusters [3].The skyblue3 module could also be implicated as affectedby quorum sensing because of decreased expression levelsin the gtaI mutant relative to wild type (in the logarithmicphase only). This module included another one of thegenes required for capsule synthesis, rcc01932 [32],and rcc01955-01957, which are located adjacent to thegenes in the darkgreen module mentioned above that areinvolved in quorum sensing-dependent capsule production.Three modules, skyblue, turquoise and violet, showedlower relative expression across all strains and conditions.These modules obviously represent genes with low or noexpression under the conditions of these experiments, andthe turquoise module was the largest of all 40 modules,with 696 genes. Of note in the turquoise module is a largenumber of prophage genes, representing 5 distinct unchar-acterized prophage regions as well as the majority of thegenes of RcapMu [33]. This module also includes genes fornitrogen fixation and several alternative sigma factors.As a result of this gene co-expression network analysis,99% of the 909 R. capsulatus genes described as“hypothetical protein” were assigned to modules. Theseuncharacterized genes might now be putatively impli-cated in specific biological processes to guide functionalcharacterization. Gene module assignments are providedin Additional file 4.We also tested whether genes in certain modules werepreferentially packaged in the RcGTA particles using theavailable RcGTA packaging microarray data [25]. 
Nomodules were found to be over-represented in theRcGTA-packed DNA but we observed a strong inversecorrelation (Pearson correlation of -0.84, p-value = 2.35e-12)between the content of plasmid genes in a module and theintensity measurements detected in the RcGTA DNA. Thisis expected as the RcGTA DNA was isolated from DE442,Peña-Castillo et al. BMC Genomics 2014, 15:730 Page 4 of 14http://www.biomedcentral.com/1471-2164/15/730which lacks the ~100-kb plasmid present in the genome-sequenced strain, SB1003 [25]. Although the plasmidgenes were distributed amongst 17 different modules, thedarkmagenta module showed the largest proportionalFigure 1 Expression profiles of genes in selected co-expression moduindicate robust z-scores. Colours on top of the columns refer to clusters of condcorrespond to the indices of the conditions mutant strains described in Table 1(c) Darkturquoise co-expression module.plasmid gene content (18/30) of all co-expression mod-ules, at 60% plasmid-borne genes. The darkmagenta andthree other modules that contained >15 plasmid genes,purple (17), royalblue (25) and turquoise (31), combined toles across all conditions and/or mutant strains. Heatmap coloursitions/mutant strains highlighted in Figure 2. Numbers below the columns. (a) Orange co-expression module; (b) Pink co-expression module;contain 64% of all plasmid genes on the arrays, indicatingwidespread co-regulation of the plasmid-borne genes.To confirm that the modules with a large number ofplasmid genes were still identifiable in the absence ofthe samples from the plasmid-lacking DE442 strain, weassessed (using the WGCNA modulePreservation function)whether such modules were reproducible in the data subsetwithout the DE442 data. Indeed, there was moderate tostrong evidence (Zsummary.pres > 5) that all 40 moduleswere present in the subset of data without the DE442samples. 
Thus, modules are robust to the lack of signal for the plasmid genes in the DE442 data.

To explore similarity of expression profiles based on conditions and/or mutant strains, we performed a hierarchical cluster (average linkage) analysis with multiscale bootstrap resampling (ten different sample sizes and 10000 bootstrap samples) [34] using the Pvclust R package (version 1.2.2) [35]. The dendrogram obtained (Figure 2) indicated that gene expression profiles form groups based on RCV growth medium (green), the DE442 strain (orange), and culture growth phase (yellow and blue versus red).

Figure 2 Dendrogram showing hierarchical clustering of conditions and/or mutant strains based on gene expression profiles. Values at branches are approximately unbiased (AU) percentage values (bottom) and bootstrap (BP) percentage values (top). BP values indicate the percentage of bootstrap replicates in which a given cluster appears. Leaf numbers correspond to the indices of the conditions/mutant strains described in Table 1. Coloured boxes indicate branches in the dendrogram of conditions/mutant strains having in common RCV growth medium (green), the DE442 strain (orange), and culture growth phase (yellow and blue versus red).

Network based comparison of transcript levels and protein abundance

We explored the preservation of co-expression between R. capsulatus mRNAs and proteins by applying the modulePreservation function from WGCNA. This function, using a permutation test, assessed whether the module nodes identified in R. capsulatus gene co-expression network remained connected in the protein co-expression network and whether the connectivity pattern between nodes in both networks was similar. A composite preservation statistic, Zsummary, can be used to evaluate whether modules are preserved. A Zsummary > 2 indicates that there is weak to moderate evidence of preservation and Zsummary > 10 indicates that there is strong evidence that the module is preserved [19]. Note that module preservation can be assessed using the protein co-expression network (as defined by a correlation matrix) without cluster detection. We realized that the small sample size of the proteomics dataset (six conditions and 1158 proteins) might reduce the statistical power to pinpoint preserved modules; however, it seemed possible that strongly conserved biological signals could be identified. Indeed, we found evidence of preservation of two gene modules in the protein abundance data: the blue module (Zsummary = 2.90), which was enriched with a number of housekeeping functions (Additional file 5), and especially with ribosomal proteins (FDR-corrected p-value of 3.3e-38), and the brown module (Zsummary = 2.48), which was enriched in genes related to iron transport (FDR-corrected p-values <0.01) (Additional file 5). Additional file 6 shows preservation statistics of R. capsulatus mRNA modules in the protein co-expression network. This suggests that network-based analysis may be suitable for identifying preservation of global expression between transcriptomics and proteomics data. Comparison of these two data types has frequently yielded mixed results with reports of low to moderate correlations [21,22]. A network-based analysis with a larger sample size of protein abundance data is needed to corroborate and further extend our finding.

Comparative transcriptomics in Rhodobacter species

We investigated the preservation of global co-expression between R. capsulatus and R.
sphaeroides using the network-based statistics calculated by the modulePreservation function from WGCNA. There are two main network-based statistics found to accurately distinguish preserved from unpreserved modules: Zsummary and medianRank [19]. These statistics are calculated twice: once to assess whether modules are reproducible in the reference data subset consisting only of genes in common with the test dataset (referred to as “quality” statistics), and the second time to evaluate the conservation of the modules in the test data subset (referred to as “preservation” statistics). The quality statistics are a complementary approach to the cluster stability analysis to assess the robustness of the identified modules. A total of 2123 one-to-one orthologs between the species have been identified by Reciprocal Best Match [36], and we calculated the module preservation statistics in the data subsets containing these 2123 orthologous genes between the R. capsulatus (reference) and R. sphaeroides (test) co-expression networks.

The quality and preservation of the 40 R. capsulatus co-expression modules identified are illustrated in Figure 3a. Zsummary tends to be more dependent on the module size than medianRank [19]; nevertheless both statistics showed a strong correlation (Pearson coefficient of -0.654) when assessing the preservation of R. capsulatus modules in R. sphaeroides data (Figure 3b).
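
The Zsummary thresholds quoted from [19] (above 2 for weak to moderate evidence, above 10 for strong evidence) amount to a simple decision rule. The following Python sketch only illustrates that rule; it is not part of the original analysis, which relied on the modulePreservation function of the R WGCNA package. The function name `classify_zsummary` and the label for values of 2 or below are assumptions made for this example, and the two input values are the Zsummary scores reported in this article for the blue and brown modules in the protein abundance data.

```python
def classify_zsummary(z):
    """Map a Zsummary value to the evidence categories quoted from [19].

    Hypothetical helper for illustration: > 10 is strong evidence of
    preservation, > 2 is weak to moderate evidence, and anything else is
    labelled here (by assumption) as no evidence.
    """
    if z > 10:
        return "strong"
    if z > 2:
        return "weak to moderate"
    return "no evidence"


# Zsummary values reported in the text for the mRNA-protein comparison.
reported = {"blue": 2.90, "brown": 2.48}

for module, z in reported.items():
    print(f"{module}: Zsummary = {z:.2f} -> {classify_zsummary(z)}")
```

Under this rule both reported modules fall in the weak to moderate category, which matches how they are described in the text.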
Unsurprisingly, there was strong evidence of preservation (Zsummary.pres > 10) of the blue module. We found low to moderate evidence of preservation (2 < Zsummary.pres < 10) for ten additional modules. Among those, there were modules enriched with proteins implicated in porphyrin and bacteriochlorophyll metabolism (midnightblue, FDR-corrected p-value of 1.16e-11), biosynthesis of secondary metabolites (red, FDR-corrected p-value of 7.9e-7), ABC transporters (tan, FDR-corrected p-value of 0.0001), CO2 fixation (darkorange, FDR-corrected p-value of 0.0003), two-component systems (salmon4, FDR-corrected p-value of 0.0007), protein secretion (palevioletred3, FDR-corrected p-value of 0.03), and lysine biosynthesis (thistle2, FDR-corrected p-value of 0.03).

The orange module, related to RcGTA production, is amongst those not preserved in R. sphaeroides. This is consistent with the fact that no evidence of GTA production has been found in R. sphaeroides [13], despite conservation of the GTA genes [37]. Other non-preserved modules were the pink (related to chemotaxis and signalling), darkturquoise (related to flagellar assembly), darkred (related to aerobic hydrogen oxidation), darkolivegreen (related to Fe2+ oxidation), green (related to adenosylcobalamin biosynthesis from cobyrinate a,c), turquoise (related to chloroalkane and chloroalkene degradation), sienna3 (related to valine metabolism), yellowgreen (related to creatinine degradation and formate oxidation), steelblue (related to biotin metabolism), and darkgreen (related to 2-ketoglutarate dehydrogenase complex). There was also no evidence of preservation of the brown module in R. sphaeroides, which was one of the conserved R. capsulatus mRNA-protein modules.

Assessment of gene-wise conservation of expression

In addition to evaluating module preservation between the two Rhodobacter species, we wanted to assess pairwise conservation of expression between orthologs. Therefore,
Therefore,we calculated Local Network Similarity (LNS) [20] tostudy the conservation of gene expression between thetwo species. This metric was developed for applicationto expression datasets consisting of unmatched experi-mental conditions, and it quantifies the similarity ofthe expression correlations between a pair of orthologsand all other identified orthologs. We decided to ex-plore the effect of dataset size in LNS and obtained theLNS within each species by dividing the available datainto two different subsets. We also simulated the null-hypothesis of no conservation by randomizing theortholog pairs (see Methods). LNS scores of the nulldistribution ranged from -0.11 to 0.11 with an averagevery close to zero (2.6e-5). The within-species LNSdistributions showed a pronounced shift towards positivevalues (Figure 4a). However, the R. capsulatus distributionwas less positive (median LNS of 0.61) than that ofR. sphaeroides (median LNS of 0.85). This is likely due tothe difference in the amounts of transcriptomics data forthe two species, as the R. sphaeroides dataset contains eighttimes as many arrays as the R. capsulatus dataset.The LNS was then calculated between R. capsulatusand R. sphaeroides orthologs and between the R. capsulatustranscriptomics and proteomics data (Figure 4). Thebetween-species LNS scores of the matched orthologpairs showed less positive values than the within-speciesLNS, but there were still values to the right of the nulldistribution such that 30% of ortholog pairs had a posi-tive LNS score greater than 100% of the random values(Figure 4b). Co-expression of between-species orthologpairs is expected to be less similar than the within-speciesco-expression. Furthermore, the orthologs’ functions mayhave diverged in the different species, in which case thePeña-Castillo et al. BMC Genomics 2014, 15:730 Page 7 of 14http://www.biomedcentral.com/1471-2164/15/730LNS should be low to reflect this divergence. For example,the LNS scores of R. 
capsulatus genes involved in the pro-duction of RcGTA and their corresponding R. sphaeroidesorthologs ranged from -0.08 to 0.10 while highly conservedhousekeeping genes such as aroA and radA have LNSscores of 0.43 and 0.38, respectively. Encouragingly, theFigure 3 Preservation statistics of R. capsulatus modules in R. sphaerosubset used to assess module preservation. The horizontal lines indicate th(above 10) and for low to moderate evidence of conservation (above 2). R.on the right side. (a) Module preservation as a function of module quality;pres and medianRank). Lower medianRank indicates higher preservation.LNS scores between R. capsulatus mRNA expressionand protein abundance data are also to the right of thenull distribution suggesting that the LNS metric is sensi-tive enough to detect conservation of expression in smalldatasets and between diverse data types. LNS scores areprovided in Additional file 7.ides data. The size of the bubble represents module size in the datae Zsummary.pres thresholds for strong evidence of conservationcapsulatus modules found to be conserved in R. sphaeroides are listed(b) relationship between the two preservation statistics (Zsummary.Peña-Castillo et al. BMC Genomics 2014, 15:730 Page 8 of 14http://www.biomedcentral.com/1471-2164/15/730Relationship between module preservation statisticsConnectivity statistics quantify whether connectionsbetween genes in the reference network are similar tothose in the test network. By its definition, LNS is aconnectivity-based metric. To relate LNS to WGCNA mod-ule preservation statistics, we obtained the median LNS perFigure 4 Within-species and between-species expression conservationshifted to right of the null distribution. Within-species show a stronger shifthe cumulative distribution. The vertical dashed line indicates the maximummodule (henceforth referred to as median-LNS). 
After com-paring the median-LNS with WGCNA connectivity-basedstatistics, we found that median-LNS correlated bestwith bicor.kMEall, which is the correlation of the totalnetwork module eigengenes connectivity. A module eigen-gene (ME) summarizes the expression profile of a module.. (a) Distribution of within-species and between-species LNS scores ist towards positives values; (b) represents the same data as in (a) but asLNS score observed in the null distribution.tatformPeña-Castillo et al. BMC Genomics 2014, 15:730 Page 9 of 14http://www.biomedcentral.com/1471-2164/15/730The relationship between LNS-Median and cor.kMEallis shown in Figure 5. We observed a Pearson correlationbetween median-LNS and cor.kMEall of 0.78 (p-value of2.6e-9) for the network comparison between the twoRhodobacter species, and of 0.49 (p-value of 0.001) forthe comparison between R. capsulatus mRNA and proteinexpression.ConclusionsUsing WGCNA and functional analysis of R. capsulatustranscriptomics data, we identified distinct groups ofco-expressed genes with associations to biological genesets (protein domains, metabolic pathways, transcrip-tional units and/or protein complexes). We observedFigure 5 Relationship between module preservation connectivity sconnectivity (bicor.KMEall) as a function of the median-LNS per modulemodules in R. capsulatus proteomics data (right). Each point represents ablack line is the loess smoothed line.co-expression modules associated with functions knownto be co-regulated based on previous studies, such asthe production of RcGTA, motility and chemotaxis.These identified co-expression modules will be useful toidentify candidate genes for further investigations inR. capsulatus biology, such as the regulation and pro-duction of RcGTA. In addition, we distinguished be-tween preserved and non-preserved modules betweenR. capsulatus and R. sphaeroides. 
The module preser-vation results point to a lack of similarity between thetwo Rhodobacter species for many of the modules,whereas the expression of several metabolic pathwayswas similar in both species. We also quantified theconservation of expression of all one-to-one orthologsbetween these species using the LNS metric. Theseresources may aid in the identification of functionalanalog genes (those with conserved functional roles)in these bacteria, and comparative transcriptomics studiessuch as this can be applied to other bacterial species toobtain evidence of gene expression conservation andthereby allow further exploration of gene function.MethodsDatasetsPutting together published [5,25,32] and unpublishedmicroarray experiments (NCBI Gene Expression Omnibusdatabase accessions: GSE18149, GSE33176, GSE41014 andGSE53636), we collected 48 gene expression experimentsencompassing 23 different conditions and/or mutantstrains for the 3571 genes on the R. capsulatus microar-rays. We also analyzed a small-scale proteomics dataset of1158 proteins for R. capsulatus over six conditions and/ormutant strains ([5], and our own unpublished data) andcollected all data from 192 R. sphaeroides microarrayexperiments available in NCBI Gene Expression Omnibus(GEO) [38].istics. Total network correlation of the module eigengenesR. capsulatus modules in R. sphaeroides (left) and R. capsulatus mRNAodule labeled by the colour corresponding to the module name. TheR. capsulatus transcriptomics analysisIn addition to the previously published arrays [5,25,32],data were used from the strains and growth conditionsdescribed below. The complete listing of conditions and/or mutant strains used for the analyses is provided inTable 1 and the strains are described in Table 2. 
RNA isolations and hybridizations to the arrays were done as described in [5]; specifically, RNA was isolated using Qiagen RNeasy Minikit and cDNA synthesis, labelling and target hybridization performed as described in the Affymetrix Expression Analysis Technical Manual for prokaryotic samples. Arrays were quantile normalized together using the RMA method as implemented in the Affy package [39] for R (version 2.15.0). Quality tests were performed on the normalized array data using the Bioconductor AffyPLM package (version 1.36.0) [40], and by examining chip trees generated by the R WGCNA package (version 1.27.1) [41] and the Pvclust R package (version 1.2.2) [35].

Table 1 List of conditions and/or mutant strains represented in R. capsulatus samples
Module index (a) | Strain | Growth phase (b) | Growth condition (c) | Description | Number of replicates
1 (yellow) | cckA | ML | YPS 37°C | SB1003 cckA mutant | 1
2 (yellow) | SBRM1 | ML | YPS 37°C | SB1003 ctrA mutant | 3
3 (blue) | ALS1 | T | YPS 37°C | gtaI quorum sensing mutant | 1
4 (blue) | SB1003 | T | YPS 37°C | Wild type | 1
5 (blue) | ALS1 | ML | YPS 37°C | gtaI quorum sensing mutant | 1
6 (blue) | SB1003 | ML | YPS 30°C | Wild type | 1
7 (blue) | SB1003 | ML | YPS 37°C | Wild type | 7
8 (green) | ALS1 | ES | RCV 30°C | gtaI quorum sensing mutant | 1
9 (green) | SB1003 | ES | RCV 30°C | Wild type | 1
10 (green) | SLRK | ES | RCV 30°C | gtaR quorum sensing mutant | 1
11 (green) | SLRK | ML | RCV 30°C | gtaR quorum sensing mutant | 1
12 (green) | ALS1 | ML | RCV 30°C | gtaI quorum sensing mutant | 1
13 (green) | SB1003 | ML | RCV 30°C | Wild type | 1
14 (orange) | DE442 | ML | YPS 37°C | GTA overproducer | 1
15 (orange) | DE442 | T | YPS 37°C | GTA overproducer | 1
16 (orange) | DE442 | ES | YPS 37°C | GTA overproducer | 1
17 (orange) | DE442 | LS | YPS 37°C | GTA overproducer | 1
18 (red) | cckA | ES | YPS 37°C | SB1003 cckA mutant | 1
19 (red) | SBRM1 | ES | YPS 37°C | SB1003 ctrA mutant | 3
20 (red) | SB1003 | LS | YPS 37°C | Wild type | 2
21 (red) | SB1003 | ES | YPS 30°C | Wild type | 1
22 (red) | ALS1 | ES | YPS 37°C | gtaI quorum sensing mutant | 1
23 (red) | SB1003 | ES | YPS 37°C | Wild type | 7
(a) Colours in parentheses correspond to the clusters highlighted in Figures 1 and 2.
(b) ML, mid-logarithmic growth phase; ES, early stationary growth phase; LS, late stationary growth phase; T, the transition point between the logarithmic and stationary phases.
(c) All cultures were grown under phototrophic conditions. YPS and RCV represent complex and defined media, respectively.

Table 2 R. capsulatus strains used in this study
R. capsulatus strain | Details | Reference
SB1003 | Genome-sequenced strain | [3]
SBRM1 | SB1003 with disrupted ctrA | [5]
cckA | SB1003 with disrupted cckA | [28]
ALS1 | SB1003 with disrupted gtaI | [42]
SLKR | SB1003 with disrupted gtaR | [43] (a)
DE442 | RcGTA overproducer | [44,45]; Providence uncertain
(a) Describes the mutation of gtaR in a different parental strain.

Probes were mapped using BLAST+ 2.2.24 to coding sequences in the R. capsulatus chromosome and plasmid (sequences were downloaded from NCBI on 24 January 2012). Only hits with an E-value of less than 0.001 were considered. Probes that mapped to multiple genes were discarded from further analysis. If two or more probes mapped to a single gene, the expression value for that gene was determined by averaging the signals across those probes. Expression values were log2-transformed before being processed further. Normalized and log2-transformed expression values were averaged across replicate chips to generate an averaged expression value for each gene per experimental condition. Robust z-scores were obtained and used to construct the co-expression network. The robust z-score is the number of median absolute deviations (MAD) away from the median [46].

To build a signed weighted co-expression network and identify modules (clusters) of co-expressed genes, we used the function blockwiseModules in the R WGCNA package. The co-expression network was constructed based on all pairwise biweight midcorrelation values raised to a power β equal to 18. Biweight midcorrelation is less susceptible to outliers than Pearson correlation [23]. We set the minimum module size to fifteen, reassignThreshold to zero, and pamRespectsDendro to false. All other WGCNA parameters remained at their default settings.

To determine if any underlying biological processes were enriched within the co-expression modules, we carried out over-representation analysis [47] using biological gene sets from KEGG [48] metabolic pathways, transcription units and protein complexes from MetaCyc [49], and protein domains from Pfam [50]. Functional annotations for all R. capsulatus genes were used for the over-representation analysis. The hypergeometric distribution was used to test for statistically significant over-representation of genes from particular biological gene sets within the co-expression modules. P-values were corrected for multiple testing using false discovery rate (FDR) [51]. Biological gene sets with an FDR-corrected p-value of less than 0.05 were deemed statistically significantly enriched within the given co-expression module. Full functional analysis results are provided in Additional file 5.

To investigate whether the genes in a co-expression module as a set were preferentially packaged or excluded from RcGTA particles, we used rank-based permutation tests and the microarray data from Hynes et al. [25]. Permutation tests (also called randomization tests) are non-parametric procedures for determining statistical significance based on rearrangements of the labels of a dataset.
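As a generic illustration of the permutation-test idea defined above, the following Python sketch estimates how often a random gene set of the same size has a median rank at least as low as that of a module. It is a simplified, one-sided variant and not the authors' procedure, which compares the module against each random set with a Wilcoxon-Mann-Whitney test. The function name, its arguments and the add-one correction are assumptions made for this example.

```python
import random
import statistics


def median_rank_permutation_pvalue(scores, module_genes, n_perm=1000, seed=0):
    """Empirical p-value that a module's median rank is unusually low.

    scores: dict mapping gene id -> score (for example a robust z-score).
    module_genes: gene ids in the module; each must be a key of scores.
    (Hypothetical helper for illustration only.)
    """
    # Rank all genes by score; rank 1 is the lowest score.
    ordered = sorted(scores, key=scores.get)
    rank = {gene: i + 1 for i, gene in enumerate(ordered)}

    observed = statistics.median(rank[g] for g in module_genes)

    rng = random.Random(seed)
    genes = list(scores)
    size = len(module_genes)
    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set with the same number of genes as the module.
        random_set = rng.sample(genes, size)
        if statistics.median(rank[g] for g in random_set) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests the module sits lower in the ranking than expected by chance; testing for unusually high ranks would use `>=` in place of `<=`.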
We performed a rank-based permutation approach where all genes were ranked based on the robust z-scores of their normalized and log2-transformed expression values from the DNA packaging array [25]. The observed ranks of the genes in a module were compared against the ranks of 1000 randomly selected sets of genes of the same size (i.e., containing the same number of genes as the co-expressed module) using the Wilcoxon-Mann-Whitney test. Modules whose median rank was statistically lower (or greater) at a significance level of 0.01 than the median rank of 85% of the random gene sets (and no random gene set was statistically greater or lower) were considered to be differentially packaged in RcGTA particles.

R. sphaeroides mRNA expression data
We gathered all available R. sphaeroides mRNA expression data in NCBI GEO [38] using the R package GEOquery (version 2.26.2) [52]. The 192 microarray experiments collected had previously been published elsewhere [6-8,53-64]. Linear expression values were log2-transformed. If two or more probes mapped to a single gene, the expression value for that gene was determined by averaging the signals across those probes. Robust z-scores were obtained and used to calculate the correlation matrix with biweight midcorrelation.

R. capsulatus proteomics data
Protein abundance data were collected from [5] and our own data on growth in a complex medium with and without supplemented phosphate (PeptideAtlas database accession: PASS00523). R. capsulatus SB1003 was cultured photoheterotrophically in 165 mL capped flat bottles for 36 hours at 30°C in YPSm medium or YPSm supplemented with 9.6 mM KPO4 pH 6.8 [26]. Cells were harvested from approximately 40 mL culture by centrifugation (15,000 rcf). The Accurate Mass and Time (AMT) tag proteomics approach was used to generate label-free relative quantification measurements [65].
Briefly, proteins from phosphate enriched and regular cell cultures were extracted from whole cell, soluble and insoluble lysate fractions then digested according to established protocols [5]. A pooled sample of peptides generated from each lysate fraction was further fractionated using strong cation exchange SCX-HPLC according to established protocols. 148 collected fractions (½ from phosphate enriched and ½ from phosphate depleted cell cultures) were then analyzed using a linear ion trap mass spectrometer (Thermo Scientific, San Jose CA) coupled to a reverse phase HPLC separation. MS instrumentation operating and HPLC separation conditions have been described previously for tandem mass spectra generation [5]. Peptide sequence assignment to tandem mass spectra was performed using SEQUEST [66] and results further processed using MSGF [67] in order to assign spectral probabilities. Only peptides having a spectral probability of less than 1×10^−10 and a length of at least six amino acids were retained for matching to peptide feature data generated using high resolution FT-MS instrumentation (LTQ-Orbitrap; Thermo Scientific, San Jose CA) as described previously [68]. Arbitrary abundance measurements for matched peptides were determined by integrating the area under each LC-FT-MS peak for a given peptide feature. Measurements from multiple peptides uniquely mapped to a single protein were averaged to obtain one measurement of abundance per protein. Protein abundance data were normalized using a central tendency approach [69]. Normalized abundance values were log2-transformed and converted to z-scores (the number of standard deviations away from the mean). Z-scores were averaged across replicate conditions to generate an averaged abundance value for each protein per experimental condition. Proteins with more than two missing values were removed.
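The last processing steps above (log2 transformation, z-scoring, averaging over replicates and removal of proteins with more than two missing values) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the text does not say whether z-scores were computed per protein or per sample, nor whether missing values were counted before or after averaging, so the sketch assumes per-protein z-scores and counts missing values per condition after averaging. All function names are hypothetical.

```python
import math
import statistics


def zscores_log2(abundances):
    """log2-transform positive abundances and convert them to z-scores.

    Missing measurements are given as None and stay None.
    Assumption: z-scores are computed per protein across its samples.
    """
    logged = [math.log2(a) for a in abundances if a is not None]
    if len(logged) < 2:
        return [None] * len(abundances)
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)  # sample standard deviation
    if sd == 0:
        return [None] * len(abundances)
    return [None if a is None else (math.log2(a) - mean) / sd for a in abundances]


def average_replicates(values, conditions):
    """Average the values sharing a condition label, ignoring None."""
    grouped = {}
    for value, condition in zip(values, conditions):
        grouped.setdefault(condition, [])
        if value is not None:
            grouped[condition].append(value)
    return {c: (statistics.mean(v) if v else None) for c, v in grouped.items()}


def process_proteins(table, conditions, max_missing=2):
    """table: dict protein -> list of abundances, one per sample (None = missing)."""
    result = {}
    for protein, abundances in table.items():
        per_condition = average_replicates(zscores_log2(abundances), conditions)
        n_missing = sum(v is None for v in per_condition.values())
        # Keep proteins with at most max_missing conditions lacking a value.
        if n_missing <= max_missing:
            result[protein] = per_condition
    return result
```

For example, `process_proteins({"p1": [1.0, 2.0, 4.0, 8.0]}, ["a", "a", "b", "b"])` returns one averaged z-score for condition "a" and one for condition "b".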
Z-scores were used to calculate the correlation matrix with biweight midcorrelation.

Module preservation
Module preservation and quality statistics were computed using the modulePreservation function (1000 permutations) implemented in the R package WGCNA [19]. Network module preservation statistics assess whether modules identified in the reference network remain connected in the test network (density), and whether node connections are similar between the reference and the test network (connectivity). These statistics are calculated without the need to define modules in the test dataset. R. capsulatus transcriptomics data was our reference dataset; R. sphaeroides transcriptomics data and R. capsulatus proteomics data were our two test datasets. The complete set of network-based statistics obtained is provided in Additional file 8. This same procedure was used to determine the reproducibility of R. capsulatus modules in the absence of the DE442 data.

LNS calculation
We calculated LNS as described by Guan et al. [20]. Correlation values were transformed using the inverse hyperbolic tangent (atanh) function (also called Fisher's z transformation), and LNS of a pair of orthologs is the correlation between their matched correlation vectors. Let $W^A = [w^A_{ij}]$ and $W^B = [w^B_{ij}]$ denote $n \times n$ matrices of atanh-transformed correlations, where A and B denote the species and $n$ is the number of orthologs between these species. A correlation vector $w^A(j)$ of an ortholog gene $j$ is the $j$-th row of $W^A$ with $n$ components $(w^A_{j1}, w^A_{j2}, \ldots, w^A_{jn})$. The LNS of two ortholog genes $j$ and $j'$ is defined as the correlation between the correlation vectors $w^A(j)$ and $w^B(j')$. The null distribution of LNS scores was obtained by randomizing the ortholog mapping table while preserving the correlation matrix and thus the network structure.
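The LNS definition and the ortholog-mapping randomization described above can be sketched in Python as follows. This is one reading of the text under stated assumptions, not the implementation of Guan et al. or of the authors: the two correlation matrices are assumed to be ordered so that row and column i in both refer to ortholog pair i, the self-correlation entry is skipped because atanh(1) is infinite, correlations are clipped slightly inside (-1, 1), and Pearson correlation is used between the vectors. The names `pearson`, `lns_scores` and `null_lns` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lns_scores(corr_a, corr_b, eps=1e-6):
    """LNS per ortholog pair; row/column i of both matrices is ortholog pair i."""
    n = len(corr_a)

    def fisher_z(r):
        # Clip so that atanh stays finite for correlations of exactly +/-1.
        return math.atanh(max(-1 + eps, min(1 - eps, r)))

    scores = []
    for j in range(n):
        # Skip the self-correlation entry (assumption; it is always 1).
        wa = [fisher_z(corr_a[j][k]) for k in range(n) if k != j]
        wb = [fisher_z(corr_b[j][k]) for k in range(n) if k != j]
        scores.append(pearson(wa, wb))
    return scores


def null_lns(corr_a, corr_b, n_perm=100, seed=0):
    """Null LNS scores from random ortholog mappings.

    Each permutation re-indexes species B as a whole, so both correlation
    matrices (and hence the network structures) stay intact.
    """
    rng = random.Random(seed)
    n = len(corr_a)
    null = []
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        shuffled_b = [[corr_b[p[j]][p[k]] for k in range(n)] for j in range(n)]
        null.extend(lns_scores(corr_a, shuffled_b))
    return null
```

Observed scores from `lns_scores` can then be compared against the pooled values returned by `null_lns`, for instance by reporting the fraction of null scores at or above each observed score.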
Note that randomization might also be performed by permuting the gene labels in the correlation matrix (equivalent to shuffling node labels in the network); in this case, the null distribution will differ from the one obtained here. However, we considered that the network topology and the connectivity pattern of each node in the network should be preserved during randomization; thus, we favoured randomizing the ortholog mapping table. We performed 100 random permutations of the ortholog-mapping table. To obtain the within-species LNS, we evenly divided the conditions available per species and calculated the LNS per gene using the two resulting data subsets. This random subsampling process was repeated 100 times for each species.

Availability of supporting data
The datasets supporting the results of this article are included within the article and its additional files. Microarray data have been deposited in the NCBI Gene Expression Omnibus (database accessions: GSE18149, GSE33176, GSE41014 and GSE53636) and proteomics data have been deposited in PeptideAtlas (database accession: PASS00523).

Additional files
Additional file 1: Gene dendrogram and module labels from resampled data sets. Cluster stability analysis results.
Additional file 2: Module median expression profile similarities. Pearson correlation coefficients between the median expression profiles of the identified modules.
Additional file 3: Module heatmaps. Expression profile of genes in all 40 identified co-expression modules across all conditions and/or mutant strains.
Additional file 4: Gene module assignment. Module assignment and functional annotation for all R. capsulatus genes.
Additional file 5: Functional analysis results. List of gene sets found statistically significantly enriched in the co-expression modules.
Additional file 6: Preservation statistics of R. capsulatus gene modules in R. capsulatus proteomics data.
Module preservation as a function of module quality and relationship between the two preservation statistics, Zsummary.qual and medianRank.
Additional file 7: LNS scores. LNS scores for ortholog pairs between Rhodobacter species, and for R. capsulatus mRNA-protein data.
Additional file 8: Network statistics. Complete set of network-based statistics per co-expression module.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
LP-C conceived the study and performed the analyses. ASL and LP-C interpreted results of analyses and drafted the manuscript. RGM performed the gene expression experiments. AG pre-processed the proteomics data. SJC and ATW performed the proteomics experiments. ABW and JTB provided the cells for the proteomics experiment. All authors edited, read and approved the final version of the manuscript.

Acknowledgements
The research in L.P-C's laboratory was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council (NSERC), and an IgniteR&D grant from the Newfoundland and Labrador Research & Development Corporation (NL RDC). The research in A.S.L.'s laboratory was supported by grants from NSERC, NL RDC, the Canada Foundation for Innovation, and the Industrial Research and Innovation Fund from the Government of Newfoundland and Labrador. R.G.M. was supported by fellowships from Memorial University and NSERC. A.G. was partially supported by a fellowship from Memorial University. A portion of the research described in this paper was funded by the Department of Energy Office of Biological and Environmental Research (OBER) Genome Sciences Program under the Pan-omics project, and was performed in part in the Environmental Molecular Sciences Laboratory (EMSL), a national scientific user facility sponsored by the DOE OBER and located at Pacific Northwest National Laboratory (PNNL). PNNL is a multiprogram national laboratory operated by Battelle for the DOE under contract DE-AC05-76RLO01830. A.B.W.
was supported by a Canadian Institutes of Health Research grant (#93779) awarded to J.T.B.

Author details
1 Department of Biology, Memorial University of Newfoundland, St. John's, NL A1B 3X5, Canada. 2 Department of Computer Science, Memorial University of Newfoundland, St. John's, NL, Canada. 3 Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA. 4 Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC, Canada.

Received: 22 January 2014 Accepted: 21 August 2014 Published: 28 August 2014

References
1. Srinivas TN, Kumar PA, Sasikala C, Ramana C, Imhoff JF: Rhodobacter vinaykumarii sp. nov., a marine phototrophic alphaproteobacterium from tidal waters, and emended description of the genus Rhodobacter. Int J Syst Evol Microbiol 2007, 57(Pt 9):1984–1987.
2. Madigan M, Jung D: An Overview of Purple Bacteria: Systematics, Physiology, and Habitats. In Volume 28. Edited by Hunter CN, Daldal F, Thurnauer M, Beatty JT. Netherlands: Springer; 2008:1–15.
3. Strnad H, Lapidus A, Paces J, Ulbrich P, Vlcek C, Paces V, Haselkorn R: Complete genome sequence of the photosynthetic purple nonsulfur bacterium Rhodobacter capsulatus SB 1003. J Bacteriol 2010, 192(13):3545–3546.
4. Kontur WS, Schackwitz WS, Ivanova N, Martin J, Labutti K, Deshpande S, Tice HN, Pennacchio C, Sodergren E, Weinstock GM, Noguera DR, Donohue TJ: Revised sequence and annotation of the Rhodobacter sphaeroides 2.4.1 genome. J Bacteriol 2012, 194(24):7016–7017.
5. Mercer RG, Callister SJ, Lipton MS, Pasa-Tolic L, Strnad H, Paces V, Beatty JT, Lang AS: Loss of the response regulator CtrA causes pleiotropic effects on gene expression but does not affect growth phase regulation in Rhodobacter capsulatus. J Bacteriol 2010, 192(11):2701–2710.
6. Arai H, Roh JH, Eraso JM, Kaplan S: Transcriptome response to nitrosative stress in Rhodobacter sphaeroides 2.4.1. Biosci Biotechnol Biochem 2013, 77(1):111–118.
7. Dufour YS, Imam S, Koo BM, Green HA, Donohue TJ: Convergence of the transcriptional responses to heat shock and singlet oxygen stresses. PLoS Genet 2012, 8(9):e1002929.
8. Kontur WS, Ziegelhoffer EC, Spero MA, Imam S, Noguera DR, Donohue TJ: Pathways involved in reductant distribution during photobiological H(2) production by Rhodobacter sphaeroides. Appl Environ Microbiol 2011, 77(20):7425–7429.
9. Mackenzie C, Eraso JM, Choudhary M, Roh JH, Zeng X, Bruscella P, Puskas A, Kaplan S: Postgenomic adventures with Rhodobacter sphaeroides. Annu Rev Microbiol 2007, 61:283–307.
10. Masepohl B, Hallenbeck PC: Nitrogen and molybdenum control of nitrogen fixation in the phototrophic bacterium Rhodobacter capsulatus. Adv Exp Med Biol 2010, 675:49–70.
11. Wu J, Bauer CE: RegB/RegA, a global redox-responding two-component system. Adv Exp Med Biol 2008, 631:131–148.
12. Shelswell KJ, Beatty JT: Coordinated, long-range, solid substrate movement of the purple photosynthetic bacterium Rhodobacter capsulatus. PLoS One 2011, 6(5):e19646.
13. Lang AS, Zhaxybayeva O, Beatty JT: Gene transfer agents: phage-like elements of genetic exchange. Nat Rev Microbiol 2012, 10(7):472–482.
14. Zhang B, Horvath S: A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol 2005, 4:17.
15. Zhao W, Langfelder P, Fuller T, Dong J, Li A, Hovarth S: Weighted gene coexpression network analysis: state of the art. J Biopharm Stat 2010, 20(2):281–300.
16. Qiao J, Shao M, Chen L, Wang J, Wu G, Tian X, Liu J, Huang S, Zhang W: Systematic characterization of hypothetical proteins in Synechocystis sp. PCC 6803 reveals proteins functionally relevant to stress responses. Gene 2013, 512(1):6–15.
17. Childs KL, Davidson RM, Buell CR: Gene coexpression network analysis as a source of functional annotation for rice genes. PLoS One 2011, 6(7):e22196.
18. Allen JD, Xie Y, Chen M, Girard L, Xiao G: Comparing statistical methods for constructing large scale gene networks. PLoS One 2012, 7(1):e29348.
19. Langfelder P, Luo R, Oldham MC, Horvath S: Is my network module preserved and reproducible? PLoS Comput Biol 2011, 7(1):e1001057.
20. Guan Y, Dunham MJ, Troyanskaya OG, Caudy AA: Comparative gene expression between two yeast species. BMC Genomics 2013, 14:33-2164-14-33.
21. Maier T, Guell M, Serrano L: Correlation of mRNA and protein in complex biological samples. FEBS Lett 2009, 583(24):3966–3973.
22. Vogel C, Marcotte EM: Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet 2012, 13(4):227–232.
23. Langfelder P, Horvath S: Fast R Functions for Robust Correlations and Hierarchical Clustering. J Stat Softw 2012, 46(11):i11.
24. Lang AS, Beatty JT: Genetic analysis of a bacterial genetic exchange element: the gene transfer agent of Rhodobacter capsulatus. Proc Natl Acad Sci U S A 2000, 97(2):859–864.
25. Hynes AP, Mercer RG, Watton DE, Buckley CB, Lang AS: DNA packaging bias and differential expression of gene transfer agent genes within a population during production and release of the Rhodobacter capsulatus gene transfer agent, RcGTA. Mol Microbiol 2012, 85(2):314–325.
26. Westbye AB, Leung MM, Florizone SM, Taylor TA, Johnson JA, Fogg PC, Beatty JT: Phosphate concentration and the putative sensor kinase protein CckA modulate cell lysis and release of the Rhodobacter capsulatus gene transfer agent. J Bacteriol 2013, 195(22):5025–5040.
27. Brimacombe CA, Ding H, Beatty JT: Rhodobacter capsulatus DprA is essential for RecA-mediated gene transfer agent (RcGTA) recipient capability regulated by quorum-sensing and the CtrA response regulator. Mol Microbiol 2014, 92(6):1260–1278.
28. Mercer RG, Quinlan M, Rose AR, Noll S, Beatty JT, Lang AS: Regulatory systems controlling motility and gene transfer agent production and release in Rhodobacter capsulatus. FEMS Microbiol Lett 2012, 331(1):53–62.
29. Lang AS, Beatty JT: A bacterial signal transduction system controls genetic exchange and motility. J Bacteriol 2002, 184(4):913–918.
30. Mercer RG, Lang AS: Identification of a predicted partner-switching system that affects production of the gene transfer agent RcGTA and stationary phase viability in Rhodobacter capsulatus. BMC Microbiol 2014, 14:71-2180-14-71.
31. Klug G: Beyond catalysis: vitamin B as a cofactor in gene regulation. Mol Microbiol 2014, 91(4):635–640.
32. Brimacombe CA, Stevens A, Jun D, Mercer R, Lang AS, Beatty JT: Quorum-sensing regulation of a capsular polysaccharide receptor for the Rhodobacter capsulatus gene transfer agent (RcGTA). Mol Microbiol 2013, 87(4):802–817.
33. Fogg PC, Hynes AP, Digby E, Lang AS, Beatty JT: Characterization of a newly discovered Mu-like bacteriophage, RcapMu, in Rhodobacter capsulatus strain SB1003. Virology 2011, 421(2):211–221.
34. Shimodaira H: Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling. Ann Stat 2004, 32(6):2616–2641.
35. Suzuki R, Shimodaira H: Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 2006, 22(12):1540–1542.
36. Whiteside MD, Winsor GL, Laird MR, Brinkman FS: OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis. Nucleic Acids Res 2013, 41(Database issue):D366–D376.
37. Lang AS, Taylor TA, Beatty JT: Evolutionary implications of phylogenetic analyses of the gene transfer agent (GTA) of Rhodobacter capsulatus. J Mol Evol 2002, 55(5):534–543.
38. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A: NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res 2013, 41(Database issue):D991–D995.
39. Gautier L, Cope L, Bolstad BM, Irizarry RA: affy–analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 2004, 20(3):307–315.
40. Brettschneider J, Collin F, Bolstad BM, Speed TP: Quality Assessment for Short Oligonucleotide Microarray Data. Technometrics 2008, 50(3):241–264.
41. Langfelder P, Horvath S: WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9:559.
42. Schaefer AL, Taylor TA, Beatty JT, Greenberg EP: Long-chain acyl-homoserine lactone quorum-sensing regulation of Rhodobacter capsulatus gene transfer agent production. J Bacteriol 2002, 184(23):6515–6521.
43. Leung MM, Brimacombe CA, Spiegelman GB, Beatty JT: The GtaR protein negatively regulates transcription of the gtaRI operon and modulates gene transfer agent (RcGTA) expression in Rhodobacter capsulatus. Mol Microbiol 2012, 83(4):759–774.
44. Yen HC, Hu NT, Marrs BL: Characterization of the gene transfer agent made by an overproducer mutant of Rhodopseudomonas capsulata. J Mol Biol 1979, 131(2):157–168.
45. Ding H, Moksa MM, Hirst M, Beatty JT: Draft Genome Sequences of Six Rhodobacter capsulatus Strains, YW1, YW2, B6, Y262, R121, and DE442. Genome Announc 2014, 2(1):10.1128. genomeA.00050-14.
46. Birmingham A, Selfors LM, Forster T, Wrobel D, Kennedy CJ, Shanks E, Santoyo-Lopez J, Dunican DJ, Long A, Kelleher D, Smith Q, Beijersbergen RL, Ghazal P, Shamu CE: Statistical methods for analysis of high-throughput RNA interference screens. Nat Methods 2009, 6(8):569–575.
47. Khatri P, Sirota M, Butte AJ: Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol 2012, 8(2):e1002375.
48. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M: KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 2012, 40(Database issue):D109–D114.
49. Caspi R, Altman T, Dreher K, Fulcher CA, Subhraveti P, Keseler IM, Kothari A, Krummenacker M, Latendresse M, Mueller LA, Ong Q, Paley S, Pujar A, Shearer AG, Travers M, Weerasinghe D, Zhang P, Karp PD: The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res 2012, 40(Database issue):D742–D753.
50. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD: The Pfam protein families database. Nucleic Acids Res 2012, 40(Database issue):D290–D301.
51. Benjamini Y, Hochberg Y: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B Methodol 1995, 57(1):289–300.
52. Davis S, Meltzer PS: GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics 2007, 23(14):1846–1847.
53. Anthony JR, Warczak KL, Donohue TJ: A transcriptional response to singlet oxygen, a toxic byproduct of photosynthesis. Proc Natl Acad Sci U S A 2005, 102(18):6502–6507.
54. Arai H, Roh JH, Kaplan S: Transcriptome dynamics during the transition from anaerobic photosynthesis to aerobic respiration in Rhodobacter sphaeroides 2.4.1. J Bacteriol 2008, 190(1):286–299.
55. Braatsch S, Moskvin OV, Klug G, Gomelsky M: Responses of the Rhodobacter sphaeroides transcriptome to blue light under semiaerobic conditions. J Bacteriol 2004, 186(22):7726–7735.
56. Bruscella P, Eraso JM, Roh JH, Kaplan S: The use of chromatin immunoprecipitation to define PpsR binding activity in Rhodobacter sphaeroides 2.4.1. J Bacteriol 2008, 190(20):6817–6828.
57. Eraso JM, Roh JH, Zeng X, Callister SJ, Lipton MS, Kaplan S: Role of the global transcriptional regulator PrrA in Rhodobacter sphaeroides 2.4.1: combined transcriptome and proteome analysis. J Bacteriol 2008, 190(14):4831–4848.
58. Gomelsky L, Sram J, Moskvin OV, Horne IM, Dodd HN, Pemberton JM, McEwan AG, Kaplan S, Gomelsky M: Identification and in vivo characterization of PpaA, a regulator of photosystem formation in Rhodobacter sphaeroides. Microbiology 2003, 149(Pt 2):377–388.
59. Moskvin OV, Gomelsky L, Gomelsky M: Transcriptome analysis of the Rhodobacter sphaeroides PpsR regulon: PpsR as a master regulator of photosystem development. J Bacteriol 2005, 187(6):2148–2156.
60. Moskvin OV, Kaplan S, Gilles-Gonzalez MA, Gomelsky M: Novel heme-based oxygen sensor with a revealing evolutionary history. J Biol Chem 2007, 282(39):28740–28748.
61. Tavano CL, Podevels AM, Donohue TJ: Identification of genes required for recycling reducing power during photosynthetic growth. J Bacteriol 2005, 187(15):5249–5258.
62. Tsuzuki M, Moskvin OV, Kuribayashi M, Sato K, Retamal S, Abo M, Zeilstra-Ryalls J, Gomelsky M: Salt stress-induced changes in the transcriptome, compatible solutes, and membrane lipids in the facultatively phototrophic bacterium Rhodobacter sphaeroides. Appl Environ Microbiol 2011, 77(21):7551–7559.
63. Zeller T, Moskvin OV, Li K, Klug G, Gomelsky M: Transcriptome and physiological responses to hydrogen peroxide of the facultatively phototrophic bacterium Rhodobacter sphaeroides. J Bacteriol 2005, 187(21):7232–7242.
64. Zeller T, Mraheil MA, Moskvin OV, Li K, Gomelsky M, Klug G: Regulation of hydrogen peroxide-dependent gene expression in Rhodobacter sphaeroides: regulatory functions of OxyR. J Bacteriol 2007, 189(10):3784–3792.
65. Smith RD, Anderson GA, Lipton MS, Pasa-Tolic L, Shen Y, Conrads TP, Veenstra TD, Udseth HR: An accurate mass tag strategy for quantitative and high-throughput proteome measurements. Proteomics 2002, 2(5):513–523.
66. Eng JK, McCormack AL, Yates JR: An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 1994, 5(11):976–989.
67. Kim S, Gupta N, Pevzner PA: Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. J Proteome Res 2008, 7(8):3354–3363.
68. Robidart J, Callister SJ, Song P, Nicora CD, Wheat CG, Girguis PR: Characterizing microbial community and geochemical dynamics at hydrothermal vents using osmotically driven continuous fluid samplers. Environ Sci Technol 2013, 47(9):4399–4407.
69. Callister SJ, Barry RC, Adkins JN, Johnson ET, Qian WJ, Webb-Robertson BJ, Smith RD, Lipton MS: Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. J Proteome Res 2006, 5(2):277–286.

doi:10.1186/1471-2164-15-730
Cite this article as: Peña-Castillo et al.: Gene co-expression network analysis in Rhodobacter capsulatus and application to comparative expression analysis of Rhodobacter sphaeroides. BMC Genomics 2014 15:730.

