UBC Faculty Research and Publications

Identification of a set of genes showing regionally enriched expression in the mouse brain
D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa L; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven J
Jul 14, 2008

Full Text

BioMed Central
BMC Neuroscience
Open Access
Research article

Identification of a set of genes showing regionally enriched expression in the mouse brain

Cletus A D'Souza*†1, Vikramjit Chopra†1, Richard Varhol1, Yuan-Yun Xie2, Slavita Bohacec2, Yongjun Zhao1, Lisa LC Lee2, Mikhail Bilenky1, Elodie Portales-Casamar2, An He1, Wyeth W Wasserman2, Daniel Goldowitz2, Marco A Marra1, Robert A Holt2, Elizabeth M Simpson2 and Steven JM Jones1

Address: 1Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Ave – Suite 100, Vancouver, BC, V5Z 4E6, Canada and 2Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, 950 West 28th Ave., Vancouver, BC, V5Z 4H4, Canada

Email: Cletus A D'Souza* - cdsouza@bcgsc.ca; Vikramjit Chopra - vchopra@bcgsc.ca; Richard Varhol - rvarhol@bcgsc.ca; Yuan-Yun Xie - yyxie@cmmt.ubc.ca; Slavita Bohacec - slavita@cmmt.ubc.ca; Yongjun Zhao - yzhao@bcgsc.ca; Lisa LC Lee - Lisa.Lee@postgrad.manchester.ac.uk; Mikhail Bilenky - mbilenky@bcgsc.ca; Elodie Portales-Casamar - elodie@cmmt.ubc.ca; An He - ahe@bcgsc.ca; Wyeth W Wasserman - wyeth@cmmt.ubc.ca; Daniel Goldowitz - dang@cmmt.ubc.ca; Marco A Marra - mmarra@bcgsc.ca; Robert A Holt - rholt@bcgsc.ca; Elizabeth M Simpson - simpson@cmmt.ubc.ca; Steven JM Jones - sjones@bcgsc.ca

* Corresponding author    † Equal contributors

Abstract

Background: The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain.

Results: We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain.
As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology.

Conclusion: Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression.

Published: 14 July 2008
BMC Neuroscience 2008, 9:66 doi:10.1186/1471-2202-9-66
Received: 24 December 2007
Accepted: 14 July 2008
This article is available from: http://www.biomedcentral.com/1471-2202/9/66
© 2008 D'Souza et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

The Pleiades Promoter Project (please see Availability & requirements for more information) addresses two major challenges identified in gene therapy: first, the delivery of DNA to specific cell types to reduce side effects from treating healthy cells and second, controlled delivery of DNA to a specific locus in the genome to avoid insertional mutagenesis.
The goal for the project is the generation of human DNA promoters less than 4 kb in length (mini-promoters) that drive gene expression in brain regions important in neurological conditions. To achieve this goal, we have first identified genes with enriched expression in different regions of the adult mouse brain. Regional expression patterns within the brain tend to be conserved between orthologous human and mouse genes [1]. Additionally, as regulatory sequences in tissue-specific genes tend to be highly conserved [2], human mini-promoters are expected to drive regional gene expression in transgenic mice based on earlier studies [3]. Therefore, promoter regions from orthologous human genes will be assessed in the mouse brain for the ability to drive regional expression.

Selection of optimal genes for promoter design necessitates detailed assessment of gene expression patterns. An invaluable resource to identify genes expressed in the mammalian brain is the serial analysis of gene expression (SAGE) technique [4,5]. A modern improvement of tag-based expression analysis is LongSAGE, which produces longer transcript tags (21-bp) better suited to unique mapping onto cDNA and genome sequences [6]. As part of the Mouse Atlas of Gene Expression project [7], LongSAGE was used to profile transcriptomes of 72 tissues of mouse strain C57BL/6J at various stages of development [8]. For the Pleiades Promoter Project [9], a scion of the Mouse Atlas project, we have generated new LongSAGE data on gene expression in the adult mouse central nervous system to identify genes that display enriched expression in key brain regions.

While LongSAGE provides a rich perspective on gene expression patterns, we extended our data mining efforts to include other large information sources. The PubMed database [10] provides an unparalleled compendium of text from the scientific literature.
In order to facilitate extraction of key information from Medline abstracts or full-text articles in PubMed, natural language processing tools are routinely employed to semi-automate the process of literature mining [11,12]. In this study we investigated an approach to specifically and automatically identify associations between genes and brain regions from the literature. We further analysed expression data from the Allen Brain Atlas (ABA; [13]), a high-throughput in situ hybridization platform that has assayed expression for ~20,000 genes in the adult mouse brain [14,15]. Here, we report the successful utilization of a combination of gene-finding tools, including SAGE analysis, text mining and ABA expression data, to identify genes displaying regionally enriched expression in surrogate regions of therapeutic interest within the mouse brain.

Results

Identification of brain region-enriched gene expression by LongSAGE

To identify regionally enriched gene expression within the brain of the adult mouse strain C57BL/6J, we used the precision of Laser Capture Microdissection (LCM; Figure 1) [16] to isolate component tissues and construct SAGE libraries from 17 brain regions as well as the whole adult mouse brain for comparison (Methods). As shown in Table 1, these libraries have been sampled to a depth of > 100,000 tags each, a level shown to be adequate for the discovery of medium-to-high level transcripts [8]. Bioinformatics analysis of differential gene expression was performed as described in Methods. Since the majority of transcripts were detected in multiple libraries, we employed a heuristic approach to identify and rank expression patterns (outlined in Table 2). For each brain region, we ranked genes from 1–91 based on the level and pattern of expression in descending order. Expression specificity of a ranked list of 1999 SAGE-identified genes was then confirmed by examining related literature information and Allen Brain Atlas in situ hybridization data. Based on this collective information, region-specific or region-enriched genes were further considered.

Of the 237 genes identified as displaying regionally enriched expression in this study, 132 genes [see Additional file 1] displayed expression patterns listed in Table 2. Only 22 genes were found in a single library and five of these (A930006D11Rik, Chrna6, Gdf10, Hcrt, and Hes3) were determined to be tissue-specific at a statistically significant level (tag counts > 5, P < 0.05).

Complexity of the adult mouse brain transcriptome and SAGE-based analysis of transcriptome similarity of brain regions

As an indication of complexity of the adult mouse brain transcriptome, within the 18 Pleiades libraries (including whole adult brain library) expression was observed for 11,836 genes of the total 17,098 genes detectable within the Mouse Atlas (total number of tags mapped to the Mouse Atlas libraries was approximately 8.8 million including singletons). In contrast, the Allen Brain Atlas (ABA) contains expression patterns of approximately 16,000 genes across the entire adult C57BL/6J mouse brain (Susan Sunkin, ABA, personal communication); of these genes, roughly 65.5% (10,479/16,000) were detectable in the 18 Pleiades libraries. Furthermore, the Pleiades libraries provided about 8% (1,357/17,357) additional genes to the total number of genes detectable by ABA.

We also analyzed SAGE data to measure transcriptome similarity between selected tissues.
The premise was that tissues would cluster together or diverge based on the degree to which their genes are differentially expressed. Hierarchical clustering was done based on unweighted average distance between formed clusters (see description in Methods), the results of which are displayed in the form of a dendrogram (Figure 2). A pattern of divergent tissue clusters consistently emerges: a cluster of neuronal tissues and several discrete single tissue clusters including Ependymal Layers, Cerebellum White Matter and Cerebellum Purkinje Cell Layer. Among neuronal tissues, the Ventral and Medial Thalamus consistently clustered tightly together and had the lowest expression divergence of any pair of tissues. Additionally, Visual Cortex, Primary Motor Cortex, Amygdala (basolateral), Amygdala (central), and Dorsal Striatum also clustered together. Segregation of the Ependymal tissue into a separate single cluster makes sense given its non-neuronal nature [17], and the Cerebellar White Matter is composed of myelinated axonal processes. Clustering is usually sensitive to the specific expression divergence measure used. However, we tried several empirical measures, as well as different P values for selecting differentially expressed genes, and observed that the main pattern of clustering outlined above remains unchanged.

Table 1: List of adult brain region SAGE libraries
Name | Description | No. of Genes | Total Tags (b)
SM098 | Whole brain (a) | 6893 | 108441
SM110 | Hypothalamus | 6676 | 108882
SM132 | Ventral Thalamus | 6441 | 105701
SM137 | Hippocampus Dentate Gyrus, dorsal/anterior | 5935 | 104322
SM139 | Medial Thalamus | 6608 | 105364
SM147 | Visual Cortex Layers II/III/IV | 6683 | 136039
SM152 | Substantia Nigra | 6584 | 115991
SM153 | Basal Nucleus of Meynert | 6581 | 120997
SM180 | Locus Coeruleus | 6282 | 102933
SM181 | Raphe Nuclei | 6434 | 104627
SM182 | Cerebellum White Matter | 5461 | 107335
SM183 | Primary Motor Cortex | 6543 | 115262
SM184 | Hippocampus CA1, dorsal/anterior | 6331 | 118198
SM193 | Amygdala, basolateral complex | 6396 | 109772
SM194 | Amygdala, central nucleus | 6451 | 110056
SM195 | Dorsal striatum | 6185 | 105509
SM196 | Cerebellum, Purkinje Cell Layer | 6604 | 104850
SM201 | Ependymal and Subependymal Layers | 6561 | 107041
(a) Manually dissected; all others were laser capture microdissected
(b) Represents filtered data

Table 2: Rank order based on the level and pattern of gene expression
Rank Order | Expression Pattern
1 | 1 TL* and 0 OTL* (P(TL-OTL) <= 0.05) (TL tag count >= 5)
2–6 | 1 TL and 1–5 OTLs (P(TL-OTL) <= 0.05)
7–11 | 1 TL and 1–5 OTLs (P(TL-OTL) > 0.05) (TL tag count >= 5, OTL tag count: 1–4)
12–17 | 2 TLs and 0–5 OTLs (P(TL-TL) > 0.05; P(TL-OTL) <= 0.05)
18–22 | 2 TLs and 1–5 OTLs (P(TL-TL) > 0.05; P(TL-OTL) > 0.05) (TL tag count >= 5, OTL tag count: 1–4)
23–28 | 3 TLs and 0–5 OTLs (P(TL-TL) > 0.05; P(TL-OTL) <= 0.05)
29–33 | 3 TLs and 1–5 OTLs (P(TL-TL) > 0.05; P(TL-OTL) > 0.05) (TL tag count >= 5, OTL tag count: 1–4)
34–39 | 4 TLs and 0–5 OTLs (P(TL-TL) > 0.05; P(TL-OTL) <= 0.05)
40–44 | 4 TLs and 1–5 OTLs (P(TL-TL) > 0.05; P(TL-OTL) > 0.05) (TL tag count >= 5, OTL tag count: 1–4)
45–55 | 1 TL and 6–16 OTLs (P(TL-OTL) <= 0.05)
56–65 | 2 TLs and 6–15 OTLs (P(TL-TL) > 0.05; P(TL-OTL) <= 0.05)
66–74 | 3 TLs and 6–14 OTLs (P(TL-TL) > 0.05; P(TL-OTL) <= 0.05)
75–82 | 4 TLs and 6–13 OTLs (P(TL-TL) > 0.05; P(TL-OTL) <= 0.05)
83 | 1 TL with 4 tags
84 | 1 TL with 3 tags
85 | 2 TLs with 4 tags
86 | 2 TLs with 3 tags
87 | 1 TL with 2 tags
88 | 2 TLs with 2 tags
89 | 3 TLs with 3 tags
90 | 3 TLs with 2 tags
91 | 1 TL with 1 tag
* TL = Target library (brain region of interest), OTL = Off-target library (background region)

Figure 1: Use of Laser Capture Microdissection to isolate the hippocampus dentate gyrus from an adult mouse. A) Intact coronal brain section at ~Bregma -1.35 stained with cresyl violet. B & C) dentate gyrus (DG) has been microdissected with laser. D) dentate gyrus has been isolated and captured for total RNA extraction and construction of SAGE libraries. Images were captured using a Sony DXC-390P 3-CCD color video camera attached to a Nikon Eclipse TE2000-S microscope (10× magnification). Scale bar = 100 μm. D: dorsal; V: ventral.

Figure 2: Transcriptome similarity among 17 brain tissues based on expression divergence at P value = 0.01. Tissues being compared are indicated on the Y-axis, and expression divergence (EDP) of clusters of tissues is plotted on the X-axis. At each node in the dendrogram, the number of genes shared between libraries in the tissue cluster is indicated. A threshold of 50% of maximum EDP was chosen for coloring of branch lines in the dendrogram.
[Dendrogram. X-axis: Expression Divergence (ED), 0.02 to 0.1. Y-axis tissues, in plotted order: Ependymal and Subependymal Layers; Cerebellum White Matter; Cerebellum, Purkinje Cell Layer; Amygdala, Central Nucleus; Amygdala, Basolateral Complex; Primary Motor Cortex; Dorsal Striatum; Hippocampus CA1, dorsal/anterior; Visual Cortex Layers II/III/IV; Hippocampus Dentate Gyrus, dorsal/anterior; Substantia Nigra; Raphe Nuclei; Locus Coeruleus; Hypothalamus; Basal Nucleus of Meynert; Medial Thalamus; Ventral Thalamus.]

Literature mining strategy to rapidly identify genes associated with brain regions of interest

We included in the present analysis several additional brain regions and cell-types, for example, Blood-Brain Barrier, Barrington's Nucleus, Astroglia etc., for which SAGE libraries had not been constructed. Therefore, to expand our set of genes with regionally enriched expression for all brain regions, we then scrutinized literature from PubMed. We obtained a list of Medline records using Boolean logic with search term combinations indicated in Table 3. To facilitate retrieval of publications from a large literature database such as PubMed, we also developed a semi-automated literature mining strategy (see Methods and Figure 3) based on natural language processing. In this approach we looked for the appearance of a gene name or synonym and a brain region in a sentence. Of the 99.7 million sentences searched, 314,515 occurrences of a brain region term were found; 4,395 mouse gene names, or the names of their human orthologs, were found to appear within the same sentence as a brain region (not shown).

The candidature of literature-mined genes was verified by assessing available expression data (reporter gene expression, microarray expression profile, radioactive/non-radioactive in situ hybridization) in publications, and confirmed with in situ hybridization data from the Allen Brain Atlas (see below).
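
The sentence-level search described above (a gene name or synonym and a brain region term appearing in the same sentence) can be illustrated with a minimal Python sketch. This is not the medical sentence parser used in the study: the naive regular-expression sentence splitter, the tiny lexicons, and the names `split_sentences`, `contains_term` and `gene_region_cooccurrences` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicons, for illustration only; a real run would use
# full gene/synonym tables and a curated list of brain region terms.
BRAIN_REGIONS = ["hypothalamus", "striatum", "locus coeruleus"]
GENE_SYNONYMS = {"hcrt": "Hcrt", "orexin": "Hcrt", "gpr88": "Gpr88", "dbh": "Dbh"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def contains_term(term, sentence):
    """Case-insensitive whole-word (or whole-phrase) match."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def gene_region_cooccurrences(text):
    """Map each gene symbol to the brain regions named in the same sentence."""
    found = defaultdict(set)
    for sentence in split_sentences(text):
        regions = [r for r in BRAIN_REGIONS if contains_term(r, sentence)]
        if not regions:
            continue  # keep only sentences that mention a brain region
        for synonym, symbol in GENE_SYNONYMS.items():
            if contains_term(synonym, sentence):
                found[symbol].update(regions)
    return {symbol: sorted(regions) for symbol, regions in found.items()}


abstract = (
    "Orexin is expressed in the hypothalamus. Gpr88 is abundant. "
    "Dbh marks the locus coeruleus and Gpr88 the striatum."
)
associations = gene_region_cooccurrences(abstract)
print(associations)
```

In this toy input, the last sentence names two genes and two regions, so the sketch pairs Dbh with the striatum as well as with the locus coeruleus. Spurious pairings of this kind are one reason a co-occurrence list needs the kind of follow-up verification against expression data described in the text.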
In addition to promoter-reporter fusion data from the literature, reporter expression data for BAC (Bacterial Artificial Chromosome) transgenic mice, when available from the GENSAT database [18], were also considered as complementary evidence of expression [see Additional file 2].

Table 3: Boolean search terms to obtain Medline records with information about region-associated expression or promoter characterization
Gene AND brain AND in situ [qualifiers: Mouse/Human]
Gene AND brain region AND in situ
Gene AND regulation
Gene AND promoter
Gene AND promoter AND brain
Gene AND promoter AND brain region
Gene AND promoter AND transgenic mice
Gene AND promoter AND reporter (qualifiers: CAT/Luciferase/Gfp)

Figure 3: Text mining data flow. This shows the steps by which the medical sentence parser retrieves Medline records that contain expression information for a gene in a specific region of the brain.
[Flow diagram, in order: Medline (9,656,783 abstracts) and PubMedCentral (24,287 full-texts); extract sentences using medical sentence parser (99,750,126 sentences); identify sentences with brain regions (314,515 sentences); identify sentences with gene names/synonyms and brain regions (958,149 associations); 4,395 genes in the same sentence as a brain region.]

Data mining genes showing regionally enriched expression from Allen Brain Atlas

The entire Allen Brain Atlas (ABA) data set can be searched via a web-based application [13,14]. We used this feature to examine expression patterns of genes identified as regionally enriched by SAGE and/or the literature. This verification was particularly apt for SAGE because ABA in situ hybridization patterns were also derived from the same mouse strain C57BL/6J. We also employed the ABA Anatomic Search tool to identify additional genes whose expression patterns cluster within brain regions of interest. While this approach short-listed genes for major regions (Thalamus, Cerebral Cortex etc.) of the mouse brain listed under Anatomic Search, we also searched within these regions to identify expression in sub-regions of interest, e.g. within Pons for genes expressed in Locus Coeruleus. Recent introduction of the alternative ABA search tool, NeuroBlast, also proved to be useful. We used NeuroBlast to retrieve genes co-expressed with a seeded (query) gene in a region of interest. Identification of regionally enriched co-expressed genes in this manner is indispensable in subsequent identification of shared regulatory elements for efficient mini-promoter design.

Thus, SAGE analysis of the adult mouse brain transcriptome combined with meta-analysis using data mining resources described above identified 237 genes as showing regionally enriched expression (Table 4). A summary of the meta-analysis that supports regionally enriched expression is presented [see Additional file 2]; where available, this file includes examples of supporting ABA images downloaded from the ABA website (please see Availability & requirements for more information).

Table 4: List of regionally enriched genes in 30 brain regions and cell-types of therapeutic interest
Brain Regions/Cell types | Example Processes/Disease Associations | Genes
Cortex | Alzheimer Disease, Amyotrophic Lateral Sclerosis, Plasticity | B3galt2, 3110035E14Rik, Ccl27, Ctgf, Emx1, Fhl2, Klf10, Myl4, Rbp4, Rtn4rl2, Stx1a, Tbr1, Vip, Ddit4l, Dkkl1, Rspo2, Ier5, Igfbp6, Ephb6, Mpped1, Pak7, Satb2, Cplx3, E430002G05Rik
Hippocampus | Alzheimer Disease, Adult Neurogenesis, Depression, Plasticity | Htr1A, Tgfb2, Gria1, Nr3c2
Hippocampus, Ammon's Horn | Alzheimer Disease, Adult Neurogenesis, Depression, Plasticity | Hunk, Klk8, Gpr161, Arfrp2, C630041L24Rik, Slc9a2, Neurod6, Pkp2, Fibcd1, Sstr4
Hippocampus, Dentate Gyrus | Alzheimer Disease, Adult Neurogenesis, Depression, Plasticity | Gabrd, Prox1, Dsp, C78409, Lct, Crlf1, Tdo2, A330019N05Rik, Lrrtm4, Htr4, Tspan18
Neurogenic Regions | Adult Neurogenesis | Nr2e1, Dcx, Mki67, Vim, Dlx2, Nes, Dlx1, Dscam, Fabp7, Igfbpl1, Lrrn1, Rrm1, Sox2, Thbs4
Striatum | Huntington Disease, Parkinson Disease, Plasticity in Depression | Adora2a, Gpr88, Drd1a, Drd2, Gpr6, Rgs9, Adcy5, Crym, Foxp1, Lpl, Pde1b1, Pdyn, Rarb, Rasd2, Tgfa
Amygdala | Huntington Disease, Depression, Plasticity | Tac1, Cyp26b1, Hap1, Cdh9, Ptprc, Gabra2, Hgf, Pdzrn3, Plxnd1, Wwox, Rasal1, Dock10, Prkcd
Amygdala, Basolateral Complex | Huntington Disease, Depression, Plasticity | Grp, Nov, Nr2f2
Amygdala, Central Nucleus | Huntington Disease, Depression, Plasticity | Atp6v1c2
Thalamus | Huntington Disease | Ramp3, Rgs16, Slitrk6, Tnnt1, 1110069I04Rik, Amotl1, Rab37, Sh3d19, Grid2ip, Lef1, Plekhg1, Syt9, Tcf7l2, Gm804, Gja7, Socs6, Vangl1
Hypothalamus | Cancer | Hcrt, Gpx3, Trh, Fezf1, Agrp, Calcr, Ghrh, Npy, Pmch, Pomc
Cerebellum, Granule Cells | Medulloblastoma, Ataxia, Cerebellar hypoplasia | Gabra6, Cbln3
Cerebellum, Purkinje Cell Layer | Spinocerebellar Ataxia, Autism, Plasticity | Pcp2, Hbegf, Icmt, Atp2a3, Casq2, Gdf10, Grid2, Hes3, Lhx1, Ptprm, A930006D11Rik
Basal Nucleus of Meynert | Acetylcholine System, Alzheimer Disease | Gal, Ngfr, Tac2, Lhx8, Ecel1, Gbx1, Lancl3, Ntrk1
Substantia Nigra | Dopamine System, Parkinson Disease | Ddc, Slc6a3, Ntsr1, Pitx3, Aldh1a1, Chrna6, Chrnb3, Th
Raphe Nuclei | Serotonin System, Depression | Fev, Gchfr, Slc6a4, Slc17a8, Tph2, Maob, Esr2
Locus Coeruleus | Norepinephrine System, Depression | Dbh, Maoa, Slc6a2, Slc18a2
Astroglia | Alzheimer Disease | Gfap, S100b, Slc1a2, Plaur, Gcm1, Gcm2, Serpina3n
Microglia (activated) | Alzheimer Disease, Amyotrophic Lateral Sclerosis | Cd68, Aif1, P2rx7, Sulf2
Microglia (constitutive) | Alzheimer Disease | Cx3cr1, Itgam
Oligodendroglia | Alzheimer Disease, Multiple Sclerosis | Olig1, Ugt8a, Cnp, Gjb1, Klk6, Mag, Apod, Enpp2, Fa2h, Mal, Mbp, Mobp, Mog, Olig2, Pllp, Plp1, Sox10, Tmem63a
Barrington's Nucleus | Pain | Crh, Fgfr1
Brainstem, Pons and Medulla | Pain | Slc6a5, Glra1, Pogz, Anxa4, Spp1, Esr1, Pou4f1, Slc4a2, Stac
Cortex, Anterior Cingulate | Pain | Egr1, Stmn1, Cckbr, Adcy1
Cortex, Somatosensory | Pain | Rspo1, Cyp39a1, Cartpt, Col5a1, Rorb, Loc433228, Gnb4
Cortex, Insula | Pain | Lxn, Ntng2, Nr4a2, Fezf2, Ttc9b
Hypothalamus, Paraventricular Nucleus | Pain | Avp, Oxt
Subthalamic Nucleus | Pain | Pitx2, Lmx1b
Blood Brain Barrier | Drug therapy | Abcb1a, Cldn5, Ednra, Fcgrt, Hspa12b, Lrp10, Lrp8, Rage, Slc2a1, Slc7a5, Slco1c1, Slc6a12, Slc28a2
GABAergic neurons | Schizophrenia, Bipolar Disorder | Vip*, Gpr88‡
* also listed as a cortex-specific gene
‡ also listed as a striatum-specific gene

Identification of over-represented GO terms among genes with region-enriched expression

The Gene Ontology (GO) resource [19] is a powerful tool to identify common functions shared by genes identified by high-throughput gene expression methods such as SAGE. We searched for over-representation of GO terms among our set of genes from each of three ontology classes: Biological Process, Molecular Function and Cellular Component (Methods). Of 237 genes in our selection, we found annotations for 216 genes in the whole mouse genome set of 18535 annotated genes (as of March 18, 2008). From this list, we determined the top 12 statistically over-represented GO terms [see Additional file 3]. Annotations for the test selection of genes were compared with GO annotations of the whole mouse genome. Significant biological processes involved nervous system development, transmission of nerve impulse, cell-cell signaling, neurogenesis, behavior etc. Significant molecular functions involved neuropeptide hormone activity, sequence-specific DNA binding, neurotransmitter receptor activity, steroid hormone receptor activity, neurotransmitter transporter activity etc. Products of some of these genes also tended to be localized in the extracellular region, plasma membrane, synapse, or within transcription factor complexes. Thus, it appears that many of the genes we identified have established neurological functions, which accounts for their regionally enriched expression. It is noteworthy that we found 28 transcription factor encoding genes representing 16 of 30 regions/cell-types of interest (Table 5). This information combined with identification of regulatory sequences within promoters of selected genes will aid the design of mini-promoters specific for each brain region. Because our selection of the 237 genes was biased towards those with known functions, we also carried out GO analysis on genes expressed in each of 18 SAGE libraries [see Additional file 4]. Specific neurological functions were less apparent among over-represented GO terms for these larger sets than for the 237 genes presented in this study.

Table 5: Regionally enriched genes encoding transcription factors
Gene | Transcription Factor Description | Associated Brain Region
Nr2f2 | Nuclear receptor subfamily 2, group F, member 2 | Amygdala, Basolateral Complex
Gbx1 | Gastrulation brain homeobox 1 | Basal Nucleus of Meynert
Lhx8 | LIM homeobox protein 8 | Basal Nucleus of Meynert
Esr1 | Estrogen receptor 1 | Brainstem (Pons and Medulla)
Pou4f1 | POU domain, class 4, transcription factor 1 | Brainstem (Pons and Medulla)
Lhx1 | LIM homeobox protein 1 | Cerebellum, Purkinje Cell Layer
Emx1 | Empty spiracles homeobox 1 | Cortex
Tbr1 | T-box brain gene 1 | Cortex
Egr1 | Early growth response 1/Zinc finger protein 225 | Cortex, Anterior Cingulate
Nr4a2 | Nuclear receptor subfamily 4, group A, member 2 | Cortex, Insula
Nr3c2 | Nuclear receptor subfamily 3, group C, member 2 | Hippocampus
Neurod6 | Neurogenic differentiation 6; Basic HLH transcription factor | Hippocampus, Ammon's Horn
Dlx1 | Distal-less homeobox 1 | Neurogenic
Dlx2 | Distal-less homeobox 2 | Neurogenic
Nr2e1 | Nuclear receptor subfamily 2, group E, member 1 | Neurogenic
Sox2 | SRY (sex determining region Y)-box 2 | Neurogenic
Esr2 | Estrogen receptor 2 | Raphe Nuclei
Foxp1 | Forkhead box P1 | Striatum
Rarb | Retinoic acid receptor, beta | Striatum
Pitx3 | Paired-like homeodomain transcription factor 3 | Substantia Nigra
Lmx1b | LIM homeobox transcription factor 1, beta | Subthalamic Nucleus
Pitx2 | Paired-like homeodomain transcription factor 2 | Subthalamic Nucleus
Lef1 | Lymphoid enhancer binding factor 1 | Thalamus
Tcf7l2 | Transcription factor 7-like 2 (T-cell specific, HMG-box) | Thalamus
Gcm1 | Glial cells missing homolog 1 | White Matter – Glia, Astrocytes
Gcm2 | Glial cells missing homolog 2 | White Matter – Glia, Astrocytes
Olig1 | Oligodendrocyte transcription factor 1 | White Matter – Glia, Oligodendroglia
Olig2 | Oligodendrocyte transcription factor 2 | White Matter – Glia, Oligodendroglia
Sox10 | SRY (sex determining region Y)-box 10 | White Matter – Glia, Oligodendroglia

Discussion

Targeting gene therapy to specific regions of the brain requires the application of well-defined promoters that can drive expression in a region-specific manner. In this study our goal was to identify regionally enriched transcripts in sub-structures/cell-types of the mouse brain with a particular focus on those brain regions associated with diseases. We were encouraged by findings from the ABA project that above-background expression was found for ~80% of genes assayed, and that approximately 70% of genes have been localized to fewer than 20% of all brain cells, suggesting that gene expression is clustered in small brain regions [14]. For a variety of reasons we believe that human orthologs of regionally enriched mouse genes would be good candidates from which to design promoters. First, at the genomic level, approximately 99% of mouse genes have an ortholog in the human genome [20]. Second, it has been shown that 84% of human-mouse orthologous gene pairs show significantly lower expression divergence than that of random gene pairs [21].
In another comparable study within the milieu of neurogenomics, it was demonstrated that there are significant constraints on the evolution of gene expression and nucleotide sequence of region-specific genes in the brains of humans and mice [1]. In general, transcripts that are regionally enriched in mice also appear to be regionally enriched in humans, further emphasizing conservation of mammalian brain gene expression. Nonetheless, we are exercising caution in assuming global conservation of expression across species as divergent as mouse and human, and will be testing multiple candidate genes for each region.

Our study profiles region-enriched gene expression within 17 key areas of the adult mouse brain by LongSAGE analysis. For the small number of brain regions for which we had no SAGE data we interrogated the literature and the ABA directly. We used several expression indicators including SAGE tag abundance and specificity, in situ hybridization, promoter-reporter fusion data etc. to assess candidacy of genes. Our data mining strategy was to start with SAGE-identified genes ranked on the basis of specificity and expression level, confirmed with supporting evidence from the literature, ABA or GENSAT. Although we prioritized finding genes displaying absolute regional specificity (no detectable background expression), for our data mining strategy to be practicable we did not limit ourselves to this level of stringency, especially for brain nuclei such as the Basal Nucleus of Meynert and Barrington's Nucleus. Therefore, we also selected genes that displayed the highest level of regional enrichment with the idea that promoters of such genes can be manipulated to produce desired specificity of expression, as reported by Machon et al. for the mouse Dach1 gene [22]. Compared to ubiquitous expression of the native Dach1 gene, a transgene with 5.8 kb of Dach1 regulatory sequence restricts β-galactosidase reporter expression within the mouse brain to the neocortex. Deletion analysis of this 5.8 kb fragment further delimited cortex-specific activity to a minimal 2.5 kb promoter region. From a total of about 30,000 mouse genes [20], we have identified a set of 237 genes displaying regional enrichment of expression.

Analysis of SAGE data to delineate transcriptome similarity among 17 selected brain tissues revealed segregation of a large cluster of neuronal tissues from discrete single clusters of non-neuronal tissues (Ependymal tissue and the highly myelinated Cerebellar White Matter tissue) and the neuronal outlier Cerebellar Purkinje Cell Layer. This pattern of tissue clustering appears to be borne out by unique tissue composition at the very least. Among neuronal tissues, tight clustering of the Ventral and Medial Thalamus regions is possibly a reflection of common diencephalic origin, although from a functional standpoint the two tissues can be considered to be different. The expression signature of a tissue may either independently confer tissue uniqueness, or itself depend on unique tissue composition, the surrounding cellular environment, or a combination of factors.

Other studies have also demonstrated the utility of gene expression patterns in assessing cytoarchitectural distinctness of rodent brain regions. During review of this manuscript another study was published that employed SAGE gene expression profiling to identify regional expression in 11 regions of the adult mouse brain [23]. Interestingly, regional enrichment of some transcripts was found to be conserved in the human brain. Microarray analysis of gene expression patterns in 24 neural tissues in the mouse central nervous system has mapped discrete brain domains based on such expression patterns [24]. Importantly, it was revealed that embryological imprinting is still evident in the adult brain. Microarray analysis has similarly identified molecular markers for neuronal subtypes in the adult mouse forebrain [25], in brain regions in each of eight strains of inbred mice [26], as well as in the adult rat CNS [27,28]. Fang et al. have shown that the most regionally discriminative genes are associated with one of four specific factors: regional myelin/oligodendrocyte levels, resident neuron types, neurotransmitter innervation profiles, and Ca2+-dependent signaling and second messenger systems [28].

By assessing over-representation of GO terms within our set of regionally expressed genes, we identified commonalities in molecular functions, cellular locations and involvement in key biological processes. This offers the promise of a unique set of molecular markers for each region/cell-type, and could potentially provide a mechanistic explanation of unique functions performed by discrete brain regions. Because of the disease application of our work, we were reassured by the over-representation of genes involved in neurotransmitter synthesis, reception and degradation. Importantly, we have also identified many regionally expressed transcription factor-encoding genes. This is consistent with previous findings of Suzuki et al., who identified region-specific transcription factors in 11 mouse brain regions by using medium-scale real-time RT-PCR [29]. They reported that 90% of known transcription factors display significant expression in at least one brain region. Additionally, it was found that 349 of over 1000 transcription factor and co-regulator genes, mapped by in situ hybridization in the brains of developing mice, show restricted expression patterns adequate to describe the anatomical organization of the mouse brain [30].

The identification of brain region-specific transcription factors is a prelude to explaining expression patterns of similarly enriched genes regulated by these factors.
Armed with this knowledge, we can now search for evidence of transcription factor co-regulation of genes by drawing on existing repositories of regulatory sequence collections [31-33]. In particular, the PAZAR system [33] has been employed to integrate transcription factor data and annotated regulatory sequences from the Pleiades Promoter Project. Additionally, given that much is already known about pathways that activate transcription factors, it would now be possible to identify pathways with which genes regulated by these transcription factors are associated. Indeed, a regulatory network comprising 15 important basic helix-loop-helix transcription factors and 153 target genes within the mouse brain has now been constructed [34]. From the perspective of the Pleiades Promoter Project, the identification of DNA-binding elements, transcription factors and pathways influencing their interaction will stand us in good stead for efficient mini-promoter design.

We encountered challenges during this study that are deserving of mention. In literature mining, curation was complicated by the existence of numerous synonyms for either mouse or human genes, references to a single protein rather than two distinct isoforms, or different genes with the same synonym. Furthermore, where genes were not represented on either ABA or GENSAT it was not possible to confirm expression, but nonetheless such genes were retained based on level and specificity of expression indicated by the literature or SAGE. Additionally, for a good number of genes there was low correlation between expression detected by SAGE and in situ hybridization. Despite the depth of sampling, expression of many genes was not detected by our SAGE procedure; e.g. Pde1b1, which has been shown to be strongly expressed in the striatum by in situ hybridization on ABA and in the literature [35]. Also, Hcrt appeared to be Hypothalamus-specific by SAGE but ABA indicated enrichment in the Hypothalamus with low level, widespread background expression. Although our SAGE procedure and ABA in situ hybridization profiled gene expression from the same mouse strain C57BL/6J, lack of correlation between the two could be due to inherent differences in the way RNA is processed and/or detected in these procedures. Nonetheless, Hcrt was retained in our study after considering significance of expression in SAGE analysis (P value = 0) and the description of minimal promoters in the literature [36,37].

Conclusion
We have successfully identified genes displaying region-enriched expression in the mouse brain by the application of SAGE and data mining from a variety of publicly available sources. These genes represent useful molecular markers that could potentially aid in unraveling the functions of representative brain regions/cell-types. Importantly, for the Pleiades Promoter Project, identification of these genes has brought us closer to our goal of designing well-defined human promoters for gene therapy. Indeed, we have further identified promoters of human orthologs of a subset of these mouse genes, and are now gearing up to test expression of reporter genes in transgenic mice (unpublished data). Ultimately, it will be of great interest to determine for how many of these promoters the mouse pattern of regional enrichment is recapitulated within the human brain, and which of these successfully remediate the disorders they may be designed for.

Methods

Mice
Mice used in our experiments were all adult male C57BL/6J mice (12-week old post-natal). All procedures used in these experiments were in accordance with the Canadian Council on Animal Care and approved by the University of British Columbia Animal Care Committee (A05-1748). All experiments were conducted in accordance with Canadian and International standards for animal care. All efforts were made to minimize the number and suffering of any animals used in these experiments.

Whole brain manual dissection and RNA extraction
Whole brains were manually dissected at room temperature from the intact bodies of mice. To minimize the effects of stress on gene expression, the mother and the entire litter remained in the family cage until harvest. Mice were removed one at a time and killed in a separate room by cervical dislocation. Tissue was immediately flash frozen in liquid nitrogen and stored at -80°C until further processing. Frozen tissue was disrupted and homogenized for 30 seconds with a Polytron® PT 1200CL hand-held homogenizer (Kinematica AG, through Brinkmann™ Instruments Inc, Mississauga, Canada) at a setting of 3 (~13,000 RPM), which had been equipped with a 7-mm easy-care generator (PT-DA 1207/2EC). Total RNA was extracted using RNeasy Lipid Tissue Mini Kit (Qiagen Inc., Mississauga, Canada), following the manufacturer's protocol with the modification of using 1.5-ml Phase Lock Gel™ Heavy Tube (Eppendorf Scientific, through Fisher Scientific, Ottawa, Canada) for more robust phase separation. Also, while on the column, samples underwent DNase I treatment during RNA extraction. Standard care was used to avoid RNA degradation: reagents were prepared with diethyl pyrocarbonate (DEPC)-treated water and all surfaces and equipment were treated with an RNase decontamination solution (RNaseZap® and RNaseZap® Wipes; Ambion Inc., Austin, Texas, USA).
The quality and quantity of the RNA samples were tested on an Agilent 2100 Bioanalyzer with the RNA 6000 Nano LabChip® Kit (Agilent Technologies Canada Inc., Mississauga, Canada).

Harvesting adult brain regions by Laser Capture Microdissection (LCM)
Brains (1–3 per region; exception: 7 per Ependymal and Subependymal Layers), recovered as above, were immediately frozen on dry ice and mounted in OCT (Optimal Cutting Temperature) embedding medium. For the Visual Cortex (SM147), Cerebellar White Matter (SM182), Dorsal Striatum (SM195), and Cerebellar Purkinje cells (SM196), sagittal sections were processed, while coronal sections were used for the remaining tissues. Cryosections (20 μm) of fresh-frozen tissues were mounted onto RNase-free membrane slides (Molecular Machines & Industries AG (MMI), Glattbrugg, Switzerland) manufactured for LCM. To identify the desired regions for processing by LCM, each slide was individually stained with a modified Nissl-substance stain using cresyl violet (CV) dye (Polysciences, Inc., Warrington, PA) as follows: slide-mounted sections were air-dried for 2–3 min and the surrounding OCT medium was rinsed off with 1× PBS (made with DEPC water). Tissue was fixed for 30 sec with 75% ethanol, stained for 1 min with 0.5% CV, then sequentially rinsed for 5–10 sec with 75%, 95%, and 100% ethanol. After air-drying for 2–3 min, sections were immediately dissected with the SL μCUT system (MMI AG; Glattbrugg, Switzerland) under the 10× objective of a Nikon Eclipse TE2000-S, at laser power < 70 mV, for no longer than 15 min. The cut regions were collected onto the adhesive cap of a 500-μl microfuge tube (MMI AG, Glattbrugg, Switzerland) designed for the SL μCUT system, digested with 30 μl lysis buffer RLT (RNeasy Micro Kit; Qiagen Inc., Mississauga, Canada), and transferred from the cap to the vial. The samples were vortexed, centrifuged for 5 sec, and then stored at -80°C until RNA extraction (as above).
High-quality samples were pooled within groups for SAGE library generation.

SAGE library preparation
The LongSAGE-Lite method was used to construct the libraries as previously described [5]. In brief, first strand cDNA was synthesized with Powerscript Reverse Transcriptase (Clontech, BD Biosciences, Mississauga, Canada) and LITE1/LITE TS primer mix (Invitrogen, Carlsbad, CA) using 15–120 ng of DNase-treated total RNA, and amplified by a 20-cycle PCR according to the SAGE-Lite method [38]. SAGE-Lite biochemistry for the generation of full-length cDNA libraries is based upon the SMART (Switching Mechanism At the 5' end of RNA Transcripts) cDNA synthesis strategy (Clontech, BD Biosciences, Mississauga, Canada). Following amplification, the cDNA was processed according to an adaptation of the standard LongSAGE protocol using the I-SAGE Long kit (Invitrogen, Carlsbad, CA). The SAGE protocol includes steps of anchoring by NlaIII, tagging by MmeI, and generating 131 bp ditags by T4 DNA ligase. The 131 bp ditags were amplified using the scale-up PCR varying from 23–27 cycles depending on the optimal scale-up condition as described in the protocol, and were digested with NlaIII to remove adapter sequences. Purified 36-bp ditags were ligated to form concatemers that were cloned into SphI-digested pZErO-1 vector (Invitrogen, Carlsbad, CA), and transformations were done using One Shot DH10B T1 electrocompetent E. coli (Invitrogen, Carlsbad, CA).

After transformants had been screened by colony PCR, the fraction containing concatemers of sizes ranging from 900 bp to 1300 bp was chosen for sequencing. Colonies were picked using a Q-Pix robot (Genetix, Beaverton, OR) and inoculated into 2xYT media with Zeocin (50 μg/ml) and glycerol (7.5%). After overnight culture, glycerol stocks were used to inoculate larger volume cultures for plasmid preparation, carried out using a standard alkaline-lysis procedure adapted for high-throughput processing with microtiter plates. DNA sequencing was performed with BigDye v3.1 dye terminator cycle sequencing reactions run on Tetrad thermal cyclers (MJ Research, Waltham, MA). Products from the sequencing reaction were purified by ethanol precipitation and then run on capillary DNA sequencers (Model 3730xl, Applied Biosystems, Foster City, CA).

Following inspection of data quality from a first 384-well sequencing plate, each library was sequenced to a depth of > 100,000 raw tags. The resulting sequence data were collected automatically and processed by both trimming the reads for sequence quality and removing sequences from non-recombinant clones, vector DNA and linker-derived tags. Processed data can be found on the Mouse Atlas website (please see Availability & requirements for more information).

SAGE data analysis
To obtain high quality SAGE tags for this study, all raw SAGE tags underwent a three-step cluster modification process developed by Siddiqui et al. [8]. In the first step, we calculated for each tag a P value based on the Phred quality score [39] to identify single nucleotide variants likely to originate from sequencing error. In the second step, we used tag sequence clustering to group such variants to combine tags likely to originate from a common transcript. Thus, some singletons were clustered and counted as a more abundant tag. The third step was to filter out low quality tags and compare each P value to a meta-library P value calculated from all SAGE libraries.

Tag-to-gene-mapping was then carried out using the DiscoverySpace 4.0 application [40]. All cluster-modified tags were then mapped to transcripts in the NCBI Reference Sequence Collection [41]. The remaining unmapped tags were mapped to transcripts in the Mammalian Gene Collection [42], followed by the Ensembl database [43].
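The mapping cascade described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration only, not the DiscoverySpace implementation: the function name `map_tag`, the dictionary-based indexes, and the tag and transcript identifiers are all hypothetical, and each index is assumed to hold sense-strand matches only. The rule for discarding ambiguous tags follows the sentence that comes next in the text.

```python
def map_tag(tag, databases):
    """Map one tag to a transcript, or return None if unmapped or ambiguous.

    databases: ordered list of (name, index) pairs; each index is a dict from
    tag sequence to the set of transcript IDs it matches (sense strand only).
    """
    # A tag matching more than one transcript in any database is discarded.
    if any(len(index.get(tag, ())) > 1 for _, index in databases):
        return None
    # Otherwise take the first database, in priority order, with a hit.
    for name, index in databases:
        hits = index.get(tag, ())
        if hits:
            return name, next(iter(hits))
    return None


# Toy indexes with made-up tags and identifiers (not real data).
refseq = {"CATGAAAA": {"NM_0001"}, "CATGCCCC": {"NM_0002", "NM_0003"}}
mgc = {"CATGGGGG": {"BC_0001"}}
ensembl = {"CATGTTTT": {"ENSMUST_0001"}}
databases = [("RefSeq", refseq), ("MGC", mgc), ("Ensembl", ensembl)]

for tag in ("CATGAAAA", "CATGCCCC", "CATGGGGG", "CATGTTTT", "CATGACGT"):
    print(tag, map_tag(tag, databases))
```

Checking ambiguity across all databases before looking for a hit keeps the discard rule independent of the database priority order.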
Only sense transcripts and unique mappings were considered, and tags that mapped to more than one transcript in any of the three transcript databases were discarded. The three mapping results were subsequently merged based on gene symbol.

For each gene, a P value was assigned to each target (TL; brain region of interest) and off-target (OTL; background region) library pair using the P value option in DiscoverySpace. The P value was computed based on Audic-Claverie algorithms [44] to assess confidence level of differential expression between two transcript libraries. A ranking system was implemented to facilitate selection of candidate genes with specific or enriched expression in each target library (Table 2). Region-specific transcripts were obtained by selecting transcripts detected with 5 tags or more only in one target library. To identify region-enriched transcripts, those detected in one target library and one off-target library (P(TL-OTL) value ≤ 0.05) were selected. Transcripts detected in multiple libraries were ranked based on pre-defined P value limits of differential expression (P(TL-TL), P(TL-OTL)), as well as additional criteria such as target and off-target library counts. Transcripts whose expression patterns did not fit these criteria were not ranked.

To analyze transcriptome similarity of tissues, a dendrogram was generated using MATLAB 7 (The MathWorks, Natick, MA) based on hierarchical clustering using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA). The input data is a list of objects (tissue SAGE libraries) with their pair-wise distances (expression divergence ED; see below), and the output is a dendrogram. Initially, each object is in its own cluster; then, at each step of the hierarchical clustering the nearest two clusters are combined into a higher-level cluster.
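As a rough illustration of the pairwise test and the clustering procedure described above, the following Python sketch strings the steps together. It is not the DiscoverySpace or MATLAB code used in the study. The Audic-Claverie probability of seeing y tags in a library of size N2, given x tags in a library of size N1, is p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)) [44]. Everything else is an assumption made for the sketch: the function names, the two-sided rule (twice the smaller cumulative tail), the 0.05 cut-off, the use of summed gene counts as library size, and the toy counts. Expression divergence is taken as the fraction of shared genes that differ significantly, as defined in the text that follows.

```python
from itertools import combinations
from math import exp, lgamma, log


def ac_prob(x, y, n1, n2):
    """Audic-Claverie probability of y tags in library 2 given x in library 1."""
    r = n2 / n1
    return exp(y * log(r) + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
               - (x + y + 1) * log(1 + r))


def ac_p_value(x, y, n1, n2):
    """Two-sided P value (assumed rule: twice the smaller tail, capped at 1)."""
    lower = sum(ac_prob(x, k, n1, n2) for k in range(y + 1))   # P(Y <= y)
    upper = max(0.0, 1.0 - lower + ac_prob(x, y, n1, n2))      # P(Y >= y)
    return min(1.0, 2.0 * min(lower, upper))


def expression_divergence(lib_a, lib_b, p_cutoff=0.05):
    """ED(p) = Ndiff(p) / N, computed over genes detected in both libraries."""
    shared = set(lib_a) & set(lib_b)
    if not shared:
        raise ValueError("libraries share no genes")
    n_a, n_b = sum(lib_a.values()), sum(lib_b.values())
    n_diff = sum(ac_p_value(lib_a[g], lib_b[g], n_a, n_b) <= p_cutoff
                 for g in shared)
    return n_diff / len(shared)


def upgma(names, dist):
    """Average-linkage clustering; dist maps frozenset({a, b}) to a distance.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in names]
    merges = []

    def average(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((a, b, average(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy gene counts for three hypothetical libraries (not real data).
libs = {
    "lib1": {"g1": 50, "g2": 10, "g3": 40},
    "lib2": {"g1": 48, "g2": 12, "g3": 40},
    "lib3": {"g1": 5, "g2": 10, "g3": 85},
}
names = list(libs)
dist = {frozenset((a, b)): expression_divergence(libs[a], libs[b])
        for a, b in combinations(names, 2)}
for merge in upgma(names, dist):
    print(merge)
```

Recomputing each cluster-to-cluster distance from the original pairwise values keeps the averaging unweighted, which is what distinguishes UPGMA from weighted average linkage.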
The distance between any two clusters A and B is taken to be the average of all distances between pairs of objects in A and B. Thus, we defined pair-wise distance or expression divergence (ED) between any two tissues as the fraction of differentially expressed genes in their corresponding SAGE libraries, using the formula:

ED(p) = Ndiff(p)/N

(Ndiff(p) = number of differentially expressed genes for a given P value, N = number of shared genes between two corresponding libraries).

Semi-automated Literature mining
All synonyms for 28,000 mouse genes were obtained from Entrez (RefSeq release 14) combined with Ensembl (build 34) of the mouse genome. Synonyms for the human orthologs were obtained using Compara (Ensembl build 34) to identify similarities between human and mouse together with Homologene (version 47) for homolog detection. In each case, Ensembl and Entrez were used as cross-references for gene identifiers. From these search strings, all names found in the English dictionary were subtracted to remove obfuscating gene terms such as "Ice". Abstracts were parsed from Medline (extraction performed September 7, 2006) and the complete text of articles was parsed from PubMed Central [45], and converted into individual sentences using the medical sentence parser [46]. Each sentence was searched for the co-occurrence of gene names with brain regions of interest. For each brain region, expanded search terms were applied referring to finer structures appropriate to the region as defined by the ontology available from the Allen Brain Atlas website [13]. The number of sentences with gene names and brain regions obtained is greater than the number of sentences with only brain regions because of the plural nature of both search terms. We scrutinized retrieved publications for details indicating regionally enriched/specific expression in a brain region.

Gene Ontology over-representation analysis
Gene Ontology [19] over-representation analysis was performed for the 237 genes using the BiNGO [47] plug-in for the Cytoscape [48] software package. Significance of over-representation of GO terms was calculated using the hypergeometric test, corrected for multiple testing with a Benjamini & Hochberg false discovery rate correction [49], and a cut-off of 0.05 was applied to the result. The test selection of 237 genes was compared to all GO annotated genes in the mouse genome (18535 genes, as of March 18, 2008).

Abbreviations
SAGE: Serial Analysis of Gene Expression; LCM: Laser Capture Microdissection; OCT: Optimal Cutting Temperature; CV: Cresyl violet; DEPC: Diethyl Pyrocarbonate; ABA: Allen Brain Atlas; BAC: Bacterial Artificial Chromosome; GENSAT: Gene Expression Nervous System Atlas; UPGMA: Unweighted Pair Group Method with Arithmetic Mean; ED: Expression Divergence; GO: Gene Ontology.

Availability & requirements
The Pleiades Promoter Project: http://www.pleiades.org
ABA website: http://www.brain-map.org; Seattle (WA): Allen Institute for Brain Science © 2004–2007; in accordance with ABA Terms of Use and Citation Policy.
Mouse Atlas website: http://www.mouseatlas.org/data/supplemental/brain_tags_processed

Authors' contributions
CAD analyzed SAGE data, ABA in situ hybridization data and mined the PubMed database to identify region-enriched genes, carried out GO analysis, and drafted this manuscript. VC analyzed SAGE data, ABA in situ hybridization data, mined the PubMed database to identify region-enriched genes, and contributed to the compilation of gene expression summaries in Additional file 2. DG confirmed candidature of SAGE and literature mined genes by inspecting ABA images. RV and LLCL performed bioinformatics analysis of SAGE data. Y–YX and SB laser-microdissected tissues for construction of SAGE libraries. YZ participated in SAGE library construction.
MB did the hierarchical clustering analysis of tissue transcriptomes utilizing the java script written by AH. EP–C participated in data mining and selection of region-enriched genes. EMS conceived of the study, and participated in its design and coordination along with WWW, DG, MAM, RAH, and SJMJ. All authors read and approved the final manuscript.

Additional material
Additional file 1: Compilation of SAGE data for 237 regionally enriched genes. [http://www.biomedcentral.com/content/supplementary/1471-2202-9-66-S1.xls]
Additional file 2: Summary of expression profiles of region-specific or enriched genes by sub-anatomical region. [http://www.biomedcentral.com/content/supplementary/1471-2202-9-66-S2.doc]
Additional file 3: Top 12 over-represented GO terms in each ontology category among the 237 regionally enriched genes. [http://www.biomedcentral.com/content/supplementary/1471-2202-9-66-S3.xls]
Additional file 4: Top 10 over-represented GO terms in each ontology category among the genes in each of 18 SAGE libraries. [http://www.biomedcentral.com/content/supplementary/1471-2202-9-66-S4.xls]

Acknowledgements
We wish to acknowledge financial support from the BC Cancer Foundation, Genome British Columbia, Genome Canada, UBC Institute of Mental Health, Child and Family Research Institute, UBC Office of the Vice President Research, BC Mental Health and Addiction Services, GlaxoSmithKline R & D Ltd., and Canada Research Chair in Genetics and Behaviour (to E.M.S). We are grateful to the SAGE Library Construction Group and the Sequencing Group at the Genome Sciences Centre for technical assistance. We would like to thank Charles De Leeuw for useful comments on this manuscript, and Tracey Weir and Russell Watkins for editorial assistance with the manuscript. S.J.M.J, R.A.H, W.W.W and M.A.M are Michael Smith Foundation for Health Research Scholars. WWW is also a CIHR New Investigator.

References
1. Strand AD, Aragaki AK, Baquet ZC, Hodges A, Cunningham P, Holmans P, Jones KR, Jones L, Kooperberg C, Olson JM: Conservation of regional gene expression in mouse and human brain. PLoS genetics 2007, 3(4):e59.
2. Wasserman WW, Palumbo M, Thompson W, Fickett JW, Lawrence CE: Human-mouse genome comparisons to locate regulatory sites. Nature genetics 2000, 26(2):225-228.
3. Gotz J, Probst A, Spillantini MG, Schafer T, Jakes R, Burki K, Goedert M: Somatodendritic localization and hyperphosphorylation of tau protein in transgenic mice expressing the longest human brain tau isoform. Embo J 1995, 14(7):1304-1313.
4. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial analysis of gene expression. Science 1995, 270(5235):484-487.
5. Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu AL, Ma K, et al.: Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines. Genome research 2007, 17(1):108-116.
6. Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE: Using the transcriptome to annotate the genome. Nat Biotechnol 2002, 20(5):508-512.
7. The Mouse Atlas of Gene Expression Project [http://www.mouseatlas.org]
8. Siddiqui AS, Khattra J, Delaney AD, Zhao Y, Astell C, Asano J, Babakaiff R, Barber S, Beland J, Bohacec S, et al.: A mouse atlas of gene expression: large-scale digital gene-expression profiles from precisely defined developing C57BL/6J mouse tissues and cells. Proceedings of the National Academy of Sciences of the United States of America 2005, 102(51):18485-18490.
9. The Pleiades Promoter Project [http://www.pleiades.org/]
10. The NCBI PubMed Database [http://www.pubmed.com]
11. De Bruijn B, Martin J: Getting to the (c)ore of knowledge: mining biomedical literature. Int J Med Inform 2002, 67(1–3):7-18.
12. Scherf M, Epple A, Werner T: The next generation of literature analysis: integration of genomic analysis into text mining. Brief Bioinform 2005, 6(3):287-297.
13. The Allen Brain Atlas Database [http://www.allenbrainatlas.org/]
14. Lein ES, Hawrylycz MJ, Ao N, Ayres M, Bensinger A, Bernard A, Boe AF, Boguski MS, Brockway KS, Byrnes EJ, et al.: Genome-wide atlas of gene expression in the adult mouse brain. Nature 2007, 445(7124):168-176.
15. McCarthy M: Allen Brain Atlas maps 21,000 genes of the mouse brain. Lancet Neurol 2006, 5(11):907-908.
16. Simone NL, Bonner RF, Gillespie JW, Emmert-Buck MR, Liotta LA: Laser-capture microdissection: opening the microscopic frontier to molecular analysis. Trends Genet 1998, 14(7):272-276.
17. Nakai J, Fujita S: Early events in the histo- and cytogenesis of the vertebrate CNS. The International journal of developmental biology 1994, 38(2):175-183.
18. The GENSAT Database [http://www.gensat.org]
19. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics 2000, 25(1):25-29.
20. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al.: Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420(6915):520-562.
21. Liao BY, Zhang J: Evolutionary conservation of expression profiles between human and mouse orthologous genes. Molecular biology and evolution 2006, 23(3):530-540.
22. Machon O, Bout CJ van den, Backman M, Rosok O, Caubit X, Fromm SH, Geronimo B, Krauss S: Forebrain-specific promoter/enhancer D6 derived from the mouse Dach1 gene controls expression in neural stem cells. Neuroscience 2002, 112(4):951-966.
23. Brochier C, Gaillard MC, Diguet E, Caudy N, Dossat C, Segurens B, Wincker P, Roze E, Caboche J, Hantraye P, et al.: Quantitative gene expression profiling of mouse brain regions reveals differential transcripts conserved in human and affected in disease models. Physiological genomics 2008, 33(2):170-179.
24. Zapala MA, Hovatta I, Ellison JA, Wodicka L, Del Rio JA, Tennant R, Tynan W, Broide RS, Helton R, Stoveken BS, et al.: Adult mouse brain gene expression patterns bear an embryologic imprint. Proceedings of the National Academy of Sciences of the United States of America 2005, 102(29):10357-10362.
25. Sugino K, Hempel CM, Miller MN, Hattox AM, Shapiro P, Wu C, Huang ZJ, Nelson SB: Molecular taxonomy of major neuronal classes in the adult mouse forebrain. Nat Neurosci 2006, 9(1):99-107.
26. Letwin NE, Kafkafi N, Benjamini Y, Mayo C, Frank BC, Luu T, Lee NH, Elmer GI: Combined application of behavior genetics and microarray analysis to identify regional expression themes and gene-behavior associations. J Neurosci 2006, 26(20):5277-5287.
27. Stansberg C, Vik-Mo AO, Holdhus R, Breilid H, Srebro B, Petersen K, Jorgensen HA, Jonassen I, Steen VM: Gene expression profiles in rat brain disclose CNS signature genes and regional patterns of functional specialisation. BMC genomics 2007, 8:94.
28. Fang H, Tong W, Shi L, Jakab RL, Bowyer JF: Classification of cDNA array genes that have a highly significant discriminative power due to their unique distribution in four brain regions. DNA and cell biology 2004, 23(10):661-674.
29. Suzuki H, Okunishi R, Hashizume W, Katayama S, Ninomiya N, Osato N, Sato K, Nakamura M, Iida J, Kanamori M, et al.: Identification of region-specific transcription factor genes in the adult mouse brain by medium-scale real-time RT-PCR. FEBS letters 2004, 573(1–3):214-218.
30. Gray PA, Fu H, Luo P, Zhao Q, Yu J, Ferrari A, Tenzen T, Yuk DI, Tsung EF, Cai Z, et al.: Mouse brain organization revealed through direct genome-scale TF expression analysis. Science 2004, 306(5705):2255-2257.
31. Sandelin A, Wasserman WW: Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics. Journal of molecular biology 2004, 338(2):207-215.
32. Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED, Prychyna Y, Zhang X, Jones SJ: ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics (Oxford, England) 2006, 22(5):637-640.
33. Portales-Casamar E, Kirov S, Lim J, Lithwick S, Swanson MI, Ticoll A, Snoddy J, Wasserman WW: PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation. Genome Biol 2007, 8(10):R207.
34. Li J, Liu ZJ, Pan YC, Liu Q, Fu X, Cooper NG, Li YX, Qiu MS, Shi TL: Regulatory module network of basic/helix-loop-helix transcription factors in mouse brain. Genome Biol 2007, 8(11):R244.
35. Polli JW, Kincaid RL: Expression of a calmodulin-dependent phosphodiesterase isoform (PDE1B1) correlates with brain regions having extensive dopaminergic innervation. J Neurosci 1994, 14(3 Pt 1):1251-1261.
36. Sakurai T, Moriguchi T, Furuya K, Kajiwara N, Nakamura T, Yanagisawa M, Goto K: Structure and function of human prepro-orexin gene. The Journal of biological chemistry 1999, 274(25):17771-17776.
37. Waleh NS, Apte-Deshpande A, Terao A, Ding J, Kilduff TS: Modulation of the promoter region of prepro-hypocretin by alpha-interferon. Gene 2001, 262(1–2):123-128.
38. Peters DG, Kassam AB, Yonas H, O'Hare EH, Ferrell RE, Brufsky AM: Comprehensive transcript analysis in small quantities of mRNA by SAGE-lite. Nucleic acids research 1999, 27(24):e39.
39. Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome research 1998, 8(3):186-194.
40. Robertson N, Oveisi-Fordorei M, Zuyderduyn SD, Varhol RJ, Fjell C, Marra M, Jones S, Siddiqui A: DiscoverySpace: an interactive data analysis application. Genome Biol 2007, 8(1):R6.
41. The NCBI Reference Sequence Collection [http://www.ncbi.nlm.nih.gov/RefSeq]
42. The Mammalian Gene Collection [http://mgc.nci.nih.gov]
43. The Ensembl Database [http://www.ensembl.org]
44. Audic S, Claverie JM: The significance of digital gene expression profiles. Genome research 1997, 7(10):986-995.
45. Greenberg DS: National Institutes of Health moves ahead with "PubMed Central". Lancet 1999, 354(9183):1009.
46. Berman JJ: Improved Medical Sentence Parser. Arch Pathol Lab Med 2003, 127:789-813.
47. Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics (Oxford, England) 2005, 21(16):3448-3449.
48. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research 2003, 13(11):2498-2504.
49. Benjamini Y, Hochberg Y: Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B 1995, 57:289-300.
