Meta-analysis of human methylomes reveals stably methylated sequences surrounding CpG islands associated… Edgar, Rachel; Tan, Powell P C; Portales-Casamar, Elodie; Pavlidis, Paul Oct 23, 2014

Meta-analysis of human methylomes reveals stably methylated sequences surrounding CpG islands associated with high gene expression

Rachel Edgar1,2, Powell Patrick Cheng Tan2, Elodie Portales-Casamar2 and Paul Pavlidis2*

Edgar et al. Epigenetics & Chromatin 2014, 7:28

Open Access

* Correspondence: paul@chibi.ubc.ca
2Centre for High-Throughput Biology and Department of Psychiatry, University of British Columbia, 2890 E Mall, Vancouver, BC V6T 1Z4, Canada
Full list of author information is available at the end of the article

© 2014 Edgar et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativec[…]), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Abstract

Background: DNA methylation is thought to play an important role in the regulation of mammalian gene expression, partly based on the observation that a lack of CpG island methylation in gene promoters is associated with high transcriptional activity. However, the CpG island methylation level only accounts for a fraction of the variance in gene expression, and methylation in other domains is hypothesized to play a role. We hypothesized that regions of very high stability in methylation would exist and provide biological insight into the role of methylation both within and outside CpG islands.

Results: We set out to identify highly stable regions in the human methylome, based on the subset of CpGs assayed with an Illumina Infinium 450 K array. Using 1,737 samples from 30 publicly available studies, we identified 15,224 CpGs that are ‘ultrastable’ in their state across tissues and developmental stages (974 always methylated; 14,250 always unmethylated). Further analysis of ultrastable CpGs led us to identify a novel subset of CpG islands, ‘ravines’, which exhibit a markedly consistent pattern of low methylation with highly methylated flanking shores and shelves. We distinguish ravines from other CpG islands characterized by a broader flanking region of low methylation. Interestingly, ravines are associated with higher gene expression compared to typical unmethylated CpG islands, and are more often found near housekeeping genes.

Conclusions: The identification of ultrastable sites in the human methylome led us to identify a subclass of CpG islands characterized by a very stable pattern of methylation encompassing the island and flanking regions, established early in development and maintained through differentiation. This pattern is associated with particularly high levels of gene expression, providing new evidence that methylation beyond the CpG island could play a role in gene expression.

Background
[…] expression, normal cellular function and disease [1]. Variation in the methylation state of DNA across cell types, developmental stages and physiological or disease conditions is of intense interest to understanding mammalian gene regulation. To this end, numerous studies have been carried out to measure DNA methylation states among cell types or conditions at the resolution of single cytosine guanine dinucleotides (CpGs). Currently, the field is undergoing an explosion of characterization of methylomes, leading to a growing but still highly incomplete understanding of the relationships among methylation, gene regulatory sequences, and other epigenetic marks such as histone acetylation or methylation. However, even with massive efforts such as ENCODE [2], numerous gaps in our knowledge exist, particularly in the variation (and functional significance) of epigenetic states across multiple cell types and conditions.

Early studies focused on CpG islands (CGIs), defined as short (approximately 1 kb) regions of high CpG density in an otherwise CpG-sparse genome [3]. Many CGIs are associated with gene promoters [4,5], and methylation at […] is associated with repression of transcription [6,7]. The conceptually simplest approach is to divide chromosomes into domains or clusters of similar methylation states and correlate such domains with the location of genes or their […]. More recently, the utility of the concept of the CGI has been challenged as it has become more technologically feasible to directly measure methylation, rather than relying on inferred states based on CpG density [8]. Genome-wide analysis has thus helped define a growing geography of biologically significant methylation patterns besides that associated with CGIs near promoters. CGI ‘shores’, defined as the 2 kb of sequence flanking a CGI, have been reported to be more dynamic than the CGI itself [9,10]. Beyond shores are ‘shelves’ [11] and ‘open sea’ sites [12]. More recently, large DNA methylation ‘valleys’ and ‘canyons’ of low methylation have been identified [13-15]. Other domains, identified in tumor cells, are termed ‘low-methylated regions’ (LMRs) and ‘long-range epigenetic activation’ (LREA) or silencing (LRES) domains of relatively low or high methylation [16-18].
We note that the definition of these domains inevitably relies on investigator-specified parameters of length and methylation level, and they are not mutually exclusive; for example, canyons often overlap CGIs. In addition, the relative stability of domains such as LREAs and canyons across cell types and conditions is still not completely documented.

In general, the largest changes in DNA methylation are seen during development, which involves global methylation erasure and reestablishment [19], and in cancer, which is characterized by extensive and often gene-specific changes compared to normal tissues [20]. Beyond this, many studies have emphasized the general stability of the methylome. Even between different tissues or tumor types, the number of differentially methylated CpGs reported ranges from 0.5% to 20% (depending in part on the statistical tests and significance cut-offs; [21,22]). Understanding which sites and domains are relatively static or dynamic is an important step to assigning function to DNA methylation.

Because many previous studies focused on differences in methylation across conditions or cell types, there is likely to be additional information on stability waiting to be identified. Here we analyze a large collection of DNA methylation data to identify a set of ultrastable CpG sites. We associate many of these sites with a novel subset of CGIs we refer to as ‘ravines’, which tend to be near housekeeping genes and associated with high expression activity and open chromatin states. We propose a new classification of CGIs that takes into account the methylation state of the island as well as the shores and shelves.

Results

Ultrastable DNA methylation sites
Our initial analysis was to identify CpGs that have a consistent methylation state across all available tissue, developmental stage and disease variation. To do this, we took advantage of the large amount of data available from the Illumina Infinium HumanMethylation450 BeadChip (450 K) [11].
The 450 K assays 485,577 CpGs in the human genome and is widely used in methylation studies, many of which are publicly available through the Gene Expression Omnibus (GEO; [23]). Careful quality control (see Methods) yielded a set of 1,737 samples from 30 different GEO series (a series typically reflects a single publication, [see Additional file 1: Table S1]), covering 26 tissue types and a wide range of conditions (Figure 1A and [see Additional file 1: Table S2]). We used a simple but stringent computational approach to identify candidate CpGs that were consistently methylated or unmethylated in all samples (see Methods). Based on this analysis, 974 CpGs were considered consistently methylated in every sample and 14,250 consistently unmethylated (Figure 1B, [see Additional file 1: Table S3 and Additional file 2]). Together, we refer to these as ‘ultrastable’ CpGs. These represent 3.1% of the CpG sites measured on the 450 K. A less stringent definition of ‘ultrastable’ would expand this set, but for our initial analysis we considered these as our starting pool.

One concern is that the apparent stability of a CpG might be a function of the platform and methodology. We therefore checked the methylation state of the ultrastable CpGs in the ENCODE reduced representation bisulfite sequencing (RRBS) data as validation. The 1.2 million CpGs measured in the ENCODE RRBS data include 17% of the sites assayed by the 450 K, including 5,063 (33%) of the ultrastable CpGs. Of the 121 ultrastable methylated and 4,942 ultrastable unmethylated CpGs of interest for which there was data available in ENCODE RRBS data, 80% and 98% were methylated and unmethylated, respectively, in 90% of RRBS samples [see Additional file 1: Figure S2]. The agreement of ENCODE RRBS data with our results was correlated with sequencing depth, so that higher-quality ENCODE sites tended to agree more closely with our methylation calls (that is, failures to verify tended to be poorly-covered sites in the ENCODE data).
This suggests that the large majority of the ultrastable CpGs are not merely artifacts of the 450 K. We further tested whether these CpGs might be giving erroneous measurements due to unusual resistance or sensitivity to bisulfite conversion [24], which is used both by the 450 K and RRBS methods. We examined the status of CpGs assayed on the 450 K in methylation-sensitive restriction enzyme sequencing (MRE-Seq; which extracts unmethylated regions of the genome) and methylated DNA immunoprecipitation sequencing (MeDIP-Seq; which extracts methylated regions of the genome) data, as neither technique involves a bisulfite conversion. We found that the ultrastable unmethylated CpGs have a significantly higher average read count in the MRE-Seq data than the other 450 K CpGs (P <0.001), confirming their stably unmethylated status. Similarly, the ultrastable methylated CpGs had a significantly higher average read count in the MeDIP-Seq data than the other 450 K CpGs (P <0.01), confirming their stably methylated status. This analysis confirms that ultrastable CpGs are seen in both bisulfite-treated and non-bisulfite-treated data [see Additional file 1: Figure S3]. Additionally, we examined the ultrastable CpGs in data sets that purposefully manipulated methylation, either by direct enzymatic treatment of the DNA, or by genetic knockout of DNA methyltransferases. This analysis showed that under appropriate conditions, the ultrastable sites can be measured in their opposite state. This suggests that there is no inherent problem with the ultrastable CpGs being measured at either methylation state, but that under a wide range of biological conditions, the CpGs are always in one state.

[Figure 1, panels A to D. Panel A labels (sample counts), disease state: Healthy (968), Arthritis (354), Cancer (317), Schizophrenia (62), Crohn's Disease (16), Ulcerative Colitis (11), HGP and Werner (7), […] Syndrome (2); tissue and germ layer: Blood (1,114), Other (126), Trophoblast (90), Lung (82), Stem Cell (78), Oral (76), Colon (69), Bone Marrow (38), Brain (34), Mesoderm (1,248), Endoderm (266), Ectoderm (34). Panel B: beta value by CpG state (ultrastable methylated, ultrastable unmethylated, not ultrastable). Panel C: mean beta across 1,737 samples by resort feature (N. Shelf, N. Shore, CGI, S. Shore, S. Shelf) for resorts with both ultrastable CpG types (85), with unmethylated ultrastable CpGs (5,598), with methylated ultrastable CpGs (395) and with no ultrastable CpGs (21,098). Panel D: methylation, 450 K probes, resort feature and CpG density tracks (hg19, chr12) for MARS and TBX5.]

Figure 1 Ultrastable cytosine guanine dinucleotides (CpGs) highlight a novel class of CpG islands (CGI). (A) Counts of the 450 K samples used in the analysis, by disease, tissue and germ layer. [See Additional file 1: Table S2 for complete list of tissue types used]. (B) Representative CpGs of the methylation stability states (not ultrastable, ultrastable unmethylated and ultrastable methylated). Points represent an individual sample. Color scheme for ultrastable CpGs is maintained throughout the paper. (C) Ultrastable CpGs allow observation of a unique resort methylation pattern. Composite profiles are shown for all 27,176 resorts on the 450 K. As CGIs have variable lengths, the CpG position within a CGI is shown here as relative to the length of the CGI. CGIs are plotted as 935.23 bp (mean length of all CGIs measured on the 450 K). Beyond the CGI boundaries on the plot (that is, start at 0 and end at 935.23), the CpGs' actual distances, in base pairs, from the CGI start or end are shown. Horizontal lines indicate the CGI, shore and shelf boundaries. The four panels show resorts with both types of ultrastable CpGs, only ultrastable unmethylated CpGs, only ultrastable methylated CpGs and no ultrastable CpGs. (D) Example resorts associated with the genes MARS (top; CGI chr12:57881750 to 57882035; ravine) and TBX5 (bottom; CGI chr12:114845861 to 114847650; not ravine) are depicted with individual sample methylation patterns as smoothed lines showing the methylation pattern of an individual across the resort. Resort feature positions are indicated by colored labelled bars. Lines indicate positions of 450 K probes assaying the resorts; ultrastable CpGs are highlighted with taller red lines. The histogram shows CpG density for bins of 50 bp on a scale of 0 to 0.2 CpG/bp. The gene track is extracted from UCSC Genome browser hg19 (refseq track).

Distribution of ultrastable cytosine guanine dinucleotide sites in the human genome
Because the ultrastable sites are consistent across a wide range of tissues, developmental stages and conditions, we hypothesized they would be of biological significance. Both classes of ultrastable sites tend to be near transcription start sites (TSS; P <0.001, t-test; accounting for the distribution of sites on the 450 K; [see Additional file 1: Figure S5]). Concomitantly, ultrastable CpGs tend to be associated with CGIs. Of all 450 K CpGs assayed, 62% are CGI-associated (in CGI, shore or shelf), while 95.5% of the ultrastable CpGs are CGI-associated. We also observed that ultrastable CpGs tend to be found in CGIs in groups of two or more, rather than in isolation, more often than expected by chance [see Additional file 1: Figure S6]. The ultrastable unmethylated CpGs are overrepresented in CGIs, rather than in shores and shelves. In contrast, ultrastable methylated CpGs are underrepresented in CGIs but overrepresented in CGI shelves [see Additional file 1: Figure S7]. This distribution is expected as CpGs in CGIs are generally unmethylated and those in the rest of the genome tend to be methylated.
However, the extreme stability of these sites led us to hypothesize that the ultrastable CpGs might reflect other features of the CGIs they associate with, leading us to focus further investigation on CGIs. We leave a deeper analysis of the 1,134 non-CGI-associated ultrastable sites as a topic for future study.

Profiles of regions containing ultrastable CpG sites
We stratified CGIs and their associated flanking shores and shelves into four categories based on the presence or absence of an ultrastable CpG. For brevity, following the terminology of [25], we use the term ‘resort’ to refer to the complex of a CGI and its flanking shores and shelves. We created a methylation profile for each resort category by aligning the CGIs, shores and shelves and plotting the mean methylation level of each CpG assayed in the resorts (see legend to Figure 1C and Methods). As shown in Figure 1C, an interesting pattern emerges. Resorts that contain at least one methylated and one unmethylated ultrastable CpG (top panel) have a strikingly high contrast between the low methylation level of the CGI and the highly methylated shores and shelves. In comparison, resorts that lack ultrastable CpGs do not show this pattern (bottom panel); in such resorts the CGI can be either methylated or unmethylated, as can the shores and shelves. Resorts that have only methylated or only unmethylated ultrastable sites show an intermediate pattern (middle panels). To get a better sense of the correlation structure of methylation levels across single resorts, we visualized the data at sample level for two characteristic resorts (Figure 1D). Generally, and in the examples shown, resorts with high contrast between CGI and shore/shelf show a very consistent pattern across samples whereas others do not.
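
The four-way stratification of resorts described above can be sketched as follows. This is an illustration only: the beta-value cut-offs (0.2 and 0.8), the function names and the toy probe data are assumptions introduced here, not the criteria or code used in the study, whose actual ultrastable definition is given in Methods.

```python
# Hypothetical sketch: call ultrastable CpGs, then assign a resort to one of
# the four categories (both types, unmethylated only, methylated only, none).

def ultrastable_state(betas, low=0.2, high=0.8):
    """Classify one CpG from its beta values across all samples.

    `low` and `high` are assumed cut-offs chosen for illustration.
    A single sample outside the band disqualifies the CpG, mirroring the
    "no exceptions" stringency described in the text.
    """
    if all(b < low for b in betas):
        return "unmethylated"
    if all(b > high for b in betas):
        return "methylated"
    return None  # not ultrastable


def resort_category(cpg_states):
    """Map the ultrastable states of a resort's CpGs to a category label."""
    states = {s for s in cpg_states if s is not None}
    if states == {"methylated", "unmethylated"}:
        return "both"
    if states == {"unmethylated"}:
        return "unmethylated only"
    if states == {"methylated"}:
        return "methylated only"
    return "none"


# Toy resort: probe name -> beta values in three samples (made-up numbers).
resort = {
    "island_probe_1": [0.02, 0.05, 0.03],  # always low
    "island_probe_2": [0.04, 0.30, 0.06],  # one sample breaks the cut-off
    "shelf_probe_1": [0.92, 0.95, 0.90],   # always high
}
states = [ultrastable_state(b) for b in resort.values()]
category = resort_category(states)
print(states, category)
```

With real data, each resort would carry every 450 K probe mapped to its CGI, shores and shelves, and the cut-offs would need to match the definition in Methods.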
By analogy to the previously reported methylation ‘valleys’ and ‘canyons’ [14,15], we refer to the sharp pattern shown in the top panel of Figure 1D as a ‘ravine.’ We note that the genomic positions of ravines do not overlap with canyons or valleys (in addition to being smaller; ravines average 785 bp of unmethylated region, canyons >3.5 kb and valleys >5 kb). Because gene body methylation has been previously reported to be positively correlated with gene expression [26,27], we further tested whether the super-additive effect we observe could be explained by a ravine being equivalent to a CGI next to a highly methylated gene body. This appears not to be the case, as ravines are symmetrical with respect to transcription direction, and ravines can be found away from gene bodies [see Additional file 1: Figure S8]. A further extensive comparison of ravines to a number of previously-defined methylation domain types shows that ravines represent a novel aspect of the methylome [see Additional file 1: Table S4]. To confirm our findings were not due to some idiosyncrasy of the set of 450 K samples or the parameters we used to define ravines, we tested whether the ravines had the same properties on an additional set of 757 samples of similar variety, which became available after we started our study [see Additional file 1: Table S5 and Figure S9]. The results show that the CGIs we classify as ravines, whether uniformly unmethylated or ‘other,’ have the same features in the new data set, strongly supporting the idea that ravines are stable features of these genomic regions.

Ravines are associated with active transcription
To identify ravines more comprehensively, we quantified the difference between the CGI and shore/shelf methylation levels (‘steepness’) for all 450 K resorts. In this manner we ranked all 27,176 resorts assayed on the 450 K for their ‘ravine-ness’, independent of whether they contained an ultrastable CpG.
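
As a minimal sketch of the ranking step just described, steepness can be read as the mean shore/shelf beta minus the mean CGI beta of a resort. That formula is an assumption made here for illustration (the exact definition is in Methods), and the resort names and beta values below are invented.

```python
from statistics import mean

# Hypothetical sketch: score each resort by "steepness" and rank the resorts.
# Assumed definition: mean flank (shore + shelf) beta minus mean CGI beta.

def steepness(cgi_betas, flank_betas):
    """Contrast between flanking methylation and island methylation."""
    return mean(flank_betas) - mean(cgi_betas)


def rank_resorts(resorts):
    """Return (resort IDs from steepest to flattest, steepness by ID)."""
    scores = {rid: steepness(r["cgi"], r["flank"]) for rid, r in resorts.items()}
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores


# Toy per-probe mean beta values for three made-up resorts.
resorts = {
    "resort_A": {"cgi": [0.05, 0.03, 0.04], "flank": [0.85, 0.90, 0.80, 0.75]},
    "resort_B": {"cgi": [0.06, 0.08], "flank": [0.10, 0.20, 0.15]},
    "resort_C": {"cgi": [0.70, 0.80], "flank": [0.85, 0.90]},
}
order, scores = rank_resorts(resorts)
for rid in order:
    print(rid, round(scores[rid], 3))
```

In this toy set, resort_A has the ravine-like shape (low island, high flanks), resort_B is low throughout, and resort_C has a methylated island, so a low steepness score alone does not distinguish an unmethylated resort from a methylated one; the text therefore pairs it with a CGI mean methylation filter.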
As depicted in Figure 2A, the 1,500 resorts with steepest ravines (mean steepness 0.638) represent the most extreme ravine pattern (hereafter referred to as ‘steep ravines’) whereas the 1,500 unmethylated resorts with the lowest ravine steepness (CGI mean methylation <0.3 and mean steepness 0.097) show a more uniform pattern (hereafter referred to as ‘uniformly unmethylated resorts’; for mean methylation and CpG density of resorts [see Additional file 1: Table S6]).

To test whether the high methylation in the shores had an impact on the associated gene expression, we used the ENCODE DNase-sequencing data [2] as an indirect measure of non-tissue-specific transcriptional activity. Unmethylated CGIs are generally associated with high transcriptional activity at their associated gene [7]. As expected, uniformly unmethylated resorts show significantly (P <0.001, Wilcoxon rank sum (Wilcoxon RS) test) higher DNase sensitivity than all other resorts. Interestingly, the steep ravines show significantly (P <0.001, Wilcoxon RS test) higher DNase sensitivity than the uniform resorts (Figure 2B). Since the main difference between the steep ravines and the uniformly unmethylated resorts is the highly methylated shores and shelves, this suggests that high methylation on the edges of CGIs facilitates a transcriptionally permissive state. The relationship between high ravine steepness and high transcriptional activity is supported by analysis of a diverse set of microarray expression experiments (see Methods). Averaged across expression data sets, the expression of genes associated with steep ravines is significantly higher (P <0.001, t-test) than for the uniformly unmethylated resorts (Figure 2C).

[Figure 2, panels A to C. Panel A: mean beta across 1,737 GEO samples against resort position (bp) for Uniformly Unmethylated Resorts (1,500), Steep Ravines (1,500) and Other Resorts (19,290). Panel B: DNase hypersensitivity score and score density for the same three classes. Panel C: mean relative expression (each associated gene across 1,850 Gemma datasets) and expression level density for Unmethylated Resort Genes (1,255 genes), Steep Ravine Resort Genes (1,389 genes) and Other Resort Genes (12,165 genes).]

Figure 2 Ravines are associated with higher transcriptional activity. (A) Resorts are classified based on steepness, with the steepest 1,500 resorts forming the steep ravine class and the least steep unmethylated resorts forming the uniformly unmethylated class. (B) Distribution of DNase sensitivity scores for each resort class. (C) Distribution of gene expression levels of all genes associated (5′, promoter or intragenic) with a CGI in the uniformly unmethylated, ravine or other resort classes. Density of DNase scores and expression levels are shown by the violin plots behind the box plots.

We next tested whether the steepness of ravines was predictive of gene expression, beyond that which is possible using methylation level of the CGI alone, using a regression approach (see Methods). Gene expression variance (R²) explained by CGI methylation level alone is 4.6%, comparable to previous reports [28,29] even though our expression and methylation data come from different sources. Variance in expression levels explained by resort steepness alone is 3.4%. In combination, ravine steepness and CGI methylation level explain 9.8% of the expression variance, significantly greater than would be expected if they were purely additive (significant interaction, P <0.001, ANOVA).

The association of ravines with high transcriptional activity was also supported by ENCODE RNA polymerase II binding data (POLR2A; [2], [see Additional file 1: Figure S10]).
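
The variance-explained comparison reported above (4.6% and 3.4% separately, which would sum to 8.0%, against 9.8% in combination) can be illustrated with a small regression sketch. Everything in it is an assumption for illustration: the toy expression model, the balanced grid of predictor values and the helper names are invented, and the study's own regression is described in Methods. On a balanced grid the two predictors are uncorrelated, so the additive model's R² equals the sum of the single-predictor values and any gain beyond that comes from the interaction term.

```python
# Hypothetical sketch: compare R^2 of single-predictor, additive and
# interaction models using ordinary least squares in pure Python.

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(rows, y):
    """R^2 of a least-squares fit of y on the predictors in rows (plus intercept)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy data: expression rises with steepness only when the CGI is unmethylated.
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
data = [(cgi, steep) for cgi in levels for steep in levels]
expr = [2.0 + 3.0 * steep * (1.0 - cgi) for cgi, steep in data]

r2_cgi = r_squared([[c] for c, s in data], expr)
r2_steep = r_squared([[s] for c, s in data], expr)
r2_add = r_squared([[c, s] for c, s in data], expr)
r2_full = r_squared([[c, s, c * s] for c, s in data], expr)
print(r2_cgi, r2_steep, r2_add, r2_full)
```

In real data CGI methylation and steepness are unlikely to be uncorrelated, so the simple sum is only a rough yardstick there; the formal check is the interaction test reported in the text.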
Active transcription of ravine-associated genes is not explained by changes in histone marks, as ravine CGIs show no significant differences in the 12 histone marks measured by ENCODE ([2]; [see Additional file 1: Figure S11]). However, uniformly unmethylated resorts do show significant differences in H3K27me3 and H3K4me1 marks (P <0.001, Wilcoxon RS test; [see Additional file 1: Figure S11]).

Ravines are associated with housekeeping genes
Taken together, the consistency of the ravine pattern, the high DNase sensitivity and the high associated gene expression across a variety of tissues and conditions suggest that the genes associated with steep ravines are universally active in human cells. Indeed, we find that steep ravine-associated genes are significantly associated with a curated set of housekeeping genes (P <0.001, Fisher's exact test; Figure 3), but not with tissue-specific genes [30]. In contrast, the uniformly unmethylated resorts are not significantly associated with either set of genes (Figure 3). However, uniformly unmethylated resorts are overrepresented for gene ontology (GO) groups for development and disease ontology (DO) groups for developmental diseases ([31]; [see Additional file 1: Tables S7 and S8]).

[Figure 3: bar chart of percent of genes overlapping the House Keeping and Tissue Specific gene lists for Ravines (1,573 genes) and Uniformly Unmethylated Resorts (1,465 genes); bars are shown for resort class associated genes and for random genes.]

Figure 3 Steep ravine genes are overrepresented for housekeeping genes. Dark bars show the percent overlap of genes associated with steep ravines or uniformly unmethylated resorts with a list of housekeeping genes (2,064 genes) or tissue specific genes (2,293 genes). Light bars show mean overlap of housekeeping and tissue specific lists with random gene lists from all 450 K resort associated genes.
Ravine-associated genes had no significant enrichment for GO groups or diseases. This suggests that ravines, which are maintained across tissues and conditions, may be regulatory features associated with the expression of ubiquitous genes, while genes associated with uniformly unmethylated resorts function in development.

Discussion
Our contributions in this paper are twofold. First, we identified a subset of CpGs in the human genome that appear to be highly stable in their methylated or unmethylated state, across diverse developmental states and cell types. Second, we identified a subclass of CGIs that have an unusually high contrast between the methylation state of the CGI and the flanking shores. We found that such CGIs tend to be found near highly expressed genes. While the 450 K array only measures a subset of CpGs in the human genome, our results are consistent with a role for shore methylation in the regulation of genes which are ‘always on’.

The existence of CpGs with ultrastable methylation states reveals a previously undocumented feature of the human methylome. While some level of stability has been previously noted in differentiated somatic cells, the dramatic changes in methylation during development and differences between tissues [32] suggested that much of the methylome is dynamic. In contrast, our analysis suggests that a subset of the human methylome is highly stable across differentiated cell types, cancer cells, embryonic stem cells, induced pluripotent stem cells, trophoblasts and germ cells.
The consistency of the CpGs across our developmental and germ cell samples suggests that the states of the sites we found to be ultrastable are established early in development and then maintained in all studied differentiated tissues.

We cannot rule out the possibility that some of the ultrastable CpGs we identified will have a different state in some cell type or physiological state not yet examined. However, the data set we have assembled covers many of the states previously identified with variability, including between tissues [22], developmental states [9] and diseases [13,33]. Indeed, we suspect there are many other CpGs in the human genome that show unusual stability but are not revealed by our study. Our analysis used a very stringent threshold, disallowing even single exceptions; additional CpGs are ‘nearly stable.’ Furthermore, the 450 K array does not assay most of the CpGs in the genome, some of which are likely to also be ultrastable. As there are only a few samples of certain tissue types, we could not assess the potential existence of tissue-specific ultrastable CpGs. Future experiments to assess additional CpGs and larger numbers and varieties of samples will help further elucidate the scope of ultrastable CpGs.

Our association of ultrastable CpGs with TSSs and resorts (94.5% in resorts) agrees with the previous observations that differentially methylated (that is, dynamic) regions are primarily located far from the TSSs, outside of resorts [22]. However, we note that the 450 K array is biased towards resort CpGs. The small subset of ultrastable CpGs we observe that are not in resorts (5.5%) hints that many other ultrastable CpGs may be outside resorts. Because most of the CpGs we identified are in or near resorts, we focused our analysis on their potential roles in resort function. Regarding the ravine pattern, we note that some degree of contrast between shores and CGIs is expected: the majority of CpGs are known to be methylated, with CGIs being the exception.
However, we show that this contrast is not observed in all resorts, and is particularly striking in resorts that also contain ultrastable unmethylated CpGs. The association of steep ravines with higher gene expression levels, high DNase I sensitivity and high POLR2A occupancy provides a novel and biologically meaningful classification of human CGIs that complements earlier efforts [5].

In our study (as in many others), we attempted to relate the methylation state of a region to the expression level of nearby genes. However, it is not clear how to tell if a CGI is in a position to influence (or be influenced by) a gene. The examination of ravines may provide some insight. The classic association of genes with CGIs is based on the presence of a CGI in a gene’s 5′ promoter. This association is sufficiently strong that it was originally used to annotate human genes [34]. However, many CGIs do not appear to function as 5′ gene promoters ([35]; [see Additional file 1: Figure S12]). In contrast, the ravine CGIs are more strongly overrepresented in 5′ promoter regions of genes. Thus ravines fit the classic CGI archetype: an unmethylated CGI in the 5′ promoter of a highly expressed gene. Ravines share an additional feature in common with the classical CGI, an association with housekeeping genes [36,37]. The image of unmethylated 5′ promoter CGIs leading to gene expression may be more specific to ravines and not true for resorts and CGIs in general. A stable ravine pattern at many 5′ promoters supports the emerging idea that it is crucial to examine non-promoter CpGs and CGIs in differential methylation analysis, as non-promoter regions may have more dynamic methylation than 5′ promoter regions.

While CGIs are the classic unit of focus for human methylation studies, other groups have focused on identifying other types of methylation domains [14-18] that have some overlap with CGI classes we identified.
Specifically, uniformly unmethylated resorts (non-ravines) are encompassed by canyons and valleys [14,15] more than other resorts, suggesting that uniformly unmethylated resorts, canyons and valleys may be related domains [see Additional file 1: Table S4]. Subsets of canyons and valleys lack H3K27me3, similar to uniformly unmethylated resorts. Additionally, uniformly unmethylated resorts, canyons and valleys are all enriched for genes which function in development. Thus uniformly unmethylated resorts are confirmation of canyons and valleys as features of the methylome in a greater variety of tissues. Ravines, on the other hand, minimally overlap with canyons, valleys or most other previously defined methylation domains. Additionally, ravines show no obvious relation to histone marks. Other regulatory mechanisms are likely to be involved which explain the association of ravines with stable and high gene expression.

One of our observations is that across a wide range of tissue types and developmental stages, DNA methylation flanking CGIs is positively correlated with gene expression, especially when the CGI has a very low methylation level. Previously, positive correlations between shore methylation and gene expression have been reported in some studies. Hansen et al. [38] and VanderKraats et al. [39] found tissue-specific ravine-like patterns emerging between cancer and healthy states as a differential methylation signature. This suggests that ravines might not just be a static feature associated with housekeeping genes, but one that can be generated under different conditions. While our manuscript was under review, Lou et al. [40] reported gene body methylation changes associated with increases in gene expression, a pattern that may also have a relationship to the ravines we observed.
Although the association they saw was specifically in blood and limited to a family trio, it is further evidence that ravine-like patterns are positively correlated with gene expression.

Another potentially relevant study showing a pattern similar to our ravines, from Wu et al. [41], found that the shores and shelves of unmethylated 5′ promoter CGIs are associated with high Dnmt3a activity in the mouse genome. Wu et al. also found that Dnmt3a-dependent shore and shelf DNA methylation is associated with increased gene expression. We hypothesize that the regions identified by Wu et al. may correspond to ravines, but we were unable to confirm this with the information available. We speculate that possible Dnmt3a activity at the shores and shelves of steep ravines could function to antagonize the binding of transcriptional repressors. The previous work on Dnmt3a binding at gene promoters also found that shore and shelf methylation in proximal promoters antagonized polycomb protein binding [41]. Interestingly, uniformly unmethylated resorts had a higher association with polycomb binding sites than steep ravines ([42]; [see Additional file 1: Polycomb Binding Sites]). On the other hand, we did not find evidence that ravines are associated with low H3K27me3, as would be predicted from polycomb binding inhibition [see Additional file 1: ENCODE Histone Modifications]. To resolve the function of ravines, it will be important to further explore their relationship with polycomb binding and other regulatory mechanisms.

An alternate model for ravine function is that transcription factors that bind methylated CpGs could be directly affected by shore and shelf methylation. However, most studies of methyl-CpG-binding proteins show they function to repress gene expression, agreeing with the classical model of any methylation in promoters being repressive [7]. There is, however, recent evidence of the methyl-CpG-binding protein MeCP2 having a transcription-activating function at promoters with methylated CpGs [43].
A model where MeCP2 binds methylated shores at gene promoters and performs its transcription activation function could explain the association of methylated shores and shelves with high gene expression.

Conclusions

In summary, ravines are a novel subset of CGIs, distinct from previously identified methylome domains. The ultrastable CpGs and the consistency of ravines across samples suggest they are a stable component of the human methylome. While the ravines suggest that CGI shore methylation is stably associated with high gene expression, other work has shown some CGI shore methylation to be highly dynamic. Both results support the overall importance of shores for gene expression. The presence of ravines in the 5′ gene promoters of many actively transcribed genes supports a complex role for methylation in both activating and repressing expression.

Methods

Data collection

As of 30 April 2013, 58 unique sample series run on the Illumina 450 K platform (GPL13534 or GPL16304) were available in the Gene Expression Omnibus (GEO) [23]. Using the R Bioconductor package ‘GEOquery’ 2.26.2 [44], the series were collected and considered for quality control [see Additional file 1: Table S1].

Quality control

To qualify for inclusion in our study, samples had to have beta values for all 485,577 probes, disqualifying 19 series. An additional four studies that involved direct global manipulation (genetic or chemical) of DNA methylation were also removed (DNMT1; DNMT3b double knockouts or methyltransferase treatment). Five more series were considered unsuitable for the meta-analysis for individual reasons and removed (that is, mislabeled data, high amount of missing data in all samples, multiple arrays grouped together, et cetera; see Additional file 1: Individual Study Quality Control for details of data exclusion justifications). Within each series, individual samples were further assessed for quality. Eight samples with unusually high numbers of missing values (5 SD from the mean, corresponding to >0.4% or 1,957 probes) were removed.

Ultrastable cytosine guanine dinucleotide calling

A three-component mixture model was fit to the beta value distribution of each series using the R ‘mixtools’ package [45]. The mean was calculated for each component; μ + 2 SD and μ − 2 SD were used as the unmethylated and methylated beta value thresholds, respectively, for each series separately [see Additional file 1: Figure S1]. For each sample, unmethylated and methylated probes were called based on the thresholds computed for the series. Typical thresholds were near beta values of 0.2 and 0.8. Probes that were scored as methylated in all 1,737 samples, or as unmethylated in all 1,737 samples, were deemed ‘ultrastable.’

ENCODE confirmation of ultrastable cytosine guanine dinucleotides

Data from 102 ENCODE RRBS samples were collected from UCSC (Release 3 of ENCODE/HudsonAlpha RRBS data; [2]). In many RRBS studies, reads with <10-fold coverage [46,47] are discarded; therefore, a ten-fold coverage cutoff was used on the ENCODE RRBS data. CpGs were considered methylated in ENCODE RRBS data if their percent methylation was >80 and unmethylated if the CpG percent methylation was <20 [see Additional file 1: Figure S2].

Methyltransferase confirmation of ultrastable cytosine guanine dinucleotides

Four methyltransferase 450 K studies (DNMT1; DNMT3b double knockouts, methyltransferase inhibitor or methyltransferase treated) with a total of 68 samples were available on GEO [see Additional file 1: Table S1].
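The calling rule described above for ultrastable CpGs, which is also the state that gets checked in excluded studies such as these, can be illustrated with a short Python sketch. It assumes the per-series thresholds have already been derived from the mixture model; the function names, the nested-dictionary layout and the inclusive comparisons are illustrative assumptions, not the authors' R code.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical layout: {series: {sample: {probe: beta}}}
Betas = Dict[str, Dict[str, Dict[str, float]]]


def call_state(beta: float, unmeth_max: float, meth_min: float) -> Optional[str]:
    """Call one beta value as 'U', 'M', or None (intermediate).

    Assumption: the comparisons are inclusive; the source does not say.
    """
    if beta <= unmeth_max:
        return "U"
    if beta >= meth_min:
        return "M"
    return None


def ultrastable_probes(
    betas: Betas, thresholds: Dict[str, Tuple[float, float]]
) -> Dict[str, str]:
    """Return probes that receive the same 'U' or 'M' call in every sample.

    thresholds maps each series to its own (unmeth_max, meth_min) pair,
    mirroring the per-series thresholds described in the text.
    """
    seen: Dict[str, Set[Optional[str]]] = {}
    for series, samples in betas.items():
        unmeth_max, meth_min = thresholds[series]
        for sample_betas in samples.values():
            for probe, beta in sample_betas.items():
                seen.setdefault(probe, set()).add(
                    call_state(beta, unmeth_max, meth_min)
                )
    # A probe is ultrastable only if one non-intermediate state was ever seen.
    return {
        probe: next(iter(states))
        for probe, states in seen.items()
        if len(states) == 1 and None not in states
    }


example_betas: Betas = {
    "series1": {
        "a": {"cg1": 0.05, "cg2": 0.90, "cg3": 0.50},
        "b": {"cg1": 0.10, "cg2": 0.95, "cg3": 0.10},
    },
    "series2": {"c": {"cg1": 0.15, "cg2": 0.85, "cg3": 0.90}},
}
example_thresholds = {"series1": (0.2, 0.8), "series2": (0.25, 0.75)}
example_calls = ultrastable_probes(example_betas, example_thresholds)
```

In this toy input, `cg3` changes state between samples and is therefore dropped, while `cg1` and `cg2` keep a single state throughout.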
The studies were excluded from the ultrastable site calling, and the states of the ultrastable sites were then checked in the 68 samples.

MRE-Seq and MeDIP-Seq confirmation of ultrastable cytosine guanine dinucleotides

From the NIH Roadmap Epigenomics Mapping Consortium data [48], 7 MRE-Seq and 7 MeDIP-Seq samples were used from seven tissue types (GSM669604, GSM669614, GSM543007, GSM543021, GSM669600, GSM669610, GSM543009, GSM543023, GSM707017, GSM941725, GSM428286, GSM456941, GSM543013, and GSM543027). Due to computational constraints, here we present data for chromosome 20 (analysis for other chromosomes is a work in progress). For the 10,379 450 K CpGs on chromosome 20, the reads covering a CpG seen in either technique were averaged across samples. The average number of reads across samples, from either technique, was used as the signal of the methylated (MeDIP-Seq) or unmethylated (MRE-Seq) state of a CpG. A Wilcoxon Rank Sum (Wilcoxon RS) test was used to test the significance of the difference between ultrastable sites on the array and non-ultrastable sites on the array.

Ultrastable cytosine guanine dinucleotide characterization

To annotate the CpGs, we used three sources of information. The first was that provided by Illumina [11] and included the UCSC CGI that a CpG site is associated with and the CpG’s relation to that CGI. CpG shores and shelves are defined by distance in base pairs from the UCSC-defined CGI start and stop coordinates. Shores extend 2 kb from the CGI boundaries [10], and shelves are 2 to 4 kb from the CGI boundary [11]. The second annotation, available on GEO under GPL16304, contains additional probe annotations to those provided by Illumina under GPL13534 [49], including distance to the nearest TSS. A Student’s t-test was performed to test whether distance to TSS differed significantly between all CpGs and ultrastable CpGs.

Composite profile of resorts

The 27,176 resorts have a range of lengths (minimum 201 bp, maximum 45,710 bp, mean 935 bp).
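With lengths this varied, positions inside a CGI have to be placed on a common scale before resorts can be overlaid, as the text goes on to describe. A minimal sketch of that linear rescaling is given here; the function name and its default argument are illustrative assumptions, and only the 935 bp mean length and the worked numbers come from the article.

```python
def relative_position(
    pos_in_cgi: float, cgi_length: float, mean_length: float = 935.0
) -> float:
    """Rescale a CpG's offset from the CGI start to a CGI of mean length.

    Assumption: a simple proportional rescaling, which reproduces the
    worked example in the text (200 bp in a 1,200 bp CGI).
    """
    if cgi_length <= 0:
        raise ValueError("cgi_length must be positive")
    return pos_in_cgi / cgi_length * mean_length


# Worked example from the text: 200 bp into a 1,200 bp CGI.
example_position = round(relative_position(200, 1200), 2)  # 155.83
```

Shore and shelf positions would not pass through this function, since they are plotted at their actual distance from the CGI boundary.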
To allow comparison of resorts, the position of a CpG in a CGI was converted to the CpG's relative position in a CGI of the mean CGI size (935 bp). As an example, a CpG 200 bp from the start of a 1,200 bp CGI would be shown at 155.83 bp from the start of the CGI in the composite plot. Conversion of CpG position to a relative value allowed comparisons of CGIs of varying sizes. Resort shores include all CpGs less than 2 kb from the CGI start or end. CGI shelves include all CpGs 2 to 4 kb from the CGI start or end. Since shores and shelves are fixed sizes, CpG positions within shores and shelves are shown at their actual, not relative, distance from the CGI boundaries.

Resort classifier based on ravine steepness

Steepness of a ravine was only calculated for those resorts which had at least one CpG measured on the 450 K array in each relevant part of the resort (the CGI, the north shore or shelf, and the south shore or shelf; 22,290 resorts). Steepness was calculated as the mean beta methylation level of the CGI CpGs subtracted from the mean beta methylation of the shore and shelf CpGs. Steep ravines were arbitrarily defined as those with the 1,500 highest steepness values. Uniformly unmethylated resorts were defined as those with a CGI mean methylation <0.3 and the 1,500 lowest steepness values.

ENCODE DNase sensitivity data

ENCODE UCSC DNase clusters track (wgEncodeRegDnaseClusteredV2) data from the University of Washington and Duke University were collected for 125 cell types [2]. The DNase score for a CGI was calculated by taking the score for any DNase hot spot overlapping a CGI body. If multiple DNase hot spots overlapped a CGI, the scores were weighted by the amount of the CGI the DNase peak overlapped.
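The resort classes used in these comparisons come from the steepness classifier described above. A minimal Python sketch of that calculation follows. The names and data layout are hypothetical, and two points are assumptions because the text leaves them open: the CGI mean filter (<0.3) is applied before taking the lowest steepness values, and a resort already called steep is not also called uniform.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: (CGI betas, combined shore-and-shelf betas)
Resort = Tuple[List[float], List[float]]


def steepness(cgi_betas: List[float], flank_betas: List[float]) -> float:
    """Mean shore/shelf beta minus mean CGI beta, as defined in the text."""
    return mean(flank_betas) - mean(cgi_betas)


def classify_resorts(
    resorts: Dict[str, Resort], n_top: int = 1500, uniform_cgi_max: float = 0.3
) -> Dict[str, str]:
    """Label each resort as a steep ravine, uniformly unmethylated, or other."""
    scores = {rid: steepness(cgi, flank) for rid, (cgi, flank) in resorts.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)  # steepest first
    steep = set(ranked[:n_top])
    # Assumption: filter on CGI mean first, then keep the flattest n_top.
    flat_candidates = [
        rid
        for rid in reversed(ranked)  # flattest first
        if mean(resorts[rid][0]) < uniform_cgi_max and rid not in steep
    ]
    uniform = set(flat_candidates[:n_top])
    labels = {}
    for rid in resorts:
        if rid in steep:
            labels[rid] = "steep ravine"
        elif rid in uniform:
            labels[rid] = "uniformly unmethylated"
        else:
            labels[rid] = "other"
    return labels


example_resorts: Dict[str, Resort] = {
    "r1": ([0.05, 0.10], [0.80, 0.90]),  # low CGI, high flanks
    "r2": ([0.10, 0.10], [0.10, 0.20]),  # low CGI, low flanks
    "r3": ([0.90, 0.80], [0.90, 0.90]),  # high CGI, high flanks
}
example_labels = classify_resorts(example_resorts, n_top=1)
```

With `n_top=1` the toy input yields one resort per class; the article's cutoff of 1,500 is the default.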
Wilcoxon RS tests on the DNase data were performed among the three classes of resorts.

Cytosine guanine dinucleotide island-to-gene associations

There are multiple methods of annotating a CGI with a gene association, including annotating a CGI with the gene whose TSS is closest to the CpGs making up the CGI [49], using the position of the CpGs making up a CGI in a gene’s body or promoter [11], or using the overlap of an entire CGI with a gene’s body or promoter [35]. Each yields slightly different CGI-to-gene associations. Even with a given method, a CGI can end up associated with more than one gene [see Additional file 1: Figure S4]. For this study, an inclusive CGI-to-gene association was used. Genes that overlap a CGI in their promoter or gene body were considered associated with that CGI. An inclusive association was used because the exact role of CGIs and resorts in regulating gene expression is unclear. Using inclusive associations is intended to capture any possible CGI effects on gene expression.

CGIs were considered associated with a gene if the CGI is located in the gene body or in the promoter region of a gene. Classifications of CGIs in promoters and gene bodies were based on the definitions in [35]. RefSeq genes were downloaded from UCSC. For RefSeq genes with multiple transcripts, the longest form was used, to capture any possible intragenic functions. Non-coding RNA (ncRNA) annotations were collected from Ensembl [50]. The final list included 40,721 unique transcription units. There are 21,743 CGIs on the 450 K array associated with 17,725 genes or ncRNAs (39% intragenic CGIs, 61% promoter CGIs).

Although gene expression results are subject to noise from incorrect CGI-to-gene associations, DNase sensitivity data are independent of gene-to-CGI associations. DNase sensitivity data will capture the effects of methylation on transcriptional activity without absolute gene-to-CGI associations.
Until CGI-to-gene associations are definite, DNase sensitivity data will be valuable to pair with methylation for examining transcriptional activity.

Gene expression data

Gene expression data from 2,021 GEO expression studies were assembled from the Gemma database [51], representing 97,388 samples and 34 tissue types. Expression information was available for 21,733 genes, 14,809 of which were associated with one of 22,290 450 K CGIs (only those CGIs in resorts previously classed by steepness were compared for expression). Student’s t-tests on the gene expression data were performed among the three classes of resorts. Linear regression was done with 17,127 CGIs (CGIs with an associated gene expression level and steepness class). Models were fit for expression variance with the associated resort steepness, the associated CGI mean methylation, and the interaction of resort steepness and CGI methylation. An F-test was used to show significant interaction of resort steepness and CGI methylation.

Steep ravine-associated gene function

Lists of steep ravine- and uniform resort-associated genes are the same as those used with the gene expression data. One hundred random gene lists of the same length as the steep ravine and uniform resort gene association lists (1,573 and 1,465, respectively) were generated. Percent overlap of each random gene list and either the housekeeping or tissue-specific list was calculated. Mean overlap of the 100 random lists with the housekeeping and tissue-specific lists was taken as the expected overlap for comparison with the steep ravine and uniform resort gene lists. Fisher’s exact tests were performed between each random gene list overlap and steepness class gene lists.

We used the GO annotations of the 19,389 genes associated with the 450 K probes [11] and disease ontology (DO) terms from the Phenocarta database [31] for enrichment analysis.
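The random-list baseline described in the preceding section can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' procedure: the function names, the fixed seed and sampling without replacement from a single gene universe are all choices made here for the example.

```python
import random
from typing import Iterable, List, Set


def percent_overlap(gene_list: Iterable[str], reference: Set[str]) -> float:
    """Percentage of genes in gene_list that also appear in reference."""
    genes = set(gene_list)
    return 100.0 * len(genes & reference) / len(genes)


def expected_overlap(
    universe: List[str],
    reference: Set[str],
    list_size: int,
    n_lists: int = 100,
    seed: int = 0,
) -> float:
    """Mean percent overlap of n_lists random gene lists with reference.

    Assumption: each random list is drawn without replacement from
    universe; the seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    overlaps = [
        percent_overlap(rng.sample(universe, list_size), reference)
        for _ in range(n_lists)
    ]
    return sum(overlaps) / n_lists


# Toy universe of 100 genes, half of them flagged as "housekeeping".
toy_universe = [f"g{i}" for i in range(100)]
toy_housekeeping = set(toy_universe[:50])
toy_expected = expected_overlap(toy_universe, toy_housekeeping, list_size=10)
```

The observed overlap of a real gene list, computed with `percent_overlap`, would then be compared against `toy_expected` in place of the toy inputs.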
Enrichment of GO and DO groups in uniformly unmethylated resort- and ravine-associated genes was assessed by overrepresentation analysis in ErmineJ [52]. Statistical significance is reported as false discovery rates computed using the Benjamini-Hochberg method in ErmineJ. Also calculated are the multifunctionality scores of the ontology gene sets [53], as well as the P values corrected for multifunctionality.

Additional files

Additional file 1: Supplementary Information. Description of data: Additional analyses, figures and tables.

Additional file 2: Ultrastable CpGs loci. Description of data: Probe_ID is the 450 K CpG probe ID given by Illumina. State is the state of ultrastable CpG. Coordinate_37 is the Human Genome Build 37 position. Chromosome_37 is the Human Genome Build 37 chromosome.

Abbreviations

CDMR: cancer-specific differentially methylated region; CGI: CpG island; CpG: cytosine guanine dinucleotide; cUMR: control unmethylated region; DO: disease ontology; FFPE: formalin-fixed paraffin-embedded; GEO: Gene Expression Omnibus; GO: gene ontology; LMR: low methylated region; LREA: long-range epigenetic activation; LRES: long-range epigenetic silencing; MeDIP-Seq: methylated DNA immunoprecipitation sequencing; MRE-Seq: methylation-sensitive restriction enzyme sequencing; ncRNA: non-coding RNA; POLR2A: RNA polymerase II; RDMR: reprogramming specific differentially methylated region; RRBS: reduced representation bisulfite sequencing; TDMR: tissue specific differentially methylated region; TFBS: transcription factor binding site; TSS: transcription start site; UMR: unmethylated region; Wilcoxon RS: Wilcoxon rank sum; 450 K: Illumina Infinium HumanMethylation450 BeadChip.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

PPCT assembled and provided the gene expression data from previous studies. RE performed all other data collection and analysis.
PP and EPC were supervisory authors and involved throughout the project in concept formation and manuscript edits. All authors read and approved the final manuscript.

Acknowledgements

We thank Michael Kobor, Martin Hirst, Sanja Rogic, Shreejoy Tripathy, and Magda Price for comments on the manuscript. This work was supported by the Natural Sciences and Engineering Research Council of Canada CREATE Research Rotation Award to RE; NeuroDevNet Network of Centres of Excellence; and National Institutes of Health GM076990. Funding for open access charge: National Institutes of Health GM076990.

Author details

1 Genome Science and Technology Graduate Program, University of British Columbia, 2329 W Mall, Vancouver, BC V6T 1Z4, Canada. 2 Centre for High-Throughput Biology and Department of Psychiatry, University of British Columbia, 2890 E Mall, Vancouver, BC V6T 1Z4, Canada.

Received: 24 June 2014 Accepted: 6 October 2014 Published: 23 October 2014

References

1. Jones PA: Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet 2012, 13:484–492.
2. ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M: An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489:57–74.
3. Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes. J Mol Biol 1987, 196:261–282.
4. Ioshikhes IP, Zhang MQ: Large-scale human promoter mapping using CpG islands. Nat Genet 2000, 26:61–63.
5. Saxonov S, Berg P, Brutlag DL: A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc Natl Acad Sci U S A 2006, 103:1412–1417.
6. Esteller M: CpG island hypermethylation and tumor suppressor genes: a booming present, a brighter future. Oncogene 2002, 21:5427–5440.
7. Bird A: DNA methylation patterns and epigenetic memory. Genes Dev 2002, 16:6–21.
8. Greally JM: Bidding the CpG island goodbye. Elife 2013, 2:e00593.
9. Doi A, Park IH, Wen B, Murakami P, Aryee MJ, Irizarry R, Herb B, Ladd-Acosta C, Rho J, Loewer S, Miller J, Schlaeger T, Daley GQ, Feinberg AP: Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat Genet 2009, 41:1350–1353.
10. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H, Gabo K, Rongione M, Webster M, Ji H, Potash JB, Sabunciyan S, Feinberg AP: The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 2009, 41:178–186.
11. Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano D, Zhang L, Schroth GP, Gunderson KL, Fan JB, Shen R: High density DNA methylation array with single CpG site resolution. Genomics 2011, 98:288–295.
12. Sandoval J, Heyn H, Moran S, Serra-Musach J, Pujana MA, Bibikova M, Esteller M: Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics 2011, 6:692–702.
13. Hon GC, Hawkins RD, Caballero OL, Lo C, Lister R, Pelizzola M, Valsesia A, Ye Z, Kuan S, Edsall LE, Camargo AA, Stevenson BJ, Ecker JR, Bafna V, Strausberg RL, Simpson AJ, Ren B: Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res 2012, 22:246–258.
14. Jeong M, Sun D, Luo M, Huang Y, Challen GA, Rodriguez B, Zhang X, Chavez L, Wang H, Hannah R, Kim SB, Yang L, Ko M, Chen R, Gottgens B, Lee JS, Gunaratne P, Godley LA, Darlington GJ, Rao A, Li W, Goodell MA: Large conserved domains of low DNA methylation maintained by Dnmt3a. Nat Genet 2014, 46:17–23.
15. Xie W, Schultz MD, Lister R, Hou Z, Rajagopal N, Ray P, Whitaker JW, Tian S, Hawkins RD, Leung D, Yang H, Wang T, Lee AY, Swanson SA, Zhang J, Zhu Y, Kim A, Nery JR, Urich MA, Kuan S, Yen CA, Klugman S, Yu P, Suknuntha K, Propson NE, Chen H, Edsall LE, Wagner U, Li Y, Ye Z, et al: Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 2013, 153:1134–1148.
16. Stadler MB, Murr R, Burger L, Ivanek R, Lienert F, Scholer A, van Nimwegen E, Wirbelauer C, Oakeley EJ, Gaidatzis D, Tiwari VK, Schubeler D: DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 2011, 480:490–495.
17. Bert SA, Robinson MD, Strbenac D, Statham AL, Song JZ, Hulf T, Sutherland RL, Coolen MW, Stirzaker C, Clark SJ: Regional activation of the cancer genome by long-range epigenetic remodeling. Cancer Cell 2013, 23:9–22.
18. Coolen MW, Stirzaker C, Song JZ, Statham AL, Kassir Z, Moreno CS, Young AN, Varma V, Speed TP, Cowley M, Lacaze P, Kaplan W, Robinson MD, Clark SJ: Consolidation of the cancer genome into domains of repressive chromatin by long-range epigenetic silencing (LRES) reduces transcriptional plasticity. Nat Cell Biol 2010, 12:235–246.
19. Mayer W, Niveleau A, Walter J, Fundele R, Haaf T: Demethylation of the zygotic paternal genome. Nature 2000, 403:501–502.
20. Feinberg AP, Tycko B: The history of cancer epigenetics. Nat Rev Cancer 2004, 4:143–153.
21. de la Rica L, Urquiza JM, Gomez-Cabrero D, Islam AB, Lopez-Bigas N, Tegner J, Toes RE, Ballestar E: Identification of novel markers in rheumatoid arthritis through integrated analysis of DNA methylation and microRNA expression. J Autoimmun 2013, 41:6–16.
22. Ziller MJ, Gu H, Muller F, Donaghey J, Tsai LT, Kohlbacher O, De Jager PL, Rosen ED, Bennett DA, Bernstein BE, Gnirke A, Meissner A: Charting a dynamic DNA methylation landscape of the human genome. Nature 2013, 500:477–481.
23. Edgar R, Domrachev M, Lash AE: Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30:207–210.
24. Warnecke PM, Stirzaker C, Melki JR, Millar DS, Paul CL, Clark SJ: Detection and measurement of PCR bias in quantitative methylation analysis of bisulphite-treated DNA. Nucleic Acids Res 1997, 25:4422–4426.
25. Sofer T, Schifano ED, Hoppin JA, Hou L, Baccarelli AA: A-clustering: a novel method for the detection of co-regulated methylation regions, and regions associated with exposure. Bioinformatics 2013, 29:2884–2891.
26. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L, Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR: Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 2009, 462:315–322.
27. Ball MP, Li JB, Gao Y, Lee JH, LeProust EM, Park IH, Xie B, Daley GQ, Church GM: Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nat Biotechnol 2009, 27:361–368.
28. Lam LL, Emberly E, Fraser HB, Neumann SM, Chen E, Miller GE, Kobor MS: Factors underlying variable DNA methylation in a human community cohort. Proc Natl Acad Sci U S A 2012, 109(Suppl 2):17253–17260.
29. van Eijk KR, de Jong S, Boks MP, Langeveld T, Colas F, Veldink JH, de Kovel CG, Janson E, Strengman E, Langfelder P, Kahn RS, van den Berg LH, Horvath S, Ophoff RA: Genetic analysis of DNA methylation and gene expression levels in whole blood of healthy human subjects. BMC Genomics 2012, 13:636.
30. Chang CW, Cheng WC, Chen CR, Shu WY, Tsai ML, Huang CL, Hsu IC: Identification of human housekeeping genes and tissue-selective genes by microarray meta-analysis. PLoS One 2011, 6:e22859.
31. Portales-Casamar E, Ch'ng C, Lui F, St-Georges N, Zoubarev A, Lai AY, Lee M, Kwok C, Kwok W, Tseng L, Pavlidis P: Neurocarta: aggregating and sharing disease-gene relations for the neurosciences. BMC Genomics 2013, 14:129.
32. Hackett JA, Surani MA: DNA methylation dynamics during the mammalian life cycle. Philos Trans R Soc Lond B Biol Sci 2013, 368:20110328.
33. Sproul D, Meehan RR: Genomic insights into cancer-associated aberrant CpG island hypermethylation. Brief Funct Genomics 2013, 12:174–190.
34. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, et al: The sequence of the human genome. Science 2001, 291:1304–1351.
35. Maunakea AK, Nagarajan RP, Bilenky M, Ballinger TJ, D'Souza C, Fouse SD, Johnson BE, Hong C, Nielsen C, Zhao Y, Turecki G, Delaney A, Varhol R, Thiessen N, Shchors K, Heine VM, Rowitch DH, Xing X, Fiore C, Schillebeeckx M, Jones SJ, Haussler D, Marra MA, Hirst M, Wang T, Costello JF: Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 2010, 466:253–257.
36. Larsen F, Gundersen G, Lopez R, Prydz H: CpG islands as gene markers in the human genome. Genomics 1992, 13:1095–1107.
37. Zhu J, He F, Hu S, Yu J: On the nature of human housekeeping genes. Trends Genet 2008, 24:481–484.
38. Hansen KD, Langmead B, Irizarry RA: BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol 2012, 13:R83.
39. Vanderkraats ND, Hiken JF, Decker KF, Edwards JR: Discovering high-resolution patterns of differential DNA methylation that correlate with gene expression changes. Nucleic Acids Res 2013, 41:6816–6827.
40. Lou S, Lee HM, Qin H, Li JW, Gao Z, Liu X, Chan LL, Kl Lam V, So WY, Wang Y, Lok S, Wang J, Ma RC, Tsui SK, Chan JC, Chan TF, Yip KY: Whole-genome bisulfite sequencing of multiple individuals reveals complementary roles of promoter and gene body methylation in transcriptional regulation. Genome Biol 2014, 15:408.
41. Wu H, Coskun V, Tao J, Xie W, Ge W, Yoshikawa K, Li E, Zhang Y, Sun YE: Dnmt3a-dependent nonpromoter DNA methylation facilitates transcription of neurogenic genes. Science 2010, 329:444–448.
42. Lee TI, Jenner RG, Boyer LA, Guenther MG, Levine SS, Kumar RM, Chevalier B, Johnstone SE, Cole MF, Isono K, Koseki H, Fuchikami T, Abe K, Murray HL, Zucker JP, Yuan B, Bell GW, Herbolsheimer E, Hannett NM, Sun K, Odom DT, Otte AP, Volkert TL, Bartel DP, Melton DA, Gifford DK, Jaenisch R, Young RA: Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 2006, 125:301–313.
43. Yasui DH, Peddada S, Bieda MC, Vallero RO, Hogart A, Nagarajan RP, Thatcher KN, Farnham PJ, Lasalle JM: Integrated epigenomic analyses of neuronal MeCP2 reveal a role for long-range interaction with active genes. Proc Natl Acad Sci U S A 2007, 104:19416–19421.
44. Davis S, Meltzer PS: GEOquery: a bridge between the gene expression omnibus (GEO) and BioConductor. Bioinformatics 2007, 23:1846–1847.
45. Benaglia T, Chauveau D, Hunter DR, Young D: Mixtools: an R package for analyzing finite mixture models. J Stat Softw 2009, 32:1–29.
46. Akalin A, Garrett-Bakelman FE, Kormaksson M, Busuttil J, Zhang L, Khrebtukova I, Milne TA, Huang Y, Biswas D, Hess JL, Allis CD, Roeder RG, Valk PJ, Lowenberg B, Delwel R, Fernandez HF, Paietta E, Tallman MS, Schroth GP, Mason CE, Melnick A, Figueroa ME: Base-pair resolution DNA methylation sequencing reveals profoundly divergent epigenetic landscapes in acute myeloid leukemia. PLoS Genet 2012, 8:e1002781.
47. Varley KE, Gertz J, Bowling KM, Parker SL, Reddy TE, Pauli-Behn F, Cross MK, Williams BA, Stamatoyannopoulos JA, Crawford GE, Absher DM, Wold BJ, Myers RM: Dynamic DNA methylation across diverse human cell lines and tissues. Genome Res 2013, 23:555–567.
48. Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, Farnham PJ, Hirst M, Lander ES, Mikkelsen TS, Thomson JA: The NIH roadmap epigenomics mapping consortium. Nat Biotechnol 2010, 28:1045–1048.
49. Price ME, Cotton AM, Lam LL, Farre P, Emberly E, Brown CJ, Robinson WP, Kobor MS: Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium Human Methylation 450 Bead Chip array. Epigenetics Chromatin 2013, 6:4.
50. Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, Cabili MN, Jaenisch R, Mikkelsen TS, Jacks T, Hacohen N, Bernstein BE, Kellis M, Regev A, Rinn JL, Lander ES: Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 2009, 458:223–227.
51. Zoubarev A, Hamer KM, Keshav KD, McCarthy EL, Santos JR, Van Rossum T, McDonald C, Hall A, Wan X, Lim R, Gillis J, Pavlidis P: Gemma: a resource for the reuse, sharing and meta-analysis of expression profiling data. Bioinformatics 2012, 28:2272–2273.
52. Lee HK, Braynen W, Keshav K, Pavlidis P: ErmineJ: tool for functional analysis of gene expression data sets. BMC Bioinformatics 2005, 6:269.
53. Gillis J, Pavlidis P: The impact of multifunctional genes on "guilt by association" analysis. PLoS One 2011, 6:e17258.

doi:10.1186/1756-8935-7-28
Cite this article as: Edgar et al.: Meta-analysis of human methylomes reveals stably methylated sequences surrounding CpG islands associated with high gene expression. Epigenetics & Chromatin 2014 7:28.

