
Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium…
Price, Magda E; Cotton, Allison M; Lam, Lucia L; Farré, Pau; Emberly, Eldon; Brown, Carolyn J; Robinson, Wendy P; Kobor, Michael S
Mar 3, 2013



RESEARCH Open Access

Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array

E Magda Price1,2,3, Allison M Cotton3, Lucia L Lam2,4, Pau Farré5, Eldon Emberly5, Carolyn J Brown3, Wendy P Robinson2,3 and Michael S Kobor2,3,4*

Abstract

Background: Measurement of genome-wide DNA methylation (DNAm) has become an important avenue for investigating potential physiologically-relevant epigenetic changes. Illumina Infinium (Illumina, San Diego, CA, USA) is a commercially available microarray suite used to measure DNAm at many sites throughout the genome. However, it has been suggested that a subset of array probes may give misleading results due to issues related to probe design. To facilitate biologically significant data interpretation, we set out to enhance probe annotation of the newest Infinium array, the HumanMethylation450 BeadChip (450 k), with >485,000 probes covering 99% of Reference Sequence (RefSeq) genes (National Center for Biotechnology Information (NCBI), Bethesda, MD, USA). Annotation that was added or expanded on includes: 1) documented SNPs in the probe target, 2) probe binding specificity, 3) CpG classification of target sites and 4) gene feature classification of target sites.

Results: Probes with documented SNPs at the target CpG (4.3% of probes) were associated with increased within-tissue variation in DNAm. An example of a probe with a SNP at the target CpG demonstrated how sample genotype can confound the measurement of DNAm. Additionally, 8.6% of probes mapped to multiple locations in silico. Measurements from these non-specific probes likely represent a combination of DNAm from multiple genomic sites.
The expanded biological annotation demonstrated that, based on DNAm, grouping probes by an alternative high-density and intermediate-density CpG island classification provided a distinctive pattern of DNAm. Finally, variable enrichment for differentially methylated probes was noted across CpG classes and gene feature groups, dependent on the tissues that were compared.

Conclusion: DNAm arrays offer a high-throughput approach in which probe content should be carefully considered to better understand the biological processes affected. Probes containing SNPs and non-specific probes may affect the assessment of DNAm using the 450 k array. Additionally, probe classification by CpG enrichment classes and, to a lesser extent, gene feature groups resulted in distinct patterns of DNAm. Thus, we recommend that compromised probes be removed from analyses and that the genomic context of DNAm be considered in studies deciphering the biological meaning of Illumina 450 k array data.

Keywords: Infinium HumanMethylation450 BeadChip array, DNA methylation, non-specific probes, polymorphic probes, CpG islands, Annotation, CpG enrichment, Tissue-specific DNA methylation, Repetitive elements, 450 k

* Correspondence: msk@cmmt.ubc.ca
2 The Child & Family Research Institute, 950 West 28th Avenue, Vancouver, BC V5Z 4H4, Canada
3 Department of Medical Genetics, University of British Columbia, 2329 West Mall, Vancouver, BC V6T 1Z3, Canada
Full list of author information is available at the end of the article

© 2013 Price et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Price et al. Epigenetics & Chromatin 2013, 6:4
http://www.epigeneticsandchromatin.com/content/6/1/4

Background

Measuring epigenetic marks has become an attractive approach for connecting phenotype, genetics and environment in many fields of medicine [1-3]. DNA methylation (DNAm), the addition of a methyl group primarily to cytosines in the context of CpG dinucleotides, is one such highly studied epigenetic mark. Epigenome-wide association studies (EWAS) of DNAm have been proposed as a complement to genome-wide association studies (GWAS) for elucidating loci correlated with complex disease [4]. Although these large-scale association studies provide a great amount of information, there are currently limits to our ability to interpret these data [5], given the variability of DNAm across individuals, ethnicities, sex, age, tissue type and environment [6,7]. To improve the analysis potential of a popular tool for large-scale measurement of DNAm, the Infinium HumanMethylation450 BeadChip (450 k) (Illumina, San Diego, CA, USA), we have annotated technically unreliable probes and enhanced the biological annotation of this DNAm microarray.

The 450 k array combines two technically distinct assays in one platform: the Infinium I assay (type I probes) and the Infinium II assay (type II probes) (see methods section for details). The design and specifications of the 450 k array have been discussed in other publications [8-10], and extensive probe annotation is available from Illumina to aid users in data interpretation. This annotation includes, for example, probe location within genes (annotated by the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu; UCSC Genome Bioinformatics, Santa Cruz, CA, USA)), CpG islands and shores, and regulatory features. However, technical limitations of the Infinium platform have recently been described [11,12].
In 2011, an evaluation of an earlier version of the array, the Infinium HumanMethylation27k BeadChip (27 k), which used exclusively the Infinium I assay, identified two groups of probes as possibly compromised by their design [10]. The first group, accounting for about 6 to 10% of the 27 k array, consisted of non-specific probes, that is, probes that hybridized to multiple genomic locations in silico. The level of DNAm at non-specific probes likely reflects a combination of DNAm at the various locations to which these probes hybridize. The second group of unreliable probes comprised those with a polymorphic target (0.24% of 27 k probes). Since the Infinium DNAm platform determines the level of DNAm by quantitative genotyping of the C/T SNPs introduced by bisulfite conversion, probes with polymorphisms at the target C or G may measure a difference in genotype rather than a true difference in DNAm. Given the similar technology of the 450 k array, a corresponding increase in the number of both non-specific probes and polymorphic probes is expected [12].

CpG dinucleotides are not randomly distributed throughout the genome; most have been lost through spontaneous deamination, with the exception of some CpG-enriched regions known as 'CpG islands' [13]. About 70% of gene promoters are associated with CpG islands [14], and gene transcription has traditionally been thought to be repressed by the presence of promoter CpG island DNAm [15,16]. There are different approaches for classifying CpG enrichment. For example, UCSC defines CpG islands based on CG content >50%, Observed/Expected (Obs/Exp) CpG ratio >0.6 and length >200 bps [17]. An alternative classification of CpG islands that provides more enrichment discrimination distinguishes high-density CpG islands (HCs: CG content >55%, Obs/Exp CpG ratio >0.75 and length >500 bps), intermediate-density CpG islands (ICs: CG content >50%, Obs/Exp CpG ratio >0.48 and length >200 bps) and non-islands (LCs, or low-density CpG regions: non-HC/IC regions) [16,18].
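The enrichment criteria above reduce to simple threshold tests on CG content, Obs/Exp CpG ratio and length. The Python sketch below is illustrative only: it assumes a single, already-delimited DNA region, computes the Obs/Exp ratio with the commonly used formula (CpG count × length) / (C count × G count), and applies the UCSC and HIL thresholds quoted above. The function names are hypothetical; published annotations scan the genome rather than testing one sequence, and the ICshore subclass is not handled, so this does not reproduce the authors' annotation.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (CG content, observed/expected CpG ratio) for one DNA region."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so this count is exact
    cg_content = (c + g) / n if n else 0.0
    # Obs/Exp CpG = CpG count / (C count * G count / length)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return cg_content, obs_exp


def is_ucsc_island(seq: str) -> bool:
    """UCSC thresholds: CG content >50%, Obs/Exp >0.6, length >200 bp."""
    cg, oe = cpg_stats(seq)
    return cg > 0.50 and oe > 0.6 and len(seq) > 200


def hil_class(seq: str) -> str:
    """HIL thresholds: test HC first, then IC; everything else is LC."""
    cg, oe = cpg_stats(seq)
    if cg > 0.55 and oe > 0.75 and len(seq) > 500:
        return "HC"
    if cg > 0.50 and oe > 0.48 and len(seq) > 200:
        return "IC"
    return "LC"


print(hil_class("CG" * 300), hil_class("CG" * 150), hil_class("A" * 600))
# HC IC LC
```

Testing HC before IC matters because every HC region also satisfies the looser IC thresholds.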
However, the most biologically meaningful definition of CpG enrichment remains to be determined.

In the past, many DNAm studies focused on promoters and CpG islands; recently, however, attention has also turned towards the study of DNAm patterns in the regions surrounding islands, known as shores. CpG island shores have been observed to be variably methylated between unrelated individuals, in cancer and in iPS cell lines [19-21]. The level of DNAm in shores may in fact be more highly correlated with gene expression than that of CpG islands [22]. Furthermore, tissue-specific gene expression has been associated with tissue differences in DNAm at shores [19], perhaps as a consequence of transcription machinery binding to nearby promoter CpG islands. Others have shown that DNAm outside of CpG islands and shores may also be associated with gene expression. For example, one study identified lower levels of gene body DNAm associated with the lowest and highest levels of gene expression, whereas higher levels of gene body DNAm were associated with intermediate levels of gene expression [23].

While the 450 k array offers the opportunity to examine DNAm at individual CpGs across CpG island and non-island regions, the inclusion of this diverse range of sites requires a more complex and detailed annotation of the array. To enhance the utility of the 450 k array, we increased probe annotation in four areas: 1) documented SNPs in the target CpG, 2) probe binding specificity, 3) CpG classification of target sites and 4) gene feature classification of target sites. We tested the expanded annotation in a set of control samples of interest to our investigations: adult blood (n = 4), child buccal (n = 4) and placental chorionic villi (n = 4), and followed up some analyses in a larger, publicly available blood dataset (n = 261). In particular, we evaluated DNAm patterns relative to functional aspects of probe location, while considering the effects of technically biased probes. Based on our analyses, we recommend that users analyze 450 k data with the following factors in mind: 1) DNAm measured at probes with SNPs in the target site may be compromised by sample genotype, 2) DNAm measured at non-specific probes may represent more than the intended site of hybridization and 3) DNAm varies across CpG enrichment classes as well as gene features.

Results and discussion

Polymorphic CpGs may affect the assessment of DNAm

Infinium assays are based on quantifying bisulfite-introduced C/T SNPs; thus, the actual DNA sequence at the target CpG can compromise the assessment of DNAm. The end of each 450 k probe targets a CpG of interest and, although the alignment of type I and type II probes with CpGs differs by one base pair (Additional file 1), an end-nucleotide match is essential for extension of both probe types. A SNP leading to a sequence change at a target CpG might result in a false DNAm signal due to hybridization of the wrong probe (possible for type I probes) or no/minimal extension at the target site (possible for both probe types). Illumina included annotation of SNPs located within 10 bps of the target CpG (SNP <10 bp, n = 36,535 probes) and those located within the remainder of the probe (SNP >10 bp, n = 59,892 probes).
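To make the genotype effect concrete, consider a deliberately simplified model (an illustrative assumption, not the assay's actual signal model): a C/T SNP sits at the C of the target CpG, every C allele carries the same methylation level, and a T allele is read exactly like a bisulfite-converted, unmethylated C. The expected ß is then the fraction of C alleles times that methylation level, which yields three discrete levels for CC, CT and TT genotypes. The function name is hypothetical.

```python
def expected_beta(genotype: str, meth_on_c_allele: float = 1.0) -> float:
    """Expected ß at a target CpG carrying a C/T SNP at the C position.

    Simplified model: only C alleles can report methylation; a T allele
    contributes nothing to the methylated signal, as an unmethylated C would.
    Background signal and probe-type differences are ignored.
    """
    if len(genotype) != 2 or set(genotype) - {"C", "T"}:
        raise ValueError("genotype must be two alleles from {C, T}, e.g. 'CT'")
    if not 0.0 <= meth_on_c_allele <= 1.0:
        raise ValueError("methylation level must lie in [0, 1]")
    c_fraction = genotype.count("C") / 2
    return c_fraction * meth_on_c_allele


for gt in ("CC", "CT", "TT"):
    print(gt, expected_beta(gt))
# CC 1.0
# CT 0.5
# TT 0.0
```

Under this model, a fully methylated site would look hypermethylated, about 50% methylated or hypomethylated purely as a function of genotype.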
We have added annotation of probes that query CpGs with documented polymorphisms specifically at the C and/or G position (target CpG SNPs). Using the database of single nucleotide polymorphisms (dbSNP, National Center for Biotechnology Information (NCBI), Bethesda, MD, USA), one or more SNPs were annotated at 4.3% (n = 20,869) of target CpGs (Figure 1A). Most of these probes had only one target CpG SNP (n = 20,270); however, 599 had two or more (Additional file 2). Of the probes with a target CpG SNP, 32.5% were not documented as variable by dbSNP, while 43.2% had a heterozygosity greater than 0.1 (Figure 1A). Being more frequent in the population, this second group of SNPs is more likely to affect the assessment of DNAm. The majority (67.3%) of the rs numbers for probes with target CpG SNPs corresponded to those annotated by Illumina as a SNP <10 bp. Differences between the annotations may be a result of our inclusion of SNPs in the C or G of the target CpG (whereas Illumina only annotated SNPs in the probe sequence, see Additional file 1), updates to the dbSNP database and the possibility that Illumina used a minimum heterozygosity as a SNP inclusion criterion.

Theoretically, a probe affected by sample genotypes at a target CpG SNP would produce a bi- or tri-modal distribution of DNAm, and this pattern would result in a high within-tissue SD in ß value (the 450 k array measure of DNAm, ranging from 0 to 1). Thus, we examined the distribution of within-tissue SD in ß (n = 4/tissue) at probes annotated with a target CpG SNP, SNP <10 bp (excluding those probes also annotated with a target CpG SNP) and SNP >10 bp (Figure 1B, results for blood). Based on a Kolmogorov-Smirnov (KS) test for difference in distribution, the distribution of SD in ß for probes annotated with a target CpG SNP was the most different from that of all probes (p = 1.78 × 10^-15).
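The three SNP categories compared here can be sketched as a position test. The Python below is a hypothetical illustration, not the authors' pipeline: it assumes forward-strand, 1-based coordinates, a target CpG whose C is at `cpg_c` (G at `cpg_c + 1`), and a known probe interval; treating a distance of exactly 10 bp as ">10 bp" is an arbitrary boundary choice. A SNP at the C or G always counts as a target CpG SNP, even when the G lies just outside the probe interval. The SNP tuple format and IDs are made up for the example.

```python
from typing import Optional


def classify_snp(cpg_c: int, snp: int, probe_start: int, probe_end: int) -> Optional[str]:
    """Place one SNP into the target CpG SNP / SNP <10 bp / SNP >10 bp categories."""
    if snp in (cpg_c, cpg_c + 1):
        return "target CpG SNP"
    if not probe_start <= snp <= probe_end:
        return None  # outside the probe and not at the CpG: not annotated
    dist = min(abs(snp - cpg_c), abs(snp - (cpg_c + 1)))
    return "SNP <10 bp" if dist < 10 else "SNP >10 bp"


def common_target_snps(snps, cpg_c, probe_start, probe_end, min_het=0.1):
    """IDs of target CpG SNPs whose heterozygosity exceeds min_het.

    `snps` is a list of (snp_id, position, heterozygosity) tuples (hypothetical format).
    """
    return [
        snp_id
        for snp_id, pos, het in snps
        if classify_snp(cpg_c, pos, probe_start, probe_end) == "target CpG SNP"
        and het > min_het
    ]


print(classify_snp(1000, 1001, 951, 1000))  # target CpG SNP
print(classify_snp(1000, 995, 951, 1000))   # SNP <10 bp
```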
This trend was illustrated by a shift in the density curve for SD in ß of probes with target CpG SNPs in comparison to the curve for SD in ß of all probes (Figure 1B). To ensure that this finding was not an artifact of our small sample size, we performed the same analysis using a larger, publicly available Gene Expression Omnibus (GEO) dataset [GSE:40279] that had investigated age-associated DNAm changes in the blood of 656 individuals (aged 19 to 101 years) [24]. We extracted the younger half of samples (n = 261, aged 19 to 61 years) for our analysis, since this roughly covered the age of the blood samples in our study. In this larger dataset (referred to as the 'aging dataset'), the distribution of SD in ß for probes annotated with a target CpG SNP also exhibited the largest difference in distribution from that of all probes (p = 1.22 × 10^-14) based on a KS test (Figure 1C).

We next hypothesized that highly variable probes (defined as within-tissue SD in ß ≥0.25) were likely compromised by the presence of a target CpG SNP. There were 780 such probes in blood, 819 in buccal, 666 in chorionic villus samples and 480 in the aging dataset (Table 1). We did not expect the number of probes affected by SNPs to be the same across tissues, since all samples were from different individuals and thus of different genotypes. Comparing these variably methylated probes to the SNP annotation, 85.0%, 81.6%, 72.7% and 92.5% were annotated with a target CpG SNP in blood, buccal, chorionic villus samples and the aging dataset, respectively (Table 1). Of the highly variable probes, only four in blood, two in buccal and two in chorionic villi overlapped with the sex-specific autosomal probes described in the next section; thus, we do not believe that these large SDs were driven by sex differences in the data.
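The within-tissue variability analyses above can be sketched with NumPy and SciPy's `scipy.stats.ks_2samp`. This is an illustration under assumptions: `beta` is a probes × samples matrix of ß values from one tissue, SD uses the sample (n − 1) denominator, probes with SD <0.10 are dropped before the KS comparison as described in the Figure 1 legend, and the boolean mask `target_snp` stands in for the target CpG SNP annotation. The data are simulated, not the study's, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp


def within_tissue_sd(beta: np.ndarray) -> np.ndarray:
    """Per-probe SD of ß across the samples of one tissue (rows = probes)."""
    return beta.std(axis=1, ddof=1)


def ks_against_all(beta, subset, min_sd=0.10):
    """KS statistic and p-value: SD distribution of a probe subset vs all probes."""
    sd = within_tissue_sd(beta)
    keep = sd >= min_sd  # low-variance probes are removed first
    result = ks_2samp(sd[keep & subset], sd[keep])
    return result.statistic, result.pvalue


def highly_variable_summary(beta, subset, threshold=0.25):
    """Count probes with SD >= threshold and the % of them that fall in `subset`."""
    hv = within_tissue_sd(beta) >= threshold
    n_hv = int(hv.sum())
    n_in_subset = int((hv & subset).sum())
    pct = 100.0 * n_in_subset / n_hv if n_hv else float("nan")
    return n_hv, n_in_subset, pct


# Simulated tissue: 2,000 ordinary probes plus 200 probes whose ß tracks genotype.
rng = np.random.default_rng(0)
ordinary = np.clip(0.5 + rng.normal(0.0, 0.12, size=(2000, 4)), 0.0, 1.0)
genotype_levels = np.array([0.05, 0.5, 0.95])  # roughly TT, CT, CC
snp_probes = genotype_levels[rng.integers(0, 3, size=(200, 4))]
beta = np.vstack([ordinary, snp_probes])
target_snp = np.concatenate([np.zeros(2000, dtype=bool), np.ones(200, dtype=bool)])

stat, p = ks_against_all(beta, target_snp)
n_hv, n_snp, pct = highly_variable_summary(beta, target_snp)
print(f"KS statistic {stat:.2f}, p = {p:.1e}")
print(f"{n_hv} highly variable probes, {pct:.1f}% with a target CpG SNP")
```

In this toy setting the genotype-driven probes dominate the SD ≥0.25 group, mirroring the pattern reported in Table 1.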
None of the highly variable probes in the aging dataset overlapped with these sex-specific autosomal probes.

To confirm that a target CpG SNP could affect DNAm, samples were genotyped at a probe (cg06961873) that had an annotated target CpG SNP and SD in ß ≥0.25 in all three tissues. As predicted, homozygous C samples were assessed as hypermethylated, heterozygotes as approximately 50% methylated and homozygous T samples as hypomethylated (Figure 1D). Although we were not able to genotype the aging dataset samples, a histogram of DNAm at this same CpG site across the 261 aging dataset samples showed the same trimodal pattern of DNAm (Additional file 3). Other examples of highly variable probes in the aging dataset also illustrate this pattern (Additional file 3).

Given the demonstrated potential to bias the call of DNAm, we suggest that probes with a target CpG SNP should be disregarded in most analyses of the 450 k array. At minimum, 450 k users should carefully check candidate probes against the target CpG SNP annotation, in addition to a current SNP database, as more polymorphisms are likely to be identified and validated in coming years. Although we have used a straightforward example to illustrate how a target CpG SNP may confound the assessment of DNAm, effects may also be observed at SNPs within the remainder of the probe, that is, outside of the target CpG. For example, polymorphisms throughout the interval of hybridization have been shown to affect the binding of probes used in Illumina mRNA expression arrays [25], which have the same probe lengths as the 450 k array. Similar effects have also been observed in Affymetrix mRNA expression arrays (Affymetrix, Santa Clara, CA, USA), although these use shorter probes that might be more sensitive to sequence mismatches [26].
Additionally, several studies have recognized the heritability of DNAm through the genetic-epigenetic interaction of methylation-associated SNPs (mSNPs) [27,28], suggesting that some SNP-associated differences in DNAm may be true differences and not due to technical artifacts.

[Figure 1 plot: (A) proportion of probes with and without a target CpG SNP (4.3% vs 95.7%) and heterozygosity distribution of these SNPs (bins 0, 0-0.1, 0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5+); (B, C) density of standard deviation in ß value for all probes, SNP >10 bp, SNP <10 bp and target CpG SNP probes; (D) level of DNA methylation (ß value) by sample genotype (CC, CT, TT) in blood, buccal and chorionic villi]

Figure 1 Probes targeting polymorphic CpGs may affect the assessment of DNAm. (A) A documented SNP was identified at the target C or G position of 4.3% of 450 k probes (target CpG SNP). Of these SNPs, 43.2% had a heterozygosity of >0.1 and, due to their frequency in the population, are more likely to affect measurement of DNAm. (B) Using blood samples (n = 4) as an example, the SD in ß value between individuals was calculated for all probes. Probes with small SD in ß (<0.10) were removed from the analysis. The distribution of SD in ß value was plotted for all probes, and for the subsets of probes annotated with a target CpG SNP, a SNP within 10 bps of the target but without a target CpG SNP (SNP <10 bp) and a SNP within the remainder of the probe (SNP >10 bp). Numbers in brackets indicate Kolmogorov-Smirnov (KS) statistics in comparison to the distribution of all probes. (C) Using a selection of 261 adult blood samples extracted from the aging dataset [GSE:40279], the distribution of SD in ß value was plotted for the subsets of probes as described in (B). Numbers in brackets indicate KS statistics in comparison to the distribution of all probes. (D) DNAm at probe cg06961873 across 12 individuals exemplified the trichotomous pattern of DNAm hypothesized at a target CpG SNP. The three distinct levels of DNAm corresponded to sample genotype at SNP rs61775206, located at the target CpG: TT genotypes were assessed as hypomethylated, TC genotypes as approximately 50% methylated and CC genotypes as close to fully methylated. 450 k, Infinium HumanMethylation450 BeadChip; DNAm, DNA methylation.

8 to 9% of probes mapped to more than one location in silico

An additional confounding feature of the Infinium arrays is that some probes potentially map to multiple locations in the genome [10]. Signals from these non-specific probes likely represent a combination of DNAm at more than one location. Using alignment to four different in silico bisulfite-treated genomes [10], we identified 11.2% (n = 15,125) of type I probes and 7.7% (n = 26,812) of type II probes on the 450 k array as non-specific (a total of 8.6% of 450 k probes). While the number of cross-hybridization loci per probe ranged from 2 to 1,615, the majority of non-specific probes cross-hybridized to between two and five locations (52.4% of type I non-specific probes and 65.2% of type II non-specific probes). Within non-specific probes, 600 were intended to target sex chromosomes but also mapped to autosomal chromosomes, while 11,412 were intended to target autosomal chromosomes but also mapped to sex chromosomes (Table 2). Other publications have described how this second group of probes may be problematic in studies assessing sex differences in DNAm on autosomes or in studies where male and female subjects are analyzed together [10,29].
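The categories in Table 2 follow from which chromosomes a probe's off-target matches fall on. A minimal Python sketch, assuming each non-specific probe comes with the set of chromosome names of its in silico matches other than the intended site (UCSC-style names such as `chr1` and `chrX` are a naming assumption); the function name is hypothetical. The last lines recompute the overall non-specific fraction from the Table 2 totals.

```python
SEX_CHRS = {"chrX", "chrY"}


def cross_hyb_category(off_target_chrs: set[str]) -> str:
    """Table 2-style category for the off-target matches of one probe."""
    has_sex = any(c in SEX_CHRS for c in off_target_chrs)
    has_auto = any(c not in SEX_CHRS for c in off_target_chrs)
    if has_sex and has_auto:
        return "auto and sex chrs"
    if has_sex:
        return "only sex chrs"
    if has_auto:
        return "only auto chrs"
    return "specific"  # no off-target matches at all


# Overall fraction of non-specific probes from the Table 2 totals
# (intended autosomal targets + intended sex-chromosome targets).
non_specific = 40_590 + 1_347
on_array = 473_864 + 11_648
print(f"{non_specific:,} of {on_array:,} probes ({100 * non_specific / on_array:.1f}%)")
# 41,937 of 485,512 probes (8.6%)
```

The same total is obtained from the type I and type II counts in the text (15,125 + 26,812).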
Thus, we included in our annotation whether each probe cross-hybridized to sex or autosomal chromosomes.

In the aging dataset, after excluding sex chromosome probes, but not our annotated non-specific probes, we used a false discovery rate (FDR) and a minimum difference in DNAm (Δß) between sexes to identify autosomal probes that were differentially methylated between males (n = 133) and females (n = 128). An FDR of <1% and a minimum Δß of 10% identified 75 sex-specific autosomal probes, of which 40% were annotated to cross-hybridize to the sex chromosomes (Additional file 4). Although some true sex differences in DNAm likely exist on the autosomes, this result indicates that many of the large autosomal sex differences in DNAm may be an artifact of probe design and likely represent sex-chromosome differences in DNAm. Depending on the research question, some investigators may choose to exclude all or a subset of non-specific probes prior to data analysis, while others may include them and follow up candidate probe specificity before establishing conclusions.

Homologous gene families, duplicated genes or repetitive elements have been proposed as potential causes of in silico cross-hybridization of Infinium probes [10]. Thus, for all 450 k probes, we annotated the number of nucleotides at the intended site of hybridization that mapped to repetitive DNA based on RepeatMasker (http://www.repeatmasker.org; RepeatMasker, Institute for Systems Biology, Seattle, WA, USA) annotation in BLAT [30]. For 72,957 probes (15.0% of 450 k probes), more than half of the nucleotides in the probe (>25 bps) were targeted to repetitive DNA. We had annotated 19,731 of these repetitive probes as non-specific, which reflects their in silico cross-hybridization. Interestingly, for 24,847 specific probes (that is, probes that mapped only to the intended target), the entire probe (50 bps) was in repetitive DNA.
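The sex-difference screen combines a multiple-testing correction with an effect-size cut-off. The Python below is a sketch under stated assumptions: the text names an FDR threshold but not the correction procedure or the per-probe test, so Benjamini-Hochberg adjustment of precomputed p-values is assumed here, and Δß is taken as the absolute difference in mean ß between males and females, with 10% expressed as 0.10 on the ß scale. Function names are hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def sex_specific_autosomal(pvals, mean_beta_male, mean_beta_female,
                           fdr=0.01, min_delta=0.10):
    """Boolean mask of probes passing both the FDR and the minimum Δß cut-offs."""
    q = bh_adjust(pvals)
    delta = np.abs(np.asarray(mean_beta_male) - np.asarray(mean_beta_female))
    return (q < fdr) & (delta >= min_delta)


mask = sex_specific_autosomal([1e-6, 1e-6, 0.5], [0.6, 0.52, 0.9], [0.4, 0.50, 0.2])
print(mask.tolist())
# [True, False, False]
```

The second probe is statistically significant but fails the Δß cut-off, and the third has a large Δß but is not significant, so only the first probe is retained.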
This group of specific repetitive probes might be exploited to assess DNAm of repetitive elements, which is of interest to studies investigating changes in DNAm in cancer or in association with environmental exposure [31,32].

Table 1 The majority of highly variable probes were annotated with SNPs

                                      Blood        Buccal       Chorionic villi   Aging dataset
Highly variable probes (a), total     780          819          666               480
Annotated with target CpG SNP         663 (85.0)   668 (81.6)   484 (72.7)        444 (92.5)

(a) Defined as within-tissue SD in ß ≥0.25. Number in brackets is percentage (%) of total/tissue.

Table 2 Location of in silico cross-hybridization of non-specific probes

                                            Intended probe target:   Intended probe target:
                                            auto chrs                sex chrs
Total on array                              473,864                  11,648
Non-specific probes
  Cross-hybridize only to auto chrs         29,178                   371
  Cross-hybridize only to sex chrs          540                      747
  Cross-hybridize to auto and sex chrs      10,872                   229
  Total: cross-hybridize to sex chrs        11,412                   976
  Total: cross-hybridize to auto chrs       40,050                   600
  Total                                     40,590                   1,347

Auto, autosomal; chrs, chromosomes.

Comparing Illumina and HIL annotation of probes highlighted differences between CpG classification systems

As previously mentioned, the 450 k array includes probes designed to target UCSC CpG islands, as well as shores, shelves and non-island regions, which we refer to as the 'sea' [9] (Additional file 5A, see methods for class definitions). Alternative 'HIL' CpG classes (that is, high-density CpG island (HC), intermediate-density CpG island (IC) and non-island (LC)) provide a different criterion for probe annotation based on CpG enrichment.
We expanded the 450 k annotation by categorizing probes into four HIL classes: 1) HC probes, 2) IC probes, 3) ICshore probes (regions of intermediate-density CpG island that border HCs) and 4) LC probes (Additional file 5B, see methods for class definitions) [16,18].

The distribution of probes within each Illumina-annotated CpG class was compared to the distribution of probes within each HIL-annotated CpG class (Additional file 6). The majority of probes were classified as anticipated, with 77.6% of HIL-annotated HC probes annotated as Illumina island probes, 65.0% of HIL-annotated ICshore probes annotated as Illumina shore probes and 61.5% of HIL-annotated LC probes annotated as Illumina sea probes (Figure 2). The largest difference in annotation was that close to half of HIL-annotated IC probes (51.0%) were Illumina-annotated sea probes, while the remainder of IC probes were distributed across Illumina-annotated islands (17.2%), shores (19.9%) and shelves (11.9%).

To elucidate potential functional differences between CpG classes, we examined the distribution of DNAm for both Illumina and HIL-annotated CpG classes (for blood, Additional file 7; buccal, Additional file 8; and chorionic villi, Additional file 9). Within each classification system, all distribution curves were significantly different from each other. On average, KS statistics were larger for comparisons between HIL CpG classes than for Illumina CpG classes (Additional files 7, 8, 9), indicative of more distinct distributions of DNAm in HIL CpG classes.

Using blood as an example, ß values were separated into three categories: hypomethylated (ß values of 0 to ≤0.2), heterogeneously methylated (ß values of >0.2 to <0.8) and hypermethylated (ß values of ≥0.8 to 1.0) (Figure 3 and Additional file 10) [7,33]. The majority of both HIL-annotated HC probes (79.2%) and Illumina-annotated island probes (72.3%) fell in the hypomethylated category in blood, consistent with the characteristic pattern of CpG island DNAm [33,34].
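The three methylation categories can be assigned with two thresholds. A NumPy sketch, assuming `mean_beta` holds each probe's average ß within one tissue (as in the Figure 3 legend) and `classes` holds its CpG class label; the per-class percentages are the kind of summary plotted in Figure 3, and the function names are hypothetical.

```python
import numpy as np

CATEGORIES = ("hypo", "heterogeneous", "hyper")


def methylation_category(beta):
    """hypo: 0 to <=0.2; heterogeneous: >0.2 to <0.8; hyper: >=0.8 to 1."""
    b = np.asarray(beta, dtype=float)
    return np.where(b <= 0.2, "hypo", np.where(b < 0.8, "heterogeneous", "hyper"))


def category_percentages(mean_beta, classes):
    """Percentage of probes in each methylation category, per CpG class."""
    cats = methylation_category(mean_beta)
    classes = np.asarray(classes)
    out = {}
    for cls in np.unique(classes):
        in_cls = cats[classes == cls]
        out[str(cls)] = {c: float(100.0 * np.mean(in_cls == c)) for c in CATEGORIES}
    return out


print(category_percentages([0.1, 0.9, 0.5, 0.05], ["HC", "LC", "LC", "HC"]))
# {'HC': {'hypo': 100.0, 'heterogeneous': 0.0, 'hyper': 0.0}, 'LC': {'hypo': 0.0, 'heterogeneous': 50.0, 'hyper': 50.0}}
```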
The distribution of DNAm within Illumina-annotated shore probes, HIL-annotated IC probes and HIL-annotated ICshore probes was different (for example, in the hypomethylated category 34.0%, 13.6% and 46.1%, respectively), suggesting that these CpG classes are distinctive. Interestingly, a higher proportion of Illumina-annotated shelf probes than Illumina-annotated sea probes were hypermethylated (72.6% vs 66.4%, respectively), perhaps attributable to the differing CpG enrichment profile within shelves and seas (as demonstrated by the contribution of HIL-annotated HC, IC and LC probes to each of these classes, Additional file 6).

[Figure 2 plot: for each HIL-annotated class (HC probes, n = 153,859; IC probes, n = 118,727; ICshore probes, n = 33,955; LC probes, n = 178,971), the percentage of probes falling in each Illumina-annotated class (island probes, n = 150,254; shore probes, n = 112,067; shelf probes, n = 47,144; sea probes, n = 176,047)]

Figure 2 Comparison of the genomic distribution of Illumina-annotated CpG probe classes within each HIL-annotated CpG probe class. Within HCs, ICshores and LCs, the majority of probes were categorized into the respective Illumina-annotated CpG class. However, even though ICs and ICshores have the same CpG density, the distribution of probes based on Illumina CpG classes was different between these two HIL classes, suggesting a functional difference between ICs that border HCs and isolated ICs. HC, high-density CpG island; HIL, high-density CpG island (HC), intermediate-density CpG island (IC) and non-island (LC); ICshore, intermediate-density CpG island shore.

Previous studies have shown that tissue-specific differences in DNAm occur in CpG island shores [19].
We were interested in assessing where tissue-specific differences in DNAm occur based on the Illumina versus HIL CpG classes. Thus, we examined probes that were differentially methylated between tissues (tDM) for enrichment within each CpG class. The highest number of tDM probes was identified between blood and chorionic villus samples (91,255, 21.3% of probes), in comparison to chorionic villus versus buccal samples (75,021, 17.5% of probes) and blood versus buccal samples (69,174, 16.2% of probes). tDM probes were significantly depleted in Illumina-annotated islands and HIL-annotated HCs, and significantly enriched in all other CpG classes (Figure 4). Interestingly, the level of enrichment within each CpG class varied by the tissues compared (Figure 4 and Additional file 11).

A goal of the additional CpG classification of 450 k probes was to identify biologically-relevant structures that might underlie genome-wide changes in DNAm. The HIL CpG classes demonstrated a more extreme DNAm profile and a larger percentage of tDM probes, which may reflect underlying biological processes. Intriguingly, even though ICs and ICshores have the same CpG density, distinct differences between these two classes emerged in our analyses, suggesting that ICs that border HCs are distinct from isolated ICs and highlighting the utility of this additional classification.

DNAm was variable across nine gene feature groups

There is increasing evidence that DNAm of gene features outside of CpG islands and promoters may be an important marker of gene expression. For example, DNAm of the first exon has been shown to be correlated with transcriptional repression [35].
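Figure 4 reports enrichment as 'percentage relative enrichment' of tDM probes relative to the overall share of probes in each CpG class. The exact formula and significance test are not given in this section, so the Python sketch below assumes relative enrichment = 100 × (share of tDM probes in the class / share of all probes in the class − 1), and uses a two-sided Fisher's exact test (`scipy.stats.fisher_exact`) on the 2 × 2 table of tDM status by class membership as a stand-in test. The function name and toy counts are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact


def relative_enrichment(is_tdm, in_class):
    """Percentage relative enrichment of tDM probes in one CpG class, plus a Fisher p-value."""
    is_tdm = np.asarray(is_tdm, dtype=bool)
    in_class = np.asarray(in_class, dtype=bool)
    pct_tdm = 100.0 * np.mean(in_class[is_tdm])  # share of tDM probes in the class
    pct_all = 100.0 * np.mean(in_class)          # share of all probes in the class
    enrichment = 100.0 * (pct_tdm / pct_all - 1.0)  # negative values mean depletion
    table = [
        [int(np.sum(is_tdm & in_class)), int(np.sum(is_tdm & ~in_class))],
        [int(np.sum(~is_tdm & in_class)), int(np.sum(~is_tdm & ~in_class))],
    ]
    _, p_value = fisher_exact(table)
    return float(enrichment), float(p_value)


# Toy data: 1,000 probes, 200 in the class; 100 tDM probes, 40 of them in the class.
in_class = np.zeros(1000, dtype=bool)
in_class[:200] = True
is_tdm = np.zeros(1000, dtype=bool)
is_tdm[:40] = True
is_tdm[200:260] = True

enr, p = relative_enrichment(is_tdm, in_class)
print(f"{enr:.1f}% relative enrichment, p = {p:.2g}")
```

Here 40% of tDM probes fall in a class holding 20% of all probes, giving +100% relative enrichment.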
Coverage of regions outside of CpG islands and promoters was dramatically increased from the 27 k to the 450 k array; however, Illumina only categorized probes into six gene feature groups: TSS1500 (within 1500 bps of a transcription start site (TSS)), TSS200 (within 200 bps of a TSS), 5'UTR (5' untranslated region), first exon, body and 3'UTR (3' untranslated region). Given the number of probes on the array, a more detailed gene structure classification might increase the potential to observe subtle biologically-relevant trends in DNAm. Thus, we expanded on gene feature annotation by: 1) annotating the distance of each probe to the closest TSS and 2) classifying probes into nine groups based on three gene components (first exons, exons and introns) and three gene regions (5'UTR, body and 3'UTR). Probes were grouped into: 1) 5'UTR first exons, 2) 5'UTR exons, 3) 5'UTR introns, 4) body first exons, 5) body exons, 6) body introns, 7) 3'UTR first exons, 8) 3'UTR exons and 9) 3'UTR introns, using a) transcript and b) RefGene name. Due to alternative TSSs and splicing, a given probe could be categorized into several gene feature categories (Figure 5).

Due to the observed differences in DNAm across HIL CpG classes detailed in the previous section, gene features were further subclassified by HIL CpG class. Given the known bias in the distribution of CpGs in the genome [14], there was a predictably unequal distribution of the proportion of probes annotated to each HIL CpG class across gene feature groups (Figure 6 and Additional file 12). For example, HC probes were significantly overrepresented in first exons found in the 5'UTR and gene body, while LC probes were significantly underrepresented in both these groups. Within each HIL CpG class, trends in DNAm across gene features were consistent (Additional file 12). For example, in blood, DNAm of intronic probes increased from 5'UTR to 3'UTR to gene body probes (Figure 7A), while DNAm of 5'UTR probes increased from first exon to intron to exon probes (Figure 7B).

[Figure 3 plot: bars showing the percentage of each CpG class (y axis, 0 to 100) in each DNAm category, for Illumina-annotated CpG probe classes (island, n = 136,712; shore, n = 100,083; shelf, n = 39,833; sea, n = 151,588) and HIL-annotated CpG probe classes (HC, n = 139,826; IC, n = 100,164; ICshore, n = 30,467; LC, n = 157,759)]

Figure 3 Distinct patterns of DNAm within CpG classification systems. Probes were grouped into three levels of DNAm based on average ß values within a tissue: hypomethylated (ß values of 0 to ≤0.2, yellow), heterogeneously methylated (ß values of >0.2 to <0.8, light blue) and hypermethylated (ß values of ≥0.8 to 1, dark blue). The percentage of probes in Illumina and HIL-annotated CpG classes was plotted for the three levels of DNAm in blood (n = 4). HIL CpG classes were more characteristic in their DNAm profiles than Illumina-annotated CpG classes. Numbers on top of bars indicate number of probes/class. DNAm, DNA methylation; HIL, high-density CpG island (HC), intermediate-density CpG island (IC) and non-island (LC); ICshore, intermediate-density CpG island shore.

We were also interested in assessing where tissue-specific differences in DNAm occurred based on gene features. Thus, we examined tDM probes for enrichment within each gene feature group (again separated by CpG class, Additional file 13). tDM probes in first exons were significantly depleted in 5'UTRs located in HCs and ICshores, but significantly enriched in 5'UTRs located in LCs. HC exons were significantly enriched for tDM probes in the 5'UTR, body and 3'UTR across all tissue comparisons, perhaps due to biological significance or to small probe numbers in these categories.
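The two gene-feature annotations described above can be sketched briefly. This Python is illustrative and assumes unsigned distances on a single chromosome (strand and transcript direction are ignored) and a pre-sorted list of TSS coordinates; the nine labels are simply the cross product of the three gene regions and three gene components named in the text, and the `(region, component)` input format for overlapping transcripts is hypothetical.

```python
from bisect import bisect_left
from itertools import product

REGIONS = ("5'UTR", "body", "3'UTR")
COMPONENTS = ("first exon", "exon", "intron")
# Nine gene feature groups, in the order listed in the text.
GENE_FEATURE_GROUPS = [f"{r} {c}" for r, c in product(REGIONS, COMPONENTS)]


def distance_to_closest_tss(pos: int, sorted_tss: list[int]) -> int:
    """Unsigned distance from a probe position to the nearest TSS (same chromosome)."""
    if not sorted_tss:
        raise ValueError("no TSS coordinates supplied")
    i = bisect_left(sorted_tss, pos)
    candidates = []
    if i < len(sorted_tss):
        candidates.append(sorted_tss[i] - pos)      # nearest TSS at or after pos
    if i > 0:
        candidates.append(pos - sorted_tss[i - 1])  # nearest TSS before pos
    return min(candidates)


def feature_labels(hits):
    """All groups a probe falls in; one (region, component) pair per overlapping
    transcript, so alternative TSSs and splicing can yield several labels."""
    labels = {f"{r} {c}" for r, c in hits}
    unknown = labels - set(GENE_FEATURE_GROUPS)
    if unknown:
        raise ValueError(f"unrecognized feature(s): {sorted(unknown)}")
    return sorted(labels)


print(distance_to_closest_tss(150, [100, 1000]))  # 50
print(feature_labels([("body", "intron"), ("5'UTR", "first exon")]))
# ["5'UTR first exon", 'body intron']
```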
Although CpG classes were primarily associated with differences in DNAm, gene structure is also an important factor to consider when analyzing 450 k array results.

Conclusion
With the advent of next-generation sequencing applied to bisulfite-converted samples, measurement of DNAm will truly be possible on a genome-wide, sequence-specific scale. However, difficulties currently lie in the alignment of reduced-complexity reads as well as in the biologically-relevant interpretation of data [36]. Array-based technologies, which target specific genomic regions of interest, are of value for assessing physiologically-relevant changes in studies of human health and disease. Detailed and comprehensive annotation of locus-specific arrays is paramount to successful analysis and interpretation of data.

In this article, we presented an expanded annotation of the 450 k array including both compromised probe annotation and additional biologically-relevant annotation. Our expanded annotation has been deposited as a platform on the NCBI GEO (http://www.ncbi.nlm.nih.gov/geo) under the accession [GSE:42409]. Based on our findings, we suggest that all 450 k users analyze data with the following factors in mind. Probe signals may be biased by the presence of SNPs in the target CpG and/or binding of probes to multiple genomic locations. SNPs at the target CpG may be especially problematic in studies with small sample sizes, as chance may result in dramatic differences in the frequency of polymorphisms between groups. However, false positives may still result in studies with larger sample sizes if groups are not ethnically matched. Additionally, DNAm patterns within CpG enrichment classes or gene features could overshadow findings between study groups if probes are not separated and considered within these genomic features.

[Figure 4: bar charts of % relative enrichment (−60 to 60) for blood vs. buccal and blood vs. chorionic villi; panel A, Illumina classes (Island, Shore, Shelf, Sea); panel B, HIL classes (HC, ICshore, IC, LC).]
Figure 4 Enrichment of differentially methylated probes in many CpG classes. Probes that were differentially methylated between blood and buccal samples (n = 69,174), or between blood and chorionic villus samples (n = 91,255), were assessed for enrichment in (A) Illumina and (B) HIL-annotated CpG classes. Enrichment was plotted as ‘percentage relative enrichment’, representing the enrichment of tDM probes relative to the total percentage of probes within each CpG class. Negative percentage relative enrichment indicates that tDM probes were depleted in the given probe-type category, whereas positive percentage relative enrichment indicates that tDM probes were enriched in the given probe-type category. All enrichment analyses were significant with the exception of ICshore probes in the comparison of blood versus chorionic villi. HIL, high-density CpG island (HC), intermediate-density CpG island (IC) and non-island (LC); ICshore, intermediate-density CpG island shore; tDM, differentially methylated between tissues.

There are certainly other methods and filters that can be applied to 450 k array analyses that were not touched upon in this article. Notably, a recent study excluded 450 k probes that mapped to copy-number variations (CNVs) because of their potential to bias measurement of DNAm [37]. Furthermore, these authors set a criterion of ‘comethylation’ to identify differentially methylated regions; that is, all probes within a 250 bp window had to show the same trend in DNAm. As more studies using the 450 k array are published, we will undoubtedly see various combinations of applied filters and methodological practices for data analysis.
In this era of extensive data collection using such high-throughput assays, it is vital that the type of biological sample and the research question be carefully considered in relation to downstream analytical choices as well as the technical platform.

Methods
Annotation
To complete the expanded annotation, we calculated additional probe location information based on the Illumina-provided MAPINFO GenomeStudio column (location of the C in the target CpG): 1) the interval of the target CpG (CpG), 2) the interval containing the probe but excluding the target CpG (Probe w/o CpG) and 3) the interval of the entire probe (entire probe) (Additional files 1 and 14). Probe type (type I vs type II) and strand of design (F or R) were taken into consideration when calculating genomic location. Ten type I and ten type II probes were manually checked against the annotated probe sequence. A UCSC track was created containing the targeted Cs on the 450 k array (Additional file 15). All of the annotation and analysis of the expanded annotation was conducted on 485,512 probes, including both cg (CpG loci) and ch (non-CpG loci) probes but excluding rs (SNP assay) probes, unless otherwise specified.

[Figure 5: schematic of transcripts A to E of a fictional gene, showing 450k probes i, ii and iii and the gene feature group numbers below each transcript.]
Figure 5 Illustration of gene feature annotation. Based on the overlap of three gene components (first exon vs exon vs intron) with three gene regions (5’UTR vs body vs 3’UTR), probes were annotated into the following nine gene feature groups: 1) 5’UTR first exons, 2) 5’UTR exons, 3) 5’UTR introns, 4) body first exons, 5) body exons, 6) body introns, 7) 3’UTR first exons, 8) 3’UTR exons and 9) 3’UTR introns (corresponding to numbers below transcripts). A given probe could be annotated with more than one gene feature, as illustrated by the multiple transcripts (A to E) of a fictional gene. Probe i would be annotated as 5’UTR exon, 5’UTR first exon and 5’UTR intron; probe ii would be annotated as body exon, 5’UTR intron, body intron and body first exon; and probe iii would be annotated as 3’UTR exon, 3’UTR intron, 3’UTR exon, 3’UTR exon and 3’UTR first exon. White boxes represent untranslated exons; grey boxes represent translated exons. 5’UTR, 5’ untranslated region; 3’UTR, 3’ untranslated region.

[Figure 6: stacked bars showing the percentage of gene feature probes (0 to 100) in each HIL CpG class (HC, ICshore, IC, LC) for 1st exons, exons and introns of the 5’UTR, body and 3’UTR; bar labels, left to right: 18,884, 1,550, 33,394, 8,457, 24,888, 105,195, 1,785, 14,731 and 4,431.]
Figure 6 Contribution of HIL CpG classes to probes in nine gene feature groups. The percentage of probes within each HIL CpG class was different for each gene feature group. Numbers on top of bars indicate the number of probes/gene feature group; a total of 213,315 probes were located within these nine gene feature groups. HIL, high-density CpG island (HC), intermediate-density CpG island (IC) and non-island (LC); ICshore, intermediate-density CpG island shore.

SNP annotation
The dbSNP131 table was imported into Galaxy (http://galaxyproject.org; Galaxy, Pennsylvania State University, PA, USA) from UCSC [38]. Only rs numbers for SNPs that were an interval of 1 bp in length and of the highest quality (weight = 1) were included in the annotation. An interval file was uploaded into Galaxy using the hg19 location we annotated for the interval of each probe spanning the C and G of the target CpG, for cg probes only. The probe file was intersected with the dbSNP131 table to create a list of probes with documented SNPs in the C or G of the target CpG (target CpG SNP).
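The intersection of target CpG intervals with dbSNP, together with the collapse to one rs list per probe described next, can be sketched in Python as below. This is not the Galaxy and R workflow used here; it assumes 0-based, half-open intervals (the UCSC table convention), and the `Snp` record, its field names and the probe IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Snp(NamedTuple):
    rs_id: str
    chrom: str
    start: int   # 0-based, half-open, as in UCSC tables
    end: int
    weight: int  # UCSC snp tables use 1 for uniquely mapped (highest quality)


def target_cpg_snps(
    cpg_intervals: Dict[str, Tuple[str, int, int]], snps: Iterable[Snp]
) -> Dict[str, List[str]]:
    """Return probe ID -> rs IDs of SNPs overlapping the target CpG (C or G).

    cpg_intervals maps each cg probe ID to (chrom, start, end), a 0-based,
    half-open interval spanning the C and G of its target CpG.
    """
    # Keep only 1 bp SNPs with weight == 1, grouped by chromosome.
    by_chrom: Dict[str, List[Snp]] = defaultdict(list)
    for snp in snps:
        if snp.end - snp.start == 1 and snp.weight == 1:
            by_chrom[snp.chrom].append(snp)

    hits: Dict[str, List[str]] = {}
    for probe_id, (chrom, start, end) in cpg_intervals.items():
        # Half-open intervals overlap when each starts before the other ends.
        rs_ids = sorted(
            {s.rs_id for s in by_chrom.get(chrom, []) if s.start < end and start < s.end}
        )
        if rs_ids:  # collapse: one entry per probe, possibly several rs IDs
            hits[probe_id] = rs_ids
    return hits


cpgs = {"cg_a": ("chr1", 100, 102), "cg_b": ("chr2", 500, 502)}
snps = [
    Snp("rs1", "chr1", 100, 101, 1),  # on the C
    Snp("rs2", "chr1", 101, 102, 1),  # on the G
    Snp("rs3", "chr1", 102, 103, 1),  # just outside the CpG
    Snp("rs4", "chr2", 500, 501, 2),  # lower-quality weight, dropped
    Snp("rs5", "chr2", 499, 501, 1),  # not a 1 bp SNP, dropped
]
hits = target_cpg_snps(cpgs, snps)
n_target = {probe: len(rs) for probe, rs in hits.items()}  # SNPs per probe
print(hits, n_target)
```

A linear scan per chromosome is enough for a sketch; for the full dbSNP table an interval index (as Galaxy or GenomicRanges provide) would be the practical choice.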
This file was collapsed in R (http://www.r-project.org; R Foundation for Statistical Computing, Vienna University of Economics and Business, Vienna, Austria) to create a list of rs numbers for each probe, since some target CpGs were documented with more than one SNP. The rs numbers for SNPs in the target CpG were included in the expanded annotation in the ‘target CpG SNP’ column (n = 20,270), while the number of SNPs per probe was annotated in the ‘n_target CpG SNP’ column.

Non-specific probe annotation
To identify probes that potentially have multiple genomic targets (non-specific probes), we followed the method described by Chen et al. [10]. Special treatment of type II probes was required, as the Illumina annotation notes Cs in CpGs within the probe as an ‘R’ SNP. For type II probes that contained Rs, we considered two probe sequence versions, one with all Rs replaced by As and the other with all Rs replaced by Gs. Using these conditions, we matched each of the 450 k probes with the Illumina-annotated genomic location (intended target). Briefly, we used BLAT [30] to align probe sequences to four versions of the hg19 draft genome sequence: 1) a fully unmethylated ‘bisulfite-treated’ genome, with all Cs converted to Ts; 2) a fully methylated ‘bisulfite-treated’ genome, with only non-CpG Cs converted to Ts; and 3) and 4), the same two treatments applied to the reverse complement sequence. BLAT was run using the following parameters: stepSize = 5, wordsize = 11 and repMatch = 1,000,000; lowering the word length led to only fractionally more hits. The selection criteria were as previously outlined: for a probe to be considered non-specific, there had to be 90% identity over the aligned region, at least 40 of 50 matching bps, no gaps, and the 50th nucleotide had to align, as the probe hybridizes to the target CpG at this position [10].
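The genome preparation and the post-alignment filter described above can be sketched in Python as follows. BLAT itself is not reimplemented, and how matches, gaps and coverage of the 50th base are read from BLAT output is left out. The helper names are hypothetical, and treating ‘90% identity’ as ≥0.90 is our assumption.

```python
from typing import Dict, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def type2_probe_versions(probe: str) -> Tuple[str, str]:
    """Replace every IUPAC 'R' (A or G) by A, then by G."""
    return probe.replace("R", "A"), probe.replace("R", "G")


def bisulfite_convert(seq: str, methylated: bool) -> str:
    """In silico bisulfite conversion of one strand.

    methylated=False: every C becomes T (fully unmethylated genome).
    methylated=True: only Cs not followed by G become T (fully methylated).
    Lowercase (RepeatMasker-masked) bases keep their case.
    A C at the very end of the sequence is treated as non-CpG.
    """
    out = []
    for i, base in enumerate(seq):
        if base in "Cc":
            next_base = seq[i + 1].upper() if i + 1 < len(seq) else ""
            if not (methylated and next_base == "G"):
                base = "T" if base == "C" else "t"
        out.append(base)
    return "".join(out)


def four_genome_versions(seq: str) -> Dict[str, str]:
    """The four reference versions that probes are aligned to."""
    rc = reverse_complement(seq)
    return {
        "unmethylated_fwd": bisulfite_convert(seq, methylated=False),
        "methylated_fwd": bisulfite_convert(seq, methylated=True),
        "unmethylated_rev": bisulfite_convert(rc, methylated=False),
        "methylated_rev": bisulfite_convert(rc, methylated=True),
    }


def is_counted_hit(matches: int, aligned_length: int, gaps: int,
                   aligns_base_50: bool) -> bool:
    """Post-alignment filter: >=90% identity, >=40 matching bp, no gaps,
    and the 50th probe base (the target CpG position) aligned."""
    if aligned_length <= 0:
        return False
    identity = matches / aligned_length
    return identity >= 0.90 and matches >= 40 and gaps == 0 and aligns_base_50


print(four_genome_versions("ACGTCCA"))
print(is_counted_hit(45, 50, 0, True), is_counted_hit(40, 50, 0, True))
```

Note that the identity and match-count thresholds interact: with a 50 bp alignment, the 90% identity rule is the binding constraint (45 matches), while the 40 bp floor matters only for shorter alignments.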
The numbers of non-specific probe hits were annotated in the expanded annotation ‘AlleleA_Hits’ and ‘AlleleB_Hits’ columns, while the site of cross-hybridization was annotated in the columns ‘XY_Hits’ (if at least one hit was on a sex chromosome) and ‘Autosomal_Hits’ (if at least one hit was on an autosome). Repetitive sequences from RepeatMasker were marked in lowercase in the four genomes, which allowed us to record the amount of repetitive DNA within the Illumina-intended alignment of each probe in the expanded annotation column ‘n_bp_repetitive’.

[Figure 7: average level of DNA methylation (β value, 0.0 to 1.0) in blood for each HIL CpG class (HC, ICshore, IC, LC); panel A, intronic probes by 5’UTR, body and 3’UTR; panel B, 5’UTR probes by 1st exon, exon and intron.]
Figure 7 Variation of gene feature DNAm within a CpG class. The level of DNAm was plotted as an average β value for each gene feature in blood. Analyses were conducted within each HIL CpG class due to the large differences in DNAm that were observed between classes. Average β values varied across probes by (A) gene location, as exemplified by intronic probes, and (B) gene components, as exemplified by 5’UTR probes. 5’UTR, 5’ untranslated region; DNAm, DNA methylation; HIL, high-density CpG island (HC), intermediate-density CpG island (IC) and non-island (LC); ICshore, intermediate-density CpG island shore.

CpG enrichment annotation
Illumina categorized probes in CpG islands (GenomeStudio column ‘Relation_to_UCSC_CpG_Island’) based on the UCSC Genome Browser criteria of CG content >50%, Obs/Exp CpG ratio >0.60 and length >200 bps. Shores and shelves were identified based on their relationship to a CpG island: shores as the 2 kbs up- and downstream of CpG islands and shelves as the 2 kbs outside of shores.
The remaining probes were located in non-island regions, which we refer to as the ‘sea’ [9] (Additional file 5A).

We annotated probes into four HIL CpG classes based on alternative CpG enrichment criteria: high-density CpG island probes (HC, n = 153,859), intermediate-density CpG island probes (IC, n = 118,727), ICshore probes (probes in ICs that border HCs, n = 33,955) and non-island probes (LC, n = 178,971) (Additional file 5B). This annotation has been added in the ‘HIL_CpG_class’ column of the expanded annotation. To locate probes within each of the four CpG classes, we first annotated these CpG enrichment classes throughout the genome. The hg19 genomic sequence was downloaded from UCSC in overlapping segments and read by CpGIE, a Java software program [39]. CpGIE searches input sequences in sliding windows based on user-set criteria. HCs were defined as regions with CG content >55%, Obs/Exp CpG ratio >0.75 and length >500 bps, while ICs were defined as regions with CG content >50%, Obs/Exp CpG ratio >0.48 and length >200 bps [16,18]. CpGIE HC and IC output was merged into a single file for each chromosome, duplicate islands were removed, and CpG islands were identified as follows: ICs, isolated regions of the genome with IC density; ICshores, regions of the genome with IC density that were next to regions with HC density; HCs, any region of the genome with HC density; and LCs, regions that were not of IC or HC density. Islands were given unique names in the annotation, for example, chr8_IC:49890018–49891221 (chr#_CpG class:genomic start–genomic end). The hg19 HC and IC islands have been compiled into a UCSC track available in Additional file 16.
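To make the island criteria concrete, the Python sketch below scores a single sequence window against the UCSC and HIL thresholds quoted above, and places a position relative to one island using the 2 kb shore and shelf rule. It does not reimplement CpGIE’s sliding-window search, window merging or ICshore adjacency logic; the function names are hypothetical. Obs/Exp CpG is computed as (number of CpGs × length) / (number of Cs × number of Gs), the definition used by the UCSC CpG island track.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (C+G fraction, observed/expected CpG ratio) for one window."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    c, g, cpg = s.count("C"), s.count("G"), s.count("CG")
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return (c + g) / n, obs_exp


def is_ucsc_island(seq: str) -> bool:
    gc, obs_exp = cpg_stats(seq)
    return gc > 0.50 and obs_exp > 0.60 and len(seq) > 200


def hil_density(seq: str) -> str:
    """HC, IC or LC density for one window (HC criteria are checked first)."""
    gc, obs_exp = cpg_stats(seq)
    if gc > 0.55 and obs_exp > 0.75 and len(seq) > 500:
        return "HC"
    if gc > 0.50 and obs_exp > 0.48 and len(seq) > 200:
        return "IC"
    return "LC"


def illumina_region(pos: int, island_start: int, island_end: int) -> str:
    """Island/Shore/Shelf/Sea relative to one island (0-based, half-open)."""
    if island_start <= pos < island_end:
        return "Island"
    dist = island_start - pos if pos < island_start else pos - island_end + 1
    if dist <= 2000:
        return "Shore"
    if dist <= 4000:
        return "Shelf"
    return "Sea"


print(hil_density("CG" * 300), hil_density("CGCA" * 75), hil_density("A" * 600))
```

A 300 bp window can pass both the UCSC and the IC thresholds yet never qualify as HC, which illustrates why HIL classes split UCSC islands into finer groups.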
The hg19 HIL annotation was intersected with the genomic location (hg19) of 450 k targets in Galaxy to assign probes to the four CpG classes. An annotation of probes into HIL CpG islands using the detailed nomenclature can be found in the expanded annotation column ‘HIL_CpG_Island_Name’.

Gene feature and TSS annotation
Using the NCBI Reference Sequence (RefSeq) gene annotation, we annotated probes into nine groups based on three gene components (first exons, exons and introns) and three gene regions (5’UTR, body and 3’UTR). Probes were grouped into: 1) 5’UTR first exons, 2) 5’UTR exons, 3) 5’UTR introns, 4) body first exons, 5) body exons, 6) body introns, 7) 3’UTR first exons, 8) 3’UTR exons and 9) 3’UTR introns (Figure 5). Briefly, the hg19 RefSeq table was downloaded from UCSC [38]. Exon and intron information was extracted and parsed into genomic interval data, with the most upstream exon denoted as the first exon. Next, 5’UTR, gene body and 3’UTR locations were parsed into genomic interval data using the transcription start/stop and coding start/stop information from RefSeq. Each of the 5’UTR, gene body and 3’UTR intervals was intersected with the first exon, exon and intron intervals to generate the nine gene features. The gene feature intervals were then intersected with the hg19 genomic location of 450 k targets in R to assign probes to the nine gene features. This annotation was completed using both RefSeq gene names and transcript names. Gene feature annotation was conducted using the GenomicRanges package in R [40].

The hg19 UCSC knownGene table [38] was downloaded to Galaxy and the closest TSS for each probe was annotated, regardless of whether the probe was located within the same gene.
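A per-transcript version of the nine-group classification, plus a closest-TSS lookup, can be sketched in Python as below. This is not the GenomicRanges code used here. It assumes UCSC refGene-style coordinates (0-based, half-open), treats the body as the interval from coding start to coding end, labels only the most upstream exon in transcript direction as ‘first exon’ (other exons as ‘exon’), skips non-coding transcripts, and reports an unsigned TSS distance; all of these are our assumptions, and the names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Transcript(NamedTuple):
    strand: str                   # "+" or "-"
    tx_start: int                 # 0-based, half-open, as in UCSC refGene
    tx_end: int
    cds_start: int
    cds_end: int                  # cds_start == cds_end for non-coding RNAs
    exons: List[Tuple[int, int]]  # (start, end) pairs sorted by start


def region_of(t: Transcript, pos: int) -> Optional[str]:
    """5'UTR, body (coding start to coding end) or 3'UTR, strand-aware."""
    if not t.tx_start <= pos < t.tx_end or t.cds_start == t.cds_end:
        return None
    if t.cds_start <= pos < t.cds_end:
        return "body"
    before_cds = pos < t.cds_start  # in genome (left-to-right) coordinates
    if t.strand == "+":
        return "5'UTR" if before_cds else "3'UTR"
    return "3'UTR" if before_cds else "5'UTR"


def component_of(t: Transcript, pos: int) -> Optional[str]:
    """first exon (most upstream in transcript direction), exon or intron."""
    first = 0 if t.strand == "+" else len(t.exons) - 1
    for i, (start, end) in enumerate(t.exons):
        if start <= pos < end:
            return "first exon" if i == first else "exon"
    for (_, prev_end), (next_start, _) in zip(t.exons, t.exons[1:]):
        if prev_end <= pos < next_start:
            return "intron"
    return None


def gene_features(transcripts: List[Transcript], pos: int) -> Set[str]:
    """Every gene feature label a probe position receives across transcripts."""
    labels = set()
    for t in transcripts:
        region, component = region_of(t, pos), component_of(t, pos)
        if region and component:
            labels.add(f"{region} {component}")
    return labels


def closest_tss(transcripts: List[Transcript], pos: int) -> Tuple[int, int]:
    """(TSS coordinate, unsigned distance) of the nearest TSS on either strand."""
    def tss(t: Transcript) -> int:
        return t.tx_start if t.strand == "+" else t.tx_end - 1
    best = min(transcripts, key=lambda t: abs(pos - tss(t)))
    return tss(best), abs(pos - tss(best))


exons = [(100, 250), (400, 500), (700, 1000)]
plus = Transcript("+", 100, 1000, 200, 800, exons)
minus = Transcript("-", 100, 1000, 200, 800, exons)
print(gene_features([plus, minus], 150), closest_tss([plus, minus], 150))
```

Because labels are collected over all transcripts, one position can receive several features, mirroring the multi-transcript example in Figure 5.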
For each probe, the closest TSS, the distance to it, and its gene and transcript names were noted in the expanded annotation columns ‘Closest_TSS’, ‘Distance_closest_TSS’, ‘Closest_TSS_gene_name’ and ‘Closest_TSS_Transcript’.

Sample collection
Two male and two female chorionic villus samples were collected through the BC Women’s Hospital & Health Centre, Vancouver, BC, Canada, as controls for a study of chromosomal abnormalities in the placenta. DNA was extracted from a small piece of chorionic villi as previously described [41]. For each placental sample (n = 4), DNA from two independent chorionic villi was combined in equal amounts prior to bisulfite conversion to ensure a representative sample of the placenta. DNA was extracted by the standard salt method. Two male and two female blood samples were collected as adult controls for ongoing studies of respiratory disease and epigenetics (n = 4). Peripheral blood mononuclear cell (PBMC) DNA was extracted according to standard procedures. Buccal epithelial samples were collected from two males and two females for a study on maternal care effects on childhood DNAm (n = 4). Buccal samples were collected using Isohelix DNA Buccal Swabs and stabilization reagents (Cell Projects Ltd, Harrietsham, Kent, UK), and DNA was extracted using Isohelix DNA Isolation Kits (Cell Projects Ltd) as per the manufacturer’s protocols.

Illumina 450 k array
Two μg of genomic DNA was purified using the DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA, USA) following the manufacturer’s protocol. Purified DNA quality and concentration were assessed with a NanoDrop ND-1000 (Thermo Scientific, Waltham, MA, USA) prior to bisulfite conversion. One μg of purified genomic DNA was bisulfite converted using the EZ DNA Methylation Kit (Zymo Research, Orange, CA, USA) following the manufacturer’s protocol.
Bisulfite-converted DNA quality and concentration were assessed using the NanoDrop and, if required, samples were concentrated to approximately 50 ng/μl using a SpeedVac (Thermo Electron Corporation, Waltham, MA, USA). Following the Illumina 450 k array protocol, 4 μl of bisulfite-converted sample was whole-genome amplified, enzymatically digested and hybridized to the array, and single nucleotide extension was then performed [9].

Two assay types are used by the 450 k array to measure DNAm: Infinium I (type I probes) and Infinium II (type II probes), bound to beads scattered throughout the array. When a probe successfully binds to DNA, a single fluorescently labeled nucleotide extends off the probe and this signal is read by an Illumina scanner. The Infinium I assay uses two bead types specific to the CpG of interest, an unmethylated (u) and a methylated (m) bead, each with a different probe design (ProbeA (u) and ProbeB (m)). Both type I probes for a given CpG fluoresce in the same color channel (either red (Cy5) or green (Cy3)). The Infinium II assay uses only one bead type for each CpG of interest, an m + u bead. One probe is designed for each type II target site, and the color of fluorescence depends on which nucleotide is incorporated in the single base extension step. The incorporation of an A or T signals an unmethylated site in red (u), and the incorporation of a C or a G signals a methylated site in green (m) [8].

Chips were scanned using an Illumina HiScan in two-color mode to detect Cy3-labeled probes on the green channel and Cy5-labeled probes on the red channel. Illumina GenomeStudio Software 2011.1 was used to read the array output and conduct background normalization. The signalA, signalB and probe intensity values were exported for autosomal probes and read into R.
M values were generated using the Bioconductor (http://www.bioconductor.org; Fred Hutchinson Cancer Research Center, Seattle, WA, USA) methylumi package as $M = \log_2\left(\frac{I_m + 1}{I_u + 1}\right)$, where $I_m$ and $I_u$ are the methylated and unmethylated signal intensities, since this value has been shown to be valid for statistical analyses [42]. Following correction for chip-to-chip color bias using the Bioconductor lumi package [43] and probe type correction using subset-quantile within array normalization (SWAN) [44], M values were converted to β values using the equation $\beta = \frac{2^M}{2^M + 1}$. The β value is a number ranging from 0 to 1 that is directly proportional to percentage DNAm; thus, to ease interpretation, we have reported results as β values. The microarray data used in this article were submitted to the NCBI GEO under accession number [GSE:42409]. Probes with a detection p value >0.01 in any sample, probes with no β value in any sample, all rs and ch probes, and all sex chromosome and non-specific probes were removed prior to analyses. The level of DNAm for 428,216 probes in our sample dataset was intersected with the expanded annotation for further analyses.

Processing of aging dataset
Series matrix files containing β values for 473,039 probes per sample were downloaded for [GSE:40279] [24]. We worked with the subset of samples that roughly matched the age of the samples used in our study (n = 261, aged 19 to 61 years). Probes with no β value in any sample, all sex chromosome probes, all rs probes and all ch probes were removed from the dataset. For SNP analyses, non-specific probes were also removed; however, these were retained in the analysis of autosomal sex-specific probes. For the discovery of autosomal probes with sex differences in DNAm, β values were read into R and converted into M values using the Bioconductor package lumi [43], and significance analysis of microarrays (SAM) was then conducted using the Bioconductor package siggenes [45]. At FDR <1%, 10,139 autosomal probes were identified as significantly different between male and female samples.
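The M value and β value transformations above can be sketched in plain Python as follows. This is not the methylumi or lumi implementation and omits the color-bias and SWAN corrections applied between the two steps; the offset of 1 follows the formula in the text, and the function names, including the Δβ helper for the group comparison described next, are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def m_value(intensity_m: float, intensity_u: float, offset: float = 1.0) -> float:
    """M = log2((I_m + offset) / (I_u + offset)); offset 1 as in the text."""
    return math.log2((intensity_m + offset) / (intensity_u + offset))


def beta_from_m(m: float) -> float:
    """beta = 2^M / (2^M + 1), a base-2 logistic transform of M."""
    two_m = 2.0 ** m
    return two_m / (two_m + 1.0)


def m_from_beta(beta: float) -> float:
    """Inverse of beta_from_m, defined for 0 < beta < 1."""
    return math.log2(beta / (1.0 - beta))


def delta_beta(betas_a: Sequence[float], betas_b: Sequence[float]) -> float:
    """Absolute difference between group mean beta values (e.g. males vs females)."""
    return abs(mean(betas_a) - mean(betas_b))


m = m_value(3000.0, 1000.0)
print(m, beta_from_m(m))
```

Substituting the M value formula into the β equation gives β = (I_m + 1) / (I_m + I_u + 2), which shows why β reads directly as an approximate fraction of methylated signal while M is unbounded and better suited to statistical testing.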
Next, this list was crossed with a list of Δβ values for each probe, calculated as the absolute value of the difference between the average β of males and the average β of females.

Pyrosequencing
Probe cg06961873 was selected for genotype validation of SNP rs61775206 in each sample. Primers were designed using PSQ Assay Design software version 1.0.6 (Biotage AB, Uppsala, Sweden). Primer sequences and probe information are available in Additional file 17. Using the following conditions, 0.5 μl of genomic DNA was PCR-amplified: 95°C for 5 minutes; 50 cycles of 95°C for 20 seconds, 55°C for 20 seconds and 75°C for 20 seconds; and 72°C for 5 minutes. Genotyping was performed using a PyroMark MD system (Biotage AB) and analyzed with PSQ 96MA SNP software (Biotage AB).

Statistical analyses
A KS test was used to assess the difference in the distribution of SD in β values for probes that contained SNPs. The KS statistic is the maximum absolute difference between the empirical cumulative distribution functions of two samples. Probes with small within-tissue SD in β (<0.10) were removed from all probe groups to increase the power of the analysis. Probes with a target CpG SNP were removed from the SNP <10 bp group. The number of probes included in the SD in β distribution curves for blood samples was 5,450 for all probes, 809 for SNP >10 bp, 402 for SNP <10 bp and 2,190 for target CpG SNP, and for the aging dataset was 6,267 for all probes, 1,022 for SNP >10 bp, 362 for SNP <10 bp and 2,753 for target CpG SNP. KS tests were also used to assess the difference in distribution of DNAm between Illumina CpG classes and between HIL CpG classes.
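The SD filter and the two-sample KS statistic described above can be sketched in Python as below. It computes only the statistic, not a p value (the analyses here used R; in Python, `scipy.stats.ks_2samp` returns both). Using the sample SD (`statistics.stdev`) for the within-tissue filter is our assumption, and the function names are hypothetical.

```python
from bisect import bisect_right
from statistics import stdev
from typing import Dict, List, Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)|.

    Both ECDFs are step functions that only change at observed values,
    so it suffices to evaluate them at every pooled data point.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def variable_probe_sds(betas_by_probe: Dict[str, List[float]],
                       min_sd: float = 0.10) -> Dict[str, float]:
    """Within-tissue SD of beta per probe, keeping probes with SD >= min_sd."""
    sds = {probe: stdev(v) for probe, v in betas_by_probe.items() if len(v) > 1}
    return {probe: sd for probe, sd in sds.items() if sd >= min_sd}


kept = variable_probe_sds({"cg1": [0.1, 0.5, 0.9], "cg2": [0.50, 0.52, 0.51]})
print(kept, ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

In use, the SD values of, for example, target CpG SNP probes and of all probes would be passed to `ks_statistic`; a value near 0 means similar distributions, and a value near 1 means almost no overlap.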
Fisher’s exact test was used to compare the distribution of the number of probes within the three levels of DNAm for both Illumina and HIL-annotated CpG classes: hypomethylated (β values of 0 to ≤0.2), heterogeneously methylated (β values of >0.2 to <0.8) and hypermethylated (β values of ≥0.8 to 1.0). Enrichment analyses of tDM probes were performed in Python (Python Software Foundation). To select tDM probes, DNAm was first averaged for each probe within a tissue. A z-score was calculated for each probe comparison between tissues. A p value cutoff of 0.05 was selected, with a Bonferroni correction to account for repeated comparisons [19]. KS and Fisher’s exact tests were performed in R. Tests with p values <1.0 × 10⁻⁷ were considered statistically significant. All figures were created in R and Adobe Illustrator CS6.

Additional files
Additional file 1: Relative location of probes to target CpG. To complete our analysis, it was necessary to locate 450 k probes within the genome. Illumina annotated the hg19 location of each target C (called mapinfo) and the strand on which the probe was designed; R probes bind to the negative strand, whereas F probes bind to the positive strand. With this information, we annotated the start and end coordinates for all probes on the array. Refer to Additional file 14 for the start and end location for each probe type. Type I versus type II probes and F versus R probes align differently with target CpGs. Single nucleotide extension of a probe occurs by one of four fluorescently labeled nucleotides: A and T are labeled in red, while C and G are labeled in green.
The color of single nucleotide extension of type I probes is not dependent on whether the target site is methylated or unmethylated; however, for type II probes, incorporation of an A or T signals an unmethylated site in red and the incorporation of a C or a G signals a methylated site in green.
Additional file 2: Frequency of target SNP CpG/probe.
Additional file 3: Distribution of DNAm at three highly variable probes. The level of DNAm was plotted for three highly variable probes (SD in β ≥0.25) annotated with a target CpG SNP, across the 261 individuals in the aging dataset. cg06961873 corresponds to the CpG site genotyped in Figure 1D. A trimodal pattern of DNAm was observed at these three exemplary sites, indicating that DNAm measured at these sites may reflect sample genotype.
Additional file 4: List of autosomal probes with sex differences in DNAm.
Additional file 5: Illustration of Illumina and HIL CpG classes. (A) Diagram of Illumina-annotated probes, based on their relative location to a CpG island: within the island, shore or shelf. We used the term ‘sea probes’ to refer to probes that were not annotated into one of the Illumina CpG classes. Islands were defined based on UCSC criteria: CG content >50%, Obs/Exp CpG ratio >0.60 and length >200 bps. Shores were defined as the 2 kb up- and downstream of a CpG island and shelves as the 2 kb outside of a shore. (B) The HIL definition of CpG islands was used to annotate probes into three CpG classes: HC probes (map to a high-density CpG island or HC), IC probes (map to an isolated intermediate-density CpG island or IC) and ICshore probes (map to a region with IC density that borders an HC). The remainder of probes did not map to a CpG island and were thus considered non-island or LC probes.
HCs were defined as CG content >55%, Obs/Exp CpG ratio >0.75 and length >500 bps, while ICs were defined as CG content >50%, Obs/Exp CpG ratio >0.48 and length >200 bps.
Additional file 6: Distribution of probes within Illumina-annotated and HIL-annotated CpG classes.
Additional file 7: Distinct patterns of DNAm across Illumina-annotated and HIL-annotated CpG classes in blood. Density curves were plotted using average β values for probes within each Illumina-annotated and HIL-annotated CpG class in blood (n = 4). The number of probes contributing to each curve was: island = 136,712, shore = 100,083, shelf = 39,833, sea = 151,588, HC = 139,826, ICshore = 100,164, IC = 30,467 and LC = 157,759. For Illumina-annotated CpG classes, KS statistics in comparison to the distribution of DNAm of sea probes were 0.67 for island probes, 0.34 for shore probes and 0.06 for shelf probes. For HIL-annotated CpG classes, KS statistics in comparison to the distribution of DNAm of LC probes were 0.77 for HC probes, 0.53 for ICshore probes and 0.08 for IC probes.
Additional file 8: Distinct patterns of DNAm across Illumina-annotated and HIL-annotated CpG classes in buccal samples. Density curves were plotted using average β values for probes within each Illumina-annotated and HIL-annotated CpG class in buccal samples (n = 4). The number of probes contributing to each curve was: island = 136,712, shore = 100,083, shelf = 39,833, sea = 151,588, HC = 139,826, ICshore = 100,164, IC = 30,467 and LC = 157,759. For Illumina-annotated CpG classes, KS statistics in comparison to the distribution of DNAm of sea probes were 0.66 for island probes, 0.32 for shore probes and 0.06 for shelf probes. For HIL-annotated CpG classes, KS statistics in comparison to the distribution of DNAm of LC probes were 0.76 for HC probes, 0.49 for ICshore probes and 0.07 for IC probes.
Additional file 9: Distinct patterns of DNAm across Illumina-annotated and HIL-annotated CpG classes in chorionic villi.
Density curves were plotted using average β values for probes within each Illumina-annotated and HIL-annotated CpG class in chorionic villi (n = 4). The number of probes contributing to each curve was: island = 136,712, shore = 100,083, shelf = 39,833, sea = 151,588, HC = 139,826, ICshore = 100,164, IC = 30,467 and LC = 157,759. For Illumina-annotated CpG classes, KS statistics in comparison to the distribution of DNAm of sea probes were 0.61 for island probes, 0.28 for shore probes and 0.08 for shelf probes. For HIL-annotated CpG classes, KS statistics in comparison to the distribution of DNAm of LC probes were 0.72 for HC probes, 0.45 for ICshore probes and 0.10 for IC probes.
Additional file 10: Distribution of β values within Illumina-annotated and HIL-annotated CpG classes for blood.
Additional file 11: Enrichment of differentially methylated probes in Illumina-annotated and HIL-annotated CpG classes.
Additional file 12: Average DNAm and SD of nine gene features.
Additional file 13: Enrichment of tDM probes within gene features.
Additional file 14: Calculation of intended genomic location of 450 k probes.
Additional file 15: UCSC track of 450 k target Cs.
Additional file 16: UCSC track of HC/IC CpG islands.
Additional file 17: Primers for genotyping validation of a target CpG SNP.

Abbreviations
27 k: Infinium HumanMethylation27k BeadChip; 3’UTR: 3’ untranslated region; 450 k: Infinium HumanMethylation450 BeadChip; 5’UTR: 5’ untranslated region; auto: autosomal; bp: base pair; chrs: chromosomes; CNV: copy-number variation; dbSNP: database of single nucleotide polymorphisms; DNAm: DNA methylation; EWAS: epigenome-wide association studies; FDR: false discovery rate; GEO: Gene Expression Omnibus; GWAS: genome-wide association studies; HC: high-density CpG island; HIL: high-density CpG island (HC), intermediate-density CpG island (IC) and non-island (LC) or low-density regions; IC: intermediate-density CpG island; ICshore: intermediate-density CpG island shore; KS: Kolmogorov-Smirnov; LC: non-island; mSNP: methylation-associated SNP; NCBI: National Center for Biotechnology Information; Obs/Exp: Observed/Expected; PBMC: peripheral blood mononuclear cell; PCR: polymerase chain reaction; RefSeq: Reference Sequence; SAM: significance analysis of microarrays; SNP: single nucleotide polymorphism; SWAN: subset-quantile within array normalization; tDM: differentially methylated between tissues; TSS: transcription start site; UCSC: University of California, Santa Cruz.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
EMP carried out Pyrosequencing, conducted SNP annotation and analysis, participated in CpG island annotation and analysis, participated in the design of the study, and drafted the manuscript. AMC participated in CpG island annotation and analysis, participated in the design and analysis of the study, and conceived of the study. LLL processed Illumina data, conducted gene feature annotation and participated in the design of the study. PF conducted enrichment analyses. EE conducted non-specific probe analyses and participated in the design of the study. MSK and CJB conceived of the study, participated in data analysis and participated in the design of the study. WPR participated in data analysis and design of the study.
All authors contributed to the writing of the manuscript, and read and approved the final version.

Acknowledgements
The authors would like to thank Courtney Hanna, Dr Kirsten Hogg, Samantha Peeters, Dr Maria Peñaherrera and John Blair for suggestions and editorial assistance, and Sarah Neumann for aid in array processing. MSK is a Fellow of the Canadian Institute for Advanced Research (CIFAR), Experience-Based Brain and Biological Development program, and a Scholar of the Djavad Mowafaghian Foundation. EE is also a Fellow of the CIFAR. This research was supported by grants to WPR from the Canadian Institutes of Health Research (CIHR, grant MOP-106430), to MSK from the NeuroDevNet, a Canadian Network of Centres of Excellence, and to CJB from the CIHR (grant MOP-13690). EE’s laboratory is supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author details
1 Department of Obstetrics and Gynaecology, University of British Columbia, 2H30-4490 Oak Street, Vancouver, BC V6H 3N1, Canada.
2 The Child & Family Research Institute, 950 West 28th Avenue, Vancouver, BC V5Z 4H4, Canada.
3 Department of Medical Genetics, University of British Columbia, 2329 West Mall, Vancouver, BC V6T 1Z3, Canada.
4 Centre for Molecular Medicine and Therapeutics, 950 West 28th Avenue, Vancouver, BC V5Z 4H4, Canada.
5 Department of Physics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada.

Received: 17 November 2012 Accepted: 13 February 2013
Published: 3 March 2013

References
1. Dempster EL, Pidsley R, Schalkwyk LC, Owens S, Georgiades A, Kane F, Kalidindi S, Picchioni M, Kravariti E, Toulopoulou T, Murray RM, Mill J: Disease-associated epigenetic changes in monozygotic twins discordant for schizophrenia and bipolar disorder. Hum Mol Genet 2011, 20(24):4786–4796.
2. Hansen KD, Timp W, Bravo HC, Sabunciyan S, Langmead B, McDonald OG, Wen B, Wu H, Liu Y, Diep D, Briem E, Zhang K, Irizarry RA, Feinberg AP: Increased methylation variation in epigenetic domains across cancer types. Nat Genet 2011, 43(8):768–775.
3. Aston KI, Punj V, Liu L, Carrell DT: Genome-wide sperm deoxyribonucleic acid methylation is altered in some men with abnormal chromatin packaging or poor in vitro fertilization embryogenesis. Fertil Steril 2012, 97(2):285–292.
4. Rakyan VK, Down TA, Balding DJ, Beck S: Epigenome-wide association studies for common human diseases. Nat Rev Genet 2011, 12(8):529–541.
5. Heijmans BT, Mill J: Commentary: The seven plagues of epigenetic epidemiology. Int J Epidemiol 2012, 41(1):74–78.
6. Foley DL, Craig JM, Morley R, Olsson CA, Dwyer T, Smith K, Saffery R: Prospects for epigenetic epidemiology. Am J Epidemiol 2009, 169(4):389–400.
7. Lam LL, Emberly E, Fraser HB, Neumann SM, Chen E, Miller GE, Kobor MS: Factors underlying variable DNA methylation in a human community cohort. Proc Natl Acad Sci U S A 2012, 109(Suppl 2):17253–17260.
8. Dedeurwaerder S, Defrance M, Calonne E, Denis H, Sotiriou C, Fuks F: Evaluation of the Infinium Methylation 450 K technology. Epigenomics 2011, 3(6):771–784.
9. Sandoval J, Heyn H, Moran S, Serra-Musach J, Pujana MA, Bibikova M, Esteller M: Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics 2011, 6(6):692–702.
10. Chen YA, Choufani S, Ferreira JC, Grafodatskaya D, Butcher DT, Weksberg R: Sequence overlap between autosomal and sex-linked probes on the Illumina HumanMethylation27 microarray. Genomics 2011, 97(4):214–222.
11. Zhang X, Mu W, Zhang W: On the analysis of the illumina 450 k array data: probes ambiguously mapped to the human genome. Front Genet 2012, 3:73.
12. Morris T, Lowe R: Report on the Infinium 450 k methylation array analysis workshop: April 20, 2012 UCL, London, UK. Epigenetics 2012, 7(8):961–962.
13. Ioshikhes IP, Zhang MQ: Large-scale human promoter mapping using CpG islands. Nat Genet 2000, 26(1):61–63.
14. Saxonov S, Berg P, Brutlag DL: A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc Natl Acad Sci U S A 2006, 103(5):1412–1417.
15. Hsieh CL: Dependence of transcriptional repression on CpG methylation density. Mol Cell Biol 1994, 14(8):5487–5494.
16. Weber M, Hellmann I, Stadler MB, Ramos L, Paabo S, Rebhan M, Schubeler D: Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet 2007, 39(4):457–466.
17. Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes. J Mol Biol 1987, 196(2):261–282.
18. Cotton AM, Lam L, Affleck JG, Wilson IM, Penaherrera MS, McFadden DE, Kobor MS, Lam WL, Robinson WP, Brown CJ: Chromosome-wide DNA methylation analysis predicts human tissue-specific X inactivation. Hum Genet 2011, 130(2):187–201.
19. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H, Gabo K, Rongione M, Webster M, Ji H, Potash JB, Sabunciyan S, Feinberg AP: The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 2009, 41(2):178–186.
20. Doi A, Park IH, Wen B, Murakami P, Aryee MJ, Irizarry R, Herb B, Ladd-Acosta C, Rho J, Loewer S, Miller J, Schlaeger T, Daley GQ, Feinberg AP: Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat Genet 2009, 41(12):1350–1353.
21. Zhang D, Cheng L, Badner JA, Chen C, Chen Q, Luo W, Craig DW, Redman M, Gershon ES, Liu C: Genetic control of individual differences in gene-specific methylation in human brain. Am J Hum Genet 2010, 86(3):411–419.
22. Ji H, Ehrlich LI, Seita J, Murakami P, Doi A, Lindau P, Lee H, Aryee MJ, Irizarry RA, Kim K, Rossi DJ, Inlay MA, Serwold T, Karsunky H, Ho L, Daley GQ, Weissman IL, Feinberg AP: Comprehensive methylome map of lineage commitment from haematopoietic progenitors. Nature 2010, 467(7313):338–342.
23. Jjingo D, Conley AB, Yi SV, Lunyak VV, Jordan IK: On the presence and role of human gene-body DNA methylation. Oncotarget 2012, 3(4):462–474.
24. Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, Klotzle B, Bibikova M, Fan JB, Gao Y, Deconde R, Chen M, Rajapakse I, Friend S, Ideker T, Zhang K: Genome-wide Methylation Profiles Reveal Quantitative Views of Human Aging Rates. Mol Cell 2013, 49(2):359–367.
25. Barbosa-Morais NL, Dunning MJ, Samarajiwa SA, Darot JF, Ritchie ME, Lynch AG, Tavare S: A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data. Nucleic Acids Res 2010, 38(3):e17.
26. Benovoy D, Kwan T, Majewski J: Effect of polymorphisms within probe-target sequences on olignonucleotide microarray experiments. Nucleic Acids Res 2008, 36(13):4417–4423.
27. Fraser HB, Lam LL, Neumann SM, Kobor MS: Population-specificity of human DNA methylation. Genome Biol 2012, 13(2):R8.
28. Bell JT, Pai AA, Pickrell JK, Gaffney DJ, Pique-Regi R, Degner JF, Gilad Y, Pritchard JK: DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol 2011, 12(1):R10.
29. Blair JD, Price EM: Illuminating Potential Technical Artifacts of DNA-Methylation Array Probes. Am J Hum Genet 2012, 91(4):760–762.
30. Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res 2002, 12(4):656–664.
31. Baccarelli A, Wright RO, Bollati V, Tarantini L, Litonjua AA, Suh HH, Zanobetti A, Sparrow D, Vokonas PS, Schwartz J: Rapid DNA Methylation Changes after Exposure to Traffic Particles. Am J Respir Crit Care Med 2009, 179(7):572–578.
32. Rusiecki JA, Baccarelli A, Bollati V, Tarantini L, Moore LE, Bonefeld-Jorgensen EC: Global DNA hypomethylation is associated with high serum-persistent organic pollutants in Greenlandic Inuit. Environ Health Perspect 2008, 116(11):1547–1552.
33. Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, Burton J, Cox TV, Davies R, Down TA, Haefliger C, Horton R, Howe K, Jackson DK, Kunde J, Koenig C, Liddle J, Niblett D, Otto T, Pettett R, Seemann S, Thompson C, West T, Rogers J, Olek A, Berlin K, Beck S: DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet 2006, 38(12):1378–1385.
34. Smith ZD, Chan MM, Mikkelsen TS, Gu H, Gnirke A, Regev A, Meissner A: A unique regulatory phase of DNA methylation in the early mammalian embryo. Nature 2012, 484(7394):339–344.
35. Brenet F, Moh M, Funk P, Feierstein E, Viale AJ, Socci ND, Scandura JM: DNA methylation of the first exon is tightly linked to transcriptional silencing. PLoS One 2011, 6(1):e14524.
36. Laird PW: Principles and challenges of genomewide DNA methylation analysis. Nat Rev Genet 2010, 11(3):191–203.
37. Beyan H, Down TA, Ramagopalan SV, Uvebrant K, Nilsson A, Holland ML, Gemma C, Giovannoni G, Boehm BO, Ebers GC, Lernmark A, Cilio CM, Leslie RD, Rakyan VK: Guthrie card methylomics identifies temporally stable epialleles that are present at birth in humans. Genome Res 2012, 22(11):2138–2145.
38. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ: The UCSC Table Browser data retrieval tool. Nucleic Acids Res 2004, 32(Database issue):D493–D496.
39. Wang Y, Leung FC: An evaluation of new criteria for CpG islands in the human genome as gene markers. Bioinformatics 2004, 20(7):1170–1177.
40. Aboyoun P, Pages H, Lawrence M: GenomicRanges: Representation and manipulation of genomic intervals. R package version 1.6.7.
41. Yuen RK, Penaherrera MS, von Dadelszen P, McFadden DE, Robinson WP: DNA methylation profiling of human placentas reveals promoter hypomethylation of multiple genes in early-onset preeclampsia. Eur J Hum Genet 2010, 18(9):1006–1012.
42. Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, Lin SM: Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 2010, 11:587.
43. Du P, Kibbe WA, Lin SM: lumi: a pipeline for processing Illumina microarray. Bioinformatics 2008, 24(13):1547–1548.
44. Maksimovic J, Gordon L, Oshlack A: SWAN: Subset-quantile within array normalization for illumina infinium HumanMethylation450 BeadChips. Genome Biol 2012, 13(6):R44.
45. Holger S: siggenes: Multiple testing using SAM and Efron's empirical Bayes approaches. R package version 1.28.0. 2011.

doi:10.1186/1756-8935-6-4
Cite this article as: Price et al.: Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics & Chromatin 2013 6:4.

Related Items