UBC Faculty Research and Publications

On the identification of potential regulatory variants within genome wide association candidate SNP sets Chen, Chih-yu; Chang, I-Shou; Hsiung, Chao A; Wasserman, Wyeth W Jun 11, 2014

Full Text

RESEARCH ARTICLE Open Access

On the identification of potential regulatory variants within genome wide association candidate SNP sets

Chih-yu Chen1,2, I-Shou Chang3, Chao A Hsiung4 and Wyeth W Wasserman1,5*

Abstract

Background: Genome wide association studies (GWAS) are a population-scale approach to the identification of segments of the genome in which genetic variations may contribute to disease risk. Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease traits. As there are many SNPs within identified risk loci, and the majority of these are situated within non-coding regions, a key challenge is to identify and prioritize variants affecting regulatory sequences that are likely to contribute to the phenotype assessed.

Methods: We focused investigation on SNPs within lung and breast cancer GWAS loci that reached genome-wide significance for potential roles in gene regulation with a specific focus on SNPs likely to disrupt transcription factor binding sites. Within risk loci, the regulatory potential of sub-regions was classified using relevant open chromatin and epigenetic high throughput sequencing data sets from the ENCODE project in available cancer and normal cell lines. Furthermore, transcription factor affinity altering variants were predicted by comparison of position weight matrix scores between disease and reference alleles. Lastly, ChIP-seq data of transcription associated factors and topological domains were included as binding evidence and potential gene target inference.

Results: The sets of SNPs, including both the disease-associated markers and those in high linkage disequilibrium with them, were significantly over-represented in regulatory sequences of cancer and/or normal cells; however, over-representation was generally not restricted to disease-relevant tissue specific regions.
The calculated regulatory potential, allelic binding affinity scores and ChIP-seq binding evidence were the three criteria used to prioritize candidates. Fitting all three criteria, we highlighted breast cancer susceptibility SNPs and a borderline lung cancer relevant SNP located in cancer-specific enhancers overlapping multiple distinct transcription associated factor ChIP-seq binding sites.

Conclusion: Incorporating high throughput sequencing epigenetic and transcription factor data sets from both cancer and normal cells into cancer genetic studies reveals potential functional SNPs and informs subsequent characterization efforts.

Keywords: GWAS, Lung cancer, Regulatory regions, Gene regulation, Transcription factor binding site alteration, Enhancer, Topological domains

* Correspondence: wyeth@cmmt.ubc.ca
1 Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, British Columbia, Canada
5 Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
Full list of author information is available at the end of the article

© 2014 Chen et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Chen et al. BMC Medical Genomics 2014, 7:34
http://www.biomedcentral.com/1755-8794/7/34

Background

Genome wide association studies (GWAS) examine common genetic variants, typically single nucleotide polymorphisms (SNPs), to detect statistical association with a trait across a set of unrelated individuals.
Using large-scale SNP genotyping data, these studies compare the genetic makeup between two groups of individuals, those with and without the phenotype or disease of interest. GWAS findings have made important contributions to understanding of numerous disorders.

Once disease associated regions are identified, the challenge is to interpret the potential role of each of the associated SNPs, in order to prioritize candidate functional variants contributing to the phenotype. Around 90% of the variants in GWASdb, a repository of phenotype-associated SNPs identified in GWAS, are situated within intergenic or intronic regions [1]. The analysis and interpretation of such variants are challenging, as sequence-specific functions, such as cis-regulatory elements, are present at low density in these regions and have been largely undetermined. The plethora of large-scale data sets generated with human cells creates a new opportunity to probe the relationship between GWAS identified loci and regulatory sequence variants. Both DNase-seq and ChIP-seq datasets generated by consortia such as the ENCODE project [2] have been key. Open chromatin regions defined by DNase I hypersensitivity assays are enriched for the presence of regulatory regions such as promoters and enhancers. Histone modifications such as mono- and tri-methylation of the lysine at position 4 of histone 3 (referred to as H3K4me1 and H3K4me3 respectively) are associated with enhancer and promoter positions [3,4], while acetylation of the lysine at position 27 of histone 3 (H3K27ac) reflects active utilization of the regions [5]. Extensive research into the relationships between diverse chromatin modifications and functional roles of the marked DNA segments is ongoing.

The interpretation of cancer GWAS may be particularly impacted by the study of non-coding regions, as cancer can be considered, at the fundamental level, a disorder of gene regulation.
Differential activities of distal regulatory elements, such as enhancers, have been identified at prostate and colon cancer risk variants from GWAS [6-8]. Using H3K4me1 ChIP-seq datasets in colon cancer and normal cells, Akhtar-Zaidi et al. identified thousands of variants with loss or gain of H3K4me1, and found them to comprise a signature that is predictive of in vivo colon cancer gene expression patterns [6]. Gerasimova et al. successfully predicted functional SNPs contributing to asthma, in part by taking into account tissue-specificity of enhancers using epigenetic datasets [9]. Paul et al. showed enrichment of SNPs associated with hematological traits within nucleosome depleted regions of hematopoietic cells [10]. One previous study successfully coupled disease associated SNPs to regulatory sequence annotation by pooling and analyzing datasets from multiple cell types to focus on potential regulatory SNPs [11]. As GWAS derived disease-associated SNPs are most commonly found in non-coding regions, incorporating regulatory sequence annotations into the interpretation process is anticipated to further the identification of the causal variations within GWAS loci.

The identification of regulatory sequence variants impacting phenotype has received increasing research attention [12]. Initial methods focused on the identification of mutations that disrupt transcription factor binding sites (TFBS) [13-15]. More specifically, the intersection of GWAS and large-scale regulatory sequence annotation availability has catalyzed the creation of tools focused on the identification or ranking of potential regulatory variants. Ward et al. created the online resource tool, HaploReg, to provide annotations for non-coding variants through chromatin states of 9 cell types, conservation and impact on predicted TFBS [16]. The RegulomeDB tool annotates functional variation using a combination of high throughput sequencing (HTS) data, TFBS predictions, eQTLs and enhancer information [17].
Coetzee et al. created a functional SNP annotator by incorporating ENCODE TF and histone modification datasets within an R package, FunciSNP, which was subsequently used in a breast cancer GWAS analysis [18,19]. The ChroMoS web server, on the other hand, facilitates SNP prioritization using genetic and epigenetic data, and predicts differential transcription factor and miRNA binding [20].

In this study, we introduce an approach to the prioritization of regulatory variants within GWAS defined loci. The methods are applied to GWAS cancer susceptibility SNP sets for lung, breast, prostate and colorectal cancers. Based on the observed strong signature of potential regulatory variants and HTS data availability, we focus on the analysis of breast and lung cancer GWAS as models for the prioritization of non-coding functional variants. Our objective is to interpret potential cell type-specific functionality of cancer GWAS SNPs in non-coding regions by incorporating sequence motif information and HTS datasets from the ENCODE project (a workflow overview is presented in Figure 1). We expanded cancer susceptibility SNP sets to SNPs in high linkage disequilibrium (LD). After annotating regulatory sequences based on data sets from cancer and normal cells, we assessed enrichment of the SNPs in regulatory sequences of relevant and non-relevant cell types. We detected significant TF binding affinity differences from position weight matrices (PWM) to narrow the focus toward functional SNPs in regulatory sequences that potentially result in a difference in predicted binding status. ChIP-seq data was also used to identify transcription associated factors (TAFs), including both sequence-specific DNA binding TFs and a broader set of proteins involved in transcription, whose binding may be affected by SNPs. Lastly we examined ENCODE RNA-seq data from tumor and normal cells to report nearby differentially expressed genes. In the breast cancer GWAS and a case study of a published lung cancer meta-analysis [21], we highlight SNPs that fit all criteria and are situated within potential cancer-specific enhancers. Higher order chromatin interaction data was analyzed to infer the potential gene targets of the variants. Overall we prioritized functional SNP candidates by integrating multiple levels of information. The analysis process may serve as a general framework for the investigation of GWAS loci for potential regulatory variants.

Methods

The bioinformatics analyses were done using R 3.0.2 [22] and Bioconductor 2.13 [23] unless otherwise stated.

UCSC GWAS SNPs and the corresponding LD80 SNPs

We obtained lung cancer, breast cancer, prostate cancer and colorectal cancer SNP lists from the gwasCatalog table of the UCSC database [2] collected by NHGRI [24], and applied the stringent threshold of P < 5×10−8 for genome-wide significance (gwasCatalog downloaded on March 17, 2014; Additional file 1). With the p-value threshold, the Lung.cancer set (in the trait column of gwasCatalog) included SNPs from European GWAS on "Lung adenocarcinoma" and "Lung cancer". The Breast.cancer set included SNPs from European GWAS on "Breast cancer", "Breast cancer (male)", and "Breast Cancer in BRCA1 mutation carriers". The Colorectal.cancer and Prostate.cancer sets included European GWAS on "Colorectal cancer" and "Prostate cancer", respectively. We note that some of the studies included meta-analysis. We focused on European descent studies to avoid potential differences in epigenetic status among different ethnicities.
All data sets were obtained from published studies, thus no specific ethics approval was required for the study.

To account for potentially biased SNP selection of the SNP arrays, we obtained all SNPs in linkage disequilibrium with the SNPs of interest, using the SNAP webtool [25]. The search distance was limited to 500 kilobases for simplicity, and the r2 threshold was set to 0.80 on SNAP for 1000 Genomes project (Pilot 1), Phase III HapMap (release 2), and Phase II HapMap (release 22 and 21) data sets to expand to variants in strongest LD to disease susceptibility SNPs. A union and unique set of variants, which we referred to as LD80, was compiled from these queries for each GWAS SNP set. Since all SNPs in this report were collected from European descent studies, LD80 SNPs were obtained using r2 values with the CEU (Utah residents with ancestry from northern and western Europe) population. To eliminate duplicate or obsolete SNPs, the LD80 SNPs were further filtered for unique genomic coordinates in hg19 through BioMart, and analyses were conducted on these LD80 SNP sets.

High throughput sequencing data from ENCODE

The HTS datasets of H1 embryonic stem (ES) cells, A549 lung cancer cells and its 'healthy equivalent' NHLF (Normal Human Lung Fibroblasts), MCF-7 breast cancer and its 'healthy equivalent' HMEC (Human Mammary Epithelial Cells), LNCaP prostate cancer and its 'healthy equivalent' PrEC (Human Prostate Epithelial Cells), and two colorectal cancer (Caco-2 and HCT-116) cell lines were obtained from the ENCODE project in the hg19 genomic build [2]. We included in our analysis the datasets from H1 ES cells, as GWAS SNPs associated with various cancers have been reported to be enriched in ES cell enhancers [26]. Regions that were in open chromatin as shown by DNase-I hypersensitivity data, and occupied by H3K4me3 and CTCF (an insulator binding protein) were collected along with bound regions for multiple transcription factors.
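The LD80 compilation described in the preceding subsection (r2 threshold of 0.80, search distance of 500 kilobases, union across reference panels) can be illustrated with a minimal Python sketch. This is not the authors' code, which relied on the SNAP webtool and R. The record layout, the function name `build_ld80` and the SNP identifiers in the usage note are hypothetical, and treating the threshold as inclusive (r2 ≥ 0.80) is an assumption.

```python
from typing import Iterable, NamedTuple, Set


class LDRecord(NamedTuple):
    """One pairwise LD result (hypothetical layout, not the SNAP output format)."""
    index_snp: str    # reported GWAS SNP
    proxy_snp: str    # SNP in LD with the index SNP
    r2: float         # r2 in the CEU population
    distance_bp: int  # distance between the two SNPs in base pairs
    panel: str        # reference panel the record came from


def build_ld80(index_snps: Iterable[str],
               records: Iterable[LDRecord],
               r2_min: float = 0.80,
               max_distance_bp: int = 500_000) -> Set[str]:
    """Return the union of reported SNPs and their high-LD proxies.

    Assumption: the threshold is inclusive (r2 >= r2_min).
    Using a set removes duplicates reported by several panels.
    """
    index_set = set(index_snps)
    ld80 = set(index_set)  # reported SNPs belong to the LD80 set as well
    for rec in records:
        if (rec.index_snp in index_set
                and rec.r2 >= r2_min
                and abs(rec.distance_bp) <= max_distance_bp):
            ld80.add(rec.proxy_snp)
    return ld80
```

With made-up identifiers, a proxy reported by two panels is kept once, and proxies failing the r2 or distance limit are dropped. The additional filtering for unique hg19 coordinates is outside the scope of this sketch.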
RNA-seq expression data of A549, NHLF, MCF-7, and HMEC cell lines were obtained for expression analysis. Also, the TAF ChIP-seq peaks of the cell lines (A549, H1 ES, HCT-116, MCF-7 cells) were retrieved, and overlapping peaks across the replicates of each TF within each cell line were required for stringency. No TAF data was identified for NHLF, HMEC, LNCaP, PrEC and Caco-2 cells. Detailed information on the retrieved public datasets was provided in Additional file 2.

Figure 1 Overview of regulatory variant discovery workflow. The analysis workflow takes as input a list of SNPs identified in genome wide association studies, diverse high-throughput sequencing data related to the delineation of cis-regulatory sequences, and position weight matrices (PWMs). The input SNP lists are extended to SNPs in high linkage disequilibrium (LD). Functionality of each SNP is evaluated through the three criteria (regulatory potential, TF binding affinity and binding evidence). The output is a set of candidate variants that display characteristics consistent with a cis-regulatory role in the disease process.

Annotation of regulatory sequences

In order to interpret the functionality associated with the SNPs, regulatory sequences were annotated in H1 embryonic stem cells, A549 lung cancer, NHLF lung normal, MCF-7 breast cancer, HMEC breast normal, LNCaP prostate cancer, PrEC prostate normal, Caco-2 and HCT-116 colorectal cancer cell lines.
Where data were available, open chromatin regions were specified by DNase I hypersensitive regions from DNase-seq, potential promoters by H3K4me3 ChIP-seq regions, potential insulators by ChIP-seq peaks of CTCF, and finally potential enhancers (pEnh) by DNase I regions lacking H3K4me3 in non-exonic regions.

Genomic functional categories

The Ensembl transcripts from ENSEMBL GENES 71 were used for annotation information to specify the locations of exons, untranslated regions (UTRs), transcription start and termination sites (TSS and TTS). SNPs within 10 kb upstream of TSSs were labeled as upstream regions, and SNPs within 10 kb downstream of TTSs were labeled as downstream regions. SNPs falling into more than one category were assigned in the priority order from high to low: 5′ UTR, 3′ UTR, exons, genic, upstream, downstream and intergenic.

Enrichment tests of SNPs in regulatory sequences

In order to test the enrichment of SNPs in regulatory sequences, we randomly drew from the Illumina 660K SNP Array (widely used in GWAS) an equal number of variants with the overall distributions matching three attributes of the LD80 SNP sets: minor allele frequency, GC content in the region ±500 bp from the variant and distance to the nearest TSS. We separated each attribute into 20 percentile bins and repeated matched drawings 1,000 times. Using these background sets, we determined the distributions of the number of SNPs overlapping each annotated experimental dataset (e.g. open chromatin, H3K4me3, etc.). We then obtained the p-value of each SNP list with respect to each dataset by comparing the true foreground overlapping counts to the background distributions. Multiple hypothesis testing adjustment was conducted using the 'qvalue' R package [27].

Differential TF binding affinity analysis using PWMs

PWM scores have been repeatedly demonstrated to have strong correlation with the sequence specific binding energy of the modeled TF using the PWM scoring procedure described in [28,29].
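The enrichment test described in the subsection on enrichment tests compares one observed overlap count with the counts from 1,000 matched random SNP sets. A minimal Python sketch of that comparison is given here. The function name is hypothetical, the matched drawing itself is not reproduced, and the add-one correction (which keeps the p-value above zero) is an assumption, since the text does not state the exact formula.

```python
from typing import Sequence


def empirical_enrichment_p(observed: int,
                           background_counts: Sequence[int]) -> float:
    """One-sided empirical p-value for enrichment.

    observed: number of foreground (LD80) SNPs overlapping a feature set.
    background_counts: overlap counts from matched random SNP sets
        (1,000 drawings in the study).

    Assumption: p = (1 + #background >= observed) / (1 + #drawings).
    """
    n = len(background_counts)
    if n == 0:
        raise ValueError("at least one background drawing is required")
    as_extreme = sum(1 for count in background_counts if count >= observed)
    return (as_extreme + 1) / (n + 1)
```

With 1,000 drawings the smallest attainable p-value under this convention is 1/1001. The resulting p-values would then be adjusted for multiple testing, for which the study used the 'qvalue' R package.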
We computed the TF affinity scores on 30 base pairs up- and downstream of each SNP with major and minor alleles using PWMs from the JASPAR 2010 database [30]. For binding site prediction stringency, we used 90 out of 130 vertebrate PWMs with information content greater than 10. The best score overlapping each SNP for each TF was retained, and we took the differences between PWM scores of the major and minor alleles to represent the difference in binding affinity of each TF. We randomly selected 10,000 SNPs from the Illumina 660K SNP array that had the same single nucleotide variation, similar GC content (±500 bp) and similar distance to the nearest TSS as the studied set of SNPs, and used the TF binding affinity differences of these random SNPs as the background (i.e. we matched the proportions of every possible nucleotide substitution, and 20 percentile bins of GC content and distance to the nearest TSS between the LD80 SNP set and the randomly drawn sets). Using such background distribution of score differences for each PWM, we obtained a two-tailed empirical p-value of the difference in TF binding affinity for each LD80 SNP of interest. SNPs with empirical p-values ≤ 0.05 were considered to induce differential affinity for such TF. To further distinguish meaningful differential affinity that potentially infers enhancement or disruption of a TF binding site, we included an additional criterion of PWM scores greater than 80 for either allele.

Regulatory potential index

For each of the high throughput sequencing data, intersection of regions from replicated data sets was taken, and values were compiled using geometric means across replicates. To obtain a quantitative score for the regulatory sequences, we computed a regulatory potential index (RPI) for each cell line by summing the following values for each SNP: the reads per kilobase per million sequencing reads from DNase-seq, and the average enrichment values from H3K4me3 and CTCF ChIP-seq data from the ENCODE consortium.
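The allelic PWM comparison described in the PWM subsection (best score over windows overlapping the SNP, difference between alleles, two-tailed empirical p-value) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure of [28,29]: scores are expressed as a relative score from 0 to 100, only the forward strand is scanned, the PWM is a toy log-odds matrix, and the two-tailed p-value is taken on absolute differences with an add-one correction. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Pwm = Dict[str, List[float]]  # base -> per-position weights (log-odds)


def relative_score(pwm: Pwm, site: str) -> float:
    """Scale the raw score of one site to 0-100 between matrix min and max."""
    raw = sum(pwm[base][i] for i, base in enumerate(site))
    lo = sum(min(pwm[b][i] for b in "ACGT") for i in range(len(site)))
    hi = sum(max(pwm[b][i] for b in "ACGT") for i in range(len(site)))
    return 100.0 * (raw - lo) / (hi - lo)


def best_overlapping_score(pwm: Pwm, seq: str, snp_index: int) -> float:
    """Best relative score among forward-strand windows covering the SNP."""
    width = len(pwm["A"])
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    if last < first:
        raise ValueError("sequence is shorter than the PWM")
    return max(relative_score(pwm, seq[s:s + width])
               for s in range(first, last + 1))


def allelic_difference(pwm: Pwm, flank5: str, flank3: str,
                       major: str, minor: str) -> float:
    """Minor-allele best score minus major-allele best score."""
    idx = len(flank5)
    major_best = best_overlapping_score(pwm, flank5 + major + flank3, idx)
    minor_best = best_overlapping_score(pwm, flank5 + minor + flank3, idx)
    return minor_best - major_best


def two_tailed_empirical_p(observed_diff: float,
                           background_diffs: Sequence[float]) -> float:
    """Assumed convention: compare absolute differences, add-one corrected."""
    extreme = sum(1 for d in background_diffs
                  if abs(d) >= abs(observed_diff))
    return (extreme + 1) / (len(background_diffs) + 1)
```

In the study the flanks were 30 base pairs on each side and the background came from 10,000 matched SNPs. A negative difference in this sketch corresponds to a predicted decrease in binding affinity for the minor allele.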
The relative regulatory potential was computed as log2((RPI_cancer + 1)/(RPI_normal + 1)). In the case of lung cancer, the relative regulatory potential was log2((RPI_A549 + 1)/(RPI_NHLF + 1)). We note that for colorectal cancer, data sets were only available in two cancer cell lines; in this case the relative regulatory potential was computed from the two cancer cell lines and does not reflect a comparison between cancer and normal cell lines.

Case study of a lung cancer meta-analysis

Landi et al. published a meta-analysis of lung cancer GWAS [21]. The Lung.Meta SNP list was generated through a meta-analysis of 11 lung cancer GWAS combining histological types with 13,300 primary lung cancer cases and 19,666 controls of European descent. It is a permissive list that is inclusive of all eight SNPs from the Lung.cancer collection introduced above. The published set was based on a threshold of P < 8×10−5, which we took as reported for a case study to demonstrate how the regulatory analyses can be applied to borderline lung cancer candidate variant prioritization. We note that the published threshold is less stringent than the threshold applied to the UCSC collections, and greater caution is therefore required in assessing the reliability of the candidate.

Topological domains and chromatin interactions from Hi-C datasets

We downloaded topological associating domains (TADs) and mapped reads of the chromatin interaction datasets in H1 human ES cells and IMR90 fibroblast cells from Hi-C experiments conducted by Dixon et al. using the restriction enzyme HindIII [31] (summary files obtained through GEO: GSE35156). We lifted the genomic coordinates of TADs and paired-end reads originally mapped to reference human genome hg18 to hg19 build using the liftOver tool [32].
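The regulatory potential index and the relative regulatory potential defined in the Regulatory potential index subsection reduce to a sum and a log2 ratio. A minimal Python sketch follows; the function and parameter names are hypothetical, and the inputs are assumed to be the already replicate-averaged values for one SNP in one cell line.

```python
import math


def regulatory_potential_index(dnase_rpkm: float,
                               h3k4me3_enrichment: float,
                               ctcf_enrichment: float) -> float:
    """RPI for one SNP in one cell line: sum of the three signal values."""
    return dnase_rpkm + h3k4me3_enrichment + ctcf_enrichment


def relative_regulatory_potential(rpi_cancer: float,
                                  rpi_normal: float) -> float:
    """log2((RPI_cancer + 1) / (RPI_normal + 1)).

    Positive values indicate higher regulatory potential in the cancer
    cell line, negative values in the normal cell line. The +1 pseudocount
    keeps the ratio defined when either index is zero.
    """
    return math.log2((rpi_cancer + 1.0) / (rpi_normal + 1.0))
```

For example, an RPI of 3 in the cancer line and 0 in the normal line gives a relative regulatory potential of 2, and swapping the two values gives -2.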
As the number of paired reads between two genomic regions reflects the degree of interactions, and Hi-C technique is limited by resolution, we counted the numbers of paired reads in 20 kilobase bins and plotted the result using the 'HiTC' R package [33].

Results

Cancer susceptibility SNPs frequently occur in non-coding regions

We retrieved cancer susceptibility SNP sets of multiple lung, breast, colorectal, and prostate cancer studies through the GWAS catalog [24]. Using the SNAP webtool [25], we expanded from the reported SNPs in each set to include those SNPs in high linkage disequilibrium in order to account for potentially biased SNP selection of the SNP arrays. We incorporated such SNPs with an r2 greater than 0.8 in the CEU population from either of the 1000 Genomes Project Consortium or the HapMap project (see Methods). We referred to these SNPs in addition to the reported SNPs as LD80, and conducted analyses on the LD80 sets for each GWAS study. In total, we compiled the LD80 sets of 219 lung, 1798 breast, 1197 prostate and 253 colorectal cancer SNPs (Figure 2). To determine how frequently the cancer susceptibility SNPs occur in non-coding regions of the genome, we assessed the distributions of genomic functional categories for each set (Figure 2). We found that over 80% of the SNPs were located in non-coding regions consistently for all LD80 sets.

Delineating potential regulatory sequences of the genome in different cell types

In order to assess the potential regulatory roles of SNP-containing segments, in both cancer and normal cell conditions, we analyzed a diverse group of features profiled in ENCODE project HTS data sets from cells relevant to the GWAS (detailed in Methods), including DNase-seq, H3K4me3, and CTCF ChIP-seq data. As shown in Additional file 3A, open chromatin regions detected by DNase I sensitivity experiments are largely inclusive of the promoter regions marked by H3K4me3.
In agreement with previous literature [34], a large proportion of CTCF bound regions and H3K4me3 marks are shared among cell types, whereas DNase-seq data sets show greater variation. The CTCF bound regions in NHLF cells are noted to differ from other cell types. We defined regulatory sequences using these markers previously associated with promoters, enhancers and insulators (see Methods). The coverage percentages of the genome for all categories are similar across cell types (Additional file 3B).

Figure 2 Distributions of genomic functional categories of cancer GWAS LD80 SNP sets. For each SNP in the corresponding GWAS LD80 SNP set, the genomic functional category was determined based on genomic annotation, and the overall proportions were shown in the plot. Categories included coding, 5′ untranslated and 3′ untranslated portions of exons, as well as intronic, intergenic and upstream or downstream proximals (within 10 kb of the TSSs or TTSs). The distribution of Illumina 660K SNP array was presented as a background. Numbers above the chart showed the corresponding total SNP counts of each LD80 SNP set.

Cancer susceptibility SNPs are enriched in regulatory sequences

To investigate the functions associated with regions encompassing the LD80 sets, we tested whether the SNPs are enriched in regulatory sequences both in relevant cells and across tissues for comparison using a total of 32 datasets. Through comparison to randomly drawn SNPs with matching minor allele frequency, GC content and distance to the nearest TSS from the Illumina 660K genotyping array, we found the GWAS LD80 SNPs to be enriched in regulatory sequences of cancer and/or normal cells.
Due to the nature of GWAS, the multiple histological types of cancer and the relevancy of the available cell lines to the study, significant enrichment in the cell types and type of regulatory sequences varied between the LD80 sets (Figure 3, Additional file 4). The r2 threshold of 0.80 was selected to expand the SNP lists; a more stringent threshold of 0.95 showed a lesser degree of enrichment but similar significance in enrichment tests of regulatory sequences (Additional file 5).

Overall, the Prostate.cancer and Breast.cancer LD80 sets showed the most frequent enrichment for regulatory sequences found both within and across tissues, indicating that these SNPs were disproportionately situated within active regions shared among cell types (Figure 3). The Prostate.cancer and Breast.cancer sets were significantly enriched in 22 and 20 categories of regulatory sequences (q-value < 0.05), respectively (Additional file 4B, C). The Lung.cancer set was most strongly enriched in the promoter regions of multiple cancer cell lines, MCF-7, LNCaP, Caco-2 and A549 as well as normal cell lines, such as HMEC and NHLF (Figure 3, Additional file 4A). The Breast.cancer set was most strongly, but not exclusively, enriched in the open chromatin regions of MCF-7 breast cancer, A549 lung cancer and HMEC breast normal cells (Figure 3, Additional file 4B). The Prostate.cancer set was enriched most strongly in open chromatin regions of LNCaP prostate cancer, PrEC prostate normal and all other cancer cell lines. Enrichment was also noted for promoter and enhancer regions of multiple cell types as well as CTCF binding sites in Caco-2 cells (Figure 3, Additional file 4C). Due to data availability, regulatory sequences of two colorectal cell lines were examined for colorectal LD80 SNPs.
Enrichment was observed most strongly in actively transcribed regions marked by H3K36me3 of Caco-2 colorectal cancer cells, open chromatin regions of HMEC and NHLF normal cell lines, and promoter regions of HCT-116 colorectal cancer cells (Figure 3, Additional file 4D). Interestingly, we observed a depletion of the inactive mark, H3K27me3, in the Caco-2 colorectal cancer cell line in all four SNP sets.

Figure 3 Heatmap illustration of enrichment of LD80 SNPs in regulatory sequences. The figure displays the degrees of enrichment significance in regulatory sequences for GWAS SNPs extended to SNPs with r2 ≥ 0.80 (LD80). The evaluated LD80 SNP sets are indicated across the horizontal axis. The y-axis indicates the cells of origin and feature data sets that reflect regulatory sequences (all from the ENCODE consortium). Vertical and horizontal side bars are colored according to tissue types and whether it is data from a cancer or normal cell line. Enrichment testing was done by comparing the true foreground overlapping count of each SNP set with each feature data to distributions of overlapping counts by randomly selected SNP sets with matching minor allele frequencies, GC content (±500 bp) and distance to the nearest TSSs repeated 1000 times. Multiple hypothesis-adjusted q-values were computed. The enrichment of SNP lists within each feature is colored with a transformed value from multiple hypothesis adjusted q-values: -1 × log10(q-value + 0.0001). Highly enriched feature and SNP list pairs are colored in yellow, and non-enriched pairs are colored in red.

Consequence of the SNPs on TF binding affinity scores

Altered binding affinity has been shown to have substantial impacts on the contributions of TFBS [35]. In order to assess the potential impact of SNPs on TF binding, we next scored the predicted differential TF binding affinity between alleles of each SNP using PWMs. We referred to SNPs that result in significantly higher or lower PWM scores compared to the major alleles as having an increase or decrease of binding affinity for the specific TF, respectively. The impact of a SNP on TF binding affinity was defined by comparing the observed score difference to the distribution of PWM score differences from 10,000 matched randomly selected SNPs from the Illumina 660K SNP array (see Methods). We reported the empirical p-values which reflected the random chance of having such a difference between two alleles at a TF binding site in Additional file 6. We note that a SNP allele can be found with an increase of TF binding affinity for one TF and a decrease for another. The percentages of SNPs with significant (p < 0.05) predicted differential TF binding affinity for at least one TF were 36 for Lung.cancer, 26 for Breast.cancer, 20 for Prostate.cancer, and 30 for Colorectal.cancer. For example, the Sox17 motif was frequently found to have significant differential binding affinity in breast, prostate and colorectal LD80 sets. Our results showed that the SNPs identified in multiple cancer types can alter TF binding affinity as shown by significant changes in PWM scores.

Prioritizing functional SNPs using regulatory potential and TF binding affinity

To prioritize functional SNPs, we examined the relative regulatory potential by comparing cancer to normal cell lines where data is available (detailed in Methods) and differences in predicted TF binding affinity by comparing major and minor alleles. We did not restrict the TF affinity analysis to SNPs present in the cancer cell lines we worked with, and we assumed the regulatory potential in the cancer cell lines shows whether regions are active and accessible in the corresponding cancer cells.
Figure 4 showed the functional prioritization plot for Lung.cancer and Breast.cancer due to higher data availability, and plots for other LD80 sets were provided in Additional file 7.

In Figure 4, the SNP-impacted TFBS in quadrant I were consistent with the presence of a stronger TFBS in regulatory regions preferentially observed in the cancer samples, quadrant II with the presence of a stronger TFBS in regulatory regions preferentially observed in normal cells, quadrant III with the presence of a weaker TFBS in regions preferentially observed in normal cells, and quadrant IV with presence of a weaker TFBS in regions preferentially observed in cancer cells (i.e. loss of a silencing TFBS). The magnitude of relative regulatory potential observed in the Breast.cancer set was higher than that of the Lung.cancer set. We found that an increase of TF binding affinity in the minor allele was not necessarily associated with a gain of regulatory potential in the cancer cell line, and vice versa.

Figure 4 Differences in regulatory potential and allelic TF binding affinity for Lung.cancer and Breast.cancer LD80 SNPs. The plots present potentially affected TFBS, with the upper panel (A & C) displaying SNPs that confer stronger TFBS patterns in cancer patients with the minor allele while the lower panel (B & D) displays a decrease in TF binding affinity. The x-axis represents the relative regulatory potential, defined as log2 ratio of regulatory potential index between cancer and normal cells plus 1. The relative regulatory potential is indicated as positive for higher regulatory potential in cancer cells (A549 for A and B; MCF-7 for C and D) and negative for higher regulatory potential in the corresponding normal cells (NHLF normal lung fibroblasts for A and B; HMEC breast normal cells for C and D). The y-axis shows the -1 × log2 transformation of empirical p-values for motif affinity score changes. The data shown on the plot are restricted to PWMs with p-values < 0.05 from the two-tailed test, and for visualization purposes, only PWMs with scores > 85 in at least one allele are shown. TFs with an increase or decrease of TF binding affinity where the SNP has non-zero regulatory potential in either cancer or normal cells are labeled along with the corresponding SNP. SNPs with zero regulatory potential index in both cells are represented by gray dots, whereas those with regulatory potential indices > 0 in both cells are colored in blue. SNPs with regulatory potential index restricted to a single cell type (cancer or normal cells) are colored in red and green, respectively. In plot C, a red arrow indicates a SNP rs1391720 that is discussed in the text. The vertical bar illustrates the degree of difference in TF affinity.

Prioritizing functional SNPs using all three criteria

To further evaluate how TAF occupancy may aid in prioritizing functional SNPs, we inspected the relationship between relative regulatory potential and the number of overlapping TAF occupied regions reported in ChIP-seq data. We gathered available TAF ChIP-seq data from the ENCODE project in the cell types we investigated. Counts of datasets overlapping each SNP were generally higher for SNPs with regulatory potential in both cell types (blue dots), whereas the SNPs with regulatory potential restricted to either cancer or normal cells (red or green dots) were occupied by fewer TAFs (Lung.cancer and Breast.cancer sets shown in Figure 5, plots for other LD80 sets were provided in Additional file 7).
This result is expected, as common regulatory regions that are open and accessible in multiple cell types have a higher chance of detectable TAF binding.

We listed SNPs with regulatory potential and TF affinity differences as well as TAF binding evidence in Additional file 6. The nearest differentially expressed genes were indicated based on RNA-seq data, where available. Edwards et al. compiled a list of functional genetic variants and/or target genes that were identified from GWAS, for which regulatory functions of variants were experimentally verified through diverse methods such as electrophoretic mobility shift assays, reporter assays, and more [36]. As a validation of our prioritization approach, we compared our results to the Edwards list (Additional file 8). For the 5 Edwards SNPs that were present in the datasets we analyzed, at least one (but never all) of the three criteria was fulfilled (i.e. differential regulatory potential, predicted differential TF binding affinity or overlap with TAF ChIP-seq regions).

For those variants meeting all three criteria, we further applied a threshold of ±5 to the relative regulatory potential score and a threshold of 6 to the number of overlapping TAF ChIP-seq datasets (both parameter values were set based on the 95th percentile of all examined SNPs). While 12 regulatory variants emerged in the Breast.cancer set, none of the Lung.cancer variants met the thresholds. Within the breast cancer candidates, a set of three highly correlated SNPs stood out among the highest ranking with interesting cancer-related characteristics: rs1292011 (GWAS P = 9 × 10^-22 [37]), rs1391720 and rs1391721 (highlighted in Figure 5).
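The thresholding step described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the record fields, the SNP identifiers and the nearest-rank percentile helper are hypothetical, the cutoffs are treated as inclusive, and the text does not say how the 95th percentile was computed.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpRecord:
    rsid: str            # hypothetical identifier
    rel_rp: float        # relative regulatory potential (log2 ratio)
    has_tf_change: bool  # at least one PWM with a significant allelic change
    n_taf: int           # number of overlapping TAF ChIP-seq datasets


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile; one of several common definitions."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def prioritize(records, rp_cutoff, taf_cutoff):
    """Keep SNPs that meet all three criteria and both cutoffs."""
    return [
        r.rsid
        for r in records
        if r.has_tf_change
        and abs(r.rel_rp) >= rp_cutoff
        and r.n_taf >= taf_cutoff
    ]


records = [
    SnpRecord("rsA", 6.2, True, 9),
    SnpRecord("rsB", 6.0, False, 9),   # no predicted TF affinity change
    SnpRecord("rsC", -5.5, True, 3),   # too few overlapping TAF datasets
    SnpRecord("rsD", -7.0, True, 6),
]
# Cutoffs of 5 and 6 are the values quoted in the text.
print(prioritize(records, rp_cutoff=5, taf_cutoff=6))
```

In practice the two cutoffs would be derived from the full set of examined SNPs, for example with `nearest_rank_percentile(all_taf_counts, 95)`, rather than typed in by hand.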
While all three SNPs had differential TF affinity for different TFs, rs1391720 showed the strongest differential TF affinity, for the nuclear receptor protein Nr2e3 (an increase from 72.4 to 87.9 in PWM score; highlighted by a red arrow in Figure 4). These SNPs are located in a gene-sparse region with multiple long intergenic non-coding RNAs (lincRNAs) within 200 kb, and are over 714 kb upstream of TBX3, the T-box 3 gene involved in developmental processes. The SNPs are within a potential cancer-specific enhancer observed to be in open chromatin in multiple cancer cell lines such as MCF-7, A549 and LNCaP, but not in the normal NHLF, HMEC and PrEC cell lines (Additional file 9). All three SNPs overlap with binding sites of 9 unique TAFs in MCF-7 cells including Sin3Ak-20, GATA3, Rad21, CTCF, HDAC2, HA-E2F1, NR2F2, ZNF217, and the enhancer marking p300 [38].

Figure 5 Visualizing Lung.cancer and Breast.cancer LD80 SNPs with TAF ChIP-seq binding data. The relative regulatory potential is plotted along the x-axis, as in Figure 4. The y-axis displays the number of TAF ChIP-seq datasets reporting binding in the cells examined in this study: A549, H1 embryonic stem, HCT-116 and MCF-7 cells. Each dot represents a SNP within the Lung.cancer (A) and Breast.cancer (B) LD80 lists. SNPs with zero regulatory potential indices in both cells are represented by gray dots, whereas those with regulatory potential in both cancer and normal cells are labeled and colored in blue. SNPs with regulatory potential observed only in cancer or only in normal cells are colored in red and green, respectively. The red arrows in B highlight a set of correlated SNPs, rs1391720, rs1391721 and rs1292011, that overlap 15 to 16 TAF ChIP-seq peaks. ChIP-seq datasets used are detailed in the supplementary information (Additional file 2).

A case study: functional interpretation of a borderline lung cancer relevant SNP located in a cancer-specific enhancer

As we did not find regulatory candidates in the Lung.cancer set, we next applied the methodology within a specific case study from a meta-analysis of lung cancer GWAS (Lung.Meta) [21]. We note that the significance threshold (P < 8 × 10^-5) applied in the original publication was less stringent, and the reader should apply discretion in evaluating such borderline candidates. Through incorporating the regulatory sequence information as well as differences in TF binding affinity to interpret non-coding SNPs, we identified a specific SNP with interesting cancer-related characteristics (Additional files 10 and 11). The SNP rs12087869 from the Lung.Meta LD80 set was among the highest ranking across all three criteria: regulatory potential differences, predicted TF binding affinity differences, and overlap with TAF ChIP-seq binding sites in A549 cells (Figure 6). It is located 60 kb upstream of a tyrosine-protein kinase transmembrane receptor gene, ROR1. The SNP lies within a potential cancer-specific enhancer observed as open chromatin in multiple cancer cell lines such as A549, LNCaP and MCF-7, but not in the normal NHLF and HMEC cell lines (Figure 6; Additional file 12). Although open chromatin was detected in the PrEC prostate epithelial cell line, the magnitude was much lower than in the cancer cells. The H3K4me1 and H3K27ac data from the A549 and NHLF cell lines further indicate that the region has enhancer potential in lung cancer cells but not in the normal cells. rs12087869 overlaps with binding sites of eight unique TAFs in A549 cells including c-Myc, Max, and the enhancer marking p300 [38].
In comparison to the non-risk allele rs12087869-T, we found the risk allele rs12087869-C to increase the predicted binding affinity of TLX1::NFIC, MAX and Myc (i.e. the PWM scores rise from 71.6 to 80.2, 69.6 to 82.2, and 78.1 to 90.5, respectively) (Figure 6B; Additional file 13). The A549 cell line does not contain the Lung.Meta associated allele, nor does it display elevated ROR1 expression relative to NHLF cells (RNA-seq data shown in Additional file 12). Although ChIP-seq data for Myc and Max are not available for the NHLF cells, the presence of overlapping c-Myc and Max A549 ChIP-seq peaks and the predicted increase of binding affinity in the GWAS risk allele are strongly suggestive of a mechanistic role for the rs12087869 SNP in elevating risk for lung cancer.

Inferring potential targets of a SNP using topological domains

In order to infer potential gene targets of the enhancer containing the SNP, we used Hi-C chromatin interaction datasets from the cells where they were available, H1 and IMR90 cells [31]. Enhancers are known to target multiple TSSs, and a recent large-scale enhancer study across human cell types has shown that 40% of inferred TSS-associated enhancers (computed from pairwise correlation of FANTOM5 CAGE data) target at least the nearest TSSs [39]. Such enhancer-TSS interactions vary across cell types, and can be revealed through chromosome conformation capture techniques. Recent studies have shown that the boundaries of highly interactive genomic neighbourhoods (topological associating domains; TADs) were highly consistent across cell types [31,40], whereas interactions between sub-TADs were cell type-specific [41]. Through examining the topological domains and Hi-C chromatin interaction data generated by Dixon et al., genes that can potentially be affected by an increase in TF binding affinity of the rs12087869 risk allele include PGM1, ROR1, Mir-544, BC040909, AK096291 and UBE2U (Figure 7).
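One simple way to read candidate targets off a set of topological domains is to list the genes whose TSS falls in the same domain as the SNP. The sketch below assumes half-open (chromosome, start, end) intervals and uses made-up coordinates and gene names; it is not the procedure used with the Dixon et al. data, which also drew on the Hi-C interaction maps.

```python
def find_tad(tads, chrom, pos):
    """Return the (chrom, start, end) domain containing pos, or None."""
    for t_chrom, start, end in tads:
        if t_chrom == chrom and start <= pos < end:
            return (t_chrom, start, end)
    return None


def genes_in_same_tad(tads, genes, chrom, pos):
    """List genes whose TSS lies in the domain that contains the SNP.

    genes is an iterable of (name, chrom, tss) tuples. Using the TSS as the
    gene anchor is an assumption made for this illustration.
    """
    tad = find_tad(tads, chrom, pos)
    if tad is None:
        return []
    _, start, end = tad
    return sorted(
        name
        for name, g_chrom, tss in genes
        if g_chrom == chrom and start <= tss < end
    )


# Hypothetical domains and genes, not real genomic coordinates.
tads = [("chr1", 0, 1000), ("chr1", 1000, 3000)]
genes = [
    ("GENE_A", "chr1", 1200),
    ("GENE_B", "chr1", 2900),
    ("GENE_C", "chr1", 500),
    ("GENE_D", "chr2", 1500),
]
print(genes_in_same_tad(tads, genes, "chr1", 2000))
```

A linear scan is adequate for a handful of SNPs; for genome-wide use an interval index would be the more usual choice.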
Potential targets of the breast cancer susceptibility SNPs that we highlighted in the previous section (rs1292011, rs1391720 and rs1391721) include multiple lincRNAs, Metazoa_SRP and TBX3 (Additional file 14).

Discussion

Variants in regulatory sequences that affect the transcription rate of target genes can have substantial impact on phenotype. While the interpretation of protein altering differences has progressed rapidly, the identification of cis-regulatory modifying mutations remains a challenge. The substantial challenges in moving from GWAS-mapped loci to specific causal variants may be in part derived from this limited capacity to identify regulatory SNPs. We have shown that functional annotation of the regions around non-coding SNPs can contribute to the interpretation of cancer risk alleles arising from GWAS. We observe that cancer susceptibility SNPs are enriched in regulatory sequences in the genome, and can be situated in TF binding sites. Integrating genome-scale datasets and TF binding site analysis into the interpretation process can highlight key SNPs consistent with regulatory roles. Such analysis was used here to highlight SNPs in ChIP-seq supported binding sites within cancer-specific enhancers. This work provides a general bioinformatics approach for the identification of regulatory variants within GWAS identified risk loci.

Previously reported experimentally validated functional SNPs that were evaluated in our study met one or two of our three criteria: regulatory potential, TF binding affinity and TAF ChIP-seq binding. We highlighted a group of three highly correlated SNPs, rs1391720, rs1391721 and rs1292011, in the Breast.cancer set that fit all three criteria. These SNPs are located in a cancer-specific enhancer overlapping 9 MCF-7 TAF ChIP-seq datasets, and are within the TAD containing multiple lincRNAs and TBX3, which is situated 714 kb away. Despite the vast distance, a previous study has suggested that rs1292011 potentially mediates ER-positive breast cancer through an effect on TBX3, due to significant elevation of expression in plasma from individuals with breast cancer [42]. All three SNPs have differential predicted TF binding affinity for distinct TFs, but rs1391720 shows the strongest significance, for a binding site of Nr2e3, which has been previously reported to be an upstream regulator of ESR1 in breast cancer [43].

As a case study, we also investigated the rs12087869 SNP, identified as a borderline candidate in a meta-analysis of lung cancer GWAS (Lung.Meta), which fit all three criteria. The SNP is located in a cancer-specific enhancer situated ~60 kb upstream of the cancer-associated ROR1 gene, which is one of the potential target genes located in the TAD containing the candidate SNP. ROR1 is highly expressed in early embryonic stages and in diverse cancers such as human breast cancer and B-cell chronic lymphocytic leukemia [44,45]. Another potential target gene, Mir-544, has been reported to be associated with osteosarcoma, gastric cancer, and nasopharyngeal carcinoma [46-48]. The enhancer containing the SNP is marked as active by both H3K4me1 and H3K27ac epigenetics data from A549 lung cancer cells, while these marks of activity are absent in NHLF. The enhancer overlaps ChIP-seq binding peaks of 8 transcription associated factors in A549 cells, including the enhancer marking p300 co-activator and other TFs such as c-Myc, Max, and SP1, which are linked to cancer development and growth [49-51]. Although the cancer cells profiled in large-scale studies (e.g. A549) do not carry the risk allele, the data from these cells indicate that the SNP is situated within a region that is open and accessible in multiple cancer cells (A549, MCF-7, and LNCaP). TLX1::NFIC, MAX and Myc are predicted to have stronger binding affinity in individuals with the risk allele, highlighting the potential functional role of this SNP in lung cancer.

Figure 6 Annotation features proximal to the rs12087869 SNP location from the Lung.Meta case study. Part (A) depicts annotation related to genetics, epigenetics, and TAF ChIP-seq peaks in proximity to the rs12087869 SNP in A549 lung cancer, NHLF normal and H1 embryonic stem cell lines using the UCSC Genome Browser. The red vertical line highlights the location of the SNP. From the top of the figure, the genetic information includes the locations of the SNP and proximal genes, and copy number status in A549 cells. The chromatin information shows the DNase I hypersensitive sites and the occupancy sites of active histone modification marks (H3K4me1, H3K4me3, H3K27ac) in the cell lines. The ChIP-seq section shows the TAF-associated regions in A549 cells where data is available. Peaks in the chromatin information and ChIP-seq sections were reported by the ENCODE project, with the gray scale color reflecting the magnitude of open chromatin and binding. Part (B) illustrates both strands of the reference sequence within 15 base pairs of rs12087869, and the locations of predicted TF binding sites for the reference and risk alleles in solid and dotted lines, respectively. The motif logos for the binding properties of TLX1::NFIC, MAX and Myc are also depicted at the rs12087869 risk allele, all with increasing binding affinity. The variant within each binding sequence below each logo is underlined, and the predicted Myc binding locations for the reference and risk alleles are different, whereas those of TLX1::NFIC and MAX are the same.

Figure 7 Two-dimensional heatmap of chromatin interaction in the neighbourhood of the rs12087869 SNP. The figure shows Hi-C chromatin interaction datasets in H1 human ES cells (upper panel) and IMR90 fibroblast cells (lower panel) obtained from Dixon et al. [31] in the neighbourhood of the rs12087869 SNP. The topological domains (TADs) from both cell types are shown to indicate genomic neighbourhoods of stronger within-domain interactions. The heatmap values indicated in a color scale correspond to the number of times that reads in two 20 kb bins were sequenced as a pair, with red indicating stronger interaction and white indicating little or no interaction. The 85th percentile read counts (29 for H1 and 21 for IMR90 cells) were used as the upper limit for the heatmap to avoid color domination by extremely interactive regions. This plot was generated using the ‘HiTC’ R package, and the dotted lines were drawn to aid in visualizing the interactive domain in which the SNP is located. The TAD region (from H1 cells) containing the SNP is highlighted in a light pink box.

One of the challenges in the study of regulatory sequences is the relevance of experimental data collected from diverse cells and tissues to the specific cancer cell type. We observe that cancer susceptibility GWAS SNPs are significantly over-represented in regulatory sequences from diverse data sources; the enrichment is not restricted to the specific cancer or normal cell types underlying each GWAS study. Such a finding is not in conflict with a functional role, as a regulatory variant could contribute to the creation of an environment in which a cancer is more likely to form or progress to detectable disease whether it is present in a regulatory sequence of cancer or normal cells.
The presence of the TFBS altering change may impact expression in a phenotype-altering manner. From the GWAS SNP sets analyzed, the Breast.cancer and Prostate.cancer LD80 SNP sets showed the highest and most significant enrichment across regulatory regions from multiple cell types. The Lung.cancer and Colorectal.cancer SNP sets had relatively fewer enriched categories of regulatory regions. In particular, we note that the support for regulatory roles at the breast cancer susceptibility SNP rs1391720 and the borderline lung cancer relevant SNP rs12087869 was provided by cancer-related data collections. Thus, if we had limited the analysis to normal cells, we would not have detected these candidate functional SNPs. It is our interpretation that in the search for cancer-related regulatory variants, it is appropriate to consider a range of cell and tissue sources.

This work fits into a spectrum of informatics efforts to better understand regulatory regions in the human genome. Building on high-throughput annotation of genome properties such as epigenetic marks, DNase I hypersensitivity and TF binding, the informatics community has been rapidly creating innovative methods for discrimination of regulatory regions. At the TF affinity level, our initial TF affinity study used only the PWMs from JASPAR 2010; since then, JASPAR 2014 [52] has been made available. Future studies could evaluate alternative sources such as HoCoMoCo [53], which aggregates profiles from multiple resources. At the regulatory sequence level, methods for scoring regulatory potential such as ChromHMM and Segway incorporate diverse data types in order to predict likely regulatory regions [3,54,55].

Recently, three groups have applied informatics methods to predict potential regulatory variants within risk loci from specific GWAS of asthma, breast cancer and blood traits [9,10,18].
Our work is complementary to these studies, showing that GWAS loci from diverse cancer studies are enriched for candidate regulatory SNPs and providing a structured procedure for their identification.

The informatics methods and candidate variants presented here highlight clear opportunities for further work. For example, the cancer susceptibility regulatory SNPs highlighted in the report will require experimental investigation. While in vitro binding studies of transcription factors with differential binding affinities could be pursued, it has been well established that computational predictions of binding affinity are highly correlated with experimental measurements [56]. Therefore, we think that in the longer term it will be most appropriate to obtain relevant cancer cells and normal cells with the specific risk allele and measure TF binding in vivo, expression of potential gene targets (inferred from the Hi-C interaction data and topological domains), and small-scale chromosome conformation capture in the presence or absence of the candidate regulatory variation. Beyond the specific candidate, a key challenge for the field is the computational prediction of the relationship between enhancers and specific promoters, which would allow for targeted experimental approaches.

Conclusions

Cancer GWAS SNP sets extended with SNPs in high linkage disequilibrium are over-represented in cis-regulatory sequences active in cancer and normal cells. The degree of over-representation varied across the assessed GWAS. Notably, such enrichment of the cancer susceptibility SNPs in normal cells highlights the potential contribution of the variants to the earliest phases of tumorigenesis. Analysis of the differential TF binding affinity between the reference and risk alleles reveals functional SNPs with potential disruption or enhancement of TF binding.
We highlight candidate regulatory SNPs, one with borderline lung cancer susceptibility and three with significant breast cancer susceptibility, within cancer-specific enhancers with multiple overlapping TAF binding sites and a significant increase in TF binding affinity scores with the risk allele. The methods constitute a framework to interpret the functionality of candidate regulatory SNPs in GWAS loci based on epigenetics, regulatory potential and TF binding affinity properties.

Additional files

Additional file 1: Sources of GWAS SNP lists used. The table lists the literature (PubMed ID) and sources from which we obtained the corresponding GWAS SNP lists. All SNPs passed a genome-wide significance threshold of P < 5 × 10^-8, so we used and listed only the studies with at least one SNP passing the threshold from the gwasCatalog table on UCSC. The Lung.cancer set included SNPs from European GWAS on “Lung adenocarcinoma” and “Lung cancer” (in the trait column of gwasCatalog). The Breast.cancer set included SNPs from European GWAS on “Breast cancer”, “Breast cancer (male)”, and “Breast Cancer in BRCA1 mutation carriers”. The Colorectal.cancer set included European GWAS on “Colorectal cancer”. The Prostate.cancer set included European GWAS on “Prostate cancer”.

Additional file 2: Additional HTS data from ENCODE used to indicate regulatory functions. The table lists the feature information, name, data type, cell type and data source of each high throughput sequencing dataset we obtained from the ENCODE project.

Additional file 3: An overview of the data depicting the percentages of overlapping regions between regulatory sequences among cell lines. The heatmap in part A shows the pair-wise percentages of overlapping regions for every feature pair. Features from associated cell types in our study are included. The strongest overlap is indicated in red and the weakest in blue, as depicted in the colour key.
Features are clustered according to similarity in overlaps, and are labeled with the cell line names followed by the feature names. The “pEnh” term refers to putative enhancers. The percentage of the human genome covered by each feature is shown in part B.

Additional file 4: Enrichment of GWAS SNPs in all HTS data surveyed. This illustration provides an alternative presentation of the enrichment result shown in Figure 3. Figures exhibit the overlap counts of LD80 SNPs obtained from Lung (A), Breast (B), Prostate (C) and Colorectal (D) cancer GWAS with regulatory sequences in all cell types examined. The x-axis indicates the cells of origin and feature datasets that reflect regulatory sequences (all from the ENCODE consortium). The y-axis shows the number of SNPs in the set that are within each category of regulatory sequences. The “pEnh” term refers to putative enhancers. The red dots represent the actual overlapping counts of the SNP sets with regulatory sequences, and the boxplots represent the distributions of overlapping counts by randomly selected SNP sets with matching minor allele frequencies, GC content (+/-500 bp) and distance to the nearest TSSs, repeated 1000 times. Multiple hypothesis-adjusted q-values lower than 0.05 are noted above the boxplots.

Additional file 5: Heatmap illustration of enrichment of LD95 SNPs in regulatory sequences. The figure displays the degrees of enrichment significance in regulatory sequences for GWAS SNPs extended to SNPs with r2 >= 0.95 (LD95). The x-axis represents the LD95 SNP sets, and the y-axis represents data features from all cell types examined. Vertical and horizontal side bars are colored according to tissue types and whether the data are from a cancer or normal cell line. The enrichment of SNP lists within each feature is colored with a transformed value from multiple hypothesis-adjusted q-values: -log10(q-value + 0.0001).
Highly enriched feature and SNP list pairs are colored in yellow, and non-enriched pairs are colored in red.

Additional file 6: Results for functional prioritization of LD80 SNP sets. We list the prioritization results of all LD80 SNP sets examined in the study in multiple tabs of the Excel file. The columns include the information on the SNPs, regulatory potential in relevant cancer and normal cell types (where data available), motif affinity differences (significant TFs concatenated with commas), overlap with TAF ChIP-seq data, information on the nearest gene, and nearby differentially expressed transcripts according to RNA-seq data (where data available). Log2((RPKM_cancer + 1) / (RPKM_normal + 1)) was computed for TSSs within 200 kb up- and downstream of each SNP, and SNPs with a greater than 2-fold difference in expression between the relevant cancer and normal cells were reported.

Additional file 7: SNP prioritizing plots of Prostate.cancer and Colorectal.cancer LD80 SNPs. The file includes plots on differences in regulatory potential and allelic TF binding affinity (A-D) as well as TAF ChIP-seq data (E-F) for Prostate.cancer and Colorectal.cancer LD80 SNPs in addition to the Lung.cancer and Breast.cancer LD80 SNP sets plotted in Figures 4 and 5.

Additional file 8: Comparison to previously reviewed functional genetic variants and their target genes. The table summarizes the comparison of our prediction results to the potential causal variants listed in the review by Edwards et al. 2013. A proportion of SNPs were not included in the input of our analysis. All SNPs that were included in our study met one or two (not all three) of the criteria: regulatory potential, TF binding affinity or TAF ChIP-seq binding.

Additional file 9: Annotation features proximal to the rs1391720 SNP location from Breast.cancer LD80 set.
The figure depicts annotation related to genetics, epigenetics, and TAF ChIP-seq peaks in proximity to the rs1391720 SNP in MCF-7 breast cancer and HMEC breast normal cell lines using the UCSC Genome Browser. The red vertical bar highlights the location of the 3 SNPs. From the top of the figure, the genetic information includes the locations of the SNPs and copy number status in MCF-7 cells; the SNPs are located in a gene desert. The chromatin information shows the DNase I hypersensitive sites in multiple cell types, occupancy sites of promoter marks in MCF-7 cells and active histone modification marks (H3K4me1, H3K4me3, H3K27ac) in HMEC cells. The ChIP-seq section shows the TAF-associated regions in cells we examined (where data is available). Hotspots in the chromatin information and peaks in the ChIP-seq section were reported by the ENCODE project, with the gray scale color reflecting the magnitude of open chromatin and binding.

Additional file 10: SNP prioritizing plots of Lung.Meta LD80 SNPs from the case study. The file includes plots displaying differences in regulatory potential and allelic TF binding affinity (A and B) as well as TAF ChIP-seq data (C) for Lung.Meta LD80 SNPs, corresponding to Figures 4 and 5.

Additional file 11: Results for functional prioritization of Lung.Meta LD80 SNP set. We list the prioritization results of the Lung.Meta LD80 SNP set. The columns include the information on the SNPs, regulatory potential in lung cancer and normal cell types, motif affinity differences (significant TFs concatenated with commas), overlap with TAF ChIP-seq data, information on the nearest gene, and nearby differentially expressed transcripts according to RNA-seq data (where data available).
Log2((RPKM_cancer + 1) / (RPKM_normal + 1)) was computed for TSSs within 200 kb up- and downstream of each SNP, and SNPs with a greater than 2-fold difference in expression between the lung cancer and normal cells were reported.

Additional file 12: Open chromatin features in other cell lines and expression data around rs12087869. The figure shows the open chromatin (DNase-seq) features in MCF-7, HMEC, LNCaP and PrEC cell lines around the rs12087869 SNP. The RNA-seq signals on the plus DNA strand in A549 and NHLF cells are displayed. Both ROR1 and PGM1 genes are expressed in both cell types, and are not differentially expressed between cancer and normal cells (as reported in Additional file 6).

Additional file 13: An exhaustive list of differences in TF binding affinity for Lung.Meta LD80 SNPs. The table is a flat file of all PWMs predicted on each Lung.Meta LD80 SNP with the JASPAR 2010 PWM IDs, PWM scores in reference and risk alleles, and empirical p-values of the difference.

Additional file 14: Two-dimensional heatmap of chromatin interaction in the neighbourhood of the rs1391720 SNP. The figure shows Hi-C chromatin interaction datasets in H1 human ES cells (upper panel) and IMR90 fibroblast cells (lower panel) obtained from Dixon et al. [31] in the neighbourhood of the rs1391720 SNP and two other SNPs in high LD. The SNPs overlapping the potential cancer-specific enhancer and over 16 TAF ChIP-seq datasets are labeled in the middle panel along with genes from UCSC and transcripts from Ensembl that include long non-coding RNAs. The topological domains (TADs) from both cell types are shown to indicate genomic neighbourhoods of stronger within-domain interactions. The heatmap values indicated in a color scale correspond to the number of times that reads in two 20 kb bins were sequenced as a pair, with red indicating stronger interaction and white indicating little or no interaction.
The 85th percentile read counts (22 for H1 and 18 for IMR90 cells) were used as the upper limit for the heatmap to avoid color domination by extremely interactive regions. This plot was generated using the ‘HiTC’ R package, and the dotted lines were drawn to aid in visualizing the interactive domain in which the SNP is located. The TAD region (from H1 cells) containing the SNP is highlighted in a light pink box.

Abbreviations

GWAS: Genome wide association study; SNP: Single nucleotide polymorphism; HTS: High throughput sequencing; TF: Transcription factor; TAF: Transcription associated factor; PWM: Position weight matrix; TFBS: Transcription factor binding site; LD: Linkage disequilibrium; TAD: Topological associating domain.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors contributed to the design of the study. CYC implemented the analysis. ISC and CAH advised about GWAS analysis of cancer risk, and provided feedback on the predictions generated. CYC and WWW wrote the manuscript. WWW supervised the project. All authors read and approved the final manuscript.

Acknowledgements

We thank Dr. Angela Brooks-Wilson for critical review, Rebecca Worsley-Hunt, Dr. Anthony Mathelier and Dr. Maja Tarailo-Graovac for helpful discussions and manuscript feedback, David Arenillas for computational assistance, and Dora Pak for research management.

This work was initiated through the Canadian Institutes of Health Research 2012 Summer Program in Taiwan, and CYC was partially funded by CIHR-IG and NSC-DOIC for the duration. CYC was supported by a scholarship from Canada’s Natural Sciences and Engineering Research Council. The Wasserman laboratory and the work described in this report were supported by the Canadian Institutes of Health Research (CIHR): MOP82875, and the Natural Sciences and Engineering Research Council of Canada (NSERC): 15RGPIN355532-10.
The computer systems of the Gene Regulation Bioinformatics Laboratory were funded by the Canada Foundation for Innovation and the BC Knowledge Development Fund. The research was supported by Award Number R01GM084875 from the National Institute of General Medical Sciences and the ABC4DE project funded by Genome Canada and Genome British Columbia. This research was enabled in part by support provided by WestGrid (www.westgrid.ca) and Compute Canada Calcul Canada (www.computecanada.ca). Work of ISC was partially supported by Taiwan Bioinformatics Core with the grant number NSC-102-2319-B-400-001. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author details
1 Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, British Columbia, Canada.
2 Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia, Canada.
3 National Institute of Cancer Research, National Health Research Institutes, Zhunan, Taiwan.
4 Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan.
5 Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada.

Received: 22 January 2014
Accepted: 2 June 2014
Published: 11 June 2014

doi:10.1186/1755-8794-7-34
Cite this article as: Chen et al.: On the identification of potential regulatory variants within genome wide association candidate SNP sets. BMC Medical Genomics 2014, 7:34.




