UBC Faculty Research and Publications

Identification of functional SNPs in the 5-prime flanking sequences of human genes Mottagui-Tabar, Salim; Faghihi, Mohammad A; Mizuno, Yosuke; Engström, Pär G; Lenhard, Boris; Wasserman, Wyeth W; Wahlestedt, Claes Feb 17, 2005

BioMed Central
BMC Genomics
Open Access

Research article

Identification of functional SNPs in the 5-prime flanking sequences of human genes

Salim Mottagui-Tabar*1, Mohammad A Faghihi1, Yosuke Mizuno1, Pär G Engström1, Boris Lenhard1, Wyeth W Wasserman2 and Claes Wahlestedt1

Address: 1Center for Genomics and Bioinformatics, Karolinska Institutet, SE-17177 Stockholm, Sweden and 2Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada

Email: Salim Mottagui-Tabar* - salim.mottagui-tabar@cgb.ki.se; Mohammad A Faghihi - mohammad.ali.faghihi@cgb.ki.se; Yosuke Mizuno - yosuke.mizuno@cgb.ki.se; Pär G Engström - par.engstrom@cgb.ki.se; Boris Lenhard - boris.lenhard@cgb.ki.se; Wyeth W Wasserman - wyeth@cmmt.ubc.ca; Claes Wahlestedt - claes.wahlestedt@cgb.ki.se

* Corresponding author

Abstract

Background: Over 4 million single nucleotide polymorphisms (SNPs) are currently reported to exist within the human genome. Only a small fraction of these SNPs alter gene function or expression, and therefore might be associated with a cell phenotype. These functional SNPs are consequently important in understanding human health. Information related to functional SNPs in candidate disease genes is critical for cost-effective genetic association studies, which attempt to understand the genetics of complex diseases like diabetes, Alzheimer's, etc. Robust methods for the identification of functional SNPs are therefore crucial. We report one such experimental approach.

Results: Sequences conserved between mouse and human genomes, within 5 kilobases of the 5-prime end of 176 GPCR genes, were screened for SNPs. Sequences flanking these SNPs were scored for transcription factor binding sites. Allelic pairs resulting in a significant score difference were predicted to influence the binding of transcription factors (TFs). Ten such SNPs were selected for mobility shift assays (EMSA), resulting in 7 of them exhibiting a reproducible shift.
The full-length promoter regions with 4 of the 7 SNPs were cloned in a luciferase-based plasmid reporter system. Two out of the 4 SNPs exhibited differential promoter activity in several human cell lines.

Conclusions: We propose a method for effective selection of functional, regulatory SNPs that are located in evolutionarily conserved 5-prime flanking regions (5'-FR) of human genes and influence the activity of the transcriptional regulatory region. Some SNPs behave differently in different cell types.

Published: 17 February 2005
BMC Genomics 2005, 6:18 doi:10.1186/1471-2164-6-18
Received: 02 November 2004
Accepted: 17 February 2005
This article is available from: http://www.biomedcentral.com/1471-2164/6/18
© 2005 Mottagui-Tabar et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Single nucleotide polymorphisms (SNPs) are the most common form of genomic variation, occurring on average every 1000 nucleotides. The vast majority of SNPs are neutral allelic variants; however, the few that do influence a phenotype in a measurable way are important for understanding the underlying genetics of human health. SNPs are the focus of a large number of human genetics studies attempting to understand their impact on complex diseases like Alzheimer's, Parkinson's, diabetes, etc. Most SNPs, by virtue of their location within genes (introns, 3'-UTRs, etc) or between genes, are considered most likely to be benign and not to contribute to a phenotype, whether it may be the manifestation of a disease or quicker metabolism of a drug.
Among the group of SNPs located within coding regions of genes and causing a change in the peptide sequence (non-synonymous SNPs or 'nsSNPs') or among SNPs located within promoters (regulatory SNPs or rSNPs), a majority may not influence the overall activity of the protein or the gene expression. With the per-SNP validation and genotyping cost relatively high, it is increasingly important to develop strategies to predict functionally relevant SNPs in silico. The SNP databases in public domain, like NCBI/dbSNP and HGVbase, have facilitated this by highlighting all nsSNPs and also further classifying the location of the amino acid within the encoded proteins [1] to more accurately predict the detrimental effects of a change in peptide sequence. Several recent studies have attempted to focus on the subset of nsSNPs that most likely influence phenotype [2-6]. Of the approximately 4.5 million SNPs in dbSNP [7], an estimated 10,000 nsSNP exist and approximately 10–15% of those are projected to be damaging [6]. Comparatively fewer attempts have been made to predict and validate functional promoter SNPs [8].

Transcriptional regulatory regions in the 5'-FR of human genes encode short (often < 25 bp) [9,10] sequences which serve as targets for binding of transcription factors (TFs). Understanding the conditions of binding, specificity and identity of the factors would help us understand the mechanism of regulation of human genes. Eukaryotic TFs tolerate considerable sequence variation in their target sites and recent bioinformatics works [11-13] have developed methods to model the DNA binding specificity of individual TFs [10]. Such matrices, although highly accurate [9,14], are less specific in the identification of sites with in vivo function [11], mainly due to our limited understanding of additional factors involved in TF specificity such as factor cooperative binding, protein-protein interactions, chromatin superstructures and TF concentrations.
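To make the idea of a binding-specificity matrix concrete, the following is a minimal sketch of how a position weight matrix can score the two alleles of a SNP. The count matrix, the pseudocount, the uniform background and the two example sites are all hypothetical values chosen for illustration; they are not taken from JASPAR or from this study, and the scoring scheme is a common textbook log-odds form rather than the exact one used by the authors.

```python
import math

# Hypothetical count matrix for a 6-bp site (one list per base, one entry
# per position). These numbers are invented for illustration only.
COUNTS = {
    "A": [1, 0, 18, 0, 1, 2],
    "C": [2, 19, 0, 1, 0, 15],
    "G": [16, 0, 1, 0, 18, 1],
    "T": [1, 1, 1, 19, 1, 2],
}
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 0.5   # assumed pseudocount to avoid log(0)


def pwm_from_counts(counts, pseudocount=PSEUDOCOUNT, background=BACKGROUND):
    """Convert base counts into per-position log2-odds weights."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        column = {
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        }
        pwm.append(column)
    return pwm


def score(pwm, site):
    """Sum the weights of the bases observed at each position of the site."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal matrix width")
    return sum(column[base] for column, base in zip(pwm, site.upper()))


def allele_score_difference(pwm, allele1, allele2):
    """Absolute difference in matrix score between two allelic sites."""
    return abs(score(pwm, allele1) - score(pwm, allele2))


pwm = pwm_from_counts(COUNTS)
site_allele1 = "GCATGC"  # hypothetical site carrying the preferred base
site_allele2 = "GCACGC"  # same site with a T-to-C substitution at position 4
difference = allele_score_difference(pwm, site_allele1, site_allele2)
print(f"score difference between alleles: {difference:.2f}")
```

A substitution at a strongly constrained position changes the score much more than one at a degenerate position, which is why the position of a SNP within a site matters as much as its presence.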
Currently the most successful approach to overcome this information gap is based on the assumption that sequences conserved between species (here human and mouse) would most likely mediate biological function [15-19].

The 7TM (7 trans-membrane domain) proteins, also known as the hetero-trimeric GTP-binding protein (G protein)-coupled receptors (GPCRs), are members of a large family with an estimated 700 genes in the human genome [20]. By some estimates, nearly 60% of drugs marketed today target directly or indirectly the GPCR family members [21]. Several studies have collectively analyzed the occurrence, and importance, of coding SNPs to pharmaceutical efforts, in this family of genes [22-24]. Characterizing polymorphisms that are located in the 5'-FR of these genes and that influence expression has been reported earlier. We therefore selected a subset of clinically and pharmacologically important GPCR genes and their 5'-FR sequences to test our bioinformatics and laboratory experimentation approach for prediction of functionally important SNPs in regulatory regions. Our selection system evaluates the influence of SNPs on TF-DNA complex stability, and further investigates the influence of such SNPs on promoter activity. We present a proof-of-concept for such a strategy and identify issues and problem areas for future developments.

Results

The lists of the full names and Ensembl ENSG numbers [25] of the 176 GPCR genes are shown in additional files [See Additional file 1]. From a total of approximately 800 SNPs in proximal 5 kb regions, fewer than 200 were mapped to regions of mouse-human genome conservation. Of these approximately 200 SNPs, 36 were predicted to influence TF binding, in regions of sequence conservation of over 70% in human-mouse; the alignments for two such regions are indicated in additional files [See Additional file 2]. Table 1 lists the 21 genes, along with the SNPs, TFs and TFBS sequences and positions relative to transcription start site. These 36 candidate SNPs in Table 1 were qualified by our selection criteria, as described in the methods section, and were predicted to influence the binding of TFs in a qualitative manner. The absolute binding score of the TF differed by at least 2 units between the two alleles. Ten SNPs within the 5'-FR of 7 genes were selected for EMSA tests and are shown in bold letters in Table 1. The choice of these genes was based on our understanding of their significance in human physiology and relevance to research interests within the GPCR research community.

Table 2 shows the results from the EMSA experiments, where values in each column for Allele 1 and Allele 2 are the ratios of measurements from each of the 5 different concentrations of the competitor oligomers (labeled ×5 through ×25 in Table 2 and in Figure 1) divided by the measurements without competitor (labeled 'C' in Figure 1). The decrease in level of the labeled product as a consequence of increasing non-labeled oligomer concentration is an indication of the efficiency of displacement, thereby reflecting the relative stability of the DNA-protein complex. A marginal increase in level of radio-labeled complex, instead of a decrease (Table 2, rs1800508), is generally considered to be due to additional factor involvements. Table 2, column 'Ratio' shows the difference, calculated for the highest concentration of the non-labeled competitor (25-fold), between the efficiency of competition between a perfect-match competitor and the allelic mismatch competitor. Values close to 1.00 indicate no difference in rate of competition between the two alleles, and therefore no relative difference in stability of the DNA-protein complex.
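The 'Ratio' column described above can be reproduced with a few lines of arithmetic. The sketch below uses the ×25 values reported in Table 2 for four of the SNPs; the function name and the dictionary layout are assumptions made for illustration, and the 2-fold cut-off is the one the text uses to describe the strongest shifts. Because the published table lists rounded measurements, a recomputed ratio can differ from the printed one in the second decimal for some SNPs.

```python
# Fraction of labelled complex remaining at 25-fold competitor excess,
# relative to the no-competitor lane 'C' (values copied from Table 2).
# Tuple order: (Allele 1, perfect-match competitor; Allele 2, mismatch competitor)
X25_VALUES = {
    "rs509813": (0.09, 0.37),
    "rs2528521": (0.21, 0.55),
    "rs2882225": (0.75, 1.35),
    "rs968554": (1.00, 0.96),
}


def competition_ratio(allele1_x25, allele2_x25):
    """Ratio column of Table 2: Allele 2 (x25) divided by Allele 1 (x25)."""
    return allele2_x25 / allele1_x25


for rs_id, (allele1, allele2) in X25_VALUES.items():
    ratio = competition_ratio(allele1, allele2)
    flag = "2-fold or more" if ratio >= 2.0 else "below 2-fold"
    print(f"{rs_id}: ratio = {ratio:.2f} ({flag})")
```

A ratio near 1.00, as for rs968554, means both competitors displace the labelled probe equally well, whereas a large ratio, as for rs509813, means the mismatch competitor displaces it far less efficiently.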
While 4 SNPs exhibit mobility shift with a difference of 2-fold or more (rs267412, rs509813, rs945032 and rs2528521), three SNPs exhibit a moderate, nevertheless reproducible shift (rs1799722, rs2882225 and rs1538251). Finally, three SNPs fail to show any significant shift (rs968554, rs267413 and rs1800508). In all, seven out of ten polymorphic markers indicated reproducible binding differences between the alleles. Figure 1 shows pictures of EMSA gels for two of the SNPs, rs1799722 and rs2528521.

For EMSA-positive markers with no validation information at dbSNP, HGVbase or Celera Discovery Systems™ we did a validation analysis using RFLP (restriction fragment length polymorphism) on DNA samples from 25 healthy Nordic individuals. The three SNPs which failed to show positive gel-shift results (serotonin receptor 5HT-1A: rs968554; DRD1: rs267413; and BK-2: rs1800508) were not investigated any further. For dopamine receptor D1 (DRD1) polymorphism rs267412, the genotype distribution was found to be TT = 30%, AA = 20% and AT = 50%, and for calcitonin receptor promoter (CT-R) polymorphism rs2528521, it was GG = 40%, AA = 30% and GA = 30%. While rs267412 was found to be in Hardy-Weinberg equilibrium (HWE), rs2528521 was not. A larger population sample should be genotyped to accurately measure HWE for both these loci. Bradykinin B2 (BK-2) promoter polymorphism rs1799722 is noted to be polymorphic (major allele C: 56%) at NCBI/dbSNP. The polymorphic nature of this locus (rs1799722) was also confirmed by crosschecking at Celera Discovery Systems™. Also for the second BK-2 SNP, rs945032, the allele frequency information was found at dbSNP (major allele = 80%). NCBI's dbSNP provided no allele frequency information about rs1538251 in the adenosine-A3 receptor (ADORA3), and on sequencing 20 DNA samples this marker proved not to be polymorphic in the Nordic sample population, and was eliminated from further analysis (data not shown).
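The Hardy-Weinberg comparison reported above for rs267412 and rs2528521 can be illustrated with a standard chi-square goodness-of-fit calculation. This is a sketch under stated assumptions, not a reconstruction of the authors' analysis: the text does not say which test was used, and the reported percentages are converted to counts here by assuming all 25 individuals were genotyped, which gives fractional counts. The critical value 3.84 is the usual 5% threshold for one degree of freedom.

```python
CHI_SQUARE_CRITICAL_1DF = 3.84  # 5% significance level, one degree of freedom


def hwe_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square statistic comparing genotype counts with HWE expectations."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    observed = (n_hom1, n_het, n_hom2)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


SAMPLE_SIZE = 25  # assumption: every sampled individual was genotyped

# Reported genotype proportions: (homozygote 1, heterozygote, homozygote 2)
REPORTED = {
    "rs267412": (0.30, 0.50, 0.20),   # TT, AT, AA
    "rs2528521": (0.40, 0.30, 0.30),  # GG, GA, AA
}

for rs_id, proportions in REPORTED.items():
    counts = [fraction * SAMPLE_SIZE for fraction in proportions]
    chi2 = hwe_chi_square(*counts)
    verdict = "consistent with HWE" if chi2 < CHI_SQUARE_CRITICAL_1DF else "deviates from HWE"
    print(f"{rs_id}: chi-square = {chi2:.3f} ({verdict})")
```

Under these assumptions rs267412 gives a statistic close to zero and rs2528521 lands only just above the threshold, which agrees with the observation that a larger sample is needed before drawing firm conclusions.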
The genotype frequency of muscarinic acetylcholine receptor M1 (CHRM1) SNP rs509813 was documented at Celera Discovery Systems™. The contig position of rs2882225 (follicle stimulating hormone receptor; FSHR) was not in agreement between the three major public genome databases, i.e. NCBI, Ensembl and the Santa Cruz Genome Assembly (UCSC). It was mapped within the transcript for FSHR by NCBI/dbSNP, and was completely absent from Ensembl and UCSC [26]. Therefore, of the seven SNPs exhibiting positive EMSA, five SNPs (in four genes) qualified for analysis of their influence on promoter activity.

Figure 1: Polyacrylamide gels from electromobility shift assays. Polyacrylamide gels showing the decrease in amounts of protein complex with labelled oligomer as the concentration of the competing non-labelled oligomer increases (lanes marked x5 through x25). The lane marked 'C' has no competitor and represents the basal levels of labelled complex. The measure of displacement of the labelled oligomer is expressed as a ratio of radio-labelled product, for each lane, divided by the basal value, presented in Table 2. Comparison of allele-specific DNA-protein complex stability is a ratio of the highest competitor concentration (x25) of each of the two alleles. Thus for rs1799722 (A) allele A binds proteins 1.66 times better than allele G, and for rs2528521 (B) allele A binds proteins 2.62 times better than allele G (Table 2, extreme right column).

Table 1: List of positive SNP candidates for EMSA studies. The genes and the SNPs which were tested in this report are indicated in bold text. The name of the TF and the predicted consensus site from the transcription start site are indicated.

| Gene name / Ensembl ENSG | TF name (distance from start site) | SNP rs ID | Allele 1 | Allele 2 |
|---|---|---|---|---|
| Beta-2 adrenergic receptor ENSG00000164272 | Nkx(-4257) | rs2082382 | ttcagtg | ttcggtg |
| | Gklf(-4216) | rs2082395 | aagtgagaag | aagtgagaaa |
| | c-ETS(-1049) | rs1432622 | gatcct | gatctt |
| P2y purinoceptor 5 (p2y5) ENSG00000139679 | SAP-1(-149) | rs2233571 | agcggaaat | agtggaaat |
| Anion exchange protein 2 ENSG00000164889 | Yin-Yang(-3480) | rs2069453 | gccatg | gccgtg |
| | c-ETS(-2988) | rs2069451 | tttccc | tgtccc |
| | SPI-1(-2988) | rs2069451 | gggaaa | gggaca |
| | SPI-B(-2990) | rs2069451 | atgggaa | atgggac |
| | c-MYB_1(-1536) | rs2069442 | gggagttg | gggacttg |
| | Nkx(-1537) | rs2069442 | tcaagtc | tcaactc |
| | SP1(-1440) | rs2069441 | agggctggga | agagctggga |
| C-c chemokine receptor type 1 ENSG00000163823 | SPI-B(-117) | rs3181080 | acaagaa | actagaa |
| | SOX17(-118) | rs3181080 | tttcttgtc | tttctagtc |
| Frizzled 6 precursor (frizzled-6) ENSG00000164930 | deltaEF1(-629) | rs3758096 | caccta | aaccta |
| Proteinase activated receptor 3 ENSG00000164220 | c-ETS(-32) | rs2069647 | catcct | cctcct |
| | deltaEF1(-31) | rs2069647 | ctcctt | atcctt |
| Chemokine (C-X-C) receptor 6 ENSG00000163819 | c-MYB_1(-469) | rs2234352 | tacagatg | tatagatg |
| | Thing1-E47(-469) | rs2234352 | catctgtaaa | catctataaa |
| 5-hydroxytryptamine 1a receptor ENSG00000178394 | ARNT(-2174) | rs968554 | caagtg | caactg |
| | c-MYB_1(-2174) | rs968554 | tccagttg | tccacttg |
| | deltaEF1(-2174) | rs968554 | cacttg | cagttg |
| | n-MYC(-2174) | rs968554 | cacttg | cagttg |
| | USF(-2174) | rs968554 | caagtgg | caactgg |
| | USF(-2175) | rs968554 | cacttgg | cagttgg |
| | ARNT(-2174) | rs968554 | cacttg | cagttg |
| | deltaEF1(-2174) | rs968554 | caactg | caagtg |
| | n-MYC(-2174) | rs968554 | caagtg | caactg |
| Muscarinic acetylcholine receptor M1 ENSG00000168539 | MZF_1-4(-148) | rs509813 | tggggg | tggcgg |
| | MZF_5-13(-149) | rs509813 | gtggggggag | gtggcgggag |
| | MZF_1-4(-147) | rs509813 | gggggg | ggcggg |
| | SP1(-147) | rs509813 | ggggggagga | ggcgggagga |
| Dopamine receptor D1a ENSG00000184845 | FREAC-4(-4311) | rs267412 | gtaaaccc | gtaagccc |
| | TCF11MafG(-4446) | rs267413 | actgac | acagac |
| Follicle stimulating hormone receptor ENSG00000170820 | HFH-3(-81) | rs2882225 | ggatgctttttt | ggatgctgtttt |
| | HFH-2(-82) | rs2882225 | gatgcttttttt | gatgctgttttt |
| | c-ETS(-80) | rs2349718 | cttctt | cttttt |
| | Gklf(-87) | rs2349718 | aaaaaaaaag | aaaaaaaaaa |
| | SPI-1(-80) | rs2349718 | aagaag | aaaaag |
| 5-hydroxytryptamine 2c receptor ENSG00000147246 | HFH-1(-1736) | rs3795182 | ccatgtttata | ccatatttata |
| | MEF2(-1734) | rs3795182 | atatttataa | atgtttataa |
| | FREAC-4(-1734) | rs3795182 | ataaacat | ataaatat |
| | c-ETS(-271) | rs3813928 | tatcct | taccct |
| | MZF_1-4(-273) | rs3813928 | tgagga | tgaggg |
| | SPI-B(-273) | rs3813928 | tgaggat | tgagggt |
| Bradykinin receptor 2 ENSG00000168398 | Ahr-ARNT(-535) | rs945032 | tgggtg | tgggta |
| | MZF_1-4(-80) | rs1800508 | tgggca | tgagca |
| | AP2alpha(-79) | rs1800508 | gcccaggag | gctcaggag |
| | TCF11-MafG(-61) | rs1799722 | aatgat | agtgat |
| Adenosine A3 receptor ENSG00000121933 | AP2alpha(-4276) | rs1538251 | gccctctgg | tccctctgg |
| Alpha-1a adrenergic receptor ENSG00000120907 | c-ETS(-4898) | rs562843 | cttctt | cttatt |
| | SPI-1(-4898) | rs562843 | aagaag | aataag |
| | Nkx(-4900) | rs562843 | ataagtt | agaagtt |
| C-c chemokine receptor type 2 ENSG00000121807 | TCF11MafG(-1823) | rs3092964 | catgcc | catacc |
| | Ahr-ARNT(-1825) | rs3092964 | tgcatg | tgcata |
| | TCF11MafG(-1823) | rs3749462 | catgcc | catacc |
| | Ahr-ARNT(-1825) | rs3749462 | tgcatg | tgcata |
| Putative chemokine receptor ENSG00000119594 | SPI-B(-86) | rs3825163 | tcaggaa | ccaggaa |
| | FREAC-4(-80) | rs3825163 | gtaaccat | ataaccat |

For expression analysis in living cells, the published promoter regions, or putative regulatory 5'-FR of up to 2 kb, were cloned and basal levels of luciferase were monitored. Repeated attempts to clone the promoter region of DRD1 failed (data not shown). Furthermore, the position of rs267413 is mapped at -4446 nucleotides with respect to the transcription start site, whereas the minimum length of the genomic fragment known to drive DRD1 expression is limited to 2571 nucleotides. Considering the distal position of the marker, we decided not to examine rs267413 further in this study.
The promoter regions of BK-2, CHRM1 and CT-R were cloned successfully. A total of four dissimilar human cell lines (HeLa, Hep2G, SK-N-MC and HEK293) were used to monitor the influence of the four SNPs (rs945032, rs1799722, rs2528521 and rs509813) and to investigate differences in expression that are possibly due to differences in TF expression in different cell types. BK-2 promoter SNPs rs945032 (genotype = GG) and rs1799722 (genotype = AA) showed approximately 40%-60% higher activity in HeLa cells as compared to their other homozygote alleles AA and GG, respectively (Figure 2). The BK-2 SNP rs945032 behaves in a reciprocal manner in two cell types (HeLa and HEK293). The BK-2 SNP rs1799722 allele 'C' increases expression only in HeLa while decreasing expression in the other three cell types, similar to CHRM1 marker rs509813. CT-R marker rs2528521 and CHRM1 marker rs509813 failed to show any influence on luciferase expression levels in HeLa and SK-N-MC cells. Finally, the CT-R SNP rs2528521, allele 'C', influences expression in a significant manner, but only in one cell line (Hep2G). Hence, different alleles behave differently in different cell environments.

Figure 2: Comparative promoter activity in different cell lines. Influence of four functional promoter SNPs on promoter activity is dependent on cell types. Measurements are an average of four independent experiments. A 'T' indicates an 'AA' and a 'C' indicates a 'GG' genotype.

Table 1: List of positive SNP candidates for EMSA studies. The genes and the SNPs which were tested in this report are indicated in bold text. The name of the TF and the predicted consensus site from the transcription start site are indicated. (Continued)

| Gene name / Ensembl ENSG | TF name (distance from start site) | SNP rs ID | Allele 1 | Allele 2 |
|---|---|---|---|---|
| | TCF11-MafG(-81) | rs3825163 | gataac | ggtaac |
| | Nkx(-161) | rs2256572 | ttatttg | ctatttg |
| | S8(-163) | rs2256572 | tatta | tacta |
| | TCF11MafG(-232) | rs590447 | gctgac | gccgac |
| | deltaEF1(-230) | rs590447 | cagctt | cggctt |
| Calcitonin receptor ENSG00000004948 | TCF11MafG(-511) | rs2528521 | agtgac | agtggc |
| Lectomedin-3 ENSG00000150471 | AP2alpha(-770) | rs905963 | gccccgagc | accccgagc |
| | SPI-1(-763) | rs905963 | gcgaac | gcgagc |
| | c-ETS(-365) | rs1505666 | cctcct | ccttct |
| | SPI-1(-366) | rs1505666 | gagaag | gaggag |
| | MZF_1-4(-367) | rs1505666 | agagga | agagaa |
| Glucagon-like P2 ENSG00000065325 | SPI-B(-881) | rs1402655 | tgagaaa | tgataaa |
| G protein-coupled receptor ENSG00000102865 | Yin-Yang(-118) | rs2240047 | gccatg | gccctg |
| | TCF11MafG(-118) | rs2240047 | catggc | cagggc |
| | Gfi(-1575) | rs724615 | aaaatcacag | aaaatgacag |

Discussion

SNPs that are located within coding regions and result in a change in the peptide sequence may be classified as 'damaging' or 'altering' if predicted to be in structurally or functionally important sites of the three-dimensional structure of the protein. It is less straightforward to predict the functional importance of SNPs within regulatory regions. TFs tolerate variation in their binding sites, and not all positions in a site contribute equally to the binding energy. Therefore, the quantitative effect a given rSNP has on gene expression depends on its position and the bases involved. Complex human diseases like Parkinson's, diabetes and obesity are polygenic diseases, where many predisposing genetic and environmental factors together, over a period of time, cause a disease state. Differences in expression of genes and cellular concentrations of proteins due to common polymorphisms in 5' regulatory regions could elucidate gene function just as well as the examination of non-synonymous SNPs in coding regions. We decided to test the 5'-FR of a group of physiologically and clinically important genes for SNPs within TFBS which could potentially influence the kinetics of binding affinity.

A common strategy for modeling the binding preferences of a transcription factor is to construct a position weight matrix (PWM) from known binding sites. PWMs are probabilistic models that capture the nucleotide preference at each position of a TFBS as well as the differential contribution of positions to the overall binding energy. When a putative binding site sequence is assessed using a PWM, a score is obtained that theoretically should be proportional to the binding energy between the TF and that sequence [10]. It has been convincingly shown, and generally accepted, that by considering DNA sequence conservation between mouse and human, the over-predictive nature of TFBS modeling can be significantly remedied. Therefore we chose, for this work, not to include 'negative controls', that is, SNPs from outside of mouse-human conserved regions. We do indeed think that larger studies in the future should perhaps incorporate a certain number of such negative controls to validate the theoretical predictions. Thus, using PWMs it should be possible to overcome the difficulties with rSNP detection stated above.

We reasoned that if the score difference between alleles is large, it should correspond to a difference in gene expression that is reproducible in living cells. Using the JASPAR database of PWMs [27] and a phylogenetic footprinting strategy previously shown to be successful [17], we developed a method to detect putative TFBS and identify rSNPs likely to affect TF binding significantly. By incorporating phylogenetic footprinting, the method reported in this study emphasizes SNPs present in genomic regions that are highly conserved between human and mouse, thereby increasing the probability of a downstream functional influence of variations within these sequences.

Electromobility Shift Assays (EMSA) produce DNA-protein binding interactions in artificial conditions. Therefore in silico prediction methods based on other in vitro or in vivo selection technologies, like 'systematic evolution of ligands by exponential enrichment' (SELEX), may not agree with the experimental outcome of EMSA procedures. Since the construction of PWMs is often concluded from published records based on SELEX enrichment approaches, it is informative to experimentally validate the predicted binding site using methods like EMSA. Therefore, we validated a subset of our predictions with in vitro electromobility shift. We used a stringent selection criterion, that is, we qualified only alleles demonstrating an absolute binding score difference of 2.0 or more. A stringent selection criterion would no doubt decrease excessive hits and false positives at the expense of a certain loss of true positives. Our results showed that approximately 60%-70% (i.e. 7 out of the selected 10 SNPs) of predicted sites (Table 2) bind proteins from HeLa nuclear extracts.

Table 2: Oligonucleotide sequences used for EMSA. For every SNP, 4 oligonucleotides (2 complementary pairs) were synthesized, one pair for each allele. One oligonucleotide sequence from each pair had additional GG dinucleotide overhangs at the 5' end for the fill-in labeling reaction. Care was taken to make sure that the additional GG dinucleotide did not influence the predicted TF binding capability. The complementary sequences lacked the GG pairs. Only the allelic sequence predicted to bind most stably was chosen for the fill-in labeling reaction (marked *) while the gel shift assays were carried out using competitor with a perfect match versus a competitor with the allelic mismatch. The polymorphic site is underlined. Column 'Ratio' shows the difference in competition between the labeled and non-labeled oligomers at 25-fold excess, by dividing Allele 2 (x25) values by Allele 1 (x25). A1 = Allele 1 (competitor oligo is a perfect match); A2 = Allele 2 (competitor oligo has a mismatch).

| Gene name, rs ID and transcript | Sequences | A1 x5 | A1 x10 | A1 x15 | A1 x20 | A1 x25 | A2 x5 | A2 x10 | A2 x15 | A2 x20 | A2 x25 | Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Serotonin receptor (5-HT-1A) rs968554 ENST00000323865 | GGAAAAGAATCCACTTGGGCCAATG * / GGAAAAGAATCCAGTTGGGCCAATG | 1.29 | - | 1.27 | 1.23 | 1.00 | 1.16 | - | 1.12 | 1.13 | 0.96 | 0.96 |
| Dopamine receptor DRD1 rs267412 ENST00000329144 | GGAATGTAAACCCAACACAAAAG * / GGAATGTAAGCCCAACACAAAAG | 0.72 | - | - | 0.54 | 0.56 | 0.98 | - | 1.06 | 1.03 | 1.12 | 2.05 |
| Dopamine receptor DRD1 rs267413 ENST00000329144 | GGTATAAAAGTCAGTGAATACAG * / GGTATAAAAGTCTGTGAATACAG | 0.96 | - | 0.81 | 0.74 | 0.81 | 0.97 | - | 0.98 | 0.97 | 1.01 | 1.24 |
| Muscarinic acetylcholine receptor M1 rs509813 ENST00000306960 | GGCTTGGGCTCCTCCCCCCAGCCAAC * / GGCTTGGGCTCCTCCCGCCAGCCAAC | 0.21 | 0.11 | 0.08 | 0.05 | 0.09 | 0.70 | 0.60 | 0.46 | 0.46 | 0.37 | 4.11 |
| Follicle stimulating hormone receptor rs2882225 ENST00000304421 | GGCAAGGGAGCTGTTTTTTTT / GGCAAGGGAGCTTTTTTTTTT * | 1.10 | 1.00 | 0.86 | 0.71 | 0.75 | 1.96 | 1.73 | 1.67 | 1.36 | 1.35 | 1.80 |
| Adenosine-A3 receptor rs1538251 ENST00000241356 | GGTGGCCACCAGAGGGCAGCACG * / GGTGGCCACCAGAGGGAAGCACG | 1.08 | 1.11 | 0.95 | 0.89 | 0.76 | 1.49 | 1.41 | 1.44 | 1.51 | 1.40 | 1.84 |
| Bradykinin receptor B2 rs1800508 ENST00000306005 | GGGAAGTGCCCAGGAGGC * / GGGAAGTGCTCAGGAGGC | 1.67 | 1.28 | 1.17 | 1.16 | 1.20 | 1.10 | 1.06 | 1.11 | 1.27 | 1.33 | 1.03 |
| Bradykinin receptor B2 rs945032 ENST00000306005 | GGTTCCTGGGTGCGGG * / GGTTCCTGGGTACGGG | 0.88 | 0.72 | 0.72 | 0.62 | 0.55 | 1.15 | 1.22 | 1.29 | 1.22 | 1.27 | 2.30 |
| Bradykinin receptor B2 rs1799722 ENST00000306005 | GGCTGGGTAGTGATGTCATCAGC / GGCTGGGTAATGATGTCATCAGC * | 0.36 | 0.21 | 0.16 | 0.12 | 0.12 | 0.50 | 0.29 | 0.22 | 0.20 | 0.20 | 1.66 |
| Calcitonin receptor precursor rs2528521 ENST00000316576, ENST00000248548 | GGCTGTCCCCGGAGTGGCGGCT / GGCTGTCCCCGGAGTGACGGCT * | 0.54 | 0.36 | 0.25 | 0.22 | 0.21 | 0.90 | 0.83 | 0.73 | 0.60 | 0.55 | 2.62 |

We finally attempted to correlate the EMSA findings from the 10 SNPs with influences on actual promoter activity within living cells. Due to mapping discrepancies of one SNP and failure to clone one promoter, we tested only four out of the seven EMSA-positive SNPs identified thus far. Of the four SNPs tested in a promoter-less expression vector, two (rs945032 and rs1799722) indicated significant influence on promoter activity, while two showed convincing and reproducible, yet comparatively limited influence (rs2528521 and rs509813) on promoter activity. The influence of these polymorphisms (rs945032 and rs1799722) indicates that any given functional variation within a regulatory region might exert a measurable influence within the context of a cell type determined by the TF expression profile of the cells and perhaps competitive binding of the TF to overlapping multiple binding sites.

There are several factors in the current approach which indicate that there are far more rSNPs than currently detected using available technologies. The EMSA assays employ HeLa nuclear extract, thereby limiting our findings to the TF expression repertoire of only HeLa nuclei. The TFBS package used a limited collection of high quality PWMs, which unfortunately represent only a small part of the approximately 2,000 known human and mouse TFs. The theoretical thresholds set for selections of alleles which are predicted to differentially bind TF require further rigorous testing to ensure that the present selection is optimal.

Conclusions

From a total of approximately 200 SNPs in evolutionarily conserved 5'-FR of 176 human GPCR genes, our prediction algorithm selected 36 SNPs with possible influence on TFBS. When ten of these 36 SNPs were tested for mobility shifts, seven exhibited a positive result, and four of these were further tested for influence on promoter activity using an in situ reporter system. Finally, two of the four showed significant and reproducible influences which were dependent on the cell environment. Thus, starting from a large pool of potential regulatory SNPs, we successfully identified a small fraction that actually influenced promoter activity. We therefore propose a method for effective selection of functional, regulatory SNPs in evolutionarily conserved 5'-FR of human genes, as a means for identification of candidate SNPs for genetic association analysis studies.

Methods

Sequence alignment and TFBS detection

The GPCR genes were selected from Ensembl [25]. Human and mouse genome assemblies (versions hg12 and mm2, respectively) and mappings of GenBank and RefSeq cDNA sequences to the assemblies were retrieved from the UCSC Genome Browser Database [26]. In addition, cDNA sequences for the 176 7TM or GPCR genes (online supplement) were mapped to the human genome assembly and 50,821 mouse cDNA sequences from the RIKEN project [29] were mapped to the mouse genome assembly using the client/server version of BLAT [30] with default settings. For each of the 176 GPCR-encoding human cDNAs, we retrieved the genomic mapping with the highest number of matching bases. Orthologous mouse loci were identified by similarly retrieving mouse genomic mappings for mouse cDNAs defined as orthologs to the human cDNAs in GeneLynx [31]. To more reliably identify transcriptional start sites we searched for other cDNA mappings overlapping the retrieved mappings and indicating similar gene structures. For each gene, the cDNA mapping extending furthest 5' was then used for further analysis.
We extracted humangenomic sequences from -5000 to +100 relative to startsof human cDNA mappings and mouse genomicsequences from -30000 to +100 relative to starts of mousecDNA mappings. Orthologous genomic sequences werealigned using BLASTZ [32] with default settings. Alignedregions preceding human cDNA mappings were searchedfor putative rSNPs as follows. SNP data for the humangenomic regions was retrieved from dbSNP, build 114.For each SNP within an aligned region, two allelic ver-sions of a 110-bp alignment slice centered around theSNP were searched for putative TFBS using the TFBS Perlmodules [33] and all position-weight matrices in the JAS-PAR database describing vertebrate TFBS and having aninformation content of at least 7 [27]. Hits fulfilling thefollowing 3 criteria were considered putative TFBS: (a) sit-uated within regions of at least 70% sequence identity(conservation) over 50 base pairs; (b) situated at corre-sponding (aligned) positions in human and mousesequences; (c) having a relative matrix score of at least 0.5in both human and mouse sequences. Selected for furtheranalysis were putative TFBS with a relative matrix scoreexceeding 0.8 in one of the human alleles and eitherundetected in the other allele or having a difference inabsolute matrix score of at least 2 between the humanalleles.Electromobility Shift Assays (EMSA)Table 2 lists the sequences of oligonucleotides used forEMSA tests. For setting up of EMSA experimental proce-dures, an earlier published positive shift assay was repro-duced using a polymorphism in the gene MMP12 [34].Method modifications were then applied as describedbelow.Double stranded oligonucleotides were synthesized with5'-GG dinucleotide overhangs. The 3'-end of the comple-mentary strand was labeled with [α32P] dCTP with fill-inreaction using Klenow flagment. The labeled oligonucle-otide were passed through ProbeQuant G-50 Micro Col-umns (Pfizer-Pharmacia Inc) and the concentration wasadjusted to 0.035 nM. 
A 0.8 µl portion was mixed with 1.6 µl HeLa Nuclear Extract (Promega™), 1.6 µl Gel Shift Binding 5x Buffer (Promega™), and 3.2 µl water. After 10 min incubation at room temperature, 0.8 µl of non-labeled competitor DNA, either one allele or the other, was added in varying concentrations (5-, 10-, 15-, 20- and 25-fold greater than the radio-labeled oligonucleotide). After 20 min incubation at room temperature, the entire 8 µl reaction was loaded on a polyacrylamide gel (5%, in 22.5 mM Tris/22.5 mM boric acid/0.5 mM EDTA buffer, BIO RAD™). Electrophoresis was then performed in TBE for 20 min at 200 V. Gels were placed on Whatman 3 MM™ filters and dried for 30 minutes in a BIO RAD™ gel-dryer. The dried gels were exposed to an intensifying screen and analyzed with a Typhoon Image Analyzer 9400 (Pfizer-Pharmacia Inc™).

Sequencing and RFLP (Restriction fragment length polymorphism)

The frequency of the dopamine receptor D1 polymorphism rs267413 was determined by sequencing a 174 base-pair fragment of the promoter in 10 DNA samples from healthy Swedish individuals. The sequence of the forward primer used for PCR was 5'-GGGGTACCACTTGACCGTTCTGTTGCTTT-3', in which a KpnI restriction site (GGTACC) and a GG dinucleotide were added to the 5' end. The sequence of the reverse primer was 5'-TCTTTTAAGCTCTACTGTGGGTGA-3'. The calcitonin receptor promoter polymorphism rs2528521 was analyzed by RFLP (fragment length 334 bp, restriction enzyme Tsp45I). The forward primer used for PCR was 5'-ACCCCCAAGGTGTCTCTTCT-3' and the reverse primer was 5'-GAGGGACCCGAGTTAGACCT-3'. The primer sequences for the bradykinin promoter SNP rs1799722 were as follows. Forward primer: 5'-CCAGGAGGCTGATGACGTCA-3'; the fourth base from the 3' end was changed from A to G relative to the original genomic sequence to create a Tsp45I restriction site for RFLP analysis. Reverse primer: 5'-TCAGTCGCTCCCTGGTACTG-3'. The amplified fragment length was 150 bp. PCR conditions for all RFLP and sequencing reactions were as follows: 94 °C (4 min), followed by 42 cycles of 94 °C (1 min), 61 °C (30 sec) and 72 °C (30 sec).

Luciferase expression system for promoter activity analysis

A promoter-less luciferase vector (pGL3-Basic, Promega™) was used for cloning known promoter regions between the KpnI and BglII restriction sites of the plasmid vector. Primers for CHRM1: forward 5'-GGGGTACCGCAGGACCCACATCTCTAGG-3', reverse 5'-GAAGATCTTCACCAGGGCACCCAAT-3'. Primers for BK-2: forward 5'-GGGGTACCATCTGAGACTCTGTTTCCC-3', reverse 5'-GAAGATCTTTCAGTCGCTCCCTGGTACT-3'. Primers for CT-R: forward 5'-GGGGTACCCCTTGGAATCAACTTGCCT-3', reverse 5'-TTCTCGAGCGTCCTTGGAATCAACTTGC-3'. Genomic DNA from 27 individuals of Nordic origin was amplified and sequenced to identify the genotype of the sample DNA. Cloned DNA was sequenced using the primer set GLprimer2: 5'-CTAGCAAATAGGCTGTCCC-3' and 5'-CTTTATGTTTTTGGCGTCTTCC-3'. HeLa cells were plated in 24-well plates one day before transfection in appropriate medium with serum and without antibiotics. pGL3-Basic plasmids containing the cloned promoter region (180 ng) were co-transfected with 20 ng of pRL-TK plasmid, using Lipofectamine 2000 (Invitrogen™).
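Co-transfection with pRL-TK provides a Renilla luciferase signal that is commonly used as an internal control for the firefly reporter. The text does not spell out the normalization, so the following Python sketch is an assumption about a typical analysis, not the authors' procedure: firefly readings are divided by Renilla readings per well, averaged over independent experiments, and compared between two alleles. All function names and readings are hypothetical.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Firefly signal divided by the co-transfected Renilla control signal."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def mean_normalized(readings):
    """Average normalized activity over (firefly, renilla) pairs, one per experiment."""
    return mean(normalized_activity(f, r) for f, r in readings)


def fold_change(allele_readings, reference_readings):
    """Promoter activity of one allele relative to the other."""
    return mean_normalized(allele_readings) / mean_normalized(reference_readings)


# Hypothetical luminometer readings for two alleles of one promoter.
allele_t = [(200.0, 100.0), (400.0, 100.0)]
allele_c = [(100.0, 100.0), (200.0, 100.0)]
ratio = fold_change(allele_t, allele_c)
```

Normalizing each well before averaging keeps differences in transfection efficiency between wells from being mistaken for allele effects.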
Luciferase activities were determined using the Dual-Luciferase Reporter Assay System (Promega™) according to the manufacturer's instructions.

Abbreviations

TF: transcription factor; TFBS: transcription factor binding site; SNP: single nucleotide polymorphism; GPCR: G-protein coupled receptor; 5'-FR: 5' flanking regulatory region.

Authors' contributions

Original concepts and supervision: CW, WWW, BL and SMT; Bioinformatics: PGE, BL and SMT; Running costs: SMT and CW; Manuscript preparation: SMT, PGE, CW; Electromobility Shift Assays and RFLP assays: YM; Luciferase expression: MAF and YM.

Additional material

Additional File 1: ENSG ids. A list of 176 initial GPCRs considered for this study, along with the Ensembl ENSG Ids. [http://www.biomedcentral.com/content/supplementary/1471-2164-6-18-S1.txt]

Additional File 2: Alignments. Alignment information for sequence flanking rs945032 and rs1799722 in human and mouse. [http://www.biomedcentral.com/content/supplementary/1471-2164-6-18-S2.txt]

Acknowledgments

This work was supported by Pfizer Inc. and by the Swedish National Research Foundation. We would like to express our thanks to Dr. Madis Metsis for his technical advice and unbiased analysis of the EMSA results and to Dr. Albin Sandelin for valuable discussion.

References

1. Brookes AJ, Lehvaslaiho H, Siegfried M, Boehm JG, Yuan YP, Sarkar CM, Bork P, Ortigao F: HGBASE: a database of SNPs and other variations in and around human genes. Nucleic Acids Res 2000, 28:356-360.
2. Cargill M, Altshuler D, Ireland J, Sklar P, Ardlie K, Patil N, Shaw N, Lane CR, Lim EP, Kalyanaraman N, Nemesh J, Ziaugra L, Friedland L, Rolfe A, Warrington J, Lipshutz R, Daley GQ, Lander ES: Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet 1999, 22:231-238.
3. Chasman D, Adams RM: Predicting the Functional Consequences of Non-synonymous Single Nucleotide Polymorphisms: Structure-based Assessment of Amino Acid Variation. Journal of Molecular Biology 2001, 307:683-706.
4. Ramensky V, Bork P, Sunyaev S: Human non-synonymous SNPs: server and survey. Nucl Acids Res 2002, 30:3894.
5. Sunyaev S, Ramensky V, Bork P: Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends in Genetics 2000, 16:198-200.
6. Sunyaev S, Ramensky V, Koch I, Lathe III W, Kondrashov AS, Bork P: Prediction of deleterious human alleles. Hum Mol Genet 2001, 10:591.
7. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001, 29:308-311.
8. Ponomarenko JV, Merkulova TI, Orlova GV, Fokin ON, Gorshkova EV, Frolov AS, Valuev VP, Ponomarenko MP: rSNP_Guide, a database system for analysis of transcription factor binding to DNA with variations: application to genome annotation. Nucleic Acids Res 2003, 31:118-121.
9. Pennacchio LA, Rubin EM: Genomic strategies to identify mammalian regulatory sequences. Nat Rev Genet 2001, 2:100-109.
10. Stormo GD: DNA binding sites: representation and discovery. Bioinformatics 2000, 16:16.
11. Fickett JW: Quantitative discrimination of MEF2 sites. Mol Cell Biol 1996, 16:437.
12. Fickett JW: Predictive methods using nucleotide sequences. Methods Biochem Anal 1998, 39:231-245.
13. Workman CT, Stormo GD: ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. Pac Symp Biocomput 2000:467-478.
14. Tronche F, Ringeisen F, Blumenfeld M, Yaniv M, Pontoglio M: Analysis of the Distribution of Binding Sites for a Tissue-specific Transcription Factor in the Vertebrate Genome. Journal of Molecular Biology 1997, 266:231-245.
15. Duret L, Bucher P: Searching for regulatory elements in human noncoding sequences. Curr Opin Struct Biol 1997, 7:399-406.
16. Krivan W, Wasserman WW: A Predictive Model for Regulatory Sequences Directing Liver-Specific Transcription. Genome Res 2001, 11:1559.
17. Lenhard B, Sandelin A, Mendoza L, Engstrom P, Jareborg N, Wasserman WW: Identification of conserved regulatory elements by comparative genome analysis. J Biol 2003, 2:13.
18. Loots GG, Locksley RM, Blankespoor CM, Wang ZE, Miller W, Rubin EM, Frazer KA: Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 2000, 288:136-140.
19. Shabalina SA, Ogurtsov AY, Kondrashov VA, Kondrashov AS: Selective constraint in intergenic regions of human and mouse genomes. Trends Genet 2001, 17:373-376.
20. Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, Fortini ME, Li PW, Apweiler R, Fleischmann W, Cherry JM, Henikoff S, Skupski MP, Misra S, Ashburner M, Birney E, Boguski MS, Brody T, Brokstein P, Celniker SE, Chervitz SA, Coates D, Cravchik A, Gabrielian A, Galle RF, Gelbart WM, George RA, Goldstein LS, Gong F, Guan P, Harris NL, Hay BA, Hoskins RA, Li J, Li Z, Hynes RO, Jones SJ, Kuehl PM, Lemaitre B, Littleton JT, Morrison DK, Mungall C, O'Farrell PH, Pickeral OK, Shue C, Vosshall LB, Zhang J, Zhao Q, Zheng XH, Lewis S: Comparative genomics of the eukaryotes. Science 2000, 287:2204-2215.
21. Muller G: Towards 3D structures of G protein-coupled receptors: a multidisciplinary approach. Curr Med Chem 2000, 7:861-888.
22. Rana BK, Shiina T, Insel PA: Genetic variations and polymorphisms of G protein-coupled receptors: functional and therapeutic implications. Annu Rev Pharmacol Toxicol 2001, 41:593-624.
23. Sadee W, Hoeg E, Lucas J, Wang D: Genetic variations in human G protein-coupled receptors: implications for drug therapy. AAPS PharmSci 2001, 3:E22.
24. Small KM, Seman CA, Castator A, Brown KM, Liggett SB: False positive non-synonymous polymorphisms of G-protein coupled receptor genes. FEBS Letters 2002, 516:253-256.
25. Hubbard T, Andrews D, Caccamo M, Cameron G, Chen Y, Clamp M, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Gilbert J, Hammond M, Herrero J, Hotz H, Howe K, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Keenan S, Kokocinsci F, London D, Longden I, McVicker G, Melsopp C, Meidl P, Potter S, Proctor G, Rae M, Rios D, Schuster M, Searle S, Severin J, Slater G, Smedley D, Smith J, Spooner W, Stabenau A, Stalker J, Storey R, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C, Birney E: Ensembl 2005. Nucleic Acids Res 2005, 33(Database issue):D447-D453.
26. Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ: The UCSC Genome Browser Database. Nucleic Acids Res 2003, 31:51-54.
27. Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B: JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 2004, 32(Database issue):D91-D94.
28. Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H: Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 2002, 420:563-573.
29. Kent WJ: BLAT - the BLAST-like alignment tool. Genome Res 2002, 12:656-664.
30. Lenhard B, Hayes WS, Wasserman WW: GeneLynx: a gene-centric portal to the human genome. Genome Res 2001, 11:2151-2157.
31. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W: Human-mouse alignments with BLASTZ. Genome Res 2003, 13:103-107.
32. Lenhard B, Wasserman WW: TFBS: Computational framework for transcription factor binding site analysis. Bioinformatics 2002, 18:1135-1136.
33. Jormsjo S, Whatling C, Walter DH, Zeiher AM, Hamsten A, Eriksson P: Allele-specific regulation of matrix metalloproteinase-7 promoter activity is associated with coronary artery luminal dimensions among hypercholesterolemic patients. Arterioscler Thromb Vasc Biol 2001, 21:1834-1839.

