UBC Faculty Research and Publications

Correlation between the secondary structure of pre-mRNA introns and the efficiency of splicing in Saccharomyces… Rogic, Sanja; Montpetit, Ben; Hoos, Holger H; Mackworth, Alan K; Ouellette, BF F; Hieter, Philip Jul 29, 2008


BMC Genomics (BioMed Central), Open Access

Research article

Correlation between the secondary structure of pre-mRNA introns and the efficiency of splicing in Saccharomyces cerevisiae

Sanja Rogic*1,2, Ben Montpetit3,4, Holger H Hoos1, Alan K Mackworth1, BF Francis Ouellette5 and Philip Hieter3

Address: 1Department of Computer Science, University of British Columbia, Vancouver, Canada, 2Center for High-Throughput Biology, University of British Columbia, Vancouver, Canada, 3Michael Smith Laboratories, University of British Columbia, Vancouver, Canada, 4Centre for Molecular Medicine and Therapeutics, Vancouver, Canada and 5Ontario Institute for Cancer Research, Toronto, Canada

Email: Sanja Rogic* - rogic@bioinformatics.ubc.ca; Ben Montpetit - bmontpet@cmmt.ubc.ca; Holger H Hoos - hoos@cs.ubc.ca; Alan K Mackworth - mack@cs.ubc.ca; BF Francis Ouellette - francis@oicr.on.ca; Philip Hieter - hieter@msl.ubc.ca

* Corresponding author

Abstract

Background: Secondary structure interactions within introns have been shown to be essential for efficient splicing of several yeast genes. The nature of these base-pairing interactions and their effect on splicing efficiency were most extensively studied in ribosomal protein gene RPS17B (previously known as RP51B). It was determined that complementary pairing between two sequence segments located downstream of the 5' splice site and upstream of the branchpoint sequence promotes efficient splicing of the RPS17B pre-mRNA, presumably by shortening the branchpoint distance. However, no attempts were made to compute a shortened, 'structural' branchpoint distance and thus the functional relationship between this distance and the splicing efficiency remains unknown.

Results: In this paper we use computational RNA secondary structure prediction to analyze the secondary structure of the RPS17B intron.
We show that it is necessary to consider suboptimal structure predictions and to compute the structural branchpoint distances in order to explain previously published splicing efficiency results. Our study reveals that there is a tight correlation between this distance and splicing efficiency levels of intron mutants described in the literature. We experimentally test this correlation on additional RPS17B mutants and intron mutants within two other yeast genes.

Conclusion: The proposed model of secondary structure requirements for efficient splicing is the first attempt to specify the functional relationship between pre-mRNA secondary structure and splicing. Our findings provide further insights into the role of pre-mRNA secondary structure in gene splicing in yeast and also offer a basis for improvement of computational methods for splice site identification and gene-finding.

Published: 29 July 2008
BMC Genomics 2008, 9:355 doi:10.1186/1471-2164-9-355
Received: 7 March 2008
Accepted: 29 July 2008
This article is available from: http://www.biomedcentral.com/1471-2164/9/355
© 2008 Rogic et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Splicing of precursor mRNA is one of the essential cellular processes in eukaryotic organisms. Although this process has been extensively studied since the discovery of splicing three decades ago [1,2], resulting in a thorough understanding of the splicing pathway and identification of the numerous components of the splicing machinery, there are still many unanswered questions.
For example, while the ability of pre-mRNA to form intramolecular interactions between short complementary segments in long yeast introns was initially suggested 20 years ago [3], the role of pre-mRNA secondary structure in splicing is not well understood.

Introns in S. cerevisiae are known to have a bimodal length distribution [4] and can be classified into short and long introns based on their length. The distance between the 5' splice site and the branchpoint sequence, also known as the 'lariat length' or 'branchpoint distance' (we also refer to it as linear branchpoint distance), is tightly correlated with intron length (with a Pearson correlation coefficient of r = 0.99 [5]) and can also be used to classify introns into long (5'L) and short (5'S) [3]. It was hypothesized that 5'L introns, for which the branchpoint distance is greater than 200 nt, can fold into secondary structures that bring the 5' splice site and the branchpoint sequence into a relative position that is optimal for spliceosome assembly [3]. This hypothesis was confirmed for a limited number of yeast introns by comprehensive biological experiments that demonstrated that the existence of such secondary structure elements is essential for splicing efficiency [6-11]. Structural elements that exhibit a similar effect on splicing efficiency were also found in introns of Drosophila melanogaster and related species [12]. Furthermore, in mammalian cells, folding of long intron sequences is facilitated by protein binding and interactions, which presumably shortens the long distance between essential splicing sequences [13].

The nature of the base-pairing interactions within introns and their effect on splicing efficiency were most extensively studied in S. cerevisiae's ribosomal protein gene RPS17B, previously known as RP51B (YDR447C).
It was shown that secondary structure interaction between two sequence segments located downstream of the 5' splice site and upstream of the branchpoint sequence promotes efficient splicing of the RPS17B pre-mRNA [7]. This interaction was further tested by comprehensive mutational and structure-probing analysis to determine the structure of the stem formed in the wildtype intron and the sensitivity of splicing efficiency to the alterations in this stem [8,9]. These studies demonstrated that complementary pairing between two ends of the RPS17B intron, but not necessarily the formation of the described stem, is essential for its efficient splicing in vitro and in vivo.

While the authors of the previous studies speculated that the function of the complementary pairing is to shorten the branchpoint distance, they did not attempt to determine the secondary structure of the intron and the resulting 'structural' branchpoint distance. Thus a functional relationship between this distance and the splicing efficiency remains unknown.

In this paper we use computational RNA secondary structure prediction to investigate the secondary structures of wildtype and mutant intron sequences within the S. cerevisiae RPS17B pre-mRNA. We present a unique algorithm for measuring 'structural' distance between two bases in an RNA secondary structure and use it to compute the distance between the 5' splice site and the branchpoint sequence based on the predicted secondary structure. Our analysis shows that there is a tight correlation between structural branchpoint distances and splicing efficiency levels for all mutants examined.

Results

Secondary structure of RPS17B intron and the efficiency of splicing

The first goal of our study was to determine if the splicing efficiency results previously reported for RPS17B intron [8] can be correlated with the computationally predicted secondary structures of wildtype and mutant intron sequences.

In that study, the sensitivity of splicing to alterations in the stem formed in the RPS17B intron was tested by introducing mutations in the interacting regions designated UB1 (upstream box 1) and DB1 (downstream box 1). The assumption behind the mutant design was that any mutation within the stem would disrupt it and change the intron secondary structure in such a way that the resulting structural branchpoint distance (ds) would be greater than for the wildtype intron. The authors created 9 mutant introns within the RPS17B gene: 3mUB1 (3 nt mutation), 4mUB1 (4 nt), 5mUB1 (5 nt), 6mUB1 (6 nt) and 8mUB1 (8 nt), where mutations fall in the UB1 region; 3mDB1 (3 nt) and 5mDB1 (5 nt), where mutations fall in the DB1 region and are designed to restore the base-pairing disrupted by the mutations in the 3mUB1 and 5mUB1, respectively; and 3mUB1_3mDB1 and 5mUB1_5mDB1, which are double mutants. All of the single mutants are expected to disrupt the secondary structure, while the double mutants are predicted to restore it. The RPS17B intron was inserted into the coding region of the copper resistance gene (CUP1), which served as a reporter gene. Thus, yeast cells grown on copper-containing medium will be viable only if the intron-containing Cup1 mRNA is spliced. The results of this assay suggested that splicing was reduced for all single mutants except 8mUB1. Surprisingly, 8mUB1 had a similar growth rate on copper media as the wildtype intron, suggesting that splicing was as efficient. Of the two double mutants, 5mUB1_5mDB1 was able to partially rescue copper resistance, while 3mUB1_3mDB1 was not. The authors hypothesized that these unexpected results were caused by some secondary structure rearrangements; however, the secondary structure of the mutants 8mUB1 and 3mUB1_3mDB1 was not explored.

In order to investigate if the differences in the splicing efficiency levels are due to the differences in secondary structures, we computed the minimum free energy (MFE) structures of the introns using mfold [14,15], one of the most frequently used RNA secondary structure prediction tools. Comparative RNA secondary structure prediction, which is considered more reliable, requires a certain number of orthologous sequences, which were available only for the wildtype RPS17B intron and not for the mutants created in [8].

According to the mfold MFE predictions, the introduced mutations have the desired effect of disrupting the stem in all single mutants, but the compensatory mutations fail to restore it in the two double mutants. Focusing on the positioning between the donor site and the branchpoint sequence, we compared the part of the structure that contains these two sites across all the mutants. The specified structural domain was almost identical for the 3mUB1, 5mUB1, 8mUB1, 3mDB1, 3mUB1_3mDB1 and 5mUB1_5mDB1 mutants, some of which have very different splicing efficiency levels (see Additional file 1). Moreover, the full secondary structures of the 3mUB1 and 3mUB1_3mDB1 mutants were almost identical, with only three base-pairs difference, while the copper resistance experiment suggested significant differences in splicing efficiency.
Therefore, it appears that differences in the splicing efficiency of Libri et al.'s [8] mutants cannot be attributed to differences in the computed MFE secondary structures of introns.

However, considering only a single, minimum free energy secondary structure prediction of an intron might not be the appropriate approach. While functional, non-coding RNAs, such as tRNAs and rRNAs, have a strong evolutionary pressure to maintain their unique, functional structure, it is believed that mRNAs, whose primary role is to carry the protein coding information to the translation apparatus, do not have functional constraints on their global structure. Thus, instead of always folding into a unique MFE structure, it is likely that mRNAs exist in a population of structures [16-18]. Another reason for considering suboptimal structures, especially when using computational prediction methods, is that RNA secondary structure prediction algorithms have limited accuracy and sometimes the correct structure is buried among the suboptimal predictions with free energies very close to the MFE [15,19,20].

Structural branchpoint distances of suboptimal secondary structures and the efficiency of splicing

Based on these considerations, we modified our approach to include not only the optimal, i.e., MFE structure, but also near-optimal predictions whose free energies are within 5% of the optimum. There is an exponential relationship between the free energy of a structure and its probability in the ensemble of all possible structures for a given sequence. The probability of a structure $S_i$ in the Boltzmann ensemble of all possible structures ($S_1, S_2, \ldots$) for a given RNA sequence is given by:

$$P(S_i) = \frac{e^{-\Delta G(S_i)/RT}}{Q} \qquad (1)$$

where $\Delta G(S_i)$ is the free energy of structure $S_i$, $Q = \sum_{S} e^{-\Delta G(S)/RT}$ is the partition function for all possible secondary structures for the given sequence, $R$ is the gas constant, and $T$ is the temperature. The probability of a secondary structure is also called the Boltzmann weight of that structure.

From the equation we can see that the lower the free energy of a structure, the higher its probability. Thus, the predictions within 5% of the MFE also represent the most probable structures for a given sequence, with the MFE prediction being the one with the highest probability.

We used the RNAsubopt algorithm [20] to sample 1000 suboptimal structures within 5% of the MFE for each considered intron. RNAsubopt first calculates all suboptimal structures within a user-defined energy range and then produces a random sample of structures, drawn with probabilities equal to their Boltzmann weights. Therefore, RNAsubopt computes a representative sample of the secondary structure space within 5% of the MFE.

Since the pair-wise structure comparison and distance estimation approach that we used for MFE structure predictions was not applicable to a large number of structures, we had to devise a new way to quantify the structural distance between the donor site and the branchpoint sequence. We designed an algorithm that converts an RNA secondary structure into a graph and then applies a shortest-path algorithm from graph theory to compute the shortest distance between two bases in the secondary structure. To the best of our knowledge this is the first algorithm for structural distance computation. More details are given in Materials and Methods.

For each secondary structure prediction, we computed the exact distance between the donor site and the branchpoint sequences (ds) using the shortest-path algorithm. The average structural branchpoint distances are given in Table 1. We assigned descriptive splicing efficiency labels based on the gel images in Figure 2A in [8]. The distributions of computed structural branchpoint distances for each of the RPS17B mutants are given in Figure 1.

These results suggest an interesting correlation between the average structural branchpoint distance and the splicing efficiency levels: sequences that are more efficiently spliced (wildtype, 3mUB1, 5mUB1, 8mUB1, 5mUB1_5mDB1, and 4mUB1) have lower values for the average distance than those that are poorly spliced. After assigning numerical values to the descriptive splicing efficiency labels (efficient = 1, slightly reduced = 2, reduced = 3 and inhibited = 4) we obtain a Pearson correlation coefficient of 0.87.

The histograms in Figure 1 offer further insights into the relationship between structural branchpoint distances of introns and their efficiency of splicing: introns that are spliced efficiently or with slightly reduced efficiency have a large frequency of suboptimal structures with ds < 10. Mutant 5mUB1_5mDB1, which does not have this prominent peak in its distribution histogram, and mutant 4mUB1, which has reduced but not completely inhibited splicing efficiency, still have a higher frequency of structures with ds < 20 than the remaining, poorly spliced mutants. The correlation coefficient between splicing efficiency level and the proportion of structures with ds < 20 is 0.85.

Finally, the cumulative distribution plot of structural branchpoint distances for all mutants, where lines are labeled according to the splicing efficiency levels (efficient: blue, slightly reduced: green, reduced: black and inhibited: red), shows a clear separation of spliced and unspliced mutants (Figure 2).

Upon closer inspection we noticed that most of the structures with ds < 10 have ds = 4.
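The graph-based distance computation described above can be illustrated with a minimal Python sketch. It assumes a dot-bracket structure string, 0-based positions, and unit cost for both backbone and base-pair edges; the function names are hypothetical, and the exact edge weights and endpoint conventions used in the paper are given in its Materials and Methods, so distances from this sketch need not match the reported ds values.

```python
from collections import deque


def pair_table(dot_bracket):
    """Map every paired position to its partner (0-based indices)."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def structural_distance(dot_bracket, a, b):
    """Shortest path between bases a and b in the structure graph.

    Nodes are bases; edges join backbone neighbours (i, i+1) and
    base-paired positions. Every edge costs 1 (an assumption of this
    sketch), so breadth-first search yields the shortest path.
    """
    pairs = pair_table(dot_bracket)
    n = len(dot_bracket)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        i = queue.popleft()
        if i == b:
            return dist[i]
        neighbours = [i - 1, i + 1]
        if i in pairs:
            neighbours.append(pairs[i])
        for j in neighbours:
            if 0 <= j < n and j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    raise ValueError("position out of range")


# A hairpin shortens the path between its two ends from 13 steps to 3.
print(structural_distance("..............", 0, 13))
print(structural_distance("((((....))))..", 0, 13))
```

With unit edge costs, an unstructured 14-nt sequence gives a distance equal to the linear distance, while the hairpin lets the path jump across the closing base-pair.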
Analysis of the secondary structures with ds = 4 reveals that this distance corresponds to a structural conformation where the donor and branchpoint sequences have two base-pairing interactions between them (see Section 2.1.3). The observed base-pairing interactions are not necessarily inconsistent with established models of the splicing process, according to which spliceosomal snRNAs interact with the donor site and the branchpoint sequence, since the base-pairing can be easily disrupted after the splicing factors have been aligned properly.

Structural branchpoint distances and the efficiency of splicing for other published RPS17B mutants

In order to test the generality of the observed correlation between splicing efficiency levels and structural branchpoint distances we also analyzed the RPS17B intron mutants described in [9]. These are mut-UB1i, which has an inverted UB1 sequence; mut-DB1i, which has an inverted DB1 sequence; mut-UB1iDB1i, which has both UB1 and DB1 sequences inverted to make them complementary to each other; mut-5, which reduces the consecutive pairing region to 5 base-pairs; mut-12, which improves pairing to 12 consecutive base-pairs (eliminating one one-nucleotide bulge); and mut-18, which extends pairing to 18 consecutive base-pairs (eliminating all three bulges in the pairing region). The authors compared splicing efficiency of the wildtype and mutant introns by analyzing the formation of spliceosomal complexes. Based on their gel images, we assigned descriptive and numerical splicing efficiency labels to the tested sequences (see Table 2). The average structural branchpoint distances of 1000 suboptimal structures sampled from within 5% of the MFE for each mutant are given in Table 2.

The branchpoint distance results for these mutants are similar to those of Libri et al.'s [8] mutants; the average structural branchpoint distances are lower for the sequences that are efficiently spliced (wildtype, mut-UB1iDB1i, mut-12, and mut-18). After assigning numerical values to the descriptive splicing efficiency labels (improved = 1, normal = 2 and reduced = 3), we obtain a correlation coefficient of 0.85. This, again, corresponds to the ability of these sequences to fold in such a way as to bring the donor site and the branchpoint sequences close to each other; each of the efficiently spliced sequences has a large fraction of predicted secondary structures for which ds < 10 (Figure 3 and Additional file 2). The mutants that show reduced splicing have very few of these structures (0.02% for mut-UB1i and 0.33% for mut-5), except for mut-DB1i, which has 11.3% of structures with ds < 10. However, this is still significantly lower than for the efficiently spliced mutants. Again, the cumulative distribution plot clearly separates mutants based on their splicing efficiency (Figure 3).

Table 1: Average structural branchpoint distances for the wildtype (wt) RPS17B intron and Libri et al.'s [8] intron mutants.

mutant         average ds   splicing efficiency
wt             26.67        efficient (1)
3mUB1          27.67        slightly reduced (2)
5mUB1          28.42        slightly reduced (2)
8mUB1          27.94        efficient (1)
3mDB1          37.55        inhibited (4)
5mDB1          39.19        inhibited (4)
3mUB1_3mDB1    37.44        inhibited (4)
5mUB1_5mDB1    33.81        slightly reduced (2)
6mUB1          46.31        inhibited (4)
4mUB1          32.08        reduced (3)

Levels of splicing efficiency were approximated from the gel images in Figure 2A in [8]. The numbers within parentheses correspond to numerical values assigned to descriptive splicing efficiency labels.

Figure 1: Distribution histograms of structural branchpoint distances for (a) wt, (b) 3mUB1, (c) 5mUB1, (d) 8mUB1, (e) 3mDB1, (f) 5mDB1, (g) 3mUB1_3mDB1, (h) 5mUB1_5mDB1, (i) 6mUB1, and (j) 4mUB1 introns. (Axes in each panel: branchpoint distance; fraction of structures.)

Base-pairing probabilities of the RPS17B intron and the efficiency of splicing

The branchpoint distance analysis of S. cerevisiae's RPS17B intron suggests that the ability to form highly probable secondary structures (within 5% of the MFE) with a short distance between the donor site and the branchpoint sequence seems to be required for efficient splicing of the intron. The short structural branchpoint distance for the RPS17B intron results from two base-pair interactions: between the first intron base (G) and the third base of the branchpoint sequence (C); and between the second base in the intron (U) and the second base of the branchpoint sequence (A) (see Figure 4).
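As a worked check of the correlation reported for Libri et al.'s mutants, the Pearson coefficient can be recomputed from the average ds values and numerical efficiency labels listed in Table 1. The sketch below is illustrative only: the `pearson` helper and variable names are not from the paper, and it simply applies the standard formula to the tabulated values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Table 1, in row order: wt, 3mUB1, 5mUB1, 8mUB1, 3mDB1, 5mDB1,
# 3mUB1_3mDB1, 5mUB1_5mDB1, 6mUB1, 4mUB1.
average_ds = [26.67, 27.67, 28.42, 27.94, 37.55,
              39.19, 37.44, 33.81, 46.31, 32.08]
# efficient = 1, slightly reduced = 2, reduced = 3, inhibited = 4
efficiency_label = [1, 2, 2, 1, 4, 4, 4, 2, 4, 3]

r = pearson(average_ds, efficiency_label)
print(f"r = {r:.2f}")
```

Because larger labels denote poorer splicing, a positive coefficient here means that longer average structural branchpoint distances go with less efficient splicing.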
It is possible to compute the probability of the two donor-branchpoint base-pairing interactions directly using a dynamic programming algorithm that computes the partition function [21]. The base-pair probability is the sum of the probabilities of all structures in which the chosen base-pair occurs. Thus, these base-pairing probabilities also take into account the structures that were not within 5% of the MFE, eliminating the necessity to choose an arbitrary percent suboptimality value. The base-pair probabilities can be computed using RNAfold [22], another frequently used program for RNA secondary structure prediction.

The base-pair probability values for the wildtype RPS17B intron and all of Libri et al.'s [8] mutants are given in Table 3. The probability values for the two base-pairs (G-C and U-A) are identical up to the second decimal place for each intron sequence, which is why only one number is shown in the table. It can be observed that all of the efficiently spliced sequences have higher base-pair probabilities than the poorly spliced sequences (r = -0.92). The correlation is not strictly linear since, for example, the mutant sequence 8mUB1 has almost the same base-pair probability value as 3mUB1 and 5mUB1, although it is more efficiently spliced than these two. Similarly, the double mutant 5mUB1_5mDB1 is more efficiently spliced than 4mUB1, but this is not reflected in the base-pair probability values.

Figure 2: Cumulative distributions of structural branchpoint distances for all Libri et al.'s [8] intron mutants. (Axes: branchpoint distance; cumulative probability.)

Table 2: Average structural branchpoint distances for the wildtype (wt) RPS17B intron and Charpentier and Rosbash's [9] intron mutants.

mutant          average ds   splicing efficiency
wt              26.67        normal (2)
mut-UB1i        42.51        reduced (3)
mut-DB1i        35.95        reduced (3)
mut-UB1iDB1i    26.39        improved (1)
mut-5           32.14        reduced (3)
mut-12          24.82        improved (1)
mut-18          25.30        improved (1)

We inferred levels of splicing efficiency based on Figures 2 and 3 and Table 1 in [9]. The numbers within parentheses correspond to numerical values assigned to descriptive splicing efficiency labels.

For Charpentier and Rosbash's mutants, the base-pair probabilities are also higher for the sequences that are more efficiently spliced (Table 4): all of the sequences that are efficiently spliced (wildtype, mut-UB1iDB1i, mut-12, and mut-18) have base-pair probabilities of 0.40, while the other sequences have lower values (r = -0.85).

Overall, based on the results for Libri et al.'s [8] and Charpentier and Rosbash's [9] mutants it seems that, at least for the RPS17B intron, base-pair probabilities for the two base-pairs formed between the first two bases of the intron and the second and third base of the branchpoint sequence are good indicators of splicing efficiency. We will see in the following sections that this is not a general requirement for all genes. Taken together with the observed correlation between the splicing efficiency levels and structural branchpoint distances, the results are consistent with the following hypothesis: the existence of highly probable secondary structures that have a short branchpoint distance is required for efficient splicing of yeast introns.

Experimental testing of the hypothesis

In order to test the validity of the proposed hypothesis, we designed and functionally tested in vivo a series of RPS17B intron mutants. To assay the effect of these mutations on splicing we opted to introduce the mutated intron sequences at their endogenous locus, instead of within the CUP1 gene as was previously done [8,9]. This allows us to analyze the splicing of this intron within its normal context of flanking DNA sequences. We estimated the splicing efficiency directly from protein expression levels, which were quantified using a fluorescence imaging system.

Figure 3: Cumulative distributions of structural branchpoint distances for all Charpentier and Rosbash's [9] intron mutants. (Axes: branchpoint distance; cumulative probability.)

Using protein expression as a measurement of splicing efficiency requires that: 1) the level of protein abundance is proportional to the mRNA abundance (for a given gene) in the cell and, 2) the abundance of mRNA in the cell reflects any change in splicing efficiency. To demonstrate that RPS17B follows these general rules, we measured protein expression levels for a number of Libri et al.'s [8] mutants whose changes in mRNA levels had been previously documented. The sequences tested were the wildtype RPS17B intron, and the 5mUB1, 3mUB1, 8mUB1, 5mDB1, and 3mDB1 mutated introns. The levels of protein expression, as shown in Figure 5, are proportional to the levels of copper-resistance in the copper growth assay in [8]. Moreover, our approach is able to provide a quantifiable measure for mutants such as 3mDB1 and 5mDB1, which did not support any growth in the copper growth assay. Thus, using changes in protein expression levels in the context of different intron sequences to assay the effects of mutations on splicing efficiency is a valid approach.

New RPS17B intron mutants

We designed 8 new RPS17B intron mutants for the purpose of testing our current model of correlation between intronic pre-mRNA secondary structure and splicing efficiency. The most important structural characteristic used for mutant design was the structural branchpoint distance (ds) of the MFE and suboptimal structures of each mutant. Four mutants that are predicted to splice efficiently were designed to have multiple suboptimal structures with contact conformation (Figure 4) and a short average structural branchpoint distance (these mutants are labeled with the letter 'S', which stands for short ds). The only exception is mutant rps17b-S2, which does not have any suboptimal structures with contact conformation, but still exhibits a short structural branchpoint distance (most of the suboptimal predictions have ds = 10). This mutant was designed to test whether contact conformation, rather than the resulting short structural branchpoint distance, is important for splicing. Four mutants that are predicted to have reduced splicing were designed not to have any structures with contact conformation or otherwise short structural branchpoint distances (these mutants are labeled with the letter 'L', which stands for long ds).

The mutant design was based on mfold predictions, while RNAsubopt predictions were used post-experimentally to analyze the results. Mfold also samples the suboptimal space of secondary structures; however, it does not compute all possible structures and the sample is much smaller. Although the distribution of ds computed based on structure predictions by mfold is similar to the one based on RNAsubopt predictions, the average distances for RNAsubopt predictions are not as distinct between 'S' and 'L' mutants as the ones based on mfold predictions. Table 5 shows average ds for newly designed mutants based on RNAsubopt predictions and base-pair probabilities computed by RNAfold. The analogous table based on mfold suboptimal predictions, which was used in the design process, is given in Additional file 3.

As seen in Figure 6, mutants rps17b-L1, rps17b-L2 and rps17b-L4 have reduced protein expression levels when compared to the wildtype, as expected. Mutant rps17b-L3 has reduced splicing efficiency but not as much as the other three mutants with long structural branchpoint distances. As previously explained, this mutant was designed to have reduced splicing based on suboptimal predictions by mfold, which failed to predict any structures with ds < 10. However, RNAsubopt, which does a more rigorous sampling of the suboptimal space, detected a small fraction of suboptimal structures that have ds < 10 (see Additional file 4). This is in agreement with the relatively high probability of base-pairing interaction between the donor site and the branchpoint sequence (0.21).

Figure 4: A part of the wildtype RPS17B intron secondary structure that shows base-pairing between the donor site and the branchpoint sequence. The highlighted stem is the same as the one identified in [9] using experimental structure probing. (Figure labels: 5', 3', donor site, branchpoint sequence, rest of intron, UB1, DB1, stem confirmed by structural probing.)

Mutants rps17b-S1, rps17b-S2, and rps17b-S3 are all spliced efficiently, as predicted. The efficient splicing of mutant rps17b-S2, which has a short structural branchpoint distance (ds = 10) without contact conformation in many of the predicted structures, suggests that a specific structural arrangement between the donor site and the branchpoint sequence is not required for efficient splicing. Mutant rps17b-S4 shows reduced levels of protein abundance, which is in disagreement with our prediction.
The mutated sequence for this mutant has the same location as the mutated sequence for the mutant rps17b-S3, which is efficiently spliced; thus we can exclude the possibility that the discrepancy in splicing is sequence-based. A possible explanation for this phenomenon may be the existence of a very thermodynamically stable stem (with free energy ΔG = -36.6 kcal/mol) that holds the 5' splice site and the branchpoint together (analogous stems in wildtype introns have much higher free energy, see Section 2.3). This stem may be too stable to be disrupted, which might prevent the spliceosome from binding to the splice signals [8]. Overall, the results on the new RPS17B intron mutants are consistent with the proposed model of the role of intronic secondary structure in gene splicing in yeast.

Selecting additional genes for experimental validation

To further validate our hypothesis regarding the role of intron secondary structure in splicing, we selected additional intron-containing yeast genes to test our model. The selection criteria were: the linear distance (number of nucleotides) between the donor site and the branchpoint sequence is greater than 200 nt (5'L introns); the intron does not contain an snRNA gene; the gene is not essential (i.e., cells are viable if the gene is mutated or deleted); and the protein product has relatively high abundance in the cell, is amenable to C-terminal tagging, and has a molecular weight between 20 and 120 kDa (to facilitate manipulation). From our initial dataset of 98 yeast genes that contain 5'L introns (see Materials and Methods), 18 genes matched the selection criteria (17 of these were ribosomal protein genes). We selected two of these for the experiments: the ribosomal protein gene RPS6B (YBR181C) and the aminopeptidase gene APE2 (YKL157W).

The RPS6B gene contains one intron of length 352 nt, with a linear branchpoint distance (the distance between the 5' splice site and the branchpoint sequence) of d = 329 nt.
The computed structural branchpoint distance (ds) is 18 for the MFE and all the suboptimal computationally predicted secondary structures within 5% of the MFE. Thus for this intron, unlike for the RPS17B intron, the donor and branchpoint sequences are not base-paired.

The APE2 gene contains one intron of length 383 nt, with a linear branchpoint distance of d = 327 nt. One of the suboptimal structures within 5% of the MFE has a structural branchpoint distance of 6 and the others have greater distances. In the suboptimal prediction that has ds = 6 there are no base-pairing interactions between the donor and branchpoint sequences.

RPS6B intron mutants

We designed intron mutants for the RPS6B gene in a similar manner as for the RPS17B gene: the mutants that are supposed to have efficient splicing were designed to have structural branchpoint distances similar to those of the wildtype intron, and the mutants that are supposed to have reduced splicing were designed to have longer distances (see Additional file 5). Table 6 shows average structural branchpoint distances for a sample of 1000 suboptimal predictions within 5% of the MFE and the probability of short branchpoint distance derived from the base-pairing probabilities. The reported probability is the highest base-pair probability between the first donor nucleotide and any nucleotide within 20 bases of the branchpoint adenosine.
This guarantees that the branchpoint distance in a secondary structure that contains that base-pair will be no longer than 20.

From Figure 7 we can see that all of the 'S' mutants, which have structural branchpoint distances similar to the wildtype intron, are expressed at levels similar to the wildtype. Mutant rps6b-L1, which has avg(ds) = 40, shows a reduction in splicing efficiency. The probability of ds < 20 also correlates well with the protein expression data except for mutant rps6b-S5, for which ds > 20 for all suboptimal predictions. Thus, for the RPS6B gene, structural branchpoint distances slightly longer than 20 seem to be still optimal for splicing. To summarize, the protein expression data for the RPS6B gene containing designed intron mutants are compatible with our proposed model of splicing efficiency dependence on the structural branchpoint distance.

Table 3: Base-pairing probabilities of contact conformation (Figure 4) for the wildtype (wt) RPS17B intron and Libri et al.'s [8] intron mutants.

mutant         base-pairing probability   splicing efficiency
wt             0.40                       efficient
3mUB1          0.33                       slightly reduced
5mUB1          0.31                       slightly reduced
8mUB1          0.34                       efficient
3mDB1          0.01                       inhibited
5mDB1          < 0.01                     inhibited
3mUB1_3mDB1    0.01                       inhibited
5mUB1_5mDB1    0.11                       slightly reduced
6mUB1          0.05                       inhibited
4mUB1          0.18                       reduced

Table 4: Base-pairing probabilities of contact conformation for the wildtype (wt) RPS17B intron and Charpentier and Rosbash's [9] intron mutants.

mutant         base-pairing probability   splicing efficiency
wt             0.40                       normal
mut-UB1i       0.04                       reduced
mut-DB1i       0.25                       reduced
mut-UB1iDB1i   0.40                       improved
mut-5          0.04                       reduced
mut-12         0.40                       improved
mut-18         0.40                       improved

APE2 intron mutants

Using the same selection criteria as before, we designed seven APE2 intron mutants.
The values for average ds and the probabilities of structural branchpoint distance shorter than 20 are given in Table 7, and the histograms of structural branchpoint distance distributions are given in Additional file 6.

The experimental results are consistent with our prediction for five out of seven mutants: mutants ape2-S1, ape2-S2, ape2-S3 and ape2-S5 all have a level of protein abundance similar to the wildtype (Figure 8) and mutant ape2-L1 shows significantly reduced expression as expected. Mutant ape2-L2, which was expected to have reduced protein abundance as a consequence of reduced splicing efficiency, is expressed at the same level as the wildtype. Also, mutant ape2-S4 has reduced splicing despite the fact that it has a similar distribution of structural branchpoint distances as the wildtype intron. Since this mutant has the mutation at the same location as ape2-L1 (see Materials and Methods), it is possible that the intron segment that we mutated was important for splicing (e.g., contained a splicing enhancer). Overall, the results for APE2 mutants support our hypothesis of the role of structural branchpoint distance in gene splicing.

Figure 5: Protein expression levels for the RPS17B gene containing some of Libri et al.'s [8] mutant introns. Expression levels are normalized with respect to the internal loading control and plotted as a fraction of the wildtype expression level. Shaded boxes represent the mean value for several different samples and error bars represent +/- 1 standard deviation for these samples. The error bar for the wildtype intron comes from the comparison of two different wildtype samples. (Plot: x-axis, intron inserted into the RPS17B gene: wt, 3mUB1, 5mUB1, 8mUB1, 3mDB1, 5mDB1; y-axis, normalized level of protein expression.)

Shortening of branchpoint distances by zipper stems

The splicing efficiency study of RPS17B, RPS6B and APE2 genes containing wildtype and mutant introns supports our hypothesis that short structural branchpoint distances are required for efficient splicing. Although these distances are computed in the context of the secondary structure of the entire intron, our hypothesis is still consistent with the original hypothesis [3] that attributes the shortening of a long branchpoint distance to a single stem. Such stems, which we will refer to as 'zipper' stems, since they 'zip' the intron, are probably essential for achieving a short structural branchpoint distance. If we analyze the computed secondary structures of the RPS17B, RPS6B and APE2 wildtype introns we can easily identify stable stems whose 3' and 5' constituents are close to the donor site and the branchpoint sequence (Figure 9). The zipper stem labeled in the RPS17B intron is the same as the one identified in [9] using experimental structure probing.

To further test the functional importance of the identified zipper stems we performed comparative structure analysis using several closely related yeast species (S. paradoxus, S. mikatae, and S. bayanus, as well as S. cerevisiae, all belonging to the Saccharomyces sensu stricto group). We used multiple sequence alignments to extract the orthologous intron sequences for our three genes [23,24]. Both RPS17B and RPS6B intron alignments contain three sensu stricto sequences. The multiple sequence alignment for APE2 contains all four sequences; however, these are not intronic sequences but sequences from exon 2 of the APE2 gene. This error is due to the old S.
cerevisiae annotation which mapped two genes to the location of the current APE2 gene [25].

We computed the consensus structure of RPS17B and RPS6B introns using Alifold [26]. The previously indicated zipper stems were found in the consensus structures for both genes (Figure 9), thus suggesting evolutionary conservation of these structural elements.

Table 5: Characteristics of newly designed RPS17B mutants.

mutant       avg(ds)   bp prob
wt           26.67     0.40
rps17b-L1    43.63     0.0
rps17b-L2    41.11     0.0
rps17b-L3    34.05     0.21
rps17b-L4    32.98     0.04
rps17b-S1    24.55     0.40
rps17b-S2    29.62     0.03
rps17b-S3    12.65     0.80
rps17b-S4    9.27      0.70

avg(ds): average structural branchpoint distances of 1000 suboptimal structures predicted by RNAsubopt; bp prob: base-pairing probability of interaction between the donor site and the branchpoint sequence based on the partition function.

Figure 6: Protein expression results for the RPS17B gene containing the newly designed mutant introns. (Plot: x-axis, intron inserted into the RPS17B gene: wt, L1, L2, L3, L4, S1, S2, S3, S4; y-axis, normalized level of protein expression.)

Discussion

The hypothesis that secondary structure interactions within yeast introns are needed for efficient splicing was proposed two decades ago [3]. Since then, experimental evidence in support of this hypothesis was found for several of S. cerevisiae's introns [6-11].
These studies identified complementary segments located downstream of the donor site and upstream of the branchpoint sequence whose base-pairing interactions are essential for splicing. It is conjectured that the function of the formed stem is to bring the donor site and the branchpoint sequence closer together so that they are in optimal alignment for spliceosome assembly.

In this paper we use computational RNA secondary structure prediction to study structural requirements for efficient splicing in yeast. Our approach considers a representative sample of suboptimal structures with free energies close to the MFE and it also considers the entire secondary structure of an intron, rather than a single stem, both of which are more consistent with the nature of RNA molecules. Furthermore, the approach includes a calculation of the structural branchpoint distance, which is used to quantify the effect of the secondary structure on the distance between the donor site and the branchpoint sequence and can easily be correlated with splicing efficiency measurements. Using this method we were able to identify structural characteristics of the RPS17B intron and its mutants that seem to be responsible for their splicing differences. Notably, mutants that are likely to have a short structural branchpoint distance are spliced more efficiently.

Based on our model of structural requirements for efficient splicing we computationally designed intron mutants for three S. cerevisiae genes, RPS17B, RPS6B and APE2, and experimentally tested their splicing efficiency. The results were mostly consistent with our model, with a few exceptions (rps17b-L3, rps17b-S4, ape2-L2 and ape2-S4) which may be due to some structural characteristics of mutants that are not considered by the current model or some inherent approximations in the model that are discussed below.
Some of the intron mutants that were designed to have different structural characteristics and splicing efficiencies have mutations at the same locations (e.g., rps17b-L3 and 8mUB1; rps17b-S3 and 3mDB1; rps6b-L1 and rps6b-S3). The experimental results that confirm differences in splicing between these pairs of mutants indicate that the secondary structure of a pre-mRNA, rather than the underlying primary sequence, is responsible for differences in splicing.

We also tested our model on the YRA1 gene intron, whose splicing efficiency had previously been studied by Preker and Guthrie [27]. The published experimental results were in agreement with our model; the efficiently spliced mutants (ΔL10 and ΔTCC/GGA) had higher base-pair probabilities than the poorly spliced sequences (wildtype intron and mutants ΔR/L10, TCC ΔL10, GGA ΔL10 and TCC+GGA ΔL10) (data not shown).

Our current model is simplified in the sense that the secondary structure of an intron is computed disregarding its flanking sequences, and the three-dimensional branchpoint distance is estimated from secondary structure interactions. However, we believe that folding intronic sequences in isolation is appropriate, partly because of the existence of co-transcriptional splicing, where splicing occurs before the entire pre-mRNA has been synthesized [28-30]. Therefore, the precise part of the pre-mRNA that serves as the splicing substrate is not known. The region upstream of the transcribed intron, which consists of the 5' UTR and the first exon, is also not precisely defined due to the fact that the transcription start sites have not been unambiguously mapped [31]. In addition, 5' UTRs are known to associate with a number of protein factors [32,33] which are likely to have an effect on the structure formation, but these interactions are not currently modelled by computational RNA secondary structure approaches.
A preliminary investigation, in which we considered some of the upstream region, yielded inconclusive results (data not shown). Thus, we believe that folding only intronic sequences gives us a reasonable approximation of the secondary structure of an intron at the time of the splicing reaction.

Table 6: Characteristics of newly designed RPS6B mutants.

mutant      avg(ds)   bp prob
wt          18.06     0.84
rps6b-L1    36.74     0
rps6b-S1    19.08     0.65
rps6b-S2    18.04     0.84
rps6b-S3    18.04     0.83
rps6b-S4    18.09     0.84
rps6b-S5    22.00     0

avg(ds): average structural branchpoint distance; bp prob: probability of ds < 20.

The approximation of the three-dimensional branchpoint distance using pre-mRNA secondary structure is necessary since there are no reasonably reliable algorithms for predicting RNA tertiary structure. However, it is believed that RNA secondary structure plays a crucial role in tertiary structure formation, since most tertiary interactions are thought to arise after the formation of a stable secondary structure, when the molecule is able to bend around the flexible, single-stranded regions [34,35]. Moreover, the tertiary structure interactions that arise in the later stages of folding are usually too weak to disrupt secondary structure that has already formed. Therefore, we believe that the structural branchpoint distance based on the secondary structure interactions provides a reasonable approximation of the true spatial distance.

Conclusion

Our computational study offers further insights into the role of pre-mRNA secondary structure in gene splicing in yeast. We show that it is necessary to consider near-optimal structure predictions to be able to detect structural differences between intron mutants that have different splicing efficiencies.
We also propose a novel method for quantifying a distance between two bases in an RNA secondary structure and apply this to compute structural branchpoint distances in the studied intron mutants. Positive experimental results on three different yeast genes suggest that our model of structural requirements for efficient splicing can be applied universally to all 5'L yeast introns. Additional laboratory experiments are needed to refine the current model by determining the upper bound of the structural branchpoint distance needed for efficient splicing and acceptable thermodynamic stability of the stems adjacent to splicing signals. Considering that several biological studies indicate that shortening of the branchpoint distance, either by formation of secondary structure or by protein interactions, is important for efficient splicing in Drosophila melanogaster and some mammalian species [12,13], it might be possible to extend our model to define structural requirements for efficient splicing in other eukaryotes.

Figure 7: Protein expression results for the RPS6B gene containing the newly designed mutant introns. (Plot: x-axis, intron inserted into the RPS6B gene: wt, L1, S1, S2, S3, S4, S5; y-axis, normalized level of protein expression.)

Table 7: Characteristics of newly designed APE2 mutants.

mutant     avg(ds)   bp prob
wt         27.90     0.37
ape2-L1    75.73     0
ape2-L2    69.68     0
ape2-S1    8.93      0.82
ape2-S2    23.33     0.50
ape2-S3    24.60     0.45
ape2-S4    25.14     0.42
ape2-S5    4.10      0.99

avg(ds): average structural branchpoint distance; bp prob: probability of ds < 20.

Another possible application of our findings is in gene-finding, where structural characteristics of identified long introns can be used to distinguish between real and false positive predictions.

Figure 8: Protein expression results for the APE2 gene containing the newly designed mutant introns. Protein expression level is normalized with respect to wildtype expression level. Shaded boxes represent the mean value for several different samples and error bars represent +/- 1 standard deviation for these samples. (Plot: x-axis, intron inserted into the APE2 gene: wt, L1, L2, S1, S2, S3, S4, S5; y-axis, normalized level of protein expression.)

Methods

Computational RNA secondary structure prediction

In this work we used four different RNA secondary structure prediction tools: mfold [14,15], RNAsubopt [20], RNAfold [22] and Alifold [26].

Mfold was used for predicting MFE secondary structures for Libri et al.'s [8] mutants and for predicting suboptimal structures within 5% of the MFE during the mutant design process. Mfold uses dynamic programming to identify the MFE secondary structure and a set of suboptimal structures within a user-defined percentage from the MFE for a given RNA sequence. We used both the web (3.2) and command line (3.0) versions of mfold with default parameters.

RNAsubopt was used to compute a sample of 1000 suboptimal structures within 5% of the MFE. Unlike mfold, it computes all suboptimal secondary structures within a user-defined energy range or percentage from the MFE for a given RNA sequence. It can also draw a random sample of the computed suboptimal structures using their Boltzmann weights. We used the command line version of RNAsubopt with options "-ep 5 -p 1000 -noLP", which specify the percentage from the MFE (5%), random sample size (1000) and disable prediction of helices of length 1.

RNAfold was used to compute the partition function and base-pair probabilities. RNAfold uses dynamic programming to compute the MFE secondary structure of a given RNA sequence, but when run with option '-p' it also computes base-pair probabilities.

Alifold was used to compute consensus secondary structure for RPS17B and RPS6B introns based on the alignment of introns in sensu stricto species. It uses modified dynamic programming algorithms that add a covariance term to the standard energy model to compute a consensus secondary structure for a set of aligned RNAs.

RNAsubopt, RNAfold and Alifold are part of the Vienna RNA secondary structure package [22] (we used version 1.7). All four algorithms use free energy calculation based on Turner's nearest neighbour energy model [15,36-38].

Distance calculation in an RNA secondary structure

Calculating the spatial distance between two nucleotides in a folded RNA molecule requires knowledge of the tertiary structure of the molecule. Since currently there are no reasonably reliable algorithms for predicting RNA tertiary structure, our distance calculation is based solely on RNA secondary structure. Considering that secondary structure is generally believed to play a crucial role in tertiary structure formation [34,35], this approach should give us a good approximation of the true spatial distance.

To calculate the structural branchpoint distance ds, we consider a predicted secondary structure of the intronic pre-mRNA as an undirected graph whose vertices are nucleotide bases and whose edges correspond to the bonds between the nucleotides. These bonds can be either sugar-phosphate bonds between the nucleotides in the RNA chain or the hydrogen bonds between paired bases in a given RNA secondary structure. Figure 10 shows the conversion from an RNA secondary structure to the secondary structure graph representing it.
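The graph representation described above can be illustrated with a short sketch. The following Python code is not the authors' program; it is a minimal example, under stated assumptions, of how a structural distance could be computed from a pseudoknot-free structure in dot-bracket notation. The function names (pair_table, structural_distance, average_distance) and the 1-based position convention are assumptions made for this illustration. Because every edge in the graph has the same weight, the sketch uses breadth-first search, which on such a graph returns the same shortest-path lengths as a general weighted shortest-path algorithm; the distance is counted as the number of edges on the path.

```python
from collections import deque


def pair_table(structure):
    """Return a dict mapping each paired position (0-based) to its partner.

    Assumes a pseudoknot-free structure written with '(', ')' and '.'.
    """
    stack = []
    partner = {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % (i + 1))
            j = stack.pop()
            partner[i] = j
            partner[j] = i
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return partner


def structural_distance(structure, source, target):
    """Shortest path length (in edges) between two 1-based positions.

    Vertices are nucleotides; edges are backbone bonds (i, i+1) and
    base pairs taken from the dot-bracket string. All edges have weight 1,
    so breadth-first search finds the shortest path.
    """
    n = len(structure)
    s, t = source - 1, target - 1
    if not (0 <= s < n and 0 <= t < n):
        raise ValueError("positions must lie within the structure")
    partner = pair_table(structure)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        neighbours = [u - 1, u + 1]          # backbone neighbours
        if u in partner:
            neighbours.append(partner[u])    # hydrogen-bonded partner
        for v in neighbours:
            if 0 <= v < n and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise RuntimeError("target not reachable")  # cannot occur on a single chain


def average_distance(structures, source, target):
    """Mean structural distance over a collection of predicted structures."""
    distances = [structural_distance(s, source, target) for s in structures]
    return sum(distances) / len(distances)
```

As a toy example, in the structure ((((....)))) the first and last bases are paired, so the distance between positions 1 and 12 is 1, whereas in a fully unpaired 12-nt chain it is 11. Averaging such distances over a sample of suboptimal structures corresponds in spirit to the avg(ds) values reported in the tables, although the numbers in the paper were produced by the authors' own program.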
To compute the distance between two vertices in the graph, we employed Dijkstra's shortest-path algorithm [39]. Since Dijkstra's algorithm requires a directed graph, we represent each non-directed edge (u, v) as two directed edges, (u, v) and (v, u). All edges in the RNA secondary structure graph have uniform weight w(u, v) = 1.

In our implementation of the algorithm, the inputs to the program are a pseudoknot-free RNA secondary structure in dot-bracket notation (Vienna format) and the locations of two bases for which the distance needs to be calculated. These bases are the first nucleotide of the intron and the bulging A in the branchpoint sequence (UACUAAC). The output of the program is the shortest distance between these two bases, which we consider as the structural branchpoint distance (ds) for the given intron secondary structure. The program is available at http://cs.ubc.ca/~rogic/splicing.html.

Figure 9: Portions of the RPS17B, RPS6B and APE2 introns containing computationally identified zipper stems. The free energy values (ΔG) for the shaded zipper stem are given in parentheses. Stems conserved between Saccharomyces sensu stricto group are also labeled.

Figure 10: Conversion from the RNA secondary structure to the graph representing it. (a) Graphical representation of the secondary structure of an intron produced by mfold (filled-in circles represent base-pairing interactions, i.e., hydrogen bonds). (b) Graph representing the RNA structure in (a). The bolded path between the source and target vertices is the one found by the algorithm to be the shortest (ds = 11).

Mutant sequences

We used two basic strategies for designing intron mutants with desired structural characteristics. To obtain mutants with long structural branchpoint distances we aimed to disrupt a zipper stem that was bringing the donor site and the branchpoint sequence close together in the wildtype intron. Conversely, for the mutants designed to have efficient splicing we aimed to stabilize the zipper stem found in the wildtype intron. With these strategies in mind, we used a combination of a trial-and-error approach and secondary structure designs computed by RNA Designer [40] to obtain mutant sequences with desired structural characteristics.

Most of the intron mutants that we designed have segment substitutions around 20–30 nt long. Sequence segments of this size allowed us to rearrange the secondary structure of a mutant in a desired way. The exception is mutant rps6b-S5, which has three short insertions (8 nt in total) in the polypyrimidine tract of the RPS6B intron. Mutant rps17b-L3 is a result of two 3-nucleotide-segment substitutions in Libri et al.'s [8] mutant 8mUB1 (the middle sequence of lower case letters represents the original 8mUB1 mutation). Similarly, mutant rps17b-S3 is a result of a 4-nucleotide-segment substitution in the 3mDB1 mutant (the first segment of lower case letters represents the original 3mDB1 mutation). Table 8 gives the location and sequence of mutant substitutions and Figure 11 depicts mutant locations with respect to the secondary structure of the introns we studied.

Table 8: Specifications for the new intron mutants used in our study.

mutant      segment location   original sequence                        substitution/insertion
rps17b-L1   258–268 (11 nt)    UGAAGAGAGGU                              augagacaacu
rps17b-L2   138–157 (20 nt)    GAUUAGAAAACUCCAUUACU                     cuuaaguuaguaaauaccuc
rps17b-L3   22–47 (26 nt)      UGAAGCCGGAUAUGAUGGACUGGGC                uuaAGCCGcuacuacuUGGACUGucg
rps17b-L4   167–189 (23 nt)    AGAAGAGCGCUCAAUGAAGUAGU                  uggcuuggguuaguaggugccuc
rps17b-S1   217–231 (15 nt)    AAUUGCUUUCGAAUG                          uuucauguguucagc
rps17b-S2   280–286 (7 nt)     UAAGUUG                                  uacguac
rps17b-S3   246–253 (8 nt)     AUCCAAUG                                 uagCggcu
rps17b-S4   244–253 (10 nt)    UUAUCCAAUG                               cuucaucaac
rps6b-L1    21–54 (34 nt)      CCUUAGAAUUCUAAUGAAUCAGCACGCGCUAACC       guauuuugggugugucccuguuauaaauaauacc
rps6b-S1    19–29 (11 nt)      AUCCUUAGAAU                              uuuguuaguaa
rps6b-S2    87–113 (27 nt)     CACAAAUUAGUGCACUAUAAUAAAAAU              uuauaaauagugauaccauuugguaaa
rps6b-S3    21–57 (37 nt)      CCUUAGAAUUCUAAUGAAUCAGCACGCGCUAACCGGC    aaauuccaacguuucccugcaacaugccuuucuuccg
rps6b-S4    38–55 (18 nt)      AUCAGCACGCGCUAACCG                       auucccaacagacugucc
rps6b-S5    337–345            GUAUUAUUU                                GgUguucAUUAUUacaU
ape2-L1     159–175 (17 nt)    UGUUACCCUCAUAUUCU                        ggguacaauuaauagag
ape2-L2     237–252 (16 nt)    GCAAUAGCUUAGGUAA                         ccuucguacuuuuggg
ape2-S1     23–37 (15 nt)      CAAAGAAACAAGGAA                          agggcagaaauagaa
ape2-S2     43–57 (15 nt)      AUACAUAAUAUAAAU                          aacugguagguacgu
ape2-S3     237–252 (16 nt)    GCAAUAGCUUAGGUAA                         caaugaaugagaacuc
ape2-S4     159–175 (17 nt)    UGUUACCCUCAUAUUCU                        aaauauuaccuaagcua
ape2-S5     300–322 (23 nt)    CUCGUUACCGACCUUUGAGUUCU                  uuaagcuuuuguguuugagaaca

The upper case letters represent the original sequences and the lower case letters represent substitution or insertion sequences. The first base of an intron is numbered 1.

Figure 11: Location of mutations with respect to the secondary structure for (a) RPS17B, (b) RPS6B, and (c) APE2 introns. The two lines for each mutant indicate the beginning and end of the sequence segment that was modified.

Generation and assaying of intron mutants

Using the TRP1 gene as a selectable marker, RPS17B, RPS6B and APE2 were tagged at their genomic locus with a -13MYC fragment to generate C-terminal protein fusions in yeast strains derived from an S288C background [41]. Western blotting with a MYC antibody (Covance Research Products) confirmed expression of the correct size protein product in each strain. The intron of the selected gene plus 5' and 3' flanking sequences were deleted through homologous recombination with the URA3 selectable marker in each of these tagged strains. Intron DNA containing sequences homologous to regions 5' and 3' of the URA3 insertion plus the selected intron mutations were created by PCR. Transformation of these fragments into the appropriate intron deletion strain results in recombination, removal of the URA3 gene, and insertion of the mutant intron sequence. The URA3 gene product leads to cell death when placed on 5-fluoroorotic acid (5-FOA) due to the conversion of 5-FOA to a toxic by-product [42]. After transformation, cells can be selected on 5-FOA for those strains that have lost URA3 via insertion of the mutant intron, and thus can grow in the presence of 5-FOA. PCR was used to confirm that 5-FOA resistant strains were the result of insertion of the mutated intron in place of URA3. Each intron mutation was subsequently confirmed by sequencing. Strains containing the correct intron mutations were mated with a strain carrying a 13 MYC epitope tagged protein of a different molecular weight as an internal control and assayed for protein expression levels by western blotting. Western blotting was performed using 20–200 ng of whole cell lysate with a MYC antibody (Covance Research Products) and was quantitated after being developed with ECL Plus Western Blotting Detection reagent (Amersham Bioscience) using a Storm Imaging system (Amersham Bioscience).
For each mutant assayed, the internal control was used to normalize protein loading, and the experiments were performed a minimum of 2 times on two independently derived mutant isolates.

Yeast intron dataset

In order to obtain a high quality yeast intron dataset we consulted three databases: the Ares lab Yeast Intron Database [43], the Yeast Intron DataBase [44], and the Comprehensive Yeast Genome Database [45]. For additional information, we used the Saccharomyces Genome Database (SGD) [46]. We constructed our dataset by including introns that have consistent annotations between at least two of the three databases. We considered only introns from single-intron genes (which represent the majority of intron-containing genes in S. cerevisiae) that interrupt the gene's coding region (this excluded introns found in the 5' UTR region). The number of introns found to have a consistent annotation between at least two databases was 214 (there are ~240 introns in the yeast genome). Eleven of these were excluded because they were not supported by the latest comparative genomic study [23], which labeled them as possible misannotations. The final dataset contains 203 yeast introns, 155 of which are experimentally verified and 48 of which are putative introns. There are 98 long (5'L) and 105 short (5'S) introns. We call this dataset the STRuctural INtron (STRIN) dataset. The STRIN dataset is available at http://cs.ubc.ca/~rogic/splicing.html.

Authors' contributions

SR conceived the study, performed computational experiments and drafted the manuscript. HHH and AKM participated in the design and coordination of the study and together with BFO supervised the research project. BM designed and performed laboratory experiments and helped draft the manuscript. PH helped design laboratory experiments. All authors analyzed the results and reviewed drafts of the manuscript.
All authors read and approved the final manuscript.

Additional material

Additional file 1
Minimum free energy structures for Libri et al.'s [8] mutants predicted by mfold.
[http://www.biomedcentral.com/content/supplementary/1471-2164-9-355-S1.pdf]

Additional file 2
Distribution histograms of structural branchpoint distances for (a) wt, (b) UB1i, (c) DB1i, (d) UB1iDB1i, (e) mut-5, (f) mut-12, and (g) mut-18 introns.
[http://www.biomedcentral.com/content/supplementary/1471-2164-9-355-S2.pdf]

Additional file 3
Structural characteristics of newly designed RPS17B mutants based on mfold predictions. ds: structural branchpoint distances for MFE and all suboptimal predictions within 5% from the MFE; avg: average ds; bp prob: base-pairing probability of interaction between the donor site and the branchpoint sequence based on the partition function.
[http://www.biomedcentral.com/content/supplementary/1471-2164-9-355-S3.pdf]

Additional file 4
Distribution histograms of structural branchpoint distances for (a) rps17b-L1, (b) rps17b-L2, (c) rps17b-L3, (d) rps17b-L4, (e) rps17b-S1, (f) rps17b-S2, (g) rps17b-S3, and (h) rps17b-S4 mutants.
[http://www.biomedcentral.com/content/supplementary/1471-2164-9-355-S4.pdf]

Additional file 5
Distribution histograms of structural branchpoint distances for (a) RPS6B wildtype intron, (b) rps6b-L1, (c) rps6b-S1, (d) rps6b-S2, (e) rps6b-S3, (f) rps6b-S4, and (g) rps6b-S5 mutants.
[http://www.biomedcentral.com/content/supplementary/1471-2164-9-355-S5.pdf]

Additional file 6
Distribution histograms of structural branchpoint distances for (a) APE2 wildtype intron, (b) ape2-L1, (c) ape2-L2, (d) ape2-S1, (e) ape2-S2, (f) ape2-S3, (g) ape2-S4, and (h) ape2-S5 mutants.
[http://www.biomedcentral.com/content/supplementary/1471-2164-9-355-S6.pdf]

Acknowledgements
We gratefully acknowledge valuable feedback from the four anonymous reviewers, which helped us to improve several aspects of our study.
S. Rogic was supported by an NSERC (the Natural Sciences and Engineering Research Council of Canada) Discovery Grant to A. Mackworth, who also holds a Canada Research Chair in Artificial Intelligence. H. Hoos was partly supported by the Mathematics of Information Technology and Complex Systems (MITACS) Network of Centres of Excellence. B. Montpetit was supported by awards from NSERC and the Michael Smith Foundation for Health Research. P. Hieter is supported by the Canadian Institutes of Health Research (grant MOP-38096) and the U.S. National Institutes of Health (grant P01-CA0161519).

References
1. Chow LT, Gelinas RE, Broker TR, Roberts RJ: An amazing sequence arrangement at the 5' ends of adenovirus 2 messenger RNA. Cell 1977, 12:1-8.
2. Berget SM, Moore C, Sharp PA: Spliced segments at the 5' terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci USA 1977, 74(8):3171-3175.
3. Parker R, Patterson B: Architecture of fungal introns: implications for spliceosome assembly. In Molecular biology of RNA: new perspectives. Edited by: Inouye M, Dudock BS. San Diego, CA, USA: Academic Press, Inc; 1987:133-149.
4. Spingola M, Grate L, Haussler D, Ares M: Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA 1999, 5(2):221-234.
5. Kupfer DM, Drabenstot SD, Buchanan KL, Lai H, Zhu H, Dyer DW, Roe BA, Murphy JW: Introns and splicing elements of five diverse fungi. Eukaryot Cell 2004, 3(5):1088-1100.
6. Newman A: Specific accessory sequences in Saccharomyces cerevisiae introns control assembly of pre-mRNAs into spliceosomes. EMBO J 1987, 6(12):3833-3839.
7. Goguel V, Rosbash M: Splice site choice and splicing efficiency are positively influenced by pre-mRNA intramolecular base pairing in yeast. Cell 1993, 72(6):893-901.
8. Libri D, Stutz F, McCarthy T, Rosbash M: RNA structural patterns and splicing: molecular basis for an RNA-based enhancer. RNA 1995, 1(4):425-436.
9. Charpentier B, Rosbash M: Intramolecular structure in yeast introns aids the early steps of in vitro spliceosome assembly. RNA 1996, 2(6):509-522.
10. Mougin A, Grégoire A, Banroques J, Ségault V, Fournier R, Brulé F, Chevrier-Miller M, Branlant C: Secondary structure of the yeast Saccharomyces cerevisiae pre-U3A snoRNA and its implication for splicing efficiency. RNA 1996, 2(11):1079-1093.
11. Howe KJ, Ares M: Intron self-complementarity enforces exon inclusion in a yeast pre-mRNA. Proc Natl Acad Sci USA 1997, 94(23):12467-12472.
12. Chen Y, Stephan W: Compensatory evolution of a precursor messenger RNA secondary structure in the Drosophila melanogaster Adh gene. Proc Natl Acad Sci USA 2003, 100(20):11499-11504.
13. Martinez-Contreras R, Fisette JF, Nasim FU, Madden R, Cordeau M, Chabot B: Intronic binding sites for hnRNP A/B and hnRNP F/H proteins stimulate pre-mRNA splicing. PLoS Biol 2006, 4(2).
14. Zuker M, Mathews DH, Turner DH: Algorithms and thermodynamics for RNA secondary structure prediction: A practical guide. In RNA Biochemistry and Biotechnology. NATO ASI Series, Kluwer Academic Publishers; 1999.
15. Mathews DH, Sabina J, Zuker M, Turner DH: Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 1999, 288(5):911-940.
16. Christoffersen RE, Mcswiggen DJ: Application of computational technologies to ribozyme biotechnology products. J Mol Structure 1994, 311:273-284.
17. Betts L, Spremulli LL: Analysis of the role of the Shine-Dalgarno sequence and mRNA secondary structure on the efficiency of translational initiation in the Euglena gracilis chloroplast atpH mRNA. J Biol Chem 1994, 269(42):26456-26463.
18. Freyhult E, Gardner PP, Moulton V: A comparison of RNA folding measures. BMC Bioinformatics 2005, 6:241-241.
19. Morgan SR, Higgs PG: Evidence for kinetic effects in the folding of large RNA molecules. J Chem Phys 1996, 105(16):7152-7157.
20. Wuchty S, Fontana W, Hofacker IL, Schuster P: Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 1999, 49(2):145-165.
21. McCaskill JS: The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 1990, 29(6–7):1105-1119.
22. Hofacker IL, Fontana W, Stadler LPF, Bonhoeffer S, Tacker M, Schuster P: Fast folding and comparison of RNA secondary structures. Monatsh Chem 1994, 125:167-188.
23. Kellis M, Patterson N, Endrizzi M, Birren B, Lander E: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 2003, 423(6937):241-254.
24. Saccharomyces sensu stricto alignments [http://www-genome.wi.mit.edu/annotation/fungi/comp_yeasts/downloads.html]
25. Davis CA, Grate L, Spingola M, Ares M: Test of intron predictions reveals novel splice sites, alternatively spliced mRNAs and new introns in meiotically regulated genes of yeast. Nucleic Acids Res 2000, 28(8):1700-1706.
26. Hofacker IL: Vienna RNA secondary structure server. Nucleic Acids Res 2003, 31(13):3429-3431.
27. Preker PJ, Guthrie C: Autoregulation of the mRNA export factor Yra1p requires inefficient splicing of its pre-mRNA. RNA 2006, 12(6):994-1006.
28. Elliott DJ, Rosbash M: Yeast pre-mRNA is composed of two populations with distinct kinetic properties. Exp Cell Res 1996, 229(2):181-188.
29. Kotovic KM, Lockshon D, Boric L, Neugebauer KM: Cotranscriptional recruitment of the U1 snRNP to intron-containing genes in yeast. Mol Cell Biol 2003, 23(16):5768-5779.
30. Görnemann J, Kotovic KM, Hujer K, Neugebauer KM: Cotranscriptional spliceosome assembly occurs in a stepwise fashion and requires the cap binding complex. Mol Cell 2005, 19:53-63.
31. Zhang Z, Dietrich FS: Mapping of transcription start sites in Saccharomyces cerevisiae using 5' SAGE. Nucleic Acids Res 2005, 33(9):2838-2851.
32. Neugebauer KM: On the importance of being co-transcriptional. J Cell Sci 2002, 115(Pt 20):3865-3871.
33. Yu MC, Bachand F, McBride AE, Komili S, Casolari JM, Silver PA: Arginine methyltransferase affects interactions and recruitment of mRNA processing and export factors. Genes Dev 2004, 18(16):2024-2035.
34. Brion P, Westhof E: Hierarchy and dynamics of RNA folding. Annu Rev Biophys Biomol Struct 1997, 26:113-137.
35. Tinoco I, Bustamante C: How RNA folds. J Mol Biol 1999, 293(2):271-281.
36. Freier SM, Kierzek R, Jaeger JA, Sugimoto N, Caruthers MH, Neilson T, Turner DH: Improved free-energy parameters for predictions of RNA duplex stability. Proc Natl Acad Sci USA 1986, 83(24):9373-9377.
37. Turner DH, Sugimoto N, Jaeger JA, Longfellow CE, Freier SM, Kierzek R: Improved parameters for prediction of RNA structure. Cold Spring Harb Symp Quant Biol 1987, 52:123-133.
38. Turner DH, Sugimoto N: RNA structure prediction. Annu Rev Biophys Biophys Chem 1988, 17:167-192.
39. Dijkstra EW: A note on two problems in connexion with graphs. Numerische Mathematik 1959, 1:269-271.
40. Andronescu M, Fejes AP, Hutter F, Hoos HH, Condon A: A new algorithm for RNA secondary structure design. J Mol Biol 2004, 336(3):607-624.
41. Longtine MS, McKenzie A, Demarini DJ, Shah NG, Wach A, Brachat A, Philippsen P, Pringle JR: Additional modules for versatile and economical PCR-based gene deletion and modification in Saccharomyces cerevisiae. Yeast 1998, 14(10):953-961.
42. Boeke JD, Trueheart J, Natsoulis G, Fink GR: 5-Fluoroorotic acid as a selective agent in yeast molecular genetics. Methods Enzymol 1987, 154:164-175.
43. Grate L, Ares M: Searching yeast intron data at Ares lab Web site. Methods Enzymol 2002, 350:380-392.
44. Lopez PJ, Séraphin B: YIDB: the Yeast Intron DataBase. Nucleic Acids Res 2000, 28:85-86.
45. Mewes HW, Frishman D, Güldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Münsterkötter M, Rudd S, Weil B: MIPS: a database for genomes and protein sequences. Nucleic Acids Res 2002, 30:31-34.
46. Saccharomyces Genome Database [http://www.yeastgenome.org/]

