UBC Theses and Dissertations
SAGE2Splice: unmapped SAGE tags reveal novel splice junctions
Kuo, Byron Yu-Lin
Serial analysis of gene expression (SAGE) is not only a method for profiling the global expression of genes, but also offers an opportunity to discover novel transcripts. SAGE tags are mapped to known transcripts to determine their source. We hypothesized that tags that map neither to a known transcript nor to the genome span a splice junction for which the exon combination or the exon(s) are unknown. Splice junctions are typically recognized by the pair of highly conserved dinucleotides at each edge of an intron, GT at the 5' end and AG at the 3' end, as well as by other less conserved nucleotides flanking the junctions. In the known transcriptome, between 1.6% and 6.2% of predicted tags span a splice junction. We have developed an algorithm, SAGE2Splice, to efficiently map these unmapped SAGE tags to potential splice junctions in a genome. An evaluation scheme based on position weight matrices was designed to assess the quality of candidates. Candidates were classified into three types of spliced tags, reflecting the previous annotations of the putative splice junctions. A Type 1 tag spans a novel junction between two known exons; a Type 2 tag spans a previously known exon and an unknown exon; and a Type 3 tag spans two previously unknown exons. Analysis of predicted tags extracted from EST sequences demonstrated that candidate junctions located closer to the centre of the tag are more reliable. Using high-sensitivity and high-specificity parameters, SAGE2Splice predicted 7,757 candidates from 1,639 of 20,000 unmapped tags. We selected 12 candidate splice junctions and tested them using RT-PCR. Nine of these twelve candidates were validated by RT-PCR and sequencing, and among these, four revealed previously uncharacterized exons.
To screen more unmapped SAGE tags, we proposed methods to improve the engineering efficiency, program usability, and candidate evaluation methods of SAGE2Splice, as well as to add a high-throughput laboratory procedure for testing the predicted candidates. We expect that many more novel transcripts can be discovered using SAGE2Splice. SAGE2Splice is available online at http://www.bcgsc.ca/sage2splice/.
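The core idea of the abstract, that an unmapped tag may be the concatenation of the end of one exon and the start of another, separated in the genome by an intron beginning with GT and ending with AG, can be illustrated with a small brute-force search. The sketch below is not the SAGE2Splice implementation: the function name `find_candidate_junctions`, its parameters, the intron length bounds, and the toy sequences are all assumptions made for illustration, and it omits the position weight matrix scoring and the Type 1/2/3 classification described above.

```python
def find_candidate_junctions(tag, genome, min_intron=4, max_intron=10000):
    """Naively list GT...AG introns that would explain a tag spanning a junction.

    Hypothetical helper for illustration only. Returns a list of
    (split, intron_start, intron_end) tuples, where tag[:split] ends the
    upstream exon, tag[split:] starts the downstream exon, and
    genome[intron_start:intron_end] is the putative intron.
    """
    candidates = []
    # Try every position inside the tag as the possible junction.
    for split in range(1, len(tag)):
        left, right = tag[:split], tag[split:]
        donor_motif = left + "GT"      # exon end followed by the 5' GT
        acceptor_motif = "AG" + right  # 3' AG followed by the exon start

        d = genome.find(donor_motif)
        while d != -1:
            intron_start = d + len(left)  # index of the G in GT
            # The acceptor AG must lie downstream of the donor GT.
            a = genome.find(acceptor_motif, intron_start + 2)
            while a != -1:
                intron_end = a + 2  # first exonic base after AG
                intron_len = intron_end - intron_start
                if intron_len > max_intron:
                    break  # later hits are only farther away
                if intron_len >= min_intron:
                    candidates.append((split, intron_start, intron_end))
                a = genome.find(acceptor_motif, a + 1)
            d = genome.find(donor_motif, d + 1)
    return candidates


# Toy example: two short "exons" separated by a 10-base GT...AG intron.
genome = "AAAA" + "TTCAAC" + "GTCCCCCCAG" + "CCATTC" + "AAAA"
tag = "TTCAAC" + "CCATTC"  # does not occur contiguously in the genome
print(find_candidate_junctions(tag, genome))
```

In this toy case the only candidate places the junction at the centre of the tag, which is the configuration the abstract reports as most reliable. A practical tool would also need to search the reverse strand, index the genome rather than scan it repeatedly, and score the flanking nucleotides.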