Open Collections
UBC Theses and Dissertations
Detecting common secondary structure elements in RNA sequences
Shah, Sohrab P
Abstract
As evidence for the important and diverse roles of RNA molecules in our cellular machinery continues to grow, there is increasing interest in developing computational methods to analyse RNA sequences. Sets of evolutionarily related RNA sequences contain signals at both the sequence and secondary structure levels that can be exploited to detect motifs common to all or a portion of those sequences. Motifs conserved in evolution are believed to be functionally important, and detection of such motifs could therefore yield novel functional RNA sequences.

We developed an algorithm called DISCO to detect conserved motifs in a set of unaligned RNA sequences. Our algorithm models motifs with a probabilistic formalism called covariance models (CMs). We introduce a novel approach to initialise a CM using pairwise and multiple sequence alignment. The CM is then iteratively refined using expectation maximisation. Our initialisation method can operate on sequence signals alone, using only a portion of the input sequences to initialise a CM that then recovers the remaining motif instances.

We tested our algorithm on 26 data sets derived from Rfam seed alignments of microRNA (miRNA) precursors and conserved elements in the untranslated regions of mRNAs (UTR elements). By three measures of specificity and positive predictive value, our algorithm performed well on the miRNA data sets and showed a bi-modal distribution for the UTR element data sets, in which the motif was either completely missed or very accurately predicted. In a comparison test with a competing algorithm, DISCO outperformed RNAProfile in measures of sensitivity and positive predictive value, although RNAProfile ran considerably faster. The accuracy of our algorithm was unaffected by average percent pairwise sequence identity, overall length or number of sequences in the input data, indicating that DISCO could be run with similar accuracy on diverse data sets. The running time of DISCO is O(W³ + L²W² + L³), where W is the width of the motif and L is the length of the longest sequence in the input data. This is an improvement on SLASH, the only other RNA motif-finding algorithm in the literature that uses CMs.
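As a rough illustration of the running-time bound stated in the abstract, the Python sketch below evaluates the three terms W³, L²W² and L³ for a given motif width W and longest sequence length L, and reports which term is largest. The function names `disco_cost_terms` and `dominant_term` and the example values are assumptions made for illustration only; they are not taken from the thesis, and constant factors hidden by the asymptotic notation are ignored.

```python
def disco_cost_terms(W, L):
    """Return the three terms of the bound W^3 + L^2*W^2 + L^3.

    W: motif width, L: length of the longest input sequence.
    Illustrative only: constant factors are ignored.
    """
    if W <= 0 or L <= 0:
        raise ValueError("W and L must be positive integers")
    return {
        "W^3": W ** 3,
        "L^2*W^2": (L ** 2) * (W ** 2),
        "L^3": L ** 3,
    }


def dominant_term(W, L):
    """Name the largest of the three terms for the given W and L."""
    terms = disco_cost_terms(W, L)
    return max(terms, key=terms.get)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show how the terms compare.
    for W, L in [(20, 100), (5, 100)]:
        print(W, L, disco_cost_terms(W, L), dominant_term(W, L))
```

When the motif is no wider than the longest sequence (W ≤ L), the W³ term never exceeds L²W², so the comparison reduces to the last two terms: L²W² is the larger one whenever W² ≥ L, and L³ is the larger one otherwise.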
Item Metadata
Title | Detecting common secondary structure elements in RNA sequences |
Creator | |
Publisher | University of British Columbia |
Date Issued | 2005 |
Genre | |
Type | |
Language | eng |
Date Available | 2009-12-11 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051575 |
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor | University of British Columbia |
Graduation Date | 2005-05 |
Campus | |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |