RNA secondary structure prediction using hierarchical folding

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

RNA secondary structure prediction using hierarchical folding Hosna, Hosna

Abstract

Algorithms for prediction of RNA secondary structure— the set of base pairs that form when an RNA molecule folds— are valuable to biologists who aim to understand RNA structure and function. Improving the accuracy and efficiency of prediction methods'is an ongoing challenge, particularly for pseudoknotted secondary structures, in which base pairs overlap. This challenge is biologically important, since pseudoknotted structures play essential roles in functions of many RNA molecules, such as splicing and ribosomal frameshifting. State-of-the- art methods, which are based on free energy minimization, have high runtime complexity (typically θ(n⁵) or worse), and can handle (minimize over) only limited types of pseudoknotted structures. We analyze a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot free (non-overlapping) base pairs forming first, and pseudoknots forming later so as to minimize energy relative to the folded pseudoknot free structure. Our HFold algorithm, based on work of S. Zhao, uses two-phase energy minimization to predict hierarchically-formed secondary structures in 0(n³) time, matching the complexity of the best algorithms for pseudoknot free secondary structure prediction via energy minimization. Our algorithm can handle a wide range of biological structures, including kissing hairpins and nested kissing hairpins, which have previously required θ(n⁶) time. We also report on the experimental evaluations of HFold and present thorough analyses of the results. We show that if the input structure to the algorithm is correct, running the algorithm results in 16% accuracy improvement on average over the accuracy of the true pseudoknot free structures. However if the input structure is not correct, the accuracy improvement is not significant. If the first 10 suboptimal foldings are given as input to our algorithm instead of just the minimum free energy structure (MFE), the prediction accuracy improves significantly over the accuracy of the MFE structures. This improvement is even more when the number of suboptimal foldings as input to our algorithm increases. The comparison of the energy of the structures predicted by HFold on the true pseudoknot free structures with the energy of the true structures calculated using a different method with the same energy model shows that the energy model may be the cause for the cases for which HFold predicts structures far from the true structures. Our experimental result provides some ways in which the hierarchical folding hypothesis might need to be refined.

Item Metadata

Title	RNA secondary structure prediction using hierarchical folding
Creator	Hosna, Hosna
Publisher	University of British Columbia
Date Issued	2007
Description	Algorithms for prediction of RNA secondary structure— the set of base pairs that form when an RNA molecule folds— are valuable to biologists who aim to understand RNA structure and function. Improving the accuracy and efficiency of prediction methods'is an ongoing challenge, particularly for pseudoknotted secondary structures, in which base pairs overlap. This challenge is biologically important, since pseudoknotted structures play essential roles in functions of many RNA molecules, such as splicing and ribosomal frameshifting. State-of-the- art methods, which are based on free energy minimization, have high runtime complexity (typically θ(n⁵) or worse), and can handle (minimize over) only limited types of pseudoknotted structures. We analyze a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot free (non-overlapping) base pairs forming first, and pseudoknots forming later so as to minimize energy relative to the folded pseudoknot free structure. Our HFold algorithm, based on work of S. Zhao, uses two-phase energy minimization to predict hierarchically-formed secondary structures in 0(n³) time, matching the complexity of the best algorithms for pseudoknot free secondary structure prediction via energy minimization. Our algorithm can handle a wide range of biological structures, including kissing hairpins and nested kissing hairpins, which have previously required θ(n⁶) time. We also report on the experimental evaluations of HFold and present thorough analyses of the results. We show that if the input structure to the algorithm is correct, running the algorithm results in 16% accuracy improvement on average over the accuracy of the true pseudoknot free structures. However if the input structure is not correct, the accuracy improvement is not significant. If the first 10 suboptimal foldings are given as input to our algorithm instead of just the minimum free energy structure (MFE), the prediction accuracy improves significantly over the accuracy of the MFE structures. This improvement is even more when the number of suboptimal foldings as input to our algorithm increases. The comparison of the energy of the structures predicted by HFold on the true pseudoknot free structures with the energy of the true structures calculated using a different method with the same energy model shows that the energy model may be the cause for the cases for which HFold predicts structures far from the true structures. Our experimental result provides some ways in which the hierarchical folding hypothesis might need to be refined.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2011-03-02
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0052065
URI	http://hdl.handle.net/2429/32001
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_2007-0443.pdf -- 2.9MB

Item Citations and Data

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.

Open Collections

UBC Theses and Dissertations

RNA secondary structure prediction using hierarchical folding Hosna, Hosna

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights