Space and energy efficient molecular programming and space efficient text indexing methods for sequence alignment

by

Christopher Joseph Thachuk

M.Sc., Simon Fraser University, 2007
B.C.S., The University of Windsor, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2013

© Christopher Joseph Thachuk 2013

Abstract

Nucleic acids play vital roles in the cell by virtue of the information encoded into their nucleotide sequence and the folded structures they form. Given their propensity to alter their shape over time under changing environmental conditions, an RNA molecule will fold through a series of structures called a folding pathway. As this is a thermodynamically-driven probabilistic process, folding pathways tend to avoid high energy structures; pathways that do so are said to have a low energy barrier.

In the first part of this thesis, we study the problem of predicting low energy barrier folding pathways of a nucleic acid strand. We show that various restrictions of the problem are computationally intractable, unless P = NP. We propose an exact algorithm that has exponential worst-case runtime, but uses only polynomial space and performs well in practice. Motivated by recent applications in molecular programming, we also consider a number of related problems that leverage folding pathways to perform computation. We show that verifying the correctness of these systems is PSPACE-hard and, in doing so, show that predicting low energy barrier folding pathways of multiple interacting strands is PSPACE-complete. We explore the computational limits of this class of molecular programs, which are capable, in principle, of logically reversible and thus energy efficient computation. We demonstrate that a space and energy efficient molecular program of this class can be constructed to solve any problem in SPACE—the class of all space-bounded problems. We prove a number of limits to deterministic and also to space efficient computation of molecular programs that leverage folding pathways, and show limits for more general classes.

In the second part of this thesis, we continue the study of algorithms and data structures for predicting properties of nucleic acids, but with quite different motivations pertaining to sequence rather than structure. We design a number of compressed text indexes that improve pattern matching queries in light of common biological events such as single nucleotide polymorphisms in genomes and alternative splicing in transcriptomes. Our text indexes and associated algorithms have the potential for use in alignment of sequencing data to reference sequences.

Preface

The candidate contributed to all major ideas and writing of the published manuscripts and wrote all chapters of this thesis. We now detail the contributions of the candidate in published articles resulting from this work. Non-thesis related work published by the candidate during the course of their degree is not listed.

Research from Part I of the thesis was conducted in collaboration with a number of co-authors; primarily the candidate's supervisor Dr. Anne Condon and also Dr. Jan Maňuch. In no instance was a co-author a graduate student. Other work presented in this thesis part that is not yet published was conducted under the supervision of Dr. Anne Condon and written by the candidate.
• The introductory chapter of Part I was written by the candidate, but uses selected content from publications that he co-authored [25, 26, 80, 81, 129, 131]. Details of the contribution for each of these publications are given below.

• A version of Chapter 2 has been published in the proceedings of the 15th Annual International Conference on DNA Computing and Molecular Programming (2009) [80] and also the Journal of Natural Computing (2011) [81]. The candidate collaborated with co-authors in developing the reduction proof. The candidate contributed to writing the manuscripts, particularly the journal version [81].

• A version of Chapter 3 has been published in the proceedings of the Pacific Symposium on Biocomputing (2010) [129]. The candidate was the primary researcher to design and implement the main algorithm of the paper and performed all experiments. He also contributed to writing the manuscript. The presentation of the algorithm and correctness proofs have been rewritten by the candidate in the thesis to facilitate additional results.

• A version of Chapter 4 appeared in the proceedings of the 17th Annual International Conference on DNA Computing and Molecular Programming (2011) [25] and also the Journal of the Royal Society: Interface Focus (2012) [26]. The candidate contributed to all aspects of research and was one of the main writing authors, contributing various sections of the manuscript. Some alternate proofs and additional results have been given by the candidate in the thesis version to provide deeper insight into the topics considered.

• A version of Chapter 5 appears in the proceedings of the 18th Annual International Conference on DNA Computing and Molecular Programming (2012) [131] and was awarded best student paper. The candidate contributed to all aspects of research and in particular developed the SAT verification procedure, integrated it with the tree traversal procedure proposed by the co-author, proved the correctness of the result, extended the result into a DSD implementation, and formally proved the computational hardness of a number of related problems. The manuscript was written by the candidate. The results of this chapter have been significantly enhanced and supplemented compared with the published version.

All research from Part II of the thesis was conducted independently by the candidate and all chapters and published manuscripts were written by the candidate.

• Versions of Chapter 8 and Chapter 9 have been published in the proceedings of the 22nd Annual International Symposium on Combinatorial Pattern Matching (2011) [127] and the journal Theoretical Computer Science [128]. The candidate was the sole author and the paper won best student paper.

• A version of Chapter 10 was published in the proceedings of the 18th Annual International Symposium on String Processing and Information Retrieval (2011) [130]. The candidate was the sole author.

• Chapter 7, the introductory chapter of the part, used content from the previously mentioned papers published by the candidate [127, 130].

Table of contents

Abstract
Preface
Table of contents
List of tables
List of figures
Glossary
Notes on reading the text
Acknowledgements
Dedication

I Space and energy efficient molecular programming: energy barriers, chemical reaction networks, and DNA strand displacement systems

1 Introduction
  1.1 Nucleic acid folding pathways
    1.1.1 The simple energy model
  1.2 Molecular programming
    1.2.1 DNA strand displacement systems (DSD)
      Toehold mediated strand displacement
      Folding pathway of a strand displacement
      Illegal strand displacement
    1.2.2 Chemical reaction networks (CRN)
      Chemical reactions and signal molecules
      Chemical reaction rates
      CRNs as a means for computation
    1.2.3 Tagged chemical reaction networks (tagged CRN)
      Tags and tagged chemical reaction equations
      Space complexity of a tagged CRN
    1.2.4 Proper chemical reaction networks (proper CRN)
    1.2.5 Realizing CRNs with DSDs
    1.2.6 Energy efficient computation
  1.3 Objectives
  1.4 Contributions
  1.5 Outline

2 Complexity of predicting low energy barrier folding pathways
  2.1 Preliminaries
  2.2 Result
  2.3 Chapter summary

3 Predicting minimum energy barrier folding pathways
  3.1 Preliminaries
  3.2 An algorithm for the set barrier problem
    3.2.1 Splitting strategy
    3.2.2 Cutting strategy
    3.2.3 The overall algorithm
    3.2.4 Algorithm correctness and complexity
      Comments on practical and theoretical runtime efficiency
    3.2.5 Finding minimum barriers for non-pairwise optimal instances
      Construction of PWO(G)
  3.3 Empirical results
    3.3.1 Implementation and experimental environment
    3.3.2 Generation of problem instances
    3.3.3 Algorithm runtime performance
  3.4 Solving the direct with repeats barrier problem
  3.5 Chapter summary

4 On recycling and its limits in molecular programs
  4.1 Introduction
    4.1.1 On the need for strand recycling
    4.1.2 On the potential for strand recycling
    4.1.3 On the limits of strand recycling
    4.1.4 Related work
  4.2 GRAY: a binary reflecting Gray code counter
    4.2.1 Chemical reaction network for the GRAY counter
    4.2.2 DSD implementation of the GRAY counter
    4.2.3 Space and expected time of the GRAY counter
    4.2.4 A fixed order implementation of the GRAY counter
    4.2.5 Comparison with another molecular counter
  4.3 Limits on molecule recycling in chemical reaction networks
  4.4 Chapter summary

5 Space and energy efficient molecular programming
  5.1 Related work
  5.2 Preliminaries
  5.3 Space efficient CRN simulation of PSPACE
    5.3.1 Verifying a 3sat instance variable assignment
      Verifying an arbitrary clause
      Verifying the overall formula
    5.3.2 A space efficient post-order tree traversal
    5.3.3 Solving a q3sat instance
      Integrating formula verification and tree traversal
      Integrating quantifiers into the tree traversal
      Ending the computation
  5.4 Space efficient CRN simulation of SPACE
  5.5 Space and energy efficient DSD simulation of SPACE
  5.6 Complexity of verifying CRNs and DSDs
  5.7 A reduction from q3sat to eb-ipfp-multi
    5.7.1 The reduction
  5.8 Chapter summary

6 Conclusion
  6.1 Predicting folding pathways
  6.2 Designing folding pathways

II Space efficient text indexes motivated by biological sequence alignment problems

7 Introduction
  7.1 Text indexing
  7.2 Biological sequence alignment
  7.3 Objectives
  7.4 Contributions
  7.5 Outline

8 A compressed full-text dictionary
  8.1 Introduction
    8.1.1 Related work
  8.2 Preliminaries
  8.3 Overview of the full-text dictionary
    8.3.1 The lex id of text segments
  8.4 Components of the full-text dictionary
    8.4.1 CSA: compressed enhanced suffix array
    8.4.2 The sa id identifier and RSA: conceptual tools
    8.4.3 L: text segment lengths
    8.4.4 LEX, MB, ME, E: text segment SA range representation
    8.4.5 BP: containment of text segment SA ranges
    8.4.6 CNT: count of text segment prefixes
    8.4.7 Summary of full-text dictionary components
  8.5 Using the full-text dictionary
    8.5.1 Pre-processing the pattern
    8.5.2 Finding parent ranges and longest matches
    8.5.3 dict prefix: report text segments that prefix P
    8.5.4 dict match: report text segments contained in P
    8.5.5 dict count: counting text segments contained in P
    8.5.6 prefix: report range of lex ids that prefix P
    8.5.7 locate: report positions in T containing P
    8.5.8 match stats: finding the matching statistics of P
  8.6 Constructing the full-text dictionary

9 Indexing text with wildcards
  9.1 Introduction
  9.2 Preliminaries
  9.3 Overview of indexing text containing wildcards
  9.4 Components of the text with wildcards index
    9.4.1 F, R: indexing the text
    9.4.2 lex ids, rlex ids, and Π: text segment identifiers
    9.4.3 RSA, RSA: storing SA ranges
    9.4.4 LEN, POS, WCS: auxiliary arrays
    9.4.5 RQ: supporting range queries
    9.4.6 Summary of the components
  9.5 Matching in text with wildcards
    9.5.1 Pre-processing the pattern
    9.5.2 type1 match: finding all type 1 matches of P
    9.5.3 type2 match: finding all type 2 matches of P
    9.5.4 type3 match: finding all type 3 matches of P
  9.6 Less haste, less waste: reducing the space further

10 Indexing hypertext
  10.1 Introduction
  10.2 Preliminaries
    10.2.1 Succinct graph representation
    10.2.2 Hypertext
  10.3 Construction of the hypertext index
    10.3.1 Indexing node text
    10.3.2 Storing graph topology
    10.3.3 Auxiliary data structures
  10.4 Pattern matching in the hypertext index
    10.4.1 Preprocessing the pattern
    10.4.2 Matching within a node
    10.4.3 Matching across a single edge
    10.4.4 Matching across multiple edges
      Overview of the algorithm
      Verifying the suffix condition
      Verifying the prefix condition
      Reporting all matching paths
  10.5 Reducing the index space
  10.6 Considering restricted hypertext
    10.6.1 Path constraints
    10.6.2 Topology constraints
    10.6.3 Text constraints

11 Conclusion

Bibliography

List of tables

1.1 The complexity of folding pathway energy barrier problems for the simple energy model.

4.1 Comparison of n-bit counter implementations. The GRAY and GRAY-FO counters described in this section are compared with the QSW counter which is based on the simulation of stack machines by strand displacement reactions of Qian et al. [98].

8.1 Inventory of space usage for data structures comprising a full-text dictionary for a string T of length n containing d text segments.

9.1 A comparison of text indexes supporting wildcard characters in a text T over an alphabet of size σ containing d distinct groups of wildcards. |CSA| is the size of a subsidiary compressed suffix array implementation supporting rank queries in O(tLF) time. d̂ is the number of distinct wildcard group lengths; occ1, occ2, occ are the number of occurrences containing no wildcard group, 1 wildcard group, and overall, respectively; γ = Σ_{i,j} prefix(P[i..|P|], Tj); † = our result; ‡ = our result combined with Hon et al. [49].

9.2 Inventory of space usage for data structures comprising an index for a text T of length n containing d groups of wildcards, where d̂ denotes the number of unique lengths of wildcard groups separating text segments.

10.1 Inventory of space usage for succinct index of a general hypertext. Sections 10.5 and 10.6 explore the removal of various components of the overall index.

List of figures

1.1 (a) An initial secondary structure (left) and a final secondary structure (right) for a given RNA strand. (b) A corresponding arc diagram. The arcs on top of the nucleotide sequence denote the base pairs in the initial secondary structure while the arcs below the nucleotide sequence denote the base pairs of the final secondary structure.

1.2 (a) A possible folding pathway is shown for an initial structure A transitioning through intermediate structures (B, C, . . .) until the final structure I is reached. For a particular position in the pathway, the top of the arc diagram denotes the current base pairs of the structure and the bottom of the arc diagram denotes the base pairs still to be added to reach the final structure. Each structure along the pathway differs from its neighbours by one arc. (b) The corresponding energy plot. The barrier in this example is two.
1.3 A DNA strand displacement system consisting of two signal strands, A and B, and one double stranded complex consisting of two bound strands C and D and a template strand E. Long domains of the template strand are shown in red while long domains of signal strands and bound strands are shown in gray. Universal toehold domains are shown in black.

1.4 Strand displacement. (a) Toehold (black subsequence) of signal strand A binds with its unpaired complement on the template strand B. (b) The single long domain (gray subsequence) of A competes via a random walk process with the single long domain of strand C to bind with the complementary long domain of B until all bases of A are bound to B. (c) Toehold of C detaches from B, at which point it has been displaced and becomes a signal. The process is reversible and signal strand C could next displace the bound strand A.

1.5 A corresponding folding pathway is shown for the displacement example of Figure 1.4 using a simple sequence design where toehold domains have one base and long domains have two bases. The displacement of strand C by strand A is shown in seven steps, from (a) to (g). Base pairs are shown as edges between strands. The energy changes between each structure, assuming Kassoc = 2, are shown in the bottom right. Since toehold domains have one base, the energy barrier of the underlying folding pathway, relative to (a), is Kassoc. If the toehold domains had length LT > 1 then the energy barrier, relative to (a), would be Kassoc − 1.

1.6 (a) Chemical reaction equations for a 3-bit standard binary counter. (b) The configuration graph of the computation performed by the 3-bit standard binary counter forms a chain and is logically reversible. The nodes represent the state of the computation and the edges are directed between states reachable by a single reaction.

1.7 Representing the state of the CRN for a 3-bit standard binary counter can be achieved by the presence and absence of certain signal strands for each bit position. Long domains for bits representing a 1 value are coloured in red while those representing a 0 value are coloured in grey. Universal toehold domains are coloured black.

1.8 A strand displacement implementation of the reaction 01 ⇌ 11 as proposed by Qian et al. [98]. From top to bottom, the input signal strand 01 (shown in a shaded box on the left) is consumed by the transformer (middle) which produces the signal strand 11 (shown in a shaded box on the right). Additional unbound strands are used in the process and are considered part of the transformer. The transformer can be applied next in the opposite direction (from bottom to top) to consume signal 11 and produce signal 01. In this and later figures, the Watson-Crick complement of a domain x is denoted by x∗.

1.9 Tagged chemical reaction equations for a 3-bit standard binary counter.

1.10 A strand displacement implementation of the bi-molecular chemical reaction equation A + B ⇌ C + D using the construction proposed by Qian et al. [98].

1.11 Example configuration graphs, induced on four different inputs, for (a) deterministic computation, and (b) logically reversible computation.
Nodes represent possible states in a computation and directed edges denote valid state transitions.

2.1 The three arcs on the bottom all conflict with the same two arcs on the top, and vice versa. Thus, each forms a band of arcs. Each band is collapsed into a single arc with weight equal to the size of the band.

2.2 Organization of weighted arcs in the initial (top) and the final (bottom) configurations.

2.3 Illustration of the construction in the proof of Theorem 3: (a) The instance created for the set of integers {{10, 9, 8, 7, 7, 7}}. (b) The energy function stays within barrier k if and only if the partition sets are selected correctly (T1 = {{10, 7, 7}} and T2 = {{9, 8, 7}}). (c) The energy function exceeds the barrier for an incorrect selection of partition sets (T1 = {{10, 9, 8, 7, 7}} and T2 = {{7}}). The dashed lines depict hypothetical progress of the pathway for some energy barrier larger than k.

2.4 Illustration of the sequence of energy difference changes on the folding pathway described in lines (2.1), (2.2) and (2.3). Details are discussed in the text of the chapter.

3.1 (left column) An example of an arc diagram representation of an initial and final structure of an RNA folding pathway, and (right column) the corresponding conflict graph. In the conflict graph, there is a node for every arc, and an edge between any pair of arcs that cross.

3.2 An example of a 2-barrier direct RNA folding pathway from an initial to final structure (left column), a corresponding set pathway (right column), and a graph showing the current folding pathway energy and the current barrier set size (center column). The set pathway instance (right) is specified by the conflict graph of the RNA folding pathway instance (left). The current set in the set pathway is denoted by black vertices while the current secondary structure in the folding pathway is indicated by the set of arcs on top.

3.3 An example of the BasicSplit algorithm. For a pairwise-optimal bipartite graph G, a perfect matching is identified (top left), a directed precedence graph D is constructed (top right), strongly connected components in D are identified (bottom left), and one that is a sink in the condensation of D is returned (bottom right).

3.4 Creating a pairwise-optimal instance (bottom) from a non-pairwise-optimal instance (top).

3.5 Distribution of conflicting base pairs for generated problem instances.

3.6 The required time to find an optimal barrier pathway is shown for two time scales.

3.7 Frequency of maximum (left) and average (right) subproblem sizes, measured as number of base pairs in the subproblem produced by the first call to the BasicSplit algorithm for a given instance. The maximum and average are taken over all subproblems generated for a given instance.

3.8 (left) An arc diagram representation for the RNA strand UCUGAGCUAGUG.
Arcs (base pairs) in the initial structure are shown in red, those in the final structure are shown in blue and potential temporary arcs are shown in green. Also shown are the corresponding conflict graphs for the indirect folding pathway problem (center) and the direct folding pathway problem (right).

4.1 To reach the end state, the standard binary counter must perform a sequence of reactions that always occur in the forward direction, thus requiring a new transformer for every reaction as they are not recycled.

4.2 The 3-bit binary reflecting Gray code. The code for n digits can be formed by reflecting the code for n − 1 digits across a line, then prefixing each value above the line with 0 and those below the line with 1.

4.3 (a) Tagged chemical reaction equations for a 3-bit binary reflecting Gray code counter. (b) The configuration graph of the computation performed by the 3-bit binary reflecting Gray code counter forms a chain and is logically reversible. The nodes represent the state of the computation and the edges are directed between states reachable by a single reaction.

4.4 To reach the end state, the binary reflecting Gray code counter must perform a sequence of reactions that always alternate in the forward and reverse direction, thus requiring only one transformer for every reaction since they are actively recycled.

4.5 An example of signal molecules (top two left strands) and the transformer, consisting of auxiliary strands (top two right strands) and a saturated template strand (bottom complex), associated with the forward direction of reaction equation 01 ⇌ 11 which requires a mutex. In this and later figures, the Watson-Crick complement of a domain x is denoted by x∗.

4.6 The sequence of strand displacement events for the reaction equation 01 ⇌ 11 when a mutex signal µ is required. The mutex is the first signal to be consumed and the last to be produced, in either reaction direction. Otherwise, the reaction cascade proceeds exactly as before as dictated by the QSW construction.

4.7 An example of the signal molecules and the transformer molecules for the ith reaction. The counter is in state bn . . . bi+1 0i 1i−1 0i−2 . . . 01.

5.1 Solving a q3sat instance. Edge labeled paths from root to leaf denote variable assignments. Nodes are satisfied based on quantifier and satisfiability of left and right children.

5.2 (left) Eight chemical reaction equations to verify an arbitrary 3sat clause Ci for each combination of variable assignments. The product of the reaction is CiT for assignments that satisfy the ith clause, and CiF otherwise. (right) Reaction equations to verify the overall 3sat formula φ, consisting of m clauses.

5.3 Flow control when verifying a formula φ having m clauses.

5.4 A logically reversible post-order traversal of all descendants of the root of a height h perfect binary tree can be achieved using three reactions: (6) mark left, (7) move right, and (8) mark right. Below each reaction is an illustration of the action it performs on the tree.

5.5 Integrating the 3sat verification procedure into the leaf level reactions of the tree traversal procedure.
Two reaction variants are created for marking leaf nodes as either satisfied or unsatisfied based on the result of the verification procedure. One reaction variant can proceed if the signal φF is available and the other variant requires φT. As these are the only two reaction variants, the formula for the current variable assignment must be verified before the leaf node can be marked. The move right reaction requires φ? as a catalyst, thus ensuring the verification procedure is reversed prior to the next verification step. Existing catalysts listed in Figure 5.4 remain and are omitted above for space.

5.6 Integrating quantifiers to non-leaf levels of the tree traversal. For both universal and existential levels, four variants of the left node reactions are created to process the four combinations of left and right children satisfiability. The integration is identical for right node reactions. Existing catalysts remain the same as listed before and are omitted for space.

5.7 After both children of the root have been solved a solution can be determined based on the quantifier of the root level. Equations are shown assuming the root variable xn is universally quantified.

5.8 The logically reversible computation chain of the q3sat CRN. In more than half of the states, the output signal is present (shown shaded).

5.9 Extending the logically reversible computation chain of the q3sat CRN. Extending the chain is achieved by adding an additional reaction that produces a new signal and requires the final signal multiset of the original computation chain as catalysts. States where the output signal is present are shown shaded.

5.10 A strand displacement implementation of the bi-molecular chemical reaction equation A + B ⇌ C + D using a modified construction from that proposed by Qian et al. [98]. In this construction, four-way branch migration is used to displace strands, in contrast to three-way branch migration from the original construction.

5.11 A folding pathway is shown for a strand displacement using four-way branch migration. A simple sequence design is assumed where toehold domains have one base and long domains have two bases. The displacement of strand B by strand A is shown in seven steps, from (a) to (g). Initially, the long domain of A is bound to strand C. During the displacement, C will form base pairs with B while A forms base pairs with T. In the figure, base pairs are shown as edges between strands. The energy changes between each structure, assuming Kassoc = 2, are shown in the bottom right. The energy barrier of the underlying folding pathway, relative to (a), is Kassoc + 1. Note that for toehold length LT > 2, where Kassoc > LT, the energy barrier would be Kassoc − 1.

7.1 An example of short reads aligned to a reference genome G. Alignments may contain matches, mismatches, insertions and deletions. For instance, the alignment of the single read to the reference (red outline) contains a match in the first position of the alignment, a mismatch in the second, an insertion in the third position and a deletion in the twelfth position. Sequencing the genomes of individuals helps determine genetic mutations, such as single nucleotide polymorphisms/variations (SNPs/SNVs), of individuals compared to a reference genome.
8.1 The Burrows-Wheeler transform of a string T = mississippi$ is T^BWT = ipssm$pissii.

8.2 Performing backward search to find the SA range of the string ‘is’ from the SA range of the string ‘s’, using T^BWT, the Burrows-Wheeler transform of text T. (a) The current match and SA range for ‘s’. (b) All occurrences of character i in T^BWT within the current SA range are identified. (c) The LF-mapping is used to update the SA range to the new match ‘is’.

8.3 A compressed full-text dictionary for the ordered list of text segments (aa, aca, a, aa, cacc, ac). The first three columns give a conceptual representation of the full-text dictionary. The second column shows the sorted suffixes of the serialized string T = φaaφacaφaφaaφcaccφac$ representing the text segments. The third column contains the array i indicating the sorted lexicographic rank of each suffix of T. The first column shows the SA ranges of the text segments and their containment relationship. Each text segment SA range is labeled by (lex id, segment) pairs. Shown in the last three columns are actual data structures used in the full-text dictionary representation: the ME array which marks the end of one or more text segment SA ranges, the MB array which marks the beginning of each text segment SA range, and the BP array that represents the containment of text segment SA ranges (their tree topology). Three different queries (shaded intervals) are shown with their corresponding smallest enclosing text segment SA range (if any) marked in the BP array.

9.1 The three cases to consider when matching a pattern to a text with wildcards. Here and throughout this chapter, we will illustrate the wildcard character as ‘*’.

9.2 Shown is a compressed suffix array for a text T = φaa φaca φa φaa φcacc φac and a compressed suffix array for the reverse of T. The shaded intervals denote the SA range of a query aφ in the forward index and corresponding SA range of φa in the reverse index. Using backward search the SA range in the forward index can be updated for the pattern aaφ, and by leveraging information in T^BWT the corresponding SA range for φaa can be updated in the reverse index. Both new SA ranges are shown demarcated with arrows. See the text for details.

10.1 An example of a hypertext. A query matches within a hypertext if and only if it can be aligned as a path through the graph. A path shown in bold matches the query pattern pizzafrompisaisbountiful.

10.2 A simple genome, G, is shown having five exons contained in two genes. Exons are strings over the four letter alphabet of DNA. Below is the corresponding transcriptome, T, which consists of five transcripts. Transcripts are formed by the concatenation of certain exons from G. Above is the splicing graph, S, where each of the five nodes corresponds to one of the five exons from G, and each directed edge denotes splicing events (concatenation of exons) that are found in T. A hypertext model H for the transcriptome is also shown.

10.3 (left) An example of the underlying suffix array and BWT string for the forward index F of the text T = φacaφgφgaφcgφct$, representing the serialization of possible text in exons e1, . . . , e5, supposing those five exons consist of the five sequences {aca, g, ga, cg, ct} respectively, from Figure 10.2.
(right) The underlying suffix array and BWT string for the reverse index R of the text T = φacaφgφagφgcφtc$.

10.4 The three cases to consider when matching a pattern to a hypertext.

10.5 An example of the pattern aaca matching across a single edge in a hypertext. The pattern suffix ca prefixes nodes with lexicographic rank (lex id) in the range [6, 6] while the pattern prefix aa suffixes nodes with reverse lexicographic rank (rlex id) in the range [2, 3]. Points in the query rectangle [6, 6] × [2, 3] are type 2 matches. A point (a, b) appears in the grid if and only if a node with lexicographic rank a has an incoming edge from a node with reverse lexicographic rank b.

10.6 The two cases to consider when verifying the suffix condition in type 3 matches. (a) The suffix match can form a sub-path initiation event with node 4. (b) The suffix match can form a sub-path extension event with node 2.

Glossary

bandwidth  Given a CRN C = (S, R, S0, send), Bs, the bandwidth of signal species s ∈ S, is the maximum number of copies of s that appears in a multiset I of any reaction (I, P) ∈ R. The bandwidth of C is the sum of bandwidths for all signal species in S.

blunt-end displacement  Any attempted toehold mediated strand displacement where the invading strand does not first bind its toehold domain to the template strand.

bound strand  A strand where one or more of its bases are paired to other bases on a template strand. See also unbound strand.

BWT  Burrows-Wheeler transform.

catalyst  A signal molecule that is required to be present for the application of a corresponding chemical reaction equation. It is neither consumed nor produced when acting as a catalyst. In an equivalent interpretation, it is both consumed and produced by the application of the corresponding chemical reaction equation.

chemical reaction equation  Either a reversible chemical reaction equation or an irreversible chemical reaction equation.

Chemical Reaction Network (CRN)  Consists of an initial signal multiset and a set of chemical reaction equations. Formally, we define a CRN to be a tuple C = (S, R, S0, send), where

• S is the set of all signal types (or species) of signal molecules used in any reaction.

• R is a set of chemical reaction equations, where each R ∈ R is an ordered pair of multisets of signal molecules. Intuitively, a reaction equation R = (I, P) consumes the signal molecules in I as the input reactants and produces the signal molecules in P as products. Note that it is only the signal molecules in I − P that are actually consumed. The others act as catalysts for the reaction. Our formalism is directional to allow modeling non-reversible reactions; a reversible chemical reaction is modeled as two separate elements of R, i.e., (I, P) and (P, I).

• S0 is the initial signal multiset, where s ∈ S0 → s ∈ S.

• send ∈ S is a signal molecule denoting the end of computation.

closed system  A system, such as a CRN or DSD, is closed if no outside interference can occur such as the removal or addition of signals.
configuration graph  A configuration graph for a computation has a node for every possible state on every possible input for the underlying Turing machine being modeled. There is a directed edge from node i to node j if and only if state j is reachable from state i in a single state-transition of the Turing machine.

consumed  A signal molecule is said to be consumed when it is removed from the current signal multiset due to the application of a chemical reaction equation.

crosstalk  In molecular programs, crosstalk occurs when signals from different copies of the system present in the same reaction volume interfere with the intended sequence of reactions that would occur if only a single copy were present.

current signal multiset  The multiset of all signal molecules currently present in the reaction volume.

DNA Strand Displacement system (DSD)  Consists of one or more signal strands and double stranded complexes. Signal strands can be consumed and produced by means of toehold mediated strand displacement.

double stranded complex  One template strand base paired with one or more bound strands. See also template strand & bound strand.

energy efficient computation  In a logically reversible computation there is no inherent lower bound on the required amount of energy lost to complete the computation. We call such a computation energy efficient. See also logically reversible computation.

evading strand  The bound strand of a toehold mediated strand displacement.

fuel  See transformer.

fuel-depletion  Denotes a scenario where an insufficient amount of fuel or copies of transformers are available to complete a computation.

illegal displacement  Any toehold mediated strand displacement that is not a legal displacement. See also mismatch displacement, blunt-end displacement & spontaneous displacement.

initial signal multiset  The multiset of all signal molecules present in the reaction volume prior to the application of any chemical reaction equations.

initial tag multiset  The multiset of all tags present in the reaction volume prior to the application of any chemical reaction equations.

invading strand  The signal strand that binds to a template strand in order to displace a currently bound strand during a toehold mediated strand displacement.

irreversible chemical reaction equation  Specifies an event that can consume a multiset of reactants and produce a multiset of products. Some of the specified reactants may be catalysts and are therefore also produced as products.

legal displacement  Any toehold mediated strand displacement where the invading strand first binds its toehold domain to the template strand and the adjacent long domain of the invading strand involved in three way branch migration is identical to the respective long domain of the evading strand that is currently bound to the template.
logically reversible computation  A logically reversible computation is a form of deterministic computation where the configuration graph induced on any particular input forms a chain and each node i along the chain has a directed edge to node j if and only if there is a directed edge from node j to node i. Therefore any state along the chain is reachable (and recoverable) from any other and previous state information is never lost. See also energy efficient computation & configuration graph.

long domain  A longer strand domain that binds irreversibly to complementary regions on template strands and can only be unbound from a template strand by toehold mediated strand displacement.

mismatch displacement  Any attempted toehold mediated strand displacement where the long domains of the one or more invading strands are not identical to the long domain of the evading strand.

mutex strand  A special single copy signal strand that is required to perform any toehold mediated strand displacements. See also transaction.

produced  A signal molecule is said to be produced when it is added to the current signal multiset due to the application of a chemical reaction equation.

product  A signal molecule that is produced due to the application of a chemical reaction equation.

proper chemical reaction equation  Either a reversible chemical reaction equation or an irreversible chemical reaction equation where the number of proper-products equals the number of proper-reactants. See also proper-reactant & proper-product.

proper Chemical Reaction Network (proper CRN)  A CRN or tagged CRN is proper if each of its chemical reaction equations consumes the same number of reactants as it produces products. See also CRN & tagged CRN.

proper-product  A signal molecule that is produced due to the application of a chemical reaction equation and is not consumed by the same reaction (i.e., it is not a catalyst). See also proper chemical reaction equation & proper-reactant.

proper-reactant  A signal molecule that is consumed due to the application of a chemical reaction equation and is not produced by the same reaction (i.e., it is not a catalyst). See also proper chemical reaction equation & proper-product.

QSW construction  A construction proposed by Qian et al. [98] to realize any tagged CRN by a DSD.

reactant  A signal molecule that is consumed due to the application of a chemical reaction equation.

reaction rate  The relative speed that a chemical reaction equation can be applied within a given reaction volume.

reaction volume  The container and medium where chemical reaction equations can be applied using the signal molecules currently present in the container.

required space of a tagged CRN  The minimum size of the reaction volume for a tagged CRN to complete its intended sequence of reactions (computation). See also space complexity of a tagged CRN computation.

reversible chemical reaction equation  A convenience of notation that denotes a reaction that can be applied in either direction (i.e., the products and reactants can switch roles).
Formally this is equivalent to having two irreversible chemical reaction equations, one for each direction.

saturated template strand  A template strand where all long domains and all but one toehold domain are bound to other strands.

signal molecule  An elementary (chemical) molecule that can be consumed and produced by the application of chemical reaction equations.

signal species  A generic term to refer to a type of signal molecule or signal strand rather than a specific instance of that type. See also signal molecule & signal strand.

signal strand  A strand that is not bound to (has no paired bases with) any other strand.

space complexity of a tagged CRN  See space complexity of a tagged CRN computation.

space complexity of a tagged CRN computation  Given a trace ρ for a tagged CRN C = (S, R, S0, send, T), let S∗ be the largest signal multiset of the sequence of multisets induced by ρ. The space complexity is defined to be |S∗| + |T|.

spontaneous displacement  Any event where a bound strand spontaneously breaks its base pairs with the corresponding template strand to become a signal strand.

state of a CRN  Defined by the current signal multiset of the CRN.

strand domain  A subsequence of a strand used in displacement reactions.

tag  A special signal assigned to chemical reaction equations that are implemented in a DSD to denote the required state of a transformer. In a reversible chemical reaction equation, the required tag for the reaction in the reverse direction is produced by the forward reaction, and vice versa.

tagged chemical reaction equation  Either a reversible chemical reaction equation or an irreversible chemical reaction equation which additionally requires that a tag specific to that reaction is present in the reaction volume before it can be applied. In a reversible chemical reaction equation, the required tag for the reaction in the reverse direction is produced by the forward reaction, and vice versa.

tagged Chemical Reaction Network (tagged CRN)  Consists of an initial signal multiset, an initial tag multiset, and a set of tagged chemical reaction equations. Formally, we define a tagged CRN to be a tuple C = (S, R, S0, send, T, T0), where all members are defined the same as a CRN and additionally T is the set of all tag species, and T0 is the initial tag multiset, containing one or more tags for each reaction R ∈ R. See also CRN, tag & initial tag multiset.

template strand  A long strand that can bind one or more signal strands.

three way branch migration  See toehold mediated strand displacement.

toehold domain  A short strand domain that binds reversibly to complementary regions on template strands.

toehold mediated strand displacement  The toehold domain of an invading strand A binds (forms base pairs) to the complementary toehold of the template strand B.
Then, in a random walk process (often referred to as three way branch migration), the bases of the long domain of A compete with those belonging to the identical long domain of the evading strand C to form base pairs with the complementary long domain of the template strand B that must be adjacent to the toehold domain. Once the long domain of A has bound to its complement of B, C remains bound to B by just its short toehold domain. The toehold bonds can break, thereby releasing signal C. (Of course A may detach from the template before C is released, in which case the displacement does not happen.)

trace  Given a CRN C = (S, R, S0, send), a trace for C is a sequence of reactions ρ = R1, R2, . . . , Rm from R, where each Ri = (Ii, Pi), such that ρ induces a corresponding sequence of multisets S0, S1, . . . , Sm, with S0 being the multiset of initial signal molecules in C, and for all 1 ≤ i ≤ m, we have both Ii ⊆ Si−1 and Si = Si−1 − Ii + Pi.

transaction  In the context of DSD, a transaction is a sequence of toehold mediated strand displacements, the first of which consumes a mutex strand and the last of which produces a mutex strand.

transformer  A collection of one or more strands, some forming a double stranded complex, which are used to implement a chemical reaction equation by consuming a set of signal strands and producing an alternate set of signal strands.

unbound strand  A strand where none of its bases are paired to other bases on a template strand. See also bound strand.

universal toehold  When all toehold domains in a DSD share a common sequence they are said to be universal.

waste  One or more bound and unbound strands of a transformer used only once to effect a chemical reaction equation. The strands remain in the reaction volume in a closed system.
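The trace semantics defined above lend themselves to a short executable illustration. The following Python sketch is not part of the thesis; it simply applies a candidate trace to an initial signal multiset, checking the applicability condition Ii ⊆ Si−1 and computing Si = Si−1 − Ii + Pi at each step. The signal names in the example (b1_0, b1_1, and so on) are hypothetical placeholders for the bit signals of a counter-style CRN.

```python
from collections import Counter

def apply_trace(initial, trace):
    """Return the multisets S0, S1, ..., Sm induced by a trace.

    `initial` maps signal names to counts (S0); `trace` is a list of
    reactions, each an (I, P) pair of dicts mapping signal names to counts.
    Raises ValueError if some reaction is not applicable (Ii not in Si-1).
    """
    states = [Counter(initial)]
    for i, (reactants, products) in enumerate(trace, start=1):
        current = states[-1]
        needed = Counter(reactants)
        if any(current[s] < c for s, c in needed.items()):
            raise ValueError(f"reaction {i} is not applicable in {dict(current)}")
        # Si = Si-1 - Ii + Pi under multiset arithmetic
        states.append(current - needed + Counter(products))
    return states

if __name__ == "__main__":
    # Hypothetical example: flip the low-order bit of a 3-bit counter.
    S0 = {"b1_0": 1, "b2_0": 1, "b3_0": 1}
    flip_bit_1 = ({"b1_0": 1}, {"b1_1": 1})
    states = apply_trace(S0, [flip_bit_1])
    for state in states:
        print(dict(state))
    # Size of the largest multiset along the trace; the |T| tag term from
    # the space-complexity entry is not modeled in this untagged sketch.
    print("largest multiset:", max(sum(s.values()) for s in states))
```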
Notes on reading the text

By definition every PhD thesis should be unique and I hope this case is no exception. In what follows, I provide two distinct, self-contained and seemingly unrelated thesis parts. Both are motivated by problems related to nucleic acids, and my interest lies in the underlying combinatorial challenges they present. Briefly, in Part I, Space and energy efficient molecular programming: energy barriers, chemical reaction networks, and DNA strand displacement systems, I explore the combinatorial challenges associated with predicting and designing low-energy barrier folding pathways of one or more nucleic acid strands. In Part II, Space efficient text indexes motivated by biological sequence alignment problems, I design a number of compressed data structures to support efficient pattern matching queries for applications involving nucleic acid sequence alignment.

While distinct, and self-contained, both thesis parts share common themes that motivated me to pursue both directions simultaneously. In particular, both parts address combinatorial problems related to nucleic acids and also explore the theme of space efficiency. Otherwise each thesis part stands on its own, with its own introduction and conclusion, and can be read independently. Each part has been written with a different research community in mind; however, a significant effort has been made to provide a sufficient background so that either part can be read by an interested computer science researcher.

On the use of I, the candidate, and we

In all technical chapters, I will avoid the use of the personal pronoun I. When specifically identifying myself, I will often use the term the candidate. Even when presenting material where I was the sole author, I will say we instead of I, as is tradition in scientific writing. However, when an opinion is being expressed, it can be assumed that the opinion is mine, but is not necessarily the opinion of my co-authors. This is often the case in chapter summaries and conclusions where I have had the benefit of considering the totality of the research considered in this thesis. When rewriting significant portions of published work, I have taken the opportunity to think of the implications beyond what was originally presented in isolation. I believe deeper insight is offered in this current form for both the motivations, and the contributions, of the research undertaken in this thesis.

Acknowledgements

I owe the greatest debt of gratitude to my research supervisor Professor Anne Condon. Anne not only supervised my PhD research, but also co-supervised my master's research. Most everything I know about being a scientist I have learned from her. I could not have found a better mentor. Anne's example continues to serve as an inspiration to become a better scientist, a better community citizen and a better person. Thank you Anne for your time, your mentorship and friendship, the academic freedom to pursue a number of research directions, and for teaching me how to be a scientist. It has been a wonderful experience to know you and work alongside you.

Thank you to my committee members, Professor Will Evans and Professor Arvind Gupta. My research and the writing in this thesis have benefited from your feedback, suggestions and support. Thank you Will for many helpful discussions, and in particular I would like to thank you for your suggestions and feedback that significantly improved the writing in the second part of my thesis. Thank you Arvind for all of your great advice over the years, both in terms of research and also more generally.

A special thanks to Dr. Jan Maňuch who has been not only my most active collaborator, but also a good friend. I have enjoyed the countless hours we have spent working together on problems and have learned a lot from you, for which I am grateful. I have had many other wonderful collaborators over the years, and for that I am lucky. In particular, thank you to Professor Holger Hoos and Professor Alan Hu who have been a great source of advice at different stages of my research career. I also had the good fortune to work with a number of talented undergraduate students including Jay Zhang, Daniel Lai, John Cheu and Leigh-Anne Mathieson.

Thanks to current and former members of the βeta lab at UBC, including Frank Hutter, Dave Tompkins, Hosna Jabbari, Mirela Andronescu, Bonnie Kirkpatrick, Baharak Rastegari, Murray Patterson, and Monir Hajiaghayi for their friendship, help and feedback. I would also like to thank the members of the DNA computing community who have offered insightful and constructive feedback for the research found in this thesis. In particular I would like to thank Professor Erik Winfree, Dr. Lulu Qian, Dr. David Soloveichik and Dr. Dave Doty for helpful discussions that clarified my understanding of a number of complex topics.
I would also like to acknowledge the generous funding of my research from scholarships given by the National Science and Engineering Research Council of Canada (NSERC), the Michael Smith Foundation for Health Research (MSFHR) and also funding from UBC and my supervisor. xxvii  Acknowledgements A special thanks to my family who have always encouraged me in everything I have done. Finally, thank you Meagan for your unwavering support and the happiness you give me. Completing this thesis meant many evenings and weekends spent in a lab away from you. The time put into this thesis was not entirely mine to spend and so this work is as much yours as it is mine.  xxviii  Dedication For Meagan.  xxix  Part I Space and energy efficient molecular programming: energy barriers, chemical reaction networks, and DNA strand displacement systems  1  Chapter 1  Introduction In this chapter we motivate work in Part I of our thesis, on nucleic acid folding pathways. Our research in this area has two primary motivations. First, the work was begun with the aim to better understand and computationally predict folding pathways exhibited in biological systems. During the course of that initial research, we realized the computational hardness of solving the prediction problem. This suggested to us that folding pathways may be a mechanism for performing non-trivial computation. Indeed, designed folding pathways of multiple interacting nucleic acid strands were already being used to perform simple computation. This potential was the impetus for our second motivation: understanding the computational limits of molecular programs that leverage folding pathways. Such pathways also have the potential to perform logically reversible, and thus energy efficient, computation. A better understanding of these designed pathways can shed light on the complexity of predicting folding pathways involving multiple strands. In what follows, we discuss each motivation and its related work, summarize the contributions of this thesis part, and give an overview of the ensuing chapters that detail our technical contributions. This chapter introduces concepts as needed to understand our motivations and gives a comprehensive overview of the models we study in this thesis. Additional concepts and definitions are introduced as needed in later chapters.  1.1  Nucleic acid folding pathways  RNA molecules play vital roles in the cell by virtue of the information encoded into their nucleotide sequence and the structures the molecules form. The primary structure, or nucleotide sequence, can be thought of as a string over the alphabet ΣRN A = {A, C, G, U }. The tertiary, or 3-dimensional, structure of an RNA is determined in large part by the bonds formed between pairs of complementary nucleotides within its sequence, such as the Watson-Crick base pairs: A pairs to U and C pairs to G1 . These bonds constitute the secondary structure of the molecule. An example of two alternate secondary structures for the same nucleotide sequence is given in Figure 1.1(a). Shown in Figure 1.1(b) is a common representation for secondary structures called the arc diagram representation, where each arc denotes a base pair. Prediction of RNA secondary structure is 1 Formal  definitions are given in the following chapter.  2  1.1. Nucleic acid folding pathways crucial to understand their myriad2 biological functions. Throughout, we will use the term structure to mean secondary structure. 
Given their propensity to alter their shape over time under changing environmental conditions, an RNA molecule will fold through a series of structures called a folding pathway [4, 42, 55, 104, 115, 145]. Thus knowledge of folding pathways between pairs of alternative RNA structures is very valuable for inferring RNA function in such environments, and is also valuable for predicting RNA structure, e.g., in light of co-transcriptional folding [20, 42, 108, 117, 133]. As illustrated in Figure 1.2, each structure differs from its predecessor by a single base pair (or equivalently by a single arc in the arc diagram representation).

Figure 1.1: (a) An initial secondary structure (left) and a final secondary structure (right) for a given RNA strand. (b) A corresponding arc diagram. The arcs on top of the nucleotide sequence denote the base pairs in the initial secondary structure while the arcs below the nucleotide sequence denote the base pairs of the final secondary structure.

Much focus to date has been on pathways of pseudoknot-free secondary structures—structures in which no base pairs cross in the arc diagram representation3. Since folding is a thermodynamically-driven probabilistic process, folding pathways tend to avoid high-energy structures4. As a result, many methods for predicting folding pathways or entire energy landscapes5—particularly coarse-grained methods designed to work for large structures which do not attempt to model the complete energy landscape—are guided by calculations of the energy barrier [42, 125]. Intuitively, this is the highest energy difference between the initial structure and the other structures along a pathway. For example, the folding pathway in Figure 1.2 has an energy barrier of 2 under the simple energy model where each base pair contributes -1 to the overall energy of a structure. Contrast this with a naive folding pathway that first removes all base pairs in the initial structure before adding all base pairs in the final structure, resulting in an energy barrier of 4. There is a rich literature on the problem of predicting folding pathways and energy landscapes, both in theory and in practice; see the recent work of Geis et al. [42], Tang et al. [125] and the references therein.

Figure 1.2: (a) A possible folding pathway is shown for an initial structure A transitioning through intermediate structures (B, C, . . .) until the final structure I is reached. For a particular position in the pathway, the top of the arc diagram denotes the current base pairs of the structure and the bottom of the arc diagram denotes the base pairs still to be added to reach the final structure. Each structure along the pathway differs from its neighbours by one arc. (b) The corresponding energy plot. The barrier in this example is two.

2 While the number of biological functions of RNA in a cell is certainly finite, new functions of this versatile molecule continue to be elucidated.
3 Complex pseudoknots are rare in nature and therefore algorithmic approaches usually assume their absence in any reasonable solution space.
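The bookkeeping behind these barrier values is easy to mechanize. The following Python sketch is illustrative only: arcs are treated as opaque labels and the pathway is assumed to be structurally valid, so the code tracks nothing but the running energy difference under the simple energy model in which each base pair contributes -1.

```python
def barrier_of_pathway(ops):
    """Energy barrier of a direct folding pathway under the simple energy model.
    ops is a sequence of ('+', arc) / ('-', arc) steps; the barrier is the
    largest energy increase, relative to the initial structure, seen along the way."""
    diff, barrier = 0, 0
    for op, _arc in ops:
        diff += 1 if op == '-' else -1   # removing an arc raises the energy by 1
        barrier = max(barrier, diff)
    return barrier

# Hypothetical instance: four arcs i1..i4 must be removed and four arcs f1..f4 added.
removals = [('-', a) for a in ('i1', 'i2', 'i3', 'i4')]
additions = [('+', a) for a in ('f1', 'f2', 'f3', 'f4')]

naive = removals + additions                    # remove everything, then add everything
interleaved = [('-', 'i1'), ('-', 'i2'), ('+', 'f1'), ('+', 'f2'),
               ('-', 'i3'), ('-', 'i4'), ('+', 'f3'), ('+', 'f4')]

print(barrier_of_pathway(naive))        # 4
print(barrier_of_pathway(interleaved))  # 2, provided such an interleaving is valid
```

Whether an interleaving like the second one exists depends on which initial and final arcs conflict; finding an ordering of arc operations that keeps this maximum small is exactly the problem studied in this thesis part.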
We focus here on algorithms for energy barrier calculation which are an important component of many approaches to estimation of entire energy landscapes. Such methods have been proposed, for example, by Morgan and Higgs [83], Wolfinger [143], Flamm et al. [36–38], Geis et al. [42] and Dotu et al. [29]. 4 These structures are energetically unfavourable. In simple energy models, high-energy structures have fewer base-pairs than alternative structures of the same molecule. 5 A discrete interpretation of an energy landscape for an RNA molecule is the set of all pathways between all structures the molecule can form.  4  1.1. Nucleic acid folding pathways Several versions of the energy barrier problem have been studied, which are distinguished by properties of the intermediate structures. Morgan and Higgs focus on direct folding pathways from structure A to structure B in which intermediate structures contain only arcs in A ∪ B and such that the total pathway length is |A B| (the size of the symmetric difference). In such pathways, each arc from the initial structure not also in the final structure is removed exactly once and each arc from the final structure not also in the initial structure is added exactly once along the pathway. The example in Figure 1.2 is a direct pathway. A larger class of pathways is obtained by allowing the length of the pathway to exceed |A B|. We call such pathways direct-with-repeats pathways since an arc from A or B may be added or removed multiple times along the pathway. An even more general class of pathways allows intermediate structures to contain “temporary” arcs which are neither in A nor in B. These temporary arcs would denote base pairs that are not found in either the original nor the final structure. Morgan and Higgs call such pathways indirect. Thus, direct pathways are a subclass of direct-with-repeats pathways, which in turn are a subclass of indirect pathways. Morgan and Higgs assume the simple energy model in which each base pair contributes −1 to the total free energy. Using a randomized greedy approach, they construct several low-barrier direct pathways and take the minimum energy barrier of these as their estimate. (They also construct indirect pathways using a “single link clustering” method.) Wolfinger et al. use a barrier tree to represent the folding landscape; identifying nodes in the tree (which are called saddle points) is analogous to calculating energy barriers. Flamm et al.’s method [37] for approximating energy barriers explores direct pathways by performing a breadth-first search, maintaining the best m candidate solutions at each step. As m becomes large, the search does become exhaustive, yielding an exact solution, however exponential runtime and memory are required. The program barriers [38] is capable of computing exact direct and indirect pathways, provided a complete sample of low energy states separating the two structures is provided. However, this approach is also exponential in runtime and space and thus impractical for medium (100-500 nucleotides) or large (> 500 nucleotides) problem instances. For their Kinwalker folding pathway predictor, Geis et al. [42] describe a heuristic which explores the space of possible direct pathways in a more sophisticated manner than does the Morgan-Higgs heuristic, incorporating a parameter look-ahead technique to avoid excessive runtimes. 
While their method uses the Turner energy model [74, 79, 134]6 to evaluate energy barriers, it relies on simple addition and removal of base pairs (and thus the simple energy model) while generating putative low-barrier pathways. It is important to note that, in general, determining energy barriers is not restricted to alternate structures of a single RNA (or DNA) strand (molecule). The idea of computing an energy barrier can be generalized to consider alternate 6 The Turner energy model is a more realistic energy model parameterized by experimental data and is the standard model used for algorithmic prediction of nucleic acid secondary structure.  5  1.1. Nucleic acid folding pathways structures of sets of multiple interacting strands. Inter-molecular base-pairs can form between strands in addition to intra-molecular base-pairs. As we discuss in the next section, the ability for strands to predictably interact can be exploited to design and perform molecular computation. Given the importance of understanding interactions of multiple strands in problem domains such as DNA computing, there are already efforts to provide practical and effective probabilistic simulation tools—see the work of Schaeffer [112] and references therein. In summary, all current methods are either heuristic in nature and thus are not guaranteed to find the exact energy barrier between two structures, even for a single molecule, or are exponential in both runtime and space, precluding their use on even medium sized problem instances. Thus there is strong motivation for finding a fast method which can exactly compute the energy barrier between two structures. Indeed, the Geis et al. method can estimate energy barriers for structures of long sequences (1,500nt or more) but the authors note that “as the performance of Kinwalker crucially depends on approximating saddle heights, further improvements to the Morgan-Higgs heuristic as well as alternative approaches will be investigated”.  1.1.1  The simple energy model  We now formally define the simple energy model. To do so, we must define the concept of a strand complex. A single strand, that has no base pairs to another strand, is a complex. A group of more than one strands forms a complex if (i) no strand in the group has a base pair with a strand outside of the group and (ii) given any partition of the group into two subgroups, there is at least one base pair from a strand in one subgroup to the other. Formally, they form a connected component in the circle arc diagram representation. For instance, Figure 1.5(a) shows a circle arc diagram for three strands where edges denote base pairs; strands C and B form a complex and strand A forms its own complex. In Figure 1.5(b) there is only one complex formed by strands A, B and C. Each Watson-Crick base pair (i.e., A-U and C-G for RNA and A-T and C-G for DNA) contributes −1 to the overall energy. Therefore, the energy of a structure for a single strand can be formally defined as: −#basepairs  (1.1)  For instance, both structures depicted in Figure 1.1(a) are composed of a single complex, contain eight base pairs, and therefore have energy −8. Base pairs are also counted in the multiple strand case. Additionally, there is an entropic penalty constant Kassoc , Kassoc > 1, for each strand association that results in fewer strand complexes. A strand association event occurs when the first base pair is formed between two complexes that were not previously associated; thus, the energy will instantaneously change by Kassoc − 1. 
The  6  1.2. Molecular programming simple energy model for multiple interacting strands can be formally defined as: (#strands − #complexes)Kassoc − #basepairs  (1.2)  The example of Figure 1.5(a) would have energy (3−2)Kassoc −3 = Kassoc −3 whereas the example of Figure 1.5(b) would have energy (3 − 1)Kassoc − 4 = 2Kassoc − 4 as it contains one fewer complex and one additional base pair. As with early studies of RNA structure prediction algorithms, in this thesis we will study folding pathways by adopting the simple energy model. The reasoning for this choice is two-fold. First, this model is significantly simpler and remains sufficient to understand the complexity of the underlying combinatorial problem. If the problem is hard in the simple energy model, it provides evidence that it is hard for more complex models. Second, if effective algorithms are developed in the simple energy model, then it is possible that they could be adapted for more complex energy models. This was the case for the RNA structure prediction problem that was first studied with the simple energy model [89] and later improved to use the Turner energy model [79].  1.2  Molecular programming  The area of molecular programming enjoys active research from both theoreticians and experimentalists due in part to its promise of embedded logical computation that can naturally interface with biological systems. For instance, if a condition is detected in a cell, then a certain therapeutic agent can be released. A widely studied and experimentally practical model of computation in molecular programming entails so-called DNA7 strand displacement systems (DSD). DSDs leverage the fact that substrings of DNA strands will hybridize to their perfect complements and can also displace other bound strands sharing the same substring. By a careful, non-trivial design of strands one can realize a complex, yet deterministic computation. DSDs have been experimentally implemented and verified to simulate logic circuits [21, 116], neural networks [99], and DNA walkers [119], among numerous other applications [51, 75, 92, 97, 122, 136, 151]. They have also been shown capable, in principle, of energy-efficient Turinguniversal computation [64, 98]. An underlying property of DSDs is that intended sequences of strand displacements form low-energy barrier pathways, while unintended sequences must over come a high-energy barrier. This connection to folding pathways was our initial motivation for studying DSDs in this thesis. However, as discussed below, DSDs consider strands at an abstract domain level, and not at the sequence level. Thus, any formal conclusions that we draw about underlying folding pathways of DSDs will necessarily consider a corresponding sequence design, and will be in terms of the multiple interacting strand folding pathway model. 7 DNA is the molecule of choice for molecular computing because of its stability in comparison to RNA. Concepts like secondary structure still apply as DNA forms base pairs just as RNA does, although using a different set of nucleotides ({A, C, G, T}).  7  1.2. Molecular programming  1.2.1  DNA strand displacement systems (DSD)  A DNA Strand Displacement system (DSD) consists of signal strands and double stranded complexes consisting of one or more bound strands to a long template strand . Strands that are not bound are said to be unbound strands. DNA strands are oriented and have a 5’ end and a 3’ end. A strand can only bind to another strand in opposite orientation. 
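Before turning to a concrete DSD example, note that the energy function of Equation (1.2) is simple to evaluate mechanically: complexes are connected components of strands joined by base pairs. The sketch below (Python, for illustration only; the readings of Figure 1.5(a) and (b) are taken from the discussion above) reproduces the two energies just quoted.

```python
def simple_energy(num_strands, base_pairs, k_assoc):
    """Equation (1.2): (#strands - #complexes) * K_assoc - #basepairs.
    base_pairs lists, for every base pair, which two strands it joins
    (the same strand twice for an intra-strand pair)."""
    parent = list(range(num_strands))

    def find(x):                      # union-find over strands
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in base_pairs:
        parent[find(i)] = find(j)     # strands joined by a base pair share a complex

    num_complexes = len({find(s) for s in range(num_strands)})
    return (num_strands - num_complexes) * k_assoc - len(base_pairs)

K = 2  # K_assoc
A, B, C = 0, 1, 2
# Figure 1.5(a): three base pairs, all between strands B and C; strand A is free.
print(simple_energy(3, [(B, C)] * 3, K))             # K - 3   ->  -1
# Figure 1.5(b): A's toehold has also bound B, so one complex and four base pairs.
print(simple_energy(3, [(B, C)] * 3 + [(A, B)], K))  # 2K - 4  ->   0
```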
Consider the example DSD in Figure 1.3. There are two signal strands A and B, and one double stranded complex. The 3’ ends of the strands are depicted in the example with arrows ( ). The bound strands C and D have opposite orientation to the template strand E.  A  C  B  D E  Figure 1.3: A DNA strand displacement system consisting of two signal strands, A and B, and one double stranded complex consisting of two bound strands C and D and a template strand E. Long domains of the template strand are shown in red while long domains of signal strands and bound strands are shown in gray. Universal toehold domains are shown in black. Strands in the system are composed of two types of strand domains: short toehold domains, and long domains. All DSDs that we study make use of universal toeholds meaning that all toehold domains share a common sequence. Distinct long domains are assumed to have a distinct sequence design. In the example DSD of Figure 1.3, universal toehold domains are shown in black and long domains of bound and signal strands are shown in gray. The long domains of the template strand E are shown in red. The template strands have complementary toehold domains to those on signal and bound strands. Toehold mediated strand displacement (b)  A  B  A  (a)  C  C  C C B  A  C B  A B  (c)  A B  Figure 1.4: Strand displacement. (a) Toehold (black subsequence) of signal strand A binds with its unpaired complement on the template strand B. (b) The single long domain (gray subsequence) of A competes via a random walk process with the single long domain of strand C to bind with the complementary long domain of B until all bases of A are bound to B. (c) Toehold of C detaches from B, at which point it has been displaced and becomes a signal. The process is reversible and signal strand C could next displace the bound strand A. 8  1.2. Molecular programming Toehold domains bind reversibly, and long domains irreversibly, to complementary regions on template strands. The fundamental operation in a DSD is toehold mediated strand displacement, whereby a toehold domain of a signal strand, called the invading strand , binds to an unbound complementary toehold domain of a template strand and, if the adjacent long domain is complementary, it can displace a currently bound signal strand, called the evading strand , of the same length. We illustrate a simple, reversible version of toehold mediated strand displacement in Figure 1.4. First, the toehold of invading signal strand A binds (forms base-pairs) to the complementary toehold of the template strand B. Then in a random walk process (often referred to as branch migration), the bases of the long domain of A compete with those belonging to the identical long domain of the evading strand C to form base pairs with the complementary long domain of the template strand B. Once the long domain of A has bound to its complement of B, C remains bound to B by just its short toehold domain. The toehold bonds can break, thereby releasing signal C. (Of course A may detach from the template before C is released, in which case the displacement does not happen.) The displacement is reversible because signal C can bind to the template strand B to displace strand A via the same principles. Folding pathway of a strand displacement Figure 1.5 illustrates the same displacement in terms of the corresponding folding pathway for a hypothetical sequence design where toehold domains have a length of one base and long domains have a length of two bases. 
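Before walking through the energetics in detail, we note that the whole pathway of Figure 1.5 can be checked mechanically against Equation (1.2). The per-step complex and base-pair counts below are our reading of the figure (one-base toeholds, two-base long domains) and should be treated as illustrative.

```python
K = 2        # K_assoc, as in the caption of Figure 1.5
STRANDS = 3  # invading strand A, evading strand C, and the template B

def energy(num_complexes, num_basepairs):
    return (STRANDS - num_complexes) * K - num_basepairs   # Equation (1.2)

# (complexes, base pairs) for configurations (a)-(g):
# (a) C bound to B by its toehold plus its 2-base long domain, A free;
# (b) A's toehold binds B; (c)-(f) branch migration trades the two
# long-domain pairs one at a time; (g) C's toehold breaks and C dissociates.
steps = {'a': (2, 3), 'b': (1, 4), 'c': (1, 3), 'd': (1, 4),
         'e': (1, 3), 'f': (1, 4), 'g': (2, 3)}

energies = {s: energy(c, bp) for s, (c, bp) in steps.items()}
barrier = max(e - energies['a'] for e in energies.values())
print(energies)   # a, g: K-3 ; b, d, f: 2K-4 ; c, e: 2K-3
print(barrier)    # K_assoc (= 2 here), matching the caption of Figure 1.5
```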
After the first (and only) toehold base pair is formed between strand A and B, relative to that energy, branch migration of the long domains of A and C occur within energy barrier 1. This is because a base pair between A and B can be added immediately after a base pair between C and B is removed. Illegal strand displacement The example in Figure 1.4 and Figure 1.5 illustrates a legal displacement. Now let us consider how an evading strand can be displaced (i.e., produced as a signal) from a template other than by a legal displacement. First, it is possible that one or more invading strands with a different long domain are used for displacement. We call this a mismatch displacement. Second, it is possible the invading strand does have an identical long domain, but toehold base pairs between the invading strand and template strand are not formed prior to branch migration. We call this blunt-end displacement. Finally, it is possible that the evading strand simply breaks all base pairs with the template strand and disassociates. We call this a spontaneous displacement. The occurrence of any one of the three types of illegal displacements would result in a folding pathway with a higher energy barrier. Consistent with Soloveichik et al. [122], we assume throughout that only legal displacements can occur and that sequences of domains can be designed to be sufficiently different that a strand with domain δ is very unlikely 9  1.2. Molecular programming  A  A  C  A  C  C  B  B  B  (a)  (b)  (c)  A  A  A  C  C  C  B  B  (d)  B  (e)  (f)  A  c  2Kassoc − 3  2Kassoc − 4  C  b  e  d  f  B  (g)  Kassoc − 3  a  g  Figure 1.5: A corresponding folding pathway is shown for the displacement example of Figure 1.4 using a simple sequence design where toehold domains have one base and long domains have two bases. The displacement of strand C by strand A is shown in seven steps, from (a) to (g). Base pairs are shown as edges between strands. The energy changes between each structure, assuming Kassoc = 2, are shown in the bottom right. Since toehold domains have one base, then the energy barrier of the underlying folding pathway, relative to (a), is Kassoc . If the toehold domains had length LT > 1 then the energy barrier, relative to (a), would be Kassoc − 1.  10  1.2. Molecular programming to displace a strand with domain δ .8  1.2.2  Chemical reaction networks (CRN)  Just as a DSD abstracts sequence level details of a folding pathway using the concept of domains, Chemical Reaction Networks (CRNs) abstract details about displacements. CRNs provide a concise language for writing molecular programs and affords us the opportunity to express complex ideas more succinctly in this thesis. Chemical reactions and signal molecules A chemical reaction equation details a process whereby certain molecule types can be consumed —the reactants—and others produced —the products—within some reaction volume. A reaction may also require the presence of catalyst molecules of certain types. We refer to all three categories, generically, as signal C molecules. For example, the reaction A + B → D consumes a signal of type A and a signal of type B and produces a signal of type D in the presence of the catalyst9 signal C. This is an example of an irreversible chemical reaction C  equation; however, A + B D is an example of a reversible chemical reaction equation meaning that both a signal of type A and of type B can also be produced by consuming a signal of type D in the presence of the catalyst signal C. 
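One convenient way to make the role of catalysts explicit is to record a reaction as an ordered pair of multisets of signals, which is exactly the formalization we adopt later in this section. A minimal sketch, using Python's Counter as a multiset:

```python
from collections import Counter

# The catalytic reaction  A + B --C-->  D  as a (reactants, products) pair;
# the catalyst C appears in both multisets.
forward = (Counter({'A': 1, 'B': 1, 'C': 1}), Counter({'C': 1, 'D': 1}))
# A reversible reaction is modeled as the two directions taken separately.
reverse = (forward[1], forward[0])

consumed = forward[0] - forward[1]   # {'A': 1, 'B': 1}: only these are used up
produced = forward[1] - forward[0]   # {'D': 1}
print(dict(consumed), dict(produced))
```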
A CRN is a set of chemical reactions, in addition to a multiset of signals present within the reaction volume, prior to any reaction occurring, called the initial signal multiset. The current signal multiset is the current composition of signals of a given CRN within a reaction volume. In this work, we consider a reaction volume to be a closed system, meaning that signals cannot be added to the current signal multiset unless they are produced by a reaction, and signals cannot be removed from the current signal multiset unless they are consumed by a reaction. Example 1.2.1. Let us consider a concrete example of a 3-bit standard binary counter that should begin at count 000, advance to 001, and so on, until reaching the count 111. In our molecular program, we let signal 0i and signal 1i denote that bit i has value 0 and 1, respectively, for 1 ≤ i ≤ 3. Thus, our 3-bit counter will have the following initial signal multiset: {03 , 02 , 01 }. Figure 1.6(a) gives three chemical reaction equations for exchanging signals and thus changing the state, or current signal multiset, of the counter. Figure 1.6(b) represents all 8 This is a reasonable assumption one can make and can be shown formally using results from coding theory. Schulman and Zukerman [114] show how to construct a set of 2Θ(n) domains (i.e., binary strings in their code) of equal length Θ(n) such that the energy barrier (Levenshtein distance) between any pair of domains is at least cn, for any given constant c. 9 Some reactions require the presence of one or more signals, called catalysts, which they do not consume. Note how we represent catalysts in our reaction equations. These are not to be confused with rate constants which do not significantly factor into our research. Catalysts do play a significant role our thesis research and this representation was chosen for its succinctness.  11  1.2. Molecular programming (1) 01 (2) 02 + 11 (3) 03 + 12 + 11  11 12 + 0 1 13 + 0 2 + 0 1  (a)  {03,02,11}  {03,12,11}  {13,02,11}  {13,12,11}  1-for  2-for  1-for  3-for  1-for  2-for  1-for  1-rev  2-rev  1-rev  3-rev  1-rev  2-rev  1-rev  {03,02,01}  {03,12,01}  {13,02,01}  {13,12,01}  (b)  Figure 1.6: (a) Chemical reaction equations for a 3-bit standard binary counter. (b) The configuration graph of the computation performed by the 3-bit standard binary counter forms a chain and is logically reversible. The nodes represent the state of the computation and the edges are directed between states reachable by a single reaction. reachable states of the counter as nodes and has edges between states that are reachable within one reaction step. Chemical reaction rates When reasoning about time complexity of various chemical reaction networks, we will use the well known stochastic chemical kinetics model [43]. This model, based on a continuous time Markov process, permits us to reason about the probability and the expected time to completion of sequences, and of individual, chemical reactions within a well mixed reaction volume. In general, each reaction has an associated reaction rate denoting the relative speed of the reaction within some defined reaction volume. The rate of a reaction is dependent on (i) the order of the reaction, (ii) the size of reaction volume, and (iii) the reaction rate constant. In all examples of this thesis, we will assume a reaction volume of size v and assume a uniform reaction rate constant, k = 1. As all reactions share the same reaction rate constant, we omit them when reasoning about the expected reactions times. 
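As a generic illustration of this kinetics model, one step of a standard stochastic simulation looks as follows. The sketch assumes reactions whose reactants are distinct species and a unit rate constant; the propensity formula it uses is spelled out in the next paragraphs.

```python
import random

def propensity(reactant_counts, v):
    """Propensity of a reaction with distinct reactants R1 + R2 + ... :
    |R1||R2|...|Ri| / v^(i-1), with rate constant k = 1."""
    p = 1.0
    for count in reactant_counts:
        p *= count
    return p / (v ** (len(reactant_counts) - 1))

def simulation_step(reactions, state, v, rng=random):
    """One step: sample the exponential waiting time (rate = sum of propensities)
    and pick the next reaction with probability proportional to its propensity."""
    props = [propensity([state[r] for r in rxn['reactants']], v) for rxn in reactions]
    total = sum(props)
    if total == 0:
        return None, float('inf')        # no reaction is currently possible
    wait = rng.expovariate(total)
    nxt = rng.choices(reactions, weights=props, k=1)[0]
    return nxt, wait

# Tiny example: reaction (1) of the 3-bit counter and its reverse, in volume v = 10.
rxns = [{'name': '1-for', 'reactants': ['01']}, {'name': '1-rev', 'reactants': ['11']}]
state = {'01': 1, '02': 1, '03': 1, '11': 0, '12': 0, '13': 0}
print(simulation_step(rxns, state, v=10))
```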
The order of a reaction is the number of required reactants. For instance, R1 → . . . is a unimolecular reaction, R1 + R2 → . . . is a bimolecular reaction, R1 + R2 + R3 → . . . is a trimolecular reaction, and so on. In a volume of size v and assuming a uniform rate constant k = 1, 1| the propensity of a unimolecular reaction R1 → . . ., is |R v 0 = |R1 |, where |R1 | denotes the number of copies of signal R1 in the current state. The propen2| sity of a bimolecular reaction, R1 + R2 → . . . is |R1v||R , assuming R1 = R2 , 1 2|R1 |−1 and is otherwise. In general, the propensity of an ith order reaction, v1 2 |...|Ri | assuming i distinct reactants, is |R1 ||R . The expected time until the next v i−1 reaction is an exponential random variable with a rate, r, equal to the sum of the propensities of reactions that can occur. The probability of a particular 12  1.2. Molecular programming reaction occurring next is equal to its propensity divided by r. Intuitively, for all reactions that can occur in a particular state (i.e., all reactants are present in sufficient quantity), higher order reactions are less likely than lower order reactions. In this thesis, many of the CRNs we propose will initially contain higher order reactions. However, when implemented as DSDs, all reactions will be bimolecular reactions between two distinct species, each of a single copy. A further property of the CRNs we propose in this thesis is that in any given state, at most two reactions are possible. Thus, in the DSD realizations of our CRNs (see Section 1.2.5), the propensity of a reaction that can occur is v1 , and the expected time for the reaction to occur is O(v). CRNs as a means for computation We will always define the CRNs we study in this work using the chemical reaction notation already introduced. However, it is helpful to have a formal definition of a CRN when proving certain results. This eliminates ambiguity that may arise, for example, when reasoning about reversible reactions and catalysts. In addition, a formal definition will permit us to define what it means to perform computation with a CRN in a reaction volume that is a closed system. We define a Chemical Reaction Network (CRN) to be a tuple C = S, R, S0 , send , where • S is the set of all signal types (or species) of signal molecules used in any reaction. • R is a set of chemical reaction equations, where each R ∈ R is an ordered pair of multisets of signal molecules. Intuitively, a reaction equation R = (I, P ) consumes the signal molecules in I as the input reactants and produces the signal molecules in P as products. Note that it is only the signal molecules in I − P that are actually consumed. The others act as catalysts for the reaction. Our formalism is directional to allow modeling non-reversible reactions; a reversible chemical reaction is modeled as two separate elements of R, i.e., (I, P ) and (P, I). • S0 is the initial signal multiset where s ∈ S0 → s ∈ S. • send ∈ S is a signal molecule denoting the end of computation.10 The state of a CRN is defined by its current signal multiset. We formalize computations in C in the natural manner: Let ρ be a sequence of reactions R1 , R2 , . . . , Rm from R, where each Ri = (Ii , Pi ). We define ρ to be a trace of C if ρ induces a corresponding sequence of multisets S0 , S1 , . . . , Sm , with S0 being the multiset of initial signal molecules in C, and for all 1 ≤ i ≤ m, we have both Ii ⊆ Si−1 and Si = Si−1 − Ii + Pi . (We use “−” and “+” to denote multiset subtraction and union.) 
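The update rule in this definition is directly executable. The sketch below (Python, illustrative) checks Ii ⊆ Si−1 and computes Si = Si−1 − Ii + Pi, using the forward reactions of the 3-bit counter from Figure 1.6(a).

```python
from collections import Counter

def apply_trace(reactions, S0, trace):
    """Induce the multisets S0, S1, ..., Sm from a sequence of reaction names,
    checking that each reaction's reactants are available before it fires."""
    state, induced = Counter(S0), [Counter(S0)]
    for name in trace:
        I, P = reactions[name]
        if any(state[s] < I[s] for s in I):
            raise ValueError(f'reaction {name} is not enabled')
        state = state - I + P            # multiset subtraction and union
        induced.append(state)
    return induced

R = {'1-for': (Counter(['01']), Counter(['11'])),
     '2-for': (Counter(['02', '11']), Counter(['12', '01'])),
     '3-for': (Counter(['03', '12', '11']), Counter(['13', '02', '01']))}

S0 = Counter(['03', '02', '01'])
for S in apply_trace(R, S0, ['1-for', '2-for', '1-for', '3-for']):
    print(sorted(S.elements()))
# {03,02,01} -> {03,02,11} -> {03,12,01} -> {03,12,11} -> {13,02,01}
```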
If ρ is a trace for a completed computation, 10 A computation may have multiple final states. To model this situation, we can let s end be produced in all final reactions, in addition to any other signal molecules that may indicate the result of the computation.  13  1.2. Molecular programming then send ∈ Sm and send ∈ / Sn for n = m. Note that throughout this thesis, we consider the computation to halt when the send signal is first produced. The traces we study for completed computations will reflect this fact. Example 1.2.1 (continued). Let us describe the 3-bit standard binary counter formally. The set of signal types is S = {01 , 02 , 03 , 11 , 12 , 13 , send }, where send denotes the end of computation. The initial signal multiset is S0 = {03 , 02 , 01 }. Finally, we have the following set of chemical reaction equations {R1-for = ({01 }, {11 }), R1-rev = ({11 }, {01 }),  R2-for = ({01 , 12 }, {11 , 02 }),  R2-rev = ({11 , 02 }, {01 , 12 }),  R3-for = ({01 , 12 , 13 }, {11 , 02 , 03 }),  R3-rev = ({11 , 02 , 03 }, {01 , 12 , 13 }), Rend = ({11 , 12 , 13 }, {send })} .  These reactions, with the exception of the last one, formally define the reactions shown in Figure 1.6(a). The shortest trace producing send is the sequence of reactions R1-for , R2-for , R1-for , R3-for , R1-for , R2-for , R1-for , Rend , which induces the following sequence of multisets: {03 , 02 , 01 }, {03 , 02 , 11 }, {03 , 12 , 01 }, {03 , 12 , 11 },  {13 , 02 , 01 }, {13 , 02 , 11 }, {13 , 12 , 01 }, {13 , 12 , 11 }, {send } .  1.2.3  Tagged chemical reaction networks (tagged CRN)  As CRNs are an abstract description for molecular programs, we must consider how they can be realized by physical systems such as DSDs. Continuing with the 3-bit standard binary counter example, we can represent each 0i and 1i with a unique strand, for 1 ≤ i ≤ 3. Figure 1.7 shows two states of the counter and the composition of signal strands representing those states. Thus, state representation is easily achieved, but how does one transition between states? For instance, how can reaction (1) of Figure 1.6(a) be implemented? Unfortunately, we do not know how to change the signal strand 01 directly into the signal strand 11 . However, we do know how to achieve the same result, indirectly. Figure 1.8 shows a strand displacement process—based on toehold mediated strand displacement as discussed in Section 1.2.1—implementing the reaction 01 11 based on a construction proposed by Qian et al. [98]. From top to bottom, the 01 signal strand interacts with a transformer to first become consumed — sequestered on a double stranded complex — and ultimately the 11 signal strand is produced — released from a double stranded complex. The strands contained within a shaded box are the signal strands, while everything else forms the transformer for this reaction. The important point is that the 14  1.2. Molecular programming  - 01  - 03  +  03  - 12  b3  +  +  01  12  b2  b1  (a) The current signal strands denote that the counter has value 010.  - 02  b3  02  - 11  - 13  +  +  13  b2  +  11  b1  (b) The current signal strands denote that the counter has value 101.  Figure 1.7: Representing the state of the CRN for a 3-bit standard binary counter can be achieved by the presence and absence of certain signal strands for each bit position. Long domains for bits representing a 1 value are coloured in red while those representing a 0 value are coloured in grey. Universal toehold domains are coloured black.  15  01  −  01  +  −  11  +  +  + ∗ 01  11  1.2. 
Molecular programming  − ∗ 11  01  01  −  11  11  11  11  −  − ∗ 11  +  01  + ∗ 01  +  01  −  +  11  −  +  −  01  +  01  + ∗ 01  −  11  − ∗ 11  Figure 1.8: A strand displacement implementation of the reaction 01 11 as proposed by Qian et al. [98]. From top to bottom, the input signal strand 01 (shown in a shaded box on the left) is consumed by the transformer (middle) which produces the signal strand 11 (shown in a shaded box on the right). Additional unbound strands are used in the process and are considered part of the transformer. The transformer can be applied next in the opposition direction (from bottom to top) to consume signal 11 and produce signal 01 . In this and later figures, the Watson-Crick complement of a domain x is denoted by x∗ . transformer is not in the same state after producing signal 11 as it was prior to consuming signal 01 . The transformer is no longer in a state that can consume a 01 signal and produce a 11 signal. It is however in a state that can perform reaction (1) in reverse (from bottom to top in Figure 1.8). Thus, while the same transformer can be used to perform both the forward and reverse of a reaction, it must strictly alternate between these directions. Tags and tagged chemical reaction equations To capture this notion of transformer orientation at the level of a chemical reaction network, we can tag each side of a reaction to represent the transformer and its required orientation that is necessary to perform a reaction in the respective direction. In the case of reversible reactions, when considered as two separate reactions, the forward tag of one will be the reverse tag of the other. We call these tagged chemical reaction equations. A tagged Chemical Reaction Network (tagged CRN) consists of an initial signal multiset, an initial tag multiset, and a set of tagged chemical reaction equations. Formally, we define a tagged CRN to be a tuple C = S, R, S0 , send , T, T0 , where all members are defined the same as a CRN and additionally T is the set of all tag species, and T0 is the initial tag multiset, containing one or more tags for each tagged chemical reaction equation R ∈ R.  16  1.2. Molecular programming Space complexity of a tagged CRN This simple concept of tags allows us to account for the required number of transformers and the minimum size of the reaction volume required to complete a computation. Given a trace ρ for a tagged CRN C = S, R, S0 , send , T, T0 , let S ∗ be the largest signal multiset of the sequence of multisets induced by ρ. We define the space complexity of a tagged CRN computation with trace ρ of tagged CRN C to be |S ∗ | + |T0 |. Note that for every reaction, exactly one tag is consumed and one is produced, thus the number of tags present in any reachable state is equal to |T0 |. Intuitively, this corresponds to the minimum size of the reaction volume of a closed system to fit all molecules necessary to complete the computation specified by the trace ρ. We will often refer to this quantity as the required space of a tagged CRN . (1) Tf1 + 01 (2) Tf2 + 02 + 11 (3) Tf3 + 03 + 12 + 11  Tr1 + 11 Tr2 + 12 + 01 Tr3 + 13 + 02 + 01  Figure 1.9: Tagged chemical reaction equations for a 3-bit standard binary counter. Example 1.2.1 (continued). We can augment our 3-bit standard binary counter chemical reaction equations from Figure 1.6(a) with tags resulting in the tagged chemical reaction equations shown in Figure 1.9. 
Formally, we have the following set of tagged chemical reaction equations {R1-for = ({T1f , 01 }, {T1b , 11 }), R1-rev = ({T1b , 11 }, {T1f , 01 }),  R2-for = ({T2f , 01 , 12 }, {T2b , 11 , 02 }),  R2-rev = ({T2b , 11 , 02 }, {T2f , 01 , 12 }),  R3-for = ({T3f , 01 , 12 , 13 }, {T3b , 11 , 02 , 03 }),  R3-rev = ({T3b , 11 , 02 , 03 }, {T3f , 01 , 12 , 13 }),  Rend = ({Tend , 11 , 12 , 13 }, {Tend , send })} .  If we consider the sequence of reactions illustrated in Figure 1.6(b) to advance from count 000 to 111, then the initial signal multiset is still {03 , 02 , 01 } and the initial tag multiset required for the computation to reach the count 111 and finally produce send is {T1f , T1f , T1f , T1f , T2f , T2f , T3f , Tend }. Therefore, the required space or space complexity for this tagged CRN is eleven molecules as each signal multiset during the computation has the same size as the initial signal multiset. In general, we will reason about the space complexity of a tagged CRN asymptotically.  17  1.2. Molecular programming  1.2.4  Proper chemical reaction networks (proper CRN)  We define one additional restricted class of CRNs to help simplify our space complexity analysis throughout this thesis. Given a (possibly tagged) CRN C having a reaction set R, consider any R = (I, P ) ∈ R. We call the signal molecules consumed in I − P proper reactants and those produced in P − I proper products. R is a k-proper chemical reaction equation (or simply a proper chemical reaction equation) if and only if |I − P | = |P − I| = k. We say that C is a k-proper Chemical Reaction Network (proper CRN) (or simply a proper CRN) if all reactions are proper and k is the maximum number of proper inputs of all reactions in R. We observe the following obvious, but useful results. Lemma 1. A proper CRN with initial signal multiset S0 will always have |S0 | free signal molecules during a computation. Lemma 2. The space complexity of a tagged CRN C = S, R, S0 , send , T, T0 with initial signal multiset S0 and initial tag multiset T0 that is also a proper CRN is |S0 | + |T0 |.  1.2.5  Realizing CRNs with DSDs  Soloveichik et al. [122] showed that arbitrary CRNs could be realized by using DNA signal strands to represent the signal molecules and by using a cascade of toehold mediated strand displacements to implement chemical reaction equations. Qian et al. [98] proposed an alternate construction — hereafter called the QSW construction — that is capable of simulating bi-molecular, and higherorder, chemical reactions. Specifically, the construction can exchange a multiset of signal strands (the reactants) for another multiset of signal strands (the products) through a sequence of toehold mediated strand displacements. Signal strands are of the same form: a negative recognition long domain − d, followed by a universal toehold t, followed by a positive recognition long domain + d. Signals and additional auxiliary strands that will be produced by a reaction are initially bound strands on a template strand. Additional unbound strands, consisting of a single long domain and a single universal toehold are used to effect the cascade of toehold mediated strand displacements. All toehold domains are universal and are therefore not labeled in the following figures. The one template strand for each reaction has the property that all of its long domains and all but one of its toehold domains are bound. We call these saturated template strands. 
We refer to the saturated template complex and associated auxiliary strands, collectively, as a transformer. For example, Figure 1.10 shows an implementation of the chemical reaction equation A + B C + D using the QSW construction. The forward reaction is depicted from top to bottom and it can be seen that the signal strands C and D are initially bound to the template strand of the transformer. The reverse reaction is depicted from bottom to top. Realizing a reaction equation with more or less reactants and products is straightforward and involves modifying the template strand appropriately, adding the necessary auxiliary unbound strands, 18  +  B  −  C  + ∗  +  B∗  −  C∗  −  A  A  +  D  −  D  −  D∗  +  A  +  +  C  1.2. Molecular programming  A  +  A  −  C  + ∗  +  B∗  −  C∗  D  −  D∗ B  C  + ∗  +  B∗  −  C∗  −  C  D  −  D  −  D∗  C  −  +  B  +  +  C  B  +  B  −  C  −  D  + ∗  +  B∗  −  C∗  −  D∗  A  D  −  D  A  +  B  A  D  −  − +  D  +  A  A  C  −  +  −  − +  A  −  D  B  +  +  −  +  B  +  A  +  A  −  B  A  C  − +  −  −  B  A  +  A  +  B  −  C  −  D  A  +  B∗  −  C∗  −  D∗  Figure 1.10: A strand displacement implementation of the bi-molecular chemical reaction equation A + B C + D using the construction proposed by Qian et al. [98].  19  1.2. Molecular programming and adding the necessary bound strands that will eventually be produced as signal strands. The QSW construction guarantees us the following result for the types of systems we study. Theorem 1 (Qian et al. [98]). Any tagged CRN requiring O(s) space can be realized and simulated by a DSD in O(poly(s)) space assuming all strand displacements are legal . In related work, Cardelli [15, 72] has shown how primitives that support concurrent models of computation, such as fork and join gates, can be implemented using strand displacement systems. Many of the techniques used in the QSW construction are similar to those of Cardelli’s constructions: for example, the signal strands share a common universal toehold while the long domains are distinct, and do not use branched structures. To effect an abstract chemical reaction equation with i reactants and i products, the QSW construction uses a cascading of toehold mediated strand displacements whereby the reactants are first consumed (by a transformer) and products are then produced by further strand displacements. This order of events is similar to an i-way join followed by an i-way fork of Cardelli. In this work, we will make use of a modified version of the QSW construction that we describe in Chapter 4.  1.2.6  Energy efficient computation  Aside from the potential biological and chemical applications, DSDs and CRNs are also of independent interest due to their promise for realizing energy efficient computation. Rolf Landauer proved that logically irreversible computation— computation as modeled by a standard Turing machine—dissipates an amount of energy proportional to the number of bits of information lost, such as previous state information, and therefore cannot be energy efficient [69]. Surprisingly, Charles Bennett showed that, in principle, energy efficient computation is possible, by proposing a universal Turing machine to perform logically reversible computation and identified nucleic acids (RNA/DNA) as a potential medium to realize logically reversible computation in a physical system [8]. A logically reversible computation is a form of deterministic computation. For our purposes, it suffices to understand the important difference distinguishing these two classes of computation. 
A configuration graph of a computation has a node for every possible state on every possible input for the underlying Turing machine being modeled. There is a directed edge from node i to node j if and only if state j is reachable from state i in a single state-transition of the Turing machine. An example of a deterministic configuration graph, for four different inputs (source nodes A-D) having a common final state is given in Figure 1.11a. Consider the path from A to the sink node labeled final state. Every node along the path has an out-degree of at most 1, making the computation deterministic. This is also true for the other source nodes. The computation in this example is not reversible. However, if every directed edge is replaced with two edges, one for each orientation (or equivalently 20  1.2. Molecular programming  current state  D  A  final state  B  C (a) Deterministic computation  current state A  final state B  C  D (b) Logically reversible computation  Figure 1.11: Example configuration graphs, induced on four different inputs, for (a) deterministic computation, and (b) logically reversible computation. Nodes represent possible states in a computation and directed edges denote valid state transitions.  21  1.3. Objectives by an undirected edge), we get a configuration graph for symmetric computation. With this change, we lose determinism. Consider a computation that has begun from node A, has reached the node labeled final state, and is now reversing towards its initial state. Once the computation reaches the node labeled current state, then a non-deterministic choice must be made. In essence, the computation does not currently have enough information to (deterministically) return to its initial state. Thus, information is lost. Contrast this with the logically reversible configuration graph of Figure 1.11b, again shown for four different inputs (source nodes A-D). Importantly, a logically reversible computation for a particular input forms a chain which is unconnected to any state for any other possible input. This means any state along the chain can be deterministically reached from any other state along the chain. Thus, information is not lost. Even though non-terminal nodes along the chain have two possible choices of where to next proceed, the computation is still deterministic as one choice is always the previous state of the computation. (Retreating to the previous state is equivalent to the transition never having occurred.) The important point is that at any given node, the computation cannot proceed to more than one other node that is not the previous state. All of the molecular programs we propose in this thesis have this property.  1.3  Objectives  There is a need to find minimum-energy barrier folding pathways of nucleic acids not only in the context of biological processes, but also in the molecular programs which leverage them. When designing molecular programs, knowledge of folding pathways can help to debug the intended behaviour and also to formally verify the correctness of a program; for instance, by ensuring that certain states are unreachable by a low-energy barrier folding pathway. Still, the problem complexity of finding pathways with this property remained unknown at the outset of this research. Understanding the complexity of this problem could have a number of implications. If the problem is easy (in P), then an efficient and effective algorithm may be developed to better understand both naturally occurring and designed folding pathways. 
If the problem is hard, it may be possible to use folding pathways for non-trivial computation within chemical and biological systems. It is our first objective to elucidate the problem complexity of finding minimum-energy barrier folding pathways. Regardless of problem complexity, there is a need for an exact algorithm that is efficient in practice. To date, all exact algorithms have time and space complexity that is exponential in the size of the input (length of nucleic acid strand(s)). Can this be improved? It is also our aim to understand the computational power of deterministic molecular programs that leverage folding pathways. In this context, there was a particular need to understand space complexity of molecular programs. In the context of molecular programs operating in a reaction volume of a closed system, space can be thought of as the necessary size of that volume to fit all 22  1.4. Contributions molecules necessary to complete a computation. Can a biological soup of nucleic acids having total size Θ(n) perform a computation, by means of a folding pathway, of Θ(2n ) steps? If so, such a program would need to be space-efficient and reuse strands. At the outset of this thesis, DSD implementations did not efficiently reuse strands and therefore accumulated waste—inert strands that remain present in the reaction volume. We will discuss the details and consequences of strand re-use in Chapter 4. This question can be expanded to ask: what are the limitations to deterministic space-efficient molecular programming via folding pathways, if any? As base-pair formation is an inherently reversible process, we also explore the limits of logically-reversible, and thus energy-efficient computation of DSDs and folding pathways. Along the way, we are interested in understanding the complexity of a number of related problems. Can deterministic DSDs and their underlying folding pathways be verified to be correct (i.e. certain states are reachable within a certain energy barrier, while others are not)?  1.4  Contributions  We now describe the contributions of this part of the thesis, in the order they are discussed. By folding pathway, we mean pseudoknot-free nucleic acid folding pathway using the simple energy model. 1. We show that finding a direct folding pathway with minimum energy barrier is NP-complete for both the single strand and multiple strand cases. 2. We give a graph-theoretic algorithm for finding direct folding pathways with minimum energy barrier for the single strand case and discuss how it can be extended to the multiple strand case. For an instance having n arcs, the algorithm has a worst case time complexity exponential in n, but a space complexity only polynomial in n and is shown to be efficient in practice for most of the experimental instances evaluated. A feature of the algorithm is the ability to identify a succinct representation of all minimum free-energy structures between the initial and final structure of an instance in polynomial time. 3. We show that finding a direct-with-repeats folding pathway with minimum energy barrier is NP-complete for the single strand case and NP-hard for the multiple strand case. 4. We give the first example of a minimum energy barrier (indirect) folding pathway for multiple interacting strands whose length is exponential in the combined length of participating sequences. Our example is a DSD implementation of a binary-reflecting Gray code counter. An n-bit counter deterministically advances through 2n states using only poly(n) space. 
This demonstrates that deterministic DNA strand displacement (DSD) systems are capable, in principle, of space-efficient computation. An assumption 23  1.4. Contributions of this construction is that certain strand species exist as a single copy, rather than in an unbounded concentration. 5. We give the first proof that certain classes of chemical reaction networks (CRN), such as the underlying CRN implemented by our DSD Gray code counter, cannot be space efficient if all species of molecules are assumed to exist in concentration (multiple copies), rather than as a single copy. This implies the counter lacks determinism when all strands are present in concentration. We generalize this result to show that it is not possible to design any deterministic chemical reaction network that performs more than a linear number of deterministic computation steps in a reaction volume of a closed system (i.e., as modeled by a tagged CRN), unless certain molecules exist as a single copy (i.e., an exact count of the molecules is necessary to ensure determinism). 6. We demonstrate that any space-bounded computation can be solved by a space and energy efficient DNA strand displacement system, and thus low energy barrier (indirect) folding pathways of multiple interacting strands. We achieve the result by first giving a space efficient molecular program that can solve any arbitrary (unquantified) Boolean formula. We evolve the program to consider quantified Boolean formulas and further transformations to achieve our overall result. In the process we demonstrate a number of techniques useful for logically-reversible computation such as traversing a complete binary tree. Given our results of bullet 5, we must assume that certain molecules exist as a single copy to achieve these results. 7. We characterize the complexity of verification and model checking of deterministic chemical reaction networks and DNA strand displacement systems by showing that the reachability problems associated with these models are PSPACE-hard. We fully characterize restrictions of the models that are PSPACE-complete. 8. We relate our molecular programming results (bullets 6 & 7) with our earlier study of nucleic acid folding pathways by incorporating our quantified Boolean formula solver implementation (developed in Chapter 5) into a proof that predicting indirect folding pathways with minimum energy barrier for multiple interacting strands is PSPACE-complete. 9. We motivate and propose a refinement to the ReversibleSPACE complexity class to better model the inherent properties of current molecular programming domains. A summary of the known complexity for the various folding pathway problems studied in this thesis is given in Table 1.1.  24  Folding pathway energy barrier problem direct Single strand  Hardness NP-complete  NP-complete  indirect  open  direct  NP-complete  direct with repeats  NP-hard  indirect  PSPACE-complete  Multiple interacting strands  Table 1.1: The complexity of folding pathway energy barrier problems for the simple energy model.  1.4. Contributions  direct with repeats  Notes Shown in Chapter 2. Solvable in practice by a graph theoretic algorithm given in Chapter 3. Shown equivalent to the direct folding pathway problem in Chapter 3. If hard, could lead to novel molecular programming methods. Otherwise, an efficient algorithm could shed light on biological pathways and energy landscapes. Hard by restriction to the single strand case. Gives insight into direct folding pathways typical of DSD systems. 
Hard by restriction to the single strand case. Not clear if problem is in NP. Shown in Chapter 5. Gives many insights into DSDs, CRNs, and logically reversible computation.  25  1.5. Outline  1.5  Outline  Chapter 2 formally introduces the minimum-energy barrier folding pathway problem and resolves the complexity of direct folding pathways. The hardness proof is quite technical. We note that reading the proof is unnecessary to understand the following chapters; knowing that finding minimum energy-barrier direct folding pathways is NP-complete is sufficient. The remaining chapters are much more accessible. In Chapter 3 we introduce an algorithm for finding direct folding pathways having minimum energy barriers. The algorithm makes interesting use of elegant graph decomposition techniques and may be of interest to theorists searching for open problems motivated by biological questions. In Chapter 4 we turn our attention towards molecular programming motivated by folding pathways and investigate the potential and peril of space efficient computation in these models. In Chapter 5 we explore how any deterministic computation that halts can be implemented by a space and energy efficient molecular program that leverages folding pathways. We also resolve the complexity of related problems, including prediction of folding pathways involving multiple interacting strands. In Chapter 6 we summarize our results and motivate the need for refined complexity models to more accurately characterize existing molecular programs.  26  Chapter 2  Complexity of predicting minimum energy barrier folding pathways In this chapter, we study the computational complexity of the energy barrier problem for nucleic acids: what energy barrier must be overcome for one or more DNA or RNA molecule(s) to adopt a given final secondary structure, starting from a given initial secondary structure? The results presented here are a first step towards solving the energy barrier problem. Our results pertain to restricted types of folding pathways, namely direct folding pathways. Such pathways were introduced by Morgan and Higgs [83]. A folding pathway from secondary structure I to F is direct if the only arcs which are added are those from F −I and the only arcs which are removed are those from I−F. Beyond the importance of this simple model for the study of biological folding pathways (see Section 1.1), we note that most designed nucleic acid folding pathway systems that we are familiar with are direct [116, 120, 146, 147]. However, there are examples of designed indirect folding pathways, including the catalytic system of Zhang et al. [150] and the binary-reflecting Gray code counter we present in Chapter 4. The whole of this chapter is dedicated to the proof of our first main result: finding direct folding pathways with minimum energy barrier is NP-complete. In our proof, we consider the folding pathways of single strands. At the end of the chapter, we extend the result to consider the case of multiple interacting strands. We begin with formal definitions and by listing existing results necessary to support our claim.  2.1  Preliminaries  A secondary structure T for an RNA (DNA) molecule of length n is a set of base pairs i.j, with 1 ≤ i < j ≤ n, such that (i) each base index i or j appears in at most one base pair and (ii) the bases at indices i and j form a WatsonCrick (i.e., C-G, A-U, or A-T) base pair. 
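This definition translates directly into a short validity check. The following Python sketch is illustrative (sequences are 1-indexed, as in the definition) and also includes the pseudoknot-freeness test used throughout Part I.

```python
from itertools import combinations

WATSON_CRICK = {frozenset('AU'), frozenset('CG'), frozenset('AT')}

def is_secondary_structure(seq, pairs):
    """Each index 1..n appears in at most one pair i.j with i < j, and the
    two bases form a Watson-Crick (C-G, A-U or A-T) pair."""
    n, used = len(seq), set()
    for i, j in pairs:
        if not (1 <= i < j <= n) or i in used or j in used:
            return False
        if frozenset((seq[i - 1], seq[j - 1])) not in WATSON_CRICK:
            return False
        used.update((i, j))
    return True

def is_pseudoknot_free(pairs):
    """No two arcs cross in the arc diagram representation."""
    return not any(a < c < b < d or c < a < d < b
                   for (a, b), (c, d) in combinations(pairs, 2))

# A hypothetical 8-nucleotide example with two nested base pairs.
print(is_secondary_structure('GCAAAAGC', [(1, 8), (2, 7)]),
      is_pseudoknot_free([(1, 8), (2, 7)]))   # True True
```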
Since we represent secondary structures using arc diagrams, we use the word arc interchangeably with base pair (see Figure 1.1). (Content from this chapter appears in the proceedings of the 15th Annual International Conference on DNA Computing and Molecular Programming (DNA 2009) [82] and the Journal of Natural Computing [81].) Our main results pertain to pseudoknot-free secondary structures, that is, structures with no crossing arcs. We assume a very simple energy model for secondary structures in which each arc contributes an energy of −1. Thus, as is roughly consistent with more realistic energy models, the more base pairs in a structure the lower its energy. We denote the energy of secondary structure T by E(T). Fix initial and final pseudoknot-free secondary structures I and F. A direct pseudoknot-free folding pathway from I to F is a sequence of pseudoknot-free secondary structures I = T0, T1, . . . , Tr = F, where each Ti is obtained from Ti−1 by either the addition of one arc from F − I or the removal of one arc from I − F. Thus, there are exactly |I △ F| (the size of the symmetric difference of the two structures) steps along a direct folding pathway. We call each such addition or removal an arc operation and we let +x and −x denote the addition and removal of the arc x, respectively. The Ti's that are neither the initial nor the final structure are called intermediate structures. A folding pathway can thus be specified by its corresponding sequence of arc operations; we call this a transformation sequence. A direct pseudoknot-free transformation sequence specifies a folding pathway which is both direct and pseudoknot-free. The energy barrier of a folding pathway I = T0, T1, . . . , Tr = F is the maximum of E(Ti) − E(I), where the max is taken over all integers i in the range 1 ≤ i ≤ r. The energy difference of each intermediate configuration Ti is defined as E(Ti) − E(I). For instance, the entire folding pathway illustrated in Figure 1.2 has an energy barrier of 2, whereas the structure labeled B has an energy difference of 1. If Π is the transformation sequence for this pathway, then the energy barrier of transformation sequence Π, denoted ∆E(I, F, Π), is defined to be the energy barrier of the corresponding folding pathway. In our result, it is convenient to work with weighted arcs. To motivate why, note that the union I ∪ F of two pseudoknot-free secondary structures may be pseudoknotted, i.e., may have crossing arcs, even when both I and F are pseudoknot-free. In a pseudoknotted structure, we use the term band to refer to a set of nested arcs, each of which crosses the same set of arcs. In a folding pathway from I to F which minimizes the energy barrier, we can assume without loss of generality that when one arc in a band of I ∪ F is added, then all arcs in the band are added consecutively. Similarly, we can assume without loss of generality that when one arc in a band is removed, then all arcs in the band are removed consecutively. Thus, it is natural to represent the set of arcs in a band as one arc with a weight equal to the number of arcs in the band. An example showing two bands represented by weighted arcs is given in Figure 2.1. Hence we generalize the notion of secondary structure as follows. A weighted arc I = (I^b, I^e) with weight I^w is specified by start and end indices I^b < I^e and a weight I^w. We say that two weighted arcs I and J are crossing if either I^b ≤ J^b ≤ I^e ≤ J^e, or J^b ≤ I^b ≤ J^e ≤ I^e.
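To make these definitions concrete, the following short C++ sketch (not part of the published implementation; the type and function names are ours) encodes weighted arcs, the crossing test stated above, the pseudoknot-free condition, and the energy of a configuration under the simple energy model.

#include <cstdio>
#include <vector>

// A weighted arc I = (I^b, I^e) with weight I^w: start/end base indices and a
// weight equal to the number of base pairs in the band it stands for.
struct WArc { int b, e, w; };

// Crossing test exactly as defined above:
// I and J cross iff I^b <= J^b <= I^e <= J^e  or  J^b <= I^b <= J^e <= I^e.
bool crossing(const WArc& I, const WArc& J) {
    return (I.b <= J.b && J.b <= I.e && I.e <= J.e) ||
           (J.b <= I.b && I.b <= J.e && J.e <= I.e);
}

// A configuration (set of weighted arcs) is pseudoknot-free iff no two arcs cross.
bool pseudoknotFree(const std::vector<WArc>& conf) {
    for (size_t i = 0; i < conf.size(); ++i)
        for (size_t j = i + 1; j < conf.size(); ++j)
            if (crossing(conf[i], conf[j])) return false;
    return true;
}

// Simple energy model: E(I) = -(sum of arc weights).
int energy(const std::vector<WArc>& conf) {
    int e = 0;
    for (const WArc& a : conf) e -= a.w;
    return e;
}

int main() {
    // Hypothetical indices: a band of weight 3 crossing a band of weight 2,
    // in the spirit of Figure 2.1.
    std::vector<WArc> conf = { {1, 10, 3}, {5, 20, 2} };
    std::printf("pseudoknot-free: %d  energy: %d\n", pseudoknotFree(conf), energy(conf));
    return 0;
}

Note that nested and disjoint arcs are correctly reported as non-crossing by this test; only interleaved arcs cross.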
A configuration is a set of weighted arcs. Configuration {Ii }ni=1 is pseudoknot-free if for all 1 ≤ i < j ≤ n, Ii and Ij are not crossing. n The energy of configuration I = {Ii }ni=1 is E(I) = − i=1 I w . The previous definitions can easily be generalized to weighted arcs. We can now formally 28  2.1. Preliminaries  2 3 Figure 2.1: The three arcs on the bottom all conflict with the same two arcs on the top, and vice versa. Thus, each forms a band of arcs. Each band is collapsed into a single arc with weight equal to the size of the band. define the main problem studied in this chapter. Problem 1. eb-dpfp (Energy Barrier for Direct Pseudoknot-free Folding Pathway of a single strand)  Instance: Given two pseudoknot-free configurations I = {Ii }ni=1 (initial) and F = {Fi }m i=1 (final) of a single strand, and integer k. Question: Is there a direct pseudoknot-free transformation sequence S such that the energy barrier of S, in the simple energy model, is at most k? The reduction in our result begins with an arbitrary instance of the 3partition problem. Problem 2. 3-partition 3n Instance: Given 3n integers a1 , . . . , a3n such that i=1 ai = nA and A/4 < ai < A/2 for each i. Question: Is there a partition of the integers {1, . . . , 3n} into n disjoint triples G1 , G2 , . . . , Gn such that the sum of all aj , where j belongs to Gi , is equal to A, i.e., c(Gi ) = j∈Gi aj = A, for each i = 1, . . . , n? Note the use of the notation c(Gi ). We will use this throughout to simplify our language. Importantly, we note the following result. Theorem 2 (Garey, Johnson (1979) [41]). The 3-partition problem is NPcomplete even if A is polynomial in n. The choice of the 3-partition problem was not arbitrary. It is known as a strongly NP-complete problem, and therefore remains hard even when A is polynomial in n. This is important for our reduction as the number of arcs we create in the corresponding folding pathway instance will be proportional to A (in unary). Thus, if A were exponential in n, then the number of arcs would be exponential in n and the reduction from the 3-partition problem instance would take exponential time, and not the requisite polynomial time. For this very reason, we did not make use of the weakly NP-complete partition 29  2.2. Result problem—partition a set of integers into two sets with the same sum—as that problem is solvable in pseudo-polynomial time when the sum of all integers, and thus A, is polynomial in n. Note that the 3-partition problem is in P if A is a constant; thus, we will ensure that A is polynomial in n in our reduction. For more information regarding this important distinction between weakly and strongly NP-complete problems, the reader is directed to Garey & Johnson [41].  2.2  Result  Theorem 3. The eb-dpfp problem, namely the energy barrier for direct pseudoknot-free folding pathway problem, is NP-complete. We note that the theorem does not require the energies of the initial and final structures to be minimum and indeed they can be different, as illustrated in Figures 2.1 and 2.3. Proof. It is straightforward to show that the eb-dpfp problem is in NP. Given an instance (I, F, k), it is sufficient to non-deterministically guess a direct folding pathway from I to F, and to verify that the energy barrier of this path is at most k. Note that the length of any such pathway is at most |I| + |F|. To show that the eb-dpfp problem is NP-hard, we provide a reduction from the 3-partition problem. 
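Before turning to the reduction, the membership argument above can be made concrete. The following C++ sketch (illustrative only; the identifiers are ours) checks in polynomial time that a candidate transformation sequence is direct, keeps every intermediate structure a valid pseudoknot-free secondary structure, ends at F, and never exceeds energy barrier k in the simple energy model.

#include <cstdio>
#include <set>
#include <tuple>
#include <vector>

// An arc (base pair) i.j with i < j; each arc contributes -1 to the energy.
struct Arc {
    int i, j;
    bool operator<(const Arc& o) const { return std::tie(i, j) < std::tie(o.i, o.j); }
    bool operator==(const Arc& o) const { return i == o.i && j == o.j; }
};

bool crossing(const Arc& a, const Arc& b) {
    return (a.i < b.i && b.i < a.j && a.j < b.j) ||
           (b.i < a.i && a.i < b.j && b.j < a.j);
}

// One step of a transformation sequence: add (+x) or remove (-x) an arc x.
struct Op { bool add; Arc arc; };

// Verify that `path` is a direct pseudoknot-free transformation sequence from I
// to F whose energy barrier is at most k.
bool verify(const std::set<Arc>& I, const std::set<Arc>& F,
            const std::vector<Op>& path, int k) {
    std::set<Arc> cur = I, added, removed;
    const int eInit = -static_cast<int>(I.size());
    for (const Op& op : path) {
        if (op.add) {
            // Only arcs of F - I may be added, each at most once.
            if (!F.count(op.arc) || I.count(op.arc) || added.count(op.arc)) return false;
            // The intermediate structure must stay a pseudoknot-free secondary structure.
            for (const Arc& a : cur)
                if (crossing(a, op.arc) || a.i == op.arc.i || a.i == op.arc.j ||
                    a.j == op.arc.i || a.j == op.arc.j) return false;
            cur.insert(op.arc); added.insert(op.arc);
        } else {
            // Only arcs of I - F may be removed, each at most once.
            if (!I.count(op.arc) || F.count(op.arc) || removed.count(op.arc)) return false;
            if (cur.erase(op.arc) == 0) return false;
            removed.insert(op.arc);
        }
        if (-static_cast<int>(cur.size()) - eInit > k) return false;  // barrier exceeded
    }
    return cur == F;  // every arc of I - F removed and every arc of F - I added
}

int main() {
    // Toy instance: I = {1.6}, F = {3.8}; the two arcs cross, so one must be
    // removed before the other is added, giving a pathway with barrier 1.
    std::set<Arc> I = { {1, 6} }, F = { {3, 8} };
    std::vector<Op> path = { {false, {1, 6}}, {true, {3, 8}} };
    std::printf("k=1: %d  k=0: %d\n", verify(I, F, path, 1), verify(I, F, path, 0));
    return 0;
}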
We first provide a formal description of the reduction, then provide some intuition as to why the reduction is correct, and then prove correctness in detail. Consider an instance of the 3-partition problem A/2 > a1 ≥ · · · ≥ a3n > 3n A/4 such that j=1 aj = nA and the value of A is bounded above by some polynomial in n. We define an instance (I, F, k) of the eb-dpfp problem as follows. The initial configuration I contains weighted arcs {A¯j,i ; j = 1, . . . , 3n, i = 1, . . . , n} ∪ {A˜j,i ; j = 1, . . . , 3n, i = 1, . . . , n} ∪ {T˜i ; i = 1, . . . , n}. The final configuration F is {Aj,i ; j = 1, . . . , 3n, i = 1, . . . , n} ∪ {Ti ; i = 1, . . . , n}. The arcs are organized as in Figure 2.2. Intuitively, the various T sets of weighted arcs are associated with the n triples, while the various A sets are associated with the 3n integers of the input. For each set of the weighted arcs corresponding to triples, there are weighted arcs corresponding to all 3n integers of the input. The reason for this is that a triple “chooses” its corresponding entries by adding arcs denoting the value of the entry, prior to a “validation” stage that occurs later in the folding pathway.  30  2.2. Result T˜1 ˜ A3n,1  T˜2  .. .  ...  T1  A˜2,1 ˜ A1,1 A¯1,1 A1,1  T˜3  ...  .. .  A¯2,1 A2,1  ...  A¯3n,1 ...  ...  A3n,1  A˜1,2 A¯1,2  T2  A1,2  ...  ...  T3  Figure 2.2: Organization of weighted arcs in the initial (top) and the final (bottom) configurations. Formally, the arcs are organized as follows: T1b < T˜1b < A˜b3n,1 < · · · < A˜b1,1 < T1e < A¯b1,1 , e Tib < T˜i−1 < T˜ib < A˜b3n,i < · · · < A˜b1,i < Tie < A¯b1,i , ∀i = 2, . . . , n, A¯bj,i < Abj,i < A¯ej,i < A˜ej,i < Aej,i , ∀i = 1, . . . , n, Aej,i  ∀j = 1, . . . , 3n, b ¯ < Aj+1,i , ∀i = 1, . . . , n,  b Ae3n,i < Ti+1 , e e ˜ A3n,n < Tn .  ∀j = 1, . . . , 3n − 1, ∀i = 1, . . . , n − 1,  The weights of arcs are set up as follows. For all i = 1, . . . , n and j = 1, . . . , 3n: A˜w j,i = 4iaj , w ¯ Aj,i = k − (j − 1)A − 4iaj , Aw j,i = k − jA.  Also, T˜1w = k − (7n − 4)A, T˜iw = k − (6n + 8)nA − 4(n − 1)iA, ∀i = 2, . . . , n, Tiw = k − (6n + 8)nA, ∀i = 1, . . . , n − 1, Tnw = k ,  where k > 4(5n2 + n + 1)A is the energy barrier. Before getting into the details of the proof, we next describe intuitively the key properties of the construction. The weights are chosen to ensure that the folding pathway with minimum energy barrier has the following properties. Here 31  2.2. Result we list only the arcs that are added and assume without loss of generality that all arc removals happen only when needed. 1. Initially a (possibly empty) sequence of Aj,i ’s are added to the folding pathway. Intuitively, this corresponds to “triple choosing” for the initial set of integers. The added Aj,i ’s define a potential solution G1 , G2 , . . . , Gn to the 3-partition problem in a natural way: Gi contains j if Aj,i is in this initial sequence. As we will prove later, the weights ensure that the addition of each Aj,i raises the energy difference. After 3n such additions, the energy difference is so high that no other Aj,i ’s can be added. As a result, the weights impose certain desirable constraints on the Gi ’s which will help ensure that they (or a slight perturbation of the Gi ’s) form a valid solution. 2. Following the initially-added sequence of Aj,i ’s, the Ti ’s must be added in increasing order of i (with no interspersed Aj,i ’s). 
This is in part because of the placement of the T˜i ’s: adding T1 requires only the removal of T˜1 , whereas adding Ti , for i > 1, requires the costlier removal of both T˜i−1 and T˜i . Thus, it becomes feasible to add Ti without exceeding the energy barrier only after Ti−1 is added because, at that point, T˜i−1 has already been removed. In addition, after adding T1 , the energy difference increases to the level that none of the Aj,i ’s can be added (and stays there until addition of Tn ). 3. Moreover, the Ti ’s can be added without exceeding the energy barrier only if the Gi ’s defined by the initial sequence of Aj,i ’s actually is a valid solution. Intuitively, this is a “triple validation” to ensure all chosen triples form a valid solution. That is, if the Gi ’s are valid then for each i, at least three of the Aj,i ’s are in the initial sequence and so at least three of the A˜j,i ’s (whose weights sum to at least 4iA) were removed in the initial part of the pathway described in 1 above. This means that at most n − 3 of the A˜j,i ’s remain to be removed before Ti can be added. The total weight of the remaining A˜j,i ’s is just low enough to ensure that they can be added without exceeding the energy barrier. In contrast, if the Gi ’s are not valid then for some i the weight of the A˜j,i ’s which must be removed in order to add Ti causes the energy barrier k to be exceeded. Let us illustrate the construction of the proof with the following example. Assume that we want to partition the multiset of integers {{10, 9, 8, 7, 7, 7}} into two sets (n = 2). Figure 2.3a shows the corresponding instance of the energy barrier problem. For each triple, there are weighted arcs denoting all integers in the input set. We have labeled the associated weighted arcs by their integer value from the input set and have coloured those associated with triple 1 as black, and those associated with triple 2 as white. Figure 2.3b shows a correct pathway, which selected two triples, T1 = {{10, 7, 7}} (corresponding to the black labels along the pathway during triple-choosing) and T2 = {{9, 8, 7}} (corresponding to the white labels along the pathway during triple-choosing), both of which 32  2.2. Result  10  triple1  9  8  7  7  7  triple2  10  9  8  7  7  7  (a) triple−validating  triple−choosing  10  9  8  7  7  7  triple1  triple2  clean−up  9  8  7  10  7  7  7  7  k  0 (b) triple−choosing  10  9  8  7  triple−validating  7  7  triple1  triple2  clean−up  7  10  9  8  k  0 (c)  Figure 2.3: Illustration of the construction in the proof of Theorem 3: (a) The instance created for the set of integers {{10, 9, 8, 7, 7, 7}}. (b) The energy function stays within barrier k if and only if the partition sets are selected correctly (T1 = {{10, 7, 7}} and T2 = {{9, 8, 7}}). (c) The energy function exceeds the barrier for an incorrect selection of partition sets (T1 = {{10, 9, 8, 7, 7}} and T2 = {{7}}). The dashed lines depict hypothetical progress of the pathway for some energy barrier larger than k.  33  2.2. Result sum to 24. By construction of the proof, the portion of the folding pathway corresponding to the triple validation stage is able to proceed within barrier k. However, in the incorrect pathway shown in Figure 2.3c, where selection does not result in two equal triples (i.e., T1 = {{10, 9, 8, 7, 7}} and T2 = {{7}}), the triple validation stage fails, forcing the barrier above k. 
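For completeness, the weight assignment of the construction can be generated mechanically. The sketch below (our own illustration; the struct and function names are not from a published implementation) transcribes the weight formulas stated above, taking the smallest integer k exceeding 4(5n^2 + n + 1)A, and instantiates them for the worked example {{10, 9, 8, 7, 7, 7}} with n = 2 and A = 24.

#include <cstdio>
#include <vector>

// Weight assignment of the reduction from 3-partition, transcribed from the
// construction in this chapter. The text assumes a_1 >= ... >= a_{3n}, with
// sum n*A and A/4 < a_j < A/2.
struct ReductionWeights {
    long long k;
    std::vector<std::vector<long long>> Atil, Abar, A;  // indexed [j][i], j = 1..3n, i = 1..n
    std::vector<long long> Ttil, T;                     // indexed [i], i = 1..n
};

ReductionWeights buildWeights(const std::vector<long long>& a, long long n, long long A) {
    ReductionWeights W;
    const long long k = 4 * (5 * n * n + n + 1) * A + 1;  // any k > 4(5n^2+n+1)A
    const long long m = 3 * n;
    W.k = k;
    W.Atil.assign(m + 1, std::vector<long long>(n + 1, 0));
    W.Abar = W.A = W.Atil;
    W.Ttil.assign(n + 1, 0);
    W.T.assign(n + 1, 0);
    for (long long j = 1; j <= m; ++j)
        for (long long i = 1; i <= n; ++i) {
            W.Atil[j][i] = 4 * i * a[j - 1];                      // A~_{j,i}
            W.Abar[j][i] = k - (j - 1) * A - 4 * i * a[j - 1];    // A-bar_{j,i}
            W.A[j][i]    = k - j * A;                             // A_{j,i}
        }
    W.Ttil[1] = k - (7 * n - 4) * A;                              // T~_1
    for (long long i = 2; i <= n; ++i)
        W.Ttil[i] = k - (6 * n + 8) * n * A - 4 * (n - 1) * i * A;  // T~_i, i >= 2
    for (long long i = 1; i < n; ++i)
        W.T[i] = k - (6 * n + 8) * n * A;                         // T_i, i < n
    W.T[n] = k;                                                   // T_n
    return W;
}

int main() {
    // The worked example from the text: {{10, 9, 8, 7, 7, 7}}, n = 2, A = 24.
    std::vector<long long> a = {10, 9, 8, 7, 7, 7};
    ReductionWeights W = buildWeights(a, 2, 24);
    std::printf("k = %lld, T~_1 = %lld, T_n = %lld\n", W.k, W.Ttil[1], W.T[2]);
    return 0;
}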
The remainder of this chapter formally proves that the eb-dpfp instance has a solution with energy barrier at most k if and only if the 3-partition instance a1 , . . . , a3n has a solution. In showing this result, we must demonstrate a number of properties that the construction enforces, such as ensuring that elements of the input set are selected exactly once. First, assume that the 3-partition instance has a solution G1 , . . . , Gn , where Gi = {ji,1 , ji,2 , ji,3 }. Let f (j) = i if j ∈ Gi , for every j = 1, . . . , 3n. We will show that the transformation sequence −A¯1,f (1) , −A˜1,f (1) , +A1,f (1) , . . . , −A¯3n,f (3n) , −A˜3n,f (3n) , +A3n,f (3n) , , −T˜1 , +T1 , . . . , −A˜1,1 , . . . , −A˜3n,1  (2.1)  ˜j ,1 , −A ˜j ,1 , −A ˜j ,1 without −A 1,1 1,2 1,3  −A˜1,n , . . . , −A˜3n,n  , −T˜n , +Tn ,  (2.2)  ˜j ˜j ˜j without −A , −A , −A n,1 ,n n,2 ,n n,3 ,n  −A¯1,1 , +A1,1 , −A¯1,2 , +A1,2 , . . . , −A¯3n,n , +A3n,n  (2.3)  without indexes 1, f (1); 2, f (2); . . . ; 3n, f (3n)  is pseudoknot-free with energy barrier exactly k. For clarity, the − sign marks the arcs from the initial configuration which are being removed and the + sign marks the arcs from the final configuration which are being added. It is easy to see that the sequence is pseudoknot-free, since • each Aj,i only crosses A˜j,i and A¯j,i in the initial configuration and it is added only when these two arcs are already removed; and • each Ti only crosses the following arcs in the initial configuration: T˜i−1 (if i > 2), T˜i and A˜1,i , . . . , A˜3n,i and they are all removed before Ti is added. Second, let us verify that the energy difference of each intermediate configuration is at most k. Figure 2.4 summarizes the sequence of energy differences along the pathway given in lines (2.1), (2.2) and (2.3) above; we next provide the details. First, in line (2.1), by induction, for each j = 1, . . . , 3n, before removing −A¯j,f (j) , −A˜j,f (j) the energy difference is (j − 1)A and after removal it is k. Then after adding +Aj,f (j) it decreases to jA. At the end of line (2.1), the energy difference is 3nA. Next, we need to check that the sum of weights of arcs −A˜1,1 , . . . , −A˜3n,1  , −T˜1  ˜j ,1 , −A ˜j ,1 , −A ˜j ,1 without −A 1,1 1,2 1,3  34  2.2. Result (2)  (1)  (3)  k ...  6n2 A + 8nA  ...  3nA ... 0  Figure 2.4: Illustration of the sequence of energy difference changes on the folding pathway described in lines (2.1), (2.2) and (2.3). Details are discussed in the text of the chapter. is at most k − 3nA. The sum of weights of these arcs is exactly 3  3n  j=1  A˜w j,1 −  3n  A˜w j1, =1  ˜w ,1 + T1 = j=1  3  4aj −  =1  4aj1, + k − 7nA + 4A  = 4nA − 4A + k − 7nA + 4A = k − 3nA . Thus, just before adding +T1 , the energy difference is again exactly k. And after adding +T1 , it is 6n2 A + 8nA. Similar calculations show that the energy difference will alternate between k, after each removal subsequence, and 6n2 A + 8nA, after each addition of +Ti , in line (2.2) with the exception of the last addition, when the energy difference is 0. In line (2.3), all remaining arcs from the initial configuration (−A¯j,i ) are removed and all remaining arcs from the final configuration (+Aj,i ) are added—this is the clean up phase. Note that each ¯ removal is possible since A¯w j,i < k and after processing each pair −Aj,i , +Aj,i , w w ¯ the energy difference only decreases since Aj,i − Aj,i = A − 4iaj < 0. Now, assume that there is a pseudoknot-free transformation sequence S with the energy barrier at most k. 
From S, we will construct a solution for the original 3-partition instance and show that it is a valid solution. We organize our proof into three parts, in line with the three properties described in the intuition at the start of the proof. Consider the subsequence of S containing only additions, i.e., arcs from the final configuration. Let S + denote this subsequence. We assume without loss of generality that all removals in S happen only when needed, i.e., the next addition would not be possible without those removals. Hence, the subsequence S + determines the whole sequence S. By processing an arc +I in S + we mean 35  2.2. Result (i) removal of all arcs −J in S immediately preceding +I (that is, −J does not precede any other +I in S + ) and (ii) adding +I. The first part of our proof considers the prefix of S + just before the first T is added. Let this prefix be: +Aj1 ,i1 , +Aj2 ,i2 , . . . , +AjM ,iM  (2.4)  where M is the number of +Aj,i ’s added before +T . We use this prefix to define a potential solution to the 3-partition problem: Gi = {j ; i = i}, for every i = 1, . . . , n. Ultimately we will show that the Gi ’s (or a slight perturbation of the Gi ’s) form a solution to the 3-partition problem. Towards this goal, our first two lemmas below prove some useful properties of the Gi ’s that can be inferred from the weights of the arcs in the folding pathway prefix (2.4) and the corresponding removed arcs. Let | Gi |j denote the number of elements in Gi (over all i) with value at most j. In order for the Gi ’s to be a valid solution, | Gi |j should be exactly j for all j, 1 ≤ j ≤ 3n. Intuitively, this condition ensures that elements n are selected exactly once. Moreover it should be the case that i=1 c(Gi ) = nA where c(Gi ) denotes the sum of aj for j ∈ Gi (see the definition of 3-partition). The statements of the two lemmas below assert somewhat weaker properties of the Gi ’s. Lemma 3. For every j = 1, . . . , 3n,  n i=1  | Gi |j ≤ j. Consequently, M ≤ 3n.  Proof. Let +T be the first +Ti in S + . Consider an +Aj,i appearing before +T . Recall that before adding +Aj,i , we need to remove both −A˜j,i and −A¯j,i . Since, ¯w A˜w j,i + Aj,i = k − (j − 1)A, the energy difference has to be at most (j − 1)A for +Aj,i to be added. Note that processing of each +Aj,i appearing in S + before +T will increase the energy difference by A, as it requires both −A˜j,i and −A¯j,i w ˜w to be removed first and A¯w j,i + Aj,i −Aj,i = k−(j−1)A−4iaj +4iaj −(k−jA) = A. For instance, an +A1,i can only appear at the first position of the part of the subsequence S + before +T , since it requires the energy difference to be at most 0 and after any +Aj,i is added, the energy difference increases to A. Thus, starting from the second position, no +A1,i can be added before +T . Similarly, +Aj,i can appear only in the first j positions of the subsequence of S + before +T . Due to this condition imposed by the construction, the lemma easily follows. In the next lemma we use double brackets to denote multisets: for example {{1, 2, 2}} is the multiset with elements 1, 2, and 2 and {{1, 1, 2}} = {{1, 2, 2}}. n  Lemma 4. i=1 c(Gi ) ≤ nA − (3n − M )A/4, where the equality happens only if M = 3n and {{aj1 , . . . , ajM }} = {{a1 , . . . , a3n }}.  36  2.2. Result Proof. Let b1 ≥ b2 ≥ · · · ≥ bM be the sorted elements of the multiset n M {{aj1 , . . . , ajM }}. Note that i=1 c(Gi ) = j=1 bj . We will show that bj ≤ aj for every j = 1, . . . , M . Suppose to the contrary that bj > aj for some j. Hence, elements b1 , . . 
. , bj belong to {{a1 , . . . , aj−1 }}, i.e., |{{aj1 , . . . , ajM }}|j−1 ≥ j. This is a contradiction with Lemma 3. Hence, we have n  M  c(Gi ) = i=1  j=1  3n  M  bj ≤  j=1  aj = nA −  j=M +1  aj ≤ nA − (3n − M )A/4 .  The equality happens only if M = 3n (since aj > A/4) and bj = aj , for every j = 1, . . . , 3n. We now turn to the second part of our proof: we show that, following the initially-added sequence of +Aj,i ’s, the Ti ’s must be added in increasing order of i. That is, the arcs +T1 , . . . , +Tn appear in the subsequence S + consecutively (with no +Aj,i in between) and in this order. The next lemma shows that the first +Ti in the sequence S + must be +T1 and the following lemma reasons about the rest of the sequence of +Ti ’s. Lemma 5. The first +Ti in S + is +T1 . Proof. Let +T be the first +Ti in S + . As argued in the proof of Lemma 3, after each +Aj,i , the energy difference increases by A. Hence, before adding +T , the energy difference is non-negative. Second, if > 1 then to add +T , both −T˜ −1 and −T˜ has to be removed. After their removal the energy difference would be at least 2k − 2(6n + 8)nA − 4(n − 1)(2 − 1)A > k, a contradiction. The last inequality follows by k > 4(5n2 +n+1)A = 2(6n+8)nA−4(n−1)(2n−1)A. Hence, by the above lemma, the subsequence S + has the following form +Aj1 ,i1 , +Aj2 ,i2 , . . . , +AjM ,iM , +T1 followed by the addition of all remaining +Aj,i ’s and +Ti ’s. The following lemma gives more detailed insight into the order of arcs in S + . In the remaining lemmas we adopt notation which was introduced by Graham, Knuth and Patashnik [44]: [i > j] =  1, 0,  if i > j; otherwise,  and  [i = j] =  1, 0,  if i = j; otherwise.  Lemma 6. All Ti ’s appear in S + in one sequence and in increasing order. Proof. Assume to the contrary that subsequence +T1 , +T2 , . . . , +Tp is followed by an arc +I different from +Tp+1 in S + , where p < n. This arc could be either +Aj,i or +T , where > p + 1. We will show that both cases lead to a contradiction by lower bounding the energy difference of the intermediate configuration after adding +Tp .  37  2.2. Result As argued in the proof of Lemma 3, processing of each +Ajm ,im will contribute A to the energy difference. Hence, before adding +T1 , the energy difference is non-negative. We will lower bound contributions of processing +T1 , . . . , +Tp to the energy difference. For every i = 1, . . . , p, to process +Ti , we need to remove −T˜i and all −A˜j,i which were not yet removed. This will add to the energy difference T˜iw + j ∈T / i  A˜w j,i ≥ k − 3nA − [i > 1](6n + 5)nA − 4(n − 1)iA + 4i  j=1,...,3n  aj − 4| Gi |iA/2  > k − 3nA + [i > 1](6n + 5)nA − 2| Gi |nA , since only | Gi | arcs −A˜j,i have been removed before processing +Ti and each A˜w j,i = 4iaj < 2nA. Hence, the contribution of processing +T1 is at least k − 3nA − 2| G1 |nA − T1w = (6n + 5)nA − 2| G1 |nA, and the contribution of p processing +Ti , for i = 2, . . . , p, is at least −2| Gi |nA. Since i=1 | Gi | ≤ M and by Lemma 3, M ≤ 3n, the total contribution of adding T1 , . . . , Tp is at least 6n2 A + 5nA − 6n · nA = 5nA. Hence, the energy difference of the intermediate configuration before processing +I is at least 5nA. Now, let us consider two cases depending on the type of arc +I. First, assume that +I is a +T , for some > p + 1. Since +T −1 appears in S + after +T , to add +T , we need to remove both −T˜ −1 and −T˜ . 
Since the energy difference before removing −T˜ −1 and −T˜ is positive (at least 5nA), the lemma follows by the argument used in the proof of Lemma 5. Second, assume that +I is an +Aj,i . Before adding +Aj,i , the arc −A¯j,i needs to be removed. Since A¯w j,i = k − (j − 1)A − 4iaj > k − (3n − 1)A − 2nA > k − 5nA, the energy difference after removing −A¯j,i would be greater than 5nA + k − 5nA = k, a contradiction. Hence, by the above lemmas, the subsequence S + has the following form +Aj1 ,i1 , +Aj2 ,i2 , . . . , +AjM ,iM , +T1 , +T2 , . . . , +Tn followed by the all remaining +Aj,i ’s. Moving on to the last part of the proof: we show that the Gi ’s defined by the initial sequence of +Aj,i ’s form a valid solution (or can be perturbed slightly to form a valid solution) by arguing that only in this case can all of the T ’s be added without exceeding the energy barrier. Specifically, we will show that M = 3n and {{aj1 , . . . , aj3n }} = {{a1 , . . . , a3n }}. For this purpose, the next two lemmas prove lower bounds on sums of the c(Gi )’s. Lemma 7. For every  = 1, . . . , n,  i=1  i(c(Gi ) − A) ≥ (M − 3n)A/4.  Proof. To process +T , −T˜ and all remaining −A˜1, , . . . , −A˜3n, need to be removed. Specifically, this corresponds to all −A˜j, ’s for which j ∈ / G . Hence, the total weight of arcs which need to be removed is 38  2.2. Result  T˜w +  j ∈T /  A˜w j, = k − 3nA − [ > 1](6n + 5)nA − 4(n − 1) A + 4 (nA − c(G )) = k − 3nA − [ > 1](6n + 5)nA + 4 (A − c(G )) .  After removing these arcs, the energy difference will increase by this amount and then decrease by T w = k − (6n + 8)nA. Hence, the total change of the energy difference for adding +T is [ = 1](6n + 5)nA + 4 (A − c(G )). It is easy to see, by induction on , that the energy difference before removing −1 the arc for +T is M A + [ > 1](6n + 5)nA + i=1 4i(A − c(Gi )), since after processing subsequence +Aj1 ,i1 , . . . , +AjM ,iM , the energy difference is M A. Since the energy difference, after removing the necessary arcs before adding +T , must be at most k, we have −1  M A + [ > 1](6n + 5)nA + i=1  4i(A − c(Gi ))  + k − 3nA − [l > 1](6n + 5)nA + 4 (A − c(G )) ≤ k which simplifies to  i=1  i(c(Gi ) − A) ≥ (M − 3n)A/4 .  Using the inequalities from Lemma 7, we will lower bound the sum of c(Gi )’s. n  Lemma 8. We have i=1 c(Gi ) ≥ nA − (3n − M )A/4, where the equality happens only if c(G1 ) = A−(3n−M )A/4 and c(Gi ) = A, for every i = 2, . . . , n. Proof. We will multiply each inequality of Lemma 7 with the positive constant 1/ − [n > l]/ + 1 and sum the inequalities: n  =1  (1/ − [n > l]/( + 1))  i=1  i(c(Gi ) − A) n  ≥  =1  (1/ − [n > l]/( + 1)) (M − 3n)A/4 .  Changing the order of the sums on the left hand side and using the fact that n =i (1/ − [n > ]/( + 1)) = 1/i we obtain: n  i=1  n  (c(Gi ) − A) =  i=1  n  i(c(Gi ) − A)  =i  (1/ − [n > ]/( + 1)) ≥ (M − 3n)A/4,  and the lemma easily follows. The equality, in the resulting inequality, happens only if we have equality in all inequalities used in the summation. This would 39  2.2. Result imply that  i=1  i(c(Gi ) − A) = (M − 3n)A/4,  (2.5)  for all = 1, . . . , n. For = 1, we have c(G1 ) − A = (M − 3n)A/4, i.e., c(G1 ) = A − (3n − M )A/4. Subtracting Equation (2.5) for and Equation (2.5) for − 1, we obtain (c(G ) − A) = 0, i.e., c(G ) = A. n  By Lemmas 4 and 8, we have i=1 c(Gi ) = nA − (3n − M )A/4, i.e., we have equality in both Lemma 4 and Lemma 8. Thus, by Lemma 4, we have that M = 3n and {{aj1 , . . . , aj3n }} = {{a1 , . . . , a3n }}. 
Although this does not imply that G1 , . . . , Gn immediately forms a decomposition of the set {1, 2, . . . , 3n}. For instance, if a1 = a2 , the multiset {{j1 , . . . , j3n }} could contain zero 1’s and two 2’s. However, this is easily solved with a slight perturbation of the solution. The sets G1 , . . . , Gn could be mapped to the decomposition of {1, 2, . . . , 3n} just by a sequence of replacements i’s with j’s assuming aj = aj+1 = · · · = ai . This transformation simplifies the correspondence between the solution of two problems. Most importantly, by Lemma 8, we have c(G1 ) = A − (3n − M )A/4 = A and also c(Gi ) = A for all i = 2, . . . , n. Hence, the sets G1 , . . . , Gn (possibly modified as described above) are the solution to the 3-partition problem. The reduction is polynomial as the sum of weights of all arcs (which is the total number of arcs in the unweighted instance) is   3n  n  T˜iw + Tiw + i=1  w  ¯w (A˜w j,i + Aj,i + Aj,i ) j=1  < n · 2k + 3n2 · 2k = O(n2 k) = O(n4 A) , and A is assumed to be polynomial in n. The problem of predicting folding pathways for a single strand generalizes to multiple interacting strands in the natural way. In this problem a configuration generalizes to consider base pairs between strands in addition to within strands. If there are n bases in all strands of an instance, then bases can be identified uniquely as a number in [1, n]. Thus, the current definition of a configuration also works for the multiple strand variation of the problem. We formally define the problem as follows. Problem 3. eb-dpfp-multi (Energy Barrier for Direct Pseudoknot-free Folding Pathway of Multiple interacting strands) Instance: Given two pseudoknot-free configurations I = {Ii }ni=1 (initial) and F = {Fi }m i=1 (final), of multiple interacting strands, and integer k. Question: Is there a direct pseudoknot-free transformation sequence S such that the energy barrier of S, in the simple energy model, is at most k?  40  2.3. Chapter summary Since the length of a direct pathway for multiple interacting strands from an initial configuration I to a final configuration F has maximum length |I| + |F|, then the problem is in NP. By restriction to the single strand version of the problem, eb-dpfp, we can conclude the following result. Theorem 4. The eb-dpfp-multi problem, namely the energy barrier for direct pseudoknot-free folding pathway of multiple interacting strands problem, is NPcomplete.  2.3  Chapter summary  We have shown that the energy barrier problem for direct pseudoknot-free folding pathways is NP-complete, via a reduction from the 3-partition problem. Thus, unless NP = P , there is no polynomial-time algorithm for calculating the energy barrier of direct folding pathways. This justifies the use of heuristics for estimating energy barriers [38, 83, 125, 143] and leads to the interesting question of whether or not there is an algorithm that is guaranteed to return the exact energy barrier and which works well on practical instances of the problem (while not in the worst case). This is the focus of the next chapter. Our proof can help shed insight on energy landscapes. Consider an instance (I, F, k) of the eb-dpfp problem which is derived from a “yes” instance of 3-partition according to our construction. There are exponentially many possible prefixes (of the type shown in (2.4)) which could precede the addition of T1 , all of which do not exceed the energy barrier k. Of these, it may be that only one defines a valid solution of Gi ’s. 
Thus, if pathways are followed according to a random process, it could take exponential time for the random process to find the pathway with energy barrier k. This is because there are exponentially many initial prefixes which could lead to such a pathway of which only one can be extended to a pathway with barrier k. In this chapter, we do not fully resolve the computational complexity of the general energy barrier problem, in which the pathway need not be direct. Two challenges in understanding the complexity of this problem which need to be considered are repeat arcs—arcs added and removed multiple times in a pathway—and temporary arcs—arcs not specified in the initial or final structure. The following chapter sheds further light on pathways containing repeat arcs.  41  Chapter 3  Predicting minimum energy barrier folding pathways In the previous chapter we established that the direct energy barrier folding pathway problem is NP-complete. Still, there is a need for an exact algorithm that performs well in practice. This is exactly the focus of this chapter. We first generalize the folding pathway problem of a single strand to one defined in terms of bipartite graphs, allowing us to exploit the rich knowledge and algorithms found in graph theory. Later in the chapter, we will discuss results and extensions to the case of multiple interacting strands. While the algorithm we develop has exponential running time in the worst case, it is the first exact algorithm that uses only polynomial space. As we will show by an empirical evaluation in Section 3.3, the algorithm performs well in practice. Furthermore, the algorithm is inherently parallel; a property that could be exploited when solving hard instances. In the process of proving the correctness of the algorithm, we will resolve the complexity of the direct with repeats energy barrier folding pathway problem (see Section 3.4).  3.1  Preliminaries  We find it convenient to model the problem in terms of bipartite graphs and first develop some useful notation. For a pair of pseudoknot-free structures for the same RNA sequence, we define the conflict graph to be a bipartite graph G[A, B] where A is the set of arcs from the first structure, B is the set of arcs from the second structure, and there is an edge in E(G) between a ∈ A and b ∈ B if and only if a and b are crossing. An example is given in Figure 3.1. Throughout, we denote the neighbours of a vertex v, or set of vertices X in G as NG (v) and NG (X), respectively. We denote the subgraph of G induced by subsets A ⊆ A and B ⊆ B by G/[A , B ]. Also, we denote the stability number of G, i.e., the size of a maximum independent set in G, as α(G). We need a notion analogous to that of a pair of minimum free energy (MFE) structures11 in the context of bipartite graphs. We say that G is pairwise-optimal if α(G) = |A| = |B|. If Content from this chapter appears in the proceedings of the 15th Annual Pacific Symposium on Biocomputing (PSB 2010) [129]. 11 A minimum free energy (MFE) structure has the lowest energy of any possible structure for a given molecule. In the simple energy model proposed by Morgan and Higgs [83], where each arc contributes -1 to the energy score, a pseudoknot-free MFE structure is one with a maximum number of non-crossing arcs.  42  3.1. Preliminaries A and B are MFE structures then G must be pairwise-optimal; otherwise the largest independent set in the conflict graph G would be a set of arcs with lower free energy than either A or B.  
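Constructing the conflict graph from a pair of pseudoknot-free structures is straightforward; the following C++ sketch (identifiers are ours, and the input arcs are hypothetical) builds the bipartite adjacency structure by testing every pair of arcs for crossing. A quadratic number of crossing tests suffices for this construction.

#include <cstdio>
#include <vector>

// An arc (base pair) i.j with i < j.
struct Arc { int i, j; };

// Two arcs cross iff a.i < b.i < a.j < b.j or b.i < a.i < b.j < a.j.
bool crossing(const Arc& a, const Arc& b) {
    return (a.i < b.i && b.i < a.j && a.j < b.j) ||
           (b.i < a.i && a.i < b.j && b.j < a.j);
}

// Build the bipartite conflict graph G[A, B]: adj[u] lists the vertices of B
// (arcs of the final structure) that cross arc u of the initial structure A.
std::vector<std::vector<int>> conflictGraph(const std::vector<Arc>& A,
                                            const std::vector<Arc>& B) {
    std::vector<std::vector<int>> adj(A.size());
    for (size_t u = 0; u < A.size(); ++u)
        for (size_t v = 0; v < B.size(); ++v)
            if (crossing(A[u], B[v])) adj[u].push_back(static_cast<int>(v));
    return adj;
}

int main() {
    // Hypothetical structures: a nested band {a1, a2} crossing a nested band
    // {b1, b2}, which yields a 2-by-2 bi-clique in the conflict graph.
    std::vector<Arc> A = { {1, 6}, {2, 5} };
    std::vector<Arc> B = { {4, 9}, {3, 10} };
    std::vector<std::vector<int>> adj = conflictGraph(A, B);
    for (size_t u = 0; u < adj.size(); ++u) {
        std::printf("a%zu:", u + 1);
        for (int v : adj[u]) std::printf(" b%d", v + 1);
        std::printf("\n");
    }
    return 0;
}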
a1  a2 b1  a1  a2  a3  a4  b2  b3  b4  b1  a3 a4 b2 b3  b4  Figure 3.1: (left column) An example of an arc diagram representation of an initial and final structure of an RNA folding pathway, and (right column) the corresponding conflict graph. In the conflict graph, there is a node for every arc, and an edge between any pair of arcs that cross. Let G[A, B] be a pairwise-optimal bipartite graph. A set pathway for G is a sequence of independent sets S0 , . . . , Sm , each of which is a subset of A ∪ B, such that (i) S0 = A, (ii) Sm = B and (iii) for every i = 1, . . . , m, |Si−1 Si | = 1 (the size of the symmetric difference is one, i.e., at each step one element is either added or removed). The transformation sequence corresponding to this set pathway is the sequence of singletons S0 S1 , . . . , Sm−1 Sm . If an element appearing in the transformation sequence is not in the current set, then the element is meant to be added. If the element is in the current set, the element is meant to be removed. The set pathway is direct if its corresponding transformation sequence has no repeating elements. The barrier of the pathway (or its corresponding transformation sequence) is k = maxi |A| − |Si |. (Since A is a maximum independent set of G, it must be that |A| − |Si | ≥ 0 for all i, 1 ≤ i ≤ m.) We say that a set pathway is a (≤ k)-barrier set pathway or a k-barrier pathway if its barrier is ≤ k or = k, respectively. A min-barrier set pathway is a set pathway whose barrier is less than or equal to the barriers of any other set pathway for G. Consider the following problem: Problem 4. eb-dsp (Energy Barrier for Direct Set Pathway) Instance: Given a pairwise-optimal bipartite graph G[A, B] and integer k. Question: Is there a direct set pathway with barrier at most k for G? An instance of the eb-dpfp (Energy Barrier for Direct Pseudoknot-free Folding Pathway) problem can be mapped to an instance of the eb-dsp (Energy Barrier for Direct Set Pathway) problem by constructing its conflict graph. We note that the mapping from an instance of eb-dpfp to an instance of eb-dsp is only immediate if both the initial and final secondary structures of the eb-dpfp instance are MFE structures. However, we demonstrate in Section 3.2.5 how this condition can be removed, such that any instance of eb-dpfp can be solved. See Figure 3.2 for an example which relates an instance of each problem. However, the Direct Set Pathway problem is actually a more general problem since 43  3.2. An algorithm for the set barrier problem not every bipartite graph is realizable by a pair of pseudoknot-free structures. We characterize conflict graphs for the RNA direct folding pathway problem more accurately in Section 3.5.  3.2  An algorithm for the set barrier problem  Our algorithm for the Direct Set Barrier Problem uses two key ideas. The first is a splitting strategy: if for some proper non-empty subset B1 of B the induced subgraph G/[A1 , B1 ] is pairwise-optimal, where A1 = NG (B1 ), then we can determine the solution for G by recursively solving the problem on the induced subgraphs G/[A1 , B1 ] and G/[A \ A1 , B \ B1 ] and combining their solutions. At some point, a subproblem G [A , B ] cannot be split further as it contains exactly two maximum independent sets: A and B . In this case, we say that G is minimal pairwise-optimal. Solving a minimal pairwise-optimal subproblem requires our second idea, a cutting strategy for reducing the size of minimal pairwise-optimal problem instances. 
After reducing the size, we can once again attempt to split the problem, and so on, until either a solution is found or it is determined that one does not exist. In the following sections, we detail these strategies, the overall algorithm, its correctness and complexity, and the empirical performance of its implementation.

3.2.1 Splitting strategy

The hypothesis, in terms of the RNA folding pathway problem, that motivated the splitting strategy is simple: if one can identify an MFE structure C consisting of arcs from both the initial structure A and the final structure B, then there always exists an optimal pathway from A to B via C. The combination of Lemma 9 and Lemma 10 shows that this hypothesis is correct in terms of the direct set barrier problem; specifically, the resulting solution is (i) a valid set pathway and (ii) optimal.

Lemma 9. Let G[A, B] be a pairwise-optimal bipartite graph and let G1 = G/[A1, B1] be pairwise-optimal, where B1 is a proper non-empty subset of B and A1 = NG(B1). Let G2 = G/[A2, B2], where A2 = A \ A1 and B2 = B \ B1. Then
1. G2 is pairwise-optimal, and
2. if T1 and T2 are (≤ k)-barrier transformation sequences for G1 and G2 respectively, then T1, T2 is a (≤ k)-barrier transformation sequence for G.

Proof. Consider the first claim. Suppose to the contrary that G2 is not pairwise-optimal. Then, in G2, there must exist some maximum independent set C ⊆ A2 ∪ B2 with |C| > |B2|. Since NG(B1) = A1, the set C ∪ B1 is also an independent set in G, of size |C| + |B1| (since C ∩ B1 = ∅). But |C| + |B1| > |B2| + |B1| = |B|, contradicting that G is pairwise-optimal.

Figure 3.2: An example of a 2-barrier direct RNA folding pathway from an initial to final structure (left column), a corresponding set pathway (right column), and a graph showing the current folding pathway energy and the current barrier set size (center column). The set pathway instance (right) is specified by the conflict graph of the RNA folding pathway instance (left). The current set in the set pathway is denoted by black vertices while the current secondary structure in the folding pathway is indicated by the set of arcs on top.

Consider the second claim. Compare the set pathways that result from applying T1 to G1 and from applying T1 to G. In the former, no set in the pathway contains an element of A2, while in the latter all sets additionally contain A2; otherwise, the set pathways are identical.
Since barrier is relative to the initial set and all sets in the latter additionally contain A2 , then the barrier of both pathways is identical (≤ k). However, we must show that no element of A2 conflicts with an element of V (G1 ) = A1 ∪ B1 . Clearly this is true since NG (B1 ) = A1 , A1 ∩ A2 = ∅ and A1 is in the same partition. Next, consider that the final set after applying T1 to G is A2 ∪ B1 . Importantly, note that |A2 ∪ B1 | = |A| since G1 is pairwise-optimal. To prove the claim, the remaining pathway can have barrier at most k. If T2 is next applied, the remaining pathway is identical to the result of T2 applied to G2 , except that each set will additionally contain B1 ; as above, the barrier is identical (≤ k). Finally, no element of B1 can conflict with any element of V (G2 ) = A2 ∪ B2 since NG (B1 ) = A1 , A1 ∩ A2 = ∅ and B2 is in the same partition. Therefore, T1 , T2 must be a valid (≤ k)-barrier transformation sequence for G. Lemma 10. Let G[A, B] be a pairwise-optimal bipartite graph and let G = G/[A , B ] be pairwise-optimal where B is a non-empty (not necessarily strict) subset of B and A = NG (B ). If the minimum barrier of any direct transformation sequence for G is k, then the minimum barrier of any direct (possibly with repeats) transformation sequence for G is at least k. Proof. Assume to the contrary that there exists a direct (possibly with repeats) transformation sequence T for G with barrier k < k. Let X be the first set in the pathway specified by T which is missing k more elements from A as from B . Specifically, X is the first set such that |X ∩ B | − |X ∩ A | = k. Such a set must exist, otherwise G would have a (< k)-barrier direct transformation sequence. We can determine the size of X relative to the initial set A, that is |A X|, as follows. Let A1 = (A \ X) ∩ A and A2 = (A \ X) \ A . Let B1 = (X \ A) ∩ B and let B2 = (X \ A) \ B . Informally, these are all elements removed (A1 ∪ A2 ) and all elements added (B1 ∪ B2 ), relative to the initial set A, partitioned by their inclusion (A1 ∪ B1 ) or exclusion (A2 ∪ B2 ) in A ∪ B . Since T is a k -barrier transformation sequence we have the following. |A1 | + |A2 | − |B1 | − |B2 | ≤ k k + |A2 | − |B2 | ≤ k  k + |A2 | − |B2 | < k  |A2 | < |B2 |  by definition of X, |A1 | − |B1 | = k  since k < k (*)  Case 1. |B2 | = 0. Contradiction with (*). Note that this case shows the claim holds when G = G. Case 2. |B2 | > 0. Consider that NG (B2 ) ⊆ (A1 ∪ A2 ) ⊆ (A ∪ A2 ), otherwise X is not an independent set. Therefore, NG (B ∪B2 ) ⊆ A ∪A2 since NG (B ) ⊆ A 46  3.2. An algorithm for the set barrier problem and we have that B ∪ B2 ∪ (A \ (A ∪ A2 )) is an independent set. Consider the size of this set. |B ∪ B2 ∪ (A \ (A ∪ A2 ))|  = |B | + |B2 | + |A \ (A ∪ A2 )| mutually disjoint by definition  = |B | + |B2 | + |A| − |A | − |A2 | since A ∩ A2 = ∅ ∧ (A ∪ A2 ) ⊆ A  = |A| + |B2 | − |A2 | since |A | = |B | by assumption that  G and G/[A , B ] are pairwise-optimal  > |A| by (*) This contradicts that G is pairwise-optimal.  We are now faced with the task of identifying a pairwise-optimal subproblem. Intuitively, in terms of the folding pathway problem, this amounts to identifying a set of arcs from the final structure that can replace an equal number of arcs from the initial structure resulting in an intermediate structure that is (i) pseudoknot-free, and (ii) MFE. 
In terms of the set pathway problem, the resulting intermediate set is a maximum independent set; thus, our task is to identify a maximum independent set spanning both partitions. We can leverage Kőnig's Theorem to solve this problem in terms of matching.

Theorem 5 (Kőnig [61]). In a bipartite graph, the number of vertices in a maximum independent set equals the number of edges in a minimum edge cover.

Observation 1. If G[A, B] is a pairwise-optimal bipartite graph, then G contains a perfect matching of size |A| (= |B|).

Proof. This follows immediately from the definition of pairwise-optimal and Theorem 5.

For the sake of efficiency, it is desirable for a splitting strategy to always identify a minimal pairwise-optimal subproblem: one that cannot be split further. This is the behaviour of the BasicSplit algorithm that we now present.

Algorithm 1: BasicSplit
input : A non-null pairwise-optimal bipartite graph G[A, B]
output: (A1, B1) where B1 ⊆ B, A1 = NG(B1), and G/[A1, B1] is minimal pairwise-optimal
1 begin
2   M ← MaximumMatching(G);
3   E' ← {(b, a) | b ∈ B ∧ (b, a) ∈ E(G)} ∪ {(a, b) | a ∈ A ∧ (a, b) ∈ M};
4   (A1, B1), (A2, B2), . . ., (Ap, Bp) ← Tarjan(D = [A, B; E']);
5   return (A1, B1);

Figure 3.3: An example of the BasicSplit algorithm. For a pairwise-optimal bipartite graph G, a perfect matching is identified (top left), a directed precedence graph D is constructed (top right), strongly connected components in D are identified (bottom left), and one that is a sink in the condensation of D is returned (bottom right).

The algorithm itself is simple, and is neatly summarized in Algorithm 1 and illustrated in Figure 3.3. First, find a maximum matching M in G (using, for instance, the Hopcroft-Karp algorithm [52]). Second, create the precedence graph for G and M. By precedence graph, we mean the directed bipartite graph D[A, B] where E(D) = {(b, a) | b ∈ B ∧ (b, a) ∈ E(G)} ∪ {(a, b) | a ∈ A ∧ (a, b) ∈ M}. Third, find the strongly connected components (SCCs) of D (using, for instance, Tarjan's algorithm [126]). Finally, return an SCC that is a sink in the condensation of D, i.e., the directed acyclic graph in which each SCC is condensed into a single node. Note that Tarjan's algorithm, which we make use of in the BasicSplit algorithm, returns SCCs in reverse topological order of the condensation; thus, the first SCC that is returned is a sink in the condensation of D. The next lemma summarizes important properties of the BasicSplit algorithm.

Lemma 11. Given a pairwise-optimal bipartite graph G[A, B], the BasicSplit algorithm returns a tuple (A1, B1) such that
1. B1 ⊆ B,
2. A1 = NG(B1), and
3. G/[A1, B1] is minimal pairwise-optimal.

Proof. Since the precedence graph D contains the same nodes as G, the first claim is trivially true. Consider the third claim. Since G is pairwise-optimal, by Observation 1, M must be a perfect matching. Consequently, each b ∈ B1 is matched to a unique a ∈ A. By construction of the precedence graph D, a and b are strongly connected and therefore it must be the case that a ∈ A1. Therefore, |A1| ≥ |B1|. Since this is true for all p strongly connected components in D, specifically that |Ai| ≥ |Bi| for 1 ≤ i ≤ p, and since |A| = |B| (G is pairwise-optimal), we can conclude that |A1| = |B1| by the pigeonhole principle.
Therefore, there exists a perfect matching between A1 and B1 in G/[A1 , B1 ] and by Theorem 5, the size of a maximum independent set for G/[A1 , B1 ] is |A1 | (= |B1 |). Thus, G/[A1 , B1 ] is pairwise-optimal. Suppose it is not minimal pairwise-optimal. Then, there must exist some non-empty sets A1 ⊆ A1 and B1 ⊆ B1 , such that A1 ∪ B1 is also a maximum independent set in G/[A1 , B1 ]. B1 \ B1  B1  B1 \ B1  B1  A1  A1 \ A1  A1  A1 \ A1  G/[A1 , B1 ]  D/[A1 , B1 ]  50  3.2. An algorithm for the set barrier problem With respect to G and M , A1 must be matched with B1 \ B1 and B1 must be matched with A1 \ A1 . With respect to D, this implies that there are no arcs oriented from A1 \ A1 to B1 \ B1 . Also, there cannot be any arcs from B1 to A1 . Thus, for any x ∈ B1 ∪ (A1 \ A1 ) there does not exist a path, in D, to any y ∈ A1 ∪ (B1 \ B1 ) contradicting that (A1 , B1 ) is strongly connected in D and proving the third claim. Consider the second claim. Let G = G/[A1 , B1 ]. Since G is pairwiseoptimal, A1 = NG (B1 ) and therefore A1 ⊆ NG (B1 ). If A1 = NG (B1 ) we are done. Otherwise, let a be any node in NG (B1 ) \ A1 incident to some b ∈ B1 . Since a ∈ / A1 , a and b belong to different strongly connected components. Next, consider that Tarjan’s algorithm finds SSCs of a graph by depth first search, returning a SSC only after all other reachable ones have been returned. This implies (A1 , B1 ) must be a sink in the condensation of D. However, by its construction, D must contain an arc (b, a). Contradiction.  3.2.2  Cutting strategy  In the original presentation of the algorithm [129], we presented two cutting strategies. Here, we detail the more intuitive (and efficient) of the two: the twosided cutting strategy. For a problem instance consisting of a minimal pairwiseoptimal bipartite graph G[A, B] and barrier k, this strategy generates the subgraphs G/(A \ {a}, B \ {b}) for each choice of a ∈ A and b ∈ B and recursively solves each of the resulting subproblems with the barrier set to k − 1. The following lemma states that if we do this for all possible choices of a and b, we are guaranteed to find a (≤ k)-barrier set pathway for G if one exists. Lemma 12. Let G[A, B] be minimal pairwise-optimal. Then 1. G/[A \ {a}, B \ {b}] is pairwise-optimal for all a ∈ A and b ∈ B, 2. if G/[A \ {a}, B \ {b}] has a transformation sequence T with barrier k − 1 then T = {a}, T , {b} is a transformation sequence for G with barrier k and, 3. G has a transformation sequence with barrier k only if G/[A \ {a}, B \ {b}] has a transformation sequence with barrier at most k − 1 for some a ∈ A and b ∈ B. Proof. Let A = A \ {a} and B = B \ {b} and let G = G/[A , B ]. Consider the first claim. Suppose not. Then, in G , there must exist some maximum independent set C ⊆ A ∪ B , where |C| > |B | implying that C contains elements of both A and B. However, since |B − B | = 1 we have that |C| ≥ |B| contradicting that G is minimal pairwise-optimal. Consider the second claim. Let P be the set pathway specified by T on G . Then the set pathway specified by T on G is exactly P = A, P , B. First, observe that P is a valid set pathway since (i) adding b last cannot introduce a conflict as the final set of P is B and (ii) removing a first ensures it cannot conflict with any set in P , B. Finally, since the barrier of P is k − 1 relative 51  3.2. An algorithm for the set barrier problem to A , it is k − 1 + |A − A | = k relative to A. Since adding b cannot raise the barrier, then T is a k-transformation sequence for G. 
Consider the third claim. Suppose to the contrary that G has a k-transformation sequence and for all a ∈ A and b ∈ B, any transformation sequence T on G has barrier at least k. But the barrier of T is relative to A . Relative to the initial set of G, the barrier is at least k + |A − A | = k + 1. Since G is pairwise-optimal (claim 1), then by Lemma 10 the barrier for G is at least as large as the barrier for G . Contradiction.  3.2.3  The overall algorithm  Algorithm 2: DirectTransformation input : A pairwise-optimal bipartite graph G[A, B] and a barrier k output: A direct transformation sequence from A to B with barrier at most k, or ∅ if one does not exist 1 begin // trivial base case 2 if k ≤ 0 then return ∅; 3 ; 4 5 6  7 8 9 10 11 12  13 14  15 16 17 18  19  // split (A1 , B1 ) ←− BasicSplit(G); G1 ←− G/[A1 , B1 ]; G ←− G/[A \ A1 , B \ B1 ];  // solve G recursively if G is non-null then T ←− DirectTransformation(G , k); if T = ∅ then return ∅; ; else T ←− ∅;  // base case for G1 if |A1 | ≤ k then return A1 , B1 , T ; ; // otherwise, recursively solve G1 with cutting strategy foreach a ∈ A1 and b ∈ B1 do T1 ←− DirectTransformation(G1 /[A1 \ {a}, B1 \ {b}], k − 1); if T1 = ∅ then return {a}, T1 , {b}, T ; return ∅; The DirectTransformation algorithm incorporates the splitting and two52  3.2. An algorithm for the set barrier problem sided cutting strategy. First, a minimal pairwise-optimal subgraph G1 is identified by the BasicSplit algorithm (line 3). Note that if G is minimal pairwiseoptimal, then G1 = G. If G , the remainder of the problem, is not null, i.e., G was not already minimal pairwise-optimal, then it is solved recursively, if possible (lines 6-10). An overall solution is returned if G1 is trivially solvable (line 11), otherwise, the cutting strategy reduces G1 into smaller subproblems to be solved recursively (lines 12-15). Overall, if a solution for G1 and G is found, their concatenated pathway is returned as a solution to G. If no solution is found, then an empty transformation sequence is returned.  3.2.4  Algorithm correctness and complexity  Theorem 6. The DirectTransformation algorithm is correct. Proof. To prove this claim we must show that for any arbitrary pairwise-optimal bipartite graph G[A, B] the algorithm returns a valid (≤ k)-barrier direct transformation sequence for G, if and only if one exists. We prove this by induction on the order of G, i.e., the number of vertices in G. Correctness is straightforward to show when |A| = |B| = 1. Suppose that DirectTransformation is correct on input graphs where |A| = |B| ≤ n − 1. By Lemma 11, G1 is minimal pairwise-optimal and either G is null or, by Lemma 9, G is pairwise optimal. We will show correctness in the latter case which immediately implies correctness of the former. By Lemma 10 and Lemma 9 there exist (≤ k)-barrier transformation sequences T1 and T , for G1 and G , respectively, if and only if T1 , T is a (≤ k)-barrier transformation sequence for G. Since the algorithm always returns the concatenation of their solutions, or ∅ if a solution for one does not exist, then it is sufficient to show the algorithm is correct for both G1 and G . First, consider that G1 cannot be null and consequently G has a smaller order than G. Therefore, by assumption, DirectTransformation must be correct for G . Next, consider how the algorithm solves G1 . The base case for G1 (line 11) is clearly correct, so consider the recursive case (lines 12-15). 
Since G1 is minimal pairwise-optimal then, by Lemma 12, G1 has a solution if and only if there exists some a ∈ A1 and b ∈ B1 such that G1 = G1 /[A1 \ {a}, B1 \ {b}] has a (< k)-barrier transformation sequence. Since the algorithm tries all pairs of a and b, and the algorithm is guaranteed to be correct for G1 by assumption (smaller order than G), then the recursive case must also be correct for G1 . Theorem 7. Given a pairwise-optimal bipartite graph G[A, B] and maximum barrier k, DirectTransformation runs in O(n2k+ω ) time and O(n2 ) space where n = |A| = |B| and ω < 2.38. Proof. First note that the time and space complexity of the subsidiary algorithm BasicSplit, which is called once for each call to DirectTransformation, is dominated by finding a maximum matching; a problem that can be solved in O(nω ) time and O(n2 ) space [47].  53  3.2. An algorithm for the set barrier problem The worst case occurs when the input for each recursive call to DirectTransformation is already minimal pairwise-optimal; thus, BasicSplit simply returns the original input. In this case, the cutting strategy makes O(n2 ) recursive calls to DirectTransformation with allowable barrier one less than the current, and the recursion bottoms out when k reaches 0. We therefore have O(n2k ) calls to DirectTransformation each taking O(nω ) time, since BasicSplit dominates the runtime at each step, for a total runtime of (n2k+ω ). In all cases, the maximum matching algorithm dominates space usage resulting in O(n2 ) space overall. Comments on practical and theoretical runtime efficiency The observant reader will have noticed a potential redundancy in the DirectTransformation algorithm. Specifically, that the sequence of SCCs returned by Tarjan’s algorithm specifies a safe splitting sequence: a sequence of minimal pairwise-optimal subgraphs G1 , G2 , . . . , Gp such that a concatenation of optimal transformation sequences for these problems T = T1 , T2 , . . . , Tp , where Ti is an optimal transformation sequence for Gi , is an optimal solution for the original graph G. (Thus, the redundancy arises as BasicSplit is called an extra p − 2 times to determine the same splitting sequence.) The correctness of this claim follows from a straightforward generalization of Lemma 9 and Lemma 11. In general, the condensation of the precedence graph specifies a partial order on minimal pairwise-optimal subgraphs. Any total order respecting this partial order (for instance, a depth first traversal), is a safe splitting sequence. Considering the example in Figure 3.3, C1 , C2 , C3 and C2 , C1 , C3 are both safe splitting sequences. Interestingly, the condensation of the precedence graph provides a succinct representation of all maximum independent sets. In terms of the folding pathway problem, this results in a succinct representation of all MFE structures, composed of arcs from the initial and final structures. Next, consider the cutting strategy which must guess (through brute force enumeration) a first element to remove and a last element to add. If this reduced problem cannot be split, the procedure must be repeated recursively until it can (or a base case is reached). If G is a bi-clique, i.e., all elements from the A must be removed before any elements from B can be added12 , then we witness the theoretical worst case behaviour of the algorithm. 
However, if we instead ask which element b ∈ B we will first add, and which element a ∈ A we will last remove, then we can fully determine the sequence of elements which must be removed first (NG (b)), and which sequence of elements will be added last (NG (a)). This seemingly minor observation leads to a major practical speedup. Similarly, we need only consider pairs a and b which are not adjacent in G. For dense graphs (i.e., |E(G)| = Θ(|A|2 )), this can significantly cut the search space in practice. In theory, we may argue that the algorithm runtime is O(nk+w ) since there are O(n) non-adjacent pairs in a dense bipartite graph. 12 Recall  the definition of a band of arcs in the folding pathway problem from Chapter 2. A bi-clique in the set pathway problem corresponds to a folding pathway instance containing exactly two bands, one for each structure, that cross.  54  3.2. An algorithm for the set barrier problem The worst case analysis assumed the use of the most efficient algorithm known (in terms of worst case runtime) for finding a maximum matching [47]. However, this can be improved both in terms of practical and theoretical efficiency if the conflict graphs are known to be sparse. In this √ case, the matching algorithm due to Hopcroft and Karp [52] guarantees O(m n) runtime, where m and n are the size (number of edges) and order (number of vertices) of G, respectively, and is known to be one of the most efficient in practice. For this reason, we have implemented the Hopcroft and Karp algorithm for finding a maximum matching within our BasicSplit algorithm. Therefore, the implementation we evaluate in Section 3.3 has runtime O(n2k+2.5 ). If we know our conflict graphs are sparse (i.e., |E(G)| = O(|A|)), we may argue that the algorithm runtime is O(n2k+1.5 ). Finally, consider that this algorithm has a trivial parallel implementation: each subproblem in a safe splitting sequence can be solved independently, in parallel; furthermore, each subproblem generated by the cutting strategy can be solved independently, in parallel.  3.2.5  Finding minimum barriers for non-pairwise optimal instances  The above algorithm maintains the invariant of operating on pairwise-optimal bipartite graphs. While this property has greatly simplified the algorithm description and proof of correctness, we now outline how these results can be extended to the more general case of solving the barrier problem for any bipartite graph G[A, B]. We accomplish this by giving a polynomial time reduction from G to a pairwise-optimal bipartite supergraph of G, denoted as PWO(G). As we will show, a solution for PWO(G) can be mapped to solution for G. 0 −1 −2 −3 −4  A B  A B  X  A  −0 −1 −2 −3 −4  B  X A  A  B =B  Figure 3.4: Creating a pairwise-optimal instance (bottom) from a non-pairwiseoptimal instance (top). 55  3.3. Empirical results Construction of PWO(G) If G is not pairwise-optimal, then α(G) > |A| and/or α(G) > |B|. We will construct a bipartite supergraph G [A , B ] of G such that α(G ) = |A | = |B | = α(G) and is therefore pairwise-optimal. Let A = A ∪ X and B = B ∪ Y where |X| = α(G) − |A| and |Y | = α(G) − |B|. Finally, let E(G ) = E(G) ∪ E where E = {(x, b) | x ∈ X ∧ b ∈ B } ∪ {(a, y) | y ∈ Y ∧ a ∈ A }. Note that since G is bipartite, α(G) can be determined in polynomial time [47, 52, 61] and therefore PWO(G) can be constructed in polynomial time (O(nω )). See Figure 3.4 for an example. Theorem 8. 
There exists a (≤ k)-barrier transformation sequence for any bipartite graph G[A, B] if and only if there exists a (≤ k )-barrier transformation sequence for PWO(G), where k = k + α(G) − |A|. Proof. Let G [A , B ] = PWO(G), where A = A ∪ X and B = B ∪ Y . Let T be a transformation sequence for G having barrier k . Since G is a supergraph of G, then T must contain as a subsequence a transformation sequence T for G. Let k be the barrier of T applied to G. Consider that no addition operation can appear in T until all elements from X have been removed since every element of X dominates B . Likewise, no element of Y can be added in T until all of A is removed. Therefore, since reordering a consecutive sequence of remove (or addition) operations cannot affect the barrier, T can be reordered as X, T, Y . Since the barrier for the prefix X, T is at most k relative to the initial set A , and since the prefix X simply removes all elements of X thus resulting in the initial set for G, then we can express the barrier for X, T as |X| + k = k + α(G) − |A|. Since the addition of elements of Y at the end of the transformation sequence cannot increase the barrier, we can conclude that k = k + α(G) − |A|.  3.3  Empirical results  We implemented two versions of our algorithm for the direct set barrier problem, differing by their cutting strategy, in order to study their efficiency in practice on biologically motivated data. The first, referred to as the O(n2k+2.5 ) algorithm, uses the two-sided cutting strategy as described above. The second, referred to as the nO(n) algorithm, uses a one-sided cutting strategy described in previous work [129]. Until this point, our algorithm has been described in terms of a decision problem; i.e., can a problem instance be solved within barrier k? However, the implementation of our algorithm is terms of the more general optimization problem; i.e., find the minimum barrier k that can solve a problem instance. The general strategy is to perform a binary search on values of k, using the decision algorithm as a subsidiary algorithm. As such, our empirical results report on the runtime required to identify the minimum barrier.  56  3.3. Empirical results  3.3.1  Implementation and experimental environment  Both algorithms were coded in C++ and compiled using g++ (GCC version 4.2.1). All experiments were run on our reference PCs with 2.4Ghz Intel Pentium IV processors with 256KB L2 cache and 1GB RAM, running SUSE Linux version 10.3.  3.3.2  Generation of problem instances  With the motivation of studying algorithm performance across a variety of problem instances, we randomly sampled five sequences for each of four different classes of non-coding RNA—Transfer RNA, Transfer Messenger RNA, Ribonuclease P RNA, and 5S Ribosomal RNA—found in the RNA STRAND database [3]. For each sequence, five MFE structures—with respect to number of base pairs—were determined using a modified version of the Nussinov-Jacobsen algorithm [89]. The modified algorithm stored all optimal paths within the traceback matrix. In this way, we were able to randomly sample five different MFE structures for the same sequence. Identical structures were discarded. Every possible pairing of structures for the same sequence formed a new problem instance. Thus, ten problem instances were created for each sequence, resulting in 200 problem instances overall. The distribution of sequence length and the resulting number of conflicting base pairs between paired structures can be seen in Figure 3.5. 
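For completeness, the base-pair maximization recurrence underlying this instance-generation step can be sketched in a few lines of C++. The version below only computes the optimal number of base pairs; it assumes canonical Watson–Crick pairs plus the G–U wobble pair and a minimum hairpin loop of three unpaired bases (our assumptions, not necessarily the exact settings used for the experiments), and it omits the traceback modification described above that stores all optimal paths so that distinct MFE structures can be sampled.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Allowed pairs: Watson-Crick (A-U, C-G) plus the G-U wobble pair (an assumption).
bool canPair(char x, char y) {
    std::string p{std::min(x, y), std::max(x, y)};
    return p == "AU" || p == "CG" || p == "GU";
}

// Nussinov-style dynamic program: N[i][j] = maximum number of base pairs in
// the subsequence s[i..j], with at least minLoop unpaired bases in a hairpin.
int maxBasePairs(const std::string& s, int minLoop = 3) {
    int n = s.size();
    if (n == 0) return 0;
    std::vector<std::vector<int>> N(n, std::vector<int>(n, 0));
    for (int len = minLoop + 2; len <= n; ++len) {          // subsequence length
        for (int i = 0; i + len - 1 < n; ++i) {
            int j = i + len - 1;
            int best = N[i][j - 1];                          // base j left unpaired
            for (int k = i; k <= j - minLoop - 1; ++k)       // base j paired with base k
                if (canPair(s[k], s[j]))
                    best = std::max(best, (k > i ? N[i][k - 1] : 0) + 1 + N[k + 1][j - 1]);
            N[i][j] = best;
        }
    }
    return N[0][n - 1];
}

int main() {
    std::cout << maxBasePairs("GGGAAAUCC") << "\n";  // prints 3
}

Sampling distinct optimal structures additionally requires recording, for each cell, every choice that attains the maximum, and tracing back through these choices at random.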
In general, and as expected, the number of conflicting bases pairs increases with sequence length.  3.3.3  Algorithm runtime performance  Both algorithms were run for a maximum of 1 CPU hour on each of the 200 hundred problem instances. The nO(n) algorithm found solutions to 183 instances, while the O(n2k+2.5 ) algorithm found solutions to 184 instances. Interestingly, the nO(n) algorithm found solutions to three instances not found by the O(n2k+2.5 ) algorithm; likewise, four instances were found by the O(n2k+2.5 ) algorithm not found by the nO(n) algorithm. Of the instances that were solved, optimal barriers were found within 1 CPU second by both algorithms in 90% of the cases with barrier height ranging from 1 to 8. The barrier of harder instances ranged from 6 to 11, with a mean of 9. In general, the O(n2k+2.5 ) algorithm was the best performing for harder instances. However, as can be seen in Figure 3.6, both algorithms excelled for certain instances relative to one another. The instances which failed to be solved within our cut-off time tended to have the highest number of conflicting base pairs; moreover, they tended to have the largest minimally pairwise-optimal subproblems generated by the BasicSplit algorithm. For each problem instance, we recorded the size of the maximum subproblem, as well as the average size of all subproblems, produced by the BasicSplit algorithm at the top level of recursion. We measured size as number of base pairs (elements in terms of the set barrier problem). Figure 3.7 57  3.3. Empirical results  Figure 3.5: Distribution of conflicting base pairs for generated problem instances.  58  3.4. Solving the direct with repeats barrier problem  Figure 3.6: The required time to find an optimal barrier pathway is shown for two time scales. shows the frequency of problem instances which have given maximum (left) and average (right) subproblem size. The problem instances which have maximum subproblems of size 200 or more were those that failed to be solved, within the allotted runtime. Alternative methods for splitting or recursing on such subproblems would clearly be valuable. Or, simply exploiting the inherent parallelism of the proposed algorithms could lead to a solution in reasonable wall-clock time, when many CPUs are employed.  3.4  Solving the direct with repeats barrier problem  Since any graph is a subgraph of itself, and since Lemma 10 correctly considers this case, we can immediately conclude that repeat operations cannot lower the barrier in direct transformation sequences for pairwise-optimal instances. Moreover, as an immediate consequence of Lemma 10 and Theorem 8, we can conclude the following, more general result. Theorem 9. If there exists a direct-with-repeats k-barrier transformation sequence for a bipartite graph G[A, B] then there must exist a direct transformation sequence for G with barrier at most k. Theorem 9 has implications for the direct-with-repeats folding pathway problem that we formally define below.  59  3.4. Solving the direct with repeats barrier problem  Figure 3.7: Frequency of maximum (left) and average (right) subproblem sizes, measured as number of base pairs in the subproblem produced by the first call to the BasicSplit algorithm for a given instance. The maximum and average are taken over all subproblems generated for a given instance.  60  3.5. Chapter summary Problem 5.  
eb-drpfp (Energy Barrier for Direct-with-Repeats Pseudoknot-free  Folding Pathway of a single strand)  Instance: Given two pseudoknot-free configurations I = {Ii }ni=1 (initial) and F = {Fi }m i=1 (final), of a single strand, and integer k. Question: Is there a direct-with-repeats pseudoknot-free transformation sequence S such that the energy barrier of S, in the simple energy model, is at most k? As repeats do not lower the energy barrier, then, by combining Theorem 3 and Theorem 9, we can conclude the following result. Theorem 10. The eb-drpfp problem, namely the energy barrier for directwith-repeats pseudoknot-free folding pathway problem, of a single strand, is NP-complete. We can also consider the implications for the multi-strand version of the problem that allows repeats in the folding pathway. Problem 6. eb-drpfp-multi (Energy Barrier for Direct-with-Repeats Pseudoknotfree Folding Pathway of Multiple interacting strands)  Instance: Given two pseudoknot-free configurations I = {Ii }ni=1 (initial) and F = {Fi }m i=1 (final), of multiple interacting strands, and integer k. Question: Is there a direct pseudoknot-free transformation sequence S such that the energy barrier of S, in the simple energy model, is at most k? It is unclear whether Theorem 9 can be applied to the multi-strand case. It leaves open the possibility that repeats may help to lower the energy barrier. Furthermore, we currently cannot bound the length of a minimum energy barrier pathway for this problem. Thus, we do not know if eb-drpfp-multi is in NP. However, by restriction to the single strand case, we can conclude the following result. Theorem 11. The eb-drpfp-multi problem, namely the energy barrier for direct-with-repeats pseudoknot-free folding pathway of multiple interacting strands problem, is NP-hard. We note that a constructive proof of Theorem 9 was previously given in terms of the more restricted folding pathway problem [129]. It is likely the same proof would hold for the more general set pathway problem as well. This alternative proof was given instead as it was noticed that a slight generalization of Lemma 10 proved this property in addition to the properties required to show algorithm correctness.  3.5  Chapter summary  In this chapter, we proposed an algorithm to exactly solve a generalized version of the direct energy barrier folding pathway problem that is defined in terms of 61  3.5. Chapter summary bipartite graphs. The algorithm has an exponential worst case time complexity, but uses only polynomial space. Because the algorithm is inherently parallel, this property could be exploited to help solve hard instances. As shown by an empirical study, the algorithm is practical for most instances, although it fails to solve some instances in a reasonable run-time. Due to the splitting algorithm, large sequences are not necessarily hard to solve. However, sequences that that do not decompose into small sub-problems can result in poor empirical performance by the algorithm, including the failure to solve the instance within a reasonable runtime. This is, of course, due to the exhaustive enumeration performed in order to guarantee a solution is optimal. However, the splitting algorithm could be used in conjunction with heuristic methods. For instance, it could be used to first partition the solution space into sub-problems, with the aim of improving both the efficiency and accuracy of the overall heuristic method used to solve each sub-problem. 
We note that our algorithm is based on the simple energy model ; however, there is potential to extend it to more complicated energy models. For instance, it may be straightforward to extend the algorithm to consider nearest neighbour base pairs as in the Turner energy model [79]. The resulting changes would still only consider local interactions when calculating the energy of particular structures. However, it seems unlikely that the current algorithm can be easily modified to consider global interactions found in the full Turner energy model, such as multi-branch loops. Fortunately, for the case of folding pathways resulting from strand displacement systems, such global interactions are not present by design. Furthermore, in the design of strand displacement systems, it is desirable that intended folding pathways have low energy barriers. Therefore, if our algorithm were extended to the multi-strand case (as discussed in Section 6.1), it would have a polynomial runtime in these important cases. As such, it may be a valuable tool in the design of strand displacement systems where unintended folding pathways should have a necessarily larger barrier than intended pathways. Interestingly, by proving the algorithm is correct, we were also able to prove that repeat arcs do not help in a direct folding pathway. This establishes that the direct-with-repeats energy barrier folding pathway problem is NP-complete for the single strand case and NP-hard for the multiple interacting strand case. Unfortunately, the algorithm does not seem applicable for indirect folding pathways. The design of the algorithm explicitly assumes that the graph modeling the conflicts, between the arcs of the initial and final structures of a problem instance, is bipartite. In an indirect folding pathway, where any non-crossing arc forming a Watson-Crick base-pair can be added at any point along a pathway, the conflict graph is not necessarily bipartite (and unlikely to be in general). Still, it is possible that a better understanding of the structure of conflict graphs for indirect pathways could lead to a similar result. The conflict graphs formed for indirect folding pathways can be characterized as circle graphs. The conflict graphs for direct pathways are 2-colourable circle graphs (see Figure 3.8 for an example). For direct pathways, we were able to exploit the following property: if one 62  3.5. Chapter summary 4 5 A  6  7  6, 7  3  G  C  C  8  2  U  U  G A  9  U G  10  11  9, 11  5, 8  U  G  2, 4  12  7, 12  1, 9 1  9, 11  1, 9  7, 12  5, 8  3, 5  6, 7  2, 6  2, 4  3, 5  1, 5  2, 6  2, 10  8, 9  2, 12  7, 10 3, 9  4, 7  5, 11  Figure 3.8: (left) An arc diagram representation for the RNA strand UCUGAG CUAGUG. Arcs (base pairs) in the initial structure are shown in red, those in the final structure are shown in blue and potential temporary arcs are shown in green. Also shown are the corresponding conflict graphs for the indirect folding pathway problem (center) and the direct folding pathway problem (right). could identify an MFE structure C consisting of arcs from both the initial and final structures, A and B respectively, then there always exists an optimal pathway from A to B via C. We note that finding maximum independent sets in circle graphs, which correspond to MFE structures, is in P13 . Could a generalized version of the algorithm proposed here be adapted for indirect pathways? Most properties exploited in the proofs are argued in terms of independent sets. 
Removing assumptions regarding the colourability of the graph would be a necessary first step. Finally, it remains possible that the indirect folding pathway problem (for a single strand14 ) is in P.  13 While it is the case that the pseudoknot-free RNA/DNA structure prediction problem is in P, it is important to separately note the complexity for the corresponding circle graph problem, since the former is a restriction of the latter. 14 We show in Chapter 5 that the indirect pathway problem is PSPACE-complete for multiple interacting strands.  63  Chapter 4  On recycling and its limits in molecular programs While the previous two chapters focused exclusively on finding minimum-energy barrier folding pathways, we now turn our attention towards designing folding pathways to perform space efficient deterministic computation. Specifically, our aim is to design a low energy barrier folding pathway that deterministically transitions through a number of unique structures exponential in the length of the nucleic acid strand(s). We do not know how to design such pathways with a single strand, but in this chapter we show how this goal can be achieved using a set of multiple interacting strands. In such a pathway, subsets of strands will bind and unbind to other subsets, forming and breaking new strand complexes, multiple times over the length of the pathway. Thus, various strands will be actively reused, or recycled during the course of the folding pathway. This chapter explores the limits of strand recycling in folding pathways and the molecular programs that leverage them. More generally, we also consider the concept of recycling molecules within Chemical Reaction Networks (CRNs). To our knowledge, we present the first example of a DNA Strand Displacement system (DSD) which significantly recycles strands. This also serves as the first example of a designed minimum energy barrier (indirect) folding pathway whose length is exponential in the combined length of participating strands. We also demonstrate a serious limit to recycling: recycling is not possible in deterministic CRNs, and their DSD realizations, when multiple copies of the initial state of the system are present in the same environment. In fact, we show that with just one extra copy of the initial signal molecules of a given CRN, it can perform at most a linear number of deterministic computation steps within a reaction volume of a closed system.  4.1  Introduction  We begin the chapter by illustrating the concept of recycling within molecular programs, discuss the benefits and possible limits, and give an overview of related work and our results in this context. The molecular programming models and terminology used throughout this and subsequent chapters are reviewed and Content from this chapter appears in the proceedings of the 17th Annual International Conference on DNA computing and molecular programming (DNA 2011) [25] and the Journal of the Royal Society: Interface Focus [26].  64  4.1. Introduction introduced in Section 1.2 of the introductory chapter. Moreover, definitions of common molecular programming terms can be found in the Glossary.  4.1.1  On the need for strand recycling  Our goal is to determine whether or not molecular programs that leverage folding pathways can perform deterministic computations that are also space efficient. If a molecular program consists of Θ(n) molecules, could it deterministically advance through Θ(2n ) unique states? 
To answer this question, we initially set out to design a molecular program that simulates an n-bit standard binary counter. Recall the 3-bit standard binary counter given in Example 1.2.1 of Chapter 1. The chemical reaction equations for the counter are given in Figure 1.6(a). The counter is designed to begin at count 000, advance to 001, and so on, until reaching the count 111. Indeed, the corresponding CRN, with initial signal multiset {03 , 02 , 01 }, simulates a logically reversible computation advancing correctly through the 23 unique states as illustrated in Figure 1.6(b). This counter can be generalized to n bits in the obvious way. It would seem that this simple example is sufficient to show that chemical reaction networks can perform deterministic computation exponential in their size (their largest signal multiset size, or number of reactions, for instance). Can we implement this CRN with a DSD?  Figure 4.1: To reach the end state, the standard binary counter must perform a sequence of reactions that always occur in the forward direction, thus requiring a new transformer for every reaction as they are not recycled. As discussed in Section 1.2.3, we do not know how to implement chemical reaction equations in a DSD without the use of transformers. (An example of a transformer implementing the chemical reaction equation 01 11 is given in Figure 1.8.) While a transformer implementing a particular chemical reaction equation can be used to effect both the forward and reverse of the reaction, it must strictly alternate between these directions. To capture this notion, we introduced the concept of tagged chemical reaction equations and formally defined tagged Chemical Reaction Networks (tagged CRNs) in Section 1.2.3. The tagged equations for the 3-bit counter are given in Figure 1.9. Consider that the standard binary counter always performs these reactions in one direction. This means that a new transformer is required at every reaction step. This 65  4.1. Introduction is illustrated in Figure 4.1. While the generalized n-bit counter does advance deterministically through 2n states, it would also require 2n − 1 transformers be present in the initial tag multiset when formally defined as a tagged CRN. The required space of a tagged CRN for an n bit standard binary counter is Θ(2n ). In general, the standard binary counter is not an example of space efficient deterministic computation that can be realized by a DSD. The lack of transformer reuse in the standard binary counter is representative of other DSD programs in the literature. While some do use reversible transformers, such as the example transformer of Figure 1.8, the intended computation does not actively exploit this property. Appropriately, transformers are often referred to as fuel . The term captures the problem well: should the same reaction need to occur multiple times in the future, additional copies of fuel are required. In a reaction volume of a closed system, all fuel necessary to complete a computation must be present initially to avoid fuel-depletion. Therefore, the reaction volume becomes polluted with inactive fuel strands referred to as waste. Active recycling of transformers could avoid these problems.  4.1.2  On the potential for strand recycling  0 1  n= 3  n=  n=  2  1  With the aim to avoid fuel depletion, waste, and to give an example of a spaceefficient DSD, we now propose an alternate counter based on the binary reflecting Gray code sequence [111]. 
The sequence is a Gray code as each successive value differs from the previous in exactly one bit position. It is called a binary reflecting Gray code due to its elegant recursive definition: the n-bit Gray code sequence is formed by reflecting the (n − 1)-bit sequence across a line, then prefixing values above the line with 0 and those below the line with 1. This is illustrated for n = 1, 2, 3 in Figure 4.2.  0 1 1 0  00 01 11 10  00 01 11 10 10 11 01 00  000 001 011 010 110 111 101 100  Figure 4.2: The 3-bit binary reflecting Gray code. The code for n digits can be formed by reflecting the code for n − 1 digits across a line, then prefixing each value above the line with 0 and those below the line with 1. Figure 4.3(a) gives the tagged chemical reactions for 3-bit version of this counter, which we call GRAY. The counter advances through application of the 66  4.1. Introduction (1) (2) (3)  Tf1 + 01  Tr1 + 11  Tf2 + 02  11  Tr2 + 12  Tf3 + 03  12 +01  Tr3 + 13  (a)  {03,02,11}  {03,12,01}  {13,12,11}  {13,02,01}  1-for  2-for  1-rev  3-for  1-for  2-rev  1-rev  1-rev  2-rev  1-for  3-rev  1-rev  2-for  1-for  {03,02,01}  {03,12,11}  {13,12,01}  {13,02,11}  (b)  Figure 4.3: (a) Tagged chemical reaction equations for a 3-bit binary reflecting Gray code counter. (b) The configuration graph of the computation performed by the 3-bit binary reflecting Gray code counter forms a chain and is logically reversible. The nodes represent the state of the computation and the edges are directed between states reachable by a single reaction. three reversible tagged chemical reaction equations (1-3) to produce the logically reversible computation chain shown in Figure 4.3(b). It is worth pointing out the correspondence between the tagged CRN and the recursive definition of the binary reflecting Gray code sequence. Consider the chain of Figure 4.3(b). The middle reaction (3-for) flips the third bit from a 0 to a 1 for the first and only time. To its left is the complete chain for the 2-bit sequence and to its right is the complete chain for the reverse of the 2-bit sequence. This can be seen in the left four chain nodes if one ignores the 03 bit in the signal sets. To create 13 +02 +01  Tr4 + 14 ) the 4-bit code, one additional reaction can be added (Tf4 + 04 which effectively results in the entire chain of Figure 4.3 forming the left half of the computation chain of the 4-bit sequence, and its reverse forming the right half, separated by a single reaction to flip the fourth bit. The key idea to achieving this reaction sequence is for the reaction that alters the nth bit to require as catalysts the last signal values in the (n − 1)-bit sequence. This ensures the new reaction does not proceed until the (n − 1)-bit sequence is complete. Moreover, by not consuming any of these signals, it forces the entire computation chain, until that point, to reverse. The reason for this is as follows. The new reaction has created a new signal, not previously seen (1n ). This creates a new state along the computation chain. Reversing the new reaction simply steps back in the computation chain; however, since the new state contains, as a proper subset, all signals at the end of the chain for the (n − 1)-bit sequence, then the last reaction of that chain can proceed in reverse. Once this occurs, the reaction for the nth bit cannot proceed in either direction as the necessary catalysts are no longer present. 
Moreover, since the (n − 1)-bit sequence formed a logically reversible chain, then all of its reactions will be reversed. This is recursively true for its left and its right sub-chain. This 67  4.1. Introduction is a powerful technique that is also exploited in our designs of Chapter 5. The key difference between the GRAY counter and the standard binary counter is that each particular reaction in GRAY occurs alternately in the forward and reverse direction, due to the recursive nature of the computation chain. This is illustrated in Figure 4.4. GRAY only requires a single copy of each transformer. For a 3-bit GRAY counter, the initial signal multiset is {03 , 02 , 01 } and the initial tag multiset is {Tf3 , Tf2 , Tf1 }. An n-bit GRAY counter has an initial signal multiset of size n and an initial tag multiset of size n. As GRAY is also a proper CRN, then by Lemma 2 it has space complexity 2n. In general, GRAY is an example of a space-efficient deterministic computation that can be realized as a DSD.  Figure 4.4: To reach the end state, the binary reflecting Gray code counter must perform a sequence of reactions that always alternate in the forward and reverse direction, thus requiring only one transformer for every reaction since they are actively recycled. In summary, recycling in DNA strand displacement systems offers the potential of supporting space-efficient DNA computations in which the number of strands required to complete a computation in a reaction volume of a closed system is logarithmic in the length of the computation. Systems that recycle strands, and more generally molecules, do not use fuel, i.e., large concentrations of certain transformer species that can bias reactions in one direction, and so are not prone to problems of fuel-depletion or waste. However, such advantages come at a price: as we will show, our counter proceeds somewhat more slowly than comparable fuel-driven strand displacement counters. The slowdown is due in part to the fact that reactions are used in both directions. Thus, our GRAY counter is not biased to advance towards the final state but rather performs an unbiased random walk on the logically reversible computation chain, both advancing and retreating, ultimately reaching the final state.  4.1.3  On the limits of strand recycling  Our n-bit GRAY counter advances correctly through 2n states because only single copies of initial signals are present. That is to say, the computation relies on the fact that certain signal molecules are absent during certain steps in the computation. This property cannot be guaranteed, for instance, if the 68  4.1. Introduction initial signal multiset were duplicated multiple times within the same reaction volume. Consider a reaction that consumes a signal present in the initial signal multiset. It is not necessarily true that the same reaction will next be repeated, multiple times, in order to consume all of the additional copies present due to the duplication of the initial signal multiset. As we will show, ensuring that a signal is consumed in all copies present in the same reaction volume is not generally possible. In Section 4.3 we show that if Θ(n) copies of the initial signal multiset are present, then the counter does not advance properly in a very strong sense: the final state of the counter can be reached in just O(n2 ) chemical reactions, rather than using the intended sequence of 2n reactions. 
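This failure mode is easy to see in a back-of-the-envelope simulation. The sketch below is our own illustration, using the n-bit generalization of the reactions in Figure 4.3(a) that is formalized in Section 4.2.1; it tracks the signal multiset when the initial signals are duplicated and fires each bit-flip reaction once, in order. With two or more copies present, the signals of the final counter state are all produced after only n reactions, consistent with the two-copy, 3-bit example worked through in Section 4.3; with a single copy, the same greedy schedule stalls after two reactions, since the design then forces the chain to reverse instead.

#include <iostream>
#include <map>
#include <string>

// Multiset of signal molecules, e.g. {"0_3": 2, "1_1": 1, ...}.
using Multiset = std::map<std::string, int>;

std::string sig(int bit, int i) { return std::to_string(bit) + "_" + std::to_string(i); }

// Apply the i-th bit-flip reaction in the forward direction if possible:
// consumes one 0_i, produces one 1_i, requires catalysts 1_{i-1}, 0_{i-2}, ..., 0_1.
bool applyForward(Multiset& pool, int i) {
    if (pool[sig(0, i)] < 1) return false;
    if (i >= 2 && pool[sig(1, i - 1)] < 1) return false;
    for (int j = 1; j <= i - 2; ++j)
        if (pool[sig(0, j)] < 1) return false;
    --pool[sig(0, i)];
    ++pool[sig(1, i)];
    return true;
}

int main() {
    const int n = 8;        // counter width
    const int copies = 2;   // copies of the initial signal multiset in one volume
    Multiset pool;
    for (int i = 1; i <= n; ++i) pool[sig(0, i)] = copies;

    // Greedy "crosstalk" schedule: fire reaction 1, 2, ..., n once each, forward only.
    int reactions = 0;
    for (int i = 1; i <= n; ++i)
        if (applyForward(pool, i)) ++reactions;

    // The final counter state 1_n 0_{n-1} ... 0_1 only needs these signals present:
    bool finalPresent = pool[sig(1, n)] >= 1;
    for (int j = 1; j <= n - 1; ++j)
        if (pool[sig(0, j)] < 1) finalPresent = false;

    std::cout << "reactions fired: " << reactions
              << ", final-state signals present: " << std::boolalpha << finalPresent << "\n";
    // With copies >= 2 this prints "reactions fired: 8, final-state signals present: true",
    // versus the 2^n - 1 = 255 reactions the single-copy counter is designed to take.
}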
This result applies more generally and shows limits on molecule recycling when multiple copies of the initial signal multiset are present, under some restrictions on the allowable CRNs. In particular, if the size of the initial signal multiset of such a CRN is logarithmic in the length of a valid computation, then the CRN can produce any signal in a polynomial number of steps, when a linear number of copies of the initial signal multiset are present. We give a stronger result for tagged CRNs by showing that they cannot perform a computation super-linear in the size of their initial tag multiset when two copies of the initial signal and tag multisets are present in the same reaction volume.  4.1.4  Related work  Qian et al. [98] showed how to simulate a stack machine using strand displacement systems. A binary counter can be implemented via a stack machine; we call such a counter a QSW (Qian-Soloviechik-Winfree) counter and we compare its properties and resources with our counters in Section 4.2.5. Their result performs logically reversible computation and can also use fuel to bias the computation toward the final state. We compare our results to a fuel-biased QSW counter as the unbiased version is slower—it performs an unbiased random walk along the computation chain similar to our result. We also assume that all fuel, or transformers, must be initially present in the reaction volume that we assume is a closed system. Building on models of Winfree and Rothemund [106, 141], Reif et al. [102] studied a tile-based graph assembly model in which tiles may both adhere to and be removed from a tile assembly. In their self-destructible graph assembly model, the removal of tiles allows for the possibility of tile reuse. The authors demonstrate that tile reuse is possible in an abstract tile model, via a PSPACEhardness result. Doty et al. [30] showed a negative result on tile reuse for an irreversible variant of the model of Reif et al. Kharam et al. [60] describe a DNA binary counter in which bit values are represented using relative concentrations of two molecule species. This is very different than our work, where the values of bits (0 and 1) are represented by the absence or presence of certain signal molecules. As their counter relies on concentrations of molecules, it cannot be space efficient as shown by our results in Section 4.3. 69  4.2. GRAY: a binary reflecting Gray code counter  4.2  GRAY: a binary reflecting Gray code counter  Here we describe the Chemical Reaction Network (CRN) and DNA Strand Displacement system (DSD) implementation of our GRAY counter, provide a proof of its correctness, and analyze its expected time and space usage. We show how it can be modified to use only bi-molecular reactions, resulting in our fixed-order GRAY counter: GRAY-FO.  4.2.1  Chemical reaction network for the GRAY counter  We generalize the 3-bit GRAY counter in Section 4.1.2 to n bits. The counter state is represented by n signal molecules, one per bit. Presence of signal molecule bi denotes that the ith bit has value bi , for b = 0 or b = 1. Initially, the state is 0n . . . 02 01 . Each possible state of the counter represents a value in the Gray code sequence. The counter is described abstractly by the following chemical reaction equations: (gc-1) (gc-i)  01 0i  11 1i−1 +0i−2 +···+01  1i , 2 ≤ i ≤ n  All CRNs we propose are tagged Chemical Reaction Networks (tagged CRNs) and therefore account for the space of required transformers. 
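Before turning to the strand-level realization, the abstract scheme above can be sanity-checked directly. The C++ sketch below (our own code, with our own names; transformer tags are not modelled) represents the single-copy counter state as a bit vector, enumerates which of the reactions (gc-1), (gc-i) are applicable in which direction, and verifies the behaviour claimed in Lemma 13 below: excluding the immediate reversal of the previous reaction, exactly one reaction is applicable at each step, so the counter walks through all 2^n states of the Gray code sequence.

#include <cassert>
#include <iostream>
#include <set>
#include <vector>

// State of the single-copy counter: bits[i] is the current value of bit i+1.
using State = std::vector<int>;

// A move is reaction (gc-i) applied forward (0_i -> 1_i) or in reverse.
struct Move { int i; bool forward; };

// Reaction (gc-i) requires catalysts 1_{i-1}, 0_{i-2}, ..., 0_1; (gc-1) has none.
bool catalystsPresent(const State& s, int i) {
    if (i == 1) return true;
    if (s[i - 2] != 1) return false;              // needs 1_{i-1}
    for (int j = 1; j <= i - 2; ++j)
        if (s[j - 1] != 0) return false;          // needs 0_{i-2}, ..., 0_1
    return true;
}

std::vector<Move> applicable(const State& s) {
    std::vector<Move> out;
    for (int i = 1; i <= (int)s.size(); ++i)
        if (catalystsPresent(s, i))
            out.push_back({i, s[i - 1] == 0});    // forward iff bit i is currently 0
    return out;
}

int main() {
    const int n = 4;
    State s(n, 0);                                 // initial state 0_n ... 0_1
    std::set<State> visited{s};
    Move last{0, true};                            // sentinel: no previous reaction
    long steps = 0;
    while (true) {
        std::vector<Move> next;
        for (Move m : applicable(s))
            if (!(m.i == last.i && m.forward != last.forward))  // never undo the last step
                next.push_back(m);
        if (next.empty()) break;                   // end of the computation chain
        assert(next.size() == 1);                  // deterministic, as in Lemma 13
        last = next[0];
        s[last.i - 1] ^= 1;                        // apply the reaction: flip bit i
        assert(visited.insert(s).second);          // every state reached is new
        ++steps;
    }
    std::cout << "visited " << visited.size() << " states in "
              << steps << " reactions\n";          // 16 states, 15 reactions for n = 4
}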
However, to simplify the reaction equations and since each reaction has a unique transformer in our various implementations, we will omit the actual tag symbols. Lemma 13. The above CRN ensures that the n-bit GRAY counter correctly advances through the 2n states of the binary reflecting Gray code sequence, if each reaction is atomic15 and all initial signal molecules exist as single copies. Furthermore, to advance through all counter states, each reaction is applied alternately in the forward and reverse direction. Proof. Proof by induction on the number of digits. The claims are vacuously true for n = 0. Assume they are true for a counter with i − 1 bits and consider the construction of the i-bit counter. The initial signal multiset is the same as the (i − 1)-bit counter, except it contains the signal 0i . It also contains one additional reaction, to flip the ith bit. However, this reaction cannot occur for the first time until the signals 1i−1 , 0i−2 , . . . , 01 are present as catalysts. These are exactly the signals for the last state of the (i−1)-bit counter. Thus, since the (i − 1)-bit counter is correct by the induction hypothesis, the first 2i−1 reactions are exactly those of the entire (i − 1)-bit counter sequence. The ith reaction can then be applied, otherwise the computation reverses to a previous state. 15 “Atomic” is standard computer science terminology for something that occurs as if all at once, hearkening back to the original Greek etymology of an atom as an indivisible unit. Reasoning about chemical reactions as computational processes can unfortunately result in clashes in terminology.  70  4.2. GRAY: a binary reflecting Gray code counter Since the ith reaction does not consume any signals for bits less than i, the entire reaction chain of the (i − 1)-bit counter is reversed (as it does not interact with the ith bit), otherwise the computation would reverse to a previous state. Importantly, the 0i signal was consumed and since it was present as a single copy, the ith reaction cannot be applied again once the (i − 1)-bit reaction chain has begun reversing. Since the reactions alternated in the forward and reverse direction in the (i − 1)-bit counter, they continue the alternation when the chain is reversed as the first reaction after the ith bit is flipped is the reverse of the last reaction prior to flipping the ith bit. Overall, the i-bit counter correctly advances through 2(2i−1 ) = 2i states and alternates the direction of reactions when signal molecules are present as single copies.  4.2.2  DNA strand displacement implementation of the GRAY counter  Recall from Section 1.2.5 and Theorem 1 that the QSW construction is capable of simulating any tagged CRN by a space efficient DSD. Unfortunately, the construction does not simulate the higher-order reactions atomically, since some product signal molecules can initiate other reactions before all product signal molecules are produced. However, the toehold mediated strand displacements do occur in a fixed order and all reactant signal molecules are consumed before any product signal molecule is produced. We exploit this fact to simulate atomicity. In particular, we borrow the concept of transactions from database and concurrency theory — a group of operations that either completes or does not complete in its entirety, and does not interfere with any other transaction. We implement transactions using a simple synchronization primitive: a mutex. 
A transaction must acquire the mutex in order to start, and releases it only when it completes. This is analogous to processes blocking when another process is in a critical section (which by definition must appear atomic). We consider the state of our counter to be defined only when the mutex is available. More precisely, let µ denote a single copy of a signal molecule species representing the mutex. In any sequence of strand displacements representing a chemical reaction, µ is the first reactant to be consumed and the last product to be produced. Therefore only one chemical reaction (transaction) can be in progress at any given time. When µ is next available, either all strand displacements in the sequence took place and the counter is in a new state—the transaction succeeded—or the counter is in the same state and the configuration of all molecules is exactly the same prior to the reaction beginning—the transaction failed. Since each reaction is implemented as a transaction, it appears atomic and cannot interfere with other reactions. An example of the signal molecules and the transformer associated with the forward direction of the reaction 01 11 which requires the availability of the mutex signal µ is given in Figure 4.5. Contrast this with the implementation of the same reaction that does not use a mutex signal in Figure 1.8. As previously discussed, the reaction can only initiate if the signal molecule 71  4.2. GRAY: a binary reflecting Gray code counter −  01  −  11  −  µ  + ∗  µ  +  01  + ∗ 01  −  11  − ∗ 11  −  +  11  +  01  +  µ  +  µ  µ  µ  − +  µ  − ∗  µ  Figure 4.5: An example of signal molecules (top two left strands) and the transformer, consisting of auxiliary strands (top two right strands) and a saturated template strand (bottom complex) associated with the forward direction of reaction equation 01 11 which requires a mutex. In this and later figures, the Watson-Crick complement of a domain x is denoted by x∗ . µ is present, and can only complete if all other reactants—in this case 01 , assuming a forward reaction—are available. An example of the sequence of strand displacements for the reaction 01 11 is given in Figure 4.6. The reaction proceeds from top to bottom in the forward direction and from bottom to top in the backwards direction. The transformers that implement the ith reaction (gc-i) are a straightforward generalization of the first reaction. As before, the signal molecule µ must initiate the first strand displacement, and is not produced until the last strand displacement. The number of required intermediate strand displacement reactions is dependent on the number of reactants and products. Specifically, the ith reaction requires 2i + 2 strand displacements to complete. An example of the transformer for the ith reaction is given in Figure 4.7. We now formally prove that the general DSD construction of Theorem 1 can be augmented with a mutex to ensure that all reactions occur as transactions: a sequence of toehold mediated strand displacements, the first of which consumes a mutex strand and the last of which produces a mutex strand, while no other no other displacement of the sequence produces a mutex strand. Our primary motivation in this CRN to DSD conversion is to ensure that (i) all reactions occur as transactions, and (ii) that the space complexity of the resulting DSD remains polynomially bounded by the space complexity of the original CRN. However, we note that this construction inherently enforces serial computation. 
That is, only a single reaction can occur at any one time. By design, our CRNs are meant to operate in this manner. However, this conversion could slow down other CRNs, designed to perform many parallel reactions, by a factor of O(v) in the worst case, where v is the size of the reaction volume. Theorem 12. Any logically reversible tagged CRN requiring O(s) space can be simulated by a DSD in O(poly(s)) space, while ensuring that all chemical reactions occur as transactions (and therefore appear atomic) assuming all strand displacements are legal. Proof. Let C = S, R, S0 , send , T, T0 be any logically reversible tagged CRN 72  −  + ∗  −  11  − ∗ 11  µ  − ∗  µ  01  −  11  − ∗ 11  −  µ  − ∗  µ  01  µ  + ∗  01  01  −  + ∗ 01  − ∗ 11  11  −  −  11  µ  µ  − ∗  µ  +  11  µ  +  +  01  µ  +  −  − +  11  + ∗ 01  µ  −  11  − ∗ 11  −  µ  − ∗  µ  µ  −  µ  01  +  +  µ  01  µ  µ  +  −  − +  + ∗ −  µ  µ  11  01  + ∗ 01  −  +  +  µ  +  µ  11  − +  µ  −  µ  +  + ∗ +  −  µ  +  µ  + ∗ 01  µ  +  01  +  +  µ  +  +  11  4.2. GRAY: a binary reflecting Gray code counter  µ  −  −  01  µ  +  µ  + ∗  µ  +  01  + ∗ 01  −  11  − ∗ 11  −  µ  − ∗  µ  Figure 4.6: The sequence of strand displacement events for the reaction equation 01 11 when a mutex signal µ is required. The mutex is the first signal to be consumed and the last to be produced, in either reaction direction. Otherwise, the reaction cascade proceeds exactly as before as dictated by the QSW construction. requiring O(s) space. We create an augmented logically reversible tagged CRN of C called C as follows. We add the mutex signal species µ to S and one signal molecule of µ to S0 . For each reaction equation R = (I, P ) ∈ R, we add µ to both I and P . Note that we have only increased the number of reactants and products of a reaction by a constant and have only added a constant number of new signal molecules to the initial signal multiset. We construct a DSD D of C using the QSW construction of Theorem 1 establishing most of the claim. All that remains is to show that the addition of the mutex signal forces each strand displacement cascade to occur as a transaction. We argue by induction on the sequence of chemical reactions of the original CRN C. Since C is logically reversible, then there is only one valid sequence of chemical reactions. Prior to any displacement simulating a chemical reaction, we will ensure the following invariant holds: (i) all template strands of all 73  4.2. GRAY: a binary reflecting Gray code counter  + ∗  µ  + ∗ 0i  + ∗ 1i−1  + ∗ 0i−2  0i−2  −  01  −  µ  µ  µ  01  −  +  −1  1i−1  −  01  +  01  0i  −  1i  1i  +  −  ...  +  ...  −2  −  ... 1i  01  0i−2  +  0i−2  +  +  +  01  0i−2  −  +  −  1i−1  2  1i−1  1i−1  +  0 i−  +  −  −  0i  0i  1  1i  +  +  µ  1 i−  µ  0i  µ −  +  −  −  − +  ... + ∗ 01  − ∗ 1i  − ∗ 1i−1  − ∗ 0i−2  − ∗ 01  − ∗  µ  Figure 4.7: An example of the signal molecules and the transformer molecules for the ith reaction. The counter is in state bn . . . bi+1 0i 1i−1 0i−2 . . . 01 . transformers are saturated and require the mutex signal molecule µ to initiate the first strand displacement, and (ii) there is exactly one available copy of µ. The invariant is trivially satisfied for the base case, when no reaction has yet occurred. Suppose the first i − 1 reactions appear atomic, and the invariant is satisfied. Without loss of generality, suppose the next attempted reaction involves the k th transformer. 
Because we assume that all strand displacements are legal, no auxiliary strand or signal strand that is not µ can displace any strand in any transformer. Since there is exactly one available copy of the mutex signal species µ, that strand alone can initiate a reaction. Suppose the reaction is in the forward direction, as the reverse direction is symmetric. The signal molecule µ must initiate the first strand displacement by binding to the left end of the k th transformer’s template strand. This begins the transaction. Note that there is another copy of µ sequestered at the right end of the template. When the signal strand µ is once again produced, there are two cases to consider. Case 1. If the copy on the right end of the transformer is released, then the transaction succeeded. It now appears that all input strands—the reactants—have been consumed, and all output strands—the products—have been produced. Furthermore, the invariant is preserved as (i) the k th transformer is saturated, and only a signal strand µ can initiate a new reaction on the right end of the template, and (ii) exactly one signal strand µ was produced as the final strand displacement. Case 2. Otherwise, the original copy of µ was released, the transaction failed, and the system is in the same state as before the reaction had begun, satisfying the invariant, as any intermediate strand displacements must have been reversed prior to the original µ signal strand being released. Importantly, whether or not a transaction succeeds, while one is in progress no other reaction can be initiated since no other copy of the signal strand µ is available. Thus, all reactions are implemented as transactions and appear atomic.  74  4.2. GRAY: a binary reflecting Gray code counter  4.2.3  Space and expected time analysis of the GRAY counter  Here we analyze the space—the total number of nucleotide bases of all required strands in the reaction volume—and the expected time of the GRAY counter as it advances from initial to final states. We assume single copies of the initial signal, transformer, and mutex strands. Importantly, we note that since reactions occur alternately in the forward and reverse direction according to Lemma 13, then only a single copy of each reaction transformer is necessary. We note that our space analysis carries an assumption: Θ(n) domains of the signal species can be designed to have length Θ(n) such that only legal displacements occur for the duration of the counter. This seems to be a reasonable assumption one can make when considering existing results from coding theory. Schulman and Zukerman [114] show how to construct a set of 2Θ(n) domains (i.e., binary strings in their code) of equal length Θ(n) such that the energy barrier (Levenshtein distance) between any pair of domains is at least cn, for any given constant c. However, we note that while a long domain of length Θ(n) may be sufficient to avoid illegal displacements, it may not be necessary. It may be the case that this bound is loose, and domains of length Θ(log n) are sufficient. Lemma 14. Assuming long domains have length Θ(n), the total number of nucleotide bases needed for a single copy of each initial signal, transformer, and mutex species of the n-bit GRAY counter is Θ(n3 ). Proof. Each signal strand 0i and the initial mutex strand µ is composed of a toehold and two long domains. The same is true of the strands for states 1i and the sequestered signal strands µ that are part of the initial transformer species. 
There are auxiliary transformer strands consisting of one toehold and one long domain for each type of signal species. We choose the toehold length to be Θ(1). Since the domain length dominates the toehold length, the total number of bases in all signal species and auxiliary strands is Θ(n2 ). The template strands for the ith transformer have Θ(i) domains, which dominate their length, and therefore have length Θ(in). Thus, the total number n of bases in all transformer template strands in the system is i=1 Θ(in) = Θ(n3 ). Next consider the expected time for the counter to progress from its initial to its final state. Other than introducing the concept in Chapter 1, we have thus far ignored the rate of reactions in a chemical reaction network16 . Briefly, in the DSD implementations of all networks proposed in this thesis, reactions always occur between two species present as single copies in the reaction volume. If the reaction volume has size V , the bimolecular reaction rate involving these two single copy species (i.e., the time to find each other and interact) is 1/V . Lemma 15. Assuming a single copy of each initial signal, transformer, and mutex species, and that all strand displacements are legal and all reactions 16 For a detailed overview of chemical reaction rates, particularly for strand displacement systems, the reader is referred to the PhD thesis of David Yu Zhang [149].  75  4.2. GRAY: a binary reflecting Gray code counter occur as transactions (appear atomic), the GRAY counter advances through the 2n states of the binary reflecting Gray code sequence in Θ(n3 22n ) expected time. Proof. We assume that reactions occur in a volume of size Θ(n3 ), since this is the total number of nucleotides required to represent all strands of the system. Each strand displacement step involves interaction between two strand species and thus the rate of each strand displacement step is 1/Θ(n3 ). First, consider the shortest path from the initial state to the final state. On this path, each order-i reaction is applied 2n−i times and involves Θ(i) strand displacements. Thus the total number of strand displacement steps along the n shortest path is i=1 Θ(i)2n−i = Θ(2n ). Because each reaction is reversible, the system does not strictly follow the shortest path but rather proceeds as an unbiased random walk along the logically reversible computation chain. The expected number of steps for a random walk to reach one end of a length-Θ(2n ) path from the other is Θ((2n )2 ) = Θ(22n ) [32]. Therefore, the expected number of strand displacement steps is Θ(22n ). Since each strand displacement step occurs at a rate of 1/Θ(n3 ), the overall expected time is Θ(n3 22n ). Note that the expected time is polynomial in the Ω(2n ) steps required to proceed through all 2n unique states of an n-bit binary reflecting Gray code counter. Combining Lemmas 13 through 15 and Theorem 12 we have the following result. Theorem 13. An n-bit binary reflecting Gray code counter can be implemented as a DNA strand displacement system that proceeds through the 2n unique states of the binary reflecting Gray code sequence in Θ(n3 22n ) expected time and uses only Θ(n3 ) nucleotides (space).  4.2.4  A fixed order implementation of the GRAY counter  An n-digit GRAY counter can perform a computation having length exponential in n, while only using space polynomial in n. However, it relied on template strands containing O(n) domains, each of length O(n), resulting in an overall length of O(n2 ) nucleotides. 
Synthesis of long nucleic acid strands is challenging, and the fidelity of synthesized strands generally decreases as sequence length increases. For this reason, it is desirable to bound the length of all strands in the system to O(n) bases. We now briefly describe how a template strand from the GRAY counter consisting of 2i + 2 domains, can be split into i + 1 template strands requiring 4 domains each, for any i > 1. The overall space will only be increased by a constant, resulting in the same volume, and thus the same expected time. To simplify the description we introduce some notation. Consider the (gc-i) reaction of the GRAY counter which has, including catalysts, i reactants and i products: 1i−1 +0i−2 +···+01 0i 1i 76  4.2. GRAY: a binary reflecting Gray code counter Theorem 12 demonstrated that by using the QSW construction and introducing a mutex species µ—thus creating an order i + 1 reaction—chemical reactions occur as transactions and therefore appear atomic. Specifically to our counter, µ is first consumed, then 0i , then, 1i−1 , and so on. Likewise, after all reactants are consumed, 1i is first produced, then 1i−1 , and so on, until finally µ is produced. We denote a strand displacement implementation supporting a transaction of this type, which is initiated by consuming a mutex α, and terminated when producing a mutex β, by: [α + 0i  1i−1 +0i−2 +···+01  1i + β]  In the case of the GRAY counter, α = β = µ. Our goal is to convert this order i + 1 reaction into a cascade of i + 1 bi-molecular reactions, while preserving the appearance of atomicity. Using the above notation, we implement the following reaction cascade:  [  µ  [  αi1  1i−1 0i−2  αi1  ]  αi2  ]           αii−1  ]          1i  ]     µ  ]    .. . [  αii−2  [  0i  [  αii−1  01  αii−1 1i  catalysts checked in sequence  i-th bit flipped and µ released  The overall transaction has been split into a cascade of sub-transactions. Each sub-transaction is implemented as a bi-molecular reaction using Theorem 12 (based on the QSW construction). The first i−1 sub-transactions check, in sequence, that all i − 1 catalysts are present. The mutex signal molecule µ is consumed during the first check. The last two transactions will first perform the bit flip and then release the mutex signal molecule. Every sub-transaction, except the last two, produces a unique mutex signal species that is required to initiate the next sub-transaction in the cascade. Upon successful completion of the first i sub-transactions in the cascade, the final sub-transaction occurs, producing the original mutex signal species µ, and thus finalizing the overall transaction. Note that in all cases, once the transaction has begun and before it completes, the original mutex signal µ is absent, and therefore no other reaction cascade can commence. The implementation works in the reverse direction in a similar way with the exception that once the original mutex signal µ is consumed, the bit is flipped first and the mutex signal strand µ is released only after the presence of all catalysts have been verified. Note that as in the previous case, as the mutex strand µ is missing until either the entire transaction completes, or reverses, no other reaction in the system can occur. Thus, flipping the bit prior 77  4.2. GRAY: a binary reflecting Gray code counter to verifying all catalysts are present does not affect the correctness of the system. 
In the case that not all catalysts are present, the transaction cascade cannot complete and will necessarily reverse. Using the above transformation for all higher-order reactions in the original GRAY counter implementation results in a new, fixed-order counter, GRAY-FO.

4.2.5  Comparison with another molecular counter

Table 4.1 summarizes properties of our counters and compares them with another counter, which we call QSW, based on work of Qian et al. [98] (see Section 4.1.4). The properties considered are (i) order, the maximum number of reactants or products of the chemical reactions that describe the counter, (ii) space, the total number of nucleotides needed to implement the counter, and (iii) expected time for the counter to reach a designated final state from its initial state when the volume equals the space. We describe how order, space and expected time grow as a function of n, the number of counter bits. In all cases, we make the assumption that the length, in bases, of long domains is Θ(n).

First, consider the QSW counter. Qian et al. [98] showed how to simulate a stack machine using strand displacement systems. A binary counter could be implemented via a stack machine. An n-bit implementation of the QSW counter advances deterministically through 2^n states and uses reactions of order 2 (some of which involve polymer extension reactions that realize the stack). The transformer molecules used in the strand displacement realizations of these reactions can serve as fuel, biasing the reactions so that the counter advances. We analyze the biased version of the counter; the unbiased version is slower. The expected number of reactions for the biased counter to advance to its final state is Θ(2^n). Each reaction consumes a constant number of molecules and so the overall expected consumption, or waste, is Θ(2^n). The expected time depends on the volume in which the reactions take place. If all strands consumed are initially present in the reaction volume, then the volume is Θ(2^n) and thus each step takes expected time Θ(2^n), leading to an overall expected time of Θ(2^{2n}).

Our n-bit binary reflecting Gray code counter, GRAY, uses reactions of maximum order Θ(n), generates only Θ(n^3) waste and uses expected time Θ(n^3 2^{2n}) to reach the final state. Our GRAY-FO counter improves on the GRAY counter in that the reaction order is Θ(1). The QSW counter also has reaction order Θ(1) and has expected time Θ(2^{2n}), which is somewhat better than the expected time needed by our counters. However, the QSW counter generates Θ(2^n) waste, exponentially worse than our counters. All three counters are deterministic in that they advance and retreat through a predetermined linear ordering of states (i.e., they are logically reversible).

Properties                GRAY             GRAY-FO          QSW [98]
Reaction order            Θ(n)             Θ(1)             Θ(1)
Space (in nucleotides)    Θ(n^3)           Θ(n^3)           Θ(2^n)
Expected time             Θ(n^3 2^{2n})    Θ(n^3 2^{2n})    Θ(2^{2n})

Table 4.1: Comparison of n-bit counter implementations. The GRAY and GRAY-FO counters described in this section are compared with the QSW counter, which is based on the simulation of stack machines by strand displacement reactions of Qian et al. [98].
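The expected-time entries of Table 4.1 rest on a standard fact about unbiased random walks: reaching one end of a length-L chain from the other takes Θ(L^2) expected steps. The small Monte Carlo check below (purely illustrative, with parameters chosen arbitrarily) estimates this hitting time for a walk that is reflected at the start of the chain—as the counter is at its initial state, where only the forward reaction applies—for which the exact expectation is L^2. Reaction rates and volume, which contribute the additional Θ(n^3) factor in the table, are ignored.

#include <iostream>
#include <random>

// Expected number of steps for an unbiased walk on {0, 1, ..., L}, started at 0
// and reflected at 0 (from 0 the only move is to 1), to first reach L.
// The exact value is L * L; the simulation below estimates it.
int main() {
    const long L = 255;       // chain length, e.g. the 2^n - 1 reactions of an 8-bit counter chain
    const int trials = 1000;
    std::mt19937_64 rng(12345);
    std::bernoulli_distribution coin(0.5);

    double total = 0;
    for (int t = 0; t < trials; ++t) {
        long pos = 0, steps = 0;
        while (pos < L) {
            pos += (pos == 0 || coin(rng)) ? 1 : -1;   // reflect at 0, otherwise an unbiased +-1 step
            ++steps;
        }
        total += steps;
    }
    std::cout << "estimated E[steps] = " << total / trials
              << "  (exact value " << L * L << ")\n";
}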
4.3  Limits on molecule recycling in chemical reaction networks  In this section, we show that all chemical reaction networks that efficiently recycle strands, or that can perform useful computations for a number of steps that significantly exceeds the number of signal molecules, are not deterministic when multiple copies of the initial signal molecules are present. The underlying problem is the representation of the state of the network as specific combinations of signal molecules. If there are multiple copies of the network in the same reaction volume — as would typically occur in a laboratory setting17 — then the states of the different copies may interfere with one another — a process we call transaction. To illustrate this point, we again consider the 3-bit GRAY counter. Initially, in a single copy of the construction, the signal molecules {03 , 02 , 01 } denote the state 03 02 01 . Consider a two-copy network where the initial multiset of present signal molecules is duplicated, yielding the multiset {03 , 03 , 02 , 02 , 01 , 01 }. (We also assume a duplicate multiset of transformers is available.) As in the single copy case, assume reaction (1) occurs in the forward direction, followed by reaction (2) in the forward direction. The resulting multiset of signal molecules is {03 , 03 , 02 , 12 , 01 , 11 }. In the single copy case, we intend that reaction (1) in the reverse direction will occur next; however, given the current multiset of present signal molecules in the two-copy case, reaction (3) in the forward direction could instead occur, resulting in the multiset {03 , 13 , 02 , 12 , 01 , 11 }. At this point, a copy of every signal molecule is present, and any reaction can occur, in either direction. Furthermore, the single copy case required at least seven reactions to produce the final state 13 02 01 , whereas the two-copy case can reach it in three. Crosstalk between the copies has broken the counter. Recall the formal definition of a CRN C = S, R, S0 , send and related concepts from Section 1.2.2. In addition, we use Bs to denote the bandwidth of signal species s, i.e., the maximum number of copies of s that appears in a multiset I of any reaction (I, P ) ∈ R, and we use BC to denote the bandwidth 17 We note that while significantly more challenging, single molecule experiments are possible. This is discussed in more detail in Chapter 6.  79  4.3. Limits on molecule recycling in chemical reaction networks of C, i.e., the sum of bandwidths of all signal species in S. An x-copy version of C, denoted C(x) , is obtained by duplicating the initial multiset S0 x times, (x) (x) i.e., C(x) = S, R, S0 , send where S0 is a multiset consisting of x copies of S0 . Theorem 14. Let C = S, R, S0 , send be a 1-proper chemical reaction network. If there exists a trace that produces send in C then for the x-copy chemical reaction network C(x) with x ≥ BC +1, there exists a computation that produces send in at most (BC + 1)BC /2 + 1 steps. Proof. Let ρ = R1 , . . . , Rm be a trace for a computation that produces send in the last step in the (single-copy) network C and let S0 , . . . , Sm be the corresponding sequence of multisets of signal molecules. Let S be the multiset of signal molecules obtained by including ws copies of each s ∈ S, where ws ≥ 0 is the maximum number of copies of signal molecule s that appears in a multiset I of any reaction (I, P ) of the sequence ρ. Note that |S | ≤ BC . Let k = |S − S0 | and note that k ≤ BC . 
The goal is to produce all signal molecules in the multiset S − S0 , so that we can apply the last reaction of ρ and produce send . We construct a trace of the appropriate length for the multi-copy network from the trace ρ for the single-copy network. The high-level structure of the proof is as follows: First, we project out from ρ the k reactions, in order, that first produce each of the molecules in the multiset S − S0 . From that sequence, we build a trace of the multi-copy network that is the concatenation of k phases. Each phase adds one more signal molecules to the multiset of signal molecules present, preserves the presence of all signal molecules previously produced, and “consumes” one copy of the initial signal molecules in S0 . We will show that the j-th phase is at most j reactions long, so the total length of the trace producing k S − S0 is bounded by j=1 j = (k + 1)k/2 ≤ (BC + 1)BC /2. We now formalize the construction of the k phases. Define the first appearance of the c-th copy of signal molecule s to be in Si if there are at least c copies of s in multiset Si and less than c copies of s in each of S0 , S1 , . . . , Si−1 . Let s1 , . . . , sk be the sequence of signal molecules (with multiplicities) from S − S0 in order of their first appearances in S1 , . . . , Sm and let Rindex(sj ) be the reaction in ρ which first produced this copy of sj . In other words, Rindex(sj ) is the reaction that produced the first appearance of sj (where sj is the c-th copy of some signal molecule s, for some c). The k phases will produce the signal molecules in S − S0 exactly in this order: signal molecule sj will be produced in phase j. Each phase j will consist of several reactions, numbering 0 to j, which will produce sj without removing any other signal molecule from multiset S − S0 , but they can remove one signal molecule from S0 . This is replenished by adding one new copy of S0 at the beginning of this phase. To find the sequence of these reactions, we will work backwards. Assuming sj is not already present in the current multiset of signal molecules, we use reaction Rindex(sj ) to produce sj . As a result, we might have removed one of the signal molecules s1 , . . . , sj−1 , say si . If that is the case, we repeat the process of producing signal molecule si , i.e.,, repeating reactions of phase i. 80  4.3. Limits on molecule recycling in chemical reaction networks The k phases are constructed to maintain three invariants: 1. After the j-th phase, the multiset of signal molecules contains the multiset {s1 , . . . , sj }. 2. The trace constructed so far has not relied on the existence of more than j copies of the initial signal molecules S0 . 3. For each i ≤ j, the i-th phase have used at most i reactions. The invariants are vacuously true initially (before any phases). Assuming they are true after j − 1 phases, we construct the j-th phase as follows. If sj is already present in the current multiset of molecules, we do nothing; it was fortuitously produced in an earlier phase. Otherwise, the first reaction in the phase is Rindex(sj ) , the reaction that produced sj for the first time. We know this reaction can be applied because all of {s1 , . . . , sj−1 } are available, as well as the j-th copy of S0 . This guarantees that the multiset now contains sj , and we have relied on only j copies of S0 . However, since the network is 1-proper, the reaction consumed at most 1 input signal molecule. 
If the reaction consumed 0 molecules, or if the 1 molecule consumed is in S0, the invariant is maintained and the phase ends. Otherwise, the reaction consumed some si, where i < j. To restore si to the multiset, we repeat the sequence of reactions of the i-th phase. Note that this is valid since the new copy of S0 was not yet used and all signal molecules required for phase i are still present. The number of reactions of phase j can be bounded by the number of reactions of phase i plus one. By the induction invariant, the i-th phase requires at most i reactions and, since i < j, the j-th phase requires at most j reactions.

Concatenating the k phases produces a trace for the k-copy chemical reaction network C(k), which produces all of {s1, . . . , sk} within (k + 1)k/2 reactions. If send is not in S − S0 = {s1, . . . , sk}, then S contains all inputs needed for the last reaction in ρ that produces send. Thus, to produce send we might need one additional copy of the initial multiset and one additional step. Since k ≤ BC, the result follows.

Note that Theorem 14 is much stronger than our intuitive notion of crosstalk short-circuiting a computation. It states that with only a linear number of copies, any signal molecule can be produced by a computation of at most quadratic length. Although it applies only to 1-proper networks, it is sufficient to show that the GRAY counter does not work correctly when enough copies are present. Furthermore, since there is a direct transformation between GRAY-FO and GRAY, it also demonstrates that our GRAY-FO counter is not robust in the multi-copy setting.

We can formalize the intuitive notion of short-circuiting. A network C is x-copy-tolerant if, for all s ∈ S, the length of the shortest trace to produce s in C and in C(x) is the same. A network is copy-tolerant if it is x-copy-tolerant for all x. The result of Theorem 14 only pertains to a restricted class of chemical reaction networks. This raises the question of whether it is possible to design a copy-tolerant network that is not 1-proper and that performs a computation exponential in its size. As we are only interested in deterministic chemical reaction networks that properly account for space when representing DSDs (i.e., tagged chemical reaction networks), we can achieve a tighter bound by focusing on the specific class of networks we study in this thesis.

Theorem 15. For any tagged chemical reaction network C = S, R, S0, send, T, T0, if there is a deterministic computation that produces send in t > |T0| steps, then the 2-copy network C(2) is not deterministic after 2|T0| − 2 steps.

Proof. Let ρ = R1, R2, . . . , Rt be the deterministic trace of reactions from the original network C, which induces a corresponding sequence of multisets S0, S1, . . . , St, with send ∈ St. For convenience, we let Rk denote the reaction applied at step k, for 1 ≤ k ≤ t. Since reaction tags are counted in the initial tag multiset T0, there must exist some reaction, previously applied in the forward direction18 at step i, that is the first to be applied in the reverse direction at some later step j ≤ |T0| + 1. Otherwise, all the tags would be consumed and the computation would halt within |T0| steps. Let Ri = (Ii, Pi) denote the forward version of this reaction, which consumes the multiset X = Ii − Pi and produces the multiset Y = Pi − Ii. Thus, Rj consumes Y and produces X. Note that Pi ⊆ Si. Next, consider whether it is possible that i = j − 1.
Suppose it were. The multiset of signals prior to applying Rj−1 is Sj−2 , and it is Sj−2 − X + Y afterward, if i = j − 1. Next applying reaction Rj , which consumes Y and produces X by definition, results in the multiset Sj−2 . The computation is now stuck in a length-2 cycle and has only advanced to at most j−1 = |T0 | new states. Therefore, Rj−1 = (Ij−1 , Pj−1 ) is not the reverse of Rj . Note that Ij−1 ⊆ Sj−2 . Now, we will construct a trace ρ for the network C(2) . Let the first j − 2 reactions be the same as in ρ. Thus, the resulting multiset after j − 2 steps is Sj−2 = Sj−2 + S0 ; otherwise, the original trace ρ is not valid. Next, append the first i reactions of ρ. By the same reasoning, this will result in the multiset Sj−2+i = Sj−2 + Si . Since Ij−1 ⊆ Sj−2 ⊆ Sj−2+i then reaction Rj−1 can be applied. Since Pi ⊆ Si ⊆ Sj−2+i then reaction Rj , the reverse of Ri , can be applied. Since one is not the reverse of the other, the computation is not deterministic after step j − 2 + i ≤ 2j − 4 ≤ 2|T0 | − 2. The result of Theorem 15 states that no tagged chemical reaction network will be deterministic for more than a linear number of steps when a second copy of the initial signal and tag multisets are present. This is formally stated in the following corollary. Corollary 1. It is not possible to design a tagged chemical reaction network that performs deterministic computation within a number of steps that is superlinear in the size of the initial tag multiset and that is also 2-copy-tolerant. 18 Without loss of generality, we can assume all reactions in a chemical reaction network are always applied first in the forward direction.  82  4.4. Chapter summary  4.4  Chapter summary  In this chapter we have introduced the concept of recycling, or molecule reuse, in strand displacement systems and chemical reaction networks. Our n-bit GRAY counters effectively use recycling to deterministically step through 2n states while requiring space, or total number of nucleotides, of just O(n3 ). Our GRAY counter strand displacement constructions also introduce the use of a mutex strand to ensure that higher-level chemical reactions are executed atomically. Finally, we show limits to recycling: for example, any signal molecule of our n-bit counter can be generated using just O(n2 ) reactions when Θ(n) copies of the initial signal molecules share the same volume. We have also shown that even having a second copy of the initial tag and signal multisets ensures that no computation which uses transformers can be deterministic after a linear number of steps. One weakness of our counter construction is that the number of distinct domains needed is polynomial in n, the number of bits of the counter. In contrast, a QSW binary counter that is implemented via the stack machine of Qian et al. [98] uses just a constant number of distinct domains independent of n. Is it possible to construct an n-bit counter that combines the best of the GRAY and QSW counters, i.e., uses space that is polynomial in n and uses O(1) distinct domains? More generally, can all computation be realized by strand displacement systems whose space and expected time are within a (small) polynomial factor of the space and time of the computation? Our negative results suggest that any such systems must rely on exact molecular counts if they must have determinism.  
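The notion of x-copy-tolerance defined above can be made operational on small examples with a brute-force search. The following Python sketch is my own illustration, not a construction from the thesis: it treats a CRN simply as a list of (reactants, products) multisets with a reversible reaction listed once per direction, ignores the tag machinery, reads "produce s" as "reach a state in which s is present", and compares shortest trace lengths in C and in C(x) by breadth-first search over multiset states. The species and reactions in the usage example are hypothetical.

```python
from collections import Counter, deque

def applicable(state, reactants):
    return all(state[s] >= c for s, c in reactants.items())

def fire(state, reaction):
    reactants, products = reaction
    nxt = state.copy()
    nxt.subtract(reactants)
    nxt.update(products)
    return nxt

def canon(state):
    # Canonical, hashable form of a multiset state (zero counts dropped).
    return tuple(sorted((s, c) for s, c in state.items() if c > 0))

def shortest_production(reactions, s0, target, max_steps=20):
    """Fewest reaction steps until `target` is present, by BFS over multiset states."""
    start = Counter(s0)
    frontier = deque([(start, 0)])
    seen = {canon(start)}
    while frontier:
        state, d = frontier.popleft()
        if state[target] > 0:
            return d
        if d == max_steps:
            continue
        for reaction in reactions:
            if applicable(state, reaction[0]):
                nxt = fire(state, reaction)
                key = canon(nxt)
                if key not in seen:
                    seen.add(key)
                    frontier.append((nxt, d + 1))
    return None   # not producible within max_steps

def x_copy_tolerant(reactions, s0, species, x):
    s0x = Counter({s: c * x for s, c in Counter(s0).items()})
    return all(shortest_production(reactions, s0, sp)
               == shortest_production(reactions, s0x, sp) for sp in species)

# Hypothetical toy network: the reaction a + a -> z needs two copies of a,
# so z is unreachable with one copy of the initial multiset but not with two.
reactions = [(Counter({"a": 2}), Counter({"z": 1}))]
print(x_copy_tolerant(reactions, {"a": 1}, ["a", "z"], x=2))   # False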
83  Chapter 5  Space and energy efficient molecular programming In the previous chapter, we demonstrated that space-efficient molecular programming with chemical reaction networks (CRN) and DNA strand displacement systems (DSD) is possible, in principle, by giving an implementation of a Gray code counter that performed a computation with a number of steps exponential in the required space. In this chapter, we ask the question: can any space efficient computation be realized by a space efficient CRN and DSD? We answer in the affirmative by showing how any problem in PSPACE can be solved by a logically reversible tagged CRN using polynomial space. We also demonstrate how this result can be extended to solve any space-bounded computation (i.e., all of SPACE). Our CRN can be realized by a space and energy efficient DSD implementation. Not only do our results further characterize the computational power of CRNs and DSDs, they shed light on the complexity of a number of important related problems such as CRN and DSD model checking and verification [64, 65]. We show that even determining if an arbitrary state is reachable from an initial state of a CRN or DSD—a question that must be solved when verifying the correctness of a CRN or DSD—is PSPACE-hard. We show that the problem is PSPACE-complete for restricted classes of CRNs and DSDs. In this chapter we also return to reasoning at the sequence level by showing how our new results can be used to establish that the minimum energy barrier indirect folding pathway for multiple interacting strands problem (eb-ipfp-multi) is PSPACE-complete.  5.1  Related work  As with the previously mentioned results related to CRNs and DSDs, we now highlight results related to the limits of logically reversible computation. An introduction to logically reversible and energy-efficient computation is given in Section 1.2.6. This is a topic that will be explored in this chapter in the context of molecular programming. Charles Bennett’s seminal work, that showed how any T (n) time-bounded Turing machine can be simulated by a logically reversible Turing machine, was space inefficient as his reversible Turing machine simulation required Θ(T (n)) space [8]. He later improved the result to Content from this chapter appears in the proceedings of the 18th Annual International Conference on DNA computing and molecular programming (DNA 2012) [131].  84  5.2. Preliminaries show that any Turing machine computation using T (n) time and S(n) space can be simulated by a logically reversible Turing machine to use O(S(n) log T (n)) space [9]. This proved that PSPACE equals ReversiblePSPACE [9]—the class of problems solvable by a logically reversible Turing machine that uses polynomial space. The result has since been generalized to prove SPACE equals ReversibleSPACE [70] demonstrating that, in principle, any space-bounded computation can be solved by a space and energy efficient computation. Until recently it remained unclear if a physical system could realize logically reversible computation. In perhaps one of the most important theoretical results in the field of molecular programming, Qian et al. [98] gave a DSD implementation of a stack machine capable, in principle, of energy efficient Turing universal computation. However, as with Bennett’s seminal work, their implementation requires space proportional to the number of steps in the computation as it consumes fuel (transformer) molecules to drive the overall process forward.  
5.2  Preliminaries  Definitions of DNA Strand Displacement systems (DSDs) and Chemical Reaction Networks (CRNs) are given in Section 1.2. In this chapter, as with the last, we will reason exclusively about tagged CRNs. However, to simplify the presentation, we will omit tags from reaction equations. CRNs can be implemented by DSDs in a number of ways [72, 98]. We will leverage the implementation from Theorem 12 which relies on the assumption that certain signals only occur as a single copy within the reaction volume. A single copy mutex species is used to ensure that a strand displacement cascade which implements any particular reaction will occur as a transaction and therefore appear atomic. Specifically, either the entire cascade implementing a reaction will succeed, or it will return to the state prior to beginning the cascade. Importantly, the mutex molecule is sequestered during the cascade and therefore another reaction cannot begin. Finally, we formally define problems we study in this chapter. The first two problems ask whether certain states are reachable within a CRN or DSD and are the basis for formal verification of these systems. Recall that the state of a CRN (DSD) is the current composition of free signal molecules (strands). Problem 7. crnR (CRN reachability) Instance: A chemical reaction network with initial state Sinit and an arbitrary state S . Question: Is S reachable from Sinit ?  85  5.3. Space efficient CRN simulation of PSPACE Problem 8. dsdR (DSD reachability) Instance: A DNA strand displacement system with initial state Sinit and an arbitrary state S . Question: Is S reachable from Sinit ? In this chapter we will construct a CRN that can solve any instance of the totally quantified 3-satisfiability problem defined below. Problem 9. q3sat (Totally quantified 3-satisfiability) Instance: A totally quantified Boolean formula ψ of n variables in prenex normal form with strictly alternating quantifiers, ∀xn ∃xn−1 ∀xn−2 . . . Q1 x1 φ, where Q1 is the quantifier ∀ if n is odd and the quantifier ∃ otherwise, and where φ is an unquantified Boolean formula of m clauses in conjunctive normal form, each containing a literal for 3 distinct variables. Question: Is the formula ψ satisfiable? Finally, we will resolve the complexity of predicting indirect folding pathways for multiple interacting strands. We will reason about this problem using the simple energy model formally defined in Section 1.1.1. Problem 10. eb-ipfp-multi (Energy Barrier for Indirect Pseudoknot-free Folding Pathway of Multiple interacting strands)  Instance: Given two pseudoknot-free structures I (initial) and F (final), of multiple interacting strands, and integer k. Question: Is there an indirect pseudoknot-free folding pathway from I to F such that the energy barrier in the simple energy model is at most k?  5.3  Space efficient CRN simulation of PSPACE ∀x3 ∃x2 ∀x1 (x1 ∨ x2 ∨ x3 ) ∧ (¬x1 ∨ ¬x2 ∨ ¬x3 ) T  F T  F F F  F  T T  T  T F T  T  F T  T T  F T  T  T T  T F T  F  T F  Figure 5.1: Solving a q3sat instance. Edge labeled paths from root to leaf denote variable assignments. Nodes are satisfied based on quantifier and satisfiability of left and right children. Our goal is to demonstrate that any problem in PSPACE can be solved by a space efficient, logically reversible, tagged CRN. By solved, we mean that the CRN will produce a special accept signal for an instance of a problem 86  5.3. 
Space efficient CRN simulation of PSPACE if and only if a Turing machine simulated with that same problem instance ends in an accepting state. Otherwise, the CRN will produce a special reject signal. To that end, we will show how a CRN with those properties can be constructed to solve any arbitrary instance of the q3sat problem, which is a complete problem for the class PSPACE. We present our solution in three logical parts. In Section 5.3.1, we demonstrate how to construct a CRN for verifying if a 3sat formula is satisfied. In Section 5.3.2, we present an elegant solution for traversing a perfect binary tree in post-order that is both space efficient and logically reversible. In Section 5.3.3, we demonstrate how the two CRNs can be integrated and then modified to capture the semantics of strictly alternating variable quantifiers in the q3sat instance. To simplify the presentation of our result, we will add new reactions, tags, and signal molecules when needed as we refine our construction towards its final form. To understand the intuition behind our construction, consider that a perfect binary tree of height n, with each level of the tree representing a variable, has 2n leaves, each with a unique path from the root specifying a unique variable assignment. A tree defined in this manner can be used to express the semantics of strictly alternating quantifiers in the q3sat instance (see Figure 5.1). Leaf nodes are considered satisfied, or true, if and only if the current variable assignment satisfies the unquantified 3sat formula of the q3sat instance. For example, the first leaf node from the left of the tree in Figure 5.1 is not satisfied, and is therefore considered false, as the variable assignment x1 = F, x2 = F, x3 = F does not satisfy the formula (x1 ∨ x2 ∨ x3 ) ∧ (¬x1 ∨ ¬x2 ∨ ¬x3 ); however, the second leaf node from the left with assignment x1 = T, x2 = F, x3 = F does satisfy the formula so it is considered to be true. Internal nodes can be used to propagate satisfiability of a partially solved instance up the tree. If an internal node represents a universally quantified variable, then it is marked as true if and only if both of its children are true. Therefore, the parent node of the first two leaf nodes from the left in Figure 5.1 is false as it is a universally quantified node and only one of its children is true. Similarly, a node representing an existentially quantified variable is marked false if and only if both children are false. In this straightforward manner, the overall quantified formula can be determined to be true or false, once the root is marked. Since the satisfiability of a node can immediately be determined once that of its two children is known, we perform a post-order traversal19 of the tree. Furthermore, we exploit the fact that once the satisfiability of a child is marked, the satisfiability of its descendants is irrelevant and can be forgotten. This allows us to smartly reuse space in our tree traversal procedure.  5.3.1  Verifying a 3sat instance variable assignment  We first demonstrate how the formula φ can be verified as satisfied or unsatisfied for a particular variable assignment. A variable assignment ensures exactly 19 In a post-order tree traversal, a node is processed only after its children have been processed.  87  5.3. Space efficient CRN simulation of PSPACE one signal for each variable xi is present: xTi for a true assignment, and xF i otherwise. 
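Before turning to the clause-level reactions, the tree semantics described above can be checked with a short Python sketch (an illustration only; the CRN realizes this recursion with signal molecules rather than a call stack, and the function names are mine). A leaf is true if and only if the current assignment satisfies φ, a node on a universally quantified level is true if and only if both children are, and a node on an existentially quantified level is false if and only if both children are. The recursion uses exponential time but only Θ(n) depth, mirroring the space reuse that the traversal exploits. On the formula of Figure 5.1 it returns True.

```python
def clause_satisfied(clause, assignment):
    # clause: signed variable indices, +i for x_i, -i for (not x_i)
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def formula_satisfied(clauses, assignment):
    return all(clause_satisfied(c, assignment) for c in clauses)

def solve_q3sat(clauses, n):
    """Evaluate FORALL x_n EXISTS x_(n-1) ... phi over the assignment tree."""
    def visit(level, assignment):
        if level == 0:                      # leaf: every variable is assigned
            return formula_satisfied(clauses, assignment)
        left = visit(level - 1, {**assignment, level: False})
        right = visit(level - 1, {**assignment, level: True})
        universal = (n - level) % 2 == 0    # x_n is universal; quantifiers alternate
        return (left and right) if universal else (left or right)
    return visit(n, {})

# The instance of Figure 5.1:
# FORALL x3 EXISTS x2 FORALL x1 (x1 v x2 v x3) and (not x1 v not x2 v not x3)
clauses = [(1, 2, 3), (-1, -2, -3)]
print(solve_q3sat(clauses, n=3))            # True
```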
We first introduce the necessary reactions to verify an individual clause and demonstrate how the overall formula can be determined to be true or false. Verifying an arbitrary clause Recall that in a 3sat instance, each clause consists of exactly three literals, each for a distinct variable20 . As such, there are exactly eight possible truth assignments and we create a reversible reaction for each. The reactions for verifying the ith clause, containing literals for variables xj , xk and xl are given in Figure 5.2 (left). When the clause signal molecule Ci? is present, exactly one of the eight reactions can be applied, specified by the current variable assignment. The variable signals act as catalysts and the Ci? signal is consumed producing either a CiT signal if the clause is satisfied, or CiF otherwise. (1)  Ci? Ci?  F F xF j +xk +xl  F T xF j +xk +xl  [F/T ]  Ci  [F/T ]  Ci  ,  1≤i≤m  ,  1≤i≤m  Ci?  φ?  C1?  (3)  CiT  ? Ci+1  (4)  .. . T T xT j +xk +xl  (2)  (5) [F/T ] Ci  ,  Hi?  +  CiF  T Cm  Hi + φ  F  ,  1≤i<m  ,  1≤i≤m  φT  1≤i≤m  Figure 5.2: (left) Eight chemical reaction equations to verify an arbitrary 3sat clause Ci for each combination of variable assignments. The product of the reaction is CiT for assignments that satisfy the ith clause, and CiF otherwise. (right) Reaction equations to verify the overall 3sat formula φ, consisting of m clauses. For example, suppose Ci represents the following clause: (xj ∨ ¬xk ∨ xl ). T F F The reaction having catalysts xF j , xk , and xl will produce Ci . The other seven T reactions will produce Ci . Note that for a particular variable assignment, only one reaction will apply in both the forward and reverse direction, ensuring the process is logically reversible. Verifying the overall formula The overall process of verifying the formula φ can be thought of as a subroutine that is initiated by consuming the signal φ? and completes by producing either 20 We assume this form to simplify our description. Note that when two or more of the literals in a clause are for the same variable, it is always possible to simplify the clause, even when they are negations of each other. It is also always possible to add dummy variables to ensure every clause has exactly three literals.  88  5.3. Space efficient CRN simulation of PSPACE  φ?  C1?  C1T  C2?  C2T  ···  T Cm−1  ? Cm  H1? C1F  H2? C2F  F ? Cm Hm  H1 + φF  H2 + φ F  Hm + φF  T Cm  φT  Figure 5.3: Flow control when verifying a formula φ having m clauses. the signal φT , if φ is satisfied, or φF otherwise. The variable assignment signals are catalysts, and their values are maintained after the process completes. For the formula to be true, all clauses must be satisfied. However, any combination of unsatisfied clauses will result in φ being false. For this reason, care must be taken that clauses are checked systematically to ensure reversibility. The overall process is depicted in Figure 5.3 and the reactions are given in Figure 5.2 (right). The process checks each clause, in sequence, and if the current clause is unsatisfied then reaction (4) occurs, immediately producing the φF signal denoting that the formula is unsatisfied. This reaction consumes a history signal Hi? and produces another history signal Hi . The sole purpose of the history signal is to ensure the reversibility of the computation, should the φF signal be produced as it uniquely identifies which clause was the first to be unsatisfied. 
Otherwise, all clauses are satisfied, and thus the signal φT can be produced and is sufficient to ensure the computation is reversible. Lemma 16. A 3sat Boolean formula of m clauses over n variables can be verified by a logically reversible tagged CRN in O(m) reaction steps using Θ(m+ n) space. Proof. Importantly, we must now establish that the process is logically reversible. We first argue by induction on m, the number of clauses of the 3sat T formula φ, that Cm is eventually produced by a logically reversible sequence of reactions if and only if the first m clauses are satisfied and otherwise φF is produced by a logically reversible sequence of reactions along with a history signal denoting the first unsatisfied clause. In addition to the clause history signals Hi? , we assume initially that the signal φ? is present and exactly one signal for each variable xi denoting its truth assignment—xTi or xF i . Suppose the inductive hypothesis holds for m − 1 clauses and consider the case when φ has m clauses. We have two cases: Case 1. The first m − 1 clauses of φ are satisfied. By the inductive hypothesis, T will eventually be produced, by a logically reversible sequence of signal Cm−1 reactions. Other than the reverse of the previous reaction, only the reaction to ? produce Cm can be applied. Next, other than reversing, only one clause reaction T F T will be applicable and will produce either Cm or Cm . If Cm is produced, we are F done. If Cm was produced, either the reverse of the previous reaction can be T applied, or φF + Hm is next produced, ending the process. Thus, Cm or φF (in 89  5.3. Space efficient CRN simulation of PSPACE addition to a history signal denoting the first unsatisfied clause) is eventually produced by a sequence of logically reversible reactions. Case 2. At least one of the first m − 1 clauses of φ are unsatisfied. By the inductive hypothesis, this case will correctly produce φF and a history signal denoting the first unsatisfied clause. The new reactions pertaining to clause m are not applicable and thus inconsequential. T To complete the process, if CM was produced, other than reversing the T previous reaction, the signal φ can next be produced. It is easy to see that in the worst case, O(m) reactions steps are required. Finally, we establish the space claim. The initial signal multiset has size Θ(m + n) as it consists of the n variable signals, m clause history signals and the signal φ? . The CRN has Θ(m) reactions since there are a constant number for each of the m clauses and the overall formula verification. Since each reaction is applied at most once when verifying a formula, one tag per reaction is sufficient, therefore establishing the size of the initial tag multiset to be Θ(m). As the CRN is proper, then by Lemma 2 the required space to complete the computation is Θ(m + n).  5.3.2  A space efficient post-order tree traversal  Next we demonstrate how to perform a post-order traversal of a perfect binary tree in a space-efficient manner. Importantly, the procedure must be logically reversible. The intuition and chemical reaction equations are captured in Figure 5.4. For any node with a left and right child, once the descendants of the left child have been recursively traversed (Figure 5.4 (a)), the left child can be marked (Figure 5.4 (b)) using reaction (6) mark left. 
Any information stored in those descendant nodes is no longer required and the whole traversal of that subtree can be reversed (Figure 5.4 (c)), the traversal can move to the right child (Figure 5.4 (d)) using reaction (7) move right, the right subtree can be recursively traversed (Figure 5.4 (e)), and finally the right child marked (Figure 5.4 (f)) using reaction (8) mark right. Lemma 17. Given a perfect binary tree of height h, all descendants of the root can be traversed in post-order, by a logically-reversible tagged CRN in Θ(3h ) reaction steps, using Θ(h) space. Proof. We construct the logically reversible tagged CRN of Figure 5.4 adding reactions (6), (7), and (8), for each 1 ≤ i ≤ h. As each reaction of the CRN is reversible, after every reaction step, the reverse of the previous reaction can always be applied. To demonstrate the CRN is logically reversible, we need to demonstrate that at any point there is at most one other reaction that can be applied. We will further establish the invariant that each reaction strictly alternates in being applied in the forward and reverse direction, ensuring at most one tag is required for each type of reaction. We will argue by structural induction. Let sh denote the number of reaction steps required for a tree of height h. 90  5.3. Space efficient CRN simulation of PSPACE  (6)  mark left ? xF i + ri +  li?  i−1 j=1  {rj }  move right li + ri? +  xFi  i−1 j=1  {lj? }  xTi  ,1≤i≤h  (d) move to right subtree  (c) reverse all steps from (a)  (8)  ,1≤i≤h  (b) mark li  (a) recursively solve tree rooted at li  (7)  li  mark right  ri?  (e) recursively solve tree rooted at ri  ? ri+1 + li + xTi +  i−1 j=1  {rj }  ri  ,1≤i≤h  (f) mark ri  Figure 5.4: A logically reversible post-order traversal of all descendants of the root of a height h perfect binary tree can be achieved using three reactions: (6) mark left, (7) move right, and (8) mark right. Below each reaction is an illustration of the action it performs on the tree.  91  5.3. Space efficient CRN simulation of PSPACE ? ? Consider the base case when h = 1 with initial signal multiset r2? , xF 1 , l1 , r1 . T Reaction (8) cannot be applied until the signal x1 is present which is produced by reaction (7). Similarly, reaction (7) cannot be applied until signal l1 is present. Thus, it is easy to see that reaction (6) must first be applied—marking the left child—followed by reaction (7)—moving to the right child—and finally reaction (8)—marking the right child and completing the traversal in s1 = 3 reaction steps. Each reaction was only applied once, in the forward direction, so the strictly alternating invariant is trivially maintained. Suppose the traversal completes in sh−1 reactions steps, is logically reversible, and the strictly alternating invariant is maintained for a tree of height h − 1. Consider a tree of height h, having initial multiset of signals S = ? ? ? }∪ 1≤i≤h xF {rh+1 i , li , ri . Before reaction (6) (and thus reaction (7) and (8)) can be applied, the signals 1≤j<h rj must be present. As the left subtree is selected, the signal rh? is present, and by the induction hypothesis, the only available action is to produce these signals in sh−1 logically reversible reaction steps, that maintain the strictly alternating invariant, by traversing the subtree rooted at lh (see Figure 5.4 (a)). Importantly, the signals 1≤j<h−1 rj? are now absent and therefore no reaction affecting levels 1, . . . , h − 2 can occur. 
Other than reversing the previous reaction, which produced signal rh−1 , only reaction (6) can be applied for level h, thus producing lh (see Figure 5.4 (b)). Next, observe that reaction (7) cannot be applied until the signals 1≤j<h lj? are present. Other than reversing the previous reaction, only a reversal of all sh−1 reaction steps that traversed the left subtree can be applied next, yielding the required signals to next apply reaction (7), producing signal xTh , denoting a move to the right subtree (see Figure 5.4 (c) and (d)). Note that the reversal of the left subtree will maintain the strictly alternating invariant as it ensures all lower level reactions have been reset to their initial state, in order to be used again in the right subtree. Similar to reaction (6), reaction (8) cannot be applied at level h until the right subtree is traversed in sh−1 logically reversible reaction steps (see Figure 5.4 (e)). Other than reversing the previous reaction, only reaction (8) can next be applied at level h producing the signal rh and ensuring no further reactions on lower levels can occur. The traversal is complete and no reaction, other than the reverse of the previous, can occur. Thus, the overall traversal is logically reversible, and is clearly in post-order. As the strictly alternating invariant was maintained for all lower level reactions, and all reactions at level h have been applied for the first time, and only once, the invariant is maintained for a tree of height h. Exactly 3 reactions occurred at level h, and 3sh−1 reactions were required for the two traversals and one reversal of the height h − 1 subtrees, giving us the recurrence sh = 3sh−1 + 3. Solving sh with s1 = 3 gives us the closed form expression 32 (3h − 1), establishing the claimed Θ(3h ) reaction steps. Finally, consider the space claim. As we have shown that reactions strictly alternate being applied in the forwards and reverse direction, at most one tag for each of the Θ(h) reactions is sufficient. Consider that the initial multiset of ? ? ? signals for a tree of height h is S = {rh+1 } ∪ 1≤i≤h xF and therefore i , li , ri  92  5.3. Space efficient CRN simulation of PSPACE |S| = 3h+1. Since the CRN is proper, we immediately establish the space claim by Lemma 1.  5.3.3  Solving a q3sat instance  We now have the means to verify if a variable assignment satisfies a 3sat formula φ. We can also traverse a perfect binary tree in post-order, and in the process enumerate all possible variable assignments for φ. What remains is to combine these processes together in order to determine if a q3sat instance can be satisfied. We approach the integration in two parts. First, we will demonstrate how the formula verification process can be triggered immediately prior to the tree-traversal marking of a leaf node and how the verification reactions can be entirely reversed, prior to the next time the verification procedure must run. This effectively demonstrates how any problem in NP can be solved by a logically reversible CRN in polynomial space, if we specify the end of computation as the presence of the signal φT , or the signal φF in conjunction with the signals for the final variable assignment to be enumerated. Finally, we demonstrate how the tree traversal reactions of Figure 5.4 can be augmented in order to capture the semantics of alternating universal and existential quantifiers, thus demonstrating how any problem in PSPACE can be solved in polynomial space by a logically reversible CRN. 
Integrating formula verification and tree traversal Recall the sequence of logical steps in traversing level 1 of the tree, i.e., the leaves: mark left leaf, move right, mark right leaf. We augment the reactions for level 1 to force the following sequence: (i) verify φ, (ii) mark left leaf, (iii) reverse reactions of step (i), (iv) move right, (v) verify φ, (vi) mark right leaf. This new sequence ensures two invariants: first, the current variable assignment is verified prior to marking the current leaf, and second, the verification procedure is fully reversed prior to the next verification. The augmented reactions are given in Figure 5.5. Both reactions marking a leaf have been split into two variants, each ensuring the verification procedure has completed by requiring as a catalyst one of the two possible outcomes of the verification process. In addition, we add new signals to record whether or not the variable assignment for a particular leaf is a satisfying assignment for φ. These signals will be used later to propagate satisfiability up the tree, once quantifiers have been integrated. Note that the reaction to move to the right leaf now requires the signal φ? as a catalyst. This forces all steps performed in the previous verification to reverse. After moving to the right leaf, and thus swapping the value of variable x1 , the verification process can again run immediately prior to marking the right leaf. Importantly, we want to ensure that the verification procedure is completely integrated into the leaf level reactions and cannot perform any reactions while the traversal is marking higher level nodes. This is easily accomplished by augmenting reactions (2)-(5) to require r2? as a catalyst. Note that the augmented variants of the tree traversal reactions 93  5.3. Space efficient CRN simulation of PSPACE are also fully distinguishable by their catalysts (and products), thus ensuring the process is logically reversible. New Catalysts φ?  φF  L?1 +l1?  mark left  move right  xF 1  ···  ···  φT  l1 +LF 1  L?1 +l1?  r1 +RF 1  R?1 +r1?  ···  l1 +LT 1  xT 1  mark right  R?1 +r1?  ···  ···  r1 +RT 1  Figure 5.5: Integrating the 3sat verification procedure into the leaf level reactions of the tree traversal procedure. Two reaction variants are created for marking leaf nodes as either satisfied or unsatisfied based on the result of the verification procedure. One reaction variant can proceed if the signal φF is available and the other variant requires φT . As these are the only two reaction variants, the formula for the current variable assignment must be verified before the leaf node can be marked. The move right reaction requires φ? as a catalyst, thus ensuring the verification procedure is reversed prior to the next verification step. Existing catalysts listed in Figure 5.4 remain and are omitted above for space.  Integrating quantifiers into the tree traversal Integrating quantifiers in non-leaf levels of the tree is relatively straightforward. Recall that the levels of the tree strictly alternate between universal and existential quantification. For each level, we create four variants of the correct quantifier for both the left and right node marking reactions to additionally produce a signal indicating if the current subtree is satisfied. The reaction variants for marking a left node are given in Figure 5.6. 
These reactions require as catalysts the signals indicating if the left and right children of the current node are satisfied and therefore four variants are sufficient to consider all cases, for each type of quantifier. As with the leaf level reactions, the augmented reaction variants can be fully distinguished by their catalysts ensuring the computation remains logically reversible, and the correct reactions are reversed. Ending the computation Once both children of the root have been solved the output signal can be produced based on the satisfiability of the children and on the quantifier imposed on the root level variable xn . The reaction equations for the universal quantifier are shown in Figure 5.7. Modifying the reactions for an existential quantifier is straightforward. Recall that reactions at level n − 1 cannot proceed unless the signal rn? is present. We could have the reaction producing the solution signal also con94  5.3. Space efficient CRN simulation of PSPACE New Catalysts LF i−1  ∀ levels  L?i +li?  ∃ levels  L?i +li?  +  ···  ···  RF i−1  LF i−1  li +LF i  L?i +li?  li +LF i  L?i +li?  +  ···  ···  RT i−1  F LT i−1 + Ri−1  li +LF i  L?i +li?  li +LT i  L?i +li?  ···  ···  T LT i−1 + Ri−1  li +LF i  L?i +li?  li +LT i  L?i +li?  ···  ···  li +LT i li +LT i  Figure 5.6: Integrating quantifiers to non-leaf levels of the tree traversal. For both universal and existential levels, four variants of the left node reactions are created to process the four combinations of left and right children satisfiability. The integration is identical for right node reactions. Existing catalysts remain the same as listed before and are omitted for space. sume rn? . This would end the computation chain as only reversing the previous reaction would be possible next. However, for reasons we will make clear in Section 5.5, the signal rn? is never altered and therefore after the solution signal is produced, the entirety of the tree traversal steps will be reversed before reaching the end of the computation chain. The entire configuration of the CRN system will appear identical to the initial configuration, with the exception that the output has been written (i.e., the ψ ? signal has been consumed and been replaced by ψ F or ψ T ). See Figure 5.8 for a schematic of the logically reversible computation chain. (9) ψ ? ψ? ψ? ψ?  F LF n−1 + Rn−1 +  n−1 j=1  { rj }  ψF  T LF n−1 + Rn−1 +  n−1 j=1  { rj }  ψF  F LT n−1 + Rn−1 +  n−1 j=1  { rj }  ψF  LT n−1  n−1 j=1  { rj }  ψT  T + Rn−1  +  (a) visit descendants  (b) produce output  Figure 5.7: After both children of the root have been solved a solution can be determined based on the quantifier of the root level. Equations are shown assuming the root variable xn is universally quantified. Theorem 16. Any arbitrary instance of q3sat with n variables and m clauses can be solved by a logically reversible tagged CRN in O(m 3n ) reaction steps using Θ(m + n) space. Proof. Let ψ be the totally quantified Boolean formula of the instance and φ be the unquantified 3sat formula. By Lemma 16 a set of Θ(m) reactions can be created to verify if φ is satisfied, or not, for a particular variable assignment. By Lemma 17, a set of Θ(n) reactions can be created to traverse the height n tree 95  5.4. Space efficient CRN simulation of SPACE representing all possible assignments of the n variables. 
Furthermore, the above modifications demonstrate how these two processes can be integrated into one logically reversible computation chain, and how quantifiers can be added to the non-leaf levels to determine if there is a satisfying solution for ψ by propagating satisfiability of subtrees up to higher levels. Importantly, the modifications only increase the number of reactions by a constant factor and are designed to maintain the property that the computation is logically reversible. Consider that the number of reaction steps acting on a tree node, prior to reaching the root, has not increased. However, prior to marking every leaf in the traversal, the verification procedure is run for the current variable assignment (and reversed in between). Therefore, by Lemmas 16 and 17, the root of the height n tree can be reached, and a solution signal produced, within O(m 3n ) reaction steps. As forcing the entire tree traversal to reverse prior to the end of computation only doubles the number of reaction steps, the claim on computation length is established. Next, consider the space required of the combined CRN. The modified verification procedure requires the following initial multiset, where T3sat is the multiset of required tags: S3sat =  1≤i≤m  Ci? , Hi? ∪ φ? , r2? ∪ T3sat  The augmented tree traversal procedure requires the following initial multiset, where Ttree is the multiset of required tags: Stree =  1≤i<n  li? , x?i , ri? , L?i , Ri? ∪ rn? , φ? , ψ ? ∪ Ttree  The space required for the initial multiset of the combined CRN is therefore |Sq3sat | = |Stree ∪ S3sat |. As the combined CRN maintains the property that reactions strictly alternate being applied in the forward and reverse direction, then one tag for each of the Θ(m + n) reactions is sufficient and |Sq3sat | ∈ Θ(m + n). As q3sat is a complete problem for PSPACE [94], we immediately have the following. Corollary 2. Any problem in PSPACE can be solved by a logically reversible tagged CRN using polynomial space.  5.4  Space efficient CRN simulation of SPACE  We have so far shown how to simulate any problem in PSPACE with a spaceefficient CRN. In this section, we extend our result to show how any S(n) space-bounded computation can be simulated by a logically reversible tagged CRN using at most poly(S(n)) space.  96  5.4. Space efficient CRN simulation of SPACE The first result that we leverage, summarized in Theorem 17, states that any S(n) space-bounded computation, with an input of size n ≤ S(n), can be simulated by an alternating Turing machine in O(S(n)2 ) steps. Theorem 17 (Chandra, Kozen and Stockmeyer [18]). If S(n) ≥ n, then NSPACE(S(n)) ⊆ c>0 ATIME(c · S(n)2 ). We will also make use of the following transformation, from a nondeterministic Turing machine to a propositional formula, due to Cook [28]. Theorem 18 (Cook [28]). Let M be a T (n) time-bounded nondeterministic Turing machine. For each input x there is a conjunctive normal form propositional formula F (x) of length Θ(T (n) log T (n)), [containing at most three literals per clause and that can be produced in poly(n) time], such that F (x) is satisfiable if and only if M accepts x within T (n) steps. As communicated by Williams [140], this same transformation can be used to derive a quantified Boolean formula ψ of length Θ(T (n) log T (n)), for an alternating Turing machine M , such that ψ is satisfiable if and only if M accepts its input within T (n) steps. 
Intuitively, the universal states of M are represented by universal variables, while existential states are represented by existential variables. Importantly, Cook’s reduction ensures that the unquantified propositional formula is in conjunctive normal form and that clauses have at most three literals. Without loss of generality, we can assume that the quantified formula ψ is in prenex normal form and that existential and universal quantifiers strictly alternate21 . We summarize this result due to Theorems 17 and 18 in Corollary 3. Corollary 3 (Chandra, Kozen and Stockmeyer [18] & Cook [28]). Given any S(n) space bounded Turing machine M and an input x of length n, with S(n) ≥ n, it is possible in poly(S(n)) time to construct a totally quantified Boolean formula ψ, in prenex normal form, having Θ(S(n)2 log S(n)) clauses in conjunctive normal form, where each clause contains at most three literals (i.e., ψ is an instance of q3sat), such that ψ is satisfiable if and only if M accepts x within S(n) space. In Corollary 3 there is a condition that the space complexity is at least as large as the size of the input. When this is not case, such problem instances are necessarily in PSPACE and therefore our result of Corollary 2 applies. Otherwise, by Theorem 16 and Corollary 3 we can conclude the following result. Corollary 4. Any problem solvable in S(n) ≥ n space can be solved by a logically reversible tagged CRN using O(S(n)2 log S(n)) space. 21 Our CRN construction works without the assumption that quantifiers strictly alternate, but it simplifies the presentation of the result to assume that they do.  97  5.5. Space and energy efficient DSD simulation of SPACE  5.5  Space and energy efficient DSD simulation of SPACE  The remarkable consequence that Bennett’s work demonstrates is that energy consumption is not necessarily an intrinsic cost of computation. In particular, if the computation is logically reversible, there is no inherent lower bound on energy expenditure, due to the computation. However, there must be a reasonable probability the actual solution can be observed. This can be problematic in a logically reversible computation which is free to immediately reverse once reaching a solution state. Qian et al. [98] solved this problem by using fuel (transformers) to provide a slight bias for remaining in a solution state once the computation completes. However, in our result, since reactions must be reused efficiently in both directions to maintain a polynomial space bound, they cannot be biased in general. reach root and produce answer traversing descendants of root  ··· t states  reversing traversal  ··· t + 1 states  Figure 5.8: The logically reversible computation chain of the q3sat CRN. In more than half of the states, the output signal is present (shown shaded). To overcome this, we have designed our reactions that produce an output signal to ensure the next logical step in the computation is to reverse the tree traversal. This effectively doubles the length of the logically reversible computation chain and established the important property that the output signal can be observed in strictly more than half of the states (see Figure 5.8). Notice that this was also the case for Bennett’s original reversible Turing machine implementation22 . As the computation performs an unbiased random walk along the logically reversible computation state space, the steady state probability of observing the output signal is p > 0.5. This probability can be further increased in a number of ways. 
For instance, by adding one additional reaction that produces a new signal and requires the final signal multiset of the original computation chain as catalysts, we can once again double the number of reactions in the new chain. In this case, the probability of observing the output signal is p > 0.75. In this manner, for every new reaction added to the CRN, the probability of not observing an output signal is cut in half. Formally, the probability of observing an answer becomes p > 1 − 2−(1+c) when c ≥ 0 number of new reactions are added to extend the computation chain. Thus, we can make the steady state probability of observing 22 The forward traversal of the tree, production of the output signal, and reversal of the traversal are analogous to the compute, copy output, and retrace phases of Bennett’s original reversible Turing machine simulation [8].  98  5.5. Space and energy efficient DSD simulation of SPACE produce new signal  ···  ···  2t + 1 states  ···  ···  2t + 2 states  Figure 5.9: Extending the logically reversible computation chain of the q3sat CRN. Extending the chain is achieved by adding an additional reaction that produces a new signal and requires the final signal multiset of the original computation chain as catalysts. States where the output signal is present are shown shaded. a solution signal arbitrarily high. Furthermore, at the DSD level, we could design the gates which implement the reaction producing the output signal to have a slight bias in the forward direction, by manipulating relative toehold lengths, effectively biasing our overall computation towards the end of the chain containing an answer shown in Figure 5.8 [142]. As this reaction is only performed once, the gate implementing the reaction is not reused and therefore, the bias is not problematic for the overall computation to complete. We note that the CRN and DSD description given here is a non-uniform model of computation. Specifically, the CRN description is dependent on, and encodes, a particular problem instance. Therefore, different problem instances will result in different CRN descriptions and thus a different DSD implementation. Particularly at the DSD level, where synthesizing strands and gates is challenging, it would be desirable if only the input strands differed between unique instances. This can be achieved by constructing a more general quantified Boolean formula that is within a polynomial size of the original encoding described here. In such a construction, part of the input would describe which clauses are active for the particular problem instance. The generalized formula would be for a fixed number n of variables and could be used to solve any instance having at most n variables23 . While the computation chain can be extended to increase the probability of observing the output to some fraction of the total length of the chain, as currently described, there is only one position on the chain (the initial position) where the input can be changed. Changing the input signals in the middle of the chain would mean the computation is no longer logically reversible—the chain would be missing necessary signals to reverse. This can be overcome by extending the number of states where the input can change, without issue, to be a constant fraction of the entire chain length [123]. In particular, we can use the n-bit GRAY counter implementation from Chapter 4 to form the first fraction of states of the overall chain. 
The reactions of the counter are orthogonal to the reactions of the q3sat solver and thus the input can be changed at any position when the counter is active. We would make the first reaction of the 23 This  was suggested in an anonymous review of an earlier version of this work [131]  99  5.6. Complexity of verifying CRNs and DSDs q3sat solver consume the high order bit of the GRAY counter, which is only produced by its last reaction. Thus, the overall chain is logically reversible, the input can be written during a constant fraction of the chain length, and the output can be read for a constant fraction of the chain length [123]. Combining Theorem 12, Corollary 2 and Corollary 4 we have the following. Theorem 19. Any problem in SPACE can be solved by a space and energy efficient DSD.  5.6  Complexity of verifying CRNs and DSDs  Next we show there exists a polynomial time and space reduction from an arbitrary q3sat instance I into an instance I of the CRN reachability problem (q3sat ≤p crnR), such that I can be solved if and only if I can be solved. Theorem 20. The reachability problem for CRNs (crnR) is PSPACE-hard. Proof. Given an arbitrary q3sat instance, construct the CRN of Theorem 16 which is of polynomial size and can be constructed in polynomial time and space by following the steps described in our construction. Ask the question of whether the state Sinit /{ψ ? } ∪ {ψ T } can be reached from Sinit , where Sinit is the initial state of the CRN. By Lemma 1 it is easy to see the reachability problem for proper CRNs is in PSPACE. Whether other forms of CRNs are in PSPACE is dependent on their definition and how the required space to complete a computation is accounted for. Any tagged CRN accounts for the necessary transformers as part of the size of the reaction volume and therefore, by this interpretation, is in PSPACE. Corollary 5. The reachability problem for proper/tagged CRNs is PSPACEcomplete. We note that other results are known for unrestricted CRNs which are not studied here. (Un-tagged) reversible CRNs correspond to reversible Petri nets where the reachability problem is EXPSPACE-complete [16]. CRN reachability has also been studied for the probabilistic case [121, 148] and nondeterministic case [148] and the connection with Petri nets was also explored [148]. By Theorem 12 and Theorem 20 we immediately have the following analogous results for DSDs. Corollary 6. The reachability problem for DSDs (dsdR) is PSPACE-hard. Clearly the reachability problem is PSPACE-complete for the set of DSDs implementing a proper CRN. When transformer (fuel) molecules are considered part of the space usage, as would be the case for closed volumes that are studied here (i.e., tagged CRNs), then the reachability problem is PSPACE-complete.  100  5.7. A reduction from q3sat to eb-ipfp-multi  5.7  A reduction from q3sat to eb-ipfp-multi  We next show that a DSD instance created by the above chain of reductions (i.e., q3sat ≤p crnR ≤p dsdR) can be adapted to show the minimum energy barrier indirect folding pathway problem for multiple interacting strands is PSPACEcomplete when using the simple energy model defined in Section 1.1.1.  5.7.1  The reduction  We begin with an arbitrary instance ψ of the q3sat problem consisting of n variables and m clauses. By Theorem 16, we can construct a CRN of size Θ(m+n) that will output a special acceptance signal if and only if ψ is satisfiable. By Theorem 12, this CRN can be implemented by a DSD that uses poly(m + n) space. 
Thus, the DSD will produce a special signal strand we call syes if and only if ψ is satisfiable. Now, there are two issues we must address. First, for the folding pathway problem, we must reason at the sequence level and not at the abstract domain level specified by a DSD. Second, there is an assumption in the DSD construction that only legal toe-hold mediated strand displacements occur. We must design our sequences to ensure this assumption is maintained in order to conclude any meaningful result.

To simplify our argument, we will use a modified version of the QSW construction (i.e., the DSD construction of Theorem 12). The modification is straightforward: for every long domain in the original construction that is initially free (unbound), bind it to a new complementary strand (see Figure 5.10). Note that the resulting DSD still uses space poly(m + n). This modified construction has two effects. First, a legal displacement of a strand now requires four-way branch migration (described below), in contrast to three-way branch migration required in the original construction. Second, prior to any displacement and immediately after any sequence of legal displacements, all long domains are fully bound to a complementary domain. This latter property greatly simplifies our correctness argument.

A legal toe-hold mediated strand displacement (legal displacement) that uses four-way branch migration involves four strands: an invading strand, a complementary strand bound to the displacing long domain of the invading strand, a template strand with a free toehold, and an evading strand currently bound to the template strand. The process can be summarized in five steps: (i) the invading strand / complementary strand complex associates to the complex containing the template strand by forming a first base pair, (ii) additional base pairs are formed between the free toehold of the template strand and the toehold complement domain on the invading strand, (iii) a long domain of the invading strand displaces an identical long domain of the evading strand using four-way branch migration, where the evading strand forms base pairs with the complementary strand as the invading strand forms base pairs with the template strand, (iv) all but one of the toehold base pairs of the evading strand are broken, and (v) the evading strand, now fully bound to the complementary strand, breaks the last base pair and disassociates from the complex containing the template strand.

Figure 5.10: A strand displacement implementation of the bi-molecular chemical reaction equation A + B ⇌ C + D using a modified construction from that proposed by Qian et al. [98]. In this construction, four-way branch migration is used to displace strands, in contrast to three-way branch migration from the original construction.
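As an aside (an illustration added here, not part of the construction), the reaction of Figure 5.10 can also be read at the abstract CRN level, where a state is a multiset of signals and a reaction rewrites that multiset. The following minimal sketch only illustrates this bookkeeping; a catalytic reaction would simply list the catalysts among both its reactants and its products.

```python
# Illustrative sketch only: CRN states as signal multisets, and a reaction
# that fires when all of its reactants are present in the current state.
from collections import Counter

def fire(state, consumed, produced):
    """Return the new state if the reaction can fire, otherwise None."""
    state = Counter(state)
    if any(state[s] < n for s, n in consumed.items()):
        return None
    state.subtract(consumed)           # remove the consumed signals
    state.update(produced)             # add the produced signals
    return +state                      # drop zero counts

forward  = ({"A": 1, "B": 1}, {"C": 1, "D": 1})   # A + B -> C + D
backward = ({"C": 1, "D": 1}, {"A": 1, "B": 1})   # C + D -> A + B

s0 = Counter({"A": 1, "B": 1})
s1 = fire(s0, *forward)                # Counter({'C': 1, 'D': 1})
s2 = fire(s1, *backward)               # back to Counter({'A': 1, 'B': 1})
print(s1, s2)
```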
The main difference with three-way branch migration is that the complementary strand forms base pairs with the evading strand whenever possible. An example folding pathway for a legal displacement using four-way branch migration is given in Figure 5.11.

Figure 5.11: A folding pathway is shown for a strand displacement using four-way branch migration. A simple sequence design is assumed where toehold domains have one base and long domains have two bases. The displacement of strand B by strand A is shown in seven steps, from (a) to (g). Initially, the long domain of A is bound to strand C. During the displacement, C will form base pairs with B while A forms base pairs with T. In the figure, base pairs are shown as edges between strands. The energy changes between each structure, assuming Kassoc = 2, are shown in the bottom right. The energy barrier of the underlying folding pathway, relative to (a), is Kassoc + 1. Note that for toehold length LT > 2, where Kassoc > LT, the energy barrier would be Kassoc − 1.

Now let us consider how an evading strand can be displaced (i.e., produced) from a template other than by a legal displacement. In the simplest case, it is possible that the evading strand simply breaks all base pairs with the template strand and disassociates. We call this a spontaneous displacement. Now suppose one or more other strands, that possibly have base pairs with one or more other complexes, are used to perform the illegal displacement. We partition this possibility into two cases. In the first case, suppose all the invading strands have a different long domain than the evading strand. We call this a mismatch displacement. In the second case, suppose at least one of the invading strands does have a long domain that is the same as the evading strand, but it is either the case that there are no free toeholds adjacent to the evading strand, or if there are, all invading strands with the correct long domain have the toehold on the wrong side. We call this a blunt-end displacement. Note that if there were at least one free adjacent toehold, and at least one of the invading strands with the correct long domain had a toehold on the correct side, then it could simply perform a legal displacement. Thus, these three cases cover all possible events for an illegal strand displacement.

We begin by designing the sequences. Let all domains on template strands, inclusive of toehold domains, and all domains on complementary strands be formed of sequences using the bases T and G. Let all other strands be formed of sequences using the bases A and C. Thus, it is not possible for intra-strand base pairs to form (the simple energy model assumes only Watson-Crick base pairs can form, i.e., A-T and C-G). Suppose all toeholds on template strands use the same sequence and are of identical length LT, with Kassoc > LT > 2; recall that Kassoc is the entropic penalty for each strand association that results in fewer strand complexes. Therefore all toehold complement domains on other strands also have identical sequences (the complement of the common toehold sequence) and have length LT. Further suppose all long domains and complement strands have a common length LL > 2Kassoc > LT > 2.
Create sequences for long domains, in polynomial time, such that distinct domains have an edit distance of at least 2Kassoc bases. (Many trivial code word designs are sufficient for this purpose. For example, assign a unique multiple of 2Kassoc T (A) bases to long domains on the template and complementary (other) strands. Such a code word design has polynomial size and is trivially created in polynomial time.) Let B be the baseline energy of the initial DSD, prior to any displacements, using any sequence design satisfying the above constraints. We now argue formally that this reduction will result in a folding pathway that can displace the acceptance strand syes within barrier Kassoc − 1, relative to the baseline energy B, if and only if the q3sat instance ψ is satisfiable.

Lemma 18. From a particular configuration having energy B, with all template strands saturated, a legal displacement can complete within barrier Kassoc − 1, resulting in a new configuration with all template strands saturated, having energy B.

Proof. Let I, C, E, and T be the invading strand, complementary strand, evading strand, and template strand, respectively. The association of I to T decreases the number of complexes by one, and increases the number of base pairs by one, therefore increasing the energy to B + Kassoc − 1. As toeholds have length LT > 1, then LT − 1 new base pairs can immediately form, resulting in energy B + Kassoc − LT. Identical long domains of I and E will perform four-way branch migration (T and C being the third and fourth strand). Thus, as one base pair between E and T is broken and another between C and I is broken, raising the energy to B + Kassoc − LT + 2, a new base pair between I and T and another between E and C can form next, lowering the energy to B + Kassoc − LT. This oscillation of two base pair differences happens for each of the LL bases in the common long domain of I and E. Once the common long domain of E is displaced (and C is fully bound to E), LT − 1 toehold base pairs bonding E to T can break, raising the energy to B + Kassoc − 1. As strand E disassociates from the template, the number of complexes increases by one, and the number of base pairs decreases by one, lowering the energy back to B. The original free toehold of T is now paired to I. However, the toehold previously paired with E is now free. All other bases on T are paired and therefore T is saturated. As LT > 2, then the highest energy of any configuration during displacement is B + Kassoc − 1, achieving the claimed energy barrier Kassoc − 1.
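To make the bookkeeping in the proof of Lemma 18 concrete, the following small script (an illustration added here, not part of the original construction) tallies the energy of each configuration along a legal four-way displacement under the simple energy model, where every base pair lowers the energy by one and every association that reduces the number of complexes costs Kassoc. The parameter values are assumptions chosen only to satisfy Kassoc > LT > 2 and LL > 2Kassoc.

```python
# Illustrative sketch of the energy profile of a legal four-way displacement
# under the simple energy model, following the steps of the proof of Lemma 18.
def displacement_profile(K_assoc=4, L_T=3, L_L=10, B=0):
    profile = [B]                      # baseline: all templates saturated
    E = B + K_assoc - 1                # invader associates, one toehold bp forms
    profile.append(E)
    E -= (L_T - 1)                     # remaining toehold base pairs form
    profile.append(E)
    for _ in range(L_L):               # four-way branch migration, base by base
        profile.append(E + 2)          # one bp broken on each of two helices
        profile.append(E)              # one bp re-formed on each of two helices
    E += (L_T - 1)                     # all but one of the evader's toehold bps break
    profile.append(E)
    E = E - K_assoc + 1                # evader dissociates: one more complex, one fewer bp
    profile.append(E)                  # back to the baseline energy B
    return profile

p = displacement_profile()
print(max(p))                          # barrier K_assoc - 1 relative to B (here 3)
print(p[-1])                           # final energy equals the baseline B (here 0)
```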
Therefore, after the first complex associates with the complex containing the template strand, and leaving open at least one base for a second association (on either of the original complexes), the minimum possible energy is B + Kassoc − LT + 1. When a second complex associates, the minimum possible energy cannot be lower than B + 2Kassoc − LT > B + Kassoc − 1 as Kassoc > LT.

Lemma 20. From a particular configuration having energy B, with all template strands saturated, a displacement from a template strand using other strands from the same template complex cannot complete within barrier Kassoc − 1.

Proof. First, we note that any domain from the same complex as the template used in such a displacement could only be a long domain that is bound to a complementary strand (i.e., not a domain bound to the template strand, as the template domain it is bound to would need to form base pairs with the evading strand and it could therefore not displace). Next, we note that such a long domain must be one bound adjacent to the strand to be displaced; otherwise, a pseudoknot would occur as it must cross (cover) at least one domain of another strand bound to the template. For example, in the top of Figure 5.10 only the long domain for +C bound to its complement could displace the long domain −D without creating a pseudoknot. Furthermore, a pseudoknot must also form if the strand being displaced interacts with adjacent strands on both sides (one must cross its own long domain bound to the template strand). Therefore, consider the case of an adjacent long domain bound to a complement strand that could displace the evading strand without creating a pseudoknot. Next, consider that by the sequence design the edit distance with the domain to be displaced is at least 2Kassoc, thus any displacement using only the adjacent long domain and its complementary strand would result in an energy barrier of at least 2Kassoc. As argued above, no other strands from the same complex could cooperate without creating a pseudoknot. Suppose one other complex is used to cooperate in the displacement. In this case, it has at most LT free bases that could be used to lower the energy (using already paired bases cannot lower the energy barrier). However, as the association cost is Kassoc > LT, then a second complex cannot be used to overcome the energy difference. Similarly, and consistent with Lemma 19, more than one additional complex cannot be used to lower the energy barrier.

Lemma 21. From a particular configuration having energy B, with all template strands saturated, a mismatch displacement cannot complete within barrier Kassoc − 1.

Proof. Suppose to the contrary that a mismatch displacement can complete within barrier Kassoc − 1. By definition of a mismatch displacement, one or more invading strands are used, but the long domains used for displacement all differ from the long domain of the evading strand. By Lemmas 19 and 20, we need only consider the case of a single invading complex without the cooperation of strands in the same complex as the evading strand. For the LL base pairs broken for the evading strand, the invading strand can form at most L ≤ LL − 2Kassoc new base pairs due to the sequence design constraints.
Therefore, just prior to the evading strand removing its toehold base pairs, and assuming the invading strand formed base pairs to all of its toehold complement domain, the energy will be B + Kassoc − LT + LL − L ≥ B + Kassoc − LT + LL − (LL − 2Kassoc) = B + 3Kassoc − LT. However, B + 3Kassoc − LT > B + Kassoc − 1 as Kassoc > LT. Contradiction.

Lemma 22. From a particular configuration having energy B, with all template strands saturated, a blunt-end displacement cannot complete within barrier Kassoc − 1.

Proof. Suppose to the contrary that a blunt-end displacement can complete within barrier Kassoc − 1. Note that by Lemmas 19 and 20, increasing the number of invading strands or cooperatively using strands of the same complex as the evading strand cannot help. In a blunt-end displacement, by definition, branch migration of a long domain is not preceded by the formation of base pairs in an adjacent unbound toehold. Both the case of the invading strand binding to a toehold which is not adjacent to the evading strand, or the case of it binding to a toehold that is adjacent but on the wrong side (as expected in a legal displacement), can be ruled out as a pseudoknot would form. Since the template strand is saturated, then it must be the case that an existing base pair, involving a base on the template strand, must first break prior to the invading strand forming its first base pair, raising the energy to B + 1. When the invading strand associates the number of complexes decreases by one, and the number of base pairs increases by one, thus raising the energy to B + Kassoc. Thus, the energy barrier is at least Kassoc. Contradiction.

Lemma 23. From a particular configuration having energy B, with all template strands saturated, a spontaneous displacement cannot complete within barrier Kassoc − 1.

Proof. Suppose to the contrary that a spontaneous displacement can complete within barrier Kassoc − 1. Since the length of the long domain on the evading strand is LL, then the energy after all but the last base pair is broken is B + LL − 1 > B + Kassoc − 1 as LL > Kassoc. Thus, the energy barrier is at least Kassoc. Contradiction.

Theorem 21. The eb-ipfp-multi problem, namely the energy barrier for indirect pseudoknot-free folding pathway of multiple interacting strands problem, is PSPACE-complete.

Proof. Using the reduction described above, and by Theorem 16 and Theorem 12, given an arbitrary instance ψ of q3sat having n variables and m clauses, a DSD can be constructed with the discussed modifications, in time poly(m + n), with a fully specified nucleotide sequence that has poly(m + n) total bases overall, such that: (i) all template strands of the DSD are initially saturated, and (ii) the DSD will produce a special signal strand, syes, through a sequence of legal strand displacements if and only if ψ is satisfiable. Let B be the initial energy of the resulting DSD. We now show that any sequence of legal displacements follows an indirect folding pathway within energy barrier Kassoc − 1. This follows immediately from the construction (all templates are saturated) and by Lemma 18 as it guarantees each legal displacement is within the energy barrier and returns to the initial energy B (and all templates are again saturated). In the other direction, Lemmas 19–23 ensure that any sequence with at least one non-legal displacement must exceed the energy barrier Kassoc − 1.
Therefore, the folding pathway is within energy barrier Kassoc − 1 if and only if the DSD follows a sequence of legal displacements. Thus, ψ is satisfiable if and only if the strand syes can be displaced within energy barrier Kassoc − 1.

5.8 Chapter summary

In this chapter, we asked the question: can space and energy efficient computation be realized by chemical reaction networks (CRN) and DNA strand displacement systems (DSD)? We have shown this can be achieved in general by giving a logically reversible space efficient CRN implementation capable of solving any problem in PSPACE—the class of all problems solvable in polynomial space. Furthermore, our CRN can be realized by a space and energy efficient DSD. We have also shown how these results can be extended to solve any problem in SPACE. Thus, any computation that halts can be solved by a space and energy efficient DSD.

The only other DSD implementation capable of solving any problem in SPACE is the stack machine implementation of Qian et al. [98]. The result of this chapter improves upon the stack machine implementation in terms of space efficiency, as the stack machine uses space proportional to computation length. However, our result falls short in a number of other respects when compared with the stack machine. Our construction provides a non-uniform model of computation, and thus, as currently described, a new CRN and thus DSD, must be created for each different problem instance that must be solved. It is conceivable that the result can be generalized to solve any problem instance, up to a particular size. The stack machine implementation is Turing universal. Since our result is based on a non-uniform model of computation, it cannot simulate computations that do not halt, and is therefore not capable of Turing universal computation.

In addition to further characterizing the computational power of standard molecular programming systems, our result has a number of important consequences. For instance, we show that even determining if a certain state is reachable in a CRN, such as a desirable or undesirable configuration, is PSPACE-hard, effectively demonstrating the intrinsic complexity of model checking and formal verification of chemical reaction networks. We further show the problem is PSPACE-complete for restricted classes of CRNs, such as when the CRNs are proper or when the reaction volume is a closed system (i.e., tagged CRNs). The results also hold at the DSD level. In this chapter we once again reason concretely at the sequence level to consider folding pathways. We show that beginning with our q3sat solver construction we can prove that finding minimum energy barrier indirect folding pathways for multiple interacting strands is PSPACE-complete.

Chapter 6

Conclusion

Our research began with a desire to better understand the combinatorial nature of nucleic acid folding pathways between two secondary structures of the same nucleic acid strand. As folding pathways tend to avoid high-energy structures, a primary motivation was to understand and computationally predict low energy barrier folding pathways exhibited in biological systems. As with early studies of RNA structure prediction, we decided to focus on the simple energy model that corresponds to the number of base pairs of the involved structures. The reasoning for this choice was two-fold. First, this model is significantly simpler and remains sufficient to understand the complexity of the underlying combinatorial problem.
If the problem is hard in the simple energy model, it provides evidence that it is hard for more complex models. Second, if effective algorithms are developed in the simple energy model, then it is possible they could be adapted for more complex energy models. This was the case for the RNA structure prediction problem that was first studied with the simple energy model [89] and later improved to use the Turner energy model [79].  6.1  Predicting folding pathways  At the outset of this research, the computational complexity for this problem remained unknown. We began by studying direct folding pathways [83] where intermediate structures could only remove base pairs from the initial structure and only add base pairs from the final structure. In Chapter 2 we have shown that the energy barrier problem for direct pseudoknot-free folding pathways is NP-complete, via a reduction from the 3-partition problem. Thus, unless NP = P, there is no polynomial-time algorithm for calculating the energy barrier of direct folding pathways. The proof in Chapter 2 can help shed insight on energy landscapes. A property of the proof is that there are exponentially many partial folding pathways that are within the minimum energy barrier, however, by design, only one will lead to a full pathway with minimum energy barrier. Thus, if pathways are followed according to a random process, it could take exponential time for the random process to find the pathway with minimum energy barrier. In one view, this suggests that for certain instances, it would be much more informative to ask which is the most likely folding pathway. This would be appropriate when the relative barrier difference between many possibilities is small. In another view, this suggests folding pathways may be leveraged to perform non-trivial  109  6.1. Predicting folding pathways computation, especially if a guarantee can be made that the correct pathway has a significantly lower overall barrier than incorrect pathways. Unfortunately, our proof is deficient in forcing this separation between the correct pathway and other incorrect pathways. Specifically, the difference between a minimum energy barrier in the contrived construction of our proof, and an alternate incorrect folding pathway, may be some small constant. Therefore, while it remains hard to find a minimum energy barrier pathway, in the worst case, it may not be hard to find a close approximation. Our current result does not preclude this possibility. Shortly before this initial research had begun, interesting new directions were being explored in the field of DNA computing and molecular programming. In particular, DNA strand displacement systems (DSD) were designed and implemented to perform simple computations, among other tasks. These systems use multiple interacting strands and are designed such that a correct sequence of strand displacements follows a low energy barrier folding pathway, while incorrect displacement sequences must overcome a larger energy barrier. Furthermore, the underlying designed folding pathways for many of these initial systems were direct. Specifically, these early systems shared the common characteristic that any particular strand may be displaced once and may displace one other strand. Thus, there was a growing need to understand and predict folding pathways of multiple interacting strands, even for direct folding pathways. Such knowledge could be used in the design and debugging of new molecular programs that leverage folding pathways. 
From our result on the single strand case in Chapter 2, by restriction we were able to conclude that predicting direct folding pathways for multiple interacting strands is also an NP-complete problem. However, these initial complexity results did not resolve the complexity of the general energy barrier problem, in which the pathway need not be direct. Two challenges in understanding the complexity of this problem which needed to be considered were repeat base pairs—base pairs added and removed multiple times in a pathway—and temporary base pairs—base pairs not specified in the initial nor final structure but form temporarily in order to improve the energy barrier. Regardless of problem complexity, there was a need for an exact prediction algorithm that is efficient in practice. Prior to this research, all exact algorithms had time and space complexity that were exponential in the size of the input (length of nucleic acid strand(s)). In Chapter 3, we proposed an algorithm to exactly solve a generalized version of the direct energy barrier folding pathway problem that is defined in terms of bipartite graphs. The algorithm has an exponential worst case time complexity, but, importantly, it uses only polynomial space. The algorithm is practical for most instances tested in our empirical study, although it fails to solve some instances in a reasonable runtime. Moreover, the algorithm is inherently parallel, and this parallelism could be exploited to help solve hard instances. One important contribution of this work is a polynomial time algorithm that can split a problem instance into many smaller sub-instances. While we cannot avoid exponential worst case runtime 110  6.1. Predicting folding pathways with our method, our splitting algorithm may be of independent interest and could be used in conjunction with heuristic methods. For instance, it could be used to first partition the solution space into sub-problems, with the aim of improving both the efficiency and accuracy of the overall heuristic method used to solve each sub-problem. Our pathway prediction algorithm only considered single strands. For direct pathways, it seems that the algorithm could be generalized in a straightforward manner to consider multiple interacting strands. Such a generalization should also consider the entropic penalty for strand association. It seems plausible that additional nodes added to the corresponding bipartite graph instance could achieve this aim. Such an algorithm would be useful for verification of DNA strand displacement systems which follow direct folding pathways. Unfortunately, the algorithm does not seem immediately applicable for indirect folding pathways. The design of the algorithm explicitly assumes that the graph modeling the conflicts, between the arcs (representing base pairs) of the initial and final structures of a problem instance, is bipartite. In an indirect folding pathway, where any non-crossing arc forming a Watson-Crick base-pair can be added at any point along a pathway, the conflict graph is not necessarily bipartite (and unlikely to be in general). Still, it is possible that a better understanding of the structure of conflict graphs for indirect pathways could lead to a similar result. The conflict graphs formed for indirect folding pathways can be characterized as circle graphs. The conflict graphs for direct pathways are 2-colourable circle graphs (see Figure 3.8 for an example). 
For direct pathways, we were able to exploit the following property: if one could identify an MFE structure C consisting of arcs from both the initial and final structures, A and B respectively, then there always exists an optimal pathway from A to B via C. Could a generalized version of the algorithm proposed here be adapted for indirect pathways? Most properties exploited in the proofs are argued in terms of independent sets. Removing assumptions regarding the colourability of the graph would be a necessary first step. Interestingly, by proving that the algorithm of Chapter 3 is correct, we were also able to prove that repeat base pairs do not help in a direct folding pathway. This established that the direct-with-repeats energy barrier folding pathway problem is NP-complete for the single strand case and NP-hard for the multiple interacting strand case. However, these early results, even those that consider repeat base pairs, did nothing to resolve the complexity of predicting indirect folding pathways that permit temporary base pairs. As these prediction problems were computationally hard, what they did do was to motivate us to study the computational limits of DNA strand displacement systems (DSD) that leverage low energy barrier folding pathways. How appropriate then that it was a DSD construction we devise in Chapter 5 that served as the basis to show that predicting minimum energy barrier indirect folding pathways for multiple interacting strands is PSPACE-complete. However, our new construction shares a common deficiency with our original construction for the direct folding pathway prediction problem. Specifically, the minimum energy barrier pathway is only guaranteed to be a small constant 111  6.2. Designing folding pathways improvement over incorrect pathways. While the proof is sufficient to show the hardness of the problem in the worst case, it leaves open the possibility that there may exist a polynomial time constant factor approximation algorithm. It also underscores a significant issue, already well known to the community, that must be overcome in the design of DSDs: blunt-end displacements. In our folding pathway construction which is based on a sequence design for a particular DSD construction, we identified three types of illegal displacements. By using a more sophisticated sequence design, for two of these types of illegal displacements, we could ensure that the difference in energy between the minimum barrier pathway and any other pathway grows polynomially in the combined length of the strands. However, as a blunt-end displacement occurs with the use of an identical domain, a clever sequence design cannot improve the desired energy barrier separation. In these cases, the energy barrier separation is dictated fully by the length of toehold domains. By design, toeholds are always of constant length to ensure displaced strands can easily disassociate from template strands. While this is a significant issue for the design of DSDs, this does not preclude the possibility that another construction, not based on DSDs, could be found that gives a polynomial, or even logarithmic, separation between the minimum barrier pathway and all other pathways. Not only would this be informative for the prediction problem, it would also be an interesting future direction in the design of folding pathways for computation and other molecular programming tasks. Unfortunately, the complexity of predicting a minimum barrier indirect folding pathway of a single strand remains open. 
This was the original problem that motivated this entire line of research and is the most relevant problem for understanding folding pathways within a biological context. While other variants of the problem proved to be computationally hard, it remains possible that this problem is in P. It could be the case that the direct folding pathway problem is too constrained to be easy, while the increased complexity in the indirect case only arises when there are a polynomial number of strands. If a polynomial time algorithm for this problem emerges, a logical next step would be to extend the result to use a more sophisticated energy model [79].  6.2  Designing folding pathways  The complexity of predicting minimum energy barriers suggested to us that folding pathways may be a mechanism for performing non-trivial computation. From that perspective, we next aimed to understand the computational power of deterministic molecular programs that leverage folding pathways. In particular, we were interested in understanding the computational limits of DNA strand displacement systems (DSD) and more generally, the chemical reaction networks (CRN) that they implement. There was a particular need to understand space complexity in these models. Towards that end, in Chapter 4 we introduced the concept of a tagged chemical reaction network in order to account for changes to auxiliary strands and 112  6.2. Designing folding pathways complexes, often called fuel or transformers in our work, when reactions are implemented in a DSD. Specifically, each reaction i is assigned a unique tag, Ti . When the reaction occurs in the forward direction, Ti is consumed and, if the reaction is reversible, a new tag TiR is produced. Should the reaction need to occur again in the forward direction, then either another copy of the tag must be available or the reaction must first be reversed in order to consume the tag TiR and re-produce the tag Ti . This simple mechanism allows us to reason concretely about the required quantity of molecules necessary to complete a computation. In the context of molecular programs operating in a closed reaction volume, space can be thought of as the necessary size of that volume to fit all molecules necessary to complete a computation, inclusive of tags which represent fuel/transformers. Can a biological soup of nucleic acids having total size poly(n) perform a computation, by means of a folding pathway, of Θ(2n ) steps? In Chapter 4 we demonstrated that yes, this is possible, by first giving a tagged CRN for an n-bit Gray code counter that deterministically advances through 2n states while only requiring Θ(n) total molecules and then showing how it can be implemented as a DSD using poly(n) space. This implementation introduced the concept of recycling, or molecule reuse, in strand displacement systems and chemical reaction networks. To our knowledge, in addition to being the first molecular program to significantly recycle molecules/strands, this is the first example of a designed indirect folding pathway that has length exponential in the number of nucleotides of the interacting strands. In developing our result, we also introduced the use of a mutex strand to an existing construction of Qian et al. [98] to ensure that any chemical reaction can be realized by a DNA strand displacement cascade that appears to execute atomically. In the least, this contribution provides a direct correspondence from a tagged CRN to a DSD implementation and as a result, greatly simplified our correctness proofs. 
Furthermore, it motivated us to continue reasoning at the more abstract level of CRNs. While the Gray counter demonstrated that space-efficient molecular programming is possible, we next asked if all space-bounded Turing machine computations could be realized by strand displacement systems whose space and expected time are within a (small) polynomial factor of the space and time of the Turing machine computation. In Chapter 5, we gave a logically reversible space efficient CRN implementation of a quantified Boolean formula solver capable of solving any problem in PSPACE—the class of all problems solvable in polynomial space. Furthermore, our CRN can be realized by a space efficient DSD. We have also shown how these results can be extended to solve any problem in SPACE. Thus, any computation that halts can be solved by a space efficient DSD. The only other DSD implementation capable of solving all problems in SPACE is the stack machine implementation of Qian et al. [98]. The model of their result is a variant of the DSD model in which reactions not only produce and consume signal strands, but can also extend or reduce the number of base units at one end of a polymer. The result of Chapter 5 improves upon the 113  6.2. Designing folding pathways stack machine implementation in terms of space efficiency, as the stack machine uses space proportional to computation length. However, our result falls short in a number of other respects when compared with the stack machine. Our construction provides a non-uniform model of computation, and thus, as currently described, a new CRN and thus DSD, must be created for each different problem instance that must be solved. It is conceivable that the result can be generalized to solve any problem instance, up to a particular size. The stack machine implementation is Turing universal. Since our result is based on a nonuniform model of computation and uses a reduction from a Turing machine to a quantified Boolean formula, it cannot simulate computations that do not halt, and is therefore not capable of Turing universal computation. In addition to further characterizing the computational power of standard molecular programming systems, we considered a number of related problems in Chapter 5. We showed that determining if a certain state is reachable in a CRN, such as a desirable or undesirable configuration, is PSPACE-hard. This demonstrates the intrinsic complexity of model checking and formal verification of chemical reaction networks. We further showed that the problem is PSPACEcomplete for restricted classes of CRNs, such as when the CRNs are proper— each reaction produces the same number of molecules it consumes—or when the reaction volume is a closed system (i.e., the CRN is tagged ). The results also hold at the DSD level. Aside from the potential biological and chemical applications, DSDs and CRNs are also of independent interest due to their promise for realizing energy efficient computation. Rolf Landauer proved that logically irreversible computation — computation as modeled by a standard Turing machine — dissipates an amount of energy proportional to the number of bits of information lost, such as previous state information, and therefore cannot be energy efficient [69]. Surprisingly, Charles Bennett showed that, in principle, energy efficient computation is possible, by proposing a logically reversible universal Turing machine and identified nucleic acids (RNA/DNA) as a potential medium for reversible computation [8]. 
However, this remained a theoretical result with no known physical implementation. It was Qian et al. [98] who first demonstrated that energy efficient computation could be realized, in principle, with a logically reversible DSD system that simulates a stack machine. Our quantified Boolean formula solver of Chapter 5 is also logically reversible. Thus, we have demonstrated that any space-bounded Turing machine computation can be realized, in principle, by a space and energy efficient CRN and DSD. However, our CRN implementations throughout this thesis share a common assumption with the stack machine implementation: certain initial signal molecules must occur as a single copy. Initially, this assumption served to simplify the discussion. However, in Chapter 4 we have shown that this assumption is actually crucial to achieve space-efficient computation. We have shown that for any proper CRN, any signal molecule can be produced using just O(n2 ) reaction steps when Θ(n) copies of the initial signal molecules share the same volume. This result has since been improved by others to consider more general classes of CRNs [27]. We have also shown a much stronger result for determinis114  6.2. Designing folding pathways tic computations in a closed volume. Specifically, even having a second copy of the initial input signals ensures that no tagged CRN can be deterministic after a linear number of steps. The intuition as to why the single copy assumption is important is that it gives us a means to erase information. In a single copy setting, once a molecule of a particular type is consumed, it is no longer present. In a multi-copy setting, once a molecule of a particular type is consumed, there is no guarantee that the other copies are simultaneously consumed. While the single copy restriction permitted us to study the very limits of computation for a biological soup, it imposes a significant engineering challenge. All DSD implementations to-date use concentrations of strands of each type. Producing and successfully executing a DSD with a single copy restriction is currently challenging, but feasible. For instance, the first published result on the measurement of a single enzyme molecule was by Boris Rotman, in 1961 [107]. The experimental techniques developed in that first paper are still influential and in use, and new advancements in single molecule studies continue to be made [62]. Our results also hold at the more general level of chemical reaction networks. Thus, any physical realization of a CRN could, in principle, make use of our constructions. Furthermore, the problems in a multi-copy system only arise when signals from one copy interfere, or cross-talk, with signals from another copy. If one copy could be compartmentalized from another, the challenge could be overcome. This may involve a move away from a biological soup, and back to a strictly surface based model [14], or some hybrid, possibly involving recent advances in DNA origami [105]. If all signals from each copy were tethered to a surface, and separated from other copies, then reactions could proceed as expected. This is not our idea. It has been suggested as a means to improve the speed of DSD reactions by co-locating related strands [19, 100]. Should a practical means be developed to address the single-copy issue, then a more rigorous study of logically reversible CRNs is appropriate. In the course of our research it became clear that certain techniques could be used when developing a logically reversible CRN. 
For instance, the 3sat verification procedure could be reliably executed, much like a subroutine, by producing signal molecules necessary for either its initial or its final reaction, and by ensuring that signals produced and consumed by intermediate reactions involved only signals local to that procedure. Furthermore, a common technique we used was to add an additional reaction to effectively double the length of a computation by forcing the original chain of reactions to reverse. This permitted us to actively recycle molecules/strands and was evident in both our Gray code counter of Chapter 4 and also in our quantified Boolean formula solver of Chapter 5. These techniques could conceivably be extended into a formal grammar for programming logically reversible CRNs. Finally, we find the current complexity classes for logically reversible computation too general to capture the realities of logically reversible molecular programming. The class ReversibleSPACE represents all problems that can be solved by a space-bounded logically reversible Turing machine. As with any Turing machine, the space bound is with respect to the length of tape necessary 115  6.2. Designing folding pathways to complete the computation. In CRNs and DSDs, bits of information are represented with the presence and absence of signal molecules. Thus, the length of tape required in the Turing machine computation corresponds well with the maximum quantity of signals required during the CRN computation. However, this does not account for fuel (transformers) that a CRN may require to complete its computation. The reaction is the fundamental operation in a CRN just as a state transition is the fundamental operation for a Turing machine. However, with current technology, a reaction in a CRN requires fuel, which in turn requires physical space, whereas a Turing machine state transition does not. In essence, a logically reversible Turing machine could perform all state transitions in only one direction, while still using significantly less space than the number of computation steps. This is not currently possible in molecular programming. We have demonstrated that any space-bounded computation can be realized with a logically reversible tagged CRN that requires only one tag per reaction equation. In essence, our logically reversible CRN strictly alternates its reactions in the forward and reverse direction. It is conceivable that we could simulate our CRN with a logically reversible Turing machine. It is also conceivable that our simulation could be constructed to ensure that each state transition of the Turing machine either strictly alternates being applied in the forward and reverse direction, or adheres to a polynomial bound in the difference between forward and reverse transitions, at every step of the computation. Should such a construction be possible, we will have given a logically reversible Turing machine, capable of simulating any space-bounded Turing computation, that is semantically restricted to capture the notion of fuel. We let ReversibleSPACE denote the class of problems solvable by such a Turing machine. It has already been shown by Lange et al. [70] that ReversibleSPACE = SPACE. In future work, it is our goal to show that ReversibleSPACE = SPACE.  
Part II

Space efficient text indexes motivated by biological sequence alignment problems

Chapter 7

Introduction

7.1 Text indexing

The study of strings, their properties, and associated algorithms has played a key role in advancing our understanding of problems in areas such as compression, text mining, information retrieval, and pattern matching, amongst numerous others. A most basic and widely studied question in stringology asks: given a string T (the text) how many occurrences and in what positions does it contain a string P (the pattern) as a substring? It is well known that this problem can be solved in time proportional to the lengths of both strings [63]. However, it is often the case that we wish to repeat this question for many different pattern strings and a fixed text T of length n over an alphabet of size σ. The idea is to create a full-text index for T so that repeated queries can be answered in time proportional to the length of P alone. It was first shown by Weiner [139] in 1973 that the suffix tree data structure could be built in linear time for exactly this purpose. The ensuing years have seen the versatility of the suffix tree as it has been demonstrated to solve numerous other related problems. While suffix trees use O(n) words of space in theory, this does not translate to a space efficient data structure in practice. For this reason, Manber and Myers [78] proposed the suffix array data structure. Though a great practical improvement over suffix trees, the Ω(n log n) bit space requirement (we use log to denote log2 throughout) is often prohibitive for larger texts.

Building in part on the pioneering work of Jacobson [56] on succinct data structures, two seminal papers helped usher in the study of so-called succinct full-text indexes. Grossi and Vitter [45] proposed a compressed suffix array that occupies O(n log σ) bits; the same space required to represent the original string T. Soon after, Ferragina and Manzini [33] proposed the FM-index, a type of compressed suffix array that can be inferred from the Burrows-Wheeler transform (BWT) of the text and some auxiliary structures, leading to a space occupancy proportional to nHk(T) bits, where Hk(T) denotes the kth order empirical entropy of T (for a text T of length n, the term Hk(T) · n is a lower bound on the number of bits required to encode T with any algorithm that uses contexts of length at most k). BWT is a reversible transformation that produces a permutation of the original text (T bwt) which is more easily compressible with the use of local compressors such as run length encoding (often preceded by the Move-to-front transform [10]). The BWT works by (1) creating a conceptual matrix where each row is a different cyclic rotation of the original text (with an appended special character, lexicographically smaller than any character from the alphabet); (2) sorting the rows in lexicographical order; and (3) outputting the last column of the sorted matrix. The BWT is useful for compression as it often produces successive runs of the same character. For instance, consider a text over the English language containing many instances of the words this, that, there, the, those, etc. Cyclic rotations beginning with the letter h would have a high probability of ending with the letter t. Thus, after sorting, the transformed text, taken from the last column, would likely contain runs of the letter t. The Burrows-Wheeler transformed text T bwt is related to the original text by the so-called LF mapping. Thus, as Ferragina and Manzini showed, any queries on the compressed representation of T bwt can be interpreted as queries on T itself (and conversely). This is the basic idea behind most succinct self-indexes proposed thus far [88]. We note that the success of succinct data structures in general relies on efficient operations on bit vectors, and is due to the seminal work of Jacobson [56] and Munro et al. [86].
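To make the transform concrete, here is a small naive sketch (added for illustration; it is not how the compressed indexes discussed above compute the BWT in practice) that follows the three steps just described. The example text is an assumption, chosen only to exhibit the run of t's produced by rotations beginning with h.

```python
import re

# Naive Burrows-Wheeler transform: build every cyclic rotation of the text
# (with an appended sentinel $ that sorts before any letter), sort the
# rotations, and output the last column of the sorted matrix.
def bwt(text):
    text = text + "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

T = "this that the there those"        # made-up example text
transformed = bwt(T)
print(transformed)

# Every rotation beginning with 'h' ends with the 't' that preceded it in T,
# and those rotations are consecutive in sorted order, so the transform
# contains a run of five t's -- exactly the kind of run that run-length
# encoding (after a Move-to-front pass) compresses well.
print(max(m.group(0) for m in re.finditer(r"t+", transformed)))   # ttttt
```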
For an in-depth discussion, the reader is referred to the work of Mäkinen and Navarro [76]. For details on the LF mapping, BWT or compressed full text indexes in general, the reader is referred to the excellent review by Navarro and Mäkinen [88]. These and subsequent results have made it possible to answer efficiently the substring question on texts as large, or larger, than the Human genome.

7.2 Biological sequence alignment

The Human Genome Project has enabled a revolutionary step forward in understanding our genes and their function. A significant next challenge is to understand genome variation across individuals and its correlation with disease, as well as genomic mutations and rearrangements in cancerous cells. Since at least two reference human genome sequences are now available, de novo assembly of a genome of interest from short fragments — inferring a linear genome sequence from a collection of shorter DNA fragments called reads — is no longer required in most human genomic studies. Instead, current studies focus on resequencing, that is, inference of the genome of interest by alignment of the reads, produced by sequencing the genome, to the available reference genomes (see Figure 7.1). The actual information sought is not the canonical sequence of the genome of interest, but rather, how does it differ from a known reference? For example, single nucleotide variations (SNVs) in an individual's genome (compared with the wild type or reference genome) have been identified as significant in many types of human cancer [13, 40, 48, 93, 103, 144] (see Figure 7.1). These discoveries are enabling the development of novel methods for disease diagnosis and therapy [22].

Figure 7.1: An example of short reads aligned to a reference genome G. Alignments may contain matches, mismatches, insertions and deletions. For instance, the alignment of the single read to the reference (red outline) contains a match in the first position of the alignment, a mismatch in the second, an insertion in the third position and a deletion in the twelfth position. Sequencing the genomes of individuals helps determine genetic mutations, such as single nucleotide polymorphisms/variations (SNPs/SNVs) of individuals compared to a reference genome.

Fueling the discovery of genetic variation amongst populations and individuals has been the application of next generation sequencing technology (NGS). While the technologies underlying competing NGS platforms vary, all share significant differences from traditional, Sanger style sequencing [57]. The new technologies focus on massively parallel sequencing and are capable of producing millions of reads in a typical run [53, 57]. While the sheer quantity of reads and overall bases which can be sequenced in a given time frame are vastly greater
than traditional Sanger style sequencing [118], there are at least two caveats. First, NGS reads are short, typically between 28 and 300 bases, dependent on the specific platform (compared to ∼1,000 base reads of Sanger style sequencing) [118]. Second, NGS reads are more prone to sequencing errors, whereby the reported sequence of a read differs from the true sequence of the DNA molecule; these differences can be characterized by the common string edit operations of substitution, insertion and deletion. The rate and type of sequencing errors is also dependent on the platform employed. Both of these features of NGS reads, coupled with the scale of the produced data, can confound the task of efficiently aligning reads to a reference genome and have forged it as one of the most actively researched problems in contemporary bioinformatics.

NGS is also being utilized to capture data from the transcriptome; a process referred to as RNA-Seq [84]. Instead of sequencing genomic DNA, RNA-Seq aims to sequence the complementary DNA (cDNA) of RNA molecules in a cell. Transcriptome read alignment is providing valuable information to researchers, beyond genomic sequencing. In particular, this technology can be used to quantify the level of expression of various transcripts by sequencing messenger RNA, thus implicating the relative expression level of proteins. For instance, a highly expressed transcript should yield higher read coverage than a poorly expressed transcript. This technology has also been used to elucidate RNA editing events, exon boundaries and novel alternative splice junctions [96, 132, 138].

7.3 Objectives

The use of full-text indexing has played a crucial part in advancing our understanding of biological sequence data. The abundance and production rate of high-throughput sequencing data and the size of genome sequences all but made necessary the use of succinct text indexes—those using space proportional to the information theoretic lower bound. Indeed, the more recent aligners, including Bowtie [71] and BWA [73] amongst numerous others, make use of succinct self-indexes (a text index is a self-index if it does not require the original text to be stored but can provide efficient access to any substring of the original text). As memory resources on commodity hardware have steadily increased, succinct indexes are no longer necessary for many of these applications when large memory machines are available. However, even when sufficient memory is available to use non-succinct text indexes, use of succinct indexes often results in more efficient solutions due to better caching performance as one would expect in non-uniform memory architectures (NUMA).

Rather than add to the growing number of new tools that improve sequence alignment efficiency through the use of various heuristics and implementation techniques, my interest lies in adapting and improving the underlying full-text index data structures to better model biological reference sequences such as genomes. In particular, I have identified two specific types of biological sequence events, alluded to in the previous section, that are not naturally captured by a static, linear text. The first is the presence of known variations, amongst a population, in specific positions of a reference genome. The second is known splicing events between positions in a reference genome that form part of a common transcript sequence.
More details of these events and how they are incorporated into a full-text index are given in subsequent chapters. The solutions proposed are theoretically rigorous in the sense that all time and space complexity claims are formally proved. My focus in proposing these indexes was to improve various time and space trade-offs compared with existing solutions found in the literature. While these contributions are made and studied from a theoretical perspective, the hope is that incorporating this information directly into the primary indexes will result in more biologically meaningful alignments and/or improve the overall efficiency of the alignment problem. As discussed in the conclusion of this part, additional steps are necessary for these results to have a practical impact, such as adapting query algorithms for approximate matching. While my motivation arises from these particular applications in biological sequence alignment, the results I give in this part are more general and may have applications in other problem domains.  7.4  Contributions  We now describe the contributions of this part of the thesis, in the order they are discussed: 1. We propose a compressed full-text dictionary that can index a set of text segments in space proportional to the compressed size of their concatenation while still supporting a number of efficient query operations. These include determining which text segments are contained within a query pattern and which contain or prefix a query pattern. 2. We propose new succinct indexes for text containing wildcards that improve the space complexity compared with the existing state-of-the-art at the time they were proposed without increasing query time complexity. Independently and in parallel with another group, we give the first compressed index for this problem. We also show how our results can be combined with the results proposed independently to improve the stateof-the-art for this problem. 3. We propose a new query algorithm, based on dynamic programming, for indexes of text with wildcards that significantly improves the query working space complexity over existing solutions. The algorithm is fairly general and easily adapted for use with other indexes that use the same general strategy of pattern matching in text with wildcards. 4. We show a correspondence between the wildcard and hypertext indexing problems by demonstrating that standard strategies and techniques for solving the former can be generalized to solve the latter.  122  7.5. Outline 5. We propose the first index for hypertext, a graphical generalization of text. We first propose a succinct index and later show how the index only requires space proportional to the length of the compressed text and topology of its graphical structure. We also study a number of interesting restrictions for hypertext.  7.5  Outline  Each chapter of this part builds on results from previous chapters. Chapter 8 introduces much of the notation and existing results from the literature that are leveraged throughout the part. Subsequent chapters develop notation and introduce other existing results as needed, in a cumulative manner. The first technical result presented in this part is the design of the full-text dictionary index and is presented in Chapter 8. This result is leveraged in the design of our other proposed text indexes. 
While it is unnecessary to understand the complete implementation details of the full-text index in order to understand the wildcard or hypertext index, it is critical to understand its supported query operations as stated in Theorem 22. In Chapter 9 we develop indexes for text containing wildcards and the associated query algorithms for exact matching. Chapter 10 details indexes for hypertext, a graphical generalization of linear text. In Chapter 11 we summarize our contributions of this part and highlight a few of the more important open problems arising from this work.  123  Chapter 8  A compressed full-text dictionary 8.1  Introduction  A full-text index 30 is a data structure that can efficiently determine the positions, in a fixed string T (the text), where an arbitrary query string P (the pattern) appears as a substring. Such an index is useful, for instance, to search an electronic book repeatedly for different words, or a genome sequence for biological sequence signatures. A dictionary is a data structure for a fixed and ordered collection D = (T1 , T2 , . . . , Td ) of strings called text segments, that can efficiently determine all occurrences of all text segments that appear as a substring in an arbitrary query string P ; these are called dictionary matches. Such an index is useful, for instance, to search many email messages for all occurrences of a fixed collection of keywords associated with spam31 . The entirety of this chapter is devoted to describing a new data structure called a compressed full-text dictionary that combines the features of a full-text index with those of a dictionary, and uses space roughly equal to that of just a compressed fulltext index. Specifically, we are interested in the benefits of a full-text index for a string T and also the ability to perform dictionary matches when T is composed of a number of text segments delimited by a special character. Such a data structure can be used to improve the space complexity of current approaches for indexing text containing wildcards (see Chapter 9). Note that it is always possible to create a compressed full-text dictionary of any ordered collection of text segments, by first concatenating them using a special character as a delimiter. In Chapter 10, we show this approach is useful for creating a compressed index for hypertext, a generalization of linear text. Content from this chapter appears in the proceedings of the 22nd Annual Symposium on Combinatorial Pattern Matching (CPM 2011) [127] and the journal Theoretical Computer Science [128]. 30 The term full-text index is used in the string community to denote a data structure that supports efficient substring query operations over a text. Constructing such a data structure is referred to as indexing the text and the data structure itself is referred to simply as an index for the text. 31 The term spam is used to describe unsolicited and indiscriminate messages sent in bulk as electronic mail.  124  8.2. Preliminaries  8.1.1  Related work  A succinct dictionary was used as a subsidiary data structure in the approach of Tam et al. [124] who proposed the first succinct index for text containing wildcards. Very recently, Belazzougui [6] proposed a compressed dictionary based on the Aho-Corasick automaton having optimal query time. The compressed space occupancy was further improved by a modification given by Hon et al. [50]. 
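Before turning to how these existing dictionaries fall short for our purposes, it is worth pinning down the query semantics informally described above. The following minimal Python sketch is illustrative only (naive quadratic scans and invented function names, not the compressed structures of this chapter): it spells out full-text locating, dictionary matching within a pattern, and reporting the text segments that a pattern prefixes.

```python
# Naive reference semantics for the queries combined by a full-text dictionary.
# Illustrative only: real indexes avoid this O(n*m) scanning entirely.

def locate(text, pattern):
    """Full-text index semantics: all positions where pattern occurs in text."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text[i:i + len(pattern)] == pattern]

def dict_matches(segments, pattern):
    """Dictionary semantics: (position, segment index) pairs where a segment occurs in the pattern."""
    hits = []
    for j, seg in enumerate(segments):
        hits.extend((i, j) for i in range(len(pattern) - len(seg) + 1)
                    if pattern[i:i + len(seg)] == seg)
    return hits

def prefixed_by(segments, pattern):
    """Segments that have the whole pattern as a prefix (substring-match side)."""
    return [j for j, seg in enumerate(segments) if seg.startswith(pattern)]

segments = ["aa", "aca", "a", "aa", "cacc", "ac"]
print(locate("mississippi", "iss"))      # [1, 4]
print(dict_matches(segments, "acacc"))   # occurrences of segments inside the pattern
print(prefixed_by(segments, "ac"))       # [1, 5]: 'aca' and 'ac' are prefixed by "ac"
```

The compressed full-text dictionary developed in this chapter is designed to answer all of these query types, and more, within roughly the space of a single compressed full-text index.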
While these results are impressive and interesting in their own right, the wildcard matching problem and hypertext matching problem, discussed in subsequent chapters, benefit from a full-text dictionary that can report the text segments contained in P (dictionary matches), as well as the text segments which are prefixed by P and also fully contain P (substring matches).  8.2  Preliminaries  We first develop notation that will be used throughout this chapter and the subsequent chapters of this part. We also state useful lemmas for fundamental and well-known succinct data structures that our results employ as subsidiary data structures. In addition to the space complexity of these data structures, we give the time complexity for the relevant operations we perform on them. It is not necessary to understand the details of how these subsidiary data structures support the listed operations in order to understand our results. However, we point the reader to relevant literature should the details of such data structures be of interest. Unless otherwise stated, equivalent or improved versions of these subsidiary data structures can be substituted in the development of our data structures. Let T [1, n] be a string over a finite alphabet Σ of size σ. We denote its j th character by T [j] and a substring from the ith to the j th position by T [i..j]. We assume that an end-of-text sentinel character $ ∈ / Σ has been appended to T (T [n] = $) and $ is lexicographically smaller than any character in Σ. For any substring X we use |X| to denote its length and X to denote its reverse sequence. The suffix array SA of T is a permutation of the integers [1, n] giving the increasing lexicographical order of the suffixes of T where SA[i] = j means that the ith lexicographically smallest suffix of T begins at position j. Conceptually, SA can be thought of as a list of all suffixes of T in lexicographic order. For example, Figure 8.1(c) gives the sorted list of the twelve suffixes of the string mississippi$. A string X has a suffix array (SA) range [a, b] with respect to SA if a − 1 suffixes of T are lexicographically smaller than X and b − a + 1 suffixes of T contain X as a prefix. If a > b the range is said to be an empty SA range and X does not exist as a substring of T . Consider the sorted suffixes for the string mississippi$ shown in Figure 8.1(c). The query string iss has the SA range [4, 5] as there are three lexicographically smaller suffixes, and exactly two suffixes are prefixed by iss. In our full-text dictionary, we will not construct the suffix array for T .  125  8.2. Preliminaries Rather, we will use a compressed suffix array CSA of T . A compressed suffix array is a space efficient representation of both the string T and also the suffix array for T . Many compressed suffix array implementations store a representation of T BWT , the Burrows-Wheeler transform of T . Determining T BWT for a string T can be thought of in this way: (i) create a conceptual matrix of all cyclic rotations of T , (ii) sort all rows of the matrix into lexicographic order, and (iii) output the last column of the conceptual matrix. This process is illustrated in Figure 8.1.  
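These definitions can be checked directly on the running example. The short Python sketch below is purely illustrative (it builds the plain, uncompressed suffix array by sorting, which the data structures in this chapter deliberately avoid): it recovers the SA range [4, 5] of iss in mississippi$ and derives the Burrows-Wheeler transform from the sorted cyclic rotations, following steps (i)–(iii).

```python
# Minimal sketch (1-based reporting to match the text): suffix array of
# T = mississippi$, SA range by binary search, and the Burrows-Wheeler
# transform obtained from the sorted cyclic rotations.
from bisect import bisect_left, bisect_right

T = "mississippi$"          # '$' sorts before every other character in ASCII
n = len(T)

# Suffix array: starting positions of suffixes in lexicographic order (1-based).
SA = sorted(range(1, n + 1), key=lambda j: T[j - 1:])

def sa_range(pattern):
    """1-based SA range [a, b] of suffixes prefixed by pattern; a > b means empty."""
    suffixes = [T[j - 1:] for j in SA]
    a = bisect_left(suffixes, pattern)
    b = bisect_right(suffixes, pattern + "\x7f")   # '\x7f' exceeds every character used in T
    return a + 1, b

print(sa_range("iss"))       # (4, 5): exactly two suffixes are prefixed by "iss"

# Burrows-Wheeler transform via sorted cyclic rotations, as in steps (i)-(iii).
rotations = sorted(T[i:] + T[:i] for i in range(n))
bwt = "".join(rot[-1] for rot in rotations)
print(bwt)                   # ipssm$pissii
```

In the compressed setting, the binary search used here is replaced by backward search over T^BWT, as described next.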
mississippi$ ississippi$m ssissippi$mi sissippi$mis issippi$miss ssippi$missi sippi$missis ippi$mississ ppi$mississi pi$mississip i$mississipp $mississippi  $mississippi i$mississipp ippi$mississ issippi$miss ississippi$m mississippi$ pi$mississip ppi$mississi sippi$missis sissippi$mis ssippi$missi ssissippi$mi  $mississippi i$mississipp ippi$mississ issippi$miss ississippi$m mississippi$ pi$mississip ppi$mississi sippi$missis sissippi$mis ssippi$missi ssissippi$mi  (a) List cyclic rotations  (b) Sort cyclic rotations  (c) Output last column  Figure 8.1: The Burrows-Wheeler transform of a string T = mississippi$ is T BWT = ipssm$pissii. Compressed suffix array implementations that rely on the Burrows-Wheeler transform make use of the so-called LF -mapping, that relates characters in the last column of the conceptual transform matrix, to characters in the first column. Specifically, the ith occurrence of a character c ∈ Σ in the last column corresponds to the ith smallest suffix that begins with character c. For example, Figure 8.2(b) shows how the third and fourth occurrence of the character ‘i’ in T BWT corresponds to the third and fourth smallest suffix that begins with character ‘i’. These implementations search for a match of a pattern P [1 . . . m] by first finding the suffix array range [sp, ep] for the string P [m − 1 . . . m]. If [sp, ep] is not an empty range, then a new range is determined for the string P [m−2 . . . m], and so on. In this way, patterns are searched backwards and the search algorithm is appropriately called backward search [34]. An example of extending a match from the pattern ‘s’ to the pattern ‘is’ is shown in Figure 8.2. The idea is to find, in the current SA range of the string T BWT , the first and the last occurrence 126  8.2. Preliminaries of the character that will extend the pattern (in this case ‘i’). The LF -mapping can then be used to update the current SA range to point to all suffixes prefixed by the extended pattern (‘is’). Knowledge of how this algorithm works is not necessary to understand our result; however, for details of the algorithm and related topics we refer the reader to the excellent review by Navarro and M¨ akinen [88].  $mississippi i$mississipp ippi$mississ issippi$miss ississippi$m mississippi$ pi$mississip ppi$mississi sippi$missis sissippi$mis ssippi$missi ssissippi$mi  $mississippi i$mississipp ippi$mississ issippi$miss ississippi$m mississippi$ pi$mississip ppi$mississi sippi$missis sissippi$mis ssippi$missi ssissippi$mi  $mississippi i$mississipp ippi$mississ issippi$miss ississippi$m mississippi$ pi$mississip ppi$mississi sippi$missis sissippi$mis ssippi$missi ssissippi$mi  (a)  (b)  (c)  Figure 8.2: Performing backward search to find the SA range of the string ‘is’ from the SA range of the string ‘s’, using T BWT , the Burrows-Wheeler transform of text T . (a) The current match and SA range for ‘s’. (b) All occurrences of character i in T BWT within the current SA range are identified. (c) The LF mapping is used to update the SA range to the new match ‘is’. Our full-text dictionary can be made to utilize any compressed suffix array that supports the LF -mapping, and thus backward search. However, we restrict our attention to an implementation based on the wavelet tree representation [46] of T BWT whose properties are exploited in Section 9.6 to further reduce the required space of the overall index we propose for text containing wildcards. Lemma 24 (M¨ akinen & Navarro [76]). 
A compressed suffix array CSA, based on the wavelet tree of T BWT , can be represented in nHk (T ) + o(n log σ) bits of space, for any k ≤ α logσ n − 1 and 0 < α ≤ 1, such that the operation rankc (T BWT , i) which counts the occurrences of character c up to position i in T BWT and also the LF operation are supported in time O(log σ), the suffix array range of every suffix of a string X can be computed in O(|X| log σ) time, and each match of X in T can be reported in an additional O(log1+ n) time, for any > 0, where T is a text of length n over an alphabet of size σ. 127  8.3. Overview of the full-text dictionary In our full-text dictionary construction, we also make use of the following well known data structures. Lemma 25 (Raman et al. [101]). A bit vector B of length n containing d 1 bits can be represented in d log nd + O(d + n logloglogn n ) bits to support the operations rank1 (B, i) giving the number of 1 bits appearing in B[1..i] and select1 (B, i) giving the position of the ith 1 in B in O(1) time. d  Lemma 26 (Grossi & Vitter [45]). An array L of d integers where i=1 L[i] = n can be represented in d( lg(n/d) + 2 + o(1)) bits to support O(1) time access to any element. Lemma 27 (Munro & Raman [85]). A sequence BP of d balanced parentheses can be represented in (2+o(1))d bits of space to support the following operations in O(1) time: rank( (BP, i), select( (BP, i), and similarly for right parentheses, as well as: • findclose(BP, l) (findopen(BP, r)): index of matching right (left) parenthesis for left (right) parenthesis at position l (r) • enclose(BP, i): indices (l, r) of closest matching pair to enclose (i, findclose(BP, i)) if such a pair exists and is undefined otherwise The matching statistics for a string X with respect to T is an array ms of tuples such that ms[i] = (qi , [ai , bi ]) states that the longest prefix of X[i..|X|] that matches anywhere in T has length qi and suffix array range [ai , bi ]. Very recently Ohlebusch et al. [91] showed matching statistics can be efficiently computed with backward search if CSA is enhanced with auxiliary data structures using O(n) bits to represent so-called longest common prefix (lcp) intervals (cf. [91]). We leverage this result in the design of our compressed full-text dictionary and its search algorithm. Lemma 28 (Ohlebusch et al. [91]). The matching statistics of a pattern X with respect to text T over an alphabet of size σ can be computed in O(|X| log σ) time given a compressed enhanced suffix array of T .  8.3  Overview of the full-text dictionary  The data structure we propose in this chapter can be built based on an already existing string that contains text segments as substrings, or based on an ordered list of d text segments. Suppose we are given a string T that contains d text segments. Specifically, let T = φk1 T1 φk2 T2 φk3 T3 . . . φkd Td φkd+1 $ be a string over an alphabet Σ ∪ {φ}, followed by the traditional end-of-text sentinel $, having total length n. We define φ to be lexicographically smaller than any c ∈ Σ and $ to be lexicographically smaller than φ. We call the character φ a delimiter, and let φki denote the ith group, or run, of delimiter characters having length ki ≥ 0, for 1 ≤ i ≤ d + 1. The string T is defined to contain exactly d text segments— maximal substrings that do not contain a delimiter character. By definition, 128  8.3. Overview of the full-text dictionary text segments must be separated by a run of one or more delimiter characters. 
Therefore, φki , the delimiter group separating text segments Ti−1 and Ti , must have length ki > 0, for 1 < i ≤ d. In this case, the underlying ordered list of text segments is D = (T1 , T2 , . . . , Td ). Suppose we are given an ordered list D = (T1 , T2 , . . . , Td ) of d text segments. Then we can construct a string T = T1 φT2 φ . . . Td $, that is a serialization of all text segments in D delimited by the character φ. In either case, we will begin our construction with a string T of length n containing the d text segments of D = (T1 , T2 , . . . , Td ), delimited by at least one φ character.  (2,aa) (3,aa)  (4,ac) (5,aca)  (6,caca)  i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  ME 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0  MB 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 1 0  BP ( ( ) ( ( ) ) ) ( )  (1,a)  $ ϕaϕaaϕcaccϕac$ ϕaaϕacaϕaϕaaϕcaccϕac$ ϕaaϕcaccϕac$ ϕac$ ϕacaϕaϕaaϕcaccϕac$ ϕcaccϕac$ aϕaϕaaϕcaccϕac$ aϕaaϕcaccϕac$ aϕacaϕaϕaaϕcaccϕac$ aϕcaccϕac$ aaϕacaϕaϕaaϕcaccϕac$ aaϕcaccϕac$ ac$ acaϕaϕaaϕcaccϕac$ accϕac$ c$ cϕac$ caϕaϕaaϕcaccϕac$ caccϕac$ ccϕac$  Figure 8.3: A compressed full-text dictionary for the ordered list of text segments (aa, aca, a, aa, cacc, ac). The first three columns give a conceptual representation of the full-text dictionary. The second column shows the sorted suffixes of the serialized string T = φaaφacaφaφaaφcaccφac$ representing the text segments. The third column contains the array i indicating the sorted lexicographic rank of each suffix of T . The first column shows the SA ranges of the text segments and their containment relationship. Each text segment SA range is labeled by (lex id, segment) pairs. Shown in the last three columns are actual data structures used in the full-text dictionary representation: the ME array which marks the end of one or more text segment SA ranges, the MB array which marks the beginning of each text segment SA range, and the BP array that represents the containment of text segment SA ranges (their tree topology). Three different queries (shaded intervals) are shown with their corresponding smallest enclosing text segment SA range (if any) marked in the BP array.  129  8.3. Overview of the full-text dictionary  8.3.1  The lex id of text segments  Let pos(Ti ) denote the starting position in T of the ith text segment. In our supported operations, we find it convenient to refer uniquely to each of the d text segments as a number in the range [1, d], called a lex id, which is based on the relative lexicographic order of the d suffixes of T that are prefixed by a text segment. Informally, if the suffix T [pos(Ti )..n] is the j th lexicographically smallest suffix, of all d suffixes of T that are prefixed by a text segment, then Ti will have lex id j. We can formalize this notion as follows. Let [ai , bi ] be the SA range of T [pos(Ti )..n], the suffix of T beginning with the ith text segment. Since all suffixes of T are unique, then ai = bi , and ai = aj , unless i = j. The lex id of text segment Ti is j = |{ak | ak ≤ ai , 1 ≤ k ≤ d}|. The lex id is defined in such a way that a range of matching lex ids can be returned for the prefix operation that our full-text dictionary will support. We will see the benefits of this when the full-text dictionary is used within the other data structures we develop in Chapters 9 and 10. An example full-text dictionary is given in Figure 8.3 and shows the lex ids and SA ranges for six different text segments. 
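The lex id assignment is easy to verify on the example of Figure 8.3. The following sketch is illustrative Python only: '%' stands in for the delimiter φ so that plain ASCII comparison reproduces the assumed order $ < φ < Σ; it recovers the text segments of T, ranks the suffixes that begin them, and reports each segment's lex id together with the SA range of each distinct segment.

```python
# Lex ids for the running example of Figure 8.3. '%' plays the role of φ,
# since ASCII gives '$' < '%' < 'a', matching the order $ < φ < Σ.
T = "%aa%aca%a%aa%cacc%ac$"

# Recover the 1-based starting positions pos(T_i) of the text segments.
positions, i = [], 0
while i < len(T):
    if T[i] in "%$":
        i += 1
        continue
    start = i
    while T[i] not in "%$":
        i += 1
    positions.append(start + 1)

segments = [T[p - 1:].split("%")[0].rstrip("$") for p in positions]
# segments == ['aa', 'aca', 'a', 'aa', 'cacc', 'ac']

# lex id: rank of the suffix T[pos(T_i)..] among the d suffixes beginning a segment.
by_suffix = sorted(range(len(positions)), key=lambda k: T[positions[k] - 1:])
lex_id = [0] * len(positions)
for rank, k in enumerate(by_suffix, start=1):
    lex_id[k] = rank
print(lex_id)   # [2, 5, 1, 3, 6, 4] -- e.g. segment 'a' (third in T) has lex id 1

# SA range of each distinct segment: rows of the sorted-suffix list it prefixes.
SA = sorted(range(1, len(T) + 1), key=lambda j: T[j - 1:])
for seg in sorted(set(segments)):
    rows = [r + 1 for r, j in enumerate(SA) if T[j - 1:].startswith(seg)]
    print(seg, (rows[0], rows[-1]))
# a (8, 16)   aa (12, 13)   ac (14, 16)   aca (15, 15)   cacc (20, 20)
```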
Note that in this example, there are two text segments that share a common string (‘aa’) and therefore there are only five unique SA ranges. Now that the form of the data that we are creating a data structure for is clear, and the concept of lex ids of text segments has been formally defined, we can state the main result we show in this chapter. Throughout the chapter, we will assume32 that a query pattern P is over the alphabet Σ. Theorem 22. A string T over alphabet Σ ∪ {φ} of length n, that contains d, not necessarily distinct, text segments over alphabet Σ, can be represented by a compressed full-text dictionary F in |CSA| + O(n) + O(d log n) bits, to support the following operations given any query pattern P : • dict prefix (F, P): returns the (possibly empty) list of the occ1 lex ids for text segments that prefix P in O(|P | log σ + occ1 ) time, • dict match (F, P): returns the (possibly empty) list of the occ2 lex ids for text segments that are contained as substrings in P in O(|P | log σ + occ2 ) time, • dict count (F, P): returns the count of text segments that are contained as substrings in P in O(|P | log σ) time, • prefix(F, P): returns the (possibly empty) range [lexid1, lexid2] of lex ids for text segments that are prefixed by P in O(|P | log σ) time, • locate(F, P): returns the (possibly empty) list of the occ3 positions in T that are prefixed by P in O(|P | log σ +occ3 log1+ n) time, for any > 0, 32 This simplifies the discussion, though it is worth noting that any invalid character in a pattern P can be identified in O(|P |) time.  130  8.4. Components of the full-text dictionary • match stats (F, P): returns the matching statistics {(qi , [ai , bi ]) 1 ≤ i ≤ |P |} for P with respect to T in O(|P | log σ) time, where σ = |Σ ∪ {φ}| and |CSA| denotes the size of any compressed suffix array of T supporting the LF operation in time O(log σ).  8.4  Components of the full-text dictionary  Before discussing how to perform any operations on a full-text dictionary F, we first describe its subsidiary data structures. We focus first on describing these data structures, but leave details on how they are constructed to Section 8.6.  8.4.1  CSA: compressed enhanced suffix array  We first build CSA, the compressed suffix array for T , using |CSA| bits, and then enhance it to represent longest common prefix (lcp) intervals (cf. [91]) using an additional O(n) bits. Analogous to the notation SA[i], we let CSA[i] denote the starting position of the ith lexicographically smallest suffix of a text T .  8.4.2  The sa id identifier and RSA: conceptual tools  In Section 8.3.1 we formally defined the lex id as a unique identifier for every text segment in T . By definition, text segments are not necessarily distinct strings. Text segments that are the same string will have the same SA range in T and also the same length. To simplify the discussion of the components of our data structure, and also the operations it supports, we will introduce an internal identifier for text segments called the sa id. Intuitively, the sa id specifies a total ordering of all the unique SA ranges that are associated with text segments. We will give a formal definition, but first we prove the properties we need to group text segments. The following lemma formalizes two notions: (i) text segments representing different strings must have SA ranges that begin at different positions; and (ii) those representing the same string must have the same SA range and the same length. Lemma 29. 
Let [a, b] and [c, d] be the non-empty suffix array ranges in CSA for a text segment Ti and a text segment Tj . Then a = c if and only if Ti = Tj and b = d. Proof. Suppose a = c. Then Ti and Tj share a common prefix of length min(|Ti |, |Tj |). Since each text segment in T is followed by a character in {φ, $} and no text segment contains a character in {φ, $}, then it must be the case that |Ti | = |Tj |. Therefore Ti = Tj , and c = d since identical strings must have the same SA range in T . Suppose Ti = Tj (and b = d). Identical strings must have the same SA range in T and therefore a = c. Let RSA = ([a1 , b1 ], [a2 , b2 ], . . . , [ad , bd ]) be the list of the d unique SA ranges for the d text segments, where 1 ≤ d ≤ d, ordered by the start position 131  8.4. Components of the full-text dictionary of the range (i.e., a1 < a2 < · · · < ad ). (Note that by Lemma 29 each start position, ai , is distinct, for 1 ≤ i ≤ d .) If a text segment has SA range [ai , bi ] we say it has an sa id of i, i.e., the sa id specifies the relative order of text segment SA ranges when they are sorted by their start position. We will use the sa id of text segments as an index into some of our subsidiary data structures comprising F. Note that the value of any particular sa id will be in the range [1, d ] as there are exactly d distinct strings among the set of all d text segments. The list RSA is only conceptual. It simplifies our discussion of the components of F and also the operations performed on F; however, it does not need to be stored to perform queries. We will make use of it when constructing the data structure and give details in Section 8.6.  8.4.3  L: text segment lengths  Let L[i] be the length of text segments with sa id i. Since L stores at most d ≤ d lengths that sum to some value n ≤ n, then it can be stored as a compressed integer array using O(d log n) bits by Lemma 26. Lemma 30 summarizes how we can use the length and SA range of any text segment to determine if it is a prefix of a given text X (and vice versa). Lemma 30. Let [c, d] be the non-empty SA range for a string X. Then (i) text segments with SA range [aj , bj ] (the j th range in RSA) are a prefix of X if and only if aj ≤ c ≤ d ≤ bj and |X| ≥ L[j]. Similarly, (ii) X is a prefix of all text segments with SA range [aj , bj ] if and only if c ≤ aj ≤ bj ≤ d. Proof. Consider proposition (i). We let Tj be any text segment with SA range [aj , bj ]; Tj therefore has length |Tj | = L[j]. Suppose that Tj is prefix of X. Then it must be the case that |Tj | ≤ |X|. By definition T [CSA[aj ]..|T |] ( T [CSA[bj ]..|T |] ) is lexicographically smaller (greater) than any other suffix of T prefixed by the string Tj ; thus, [aj , bj ] must enclose [c, d] and we have aj ≤ c ≤ d ≤ bj . Next suppose aj ≤ c ≤ d ≤ bj and |Tj | ≤ |X|. Since [aj , bj ] encloses [c, d] they must share a common prefix of length min(|X|, |Tj |). If [aj , bj ] = [c, d] it could be the case that X is a proper prefix of Tj ; however, since |X| ≥ |Tj | by supposition then X and Tj must share a common prefix of length at least |Tj |. Thus, Tj is a prefix of X. Consider proposition (ii). Suppose that X is a prefix of Tj . By definition T [CSA[c]..|T |] ( T [CSA[d]..|T |] ) is lexicographically smaller (greater) than any other suffix of T prefixed by the string X; thus, [c, d] must enclose [aj , bj ] and we have c ≤ aj ≤ bj ≤ d. Next, suppose that c ≤ aj ≤ bj ≤ d. Since [c, d] encloses [aj , bj ] they must share a common prefix of length min(|X|, |Tj |). 
Since the character following Tj is a special character in the set {φ, $}, we know that |Tj | cannot be a proper prefix of X and therefore |Tj | ≥ |X|. Thus, Tj and X share a prefix of length |X| making X a prefix of Tj .  132  8.4. Components of the full-text dictionary  8.4.4  LEX, MB , ME , E: text segment SA range representation  The following lemma formalizes the notion that text segments with the same sa id can be identified by a contiguous range of lex ids. Lemma 31. If an SA range is common to k > 0 distinct text segments, then those text segments form a contiguous range of k lex ids. Proof. If k = 1 then the condition is trivially met. Suppose k > 1. Proof by contradiction. Suppose only k text segments share the common SA range and they do not form a contiguous range of k lex ids. Recall that lex ids are assigned according to the lexicographic rank of all suffixes of T that are prefixed by a text segment. Let a and b be the minimum and maximum lex id of the k text segments sharing the common SA range, respectively. By assumption that the k lex ids do not form a contiguous range, then b − a + 1 > k. Therefore, there must exist a text segment with lex id c, a ≤ c ≤ b, that is not one of the k text segments sharing the common SA range. By definition of lex id, the text segment with lex id c is lexicographically equal to or larger than the text segment with lex id a. Similarly, it is smaller or equal to the text segment with lex id b. Since the text segments with lex id a and b share a common SA range, and since SA ranges cannot cross, then the text segment with lex id c must also share the same SA range. Contradiction. We construct a simple array LEX to store, as a 2-tuple, the range of lex ids associated with each sa id. Formally, set LEX[i] = j, k if the text segments with lex ids j, j + 1, . . . , k have the common sa id i. We can represent the d ≤ d entries in O(d log d) bits. We next construct two bit vectors, MB and ME , each of length n, to mark the beginning and end, respectively, of SA ranges in RSA. Formally, set MB [ai ] = 1, and set ME [bi ] = 1, for 1 ≤ i ≤ d . All other entries in the bit vectors have value 0. By Lemma 25, both can be represented in O(d log n) + o(n) bits. Since each SA range in RSA must begin at a distinct position, then MB contains d bits set to 1 and can therefore be used to count the number of ranges in RSA that begin prior to some position p (i.e., cnt = rank1 (MB , p − 1) ). However, ranges in RSA can end at the same position. Therefore, the number of 1 bits in ME is some value d , 1 ≤ d ≤ d , and we cannot use ME to directly count the number of ranges that end prior to some position p. For this reason, we make use of an additional array E and set E[i] = |{bj | bj ≤ bi , 1 ≤ j ≤ i}|, for 1 ≤ i ≤ d . The array E keeps a cumulative count of the closed ranges up to and including the ith bit marked 1 in ME . It can be stored in O(d log d) bits. This is sufficient information to complete our count query (i.e., cnt = E[ rank1 (ME , p − 1) ] ).  8.4.5  BP: containment of text segment SA ranges  If a pattern P is prefixed by one or more text segments with SA range ri = [ai , bi ], then it is also prefixed by text segments with SA ranges that enclose 133  8.4. Components of the full-text dictionary ri . We seek a means to efficiently identify these SA ranges. We now show how to create a forest to represent the containment relationship between text segment SA ranges. 
This forest is defined formally, in addition to its balanced parenthesis representation, however a straightforward algorithm for constructing the balanced parenthesis representation directly is presented in Section 8.6. Consider a range ri = [ai , bi ] from RSA with sa id i. We define the parent range of ri to be the smallest range rj from RSA, ri = rj , that encloses it. We will call sa id j the parent of sa id i. We say that ri is a child range of rj . We will call sa id i a child of sa id j. If rj = [aj , bj ] encloses ri = [ai , bi ], then it must be the case that aj < ai and therefore j < i. When no range from RSA encloses ri , other than itself, then it has no parent range. Since SA ranges cannot cross, then each range in RSA has at most one parent range. We can formally describe this relationship as a forest. We create a node with label i representing sa id i, for 1 ≤ i ≤ d . We add an edge between nodes j and i, j < i, if and only if j is a parent of i. Since the label of a parent node is strictly less than the label for any of its children, then the direction of the relationship is always clear. Any forest can be represented by a sequence of balanced parentheses. In particular begin with the lowest node label not yet visited and perform a pre-order traversal of the tree rooted at that node, outputting a left parenthesis when visiting a node for the first time, and outputting a right parenthesis when returning to a node from processing its children. (Children are visited in ascending order of their label value.) This process can be repeated for all unprocessed trees until the entire forest is represented as a sequence bp of balanced parentheses. Importantly, the ith left parenthesis represents the sa id i, for 1 ≤ i ≤ d . The sequence bp can be thought of intuitively in the following way. It has d left parentheses that denote the start position of each SA range from RSA. The ith left parenthesis denotes the start position of the ith SA range, [ai , bi ], in RSA. Its corresponding right parenthesis denotes the end position. Any other left parenthesis between this pair denote other SA ranges, from RSA, that are enclosed by [ai , bi ]. Thus the sequence bp uses at most 2d ≤ 2d parentheses to fully capture the containment relationship among all SA ranges from RSA. We create BP, an indexed representation of bp, in order to support a number of useful operations. By Lemma 27, BP can be represented in O(d) bits.  8.4.6  CNT: count of text segment prefixes  As stated in the previous section, if a pattern P is prefixed by one or more text segments with SA range ri = [ai , bi ], then it is also prefixed by text segments with SA ranges that enclose ri . To permit efficient counting queries, we create an array CNT of length d , such that CNT[i] is the count of all text segments that enclose ri = [ai , bi ] (inclusive of those with sa id i). The count for each entry can be determined by an in-order traversal of the forest represented by BP and by using the LEX array to determine ranges of lex ids (which can be used to determine counts for each sa id). Specifically, the count of a child is the count of text segments with the same sa id of the child in addition to the 134  8.5. Using the full-text dictionary count of the parent. As each entry sums to at most d, then the array CNT of d ≤ d entries can be stored in O(d log d) bits.  
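An uncompressed way to see how BP and CNT fit together is sketched below (illustrative Python with plain lists in place of the succinct structures of Lemmas 25 and 27; the ranges are the five distinct text-segment SA ranges of the Figure 8.3 example): nested SA ranges are converted to a parenthesis sequence with a stack, parents are read off the nesting, and CNT accumulates the per-range segment counts along each root-to-node path.

```python
# Build the balanced-parenthesis sequence for nested text-segment SA ranges,
# derive each sa id's parent, and accumulate CNT.
RSA = [(8, 16), (12, 13), (14, 16), (15, 15), (20, 20)]  # sorted by start position
segs_per_said = [1, 2, 1, 1, 1]   # text segments sharing each range (derived from LEX)

bp = []
parent = [0] * (len(RSA) + 1)     # parent[i] = 0 means sa id i has no parent
stack = []                        # open ranges, innermost on top
for said, (sp, ep) in enumerate(RSA, start=1):
    while stack and sp > RSA[stack[-1] - 1][1]:   # top range ends before sp: close it
        bp.append(")")
        stack.pop()
    parent[said] = stack[-1] if stack else 0      # smallest still-open enclosing range
    bp.append("(")
    stack.append(said)
while stack:
    bp.append(")")
    stack.pop()
print("".join(bp))                # (()(()))() -- the BP column of Figure 8.3

# CNT[i]: number of text segments whose SA range encloses range i, inclusive.
CNT = [0] * (len(RSA) + 1)
for said in range(1, len(RSA) + 1):
    CNT[said] = segs_per_said[said - 1] + CNT[parent[said]]
print(CNT[1:])                    # [1, 3, 2, 3, 1]
```

The same stack discipline, driven by the precomputed ranges in RSA, is what Algorithm 3 in Section 8.6 uses to emit the parenthesis sequence directly.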
8.4.7  Summary of full-text dictionary components

To aid in the discussion of supported operations, we list all subsidiary data structures that comprise the full-text dictionary F in Table 8.1.

Symbol   Description                                                                  Space (bits)
CSA      compressed enhanced suffix array of T                                        |CSA| + O(n)
L        array storing length of each text segment                                    O(d log n)
LEX      array of lex id ranges for each text segment SA range                        O(d log d)
MB       bit vector marking beginning of text segment SA ranges                       O(d log n) + o(n)
ME       bit vector marking end of text segment SA ranges                             O(d log n) + o(n)
E        array of cumulative count of closed text segment SA ranges                   O(d log d)
BP       balanced parentheses representation of text segment SA range containment     O(d)
CNT      count of text segments that prefix each text segment SA range                O(d log d)

Table 8.1: Inventory of space usage for data structures comprising a full-text dictionary for a string T of length n containing d text segments.

Combining the space for the subsidiary data structures, we have the following.

Lemma 32. A string T over alphabet Σ ∪ {φ} of length n, that contains d, not necessarily distinct, text segments over alphabet Σ, can be represented by a compressed full-text dictionary F in |CSA| + O(n) + O(d log n) bits.

8.5  Using the full-text dictionary

We now describe how to support various operations on a full-text dictionary F using its subsidiary data structures as described in Section 8.4. Throughout this section, we assume that F is built for a string T of length n containing d text segments and that queries are with respect to a query pattern P. To simplify the description of the operations, we again define the ordered list of the unique SA ranges of text segments as we did when describing the components of F. Let RSA = ([a1, b1], [a2, b2], . . . , [ad′, bd′]) be the list of the d′ unique SA ranges for the d text segments, where 1 ≤ d′ ≤ d, ordered by the start position of the range. Note that list RSA is only conceptual and need not be stored to perform queries.

8.5.1  Pre-processing the pattern

When performing any of the query operations, we first calculate the matching statistics of P. Recall that the matching statistics for P with respect to T is an array ms of tuples such that ms[i] = (qi, [ci, di]) states that the longest prefix of P[i..|P|] that matches anywhere in T has length qi and SA range [ci, di]. By Lemma 28 we can compute the matching statistics for P in time O(|P| log σ) using CSA.

8.5.2  Finding parent ranges and longest matches

We first develop some useful lemmas to simplify the description of our operations.

Lemma 33. Given any sa id i the sa id of its parent can be determined in O(1) time. If sa id i has no parent then 0 is returned in O(1) time.

Proof. Let li = select( (BP, i) which gives the position, in BP, of the left parenthesis for sa id i. Let (lj, rj) = enclose(BP, li). If the enclose operation returns undefined, then i has no parent and we return 0. Otherwise, lj is the position of the left parenthesis representing the parent of sa id i. We can determine the actual sa id value j = rank( (BP, lj). We return j. We perform a constant number of operations, all supported in O(1) time.

Lemma 34. Given a string X and its SA range [c, d], the value i, such that text segments with sa id i form the longest prefix match of X (of any text segment strings), can be returned in O(1) time. If there is no text segment that prefixes X, then i = 0 is returned in O(1) time.

Proof.
We want to find the longest text segment string that prefixes X, if one exists. As a first candidate, we will determine the maximum i, such that [ai , bi ] encloses [c, d]. Let b = rank1 (MB , c) and e = E[rank1 (ME , d1 − 1)]. These are, respectively, the number of SA ranges in RSA that begin up to and including position c, and the number that close prior to position d. Let i = b − e. Intuitively, i is the sa id (the position in RSA) of the last SA range that began up to position c that does not close prior to position d. If i = 0 then X is not prefixed by a text segment and we return 0. Otherwise, when i > 0, [ai , bi ] is the smallest range in RSA to enclose [c, d]. By Lemma 30 we must ensure that |X| ≥ L[i]. If the condition is satisfied, we return i. Otherwise, when |X| < L[i], X is a proper prefix of text segments with sa id i so they cannot be prefixes of X. However, if sa id i has a parent with sa id j we know that its SA range [aj , bj ] must enclose [ai , bi ] and aj < ai (by Lemma 29). The range [aj , bj ] must therefore enclose [c, d]. We also know that c > aj , since c ≥ ai . Therefore, if [aj , bj ] exists, then it represents text segments that are a proper prefix of X and therefore L[j] < |X|. By Lemma 30, text segments with sa id j would be a prefix of X. We can find the parent for sa id i, if it exists, in O(1) time by Lemma 33. If it exists, we return its sa id. Otherwise, we return 0. Overall, we performed a constant number of operations all supported in O(1) time.  8.5.3  dict prefix: report text segments that prefix P  With these results, we can now show how to implement the dict prefix operation which reports all text segments that prefix a query pattern. 136  8.5. Using the full-text dictionary Lemma 35. Given a pattern P and its matching statistics, ms, the dict prefix operation can return the lex ids of all occ1 text segments that prefix P in O(1 + occ1 ) time. Proof. Using ms[1] = (q1 , [c1 , d1 ]), by Lemma 34, we can find i, the sa id for the text segments with the longest prefix match to P [1..q1 ] in O(1) time. If i = 0 then there are no matches and we are done. Otherwise, we report the lex ids in the range LEX[i]. By Lemma 33, we can find j, the sa id for the parent of i in O(1) time, if it exists. If i does not have a parent (j = 0), then we are done. Otherwise, we report the lex ids in the range LEX[j]. We repeat this procedure for subsequent parents, until we no longer find a parent. Note that in this case, we will have done at most occ1 + 1 operations to find a parent if there are occ1 overall prefix matches to report. Therefore, the overall time33 is O(1 + occ1 ).  8.5.4  dict match: report text segments contained in P  Reporting all text segments that match in P can be achieved by reporting all matches for each prefix of P using the dict prefix operation. Lemma 36. Given a pattern P and its matching statistics, ms, the dict match operation can return the lex ids of all occ2 text segments that are substrings of P in O(|P | + occ2 ) time. Proof. We use the dict prefix operation |P | times to report matches for each prefix, resulting in occ2 overall matches. The overall time required is O(|P | + occ2 ).  8.5.5  dict count: counting text segments contained in P  Counting all text segments that match in P can be achieved by counting all matches for each suffix of P . This can be achieved by first identifying the sa id for text segments that form the longest prefix match to a particular suffix of P and then looking up the count in the array CNT. Lemma 37. 
Given a pattern P and its matching statistics, ms, the dict count operation can count all occurrences of text segments that are substrings of P in O(|P |) time. Proof. Using ms[i] = (qi , [ci , di ]), by Lemma 34, we can find j, the sa id for the text segments with the longest prefix match to P [i..qi ] in O(1) time. If i = 0 then there are no matches. Otherwise, the count of matches is CNT[j]. We can sum the counts for all i, 1 ≤ i ≤ |P | in O(|P |) time. 33 If instead a succinct representation of the output giving ranges of lex ids is acceptable, then the overall time to report can be bounded as O(min(|P |, occ1 )) as we perform at most O(min(|P |, occ1 )) operations to find a parent, since each parent is a proper prefix of its children.  137  8.6. Constructing the full-text dictionary  8.5.6  prefix: report range of lex ids that prefix P  In contrast to the dict prefix operation which reports matches of text segments that prefix a pattern P , we now show how the range of lex ids for text segments that contain P as a prefix can be determined. Note that the technique used to implement this operation can be performed using any CSA based on the Burrows Wheeler transform to determine lex ids for the text segments. No additional data structures other than the CSA are required. We will use this technique in subsequent chapters. Lemma 38. Given a pattern P and its matching statistics, ms, the prefix operation can return the range of lex ids for text segments that contain P as a prefix in O(1) time. Proof. We will use ms[1] = (q1 , [c1 , d1 ]). For every suffix i of T that begins a text segment, we have that T BWT [i] is either a φ or $ character. We can use this property to determine the relative rank of every text segment contained in the SA range [c1 , d1 ] by inspecting T BWT [c1 , d1 ]. If there are k total φ characters and exactly one $ character in T , which are lexicographically smaller than any other character in Σ, then the first k + 1 rows of the suffix array for T are for suffixes that begin with a special character in {φ, $}. Let t = rankφ (T, k + 1) + rank$ (T, k+1). Then, the pair (lexid1+1, lexid2), where lexid1 = rankφ (T, a− 1) + rank$ (T, a − 1) − t and lexid2 = rankφ (T, b) + rank$ (T, b) − t, denotes the range of lex ids of text segments contained in the SA range [c1 , d1 ].  8.5.7  locate: report positions in T containing P  Finding matches of P in the string T is already supported by the CSA data structure, giving us the following result. Lemma 39. Given a pattern P and its matching statistics, ms, the locate operation can return the occ3 positions in T that contain P as a prefix in O(occ3 log1+ |T |) time.  8.5.8  match stats: finding the matching statistics of P  Finding the matching statistics of P with respect to F is already supported by the enhanced CSA data structure, giving us the following result. Lemma 40. Given a pattern P the match stats operation can determine the matching statistics of P with respect to F in O(|P | log σ) time. The proof of Theorem 22 follows from Lemmas 28,32,35,36,3738,39 and 40.  8.6  Constructing the full-text dictionary  Construction of the overall full-text dictionary is straightforward and consists of the construction of subsidiary data structures, such as the compressed suffix 138  8.6. 
Constructing the full-text dictionary Algorithm 3: Constructing the bp sequence Input: RSA specifies the SA ranges of text segments in order of their beginning position Output: The balanced parentheses array bp representing the containment relationship of text segment SA ranges 1: initialize an empty stack S 2: for i = 1 . . . d do 3: (sp, ep) ← R[i] 4: while S is not empty and sp > Top(S) do 5: print ’)’ 6: Pop(S) 7: print ’(’ 8: Push(S, ep) 9: while S is not empty do 10: print ’)’ 11: Pop(S)  array and marking of SA ranges of text segments. However, construction of the BP index cannot occur until bp, the sequence of balanced parentheses representing the containment relationship of SA ranges, is known. Below, we elaborate on how one may construct this sequence, prior to creating its index. We first construct the array RSA from Section 29 which is the sorted list of unique SA ranges of all text segments. For each text segment Ti ∈ D, 1 ≤ i ≤ d, we find its SA range and then append it into a temporary array t of length d. Finding the SA range for all text segments takes O(n log σ) time as their combined length is O(n). The array t can then be sorted in O(d log n) time using the beginning position of each SA range as the sort key. The array RSA can then be determined from t. Moreover, by keeping the duplicate SA ranges in t, the lex ids can be determined and can be used to create other auxiliary data structures. Constructing the bp sequence which represents the containment relationship of text segment SA ranges is relatively straightforward and a procedure is given in Algorithm 3. The text segment SA ranges are processed in increasing lexicographical order. The algorithm ensures that right parentheses of intervals are appended to the bp sequence only after any contained intervals have been closed with right parentheses. This is accomplished with the use of a stack and by comparison of previously computed SA ranges stored in RSA. The stack stores at most d integers from [1, n], thus the algorithm requires O(d log n) bits of working space. It is straightforward to see that the algorithm and construction of RSA can be accomplished in O(n log σ + d log n) overall time and uses O(d log n) overall bits of temporary work space.  139  Chapter 9  Indexing text with wildcards 9.1  Introduction  We are interested in designing a compressed full-text index to answer a generalized version of the problem of aligning a pattern P of length m to a text T of length n, where T contains k wildcard positions that can match any character of P . Our motivation arises in the context of aligning short-read data, produced by high throughput sequencing technologies. Typically, short-reads are aligned against a so-called reference genome; however, the quantity of positions known to differ between individuals due to single nucleotide polymorphisms (SNPs) numbers in the millions [39]. Therefore, one canonical reference sequence is not representative of an entire population of individuals. Modeling SNPs as wildcards would yield more informed, and by extension, more accurate alignment of short-reads. While our motivation is grounded in biological sequence alignment, the solutions we propose in this chapter are more generally applicable to any problem benefiting the indexing of text containing wildcard characters. 
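As a toy illustration of this motivation (invented sequences, '*' standing in for the wildcard, and a quadratic scan rather than any of the indexes developed below), a reference with known SNP sites replaced by wildcards accepts reads carrying either allele at those sites while still rejecting mismatches elsewhere.

```python
# Toy illustration of wildcard positions in a reference (sequences invented):
# known SNP sites are replaced by '*', and a read matches a window if it agrees
# with the reference at every non-wildcard position.
reference = "ACGT*GGATC*TTAGC"     # '*' marks two known SNP positions

def wildcard_matches(text, read):
    hits = []
    for i in range(len(text) - len(read) + 1):
        window = text[i:i + len(read)]
        if all(t == "*" or t == r for t, r in zip(window, read)):
            hits.append(i)
    return hits

print(wildcard_matches(reference, "GTAGGA"))   # [2]: the read spans the first SNP site
print(wildcard_matches(reference, "GTCGGA"))   # [2]: a different allele at that site, same alignment
print(wildcard_matches(reference, "GTAGCA"))   # []: mismatch at a non-wildcard position
```

The indexes developed in this chapter support exactly this match semantics, but without scanning the reference.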
Cole, Gottlieb & Lewenstein [24] were among the first to study the problem of indexing text sequences containing wildcards and proposed an index using O(n logk n) words of space capable of answering queries in O(m+logk n log log n+ occ) time, where occ denotes the number of matching positions. This result was later improved by Lam et al. [66] resulting in space usage of only O(n) words and a query time no longer exponential in k. A key idea in their work was to build a type of dictionary of the text segments of T = T1 φk1 T2 φk2 . . . φkd Td+1 where each text segment Ti contains no wildcards and φki denotes the ith wildcard group of size ki ≥ 1, for 1 ≤ i ≤ d ≤ k. In their result, the query time includes the term γ = i,j prefix(P [i..|P |], Tj ) where prefix(P [i..|P |], Tj ) = 1 if Tj is a prefix of P [i..|P |] and 0 otherwise. Intuitively, γ is the number of occurrences of text segments within P . Despite this improvement in query time complexity, O(n) words of space can be prohibitive for texts as large as the Human genome. The use of dictionary matching of text segments within a pattern was also crucial in the approach of Tam et al. [124] who proposed the first succinct index that uses (3+o(1))n log σ bits. Using a compressed suffix array CSA, their space Content from this chapter appears in the proceedings of the 22nd Annual Symposium on Combinatorial Pattern Matching (CPM 2011) [127] and the journal Theoretical Computer Science [128].  140  9.1. Introduction complexity can be reduced to 3|CSA| + O(d log n) bits. In our first contribution of this chapter, we show how to build on the full-text dictionary proposed in Chapter 8, to attain a succinct index using only 2|CSA|+ O(n) + O(d log n) bits while maintaining the same query time complexity as the index of Tam et al. [124]. However, in our view, the main challenge that must be overcome for successful wildcard matching is a reduction of the query working space. The fastest solution of Tam et al. [124], matches our query time, if modified to use the same subsidiary data structures we use, but requires a query working space of O(n log d + m log n) bits. Acknowledging that the first term is impractical for large texts, they give a slower solution that reduces the working space to be O(n log σ + m log n). This makes the solution feasible, but constraining considering the fact that p parallel queries necessarily increase the working space by a factor of p. A main contribution of this chapter is an algorithm that reduces the query working space complexity significantly to O(min(dm, γ log d) + m log n) bits. For our motivating problem, alignment of short-reads to the Human genome (3 billion bases with 1-2 million SNPs), this reduces the working space by two orders of magnitude from gigabytes to tens of megabytes. Finally, we show that by permitting an increase in worst case query time the space of the index can be reduced to only nHk (T ) + o(n log σ) + 2n + O(d log n) bits. Existing solutions store a compressed suffix array for both T and another for its reverse, T . The key to the space reduction is the elimination of the reverse index by exploiting a method used for bidirectional search [113]. Independently and in parallel with this work, Hon et al. showed an alternate approach to eliminate the reverse index [49]. This decreases the overall text index space term to nHk (T ) + o(n log σ) + O(d log n) bits with an increase in query time compared to the fastest solution presented here (which uses more space). 
While the construction and use of their index to achieve these bounds is quite technical, their query time is faster than that of our smallest index. However, the ideas presented here and in the work of Hon et al. are complementary and can be combined to improve the overall state-of-the-art for this problem. Our results for indexing text with wildcards are summarized and compared with existing results in Table 9.1. Results are also summarized when ideas from this work are combined with those of Hon et al. [49]. Details for combining the approaches are given in Section 9.6. For a fair comparison, the results of Tam et al. [124] have been adjusted to use the same subsidiary data structures used by our index.  141  Index Space O(n logk n) O(n)  words words  3|CSA| + O(d log n) bits  3|CSA| + O(d log n) bits  Query Time O(m + logk n log log n + occ) O(m log n + γ + occ) m tLF + min m, dˆ log d O +occ1 log1+ n + occ2 log d + γ  Query Working Space O(1) O(n)  [24] [66]  O(n log d + m log n)  bits  [124]  same as above with working space reduced by increasing query time m tLF + min m, dˆ log d O O(n log σ + m log n) +occ1 log1+ n + occ2 log d + γ logσ d  bits  [124]  O(γ log d + m log n)  bits  †  O(dm + m log n)  bits  †  O(dm + m log n)  bits  †  O(γ log d + m log n)  bits  †  O((γ + m) log n)  bits  [49]  O(dm + m log n)  bits  ‡  O(γ log d + m log n)  bits  ‡  2|CSA|+O(n)+O(d log n) bits  O  2|CSA|+O(n)+O(d log n) bits  O  nHk (T ) + o(n log σ) + 2n +O(d log n) bits  O  nHk (T ) + o(n log σ) + 2n +O(d log n) bits  O  nHk (T ) + o(n log σ) +O(d log n) bits  O  nHk (T ) + o(n log σ) +O(d log n) bits  O  nHk (T ) + o(n log σ) +O(d log n) bits  O  m tLF + min m, dˆ +occ1 log  1+  n  m tLF + min +occ1 log1+ n  log d log log d log d + occ2 log log d log d m, dˆ log log d log d + occ2 log log d  + γ log γ  +γ m2 log σ + m log n + min m, dˆ +occ1 log  1+  n+  log d log log d  log d occ2 log log d  +γ m2 log σ + m log n + min m, dˆ  log d log log d  log d +occ1 log1+ n + occ2 log log d + γ log γ  m log1+ n + m min m, dˆ log d +occ1 log1+ n + occ2 log d + γ log γ m log1+ n + m min m, dˆ +occ1 log  1+  n+  log d occ2 log log d  m log1+ n + m min m, dˆ +occ1 log  1+  n+  log d occ2 log log d  log d log log d  +γ log d log log d  + γ log γ  142  Table 9.1: A comparison of text indexes supporting wildcard characters in a text T over an alphabet of size σ containing d distinct groups of wildcards. |CSA| is the size of a subsidiary compressed suffix array implementation supporting rank queries in O(tLF ) time. dˆ is the # of distinct wildcard group lengths, occ1 , occ2 , occ are the # of occurrences containing no wildcard group, 1 wildcard group, and overall, respectively; γ = i,j prefix(P [i..|P |], Tj ), † = our result, ‡ = our result combined with Hon et al. [49]  9.1. Introduction  words words  9.2. Preliminaries  9.2  Preliminaries  Our wildcard matching algorithm makes use of an orthogonal range query data structure; specifically, it is an index for a set of two-dimensional points that can count and report the set of points contained inside a query rectangle. Lemma 41 (Bose et al. [11]). A set d of points from universe M = [1..d] × [1..d] can be represented in (1+o(1))d log d bits to support orthogonal range reporting in O((1 + occ) logloglogd d ) time, where occ is the size of the output.  9.3  Overview of indexing text containing wildcards  Let T be a string over an alphabet Σ ∪ {φ} of size σ where φ ∈ / Σ and T [i] = φ if and only if position i is a wildcard position in T . 
In particular, we denote the structure of the input string as T = T1 φk1 T2 φk2 . . . φkd Td+1 where each text segment Ti contains no wildcards and φki denotes the ith wildcard group of size ki ≥ 1, for 1 ≤ i ≤ d. Our goal is to create an index for the purpose of identifying all the locations in T that exactly match any query pattern P , modulo wildcard positions. Similar to previous approaches [66, 124], we classify the match into one of three cases: the match of P contains no wildcard group (Type 1), the match of P contains exactly (some portion of) one wildcard group (Type 2), and the match of P contains more than one wildcard group (Type 3). See Figure 9.1 for examples of each of the three types of matches. Our solution for Type 2 matching is largely inspired by previous approaches [66, 124], so we give an overview of the approach but omit the details. Our algorithm for Type 3 matching is novel and can result in significantly reduced working space.  (a) Type 1  (b) Type 2  (c) Type 3  Figure 9.1: The three cases to consider when matching a pattern to a text with wildcards. Here and throughout this chapter, we will illustrate the wildcard character as ‘*’.  143  9.4. Components of the text with wildcards index  9.4  Components of the text with wildcards index  Before detailing how we implement the three types of matching algorithms, we first give an overview of the subsidiary data structures that comprise our overall full-text index for text containing wildcards. The space complexity given for each component is assuming an input text T of length n having d groups of k overall wildcards. We will use |CSA| to denote the size (in bits) of a compressed suffix array for T .  9.4.1  F, R: indexing the text  We first build F, a compressed full-text dictionary of Chapter 8, for T . Among others, we will make use of the dict prefix operation that reports all text segments from T that are a prefix of a query pattern P . From Theorem 22, F requires |CSA|+O(n)+O(d log n) bits of space. We also construct a compressed suffix array R for T , the reverse of T . The space required for R is |CSA| bits. Note that R does not need to support location reporting as our full-text dictionary F already does.  9.4.2  lex ids, rlex ids, and Π: text segment identifiers  By design, many of the operations supported by the full-text dictionary F report the match of a text segment Tj using its lex id which is Tj ’s lexicographic rank among all text segments in T . We will also need to know the reverse lexicographic rank of each text segment (i.e., when the reverse of each text segment are sorted lexicographically). The reverse lexicographic rank, or rlex id, of a text segment Tj is equivalent to the lexicographic rank of Tj in R, the compressed suffix array for T . The rlex ids for each text segment with respect to R can be determined in the equivalent manner they are determined for F by the prefix operation as described in Section 8.5.6. In addition, we also need to know the relative position of a text segment in T among all text segments in T . Therefore, we store a permutation ΠF →P mapping the lex ids of text segments with respect to the forward index to their relative position order in T . For instance, if Tj has lex id k, then ΠF →P [k] = j. We do the same for mapping the rlex id of text segments with respect to the reverse index to their relative position in T by creating the mapping ΠR→P . The space required to store ΠF →P and ΠR→P is O(d log n) bits. 
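A toy example makes the three identifier spaces concrete. The sketch below is illustrative Python only (distinct segments are chosen so that plain lexicographic sorting fixes the ranks unambiguously, standing in for the suffix-based ranks reported by F and R): it computes lex ids, rlex ids and the two Π mappings for a small wildcard text.

```python
# Toy example ('*' as the wildcard): lex ids rank the segments, rlex ids rank
# their reversals, and the Pi maps send either rank back to position order in T.
T = "ab*ba**aab"
segments = [s for s in T.split("*") if s]     # position order: ab, ba, aab
d = len(segments)

lex_order = sorted(range(d), key=lambda j: segments[j])          # forward ranks
rlex_order = sorted(range(d), key=lambda j: segments[j][::-1])   # ranks of the reversals

Pi_F_to_P = {rank: j + 1 for rank, j in enumerate(lex_order, start=1)}
Pi_R_to_P = {rank: j + 1 for rank, j in enumerate(rlex_order, start=1)}

print(Pi_F_to_P)   # {1: 3, 2: 1, 3: 2} -- lex id 1 is 'aab', the 3rd segment of T
print(Pi_R_to_P)   # {1: 2, 2: 1, 3: 3} -- rlex id 1 is 'ba', whose reversal 'ab' is smallest
```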
We will use the relative position order as the main index into the other subsidiary data structures of our index.

9.4.3  RSA, R̄SA: storing SA ranges

We construct an array RSA of length d + 1 to store the SA ranges of each text segment with respect to F. For instance, RSA[j] specifies the SA range for text segment Tj. Similarly, we construct an array R̄SA of length d + 1 to store the SA ranges of the reverse of each text segment with respect to R. For instance, R̄SA[j] specifies the SA range for T̄j, the reverse of text segment Tj. Both arrays can be stored in O(d log n) bits.

9.4.4  LEN, POS, WCS: auxiliary arrays

We find it convenient to store additional information for each text segment, in auxiliary arrays, indexed by the relative position order. We store the length of each text segment in an array LEN. Note that the array LEN of the dictionary construction can be adapted to store lengths in this relative position order with the use of Π. We store the beginning position of a text segment in T (i.e., its offset from the beginning of the string T) using the array POS. We store the size of the preceding wildcard group in the array WCS. Note that all arrays have length d + 1 and overall require O(d log n) bits of space to support constant time access.

9.4.5  RQ: supporting range queries

We approach Type 2 matching no differently than previous approaches [66, 124], by employing a number of 2D orthogonal range query data structures. Specifically, we will create a data structure RQi, for each 1 ≤ i ≤ d̂, where d̂ is the number of unique lengths of wildcard groups that separate adjacent text segments. We will add a 2D point (i, j) into the data structure RQk if and only if the text segment with lex id j is adjacent to the text segment with rlex id i and they are separated by exactly k wildcard characters in T. For example, in Figure 9.1b, the two text segments that form the match with the pattern P would be represented by a point in the data structure RQ1 as they are separated by a wildcard group having length 1. We use the data structure of Lemma 41 for each RQi. As there are exactly d total points added among all RQi, the total space required is O(d log d) bits.

9.4.6  Summary of the components

To aid in the discussion of supported operations, we list all subsidiary data structures that comprise our index for text containing wildcards in Table 9.2.

Table 9.2: Inventory of space usage for data structures comprising an index for a text T of length n containing d groups of wildcards; d̂ denotes the number of unique lengths of wildcard groups separating text segments.

    F: compressed full-text dictionary of T (|CSA| + O(n) + O(d log n) bits)
    R: compressed suffix array of T̄, the reverse of T (|CSA| bits)
    ΠF→P: a mapping from lex id to relative position order of text segments (O(d log n) bits)
    ΠR→P: a mapping from rlex id to relative position order of text segments (O(d log n) bits)
    RSA: SA ranges for each Tj w.r.t. F (O(d log n) bits)
    R̄SA: SA ranges for each T̄j w.r.t. R (O(d log n) bits)
    LEN: length of each text segment (O(d log n) bits)
    POS: beginning position in T of each text segment (O(d log n) bits)
    WCS: size of preceding wildcard group of each text segment (O(d log n) bits)
    RQi: 2D point data structures relating the rlex id of a text segment to the lex id of the text segment that follows it in T, when the two are separated by a wildcard group of length i, for 1 ≤ i ≤ d̂ (O(d log d) bits in total)

Combining the space of the subsidiary data structures, we have the following.

Lemma 42. Given a text T of length n containing d groups of wildcards, the combined space required of the above indexes is 2|CSA| + O(n) + O(d log n) bits.

9.5  Matching in text with wildcards

We now outline the various operations supported by our index. In particular, we give details on how each of the three types of matches can be determined.

9.5.1  Pre-processing the pattern

All three matching types make use of the matching statistics of P with respect to F. Types 2 and 3 matching also make use of the SA ranges for each suffix of P̄, the reverse of P, with respect to R. Both can be computed in O(m log σ) time (by Lemmas 24 and 28) and require O(m log n) bits to store. We incorporate these time and working space complexities into the results for each type.

9.5.2  type1 match: finding all type 1 matches of P

Type 1 matching corresponds exactly to the traditional pattern matching problem, where we must locate positions in T that contain P as a substring. Therefore, as our full-text dictionary F supports the locate operation, type 1 matches can be identified in the time bounds specified in Theorem 22.

Lemma 43. All occ1 Type 1 matches can be reported using O(m log n) bits of working space in O(m log σ + occ1 log^{1+ε} n) time, for ε > 0.

9.5.3  type2 match: finding all type 2 matches of P

A Type 2 match occurs when the alignment of P to T contains exactly (a portion of) one wildcard group. Our solution for Type 2 matching is the same as previous approaches [66, 124]. Therefore, we outline the high level idea of the approach for completeness, but omit details. First suppose that a match of P aligns with two text segments (and thus properly contains one wildcard group). Then, we seek a pair of neighbouring text segments Tj and Tj+1, separated by a wildcard group of size kj, where P[i..|P|] aligns to the first |P| − i + 1 characters of Tj+1—referred to as the suffix match (of P)—and P[1..i − 1 − kj] aligns to the last i − 1 − kj characters of Tj—referred to as the prefix match.

···  Tj  φ···φ  Tj+1  ···

By construction, the data structure RQkj will contain a point (p, q) if and only if the text segment with rlex id p is followed in T by the text segment with lex id q. For a fixed suffix P[i..|P|] and wildcard group length kj our strategy will be to (i) find all potential suffix matches and record their lex ids, (ii) find all potential prefix matches and record their rlex ids, (iii) determine which candidate prefix matches are adjacent to a candidate suffix match in T, and (iv) report the matching text segments forming a match with P. Using F, prefix(F, P[i..|P|]) will return the range of lex ids, [s1, s2], for the candidate suffix matches, completing step (i). By using the same technique described in Section 8.5.6, we can determine the range of rlex ids, [r1, r2], for the candidate prefix matches of P[1..i − 1 − kj] with respect to R, completing step (ii).
Next, we can determine all pairs of prefix and suffix candidates that are adjacent in T, and are separated by a length kj wildcard group, by determining all occ2 points (xl, yl), for 1 ≤ l ≤ occ2, that are in the query rectangle [r1, r2] × [s1, s2], completing step (iii). Since xl, for 1 ≤ l ≤ occ2, gives the rlex id for the text segment that forms a match with a prefix of P, using ΠR→P we can determine the relative position of the matching text segments in T. Using POS the actual position within T for each match can be reported, completing step (iv). In general, we can repeat the above procedure for every combination of suffix length and wildcard group length bound by m. However, as pointed out by Tam et al. [124], the number of distinct wildcard group sizes d̂ is often a small constant, particularly in genomic sequences. We therefore only consider at most d̂ lengths, provided they are not larger than m. Handling Type 2 matches that end or begin a match within a wildcard group is similar to the procedure outlined above. However, when we determine matches where the first k characters of P match within a wildcard group, we can query all RQl, for each k ≤ l ≤ m (or for each of the d̂ unique lengths if they are between length k and m), using [1, d] as the range of rlex ids, in conjunction with the actual range of lex ids for the remaining suffix of P. This essentially performs a two-sided orthogonal range query. The analogous action can be taken when determining matches that end within a wildcard group.

Lemma 44. All Type 2 matches can be reported using O(m log n) bits of working space in O(m(log σ + min(m, d̂) · log d / log log d) + occ2 · log d / log log d) time.

9.5.4  type3 match: finding all type 3 matches of P

Type 3 matches contain at least (portions of) two wildcard groups and therefore must fully contain at least one text segment. The general idea in previous approaches and in this chapter is to consider this case as an extension of the dictionary matching problem: text segments contained within P are candidate positions, but we must verify if they can be extended to a full match of P.

Algorithm 4: Report Type 3 matches
Input: a string P of length m, SA ranges for all suffixes of P w.r.t. F, SA ranges for all suffixes of P̄ (the reverse of P) w.r.t. R
Output: positions in T forming a Type 3 match with P, modulo wildcard positions
 1: for i = m to 1 do
 2:   for each lex id returned by dict prefix(F, P[i..m]) do
 3:     j ← ΠF→P[lex id]
 4:     [ap, bp] ← SA range of the reverse of P[1..i − 1] w.r.t. R
 5:     [as, bs] ← SA range of P[i + LEN[j] + WCS[j + 1]..m] w.r.t. F
 6:     [cp, dp], [cs, ds] ← R̄SA[j − 1], RSA[j + 1]
 7:     if j = d + 1 or LEN[j] + WCS[j + 1] ≥ m − i + 1 then   // Case 1: P does not contain Tj+1
 8:       if LEN[j] + WCS[j + 1] ≥ m − i + 1 or [as, bs] encloses [cs, ds] then   // Case 1: suffix condition satisfied
 9:         if j = 1 or LEN[j − 1] + WCS[j] > i − 1 then   // Case 1a: P does not contain Tj−1
10:           if WCS[j] ≥ i − 1 or [ap, bp] encloses [cp, dp] then   // Case 1a: prefix condition satisfied
11:             print match at position POS[j] − i + 1
12:         else   // Case 1b: P must contain Tj−1
13:           set bit j of W[i] to 1
14:     else   // Case 2: P must contain Tj+1
15:       if bit j + 1 of W[i + LEN[j] + WCS[j + 1]] is set to 1 then   // Case 2: suffix condition is satisfied
16:         if j = 1 or LEN[j − 1] + WCS[j] > i − 1 then   // Case 2a: P does not contain Tj−1
17:           if WCS[j] ≥ i − 1 or [ap, bp] encloses [cp, dp] then   // Case 2a: prefix condition satisfied
18:             print match at position POS[j] − i + 1
19:         else   // Case 2b: P must contain Tj−1
20:           set bit j of W[i] to 1

However, we execute this idea in an altogether novel manner by proposing a dynamic programming algorithm that can greatly reduce the working space over existing approaches. Before giving details, we note that there is a trade-off in working space and query time of our dynamic programming algorithm depending on whether the dynamic programming table W (employed by the algorithm) uses bit vectors or sorted lists. W requires O(dm) bits in the former case, and O(γ log d + m log γ) bits in the latter, where γ is the number of dictionary matches contained in the query pattern. If sorted lists are used, there is a log γ factor slowdown to the query time compared to using bit vectors. While in practice we expect γ to be small, in theory it can be as large as O(dm) and therefore using bit vectors in these cases would result in a faster query algorithm and smaller working space. Fortunately, our approach allows us to perform a counting query to first determine the size of γ. Since we have available F, the full-text dictionary of T as described in Chapter 8, we can determine γ = dict count(F, P) in O(|P| log σ) time. Thus, we can always ensure the working space is O(min(γ log d + m log γ, dm)).

The complete details of our approach are given in Algorithm 4. We now highlight the main idea and give the intuition behind the correctness. First, suppose that text segment Tj matches in P starting at position i. Consider the conditions that must be satisfied to confirm that this match can be extended to a complete match of P in T. We must verify that (i) P[i + LEN[j]..|P|] can be matched to the text following Tj in T — referred to as the suffix condition — and (ii) P[1..i − 1] can be matched to the text preceding Tj in T — referred to as the prefix condition. If both conditions are verified, we can report that P matches T beginning at position POS[j] − i + 1.

···φ  Tj−1  φ···φ  Tj  φ···φ  Tj+1  φ···

For working space, we make use of an array W containing m entries, one for each suffix of P. We describe the algorithm assuming the use of bit vectors and comment in the proof on the effect of using sorted lists.
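Before walking through the cases, it may help to pin down exactly what Type 3 matching must report. The brute-force Python reference below is our illustration only — it scans T directly, which the index never does, and its names (type3_matches_naive and so on) are ours — and, as in the figures, the wildcard is written as ‘*’. It returns every alignment of P that is compatible with T and overlaps at least two wildcard groups.

    def type3_matches_naive(P, T, wc='*'):
        """Positions t where P matches T[t..t+|P|-1] modulo wildcards and the
        window overlaps two or more wildcard groups (i.e., a Type 3 match)."""
        m, out = len(P), []
        for t in range(len(T) - m + 1):
            window = T[t:t + m]
            ok = all(p == c or c == wc for p, c in zip(P, window))
            # count maximal runs of '*' inside the window: each run is a distinct group
            groups = sum(1 for x in range(m)
                         if window[x] == wc and (x == 0 or window[x - 1] != wc))
            if ok and groups >= 2:
                out.append(t)
        return out

Such a direct scan costs time proportional to nm in the worst case; the point of Algorithm 4 is to obtain these occurrences in time depending only on m and γ rather than on n.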
Each of the m entries of W contains a bit vector of d + 1 bits (one for each text segment), with all entries set to zero using the constant time initialization technique [12]. During the course of the algorithm the jth bit of W[i] is set to 1 if and only if the suffix condition is true for P[i..m] with respect to Tj. Essentially, this entry would mean that the string P[i..m] matches T[POS[j]..n] as a prefix. There are exactly m stages of the algorithm corresponding to the m suffixes of P. Each stage i considers a longer suffix of P (i = m, m − 1, . . . , 1). In a given stage i we consider each text segment Tj found to be a prefix of the ith suffix of P. This can be accomplished using the dict prefix operation of F. To verify the prefix and suffix conditions for Tj we first consider (line 7 of Algorithm 4): will P[i + LEN[j]..m] need to fully contain the next text segment Tj+1 in order to match in T? This breaks our algorithm into the two main cases. If the match will not fully contain Tj+1 (Case 1), we verify the suffix condition by checking whether P[i + LEN[j]..m] is compatible with the wildcard group to its right and the prefix of Tj+1 to which it must align (line 8). If the suffix condition is satisfied, we consider (line 9): will P[1..i − 1] need to fully contain the previous text segment Tj−1 in order to match in T? If it does not need to fully contain the previous text segment Tj−1 (Case 1a), we verify the prefix condition is satisfied by checking that P[1..i−1] is compatible with the wildcard group to its left and the suffix of Tj−1 to which it must align (line 10). If indeed the prefix condition is satisfied, we output a match (line 11). If it does need to fully contain the previous text segment Tj−1 (Case 1b), we set the jth bit of entry W[i] to 1, to indicate that a suffix condition holds for P[i..m] with respect to Tj (line 13). The key idea here is that we only attempt to verify the prefix condition when Tj would be the last text segment to occur in P (i.e., Case 1a) and if not (Case 1b), we record information in W stating that we currently have a partial match, but for it to remain viable, Tj−1 should be a suffix of P[1..i − WCS[j] − 1], which can be verified in a future stage of the algorithm. Case 2 occurs when P must contain the next text segment Tj+1 to satisfy the suffix condition (lines 14–20). Since stages of the algorithm proceed for longer suffixes of P, and thus decreasing values of i, the suffix condition would have been previously checked and, if satisfied, bit j + 1 of W[i + LEN[j] + WCS[j + 1]] would be set to 1. The remaining questions are answered as before: the prefix condition is verified if possible, and otherwise successful partial matches are again recorded in W.

Lemma 45. All Type 3 matches can be reported in O(m log σ + γ) time using O(dm + m log n) bits of working space, or reported in O(m log σ + γ log γ) time using O(γ log d + m log n) bits of working space.

Proof. Recall that the algorithm proceeds in m stages for decreasing i = m, . . . , 1, one for each suffix of P.
It is clear in the algorithm description that verification of a match of Tj proceeds by first ensuring the suffix condition can be satisfied (Case 1: P does not contain Tj+1) or ensuring it was previously satisfied (Case 2: P must contain Tj+1), and then verifying the prefix condition in the cases where P does not contain Tj−1 (Cases 1a, 2a), reporting a match when verified, or by instead marking W to signify a partial match, expecting the match to be continued by a match of Tj−1 in a future stage (Cases 1b, 2b). The correctness relies on showing that W is set correctly to confirm the satisfaction of the suffix condition for the next text segment (Tj−1) at a future time step. We show correctness by induction on i. Consider the base case when we are in stage i = m. All candidate text segments Tj fall into Case 1 which (importantly) does not rely on the correctness of previous stages of the algorithm. The suffix condition is trivially true. The prefix condition is split into two cases. The first case (Case 1a) is when a successful match of P will not contain Tj−1. This can be verified by checking if the appropriate prefix of P is a suffix of Tj−1, unless the prefix of P is fully matched by the preceding wildcard group. If the prefix of P matches, both conditions have been satisfied and we have an overall match that can be reported. If P must fully contain Tj−1 for a successful match, then bit j of W[i] is set to denote that the suffix condition of P[i..m] is satisfied with respect to Tj. Now assume we are in some stage i and the algorithm is correct for all shorter suffixes (i.e., stages i + 1, . . . , m). Case 1 is handled as before and does not rely on the correctness of previous stages, so assume we are in Case 2 (P must contain Tj+1). Then, if the suffix condition is satisfied, bit j + 1 of W[i + LEN[j] + WCS[j + 1]] should be set to 1. This bit would have been set in an earlier stage t > i, and we have assumed the algorithm is correct for earlier stages (i.e., i + 1, . . . , m). Therefore, it must be the case that the suffix condition for Tj is satisfied if and only if W[i + LEN[j] + WCS[j + 1]] has bit j + 1 set to 1. Similarly to before, if the suffix condition is satisfied, we can attempt to verify the prefix condition when P does not contain Tj−1, or record the partial match in W when P must contain Tj−1. This completes the correctness proof.

We now consider the additional runtime and working space incurred for Type 3 matching. There are γ candidate positions overall that can be reported in O(m log σ + γ) time by Theorem 22. Each candidate is processed once, in O(1) time when using bit vectors. The array W occupies O(dm) bits as working space. Thus, the overall time complexity is O(m log σ + γ) and working space is O(dm + m log n) when using bit vectors. Clearly, one could also maintain a sorted list of text segment position ids for the m entries of W instead of a bit vector. Also, m pointers can be used to mark the head of each list. Since there are γ total entries of matching text segment position ids in all lists, and a text segment position id can be uniquely identified with log d + 1 bits, the total space is O(γ log d) bits to store the sorted ids. Inserting an entry into the list or querying an entry takes at most O(log γ) time, compared with O(1) time when using bit vectors.
We note that the space to store the m pointers to the head of each sorted list uses no more than O(m log γ) bits and is therefore absorbed into the O(m log n) term denoting the space to store the suffix array ranges. Combining the results for the three types of matching we arrive at our main result.

Theorem 23. A text T of length n containing d groups of wildcards can be represented in 2|CSA| + O(n) + O(d log n) bits, to support the following operations given any query pattern P:
• type1 match(P): returns all occ1 positions in T that match P, using no wildcard groups, in O(m log σ + occ1 log^{1+ε} n) time and O(m log n) bits of working space,
• type2 match(P): returns all occ2 positions in T that match P, using (some portion of) one wildcard group, in O(m(log σ + min(m, d̂) · log d / log log d) + occ2 · log d / log log d) time and O(m log n) bits of working space,
• type3 match(P): returns all occ3 positions in T that match P, using (some portion of) two or more wildcard groups, in O(m log σ + γ) time and O(m log n + dm) bits of working space, or O(m log σ + γ log γ) time and O(m log n + γ log d) bits of working space,
where σ = |Σ ∪ {φ}|, |CSA| denotes the size of any compressed suffix array of T supporting the LF operation in time O(log σ), d̂ is the number of unique wildcard group lengths, γ is the number of text segments that match in P, and ε > 0.

9.6  Less haste, less waste: reducing the space further

Figure 9.2: Shown is a compressed suffix array for a text T = φaa φaca φa φaa φcacc φac and a compressed suffix array for the reverse of T. The shaded intervals denote the SA range of a query aφ in the forward index and the corresponding SA range of φa in the reverse index. Using backward search the SA range in the forward index can be updated for the pattern aaφ, and by leveraging information in T^BWT the corresponding SA range for φaa can be updated in the reverse index. Both new SA ranges are shown demarcated with arrows. See the text for details.

Letting |CSA| denote the size of a subsidiary compressed suffix array, our index requires 2|CSA| + O(n) + O(d log n) bits in comparison to that of Tam et al. [124]—the first succinct index for this problem—which requires 3|CSA| + O(d log n) bits. For alphabets such as proteins (σ = 20) or larger this can result in a substantially smaller index. However, for small alphabets such as DNA (σ = 4), the O(n) term becomes quite significant. This term arises from the need to store auxiliary data structures for determining lcp parent intervals when computing matching statistics of a query string. Ohlebusch and Gog [90] proposed a solution that computes parent intervals in constant time (for σ = O(1)) and has been demonstrated to use between 3n–5n bits in practice [91]. This approach would ensure no slowdown in query time at the expense of a larger index compared to that of Tam et al. for the DNA alphabet. Using a solution by Fischer et al. [35] we can store the necessary lcp information using at most 2n + o(n) bits. This would yield an index of roughly the same size as that of Tam et al. when σ = 4; however, it incurs a logarithmic slowdown (in n) when computing parent intervals. Specifically, the time to pre-process the pattern becomes O(m log n) as at most m parent intervals must be computed for the m suffixes of P. In either case, both our index and that of Tam et al.
store a compressed suffix array for both the text and its reverse. An interesting question is whether we can eliminate the suffix array of the reverse text. Doing so would lead to a substantial space reduction, regardless of alphabet size. We now show that this question can be answered in the affirmative.

First, consider how the reverse index is used. In order to determine if some prefix P[1 . . . i] of a pattern P is a suffix of a text segment, the compressed suffix array R of the reverse text is searched using the reverse of P[1 . . . i] as the query (cf. Section 9.4.1). The resulting matches form a contiguous interval in the reverse index. This property allows for easy verification of a partial match given a suffix array range of a query (in the reverse index) and is the basis for the orthogonal range query data structure relating the forward and reverse lexicographic order of text segments. Note that if the reverse of P[1 . . . i] has a non-empty SA range [a, b] in R, then there is a non-empty SA range [c, d] in F for the query P[1 . . . i] and d − c = b − a. Recently, Schnattinger et al. [113] demonstrated that with the use of a compressed suffix array based on a wavelet tree, one can perform bidirectional search. Specific to our example, by performing an incremental backward search of the query P[1 . . . i] in F, the SA range of its reverse in R can also be updated incrementally, without performing any queries on R. Since the idea is central to our space reduction, we now give the intuition of the method; however, the reader is referred to Schnattinger et al. [113] for the details.

Shown in Figure 9.2 are two compressed suffix arrays: one for the text T = φaaφacaφaφaaφcaccφac (the forward CSA) and one for its reverse T̄ (the reverse CSA). Suppose we wish to locate the SA range of a query string X = aaφ in the forward CSA and the corresponding SA range of its reverse X̄ = φaa in the reverse CSA. The shaded regions represent the SA range matching the suffix X[2 . . . |X|] = aφ in the forward CSA and the corresponding match of its reverse, φa, in the reverse CSA. Given the SA range of X[2 . . . |X|], the SA range of X in the forward CSA, shown demarcated by arrows, can be determined by backward search in the usual manner, as we are prefixing the currently matched pattern by one character. However, finding the SA range of X̄ requires suffixing the currently matched pattern by one character. Therefore, the SA range of φa must contain the SA range of X̄ in the reverse CSA. Since all suffixes of T̄ are in sorted order in the reverse CSA, it follows that if we knew how many suffixes of T̄, prefixed by φa, were (i) lexicographically less than X̄, and (ii) how many were lexicographically greater, then we could exactly determine the correct SA range of X̄ in the reverse CSA. Schnattinger et al. [113] demonstrated that we can answer both questions by exploiting the relationship of the T^BWT string to the reverse CSA. Continuing with our example, for any character α in the shaded region of T^BWT, there must exist a suffix of T prefixed by αX[2 . . . |X|]. Furthermore, the reverse of αX[2 . . . |X|] must prefix some suffix of T̄. Therefore, to determine the number of suffixes of T̄, prefixed by φa, that are lexicographically less than φaa, we can simply count the number of characters less than a in the shaded region of T^BWT. In this case, one character (φ) is less than a. Similarly, we discover one character (c) is lexicographically larger than a. Given these values, we can update the SA range in the reverse CSA to the interval demarcated by the arrows.
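This counting step can be written down compactly. The following Python sketch is an illustration only: a direct scan of the BWT slice stands in for the wavelet-tree rank queries of Schnattinger et al. [113], SA ranges are 0-indexed and inclusive, C is assumed to be the usual table of cumulative character counts, and the character order used to build C and the suffix arrays is assumed to agree with Python's '<'.

    def extend_left(bwt, C, fwd, rev, alpha):
        """fwd: SA range (s, e) of the current pattern X in the forward index.
        rev: SA range of the reverse of X in the reverse index.
        Returns the pair of ranges for alpha + X, or (None, None) if it does not occur."""
        s, e = fwd
        window = bwt[s:e + 1]
        count = window.count(alpha)                  # occurrences of alpha + X in T
        if count == 0:
            return None, None
        smaller = sum(1 for c in window if c < alpha)
        # standard backward-search step in the forward index ...
        start = C[alpha] + bwt[:s].count(alpha)
        new_fwd = (start, start + count - 1)
        # ... and the reverse range follows without touching the reverse index
        rs = rev[0]
        new_rev = (rs + smaller, rs + smaller + count - 1)
        return new_fwd, new_rev

In the running example, extending aφ to aaφ in the forward index this way also yields the interval of φaa in the reverse index, exactly as depicted in Figure 9.2.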
Importantly, it was not necessary to perform any query on the reverse CSA to determine the correct SA range within it. Thus, the reverse CSA does not need to be constructed in the first place. By using the technique of Schnattinger et al. [113], and without any modification to the data structures, we can determine the SA range of the reverse of every text segment with respect to the reverse index R, without having constructed R, by performing queries only on F since it is backed by a compressed suffix array. This can be computed for all text segments in O(n log σ) time. Furthermore, for a pattern P of length m, we can compute the SA range of P̄, the reverse of P, in R in O(m log σ) time. It follows that we can determine the corresponding SA ranges in R for the reverses of all m prefixes of P in O(m^2 log σ) time. By modifying the result of Theorem 23 to use the lcp representation of Fischer et al. [35], using the compressed suffix array of Lemma 24, and by employing bidirectional search as described above, we have the following result.

Theorem 24. A text T of length n containing d groups of wildcards can be represented in nHk(T) + o(n log σ) + 2n + O(d log n) bits, to support the following operations given any query pattern P:
• type1 match(P): returns all occ1 positions in T that match P, using no wildcard groups, in O(m log σ + occ1 log^{1+ε} n) time and O(m log n) bits of working space,
• type2 match(P): returns all occ2 positions in T that match P, using (some portion of) one wildcard group, in O(m(log σ + min(m, d̂) · log d / log log d) + occ2 · log d / log log d) time and O(m log n) bits of working space,
• type3 match(P): returns all occ3 positions in T that match P, using (some portion of) two or more wildcard groups, in O(m log n + m^2 log σ + γ) time and O(m log n + dm) bits of working space, or O(m log n + m^2 log σ + γ log γ) time and O(m log n + γ log d) bits of working space,
where σ = |Σ ∪ {φ}|, Hk(T) denotes the kth-order empirical entropy of T (for any k ≥ 0), d̂ is the number of unique wildcard group lengths, γ is the number of text segments that match in P, and ε > 0.

Independently and in parallel with this work, Hon et al. showed that two sparse suffix trees can be used in conjunction with an FM-index for the forward index in order to eliminate the reverse index [49]. This decreases the overall index space to nHk(T) + o(n log σ) + O(d log n) bits with an increase in query time compared to the fastest solution presented here (which uses more space). In particular, their approach has the following time and space bounds.

Theorem 25 (Hon et al. [49]). Given a text T of length n containing d groups of wildcards, an index of nHk(T) + o(n log σ) + O(d log n) bits of space can be built to report all matches of a pattern P of length m using O((m + γ) log n) bits of working space in O(m(log^{1+ε1} n + min(m, d̂) log d) + occ1 log^{1+ε2} n + occ2 log d + γ log γ) time, where ε1 > 0, ε2 > 0, d̂ is the number of unique wildcard group lengths, and γ is the number of matching text segments in P.

While the construction and use of the sparse suffix trees for the claimed space and query time is quite technical—we refer the reader to their paper [49] for the details—their query algorithm does not suffer from the m^2 term that is necessary in the worst case for the smallest index described above in this work and is therefore faster except for degenerate cases. However, ideas from both the Hon et al. approach and this work are complementary and can be combined to improve indexing text with wildcards.
For instance, the dynamic programming algorithm for Type 3 matching in this work results in reduced working space and a query time that can be faster, and is the same in the worst case, when compared with the Hon et al. approach. Combining ideas from both indexes, we can achieve the following result.

Theorem 26. A text T of length n containing d groups of wildcards can be represented in nHk(T) + o(n log σ) + O(d log n) bits, to support the following operations given any query pattern P:
• type1 match(P): returns all occ1 positions in T that match P, using no wildcard groups, in O(m log σ + occ1 log^{1+ε1} n) time and O(m log n) bits of working space,
• type2 match(P): returns all occ2 positions in T that match P, using (some portion of) one wildcard group, in O(m(log^{1+ε2} n + min(m, d̂) · log d / log log d) + occ2 · log d / log log d) time and O(m log n) bits of working space,
• type3 match(P): returns all occ3 positions in T that match P, using (some portion of) two or more wildcard groups, in O(m log^{1+ε2} n + γ) time and O(m log n + dm) bits of working space, or O(m log^{1+ε2} n + γ log γ) time and O(m log n + γ log d) bits of working space,
where σ = |Σ ∪ {φ}|, Hk(T) denotes the kth-order empirical entropy of T (for any k ≥ 0), d̂ is the number of unique wildcard group lengths, γ is the number of text segments that match in P, ε1 > 0, and ε2 > 0.

Finally, we note that different time and space trade-offs can be achieved simply by using different subsidiary text indexes and range reporting data structures. Recently Belazzougui and Navarro showed, for the first time, that a compressed text index can be constructed that supports query times independent of alphabet size [7]. In particular, their result can be used to improve the m log σ term to m for query time by introducing an O(n) term of additional index space. Any other self-index that supports the LF operation, with different space and time trade-offs, could also be used in the approaches discussed here. Furthermore, the O(d log n) term may be significantly smaller than the size of the compressed text if, for instance, the text does not contain many wildcard groups. In these cases, the overall query time can be improved by using non-succinct range reporting structures [17].

Chapter 10

Indexing hypertext

10.1  Introduction

Much more progress has been made in mapping reads from genome data to reference genomes than on aligning reads derived from transcriptomes. The latter problem is harder by the very nature of the events it is capable of capturing compared to genomic sequencing. Since introns are spliced from genes in the process of transcription (see Figure 10.2), spliced reads may map to two regions of the genome that are separated by many hundreds or thousands of bases. The difficulty of aligning NGS reads that span intron boundaries is exacerbated by their short length, and often is not attempted, resulting in a significant loss of information. When compared with aligning reads to a reference text, the transcriptome read alignment problem is modeled more accurately by the problem of aligning patterns to a hypertext. Informally, hypertext is a generalization of text from a linear structure to a directed graph, G = (V, E), with each node being a fragment of text and edges implying which fragments of text can be appended; thus, any path in the graph is a substring of the hypertext. An example of a hypertext is given in Figure 10.1.
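For intuition, the following Python sketch makes this informal definition concrete. It is a brute-force check only — not the index proposed in this chapter — and its graph, function, and parameter names (hypertext_match, nodes, edges) are our own made-up example.

    def hypertext_match(nodes, edges, P):
        """nodes: {id: text fragment}; edges: {id: [successor ids]}.
        Returns True if P can be aligned along some path of the graph."""
        def extend(v, k):
            # k characters of P already matched; the rest must start at the front of node v
            text, rest = nodes[v], P[k:]
            if len(rest) <= len(text):
                return text.startswith(rest)         # match ends inside this node
            if not text or not rest.startswith(text):
                return False                          # empty nodes are skipped to stay finite
            return any(extend(w, k + len(text)) for w in edges.get(v, []))
        for v, text in nodes.items():
            for off in range(len(text)):
                tail = text[off:]
                if tail.startswith(P):                # match contained in a single node
                    return True
                if P.startswith(tail) and len(tail) < len(P):
                    if any(extend(w, len(tail)) for w in edges.get(v, [])):
                        return True
        return False

    # A tiny made-up hypertext spelling "pizzafrom" along the path 1 -> 2 -> 3.
    nodes = {1: "pi", 2: "zza", 3: "from"}
    edges = {1: [2], 2: [3]}
    assert hypertext_match(nodes, edges, "pizzafro")

Enumerating paths in this way is exponential in the worst case and touches the whole hypertext; the point of the index developed below is to answer such queries without scanning the graph.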
The example transcriptome in Figure 10.2 consists of five overall exons between two genes. The splicing events and valid transcripts are also shown. The resulting hypertext model of this transcriptome has a node for each exon, and an edge between exons joined by a splicing event, resulting in two components (one for each gene).

The seminal work on pattern matching in hypertext is due to Manber and Wu [77] who proposed an O(|V| + m|E| + occ log log m) time algorithm, where m is the length of the pattern and occ is the number of matches. Akutsu [1] proposed an O(n) algorithm for matching in hypertext forming a tree structure, where n is the total length of text in all nodes. Park and Kim [95] considered the case where the hypertext forms a directed acyclic graph by proposing an O(n + m|E|) time algorithm, under the assumption that no node in G matches to more than one position in the pattern. Amir et al. [2] proposed an algorithm with the same runtime complexity; however, theirs was the first algorithm for the case of hypertext forming a general graph. Amir et al. [2] and Navarro [87] also considered the problem of approximate matching in hypertext.

Figure 10.1: An example of a hypertext. A query matches within a hypertext if and only if it can be aligned as a path through the graph. A path shown in bold matches the query pattern pizzafrompisaisbountiful.

(Content from this chapter appears in the proceedings of the 18th Annual International Conference on String Processing and Information REtrieval (SPIRE 2011) [130].)

In all cases, the runtimes of the previously proposed pattern matching algorithms in hypertext are impractical for alignment of millions of transcriptome reads as they are at least linear in the size of the hypertext. Surprisingly, no index for hypertext, succinct or otherwise, has been previously proposed. In this work, we propose a succinct index to model hypertext. Our index can model any hypertext forming a general graph and makes no restriction to the topology. We also propose a new exact pattern matching algorithm, capable of aligning a pattern to any path in the hypertext, that is especially efficient for hypertexts where few nodes share common prefixes or when all nodes are of constant degree. In particular, our new algorithm can report all patterns crossing at most one edge—a valid assumption for current transcriptome read datasets—in O(m log σ + m · log|V| / log log|V| + occ1 log^{1+ε} n + occ2 · log|V| / log log|V|) time, where occ1 (occ2) is the number of matches that cross no (one) edge. We also consider a restricted version of the problem, where only certain paths in the hypertext are considered valid, and we also prove that the worst case query time complexity is improved for other restrictions including graph topology.

A main contribution of this chapter is to show the correspondence between the hypertext matching problem and the problem of matching text containing wildcards. As we will show, the former can be viewed as a generalization of the latter. In particular, recent strategies for indexing text with wildcards are applicable for indexing hypertext. Improvement to one problem may immediately lead to improvements of the other. While our results in this chapter are general and relevant to applications that are appropriately modeled by a hypertext, our original motivation was to better model the transcriptome read alignment problem.
We view the results in this chapter as a theoretical contribution towards that end. However, the reads produced by current sequencing technologies contain sequencing errors—errors introduced during the sequencing process—in addition to the genetic variation expected between the experimental sequence and a reference sequence. A significant challenge that must be overcome, before these approaches could yield practical tools for transcriptome read alignment, is to efficiently support approximate pattern matching queries.

Figure 10.2: A simple genome, G, is shown having five exons contained in two genes. Exons are strings over the four-letter alphabet of DNA. Below is the corresponding transcriptome, T, which consists of five transcripts. Transcripts are formed by the concatenation of certain exons from G. Above is the splicing graph, S, where each of the five nodes corresponds to one of the five exons from G, and each directed edge denotes splicing events (concatenation of exons) that are found in T. A hypertext model H for the transcriptome is also shown.

10.2  Preliminaries

In developing our hypertext index, we will use the same notation and leverage many existing results from the literature listed in Section 8.2 and Section 9.2. The remainder of this section details an existing result and definitions specific to developing our hypertext index.

10.2.1  Succinct graph representation

The succinct graph representation of Farzan & Munro supports a number of graph topology query operations in O(1) time using the best space achievable [31]. In this application, we only require the use of efficient adjacency queries. Their result is stated in terms of Boolean matrices supporting access queries. In terms of graphs, this is equivalent to determining adjacency of nodes using an adjacency matrix.

Lemma 46 (Farzan & Munro [31]). A Boolean matrix of size n × n with m ones can be represented in (1 + ε) lg (n^2 choose m) bits for any constant ε > 0, while supporting access (and successor) queries in O(1) time.

10.2.2  Hypertext

Figure 10.3: (left) An example of the underlying suffix array and BWT string for the forward index F of the text T = φacaφgφgaφcgφct$, representing the serialization of possible text in exons e1, . . . , e5, supposing those five exons consist of the five sequences {aca, g, ga, cg, ct} respectively, from Figure 10.2. (right) The underlying suffix array and BWT string for the reverse index R of the text T̄ = φacaφgφagφgcφtc$.

A hypertext generalizes the notion of text to be a directed graph G = (V, E) such that each node v ∈ V contains text ove