UBC Theses and Dissertations

Machine learning in transcriptome analysis using long RNA sequencing data

Hafezqorani, Saber

Abstract

The advent of long-read RNA sequencing technologies has advanced transcriptomic research by enabling the sequencing of entire transcripts. This advancement holds the promise of uncovering novel insights into the complex nature of eukaryotic transcriptomes, characterized by phenomena such as alternative splicing and intron retention. However, the vast amounts of data generated by these technologies necessitate the development of sophisticated tools for effective data analysis. This thesis addresses these challenges through two main objectives: the development of a novel simulator for long-read RNA sequencing data, capable of accurately mimicking transcriptome-specific features, and the application of deep learning techniques for the robust analysis of RNA sequencing data. The first objective addresses the need for a simulator tailored specifically for RNA sequencing data from Oxford Nanopore Technologies, aiming to create a tool that not only generates reads mimicking real sequencing outputs but also incorporates critical transcriptomic features such as expression levels and intron retention events. This simulator serves as a resource for the development and refinement of transcriptome analysis tools, offering a cost-effective alternative to extensive sequencing experiments for generating ground-truth data for benchmarking. The second objective explores the potential of deep learning in transcriptomics, focusing on the development of a nucleotide sequence embedding method. It aims to capture the complex, long-term dependencies within sequences, a task that has proven challenging due to the variable length and intricate nature of RNA sequences. By leveraging deep learning's capacity to learn feature representations implicitly, this research seeks to enhance the accuracy and efficiency of sequence classification and clustering tasks within transcriptomic studies.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International