Machine learning in transcriptome analysis using long RNA sequencing data

UBC Theses and Dissertations

Hafezqorani, Saber

Abstract

The advent of long-read RNA sequencing technologies has advanced transcriptomic research by enabling the sequencing of entire transcripts. This advancement holds the promise of uncovering novel insights into the complex nature of eukaryotic transcriptomes, which are characterized by phenomena such as alternative splicing and intron retention. However, the vast amounts of data generated by these technologies necessitate the development of sophisticated tools for effective data analysis. This thesis addresses these challenges through two main objectives: the development of a novel simulator for long-read RNA sequencing data, capable of accurately mimicking transcriptome-specific features, and the application of deep learning techniques for the robust analysis of RNA sequencing data. The first objective addresses the need for a simulator tailored specifically to RNA sequencing data from Oxford Nanopore Technologies. It aims to create a tool that not only generates reads mimicking real sequencing outputs but also incorporates critical transcriptomic features such as expression levels and intron retention events. This simulator serves as a resource for the development and refinement of transcriptome analysis tools, offering a cost-effective alternative to extensive sequencing experiments for generating ground-truth data for benchmarking. The second objective explores the potential of deep learning in transcriptomics, focusing on the development of a nucleotide sequence embedding method. The method aims to capture the complex, long-term dependencies within sequences, a task that has proven challenging due to the variable length and intricate nature of RNA sequences. By leveraging deep learning's capacity to learn feature representations implicitly, this research seeks to enhance the accuracy and efficiency of sequence classification and clustering tasks within transcriptomic studies.

Item Metadata

Title	Machine learning in transcriptome analysis using long RNA sequencing data
Creator	Hafezqorani, Saber
Supervisor	Birol, Inanc
Publisher	University of British Columbia
Date Issued	2024
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2024-07-29
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0444844
URI	http://hdl.handle.net/2429/88727
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Bioinformatics
Affiliation	Science, Faculty of
Degree Grantor	University of British Columbia
Graduation Date	2024-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
