Advancing our understanding of genome regulation via optimization of stem cell differentiation and interpretable deep learning

Published: 2022
License: http://creativecommons.org/licenses/by-nc-nd/4.0/

Novakovskiy, German

Abstract

The regulation of gene expression is a core challenge in understanding how diverse cell types can arise from the same DNA instructions. Insights into this complex machinery advance not only basic science but also applications in therapy and pharmacology, such as the differentiation of stem cells for regenerative medicine to treat patients with diabetes. In my second chapter, I address the problem of optimizing the differentiation protocol towards definitive endoderm, the precursor of insulin-producing pancreatic beta cells, by replacing the expensive growth factor with inexpensive small-molecule alternatives. I introduce a multi-step pipeline based on the transcriptome response profiles of small molecules. The discovered chemicals highlight the importance of key transcription factors (TFs) in this process, such as HIF and MYC. Studying TFs is thus of high importance and will further advance our knowledge of differentiation. Motivated by this, I explore current trends in studying TFs in the context of gene regulation. With large-scale data generation efforts by public consortia such as ENCODE, deep learning methods have become pervasive. A large training dataset is fundamental to the success of these methods; however, the amount of TF-related data is often small. To tackle this issue, in my third chapter, I perform an in-depth assessment of transfer learning for TF binding prediction and provide biologically motivated guidelines for efficiently training deep models when data are limited. Beyond data sufficiency, an additional challenge for deep models is interpretability. In the fourth chapter, I systematically categorize and summarize interpretation approaches, exploring their underlying assumptions, strengths, and weaknesses. Inspired by transparent deep learning architectures, I present ExplaiNN, a new transparent model for genomics tasks, and in the fifth chapter I explore its efficiency and usability on a variety of problems. Finally, in the last chapter, I apply ExplaiNN to ATAC-seq datasets from the mouse and human immune systems to study differences in cis-regulatory logic. The transparency of the new method allowed me to discover a reproducible set of sequence motifs that, individually or in combination, account for the bulk of the predictions and tend to have species-specific occurrence patterns.

Item Metadata

Title	Advancing our understanding of genome regulation via optimization of stem cell differentiation and interpretable deep learning
Creator	Novakovskiy, German
Supervisor	Wasserman, Wyeth W.; Mostafavi, Sara
Publisher	University of British Columbia
Date Issued	2022
Genre	Thesis/Dissertation
Type	Text; Dataset
Language	eng
Date Available	2022-09-02
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0418592
URI	http://hdl.handle.net/2429/82661
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Bioinformatics
Affiliation	Science, Faculty of
Degree Grantor	University of British Columbia
Graduation Date	2022-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Item Media

ubc_2022_november_novakovskiy_german.pdf -- 10.12MB

ubc_2022_november_novakovskiy_german_supp.zip -- 36.43MB