Antimicrobial peptide host toxicity prediction with transfer learning for proteins

UBC Theses and Dissertations

Antimicrobial peptide host toxicity prediction with transfer learning for proteins Taho, Figali

Abstract

Antimicrobial peptides (AMPs) are host defense peptides produced by all multicellular organisms, and can be used as alternative therapeutics in peptide-based drug discovery. In large peptide discovery and validation pipelines, it is important to avoid the time and resource sinks that arise from having to experimentally validate a large number of peptides for toxicity. Therefore, in silico methods for the prediction of antimicrobial peptide toxicity can be applied in advance to filter out any sequences that may be toxic. While many machine learning-based approaches exist for predicting the toxicity of proteins, the task is often defined as a problem of distinguishing venoms and toxins from nonvenomous proteins. In my thesis I propose a new method called tAMPer that focuses on classifying AMPs by whether they may induce host toxicity, based on their sequences. I have used the deep learning model ELMo, as adapted by SeqVec, to obtain vector embeddings for a dataset of synthetic and natural AMPs that have been experimentally tested in vitro for their toxicity through hemolytic and cytotoxicity assays. This is a balanced dataset that contains ~2600 sequences, split 80/20 into training and test sets. By utilizing the latent representation of the data produced by SeqVec, and by further applying ensemble learning methods to these embeddings, I have built a model that predicts the toxicity of antimicrobial peptides with an F1 score of 0.758 and an accuracy of 0.811 on the test set, and that compares favorably with state-of-the-art approaches both when trained and tested on our dataset and when trained and tested on the other methods’ respective datasets.
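To make the evaluation setup in the abstract concrete, the following is a minimal sketch of an 80/20 split that preserves class balance, together with the two reported metrics (accuracy and F1). It is not the thesis implementation: the function names `stratified_split` and `accuracy_and_f1` are hypothetical, the labels are synthetic placeholders (1 = toxic, 0 = non-toxic), and the exact count of 2600 stands in for the "~2600" stated above. The SeqVec embedding step and the ensemble classifier are deliberately left out.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then cut off the test share.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy_and_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Compute accuracy and the F1 score of the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1


# Synthetic, balanced placeholder labels (assumed size; not the real dataset).
labels = [1] * 1300 + [0] * 1300
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx))

# Toy predictions, only to show how the two metrics are computed.
print(accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

On a balanced test set, accuracy and F1 can still differ, as they do in the reported results, because F1 ignores true negatives and penalizes only errors involving the positive (toxic) class.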

Item Metadata

Title	Antimicrobial peptide host toxicity prediction with transfer learning for proteins
Creator	Taho, Figali
Publisher	University of British Columbia
Date Issued	2020
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2020-09-24
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0394496
URI	http://hdl.handle.net/2429/76155
Degree (Theses)	Master of Science - MSc
Program (Theses)	Bioinformatics
Affiliation	Science, Faculty of
Degree Grantor	University of British Columbia
Graduation Date	2020-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
