UBC Theses and Dissertations
Recommendation-based approaches to unveil binding preferences of experimentally unexplored RNA-binding proteins
Yang, Shu
RNA-binding proteins (RBPs) interact with their RNA targets to mediate critical cellular processes in post-transcriptional gene regulation, such as RNA splicing, modification, replication, and localization. Characterizing the binding preferences of RBPs is essential for deciphering the underlying interaction code and for understanding the functions of the interaction partners. However, only a minority of the numerous RBPs currently have RNA binding data available from in vivo or in vitro experiments. The binding preferences of experimentally unexplored RBPs remain largely unknown and are challenging to identify. In this thesis, we take machine-learning-based recommendation approaches to address this problem. We focus on leveraging the binding data currently available to infer the RNA preferences of RBPs that have not been experimentally explored. Firstly, we present a recommendation method based on co-evolution to predict the RNA binding specificities of experimentally unexplored RBPs without requiring any binding data for those RBPs. We first demonstrate the co-evolutionary relationship between RBPs and their RNA targets. We then describe a K-nearest-neighbors-based algorithm that exploits co-evolution to infer the RNA binding specificity of an RBP using only the specificity information of its homologous RBPs. Secondly, we present a nucleic acid recommender system to predict probe-level binding profiles for unexplored or poorly studied RBPs. We first encode biological sequences into distributed feature representations by adapting word embedding techniques. We then build a neural network that recommends binding profiles for unexplored RBPs by learning the similarities between them and RBPs that have binding data available. Thirdly, we present a graph convolutional network that recommends binding affinities for unexplored RBPs. Extending the previous two approaches, this method adopts a transductive message-passing setting to incorporate more information from the data. It predicts the interaction affinity between an unexplored RBP and an RNA probe by propagating information from other explored RBP-RNA interactions through a heterogeneous graph of RBPs and RNAs. Overall, the approaches presented here can help improve the understanding of RBPs' binding mechanisms and provide new opportunities to investigate complex post-transcriptional regulation.
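As a rough illustration of the first idea (inferring an unexplored RBP's specificity from its homologs), the following is a minimal sketch and not the algorithm developed in the thesis. It assumes, purely for illustration, that homology can be approximated by Jaccard similarity over protein k-mers and that each explored RBP has a fixed-length vector of binding scores; the function names `kmer_set`, `jaccard`, and `knn_predict_specificity` and all parameters are hypothetical.

```python
from typing import Dict, List, Set


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict_specificity(
    query_seq: str,
    known_seqs: Dict[str, str],
    known_profiles: Dict[str, List[float]],
    k: int = 2,
    kmer: int = 3,
) -> List[float]:
    """Predict a binding-score vector for an unexplored RBP.

    Assumption (not from the thesis): the prediction is the
    similarity-weighted average of the profiles of the k explored
    RBPs whose sequences are most similar to the query.
    """
    query_kmers = kmer_set(query_seq, kmer)
    # Rank explored RBPs by sequence similarity to the query.
    ranked = sorted(
        ((jaccard(query_kmers, kmer_set(seq, kmer)), name)
         for name, seq in known_seqs.items()),
        reverse=True,
    )
    neighbors = ranked[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        raise ValueError("no explored RBP is similar to the query")

    length = len(known_profiles[neighbors[0][1]])
    prediction = [0.0] * length
    for sim, name in neighbors:
        for i, score in enumerate(known_profiles[name]):
            prediction[i] += sim * score / total
    return prediction
```

The key property this sketch shares with the approach described above is that the query RBP contributes only its sequence; all binding information comes from explored neighbors. A real system would likely replace the k-mer similarity with an alignment- or embedding-based measure.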
Attribution-NonCommercial-NoDerivatives 4.0 International