UBC Theses and Dissertations

The impact of hybridization on alternative splicing revealed by transcriptome analysis Baute, Gregory Joseph 2011

THE IMPACT OF HYBRIDIZATION ON ALTERNATIVE SPLICING REVEALED BY TRANSCRIPTOME ANALYSIS

by Gregory Joseph Baute
B.Sc., The University of Guelph, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Botany)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

June 2011

© Gregory Joseph Baute, 2011

Abstract

Hybridization can result in offspring with transgressive traits when compared to the parents. These hybrid traits may have had an important role in plant evolution, they impact ecological fitness, and they have been exploited extensively in agricultural systems. Gene expression levels and patterns of small RNA expression as well as epigenetic modifications can be affected by hybridization. Alternative splicing (AS) is an important form of gene regulation, where multiple final transcript types are created from a single gene. AS is found in a large number of plant genes and has various functional consequences. Here, changes in AS regulation upon hybridization have been assayed on a genome-wide scale using RNA-seq data from rice hybrids. AS was quantified in both parental genotypes and their reciprocal hybrids based on read coverage across genes as well as junction-spanning reads. Of the over 24,000 AS events investigated, ~8% happen at different frequencies between the two subspecies. The majority of events in the hybrids occur at additive levels compared to their parents, although a large number of events (~9%) are at non-additive levels. Many non-additive events in the hybrids are at levels equal to one parent. Several hundred AS events happen at higher or lower levels in at least one of the hybrids than in either parent, indicating transgressive AS. Some of these events, 48 in total, could be unique to the hybrids.
The transgressively spliced events could contribute to the phenotype of the hybrids by altering the proteome through introducing different proteins, or via regulation of protein levels through nonsense-mediated decay. These data may also be useful in understanding the evolution of the regulation of AS.

Table of contents

Abstract
Table of contents
List of tables
List of figures
List of abbreviations
Acknowledgements
1 Introduction
    1.1 Hybridization
        1.1.1 The importance of heterosis
        1.1.2 Mechanisms and consequences of heterosis
    1.2 Alternative splicing
        1.2.1 Functional consequences of AS
        1.2.2 AS prevalence and evolution
        1.2.3 Investigating AS with high-throughput sequencing
        1.2.4 AS regulation
    1.3 AS in hybrids
2 Methods
    2.1 Data sources
    2.2 Alignment
    2.3 Calling events
    2.4 AS quantification
    2.5 Functional analysis
3 Results
    3.1 Read alignment and AS quantification
    3.2 Intraspecific variation in AS
    3.3 AS in the hybrids compared to their parents
    3.4 Changes in AS and histone marks
4 Discussion
    4.1 Differences in AS among the genotypes
    4.2 Transgressive AS following hybridization
    4.3 Intraspecific variation of AS
    4.4 Potential mechanisms responsible for AS variation
    4.5 Conclusions
References

List of tables

Table 1. Alignment of RNA-seq read libraries
Table 2. Types of AS events
Table 3. Shared AS modes between the hybrids
Table 4. Characteristics of transgressively spliced events
Table 5. Novel AS events in one or both hybrids
Table 6. Changes in AS and changes in histone marks

List of figures

Figure 1. AS event levels across the genotypes
Figure 2. Analysis of AS mean differences
Figure 3. Intraspecific AS variation
Figure 4. Modes of AS inheritance
Figure 5. GO enrichment of transgressively spliced genes

List of abbreviations

AS      Alternative splicing
SR      Serine/arginine-rich
PTC     Premature termination codon
NMD     Nonsense-mediated decay
hnRNPs  Heterogeneous nuclear ribonucleoproteins
UPF     UP-frameshift

Acknowledgements

I would like to thank Dr. Keith Adams for giving me the opportunity to be immersed in science and to pursue this question. The resources and people in the Department of Botany and in the Beaty Biodiversity Research Centre have been crucial to this work. I want to especially thank Shao-Lun (Allen) Liu for all the interesting conversations. Allen has been a very important resource and always had the time to help. This research was supported by the Natural Sciences and Engineering Research Council of Canada through a BRITE training grant to me and a research grant to Keith Adams.

1. Introduction

1.1 Hybridization

1.1.1 The importance of heterosis

Hybridization not only plays a key role in modern agriculture, but has been a frequent event in the evolutionary history of land plants (Mallet 2007; Springer and Stupar 2007; Rieseberg 2007). Darwin recognized the impact of crossing plants on their vigour and experimented with many crosses in different species, noting, “To my surprise, the crossed plants when fully grown were plainly taller and more vigorous than the self-fertilised ones” (Darwin 1876). The combination of divergent genomes is observed across eukaryotes and can create novel variability and material for adaptation (Arnold and Martin 2010). One major factor in the frequent success of hybrids is that many of them exhibit transgressive traits, that is, traits of intraspecific or interspecific hybrids that fall outside the range of expected parental values. These traits can range from gene expression levels to reproductive fitness.
The genetic basis of this heterosis, or hybrid vigour, has been of great interest for decades. Several models of genetic interaction have been proposed to explain heterosis at a molecular level (Chen 2010), and the modern tools of molecular biology will allow researchers to investigate these models in more depth than previously possible, as well as discover new pathways by which heterosis may occur.

Humans have exploited heterosis extensively in some crop plants. Oryza sativa, rice, is one of the world's most important crops, as it is responsible for producing one fifth of the world's calories (Zhang et al. 2010). Hybridization between subspecies of rice is beginning to play a more important role in modern agriculture as F1 hybrid seed is being adopted globally. Particularly in China, hybrid rice has grown in use and 50% of rice grown there is now hybrid. The gain in yield over conventional high-yield varieties is significant, with conventional varieties producing on average 5.4 tons/ha and hybrids an average of 6.9 tons/ha (Virmani and Kumar 2009). These commercial hybrids make use of the diverse Oryza germplasm that is available. This rich genetic diversity has allowed the crop to be grown in a wide variety of climates and conditions around the world.

Cultivated O. sativa has a complex history of domestication, and is split largely into two main subspecies, ssp. indica and ssp. japonica. The majority of rice produced belongs to one of these two subspecies, which roughly encompass long and short grain rices, respectively. These subspecies have distinct morphology and are adapted to different growing conditions. Although molecular dating of intergenic regions suggests that the two lineages have been separated for ~0.5 million years, introgression of useful genes, including genes important for domestication, has occurred between them (Sweeney and McCouch 2007; Kovach et al. 2007).
Hybridization between cultivars of these two subspecies results in F1 hybrids with increased vigour, biomass, and yield (Zhang et al. 2008a).

1.1.2 Mechanisms and consequences of heterosis

The molecular underpinnings of heterosis are an area of great interest, and understanding what processes can occur upon hybridization in terms of gene regulation will be a key step in understanding how hybrid vigour arises. Several aspects of genome structure and gene regulation are affected by hybridization. Epigenetic changes, such as changes in DNA methylation and post-translational histone modifications, have been implicated in heterosis (Ni et al. 2009) and have been observed to change at many loci in hybrid rice (He et al. 2010) and synthetic Arabidopsis allopolyploids (Madlung et al. 2002). Aside from affecting transcriptional activity, these epigenetic changes have also been implicated in the derepression of various mobile elements in hybrids (Michalak 2009). Small RNAs, many of which are regulators of gene expression, as well as epigenetic modifications, are also affected by hybridization (Kenan-Eichler et al. 2011).

Hybrid plants have been shown to exhibit all possible modes of mRNA expression relative to parental expression levels: additive, dominant, and transgressive (which includes both over- and underdominance). Most genes are expressed at or between parental levels, but over- and underdominance can occur (Swanson-Wagner et al. 2006; He et al. 2010; Hegarty et al. 2006). Allele-specific gene expression has also been observed, where alleles may inherit the expression levels from their parents (Zhang and Borevitz 2009). There are also examples of allele-specific silencing in F1 hybrids (Springer and Stupar 2007; Adams 2007). Allele-specific silencing may occur upon hybridization in an organ-specific manner, as shown in diploid cotton hybrids (Adams and Wendel 2005).
Similarly, cotton allopolyploids exhibit strong homeoallele-specific expression bias or silencing (Flagel et al. 2008; Adams et al. 2004). These changes in gene expression upon hybridization have been implicated in heterosis, and a positive correlation between the amount of non-additive gene expression and the amount of heterosis has been observed, although this does not imply causation (Birchler et al. 2010).

Why do these observed patterns of non-additive gene expression occur in hybrids? Many factors are involved in determining the level at which a gene will be expressed, and these factors can be divided into two main groups, cis and trans regulators. Elements that are located physically at or near a gene are called cis-acting, and include the promoter and enhancer regions of that gene and the status of associated histones. Trans-acting factors originate from a different region of the genome, and include various proteins involved in transcriptional regulation, such as transcription factors, and other regulators, such as small RNAs. As populations of a species or subspecies diverge, they accumulate various changes in these regulatory elements, which may become fixed in the population by adaptive or nonadaptive mechanisms (Wittkopp 2007). These changes in regulation may or may not result in different expression levels. When the divergent regulatory mechanisms are brought together they may interact in novel ways, resulting in non-additive expression (Landry et al. 2007). These processes may also involve small RNAs and epigenetic changes, which have been shown to correlate with changes in gene expression (He et al. 2010). It is difficult to assign causative roles to any one process. All of these phenomena have been categorized as part of “genome shock” (McClintock 1984) or “transcriptome shock” (Hegarty et al. 2006). Some specific changes at the molecular level have been connected to the extreme phenotypes of hybrids.
A change in the epigenetic regulation of key circadian clock genes in an interspecific Arabidopsis hybrid results in increased biomass (Ni et al. 2009). Heterozygosity at a single locus in tomatoes increases fruit yield by up to 60%. A single dosage-dependent gene, SINGLE FLOWER TRUSS, which is involved in the flowering signal cascade, is responsible (Krieger et al. 2010).

Another type of gene regulation that has been shown to be affected by hybridization is alternative splicing (AS) (Nasrallah et al. 2007; Scascitelli et al. 2010). The splicing changes that occur following hybridization may have important functional consequences. Interspecific hybrids of Arabidopsis have altered AS patterns, which may be responsible for the breakdown of self-incompatibility (Nasrallah et al. 2007). Investigating which types of processes can happen following hybridization, and their frequencies, will be an important component in understanding how heterosis occurs.

1.2 Alternative splicing

AS is a process that creates multiple mRNA isoforms from a single type of pre-mRNA transcript of a gene. This important type of gene regulation can result in altered protein products and can play a role in regulating the amount of protein made. The role and mechanism of AS are largely conserved across the eukaryotes, with many highly conserved components (Reddy 2007; Nilsen and Graveley 2010). The prevalence of AS has become more apparent recently with next-generation sequencing platforms: 95% of human genes and 33-48% of all rice genes show evidence of at least one AS event (Pan et al. 2008; Zhang et al. 2010; Lu et al. 2010). Alternatively spliced transcripts can have a variety of functional roles in several processes (Reddy 2007).

1.2.1 Functional consequences of AS

AS can result in altered protein activity as a consequence of the incorporation of different amino acids into the protein (Reddy 2007; Nilsen and Graveley 2010).
For example, normal ribulose 1,5-bisphosphate carboxylase/oxygenase (Rubisco) function in Arabidopsis requires two protein isoforms, which are two different products of the Rubisco transcript that arise from AS (Zhang et al. 2002). AS can also result in different target peptides and sub-cellular localization (Lamberto et al. 2010). Alternative protein products are also known to be important in defence and flowering (Reddy 2007).

By creating mRNA isoforms that incorporate an in-frame premature termination codon (PTC), AS may also play a role in regulating gene expression levels. These transcripts may be targets for the nonsense-mediated decay (NMD) pathway. The NMD system has been understood to be a surveillance mechanism for destroying aberrant transcripts which may have been mis-spliced. It is now becoming understood that by coupling AS to NMD, the amount of protein that is made could be regulated. Current genome-wide surveys conflict with regard to the impact of NMD on gene regulation. Computational predictions, which are based on the location of premature stop codons in the transcript, estimate that as many as 35% of human mRNA isoforms are targets for NMD, and many ultra-conserved sequences, at least in mammals, have roles in AS-directed NMD (McGlincy and Smith 2008). In rice, 48% of the AS events that have been identified could result in frameshift-causing premature stop codons that may signal the NMD pathway (Zhang et al. 2010).

Components of the NMD machinery are highly conserved across eukaryotes. Organisms lacking essential components of the NMD machinery, such as the UP-frameshift genes (UPF1-3), display phenotypes that range from embryo lethal in mice to only slight growth defects under certain growing conditions for yeast. A. thaliana plants deficient in UPF genes have altered leaf and flower morphology, flower later than wild-type plants, and are unable to survive the seedling stage (Arciga-Reyes et al. 2006).
Microarray experiments with organisms lacking these essential components of the NMD machinery have shown that NMD does have a role in regulating expression levels (McGlincy and Smith 2008; Kurihara et al. 2009). Hundreds of genes in A. thaliana are up-regulated in the absence of UPF proteins (Kurihara et al. 2009). Some families of genes are known to be highly regulated by this process, such as the serine/arginine-rich (SR) protein and heterogeneous nuclear ribonucleoprotein (hnRNP) families, and there are also several case studies that found examples of its importance for specific genes (McGlincy and Smith 2008). The prevalent AS in SR genes in A. thaliana is widely coupled to NMD (Palusa and Reddy 2010).

1.2.2 AS prevalence and evolution

In plants, AS events most frequently result in the retention of an intron. Other possible splicing events are the use of an alternative donor or acceptor site, or the inclusion of an alternative exon (which is the most prevalent form of AS in animals). Exons may also be mutually exclusive, where the inclusion of one specific exon prevents the inclusion of another (Reddy 2007).

AS evolution has been studied to varying depths and at multiple levels of conservation. Many studies have investigated genomic conservation, which is the conservation of intron-exon structure across species, and event conservation, which is the conservation of an AS event at a particular homologous site across species. The majority of these studies have been conducted in animal systems (Irimia et al. 2009). One event conservation study that compared rice and A. thaliana found low levels of conservation (~9%), likely because of the evolutionary distance between these two species and the incomplete nature of the available expressed sequence tags (Wang and Brendel 2006).
The least studied area of AS evolution is regulatory conservation of AS events, which concerns homologous AS events that occur in the same stage of development or under the same environmental conditions across species or populations (Irimia et al. 2009). Outside of plants, regulatory event conservation has been studied in yeast, mice and primates, and these studies found high conservation between the closely related species examined (Rukov et al. 2007; Calarco et al. 2007; Blekhman et al. 2010; Harr and Turner 2010). The only investigations of regulatory conservation in plants have been with A. thaliana ecotypes using a microarray platform. A substantial number of different intron retention events were found between the two ecotypes studied. The differences were attributed to cis regulatory divergence based on splicing patterns of F1 hybrids (Zhang and Borevitz 2009). Investigating the conservation of AS, especially regulatory AS, across and within species will give new insights into its evolutionary importance and role.

1.2.3 Investigating AS with high-throughput sequencing

The advent of high-throughput sequencing technologies has allowed both the identification and quantification of AS simultaneously on a genome-wide scale (Harr and Turner 2010; Blekhman et al. 2010; Pan et al. 2008; Zhang et al. 2010). For transcriptome analysis, short fragments of complementary DNA derived from mRNA are sequenced (RNA-seq), and the resulting reads are aligned to a reference sequence. The reads are in theory a random sample of the mRNA molecules that were isolated, so the number of reads at a given location of the genome represents the abundance of transcripts originating from that site. The number of reads aligning to a gene can be used to quantify gene expression level, and similarly the distribution of reads across a gene can be used to discover and/or quantify the AS of a gene.
Quantification of gene expression levels is usually accomplished by normalizing the number of reads aligning to a gene by the length of the gene, as longer genes will have more reads, and by the total number of reads that aligned to genes in that library (Mortazavi et al. 2008). A variety of methods and programs have been developed for the quantification of AS, although no consensus has emerged on how best to analyze AS with high-throughput sequencing (Filichkin et al. 2010; Trapnell et al. 2010; Blekhman et al. 2010; Harr and Turner 2010).

Methods for the quantification of AS from RNA-seq data fall into two general categories: isoform quantification and event quantification. Quantification of isoform levels attempts to assign reads to one of a list of whole isoforms. These approaches must address the fact that many reads cannot be unambiguously assigned to an isoform, so some methods use a likelihood approach to assign ambiguous reads, while others simply ignore these reads (Trapnell et al. 2010; Song et al. 2009; Richard et al. 2010; Nicolae et al. 2011). Isoform quantification also requires a set of isoforms to which the reads must be assigned. For most species the list of known isoforms is almost certainly incomplete, which can result in incorrect estimates of the level of AS for the other isoforms. Novel isoform prediction using RNA-seq may help address this problem, but it is not without its limitations (Trapnell et al. 2010). The accuracy of isoform prediction and quantification may suffer when using short, single-end reads or libraries with low depth of coverage. Event quantification, on the other hand, does not suffer from the problems associated with isoform quantification, and can be accomplished by counting the number of reads supporting each AS event.
However, one limitation of event quantification is that one does not necessarily know which splicing events have occurred upstream and downstream in the rest of the transcript from which the event arose, so it is not always possible to fully assess its functional impact.

Alignment approaches for most AS quantification methods are similar in that two rounds of alignment are conducted: the first round attempts to align all reads in their entirety, and the second uses the reads left unaligned by the first round to connect potential splice junctions. Split reads, which are reads that span exon-exon boundaries and appear split in an alignment, can be used to identify the exact location of the exon boundaries (Trapnell et al. 2009). Although counting reads at these boundary regions may appear to be the most intuitive way to quantify AS, these reads are not useful in isolation, as they represent a small percentage of the total number of reads originating from the occurrence of that event and so suffer more from sampling stochasticity (Harr and Turner 2010). Given the many different methods for AS quantification, the choice of analysis should be based on the quality of the reference annotation, the length and quality of the reads, the number of reads generated, and the goals of the project. The RNA-seq data can also be used to identify novel events in addition to events found in the annotation, or they can be used without any annotation or exon boundary information (Trapnell et al. 2009). Using RNA-seq to identify new splicing sites is often undertaken with a threshold approach, where events supported by a certain number of reads are considered true events, but more sophisticated methods have been proposed that take into account the quality of the reads along the length of each read (Wang et al. 2010).
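The quantification ideas in this section can be illustrated with a minimal Python sketch. It is not the procedure used in this thesis: the function names, the simple event-level ratio, and the default threshold of three reads are all assumptions chosen for illustration. The first function applies the length and library-size normalization of Mortazavi et al. (2008), the second expresses an AS event as the fraction of reads that support it, and the third applies a read-count threshold to candidate novel junctions.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    kilobases = gene_length_bp / 1e3
    millions = total_mapped_reads / 1e6
    return read_count / kilobases / millions


def event_level(event_reads, other_reads):
    """Fraction of reads at a site that support the AS event.

    Hypothetical measure for illustration: event_reads support the
    alternative form, other_reads support the constitutive form.
    Returns None when no reads cover the site.
    """
    total = event_reads + other_reads
    if total == 0:
        return None
    return event_reads / total


def supported_junctions(junction_counts, min_reads=3):
    """Keep candidate junctions backed by at least min_reads split reads.

    min_reads=3 is an arbitrary illustrative threshold.
    """
    return {j: n for j, n in junction_counts.items() if n >= min_reads}


# Example: a 2 kb gene with 500 reads in a library of 10 million mapped reads.
expression = rpkm(500, 2000, 10_000_000)
# Example: 30 reads support intron retention, 70 support the spliced form.
retention_level = event_level(30, 70)
# Example: hypothetical junction identifiers mapped to split-read counts.
kept = supported_junctions({"chr1:100-250": 1, "chr1:400-900": 5})
```

Because an event-level ratio of this kind rests on comparatively few reads, it is more exposed to sampling noise than a gene-level measure, which is the concern raised above for junction-spanning reads.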
1.2.4 AS regulation

Arabidopsis and rice have an average of 5 and 4 introns per gene, respectively, each of which may be fully spliced out or undergo one of many possible AS events. Intron excision is carried out by a large multi-protein complex, the spliceosome, that requires direction to precise locations on the precursor mRNA molecule. This is accomplished by the coordination of additional proteins. Several lines of evidence suggest that splicing in plants is largely intron-defined. The majority of plant and animal introns are bordered by a GT-AG consensus splicing sequence, with less than 1% of introns being bordered by non-consensus sequences. Introns in plants also have a higher GC content, a polypyrimidine tract near the 3' end, and a branch point for recognition by a specific component of the spliceosome (Reddy 2007). AS occurs when the spliceosome fails to excise an intron or uses an alternate donor or acceptor site. The regulation of AS is developmentally and environmentally controlled, as precise splicing and tight regulation are required for normal growth and development (Blencowe 2006; Reddy 2007).

The mechanisms involved in the regulation of AS are extremely complex and there is still much to be understood about them (Matlin et al. 2005; Luco et al. 2011). The precision of splicing events depends upon both the cis-acting sequences on the target pre-mRNA and the trans-acting regulatory proteins that recruit splicing machinery. Specific sequences encoded on the pre-mRNA control spliceosomal targeting in several ways. Other cis-acting regulatory sequences that affect splicing may either enhance or inhibit splicing, and can be found in both introns and exons (Reddy 2007). Computational studies have found that the presence or absence of these different enhancer or inhibitor motifs has some predictive power with regard to whether or not AS will occur at that site.
There are also known exceptions to this rule where enhancer motifs may act to inhibit splicing and vice versa (Nilsen and Graveley 2010). The level of conservation at synonymous substitution sites, which may act as splicing signals in exons of mammalian species, is indicative of whether that exon undergoes AS (Lu et al. 2009). These synonymous sites do not affect which amino acids get incorporated into the final protein, but are thought to play a role in splicing specificity, as they may act as splicing enhancers or inhibitors. In animals, progress has been made with regard to understanding the role of various sequence motifs on pre-mRNA transcripts and using these motifs to predict how a transcript will be spliced (Barash et al. 2010). It is noteworthy that the many factors involved in AS regulation create a number of selective forces that act at a given site. RNA molecules must encode both protein sequences and splicing information, so gene-encoding DNA sequences are under dual selective pressure (Xing and Lee 2006; Nilsen and Graveley 2010).

Although splicing can be regulated in a tissue-specific manner, few tissue-specific regulators have been identified. Most known regulators are expressed constitutively and exhibit small fluctuations in expression levels across different tissues. The level of splicing is determined by the interaction of many competing regulatory proteins at the splice sites. The SR proteins and hnRNPs are well studied, and are thought to bind RNA and recruit the spliceosome. These factors are thought to either promote or inhibit splicing by binding enhancer or inhibitor sequence elements. The SR proteins are normally associated with positive regulation and hnRNPs with negative regulation, although there are known exceptions to this convention. The amount of these different proteins in the cell at a given time probably plays a large part in the control of developmental or stress-specific splicing (Nilsen and Graveley 2010).
More recently, several additional factors have come to light as important in the regulation of AS. Spliceosomal activity can happen cotranscriptionally, and presumably so can AS regulation. The rate of transcription of a gene, or even the rate of transcription of a specific exon or intron, may affect its accessibility to the spliceosome or to various AS regulators. Similarly, the secondary structure of the RNA molecule may also play a role (Nilsen and Graveley 2010). The state of local chromatin and histone modifications may also be important. Surveys of a large number of histone modifications show that they are not randomly distributed across exons and introns, implicating these modifications in some role in splicing (Luco et al. 2011). For example, the tight regulation of AS of the human fibroblast growth factor receptor 2 gene likely involves changes in its histone methylation status (Luco et al. 2010). Many questions remain about the importance of epigenetic modifications in regulating AS, including how many genes are affected (Luco et al. 2011).

1.3 AS in hybrids

Given the complex regulation of AS, and the fact that many aspects of gene regulation are affected by hybridization, it is possible that the AS of many genes is also affected by hybridization. When cis or trans splicing regulatory features of diverged lineages are brought together in a hybrid, their interaction could result in differential splice site recognition and the creation of new mRNA isoforms relative to the parents. As the expression levels of many genes are altered in hybrids, some of the genes that are involved in splicing regulation, such as the SR genes or hnRNPs, may have different expression profiles, which could result in novel splicing events. Hybrids can also have different chromatin states than their parents, which may be important in AS regulation (Luco et al. 2011). Two examples of novel interspecific AS events in hybrids have been found.
The first case observed was in a hybrid derived from A. thaliana and A. lyrata in which a gene involved in self-incompatibility exhibited altered AS (Nasrallah et al. 2007). The only systematic search for different AS patterns in an interspecific hybrid assayed 40 genes in a hybrid of Populus trichocarpa and P. deltoides. Of these genes, 2 were found to have novel AS events compared to the parents (Scascitelli et al. 2010). As for intraspecific hybrids, exon and intron levels were assayed in hybrids of A. thaliana using a tiling microarray platform, but these studies only categorized AS in the hybrids as additive or dominant (Zhang et al. 2008b; Zhang and Borevitz 2009). Gene expression data from a study of rice hybrids present an excellent resource to examine AS in hybrids in more detail and at a larger scale than previously possible. He et al. (2010) used an Illumina platform to generate RNA-seq data, as well as data on epigenetic marks, in two rice subspecies, O. sativa ssp. japonica (cv Nipponbare) and O. sativa ssp. indica (cv 93-11), and their reciprocal hybrids. This group's interest was in the interplay of altered expression levels with small RNAs and epigenetic modification in these hybrids (He et al. 2010). In the current study, this data set was used to investigate various aspects of AS between these subspecies and the hybrids.

With this study the following questions were addressed:
1) What is the amount of regulatory conservation of AS between these subspecies?
2) What is the frequency of non-additive AS levels in the hybrids?
3) How many, and what type of, AS events happen at levels in the hybrid that do not occur in either parent, and how many of those are potentially novel to the hybrid?
4) What are the possible functional consequences of these AS events in the hybrids?
5) What, if any, is the involvement of post-translational histone modifications in differences in AS between these genotypes?

2. Methods

2.1 Data sources

The raw Illumina reads from He et al. (2010) were obtained from the NCBI Short Read Archive under accessions SRR034580-SRR034599 for the RNA-seq reads and SRR034622-SRR034661 for the ChIP-seq reads. For each of the parents and both reciprocal hybrids, this data set includes sequences of polyadenylated RNA and chromatin immunoprecipitation sequencing (ChIP-seq) experiments with antibodies for three different post-translational histone modifications. All of the molecular extractions were made from pools of 10 plants, which were split into technical replicates. For the RNA-seq, each of the parental lines had four Illumina lanes and each of the hybrids had six on a Genome Analyzer. The genome sequence and annotation of O. sativa ssp. japonica were obtained from the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/). Gene families were obtained from PLAZA 2.0 (Proost et al. 2009).

2.2 Alignment

All reads were aligned to the japonica reference genome (Goff et al. 2002), excluding scaffolds outside the 12 chromosomal sequences. The RNA-seq reads were aligned using the program TopHat (Trapnell et al. 2010) with no multimapping reads allowed and up to three mismatches per read. The mismatch threshold was increased from the default of two mismatches to reduce the bias against reads originating from indica alleles. Novel junction sites were allowed in addition to the annotated junction sites, using a maximum allowable intron length of 100,000 nucleotides. The ChIP-seq reads were aligned with the short read aligner Bowtie (Langmead et al. 2009) using settings similar to those used for the RNA-seq alignments. Peaks representing histone locations were called using FindPeaks (Fejes et al. 2008) and assigned to genes using R.

2.3 Calling events

A set of AS events that were likely to be occurring in this data set was generated and used throughout the study.
This set of AS events was composed by combining the genome annotation with novel splice sites retrieved from the RNA-seq alignments using custom Perl scripts. All isoforms of a gene were compared, and any region that did not appear in all of the isoforms was called as an AS event. The events were classified as intron retention, exon skipping, alternative donor, alternative acceptor, or alternative first or last exon events. Additional novel events were identified from the split reads, which are reads that originate from exon-exon boundaries and align partially to two places in the genomic sequence. If three reads from three flow cells supported a given splicing site, the corresponding event was added to the pool of potential events. All possible intron events were considered except introns that had an annotated skipped exon within them. Only events supported by at least three reads in at least one of the replicates were considered for quantitative analysis.

2.4 AS quantification

For the events identified, the relative levels of AS were quantified for each replicate of each genotype. This was accomplished by counting the number of reads that unambiguously supported each event, meaning reads in a region of the transcript that would only be found following that AS event. Intron retention events were quantified by counting all the reads that would have originated from each particular intron retention event; these include reads falling totally within an intron, reads overlapping the 5' exon-intron boundary, and reads overlapping the 3' boundary. Alternative donor and acceptor events were counted using only reads within the alternative region of the intron, reads that overlap the exon-intron boundary, and split reads found at the alternative site. All reads falling in a skipped exon, as well as split reads at the boundaries of that exon, were counted in support of the exon skipping event.
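The read-counting rule for intron retention described above can be illustrated with a minimal sketch. This is not the custom script used in the study, which is not shown; the read representation (a list of aligned blocks per read, with inclusive coordinates) and the function name are assumptions made only for illustration. A split read that spans the intron has no block overlapping it and so is not counted.

```python
def count_intron_retention_reads(reads, intron_start, intron_end):
    """Count reads that support retention of one intron.

    reads: iterable of reads, each a list of (start, end) aligned blocks
           with inclusive coordinates (hypothetical representation).
    A read is counted if any of its aligned blocks overlaps the intron,
    which covers reads lying fully inside the intron and reads crossing
    either exon-intron boundary.
    """
    count = 0
    for blocks in reads:
        if any(start <= intron_end and end >= intron_start
               for start, end in blocks):
            count += 1
    return count


# Toy example: intron occupies positions 101-200.
example_reads = [
    [(120, 150)],             # fully inside the intron
    [(80, 110)],              # crosses the 5' exon-intron boundary
    [(190, 220)],             # crosses the 3' boundary
    [(60, 100), (201, 240)],  # split read spanning the intron (spliced)
    [(10, 50)],               # purely exonic
]
retained = count_intron_retention_reads(example_reads, 101, 200)
```

In this toy example the first three reads support retention, while the split read and the exonic read do not.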
The total number of reads aligned to each gene was also counted. To obtain the relative frequency of an AS event, taking into account that genes have different levels of coverage across the genotypes, the number of reads supporting that event (E) was divided by the total number of reads aligning to that gene (T) minus the number of reads supporting the event:

\[ \frac{E}{T - E} \]

The quantified AS levels were then arcsine transformed. The level of gene expression was calculated by normalizing the read count by gene length and by the total number of reads that aligned to genes in each library. The differences in the normalized AS event levels between the genotypes were estimated using a linear modelling approach and pairwise Tukey tests. To correct for the large number of tests computed, the p-values from the ANOVA tests were corrected using the 'fdr' method and each Tukey test was corrected with the Bonferroni method. From the ANOVA and Tukey tests, the differences between genotypes were parsed out. The same approach, ANOVA followed by pairwise comparisons, was used with the gene expression data to find how genes are differentially expressed across the genotypes.

2.5 Functional analysis

The potential functional role of each event was investigated. GO enrichment analysis was carried out using the japonica annotation available from MSU, chi-squared tests, and an 'fdr' p-value correction. Similarly, enrichment for event type (intron retention, exon skipping, etc.) was investigated. Further gene structure analysis was carried out to investigate the potential functional consequences of the AS event. The number of nucleotides introduced into the transcript by the event was queried, and if it was not divisible by three and the event was in a protein coding region of a gene, then the event was scored as potentially causing a downstream frameshift.
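As a minimal sketch of two of the rules above, the following computes the relative AS level E / (T - E) and the frameshift flag. The function names are hypothetical and the code is an illustration under those assumptions, not the scripts used in the study; the arcsine transformation is omitted because its exact form is not specified in the text.

```python
def as_level(event_reads, gene_reads):
    """Relative AS level E / (T - E).

    event_reads (E): reads unambiguously supporting the event.
    gene_reads (T): all reads aligned to the gene, including E.
    """
    other_reads = gene_reads - event_reads
    if other_reads <= 0:
        # Assumption: the ratio is treated as undefined when no reads
        # remain outside the event.
        raise ValueError("T - E must be positive")
    return event_reads / other_reads


def may_cause_frameshift(inserted_nt, in_coding_region):
    """Flag an event as a potential downstream frameshift.

    An event is flagged when it lies in a protein coding region and the
    number of nucleotides it introduces is not divisible by three.
    """
    return in_coding_region and inserted_nt % 3 != 0


level = as_level(10, 110)                 # 10 / (110 - 10)
shift_a = may_cause_frameshift(100, True)   # 100 is not a multiple of 3
shift_b = may_cause_frameshift(99, True)    # 99 is a multiple of 3
shift_c = may_cause_frameshift(100, False)  # outside the coding region
```

Dividing by T - E rather than by T means the measure is a ratio of event-supporting reads to all other reads from the same gene, so it is not bounded above by 1.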
PTCs were identified in the sequence of each AS event if the event occurs in an annotated protein coding region and if in-frame according to an annotated upstream exon. PTCs were classified as potential targets of NMD if found >55 nt from the last exon boundary and >300 nt from the end of the transcript, considering all annotated transcripts (Kurihara et al. 2009).

3. Results

3.1 Read alignment and AS quantification

Analysis of AS events in the two parental subspecies and their reciprocal hybrids began with establishing which events would be investigated. The Illumina RNA-seq reads from He et al. (2010) were obtained and aligned to the japonica genome. Of the over 100 million RNA-seq reads obtained, an average of 73% aligned to the reference sequence. Of the reads that aligned, an average of 89% aligned to annotated genes (Table 1). A total of 20,855 genes were detected as expressed with at least three reads in one of the genotypes. By combining the current annotation and novel junction sites obtained from the RNA-seq alignments, a large pool of events that could potentially occur in the genome was created. This set of potential events was filtered for events that were detected in this data set, and a total of 24,256 events were retained; these form the basis of the remainder of the analysis. This data set was composed primarily of intron retention events (Table 2).

Table 1. Alignment of RNA-seq read libraries.
Run        Genotype           Total reads  Total aligned  Percent aligned  Total aligned to genes  Percent aligned to genes
SRR034580  Japonica           5642282      3978301        70.5             3514536                 88.3
SRR034581  Japonica           5428674      3840441        70.7             3391767                 88.3
SRR034582  Japonica           5068379      3571298        70.5             3157053                 88.4
SRR034583  Japonica           5109674      3610482        70.7             3192755                 88.4
SRR034584  Indica             5187951      3734784        72               3305923                 88.5
SRR034585  Indica             5233901      3733359        71.3             3306851                 88.6
SRR034586  Indica             4583793      3150753        68.7             2792266                 88.6
SRR034587  Indica             4641079      3197267        68.9             2832313                 88.6
SRR034588  Japonica x Indica  5198498      3903190        75.1             3460357                 88.7
SRR034589  Japonica x Indica  5027119      3799938        75.6             3368265                 88.6
SRR034590  Japonica x Indica  5056066      3788831        74.9             3363151                 88.8
SRR034591  Japonica x Indica  4231457      3184631        75.3             2821696                 88.6
SRR034592  Japonica x Indica  4341385      3246893        74.8             2878694                 88.7
SRR034593  Japonica x Indica  4251748      3160477        74.3             2801902                 88.7
SRR034594  Indica x Japonica  5663392      4235615        74.8             3770620                 89
SRR034595  Indica x Japonica  5415256      4032674        74.5             3588553                 89
SRR034596  Indica x Japonica  5365357      3985491        74.3             3548143                 89
SRR034597  Indica x Japonica  5079459      3784288        74.5             3369821                 89
SRR034598  Indica x Japonica  5055719      3685960        72.9             3282942                 89.1

Table 2. Types of AS events. The percentage of each type of event within each category is given in parentheses.
Event type              Investigated events  Different between sub-species  Dominant in one or both hybrids  Transgressive in one or both hybrids  Transgressive with no PTC and no frame shift
Exon skipping           3127 (12.9)          124 (6.1)                      112 (6.0)                        54 (6.6)                              72 (47.1)
Intron retention        17223 (71.0)         1309 (64.7)                    1205 (64.8)                      521 (63.3)                            30 (19.6)
Alternative acceptor    1170 (4.8)           92 (4.5)                       79 (4.2)                         37 (4.5)                              12 (7.8)
Alternative donor       1146 (4.7)           64 (3.2)                       59 (3.2)                         29 (3.5)                              8 (5.2)
Alternative first exon  684 (2.8)            160 (7.9)                      149 (8.0)                        45 (5.5)                              15 (9.8)
Alternative last exon   906 (3.7)            275 (13.6)                     255 (13.7)                       137 (16.6)                            18 (11.8)
Total                   24256                2024                           1859                             823                                   153

To compare these AS events across the genotypes, a quantitative measure of each event, which takes into account gene expression level (see Methods), was established using the RNA-seq data. Examination of the data, by Pearson correlations of the quantified AS levels across all the events, revealed that the parental lines (the different subspecies) are the most different and that the two hybrids are the most similar. There is considerable variance across the replicates, as correlations within each genotype vary from 71-78% (Fig. 1A), possibly due to the stochasticity associated with lower read counts of some AS events. For the AS events investigated, ANOVA tests revealed that for 6501 events there are some differences across the genotypes (p < 0.05). Based on the distribution of the p-values (Fig. 2A), many of these represent true positives, as the p-values would be evenly distributed between 0 and 1 under the null hypothesis. Because of the large number of tests carried out, the p-values were corrected using the fdr method, after which 3175 events remained significantly different across the genotypes at q < 0.05. As a diagnostic of this approach, the mean AS levels and the mean gene expression levels of genes with different AS across the genotypes were compared.
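The multiple-testing correction mentioned above can be sketched as follows, assuming that the 'fdr' method refers to the Benjamini-Hochberg procedure (this is what the 'fdr' option of R's p.adjust implements; the text does not state the software used for this step). The function name is hypothetical and the p-values are made-up toy numbers, not data from this study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values).

    Each p-value is scaled by n / rank (rank in ascending order), and a
    running minimum taken from the largest p-value downwards keeps the
    adjusted values monotone.
    """
    n = len(pvalues)
    # Indices ordered from the largest to the smallest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values for four hypothetical events.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = bh_adjust(toy_p)
n_significant = sum(q < 0.05 for q in toy_q)
```

Events would then be retained when their adjusted value falls below the chosen threshold, which is 0.05 in the analysis described above.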
AS events that differ across the genotypes, according to ANOVA, occur at a higher AS frequency than events that were not found to be significantly different among the genotypes (t-test p-value < 2.2e-16). The normalized expression levels of genes that have different splicing do not differ from those of genes that do not have different splicing (t-test p-value = 0.50, Fig. 2C). To establish the precise differences in AS between the genotypes, all pairwise comparisons of the genotype AS levels were made using Tukey exact tests. A similar trend is observed as with the correlation data (Fig. 1A), where the parental lines have the highest number of differences in AS events and the hybrids have the fewest (Fig. 1B).

Figure 1. AS event levels across the genotypes. A) Correlation of AS levels. Replicates are indicated by SRA accession number. AS levels are a function of the number of reads originating from a given event and the number of reads aligned to that gene (see Methods section for details). B) Number of shared events between each genotype. The number below and in bold indicates the number of events where the genotype on the x-axis has the higher frequency.

Figure 2. Analysis of AS mean differences. A) Distribution of p-values from ANOVA tests of 24,256 AS events to test if the genotypes investigated have different means. Distribution of event levels (B) and gene expression levels (C) for AS events which have different means across the genotypes (red) and those that do not (blue). The AS levels are the mean event frequency (see Methods) across the genotypes. Gene expression levels are the RPKM (reads per kilobase per million reads mapped to genes) mean of the genotypes.

3.2 Intraspecific variation in AS

Events that were quantitatively different between the subspecies, indica and japonica, were investigated first. There were a total of 2024 events that differed quantitatively between these subspecies at a q-value < 0.05.
These events are visualized in Fig. 3, where points that deviate from the diagonal represent events with different means; events that appear red and below the diagonal occur at a higher frequency in japonica (1284), and events that appear above the diagonal occur at a higher frequency in indica (757). The relationship between gene duplication and AS was investigated with this data set. Compared to genes with no difference in AS patterns between the subspecies, genes with different AS are slightly, but significantly, more often singletons or in medium-sized (5-10 members) gene families, and not in large gene families of >10 members. Variation in gene expression between the subspecies was also investigated, and 9395 of the 20,855 genes investigated were differentially expressed. More genes with differential expression also had differential AS than expected (Fig. 3C) (chi-square p-value < 0.001). Changes in gene expression and changes in AS event frequency were positively correlated (Fig. 3D, R = 0.467, p-value < 2.2e-16), where increases in gene expression levels are associated with increases in AS event frequency.

Figure 3. Intraspecific AS variation. A) Levels of AS events, as a function of the number of reads originating from that event and the number of reads coming from that gene, in the different subspecies. Significantly different (q < 0.05) events are in red. B) Family size of genes with different AS in the subspecies and those with the same AS. C) Events that are different between the subspecies are more likely to also have differences in gene expression (chi-squared p < 0.05). D) Differences in AS levels correlate with differences in expression between the subspecies (R = 0.467, p < 0.05).

3.3 AS in the hybrids compared to their parents

Using the information parsed from pairwise comparisons of all the genotypes, the AS events in the hybrids were classified into different modes of gene action: additive, where the hybrid AS levels are between or equal to both parents (example Fig. 4B); dominant, where the hybrid has AS levels equal to one parent but not the other (example Fig. 4C); or transgressive, where the hybrid AS levels are either higher or lower than both of the parents (example Fig. 4D). For the majority of the events that are different between the parents, the hybrids displayed a dominant mode of inheritance. It is important to note that several different patterns are grouped into each of these categories; for example, both high and low parent dominance are categorized as dominant. Similarly, there are several possible patterns grouped as transgressive; the parental lines could have the same or different AS and the hybrids could be higher or lower than the parents. The total number of events that fall into each group is summarized in Fig. 4A. In both hybrids, there are several hundred events that are classified as transgressive. Of the 2024 events that are quantitatively different between the parents, each was assayed in both hybrids (for a total of 4048), and 643 were found to be additive, 3183 were dominant and 222 were transgressive. The remaining transgressive events are found in genes where there was no difference in the parents (506 events in japonica x indica and 336 events in indica x japonica). The majority of the transgressive events, 605, happen at a higher frequency in the hybrids than they do in the parents, as opposed to occurring at a lower level (459 events). Of the events that are transgressive in one hybrid, 241 are also transgressive in the other, leaving 582 that are transgressive in only one of the hybrids.

Figure 4. Modes of AS inheritance.
A) Categorization of all events investigated in the hybrids into additive, dominant or transgressive. Percentage of total events in each category for a given hybrid is given in brackets. Examples of each event type are shown in B-D. B) Additive AS for an alternative last exon event in LOC_Os01g36950. C) Dominant AS in a second intron event in LOC_Os03g15460. D) Transgressive AS in an alternative donor in the last intron of LOC_Os07g01030.

Table 3. Shared AS modes between the hybrids. Similarities in classification of events in both hybrids.

                                   Japonica x Indica
Indica x Japonica   Additive   Dominant   Transgressive
Additive            21674      244        326
Dominant            191        1324       77
Transgressive       156        23         241

Table 4. Characteristics of transgressively spliced events. Events classified as transgressive in either hybrid were evaluated as to whether or not they would cause a downstream frame shift if translated and whether they have any PTC that could make the transcripts a target for NMD.

                        No frame shift  Frame shift  Total
No PTC                  153             297          450
Likely NMD target       135             230          365
Not likely NMD target   2               6            8
Total                   290             533

Of the events that have transgressive levels of AS in the hybrids, 48 are potentially novel events that do not occur in the parents or in the japonica annotation. It is possible that these events might be detected in the parental lines if different thresholds were used. These events happen in genes with a diverse variety of functions. All of these events are intron retention. Most have in-frame PTCs and/or would cause frame shifts downstream and could be targets for NMD. In one gene, a putative protein kinase, this novel AS could lead to an additional protein product (Table 5). For all of the transgressive alternatively spliced genes, GO enrichment analysis found that in the japonica x indica hybrid there is an enrichment for genes involved in generation of precursor metabolites and energy (GO:0006091), carbohydrate metabolism (GO:0005975), and photosynthesis (GO:0015979).
Transgressively spliced genes in the indica x japonica hybrid had more genes that are plastid components (GO:0009536) than would be expected. In both hybrids, fewer of the transgressive events are intron retention events compared to all the events investigated (Table 2) and more are alternative last exons (16.6%) than expected (chi-squared, p-value < 0.05). Of the 823 events that are transgressively spliced in one or both of the hybrids, the majority would cause downstream frame shifts. A large number of events, 365, have an in-frame PTC and are likely targets of NMD, although this may be an underestimate, as this method does not take into account PTCs introduced downstream of the AS event by frame shifts (see Methods). Some transgressive events, 153, would not cause a frame shift if translated and do not have any in-frame PTC. As such, these events could result in the creation of multiple protein products (Table 4). A large number of these events are exon skipping (Table 2).

Figure 5. GO enrichment of transgressively spliced genes. Significantly enriched categories in each group are highlighted in yellow.

Table 5. Novel AS events in one or both hybrids.
Gene            Location                 Event                  Gene region  PTC  Frame shift  Description
LOC_Os01g42260  chr01 23941273-23941828  10th intron retention  Coding       Yes  Yes          transcriptional corepressor LEUNIG, putative
LOC_Os01g05800  chr01 2770627-2770852    4th intron retention   Coding       Yes  Yes          inner membrane protein, putative
LOC_Os01g05800  chr01 2771785-2771938    8th intron retention   Coding       Yes  Yes          inner membrane protein, putative
LOC_Os01g50830  chr01 29192131-29192491  3rd intron retention   Coding       Yes  Yes          expressed protein
LOC_Os01g58390  chr01 33732588-33733188  4th intron retention   UTR          No   Yes          haloacid dehalogenase, putative
LOC_Os01g60230  chr01 34835628-34835768  5th intron retention   Coding       Yes  No           auxin efflux carrier component, putative
LOC_Os01g07590  chr01 3648449-3648618    7th intron retention   Coding       Yes  Yes          universal stress protein domain containing protein, putative
LOC_Os01g07790  chr01 3739359-3739457    10th intron retention  Coding       Yes  No           polygalacturonase, putative
LOC_Os01g66000  chr01 38305783-38305866  3rd intron retention   Coding       Yes  No           NADH dehydrogenase I subunit N, putative
LOC_Os01g66240  chr01 38478522-38478657  10th intron retention  Coding       Yes  Yes          mitochondrion protein, putative
LOC_Os01g11946  chr01 6504935-6505678    1st intron retention   Coding       Yes  No           ABC transporter, ATP-binding protein, putative
LOC_Os02g04710  chr02 2129365-2129462    16th intron retention  Coding       Yes  Yes          cycloartenol synthase, putative
LOC_Os02g39795  chr02 24043211-24043240  4th intron retention   Coding       Yes  No           S-adenosyl-l-methionine decarboxylase leader peptide, putative
LOC_Os02g46700  chr02 28514427-28514461  3rd intron retention   Coding       Yes  Yes          GCN5-related N-acetyltransferase, putative
LOC_Os02g47120  chr02 28755429-28755850  1st intron retention   Coding       Yes  Yes          region found in RelA/SpoT proteins containing protein
LOC_Os02g52430  chr02 32082909-32083980  1st intron retention   UTR          Yes  Yes          regulator of ribonuclease, putative
LOC_Os02g57160  chr02 34999856-34999986  11th intron retention  UTR          Yes  Yes          ELMO/CED-12 family protein, putative
LOC_Os02g08120  chr02 4297842-4298756    2nd intron retention   Coding       Yes  No           calmodulin binding protein, putative
LOC_Os03g21740  chr03 12424549-12424631  17th intron retention  UTR          Yes  Yes          valyl-tRNA synthetase, putative
LOC_Os03g03920  chr03 1792554-1792661    2nd intron retention   Coding       Yes  No           Ubiquitin family domain containing protein
LOC_Os03g49600  chr03 28237062-28237947  1st intron retention   UTR          Yes  No           Os3bglu7 - beta-glucosidase, exo-beta-glucanse
LOC_Os03g53670  chr03 30771213-30771319  1st intron retention   Coding       No   Yes          YT521-B-like family domain containing protein
LOC_Os03g55820  chr03 31773651-31774004  1st intron retention   Coding       Yes  No           thioredoxin, putative
LOC_Os03g63480  chr03 35855509-35855613  4th intron retention   Coding       Yes  No           ankyrin repeat domain containing protein
LOC_Os03g11570  chr03 5999165-5999711    2nd intron retention   Coding       Yes  Yes          proteasome subunit, putative
LOC_Os04g28620  chr04 16778790-16779156  9th intron retention   Coding       Yes  Yes          male sterility protein, putative
LOC_Os04g31790  chr04 18865204-18865301  6th intron retention   Coding       Yes  Yes          expressed protein
LOC_Os04g32540  chr04 19422231-19422612  7th intron retention   Coding       Yes  Yes          OsSCP24 - Putative Serine Carboxypeptidase homologue
LOC_Os04g37710  chr04 22244100-22244210  1st intron retention   Coding       Yes  No           serine hydrolase domain containing protein
LOC_Os06g41930  chr06 25157792-25157935  2nd intron retention   UTR          Yes  No           zinc-binding protein, putative
LOC_Os06g06780  chr06 3196477-3197280    3rd intron retention   Coding       Yes  No           harpin-induced protein, putative
LOC_Os06g17220  chr06 9974678-9975840    1st intron retention   Coding       Yes  Yes          UDP-glycosyltransferase, putative
LOC_Os07g17970  chr07 10640675-10641138  3rd intron retention   Coding       Yes  Yes          AMP-binding domain containing protein
LOC_Os07g19550  chr07 11584165-11584341  1st intron retention   Coding       Yes  No           expressed protein
LOC_Os07g31460  chr07 18645085-18645775  4th intron retention   Coding       Yes  Yes          peptide-Nasparagine amidase, putative
LOC_Os07g44004  chr07 26301797-26302312  1st intron retention   UTR          No   No           expressed protein
LOC_Os07g49390  chr07 29579945-29580042  11th intron retention  UTR          Yes  Yes          P-protein, putative, expressed
LOC_Os08g24770  chr08 14995445-14995582  1st intron retention   Coding       No   No           protein kinase, putative
LOC_Os08g28190  chr08 17202202-17202272  15th intron retention  UTR          No   Yes          actin, putative
LOC_Os08g29750  chr08 18280354-18280866  1st intron retention   Coding       Yes  No           hypothetical protein
LOC_Os08g42610  chr08 26931564-26932337  1st intron retention   UTR          No   No           pentatricopeptide, putative
LOC_Os08g10244  chr08 5949942-5950168    2nd intron retention   UTR          No   Yes          retrotransposon protein, putative, unclassified
LOC_Os09g33690  chr09 19902937-19902993  4th intron retention   Coding       Yes  No           Os9bglu32 - beta-glucosidase homologue, similar to G. max hydroxyisourate hydrolase
LOC_Os09g34130  chr09 20151775-20151856  5th intron retention   Coding       Yes  Yes          GRAM domain containing protein
LOC_Os09g04440  chr09 2354440-2354496    6th intron retention   Coding       Yes  No           DNA-binding protein, putative

3.4 Changes in AS and histone marks

There is a growing body of research investigating the connection between AS and the epigenetic status of genes (Luco et al. 2011). Using ChIP-seq data, which assayed three histone modifications (H3K4me3, H3K27me3 and H3K9ac), changes in histone modifications were assessed in these same rice genotypes. Following alignment of the reads and peak calling using Bowtie and FindPeaks, respectively, the amount of coverage for each histone mark across each gene was measured. For all the genotype comparisons, the change in each AS event frequency (for differentially regulated AS events) was correlated with the difference in the coverage of each gene by each of the different histone marks. Changes in all three histone marks are positively correlated with changes in AS between the parental lines. The same correlation is not found in any comparison using either of the hybrids at p < 0.05 (Table 6).

Table 6.
Changes in AS and changes in histone marks. Pearson correlation between the change in AS, for events which are different between two genotypes, and the change in the percent of that gene which is covered by a given histone mark. R values are given below the diagonal and the associated p-values above the diagonal.

H3K4me3            Japonica  Indica    Japonica x Indica  Indica x Japonica
Japonica                     1.34e-11  0.70               0.98
Indica             0.15                0.21               0.07
Japonica x Indica  0.01      0.03                         0.60
Indica x Japonica  0.00      0.05      -0.02

H3K9ac             Japonica  Indica    Japonica x Indica  Indica x Japonica
Japonica                     5.28e-06  0.58               0.71
Indica             0.10                0.42               0.68
Japonica x Indica  0.01      0.02                         0.86
Indica x Japonica  0.01      0.01      -0.01

H3K27me3           Japonica  Indica    Japonica x Indica  Indica x Japonica
Japonica                     4.24e-07  0.34               0.82
Indica             0.11                0.42               0.57
Japonica x Indica  0.02      0.02                         0.59
Indica x Japonica  -0.01     0.01      -0.02

4. Discussion

There is considerable interest in understanding the underlying molecular mechanisms of hybrid vigour (Chen 2010; Birchler et al. 2010). To be able to investigate all routes that may lead to heterosis, we must first know what types of genomic and transcriptomic changes can occur in hybrids. Here, a transcriptome-wide study investigating thousands of AS events has revealed that AS is affected by hybridization at an appreciable rate and that AS can happen at greater or lower frequencies in the hybrids than in either parent, depending on the gene.

4.1 Differences in AS among the genotypes

Different AS levels were found across the genotypes for 3175 events. The events that are called as significantly different between the genotypes have higher frequencies of AS than those which are not found to be significant (Fig. 2B), which suggests that our ability to detect differences between the genotypes decreases with a lower frequency of the events. This may be due to the stochasticity associated with lower read counts, which would be the case for many low-frequency AS events.
Another diagnostic of this approach is to compare the mean gene expression levels for genes that were found to have different AS and those that were not. These two groups have the same distribution of mean expression levels (Fig. 2C), suggesting that this method of correcting AS by gene expression level is viable, as the genes with different AS represent a random sampling of genes with regard to gene expression levels. For the AS events which have different means across the genotypes, pairwise comparisons revealed in which way each of the genotypes differed from one another (Fig. 1B).

4.2 Transgressive AS following hybridization

By comparing AS levels of the parental lines to each of the hybrid lines, each AS event was categorized into a different mode of inheritance for AS, as has been done previously for gene expression in hybrids (Swanson-Wagner et al. 2006) (Fig. 4A). The majority of events were categorized as additive, where the hybrid AS level was at or between the parental levels. There was a substantial number of events which had non-additive levels in the hybrids. The majority of these events exhibited a dominant form of inheritance, where the AS levels in the hybrid are at the same level as one parent but not the other. There are a sizable number of events where one or both hybrids have higher or lower AS levels than either parent. It is difficult to calculate the precise rate at which transgressive AS happens, as the frequency found here may be affected by a number of parameters, including the criteria used to decide whether to investigate an AS event. Although the hybrids are the most similar genotypes, in terms of the number of events where they have the same level of AS, many of their events are categorized differently in terms of being additive, dominant or transgressive (Table 3). This could be due to parental or cytoplasmic effects, or other factors. There are also 48 events that appear to be unique to the hybrids.
These events, which are all intron retention, occur in genes involved in a large number of functions (Table 5). Novel splicing events were also observed in interspecific Populus hybrids (Scascitelli et al. 2010). These events could be occurring due to novel interactions of regulatory elements and divergent target sequences. Novel splicing, even without creating additional proteins, could have a functional impact, possibly through NMD lowering the total level of expression of the gene.

The transgressive events may have various effects on the phenotype of the hybrids. It is possible that the altered AS frequencies of many genes make incremental contributions to various cellular processes, the outcome of which has a net impact on growth and vigour. In the japonica x indica hybrids, more of the genes with transgressive AS are involved in the processes of generation of precursor metabolites and energy, carbohydrate metabolism, and photosynthesis than one would expect based on all the genes investigated. There are several plausible pathways by which altered AS of genes in these functional groups could contribute to heterosis, given that carbohydrate metabolism has previously been shown to be important in heterosis (Ni et al. 2009). These GO categories are not found to be enriched in the other hybrid, indica x japonica, where only plastid components are enriched. Fewer genes are transgressively spliced in this hybrid, which may prevent enrichment in other groups from being detected. It is also possible that a change in a single AS event could have a large impact. Of the transgressively spliced events, 153 could possibly result in altered proteins being made, as they do not introduce an NMD-inducing PTC and do not cause a frameshift downstream. A large number of the remaining transgressive events could affect gene function without altering the protein sequence, possibly through the NMD pathway (Table 4).
Coupling AS to NMD may be responsible for regulating the final transcript levels of a large number of genes (McGlincy and Smith 2008). Any one of these events could have a significant impact on phenotype, although establishing the importance of each would require further research. These events could be changing the proteome by altering the type and level of proteins in the cell, as well as possibly introducing proteins that do not appear in either parent.

4.3 Intraspecific variation of AS

The largest difference between any two of the genotypes is between the two subspecies, japonica and indica (Fig. 1B). Of the 24,256 AS events investigated, 2,024 have different frequencies in these subspecies. High-throughput sequencing has previously been used to look at intraspecific and interspecific variation of AS in animals (Harr and Turner 2010; Blekhman et al. 2010). The rate at which AS differs between these subspecies (~8%) is similar to the level of variation among mouse subspecies (6.5%) (Harr and Turner 2010). This level of within-species variation is comparable to that observed in A. thaliana ecotypes (Zhang and Borevitz 2009). Variations in AS have been associated with phenotypic differences within a species (Yuan et al. 2009), so the variation in AS observed here in over 2000 genes may play an important role in the phenotypic differences between these subspecies. The quantity of transcripts made for each gene is an important aspect of gene regulation. Many genes demonstrated different levels of mRNA expression between the japonica and indica subspecies (He et al. 2010). Using an approach for gene expression similar to that used for AS, a large number of genes were found to have different expression patterns between these subspecies (see Results). Genes that have changes in AS also have changes in gene expression more frequently than expected (Fig.
3C), and the difference in gene expression level correlates with the difference in the AS frequency (Fig. 3D). One possible explanation for this observation is that gene expression and AS may have common underlying regulatory mechanisms, such as epigenetic modifications, that affect them both. It is also possible that some genes undergo rapid evolution, either via selection or drift, of both gene expression and AS.

4.4 Potential mechanisms responsible for AS variation

What are the potential molecular mechanisms underlying the differences in AS observed across these genotypes? There are several ways in which AS could be differentially regulated in these subspecies. Differences in cis- and trans-regulatory mechanisms could be responsible. This could be investigated with a cis-trans analysis, which has previously been used to assess regulatory divergence of gene transcript expression levels (McManus et al. 2010). In the majority of events where there are differences between the parental subspecies, the hybrids exhibited a dominant mode of inheritance, suggesting that trans-acting factors may be responsible for the observed differences. An ideal cis-trans analysis would investigate allele-specific AS levels; unfortunately, there are not enough informative single nucleotide polymorphisms (SNPs) in the appropriate locations to accomplish this using this data set. Future studies may be able to address this using more divergent lineages, or with an eQTL approach as was recently implemented in human cell lines (Pickrell et al. 2010). Another likely factor in the regulation of AS is the epigenetic status of the gene in question (Luco et al. 2011). The observed correlation between differences in AS and differences in the histone marks H3K4me3, H3K27me3, and H3K9ac in the different subspecies (Table 6) suggests that epigenetic regulation is important in splicing.
However, differences in AS are also correlated with differences in gene expression in the parental lines (Fig. 3D), thus confounding the relationship between histone modifications and AS. It is, however, tempting to speculate, and recent work supports the idea (Luco et al. 2011), that changes in AS patterns can be caused in part by differences in histone states. Transgressive AS events are one of many molecular processes that are affected by hybridization. Combining divergent regulatory hierarchies, which include all the cis and trans regulatory mechanisms responsible for the differences between the subspecies, could result in novel interactions and altered AS. Such cis by trans interactions could potentially be detected using an RNA-seq approach with sufficient read depth and informative SNPs. However, the majority of transgressive events occur where the parental genotypes do not differ, so divergent regulatory mechanisms may not be solely responsible. Other changes, such as changes in histone modifications or gene expression, may be involved in the observed changes in AS. Changes in the levels of AS regulators, for example the SR proteins, could result in different AS levels. Another possibility is that the changes in AS are associated with changes in gene expression, polymerase rates or the epigenetic differences across the genotypes. Although correlations between differences in AS and histone marks are observed in the parental subspecies, these correlations are not found in the hybrids. It is possible that histone modifications do not play a role in the transgressive AS in the hybrids. It is also possible that the correlation is not observed because there are relatively few non-additive histone modifications in the hybrids (He et al. 2010).

4.5 Conclusions

There is considerable variation in AS across these rice genotypes. Within this species there are thousands of genes that undergo different levels of AS.
Combining the genomes of these subspecies results in hundreds of events happening at levels not found in either parent. It may also result in many new AS events occurring, as previously observed on a smaller scale (Scascitelli et al. 2010). High-throughput sequencing technologies will allow this question to be investigated in a wide array of hybrids, as well as allopolyploids. Further research, at population and molecular levels, will be required to fully understand the extent, impact and regulatory underpinnings of these AS events and their role in heterosis.

References

Adams, K.L. 2007. Evolution of duplicate gene expression in polyploid and hybrid plants. The Journal of Heredity 98: 136-41.
Adams, K.L., Percifield, R., and Wendel, J.F. 2004. Organ-specific silencing of duplicated genes in a newly synthesized cotton allotetraploid. Genetics 168: 2217-26.
Adams, K.L., and Wendel, J.F. 2005. Allele-specific, bidirectional silencing of an alcohol dehydrogenase gene in different organs of interspecific diploid cotton hybrids. Genetics 171: 2139-42.
Arnold, M.L., and Martin, N.H. 2010. Hybrid fitness across time and habitats. Trends in Ecology & Evolution 1-7.
Barash, Y., Calarco, J.A., Gao, W., Pan, Q., Wang, X., Shai, O., Blencowe, B.J., and Frey, B.J. 2010. Deciphering the splicing code. Nature 465: 53-59.
Birchler, J.A., Yao, H., Chudalayandi, S., Vaiman, D., and Veitia, R.A. 2010. Heterosis. The Plant Cell 22: 2105-2112.
Blekhman, R., Marioni, J.C., Zumbo, P., Stephens, M., and Gilad, Y. 2010. Sex-specific and lineage-specific alternative splicing in primates. Genome Research 20: 180-9.
Calarco, J.A., Xing, Y., Cáceres, M., Calarco, J.P., Xiao, X., Pan, Q., Lee, C., Preuss, T.M., and Blencowe, B.J. 2007. Global analysis of alternative splicing differences between humans and chimpanzees. Genes & Development 21: 2963-2975.
Chen, Z.J. 2010. Molecular mechanisms of polyploidy and hybrid vigor. Trends in Plant Science 15: 57-71.
Darwin, C.R. 1876. The Effects of Cross- and Self-fertilization in the Vegetable Kingdom.
Fejes, A.P., Robertson, G., Bilenky, M., Varhol, R., Bainbridge, M., and Jones, S.J.M. 2008. FindPeaks 3.1: a tool for identifying areas of enrichment from massively parallel short-read sequencing technology. Bioinformatics 24: 1729-30.
Filichkin, S.A., Priest, H.D., Givan, S.A., Shen, R., Bryant, D.W., Fox, S.E., Wong, W.-K., and Mockler, T.C. 2010. Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Research 20: 45-58.
Goff, S.A. et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100.
Harr, B., and Turner, L.M. 2010. Genome-wide analysis of alternative splicing evolution among Mus subspecies. Molecular Ecology 19: 228-39.
He, G. et al. 2010. Global epigenetic and transcriptional trends among two rice subspecies and their reciprocal hybrids. The Plant Cell 22: 17-33.
Hegarty, M.J., Barker, G.L., Wilson, I.D., Abbott, R.J., Edwards, K.J., and Hiscock, S.J. 2006. Transcriptome shock after interspecific hybridization in Senecio is ameliorated by genome duplication. Current Biology 16: 1652-9.
Kenan-Eichler, M., Leshkowitz, D., Tal, L., Noor, E., Melamed-Bessudo, C., Feldman, M., and Levy, A.A. 2011. Wheat hybridization and polyploidization results in deregulation of small RNAs. Genetics 188:1.
Kovach, M.J., Sweeney, M.T., and McCouch, S.R. 2007. New insights into the history of rice domestication. Trends in Genetics 23: 578-87.
Krieger, U., Lippman, Z.B., and Zamir, D. 2010. The flowering gene SINGLE FLOWER TRUSS drives heterosis for yield in tomato. Nature Genetics 42: 459-63.
Kurihara, Y. et al. 2009. Genome-wide suppression of aberrant mRNA-like noncoding RNAs by NMD in Arabidopsis. Proceedings of the National Academy of Sciences of the United States of America 106: 2453-8.
Lamberto, I., Percudani, R., Gatti, R., Folli, C., and Petrucco, S. 2010. Conserved splicing of Arabidopsis transthyretin-like determines protein localization and S-allantoin synthesis in peroxisomes. The Plant Cell 22: 1564-74.
Landry, C.R., Hartl, D.L., and Ranz, J.M. 2007. Genome clashes in hybrids: insights from gene expression. Heredity 99: 483-93.
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10: R25.
Lu, H., Lin, L., Sato, S., Xing, Y., and Lee, C.J. 2009. Predicting functional alternative splicing by measuring RNA selection pressure from multigenome alignments. PLoS Computational Biology 5: e1000608.
Lu, T. et al. 2010. Function annotation of rice transcriptome at single nucleotide resolution by RNA-seq. Genome Research 1238-1249.
Luco, R.F., Allo, M., Schor, I.E., Kornblihtt, A.R., and Misteli, T. 2011. Epigenetics in alternative pre-mRNA splicing. Cell 144: 16-26.
Luco, R.F., Pan, Q., Tominaga, K., Blencowe, B.J., Pereira-Smith, O.M., and Misteli, T. 2010. Regulation of alternative splicing by histone modifications. Science 327: 996-1000.
Madlung, A., Masuelli, R.W., Watson, B., Reynolds, S.H., Davison, J., Comai, L., and M, W.A. 2002. Remodeling of DNA Methylation and Phenotypic and Transcriptional Changes in Synthetic Arabidopsis. Plant Physiology 129: 733-746.
Mallet, J. 2007. Hybrid speciation. Nature 446: 279-83.
McClintock, B. 1984. The significance of responses of the genome to challenge. Science 226: 792-801.
McManus, C.J., Coolon, J.D., Duff, M.O., Eipper-Mains, J., Graveley, B.R., and Wittkopp, P.J. 2010. Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Research 20: 816-825.
Michalak, P. 2009. Epigenetic, transposon and small RNA determinants of hybrid dysfunctions. Heredity 102: 45-50.
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L., and Wold, B. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5: 1-8.
Nasrallah, J.B., Liu, P., Sherman-Broyles, S., Schmidt, R., and Nasrallah, M.E. 2007. Epigenetic mechanisms for breakdown of self-incompatibility in interspecific hybrids. Genetics 175: 1965-73.
Ni, Z., Kim, E.-D., Ha, M., Lackey, E., Liu, J., Zhang, Y., Sun, Q., and Chen, Z.J. 2009. Altered circadian rhythms regulate growth vigour in hybrids and allopolyploids. Nature 457: 327-31.
Nicolae, M., Mangul, S., M, I., and Zelikovsky, A. Estimation of alternative splicing isoform frequencies from RNA-Seq data. Algorithms for Molecular Biology 6: 9.
Nilsen, T.W., and Graveley, B.R. 2010. Expansion of the eukaryotic proteome by alternative splicing. Nature 463: 457-63.
Pan, Q., Shai, O., Lee, L.J., Frey, B.J., and Blencowe, B.J. 2008. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature Genetics 40: 1413-5.
Pickrell, J.K., Pai, A.A., Gilad, Y., and Pritchard, J.K. 2010. Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genetics 6: 1-8.
Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and Vandepoele, K. 2009. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. The Plant Cell 21: 3718-31.
Reddy, A.S.N. 2007. Alternative splicing of pre-messenger RNAs in plants in the genomic era. Annual Review of Plant Biology 58: 267-94.
Richard, H. et al. 2010. Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments. Nucleic Acids Research 38: 11-13.
Rieseberg, L., and Willis, J. 2007. Plant speciation. Science 317: 910-4.
Scascitelli, M., Cognet, M., and Adams, K.L. 2010. An interspecific plant hybrid shows novel changes in parental splice forms of genes for splicing factors. Genetics 184: 975-83.
Song, H.-R., Song, J.-D., Cho, J.-N., Amasino, R.M., Noh, B., and Noh, Y.-S. 2009. The RNA binding protein ELF9 directly reduces SUPPRESSOR OF OVEREXPRESSION OF CO1 transcript levels in Arabidopsis, possibly via nonsense-mediated mRNA decay. The Plant Cell 21: 1195-211.
Springer, N.M., and Stupar, R.M. 2007. Allelic variation and heterosis in maize: how do two halves make more than a whole? Genome Research 17: 264-75.
Swanson-Wagner, R.A., Jia, Y., DeCook, R., Borsuk, L.A., Nettleton, D., and Schnable, P.S. 2006a. All possible modes of gene action are observed in a global comparison of gene expression in a maize F1 hybrid and its inbred parents. Proceedings of the National Academy of Sciences of the United States of America 103: 6805-10.
Sweeney, M., and McCouch, S. 2007. The complex history of the domestication of rice. Annals of Botany 100: 951-7.
Trapnell, C., Pachter, L., and Salzberg, S.L. 2009. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25: 1105-11.
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. 2010. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 28: 516-520.
Virmani, S., and Kumar, I. 2009. Rice Improvements in the Genomics Era. Hybrid Rice Technology. pp. 105-138.
Wang, L., Xi, Y., Yu, J., Dong, L., Yen, L., and Li, W. 2010. A statistical method for the detection of alternative splicing using RNA-seq. PLoS ONE 5: e8529.
Wang, B.-B., and Brendel, V. 2006. Genomewide comparative analysis of alternative splicing in plants. Proceedings of the National Academy of Sciences of the United States of America 103: 7175-80.
Wittkopp, P.J. 2007. Variable gene expression in eukaryotes: a network perspective. The Journal of Experimental Biology 210: 1567-75.
Yuan, Y.-X., Wu, J., Sun, R.-F., Zhang, X.-W., Xu, D.-H., Bonnema, G., and Wang, X.-W. 2009. A naturally occurring splicing site mutation in the Brassica rapa FLC1 gene is associated with variation in flowering time. Journal of Experimental Botany 60: 1299-308.
Zhang, X., and Borevitz, J.O. 2009. Global analysis of allele-specific expression in Arabidopsis thaliana. Genetics 182: 943-954.
Zhang, G. et al. 2010. Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Research 20: 646-54.
Zhang, H.-Y. et al. 2008a. A genome-wide transcription analysis reveals a close correlation of promoter INDEL polymorphism and heterotic gene expression in rice hybrids. Molecular Plant 1: 720-31.
Zhang, N., Kallis, R.P., Ewy, R.G., and Portis, A.R. 2002. Light modulation of Rubisco in Arabidopsis requires a capacity for redox regulation of the larger Rubisco activase isoform. Proceedings of the National Academy of Sciences of the United States of America 99: 3330-4.
Zhang, X., Byrnes, J.K., Gal, T.S., Li, W.-H., and Borevitz, J.O. 2008b. Whole genome transcriptome polymorphisms in Arabidopsis thaliana. Genome Biology 9: R165.
