Open Collections

UBC Theses and Dissertations

Evolution and neofunctionalization of imprinted genes after duplication in Brassicaceae Qiu, Yichun 2013

EVOLUTION AND NEOFUNCTIONALIZATION OF IMPRINTED GENES AFTER DUPLICATION IN BRASSICACEAE

by

Yichun Qiu

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in

The Faculty of Graduate Studies (Botany)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

January 2013

© Yichun Qiu, 2013

Abstract

Plant genomes have large numbers of duplicated genes. After duplication one duplicate can acquire a new function or expression pattern, referred to as neofunctionalization. Some duplicated genes are imprinted, where only one allele is expressed depending on its parental origin. I hypothesized that duplicated imprinted genes frequently show an accelerated rate of amino acid sequence evolution and have a new expression pattern compared with their paralogs, which together are suggestive of neofunctionalization.

I first studied four imprinted genes in Arabidopsis, FIS2, MPC, FWA, and HDG3, that have flower- and/or seed-specific expression. I found that they all have considerably accelerated rates of sequence evolution compared to their paralogs. To determine the ancestral expression pattern I assayed expression patterns in outgroup species, the results of which strongly suggested that the imprinted genes have acquired a novel organ-specific expression pattern restricted to flowers and/or seeds. Using data from recent large-scale identification studies of imprinted genes, I detected by phylogenetic tree analyses 133 imprinted genes that arose from gene duplication events in Brassicaceae. Analyses of 48 alpha whole genome duplicated gene pairs indicated that many imprinted genes show an accelerated rate of amino acid changes compared to their paralogs. Analyses of microarray data indicated that many imprinted genes have expression patterns restricted to flowers and/or seeds, compared with their broadly-expressed paralogs.
Both the accelerated rate of sequence evolution and the new expression patterns in the imprinted genes suggest that, after evolutionarily recent duplication events, imprinted genes frequently underwent neofunctionalization. In particular, neofunctionalization of the FIS2 gene has led to a change in the mechanism of regulating seed development in Brassicaceae. Multiple lines of evidence, when considered together, are highly suggestive of many origins of imprinting in Brassicaceae. This study reveals that genetic imprinting can arise over short evolutionary time periods and that gene duplication is an important factor in generating imprinted genes.

Preface

This project included collaborations with Shao-Lun (Allen) Liu at Tunghai University in Taiwan (formerly at UBC) and Yii Van Tay in the Adams lab at UBC. In the Results section “Asymmetric sequence evolution is common in duplicated pairs with imprinted genes”, Shao-Lun Liu helped to develop the computational analyses. I was responsible for obtaining sequences and making alignments to generate input data for the analyses. I also processed the output data to finalize the analyses. Shao-Lun Liu also helped design the microarray data analyses for expression breadth of imprinted genes compared with their paralogs, in the Results section “Many imprinted genes have more restricted expression patterns and preferential expression in reproductive organs”. Raw microarray data were processed by Shao-Lun Liu, and he developed the computational analyses as part of a previous project. I chose the gene pairs for analyses and processed the raw output data to finalize the analyses. In the final RT-PCR section in the Results, “Expression patterns of imprinted genes changed after gene duplication to become restricted to reproductive organs”, Yii Van Tay helped perform RT-PCRs for four pairs of gene duplicates in Arabidopsis thaliana and their orthologous genes in Carica papaya and Vitis vinifera.
I was responsible for designing gene-specific primers, processing gel data, and interpreting results.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter One   Introduction
Chapter Two   Methods and Materials
Chapter Three   Results
Chapter Four   Discussion
Tables and Figures
References

List of Tables

Table 1. PCR primers designed for this study
Table 2. Alpha WGD-derived imprinted genes and their paralogs
Table 3. Sequence rate analyses for imprinted genes derived from the alpha WGD to detect asymmetric and accelerated sequence evolution
Table 4. Microarray assay results for imprinted genes derived from the alpha WGD and their paralogs

List of Figures

Figure 1. Example tree illustrating the detection of imprinted genes that arose by gene duplication in the Arabidopsis lineage
Figure 2. Phylogenetic trees of FIS2/VRN2 (A) and HDG3/HDG2 (B) in rosids
Figure 3. Sequence rate and selection analyses of FIS2, MPC, FWA, and HDG3
Figure 4. RT-PCR expression assay results of orthologs of FIS2, MPC, FWA, and HDG3
Figure 5. RT-PCR expression assay results of imprinted genes, their paralogs, and orthologs from outgroup species
Figure 6. FIS2 and VRN2 form two types of PRC2 complexes that regulate different developmental stages in Brassicaceae

Acknowledgements

I owe my deepest gratitude to Dr. Keith Adams, who supervised me throughout the project and supported me through my graduate study. I thank Dr. Quentin Cronk and Dr. Sean Graham for serving as my committee members and providing good suggestions for my project. I thank the Adams lab members for their help and encouragement. I offer my special thanks to Dr. Shao-Lun Liu (now at Tunghai University in Taiwan) and Yii Van Tay for their collaboration.

Chapter One   Introduction

Gene duplication has long been considered to be a major aspect of genome evolution. Ongoing gene duplication events during evolutionary history have provided new genes that can diverge in function and gain new functions. There are several different types of gene duplication events. The largest scale of gene duplication is whole genome duplication (WGD), which doubles the entire genome. WGDs have been documented across eukaryotes including yeasts, animals, and plants (reviewed in Van de Peer et al. 2009). All angiosperms have experienced at least one ancient WGD event early in their evolutionary history, and many lineages have had one or more additional polyploidy events (Cui et al. 2006; Soltis et al. 2009). In particular, the Arabidopsis lineage has experienced two WGD events after the divergence from a common ancestor with Carica (papaya) in the order Brassicales (Barker et al. 2009). The most recent WGD, the alpha WGD that is specific to Brassicaceae, has contributed approximately 2500 pairs of duplicated genes to the Arabidopsis genome (Blanc et al. 2003; Bowers et al. 2003).
Other kinds of gene duplication at small scales, such as tandem duplication, segmental or chromosomal duplication, and duplicative retroposition, have also continually enlarged the pool of duplicated genes in existing genomes.

After gene duplication, duplicated gene pairs may experience different fates. The most likely outcome is that one copy is lost or becomes a pseudogene. Divergence of expression or function is one way to avoid reversion to a single-copy state and so facilitate the preservation of both copies. For instance, one gene can gain a novel function compared to the pre-duplication ancestral gene, or a new expression pattern. This mechanism of functional divergence is referred to as neofunctionalization, or regulatory neofunctionalization if the change is associated with expression patterns (Force et al. 1999). Genes that have undergone neofunctionalization often have experienced an accelerated rate of sequence evolution, compared to paralogs that have greater evolutionary constraints; this will result in asymmetric amino acid sequence evolution in duplicated gene pairs (e.g., Van de Peer et al. 2001; Byrne and Wolfe 2007; Liu et al. 2011). Several studies have explored the evolution of duplicate gene expression patterns in plants, and cases of regulatory neofunctionalization have been found (e.g., Duarte et al. 2006; Liu and Adams 2010; Liu et al. 2011).

Another way in which duplicated genes could diverge from each other is by acquisition of imprinting by one copy. The relative contribution of that mechanism to the retention of duplicated genes, however, has received little attention. Genomic imprinting is when an allele is expressed or silenced depending on its parental origin, resulting in mono-allelic expression. The phenomenon has been shown in mammals and flowering plants (Berger and Chaudhury 2009).
A maternally imprinted gene has only paternal allele expression, so it is also regarded as a paternally expressed gene (abbreviated as peg). Similarly a paternally imprinted gene has only maternal allele expression (abbreviated as meg). In flowering plants, the imprinted expression is primarily within the endosperm, a tissue that facilitates the absorption of resources from maternal tissue to embryo, and which stores and provides nutrients to the embryo while the seed develops (Berger and Chaudhury 2009).

As of the beginning of 2011 there were only a few reported imprinted genes in flowering plants, including 11 in Arabidopsis thaliana and six in maize (reviewed in Berger and Chaudhury 2009; Raissig et al. 2011). A subset of the imprinted genes are recently duplicated genes. For example, MATERNALLY EXPRESSED PAB C-TERMINAL (MPC), which shows maternal-specific expression, encodes the C-terminal domain of poly(A) binding proteins that play roles in mRNA stability and translation (Tiwari et al. 2008). Its paralog in Arabidopsis is the full-length poly(A) binding protein PAB8, and that gene is not imprinted (Tiwari et al. 2008). FLOWERING WAGENINGEN (FWA, also called HOMEODOMAIN GLABROUS 6, HDG6) is a transcription factor in the homeodomain-leucine zipper (HD-ZIP) Class IV family. It has maternal expression in the female gametophyte and silique (Nakamura et al. 2006). Its ectopic expression in vegetative tissue causes a late flowering phenotype (Kinoshita et al. 2004). Its paralog, HDG7, is also an HD-ZIP transcription factor but functions in vegetative meristems (Nakamura et al. 2006). HDG3 and HDG2 are also a pair of duplicated genes in the HD-ZIP IV family, where HDG3 is a paternally expressed gene of unknown function (Gehring et al. 2009). Another maternally expressed gene, FERTILIZATION INDEPENDENT SEED 2 (FIS2), is a duplicated gene derived from VERNALIZATION2 (VRN2). FIS2 and VRN2 are both VEF domain-containing polycomb group proteins.
Each is part of a Polycomb Repressive Complex 2 (PRC2) that negatively regulates gene expression. VRN2 also has a recent tandem duplicate, VEF_L36, with unknown function (Chen et al. 2009). Among those imprinted genes that arose by gene duplication events, only the evolutionary history of MEDEA in Arabidopsis has been studied in detail. After divergence from its non-imprinted paralog SWN, MEDEA no longer has expression in vegetative organs as SWN does, but shows an overlapping but not identical expression pattern in reproductive organs, along with an accelerated rate of sequence evolution and evidence for positive selection in Arabidopsis (Spillane et al. 2007). It is the only imprinted gene so far to show evidence of neofunctionalization after gene duplication. During 2011 several genome-wide investigations of imprinting using RNA-seq approaches in Arabidopsis thaliana, maize, and rice were published (Gehring et al. 2011; Hsieh et al. 2011; Luo et al. 2011; Waters et al. 2011; Wolff et al. 2011; Zhang et al. 2011). Three genome-wide studies in Arabidopsis together reported more than 300 imprinted genes, leading to an explosion of known imprinted genes in Arabidopsis (Gehring et al. 2011; Hsieh et al. 2011; Wolff et al. 2011).

This study aims to characterize sequence evolution and expression patterns of imprinted genes and their paralogs from a macroevolutionary phylogenetic perspective in Arabidopsis thaliana. Arabidopsis thaliana was chosen because more duplicated imprinted genes were known from this species than from any other, both at the start of this project and during its early stages, as the first transcriptome-wide identification studies were reported from Arabidopsis thaliana. In particular I have two main objectives in this study: 1) Analyze the sequence evolution of imprinted genes to determine if there is accelerated amino acid sequence evolution compared with their paralogs and if the imprinted genes show evidence for positive selection.
2) Compare expression patterns of imprinted genes with their paralogs, and evaluate expression patterns of orthologs in outgroup species to infer if there has been regulatory neofunctionalization in the imprinted genes after gene duplication. Collectively these objectives will indicate which imprinted genes show evidence for neofunctionalization after duplication and provide insights into the interplay between the origin of genetic imprinting and the retention of gene duplicates.

Chapter Two   Methods and Materials

Phylogenetic analyses of imprinted genes and their paralogs in Arabidopsis and identification of orthologs in other species

Imprinted genes in this study were obtained from three different sources: 11 imprinted genes known in late 2010 (reviewed in Berger and Chaudhury 2009; Raissig et al. 2011), 126 imprinted genes identified in Hsieh et al. (2011), and 65 imprinted genes identified in Wolff et al. (2011). To identify imprinted genes that arose from recent gene duplication events during the evolution of Brassicaceae, phylogenetic analyses were performed on gene families containing imprinted genes. For each imprinted gene, sequences (both nucleotide and amino acid) of family members from Arabidopsis thaliana, Carica papaya (CP), Populus trichocarpa (PT), Vitis vinifera (VV), Ricinus communis (RC), Manihot esculenta (ME), and Zea mays (ZM) were obtained from PLAZA 2.0 (http://bioinformatics.psb.ugent.be/plaza/) (Proost et al. 2009). Amino acid sequences were aligned using MUSCLE with default settings (Edgar 2004), followed by manual editing with BioEdit to improve the alignments (Hall 1999). The aligned amino acid sequences were then reverse translated to obtain codon alignments. A maximum likelihood tree based on the codon-based DNA alignment for each gene family was estimated in Garli v1.0 (Zwickl 2006) using default parameters. Tree topologies were compared with the expected species tree.
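The reverse-translation step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `back_translate` and the example sequences are hypothetical, and the sketch assumes that each unaligned coding sequence is in frame and matches its protein residue for residue, optionally followed by one stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def back_translate(aligned_protein, cds, gap="-"):
    """Thread an unaligned coding sequence onto one gapped protein alignment row.

    Every residue in the aligned protein is replaced by its codon, and every
    gap character is replaced by three gap characters, so the codon alignment
    keeps the column structure of the protein alignment.
    """
    # Split the coding sequence into complete codons.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)

    # Drop a single trailing stop codon if the protein does not include it.
    if len(codons) == n_residues + 1 and codons[-1].upper() in STOP_CODONS:
        codons = codons[:-1]
    if len(codons) != n_residues:
        raise ValueError("coding sequence does not match the aligned protein")

    codon_iter = iter(codons)
    return "".join(gap * 3 if aa == gap else next(codon_iter)
                   for aa in aligned_protein)


# Hypothetical toy input: protein row "M-K" aligned with one gap column.
print(back_translate("M-K", "ATGAAATAA"))
```

Threading codons onto the protein alignment, rather than aligning nucleotides directly, keeps gaps in multiples of three so that the reading frame is preserved for the codon-based analyses.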
Imprinted genes were selected for subsequent analyses if they had recent paralogs specific to Brassicaceae (Figure 1). Orthologs of imprinted genes and their paralogs were identified according to the gene-tree topology and used for sequence rate analyses and expression assays (Figure 1). Imprinted genes that arose from the alpha WGD were identified with their paralogs according to Blanc et al. (2003) and Bowers et al. (2003). Imprinted genes that arose from recent tandem duplication were identified by their adjacent locus numbers and further confirmed according to Haberer et al. (2004).

To further examine the phylogenetic timing of FIS2 formation by duplication from VRN2, I performed a more detailed phylogenetic analysis of FIS2/VRN2 genes. Additional homologs of FIS2 and VRN2 from eurosids were obtained, including sequences from Theobroma cacao from PLAZA 2.5 (http://bioinformatics.psb.ugent.be/plaza/) (Van Bel et al. 2012), Capsella rubella, Glycine max, and Medicago truncatula from Phytozome v8.0 (http://www.phytozome.org/), Thellungiella parvula from the Thellungiella genomics website (http://thellungiella.org/) (Dassanayake et al. 2011), Brassica rapa from the BRAD Brassica database (http://brassicadb.org/brad/) (Cheng et al. 2011), and Cleome spinosa from Eric Schranz’s lab. Alignments were generated using the same methods described above. A maximum likelihood tree based on the codon alignment was generated by RAxML v.7.0.3 (Stamatakis 2006). Bootstrapping with 200 replicates was applied to determine support for each clade. The same approach was also applied to HDG3/HDG2 genes, with fewer sequences in rosids.

Ka/Ks analyses to analyze sequence rate evolution of imprinted genes and their paralogs

For the gene pairs FIS2/VRN2, MPC/PAB8, FWA/HDG7, and HDG3/HDG2 (all of which were among the first imprinted genes to be identified) selection analyses were performed.
For each gene pair, sequences from Arabidopsis thaliana, Carica papaya, Vitis vinifera, Populus trichocarpa, Ricinus communis, and Manihot esculenta were aligned using MUSCLE. Branchwise Ka/Ks ratios along the phylogenetic tree branches were estimated using a phylogeny-based free-ratio test using Codeml in PAML (Yang 2007). To test if the Ka/Ks ratio of imprinted genes and their paralogs evolved in an asymmetric fashion, two-ratio models and three-ratio models were implemented for comparison. The first model assumes that the two Arabidopsis duplicate branches share one Ka/Ks ratio, while the orthologs have a different ratio; this corresponds to the hypothesis that the two Arabidopsis copies evolved at the same rate. The second model assumes that the two Arabidopsis branches have different Ka/Ks ratios, and thus the two genes evolved at different rates, with the third Ka/Ks ratio for the ortholog branch. A likelihood ratio test was applied, where twice the difference in log-likelihood values was calculated and compared against a chi-square distribution with one degree of freedom (df) to determine whether sequence evolution is asymmetric. When the second model fits better than the first model with statistical support by a likelihood ratio test, the duplicated pair is considered to have evolved in an asymmetric fashion.

Plant materials, nucleic acid extraction, and RT-PCR for expression assay and inference of ancestral expression state

Total RNA was extracted from Arabidopsis thaliana (Col-0), Carica papaya (cultivar Sun-Up), and Vitis vinifera (cultivar Pinot Noir). For each species five organ types were used for RNA extraction: root, stem, leaf (rosette leaves in Arabidopsis), flower, and seed (whole siliques in Arabidopsis). Fresh plant materials were collected and frozen in liquid nitrogen. A modified CTAB method was used for RNA extraction (Zhou et al. 2011).
The quality of RNA was checked on 2% agarose gels by electrophoresis, and the quantity of RNA was determined by a Nanodrop spectrophotometer. DNase (Invitrogen) treatment was applied to remove residual DNA, according to the manufacturer’s instructions. M-MLV reverse transcriptase (Invitrogen) was used to generate cDNA, according to the manufacturer’s instructions. PCR was performed with cDNA templates to detect the organ-specific expression of target genes. Gene-specific primers were designed to amplify 250-1000 bp of the cDNA of targeted genes (Table 1). For PCR analyses, the cycling program was: preheating at 94℃ for 3 minutes; 30-35 cycles of denaturing at 94℃ for 30 seconds, annealing at 55-58℃ for 30 seconds, and elongation at 72℃ for 30 seconds; and a final elongation at 72℃ for 7 minutes. PCR products were checked on 1.2% agarose gels.

Sequence rate analyses and detection of asymmetric sequence evolution

To detect on a large scale whether there has been a significant difference in the rate of protein sequence evolution between the genes in each duplicated pair, I followed the method described in Blanc and Wolfe (2004) and Liu et al. (2011). For each imprinted gene, a triplet of amino acid sequences was constructed with the imprinted gene, its paralog, and an outgroup gene. Orthologs from Carica papaya, Vitis vinifera, and Ricinus communis were used as outgroup sequences one at a time. These outgroup sequences were first identified by phylogenetic analyses, and they were further confirmed by best reciprocal BLASTP hits, as described in Hulsen et al. (2006) and Liu et al. (2011). Three individual rounds of analyses with different outgroup sequences were carried out. In each triplet, the three sequences were aligned using MUSCLE (Edgar 2004). All alignments were manually checked using BioEdit (Hall 1999).
Gaps in outgroup sequences were compared to genomic sequences from GenBank at NCBI in order to determine if the gaps were real or errors caused by potential annotation problems. Then the sequence triplets were analyzed with two models of evolution: unconstrained and clock-like. Model I assumed that all sequences are unconstrained and evolve at their own rates, so all the branch lengths can be different. Model II assumed that the duplicates have the same rate, so the two branch lengths are set equal. The likelihood estimates were obtained using Codeml in PAML (Yang 2007). To test whether the two models are significantly different, a likelihood ratio test (LRT) was applied. Twice the difference of the log-likelihood estimates, X = 2(Ln1 - Ln2), where X is the test statistic, Ln1 is the log-likelihood estimate from Model I, and Ln2 is the log-likelihood estimate from Model II, was compared against a chi-square distribution with one degree of freedom (df) (Felsenstein 1981). A significant difference (P<0.05) indicates that the duplicated pair has asymmetric (non-clock-like) sequence rate evolution. In a gene pair showing asymmetric evolution, whether the imprinted gene evolves faster or slower than its paralog was determined by comparison of the branch lengths estimated from Model I. For each gene pair, three separate tests were applied using different outgroup sequences as references: Carica, Vitis, and Ricinus, respectively. Whether the gene pair showed asymmetric evolution and which copy evolved faster was determined according to the majority outcome (two or three out of three) from the three tests.

Microarray analyses for expression breadth of imprinted genes compared with their paralogs

ATH1 microarray data were obtained from the TAIR website (http://www.arabidopsis.org/). 63 different organ types and developmental stages from Arabidopsis thaliana were included (Schmid et al. 2005). The data were normalized following the description in Liu et al.
(2011). To compare the expression breadth of imprinted genes and their paralogs, I calculated two indices, expression width and organ specificity. Expression width is defined as the number of organ types and developmental stages, out of the total of 63, that show a significant expression level of a gene. It is based on the presence or absence of expression in each organ type (Liu et al. 2011). A gene with broader expression would have a greater expression width. Expression organ specificity (τ) is calculated according to Yang and Gaut (2011):

$$\tau = \frac{\sum_{j=1}^{n} \left(1 - \frac{S(i,j)}{S(i,\max)}\right)}{n - 1}$$

where n = 63 is the number of organ types, S(i,j) is the log2 transformed expression value for gene i in organ type j, and S(i,max) is the highest log2 transformed expression value for gene i across the n organ types. A gene with expression limited to one or a few organ types or developmental stages would have a high organ specificity, while a broadly expressed gene with similar expression levels in most organ types and developmental stages would have a low organ specificity.

Chapter Three   Results

Identification of imprinted genes that arose from gene duplication in Brassicaceae

At the beginning of this project there were 11 known imprinted genes in Arabidopsis thaliana (reviewed in Berger and Chaudhury 2009; Raissig et al. 2011). A subset of the genes are members of pairs duplicated by the alpha whole genome duplication (Blanc et al. 2003; Bowers et al. 2003), such as MPC with its paralog PAB8, and FWA with its paralog HDG7. HDG3 and HDG2 are also an alpha WGD pair in the HD-ZIP IV family. Although Nakamura et al. (2006) reported that their formation predated the divergence between gymnosperms and angiosperms, my phylogenetic analysis indicates that it occurred in Brassicaceae (Figure 2), which is supported by the two studies of genes derived from the alpha WGD (Blanc et al. 2003; Bowers et al. 2003). Other kinds of gene duplication events also created imprinted genes, such as FIS2, which is derived from VRN2.
Although FIS2 is not documented as an alpha WG duplicate by Blanc et al. (2003) or Bowers et al. (2003), my phylogenetic analysis indicates that it is also a recently formed gene that arose after the divergence of the Brassicaceae lineage from the Cleomaceae lineage and the Caricaceae lineage (Figure 2).

Rapid amino acid sequence evolution in imprinted genes compared to their paralogs

Next I tested the hypothesis that the imprinted genes are evolving significantly faster than their paralogs. I applied a free-ratio analysis, which allows each branch of the phylogeny to have its own Ka/Ks ratio, using PAML to calculate the Ka/Ks ratio of imprinted genes and their paralogs. I found that MPC, FWA, HDG3, and FIS2 all experienced an accelerated rate of sequence evolution (i.e., a higher Ka/Ks ratio) compared with their paralogs (Figure 3). The Ka/Ks ratio of MPC is approximately twice that of PAB8, and the Ka/Ks ratio of FWA is approximately three times that of HDG7. HDG3 has a Ka/Ks ratio approximately seven times that of HDG2. All of those imprinted genes show evidence for relaxed purifying selection, while the non-imprinted paralogs were under strong purifying selection. FIS2 shows a Ka/Ks ratio greater than 1, suggesting that positive selection has acted on the sequence of FIS2, and frequent amino acid substitutions have been fixed by selection in this imprinted gene, whereas its non-imprinted paralog VRN2 has a rather low Ka/Ks ratio indicating purifying selection (Figure 3). The results of these analyses indicate that all four of the imprinted genes are evolving considerably faster than their paralogs.

Imprinted genes have new expression patterns

All of the imprinted genes with non-imprinted paralogs presented above show expression that is restricted to reproductive organs, such as flowers, fruits, and/or seeds, and is absent from vegetative organs, including roots, stems, and leaves.
In contrast, their paralogs usually have a broader expression pattern, including both vegetative and reproductive organs, and most of them are ubiquitously expressed in all examined organ types, according to microarray data from the Arabidopsis thaliana developmental expression atlas of Schmid et al. (2005) and RT-PCR results (Figure 4). FIS2, MPC, and HDG3 have flower- and/or silique-specific expression, whereas their alpha WG duplicates, VRN2, PAB8, and HDG2, are expressed broadly in vegetative and reproductive organs. FWA is expressed in flowers and seeds. In contrast, its alpha WG duplicate HDG7 is expressed only in young vegetative tissue, such as root and stem tips, and because of its low expression level in limited tissue types, its expression was not detected by RT-PCR. In this case the imprinted gene FWA has a complementary expression pattern to its paralog.

To infer the ancestral expression pattern of these gene pairs and test the hypothesis that the expression of the imprinted genes became more restricted after their formation by gene duplication, I assayed the expression pattern of orthologs in Carica papaya and Vitis vinifera. Carica was chosen because it is in the order Brassicales. Vitis was chosen because its lineage, like that of Carica, has not experienced any whole genome duplication events since the gamma WGD during early eudicot evolution; genes are therefore frequently single copy in these taxa, which facilitates the analysis. Orthologs were identified by phylogenetic analysis of the gene families (Figure 2 and additional trees not shown). For the FIS2/VRN2, MPC/PAB8, and HDG3/HDG2 pairs, the orthologs in Carica and Vitis are widely expressed in all examined organ types, which is the same as the non-imprinted paralogs of the imprinted genes in Arabidopsis (Figure 4).
Collectively, the inferred pre-duplication expression state is a broad expression pattern, which the paralogs of the imprinted genes reflect, and thus the imprinted genes have lost expression in vegetative organs to become specifically expressed in reproductive organs. The FWA/HDG7 pair shows a different pattern: the HDG7 orthologs from Carica and Vitis have no expression in reproductive organs, which is also the same as HDG7 in Arabidopsis, suggesting the gain of reproductive expression in the imprinted FWA. Among the four pairs of duplicates where one gene is imprinted, the imprinted genes are characterized by expression only in reproductive organs, which is a new expression pattern compared to the ancestral expression pattern that is retained by the paralogs of the imprinted genes in Arabidopsis. Overall, all of the imprinted genes studied above (FIS2, MPC, FWA, and HDG3) show accelerated sequence evolution, as well as major changes in their expression patterns, to become specifically expressed in seeds, or seeds and flowers.

A large number of imprinted genes were formed by duplication during the evolution of Brassicaceae

During the course of my thesis research, two large-scale identification studies of imprinted genes in Arabidopsis thaliana, using Illumina transcriptome sequencing, were published (Hsieh et al. 2011; Wolff et al. 2011). A total of 126 imprinted and putatively imprinted genes were identified by Hsieh et al. (2011), including 116 maternally expressed genes (megs) and 10 paternally expressed genes (pegs), and 65 imprinted genes were identified in Wolff et al. (2011), including 39 megs and 26 pegs. Together the two studies identified 183 imprinted and putatively imprinted genes including 149 megs and 34 pegs. I analyzed the set of imprinted genes to identify those that arose by gene duplication during the evolution of Brassicaceae, based on the phylogenies of gene families (Figure 1).
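As a quick consistency check on the counts above, the number of genes shared between the two studies follows from inclusion-exclusion. The sketch below is only arithmetic on the reported totals; the variable and function names are illustrative, and it assumes that a gene reported by both studies was assigned to the same class (meg or peg) in each.

```python
# Counts reported for each study and for the combined set.
hsieh = {"meg": 116, "peg": 10}      # Hsieh et al. (2011): 126 genes
wolff = {"meg": 39, "peg": 26}       # Wolff et al. (2011): 65 genes
combined = {"meg": 149, "peg": 34}   # both studies together: 183 genes


def implied_overlap(a, b, union):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|, per class."""
    return {key: a[key] + b[key] - union[key] for key in union}


shared = implied_overlap(hsieh, wolff, combined)
print(shared, sum(shared.values()))
```

Under that assumption the totals imply that only a small number of genes were reported by both studies, which is the low level of overlap that motivates analyzing the two gene sets separately.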
Based on the tree topologies, 133 out of the 183 imprinted genes (73%) arose from evolutionarily recent gene duplication events in the Brassicaceae lineage after it diverged from the Caricaceae lineage, where there is more than one paralog in the Arabidopsis clade (Figure 1). Among the 133 imprinted genes, 54 genes have alpha WG duplicates (Table 2), and 50 genes have tandem duplicates. Other kinds of gene duplication events have also contributed to the formation of the imprinted genes. Most of the paralogs are not reported as imprinted (Hsieh et al. 2011; Wolff et al. 2011), suggesting that many imprinted genes might be specific to the Brassicaceae lineage, and that the acquisition of imprinting happened after those gene duplication events.

Asymmetric sequence evolution is common in duplicated pairs with imprinted genes

The gene pairs FIS2/VRN2, MPC/PAB8, FWA/HDG7, and HDG3/HDG2 all show asymmetric sequence evolution in terms of the overall rate of amino acid substitutions, and in each case the imprinted gene evolved faster than the paralog. I wanted to determine how common this phenomenon is among imprinted genes and their paralogs by studying the newly discovered imprinted genes (Hsieh et al. 2011; Wolff et al. 2011). Gene pairs with imprinted genes that arose from the alpha WGD were used in this test because the two copies were duplicated simultaneously; pairs were excluded when there were more than four Arabidopsis copies in the same clade. I applied an analysis described in Blanc and Wolfe (2004) (see the Methods section for details) to test the hypothesis that there is asymmetric sequence evolution in duplicated pairs with imprinted genes. There is a major difference in the method of detecting imprinted genes in the two studies. Hsieh et al. (2011) applied microdissection to purify endosperm and then analyzed the endosperm transcriptome to find evidence for allelic biased expression and imprinting. However, Wolff et al.
(2011) sequenced the transcriptome from whole seeds, then they discarded genes having seed expression outside of the endosperm, and as a result they only analyzed genes with endosperm- specific expression in the seed. Considering that the two research groups used different methodology for identifying imprinted genes, which may be partly responsible for the low level of overlap between the genes identified by the two studies, I decide not to mix them up and instead to keep them as two sets of genes.  Of the imprinted genes identified by Hsieh et al. (2011), 16 out of 36 (about 44%) pairs of alpha WG duplicated genes show asymmetric protein sequence evolution, and within the 16 pairs, there are 10 pairs (about 63%) where the imprinted genes evolved faster (Table 3). In contrast, of the imprinted genes from Wolff et al. (2011), eight out of 11 (about 73%) of alpha WG duplicated pairs have asymmetric protein sequence evolution, and seven (about 88%) of the imprinted genes evolved faster than their paralogs (Table 3). Although there are a few cases in which the paralogs evolved faster than the imprinted genes, I found a frequent acceleration in 16  amino acid sequence evolution of imprinted genes compared with their paralogs derived from the alpha whole genome duplication. This pattern is much more apparent in the imprinted genes with endosperm-specific expression within the seed, from Wolff et al. (2011).  Overall, the patterns of accelerated sequence rate evolution of the imprinted genes FWA, FIS2, MPC, and HDG3 that I reported above appear to be common among imprinted genes with endosperm-specific expression in the seed, and not limited to those four imprinted genes.  
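The proportions quoted in this section can be rechecked with a few lines of Python. This is only an illustrative sketch: the counts are those stated in the text above, while the dictionary layout and the helper name `percent` are assumptions introduced here and are not part of the original analysis.

```python
import math

# Counts as stated in the text above (see Table 3); the layout is hypothetical.
counts = {
    "Hsieh et al. 2011": {"pairs": 36, "asymmetric": 16, "imprinted_faster": 10},
    "Wolff et al. 2011": {"pairs": 11, "asymmetric": 8, "imprinted_faster": 7},
}

def percent(part, whole):
    """Return part/whole as a whole-number percentage, rounding halves up."""
    return math.floor(100 * part / whole + 0.5)

summary = {}
for study, c in counts.items():
    summary[study] = {
        # share of alpha WGD pairs with asymmetric protein sequence evolution
        "asymmetric_pct": percent(c["asymmetric"], c["pairs"]),
        # share of asymmetric pairs in which the imprinted gene is the faster copy
        "faster_among_asymmetric_pct": percent(c["imprinted_faster"], c["asymmetric"]),
        # share of all tested pairs in which the imprinted gene is the faster copy
        "faster_among_all_pairs_pct": percent(c["imprinted_faster"], c["pairs"]),
    }
    print(study, summary[study])
```

Halves are rounded up explicitly because Python's built-in `round` rounds 62.5 to 62, which would not match the "about 63%" reported for 10 of 16 pairs.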
Many imprinted genes have more restricted expression patterns and preferential expression in reproductive organs

Next I wanted to determine how frequently imprinted genes have a more restricted expression pattern than their recent paralogs, to see whether the pattern I found in the first set of four imprinted genes applies more generally to duplicated imprinted genes. I used Arabidopsis ATH1 microarray data from 63 different organ types and developmental stages (Schmid et al. 2005). For both imprinted genes and their paralogs, I calculated the expression width, defined as the number of organ types and developmental stages in which a gene has significant expression, and the organ specificity of expression, which indicates whether a gene is preferentially expressed in a few organ types and developmental stages or broadly expressed in most of them. All gene pairs with imprinted genes that arose from the alpha WGD and were analyzed in the rate analyses in the previous section were also used in this test if microarray data were available for both copies; in some cases, microarray data were not available for one or both copies.

For imprinted genes from Hsieh et al. (2011), 11 of 24 (about 46%) are expressed in fewer organ types and developmental stages than their alpha WG paralogs (Table 4). In terms of organ specificity, 13 of 24 imprinted genes (about 54%) have higher organ specificity of expression than their alpha WG paralogs; nine pairs overlapped between the two analyses. The trend is more apparent for imprinted genes from Wolff et al. (2011): six of nine imprinted genes have more limited expression, and six of nine have higher organ specificity of expression (Table 4).

I also noticed that the imprinted genes with limited expression patterns have higher expression levels in reproductive organs. In contrast, in those cases where the paralogs have more restricted expression patterns or higher organ specificity, their expression is shifted toward vegetative organs. For example, the imprinted gene AT5G15140 (galactose mutarotase-like) and its paralog AT3G01260 both have high organ specificity (0.74 and 0.73, respectively); however, AT5G15140 has its highest expression in flowers and siliques, whereas AT3G01260 has its highest expression in roots. Overall, reproductive-specific expression is a relatively common feature of imprinted genes, in contrast to their broadly expressed paralogs.

Expression patterns of imprinted genes changed after gene duplication to become restricted to reproductive organs

The first four imprinted genes in this study (FWA, FIS2, MPC, and HDG3) all have an expression pattern that differs from the inferred ancestral state. Considering the divergence in expression pattern between many duplicated pairs with imprinted genes that I found in the analysis described in the previous section, I wanted to test the hypothesis that the limited expression of the imprinted genes represents loss of expression in vegetative organs. Thus I investigated the expression patterns of orthologs from Carica and Vitis to infer the likely ancestral expression pattern of the duplicates. I chose seven gene pairs from the microarray analysis in which expression of the imprinted gene is restricted to reproductive organs and the paralog is broadly expressed.

Five of the seven imprinted genes are expressed only in flowers and/or siliques of Arabidopsis (Figure 5): the maternally expressed genes JMJ15 (AT2G34880, a histone demethylase) and VEL2 (AT2G18880, vernalization5), along with the paternally expressed genes TAR1 (AT1G23320, tryptophan aminotransferase related), PKR2 (AT4G31900, pickle related 2), and AT3G50720 (a protein kinase). In contrast, their alpha WGD paralogs all have broad expression in both vegetative and reproductive organs (Figure 5).
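
The two expression measures used in the microarray comparison above, expression width and organ specificity, can be illustrated with a short Python sketch. It rests on assumptions that are not stated in this section: the cutoff for "significant" expression is an arbitrary placeholder, and organ specificity is computed here with the widely used tau index (0 for uniform expression, 1 for expression in a single organ), which may differ from the measure actually used in this study. The function names and the example profiles are hypothetical.

```python
def expression_width(levels, threshold):
    """Count the organ types/stages whose expression exceeds the threshold."""
    return sum(1 for value in levels if value > threshold)

def organ_specificity_tau(levels):
    """Tau index: 0 for uniform expression, 1 for expression in a single organ."""
    peak = max(levels)
    if peak <= 0 or len(levels) < 2:
        raise ValueError("need at least two samples and a positive maximum")
    return sum(1 - value / peak for value in levels) / (len(levels) - 1)

# Hypothetical expression profiles (arbitrary units), NOT data from this study.
broad_paralog = [10.0, 9.0, 11.0, 10.0, 9.5]
restricted_imprinted = [0.5, 0.2, 0.4, 12.0, 8.0]
THRESHOLD = 1.0  # assumed cutoff for "significant" expression

width_broad = expression_width(broad_paralog, THRESHOLD)
width_restricted = expression_width(restricted_imprinted, THRESHOLD)
tau_broad = organ_specificity_tau(broad_paralog)
tau_restricted = organ_specificity_tau(restricted_imprinted)

print("width:", width_broad, width_restricted)
print("tau:", round(tau_broad, 2), round(tau_restricted, 2))
```

Under these assumptions, a gene pair would be scored as showing restricted expression of the imprinted copy when its width is smaller, and as showing higher organ specificity when its tau value is larger, than that of the paralog.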

The Carica and Vitis orthologs of each of the seven gene pairs from Arabidopsis are broadly expressed (Figure 5), suggesting that the pre-duplication expression pattern was most likely broad. Thus the imprinted genes in Arabidopsis likely lost expression in vegetative organs, and their restricted expression pattern is the derived state. The maternally expressed gene AT1G54280 (an ATPase) is expressed in stems in addition to flowers and siliques. However, its alpha WG paralog AT3G13900 and the Carica and Vitis orthologs are all broadly expressed in all tested organ types, suggesting that a similar loss of expression in many vegetative organs occurred in the imprinted AT1G54280. Collectively, the imprinted genes showing reproductive-specific expression have lost expression in vegetative organs after gene duplication, and the paralogs usually reflect the ancestral broad expression state.

A different kind of expression pattern is exhibited by the paternally expressed gene AT5G15140 (galactose mutarotase-like) compared to its alpha WG paralog, AT3G01260. AT5G15140 has high expression in flowers and siliques, whereas AT3G01260 is expressed only in roots (Figure 5). The Carica and Vitis orthologs are expressed in all tested organ types, suggesting broad expression as the pre-duplication state. Thus both duplicated genes in Arabidopsis have lost expression in certain organ types.

Chapter Four

Discussion

Imprinted genes with endosperm-preferential expression in seeds frequently show evidence of neofunctionalization after gene duplication

After their formation, duplicated gene pairs may have different fates. These include functional diversification and neofunctionalization, regulatory neofunctionalization, subfunctionalization, and other kinds of changes in expression patterns.
Genes that experienced neofunctionalization can show a novel expression pattern, rapid amino acid substitution rates, and sometimes evidence for positive selection (Blanc and Wolfe 2004). In this study I tested the hypotheses that duplicated imprinted genes show accelerated sequence evolution compared with their paralogs, that their expression is restricted to reproductive organs in contrast to their broadly expressed paralogs, and that the reproductive organ-specific expression was derived after gene duplication. I found that after gene duplication, many imprinted genes have evolved a novel expression pattern restricted to reproductive organs. Of the genes with endosperm-specific expression in the seed from Wolff et al. (2011), six of nine (about 67%) showed expression limited to reproductive organs. In addition, many of them (64% of the genes examined) have experienced accelerated sequence evolution compared with their paralogs. FIS2 shows evidence for positive selection. These findings suggest regulatory neofunctionalization and functional specialization in the endosperm. Although in most cases the functions of the imprinted genes and their paralogs have not been characterized, there are a few duplicated gene pairs with functional data for one or both copies; those are discussed further in the next two sections of the Discussion.

Although the general trend is similar for all genes analyzed in my research, the features of the imprinted genes in Hsieh et al. (2011) differ to some extent from those in Wolff et al. (2011). The majority of the imprinted genes in Wolff et al. (2011) have a restricted expression pattern and an accelerated rate of sequence evolution compared to their paralogs; the same trend is found in the imprinted genes in Hsieh et al. (2011), but it is less frequent. This is probably because of differences between the two studies in the criteria used to assay and filter for imprinted genes.

When sequencing and analyzing the endosperm transcriptome from small seeds to identify imprinted genes, the most difficult part is avoiding contamination from maternal tissue, such as the seed coat or nucellus. Such contamination would bring in additional maternal transcripts and create an artifactual excess of maternal allele expression, resulting in false positives in meg identification or failure to identify pegs. To avoid maternal contamination and increase confidence in the identification of imprinted genes, the two studies took different approaches. In Hsieh et al. (2011), seeds were micro-dissected and laser capture micro-dissection (LCM) was applied to isolate the endosperm. Because micro-dissection physically removed the maternal tissue, the Illumina sequences were considered to be transcripts from pure endosperm and were tested for biased allelic expression. In Wolff et al. (2011), in contrast, the transcriptome of whole seeds was sequenced; however, only genes with significant expression in the endosperm, but not in the seed coat or other parts of the seed according to microarray data, were carried into the subsequent analysis. These different criteria for avoiding maternal contamination may explain the different nature of the two sets of genes: the genes in Wolff et al. (2011) are strictly endosperm-specific within the seed, and thus were selected for limited expression in seeds at the very first experimental step of the original study. In contrast, imprinted genes identified in Hsieh et al. (2011) were not pre-selected for expression width.

As mentioned above, imprinted genes in Wolff et al. (2011) have expression restricted to the endosperm within seeds, and most of them also have reproductive organ-preferred expression. In contrast, the expression pattern of the paralogs is usually broad, spanning both vegetative and reproductive organs.
I also found a correlation between limited expression pattern and accelerated sequence evolution for imprinted genes, whereas widely expressed paralogs tend to experience stronger evolutionary constraints. I found the same trend when analyzing the imprinted genes in Hsieh et al. (2011). This is consistent with the finding that imprinted genes with endosperm-specific expression evolve faster than Arabidopsis genes in general (Wolff et al. 2011).

Neofunctionalization of MPC and FWA

Imprinted MPC and non-imprinted PAB8 are an alpha WGD pair, and MPC is specific to Brassicaceae. However, unlike most other genes duplicated by the alpha WGD, MPC is just one-fourth of the total length of PAB8, aligning only at the 3' end (the C-terminus of the corresponding protein). In addition to having a new, limited expression pattern (Figure 4), MPC has a different function as a truncated protein (Tiwari et al. 2008). PAB proteins are mRNA polyA binding proteins. They bind the polyA tails of mRNAs through N-terminal RNA recognition motifs and interact with other proteins through the C-terminus, affecting mRNA stability and regulating translation. The C-terminus is highly conserved and is recognized by CID proteins carrying a PAM2 domain. MPC can be regarded as a pure PAB C-terminus. It might bind to the PAM2 domain and block the interaction with a complete PAB protein; because MPC has lost the mRNA binding domain, this would down-regulate the activities of PAM2 domain-containing CID proteins (Tiwari et al. 2008). Thus, in addition to accelerated sequence evolution and a novel restricted expression pattern in reproductive organs (determined in my study), MPC has acquired a new function through loss of functional domains.

FWA and HDG7 are another pair of alpha WG duplicates.
They are both homeodomain-leucine zipper (HD-ZIP) class IV proteins, which are characterized by the homeodomain helix III followed by the leucine zipper-loop-zipper motif (Nakamura et al. 2006). They are transcription factors, but regulate transcription in different manners. HDG7 is expressed in primordial parts of vegetative organs and functions in the L1 layer of the apical meristem. HDG7 was observed to bind L1-like box sequences and likely regulates L1 layer-specific expression (Nakamura et al. 2006). In contrast, FWA has a novel expression pattern in reproductive organs that differs from that of HDG7 and of the orthologs from outgroup species. FWA expression is specific to the female gametophyte and endosperm, and the epigenetic fwa mutant, which has ectopic FWA expression, has a late-flowering phenotype (Kinoshita et al. 2004). Although its epigenetic regulation has been extensively studied, the function of FWA in seeds is not yet known.

Neofunctionalization of FIS2 created a new mechanism for preventing seed development prior to fertilization in Brassicaceae that replaced the ancestral mechanism

FIS2 formed from VRN2 by duplication after the divergence of the Brassicaceae lineage from the Caricaceae lineage (Figure 2). Both genes code for VEF-domain-containing polycomb group proteins (Chen et al. 2009). Imprinted FIS2 shows several features of a neofunctionalized gene: a novel expression pattern that has become restricted to flowers and seeds after duplication from VRN2 (Figure 4), an accelerated rate of evolutionary sequence change along with positive selection in the VEF functional domain (Figure 3), changes in functional domain structures (Chen et al. 2009), and ultimately a novel function in the female gametophyte (discussed below). FIS2 has a large serine-rich domain that is not shared with any other VEF genes in any species, suggesting that the domain was gained in Brassicaceae (Chen et al. 2009).
FIS2 also lost several exons, called the E15-17 region, compared to VRN2 (Chen et al. 2009). In addition to these major changes in functional domains, FIS2 has a very divergent VEF domain that shows evidence of positive selection (Figure 3), whereas the VEF domains in VRN2 and all VRN2/EMF2-like sequences across flowering plants are relatively conserved (Chen et al. 2009).

As polycomb group proteins containing VEF domains, FIS2 and VRN2 associate with other polycomb group proteins with SET domains, such as MEA, SWN, or CLF, and with proteins containing WD40 domains, such as MSI1 and FIE, to form different kinds of polycomb repressive complex 2 (PRC2) complexes (Figure 6). Those complexes regulate gene expression primarily by establishing transcriptional repression at certain methylated sites in target genes (Kohler and Makarevich 2006). VRN2, CLF/SWN, FIE, and MSI1 form a PRC2 complex, called the VRN2 complex, which regulates vernalization to control flowering time in Arabidopsis (Figure 6; reviewed in Hennig and Derkacheva 2009). The VRN2 complex is present across rosids (Chen et al. 2009) and likely functions in a similar way. (VRN2 does not appear to be present outside of rosids; however, the EMF2 polycomb group complex, present in all eudicots, is partially redundant with the VRN2 complex.)

In Brassicaceae, FIS2 substitutes for VRN2 to create another kind of PRC2 complex, the FIS2 complex, which functions only in the female gametophyte and the endosperm (Figure 6). The product of the imprinted gene MEDEA (MEA) also functions in the FIS2 complex (Figure 6); MEA was derived from SWN by the alpha WGD at the base of Brassicaceae, followed by functional divergence (Spillane et al. 2007). The FIS2 complex is important in gametophyte and seed development, where it mediates repression of gene expression. It has two major functions.
The FIS2 complex prevents proliferation of the central cell of the female gametophyte until after fertilization, so that seed development does not start before fertilization (reviewed in Hennig and Derkacheva 2009). The FIS2 complex also acts post-fertilization: it is needed for endosperm cellularization during seed development (Hehenberger et al. 2012). Because FIS2 originated during the evolution of the Brassicaceae lineage after its divergence from the Caricaceae lineage, the role of the FIS2 complex in suppressing seed development until after fertilization, as well as its roles in seed development after fertilization, appear to be evolutionarily recent innovations specific to the Brassicaceae lineage. FIS2 is not redundant with VRN2 in the pre-fertilization function, and thus the PRC2 function in the female gametophyte is specific to FIS2 complexes and not VRN2 complexes (Roszak and Kohler 2011). Indeed, fis2 mutants are defective in controlling central cell proliferation in the female gametophyte (Roszak and Kohler 2011). Thus, the ancestral mechanism for preventing the initiation of seed development before fertilization has been abolished in Brassicaceae and replaced with the repression mechanism involving the FIS2 complex.

How is this process regulated in other angiosperms? Clues come from studies of FIE, which is a member of the FIS2 complex (Figure 6), in Hieracium (Asteraceae) and rice. The central cell proliferation phenotype of Arabidopsis fie mutants is not seen in rice fie mutants or in Hieracium FIE RNAi lines; thus, in contrast to Arabidopsis, a PRC2 complex does not regulate central cell proliferation in the female gametophyte of rice or Hieracium (Rodrigues et al. 2008; Luo et al. 2009). FIE down-regulation in Hieracium leads to seed abortion (Rodrigues et al. 2008), and thus FIE is important for seed development, perhaps as part of a PRC2 complex. Asterids do not contain FIS2, VRN2, or MEA (Figure 6).
Thus, if there is a PRC2 complex regulating seed development after fertilization in asterids, it probably contains the product of a modified EMF2 gene, perhaps a duplicated and neofunctionalized copy of EMF2, as well as a modified SWN gene product. It would be interesting to identify EMF2 and SWN paralogs in asterids and examine their sequence evolution and expression patterns to determine whether they show accelerated sequence evolution and flower- and seed-specific expression.

Interestingly, I found that the neofunctionalization of FIS2 has parallels to that of another Brassicaceae-specific gene, SHORT SUSPENSOR (SSP) (Liu and Adams 2010). SSP is an alpha WG duplicate and it has also experienced neofunctionalization. SSP regulates the elongation and asymmetric division of the zygote after fertilization. It has a unique mechanism of gene regulation in that it is transcribed in pollen, and the transcripts are translated only in the zygote after fertilization (Bayer et al. 2009). This paternal control of SSP prevents embryo development before fertilization. Thus its function has parallels to that of FIS2, which prevents seed development from occurring before fertilization.

In addition to its roles in seed development, the FIS2 complex plays an important role in establishing imprinted expression of some genes in the endosperm, as differential methylation of the paternal or maternal allele can affect targeting by this complex (Kohler et al. 2012). The imprinting of several genes, in particular several pegs, has been shown to be regulated by the FIS2 complex (Wolff et al. 2011).

Multiple recent origins of imprinting in Brassicaceae

Analyses from my study, combined with data from the literature, are indicative of multiple recent origins of imprinting in Brassicaceae. Although there was speculation about this from an analysis of the MEDEA gene (Spillane et al. 2007), no studies have shown evidence for recent origins of imprinting in multiple genes.
Below I present several lines of evidence that, when considered together, indicate that many imprinted genes originated and became imprinted during the evolution of Brassicaceae.

I determined that 133 imprinted genes originated by duplication during the evolution of Brassicaceae. In only a few cases was the paralog identified as imprinted in the studies of Hsieh et al. (2011) or Wolff et al. (2011). However, for some genes there were not enough RNA-seq reads with informative SNP sites to reliably assess imprinting status, and for other genes there were no SNPs between the alleles. Thus, the lack of identification of the paralogs as imprinted does not necessarily mean that most of them are not imprinted. Further studies with deeper Illumina sequencing or RT-PCR assays of individual genes, as well as the use of genotypes with SNPs between the maternal and paternal alleles for some genes, will be necessary to determine the imprinting status of all of the paralogs. Nevertheless, it is likely that many of the paralogs of the 133 imprinted genes are not imprinted. One example is the non-imprinted PAB8, which is the paralog of imprinted MPC (Tiwari et al. 2008). There is also the possibility of loss of imprinting in one gene after duplication; this possibility could be evaluated by assaying the imprinting status of orthologous genes from Carica papaya, which is a member of the order Brassicales, to infer whether the ancestral, pre-duplication state was imprinted.

The evolutionarily recent origins of FIS2 (determined in this study) and MEA (Spillane et al. 2007), which are components of the FIS2 complex that participates in regulating the imprinting of some genes (Figure 6), are also indicative of multiple recent origins of imprinting in Brassicaceae. Interestingly, MEA and FIS2 are both recently duplicated genes that underwent neofunctionalization along with acquisition of imprinting.
As they also established subsequent imprinting regulation for some imprinted genes, this implies that the genes regulated by the FIS2 complex also have imprinting specific to Brassicaceae. Analysis of fis2 mutants showed that the maternal allele of some paternally expressed genes was activated (Wolff et al. 2011). Those genes include two from my study, PKR2 and AT3G50720 (a protein kinase), which showed accelerated sequence evolution compared with their paralogs and a novel restricted expression pattern in reproductive organs. In addition, analyses of fie mutants (FIE is another member of the PRC2 complexes; Figure 6) found that the silenced allele of several genes was activated (Hsieh et al. 2011).

Future directions

My project aimed to help understand the evolutionary history of imprinted genes that arose by recent gene duplication in Brassicaceae. This study shows that accelerated sequence evolution and novel restricted expression patterns are common features of imprinted genes, suggesting that many imprinted genes have undergone neofunctionalization. The FIS2 gene was especially interesting because its neofunctionalization led to a new mechanism for preventing seed development before fertilization in Brassicaceae.

Future endeavours could include functional studies of imprinted genes that show evidence for neofunctionalization. In addition, one could determine the imprinting status of the paralogs of all of the imprinted genes in this study, and study the imprinting of orthologs in outgroup species, to show more definitively that imprinting has multiple origins in Brassicaceae. It would also be valuable to extend evolutionary studies of imprinted genes to other flowering plant species. One could identify imprinted genes of evolutionarily recent origin and study their sequence evolution and expression patterns. Maize and rice would be good candidates because large-scale identification studies of imprinted genes were published in 2011.
It also would be informative to do large-scale identification studies of imprinted genes in other angiosperms, as imprinting has only been studied in Arabidopsis, maize, and rice so far.

Tables and Figures

Table 1. PCR primers designed for this study.

Species | Gene | Forward primer (5' to 3') | Reverse primer (5' to 3')
Arabidopsis thaliana | At FIS2 (AT2G35670) | ACCACGACTCAACTAGCAATAG | CTACTAACACGACGCACCTTAG
Arabidopsis thaliana | At VRN2 (AT4G16845) | CTAGGCAACCCATCGTTTCT | CATCGACTTCATCCTCGCTATC
Carica papaya | Cp VRN2 (CP00089G00250) | GCCAATGGCGTTGGAGCAAGTAAT | AGCATCGAGAAGCCCATGATTCCA
Vitis vinifera | Vv VRN2 (VV02G00530) | AAGGCCTGTTGCAGAAAGCTATGC | AATGCCTCACAAGCCCAAGGAATG
Arabidopsis thaliana | At MPC (AT3G19350) | AAGTTGCCCGTGGTGATT | CCATTGCCTCTTTGACCATTTC
Arabidopsis thaliana | At PAB8 (AT1G49760) | GCCTTTGGTCCCATTCTATCT | GTTTCCCTCTCGGACTTCTTT
Carica papaya | Cp PAB8 (CP00200G00230) | GCAAACTCCACAAGCAGTTCCGTT | ATTACCGGCCTGTTGTTGTTGAGC
Vitis vinifera | Vv PAB8 (VV03G06400) | GTTCCAATGCCACCTTCCGTTGTT | AACCTCCATAGCCTCTGCAACCTT
Arabidopsis thaliana | At FWA (AT4G25530) | GTGGAGGCTTCAAGAGCTAAA | TCTGAAACTCAGGTGGCTAATG
Arabidopsis thaliana | At HDG7 (AT5G52170) | AACAAGAACAACGCCCTAAGA | TGAAGGCTGCTCAAGTGAAATA
Carica papaya | Cp HDG7 (CP00107G00240) | AGCAGCATCAGCTGAGGATTGAGA | TGGAGAGCACCACTTCTTGTTCCA
Vitis vinifera | Vv HDG7 (VV16G00480) | TGGCCTCATCCTGACGAGAAACAA | GTCACTGCAAGCTCCACAAACACA
Arabidopsis thaliana | At HDG3 (AT2G32370) | TGGTGGAACTGACAACACTAAT | CATTTCCCTTAGATCCCATCCC
Arabidopsis thaliana | At HDG2 (AT1G05230) | GACCAAGATCCTCTTCATCCTAAC | CGGTCGATCTCTTCCCTTAATC
Carica papaya | Cp HDG2 (CP00008G02160) | TCGATGATCGGTATGGGCGGAAAT | TCATCTGGGTTCGCTTGTTCTGGA
Vitis vinifera | Vv HDG2 (VV12G12230) | TCAATGGACCGACTGAAGCTGACA | ATGGCCTTCTTCGACATCTCACCA
Arabidopsis thaliana | At TAR1 (AT1G23320) | AGTCACAAACTCGAGCCACCACAA | AGGCTCGACGCATTTGAGATCCTT
Arabidopsis thaliana | At TAA1 (AT1G70560) | AGAGCGATGCTTTCACTCTTCCCA | TGAGCTTCATGTTGGCGAGTCTCT
Carica papaya | Cp TAA1 (CP00069G00950) | TCTGCTCACGGCGAAATACCCATA | TCACCTGCCCATTTATACAGCCCT
Vitis vinifera | Vv TAA1 (VV00G18000) | TACGGTTGGGAATGCAGTAACGGA | CATGCAAAGGCAGGATGTGGTTCA
Arabidopsis thaliana | AT1G54280 | CGCCACAACGCTACCTTACTTGTT | GAATGCTTCCTTTGAAGCCTGCCT
Arabidopsis thaliana | AT3G13900 | TTCCACATGCTCTCGGAAACCCTA | ATCAGTTGATGCTGAGACGCCACT
Carica papaya | CP00025G00770 | TTCCCTGCACTGTATCAGCAAGGA | AGCTTCCCTCTCAACTGCCTGATT
Vitis vinifera | VV09G02820 | TTGGGTGGATGGGAAATGGTCTCT | TCTTTGCGTCGACTCTTGCTGAGA
Arabidopsis thaliana | At VEL2 (AT2G18880) | TGGCTCATAAGCTTCTCAAGGGCA | GCTCAATTCTTCGACGCCGCTAAA
Arabidopsis thaliana | At VEL1 (AT4G30200) | TCTTCGGCAGTGGAAAGAAGCGTA | CAGCCAAAGCCATGGGATCATCAA
Carica papaya | Cp VEL1 (CP01021G00010) | TCAACTGGCTGAGGAGATGAGCAA | GCTGTGCATTCCTTGTCTTGCCTT
Vitis vinifera | Vv VEL1 (VV11G09110) | AGTCACTGGATTCCGTGCTCTCAA | TTGTGTCCTGGAAGGAAACGACCT
Arabidopsis thaliana | At JMJ15 (AT2G34880) | TCCTCGTAGGCGTTGTCATCGAAA | ATGGGATTCTCCAAATCCGTCCCA
Arabidopsis thaliana | At JMJ18 (AT1G30810) | ATAACTTACCACGGCTTCCTGGCT | TGATACGCCCTCGGGAATGTCAAT
Carica papaya | Cp JMJ18 (CP00192G00210) | AGAAGTGCTGGGAAATGGTGTTGC | ATGGTTTATGGATCTCGTCCGCCA
Vitis vinifera | Vv JMJ18 (VV10G08120) | TGCCCTCTTTGGAATGTTTGCAGG | ATCTCCTCGGTCAGCGTTGTGAAT
Arabidopsis thaliana | AT3G50720 | GCCATTTCTCCGACGATGATGCTT | GTGGCTCTATGCAAGCTCCAACAA
Arabidopsis thaliana | AT5G66710 | TGTGGAGAATCTTCCAGAAGGCGT | AGTCCGGGACAATCACAGACAACA
Carica papaya | CP00158G00090 | ACGCTGTTATCCAGGACGAACCAT | AATGGAGCCTTGTTTGTCAGCAGC
Vitis vinifera | VV04G02590 | TTATCGGTGGATGGCTCCTGAGTT | TCTGCAGTCACATTGCTTCTGGGA
Arabidopsis thaliana | At PKR2 (AT4G31900) | TTCATCGCTGAGAACGGAGCTCAT | CAAAGAGGCCAAGAAGGCAATGCT
Arabidopsis thaliana | At PKL (AT2G25170) | AAGCAGCAGGACAAGGAGTTCAGA | ACTCCCACTGTGTCAGCGATCTTT
Carica papaya | Cp PKL (CP00127G00130) | TGCGAAACTGGGAACGGGAATTTG | TCAGCTCCACCAACCTTTCCATCT
Vitis vinifera | Vv PKL (VV04G12540) | TCTGAAGCACCTGGAAACCAGACA | AGCAGAGTCTGTTCTGAAGTGGCA
Arabidopsis thaliana | AT5G15140 | AGATGGTGCCGGTTCTGATGATGA | GGCCAACGGTTGCTCCGAAATAAA
Arabidopsis thaliana | AT3G01260 | TCACGATGACGATGACCATGACGA | TTCCCTTGATCGCCATCAGGACTT
Carica papaya | CP00014G00240 | AAACCACACACCATTGGAAGCACC | ATCAAGTGCTCGTAAGTCTGCCCA
Vitis vinifera | VV14G01760 | TGGCGATGTCGTTCTTGGGTATGA | AGTAAGCGTGCTGGGCTAGATTCA
Arabidopsis thaliana | ACTIN | AAGCTGTTCTCTCCTTGTACGCCA | TCTTCATGCTGCTTGGTGCAAGTG
Carica papaya | ACTIN | ATTGTGCTGGACTCTGGTGATGGT | TCGGTCTGCAATACCAGGGAACAT
Vitis vinifera | ACTIN | TGCCTGCCATGTATGTTGCCATTC | TGCAGCTTCCATCCCAATGAGAGA

Table 2. Alpha WGD-derived imprinted genes and their paralogs. meg: maternally expressed gene; peg: paternally expressed gene. Genes with locus number and gene name in bold are genes studied in previous sections. In the reference column, H: Hsieh et al. (2011); W: Wolff et al. (2011); G: Gehring et al. (2009); T: Tiwari et al. (2008); K: Kinoshita et al. (2004).

type | imprinted gene locus number | imprinted gene name (if any) | imprinted gene description | alpha WG paralog | paralog gene name | reference
meg | AT3G19350 | MPC | maternally expressed PAB C-terminal | AT1G49760 | PAB8 | T
meg | AT4G25530 | FWA, HDG6 | HD-ZIP IV protein | AT5G52170 | HDG7 | K, W
meg | AT1G05570 | CALS1, GSL6 | callose synthase | AT2G31960 | GSL3 | H
meg | AT1G13900 | PAP2 | purple acid phosphoesterase | AT2G03450 | PAP9 | H
meg | AT1G28050 |  | B-box type zinc finger protein | AT2G33500 |  | H
meg | AT1G35580 | CINV1, A/N-INVG | cytosolic invertase | AT4G09510 | CINV2, A/N-INVI | H
meg | AT1G35630 |  | protease-associated zinc finger protein | AT4G09560 |  | H
meg | AT1G54280 |  | ATPase E1-E2 type protein | AT3G13900 |  | W
meg | AT1G62660 |  | vacuolar invertase | AT1G12240 | FRUCT4 | H
meg | AT1G64610 |  | WD-40 repeat protein | AT5G42010 |  | H
meg | AT1G69900 |  | actin cross-linking protein | AT1G27100 |  | H
meg | AT1G73390 |  | BRO1-like domain protein | AT1G17940 |  | H
meg | AT1G77000 | SKP2B | ubiquitin-protein ligase | AT1G21410 | SKP2A | H
meg | AT1G78830 |  | curculin-like mannose-binding lectin protein | AT1G16905 |  | H
meg | AT1G79520 |  | cation efflux protein | AT1G16310 |  | H
meg | AT2G17990 |  | unknown protein | AT4G36105 |  | H
meg | AT2G18880 | VEL2, VIL3 | vernalization5 | AT4G30200 | VEL1, VIL2 | W
meg | AT2G34880 | JMJ15, MEE27 | transcription factor | AT1G30810 | JMJ18 | H
meg | AT3G03260 | HDG8 | HD-ZIP IV protein | AT5G17320 | HDG9 | G
meg | AT3G05700 |  | drought-responsive protein | AT5G26990 |  | H
meg | AT3G17250 |  | protein phosphatase 2C protein | AT1G48040 |  | H
meg | AT3G22810 |  | phosphoinositide binding | AT4G14740 |  | H
meg | AT3G25290 |  | auxin-responsive protein | AT4G12980 |  | H
meg | AT3G26590 |  | MATE efflux protein | AT1G12950 | RSH2 | W
meg | AT3G27300 | G6PD5 | glucose-6-phosphate dehydrogenase | AT5G40760 | G6PD6 | H
meg | AT3G54100 |  | O-fucosyltransferase protein | AT2G37980 |  | H
meg | AT4G01840 | KCO5, TPK5 | outward rectifier potassium channel | AT1G02510 | KCO4, TPK4 | H
meg | AT4G12080 | AHL1 | AT-hook motif nuclear-localized protein | AT4G22770 |  | H
meg | AT4G15080 |  | DHHC-type zinc finger protein | AT3G22180 |  | H
meg | AT4G16760 | ACX1 | acyl-CoA oxidase | AT2G35690 | ACX5 | H
meg | AT4G18150 |  | kinase-related protein | AT5G46380 |  | H
meg | AT4G26140 | BGAL12 | beta-galactosidase | AT5G56870 | BGAL4 | W
meg | AT4G39140 |  | RING/U-box zinc finger protein | AT2G21500 |  | H
meg | AT5G02630 | CAND6 | lung seven transmembrane receptor protein | AT3G09570 |  | H
meg | AT5G02970 |  | hydrolase | AT3G09690 |  | H
meg | AT5G15470 | GAUT14 | galacturonosyltransferase | AT3G01040 | GAUT13 | H
meg | AT5G17320 | HDG9 | HD-ZIP IV protein | AT3G03260 | HDG8 | G, H
meg | AT5G44350 |  | ethylene-responsive nuclear protein related | AT4G20880 | ERT2 | H
meg | AT5G47770 | FPS1 | farnesyl diphosphate synthase | AT4G17190 | FPS2 | H
meg | AT5G53870 | ENODL1 | early nodulin-like protein | AT4G27520 | ENODL2 | H
meg | AT5G64400 |  | unknown | AT5G09570 |  | H
peg | AT2G32370 | HDG3 | HD-ZIP IV protein | AT1G05230 | HDG2 | G, H
peg | AT1G17770 | SDG17, SUVH7 | SET domain protein | AT1G73100 | SDG19, SUVH3 | H, W
peg | AT1G23320 | TAR1 | tryptophan aminotransferase related | AT1G70560 | TAA1 | W
peg | AT3G50720 |  | protein kinase | AT5G66710 |  | W
peg | AT4G10160 |  | RING/U-box zinc finger protein | AT1G33480 |  | W
peg | AT4G31900 | PKR2, CHR7 | chromatin remodeling factor | AT2G25170 | CHD3, CHR6, PKL | W
peg | AT5G15140 |  | galactose mutarotase-like protein | AT3G01260 |  | W

Table 3. Sequence rate analyses for imprinted genes derived from the alpha WGD to detect asymmetric and accelerated sequence evolution. Arabidopsis gene locus numbers in bold are genes studied in previous sections. Outgroup genes used to construct triplets for analyses of asymmetric rate evolution are listed. If a triplet shows evidence for asymmetric rate evolution between paralogs in Arabidopsis, then the outgroup sequence appears in bold. The p-value is shown in scientific format if it is less than 0.05. The p-value is replaced by "sym" if it is greater than 0.05, which means a lack of statistical support for asymmetric evolution. NA means no orthologous gene could be found in the outgroup species.
In the evolutionary rate of imprinted gene column, “faster” means the imprinted gene has a faster sequence evolution rate compared to its paralog, “slower” means a slower rate for the imprinted gene, and “0” means the imprinted gene and its paralog have similar rates. In the reference column, H: Hsieh et al. (2011); W: Wolff et al. (2011); G: Gehring et al. (2009); T: Tiwari et al. (2008); K: Kinoshita et al. (2004).

imprinted gene | alpha WG paralog | Carica ortholog | Carica p-value | Vitis ortholog | Vitis p-value | Ricinus ortholog | Ricinus p-value | evolutionary rate of imprinted gene | reference
AT3G19350 | AT1G49760 | CP00200G00230 | 9.09E-23 | VV03G06400 | 3.70E-17 | RC29801G00960 | 3.34E-22 | faster | T
AT4G25530 | AT5G52170 | CP00107G00240 | 1.15E-21 | VV16G00480 | 2.03E-17 | RC30147G00450 | 1.95E-22 | faster | K, W
AT2G32370 | AT1G05230 | CP00008G02160 | 9.13E-48 | VV12G12230 | 7.26E-49 | RC29600G00040 | 5.13E-56 | faster | G, H
AT1G05570 | AT2G31960 | CP00189G00090 | sym | VV13G00640 | sym | RC30226G00460 | sym | faster | H
AT1G35580 | AT4G09510 | CP00066G00250 | 5.89E-03 | VV18G15230 | sym | RC29728G00050 | 2.25E-02 | faster | H
AT1G78830 | AT1G16905 | CP00077G00010 | 3.42E-23 | VV19G01860 | 1.50E-03 | RC29681G00520 | 1.04E-06 | faster | H
AT2G34880 | AT1G30810 | CP00192G00210 | 4.00E-03 | VV10G08120 | 2.11E-04 | RC29776G00050 | 1.92E-04 | faster | H
AT3G27300 | AT5G40760 | CP00130G00370 | 1.86E-03 | VV14G01300 | 8.55E-05 | RC27721G00020 | 3.84E-03 | faster | H
AT5G02630 | AT3G09570 | NA |  | VV08G02070 | 2.59E-11 | RC29836G00040 | 1.29E-12 | faster | H
AT5G44350 | AT4G20880 | CP43226G00010 | sym | VV10G05860 | 1.44E-02 | RC30075G00290 | 2.49E-02 | faster | H
AT5G64400 | AT5G09570 | CP00020G00260 | 5.36E-04 | VV00G21490 | 1.01E-03 | RC29506G00020 | 1.32E-07 | faster | H
AT2G17990 | AT4G36105 | CP00006G02330 | 1.42E-05 | VV04G00870 | 5.78E-07 | RC27471G00190 | 3.64E-07 | slower | H
AT4G01840 | AT1G02510 | CP00057G00060 | 4.74E-23 | VV07G10410 | 2.16E-16 | RC30076G01240 | 1.39E-25 | slower | H
AT4G12080 | AT4G22770 | CP00029G00680 | 6.05E-03 | VV02G07930 | 1.63E-02 | RC30190G02120 | 1.05E-05 | slower | H
AT4G15080 | AT3G22180 | CP00008G02800 | 1.80E-02 | VV03G02100 | 1.21E-05 | RC30106G00240 | 1.13E-02 | slower | H
AT4G16760 | AT2G35690 | CP00089G00650 | 1.85E-02 | VV00G14230 | 5.19E-03 | RC29948G00330 | 2.74E-02 | slower | H
AT4G18150 | AT5G46380 | CP00017G01350 | 7.88E-13 | VV10G03490 | 1.23E-08 | RC29693G01060 | 2.64E-11 | slower | H
AT1G13900 | AT2G03450 | CP00001G01000 | sym | VV01G10300 | 4.50E-02 | RC30174G05330 | sym | 0 | H
AT1G28050 | AT2G33500 | CP00123G00570 | sym | VV01G00950 | sym | RC30128G04260 | sym | 0 | H
AT1G35630 | AT4G09560 | CP00066G00310 | sym | VV18G16070 | sym | RC28505G00040 | sym | 0 | H
AT1G62660 | AT1G12240 | CP00029G01210 | sym | VV02G00080 | sym | RC28200G00060 | sym | 0 | H
AT1G64610 | AT5G42010 | CP00015G00230 | sym | VV02G05330 | sym | RC30068G00350 | sym | 0 | H
AT1G69900 | AT1G27100 | CP00014G00550 | sym | VV01G12860 | 1.77E-03 | RC29822G01270 | sym | 0 | H
AT1G73390 | AT1G17940 | CP00168G00070 | sym | VV16G04720 | sym | RC30009G00030 | sym | 0 | H
AT1G77000 | AT1G21410 | CP00026G00750 | sym | VV18G11240 | sym | RC29648G00150 | sym | 0 | H
AT1G79520 | AT1G16310 | CP00032G00640 | sym | VV18G14940 | sym | RC30186G00100 | sym | 0 | H
AT3G05700 | AT5G26990 | CP03064G00010 | sym | VV14G09270 | sym | RC30128G02680 | sym | 0 | H
AT3G17250 | AT1G48040 | CP00146G00660 | 1.31E-02 | VV05G02070 | sym | RC29889G01230 | sym | 0 | H
AT3G22810 | AT4G14740 | CP00044G01120 | sym | VV05G05380 | sym | RC29844G00730 | sym | 0 | H
AT3G25290 | AT4G12980 | CP00008G00860 | sym | VV00G10770 | sym | RC28708G00110 | sym | 0 | H
AT3G54100 | AT2G37980 | CP00056G00770 | sym | VV08G13470 | sym | RC27798G00180 | sym | 0 | H
AT4G39140 | AT2G21500 | CP00106G00050 | sym | VV03G02900 | sym | RC30131G02290 | sym | 0 | H
AT5G02970 | AT3G09690 | CP00045G00400 | sym | VV08G02470 | sym | RC29805G00130 | sym | 0 | H
AT5G15470 | AT3G01040 | CP00014G01530 | sym | VV14G02890 | sym | RC29630G00120 | sym | 0 | H
AT5G17320 | AT3G03260 | CP00178G00200 | sym | VV04G07090 | 9.21E-03 | RC30171G00200 | sym | 0 | G, H
AT5G47770 | AT4G17190 | CP00027G02160 | sym | VV19G06090 | sym | RC29629G00820 | sym | 0 | H
AT5G53870 | AT4G27520 | CP00080G00790 | sym | VV12G01710 | 4.20E-03 | RC29668G00110 | sym | 0 | H
AT1G17770 | AT1G73100 | CP00054G00470 | 4.52E-08 | VV13G06870 | 6.11E-19 | RC29686G00440 | 2.56E-24 | faster | H, W
AT1G23320 | AT1G70560 | CP00069G00950 | sym | VV00G18000 | 3.39E-03 | RC27504G00140 | 3.73E-02 | faster | W
AT2G18880 | AT4G30200 | CP01021G00010 | 9.60E-15 | VV11G09110 | 4.80E-19 | RC29763G00100 | 1.16E-23 | faster | W
AT3G50720 | AT5G66710 | CP00158G00090 | sym | VV04G02590 | 6.42E-04 | RC29646G00150 | 3.39E-02 | faster | W
AT4G10160 | AT1G33480 | CP00142G00050 | 1.34E-05 | VV18G17350 | 4.02E-02 | RC30074G00080 | sym | faster | W
AT4G31900 | AT2G25170 | CP00127G00130 | 2.92E-31 | VV04G12540 | 1.53E-31 | RC29848G01120 | 8.54E-36 | faster | W
AT5G15140 | AT3G01260 | CP00014G00240 | 2.82E-05 | VV14G01760 | 8.06E-05 | RC29822G01630 | 1.55E-04 | slower | W
AT1G54280 | AT3G13900 | CP00025G00770 | sym | VV09G02820 | sym | RC29706G00630 | sym | 0 | W
AT3G26590 | AT1G12950 | CP00033G00120 | sym | VV00G08640 | sym | RC27504G00310 | 5.63E-03 | 0 | W
AT4G26140 | AT5G56870 | CP00093G00260 | sym | VV11G03590 | sym | RC29912G00490 | 3.46E-02 | 0 | W
AT3G03260 | AT5G17320 | CP00178G00200 | sym | VV04G07090 | 9.21E-03 | RC30171G00200 | sym | 0 | G

Table 4. Microarray assay results for imprinted genes derived from the alpha WGD and their paralogs.

Arabidopsis gene locus numbers in bold are genes studied in expression assays. Values for expression width and expression organ specificity of the imprinted gene are in bold if the imprinted gene shows restricted expression compared to its paralog. In the reference column, H: Hsieh et al. (2011); W: Wolff et al. (2011); G: Gehring et al. (2009); T: Tiwari et al. (2008); K: Kinoshita et al. (2004).
imprinted gene | paralog | expression width of imprinted gene | expression width of paralog | overlapping organ types | expression organ specificity of imprinted gene | expression organ specificity of paralog | reference
AT3G19350 | AT1G49760 | 53 | 63 | 53 | 0.57 | 0.20 | T
AT4G25530 | AT5G52170 | 4 | 0 | 0 | 0.62 | 0.28 | K, W
AT2G32370 | AT1G05230 | 2 | 55 | 2 | 0.75 | 0.38 | G, H
AT1G05570 | AT2G31960 | 61 | 63 | 61 | 0.31 | 0.21 | H
AT1G13900 | AT2G03450 | 63 | 63 | 63 | 0.30 | 0.50 | H
AT1G28050 | AT2G33500 | 61 | 47 | 46 | 0.28 | 0.33 | H
AT1G35580 | AT4G09510 | 55 | 53 | 49 | 0.32 | 0.31 | H
AT1G62660 | AT1G12240 | 62 | 63 | 62 | 0.38 | 0.21 | H
AT1G64610 | AT5G42010 | 63 | 43 | 43 | 0.47 | 0.62 | H
AT1G77000 | AT1G21410 | 63 | 62 | 62 | 0.35 | 0.32 | H
AT1G79520 | AT1G16310 | 52 | 41 | 36 | 0.43 | 0.41 | H
AT2G17990 | AT4G36105 | 63 | 12 | 12 | 0.21 | 0.27 | H
AT2G34880 | AT1G30810 | 0 | 46 | 0 | 0.46 | 0.38 | H
AT3G17250 | AT1G48040 | 57 | 63 | 57 | 0.27 | 0.56 | H
AT3G22810 | AT5G43870 | 20 | 55 | 16 | 0.33 | 0.49 | H
AT3G25290 | AT4G12980 | 45 | 61 | 43 | 0.44 | 0.29 | H
AT3G27300 | AT5G40760 | 63 | 62 | 62 | 0.22 | 0.26 | H
AT3G54100 | AT2G37980 | 51 | 26 | 22 | 0.35 | 0.70 | H
AT4G12080 | AT4G22770 | 28 | 30 | 20 | 0.55 | 0.51 | H
AT4G16760 | AT2G35690 | 63 | 59 | 59 | 0.20 | 0.40 | H
AT5G02630 | AT3G09570 | 61 | 63 | 61 | 0.29 | 0.15 | H
AT5G15470 | AT3G01040 | 63 | 63 | 63 | 0.42 | 0.37 | H
AT5G44350 | AT4G20880 | 47 | 63 | 47 | 0.61 | 0.37 | H
AT5G47770 | AT4G17190 | 59 | 57 | 56 | 0.22 | 0.25 | H
AT5G53870 | AT4G27520 | 54 | 62 | 54 | 0.63 | 0.23 | H
AT5G64400 | AT5G09570 | 63 | 59 | 59 | 0.09 | 0.62 | H
AT1G23320 | AT1G70560 | 0 | 53 | 0 | 0.18 | 0.51 | W
AT1G54280 | AT3G13900 | 10 | 53 | 10 | 0.76 | 0.75 | W
AT2G18880 | AT4G30200 | 0 | 63 | 0 | 0.22 | 0.15 | W
AT3G26590 | AT1G12950 | 55 | 7 | 7 | 0.38 | 0.68 | W
AT3G50720 | AT5G66710 | 0 | 17 | 0 | 0.42 | 0.47 | W
AT4G26140 | AT5G56870 | 39 | 59 | 35 | 0.43 | 0.34 | W
AT4G31900 | AT2G25170 | 1 | 63 | 1 | 0.60 | 0.14 | W
AT5G15140 | AT3G01260 | 17 | 7 | 0 | 0.74 | 0.73 | W

Figure 1. Example tree illustrating the detection of imprinted genes that arose by gene duplication in the Arabidopsis lineage.
Imprinted genes in Arabidopsis are considered to be recent Brassicaceae lineage-specific duplicates if they form a clade with one or more Arabidopsis paralogs, with Carica and other eurosid genes in deeper-branching positions. Dotted lines indicate possible cases where there is more than one Arabidopsis paralog.

Figure 2. Phylogenetic trees of FIS2/VRN2 (A) and HDG3/HDG2 (B) in rosids.

The phylogenetic trees were constructed with maximum likelihood using RAxML. Bootstrap values are indicated for each node. The circle indicates the gene duplication event that gave rise to FIS2 (A) and HDG3 (B).

Figure 3. Sequence rate and selection analyses of FIS2, MPC, FWA, and HDG3.

The phylogenetic tree for each gene includes sequences from Arabidopsis thaliana (At), Carica papaya (Cp), Populus trichocarpa (Pt), Ricinus communis (Rc), Manihot esculenta (Me), and Vitis vinifera (Vv). Trees are unrooted. Branch lengths were generated by Codeml in PAML, and the scale bar indicates nucleotide substitutions per codon. Branch-wise Ka/Ks ratios are indicated above the branches. Imprinted genes and their larger Ka/Ks ratios are shown in bold. Circles indicate the gene duplication events that gave rise to the imprinted genes and their paralogs.

Figure 4. RT-PCR expression assay results of orthologs of FIS2, MPC, FWA, and HDG3.

RT-PCR expression assays were performed using five organ types: root, stem, leaf, flower, and silique (Arabidopsis) / seed (Carica and Vitis), listed above the corresponding columns. Plus signs indicate the presence of reverse transcriptase in the reaction, and minus signs indicate the absence of reverse transcriptase as negative controls. gDNA indicates genomic DNA from leaf tissue. The Arabidopsis thaliana (At) gene pairs FIS2/VRN2, MPC/PAB8, FWA/HDG7, and HDG3/HDG2 and their orthologs from the outgroup species Carica papaya (Cp) and Vitis vinifera (Vv) are listed beside the corresponding panels.
The first Arabidopsis gene, shown at the top of each set of gel pictures, is imprinted. ACTIN in each species was used as a positive control, and the results are shown in the bottom image.

Figure 5. RT-PCR expression assay results of imprinted genes, their paralogs, and orthologs from outgroup species.

RT-PCR expression assays were performed using five organ types: root, stem, leaf, flower, and silique (Arabidopsis) / seed (Carica and Vitis), listed above the corresponding columns. Plus signs indicate the presence of reverse transcriptase in the reaction, and minus signs indicate the absence of reverse transcriptase as RT negative controls. Gene pairs in Arabidopsis thaliana (At) and their orthologs from the outgroup species Carica papaya (Cp) and Vitis vinifera (Vv) are listed beside the corresponding panels. The first Arabidopsis gene at the top of each set of gel pictures is imprinted. ACTIN in each species was used as a positive control, and the results are shown in the bottom right image.

Figure 6. FIS2 and VRN2 form two types of PRC2 complexes that regulate different developmental stages in Brassicaceae.

VRN2, along with CLF/SWN, FIE, and MSI1, forms a VRN2 PRC2 complex that regulates flowering after vernalization. FIS2, along with MEA, FIE, and MSI1, forms a FIS2 complex that regulates seed/ovule development before and after fertilization.

References

Barker MS, Vogel H, Schranz ME. 2009. Paleopolyploidy in the Brassicales: analyses of the Cleome transcriptome elucidate the history of genome duplications in Arabidopsis and other Brassicales. Genome Biology and Evolution, 1:391–399.

Bayer M, Nawy T, Giglione C, Galli M, Meinnel T, Lukowitz W. 2009. Paternal control of embryonic patterning in Arabidopsis thaliana. Science, 323:1485–1488.

Berger F and Chaudhury A. 2009. Parental memories shape seeds. Trends in Plant Science, 14:550–556.

Blanc G, Hokamp K, Wolfe KH. 2003. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Research, 13:137–144.

Blanc G and Wolfe KH. 2004. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell, 16:1679–1691.

Bowers JE, Chapman BA, Rong J, Paterson AH. 2003. Unraveling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature, 422:433–438.

Byrne KP and Wolfe KH. 2007. Consistent patterns of rate asymmetry and gene loss indicate widespread neofunctionalization of yeast genes after whole-genome duplication. Genetics, 175:1341–1350.

Chen LJ, Diao ZY, Specht C, Sung ZR. 2009. Molecular evolution of VEF-domain-containing PcG genes in plants. Molecular Plant, 2:738–754.

Cheng F, Liu S, Wu J, Fang L, Sun S, Liu B, Li P, Hua W, Wang X. 2011. BRAD, the genetics and genomics database for Brassica plants. BMC Plant Biology, 11:136.

Cui L, Wall PK, Leebens-Mack JH, Lindsay BG, Soltis DE, Doyle JJ, Soltis PS, Carlson JE, Arumuganathan K, Barakat A, Albert VA, Ma H, dePamphilis CW. 2006. Widespread genome duplications throughout the history of flowering plants. Genome Research, 16:738–749.

Dassanayake M, Oh DH, Haas JS, Hernandez A, Hong H, Ali S, Yun DJ, Bressan RA, Zhu JK, Bohnert HJ, Cheeseman JM. 2011. The genome of the extremophile crucifer Thellungiella parvula. Nature Genetics, 43:913–918.

Duarte JM, Cui L, Wall PK, Zhang Q, Zhang X, Leebens-Mack J, Ma H, Altman N, dePamphilis CW. 2006. Expression pattern shifts following duplication indicative of subfunctionalization and neofunctionalization in regulatory genes of Arabidopsis. Molecular Biology and Evolution, 23:469–478.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32:1792–1797.

Felsenstein J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17:368–376.
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics, 151:1531–1545.

Gehring M, Bubb KL, Henikoff S. 2009. Extensive demethylation of repetitive elements during seed development underlies gene imprinting. Science, 324:1447–1451.

Gehring M, Missirian V, Henikoff S. 2011. Genomic analysis of parent-of-origin allelic expression in Arabidopsis thaliana seeds. PLoS One, 6(8):e23687.

Haberer G, Hindemitt T, Meyers BC, Mayer KF. 2004. Transcriptional similarities, dissimilarities, and conservation of cis-elements in duplicated genes of Arabidopsis. Plant Physiology, 136:3009–3022.

Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, 41:95–98.

Hehenberger E, Kradolfer D, Köhler C. 2012. Endosperm cellularization defines an important developmental transition for embryo development. Development, 139:2031–2039.

Hennig L and Derkacheva M. 2009. Diversity of Polycomb group complexes in plants: same rules, different players? Trends in Genetics, 25:414–423.

Hsieh TF, Shin J, Uzawa R, Silva P, Cohen S, Bauer MJ, Hashimoto M, Kirkbride RC, Harada JJ, Zilberman D, Fischer RL. 2011. Regulation of imprinted gene expression in Arabidopsis endosperm. Proceedings of the National Academy of Sciences of the USA, 108:1755–1762.

Hulsen T, Huynen MA, de Vlieg J, Groenen PM. 2006. Benchmarking ortholog identification methods using functional genomics data. Genome Biology, 7:R31.

Kinoshita T, Miura A, Choi Y, Kinoshita Y, Cao X, Jacobsen SE, Fischer RL, Kakutani T. 2004. One-way control of FWA imprinting in Arabidopsis endosperm by DNA methylation. Science, 303:521–523.

Köhler C and Makarevich G. 2006. Epigenetic mechanisms governing seed development in plants. EMBO Reports, 7:1223–1227.

Köhler C, Wolff P, Spillane C. 2012. Epigenetic mechanisms underlying genomic imprinting in plants. Annual Review of Plant Biology, 63:331–352.

Liu S-L and Adams KL. 2010. Dramatic change in function and expression pattern of a gene duplicated by polyploidy created a paternal effect gene in the Brassicaceae. Molecular Biology and Evolution, 27(12):2817–2828.

Liu S-L, Baute G, Adams KL. 2011. Organ and cell type-specific complementary expression patterns and regulatory neofunctionalization between duplicated genes in Arabidopsis thaliana. Genome Biology and Evolution, 3:1419–1436.

Luo M, Taylor JM, Spriggs A, Zhang H, Wu X, Russell S, Singh M, Koltunow A. 2011. A Genome-Wide Survey of Imprinted Genes in Rice Seeds Reveals Imprinting Primarily Occurs in the Endosperm. PLoS Genetics, 7(6):e1002125. doi:10.1371/journal.pgen.1002125.

Luo M, Platten D, Chaudhury A, Peacock WJ, Dennis ES. 2009. Expression, imprinting, and evolution of rice homologs of the polycomb group genes. Molecular Plant, 2:711–723.

Nakamura M, Katsumata H, Abe M, Yabe N, Komeda Y, Yamamoto KT, Takahashi T. 2006. Characterization of the class IV homeodomain-Leucine Zipper gene family in Arabidopsis. Plant Physiology, 141:1363–1375.

Proost S, Van Bel M, Sterck L, Billiau K, Van Parys T, Van de Peer Y, Vandepoele K. 2009. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell, 21:3718–3731.

Raissig MT, Baroux C, Grossniklaus U. 2011. Regulation and flexibility of genomic imprinting during seed development. Plant Cell, 23:16–26.

Rodrigues JC, Tucker MR, Johnson SD, Hrmova M, Koltunow AM. 2008. Sexual and apomictic seed formation in Hieracium requires the plant polycomb-group gene FERTILIZATION INDEPENDENT ENDOSPERM. Plant Cell, 20:2372–2386.

Roszak P and Köhler C. 2011. Polycomb group proteins are required to couple seed coat initiation to fertilization. Proceedings of the National Academy of Sciences of the USA, 108:20826–20831.

Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Schölkopf B, Weigel D, Lohmann JU. 2005. A gene expression map of Arabidopsis thaliana development. Nature Genetics, 37:501–506.

Soltis DE, Albert VA, Leebens-Mack J, Bell CD, Paterson AH, Zheng C, Sankoff D, dePamphilis CW, Wall PK, Soltis PS. 2009. Polyploidy and angiosperm diversification. American Journal of Botany, 96:336–348.

Spillane C, Schmid KJ, Laoueille-Duprat S, Pien S, Escobar-Restrepo J-M, Baroux C, Gagliardini V, Page DR, Wolfe KH, Grossniklaus U. 2007. Positive darwinian selection at the imprinted MEDEA locus in plants. Nature, 448:349–352.

Stamatakis A. 2006. RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models. Bioinformatics, 22:2688–2690.

Tiwari S, Schulz R, Ikeda Y, Dytham L, Bravo J, Mathers L, Spielman M, Guzmán P, Oakey RJ, Kinoshita T, Scott RJ. 2008. MATERNALLY EXPRESSED PAB C-TERMINAL, a novel imprinted gene in Arabidopsis, encodes the conserved C-terminal domain of polyadenylate binding proteins. Plant Cell, 20:2387–2398.

Waters AJ, Makarevitch I, Eichten SR, Swanson-Wagner RA, Yeh CT, Xu W, Schnable PS, Vaughn MW, Gehring M, Springer NM. 2011. Parent-of-origin effects on gene expression and DNA methylation in the maize endosperm. Plant Cell, 23:4221–4233.

Wolff P, Weinhofer I, Seguin J, Roszak P, Beisel C, Donoghue MT, Spillane C, Nordborg M, Rehmsmeier M, Köhler C. 2011. High-resolution analysis of parent-of-origin allelic expression in the Arabidopsis Endosperm. PLoS Genetics, 7(6):e1002126. doi:10.1371/journal.pgen.1002126.

Van Bel M, Proost S, Wischnitzki E, Movahedi S, Scheerlinck C, Van de Peer Y, Vandepoele K. 2012. Dissecting plant genomes with the PLAZA comparative genomics platform. Plant Physiology, 158:590–600.

Van de Peer Y, Maere S, Meyer A. 2009. The evolutionary significance of ancient genome duplications. Nature Reviews Genetics, 10:725–732.

Van de Peer Y, Taylor JS, Braasch I, Meyer A. 2001. The ghost of selection past: rates of evolution and functional divergence of anciently duplicated genes. Journal of Molecular Evolution, 53:436–446.

Yang L and Gaut BS. 2011. Factors that contribute to variation in evolutionary rate among Arabidopsis genes. Molecular Biology and Evolution, 28:2359–2369.

Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24:1586–1591.

Zhang M, Zhao H, Xie S, Chen J, Xu Y, Wang K, Zhao H, Guan H, Hu X, Jiao Y, Song W, Lai J. 2011. Extensive, clustered parental imprinting of protein-coding and noncoding RNAs in developing maize endosperm. Proceedings of the National Academy of Sciences of the USA, 108:20042–20047.

Zhou R, Moshgabadi N, Adams KL. 2011. Extensive changes to alternative splicing patterns following allopolyploidy in natural and resynthesized polyploids. Proceedings of the National Academy of Sciences of the USA, 108:16122–16127.

Zwickl DJ. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Austin (TX): The University of Texas at Austin.
