"Science, Faculty of"@en . "Botany, Department of"@en . "DSpace"@en . "UBCV"@en . "Tay, Yii Van"@en . "2013-10-31T07:00:00Z"@en . "2013"@en . "Master of Science - MSc"@en . "University of British Columbia"@en . "Gene duplication has supplied the raw material for novel gene functions and evolutionary innovations in plants. Duplicated genes can have different fates over time such as neofunctionalization and subfunctionalization. Sublocalization, which is a type of subfunctionalization based on protein subcellular relocalization, happens when the products of the duplicate genes are each directed to only one of two subcellular locations that were previously targeted by the single ancestral gene. The goals of the first part of my project were to study changes in protein subcellular localization (relocalization) after gene duplication by finding cases of sublocalization and further characterizing them from an evolutionary perspective. I found that sublocalization is a relatively uncommon phenomenon in plants as only two out of the seven gene families that I analyzed demonstrated cases of sublocalization. I identified and analyzed multiple cases of sublocalization of the APX and PP5 genes by doing RT-PCR experiments and then performing phylogenetic analyses and sequence rate analyses to further characterize the genes from an evolutionary perspective. Regulatory neofunctionalization involves changes in expression patterns of a gene after duplication. The goals for the second part of my thesis were to study expression patterns of duplicated genes in Arabidopsis thaliana and to analyze the selective forces acting on the genes of interest. I focused on eight pairs of duplicates that showed one copy broadly expressed and the other copy having expression only in certain organ types. By analyzing the expression patterns of the orthologs in outgroup species and selective forces acting on the sequences, I obtained evidence for potential neofunctionalization for a few cases. 
The results from my thesis provide new insights into the frequency and process of sublocalization of duplicated genes, and characterize new examples of neofunctionalization of duplicated genes."@en . "https://circle.library.ubc.ca/rest/handle/2429/44264?expand=metadata"@en . "FATES OF GENES AFTER DUPLICATION: SUBLOCALIZATION AND REGULATORY NEOFUNCTIONALIZATION by Yii Van Tay A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Botany) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) April 2013 © Yii Van Tay, 2013 Abstract Gene duplication has supplied the raw material for novel gene functions and evolutionary innovations in plants. Duplicated genes can have different fates over time, such as neofunctionalization and subfunctionalization. Sublocalization is a type of subfunctionalization based on protein subcellular relocalization. It occurs when the products of duplicate genes are each directed to only one of two subcellular locations that were previously targeted by the single ancestral gene. The goals of the first part of my project were to study changes in protein subcellular localization (relocalization) after gene duplication by finding cases of sublocalization and characterizing them from an evolutionary perspective. I found that sublocalization is relatively uncommon in plants: only two of the seven gene families that I analyzed showed cases of sublocalization. I identified multiple cases of sublocalization of the APX and PP5 genes using RT-PCR experiments. I then characterized these genes from an evolutionary perspective using phylogenetic and sequence rate analyses. Regulatory neofunctionalization involves changes in the expression pattern of a gene after duplication. 
The goals for the second part of my thesis were to study the expression patterns of duplicated genes in Arabidopsis thaliana and to analyze the selective forces acting on the genes of interest. I focused on eight pairs of duplicates in which one copy is broadly expressed and the other is expressed only in certain organ types. I analyzed the expression patterns of orthologs in outgroup species and the selective forces acting on the sequences. This provided evidence of potential neofunctionalization in a few cases. The results from my thesis provide new insights into the frequency and process of sublocalization of duplicated genes, and characterize new examples of neofunctionalization of duplicated genes. Preface The project included a collaboration with Alexander Hammel in the Adams lab at UBC. In the Regulatory Neofunctionalization Results and Discussion section “Identification of Arabidopsis thaliana alpha whole genome duplicates with regulatory neofunctionalization”, Alexander Hammel wrote Python scripts to screen the microarray data. In particular, his analysis identified alpha whole genome duplicates with a negative correlation coefficient for gene expression patterns across organ types. It used raw ATH1 microarray data covering 63 different organ types and developmental stages (Schmid et al. 2005). Using the same microarray data, I manually picked duplicate pairs in which one copy has a restricted expression pattern and the other has broad expression. I did the RT-PCR expression experiments and the sequence rate analyses that are presented. 
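The negative-correlation screen described in this Preface can be illustrated with a minimal Python sketch. The sketch makes several assumptions:

- Each gene’s expression is stored as a list of values over the same ordered set of organ types (63 in the ATH1 data).
- The coefficient is a Pearson correlation with a cutoff of zero.
- The gene identifiers, toy values and function names are hypothetical.

It does not reproduce Alexander Hammel’s scripts.

```python
import math

def pearson(x, y):
    # Pearson correlation coefficient between two equal-length expression vectors
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError('vectors must have the same length (at least 2)')
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float('nan')  # undefined when one profile is constant
    return sxy / math.sqrt(sxx * syy)

def negatively_correlated_pairs(expression, pairs, threshold=0.0):
    # expression: dict mapping a gene id to its list of values, one per organ type
    # pairs: list of (gene_a, gene_b) duplicate pairs
    # Returns (gene_a, gene_b, r) for every pair whose correlation is below threshold
    hits = []
    for a, b in pairs:
        r = pearson(expression[a], expression[b])
        if r < threshold:  # NaN compares False, so constant profiles are skipped
            hits.append((a, b, r))
    return hits

# Hypothetical toy data over five organ types (not real ATH1 values)
expression = {
    'geneA1': [9.0, 8.5, 9.2, 8.8, 9.1],  # broadly expressed copy
    'geneA2': [2.0, 2.1, 2.0, 7.5, 2.2],  # copy expressed mainly in one organ
    'geneB1': [5.0, 6.0, 7.0, 8.0, 9.0],
    'geneB2': [5.5, 6.1, 7.2, 8.1, 9.3],
}
pairs = [('geneA1', 'geneA2'), ('geneB1', 'geneB2')]
for a, b, r in negatively_correlated_pairs(expression, pairs):
    print(a, b, round(r, 3))
```

Pairs flagged this way are only candidates. As the Preface describes, they still need manual inspection for a restricted versus broad expression pattern.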
Table of Contents
Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter One: Introduction
Chapter Two: Materials and Methods
Chapter Three: Sublocalization Results and Discussion
Chapter Four: Regulatory Neofunctionalization Results and Discussion
Chapter Five: Conclusions and Future Directions
Tables and Figures
References
List of Tables
Table 1. The name and locus number for the duplicate genes of interest and their related publications
Table 2. The results for the likelihood ratio test
List of Figures
Figure 1. Regulatory subfunctionalization and regulatory neofunctionalization
Figure 2. Targeting of a protein to different subcellular compartments caused by alternative splicing (AS) and sublocalization after gene duplication
Figure 3. Outline of the 62-kD PP5 and 55-kD PP5 alternative splice forms
Figure 4. Outline of the thylakoid APX (tAPX) and the stromal APX (sAPX) alternative splice forms
Figure 5. The amino acid alignments for PP5
Figure 6. Expression and splicing of the Cucumis sativus (cucumber) PP5
Figure 7. Hydropathy plots (Kyte-Doolittle) of Cucumis sativus (cucumber) PP5, Arabidopsis thaliana PP5, Solanum lycopersicum (tomato) PP5
Figure 8. Maximum likelihood tree for PP5 genes
Figure 9. A PP5 phylogenetic tree with Ka/Ks ratios
Figure 10. The APX amino acid sequence alignment of sequences from the Brassicaceae family (Arabidopsis thaliana, Arabidopsis lyrata, Capsella rubella, Brassica oleracea and Brassica rapa) and its close relatives, Cleome spinosa and Carica papaya
Figure 11. Maximum likelihood tree for APX genes
Figure 12. An APX phylogenetic tree with Ka/Ks ratios
Figure 13. Expression and splicing of Carica papaya APX
Figure 14. An APX amino acid sequence alignment of four species from the Malpighiales order (Manihot esculenta, Ricinus communis, Populus trichocarpa and Linum usitatissimum)
Figure 15. Expression and splicing of Populus trichocarpa (cottonwood) APX
Figure 16. Expression and splicing of the Manihot esculenta (cassava) APX
Figure 17. Expression and splicing of Ricinus communis APX
Figure 18. An APX amino acid sequence alignment of sequences from the Poaceae family (Zea mays, Sorghum bicolor, Setaria italica, Brachypodium distachyon and Oryza sativa japonica) and a related non-Poaceae species, Dioscorea alata (Dioscoreaceae family)
Figure 19. Gene expression from microarray assays and RT-PCR expression assays
Figure 20. Phylogenetic trees with Ka/Ks ratios for eight different duplicated gene pairs from Arabidopsis and orthologs from outgroup species
Acknowledgements I would like to specifically thank Dr. Keith Adams, who supervised and guided me throughout my projects and supported me through my graduate study. I would also like to thank my research committee, Dr. Sean Graham and Dr. Patrick Keeling, for their insights and suggestions for my projects. I also want to thank everyone in the Adams lab for helping me with my projects. I would like to thank Dr. Shao-Lun Liu (now at Tunghai University in Taiwan) and Yichun Qiu for guiding me with the wet lab experiments (RNA extraction, RT-PCR, DNA sequencing). 
I would also like to thank Alexander Hammel, who helped me by writing Python scripts to screen the microarray data. Chapter One: Introduction Gene duplication has supplied the raw material for new gene functions and evolutionary innovations during eukaryotic evolution (reviewed by Demuth and Hahn 2009; Hastings et al. 2009). There are several types of gene duplication, including tandem duplication, duplicative retroposition, segmental duplication, chromosome duplication, and whole genome duplication (polyploidy). In angiosperms, all lineages have undergone at least one round of ancient polyploidy, and many lineages experienced multiple ancient polyploidy events during angiosperm evolution (e.g., Blanc and Wolfe 2004; Sterck et al. 2005; Cui et al. 2006; Barker et al. 2009; Schmutz et al. 2010; Tang et al. 2010; Jiao et al. 2011). Because polyploidy is an ongoing process in plants, many plant species have also undergone an evolutionarily recent polyploidy event and are cytologically polyploid (Wood et al. 2009). Tandem duplications are also common in plant genomes: tandem duplication has been estimated to account for 14–16% of the genes in angiosperm genomes (Rizzon et al. 2006). Duplicated genes can have different fates over time (reviewed in Zhang 2003). In some cases, both copies retain their original function. In others, one copy loses its function or expression and becomes a pseudogene. Among the retained duplicates, one copy occasionally gains a new function or expression pattern, which is usually called neofunctionalization. Neofunctionalization can result from mutations in coding or regulatory regions, and the new function may be related to the original one or entirely different. In other cases, the original functions or expression patterns are divided between the duplicates, which is usually called subfunctionalization. 
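The fates described above can be made concrete with a simplified Python sketch. It compares the set of organ types in which each duplicate is expressed with an inferred ancestral set, for example one taken from an outgroup ortholog. This is a coarse illustration under assumed rules, not a method used in this thesis. Treating expression as simply present or absent, the category rules and the function name are all hypothetical simplifications. Real analyses work with quantitative expression data and thresholds.

```python
def classify_fate(ancestral, copy1, copy2):
    # ancestral, copy1, copy2: sets of organ types with detectable expression
    # (hypothetical presence/absence model; rules are illustrative only)
    anc, a, b = set(ancestral), set(copy1), set(copy2)
    if not a or not b:
        return 'pseudogenization'      # one copy has lost expression entirely
    if (a - anc) or (b - anc):
        return 'neofunctionalization'  # a copy is expressed in an organ the ancestor lacked
    if a == anc and b == anc:
        return 'conservation'          # both copies keep the ancestral pattern
    if a | b == anc and a != anc and b != anc:
        return 'subfunctionalization'  # ancestral pattern divided between the copies
    return 'other'                     # e.g. partial loss in one copy only

organs = {'root', 'leaf', 'flower', 'seed'}
print(classify_fate(organs, organs, organs))                        # conservation
print(classify_fate(organs, {'root', 'leaf'}, {'flower', 'seed'}))  # subfunctionalization
print(classify_fate(organs, organs, {'pollen'}))                    # neofunctionalization
print(classify_fate(organs, organs, set()))                         # pseudogenization
```

The check order matters: loss of expression is tested first, then any gain of new organs, so a copy that both loses and gains organs is labelled neofunctionalization.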
There have been numerous studies of neofunctionalization and subfunctionalization in angiosperms (reviewed in Sémon and Wolfe 2007; Hahn 2009; Innan and Kondrashov 2010). One type of neofunctionalization is regulatory neofunctionalization. In this case, one duplicate maintains the broad ancestral expression pattern across organs, whereas the other duplicate is expressed only in certain organ types (Figure 1). In regulatory subfunctionalization, by contrast, one duplicate maintains part of the ancestral expression pattern and the other duplicate maintains the remaining part. Liu et al. (2011) found evidence for more cases of regulatory neofunctionalization than of regulatory subfunctionalization in Arabidopsis thaliana. This is consistent with the findings of Freeling (2008), which de-emphasized the importance of subfunctionalization as a mechanism for retaining duplicate genes. The finding of Liu et al. (2011) is also consistent with previous studies in other eukaryotic taxa, such as yeast (Tirosh and Barkai 2007), Drosophila (Oakley et al. 2006), and mammals (Farré and Albà 2010). In yeast, 45% of duplicated genes were found to have undergone regulatory neofunctionalization (Tirosh and Barkai 2007). In Drosophila, Oakley et al. (2006) reported that regulatory neofunctionalization is more common than regulatory subfunctionalization. The same holds in mammals. Farré and Albà (2010) studied the expression evolution of duplicate genes and found that 42–52% of the genes were neofunctionalized, while 23–25% showed regulatory subfunctionalization. Gene duplication can also contribute to changes in protein subcellular localization, known as protein subcellular relocalization (PSR). 
Protein subcellular localization can be determined by a specific N-terminal peptide, C-terminal motifs, or internal localization signals. PSR can result from an initially imperfect duplication or even from a single point mutation (Byun-McKay and Geeta 2007). Depending on the absence or presence of these sequences, the proteins may remain in the cytosol or be targeted to a membrane-bound organelle such as the endoplasmic reticulum, the mitochondria, or the chloroplasts (Byun-McKay and Geeta 2007; Byun McKay et al. 2009). A large-scale study of the localization of the products of duplicated genes in Arabidopsis thaliana showed that changes in subcellular localization are relatively common after gene duplication, affecting about 18% of gene pairs (Liu et al., in preparation). Protein subcellular relocalization can be divided into neolocalization and sublocalization. Neolocalization occurs when, after gene duplication, one of the duplicate genes acquires changes in its transit peptide region that direct its products to a novel location. Several examples of neolocalization, sometimes accompanied by neofunctionalization, were characterized by Liu et al. (2011). Sublocalization is the phenomenon of interest in this research. It occurs after gene duplication when the products of the duplicate genes are each directed to only one of two subcellular locations that were previously targeted by the single ancestral gene (Figure 2). Few examples of sublocalization are known, and a study of duplicated genes in yeast suggested that the phenomenon is relatively rare, at least in that species (Marques et al. 2008). One example of sublocalization has been reported in plants, for the AGPase genes in Zea mays. There, the products of the two duplicate genes are targeted to the cytosol and the plastids respectively. In contrast, the single gene in other grasses produces products that are targeted to both locations (Rösti and Denyer 2007). 
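The distinction between neolocalization and sublocalization amounts to a comparison of sets of compartments. The minimal Python sketch below assumes two things. First, the ancestral localization is known, for example from an alternatively spliced single-copy gene in related species. Second, each duplicate’s localization can be written as a set of compartment names. The function name, labels and compartment strings are hypothetical.

```python
def classify_relocalization(ancestral, dup1, dup2):
    # ancestral, dup1, dup2: sets of compartments targeted by the protein products
    anc, a, b = set(ancestral), set(dup1), set(dup2)
    if (a - anc) or (b - anc):
        return 'neolocalization'    # a duplicate is directed to a novel location
    if a == anc and b == anc:
        return 'no relocalization'  # both duplicates keep the ancestral targeting
    if a and b and a.isdisjoint(b) and a | b == anc:
        return 'sublocalization'    # ancestral locations partitioned between duplicates
    return 'other'

# Mirrors the maize AGPase case: ancestrally cytosol + plastid, one location per duplicate
print(classify_relocalization({'cytosol', 'plastid'}, {'cytosol'}, {'plastid'}))  # sublocalization
print(classify_relocalization({'cytosol'}, {'cytosol'}, {'mitochondrion'}))       # neolocalization
```

Requiring the two duplicate sets to be disjoint and to cover the ancestral set follows the definition above. Each product goes to only one of the ancestral locations, and together they cover both.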
Before a gene undergoes duplication and sublocalization, it may use alternative splicing (AS) in its transit peptide region to direct its products to different subcellular compartments or organelles (Figure 2). Alternative splicing is a process in which the exons of a pre-mRNA are joined in different ways during splicing, through the use of different 5’ and/or 3’ splice sites. There are several types of AS:

- intron retention, in which a complete intron remains in the mature mRNA;
- exon skipping, in which an exon is excluded from the mature mRNA;
- alternative 5’ splice site, which depends on the use of a proximal or distal 5’ splice site;
- alternative 3’ splice site, which depends on the use of a proximal or distal 3’ splice site (reviewed in Reddy 2007; Barbazuk et al. 2008).

Thesis part 1: My study was divided into two parts. The goal of the first part of my thesis was to study changes in protein subcellular localization after gene duplication by finding and analyzing cases of sublocalization. I focused on genes in which alternative splicing results in localization of the gene products to different subcellular locations. I searched for cases where two paralogous genes each have one of the two alternative splice forms found in a single gene in other angiosperms. In such cases, the products of each duplicate are targeted to different subcellular compartments or organelles (sublocalization). The results of this part of the project were intended to reveal whether the phenomenon has occurred multiple times in different lineages for the same gene. That would suggest that it is a relatively common outcome after duplication of alternatively spliced genes whose products function in different subcellular compartments. 
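The AS types listed earlier in this section can be illustrated with a toy Python model. A pre-mRNA is represented as a string, and exons as (start, end) coordinates. The sequence, coordinates and function names are hypothetical. The sketch only shows how each event changes which segments are joined in the mature mRNA; it models no real gene.

```python
def splice(pre_mrna, exons):
    # Join exon segments (0-based, end-exclusive coordinates) in order
    return ''.join(pre_mrna[start:end] for start, end in exons)

def skip_exon(exons, index):
    # Exon skipping: the exon at position index is excluded
    return exons[:index] + exons[index + 1:]

def retain_intron(exons, index):
    # Intron retention: the intron between exon index and exon index + 1 is kept
    merged = (exons[index][0], exons[index + 1][1])
    return exons[:index] + [merged] + exons[index + 2:]

def use_alt_5prime_site(exons, index, new_end):
    # Alternative 5' splice site: exon index ends at a different donor site
    start, _ = exons[index]
    return exons[:index] + [(start, new_end)] + exons[index + 1:]

def use_alt_3prime_site(exons, index, new_start):
    # Alternative 3' splice site: exon index starts at a different acceptor site
    _, end = exons[index]
    return exons[:index] + [(new_start, end)] + exons[index + 1:]

# Hypothetical pre-mRNA: uppercase = exons, lowercase = introns
pre = 'AAAgtaagCCCgtaagGGG'
exons = [(0, 3), (8, 11), (16, 19)]

print(splice(pre, exons))                             # AAACCCGGG
print(splice(pre, skip_exon(exons, 1)))               # AAAGGG
print(splice(pre, retain_intron(exons, 0)))           # AAAgtaagCCCGGG
print(splice(pre, use_alt_5prime_site(exons, 0, 5)))  # AAAgtCCCGGG
print(splice(pre, use_alt_3prime_site(exons, 1, 6)))  # AAAagCCCGGG
```

Each event is modelled as an edit to the exon coordinate list rather than to the sequence itself, so several events can be combined before the final `splice` call.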
I also intended to analyze cases of sublocalization further by examining the duplication history of other species within the same plant family and order. This would show whether the duplication and sublocalization were relatively recent evolutionary events. Once the products of the duplicate genes are differentially targeted, the two genes would be free of the evolutionary constraint imposed when two proteins are encoded by a single alternatively spliced gene. They would thus potentially be free to accumulate beneficial mutations that might be deleterious in an alternatively spliced ancestral gene, or to undergo relaxed purifying selection. Consequently, there could be accelerated amino acid sequence evolution in one or both duplicates. To investigate this possibility, I analyzed the duplicate genes with sublocalization for accelerated rates of sequence evolution and for positive selection. One of the genes of interest for the sublocalization study was Protein Phosphatase 5 (PP5). In plants, PP5 dephosphorylates tyrosine and/or serine and threonine residues. There are two isoforms: the 55-kD isoform, which is localized in the cytosol, and the 62-kD isoform, which is localized in the endoplasmic reticulum. In Arabidopsis thaliana and Solanum lycopersicum (tomato), both the 55-kD and 62-kD isoforms are produced by alternative splicing (de la Fuente van Bentem et al. 2003). The mechanism of alternative splicing in PP5 has been studied in tomato, and the two alternatively spliced forms are shown in Figure 3. The mRNA that produces the 55-kD isoform has 12 exons separated by 11 introns (de la Fuente van Bentem et al. 2003). However, the mRNA that produces the 62-kD isoform contains an additional exon of the Solanum lycopersicum PP5 (LePP5) gene, located between the fourth and the fifth exons and named ‘exon 4A’. 
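The LePP5 exon structure just described behaves like a cassette exon. A minimal sketch can represent each mature mRNA as an ordered list of exon labels. It assumes, following the description above, 12 shared exons plus exon 4A inserted between exons 4 and 5 in the 62-kD form. The function name and labels are hypothetical.

```python
def pp5_isoform(include_exon_4a):
    # Exons 1-12 shared by both LePP5 transcripts (labels only, no sequence)
    exons = [str(i) for i in range(1, 13)]
    if include_exon_4a:
        # exon 4A lies between exons 4 and 5, i.e. before list index 4
        exons.insert(4, '4A')
    return exons

short_form = pp5_isoform(False)  # mRNA encoding the 55-kD cytosolic isoform
long_form = pp5_isoform(True)    # mRNA encoding the 62-kD ER-localized isoform
print(len(short_form), len(long_form))  # 12 13
print(long_form[3:6])                   # ['4', '4A', '5']
```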
The hydrophobic character of exon 4A suggests that this exon forms a membrane-spanning region (de la Fuente van Bentem et al. 2003). It also suggests that the 62-kD isoform, which contains exon 4A, is an integral membrane protein. This is consistent with the finding that the 62-kD isoform is localized to the endoplasmic reticulum, whereas the 55-kD isoform, which does not contain exon 4A, is localized to the cytosol (de la Fuente van Bentem et al. 2003). Another gene of interest for the sublocalization study was ascorbate peroxidase (APX). In plants, APX scavenges H2O2 using ascorbate as the electron donor (Mano et al. 1997). In Arabidopsis thaliana, there are three isoforms of APX: cytosolic, microsomal, and chloroplastic APX. The isoform of interest here is chloroplastic APX, which is divided into chloroplastic stromal APX (sAPX) and chloroplastic thylakoid APX (tAPX). These isoforms are targeted to different suborganellar locations. In Arabidopsis thaliana they are encoded by different genes rather than produced by alternative splicing (Ishikawa and Shigeoka 2008). However, in some species, such as Cucurbita (pumpkin) and Nicotiana tabacum (tobacco), sAPX and tAPX are produced by the same gene through alternative splicing (Mano et al. 1997). The mechanism of alternative splicing in chloroplastic APX has been studied in detail in spinach (Spinacia oleracea) (Ishikawa and Shigeoka 2008), and a diagram of the tAPX and sAPX structures is shown in Figure 4. The chloroplastic APX gene consists of 13 exons separated by 12 introns. Exon 12 contains a stop codon and a potential polyadenylation signal of the sAPX mRNA (Ishikawa et al. 1997). The final exon (exon 13) contains the sequence corresponding to the hydrophobic anchor region and the entire 3’ untranslated region of the tAPX mRNA. 
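Two hydrophobic regions are mentioned above: PP5 exon 4A and the tAPX anchor encoded by exon 13. Kyte-Doolittle hydropathy plots, like those listed for Figure 7, are commonly used to detect such segments. The sketch below computes a sliding-window average over the Kyte-Doolittle hydropathy scale. Its defaults, a 19-residue window and a 1.6 cutoff, are values commonly used to flag candidate membrane-spanning segments. They are assumptions here, not settings reported in this thesis. The test peptide is invented, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}

def hydropathy_profile(seq, window=19):
    # Mean hydropathy of each full window; entry i covers residues i .. i + window - 1
    # Non-standard residue letters raise KeyError
    scores = [KD[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window one residue
        profile.append(total / window)
    return profile

def candidate_tm_windows(seq, window=19, cutoff=1.6):
    # 0-based start positions of windows whose mean hydropathy exceeds the cutoff
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > cutoff]

# Invented peptide: charged/polar flanks around a 19-residue hydrophobic core
peptide = 'KDESRKNEQD' + 'LLIVALLIVGLLAVLIVAL' + 'RKDENQSKED'
hits = candidate_tm_windows(peptide)
print(len(hydropathy_profile(peptide)), 10 in hits)  # 21 True
```

A plot of the profile against window position gives the familiar hydropathy curve. Runs of windows above the cutoff mark the kind of hydrophobic stretch attributed to exon 4A and to the tAPX anchor.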
In sAPX mRNA, exon 12, which contains the stop codon, is retained, so exon 13 is not part of the encoded protein. In contrast, exon 12 is spliced out of tAPX mRNA, so the tAPX product includes exon 13, as shown in Figure 4. The hydrophobic anchor region encoded by exon 13 localizes tAPX to the thylakoid in chloroplasts; lacking this region, sAPX is localized only in the chloroplast stroma.

Thesis part 2:
The goal of the second part of my study was to examine duplicated genes in Arabidopsis thaliana that are candidates for regulatory neofunctionalization in restricted organ types. Specifically, I aimed to infer the ancestral expression state of the duplicate genes of interest in order to test the hypothesis of regulatory neofunctionalization through comparison with two outgroup species (Carica papaya and Vitis vinifera). Genes were chosen by identifying duplicates whose expression levels across organ types, in microarray data, had negative correlation coefficients. I also tested whether the duplicated genes with restricted organ-type expression showed asymmetric sequence rate evolution and positive selection.

Chapter Two: Materials and Methods

Gene choice and duplicate gene searches
For the sublocalization project, genes with differential protein subcellular localization caused by alternative splicing were chosen. The main genes that I studied were ascorbate peroxidase (APX) (Mano et al. 1997; Ishikawa and Shigeoka 2008), whose products are differentially targeted to the stroma and thylakoids in chloroplasts, and protein phosphatase 5 (PP5) (de la Fuente van Bentem et al. 2003), whose products are differentially targeted to the endoplasmic reticulum and the cytosol. I also included thymidylate kinase (TMPK) (Ronceret et al. 2008), whose products are differentially targeted to the mitochondria and the cytosol; holocarboxylase synthetase 1 (HCS1) (Puyaubert et al.
2008), whose products are differentially targeted to the plastids and the cytosol; transthyretin-like (TTL) (Lamberto et al. 2010), whose products are differentially targeted to the peroxisomes and the cytosol; and hydroxypyruvate reductase (HPR) (Mano et al. 1999), whose products are differentially targeted to the leaf peroxisomes and the cytosol.

For the regulatory neofunctionalization project, the duplicate genes derived from the most recent whole-genome duplication event in the Brassicaceae family, as identified by Blanc et al. (2003), were used. The raw ATH1 microarray data covering 63 different organ types and developmental stages (ADA, Arabidopsis Development Atlas; Schmid et al. 2005) were downloaded from the TAIR website (http://www.arabidopsis.org/). Alexander Hammel, a graduate student in the Adams Lab, wrote Python scripts to identify duplicates whose expression levels across organ types had a negative correlation coefficient. These candidate duplicate genes were then checked manually with the AtGenExpress Visualization Tool (AVT) (http://jsp.weigelworld.org/expviz/expviz.jsp) (Schmid et al. 2005) to identify duplicate genes with expression restricted to the reproductive organs. The names and functions of these genes were then retrieved from The Arabidopsis Information Resource (TAIR) (http://www.arabidopsis.org/), and genes with names and known functions were given priority in this research.

The gene sequences were obtained from the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/), and orthologs of these genes from various angiosperms were obtained from the Plaza database (http://bioinformatics.psb.ugent.be/plaza/) (Proost et al. 2009; Van Bel et al. 2012), from Phytozome of the Joint Genome Institute (http://www.phytozome.net/) (Goodstein et al. 2012), and from NCBI's GenBank using BLAST searches (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
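The duplicate-screening step above, which flags pairs whose expression profiles across organ types are negatively correlated, can be sketched as follows. This is not the Python script written for the thesis, whose code is not reproduced here; it is a minimal illustration assuming that expression is already summarized as one value per organ type per gene. The gene identifiers and values are hypothetical.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two expression profiles of equal length."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sx * sy)

def negatively_correlated_pairs(expr: dict[str, list[float]],
                                pairs: list[tuple[str, str]]) -> list[tuple[str, str, float]]:
    """Keep duplicate pairs whose profiles across organ types have r < 0."""
    hits = []
    for g1, g2 in pairs:
        r = pearson(expr[g1], expr[g2])
        if r < 0:
            hits.append((g1, g2, r))
    return hits

# Hypothetical values for five organ types (root, stem, leaf, flower, silique).
expr = {
    "dupA_broad":      [8.0, 8.5, 9.0, 8.0, 7.5],
    "dupA_restricted": [0.5, 0.4, 0.3, 6.0, 0.2],
    "dupB_1":          [1.0, 2.0, 3.0, 4.0, 5.0],
    "dupB_2":          [2.0, 4.0, 6.0, 8.0, 10.0],
}
pairs = [("dupA_broad", "dupA_restricted"), ("dupB_1", "dupB_2")]

for g1, g2, r in negatively_correlated_pairs(expr, pairs):
    print(g1, g2, round(r, 3))
```

With the ADA data, each profile would instead hold one value per sample type (63 here), and the flagged pairs would still need the manual AVT check described above.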
The orthologs were easily selected from Plaza and Phytozome because these websites group their genes into gene families. On the NCBI website, tblastn was used to search the nucleotide collection (nr/nt) and non-human, non-mouse EST (est_others) databases.

Sequence alignment and phylogenetic analysis
The paralogs and orthologs of each gene of interest were translated into amino acid sequences, aligned with MUSCLE using default settings (Edgar 2004), and then checked manually in the BioEdit Sequence Alignment Editor (Hall 1999). Sequences from species with duplicate genes were compared with sequences from species with a single gene that has alternatively spliced products. Duplicate gene pairs in which one gene showed a transit peptide region similar to one alternatively spliced product, and the other gene showed a transit peptide region similar to the other alternatively spliced product, were picked as candidates for sublocalization. The aligned amino acid sequences were then reverse translated to obtain codon alignments. To view the phylogenetic relationships among the genes and verify duplicates, phylogenetic trees were constructed from the codon-based DNA alignments by maximum likelihood (ML) using Garli-1.0 (Zwickl 2006) with default parameters (GTR + gamma). Bootstrap trees were constructed with RAxML (Stamatakis 2006) using 100 ML replicates to determine branch support. The phylogenetic trees were viewed and exported with TREEVIEW (Page 1996).

Protein Hydrophobicity Plots
To determine whether exon 4A of the Cucumis sativus PP5 gene has hydrophobic properties similar to those in the Solanum lycopersicum (tomato) and Arabidopsis thaliana PP5 genes, Kyte-Doolittle hydropathy plots (Kyte and Doolittle 1982) were generated with the online tool at http://www.vivo.colostate.edu/molkit/hydropathy (Bowen, Colorado State University).
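The reverse-translation step above, in which aligned amino acid sequences are converted into codon alignments, amounts to threading each original coding sequence onto its aligned protein: every residue is replaced by its codon and every gap by three gap characters. The sketch below illustrates this principle (the same idea used by tools such as PAL2NAL); the function name and sequences are hypothetical, and it assumes each CDS starts at its start codon and matches its protein exactly.

```python
def codon_align(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue is replaced by the next codon of the CDS and each gap ('-')
    by '---'. Leftover codons (e.g. a trailing stop codon not present in the
    protein sequence) are ignored.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if residues > len(codons):
        raise ValueError("CDS is shorter than the protein sequence")
    out = []
    k = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)

# Hypothetical duplicates whose proteins align with one gap.
print(codon_align("M-KL", "ATGAAACTGTAA"))  # ATG---AAACTG
print(codon_align("MRKL", "ATGCGTAAACTG"))  # ATGCGTAAACTG
```

Because gaps are placed from the protein alignment, the resulting DNA alignment never splits a codon, which is what codon-based tree building and Ka/Ks estimation require.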
Plant Materials and Nucleic Acid Extraction
Leaves of Solanum lycopersicum (tomato), Glycine max (soybean), Populus trichocarpa (poplar), and Ricinus communis (castor oil plant); hypocotyls of Cucumis sativus (cucumber); and roots, stems, leaves, flowers, and siliques/seeds of Arabidopsis thaliana, Carica papaya, and Vitis vinifera were collected. The collected plant material was frozen immediately in liquid nitrogen and stored at -80 °C until RNA extraction. RNA was extracted with a modified CTAB method (Zhou et al. 2011), and its concentration was measured with a spectrophotometer. RNA was prepared at a concentration of 0.05 μg/μl for DNase I treatment, which was performed according to the manufacturer's instructions (Invitrogen) to remove any DNA from the RNA extract. The DNase I-treated RNA was then run on a 1.0% agarose gel to assess its quality.

Reverse transcription polymerase chain reaction (RT-PCR)
The DNase I-treated RNA was reverse transcribed into cDNA according to the manufacturer's instructions, using 1 μl of M-MLV reverse transcriptase (Invitrogen) and 1 μl of random hexamers. The reverse transcription conditions were 25 °C for 10 min, 37 °C for 50 min, and 70 °C for 15 min. To confirm that the samples were free of genomic DNA contamination, a parallel reaction without reverse transcriptase was performed. The reverse-transcribed samples were then treated with 1 μl of RNase (Invitrogen) at 37 °C for 20 min to remove residual RNA.

The PCR primers were designed with OligoAnalyzer 3.1, an online primer design tool provided by Integrated DNA Technologies (http://www.idtdna.com/analyzer/Applications/OligoAnalyzer).
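Preparing RNA at 0.05 μg/μl for DNase I treatment is a C1V1 = C2V2 dilution. A minimal sketch, in which the stock concentration and final volume are hypothetical example values rather than numbers from the thesis:

```python
def dilution_volumes(stock_conc: float, target_conc: float,
                     final_volume: float) -> tuple[float, float]:
    """Return (stock volume, water volume) so that stock_conc * V1 = target_conc * final_volume.

    Concentrations must share one unit (e.g. ug/ul) and volumes another (e.g. ul).
    """
    if target_conc > stock_conc:
        raise ValueError("stock is more dilute than the target concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume

# Hypothetical: a 0.5 ug/ul RNA extract diluted to 0.05 ug/ul in 20 ul.
rna_ul, water_ul = dilution_volumes(0.5, 0.05, 20.0)
print(f"{rna_ul:.1f} ul RNA + {water_ul:.1f} ul water")  # 2.0 ul RNA + 18.0 ul water
```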
For the sublocalization project, when the two duplicate genes within a species differed at enough single-nucleotide polymorphisms (SNPs), a separate primer set was designed to amplify each duplicate. If the two duplicates had too few SNPs to distinguish them, a single primer set was designed to co-amplify both genes, and the product was sequenced to identify the duplicates. For the regulatory neofunctionalization project, a primer set was designed for the duplicate with the restricted organ-specific expression pattern; these primers were also placed at SNP sites so that only the duplicate of interest was amplified.

Each 10 μl PCR contained 1 μl of cDNA, 0.1 μl of Paq5000™ DNA polymerase (Stratagene), 1 μl of 1x Paq5000™ reaction buffer, 1 μl of 0.2 mM each dNTP, and 0.5 μl each of 0.4 μM forward and reverse primer. The cycling conditions were 94 °C for 3 min; 30 cycles of 94 °C for 30 s, the optimal annealing temperature for 30 s, and 72 °C for 30 s; and a final extension at 72 °C for 7 min. The annealing temperature, which varied by gene and species, was set 3 °C below the average melting temperature of the two primers. A negative control with water instead of cDNA template was included to confirm that the reagents were free of DNA contamination. The amplified fragments were resolved by electrophoresis on 1.5% agarose gels, which were stained with ethidium bromide (EtBr) and visualized under ultraviolet light. The amplicon size was compared with the expected size to confirm that the correct gene had been amplified, and genes verified by RT-PCR were further confirmed by DNA sequencing.
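The annealing-temperature rule above (3 °C below the average primer melting temperature) is simple arithmetic once the two Tm values are known. In the thesis the Tm values came from OligoAnalyzer 3.1, which uses its own, more detailed model; the Wallace rule below is a swapped-in rough approximation for short primers, included only so the example is self-contained. The primer sequences are hypothetical.

```python
def wallace_tm(primer: str) -> float:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    Only a rough guide for short oligos; not the model used by OligoAnalyzer.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2.0 * at + 4.0 * gc

def annealing_temperature(tm_forward: float, tm_reverse: float,
                          offset: float = 3.0) -> float:
    """Annealing temperature set `offset` degrees below the mean primer Tm (3 deg C in the Methods)."""
    return (tm_forward + tm_reverse) / 2.0 - offset

# Hypothetical gene-specific primer pair.
fwd = "ATGCATGCATGCATGCATGC"  # 10 A/T + 10 G/C -> 60 deg C
rev = "AATTGGCCAATTGGCCAATT"  # 12 A/T + 8 G/C  -> 56 deg C
ta = annealing_temperature(wallace_tm(fwd), wallace_tm(rev))
print(f"Ta = {ta:.1f} deg C")  # Ta = 55.0 deg C
```

In practice one would pass the OligoAnalyzer Tm values directly to `annealing_temperature` and skip `wallace_tm`.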
Sequencing of RT-PCR products
RT-PCR bands were excised from the gels and purified with the E.Z.N.A. Gel Extraction Kit (Omega). The purified DNA was then re-amplified by PCR for 25 cycles, and the amount of amplified DNA was measured with a spectrophotometer. Sequencing reactions contained 0.4 μl of ABI BigDye Version 3.1 (Applied Biosystems), 3.6 μl of BigDye buffer, 5.5 μl of template (50 ng), and 0.5 μl of 0.4 μM forward or reverse primer. The sequencing program was 1 min at 96 °C followed by 26 cycles of 10 s at 96 °C, 5 s at 50 °C, and 4 min at 60 °C. The reaction products were then purified on a Sephadex G-50 column and sent to the Nucleic Acid Protein Service Unit (NAPS Unit) at the University of British Columbia for capillary electrophoresis.

Sequence rate evolution analysis using PAML
Sequences of the genes of interest and of outgroup species were aligned with MUSCLE. Ka/Ks (omega) ratios for each branch of the phylogenetic tree were estimated under the free-ratios branch model using Codeml in Phylogenetic Analysis by Maximum Likelihood (PAML) (Yang 2007). These Ka/Ks ratios were used to identify genes under positive selection (Ka/Ks > 1). To determine whether one of the duplicate genes of interest evolved asymmetrically, two-ratio and three-ratio models were compared. The two-ratio model assumes that the branches of the two duplicate genes of interest share one Ka/Ks ratio while the orthologs in other species have a different ratio, which corresponds to the hypothesis that the two duplicates evolved at the same rate. The three-ratio model assumes that the branches of the two duplicates have different Ka/Ks ratios, meaning that the two genes evolved at different rates, while the ortholog branches have a third Ka/Ks ratio.
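In codeml, nested branch models such as the two-ratio and three-ratio models described above are usually specified by marking branches in the tree file with labels such as #1 and #2 (unmarked branches form the background ratio, #0) and setting model = 2 in the control file. This convention is summarized from the PAML documentation and should be checked against the manual for the version used. The sketch below only writes the two labelled trees; the tip names and topology are hypothetical, and it assumes a topology-only Newick string with no spaces or branch lengths.

```python
import re

def mark_tips(newick: str, marks: dict[str, int]) -> str:
    """Append PAML-style branch labels (' #k') to the terminal branches named in `marks`.

    Branches left unmarked are treated by codeml as the background (#0).
    Assumes a topology-only Newick string with no spaces or branch lengths.
    """
    for tip, mark in marks.items():
        # Match the tip only as a whole label: preceded by '(' or ',' and followed by ',' or ')'.
        pattern = rf"(?<=[(,]){re.escape(tip)}(?=[,)])"
        newick, n = re.subn(pattern, f"{tip} #{mark}", newick)
        if n != 1:
            raise ValueError(f"tip {tip!r} matched {n} times")
    return newick

# Hypothetical unrooted tree: two cucumber paralogs plus single-copy orthologs.
tree = "((CsPP5_1,CsPP5_2),SlPP5,(AtPP5,VvPP5));"

# Two-ratio model: both duplicates share ratio #1; all other branches are background.
print(mark_tips(tree, {"CsPP5_1": 1, "CsPP5_2": 1}))
# ((CsPP5_1 #1,CsPP5_2 #1),SlPP5,(AtPP5,VvPP5));

# Three-ratio model: each duplicate gets its own ratio.
print(mark_tips(tree, {"CsPP5_1": 1, "CsPP5_2": 2}))
# ((CsPP5_1 #1,CsPP5_2 #2),SlPP5,(AtPP5,VvPP5));
```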
A likelihood ratio test was performed in which twice the difference in log-likelihood values (2ΔL) was compared against a chi-square distribution with degrees of freedom equal to df2 − df1 (the difference in the number of branch-wise Ka/Ks ratios between the two models). The duplicated genes were considered to evolve asymmetrically if the three-ratio model fit significantly better than the two-ratio model.

Chapter Three: Sublocalization Results and Discussion

Results

Duplication and differential localization of Protein Phosphatase 5 (PP5) in Cucumis sativus
I searched for PP5 duplicates in several angiosperm species using the Plaza (Proost et al. 2009; Van Bel et al. 2012) and Phytozome (Goodstein et al. 2012) databases. Most species have only a single PP5 gene; only two species, Cucumis sativus and Glycine max, have two duplicate genes. These duplicates were aligned with the two PP5 isoforms of Solanum lycopersicum: the 62-kD PP5 (endoplasmic reticulum), which contains the exon 4A region, and the 55-kD PP5 (cytosolic), which lacks it (de la Fuente van Bentem et al. 2003). The alignment showed that only in Cucumis sativus (cucumber) and Glycine max (soybean) does one gene contain the exon 4A region while the other lacks it, corresponding to the Solanum lycopersicum PP5 isoforms (Figure 5).

RT-PCR was performed to test whether alternative splice forms were present in these genes. Two different primer sets were designed to amplify the two Cucumis sativus PP5 duplicates. For Glycine max PP5, because the two duplicates are highly similar, the same forward and reverse primers were used to amplify both genes. The RT-PCR results show only one band in each lane for the two Cucumis sativus PP5 duplicates (Figure 6), indicating that neither duplicate is alternatively spliced.
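The reasoning used here, that a single band of the expected size indicates one splice form while two bands matching both predicted sizes indicate alternative splicing, can be written down as a small matching rule. This is an illustrative sketch only: the expected amplicon sizes, observed band sizes, and 20 bp tolerance are hypothetical, and in the thesis the interpretation was confirmed by sequencing the excised bands.

```python
def matched_products(observed_bp: list[int], expected_bp: dict[str, int],
                     tolerance_bp: int = 20) -> set[str]:
    """Names of expected RT-PCR products that have at least one observed band within tolerance."""
    found = set()
    for band in observed_bp:
        for name, size in expected_bp.items():
            if abs(band - size) <= tolerance_bp:
                found.add(name)
    return found

def suggests_alternative_splicing(observed_bp: list[int], expected_bp: dict[str, int],
                                  tolerance_bp: int = 20) -> bool:
    """True when bands for both splice forms (with and without the alternative exon) are seen."""
    return len(matched_products(observed_bp, expected_bp, tolerance_bp)) >= 2

# Hypothetical predicted amplicons for primers flanking an alternative exon.
expected = {"with exon 4A": 450, "without exon 4A": 330}

print(suggests_alternative_splicing([452], expected))       # False
print(suggests_alternative_splicing([448, 333], expected))  # True
```

The tolerance should stay well below half the size difference between the two predicted products, so that one band cannot match both.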
The RT-PCR bands were excised from the gel and sequenced. The sequences showed that the primers amplified the intended genes and that Cucumis sativus indeed has two PP5 duplicates, one containing exon 4A and the other lacking it. PP5 in Cucumis sativus is therefore a candidate for sublocalization after gene duplication. Similarly, the two Glycine max PP5 bands amplified by the single primer set were excised separately and sequenced. The sequences showed that only one of the two Glycine max PP5 genes had been amplified, and that this gene has two alternative splice forms, one with the exon 4A region and one without. Glycine max PP5 is therefore not a candidate for sublocalization after gene duplication, because one of the duplicates still maintains both alternative splice forms.

Next, I evaluated the exon 4A sequence of Cucumis sativus PP5 to determine whether it has properties similar to the corresponding sequences from Arabidopsis thaliana and Solanum lycopersicum. In the amino acid alignment, exon 4A of the Solanum lycopersicum 62-kD PP5 isoform is less similar to that of the Arabidopsis thaliana 62-kD isoform than the other exons are (de la Fuente van Bentem et al. 2003), and the same is true of the Cucumis sativus 62-kD PP5 product. Kyte-Doolittle hydropathy plots show hydrophobic peaks in the exon 4A region of the Solanum lycopersicum and Arabidopsis thaliana 62-kD PP5 isoforms, suggesting that this region forms a membrane-spanning segment and that these isoforms could be integral membrane proteins (de la Fuente van Bentem et al. 2003). To evaluate whether the exon 4A region of the Cucumis sativus 62-kD PP5 product has the same hydrophobic characteristics, Kyte-Doolittle hydropathy plots were constructed for it (Figure 7).
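The Kyte-Doolittle plots referred to above average a per-residue hydropathy value over a sliding window; long stretches with high averages are candidate membrane-spanning segments. The sketch below uses the published Kyte-Doolittle scale. The 19-residue window and 1.6 threshold are settings commonly used for transmembrane-segment detection, not settings taken from the thesis or the Colorado State tool, and the test sequences are artificial.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; gaps and unknown residues are skipped."""
    values = [KD[aa] for aa in seq.upper() if aa in KD]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window one residue
        profile.append(total / window)
    return profile

def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based start positions (in ungapped residues) of windows above the threshold."""
    return [i for i, v in enumerate(hydropathy_profile(seq, window)) if v > threshold]

# Artificial test: a 25-residue leucine stretch flanked by charged residues.
toy = "DDDDD" + "L" * 25 + "KKKKK"
print(round(max(hydropathy_profile(toy)), 2))  # 3.8
print(hydrophobic_windows("K" * 30))           # []
```

Applied to an exon 4A region, a run of window positions above the threshold would correspond to the hydrophobic peaks seen in Figure 7.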
The results show that the Cucumis sativus 62-kD PP5 product has hydrophobic peaks in the exon 4A region similar to those of the Solanum lycopersicum and Arabidopsis thaliana 62-kD PP5 isoforms. This suggests that the Cucumis sativus 62-kD PP5 product, like PP5 in Solanum lycopersicum and Arabidopsis thaliana, could function as an integral membrane protein.

To understand the phylogenetic relationships of the PP5 duplicates of interest and their outgroups, and to evaluate the likely structure of the ancestral PP5 gene, a consensus maximum likelihood phylogenetic tree of the PP5 gene family was constructed with RAxML using 100 bootstrap replicates (Figure 8). Species other than Glycine max that are labelled ‘AS’ in Figure 8 were identified as having a single gene with alternative splicing by checking the available ESTs for these species. From this analysis, the ancestral gene is inferred to have been a single gene with alternative splicing. At present, Cucumis sativus is the only species known to have two PP5 duplicates that each produce only one of the two splice forms.

Next, I evaluated sequence rate evolution in the Cucumis sativus PP5 genes to identify whether there is asymmetric rate evolution and/or relaxation of purifying selection in one copy (Figure 9). The figure shows a PP5 phylogenetic tree with the branch-wise Ka/Ks ratios above the branches. The Ka/Ks ratios of the two Cucumis sativus PP5 paralogs differed threefold.
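Whether a Ka/Ks difference such as this is significant is decided by the likelihood ratio test described in the Methods: 2ΔL is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of branch-wise ratios (here 3 - 2 = 1). The sketch below computes the p-value with closed-form chi-square tail formulas for integer degrees of freedom, so it needs only the standard library. The log-likelihood values are hypothetical, not those behind Figure 9; in practice they would be read from the codeml output.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def likelihood_ratio_test(lnl_null: float, lnl_alt: float, k_null: int, k_alt: int,
                          alpha: float = 0.05):
    """Compare nested models; returns (2*delta lnL, df, p-value, significant at alpha)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = k_alt - k_null
    p = chi2_sf(stat, df)
    return stat, df, p, p < alpha

# Hypothetical lnL values: two-ratio (null, 2 ratios) vs three-ratio (alternative, 3 ratios).
stat, df, p, significant = likelihood_ratio_test(-5000.0, -4997.0, k_null=2, k_alt=3)
print(f"2dL = {stat:.2f}, df = {df}, p = {p:.4f}, asymmetric = {significant}")
```

The closed forms avoid a SciPy dependency; with SciPy available, `scipy.stats.chi2.sf(stat, df)` gives the same tail probability.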
At a significance level of 0.05, the likelihood ratio test indicated that Cucumis sativus PP5 gene #1 experienced slightly accelerated sequence evolution relative to gene #2, suggesting slightly relaxed purifying selection.

Duplication and differential localization of Ascorbate Peroxidase (APX)
The second gene that I studied was the chloroplast-localized ascorbate peroxidase (APX). The Plaza and Phytozome databases show that several species have two or more duplicate genes in the APX gene family. The sequences of these genes were obtained from those two databases, sequences from other species were obtained from the NCBI NR database, and EST sequences from other species were obtained from the NCBI EST database. These sequences were aligned and compared with the two Cucurbita isoforms, tAPX and sAPX (Mano et al. 1997), and with the two Arabidopsis thaliana duplicates, tAPX and sAPX (Kangasjärvi et al. 2008). In the Brassicaceae and Poaceae families, Populus trichocarpa (cottonwood), Manihot esculenta (cassava), and Solanum lycopersicum, one gene contains the hydrophobic thylakoid spanning domain whereas the other lacks it, corresponding to the Cucurbita and Arabidopsis thaliana APX genes. Part of these sequence alignments is shown in Figure 10. These genes were chosen as candidates for possible sublocalization. In Glycine max, both duplicates contain the hydrophobic thylakoid spanning domain, so it was not considered a candidate for sublocalization. The sequences of other species from the NCBI NR and EST databases revealed no further candidates for sublocalization, although a few species have a single gene with alternative splicing at the hydrophobic thylakoid spanning domain.
To evaluate the likely structure of the ancestral APX gene, and to infer whether it was a single gene with alternative splicing or two duplicates without alternative splicing, a maximum likelihood bootstrap phylogenetic tree of the APX gene family was constructed with RAxML using 100 bootstrap replicates (Figure 11). The tree also shows which species have duplicates and which have a single gene with alternative splicing. Species other than Carica papaya and Ricinus communis that are labelled ‘AS’ in Figure 11 were identified as having a single gene with alternative splicing by checking the available ESTs for these species.

I then evaluated sequence rate evolution in the APX genes of interest to identify whether any of the duplicates showed asymmetric sequence rate evolution and/or evidence of relaxed purifying selection. The Ka/Ks ratios and the likelihood ratio test show no asymmetric rate evolution or relaxed purifying selection in one duplicate relative to the other for the APX duplicates in Manihot esculenta, Populus trichocarpa, Arabidopsis thaliana, and Zea mays (Figure 12).

Analyses of APX genes in the Brassicales order
The Brassicales order was studied to infer the phylogenetic timing of sublocalization in this order. According to the sequences obtained from Plaza and NCBI, the species in the Brassicales order have two APX duplicates, except for Carica, which has only a single gene. To compare and identify the 3′ end region of the APX sequences, where the differential targeting region is located, APX sequences from species in the Brassicaceae family (Arabidopsis thaliana, Arabidopsis lyrata, Capsella rubella, Brassica oleracea, and Brassica rapa) and from their close relatives in other Brassicales families, Cleome spinosa and Carica papaya, were aligned using MUSCLE (Figure 10).
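The bootstrap values on the RAxML trees (Figures 8 and 11) are the percentage of the 100 replicate trees that contain each branch of the best tree. RAxML computes and maps these values itself; the sketch below only illustrates what the numbers mean. It compares unrooted splits (bipartitions), so replicates rooted differently still match, and it assumes topology-only Newick strings without branch lengths, support values, or quoted names. The trees and tip names are hypothetical.

```python
def clades(newick: str) -> set[frozenset[str]]:
    """Tip sets below every internal node of a topology-only Newick string."""
    stack: list[set[str]] = []
    found: set[frozenset[str]] = set()
    name = ""
    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            stack.append(set())
        elif ch in ",)":
            if name:
                stack[-1].add(name)
                name = ""
            if ch == ")":
                done = stack.pop()
                found.add(frozenset(done))
                if stack:
                    stack[-1] |= done
        elif not ch.isspace():
            name += ch
    return found

def splits(newick: str) -> set[frozenset[str]]:
    """Non-trivial bipartitions, each stored as the side that excludes a fixed reference tip."""
    all_clades = clades(newick)
    tips = max(all_clades, key=len)  # the outermost clade contains every tip
    ref = min(tips)
    result = set()
    for c in all_clades:
        side = c if ref not in c else tips - c
        if 1 < len(side) < len(tips) - 1:
            result.add(frozenset(side))
    return result

def bootstrap_support(best_tree: str, replicates: list[str]) -> dict[frozenset[str], float]:
    """Percentage of replicate trees containing each split of the best tree."""
    counts = {s: 0 for s in splits(best_tree)}
    for rep in replicates:
        rep_splits = splits(rep)
        for s in counts:
            if s in rep_splits:
                counts[s] += 1
    return {s: 100.0 * n / len(replicates) for s, n in counts.items()}

# Hypothetical best tree and four replicates (the second is the best tree rooted differently).
best = "((A,B),(C,D),E);"
reps = ["((A,B),(C,D),E);", "(A,(B,((C,D),E)));", "((A,C),(B,D),E);", "((A,B),(C,E),D);"]
for split, pct in bootstrap_support(best, reps).items():
    print(sorted(split), pct)
```

Because splits are stored relative to reference tip A, the clade (A,B) appears as the split {C, D, E}; here it has 75% support and (C,D) has 50%.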
In the Brassicaceae species studied here, one duplicate contains the hydrophobic thylakoid spanning domain at the 3′ end of the gene and the other does not, suggesting that these are possible cases of sublocalization. The two Cleome spinosa APX ESTs were considered to represent two different genes on the basis of the number of SNPs between them. As in the Brassicaceae, one of these ESTs contains the hydrophobic thylakoid spanning domain at the 3′ end of the gene while the other does not. Only one Carica papaya APX gene was found in Plaza; however, the EST sequence was used because it is more complete than the Plaza sequence. RT-PCR of Carica papaya APX revealed two bands on the gel, corresponding to the predicted sizes of Carica papaya tAPX and sAPX (Figure 13). This suggests that this gene is alternatively spliced to form the two isoforms, so it is not a possible case of sublocalization.

Analyses of APX genes in the Malpighiales order
The Malpighiales order was studied to infer the phylogenetic timing of sublocalization in this order. Among the sequences obtained from Plaza and NCBI, APX duplicates were found in two Malpighiales species (Manihot esculenta and Populus trichocarpa), whereas Ricinus communis and Euphorbia esula each have only a single gene. To compare and identify the 3′ end region of the APX sequences in the Malpighiales, the APX sequences from Manihot esculenta, Ricinus communis, Populus trichocarpa, and Linum usitatissimum were aligned using MUSCLE (Figure 14). In these alignments, only Manihot esculenta and Populus trichocarpa have duplicates in which one gene contains the hydrophobic thylakoid spanning domain at the 3′ end and the other lacks it.
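Two checks in this chapter rest on counting nucleotide differences between aligned sequences: deciding that the two Cleome spinosa ESTs are distinct genes, and finding regions variable enough to place duplicate-specific primers. A minimal sketch, assuming the two sequences are already aligned; the sequences, window length, and minimum SNP count are hypothetical, since the thesis does not state numerical thresholds.

```python
def snp_positions(seq1: str, seq2: str) -> list[int]:
    """0-based alignment columns where two aligned sequences differ (gap or N columns skipped)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return [i for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper()))
            if a != b and a not in skip and b not in skip]

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of comparable columns (no gap or N) that differ."""
    comparable = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
                  if a not in "-N" and b not in "-N"]
    if not comparable:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in comparable) / len(comparable)

def snp_rich_windows(seq1: str, seq2: str, length: int = 20, min_snps: int = 2) -> list[int]:
    """Start columns of primer-length windows that contain at least `min_snps` differences."""
    snps = snp_positions(seq1, seq2)
    return [start for start in range(len(seq1) - length + 1)
            if sum(start <= p < start + length for p in snps) >= min_snps]

# Hypothetical aligned EST fragments from two putative paralogs.
est1 = "ACGTACGTAC"
est2 = "ACGAACGTTC"
print(snp_positions(est1, est2))                           # [3, 8]
print(p_distance(est1, est2))                              # 0.2
print(snp_rich_windows(est1, est2, length=6, min_snps=2))  # [3]
```

Differences concentrated near a primer's 3′ end generally discriminate between paralogs better than differences near its 5′ end, so window positions would still need manual review.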
RT-PCR was carried out for the genes in these species to check whether alternative splice forms were present. For Populus trichocarpa APX, two distinct reverse primers were designed, one for each duplicate gene, but only one forward primer was designed for both genes because of their high similarity. The RT-PCR result shows only one band in each lane for the two Populus trichocarpa APX duplicates (Figure 15), indicating that neither duplicate is alternatively spliced. These bands were excised and sequenced, which further showed that Populus trichocarpa indeed has two APX duplicates, each producing only one of the two splice forms. For Manihot esculenta APX, two different primer sets were designed to amplify the two duplicates. The RT-PCR result again shows only one band in each lane for the two Manihot esculenta APX duplicates (Figure 16), suggesting that neither duplicate is alternatively spliced. Only one Ricinus communis APX gene was found in Plaza, and primers were designed to amplify it by RT-PCR. The RT-PCR gel shows two bands whose sizes correspond to the predicted sizes of Ricinus communis tAPX and sAPX (Figure 17). This suggests that this single gene produces the two isoforms by alternative splicing and is therefore not a possible case of sublocalization.

Analyses of APX genes in the Poaceae family
The Poaceae family was studied to infer the phylogenetic timing of sublocalization in this family. According to the sequences obtained from Plaza and NCBI, all Poaceae species examined have duplicated APX genes, whereas the non-Poaceae monocot Dioscorea alata has only a single APX gene.
To compare and identify the 3′ end region of the APX sequences of members of the Poaceae family (Sorghum bicolor, Zea mays, Setaria italica, Brachypodium distachyon, and Oryza sativa japonica) and of a related non-Poaceae monocot, Dioscorea alata (purple yam), these APX sequences were aligned using MUSCLE (Figure 18). All of the Poaceae species listed here have three APX duplicates: one (genes labeled ‘1’ in Figure 18) includes the hydrophobic thylakoid spanning domain at the 3′ end of the gene, whereas the other two lack it. For Dioscorea alata, the EST sequences from the NCBI EST database show that it has at least one APX gene with alternative splicing of the hydrophobic 3′ end, suggesting that Dioscorea alata APX is not a case of sublocalization.

Discussion

Duplication and sublocalization of APX genes in the Brassicales order
In the phylogenetic tree (Figure 11), all the genes that encode tAPX in the Brassicaceae family and in Cleome spinosa fall in one clade, whereas the genes that encode sAPX fall in another. The Carica papaya APX gene is an outgroup to these clades. The positions of the two Cleome spinosa APX ESTs in the tree confirm that they are indeed two different genes. There are two possibilities for the ancestral state of APX in the common ancestor of the Brassicaceae and Cleomaceae prior to gene duplication. One possibility is that the ancestral gene was a single gene with alternative splicing, which then duplicated and underwent sublocalization before the Brassicaceae and Cleomaceae diverged. Under this scenario, the tAPX genes in the Brassicaceae and Cleomaceae are out-paralogs of the sAPX genes in these families.
The other possibility is that the ancestral genes were two duplicates, each producing only one splice form. After Carica papaya diverged from the ancestor of the Brassicaceae and Cleomaceae, this gene arrangement was maintained in the Brassicaceae and Cleomaceae, whereas the Carica papaya lineage lost one of the duplicates and the remaining gene gained alternative splicing to produce both isoforms. Although this possibility cannot be ruled out, the more plausible hypothesis is that the ancestral gene was a single gene with alternative splicing that duplicated and then underwent sublocalization, because the alternative requires the remaining Carica papaya gene to have gained a splice form corresponding exactly to that of the lost copy. This hypothesis could be tested further by obtaining APX sequences from more rosids to better establish the ancestral gene arrangement (one gene with alternative splicing vs. two genes without alternative splicing).

Independent cases of sublocalization after gene duplications in Populus trichocarpa and Manihot esculenta
The APX duplicates in Manihot esculenta and Populus trichocarpa are derived from separate gene duplication events, as the duplicates of each species form their own well-supported clade in the phylogenetic tree (Figure 11). After Populus trichocarpa diverged from the ancestor of Manihot esculenta and Ricinus communis, an independent duplication of APX occurred in the Populus lineage, followed by sublocalization. The Populus duplicates might have been derived from the whole-genome duplication in the evolutionary history of the Salicales. Sometime after Manihot esculenta diverged from Ricinus communis, an independent duplication of APX in Manihot esculenta was likewise followed by sublocalization. The phylogenetic tree suggests that the ancestral state of this gene was a single gene with alternative splicing.
This further suggests that sublocalization occurred independently in these two species, after two different gene duplication events. Ricinus communis and Euphorbia esula did not undergo lineage-specific duplications of APX and thus retain a single gene with alternative splicing. Although Linum usitatissimum has duplicated APX genes, both genes still maintain the two alternative splice forms, as shown in the alignment (Figure 14).

Sublocalization of APX in the Poaceae family
In the phylogenetic tree (Figure 11), all the Poaceae genes that encode tAPX (labeled ‘1’) fall in one clade, one group of sAPX genes (labeled ‘2’) falls in a second clade, and the remaining sAPX genes (labeled ‘3’) fall in a third clade. The Dioscorea alata APX, together with APX genes from other monocots outside the Poaceae, serves as an outgroup to the Poaceae APX genes. There are two possibilities for the ancestral state. The more plausible hypothesis is that the ancestral gene was a single gene with alternative splicing, as found in Dioscorea alata, Liriodendron tulipifera, and Curcuma longa. Under this hypothesis, Dioscorea alata diverged from its common ancestor with the Poaceae and retained this ancestral gene structure. The lineage giving rise to the tAPX genes labeled ‘1’ and the sAPX genes labeled ‘2’ then diverged from the lineage of the sAPX genes labeled ‘3’. The ancestral gene of the ‘1’ and ‘2’ genes then duplicated into two genes, one producing proteins with the hydrophobic thylakoid spanning domain at the 3′ end and the other producing proteins without it.
Thus, the tAPX genes labeled ‘1’ and the sAPX genes labeled ‘2’ are considered cases of sublocalization. The sAPX genes labeled ‘3’ might have lost their tAPX counterparts sometime after duplication in the Poaceae. An alternative possibility is that the common ancestor of the Poaceae species and Dioscorea had two duplicates, each with only one alternatively spliced form. When Dioscorea alata diverged from its common ancestor with the Poaceae, one of its genes was lost and the other gained the second splice form. The tAPX genes labeled ‘1’ and sAPX genes labeled ‘2’ then diverged from the sAPX genes labeled ‘3’, with the genes labeled ‘1’ and ‘2’ maintaining the ancestral state. This possibility is less likely because it requires one gene to gain an alternatively spliced form that corresponds exactly to the splice form of the copy that was eventually deleted.

Sublocalization after gene duplication is not common in angiosperms
Aside from APX and PP5, several other angiosperm genes with differential protein subcellular localization caused by alternative splicing were initially chosen for this research: thymidylate kinase (TMPK) (Ronceret et al. 2008), holocarboxylase synthetase 1 (HCS1) (Puyaubert et al. 2008), transthyretin-like (TTL) (Lamberto et al. 2010), and hydroxypyruvate reductase (HPR) (Mano et al. 1999). There were no candidates for duplication and sublocalization among the TMPK genes; most species have a single TTL gene and a single HCS1 gene; and the alternative splicing of HPR appears to be specific to Cucurbita species.
Thus, these genes were not analyzed further. This suggests that sublocalization is not common in angiosperms. That accords with the findings of Liu et al. (unpublished), who found that most of the protein subcellular relocalization examples they examined showed neolocalization rather than sublocalization. Neolocalization may therefore play a more important role than sublocalization in the preservation of duplicate genes. Similarly, large-scale studies of protein subcellular relocalization in yeast and animals found that neolocalization is a more common fate for duplicated genes than sublocalization (Marque et al. 2008; Wang et al. 2009; Qian and Zhang 2009).

Sublocalization as a fate of duplicated genes

Although relatively rare, sublocalization is an important fate of duplicated genes in several ways. As a type of protein subcellular relocalization (PSR), sublocalization might be an evolutionary process that contributes to new protein functions in a relatively short time and, ultimately, to the preservation of duplicate genes (Byun-McKay and Geeta 2007). Protein surfaces, which are in direct contact with their surroundings, have been shown to have amino acid compositions characteristic of their subcellular location (Andrade et al. 1998). Studies have also shown that subcellular location can strongly influence protein function. Altered protein targeting can also have unpredictable effects on the function of a protein in the cell (Sirover 1999; Björses et al. 2000; Domínguez et al. 2003; Bijur and Jope 2003; Simpson and Pepperkok 2006). Relaxed purifying selection, or even positive selection, may therefore act on a duplicate gene shortly after subcellular relocalization as the protein adapts to its new subcellular environment.
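This chapter relies on hydrophobic membrane-anchoring stretches, such as the thylakoid-spanning domain that distinguishes tAPX from sAPX. Figure 7 looks for such stretches with Kyte-Doolittle hydropathy plots. The kind of scan behind a plot like that can be sketched in Python. This is a minimal illustration, not the software used for Figure 7. The 19-residue window is an assumption, since the window size used for the figure is not stated. The 1.6 cutoff is the value quoted in the Figure 7 caption. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle 1982).
KD_SCALE = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}


def hydropathy_profile(protein, window=19):
    # Mean Kyte-Doolittle value of each full window, keyed by the 1-based
    # position of the window's central residue. Non-standard residues
    # (e.g. X) are not handled and raise KeyError. The 19-residue window
    # is an assumed default.
    scores = [KD_SCALE[aa] for aa in protein.upper()]
    half = window // 2
    return {
        start + half + 1: sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    }


def hydrophobic_segments(protein, window=19, threshold=1.6):
    # Merge consecutive window centres whose mean exceeds the threshold
    # into (first, last) runs, i.e. candidate membrane-spanning segments.
    runs = []
    for pos, score in sorted(hydropathy_profile(protein, window).items()):
        if score <= threshold:
            continue
        if runs and runs[-1][1] == pos - 1:
            runs[-1] = (runs[-1][0], pos)
        else:
            runs.append((pos, pos))
    return runs


# Toy check on an artificial sequence (not a real APX or PP5 protein):
# a 25-residue leucine stretch flanked by charged residues.
toy = 'MKDERKSE' + 'L' * 25 + 'KRDESKEK'
print(hydrophobic_segments(toy))
# [(13, 29)]
```

The reported run lists window centres, so it is narrower than the leucine stretch itself (residues 9 to 33). Applied to a full-length protein, a run above the cutoff would flag a candidate anchor such as the tAPX C-terminal domain. Confirming localization would still require experimental evidence.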
Chapter Four: Regulatory Neofunctionalization Results and Discussion

Identification of alpha whole genome duplicates in Arabidopsis thaliana with contrasting expression breadths

I wanted to identify gene pairs in which one duplicate is broadly expressed and the other is expressed in only a few organ types. To do this, the set of genes derived from the most recent whole genome duplication event in the Brassicaceae (Blanc et al. 2003) was screened with a Python script written by Alex Hammel, and I then checked the results manually. Eight duplicate pairs with gene names and known functions were given priority for further study (Table 1). To evaluate the relationships of the chosen duplicates to other potential paralogs, phylogenetic trees of the genes of interest were constructed with Garli-1.0, which uses the maximum likelihood method (see Methods for details).

Restricted organ-specific expression pattern in one of the paralogs

According to the microarray data from the Arabidopsis thaliana developmental expression atlas of Schmid et al. (2005), each pair of duplicates of interest has one paralog whose expression is restricted to certain organ types. The other paralog(s) have a broader expression pattern (Figure 19). RT-PCR was carried out to verify the restricted organ-specific expression pattern of the candidate genes. Because RT-PCR is more sensitive than microarrays, it may detect genes expressed at low levels in one or more organ types. Arabidopsis thaliana root, stem, leaf, flower and silique cDNA, together with genomic DNA as a control, were used for RT-PCR.
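The logic of the screen described above can be sketched in Python. It requires one broadly expressed copy and one copy restricted to a few organ types. This is a hypothetical illustration, not the script by Alex Hammel, which is not reproduced in this thesis. The expression cutoff, the breadth fractions, the gene names and the signal values are all assumptions chosen only to show the idea.

```python
# Hypothetical screen for duplicate pairs with contrasting expression breadth.
# All thresholds below are illustrative assumptions, not the original script's.
EXPRESSED_CUTOFF = 100.0        # signal above which an organ counts as expressed
BROAD_MIN_FRACTION = 0.8        # broad: expressed in at least 80% of organs
RESTRICTED_MAX_FRACTION = 0.4   # restricted: expressed in at most 40% of organs


def breadth(signals, cutoff=EXPRESSED_CUTOFF):
    # Fraction of organs in which the signal exceeds the cutoff.
    return sum(1 for value in signals.values() if value > cutoff) / len(signals)


def contrasting_pairs(pairs, expression):
    # pairs: (gene_a, gene_b) duplicates; expression: gene -> {organ: signal}.
    # Yields (restricted_copy, broad_copy) whenever one copy is broadly
    # expressed and the other is confined to a few organ types.
    for a, b in pairs:
        if a not in expression or b not in expression:
            continue  # no expression data for one of the copies
        breadth_a, breadth_b = breadth(expression[a]), breadth(expression[b])
        if breadth_a >= BROAD_MIN_FRACTION and breadth_b <= RESTRICTED_MAX_FRACTION:
            yield b, a
        elif breadth_b >= BROAD_MIN_FRACTION and breadth_a <= RESTRICTED_MAX_FRACTION:
            yield a, b


# Made-up signals for two hypothetical genes (not real atlas values).
organs = ['root', 'stem', 'leaf', 'flower', 'silique']
expression = {
    'geneA': dict(zip(organs, [850, 620, 900, 700, 640])),
    'geneB': dict(zip(organs, [5, 8, 12, 450, 300])),
}
print(list(contrasting_pairs([('geneA', 'geneB')], expression)))
# [('geneB', 'geneA')]
```

A rule of this kind only nominates candidates. As in this thesis, the output would still need manual checking and independent confirmation, for example by RT-PCR.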
The RT-PCR results (Figure 19) show the following organ-specific expression patterns:

- ACA7, TTL2 and UBP1 are expressed specifically in flowers and siliques.
- LORELEI and the paralog of CYT1 are expressed in leaves, flowers and siliques.
- CBL7 is expressed specifically in roots, although there were some difficulties with amplifying the genomic DNA control.

Expression of WRKY34 and EMB2271 was not detected by RT-PCR. This may be due to relatively low expression levels, or these genes may be expressed only in pollen or in flowers at certain developmental stages. However, amplification from genomic DNA, which served as a positive control, showed that the primers for WRKY34 and EMB2271 worked. This indicates that these two genes have no detectable expression in the five organ types tested. Overall, the results show that the eight genes have restricted organ-type expression compared with their broadly expressed paralogs.

Inference of ancestral expression pattern with outgroup species

To infer the ancestral expression pattern of the genes of interest, RT-PCR was performed on the Carica papaya and Vitis vinifera orthologs of each of the eight Arabidopsis gene pairs. These two species were chosen because their lineages have not experienced any whole genome duplication since the gamma whole genome duplication early in eudicot evolution. They therefore usually have single-copy orthologs, which simplifies the analysis. In addition, Carica papaya belongs to the Brassicales, the same order as Arabidopsis thaliana. The orthologs were identified by phylogenetic analysis of the gene families (Figure 20). The RT-PCR results (Figure 19) showed that the Carica papaya and Vitis vinifera orthologs of all the genes of interest were expressed in all examined organ types (root, stem, leaf, flower and seed in both species). The inferred ancestral expression pattern of the Arabidopsis duplicates is therefore broad.
It follows that after the alpha whole genome duplication, one copy of each broadly expressed ancestral gene retained the broad expression pattern. The other copy evolved an expression pattern restricted to certain organ types.

Sequence rate evolution analysis for Arabidopsis thaliana paralogs

The next question was whether the duplicates with restricted expression underwent positive selection or relaxed purifying selection. To test this, the Ka/Ks ratio for each branch of the phylogeny was estimated with a free-ratio analysis in PAML. The results (Figure 20) showed that all eight duplicates with restricted expression have Ka/Ks ratios below 1, indicating that all of these genes are under purifying selection. All the duplicates with restricted organ-specific expression, except UBP1 and the paralog of CYT1, have higher Ka/Ks ratios than their paralogs. A likelihood ratio test was carried out with PAML to test the hypothesis that the duplicates with restricted expression are evolving significantly faster than their paralogs, i.e. are under relaxed purifying selection (p ≤ 0.05). Four of the eight sets of duplicates showed asymmetric sequence evolution, with one duplicate evolving significantly faster than the other (Table 2). These were ACA7, CBL7, LORELEI and WRKY34, each compared with its paralog(s). The Ka/Ks ratio for WRKY34 was four times that of WRKY2. WRKY34 was therefore considered to have experienced relaxed purifying selection, as its Ka/Ks ratio was relatively high compared with that of its paralog. The WRKY34 gene has been shown to contribute to mature pollen viability under cold stress (Zou et al.
2010). This function is very different from that of its paralog WRKY2, which has two known roles. It is required to reestablish the polar organization of the zygote and break the symmetry of its division (Ueda et al. 2011). It also mediates seed germination and postgermination developmental arrest by abscisic acid (Jiang and Yu 2009). This suggests that the more rapidly evolving paralog, WRKY34, has developed a new expression pattern (microarray and RT-PCR results). It has also likely acquired a derived function after duplication.

The Ka/Ks ratio for LORELEI was six times that of LLG1, indicating that LORELEI has experienced relaxed purifying selection after duplication relative to its paralog. LORELEI encodes a protein required for the pollen tube to release its sperm cells upon arrival at the embryo sac, allowing double fertilization to take place (Capron et al. 2008; Tsukamoto et al. 2010). The function of its paralog LLG1 is currently unknown, although it has been shown not to be redundant with LORELEI in the female gametophyte (Tsukamoto et al. 2010).

The Ka/Ks ratio for ACA7 was twice that of its paralog ACA2, which also suggests that ACA7 has experienced relaxed purifying selection. ACA7 encodes a protein that functions during pollen development, possibly through regulation of Ca2+ homeostasis (Lucca and León 2012). Its paralog ACA2 encodes a Ca2+ pump that actively transports Ca2+ (Hwang et al. 2000).

The Ka/Ks ratio for CBL7 is 19 times that of CBL3 and six times that of CBL2. CBL7 was therefore also considered to be experiencing relaxed purifying selection, as its Ka/Ks ratio was relatively high compared with those of its paralogs. The function of CBL7 is currently unknown.

Regulatory neofunctionalization as a fate of duplicated genes

Like sublocalization after gene duplication, regulatory neofunctionalization might lead to functional divergence of the encoded proteins and to long-term preservation of duplicate genes. Liu et al.
(2011) have proposed several molecular mechanisms that could produce differences in expression patterns between duplicates. One of these is divergence of the cis-regulatory regions of the duplicates. Haberer et al. (2004) found that segmental and tandem duplicates showed high similarity of cis-elements despite highly divergent expression patterns. This suggests that minor changes in cis-regulatory regions could cause regulatory neofunctionalization. Tandem duplicates usually arise from unequal crossing over, so only part of a cis-regulatory region may be duplicated (Achaz et al. 2000), which could also cause regulatory neofunctionalization.

The findings of this project accord with those of Liu et al. (2011), who found that BSL1 is broadly expressed whereas its paralog BSU1 is strongly expressed in mature pollen. Another interesting finding of Liu et al. (2011) was that diverged expression patterns were most commonly observed in pollen and siliques. This agrees with the results of this project:

- Three of the eight duplicates of interest (ACA7, TTL2 and UBP1) are expressed only in flowers and siliques.
- Two more (WRKY34 and EMB2271) may be expressed only in pollen.
- Two more (LORELEI and the paralog of CYT1) are also expressed in flowers and siliques, but additionally in leaves.

Of the eight duplicates of interest, CBL7 was the only gene with root-specific expression.

Chapter Five: Conclusions and Future Directions

My thesis revealed cases of sublocalization of the APX gene in the Brassicales and Malpighiales orders and in the Poaceae family. Among the species examined, only Cucumis sativus shows sublocalization of PP5. The results of this project also suggest that sublocalization after gene duplication is not a common phenomenon in angiosperms.
Although sublocalization is relatively rare, it is important because it might serve as an evolutionary process that contributes to novel protein functions. As more genomic data become available, more species could be examined to test whether they show sublocalization of PP5 and APX. Further single genes whose alternatively spliced products are differentially targeted may also be identified. If so, future researchers could search for duplicates of these genes to find cases of sublocalization beyond those reported here. Future researchers could also perform Green Fluorescent Protein (GFP) localization assays to further test sublocalization in these genes.

My thesis also revealed new cases of regulatory neofunctionalization after gene duplication: the microarray and RT-PCR data provide eight candidate examples. WRKY34 is particularly interesting because its function differs from that of its paralog WRKY2. WRKY34 contributes to mature pollen viability under cold stress (Zou et al. 2010). WRKY2, by contrast, is required to reestablish the polar organization of the zygote and break the symmetry of its division (Ueda et al. 2011), and it mediates seed germination and postgermination developmental arrest by abscisic acid (Jiang and Yu 2009). In addition, WRKY34 has a restricted expression pattern and shows relaxed purifying selection relative to its paralog. This supports the idea that regulatory neofunctionalization is sometimes followed by relaxed purifying selection, with the more rapidly evolving paralog acquiring novel functions. Future researchers could perform expression assays of the genes of interest in other flowering plant species that have duplicates. This would test whether these duplicates also show restricted expression in one copy and broad expression in the other.
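The conclusions above about relaxed purifying selection, for WRKY34 and the other asymmetric pairs, rest on the likelihood ratio tests summarized in Table 2. The Python sketch below shows how those p-values can follow from the test statistic by comparing 2δL with a chi-square distribution. The degrees of freedom are an assumption, because the thesis does not state them. One degree of freedom per duplicate pair, and two for the three-gene CBL comparison, is used here because it approximately reproduces the tabulated p-values. The exact PAML models compared are not specified in this section. The function names and comments therefore describe a generic nested-model test, not the author's exact setup.

```python
import math


def chi2_sf(x, df):
    # Upper-tail probability of a chi-square distribution. Closed forms
    # are used, so this sketch only supports df = 1 and df = 2.
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError('only df = 1 or df = 2 is supported in this sketch')


def lrt_pvalue(two_delta_l, df):
    # two_delta_l = 2 * (lnL of a model with separate Ka/Ks ratios for the
    # duplicates - lnL of a model in which they share one ratio); df is the
    # number of extra ratios (an assumption, see the lead-in).
    return chi2_sf(two_delta_l, df)


# 2δL values from Table 2; the degrees of freedom are assumed.
TABLE2 = [
    ('ACA7 & ACA2', 8.40, 1),
    ('CBL7 & CBL3 & CBL2', 10.33, 2),
    ('EMB2271 & YAO', 0.99, 1),
    ('LORELEI & LLG1', 5.22, 1),
    ('TTL2 & TTL1', 1.86, 1),
    ('UBP1 & UBP2', 0.41, 1),
    ('WRKY34 & WRKY2', 12.14, 1),
    ('CYT1 & AT3G55590', 0.02, 1),
]

for name, stat, df in TABLE2:
    p = lrt_pvalue(stat, df)
    verdict = 'asymmetric' if p <= 0.05 else 'not significant'
    print(f'{name}: 2δL = {stat}, df = {df}, p = {p:.4f} ({verdict})')
```

Under these assumptions, the comparisons with p ≤ 0.05 are the same four marked ‘Yes’ in Table 2. The recomputed p-values agree with the tabulated ones to within about 0.01, with small differences in the last digits.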
By knowing the expression patterns in other species, one could infer the phylogenetic timing of regulatory neofunctionalization in these genes. In the future, more may become known about the functions of the genes of interest. One could then compare the functions of paralogs with different expression patterns to test whether regulatory neofunctionalization leads to the development of new functions. Future researchers could also perform expression assays and sequence rate analyses on tandem duplicates instead of the alpha whole genome duplicates used in this project. In addition, a large-scale survey of regulatory neofunctionalization could be performed to gain a general understanding of this process in flowering plants.

Tables and Figures

Table 1. Names and locus identifiers of the duplicate genes of interest, with related publications.

Paralog with restricted expression pattern | Paralog with broad expression pattern | Related publications
ACA7, Auto-regulated Calcium ATPase 7 (AT2G22950) | ACA2, Auto-regulated Calcium ATPase 2 (AT4G37640) | Lucca and León 2012; Hwang et al. 2000
CBL7, Calcineurin B-Like 7 (AT4G26560) | CBL3, Calcineurin B-Like 3 (AT4G26570) & CBL2, Calcineurin B-Like 2 (AT5G55990) | Liu et al. 2013; Zhou et al. 2009; Batistič et al. 2012
EMB2271, Embryo defective 2271 (AT4G21130) | YAO (AT4G05410) | Li et al. 2010
LORELEI (AT4G26466) | LLG1, LORELEI-Like-GPI-anchored protein 1 (AT5G56170) | Capron et al. 2008; Tsukamoto et al. 2010
TTL2, Tetratricopeptide-repeat Thioredoxin-Like 2 (AT3G14950) | TTL1, Tetratricopeptide-repeat Thioredoxin-Like 1 (AT1G53300) | Lakhssassi et al. 2012; Schapire et al. 2006
UBP1, Ubiquitin-specific Protease 1 (AT2G32780) | UBP2, Ubiquitin-specific Protease 2 (AT1G04860) | Yan et al. 2000
WRKY34 transcription factor (AT4G26440) | WRKY2 transcription factor (AT5G56270) | Zou et al. 2010; Jiang and Yu 2009; Ueda et al. 2011
Unknown (AT3G55590) | CYT1, Cytokinesis defective 1 (AT2G39770) | Lukowitz et al. 2001

Table 2. Results of the likelihood ratio tests: twice the difference in log-likelihood (2δL) and the p-value for each set of duplicate genes of interest. The last column indicates whether there is asymmetric evolution between the duplicates.

Duplicate genes | 2δL | p-value | Asymmetric evolution
ACA7 & ACA2 | 8.40 | 0.0037 | Yes
CBL7 & CBL3 & CBL2 | 10.33 | 0.0057 | Yes
EMB2271 & YAO | 0.99 | 0.3206 | No
LORELEI & LLG1 | 5.22 | 0.0223 | Yes
TTL2 & TTL1 | 1.86 | 0.1724 | No
UBP1 & UBP2 | 0.41 | 0.5233 | No
WRKY34 & WRKY2 | 12.14 | 0.0004 | Yes
CYT1 & AT3G55590 | 0.02 | 0.8937 | No

Figure 1. Regulatory subfunctionalization and regulatory neofunctionalization. A) In regulatory subfunctionalization, one duplicate maintains part of the ancestral expression pattern and the other duplicate maintains the rest. B) In regulatory neofunctionalization, one duplicate maintains the ancestral broad expression pattern across organs. The other duplicate is expressed only in certain organ types (e.g. reproductive organs). Regulatory neofunctionalization can also refer to a duplicate gaining expression in a new organ type in which the ancestral gene was not expressed.

Figure 2. Targeting of a protein to different subcellular compartments by alternative splicing (AS), and sublocalization after gene duplication. A) AS in the transit peptide region of the mature mRNAs directs the encoded proteins to two different subcellular locations. B) Sublocalization happens after gene duplication when the products of the duplicate genes are each directed to only one of the two subcellular locations previously targeted by the single ancestral gene.

Figure 3. Outline of the 62-kD PP5 and 55-kD PP5 alternative splice forms.
In the mRNA that encodes the 62-kD PP5 isoform, the presence of exon 4A between the fourth and fifth exons causes the protein to localize to the endoplasmic reticulum. In the mRNA that encodes the 55-kD PP5 isoform, the absence of exon 4A between the fourth and fifth exons causes the protein to localize to the cytosol (redrawn from de la Fuente van Bentem et al. 2003).

Figure 4. Outline of the thylakoid APX (tAPX) and stromal APX (sAPX) alternative splice forms. In sAPX, exon 12, which contains a stop codon (red), is not spliced out, so exon 13 is absent from the encoded protein. In tAPX, by contrast, exon 12 is spliced out, so the tAPX product contains exon 13, which serves as an anchor region (redrawn from Mano et al. 1997; Ishikawa and Shigeoka 2008).

Figure 5. Amino acid alignments for PP5, showing the two alternatively spliced forms of Arabidopsis thaliana PP5 and of Solanum lycopersicum PP5. The alignment also includes the products of the Cucumis sativus and Glycine max duplicate genes. The region in the middle (between aligned residues 158 and 239) is the exon 4A region of PP5.

Figure 6. Expression and splicing of Cucumis sativus (cucumber) PP5. Shown is an RT-PCR gel. Only one band was observed for each primer set, and each primer set was specific to one duplicated gene. This shows that neither gene has an alternative splice form.

Figure 7. Hydropathy plots (Kyte-Doolittle) of Cucumis sativus (cucumber) PP5, Arabidopsis thaliana PP5 and Solanum lycopersicum (tomato) PP5. The exon 4A region is depicted in the middle of the plot. Regions with values above 0 are hydrophobic. Membrane-spanning domains usually have values greater than 1.6 on the Kyte-Doolittle scale.

Figure 8. Maximum likelihood tree for PP5 genes. Numbers beside branches indicate bootstrap support. The Cucumis sativus PP5 genes are shown in green boxes.
‘AS’ beside a taxon name indicates that the gene’s mRNA is alternatively spliced. ‘1’ beside Cucumis indicates a duplicate gene that produces only the 55-kD (cytosolic) PP5. ‘2’ beside Cucumis indicates a duplicate gene that produces only the 62-kD (endoplasmic reticulum) PP5. ‘A’ and ‘B’ beside Glycine indicate duplicate genes; the products of one copy are alternatively spliced, and whether the other copy’s products are alternatively spliced is unknown. ‘sg’ beside a taxon name indicates a single gene; whether its products undergo alternative splicing is unknown.

Figure 9. A PP5 phylogenetic tree with Ka/Ks ratios. Branch lengths were generated with Codeml in PAML, and the scale bar indicates nucleotide substitutions per codon. Branch-wise Ka/Ks ratios are shown above the branches. The Cucumis sativus PP5 genes are shown in green boxes. ‘AS’ beside a taxon name indicates that the gene’s mRNA is alternatively spliced. ‘1’ beside Cucumis indicates a duplicate gene that produces only the 55-kD (cytosolic) PP5. ‘2’ beside Cucumis indicates a duplicate gene that produces only the 62-kD (endoplasmic reticulum) PP5. ‘sg’ beside a taxon name indicates a single gene; whether its products undergo alternative splicing is unknown.

Figure 10.
The APX amino acid sequence alignment of sequences from the Brassicaceae (Arabidopsis thaliana, Arabidopsis lyrata, Capsella rubella, Brassica oleracea and Brassica rapa) and their close relatives Cleome spinosa and Carica papaya. The regions shown are the hydrophobic thylakoid-spanning domains encoded at the 3’ end of the genes. All the Brassicaceae species listed have two duplicate genes. One of them encodes the hydrophobic thylakoid-spanning domain at the 3’ end and the other does not. Cleome spinosa is thought to have two duplicate genes, as evidenced by SNPs in the ESTs. Carica papaya has a single gene with alternative splicing; the Carica papaya EST shows the tAPX isoform but is incomplete at the 3’ end.

Figure 11. Maximum likelihood tree for APX genes. Numbers beside branches indicate bootstrap support. The Brassicaceae and Cleome spinosa APX genes are shown in green boxes, the Manihot esculenta APX genes in orange boxes, the Populus trichocarpa APX genes in brown boxes, and the Poaceae APX genes in red boxes. ‘AS’ beside a taxon name indicates that the gene’s mRNA is alternatively spliced. ‘1’ beside a taxon name indicates a duplicate gene that produces only the thylakoid APX. ‘2’ or ‘3’ beside a taxon name indicates a duplicate gene that produces only the stromal APX. ‘sg’ beside a taxon name indicates a single gene; whether its products undergo alternative splicing is unknown.
‘EST’ beside a taxon name indicates an expressed sequence tag, and ‘A’, ‘B’, ‘C’ and ‘D’ indicate different EST sequences.

Figure 12. An APX phylogenetic tree with Ka/Ks ratios. Branch lengths were generated with Codeml in PAML, and the scale bar indicates nucleotide substitutions per codon. Branch-wise Ka/Ks ratios are shown above the branches. The Manihot esculenta APX genes are shown in orange boxes, the Populus trichocarpa APX genes in brown boxes, the Arabidopsis thaliana APX genes in green boxes, and the Zea mays APX genes in red boxes. ‘AS’ beside a taxon name indicates that the gene’s mRNA is alternatively spliced. ‘1’ beside a taxon name indicates a duplicate gene that produces only the thylakoid APX. ‘2’ or ‘3’ beside a taxon name indicates a duplicate gene that produces only the stromal APX. ‘sg’ beside a taxon name indicates a single gene; whether its products undergo alternative splicing is unknown.

Figure 13. Expression and splicing of Carica papaya APX. Shown is an RT-PCR gel. Two bands, corresponding to sAPX and tAPX, were observed for the RT+ sample, and no band was observed for the RT- sample. This shows that the single Carica papaya APX gene has two alternative splice forms and is therefore not a candidate for sublocalization.

Figure 14. An APX amino acid sequence alignment of four species from the Malpighiales (Manihot esculenta, Ricinus communis, Populus trichocarpa and Linum usitatissimum).
The regions shown are the hydrophobic thylakoid-spanning domains encoded at the 3’ end of the genes. Manihot esculenta and Populus trichocarpa have duplicates in which one copy encodes a hydrophobic thylakoid-spanning domain at the 3’ end and the other does not. Ricinus communis has a single gene with alternative splicing, and its EST shows only the sAPX isoform. Linum usitatissimum has duplicate APX genes, but both undergo alternative splicing to produce the two isoforms. The Linum usitatissimum sequences labeled ‘A1’ and ‘A2’ are the two isoforms from one duplicate gene, and those labeled ‘B1’ and ‘B2’ are the two isoforms from the other duplicate gene.

Figure 15. Expression and splicing of Populus trichocarpa (cottonwood) APX. An RT-PCR gel is shown. Only one band was observed for each primer set, and each primer set was specific to one duplicated gene. This shows that neither gene has an alternatively spliced form.

Figure 16. Expression and splicing of Manihot esculenta (cassava) APX. An RT-PCR gel is shown. Only one band was observed for each primer set, and each primer set was specific to one duplicated gene. This shows that neither gene has an alternative splice form.

Figure 17. Expression and splicing of Ricinus communis APX. Two bands, corresponding to sAPX and tAPX, were observed for the RT+ sample, and no band was observed for the RT- sample. This shows that the single Ricinus communis APX gene has alternative splice forms and is therefore not a candidate for sublocalization.

Figure 18.
An APX amino acid sequence alignment of sequences from the Poaceae (Zea mays, Sorghum bicolor, Setaria italica, Brachypodium distachyon and Oryza sativa japonica) and a related non-Poaceae species, Dioscorea alata (Dioscoreaceae). The regions shown are the hydrophobic thylakoid-spanning domains encoded at the 3’ end of the genes. All the Poaceae species shown have three duplicates. One of them (labeled ‘1’) includes the hydrophobic thylakoid-spanning domain at the 3’ end of the gene, whereas the other two do not. Dioscorea alata has a single gene whose alternative splicing produces the tAPX isoform (labeled ‘AS A’) and the sAPX isoform (labeled ‘AS B’).

Figure 19. Gene expression from microarray assays and RT-PCR expression assays. Microarray expression data for each duplicate gene of interest were obtained from the AtGenExpress Visualization Tool (AVT) (Schmid et al. 2005). RT-PCR expression assays were performed using five organ types, which are listed above the corresponding columns. These were root, stem, leaf, flower, and either silique (Arabidopsis thaliana) or seed (Carica papaya and Vitis vinifera). The first rows show the Arabidopsis thaliana duplicates with restricted expression patterns identified from the microarray data. The second and third rows show the corresponding Carica papaya (CP) and Vitis vinifera (VV) orthologs. Plus signs indicate reactions with reverse transcriptase, and minus signs indicate reactions without reverse transcriptase (RT negative controls). Microarray data for LORELEI were not available from the AVT. Actin from each species was used as a positive control.

Figure 20.
Phylogenetic trees with Ka/Ks ratios for eight duplicated gene pairs from Arabidopsis and their orthologs from outgroup species. All eight trees include sequences from Arabidopsis thaliana, Carica papaya, Ricinus communis, Populus trichocarpa or Manihot esculenta, and Vitis vinifera. Branch lengths were generated with Codeml in PAML, and the scale bar indicates nucleotide substitutions per codon. Branch-wise Ka/Ks ratios are shown above the branches.

References

Achaz G, Coissac E, Viari A, Netter P. 2000. Analysis of intrachromosomal duplications in yeast Saccharomyces cerevisiae: a possible model for their origin. Mol Biol Evol. 17:1268–1275.
Andrade MA, O'Donoghue SI, Rost B. 1998. Adaptation of protein surfaces to subcellular location. J Mol Biol. 276:517–525.
Barbazuk WB, Fu Y, McGinnis KM. 2008. Genome-wide analyses of alternative splicing in plants: opportunities and challenges. Genome Res. 18:1381–1392.
Barker MS, Vogel H, Schranz ME. 2009. Paleopolyploidy in the Brassicales: analyses of the Cleome transcriptome elucidate the history of genome duplications in Arabidopsis and other Brassicales. Genome Biol Evol. 1:391–399.
Batistič O, Rehers M, Akerman A, Schlücking K, Steinhorst L, Yalovsky S, Kudla J. 2012. S-acylation-dependent association of the calcium sensor CBL2 with the vacuolar membrane is essential for proper abscisic acid responses. Cell Res. 22:1155–1168.
Bijur GN, Jope RS. 2003. Glycogen synthase kinase 3b is highly activated in nuclei and mitochondria. Neurochem. 14:2415–2419.
Björses P, Halonen M, Palvimo JJ, Kolmer M, Aaltonen J, Ellonen P, Perheentupa J, Ulmanen I, Peltonen L. 2000. Mutations in the AIRE gene: effects on subcellular location and transactivation function of the autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy protein. Am J Hum Genet. 66:378–392.
Blanc G, Hokamp K, Wolfe KH. 2003. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 13:137–144.
Blanc G, Wolfe KH. 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. 16:1667–1678.
Byun-McKay SA, Geeta R. 2007. Protein subcellular relocalization: a new perspective on the origin of novel genes. Trends Ecol Evol. 22:338–344.
Byun McKay SA, Geeta R, Duggan R, Carroll B, McKay SJ. 2009. Missing the subcellular target: a mechanism of eukaryotic gene evolution. In: Evolutionary Biology: Concept, Modeling, and Application. Chapter 10:175–183.
Capron A, Gourgues M, Neiva LS, Faure JE, Berger F, Pagnussat G, Krishnan A, Alvarez-Mejia C, Vielle-Calzada JP, Lee YR, Liu B, Sundaresan V. 2008. Maternal control of male-gamete delivery in Arabidopsis involves a putative GPI-anchored protein encoded by the LORELEI gene. Plant Cell. 20:3038–3049.
Cui L, Wall PK, Leebens-Mack JH, Lindsay BG, Soltis DE, Doyle JJ, Soltis PS, Carlson JE, Arumuganathan K, Barakat A, Albert VA, Ma H, dePamphilis CW. 2006. Widespread genome duplications throughout the history of flowering plants. Genome Res. 16:738–749.
de la Fuente van Bentem S, Vossen JH, Vermeer JE, de Vroomen MJ, Gadella TW Jr, Haring MA, Cornelissen BJ. 2003. The subcellular localization of plant protein phosphatase 5 isoforms is determined by alternative splicing. Plant Physiol. 133:702–712.
Demuth JP, Hahn MW. 2009. The life and death of gene families. Bioessays. 31:23–39.
Domínguez D, Montserrat-Sentís B, Virgós-Soler A, Guaita S, Grueso J, Porta M, Puig I, Baulida J, Francí C, García de Herreros A. 2003. Phosphorylation regulates the subcellular location and activity of the snail transcriptional repressor. Mol Cell Biol. 23:5078–5089.
Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–1797.
Farré D, Albà MM. 2010. Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. Mol Biol Evol. 27:325–335.
Freeling M. 2008. The evolutionary position of subfunctionalization, downgraded. Genome Dyn. 4:28–40.
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS. 2012. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40(D1):D1178–D1186.
Haberer G, Hindemitt T, Meyers BC, Mayer KF. 2004. Transcriptional similarities, dissimilarities, and conservation of cis-elements in duplicated genes of Arabidopsis. Plant Physiol. 136:3009–3022.
Hahn MW. 2009. Distinguishing among evolutionary models for the maintenance of gene duplicates. J Hered. 100:605–617.
Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 41:95–98.
Hastings PJ, Lupski JR, Rosenberg SM, Ira G. 2009. Mechanisms of change in gene copy number. Nat Rev Genet. 10:551–564.
Hwang I, Harper JF, Liang F, Sze H. 2000. Calmodulin activation of an endoplasmic reticulum-located calcium pump involves an interaction with the N-terminal autoinhibitory domain. Plant Physiol. 122:157–167.
Innan H, Kondrashov F. 2010. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 11:97–108.
Ishikawa T, Shigeoka S. 2008. Recent advances in ascorbate biosynthesis and the physiological significance of ascorbate peroxidase in photosynthesizing organisms. Biosci Biotechnol Biochem. 72:1143–1154.
Ishikawa T, Yoshimura K, Tamoi M, Takeda T, Shigeoka S. 1997. Alternative mRNA splicing of 3'-terminal exons generates ascorbate peroxidase isoenzymes in spinach (Spinacia oleracea) chloroplasts. Biochem J. 328:795–800.
Jiang W, Yu D. 2009. Arabidopsis WRKY2 transcription factor mediates seed germination and postgermination arrest of development by abscisic acid. BMC Plant Biol. 9:96.
Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, Soltis DE, Clifton SW, Schlarbaum SE, Schuster SC, Ma H, Leebens-Mack J, dePamphilis CW. 2011. Ancestral polyploidy in seed plants and angiosperms. Nature. 473:97–100.
Kangasjärvi S, Lepistö A, Hännikäinen K, Piippo M, Luomala EM, Aro EM, Rintamäki E. 2008. Diverse roles for chloroplast stromal and thylakoid-bound ascorbate peroxidases in plant stress responses. Biochem J. 412:275–285.
Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 428:617–624.
Kyte J, Doolittle RF. 1982. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 157(1):105–132.
Lakhssassi N, Doblas VG, Rosado A, del Valle AE, Posé D, Jimenez AJ, Castillo AG, Valpuesta V, Borsani O, Botella MA. 2012. The Arabidopsis tetratricopeptide thioredoxin-like gene family is required for osmotic stress tolerance and male sporogenesis. Plant Physiol. 158:1252–1266.
Lamberto I, Percudani R, Gatti R, Folli C, Petrucco S. 2010. Conserved alternative splicing of Arabidopsis transthyretin-like determines protein localization and S-allantoin synthesis in peroxisomes. Plant Cell. 22:1564–1574.
Li HJ, Liu NY, Shi DQ, Liu J, Yang WC. 2010. YAO is a nucleolar WD40-repeat protein critical for embryogenesis and gametogenesis in Arabidopsis. BMC Plant Biol. 10:169.
Liu LL, Ren HM, Chen LQ, Wang Y, Wu WH. 2013. A protein kinase, calcineurin B-like protein-interacting protein Kinase9, interacts with calcium sensor calcineurin B-like Protein3 and regulates potassium homeostasis under low-potassium stress in Arabidopsis. Plant Physiol. 161:266–277.
Liu SL, Baute GJ, Adams KL. 2011. Organ and cell type–specific complementary expression patterns and regulatory neofunctionalization between duplicated genes in Arabidopsis thaliana. Genome Biol Evol. 3:1419–1436.
Lucca N, León G. 2012. Arabidopsis ACA7, encoding a putative auto-regulated Ca(2+)-ATPase, is required for normal pollen development. Plant Cell Rep. 31(4):651–659.
Lukowitz W, Nickle TC, Meinke DW, Last RL, Conklin PL, Somerville CR. 2001. Arabidopsis cyt1 mutants are deficient in a mannose-1-phosphate guanylyltransferase and point to a requirement of N-linked glycosylation for cellulose biosynthesis. Proc Natl Acad Sci U S A. 98(5):2262–2267.
Mano S, Hayashi M, Nishimura M. 1999. Light regulates alternative splicing of hydroxypyruvate reductase in pumpkin. Plant J. 17(3):309–320.
Mano S, Yamaguchi K, Hayashi M, Nishimura M. 1997. Stromal and thylakoid-bound ascorbate peroxidases are produced by alternative splicing in pumpkin. FEBS Lett. 413:21–26.
Marques AC, Vinckenbosch N, Brawand D, Kaessmann H. 2008. Functional diversification of duplicate genes through subcellular adaptation of encoded proteins. Genome Biol. 9:R54.
Oakley TH, Ostman B, Wilson AC. 2006. Repression and loss of gene expression outpaces activation and gain in recently duplicated fly genes. Proc Natl Acad Sci U S A. 103:11637–11641.
Ohno S. 1970. Evolution by Gene Duplication. Allen and Unwin, London.
Page RD. 1996. TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci. 12:357–358.
Proost S, Van Bel M, Sterck L, Billiau K, Van Parys T, Van de Peer Y, Vandepoele K. 2009. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell. 21:3718–3731.
Puyaubert J, Denis L, Alban C. 2008. Dual targeting of Arabidopsis holocarboxylase synthetase1: a small upstream open reading frame regulates translation initiation and protein targeting. Plant Physiol. 146:478–491.
Qian W, Zhang J. 2009. Protein subcellular relocalization in the evolution of yeast singleton and duplicate genes. Genome Biol Evol. 1:198–204.
Reddy AS. 2007. Alternative splicing of pre-messenger RNAs in plants in the genomic era. Annu Rev Plant Biol. 58:267–294.
Rizzon C, Ponger L, Gaut BS. 2006. Striking similarities in the genomic distribution of tandemly arrayed genes in Arabidopsis and rice. PLoS Comput Biol. 2:e115.
Ronceret A, Gadea-Vacas J, Guilleminot J, Lincker F, Delorme V, Lahmy S, Pelletier G, Chabouté ME, Devic M. 2008. The first zygotic division in Arabidopsis requires de novo transcription of thymidylate kinase. Plant J. 53:778–789.
Rösti S, Denyer K. 2007. Two paralogous genes encoding small subunits of ADP-glucose pyrophosphorylase in maize, Bt2 and L2, replace the single alternatively spliced gene found in other cereal species. J Mol Evol. 65:316–327.
Schapire AL, Valpuesta V, Botella MA. 2006. TPR proteins in plant hormone signaling. Plant Signal Behav. 1(5):229–230.
Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Schölkopf B, Weigel D, Lohmann J. 2005. A gene expression map of Arabidopsis thaliana development. Nat Genet. 37:501–506.
Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, Xu D, Hellsten U, May GD, Yu Y, Sakurai T, Umezawa T, Bhattacharyya MK, Sandhu D, Valliyodan B, Lindquist E, Peto M, Grant D, Shu S, Goodstein D, Barry K, Futrell-Griggs M, Abernathy B, Du J, Tian Z, Zhu L, Gill N, Joshi T, Libault M, Sethuraman A, Zhang XC, Shinozaki K, Nguyen HT, Wing RA, Cregan P, Specht J, Grimwood J, Rokhsar D, Stacey G, Shoemaker RC, Jackson SA. 2010. Genome sequence of the palaeopolyploid soybean. Nature. 463:178–183.
Sémon M, Wolfe KH. 2007. Consequences of genome duplication. Curr Opin Genet Dev. 17:505–512.
Simpson JC, Pepperkok R. 2006. The subcellular localization of the mammalian proteome comes a fraction closer. Genome Biol. 7:222.1–222.3.
Sirover MA. 1999. New insights into an old protein: the functional diversity of mammalian glyceraldehyde-3-phosphate dehydrogenase. Biochim Biophys Acta. 1432:159–184.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22:2688–2690.
Sterck L, Rombauts S, Jansson S, Sterky F, Rouzé P, Van de Peer Y. 2005. EST data suggest that poplar is an ancient polyploid. New Phytol. 167:165–170.
Tang H, Bowers J, Wang X, Paterson AH. 2010. Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc Natl Acad Sci U S A. 107:472–477.
Tirosh I, Barkai N. 2007. Comparative analysis indicates regulatory neofunctionalization of yeast duplicates. Genome Biol. 8:R50.
Tsukamoto T, Qin Y, Huang Y, Dunatunga D, Palanivelu R. 2010. A role for LORELEI, a putative glycosylphosphatidylinositol-anchored protein, in Arabidopsis thaliana double fertilization and early seed development. Plant J. 62:571–588.
Ueda M, Zhang Z, Laux T. 2011. Transcriptional activation of Arabidopsis axis patterning genes WOX8/9 links zygote polarity to embryo development. Dev Cell. 20(2):264–270.
Van Bel M, Proost S, Wischnitzki E, Movahedi S, Scheerlinck C, Van de Peer Y, Vandepoele K. 2012. Dissecting plant genomes with the PLAZA comparative genomics platform. Plant Physiol. 158:590–600.
Wang X, Huang Y, Lavrov DV, Gu X. 2009. Comparative study of human mitochondrial proteome reveals extensive protein subcellular relocalization after gene duplications. BMC Evol Biol. 9:275.
Wood TE, Takebayashi N, Barker MS, Mayrose I, Greenspoon PB, Rieseberg LH. 2009. The frequency of polyploid speciation in vascular plants. Proc Natl Acad Sci U S A. 106:13875–13879.
Yan N, Doelling JH, Falbel TG, Durski AM, Vierstra RD. 2000. The ubiquitin-specific protease family from Arabidopsis. AtUBP1 and 2 are required for the resistance to the amino acid analog canavanine. Plant Physiol. 124:1828–1843.
Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24:1586–1591.
Zhang J. 2003. Evolution by gene duplication: an update. Trends Ecol Evol. 18:292–298.
Zhou L, Fu Y, Yang Z. 2009. A genome-wide functional characterization of Arabidopsis regulatory calcium sensors in pollen tubes. J Integr Plant Biol. 51(8):751–761.
Zhou R, Moshgabadi N, Adams KL. 2011. Extensive changes to alternative splicing patterns following allopolyploidy in natural and resynthesized polyploids. Proc Natl Acad Sci U S A. 108:16122–16127.
Zou C, Jiang W, Yu D. 2010. Male gametophyte-specific WRKY34 transcription factor mediates cold sensitivity of mature pollen in Arabidopsis. J Exp Bot. 61(14):3901–3914.
Zwickl D. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. PhD thesis, University of Texas at Austin, TX. "@en . 
"Thesis/Dissertation"@en . "2013-05"@en . "10.14288/1.0073688"@en . "eng"@en . "Botany"@en . "Vancouver : University of British Columbia Library"@en . "University of British Columbia"@en . "Attribution-NonCommercial-NoDerivatives 4.0 International"@en . "http://creativecommons.org/licenses/by-nc-nd/4.0/"@en . "Graduate"@en . "Fates of genes after duplication : sublocalization and regulatory neofunctionalization"@en . "Text"@en . "http://hdl.handle.net/2429/44264"@en .