UBC Theses and Dissertations

Improving white spruce genome annotation and generation of a chromosome-scale epigenetic map

Yucel, Irem

Abstract

White spruce (Picea glauca; Pinaceae) is a conifer native to the northern temperate and boreal forests of North America. It is a resilient tree that tolerates variations in climatic conditions and, as a result, is often used as a model for studying the genetic makeup and adaptability of conifers. A previous genome assembly of P. glauca, generated using short- and linked-read sequencing data, had over 2.4 million scaffolds and an NG50 length of 131 kb (NG50 is a measure of assembly contiguity: at least half of the expected genome size is contained in pieces at least the NG50 length). Here I produce and report on an improved assembly of P. glauca, built using long nanopore sequencing reads and scaffolded with linked-read sequencing data, which is one of the most contiguous (NG50 length = 2.3 Mbp) and gene-complete (56.1% complete BUSCO genes in the Embryophyta lineage) genomes of its size (~20 Gb). The new assembly was annotated using BRAKER2, which predicted 68,796 genes with a mean length of 18 kb. Repeat masking indicates that approximately 90% of the white spruce genome consists of repeat sequences, the majority of which are long terminal repeats (LTRs). Among the other sequenced conifer species, phylogenetic analysis identifies interior, Engelmann and Sitka spruces as the closest relatives of white spruce. Orthogroup analysis recovered 2,024 genes found only in white spruce and not in the other spruces or pines I analyzed; these genes are enriched in Gene Ontology (GO) terms related to biotic and abiotic stress responses. I used the epigenetic information inherent in the long-read sequencing data to conduct a methylome analysis with NanoMethPhase, identifying a total of 320,946,144 CpG sites and 12,698 quality-filtered allelic differentially methylated regions (DMRs) across the genome.
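To make the NG50 definition concrete, here is a minimal sketch of how the statistic can be computed from a list of contig or scaffold lengths and an expected genome size. The function name `ng50` and the toy lengths are hypothetical illustrations, not the tooling used in this thesis.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the largest length L such that sequences of length >= L together
    cover at least half of the *expected* genome size (unlike N50, which uses
    half of the total assembly length). Returns 0 if the whole assembly covers
    less than half of the expected genome size.
    """
    half = genome_size / 2
    covered = 0
    # Walk from the longest sequence to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0


# Toy assembly of five pieces against an expected genome size of 200:
# cumulative sums are 50, 90, 120, ... so half (100) is reached at length 30.
print(ng50([50, 40, 30, 20, 10], 200))  # 30
```

Using the expected genome size rather than the assembly total makes NG50 comparable across assemblies of the same species, since an incomplete assembly cannot inflate its score by being smaller.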
A total of 1,930 annotated genes intersect these allelic DMRs, and this overlapping subset is enriched in GO terms related to plant responses to external damage and pathogen infection. The updated white spruce genome assembly, with the detailed annotation and epigenetic map described here, provides a valuable resource for furthering conifer research.
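The gene–DMR intersection step can be illustrated with a minimal sketch that counts genes overlapping at least one DMR. It assumes BED-style half-open coordinates `[start, end)` and simple in-memory inputs; the function names and toy data are hypothetical, and in practice a dedicated tool such as `bedtools intersect` is commonly used for this kind of query.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching [start, end) intervals into sorted disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def genes_overlapping_dmrs(genes, dmrs):
    """Return the IDs of genes overlapping at least one DMR.

    genes: dict of gene_id -> (chrom, start, end)
    dmrs:  list of (chrom, start, end)
    All coordinates are half-open [start, end), as in BED files (an assumption).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in dmrs:
        by_chrom[chrom].append((start, end))
    merged = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    # Disjoint sorted intervals have increasing ends, so they can be binary-searched.
    ends = {c: [end for _, end in iv] for c, iv in merged.items()}

    hits = set()
    for gene_id, (chrom, start, end) in genes.items():
        if chrom not in merged:
            continue
        # First DMR interval ending after the gene start; it overlaps the gene
        # only if it also starts before the gene end.
        i = bisect_right(ends[chrom], start)
        if i < len(merged[chrom]) and merged[chrom][i][0] < end:
            hits.add(gene_id)
    return hits


genes = {"g1": ("chr1", 100, 200), "g2": ("chr1", 300, 400), "g3": ("chr2", 0, 50)}
dmrs = [("chr1", 150, 160), ("chr1", 400, 450), ("chr3", 0, 10)]
print(genes_overlapping_dmrs(genes, dmrs))  # {'g1'}
```

Here `g2` ends exactly where a DMR begins, so under half-open coordinates it does not count as overlapping; a different coordinate convention would change such boundary cases.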

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International