BIRS Workshop Lecture Videos
Systematic Discovery of Conservation States for Single-Nucleotide Annotation of the Human Genome
Ernst, Jason
Genome-wide association studies have identified a large number of non-coding loci in the human genome that are associated with disease but whose biological significance is poorly understood. Additional annotations, largely based on either functional genomics or comparative genomics data, have been used to gain insight into such loci and potentially to prioritize likely causal variants among those in linkage disequilibrium. A widely used representation of functional genomics data is chromatin states produced by methods such as ChromHMM, which provide cell-type-specific annotations based on the combinatorial and spatial patterns in epigenomic data. Comparative genomics data provide complementary information, as they do not depend on having data from the appropriate cell or tissue type and can provide single-nucleotide resolution. Recent analyses have suggested that constrained elements are among the genomic annotations most enriched for disease heritability. However, the currently widely used representations of conservation information focus on either binary calls or a single univariate score from phylogenetic models, and thus do not capture potentially valuable information contained in multi-species alignments of an increasing number of available species. Here we develop a novel method based on a multivariate hidden Markov model, ConsHMM, to annotate the human genome at single-nucleotide resolution into a large number of different conservation states based on the combinatorial patterns of which species align to, and which match, the human reference genome within a 100-way multi-species alignment. The various conservation states show distinct enrichment properties for other genomic annotations such as regions of open chromatin, CpG islands, transcription start sites, and exons. Using our conservation states, we can separate the subsets of existing constrained elements that show enrichments for disease-associated heritability and for putative regulatory regions identified by functional genomics data from those that do not, as well as identify additional subsets of bases showing these enrichments outside of the constrained elements.
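To make concrete the kind of input the abstract describes (for each human base, which species align and which of those match the human reference), the following is a minimal Python sketch of one possible encoding. It is an illustration only, not the ConsHMM implementation: the function name `encode_alignment`, the integer codes, the use of `-` as the only "not aligned" symbol, and the assumption that the alignment has already been projected onto the human sequence (no gaps in the human row) are all hypothetical choices made for this example.

```python
from typing import Dict, List

# Hypothetical integer codes for the per-species observation at one human base.
NOT_ALIGNED = 0  # the species has no base aligned here (shown as '-')
MISMATCH = 1     # the species aligns, but its base differs from the human base
MATCH = 2        # the species aligns and its base equals the human base


def encode_alignment(human: str, others: Dict[str, str]) -> List[List[int]]:
    """Return one observation vector per human base.

    Each vector has one entry per non-human species, in sorted species-name
    order. Assumes every row has the same length as the human row and that
    the human row contains no gaps.
    """
    species = sorted(others)
    for name in species:
        if len(others[name]) != len(human):
            raise ValueError(f"row for {name} does not match the human row length")

    observations: List[List[int]] = []
    for i, human_base in enumerate(human):
        row = []
        for name in species:
            base = others[name][i]
            if base == "-":
                row.append(NOT_ALIGNED)
            elif base.upper() == human_base.upper():
                row.append(MATCH)
            else:
                row.append(MISMATCH)
        observations.append(row)
    return observations


# Toy two-species alignment (made-up sequences, for illustration only).
toy = encode_alignment("ACGT", {"mouse": "ACTT", "chicken": "A--T"})
for position, vector in enumerate(toy):
    print(position, vector)
```

In a setup like this, each per-base vector would serve as the multivariate observation emitted by a hidden state, and the hidden state assigned to a base would be its conservation state. With 100 species the vectors would simply be longer; the encoding itself would not change.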
Attribution-NonCommercial-NoDerivatives 4.0 International