Statistical and experimental methods for causal inference at complex trait associated loci

BIRS Workshop Lecture Videos

Statistical and experimental methods for causal inference at complex trait associated loci Brown, Christopher

Description

Genome-wide association studies (GWAS) have identified thousands of loci that contribute to risk for complex diseases. The majority of the heritability of complex disease risk lies within the noncoding regions of the genome. This has led to the hypothesis that the causal variants at GWAS-associated loci lead to changes in local gene expression. As a result of linkage disequilibrium and the fact that cis-regulatory elements (CREs) may target genes over large distances, it is often unclear which variant or gene affects disease risk. However, identifying these variants and genes will improve understanding of disease etiology and reveal targets for novel therapeutic development. Recent work from efforts such as GTEx has identified genetic variation associated with gene expression variation for essentially every gene. Despite this wealth of data, the characterization of causal mechanisms at complex trait-associated loci remains a significant challenge. To address this challenge, we have developed and applied high-throughput computational and experimental approaches to identify candidate disease genes and the functional regulatory variants that mediate disease risk. We have focused on cardiovascular disease (CVD) and molecular trait mapping in the liver as model systems. Existing studies have focused on easily ascertained cell types, while the liver, which plays a critical role in regulating cholesterol and lipid metabolism, and where many CVD-associated variants likely affect gene expression, has remained understudied. We have deeply phenotyped liver biopsies and iPSC-derived hepatocytes from more than 400 donors, collecting RNA-seq along with histone modification and transcription factor ChIP-seq data. We have used these data to identify thousands of genetic variants associated with allele-specific transcription factor binding, histone modification, gene expression, and splicing.
Comparison to data from the GTEx and Roadmap Epigenomics projects demonstrates that many of these associations are specific to the liver. We demonstrate that multi-phenotype molecular trait mapping improves statistical power to detect associations and results in improved resolution at identified loci. We have integrated these data with CVD GWAS data using a novel multi-phenotype causal inference framework based on Mendelian randomization to predict the precise variants, CREs, and genes that underlie CVD risk. Using a combination of massively parallel reporter assays, genome-edited stem cells, CRISPR interference, and in vivo mouse models, we establish rs2277862-CPNE1, rs10889356-ANGPTL3, rs10889356-DOCK7, and rs10872142-FRK as causal SNP-gene sets for CVD. These results demonstrate that a molecular trait mapping framework can rapidly identify causal genes and variants contributing to complex human traits and that, at many GWAS loci, candidate genes have been falsely implicated based on proximity to the lead SNP.
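
For readers unfamiliar with Mendelian randomization, the simplest single-variant form is the Wald ratio: the variant's effect on the trait divided by its effect on the molecular phenotype (for example, gene expression). The sketch below illustrates only that textbook estimator, with a second-order delta-method standard error. It is not the multi-phenotype framework described in the talk, which the abstract does not specify. The function name, field names, and the numbers in the usage lines are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class WaldEstimate:
    effect: float  # estimated effect of the exposure on the outcome
    se: float      # delta-method standard error of that estimate
    z: float       # effect divided by its standard error


def wald_ratio(beta_exposure, se_exposure, beta_outcome, se_outcome):
    """Single-instrument Mendelian randomization (Wald ratio).

    beta_exposure: variant effect on the exposure (e.g. an eQTL effect size)
    beta_outcome:  variant effect on the outcome (e.g. a GWAS effect size)
    Assumes the two estimates are independent and that the variant is a
    valid instrument; neither assumption is checked here.
    """
    if beta_exposure == 0:
        raise ValueError("variant has no effect on the exposure")
    effect = beta_outcome / beta_exposure
    # Second-order delta-method variance of a ratio of independent estimates.
    variance = (se_outcome ** 2) / beta_exposure ** 2 \
        + (beta_outcome ** 2) * (se_exposure ** 2) / beta_exposure ** 4
    se = math.sqrt(variance)
    return WaldEstimate(effect=effect, se=se, z=effect / se)


# Hypothetical summary statistics, not taken from the talk.
estimate = wald_ratio(beta_exposure=0.5, se_exposure=0.05,
                      beta_outcome=0.1, se_outcome=0.02)
print(estimate)
```

A single-variant ratio like this cannot by itself separate a causal variant from others in linkage disequilibrium with it, which is one reason the abstract pairs statistical inference with experimental validation.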

Item Metadata

Title	Statistical and experimental methods for causal inference at complex trait associated loci
Creator	Brown, Christopher
Publisher	Banff International Research Station for Mathematical Innovation and Discovery
Date Issued	2017-03-30T09:48
Extent	19 minutes
Subject	Mathematics; Statistics; Biology and other natural sciences
Type	Moving Image
File Format	video/mp4
Language	eng
Notes	Author affiliation: University of Pennsylvania
Series	BIRS Workshop Lecture Videos (Banff, Alta)
Date Available	2017-09-26
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0355795
URI	http://hdl.handle.net/2429/63127
Affiliation	Non UBC
Peer Review Status	Unreviewed
Scholarly Level	Faculty
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Item Media

201703300948-Brown_lrv.mp4 -- 59.11MB

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International