UBC Theses and Dissertations

A new data driven framework for simulating mendelian randomization data

Tinti Tomio, Giuseppe


Mendelian randomization (MR) is a causal inference method that allows biostatisticians to leverage DNA measurements to study causal effects using only observational data. Recent advancements, including two-sample summary-level Mendelian randomization (TSSLMR) and the IEU OpenGWAS database as a data source, have lowered the barrier to conducting MR studies and opened the opportunity to mine causal effects. In the first part of the thesis, I show that there is a mismatch between the characteristics of modern TSSLMR data and the way that articles proposing popular TSSLMR models conduct their simulations. Next, I propose my solution: a data-driven simulation framework for MR data that aims to be realistic, interpretable, and easy to use thanks to a complementary R package implementation. As for the results, I show that models perform far better in literature-based simulations than in more realistic simulations based on my proposed framework. Lastly, I warn that the mismatch between simulated and real data, along with the obtained results, may lead researchers to have overoptimistic expectations about model performance in real applications.


Attribution-NonCommercial-NoDerivatives 4.0 International