Harmonization of SNP identifiers

UBC Theses and Dissertations

Tripathi, Chhavi

Abstract

Data generation has been, and will remain, crucial to scientific discovery, but our ability to analyze data has not kept pace with our ability to generate it. It is therefore important to direct effort towards making sense of the data already produced. This thesis investigates the harmonization of single nucleotide polymorphism (SNP) identifiers, which means assigning the same identifier to a SNP every time it occurs. Harmonized SNP identifiers would make genetic data from different datasets comparable, which in turn would allow existing datasets in public repositories to be re-purposed. Genetic data helps associate genetic alterations with disease and health, and it is being generated at a rate that outpaces Moore’s law. To make generated data available to researchers worldwide, public repositories such as the UK Biobank, the European Genome-phenome Archive (EGA), and the database of Genotypes and Phenotypes (dbGaP) have been set up to host data and disseminate it according to established protocols. The data in these repositories was collected at different time points, generated using different genotyping arrays, and submitted by researchers all over the world, which leads to a large degree of heterogeneity. To make the most of the data, it needs to be harmonized. The greater the overlap between two datasets, the easier it is to harmonize them, so assessing the extent to which datasets can be harmonized requires computing the overlap between them. SNPs are the features of greatest interest in genetic datasets. Because a SNP may have numerous kinds of identifiers, determining the number and identity of overlapping SNPs between datasets is challenging, and the complexity increases with the number of comparisons (SNPs and datasets). No tool is available to perform on-the-fly harmonization of SNP identifiers.
The SNP Overlap Tool (SPOT) was designed to harmonize SNP identifiers using SNP chromosomal locations and then calculate the overlap of SNPs between two datasets. It is a web-based tool written in Java.
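
The idea of matching SNPs by chromosomal location rather than by identifier can be illustrated with a minimal sketch. This is not SPOT's code (SPOT is written in Java, and its internals are not described in the abstract); the sketch uses Python for brevity. The record layout, the function names, and the sample identifiers and positions are all hypothetical. It also assumes both datasets report positions against the same reference genome build.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (identifier, chromosome, position).
SnpRecord = Tuple[str, str, int]
Locus = Tuple[str, int]


def normalize_chromosome(chrom: str) -> str:
    """Strip an optional 'chr' prefix and upper-case, so 'chr1' and '1' match."""
    chrom = chrom.strip()
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return chrom.upper()


def index_by_locus(records: Iterable[SnpRecord]) -> Dict[Locus, Set[str]]:
    """Group identifiers by (chromosome, position), ignoring identifier style."""
    index: Dict[Locus, Set[str]] = {}
    for identifier, chrom, position in records:
        locus = (normalize_chromosome(chrom), position)
        index.setdefault(locus, set()).add(identifier)
    return index


def find_overlap(
    dataset_a: Iterable[SnpRecord], dataset_b: Iterable[SnpRecord]
) -> Dict[Locus, Tuple[Set[str], Set[str]]]:
    """Return loci present in both datasets, with each dataset's identifiers.

    Assumes both datasets use the same genome build (an assumption of this
    sketch, not a statement about how SPOT handles builds).
    """
    index_a = index_by_locus(dataset_a)
    index_b = index_by_locus(dataset_b)
    shared = index_a.keys() & index_b.keys()
    return {locus: (index_a[locus], index_b[locus]) for locus in shared}


if __name__ == "__main__":
    # Made-up identifiers and positions, for illustration only.
    a = [("rs_example_1", "chr1", 1000), ("rs_example_2", "2", 2000)]
    b = [("probe_A", "1", 1000), ("probe_B", "X", 500)]
    for locus, (ids_a, ids_b) in find_overlap(a, b).items():
        print(locus, sorted(ids_a), sorted(ids_b))
```

Keying on location means two records with different identifier styles are treated as the same SNP when they share a chromosome and position, which is the comparison the abstract describes.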

Item Metadata

Title	Harmonization of SNP identifiers
Creator	Tripathi, Chhavi
Publisher	University of British Columbia
Date Issued	2018
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2018-04-16
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0365716
URI	http://hdl.handle.net/2429/65417
Degree (Theses)	Master of Science - MSc
Program (Theses)	Experimental Medicine
Affiliation	Medicine, Faculty of; Medicine, Department of
Degree Grantor	University of British Columbia
Graduation Date	2018-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
