Short-read DNA sequence alignment with custom designed FPGA-based hardware

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Short-read DNA sequence alignment with custom designed FPGA-based hardware Hall, Adam

Abstract

The alignment of short DNA read sequencing data to a human reference genome sequence has become a standard step in the analysis pipeline for short DNA read sequence data. As the rate at which short read DNA sequence data is being produced doubles every 5 months, analysis of this data in a computationally efficient way is becoming increasingly important. We demonstrate how we can exploit the ``embarrassingly parallel'' property of short read sequence alignment in custom-designed hardware in FPGA’s. Hardware is chosen, a system is designed, and this system is implemented. My FPGA-based hit finder was demonstrated to produce correct hit results. The performance of this single FPGA implementation was demonstrated to be 71,000 seed hits found per hour on a human genome sized reference sequence. The implementation was demonstrated to produce identical results to the hit finder stage of the MAQ aligner. We demonstrate that the price/performance of this sliding-window FPGA aligner (approximately ~355 seeds/hr/$) compares favorably to the price/performance of sliding-window software aligners (approximately ~67.5 seeds/hr/$ for MAQ). However, software aligners which are based on the superior Burrows-Wheeler alignment algorithm still have a significant price/performance advantage over the FPGA-based approach (approximately ~7,200 seeds/hr/$). We predict that as chips continue to increase in size due to Moore’s Law and computation is performed in high-density cloud-computing datacenters the FPGA-based approach will become preferable to current software aligners.

Item Metadata

Title	Short-read DNA sequence alignment with custom designed FPGA-based hardware
Creator	Hall, Adam
Publisher	University of British Columbia
Date Issued	2010
Description	The alignment of short DNA read sequencing data to a human reference genome sequence has become a standard step in the analysis pipeline for short DNA read sequence data. As the rate at which short read DNA sequence data is being produced doubles every 5 months, analysis of this data in a computationally efficient way is becoming increasingly important. We demonstrate how we can exploit the ``embarrassingly parallel'' property of short read sequence alignment in custom-designed hardware in FPGA’s. Hardware is chosen, a system is designed, and this system is implemented. My FPGA-based hit finder was demonstrated to produce correct hit results. The performance of this single FPGA implementation was demonstrated to be 71,000 seed hits found per hour on a human genome sized reference sequence. The implementation was demonstrated to produce identical results to the hit finder stage of the MAQ aligner. We demonstrate that the price/performance of this sliding-window FPGA aligner (approximately ~355 seeds/hr/$) compares favorably to the price/performance of sliding-window software aligners (approximately ~67.5 seeds/hr/$ for MAQ). However, software aligners which are based on the superior Burrows-Wheeler alignment algorithm still have a significant price/performance advantage over the FPGA-based approach (approximately ~7,200 seeds/hr/$). We predict that as chips continue to increase in size due to Moore’s Law and computation is performed in high-density cloud-computing datacenters the FPGA-based approach will become preferable to current software aligners.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2010-11-18
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0071441
URI	http://hdl.handle.net/2429/30008
Degree	Master of Science - MSc
Program	Bioinformatics
Affiliation	Science, Faculty of
Degree Grantor	University of British Columbia
Graduation Date	2011-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Short-read DNA sequence alignment with custom designed FPGA-based hardware Hall, Adam

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights