UBC Theses and Dissertations

Short-read DNA sequence alignment with custom designed FPGA-based hardware

Hall, Adam

Abstract

Aligning short DNA reads to a human reference genome sequence has become a standard step in the analysis pipeline for short-read sequencing data. As the rate at which short-read DNA sequence data is produced doubles every 5 months, analyzing this data in a computationally efficient way is becoming increasingly important. We demonstrate how the "embarrassingly parallel" property of short-read sequence alignment can be exploited in custom-designed hardware on FPGAs. Hardware is chosen, a system is designed, and this system is implemented. The FPGA-based hit finder was demonstrated to produce correct hit results. The performance of this single-FPGA implementation was demonstrated to be 71,000 seed hits found per hour on a human-genome-sized reference sequence. The implementation was demonstrated to produce results identical to those of the hit finder stage of the MAQ aligner. We demonstrate that the price/performance of this sliding-window FPGA aligner (approximately 355 seeds/hr/$) compares favorably to that of sliding-window software aligners (approximately 67.5 seeds/hr/$ for MAQ). However, software aligners based on the superior Burrows-Wheeler alignment algorithm still have a significant price/performance advantage over the FPGA-based approach (approximately 7,200 seeds/hr/$). We predict that as chips continue to increase in size due to Moore's Law and computation is performed in high-density cloud-computing datacenters, the FPGA-based approach will become preferable to current software aligners.
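
To put the three price/performance figures quoted in the abstract side by side, the following minimal Python sketch computes the ratios between them. It uses only the approximate values stated above; the dictionary keys and the `ratio` helper are illustrative names and are not taken from the thesis.

```python
# Approximate price/performance figures quoted in the abstract,
# in seeds per hour per dollar. The labels are illustrative only.
PRICE_PERFORMANCE = {
    "FPGA sliding-window": 355.0,
    "MAQ (software sliding-window)": 67.5,
    "Burrows-Wheeler software": 7200.0,
}


def ratio(a: str, b: str) -> float:
    """Return how many times higher a's price/performance is than b's."""
    return PRICE_PERFORMANCE[a] / PRICE_PERFORMANCE[b]


fpga_vs_maq = ratio("FPGA sliding-window", "MAQ (software sliding-window)")
bwt_vs_fpga = ratio("Burrows-Wheeler software", "FPGA sliding-window")

print(f"FPGA vs MAQ: {fpga_vs_maq:.1f}x")
print(f"Burrows-Wheeler vs FPGA: {bwt_vs_fpga:.1f}x")
```

On these figures the FPGA aligner comes out roughly 5 times ahead of MAQ, while the Burrows-Wheeler software aligners are roughly 20 times ahead of the FPGA aligner. Since the inputs are themselves approximate, the ratios should be read as rough estimates.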

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International