Summarizing tens of thousands of RNA-seq samples: themes and lessons

BIRS Workshop Lecture Videos

Langmead, Ben

Description

The Sequence Read Archive contains RNA-seq data for over 450K samples, including over 140K from humans. Large-scale projects like GTEx and ICGC are generating RNA-seq data on many thousands of samples. Such huge datasets are valuable but unwieldy for typical researchers. I will describe work toward the goal of making it easy for researchers to use the archived RNA-seq data available today. I will highlight Rail-RNA (http://rail.bio), its dbGaP-protected version (http://docs.rail.bio/dbgap/), the recount resource (https://jhubiostatistics.shinyapps.io/recount/), and the Snaptron service/API (http://snaptron.cs.jhu.edu). Besides showcasing these tools and resources, I'll expound three themes: (a) public data is valuable but not easy to use, and computationalists should attack this problem; (b) scalability is not just about scaling software to be distributed and multi-threaded, but is also about making the best use of many datasets at once; (c) "strategically unplugging" from gene annotations can lead to clearer statements about splicing and differential expression.

Item Metadata

Title	Summarizing tens of thousands of RNA-seq samples: themes and lessons
Creator	Langmead, Ben
Publisher	Banff International Research Station for Mathematical Innovation and Discovery
Date Issued	2017-03-30T10:37
Extent	26.0
Subject	Mathematics; Statistics; Biology and other natural sciences
Type	Moving Image
File Format	video/mp4
Language	eng
Notes	Author affiliation: Johns Hopkins University
Series	BIRS Workshop Lecture Videos (Banff, Alta)
Date Available	2019-03-12
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0376771
URI	http://hdl.handle.net/2429/68609
Affiliation	Non UBC
Peer Review Status	Unreviewed
Scholarly Level	Faculty
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Item Media

201703301037-Langmead_lrv.mp4 -- 75.65MB

