UBC Library and Archives

Open Research Data 2010

Full Text

Open research data
Heather Piwowar
DataONE postdoc with Dryad and NESCent, UBC
@researchremix
OA week 2010, University of British Columbia

#1 It matters

Images:
http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
http://www.flickr.com/photos/jsmjr/62443357/
http://www.flickr.com/photos/camilleharrington/3587294608/
http://www.flickr.com/photos/rkuhnau/3318245976/
http://www.flickr.com/photos/conformpdx/1796399674/
http://www.flickr.com/photos/rkuhnau/3317418699/
http://www.flickr.com/photos/zemlinki/261617721/
http://www.flickr.com/photos/tracenmatt/3020786491/
http://www.flickr.com/photos/the-o/2078239333/
http://www.flickr.com/photos/75166820@N00/5318468/

#2 Wayfinding + progress

Image: http://www.flickr.com/photos/paulhami/1020538523//

Which data? Where? With whom? When? Under what terms?

- Find
- Organize
- Document
- Deidentify
- Format
- Ask
- Submit
- Answer questions
- Worry about mistakes being found
- Worry about data being misinterpreted
- Worry about being scooped
- Forgo money and IP and prestige???

Not very motivating.

Images:
http://www.flickr.com/photos/tonivc/2283676770/
http://www.flickr.com/photos/johnnyvulkan/381941233/

a) policies + expectations
- NSF
- Joint Data Archiving Policy
- BioMed Central
- PLoS

b) repositories
- datatype-based
- institution-based
- discipline-based
- journal-based

c) standards
- data licenses
- data citation
- IDs for datasets, people, entities

d) part of something bigger
- open government data
- citizen science
- supplemental materials
- dataset-based usage metrics
- awards, recognition

#3 Is it working?

http://www.genome.jp/en/db_growth.html

Lots of data sharing! But how much isn’t shared? What isn’t shared? Who isn’t sharing it? Why not?
What can we do about it? How much does it matter?

You cannot manage what you do not measure
quote: Lord Kelvin

Images:
http://www.flickr.com/photos/archeon/2941655917/
http://www.flickr.com/photos/ryanr/142455033/

Why is it important? Are we sure?

Errors.
Gore et al. 1977, Kantoer and Taylor 1994, McGuigan 1995, Hurlbert and White 1993
- More than half of all papers contain errors
- 5-10% contain errors that change the conclusions

Ok, let’s share on request.

Doesn’t work.

[Bar chart, scale 0% to 40%]
- self-reported denying a request in last 3 years
- trainees self-reported denying a request
- been denied access to data, materials, code
- authors “not able to retrieve raw data”
- not willing to release data
Campbell et al. JAMA. 2002. Kyzas et al. J Natl Cancer Inst. 2005. Vogeli et al. Acad Med. 2006. Reidpath et al. Bioethics 2001.

Don’t get the email
Evangelou et al. FASEB J. 2006. Wren. Bioinformatics 2008. Wren et al. EMBO Rep 2006.

Say no
Hedstrom. Society of Am Archivists Ann Meeting. 2008.
[Bar chart, scale 0% to 50%]
- want to publish more papers first
- want exclusive use
- ensure data confidentiality control
- avoid cost of preparation

Ask why
Reidpath et al. Bioethics 2001.
“Before I send you the data could I ask what you want it for?”
“Can you be more explicit, please, about the analyses you have in mind and what you plan to do with them?”
“We'll have to discuss your request with the other coauthors. Before we do that, I'd like to know your proposed analysis plan.”
“We are not finished using the data, but when we are finished with it, we would be open to requests for the data.”
“Any use of the data other than for the specific purpose laid down in the contract of collaboration is effectively ruled out.”

Not efficient.

Not fair.
Campbell et al. 2000
Not random:
- young
- productive

Has real costs.
Survey of doctoral students and postdocs: 28-50% reported that withholding had negative effects:
• hurt progress of their research,
• hurt rate of discovery in their lab/research group,
• hurt quality of their relationships with academic scientists,
• hurt quality of their education,
• hurt level of communication in their lab/research group.
Vogeli et al. Acad Med. 2006 Feb; 81(2):128-36

Ok, then on a website?

No. URLs stop working.
Evangelou et al. FASEB J. 2006. Wren. Bioinformatics 2008. Wren et al. EMBO Rep 2006.

Ok, in a repository?

Lots of data sharing!
http://www.genome.jp/en/db_growth.html

Images:
http://www.flickr.com/photos/g_kat26/4255119413/
http://www.flickr.com/photos/jima/606588905/

Combined, these full-text portals reach 85% of the articles available through U of Pittsburgh library subscriptions.

microarray data
http://en.wikipedia.org/wiki/DNA_microarray
http://en.wikipedia.org/wiki/Image:Heatmap.png
http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG

11,603 studies that created gene expression microarray data

Is research data shared after publication?

Funder
- funded by NIH?
- size of grant
- sharing plan req’d?
- funded by non-NIH?

Journal
- impact factor
- strength of policy
- open access?
- number of microarray studies published

Investigator
- years since first paper
- # pubs
- # citations
- previously shared?
- previously reused?
- gender

Institution
- sector
- size
- impact rank
- country

Study
- humans? mice? plants?
- cancer?
- clinical trial?
- number of authors
- year

“An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …”
http://www.nature.com/authors/editorial_policies/availability.html
http://www.nature.com/nature/journal/v453/n7197/index.html

- journal data sharing policy
- journal rank
- institution rank
  Yu et al. BMC Medical Informatics and Decision Making (2007) vol. 7 pp. 17
- funding level
  PubMed grant lists + NIH grant details
- study type
- author gender
- and so on...

124 variables
11,603 studies

25% had links from datasets in databases

[Figure: Proportion of articles with shared datasets, by year. x-axis: Year article published, 2000 to 2009. y-axis: Proportion of articles with datasets found in GEO or ArrayExpress, 0.05 to 0.35.]

Across time

What can we do about it?

Funder policies. 19%
Piwowar and Chapman. Journal of Informetrics 2010

What can we do about it?

Journal policies.
We looked at data sharing policies within Instruction to Author statements of 70 journals, as they apply to gene expression microarray data.
Piwowar and Chapman. ELPUB 2008

Strength of data sharing policies:
- No applicable policy (43%)
- Weak policy (24%): should, recommend, request; or must, but without requiring database accession number
- Strong policy (33%): must, required, condition of publication; requires database accession number

High-impact journals tend to have a strong data-sharing policy.
Articles published in journals with a strong data-sharing policy are more likely to have publicly available datasets.

What can we do about it?
Learn
• Learn from those who do it well
• Focus on places that need it

[Figure: Journals (Physiological Genomics). Bar chart of proportion of datasets shared (0.0 to 1.0), one bar per journal.]

[Figure: Institutions (Stanford). Bar chart of proportion of datasets shared (0.0 to 1.0), one bar per institution.]

[Figure: Proportion of datasets shared (0.0 to 1.0) against institution rank.]

[Figure: Multivariate nonlinear regressions with interactions. x-axis: odds ratio, 0.25 to 8.00. Factors shown:]
- Has journal policy
- Count of R01 & other NIH grants
- Authors prev GEOAE sharing & OA & microarray creation
- NO K funding or P funding
- Institution high citations & collaboration
- Journal impact
- Journal policy consequences & long halflife
- NOT animals or mice
- Institution is government & NOT higher ed
- Last author num prev pubs & first year pub
- Large NIH grant
- Humans & cancer
- NO geo reuse + YES high institution output
- First author num prev pubs & first year pub

[Figure: Multivariate nonlinear regression with interactions. x-axis: odds ratio, 0.25 to 4.00. Factors shown:]
- OA journal & previous GEO-AE sharing
- Amount of NIH funding
- Journal impact factor and policy
- Higher Ed in USA
- Cancer & humans

Carrot?
http://www.flickr.com/photos/sunrise/35819369/

Currency of value? Citations.

$50!
Diamond, Arthur M. What is a Citation Worth? The Journal of Human Resources (1986) vol. 21 (2) pp. 200-215

dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003)
citations: ISI Web of Science Citation index, citations from 2004-2005
data sharing locations: Publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine
statistics: Multivariate linear regression

Note: log scale
~70%

Next?
http://www.flickr.com/photos/gatewaystreets/3838452287/

Impact of JDAP
Abadie et al. Journal of the American Statistical Association 2010

Reuse.
http://www.flickr.com/photos/boitabulle/3668162701/
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png

#4 We are the culture. Let’s do it.

Images:
http://www.flickr.com/photos/joellevand/279468607/
http://www.flickr.com/photos/huzzahvintage/4577075021/

a) in our communities
- strengthening policies:
- journal, conference, institutional
- decision-makers
- role-models and educators

b) in our tools
- measure opinions
- measure use
- be transparent!

c) with our data
- share it.
- ugly? incomplete? strange?

“Flawed, but out there” is a million times better than “perfect, but unattainable”
http://sciblogs.co.nz/seeing-data/2010/10/12/the-zen-of-open-data/

“Does anyone want your data? That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay. Your data, too, may simply be awaiting an effective matchmaker.”
Got data? Nature Neuroscience (2007)

I post my data, code, and statistical scripts: http://researchremix.org
Share yours too!
http://www.flickr.com/photos/myklroventine/892446624/

More info?
• OATP oa.data tag on Connotea, Twitter
• FriendFeed
• Mendeley “data sharing” group
• @researchremix
  piwowar@zoology.ubc.ca

thank you
Todd Vision, Michael Whitlock, Wendy Chapman
The open science online community and those who release their articles, datasets and photos openly
http://www.flickr.com/photos/youraddresshere/6649228/
