UBC Library and Archives

Open Research Data Piwowar, Heather 2010

Full Text

Open research data
Heather Piwowar
DataONE postdoc with Dryad and NESCent, UBC
@researchremix
OA week 2010, University of British Columbia

#1 It matters

http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
http://www.flickr.com/photos/jsmjr/62443357/
http://www.flickr.com/photos/camilleharrington/3587294608/
http://www.flickr.com/photos/rkuhnau/3318245976/
http://www.flickr.com/photos/conformpdx/1796399674/
http://www.flickr.com/photos/rkuhnau/3317418699/
http://www.flickr.com/photos/zemlinki/261617721/
http://www.flickr.com/photos/tracenmatt/3020786491/
http://www.flickr.com/photos/the-o/2078239333/
http://www.flickr.com/photos/75166820@N00/5318468/

#2 Wayfinding + progress

http://www.flickr.com/photos/paulhami/1020538523//

Which data?
Where?
With whom?
When?
Under what terms?

- Find
- Organize
- Document
- Deidentify
- Format
- Ask
- Submit
- Answer questions
- Worry about mistakes being found
- Worry about data being misinterpreted
- Worry about being scooped
- Forgo money and IP and prestige???

not very motivating.

http://www.flickr.com/photos/johnnyvulkan/381941233/
http://www.flickr.com/photos/tonivc/2283676770/

a) policies + expectations
- NSF
- Joint Data Archiving Policy
- BioMed Central
- PLoS

b) repositories
- datatype-based
- institution-based
- discipline-based
- journal-based

c) standards
- data licenses
- data citation
- IDs for datasets, people, entities

d) part of something bigger
- open government data
- citizen science
- supplemental materials
- dataset-based usage metrics
- awards, recognition

#3 Is it working?

lots of data sharing!
http://www.genome.jp/en/db_growth.html

but how much isn’t shared?
what isn’t shared?
why not?
who isn’t sharing it?
how much does it matter?
what can we do about it?

you can not manage what you do not measure
quote: Lord Kelvin
http://www.flickr.com/photos/archeon/2941655917/
http://www.flickr.com/photos/ryanr/142455033/

Why is it important? Are we sure?

Errors.
- More than half of all papers contain errors
- 5-10% contain errors that change the conclusions
Gore et al 1977, Kantoer and Taylor 1994, McGuigan 1995, Hurlbert and White 1993

Ok, let’s share on request.

Doesn’t work
[Bar chart, scale 0% to 40%: self-reported denying a request in last 3 years; trainees self-reported denying a request; been denied access to data, materials, code; authors “not able to retrieve raw data”; not willing to release data]
Campbell et al. JAMA. 2002. Kyzas et al. J Natl Cancer Inst. 2005. Vogeli et al. Acad Med. 2006. Reidpath et al. Bioethics 2001.

Don’t get the email
Evangelou et al. FASEB J. 2006. Wren. Bioinformatics 2008. Wren et al. EMBO Rep 2006.

Say no
[Bar chart, scale 0% to 50%: want to publish more papers first; want exclusive use; ensure data confidentiality control; avoid cost of preparation]
Hedstrom. Society of Am Archivists Ann Meeting. 2008.

Ask why
- `Before I send you the data could I ask what you want it for?'
- `Can you be more explicit, please, about the analyses you have in mind and what you plan to do with them?'
- `We'll have to discuss your request with the other coauthors. Before we do that, I'd like to know your proposed analysis plan.'
- `We are not finished using the data, but when we are finished with it, we would be open to requests for the data.'
- `Any use of the data other than for the specific purpose laid down in the contract of collaboration is effectively ruled out.'
Reidpath et al. Bioethics 2001.

Not efficient. Not fair.
Not random:
- young
- productive
Campbell et al. 2000

Has real costs.
Survey of doctoral students and postdocs: 28-50% reported negative effects of withholding:
• hurt progress of their research,
• hurt rate of discovery in their lab/research group,
• hurt quality of their relationships with academic scientists,
• hurt quality of their education,
• hurt level of communication in their lab/research group.
Vogeli et al. Acad Med. 2006 Feb; 81(2):128-36

Ok, then on a website?

No. URLs stop working.
Evangelou et al. FASEB J. 2006. Wren. Bioinformatics 2008. Wren et al. EMBO Rep 2006.

Ok, in a repository?

lots of data sharing!
http://www.genome.jp/en/db_growth.html

http://www.flickr.com/photos/g_kat26/4255119413/
http://www.flickr.com/photos/jima/606588905/

Combined, these full-text portals reach 85% of the articles available through U of Pittsburgh library subscriptions.

microarray data
http://en.wikipedia.org/wiki/DNA_microarray
http://en.wikipedia.org/wiki/Image:Heatmap.png
http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG

11,603 studies that created gene expression microarray data

Is research data shared after publication?

Funder
- funded by NIH?
- size of grant
- sharing plan req’d?
- funded by non-NIH?

Journal
- impact factor
- strength of policy
- open access?
- number of microarray studies published

Investigator
- years since first paper
- # pubs
- # citations
- previously shared?
- previously reused?
- gender

Institution
- sector
- size
- impact rank
- country

Study
- humans?
- mice?
- plants?
- cancer?
- clinical trial?
- number of authors
- year

journal data sharing policy

“An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims.
Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …”
http://www.nature.com/authors/editorial_policies/availability.html
http://www.nature.com/nature/journal/v453/n7197/index.html

journal rank

institution rank
Yu et al. BMC medical informatics and decision making (2007) vol. 7 pp. 17

funding level
PubMed grant lists + NIH grant details

study type

author gender

and so on... 124 variables

11,603 studies
25% had links from datasets in databases

Across time
[Figure: Proportion of articles with shared datasets, by year. Y axis: proportion of articles with datasets found in GEO or ArrayExpress (0.05 to 0.35). X axis: year article published (2000 to 2009).]

What can we do about it? Funder policies.

19%
Piwowar and Chapman. Journal of Informetrics 2010

What can we do about it? Journal policies.

We looked at data sharing policies within Instruction to Author statements of 70 journals, as they apply to gene expression microarray data.
Piwowar and Chapman. ELPUB 2008

strength of data sharing policies
- No applicable policy (43%)
- Weak policy (24%): should, recommend, request; must, but without requiring database accession number
- Strong policy (33%): must, required, condition of publication; requires database accession number

High-impact journals tend to have a strong data-sharing policy

Articles published in journals with a strong data-sharing policy are more likely to have publicly available datasets

What can we do about it?
Learn
• Learn from those who do it well
• Focus on places that need it

[Figure: Proportion of datasets shared (0.0 to 1.0), by journal. Highlighted: Physiological Genomics.]

[Figure: Proportion of datasets shared (0.0 to 1.0), by institution. Highlighted: Stanford.]

[Figure: Proportion of datasets shared (0.0 to 1.0), by institution rank (1 to 1901).]

[Figure: Multivariate nonlinear regressions with interactions. Axis: odds ratio (0.25 to 8.00). Factors shown: Has journal policy; Count of R01 & other NIH grants; Authors prev GEOAE sharing & OA & microarray creation; NO K funding or P funding; Journal impact; Journal policy consequences & long halflife; Institution high citations & collaboration; NOT animals or mice; Institution is government & NOT higher ed; Last author num prev pubs & first year pub; Large NIH grant; Humans & cancer; NO geo reuse + YES high institution output; First author num prev pubs & first year pub.]

[Figure: Multivariate nonlinear regression with interactions. Axis: odds ratio (0.25 to 4.00). Factors shown: Amount of NIH funding; Journal impact factor and policy; Higher Ed in USA; Cancer & humans; OA journal & previous GEO-AE sharing.]

Carrot?
http://www.flickr.com/photos/sunrise/35819369/

currency of value? Citations. $50!
Diamond, Arthur M. What is a Citation Worth? The Journal of Human Resources (1986) vol. 21 (2) pp. 200-215

- dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003)
- citations: ISI Web of Science Citation index, citations from 2004-2005
- data sharing locations: Publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine
- statistics: Multivariate linear regression

[Figure; note: log scale]
~70%

Next?
http://www.flickr.com/photos/gatewaystreets/3838452287/

Abadie et al. Journal of the American Statistical Association 2010
http://www.flickr.com/photos/boitabulle/3668162701/
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png

#4 We are the culture. Let’s do it.

http://www.flickr.com/photos/joellevand/279468607/
http://www.flickr.com/photos/huzzahvintage/4577075021/

a) in our communities
- strengthening policies: journal, conference, institutional
- decision-makers
- role-models and educators

b) in our tools
- measure opinions
- measure use
- be transparent!

c) with our data
- share it.
- ugly? incomplete? strange?
“Flawed, but out there” is a million times better than “perfect, but unattainable”
http://sciblogs.co.nz/seeing-data/2010/10/12/the-zen-of-open-data/

“Does anyone want your data? That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay. Your data, too, may simply be awaiting an effective matchmaker.”
Got data? Nature Neuroscience (2007)

I post my data, code, and statistical scripts: http://researchremix.org
Share yours too!
http://www.flickr.com/photos/myklroventine/892446624/

More info?
• OATP oa.data tag on Connotea, Twitter
• FriendFeed
• Mendeley “data sharing” group
• @researchremix
piwowar@zoology.ubc.ca

thank you
Todd Vision, Michael Whitlock, Wendy Chapman
The open science online community and those who release their articles, datasets and photos openly
http://www.flickr.com/photos/youraddresshere/6649228/
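
The policy-strength categories from the journal-policy slides (no applicable policy, weak, strong) can be illustrated with a small keyword classifier. This is only a sketch and not the method used in the cited study: the cue words come from the slide, but the matching rules, the function name `classify_policy`, and the sample statements are assumptions made for illustration. In particular, it assumes a policy counts as strong only when a "must"-style term appears together with a requirement for a database accession number.

```python
import re

# Cue words taken from the slide; how they are combined is an assumption.
WEAK_TERMS = ("should", "recommend", "request")
STRONG_TERMS = ("must", "required", "condition of publication")


def classify_policy(text: str) -> str:
    """Return 'none', 'weak', or 'strong' for a policy statement (hypothetical helper)."""
    lowered = text.lower()
    has_strong_term = any(term in lowered for term in STRONG_TERMS)
    has_weak_term = any(term in lowered for term in WEAK_TERMS)
    # Assumed proxy for "requires database accession number".
    requires_accession = re.search(r"accession (number|code)", lowered) is not None

    if has_strong_term and requires_accession:
        return "strong"
    if has_strong_term or has_weak_term:
        # Includes "must, but without requiring database accession number".
        return "weak"
    return "none"


if __name__ == "__main__":
    # Hypothetical policy statements, not quotes from any journal.
    samples = [
        "Authors must deposit microarray data and provide the database accession number.",
        "Authors should deposit data in a public repository.",
        "Manuscripts are limited to 5000 words.",
    ]
    for statement in samples:
        print(classify_policy(statement), "|", statement)
```

Simple substring matching like this would misread negations and unusual wording, so real Instruction to Author statements would still need to be read by a person.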

