UBC Library and Archives

Open Research Data Piwowar, Heather 2010

Full Text

Open research data
Heather Piwowar
DataONE postdoc with Dryad and NESCent, UBC
@researchremix
OA week 2010, University of British Columbia

#1 It matters
http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
http://www.flickr.com/photos/jsmjr/62443357/
http://www.flickr.com/photos/camilleharrington/3587294608/
http://www.flickr.com/photos/rkuhnau/3318245976/
http://www.flickr.com/photos/conformpdx/1796399674/
http://www.flickr.com/photos/rkuhnau/3317418699/
http://www.flickr.com/photos/zemlinki/261617721/
http://www.flickr.com/photos/tracenmatt/3020786491/
http://www.flickr.com/photos/the-o/2078239333/
http://www.flickr.com/photos/75166820@N00/5318468/

#2 Wayfinding + progress
http://www.flickr.com/photos/paulhami/1020538523//
Which data?
Where?
With whom?
When?
Under what terms?

- Find
- Organize
- Document
- Deidentify
- Format
- Ask
- Submit
- Answer questions
- Worry about mistakes being found
- Worry about data being misinterpreted
- Worry about being scooped
- Forgo money and IP and prestige
???
not very motivating.
http://www.flickr.com/photos/tonivc/2283676770/
http://www.flickr.com/photos/johnnyvulkan/381941233/

a) policies + expectations
- NSF
- Joint Data Archiving Policy
- BioMed Central
- PLoS
b) repositories
- datatype-based
- institution-based
- discipline-based
- journal-based
c) standards
- data licenses
- data citation
- IDs for datasets, people, entities
d) part of something bigger
- open government data
- citizen science
- supplemental materials
- dataset-based usage metrics
- awards, recognition

#3 Is it working?
http://www.genome.jp/en/db_growth.html
lots of data sharing!
but how much isn’t shared?
what isn’t shared?
who isn’t sharing it?
why not?
what can we do about it?
how much does it matter?

“You cannot manage what you do not measure” (quote: Lord Kelvin)
http://www.flickr.com/photos/archeon/2941655917/
http://www.flickr.com/photos/ryanr/142455033/

Why is it important? Are we sure?

Errors.
Gore et al. 1977, Kantoer and Taylor 1994, McGuigan 1995, Hurlbert and White 1993
- More than half of all papers contain errors
- 5-10% contain errors that change the conclusions

Ok, let’s share on request.

Doesn’t work
[Bar chart, axis 0% to 40%, with these bars:]
- self-reported denying a request in last 3 years
- trainees self-reported denying a request
- been denied access to data, materials, code
- authors “not able to retrieve raw data”
- not willing to release data
Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
Vogeli et al. Acad Med. 2006.
Reidpath et al. Bioethics 2001.

Don’t get the email
Evangelou et al. FASEB J. 2006.
Wren. Bioinformatics 2008.
Wren et al. EMBO Rep 2006.

Say no
Hedstrom. Society of Am Archivists Ann Meeting. 2008.
[Bar chart, axis 0% to 50%, with these bars:]
- want to publish more papers first
- want exclusive use
- ensure data confidentiality
- control
- avoid cost of preparation

Ask why
Reidpath et al. Bioethics 2001.
“Before I send you the data could I ask what you want it for?”
“Can you be more explicit, please, about the analyses you have in mind and what you plan to do with them?”
“We'll have to discuss your request with the other coauthors. Before we do that, I'd like to know your proposed analysis plan.”
“We are not finished using the data, but when we are finished with it, we would be open to requests for the data.”
“Any use of the data other than for the specific purpose laid down in the contract of collaboration is effectively ruled out.”

Not efficient. Not fair.
Campbell et al. 2000
Not random:
- young
- productive

Has real costs.
Survey of doctoral students and postdocs:
28-50% reported that withholding had negative effects:
• hurt progress of their research,
• hurt rate of discovery in their lab/research group,
• hurt quality of their relationships with academic scientists,
• hurt quality of their education,
• hurt level of communication in their lab/research group.
Vogeli et al. Acad Med. 2006 Feb; 81(2):128-36

Ok, then on a website?
No. URLs stop working.
Evangelou et al. FASEB J. 2006.
Wren. Bioinformatics 2008.
Wren et al. EMBO Rep 2006.

Ok, in a repository?
lots of data sharing!
http://www.genome.jp/en/db_growth.html
http://www.flickr.com/photos/g_kat26/4255119413/
http://www.flickr.com/photos/jima/606588905/

Combined, these full-text portals reach 85% of the articles available through U of Pittsburgh library subscriptions.

microarray data
http://en.wikipedia.org/wiki/DNA_microarray
http://en.wikipedia.org/wiki/Image:Heatmap.png
http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG

11,603 studies that created gene expression microarray data

Is research data shared after publication?
Funder: funded by NIH? size of grant, sharing plan req’d? funded by non-NIH?
Journal: impact factor, strength of policy, open access? number of microarray studies published
Investigator: years since first paper, # pubs, # citations, previously shared? previously reused? gender
Institution: sector, size, impact rank, country
Study: humans? mice? plants? cancer? clinical trial? number of authors, year

“An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …”
http://www.nature.com/authors/editorial_policies/availability.html
http://www.nature.com/nature/journal/v453/n7197/index.html

- journal data sharing policy
- journal rank
- institution rank: Yu et al. BMC Medical Informatics and Decision Making (2007) vol. 7 pp. 17
- funding level: PubMed grant lists + NIH grant details
- study type
- author gender
- and so on... 124 variables

11,603 studies
25% had links from datasets in databases

[Figure: Proportion of articles with shared datasets, by year. x-axis: year article published, 2000 to 2009; y-axis: proportion of articles with datasets found in GEO or ArrayExpress, 0.05 to 0.35]
Across time

What can we do about it?

What can we do about it? Funder policies.
19%
Piwowar and Chapman. Journal of Informetrics 2010

What can we do about it? Journal policies.
We looked at data sharing policies within Instruction to Author statements of 70 journals, as they apply to gene expression microarray data.
Piwowar and Chapman. ELPUB 2008

strength of data sharing policies
No applicable policy (43%)
Weak policy (24%)
- should, recommend, request
- must, but without requiring database accession number
Strong policy (33%)
- must, required, condition of publication
- requires database accession number

High-impact journals tend to have a strong data-sharing policy
Articles published in journals with a strong data-sharing policy are more likely to have publicly available datasets

What can we do about it?
Learn
• Learn from those who do it well
• Focus on places that need it

[Figure: Journals (Physiological Genomics). Proportion of datasets shared, axis 0.0 to 1.0, one bar per journal, listed in chart order: Physiol Genomics; PLoS Genet; Genome Biol; Microbiology; PLoS One; BMC Genomics; Plant Cell; Genome Res; Eukaryot Cell; Appl Environ Microbiol; BMC Med Genomics; Hum Mol Genet; Proc Natl Acad Sci U S A; Infect Immun; Am J Respir Cell Mol Biol; Dev Biol; J Bacteriol; Mol Endocrinol; BMC Cancer; Plant Physiol; Biol Reprod; Blood; J Immunol; FASEB J; Toxicol Sci; J Exp Bot; Nucleic Acids Res; Diabetes; Mol Cell Biol; Mol Cancer Ther; BMC Bioinformatics; Stem Cells; FEBS Lett; J Neurosci; Am J Pathol; J Biol Chem; J Virol; OTHER; Cancer Res; J Clin Endocrinol Metab; Plant Mol Biol; Clin Cancer Res; Genomics; Invest Ophthalmol Vis Sci; Mol Hum Reprod; Carcinogenesis; Gene; Endocrinology; Oncogene; Cancer Lett; Biochem Biophys Res Commun]

[Figure: Institutions (Stanford). Proportion of datasets shared, axis 0.0 to 1.0, one bar per institution, listed in chart order: Stanford University; University of Pennsylvania; University of Illinois; University of California, Los Angeles; University of Wisconsin, Madison; University of Washington; University of California, Davis; The University of British Columbia; University of California, San Francisco; University of Florida; University of California, San Diego; University of Minnesota, Twin Cities; Baylor College of Medicine; OTHER; Max Planck Gesellschaft; Harvard University; Duke University Medical Center; Yale University; Johns Hopkins University; University of Pittsburgh; Washington University in Saint Louis; University of Toronto; University of California, Berkeley; University of Michigan, Ann Arbor; Michigan State University; National Cancer Institute; Tokyo Daigaku]

[Figure: Proportion of datasets shared (0.0 to 1.0) by institution rank (1 to 1901)]

[Figure: Odds ratios, axis 0.25 to 8.00, legend 0.95. Multivariate nonlinear regressions with interactions. Factors shown:]
- Has journal policy
- Count of R01 & other NIH grants
- Authors prev GEOAE sharing & OA & microarray creation
- NO K funding or P funding
- Institution high citations & collaboration
- Journal impact
- Journal policy consequences & long halflife
- NOT animals or mice
- Institution is government & NOT higher ed
- Last author num prev pubs & first year pub
- Large NIH grant
- Humans & cancer
- NO geo reuse + YES high institution output
- First author num prev pubs & first year pub

[Figure: Odds ratios, axis 0.25 to 4.00, legend 0.95. Multivariate nonlinear regression with interactions. Factors shown:]
- OA journal & previous GEO-AE sharing
- Amount of NIH funding
- Journal impact factor and policy
- Higher Ed in USA
- Cancer & humans

Carrot?
http://www.flickr.com/photos/sunrise/35819369/

currency of value? Citations.
$50!
Diamond, Arthur M. What is a Citation Worth? The Journal of Human Resources (1986) vol. 21 (2) pp. 200-215

dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003)
citations: ISI Web of Science Citation index, citations from 2004-2005
data sharing locations: Publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine
statistics: Multivariate linear regression

Note: log scale
~70%

Next?
http://www.flickr.com/photos/gatewaystreets/3838452287/

Impact of JDAP
Abadie et al. Journal of the American Statistical Association 2010

Reuse.
http://www.flickr.com/photos/boitabulle/3668162701/
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png

#4 We are the culture. Let’s do it.
http://www.flickr.com/photos/joellevand/279468607/
http://www.flickr.com/photos/huzzahvintage/4577075021/

a) in our communities
- strengthening policies:
- journal, conference, institutional
- decision-makers
- role-models and educators
b) in our tools
- measure opinions
- measure use
- be transparent!
c) with our data
- share it.
- ugly? incomplete? strange?
“Flawed, but out there” is a million times better than “perfect, but unattainable”
http://sciblogs.co.nz/seeing-data/2010/10/12/the-zen-of-open-data/

“Does anyone want your data? That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay. Your data, too, may simply be awaiting an effective matchmaker.”
Got data? Nature Neuroscience (2007)

I post my data, code, and statistical scripts: http://researchremix.org
Share yours too!
http://www.flickr.com/photos/myklroventine/892446624/

More info?
• OATP oa.data tag on Connotea, Twitter
• FriendFeed
• Mendeley “data sharing” group
• @researchremix
piwowar@zoology.ubc.ca

thank you
Todd Vision, Michael Whitlock, Wendy Chapman
The open science online community and those who release their articles, datasets and photos openly
http://www.flickr.com/photos/youraddresshere/6649228/
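
The journal-policy slide sorts Instruction to Author statements into three tiers by their wording. As a rough illustration only, the sketch below applies those keyword cues in Python. The slides do not say how the 70 journals were actually coded, so the function names `classify_policy` and `tally`, the regular expressions, the example passages, and the assumption that each input is already the data-sharing passage of a journal's instructions are all hypothetical.

```python
import re
from collections import Counter

# Keyword cues copied from the slide; the regex form is an assumption.
STRONG_WORDS = re.compile(r"\b(must|required|condition of publication)\b", re.IGNORECASE)
WEAK_WORDS = re.compile(r"\b(should|recommend\w*|request\w*)\b", re.IGNORECASE)
ACCESSION = re.compile(r"\baccession\s+numbers?\b", re.IGNORECASE)


def classify_policy(passage: str) -> str:
    """Label one data-sharing passage as 'strong', 'weak', or 'none'."""
    has_strong_word = bool(STRONG_WORDS.search(passage))
    has_weak_word = bool(WEAK_WORDS.search(passage))
    wants_accession = bool(ACCESSION.search(passage))
    # Strong: mandatory wording plus a database accession number.
    if has_strong_word and wants_accession:
        return "strong"
    # Weak: advisory wording, or "must" without an accession number.
    if has_strong_word or has_weak_word:
        return "weak"
    return "none"


def tally(passages):
    """Count how many passages fall into each tier."""
    return Counter(classify_policy(p) for p in passages)


# Invented example passages, not taken from any real journal.
EXAMPLES = [
    "Authors must deposit microarray data in a public database "
    "and provide the accession number.",
    "Authors should deposit microarray data in a public repository.",
    "Authors must make data available.",
    "Figures are limited to six per article.",
]

if __name__ == "__main__":
    for passage in EXAMPLES:
        print(classify_policy(passage), "|", passage)
    print(dict(tally(EXAMPLES)))
```

Keyword matching of this kind would misread negations such as "is not required", so it is at best a first pass before reading each policy by hand.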


