Open Collections

UBC Library and Archives

Keynote event : the case for Open Data and eScience : establishing a university data management program… Choudhury, G. Sayeed 2010-10-22

Item Metadata


67656-Choudhury_Sayeed_Keynote_Event_The_Case_for_Open_Data.wmv [ 242.87MB ]
67656-OAW2010_and_BCRLG.JPG [ 13.56kB ]
67656-Choudhury_Sayeed_The_Case_for_Open_Data_and_eScience.pdf [ 1.67MB ]
Full Text

The Case for Open Data and eScience: Establishing a University Data Management Program at Johns Hopkins
Sayeed Choudhury
BCRLG/Open Access Week
October 22, 2010

Vision

In the beginning…
• Digital Knowledge Center founded in 1997
• Mission specifically emphasized research and development
• Non-traditional manager, staff and culture
• Early grants from the US National Science Foundation, the Andrew W. Mellon Foundation and the US Institute of Museum and Library Services

Early principles
• Automated systems instead of automation
• Emphasis on new processes that raised human involvement or intervention to a higher level
• Engagement with new communities of researchers
• Diversity of funding sources, including a venture capital group and corporate sources

Creative Tension
• Cultural dissonance
• Challenges of managing R&D projects within an operational environment
• Benefits of managing R&D projects within an operational environment: service-oriented R&D
• Gaining the respect of faculty and the associated credibility

Initial Projects

Meanwhile…
• The faculty vanguards were pushing their own frontiers
• Growing interest in digital collections and services
• Little emphasis on infrastructure
• Initially inspired by specific disciplinary problems or needs

A Repository by Any Other Name…
• With funding from the Mellon Foundation, we conducted an analysis of DSpace, Fedora and Digital Commons
• Locally, we deferred our specific choice while we conducted the analysis
• We engaged the community by gathering use cases
• Ultimately, we made a better choice, though not without controversy

Data Flow (Levels of Data)
Pixel data collected by telescope ⇒ sent to Fermilab for processing ⇒ Beowulf cluster produces catalog ⇒ loaded in a SQL database

Data and Publication Curation
Author, Publisher, Archive

JHU-based eResearch
…not a rigid road map but principles of navigation. There is no one way to design cyberinfrastructure, but there are tools we can teach the designers to help them appreciate the true size of the solution space, which is often much larger than they may think if they are tied into technical fixes for all problems.

“…The natural path of industrialization: invention, propagation, adoption, control” – Chris Anderson, Wired Magazine

Data Conservancy
• One of two current awards through the US National Science Foundation’s DataNet program
• 5-year, $20 million award
• Second phase of the program could result in three additional awards, forming a DataNet federation

Data Conservancy partners

Data Curation
The Data Conservancy embraces a shared vision: data curation is a means to collect, organize, validate and preserve data so that scientists can find new ways to address the grand research challenges that face society.

OAIS Mapping to DC Architecture

Technological Results
• OAIS-based architecture and PLANETS-based data model
• Storage framework
• Ingested data from the Sloan Digital Sky Survey and the Dry Valleys Project
• Integration with services from:

Domain coverage/methods
• Multi-site user research methods are a blend of:
  – Case study & domain comparisons
  – Depth & breadth
  – Local & global
Domains: Astronomy, Earth Sciences, Life Sciences, Social Sciences
Information science research: UCAR, UCLA, UIUC
• Task-based design and usability testing ⇒ Use cases, data requirements, system recommendations
• Ethnography, virtual ethnography, oral histories ⇒ Use cases, data requirements
• Interviews, surveys, worksheets, content analysis ⇒ Curation requirements, taxonomy, metadata/provenance framework

Educational Results
• Data Curation Summer Institute at Illinois
• New courses at Illinois and UCLA
• Illinois has a Data Curation Education Program
• Data scientist within the Sheridan Libraries at Johns Hopkins

Sustainability
• Incorporate findings from the Blue Ribbon Task Force on Sustainable Digital Preservation and Access
• Memoranda of Understanding with the Astrophysical Research Consortium (and the Walters Art Museum)
• Carey Business School capstone projects
• Data management plans
• “Business” partners

Lessons Learned: Innovation
• Innovation arises from chaos
  – Urgency is the mother of innovation
• One or a few individuals typically initiate innovation, but only an organizational commitment will foster and advance it
• Innovation requires courage, including acceptance of “failure”
• Innovation can lead to long-term, unanticipated results, but those aren’t the drivers for the initial activity

What allowed us to be innovative?
• Leadership being willing to support yet defer
  – “You are the expert”
• Trust between the decision-makers and the doers
• Funding: provides the “release” time and validation
• Knowing when to lead and when to follow
• Being aware of global issues while being cognizant of local needs

Lessons Learned: Cultural
• Is there a single library culture?
• Perhaps an even more important question: is there a “correct” library culture?
• It’s probably more important to unlearn than it is to learn
• Human interoperability is a lot harder than machine interoperability
• Librarians can become the human dimension of infrastructure

Data Management Program
• Critical to identify needs and the associated requirements
• Focus on service provision, but consider scientific data as the new special collections
• Essential to engage faculty or researcher champions who become ambassadors
  – Institute for Data Intensive Engineering and Science (IDIES)
• Choose carefully: scope is important

Vision

Thank you!
• Questions?
• Comments?
• Suggestions?

“The future is already here. It’s just not widely distributed yet.” – William Gibson
