Open Collections

UBC Library and Archives

Geodisy Project: Search Canadian research data by location
Barsky, Eugene, 2019-11

Full Text

Geodisy Project - Search Canadian research data by location
Eugene Barsky, UBC
November 2019; Funded by CANARIE (RDM-059)

We aim for Canadian research data to be searched, filtered, and browsed using geographic locations
● Search results are driven by an interactive map
● Location is the primary search facet, linking resources from a similar area
● Relies less on textual searching, which is not ideal for spatial data

Why use it?
● Data can be difficult to find! When searching for data about a particular place, keywords can be hit or miss. A text search might look something like this:
((British Columbia OR BC OR B.C.) N2 (north*)) OR (Alaska N2 south*) OR (Yukon N2 south*) OR (Glacier Bay ADJ2 (park or preserve)) OR (Tatshenshini-Alsek ADJ2 park) OR (Kluane ADJ2 (park OR reserve)) OR (Atlin ADJ2 (park OR recreation area) OR …

Geodisy will show you where, in addition to what
(expected GeoBlacklight user interface - NYU Spatial Data Repository)

Example record page
(expected GeoBlacklight user interface - NYU Spatial Data Repository)

Why is it important?
Geodisy is open to all users and will benefit any research area that has a use for location-based discovery, including climate change, community development, public health, conservation, journalism, and many more.

Geospatial discovery is possible using metadata
● Metadata = information that describes a resource
● Geospatial data = data that is machine-readable using a GIS
● Quasi-geospatial data = data with a location component that isn't true geospatial data*
● Bounding boxes = rectangles representing the spatial extent of a dataset
*To generate bounding boxes from quasi-geospatial data, we are using

Geodisy(1) (re-)uses 3 main open-source software components
● Dataverse(2): research data repository
● GeoServer: server for publishing and distributing geospatial data
● GeoBlacklight: geospatial discovery layer
(1) Geodisy source code and documentation are available on GitHub.
(2) For the initial step (March 2020), Geodisy is funded to work with Canadian Dataverses only.

Geodisy Architecture

Project pipeline (in steps):
1. Software will query datasets from the Scholars Portal Dataverses (and later from UAlberta, Dal, UNB, UManitoba, etc.) to determine which have geospatial information
2. Software will harvest metadata from relevant quasi-geospatial datasets
3. Software will harvest both metadata and data files from geospatial datasets
4. Software will enrich the metadata with bounding boxes using GDAL (or from metadata or Geonames, if needed) and other information
5. Software will transform metadata to more universal standards (ISO 19115 and GeoBlacklight JSON)
6. Software will deposit geospatial data into GeoServer
7. Software will deposit geospatial and quasi-geospatial metadata into OpenGeoMetadata
8. Metadata will be harvested by GeoBlacklight for discovery

Bounding boxes in Geodisy
● Without bounding boxes, Geodisy cannot function
  ○ GeoBlacklight uses bounding boxes for discovery
● All datasets going through the Geodisy pipeline will have either bounding box coordinates entered into the metadata by the depositor or a bounding box generated programmatically by Geodisy from dataset files or other metadata

Dataverse metadata
• Dataverse uses different blocks of metadata: citation (basic description), geospatial, social science, astronomy, and life sciences
• The citation block includes several required fields for basic description
• For bounding boxes, Geodisy first attempts to analyze geospatial files using GDAL.
  If unsuccessful, it uses the contents of the geospatial block.

Automated system for generating bounding boxes (1)
• If the dataset contains geospatial file types, GDAL is used to generate coordinates

Automated system for generating bounding boxes (2)
• If a Dataverse dataset includes geographic bounding coordinates and no usable geospatial files, those coordinates are used for Geodisy's bounding box

Automated system for generating bounding boxes (3)
• If coordinates are missing or invalid and there are no geospatial file types, the geographic coverage metadata is sent to Geonames to obtain coordinates. However, the metadata must contain one of the following combinations:
  o Country/Nation
  o Country/Nation AND State/Province
  o Country/Nation AND State/Province AND City

Automated system for generating bounding boxes (4)
• Datasets that do not include coordinates, geospatial file types, or geographic coverage metadata are ignored

Automated system for generating bounding boxes (5)
• Regardless of geographic coverage metadata, if a bounding box cannot be generated by GDAL, the dataset will be logged for manual review in the following cases:
  o The "other" geographic coverage field contains text
  o The geographic coverage fields do not provide enough information
  o The geographic coverage information does not find a valid match in Geonames
  o Geonames finds a match, but that match has no bounding box coordinates

Metadata standards and crosswalks
• Metadata standard: a standardized series of metadata fields that describe a certain type of resource
  o e.g. Dublin Core for general resources, DDI for social science resources, Darwin Core for biology resources, etc.
• Crosswalk: a mapping of the elements/fields of one metadata standard onto the elements/fields of another standard

Geodisy metadata treatment
Diagram: Dataverse schema, ISO 19115, GeoBlacklight schema, and Geodisy-specific metadata, annotated "Each dataset's ISO metadata will be available for download on GeoBlacklight" and "This is the metadata GeoBlacklight uses for functionality"

Core Project Team (UBC)
• Eugene Barsky – Principal Investigator
• Paul Dante – Software Developer
• Edith Domingue – ARC Client Services Manager
• Mark Goodwin – Geospatial Metadata Coordinator
• Tang Lee – Project Manager
• Paul Lesack – Co-Principal Investigator
• Evan Thornberry – Co-Principal Investigator

Project Partners
• Jason Brodeur – McMaster University
• Marcel Fortin – University of Toronto
• Alex Garnett – SFU
• Amber Leahey – Scholars Portal
• Jason Hlady – University of Saskatchewan
• Venkat Mahadevan – UBC ARC
• Todd Trann – University of Saskatchewan
• Lee Wilson – Portage Network

Launching in spring 2020

Keep up to date: #Geodisy on social media
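The bounding-box definition, the map-driven search, and the automated bounding-box rules described above can be illustrated with a short sketch. This Python is a minimal illustration under stated assumptions, not Geodisy's implementation.

Assumptions:
• GDAL has already run, and its result is passed in as `gdal_box`.
• The Geonames query is modelled as a hypothetical `lookup` callable.
• The simplified names `country`, `state`, `city` and `other` stand in for Dataverse's geographic coverage fields.
• Boxes are WGS84 degrees that do not cross the antimeridian.

The sketch applies the fallback order from the slides: GDAL first, then depositor coordinates, then Geonames. It records the listed manual-review reasons. It formats a box in the `ENVELOPE(W, E, N, S)` order used by the GeoBlacklight 1.0 `solr_geom` field. Finally, it shows how a map view can retrieve records whose boxes overlap it. Some cases the slides leave open, such as unreadable geospatial files with no other location metadata. For those, the sketch simply ignores the dataset.

```python
"""Hypothetical sketch of Geodisy-style bounding-box handling; not the project's code."""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Rectangle in WGS84 degrees (assumed not to cross the antimeridian)."""
    west: float
    east: float
    south: float
    north: float

    def is_valid(self) -> bool:
        return -180 <= self.west <= self.east <= 180 and -90 <= self.south <= self.north <= 90

    def intersects(self, other: BoundingBox) -> bool:
        # Rectangles overlap unless one lies entirely beside, above, or below the other.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)

    def to_solr_envelope(self) -> str:
        # GeoBlacklight 1.0 `solr_geom` order: ENVELOPE(W, E, N, S).
        return f"ENVELOPE({self.west}, {self.east}, {self.north}, {self.south})"


@dataclass
class Coverage:
    """Simplified, hypothetical stand-in for Dataverse geographic coverage fields."""
    country: str = ""
    state: str = ""
    city: str = ""
    other: str = ""


@dataclass
class GeonamesMatch:
    name: str
    box: Optional[BoundingBox]  # a match may exist without bounding box coordinates


@dataclass
class Dataset:
    gdal_box: Optional[BoundingBox] = None       # extent GDAL derived from geospatial files, if any
    dataverse_box: Optional[BoundingBox] = None  # coordinates entered by the depositor, if any
    coverage: Coverage = field(default_factory=Coverage)


@dataclass
class Result:
    box: Optional[BoundingBox]
    source: str  # "gdal", "dataverse", "geonames", "manual_review" or "ignored"
    review: list[str] = field(default_factory=list)


def has_searchable_combination(cov: Coverage) -> bool:
    # Accepted: Country; Country + State; Country + State + City.
    return bool(cov.country.strip()) and (bool(cov.state.strip()) or not cov.city.strip())


def generate_bounding_box(ds: Dataset,
                          lookup: Callable[[Coverage], Optional[GeonamesMatch]]) -> Result:
    # 1. Prefer an extent GDAL computed from the dataset's geospatial files.
    if ds.gdal_box is not None and ds.gdal_box.is_valid():
        return Result(ds.gdal_box, "gdal")
    cov, review = ds.coverage, []
    if cov.other.strip():
        review.append("'other' geographic coverage field contains text")
    # 2. Fall back to valid coordinates entered in the Dataverse metadata.
    if ds.dataverse_box is not None and ds.dataverse_box.is_valid():
        return Result(ds.dataverse_box, "dataverse", review)
    # No coordinates, no usable files, no coverage metadata: ignore the dataset.
    if not any(s.strip() for s in (cov.country, cov.state, cov.city, cov.other)):
        return Result(None, "ignored")
    # 3. Query Geonames, but only for the accepted field combinations.
    if not has_searchable_combination(cov):
        review.append("geographic coverage fields do not provide enough information")
        return Result(None, "manual_review", review)
    match = lookup(cov)
    if match is None:
        review.append("no valid Geonames match")
    elif match.box is None:
        review.append("Geonames match has no bounding box coordinates")
    else:
        return Result(match.box, "geonames", review)
    return Result(None, "manual_review", review)


def search(records: dict[str, BoundingBox], view: BoundingBox) -> list[str]:
    """Map-driven discovery: IDs of records whose box overlaps the current map view."""
    return sorted(rid for rid, box in records.items() if box.intersects(view))


if __name__ == "__main__":
    # Hypothetical lookup standing in for a real Geonames query; coordinates are rough.
    def fake_lookup(cov: Coverage) -> Optional[GeonamesMatch]:
        if (cov.country, cov.state) == ("Canada", "British Columbia"):
            return GeonamesMatch("British Columbia", BoundingBox(-139.1, -114.0, 48.3, 60.0))
        return None

    result = generate_bounding_box(Dataset(coverage=Coverage("Canada", "British Columbia")), fake_lookup)
    print(result.source, result.box.to_solr_envelope())
    # prints: geonames ENVELOPE(-139.1, -114.0, 60.0, 48.3)
```

The Geonames query sits behind a plain callable, so the fallback logic can be exercised without network access. A real client would need the Geonames web service's own request parameters, which this sketch does not attempt to reproduce.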

