
University of British Columbia
School of Library, Archival and Information Studies
Master of Library and Information Studies

LIBR 594 - Assignment 1
Linked Data in Libraries and Archives

Carolina Román Amigo
Supervisor: Richard Arias Hernandez

October 2017

Table of Contents

Main Linked Data Concepts
  Linked Data
  Linked Data Principles
  URL versus URI
  Semantic Web
  Metadata Schemas and Metadata Application Profiles (MAP)
  Namespaces
  Controlled vocabularies
  SKOS
  Ontologies
  Representational State Transfer (REST) Application Programming Interfaces (APIs)
Main Data Models (Data Structures)
  Tabular Data
  Relational Model
  Meta-markup languages
  RDF (Triples)
  SPARQL
  BIBFRAME
Serialization Formats for Linked Data
  XML
  XSD XML Schema
  JSON and JSON-LD
  RDF/XML
  RDF Schema
  Turtle and N-Triples
  OWL
Linked Data Process
  Planning
  Designing
  Implementing
  Publishing
  Consuming
Value and Challenges for Libraries and Archives
Linked Data Cases in Libraries and Archives
  WorldCat Linked Data Project (Libraries)
  Library of Congress’s (LoC) id.loc.gov service (Libraries)
  British Library’s British National Bibliography (Libraries)
  American Numismatic Society’s thesaurus (Archives)
  Archaeology Data Service Linked Open Data (Archives)
Sources of data about LD projects in Libraries, Archives and Museums
  OCLC survey on LD adoption (2015)
  Library Linked Data Incubator Group (LLD XG) wiki (2011)
  Linked Data for Libraries (LD4L) (2016)
References

Main Linked Data Concepts

Linked Data

Linked data (LD) is the term used to refer to the set of technologies and best practices aimed at preparing and publishing data in a way that it can be automatically interlinked and shared on the web (Hooland & Verborgh, 2014). By using unique resource identifiers and a data structure based on triples, linked data provides meaningful links among related objects of different provenances, offering more information to the user and improving the discoverability of resources. LD makes use of common vocabularies to ensure understanding across a community. It is also machine-readable, enabling automated agents to interpret data semantically in a similar way to how a human would. For that reason, Linked Data, and specifically Linked Open Data (LD made freely available on the web), can be seen as a building block or a practical implementation of a primitive version of the Semantic Web (Miller, 2011; Southwick, 2015).

Linked Data Principles

The Linked Data principles, or building blocks, popularized by Berners-Lee (2009) are listed below; a short sketch of how they look in practice follows the list.

- “Use URIs as names for things.”: URIs are uniform resource identifiers, allowing resources to be identified in a unique way anywhere in the universe (Hooland & Verborgh, 2014). Each element in an RDF triple should have a unique identifier: subject, predicate and object.
- “Use HTTP URIs so that people can look up those names.”: Using HTTP URIs allows users to look up those names and get more information.
- “When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).”: The information provided should be relevant to the thing identified; it should be data that someone would want to know about the resource.
- “Include links to other URIs so that they can discover more things.”: Provide the relationships and other things that the URI is related to.
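As a minimal sketch of how these principles look in practice, the snippet below mints hypothetical HTTP URIs under example.org, states a few triples about a resource, and links it to an external URI (a DBpedia concept) so that agents can discover more. It assumes the Python rdflib library is installed; the URIs and literals are invented for illustration.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

# Principles 1 and 2: name things with HTTP URIs (hypothetical example.org namespace).
EX = Namespace("http://example.org/id/")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("ex", EX)

report = EX["report/1"]
g.add((report, DCTERMS.title, Literal("Linked Data in Libraries and Archives")))
g.add((report, DCTERMS.creator, Literal("Carolina Román Amigo")))

# Principle 4: link to other URIs so that agents can discover more things.
g.add((report, DCTERMS.subject, URIRef("http://dbpedia.org/resource/Linked_data")))

# Principle 3: looking up the URI should return useful data in a standard format.
print(g.serialize(format="turtle"))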
URL versus URI

In summary, URIs are identifiers, while URLs are addresses. According to Hooland & Verborgh (2014):

“A URI, Uniform Resource Identifier, is a generalization of the concept that permits resources anywhere in the universe to be given a unique identification.”

“A URL is a uniform resource locator, which, as the name says, enables to locate resources in a unique way.”

Every URL is also a URI, but not every URI is a URL. A URL is a URI with an added method (such as HTTP) that provides access to a resource over a network. Standards bodies such as the W3C do not endorse the subdivision of URIs into URLs, preferring instead a nomenclature such as “HTTP URIs” for URIs pointing to a network location (Uniform Resource Identifier, 2017). However, although informal, URL is recognized by the community as a useful concept and, according to Miessler (2015), it is best to use URL when referring to a URI containing both the resource name and the method to access it, while URI is best used when referring directly to a resource name.
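A small illustration of the distinction, using Python’s standard urllib.parse (the identifiers below are generic examples, not records from any real collection):

from urllib.parse import urlparse

# A URL: identifies the resource and carries an access method (scheme + host).
url = urlparse("http://example.org/collections/item/42")
print(url.scheme, url.netloc)   # -> http example.org

# A URI that is not a URL: it identifies a book by ISBN but provides
# no network location or method to retrieve it.
uri = urlparse("urn:isbn:0451450523")
print(uri.scheme, uri.netloc)   # -> urn (empty network location)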
Semantic Web

The Semantic Web is the next level of evolution of the internet as we use it today, where machines will be able to understand the semantic meaning of information provided by humans. It is a “framework for creating, managing, publishing and searching semantically rich information about web resources” (Alistair et al., 2005). This, according to Berners-Lee, Hendler and Lassila (2001), will “enable intelligent agents to autonomously perform tasks for us” (Berners-Lee et al., 2001). It is important to note that the terms “understand” and “intelligent” do not mean the same here for machines and humans. Machines are able to behave “intelligently” only in an operational sense, meaning that they are able to use the data provided according to predefined rules and to infer relationships using logic. In the semantic web, the data is annotated and ontologies, relationships and vocabularies (shared repositories of meaning) are provided so that the web, which today consists mostly of human-readable information, becomes accessible to software agents.

Regarding information retrieval for humans, the Semantic Web overcomes several of the limitations of keyword-based search engines (Alistair et al., 2005). It increases precision when searching for terms with multiple meanings, since the terms have extra information associated with them that allows the searcher to specify which meaning they are looking for. It provides better recall, as synonyms are taken into account, and, in a similar way, it enables search for terms across languages. Limitations regarding the retrieval of images, audio and video remain, though, since these only become searchable when metadata is added to them.

Metadata Schemas and Metadata Application Profiles (MAP)

In order to be machine-interoperable, metadata has to be structured and atomized. A metadata schema ensures the common interpretation of each metadata element, subelement and attribute, as well as its requirements, content guidelines, controlled vocabularies adopted, etc. (Hooland & Verborgh, 2014). When documented, a metadata schema becomes a metadata application profile (Miller, 2011). The term metadata schema can refer either to formally standardized element sets such as the Dublin Core, VRA 3.0 Core Categories, or the DPLA MAP, or to locally established element sets developed to fulfill specific needs.

Some initiatives, such as Europeana, adopt the term “data model” to designate their metadata application profile documentation. Although technically correct, we find the term too generic for this purpose, since a data model is anything that provides guidelines to structure data. Moreover, using data model to designate a metadata application profile can cause confusion since, in the literature, the term is also used to designate data structures such as tabular data, the relational data model, meta-markup and RDF.

Namespaces

A namespace is a component of a metadata application profile. According to Hay (2006), “an XML namespace is the URI that describes an ontology from which terms are taken” (Hay, 2006). It allows consistent reuse of elements of metadata description already developed by someone else, ensuring a common understanding of how the data should be interpreted (Hooland & Verborgh, 2014). A prefix is usually added to each element indicating the namespace it comes from, and the full namespace URIs are indicated at the beginning of the metadata schema file. The example below is an excerpt of the Portland Common Data Model RDF file available on GitHub (https://github.com/duraspace/pcdm/blob/master/models.rdf).
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="rdfs2html.xsl"?>
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:ldp="http://www.w3.org/ns/ldp#"
    xmlns:ore="http://www.openarchives.org/ore/terms/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:pcdm="http://pcdm.org/models#">

  <rdf:Description rdf:about="http://pcdm.org/models#">
    <dcterms:title xml:lang="en">Portland Common Data Model</dcterms:title>
    <dcterms:publisher rdf:resource="http://www.duraspace.org/"/>
    <rdfs:seeAlso rdf:resource="https://github.com/duraspace/pcdm/wiki"/>
    <rdfs:comment xml:lang="en">Ontology for the Portland Common Data Model,
        intended to underlie a wide array of repository and DAMS
        applications.</rdfs:comment>
    <owl:versionInfo>2016/04/18</owl:versionInfo>
    <owl:priorVersion rdf:resource="http://pcdm.org/2015/09/28/models"/>
  </rdf:Description>

Controlled vocabularies

According to Hooland & Verborgh (2014), a controlled vocabulary “represents a restricted subset of language which has been explicitly created to avoid the problems which arise with the use of natural language during the indexing and retrieval of information” (Hooland & Verborgh, 2014). That means that standardized words are used to represent concepts, establishing preferred terms to promote consistency (Harpring & Baca, 2010). Controlled vocabularies are a type of Knowledge Organization System (KOS). There are mainly three types of controlled vocabularies, as described below.

● Classification schemes: offer a way to physically group documents of similar content, using classes arranged systematically (Broughton, 2004).
Example: Dewey Decimal Classification (DDC)

● Subject headings: describe the subject of specific resources in a succinct way, using one or a few words (Hooland & Verborgh, 2014).
Example: Library of Congress Subject Headings (LCSH)

● Thesauri: represent an application domain in a logical way, building a structure of preferred and non-preferred terms, related terms, and broader and narrower terms (Hooland & Verborgh, 2014).
Example: Art & Architecture Thesaurus (AAT)

Controlled vocabularies increase the chances of finding desired content even with imprecise keywords (related terms). They also increase recall (the proportion of the documents relevant to the search that were successfully retrieved) and precision (the proportion of retrieved documents relevant to the search). Better recall is possible because thesauri provide synonymy control, meaning that they take into account different words that may represent the same or similar concepts. Greater precision is possible because of polysemy control, which means that when the same term is used to represent different concepts, the thesaurus allows for disambiguation thanks to the hierarchical structure it provides. For example, apple the fruit would be in a different location (under a different broader term) than Apple the company.
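To make the two measures concrete, here is a small worked example with invented numbers:

# Worked example of recall and precision (the figures are invented).
# The collection holds 40 documents relevant to a query; a search returns
# 50 documents, of which 30 are actually relevant.
relevant_in_collection = 40
retrieved = 50
relevant_retrieved = 30

recall = relevant_retrieved / relevant_in_collection   # 30 / 40 = 0.75
precision = relevant_retrieved / retrieved             # 30 / 50 = 0.60
print(f"recall = {recall:.2f}, precision = {precision:.2f}")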
However, controlled vocabularies are expensive to create and difficult to maintain, as it takes time and resources to keep them up to date. They are subjective and thus prone to expressing a specific, usually biased, world view. Also, they can be difficult for users to understand and use. They definitely have value in the linked data context, but each application should be evaluated in light of the pros and cons they offer.

SKOS

Simple Knowledge Organization System (SKOS) is a simplified language for representing controlled vocabularies on the web. It is designed to be easier to use than more complex ontology languages such as OWL, but still powerful enough to support semantically enhanced (machine-understandable) search. It is meant to describe the content of books and other resources, not to formally describe aspects of the world by axioms and facts as ontologies do (Alistair et al., 2005). Figure 1 below is an example of how SKOS may be used to represent a thesaurus entry. Note the use of a prefix (skos), as in namespaces, and the labels of the elements (the predicates of the triples) representing classical thesaurus relations such as related, broader and narrower.

Figure 1 - Example of a thesaurus entry represented in SKOS (source: https://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/)
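Since the figure itself is not reproduced in this text version, the sketch below expresses the same idea: a hypothetical thesaurus concept described with the SKOS vocabulary, built with the Python rdflib library (assumed installed); the example.org URIs and labels are invented.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

# Hypothetical thesaurus namespace used only for this illustration.
EX = Namespace("http://example.org/thesaurus/")

g = Graph()
g.bind("skos", SKOS)

apple = EX["apple_fruit"]
g.add((apple, SKOS.prefLabel, Literal("apple", lang="en")))          # preferred term
g.add((apple, SKOS.altLabel, Literal("eating apple", lang="en")))    # non-preferred term
g.add((apple, SKOS.broader, EX["pome_fruits"]))                      # broader term
g.add((apple, SKOS.related, EX["orchards"]))                         # related term

print(g.serialize(format="turtle"))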
Ontologies

According to Harpring & Baca (2010), “an ontology is a formal, machine-readable specification of a conceptual model in which concepts, properties, relationships, functions, constraints, and axioms are all explicitly defined” (Harpring & Baca, 2010). Ontologies express knowledge about a domain and the relations between its concepts based upon an open world assumption, meaning that inference is allowed based on the information explicitly asserted. Unlike the closed world assumption, where only what is asserted is considered known, ontologies are able to complete incomplete information by applying their rules logically (Hay, 2006).

Ontologies should not be confused with controlled vocabularies. Controlled vocabularies are used by ontologies to express the vocabulary of a given domain, which is in turn used according to the grammar defined by the ontology. While controlled vocabularies aim to provide means for cataloguing and retrieval, ontologies aim to represent knowledge in a machine-readable form. Concepts are organized in classes, individuals, attributes, relations and events (Harpring & Baca, 2010).

Representational State Transfer (REST) Application Programming Interfaces (APIs)

APIs are a set of defined methods that make data accessible to machines. They have their own vocabulary and syntax, defining property names and labels and how the information is arranged (University of British Columbia, n.d.). APIs receive programming instructions and provide data answers in XML or JSON. HTTP APIs, which preceded REST APIs, didn’t offer a way to integrate the human access interface and the machine access interface, keeping them distinct. REST, on the other hand, provides access for human and machine consumers in the same way, avoiding duplication and minimizing maintenance. This is achieved by uniform interface constraints, consisting of:

- URLs that refer to an object (piece of content) instead of a representation.
- Resource manipulation through representations, returning the representation most appropriate for each request; for example, an HTML page for a person and JSON for a Javascript application (illustrated in the sketch after this list).
- Self-descriptive messages, which contain all the information necessary to understand and process them. For example, the second page of search results doesn’t require the first page in order to be accessed.
- Hypermedia as the engine of application state, meaning that links are provided instead of identifiers, removing the need for documentation to understand the information (Hooland & Verborgh, 2014).
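The sketch below illustrates the “most appropriate representation” idea with HTTP content negotiation. The URI is hypothetical; real linked data services such as id.loc.gov behave along these lines, though the exact media types they honour vary. It assumes the Python requests package is installed.

import requests

# Hypothetical linked data URI used only for illustration.
uri = "http://example.org/id/person/1"

# A browser-style client asks for HTML, suitable for human readers.
as_html = requests.get(uri, headers={"Accept": "text/html"})

# A machine client asks the same URI for an RDF serialization instead.
as_turtle = requests.get(uri, headers={"Accept": "text/turtle"})

# The same resource, two representations negotiated over one interface.
print(as_html.headers.get("Content-Type"))
print(as_turtle.headers.get("Content-Type"))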
Main Data Models (Data Structures)

There are four main types of data structures: tabular data, the relational model, meta-markup languages and RDF triples. Figure 2 synthesizes the differences among these four types.

Figure 2 - Schematic comparison of the four major data models (Hooland & Verborgh, 2014).

Tabular Data

Data is organized in columns and rows, and the data in the intersections (cells) has its meaning defined by the column and row it belongs to. Each item (row) has the same fields (columns), and a header line can indicate their names. Tabular data is useful for importing and exporting data with a simple structure. It is intuitive to use, portable and independent of the technology used. However, search and retrieval are inefficient and data has to be repeated in many instances, increasing the risk of inconsistencies (Hooland & Verborgh, 2014).

Relational Model

More than one table is used to structure data, each with its own set of fields, and tables are interlinked by using key columns. The relational model minimizes the redundancies and inconsistencies of the tabular model. It is used to normalize and manage complex data. It allows better search and retrieval functions, through queries, but is schema-dependent (Hooland & Verborgh, 2014).

Meta-markup languages

Meta-markup languages structure data in a hierarchical way, starting with a single root that breaks down into children elements. They are employed to import and export complex data. They are machine-readable, and also human-readable with some training. They are independent of any platform but can be hard to implement for complex data. Their main disadvantage is verbosity (Hooland & Verborgh, 2014).

RDF (Triples)

Triples, or RDF (Resource Description Framework), structure data in statements consisting of subject, predicate and object. Each line connects a subject to an object through a predicate, expressing a precise relationship. There are no constraints on what can be connected to what, and the structure is easily extended by the addition of more triples. Triples are schema-neutral, and each triple is semantically complete (no need for additional documentation). Triples can be expressed in a graph format, and this data model allows logical inference and the linking of data. However, normalization is lost when using a triple data structure, and the software market is still immature for working with data in this format (Hooland & Verborgh, 2014).

SPARQL

SPARQL (SPARQL Protocol and RDF Query Language) is a query language for data structured in triples (RDF) in any serialization format. Queries in SPARQL are based on graph patterns and follow the subject-predicate-object triple structure (Hooland & Verborgh, 2014; Southwick, 2015).
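A minimal sketch of both ideas together, assuming the Python rdflib library is installed: a few invented triples under a hypothetical example.org namespace are loaded into a graph and then queried with a SPARQL graph pattern.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS

EX = Namespace("http://example.org/id/")   # hypothetical namespace

g = Graph()
# Three triples: subject - predicate - object.
g.add((EX["book/1"], DCTERMS.title, Literal("Linked Data for LAMs")))
g.add((EX["book/1"], DCTERMS.creator, EX["person/1"]))
g.add((EX["person/1"], DCTERMS.identifier, Literal("person-1")))

# The SPARQL graph pattern mirrors the subject-predicate-object structure.
query = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?title ?creator
WHERE {
  ?book dcterms:title ?title ;
        dcterms:creator ?creator .
}
"""
for row in g.query(query):
    print(row.title, row.creator)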
BIBFRAME

BIBFRAME (Bibliographic Framework) is a data model for bibliographic description expressed as an RDF vocabulary with classes and properties. It is compatible with linked data principles and is meant to replace the MARC standards. BIBFRAME is based on the Functional Requirements for Bibliographic Records (FRBR) model of work, expression, manifestation and item, consisting, however, of three core categories (work, instance, item), with additional key concepts related to the core classes (agent, subject, event) (Figure 3). Properties in BIBFRAME describe characteristics of the resource described as well as the relationships among resources (e.g. instance of, translation of) (Library of Congress, 2016).

Figure 3 - Illustration of the BIBFRAME 2.0 model, with three core levels of abstraction (in blue)—Work, Instance, Item—and three related classes (in orange)—Agent, Subject, Event. (Source: https://en.wikipedia.org/wiki/BIBFRAME)

Serialization Formats for Linked Data

Data structures have to be converted into a stream of bits in order to be manipulated by software or shared over a network. Serialization is the process of translating the data structure into such a format (Hooland & Verborgh, 2014). The serialization formats most relevant to linked data purposes are briefly described below.

XML

In XML the content is annotated with tags that describe its meaning. Tags are similar to the ones used in other markup languages such as HTML. XML enables data portability; however, it lacks semantics (it needs a schema in order to be interpreted) and is a verbose serialization format (Hay, 2006).

XSD XML Schema

XML Schema Definition (XSD) provides the semantics that are lacking in an XML format. It lists element and attribute names, relationships, data structure, and data types (Legg, 2007).

JSON and JSON-LD

JSON-LD, or JavaScript Object Notation for Linked Data, was developed based on JSON, enhancing it by providing additional mappings to an RDF model. It is aimed at encoding linked data. The additional mappings provide context by linking object properties to concepts in an ontology (JSON-LD, n.d.).

RDF/XML

RDF/XML, the XML serialization of the Resource Description Framework, is the main serialization format for linked data. As in XML files, content is annotated with tags for describing semantics. The main tags are rdf:subject, rdf:predicate and rdf:object, forming a triple, with values being expressed as URIs (Hay, 2006).

RDF Schema

Resource Description Framework Schema (RDFS) extends RDF by adding tags to define domain, range, classes and subclasses. According to Hay (2006), “in RDFS, attributes and relationships are properties that are defined before assigning them to classes. Note that all relationships and attributes are considered optional many-to-many. There are no cardinality constraints in RDF” (Hay, 2006).

Turtle and N-Triples

Turtle (Terse RDF Triple Language) is a serialization format for the RDF (triple) data structure that is less verbose than RDF/XML, which makes it more compact and easier to read. It “provides ways to abbreviate such information, for example by factoring out common portions of URIs” (Turtle syntax, n.d.).

N-Triples is a line-based subset of Turtle. Each line is a triple statement, composed of subject, predicate and object separated by white space and terminated with a full stop. Predicates always have to be expressed by a URI, while subjects may be a URI or a blank node, and objects may be a URI, a blank node or a literal (string) (N-Triples, n.d.).
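A short sketch of the two notations side by side, assuming rdflib is installed: the same single triple is parsed from Turtle (which abbreviates with a prefix) and re-serialized as N-Triples (one fully spelled-out triple per line). The example.org URI is invented.

from rdflib import Graph

# Turtle: the dcterms prefix abbreviates the common portion of the URI.
turtle_doc = """
@prefix dcterms: <http://purl.org/dc/terms/> .
<http://example.org/id/book/1> dcterms:title "Linked Data for LAMs" .
"""

g = Graph()
g.parse(data=turtle_doc, format="turtle")

# N-Triples: one complete triple per line, URIs written out in full,
# terminated with a full stop.
print(g.serialize(format="nt"))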
OWL

OWL (Web Ontology Language) allows the precise definition of the concepts of an ontology (Hooland & Verborgh, 2014), extending RDF by allowing the definition of relationships between classes (union, intersection, complement, etc.), class cardinality, equality for both classes and individuals, property characteristics (symmetry, transitivity, functionality, etc.), and restrictions on property behaviour by class (e.g. assign the class UBC alumni to every record that has “UBC” as the institution issuing the degree) (Legg, 2007). OWL is based on an open world assumption, meaning that “anything can be true unless asserted otherwise” (Hay, 2006).

Linked Data Process

Linked data (and linked open data) projects should follow the process depicted in Figure 4. The phases are succinctly described below, with the exception of the implementation phase, which is more detailed as it has more specificities when compared to standard metadata projects. The information in this section was gathered from Hooland and Verborgh (2014) and Southwick (2015).

Figure 4 - Linked Data process diagram, based on Southwick (2015).

Planning
● Literature review
● Benchmarking
● Stakeholders
● Proof of concept and preliminary testing
● Securing resources and funds
● Ensuring top-level commitment to the project

Designing
● Selecting technologies
● Defining a data model (aka Metadata Application Profile (MAP), namespaces)
● Mapping
● Defining the rules to create URIs

Implementing
● Modelling: The first step to build a linked data application is to have data structured following the RDF data model, consisting of triples, with URIs as names for things.
● Cleaning: Ensuring that your metadata is consistent and well structured is of crucial importance, since it will affect the quality of the outputs of the reconciling and enriching steps of the linked data process. Data profiling methods and tools can be used to help diagnose problems in your metadata in a semi-automated manner, and to deduplicate, normalize and clean it.
● Reconciling: “Terms used in your metadata records can be reconciled with existing and well established vocabularies.” This is an easier-to-implement approach (when compared to full ontologies) to add some level of semantics to your metadata, providing URIs with useful information in a standardized format and links to related URIs. String matching can be used as a low-cost approach to connect your metadata to a controlled vocabulary (a small sketch follows this list).
● Enriching: Enriching consists of obtaining structured metadata from unstructured data. Using OCR and named-entity recognition on the full text of a digitized document, for example, more metadata about the contents of the text can be extracted and added in a structured way to the record, becoming available for linking. It is especially useful when dealing with large digitization projects, big data linking or in the realm of digital humanities. Named-entity recognition (NER): “NER currently provides the easiest and cheapest method of identifying and disambiguating topics in large volumes of unstructured textual documents” (Hooland & Verborgh, 2014).
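A minimal sketch of the reconciling step using fuzzy string matching from the Python standard library; the local terms and the tiny controlled vocabulary are invented, and a real project would reconcile against services such as id.loc.gov or the Getty vocabularies.

from difflib import get_close_matches

# A tiny, invented controlled vocabulary and some messy local subject terms.
controlled_vocabulary = ["Libraries", "Archives", "Linked data", "Metadata"]
local_terms = ["linked data", "Metdata", "archive"]

for term in local_terms:
    # Low-cost string matching: take the closest vocabulary term, if any.
    matches = get_close_matches(term.title(), controlled_vocabulary, n=1, cutoff=0.8)
    print(term, "->", matches[0] if matches else "no match")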
Publishing
● Publishing: Linked data should ideally be published in a format that allows both human and machine interpretation. REST APIs allow you to do that in an elegant and sustainable way, avoiding needless duplication and maintenance.
● The data set should:
○ be linked to other data sets
○ provide provenance of the metadata
○ explicitly indicate the license for use
○ adopt terms from well-established controlled vocabularies
○ use dereferenceable URIs
○ map local vocabulary terms to other vocabularies
○ provide set-level metadata
○ provide more than one way to access the data set (e.g., SPARQL endpoint and RDF dumps)

Consuming
● Final user interface
● APIs (see the sketch below)
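As an illustration of machine consumption, the sketch below sends a query to a remote SPARQL endpoint with the Python SPARQLWrapper package (assumed installed). The endpoint URL is a placeholder; a real one, such as the British National Bibliography endpoint described in the cases below, would be substituted.

from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://example.org/sparql"   # placeholder endpoint URL

client = SPARQLWrapper(ENDPOINT)
client.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 5
""")
client.setReturnFormat(JSON)

results = client.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])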
Value and Challenges for Libraries and Archives

According to the results of the survey on LD adoption conducted by the Online Computer Library Center (OCLC) (Mitchell, 2016a), libraries and archives engaging in linked data projects are looking for:

● “enriching bibliographic metadata or descriptions,”
● “interlinking,”
● “a reference source that harmonize data from multiple sources,”
● “automate authority control,”
● “enrich an application,”
● “to publish data more widely,”
● “to demonstrate potential use cases and impact.”

The same survey highlights some of the challenges libraries and archives are facing in order to implement those projects:

● Lack of a formalized and established implementation approach across institutions.
● The lack of an easy-to-implement approach demands high-level technological skills from staff.
● Immature software market and tools.
● Non-standard approaches and guidelines to data licensing for published data.
● Lack of integration of authority resources with linked data tools and services.

Linked Data Cases in Libraries and Archives

The cases described below were selected based on the number of requests per day, meaning that they are the most popular resources among the ones listed in OCLC’s 2014 survey (Mitchell, 2016a; 2016b). Examples from both libraries and archives were selected. Some interesting results from the survey are summarized below:

● “The most commonly used LD data sources (vocabularies) were id.loc.gov, DBpedia, GeoNames, and VIAF.”
● “Data in the projects analyzed was often bibliographic or descriptive in nature.”
● “The most common organizational schemas used were Simple Knowledge Organization System (SKOS), Friend of a Friend (FOAF), Dublin Core and Dublin Core terms, and Schema.org.”
● “Resource Description Framework (RDF) serialized in the eXtensible Markup Language (XML) was commonly used, as was RDF serialized in JavaScript Object Notation (JSON) and Terse RDF Triple Language (Turtle).”

WorldCat Linked Data Project (Libraries)

Link: https://www.worldcat.org/
Number of requests/day: an average of 16 million (OCLC Research, 2014)
Technology used: URIs, RDF, keyword search.

WorldCat was enhanced in 2014 by the “addition of URIs from WorldCatWorks (OCLC 2014d), an RDF dataset that is automatically generated from WorldCat catalog records and identifies common content in the editions and formats of particular books, sound recordings, and other resources held in library collections” (Godby et al., 2015). The motivation behind the project was to make WorldCat records more useful, “especially to search engines, developers, and services on the wider Web, beyond the library community,” and “easier for search engines to connect non-library organizations to library data” (Godby et al., 2015).

Library of Congress’s (LoC) id.loc.gov service (Libraries)

Link: http://id.loc.gov/about/
Number of requests/day: over 100,000 (OCLC Research, 2014)
Technology used: URIs, keyword search, REST API. RDF/XML, Turtle, and N-Triples are available for bulk download for the authorities and vocabularies (MADS/RDF and SKOS/RDF representations of the data).

“The Library of Congress Linked Data Service enables both humans and machines to programmatically access authority data at the Library of Congress. The scope of the Linked Data Service is to provide access to commonly found standards and vocabularies promulgated by the Library of Congress. This includes data values and the controlled vocabularies that house them. The main application provides resolvability to values and vocabularies by assigning URIs. Each vocabulary possesses a resolvable URI, as does each data value within it. URIs accessible at id.loc.gov only link to authority data -- that is, controlled vocabularies and the values within them. Therefore, users will not find identifiers for electronic bibliographic resources. The Library of Congress uses other identifier schemes such as Handles for this purpose.” (Library of Congress, n.d.)

British Library’s British National Bibliography (Libraries)

Link: http://bnb.data.bl.uk/
Number of requests/day: 10,000 – 50,000 (OCLC Research, 2014)
Technology used: URIs, RDF, keyword search, SPARQL queries.

“The BNB Linked Data Platform provides access to the British National Bibliography published as linked open data and made available through SPARQL services. Two different interfaces are provided: a SPARQL editor, and /sparql, a service endpoint for remote queries. The Linked Open BNB is a subset of the full British National Bibliography. It includes published books (including monographs published over time), serial publications and new and forthcoming books, representing approximately 3.9 million records. The dataset is available under a Creative Commons CC0 1.0 Universal Public Domain Dedication licence.” (British Library, n.d.)

American Numismatic Society’s thesaurus (Archives)

Link: http://nomisma.org/
Number of requests/day: 10,000 – 50,000
Technology used: URIs, RDF/XML, JSON-LD, Turtle, KML, SPARQL queries.
“Nomisma.org is a collaborative project to provide stable digital representations of numismatic (relating to or consisting of coins, paper currency, and medals) concepts according to the principles of Linked Open Data. These take the form of http URIs that also provide access to reusable information about those concepts, along with links to other resources. The canonical format of nomisma.org is RDF/XML, with serializations available in JSON-LD (including geoJSON-LD for complex geographic features), Turtle, KML (when applicable), and HTML5+RDFa 1.1.” (Nomisma, n.d.)

Archaeology Data Service Linked Open Data (Archives)

Link: http://data.archaeologydataservice.ac.uk/page/
Number of requests/day: fewer than 1,000 (Mitchell, 2016a)
Technology used: URIs, RDF/XML, SPARQL queries.

The Archaeology Data Service “preserves digital data in the long term, and promotes and disseminates a broad range of data in archaeology, using a variety of avenues, including Linked Open Data. Linked Data at the ADS was initially made available through the STELLAR project (http://hypermedia.research.southwales.ac.uk/kos/stellar/), a joint project between the University of South Wales, the ADS and Historic England. The STELLAR project developed an enhanced mapping tool for non-specialist users to map and extract archaeological datasets into RDF/XML, conforming to the CRM-EH ontology (an extension of CIDOC CRM for archaeology). The results of the STELLAR project are published from the ADS SPARQL endpoint. ADS also consumes LOD from other sources (Library of Congress, Ordnance Survey, GeoNames, DBpedia and the vocabularies developed as part of the SENESCHAL project - http://www.heritagedata.org/blog/about-heritage-data/seneschal) to populate the metadata held within our Collection Management System with URIs, and then publishes the resource discovery metadata for all our archives via our SPARQL endpoint.” (The University of York, n.d.)

Sources of data about LD projects in Libraries, Archives and Museums

These sources offer a comprehensive list of linked data initiatives in libraries, archives and museums, as well as use cases and frameworks for application.

OCLC survey on LD adoption (2015)

“In 2014, OCLC staff conducted a survey on LD adoption, a survey that is being repeated for 2015. The analyzed results from the 2014 survey are captured in a series of blog posts on the site hangingtogether.org and provide a substantial window into the state of LD deployment in LAM institutions. The survey surfaced 172 projects, of which 76 included substantial description.
Of those 76 projects, over a third (27) were in development.” (Mitchell, 2016a)

Links:
Home: http://www.oclc.org/research/themes/data-science/linkeddata.html
Blog: http://hangingtogether.org/?p=4137

Library Linked Data Incubator Group (LLD XG) wiki (2011)

The mission of the LLD XG, chartered from May 2010 through August 2011, has been "to help increase global interoperability of library data on the Web, by bringing together people involved in Semantic Web activities — focusing on Linked Data — in the library community and beyond, building on existing initiatives, and identifying collaboration tracks for the future." (W3C Incubator, 2011). They offer a series of generalized and individual use cases of linked data.

Links:
Final report: https://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/
Use cases page: https://www.w3.org/2005/Incubator/lld/XGR-lld-usecase-20111025/
Use cases wiki: https://www.w3.org/2005/Incubator/lld/wiki/Use_Cases

Linked Data for Libraries (LD4L) (2016)

“The goal of the project is to create a Scholarly Resource Semantic Information Store (SRSIS) model that works both within individual institutions and through a coordinated, extensible network of Linked Open Data to capture the intellectual value that librarians and other domain experts and scholars add to information resources when they describe, annotate, organize, select, and use those resources, together with the social value evident from patterns of usage.” (Duraspace, 2016) They offer a series of generalized cases, clustered into six main areas including “Bibliographic + Curation” data, “Bibliographic + Person” data, “Leveraging external data including authorities,” “Leveraging the deeper graph,” “Leveraging usage data,” and “Three-site services” (e.g., enabling a user to combine data from multiple sources) (Mitchell, 2016a).

Links:
Project wiki: https://wiki.duraspace.org/display/ld4l/LD4L+Use+Cases
Paper about the project: http://ceur-ws.org/Vol-1486/paper_53.pdf

References

Alistair, M., Matthews, B., Beckett, D., Brickley, D., Wilson, M. and Rogers, N. (2005). SKOS: a language to describe simple knowledge structures for the web. http://epubs.cclrc.ac.uk/bitstream/685/SKOS-XTech2005.pdf

Berners-Lee, T. (2009). Linked Data. Retrieved October 10, 2017, from https://www.w3.org/DesignIssues/LinkedData.html

Berners-Lee, T., Hendler, J. and Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 34-43.

Bray, T., Hollander, D., Layman, A. and Tobin, R. (2006). Namespaces in XML 1.1 (2nd edn). W3C Recommendation. http://www.w3.org/TR/xml-names11/

British Library. (n.d.). Welcome to bnb.data.bl.uk. Retrieved October 10, 2017, from http://bnb.data.bl.uk/

Broughton, V. (2004). Essential Classification. Facet Publishing.

Duraspace. (2016). LD4L Use Cases. Retrieved October 10, 2017, from https://wiki.duraspace.org/display/ld4l/LD4L+Use+Cases

Fielding, R. T. (2000). Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine, CA.
Godby, C. J., Wang, S., & Mixter, J. K. (2015). Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource Description. Synthesis Lectures on the Semantic Web: Theory and Technology, 5(2), 1–154. http://doi.org/10.2200/S00620ED1V01Y201412WBE012

Harpring, P., & Baca, M. (2010). 1. Controlled Vocabularies in Context. In Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works (pp. 1–11). Retrieved from http://www.getty.edu/research/publications/electronic_publications/intro_controlled_vocab/

Hay, D. C. (2006). Data Modeling, RDF, & OWL - Part One: An Introduction To Ontologies. The Data Administration Newsletter, (April). Retrieved from http://www.tdan.com/view-articles/5025

Hooland, S., & Verborgh, R. (2014). Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata. Neal-Schuman.

JSON-LD. (2017, October 17). In Wikipedia, The Free Encyclopedia. Retrieved October 18, 2017, from https://en.wikipedia.org/w/index.php?title=JSON-LD&oldid=805736276

Legg, C. (2007). Ontologies on the Semantic Web. Annual Review of Information Science and Technology, 41(1), 407–451. http://doi.org/10.1002/aris.2007.1440410116

Library of Congress. (2016). Overview of the BIBFRAME 2.0 Model. Retrieved October 10, 2017, from https://www.loc.gov/bibframe/docs/bibframe2-model.html

Library of Congress. (n.d.). About Linked Data Service. Retrieved October 10, 2017, from http://id.loc.gov/about/

Miessler, D. (2015). The Difference Between URLs and URIs. Retrieved October 10, 2017, from https://danielmiessler.com/study/url-uri/

Miller, S. J. (2011). Metadata, Linked Data, and the Semantic Web. In Metadata for Digital Collections (pp. 303–324).

Mitchell, E. T. (2016a). Library Linked Data: Early Activity and Development. Library Technology Reports (Vol. 52). http://doi.org/10.5860/ltr.52n1

Mitchell, E. T. (2016b). Chapter 1. The Current State of Linked Data in Libraries, Archives, and Museums. Retrieved October 10, 2017, from https://journals.ala.org/index.php/ltr/article/view/5892/7446

N-Triples. (2017, September 24). In Wikipedia, The Free Encyclopedia. Retrieved October 18, 2017, from https://en.wikipedia.org/w/index.php?title=N-Triples&oldid=802208118

Nomisma. (n.d.). Retrieved October 10, 2017, from http://nomisma.org/

OCLC Research. (2014). Linked Data Survey results 1 – Who’s doing it (Updated). Retrieved October 10, 2017, from http://hangingtogether.org/?p=4137

Olson, J. (2003). Data Quality: The Accuracy Dimension. Morgan Kaufmann.

Southwick, S. B. (2015). A Guide for Transforming Digital Collections Metadata into Linked Data Using Open Source Technologies. Journal of Library Metadata, 15(1), 1–35. http://doi.org/10.1080/19386389.2015.1007009

The University of York. (n.d.). Archaeology Data Service Linked Open Data. Retrieved October 10, 2017, from http://data.archaeologydataservice.ac.uk/page/

Turtle (syntax). (2017, September 24). In Wikipedia, The Free Encyclopedia. Retrieved October 18, 2017, from https://en.wikipedia.org/w/index.php?title=Turtle_(syntax)&oldid=802208209

Uniform Resource Identifier. (2017, October 14). In Wikipedia, The Free Encyclopedia. Retrieved October 18, 2017, from https://en.wikipedia.org/w/index.php?title=Uniform_Resource_Identifier&oldid=805285595

University of British Columbia. (n.d.). Open Collections API Documentation. Retrieved October 10, 2017, from https://open.library.ubc.ca/docs

W3C Incubator. (2011). Library Linked Data Incubator Group Final Report. Retrieved October 10, 2017, from https://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/
Retrieved October 10, 2017, from https://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/      22  University of British Columbia School of Library, Archival and Information Studies Master of Library and Information Studies          LIBR 594 - Assignment 1  Linked Data in Libraries and Archives  Carolina Román Amigo Supervisor: Richard Arias Hernandez         October 2017    Table of Contents  Main Linked Data Concepts 3 Linked Data 3 Linked Data Principles 3 URL versus URI 3 Semantic Web 4 Metadata Schemas and Metadata Application Profiles (MAP) 4 Namespaces 5 Controlled vocabularies 5 SKOS 6 Ontologies 7 Representational State Transfer (REST) Application Programming Interfaces (APIs) 8 Main Data Models (Data Structures) 8 Tabular Data 9 Relational Model 10 Meta-markup languages 10 RDF (Triples) 10 SPARQL 10 BIBFRAME 11 Serialization Formats for Linked Data 12 XML 12 XSD XML Schema 12 JSON and JSON-LD 12 RDF/XML 12 RDF Schema 12 Turtle and N-Triples 13 OWL 13 Linked Data Process 13 Planning 13 Designing 14 Implementing 14 Publishing 14 Consuming 15 Value and Challenges for Libraries and Archives 15 Linked Data Cases in Libraries and Archives 16 1 WorldCat Linked Data Project (Libraries) 16 Library of Congress’s (LoC) id.loc.gov service (Libraries) 16 British Library’s British National Bibliography (Libraries) 17 American Numismatic Society’s thesaurus (Archives) 17 Archaeology Data Service Linked Open Data (Archives) 17 Sources of data about LD projects in Libraries, Archives and Museums 18 OCLC survey on LD adoption (2015) 18 Library Linked Data Incubator Group (LLD XG) wiki (2011) 18 Linked Data for Libraries (LD4L) (2016) 19 References 20     2 Main Linked Data Concepts Linked Data Linked data (LD) is the term used to refer to the set of technologies and best practices aimed                                   to prepare and publish data in a way it can be automatically interlinked and shared on the web                                   (Hooland & Verborgh, 2014). By using unique resource identifiers and a data structure based                           on triples, linked data provides meaningful links among related objects of different                       provenances, offering more information to the user and improving the discoverability of                       resources. LD makes use of common vocabularies to ensure understanding across a                       community. It is also machine-readable, enabling automated agents to interpret data                     semantically in a similar way a human would do. For that reason, Linked Data, and specifically,                               Linked Open Data (LD made freely available on the web), can be seen as a building block or a                                     practical implementation of a primitive version of the Semantic Web (Miller, 2011; Southwick,                         2015).  Linked Data Principles  The Linked Data principles, or building blocks, popularized by Berners-Lee are:  - “Use URIs as names for things.”​: URIs are uniform resource identifiers, allowing                       resources to be identified in a unique way anywhere in the universe (Hooland &                           Verborgh, 2014). Each element in a rdf triple should have a unique identifier: subject,                           predicate and object. - “Use HTTP URIs so that people can look up those names.”​: Adding an HTTP method to                               an URI allows users to access those names to get more information.   
- “When someone looks up a URI, provide useful information, using the standards (RDF,                         SPARQL).”​: Information provided should be relevant about the thing identified, it                     should be data that someone would like to know about the resource. - “Include links to other URIs so that they can discover more things.”​: Provide the                           relationships and other things that the URI is related to.   URL versus URI In summary, URIs are identifiers, while URLs are addresses. According to Hooland & Verborgh                           (2014):  “A URI, Uniform Resource Identifier, is a generalization of the concept that permits                         resources anywhere in the universe to be given a unique identification.” “A URL is a uniform resource locator, which, as the name says, enables to locate                             resources in a unique way.” 3  Every URL is also an URI, but not every URI is an URL. An URL is an URI added of a method                                           (such as HTTP) that provides access to a resource over a network. Technical standards such as                               W3C do not endorse the subdivision of URI in URLs, using rather a nomenclature such as HTTP                                 URIs to define URIs pointing to a network location. (Uniform Resource Identifier, 2017)                         However, although informal, URLs are recognized by the community as a useful concept and,                           according to Miessler (2015), it is best to use URL when referring to a URI containing both the                                   resource name and the method to access it, while URI is best used when referring directly to a                                   resource name.  Semantic Web The Semantic Web is the next level of evolution of the internet as we use it today, where                                   machines will be able to understand the semantic meaning of information provided by                         humans. It is a ​“framework for creating, managing, publishing and searching semantically rich                         information about web resources” (Alistair et al., 2005). This, according to Berners-Lee,                       Hendler and Lassila (2001), will ​“enable intelligent agents to autonomously perform tasks for                         us” (Berners-Lee et al., 2001). It is important to note that the terms “understand” and                             “intelligent” do not mean here the same for machines and humans. Machines are able to                             behave “intelligently” only in an operational sense, meaning that the are able to use the data                               provided according to predefined rules and to infer relationships using logic. In the semantic                           web, the data is annotated and ontologies, relationships and vocabularies (shared repositories                       of meaning) are provided so the web, that is constituted today mostly of human-readable                           information, becomes accessible to software agents.   Regarding information retrieval for humans, the Semantic Web overcomes several of the                       limitations of keyword based search engines (Alistair et al., 2005). 
It increases precision when                           searching for terms with multiple meanings, since the terms have extra information associated                         with them that allows the search to specify which meaning is the one he is looking for. It                                   provides a better recall as synonyms are taken in account, and, in a similar way, enables                               search for terms across languages. Limitations regarding retrieval of images, audio and video                         remain though, since they only become searchable when metadata is added to them.  Metadata Schemas and Metadata Application Profiles (MAP) In order to be machine-interoperable, metadata has to be structured and atomized. A                         metadata schema ensures the common interpretation of each metadata element, subelement                     and attributes, as well as its requirements, content guidelines, controlled vocabularies                     adopted, etc. (Hooland & Verborgh, 2014). When documented, a metadata schema becomes a                         metadata application profile (Miller, 2011). The term metadata schema can refer both to                         formally standardized element sets such as the Dublin Core, VRA 3.0 Core Categories, or DPLA                             MAP, or to locally established element sets developed to fulfill specific needs.  4 Some initiatives such the Europeana adopt the term “data model” for designating their                         metadata application profile documentation. Although technically correct, we find the term                     too generic to this application as a data model is anything that provides guidelines to structure                               data. Moreover, using data model to designate a metadata application profile can cause                         confusion since on the literature, the term is also used to designate data structures such as                               tabular data, relational data model, meta-markup and RDF. Namespaces  A namespace is a component of a metadata application profile. According to Hay (2006), ​“an                             XML namespace is the URI that describes an ontology from which terms are taken.”​(Hay, 2006)                             It allows consistent reuse of elements of metadata description already developed by someone                         else, ensuring a common understanding of how the data should be interpreted (Hooland &                           Verborgh, 2014). A prefix is usually added to each element indicating the namespace it comes                             from, and the full namespaces URIs are indicated at the beginning of the metadata schema                             file. The example below is an excerpt of the Portland Common Data Model RDF file available                               on Github (https://github.com/duraspace/pcdm/blob/master/models.rdf).  
The example below is an excerpt of the Portland Common Data Model RDF file available on GitHub (https://github.com/duraspace/pcdm/blob/master/models.rdf).

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="rdfs2html.xsl"?>
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:ldp="http://www.w3.org/ns/ldp#"
    xmlns:ore="http://www.openarchives.org/ore/terms/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:pcdm="http://pcdm.org/models#">

    <rdf:Description rdf:about="http://pcdm.org/models#">
        <dcterms:title xml:lang="en">Portland Common Data Model</dcterms:title>
        <dcterms:publisher rdf:resource="http://www.duraspace.org/"/>
        <rdfs:seeAlso rdf:resource="https://github.com/duraspace/pcdm/wiki"/>
        <rdfs:comment xml:lang="en">Ontology for the Portland Common Data Model,
            intended to underlie a wide array of repository and DAMS
            applications.</rdfs:comment>
        <owl:versionInfo>2016/04/18</owl:versionInfo>
        <owl:priorVersion rdf:resource="http://pcdm.org/2015/09/28/models"/>
    </rdf:Description>

Controlled vocabularies

According to Hooland & Verborgh (2014), a controlled vocabulary “represents a restricted subset of language which has been explicitly created to avoid the problems which arise with the use of natural language during the indexing and retrieval of information” (Hooland & Verborgh, 2014). That means that standardized words are used to represent concepts, establishing preferred terms to promote consistency (Harpring & Baca, 2010). Controlled vocabularies are a type of Knowledge Organization System (KOS). There are mainly three types of controlled vocabularies, as described below.

● Classification schemes: offer a way to physically group documents of similar content, using classes arranged systematically (Broughton, 2004). Example: Dewey Decimal Classification (DDC)

● Subject headings: describe the subject of specific resources in a succinct way, using one or a few words (Hooland & Verborgh, 2014). Example: Library of Congress Subject Headings (LCSH)

● Thesauri: represent an application domain in a logical way, building a structure of preferred and non-preferred terms, related terms, and broader and narrower terms (Hooland & Verborgh, 2014). Example: Art & Architecture Thesaurus (AAT)

Controlled vocabularies increase the chances of finding desired content even with imprecise keywords (related terms). They also increase recall (the proportion of the documents relevant to the search that were successfully retrieved) and precision (the proportion of retrieved documents relevant to the search). Better recall is possible because thesauri provide synonymy control, meaning that they take into account different words that may represent the same or similar concepts. Greater precision is possible because of polysemy control: when the same term is used to represent different concepts, the hierarchical structure of the thesaurus allows for disambiguation.
For example, apple the fruit would be in a different location (under a different broader term) than Apple the company.

However, controlled vocabularies are expensive to create and difficult to maintain, as it takes time and resources to keep them up to date. They are subjective and thus prone to express a specific, usually biased, world view. They can also be difficult for users to understand and use. They definitely have value in the linked data context, but each application should be evaluated in light of the pros and cons they offer.

SKOS

Simple Knowledge Organization System (SKOS) is a simplified language for representing controlled vocabularies on the web. It is designed to be easier to use than more complex ontology languages such as OWL, but still powerful enough to support semantically enhanced, machine-understandable search. It is meant to describe the content of books and other resources, not to formally describe aspects of the world through axioms and facts as ontologies do (Alistair et al., 2005). Figure 1 below is an example of how SKOS may be used to represent a thesaurus entry. Note the use of a prefix (skos), as in namespaces, and the labels of the elements (the predicates of the triples) representing classical thesaurus relations such as related, broader and narrower.

Figure 1 - Example of a thesaurus entry represented in SKOS (source: https://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/)

Ontologies

According to Harpring & Baca (2010), “an ontology is a formal, machine-readable specification of a conceptual model in which concepts, properties, relationships, functions, constraints, and axioms are all explicitly defined” (Harpring & Baca, 2010). Ontologies express knowledge about a domain and the relations between its concepts based upon an open world assumption, meaning that inference is allowed based on the information explicitly asserted. Unlike the closed world assumption, where only what is explicitly asserted is considered known, ontologies are able to complete incomplete information by applying their rules logically (Hay, 2006).

Ontologies should not be confused with controlled vocabularies. Controlled vocabularies are used by ontologies to express the vocabulary of a given domain, which is in turn used according to the grammar defined by the ontology. While controlled vocabularies aim to support cataloguing and retrieval, ontologies aim to represent knowledge in a machine-readable form. Concepts are organized into classes, individuals, attributes, relations and events (Harpring & Baca, 2010).
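To make the inference idea concrete, the sketch below asserts a tiny class hierarchy and lets an RDFS reasoner derive a statement that was never entered by hand. It is an illustrative sketch only, assuming the Python rdflib and owlrl libraries; the classes and URIs are invented and do not come from any vocabulary discussed here.

import owlrl
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")   # invented namespace, for illustration only

g = Graph()
g.add((EX.Thesis, RDFS.subClassOf, EX.Document))   # axiom: every thesis is a document
g.add((EX.item42, RDF.type, EX.Thesis))            # asserted fact about one resource

# Materialize the RDFS entailments: the reasoner adds the triples that follow
# logically from what was asserted.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

print((EX.item42, RDF.type, EX.Document) in g)     # True: inferred, never asserted directly

Under the open world assumption the reasoner only ever adds knowledge; the absence of a triple is never taken as evidence that the corresponding statement is false.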
Representational State Transfer (REST) Application Programming Interfaces (APIs)

APIs are sets of defined methods that make data accessible to machines. They have their own vocabulary and syntax, defining property names and labels and how the information is arranged (University of British Columbia, n.d.). APIs receive requests from programs and return data, typically in XML or JSON. The HTTP APIs that preceded REST APIs kept the human access interface and the machine access interface distinct, offering no way to integrate them. REST, on the other hand, provides access for human and machine consumers in the same way, avoiding duplication and minimizing maintenance. This is achieved through uniform interface constraints, consisting of:

- URLs that refer to an object (a piece of content) instead of a representation.
- Resource manipulation through representations, returning the representation most appropriate for each request, for example an HTML page for a person and JSON for a JavaScript application.
- Self-descriptive messages, which contain all the information necessary to understand and process them. For example, the second page of search results doesn't require the first page in order to be accessed.
- Hypermedia as the engine of application state, meaning that links are provided instead of bare identifiers, removing the need for separate documentation to understand the information (Hooland & Verborgh, 2014).

Main Data Models (Data Structures)

There are four types of data structures: tabular data, the relational model, meta-markup languages and RDF triples. Figure 2 synthesizes the differences among these four types.

Figure 2 - Schematic comparison of the four major data models (Hooland & Verborgh, 2014).

Tabular Data

Data is organized in columns and rows, and the data in the intersections (cells) has its meaning defined by the column and row it belongs to. Each item (row) has the same fields (columns), and a header line can indicate their names. Tabular data is useful for importing and exporting data with a simple structure. It is intuitive to use, portable and independent of the technology used. However, search and retrieval are inefficient, and data has to be repeated in many instances, increasing the risk of inconsistencies (Hooland & Verborgh, 2014).

Relational Model

More than one table is used to structure the data, each with its own set of fields, and tables are interlinked by using key columns. The relational model minimizes the redundancies and inconsistencies of the tabular model. It is used to normalize and manage complex data. It allows better search and retrieval functions, through queries, but is schema-dependent (Hooland & Verborgh, 2014).

Meta-markup languages

Meta-markup languages structure data in a hierarchical way, starting with a single root that breaks down into child elements. They are employed to import and export complex data. They are machine-readable, and also human-readable with some training. They are independent of any platform but can be hard to implement for complex data.
Their main disadvantage is their verbosity (Hooland & Verborgh, 2014).

RDF (Triples)

Triples, the building blocks of RDF (Resource Description Framework), structure data in statements consisting of a subject, a predicate and an object. Each triple connects a subject to an object through a predicate, expressing a precise relationship. There are no constraints on what can be connected to what, and the structure is easily extended by the addition of more triples. Triples are schema-neutral, and each triple is semantically complete (there is no need for additional documentation). Triples can be expressed in graph form, and this data model allows logical inference and the linking of data. However, normalization is lost when using a triple data structure, and the software market for working with data in this format is still immature (Hooland & Verborgh, 2014).

SPARQL

SPARQL (SPARQL Protocol and RDF Query Language) is a query language for data structured as triples (RDF), in any serialization format. Queries in SPARQL are based on graph patterns and follow the same subject-predicate-object triple structure (Hooland & Verborgh, 2014; Southwick, 2015).
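The sketch below shows both ideas side by side: a few triples are added to a graph and then retrieved with a SPARQL query whose WHERE clause is itself written as subject-predicate-object patterns. It is a minimal, hypothetical example assuming the Python rdflib library; the URIs are invented, and only the dcterms properties come from a real vocabulary.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
book = URIRef("http://example.org/book/1")       # invented URI
author = URIRef("http://example.org/person/1")   # invented URI

g.add((book, DCTERMS.title, Literal("Linked Data for Libraries", lang="en")))
g.add((book, DCTERMS.creator, author))

# The query is just another graph pattern: find any subject with a dcterms:title.
results = g.query("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?title
    WHERE { ?book dcterms:title ?title . }
""")

for row in results:
    print(row.title)   # -> Linked Data for Libraries

Because the query language mirrors the data model, the same pattern works unchanged against a remote SPARQL endpoint such as the ones mentioned later in this report.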
BIBFRAME

BIBFRAME (Bibliographic Framework) is a data model for bibliographic description expressed as an RDF vocabulary with classes and properties. It is compatible with linked data principles and is meant to replace the MARC standards. BIBFRAME is based on the Functional Requirements for Bibliographic Records (FRBR) model of work, expression, manifestation and item, but consists of three core categories (work, instance, item), with additional key concepts related to the core classes (agent, subject, event) (Figure 3). Properties in BIBFRAME describe characteristics of the resource described as well as the relationships among resources (e.g., instance of, translation of) (Library of Congress, 2016).

Figure 3 - Illustration of the BIBFRAME 2.0 model, with three core levels of abstraction (in blue)—Work, Instance, Item—and three related classes (in orange)—Agent, Subject, Event. (Source: https://en.wikipedia.org/wiki/BIBFRAME)

Serialization Formats for Linked Data

Data structures have to be converted into a stream of bits in order to be manipulated by software or shared over a network. Serialization is the process of translating a data structure into such a format (Hooland & Verborgh, 2014). The serialization formats most relevant for linked data purposes are briefly described below.

XML

In XML, content is annotated with tags that describe its meaning. The tags are similar to the ones used in other markup languages such as HTML. XML enables data portability; however, it lacks semantics (it needs a schema in order to be interpreted) and is a verbose serialization format (Hay, 2006).

XSD XML Schema

XML Schema Definition (XSD) provides the semantics that are lacking in an XML file. It lists element and attribute names, relationships, data structure, and data types (Legg, 2007).

JSON and JSON-LD

JSON-LD, or JavaScript Object Notation for Linked Data, was developed on top of JSON, enhancing it with additional mappings to an RDF model. It is designed to encode linked data. The additional mappings provide context by linking object properties to concepts in an ontology (JSON-LD, n.d.).

RDF/XML

RDF/XML is the main serialization format for linked data. As in XML files, content is annotated with tags that describe its semantics. The main tags are rdf:subject, rdf:predicate and rdf:object, forming a triple, with values expressed as URIs (Hay, 2006).

RDF Schema

Resource Description Framework Schema (RDFS) extends RDF by adding tags to define domains, ranges, classes and subclasses. According to Hay (2006), “in RDFS, attributes and relationships are properties that are defined before assigning them to classes. Note that all relationships and attributes are considered optional many-to-many. There are no cardinality constraints in RDF” (Hay, 2006).

Turtle and N-Triples

Turtle (Terse RDF Triple Language) is a serialization format for the RDF (triple) data structure that is less verbose than RDF/XML, which makes it more compact and easier to read. It “provides ways to abbreviate such information, for example by factoring out common portions of URIs” (Turtle syntax, n.d.).

N-Triples is a line-based subset of Turtle. Each line is a triple statement, composed of a subject, predicate and object separated by white space and terminated with a full stop. Predicates must always be expressed as URIs, while subjects may be a URI or a blank node, and objects may be a URI, a blank node or a literal (string) (N-Triples, n.d.).

OWL

OWL (Web Ontology Language) allows the precise definition of the concepts of an ontology (Hooland & Verborgh, 2014), extending RDF by allowing the definition of relationships between classes (union, intersection, complement, etc.), class cardinality, equality for both classes and individuals, property characteristics (symmetry, transitivity, functionality, etc.), and restrictions on property behaviour by class (e.g., assign the class UBC alumni to every record that has “UBC” as the degree-issuing institution) (Legg, 2007). OWL is based on the open world assumption, meaning that “anything can be true unless asserted otherwise” (Hay, 2006).

Linked Data Process

Linked data (and linked open data) projects should follow the process depicted in Figure 4. The phases are succinctly described below, with the exception of the implementation phase, which is described in more detail because it has more specificities compared to standard metadata projects. The information in this section was gathered from Hooland and Verborgh (2014) and Southwick (2015).
Figure 4 - Linked Data process diagram, based on Southwick (2015) (Southwick, 2015).

Planning

● Literature review
● Benchmarking
● Stakeholders
● Proof of concept and preliminary testing
● Securing resources and funds
● Ensuring top-level commitment to the project

Designing

● Selecting technologies
● Defining a data model (aka Metadata Application Profile (MAP) and namespaces)
● Mapping
● Defining the rules to create URIs

Implementing

● Modelling: The first step in building a linked data application is to have the data structured following an RDF data model, consisting of triples, with URIs as names for things.

● Cleaning: Ensuring that your metadata is consistent and well structured is of crucial importance, since it will affect the quality of the outputs of the reconciling and enriching steps of the linked data process. Data profiling methods and tools can be used to help diagnose problems in your metadata in a semi-automated manner, and to deduplicate, normalize and clean it.

● Reconciling: “Terms used in your metadata records can be reconciled with existing and well established vocabularies.” This approach is easier to implement than full ontologies and adds a level of semantics to your metadata, providing URIs with useful information in a standardized format and links to related URIs. String matching can be used as a low-cost approach to connect your metadata to a controlled vocabulary.

● Enriching: Enriching consists of obtaining structured metadata from unstructured data. Using OCR and named-entity recognition on the full text of a digitized document, for example, more metadata about the contents of the text can be extracted and added to the record in a structured way, becoming available for linking. It is especially useful when dealing with large digitization projects, big data linking, or work in the digital humanities. Named-entity recognition (NER): “NER currently provides the easiest and cheapest method of identifying and disambiguating topics in large volumes of unstructured textual documents” (Hooland & Verborgh, 2014). A minimal NER sketch follows this list.
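As a small illustration of the enriching step, the sketch below runs named-entity recognition over one invented sentence. It is a hypothetical example assuming the spaCy library and its small English model (en_core_web_sm); in a real project the input would be OCR output or transcripts from a collection, and the extracted strings would then be reconciled to URIs.

import spacy

# Assumes the small English model has been installed first, e.g.:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = ("MacMillan Bloedel operated sawmills near Port Alberni, "
        "British Columbia, during the 1950s.")

doc = nlp(text)
for ent in doc.ents:
    # Each recognized entity is a candidate for reconciliation against a
    # vocabulary such as GeoNames or VIAF, which turns the string into a URI.
    print(ent.text, ent.label_)   # e.g. "Port Alberni" GPE, "the 1950s" DATE

The NER output is not yet linked data; it becomes linked data only once each entity is reconciled to a URI, which is what the reconciling step above describes.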
Publishing

● Publishing: Linked data should ideally be published in a format that allows both human and machine interpretation. REST APIs allow you to do this in an elegant and sustainable way, avoiding needless duplication and maintenance.

● The data set should:
○ be linked to other data sets
○ provide provenance of the metadata
○ explicitly indicate the license for use
○ adopt terms from well-established controlled vocabularies
○ use dereferenceable URIs
○ map local vocabulary terms to other vocabularies
○ provide set-level metadata
○ provide more than one way to access the data set (e.g., SPARQL endpoint and RDF dumps)

Consuming

● Final user interface
● APIs

Value and Challenges for Libraries and Archives

According to the results of the survey on LD adoption conducted by the Online Computer Library Center (OCLC) (Mitchell, 2016a), libraries and archives engaging in linked data projects are looking for:

● “enriching bibliographic metadata or descriptions,”
● “interlinking,”
● “a reference source that harmonize data from multiple sources,”
● “automate authority control,”
● “enrich an application,”
● “to publish data more widely,”
● “to demonstrate potential use cases and impact.”

The same survey highlights some of the challenges libraries and archives are facing in implementing those projects:

● Lack of a formalized and established implementation approach across institutions.
● The absence of an easy-to-implement approach demands high-level technological skills from staff.
● Immature software market and tools.
● No standard approach or guidelines for licensing published data.
● Lack of integration of authority resources with linked data tools and services.

Linked Data Cases in Libraries and Archives

The cases described below were selected based on the number of requests per day, meaning that they are the most popular resources among the ones listed in OCLC’s 2014 survey (Mitchell, 2016a; 2016b). Examples from both libraries and archives were selected. Some interesting results from the survey are summarized below:

● “The most commonly used LD data sources (vocabularies) were id.loc.gov, DBpedia, GeoNames, and VIAF.”
● “Data in the projects analyzed was often bibliographic or descriptive in nature.”
● “The most common organizational schemas used were Simple Knowledge Organization System (SKOS), Friend of a Friend (FOAF), Dublin Core and Dublin Core terms, and Schema.org.”
● “Resource Description Framework (RDF) serialized in the eXtensible Markup Language (XML) was commonly used, as was RDF serialized in JavaScript Object Notation (JSON) and Terse RDF Triple Language (Turtle).”

WorldCat Linked Data Project (Libraries)

Link: https://www.worldcat.org/
Number of requests/day: an average of 16 million (OCLC Research, 2014)
Technology used: URIs, RDF, keyword search.
WorldCat was enhanced in 2014 by the ​“addition of URIs from WorldCatWorks (OCLC 2014d),                           an RDF dataset that is automatically generated from WorldCat catalog records and identifies                         common content in the editions and formats of particular books, sound recordings, and other                           resources held in library collections.” ​(Godby et al., 2015) The motivation behind the project                           was to make WorldCat records more useful, ​“—especially to search engines, developers, and                         services on the wider Web, beyond the library community” and ​“easier for search engines to                             connect non-library organizations to library data”​. (Godby et al., 2015) Library of Congress’s (LoC) id.loc.gov service (Libraries) Link: ​http://id.loc.gov/about/ Number of requests/day: over 100,000 (OCLC Research, 2014) Technology used: URIs, keyword search, REST API. RDF/XML, Turtle, or N-triples, are available                         for bulk download for the authorities and vocabularies (MADS/RDF and SKOS/RDF                     representations of the data).  “The Library of Congress Linked Data Service enables both humans and machines to                         programmatically access authority data at the Library of Congress. The scope of the Linked                           16 Data Service is to provide access to commonly found standards and vocabularies promulgated                         by the Library of Congress. This includes data values and the controlled vocabularies that                           house them. The main application provides resolvability to values and vocabularies by                       assigning URIs. Each vocabulary possesses a resolvable URI, as does each data value within it.                             URIs accessible at id.loc.gov only link to authority data -- that is, controlled vocabularies and                             the values within them. Therefore, users will not find identifiers for electronic bibliographic                         resources. The Library of Congress uses other identifier schemes such as ​Handles for this                           purpose.”​ (Library of Congress, n.d.) British Library’s British National Bibliography (Libraries) Link: ​http://bnb.data.bl.uk/ Number of requests/day: 10,000 – 50,000 (OCLC Research, 2014) Technology used: URIs, RDF, keyword search, SPARQL queries.   “The BNB Linked Data Platform provides access to the ​British National Bibliography published                         as linked open data and made available through SPARQL services. Two different interfaces are                           provided: a ​SPARQL editor​, and /sparql a service endpoint for remote queries. The Linked                           Open BNB is a subset of the full British National Bibliography. It includes published books                             (including monographs published over time), serial publications and new and forthcoming                     books, representing approximately 3.9 million records. The dataset is available under a                       Creative Commons CC0 1.0 Universal Public Domain Dedication​ licence.”​ (British Library, n.d.) American Numismatic Society’s thesaurus (Archives) Link: ​http://nomisma.org/ Number of requests/day: 10,000 – 50,000 Technology used: URIs, RDF/XML, JSON-LD, Turtle, KML, SPARQL queries.  
“Nomisma.org is a collaborative project to provide stable digital representations of                     numismatic (relating to or consisting of coins, paper currency, and medals) concepts according                         to the principles of ​Linked Open Data​. These take the form of http URIs that also provide                                 access to reusable information about those concepts, along with links to other resources. The                           canonical format of nomisma.org is RDF/XML, with serializations available in JSON-LD                     (including geoJSON-LD for complex geographic features), Turtle, KML (when applicable), and                     HTML5+RDFa 1.1.”​ (Nomisma, n.d.) Archaeology Data Service Linked Open Data (Archives) Link: ​http://data.archaeologydataservice.ac.uk/page/ Number of requests/day: fewer than 1,000 (Mitchell, 2016a) Technology used: URIs, RDF/XML, SPARQL queries. 17  The Archaeology Data Service “preserves digital data in the long term, and promotes and                           disseminating a broad range of data in archaeology, using a variety of avenues, including                           Linked Open Data. Linked Data at the ADS was initially made available through the STELLAR                             project (http://hypermedia.research.southwales.ac.uk/kos/stellar/), a joint project between the             University of South Wales, the ADS and Historic England. The STELLAR project developed an                           enhanced mapping tool for non-specialist users to map and extract archaeological datasets                       into RDF/XML, conforming to the CRM-EH ontology (an extension of CIDOC CRM for                         archaeology). The results of the STELLAR project are published from the ADS SPARQL                         endpoint. ADS also consumes LOD from other sources (Library of Congress, Ordnance Survey,                         GeoNames, DBpedia and the vocabularies developed as part of the SENESCHAL project -                         http://www.heritagedata.org/blog/about-heritage-data/seneschal) to populate the metadata         held within our Collection Management System with URIs, and then publishes the resource                         discovery metadata for all our archives via our SPARQL endpoint.”​ (The University of York, n.d.)  Sources of data about LD projects in Libraries, Archives and                   Museums These sources offer a comprehensive list of linked data initiatives in libraries, archives and                           museums, as well as use cases and frameworks for application. OCLC survey on LD adoption (2015) “In 2014, OCLC staff conducted a survey on LD adoption, a survey that is being repeated for                                 2015. The analyzed results from the 2014 survey are captured in a series of blog posts on the                                   site hangingtogether.org and provide a substantial window into the state of LD deployment in                           LAM institutions.1 The survey surfaced 172 projects, of which 76 included substantial                       description. 
Of those 76 projects, over a third (27) were in development.” ​(Mitchell, 2016a)  Links: Home: ​http://www.oclc.org/research/themes/data-science/linkeddata.html Blog: ​http://hangingtogether.org/?p=4137 Library Linked Data Incubator Group (LLD XG) wiki (2011) The mission of the LLD XG, chartered from May 2010 through August 2011, has been ​"to help                                 increase global interoperability of library data on the Web, by bringing together people                         involved in Semantic Web activities — focusing on Linked Data — in the library community and                               beyond, building on existing initiatives, and identifying collaboration tracks for the future."                       (W3C Incubator, 2011). They offer a series of generalized and individual use cases of linked                             data.  18  Links: Final report: ​https://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/ Use cases page: ​https://www.w3.org/2005/Incubator/lld/XGR-lld-usecase-20111025/  Use cases wiki: ​https://www.w3.org/2005/Incubator/lld/wiki/Use_Cases  Linked Data for Libraries (LD4L) (2016)  “The goal of the project is to create a Scholarly Resource Semantic Information Store (SRSIS)                             model that works both within individual institutions and through a coordinated, extensible                       network of Linked Open Data to capture the intellectual value that librarians and other domain                             experts and scholars add to information resources when they describe, annotate, organize,                       select, and use those resources, together with the social value evident from patterns of usage.”                             (Duraspace, 2016) They offer a series of generalized cases, clustered into six main areas                           including “Bibliographic + Curation” data, “Bibliographic + Person” data, “Leveraging external                     data including authorities,” “Leveraging the deeper graph,” “Leveraging usage data,” and                     “Three-site services” (e.g., enabling a user to combine data from multiple sources).” (Mitchell,                         2016a)  Links: Project wiki: ​https://wiki.duraspace.org/display/ld4l/LD4L+Use+Cases Paper about the project: ​http://ceur-ws.org/Vol-1486/paper_53.pdf       19 References  Alistair, M., Matthews, B., Beckett, D., Brickley, D., Wilson, M. and Rogers, N. (2005) SKOS: a language to describe simple knowledge structures for the web, http://epubs.cclrc.ac.uk/bitstream/685/SKOS-XTech2005.pdf  Berners-Lee, T. (2009). Linked Data. Retrieved October 10, 2017, from https://www.w3.org/DesignIssues/LinkedData.html  Berners-Lee, T., Hendler, J. and Lassila, O. (2001) The Semantic Web, Scientific American, 284 (5), 34-43.  Bray, T. Hollander, D., Layman, A. and Tobin, R. (2006) Namespaces in XML 1.1, 2nd edn, W3C Recommendation, http://www.w3.org/TR/xml-names11/.  British Library. (n.d.). Welcome to bnb.data.bl.uk. Retrieved October 10, 2017, from http://bnb.data.bl.uk/  Broughton, V. (2004) Essential Classification, Facet Publishing.  Duraspace. (2016). LD4L Use Cases. Retrieved October 10, 2017, from https://wiki.duraspace.org/display/ld4l/LD4L+Use+Cases  Fielding, R. T. (2000) Architectural Styles and the Design of Network-based Software Architectures, PhD thesis, University of California, Irvine, CA.  Godby, C. J., Wang, S., & Mixter, J. K. (2015). Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource Description. 
Synthesis Lectures on the Semantic Web: Theory and Technology, 5(2), 1–154. http://doi.org/10.2200/S00620ED1V01Y201412WBE012  Harpring, P., & Baca, M. (2010). 1. Controlled Vocabularies in Context. In Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works (pp. 1–11). Retrieved from http://www.getty.edu/research/publications/electronic_publications/intro_controlled_vocab/  Hay, D. C. (2006). Data Modeling, RDF, & OWL - Part One: An Introduction To Ontologies. The Data Administration Newsletter, (April). Retrieved from http://www.tdan.com/view-articles/5025  20 Hooland, S.; Verborgh, R. (2014). Linked data for libraries, archives and museums : how to clean, link and publish your metadata. Neal-Schuman.  JSON-LD. (2017, October 17). In Wikipedia, The Free Encyclopedia. Retrieved October 18, 2017, from https://en.wikipedia.org/w/index.php?title=JSON-LD&oldid=805736276  Legg, C. (2007). Ontologies on the Semantic Web. Annual Review of Information Science and Technology, 41(1), 407–451. http://doi.org/10.1002/aris.2007.1440410116  Library of Congress. (2016). Overview of the BIBFRAME 2.0 Model. Retrieved October 10, 2017, from https://www.loc.gov/bibframe/docs/bibframe2-model.html  Library of Congress. (n.d.). About Linked Data Service. Retrieved October 10, 2017, from http://id.loc.gov/about/  Miessler, D. (2015). The Difference Between URLs and URIs. Retrieved October 10, 2017, from https://danielmiessler.com/study/url-uri/  Miller, S. J. (2011). Metadata, Linked Data, and the Semantic Web. In Metadata for Digital Collections (pp. 303–324).  Mitchell, E. T. (2016a). Library Linked Data: Early Activity and Development. Library Technology Reports (Vol. 52). http://doi.org/10.5860/ltr.52n1  Mitchell, E. T. (2016b). Chapter 1. The Current State of Linked Data in Libraries, Archives, and Museums. Retrieved October 10, 2017, from https://journals.ala.org/index.php/ltr/article/view/5892/7446  N-Triples. (2017, September 24). In Wikipedia, The Free Encyclopedia. Retrieved October 18, 2017,  from https://en.wikipedia.org/w/index.php?title=N-Triples&oldid=802208118  Nomisma. (n.d.). Retrieved October 10, 2017, from http://nomisma.org/  OCLC Research. (2014). Linked Data Survey results 1 – Who’s doing it (Updated). Retrieved October 10, 2017, from http://hangingtogether.org/?p=4137  Olson, J. (2003) Data Quality: the accuracy dimension, Morgan Kaufmann.  Southwick, S. B. . (2015). A Guide for Transforming Digital Collections Metadata into Linked Data Using Open Source Technologies. Journal of Library Metadata, 15(1), 1–35. http://doi.org/10.1080/19386389.2015.1007009  21 The University of York. (n.d.). Archaeology Data Service Linked Open Data. Retrieved October 10, 2017, from http://data.archaeologydataservice.ac.uk/page/  Turtle (syntax). (2017, September 24). In Wikipedia, The Free Encyclopedia. Retrieved October 18, 2017, from https://en.wikipedia.org/w/index.php?title=Turtle_(syntax)&oldid=802208209  Uniform Resource Identifier. (2017, October 14). In Wikipedia, The Free Encyclopedia. Retrieved October 18, 2017, from https://en.wikipedia.org/w/index.php?title=Uniform_Resource_Identifier&oldid=805285595  University of British Columbia. (n.d.). Open Collections API Documentation. Retrieved October 10, 2017, from https://open.library.ubc.ca/docs  W3C Incubator. (2011). Library Linked Data Incubator Group Final Report. 
Retrieved October 10, 2017, from https://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/      22  University of British Columbia School of Library, Archival and Information Studies Master of Library and Information Studies       LIBR 594 - Assignment 2  Linked Data Web Application: use cases of UBC Open Collections API  Carolina Román Amigo  Supervisor: Richard Arias Hernandez, UBC SLAIS Instructor Co-supervisor: Paul Joseph, UBC Systems Librarian        November 2017    Table of Contents  Introduction 2 Collections 2 MacMillan Bloedel Limited fonds 2 UBC Institute of Fisheries Field Records 3 Vocabularies 4 Geonames Ontology 4 Encyclopedia of life 4 Tools 5 CARTO Builder 5 OpenRefine 5 Implementation Process 6 MacMillan Bloedel Limited fonds 6 Modelling 6 Cleaning 7 Reconciling 9 Building interface 12 UBC Institute of Fisheries Field Records 18 Modelling 18 Cleaning 19 Reconciling 22 Building interface 25 References 29       1 Introduction  This project aims to explore the potential uses of the ​Open Collections Research API for                             linked data projects. The API is made available by University of British Columbia                         Libraries to provide machine-readable access to collections metadata and transcripts.                   We developed simple web applications providing data visualizations of collections                   metadata linked to external controlled vocabularies, providing an enhanced view of                     UBC’s digital repository. We describe below the collections, controlled vocabularies and                     tools used in this project.  Collections MacMillan Bloedel Limited fonds https://open.library.ubc.ca/collections/macmillan  The MacMillan Bloedel Limited fonds collection contains 2781 still images depicting the                       history of MacMillan Bloedel forest products company. Metadata contains names of                     locations (Library of Congress format) across the world, mainly in British Columbia. The                         goal for this use case is to link locations to a controlled vocabulary and plot records on                                 a map, getting a preview of title, date and image thumbnail for each location.  Link for the demo: https://carolamigo.carto.com/builder/881d9644-0439-4131-ad92-c16bcb1e2608/embed 2  UBC Institute of Fisheries Field Records https://open.library.ubc.ca/collections/fisheries  The UBC Institute of Fisheries Field Records contains 11021 still images depicting pages of                           notebooks describing fish specimens collected around the world over a period of more                         than 100 years. Metadata contains latitude and longitude and species of fish collected                         for each record. The goal of this use case is to link fish species to a controlled                                 vocabulary and plot records on a map, getting a preview of title and a link to species                                 found in each location.  Link for the demo: https://carolamigo.carto.com/builder/df748d01-2fc8-424b-ae46-dfd834b200a0/embed 3  Vocabularies Geonames Ontology http://www.geonames.org/ontology/documentation.html  Geonames is an ontology that provides geospatial semantic information to be used in the                           World Wide Web. It provides over 11 million geonames toponyms with unique URIs                         and an API service that can be used for reconciliation.  
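As a quick, hypothetical illustration of that search API (outside the project workflow itself), the sketch below sends a request to the GeoNames web service and reads back the coordinates and GeoNames identifier of the best match. It assumes the Python requests library and a registered GeoNames username; the placeholder demo_user must be replaced with a real account name.

import requests

# Hypothetical example of the GeoNames search web service; a registered
# username is required (replace "demo_user" with your own account name).
params = {
    "q": "Nanaimo River",
    "country": "CA",
    "maxRows": 1,
    "username": "demo_user",
}
response = requests.get("http://api.geonames.org/searchJSON", params=params, timeout=30)
match = response.json()["geonames"][0]

print(match["name"], match["lat"], match["lng"])
print("URI: http://sws.geonames.org/{}/".format(match["geonameId"]))

The OpenRefine reconciliation service used in the MacMillan Bloedel walkthrough below wraps essentially this kind of lookup, returning a matched name, coordinates and URI for each cell.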
Encyclopedia of Life
http://eol.org/

The goal of the Encyclopedia of Life is to provide access to knowledge about life on Earth by aggregating information scattered around the world in books, journals, databases, websites and specimen collections. It is the result of a collaborative initiative among academic institutions and the community. Although it is not a controlled vocabulary, it offers a unique URI for each record and a reconciliation service for OpenRefine.

Tools

CARTO Builder
https://carto.com/builder/

CARTO Builder is a web-based georeferenced visualization tool that allows you to easily build interfaces based on tabular data. At the time this report was written it was free for anyone to use, provided you agreed to make your data publicly available. CARTO also offers special free licences for students (see the GitHub Student Developer Pack).

OpenRefine
http://openrefine.org/

OpenRefine is a powerful tool for converting, cleaning and enriching data. It is free to use and offers linked data extensions and reconciliation services to compare and combine related datasets.

Implementation Process

The implementation process for each use case is described in detail in this section, in order to document the work and allow this project to be reproduced. The structure follows the workflow for Linked Data projects described by Hooland & Verborgh (2014).

MacMillan Bloedel Limited fonds

GitHub repository: https://github.com/carolamigo/ubc_carto_macmillan

Modelling

● Download the collection metadata using the Open Collections Research API. A PHP script for batch downloading is provided on the OC API Documentation page > Download Collection Data. This script returns a folder containing one RDF file per collection item (or XML, JSON, or any other format preferred). We are going to use N-Triples because the files are cleaner (no headers or footers), which makes the merging easier later. Edit the script following the instructions on the documentation page and run it using the command:

$ php collection_downloader.php --cid macmillan --fmt ntriples

● Merge the files using the Unix cat command:

$ cat * > merged_filename

● Convert the merged file to a tabular format. Import it as a project in OpenRefine using the RDF/N3 files option. No character encoding selection is needed.

Challenges

- My first attempt to get the collection metadata was to use Postman to send a POST query to the OC API using a key, as described on the OC API Documentation page. However, queries requesting more than 1000 items are not completed. Our collection is almost three times larger than that limit, meaning that the data would have to be retrieved in batches and merged together in OpenRefine.

- OpenRefine is able to open N-Triples files, but on my first try I had problems with predicates that are used more than once within the same record. For example, the predicate “subject”, used for keywords related to the resource, is repeated in several triples within a record. OpenRefine reads predicates as column names (when using any RDF or triple-based import option), and it doesn’t allow repeated column names. On my second try everything worked perfectly, so I believe a small formatting problem in my previous file was causing the problems with OpenRefine.

Cleaning

● Examine the metadata for Geographic Locations.
With the tabular data open in OpenRefine, look for the column “​http://purl.org/dc/terms/spatial​”. In the column options, select “Facet” > “Text facet”. The facets show you all the unique values for geographic locations in this dataset. From that list, it is possible to see that:  ○ The location names are following Library of Congress formatting style, with the province following the name of the city, and that they are in between double quotes with the language notation following:   e.g. “Alberni (B.C.)”@en  ○ Some location names have small typos:   "Namaimo River (B.C.)”@en  ○ Some resources have more than one geographic location associated with it:   e.g. "Powell River (B.C.) ; Nanaimo (B.C)"@en  ● Split the cells containing more than one geographic location value.   ○ Duplicate the “​http://purl.org/dc/terms/spatial​” column using “Edit column” > “Add column based on this column” in order to preserve original values. Name the new column: spatial_cleaned ○ On the spatial_cleaned column, select “Edit cells” > “Split multi-valued cells”. The separator is “;”.  7 ● Remove double quotes, provinces and “@en” from location names. Select “Edit cells” > “Transform” and write the following expression:  value.replace("\"", " ").replace("@en"," ").replace(/\(([^\)]+)\)/," ")  ● Trim leading and trailing whitespaces by selecting “Edit cells” > “Common transforms” > “Trim leading and trailing whitespaces”.  ● Cluster location names in order to combine entries with typos and small inconsistencies under just one geographic location name. On the spatial_cleaned column, select “Facet” > “Text facet”, then select “Cluster” in the facet window. In the cluster window, select Nearest neighbour” method. Select the “merge” box for “Nanaimo River”, correct the typo, and select “Merge selected and close”.    ● Fill down column “subject”, “​http://purl.org/dc/terms/title​”, “http://purl.org/dc/terms/created” and “​http://www.europeana.eu/schemas/edm/isShownAt​” as we have several orphan cells resulting from the triple to tabular data format conversion. Go to each column, “Edit cells” > “Fill down”.  8 Challenges  - Creating the right expression to manipulate data in OpenRefine can be challenging if you are not used to the GREL syntax and to regular expressions. The Regex Builder (​https://regexr.com/​) and the OpenRefine documentation on GitHub (​https://github.com/OpenRefine/OpenRefine/wiki/Understanding-Expressions​) are helpful resources.  Reconciling  ● Configure Geonames reconciliation service in OpenRefine following the procedure described here: ​https://github.com/cmh2166/geonames-reconcile​. This procedure involves getting a Geonames API username, installing python packages, cloning the code in the GitHub repository above, and running the script provided in the code.   ● Perform reconciliation on the “spatial_cleaned” columns using “geonames/name_equals” option. Follow the steps described here: http://christinaharlow.com/walkthrough-of-geonames-recon-service​.  ● When reconciliation is finished, review the results. An easy way to do it is to facet by text (“Facet” > “Text facet”) and filter results by clicking on a location name on the facet menu. On the spatial_cleaned column, click on the link for the location to check the reconciled value. This will open the Geonames window with location information. If it is correct, no further action is required. If it is wrong (e.g. Alice Lake found is in B.C. 
but geonames returned a lake with the same name in Michigan), click on “Choose a new match” under any of the wrong entries on the spatial_cleaned column. Three options will show up. Select the correct one by using the double checked box, which means your decision will be applied to all other cells that match this condition. If no correct option show up in the cell, click on the double checked box of “Create new topic”, meaning that no reconciliation value will be added to cells that match this condition. There are 66 unique values in this dataset, so it is possible to review one by one until it is done. 9   ● Verify reconciliation results that didn’t find a match (including the ones you had to “create a new topic” for) by selecting the “none” facet in the judgment box. I have found the following ones with no matches, so I had to add coordinates manually for those by looking up manually in the Geonames database using the Geonames search box (​http://www.geonames.org/​). To mass edit a value, click on the edit link that appear next to the exclude link for the value in the facet window. Enter the value and click “Apply”.    10   ● Extract reconciled data retrieving name, id and latitude/longitude, as a string separated by “|”. Select spatial_cleaned, “Edit cells” > “Transform”, and entering the following expression:  cell.recon.match.name + " | " + cell.recon.match.id  ● Split the values obtained in the reconciliation, that should be in this format:  Nanaimo River | 49.1304, -123.89385 | ​http://sws.geonames.org/6951400  Select spatial_cleaned, “Edit column” > “Split into several columns”. Select the “|” separator and name the columns according after split: geonames_names, geonames_coord, geonames_uri.  ● The “geonames_coord” column has to be further split in latitudes and longitudes using the same command above, “Edit column” > “Split into several columns”, with separator “,”. Name the columns “lat” and “long”. Trim leading and trailing whitespaces by selecting “Edit cells” > “Common transforms” > “Trim leading and trailing whitespaces”.  Challenges  - Setting up the Geonames reconciliation service on OpenRefine was not straightforward, since I had to install more than one Python package to make it work.  - My first try to reconcile the data was to download the Geonames database dump for the countries that appear in the collection and constructing URIs based on each location id using OpenRefine. Then I used VLOOKUP function on Excel to reconcile the collections list of locations with the geonames spreadsheet, using the timezone info to check if the location retrieved is correct. The results were good, but this is a more laborious way that 11 may not apply to larger datasets. Manipulating the database dump just for the countries appearing in our collection (CA, US, NZ, AU) was already difficult because of its large size.  Building interface  ● Prepare the data to interface. In order to have links and images on CARTO interface, we have to add html tags in the source dataset.   ○ Remove double quotes and language. Create a new column “title” based on the column “​http://purl.org/dc/terms/title​”, using the following expression:  value.replace("\"", " ").replace("@en"," ")  ○ Add html tags for title links. Create a new column “title_link” based on the column “subject”, using the following expression:  "<a href=\""+value+"\">"+if(isBlank(cells["title"].value), " ", cells["title"].value)+"<\/a>"  ○ Remove double quotes and language. 
Create a new column “date” based on the column “http://purl.org/dc/terms/created”, using the following expression:  value.replace("\"", " ").replace("@en"," ")  ○ Add html tags for location links. Create a new column “geoname_link” based on the column “geonames_uri”, using the following expression:  "<a href=\""+value+"\">"+if(isBlank(cells["geonames_names"].value), " ", cells["geonames_names"].value)+"<\/a>"  ○ Add html tags and links for images. Create a new column “image_link” based on the column “http://www.europeana.eu/schemas/edm/isShownAt”, using the following expression:  "<img width=\"188\" src=\"http://iiif.library.ubc.ca/image/cdm.macmillan."+value.substring(10,19).replace(".","-")+".0000" + "/full/150,/0/default.jpg\"/>" 12  ● Export the dataset from OpenRefine in .csv format. Name the file “mcmillan_cleaned”.  ● Sign up or Log in to CARTO Builder: ​https://carto.com/signup/​. Create a new map and import the Open Refine exported file to your map.  ● Georeference your dataset following the instructions here: https://carto.com/learn/guides/analysis/georeference​. Once in your map, click on the dataset on the left menu, then click on “Analysys” > “Add analysys” > Georeference. Select the corresponding column names in your dataset for latitude and longitude (lat and long). Note that the application is plotting just one resource per location, so we will need to aggregate the results to have all the resources plotted.     ● Export the georeferenced dataset from CARTO in csv format, in order to incorporate the “the_geom_webmercator” column (with georeferenced data) in your dataset. Name the file “mcmillan_cleaned_geo”. Import the dataset back into CARTO map, and delete the previous dataset from your map. This step is necessary since CARTO does not allow georeference analysis and SQL manipulation (that we will need for aggregation) of the data concomitantly.   ● Click on the dataset on the lateral menu and select the “Data” tab. At the bottom of the lateral panel, enable SQL option. Paste the following query in the editor and click “Apply”:  SELECT string_agg(DISTINCT CONCAT (date, ' <br>', title_link, ' <br><br>', image_link, ' '),' <br><br><br><br>') as new_column_aggregated, geoname_link, the_geom_webmercator, Min(cartodb_id) cartodb_id 13 FROM mcmillan_cleaned_geo group by geoname_link, the_geom_webmercator  ● Click on the “Pop-up” tab, and enter the following settings:  ○ Style: color ○ Window size: 240 ○ Header color: #b6dc9 ○ Show item: check all boxes  ● Click on any point on the map, you should see something similar to this:    ● To build the filters:  ○ Click on “mcmillan_cleaned_geo” dataset, “style” tab, and use the following settings:  ■ Aggregation: none ■ Size 10, color #d1c4c4 ■ Stroke 1 color #FFFFFF ■ Blending: darken  ○ Exit “mcmillan_cleaned_geo” dataset and click on the “add” button to re import the same dataset. You will get copy named “mcmillan_cleaned_geo_1”. Click on this new dataset, “style” tab, and use the following settings:  ■ Aggregation: none ■ Size 15, color #ee4d5a ■ Stroke 1 color #FFFFFF 14 ■ Blending: none  ○ Exit the dataset and make it the second one in the list showing on the lateral panel, dragging and dropping it. We want this new layer behind the one that has the pop-up with photos.   ○ Click on “Widget” tab to add filters. Add widgets following the instructions here: https://carto.com/learn/guides/widgets/exploring-widgets​.  ● Publish your map using the “Publish” button on the left lateral panel, just under the map name. 
Final result:

Challenges

Solving the problem of showing more than one resource at the same location was tricky. My first try was to use some sort of cluster visualization. I built the map outside CARTO using CARTO.js, following the procedures here:
- https://github.com/Leaflet/Leaflet.markercluster
- http://bl.ocks.org/oriolbx/7518b5834d1b679759bda218871cb315

I used the following CARTO.js URL to build the app outside CARTO:
http://carolamigo.carto.com/api/v2/viz/80065cab-4b9e-4efd-83b6-91660891971e/viz.json

I was able to do it, but we had locations with more than 1,000 records, so the interface was impossible to navigate:

My second try was to do it using CARTO's data options, which allow data manipulation using SQL. It took some trial and error, and help from the following resources, to get it right:
- https://gis.stackexchange.com/questions/135769/cartodb-displaying-multiple-items-at-same-address
- https://gis.stackexchange.com/questions/90578/cartodb-aggregate-and-infowindow
- https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql

This was the result I got with the right SQL query (using string_agg, CONCAT and Min):

My first try to get clickable links and images inside the pop-ups was to edit the CSS and HTML in the CARTO “style” and “pop-up” tabs. I was not able to do it because the aggregate SQL statement did not allow me to add more columns from the data being retrieved from the table. I was able to build the string with CONCAT, but when I looked at the table the concatenated columns were not there, just the resulting aggregated column. If I had been able to retrieve the columns separately as well, it would have been possible to edit the HTML for the pop-ups using Mustache templates (https://mustache.github.io/).

So I changed my approach and added the HTML tags directly in the source dataset. In order to display the images, IIIF URLs had to be generated in OpenRefine following the OC API instructions.

Finally, adding filters was tricky because of the same limitation imposed by the SQL query. The solution was to create a new layer with a duplicate of the dataset, without the aggregation query. This duplicated layer has all the original dataset columns accessible, and contains only the filters, no pop-ups. Moving this layer behind the one with the pop-ups keeps the pop-ups accessible.

UBC Institute of Fisheries Field Records

GitHub repository: https://github.com/carolamigo/ubc_carto_fisheries

Modelling

● Download the collection metadata using the Open Collections Research API. A PHP script for batch downloading is provided on the OC API Documentation page > Download Collection Data. This script returns a folder containing one RDF file per collection item (or XML, JSON, or any other format preferred). We are going to use N-Triples because the files are cleaner (no headers or footers), which makes the merging easier later. Edit the script following the instructions on the documentation page and run it using the command:

$ php collection_downloader.php --cid fisheries --fmt ntriples

● Merge the files using the following Python script.
● Merge the files using the following Python script. The folder containing the files to be merged has to be named “fisheries”, and the script has to be run from the directory that contains the “fisheries” folder:

# adapted from: https://stackoverflow.com/questions/17749058/combine-multiple-text-files-into-one-text-file-using-python
import glob

read_files = glob.glob("fisheries/*.txt")

with open("result.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())

● Convert the merged file to a tabular format. Import the project into OpenRefine using the RDF/N3 files option. No character encoding selection is needed.

Challenges

The Unix cat command is not suitable for merging such a large number of files. I decided to use a Python script to do it after getting stuck in an infinite loop when trying to use:

printf '%s\0' *.txt | xargs -0 cat > merged.txt

Source: https://stackoverflow.com/questions/21209029/merging-large-number-of-files-into-one

Cleaning

● The latitude values are in the “http://www.w3.org/2003/01/geo/wgs84_pos#lat” column. We have to change their formatting to numbers so CARTO can understand them:

"54 13"@en  >  54.13
"0 30 S"@en  >  -0.3 (latitudes south of the equator have negative values)

Create a new column “test” based on the column “http://www.w3.org/2003/01/geo/wgs84_pos#lat” using the following expression to remove any character and preserve only digits and blank spaces (keeping the spaces is important for placing the decimal points later):

value.replace(/[^\ ,^\d]/, "")

We now have to transform those values into numbers, but it is important to insert the decimal point in the right spot first, so on the column “test” select “Edit column” > “Split into several columns”, separating by a blank space and splitting into 2 columns at most. You are going to get two columns, “test 1” and “test 2”. Create a new column “latitude” based on the “test 2” column, using the following expression to concatenate the values with a decimal dot in between:

cells["test 1"].value + "." + cells["test 2"].value

On the “latitude” column, select “Edit cells” > “Transform” and write the following expression to remove any remaining blank spaces:

value.replace(" ","")

We now have the values with the decimal point in the right position. Ensure all values are numbers by selecting, on the “latitude” column, “Edit cells” > “Common transforms” > “To number”. Delete columns “test 1” and “test 2”.

Filter the column “http://www.w3.org/2003/01/geo/wgs84_pos#lat” to select only cells containing “S”, using “Text filter” and typing “S” in the box that appears in the left sidebar. On the “latitude” column, select “Edit cells” > “Transform” and write the following expression to make all south latitudes negative:

value*-1

Now all south latitudes have a negative sign. Close the text filter box on the left sidebar to remove the filter.

● We now have to repeat the procedure for the “http://www.w3.org/2003/01/geo/wgs84_pos#long” column (longitudes).

Create a new column “test” based on the column “http://www.w3.org/2003/01/geo/wgs84_pos#long” using the following expression to remove any character and preserve only digits and blank spaces:

value.replace(/[^\ ,^\d]/, "")

We have to transform those values into numbers, but it is important to insert the decimal point in the right spot, so on the column “test” select “Edit column” > “Split into several columns”, separating by a blank space and splitting into 2 columns at most. You are going to get two columns, “test 1” and “test 2”. Create a new column “longitude” based on the “test 2” column, using the following expression to concatenate the values with a decimal dot in between:

cells["test 1"].value + "." + cells["test 2"].value

On the “longitude” column, select “Edit cells” > “Transform” and write the following expression to remove any remaining blank spaces:

value.replace(" ","")

We now have the values with the decimal point in the right position. Ensure all values are numbers by selecting, on the “longitude” column, “Edit cells” > “Common transforms” > “To number”. Delete columns “test 1” and “test 2”.

Filter the column “http://www.w3.org/2003/01/geo/wgs84_pos#long” to select only cells containing “W”, using “Text filter” and typing “W” in the box that appears in the left sidebar. On the “longitude” column, select “Edit cells” > “Transform” and write the following expression to make all west longitudes negative:

value*-1

Now all west longitudes have a negative sign. Close the text filter box on the left sidebar to remove the filter.

● Let’s verify that the values for latitude and longitude are within the correct ranges. Facet the “longitude” column by number (“Facet” > “Numeric facet”) to check the values (you might need to increase the faceting limit). Longitudes range from -180 to +180; any value outside that range is incorrect. Slide the filter selector on the left sidebar to see values larger than 180. Uncheck the “blank” box. Look at the rest of the metadata to infer the correct value, then correct it manually by clicking “edit” inside the wrong value cell, changing the value, setting the data type to “number” and selecting “Apply to all identical cells”.

Now facet the “latitude” column by number (“Facet” > “Numeric facet”) to check the values. Latitudes range from -90 to +90; any value outside that range is incorrect. Slide the filter selector on the left sidebar to see values larger than 90. Uncheck the “blank” box. We can see, by examining the “longitude” column, that these values of latitude and longitude are swapped. Correct them manually by clicking “edit” inside the wrong value cell, changing the value, setting the data type to “number” and selecting “Apply to all identical cells”. Changes to latitude and longitude are complete.
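As a recap of the coordinate transformation, the Python sketch below mirrors the GREL steps above (keep only digits and spaces, join the two groups with a decimal point, and negate southern latitudes or western longitudes). It is only an illustration of the logic, not part of the workflow:

import re

def clean_coordinate(raw: str, negative_flags=("S", "W")) -> float:
    """Mirror of the OpenRefine steps: "54 13"@en -> 54.13, "0 30 S"@en -> -0.3."""
    negative = any(flag in raw for flag in negative_flags)
    digits = re.sub(r"[^\d ]", "", raw)      # keep only digits and blank spaces
    parts = digits.split(None, 1)            # split on the first blank space
    whole = parts[0]
    decimals = parts[1].replace(" ", "") if len(parts) > 1 else "0"
    value = float(whole + "." + decimals)
    return -value if negative else value

print(clean_coordinate('"54 13"@en'))   # 54.13
print(clean_coordinate('"0 30 S"@en'))  # -0.3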
● The fish species are in the “http://purl.org/dc/terms/subject” column. To get better reconciliation results, we have to remove the double quotes, the “@en” and the “sp.”, and keep just the species name inside the square brackets (when it exists):

"Agonus acipenerinus [Agonus accipenserinus]"@en  >  Agonus accipenserinus
"Ambassis sp."@en  >  Ambassis

Create a new column “species” based on the column “http://purl.org/dc/terms/subject”, using the following expression:

value.split('[')[-1].replace("\"", " ").replace("@en"," ").replace("sp.","").replace("]",'')

● Trim leading and trailing whitespaces by selecting “Edit cells” > “Common transforms” > “Trim leading and trailing whitespaces”.

● Cluster the species names in order to combine entries with typos and small inconsistencies under just one species name. On the “species” column, select “Facet” > “Text facet”, then select “Cluster” in the facet window. In the cluster window, experiment with different clustering methods. Start with “key collision” > “fingerprint”. Take a look at the results and, if they are good enough, click the “Select all” button and then “Merge selected and Re-Cluster”. Iterate until no more clusters are formed, then try another clustering method, until you have formed all the clusters possible.
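For intuition, OpenRefine’s “fingerprint” keying method works roughly like the sketch below: values whose fingerprints collide are offered as a cluster. This is a simplified approximation, not OpenRefine’s exact implementation (which also normalizes accents and control characters), and the example names are only illustrative variants of a species from this dataset:

import re
from collections import defaultdict

def fingerprint(value: str) -> str:
    # Lowercase, strip punctuation, then sort and deduplicate the tokens.
    tokens = re.sub(r"[^\w\s]", "", value.strip().lower()).split()
    return " ".join(sorted(set(tokens)))

names = ["Agonus accipenserinus", "accipenserinus Agonus", "Agonus  accipenserinus "]
clusters = defaultdict(list)
for name in names:
    clusters[fingerprint(name)].append(name)

# Any key with more than one distinct value is a candidate cluster to merge.
for key, members in clusters.items():
    if len(set(members)) > 1:
        print(key, "->", members)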
● Fill down the following columns, as we have several orphan cells resulting from the triple to tabular data format conversion. Go to each column, “Edit cells” > “Fill down”.

○ “subject”
○ “http://purl.org/dc/terms/title”
○ “http://purl.org/dc/elements/1.1/date”
○ “http://www.europeana.eu/schemas/edm/isShownAt”
○ “http://purl.org/dc/terms/coverage”
○ “http://purl.org/dc/terms/spatial”
○ latitude_number
○ longitude_number

Challenges

Cleaning the latitude and longitude values was the hardest part. It took some trial and error and playing with facets to get to know the data well enough to clean it, because it was far from uniform.

Reconciling

● On the “species” column, select “Reconcile” > “Start Reconciling” > “Add standard service”. Paste the following URL* in the box, then click “Add Service”:

http://iphylo.org/~rpage/phyloinformatics/services/reconciliation_eol.php

*The Encyclopedia of Life (EOL) taxonomy reconciliation service for OpenRefine was developed by Page (2012): http://iphylo.blogspot.ca/2012/02/using-google-refine-and-taxonomic.html

● The reconciliation service will appear under the reconciliation services tab. Select it and click “Start reconciling”. The process will take a long time (an hour or two) since we have many entries. You have to wait until it is done before doing any further work on the data.

● When reconciliation is finished, review the results. Use the “species: judgment” reconciliation facet on the left sidebar to review the “none” entries. Those need your input to pick the best match. Up to three options show up. Select the correct one using the double-check box, which means your decision will be applied to all other cells that match this condition. If no correct option shows up in the cell, click on the double-check box of “Create new topic”, meaning that no reconciliation value will be added to the cells that match this condition (they will go under the “new” facet and you will need to add values manually for those later).

As there are too many unique values to assess, you can review a sample and then, with the “none” facet still on, select, on the “species” column, “Reconcile” > “Actions” > “Match each cell to its best candidate”.

● Extract the reconciled data, retrieving name and id as a string separated by “|”. Select “species”, “Edit cells” > “Transform”, and enter the following expression:

cell.recon.match.name + " | " + cell.recon.match.id

● Split the values obtained in the reconciliation, which should be in this format:

Isopsetta isolepis (Lockington, 1880) | 995111

Select “species”, “Edit column” > “Split into several columns”. Select the “|” separator and, after the split, name the columns species_eol and eol_id.

● We have to build EOL links by creating a new column “eol_uri” based on “eol_id”, using the following expression:

"http://eol.org/pages/"+value

Challenges

Finding a controlled vocabulary for the reconciliation service was difficult, as we are dealing with a very specific area of knowledge I was not familiar with. My first try was FishBase, which has an API and some services for R set up:

https://cran.rstudio.com/web/packages/rfishbase/
https://github.com/ropensci/rfishbase

However, Encyclopedia of Life had a better reconciliation service set up specifically for OpenRefine, and URIs with a good landing page with photos. Although it is an encyclopedia and not a controlled vocabulary, it is collaboratively curated and presents up-to-date information aggregated from several other databases.

As we had many unique species values in this dataset, it was impossible to review all the reconciliation results. I reviewed a sample and, as the results were consistently good, accepted the best match suggestions for all remaining entries. It is important to note that sometimes a species may be known by more than one name, so a match between different names did not necessarily mean the match was wrong (it was usually correct for this dataset).
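If you want to spot-check a match outside OpenRefine, the EOL service can also be queried directly. The sketch below assumes the endpoint implements the standard OpenRefine reconciliation API (a JSON “queries” parameter and a response keyed by query id with a list of candidates); if the service behaves differently, adjust accordingly. The species name is taken from the example above.

import json
import requests

SERVICE = "http://iphylo.org/~rpage/phyloinformatics/services/reconciliation_eol.php"

# One reconciliation query for a single species name.
queries = {"q0": {"query": "Agonus accipenserinus"}}
response = requests.get(SERVICE, params={"queries": json.dumps(queries)})

# Print candidate EOL identifiers, names and scores returned by the service.
for candidate in response.json()["q0"]["result"]:
    print(candidate["id"], candidate["name"], candidate.get("score"))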
Building interface

● Prepare the data for the interface. In order to have links in the CARTO interface, we have to add HTML tags to the source dataset.

○ Remove the double quotes and the language tag. Create a new column “title” based on the column “http://purl.org/dc/terms/title”, using the following expression:

value.replace("\"", " ").replace("@en"," ")

○ Add HTML tags for title links. Create a new column “title_link” based on the column “subject”, using the following expression:

"<a href=\""+value+"\">"+if(isBlank(cells["title"].value), " ", cells["title"].value)+"<\/a>"

○ Add HTML tags for EOL species links. Create a new column “eol_html” based on the column “eol_uri”, using the following expression:

"<a href=\""+value+"\">"+if(isBlank(cells["species_eol"].value), " ", cells["species_eol"].value)+"<\/a>"

● Export the dataset from OpenRefine in .csv format. Name the file “fisheries_cleaned”.

● Sign up or log in to CARTO Builder: https://carto.com/signup/. Create a new map and import the file exported from OpenRefine.

● Georeference your dataset following the instructions here: https://carto.com/learn/guides/analysis/georeference. Once in your map, click on the dataset on the left menu, then click on “Analysis” > “Add analysis” > Georeference. Select the corresponding column names in your dataset for latitude and longitude. Note that the application plots just one resource per location, so we will need to aggregate the results to have all the resources plotted.

● Export the georeferenced dataset from CARTO in .csv format, in order to incorporate the “the_geom_webmercator” column (with the georeferenced data) into your dataset. Name the file “fisheries_cleaned_geo”. Import this dataset back into your CARTO map and delete the previous dataset from the map. This step is necessary because CARTO does not allow georeference analysis and SQL manipulation (which we will need for aggregation) of the data at the same time.

● Click on the dataset on the lateral menu and select the “Data” tab. At the bottom of the lateral panel, enable the SQL option. Paste the following query in the editor and click “Apply”:

SELECT string_agg(CONCAT(species, ' <br>', eol_html, ' <br>'), ' <br>') as new_column_aggregated, title_link, the_geom_webmercator, Min(cartodb_id) cartodb_id
FROM fisheries_cleaned_geo
GROUP BY title_link, the_geom_webmercator

● Click on the “Pop-up” tab, and enter the following settings:

○ Style: color
○ Window size: 400
○ Header color: #a6e79a
○ Show item: check all boxes (make sure “title_link” is first on the list).
● Click on any point on the map; you should see something similar to this:

● To build the filters:

○ Click on the “fisheries_cleaned_geo” dataset, “Style” tab, and use the following settings:

■ Aggregation: none
■ Size 6, color #f6ff00
■ Stroke 1, color #FFFFFF, transparent
■ Blending: overlay

○ Exit the “fisheries_cleaned_geo” dataset and click on the “Add” button to re-import the same dataset. You will get a copy named “fisheries_cleaned_geo_1”. Click on this new dataset, “Style” tab, and use the following settings:

■ Aggregation: none
■ Size 12, color #ff0000
■ Stroke 1, color #FFFFFF, A: 0.7
■ Blending: none

○ Exit the dataset and make it the second one in the list showing on the lateral panel, by dragging and dropping it. We want this new layer behind the one that has the pop-ups.

○ Click on the “Widget” tab to add filters. Add widgets following the instructions here: https://carto.com/learn/guides/widgets/exploring-widgets.

● Exit the datasets and change the basemap to “Here” > “Satellite Day”.

● Publish your map using the “Publish” button on the left lateral panel, just under the map name.

Final result:

Challenges

Finding the right colours for the points on the map was challenging because of the satellite basemap used. Bright colours rendered the best user experience, because of the better contrast against the background.

References

API Documentation - UBC Library Open Collections. (2017). Open.library.ubc.ca. Retrieved 20 September 2017, from https://open.library.ubc.ca/docs

CartoDB CSS. (2017). YouTube. Retrieved 20 September 2017, from https://youtu.be/O6-1mRtuz1w

Enabling Pop-Up Information Windows — CARTO. (2017). Carto.com. Retrieved 25 September 2017, from https://carto.com/learn/guides/publish-share/enabling-pop-up-information-windows

Gibbs, F. (2017). Installing Python Modules with pip. Programminghistorian.org. Retrieved 20 September 2017, from https://programminghistorian.org/lessons/installing-python-modules-pip

Hooland, S., & Verborgh, R. (2014). Linked data for libraries, archives and museums: how to clean, link and publish your metadata. Neal-Schuman.

How to fully customize infowindows in CartoDB. (2017). Gist. Retrieved 25 September 2017, from https://gist.github.com/andrewxhill/8655774

CartoDB: Dynamic text with different links in infowindows. (2017). Gis.stackexchange.com. Retrieved 25 September 2017, from https://gis.stackexchange.com/questions/136973/cartodb-dynamic-text-with-different-links-in-infowindows

L.Markercluster spidify() cartodb.js. (2017). Bl.ocks.org. Retrieved 20 September 2017, from http://bl.ocks.org/oriolbx/7518b5834d1b679759bda218871cb315

Leaflet/Leaflet.markercluster. (2017). GitHub. Retrieved 20 September 2017, from https://github.com/Leaflet/Leaflet.markercluster

Mustache/mustache.github.com. (2017). GitHub. Retrieved 25 September 2017, from https://github.com/mustache/mustache.github.com

Page, R. (2017). iPhylo: Using Google Refine and taxonomic databases (EOL, NCBI, uBio, WORMS) to clean messy data. Iphylo.blogspot.ca. Retrieved 20 September 2017, from http://iphylo.blogspot.ca/2012/02/using-google-refine-and-taxonomic.html

url completion in cartoDB info windows. (2017). Gis.stackexchange.com. Retrieved 25 September 2017, from https://gis.stackexchange.com/questions/91402/url-completion-in-cartodb-info-windows

Walkthrough Of Geonames Recon Service · Christina Harlow. (2017). Christinaharlow.com.
Retrieved 20 September 2017, from http://christinaharlow.com/walkthrough-of-geonames-recon-service
Retrieved October 10, 2017, from https://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/      22  University of British Columbia School of Library, Archival and Information Studies Master of Library and Information Studies       LIBR 594 - Assignment 2  Linked Data Web Application: use cases of UBC Open Collections API  Carolina Román Amigo  Supervisor: Richard Arias Hernandez, UBC SLAIS Instructor Co-supervisor: Paul Joseph, UBC Systems Librarian        November 2017    Table of Contents  Introduction 2 Collections 2 MacMillan Bloedel Limited fonds 2 UBC Institute of Fisheries Field Records 3 Vocabularies 4 Geonames Ontology 4 Encyclopedia of life 4 Tools 5 CARTO Builder 5 OpenRefine 5 Implementation Process 6 MacMillan Bloedel Limited fonds 6 Modelling 6 Cleaning 7 Reconciling 9 Building interface 12 UBC Institute of Fisheries Field Records 18 Modelling 18 Cleaning 19 Reconciling 22 Building interface 25 References 29       1 Introduction  This project aims to explore the potential uses of the ​Open Collections Research API for                             linked data projects. The API is made available by University of British Columbia                         Libraries to provide machine-readable access to collections metadata and transcripts.                   We developed simple web applications providing data visualizations of collections                   metadata linked to external controlled vocabularies, providing an enhanced view of                     UBC’s digital repository. We describe below the collections, controlled vocabularies and                     tools used in this project.  Collections MacMillan Bloedel Limited fonds https://open.library.ubc.ca/collections/macmillan  The MacMillan Bloedel Limited fonds collection contains 2781 still images depicting the                       history of MacMillan Bloedel forest products company. Metadata contains names of                     locations (Library of Congress format) across the world, mainly in British Columbia. The                         goal for this use case is to link locations to a controlled vocabulary and plot records on                                 a map, getting a preview of title, date and image thumbnail for each location.  Link for the demo: https://carolamigo.carto.com/builder/881d9644-0439-4131-ad92-c16bcb1e2608/embed 2  UBC Institute of Fisheries Field Records https://open.library.ubc.ca/collections/fisheries  The UBC Institute of Fisheries Field Records contains 11021 still images depicting pages of                           notebooks describing fish specimens collected around the world over a period of more                         than 100 years. Metadata contains latitude and longitude and species of fish collected                         for each record. The goal of this use case is to link fish species to a controlled                                 vocabulary and plot records on a map, getting a preview of title and a link to species                                 found in each location.  Link for the demo: https://carolamigo.carto.com/builder/df748d01-2fc8-424b-ae46-dfd834b200a0/embed 3  Vocabularies Geonames Ontology http://www.geonames.org/ontology/documentation.html  Geonames is an ontology that provides geospatial semantic information to be used in the                           World Wide Web. It provides over 11 million geonames toponyms with unique URIs                         and an API service that can be used for reconciliation.  
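As a quick illustration of that API, the sketch below queries the GeoNames search web service directly for a place name and builds the stable http URI that the reconciliation step later relies on. It is not part of the original workflow; the username is a placeholder for a registered GeoNames account, and the requests library is assumed to be installed.

# Minimal sketch (not part of the original workflow): look up a place name in the
# GeoNames search web service and build the stable URI used for reconciliation.
# "your_geonames_user" is a placeholder for a registered GeoNames account name.
import requests

def lookup_place(name, username="your_geonames_user"):
    resp = requests.get(
        "http://api.geonames.org/searchJSON",
        params={"q": name, "maxRows": 1, "username": username},
    )
    resp.raise_for_status()
    hits = resp.json().get("geonames", [])
    if not hits:
        return None
    top = hits[0]
    return {
        "name": top["name"],
        "uri": "http://sws.geonames.org/{}/".format(top["geonameId"]),
        "lat": top["lat"],
        "long": top["lng"],
    }

# Example: one of the place names that appears later in the MacMillan Bloedel metadata.
print(lookup_place("Nanaimo River"))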
Encyclopedia of Life
http://eol.org/

The goal of the Encyclopedia of Life is to provide access to knowledge about life on earth by aggregating information scattered around the world in books, journals, databases, websites and specimen collections. It is the result of a collaborative initiative among academic institutions and the community. Although it is not a controlled vocabulary, it offers a unique URI for each record and a reconciliation service for OpenRefine.

Tools

CARTO Builder
https://carto.com/builder/

CARTO Builder is a web-based geospatial visualization tool that allows you to easily build interfaces based on tabular data. By the time this report was written it was free for anyone to use, provided you agreed to make your data publicly available. They also have free special licences for students (see more at GitHub Student Developer Pack).

OpenRefine
http://openrefine.org/

OpenRefine is a powerful tool to convert, clean and enrich data. It is free to use and offers linked data extensions and reconciliation services to compare and combine related datasets.

Implementation Process

The implementation process for each use case is described in detail in this section, aiming to document the work and allow this project to be reproduced. The structure follows the workflow for Linked Data projects described by Hooland & Verborgh (2014).

MacMillan Bloedel Limited fonds

GitHub repository: https://github.com/carolamigo/ubc_carto_macmillan

Modelling

● Download the collection metadata using the Open Collections Research API. A PHP script for batch downloading is provided at the OC API Documentation page > Download Collection Data. This script returns a folder containing one RDF file per collection item (or XML, JSON, or any other format preferred). We are going to use N-triples because the files are cleaner (no headers or footers), which makes the merging easier later. Edit the script following the instructions on the documentation page and run it using the command:

$ php collection_downloader.php --cid macmillan --fmt ntriples

● Merge the files using the Unix cat command:

$ cat * > merged_filename

● Convert the merged file to a tabular format. Import the project in OpenRefine using the RDF/N3 files option. No character encoding selection is needed.

Challenges

- My first try to get the collection metadata was to use Postman to send a POST query to the OC API using a key, as described in the OC API Documentation page. However, queries requesting more than 1000 items are not completed. Our collection is almost 3x larger than that limit, meaning that the data would have to be retrieved in batches and merged together in OpenRefine.

- OpenRefine is able to open N-triples files, but in my first try I had problems with predicates that are used more than once within the same record. For example, the predicate "subject", used for keywords related to the resource, is repeated in several triples within a record. OpenRefine reads predicates as column names (when using any RDF or triple-based decoding option), and it does not allow repeated column names. In my second try everything worked perfectly, so I believe a small formatting problem in my previous file was causing the problems with OpenRefine.

Cleaning

● Examine the metadata for Geographic Locations.
With the tabular data open in OpenRefine, look for the column “​http://purl.org/dc/terms/spatial​”. In the column options, select “Facet” > “Text facet”. The facets show you all the unique values for geographic locations in this dataset. From that list, it is possible to see that:  ○ The location names are following Library of Congress formatting style, with the province following the name of the city, and that they are in between double quotes with the language notation following:   e.g. “Alberni (B.C.)”@en  ○ Some location names have small typos:   "Namaimo River (B.C.)”@en  ○ Some resources have more than one geographic location associated with it:   e.g. "Powell River (B.C.) ; Nanaimo (B.C)"@en  ● Split the cells containing more than one geographic location value.   ○ Duplicate the “​http://purl.org/dc/terms/spatial​” column using “Edit column” > “Add column based on this column” in order to preserve original values. Name the new column: spatial_cleaned ○ On the spatial_cleaned column, select “Edit cells” > “Split multi-valued cells”. The separator is “;”.  7 ● Remove double quotes, provinces and “@en” from location names. Select “Edit cells” > “Transform” and write the following expression:  value.replace("\"", " ").replace("@en"," ").replace(/\(([^\)]+)\)/," ")  ● Trim leading and trailing whitespaces by selecting “Edit cells” > “Common transforms” > “Trim leading and trailing whitespaces”.  ● Cluster location names in order to combine entries with typos and small inconsistencies under just one geographic location name. On the spatial_cleaned column, select “Facet” > “Text facet”, then select “Cluster” in the facet window. In the cluster window, select Nearest neighbour” method. Select the “merge” box for “Nanaimo River”, correct the typo, and select “Merge selected and close”.    ● Fill down column “subject”, “​http://purl.org/dc/terms/title​”, “http://purl.org/dc/terms/created” and “​http://www.europeana.eu/schemas/edm/isShownAt​” as we have several orphan cells resulting from the triple to tabular data format conversion. Go to each column, “Edit cells” > “Fill down”.  8 Challenges  - Creating the right expression to manipulate data in OpenRefine can be challenging if you are not used to the GREL syntax and to regular expressions. The Regex Builder (​https://regexr.com/​) and the OpenRefine documentation on GitHub (​https://github.com/OpenRefine/OpenRefine/wiki/Understanding-Expressions​) are helpful resources.  Reconciling  ● Configure Geonames reconciliation service in OpenRefine following the procedure described here: ​https://github.com/cmh2166/geonames-reconcile​. This procedure involves getting a Geonames API username, installing python packages, cloning the code in the GitHub repository above, and running the script provided in the code.   ● Perform reconciliation on the “spatial_cleaned” columns using “geonames/name_equals” option. Follow the steps described here: http://christinaharlow.com/walkthrough-of-geonames-recon-service​.  ● When reconciliation is finished, review the results. An easy way to do it is to facet by text (“Facet” > “Text facet”) and filter results by clicking on a location name on the facet menu. On the spatial_cleaned column, click on the link for the location to check the reconciled value. This will open the Geonames window with location information. If it is correct, no further action is required. If it is wrong (e.g. Alice Lake found is in B.C. 
but geonames returned a lake with the same name in Michigan), click on “Choose a new match” under any of the wrong entries on the spatial_cleaned column. Three options will show up. Select the correct one by using the double checked box, which means your decision will be applied to all other cells that match this condition. If no correct option show up in the cell, click on the double checked box of “Create new topic”, meaning that no reconciliation value will be added to cells that match this condition. There are 66 unique values in this dataset, so it is possible to review one by one until it is done. 9   ● Verify reconciliation results that didn’t find a match (including the ones you had to “create a new topic” for) by selecting the “none” facet in the judgment box. I have found the following ones with no matches, so I had to add coordinates manually for those by looking up manually in the Geonames database using the Geonames search box (​http://www.geonames.org/​). To mass edit a value, click on the edit link that appear next to the exclude link for the value in the facet window. Enter the value and click “Apply”.    10   ● Extract reconciled data retrieving name, id and latitude/longitude, as a string separated by “|”. Select spatial_cleaned, “Edit cells” > “Transform”, and entering the following expression:  cell.recon.match.name + " | " + cell.recon.match.id  ● Split the values obtained in the reconciliation, that should be in this format:  Nanaimo River | 49.1304, -123.89385 | ​http://sws.geonames.org/6951400  Select spatial_cleaned, “Edit column” > “Split into several columns”. Select the “|” separator and name the columns according after split: geonames_names, geonames_coord, geonames_uri.  ● The “geonames_coord” column has to be further split in latitudes and longitudes using the same command above, “Edit column” > “Split into several columns”, with separator “,”. Name the columns “lat” and “long”. Trim leading and trailing whitespaces by selecting “Edit cells” > “Common transforms” > “Trim leading and trailing whitespaces”.  Challenges  - Setting up the Geonames reconciliation service on OpenRefine was not straightforward, since I had to install more than one Python package to make it work.  - My first try to reconcile the data was to download the Geonames database dump for the countries that appear in the collection and constructing URIs based on each location id using OpenRefine. Then I used VLOOKUP function on Excel to reconcile the collections list of locations with the geonames spreadsheet, using the timezone info to check if the location retrieved is correct. The results were good, but this is a more laborious way that 11 may not apply to larger datasets. Manipulating the database dump just for the countries appearing in our collection (CA, US, NZ, AU) was already difficult because of its large size.  Building interface  ● Prepare the data to interface. In order to have links and images on CARTO interface, we have to add html tags in the source dataset.   ○ Remove double quotes and language. Create a new column “title” based on the column “​http://purl.org/dc/terms/title​”, using the following expression:  value.replace("\"", " ").replace("@en"," ")  ○ Add html tags for title links. Create a new column “title_link” based on the column “subject”, using the following expression:  "<a href=\""+value+"\">"+if(isBlank(cells["title"].value), " ", cells["title"].value)+"<\/a>"  ○ Remove double quotes and language. 
Create a new column “date” based on the column “http://purl.org/dc/terms/created”, using the following expression:  value.replace("\"", " ").replace("@en"," ")  ○ Add html tags for location links. Create a new column “geoname_link” based on the column “geonames_uri”, using the following expression:  "<a href=\""+value+"\">"+if(isBlank(cells["geonames_names"].value), " ", cells["geonames_names"].value)+"<\/a>"  ○ Add html tags and links for images. Create a new column “image_link” based on the column “http://www.europeana.eu/schemas/edm/isShownAt”, using the following expression:  "<img width=\"188\" src=\"http://iiif.library.ubc.ca/image/cdm.macmillan."+value.substring(10,19).replace(".","-")+".0000" + "/full/150,/0/default.jpg\"/>" 12  ● Export the dataset from OpenRefine in .csv format. Name the file “mcmillan_cleaned”.  ● Sign up or Log in to CARTO Builder: ​https://carto.com/signup/​. Create a new map and import the Open Refine exported file to your map.  ● Georeference your dataset following the instructions here: https://carto.com/learn/guides/analysis/georeference​. Once in your map, click on the dataset on the left menu, then click on “Analysys” > “Add analysys” > Georeference. Select the corresponding column names in your dataset for latitude and longitude (lat and long). Note that the application is plotting just one resource per location, so we will need to aggregate the results to have all the resources plotted.     ● Export the georeferenced dataset from CARTO in csv format, in order to incorporate the “the_geom_webmercator” column (with georeferenced data) in your dataset. Name the file “mcmillan_cleaned_geo”. Import the dataset back into CARTO map, and delete the previous dataset from your map. This step is necessary since CARTO does not allow georeference analysis and SQL manipulation (that we will need for aggregation) of the data concomitantly.   ● Click on the dataset on the lateral menu and select the “Data” tab. At the bottom of the lateral panel, enable SQL option. Paste the following query in the editor and click “Apply”:  SELECT string_agg(DISTINCT CONCAT (date, ' <br>', title_link, ' <br><br>', image_link, ' '),' <br><br><br><br>') as new_column_aggregated, geoname_link, the_geom_webmercator, Min(cartodb_id) cartodb_id 13 FROM mcmillan_cleaned_geo group by geoname_link, the_geom_webmercator  ● Click on the “Pop-up” tab, and enter the following settings:  ○ Style: color ○ Window size: 240 ○ Header color: #b6dc9 ○ Show item: check all boxes  ● Click on any point on the map, you should see something similar to this:    ● To build the filters:  ○ Click on “mcmillan_cleaned_geo” dataset, “style” tab, and use the following settings:  ■ Aggregation: none ■ Size 10, color #d1c4c4 ■ Stroke 1 color #FFFFFF ■ Blending: darken  ○ Exit “mcmillan_cleaned_geo” dataset and click on the “add” button to re import the same dataset. You will get copy named “mcmillan_cleaned_geo_1”. Click on this new dataset, “style” tab, and use the following settings:  ■ Aggregation: none ■ Size 15, color #ee4d5a ■ Stroke 1 color #FFFFFF 14 ■ Blending: none  ○ Exit the dataset and make it the second one in the list showing on the lateral panel, dragging and dropping it. We want this new layer behind the one that has the pop-up with photos.   ○ Click on “Widget” tab to add filters. Add widgets following the instructions here: https://carto.com/learn/guides/widgets/exploring-widgets​.  ● Publish your map using the “Publish” button on the left lateral panel, just under the map name. 
Final result:     Challenges  Solving the problem of showing more than one resource at the same location was tricky. My first try was to use some sort of cluster visualization. I build the map outside CARTO using CARTO.js following procedures here: - https://github.com/Leaflet/Leaflet.markercluster - http://bl.ocks.org/oriolbx/7518b5834d1b679759bda218871cb315  I used the following Carto.js URL to build app outside CARTO: 15 http://carolamigo.carto.com/api/v2/viz/80065cab-4b9e-4efd-83b6-91660891971e/viz.json  I was able to do it but we had locations with more than a 1000 records, so the interface was impossible to navigate:     My second try was to do it using CARTO data options. It allows data manipulation using SQL, but it took me some trial and error and help from the following resources to get it right: - https://gis.stackexchange.com/questions/135769/cartodb-displaying-multiple-items-at-same-address - https://gis.stackexchange.com/questions/90578/cartodb-aggregate-and-infowindow - https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql  This was the result I got with the right SQL query (using string_agg, CONCAT and Min): 16    My first try to get clickable links and images inside the pop-ups was to edit the CSS and HTML in CARTO “style” and “pop-up” tabs. I was not able to do it because the aggregate SQL statement was not allowing me to add more columns from the data being retrieved from the table. I was able to build the string with CONCAT, but when I looked at the table the columns concatenated were not there, just the resulting aggregated column. If I was able to retrieve the columns separately as well, it would be possible to edit the html for the pop-ups using moustache {} (https://mustache.github.io/).  So I changed my approach and added the html tags directly on the source code. In order to display images, IIIFs URLs had to be generated in Open refine using OC API instructions.  Finally, adding filters was tricky because of the same limitation I had with the use of the SQL query. The solution was to create a new layer with a duplicate of the dataset, without the aggregation query. This duplicated layer has all the original dataset columns accessible, and contains only filters, no pop-ups. Moving this layer behind the one with the pop-ups keeps them accessible.     17 UBC Institute of Fisheries Field Records  GitHub repository:  https://github.com/carolamigo/ubc_carto_fisheries Modelling  ● Download collection metadata using the Open Collections Research API. A php script to batch download is provided at ​OC API Documentation page​ > Download Collection Data.. This script returns a folder containing one RDF file per collection item (or XML, JSON, any format preferred). We are going to use N-triples because the file is cleaner (no headers or footers), what makes the merging easier later. Edit the script following the instructions on the documentation page and run it using the command:  $ php collection_downloader.php --cid fisheries --fmt ntriples  ● Merge files using the following python script. 
The folder containing the files to be merged has to be named "fisheries", and the script has to be run from the directory that contains the fisheries folder:

# adapted from: https://stackoverflow.com/questions/17749058/combine-multiple-text-files-into-one-text-file-using-python
# Concatenate every downloaded file in the "fisheries" folder into result.txt
import glob

read_files = glob.glob("fisheries/*.txt")
with open("result.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())

● Convert the merged file to a tabular format. Import the project in OpenRefine using the RDF/N3 files option. No character encoding selection is needed.

Challenges

The Unix cat command is not suitable for merging a large number of files. I decided to use a Python script after getting stuck in an infinite loop when trying to use:

printf '%s\0' *.txt | xargs -0 cat > merged.txt

Source: https://stackoverflow.com/questions/21209029/merging-large-number-of-files-into-one

Cleaning

● The latitude values are in the "http://www.w3.org/2003/01/geo/wgs84_pos#lat" column. We have to change their formatting to numbers so CARTO can understand them:

"54 13"@en  >  54.13
"0 30 S"@en  >  -0.3 (latitudes south of the equator have negative values)

Create a new column "test" based on the column "http://www.w3.org/2003/01/geo/wgs84_pos#lat" using the following expression to remove any character and preserve only digits and blank spaces (keeping the spaces is important for placing the decimal points later):

value.replace(/[^\ ,^\d]/, "")

We now need to turn those values into numbers, but it is important to insert the decimal point in the right spot first, so on the column "test" select "Edit column" > "Split into several columns", separating by a blank space and splitting into 2 columns at most.
You are going to get two columns, “test 1” and “test 2”. Create a new column “longitude” based on the “test 2” column using the following expression to concatenate the values with a decimal dot in between:  cells["test 1"].value + "." + cells["test 2"].value  On “longitude” column, select “Edit cells” > “Transform” and write the following expression to remove any remaining blank spaces:  value.replace(" ","")  We have now the values with the decimal point in the right position. Ensure all values are numbers by selecting on the column “longitude” > “Edit cells” > “Common transforms” > “To number”. Delete columns “test 1” and “test 2”.   Filter column “​http://www.w3.org/2003/01/geo/wgs84_pos#​long​” to select only cells containing “W”, using “Text filter” and typing “W” in the box that appears in the left sidebar. On the “longitude” column, select “Edit cells” > “Transform” and write the following expression to make all west longitudes negative values:  value*-1 20  Now we have all longitudes west with a negative sign before them. Close the “Text facet” window on the left sidebar to remove the filter.    ● Let’s verify if the values for latitude and longitude are within the correct ranges. Facet the “longitude” column by number (“Facet” > “Numeric facet”) to check values (you might need to increase the faceting limit number). Longitudes have a range of -180 to +180. Any value outside that range is incorrect. Slide the filter selector on the left sidebar to see values that are larger than 180. Uncheck box “blank”. Take a look on the rest of the metadata for inferring the correct value. Correct then manually by clicking on edit inside the wrong value cell, changing the value, the data type to “number” and selecting “Apply to all identical cells”.  Now facet the “latitude” column by number (“Facet” > “Numeric facet”) to check values. Latitudes have a range of -90 to +90. Any value outside that range is incorrect. Slide the filter selector on the left sidebar to see values that are larger than 90. Uncheck box “blank”. We can see by examining column “longitude” that these values of latitude and longitude are swapped. Correct then manually by clicking on edit inside the wrong value cell, changing the value, the data type to “number” and selecting “Apply to all identical cells”. Changes to latitude and longitude are complete.  ● The fish species are in the “​http://purl.org/dc/terms/subject​” column. To get better reconciliation results, we have to remove the double quotes, the “@en”, the “sp.” and keep just the species name inside the square brackets (when it exists):  "Agonus acipenerinus [Agonus accipenserinus]"@en  >  Agonus acipenerinus "Ambassis sp."@en  >  Ambassis  Create a new column “species” based on the column “​http://purl.org/dc/terms/subject​”, using the following expression:  value.split('[')[-1].replace("\"", " ").replace("@en"," ").replace("sp.","").replace("]",'')  ● Trim leading and trailing whitespaces by selecting “Edit cells” > “Common transforms” > “Trim leading and trailing whitespaces”.  ● Cluster species names in order to combine entries with typos and small inconsistencies under just one species name. On the “species” column, select “Facet” > “Text facet”, then select “Cluster” in the facet window. In the cluster window, experiment with different clustering methods. Start with “key collision” > fingerprint. Take a look at the results, and, if they are good enough, select the “Select all” button and then “Merge 21 selected and Re-Cluster”. 
Iterate until there are no more cluster formed, then try another clustering method until your have formed all clusters possible.   ● Fill down the following columns as we have several orphan cells resulting from the triple to tabular data format conversion. Go to each column, “Edit cells” > “Fill down”. ○ “subject” ○ “​http://purl.org/dc/terms/title​” ○ “http://purl.org/dc/elements/1.1/date” ○ “​http://www.europeana.eu/schemas/edm/isShownAt​”  ○ “​http://purl.org/dc/terms/​coverage” ○ “​http://purl.org/dc/terms/​spatial” ○ latitude_number ○ longitude_number  Challenges  To clean latitude and longitude values was the hardest part. It took me some trial and error and facet playing to get to know the data well enough to clean it, because it was far from uniform.    Reconciling  ● On the “species” column, select “Reconcile” > “Start Reconciling” > “Add standard service”. Paste the following URL* in the box, then click “Add Service”:  http://iphylo.org/~rpage/phyloinformatics/services/reconciliation_eol.php  22   *The Encyclopedia of Life (EOL) taxonomy reconciliation service to Open Refine was developed by: http://iphylo.blogspot.ca/2012/02/using-google-refine-and-taxonomic.html  ● The reconciliation service will appear under reconciliation services tab. Select it and click “Start reconciling”. The process will take a long time (one hour or two) since we have many entries. You have to wait until it is done to do any further work on the data.  ● When reconciliation is finished, review the results. Use the reconciliation faceting “species: judgment” box on the left sidebar to review the “none” ones. Those need you input to pick the best match. Up to three options show up. Select the correct one by using the double checked box, which means your decision will be applied to all other cells that match this condition. If no correct option show up in the cell, click on the double checked box of “Create new topic”, meaning that no reconciliation value will be added to cells that match this condition (they are going under “new” facet and you will need to add values manually for those later).   23      As there are too many unique values to assess, you can review a sample and then, with the “none” facet still on, select on the species column “Reconcile” > “Actions” > Match each cell to its best candidate.    ● Extract reconciled data retrieving name and id, as a string separated by “|”. Select “species”, “Edit cells” > “Transform”, and entering the following expression:  cell.recon.match.name + " | " + cell.recon.match.id  ● Split the values obtained in the reconciliation, that should be in this format:  Isopsetta isolepis (Lockington, 1880) | 995111  24 Select “species”, “Edit column” > “Split into several columns”. Select the “|” separator and name the columns according after split: species_eol, eol_id.  ● We have to build EOL links by creating a new column “eol_uri” based on “eol_id”, using the following expression:  "http://eol.org/pages/"+value  Challenges  Finding the controlled vocabulary to the reconciliation service was difficult, as we are dealing with a very specific area of knowledge I was not familiar with. My first try was the FishBase. which had an API and some services for R set up:  https://cran.rstudio.com/web/packages/rfishbase/ https://github.com/ropensci/rfishbase  However, Encyclopedia of Life had a better reconciliation service set up specifically for Open Refine, and URIs with a good landing page with photos. 
Although it is an encyclopedia and not a controlled vocabulary, it is collaboratively curated and presents up to date information aggregated from several other databases.  As we had many unique species values for this dataset, it was impossible to review all the reconciliation results. I’ reviewed a sample and, as the results were consistently good, accepted best match suggestions for all remaining entries. It is important to note that sometimes a species may be known by more than one name, so having matches among different names didn’t mean necessarily that the match was wrong (it was usually correct for this dataset).  Building interface  ● Prepare the data to interface. In order to have links on CARTO interface, we have to add html tags in the source dataset.   ○ Remove double quotes and language. Create a new column “title” based on the column “​http://purl.org/dc/terms/title​”, using the following expression:  value.replace("\"", " ").replace("@en"," ")  ○ Add html tags for title links. Create a new column “title_link” based on the column “subject”, using the following expression:  25 "<a href=\""+value+"\">"+if(isBlank(cells["title"].value), " ", cells["title"].value)+"<\/a>"  ○ Add html tags for EOL species links. Create a new column “eol_html” based on the column “eol_uri”, using the following expression:  "<a href=\""+value+"\">"+if(isBlank(cells["species_eol"].value), " ", cells["species_eol"].value)+"<\/a>"  ● Export the dataset from OpenRefine in .csv format. Name the file “fisheries_cleaned”.  ● Sign up or Log in to CARTO Builder: ​https://carto.com/signup/​. Create a new map and import the Open Refine exported file to your map.  ● Georeference your dataset following the instructions here: https://carto.com/learn/guides/analysis/georeference​. Once in your map, click on the dataset on the left menu, then click on “Analysys” > “Add analysys” > Georeference. Select the corresponding column names in your dataset for latitude and longitude. Note that the application is plotting just one resource per location, so we will need to aggregate the results to have all the resources plotted.   ● Export the georeferenced dataset from CARTO in csv format, in order to incorporate the “the_geom_webmercator” column (with georeferenced data) in your dataset. Name the file “fisheries_cleaned_geo”. Import the dataset back into CARTO map, and delete the previous dataset from your map. This step is necessary since CARTO does not allow georeference analysis and SQL manipulation (that we will need for aggregation) of the data concomitantly.   ● Click on the dataset on the lateral menu and select the “Data” tab. At the bottom of the lateral panel, enable SQL option. Paste the following query in the editor and click “Apply”:  SELECT string_agg(CONCAT(species, ' <br>', eol_html, ' <br>'),' <br>') as new_column_aggregated, title_link, the_geom_webmercator, Min(cartodb_id) cartodb_id FROM fisheries_cleaned_geo group by title_link, the_geom_webmercator  ● Click on the “Pop-up” tab, and enter the following settings:  ○ Style: color 26 ○ Window size: 400 ○ Header color: #a6e79a ○ Show item: check all boxes (make sure “title_link” is first on the list).  
● Click on any point on the map, you should see something similar to this:    ● To build the filters:  ○ Click on “fisheries_cleaned_geo” dataset, “style” tab, and use the following settings:  ■ Aggregation: none ■ Size 6, color #f6ff00 ■ Stroke 1 color #FFFFFF, transparent ■ Blending: overlay  ○ Exit “fisheries_cleaned_geo” dataset and click on the “add” button to re import the same dataset. You will get copy named “fisheries_cleaned_geo_1”. Click on this new dataset, “style” tab, and use the following settings:  ■ Aggregation: none ■ Size 12, color #ff0000 ■ Stroke 1 color #FFFFFF, A:0.7 ■ Blending: none  ○ Exit the dataset and make it the second one in the list showing on the lateral panel, dragging and dropping it. We want this new layer behind the one that has the pop-up with photos.   ○ Click on “Widget” tab to add filters. Add widgets following the instructions here: https://carto.com/learn/guides/widgets/exploring-widgets​.   ● Exit the datasets and change the basemap to “Here” > “Satellite Day” 27  ● Publish your map using the “Publish” button on the left lateral panel, just under the map name. Final result:     Challenges  Finding the right colours for the points on the map was challenging, because of the satellite base map used. Bright colours rendered the best user experience, because of the better contrast against the background.        28 References  API Documentation - UBC Library Open Collections​. (2017). ​Open.library.ubc.ca​. Retrieved 20 September 2017, from https://open.library.ubc.ca/docs  CartoDB CSS​. (2017). ​YouTube​. Retrieved 20 September 2017, from https://youtu.be/O6-1mRtuz1w  Enabling Pop-Up Information Windows — CARTO​. (2017). ​Carto.com​. Retrieved 25 September 2017, from https://carto.com/learn/guides/publish-share/enabling-pop-up-information-windows  Gibbs, F. (2017). ​Installing Python Modules with pip​. ​Programminghistorian.org​. Retrieved 20 September 2017, from https://programminghistorian.org/lessons/installing-python-modules-pip  Hooland, S.; Verborgh, R. (2014). Linked data for libraries, archives and museums : how to clean, link and publish your metadata. Neal-Schuman.  How to fully customize infowindows in CartoDB​. (2017). ​Gist​. Retrieved 25 September 2017, from https://gist.github.com/andrewxhill/8655774  CARTO Infowindows. (2017). ​CartoDB: Dynamic text with different links in infowindows​. Gis.stackexchange.com​. Retrieved 25 September 2017, from https://gis.stackexchange.com/questions/136973/cartodb-dynamic-text-with-different-links-in-infowindows  L.Markercluster spidify() cartodb.js​. (2017). ​Bl.ocks.org​. Retrieved 20 September 2017, from http://bl.ocks.org/oriolbx/7518b5834d1b679759bda218871cb315  Leaflet/Leaflet.markercluster​. (2017). ​GitHub​. Retrieved 20 September 2017, from https://github.com/Leaflet/Leaflet.markercluster  Mustache/mustache.github.com​. (2017). ​GitHub​. Retrieved 25 September 2017, from https://github.com/mustache/mustache.github.com  Page, R. (2017). ​iPhylo: Using Google Refine and taxonomic databases (EOL, NCBI, uBio, WORMS) to clean messy data​. ​Iphylo.blogspot.ca​. Retrieved 20 September 2017, from http://iphylo.blogspot.ca/2012/02/using-google-refine-and-taxonomic.html  29 URL Windows. (2017). ​url completion in cartoDB info windows​. ​Gis.stackexchange.com​. Retrieved 25 September 2017, from https://gis.stackexchange.com/questions/91402/url-completion-in-cartodb-info-windows  Walkthrough Of Geonames Recon Service · Christina Harlow​. (2017). ​Christinaharlow.com​. 
Retrieved 20 September 2017, from http://christinaharlow.com/walkthrough-of-geonames-recon-service      30  University of British Columbia School of Library, Archival and Information Studies Master of Library and Information Studies       LIBR 594 - Assignment 3  IIIF: Use Cases for  UBC Open Collections  Carolina Román Amigo  Supervisor: Richard Arias Hernandez, UBC SLAIS Instructor Co-supervisor: Paul Joseph, UBC Systems Librarian        November 2017    Table of Contents  Table of Contents 1 Introduction 2 Comparing images 4 Greater Vancouver Regional District Planning Department Land Use Maps 4 Multiple slot viewer 4 Interactive Index Map 5 Aggregating collections 11 WWI & WWII Posters 11 Annotations 17 Epigraphic Squeezes - Decretum de Minervae Victoriae Sacerdote Temploque (I) 17 Annotation Use Cases 20 Cellxplorer on Mirador 21 IIIF Annotations on Diva.js 21 Leaflet annotation example 22 Displaying Geolocation 23 Autocomplete on searches 23 Authentication 24 3D Viewing 24 Appendix A - GVRD Maps HTML code 26 Appendix B - Epigraphic Squeeze IIIF manifest code 29 Appendix C - Epigraphic Squeeze HTML code 41 Appendix D - Epigraphic Squeeze IIIF annotation code 43      1 Introduction  This project aims to explore the potential uses of the International Image Interoperability                         Framework for UBC Open Collections. IIIF provides an interoperable technology and                     community framework for image delivery. It provides uniform access to images hosted in                         different repositories, and enables viewing, comparing, manipulating and annotating images                   through a variety of image viewing clients. Its main value consists in enabling collaboration                           with other institutions and making possible linked data initiatives. IIIF comprises four APIs:  1. Image API​: enables access to image pixels and image manipulation.  2. Presentation API​: structures images and metadata for a human viewing (e.g. informs the sequence of the pages of a book). In this API content is brought together from distributed systems via annotations. That content might include images, often with an IIIF​ ​Image API​ service to access them, audio, video, rich or plain text, or anything else. 3. Authentication API​: restricts or enables differential access to resources. 4. Search API​: provides search within resources, on the annotation layer that may include full text, transcriptions, translation, commentary, description, tagging or other annotations about the object. The table below summarizes the IIIF features that are currently implemented in UBC Open Collections. Table 1 - IIIF Features implementation in Open Collections   Implemented Not implemented Image API - Fast, rich, zoom/pan delivery of images. - Manipulation of size, scale, region of interest, rotation, quality and format. - Cite and share (stable image URIs). - Embed images in blogs and web pages.  Presentation API - UBC enhancement for word positions, for highlighting search results. - Images annotations. 2 Search API The search API ​is not implemented​, but  our custom viewer has a search feature that allows:  - Searching OCR generated text within a textual resource. - Searching transcribed content, provided by scholars. - Searching multiple streams of content, such as the translation or edition, rather than the raw transcription of the content, to jump to the appropriate part of an object. - Searching on sections of text, such as defined chapters. 
- Searching for user provided commentary. - Discovering similar sections of text to compare either the content or the object. - Providing autocomplete for search queries. Authentication API  - Restricts/allows for differentiate access to resources (login).  This report describes the development process of some proof of concept web applications                         created to demonstrate the potential applications of IIIF to OC, and provides references for                           further examples of applications possible with that technology.    3 Comparing images Greater Vancouver Regional District Planning Department Land Use Maps  This collection contains maps of the metro Vancouver region. There are two index maps that                             serve as a reference to localize the detailed sections:  ● Index Map : Subdivision and Land Use Maps ● Index - Land Use Series Multiple slot viewer  A multiple slot viewer such as Mirador could load the index map and the detailed section                               simultaneously, making it easier for the user to understand the context of each subdivision                           map. The example below shows, on the left slot, the region of the index map that contains the                                   detailed section number 348, depicted on the right slot.     4  How to do it:  This demo was built using Mirador demo: ​http://projectmirador.org/demo/​.   Images can be added by clicking on the upper left slot icon > replace object > Add new object from URL > Enter the manifest URL for the object. The manifest URLs of Open Collections items under the Embed tab on the item’s page.  Interactive Index Map  Leaflet-IIIF could be used to build an interactive index map such as the one in this demo:   Link for the demo (UBC on campus VPN connection only): https://leaflet.library.ubc.ca/  GitHub Repository: https://github.com/carolamigo/ubc_gvrd_maps  5   How to do it:   ● Install QGIS (​http://www.qgis.org/en/site/​).  ● Georeference the Index Map image following the instructions on this link: http://www.qgistutorials.com/en/docs/georeferencing_basics.html​.  ● Create a vector layer with an attribute “map_id”, following the instructions here: http://docs.qgis.org/2.18/en/docs/training_manual/create_vector_data/create_new_vector.html#basic-fa-the-layer-creation-dialog  ● Draw polygons digitizing the tiles information on the Index Map (red lines). On the vector layer, toggle the edit on on the top bar (pencil icon) and click on the “Add Feature” button. Use “control” to close the polygon. Use the attribute map_id to identify the polygons according to the tile number.  ● Toggle the edit off on the pencil icon. Left-click on the vector layer name on the left sidebar and save as GeoJSON, CRS (EPSG:4346, WGS 84). Name the file gvrd_partial. 6  ● Open the GeoJSON file gvrd_partial in Open Refine, parsing data as line based text files.   ● Rename Column 1 to “feature”. We want now to extract the map_id from each line. Add a new column named “map_id” based on this column, using the following expression:  value.replace(/[^\d]/, " ")  ● Trim leading and trailing whitespaces of the “map_id” column by selecting “Edit cells” > “Common transforms” > “Trim leading and trailing whitespaces”.  ● On the “map_id” column, select “Edit cells” > “Split multi-valued cells”. The separator is “ ” (empty space) and select maximum of two columns. Delete the second column from the split, keeping just the first one with the map ids. Rename it to map_id.  
● Delete the number “1” value from the first cell, since it is not a map id. This spreadsheet is ready to use.  ● Download collection metadata using the Open Collections Research API. A php script to batch download is provided at OC API Documentation page > Download Collection Data. This script returns a folder containing one RDF file per collection item (or XML, JSON, any format preferred). We are going to use N-triples because the file is cleaner (no headers or footers), what makes the merging easier later. Edit the script following the instructions on the documentation page and run it using the command:  $ php collection_downloader.php --cid gvrdmaps --fmt ntriples  ● Merge the files using the Unix cat command:  $ cat * > merged_filename  ● Convert merged file obtained to a tabular format. Import project in Open Refine using the RDF/N3 files option. No character encoding selection is needed.  ● This dataset has more than one set of maps, so we need to filter just the ones that belong to the index map we are working with. Facet column “​http://purl.org/dc/terms/publisher​” by text. Filter only "Lower Mainland Regional Planning Board of B.C." values.  7 ● To extract just the map numbers, create a new column named “map_id” based on the column “​http://purl.org/dc/terms/identifier​”, using the expression:  value.replace(/[^\d]/, " ")  Check the results by faceting the column. Just unique values should show up, no numbers missing.  ● To reconcile the values of the geojson spreadsheet with the current spreadsheet, create a new column named “geojson” based on the column “map_id” using the expression:  forEach(cell.cross("gvrd_partial", "map_id"),r,forNonBlank(r.cells["feature"].value,v,v,"")).join("|")  ● Build IIIF image urls. Create a new column named “iiif_url” based on the column “​http://www.europeana.eu/schemas/edm/isShownAt​”, using the expression:  "http://iiif.library.ubc.ca/image/cdm.gvrdmaps."+value.substring(10,19).replace(".","-")+".0000" + "/full/300,300/0/default.jpg"  ● Remove double quotes and @en from title. Create a new column named “title” based on the column “​http://purl.org/dc/terms/title​”, using the expression:  value.replace("\"", " ").replace("@en"," ")  ● Add all values to geojson string. Split column geojson by using separator “,“, maximum 3 columns. Transform split column 2 cells using the following expression:  value.replace(" }", "") + " , \"subject\": \"" + cells["subject"].value + "\" , \"title\": " + cells["title"].value + " , " + "\"purl\": \"" + cells["iiif_url"].value + "\" }"  ● Merge the three columns back together by creating a new column named geojson_code using the following expression:  cells["geojson 1"].value + " , " + cells["geojson 2"].value + " , " + cells["geojson 3"].value  8 Check the results by faceting the column. Any error on the syntax will prevent the application of working.  ● Export the geojson_code column back to the original geojson text file. Export > Custom tabular exporter. Select only the geojson_code column, do not output column headers or empty rows. On the download tab, select “Excel” and click download. Open the excel file, remove the comma from the end of the last line, copy the column, and paste it in the appropriate position in the original geojson text file, replacing the old code. Save it as gvrd_full.geojson. This file can be debugged with the help of a geojson debugger such as https://jsonlint.com/.  
● Set up leaflet in your machine, following the instructions here: http://leafletjs.com/download.html​ (Building Leaflet from the Source).  ● Use the code provided on Appendix A (also on the GitHub repo) to run the application.  Challenges:  The scanned index map doesn’t match exactly the digital map used as a basis for this web application. QGIS provide us with the option of distorting the image to make it fit the map as best as possible. However, as we wanted to use IIIF for all images involved in this web application, this would mean replacing the original map image by the distorted one in Open Collections. Although the distortion would be minimal, the digital representation would not be as true as possible to the cultural heritage object we have in our stacks anymore. We opted for a combined approach, using the original image but drawing the polygons following the contours of the deformed map. The user has the option to turn the index map view off to better see the map under it.  Another challenge was related to the digitization of the polygons. As the index map doesn’t follow a regular grid, they had to be drawn by hand, one by one. This potentialized the human error factor in this project, with some degree of imprecision on the drawing and misnumbered polygons.  Finally, debugging the geojson file was tricky since some lines had duplicate values, extra {} or misplaced commas. The debugger tool mentioned in the process description was very useful to help us spot the problematic lines and fix them.  References:  https://maptimeboston.github.io/leaflet-intro/ http://joshuafrazier.info/leaflet-basics/ 9 h​ttps://geekswithlatitude.readme.io/docs/leaflet-map-with-geojson-popups http://leafletjs.com/examples/quick-start/ https://github.com/Leaflet/Leaflet/blob/master/debug/map/image-overlay.html http://leafletjs.com/reference-1.2.0.html#imageoverlay http://www.qgistutorials.com/en/# http://leafletjs.com/examples/geojson/       10 Aggregating collections WWI & WWII Posters  This collection contains fifty nine posters, broadsides, and ephemera from World War I and II,                             published in Canada, Belgium, England, France, Germany, and the United States. There are                         similar collections across the web also using IIIF that could be aggregated in a web application,                               such as:  ● World Digital Library War Posters https://www.wdl.org/en/search/?additional_subjects=War%20posters  ● University of Washington War Poster Collection https://content.lib.washington.edu/postersweb/index.html http://digitalcollections.lib.washington.edu/cdm/search/collection/posters  There are collections about WW Posters available online that do not use IIIF, making them harder to integrate with collections from other institutions. Some examples:  ● https://umedia.lib.umn.edu/warsearch ● http://www.library.unt.edu/collections/government-documents/world-war-posters  For our demo we integrated the collections from World Digital Library (which contains                         collections from Library of Congress, British Library, among others), University of Washington                       and University of British Columbia, using Mirador viewer. The WW Posters collection from                         University of Washington is made available on a OCLC hosted CONTENTdm website and is                           using the built in IIIF implementation of CONTENTdm.  
Link for the demo (UBC on campus VPN connection only): https://mirador.library.ubc.ca/  GitHub Repository: https://github.com/carolamigo/ubc_mirador_WWposters  11   12    How to do it:   ● Install Mirador viewer following the instructions here: https://github.com/ProjectMirador/mirador  ● Go to the root folder of the Mirador installation in your machine and open the file index.html (​https://github.com/ProjectMirador/mirador/blob/develop/index.html​). We need to replace the manifests of this template file with the manifests of our collections.  13 ● To get manifests for UBC collection, download collection metadata using the Open Collections Research API. A php script to batch download is provided at OC API Documentation page > Download Collection Data. This script returns a folder containing one RDF file per collection item (or XML, JSON, any format preferred). We are going to use N-triples because the file is cleaner (no headers or footers), what makes the merging easier later. Edit the script following the instructions on the documentation page and run it using the command:  $ php collection_downloader.php --cid wwposters --fmt ntriples  ● Merge the files using the Unix cat command:  $ cat * > merged_filename  ● Convert merged file obtained to a tabular format. Import project in Open Refine using the RDF/N3 files option. No character encoding selection is needed.  ● Build IIIF manifests. Create a new column named “manifest” based on the column “​http://www.europeana.eu/schemas/edm/isShownAt​”, using the expression:  "https://iiif.library.ubc.ca/presentation/cdm.wwposters."+value.substring(10,19).replace(".","-")+".0000" + "/manifest"  ● Create the code line ready to be inserted in the index.html file by creating a new column named “manifest_code” based on the “manifest” column, using the expression:  "{ \"manifestUri\": \""+value+"\", \"location\": \"University of British Columbia\"},"  ● Export the “manifest_code” column by clicking on Export > Custom tabular exporter. Select only the “manifest_code” column, do not output column headers or empty rows. On the download tab, select “Excel” and click download. Open the excel file, copy the column, and paste it in the appropriate position in the original index.html file, replacing the old code.   ● To get manifests from the University of Washington Collection, get page source code from the link http://digitalcollections.lib.washington.edu/cdm/search/collection/posters​, and import as line based data in OpenRefine.  14 ● Create a new column named “item_id” using the expression below to extract item IDs:  value.match(/.*item_id=\"(\d\d).*/).join("")  ● Trim leading and trailing whitespaces of the “item_id” column by selecting “Edit cells” > “Common transforms” > “Trim leading and trailing whitespaces”.  ● Build the manifest code by creating a new column named “manifest_code” based on the “item_id” column using the following expression:  "{ \"manifestUri\": \"http://digitalcollections.lib.washington.edu/digital/iiif-info/posters/"+value+"\", \"location\": \"University of Washington Libraries\"},"  ● Export the “manifest_code” column by clicking on Export > Custom tabular exporter. Select only the “manifest_code” column, do not output column headers or empty rows. On the download tab, select “Excel” and click download. Open the excel file, copy the column, and paste it in the appropriate position in the original index.html file, below UBC manifests.   
● To get item ids from the Library of Congress collection, use the microdata extractor http://microdata-extractor.improbable.org/ on the following URL: https://www.wdl.org/en/search/?additional_subjects=War%20posters. Import the resulting JSON file into OpenRefine.

● Create a new column named "item_id" based on the column "_ - properties - url - url", using the following expression:

value.replace(/[^\d]/, " ")

● Trim leading and trailing whitespaces of the "item_id" column by selecting "Edit cells" > "Common transforms" > "Trim leading and trailing whitespaces".

● Build the manifest code by creating a new column named "manifest_code" based on the "item_id" column using the following expression:

"{ \"manifestUri\": \""+"https://www.wdl.org/en/item/"+value+"/manifest"+"\", \"location\": \"World Digital Library\"},"

● Export the "manifest_code" column by clicking on Export > Custom tabular exporter. Select only the "manifest_code" column, and do not output column headers or empty rows. On the download tab, select "Excel" and click download. Open the Excel file, remove the comma from the end of the last line, copy the column, and paste it in the appropriate position in the original index.html file, below the University of Washington manifests.

● Open the index.html file in your browser to see the final result. Click on "Add item" to see the menu with options.

Challenges:

To obtain the University of Washington item ids, we initially looked for an API that would provide us with the items' metadata. OCLC-hosted CONTENTdm websites appear to have an API, according to the following links:

● https://www.oclc.org/developer/develop/web-services/content-dm-api.en.html
● https://www.oclc.org/support/services/contentdm/help/customizing-website-help/other-customizations/contentdm-api-reference.en.html

However, there is little documentation available and we could not find out how to access it, so the solution was to scrape the website, extracting item IDs from the source code of the collection page.

Another challenge concerned the manifest URL pattern. The manifest URLs were initially built using the syntax provided here:

https://www.oclc.org/developer/news/2017/image-open-access.en.html

Resulting in: http://digitalcollections.lib.washington.edu/digital/iiif-info/posters/70

However, once on the server, the Mirador application would not read those URLs, and as a result the objects from the University of Washington were not showing up. We opened one of the manifests and found that it had a different @id, following another URL pattern:

https://cdm16786.contentdm.oclc.org/digital/iiif-info/posters/70/manifest.json

Our second try, with the new URL pattern, was successful.

Annotations

Epigraphic Squeezes - Decretum de Minervae Victoriae Sacerdote Temploque (I)

Annotations could be used to show translations directly on the image for the collection above. The same could be done with other OC collections such as the Royal Fisk Gold Rush Letters and the Emma Crosby Letters.
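To make the idea concrete, the sketch below shows the shape of a single IIIF annotation of the kind built in this project: plain-text/HTML content "painted" onto a rectangular region of the canvas, addressed with a #xywh fragment (x, y, width, and height in image pixels). The identifiers and coordinates are copied from the first annotation in Appendix D; only the transcription text is abbreviated here.

// Shape of one annotation from the annotation list (see Appendix D for the full file).
const exampleAnnotation = {
  "@id": "anno_01",
  "@type": "oa:Annotation",
  "motivation": "sc:painting",
  "resource": {
    "@type": "cnt:ContentAsText",
    "format": "text/plain",
    "chars": "<p>.... ΑΥΚΟΣΕΙΠΕ ....</p>"  // transcription/translation for this region (abbreviated)
  },
  // Canvas of the squeeze image, with the annotated region given as a #xywh fragment.
  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=3468,928,1396,246"
};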
Link for the demo (UBC on-campus or VPN connection only): https://epigraphic.library.ubc.ca/

GitHub Repository: https://github.com/carolamigo/ubc_mirador_epigraphic

How to do it:

● Install the Mirador viewer following the instructions here: https://github.com/ProjectMirador/mirador

● Set up a local web server using Node, following the instructions here: http://ronallo.com/iiif-workshop/preparation/web-server.html

● Run the server from the terminal by going to the directory where it is installed and entering:

http-server -p 3000 --cors

● Get the manifest for the item to be annotated by going to the item page in Open Collections and clicking on the IIIF manifest link in the "Embed" tab:

http://iiif.library.ubc.ca/presentation/cdm.squeezes.1-0050935/manifest

● Save the manifest as a JSON file in the folder where Mirador and the web server are running. Open the JSON file using a text editor such as Sublime or Atom. Replace everything from line 351 to the end of the file with the code below (check Appendix B or the GitHub repo for the complete edited JSON file):

                "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0"
              }
            ],
            "otherContent": [
              {
                "@id": "http://localhost:3000/annotation_list.json",
                "@type": "sc:AnnotationList",
                "label": "Text of this page"
              }
            ]
          }
        ]
      }
    ],
    "description": "[No description]",
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "http://localhost:3000/epigraphic_manifest_edited.json",
    "@type": "sc:Manifest"
}

● Using a text editor such as Sublime or Atom, create an HTML file named "epigraphic.html". Paste in the code available in Appendix C (also in the GitHub repo).

● Open "epigraphic.html" in your browser. Click on "Add item", open the item, and toggle annotations on (button in the upper left corner). Draw your annotations on the image using this feature of Mirador.

● The annotations will be stored in your browser's local storage until the browser is closed, so we need to retrieve them from there. In Firefox, in the Mirador window with the annotations, go to Tools > Web Developer > Inspector > Storage tab > Local storage. Select the value line and then the data line on the right. Copy and paste the data into a text editor such as Sublime or Atom, saving it as a JSON file named "annotation_list.json". Remove the URL at the beginning of the code, as well as the enclosing double quotes and square brackets, so that the code starts at {"@context (...). A version of the final annotation code, easier to read, is available in Appendix D (also in the GitHub repo) for reference.

● Open the epigraphic.html file in your browser to see the final result. Click on "Add item" to see the menu with options.

Challenges:

Finding the easiest way to add annotations to the image was the main challenge of this project. After some research it became clear that using the annotation tool of Mirador would be the best option, but we needed to find a way to extract the data from the browser's local storage in order to build the annotation_list.json file.

We first tried to extract annotations from Mirador using a parsing tool provided here (http://www.darthcrimson.org/hacking-mirador-workshop/annotate.html), but it did not work as expected, probably because we were working on a local server. The solution we found was to extract the data manually from the browser using the inspector tool.
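For reference, the same data can also be pulled out from the browser's JavaScript console on the open Mirador page. The sketch below makes no assumption about the exact key Mirador uses in local storage; it simply prints every stored value that appears to contain IIIF annotations, so it can be copied into annotation_list.json and trimmed as described above.

// Run in the browser console on the Mirador window, before closing the browser.
// Prints every localStorage value that looks like it holds IIIF annotations.
for (let i = 0; i < localStorage.length; i++) {
  const key = localStorage.key(i);
  const value = localStorage.getItem(key);
  if (value && value.indexOf("oa:Annotation") !== -1) {
    console.log(key);
    console.log(value); // copy this value into annotation_list.json and clean it up as described above
  }
}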
Note that, although the syntax is correct, the code extracted from the local storage does not follow exactly the same pattern as the annotation code suggested in the tutorial we used as a reference, available at this link: http://ronallo.com/iiif-workshop/presentation/image-annotation.html.

References:

http://ronallo.com/iiif-workshop/presentation/image-annotation.html
http://iiif.io/api/presentation/2.1/#image-resources
http://darthcrimson.org/hacking-mirador/
http://www.darthcrimson.org/hacking-mirador-workshop/annotate.html

Annotation Use Cases

The examples below show how annotations could be useful for STEM materials, music sheets, and medieval manuscripts.

Cellxplorer on Mirador

IIIF Annotations on Diva.js

Leaflet annotation example

Displaying Geolocation

A plugin for Mirador allows the use of a map layer showing the locations of collection items, if geolocation information is available, as is the case for the MacMillan Bloedel Limited fonds and the UBC Institute of Fisheries Field Records. See the example below.

Mirador Georeferencing Plugin
https://github.com/jbhoward-dublin/mirador-plugins-ucd

Autocomplete on searches

Autocomplete on full-text search would be desirable for items with a large amount of text, such as books or newspapers from the BC Historical Newspapers collection. Universal Viewer allows for the autocompletion of searches, as demonstrated in the example below.

Universal Viewer Autocomplete
https://wellcomelibrary.org/item/b18035723#?c=0&m=0&s=0&cv=6&z=-0.4983%2C-0.115%2C2.9965%2C1.5162

References:

http://ronallo.com/iiif-workshop/search/service-in-manifest.html
https://vimeo.com/126596158 (UV tutorial)

Authentication

The authentication workflow for Electronic Theses and Dissertations in Creative Arts, 2017+ could be managed by IIIF if we opted to include that collection in Open Collections.

3D Viewing

For future OC collections it might be interesting to have a 3D visualization feature. Universal Viewer has support for displaying 3D objects. You can see some examples, including a skull and the Kiss.
(Source: http://ronallo.com/iiif-workshop/now/futures.html)  24   Leaflet example: http://codh.rois.ac.jp/software/iiif-curation-viewer/demo/?manifest=https://iiif.library.ubc.ca/presentation/cdm.tokugawa.1-0227946/manifest        25 Appendix A - GVRD Maps HTML code  <!DOCTYPE html>  <html>  <head>  <title>Great Vancouver Regional District Maps</title>  <link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.7.3/leaflet.css" />  <script src="http://cdn.leafletjs.com/leaflet-0.7.3/leaflet.js"></script>  <script src="jquery-2.1.1.min.js"></script>  <style>  #map { width: 1400px; height: 1000px; }  </style>  </head>  <body>  <div id="map"></div>  <script> // Adapted from https://bl.ocks.org/mejackreed/15f4c1c40c36123547f2f401f06248a3  var map = L.map('map');  L.tileLayer('https://{s}.tile.openstreetmap.fr/hot/{z}/{x}/{y}.png', { 26 maxZoom: 19, attribution: '&copy; <a href="http://www.openstreetmap.org/copyright">OpenStreetMap</a>, Tiles courtesy of <a href="http://hot.openstreetmap.org/" target="_blank">Humanitarian OpenStreetMap Team</a>' }).addTo(map);    var imageUrl = 'http://iiif.library.ubc.ca/image/cdm.gvrdmaps.1-0135075.0000/full/full/0/default.jpg',  imageBounds = [[49.434, -123.333], [48.974, -121.638]];   var imageOverlay = L.imageOverlay(imageUrl, imageBounds, {opacity: 0.6}).addTo(map);   var template = '<div style="min-height: 340px;"><h2>{title}</h2><div><a href="{subject}" target=_blank><img src="{purl}" style="max-width: 300px;"/></a></div>'  function onEach(feature, layer) {   layer.on('click', function() { $.getJSON("gvrd_full.geojson", function(info) {  var popup = L.popup({  keepInView: true  })  .setContent(L.Util.template(template, {  title: feature.properties.title,  purl: feature.properties.purl,  subject: feature.properties.subject  })  );  layer.bindPopup(popup).openPopup(); })   }) }  var links =  $.getJSON("gvrd_full.geojson", function(data) { console.log(data); var indexMap = L.geoJson(data, {  onEachFeature: onEach,  style: function(feature) {  return {  weight: 1 27  }  } }   ).addTo(map);   var baseMaps = {   "Base Map": map };  var overlayMaps = {   "Index Map": imageOverlay };  L.control.layers(baseMaps, overlayMaps).addTo(map);  map.fitBounds(indexMap.getBounds());   });  </script>  </body>  </html>      28 Appendix B - Epigraphic Squeeze IIIF manifest code  {   "label": "Decretum de Minervae Victoriae Sacerdote Temploque (I)",   "viewingDirection": "left-to-right",   "viewingHint": "paged",   "metadata": [ {  "label": "AggregatedSourceRepository",  "value": "CONTENTdm",  "attrs": {  "lang": "en",  "ns": "http://www.europeana.eu/schemas/edm/dataProvider",  "classmap": "ore:Aggregation",  "property": "edm:dataProvider"  },  "iri": "http://www.europeana.eu/schemas/edm/dataProvider",  "explain": "A Europeana Data Model Property; The name or identifier of the organization who contributes data indirectly to an aggregation service (e.g. Europeana)" }, {  "label": "AlternateTitle",  "value": "Decretum De Minervae Victoriae Sacerdote Templeoque",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/alternative",  "classmap": "dpla:SourceResource",  "property": "dcterms:alternative"  },  "iri": "http://purl.org/dc/terms/alternative",  "explain": "A Dublin Core Terms Property; An alternative name for the resource.; Note - the distinction between titles and alternative titles is resource-specific." 
}, {  "label": "Category",  "value": "Decrees and laws dated to the second century",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/subject", 29  "classmap": "oc:DataDescription",  "property": "dcterms:subject"  },  "iri": "http://purl.org/dc/terms/subject",  "explain": "A Dublin Core Terms Property; The topic of the resource.; Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary." }, {  "label": "Collection",  "value": "Epigraphic Squeezes Collection",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/isPartOf",  "classmap": "dpla:SourceResource",  "property": "dcterms:isPartOf"  },  "iri": "http://purl.org/dc/terms/isPartOf",  "explain": "A Dublin Core Terms Property; A related resource in which the described resource is physically or logically included." }, {  "label": "DateAvailable",  "value": "2014-11-21",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/issued",  "classmap": "edm:WebResource",  "property": "dcterms:issued"  },  "iri": "http://purl.org/dc/terms/issued",  "explain": "A Dublin Core Terms Property; Date of formal issuance (e.g., publication) of the resource." }, {  "label": "DateCreated",  "value": "199-100 BCE",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/created",  "classmap": "oc:SourceResource",  "property": "dcterms:created" 30  },  "iri": "http://purl.org/dc/terms/created",  "explain": "A Dublin Core Terms Property; Date of creation of the resource." }, {  "label": "DigitalResourceOriginalRecord",  "value": "https://open.library.ubc.ca/collections/squeezes/items/1.0050935/source.json",  "attrs": {  "lang": "en",  "ns": "http://www.europeana.eu/schemas/edm/aggregatedCHO",  "classmap": "ore:Aggregation",  "property": "edm:aggregatedCHO"  },  "iri": "http://www.europeana.eu/schemas/edm/aggregatedCHO",  "explain": "A Europeana Data Model Property; The identifier of the source object, e.g. the Mona Lisa itself. This could be a full linked open date URI or an internal identifier" }, {  "label": "Extent",  "value": "4 squeezes, 1 fragment",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/extent",  "classmap": "dpla:SourceResource",  "property": "dcterms:extent"  },  "iri": "http://purl.org/dc/terms/extent",  "explain": "A Dublin Core Terms Property; The size or duration of the resource." }, {  "label": "FileFormat",  "value": "image/jpeg",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/elements/1.1/format",  "classmap": "edm:WebResource",  "property": "dc:format"  }, 31  "iri": "http://purl.org/dc/elements/1.1/format",  "explain": "A Dublin Core Elements Property; The file format, physical medium, or dimensions of the resource.; Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME]." }, {  "label": "FullText",  "value": "Array",  "attrs": {  "lang": "en",  "ns": "http://www.w3.org/2009/08/skos-reference/skos.html#note",  "classmap": "oc:AnnotationContainer"  },  "iri": "http://www.w3.org/2009/08/skos-reference/skos.html#note",  "explain": "Simple Knowledge Organisation System; Notes are used to provide information relating to SKOS concepts. There is no restriction on the nature of this information, e.g., it could be plain text, hypertext, or an image; it could be a definition, information about the scope of a concept, editorial information, or any other type of information." 
}, {  "label": "Genre",  "value": "Epigraphic Squeeze",  "attrs": {  "lang": "en",  "ns": "http://www.europeana.eu/schemas/edm/hasType",  "classmap": "dpla:SourceResource",  "property": "edm:hasType"  },  "iri": "http://www.europeana.eu/schemas/edm/hasType",  "explain": "A Europeana Data Model Property; This property relates a resource with the concepts it belongs to in a suitable type system such as MIME or any thesaurus that captures categories of objects in a given field. It does NOT capture aboutness" }, {  "label": "Identifier",  "value": "IG_I3_0035", 32  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/identifier",  "classmap": "dpla:SourceResource",  "property": "dcterms:identifier"  },  "iri": "http://purl.org/dc/terms/identifier",  "explain": "A Dublin Core Terms Property; An unambiguous reference to the resource within a given context.; Recommended best practice is to identify the resource by means of a string conforming to a formal identification system." }, {  "label": "IsShownAt",  "value": "10.14288/1.0050935",  "attrs": {  "lang": "en",  "ns": "http://www.europeana.eu/schemas/edm/isShownAt",  "classmap": "edm:WebResource",  "property": "edm:isShownAt"  },  "iri": "http://www.europeana.eu/schemas/edm/isShownAt",  "explain": "A Europeana Data Model Property; An unambiguous URL reference to the digital object on the provider’s website in its full information context." }, {  "label": "Language",  "value": "Greek, Ancient (to 1453)",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/language",  "classmap": "dpla:SourceResource",  "property": "dcterms:language"  },  "iri": "http://purl.org/dc/terms/language",  "explain": "A Dublin Core Terms Property; A language of the resource.; Recommended best practice is to use a controlled vocabulary such as RFC 4646 [RFC4646]." }, {  "label": "Notes", 33  "value": "Title taken from Inscriptiones Graecae I2 (IG I2).<br><br>Alternative title taken from Inscriptiones Graecae I3 (IG I3).",  "attrs": {  "lang": "en",  "ns": "http://www.w3.org/2009/08/skos-reference/skos.html#note",  "classmap": "skos:Concept",  "property": "skos:note"  },  "iri": "http://www.w3.org/2009/08/skos-reference/skos.html#note",  "explain": "Simple Knowledge Organisation System; Notes are used to provide information relating to SKOS concepts. There is no restriction on the nature of this information, e.g., it could be plain text, hypertext, or an image; it could be a definition, information about the scope of a concept, editorial information, or any other type of information." }, {  "label": "ProjectWebsite",  "value": "http://fromstonetoscreen.wordpress.com/squeeze-collection",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/relation",  "classmap": "dpla:SourceResource",  "property": "dcterms:relation"  },  "iri": "http://purl.org/dc/terms/relation",  "explain": "A Dublin Core Terms Property; A related resource.; Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system." }, {  "label": "Provider",  "value": "Vancouver : University of British Columbia Library",  "attrs": {  "lang": "en",  "ns": "http://www.europeana.eu/schemas/edm/provider",  "classmap": "ore:Aggregation",  "property": "edm:provider"  }, 34  "iri": "http://www.europeana.eu/schemas/edm/provider",  "explain": "A Europeana Data Model Property; The name or identifier of the organization who delivers data directly to an aggregation service (e.g. 
Europeana)" }, {  "label": "Publisher",  "value": "Vancouver: University of British Columbia Library.",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/publisher",  "classmap": "dpla:SourceResource",  "property": "dcterms:publisher"  },  "iri": "http://purl.org/dc/terms/publisher",  "explain": "A Dublin Core Terms Property; An entity responsible for making the resource available.; Examples of a Publisher include a person, an organization, or a service." }, {  "label": "Reference",  "value": "IG I2 # 24<br><br>IG I3 # 35<br><br>EM # 8116",  "attrs": {  "lang": "en",  "ns": "https://open.library.ubc.ca/terms#reference",  "classmap": "oc:ArtifactDescription",  "property": "oc:reference"  },  "iri": "https://open.library.ubc.ca/terms#reference",  "explain": "UBC Open Collections Metadata Components; Local Field; Records the reference numbers from various indices.; Records the reference numbers from various indices." }, {  "label": "Rights",  "value": "Images provided for research and reference use only. Permission to publish, copy or otherwise use these images must be obtained from the Digitization Centre: http://digitize.library.ubc.ca/",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/rights",  "classmap": "edm:WebResource", 35  "property": "dcterms:rights"  },  "iri": "http://purl.org/dc/terms/rights",  "explain": "A Dublin Core Terms Property; Information about rights held in and over the resource.; Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights." }, {  "label": "SortDate",  "value": "199 BC",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/elements/1.1/date",  "classmap": "dpla:SourceResource"  },  "iri": "http://purl.org/dc/elements/1.1/date",  "explain": "A Dublin Core Elements Property; A point or period of time associated with an event in the lifecycle of the resource.; Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF]." }, {  "label": "Source",  "value": "University of British Columbia. Department of Classical, Near Eastern and Religious Studies.",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/source",  "classmap": "oc:SourceResource",  "property": "dcterms:source"  },  "iri": "http://purl.org/dc/terms/source",  "explain": "A Dublin Core Terms Property; A related resource from which the described resource is derived.; The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system." }, {  "label": "Title", 36  "value": "Decretum de Minervae Victoriae Sacerdote Temploque (I)",  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/title",  "classmap": "dpla:SourceResource",  "property": "dcterms:title"  },  "iri": "http://purl.org/dc/terms/title",  "explain": "A Dublin Core Terms Property; The name given to the resource." }, {  "label": "Translation",  "value": ".…. ΑΥΚΟΣΕΙΠΕ ....<br>\n<br>.... ΚΕΙΗΙΕΡΕΑΝΕΑΓ.... <br>\n<br>....ΙΕΚΣΑΘΕΝΑΙΟΝΑΠΑ... 
<br>\n<br>....ΣΤΑΙΚΑΙΤΟΙΕΡΟΝΘΥΡΟΣΑ<br>\n<br>ΙΚΑΘΟΤΙΑΝΚΑΛΛΙΚΡΑΤΕΣΧΣΥΓΓΡΑΦΣ<br>\n<br>ΕΙΑΠΟΜΙΣΘΟΣΑΙΔΕΤΟΣΠΟΛΕΤΑΣΕΠΙΤ<br>\n<br>ΕΣΛΕΟΝΤΙΔΟΣΠΡΥΤΑΝΕΙΑΣΦΕΡΕΝΔΕΤ<br>\n<br>ΕΝΙΕΡΕΑΝΠΕΝΤΕΚΟΝΤΑΔΡΑΧΜΑΣΚΑΙ<br>\n<br>ΤΑΣΚΕΛΕΚΑΙΤΑΔΕΡΜΑΤΑΦΕΡΕΝΤΟΝΔΕ<br>\n<br>ΜΟΣΙΟΝΝΕΟΝΔΕΟΙΚΟΔΟΜΕΣΑΙΚΑΘΟΤΙ<br>\n<br>ΑΝΚΑΛΛΙΚΡΑΤΕΣΧΣΘΓΓΡΑΦΣΕΙΚΑΙΒΟ<br>\n<br>ΜΟΝΛΙΘΙΝΟΝ<br>\n<br>ΕΣΤΙΑΙΟΣΕΙΠΕΤΡΕΣΑΝΔΡΑΣΕΛΕΣΘ<br>\n<br>ΑΙΕΓΒΟΛΕΣΤΟΘΤΟΣΔΕΜΕΤ [. ] ΚΑΛΛΙΚΡΑ<br>\n<br>..ΣΧΣΘΓΓΑΦΣΑΝΤΑΣ ΕΠΙΔ.. <br>\n<br>....ΕΙΚΑΘΟΤΙΑΠΟΜ.....",  "attrs": {  "lang": "en",  "ns": "http://www.europeana.eu/schemas/edm/isDerivativeOf",  "classmap": "edm:ProvidedCHO",  "property": "edm:isDerivativeOf"  },  "iri": "http://www.europeana.eu/schemas/edm/isDerivativeOf",  "explain": "A Europeana Data Model Property; This property captures a narrower notion of derivation than edm:isSimilarTo, in the sense that it relates a resource to another one, obtained by reworking, reducing, expanding, parts or the whole contents of the former, and possibly adding some minor parts. Versions have an even narrower meaning, in that it requires common identity between the related resources. Translations, summaries, abstractions etc. do not qualify as versions, but do qualify as derivatives" }, {  "label": "Type",  "value": "Still Image", 37  "attrs": {  "lang": "en",  "ns": "http://purl.org/dc/terms/type",  "classmap": "dpla:SourceResource",  "property": "dcterms:type"  },  "iri": "http://purl.org/dc/terms/type",  "explain": "A Dublin Core Terms Property; The nature or genre of the resource.; Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]. To describe the file format, physical medium, or dimensions of the resource, use the Format element." }   ],   "thumbnail": "https://iiif.library.ubc.ca/image/cdm.squeezes.1-0050935.0000/full/80,100/0/default.jpg",   "attribution": "Images provided for research and reference use only. 
Permission to publish, copy or otherwise use these images must be obtained from the Digitization Centre: http://digitize.library.ubc.ca/",   "sequences": [ {  "@id": "https://iiif.library.ubc.ca/presentation/cdm.squeezes.1-0050935/sequence/normal",  "@type": "sc:Sequence",  "label": "Default",  "viewingDirection": "left-to-right",  "viewingHint": "paged",  "canvases": [  {  "@id": "https://iiif.library.ubc.ca/presentation/cdm.squeezes.1-0050935/canvas/p0",  "@type": "sc:Canvas",  "label": "Decretum de Minervae Victoriae Sacerdote Temploque (I)",  "height": 5747,  "width": 6188,  "images": [  { 38  "@id": "https://iiif.library.ubc.ca/presentation/cdm.squeezes.1-0050935/annotation/p0000",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource": {  "@id": "https://iiif.library.ubc.ca/image/cdm.squeezes.1-0050935",  "@type": "dctypes:Image",  "format": "image/jpeg",  "height": 5747,  "width": 6188,  "service": {  "@context": "http://iiif.io/api/image/2/context.json",  "@id": "https://iiif.library.ubc.ca/image/cdm.squeezes.1-0050935",  "@profile": "http://iiif.io/api/image/2/level2.json",  "scaleFactors": [  1,  2,  4,  8,  16,  32,  64,  128,  256,  512,  1024  ]  }  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0"  }  ],  "otherContent": [  {  "@id": "http://localhost:3000/annotation_list.json",  "@type": "sc:AnnotationList",  "label": "Text of this page"  } 39  ]  }  ] }   ],   "description": "[No description]",   "@context": "http://iiif.io/api/presentation/2/context.json",   "@id": "http://localhost:3000/epigraphic_manifest_edited.json",   "@type": "sc:Manifest" }       40 Appendix C - Epigraphic Squeeze HTML code  <!DOCTYPE html> <html>   <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> <link rel="stylesheet" type="text/css" href="build/mirador/css/mirador-combined.css"> <title>Mirador Viewer</title> <style type="text/css">  #viewer {  width: 100%;  height: 100%;  position: fixed;  } </style>   </head>   <body> <div id="viewer"></div>  <script src="build/mirador/mirador.js"></script> <script type="text/javascript">   $(function() {  // Called without "let" or "var"  // so we can play with it in the browser  myMiradorInstance = Mirador({  "id": "viewer",  "layout": "1x1",  "data": [  { "manifestUri": "http://localhost:3000/epigraphic_manifest_edited.json", "location": "UBC Library"},  ],  "windowObjects": [],  "annotationEndpoint": { "name":"Local Storage", "module": "LocalStorageEndpoint" },  "sidePanelOptions" : {  "tocTabAvailable": true,  "layersTabAvailable": true, 41  "searchTabAvailable": true,  "annotations" : true  },  });  }); </script>   </body> </html>        42 Appendix D - Epigraphic Squeeze IIIF annotation code  {   "@context": "http://iiif.io/api/presentation/2/context.json",   "@id": "http://localhost:3000/annotation_list.json",   "@type": "sc:AnnotationList",    "resources": [   {  "@id": "anno_01",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>.&hellip;. 
&Alpha;&Upsilon;&Kappa;&Omicron;&Sigma;&Epsilon;&Iota;&Pi;&Epsilon; ....</p>\n<p>[.....ἐ&pi;&epsilon;&sigma;&tau;ά&tau;&epsilon;, &Gamma;&lambda;]&alpha;ῦ&kappa;&omicron;&sigmaf; &epsilon;ἶ&pi;&epsilon;&bull; [&tau;&epsilon;ῖ ]</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=3468,928,1396,246" }, {  "@id": "anno_02",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>.... &Kappa;&Epsilon;&Iota;&Eta;&Iota;&Epsilon;&Rho;&Epsilon;&Alpha;&Nu;&Epsilon;&Alpha;&Gamma;....</p>\n<p>[Ἀ&theta;&epsilon;&nu;&alpha;ί&alpha;&iota; &tau;&epsilon;ῖ &Nu;ί]&kappa;&epsilon;&iota; ἱέ&rho;&epsilon;&alpha;&nu; ἕ ἄ&gamma; [&kappa;&lambda;]</p>"  }, 43  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=2987,1172,1903,212"  },  {  "@id": "anno_03",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>....&Iota;&Epsilon;&Kappa;&Sigma;&Alpha;&Theta;&Epsilon;&Nu;&Alpha;&Iota;&Omicron;&Nu;&Alpha;&Pi;&Alpha;... </p>\n<p>[&epsilon;&rho;&omicron;&mu;έ&nu;&epsilon; &lambda;ά&chi;&epsilon;]&iota; ἐ&chi;&sigmaf; Ἀ&theta;&epsilon;&nu;&alpha;ί&omicron;&nu; ἁ&pi;&alpha;[&sigma;&otilde;]</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=2293,1395,2747,209"  },  {  "@id": "anno_04",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>....&Sigma;&Tau;&Alpha;&Iota;&Kappa;&Alpha;&Iota;&Tau;&Omicron;&Iota;&Epsilon;&Rho;&Omicron;&Nu;&Theta;&Upsilon;&Rho;&Omicron;&Sigma;&Alpha;</p>\n<p>[&nu; &kappa;&alpha;&theta;ί&sigma;&tau;&alpha;]&sigma;&theta;&alpha;&iota; &kappa;&alpha;&iota; &tau;ὸ ἱ&epsilon;&rho;ὸ&nu; &theta;&upsilon;&rho;&otilde;&sigma;&alpha;</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=2027,1608,3339,197"  },  { 44  "@id": "anno_05",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>&Iota;&Kappa;&Alpha;&Theta;&Omicron;&Tau;&Iota;&Alpha;&Nu;&Kappa;&Alpha;&Lambda;&Lambda;&Iota;&Kappa;&Rho;&Alpha;&Tau;&Epsilon;&Sigma;&Chi;&Sigma;&Upsilon;&Gamma;&Gamma;&Rho;&Alpha;&Phi;&Sigma;</p>\n<p>&iota; &kappa;&alpha;&theta;᾽ ὅ &tau;&iota; ἄ&nu; &Kappa;&alpha;&lambda;&lambda;&iota;&kappa;&rho;ά&tau;&epsilon;&sigmaf; &chi;&sigma;&upsilon;&gamma;&gamma;&rho;ά&phi;&sigma;</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=584,1821,4800,203"  },  {  "@id": "anno_06",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>&Epsilon;&Iota;&Alpha;&Pi;&Omicron;&Mu;&Iota;&Sigma;&Theta;&Omicron;&Sigma;&Alpha;&Iota;&Delta;&Epsilon;&Tau;&Omicron;&Sigma;&Pi;&Omicron;&Lambda;&Epsilon;&Tau;&Alpha;&Sigma;&Epsilon;&Pi;&Iota;&Tau;</p>\n<p>&epsilon;&iota;&bull; ἀ&pi;&omicron;&mu;&iota;&sigma;&upsilon;&otilde;&sigma;&alpha;&iota; &delta;ὲ &tau;ὸ&sigmaf; &pi;&omicron;&lambda;&epsilon;&tau;ά&sigmaf; ἐ&pi;ὶ &tau;</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=566,2028,4823,209"  },  {  "@id": "anno_07",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{ 45  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": 
"<p>&Epsilon;&Sigma;&Lambda;&Epsilon;&Omicron;&Nu;&Tau;&Iota;&Delta;&Omicron;&Sigma;&Pi;&Rho;&Upsilon;&Tau;&Alpha;&Nu;&Epsilon;&Iota;&Alpha;&Sigma;&Phi;&Epsilon;&Rho;&Epsilon;&Nu;&Delta;&Epsilon;&Tau;</p>\n<p>ἔ&sigmaf; &Lambda;&epsilon;&omicron;&nu;&tau;ί&delta;&omicron;&sigmaf; &pi;&rho;&upsilon;&tau;&alpha;&nu;&epsilon;ί&alpha;&sigmaf;. &phi;έ&rho;&epsilon;&nu; &delta;ὲ &tau;</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=566,2247,4823,209"  },  {  "@id": "anno_08",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>&Epsilon;&Nu;&Iota;&Epsilon;&Rho;&Epsilon;&Alpha;&Nu;&Pi;&Epsilon;&Nu;&Tau;&Epsilon;&Kappa;&Omicron;&Nu;&Tau;&Alpha;&Delta;&Rho;&Alpha;&Chi;&Mu;&Alpha;&Sigma;&Kappa;&Alpha;&Iota;</p>\n<p>ἐ&nu; ἱ&epsilon;&rho;&epsilon;&alpha;&nu; &pi;&epsilon;&nu;&tau;έ&kappa;&omicron;&nu;&tau;&alpha; &delta;&rho;&alpha;&chi;&mu;ὰ&sigmaf; &kappa;&alpha;ὶ</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=554,2466,4835,197"  },  {  "@id": "anno_09",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>&Tau;&Alpha;&Sigma;&Kappa;&Epsilon;&Lambda;&Epsilon;&Kappa;&Alpha46 ;&Iota;&Tau;&Alpha;&Delta;&Epsilon;&Rho;&Mu;&Alpha;&Tau;&Alpha;&Phi;&Epsilon;&Rho;&Epsilon;&Nu;&Tau;&Omicron;&Nu;&Delta;&Epsilon;</p>\n<p>&tau;ὰ &sigma;&kappa;έ&lambda;&epsilon; &kappa;&alpha;ὶ &tau;ὰ &delta;έ&rho;&mu;&alpha;&tau;&alpha; &phi;έ&rho;&epsilon;&nu; &tau;&otilde;&nu; &delta;&epsilon;</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=565,2677,4817,194"  },  {  "@id": "anno_10",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>&Mu;&Omicron;&Sigma;&Iota;&Omicron;&Nu;&Nu;&Epsilon;&Omicron;&Nu;&Delta;&Epsilon;&Omicron;&Iota;&Kappa;&Omicron;&Delta;&Omicron;&Mu;&Epsilon;&Sigma;&Alpha;&Iota;&Kappa;&Alpha;&Theta;&Omicron;&Tau;&Iota;</p>\n<p>&mu;&omicron;&sigma;ί&omicron;&nu;&bull; &nu;&epsilon;ὸ&nu; &delta;ὲ &omicron;ἰ&kappa;&omicron;&delta;&omicron;&mu;&epsilon;&sigma;&alpha;&iota; &kappa;&alpha;&theta;᾽ὅ &tau;&iota;</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=554,2874,4833,205"  },  {  "@id": "anno_11",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>&Alpha;&Nu;&Kappa;&Alpha;&Lambda;&Lambda;&Iota;&Kappa;&Rho;&Alpha;&Tau;&Epsilon;&Sigma;&Chi;&Sigma;&Theta;&Gamma;&Gamma;&Rho;&Alpha;&Phi;&Sigma;&Epsilon;&Iota;&Kappa;&Alpha;&Iota;&Beta;&Omicron;</p>\n<p>ἀ&nu; &Kappa;&alpha;&lambda;&lambda;&iota;&kappa;&rho;ά&tau;&epsilon;&sigma47 f; &chi;&sigma;&upsilon;&gamma;&gamma;&rho;ά&phi;&sigma;&epsilon;&iota; &kappa;&alpha;ὶ &beta;&omicron;</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=543,3082,4877,205"  },  {  "@id": "anno_12",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>&Mu;&Omicron;&Nu;&Lambda;&Iota;&Theta;&Iota;&Nu;&Omicron;&Nu;</p>\n<p>&mu;ό&nu; &lambda;ί&theta;&iota;&nu;&omicron;&nu;</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=548,3301,1711,188"  },  {  "@id": "anno_13",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": 
"cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>&Epsilon;&Sigma;&Tau;&Iota;&Alpha;&Iota;&Omicron;&Sigma;&Epsilon;&Iota;&Pi;&Epsilon;&Tau;&Rho;&Epsilon;&Sigma;&Alpha;&Nu;&Delta;&Rho;&Alpha;&Sigma;&Epsilon;&Lambda;&Epsilon;&Sigma;&Theta;</p>\n<p>ἑ&sigma;&tau;&iota;&alpha;ῖ&omicron;&sigmaf; &epsilon;ἶ&pi;&epsilon;&bull; &tau;&rho;&epsilon;&sigmaf; ἄ&nu;&delta;&rho;&alpha;&sigma; ἑ&lambda;έ&sigma;&theta;</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=641,3499,4757,216"  },  { 48  "@id": "anno_14",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>&Alpha;&Iota;&Epsilon;&Gamma;&Beta;&Omicron;&Lambda;&Epsilon;&Sigma;&Tau;&Omicron;&Theta;&Tau;&Omicron;&Sigma;&Delta;&Epsilon;&Mu;&Epsilon;&Tau; [. ] &Kappa;&Alpha;&Lambda;&Lambda;&Iota;&Kappa;&Rho;&Alpha;</p>\n<p>&alpha;&iota; ἐ&gamma; &beta;&omicron;&lambda;&epsilon;&sigmaf;&bull; &tau;&omicron;ύ&tau;&omicron;&sigma; &delta;ὲ &mu;&epsilon;&tau;[ὰ] &Kappa;&alpha;&lambda;&lambda;&iota;&kappa;&rho;ά</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=773,3723,4620,188"  },  {  "@id": "anno_15",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>..&Sigma;&Chi;&Sigma;&Theta;&Gamma;&Gamma;&Alpha;&Phi;&Sigma;&Alpha;&Nu;&Tau;&Alpha;&Sigma; &Epsilon;&Pi;&Iota;&Delta;..</p>\n<p>[&tau;&omicron;]&sigmaf; &chi;&sigma;&upsilon;&gamma;&gamma;ά&phi;&sigma;&alpha;&nu;&tau;&alpha;&sigmaf; ἐ&pi;[&iota;&delta;&epsilon;ῖ&chi;&sigma;&alpha;&iota; &tau;&epsilon;]</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=850,3926,3059,205"  },  {  "@id": "anno_16",  "@type": "oa:Annotation",  "motivation": "sc:painting",  "resource":{ 49  "@type": "cnt:ContentAsText",  "format": "text/plain",  "chars": "<p>....&Epsilon;&Iota;&Kappa;&Alpha;&Theta;&Omicron;&Tau;&Iota;&Alpha;&Pi;&Omicron;&Mu;.....</p>\n<p>[&iota; &beta;&omicron;&lambda;]&epsilon;&iota; &kappa;&alpha;&theta;᾽ὅ &tau;&iota; ἀ&pi;&omicron;&mu;[&iota;&sigma;&theta;&omicron;&theta;έ&sigma;&epsilon;&tau;&alpha;&iota; . .]</p>"  },  "on": "https://iiif.library.ubc.ca/cdm.squeezes.1-0050935/canvas/p0#xywh=1255,4140,1848,216"  }    ] }   50 
