Open Collections
API Documentation
The Open Collections API
Here you'll find references and tutorials to help you get started using the Open Collections API to access data directly from our system. If you have any questions or feedback, or want to share work that you've done using the API, please get in touch!
Quick Start
Know your way around an API and want to dive right in? Here are the essentials:
- https://oc2-index.library.ubc.ca is the base URL of the Open Collections API.
- Register for an API key and append &api_key=[:YOUR_API_KEY] (where [:YOUR_API_KEY] is the key you were issued) to any requests you make.
- Use the Open Collections REST API to request item and collection metadata and search the collections.
- Review the API Reference section for a full list of search endpoints, collection nicknames, metadata fields, and output formats.
- Use the Open Collections IIIF Endpoint to request images for many items in the collections or to open Open Collections content in other IIIF-compatible viewers.
- Please review and comply with our terms of use.
API Basics
What is an API?
At its most basic, an API, or Application Programming Interface, is made up of a set of defined methods that someone can use to communicate with an (often complex) software system, and get back responses in a way that a computer (and, with some practice, a human) can understand.
In essence, an API defines the 'language' a system speaks. Like a language, it has its own vocabulary with terms that have special meanings (e.g., property names and labels), grammar (how those property names and labels are arranged collectively, its schema), and syntax (i.e., how the information itself is arranged). Like a language, you can use it to ask questions and understand responses. And like a language, with a little practice, it can be a powerful and extensible tool for communication.
A request is a URL sent to the web server over HTTP with the expectation of getting resource items back in the form of human-readable text or data. The URL supplies the web server with everything it needs to create and return a correct response. This is called a RESTful approach to API design and is employed by the Open Collections API.
How do I use it?
- Register an API Key
- Construct a query to request item and collection metadata or search results
- Use your web browser, a browser plugin like Postman, or your custom code to make your request
- Read the response
API Keys
To control abuse, users will need to register their email address to receive an API Key (much like DPLA).
Users are also constrained to no more than 200 requests per minute.
Register an API Key
Name | API Key | Rate Limit |
---|---|---|
Public API Key | ac40e6c2cb345593ed1691e0a8b601bba398e42d85f81f893c5ab709cec63c6c | 10 req/min |
Your API Key | You haven't registered an API key. | 200 req/min |
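To stay under the per-minute limits above, a client can space its requests out. The following is a minimal sketch of one way to do that, not something the API itself provides; the class name `MinIntervalThrottle` and its injectable `clock` and `sleep` arguments are hypothetical choices made for illustration and testing.

```python
import time


class MinIntervalThrottle:
    """Space calls at least 60 / max_per_minute seconds apart.

    Hypothetical helper: keeps the long-run request rate at or below
    max_per_minute (e.g. 200 for a registered key, 10 for the public key).
    """

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each request; with `MinIntervalThrottle(200)` this leaves at least 0.3 seconds between requests.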
Constructing a Request
Most data from the Open Collections API can be accessed with simple HTTP GET requests. These types of requests look essentially like the URL of a regular web page, but return data objects rather than HTML markup. These URLs can have two parts:
- A URL or "endpoint" that indicates where to direct the query. The Open Collections API base URL is: https://oc2-index.library.ubc.ca. See below for a full list of available endpoints.
- A query string consisting of parameters that describe the data being requested. This is used for search requests.
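The two parts described above can be assembled with Python's standard library. This is a minimal sketch; the helper name `build_url` is hypothetical, and the `api_key` parameter name follows the Quick Start section.

```python
from urllib.parse import urlencode

BASE_URL = "https://oc2-index.library.ubc.ca"


def build_url(path, params=None, api_key=None):
    """Join the base URL, an endpoint path, and an optional query string."""
    query = dict(params or {})
    if api_key:
        # Parameter name taken from the Quick Start section.
        query["api_key"] = api_key
    url = BASE_URL + "/" + path.strip("/")
    if query:
        url += "?" + urlencode(query)
    return url
```

For example, `build_url("collections/arkley/items", {"limit": 10, "offset": 0})` produces a paginated collection-items request URL.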
Parsing A Response
The Open Collections API returns responses in JSON format by default, and some endpoints can return other formats on request. See the API Reference section for more details.
Tip: Browser extensions such as the Chrome JSON Formatter can make viewing and working with JSON in your browser much easier.
Tip: JSON stores data in 'objects' that do not always translate easily into tabular formats, but tools such as this JSON to CSV converter or Open Refine can help you get API data into a spreadsheet or table if that's what you need.
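If you would rather convert JSON to a table in code, nested objects can be flattened into dotted column names first. The sketch below is one possible approach using only the standard library; the function names are hypothetical, and it makes no assumption about the exact shape of Open Collections responses.

```python
import csv
import io


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Drop the trailing dot that was added for the next nesting level.
        flat[prefix[:-1]] = obj
    return flat


def to_csv(records):
    """Render a list of JSON-like records as CSV text."""
    rows = [flatten(record) for record in records]
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buffer.getvalue()
```

Records with different sets of keys still share one header row, which is usually what a spreadsheet import expects.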
Using the Open Collections API
Metadata Endpoints
Collection and item metadata is requested from the Open Collections API by querying specifically structured endpoints. All metadata endpoints begin with the API base URL, https://oc2-index.library.ubc.ca, followed by a specific path for the collection and item being requested.
Collections
A collection is a reference to a set of items indexed in Open Collections. Collections have two RESTful endpoints associated with them: a /collections/ endpoint that returns information about the collection, and a /collections/.../items/ endpoint that lists the contents of the collection.
Collection Metadata
The collection metadata endpoints return data describing the contents of a collection.
The collections endpoints are constructed as follows, where [:name] is the nickname of a collection in Open Collections:
https://oc2-index.library.ubc.ca/collections/[:name]
For example, the following endpoint returns the metadata associated with the Berkeley Posters Collection:
https://oc2-index.library.ubc.ca/collections/berkpost
Mappings for all collection nicknames are available here.
Collection Items
The collection items endpoints return a list of all items in a collection. There are two types of collection items endpoints: one returns a simple list of all items in a collection (IDs only), and the other returns a paginated list of all items in the collection with a richer subset of metadata.
The endpoint that returns a full list of items is constructed as follows, where [:name] is the nickname of the collection:
https://oc2-index.library.ubc.ca/collections/[:name]/items
For example, the following returns a list of all items in the Arkley collection:
https://oc2-index.library.ubc.ca/collections/arkley/items
To return a paginated list of items with richer metadata, add the limit and offset parameters as in the example below, where [:limit] is the number of items to return per page, and [:offset] is the index of the first item on the page (for example, with a page size of 10, the second page would start at offset 10).
https://oc2-index.library.ubc.ca/collections/[:name]/items?limit=[:limit]&offset=[:offset]
For example, the following returns the first ten items from the Arkley collection:
https://oc2-index.library.ubc.ca/collections/arkley/items?limit=10&offset=0
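Walking through a whole collection means requesting one page after another with an increasing offset. The sketch below generates those page URLs; the function name `page_urls` is hypothetical, and it assumes you already know the item count (the API Reference lists a `_total` endpoint for that).

```python
from urllib.parse import urlencode

BASE_URL = "https://oc2-index.library.ubc.ca"


def page_urls(name, total, limit):
    """Yield one collection-items URL per page, covering `total` items."""
    for offset in range(0, total, limit):
        query = urlencode({"limit": limit, "offset": offset})
        yield f"{BASE_URL}/collections/{name}/items?{query}"
```

With 25 items and a page size of 10, this yields three URLs with offsets 0, 10, and 20.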
Item Metadata
An "item" in Open Collections is typically a single piece of content indexed in the system. The content can be, for example, a book, an image, or a video, and in some cases can have multiple components (e.g., a PDF and an audio file). The /items/[:itemId] endpoint returns data describing a single item.
The following request returns the metadata for a specific item, where [:name] is the nickname of the item's parent collection, and [:itemId] is the item's unique identifier:
https://oc2-index.library.ubc.ca/collections/[:name]/items/[:itemId]
For example, the following returns the metadata from a single item in the Arkley collection:
https://oc2-index.library.ubc.ca/collections/arkley/items/1.0013125
Tip: You can grab the requisite [:name] and [:itemId] values from the /collections/.../items endpoint.
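The three metadata endpoint patterns described in this section can be captured as small URL builders. This is an illustrative sketch; the function names are hypothetical, and percent-encoding the path segments is a defensive choice rather than a documented requirement.

```python
from urllib.parse import quote

BASE_URL = "https://oc2-index.library.ubc.ca"


def collection_url(name):
    """URL for a collection's metadata."""
    return f"{BASE_URL}/collections/{quote(name, safe='')}"


def collection_items_url(name):
    """URL for the full list of items in a collection."""
    return collection_url(name) + "/items"


def item_url(name, item_id):
    """URL for the metadata of a single item in a collection."""
    return f"{collection_items_url(name)}/{quote(item_id, safe='')}"
```

For example, `item_url("arkley", "1.0013125")` reproduces the item request shown above.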
Search Endpoint
The Open Collections /search/[:api_version]/ endpoint returns search results and metadata from the system's ElasticSearch index, where [:api_version] is the version of ElasticSearch syntax to use (currently 7.5.1). The endpoint supports GET requests with a limited subset of parameters, and gives more complete access to the full ElasticSearch feature set via POST requests (see Advanced Usage).
Basic Searching (GET Requests)
This endpoint returns a basic set of search results, where [:query] is your query string. The endpoint supports basic strings and Lucene query syntax:
https://oc2-index.library.ubc.ca/search/7.5.1?q=[:query]
For example, the following will submit a search for the term "cat":
https://oc2-index.library.ubc.ca/search/7.5.1?q=cat
To limit your search to a specific collection, you can use the 'index' parameter where [:index] is the nickname of the collection or collections you would like to search (multiple collection nicknames can be combined with commas):
https://oc2-index.library.ubc.ca/search/7.5.1?q=[:query]&index=[:index]
For example, the following will submit a search for the term "cat" in the Tremaine Arkley Croquet Collection and The Alice Arm and Anyox Herald newspaper collection:
https://oc2-index.library.ubc.ca/search/7.5.1?q=cat&index=arkley,aaah
To limit your search to a specific metadata value, you can use the 'term' parameter where [:field] is the name of the metadata field, and [:term] is the value you would like to match. Multiple fields and terms can be specified, separated by semicolons:
https://oc2-index.library.ubc.ca/search/7.5.1?q=[:query]&term=[:field],[:term];[:field],[:term]
For example, the following will submit a search for the term "cat" where "Type" is "Still Image" and "creator" is "Kull, Bob". Note that adding ".raw" to the field name enforces an exact match, and that the values are URI-encoded to prevent errors:
https://oc2-index.library.ubc.ca/search/7.5.1?q=cat&term=creator.raw,Kull%2C%20Bob;type.raw,Still%20Image
See the full list of supported parameters in the API Reference or at https://oc2-index.library.ubc.ca/search/help. Note that parameter values should be URI-encoded to avoid errors.
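Because the commas and semicolons in the index and term parameters act as separators, each value is best encoded on its own before the pieces are joined. The sketch below shows one way to build such a GET search URL; the function name `search_url` and its argument layout are hypothetical.

```python
from urllib.parse import quote

SEARCH_URL = "https://oc2-index.library.ubc.ca/search/7.5.1"


def search_url(query, index=None, terms=None):
    """Build a GET search URL.

    index: optional list of collection nicknames.
    terms: optional list of (field, value) pairs for the 'term' parameter.
    """
    parts = ["q=" + quote(query, safe="")]
    if index:
        parts.append("index=" + ",".join(quote(name, safe="") for name in index))
    if terms:
        # Encode each field and value separately so that the separator
        # characters (comma, semicolon) stay literal in the URL.
        pairs = [
            quote(field, safe="") + "," + quote(value, safe="")
            for field, value in terms
        ]
        parts.append("term=" + ";".join(pairs))
    return SEARCH_URL + "?" + "&".join(parts)
```

Calling `search_url("cat", terms=[("creator.raw", "Kull, Bob"), ("type.raw", "Still Image")])` reproduces the encoded example URL above.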
Advanced Usage
For more advanced use cases, the Open Collections Search API provides access to the full power of its underlying ElasticSearch index, including aggregations (useful for data visualization and analysis), advanced searching and filtering, geolocation, and much more (everything except custom script fields).
ElasticSearch version 7.5.1 Query DSL objects can be submitted to the /search/7.5.1/ endpoint via POST request and can be combined with the Open Collections search URI parameters. Query DSL object values will take precedence over URI parameter values.
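As an illustration of the POST usage described above, the sketch below prepares (but does not send) a request carrying a small Query DSL object. Several details are assumptions: the helper name `build_search_post` is hypothetical, the JSON `Content-Type` header is assumed rather than documented here, and the aggregation on `type.raw` assumes that field can be aggregated.

```python
import json
from urllib.parse import quote
from urllib.request import Request

SEARCH_URL = "https://oc2-index.library.ubc.ca/search/7.5.1"


def build_search_post(dsl, api_key=None):
    """Prepare a POST request with a Query DSL body (not sent here)."""
    url = SEARCH_URL
    if api_key:
        # Parameter name taken from the Quick Start section.
        url += "?api_key=" + quote(api_key, safe="")
    return Request(
        url,
        data=json.dumps(dsl).encode("utf-8"),
        # Assumption: the endpoint expects a JSON request body.
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# A query_string query plus a terms aggregation, with no hits returned.
example_dsl = {
    "size": 0,
    "query": {"query_string": {"query": "cat"}},
    "aggs": {"by_type": {"terms": {"field": "type.raw"}}},
}
```

To send the prepared request, pass it to `urllib.request.urlopen` and parse the response body with `json.loads`.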
Learn more about the ElasticSearch API:
API Reference
API Endpoints
The following endpoints are available for use:
Method | Endpoint | Description | Status |
---|---|---|---|
GET | /collections | Returns a list of all collections | Online |
GET | /collections/:collection_identifier | Returns the metadata for a specific collection | Online |
GET | /collections/:collection_identifier/download/collection | Returns a GZIP download of a specific collection's metadata | Online |
GET | /collections/:collection_identifier/items | Returns a list of all items for a specific collection | Online |
GET | /collections/:collection_identifier/items/:item_identifier | Returns the metadata for a specific collection's single item | Online |
GET | /collections/:collection_identifier/_total | Returns the item count of a specific collection | Online |
GET | /search/help | Returns list of accepted OC Search API query parameters. | Online |
GET | /search/7.5.1 | Accepts OC Search API parameters, returns search results. | Online |
POST | /search/7.5.1 | Accepts ElasticSearch 7.5.1 Query DSL objects, returns search results, aggregation data, and more. | Online |
Collections
Collection ID | Collection Name |
---|---|
59371 | American Physical Society Northwestern Section Annual Meeting (APS-NW) (11th : 2009) |
artefacts | Ancient Artefacts |
mccormick | Andrew McCormick Maps and Prints |
auce | Association of University and College Employees (AUCE) fonds |
bcbooks | BC Historical Books |
bcdocs | BC Historical Documents |
bcsessional | BC Sessional Papers |
48630 | BIRS Workshop Lecture Videos |
berkpost | Berkeley 1968-1973 Poster Collection |
biblos | Biblos |
bch | British Columbia History |
59367 | British Columbia Mine Reclamation Symposium |
ccmms | CCS - Chinese Canadian Military Museum Society |
chockon | CCS - Chock On Fong Fonds |
gri | CCS - Gorsebrook Research Institute for Atlantic Studies |
loktin | CCS - Henry Lok-Tin Lee |
instrcc | CCS - INSTRCC |
kcca | CCS - Kamloops Chinese Cultural Association |
louie | CCS - Louie Papers |
normankwong | CCS - Norman Kwong Collection |
bicklee | CCS - Ron Bick Lee Fonds |
wahshun | CCS - Wah Shun Company |
59370 | Canadian Summer School on Quantum Information (CSSQI) (10th : 2010) |
capilano | Capilano Timber Company Fonds |
darwin | Charles Darwin Letters |
chineserare | Chinese Rare Books |
chungobject | Chung Objects |
chungosgr | Chung Oversize and Graphic Materials |
chungphotos | Chung Photographs |
chungpub | Chung Published Works |
chungtext | Chung Textual Materials |
citraudio | CiTR Audiotapes |
59404 | Congress of the Humanities and Social Sciences (77th : 2008) |
46624 | Consortium for Nursing History Inquiry |
cg | Creative Giving |
davidconde | David Conde Fonds |
davidsonia | Davidsonia |
delgamuukw | Delgamuukw Trial Transcripts |
dhimjournal | Digital Himalaya Journals |
55474 | Digital Library Federation (DLF) (2015) |
discorder | Discorder |
dorothyburn | Dorothy Burnett Bookbinding Tools |
djcox | Doug and Joyce Cox Research Collection |
24 | Electronic Theses and Dissertations (ETDs) 2008+ |
ecrosby | Emma Crosby Letters |
squeezes | Epigraphic Squeezes |
etheljohns | Ethel Johns Fonds |
focus | FOCUS |
52383 | Faculty Research and Publications |
first100theses | First Hundred Theses |
fisherman | Fisherman Publishing Society Collection |
florence | Florence Nightingale Letters |
germanconsulate | German Consulate fonds |
42591 | Graduate Research [non-thesis] |
gvrdmaps | Greater Vancouver Regional District Planning Department Land Use Maps |
bullock | H. Bullock-Webster fonds |
stravinsky | H. Colin Slim Stravinsky Collection |
hawthorn | Hawthorn Fly Fishing & Angling Collection |
hdoyle | Henry Doyle Fonds |
childrenlit | Historical Children's Literature Collection |
nursing | History of Nursing |
iel | Indian Education Newsletter |
feeders | Infant Feeders Collection |
59405 | Insights from Nuclear Magnetic Resonance (NMR) : A symposium in honour of Myer Bloom (1993) |
ifcsm | Interim Forest Cover Series Maps |
59585 | International Conference of Institutes and Libraries for Chinese Overseas Studies (WCILCOS) (5th : 2012) |
53032 | International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP) (12th : 2015) |
52657 | International Conference on Engineering Education for Sustainable Development (EESD) (7th : 2015) |
59278 | International Conference on Gas Hydrates (ICGH) (6th : 2008) |
53926 | International Conference on Health Promoting Universities and Colleges (7th : 2015) |
52660 | International Construction Specialty Conference of the Canadian Society for Civil Engineering (ICSC) (5th : 2015) |
62152 | International Institute for Critical Studies in Improvisation (IICSI) Colloquium |
59406 | Investigating Our Practices (IOP) |
67657 | Irving K. Barber Learning Centre Events |
jphotos | Japanese Canadian Photograph Collection |
tokugawa | Japanese Maps of the Tokugawa Era |
johnkeenlyside | John Keenlyside Legal Research Collection |
kinesis | Kinesis |
holland | Laura Holland Fonds |
67247 | Leader in Residence (Okanagan) |
33850 | Library Awards |
67246 | Library History and Events |
494 | Library Staff Publications and Research |
creelman | Lyle Creelman Fonds |
macmillan | MacMillan Bloedel Limited fonds |
meiji150 | Meiji at 150 |
59369 | Metropolis British Columbia Policy Research Symposium (MBC) (2008) |
upubmisc | Miscellaneous Documents |
42446 | Multidisciplinary Undergraduate Research Conference (MURC) |
60499 | NEXUS Spring Institute |
agassiz | Newspapers - Agassiz Record |
xalberniadv | Newspapers - Alberni Advocate |
armstrongad | Newspapers - Armstrong Advance |
arlaadvo | Newspapers - Arrow Lake Advocate |
bcln | Newspapers - BC Labor News (Vancouver) |
bclumber | Newspapers - BC Lumberman (Vancouver) |
bctu | Newspapers - BC Trades Unionist (Vancouver) |
xbellacoo | Newspapers - Bella Coola Courier |
bensun | Newspapers - Bennett Sun |
bcfed | Newspapers - British Columbia Federationist |
bcnews | Newspapers - British Columbia News |
xbcrecord | Newspapers - British Columbia Record |
bcret | Newspapers - British Columbia Retailer |
bct | Newspapers - British Columbia Tribune (Yale) |
brooklynnews | Newspapers - Brooklyn News |
cflacla | Newspapers - Canadian Farmer Labor Advocate and Canadian Labor Advocate |
canford | Newspapers - Canford Radium |
cascade | Newspapers - Cascade Record |
cassiarnews | Newspapers - Cassiar News (Stewart) |
chasetrib | Newspapers - Chase Tribune |
chilliwackfp | Newspapers - Chilliwack Free Press |
citizennw | Newspapers - Citizen - New Westminister |
cwhustler | Newspapers - Clanwilliam Hustler |
coalmont | Newspapers - Coalmont Courier |
coasmine | Newspapers - Coast Miner (Van Anda) |
columbiarev | Newspapers - Columbia Review |
courtenayrev | Newspapers - Courtenay Review |
cwn | Newspapers - Courtenay Weekly News |
cranherald | Newspapers - Cranbrook Herald |
croftongaz | Newspapers - Crofton Gazette and Cowichan News |
cumberlandis | Newspapers - Cumberland Islander |
dbc | Newspapers - Daily British Columbian |
xdbr | Newspapers - Daily Building Record |
enterprise | Newspapers - Duncan Enterprise |
eastkootmine | Newspapers - East Kootenay Miner (Golden) |
echo | Newspapers - Echo (Duncan) |
epnoh | Newspapers - Enderby Progress and Northern Okanagan Herald |
eveningt | Newspapers - Evening Telegraph (Victoria) |
evewoross | Newspapers - Evening World (Rossland) |
fgherald | Newspapers - Fort George Herald |
fraseradvanc | Newspapers - Fraser Advance (Chilliwack) |
gcdb | Newspapers - General Conference Daily Bulletin (Victoria) |
glennews | Newspapers - Glenora News |
goldenera | Newspapers - Golden Era |
goldentimes | Newspapers - Golden Times |
gvchinook | Newspapers - Greater Vancouver Chinook |
hqueek | Newspapers - Hazelton Queek |
xhotsprings | Newspapers - Hot Springs News (Ainsworth) |
htbva | Newspapers - Hubert Times and Bulkley Valley Advertiser |
indworld | Newspapers - Industrial World (Rossland) |
kwawa | Newspapers - Kamloops Wawa |
xkelownarec | Newspapers - Kelowna Record and The Orchard City Record |
kerechro | Newspapers - Keremeos Chronicle |
kootstar | Newspapers - Kootenay Star (Revelstoke) |
laborstar | Newspapers - Labor Star (Vancouver) |
ladysmithl | Newspapers - Ladysmith Leader |
ladysmithr | Newspapers - Ladysmith Recorder |
ladysmithsi | Newspapers - Ladysmith Signal |
ladysmithst | Newspapers - Ladysmith Standard |
ardeau | Newspapers - Lardeau Eagle (Ferguson) |
lardeaum | Newspapers - Lardeau Mining Review (Trout Lake) |
leadera | Newspapers - Leader Advocate (Vancouver) |
locla | Newspapers - Lowery's Claim |
mherald | Newspapers - Mail Herald (Revelstoke) |
michelr | Newspapers - Michel Recorder |
morninglnw | Newspapers - Morning Ledger - New Westminster |
mpadvocate | Newspapers - Mount Pleasant Advocate (Vancouver) |
ndaymine | Newspapers - Nelson Daily Miner |
nelsondaily | Newspapers - Nelson Daily News |
nwminer | Newspapers - Nelson Weekly Miner |
nwdn | Newspapers - New Westminster Daily News |
newestimes | Newspapers - New Westminster Times |
nicoheral | Newspapers - Nicola Herald |
norcoa | Newspapers - North Coast (Port Simpson) |
omr | Newspapers - Okanagan Mining Review |
omineca | Newspapers - Omineca Herald (Hazelton) |
ominecaminer | Newspapers - Omineca Miner (Hazelton) |
paccannw | Newspapers - Pacific Canadian (New Westminster) |
paystreak | Newspapers - Paystreak (Sandon) |
peloyalist | Newspapers - Port Essington Loyalist |
pmgazette | Newspapers - Port Moody Gazette |
prj | Newspapers - Prince Rupert Journal |
princero | Newspapers - Prince Rupert Optimist |
pwv | Newspapers - Progress and Week – Victoria |
cranbrookpro | Newspapers - Prospector (Cranbrook) |
proslill | Newspapers - Prospector (Lillooet) |
prossross | Newspapers - Prospector (Rossland) |
qcminer | Newspapers - Quartz Creek Miner |
qcislander | Newspapers - Queen Charlotte Islander |
redflag | Newspapers - Red Flag (Vancouver) |
xrevherald | Newspapers - Revelstoke Herald |
sfjcbce | Newspapers - San Francisco Journal of Commerce B.C. Edition (Victoria) |
satworld | Newspapers - Saturday World (Rossland) |
silsil | Newspapers - Silverton Silvertonian |
similkameen | Newspapers - Similkameen Star |
slodrill | Newspapers - Slocan Drill |
smreview | Newspapers - Slocan Mining Review |
slocanp | Newspapers - Slocan Prospector |
slorec | Newspapers - Slocan Record |
surreytimes | Newspapers - Surrey Times (Cloverdale) |
xabpost | Newspapers - The Abbotsford Post |
advance | Newspapers - The Advance (Midway) |
aaah | Newspapers - The Alice Arm and Anyox Herald |
xanaconda | Newspapers - The Anaconda News |
xatlin | Newspapers - The Atlin Claim |
xboundarycr | Newspapers - The Boundary Creek Times |
xcariboosen | Newspapers - The Cariboo Sentinel (Barkerville) |
xcoastnews | Newspapers - The Coast News (Gibsons) |
xcrestonrev | Newspapers - The Creston Review |
xcumberland | Newspapers - The Cumberland News |
dcanadi | Newspapers - The Daily Canadian |
xdailyledg | Newspapers - The Daily Ledger (Ladysmith) |
daytele | Newspapers - The Daily Telegram |
deltnews | Newspapers - The Delta News |
delttime | Newspapers - The Delta Times |
despatch | Newspapers - The Despatch (Morrissey) |
disledfer | Newspapers - The District Ledger (Fernie) |
xenderby | Newspapers - The Enderby Press and Walker's Weekly |
evenkoot | Newspapers - The Evening Kootenaian |
expressnv | Newspapers - The Express (North Vancouver) |
fernieled | Newspapers - The Fernie Ledger (Fernie) |
gfminer | Newspapers - The Grand Forks Miner |
xgrandforks | Newspapers - The Grand Forks Sun |
greemine | Newspapers - The Greenwood Miner |
xhedley | Newspapers - The Hedley Gazette |
htimes | Newspapers - The Hosmer Times |
xindependen | Newspapers - The Independent (Vancouver) |
koolib | Newspapers - The Kootenay Liberal |
xkootmail | Newspapers - The Kootenay Mail |
ledgefern | Newspapers - The Ledge (Fernie) |
xledgreen | Newspapers - The Ledge (Greenwood) |
ledge | Newspapers - The Ledge (Nakusp) |
ledgenel | Newspapers - The Ledge (Nelson) |
xnakledge | Newspapers - The Ledge (New Denver) |
lilladva | Newspapers - The Lillooet Advance |
marytrib | Newspapers - The Marysville Tribune |
xmassett | Newspapers - The Massett Leader |
xminer | Newspapers - The Miner (Nelson) |
xminingrev | Newspapers - The Mining Review (Sandon) |
misscity | Newspapers - The Mission City News |
mmention | Newspapers - The Morrissey Mention |
mminer | Newspapers - The Morrissey Miner |
xmoyie | Newspapers - The Moyie Leader |
nanacour | Newspapers - The Nanaimo Courier |
nanamail | Newspapers - The Nanaimo Mail |
xnelsonecon | Newspapers - The Nelson Economist |
xnicola | Newspapers - The Nicola Valley News (Merritt) |
thenugget | Newspapers - The Nugget |
xpentimes | Newspapers - The Peninsula Times |
penpress | Newspapers - The Penticton Press |
xphoenix | Newspapers - The Phoenix Pioneer |
xprospector | Newspapers - The Prospector (Fort Steele) |
thestar | Newspapers - The Star (Port Essington) |
thesun | Newspapers - The Sun (Port Essington) |
xtribune | Newspapers - The Tribune (Nelson) |
vancouverw | Newspapers - The Vancouver Weekly Herald and North Pacific News |
thewave | Newspapers - The Wave (Victoria) |
xwestcall | Newspapers - The Western Call (Vancouver) |
truthd | Newspapers - Truth - Donald |
vslp | Newspapers - Valley Sentinel - Langley Prairie |
vanad | Newspapers - Vancouver Advertiser |
vanbuildrec | Newspapers - Vancouver Building Record |
beaverdell | Newspapers - West Forks News (Beaverdell) |
wclarion | Newspapers - Western Clarion (Vancouver) |
westho | Newspapers - Westward Ho! |
ymirherald | Newspapers - Ymir Herald |
ymirminer | Newspapers - Ymir Miner |
ymirmirror | Newspapers - Ymir Mirror |
ohs | Okanagan Historical Society Reports |
hundred | One Hundred Poets | 百人一首 |
67656 | Open Access Week |
prism | PRISM international |
pedestal | Pedestal |
anderson | Peter Anderson fonds |
presrep | Presidents’ Reports |
75346 | Psychology Undergraduate Research Conference (PURC) |
mathison | R. Mathison Collection |
bookplate | RBSC Bookplates |
rainbow | Rainbow Ranche Collection |
libsenrep | Report of the University Librarian to the Senate |
831 | Retrospective Theses and Dissertations, 1919-2007 |
rosetti | Rosetti Studios - Stanley Park Collection |
royalfisk | Royal Fisk Gold Rush Letters |
saga | SAGA Document Collection |
310 | SCARP Graduating Projects |
76471 | Sawchen Series |
51869 | Science One Research Projects |
senmin | Senate Minutes |
33426 | Supplementary Thesis Materials and Errata |
59374 | Symposium on Early Modern Japanese Values and Individuality (2013) |
51833 | TRIUMF Reports |
59368 | Tailings and Mine Waste Conference |
tairikunipp | Tairiku Nippō (Continental Daily News) |
bcreports | The British Columbia Reports |
alumchron | The Graduate Chronicle/The UBC Alumni Chronicle/Trek |
touchpoints | Touchpoints |
tgdp | Traité général des pesches |
arkley | Tremaine Arkley Croquet Collection |
archivesav | UBC Archives Audio Recordings Collection |
arphotos | UBC Archives Photograph Collection |
25332 | UBC Authors and Their Works Program, 1991-2006 |
calendars | UBC Calendars |
52387 | UBC Community and Partner Publications |
ubccong | UBC Congregation Video Collection |
31776 | UBC Faculty Publications Lists (1928-1969) |
31775 | UBC Historical Sound and Moving Image Collection |
ubchist | UBC History |
fisheries | UBC Institute of Fisheries Field Records |
43377 | UBC Japanese Canadian Students of 1942 |
67634 | UBC Lectures, Seminars, and Symposia |
ubcavfrc | UBC Legacy Video Collection |
specialp | UBC Library Digitization Centre Special Projects |
framed | UBC Library Framed Works Collection |
ubclibnews | UBC Library News |
ubclsb | UBC Library Staff Bulletin |
ubcmedicine | UBC Medicine |
53169 | UBC President's Speeches and Writings |
641 | UBC Press Publications, Supplements, and Catalogues |
ubcreports | UBC Reports |
researchdata | UBC Research Data |
66428 | UBC Social Ecological Economic Development Studies (SEEDS) Student Reports (Graduate) |
18861 | UBC Social Ecological Economic Development Studies (SEEDS) Student Reports (Undergraduate) |
ubcstuhan | UBC Student Handbooks |
ubctp | UBC Theatre Programmes |
ubcyearb | UBC Yearbooks |
ubysseynews | Ubyssey |
52966 | Undergraduate Research |
langmann | Uno Langmann Family Collection of British Columbia Photographs |
12708 | Vancouver Institute Lectures |
vma | Vancouver Medical Association |
wwposters | WWI & WWII Posters |
73804 | West Coast Conference on Formal Linguistics (WCCFL) (38th : 2020) |
manuscripts | Western Manuscripts and Early Printed Books |
westland | Westland |
58233 | Workshop for Instruction in Library Use (WILU) (45th : 2016) |
70440 | World Sanskrit Conference (WSC) (17th : 2018) |
wwiphoto | World War I British press photograph collection |
yipsang | Yip Sang Collection |
63300 | Ziegler Series |
33381 | cIRcle License Text |
32457 | iSchool (Library, Archival and Information Studies) Research Days |
the432 | 432 |
Fields
To learn more about our metadata terms, please see this page.
Search Parameters
The Open Collections https://oc2-index.library.ubc.ca/search/7.5.1 endpoint accepts the following parameters in GET requests:
Parameter | Description | Default Value | Notes |
---|---|---|---|
q | Query string | * | Accepts a basic string or boolean query (Lucene query syntax). Should be URI-encoded. |
size | Number of results to return | 20 | |
from | Index of first result | 0 | Used to paginate through results (e.g., use '19' to return results starting with the 20th hit). |
index | Search index | oc | Used to limit the search to specific collections (use the collection 'nickname'). 'oc' searches all content. Supports multiple values (comma-separated). |
sort | How to sort results | _score,desc | Comma-separated: [:field],[:order]. Accepts '_score' and metadata field names, and 'asc'/'desc'. |
term | Filter by metadata field contents, must match all | | Comma-separated: field,filter_on. Supports multiple value pairs separated by semicolons. Use [:field].raw to match the full text of a field value. |
terms | Filter by metadata field contents, match any in a given field | | Comma-separated: field,filter_on. Supports multiple value pairs separated by semicolons. Use [:field].raw to match the full text of a field value. |
range | Limit to a range of values for a metadata field | | Comma-separated: field,gte,lte,date_format. date_format is optional; if not specified, the provided date will be matched against all available formats until a match is found. Only works on numeric fields (e.g., sortdate accepts milliseconds). Accepted date formats are: yyyy-mm-dd, yyyy-mm, y, date_time_no_millis, date_hour_minute_second, epoch_millis. |
source | Array of field names to return | | Comma-separated. If omitted, a default set of metadata will be returned. |
key | API key | | Requests without an API key will be rate limited. |
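The size and from parameters together give page-by-page access to search results. The sketch below converts a 1-based page number into the matching from value and adds the sort and source parameters; the function name `search_page_url` is hypothetical, and any field names passed in `source` are the caller's responsibility.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "https://oc2-index.library.ubc.ca/search/7.5.1"


def search_page_url(query, page, size=20, sort=("_score", "desc"), source=None):
    """Build a GET search URL for a 1-based page number."""
    params = {
        "q": query,
        "size": size,
        "from": (page - 1) * size,  # index of the first result on the page
        "sort": ",".join(sort),
    }
    if source:
        params["source"] = ",".join(source)
    # Keep commas literal (they separate values) and encode spaces as %20.
    return SEARCH_URL + "?" + urlencode(params, safe=",", quote_via=quote)
```

With the default page size of 20, page 2 starts at from=20, i.e. the 21st hit.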
Output Formats
Responses from the API are returned in JSON by default; however, a number of other formats are available, which can be requested as described below.
GET Requests
For GET requests you should replace [:outputFormat] in the URL below with the format you want returned:
https://oc2-index.library.ubc.ca/collections/[:name]/items/[:itemId]/output-format/[:outputFormat]
POST Requests
For POST requests you should set the HTTP Accept header to the format you want returned.
Available Formats
Description | GET Value | POST Value |
---|---|---|
JSON conforming to the UBC metadata manual | json | application/json |
JSON-LD (with UBC property tags) | ubc+json | application/json |
JSON-LD | ld | application/json |
JSON-LD | ld+json | application/json |
RDF/XML from JSON-LD (deprecated) | ld+rdf | application/xml |
RDF+JSON (RDF in JSON) | rdf+json | application/xml |
RDF/XML | rdf | application/xml |
RDF/XML | xml | application/xml |
XML stored at DataCite (the metadata attached to the document DOI) | datacite | application/xml |
Turtle stored at DataCite (the metadata attached to the document DOI) | dx | text/plain |
Turtle stored at DataCite (the metadata attached to the document DOI) | dx+turtle | text/plain |
BibTeX stored at DataCite (the metadata attached to the document DOI) | dx+bibtex | text/plain |
RIS stored at DataCite (the metadata attached to the document DOI) | dx+ris | text/plain |
RDF stored at DataCite (the metadata attached to the document DOI) | dx+rdf | application/xml |
Plain text (Turtle) | turtle | text/plain |
Plain text (N-Triples) | ntriples | text/plain |
Just the item's fulltext (gosh all that full text!!) | plaintext | text/plain |
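The table above can be turned into a small lookup that serves both request styles: the GET value goes into the URL path, and the POST value goes into the Accept header. This is an illustrative sketch; the names `FORMATS`, `item_format_url`, and `accept_header_for` are hypothetical, and leaving the `+` in format names unencoded in the path is an assumption.

```python
BASE_URL = "https://oc2-index.library.ubc.ca"

# GET value -> POST Accept header value, copied from the table above.
FORMATS = {
    "json": "application/json",
    "ubc+json": "application/json",
    "ld": "application/json",
    "ld+json": "application/json",
    "ld+rdf": "application/xml",
    "rdf+json": "application/xml",
    "rdf": "application/xml",
    "xml": "application/xml",
    "datacite": "application/xml",
    "dx": "text/plain",
    "dx+turtle": "text/plain",
    "dx+bibtex": "text/plain",
    "dx+ris": "text/plain",
    "dx+rdf": "application/xml",
    "turtle": "text/plain",
    "ntriples": "text/plain",
    "plaintext": "text/plain",
}


def accept_header_for(output_format):
    """Return the Accept header value listed for a given format."""
    if output_format not in FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    return FORMATS[output_format]


def item_format_url(name, item_id, output_format):
    """Build the GET URL that requests an item in a specific format."""
    accept_header_for(output_format)  # validates the format name
    return (
        f"{BASE_URL}/collections/{name}/items/{item_id}"
        f"/output-format/{output_format}"
    )
```

For example, `item_format_url("arkley", "1.0013125", "rdf")` requests the item as RDF/XML.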
IIIF API
UBC Open Collections uses the International Image Interoperability Framework ("IIIF") standard to display image content in the site, and provides access to this API through the https://iiif.library.ubc.ca endpoint. This endpoint can return images resized and modified to given specifications, as well as metadata ("manifests") associated with the images, and can be used to view content in a number of different IIIF-compatible viewers.
Learn more about the IIIF standard, and the specific parameters available here:
When working with results from the Open Collections Metadata API, the item identifiers need to have the dot replaced with a dash to work with the IIIF API. For instance, '1.0001258' would become '1-0001258'.
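The identifier conversion is a simple string replacement, sketched below. The `manifest_url` helper is hypothetical: its `cdm.[collection].[id]` path pattern is inferred from the single Embed link shown in the Example section, so treat it as an assumption and prefer the link from the item's own page when in doubt.

```python
IIIF_BASE_URL = "https://iiif.library.ubc.ca"


def to_iiif_id(item_id):
    """Convert a metadata API item identifier to its IIIF form."""
    return item_id.replace(".", "-")


def manifest_url(collection, item_id):
    """Guess the manifest URL for an item (pattern inferred, not documented)."""
    return (
        f"{IIIF_BASE_URL}/presentation/"
        f"cdm.{collection}.{to_iiif_id(item_id)}/manifest"
    )
```

For example, `to_iiif_id("1.0001258")` gives `1-0001258`.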
Example
Say we find this wonderful item about winning the war by eating less bread and want to view the item in IIIF.
First, open the 'Embed' accordion on the item's page, where you will see the following link:
http://iiif.library.ubc.ca/presentation/cdm.wwposters.1-0038290/manifest
This returns the manifest for the image(s) in JSON, which includes the metadata and information about the sequences (images), like so:
{ "label": "The Kitchen is the [key] to victory. Eat less bread", "viewingDirection": "left-to-right", "viewingHint": "paged", "metadata": [ { "label": "Collection", "value": [ { "label": "Is Part Of", "value": "World War I Poster and Broadside Collection", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/isPartOf", "classmap": "dpla:SourceResource", "property": "dcterms:isPartOf" }, "iri": "http:\/\/purl.org\/dc\/terms\/isPartOf", "explain": "A Dublin Core Terms Property; A related resource in which the described resource is physically or logically included." } ] }, { "label": "DateAvailable", "value": [ { "label": "Date Available", "value": "2013-10", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/issued", "classmap": "edm:WebResource", "property": "dcterms:issued" }, "iri": "http:\/\/purl.org\/dc\/terms\/issued", "explain": "A Dublin Core Terms Property; Date of formal issuance (e.g., publication) of the resource." } ] }, { "label": "DateCreated", "value": [ { "label": "Date Created", "value": "[between 1914 and 1918?]", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/created", "classmap": "oc:SourceResource", "property": "dcterms:created" }, "iri": "http:\/\/purl.org\/dc\/terms\/created", "explain": "A Dublin Core Terms Property; Date of creation of the resource." } ] }, { "label": "Description", "value": [ { "label": "Description", "value": "World War I poster.", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/description", "classmap": "dpla:SourceResource", "property": "dcterms:description" }, "iri": "http:\/\/purl.org\/dc\/terms\/description", "explain": "A Dublin Core Terms Property; An account of the resource.; Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource." 
} ] }, { "label": "DigitalResourceOriginalRecord", "value": [ { "label": "Digital Resource Original Record", "value": "https:\/\/open.library.ubc.ca\/collections\/wwposters\/items\/1.0038290", "attrs": { "lang": "en", "ns": "http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO", "classmap": "ore:Aggregation", "property": "edm:aggregatedCHO" }, "iri": "http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO", "explain": "A Europeana Data Model Property; The identifier of the source object, e.g. the Mona Lisa itself. This could be a full linked open date URI or an internal identifier" } ] }, { "label": "Extent", "value": [ { "label": "Extent", "value": "1 poster ; 76 x 51 cm", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/extent", "classmap": "dpla:SourceResource", "property": "dcterms:extent" }, "iri": "http:\/\/purl.org\/dc\/terms\/extent", "explain": "A Dublin Core Terms Property; The size or duration of the resource." } ] }, { "label": "FileFormat", "value": [ { "label": "Format", "value": "image\/jpeg", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/elements\/1.1\/format", "classmap": "edm:WebResource", "property": "dc:format" }, "iri": "http:\/\/purl.org\/dc\/elements\/1.1\/format", "explain": "A Dublin Core Elements Property; The file format, physical medium, or dimensions of the resource.; Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME]." 
} ] }, { "label": "Genre", "value": [ { "label": "Genre", "value": "Posters", "attrs": { "lang": "en", "ns": "http:\/\/www.europeana.eu\/schemas\/edm\/hasType", "classmap": "dpla:SourceResource", "property": "edm:hasType" }, "iri": "http:\/\/www.europeana.eu\/schemas\/edm\/hasType", "explain": "A Europeana Data Model Property; This property relates a resource with the concepts it belongs to in a suitable type system such as MIME or any thesaurus that captures categories of objects in a given field. It does NOT capture aboutness" } ] }, { "label": "Identifier", "value": [ { "label": "UBC Call Number", "value": "SPAM462C", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/identifier", "classmap": "dpla:SourceResource", "property": "dcterms:identifier" }, "iri": "http:\/\/purl.org\/dc\/terms\/identifier", "explain": "A Dublin Core Terms Property; An unambiguous reference to the resource within a given context.; Recommended best practice is to identify the resource by means of a string conforming to a formal identification system." } ] }, { "label": "IsShownAt", "value": [ { "label": "DOI", "value": "10.14288\/1.0038290", "attrs": { "lang": "en", "ns": "http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt", "classmap": "edm:WebResource", "property": "edm:isShownAt" }, "iri": "http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt", "explain": "A Europeana Data Model Property; An unambiguous URL reference to the digital object on the provider\u2019s website in its full information context." } ] }, { "label": "Language", "value": [ { "label": "Language", "value": "English", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/language", "classmap": "dpla:SourceResource", "property": "dcterms:language" }, "iri": "http:\/\/purl.org\/dc\/terms\/language", "explain": "A Dublin Core Terms Property; A language of the resource.; Recommended best practice is to use a controlled vocabulary such as RFC 4646 [RFC4646]." 
} ] }, { "label": "Notes", "value": [ { "label": "Notes", "value": "Printed on bottom left: \"F.C. no. 22.\" Donated by Simon Newcomb.", "attrs": { "lang": "en", "ns": "http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note", "classmap": "skos:Concept", "property": "skos:note" }, "iri": "http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note", "explain": "Simple Knowledge Organisation System; Notes are used to provide information relating to SKOS concepts. There is no restriction on the nature of this information, e.g., it could be plain text, hypertext, or an image; it could be a definition, information about the scope of a concept, editorial information, or any other type of information." } ] }, { "label": "Publisher", "value": [ { "label": "Publisher - Original", "value": "London, England : Printed by Hazell, Watson & Viney Ltd. Litho.", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/publisher", "classmap": "dpla:SourceResource", "property": "dcterms:publisher" }, "iri": "http:\/\/purl.org\/dc\/terms\/publisher", "explain": "A Dublin Core Terms Property; An entity responsible for making the resource available.; Examples of a Publisher include a person, an organization, or a service." }, { "label": "Publisher - Digital", "value": "Vancouver : University of British Columbia Library", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/publisher", "classmap": "dpla:SourceResource", "property": "dcterms:publisher" }, "iri": "http:\/\/purl.org\/dc\/terms\/publisher", "explain": "A Dublin Core Terms Property; An entity responsible for making the resource available.; Examples of a Publisher include a person, an organization, or a service." } ] }, { "label": "Rights", "value": [ { "label": "Rights", "value": "Images provided for research and reference use only. 
Permission to publish, copy, or otherwise use these images must be obtained from Rare Books and Special Collections: http:\/\/rbsc.library.ubc.ca", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/rights", "classmap": "edm:WebResource", "property": "dcterms:rights" }, "iri": "http:\/\/purl.org\/dc\/terms\/rights", "explain": "A Dublin Core Terms Property; Information about rights held in and over the resource.; Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights." } ] }, { "label": "SortDate", "value": [ { "label": "Sort Date", "value": "1918", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/elements\/1.1\/date", "classmap": "dpla:SourceResource" }, "iri": "http:\/\/purl.org\/dc\/elements\/1.1\/date", "explain": "A Dublin Core Elements Property; A point or period of time associated with an event in the lifecycle of the resource.; Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF]." } ] }, { "label": "Source", "value": [ { "label": "Source", "value": "Original Format: University of British Columbia. Library. Rare Books and Special Collections. World War I Poster and Broadside Collection. SPAM462C", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/source", "classmap": "oc:SourceResource", "property": "dcterms:source" }, "iri": "http:\/\/purl.org\/dc\/terms\/source", "explain": "A Dublin Core Terms Property; A related resource from which the described resource is derived.; The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system." } ] }, { "label": "Title", "value": [ { "label": "Title", "value": "The Kitchen is the [key] to victory. 
Eat less bread", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/title", "classmap": "dpla:SourceResource", "property": "dcterms:title" }, "iri": "http:\/\/purl.org\/dc\/terms\/title", "explain": "A Dublin Core Terms Property; TA name given to the resource." } ] }, { "label": "Type", "value": [ { "label": "Type", "value": "Still Image", "attrs": { "lang": "en", "ns": "http:\/\/purl.org\/dc\/terms\/type", "classmap": "dpla:SourceResource", "property": "dcterms:type" }, "iri": "http:\/\/purl.org\/dc\/terms\/type", "explain": "A Dublin Core Terms Property; The nature or genre of the resource.; Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]. To describe the file format, physical medium, or dimensions of the resource, use the Format element." } ] } ], "thumbnail": "\/\/iiif.library.ubc.ca\/image\/cdm.wwposters.1-0038290.0000\/full\/80,100\/0\/default.jpg", "attribution": "Images provided for research and reference use only. Permission to publish, copy, or otherwise use these images must be obtained from Rare Books and Special Collections: http:\/\/rbsc.library.ubc.ca", "sequences": [ { "@id": "\/\/iiif.library.ubc.ca\/presentation\/cdm.wwposters.1-0038290\/sequence\/normal", "@type": "sc:Sequence", "label": "Default", "viewingDirection": "left-to-right", "viewingHint": "paged", "canvases": [ { "@id": "\/\/iiif.library.ubc.ca\/presentation\/cdm.wwposters.1-0038290\/canvas\/p0", "@type": "sc:Canvas", "label": "The Kitchen is the [key] to victory. 
Eat less bread", "height": 6352, "width": 4263, "images": [ { "@id": "\/\/iiif.library.ubc.ca\/presentation\/cdm.wwposters.1-0038290\/annotation\/p0000", "@type": "oa:Annotation", "motivation": "sc:painting", "resource": { "@id": "\/\/iiif.library.ubc.ca\/image\/cdm.wwposters.1-0038290", "@type": "dctypes:Image", "format": "image\/jpeg", "height": 6352, "width": 4263, "service": { "@context": "http:\/\/iiif.io\/api\/image\/2\/context.json", "@id": "\/\/iiif.library.ubc.ca\/image\/cdm.wwposters.1-0038290", "@profile": "http:\/\/iiif.io\/api\/image\/2\/level2.json", "scaleFactors": [ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 ] } }, "on": "\/\/iiif.library.ubc.ca\/cdm.wwposters.1-0038290\/canvas\/p0" } ] } ] } ], "description": "World War I poster.", "@context": "http:\/\/iiif.io\/api\/presentation\/2\/context.json", "@id": "https:\/\/iiif.library.ubc.ca\/presentation\/cdm.wwposters.1-0038290\/manifest", "@type": "sc:Manifest" }
The JSON includes a 'thumbnail' property. Its value in the manifest is protocol-relative (it starts with //); with the scheme added, in this instance it is:
http://iiif.library.ubc.ca/image/cdm.wwposters.1-0038290.0000/full/80,100/0/default.jpg
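As a rough sketch of how the manifest shown above could be read programmatically, the following pulls out the thumbnail and the image service ids. The helper names (`absolute_url`, `summarize_manifest`) are hypothetical, the choice of `https` as the scheme for protocol-relative values is an assumption, and the `manifest` dictionary is a trimmed copy of the response rather than a live request.

```python
def absolute_url(url, scheme="https"):
    """Turn a protocol-relative URL ('//host/path') into an absolute one.

    The default scheme is an assumption; pass scheme="http" if you prefer.
    """
    return f"{scheme}:{url}" if url.startswith("//") else url


def summarize_manifest(manifest):
    """Return the label, thumbnail URL, and image service ids of a manifest."""
    image_ids = [
        absolute_url(image["resource"]["service"]["@id"])
        for sequence in manifest.get("sequences", [])
        for canvas in sequence.get("canvases", [])
        for image in canvas.get("images", [])
    ]
    return {
        "label": manifest.get("label"),
        "thumbnail": absolute_url(manifest["thumbnail"]),
        "image_ids": image_ids,
    }


# Trimmed copy of the manifest shown above (only the keys used here)
manifest = {
    "label": "The Kitchen is the [key] to victory. Eat less bread",
    "thumbnail": "//iiif.library.ubc.ca/image/cdm.wwposters.1-0038290.0000/full/80,100/0/default.jpg",
    "sequences": [
        {
            "canvases": [
                {
                    "images": [
                        {
                            "resource": {
                                "service": {
                                    "@id": "//iiif.library.ubc.ca/image/cdm.wwposters.1-0038290"
                                }
                            }
                        }
                    ]
                }
            ]
        }
    ],
}

summary = summarize_manifest(manifest)
print(summary["thumbnail"])
print(summary["image_ids"])
```

With a live request, the dictionary returned by parsing the manifest response as JSON could be passed to `summarize_manifest` in the same way.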
The thumbnail URL follows a fixed pattern, so we can change its parts to modify the image that is returned to us:
http://iiif.library.ubc.ca/image/cdm.wwposters.1-0038290.0000/[:region]/[:width],[:height]/[:rotation]/[:quality].[:format]
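The pattern above can be filled in with a small helper. This is a minimal sketch: the function name `iiif_image_url` and its default values are assumptions chosen for illustration, and the size is passed as a single string so that forms such as `80,100` or `800,` (width only) can both be used.

```python
IIIF_IMAGE_BASE = "http://iiif.library.ubc.ca/image"


def iiif_image_url(identifier, region="full", size="full", rotation=0,
                   quality="default", fmt="jpg"):
    """Fill in the [:region]/[:size]/[:rotation]/[:quality].[:format] pattern."""
    return f"{IIIF_IMAGE_BASE}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Rebuild the thumbnail URL from the manifest
thumb = iiif_image_url("cdm.wwposters.1-0038290.0000", size="80,100")
print(thumb)

# Same image, 800 pixels wide, grayscale PNG
gray = iiif_image_url("cdm.wwposters.1-0038290.0000", size="800,",
                      quality="gray", fmt="png")
print(gray)
```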
To find out which qualities and formats an image supports, remove everything after the identifier and request just the image metadata, for instance:
http://iiif.library.ubc.ca/image/cdm.wwposters.1-0038290.0000
For this image that would return the following JSON:
{ "@context": "http:\/\/iiif.io\/api\/image\/2\/context.json", "@id": "http:\/\/iiif.library.ubc.ca\/image\/cdm.wwposters.1-0038290", "protocol": "http:\/\/iiif.io\/api\/image", "width": 4263, "height": 6352, "sizes": [ { "width": 4263, "height": 6352 }, { "width": 81, "height": 120 } ], "tiles": [ { "width": 512, "scaleFactors": [ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 ] } ], "profile": [ "http:\/\/iiif.io\/api\/image\/2\/level2.json", { "formats": [ "jpg", "png", "gif" ], "qualities": [ "color", "gray", "bitonal" ], "supports": [ "baseUriRedirect", "cors", "jsonldMediaType", "mirroring", "regionByPct", "regionByPx", "rotationArbitrary", "rotationBy90s", "sizeByWhListed", "sizeByForcedWh", "sizeByH", "sizeByPct", "sizeByW", "sizeByWh" ] } ] }
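One way to use this response is to check a format and quality before building an image URL. The sketch below works on a trimmed copy of the JSON above; the function name `supports` is hypothetical. It treats `jpg` and the `default` quality as always available, which is how the IIIF Image API 2 compliance levels describe them, and reads anything else from the `profile` entry.

```python
def supports(info, fmt, quality):
    """Check whether an image's metadata lists the given format and quality."""
    formats = {"jpg"}          # required by the IIIF Image API 2 compliance levels
    qualities = {"default"}    # likewise always available
    for entry in info.get("profile", []):
        # The profile list mixes a compliance URL (a string) with a details object
        if isinstance(entry, dict):
            formats.update(entry.get("formats", []))
            qualities.update(entry.get("qualities", []))
    return fmt in formats and quality in qualities


# Trimmed copy of the image metadata shown above
info = {
    "width": 4263,
    "height": 6352,
    "profile": [
        "http://iiif.io/api/image/2/level2.json",
        {
            "formats": ["jpg", "png", "gif"],
            "qualities": ["color", "gray", "bitonal"],
        },
    ],
}

print(supports(info, "png", "gray"))
print(supports(info, "webp", "color"))
```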
Example: Requesting a Wallpaper
Once you've found an image you like, you can construct a URL like this to request it at desktop background size (here 1600 by 900 pixels):
http://iiif.library.ubc.ca/image/cdm.tokugawa.1-0213163/full/1600,900/0/default.png
To remove the excess borders, set the [:region] part of the IIIF URL to the x, y, width, and height (in pixels) of the area to keep:
http://iiif.library.ubc.ca/image/cdm.tokugawa.1-0213163/1145,374,9234,4645/1600,900/0/default.png
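Requesting a fixed width and height can stretch an image whose shape differs from the target. A possible approach, sketched here, is to compute a centered region with the same aspect ratio as the wallpaper first. The function name `centered_region` is hypothetical, and the dimensions used are those of the World War I poster from the image metadata earlier on this page, since the dimensions of the wallpaper image are not shown here.

```python
def centered_region(width, height, target_w, target_h):
    """Largest centered 'x,y,w,h' region with the target aspect ratio."""
    if width * target_h > height * target_w:
        # Image is wider than the target shape: keep full height, trim the sides
        region_h = height
        region_w = height * target_w // target_h
    else:
        # Image is taller than (or matches) the target shape: trim top and bottom
        region_w = width
        region_h = width * target_h // target_w
    x = (width - region_w) // 2
    y = (height - region_h) // 2
    return f"{x},{y},{region_w},{region_h}"


# Poster dimensions taken from the image metadata (4263 x 6352)
region = centered_region(4263, 6352, 1600, 900)
url = ("http://iiif.library.ubc.ca/image/cdm.wwposters.1-0038290.0000/"
       f"{region}/1600,900/0/default.png")
print(region)
print(url)
```

A centered crop is only one choice; to trim borders precisely you would still pick the region by eye, as in the example above.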
Example Scripts
Scripts for working with the API can be viewed or downloaded below.
Get all items from a collection
To get all items from a collection, first request the collection to retrieve its item ids, then loop through the ids and request each item from the API.
// Author: Yves Beaudoin
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strings"

	"github.com/christophwitzko/go-curl"
)

const (
	// Replace this public key with your API Key
	_apiKey = "ac40e6c2cb345593ed1691e0a8b601bba398e42d85f81f893c5ab709cec63c6c"
	// Define the collection to get all items from
	_collection = "darwin"
)

// Set up the data types for unmarshalling the JSON search results
type _searchResult4Collection struct {
	DATA []struct {
		ID string `json:"_id"`
	} `json:"data"`
}

type _searchResult4ItemData struct {
	DATA interface{} `json:"data"`
}

func main() {
	var (
		collectionItems _searchResult4Collection
		err             error
		itemData        _searchResult4ItemData
		itemBytes       []byte
		items           [][]byte
	)

	// Get all the item ids for the collection
	curlOutput := curlGet("https://oc-index.library.ubc.ca/collections/" + _collection + "/items" + "?api_key=" + _apiKey)

	// Store them in a slice we can use
	if err = json.Unmarshal(curlOutput, &collectionItems); err != nil {
		log.Fatalln("\a[json.Unmarshal] ", err)
	}

	// Loop through each item id and store the associated JSON data
	for itemIdx, item := range collectionItems.DATA {
		fmt.Println("Item ID =", item.ID, strings.Repeat("-", 80))
		curlOutput = curlGet("https://oc-index.library.ubc.ca/collections/" + _collection + "/items/" + item.ID + "?api_key=" + _apiKey)
		if err = json.Unmarshal(curlOutput, &itemData); err != nil {
			log.Fatalln("\a[json.Unmarshal] ", err)
		}
		if itemBytes, err = json.MarshalIndent(itemData, "", " "); err != nil {
			log.Fatalln("\a[json.MarshalIndent] ", err)
		}
		items = append(items, itemBytes)
		fmt.Println(string(items[itemIdx]))
	}
}

// Simple GET helper; exits on a transport error or a non-200 status.
func curlGet(URL string) []byte {
	err, curlOutput, response := curl.Bytes(URL, "method=", "GET", "disablecompression=", true)
	if err != nil {
		log.Fatalln("\a[curlGet] ", err)
	} else if response != nil && response.StatusCode != 200 {
		log.Fatalln("\a[curlGet] response.StatusCode = ", response.StatusCode)
	}
	return curlOutput
}
$apiURL = 'https://oc-index.library.ubc.ca/';
// Replace this with your API Key
$apiKey = 'ac40e6c2cb345593ed1691e0a8b601bba398e42d85f81f893c5ab709cec63c6c';
// Collection to get all items from
$collection = 'darwin';
$perPage = 25;
$offset = 0;

// First query the API for the count of items in the collection
$curlOutput = curlGet($apiURL.'collections/'.$collection.'?api_key='.$apiKey);
$itemCount = $curlOutput->data->items;

// Now we can work out how many pages to loop through
$pages = (int) ceil($itemCount/$perPage);
$itemIds = [];

// First we need to get all the Item Ids from the API
while ($pages > 0) {
    $curlOutput = curlGet('https://oc-index.library.ubc.ca/collections/'.$collection.'/items?limit='.$perPage.'&offset='.$offset.'&api_key='.$apiKey);
    // Now we want to store them in an array we can use
    foreach ($curlOutput->data as $itemInCollection) {
        $itemIds[] = $itemInCollection->_id;
    }
    $offset += $perPage;
    $pages--;
}

$items = [];
// Next we want to loop through each item id and store the item data.
foreach ($itemIds as $itemId) {
    $curlOutput = curlGet('https://oc-index.library.ubc.ca/collections/'.$collection.'/items/'.$itemId.'?api_key='.$apiKey);
    $items[] = $curlOutput->data;
}

// We now have our items and can use or manipulate them as needed.
echo json_encode($items);
exit;

// Simple curl function to keep code DRY, will exit on error.
function curlGet($url)
{
    try {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $curlOutput = json_decode(curl_exec($ch));
        curl_close($ch);
    } catch (Exception $e) {
        var_dump($e);
        exit;
    }
    return $curlOutput;
}
import json
import math

import requests

ocApiUrl = 'https://oc-index.library.ubc.ca'
# Replace this with your API Key
apiKey = 'ac40e6c2cb345593ed1691e0a8b601bba398e42d85f81f893c5ab709cec63c6c'
collection = 'darwin'
perPage = 25
offset = 0

# Query the API for the collection item count
collectionUrl = ocApiUrl + '/collections/' + collection + '?api_key=' + apiKey
apiResponse = requests.get(collectionUrl).json()
itemCount = float(apiResponse['data']['items'])

# Figure out how many pages there are
pages = int(math.ceil(itemCount / float(perPage)))

# Loop through collection item pages to get all items
itemIds = []
for x in range(0, pages):
    collectionItemsUrl = ocApiUrl + '/collections/' + collection
    collectionItemsUrl += '/items?limit=' + str(perPage) + '&offset=' + str(offset) + '&api_key=' + apiKey
    offset += perPage
    # Get one page of items
    apiResponse = requests.get(collectionItemsUrl).json()
    collectionItems = apiResponse['data']
    # Add each item id to the itemIds list
    for collectionItem in collectionItems:
        itemIds.append(collectionItem['_id'])

# Store all the items so we can print them out later
items = []
for itemId in itemIds:
    # The API key is appended here too, as for every other request
    itemUrl = ocApiUrl + '/collections/' + collection + '/items/' + itemId + '?api_key=' + apiKey
    apiResponse = requests.get(itemUrl).json()
    item = apiResponse['data']
    items.append(item)

print(json.dumps(items))
Harvest full text of all items in a collection
Note: not all collections have full text. We've chosen one of the smaller collections that does, to show you how to loop through its items programmatically and export the full text.
// Replace this with your API Key
$apiKey = 'ac40e6c2cb345593ed1691e0a8b601bba398e42d85f81f893c5ab709cec63c6c';
// Collection we want to harvest
$collection = 'darwin';

// Setup variables
$limit = 100;
$itemIds = [];
$items = [];

// First we need to find out how many items are in the collection
$curlOutput = curlGet('https://oc-index.library.ubc.ca/collections/' . $collection.'?api_key='.$apiKey);

// Now we have the item count, figure out the page count and create an offset of 0
$itemCount = $curlOutput->data->items;
$pageCount = ceil($itemCount / $limit);
$offset = 0;

// Loop through the pages and extract the item ids into the $itemIds array.
while ($pageCount > 0) {
    $curlOutput = curlGet('https://oc-index.library.ubc.ca/collections/' . $collection . '/items?api_key='.$apiKey.'&offset=' . $offset . '&limit=' . $limit);
    foreach ($curlOutput->data as $item) {
        $itemIds[] = $item->_id;
    }
    $pageCount--;
    // Advance by the page size rather than a hard-coded 100
    $offset += $limit;
}

// Loop through the item ids and extract metadata into the $items array.
foreach ($itemIds as $itemId) {
    $curlOutput = curlGet('https://oc-index.library.ubc.ca/collections/' . $collection . '/items/' . $itemId.'?api_key='.$apiKey);
    $item = $curlOutput->data;
    $items[] = array(
        "id" => $itemId,
        "title" => $item->Title[0]->value,
        "description" => $item->Description[0]->value,
        "fullText" => property_exists($item, 'FullText') ? $item->FullText[0]->value : null
    );
}

// We now have the items stored in $items, uncomment below to check it out.
// echo json_encode($items);
// exit;

// For more fun lets add them into a CSV file. ( You could have file permission problems attempting this )
$fp = fopen($collection . '.csv', 'w');
fputcsv($fp, ['ID', 'TITLE', 'DESCRIPTION', 'FULLTEXT'], '~', '"');
foreach ($items as $item) {
    fputcsv($fp, $item, '~', '"');
}
fclose($fp);

// Simple curl function to keep code DRY, will exit on error.
function curlGet($url)
{
    try {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $curlOutput = json_decode(curl_exec($ch));
        curl_close($ch);
    } catch (Exception $e) {
        var_dump($e);
        exit;
    }
    return $curlOutput;
}
import csv
import math

import requests

ocApiUrl = 'https://oc-index.library.ubc.ca'
# Replace this with your API Key
apiKey = 'ac40e6c2cb345593ed1691e0a8b601bba398e42d85f81f893c5ab709cec63c6c'
collection = 'darwin'
perPage = 25
offset = 0

# Query the API for the collection item count
collectionUrl = ocApiUrl + '/collections/' + collection + '?api_key=' + apiKey
apiResponse = requests.get(collectionUrl).json()
itemCount = float(apiResponse['data']['items'])

# Figure out how many pages there are
pages = int(math.ceil(itemCount / float(perPage)))

# Loop through collection item pages to get all items
itemIds = []
for x in range(0, pages):
    collectionItemsUrl = ocApiUrl + '/collections/' + collection
    collectionItemsUrl += '/items?limit=' + str(perPage) + '&offset=' + str(offset) + '&api_key=' + apiKey
    offset += perPage
    # Get one page of items
    apiResponse = requests.get(collectionItemsUrl).json()
    collectionItems = apiResponse['data']
    # Add each item id to the itemIds list
    for collectionItem in collectionItems:
        itemIds.append(collectionItem['_id'])

items = []
for itemId in itemIds:
    # The API key is appended here too, as for every other request
    itemUrl = ocApiUrl + '/collections/' + collection + '/items/' + itemId + '?api_key=' + apiKey
    apiResponse = requests.get(itemUrl).json()
    item = apiResponse['data']
    itemStore = dict()
    itemStore['id'] = itemId
    itemStore['title'] = item['Title'][0]['value']
    itemStore['description'] = item['Description'][0]['value']
    if 'FullText' in item:
        itemStore['fullText'] = item['FullText'][0]['value']
    else:
        itemStore['fullText'] = ''
    items.append(itemStore)

# Python 3: write the CSV as UTF-8 text.
# Note we are ignoring any utf8 encoding errors here
with open('full-text.csv', 'w', newline='', encoding='utf-8', errors='ignore') as csvfile:
    writer = csv.writer(csvfile, delimiter='~', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['ID', 'Title', 'Description', 'Full Text'])
    for item in items:
        writer.writerow([item['id'], item['title'], item['description'], item['fullText']])
Note: to open your CSV in Excel, follow these steps:
- Open Blank Workbook.
- Go to DATA tab.
- Click the From Text button in the Get External Data section.
- Select your CSV file.
- In Step 1 of the wizard, set Original Data Type to 'Delimited' and check the box My data has headers.
- In Step 2 of the wizard, in the Delimiters section, un-check the 'Tab' option, then check the option 'Other' and enter the value '~' (a single tilde, no spaces).
- Press Finish
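If you would rather load the exported file in a script than in Excel, Python's csv module can read the same tilde-delimited layout. The sketch below is an illustration only: the function name `read_tilde_csv` is hypothetical, and the sample row (including its item id) is made up so the example runs without the exported file. Pass `quotechar='|'` for files written by the Python script above, or keep the default `'"'` for files written by the PHP script.

```python
import csv
import io


def read_tilde_csv(handle, quotechar='"'):
    """Read a '~'-delimited CSV with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="~", quotechar=quotechar)
    return list(reader)


# Made-up sample standing in for the exported file
sample = io.StringIO(
    'ID~Title~Description~Full Text\r\n'
    '1.0000001~Sample title~"A description with ~ inside"~Some text\r\n'
)
rows = read_tilde_csv(sample)
print(rows[0]["Title"])
print(rows[0]["Description"])

# With a real export you would open the file instead, for example:
# with open("full-text.csv", newline="", encoding="utf-8") as handle:
#     rows = read_tilde_csv(handle, quotechar="|")
```

Full text fields can be long. If the reader raises a field size error, the limit can be raised with `csv.field_size_limit` before reading.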
Download Collection Data
Run this script from a terminal window. It uses the API to download item metadata in a format of your choosing.
// Replace this with your API Key
$apiKey = 'ac40e6c2cb345593ed1691e0a8b601bba398e42d85f81f893c5ab709cec63c6c';
// Replace this with where you want the downloads to go
$dir = '/ocdata/downloads';

ini_set('display_errors', 1);
ini_set('log_errors', 1);
ini_set('error_log', '/ocdata/oc-downloader.log');

function displayHelpMessage()
{
    echo "_________________________________________________________________\n";
    echo " OC DOWNLOADER HELP\n";
    echo "\n";
    echo " --cid collection to ingest from\n";
    echo " --fmt output format\n";
    echo " json json - ubc metadata manual format\n";
    echo " ubc+json json - with keys are ubc property tag\n";
    echo " ld json-ld- keys are iri\n";
    echo " ld+json json-ld- keys are iri\n";
    echo " ld+rdf rdf/xml (direct transform of json-ld)\n";
    echo " rdf rdf/xml\n";
    echo " rdf+json rdf represented in rdf+json\n";
    echo " turtle rdf (turtle)\n";
    echo " ntriples rdf (ntriples)\n";
    echo " --help show this message\n";
    exit;
}

$flags = [
    '--cid' => FALSE, // collection to ingest from
    '--fmt' => 'rdf'  // output format
];

// Parse the command line arguments
array_shift($argv);
while ($arg = array_shift($argv)) {
    switch ($arg) {
        case '--help':
            displayHelpMessage();
            exit;
        case '--env':
        case '--cid':
        case '--fmt':
            $flags[$arg] = array_shift($argv);
            break;
        case '--txt':
            $flags[$arg] = 'true' === array_shift($argv);
            break;
        default:
            echo "Found [{$arg}] - no processing command is stated for this argument";
    }
}

if (!$flags['--cid']) {
    echo " Error: You must specify a collection [--cid] to process, see --help for more\n";
    exit;
}

// File extension to use for each output format
$extension = [
    'json' => 'json',
    'ubc+json' => 'json',
    'ld' => 'json',
    'ld+json' => 'json',
    'rdf+json' => 'json',
    'ld+rdf' => 'xml',
    'rdf' => 'xml',
    'turtle' => 'txt',
    'ntriples' => 'txt'
];
$format = $flags['--fmt'];
$ext = $extension[$format];
$ocREST = 'https://oc-index.library.ubc.ca';

// Create directory if doesn't exist
if (!is_dir($dir)) {
    mkdir($dir, 0777, TRUE);
}

// Get list of all collections if --cid _all has been passed
if ('_all' === "{$flags['--cid']}") {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_URL, "{$ocREST}/collections?api_key={$apiKey}");
    $response = curl_exec($ch);
    curl_close($ch);
    $collections = json_decode($response, TRUE)['data'];
} else {
    $collections[] = "{$flags['--cid']}";
}

// Process each collection
foreach ($collections as $idx) {
    // Create a directory for each collection
    $dlddir = $dir."/{$idx}";
    if (!is_dir($dlddir)) {
        mkdir($dlddir, 0777, TRUE);
    }
    echo "Generating Download Files: {$idx}\n";
    $finishedProcessing = FALSE;
    $limit = 100;
    $offset = 0;
    while (!$finishedProcessing) {
        echo("GET:/collections/{$idx}/items?api_key={$apiKey}&limit={$limit}&offset={$offset}\n");
        // Get items from collection
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ch, CURLOPT_URL, "{$ocREST}/collections/{$idx}/items?api_key={$apiKey}&limit={$limit}&offset={$offset}");
        $response = curl_exec($ch);
        curl_close($ch);
        $items = json_decode($response, TRUE);
        $items = $items['data'];
        if (empty($items)) {
            // An empty page means we have reached the end of the collection
            $finishedProcessing = TRUE;
            continue;
        } else {
            foreach ($items as $item) {
                $iid = $item['_id'];
                echo(" - GET:/collections/{$idx}/items/{$iid}/output-format/{$format}\n");
                // Get item data in format requested
                $ch = curl_init();
                curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
                curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
                curl_setopt($ch, CURLOPT_URL, "{$ocREST}/collections/{$idx}/items/{$iid}/output-format/{$format}?api_key={$apiKey}");
                $response = curl_exec($ch);
                curl_close($ch);
                switch ($ext) {
                    case 'json':
                        // Decode and re-encode so the saved JSON is normalized
                        $d = $response;
                        $d = json_decode($d, TRUE);
                        $d = json_encode($d);
                        file_put_contents("{$dlddir}/{$iid}_{$format}.{$ext}", $d);
                        break;
                    default:
                        file_put_contents("{$dlddir}/{$iid}_{$format}.{$ext}", $response);
                }
            }
        }
        // Advance by the page size rather than a hard-coded 100
        $offset += $limit;
    }
    echo "Finished Generating Download Files: {$idx}\n";
}

echo "Finished Generating Download Files. Goodbye!\n";
exit;