
UBC Graduate Research

Institutional repository software comparison: DSpace, EPrints, Digital Commons, Islandora and Hydra
Castagné, Michel
Aug 14, 2013


Institutional repository software comparison: DSpace, EPrints, Digital Commons, Islandora and Hydra

Michel Castagné
University of British Columbia
July 2013

1.0 Executive Summary

The following report is an environmental scan of institutional repository (IR) software packages and frameworks. DSpace, EPrints, Digital Commons and Fedora Commons were selected based on their ROAR (Registry of Open Access Repositories) statistics and overall suitability for a large research library. In order to discuss the Fedora framework in practical terms, two promising Fedora-based projects were selected: Islandora and Hydra. This report was a major component of the requirements for the Professional Experience (LIBR 596) course, which counted towards the author's Master of Library and Information Studies degree at the University of British Columbia.

2.0 DSpace

2.1. Overview

DSpace is an open-source digital asset management system originally created by developers from MIT and HP Labs in 2002. It is most commonly used by institutional repositories (Tansley, Smith, & Walker, 2005) and, as of July 2013, ROAR has recorded 1,356 implementations, making it by far the most widely used and tested repository solution available. A public demo site allows users to test DSpace 3.1, the current stable release. DSpace 4.0 is slated for release around November or December 2013.

2.2. Installation / Administration

The DSpace installation process follows a "turn-key" approach and, as such, is relatively straightforward, at least in comparison to a framework like Fedora. Advanced customization of the software, however, can be difficult and might require external consultation, and it introduces the possibility of unexpected complications during upgrades.

In light of this, DSpace development has been trending towards making the software more flexible and extensible. Some developers and institutions are currently discussing the possibility of rewriting DSpace as a "Hydra head" powered by Fedora (see 5.1), or of using some of the Hydra front-end tools on top of a DSpace back-end (DSpace Futures, 2013).

A number of DSpace hosting services exist, notably DSpaceDirect and the UK-based Open Repository. Neither organization discloses typical hosting costs except on request, but the "robust" preservation service for DSpaceDirect, which uses DuraCloud, costs about $31,725 for three years. Because these hosting options are not based in Canada, they may be problematic for Canadian public institutions: in British Columbia, the Freedom of Information and Protection of Privacy Act (FIPPA) currently prohibits public bodies from storing or allowing access to personal information outside of Canada without consent (Office of the Information and Privacy Commissioner for BC, 2012), and Nova Scotia has similar restrictions.

2.2.1. Metadata

DSpace supports Qualified Dublin Core metadata by default and can export to 11 other formats: OAI_DC, DIDL, DIM, ETDMS, METS, MODS, OAI-ORE, RDF, MARC, UKETD_DC and XOAI. It is also possible to create custom metadata schemas using XML. Version 3.0 added support for controlled vocabularies, with vocabulary look-up available in submission forms. According to the DSpace Futures report (2013), however, some institutions are concerned that the software lacks sufficient support for geospatial and journal article metadata.

2.2.2. Interoperability

As an open-source project of the not-for-profit DuraSpace, DSpace is oriented towards open standards and protocols.
In addition to fully supporting both the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) and SWORD (Simple Web-service Offering Repository Deposit) protocols, it can ingest and export Archival Information Packages (AIPs) as defined by the OAIS Reference Model. Integration with Archivematica (a robust, open-source digital preservation system developed by Artefactual Systems in New Westminster, BC) would be possible through AIP exports. DSpace also has some rudimentary SHERPA/RoMEO API look-up functionality, but this is used only for authority control.

The lack of a stable, built-in RESTful web API means that DSpace web services do not offer the same level of interoperability as Fedora-based IR software. However, this feature is currently under development and may be ready by 4.0. Peter Dietz at Ohio State University Libraries mentions a few use cases for a REST API, namely: embedding DSpace content into other websites, integrating DSpace content into other systems, and/or building a user interface in a lightweight web application framework (like Ruby on Rails).

2.3. Content Management

2.3.1. Embargoes, Versioning and Preservation

As of version 3.0, DSpace supports creating embargoes on items. Item-level versioning was also introduced in 3.0, but is currently incompatible with AIPs (i.e., the versioning information will not be preserved). On the subject of preservation, the DSpace Futures report (2013) notes that

    [w]hile DSpace does offer "hooks" for preservation, some think that this is still an underdeveloped area that could use more attention. Generally, they believe that, while much interest is directed towards digital preservation, media, data, and digital collections, DSpace is not always well tailored for this work. Some institutions do intentionally maintain separate applications for open access and preservation management, with preservation copies kept on a separate server.

One possibility for such a separate application would be Archivematica, mentioned in 2.2.2.

2.3.2. Statistics

The built-in Solr statistics module logs internal events, such as bitstream downloads and workflow statistics. To display these statistics, Atmire developed an add-on that is available for purchase. Alternatively, DSpace 3.0 has the option to use the open-source Elasticsearch to display this information, a feature that is easily enabled. It is also possible to record downloads and page views in Google Analytics. The Edinburgh Research Archive, for instance, has developed a Google Analytics integration module (JSPUI only) that can track and publicly display usage statistics at the item, collection and repository level.

2.3.3. File Formats and Batch Importing

DSpace supports all file types, but importing large research datasets can be challenging. The University of Exeter has developed a workflow using Globus and SWORD. Batch importing is possible through DSpace's Simple Archive Format, in which item metadata is supplied as XML; the SAFBuilder tool helps generate these packages.

2.3.4. User Interface

The user interface is functional but somewhat dated, and it does not adapt to different screen sizes. While the mobile theme (in beta) addresses this issue, a single fully responsive, easily customizable interface would be preferable. Currently, some institutions find simple branding to be relatively time-consuming, and there is a lack of built-in front-end development tools (DSpace Futures, 2013).

2.3.5. Search

The search engine is based on Lucene, a popular and powerful open-source engine. Configuring the search engine can be challenging, and problems can be difficult to solve without programmers on hand who are well-versed in the software.

2.4. Support

Support is available through several mailing lists, an IRC channel, DuraSpace, and third parties such as Atmire.
The DSpace Ambassador Program focuses on identifying volunteers in every country who are willing to be a point of contact for organizations getting started with DSpace.

2.5. Summary

DSpace has proven to be a solid repository platform since its launch in 2002. With the recent release of 3.1, and 4.0 on the horizon for November/December 2013, DSpace remains promising and competitive amid changes in the landscape, such as the growing need for more robust research data support and for more extensible back-ends.

3.0 EPrints

3.1. Overview

EPrints is a free and open-source software package originally developed in 2000 by researchers at the University of Southampton School of Electronics and Computer Science, making it the oldest of the platforms in this report. It was designed specifically for archiving research papers, theses and teaching materials, though it can accept any content. As of July 2013, ROAR has recorded 500 implementations, making it the second most popular platform. A sandbox demonstration site is available.

3.2. Installation / Administration

Like DSpace, EPrints follows a "turn-key" approach, and some institutions have reported that the installation process is fairly straightforward (Beazley, 2010). The administrative back-end provides access to configuration options. The EPrints Bazaar Store is an interesting concept, aiming to allow repository managers to install extensions with a single click. A fully hosted EPrints repository is available through EPrints Services.

3.2.1. Metadata

EPrints can use controlled vocabularies and authority lists, which help ensure high metadata quality. It provides native support for Dublin Core, with the possibility of exporting to a number of formats (e.g., METS, MODS and DIDL). Qualified Dublin Core and MARC are not supported (Younglove, 2012).

3.2.2. Interoperability

EPrints is fairly interoperable, supporting OAI-PMH and SWORD. It is also possible to export the repository metadata and directory structure using XML, though advanced scripting knowledge is necessary. Case studies of migration from EPrints to DSpace and vice versa were reported at the Open Repositories 2011 conference (Davis & Subirats-Coll, 2011). EPrints does not support AIP imports/exports, but integration with Archivematica might be possible through SWORD.

3.3. Content Management

3.3.1. Embargoes, Versioning and Preservation

EPrints allows defining embargo dates, and it maintains a detailed object history with versioning. According to the Preservation Support wiki, EPrints considers its key preservation actions to be: "recording changes to a repository object by updating its 'preservation metadata'"; "enabling the service provider to download all the files and metadata comprising an object (METS and DIDL export plugins)"; and "notifying the service provider of any rights it has to copy and act on the content of an object". EPrints is currently working closely with the JISC-funded Preserv project to develop a more complete digital preservation plan.

3.3.2. Statistics

Through the IRStats package, EPrints keeps track of download counts of full-text documents (statistics can be viewed as a graph or table). An in-depth discussion of extending IRStats to record and display comprehensive statistics was published by researchers at Queensland University of Technology (Callan & Gregson, 2012).

3.3.3. File Formats and Batch Importing

Files in any format can be added, but customization is required to extend EPrints to support research datasets. Batch importing can be challenging and requires some knowledge of Perl scripting.
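
As a rough illustration of the scripting involved, the sketch below converts tab-delimited metadata into an XML file of eprint records. It is written in Python rather than Perl and is only a minimal sketch under stated assumptions: it assumes the EPrints XML data format (an `eprints` root element in the `http://eprints.org/ep2/data/2.0` namespace, with creators stored as `item`/`name`/`family`/`given`), and the input columns (`type`, `title`, `date`, `creators`) are hypothetical. Real field names depend on each repository's configuration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace assumed to match the EPrints XML data format.
EP_NS = "http://eprints.org/ep2/data/2.0"


def _el(parent: ET.Element, tag: str, text: str = None) -> ET.Element:
    """Create a child element in the EPrints namespace, optionally with text."""
    child = ET.SubElement(parent, f"{{{EP_NS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def rows_to_eprints_xml(tsv_text: str) -> str:
    """Convert tab-delimited rows (one item per row) into EPrints-style XML.

    Hypothetical columns: type, title, date, creators. The creators column
    holds "Family, Given" names separated by semicolons.
    """
    root = ET.Element(f"{{{EP_NS}}}eprints")
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        eprint = _el(root, "eprint")
        for field in ("type", "title", "date"):
            _el(eprint, field, row[field].strip())
        creators = _el(eprint, "creators")
        for person in row["creators"].split(";"):
            if not person.strip():
                continue  # skip empty entries such as a trailing semicolon
            family, _, given = person.partition(",")
            name = _el(_el(creators, "item"), "name")
            _el(name, "family", family.strip())
            _el(name, "given", given.strip())
    # Serialize with the EPrints namespace as the default (unprefixed) namespace.
    return ET.tostring(root, encoding="unicode", default_namespace=EP_NS)
```

The resulting file would then be loaded with the repository's own import tooling; the conversion step is usually where most of the institution-specific scripting ends up.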
Researchers at Concordia University recently published a report on their method of performing batch ingests (Neugebauer & Han, 2012).

3.3.4. User Interface

In recent years, EPrints has made a large effort towards usability and has a clean, user-friendly interface (Beazley, 2010), making it easy for users to submit and manage files. The deposit workflow can be modified.

3.3.5. Search

The default, built-in search engine can search all metadata fields and sort results by issue date, author name and title; it also supports Boolean operators. Full-text indexing is available for some formats (PDF, Word and HTML) when the appropriate tools are installed. Searches are executed through the plug-ins layer, and EPrints supports the Xapian engine, which allows sorting results by relevance. A 2009 comparison of open-source search engines ranked Lucene ahead of Xapian (Singh, 2009).

3.4. Support

In addition to hosting, EPrints Services offers customization, training and support services. The Eprints-tech mailing list is fairly active, and documentation and training materials are available. The EPrints community seems to be concentrated in Europe, specifically the UK.

3.5. Summary

The main attractions of EPrints seem to be its user-friendly interface and ease of implementation. However, these features might not be enough of an advantage to warrant a migration from another system. EPrints roughly estimates costs at about £2000 (in staff time) for set-up and £8 (in staff time) to add a record. Godfrey (2008) describes it as "an ideal repository solution for initial implementation in a university with limited financial resources and IT support".

4.0 Digital Commons

4.1. Overview

Digital Commons is a hosted IR platform licensed by Berkeley Electronic Press (Bepress) that officially launched in 2004.
External hosting offloads much of the technical work involved in maintaining self-hosted repository infrastructure. Subscribers sign an annual license, the cost of which scales according to the size of the institution. As of July 2013, ROAR has recorded 176 implementations. A demo site is available.

4.2. Installation / Administration

Setting up the repository is fully handled by Bepress. Upgrades are performed quarterly with no downtime. Institutions have access to an administrative back-end that allows configuration of workflow settings and user privileges. Workflows are flexible, robust and customizable. According to Bepress' comments in the University at Albany (2012) comparison, "major reconstruction" of the "HTML templates" is not possible, but the repository can be customized to match an institution's branding and desired "look and feel".

4.2.1. Metadata

Digital Commons supports Qualified Dublin Core. METS, MARC and PREMIS are not supported, though non-DC elements are supported in the interface.

4.2.2. Interoperability

Digital Commons is a registered OAI Data Provider (i.e., it "exposes metadata to the world using the OAI-PMH protocol"), but not a Service Provider (which "uses the metadata harvested via the OAI-PMH as a basis for building value-added services") (Van de Sompel, Nelson, Lagoze, & Warner, 2002). If migration becomes necessary, content and metadata can be exported via OAI harvesting, but this requires advanced programming skills, and at least one institution has encountered difficulty. Digital Commons supports Unicode metadata, so submissions in non-Western languages are possible. There is no SWORD support or RoMEO integration.

4.3. Content Management

4.3.1. Embargoes, Versioning and Preservation

It is possible to control access and set embargo periods for ETDs (electronic theses and dissertations), either by specific dates or within a date range. It is unclear whether this information is recorded in the metadata.
Access control can be configured at the object or collection level. On his personal blog, Neil Godfrey, the Research Data Management Coordinator at Charles Darwin University, noted that Digital Commons is a "presentation repository", not a "preservation repository" (2008). Further, Godfrey states that "a preservation repository, unlike Digital Commons, will record and preserve authentication, versioning, rights, structural and descriptive metadata. In Digital Commons such data will not be preserved for migration/exit strategy purposes to a preservation repository." The blog entry is an "abbreviated and highly edited" version of an unpublished report for an unnamed institution. Another unnamed institution has decided to use Archivematica as a preservation archive for material, in parallel with (not integrated with) Digital Commons.

4.3.2. Statistics

Comprehensive statistics are available. Bepress also describes the advanced filtering technology it uses to obtain accurate download counts, which is designed "not to count downloads triggered by internet robots, automated processes, crawlers, and spam-bots (RACS)".

4.3.3. File Formats and Batch Importing

Digital Commons will import any file format. A tutorial on batch deposits has been published by Witt and Newton (2008). Large datasets are problematic because the software is fully Web-based. Some institutions prefer to simply upload metadata for the research data and point to a different file server (University at Albany, 2012).

4.3.4. User Interface

The user interface is clean and user-friendly, but not as easily customizable as those of Fedora-based frameworks or EPrints (University at Albany, 2012). It is straightforward for users to submit and manage files, receive email alerts and RSS feeds, and receive monthly email reports of activity and downloads for their submissions.

4.3.5. Search

Digital Commons uses a built-in Lucene-based search engine that supports full-text indexing. Any field can be searched, with the usual sorting and Boolean support. Cross-institutional searching is a unique feature that provides a single discovery portal for content from all institutions that use Digital Commons.

4.4. Support

Another of Digital Commons' strengths is the support offered by Bepress, which includes unlimited training and phone/email support. Two engineers are on call 24/7, and Bepress works with institutions to add requested features to the software.

4.5. Summary

With an attractive and user-friendly interface, reduced technical responsibilities and comprehensive support, Digital Commons is an appealing choice. The built-in peer-reviewed journal publishing system can also create further value. As with hosted DSpace options, however, the fact that Digital Commons does not yet provide Canada-based hosting could be problematic. Further, although the Bepress website discusses how bitstreams are preserved, there is a lack of information about preserving metadata and audit trails.

5.0 Islandora and Hydra

5.1. Overview

Fedora Commons is a modular digital asset management architecture originally developed by researchers at Cornell University and the University of Virginia Library in 2003. It is a framework with no built-in functionality for management, indexing, discovery and delivery of items. Instead, it is designed for a high degree of extensibility, making it possible for developers to implement virtually any feature and to integrate third-party software into the framework. This flexibility comes at the cost of ease of implementation. Fedora 4 is under development and is currently in Alpha 1 (released July 11, 2013).

Islandora, originally developed by UPEI (the University of Prince Edward Island) in 2009, is a "best-practices" Fedora-based software stack that uses Drupal as a front-end. The key advantage of Islandora is that it removes many of the largest barriers often encountered by institutions interested in setting up a Fedora-powered IR. The project has also released a number of highly configurable modules, both tools and "solution packs", which support a wide range of tasks. While ROAR does not track Islandora implementations, the Islandora Installations map records 60.

Another Fedora-based solution worth mentioning is Hydra. Instead of a PHP-based front-end like Drupal, Hydra uses the Ruby on Rails web application framework and is less of a "turn-key" solution than Islandora. There are currently fourteen institutional partners committed to supporting the project.

5.2. Installation / Administration

The Islandora installation is straightforward and simply requires a server with Fedora Commons and Drupal installed. Website administration and configuration are performed through the Drupal front-end. A full-featured Islandora sandbox is available for exploring the range of configuration options.

5.2.1. Metadata

Islandora preserves Fedora's strong support for descriptive and administrative metadata (such as audit streams). The basic metadata is Dublin Core XML, though any format is possible (such as MODS or Qualified Dublin Core). Each solution pack allows the creation of custom metadata forms using the XML Forms module package.

5.2.2. Interoperability

Interoperability is facilitated mainly by the Fedora back-end, which exposes a REST API. Fedora 4 is geared towards linked open data and has experimental support for Content Management Interoperability Services (CMIS). On the Islandora side, the Islandora OAI Module provides visibility via OAI-PMH. SWORD support is not yet available.
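
To make "visibility via OAI-PMH" concrete, the sketch below shows what a harvester does on the other side of the protocol. It is written against the OAI-PMH 2.0 protocol rather than any Islandora-specific API, so the same loop would apply to the DSpace, EPrints and Digital Commons data providers discussed above (including OAI-based export from Digital Commons). It is a minimal sketch assuming the `oai_dc` metadata prefix; `fetch` is a hypothetical, caller-supplied function standing in for real HTTP code, so no particular client library is implied.

```python
from typing import Callable, Dict, Iterator, List, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records(
    base_url: str, fetch: Callable[[str], str]
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (identifier, titles) for each live record returned by ListRecords.

    `fetch` (hypothetical) takes a request URL and returns the response body,
    so the transport is left entirely to the caller.
    """
    params: Dict[str, str] = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        for record in root.iterfind(".//oai:ListRecords/oai:record", NS):
            header = record.find("oai:header", NS)
            if header.get("status") == "deleted":
                continue  # deleted records carry a header but no metadata
            identifier = header.findtext("oai:identifier", default="", namespaces=NS)
            titles = [t.text or "" for t in record.iterfind(".//dc:title", NS)]
            yield identifier, titles
        token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
        if not token:
            break  # an empty or missing resumptionToken ends the list
        # resumptionToken is an exclusive argument: only the verb accompanies it
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Following the `resumptionToken` until it comes back empty is what lets a harvester page through a large repository; a production harvester would also need to handle OAI-PMH error responses, retries and date-based selective harvesting.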

The Islandora Scholar Module, which is still in the early stages of development and not yet an official part of Islandora, allows RoMEO querying via the SHERPA/RoMEO API. When a result is found, a tab appears on the item management page displaying the relevant policies. There is no projected integration date, although many of the module's features are already implemented in IslandScholar, UPEI's repository.

5.3. Content Management

5.3.1. Embargoes, Versioning and Preservation

Embargo support is available through the Islandora Scholar Module. Audit trails and item/metadata versioning are maintained through Fedora. For preservation purposes, the University of Saskatchewan Archives is currently exploring integration between Archivematica and Islandora. Hydra integration with Archivematica is on the roadmap for Archivematica 1.1, with sponsorship from Yale.

5.3.2. Statistics

While there is a Google Analytics Module that tracks bitstream downloads and item view counts, there is not yet an advanced reporting feature. This is a planned component of the Islandora Scholar Module.

5.3.3. File Formats and Batch Importing

Like Fedora, Islandora can ingest any file format. Batch ingest is possible through the Islandora Batch Importer module, which ingests zipped archives containing content files and XML metadata. Islandora also has tailored modules for film archiving and strong support for digital humanities materials: annotations, books, large images, and so on. It can also import objects via DOI and PMID.

Hydra can also ingest any type of file. While batch ingest is possible, there is currently no common workflow, though Duke University is in the process of developing a tool. The College of Charleston reported using OpenWMS, which facilitated batch ingest from tab-delimited text files.

5.3.4. User Interface

The Islandora UI is easy to customize and simply requires some familiarity with Drupal.
It also integrates well with two responsive, mobile-friendly themes (AdaptiveTheme and Aether). Audio and video streaming is available through the respective Audio and Video Solution Packs. Users can bookmark items, and it is easy for them to submit files.

5.3.5. Search

Islandora's search functionality is provided by installing Solr and GSearch, as well as the Islandora Solr Search Module. The documentation includes a section on customization. Facets are derived from MODS elements. Hydra uses Blacklight, which is based on Apache Solr.

5.4. Support

The Islandora website has a Confluence-based documentation wiki. There are also two active Google Groups (islandora and islandora-dev) and a JIRA-based issue-tracking system. Mark Leggott is the founder of discoverygarden inc. (DGI), which provides commercial support services for the Islandora community, including scoping, development, quality assurance, training, hosting and support. Implementing some of the IslandScholar features, for instance, might require DGI support. Costs are not disclosed on the website.

The Hydra Project uses GitHub and has an active Google Group (hydra-tech). HydraCamps are held at least once a year in North America.

5.5. Summary

In recent years, Fedora 3 has seen a declining number of developers and commits, but the ongoing development of Fedora 4 has attracted renewed interest from the community (Shin & Woods, 2013). The power of Fedora makes both Hydra and Islandora exciting new entries in the IR software landscape. Paired with Archivematica, either of these platforms could provide a single solution to the storage, display and preservation needs of a broad spectrum of digital collections.
The main drawback is that because both are still relative newcomers, they have not been as thoroughly tested by institutions, at least in comparison to DSpace.

6.0 Other Features

6.1. SWORD

Nine use cases for SWORD are described in a recent D-Lib article by Lewis, de Castro and Jones (2012). They are:

• Publisher to Repository
• Research Information System to Repository
• Desktop to Repository
• Repository to Repository
• Specialised Deposit User Interface to Repository
• Conference Submission System to Repository
• Laboratory Equipment to Repository
• Repository Bulk Ingest
• Collaborative Authoring

6.2. Publishing

One of the unique features of Digital Commons is that it provides a platform for publishing open access journals and conference proceedings. The standard subscription includes five journals, and additional journals can be set up for a fee of $1,500 each. The peer-review workflow and publishing system is more user-friendly than Open Journal Systems, a popular open-source software package that provides similar functionality.

6.3. Social Features

Researcher pages create a platform for researchers to consolidate their archived work. Further social features, like informal tagging, comments, discussion forums and shared bookmarks, are also useful. Digital Commons provides all of these. Islandora has some social media tools, and automatically generated Scholar Profiles are implemented in UPEI's IslandScholar repository. The MePrints plugin for EPrints also adds a user profile system for promoting "work and identity".

7.0 Conclusion

IR needs are currently well-defined.
With major new releases changing the landscape, the next steps in IR development involve attracting more participation from research communities, providing an easy platform for researchers to aggregate and disseminate their work, supporting large research datasets and built-in peer review, strengthening preservation support and increasing interoperability.

8.0 References

Beazley, M.R. (2010). EPrints institutional repository software: A review. Partnership. Retrieved from
Callan, P.A., & Gregson, M. (2012, July). Implementing an enhanced repository statistics system for QUT ePrints. Poster presented at the annual meeting of the International Conference on Open Repositories, Edinburgh, UK. Retrieved from
Davis, R.M., & Subirats-Coll, I. (2011, June). Changing platforms: Parallel case studies of repository platform migration projects. Presented at the annual meeting of the International Conference on Open Repositories, Austin, US. Retrieved from
DSpace Futures. (2013). Retrieved from
Godfrey, N. (2008). Informal comparison of some institutional repository solutions. Metalogger. Retrieved from
Lewis, S., de Castro, P., & Jones, R. (2012). SWORD: Facilitating deposit scenarios. D-Lib Magazine, 18(1/2). Retrieved from
Neugebauer, T., & Han, B. (2012). Batch ingesting into EPrints digital repository software. Information Technology and Libraries, 31(1). Retrieved from
Office of the Information and Privacy Commissioner for BC. (2012). Cloud computing guidelines for public bodies. Retrieved from
Open Society Institute. (2004). A guide to institutional repository software (3rd ed.). Retrieved from
Paradigm Project. (2008, January 2). Comparing repository software for preserving personal digital archives. Retrieved from
Shin, E., & Woods, A. (2013). Building the future of Fedora. Retrieved from
Singh, V. (2009). A comparison of open source search engines. Vik's Blog. Retrieved from
Tansley, Smith, & Walker. (2005). The DSpace open source digital asset management system: Challenges and opportunities. Paper presented at ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries. Retrieved from
University at Albany. (2012, September 5). Comparison spreadsheet. Retrieved from
Van de Sompel, H., Nelson, M.L., Lagoze, C., & Warner, S. (2002). The Open Archives Initiative Protocol for Metadata Harvesting. Retrieved from
Witt, M., & Newton, M.P. (2008). Preparing batch deposits for Digital Commons repositories. Retrieved from
Younglove, A. (2012). Other options. Retrieved from

