DataStage is a flexible data storage system that provides controlled access, secure backup, and the ability to transfer selected files to a more permanent archiving facility. Designed for research groups, the system appears as a mapped drive on the end-user’s computer.
Oxford University Bodleian Libraries, as part of the wider DataFlow project
Licensing and cost
Free – open source MIT license
DataStage version 0.3 was released in May 2012. Although the DataFlow project ends in May 2012, the Bodleian Libraries have decided to trial the software for their own use and will support its development until at least May 2013.
Platform and interoperability
DataStage is designed for the Ubuntu Linux 11.10 (Oneiric Ocelot) operating system, and its virtual machines work with VMware Fusion 4.x. Although it is intended to integrate with DataBank, the software offers an API that can package datasets for submission to any SWORD-2-compliant repository. End-users can connect to DataStage through a web interface or as a mapped drive on Mac, Linux or Windows machines.
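To illustrate what a SWORD-2 submission involves, the following Python sketch builds (but does not send) a binary deposit request using only the standard library. It is not DataStage's own code: the collection URL, the credentials and the use of the SimpleZip packaging identifier from the SWORD 2 specification are all assumptions, and DataStage's BagIt-based packaging may well be declared differently.

```python
import base64
import urllib.request

# Hypothetical values for illustration only; neither comes from DataStage.
COLLECTION_URL = "https://repository.example.org/sword2/collection/demo"
# Packaging URI defined in the SWORD 2 spec; DataStage may use another one.
PACKAGING = "http://purl.org/net/sword/package/SimpleZip"


def build_deposit_request(zip_bytes: bytes, filename: str,
                          user: str, password: str) -> urllib.request.Request:
    """Build (without sending) a SWORD 2 binary deposit POST request."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/zip",
        # SWORD 2 names the deposited file via Content-Disposition.
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": PACKAGING,
        # "false" signals that the deposit is complete.
        "In-Progress": "false",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(COLLECTION_URL, data=zip_bytes,
                                  headers=headers, method="POST")


if __name__ == "__main__":
    req = build_deposit_request(b"PK\x03\x04", "dataset.zip", "alice", "secret")
    print(req.get_method(), req.get_full_url())
```

Actually sending the request would mean passing it to urllib.request.urlopen; a SWORD 2 server would normally answer a successful deposit with a deposit receipt.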
The software offers three levels of password-controlled access: a "private" area accessible only to the file owner and the group leader, a "shared" area giving the group read-only access, and a "collaborative" area giving the group read and write access. The administrator can invite outside collaborators into the group and set their level of access. Users can also access and annotate files through the web interface. DataStage can be deployed on a local server or on an institutional or commercial cloud, and users can dynamically invoke additional cloud storage as required. Users can also integrate the system into existing backup procedures. The repository interface allows researchers to push selected files to a more permanent archive facility. While users can add free-text metadata via the web interface, DataStage also automatically captures a number of general file attributes: date uploaded, file name, last modified date, type, owner, location and size.
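The three access levels can be summarised as a small permission check. This Python sketch is a hypothetical model, not DataStage's implementation: the names Area and can_access are invented for illustration, and it assumes that owners can always read and write their own files, that the group leader may read but not modify private files, and that invited outside collaborators are simply added to the group's member set.

```python
from __future__ import annotations

from enum import Enum


class Area(Enum):
    PRIVATE = "private"
    SHARED = "shared"
    COLLABORATIVE = "collaborative"


def can_access(area: Area, user: str, owner: str, group_leader: str,
               members: set[str], write: bool) -> bool:
    """Hypothetical check: may `user` read (write=False) or write a file?"""
    if user == owner:
        return True  # assumption: owners have full access in every area
    if area is Area.PRIVATE:
        # Private: only the owner and the group leader; assume leader is read-only.
        return user == group_leader and not write
    if user not in members and user != group_leader:
        return False  # outsiders must first be invited into the group
    if area is Area.SHARED:
        return not write  # shared: read-only for the group
    return True  # collaborative: read and write for the group
```

For example, with members = {"bob", "carol"}, bob can read but not modify a shared file owned by alice, while carol can edit a collaborative one.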
Documentation and user support
Documentation is available in the form of an Information for Test Users page and the DataStage documentation wiki, which is very much a work in progress but does offer information on installation and use. The project has an active mailing list and links to a JIRA issue tracker. Installation instructions are included in a README file that comes zipped with the installation package. The project is creating video walkthroughs covering installation, configuration and use of DataStage, to be made available from the website by the end of May 2012.
End-users interact with the system either through a web interface or as a mapped drive on their computer, which fits into their operating system's existing file navigation.
Installation and configuration benefit greatly from knowledge of system administration and of the Linux command line. The walkthrough videos should make it possible to get DataStage running without such expertise, but novice users may not be able to get the full functionality and customisability out of the system.
Metadata automatically gathered by the system is in RDF format. When transferring files to a permanent archive, which must be SWORD-2 compliant, the system packages them according to the BagIt specification.
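To show what BagIt packaging involves, here is a minimal Python sketch that turns a directory into a bag following the layout described in the BagIt specification (RFC 8493): a bagit.txt declaration, the payload under data/, and a manifest-sha256.txt file listing a checksum for every payload file. The function make_bag is hypothetical; DataStage's own packaging may use an earlier BagIt version, a different checksum algorithm, or additional tag files such as bag-info.txt.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into a minimal BagIt bag at bag_dir (hypothetical helper)."""
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # payload lives under data/
    # Bag declaration required by the spec.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8")
    # Payload manifest: "<checksum>  <path relative to the bag root>".
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        rel = path.relative_to(bag_dir).as_posix()  # e.g. "data/results.csv"
        lines.append(f"{digest}  {rel}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "dataset"
        src.mkdir()
        (src / "results.csv").write_text("x,y\n1,2\n")
        bag = make_bag(src, Path(tmp) / "bag")
        print((bag / "manifest-sha256.txt").read_text())
```

The receiving archive can recompute each checksum to confirm that the payload arrived intact, which is the main reason for packaging transfers as bags.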
Influence and take-up
DataStage is used at the Oxford Bodleian Libraries; information about wider use is unavailable.