Difference between revisions of "Workflow:Workflow for preserving research data using Archivematica, Fedora, Hydra and PURE"

From COPTR

Revision as of 12:20, 6 September 2018


Workflow for preserving research data using Archivematica, Fedora, Hydra and PURE
Status: Experimental
Tools:
  • Archivematica
  • Fedora Commons
  • Hydra
  • PURE
Input: Corpus of PDF/A files
Output: CSV with validation result and metadata
Organisation: Digital Preservation Coalition (http://dpconline.org/)


Workflow description

[Workflow diagram: UoY_RDM_workflow.jpg]
This workflow uses Archivematica, Fedora, Hydra and PURE to preserve and provide access to academic research data. The workflow includes a high level of automation.

  • PURE is the Current Research Information System at the University of York. This is where researchers enter metadata about the dataset they are depositing.
  • Once metadata is entered into PURE, library staff contact the researcher to request that the data is uploaded.
  • Upload is carried out via an online form that is part of the Research Data York application (a bespoke Hydra-based application).
  • Uploaded data goes into a directory watched by Archivematica, where it is arranged into a SIP (Submission Information Package) structure.
  • Archivematica picks up the SIP and processes it. The creation of an AIP (Archival Information Package) is fully automated.
  • The AIP is stored. A DIP (Dissemination Information Package) is not created by default.
  • Metadata about the dataset is available in the data catalogue.
  • If the dataset is requested, there is a manual approval step (within the Research Data York application), after which Archivematica automatically creates a DIP.
  • This DIP is passed to Fedora.
  • The user is notified when the data is ready.
  • Data is available for download.
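
The steps above can be read as a linear sequence of stages with two waiting points: no DIP is created until someone requests the dataset, and a request does not proceed until it is approved. The following is a minimal sketch of that ordering only. The stage names and the `advance` function are hypothetical and do not correspond to any Archivematica, Fedora, Hydra or PURE API.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stage names, in the order the workflow describes them."""
    METADATA_IN_PURE = auto()
    UPLOAD_REQUESTED = auto()
    DATA_UPLOADED = auto()
    SIP_ARRANGED = auto()
    AIP_STORED = auto()
    ACCESS_REQUESTED = auto()
    DIP_CREATED = auto()
    DIP_IN_FEDORA = auto()
    USER_NOTIFIED = auto()
    AVAILABLE_FOR_DOWNLOAD = auto()


# Enum members iterate in definition order, which gives the workflow order.
ORDER = list(Stage)


def advance(stage, access_requested=False, approved=False):
    """Return the next stage, or the same stage if the workflow is waiting."""
    if stage is Stage.AIP_STORED and not access_requested:
        return stage  # a DIP is not created by default
    if stage is Stage.ACCESS_REQUESTED and not approved:
        return stage  # manual approval step has not happened yet
    index = ORDER.index(stage)
    # The final stage has no successor, so it maps to itself.
    return ORDER[min(index + 1, len(ORDER) - 1)]
```

For example, `advance(Stage.AIP_STORED)` leaves the dataset at `AIP_STORED`, while `advance(Stage.AIP_STORED, access_requested=True)` moves it to `ACCESS_REQUESTED`.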

The workflow was created as part of the Filling the Digital Preservation Gap project (https://www.york.ac.uk/borthwick/projects/archivematica/) and is heavily based on an implementation plan included in the Phase 2 project report, which can be found on Figshare (https://dx.doi.org/10.6084/m9.figshare.2073220). A description of how the workflow was implemented as a proof of concept is included in the Phase 3 project report, which is also on Figshare (https://dx.doi.org/10.6084/m9.figshare.4040787).

Purpose, context and content

The purpose of this workflow is to preserve and disseminate research data in an automated fashion. Research data is a valuable asset produced by academic institutions and should be retained so that findings can be validated. Some of this data may have longer-term re-use potential, particularly where it cannot be replicated. At the University of York our Research Data Management policy states that research data should be retained for ten years from the date of last access. Because each access restarts the ten-year period, even datasets that are only occasionally accessed may be retained for much longer than ten years.
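
The retention rule can be illustrated with a short calculation. This is a sketch of the arithmetic only, not part of the York implementation: the function name `retention_end` is hypothetical, and the handling of 29 February is an assumption, since the policy as quoted does not cover it.

```python
from datetime import date

RETENTION_YEARS = 10  # "ten years from the date of last access"


def retention_end(last_access: date, years: int = RETENTION_YEARS) -> date:
    """Return the date on which the retention period would end."""
    try:
        return last_access.replace(year=last_access.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        # Falling back to 28 February is an assumption made here.
        return last_access.replace(year=last_access.year + years, day=28)
```

A dataset last accessed on 3 March 2017 would be retained until 3 March 2027. If it is accessed again on 1 June 2024, the end date moves to 1 June 2034, which is how the total retention period can exceed ten years.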

Evaluation/Review

This workflow has been created as a proof of concept at the University of York. It is due to move into production in May 2017.


Jenny mitcham, 16:29, 3 March 2017 (UTC)