Difference between revisions of "Workflow:Cloud-based preservation and access workflow for MXF and MPG video"

From COPTR
 
(19 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
{{Infobox COW
 
{{Infobox COW
 
|status=Production
 
|status=Production
|tools=Goobi, AWS S3, AWS Lambda, AWS MediaConvert, Wellcome Storage Service, DDS, DLCS
+
|tools=Goobi, AWS S3, AWS Lambda, AWS MediaConvert, Wellcome Storage Service, DDS, DLCS, Bag, Bagit, IIIF
|input=MFX + MP4 with jpg poster image, optional PDF transcript or MPG with jpg poster image and optional PDF transcript
+
|input=.mxf + .mp4 with .jpg poster image and optional .pdf transcript, or .mpg with .jpg poster image and optional .pdf transcript
|output=Low res MP4 with JPG poster image and optional PDF transcript and IIIF manifest
+
|output=Low res .mp4 with .jpg poster image, optional .pdf transcript, and IIIF manifest
 
|organisation=Wellcome Collection
 
|organisation=Wellcome Collection
 
|organisationurl=http://www.wellcomecollection.org
 
|organisationurl=http://www.wellcomecollection.org
Line 9: Line 9:
 
==Workflow Description==
 
 
<!-- Use the line below to add a diagram image, or delete the line if you don't want one -->
 +
==== '''MXF Video Pre-Ingest Workflow''' ====
 
[[File:Av preingest workflow 400dpi v1.png|MXF Film Pre-ingest Workflow]]<br>
 
 +
 +
The video pre-ingest workflow converts the .mxf video to .mp4 (to be used for QA and access) and moves all files from a public bucket to a private bucket.
 +
 +
# Vendor uploads Film Batch X, consisting of .mxf files and .jpg poster images, to the wellcomecollection-digitisation-transfer bucket in AWS S3
 +
# The files arriving in the bucket activate two Lambdas simultaneously
 +
# The A/V Pre-Ingest Copy Lambda copies the .mxf and .jpg files over to a different bucket, the wellcomecollection-av-digitisation bucket, which can only be accessed by Wellcome Staff
 +
# The A/V Pre-Ingest Convert Lambda sends the .mxf to AWS MediaConvert to create an .mp4
 +
# The .mp4 is delivered to the wellcomecollection-av-digitisation bucket alongside the .mxf and .jpgs. The .mp4 will be QA’d and the files remain here until ingest.
 +
 +
==== '''MXF/MP4 or MPG Video Ingest Workflow''' ====
 
[[File:Detailed film workflow 400dpi v1.png]]<br>
 
 
<!-- Describe your workflow here. Provide a list of each tool or component in your workflow if it expands on the basic list of tools and links in the infobox above, but don't duplicate for the sake of it-->
 
 +
<br>
 +
This video ingest workflow can be used for either an .mxf/.mp4 with a .jpg poster image or an .mpg with a .jpg poster image. An optional .pdf transcript can be added to either type of ingest. Both ingest types are sent to the Wellcome Storage Service for preservation and then to DDS/DLCS to be made available for access.<br>
 +
 +
# Copy over an .mxf and .mp4, or an .mpg, with the accompanying .jpg poster image and .pdf transcript (optional, as not always available) from their original bucket into a folder created in the wellcomecollection-workflow-upload bucket. The folder name should match the process title for the item in Goobi. The process title will have been created in Goobi by loading the marc.xml in the bibliographic import step prior to ingest.
 +
# The upload of the files triggers the Goobi Lambda, which queries Goobi for a process title that matches the name of the folder and sends the files to the process if a match is found
 +
# In Goobi, the user can check that the files are copied over and release the video data import step
 +
# Goobi automatically moves the files into appropriate internal folders (access, preservation, poster, transcript) in preparation for writing the METS filegroups and usage attributes
 +
# At Edit METS, the user must select the license and access status for the film to be written to the METS
 +
# Goobi continues the workflow automatically
 +
#* writing PREMIS data to the METS
 +
#* checking if the item is a single item or multiple manifestation
 +
#* creating a bag
 +
# The bag is sent to the Wellcome Storage Service, where it is verified and stored. The .mxf files are automatically lifecycled to Glacier Deep Archive. The storage service sends a callback to Goobi to confirm the bag has been stored successfully
 +
# When Goobi receives the callback, it calls the DDS API. DDS (access software) then reads the METS in storage and starts writing a IIIF manifest, while DLCS (access file creator) begins making an access copy of the high-res .mp4 or .mpg
 +
# When the IIIF manifest and access copies are ready, wellcomecollection.org/collection starts displaying the video
  
 
==Purpose, Context and Content==
 
 
<!-- Describe what your workflow is for - i.e. what it is designed to achieve, what the organisational context of the workflow is, and what content it is designed to work with -->
 
 +
This is a cloud-based, semi-automated pre-ingest workflow for .mxf videos and an ingest workflow for both .mxf and .mpg videos. The previous version of the video workflow was for .mpg videos only and ran entirely on on-premises servers.
 +
 +
The pre-ingest workflow was created to handle Wellcome's new film/video digitisation to JPEG2000/MXF format. The ingest workflow was created to handle the new .mxf format as well as a backlog of .mpgs.
 +
 +
For more information about why and how we built this workflow, please see our [https://stacks.wellcomecollection.org/audiovisual-workflows-for-digital-preservation-8c071ca39e96 Wellcome Collection Stacks blog post]. The post was written in December 2020. As of February 2021, both the pre-ingest and ingest workflows are in production.
  
 
==Evaluation/Review==
 
 
<!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate -->
 
 +
The workflows are in production and are meeting our current requirements for preservation and access. The process will be reviewed and adjusted as required in the future.
  
 
==Further Information==
 
 
<!-- Provide any further information or links to additional documentation here -->
 
 +
 +
* [https://wellcomecollection.org/pages/Wvmu3yAAAIUQ4C7F Policies and plans] 
 +
* Digitisation Strategy [[https://wellcomecollection.cdn.prismic.io/wellcomecollection/0047856d-bba9-4ab2-81b6-a270f887a8fb_WC+Digitisation+Strategy+2020-2025.pdf PDF]]
 +
* [https://wellcomecollection.org/works Library catalogue]
 +
* [https://stacks.wellcomecollection.org/ Digital Engagement blog]
 +
** [https://stacks.wellcomecollection.org/digital-preservation-at-wellcome-3f86b423047 Digital Preservation at Wellcome]
 +
** [https://stacks.wellcomecollection.org/building-wellcome-collections-new-archival-storage-service-3f68ff21927e Building Wellcome Collection’s new archival storage service] 
 +
** [https://stacks.wellcomecollection.org/how-we-store-multiple-versions-of-bagit-bags-e68499815184 How we store multiple versions of BagIt bags]
 +
** [https://stacks.wellcomecollection.org/large-things-living-in-cold-places-66cbc3603e14 Large things living in cold places] - How we use Glacier Deep Archive
 +
** [https://stacks.wellcomecollection.org/a-sprinkling-of-azure-6cef6e150fb2 A sprinkling of Azure] - How we backed up AWS to Azure
 +
** [https://stacks.wellcomecollection.org/our-approach-to-digital-verification-79da59da4ab7 Our approach to digital verification]
 +
* [https://developers.wellcomecollection.org/ Developer information]
 +
* [https://github.com/wellcomecollection GitHub site]
 +
* [https://roadmap.wellcomecollection.org/tabs/1-planned Public roadmap]
 +
  
 
<!-- Add four tildes below ("~~~~") to create an automatic signature, including your wiki username. Ensure your user page (click on your username to create it) includes an up to date contact email address so that people can contact you if they want to discuss your workflow -->
 
 +
 +
[[User:ARay|ARay]] ([[User talk:ARay|talk]]) 13:56, 28 April 2021 (UTC)
  
 
<!-- Note that your workflow will be marked with a CC3.0 licence -->
 

Latest revision as of 14:05, 28 April 2021

Cloud-based preservation and access workflow for MXF and MPG video
Status: Production
Tools: Goobi, AWS S3, AWS Lambda, AWS MediaConvert, Wellcome Storage Service, DDS, DLCS, Bag, Bagit, IIIF
Input: .mxf + .mp4 with .jpg poster image and optional .pdf transcript, or .mpg with .jpg poster image and optional .pdf transcript
Output: Low res .mp4 with .jpg poster image, optional .pdf transcript, and IIIF manifest
Organisation: Wellcome Collection

Workflow Description

MXF Video Pre-Ingest Workflow

[Figure: MXF Film Pre-ingest Workflow]

The video pre-ingest workflow converts the .mxf video to .mp4 (to be used for QA and access) and moves all files from a public bucket to a private bucket.

  1. Vendor uploads Film Batch X, consisting of .mxf files and .jpg poster images, to the wellcomecollection-digitisation-transfer bucket in AWS S3
  2. The files arriving in the bucket activate two Lambdas simultaneously
  3. The A/V Pre-Ingest Copy Lambda copies the .mxf and .jpg files over to a different bucket, the wellcomecollection-av-digitisation bucket, which can only be accessed by Wellcome Staff
  4. The A/V Pre-Ingest Convert Lambda sends the .mxf to AWS MediaConvert to create an .mp4
  5. The .mp4 is delivered to the wellcomecollection-av-digitisation bucket alongside the .mxf and .jpgs. The .mp4 will be QA’d and the files remain here until ingest.
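
The routing described in steps 2 to 5 can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of the actual Lambdas: the bucket names come from the steps above and the event layout follows the standard S3 event notification format, but the function names, the action labels, the single combined planner (the real workflow uses two separate Lambdas) and the assumption that files keep the same key in the private bucket are all hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote_plus

# Bucket names are taken from the workflow steps; everything else
# (function names, action labels, key layout) is hypothetical.
TRANSFER_BUCKET = "wellcomecollection-digitisation-transfer"
PRIVATE_BUCKET = "wellcomecollection-av-digitisation"


def uploaded_keys(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # S3 URL-encodes object keys in event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def plan_actions(event):
    """List what the copy and convert steps would do for each uploaded file."""
    actions = []
    for bucket, key in uploaded_keys(event):
        if bucket != TRANSFER_BUCKET:
            continue
        suffix = PurePosixPath(key).suffix.lower()
        if suffix in {".mxf", ".jpg"}:
            # Copy step: assumed to keep the same key in the private bucket.
            actions.append(("copy", key, f"s3://{PRIVATE_BUCKET}/{key}"))
        if suffix == ".mxf":
            # Convert step: the .mp4 is delivered alongside the .mxf.
            mp4_key = str(PurePosixPath(key).with_suffix(".mp4"))
            actions.append(("convert", key, f"s3://{PRIVATE_BUCKET}/{mp4_key}"))
    return actions


example_event = {
    "Records": [
        {"s3": {"bucket": {"name": TRANSFER_BUCKET},
                "object": {"key": "batch_x/film_001.mxf"}}},
        {"s3": {"bucket": {"name": TRANSFER_BUCKET},
                "object": {"key": "batch_x/film_001.jpg"}}},
    ]
}

for action in plan_actions(example_event):
    print(action)
```

In a real deployment the "copy" and "convert" actions would be carried out by calls to S3 and AWS MediaConvert; they are left as plain tuples here so the routing logic can be read and tested without AWS credentials.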

MXF/MP4 or MPG Video Ingest Workflow

[Figure: Detailed film workflow]

This video ingest workflow can be used for either an .mxf/.mp4 with a .jpg poster image or an .mpg with a .jpg poster image. An optional .pdf transcript can be added to either type of ingest. Both ingest types are sent to the Wellcome Storage Service for preservation and then to DDS/DLCS to be made available for access.
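
The two accepted file combinations, and the folder-name match used in the steps that follow, can be expressed as a small check. The Python sketch below is illustrative only: the function names, the exact-match rule for folder names and the example process title are assumptions, not a description of how the Goobi Lambda is implemented.

```python
from pathlib import PurePosixPath

# Required file types per ingest type, as listed in the paragraph above.
REQUIRED = {
    "mxf": {".mxf", ".mp4", ".jpg"},
    "mpg": {".mpg", ".jpg"},
}
OPTIONAL = {".pdf"}  # transcript, not always available


def ingest_type(filenames):
    """Return "mxf" or "mpg" if the files form a valid package, else None."""
    suffixes = {PurePosixPath(name).suffix.lower() for name in filenames}
    for kind, required in REQUIRED.items():
        # All required types present, and nothing outside required + optional.
        if required <= suffixes and suffixes <= (required | OPTIONAL):
            return kind
    return None


def folder_of(key):
    """Return the top-level folder of a key such as 'title/file.mxf'."""
    parts = PurePosixPath(key).parts
    return parts[0] if len(parts) > 1 else None


def matching_process(key, process_titles):
    """Return the process title matching the upload folder, or None.

    An exact string match is assumed here; the real lookup queries Goobi.
    """
    folder = folder_of(key)
    return folder if folder in process_titles else None


# Hypothetical process title and file names, for illustration only.
titles = {"example_title_001"}
upload = [
    "example_title_001/film.mxf",
    "example_title_001/film.mp4",
    "example_title_001/poster.jpg",
]
print(ingest_type(upload), matching_process(upload[0], titles))
```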

  1. Copy over an .mxf and .mp4, or an .mpg, with the accompanying .jpg poster image and .pdf transcript (optional, as not always available) from their original bucket into a folder created in the wellcomecollection-workflow-upload bucket. The folder name should match the process title for the item in Goobi. The process title will have been created in Goobi by loading the marc.xml in the bibliographic import step prior to ingest.
  2. The upload of the files triggers the Goobi Lambda, which queries Goobi for a process title that matches the name of the folder and sends the files to the process if a match is found
  3. In Goobi, the user can check that the files are copied over and release the video data import step
  4. Goobi automatically moves the files into appropriate internal folders (access, preservation, poster, transcript) in preparation for writing the METS filegroups and usage attributes
  5. At Edit METS, the user must select the license and access status for the film to be written to the METS
  6. Goobi continues the workflow automatically
    • writing PREMIS data to the METS
    • checking if the item is a single item or multiple manifestation
    • creating a bag
  7. The bag is sent to the Wellcome Storage Service, where it is verified and stored. The .mxf files are automatically lifecycled to Glacier Deep Archive. The storage service sends a callback to Goobi to confirm the bag has been stored successfully
  8. When Goobi receives the callback, it calls the DDS API. DDS (access software) then reads the METS in storage and starts writing a IIIF manifest, while DLCS (access file creator) begins making an access copy of the high-res .mp4 or .mpg
  9. When the IIIF manifest and access copies are ready, wellcomecollection.org/collection starts displaying the video
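
Steps 6 and 7 rely on a bag being created and then verified on arrival. As a rough illustration of what that involves, the Python sketch below builds a BagIt payload manifest and re-checks it. It is a sketch under stated assumptions, not Goobi's or the storage service's code: SHA-256 is assumed as the checksum algorithm, the function names are hypothetical, and the file contents are tiny in-memory placeholders, whereas real video files would be hashed in chunks from disk or S3.

```python
import hashlib

# Bag declaration (bagit.txt) as defined by the BagIt specification, RFC 8493.
BAG_DECLARATION = "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"


def payload_manifest(files):
    """Build manifest-sha256.txt content from a dict of {relative_path: bytes}."""
    lines = []
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        # Payload files live under data/ inside the bag.
        lines.append(f"{digest}  data/{path}")
    return "\n".join(lines) + "\n"


def verify_payload(files, manifest):
    """Check that every manifest entry matches the files, and vice versa."""
    seen = set()
    for line in manifest.splitlines():
        digest, bag_path = line.split(maxsplit=1)
        path = bag_path[len("data/"):]
        if path not in files:
            return False
        if hashlib.sha256(files[path]).hexdigest() != digest:
            return False
        seen.add(path)
    return seen == set(files)


# Placeholder contents and hypothetical file names, for illustration only.
package = {
    "film.mxf": b"preservation copy",
    "film.mp4": b"access copy",
    "poster.jpg": b"poster image",
}
manifest = payload_manifest(package)
print(manifest)
print(verify_payload(package, manifest))
```

A complete bag also carries tag files such as bagit.txt and, in most profiles, bag-info.txt and a tag manifest; only the payload manifest is shown here because it is the part that verification on receipt depends on.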

Purpose, Context and Content

This is a cloud-based, semi-automated pre-ingest workflow for .mxf videos and an ingest workflow for both .mxf and .mpg videos. The previous version of the video workflow was for .mpg videos only and ran entirely on on-premises servers.

The pre-ingest workflow was created to handle Wellcome's new film/video digitisation to JPEG2000/MXF format. The ingest workflow was created to handle the new .mxf format as well as a backlog of .mpgs.

For more information about why and how we built this workflow, please see our Wellcome Collection Stacks blog post. The post was written in December 2020. As of February 2021, both the pre-ingest and ingest workflows are in production.

Evaluation/Review

The workflows are in production and are meeting our current requirements for preservation and access. The process will be reviewed and adjusted as required in the future.

Further Information


ARay (talk) 13:56, 28 April 2021 (UTC)