Workflow:Workflow for ingesting digitized books into a digital archive
[[File:workflow.png|Upload file (Toolbox on left) and add a workflow image here or remove]]
[[Category:COW Workflows]]

{{Infobox_COW
|name=Ingest of digitized books
|status=Testphase
|tools=[[7-Zip]]<br />[http://www.docuteam.ch/en/products/it-for-archives/software/ docuteam feeder]<br />[https://en.wikipedia.org/wiki/CURL cURL]<br />[https://en.wikipedia.org/wiki/Saxon_XSLT Saxon]<br />[[DROID]]<br />[[FITS_(File_Information_Tool_Set)]]<br />[https://en.wikipedia.org/wiki/Clam_AntiVirus Clam AV]<br />[[Fedora_Commons]]
|organisation=[http://www.unibe.ch/university/services/university_library/ub/index_eng.html Universitätsbibliothek Bern]
}}

==Workflow Description==
<div class="toccolours mw-collapsible mw-collapsed" data-expandtext="Show Diagram" data-collapsetext="Hide Diagram" >
</div>
# The data provider supplies its content as input to the transfer tool (currently in development).
# The transfer tool creates a ZIP container with the content and calculates a checksum of the container.
# The ZIP container and the checksum are bundled (in another ZIP container or a plain folder) and together form the SIP.
# The transfer tool moves the SIP to a registered, data-provider-specific hotfolder that is connected to the ingest server.
# As soon as the complete SIP has been transferred to the ingest server, a trigger is raised and the ingest workflow starts.
# The SIP is unpacked.
# The ZIP container holding the content is validated against the provided checksum. If this fixity check fails, the data provider is asked to re-ingest its data.
# The content and its structure are validated against the submission agreement signed with the data provider (this step is currently in development).
# Descriptive metadata is fetched from the library's OPAC via an OAI-PMH interface, using a unique ID encoded in the content filename.
# Each content file is scanned for viruses and malware by Clam AV.
# A PID is fetched from the repository for each content object and for the whole information entity (the book).
# An AIP is generated for each content object and for the whole information entity. This process includes generating RDF triples that record the relationships between the objects.
# The AIPs are ingested into the repository.
# The data provider is notified that the ingest finished successfully.
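The packaging (steps 2 and 3) and the fixity check (step 7) can be sketched in Python as follows. This is a minimal illustration, not the transfer tool itself: the checksum algorithm (SHA-256), the plain-folder SIP layout and the file names <code>content.zip</code> and <code>content.zip.sha256</code> are assumptions, since the workflow does not specify them.

```python
import hashlib
import zipfile
from pathlib import Path

# Hypothetical file names: the workflow does not specify how the SIP members are named.
CONTAINER_NAME = "content.zip"
CHECKSUM_NAME = "content.zip.sha256"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_sip(content_dir: Path, sip_dir: Path) -> Path:
    """Zip the content and bundle the container with its checksum in a plain folder.

    sip_dir must lie outside content_dir, otherwise the container being
    written would be picked up as content.
    """
    sip_dir.mkdir(parents=True, exist_ok=True)
    container = sip_dir / CONTAINER_NAME
    with zipfile.ZipFile(container, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(content_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the content root.
                zf.write(path, arcname=path.relative_to(content_dir).as_posix())
    (sip_dir / CHECKSUM_NAME).write_text(sha256_of(container) + "\n", encoding="ascii")
    return sip_dir


def verify_sip(sip_dir: Path) -> bool:
    """Fixity check on the ingest side: recompute the digest and compare."""
    expected = (sip_dir / CHECKSUM_NAME).read_text(encoding="ascii").split()[0]
    return sha256_of(sip_dir / CONTAINER_NAME) == expected
```

On the ingest server, a <code>False</code> result from <code>verify_sip</code> corresponds to a failed fixity check, after which the data provider would be asked to re-ingest its data.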
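The metadata lookup in step 9 can be sketched as an OAI-PMH GetRecord request. The <code>verb</code>, <code>identifier</code> and <code>metadataPrefix</code> parameters and the <code>oai_dc</code> format are defined by OAI-PMH 2.0; the filename convention (record ID before the first underscore or dot), the OPAC base URL and the OAI identifier prefix are hypothetical, since the workflow does not describe them.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical convention: the record ID is everything before the first "_" or ".".
ID_PATTERN = re.compile(r"^(?P<id>[^_.]+)")
# Hypothetical prefix turning a local record ID into an OAI identifier.
OAI_ID_PREFIX = "oai:opac.example.org:"


def oai_identifier_from_filename(filename: str) -> str:
    match = ID_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"no record ID in filename {filename!r}")
    return OAI_ID_PREFIX + match.group("id")


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL."""
    query = urlencode({"verb": "GetRecord", "identifier": identifier,
                       "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def dc_fields(response_xml: str) -> dict[str, list[str]]:
    """Collect Dublin Core elements from a GetRecord response, keyed by element name."""
    root = ET.fromstring(response_xml)
    error = root.find("oai:error", NS)
    if error is not None:
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    fields: dict[str, list[str]] = {}
    metadata = root.find(".//oai:metadata", NS)
    if metadata is None:
        return fields
    for elem in metadata.iter():
        if elem.tag.startswith(DC) and elem.text and elem.text.strip():
            fields.setdefault(elem.tag[len(DC):], []).append(elem.text.strip())
    return fields


def fetch_dc(base_url: str, filename: str, timeout: float = 30.0) -> dict[str, list[str]]:
    """Fetch descriptive metadata for a content file (requires network access)."""
    url = get_record_url(base_url, oai_identifier_from_filename(filename))
    with urlopen(url, timeout=timeout) as response:
        # OAI-PMH responses are required to be UTF-8 encoded.
        return dc_fields(response.read().decode("utf-8"))
```

Parsing is kept separate from the HTTP call so that it can be checked against a stored sample response without contacting the OPAC.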
The tools and their function in the workflow:
* [https://en.wikipedia.org/wiki/Clam_AntiVirus Clam AV] - Virus check
* [[Fedora_Commons]] - Digital Repository

==Purpose, Context and Content==
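The relationships mentioned in the AIP generation step can be expressed as RDF triples before the AIPs are ingested into the repository. The sketch below writes them in N-Triples syntax; the Dublin Core Terms predicates <code>hasPart</code> and <code>isPartOf</code> and the base URI for repository objects are assumptions chosen for illustration, not necessarily the predicates this workflow or Fedora Commons actually uses.

```python
from urllib.parse import quote

DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical base URI for objects in the repository.
BASE = "https://repository.example.org/objects/"


def object_uri(pid: str) -> str:
    """Turn a repository PID into an N-Triples IRI reference (":" is percent-encoded)."""
    return "<" + BASE + quote(pid, safe="") + ">"


def relationship_triples(book_pid: str, object_pids: list[str]) -> list[str]:
    """Link the information entity (the book) and its content objects in both directions."""
    book = object_uri(book_pid)
    triples = []
    for pid in object_pids:
        obj = object_uri(pid)
        triples.append(f"{book} <{DCTERMS}hasPart> {obj} .")
        triples.append(f"{obj} <{DCTERMS}isPartOf> {book} .")
    return triples


def to_ntriples(triples: list[str]) -> str:
    """Serialize one triple per line, as N-Triples requires."""
    return "\n".join(triples) + "\n"
```

Writing both directions makes each AIP self-describing: the book's AIP lists its parts, and each content object's AIP points back to the book.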