Difference between revisions of "Workflow:LAC Pre-Ingest Workflow"
Prwheatley (talk | contribs) |
Revision as of 19:29, 29 April 2021
Workflow Description
- Review Digital Transfer Assessment Form (DTAF) and provide strategic advice to acquiring staff on potential challenges for pre-ingest, archival processing, and long-term digital preservation
- Transfer the content
- Ensure that the content has a persistent identifier (e.g., Registration Control Number; barcode for containers where physical storage media are stored)
- Confirm security classification of records and infrastructure that is required for processing information at this level of classification
- Create a processing workspace on the Pre-ingest server (or classified infrastructure). Note: The Pre-ingest server is a server location where processing workspaces are organized by a standardized directory structure
- Ensure that all appropriate metadata and metadata generating templates are copied to the processing workspace (MET subfolder)
- Complete a physical carrier inventory documenting all physical storage media involved in the transfer. Media must be numbered sequentially (e.g. "01" or "001").
- Review and complete Digital Processing Checklist
- Write protect physical media when possible
- Insert the media into a drive or attach it to a computer with anti-virus software (the LAC network will automatically scan for viruses; if a virus is detected, do not copy the data, and document the virus in the Pre-ingest Report)
- Review file format/directory structure of physical media at a high level
- If interactive CD/DVD content or authored AV content is detected, do not perform pre-ingest for the physical carriers containing that content. Follow workflow for authored AV content. Document this in the Physical Carrier Inventory spreadsheet
- Create sub-folders for physical media on the Pre-ingest server. If a physical carrier is blank or unreadable, do not create a subfolder for it
- Copy the content to the Pre-ingest server using SafeCopy. If issues occur, troubleshoot via SafeCopy or other checksum software like Fsum or MD5summer
- For any content that is unreadable, send to Digital Preservation for extraction. Record this in Physical Carrier Inventory
- Use TreeSize Pro to create a digital object listing of all digital objects successfully copied to LAC infrastructure
- Run DROID on the entire registration and create a DROID export (*.CSV) for all content
- Start populating the Pre-Ingest Report. Record Pre-Ingest volume, number of files, and number of folders
- Weed transitory objects (temporary files, system files, configuration files, Thumbs.db, program files) that are not required to render the information of long-term value
- In the Pre-Ingest Report, record if viruses were detected as part of the virus scan
- If authored audio or video files are present, add a note that the content must be processed according to a separate AV workflow.
- Non-authored AV material (i.e. unstructured AV material that has been copied to the Pre-Ingest server) can be processed in the same manner as other digital records.
- Use TreeSize Pro to identify:
  - Duplicate records within the corpus. Note the existence of duplicate records so they can be addressed as part of archival processing.
  - Container files. Extract any container files identified, using software like WinZip.
  - Email file formats. Identified email files may require a standalone workflow for processing.
  - Data file formats.
  - Database file formats.
  - Website file formats. Consult with the Web-Archives and Social Media Program.
  - Zero-byte files. Use TreeSize Pro to move zero-byte files into the disposition folder.
  - Long file paths.
  - Empty folders.
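The workflow relies on TreeSize Pro for the checks above. For readers without that tool, the following is a minimal Python sketch of the same kinds of checks, not LAC's actual procedure. The function name `survey_workspace`, the report keys, and the 260-character path threshold (the classic Windows limit) are assumptions made for illustration. Duplicates are detected here by SHA-256 checksum, which may differ from how TreeSize Pro identifies them.

```python
import hashlib
import os
from collections import defaultdict

# Assumption: 260 characters, the classic Windows MAX_PATH limit.
MAX_PATH_LENGTH = 260


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def survey_workspace(root, max_path_length=MAX_PATH_LENGTH):
    """Walk `root` and collect counts and flags for a pre-ingest review."""
    report = {
        "file_count": 0,
        "folder_count": 0,   # subfolders below root; root itself is not counted
        "total_bytes": 0,
        "zero_byte_files": [],
        "empty_folders": [],  # folders with no files and no subfolders
        "long_paths": [],
        "duplicates": [],     # groups of paths with identical content
    }
    by_hash = defaultdict(list)
    for current, folders, files in os.walk(root):
        report["folder_count"] += len(folders)
        if current != root and not folders and not files:
            report["empty_folders"].append(current)
        for name in files:
            path = os.path.join(current, name)
            size = os.path.getsize(path)
            report["file_count"] += 1
            report["total_bytes"] += size
            if size == 0:
                # Zero-byte files are listed separately, not treated as duplicates.
                report["zero_byte_files"].append(path)
            else:
                by_hash[sha256_of(path)].append(path)
            if len(os.path.abspath(path)) > max_path_length:
                report["long_paths"].append(path)
    report["duplicates"] = [
        sorted(paths) for paths in by_hash.values() if len(paths) > 1
    ]
    return report
```

The sketch only reports findings; it does not move or delete anything, so moving zero-byte files to the disposition folder remains a deliberate manual step. The `file_count`, `folder_count`, and `total_bytes` values correspond to the figures recorded in the Pre-Ingest Report.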
Purpose, Context and Content
Pre-ingest is the technological review of digital records transferred to LAC. It involves technical appraisal using multiple software tools to automate the process. The goal of pre-ingest is to aid in creating a SIP that conforms to LAC’s preservation policies and that LAC has a reasonable chance of preserving in its repository and making accessible for the long term.
There are two major tasks for pre-ingest:
- Weed any digital records that should not have been transferred in the first place (i.e. configuration, development, temporary, or software files).
- Identify file format or other issues that need addressing prior to or during appraisal/selection/description by archival staff.
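As an illustration of the first task, the sketch below lists files whose names match patterns typical of transitory objects so that a reviewer can decide what to weed. The pattern list and the function name `find_transitory` are assumptions for illustration only. They are not LAC's weeding criteria, which the workflow leaves to the reviewer's judgement.

```python
import fnmatch
import os

# Hypothetical examples of transitory file name patterns (lowercase).
TRANSITORY_PATTERNS = ["thumbs.db", "desktop.ini", ".ds_store", "*.tmp", "~$*"]


def find_transitory(root, patterns=TRANSITORY_PATTERNS):
    """Return sorted paths under `root` whose file names match any pattern."""
    candidates = []
    for current, _folders, files in os.walk(root):
        for name in files:
            lowered = name.lower()  # match case-insensitively
            if any(fnmatch.fnmatchcase(lowered, p) for p in patterns):
                candidates.append(os.path.join(current, name))
    return sorted(candidates)
```

The function only returns candidates and leaves the weeding decision to the reviewer, since a name match alone does not prove that a file lacks long-term value.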