Workflow:LAC Pre-Ingest Workflow
- Review Digital Transfer Assessment Form (DTAF) and provide strategic advice to acquiring staff on potential challenges for pre-ingest, archival processing, and long-term digital preservation
- Transfer the content
- Ensure that the content has a persistent identifier (e.g., Registration Control Number; barcode for containers where physical storage media are stored)
- Confirm security classification of records and infrastructure that is required for processing information at this level of classification
- Create a processing workspace on the Pre-ingest server (or classified infrastructure). Note: The Pre-ingest server is a server location where processing workspaces are organized by a standardized directory structure
- Ensure that all appropriate metadata and metadata generating templates are copied to the processing workspace (MET subfolder)
- Complete a physical carrier inventory documenting all physical storage media involved in the transfer. Media must be numbered sequentially (e.g. "01" or "001").
- Review and complete Digital Processing Checklist
- Write protect physical media when possible
- Insert the media into a drive or attach it to a computer, and run anti-virus software (the LAC network will automatically scan for viruses; if a virus is detected, do not copy the data, and document the virus in the Pre-ingest Report)
- Review file format/directory structure of physical media at a high level
- If interactive CD/DVD content or authored AV content is detected, do not perform pre-ingest for the physical carriers containing that content. Follow workflow for authored AV content. Document this in the Physical Carrier Inventory spreadsheet
- Create sub-folders for physical media on the Pre-ingest server. If a physical carrier is blank or unreadable, do not create a subfolder for it
- Copy the content to the Pre-ingest server using SafeCopy. If issues occur, troubleshoot via SafeCopy or other checksum software like Fsum or MD5summer
- Send any unreadable content to Digital Preservation for extraction. Record this in the Physical Carrier Inventory
- Use TreeSize Pro to create a digital object listing of all digital objects successfully copied to LAC infrastructure
- Run DROID on the entire registration and create a DROID export (*.CSV) for all content
- Start populating the Pre-Ingest Report. Record Pre-Ingest volume, number of files, and number of folders
- Weed transitory objects (temporary files, system files, configuration files, Thumbs.db, program files) that are not required to render the information of long-term value
- In the Pre-Ingest Report, record if viruses were detected as part of the virus scan
- If authored audio or video files are present, add a note that the content must be processed according to a separate AV workflow.
- Non-authored AV material (i.e. unstructured AV material that has been copied to the Pre-Ingest server) can be processed in the same manner as other digital records.
- Use TreeSize Pro to identify:
- (sub-bullet)Duplicated records existing within the corpus. Note the existence of duplicate records to be addressed as part of archival processing.
- (sub-bullet)Container files. Extract any container files identified using software like WinZip.
- (sub-bullet)Email file formats. Identified email files may require a stand-alone workflow for processing.
- (sub-bullet)Data file formats.
- (sub-bullet)Database file formats
- (sub-bullet)Website file formats. Consult with the Web-Archives and Social Media Program.
- (sub-bullet)Zero-byte files. Use TreeSize Pro to move zero-byte files into the disposition folder.
- (sub-bullet)Long file paths.
- (sub-bullet)Empty folders.
- Triage the DROID report.
- Create a DROID report.xlsx, which can be filed in the package's metadata subfolder
- (sub-bullet) color code formats in the spreadsheet according to the following categories as needed - for example:
- (sub-sub-bullet) Green = Ok as-is
- (sub-sub-bullet) Red = format unknown or non-standard: archivist to determine how to access prior to preservation
- (sub-sub-bullet) Grey = ineligible for selection - file format preservation issue
- (sub-bullet)Refer to the Guidelines for Transferring Information Resources of Enduring Value for information on file formats already encountered during processing
- (sub-sub-bullet) these guidelines document LAC's formal and informal file format policy decisions based on feedback and discussion with LAC Subject Matter Experts
- (sub-bullet) Further research may be required if unrecognized or unknown file formats are encountered
- Identify password protected and encrypted files
- (sub-bullet)Run software to identify files that are encrypted or password protected
- (sub-bullet) create a report in CSV format that reflects the results of the analysis
- Record Pre-Ingest metrics
- (sub-bullet) record the volume, number of files, and number of folders after pre-ingest in the Pre-Ingest Report
- Follow up with archival processing staff
- (sub-bullet) email archival processing staff to report that pre-ingest is complete
- (sub-bullet) provide hyperlink to Pre-Ingest Report or indicate it is in the MET subfolder of the processing workspace
- (sub-bullet)flag anything else that is noteworthy - especially file formats requiring research by archival staff
- (sub-bullet) If feasible/required, schedule a meeting with archival processing staff to review Pre-Ingest analysis and findings
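The workflow above relies on TreeSize Pro for metrics and for spotting zero-byte files, empty folders, long paths, and transitory objects. As a rough cross-check only, the same figures could be gathered with a short script. The sketch below is not part of the LAC workflow: the function name `scan_workspace`, the list of transitory names and extensions, and the 260-character path threshold are all assumptions chosen for illustration. It only reports candidates; it does not move or delete anything.

```python
import os
from dataclasses import dataclass, field

# Hypothetical values for illustration; none of these come from LAC policy.
TRANSITORY_NAMES = {"thumbs.db", "desktop.ini", ".ds_store"}
TRANSITORY_EXTENSIONS = {".tmp", ".bak"}
MAX_PATH_LENGTH = 260  # assumed threshold for a "long" file path


@dataclass
class ScanResult:
    file_count: int = 0
    folder_count: int = 0
    total_bytes: int = 0
    zero_byte_files: list = field(default_factory=list)
    empty_folders: list = field(default_factory=list)
    long_paths: list = field(default_factory=list)
    transitory_files: list = field(default_factory=list)


def scan_workspace(root):
    """Walk root and collect metrics plus candidates for review."""
    result = ScanResult()
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath != root:
            # Count every folder below the workspace root.
            result.folder_count += 1
            if not dirnames and not filenames:
                result.empty_folders.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            result.file_count += 1
            result.total_bytes += size
            if size == 0:
                result.zero_byte_files.append(path)
            if len(os.path.abspath(path)) > MAX_PATH_LENGTH:
                result.long_paths.append(path)
            extension = os.path.splitext(name)[1].lower()
            if name.lower() in TRANSITORY_NAMES or extension in TRANSITORY_EXTENSIONS:
                result.transitory_files.append(path)
    return result
```

Running the scan once before weeding and once after would give the two sets of volume, file, and folder counts that the Pre-Ingest Report asks for.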
Purpose, Context and Content
Pre-ingest is the technical review of digital records transferred to LAC. It involves technical appraisal using multiple software tools to automate the process. The goal of pre-ingest is to aid in creating a Submission Information Package (SIP) that conforms to LAC’s preservation policies and that LAC has a reasonable chance of preserving in its repository and making accessible over the long term.
There are two major tasks for pre-ingest:
- Weed any digital records that should not have been transferred in the first place (i.e. configuration, development, temporary, and software files).
- Identify file format or other issues that need addressing prior to or during appraisal/selection/description by archival staff.
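To illustrate the second task, a DROID CSV export can be summarized by format so that unidentified files stand out for follow-up. The sketch below is a minimal example under stated assumptions: it assumes the export has columns named `FILE_PATH`, `TYPE`, `PUID`, and `FORMAT_NAME`, which should be checked against the actual export before use. The function name `triage_droid_export` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def triage_droid_export(csv_text):
    """Tally files per identified format and list files with no identification.

    Assumes columns FILE_PATH, TYPE, PUID and FORMAT_NAME exist in the export.
    """
    format_counts = Counter()
    unidentified = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("TYPE") == "Folder":
            continue  # folders carry no format identification
        puid = (row.get("PUID") or "").strip()
        if puid:
            format_counts[(puid, row.get("FORMAT_NAME") or "")] += 1
        else:
            unidentified.append(row.get("FILE_PATH") or "")
    return format_counts, unidentified


# Made-up sample rows, for illustration only.
sample = (
    "FILE_PATH,TYPE,PUID,FORMAT_NAME\n"
    "reg/01,Folder,,\n"
    "reg/01/report.pdf,File,fmt/18,Acrobat PDF 1.4 - Portable Document Format\n"
    "reg/01/minutes.pdf,File,fmt/18,Acrobat PDF 1.4 - Portable Document Format\n"
    "reg/01/data.xyz,File,,\n"
)

counts, unknown = triage_droid_export(sample)
for (puid, name), total in counts.most_common():
    print(f"{puid}\t{name}\t{total}")
print("Unidentified:", unknown)
```

Files in the unidentified list correspond to the "format unknown" category in the triage step and would be the ones flagged to archival staff for further research.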