Workflow:LAC Pre-Ingest Workflow
|tools=7-Zip, DROID (Digital Record Object Identification), Md5summer, Quick View Plus, Safecopy, TreeSize, fsum
|input=Born digital content transferred to Library and Archives Canada
|output=Pre-ingest report, DROID report
|organisation=Library and Archives Canada
}}
Latest revision as of 19:23, 21 July 2021
Workflow Description
- Ensure that content has a persistent identifier (e.g., Registration Control Number; barcode for containers where physical storage media are stored)
- Confirm security classification of records and infrastructure that is required for processing information at this level of classification
- Create a processing workspace on the Pre-ingest server (or classified infrastructure) that conforms to a standardized directory structure which segregates transferred digital objects from metadata created during processing.
- Ensure that all appropriate metadata and metadata generating templates are copied to the processing workspace (MET subfolder)
- Complete a physical carrier inventory documenting all physical storage media involved in the transfer. Media must be numbered sequentially (e.g. "01" or "001").
- Review the Digital Processing Checklist and create a record/row for the content being processed. The Digital Processing Checklist serves as a master tracking sheet that documents information related to each Pre-Ingest project, indicates each step taken in the Pre-Ingest workflow, and shows where a project sits in the Pre-Ingest workflow.
- Write protect physical media when possible
- Insert the media into a drive or attach it to a computer, and scan it with anti-virus software (the LAC network will automatically scan for viruses; if a virus is detected, do not copy the data, and document the virus in the Pre-ingest Report)
- Review file format/directory structure of physical media at a high level
- If interactive CD/DVD content or authored AV content is detected, do not perform pre-ingest for the physical carriers containing that content. Follow workflow for authored AV content. Document this in the Physical Carrier Inventory spreadsheet
- Create a sub-folder for each physical carrier in the processing workspace (on the Pre-ingest server). If a physical carrier is blank or unreadable, do not create a subfolder for it
- Copy the content to the Pre-ingest server using SafeCopy. If issues occur, troubleshoot via SafeCopy or other checksum software like Fsum or MD5summer
- For any content that is unreadable, send to Digital Preservation for extraction. Record this in Physical Carrier Inventory
- Use TreeSize Pro to create a digital object listing of all digital objects successfully copied to LAC infrastructure. TreeSize report should include fields for physical container number, file path, file name, file extension, file size, date of creation, and last modified date.
- Run DROID on the entire registration and create a DROID export (*.CSV) for all content
- Start populating the Pre-Ingest Report. Record Pre-Ingest volume, number of files, and number of folders
- Weed transitory objects (temporary files, system files, configuration files, Thumbs.db, program files) that are not required to render the information of long-term value. Weeded files should be moved to the disposition subfolder located under the OBJ sub-directory of the processing workspace. These files should not be deleted at this stage of processing.
- In the Pre-Ingest Report, record if viruses were detected as part of the virus scan
- Non-authored AV material (i.e. unstructured AV material that has been copied to the Pre-Ingest server) can be processed in the same manner as other digital records.
- Use TreeSize Pro to identify:
  - duplicate records within the corpus. Note the existence of duplicate records so they can be addressed as part of archival processing
  - container files. Extract any container files identified using software like WinZip
  - email file formats. Identified email files may require a standalone workflow for processing
  - data file formats
  - database file formats
  - website file formats. Consult with the Web-Archives and Social Media Program
  - zero-byte files. Use TreeSize Pro to move zero-byte files into the disposition folder
  - long file paths
  - empty folders
- Triage the DROID report.
- Create a DROID report.xlsx, which can be filed in the package's metadata subfolder
  - color code formats in the spreadsheet according to the following categories as needed - for example:
    - Green = OK as-is
    - Red = format unknown or non-standard: archivist to determine how to access prior to preservation
    - Grey = ineligible for selection - file format preservation issue
  - refer to the Guidelines for Transferring Information Resources of Enduring Value for information on file formats already encountered during processing
    - these guidelines document LAC's formal and informal file format policy decisions based on feedback and discussion with LAC Subject Matter Experts
  - further research may be required if unrecognized or unknown file formats are encountered
- Identify password protected and encrypted files
  - run software to identify files that are encrypted or password protected
  - create a report in CSV format that reflects the results of the analysis
- Record Pre-Ingest metrics
  - record the post-pre-ingest volume, number of files, and number of folders in the Pre-Ingest Report
- Follow up with archival processing staff. Archival processing staff to provide selection, description and physical arrangement.
  - notify archival processing staff that pre-ingest is complete
  - provide a hyperlink to the Pre-Ingest and DROID reports or indicate they are in the MET subfolder of the processing workspace
  - provide a hyperlink to the Pre-Ingest Report or indicate it is in the metadata subfolder of the processing workspace
  - flag anything else that is noteworthy - especially file formats requiring research by archival staff
  - if feasible/required, schedule a meeting with archival processing staff to review the Pre-Ingest analysis and findings
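Several of the checks in the steps above (volume, file and folder counts, zero-byte files, empty folders, long file paths, duplicate records) can be approximated with a short script. The following is a minimal sketch using only the Python standard library; it is not the TreeSize Pro procedure described in this workflow. The function name `survey_workspace`, the report keys, and the 260-character path threshold are all assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path

# Assumed threshold for "long file paths" (the classic Windows MAX_PATH value).
MAX_PATH_LENGTH = 260


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def survey_workspace(root):
    """Walk a copied transfer and collect basic pre-ingest metrics and flags."""
    root = Path(root)
    report = {
        "total_bytes": 0,
        "file_count": 0,
        "folder_count": 0,
        "zero_byte_files": [],
        "empty_folders": [],
        "long_paths": [],
        "duplicates": [],
    }
    by_hash = defaultdict(list)

    for current, dirnames, filenames in os.walk(root):
        current_path = Path(current)
        if current_path != root:
            # Count every folder below the root; flag those with no contents.
            report["folder_count"] += 1
            if not dirnames and not filenames:
                report["empty_folders"].append(current_path)
        for name in filenames:
            file_path = current_path / name
            size = file_path.stat().st_size
            report["file_count"] += 1
            report["total_bytes"] += size
            if size == 0:
                report["zero_byte_files"].append(file_path)
            else:
                # Group non-empty files by content hash to find duplicates.
                by_hash[sha256_of(file_path)].append(file_path)
            if len(str(file_path)) > MAX_PATH_LENGTH:
                report["long_paths"].append(file_path)

    report["duplicates"] = [
        sorted(paths) for paths in by_hash.values() if len(paths) > 1
    ]
    return report
```

Running `survey_workspace` once before weeding and once after would give the two sets of volume, file, and folder figures that the Pre-Ingest Report asks for. Zero-byte files are deliberately left out of the duplicate grouping, since they would all hash to the same value.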
Purpose, Context and Content
Pre-ingest is the technological review of digital records transferred to LAC. It involves technical appraisal using multiple software tools to automate the process. The goal of pre-ingest is to aid in creating a SIP that conforms to LAC’s preservation policies and that LAC has a reasonable chance of preserving in its repository and making accessible for the long term.
There are two major tasks for pre-ingest:
- Weed any digital records that should not have been transferred in the first place (e.g. configuration, development, temporary, and software files).
- Identify file format or other issues that need addressing prior to or during appraisal/selection/description by archival staff.
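The first task, weeding, can be sketched in code. The example below moves files that match a list of transitory name patterns into a disposition folder instead of deleting them, mirroring the workflow's instruction that weeded files are kept at this stage. It is an illustrative sketch only: the pattern list (beyond Thumbs.db, which the workflow names) and the function names `is_transitory` and `weed_transitory` are assumptions, and real weeding decisions would still need review by staff.

```python
import fnmatch
import os
import shutil
from pathlib import Path

# Assumed, illustrative patterns; only Thumbs.db is named in the workflow.
TRANSITORY_PATTERNS = ["thumbs.db", "desktop.ini", ".ds_store", "*.tmp", "~$*"]


def is_transitory(name):
    """Return True if a file name matches one of the transitory patterns."""
    lowered = name.lower()
    return any(
        fnmatch.fnmatchcase(lowered, pattern) for pattern in TRANSITORY_PATTERNS
    )


def weed_transitory(obj_dir, disposition_dir):
    """Move transitory files from obj_dir into disposition_dir.

    Files are moved, never deleted, and keep their relative path so that
    their original location can be reconstructed later.
    """
    obj_dir = Path(obj_dir)
    disposition_dir = Path(disposition_dir)
    moved = []

    for current, dirnames, filenames in os.walk(obj_dir):
        current_path = Path(current)
        # Do not descend into the disposition folder itself.
        dirnames[:] = [
            d for d in dirnames if current_path / d != disposition_dir
        ]
        for name in filenames:
            if not is_transitory(name):
                continue
            source = current_path / name
            target = disposition_dir / source.relative_to(obj_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(source), str(target))
            moved.append((source, target))

    return moved
```

The returned list of source and target pairs could be written out as a log, so that each weeded file remains traceable to its original position in the transfer.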