De-Duplication

From COPTR
Revision as of 10:56, 20 April 2021 by Rcdeboer
Function definition: Tools that enable the identification and/or removal of duplicate or similar files.
Lifecycle stage: Preservation Action
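
As an illustration of what "identification of duplicate files" can mean in practice, the following is a minimal sketch that groups files by a content checksum. It is not taken from any tool listed on this page: the function names `file_digest` and `find_duplicates` are hypothetical, and the choice of SHA-256 is an assumption. It finds only byte-identical files, not merely similar ones.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by digest; return groups of 2+ paths."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Only digests shared by more than one file indicate duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates("some/folder")` returns a list of groups, each holding the paths that share identical content. Any removal step is deliberately left out, since deleting files is a curatorial decision.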

Tools for this function

Tool | Purpose
AllDup | A brief description
Autopsy Digital Forensics | Open source, free digital forensics tool
CloneSpy | Finds duplicates, also precisely, deletes with rules.
Czkawka | Czkawka is a simple, fast and free app to remove unnecessary files from your computer.
Demystify | Format Identification Analysis and Reporting
Double Commander | Open source file manager with two panels side by side
DupeGuru | A brief description
Emailchemy | Converts proprietary emails to standard portable formats
FileVerifier++ | Windows utility for verifying file contents
FolderMatch | Compares two directory trees and flags up duplicates
FreeCommander | Split-screen file manager with desirable extras
Fslint | Set of utilities to find and clean various forms of lint on a filesystem, such as duplicate files, empty directories, and bad file names.
GNU Diffutils | GNU Diffutils is a package of several programs related to finding differences between files.
Java library implementing Pairtree | The PAIRTREE LIBRARY is a software library that supports the mapping between identifiers and filepaths according to the Pairtree Specification.
Matchbox Tool | Matchbox: Duplicate detection tool for digital document collections.
NT (New Tool) | This is a preservation tool
SSDeep | Recursive piecewise hashing tool
Sumfolder1 | sumfolder1 is a utility for use within the archival and digital preservation community to generate checksums for file system directories, and to generate an overall "collection" checksum for a given set of files. The utility may be used in support of de-duplication at a directory/folder level.
The DeDuplicator (Heritrix add-on module) | The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
WinMerge | A visual tool for differencing and merging of file collections, images and texts.
XcorrSound | The xcorrSound package compares sound waves using cross correlation.
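
The Sumfolder1 entry above mentions de-duplication at a directory/folder level. The general idea of a folder-level checksum can be sketched as below. This is an assumption-laden illustration and not sumfolder1's actual algorithm: the function name `folder_digest`, the use of SHA-256, and the way relative paths and file digests are combined are all hypothetical choices.

```python
import hashlib
from pathlib import Path


def folder_digest(root):
    """Return one SHA-256 hex digest summarising all files under root."""
    root = Path(root)
    # Sort by relative path so the result does not depend on traversal order
    # or on where the folder itself is located.
    entries = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    overall = hashlib.sha256()
    for rel, path in entries:
        # Reads each file fully into memory; acceptable for a sketch only.
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        overall.update(f"{rel}\0{file_hash}\n".encode("utf-8"))
    return overall.hexdigest()
```

Two folders that contain the same relative paths with the same file contents produce the same digest under this scheme, so equal digests flag candidate duplicate folders. Because file names are part of the digest, a renamed file changes the result.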

For some guidance on approaches to de-duplication see: