Revision as of 10:56, 20 April 2021 by Rcdeboer
Function definition: Tools that enable the identification and/or removal of duplicate or similar files.
Lifecycle stage: Preservation Action
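
As a rough illustration of what identifying exact duplicates involves, the sketch below groups files under a directory by size and then by SHA-256 checksum. It is not taken from any of the tools listed on this page. The function names `file_digest` and `find_duplicates` are hypothetical, and the sketch only detects byte-identical files, not similar ones.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    # Step 1: group by size, since files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)

    # Step 2: hash only files that share a size with at least one other file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_digest[file_digest(p)].append(p)

    # Keep only groups with more than one member.
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Calling `find_duplicates("some/directory")` returns one list per set of identical files. The sketch only reports duplicates; deciding which copy to keep or remove is left to the user, as removal is a separate and riskier step.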

Tools for this function

AllDup: A brief description
Autopsy Digital Forensics: Open source, free digital forensics tool
CloneSpy: Finds duplicates, also precisely, deletes with rules.
DROID Siegfried Sqlite Analysis Engine: Analysis and automatic generation of summary information from DROID output
Double Commander: Open source file manager with two panels side by side
DupeGuru: A brief description
Emailchemy: Converts proprietary emails to standard portable formats
FileVerifier++: Windows utility for verifying file contents
FolderMatch: Compares two directory trees and flags up duplicates
FreeCommander: Split-screen file manager with desirable extras
Fslint: Set of utilities to find and clean various forms of lint on a filesystem, such as duplicate files, empty directories, and bad file names.
GNU Diffutils: GNU Diffutils is a package of several programs related to finding differences between files.
Java library implementing Pairtree: The PAIRTREE LIBRARY is a software library that supports the mapping between identifiers and filepaths according to the Pairtree Specification.
Matchbox Tool: Duplicate detection tool for digital document collections.
SSDeep: Recursive piecewise hashing tool
The DeDuplicator (Heritrix add-on module): The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
WinMerge: A visual tool for differencing and merging of file collections, images and texts.
XcorrSound: The xcorrSound package compares sound waves using cross correlation.

For some guidance on approaches to de-duplication see: