The DeDuplicator (Heritrix add-on module)
Description
The DeDuplicator is an add-on module for Heritrix that reduces the amount of duplicate data collected across a series of snapshot crawls.