The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
Homepage: http://landsbokasafn.github.io/DeDuplicator/
Function: De-Duplication, Web Capture
Content type: Web
Contributors: COPTR Bot, Prwheatley