The DeDuplicator (Heritrix add-on module)

From COPTR

The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
Homepage: http://landsbokasafn.github.io/DeDuplicator/
Function: Web Crawl, De-Duplication
Content type: Web


Description

The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
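This page does not describe how the module detects duplicates. As a rough illustration of the general idea only, the sketch below compares a content digest of each fetched document against an index kept from earlier snapshot crawls and stores only documents that are new or changed. The function names, the in-memory dictionary index, and the choice of SHA-1 are assumptions made for this example; they are not the DeDuplicator's or Heritrix's actual API.

```python
import hashlib


def content_digest(payload: bytes) -> str:
    """Return a digest of the document body (SHA-1 is an assumption here)."""
    return "sha1:" + hashlib.sha1(payload).hexdigest()


def deduplicate(snapshot: dict, index: dict):
    """Split one snapshot crawl into documents to store and duplicates.

    snapshot: hypothetical mapping of URL -> fetched bytes for this crawl.
    index:    hypothetical mapping of URL -> digest seen in earlier crawls;
              it is updated in place so the next crawl can reuse it.
    """
    to_store = {}
    duplicates = []
    for url, payload in snapshot.items():
        digest = content_digest(payload)
        if index.get(url) == digest:
            # Unchanged since an earlier crawl: do not store it again.
            duplicates.append(url)
        else:
            # New or changed document: store it and remember its digest.
            to_store[url] = payload
            index[url] = digest
    return to_store, duplicates


if __name__ == "__main__":
    index = {}
    first = {"http://example.org/": b"hello", "http://example.org/a": b"A"}
    second = {"http://example.org/": b"hello", "http://example.org/a": b"A2"}
    deduplicate(first, index)
    stored, dups = deduplicate(second, index)
    print("stored:", list(stored))
    print("duplicates:", dups)
```

In this sketch the second crawl stores only the page whose content changed, which is the kind of saving a series of snapshot crawls can gain from de-duplication.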

User Experiences

Development Activity