The DeDuplicator (Heritrix add-on module)

From COPTR
 

Latest revision as of 16:32, 26 November 2021





Purpose: The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
Homepage: http://landsbokasafn.github.io/DeDuplicator/
Function: De-Duplication, Web Capture
Content type: Web


Description

The DeDuplicator is an add-on module for the Heritrix web crawler that reduces the amount of duplicate data collected across a series of snapshot crawls.

User Experiences

Development Activity