Difference between revisions of "The DeDuplicator (Heritrix add-on module)"

From COPTR
(Trial import from script.)
 
(2 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{Infobox_tool
+
{{Infobox tool
|purpose= The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
+
|purpose=The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
|image=
+
|homepage=http://landsbokasafn.github.io/DeDuplicator/
|homepage= http://deduplicator.sourceforge.net/
+
|function=De-Duplication, Web Capture
|license=
+
|content=Web
|platforms=
+
}}
 +
{{Infobox tool details
 +
|ohloh_id=The DeDuplicator (Heritrix add-on module)
 
}}
 
}}
 
<!-- Delete the Categories that do not apply -->
 
[[Category:Web Crawl]]
 
[[Category:De-Duplication]]
 
 
 
 
= Description =
 
= Description =
 
The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.  
 
The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.  
Line 19: Line 15:
  
 
= Development Activity =
 
= Development Activity =
 
{{Infobox_tool_details
 
|ohloh_id=The DeDuplicator (Heritrix add-on module)
 
}}
 

Latest revision as of 16:32, 26 November 2021





The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
Homepage: http://landsbokasafn.github.io/DeDuplicator/
Function: De-Duplication, Web Capture
Content type: Web


Description

The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
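
This entry does not say how the module detects duplicates. As a generic illustration only, one common way to avoid storing the same data twice across snapshot crawls is to compare a digest of each fetched document with the digest recorded for the same URL in an earlier crawl. The sketch below is plain Python, not Heritrix or DeDuplicator code; the function `crawl_snapshot`, the in-memory `index` dictionary, and the choice of SHA-1 are all assumptions made for the example.

```python
import hashlib


def crawl_snapshot(pages, index):
    """Decide which pages of one snapshot crawl need to be stored.

    pages: dict mapping URL -> bytes fetched in this snapshot (hypothetical input)
    index: dict mapping URL -> digest seen in earlier snapshots; updated in place
    Returns a (stored, skipped) pair of URL lists.
    """
    stored, skipped = [], []
    for url, body in pages.items():
        digest = hashlib.sha1(body).hexdigest()
        if index.get(url) == digest:
            # Same URL, same content as an earlier snapshot: treat as duplicate.
            skipped.append(url)
        else:
            # New URL or changed content: store it and remember the digest.
            stored.append(url)
            index[url] = digest
    return stored, skipped


# Two snapshot crawls of the same (made-up) site sharing one index.
index = {}
first = crawl_snapshot(
    {"http://example.org/": b"hello", "http://example.org/a": b"A"}, index
)
second = crawl_snapshot(
    {"http://example.org/": b"hello", "http://example.org/a": b"A2"}, index
)
```

In this toy run the first snapshot stores both pages, while the second stores only the page whose content changed and skips the unchanged one.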

User Experiences

Development Activity