Difference between revisions of "The DeDuplicator (Heritrix add-on module)"

From COPTR
(Import from spreadsheet via script.)
{{Infobox_tool
{{Infobox tool
 
|purpose=The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
 
|image=
|homepage=http://landsbokasafn.github.io/DeDuplicator/
|homepage=http://deduplicator.sourceforge.net/
|function=Web Crawl, De-Duplication
|license=
|content=Web
|platforms=
}}
{{Infobox tool details
|ohloh_id=The DeDuplicator (Heritrix add-on module)
 
}}
 
}}
 
<!-- Delete the Categories that do not apply -->
 
[[Category:Web Crawl]]
 
[[Category:De-Duplication]]
 
 
 
 
= Description =
 
The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.  
 
  
 
= Development Activity =
 
{{Infobox_tool_details
 
|ohloh_id=The DeDuplicator (Heritrix add-on module)
 
}}
 

Revision as of 12:14, 21 April 2021





The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
Homepage: http://landsbokasafn.github.io/DeDuplicator/
Function: Web Crawl, De-Duplication
Content type: Web


Description

The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.

User Experiences

Development Activity