Heritrix

From COPTR
(Trial import from script.)

Revision as of 17:38, 12 November 2013

Heritrix is a flexible, extensible, robust, and scalable Web crawler capable of fetching, archiving, and analyzing Internet-accessible content.
Homepage: http://crawler.archive.org
License: GNU Lesser General Public License 2.1
Platforms: Written in Java. Requires a Java Runtime Environment (JRE, http://www.java.com/en/download/index.jsp), version 5.0 or later. The default Java heap size is 256 MB.
Appears in COW: Quality Assurance: Iterative Seed Issue Decision Tree, Web Archiving Quality Assurance (QA) Workflow, Web Archiving Quality Assurance Lifecycle


Description

Heritrix is a flexible, extensible, robust, and scalable Web crawler capable of fetching, archiving, and analyzing Internet-accessible content. It is developed by the Internet Archive and written in Java.

User Experiences

Development Activity
