Difference between revisions of "Warctools"

From COPTR
Line 1: Line 1:
<!-- Use the structure provided in this template, do not change it! -->
{{Infobox tool
 
 
{{Infobox_tool
 
 
|purpose=Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
 
|homepage=https://pypi.python.org/pypi/warctools/
 
|sourcecode=https://github.com/internetarchive/warctools/
 
|license=MIT License
 
|platforms=Cross-platform
 
Line 10: Line 8:
 
|formats_in={{Format|WARC}}, {{Format|ARC (Internet Archive)}}
 
|formats_out={{Format|WARC}}
 
|function=Metadata Extraction, Validation, File Format Migration
|content=Web
}}
{{Infobox tool details
|ohloh_id=warctools
 
}}
 
}}
 
<!-- Add one or more categories to describe the function of the tool. Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). The following are common category examples, remove those that don't apply -->
 
[[Category:Metadata Extraction]]
 
[[Category:Validation]]
 
[[Category:ARC To WARC Migration]]
 
 
<!-- Add relevant categories to describe the content type that the tool addresses. Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). If the tool works on any content type, do not add a category. The following are common category examples, remove those that don't apply -->
 
[[Category:Web]]
 
 
 
== Description ==
 
<!-- Describe what the tool does, focusing on its digital preservation value. Keep it factual. -->
Line 47: Line 41:
 
=== Development ===
 
<!-- Add the Ohloh.com ID for the tool, if known. -->
 
{{Infobox_tool_details
 
|ohloh_id=warctools
 
}}
 
  
 
<rss max=5>https://github.com/internetarchive/warctools/commits/master.atom</rss>
 

Revision as of 14:19, 21 April 2021

Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
Homepage: https://pypi.python.org/pypi/warctools/
Source Code: https://github.com/internetarchive/warctools/
License: MIT License
Platforms: Cross-platform
Language: Python
Input Formats: WARC, ARC (Internet Archive)
Output Formats: WARC
Function: Metadata Extraction, Validation, File Format Migration
Content type: Web




Description

This is the most current and well-maintained Python codebase for working with WARC files. It provides a number of command-line tools for common WARC/ARC operations, and can also act as a library to create or work with WARC files directly from Python.
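To illustrate the kind of record these tools read and write, here is a minimal sketch of the WARC/1.0 record layout using only the Python standard library. It is not the warctools API: the helpers make_warc_record and parse_warc_record are hypothetical, and the sketch ignores gzip compression, ARC input, and most optional WARC header fields.

```python
import uuid
from datetime import datetime, timezone


# Hypothetical helper, not part of warctools: serialize one uncompressed WARC/1.0 record.
def make_warc_record(record_type: str, target_uri: str, payload: bytes,
                     content_type: str = "application/octet-stream") -> bytes:
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Target-URI", target_uri),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    # Version line, one "Name: value" line per field, then a blank line (all CRLF-terminated).
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers) + "\r\n"
    # The record block is followed by two CRLFs that separate it from the next record.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


# Hypothetical helper, not part of warctools: split one record back into headers and block.
def parse_warc_record(data: bytes) -> tuple[dict[str, str], bytes]:
    # The first blank line (CRLF CRLF) ends the header; header fields never contain it.
    header_block, _, rest = data.partition(b"\r\n\r\n")
    lines = header_block.decode("utf-8").split("\r\n")
    if lines[0] != "WARC/1.0":
        raise ValueError(f"unexpected version line: {lines[0]!r}")
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so URI values keep their own colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length gives the exact number of block bytes after the header.
    length = int(headers["Content-Length"])
    return headers, rest[:length]


if __name__ == "__main__":
    record = make_warc_record("resource", "http://example.com/", b"hello", "text/plain")
    headers, block = parse_warc_record(record)
    print(headers["WARC-Type"], headers["Content-Length"], block)
```

Relying on Content-Length rather than scanning for a terminator means payloads that themselves contain blank lines are still recovered intact; real-world tools also have to handle gzip-compressed records and multiple records per file.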

Pull requests and releases are currently managed by Thomas Figg, who can be contacted via Twitter.

Older Python WARC Implementations

This codebase was initially funded by IIPC and developed by Hanzo Archives. This led to the hanzo-warc-tools package and its source code.

There is also a separate warc package that was created by the Internet Archive (see source code), but is no longer in use.

Both of these projects are defunct and are now superseded by the internetarchive/warctools project.

User Experiences

Development Activity

Releases

2016-09-01 22:39:45
4.10.0
by nlevitt
2012-11-29 13:31:13
4.15-rc1
by lekash
2012-09-14 15:18:43
build_success-2012-09-14T16-25-56.483325901
by SteveJones
2012-09-14 13:27:40
build_success-2012-09-14T15-24-42.616660024
by SteveJones
2012-06-29 13:24:01
4.7
by SteveJones

Development

2019-11-20 23:07:26
Merge pull request #26 from internetarchive/siznax/better-setup-py (commit a86b3f4)
by nlevitt (https://github.com/nlevitt)
2019-11-20 23:06:51
Merge pull request #24 from internetarchive/siznax/update-README (commit de0612f)
by nlevitt (https://github.com/nlevitt)
2019-11-20 22:58:22
Restored original author_, added maintainer_ to setup.py (commit 38a7172)
by steve@archive.org
2019-11-20 22:09:41
Made author Internet Archive, added classifiers, etc. (commit 2f2a410)
by steve@archive.org
2019-11-20 21:27:06
Markdown formatted README, added python3 WARC writing example and usage. (commit ffaeccd)
by steve@archive.org