
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
Source Code: https://github.com/internetarchive/warctools/
License: MIT License
Input Formats: WARC, ARC (Internet Archive)
Output Formats: WARC


Description

This is the most current and well-maintained Python codebase for working with WARC files. It provides a number of command-line tools for common WARC/ARC operations, and can also act as a library to create or work with WARC files directly from Python.
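To give a feel for what such tools operate on, here is a minimal sketch of the WARC/1.0 record layout (version line, named header fields, a blank line, a content block of `Content-Length` bytes, then two CRLFs). It deliberately uses only the Python standard library and does not call the warctools API; the function names `build_record` and `read_records` are hypothetical and exist only for this illustration. It assumes uncompressed input, whereas real WARC files are usually gzip-compressed per record.

```python
import io
import uuid
from datetime import datetime, timezone


def build_record(warc_type, target_uri, payload, content_type="text/plain"):
    """Serialize one WARC/1.0 record (hypothetical helper, not part of warctools)."""
    headers = [
        ("WARC-Type", warc_type),
        ("WARC-Record-ID", "<urn:uuid:%s>" % uuid.uuid4()),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join("%s: %s\r\n" % h for h in headers) + "\r\n"
    # The content block is followed by two CRLFs that terminate the record.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def read_records(stream):
    """Yield (headers, block) pairs from an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line.strip() == b"":
            continue  # skip the CRLFs that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Read exactly Content-Length bytes so payloads may contain CRLFs.
        block = stream.read(int(headers["Content-Length"]))
        yield headers, block


if __name__ == "__main__":
    data = build_record("resource", "http://example.com/", b"hello, archive")
    for headers, block in read_records(io.BytesIO(data)):
        print(headers["WARC-Type"], headers["WARC-Target-URI"], len(block))
```

For real archives, the command-line tools and library in this project handle compression, ARC input and HTTP payload parsing, none of which this sketch attempts.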

Pull requests and releases are currently managed by Thomas Figg, who can be contacted via Twitter.

Older Python WARC Implementations

This codebase was initially funded by the IIPC and developed by Hanzo Archives. This led to the hanzo-warc-tools package and source code.

There is also a separate warc package that was created by the Internet Archive (see source code), but it is no longer in use.

Both of these projects are defunct and are now superseded by the internetarchive/warctools project.

User Experiences

Development Activity

Releases

2016-09-01 22:39:45: 4.10.0 (by nlevitt)
2012-11-29 13:31:13: 4.15-rc1 (by lekash)
2012-09-14 15:18:43: build_success-2012-09-14T16-25-56.483325901 (by SteveJones)
2012-09-14 13:27:40: build_success-2012-09-14T15-24-42.616660024 (by SteveJones)
2012-06-29 13:24:01: 4.7 (by SteveJones)

Development

2016-09-08 17:41:45: Merge pull request #19 from DonRichards/patch-1 (commit 2f9ea1d8babba95ad17ab9de754e04d6d7c92a24, by nlevitt, https://github.com/nlevitt)
2016-09-08 17:14:51: Normalizing '4.10.1dev2' (commit fcd28f5b8191758da1644c576b05fe0062e76d03, by DonRichards, https://github.com/DonRichards)
2016-09-02 16:07:54: back to dev version number (commit b3b76c82c220cf551ade60fb1bdba773385aa192, by nlevitt, https://github.com/nlevitt)
2016-09-01 22:39:45: set version=4.10.0 for push to pypi (commit e70feb2a362e23e00a7f36747310fa7ce758c0d1, by nlevitt, https://github.com/nlevitt)
2016-06-27 23:37:28: allow failures for python 3.5 and nightly, since they fail now (to be… (commit 0878870e1e620ba07b0b6a0ef5ef39f9714a87ba, by nlevitt, https://github.com/nlevitt)


Andy Jackson (100.0%)