Proposed by Andrew N. Jackson for the Preservation theme of the IIPC GA in 2015.
DigiPresHack@IIPC-GA-2015: Formats & Tools Hackathon
To effectively capture and preserve the web, we need to understand the formats and protocols of the web, and the tools that can be used to manage them over time. This need has manifested itself in a range of digital preservation tool and format registries and test corpora, but so far these have been only partially successful. Many registries have been developed but have failed to take hold, and those that have succeeded are those that have sought to identify, support and recognise the individuals willing to contribute their time, effort and knowledge.
The idea of this DigiPresHack is to support those who wish to contribute in this area by providing a supportive environment and a clear framework for contribution. The hackathon format would be a one-day workshop in the ‘unconference’ style. Suggested activities would include:
- Generating example test files for various formats, e.g.
  - WARC and ARC files demonstrating the different de-duplication methods
  - HTML files demonstrating particular features, ideally accompanied by screenshots to capture the results (e.g. using emulators for old browsers).
- Extending the Archival Acid Test suite: https://github.com/machawk1/archivalAcidTest
- Extending the PWG database, possibly combining it with the newly-developed PET tools (https://github.com/pericles-project/pet)
- Reviewing and/or adding web archiving tool information in COPTR (http://coptr.digipres.org/)
- Documenting difficult or particularly interesting/challenging formats (http://fileformats.archiveteam.org)
- Extending the aggregations and visualisations at http://www.digipres.org/ so that we can see how far we’ve come.
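As an illustration of the first activity, the following is a minimal sketch of how a small de-duplication test file might be generated. It assumes WARC/1.0 and the "identical payload digest" revisit profile only; the function names (payload_digest, warc_record, dedup_example), the example URI, the dates and the payload are all hypothetical, and other de-duplication methods would need their own test files.

```python
import base64
import hashlib
import uuid

CRLF = b"\r\n"
# Revisit profile URI defined by the WARC 1.0 specification.
REVISIT_PROFILE = "http://netpreserve.org/warc/1.0/revisit/identical-payload-digest"


def payload_digest(payload: bytes) -> str:
    """Return a SHA-1 digest label in the Base32 form commonly used in WARC files."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


def warc_record(headers, block: bytes) -> bytes:
    """Serialise one record: version line, named fields, blank line, block, two CRLFs."""
    lines = [b"WARC/1.0"]
    for name, value in list(headers) + [("Content-Length", str(len(block)))]:
        lines.append(f"{name}: {value}".encode("utf-8"))
    return CRLF.join(lines) + CRLF + CRLF + block + CRLF + CRLF


def dedup_example(uri: str = "http://example.org/") -> bytes:
    """Build a response record followed by a revisit record for the same payload."""
    payload = b"<html><body>Hello</body></html>"  # hypothetical test payload
    http_headers = (
        b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: "
        + str(len(payload)).encode("ascii")
        + b"\r\n\r\n"
    )
    digest = payload_digest(payload)
    first_id = f"<urn:uuid:{uuid.uuid4()}>"

    # First capture: full HTTP response, including the payload.
    response = warc_record(
        [
            ("WARC-Type", "response"),
            ("WARC-Record-ID", first_id),
            ("WARC-Date", "2015-01-01T00:00:00Z"),
            ("WARC-Target-URI", uri),
            ("WARC-Payload-Digest", digest),
            ("Content-Type", "application/http; msgtype=response"),
        ],
        http_headers + payload,
    )

    # Second capture: payload unchanged, so only the HTTP headers are stored
    # and the record points back to the first capture.
    revisit = warc_record(
        [
            ("WARC-Type", "revisit"),
            ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
            ("WARC-Date", "2015-01-02T00:00:00Z"),
            ("WARC-Target-URI", uri),
            ("WARC-Profile", REVISIT_PROFILE),
            ("WARC-Refers-To", first_id),
            ("WARC-Payload-Digest", digest),
            ("Content-Type", "application/http; msgtype=response"),
        ],
        http_headers,
    )
    return response + revisit
```

Writing the returned bytes to a file (for example with open("dedup-example.warc", "wb")) would give a small sample that could be checked against existing WARC tools before being contributed to a test corpus.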
To go ahead, this hackathon would require some additional funding to bring in appropriate individuals who could facilitate this event and who would not otherwise be able to attend. If possible, modest prizes for significant contributions could help build momentum. Ideally, we could use a webcast/hangout or similar to enable engagement by those who cannot attend.