https://coptr.digipres.org/api.php?action=feedcontributions&user=Nullhandle&feedformat=atomCOPTR - User contributions [en-gb]2024-03-28T18:02:58ZUser contributionsMediaWiki 1.35.14https://coptr.digipres.org/index.php?title=WCT_(Web_Curator_Tool)&diff=2829WCT (Web Curator Tool)2016-03-02T17:14:40Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose=Web Curator Tool (WCT) is a workflow management application for selective web archiving.<br />
|image=<br />
|homepage=http://webcurator.sourceforge.net/<br />
|license=[http://www.apache.org/licenses/LICENSE-2.0 Apache License 2.0]<br />
|platforms=Apache Tomcat<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Metadata Processing]]<br />
[[Category:Web]]<br />
[[Category:Web Crawl]]<br />
<br />
= Description =<br />
The [http://webcurator.sourceforge.net/ Web Curator Tool] (WCT) is a workflow management application for selective web archiving. WCT allows users to target websites that they wish to include in their collection, create and manage schedules to automatically harvest those sites, and package the collected files to easily submit them to a digital archive.<br />
====Provider====<br />
Developed by the National Library of New Zealand and the British Library, initiated by the International Internet Preservation Consortium. Currently maintained by Oakleigh Consulting Ltd.<br />
====Licensing and cost====<br />
[http://www.apache.org/licenses/LICENSE-2.0 Apache License] &ndash; free.<br />
====Development activity====<br />
No information is available on the current funding status for development, although the SourceForge site&rsquo;s bugtracker continues to list new entries and responses. WCT encourages developer participation, publishing a Developers Guide with the latest release.<br />
====Platform and interoperability====<br />
WCT was written in Java and designed to run in Apache Tomcat. It has been tested on Red Hat Linux, Solaris, and (to a lesser extent) Microsoft Windows. Three relational databases are officially supported: Oracle, MySQL and PostgreSQL. The software itself makes use of part or all of several other open-source components, including: Heritrix; Wayback; Acegi Security System; Apache Axis; Apache Commons Logging; Hibernate; Quartz; and Spring Application Framework.<br />
====Functional notes====<br />
An important functionality of the software is the ability to collect, store, and abide by harvest authorisations, i.e. permissions to download from the copyright holders. WCT also creates separate administrative levels, so that those who set up the harvests do not necessarily have the authority to have the system actively begin them. All material is captured in ARC format; since WCT incorporates Wayback, access within the system is not a problem. However, those who collect material with the ultimate goal of archiving it in a separate system must take the format into account.<br />
====Documentation and user support====<br />
The project site includes a well-written quick-start guide and [http://webcurator.sourceforge.net/docs/1.5.2/Web%20Curator%20Tool%20User%20Manual%20(WCT%201.5.2).pdf user manual], although the manual contains some headings with no corresponding text. The site also includes a developer guide, published with release version 1.5.2. Links to the advertised wiki and FAQ sections are currently broken, forwarding to the SourceForge developer page instead. More information about the project can be found in an informative [http://www.ariadne.ac.uk/issue50/beresford/ article] published in Ariadne Issue 50. The primary forum for technical support appears to be an active &ldquo;webcurator-users&rdquo; mailing list. While bugs continue to be posted on the SourceForge bug/feature request tracker, the last addressed item was in February 2011.<br />
====Usability====<br />
WCT is specifically designed to be operated by non-technical users such as librarians, with a simple and relatively intuitive GUI. Installation, however, is difficult and most likely requires tech support.<br />
====Expertise required====<br />
Setup, especially if it includes links to an archival repository, requires system administration knowledge. Users must have a comprehensive understanding of their institution&rsquo;s collections policies when designing harvests.<br />
====Standards compliance====<br />
WCT allows users to add basic Dublin Core metadata to the material.<br />
====Influence and take-up====<br />
WCT is used by the National Library of New Zealand, the National Library of Norway, and the British Library. The SourceForge site lists nearly 8,300 downloads as of December 2011.<br />
<br />
= User Experiences =<br />
<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/DIA-NZ/webcurator/commits/master.atom<br />
|issues_rss=https://sourceforge.net/p/webcurator/legacy-bugs/feed.rss<br />
|mailing_lists=http://webcurator.sourceforge.net/mailinglists.shtml<br />
|ohloh_id=WCT (Web Curator Tool)<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=NetarchiveSuite&diff=2828NetarchiveSuite2016-03-02T17:06:21Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose=NetarchiveSuite is a web archiving software package designed to plan, schedule and run web harvests of parts of the Internet.<br />
|image=NAS.gif<br />
|homepage=http://netarchive.dk/suite/Welcome<br />
|license=[http://www.gnu.org/licenses/lgpl.html#translations GNU Lesser General Public License]<br />
|platforms=<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web Crawl]]<br />
[[Category:Web]]<br />
<br />
<br />
= Description =<br />
The [https://sbforge.org/display/NAS/NetarchiveSuite NetarchiveSuite] is a web archiving software package designed to plan, schedule and run web harvests of parts of the Internet. It also serves as an archiving platform for the collected materials, storing and performing consistency checks of the data.<br />
<br />
====Provider====<br />
The Royal Library of Denmark and the State and University Library of Denmark<br />
====Licensing and cost====<br />
[http://www.gnu.org/licenses/lgpl.html#translations GNU Lesser General Public License] &ndash; free.<br />
====Development activity====<br />
Version 4.1 was released 27th May 2013. Version 4.2 should be released at the end of June 2013.<br />
The software appears to be an integral part of the Libraries&rsquo; ongoing web archiving effort, indicating continuing support. The website includes a roadmap for the software.<br />
====Platform and interoperability====<br />
Netarchive requires a computer running a Linux operating system with Sun Java 1.6, as well as a Java Message Service (JMS) provider. The software uses Heritrix as its web crawler.<br />
====Functional notes====<br />
The NetarchiveSuite is split into four main modules: three modules corresponding to processes of harvesting, archiving and accessing materials, and one module to coordinate functions. The Harvester module can organise both snapshot and recurring harvests; it supports packaging metadata about the harvest together with the harvested data. The Archive module advertises bit consistency checks and the ability to support distributed batch jobs; it also supports storage. The Access module uses a proxy solution to give access to the material. As the software uses Heritrix for its crawls, the materials collected take the form of ARC files, the Internet Archive&rsquo;s web archive container format (not to be confused with the unrelated ARC compression format).<br />
====Documentation and user support====<br />
Netarchive&rsquo;s website includes extensive [https://sbforge.org/display/NAS/Documentation documentation], including an Overview and Quick Start Manual. Detailed guidance is found in the Configuration, Installation, and User Manuals. Developer guidance is found in the System Design and Additional Tools Manual. The website also points to a new Wiki that as of writing is in some places incomplete, and in general rather difficult to navigate. The project supports four mailing lists, all of which are currently active: -announce; -curator; -devel; and -users. The site also includes a contact page with email addresses for individuals at KB, SB, BnF, and ONB.<br />
====Usability====<br />
The Suite advertises its ability to be used by librarians rather than systems administrators; its Quickstart installation option is designed to take an hour, and the GUI for ongoing use is extremely simple.<br />
====Expertise required====<br />
With any web archiving project, deep understanding of the project&rsquo;s scope and collections policy is essential in order to set up appropriate targets.<br />
====Standards compliance====<br />
No standards compliance is explicitly advertised.<br />
====Influence and take-up====<br />
The Royal Library of Denmark and the State and University Library have used NetarchiveSuite to harvest the Danish world wide web since 2005. The software is also used by the Bibliothèque nationale de France and the Österreichische Nationalbibliothek.<br />
<br />
<br />
= User Experiences =<br />
<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/netarchivesuite/netarchivesuite/commits/master.atom<br />
|ohloh_id=NetarchiveSuite<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=Heritrix&diff=2827Heritrix2016-03-02T17:06:09Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose=Heritrix is an open-source web crawler, allowing users to target websites they wish to include in a collection and to harvest an instance of each site.<br />
|image=<br />
|homepage=http://crawler.archive.org<br />
|license=GNU Lesser General Public License 2.1<br />
|platforms=Written in Java. Must have Java Runtime Environment (JRE, http://www.java.com/en/download/index.jsp) and at least Java version 5.0 installed. Default heap size is 256MB RAM.<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web Crawl]]<br />
[[Category:Web]]<br />
<br />
<br />
= Description =<br />
[https://webarchive.jira.com/wiki/display/Heritrix/Heritrix Heritrix] is an open-source web crawler, allowing users to target websites they wish to include in a collection and to harvest an instance of each site. The software is most often used as a powerful back-end tool incorporated into a web archiving workflow.<br />
====Provider====<br />
Internet Archive<br />
====Licensing and cost====<br />
[http://www.apache.org/licenses/LICENSE-2.0.html Apache License, Version 2.0] &ndash; free. Some individual source code files are subject to or offered under other licenses.<br />
====Development activity====<br />
Version 3.1.1 was released in May 2012.<br />
Heritrix powers the Internet Archive, and so receives ongoing support.<br />
====Platform and interoperability====<br />
As a Java application, Heritrix is theoretically platform agnostic; however, only Linux is supported.&nbsp; The software requires Java Runtime Environment 1.6 or higher, and at least 256MB of available RAM.<br />
====Functional notes====<br />
Web crawls are carried out by configuring a &lsquo;job,&rsquo; which itself is an instance of a crawl template called a &lsquo;profile.&rsquo; Although they contain the same configurations, these two entities have different functions; profiles record the set of configurations and act as a starting point for shaping a new job, but only the job itself can execute a crawl.<br />
The software will crawl FTP sites in addition to HTTP. Users can examine the results of a crawl by opening its log files, which include information about crawl problems and errors, each URI that was collected, and statistics about the job as a whole. Users can also create reports showing a summary of the crawl&rsquo;s activity.<br />
Heritrix stores the web resources it crawls in ARC files. The software includes a command-line tool called arcreader, which can be used to extract the contents.<br />
====Documentation and user support====<br />
The [https://webarchive.jira.com/wiki/display/Heritrix/Heritrix+3.0+and+3.1+User+Guide User Guide for versions 3.0 and 3.1] is in the form of a wiki, which at time of writing is not structured in any obvious narrative order; while detailed, it is very difficult to navigate.&nbsp; The [http://crawler.archive.org/articles/user_manual/ User Manual for version 2.0]&nbsp;is structured and can be used as a reference for navigation.&nbsp; Extensive documentation is available, including release notes, Javadoc API documentation, and FAQs linking within the wiki.<br />
Heritrix&rsquo;s website links to two active mailing lists: a Yahoo discussion group and a SourceForge list distributing source code commits. The project also uses a public JIRA for bug, feature, and issue tracking.<br />
====Usability====<br />
Heritrix is installed via a command line interface, but once installed the user can launch a web-based interface for configuration. Setting up a crawl requires a significant number of adjustments.<br />
====Expertise required====<br />
Installation requires solid knowledge of Linux and command line interfaces. As with any web archiving software, deep understanding of the project&rsquo;s scope and collections policy is essential in order to set up appropriate targets.<br />
====Standards compliance====<br />
Heritrix does not offer metadata support. The software is designed to respect robots.txt exclusion directives and META robots tags.<br />
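As a rough illustration of what respecting robots.txt exclusion directives involves, the following Python sketch filters candidate URLs against a robots.txt file using the standard library. It is not Heritrix code (Heritrix is written in Java); the robots.txt content, the user agent name and the URLs are all hypothetical.<br />

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return the URLs that robots_txt permits user_agent to fetch."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical crawl candidates on a placeholder host.
candidates = [
    "http://example.org/index.html",
    "http://example.org/private/report.html",
]
print(allowed_urls(ROBOTS_TXT, "example-crawler", candidates))
```

Only the first URL is kept, because the second falls under the disallowed /private/ path. META robots tags are a separate mechanism embedded in individual pages and are not covered by this sketch.<br />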
====Influence and take-up====<br />
Heritrix is extremely influential; as of March 2012, the SourceForge site reports nearly 240,000 downloads. [https://webarchive.jira.com/wiki/display/Heritrix/Users+of+Heritrix Users] include the Internet Archive, the British Library, the United States Library of Congress, and the French National Library. The software powers [http://www.dcc.ac.uk/node/9380 NetarchiveSuite] and the [http://www.dcc.ac.uk/node/9394 Web Curator Tool].<br />
<br />
<br />
= User Experiences =<br />
<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/internetarchive/heritrix3/commits/master.atom<br />
|issues_rss=https://webarchive.jira.com/sr/jira.issueviews:searchrequest-rss/temp/SearchRequest.xml?jqlQuery=project+%3D+HER&tempMax=100<br />
|mailing_lists=https://groups.yahoo.com/neo/groups/archive-crawler/info<br />
|ohloh_id=Heritrix<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=File:NAS.gif&diff=2826File:NAS.gif2016-03-02T17:00:49Z<p>Nullhandle: NetarchiveSuite logo</p>
<hr />
<div>NetarchiveSuite logo</div>Nullhandlehttps://coptr.digipres.org/index.php?title=CINCH&diff=2825CINCH2016-02-29T23:37:25Z<p>Nullhandle: Created page with "{{Infobox_tool |purpose=CINCH (Capture INgest and CHecksum Tool) facilitates batch downloading and ingest of Internet-accessible documents and/or images to a central repositor..."</p>
<hr />
<div>{{Infobox_tool<br />
|purpose=CINCH (Capture INgest and CHecksum Tool) facilitates batch downloading and ingest of Internet-accessible documents and/or images to a central repository.<br />
|image=Cinch.png<br />
|homepage=http://cinch.nclive.org/Cinch/<br />
|license=[http://unlicense.org/ Unlicense]<br />
|platforms=*nix<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web]]<br />
<br />
= Description =<br />
CINCH is a project to develop a bulk download service that transfers files to a central repository while maintaining original file timestamps, extracting file-level metadata, creating file checksums, and periodically validating those checksums for continued file integrity.<br />
<br />
Users merely need to upload a list of URLs to download and when the process completes they can download the requested files and file metadata to their local environment. <br />
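<br />
The checksum creation and periodic validation described above can be sketched in Python as follows. This is a minimal illustration, not CINCH&rsquo;s own implementation: the choice of SHA-256, the function names and the in-memory manifest are all assumptions made for the example.<br />

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to limit memory use.

    The default algorithm (SHA-256) is an assumption for this sketch.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_files(manifest):
    """Return the paths whose current checksum differs from the recorded one.

    manifest is a hypothetical mapping of file path -> checksum recorded
    at download time.
    """
    return [
        path
        for path, recorded in manifest.items()
        if file_checksum(path) != recorded
    ]
```

A manifest would be built once at ingest by calling file_checksum on each downloaded file; running find_changed_files on a schedule then reports any file whose content no longer matches its recorded checksum.<br />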
<br />
====Provider====<br />
[http://statelibrary.ncdcr.gov/ State Library of North Carolina]<br />
====Licensing and cost====<br />
[http://unlicense.org/ Unlicense]<br />
====Platform and interoperability====<br />
<nowiki>*</nowiki>nix<br />
====Documentation and user support====<br />
Diagrams, an FAQ, and documentation for users and developers are available [http://cinch.nclive.org/Cinch/site/page?view=about through the application website]. Setup instructions can be found on the [http://slnc-dimp.github.io/Cinch/ associated GitHub pages].<br />
<br />
= User Experiences =<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/SLNC-DIMP/Cinch/commits/master.atom<br />
|ohloh_id=<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=File:Cinch.png&diff=2824File:Cinch.png2016-02-29T23:22:41Z<p>Nullhandle: CINCH logo</p>
<hr />
<div>CINCH logo</div>Nullhandlehttps://coptr.digipres.org/index.php?title=GNU_Wget&diff=2823GNU Wget2016-02-26T21:50:09Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose= Non-interactive network downloader <br />
|image=Gnu2.png<br />
|homepage=http://www.gnu.org/software/wget/<br />
|license=GNU General Public License<br />
|platforms=Unix, Linux, Windows, Macintosh<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web Crawl]]<br />
[[Category:Web]]<br />
<br />
= Description =<br />
GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive command line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc. <br />
<br />
== Features ==<br />
<br />
From the Wget manual: <br />
<br />
* Wget is non-interactive, meaning that it can work in the background, while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting Wget finish the work. By contrast, most of the Web browsers require constant user’s presence, which can be a great hindrance when transferring a lot of data.<br />
* Wget can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as “recursive downloading.” While doing that, Wget respects the Robot Exclusion Standard (/robots.txt). Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.<br />
* File name wildcard matching and recursive mirroring of directories are available when retrieving via FTP. Wget can read the time-stamp information given by both HTTP and FTP servers, and store it locally. Thus Wget can see if the remote file has changed since last retrieval, and automatically retrieve the new version if it has. This makes Wget suitable for mirroring of FTP sites, as well as home pages.<br />
* Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.<br />
* Wget supports proxy servers, which can lighten the network load, speed up retrieval and provide access behind firewalls. Wget uses the passive FTP downloading by default, active FTP being an option.<br />
* Wget supports IP version 6, the next generation of IP. IPv6 is autodetected at compile-time, and can be disabled at either build or run time. Binaries built with IPv6 support work well in both IPv4-only and dual family environments.<br />
* Built-in features offer mechanisms to tune which links you wish to follow (see Following Links).<br />
* The progress of individual downloads is traced using a progress gauge. Interactive downloads are tracked using a “thermometer”-style gauge, whereas non-interactive ones are traced with dots, each dot representing a fixed amount of data received (1KB by default). Either gauge can be customized to your preferences.<br />
* Most of the features are fully configurable, either through command line options, or via the initialization file .wgetrc (see Startup File). Wget allows you to define global startup files (/usr/local/etc/wgetrc by default) for site settings. You can also specify the location of a startup file with the --config option.<br />
* Finally, GNU Wget is free software. This means that everyone may use it, redistribute it and/or modify it under the terms of the GNU General Public License, as published by the Free Software Foundation (see the file COPYING that came with GNU Wget, for details).<br />
<br />
As of version 1.14, Wget supports WARC output. See http://www.archiveteam.org/index.php?title=Wget_with_WARC_output for details of the development of this feature.<br />
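<br />
Because Wget is non-interactive, it is often driven from scripts. The Python sketch below assembles a Wget command line for a recursive crawl with optional WARC output; it only builds and prints the command and does not run Wget. The function name, default depth, wait time and example URL are assumptions for illustration; the options themselves are documented in the Wget manual.<br />

```python
import shlex


def build_wget_command(url, warc_name=None, depth=2, wait_seconds=1):
    """Return an argument list for a polite recursive Wget crawl of url."""
    command = [
        "wget",
        "--recursive",            # follow links to build a local copy
        f"--level={depth}",       # limit recursion depth
        "--page-requisites",      # also fetch images, CSS, etc.
        "--convert-links",        # rewrite links for offline viewing
        f"--wait={wait_seconds}", # pause between requests
    ]
    if warc_name:
        # Writes <warc_name>.warc.gz alongside the downloaded files.
        command.append(f"--warc-file={warc_name}")
    else:
        # Timestamping is only added without WARC output, on the
        # understanding that Wget disables it when writing a WARC.
        command.append("--timestamping")
    command.append(url)
    return command


print(shlex.join(build_wget_command("http://example.org/", warc_name="example")))
```

The resulting list can be passed to subprocess.run on a system with Wget 1.14 or later installed.<br />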
<br />
== Platform ==<br />
<br />
GNU Wget can be installed on Unix-like systems (UNIX, Linux), Mac OS, and Windows computers.<br />
<br />
=== Installation ===<br />
<br />
* Unix-like systems: Most package managers include Wget, but they may not include the latest version. To get a later version with support for WARC, for example, Linux and UNIX users should compile the latest version of the source code following the instructions at http://wget.addictivecode.org/FrequentlyAskedQuestions#How_do_I_compile_Wget.3F.<br />
<br />
* Macintosh: The default Mac OS does not include Wget. Source code can be compiled for Mac OS X or users can install an alternative package manager such as Homebrew (Homebrew installs the latest version by default). See http://coolestguidesontheplanet.com/install-and-configure-wget-on-os-x/ for instructions on how to install from source.<br />
<br />
* Windows: packages for later versions of Wget compiled for Windows are available at http://eternallybored.org/misc/wget/.<br />
<br />
==Documentation==<br />
The user manual is available at http://www.gnu.org/software/wget/manual/wget.html. The manual is also available via man wget in Unix-like systems.<br />
<br />
Additional documentation, including an FAQ, is available on the Wget wiki, http://wget.addictivecode.org/Wget.<br />
<br />
= User Experiences =<br />
<br />
* Milligan, Ian. (2012). Automated downloading with Wget. http://programminghistorian.org/lessons/automated-downloading-with-wget<br />
* ArchiveTeam. (2014). Wget. http://www.archiveteam.org/index.php?title=Wget<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|ohloh_id=Wget<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=Warrick&diff=2822Warrick2016-02-26T21:46:58Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose=Warrick is a free utility for reconstructing (or recovering) a website from web archives.<br />
|homepage=https://github.com/oduwsdl/warrick<br />
|image=Warricklogo.gif<br />
|license=[http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html GNU General Public License 2+]<br />
|platforms=<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web Crawl]]<br />
[[Category:Web]]<br />
<br />
= Description =<br />
Warrick is a free utility for reconstructing (or recovering) a website when a back-up is not available. Warrick utilizes the Memento Framework to discover archived versions of resources from web archives. The resources are gathered to provide a single collection of files.<br />
====Provider====<br />
[http://www.harding.edu/fmccown/ Frank McCown] and [http://www.justinfbrunelle.com/ Justin Brunelle]<br />
====Licensing and cost====<br />
[http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html GNU General Public License 2+]<br />
====Standards compliance====<br />
Can use [http://mementoweb.org/about/ Memento] to retrieve archived web content.<br />
<br />
= User Experiences =<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/oduwsdl/warrick/commits/master.atom<br />
|mailing_lists=https://groups.google.com/forum/#!topic/warrickrecovery/<br />
|ohloh_id=Warrick<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=Warrick&diff=2821Warrick2016-02-26T21:46:10Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose=Warrick is a free utility for reconstructing (or recovering) a website when a back-up is not available.<br />
|homepage=https://github.com/oduwsdl/warrick<br />
|image=Warricklogo.gif<br />
|license=[http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html GNU General Public License 2+]<br />
|platforms=<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web Crawl]]<br />
[[Category:Web]]<br />
<br />
= Description =<br />
Warrick is a free utility for reconstructing (or recovering) a website when a back-up is not available. Warrick utilizes the Memento Framework to discover archived versions of resources from web archives. The resources are gathered to provide a single collection of files.<br />
====Provider====<br />
[http://www.harding.edu/fmccown/ Frank McCown] and [http://www.justinfbrunelle.com/ Justin Brunelle]<br />
====Licensing and cost====<br />
[http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html GNU General Public License 2+]<br />
====Standards compliance====<br />
Can use [http://mementoweb.org/about/ Memento] to retrieve archived web content.<br />
<br />
= User Experiences =<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/oduwsdl/warrick/commits/master.atom<br />
|mailing_lists=https://groups.google.com/forum/#!topic/warrickrecovery/<br />
|ohloh_id=Warrick<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=File:Warricklogo.gif&diff=2820File:Warricklogo.gif2016-02-26T21:38:38Z<p>Nullhandle: Warrick logo</p>
<hr />
<div>Warrick logo</div>Nullhandlehttps://coptr.digipres.org/index.php?title=ArchiveFacebook&diff=2819ArchiveFacebook2016-02-26T21:33:04Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose=ArchiveFacebook is a Firefox extension which allows individuals to save and manage Facebook web content.<br />
|image=13993-64.png<br />
|homepage=https://github.com/machawk1/archivefacebook<br />
|license=[https://www.mozilla.org/en-US/MPL/1.1/ Mozilla Public License 1.1]<br />
|platforms=[https://www.mozilla.org/en-US/firefox/new/ Mozilla Firefox]<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web Crawl]]<br />
[[Category:Web]]<br />
<br />
= Description =<br />
ArchiveFacebook is a Firefox extension that helps you to save web pages from Facebook and easily manage them. Save content from Facebook directly to your hard drive and view it exactly the same way you currently view it on Facebook.<br />
<br />
====Provider====<br />
[http://www.cs.odu.edu/~cpi/old/cpi-f2004/aos/carlton.htm Carlton Northern] and [http://www.cs.odu.edu/~mkelly/ Mat Kelly] at Old Dominion University<br />
====Licensing and cost====<br />
[https://www.mozilla.org/en-US/MPL/1.1/ Mozilla Public License 1.1]<br />
====Platform and interoperability====<br />
[https://www.mozilla.org/en-US/firefox/new/ Mozilla Firefox]<br />
====Documentation and user support====<br />
Basic usage notes (and add-on download) are available from the [https://addons.mozilla.org/en-US/firefox/addon/archivefacebook/ Mozilla add-on page for ArchiveFacebook].<br />
<br />
= User Experiences =<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/machawk1/archivefacebook/commits/master.atom<br />
|mailing_lists=http://groups.google.com/group/archivefacebook?hl=en<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=WARCreate&diff=2818WARCreate2016-02-26T21:32:09Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose=Google Chrome browser extension for creating WARC files from web pages<br />
|homepage=https://warcreate.com<br />
|image=Icon.png<br />
|sourcecode=https://github.com/machawk1/warcreate<br />
|license=GPLv3<br />
|platforms=Cross-platform<br />
|language=JavaScript<br />
|formats_out={{Format|WARC}}<br />
}}<br />
<br />
[[Category:Data capture and Deposit]]<br />
[[Category:Personal_Archiving]]<br />
[[Category:Web Crawl]]<br />
[[Category:Web Snapshot]]<br />
<br />
== Description ==<br />
WARCreate is a browser extension for Google Chrome that preserves web pages viewed by end users in the browser as WARC files stored on the user's local disk.<br />
<br />
== User Experiences ==<br />
<!-- Add hotlinks to user experiences with the tool (eg. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. --><br />
* [http://ws-dl.blogspot.com/2013/07/2013-07-10-warcreate-and-wail-warc.html 2013-07-10: WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy]<br />
<br />
== Development ==<br />
<!-- Provide *evidence* of development activity of the tool. For example, RSS feeds for code issues or commits. --><br />
<br />
<!-- Add the Ohloh.com ID for the tool, if known. --><br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/machawk1/warcreate/commits/master.atom<br />
|ohloh_id=warcreate<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=File:Icon.png&diff=2817File:Icon.png2016-02-26T21:28:53Z<p>Nullhandle: WARCreate logo</p>
<hr />
<div>WARCreate logo</div>Nullhandlehttps://coptr.digipres.org/index.php?title=SiteStory&diff=2816SiteStory2016-02-26T21:27:09Z<p>Nullhandle: </p>
<hr />
<div><!-- Use the structure provided in this template, do not change it! --><br />
<br />
{{Infobox_tool<br />
|purpose=SiteStory is a transactional web archive. It archives resources of a web server it is associated with. <br />
|image=SiteStory_test-1.png<br />
|homepage=http://mementoweb.github.io/SiteStory/<br />
|license=[http://mementoweb.github.io/SiteStory/license.html BSD open source software license]<br />
|platforms=Works with Apache Web Server version 2.2 or higher. Tested on GNU/Linux.<br />
}}<br />
<!-- Note that to use the image field, you should leave the value as {{PAGENAMEE}}.png (or similar) and upload a copy of the image. Hot-linking is not supported. If you don't want an image, just remove that line. --><br />
<br />
<!-- Add one or more categories to describe the function of the tool, such as:<br />
[[Category:Metadata Extraction]] or [[Category:Preservation System]] or [[Category:Backup]]<br />
Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left) --><br />
[[Category:Web Crawl]]<br />
<br />
<!-- Add relevant categories to describe the content type that the tool addresses, such as:<br />
[[Category:Audio]] or [[Category:Document]] or [[Category:Research Data]]<br />
Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). If the tool works on any content type, do not add a category. --><br />
[[Category:Web]]<br />
<br />
= Description =<br />
[http://mementoweb.github.io/SiteStory/ SiteStory] is an open-source transactional web archive. It archives resources of a web server it is associated with. As a browser requests a resource published by the server, that resource is delivered to the browser but also pushed into the archive. As a result, a SiteStory web archive contains a copy of all versions of a server's resources that were requested by a web client. A SiteStory archive is accessible via the [http://www.mementoweb.org/guide/rfc/ Memento protocol].<br />
====Provider====<br />
Los Alamos National Laboratory, Research Library<br />
====Licensing and cost====<br />
[http://mementoweb.github.io/SiteStory/license.html BSD open source software license]<br />
====Platform and interoperability====<br />
SiteStory can serve as a transactional web archive for Apache Web Server with version 2.2 or higher. The web archive component can work with other web servers, but requires development of an add-on similar to [http://mementoweb.github.io/SiteStory/getStarted.html mod_sitestory] created for Apache. Tested on GNU/Linux.<br />
====Documentation and user support====<br />
Extensive documentation is available on the [http://mementoweb.github.io/SiteStory/ SiteStory website].<br />
====Standards compliance====<br />
A SiteStory archive is accessible via the [http://www.mementoweb.org/guide/rfc/ Memento protocol] and supports exporting captured resources to [http://en.wikipedia.org/wiki/Web_ARChive WARC files]. <br />
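As a rough illustration of what Memento access involves, the sketch below prepares a datetime-negotiation request of the kind defined by the Memento protocol (RFC 7089), in which a client sends an Accept-Datetime header to a TimeGate. The TimeGate URL is a hypothetical placeholder, not a documented SiteStory endpoint, and the helper names are invented for this example; the request is built but not sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def accept_datetime(moment: datetime) -> str:
    """Format a timezone-aware datetime as an Accept-Datetime value.

    Memento (RFC 7089) expects an RFC 1123 date expressed in GMT.
    """
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def timegate_request(timegate_url: str, moment: datetime) -> Request:
    """Build (but do not send) a datetime-negotiation request."""
    return Request(timegate_url, headers={"Accept-Datetime": accept_datetime(moment)})


if __name__ == "__main__":
    # Hypothetical TimeGate location; a real deployment defines its own URL.
    url = "http://archive.example.org/timegate/http://www.example.org/"
    wanted = datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc)
    request = timegate_request(url, wanted)
    print(request.full_url)
    print(request.header_items())
```

Passing the request to `urllib.request.urlopen` would perform the negotiation against a live TimeGate, which typically answers with a redirect to the archived version closest to the requested time.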
<br />
= User Experiences =<br />
<!-- Add hotlinks to user experiences with the tool (e.g. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. Use a bullet list. --><br />
<br />
= Development Activity =<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/mementoweb/SiteStory/commits/master.atom<br />
|ohloh_id=SiteStory<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=SiteStory&diff=2815SiteStory2016-02-26T21:21:00Z<p>Nullhandle: </p>
<hr />
<div><!-- Use the structure provided in this template, do not change it! --><br />
<br />
{{Infobox_tool<br />
|purpose=SiteStory is a transactional web archive. It archives the resources of the web server with which it is associated.<br />
|image=SiteStory_test-1.png<br />
|homepage=http://mementoweb.github.io/SiteStory/<br />
|license=BSD open source software license<br />
|platforms=Works with Apache Web Server version 2.2 or higher. Tested on GNU/Linux.<br />
}}<br />
<!-- Note that to use the image field, you should leave the value as {{PAGENAMEE}}.png (or similar) and upload a copy of the image. Hot-linking is not supported. If you don't want an image, just remove that line. --><br />
<br />
<!-- Add one or more categories to describe the function of the tool, such as:<br />
[[Category:Metadata Extraction]] or [[Category:Preservation System]] or [[Category:Backup]]<br />
Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left) --><br />
[[Category:Web Crawl]]<br />
<br />
<!-- Add relevant categories to describe the content type that the tool addresses, such as:<br />
[[Category:Audio]] or [[Category:Document]] or [[Category:Research Data]]<br />
Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). If the tool works on any content type, do not add a category. --><br />
[[Category:Web]]<br />
<br />
= Description =<br />
[http://mementoweb.github.io/SiteStory/ SiteStory] is an open-source transactional web archive. It archives the resources of the web server with which it is associated. As a browser requests a resource published by the server, that resource is delivered to the browser and also pushed into the archive. As a result, a SiteStory web archive contains a copy of every version of the server's resources that was requested by a web client. A SiteStory archive is accessible via the [http://www.mementoweb.org/guide/rfc/ Memento protocol].<br />
====Provider====<br />
Los Alamos National Laboratory, Research Library<br />
====Licensing and cost====<br />
[http://mementoweb.github.io/SiteStory/license.html BSD open source software license] <br />
====Platform and interoperability====<br />
SiteStory can serve as a transactional web archive for Apache Web Server version 2.2 or higher. The web archive component can work with other web servers, but this requires developing an add-on similar to the [http://mementoweb.github.io/SiteStory/getStarted.html mod_sitestory] module created for Apache. Tested on GNU/Linux.<br />
====Standards compliance====<br />
A SiteStory archive is accessible via the [http://www.mementoweb.org/guide/rfc/ Memento protocol] and supports exporting captured resources to [http://en.wikipedia.org/wiki/Web_ARChive WARC files]. <br />
<br />
= User Experiences =<br />
<!-- Add hotlinks to user experiences with the tool (e.g. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. Use a bullet list. --><br />
<br />
= Development Activity =<br />
Version 1.0 was released in July 2013.<br />
<!-- Provide *evidence* of development activity of the tool. For example, RSS feeds for code issues or commits. --><br />
<!-- Add the OpenHub.com ID for the tool, if known. --><br />
{{Infobox_tool_details<br />
|releases_rss=<br />
|issues_rss=<br />
|mailing_lists=<br />
|ohloh_id=<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=File:SiteStory_test-1.png&diff=2814File:SiteStory test-1.png2016-02-26T21:20:18Z<p>Nullhandle: SiteStory logo</p>
<hr />
<div>SiteStory logo</div>Nullhandlehttps://coptr.digipres.org/index.php?title=ArchiveFacebook&diff=2813ArchiveFacebook2016-02-26T21:17:27Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose=ArchiveFacebook is a Firefox extension which allows individuals to save and manage Facebook web content.<br />
|image=13993-64.png<br />
|homepage=https://github.com/machawk1/archivefacebook<br />
|license=[https://www.mozilla.org/en-US/MPL/1.1/ Mozilla Public License 1.1]<br />
|platforms=[https://www.mozilla.org/en-US/firefox/new/ Mozilla Firefox]<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web Crawl]]<br />
[[Category:Web]]<br />
<br />
= Description =<br />
ArchiveFacebook is a Firefox extension that helps you save web pages from Facebook and manage them easily. Save content from Facebook directly to your hard drive and view it exactly the same way you currently view it on Facebook.<br />
<br />
====Provider====<br />
[http://www.cs.odu.edu/~cpi/old/cpi-f2004/aos/carlton.htm Carlton Northern] and [http://www.cs.odu.edu/~mkelly/ Mat Kelly] at Old Dominion University<br />
====Licensing and cost====<br />
[https://www.mozilla.org/en-US/MPL/1.1/ Mozilla Public License 1.1]<br />
====Platform and interoperability====<br />
[https://www.mozilla.org/en-US/firefox/new/ Mozilla Firefox]<br />
====Documentation and user support====<br />
Basic usage notes (and add-on download) are available from the [https://addons.mozilla.org/en-US/firefox/addon/archivefacebook/ Mozilla add-on page for ArchiveFacebook].<br />
<br />
= User Experiences =<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/machawk1/archivefacebook/commits/master.atom<br />
|mailing_lists=http://groups.google.com/group/archivefacebook?hl=en<br />
|ohloh_id=ArchiveFacebook<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=HTTrack&diff=2812HTTrack2016-02-26T21:16:04Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose=HTTrack is a website copying utility.<br />
|image=<br />
|homepage=http://www.httrack.com/<br />
|license=[http://www.gnu.org/licenses/quick-guide-gplv3.html GNU General Public License 3+]<br />
|platforms=Android, BSD, Linux, OS X, Windows<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web Crawl]]<br />
[[Category:Web]]<br />
<br />
= Description =<br />
HTTrack is an offline browser utility that lets you download a World Wide Web site from the Internet to a local directory, recursively building all directories and getting HTML, images, and other files from the server to your computer.<br />
<br />
HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online.<br />
<br />
HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.<br />
<br />
WinHTTrack is the Windows 2000/XP/Vista/Seven release of HTTrack, and WebHTTrack the Linux/Unix/BSD release. <br />
<br />
====Provider====<br />
[https://github.com/xroche Xavier Roche]<br />
====Licensing and cost====<br />
[http://www.gnu.org/licenses/quick-guide-gplv3.html GNU General Public License 3+]<br />
====Platform and interoperability====<br />
Android, BSD, Linux, OS X, Windows<br />
====Documentation and user support====<br />
Documentation is both bundled with the program and [https://www.httrack.com/html/index.html available from the HTTrack website]. There is a [https://forum.httrack.com/ forum for user support].<br />
<br />
= User Experiences =<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/xroche/httrack/commits/master.atom<br />
|ohloh_id=HTTrack<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=Template:Infobox_tool&diff=2811Template:Infobox tool2016-02-26T20:26:37Z<p>Nullhandle: </p>
<hr />
<div><table class="infobox formatinfo" border="0" style="float: right; border: 1px solid #aaa; max-width: 33%; overflow: hidden; background-color: #f9f9f9; padding: 0.25em; margin: 0 0.25em 1em;"><br />
{{#if:{{{image|}}}|<br />
<tr><td align="center" colspan="2">[[Image:{{{image}}}|100px|{{PAGENAME}}]]</td></tr><br />
}}<br />
<tr><td align="center" colspan="2">{{{purpose}}}</td></tr><br />
<tr><td><small><b>Homepage:</b></small></td><td><small>{{{homepage|Unknown}}}</small></td></tr><br />
{{#if:{{{sourcecode|}}}|<br />
<tr><td><small><b>Source Code:</b></small></td><td><small>{{{sourcecode|Unknown}}}</small></td></tr><br />
}}<br />
{{#if:{{{license|}}}|<br />
<tr><td><small><b>License:</b></small></td><td><small>{{{license|Unknown}}}</small></td></tr><br />
}}<br />
{{#if:{{{cost|}}}|<br />
<tr><td><small><b>Cost:</b></small></td><td><small>{{{cost|Unknown}}}</small></td></tr><br />
}}<br />
{{#if:{{{platforms|}}}|<br />
<tr><td><small><b>Platforms:</b></small></td><td><small>{{{platforms|Unknown}}}</small></td></tr><br />
}}<br />
{{#if:{{{language|}}}|<br />
<tr><td><small><b>Language:</b></small></td><td><small>{{{language|Unknown}}}</small></td></tr><br />
}}<br />
{{#if:{{{formats_in|}}}|<br />
<tr><td><small><b>Input Formats:</b></small></td><td><small>{{{formats_in|Unknown}}}</small></td></tr><br />
}}<br />
{{#if:{{{formats_out|}}}|<br />
<tr><td><small><b>Output Formats:</b></small></td><td><small>{{{formats_out|Unknown}}}</small></td></tr><br />
}}<br />
</table><br />
<br />
[[Category:Tools]]<br />
<br />
{{#if:{{{license|}}}|<br />
|<br />
[[Category:Unknown_license]]<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=ArchiveFacebook&diff=2810ArchiveFacebook2016-02-26T20:25:59Z<p>Nullhandle: </p>
<hr />
<div>{{Infobox_tool<br />
|purpose=ArchiveFacebook is a Firefox extension which allows individuals to save and manage Facebook web content.<br />
|image=13993-64.png<br />
|homepage=https://github.com/machawk1/archivefacebook<br />
|license=[https://www.mozilla.org/en-US/MPL/1.1/ Mozilla Public License 1.1]<br />
|platforms=[https://www.mozilla.org/en-US/firefox/new/ Mozilla Firefox]<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web Crawl]]<br />
[[Category:Web]]<br />
<br />
= Description =<br />
ArchiveFacebook is a Firefox extension which allows individuals to save and manage Facebook web content.<br />
<br />
====Provider====<br />
[http://www.cs.odu.edu/~cpi/old/cpi-f2004/aos/carlton.htm Carlton Northern] and [http://www.cs.odu.edu/~mkelly/ Mat Kelly] at Old Dominion University<br />
====Licensing and cost====<br />
[https://www.mozilla.org/en-US/MPL/1.1/ Mozilla Public License 1.1]<br />
====Platform and interoperability====<br />
[https://www.mozilla.org/en-US/firefox/new/ Mozilla Firefox]<br />
====Documentation and user support====<br />
Basic usage notes (and add-on download) are available from the [https://addons.mozilla.org/en-US/firefox/addon/archivefacebook/ Mozilla add-on page for ArchiveFacebook].<br />
<br />
= User Experiences =<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/machawk1/archivefacebook/commits/master.atom<br />
|mailing_lists=http://groups.google.com/group/archivefacebook?hl=en<br />
|ohloh_id=ArchiveFacebook<br />
}}</div>Nullhandlehttps://coptr.digipres.org/index.php?title=File:13993-64.png&diff=2809File:13993-64.png2016-02-26T20:17:00Z<p>Nullhandle: ArchiveFacebook logo</p>
<hr />
<div>ArchiveFacebook logo</div>Nullhandlehttps://coptr.digipres.org/index.php?title=ArchiveFacebook&diff=2808ArchiveFacebook2016-02-26T20:15:58Z<p>Nullhandle: Created page with "{{Infobox_tool |purpose=ArchiveFacebook is a Firefox extension which allows individuals to save and manage Facebook web content. |image=https://addons.cdn.mozilla.net/user-med..."</p>
<hr />
<div>{{Infobox_tool<br />
|purpose=ArchiveFacebook is a Firefox extension which allows individuals to save and manage Facebook web content.<br />
|image=https://addons.cdn.mozilla.net/user-media/addon_icons/13/13993-64.png<br />
|homepage=https://github.com/machawk1/archivefacebook<br />
|license=[https://www.mozilla.org/en-US/MPL/1.1/ Mozilla Public License 1.1]<br />
|platforms=[https://www.mozilla.org/en-US/firefox/new/ Mozilla Firefox]<br />
}}<br />
<br />
<!-- Delete the Categories that do not apply --><br />
[[Category:Web Crawl]]<br />
[[Category:Web]]<br />
<br />
= Description =<br />
ArchiveFacebook is a Firefox extension which allows individuals to save and manage Facebook web content.<br />
<br />
====Provider====<br />
[http://www.cs.odu.edu/~cpi/old/cpi-f2004/aos/carlton.htm Carlton Northern] and [http://www.cs.odu.edu/~mkelly/ Mat Kelly] at Old Dominion University<br />
====Licensing and cost====<br />
[https://www.mozilla.org/en-US/MPL/1.1/ Mozilla Public License 1.1]<br />
====Platform and interoperability====<br />
[https://www.mozilla.org/en-US/firefox/new/ Mozilla Firefox]<br />
====Documentation and user support====<br />
Basic usage notes (and add-on download) are available from the [https://addons.mozilla.org/en-US/firefox/addon/archivefacebook/ Mozilla add-on page for ArchiveFacebook].<br />
<br />
= User Experiences =<br />
<br />
= Development Activity =<br />
<br />
{{Infobox_tool_details<br />
|releases_rss=https://github.com/machawk1/archivefacebook/commits/master.atom<br />
|mailing_lists=http://groups.google.com/group/archivefacebook?hl=en<br />
|ohloh_id=ArchiveFacebook<br />
}}</div>Nullhandle