Difference between revisions of "GNU Wget"

From COPTR
(Import from spreadsheet via script.)
 
(9 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{Infobox_tool
+
{{Infobox tool
|purpose=GNU Wget is a free software package for retrieving files using HTTP,  HTTPS and FTP,  the most widely-used Internet protocols.
+
|image=Gnu2.png
|image=
+
|purpose=Non-interactive network downloader
 
|homepage=http://www.gnu.org/software/wget/
 
|homepage=http://www.gnu.org/software/wget/
|license=
+
|license=GNU General Public License
|platforms=
+
|platforms=Unix, Linux, Windows, Macintosh
 +
|function=Web Capture
 +
|content=Web
 
}}
 
}}
 +
{{Infobox tool details
 +
|ohloh_id=Wget
 +
}}
 +
= Description =
 +
GNU Wget is a free software package for retrieving files using HTTP,  HTTPS and FTP,  the most widely-used Internet protocols. It is a non-interactive command line tool,  so it may easily be called from scripts,  cron jobs,  terminals without X-Windows support,  etc.
 +
 +
=== Features ===
 +
 +
From the Wget manual:
 +
 +
* Wget is non-interactive, meaning that it can work in the background, while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting Wget finish the work. By contrast, most of the Web browsers require constant user’s presence, which can be a great hindrance when transferring a lot of data.
 +
* Wget can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as “recursive downloading.” While doing that, Wget respects the Robot Exclusion Standard (/robots.txt).  Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
 +
* File name wildcard matching and recursive mirroring of directories are available when retrieving via FTP. Wget can read the time-stamp information given by both HTTP and FTP servers, and store it locally. Thus Wget can see if the remote file has changed since last retrieval, and automatically retrieve the new version if it has. This makes Wget suitable for mirroring of FTP sites, as well as home pages.
 +
* Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.
 +
* Wget supports proxy servers, which can lighten the network load, speed up retrieval and provide access behind firewalls. Wget uses the passive FTP downloading by default, active FTP being an option.
 +
* Wget supports IP version 6, the next generation of IP. IPv6 is autodetected at compile-time, and can be disabled at either build or run time. Binaries built with IPv6 support work well in both IPv4-only and dual family environments.
 +
* Built-in features offer mechanisms to tune which links you wish to follow (see Following Links).
 +
* The progress of individual downloads is traced using a progress gauge. Interactive downloads are tracked using a “thermometer”-style gauge, whereas non-interactive ones are traced with dots, each dot representing a fixed amount of data received (1KB by default). Either gauge can be customized to your preferences.
 +
* Most of the features are fully configurable, either through command line options, or via the initialization file .wgetrc (see Startup File). Wget allows you to define global startup files (/usr/local/etc/wgetrc by default) for site settings. You can also specify the location of a startup file with the --config option.
 +
* Finally, GNU Wget is free software. This means that everyone may use it, redistribute it and/or modify it under the terms of the GNU General Public License, as published by the Free Software Foundation (see the file COPYING that came with GNU Wget, for details).
 +
 +
As of version 1.14, Wget supports WARC output. See http://www.archiveteam.org/index.php?title=Wget_with_WARC_output for details of the development of this feature.
 +
 +
=== Platform ===
 +
 +
GNU Wget can be installed on Unix-like systems (UNIX, Linux), Mac OS, and Windows computers.
 +
 +
=== Installation ===
 +
 +
* Unix-like systems: Most package managers include Wget, but they may not include the latest version. To get a later version with support for WARC, for example, Linux and UNIX users should compile the latest version of the source code following the instructions at http://wget.addictivecode.org/FrequentlyAskedQuestions#How_do_I_compile_Wget.3F.
  
<!-- Delete the Categories that do not apply -->
+
* Macintosh: The default Mac OS does not include Wget. Source code can be compiled for Mac OS X or users can install an alternative package manager such as Homebrew (Homebrew installs the latest version by default). See http://coolestguidesontheplanet.com/install-and-configure-wget-on-os-x/ for instructions on how to install from source.
[[Category:Web Crawl]]
 
  
 +
* Windows: packages for later versions of Wget compiled for Windows are available at http://eternallybored.org/misc/wget/.
  
= Description =
+
===Documentation===
GNU Wget is a free software package for retrieving files using HTTP,  HTTPS and FTP,  the most widely-used Internet protocols. It is a non-interactive commandline tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support,  etc.  
+
The user manual is available at http://www.gnu.org/software/wget/manual/wget.html. The manual is also available via man wget in Unix-like systems.
 +
 
 +
Additional documentation, including an FAQ, is available on the Wget wiki, http://wget.addictivecode.org/Wget.
  
 
= User Experiences =
 
= User Experiences =
  
 +
* Milligan, Ian. (2012). Automated downloading with Wget. http://programminghistorian.org/lessons/automated-downloading-with-wget
 +
* ArchiveTeam. (2014). Wget. http://www.archiveteam.org/index.php?title=Wget
  
 
= Development Activity =
 
= Development Activity =
 
{{Infobox_tool_details
 
|ohloh_id=GNU Wget
 
}}
 

Latest revision as of 15:56, 26 November 2021



GNU Wget
Non-interactive network downloader
Homepage:http://www.gnu.org/software/wget/
License:GNU General Public License
Platforms:Unix, Linux, Windows, Macintosh
Function:Web Capture
Content type:Web
Appears in COW:PDF/A validation and metadata extraction




Description

GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive command line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.

Features

From the Wget manual:

  • Wget is non-interactive, meaning that it can work in the background, while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting Wget finish the work. By contrast, most of the Web browsers require constant user’s presence, which can be a great hindrance when transferring a lot of data.
  • Wget can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as “recursive downloading.” While doing that, Wget respects the Robot Exclusion Standard (/robots.txt). Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
  • File name wildcard matching and recursive mirroring of directories are available when retrieving via FTP. Wget can read the time-stamp information given by both HTTP and FTP servers, and store it locally. Thus Wget can see if the remote file has changed since last retrieval, and automatically retrieve the new version if it has. This makes Wget suitable for mirroring of FTP sites, as well as home pages.
  • Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.
  • Wget supports proxy servers, which can lighten the network load, speed up retrieval and provide access behind firewalls. Wget uses the passive FTP downloading by default, active FTP being an option.
  • Wget supports IP version 6, the next generation of IP. IPv6 is autodetected at compile-time, and can be disabled at either build or run time. Binaries built with IPv6 support work well in both IPv4-only and dual family environments.
  • Built-in features offer mechanisms to tune which links you wish to follow (see Following Links).
  • The progress of individual downloads is traced using a progress gauge. Interactive downloads are tracked using a “thermometer”-style gauge, whereas non-interactive ones are traced with dots, each dot representing a fixed amount of data received (1KB by default). Either gauge can be customized to your preferences.
  • Most of the features are fully configurable, either through command line options, or via the initialization file .wgetrc (see Startup File). Wget allows you to define global startup files (/usr/local/etc/wgetrc by default) for site settings. You can also specify the location of a startup file with the --config option.
  • Finally, GNU Wget is free software. This means that everyone may use it, redistribute it and/or modify it under the terms of the GNU General Public License, as published by the Free Software Foundation (see the file COPYING that came with GNU Wget, for details).
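
Because Wget is non-interactive, several of the features listed above can be combined into one command and launched from a script. The following is a minimal sketch in Python, not an official recipe: the function name `build_mirror_command`, the URL, and the destination directory are hypothetical, and it assumes a `wget` binary is available on the PATH when the command is eventually run. The block only builds and prints the command line.

```python
import shlex


def build_mirror_command(url, dest_dir="mirror", tries=20):
    """Return an argument list for a recursive Wget download.

    The function name and defaults are illustrative assumptions.
    """
    return [
        "wget",
        "--recursive",          # follow links found in retrieved pages
        "--level=5",            # maximum recursion depth
        "--no-parent",          # do not ascend above the starting directory
        "--page-requisites",    # also fetch images, stylesheets, and similar files
        "--convert-links",      # rewrite links to point at the local copies
        "--timestamping",       # re-fetch only files newer than the local copy
        f"--tries={tries}",     # retry count for failed downloads
        "--progress=dot",       # dot-style gauge, suited to log files
        f"--directory-prefix={dest_dir}",
        url,
    ]


# Print the command instead of running it, so nothing is downloaded here.
command = build_mirror_command("https://example.org/docs/")
print(shlex.join(command))
```

To start the download, the same list could be passed to `subprocess.run(command)`. Building the arguments as a list rather than a single string avoids shell quoting problems with unusual URLs.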

As of version 1.14, Wget supports WARC output. See http://www.archiveteam.org/index.php?title=Wget_with_WARC_output for details of the development of this feature.
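
Since WARC output depends on the installed version, a script may want to check the version before requesting it. The sketch below is an illustration under stated assumptions: it assumes the first line of `wget --version` has the form `GNU Wget 1.21.2 built on linux-gnu.`, and the helper names (`parse_wget_version`, `supports_warc`, `build_warc_command`) and the archive name are hypothetical.

```python
import re


def parse_wget_version(version_line):
    """Extract (major, minor) from a line such as 'GNU Wget 1.21.2 built on ...'."""
    match = re.search(r"GNU Wget (\d+)\.(\d+)", version_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_warc(version_line):
    """Return True when the reported version is 1.14 or later."""
    version = parse_wget_version(version_line)
    return version is not None and version >= (1, 14)


def build_warc_command(url, warc_name):
    """Return an argument list that records the retrieval in a WARC file."""
    return [
        "wget",
        "--page-requisites",          # include images and stylesheets in the capture
        f"--warc-file={warc_name}",   # base name of the WARC file to write
        "--warc-cdx",                 # also write a CDX index alongside it
        url,
    ]


print(supports_warc("GNU Wget 1.21.2 built on linux-gnu."))
print(build_warc_command("https://example.org/", "example-capture"))
```

In practice the version line could be read with `subprocess.run(["wget", "--version"], capture_output=True, text=True)` and the first line of its output passed to `supports_warc`.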

Platform

GNU Wget can be installed on Unix-like systems (UNIX, Linux), Mac OS, and Windows computers.

Installation

  • Unix-like systems: Most package managers include Wget, but they may not include the latest version. To get a later version with support for WARC, for example, Linux and UNIX users should compile the latest version of the source code following the instructions at http://wget.addictivecode.org/FrequentlyAskedQuestions#How_do_I_compile_Wget.3F.
  • Macintosh: The default Mac OS does not include Wget. Source code can be compiled for Mac OS X or users can install an alternative package manager such as Homebrew (Homebrew installs the latest version by default). See http://coolestguidesontheplanet.com/install-and-configure-wget-on-os-x/ for instructions on how to install from source.
  • Windows: packages for later versions of Wget compiled for Windows are available at http://eternallybored.org/misc/wget/.

Documentation

The user manual is available at http://www.gnu.org/software/wget/manual/wget.html. The manual is also available via man wget in Unix-like systems.

Additional documentation, including an FAQ, is available on the Wget wiki, http://wget.addictivecode.org/Wget.

User Experiences

  • Milligan, Ian. (2012). Automated downloading with Wget. http://programminghistorian.org/lessons/automated-downloading-with-wget
  • ArchiveTeam. (2014). Wget. http://www.archiveteam.org/index.php?title=Wget

Development Activity