wget is a Linux command line utility for retrieving and downloading files from the internet. It uses protocols such as FTP, HTTP and HTTPS for such tasks. In this article, we discuss commonly used wget commands and options.
Table of contents.
- Introduction.
- Syntax.
- Commands.
- Summary.
- References.
Introduction.
wget is a Linux command line utility for retrieving or downloading files from the internet over protocols such as FTP, HTTP and HTTPS.
wget works even over slow and unstable network connections, and if we lose a connection, we can resume the download from where it stopped. It can also be used to mirror websites.
Syntax.
The syntax is as follows,
wget [OPTION] [URL]
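To confirm that wget is installed and check which version is available, we can run,
wget --version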
Commands.
To download a file from a URL, e.g. https://opengenus.org/notes.pdf, we write,
wget https://opengenus.org/notes.pdf
For a silent download we use the -q option,
wget -q https://opengenus.org/notes.pdf
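In shell scripts, -q is often combined with -O- to write the fetched content to standard output instead of a file, so it can be piped to another command; for example, with a placeholder URL,
wget -qO- https://www.url.com | head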
If we want to download the file into a specific directory we write,
wget -P ~/Documents/notes https://opengenus.org/notes.pdf
where ~/Documents/notes is the directory in which the downloaded file will be stored.
We can also save the downloaded file under a different name by using the -O option,
wget -O newFile.pdf https://www.url.com/notes.pdf
The command will download notes.pdf and save it as newFile.pdf.
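Since -O accepts a path, we can rename the file and choose its destination in one step; note that when -O is given, wget writes to exactly that path and the -P prefix is not applied. For example,
wget -O ~/Documents/notes/newFile.pdf https://www.url.com/notes.pdf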
We can also download multiple files by saving their URLs in a file, e.g.
vim urls.txt
Enter the following URLs,
https://www.url1.com/pdf
https://www.url2.com
https://www.url3.com/documents
To fetch the files, we write,
wget -i urls.txt
wget will fetch the files from the three URLs listed in urls.txt.
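If the file argument to -i is -, wget reads the URLs from standard input, which is useful when another command produces the list; for example,
cat urls.txt | wget -i -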
We can also opt to write all URLs in the command itself as follows,
wget https://www.url1.com/pdf https://www.url2.com https://www.url3.com/documents
We can also download multiple files of the same type from the same source using the wildcard character * as follows; note that this kind of globbing only works with FTP URLs,
wget ftp://exampleserver/music/*.mp3
The above command downloads all files with the .mp3 file extension.
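Over HTTP, where globbing is not supported, a rough equivalent is a recursive download with an accept list via the -A (--accept) option; a sketch, assuming the .mp3 files are linked from the page at the placeholder URL,
wget -r -np -A '*.mp3' https://www.url.com/music/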
Assuming we want to proceed with other work, wget can run in the background by using the -b option as follows,
wget -b https://www.url.com/files
We can define a log file where information pertaining to the download will be stored while wget works in the background; by default it logs to wget-log, and the -o option names a different file,
wget -b -o info.log https://www.url.com/files
The -o option also logs messages to a file for a foreground download,
wget https://www.url.com/files -o info.log
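To append to an existing log instead of overwriting it, we use the -a (--append-output) option,
wget https://www.url.com/files -a info.log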
If a download gets interrupted, we can use the -c option to resume it as follows,
wget -c https://www.url.com/image.iso
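On unstable connections, -c can be combined with -t (--tries) to control how many times wget retries; a value of 0 means retry indefinitely. For example,
wget -c -t 0 https://www.url.com/image.iso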
We can also limit the download speed by using the --limit-rate option as follows,
wget --limit-rate=128k https://www.url.com/image.iso
The above command limits the download speed to 128 kilobytes per second.
Rates can be expressed in bytes, in kilobytes with the k suffix, or in megabytes with the m suffix.
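For instance, to cap the download at one megabyte per second,
wget --limit-rate=1m https://www.url.com/image.iso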
A remote resource may be configured to block wget user agents. To bypass such a situation, we can use the -U (--user-agent) option to emulate a different browser as follows,
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36" http://wget-forbidden.com/
Where we define the user agent as 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'.
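The same command can be written with the short -U form,
wget -U "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36" http://wget-forbidden.com/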
We can also download over the FTP protocol from a password-protected FTP server; we write,
wget --ftp-user=USERNAME --ftp-password=PASSWORD ftp://ftp.example.com/filename.zip
where 'USERNAME' is a placeholder for the actual username and 'PASSWORD' a placeholder for the actual password.
And to download files from an HTTP server which is password protected, we write,
wget --http-user=USERNAME --http-password=PASSWORD http://http.example.com/filename.zip
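Since passwords given on the command line end up in the shell history and the process list, wget also provides --ask-password, which prompts for the password interactively instead,
wget --http-user=USERNAME --ask-password http://http.example.com/filename.zip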
Mirroring a website involves creating a local copy of it by downloading the HTML, CSS and JavaScript and following the website's internal links. We use the -m option,
wget -m https://www.mirrorwebsite.com
If we want to browse the downloaded mirror, we add the -k option, which converts the links so they are suitable for local viewing, and the -p option, which downloads all files needed to display the HTML pages.
wget -mkp https://www.mirrorwebsite.com
We can also write,
wget -rpEHk --restrict-file-names=windows -D mirrorwebsite.com -np https://www.mirrorwebsite.com
Where
- -r, --recursive, to download the whole site.
- -p, --page-requisites, to get all assets such as HTML, CSS and JS.
- -E, --adjust-extension, to save files with the necessary extensions, e.g. .html.
- -H, --span-hosts, to include necessary assets from other hosts.
- -k, --convert-links, to update links so that they still work in the local version.
- --restrict-file-names=windows, to modify file names so that they also work in a Windows environment.
- -D, --domains mirrorwebsite.com, so that wget does not follow links outside this domain.
- -np, --no-parent, so that wget does not ascend to the parent directory when retrieving recursively.
If we want to download a file over the HTTPS protocol and skip the SSL certificate check, we use the --no-check-certificate option,
wget --no-check-certificate https://domain-with-invalid-ssl.com
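Rather than disabling verification entirely, if we have the signing CA's certificate we can point wget at it with --ca-certificate; a sketch, where ca.pem is a hypothetical certificate file,
wget --ca-certificate=ca.pem https://domain-with-invalid-ssl.com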
Summary.
We use the wget utility for downloading files, resuming interrupted downloads, mirroring websites, controlling download speeds, and more.
wget is a non-interactive network downloader; since it does not require user interaction, it is well suited for use in shell scripts.
References.
- Execute wget --help or see man wget for the full manual.
- The curl command, a related command line tool for transferring data.