wget is a Linux command line utility for retrieving and downloading files from the internet. It uses protocols such as FTP, HTTP and HTTPS for such tasks. In this article, we discuss commonly used wget commands and options.
Table of contents.
- Introduction.
- Syntax.
- Commands.
- Summary.
- References.
Introduction.
wget is a Linux command line utility for retrieving or downloading files from the internet over protocols such as FTP, HTTP and HTTPS.
wget works even over slow and unstable network connections, and if we lose a connection, we can resume the download from where it stopped. It can also be used to mirror websites.
Syntax.
The syntax is as follows,
wget [OPTION] [URL]
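To confirm that wget is installed and check which version is available, we can run,
wget --version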
Commands.
To download a file from a URL, e.g. https://opengenus.org/notes.pdf, we write,
wget https://opengenus.org/notes.pdf
For a silent download we use the -q option,
wget -q https://opengenus.org/notes.pdf
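In shell scripts, -q is often combined with -O- to write the fetched content to standard output instead of a file, so it can be piped to another command; for example, with a placeholder URL,
wget -qO- https://www.url.com | head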
If we want to download the file into a specific directory we write,
wget -P ~/Documents/notes https://opengenus.org/notes.pdf
where ~/Documents/notes is the directory in which the downloaded file will be stored.
We can also save the downloaded file under a different name by using the -O option,
wget -O newFile.pdf https://www.url.com/notes.pdf
The command will download notes.pdf and save it as newFile.pdf.
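Since -O accepts a path, we can rename the file and choose its destination in one step; note that when -O is given, wget writes to exactly that path and the -P prefix is not applied. For example,
wget -O ~/Documents/notes/newFile.pdf https://www.url.com/notes.pdf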
We can also download multiple files by saving their URLs in a file, e.g.
vim urls.txt
Enter the following URLs,
https://www.url1.com/pdf
https://www.url2.com
https://www.url3.com/documents
To fetch the files, we write,
wget -i urls.txt
wget will fetch the files from the three URLs listed in urls.txt.
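If the file argument to -i is -, wget reads the URLs from standard input, which is useful when another command produces the list; for example,
cat urls.txt | wget -i -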
We can also opt to write all URLs in the command itself as follows,
wget https://www.url1.com/pdf https://www.url2.com https://www.url3.com/documents
We can also download multiple files of the same type from the same source using the wildcard character * as follows; note that this kind of globbing only works with FTP URLs,
wget ftp://exampleserver/music/*.mp3
The above command downloads all files with the .mp3 file extension.
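Over HTTP, where globbing is not supported, a rough equivalent is a recursive download with an accept list via the -A (--accept) option; a sketch, assuming the .mp3 files are linked from the page at the placeholder URL,
wget -r -np -A '*.mp3' https://www.url.com/music/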
Assuming we want to proceed with other work, wget can run in the background by using the -b option as follows,
wget -b https://www.url.com/files
We can define a log file where information pertaining to the download will be stored while wget works in the background; by default it logs to wget-log, and the -o option names a different file,
wget -b -o info.log https://www.url.com/files
The -o option also logs messages to a file for a foreground download,
wget https://www.url.com/files -o info.log
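To append to an existing log instead of overwriting it, we use the -a (--append-output) option,
wget https://www.url.com/files -a info.log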
If a download gets interrupted, we can use the -c option to resume it as follows,
wget -c https://www.url.com/image.iso
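On unstable connections, -c can be combined with -t (--tries) to control how many times wget retries; a value of 0 means retry indefinitely. For example,
wget -c -t 0 https://www.url.com/image.iso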
We can also limit the download speed by using the --limit-rate option as follows,
wget --limit-rate=128k https://www.url.com/image.iso
The above command limits the download speed to 128 kilobytes per second.
Rates can be expressed in bytes, in kilobytes with the k suffix, or in megabytes with the m suffix.
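For instance, to cap the download at one megabyte per second,
wget --limit-rate=1m https://www.url.com/image.iso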
A remote resource may be configured to block wget user agents. To bypass such a situation, we can use the -U (--user-agent) option to emulate a different browser as follows,
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36" http://wget-forbidden.com/
Where we define the user agent as 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'.
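The same command can be written with the short -U form,
wget -U "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36" http://wget-forbidden.com/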
We can also download over the FTP protocol from a password-protected FTP server; we write,
wget --ftp-user=USERNAME --ftp-password=PASSWORD ftp://ftp.example.com/filename.zip
where 'USERNAME' is a placeholder for the actual username and 'PASSWORD' a placeholder for the actual password.
And to download files from an HTTP server which is password protected, we write,
wget --http-user=USERNAME --http-password=PASSWORD http://http.example.com/filename.zip
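Since passwords given on the command line end up in the shell history and the process list, wget also provides --ask-password, which prompts for the password interactively instead,
wget --http-user=USERNAME --ask-password http://http.example.com/filename.zip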
Mirroring a website involves creating a local copy of it by downloading the HTML, CSS and JavaScript and following the website's internal links. We use the -m option,
wget -m https://www.mirrorwebsite.com
If we want to browse the downloaded mirror, we add the -k option, which converts the links so they are suitable for local viewing, and the -p option, which downloads all files needed to display the HTML pages.
wget -mkp https://www.mirrorwebsite.com
We can also write,
wget -rpEHk --restrict-file-names=windows -D mirrorwebsite.com -np https://www.mirrorwebsite.com
Where
- -r, --recursive, to download the whole site.
- -p, --page-requisites, to get all assets such as HTML, CSS and JS.
- -E, --adjust-extension, to save files with the necessary extensions, e.g. .html.
- -H, --span-hosts, to include necessary assets from other hosts.
- -k, --convert-links, to update links so that they still work in the local version.
- --restrict-file-names=windows, to modify file names so that they also work in a Windows environment.
- -D, --domains mirrorwebsite.com, so that wget does not follow links outside this domain.
- -np, --no-parent, so that wget does not ascend to the parent directory when retrieving recursively.
If we want to download a file over the HTTPS protocol and skip the SSL certificate check, we use the --no-check-certificate option,
wget --no-check-certificate https://domain-with-invalid-ssl.com
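Rather than disabling verification entirely, if we have the signing CA's certificate we can point wget at it with --ca-certificate; a sketch, where ca.pem is a hypothetical certificate file,
wget --ca-certificate=ca.pem https://domain-with-invalid-ssl.com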
Summary.
We use the wget utility for downloading files, resuming interrupted downloads, mirroring websites, controlling download speeds, and more.
wget is a non-interactive network downloader; since it does not require user interaction, it is well suited for use in shell scripts.
References.
- Execute wget --help or see man wget for the full manual.
- The curl command, a related command line tool for transferring data.