Learn to use the GNU Tar archiving tool for Unix-like systems


Reading time: 30 minutes | Coding time: 5 minutes

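GNU Tar (short for "tape archiver") is an open source tool that bundles files and directories into a single archive file and extracts them again. Tar does not compress data by itself, but it can pass the archive through a compressor such as gzip to produce a .tar.gz file. In this article, we will explore how to use it along with its different options.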

We will cover the following sub-topics:

  1. Create a .tar archive file
  2. Extract a .tar archive file
  3. List the contents of archive file
  4. Append files at end of archive files
  5. Creating a single archive file for multiple file systems
  6. Create a .tar.gz archive file
  7. Extract a .tar.gz archive file
  8. Shell script to understand .tar versus .tar.gz compressed files
  9. Checking diff between archive file and source file system
  10. Updating archive after changes in file system

Syntax:

tar -[options] -f {archive name} [files or directories]

Options:

-A, --catenate, --concatenate: appends the contents of other tar archives to the end of an archive
-c, --create: creates a new archive (add -z to produce a .tar.gz)
-d, --diff, --compare: compares the members of an archive with the files on the file system
--delete: deletes members from the archive
-r, --append: appends files to the end of an archive
-t, --list: lists the files stored in an archive
-u, --update: appends only those files that are newer than the copy in the archive or are not in the archive yet
-x, --extract, --get: extracts files from an archive
-v, --verbose: lists each file as it is processed
-z, --gzip: filters the archive through gzip, which creates or reads a .tar.gz file
-f, --file: names the archive file to operate on

Note: Append (-r) adds individual files to the end of an archive, while concatenation (-A) adds the contents of another tar archive to the end of an archive.
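The difference between the two can be sketched as follows. This is a minimal example that assumes GNU tar; the scratch directory and the file names a.txt, b.txt and c.txt are hypothetical sample data, not part of the original walkthrough.

```shell
set -eu
work=$(mktemp -d)            # scratch directory (assumption: mktemp is available)
cd "$work"
echo "one" > a.txt
echo "two" > b.txt
echo "three" > c.txt

tar -cf first.tar a.txt      # first archive holds a.txt
tar -cf second.tar b.txt     # second archive holds b.txt

# -r appends a plain file to first.tar
tar -rf first.tar c.txt
# -A appends the members of second.tar to first.tar
tar -Af first.tar second.tar

# List the members of first.tar in stored order
tar -tf first.tar
```

Appending second.tar with -r instead would store the archive file itself as a single member named second.tar.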

1. Create a .tar archive file

Syntax:

tar -c[v]f {destination path}/{archive name}.tar {files or directories}

Here,

  • -c: creates a new .tar archive
  • -v: optional; verbose mode lists each file as it is added to the archive
  • -f: specifies the name of the archive file and must be followed immediately by that name
  • tar creates an archive called {archive name}.tar and stores it in the destination path. A plain .tar archive is not compressed.
  • The files or directories to archive can be given as absolute or relative paths. GNU tar removes the leading / from absolute paths when it stores them in the archive.

Implementation:

tar cvf compress.tar /home/nishkarshraj/Desktop/HelloWorld
  • A HelloWorld directory exists at the absolute path /home/nishkarshraj/Desktop, where nishkarshraj is a user on the Linux machine.
  • tar packs the files into an archive file compress.tar and, because of v, lists each file as it is added.
  • Since no destination path is given for compress.tar, it is stored in the directory where the command is run, which is the root directory (/) in this example.

[Screenshot: creating compress.tar with tar cvf]

  • The first ls command shows the content of the current directory, which has no archive files.
  • The tar command archives the HelloWorld directory at the specified path and creates compress.tar in the current directory.
  • The second ls command shows that compress.tar now exists in the current directory.
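The same steps can be tried without touching real data. The sketch below assumes GNU tar and builds a hypothetical HelloWorld directory (with the file names Intro.md and test.txt used later in this article) inside a scratch directory.

```shell
set -eu
work=$(mktemp -d)            # scratch directory, so nothing is written to /
cd "$work"
mkdir HelloWorld
echo "# Intro" > HelloWorld/Intro.md
echo "hello" > HelloWorld/test.txt

# -c create, -v list each member as it is added, -f name of the archive
tar -cvf compress.tar HelloWorld

# compress.tar now sits next to the HelloWorld directory
ls
```

A relative path is used here on purpose: the member names in the archive then start at HelloWorld/ rather than at the full path.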

2. Extract a .tar archive file

Syntax:

tar -x[v]f {path to}/{filename}.tar
  • -x: tells tar to extract the archive
  • -v: optional; lists each file as it is extracted
  • -f: specifies the archive file to extract
  • tar extracts the members of filename.tar relative to the current directory and recreates the directory structure stored in the archive. Use -C {directory} to extract somewhere else.

Note: tar stores each member with its path rather than just its file name. A file archived as /home/Desktop/file1.txt is stored as home/Desktop/file1.txt (GNU tar strips the leading / by default), not as file1.txt. Extracting the archive from the root directory (/) therefore puts the file back in its original location, while extracting from any other directory recreates home/Desktop/file1.txt below that directory.

Implementation:

tar xvf compress.tar
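  • tar reads the archive at the given path (no directory prefix is given here, so compress.tar is looked up in the current directory) and recreates the stored paths below the current directory. Because the command is run from the root directory (/), the files land in the same location from which they were archived.

[Screenshot: extracting compress.tar with tar xvf]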

Working on the same compress.tar file which was created in Task 1:

  • First list content of /home/nishkarshraj/Desktop using the ls command and check that it does not contain the HelloWorld Directory.
  • Use the tar tool to extract the files from archive.
  • List content of /home/nishkarshraj/Desktop again to verify HelloWorld directory is created.
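To see where extracted files end up, the following sketch archives a directory by its absolute path and extracts it into a separate folder. It assumes GNU tar; the directories src and out and the scratch directory are hypothetical names chosen for illustration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p src/HelloWorld out
echo "hello" > src/HelloWorld/test.txt

# Archive by absolute path; GNU tar warns on stderr and drops the leading /
tar -cf compress.tar "$work/src/HelloWorld"

# Member names are stored without the leading slash
tar -tf compress.tar

# -C out: extract below out/ instead of the current directory
tar -xf compress.tar -C out

# The stored path is recreated underneath out/
find out -name test.txt
```

Running the extract step from / without -C would restore the files to their original location, which is what the walkthrough above does.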

3. List contents of the archive file

It is possible to see the individual files present in the archive file using tar command.

Syntax:

tar -tf filename.tar
  • -t: lists the contents of the archive
  • -f: specifies the archive file to read

[Screenshot: listing an archive with tar -tf]
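A short sketch of the listing variants, assuming GNU tar and a hypothetical HelloWorld directory created in a scratch directory:

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir HelloWorld
echo "# Intro" > HelloWorld/Intro.md
echo "hello" > HelloWorld/test.txt
tar -cf compress.tar HelloWorld

# Member names only
tar -tf compress.tar

# With -v: permissions, owner, size and modification time as well
tar -tvf compress.tar

# Restrict the listing to one member by naming it
tar -tf compress.tar HelloWorld/test.txt
```

Listing an archive before extracting it is a cheap way to check which paths it will create.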

4. Append files at the end of archive files

It is possible to append files at the end of archive files using tar command.

Syntax:

tar -rf {filename}.tar {file to be attached}

or

tar --append -f {filename}.tar {file to be attached}
  • -r or --append: appends the specified file to the end of the archive. It works only on uncompressed .tar archives, not on .tar.gz files.

[Screenshot: appending a file with tar -rf]

Here, the following commands are used in the shell:

  • tar -cvf file.tar /home/nishkarshraj/Desktop/HelloWorld
    Creates an archive of the HelloWorld directory in the current directory (root, /).
  • ls
    Lists the content of the current directory to verify the creation of the archive file, highlighted in red.
  • echo "test data" >> test.txt
    Creates a new file called test.txt with the content "test data".
  • ls
    Lists the content of the current directory to verify the creation of the test.txt file.
  • tar -rf file.tar test.txt
    Appends the test.txt file to the end of the file.tar archive.
  • tar -tf file.tar
    Lists the content of file.tar, showing the newly added test.txt at the end.
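The sequence above can be reproduced in a scratch directory. This sketch assumes GNU tar; the sample files are hypothetical. It also tries the same append on a gzip-compressed archive, which GNU tar refuses.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir HelloWorld
echo "hello" > HelloWorld/Intro.md
tar -cf file.tar HelloWorld

echo "test data" > test.txt
tar -rf file.tar test.txt        # same as: tar --append -f file.tar test.txt
tar -tf file.tar                 # test.txt is listed last

# Appending to a compressed archive is rejected by GNU tar
tar -czf file.tar.gz HelloWorld
if ! tar -rzf file.tar.gz test.txt 2>/dev/null; then
    echo "cannot append to a compressed archive"
fi
```

To add a file to a .tar.gz, one option is to recreate the archive from the full set of files.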

5. Create a single archive file for multiple file systems

tar can pack multiple files or directories into one archive file.
List all of them, separated by spaces, after the archive name in the tar command.

Implementation:

[Screenshot: archiving multiple directories]

Here, two directories, HelloWorld/ and Test/, are packed into a single archive file.
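As a minimal sketch of the same idea, assuming GNU tar and two hypothetical sample directories in a scratch location (the archive name both.tar is also an assumption):

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir HelloWorld Test
echo "hello" > HelloWorld/Intro.md
echo "test" > Test/test.txt

# Everything after the archive name is treated as input to archive
tar -cvf both.tar HelloWorld Test

# Both directory trees appear in one archive
tar -tf both.tar
```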

6. Create a .tar.gz archive file

tar can also create a compressed archive with the extension .tar.gz. The archive is passed through gzip (GNU zip), which compresses it with the DEFLATE algorithm.

Syntax:

tar -c[v]zf {destination path}/{filename}.tar.gz {files or directories}

-z: tells tar to compress the archive with gzip

Implementation:

tar cvzf file1.tar.gz /home/nishkarshraj/Desktop/HelloWorld
  • It creates a file1.tar.gz archive file in the current directory (here the root directory, /).
  • The source directory to be archived is HelloWorld in the /home/nishkarshraj/Desktop path.

[Screenshot: creating file1.tar.gz with tar cvzf]

  • The first ls command shows that no archive file is present in the current directory (here, root).
  • The tar command archives the HelloWorld directory at the specified path and compresses it into file1.tar.gz in the current directory.
  • The second ls command shows the newly created .tar.gz archive file.
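A self-contained variant of this step, assuming GNU tar and gzip are installed; the HelloWorld directory here is hypothetical sample data in a scratch directory.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir HelloWorld
echo "# Intro" > HelloWorld/Intro.md
echo "hello" > HelloWorld/test.txt

# -z pipes the archive through gzip while it is written
tar -czvf file1.tar.gz HelloWorld

# gzip -t verifies the integrity of the compressed file
gzip -t file1.tar.gz

# -z is needed again when listing on tars that do not auto-detect compression
tar -tzf file1.tar.gz
```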

7. Extract a .tar.gz archive file

Syntax:

tar -x[v]zf {path to}/{filename}.tar.gz

It extracts all files from the filename.tar.gz archive and recreates their stored paths below the current directory.

Implementation:

tar xvzf file1.tar.gz

[Screenshot: extracting file1.tar.gz with tar xvzf]

  • The ls command for /home/nishkarshraj/Desktop shows that the HelloWorld directory does not exist in the path.
  • tar xvzf on the file1.tar.gz archive, run from the root directory, extracts the folder into the /home/nishkarshraj/Desktop path.
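The round trip can be sketched like this, assuming GNU tar; the sample directory and the out folder are hypothetical. The second extract relies on GNU tar detecting gzip compression by itself when reading an archive file.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir HelloWorld
echo "hello" > HelloWorld/test.txt
tar -czf file1.tar.gz HelloWorld

# Remove the original so the extract has something to restore
rm -r HelloWorld
tar -xzvf file1.tar.gz
cat HelloWorld/test.txt

# GNU tar recognises the compression on read, so -z may be omitted
mkdir out
tar -xf file1.tar.gz -C out
```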

8. Shell script to understand .tar versus .tar.gz compressed files

Generally speaking, gzip-compressed archives with the extension .tar.gz are smaller than plain .tar archives, which are not compressed at all. This is not true for every input: data that is already compressed, such as JPEG images or zip files, gains little from gzip.

Here, a shell script archives the same directory (HelloWorld) both as a plain .tar and as a gzip-compressed .tar.gz, and displays their respective sizes using the du (disk usage) command.

Code:

#!/bin/bash

# Plain tar archive (no compression)
tar cvf file1.tar /home/nishkarshraj/Desktop/HelloWorld
du -sh file1.tar

# gzip-compressed archive
tar cvzf file2.tar.gz /home/nishkarshraj/Desktop/HelloWorld
du -sh file2.tar.gz

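[Screenshot: the shell script]

Output:

[Screenshot: output of the script]

Here,

  • Disk usage of .tar file: 12 KB
  • Disk usage of .tar.gz file: 4 KB

Thus, for this directory the gzip-compressed .tar.gz archive takes a third of the disk space of the plain .tar archive.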
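A variation on the script that does not depend on an existing HelloWorld directory is sketched below. It assumes GNU tar and generates hypothetical, highly repetitive sample text, which compresses well. It reports sizes with wc -c rather than du, because du counts whole disk blocks and rounds small files up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir HelloWorld
# 5000 identical lines: repetitive input that gzip handles well
yes "Hello, World" | head -n 5000 > HelloWorld/test.txt

tar -cf file1.tar HelloWorld        # plain archive
tar -czf file2.tar.gz HelloWorld    # gzip-compressed archive

# Exact sizes in bytes
tar_bytes=$(wc -c < file1.tar)
gz_bytes=$(wc -c < file2.tar.gz)
echo "file1.tar:    $tar_bytes bytes"
echo "file2.tar.gz: $gz_bytes bytes"
```

With input that is already compressed, the two numbers would be much closer.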

9. Check diff between archive file and source file system

tar can compare the members of a .tar archive with the corresponding files on the file system.

Syntax:

tar -dvf {filename}.tar {path of source folder}
  • -d: compares the archive with the file system and reports the differences

Let's create an archive called file.tar from the same HelloWorld directory and then change the directory to check the diff.

Creation of file.tar for HelloWorld directory

[Screenshot: creating file.tar]

Modify the HelloWorld directory

[Screenshot: diff before and after modifying test.txt]

Explanation of the image:

The HelloWorld directory consists of two files: Intro.md and test.txt

  • We run the diff between the file system and the archive file.
    Since no modifications have been made, the verbose diff simply lists the files in the archive.
  • We modify the test.txt file by adding the string "mod" at the end of the file.
  • We run the diff again, and the tar command lists the files in the archive along with these messages:
    mod time differs: The modification time of test.txt in the archive differs from that in the file system.
    size differs: The size of test.txt in the archive differs from that in the file system.

Deleting files from the file system

Check the diff when files are deleted from the file system.

[Screenshot: diff after deleting test.txt]

Explanation of the image:

  • We remove the test.txt file using the rm command.
  • We run the diff and it lists the content of the archive file with the following message after test.txt:
    Warning: Cannot stat: No such file or directory
    This message means that tar could not find a file on the file system that corresponds to test.txt in the archive, that is, the file has been deleted from the file system.

Creation of new files in the file system

We check the diff between the archive and the file system after creating a new file that is not stored in the archive.

[Screenshot: diff after creating new.txt]

Explanation of the image:

  • We create a new file called new.txt containing the string "new".
  • We run the diff, but it shows no output related to new.txt because the file did not exist when the archive was created.

Conclusion:

The diff operation walks through the members of the archive and checks each one against the file system. It reports files whose modification time or size has changed and files that have been deleted, but it does not report files that were created after the archive was made.
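The three cases can be combined into one short sketch. It assumes GNU tar (the -d operation is a GNU feature) and uses hypothetical sample files in a scratch directory. Because tar exits with a non-zero status when it finds differences, the second comparison is wrapped in an if so that set -e does not stop the script.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir HelloWorld
echo "# Intro" > HelloWorld/Intro.md
echo "hello" > HelloWorld/test.txt
tar -cf file.tar HelloWorld

# Nothing changed yet: tar exits with status 0
tar -df file.tar && echo "no differences"

echo "mod" >> HelloWorld/test.txt   # modified file: size differs
rm HelloWorld/Intro.md              # deleted file: cannot stat
echo "new" > HelloWorld/new.txt     # new file: not in the archive, not reported

# Differences are reported per archive member
if tar -df file.tar; then
    status=same
else
    status=differs
fi
echo "archive vs. file system: $status"
```

The exit status makes the comparison usable in scripts, for example to decide whether a backup needs refreshing.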

10. Update the archive file with modified file system

tar can update an archive so that it picks up the changes made to the source files since the archive was created.

Syntax:

tar -uf {filename.tar} {path to source file system}

or

tar --update -f {filename.tar} {path to source file system}

[Screenshot: updating file.tar with tar -uf]

Explanation:

Here, we continue from the last step of the diff section: new.txt has been created and test.txt has been deleted since the archive was made.

  • We update the tar file with the current state of the file system.
  • On running the diff again:
  1. new.txt has been added to the archive.
  2. test.txt has not been removed from the archive even though it was deleted from the file system.

Conclusion:

The update operation appends files that are new or have a newer modification time than the copy stored in the archive. The older copy of a modified file stays in the archive, and files that were deleted from the source file system are not removed from it.
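A minimal sketch of this behaviour, assuming GNU tar and hypothetical sample files in a scratch directory. Like -r, the -u operation works on uncompressed .tar archives.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir HelloWorld
echo "# Intro" > HelloWorld/Intro.md
echo "hello" > HelloWorld/test.txt
tar -cf file.tar HelloWorld

rm HelloWorld/test.txt              # deleted after archiving
echo "new" > HelloWorld/new.txt     # created after archiving

# Append whatever is new or newer than the archived copy
tar -uf file.tar HelloWorld

# new.txt is now listed; test.txt is still listed as well
tar -tf file.tar
```

To drop test.txt from the archive as well, the --delete option from the options list can be used on the member name.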

References/Further Reading:

GNU Tar Manual page for Linux