Exploring Python's Stat Module


Python's os.stat() function performs a stat system call on the given path and returns information about a file or folder, such as its inode number, size, number of hard links, and the times it was last accessed and modified. The companion stat module provides constants and helper functions for interpreting that result.

Before looking at os.stat() and the stat module, we need to understand a bit about the UNIX file system, which is a logical method of organizing and storing large amounts of information in a way that makes it easy to manage.

We are going to take a look at how files are stored in a Unix system and what information the system keeps about them.

The directory tree

Files in a Unix system are organized like a tree data structure, which is why the layout is known as a directory tree. At the very top of the file system is a directory called "root", represented by "/", and all other files sit below it.

For now we don't need to understand what each of these things represents.

Next we need to see what directories are.

In Unix, directories are the equivalent of folders in Windows. A directory file contains an entry for every file and sub-directory that it houses, so a directory holding 10 files has 10 entries. Each entry has two components:

  • Filename.
  • A unique identification number for the file or directory (called the inode number). The inode number is different for each file on a given file system and can represent any type of file; everything that exists in a Unix file system has an inode number.

A Unix file is stored in two different parts of the disk: the data blocks and the inodes. The data blocks contain the "contents" of the file, and the information about the file (its metadata) is stored in the inode.

On a Unix-like system you can look up inode numbers by typing this command in the terminal.

ls -i

And the output will be something like this.

13 bin          2 dev         14 lib         17 libx32      14183425 mnt    7091713 root  11556865 snap         1 sys 

The number at the start of each entry is the inode number, followed by the file name (in Unix, directories are just special files).
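The same lookup can be sketched in Python. This is a minimal example, not the only way to do it; the helper name list_inodes is made up for illustration, while os.scandir() and DirEntry.inode() are part of the standard library.

```python
import os

def list_inodes(directory="."):
    """Return (inode_number, name) pairs for the entries in `directory`, sorted by name."""
    with os.scandir(directory) as entries:
        # entry.inode() gives the inode number of the directory entry itself
        return sorted(
            ((entry.inode(), entry.name) for entry in entries),
            key=lambda pair: pair[1],
        )

# Roughly what `ls -i` shows for the current directory
for inode, name in list_inodes("."):
    print(inode, name)
```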

Apart from all this, Unix-like operating systems identify a user by a unique user id, often abbreviated to uid, and a group by a group id, abbreviated to gid. These are used to determine which files and which parts of the system a user can access.

As we know, files reside on a physical piece of hardware, which can be an SSD, a hard disk, or something else. In Unix these devices are also given a number, often referred to as dev. A file that resides there can be referred to by more than one name by creating a hard link to it. A hard link is essentially a label or name given to a file, so with hard links we can access the same file content under different names.

A file that exists takes up some space, and os.stat() reports that size in bytes.

Unix also records the time when you open or modify a file, or when the file's metadata is changed. This information is also stored in the inode, as we are going to see.
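To make the idea of a hard link concrete, here is a small sketch that creates a second name for a file and checks that both names point at the same inode. The file names original.txt and alias.txt are made up for the example, and it assumes a file system that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")  # hypothetical file name
    alias = os.path.join(tmp, "alias.txt")        # hypothetical second name

    with open(original, "w") as f:
        f.write("hello")

    # Create a hard link: a second directory entry for the same inode
    os.link(original, alias)

    a, b = os.stat(original), os.stat(alias)
    same_inode = (a.st_ino, a.st_dev) == (b.st_ino, b.st_dev)
    link_count = a.st_nlink  # number of names that refer to this inode

    with open(alias) as f:
        alias_content = f.read()

print(same_inode, link_count, alias_content)
```

On a typical Unix file system both names report the same inode and device, the link count is 2, and reading through the alias returns the content written through the original name.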

Now let's dive into Python's stat module

Python's os.stat() function performs a stat system call on the given path.

You may ask what stat actually means. It means the status of a file or a directory; to be precise, it returns attributes of an inode.

We are going to see some Python code and some examples that use os.stat() together with the stat module.

Syntax

os.stat(path)

Here path is the file or directory whose status is wanted.

Now we are going to see some code

import os

#A folder in my system
info_dir = os.stat('Music')
print(info_dir)

#A png file in my system 
info_file = os.stat('Pictures/85327.png')
print(info_file)

Let's see the output.

#output for "print(info_dir)"
os.stat_result(st_mode=16877, st_ino=14183471, st_dev=66308, st_nlink=2, st_uid=1000, st_gid=1000, st_size=4096, st_atime=1583396013, st_mtime=1571541309, st_ctime=1571541309)

#Output for "print(info_file)"
os.stat_result(st_mode=33188, st_ino=14971652, st_dev=66308, st_nlink=1, st_uid=1000, st_gid=1000, st_size=119113, st_atime=1583320315, st_mtime=1581789576, st_ctime=1581789576)

"Music" is a folder on my system and "85327.png" is a picture, so we can see that os.stat() can be used for any type of file as well as for directories.

Understanding what the keywords mean

st_mode

It holds the inode protection mode, which encodes the file type and the permission bits. So what actually is a protection mode? Let's look into it with a terminal command.

ls -lai | tail 

Output

 7091713 drwx------  10 root root       4096 Mar  6 13:49 root
       2 drwxr-xr-x  33 root root        940 Mar  9 17:06 run
      18 lrwxrwxrwx   1 root root          8 Oct 20 03:00 sbin -> usr/sbin
11556865 drwxr-xr-x  13 root root       4096 Mar  6 14:31 snap
 7748353 drwxr-xr-x   2 root root       4096 Oct 17 17:54 srv
      12 -rw-------   1 root root 2147483648 Oct 20 02:59 swapfile
       1 dr-xr-xr-x  13 root root          0 Mar  9  2020 sys
14971393 drwxrwxrwt  20 root root       4096 Mar  9 17:36 tmp
12082177 drwxr-xr-x  14 root root       4096 Oct 17 17:56 usr
13658113 drwxr-xr-x  15 root root       4096 Mar  8 02:48 var

After the inode number we can see some strangely written letters. Those are the permissions, which show who can read, write, or execute the file. That is the inode protection mode.
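The numeric st_mode values from the earlier os.stat() output can be turned into the same letters with the stat module. A short sketch, using the two values printed above (16877 for the folder and 33188 for the PNG file):

```python
import stat

dir_mode = 16877   # st_mode of the "Music" folder in the earlier output
file_mode = 33188  # st_mode of the PNG file in the earlier output

print(stat.filemode(dir_mode))       # drwxr-xr-x
print(stat.filemode(file_mode))      # -rw-r--r--

# The file type is encoded in the high bits of the mode
print(stat.S_ISDIR(dir_mode))        # True
print(stat.S_ISREG(file_mode))       # True

# S_IMODE keeps only the permission bits
print(oct(stat.S_IMODE(file_mode)))  # 0o644
```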

st_ino

Inode number. It's simply the inode number assigned to the file.

st_dev

The device on which the inode resides, that is, the disk partition or storage device that holds the file.

st_nlink

Number of hard links. A hard link is a directory entry that associates a name with a file on a file system.

st_uid

It contains the user id of the owner, who has the rights to access and modify the inode. For example, if there are 2 users on the system, the files belonging to each of them carry a different user id.

st_gid

Group id of the owning group that has rights to the inode.
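The numeric ids can be translated into names on Unix-like systems. The sketch below assumes the Unix-only pwd and grp standard library modules are available; the helper name owner_and_group is made up for illustration.

```python
import os
import pwd  # Unix-only: password database
import grp  # Unix-only: group database

def owner_and_group(path):
    """Return (user_name, group_name) for `path`, falling back to the numeric ids."""
    info = os.stat(path)
    try:
        user = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:  # the uid has no entry in the password database
        user = str(info.st_uid)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:  # the gid has no entry in the group database
        group = str(info.st_gid)
    return user, group

print(owner_and_group("."))
```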

st_size

Size in bytes of a plain file. For some special files, it is the amount of data waiting to be read.
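As a quick check of what st_size reports, this sketch writes a known number of bytes to a temporary file and reads the size back in three equivalent ways. The file name data.bin is made up for the example.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"x" * 1500)  # write exactly 1500 bytes

    info = os.stat(path)
    size_by_name = info.st_size          # attribute access
    size_by_index = info[stat.ST_SIZE]   # tuple-style access via the stat module index
    size_via_getsize = os.path.getsize(path)

print(size_by_name, size_by_index, size_via_getsize)  # 1500 1500 1500
```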

st_atime

Time of the most recent access.

st_mtime

Time of the most recent content modification.

st_ctime

The "ctime" as reported by the operating system. On some systems (like Unix) it is the time of the last metadata change, and on others (like Windows) it is the creation time.

These times are not in a human readable form, so let's do something about it.

import stat, time
# info_file is the os.stat() result for the PNG file from the earlier example
print(time.ctime(info_file[stat.ST_ATIME]))

Output

Wed Mar  4 16:41:55 2020

Now it looks good. Note that time.ctime() formats the value in your local time zone. You can use the same syntax for st_mtime and st_ctime.
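If you want control over the format or the time zone, the datetime module is an alternative to time.ctime(). A small sketch follows; the helper name format_timestamp is made up, and the timestamp is the st_atime of the PNG file from the earlier output.

```python
from datetime import datetime, timezone

def format_timestamp(seconds):
    """Format a stat timestamp (seconds since the epoch) as a UTC date string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S %Z")

print(format_timestamp(1583320315))  # 2020-03-04 11:11:55 UTC
```

Formatting in UTC gives the same text on every machine, while time.ctime() depends on the local time zone of the system running the code.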

Use cases

  • Suppose you are writing to a file and want to check whether it has been updated. You can check the last modified time.
  • You can check the file size directly from your code.
  • There are more uses, but they are a bit more complex and depend on the problem you are facing.
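The first use case can be sketched as follows. This is one possible approach, not a definitive one: the helper name has_changed and the file name notes.txt are made up, and the later modification is simulated with os.utime() instead of waiting for a real edit.

```python
import os
import tempfile

def has_changed(path, last_mtime_ns):
    """Return (changed, current_mtime_ns) for `path` compared with a remembered mtime."""
    current = os.stat(path).st_mtime_ns  # modification time in nanoseconds
    return current != last_mtime_ns, current

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("first version")

    # Remember the modification time we have seen so far
    _, seen = has_changed(path, None)
    changed_before, seen = has_changed(path, seen)

    # Simulate a later edit by moving the modification time 5 seconds forward
    os.utime(path, ns=(seen, seen + 5_000_000_000))
    changed_after, seen = has_changed(path, seen)

print(changed_before, changed_after)  # False True
```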

Resources