Sometimes you need to delete files that have not been accessed for a long time, say N days. Deleting all of these files manually is tedious, so we will write a Python script that recursively traverses all the directories under a given path (or the present working directory) and finds every file whose last access time is more than N days ago.
In this article, you will learn how to use the functionality provided by Python modules to get the last access time of a file, recursively traverse directories and delete files.
os module: The os module in Python provides a portable way of using operating system dependent functionality.
time module: This module provides various time-related functions.
Stepwise breakdown of the problem (algorithm)
- Getting the path of the desired directory from the user or setting the path to the present working directory in case no path is entered.
- Checking whether the path of the directory entered by the user exists or not.
- Exiting the program if the path is not valid.
- Getting the number of days since last access of file from the user.
- Getting the present time of the system so that it can be compared with the last access time of each file.
- Subtracting N days (converted to seconds) from the current time to get the expected last access time: files accessed after this point should not be deleted.
- Showing all the files that we found under the given path/directory and its subdirectories by using the function os.walk().
- Getting the list of all the files present under the given path or the present working directory.
- Getting the last access time of each file and checking whether it is less than the present time minus N days in seconds.
- If the last access time of the file is less than the expected last access time, displaying the file path and then deleting the file.
- Repeating the previous two steps until all the files in the directory are checked.
- Print the list of all the files that are left after the deletion process.
Note: It is suggested not to run the script on the present working directory, because it would delete the Python script file itself if the number of days entered is 0.
Now we will learn about some of the important steps which we used in the above algorithm.
- Getting current time.
- Recursively traversing the directories.
- Getting the last access time of the files.
- Deleting/removing the files.
- Converting the time we got in seconds into date and time format.
Getting current time:
We use the time module to get the current time of the machine.
time(): This function returns the current system time as a floating-point number of seconds since the epoch.

    import time

    # Storing the current time in seconds in t
    t = time.time()
    print(t, "Seconds")
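The step "subtract N days from the current time" from the algorithm can be sketched as below. This is only an illustration: the helper name cutoff_timestamp and the constant SECONDS_PER_DAY are hypothetical names introduced here, not part of the time module.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60  # 86400 seconds in one day


def cutoff_timestamp(days, now=None):
    """Return the timestamp that lies `days` days before `now`.

    Hypothetical helper: `now` defaults to the current system time.
    """
    if now is None:
        now = time.time()
    return now - days * SECONDS_PER_DAY


now = time.time()
cutoff = cutoff_timestamp(8, now)
# Any file whose last access time is smaller than `cutoff` is older than 8 days.
print("Cutoff:", time.ctime(cutoff))
```

Passing the current time in as a parameter keeps the calculation easy to check with fixed numbers.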
Recursively traversing the directories
The given path may contain several levels of subdirectories. We use the os.walk() function to recursively traverse all of them and get the root, the list of directories and the list of files at each level.
os.walk(): Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).

    import os

    path = "path to the desired directory"

    # os.walk() yields a 3-tuple (root, directories, files) for every directory it visits.
    for (root, directory, file) in os.walk(path):
        print(root)
        print(directory)
        print(file)
        print("___________________")
Getting path to the files
Now that we have the root and the list of files in the root, we need to join each file name to the root to get the full path of the file. We use the os.path.join() function to join two or more path components together. In our case we need to join the file name from the list with the root, so we pass root and the file name as arguments.
os.path.join(): Join one or more path components intelligently. The return value is the concatenation of path and any members of paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty.

    import os

    path = "Path to the desired directory"

    # root contains the path to the directory which the function is traversing.
    # directory contains the list of directories in the root.
    # files contains the list of files in the root.
    for (root, directory, files) in os.walk(path):
        for file in files:
            pth = os.path.join(root, file)
            print("path of file:", pth)
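The traversal and path-joining steps above can be tried safely on a throwaway directory. The following is a minimal sketch, assuming a temporary directory tree created just for the demonstration; the helper name list_files and the file names are made up for illustration.

```python
import os
import tempfile


def list_files(top):
    """Return the full paths of all files below `top`, sorted (hypothetical helper)."""
    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


# Build a small demo tree inside a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a", "b"))
    demo_files = (
        "file1.txt",
        os.path.join("a", "file2.txt"),
        os.path.join("a", "b", "file3.txt"),
    )
    for rel in demo_files:
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("demo")

    # Show each path relative to the top of the demo tree.
    relative_paths = [os.path.relpath(p, top) for p in list_files(top)]
    for rel in relative_paths:
        print(rel)
```

Because the tree lives in a temporary directory, nothing outside it is touched while experimenting.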
Getting the last access time of the file
We need to get the last access time of the file using the path which we created in the last step. To do this, we use the os.stat() function to get the details of the file by passing its path as the argument.
os.stat(): Get the status of a file or a file descriptor. Perform the equivalent of a stat() system call on the given path. Returns a stat_result object.
The stat_result object contains attributes such as st_mode, st_ino, st_dev, st_nlink, st_uid, st_gid, st_size, st_atime, st_mtime and st_ctime.
We will need to access the st_atime attribute to get the most recent access time of the file in seconds.

    import os

    path = "Path to the desired directory"

    # First we store the stat_result object in filestat,
    # then we extract the last access time into the acstime variable.
    for (root, directory, files) in os.walk(path):
        for file in files:
            pth = os.path.join(root, file)
            filestat = os.stat(pth)
            acstime = filestat.st_atime
            print("path of file:", pth, "\tlast access time:", acstime)
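To see st_atime in action without waiting several days, the access time of a test file can be set by hand with os.utime(). The sketch below assumes a temporary file created for the demonstration; days_since_access is a hypothetical helper name. Note that some systems are configured to update access times only occasionally or not at all (for example, file systems mounted with options such as noatime), so it is worth checking how the target system behaves before relying on st_atime.

```python
import os
import tempfile
import time


def days_since_access(path, now=None):
    """Return how many days ago `path` was last accessed (hypothetical helper)."""
    if now is None:
        now = time.time()
    return (now - os.stat(path).st_atime) / 86400


with tempfile.TemporaryDirectory() as top:
    sample = os.path.join(top, "old.txt")
    with open(sample, "w") as fh:
        fh.write("demo")

    now = time.time()
    ten_days_ago = now - 10 * 86400
    # os.utime(path, (atime, mtime)) sets the access and modification times explicitly.
    os.utime(sample, (ten_days_ago, ten_days_ago))

    age_in_days = days_since_access(sample, now)
    print("Days since last access: %.1f" % age_in_days)
```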
Deleting/removing the files
We now have all the data required to decide whether a file should be deleted. To delete a file in Python we use the os.remove() function, which deletes the file whose path is passed as the argument.

    import os

    path = "path of the file to be deleted"

    os.remove(path)
    print("The file has been deleted.")
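os.remove() raises an exception if the file cannot be deleted, for example because it no longer exists or because permission is denied. One possible way to keep a long clean-up run going is to catch OSError, as in this sketch; try_remove is a hypothetical helper name and the file is a temporary one created for the demonstration.

```python
import os
import tempfile


def try_remove(path):
    """Delete `path`; return True on success, False if it could not be removed."""
    try:
        os.remove(path)
    except OSError as err:
        # FileNotFoundError and PermissionError are both subclasses of OSError.
        print("Could not delete", path, "-", err)
        return False
    return True


with tempfile.TemporaryDirectory() as top:
    target = os.path.join(top, "temp.txt")
    with open(target, "w") as fh:
        fh.write("demo")

    first_attempt = try_remove(target)   # the file exists, so it is deleted
    second_attempt = try_remove(target)  # the file is already gone
    print(first_attempt, second_attempt)
```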
Converting the time we got in seconds into date and time format
We have now covered every step needed to delete a file based on its last access time. This last step is optional: it only makes the output easier to read by showing the access time of each deleted file in a human-readable format. For this we use the time.ctime() function provided by the time module, which converts a time in seconds into a date and time string.

    import time

    # Getting the present system time in seconds
    t = time.time()
    # Converting the time in seconds into date and time format
    tf = time.ctime(t)
    print("present time:", tf)
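time.ctime() always produces the same fixed layout. If a different layout is preferred, one option is to combine time.localtime() with time.strftime(), as sketched below; format_timestamp is a hypothetical helper name and the chosen format string is only an example.

```python
import time


def format_timestamp(seconds):
    """Format a time in seconds as 'YYYY-MM-DD HH:MM:SS' in local time."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(seconds))


t = time.time()
formatted = format_timestamp(t)
print("present time:", formatted)
```

The exact text printed depends on the current time and on the time zone of the machine.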
Code of the problem statement
    # Importing the os and time modules
    import os
    import time

    # Getting the path of the directory in the pth variable
    pth = input("Enter path to the directory where you want to delete files or press enter for current working directory:")

    # If pth is empty, use the path of the present working directory
    if len(pth) == 0:
        pth = os.getcwd()
    else:
        # If pth has some value, check whether the path is a directory
        if not os.path.isdir(pth):
            print("Wrong path!!!")
            exit(0)

    # Getting the number of days since last access in the variable n
    n = int(input("Enter number of days since last access date of file:"))
    # Converting days into seconds
    n = n * 86400
    # Getting the present system time in seconds
    ptime = time.time()

    print("\n\t*****\t*****\n")
    print("List of files and directories before deleting:")
    # Recursively traversing files and directories using os.walk()
    for roots, dirs, files in os.walk(pth):
        for f in files:
            print(os.path.join(roots, f))

    print("\n\t*****\t*****\n")
    print("\nPresent system time:", time.ctime(ptime))
    print("\nLast access time should be greater than:", time.ctime(ptime - n))

    # Getting the access time of each file and deleting the files
    # whose last access time is older than the expected access time
    print("\nFiles that are being deleted:")
    print("\nFile name\t\t\tLast access time")
    for roots, dirs, files in os.walk(pth):
        for f in files:
            fil = os.path.join(roots, f)
            filstat = os.stat(fil)
            at = filstat.st_atime
            if at < ptime - n:
                print(fil, ":", time.ctime(at))
                os.remove(fil)

    print("\n\t*****\t*****\n")
    print("List of files and directories after deleting:")
    for roots, dirs, files in os.walk(pth):
        for f in files:
            print(os.path.join(roots, f))
    Enter path to the directory where you want to delete files or press enter for current working directory:H:\OpenGenus_Projects\timecheckpy
    Enter number of days since last access date of file:8

        *****    *****

    List of files and directories before deleting:
    H:\OpenGenus_Projects\timecheckpy\file1.txt
    H:\OpenGenus_Projects\timecheckpy\file2.txt
    H:\OpenGenus_Projects\timecheckpy\a\file3.txt
    H:\OpenGenus_Projects\timecheckpy\b\file4.txt
    H:\OpenGenus_Projects\timecheckpy\c\file5.txt
    H:\OpenGenus_Projects\timecheckpy\c\d\file6.txt

        *****    *****

    Present system time: Tue Mar 3 18:39:29 2020

    Last access time should be greater than: Mon Feb 24 18:39:29 2020

    Files that are being deleted:

    File name                       Last access time
    H:\OpenGenus_Projects\timecheckpy\file1.txt : Fri Feb 21 18:35:34 2020
    H:\OpenGenus_Projects\timecheckpy\a\file3.txt : Fri Feb 21 18:35:59 2020
    H:\OpenGenus_Projects\timecheckpy\b\file4.txt : Fri Feb 21 18:36:07 2020
    H:\OpenGenus_Projects\timecheckpy\c\d\file6.txt : Fri Feb 21 18:36:31 2020

        *****    *****

    List of files and directories after deleting:
    H:\OpenGenus_Projects\timecheckpy\file2.txt
    H:\OpenGenus_Projects\timecheckpy\c\file5.txt
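Since deletion cannot be undone, it may be worth previewing the affected files before removing anything. The following is a minimal sketch of one such variation, not the script above: the function name delete_stale_files and its dry_run parameter are hypothetical, and the demonstration runs on a temporary directory whose access times are set by hand with os.utime().

```python
import os
import tempfile
import time


def delete_stale_files(top, days, dry_run=True, now=None):
    """Return the files under `top` not accessed within the last `days` days.

    The files are deleted only when dry_run is False (hypothetical helper).
    """
    if now is None:
        now = time.time()
    cutoff = now - days * 86400
    stale = []
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(stale)


with tempfile.TemporaryDirectory() as top:
    old_file = os.path.join(top, "old.txt")
    new_file = os.path.join(top, "new.txt")
    for path in (old_file, new_file):
        with open(path, "w") as fh:
            fh.write("demo")

    now = time.time()
    # Pretend old.txt was last accessed 10 days ago.
    os.utime(old_file, (now - 10 * 86400, now - 10 * 86400))

    preview = delete_stale_files(top, 8, dry_run=True, now=now)
    old_exists_after_preview = os.path.exists(old_file)  # a preview deletes nothing
    deleted = delete_stale_files(top, 8, dry_run=False, now=now)
    remaining = sorted(os.listdir(top))
    print("Deleted:", deleted)
    print("Remaining:", remaining)
```

Defaulting dry_run to True means a file is only removed when the caller asks for it explicitly.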
With this, we have learned how to delete files that have not been accessed for N days using the functionality provided by Python's os and time modules.