If you have ever wanted to compare files or directories in a simple way, then Python's filecmp module is the perfect place to start. The module involves simple operations and data types (loops and lists) that can be replicated in any programming language.
filecmp just simplifies and shortens your code by taking care of the logic for you.
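filecmp is closely related to the difflib module, which we have discussed in a previous article: difflib reports how two inputs differ, while filecmp reports whether files or directories differ. Check it out if you're curious to learn more about Python's modules.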
In order to understand and use the filecmp module, we will walk through a series of examples together. First, we need to go over the structure of the module and how it works.
The cmp function is used to generate a simple boolean (True or False) result based on whether the module finds two files to be equal.
filecmp.cmp(f1, f2, shallow=True)
The function takes in two file names and an optional third parameter known as shallow, which defaults to True. This parameter determines how the two files will be compared: from either a shallow or a deep perspective.
If we set the boolean shallow to True, cmp will call the os.stat() function on each of the two files. os.stat() takes in a file path and returns a series of stats. cmp then compares that information with the stats of the other file. The information returned by os.stat() includes the size of the file (in bytes), the time the file was last accessed, the user id of the file owner, and more. Of these, the stat signature that cmp compares consists of the file type, the size, and the modification time. If shallow is True and both signatures are identical, the files are taken to be equal without their contents being read. Otherwise, the files are treated as different if their sizes or contents differ.
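To make the idea of a stat signature more concrete, here is a minimal sketch that builds one by hand. The file name sample.txt and the signature tuple are illustrative assumptions; the tuple only mirrors the kind of information a shallow comparison looks at and is not filecmp's own internal object.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file, created only for this illustration.
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as handle:
        handle.write("12345")

    info = os.stat(path)

    # Illustrative signature: file type, size in bytes, modification time.
    signature = (stat.S_IFMT(info.st_mode), info.st_size, info.st_mtime)

print("regular file:", stat.S_ISREG(info.st_mode))
print("size in bytes:", info.st_size)
print("last modified:", info.st_mtime)
```

Two files with the same type, size, and modification time would produce equal tuples here, which is why a shallow comparison can report them as equal without opening either file.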
The results of past comparisons are cached, so a pair of files is only compared once unless either of the files' stat signatures changes. This prevents the program from repeating the comparison unnecessarily.
>>> import filecmp
>>> filecmp.cmp('../dir1/text3.txt', '../dir2/text3.txt', shallow=False)
False
>>> filecmp.cmp('../dir1/text3.txt', '../dir2/text3.txt', shallow=True)
True
The cmp function is fairly basic in that it doesn't require any external dependencies, making it very portable. However, it is limited in its abilities since it returns only a boolean result and can only compare two files at a time.
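The difference between a shallow and a deep comparison is easiest to see with two files that share a size and a modification time but not their contents. The sketch below builds such a pair in a temporary directory; the helper make_file, the file names, and the fixed timestamp are assumptions made for this illustration.

```python
import filecmp
import os
import tempfile


def make_file(directory, name, text, mtime):
    """Hypothetical helper: write text to a file and force its timestamps."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write(text)
    os.utime(path, (mtime, mtime))  # set access and modification time
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Same length (11 characters) and same timestamp, different contents.
    left = make_file(tmp, "left.txt", "hello world", 1_600_000_000)
    right = make_file(tmp, "right.txt", "hello earth", 1_600_000_000)

    filecmp.clear_cache()  # start from an empty comparison cache
    shallow_result = filecmp.cmp(left, right, shallow=True)
    deep_result = filecmp.cmp(left, right, shallow=False)

print("shallow:", shallow_result)
print("deep:", deep_result)
```

Because the two stat signatures are identical, the shallow call reports the files as equal, while the deep call reads both files and reports that they differ. If the result has to reflect the actual contents, shallow=False is the safer choice.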
Earlier, we saw how the cmp function returns a boolean after comparing two files. cmpfiles, by contrast, compares files in two directories and returns the comparison in three lists: match, mismatch, and errors. This gives us much more information, helping us understand the relationships between directories in more depth.
Using cmpfiles is very simple. Two directories are passed in as parameters to the cmpfiles function, along with a list of file names to compare. We start out by defining common as a list containing a string for each file name that is present in both directories. This tells cmpfiles to loop through and compare the files that we've specified in common.
If the files are found to be the same, cmpfiles appends the name to the match list; the files that are not a match are appended to mismatch. As you may have guessed, files that cannot be compared at all (for example, because a file is missing from one of the directories or cannot be read) are appended to the errors list.
Let's see cmpfiles in action:
import filecmp
# from filecmp import cmpfiles

dir1 = "/Users/lyndi/documents/opengenus/dir1"
dir2 = "/Users/lyndi/documents/opengenus/dir2"
common = ["text1.txt", "text2.txt", "text3.txt"]

# shallow comparison
match, mismatch, errors = filecmp.cmpfiles(dir1, dir2, common)
# Note that we did not specify a shallow parameter
# This defaults to shallow=True
print("Shallow Comparison")
print("Match: ", match)
print("Mismatch: ", mismatch)
print("Errors: ", errors, "\n\n")

# deep comparison
match, mismatch, errors = filecmp.cmpfiles(dir1, dir2, common, shallow=False)
print("Deep Comparison")
print("Match: ", match)
print("Mismatch: ", mismatch)
print("Errors: ", errors)

# output:
# Shallow Comparison
# Match:  ['text2.txt']
# Mismatch:  ['text3.txt']
# Errors:  ['text1.txt']
#
# Deep Comparison
# Match:  ['text2.txt']
# Mismatch:  ['text3.txt']
# Errors:  ['text1.txt']
In the above example, the first call relied on the default shallow comparison, while the second call asked for a deep comparison with shallow=False. In this case, both evaluated to the same result.
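Since the example above depends on directories that exist only on the author's machine, here is a self-contained sketch that creates two temporary directories and produces one entry for each of the three lists. The helper write and the file names are assumptions chosen for this illustration.

```python
import filecmp
import os
import tempfile


def write(directory, name, text):
    """Hypothetical helper: create a small text file inside directory."""
    with open(os.path.join(directory, name), "w") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as dir1, tempfile.TemporaryDirectory() as dir2:
    write(dir1, "same.txt", "identical")
    write(dir2, "same.txt", "identical")
    write(dir1, "changed.txt", "short")
    write(dir2, "changed.txt", "a much longer line")
    write(dir1, "only_left.txt", "no partner in dir2")

    # only_left.txt is listed on purpose even though dir2 does not have it.
    common = ["same.txt", "changed.txt", "only_left.txt"]
    match, mismatch, errors = filecmp.cmpfiles(dir1, dir2, common, shallow=False)

print("Match:", match)
print("Mismatch:", mismatch)
print("Errors:", errors)
```

The file that exists in only one directory ends up in errors rather than mismatch, because there is nothing to compare it against.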
The dircmp class finds the differences between two directories by constructing a new directory comparison object. This allows the files in each directory to be compared via a shallow comparison.
For instance, in the next example we have three common files: "text1.txt", "text2.txt", and "text3.txt". They are common because both directories, dir1 and dir2, contain each of these files.
Remember that these files are only considered a match in a shallow comparison if their signatures are the same. If we take a peek at what is contained within the files (or perform a deep comparison), we will find that not all of these files are actually identical.
Below is an example that utilizes some of the attributes of the dircmp class:
import filecmp
from filecmp import dircmp

# prints out the difference between directories
def printDiff(difference):
    for name in difference.diff_files:
        print("The difference found in %s and %s is %s" % (difference.left, difference.right, name))

difference = dircmp("../dir1", "../dir2")
printDiff(difference)

# output:
# The difference found in ../dir1 and ../dir2 is text3.txt
From the output, you may notice that "text3.txt" was the only file found to have a difference between the two directories. Essentially, this means that even though a file named "text3.txt" exists in both dir1 and dir2, the text inside did not match up exactly in both files.
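This result also implies that "text1.txt" and "text2.txt" were not found to differ (a file that could not be compared at all would be listed in funny_files rather than diff_files). If we had received no output, that would mean that none of the files with common names were found to differ.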
Did you notice the use of difference.left and difference.right in our example? The dircmp class refers to each directory by the position in which it was passed in: the left or the right one. In this case, "../dir1" was passed in on the left (as the first parameter), so it is available as difference.left.
The dircmp class has several attributes that pertain to the left or right parameters. Check out the Python documentation for a closer look at these attributes and how they can be used in your own programs.
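As a starting point, the following sketch reads a handful of those attributes from a dircmp object built over two temporary directories. The helper write, the directory contents, and the variable names are assumptions made for this illustration; the attributes themselves (left_only, right_only, common_files, same_files, diff_files) are part of dircmp.

```python
import filecmp
import os
import tempfile


def write(directory, name, text):
    """Hypothetical helper: create a small text file inside directory."""
    with open(os.path.join(directory, name), "w") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as left, tempfile.TemporaryDirectory() as right:
    write(left, "same.txt", "identical")
    write(right, "same.txt", "identical")
    write(left, "changed.txt", "short")
    write(right, "changed.txt", "a much longer line")
    write(left, "only_left.txt", "x")
    write(right, "only_right.txt", "y")

    comparison = filecmp.dircmp(left, right)

    # Attributes are computed on first access, so read them while the
    # temporary directories still exist.
    left_only = sorted(comparison.left_only)
    right_only = sorted(comparison.right_only)
    common_files = sorted(comparison.common_files)
    same_files = sorted(comparison.same_files)
    diff_files = sorted(comparison.diff_files)

print("Only in left:", left_only)
print("Only in right:", right_only)
print("In both:", common_files)
print("Same:", same_files)
print("Different:", diff_files)
```

The results are sorted here only to make the printed order predictable.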
With this article at OpenGenus, you should now have a solid idea of how to use filecmp in Python. Enjoy.