Reading time: 25 minutes
As an explorer of the Cosmos, you are destined to come across data stored in files. You need to know how to open, read and write data into them. This post will cover how to work with text files specifically.
We cover:
- Opening a file
- Reading a file: read() and readlines()
- Writing to files
- Keeping track of the pointer: tell() and seek()
- Closing a file
- File attributes
Opening a file
Python has built-in functionality for file-related tasks, which makes them much easier to tackle.
# syntax for opening a file
fileObj = open('filename', 'mode')
The open()
function accepts one mandatory argument, the name of the file you want to open (that is, the path where the file resides); all other arguments are optional. Generally you will also pass the mode,
which is a string that represents how you want to open the file. open() returns a file object, which exposes the various file-related methods and stores information about the file.
The most common modes are:
- 'r' - Opens the file in read-only mode. The pointer is placed at the beginning. This is the default mode.
- 'w' - Opens the file in write mode. Overwrites the file (discarding its contents) if it exists; otherwise creates a new file for writing.
- 'a' - Opens the file in append mode. The pointer is placed at the end. If the file does not exist, a new one is created.
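To see the three modes side by side, here is a small sketch. The file name modes_demo.txt and the use of a temporary directory are assumptions made for illustration only, so that no real file is touched.

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' creates the file (or empties it if it already exists)
f = open(demo_path, 'w')
f.write('first line\n')
f.close()

# 'a' keeps the existing contents and adds to the end
f = open(demo_path, 'a')
f.write('second line\n')
f.close()

# 'r' is the default mode, so it can be left out
f = open(demo_path)
contents = f.read()
f.close()
print(contents)

# Opening with 'w' again discards everything written so far
f = open(demo_path, 'w')
f.close()
f = open(demo_path)
after_rewrite = f.read()
f.close()
print(repr(after_rewrite))
```

The last step is the one to watch out for: opening an existing file with 'w' empties it even if you never call write().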
Reading a file
Read method
If you want to read the entire contents of the file as a string value the .read()
method is useful.
>>> fileObj = open('hello.txt')
>>> fileObj.read()
'Hi, how are you?\nI am fine! Thanks for asking.\n'
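Note that \n denotes the end of a line. On Windows, line endings are usually stored on disk as \r\n, but Python's default text mode translates them to \n when reading.
The read() method optionally takes an argument, size, that specifies how many characters you want to read (or how many bytes, if the file is opened in binary mode).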
>>> fileObj = open('hello.txt')
>>> fileObj.read(11)
'Hi, how are'
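Calling read(size) repeatedly walks through the file piece by piece, which can help when a file is too large to load in one go. The sketch below assumes a scratch copy of hello.txt in a temporary directory; the names demo_path, piece and chunks are made up for this example.

```python
import os
import tempfile

# Create a scratch hello.txt so the example is self-contained
demo_path = os.path.join(tempfile.mkdtemp(), 'hello.txt')
setup = open(demo_path, 'w')
setup.write('Hi, how are you?\nI am fine! Thanks for asking.\n')
setup.close()

chunks = []
fileObj = open(demo_path)
while True:
    piece = fileObj.read(11)   # at most 11 characters per call
    if piece == '':            # an empty string means end of file
        break
    chunks.append(piece)
fileObj.close()

print(chunks[0])
print(len(chunks))
```

Each call picks up where the previous one stopped, and the final piece may be shorter than the requested size.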
Readlines method
Alternatively, you can use the .readlines()
method to get a list of string values from the file, each string corresponding to each line of text.
>>> opengenus = open('opengenus.txt', 'r')
>>> opengenus.readlines()
['Join the strongest computer science community in the World for free.\n',
'Learn computer science concepts with simplicity and innovation.\n']
Iterate over each line
We can simply loop over the file object with a for loop to get one line per iteration. This approach is more Pythonic and memory-efficient, because the whole file is never loaded at once.
>>> lang = open('program.txt')
>>> for each in lang:
...     print(each.rstrip('\n'))
...
Python
C
Html(oops)
Javascript
Php
Notice that we are using .rstrip()
to remove extra \n
after each line of the text. Alternatively, you can use print(each, end='')
to prevent Python from adding an extra newline, so that only what is written in the file gets printed.
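Looping over the file object also combines nicely with enumerate() when you want line numbers. This is only a sketch: the scratch copy of program.txt and the names demo_path and numbered are assumptions for the example.

```python
import os
import tempfile

# Create a scratch program.txt so the example is self-contained
demo_path = os.path.join(tempfile.mkdtemp(), 'program.txt')
setup = open(demo_path, 'w')
setup.write('Python\nC\nHtml(oops)\nJavascript\nPhp\n')
setup.close()

numbered = []
lang = open(demo_path)
# enumerate() pairs each line with a counter, starting from 1 here
for number, each in enumerate(lang, start=1):
    numbered.append(str(number) + ': ' + each.rstrip('\n'))
lang.close()

for entry in numbered:
    print(entry)
```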
Writing and appending to a file
Writing contents to a file is similar to printing strings on the screen. We can't write to a file we have opened in read mode though.
newFile = open('newfile.txt', 'w')
newFile.write("You can run but you can't hide.")
# You can also open multiple files in one program
another = open('oldFile.txt', 'a')
another.write("Ready.\nSteady.\nGo.")
# Close both files so the written data is flushed to disk
newFile.close()
another.close()
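Two details are worth trying out yourself: write() returns the number of characters it wrote, and writing to a file opened in read mode raises io.UnsupportedOperation. A minimal sketch, assuming a scratch file in a temporary directory (demo_path, written and write_failed are names made up for this example):

```python
import io
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'newfile.txt')

newFile = open(demo_path, 'w')
# write() returns how many characters were written
written = newFile.write("You can run but you can't hide.")
newFile.close()
print(written)

readOnly = open(demo_path, 'r')
try:
    readOnly.write('more text')      # not allowed in read mode
    write_failed = False
except io.UnsupportedOperation:
    write_failed = True
finally:
    readOnly.close()
print(write_failed)
```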
Keeping track of the pointer
Have a look at this code snippet. Can you figure out what is going wrong?
>>> data = open('important.txt', 'r')
>>> data.read()
'This works. There is definately some text here.'
>>> data.read()
''
Weird, right? The first time we read from data we got the contents, but the second time it just returned an empty string.
That's where the concept of a file pointer comes in. A pointer can be thought of as a cursor that keeps track of our current location in the file. Whenever we read from or write to the file, the pointer moves forward by the amount of data processed.
Reading always continues from the current pointer location, so once the pointer has reached the end of the file there is nothing left to read.
So how do we read the data again? One possible solution is to store the contents of a file in a variable the first time we read it and use that variable whenever we need to access the contents again. Or we can change the pointer location manually.
tell and seek methods
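The .tell()
method returns the current position of the pointer and with .seek()
we can change the current position. seek() accepts two arguments: an offset and a reference point (called whence in the Python documentation).
The reference can be one of these:
- 0 - Beginning of the file (the default)
- 1 - Current position in the file
- 2 - End of the file
For files opened in text mode, only seeking relative to the beginning is generally allowed; the exception is seek(0, 2), which jumps to the very end of the file.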
>>> data = open('important.txt', 'r')
>>> data.tell()
0
>>> data.read()
'This works. There is definately some text here.'
>>> data.tell()
47
>>> data.seek(0)
0
>>> data.read()
'This works. There is definately some text here.'
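You do not have to go all the way back to the start. A position returned by tell() can be saved and handed to seek() later, a bit like a bookmark. The sketch below assumes a scratch file with made-up contents; demo_path, mark and the other variable names are illustrative.

```python
import os
import tempfile

# Create a scratch file so the example is self-contained
demo_path = os.path.join(tempfile.mkdtemp(), 'important.txt')
setup = open(demo_path, 'w')
setup.write('This works. There is some text here.')
setup.close()

data = open(demo_path, 'r')
first_word = data.read(5)    # 'This ' (five characters)
mark = data.tell()           # remember where we are now
rest = data.read()           # read up to the end of the file
nothing = data.read()        # pointer is at the end, so this is ''

data.seek(mark)              # jump back to the bookmark
again = data.read(5)         # the next five characters after 'This '

data.seek(0)                 # back to the very beginning
everything = data.read()
data.close()

print(repr(nothing))
print(again)
```

Passing a value obtained from tell() is the safe way to seek in a text file, since arbitrary offsets are only well defined in binary mode.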
Closing the file
After we are done with casting spells on our file we need to close it like a good programmer. Not doing so can leak file handles and may leave written data sitting in a buffer instead of on disk. To close a file we simply call the .close()
method on the file object.
lol = open('laugh.txt', 'w')
lol.write("That's not a bug, that's a feature.")
lol.close()
A better way?
To be honest, it is easy to forget to close a file. Hence the recommended way is to use the with
statement. It automatically closes the file once execution leaves the inner block, even if an error occurs inside it.
with open('laugh.txt', 'w') as lol:
    lol.write("That's not a bug, that's a feature.")
    # further processing on the file
print("Done.")
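If you want to convince yourself that with really closes the file, you can check the closed attribute of the file object before and after the block. This is a small sketch using a scratch file in a temporary directory; demo_path and the flag names are assumptions for the example.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'laugh.txt')

with open(demo_path, 'w') as lol:
    lol.write("That's not a bug, that's a feature.")
    inside = lol.closed      # still open inside the block
outside = lol.closed         # closed as soon as the block ends

# The file is closed even when the block exits because of an error
try:
    with open(demo_path) as lol:
        raise ValueError('something went wrong')
except ValueError:
    closed_after_error = lol.closed

print(inside, outside, closed_after_error)
```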
File Object attributes
File objects carry attributes that describe the current state of the file and other information. Some common ones are given below.
- name - returns the name of the file
- closed - returns True if file is closed, False if opened
- encoding - returns the text encoding used for the file (the default depends on the platform)
- mode - returns the mode in which file was opened
>>> article = open('tutorial.txt', 'w')
>>> article.name
'tutorial.txt'
>>> article.closed
False
>>> article.encoding
'UTF-8'
>>> article.mode
'w'
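Since the default encoding varies between platforms, you can pass the encoding argument to open() yourself and then confirm it through the attributes. A minimal sketch, assuming a scratch file in a temporary directory (demo_path and the variable names are made up for this example):

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'tutorial.txt')

# Ask for UTF-8 explicitly instead of relying on the platform default
article = open(demo_path, 'w', encoding='utf-8')
file_name = os.path.basename(article.name)   # name holds the path we passed in
file_mode = article.mode
file_encoding = article.encoding
open_state = article.closed
article.close()
closed_state = article.closed

print(file_name, file_mode, file_encoding, open_state, closed_state)
```

Note that the attributes stay readable after the file has been closed; only operations such as read() and write() stop working.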
Backslash vs Forward Slash
Windows uses backslashes (\) as the separator between folder names, while Linux and OS X use forward slashes (/). Our program should be able to handle both cases and run on any machine. To solve this problem we can use the os.path.join()
function from the os module. It accepts multiple folder names as arguments and returns a file path as a string using the correct separators.
import os
path = os.path.join('folder', 'extras', 'firefox.exe')
# 'folder/extras/firefox.exe' on linux/osx
# 'folder\\extras\\firefox.exe' on windows
print(path)
The os module contains many other useful functions for handling filenames and paths, but they are beyond the scope of this article.
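As a quick check that os.path.join() really picks the separator of the machine it runs on, you can compare its result against os.sep, the separator string for the current operating system. The variable names expected, folder_part and file_part are made up for this sketch.

```python
import os

path = os.path.join('folder', 'extras', 'firefox.exe')

# os.sep is '/' on Linux/OS X and '\\' on Windows
expected = 'folder' + os.sep + 'extras' + os.sep + 'firefox.exe'
print(path == expected)

# The same module can split a path back into its parts
folder_part = os.path.dirname(path)
file_part = os.path.basename(path)
print(folder_part)
print(file_part)
```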
Stringing it all together
Here is a program that copies the contents of one file to another, applying the concepts learnt above.
f1 = input('Enter source file name: ')
f2 = input('Enter destination file name: ')
with open(f1, 'r') as source:
    inputString = source.read()
with open(f2, 'w') as dest:
    dest.write(inputString)
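The program above loads the whole source file into memory before writing it out. For larger files, one possible variation is to copy line by line instead. The sketch below wraps that idea in a hypothetical helper called copy_text_file and tries it on scratch files in a temporary directory, so it does not ask for input; all names here are assumptions for illustration.

```python
import os
import tempfile


def copy_text_file(src, dst):
    """Copy a text file line by line and return the number of lines copied."""
    count = 0
    # Both files are opened in one with statement and closed together
    with open(src, 'r') as source, open(dst, 'w') as dest:
        for line in source:
            dest.write(line)
            count += 1
    return count


# Try it out on scratch files
folder = tempfile.mkdtemp()
f1 = os.path.join(folder, 'source.txt')
f2 = os.path.join(folder, 'destination.txt')

with open(f1, 'w') as setup:
    setup.write('Ready.\nSteady.\nGo.')

lines_copied = copy_text_file(f1, f2)

with open(f2, 'r') as check:
    copied_text = check.read()

print(lines_copied)
print(copied_text)
```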
Yaaay!!
You did it. That was a lot of stuff to take in. Best way to learn is to practice. Go on and experiment with different stuff to make your concepts clear. Happy coding!