Reading Data from Files in Java


Java offers many ways to read external files, and the right choice depends on the type, size and content of the file. This article focuses primarily on .txt files, so the techniques explained below relate to this type only. File handling can be complicated at times, but the basic steps are the same for every technique: open the file, create a reader object of the appropriate class, read the data with the corresponding method (with or without a loop), and close the file once reading is over.

On going through this article, you will learn about:

  • 5 different ways to read files in Java
  • What is the role of encodings like UTF-8 in reading data in Java?
  • Errors you may encounter while reading files
  • Applications of the different techniques of reading files

We will explore 5 different ways of reading files in Java:

  • BufferedReader
  • Scanner
  • StreamTokenizer
  • FileChannel
  • DataInputStream

1) BufferedReader

BufferedReader is a class that reads text from a character input stream. It buffers characters so that characters, arrays and lines can be read efficiently. Without buffering, each read request made of a Reader generally causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as a FileReader or an InputStreamReader. The class belongs to the java.io package (import java.io.BufferedReader). Some of its common methods are:

  • read()- Reads a single character
  • read(char[] cbuf, int off, int len)- Reads characters into a part of an array
  • readLine()- Reads a line of text
  • skip(long n)- Skips characters
  • close()- Closes the stream and releases system resources associated with it

Syntax:

BufferedReader br = new BufferedReader(new FileReader("filename.txt"));

Following this, one can get the first line as:

String line = br.readLine();

Code to read file line by line:

public void read() throws IOException
{
    File file = new File("opengenus.txt");
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader br = new BufferedReader(new FileReader(file)))
    {
        String st;
        while ((st = br.readLine()) != null)
            System.out.println(st);
    }
}
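
The other methods listed above can be combined in the same way. The following is a minimal sketch, not a definitive implementation: the class name BufferedReaderSketch and its two helper methods are hypothetical names chosen for illustration. It collects all lines with readLine() and reads a fixed-size chunk with skip() and read(char[], int, int).

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class BufferedReaderSketch {

    // Reads every line of the file into a list; readLine() drops the line terminators.
    static List<String> readAllLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Skips `offset` characters, then reads up to `length` characters into an array.
    static String readChunk(String path, long offset, int length) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            br.skip(offset);
            char[] cbuf = new char[length];
            int n = br.read(cbuf, 0, length); // -1 if the end of the stream was already reached
            return n <= 0 ? "" : new String(cbuf, 0, n);
        }
    }
}
```

Note that FileReader decodes bytes with the platform's default charset, so this sketch assumes the file content is compatible with it.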

2) Scanner

Scanner is a class that can parse primitive types and strings using regular expressions. A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace as recognized by Character.isWhitespace. A different delimiter can be set with the useDelimiter() method. The resulting tokens may then be converted into values of different types using the various next methods, such as nextInt() or nextDouble(). The class belongs to the java.util package (import java.util.Scanner). Some of its common methods are:

  • delimiter()- Returns the pattern this scanner is using to match delimiters
  • hasNext()- Returns true if this scanner has another token in its input
  • next()- Finds and returns the next complete token from this scanner
  • match()- Returns the match result of the last scanning operation performed by this scanner
  • skip(Pattern pattern)- Skips input that matches the specified pattern, ignoring delimiters
  • close()- Closes this scanner

Syntax:

Scanner sc = new Scanner(new File("filename.txt"));

Following this, one can get the first line as:

String line = sc.nextLine();

Code:

public void read() throws IOException
{
    File file = new File("opengenus.txt");
    // try-with-resources closes the scanner and the underlying file
    try (Scanner sc = new Scanner(file))
    {
        while (sc.hasNextLine())
            System.out.println(sc.nextLine());
    }
}
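
To illustrate the parsing ability described above, here is a small sketch that reads integers separated by commas or whitespace. The class name ScannerSketch, the method readInts and the delimiter pattern are assumptions made for this example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerSketch {

    // Collects every token that parses as an int; other tokens are skipped.
    static List<Integer> readInts(File file) throws FileNotFoundException {
        List<Integer> values = new ArrayList<>();
        try (Scanner sc = new Scanner(file)) {
            // Treat any run of commas and/or whitespace as a single delimiter
            sc.useDelimiter("[,\\s]+");
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    values.add(sc.nextInt());
                } else {
                    sc.next(); // discard a token that is not an int
                }
            }
        }
        return values;
    }
}
```

For a file containing the line "10, 20,abc , 30", this sketch would return the list [10, 20, 30].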

3) StreamTokenizer

StreamTokenizer is a class that takes an input stream and parses it into tokens, reading them one at a time. After each call to nextToken(), the ttype field tells you what kind of token was read: if ttype is TT_NUMBER, the numeric value is in the nval field, and if it is TT_WORD, the text is in the sval field. The stream tokenizer can recognize identifiers, numbers, quoted strings, and various comment styles. Each byte read from the input stream is regarded as a character in the range '\u0000' through '\u00FF'. The class belongs to the java.io package (import java.io.StreamTokenizer). Some of its common methods are:

  • nextToken()- Parses the next token from the input stream of this tokenizer
  • parseNumbers()- Specifies that numbers should be parsed by this tokenizer
  • toString()- Returns the string representation of the current stream token and the line number it occurs on
  • quoteChar(int ch)- Specifies that matching pairs of this character delimit string constants in this tokenizer
  • wordChars(int low, int high)- Specifies that all characters c in the range low to high are word constituents
  • lowerCaseMode(boolean b)- Determines whether or not word token is automatically lowercased

Syntax:

StreamTokenizer tokenizer = new StreamTokenizer(new FileReader("filename.txt"));

Following this, one can read the first token and, if it is a word, get its text as:

tokenizer.nextToken();
String word = tokenizer.sval;

Code:

public void read() throws IOException
{
    String file = "opengenus.txt";
    // try-with-resources closes the reader when tokenizing is finished
    try (FileReader reader = new FileReader(file))
    {
        StreamTokenizer tokenizer = new StreamTokenizer(reader);
        int t;
        while ((t = tokenizer.nextToken()) != StreamTokenizer.TT_EOF)
        {
            switch (t)
            {
                case StreamTokenizer.TT_NUMBER:
                    System.out.println("Number : " + tokenizer.nval);
                    break;
                case StreamTokenizer.TT_WORD:
                    System.out.println("Word : " + tokenizer.sval);
                    break;
            }
        }
    }
}
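
The same loop can also collect results instead of printing them. The sketch below is one possible arrangement, with the hypothetical class TokenizerSketch holding the words and the sum of the numbers found. It takes any Reader, so a FileReader or a StringReader can be passed in, and it also handles quoted strings, for which ttype holds the quote character.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;

class TokenizerSketch {
    final List<String> words = new ArrayList<>();
    double sum;

    // Tokenizes the input, summing number tokens and collecting word and quoted-string tokens.
    static TokenizerSketch scan(Reader reader) throws IOException {
        TokenizerSketch result = new TokenizerSketch();
        StreamTokenizer tokenizer = new StreamTokenizer(reader);
        tokenizer.lowerCaseMode(true); // word tokens are lowercased automatically
        tokenizer.quoteChar('"');      // text between double quotes becomes one token
        int t;
        while ((t = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
            if (t == StreamTokenizer.TT_NUMBER) {
                result.sum += tokenizer.nval;
            } else if (t == StreamTokenizer.TT_WORD || t == '"') {
                // For a quoted string, ttype is the quote character and sval holds the body
                result.words.add(tokenizer.sval);
            }
        }
        return result;
    }
}
```

Passing a reader over the text Alpha 10 "two words" 2.5 would give the words "alpha" and "two words" and a sum of 12.5.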

4) FileChannel

FileChannel is a class used for reading, writing, mapping, and manipulating a file. Because it transfers whole blocks of bytes through a buffer, it is well suited to reading large files. FileChannel is also safe for use by multiple concurrent threads. A channel and a buffer go hand in hand: channels are the tubes through which data is transferred, and buffers are the source and target of those transfers. The class belongs to the java.nio.channels package (import java.nio.channels.FileChannel). Some of its common methods are:

  • position()- Returns this channel's file position
  • read(ByteBuffer dst)- Reads a sequence of bytes from this channel into the given buffer
  • write(ByteBuffer src)- Writes a sequence of bytes to this channel from the given buffer
  • size()- Returns the current size of this channel's file
  • truncate(long size)- Truncates this channel's file to the given size

Syntax:

    RandomAccessFile reader = new RandomAccessFile("filename.txt", "r");
    FileChannel channel = reader.getChannel();

Code:

public void read() throws IOException
{
    String file = "opengenus.txt";
    // try-with-resources closes the channel and then the file
    try (RandomAccessFile reader = new RandomAccessFile(file, "r");
         FileChannel channel = reader.getChannel())
    {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        channel.read(buffer);  // reads at most 1024 bytes into the buffer
        buffer.flip();         // switches the buffer from being filled to being read
    }
}
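
The code above reads only the first 1024 bytes. To read a whole file, the read, flip and clear steps are repeated until read() returns -1. The following sketch shows one way to do this. The class name FileChannelSketch is hypothetical, and the example assumes the file is UTF-8 text and small enough to hold in memory. The bytes are collected first and decoded once at the end, so a multi-byte character split across two buffer fills is not corrupted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FileChannelSketch {

    // Reads the whole file through a 1024-byte buffer and decodes it as UTF-8.
    static String readAll(Path path) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            while (channel.read(buffer) != -1) {
                buffer.flip();   // prepare the bytes just read for consumption
                collected.write(buffer.array(), 0, buffer.limit());
                buffer.clear();  // make the buffer ready to be filled again
            }
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }
}
```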

5) DataInputStream

DataInputStream is a class that lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream, so DataInputStream is usually paired with DataOutputStream. A key point is that it is not necessarily safe for multithreaded access. These streams represent Unicode strings in a format that is a slight modification of UTF-8. The class belongs to the java.io package (import java.io.DataInputStream). Some of its common methods are:

  • read(byte[] b)- Reads some number of bytes from the contained input stream and stores them into the buffer array b
  • read(byte[] b, int off, int len)- Reads up to len bytes of data from the contained input stream into an array of bytes
  • readBoolean()- Reads one input byte and returns true if that byte is nonzero, false if that byte is zero
  • readChar()- Reads two input bytes and returns a char value
  • readDouble()- Reads eight input bytes and returns a double value
  • readInt()- Reads four input bytes and returns an int value

Syntax:

DataInputStream dataInputStream = new DataInputStream(new FileInputStream("file.txt"));

Code:

public void read() throws IOException
{
    // The file must hold binary data written in this order (for example by a DataOutputStream), not plain text
    String file = "opengenus.txt";
    try (DataInputStream dataInputStream = new DataInputStream(new FileInputStream(file)))
    {
        int i = dataInputStream.readInt();
        float f = dataInputStream.readFloat();
        long l = dataInputStream.readLong();
    }
}
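
Because a DataInputStream expects binary data, it is easiest to see it working as a round trip with DataOutputStream. The sketch below uses a hypothetical class, DataStreamSketch, with method names chosen for illustration. It writes an int, a float, a long and a string, and then reads them back in the same order.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class DataStreamSketch {

    // Writes the values in a fixed order; the reader must use the same order.
    static void write(String path, int i, float f, long l, String s) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(path))) {
            out.writeInt(i);    // 4 bytes
            out.writeFloat(f);  // 4 bytes
            out.writeLong(l);   // 8 bytes
            out.writeUTF(s);    // 2-byte length followed by modified UTF-8 bytes
        }
    }

    // Reads the values back and joins them into one string for display.
    static String read(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            int i = in.readInt();
            float f = in.readFloat();
            long l = in.readLong();
            String s = in.readUTF();
            return i + " " + f + " " + l + " " + s;
        }
    }
}
```

Calling write(path, 42, 1.5f, 7L, "hi") followed by read(path) would return "42 1.5 7 hi".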

Applications

There are multiple ways of reading text files in Java, and each class offers something different. For example:

  • BufferedReader provides buffering of data for fast reading
  • Scanner provides parsing ability
  • StreamTokenizer splits the input into word and number tokens, which simplifies parsing
  • FileChannel allows efficient reading of large files
  • DataInputStream can be used to read primitive Java data types

The type of data that each function reads also differs to some extent.

  • BufferedReader reads text only: single characters, character arrays, or whole lines as strings.
  • Scanner can read strings and other data types such as int, long, float and double.
  • StreamTokenizer parses the data and reads it as tokens.
  • FileChannel reads raw bytes in blocks into a ByteBuffer.
  • DataInputStream can read data as numbers instead of just bytes.

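The size of the file to be read also influences the choice.

  • Scanner starts with a small internal buffer (1024 characters) and parses with regular expressions, so it is best suited to small files.
  • BufferedReader uses a larger default buffer (8192 characters), which makes it faster on bigger files. The buffer size does not limit the size of the file that can be read.
  • FileChannel and DataInputStream work on raw bytes, and FileChannel in particular can read much larger files at a faster pace.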

UTF-8 Files

UTF-8 is a compromise character encoding that can be as compact as ASCII but can also represent any Unicode character (with some increase in file size). UTF stands for Unicode Transformation Format. The '8' means it uses 8-bit units: each character is encoded as one to four bytes. One of the nice features of UTF-8 is that it is compatible with null-terminated strings, because no character other than the null character itself (U+0000) contains a null (0) byte when encoded.
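
The variable length of UTF-8 can be checked directly in Java. In this small sketch, the class name Utf8LengthSketch and the sample characters are chosen for illustration. Each string holds a single character, and the method returns the number of bytes that character occupies in UTF-8.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthSketch {

    // Returns how many bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // prints 1 (ASCII letter)
        System.out.println(utf8Length("\u00e9"));       // prints 2 (e with acute accent)
        System.out.println(utf8Length("\u20ac"));       // prints 3 (euro sign)
        System.out.println(utf8Length("\uD83D\uDE00")); // prints 4 (emoji outside the basic plane)
    }
}
```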

The java.io package provides classes that allow you to convert between Unicode character streams and byte streams of non-Unicode text. With the InputStreamReader class, you can convert byte streams to character streams. The OutputStreamWriter class is used to translate character streams into byte streams. When you create InputStreamReader and OutputStreamWriter objects, you specify the byte encoding that you want to convert. For example, to translate a text file in the UTF-8 encoding into Unicode, you create an InputStreamReader as follows:

public void readInput() throws IOException 
{
    FileInputStream fis = new FileInputStream("opengenus.txt");
    InputStreamReader isr = new InputStreamReader(fis, "UTF8");
    String str = isr.getEncoding();
}
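
The method above only creates the reader and queries its encoding. A sketch that actually reads the decoded text could look like the following. The class name Utf8ReadSketch is hypothetical, and the StandardCharsets.UTF_8 constant is used in place of a charset name string, which is a substitution made for this example.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class Utf8ReadSketch {

    // Decodes the file as UTF-8 and returns its full content as a String.
    static String readText(String path) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            char[] cbuf = new char[1024];
            int n;
            // read() returns the number of characters read, or -1 at the end of the stream
            while ((n = reader.read(cbuf, 0, cbuf.length)) != -1) {
                text.append(cbuf, 0, n);
            }
        }
        return text.toString();
    }
}
```

Specifying the charset explicitly means the result does not depend on the platform's default encoding.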

The reverse conversion, from a sequence of Unicode characters in a String object to a FileOutputStream of bytes encoded in UTF-8, is done with an OutputStreamWriter. The method that performs the conversion is called writeOutput:

public void writeOutput(String str) throws IOException 
{
    FileOutputStream fos = new FileOutputStream("test.txt");
    Writer out = new OutputStreamWriter(fos, "UTF8");
    out.write(str);
    out.close();
}

Exceptions/ errors while reading file

An exception is a problem that arises during the execution of a program. When an exception occurs, the normal flow of the program is disrupted, and if the exception is not handled the program terminates abnormally. Exceptions should therefore be handled. All exceptions are thrown at runtime, but Java verifies at compile time that checked exceptions such as IOException are either caught or declared.

Some of the most common exceptions are related to file handling. There are two ways of dealing with them: catching them with try/catch/finally blocks, or declaring them with the throws keyword so that the caller handles them.

A method catches an exception using a combination of the try and catch keywords. The code that is prone to exceptions, referred to as protected code, is placed in the try block. Every try block must be followed by at least one catch block or a finally block. A catch statement declares the type of exception it handles. If an exception occurs in protected code, the catch blocks that follow the try are checked in order. If the type of the exception matches one listed in a catch block, the exception is passed to that block much as an argument is passed into a method parameter. Because the blocks are checked in order, a catch block for a subclass such as FileNotFoundException must come before the catch block for its superclass IOException.

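try
{
   file = new FileInputStream(fileName);
   x = (byte) file.read();
}
catch (FileNotFoundException f) // The more specific exception is caught first
{
   f.printStackTrace();
   return -1;
}
catch (IOException i) // Placing this block first would not compile: FileNotFoundException is a subclass of IOException
{
   i.printStackTrace();
   return -1;
}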
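
A complete method built on this pattern might look like the sketch below. The class name FirstByteSketch and the two error codes are hypothetical and were introduced for this example. The more specific FileNotFoundException is caught before IOException, and a finally block closes the file whether or not an exception occurred.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

class FirstByteSketch {
    static final int MISSING = -2; // hypothetical code: file not found
    static final int FAILED = -3;  // hypothetical code: any other I/O failure

    // Returns the first byte of the file (0 to 255), -1 if the file is empty, or an error code.
    static int firstByte(String fileName) {
        FileInputStream file = null;
        try {
            file = new FileInputStream(fileName);
            return file.read();
        } catch (FileNotFoundException f) {
            return MISSING;
        } catch (IOException e) {
            return FAILED;
        } finally {
            // Runs on every path out of the try block, including the returns above
            if (file != null) {
                try {
                    file.close();
                } catch (IOException ignored) {
                    // nothing useful can be done if close() itself fails
                }
            }
        }
    }
}
```

In newer code, a try-with-resources statement achieves the same cleanup with less boilerplate.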

If a method does not handle a checked exception, the method must declare it using the throws keyword. The throws keyword appears at the end of a method's signature.
You can throw an exception, either a newly instantiated one or an exception that you just caught, by using the throw keyword.
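
Both keywords can be seen together in a short sketch. The class ThrowsSketch and its method names are assumptions made for illustration. The first method declares IOException with throws and raises a new one with throw, while the second catches the exception and rethrows it wrapped in an unchecked exception.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;

class ThrowsSketch {

    // Declares the checked exception instead of handling it; callers must catch or declare it.
    static String firstLine(String path) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line = br.readLine();
            if (line == null) {
                // A newly instantiated exception thrown with the throw keyword
                throw new IOException("File is empty: " + path);
            }
            return line;
        }
    }

    // Catches the checked exception and rethrows it wrapped in an unchecked one.
    static String firstLineOrFail(String path) {
        try {
            return firstLine(path);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + path, e);
        }
    }
}
```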

Examples of exceptions related to file handling are:

  • FileNotFoundException- Thrown when the file at the given path does not exist or cannot be opened
  • IOException- Thrown when an input-output operation fails or is interrupted
  • NullPointerException- Thrown when code refers to the members of a null object