In this article, we explore five advantages of Huffman coding and why it is one of the best encoding methods despite being so simple.
Table of contents:
- Brief description of Huffman coding
- 5 Advantages of Huffman coding
Brief description of Huffman coding
Developed by David Huffman, Huffman coding is an algorithm used to losslessly encode and decode data. It uses a greedy algorithm to build a binary tree, which keeps both encoding and decoding fast.
Huffman coding assigns codes to characters such that the length of each code depends on the relative frequency of the corresponding character. The method generates variable-length bit sequences, called codes, in such a way that the most frequently occurring character gets one of the shortest codes. Among codes that assign a whole number of bits to each character, this minimizes the average code length. The codes are prefix codes, which prevents ambiguity during decoding and keeps the compression lossless. A binary tree stores the frequencies (or, equivalently, the probabilities of occurrence) of the symbols, and the codes are generated by traversing this tree.
5 Advantages of Huffman coding
Advantages of Huffman coding are:
- Huffman coding uses variable-length encoding
- Huffman coding uses prefix-free code
- Huffman coding is computationally simple
- Huffman coding takes O(n log(n)) time
- The code words have the shortest average length
We explore these points in more depth below.
Huffman coding uses variable-length encoding
Character encodings such as ASCII use fixed-length codes, which are convenient because the boundaries between characters are easily determined. ASCII text is typically stored with exactly 8 bits per character. In practice, not all characters occur with the same frequency. Huffman coding takes advantage of this and assigns shorter codes to more frequent characters at the expense of assigning longer codes to less frequent ones. This is more efficient than fixed-length encoding, since the characters with longer codes appear rarely.
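To make the saving concrete, here is a small sketch that counts the bits needed for one short string under both schemes. The sample text and the code table are assumptions chosen by hand for illustration; the table is one valid prefix-free assignment for this text, not something produced by a library.

```python
from collections import Counter

text = "abracadabra"  # sample input, chosen for illustration

# Hypothetical hand-picked code table: the frequent 'a' gets 1 bit,
# the rarer characters get 3 bits each.
codes = {"a": "0", "b": "100", "r": "101", "c": "110", "d": "111"}

freq = Counter(text)

# Fixed-length cost: 8 bits for every character.
fixed_bits = 8 * len(text)

# Variable-length cost: each character pays the length of its own code.
variable_bits = sum(freq[ch] * len(codes[ch]) for ch in freq)

print(fixed_bits)     # 88
print(variable_bits)  # 23
```

The character 'a' accounts for 5 of the 11 characters, so giving it a 1-bit code outweighs the 3-bit codes spent on the other four characters.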
Huffman coding uses prefix-free code
Huffman codes have the prefix property: no codeword is a prefix of any other codeword. This property is required for variable-length coding to avoid ambiguity during decoding, so the encoded text is uniquely decodable.
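The prefix property can be checked mechanically, and it is what lets a decoder emit a symbol as soon as the bits read so far match a codeword. The sketch below is a minimal illustration; the helper names `is_prefix_free` and `decode` and both code tables are assumptions made up for this example.

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codes.values())
    # After sorting, a word that is a prefix of some other word is
    # immediately followed by a word that starts with it.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


def decode(bits, codes):
    """Decode a bit string using a prefix-free code table."""
    inverse = {word: sym for sym, word in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # safe only because the code is prefix-free
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


good = {"a": "0", "b": "100", "r": "101", "c": "110", "d": "111"}
bad = {"a": "0", "b": "01", "c": "1"}  # "0" is a prefix of "01"

encoded = "".join(good[ch] for ch in "abracadabra")
print(is_prefix_free(good), is_prefix_free(bad))  # True False
print(decode(encoded, good))                      # abracadabra
```

With the `bad` table, the bits `01` could mean either `b` or `a` followed by `c`, which is exactly the ambiguity the prefix property rules out.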
Huffman coding is computationally simple
Both encoding and decoding with Huffman coding are fast and need little memory. The encoder builds a table of the frequency of occurrence of each symbol in the input, and a binary tree built from that table determines the codes. This tree (or the frequency table needed to rebuild it) must be sent along with the encoded data for it to be decoded successfully. Decoding is done by simply traversing the binary tree: follow one branch per bit and emit a symbol whenever a leaf is reached.
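As a rough sketch of decoding by tree traversal, the snippet below represents the tree as nested tuples, where an internal node is a `(left, right)` pair and a leaf is a one-character string. This representation, the function name `decode_with_tree`, and the sample tree are all assumptions for illustration, and the sketch assumes the tree has at least two leaves.

```python
def decode_with_tree(bits, tree):
    """Walk the tree: '0' goes left, '1' goes right, a leaf emits a symbol."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = tree            # restart from the root
    return "".join(out)


# Hypothetical tree: 'a' -> 0, 'b' -> 100, 'r' -> 101, 'c' -> 110, 'd' -> 111
tree = ("a", (("b", "r"), ("c", "d")))

print(decode_with_tree("0100101", tree))  # abr
```

Each input bit costs one step down the tree, so decoding time grows linearly with the number of encoded bits.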
Huffman coding takes O(n log(n)) time
The time complexity of building the Huffman tree is O(n log(n)), where n is the number of distinct characters in the text. (Counting the frequencies beforehand takes time linear in the length of the text.) The weights in the tree correspond to the frequencies of occurrence of the characters. The greedy approach places the n characters in n single-node sub-trees and repeatedly combines the two sub-trees of least weight into a new tree whose root is assigned the sum of their two weights.
Using a heap to store the weight of each tree, each iteration requires O(log(n)) time to extract the two smallest weights and insert the new weight. There are n - 1 iterations, since each merge reduces the number of trees by one, which gives O(n log(n)) overall.
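The heap-based construction described above can be sketched with Python's standard `heapq` module. This is a minimal illustration rather than a reference implementation: the function name `huffman_codes`, the tuple-based tree, and the tie-breaking counter are choices made for this example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Build a Huffman code table (symbol -> bit string) for text."""
    freq = Counter(text)
    tiebreak = count()  # unique counter so trees themselves are never compared
    heap = [(w, next(tiebreak), sym) for sym, w in freq.items()]
    heapq.heapify(heap)

    if not heap:
        return {}
    if len(heap) == 1:  # a single distinct symbol still needs one bit
        return {heap[0][2]: "0"}

    # n - 1 merges, each costing O(log n) heap operations.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    # Walk the finished tree: left edge appends '0', right edge appends '1'.
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


text = "abracadabra"
codes = huffman_codes(text)
total_bits = sum(len(codes[ch]) for ch in text)
print(total_bits)  # 23
```

The exact bit patterns depend on how ties between equal weights are broken, but the total encoded length for a given text is the same for every valid Huffman tree.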
With this article at OpenGenus, you should now have a complete idea of the advantages of Huffman coding.