Huffman Coding Problem


Huffman coding is a widely used algorithm in the field of data compression, and it is used primarily for lossless compression.

In lossless compression, the original data can be perfectly reconstructed from the compressed version, ensuring that no information is lost during the compression and decompression processes.

It is an example of a greedy algorithm that efficiently encodes data by assigning shorter codes to more frequent symbols.


Introduction

Huffman coding, invented by David A. Huffman in 1952, plays a crucial role in reducing the size of data for efficient storage and transmission.

It’s a variable-length prefix coding algorithm, meaning it assigns shorter codes to more frequent symbols. This ensures that commonly used symbols are encoded with fewer bits, reducing the overall size of the data.


Algorithm Of Huffman Code
  1. Make a leaf node for each character and record that character's frequency in the node.
  2. Arrange the nodes in non-descending order of frequency.
  3. Pick the two nodes with the lowest frequencies.
  4. Make a new internal node by joining these two nodes.
  5. The frequency of the new internal node is the sum of the frequencies of the two joined nodes.
  6. Make the first node the left child of the new node and the second node the right child.
  7. Repeat steps 2 to 6 until only a single tree remains.
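The steps above can be sketched in Python. This is a minimal sketch: the helper name `build_huffman_tree`, the `(char, left, right)` node layout, and the small frequency table are illustrative choices, not part of the algorithm's definition.

```python
import heapq

def build_huffman_tree(freqs):
    # Steps 1-2: one leaf per character, kept in a min-heap so the two
    # lowest-frequency nodes are always extracted first. The integer
    # tie-breaker keeps heap comparisons away from the node tuples.
    heap = [(f, i, (ch, None, None)) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    # Steps 3-7: repeatedly merge the two lowest-frequency nodes into a
    # new internal node whose frequency is the sum of its children's.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        tie += 1
        heapq.heappush(heap, (f1 + f2, tie, (None, left, right)))
    return heap[0]  # (total frequency, tie-breaker, root of the Huffman tree)

total, _, root = build_huffman_tree({'a': 3, 'b': 2, 'c': 1})
print(total)  # 6 — the root's frequency is the sum of all character frequencies
```

Note that when several nodes share the lowest frequency, different tie-breaking orders produce different (but equally optimal) trees.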

The tree finally obtained is the Huffman tree for the input.


How Huffman Code Works

Let’s consider a simple example with the string

             “ABRA-KA-DABRA.”

First, we calculate the frequency of each character.

Frequency of each character: A = 5, B = 2, R = 2, (-) = 2, K = 1, D = 1


The lowest frequencies are (K, 1) and (D, 1).

So we make two leaf nodes, K and D, join them into an internal node KD, and write their combined frequency.

Huffman Tree of K & D


The lowest frequencies are now (B, 2), (R, 2), (-, 2) and (KD, 2).

We join the two leaf nodes B and R into an internal node BR and write their combined frequency.

We also join the leaf node (-) with the internal node KD into an internal node -KD and write their combined frequency.

Huffman Tree of BR & -KD


The lowest frequencies are now (BR, 4) and (-KD, 4).

We join the two internal nodes BR and -KD and write their combined frequency.

Huffman Tree of BR-KD


The lowest frequencies are now (A, 5) and (BR-KD, 8).

We join the leaf node A and the internal node BR-KD and write their combined frequency.

Huffman Tree of ABR-KD


The Huffman tree is now complete: only one node (ABR-KD) remains, and its frequency is 13.


Now, starting from the root of the Huffman tree, label the edge to each left child as 0 and the edge to each right child as 1. Repeat this at every internal node.

Huffman Tree

Read off the edge labels along the path from the root to each leaf to obtain that character's code.

A = 0

B = 100

R = 101

(-) = 110

K = 1110

D = 1111

So, the encoded form of ABRA-KA-DABRA is 0100101011011100110111101001010.

Huffman Code Data Compression

Without encoding: 8 bits × 13 characters = 104 bits

With Huffman encoding: 31 bits


Source Code for Huffman Code
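Below is a compact Python sketch of the full pipeline, consistent with the steps described above: count frequencies, build the tree with a min-heap, then walk the tree labelling left edges 0 and right edges 1. The helper name `huffman_codes` is our own; ties in the heap can be broken differently than in the worked example, which may change individual codes but never the total encoded length.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table for the characters of `text`."""
    freqs = Counter(text)
    # Min-heap of (frequency, tie-breaker, node); a node is (char, left, right).
    # The integer tie-breaker keeps heap comparisons away from the node tuples.
    heap = [(f, i, (ch, None, None)) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        tie += 1
        heapq.heappush(heap, (f1 + f2, tie, (None, left, right)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        ch, left, right = node
        if ch is not None:                  # leaf: record the accumulated bits
            codes[ch] = prefix or "0"       # lone-symbol input still gets a code
            return
        walk(left, prefix + "0")            # left edge labelled 0
        walk(right, prefix + "1")           # right edge labelled 1
    walk(root, "")
    return codes

codes = huffman_codes("ABRA-KA-DABRA")
encoded = "".join(codes[c] for c in "ABRA-KA-DABRA")
print(len(encoded))  # 31 bits, matching the worked example
```

Any valid Huffman tree for these frequencies yields a 31-bit encoding, because all Huffman trees for the same frequency table have the same minimal weighted path length.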

Time Complexity of Huffman Code

The time complexity of the Huffman coding algorithm depends on several factors.

  1. Frequency Counting – In the first step, the algorithm counts the frequency of each character in the input. The time complexity for frequency counting is O(n), where ‘n’ is the number of characters in the input.
  2. Building the Huffman Tree – When using a priority queue (min-heap), the time complexity to build the Huffman tree is O(n log n), where ‘n’ is the number of characters.
  3. Generating Huffman Codes – Once the Huffman tree is constructed, the algorithm traverses the tree to generate Huffman codes for each character. This step has a time complexity of O(n), since each character is processed once.

Time Complexity of Huffman Code is

O(n) + O(n log n) + O(n) = O(n log n).


Space Complexity of Huffman Code

Like the time complexity, the space complexity is determined by each stage of the algorithm.

  1. Frequency Counting – The space complexity for this step is O(n), where ‘n’ is the number of unique characters in the input. You need to store the frequency count of each character, so the space required is proportional to the number of unique characters.
  2. Building the Huffman Tree – If a priority queue is used, the space complexity for the tree is O(n), where ‘n’ is the number of unique characters. Additionally, you need to store the internal nodes and their relationships; the total space used for the tree structure is also O(n).
  3. Generating Huffman Codes – As you generate Huffman codes for each character, you need space to store these codes. The space required for this step is O(n), where ‘n’ is the number of unique characters.

Space Complexity of Huffman Code = O(n).

 

Advantages of Huffman Code
  1. Huffman coding is highly efficient at compressing data.
  2. It is used in many applications, including image and video compression, and in popular file compression formats like ZIP.
  3. It adapts to the frequency distribution of the input data.
  4. It minimizes the average code length, which ensures that no other prefix code can achieve better compression for the same symbol frequencies.
  5. Encoding and decoding are fast.
  6. It eliminates redundancy by assigning a unique code to each symbol.
Disadvantages of Huffman Code
  1. It is inefficient for data in which the characters occur with nearly uniform frequency, such as the pangram “The quick brown fox jumps over the lazy dog”.
  2. It does not include error detection or correction mechanisms.
Applications of Huffman Code
  1. Used by messaging and social media platforms such as WhatsApp and Facebook. It is not meant for encryption; it’s for data compression.
  2. Used in optical discs, such as DVDs and Blu-ray discs.
  3. Data Transmission in IoT.
  4. Data Backup and Archiving.
  5. Dictionaries and Encyclopedias.

Test Yourself

Q1- Can a Huffman tree be unbalanced? If so, under what conditions might it happen?

A Huffman tree can become unbalanced when the character frequencies are distributed unevenly. If certain characters have significantly higher frequencies than others, the tree becomes skewed: rare characters end up deep in the tree with long codes, while frequent characters sit near the root with short codes. This skew is not a defect; it is exactly what minimizes the average code length.

Q2- Explain the concept of a “prefix-free” code in the context of Huffman coding. Why is it important?

In Huffman coding, a “prefix-free” code means that no code for one character is a prefix of the code for another character.

  1. This is important because it ensures that the encoded message can be uniquely decoded without any ambiguity.
  2. If a code were not prefix-free, it could lead to decoding errors.
For example, let’s say we have a simple alphabet with only three letters: A, B, and C. In a prefix-free code, we might assign the following codes: A = 0, B = 10, and C = 11. You can see that no code is the beginning of another code, so you can read the bitstream 01011 as “ABC” without any confusion.
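The decoding guarantee can be sketched in a few lines of Python (the `decode` helper is our own illustration): because the code table is prefix-free, the moment the bit buffer matches a code word, that match is the only possible one, so the symbol can be emitted immediately.

```python
def decode(bits, codes):
    # Invert the code table; prefix-freeness guarantees at most one
    # match at every position, so decoding is unambiguous.
    inverse = {code: ch for ch, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:          # a complete code word ends here
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

print(decode("01011", {"A": "0", "B": "10", "C": "11"}))  # ABC
```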
Q3- Explain the time complexity of the Huffman coding algorithm.

The time complexity is O(n log n), where ‘n’ is the number of unique characters, due to the building of the Huffman tree using a priority queue.

Q4- Explain the role of Huffman coding in data transmission for the Internet of Things (IoT).

Huffman coding reduces the size of data packets, enabling faster and more efficient data transmission in IoT applications with limited bandwidth.

Q5- For a given set of characters and their frequencies, how would you calculate the average code length in bits for the Huffman code?

You can calculate the average code length by summing the product of each character’s frequency and its code length and then dividing by the total number of characters.
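As a worked example (assuming Python, and reusing the frequencies and code lengths from the ABRA-KA-DABRA example above):

```python
# Frequencies and Huffman code lengths from the ABRA-KA-DABRA example.
freqs   = {'A': 5, 'B': 2, 'R': 2, '-': 2, 'K': 1, 'D': 1}
lengths = {'A': 1, 'B': 3, 'R': 3, '-': 3, 'K': 4, 'D': 4}
total_chars = sum(freqs.values())                          # 13
total_bits  = sum(freqs[c] * lengths[c] for c in freqs)    # 5+6+6+6+4+4 = 31
print(round(total_bits / total_chars, 3))  # 2.385 bits per symbol on average
```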

Q6- What is the primary purpose of Huffman coding?
  1. Lossy data compression
  2. Error correction
  3. Lossless data compression
  4. Data encryption

Ans – (3)

Explanation – Huffman coding is primarily used for lossless data compression.

Q7- Who invented the Huffman coding algorithm?
  1. David A. Huffman
  2. Alan Turing
  3. Claude Shannon
  4. Robert Fano

Ans – (1)

Explanation – David A. Huffman invented the Huffman coding algorithm in 1952.

Q8- What does the Huffman coding algorithm assign to more frequent symbols?
  1. Longer codes
  2. Lower frequencies
  3. Shorter codes
  4. Random codes

Ans – (3)

Explanation – Huffman coding assigns shorter codes to more frequent symbols.

Q9- What is the time complexity of the Huffman coding algorithm for building the Huffman tree?
  1. O(n)
  2. O(log n)
  3. O(1)
  4. O(n log n)

Ans – (4)

Explanation – The time complexity for building the Huffman tree is O(n log n).

Q10- In Huffman coding, what is the compression ratio?
  1. Original size / Compressed size
  2. Compressed size / Original size
  3. Original size – Compressed size
  4. Compressed size – Original size

Ans – (1)

Explanation – The compression ratio is calculated as the original size divided by the compressed size.

Q11- In which application does Huffman coding play a crucial role in reducing data size?
  1. Real-time video streaming
  2. Optical character recognition (OCR)
  3. Image and video compression
  4. Data encryption

Ans – (3)

Explanation – Huffman coding is commonly used for compressing images and videos; it is not an encryption method.

Q12- What is the main advantage of Huffman coding in data compression?
  1. Lossy compression
  2. Random code assignment
  3. Fast encoding and decoding
  4. No compression

Ans – (3)

Explanation – Huffman coding allows for fast encoding and decoding of data, making it efficient for various applications.

BOOKS

Introduction to Algorithms – T. H. Cormen
Algorithms Design and Analysis – Udit Agarwal
Fundamentals of Computer Algorithms – Ellis Horowitz
Data Structures and Algorithms Made Easy – Narasimha Karumanchi
